Biosurveillance 2.0: Lecture at Emory University

Published on March 24, 2009

Author: kasshout

Source: slideshare.net

Description

Invited lecture at Emory University Rollins School of Public Health. We presented Evolve (http://instedd.org/evolve), InSTEDD's global early warning and response social platform, with a live demonstration.

Biosurveillance 2.0: Collaboration and Web 2.0/3.0 Semantic Technologies for Better Early Disease Warning and Effective Response

Taha Kass-Hout, Nicolás di Tada

Invited by Dr. Barbara Massoudi, PhD, MPH

Lecture at Emory University Rollins School of Public Health, Public Health Informatics, INFO 503, Atlanta, GA, USA

Background

Late Detection and Response (figure: cases by day, with the opportunity for control)

Early Detection and Response (figure: cases by day, with the opportunity for control)

Public Health Measures

Representativeness

Completeness

Predictive Value

Timeliness

Public Health Measures (pyramid figure)

1000 malaria infections (100%)

50 malaria notifications (5%)

Specificity / reliability versus sensitivity / timeliness

Main attributes: representativeness, completeness, predictive value positive

Get as close to the bottom of the pyramid as possible

Urge frequent reporting: weekly -> daily -> immediately
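The 5% figure above is simply notified cases divided by true infections. A minimal sketch of that arithmetic, together with predictive value positive, follows. The infection and notification counts come from the slide; the report counts used for predictive value positive are hypothetical, and the helper name `fraction` is an illustration only.

```python
def fraction(part: int, whole: int) -> float:
    """Return part / whole, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

# Figures from the slide: 1000 infections, 50 of them notified.
infections = 1000
notified = 50
notified_fraction = fraction(notified, infections)

# Hypothetical figures (not from the lecture): 60 reports, 50 of them true cases.
reports_received = 60
true_cases_among_reports = 50
predictive_value_positive = fraction(true_cases_among_reports, reports_received)

print(f"notified fraction: {notified_fraction:.0%}")
print(f"predictive value positive: {predictive_value_positive:.0%}")
```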

Public Health Measures (timeline figure)

Analyze and interpret

Automated analysis / thresholds

Health care hotline

Main attribute: timeliness

Signal as early as possible

Public Health: Two Perspectives

Case management: individual cases of notifiable diseases; relationship networks (contact tracing)

Population surveillance: larger risk patterns

Case Management

Questions and problems:

Is a case due to recent transmission?

If so, does the case share any feature with other recent cases?

Current methods:

Investigations and interviews

Meeting with other investigators

Population Surveillance

Questions and problems:

Are more cases happening than expected?

Does an excess suggest ongoing transmission in a specific region?

Current methods:

Semi-automated routine temporal and space-time statistical analysis
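One simple way to ask "are more cases happening than expected?" is to compare recent counts against a threshold built from a baseline period. The sketch below is a temporal-only illustration, assuming a mean plus two standard deviations rule. The function name, the rule, and the daily counts are all hypothetical and are not taken from the lecture, and real space-time methods are considerably more involved.

```python
from statistics import mean, stdev

def excess_days(baseline, recent, z=2.0):
    """Return (index, count) for recent counts above mean + z * sd of the baseline."""
    threshold = mean(baseline) + z * stdev(baseline)
    return [(i, c) for i, c in enumerate(recent) if c > threshold]

# Hypothetical daily case counts, invented for illustration.
baseline_counts = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]
recent_counts = [5, 6, 12, 7]

flagged = excess_days(baseline_counts, recent_counts)
print(flagged)
```

Raising `z` trades sensitivity for fewer alerts, which mirrors the specificity versus sensitivity trade-off discussed earlier.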

Why location matters: Case Management

If you are studying a case of a certain disease that was just declared, it is harder to picture the situation by looking at something like this...

[figure]

...than by looking at this.

[figure]

Why location matters: Population Surveillance

If you are studying the spatial distribution of a set of disease clusters, this seems more difficult...

[figure]

...than this.

[figure]

The Problem Space

Current system design, analysis, and evaluation have been geared towards specific data sources and detection algorithms, not humans

We have systems in place for the threats we have faced before

Traditional DISEASE SURVEILLANCE

In the past two decades, the focus was on automatically detecting anomalous patterns in data (often a single stream)

Modern methods rely on human input and judgment, and incorporate temporal, spatial, and multivariate information

Traditional DISEASE SURVEILLANCE (diagram)

Sample records: 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records…

Huge mass of data

Detection algorithm

Too many alerts

"What are we supposed to do with this?"

Our Approach

Human-based

Collaborative and cross-disciplinary

Web 2.0/3.0 platform

Information Sources

Event-based: ad hoc unstructured reports issued by formal or informal sources

Indicator-based: number of cases, rates, proportion of strains…

Timeliness, representativeness, completeness, predictive value, quality, …

MODERN DISEASE SURVEILLANCE (diagram)

Sample records: 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records…

Huge mass of data

Feedback loop

Fewer and more actionable alerts

Effective and coordinated response

Evolve: Main Components (diagram)

Diagram labels: Feature extraction, reference and baseline information; Tags; Multiple Data Streams; User-Generated and Machine Learning Metadata; Comments; Spatio-temporal; Flags/Alerts/Bookmarks; Evolve Bot; Event Classification, Characterization and Detection; Previous Event Training Data; Previous Event Control Data; Metadata extraction; Machine learning; Social network; Professional feedback; Anomaly detection; Collaborative Spaces; Hypotheses generation and testing

Evolve: Process (diagram)

Diagram labels: Items; Hypothesis; Field Actions and Verifications; Feedback / Confirmation

Advantages of Machine Learning

P(malaria) = 22%, P(influenza) = 13%, P(other ILI) = 33%

Machine Learning Techniques

Classifiers

Clustering

Bayesian Statistics

Neural Networks

Genetic Algorithms

How to represent a document (figure labels: cold, fever)

(1) Classifiers: Problem Definition

Map items to vectors (Feature extraction)

Normalize those vectors

Train the classifier

Measure the results with new information

Feed the results back into the classifier

Separate classes in feature space
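The steps above can be sketched end to end with a deliberately simple stand-in classifier. The sketch assumes a bag-of-words feature extractor, L2 normalization, and a centroid classifier scored by dot product. It is not an SVM and not Evolve's implementation. The vocabulary, labels, and example texts are hypothetical.

```python
import math
from collections import Counter

VOCAB = ["cough", "fever", "cold", "rash", "chills"]  # hypothetical vocabulary

def to_vector(text):
    """Feature extraction: count vocabulary terms, then L2-normalize."""
    counts = Counter(text.lower().split())
    vec = [float(counts[t]) for t in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def train(examples):
    """Train: average the normalized vectors of each class (its centroid)."""
    sums, sizes = {}, Counter()
    for text, label in examples:
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, v in enumerate(to_vector(text)):
            acc[i] += v
        sizes[label] += 1
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}

def classify(centroids, text):
    """Predict the class whose centroid has the largest dot product with the item."""
    vec = to_vector(text)
    return max(centroids, key=lambda lab: sum(a * b for a, b in zip(centroids[lab], vec)))

training = [
    ("fever chills fever", "malaria-like"),
    ("chills fever rash", "malaria-like"),
    ("cough cold", "respiratory"),
    ("cold cough cough", "respiratory"),
]
centroids = train(training)

# Measure the results with new information (hypothetical held-out items).
held_out = [("fever and chills", "malaria-like"), ("bad cough", "respiratory")]
accuracy = sum(classify(centroids, t) == y for t, y in held_out) / len(held_out)
print(accuracy)

# Feedback: a reviewer confirms or corrects a label, and the model is retrained.
training.append(("fever cough cold", "respiratory"))
centroids = train(training)
```

Because every class is summarized by one centroid, the decision boundary between two classes is a straight line (hyperplane) in feature space, which is the simplest case of "separating classes in feature space".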

Classifiers: Support Vector Machines (SVM)

SVM – Margin Maximization

Support vectors define the separator

SVM – Non-linear?

Φ: x -> φ(x)

Map to a higher-dimensional space
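A small illustration of why mapping to a higher-dimensional space helps: the 1-D data below cannot be split by a single threshold on x, but after the map x -> (x, x**2) a straight line separates the classes. The data, the particular map, and the hand-picked weights are assumptions for illustration. No SVM training is performed here, and a real SVM would choose the separator by maximizing the margin.

```python
def phi(x):
    """Map a 1-D point into 2-D: x -> (x, x**2)."""
    return (x, x * x)

# Hypothetical 1-D data: the positive class sits on both sides of the
# negative class, so no single threshold on x separates them.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# In the mapped space a straight line separates the classes:
# w . phi(x) + b > 0 with w = (0, 1) and b = -2, i.e. x**2 > 2.
# These weights are hand-picked, not learned.
w, b = (0.0, 1.0), -2.0

def predict(x):
    z = phi(x)
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1

predictions = [predict(x) for x in points]
print(predictions == labels)
```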

SVM – Filtering or classifying (diagram)

Diagram labels: Training documents; Classifier; Document 1, Document 2, Document 3; Positives; Negatives

(2) Clustering: Problem Definition

Map items to vectors (Feature extraction)

Normalization

Agglomerative or Partitional

Clustering: AGGLOMERATIVE

Clustering: PARTITIONAL
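As a sketch of the partitional family, the block below runs a basic k-means loop on a few 2-D points. The lecture does not name a specific algorithm, so k-means, the naive initialization, and the sample points are assumptions. An agglomerative method would instead start with one cluster per item and merge the closest pair repeatedly.

```python
import math

def kmeans(points, k, iterations=10):
    """Partitional clustering: assign to nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignment, centroids

# Hypothetical feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

The result depends on the initial centroids; here the first two points happen to fall in different groups, which makes the example converge immediately.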

(3) Bayesian Statistics

P(A|B) = P(B|A) P(A) / P(B)

P(A|B): probability of disease A (flu) once symptom B (fever) is observed

P(B|A): probability of fever once flu is confirmed

P(A): probability of flu (prior or marginal)

P(B): probability of fever (prior or marginal)
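The four quantities on this slide are the terms of Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B). A worked example follows. The probabilities are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b

# Hypothetical probabilities, not taken from the lecture.
p_fever_given_flu = 0.9   # P(B|A): fever once flu is confirmed
p_flu = 0.05              # P(A): prior probability of flu
p_fever = 0.15            # P(B): prior probability of fever

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(round(p_flu_given_fever, 2))
```

With these numbers, observing fever raises the probability of flu from 5% to about 30%.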

(4) Neural Networks

Given a set of stimuli, train a system to produce a given output…

Neural Network: Structure (diagram)

Input layer {I0, I1, …, In} -> hidden layer […] -> output layer {O0, O1, …, On}, with a weight on each connection
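The layered structure can be sketched as a forward pass: each layer multiplies its inputs by weights, adds a bias, and applies an activation. The sigmoid activation, the layer sizes, and the weights below are assumptions. The weights are untrained placeholders, and training (adjusting weights to produce a given output) is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(weights . inputs + bias) per unit."""
    return [
        sigmoid(sum(w * i for w, i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), output_w, output_b)

# Hypothetical, untrained weights: 3 inputs, 2 hidden units, 1 output.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]

outputs = forward([1.0, 0.0, 1.0], hidden_w, hidden_b, output_w, output_b)
print(outputs)
```

A single output between 0 and 1 could be read as a score for a yes/no question about an item, once the weights have been trained on labeled examples.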

Neural Network: Application (diagram label: Event?)

(5) Genetic Algorithm: Basic

Define the model that you want to optimize

Create the fitness function

Evolve the gene pool, testing each individual against the fitness function

Select the best individual

Genetic Algorithm: Model

Model the transmission process (e.g., of an infectious disease) using a set of parameters:

Onset time between an infection and illness

Latency period

Incubation period

Symptomatic period

Infectious period

Example: (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)

Genetic Algorithm: Model Fitness

Fitness = 1/Area

Genetic Algorithm: Process

1. Create an initial population of candidates

2. Use operators to generate new candidates (mating and mutation)

3. Discard the worst individuals or select the best individuals in the generation

4. Repeat from step 2 until you find a candidate that satisfies the search criteria

Genetic Algorithm: Process (example)

Candidates: (4,5,6,3,5) (4,3,6,2,5) (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (3,4,5,2,6) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)

New candidates: (5,3,2,6,5) (3,4,4,6,2)
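The process can be sketched on five-gene tuples like the ones above. Two points are assumptions. First, the slide defines fitness as 1/Area without saying how Area is computed, so the sketch substitutes a stand-in fitness: 1 / (1 + total absolute error) against a known target tuple. Second, the new candidates (5,3,2,6,5) and (3,4,4,6,2) are consistent with a one-point mating of (5,3,4,6,2) and (3,4,2,6,5) after the second gene. That reading is an inference from the numbers, and `crossover` reproduces it. The population size, gene range, and function names are hypothetical.

```python
import random

def crossover(a, b, point):
    """One-point mating: swap the tails of two parents after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(ind, rng, low=1, high=6):
    """Replace one randomly chosen gene with a random value in [low, high]."""
    i = rng.randrange(len(ind))
    return ind[:i] + (rng.randint(low, high),) + ind[i + 1:]

def fitness(ind, target):
    """Stand-in fitness (an assumption): 1 / (1 + total absolute error)."""
    return 1.0 / (1.0 + sum(abs(x - t) for x, t in zip(ind, target)))

def evolve(target, size=20, generations=200, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population of random candidates.
    pop = [tuple(rng.randint(1, 6) for _ in target) for _ in range(size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, target), reverse=True)
        if pop[0] == target:
            break
        survivors = pop[: size // 2]  # Step 3: discard the worst half.
        children = []
        while len(survivors) + len(children) < size:
            # Step 2: mating and mutation produce new candidates.
            a, b = rng.sample(survivors, 2)
            c1, c2 = crossover(a, b, rng.randrange(1, len(target)))
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = survivors + children[: size - len(survivors)]
    return max(pop, key=lambda ind: fitness(ind, target))

offspring = crossover((5, 3, 4, 6, 2), (3, 4, 2, 6, 5), 2)
print(offspring)

target = (2, 3, 1, 4, 3)  # the example parameter tuple from the model slide
best = evolve(target)
print(best, fitness(best, target))
```

Keeping the best half of each generation unchanged means the best fitness never decreases from one generation to the next.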

Result of incorporating all 5 techniques: Improved Surveillance

InSTEDD Evolve (http://instedd.org/evolve)

Related items (e.g., news articles) are grouped into a thread. Threads are later associated with events (hypothesized or confirmed). The view also shows a tag cloud and a semantic heatmap.

A filter feature automatically filters for related items and updates the map and associated tags.

Tags are auto-generated by machine learning and semantically ranked (a statistical probability match). Users can further train the classifier by accepting or rejecting a suggestion. Users can similarly train the geo-locator by accepting, or rejecting and updating, a location.

Tracking the recent avian influenza outbreak in Egypt (reports started to appear in late January 2009). Notice the pattern of reported incidents along the Nile river.

Acknowledgements

Through funding from:

Thank You!

Taha Kass-Hout, Nicolás di Tada

BACKGROUND MATERIAL

Index

Disease surveillance References

Computing

Automating Laboratory Reporting

Using EMR data for disease surveillance

Related Projects

Misc Readings

Open Source Software (OSS) References

Open Source License References

Open Source References

Open Source and Public Health References

Architectural Matters

Service Oriented Architecture (or SOA)

Synchronization Architecture

Cloud Architecture

DISEASE SURVEILLANCE: References and Related Efforts

REFERENCES

Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007

EpiNorth-Based material ( http://www.epinorth.org ):

Mereckiene, J., Outbreak Investigation Operational Aspects. Jurmala, Latvia, 2006

Bagdonaite, J., and Mereckiene, J., Outbreak Investigation Methodological aspects. Jurmala, Latvia, 2006

Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006

Daniel Neil, Incorporating Learning into Disease Surveillance Systems

REFERENCES

Computing

The Future of Statistical Computing in Wilkinson (2008)

Complex Event Processing Over Uncertain Data in Wasserkrug (2008)

Outbreak detection through automated surveillance: A review of the determinants of detection in Buckeridge (2007)

Approaches to the evaluation of outbreak detection methods in Watkins (2006)

Algorithms for rapid outbreak detection: a research synthesis in Buckeridge (2004)

Data mining in bioinformatics using Weka in Frank (2004)

Aho-Corasick Algorithm in Kilpeläinen

Automating Laboratory Reporting

Automatic Electronic Laboratory-Based Reporting in Panackal (2002)

Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007)

REFERENCES

Using EMR Data for Disease Surveillance

Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007)

Electronic Medical Record Support for PH in Klompas (2007)

A knowledgebase to support notifiable disease surveillance in Doyle (2005)

Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007)

Misc Readings

Breakthrough in modeling emerging disease hotspots in Jones (2008)

Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)

Euclidean distance: http://en.wikipedia.org/wiki/Euclidean_distance

Tags/Folksonomy:

Tag Decay: A View Over Aging Folksonomy in Russell (2007)

Cloudalicious: Folksonomy Over Time in Russell (2006)

RELATED PROJECTS

InSTEDD Evolve : ( http://instedd.org/evolve )

Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response ( http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html )

ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA

Weka: an open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/

RELATED PROJECTS

The R Project for statistical computing: http://www.r-project.org

Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org

The R package surveillance in Höhle (multiple articles)

Google's Research Publications: MapReduce: Simplified Data Processing on Large Clusters ( http://labs.google.com/papers/mapreduce.html )

Hadoop : a software platform that lets one easily write and run applications that process vast amounts of data ( http://hadoop.apache.org/core )

OPEN SOURCE SOFTWARE: References and Related Efforts

REFERENCES

Open Source License References

http://www.opensource.org/licenses

http://openacs.org/about/licensing/open-source-licensing

Open Source References

http://www.lifehack.org/articles/technology/open-source-life-how-the-open-movement-will-change-everything.html

http://en.wikipedia.org/wiki/Open_source

http://www.opensource.org/  

Open Source and Public Health References

http://www.ibiblio.org/pjones/wiki/index.php/Open_Source_Software_for_Public_Health

http://en.wikipedia.org/wiki/List_of_open_source_healthcare_software

http://www.epha.org/a/320

Open Source Development for Public Health: A Primer with Examples of Existing Enterprise Ready Open Source Applications in Turner (2006)

A Quick Survey of Open Source Software for Public Health Organizations in Mirabito and Kass-Hout (2007)

ARCHITECTURAL MATTERS: References and Related Efforts

REFERENCES

Service Oriented Architecture (or SOA)

Proposal for Fulfilling Strategic Objectives of the U.S. Roadmap for National Action on Decision Support through a Service-oriented Architecture Leveraging HL7 Services in Kawamoto (2007)

Service-oriented Architecture in Medical Software: Promises and Perils in Nadkarni (2007)

Wiki sources:

SOA: http://en.wikipedia.org/wiki/Service_Orientated_Architecture

Semantic service oriented architecture: http://en.wikipedia.org/wiki/Semantic_service_oriented_architecture

Synchronization Architecture

InSTEDD’s Mesh4x: http://mesh4x.org

Cloud Architecture

Google App Engine: Google App Engine Goes Up Against Amazon Web Services in Gartner Report (2008)
