Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada


Published on October 11, 2008

Author: kasshout

Source: slideshare.net

Description

The majority of the designs, analyses and evaluations of early detection (or biosurveillance) systems have been geared towards specific data sources and detection algorithms. Much less effort has been focused on how these systems will "interact" with humans. For example, consider multiple domain experts working at different levels across different organizations in an environment where numerous biosurveillance algorithms may provide contradictory interpretations of ongoing events. We present a framework that consists of a collection of autonomous, machine learning-enabled analytic processes, services and tools that, for the first time, will seamlessly integrate surveillance and response systems with human experts.

MACHINE LEARNING AND DISEASE SURVEILLANCE

Taha Kass-Hout, MD, MS

Nicolás di Tada

October 2008

Image source: http://www.birds.cornell.edu/crows/images/deadcrow.jpg

Image source: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0

LATE DETECTION – RESPONSE (figure: cases by day, with the opportunity for control marked)

EARLY DETECTION AND RESPONSE (figure: cases by day, with the opportunity for control marked)

INFORMATION SOURCES

Event-based – ad-hoc unstructured reports issued by formal or informal sources

Indicator-based – (number of cases, rates, proportion of strains…)

PUBLIC HEALTH MEASURES

Representativeness

Completeness

Predictive Value

Timeliness

PUBLIC HEALTH MEASURES

Pyramid: 1000 Malaria infections (100%); 50 Malaria notifications (5%)

Specificity / Reliability

Sensitivity / Timeliness

Main attributes

Representativeness

Completeness

Predictive value positive

Get as close to the bottom of the pyramid as possible

Urge frequent reporting: Weekly → daily → immediately
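The pyramid's arithmetic can be written out explicitly. This is a minimal sketch: the function name is illustrative, and the only data are the two counts shown on the slide.

```python
def reporting_completeness(notified: int, actual: int) -> float:
    """Fraction of actual infections that reach the surveillance system."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return notified / actual


# Figures from the slide: 1000 malaria infections, 50 notifications.
completeness = reporting_completeness(50, 1000)
print(f"{completeness:.0%}")  # prints 5%
```

Moving "closer to the bottom of the pyramid" corresponds to pushing this fraction towards 100%.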

PUBLIC HEALTH MEASURES

Analyze and interpret

Automated analysis / thresholds

Health care hotline

(figure axis: Time)

Main attributes

Timeliness

Signal as early as possible

THE PROBLEM SPACE

The design, analysis and evaluation of current systems have been geared towards specific data sources and detection algorithms – not humans

We have systems in place for the threats we have faced before

PUBLIC HEALTH – TWO PERSPECTIVES

Case management

Individual cases of notifiable diseases

Relationship networks (contact tracing)

Population surveillance

Larger risk patterns

CASE MANAGEMENT

Questions/problems:

Is a case due to recent transmission?

If so, does the case share any feature with other, recent cases?

Ways it's being done:

Investigations/interviews

Meeting with other investigators

POPULATION SURVEILLANCE

Questions/problems:

Are more cases happening than expected?

Does an excess suggest ongoing transmission in a specific region?

Way it's being done:

Semi-automated routine temporal and space-time statistical analysis
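The question "Are more cases happening than expected?" can be illustrated with a very simple temporal rule. The slides do not say which statistical method is used, so the sketch below substitutes a mean-plus-k-standard-deviations threshold as a stand-in; the function name, the value of k and the weekly counts are all hypothetical.

```python
from statistics import mean, stdev


def excess_flags(baseline, recent, k=2.0):
    """Flag each recent count that exceeds mean + k * sd of the baseline."""
    threshold = mean(baseline) + k * stdev(baseline)
    return threshold, [count > threshold for count in recent]


# Hypothetical weekly case counts, for illustration only.
baseline = [10, 12, 9, 11, 10, 13, 8, 11]
recent = [12, 13, 25]
threshold, flags = excess_flags(baseline, recent)
print(flags)  # [False, False, True]
```

A rule like this only covers the temporal side; the space-time analysis mentioned on the slide would also need case locations.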

WHY LOCATION MATTERS – CASE MANAGEMENT

If you are studying a case of a certain disease that was just declared, it is harder to picture the situation by looking at something like this... [slide image]

...than by looking at this. [slide image]

WHY LOCATION MATTERS – POP SURVEILLANCE

If you are studying the spatial distribution of a set of disease clusters, this would seem more difficult... [slide image]

...than this. [slide image]

MODERN DISEASE SURVEILLANCE

In the past two decades, much disease surveillance research has focused on developing analytical methods for automatically detecting anomalous patterns in data

Modern methods can achieve timely detection of anomalies by incorporating temporal, spatial, and multivariate information

MODERN DISEASE SURVEILLANCE

Huge mass of data (9/20, 15213, cough/cold, …; 9/21, 15207, antifever, …; 9/22, 15213, CC = cough, …; 1,000,000 more records…) → Detection algorithm → Too many alerts: “What are we supposed to do with this?”

MODERN DISEASE SURVEILLANCE

Huge mass of data (same records as above), now with a feedback loop

ADVANTAGES OF MACHINE LEARNING

P(malaria) = 22%

P(influenza) = 13%

P(other ILI) = 33%

MACHINE LEARNING TECHNIQUES

Classifiers

Clustering

Bayesian Statistics

Neural Networks

Genetic Algorithms

HOW TO REPRESENT A DOCUMENT?

“This morning I woke up with fever, I might have a flu.”

“I had a flu last month. […] I had a flu early this week.”

Feature axes: flu, fever
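One common way to turn such sentences into points on the "flu" and "fever" axes is to count term occurrences (a bag-of-words representation). The slide does not state which representation the authors used, so the sketch below is an assumption; `to_vector` and `VOCABULARY` are illustrative names.

```python
import re
from collections import Counter

VOCABULARY = ("flu", "fever")  # the two terms named on the slide


def to_vector(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[term] for term in vocabulary]


doc1 = "This morning I woke up with fever, I might have a flu."
doc2 = "I had a flu last month. [...] I had a flu early this week."
print(to_vector(doc1))  # [1, 1]
print(to_vector(doc2))  # [2, 0]
```

The two documents land at different points in the (flu, fever) plane, which is what lets a classifier or clustering method tell them apart.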

CLASSIFIERS – PROBLEM DEFINITION

Map items to vectors (Feature extraction)

Normalize those vectors

Train the classifier

Measure the results with new information

Feedback the classifier

Separate classes in feature space
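The steps above can be sketched end to end. To keep the example short and dependency-free it uses a nearest-centroid classifier, which is a deliberately simpler stand-in and not the SVM discussed on the following slides. The (flu, fever) counts, the labels and all function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (L2 norm); zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def train(examples):
    """Average the normalized vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for vector, label in examples:
        unit = normalize(vector)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(centroids, vector):
    """Return the label whose centroid is closest to the normalized vector."""
    unit = normalize(vector)
    return min(centroids, key=lambda label: math.dist(unit, centroids[label]))


# Hypothetical (flu, fever) term counts with made-up labels.
training = [([2, 0], "negative"), ([3, 0], "negative"),
            ([1, 1], "positive"), ([1, 2], "positive")]
centroids = train(training)

# Measure the results with new information the classifier has not seen.
held_out = [([4, 0], "negative"), ([2, 3], "positive")]
accuracy = sum(classify(centroids, v) == y for v, y in held_out) / len(held_out)
print(accuracy)  # 1.0
```

The "feedback" step would amount to adding the newly labeled items to `training` and calling `train` again.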

CLASSIFIERS - SVM

SVM – MARGIN MAXIMIZATION

Support vectors define the separator

SVM – NON LINEAR?

Φ: x → φ(x)

Map to a higher-dimensional space
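A toy illustration of why such a mapping helps. The slide does not give a concrete φ, so the map x → (x, x²) and the data below are assumptions chosen only to show the idea: points that cannot be split by one cut on a line become separable by a straight line once lifted to two dimensions.

```python
def phi(x):
    """Map a scalar to two dimensions: x -> (x, x squared)."""
    return (x, x * x)


# Hypothetical 1-D data: the positives sit on both sides of the negatives,
# so no single cut point on the line separates the two classes.
negatives = [-1.0, 0.0, 1.0]
positives = [-3.0, -2.0, 2.0, 3.0]

# After the mapping, the horizontal line "second coordinate = 2.5" separates them.
boundary = 2.5
separable = (all(phi(x)[1] < boundary for x in negatives)
             and all(phi(x)[1] > boundary for x in positives))
print(separable)  # True
```

In practice an SVM usually applies such mappings implicitly through a kernel function rather than computing φ(x) directly.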

SVM – FILTERING OR CLASSIFYING

(diagram labels: Document 1, Document 2, Document 3; Positives; Negatives; Training Document; Classifier)

CLUSTERING – PROBLEM DEFINITION

Map items to vectors (Feature extraction)

Normalization

Agglomerative and Partitional

CLUSTERING - AGGLOMERATIVE

CLUSTERING - PARTITIONAL
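As one example of a partitional method, the sketch below implements k-means (Lloyd's algorithm). The slides do not name a specific algorithm, so k-means is an assumption chosen for illustration; the points and the starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical case locations (x, y) forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

An agglomerative method would instead start with every point as its own cluster and repeatedly merge the two closest clusters.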

BAYESIAN STATISTICS

$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

P(A | B): probability of disease A (flu) once symptoms B (fever) are observed

P(B | A): probability of fever once flu is confirmed

P(A): probability of flu (prior or marginal)

P(B): probability of fever (prior or marginal)
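A worked example of the update, using Bayes' theorem with A = flu and B = fever. The three input probabilities are made up for illustration and do not come from the slides or from real data.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical values, not taken from the slides or from real data.
p_fever_given_flu = 0.9   # P(B|A)
p_flu = 0.05              # P(A)
p_fever = 0.15            # P(B)

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(round(p_flu_given_fever, 2))  # 0.3
```

With these numbers, observing a fever raises the probability of flu from the 5% prior to about 30%.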

NEURAL NETWORKS

Given a set of stimuli, train a system to produce a given output

NEURAL NETWORKS - STRUCTURE

(diagram labels: Input Layer {I_0, I_1, …, I_n}; Hidden Layer […]; Output Layer {O_0, O_1, …, O_n}; Weight)
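The layered structure can be sketched as a forward pass through one hidden layer. The sigmoid activation, the layer sizes and the weights below are assumptions (the weights are untrained, arbitrary numbers); training, which would adjust the weights, is not shown.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), output_w, output_b)


# Hypothetical, untrained weights: 3 inputs, 2 hidden neurons, 1 output.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.0]
outputs = forward([1.0, 0.0, 1.0], hidden_w, hidden_b, output_w, output_b)
print(len(outputs))  # 1
```

Each row of a weight matrix holds the weights on the connections into one neuron, matching the "Weight" label in the diagram.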

NEURAL NETWORK - APPLICATION Event?

GENETIC ALGORITHM - BASICS

Define the model that you want to optimize

Create the fitness function

Evolve the gene pool testing against the fitness function.

Select the best individual

GENETIC ALGORITHM – MODEL

Model the transmission process using a set of parameters:

Onset time between an infection and illness

Latency period

Incubation period

Symptomatic period

Infectious period

Example: (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)

GENETIC ALGORITHM – MODEL FITNESS Fitness = 1/Area

GENETIC ALGORITHM – PROCESS

1. Create an initial population of candidates

2. Use operators to generate new candidates (mating and mutation)

3. Discard the worst individuals or select the best individuals in each generation

4. Repeat from step 2 until you find a candidate that is a satisfactory solution
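The four steps of this process can be sketched as a small loop over five-parameter candidates. Several things here are assumptions made for illustration: the "Area" in the deck's fitness formula is replaced by a toy error (distance to a fixed target tuple), the fitness is 1 / (1 + error) to avoid dividing by zero, and the gene bounds, population size and mutation rate are arbitrary.

```python
import random

TARGET = (2, 3, 1, 4, 3)   # example tuple from the model slide, used as a toy optimum
GENE_RANGE = (1, 6)        # hypothetical bounds, in days


def error(candidate):
    """Toy stand-in for the deck's 'Area': total absolute gap to TARGET."""
    return sum(abs(c - t) for c, t in zip(candidate, TARGET))


def fitness(candidate):
    return 1.0 / (1.0 + error(candidate))  # the +1 avoids division by zero


def mate(a, b, rng):
    cut = rng.randint(1, len(a) - 1)       # one-point crossover
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    return tuple(rng.randint(*GENE_RANGE) if rng.random() < rate else gene
                 for gene in candidate)


def evolve(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    # Step 1: create an initial population of random candidates.
    population = [tuple(rng.randint(*GENE_RANGE) for _ in TARGET)
                  for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # Step 2: generate new candidates by mating and mutation.
        children = [mutate(mate(*rng.sample(population, 2), rng), rng)
                    for _ in range(pop_size)]
        # Step 3: keep only the fittest pop_size individuals.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        # Step 4: stop once a candidate matches the target exactly.
        if error(population[0]) == 0:
            break
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print(best, fitness(best))
```

Because the best individuals always survive selection, the best fitness never decreases from one generation to the next. In a real application the error would come from comparing the model's predictions with observed data rather than from a known target.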

GENETIC ALGORITHM - PROCESS

(4,5,6,3,5) (4,3,6,2,5) (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (3,4,5,2,6) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)

(5,3 | 2,6,5) (3,4 | 4,6,2)
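The last two tuples on this slide are consistent with one-point crossover ("mating") of two candidates from the list, (5,3,4,6,2) and (3,4,2,6,5), cut after the second gene. That reading is an inference from the numbers, not something the transcript states. A minimal sketch, with an illustrative function name:

```python
def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two parents after position `cut`."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


children = one_point_crossover((5, 3, 4, 6, 2), (3, 4, 2, 6, 5), cut=2)
print(children)  # ((5, 3, 2, 6, 5), (3, 4, 4, 6, 2))
```

Each child keeps the head of one parent and takes the tail of the other.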

RESULTS – IMPROVED SURVEILLANCE

Q&A

THANK YOU!

Taha Kass-Hout, MD, MS

http://www.instedd.org

[email_address]

http://taha.instedd.org

Nicolás di Tada

http://www.manas.com.ar

[email_address]

http://weblogs.manas.com.ar/ndt/

BACKUP SLIDES

REFERENCES

Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007

EpiNorth-Based material ( http://www.epinorth.org ):

Mereckiene, J., Outbreak Investigation Operational Aspects. Jurmala, Latvia, 2006

Bagdonaite, J., and Mereckiene, J., Outbreak Investigation Methodological aspects. Jurmala, Latvia, 2006

Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006

Daniel Neill, Incorporating Learning into Disease Surveillance Systems

REFERENCES

Algorithms

Complex Event Processing Over Uncertain Data, in Wasserkrug (2008)

Outbreak detection through automated surveillance: a review of the determinants of detection, in Buckeridge (2007)

Approaches to the evaluation of outbreak detection methods, in Watkins (2006)

Algorithms for rapid outbreak detection: a research synthesis, in Buckeridge (2004)

Data mining in bioinformatics using Weka, in Frank (2004)

REFERENCES

Automating Laboratory Reporting

Automatic Electronic Laboratory-Based Reporting in Panackal (2002)

Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007)

Using EMR Data for Disease Surveillance

Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007)

Electronic Medical Record Support for PH in Klompas (2007)

A knowledgebase to support notifiable disease surveillance in Doyle (2005)

Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007)

Misc Readings

Breakthrough in modeling emerging disease hotspots in Jones (2008)

Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)

RELATED PROJECTS

InSTEDD RNA (or Event Evolution): Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response ( http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html )

ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA

Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org/

Weka: An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/

 
