Data Mining for the Early Detection of Disease Outbreaks

Weng-Keen Wong, School of EECS, Oregon State University
Email: [email protected]
Joint work with the RODS Lab (University of Pittsburgh) and the AUTON Lab (Carnegie Mellon University)

Introduction

The threat of a deadly disease outbreak is very real. There are two scenarios of concern:

1. Naturally occurring outbreaks, e.g. SARS, Asian bird flu.
2. Outbreaks due to bioterrorist attacks, e.g. anthrax, smallpox.

Before public health can respond, we first need to be able to detect that an outbreak is occurring. The earlier we detect the outbreak, the more we can reduce morbidity and mortality.

Many cities throughout the US have established syndromic surveillance systems to monitor the health of the community. Syndromic surveillance systems collect and analyze health-related data that precede diagnosis.

Examples of Prediagnosis Data

Over-the-counter medication sales
School/Work absenteeism
Veterinarian data
Emergency Department records
Telephone triage calls
Lab test requests
911 Calls

Data being monitored is HIPAA compliant with personal identifying information removed.

The Syndromic Surveillance Pipeline

1. Identify useful data sources
2. Collect data
3. Analyze data

Computer Science comes in here in the form of data mining: find anomalies that correspond to disease outbreaks.

Challenges

1. Finding anomalies in rich multivariate data that includes spatial, temporal, demographic and symptomatic information.
2. Finding anomalies that are truly indicative of a disease outbreak of interest.
3. Combining information from multiple data sources, e.g. Emergency Department data and over-the-counter medication sales.
The “What’s Strange About Recent Events” (WSARE) Algorithm

Recent ED records:

Primary Key   Date       Time    Gender   Age     …
100           10/29/05   9:12    M        20-30   …
101           10/29/05   10:45   F        50-60   …
:             :          :       :        :       :

Baseline (from a model that takes temporal fluctuations and other factors into account)

Find which rules predict unusually high proportions in recent records when compared to the baseline, e.g.

90/180 records from Recent have Gender = Male AND Home Location = NW
50/200 records from Baseline have Gender = Male AND Home Location = NW

The Population-wide Anomaly Detection and Assessment (PANDA) Algorithm

Models every individual in the population in order to improve detection of an airborne release of inhalational anthrax.

[Figure: PANDA model diagram. Nodes shown once: Anthrax Release, Time Of Release, Location of Release. Nodes repeated for each person: Gender, Age Decile, Home Zip, Anthrax Infection, Other ED Disease, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Admission. Example node values shown in the figure include Female, Male, 20-30, 50-60, 15146, 15213, False, Unknown, Yesterday and never.]
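
The WSARE example above compares the proportion of recent records matching a rule (90/180) with the proportion in the baseline (50/200). The poster does not name the statistical test used to decide whether the recent proportion is "unusually high". The sketch below assumes a one-sided Fisher's exact test on the 2x2 table of counts, which is one common choice for this kind of comparison. The function name `rule_p_value` and its parameters are hypothetical and do not come from the source.

```python
from math import comb


def rule_p_value(recent_match, recent_total, base_match, base_total):
    """One-sided Fisher's exact test for a single rule (an assumed scoring).

    Returns the probability, under the null hypothesis that recent and
    baseline records match the rule at the same rate, of seeing at least
    `recent_match` matching records among the recent ones.
    """
    total = recent_total + base_total      # all records, recent + baseline
    matches = recent_match + base_match    # all records that match the rule
    upper = min(matches, recent_total)     # largest possible recent match count

    # Sum hypergeometric numerators for every outcome at least as extreme
    # as the observed one, then normalise once using exact integers.
    numerator = 0
    for k in range(recent_match, upper + 1):
        numerator += comb(matches, k) * comb(total - matches, recent_total - k)
    return numerator / comb(total, recent_total)


if __name__ == "__main__":
    # Counts from the poster's example rule:
    # Gender = Male AND Home Location = NW
    p = rule_p_value(recent_match=90, recent_total=180,
                     base_match=50, base_total=200)
    print(f"p-value for the example rule: {p:.3g}")
```

A small p-value suggests that the rule matches recent records more often than the baseline would explain. Because many candidate rules are searched, a score computed this way for the best rule would still need a correction for multiple testing before it is read as evidence of an anomaly.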