Data Mining Applications In Healthcare
TEPR 2004
May 21, 2004
V. “Juggy” Jagannathan
VP of Research
[email protected]
Introduction
Goals of today’s presentation:
• Provide an overview of the technologies that are relevant to the development and deployment of data mining solutions in healthcare
• Allow participants to evaluate where the technology is useful
What is data mining?
Divining knowledge from data
Topic Outline
Data mining
• Uses
• Algorithms
• Technology
• Applications in healthcare
Data Mining Uses
• Descriptive: understand and characterize
  > Clustering
  > Summarization
  > Association Rules
  > Sequence Discovery
• Predictive: extrapolate and forecast
  > Classification
  > Regression
  > Time-Series
Data Mining Algorithms
• Classification
  > Statistical
  > K-nearest neighbors
  > Decision trees
    ▲ ID3
    ▲ C4.5
  > Neural Networks (Self Organizing Maps)
• Clustering
  > Hierarchical
  > Partitioned
  > Genetic
• Association
  > Apriori Algorithm
  > If...Then rules
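To make the Association entries concrete, here is a minimal sketch of Apriori-style frequent itemset mining followed by if-then rule extraction. The medication baskets, the function names `apriori` and `rules`, and the thresholds are illustrative assumptions and do not come from the presentation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Candidate generation: unions of frequent (k-1)-itemsets with k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Derive 'if antecedent then consequent' rules from frequent itemsets."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((set(antecedent), set(itemset - antecedent), confidence))
    return found


# Hypothetical medication baskets, one per patient encounter.
transactions = [
    {"aspirin", "beta_blocker"},
    {"aspirin", "beta_blocker", "ace_inhibitor"},
    {"aspirin", "ace_inhibitor"},
    {"beta_blocker"},
]
frequent = apriori(transactions, min_support=0.5)
strong_rules = rules(frequent, min_confidence=1.0)
```

With these assumed baskets, the only rule at full confidence reads "if ace_inhibitor then aspirin". The pruning step is what distinguishes Apriori from brute-force enumeration: a candidate is counted only if all of its smaller subsets were already frequent.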
Technology solutions
Data Mining Infrastructure Technologies
• Database Technologies
• On-Line Analytical Processing (OLAP)
• Visualization Technologies
• Data scrubbing technologies
• Natural Language Processing (NLP)
Database Technologies
• Data warehouse vs. Data mart
• Relational technologies
  > Oracle
  > Microsoft
• XML-databases
  > Raining Data
On-Line Analytical Processing
• Analyze multi-dimensional data
• N-dimensional data cubes
• Operations
  > Roll-up
  > Drill-down
  > Slice and dice
  > Pivot
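The four operations listed above can be sketched on a tiny in-memory cube. The fact table, dimension names, and helper functions below are hypothetical and only illustrate the idea; a real OLAP server would run these against a data warehouse.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, department, admissions).
FACTS = [
    (2003, "Q1", "Cardiology", 120),
    (2003, "Q2", "Cardiology", 135),
    (2003, "Q1", "Oncology", 80),
    (2003, "Q2", "Oncology", 95),
    (2004, "Q1", "Cardiology", 140),
    (2004, "Q1", "Oncology", 90),
]
DIMENSIONS = ("year", "quarter", "department")


def aggregate(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    checks = [(DIMENSIONS.index(d), set(vals)) for d, vals in allowed.items()]
    return [row for row in facts if all(row[i] in vals for i, vals in checks)]


def pivot(facts, row_dim, col_dim):
    """Pivot: arrange totals as nested dicts, rows of `row_dim` by columns of `col_dim`."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)


per_year = aggregate(FACTS, ("year",))                # roll-up to a coarser level
per_quarter = aggregate(FACTS, ("year", "quarter"))   # drill-down to a finer level
cardiology = slice_(FACTS, "department", "Cardiology")
early_2003 = dice(FACTS, year={2003}, quarter={"Q1"})
by_department = pivot(FACTS, "department", "year")
```

Roll-up and drill-down are the same aggregation run at different levels of detail, which is why both reuse `aggregate` here.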
Visualization
• 2D/3D Charts
• Topographic displays
• Cluster displays
• Histograms
• Scatter plots
• Advanced visualization (genomic data patterns)
• http://www.ncbi.nlm.nih.gov/Tools/
Data Scrubbing
• Data cleansing
• Filling in missing data
• In healthcare, there is a strong need for de-identification to protect privacy
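As one possible reading of "filling in missing data", the sketch below replaces missing numeric values with the mean of the observed ones. Mean imputation is an assumption chosen for brevity, and the field name `systolic_bp` and the records are hypothetical; the presentation does not say which method was intended.

```python
from statistics import mean


def fill_missing(records, field):
    """Return copies of `records` with None in `field` replaced by the observed mean."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        return [dict(r) for r in records]  # nothing to impute from
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


# Hypothetical vitals with one missing reading.
vitals = [
    {"patient": "A", "systolic_bp": 120},
    {"patient": "B", "systolic_bp": None},
    {"patient": "C", "systolic_bp": 140},
]
cleaned = fill_missing(vitals, "systolic_bp")
```

The original list is left untouched, so the raw and imputed values can be compared before anything is loaded into a warehouse.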
De-Identification of Medical Records *
• Names;
• all elements of a street address, city, county, precinct, zip code, & their equivalent geocodes, except for the initial three digits of a zip code for areas that contain over 20,000 people;
• all elements of dates (except year) for dates directly related to the individual (e.g., birth date, admission/discharge dates, date of death); and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
• telephone numbers;
• fax numbers;
• e-mail addresses;
• social security numbers;
• medical record numbers;
• health plan beneficiary numbers;
• account numbers;
• certificate/license numbers;
• license plate numbers, vehicle identifiers and serial numbers;
• device identifiers and serial numbers;
• URL addresses;
• Internet Protocol (IP) address numbers;
• biometric identifiers, including finger and voice prints;
• full face photographic images and comparable images;
• any other unique identifying number except as created by IHS to re-identify information.
* Source: Policy and Procedures for De-Identification of Protected Health Information and Subsequent Re-Identification, 45 CFR 164.514(a)-(c), posted by IHS (Indian Health Service)
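A few of the rules in the list above can be sketched as a record filter. This is a simplified illustration, not a compliant implementation: the field names, the `SMALL_AREA_ZIP_PREFIXES` set, and the `deidentify` function are hypothetical, and free-text notes, images, and biometric data are not handled at all.

```python
from datetime import date

# Hypothetical field names for a flat patient record; a real schema will differ.
DIRECT_IDENTIFIERS = {
    "name", "ssn", "street_address", "city", "county", "phone", "fax",
    "email", "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
}

# Hypothetical placeholder: 3-digit zip prefixes whose areas hold
# 20,000 people or fewer. Real values would come from census data.
SMALL_AREA_ZIP_PREFIXES = {"036", "059"}


def deidentify(record):
    """Return a copy of `record` with listed identifiers removed or coarsened."""
    age = record.get("age")
    over_89 = isinstance(age, int) and age > 89
    clean = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if key == "zip":
            prefix = str(value)[:3]
            if prefix not in SMALL_AREA_ZIP_PREFIXES:
                clean[key] = prefix  # keep only the initial three digits
        elif isinstance(value, date):
            # Conservative simplification: for patients over 89 every date is
            # dropped, since even the year may indicate the age.
            if not over_89:
                clean[key] = value.year  # keep the year only
        elif key == "age":
            clean[key] = "90+" if over_89 else value
        else:
            clean[key] = value
    return clean


patient = {
    "name": "Jane Doe",
    "zip": "26505",
    "birth_date": date(1950, 3, 14),
    "age": 54,
    "diagnosis": "AMI",
}
elder = {"zip": "03601", "birth_date": date(1911, 1, 2), "age": 93}

safe_patient = deidentify(patient)
safe_elder = deidentify(elder)
```

Dropping whole fields is the simplest way to honor the list; the coarsening rules for zip codes, dates, and ages are where most of the remaining analytic value is preserved.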
Natural Language Processing
• NLP Uses
  > Translation, summarization, information extraction, document retrieval or categorization
• NLP Companies in health care
  > A-Life
  > Language and Computing
• NLP Approaches
  > Clustering, classification, linguistic analysis, knowledge-based analysis
Applications in Healthcare
• Safety and quality
• Clinical Research
• Financial
• Public Health
“To Err Is Human” IOM Report
• Characterization
  > JCAHO Core Measures
  > CMS Quality measures starter set
  > Improves patient care – reactive response
• Prediction
  > Identifying cases that can result in bad clinical outcomes and raising appropriate alarms
  > Impacts patient care – proactive response
Quality Measures – Initial Set*
Starter Set of 10 Hospital Quality Measures
Condition                                        Measure
Acute Myocardial Infarction (AMI)/Heart attack   Aspirin at arrival
                                                 Aspirin at discharge
                                                 Beta-Blocker at arrival
                                                 Beta-Blocker at discharge
                                                 ACE inhibitor for left ventricular systolic dysfunction
Heart Failure                                    Left ventricular function assessment
                                                 ACE inhibitor for left ventricular systolic dysfunction
Pneumonia                                        Initial antibiotic timing
                                                 Pneumococcal vaccination
                                                 Oxygenation assessment
*Source: http://www.cms.hhs.gov/quality/hospital/overview.pdf
Safety and Quality
• University of Mississippi Medical Center
  > Data warehouse technologies to understand medication errors (funded by AHRQ)
  > Anonymous report data collection
  > Data mining technologies
  > Use of neural networks and association rule inference
Clinical Research & Clinical Trials
• Pharmacy and medical claims data
• Drug efficacy and clinical trials, for example, how effective is a particular drug regimen
• Protein structure analysis
• Genomic data mining
• Diagnostic imaging data research
The bottom line on cost
• General utilization review: does the care provided meet accepted clinical and cost guidelines?
• Drug utilization review
• Outlier analysis: exceptions to treatment, i.e., treatments that cost more or less than normal
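Outlier analysis on treatment cost can be sketched with a robust rule: flag any case whose cost lies more than `k` median absolute deviations from the median. The rule, the cutoff `k=5.0`, the function name, and the cost figures are all illustrative assumptions, not something the presentation specifies.

```python
from statistics import median


def cost_outliers(costs, k=5.0):
    """Flag cases whose cost is more than k median absolute deviations from the median."""
    values = list(costs.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return {}  # no spread to measure against
    flagged = {}
    for case_id, cost in costs.items():
        if abs(cost - center) > k * mad:
            flagged[case_id] = "high" if cost > center else "low"
    return flagged


# Hypothetical cost per case for one treatment.
costs = {
    "case_1": 4800, "case_2": 5000, "case_3": 5200, "case_4": 5100,
    "case_5": 4900, "case_6": 12000, "case_7": 1500,
}
flagged = cost_outliers(costs)
```

A median-based rule is used here because a single very expensive case inflates a mean and standard deviation enough to hide itself in small samples. Both directions are reported, since unusually cheap cases can be as informative as unusually costly ones.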
Data mining in public health
• Syndromic surveillance
• Bio-terrorism detection
• Communicable disease reporting (Centers for Disease Control (CDC))
Example effort: AEGIS
• DAWN (Drug Abuse Warning Network)
• Food and Drug Administration (FDA): reporting of adverse drug events
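A very simple form of syndromic surveillance compares each day's visit count with a baseline from the preceding days and raises an alert on a sharp excess. The sketch below is only an illustration of that idea; it is not the method used by AEGIS or any agency named above, and the window, the multiplier `k`, and the visit counts are assumptions.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_counts, window=7, k=3.0):
    """Return indices of days whose count exceeds mean + k * stdev of the prior `window` days."""
    alerts = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        threshold = mean(baseline) + k * stdev(baseline)
        if daily_counts[day] > threshold:
            alerts.append(day)
    return alerts


# Hypothetical daily counts of emergency visits for one symptom group.
visits = [10, 12, 11, 9, 10, 11, 10, 25]
alerts = flag_unusual_days(visits)
```

Here only the final day is evaluated, because the first seven days form the baseline, and its count of 25 is well above the threshold of roughly 13.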
Conclusion
Data mining
• Uses
  > Descriptive
  > Predictive
• Algorithms
  > Classification
  > Clustering
  > Association rules
• Technology
  > Database
  > OLAP
  > Visualization
  > Scrubbing
  > NLP
• Applications in healthcare
  > Safety and Quality
  > Clinical Research
  > Financial
  > Public Health
Questions?
[email protected]