College of Health and Human Sciences
Department of Public Health
Course Name: Data Mining in Public Health
Course Number: H 566
Course Credits: 3 (3 hours per week of lecture)
Course Catalog Description: This course presents an introduction to public health data mining
techniques used as an information technology tool to extract previously unknown and potentially
useful information from large databases in the biosciences.
Prerequisites: H 581 (Generalized Linear Models and Categorical Data Analysis); H 564
(recommended)
Measurable Student Learning Outcomes: Upon completion of this course the student will be
able to:
1. Identify the appropriate applications of advanced methods for high-dimensional data analysis and data mining
2. Identify and differentiate amongst methods used for high-dimensional data analysis and data mining in public health applications
3. Conduct a thorough exploratory analysis of a high-dimensional data set from a biological, medical, or public health application
4. Provide clear and concise interpretations and written and oral presentations of an analysis of a high-dimensional data set arising in a biological, medical, or public health application using at least one modern technique.
Course Content: This course is designed as a survey of and introduction to high-dimensional
data analysis and data mining with applications in biology, medicine, and public health.
Students will be introduced to key statistical concepts and methods in high-dimensional analysis
(supervised vs. unsupervised analysis, statistical decision theory, complexity, bias-variance
tradeoff, model selection, constrained models, kernel methods, and tree-based methods). They
will also gain experience in using modern techniques to analyze high-dimensional biology and
public health data sets.
Weeks 1-2
a. Course introduction (formulation of high-dimensional data analysis
problem, unsupervised vs. supervised learning)
b. Examples in medicine, biology, and public health
c. Exploratory data analysis (hierarchical clustering, clustering heatmaps)
d. Some elementary methods (K-means, PAM, K-nearest-neighbor)
Week 3
Review of Generalized Linear Models
a. Linear and advanced regression
b. Linear classification
Week 4 to start of Week 5
Theoretical and Methodological Considerations
a. Statistical decision theory, curse of dimensionality, bias-variance tradeoff
b. Model selection (AIC, BIC, cross-validation)
Weeks 5-6
Intermediate Methods
a. Basis expansion
b. Kernel methods
c. Mixture models
Weeks 7-8
Advanced Methods
a. (Generalized) Additive Models
b. Tree-based methods (CART, random forests)
Weeks 9-10
Advanced Methods
a. Support vector machines and kernel regression
b. Neural networks
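Two of the listed topics fit together naturally: K-nearest-neighbor classification (Weeks 1-2) and cross-validation for model selection (Week 4). The minimal pure-Python sketch below picks the number of neighbors k by 5-fold cross-validation. It is an illustration only, not course material. The function names, the squared Euclidean distance, the synthetic two-cluster data, and the candidate values of k are all assumptions chosen for the example.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance; assumption for this sketch)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_error(X, y, k, n_folds=5, seed=0):
    """Cross-validated misclassification rate of k-NN: each fold is held out
    once and predicted from the remaining folds."""
    errors = 0
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        errors += sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in fold)
    return errors / len(X)


def choose_k(X, y, candidates, n_folds=5, seed=0):
    """Return the candidate k with the lowest CV error (smallest k on ties)."""
    scores = {k: cv_error(X, y, k, n_folds, seed) for k in candidates}
    best = min(candidates, key=lambda k: (scores[k], k))
    return best, scores


def make_two_clusters(n_per_class, seed=0):
    """Hypothetical 2-D data: class 0 centred at (0, 0), class 1 at (4, 4)."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_per_class)]
    X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


if __name__ == "__main__":
    X, y = make_two_clusters(30, seed=1)
    best, scores = choose_k(X, y, [1, 3, 5, 7])
    print("CV error by k:", scores)
    print("chosen k:", best)
```

Every candidate k reuses the same seed, so all values are scored on identical folds. Differences in error therefore reflect k rather than the particular split.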
Evaluation of Student Performance: The performance of the student will be assessed through at
most 5 homework sets (50%), a take-home midterm examination (20%), and a final project
(30%). Letter grades will be assigned according to the following breakdown:
Grade    Range
A        94-100%
A-       89-93%
B+       84-88%
B        79-83%
B-       74-78%
C+       69-73%
C        64-68%
C-       59-63%
D+       54-58%
D        49-53%
D-       44-48%
F        < 44%
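The component weights and letter-grade ranges above can be combined into one calculation. The Python sketch below makes two assumptions that the syllabus does not state. First, the homework sets count equally toward the 50%. Second, the lower end of each range acts as the cutoff, so a 93.5% falls in A-. The function names are hypothetical.

```python
# Component weights stated in the syllabus.
WEIGHTS = {"homework": 0.50, "midterm": 0.20, "project": 0.30}

# Lower bound of each letter's range, highest first. Treating each lower
# bound as a cutoff (so 93.5% -> A-) is an assumption; the syllabus lists
# whole-number ranges only.
CUTOFFS = [
    (94, "A"), (89, "A-"), (84, "B+"), (79, "B"), (74, "B-"), (69, "C+"),
    (64, "C"), (59, "C-"), (54, "D+"), (49, "D"), (44, "D-"),
]


def course_percentage(homework_scores, midterm, project):
    """Weighted course percentage from component percentages in [0, 100].

    Assumes the homework sets (at most 5) count equally toward the 50%."""
    if not 1 <= len(homework_scores) <= 5:
        raise ValueError("expected between 1 and 5 homework scores")
    homework = sum(homework_scores) / len(homework_scores)
    return (WEIGHTS["homework"] * homework
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["project"] * project)


def letter_grade(percentage):
    """Return the letter grade whose range contains the percentage."""
    for lower, letter in CUTOFFS:
        if percentage >= lower:
            return letter
    return "F"  # below 44%


if __name__ == "__main__":
    pct = course_percentage([90, 85, 95, 88, 92], midterm=80, project=94)
    print(f"{pct:.1f}% -> {letter_grade(pct)}")
```

With a homework average of 90%, a midterm of 80%, and a project of 94%, the weighted total is 45 + 16 + 28.2 = 89.2%. That falls in the A- range.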
Learning Resources:
Recommended text: T. Hastie, R. Tibshirani, J. Friedman (2001). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer.
Statement Regarding Students with Disabilities
"Accommodations are collaborative efforts between students, faculty and Disability Access
Services (DAS). Students with accommodations approved through DAS are responsible for
contacting the faculty member in charge of the course prior to or during the first week of the
term to discuss accommodations. Students who believe they are eligible for accommodations but
who have not yet obtained approval through DAS should contact DAS immediately at 737-4098."
Link to Statement of Expectations for Student Conduct
http://oregonstate.edu/studentconduct/regulations/index.php#acdis
Diversity Statement:
The College of Health and Human Sciences strives to create an affirming climate for all students
including underrepresented and marginalized individuals and groups. Diversity encompasses
differences in age, color, ethnicity, national origin, gender, physical or mental ability, religion,
socioeconomic background, veteran status, sexual orientation, and marginalized groups. We
believe diversity is the synergy, connection, acceptance, and mutual learning fostered by the
interaction of different human characteristics.
Religious Holiday Statement
Oregon State University strives to respect all religious practices. If you have religious holidays
that are in conflict with any of the requirements of this class, please see me immediately so that
we can make alternative arrangements.