College of Health and Human Sciences
Department of Public Health

Course Name: Data Mining in Public Health
Course Number: H 566
Course Credits: 3 (3 hours per week of lecture)

Course Catalog Description: This course presents an introduction to public health data mining techniques used as an information technology tool to extract previously unknown and potentially useful information from large databases in the biosciences.

Prerequisites: H 581 (Generalized Linear Models and Categorical Data Analysis); H 564 (recommended)

Measurable Student Learning Outcomes: Upon completion of this course the student will be able to:
1. Identify the appropriate applications of advanced methods for high-dimensional data analysis and data mining
2. Identify and differentiate amongst methods used for high-dimensional data analysis and data mining in public health applications
3. Conduct a thorough exploratory analysis of a high-dimensional data set from a biological, medical, or public health application
4. Provide clear and concise interpretations and written and oral presentations of an analysis of a high-dimensional data set arising in a biological, medical, or public health application using at least one modern technique.

Course Content: This course is designed as a survey of and introduction to high-dimensional data analysis and data mining with applications in biology, medicine, and public health. Students will be introduced to key statistical concepts and methods in high-dimensional analysis (supervised vs. unsupervised analysis, statistical decision theory, complexity, bias-variance tradeoff, model selection, constrained models, kernel methods, and tree-based methods). They will also gain experience in using modern techniques to analyze high-dimensional biology and public health data sets.

Weeks 1-2
a. Course introduction (formulation of high-dimensional data analysis problem, unsupervised vs. supervised learning)
b. Examples in medicine, biology, and public health
c. Exploratory data analysis (hierarchical clustering, clustering heatmaps)
d. Some elementary methods (K-means, PAM, K-nearest-neighbor)

Week 3: Review of Generalized Linear Models
a. Linear and advanced regression
b. Linear classification

Week 4 to start of Week 5: Theoretical and Methodological Considerations
a. Statistical decision theory, curse of dimensionality, bias-variance tradeoff
b. Model selection (AIC, BIC, cross-validation)

Weeks 5-6: Intermediate Methods
a. Basis expansion
b. Kernel methods
c. Mixture models

Weeks 7-8: Advanced Methods
a. (Generalized) Additive Models
b. Tree-based methods (CART, random forests)

Weeks 9-10: Advanced Methods
a. Support vector machines and kernel regression
b. Neural networks

Evaluation of Student Performance: The performance of the student will be assessed through at most 5 homework sets (50%), a take-home midterm examination (20%), and a final project (30%). Letter grades will be assigned according to the following breakdown:

Range      Grade     Range      Grade
94-100%    A         64-68%     C
89-93%     A-        59-63%     C-
84-88%     B+        54-58%     D+
79-83%     B         49-53%     D
74-78%     B-        44-48%     D-
69-73%     C+        < 44%      F

Learning Resources: Recommended text: T. Hastie, R. Tibshirani, J. Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

Statement Regarding Students with Disabilities: "Accommodations are collaborative efforts between students, faculty and Disability Access Services (DAS). Students with accommodations approved through DAS are responsible for contacting the faculty member in charge of the course prior to or during the first week of the term to discuss accommodations. Students who believe they are eligible for accommodations but who have not yet obtained approval through DAS should contact DAS immediately at 737-4098."

Link to Statement of Expectations for Student Conduct: http://oregonstate.edu/studentconduct/regulations/index.php#acdis

Diversity Statement: The College of Health and Human Sciences strives to create an affirming climate for all students including underrepresented and marginalized individuals and groups. Diversity encompasses differences in age, color, ethnicity, national origin, gender, physical or mental ability, religion, socioeconomic background, veteran status, sexual orientation, and marginalized groups. We believe diversity is the synergy, connection, acceptance, and mutual learning fostered by the interaction of different human characteristics.

Religious Holiday Statement: Oregon State University strives to respect all religious practices. If you have religious holidays that are in conflict with any of the requirements of this class, please see me immediately so that we can make alternative arrangements.