Facultatea de Științe Economice și Gestiunea Afacerilor
Str. Teodor Mihali nr. 58-60
Cluj-Napoca, RO-400951
Tel.: 0264-41.86.52-5
Fax: 0264-41.25.70
[email protected]
www.econ.ubbcluj.ro
DETAILED SYLLABUS
Methods in Data Science
1. Information about the study program
1.1 University: Babeș Bolyai
1.2 Faculty: Economic Sciences and Business Administration
1.3 Department: Business Information Systems
1.4 Field of study: Business Information Systems
1.5 Program level (bachelor or master): Master
1.6 Study program / Qualification: Business Modeling and Distributed Computing
2. Information about the subject
2.1 Subject title: Methods in Data Science
2.2 Course activities professor: Lect. Dr. Darie Moldovan
2.3 Seminar activities professor: Lect. Dr. Darie Moldovan
2.4 Year of study: I
2.5 Semester: I
2.6 Type of assessment: Summative
2.7 Subject regime: Mandatory
3. Total estimated time (teaching hours per semester)
3.1 Number of hours per week: 4
out of which: 3.2 course: 2; 3.3 seminar/laboratory: 2
3.4 Total number of hours in the curriculum: 56
out of which: 3.5 course: 28; 3.6 seminar/laboratory: 28
Time distribution (hours)
Study based on textbook, course support, references and notes: 38
Additional documentation in the library, through specialized databases and field activities: 24
Preparing seminars/laboratories, essays, portfolios and reports: 45
Tutoring: 8
Assessment (examinations): 4
Other activities: 0
3.7 Total hours for individual study: 119
3.8 Total hours per semester: 175
3.9 Number of credits: 7
NOTE: This document represents an informal translation performed by the faculty.
4. Preconditions (if necessary)
4.1 Curriculum: Not necessary
4.2 Skills: Basic programming skills, basic statistics knowledge
5. Conditions (if necessary)
5.1. For course development: Notebook computer, video projector, Internet connection
5.2. For seminar / laboratory development: Computers with Internet connection, specialized software: SAS Enterprise Miner, Weka
6. Acquired specific competences
Professional competences: Obtain key competences in data science
- Cleaning and sampling data sets
- Data management
- Exploratory data analysis
- Prediction based on statistical methods
- Communication of results
Transversal competences: Work within a team, divide tasks, and learn from the different areas connected to the problem being addressed.
7. Subject objectives (arising from the acquired specific competences)
7.1 Subject’s general objective: Students must be familiar with data science methods and work through a data science project end to end.
7.2 Specific objectives: Students have to:
- learn how to analyze a dataset
- be able to access big data
- explore data and generate hypotheses
- use specific methods such as regression and classification for prediction
- communicate the results of their research using visualization tools and summaries
8. Contents
8.1 Course
1. Introduction. Course overview. About Data Science.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 1 lecture.
2. Univariate linear regression. Applications.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 1 lecture.
3. Multivariate linear regression. Applications.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
4. Classification methods. Logistic regression. Decision Trees.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 2 lectures.
5. Neural networks.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
6. Applying learning algorithms. Data preprocessing.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
7. Results evaluation and model implementation.
   Teaching methods: Lecture, open discussion, case studies. Observations: 1 lecture.
8. Unsupervised learning. Clustering.
   Teaching methods: Lecture, open discussion, demonstration. Observations: 1 lecture.
9. Data mining applications.
   Teaching methods: Lecture, open discussion, demonstration. Observations: 1 lecture.
12. Large-scale data mining. HPC Computing.
   Teaching methods: Lecture, demonstration. Observations: 4 lectures.
References:
1. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2011.
2. Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning, Springer, 2009.
3. Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, Cambridge University Press, 2011.
4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006.
5. Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001.
6. Drew Conway, John Myles White, Machine Learning for Hackers: Case Studies and Algorithms to Get You Started, O'Reilly Media, 2012.
7. Tom Mitchell, Machine Learning, McGraw-Hill, 1997.
8. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2008.
9. J. Georges, J. Thompson, C. Wells, Applied Analytics Using SAS Enterprise Miner, Course Notes, SAS Publishing, 2010.
8.2 Seminar/laboratory
Teaching methods for every topic below: Running examples and individual exercises / homework.
- Demonstrative example case. Observations: 1 laboratory.
- Building a simple linear regression model. Observations: 1 laboratory.
- Multivariate linear regression in practice. Observations: 1 laboratory.
- Classification methods. Naïve Bayes, Decision trees, Logistic regression. Observations: 2 laboratories.
- Neural networks. Observations: 1 laboratory.
- Data Visualization tools. Observations: 1 laboratory.
- Feature selection, sampling the datasets and other preprocessing operations. Observations: 1 laboratory.
- Models comparison. Deploying the solution. Observations: 1 laboratory.
- Clustering. Observations: 1 laboratory.
- Large datasets analysis, Map Reduce tools, SAS EM HPC. Observations: 4 laboratories.
References:
1. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2011.
2. Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning, Springer, 2009.
3. Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, Cambridge University Press, 2011.
4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006.
5. Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001.
6. Drew Conway, John Myles White, Machine Learning for Hackers: Case Studies and Algorithms to Get You Started, O'Reilly Media, 2012.
7. Tom Mitchell, Machine Learning, McGraw-Hill, 1997.
8. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2008.
9. J. Georges, J. Thompson, C. Wells, Applied Analytics Using SAS Enterprise Miner, Course Notes, SAS Publishing, 2010.
9. Corroboration / validation of the subject’s content in relation to the expectations coming from representatives of the epistemic community, of the professional associations and of the representative employers in the program’s field.
- This subject is included in the certification offered by the Association of Chartered Certified Accountants (ACCA).
- The profession of data scientist has recently become very popular due to the growing volume of data available for analysis. Increasing computational power has opened a new field to statisticians and other specialists working with data: automated data analysis, which requires interdisciplinary skills in statistics, machine learning and their applications.
10. Assessment (examination)
10.4 Course
10.1 Assessment criteria: Multiple choice quiz; practical exam
10.2 Assessment methods: Multiple choice test grid and practical exam on data mining software
10.3 Weight in the final grade: 80%
10.5 Seminar/laboratory
Homework assignments, laboratory activities
10.3 Weight in the final grade: 20%
10.6 Minimum performance standard
• Minimum 50% of total points
Date of completion: 15.01.2016
Signature of the course professor: Lect. Dr. Darie Moldovan
Signature of the seminar professor: Lect. Dr. Darie Moldovan
Date of approval by the department: 21.01.2016
Head of department’s signature: Prof. habil. Dr. Gheorghe Cosmin Silaghi