CE417 - Data Mining Course - Fall 1386
Homework Assignment #1
All homework must be solved and written independently. If you use someone else’s work, including
books, papers, or any other material, you must acknowledge it and cite those resources directly
wherever they are used in your document.
Questions and Problems
(1) Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis.
(2) Discuss various techniques for handling missing values in a dataset. What are
the advantages and disadvantages of each technique?
(3) Explain why “the curse of dimensionality” principles are especially important
in understanding large data sets.
(4) Explain what we gain and what we lose with dimensionality reduction in large
data sets in the preprocessing phase of data mining.
(5) Describe three classifications of dimensionality reduction techniques, and discuss the advantages and disadvantages of each class in each classification.
(6) Imagine that we have a function Dc that accepts as input only a single numerical
attribute of a dataset and converts it to a nominal attribute. What problems
might occur if we use Dc to discretize each of the numerical attributes of a
multidimensional dataset? Demonstrate this problem through an illustrative
example. Propose at least one solution to this problem. Show how your solution
alleviates or solves the problem using the example you presented.
(7) Given a one-dimensional data set X = {-5.0, 23.0, 17.6, 7.23, 1.11}, normalize
the data set using
—Decimal scaling on interval [-1, 1].
—Min-max normalization on interval [0, 1].
—Min-max normalization on interval [-1, 1].
—Standard deviation normalization.
—Compare the results of previous normalizations and discuss the advantages
and disadvantages of different techniques.
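As a reference for the notation in problem (7), the normalization techniques named there can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a model solution: the function names are illustrative, decimal scaling is taken to mean dividing by 10^k for the smallest k that brings every absolute value below 1 (for X the largest absolute value is 23.0, so k = 2), and "standard deviation normalization" is taken to mean the z-score computed with the population standard deviation. If your textbook uses the sample standard deviation, replace statistics.pstdev with statistics.stdev.

```python
import statistics

X = [-5.0, 23.0, 17.6, 7.23, 1.11]


def decimal_scaling(values):
    """Divide every value by 10**k, where k is the smallest non-negative
    integer such that max(|v|) / 10**k < 1 (assumed definition)."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for constant data")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation
    (assumption: population rather than sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("z-score normalization is undefined for constant data")
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    print("decimal scaling :", decimal_scaling(X))
    print("min-max [0, 1]  :", min_max(X, 0.0, 1.0))
    print("min-max [-1, 1] :", min_max(X, -1.0, 1.0))
    print("z-score         :", z_score(X))
```

Both min-max variants send the smallest value of X to the lower bound and the largest to the upper bound, while decimal scaling and the z-score place no such constraint on the extremes. That difference is one starting point for the comparison requested in the last item.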
Submission Instructions
Submit your solutions in PDF format to s [email protected] before the 1st of Aban. The subject of the email should conform to the following format:
[DM C][HW 1][your student number]
Your email should have one PDF attachment that contains your solutions. The
name of the file should be your student number and the file should reflect your full
name. You should also deliver a hard copy of your solutions to Dr. Abolhassani in
the first session of the class after the deadline.
Sharif University of Technology, Computer Engineering Department