CE417 - Data Mining Course - Fall 1386
Homework Assignment #1

All homework must be solved and written independently. If you use someone else's work, including books, papers, or any other material, you must acknowledge it and cite those resources directly at every place in your document where they are used.

Questions and Problems

(1) Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis.

(2) Discuss various techniques for handling missing values in a dataset. What are the advantages and disadvantages of each technique?

(3) Explain why the principles of "the curse of dimensionality" are especially important in understanding large data sets.

(4) Explain what we gain and what we lose with dimensionality reduction of large data sets in the preprocessing phase of data mining.

(5) Describe three classifications of dimensionality reduction techniques, and discuss the advantages and disadvantages of each class in each classification.

(6) Imagine that we have a function Dc that accepts as input only a single numerical attribute of a dataset and converts it to a nominal attribute. What problems might occur if we use Dc to discretize each of the numerical attributes of a multidimensional dataset? Demonstrate this problem through an illustrative example. Propose at least one solution to this problem, and use the example you presented to show how your solution alleviates or solves the problem.

(7) Given the one-dimensional data set X = {-5.0, 23.0, 17.6, 7.23, 1.11}, normalize the data set using:
- decimal scaling on the interval [-1, 1];
- min-max normalization on the interval [0, 1];
- min-max normalization on the interval [-1, 1];
- standard deviation normalization.
Then compare the results of these normalizations and discuss the advantages and disadvantages of the different techniques.
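The normalizations in question (7) can be checked with a short script. The following is a minimal Python sketch, assuming the usual textbook definitions: decimal scaling divides every value by 10^k for the smallest integer k that makes every |v'| < 1; min-max normalization maps [min, max] linearly onto the target interval; standard deviation normalization computes (v - mean) / sd. The helper names decimal_scaling, min_max, and z_score are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; it is a way to verify hand calculations, not a prescribed solution method.

```python
import statistics


def decimal_scaling(values):
    """Divide by 10**k for the smallest k that brings every |v'| below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values], k


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min(values), max(values)] onto [new_min, new_max].

    Assumes the values are not all equal (otherwise the span is zero).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Standard deviation normalization: (v - mean) / sd.

    Assumption: sample standard deviation (n - 1 denominator);
    use statistics.pstdev instead for the population version.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


X = [-5.0, 23.0, 17.6, 7.23, 1.11]

scaled, k = decimal_scaling(X)
unit = min_max(X)                   # interval [0, 1]
symmetric = min_max(X, -1.0, 1.0)   # interval [-1, 1]
z = z_score(X)

print(f"decimal scaling (k = {k}):", [round(v, 4) for v in scaled])
print("min-max [0, 1]: ", [round(v, 4) for v in unit])
print("min-max [-1, 1]:", [round(v, 4) for v in symmetric])
print("z-score:        ", [round(v, 4) for v in z])
```

The min_max helper takes the target interval as parameters, so a single function covers both the [0, 1] and the [-1, 1] cases.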
Submission Instructions

You should submit your solutions in PDF format to s [email protected] before the 1st of Aban. The subject of the email should conform to the following format:

[DM C][HW 1][your student number]

Your email should have one PDF attachment that contains your solutions. The file name should be your student number, and the file should include your full name. You should also deliver a hard copy of your solutions to Dr. Abolhassani in the first class session after the deadline.

Sharif University of Technology, Computer Engineering Department