IT 241 Information Discovery and Architecture
Exam 3
December 11, 2014

Name _____________________________

This exam permits one page of handwritten notes.

1. Interaction and presentation concepts. [5 pts each=15]

   a. How do you distinguish between panning and scrolling?

   b. Consider an interactive visualization of census data (population and income levels). What would zooming in and zooming out on this visualization accomplish in each case?

   c. Consider an interactive implementation of overview+detail applied to a census visualization on a map. How could this be useful? What could be shown with this concept?

2. Order the Knowledge Discovery in Databases (KDD) steps into their proper order. Number them 1, 2, …, 7. [6 pts]

   _____ Data mining
   _____ Data transformation
   _____ Interpretation and evaluation
   _____ Action on the data
   _____ Creation of a target data set
   _____ Data preprocessing
   _____ Goal identification

3. Indicate whether each item is a data mining (DM) result or task, or simply a database query (Q) result or task. [5 pts]

   ____ Looking up a phone number in a directory.
   ____ Finding that particular names are prevalent in certain locations.
   ____ Using a search engine for web sites on the term “Data Mining”.
   ____ Grouping together similar documents according to their context.
   ____ Determining the average salaries of the departments of a company.

4. When developing a data mining model, we often split the data into two sets: training and test. How are they both used? [5 pts]

5. Describe a decision tree that results from data mining. How is it then used? [5 pts]

6. True/false. [8 pts]

   _____ Association rule generation is a supervised data mining method.
   _____ The K-means clustering algorithm is an unsupervised data mining method.
   _____ Attribute selection typically results in choosing and keeping those with a high correlation to other attributes.
   _____ A decision tree generation algorithm will use the whole data set and then check the veracity of the model with the same data set.
   _____ Neural networks are artificial intelligence algorithms that determine a classification model by learning and revising based on the training data.
   _____ Linear regression requires all attributes to be nominal or converted to nominal.
   _____ A decision tree is an unsupervised data mining method.
   _____ Data mining is limited in data set size (no big data), even with commercial tools.

7. Decision trees. [15 pts]

   a. Given the decision tree rule for the above dataset

         IF Sex=Female && MagPromo=Yes THEN WatchPromo=Yes

      determine its coverage = ___________% and its accuracy = ___________%

   b. Draw a decision tree to correspond with these three production rules. (Not all leaves are defined.)

         IF Sex=Female THEN CreditCardInsurance=No
         IF Sex=Male && WatchPromo=No THEN CreditCardInsurance=No
         IF Sex=Male && IncomeRange=30-40K THEN CreditCardInsurance=Yes

   c. In predicting MagazinePromotion, why is the entropy = 0 bits for Salary="50-60K"?

   d. In predicting WatchPromotion, the entropy for Salary="30-40K" is expressed as

         info([ ___ , ___ ]) = entropy( ___/___ , ___/___ )

      There are only 3 different numbers in these 6 blanks.

8. Association rules. [15 pts]

   a. Using the credit card data above of 15 records, identify 5 single-item sets that would be generated with a confidence threshold of 50%. (For the age attribute, use a split of age<40 and age>=40.)

         Single-item set (e.g., MagPromo=No)      Number of items
      A. ________________________________      _______________
      B. ________________________________      _______________
      C. ________________________________      _______________
      D. ________________________________      _______________
      E. ________________________________      _______________

   b. What pairings of your 5 item sets A-E, if any, also meet the 50% confidence threshold?

   c. If you had the pairing (which you may not necessarily have) of Sex=Male and LifeInsPromo=Yes, what two rules could be expressed? Then calculate their coverage as a ratio.

      i.  IF ________________________ THEN _________________________ (___ / ____)
      ii. IF ________________________ THEN _________________________ (___ / ____)

9. Given the confusion matrix of correctly classified and incorrectly classified animals, answer the following questions. [8 pts]

      Actual\Predicted   Cat   Dog   Rabbit
      Cat                 13     5        2
      Dog                  3    10        2
      Rabbit               1     3       16

   a. Total number of dogs = ____________
   b. Number of cats classified as a dog = _______
   c. Number of dogs incorrectly classified = ________
   d. Percent classified correctly for rabbits = ________

10. Clustering data mining true/false. [8 pts]

   _____ The user must pre-specify the number of clusters.
   _____ The K-means clustering algorithm can only be applied to 2 attributes at a time.
   _____ Random data points are chosen to initialize the cluster centroids.
   _____ Euclidean distances are preferred in K-means clustering when associating data points to cluster centroids.
   _____ The K-means clustering algorithm determines the final clusters in 2 iterations.
   _____ Different random number seeds are good to experiment with to start the K-means algorithm.
   _____ The Perceptron method is a form of neural network.
   _____ The Perceptron method adjusts its biases away from an instance when it discovers a misclassified instance.

11. Discuss the linkage between data visualization and data mining. How did you leverage the two topics on your project, for instance? Your essay should be at least 100 words. [10 pts]

Course feedback regarding the shared lectures with University of Applied Sciences in Germany.

I know this isn’t very anonymous, but I hope you will still give us constructive suggestions about what worked, what didn’t work, and what we might consider doing differently or better.

What was good about sharing lectures?

What didn’t work for you?

What suggestions do you have for us?

How did the collaboration with a German student or students work for you? What was good? What needs to be improved?

Thank you!
