IT 241
Information Discovery and Architecture Exam 3
Page 1
December 11, 2014
Name _____________________________
This exam permits one page of handwritten notes.
1. Interaction and presentation concepts.
[5 pts each=15]
a. How do you distinguish between panning and scrolling?
b. Consider an interactive visualization of census data (population and income levels). What would zooming in and zooming out on this visualization accomplish in each case?
c. Consider an interactive overview+detail implementation for a census visualization on a map. How could this be useful? What could be shown with this concept?
2. Put the Knowledge Discovery in Databases (KDD) steps in their proper order. Number them 1, 2, …, 7.
[6 pts]
_____ Data Mining
_____ Data transformation
_____ Interpretation and evaluation
_____ Action on the data
_____ Creation of a target data set
_____ Data preprocessing
_____ Goal identification
3. Indicate whether each item is a data mining (DM) result or task, or simply a database query (Q) result or task.
[5 pts]
____ Looking up a phone number in a directory
____ Finding that particular names are prevalent in certain locations.
____ Using a search engine for web sites on the term “Data Mining”
____ Grouping together similar documents according to their context.
____ Determining the average salaries of the departments of a company.
4. When developing a data mining model, we often split the data into two sets: training and test. How is each set used?
[5 pts]
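The idea of a holdout split can be sketched in a few lines of Python. This is a minimal illustration, assuming the records fit in a list; `holdout_split` is a hypothetical helper (not scikit-learn's `train_test_split`), and the 15 record IDs and the one-third test fraction are made up for the example.

```python
import random

def holdout_split(records, test_fraction, seed=42):
    """Shuffle a copy of the records, then hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

records = list(range(1, 16))  # hypothetical: 15 record IDs
train, test = holdout_split(records, test_fraction=1/3)
print(len(train), len(test))  # 10 5
```

The model would be built only from `train`; its accuracy is then measured on `test`, records the model never saw while it was being built.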
5. Describe a decision tree that results from data mining. How is it then used?
[5 pts]
6. True/false.
[8 pts]
_____ Association rule generation is a supervised data mining method.
_____ The K-means clustering algorithm is an unsupervised data mining method.
_____ Attribute selection typically results in choosing and keeping attributes that have a high correlation with other attributes.
_____ A decision tree generation algorithm uses the whole data set to build the model and then checks the model's accuracy with the same data set.
_____ Neural networks are artificial intelligence algorithms that determine a classification model by learning and revising based on the training data.
_____ Linear regression requires all attributes to be nominal or converted to nominal.
_____ A decision tree is an unsupervised data mining method.
_____ Data mining is limited in the size of data set it can handle: no big data, even with commercial tools.
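As an illustration of a neural network "learning and revising based on the training data," here is a minimal sketch of the classic single-unit perceptron learning rule. It assumes 0/1 class labels, a step activation, and a learning rate of 1; the AND data set is a hypothetical toy example, not course data.

```python
def train_perceptron(data, labels, learning_rate=1.0, epochs=20):
    """Train one perceptron with a step activation on 0/1-labeled data."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(data, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # 0 if correct, +1 or -1 if misclassified
            # Only misclassified instances (error != 0) change the weights and bias.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Hypothetical toy data: logical AND, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```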
7. Decision trees.
[15 pts]
a. Given the following decision tree rule for the dataset above:
IF Sex=Female && MagPromo=Yes
THEN WatchPromo=Yes
Determine its coverage = ___________% and its accuracy = ___________%
b. Draw a decision tree that corresponds to these three production rules. (Not all leaves are defined.)
IF Sex=Female
THEN CreditCardInsurance=No
IF Sex=Male && WatchPromo=No
THEN CreditCardInsurance=No
IF Sex=Male && IncomeRange=30-40K
THEN CreditCardInsurance=Yes
c. In predicting MagazinePromotion, why is the entropy 0 bits for Salary="50-60K"?
d. In predicting WatchPromotion, the entropy for Salary="30-40K" is expressed as
info([___, ___]) = entropy(___/___, ___/___). Only 3 different numbers appear in these 6 blanks.
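The rule measures and the info/entropy notation in this question can be sketched in Python. This is a minimal illustration under stated assumptions: textbooks define rule coverage differently, and this sketch assumes coverage = records matching the IF part / all records, and accuracy = records matching both parts / records matching the IF part. `rule_stats` is a hypothetical helper, and the four records are invented; they are not the exam's dataset.

```python
from math import log2

def rule_stats(records, precondition, conclusion):
    """Return (coverage %, accuracy %) for IF precondition THEN conclusion (assumed definitions)."""
    matches_if = [r for r in records if all(r.get(k) == v for k, v in precondition.items())]
    matches_both = [r for r in matches_if if all(r.get(k) == v for k, v in conclusion.items())]
    coverage = 100 * len(matches_if) / len(records)
    accuracy = 100 * len(matches_both) / len(matches_if) if matches_if else 0.0
    return coverage, accuracy

def entropy(*probabilities):
    """Entropy in bits of a probability distribution; terms with p = 0 contribute 0."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

def info(counts):
    """info([a, b, ...]) = entropy(a/n, b/n, ...), where n is the total count."""
    total = sum(counts)
    return entropy(*(c / total for c in counts))

# Hypothetical records for illustration only (NOT the exam's credit card data).
records = [
    {"Sex": "Female", "MagPromo": "Yes", "WatchPromo": "Yes"},
    {"Sex": "Female", "MagPromo": "Yes", "WatchPromo": "No"},
    {"Sex": "Male", "MagPromo": "Yes", "WatchPromo": "No"},
    {"Sex": "Female", "MagPromo": "No", "WatchPromo": "Yes"},
]
print(rule_stats(records, {"Sex": "Female", "MagPromo": "Yes"}, {"WatchPromo": "Yes"}))  # (50.0, 50.0)
print(info([4, 0]))  # 0.0: every instance in one class, so a pure subset
print(info([2, 2]))  # 1.0: an even two-class split
```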
8. Association Rules.
[15 pts]
a. Using the 15-record credit card data above, identify the 5 single-item sets that would be generated with a confidence threshold of 50%. (For the age attribute, use a split of age<40 and age>=40.)
Single-item set (e.g., MagPromo=No)          Number of items
A.
B.
C.
D.
E.
b. What pairings of your 5 item sets A-E, if any, also meet the 50% confidence threshold?
c. Suppose you had the pairing Sex=Male and LifeInsPromo=Yes (you may not actually have it). What two rules could be expressed? Then calculate each rule's coverage as a ratio.
i. IF ________________________ THEN _________________________ (___ / ____)
ii. IF ________________________ THEN _________________________ (___ / ____)
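The counting behind parts a to c can be sketched in Python. This is a minimal illustration, not the procedure of any particular tool. It applies the 50% threshold as the fraction of records that contain an item set (often called support or coverage); the exam's wording calls it a confidence threshold, so treat that interpretation as an assumption. The rule ratio is assumed to be (records matching both sides) / (records matching the IF side). `item_set_counts` and `rule_ratio` are hypothetical helpers, and the four records are invented.

```python
from collections import Counter
from itertools import combinations

def item_set_counts(records, size):
    """Count the records containing each item set of the given size.

    Each record is a dict of attribute -> value; an item is an (attribute, value) pair.
    """
    counts = Counter()
    for r in records:
        for combo in combinations(sorted(r.items()), size):
            counts[combo] += 1
    return counts

def rule_ratio(records, antecedent, consequent):
    """(records matching both sides, records matching the IF side) for one rule."""
    ante = [r for r in records if all(r.get(k) == v for k, v in antecedent.items())]
    both = [r for r in ante if all(r.get(k) == v for k, v in consequent.items())]
    return len(both), len(ante)

# Hypothetical records for illustration only (NOT the exam's 15-record data set).
records = [
    {"Sex": "Male", "LifeInsPromo": "Yes"},
    {"Sex": "Male", "LifeInsPromo": "No"},
    {"Sex": "Female", "LifeInsPromo": "Yes"},
    {"Sex": "Male", "LifeInsPromo": "Yes"},
]
singles = item_set_counts(records, 1)
frequent = sorted(s for s, c in singles.items() if c / len(records) >= 0.5)
print(frequent)
print(rule_ratio(records, {"Sex": "Male"}, {"LifeInsPromo": "Yes"}))  # (2, 3)
print(rule_ratio(records, {"LifeInsPromo": "Yes"}, {"Sex": "Male"}))  # (2, 3)
```

Calling `item_set_counts(records, 2)` counts the pairings in the same way.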
9. Given the following confusion matrix of correctly and incorrectly classified animals, answer these questions.
[8 pts]
Actual\Predicted    Cat    Dog    Rabbit
Cat                  13      5         2
Dog                   3     10         2
Rabbit                1      3        16
a. Total number of dogs = ____________
b. Number of cats classified as a dog = _______
c. Number of dogs incorrectly classified = ________
d. Percent classified correctly for rabbits = ________
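The row and column lookups these questions rely on can be sketched in Python, using the counts from the confusion matrix in this question. Rows are actual classes and columns are predicted classes, as the "Actual\Predicted" header indicates. The helper names are hypothetical.

```python
LABELS = ["Cat", "Dog", "Rabbit"]
# Rows are actual classes, columns are predicted classes.
MATRIX = [
    [13, 5, 2],   # actual Cat
    [3, 10, 2],   # actual Dog
    [1, 3, 16],   # actual Rabbit
]

def actual_total(label):
    """Number of instances whose actual class is `label` (sum of its row)."""
    return sum(MATRIX[LABELS.index(label)])

def count(actual, predicted):
    """Number of `actual` instances the model labeled as `predicted`."""
    return MATRIX[LABELS.index(actual)][LABELS.index(predicted)]

def misclassified(label):
    """Row total minus the diagonal (correctly classified) entry."""
    return actual_total(label) - count(label, label)

def percent_correct(label):
    """Share of a class's actual instances that were predicted correctly."""
    return 100 * count(label, label) / actual_total(label)

for label in LABELS:
    print(label, actual_total(label), misclassified(label), f"{percent_correct(label):.1f}%")
```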
10. Clustering and neural network true/false.
[8 pts]
_____ The user must pre-specify the number of clusters.
_____ The K-means clustering algorithm can only be applied to 2 attributes at a time.
_____ Random data points are chosen to initialize the cluster centroids.
_____ Euclidean distance is preferred in K-means clustering when assigning data points to cluster centroids.
_____ The K-means clustering algorithm determines the final clusters in 2 iterations.
_____ It is good practice to experiment with different random number seeds to start the K-means algorithm.
_____ The Perceptron method is a form of neural network.
_____ The Perceptron method adjusts its biases away from an instance when it misclassifies that instance.
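The K-means loop these statements describe can be sketched in plain Python. This is a minimal illustration of the standard assign-then-recompute loop, assuming numeric attributes stored as tuples of any length; the point data, the seed, and the `kmeans` helper are hypothetical, not a particular tool's implementation.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means on numeric tuples; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen data points as starting centroids
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid by Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical 2-attribute data with two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2, seed=1)
print(labels)
```

Rerunning with different `seed` values is a cheap way to check whether the final clusters depend on the starting centroids.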
11. Discuss the linkage between data visualization and data mining. For instance, how did you use the two topics in your project? Your essay should be at least 100 words.
[10 pts]
Course feedback regarding the shared lectures with the University of Applied Sciences in Germany. I know this isn't very anonymous, but I hope you will still give us constructive suggestions about what worked, what didn't work, and what we might do differently or better.
What was good about sharing lectures?
What didn’t work for you?
What suggestions do you have for us?
How did the collaboration with a German student or students work for you? What was good? What needs to be improved?
Thank you!