Download 9781449699390_TB_ch07 - Department of Computer Science

File: chap07, Chapter 7

Multiple Choice

1. The equation d = X2 – X1 is used to find ____.
   A. Euclidean distance
   B. clustering
   C. a data mine
   D. the Pythagorean theorem
   Ans: A
   Page: 239

2. A(n) ____ is defined as the mean of a collection of data points.
   A. point collection
   B. cluster
   C. Euclid
   D. centroid
   Ans: D
   Page: 241

3. In the K-means algorithm, the number of clusters is represented by ____.
   A. c
   B. k
   C. the absolute value of X2 – X1
   D. a star
   Ans: B
   Pages: 241-242

Case Study 1

    1. def readFile(filename):
    2.     datafile = open(filename, "r")
    3.     datadict = {}
    4.
    5.     key = 0
    6.     for aline in datafile:
    7.         key = key + 1
    8.         score = int(aline)
    9.
   10.         datadict[key] = [score]
   11.
   12.     return datadict

4. Refer to the session in the accompanying case study. What happens in Line 10?
   A. The score is entered in the dictionary associated with the key.
   B. The line is read from the file.
   C. The key is computed.
   D. The file is opened.
   Ans: A
   Refer to: Case Study 1
   Pages: 242-243

5. Refer to the session in the accompanying case study. What is the purpose of the program code [score]?
   A. It restricts the dictionary score to only hold a list.
   B. It allows for multidimensional data points.
   C. It uses the absolute value of score.
   D. It places the variable score in a cluster.
   Ans: B
   Refer to: Case Study 1
   Pages: 242-243

6. What best describes the type of iteration that is shown in the code below?

       for num in [1,2,3,4,5]:
           print("hello")

   A. definite
   B. indefinite
   C. nested
   D. numeric
   Ans: A
   Pages: 243-244

7. What Python statement is used to create indefinite iteration?
   A. for
   B. while
   C. if
   D. range
   Ans: B
   Page: 244

Case Study 2

   1. total = 0
   2. anum = 1
   3. while anum <= 10:
   4.     total = total + anum
   5.     anum = anum + 1
   6. print(total)

8. Refer to the session in the accompanying case study. Which line represents the initialization of the loop?
   A. 1
   B. 2
   C. 3
   D. 5
   Ans: B
   Refer to: Case Study 2
   Page: 246

9. Refer to the session in the accompanying case study.
Which line checks the condition of the loop?
   A. 1
   B. 2
   C. 3
   D. 5
   Ans: C
   Refer to: Case Study 2
   Page: 246

10. What is the problem with the loop shown below?

       1 total = 0
       2 anum = 1
       3 while anum <= 10:
       4     total = total + anum
       5 print(total)

   A. There is no initialization statement.
   B. The condition is not checked.
   C. It is a definite loop.
   D. It is an infinite loop.
   Ans: D
   Page: 246

11. Latitude values run north–south with zero latitude located at the equator. The north pole of the globe is +90 and the south pole is ____.
   A. −180
   B. −90
   C. 0
   D. +90
   Ans: B
   Page: 254

Case Study 3

   >>> aline
   ' 3.7 2006/10/18 05:34:15 62.326 -151.224 85.9 CENTRAL ALASKA'
   >>> items = aline.split()
   >>> items
   ['3.7', '2006/10/18', '05:34:15', '62.326', '-151.224', '85.9', 'CENTRAL', 'ALASKA']
   >>> items[3]
   >>> items[6:]

12. Refer to the session in the accompanying case study. What is printed for items[3]?
   A. '3.7'
   B. '62.326'
   C. '-151.224'
   D. ['CENTRAL', 'ALASKA']
   Ans: B
   Refer to: Case Study 3
   Page: 255

13. Refer to the session in the accompanying case study. What is printed for items[6:]?
   A. '3.7'
   B. '62.326'
   C. '-151.224'
   D. ['CENTRAL', 'ALASKA']
   Ans: D
   Refer to: Case Study 3
   Page: 255

14. What method is used to set a background image for a turtle screen?
   A. bgpic
   B. Screen
   C. bg
   D. screensize
   Ans: A
   Page: 258

15. Once the turtle has been directed to the proper location, what method will plot a point using the current tail color?
   A. point
   B. draw
   C. dot
   D. color
   Ans: C
   Page: 258

True or False

16. One of the most important steps in the cluster analysis algorithm is to classify data points with regard to their similarity to other data points.
   Ans: True
   Page: 239

17. When using the K-means algorithm, points will always remain in the same cluster even after several iterations.
   Ans: False
   Page: 242

18. A for loop is used to create indefinite iteration.
   Ans: False
   Page: 244

19.
Longitude values run west–east, with the zero being the prime meridian, an imaginary line that runs north–south through Greenwich, England.
   Ans: True
   Page: 254

20. The process of “visualizing” data can be quite useful, especially if one is looking for hard-to-see relationships that may not be readily apparent from long lists of data.
   Ans: True
   Page: 256

Matching

21. Match each phrase with a definition below:
   ___ Allows loop body statements to be executed until a condition becomes false
   ___ Allows a group of statements to be repeated, once for each value in a sequence
   ___ A loop that never stops
   A. infinite loop
   B. while loop
   C. for loop
   Ans: B, C, A
   Page: 244, 243, 246

Short Answer

22. What is data mining? Provide an example of an application in which data mining would be useful.
   Ans: Data mining is the application of automated techniques that attempt to discover underlying patterns. These techniques can be applied to any number of data domains. For example, in business, data mining is often used for marketing purposes to find patterns exhibited by consumers. Once these patterns are identified, they can be used to recommend the products that a customer might purchase. In addition, there are many applications in science and medicine where finding patterns in large amounts of data is required.
   Page: 236

23. What is cluster analysis?
   Ans: Cluster analysis is a data mining technique that attempts to divide the data into meaningful groups called clusters. These clusters represent data values that show some kind of similarity to each other while exhibiting a dissimilar relationship to data values outside of the cluster.
   Page: 236

24. How is the distance between two points calculated?
   Ans: There are many ways to measure the distance between two data points. For our purposes here, we use a simple measure of distance known as Euclidean distance. Consider the two data points, A and B.
If we assume that point A has location X1 and point B has location X2, then the distance between them, d, is the simple difference between the two location values, d = X2 − X1. However, since we do not know whether this difference will be positive or negative, the absolute value should be used, giving d = |X2 − X1|.
   Page: 239

25. What is a centroid?
   Ans: A centroid is defined as the mean of a collection of data points. Each cluster will have a centroid that represents the center of the cluster. It is important to note that the centroid does not need to be an actual point in the cluster. It is simply the “point” that tends to be in the center of all others.
   Page: 241

26. What are the basic steps in the K-means algorithm?
   Ans:
   1. Decide how many clusters you would like to create, and call this number k.
   2. Randomly choose k of the data points to serve as the initial centroids for the k clusters.
   3. Repeat the following steps:
      (a) Assign each data point to a cluster corresponding to the centroid it is closest to.
      (b) Recompute the centroids for each of the k clusters.
   4. Show the clusters.
   Page: 242

27. How do you retrieve a random data value in Python?
   Ans: We use the randint function from the random module. This random number generator picks an integer in the range [a,b], including the endpoints. For example, randint(2,5) will return a random integer between 2 and 5 inclusive.
   Page: 243

28. Explain how the while loop works in Python.
   Ans: A while loop consists of a condition and a body. The condition can be any Boolean expression, that is, any expression that evaluates to True or False. The statements in the body will be executed repeatedly until the condition evaluates to False. It is important to note that if the Boolean expression is False initially, the statements will never be executed. In other words, the statements in the body will be executed zero or more times, depending on the value of the condition.
   Page: 244

29.
One of the weaknesses of the K-means cluster analysis algorithm is that the clusters can become empty. Describe how this problem occurs and its effect.
   Ans: Clusters may become empty as the iteration process continues. In the text’s implementation, once a cluster becomes empty, there is no way for it to be repopulated because it no longer has a centroid. When a cluster becomes empty, some method might be employed to create a new centroid so that data points can be added in the next iteration. Of course, it is always possible to leave the cluster empty and produce fewer clusters than originally specified.
   Page: 259

30. One of the weaknesses of the K-means cluster analysis algorithm is that the clusters can become too large. Describe how this problem occurs and a possible solution.
   Ans: Sometimes a cluster can get too large or can encompass data points that are seemingly not related. This can happen when there are data points in the data set that are clearly different from the rest (sometimes referred to as outliers). When an outlier is found, it may be possible to provide some special processing so as to create an additional cluster, or to exclude it from any cluster, thereby nullifying the impact the outlier might have on the centroid calculations.
   Page: 259
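
The answers to questions 24 through 26 and question 29 can be tied together in a short sketch. The code below is not the text's implementation; it is a minimal one-dimensional illustration under stated assumptions. The names distance, centroid, and kmeans_1d are hypothetical, the initial centroids are drawn with random.sample rather than randint, the loop runs a fixed number of passes, and an empty cluster is simply dropped, which matches the "produce fewer clusters than originally specified" option described in question 29.

```python
import random


def distance(x1, x2):
    """Euclidean distance in one dimension: d = |X2 - X1|."""
    return abs(x2 - x1)


def centroid(points):
    """Mean of a non-empty collection of one-dimensional data points."""
    return sum(points) / len(points)


def kmeans_1d(data, k, passes=10, seed=None):
    """Hypothetical helper: group one-dimensional data into at most k clusters.

    Assumptions (not from the text): a fixed number of passes, and empty
    clusters are dropped instead of being given a new centroid.
    """
    rng = random.Random(seed)
    # Step 2: randomly choose k of the data points as the initial centroids.
    centroids = rng.sample(data, k)
    clusters = []
    for _ in range(passes):
        # Step 3(a): assign each point to the cluster with the closest centroid.
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(x, centroids[i]))
            clusters[nearest].append(x)
        # An empty cluster has no mean, so it is dropped here (assumption).
        clusters = [c for c in clusters if c]
        # Step 3(b): recompute the centroid of each remaining cluster.
        centroids = [centroid(c) for c in clusters]
    return clusters, centroids


if __name__ == "__main__":
    # Step 4: show the clusters.
    groups, centers = kmeans_1d([1, 2, 3, 10, 11, 12], k=2, seed=0)
    for group, center in zip(groups, centers):
        print(center, group)
```

Calling kmeans_1d([5, 5, 5, 5], k=2) shows the empty-cluster case: both initial centroids are 5, every point lands in the first cluster, and the second cluster is dropped, so only one cluster is returned.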
