Download 9781449699390_TB_ch07 - Department of Computer Science

File: chap07, Chapter 7
Multiple Choice
1. The equation d = X2 – X1 is used to find ____.
A. Euclidean distance
B. clustering
C. a data mine
D. the Pythagorean theorem
Ans: A
Page: 239
2. A(n) ____ is defined as the mean of a collection of data points.
A. point collection
B. cluster
C. Euclid
D. centroid
Ans: D
Page: 241
3. In the K-means algorithm, the number of clusters is represented by ____.
A. c
B. k
C. the absolute value of X2 – X1
D. a star
Ans: B
Pages: 241-242
Case Study 1
 1. def readFile(filename):
 2.     datafile = open(filename, "r")
 3.     datadict = {}
 4.
 5.     key = 0
 6.     for aline in datafile:
 7.         key = key + 1
 8.         score = int(aline)
 9.
10.         datadict[key] = [score]
11.
12.     return datadict
4. Refer to the session in the accompanying case study. What happens in Line 10?
A. The score is entered in the dictionary associated with the key.
B. The line is read from the file.
C. The key is computed.
D. The file is opened.
Ans: A Refer to: Case Study 1
Pages: 242-243
5. Refer to the session in the accompanying case study. What is the purpose of the
program code [ score ]?
A. It restricts the dictionary score to only hold a list.
B. It allows for multidimensional data points.
C. It uses the absolute value of score.
D. It places the variable score in a cluster.
Ans: B Refer to: Case Study 1
Pages: 242-243
6. What best describes the type of iteration that is shown in the code below?
for num in [1,2,3,4,5]:
    print("hello")
A. definite
B. indefinite
C. nested
D. numeric
Ans: A
Pages: 243-244
7. What Python statement is used to create indefinite iteration?
A. for
B. while
C. if
D. range
Ans: B
Page: 244
Case Study 2
1. total = 0
2. anum = 1
3. while anum <= 10:
4.     total = total + anum
5.     anum = anum + 1
6. print(total)
8. Refer to the session in the accompanying case study. Which line represents the
initialization of the loop?
A. 1
B. 2
C. 3
D. 5
Ans: B Refer to: Case Study 2
Page: 246
9. Refer to the session in the accompanying case study. Which line checks the condition
of the loop?
A. 1
B. 2
C. 3
D. 5
Ans: C Refer to: Case Study 2
Page: 246
10. What is the problem with the loop shown below?
total = 0
anum = 1
while anum <= 10:
    total = total + anum
print(total)
A. There is no initialization statement.
B. The condition is not checked.
C. It is a definite loop.
D. It is an infinite loop.
Ans: D
Page: 246
11. Latitude values run north–south with zero latitude located at the equator. The north
pole of the globe is +90 and the south pole is ____.
A. −180
B. −90
C. 0
D. +90
Ans: B
Page: 254
Case Study 3
>>> aline
' 3.7 2006/10/18 05:34:15 62.326 -151.224 85.9 CENTRAL ALASKA'
>>> items = aline.split()
>>> items
['3.7', '2006/10/18', '05:34:15', '62.326', '-151.224', '85.9', 'CENTRAL', 'ALASKA']
>>> items[3]
>>> items[6:]
12. Refer to the session in the accompanying case study. What is printed for
items[3]?
A. '3.7'
B. '62.326'
C. '-151.224'
D. ['CENTRAL', 'ALASKA']
Ans: B Refer to: Case Study 3
Page: 255
13. Refer to the session in the accompanying case study. What is printed for
items[6:]?
A. '3.7'
B. '62.326'
C. '-151.224'
D. ['CENTRAL', 'ALASKA']
Ans: D Refer to: Case Study 3
Page: 255
14. What method is used to set a background image for a turtle screen?
A. bgpic
B. Screen
C. bg
D. screensize
Ans: A
Page: 258
15. Once the turtle has been directed to the proper location, what method will plot a
point using the current tail color?
A. point
B. draw
C. dot
D. color
Ans: C
Page: 258
True or False
16. One of the most important steps in the cluster analysis algorithm is to classify data
points with regard to their similarity to other data points.
Ans: True
Page: 239
17. When using the K-means algorithm, points will always remain in the same cluster
even after several iterations.
Ans: False
Page: 242
18. A for loop is used to create indefinite iteration.
Ans: False
Page: 244
19. Longitude values run west–east, with the zero being the prime meridian, an
imaginary line that runs north–south through Greenwich, England.
Ans: True
Page: 254
20. The process of “visualizing” data can be quite useful, especially if one is looking for
hard-to-see relationships that may not be readily apparent from long lists of data.
Ans: True
Page: 256
Matching
21. Match each phrase with a definition below:
___ Allows loop body statements to be executed until a condition becomes false
___ Allows a group of statements to be repeated, once for each value in a sequence
___ A loop that never stops
A. infinite loop
B. while loop
C. for loop
Ans: B, C, A
Pages: 244, 243, 246
Short Answer
22. What is data mining? Provide an example of an application in which data mining
would be useful.
Ans: Data mining is the application of automated techniques that attempt to discover
underlying patterns. These techniques can be applied to any number of data domains. For
example, in business, data mining is often used for marketing purposes to find patterns
exhibited by consumers. Once these patterns are identified, they can be used to
recommend the products that a customer might purchase. In addition, there are many
applications in science and medicine where finding patterns in large amounts of data is
required.
Page: 236
23. What is cluster analysis?
Ans: Cluster analysis is a data mining technique that attempts to divide the data into
meaningful groups called clusters. These clusters represent data values that show some
kind of similarity to each other while exhibiting a dissimilar relationship to data values
outside of the cluster.
Page: 236
24. How is the distance between two points calculated?
Ans: There are many ways to measure the distance between two data points. For our
purposes here, we use a simple measure of distance known as Euclidean distance.
Consider the two data points, A and B. If we assume that point A has location X1 and
point B has location X2, then the distance between these, d, is the simple difference
between the two location values, d = X2 − X1. However, since we do not know whether
this difference will be positive or negative, the absolute value should be used, that is,
d = |X2 − X1|.
Page: 239
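The distance calculation described in the answer above can be sketched in Python. This is only an illustration: the function names euclid_1d and euclid are hypothetical, and the multidimensional version uses the standard Euclidean formula, which this answer does not spell out.

```python
import math

def euclid_1d(x1, x2):
    """Distance between two one-dimensional points."""
    # abs() drops the sign, so the order of the two points does not matter.
    return abs(x2 - x1)

def euclid(p1, p2):
    """Distance between two points stored as equal-length lists."""
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

print(euclid_1d(3, 10))        # 7
print(euclid_1d(10, 3))        # 7
print(euclid([0, 0], [3, 4]))  # 5.0
```

For a single dimension the two functions agree, since the square root of a squared difference is its absolute value.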
25. What is a centroid?
Ans: A centroid is defined as the mean of a collection of data points. Each cluster will
have a centroid that represents the center of the cluster. It is important to note that the
centroid does not need to be an actual point in the cluster. It is simply the “point” that
tends to be in the center of all others.
Page: 241
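As a small illustration of the definition above, the following sketch computes the centroid of a one-dimensional cluster. The function name centroid and the sample values are assumptions made for this example.

```python
def centroid(points):
    """Return the mean of a non-empty list of one-dimensional points."""
    return sum(points) / len(points)

cluster = [1, 2, 9]
print(centroid(cluster))  # 4.0
```

The result, 4.0, is not one of the values in the cluster, which shows that a centroid need not be an actual data point.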
26. What are the basic steps in the K-means algorithm?
Ans:
1. Decide how many clusters you would like to create, and call this number k.
2. Randomly choose k of the data points to serve as the initial centroids for the k clusters.
3. Repeat the following steps:
   (a) Assign each data point to a cluster corresponding to the centroid it is closest to.
   (b) Recompute the centroids for each of the k clusters.
4. Show the clusters.
Page: 242
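The steps listed above can be sketched for one-dimensional data as follows. This is a minimal sketch under stated assumptions, not the text's implementation: the function name kmeans_1d, the fixed number of iterations, and the seed parameter are all choices made for this example, and an empty cluster simply keeps its previous centroid.

```python
import random

def kmeans_1d(points, k, iterations=10, seed=None):
    """Cluster one-dimensional points into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 2: randomly choose k of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    # Step 3: repeat a fixed number of times (an assumption of this sketch).
    for _ in range(iterations):
        # (a) Assign each point to the cluster whose centroid is closest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # (b) Recompute each centroid as the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid here
                centroids[i] = sum(members) / len(members)
    return centroids, clusters

data = [1, 2, 3, 20, 21, 22]
centroids, clusters = kmeans_1d(data, k=2, seed=0)  # Step 1: k = 2
# Step 4: show the clusters.
for center, members in zip(centroids, clusters):
    print(center, members)
```

The order in which the two clusters are printed depends on which data points happen to be picked as initial centroids.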
27. How do you retrieve a random data value in Python?
Ans: Use the randint function from the random module. This random number generator
picks an integer in the range [a,b], including the endpoints. For example,
randint(2,5) returns a random integer between 2 and 5 inclusive. The integer can then
be used as a key to select a data value.
Page: 243
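A short sketch of how randint might be used to pick a data value. The dictionary contents here are made up for illustration; the keys run from 1 to n, as they would in a dictionary built by counting lines of a file.

```python
import random

# Hypothetical data: keys 1..5, each holding a one-element list.
datadict = {1: [72], 2: [85], 3: [91], 4: [60], 5: [78]}

# randint(a, b) includes both endpoints, so every key from 1 to
# len(datadict) can be chosen.
key = random.randint(1, len(datadict))
print(key, datadict[key])
```

Because both endpoints are included, using len(datadict) as the upper bound is correct for keys that start at 1. For a list indexed from 0, the bounds would instead be 0 and len(alist) - 1.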
28. Explain how the while loop works in Python.
Ans: A while loop consists of a condition and a body. The condition can be any Boolean
expression, that is, any expression that evaluates to True or False. The condition is
checked before each pass, and the statements in the body are executed repeatedly
until the condition evaluates to False. It is important to note that if the Boolean
expression is False initially, the statements will never be executed. In other words, the
statements in the body will be executed zero or more times, depending on the value of the
condition.
Page: 244
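The "zero or more times" behavior can be seen in a small sketch. The function name count_up is hypothetical; it counts how many times the loop body runs.

```python
def count_up(start, limit):
    """Return how many times the while loop body executes."""
    runs = 0
    anum = start
    while anum <= limit:   # condition checked before every pass
        runs = runs + 1
        anum = anum + 1    # without this update the loop would never end
    return runs

print(count_up(1, 10))   # 10
print(count_up(11, 10))  # 0, the condition is False on the first check
```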
29. One of the weaknesses of the K-means cluster analysis algorithm is that the clusters
can become empty. Describe how this problem occurs and its effect.
Ans: Clusters may become empty as the iteration process continues. In the text’s
implementation, once a cluster becomes empty, there is no way for it to be repopulated
because it no longer has a centroid. When a cluster becomes empty, some method might
be employed to create a new centroid so that data points can be added in the next
iteration. Of course, it is always possible to leave the cluster empty and produce fewer
clusters than originally specified.
Page: 259
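One possible way to create a new centroid for an empty cluster is to pick a random data point. The sketch below is only one such method and is not taken from the text; the function name reseed_empty and the sample values are assumptions.

```python
import random

def reseed_empty(clusters, centroids, points):
    """Give each empty cluster a new centroid drawn from the data points."""
    for i, members in enumerate(clusters):
        if not members:
            # Assumed strategy: restart the empty cluster at a random point.
            centroids[i] = random.choice(points)
    return centroids

points = [1, 2, 3, 20, 21, 22]
clusters = [[1, 2, 3, 20, 21, 22], []]   # the second cluster is empty
centroids = [11.5, 100]                  # 100 is far from every point
print(reseed_empty(clusters, centroids, points))
```

On the next iteration, points closest to the new centroid can be assigned to the formerly empty cluster.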
30. One of the weaknesses of the K-means cluster analysis algorithm is that the clusters
can become too large. Describe how this problem occurs and a possible solution.
Ans: Sometimes a cluster can get too large or can encompass data points that are
seemingly not related. This can happen when there are data points in the data set that are
clearly different from the rest (sometimes referred to as outliers). When an outlier is
found, it may be possible to provide some special processing so as to create an additional
cluster, or to exclude it from any cluster, thereby nullifying the impact the outlier might
have on the centroid calculations.
Page: 259
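The effect of an outlier on a centroid, and one way to exclude it, can be sketched as follows. The distance threshold max_distance and the function name split_outliers are assumptions for this example; the text does not prescribe a specific rule for detecting outliers.

```python
def split_outliers(members, center, max_distance):
    """Separate points that lie farther than max_distance from the center."""
    kept = [p for p in members if abs(p - center) <= max_distance]
    outliers = [p for p in members if abs(p - center) > max_distance]
    return kept, outliers

members = [1, 2, 3, 50]
center = sum(members) / len(members)
print(center)                         # 14.0, pulled toward the outlier

kept, outliers = split_outliers(members, center, max_distance=20)
print(kept, outliers)                 # [1, 2, 3] [50]
print(sum(kept) / len(kept))          # 2.0, centroid without the outlier
```

The excluded point could be left out of every cluster or placed in an additional cluster of its own.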