Total Score 66 out of 70
Data Mining Assignment – Week 3
Score 10 out of 10
1a.) Data cleaning is a data preprocessing technique that attempts to fill in missing
values, smooth out noise while identifying outliers, and correct inconsistencies in the
data. Data transformation is the processing of data to transform or consolidate it into
forms appropriate for data mining; it includes smoothing, aggregation,
generalization, normalization and attribute construction. Good
1b.) Data smoothing is processing data to reduce the number of values by removing
noise in the data. It is external smoothing if done before classification and internal
smoothing if done during the classification procedure. Good
1c.) Decimal scaling normalizes the values of an attribute by moving the decimal
point, i.e., dividing by a power of 10 so that every absolute value falls below 1.
Z-score normalization rescales attribute data using the mean and standard deviation
of the attribute, so that the transformed values have a mean of 0 and a standard
deviation of 1. Good
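The two normalizations in 1c can be sketched as follows. This is a minimal illustration, not part of the graded answer: the function names and the sample values are assumptions made up for the example, and the population standard deviation is used for the z-score.

```python
import statistics


def decimal_scale(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # would be 0 if all values were identical
    return [(v - mean) / std for v in values]


# Hypothetical attribute values, chosen only for illustration.
sample = [19, 27, 35, 43, 55]

scaled = decimal_scale(sample)    # decimal point moved two places
standardized = z_score(sample)    # mean 0, standard deviation 1
```

After z-score normalization the values are not confined to a fixed interval; an outlier can still land several standard deviations from 0.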
Score 10 out of 10
2. a.) The instance similarity was raised to 99 in order to generate 15 clusters. The
clustering shows a class structure similar to the actual classes found in the data.
b.) All classes except classes 4, 5 and 8 have a resemblance score of 1.
c.) None of the classes experienced intermixing of instances, except 4, 5 and 6.
In general, s-deciduous, n-deciduous, dark-barren, br-barren-1, br-barren-2
and urban form their own clusters. Shallow and deep water
cluster together. The two agricultural classes cluster together. Marsh,
turf-grass, wooded_swamp as well as shrub_swamp form a single
cluster.
Score 10 out of 10
3. The objective was to determine how well the input attributes define the classes
contained in the data. The instance similarity setting was set at 45 to produce only
two clusters. Two clusters were created whose resemblance scores (0.564 and
0.607) only slightly exceeded the domain resemblance score of 0.52. The healthy
and sick instances do cluster together. OK
One cluster will contain 112 sick instances and 28 healthy instances.
The second cluster will contain 137 healthy instances and 26 sick
instances.
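The cluster counts quoted above can be turned into a rough measure of how well each cluster lines up with one class. The sketch below uses only the four counts from the grader's note; the "majority share" measure and all names are assumptions added for illustration, not something the clustering tool is known to report.

```python
# Instance counts per cluster, taken from the grader's note.
clusters = {
    "cluster 1": {"sick": 112, "healthy": 28},
    "cluster 2": {"healthy": 137, "sick": 26},
}


def majority_share(counts):
    """Return the dominant class and the fraction of the cluster it covers."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


shares = {name: majority_share(c) for name, c in clusters.items()}

# Fraction of all instances that sit in a cluster dominated by their own class.
total = sum(sum(c.values()) for c in clusters.values())
overall = sum(max(c.values()) for c in clusters.values()) / total

for name, (label, share) in shares.items():
    print(f"{name}: mostly {label} ({share:.1%})")
print(f"overall: {overall:.1%}")
```

With these counts, about 80% of the first cluster is sick and about 84% of the second is healthy.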
Score 10 out of 10
4. The transformed value for age = 35 to a value between 0 and 1 is 0.44. Good
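The answer to question 4 is a min-max normalization. The transcript does not state the minimum and maximum age in the data set, so the bounds below (19 and 55) are an assumption; they are used only because they are consistent with the reported result of 0.44.

```python
def min_max(value, lo, hi, new_lo=0.0, new_hi=1.0):
    """Linearly map value from [lo, hi] onto [new_lo, new_hi]."""
    return (value - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


# ASSUMED bounds: the transcript does not give the age range of the data set.
AGE_MIN, AGE_MAX = 19, 55

scaled_age = min_max(35, AGE_MIN, AGE_MAX)
print(round(scaled_age, 2))  # 0.44
```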
Score 10 out of 10
5. Fitness scores (i.e., the fraction of instances correctly classified, expressed as
decimal values) are as follows: Pop. Element1 = 0.60, Pop. Element2 = 0.60 and
Pop. Element3 = 0.80. Very good
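A fitness score of this kind is simply correct classifications divided by the number of test instances. The sketch below shows the calculation; the labels and predictions are hypothetical, since the transcript does not list the instances or the population elements themselves.

```python
def fitness(predicted, actual):
    """Fraction of instances whose predicted class matches the actual class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical class labels and predictions, for illustration only.
actual = ["yes", "no", "yes", "yes", "no"]
element_a = ["yes", "no", "no", "yes", "yes"]   # 3 of 5 correct
element_b = ["yes", "no", "yes", "yes", "yes"]  # 4 of 5 correct

score_a = fitness(element_a, actual)
score_b = fitness(element_b, actual)
print(f"{score_a:.2f} {score_b:.2f}")  # 0.60 0.80
```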
Score 7 out of 10
6. a. 93 Good
b. 2 0
c. #colored vessels = 2 or 3 and angina = true Good
d. 74.9% of individuals with #colored vessels = 0 are healthy. The hypothesis is
supported by the fact that there are 134 healthy individuals with
#colored vessels = 0 and 45 sick individuals with #colored vessels = 0.
e. 83% show no symptom of angina. The hypothesis is verified by
the fact that 115 of the 165 healthy individuals show no symptoms of
angina and have the value normal for thal.
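The 74.9% figure in part d follows directly from the two counts the grader quotes. A minimal check, with variable names chosen here for illustration:

```python
# Counts for #colored vessels = 0, as quoted in the grader's note for part d.
healthy_zero_vessels = 134
sick_zero_vessels = 45

share_healthy = healthy_zero_vessels / (healthy_zero_vessels + sick_zero_vessels)
print(f"{share_healthy:.1%}")  # 74.9%
```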
Score 9 out of 10
7. a. 2 Good
b. 2 Zero
c. 0 Good
d. One individual who purchased credit card insurance purchased all three promo
offerings. Good