Download Data Mining Assignment

Total Score 66 out of 70

Data Mining Assignment – Week 3

Score 10 out of 10
1a.) Data cleaning is a data preprocessing technique that attempts to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. Data transformation is the processing of data to transform or consolidate it into forms appropriate for data mining; it includes smoothing, aggregation, generalization, normalization and attribute construction. (Grader: Good)
1b.) Data smoothing is processing data to reduce the number of values by removing noise in the data. It is external smoothing if done before classification and internal smoothing if done during the classification procedure. (Grader: Good)
1c.) Decimal scaling accomplishes the normalization of values by moving the decimal point of the values of an attribute. Z-score normalization is the scaling of attribute data so that values fall within a small specified range based on the mean and standard deviation of values for the attribute. (Grader: Good)

Score 10 out of 10
2a.) The instance similarity was raised to 99 in order to generate 15 clusters. The clustering shows a class structure similar to the actual classes found in the data.
2b.) All classes except classes 4, 5 and 8 have a resemblance score of 1.
2c.) None of the classes experienced intermixing of instances, except 4, 5 and 6. In general, s-deciduous, n-deciduous, dark-barren, br-barren-1, br-barren-2 and urban form their own clusters. Shallow and deep water cluster together. The two agricultural classes cluster together. Marsh, turf-grass, wooded_swamp and shrub_swamp form a single cluster.

Score 10 out of 10
3. The objective was to determine how well the input attributes define the classes contained in the data. The instance similarity setting was set at 45 to produce only two clusters. Two clusters were created whose resemblance scores (0.564 and 0.607) only slightly exceeded the domain resemblance score of 0.52. The healthy and sick instances do cluster together. (Grader: OK) One cluster will contain 112 sick instances and 28 healthy instances. The second cluster will contain 137 healthy instances and 26 sick instances.

Score 10 out of 10
4. The transformed value for age = 35 to a value between 0 and 1 is 0.44. (Grader: Good)

Score 10 out of 10
5. Fitness scores (i.e., percent correctly classified, expressed as decimal values) are as follows: Pop. Element1 = 0.60, Pop. Element2 = 0.60 and Pop. Element3 = 0.80. (Grader: Very good)

Score 7 out of 10
6a. 93 (Grader: Good)
6b. 2 (Grader: 0)
6c. #colored vessels = 2 or 3 and angina = true (Grader: Good)
6d. 74.9% of individuals with #colored vessels = 0 are healthy. The hypothesis is supported by the fact that there are 134 healthy individuals with #colored vessels = 0 and 45 sick individuals with #colored vessels = 0.
6e. 83% show no symptom of angina. The hypothesis is verified by the fact that 115 of the 165 healthy individuals show no symptoms of angina and have the value normal for thal.

Score 9 out of 10
7a. 2 (Grader: Good)
7b. 2 (Grader: Zero)
7c. 0 (Grader: Good)
7d. One individual who purchased credit card insurance purchased all three promo offerings. (Grader: Good)
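
The normalization methods named in answers 1c and 4 can be illustrated with a minimal Python sketch. The function names are hypothetical, and the age range of 19 to 55 used for the min-max example is an assumption chosen because it reproduces the reported 0.44; the assignment text does not state the actual range. Note that z-score output is centered at zero with unit standard deviation rather than bounded to a fixed interval.

```python
from statistics import mean, pstdev


def decimal_scaling(values):
    """Divide every value by 10**j, with j the smallest integer
    such that the largest absolute value falls below 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mu) / sigma for v in values]


def min_max(value, lo, hi):
    """Map value linearly from the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


if __name__ == "__main__":
    print(decimal_scaling([-986, 917]))
    print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
    # Assumed attribute range (not given in the source): min age 19, max age 55.
    print(round(min_max(35, 19, 55), 2))
```

With the assumed range, min_max(35, 19, 55) evaluates to 16/36, which rounds to 0.44 as in answer 4. The same kind of arithmetic check applies to answer 6d: 134 / (134 + 45) is roughly 0.749.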
