Ch. Eick
Christoph F. Eick
Post Analysis Project1
Disclaimer
 The main purpose of these slides is not to criticize groups but
rather to learn how to do a better job when analyzing data and
interpreting data mining results.
 Most of you do not have much experience in these tasks
 Learning without making errors is impossible; therefore,
students can benefit from discussing errors of other students
Visualization
 Use large, high-resolution displays; some students used displays
that did not reveal much because their density was too high.
 Quality of the visualization impacts what you are able to see
 If you compare displays, put them next to each other!!
 Use the same coordinate systems/scale in displays you compare
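One way to enforce a common scale is to compute a single axis range from all the data being compared and apply it to every panel. The following is a minimal sketch in Python rather than the R used in the project; the helper name `shared_limits`, the padding value, and the glucose readings are all hypothetical.

```python
def shared_limits(*series, pad=0.05):
    """Return one (low, high) axis range that covers every series, plus a small margin."""
    values = [v for s in series for v in s]
    lo, hi = min(values), max(values)
    # Fall back to a fixed margin if all values are identical.
    margin = (hi - lo) * pad if hi > lo else 1.0
    return lo - margin, hi + margin


# Hypothetical glucose readings for the two classes, to be shown in two panels.
glucose_class0 = [85, 89, 110, 103, 126]
glucose_class1 = [148, 183, 137, 197, 125]

# Use the same limits for both panels so the displays are directly comparable.
limits = shared_limits(glucose_class0, glucose_class1)
print(limits)
```

Plotting libraries often offer this directly; in matplotlib, for example, `plt.subplots(1, 2, sharex=True, sharey=True)` places two panels next to each other with shared axes.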
Post Analysis Project1 Part2
Interpretation
 Scatterplot: the key question is whether the attribute or pair of
attributes provides some evidence for the dominance of a particular
class in a particular region of the attribute space, not whether the
attribute pair clearly separates the classes.
 Vague interpretation of quantitative results; e.g. “Att1 seems to
be more important than Att2” versus “the fact that the regression
coefficient of Att1 is 12 times as large as the regression
coefficient of Att2 suggests that attribute Att1 has a much
stronger impact on class membership”.
 Overlooking patterns in displays; e.g. regions that are dominated
by one class, or only looking for patterns in the E/W direction when
there are also clear patterns in the N/S direction.
 Not giving summaries at all or giving very “quick” summaries
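The idea of a region being "dominated by one class" can be made quantitative by counting class shares per region of the attribute space. The sketch below is in Python rather than R and rests on assumptions: the function name, the cut points, and the eight data points are hypothetical and do not come from the project data.

```python
from collections import Counter, defaultdict


def class_share_by_region(xs, ys, labels, x_cut, y_cut):
    """Split the plane into four regions at (x_cut, y_cut) and return,
    for each region, the fraction of its points that belong to each class."""
    counts = defaultdict(Counter)
    for x, y, label in zip(xs, ys, labels):
        region = ("E" if x >= x_cut else "W", "N" if y >= y_cut else "S")
        counts[region][label] += 1
    shares = {}
    for region, c in counts.items():
        total = sum(c.values())
        shares[region] = {label: n / total for label, n in c.items()}
    return shares


# Hypothetical toy data: x = glucose concentration, y = BMI, label 1 = diabetic.
glucose = [90, 95, 100, 105, 150, 160, 170, 155]
bmi = [22, 35, 24, 36, 25, 38, 23, 40]
diabetic = [0, 0, 0, 0, 0, 1, 1, 1]

shares = class_share_by_region(glucose, bmi, diabetic, x_cut=130, y_cut=30)
for region, share in sorted(shares.items()):
    print(region, share)
```

A statement such as "all points in the low-glucose regions belong to class 0" is then backed by numbers, and checking both the E/W and the N/S split guards against looking in only one direction.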
Some Displays
Discuss Scatter Plots generated by Group 8
Regression Results
Mean Value
  GlucoseConc   121.6867628
  BloodP         72.4051842
  BMI            32.4574637
  Pedigree        0.4718763
No Scaling:
  Multiple R-squared: 0.286
  Adjusted R-squared: 0.282
  Coefficients:
    (Intercept)   -0.9930791
    V2             0.0066490
    V3             0.0006933
    V6             0.0126270
    V7             0.1399540
With Scaling:
  Coefficients:
    Intercept              0.343923
    scale(GlucoseConc)     0.204457
    scale(BloodP)          0.008583
    scale(BMI)             0.086987
    scale(Pedigree)        0.046509
The fact that the R2 is 0.28 suggests that the results are suggestive but do not
indicate a strong finding about the importance of the attributes.
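To turn these coefficients into a quantitative statement, one can rank them by magnitude and report ratios. The sketch below is in Python rather than R and uses the coefficients reported above; it assumes that V2, V3, V6 and V7 correspond to GlucoseConc, BloodP, BMI and Pedigree in that order, which the slide does not state explicitly. The helper names are hypothetical.

```python
# Coefficients copied from the regression output above.
# Assumption: V2, V3, V6, V7 are GlucoseConc, BloodP, BMI, Pedigree in that order.
raw = {"GlucoseConc": 0.0066490, "BloodP": 0.0006933,
       "BMI": 0.0126270, "Pedigree": 0.1399540}
scaled = {"GlucoseConc": 0.204457, "BloodP": 0.008583,
          "BMI": 0.086987, "Pedigree": 0.046509}


def rank_by_magnitude(coefs):
    """Attribute names ordered from largest to smallest absolute coefficient."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)


def ratios_to_largest(coefs):
    """How many times larger the largest coefficient is than each coefficient."""
    largest = max(abs(v) for v in coefs.values())
    return {name: largest / abs(v) for name, v in coefs.items()}


print("raw ranking:   ", rank_by_magnitude(raw))
print("scaled ranking:", rank_by_magnitude(scaled))
for name, ratio in ratios_to_largest(scaled).items():
    print(f"largest scaled coefficient is {ratio:.2f} times that of {name}")
```

With these numbers the raw ranking puts Pedigree first, while the scaled ranking puts GlucoseConc first, with a coefficient about 2.35 times that of BMI, 4.4 times that of Pedigree and 23.8 times that of BloodP. Given the R2 of 0.28, such ratios are still suggestive rather than conclusive.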
Box Plots
Thanks to Group 10!
Post Analysis Project1 Part3
Statistical Summaries
 If there were minor disagreements, I took away 1 point
 If the results do not make any sense, I took away a lot of points (only
happened once)
 If it was not clear how the results were generated (no R-code or incomplete
R-code or lack of explanation), I also took away points.
Other
 You were also supposed to interpret the histograms, but the project
specification failed to ask you to do that! Discuss another example
in Review2
Importance of Attributes
 GC is definitely very helpful for diagnosing diabetes (scatter plot,
regression); e.g. if it is quite low, it is very unlikely that the person has
diabetes (useful for diabetes test)
 BMI (boxplot, scatterplot, regression coefficients) and to a lesser extent
Pedigree have some usefulness in diagnosing diabetes.
 No evidence has been suggested by any group that DBP has any usefulness
in diagnosing diabetes, although it has a weak positive correlation of 0.28
with BMI
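A correlation such as the 0.28 between DBP and BMI is a Pearson correlation coefficient. As a reminder of what that number measures, here is a minimal sketch in Python rather than R; the function name and the four data points are hypothetical and are not the project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy readings, only to show the call.
dbp = [60, 70, 80, 90]
bmi = [25, 35, 28, 33]
r = pearson(dbp, bmi)
print(round(r, 2))
```

The coefficient lies between -1 and 1. A correlation between two attributes says nothing by itself about either attribute's usefulness for predicting the class.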
Post Analysis Project1 Part4
Linear Regression
 If you do not scale data, interpretation of the observed coefficients is
quite complicated (see previous slide).
 Lack of quantitative assessment of results
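Why scaling helps can be seen from a short derivation. It is a sketch under the assumption that each attribute is standardized by subtracting its mean and dividing by its standard deviation, which is what R's `scale()` does by default; the symbols b_j, s_j and z_j are introduced here for illustration.

```latex
\begin{align}
y &= b_0 + \sum_j b_j x_j, \qquad z_j = \frac{x_j - \bar{x}_j}{s_j} \\
y &= \Bigl(b_0 + \sum_j b_j \bar{x}_j\Bigr) + \sum_j (b_j s_j)\, z_j
\end{align}
```

The coefficient of the scaled attribute is therefore b_j s_j, the change in the prediction per one standard deviation of attribute j. This puts all attributes on a common footing, whereas the unscaled b_j depends on the unit in which attribute j is measured.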
Star Plots
 What is, in your opinion, the usefulness of this technique?
 I myself have difficulties making sense of those, but some of you do
seem to like Star Plots much more...
Conclusion/Other Findings
 Half of the groups gave quite short conclusions, and most summaries are
somewhat vague; e.g. they do not write about
    The importance/usefulness of the attributes
    The usefulness of the employed techniques
    Knowledge about diabetes generated in Project1
    …
Project Weights Fall 2013
Project2>Project3??>Project4 Project1