Group # 2
Wib Leonard
Nick Pajewski
Megan Marchini
Activity Planning Worksheet
1. Learning objectives
a. Explain how outlying observations affect numeric summary measures of
central tendency and dispersion
b. Explain why rank-based statistics are more robust to outlying observations
c. Recognize outlying observations graphically through boxplots, etc.
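As an illustration of objective (c), the short Python sketch below flags the points a boxplot would draw as outliers. It assumes the common 1.5 x IQR fence convention, which the worksheet itself does not specify, and the ages are hypothetical values invented for the example. Quartile conventions differ between textbooks and software, so the fences may vary slightly from hand calculations.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the usual boxplot convention (an assumption here,
    not something the worksheet prescribes).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages: students and siblings, plus one grandparent (82).
ages = [19, 20, 20, 21, 22, 17, 24, 15, 82]
print(boxplot_outliers(ages))
```

With these made-up ages, only the grandparent's age falls outside the fences.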
2. Context
a. Graphically represent distributions (histograms, boxplot, etc.)
b. Calculate summary measures of central tendency and dispersion (mean,
median, mode, standard deviation, interquartile range)
3. Mechanics
a. Activity assumes each student will have a calculator
b. Worksheet will be given to each group in order to guide data collection
c. Break students into groups of 5 (assuming a class of roughly 30 students)
i. Have students collect and record their ages as well as the ages of
any siblings on the attached worksheet
ii. Have students also record the age of the oldest grandparent in the
group
d. Have students compute summary measures for each of the two datasets
i. Without the grandparent's age
ii. With the grandparent's age
e. Have students put their computed statistics on the board in order to compare
results across the class
i. This comparison should allow illustrating how the effect of outliers
diminishes as the sample size increases
ii. May need to combine groups to illustrate this point
f. If necessary, use a computer-based presentation to formalize concepts
i. Consider using a dataset like the Shark dataset (Agresti page 45)
and setting up an Excel spreadsheet like in the file shark_example.xls
ii. Or, for a more visual presentation, an applet like the one at … can
be used.
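The computation in step 3(d) can be sketched in Python as follows. This is only an illustration: the ages are hypothetical, the helper name summarize is invented for the example, and the quartile method ("inclusive") is one of several conventions, so hand-computed interquartile ranges may differ slightly.

```python
import statistics

def summarize(ages):
    """Return the mean, median, sample standard deviation, and IQR."""
    q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "stdev": statistics.stdev(ages),
        "iqr": q3 - q1,
    }

# Hypothetical ages for one group of students and their siblings.
ages = [19, 20, 20, 21, 22, 17, 24, 15]
grandparent_age = 82  # hypothetical age of the oldest grandparent

without = summarize(ages)
with_gp = summarize(ages + [grandparent_age])

# Print each measure without and then with the grandparent.
for name in without:
    print(f"{name:>6}: {without[name]:6.2f} -> {with_gp[name]:6.2f}")
```

For these made-up values, the mean and standard deviation change substantially when the grandparent is added, while the median and interquartile range barely move.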
4. Variety
a. Basic calculations of summary measures
b. Thinking about distributional shapes
c. Thinking about how outliers affect summary statistics
5. Summary
a. When using only the ages of your group members and their siblings, the mean,
median, and mode should be similar. However, when adding the age of the
oldest grandparent, we would expect the mean to be greater than the
median.
b. Summary measures like the mean and sample standard deviation are more
sensitive to outlying observations than rank-based measures like the
median and interquartile range.
c. The effect of outliers diminishes as the sample size increases.
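Point (c) follows from simple arithmetic: adding one value x to a sample of n values with mean m moves the mean by (x - m) / (n + 1), so the shift shrinks as n grows. The sketch below illustrates this by pooling copies of one hypothetical group, mimicking the idea of combining groups, and adding a single hypothetical grandparent age each time. All numbers and the helper name mean_shift are invented for the example.

```python
import statistics

def mean_shift(ages, outlier):
    """Change in the sample mean caused by adding one outlying value."""
    return statistics.mean(ages + [outlier]) - statistics.mean(ages)

group = [18, 19, 20, 21, 22]  # hypothetical group, mean 20
outlier = 80                  # hypothetical grandparent age

# Pool 1, 2, and 6 copies of the group (sample sizes 5, 10, and 30).
for copies in (1, 2, 6):
    pooled = group * copies
    shift = mean_shift(pooled, outlier)
    print(f"n = {len(pooled):2d}: mean shifts by {shift:.2f}")
```

With these values the shift is 60 / (n + 1), which falls from 10 years for a single group to under 2 years for a pooled class of 30.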
6. Follow-up
a. Students will be asked a question on this topic during the next quiz or exam
b. A follow-up discussion should center on what to do with outlying
observations (exclude from analysis, provide separate analyses, etc.)
c. As a long-term extension, this activity ties into hypothesis testing in two
ways
i. First, it can illustrate the effect of outliers on parametric tests, like
the two-sample t-test, where population means are compared
ii. Second, it provides a motivation for using rank-based nonparametric
tests in situations where outliers cause violations of model assumptions
d. This activity also introduces the concept of outliers for use in the linear
regression setting (residual analysis, influential observations, etc.)
e. This can also be tied into a discussion on data transformations, such as
using a log transformation, to lessen an outlier’s effect
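The log transformation mentioned in item (e) can be illustrated with a small Python sketch. It compares how far one outlier moves the ordinary mean with how far it moves the geometric mean, which is the mean of the logged values transformed back to the original scale. The ages are hypothetical, and this is one possible way to show the idea, not a prescribed analysis.

```python
import statistics

# Hypothetical ages for one group and their siblings.
ages = [19, 20, 20, 21, 22, 17, 24, 15]
grandparent_age = 82  # hypothetical outlier

with_gp = ages + [grandparent_age]

# Shift in the ordinary (arithmetic) mean on the original scale.
arithmetic_shift = statistics.mean(with_gp) - statistics.mean(ages)

# geometric_mean is exp(mean(log(x))): the mean on the log scale,
# back-transformed to years so the two shifts are comparable.
geometric_shift = (
    statistics.geometric_mean(with_gp) - statistics.geometric_mean(ages)
)

print(f"arithmetic mean shift: {arithmetic_shift:.2f}")
print(f"geometric mean shift:  {geometric_shift:.2f}")
```

For these made-up ages, the outlier moves the log-scale summary by roughly half as much as it moves the ordinary mean.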