Measures of Variation Unit Plan
I. Basic Measures of Variation
Consider these two sets:
A: {5, 5, 5, 5, 5, 5}
B: {0, 0, 0, 10, 10, 10}
They have the same mean, median and midrange, yet they are very different. Clearly something other than
Measures of Central Tendency is needed to fully describe the nature of data sets.
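As a quick check of the claim above, here is a minimal Python sketch that computes the three measures of central
tendency for sets A and B. The helper name midrange is our own choice for illustration, not something from the unit.

```python
from statistics import mean, median

A = [5, 5, 5, 5, 5, 5]
B = [0, 0, 0, 10, 10, 10]

def midrange(data):
    """Midrange: halfway between the smallest and largest values."""
    return (min(data) + max(data)) / 2

# Print mean, median and midrange for each set; they match for A and B.
for name, data in (("A", A), ("B", B)):
    print(name, mean(data), median(data), midrange(data))
```

All three measures agree for both sets even though B is far more spread out than A, which is why a separate kind of
measure is needed.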
We will use Measures of Variation to quantify the obvious differences between sets A and B. These
measures attempt to quantify the concept of "average difference from the average."
In this unit we will look at Five Basic Measures of Variation: Range, Deviation, Sum of Squares, Variance
and Standard Deviation.
We already know how to find range from when we made Frequency Tables: It is the maximum minus the
minimum.
The next Measure of Variation we will look at is deviation. Each data point in a set has its own deviation. It
is found by the following formula:
$d_x = x - \text{mean}$
But this tells us little about the variation of a set as a whole. To do that, we need to make a data table of
three columns, called a variation table. The first two are x (our data points), and d (their deviations.)
We can sum this second column as a check. The sum of the d column should always be zero:
$\sum(d) = 0$ (check)
Then, we want to square the deviations and add them up, to find the Sum of Squares, which has the
symbol $SS_x$. The squared deviations, $d^2$, make up the third column of the variation table. In formula
notation:
$SS_x = \sum(d^2)$
The reason we want to square the deviations is to get rid of the negatives (remember, the square of a
negative number is positive.) One might think simply taking the absolute value would be easier, but doing
this causes problems later on, so we square instead.
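The variation table can be sketched in a few lines of Python. This is only an illustration using set B from above;
the function name variation_table is an assumption of ours.

```python
def variation_table(data):
    """Return (rows, ss_x): rows are (x, d, d squared) tuples, ss_x is the Sum of Squares."""
    m = sum(data) / len(data)
    rows = [(x, x - m, (x - m) ** 2) for x in data]
    ss_x = sum(d2 for _, _, d2 in rows)
    return rows, ss_x

B = [0, 0, 0, 10, 10, 10]
rows, ss_x = variation_table(B)

# Print the three columns: x, d, d squared.
for x, d, d2 in rows:
    print(f"{x:>4} {d:>6.1f} {d2:>6.1f}")

# Check: the d column should sum to zero (allowing for floating-point rounding).
check = sum(d for _, d, _ in rows)
print("sum of d:", check)
print("SS_x:", ss_x)
```

For set B every deviation is either -5 or 5, so the d column sums to zero and the Sum of Squares is 6 x 25 = 150.
For set A every deviation is 0, so the Sum of Squares is 0.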
Next is the tricky part: To calculate the next Measure of Variation (MOV), variance, it matters whether you
have a population or a sample.
If you have a population, you divide Sum of Squares by N, the population size. The symbol is $\sigma^2$, "little
sigma squared." That is:
$\sigma^2 = \frac{SS_x}{N}$
If you have a sample, you divide Sum of Squares by one less than the sample size (n). The symbol is $s^2$, "s
squared." That is:
$s^2 = \frac{SS_x}{n - 1}$
The reason for the difference in these calculations is something called Degrees of Freedom. It is an
advanced topic and you don't need to worry about it for this course. However, if you're curious, to begin to
understand Degrees of Freedom, think what would happen if you tried to find the variation of a population
by taking a sample of size 1. Would your result have any meaning?
Next, we have to get rid of the squares we put into these calculations. Doing this will give us standard
deviation, which is the MOV we will be using later on in the course. Remember, square root cancels square,
so:
$\sigma = \sqrt{\sigma^2}$ (population standard deviation)
$s = \sqrt{s^2}$ (sample standard deviation)
Collect a data set from the class and find its variance and standard deviation.
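The population and sample calculations can be compared side by side in Python. The sketch below uses set B from the
start of the unit; the function names are our own, and the statistics module from the Python standard library is
used only as a cross-check.

```python
import math
import statistics

def sum_of_squares(data):
    """SS_x: the sum of the squared deviations from the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

def population_variance(data):
    """Divide SS_x by N, the population size."""
    return sum_of_squares(data) / len(data)

def sample_variance(data):
    """Divide SS_x by n - 1; a sample of size 1 would mean dividing by zero."""
    if len(data) < 2:
        raise ValueError("sample variance needs at least two data points")
    return sum_of_squares(data) / (len(data) - 1)

B = [0, 0, 0, 10, 10, 10]
pop_var = population_variance(B)   # 150 / 6
samp_var = sample_variance(B)      # 150 / 5
pop_sd = math.sqrt(pop_var)        # square root cancels the square
samp_sd = math.sqrt(samp_var)

print("population:", pop_var, pop_sd)
print("sample:    ", samp_var, samp_sd)

# Cross-check against the standard library's versions of the same formulas.
print(statistics.pvariance(B), statistics.variance(B))
```

Treating B as a population gives a variance of 25 and a standard deviation of 5. Treating it as a sample gives a
variance of 30 and a slightly larger standard deviation.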
HW: Basic Measures of Variation Topic Practice
II. Estimating the Variance of a Frequency Table
Just like we could estimate the mean of a frequency table, so we can estimate the variance and standard
deviation of one. The basic idea is the same: Let the midpoints stand in for data points. Find the estimated
mean in the same way as the last section:
$\text{mean (est.)} = \frac{\sum(xf)}{N \text{ or } n}$
Here, remember, x stands for midpoint, as we don't know the original data points!
Now, find the deviation of each class using the same formula as before (d = x - mean), except now x is the
midpoint. We can square the deviations as before, and as in the estimated mean formula, we must weight
each square by the frequency of its class. So the formula for Sum of Squares changes to:
$SS_x \text{ (est.)} = \sum(d^2 f)$
After that, variance and standard deviation can be calculated as above.
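The steps above can be sketched in Python. The frequency table here is made up purely for illustration (three
classes with midpoints 4.5, 14.5 and 24.5 and frequencies 2, 5 and 3), and the function name and its sample
parameter are our own assumptions.

```python
import math

def estimate_from_frequency_table(midpoints, frequencies, sample=True):
    """Estimate mean, SS_x, variance and standard deviation from class midpoints."""
    n = sum(frequencies)
    # Estimated mean: sum of (midpoint * frequency) divided by N or n.
    est_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    # Estimated Sum of Squares: each squared deviation weighted by its class frequency.
    ss_x = sum((x - est_mean) ** 2 * f for x, f in zip(midpoints, frequencies))
    # Divide by n - 1 for a sample, by N for a population.
    divisor = n - 1 if sample else n
    variance = ss_x / divisor
    return est_mean, ss_x, variance, math.sqrt(variance)

# Hypothetical table, invented for this example.
midpoints = [4.5, 14.5, 24.5]
frequencies = [2, 5, 3]

est_mean, ss_x, pop_var, pop_sd = estimate_from_frequency_table(
    midpoints, frequencies, sample=False
)
print(est_mean, ss_x, pop_var, pop_sd)
```

For this made-up table the estimated mean is 15.5 and the estimated Sum of Squares is 490, so treating the table as a
population gives a variance of 49 and a standard deviation of 7.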
Give an example.
A similar technique can be used for other forms of grouped data that are not frequency distributions.
HW: Estimating the Variance of a Frequency Table Topic Practice
III. Chebychev's Theorem
This is the first important Theorem we will be learning in Statistics. It states that the portion of data within k
standard deviations of the mean must be at least 1 - 1/k^2, for any k greater than 1. That is:
$P \geq 1 - \frac{1}{k^2}$
Sometimes it is not immediately apparent what the value of k is. In such times, this formula can be helpful:
$k = \frac{|\text{mean} - \text{point}|}{\text{standard deviation}}$
There are also times when P is given but k is unknown. In these cases we have to remember some Algebra
skills and solve for k.
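The three calculations in this section can be sketched as small Python functions. The function names and the example
numbers (a mean of 70, a standard deviation of 5 and a point at 80) are hypothetical, chosen only for illustration.

```python
import math

def k_from_point(mean, point, sd):
    """How many standard deviations a point lies from the mean."""
    return abs(mean - point) / sd

def min_proportion(k):
    """Chebychev's lower bound on the portion of data within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k greater than 1")
    return 1 - 1 / k ** 2

def k_from_proportion(p):
    """Solve P = 1 - 1/k^2 for k, giving k = 1 / sqrt(1 - P)."""
    if not 0 < p < 1:
        raise ValueError("P must be strictly between 0 and 1")
    return 1 / math.sqrt(1 - p)

# Hypothetical example: mean 70, standard deviation 5, point of interest 80.
k = k_from_point(70, 80, 5)
p = min_proportion(k)
print(k, p)
print(k_from_proportion(p))
```

Here the point is 2 standard deviations from the mean, so at least 75% of the data must lie between 60 and 80.
Working backwards from P = 0.75 recovers k = 2.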
Give some examples of Chebychev-style problems.
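One possible example is to check the theorem against a small data set. The sketch below uses a made-up set of eight
values, treats it as a population, and compares the actual portion of data within k standard deviations to the
Chebychev bound. The data and function names are assumptions for illustration only.

```python
import math

def population_sd(data):
    """Population standard deviation: sqrt(SS_x / N)."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

def fraction_within(data, k):
    """Portion of the data lying within k standard deviations of the mean."""
    m = sum(data) / len(data)
    sd = population_sd(data)
    inside = [x for x in data if abs(x - m) <= k * sd]
    return len(inside) / len(data)

# Hypothetical data set: mean 5, population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (1.5, 2, 3):
    bound = 1 - 1 / k ** 2
    actual = fraction_within(data, k)
    print(f"k={k}: actual {actual:.3f} >= bound {bound:.3f}")
```

For k = 1.5 the theorem guarantees at least about 55.6% of the data, and 7 of the 8 values (87.5%) actually fall in
range. The bound is a guarantee for any data set, so real data usually beat it comfortably.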
HW: Chebychev's Theorem Topic Practice
IV. Using Measures of Variation to Classify and Compare Data Sets
We can use our basic measures of variation in two ways: to quantify our notion of skew from the last unit, so
that we can determine which of two sets is more skewed, and to compare the variation of sets whose data are
measured in different units:
1.) Pearson's Index of Skew
We can now quantify the idea of skew we learned in the last unit with the following formula:
$P = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
2.) Coefficient of Variation
It is improper to compare the variation of two sets if the data were collected in different units (for
example, speeding data collected in the US vs. that in Canada: one would be in miles per hour and one in
kilometers per hour.) To get around this problem, we have Coefficient of Variation (CV). The formula is:
$CV = \frac{\text{standard deviation}}{\text{mean}}$
This measure is "unitless," so we can use it to compare the variation of sets with different units. CV
is usually expressed as a percent, so multiply the answer you get from the formula by 100.
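Both measures in this section can be sketched in Python. The unit does not say whether to use the sample or the
population standard deviation here, so this sketch assumes the sample version (statistics.stdev). The speed data
and the skewed set are made up for illustration.

```python
import statistics

def pearson_skew_index(data):
    """Pearson's Index of Skew: 3 * (mean - median) / standard deviation."""
    sd = statistics.stdev(data)  # assumption: sample standard deviation
    return 3 * (statistics.mean(data) - statistics.median(data)) / sd

def coefficient_of_variation(data):
    """CV as a percent: standard deviation / mean * 100."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical speeds in miles per hour, and the same speeds in kilometers per hour.
mph = [55, 60, 65, 70, 75]
kmh = [x * 1.609344 for x in mph]

cv_mph = coefficient_of_variation(mph)
cv_kmh = coefficient_of_variation(kmh)
print(cv_mph, cv_kmh)

# A symmetric set has index 0; a set with a long right tail has a positive index.
skewed = [1, 2, 2, 3, 12]
print(pearson_skew_index(mph), pearson_skew_index(skewed))
```

Converting miles per hour to kilometers per hour multiplies both the standard deviation and the mean by the same
factor, so the CV does not change. That is what makes it safe to compare across units.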
HW: p. 90-91 #45,50
V. Unit Review
HW: Measures of Variation Worksheets
Measures of Variation Test