Lecture 4

Review
• Sections 2.1-2.4
• Descriptive Statistics
– Qualitative (Graphical)
– Quantitative (Graphical)
– Summation Notation
– Quantitative (Numerical)
• Central Measures (mean, median, mode and modal class)
• Shape of the Data
• Measures of Variability
Outlier
A data measurement which is unusually large
or small compared to the rest of the data.
Usually from:
– Measurement or recording error
– Measurement from a different population
– A rare, chance event.
Advantages/Disadvantages
Mean
• Disadvantages
– is sensitive to outliers
• Advantages
– always exists
– very common
– nice mathematical properties
Advantages/Disadvantages
Median
• Disadvantages
– does not take all data into account
• Advantages
– always exists
– easily calculated
– not affected by outliers
– nice mathematical properties
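The difference in outlier sensitivity between the mean and the median can be seen in a short Python sketch. The data values below are invented for illustration and are not from the lecture; only the standard-library `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical measurements, made up for this illustration.
scores = [2, 3, 3, 4, 5]
# The same data with one unusually large value added (an outlier).
with_outlier = scores + [50]

print(mean(scores), median(scores))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median moves to 3.5
```

One extreme value pulls the mean from 3.4 to above 11, while the median only moves from 3 to 3.5.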
Advantages/Disadvantages
Mode
• Disadvantages
– does not always exist; each data value might occur only once
– sometimes there is more than one
• Advantages
– appropriate for qualitative data
Review
A data set is skewed if one tail of the
distribution has more extreme observations
than the other.
http://www.shodor.org/interactivate/activities/SkewDistribution/
Skewed to the right: The mean is bigger than
the median.
[Figure: right-skewed distribution with the median M to the left of the mean x̄]
Skewed to the left: The mean is less than the
median.
[Figure: left-skewed distribution with the mean x̄ to the left of the median M]
When the data is symmetric, the mean and
median are equal.
[Figure: symmetric distribution with x̄ and M at the same point]
Numerical Measures of Variability
These measure the variability, spread or
relative standing of the data.
[Figure: three relative frequency histograms (vertical axis 0 to 0.5), each with the mean x̄ and median M marked together; the first two cover values 0 to 5 and the third covers 0 to 7]
– Range
– Standard Deviation
– Percentile Ranking
– Z-score
Range
The range of quantitative data is denoted R
and is given by:
R = Maximum – Minimum
In the previous examples the first two graphs
have a range of 5 and the third has a range of
7.
Disadvantages:
– Since the range uses only two values in the
sample it is very sensitive to outliers.
– Gives you no idea about how much data is in the
center of the data.
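The range formula and its sensitivity to outliers can be sketched in Python. The function name `data_range` is an illustrative choice, and the extra value 40 is a made-up outlier; the ten-value sample is the one used in the lecture's worked example.

```python
def data_range(values):
    """Return R = maximum - minimum for a list of numbers."""
    return max(values) - min(values)

# Ten-value sample from the lecture's worked example.
sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]

print(data_range(sample))         # 4
# Adding a single hypothetical outlier changes the range completely.
print(data_range(sample + [40]))  # 39
```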
What else?
We want a measure which shows how far
away most of the data points are from the
mean.
One option is to keep track of the average
distance each point is from the mean.
Mean Deviation
The Mean Deviation is a measure of dispersion
which calculates the distance between each data
point and the mean, and then finds the average of
these distances.
$$\text{Mean Deviation} = \frac{\sum |x_i - \bar{x}|}{n}, \qquad \bar{x} = \frac{\sum x_i}{n}$$
Mean Deviation
Advantages: The mean deviation takes into
account all values in the sample.
Disadvantages: The absolute value signs are
very cumbersome in mathematical equations.
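As a rough sketch, the mean deviation can be computed directly from its definition. The function name `mean_deviation` is an illustrative choice; the sample is the ten-value data set used in the lecture's worked example.

```python
def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]
print(mean_deviation(sample))  # 1.2
```

Here the mean is 3, the absolute distances add up to 12, and 12 / 10 = 1.2.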
Standard Deviation
The sample variance, denoted by s², is:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
The sample standard deviation is $s = \sqrt{s^2}$.
The sample standard deviation is much more
commonly used than the variance as a measure
of variability.
Example
Let the following be data from a sample:
2, 4, 3, 2, 5, 2, 1, 4, 5, 2.
Find:
a) The range
b) The standard deviation of this sample.
Sample: 2, 4, 3, 2, 5, 2, 1, 4, 5, 2.
a) The range
$R = 5 - 1 = 4$
b) The standard deviation of this sample.
$$\bar{x} = \frac{2 + 4 + 3 + 2 + 5 + 2 + 1 + 4 + 5 + 2}{10} = \frac{30}{10} = 3$$
xi            2   4   3   2   5   2   1   4   5   2
(xi − x̄)    -1   1   0  -1   2  -1  -2   1   2  -1
(xi − x̄)²    1   1   0   1   4   1   4   1   4   1
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1 + 1 + 0 + 1 + 4 + 1 + 4 + 1 + 4 + 1}{10 - 1} = \frac{18}{9} = 2$$
Standard Deviation:
$$s = \sqrt{s^2} = \sqrt{2} \approx 1.41$$
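The same calculation can be checked with a short Python sketch that follows the definition step by step. The function name `sample_variance` is an illustrative choice, not something from the lecture.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]
s2 = sample_variance(sample)
s = math.sqrt(s2)
print(s2, round(s, 2))  # 2.0 1.41
```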
More Standard Deviation
There is a “short cut” formula for finding the
variance and the standard deviation
$$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$
Use this to find the standard deviation of the
previous example:
xi     2   4   3   2   5   2   1   4   5   2    Σ = 30
xi²    4  16   9   4  25   4   1  16  25   4    Σ = 108
$$s^2 = \frac{108 - \frac{30^2}{10}}{10 - 1} = \frac{108 - 90}{9} = 2$$
$$s = \sqrt{s^2} = \sqrt{2} \approx 1.41$$
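A minimal Python sketch of the short cut formula, using only the two running totals Σx and Σx². The function name `shortcut_variance` is an illustrative choice.

```python
import math

def shortcut_variance(values):
    """Sample variance from the sum of values and the sum of squares."""
    n = len(values)
    sum_x = sum(values)                  # 30 for the lecture sample
    sum_x2 = sum(x * x for x in values)  # 108 for the lecture sample
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]
s2 = shortcut_variance(sample)
print(s2, round(math.sqrt(s2), 2))  # 2.0 1.41
```

The result matches the value obtained from the definition, as it should, since the two formulas are algebraically the same.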
More Standard Deviation
Like the mean, we are also interested in the
population variance (i.e. your sample is the
whole population) and the population
standard deviation.
The population variance and standard
deviation are denoted σ² and σ respectively.
The formula for population variance is
slightly different from the sample variance:
the divisor is n instead of n − 1.
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n}$$
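To see the effect of the different divisor, the sketch below treats the lecture's ten values first as a sample and then as a whole population. It uses `variance`, `pvariance`, `stdev` and `pstdev` from Python's standard `statistics` module; treating this data set as a population is an assumption made only for the comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]

# Sample formulas divide by n - 1.
print(variance(data), round(stdev(data), 2))    # 2 1.41
# Population formulas divide by n.
print(pvariance(data), round(pstdev(data), 2))  # 1.8 1.34
```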
Example - Calculator
Find the mean, median, mode, range and
standard deviation for the following sample of
data:
2.3, 2.5, 2.6, 2.7, 3.0, 3.4,
3.4, 3.5, 3.5, 3.5, 3.7, 3.8
Use your calculator
Using your Calculator
• Change calculator to statistics mode. (SD if
you have it)
• Enter a data value and then press the Σ key,
or data key.
• Keep entering data, pressing the Σ key or
data key after each value, until complete.
• To obtain the summary data, find the x̄
key for the sample mean and the s key or
σn-1 key to display the sample standard
deviation.
Example - Calculator
Answer:
Mode = 3.5
$\bar{x} \approx 3.16$
M = 3.4
$s \approx 0.51$
Range = 1.5
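For anyone without a statistics calculator, the same summary values can be reproduced with Python's standard `statistics` module. This is only a cross-check of the answers above, not part of the calculator procedure.

```python
from statistics import mean, median, mode, stdev

data = [2.3, 2.5, 2.6, 2.7, 3.0, 3.4,
        3.4, 3.5, 3.5, 3.5, 3.7, 3.8]

print(round(mean(data), 2))              # 3.16
print(median(data))                      # 3.4
print(mode(data))                        # 3.5
print(round(max(data) - min(data), 2))   # 1.5
print(round(stdev(data), 2))             # 0.51
```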
Example – Using Standard Deviation
Here are eight test scores from a previous
Stats 201 class:
35, 59, 70, 73, 75, 81, 84, 86.
The mean and standard deviation are 70.4 and
16.7, respectively.
We wish to know if any of our data points are
outliers, that is, whether they don’t fit with the
general trend of the rest of the data.
To find this we calculate the number of standard
deviations each point is from the mean.
To simplify things for now, work out which data
points are within
a) one standard deviation of the mean i.e.
$(\bar{x} - s, \bar{x} + s) = (70.4 - 16.7, 70.4 + 16.7) = (53.7, 87.1)$
59, 70, 73, 75, 81, 84, 86
b) two standard deviations of the mean i.e.
$(\bar{x} - 2s, \bar{x} + 2s) = (70.4 - 2(16.7), 70.4 + 2(16.7)) = (37.0, 103.8)$
59, 70, 73, 75, 81, 84, 86
c) three standard deviations of the mean i.e.
$(\bar{x} - 3s, \bar{x} + 3s) = (70.4 - 3(16.7), 70.4 + 3(16.7)) = (20.3, 120.5)$
35, 59, 70, 73, 75, 81, 84, 86
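The interval check can be sketched in Python. The function name `within_k_sd` is an illustrative choice; the sketch uses the unrounded mean and standard deviation from the `statistics` module, so the interval endpoints can differ slightly in the last digit from the hand calculation with 70.4 and 16.7.

```python
from statistics import mean, stdev

def within_k_sd(values, k):
    """Return the values lying strictly inside (x_bar - k*s, x_bar + k*s)."""
    x_bar = mean(values)
    s = stdev(values)
    low, high = x_bar - k * s, x_bar + k * s
    return [x for x in values if low < x < high]

scores = [35, 59, 70, 73, 75, 81, 84, 86]

for k in (1, 2, 3):
    print(k, within_k_sd(scores, k))

# Number of standard deviations the lowest score lies from the mean.
z_lowest = (35 - mean(scores)) / stdev(scores)
print(round(z_lowest, 2))  # -2.12
```

The score 35 is a little more than two standard deviations below the mean, which is why it only appears in the three standard deviation interval.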