CH 4 Summary Statistics: Measures of Location and Dispersion
4.1 Summation Notation
The sum of values $x_1 + x_2 + \cdots + x_n$ can be denoted as $\sum_{i=1}^{n} x_i$.
Example
Select 4 students and ask “how many brothers and sisters do you have?”
Data: 2,3,1,3
The data values are the x values:
$x_1 = 2$
$x_2 = 3$
$x_3 = 1$
$x_4 = 3$
Therefore,
$\sum_{i=1}^{4} x_i = 2 + 3 + 1 + 3 = 9$
Or, more briefly, we can write
$\sum x = 9$
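The summation sign is a running total over the indices i = 1 to n. The following is a minimal Python sketch using the siblings data; the variable names are illustrative only.

```python
# Siblings data from the example: x1 = 2, x2 = 3, x3 = 1, x4 = 3
x = [2, 3, 1, 3]

# Sum of x_i for i = 1..n, written as an explicit loop over the indices
total = 0
for i in range(len(x)):   # Python counts from 0, so x[0] plays the role of x1
    total += x[i]

print(total)              # 9
print(sum(x) == total)    # True: the built-in sum gives the same total
```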
4.2 The Mean, Median, and the Mode
Measures of Central Tendency
Describe an average (typical) value
Sample mean (simple average):
$\bar{X} = \frac{\sum X}{n}$
where n is the sample size
Example: number of siblings
Data: 2, 3, 1, 3
$\bar{X} = \frac{2 + 3 + 1 + 3}{4} = 2.25$
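Suppose we had selected a 5th person for our sample who had 10 siblings.
New Data: 2, 3, 1, 3, 10
$\bar{X} = \frac{2 + 3 + 1 + 3 + 10}{5} = 3.8$
Note: $\bar{X}$ is not necessarily a possible value (no one has 3.8 siblings), and it is sensitive to extreme scores: one extreme value raised the mean from 2.25 to 3.8.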
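One way to see the mean's sensitivity to extreme scores is to compute it before and after adding the 10. This is a small Python sketch; `sample_mean` is a hypothetical helper name, not a library function.

```python
def sample_mean(data):
    """Simple average: the sum of the values divided by the sample size n."""
    return sum(data) / len(data)


siblings = [2, 3, 1, 3]
print(sample_mean(siblings))          # 2.25
print(sample_mean(siblings + [10]))   # 3.8
```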
Sample median, $\tilde{X}$ (middle score):
Rank the data from smallest to largest.
If n is odd, the median is the middle score.
If n is even, the median is the average of the two middle scores.
Half of the data fall above the median and half below.
Example (number of siblings)
Data: 2, 3, 1, 3
Ranked: 1, 2, 3, 3
$\tilde{X} = \frac{2 + 3}{2} = 2.5$
New Data: 2, 3, 1, 3, 10
Ranked: 1, 2, 3, 3, 10
$\tilde{X} = 3$
Note: $\tilde{X}$ is not sensitive to extreme scores.
Therefore, the median is a better measure of central tendency if extreme scores exist.
If extreme scores are unlikely, the mean varies less from sample to sample than the
median and is a better measure.
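The ranking rule for the median translates directly into code. The sketch below (the helper name `sample_median` is hypothetical) also shows that replacing the 10 with an even more extreme value does not move the median at all.

```python
def sample_median(data):
    """Middle score of the ranked data; average of the two middle scores if n is even."""
    ranked = sorted(data)                # rank from smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                       # odd n: the single middle score
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average the two middle scores


print(sample_median([2, 3, 1, 3]))         # 2.5
print(sample_median([2, 3, 1, 3, 10]))     # 3
print(sample_median([2, 3, 1, 3, 1000]))   # 3: the extreme score has no effect
```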
Suppose we were studying the cost of a house or income. Which measure of central tendency should be used? What if Bill Gates ends up in our sample?
Sample mode: the most frequent score
Example
Number of brothers and sisters for 4 people:
2, 3, 1, 3
Mode = 3
For 5 people:
2, 3, 1, 3, 10
Mode = 3
Note: the mode does not always exist, and there can be more than one.
The mode is unstable (what happens if one of the 3s is changed in our example?).
The mode can be used with qualitative data.
Example
"Average" hair color
Data: black, brown, black, red, blonde, brown, brown
Mode = brown (it occurs 3 times)
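Counting frequencies handles both numeric and qualitative data, and returning a list allows for more than one mode. The notes do not say exactly when the mode "does not exist"; the sketch below assumes one common convention (if every distinct value is tied, report no mode). `sample_modes` is a hypothetical helper; `collections.Counter` is from the Python standard library.

```python
from collections import Counter


def sample_modes(data):
    """Return every most-frequent value (there may be several)."""
    counts = Counter(data)
    top = max(counts.values())
    modes = [value for value, c in counts.items() if c == top]
    # Assumed convention: if every distinct value is tied, there is no mode.
    if len(modes) == len(counts) and len(counts) > 1:
        return []
    return modes


print(sample_modes([2, 3, 1, 3]))        # [3]
print(sample_modes([2, 3, 1, 3, 10]))    # [3]
print(sample_modes([2, 4, 1, 3]))        # []: changing one 3 removes the mode
hair = ["black", "brown", "black", "red", "blonde", "brown", "brown"]
print(sample_modes(hair))                # ['brown']
```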
Midrange:
$\text{Midrange} = \frac{\text{Low} + \text{High}}{2}$
Example (number of siblings)
Data: 2, 3, 1, 3
$\text{Midrange} = \frac{1 + 3}{2} = 2$
New Data: 2, 3, 1, 3, 10
$\text{Midrange} = \frac{1 + 10}{2} = 5.5$
Note: the midrange is totally dependent on the extreme scores.
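Because the midrange uses only the lowest and highest scores, one new extreme value changes it completely. A minimal sketch (`midrange` is a hypothetical helper name):

```python
def midrange(data):
    """Average of the lowest and highest scores; ignores everything in between."""
    return (min(data) + max(data)) / 2


print(midrange([2, 3, 1, 3]))       # 2.0
print(midrange([2, 3, 1, 3, 10]))   # 5.5
```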
4.3 Quartiles and Percentiles
Quartiles divide the data into four equally sized parts.
First Quartile, Q1: 25% of the data lies below Q1, 75% of the data lies above Q1
Second Quartile (median), Q2: 50% of the data lies below Q2, 50% of data lies above Q2
Third Quartile, Q3: 75% of the data lies below Q3, 25% of the data lies above Q3
Procedure to Compute Quartiles
1) Order the data from smallest to largest.
2) Find the median. This is the second quartile, Q2.
3) Q1 is the median of the lower half of the data; that is, it is the median of the data falling below Q2 (not including Q2).
4) Q3 is the median of the upper half of the data, found the same way (the data falling above Q2, not including Q2).
Note: there are a few different ways to calculate the quartiles. I will accept any valid
method.
Interquartile range (IQR) = Q3 - Q1, which spans the middle 50% of the data
Example
Amount of money in pockets of the members of a meeting
Students   Faculty
1          10
3          15
8          15
5          43
6          28
5          20
10         25
0          73
0          31
7          24
5          28
Stem-and-leaf displays (stem = tens digit, leaf = units digit)
Students           Faculty
0 | 0013555678     0 |
1 | 0              1 | 055
2 |                2 | 04588
3 |                3 | 1
4 |                4 | 3
5 |                5 |
6 |                6 |
7 |                7 | 3
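A stem-and-leaf display can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below assumes non-negative whole numbers, as in the pocket-money data; `stem_and_leaf` and its `stems` parameter are hypothetical names, with `stems` letting two displays share the same scale.

```python
def stem_and_leaf(data, stems=None):
    """Return a stem-and-leaf display as text.

    Assumes non-negative whole numbers: stem = tens digit, leaf = units digit.
    `stems` optionally fixes which stems are shown (so two displays share a scale).
    """
    groups = {}
    for value in sorted(data):             # sorting puts each row's leaves in order
        stem, leaf = divmod(value, 10)
        groups.setdefault(stem, []).append(str(leaf))
    if stems is None:
        stems = range(min(groups), max(groups) + 1)   # keep empty stems in between
    rows = [f"{stem} | {''.join(groups.get(stem, []))}".rstrip() for stem in stems]
    return "\n".join(rows)


students = [1, 3, 8, 5, 6, 5, 10, 0, 0, 7, 5]
faculty = [10, 15, 15, 43, 28, 20, 25, 73, 31, 24, 28]
print(stem_and_leaf(students, stems=range(8)))
print()
print(stem_and_leaf(faculty, stems=range(8)))
```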
Students
Q1 = 1
Q2 = 5
Q3 = 7
Faculty
Q1 = 15
Q2 = 25
Q3 = 31
Five-number summary:
The low score, Q1, Q2, Q3, and the high score are known as the five-number summary of a data set.
Example
Students
Min = 0
Q1 = 1
Q2 = 5
Q3 = 7
Max = 10
Faculty
Min = 10
Q1 = 15
Q2 = 25
Q3 = 31
Max = 73
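The quartile procedure (median of the lower and upper halves, leaving out Q2 when n is odd) gives the five-number summaries above. Below is a sketch of that one method; the helper names are hypothetical. Software packages often use other valid methods (for example, NumPy's `percentile` interpolates by default), so their quartiles may differ slightly.

```python
def median_of(ranked):
    """Median of an already-sorted list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def five_number_summary(data):
    """(low, Q1, Q2, Q3, high) via the median-of-halves method;
    when n is odd, the median itself is excluded from both halves."""
    ranked = sorted(data)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ranked[:half]              # data below Q2
    upper = ranked[half + n % 2:]      # data above Q2 (skip the middle value if n is odd)
    return ranked[0], median_of(lower), median_of(ranked), median_of(upper), ranked[-1]


students = [1, 3, 8, 5, 6, 5, 10, 0, 0, 7, 5]
faculty = [10, 15, 15, 43, 28, 20, 25, 73, 31, 24, 28]
for name, data in [("Students", students), ("Faculty", faculty)]:
    low, q1, q2, q3, high = five_number_summary(data)
    print(name, (low, q1, q2, q3, high), "IQR =", q3 - q1)
# Students (0, 1, 5, 7, 10) IQR = 6
# Faculty (10, 15, 25, 31, 73) IQR = 16
```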
Example: exam scores for 40 students
Low = 5.6
Q1 = 71.5
Q2 = 80
Q3 = 88.5
High = 100
Note: the middle 50% of the scores are between 71.5 and 88.5, and 25% of the scores are above 88.5.
How many data values fall between any two consecutive values of the five-number summary? 10 (25% of the 40 scores)
Boxplots:
Procedure
1) Draw a scale that includes the lowest and highest data values.
2) To the right of the scale, draw a box from Q1 to Q3.
3) Draw a solid line through the box at the median.
4) Draw solid lines, called whiskers, from Q1 to the lowest value and from Q3 to the highest value.
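The boxplot steps only need the five-number summary and a common scale. As a dependency-free sketch, the code below draws a horizontal text boxplot (the notes draw them vertically); `text_boxplot` and its `low`, `high`, `width` parameters are hypothetical. A plotting library would normally be used instead.

```python
def text_boxplot(summary, low=0, high=80, width=81):
    """Render a horizontal boxplot from a five-number summary (min, Q1, median, Q3, max).

    low/high give the scale shared by every plot; width is the number of character columns.
    """
    mn, q1, med, q3, mx = summary

    def col(value):
        # Map a data value to a character column on the common scale
        return round((value - low) / (high - low) * (width - 1))

    row = [" "] * width
    for c in range(col(mn), col(mx) + 1):
        row[c] = "-"        # whisker line from the lowest to the highest value
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="        # box from Q1 to Q3, drawn over the middle of that line
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "|"     # median line inside the box
    return "".join(row).rstrip()


print("Students:", text_boxplot((0, 1, 5, 7, 10)))
print("Faculty: ", text_boxplot((10, 15, 25, 31, 73)))
```

With the default scale of 0 to 80 and 81 columns, each column is one unit, so both groups can be compared on the same scale just as in side-by-side boxplots.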
Example
Students
Min = 0
Q1 = 1
Q2 = 5
Q3 = 7
Max = 10
Faculty
Min = 10
Q1 = 15
Q2 = 25
Q3 = 31
Max = 73
[Figure: side-by-side boxplots for Students and Faculty on a common vertical scale from 0 to 80]
Identifying shapes from Boxplots
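As seen in the previous example, side-by-side boxplots display bivariate data with one qualitative and one quantitative variable.
Example
How does tread design affect an automobile's stopping distance?
Qualitative variable: tread design (A, B, or C)
Quantitative variable: stopping distance
[Figure: side-by-side boxplots of stopping distance for tread designs A, B, and C]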
4.4 Measures of Dispersion
Distribution #1      Distribution #2
1 |                  1 | 5
2 | 5                2 | 55
3 | 5555555          3 | 555
4 | 5                4 | 55
5 |                  5 | 5
Do these distributions look the same?
Calculate all measures of central tendency for both distributions.
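Distribution #1: $\bar{X} = 35$, $\tilde{X} = 35$, mode = 35, midrange = 35
Distribution #2: $\bar{X} = 35$, $\tilde{X} = 35$, mode = 35, midrange = 35
Every measure of central tendency is 35 for both distributions, yet Distribution #2 is clearly more spread out. To tell them apart we need measures of dispersion.
Sample range: high score - low score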
Example: Years of experience of faculty
1, 30, 22, 10, 5
sample range = 30-1 = 29 years
Note: the range is totally sensitive to extreme scores, but it is easy to compute.
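The range is a one-line computation. The sketch below applies it to the faculty experience data and to the two distributions at the start of 4.4, reading each stem as the tens digit and each leaf as the units digit (so 2 | 5 is 25). `sample_range` is a hypothetical helper name.

```python
def sample_range(data):
    """High score minus low score."""
    return max(data) - min(data)


experience = [1, 30, 22, 10, 5]
print(sample_range(experience))                    # 29

# The two distributions share every measure of center, but their ranges differ.
dist1 = [25] + [35] * 7 + [45]
dist2 = [15, 25, 25, 35, 35, 35, 45, 45, 55]
print(sample_range(dist1), sample_range(dist2))    # 20 40
```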
Sample variance: measures squared distances from the mean
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$
Note: large values of $S^2$ suggest large variability.
Example: Years of experience of faculty
1, 30, 22, 10, 5
X       X²
1       1
30      900
22      484
10      100
5       25
Total   68      1510
$S^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{5(1510) - (68)^2}{5(4)} = \frac{7550 - 4624}{20} = \frac{2926}{20} = 146.3$
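The definitional formula (squared distances from the mean) and the computational shortcut used in the worked example should agree. The sketch below checks both against Python's `statistics.variance`, which also divides by n - 1; the two helper function names are hypothetical.

```python
import statistics


def variance_definitional(data):
    """S^2 = sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def variance_computational(data):
    """S^2 = (n * sum(X^2) - (sum X)^2) / (n(n - 1)), the shortcut form."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


experience = [1, 30, 22, 10, 5]
print(variance_computational(experience))            # 146.3
print(round(variance_definitional(experience), 6))   # 146.3
print(statistics.variance(experience))               # library cross-check (n - 1 denominator)
```

The shortcut avoids subtracting the mean from every value, which is why it is convenient by hand; on a computer, the definitional form is usually the safer choice for very large numbers.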
Sample standard deviation:
$S = \sqrt{S^2}$
Example: Years of experience of faculty
1, 30, 22, 10, 5
$S = \sqrt{S^2} = \sqrt{146.3} \approx 12.095$ years
Note: the standard deviation uses the same units as the data.
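Taking the square root of the sample variance returns to the original units (years). A short check with the Python standard library, where `statistics.stdev` also uses the n - 1 denominator:

```python
import math
import statistics

experience = [1, 30, 22, 10, 5]
s2 = statistics.variance(experience)    # sample variance, n - 1 in the denominator
s = math.sqrt(s2)                       # standard deviation, in years
print(round(s, 3))                                       # 12.095
print(math.isclose(s, statistics.stdev(experience)))     # True
```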
4.5 Empirical Rule and Standardized Scores
Z-score:
Gives the number of standard deviations an observation is above or below the mean.
$z = \frac{x - \bar{x}}{s}$
Example
Test scores: $\bar{X} = 79$, s = 9
If your score is 88%, your z-score is $z = \frac{88 - 79}{9} = \frac{9}{9} = 1$.
Your score is 1 standard deviation above the mean.
If your score is 61%, your z-score is $z = \frac{61 - 79}{9} = \frac{-18}{9} = -2$.
Your score is 2 standard deviations below the mean.
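The z-score formula in code, applied to the two test scores above (`z_score` is a hypothetical helper name). A positive result means above the mean, a negative result below it.

```python
def z_score(x, mean, s):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / s


print(z_score(88, 79, 9))   # 1.0
print(z_score(61, 79, 9))   # -2.0
```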
Empirical rule:
For mound-shaped distributions,
• approximately 68% of the data fall within 1 standard deviation of the mean, $(\bar{x} - s, \bar{x} + s)$
• approximately 95% of the data fall within 2 standard deviations of the mean, $(\bar{x} - 2s, \bar{x} + 2s)$
• approximately 99.7% of the data fall within 3 standard deviations of the mean, $(\bar{x} - 3s, \bar{x} + 3s)$
Example
Suppose that the amount of liquid in "12 oz." Pepsi cans has a mound-shaped distribution with $\bar{x} = 12$ oz. and s = 0.1 oz.
Between what two values do 68% of the data lie?
$(\bar{x} - s, \bar{x} + s) = (12 - 0.1, 12 + 0.1) = (11.9 \text{ oz.}, 12.1 \text{ oz.})$
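The three empirical-rule intervals follow from $\bar{x} \pm ks$ for k = 1, 2, 3. The sketch below (with the hypothetical helper `empirical_intervals`) computes them for the Pepsi example. The code does not check that the data are mound-shaped; that assumption has to hold for the percentages to be meaningful.

```python
def empirical_intervals(mean, s):
    """Map k -> (mean - k*s, mean + k*s, approximate % of data) for k = 1, 2, 3.
    Only meaningful for mound-shaped distributions (not checked here)."""
    coverage = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * s, mean + k * s, pct) for k, pct in coverage.items()}


for k, (low, high, pct) in empirical_intervals(12, 0.1).items():
    print(f"about {pct}% within {k} SD: ({low:.1f} oz., {high:.1f} oz.)")
# about 68% within 1 SD: (11.9 oz., 12.1 oz.)
# about 95% within 2 SD: (11.8 oz., 12.2 oz.)
# about 99.7% within 3 SD: (11.7 oz., 12.3 oz.)
```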