Interpreting the standard deviation
Chapter 5 Part 1
Using the Mean and Standard Deviation Together
- z-scores
- 68-95-99.7 rule
- Changing units (shifting and rescaling data)
Z-scores: Standardized Data Values
A z-score measures the distance of a number from the mean in units of the standard deviation.
z-score corresponding to y:
    z = (y - ȳ) / s
where
    y = original data value
    ȳ = the sample mean
    s = the sample standard deviation
    z = the z-score corresponding to y
If the data have mean ȳ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean ȳ.

Exam 1: ȳ1 = 88, s1 = 6; exam 1 score: 91
Exam 2: ȳ2 = 88, s2 = 10; exam 2 score: 92
Which score is better?
    z1 = (91 - 88)/6 = 3/6 = 0.5
    z2 = (92 - 88)/10 = 4/10 = 0.4
91 on exam 1 is better than 92 on exam 2.
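The exam comparison can be reproduced in a few lines of Python. This is a minimal sketch; the helper name `z_score` is an illustrative choice, not something the slides define.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Values taken from the two-exam example.
z1 = z_score(91, mean=88, sd=6)    # exam 1
z2 = z_score(92, mean=88, sd=10)   # exam 2

# The larger z-score is the better score relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1:.2f}, z2 = {z2:.2f}; better relative score: {better}")
```

The raw score on exam 2 is higher, but it sits fewer standard deviations above its mean, which is what the comparison of z1 and z2 captures.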
Comparing SAT and ACT Scores
- SAT Math: Eleanor’s score 680; SAT mean = 500, sd = 100
- ACT Math: Gerald’s score 27; ACT mean = 18, sd = 6
- Eleanor’s z-score: z = (680 - 500)/100 = 1.8
- Gerald’s z-score: z = (27 - 18)/6 = 1.5
- Eleanor’s score is better.
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools: 2013 ($ millions)

School        Support    y - ȳ    Z-score
Maryland       15.5       6.4      1.79
UVA            13.1       4.0      1.12
Louisville     10.9       1.8      0.50
UNC             9.2       0.1      0.03
VaTech          7.9      -1.2     -0.34
FSU             7.9      -1.2     -0.34
GaTech          7.1      -2.0     -0.56
NCSU            6.5      -2.6     -0.73
Clemson         3.8      -5.3     -1.47
                         Sum = 0  Sum = 0

Mean = 9.1000, s = 3.5697
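The claim that z-scores add to zero can be checked numerically with the ACC figures. The sketch below assumes that s is the sample standard deviation (n - 1 divisor), which is what `statistics.stdev` computes and which matches the reported s = 3.5697.

```python
from statistics import mean, stdev

# Support figures ($ millions) copied from the ACC table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (n - 1 divisor)

# Standardize every school, then add the z-scores up.
z_scores = {school: (y - ybar) / s for school, y in support.items()}
z_sum = sum(z_scores.values())

print(f"mean = {ybar:.4f}, s = {s:.4f}, sum of z-scores = {z_sum:.10f}")
```

The deviations y - ȳ always sum to zero by the definition of the mean, so dividing each one by the same s leaves a sum of zero, up to floating-point rounding.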
In a recent year the mean tuition at 4-yr public colleges/universities in the U.S. was $6185 with a standard deviation of $1804. In NC the tuition was $4320. What is NC’s z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
68-95-99.7 rule
[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]
The 68-95-99.7 rule applies only to mound-shaped data
- approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)
- approximately 95% of the measurements are within 2 standard deviations of the mean, i.e., in (ȳ - 2s, ȳ + 2s)
- almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, i.e., in (ȳ - 3s, ȳ + 3s)
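As a rough check on where the three percentages come from, the sketch below assumes an exactly normal model as the idealized mound shape. That is an assumption made here for illustration; this part of the slides only says "mound-shaped" and leaves Normal models to Part 2. `statistics.NormalDist` is in the Python standard library (3.8+).

```python
from statistics import NormalDist

# Assumption: a standard normal model (mean 0, sd 1) stands in for "mound-shaped".
model = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of the model lying within k standard deviations of the mean."""
    return model.cdf(k) - model.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, p in shares.items():
    print(f"within {k} standard deviation(s): {100 * p:.1f}%")
```

Real data that are only roughly mound-shaped will land near these values rather than on them.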
68-95-99.7 rule: 68% within 1 stan. dev. of the mean
[Figure: mound-shaped curve with 68% of the area between ȳ - s and ȳ + s, 34% on each side of the mean ȳ.]
68-95-99.7 rule: 95% within 2 stan. dev. of the mean
[Figure: mound-shaped curve with 95% of the area between ȳ - 2s and ȳ + 2s, 47.5% on each side of the mean ȳ.]
Example: textbook costs
286  291  307  308  315  316  327  328  340  342
346  347  348  348  349  354  355  355  360  361
364  367  369  371  373  377  380  381  382  385
385  387  390  390  397  398  409  409  410  418
422  424  425  426  428  433  434  437  440  480
n = 50, ȳ = 375.48, s = 42.72
Example: textbook costs (cont.)
1 standard deviation interval about the mean
ȳ = 375.48, s = 42.72
(ȳ - s, ȳ + s) = (332.76, 418.20)
percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: ≈ 68%
Example: textbook costs (cont.)
2 standard deviation interval about the mean
ȳ = 375.48, s = 42.72
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)
percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: ≈ 95%
Example: textbook costs (cont.)
3 standard deviation interval about the mean
ȳ = 375.48, s = 42.72
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)
percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: ≈ 99.7%
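The three interval counts for the textbook costs can be reproduced directly from the 50 data values. This is a minimal sketch; the helper `percent_within` is an illustrative name, and the intervals are treated as open, which makes no difference here because no value falls exactly on an endpoint.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example, in sorted order.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

ybar = mean(costs)
s = stdev(costs)  # sample standard deviation (n - 1 divisor)

def percent_within(data, center, spread, k):
    """Percentage of values strictly inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    inside = sum(1 for y in data if lo < y < hi)
    return 100 * inside / len(data)

observed = {k: percent_within(costs, ybar, s, k) for k in (1, 2, 3)}
for k, pct in observed.items():
    print(f"within {k} standard deviation(s): {pct:.0f}%")
```

The observed 64%, 96%, and 100% sit close to, but not exactly on, 68%, 95%, and 99.7%, which is what "approximately" in the rule allows for.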
The best estimate of the standard deviation of the men’s weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40
Changing Units of Measurement
How shifting and rescaling data affect data summaries

Shifting and rescaling: linear transformations
Original data x1, x2, . . ., xn
Linear transformation: x* = a + bx (intercept a, slope b)
- a shifts the data by a
- b changes the scale
[Figure: graph of the line x* = a + bx against x, with intercept a on the x* axis.]
Linear Transformations x* = a + bx
Examples: Changing
1. from feet (x) to inches (x*): x* = 12x
2. from dollars (x) to cents (x*): x* = 100x
3. from degrees Celsius (x) to degrees Fahrenheit (x*): x* = 32 + (9/5)x
4. from ACT (x) to SAT (x*): x* = 150 + 40x
5. from inches (x) to centimeters (x*): x* = 2.54x
Shifting data only: b = 1
x* = a + x
- Adding the same value a to each value in the data set:
  - changes the mean, median, Q1 and Q3 by a
  - The standard deviation, IQR and variance are NOT CHANGED.
Everything shifts together.
Spread of the items does not change.
Shifting data only: b = 1
x* = a + x (cont.)
- weights of 80 men age 19 to 24 of average height (5'8" to 5'10"): x̄ = 82.36 kg
- NIH recommends maximum healthy weight of 74 kg. To compare their weights to the recommended maximum, subtract 74 kg from each weight: x* = x - 74 (a = -74, b = 1)
- x̄* = x̄ - 74 = 8.36 kg
1. No change in shape
2. No change in spread
3. Shift by 74
Shifting and Rescaling data: x* = a + bx, b > 0
Original x data: x1, x2, x3, . . ., xn
x* data (x* = a + bx): x1*, x2*, x3*, . . ., xn*

Summary statistic    Original x    Transformed x*
mean                 x̄             x̄* = a + b·x̄
median               m             m* = a + b·m
1st quartile         Q1            Q1* = a + b·Q1
3rd quartile         Q3            Q3* = a + b·Q3
stand dev            s             s* = b·s
variance             s²            s*² = b²·s²
IQR                  IQR           IQR* = b·IQR
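The rules for the mean and standard deviation follow from the definitions. The short derivation below is a sketch that assumes the sample standard deviation uses the n - 1 divisor; the slides do not state the divisor, and the argument works the same way with n.

```latex
\begin{align*}
\bar{x}^* &= \frac{1}{n}\sum_{i=1}^{n} (a + b x_i) = a + b\,\bar{x}, \\
x_i^* - \bar{x}^* &= (a + b x_i) - (a + b\,\bar{x}) = b\,(x_i - \bar{x}), \\
s^{*2} &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i^* - \bar{x}^*)^2
        = \frac{b^2}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = b^2 s^2, \\
s^* &= |b|\,s = b\,s \quad \text{for } b > 0.
\end{align*}
```

The shift a cancels in every deviation from the mean, which is why it never appears in s*, s*², or IQR*. The median and quartiles follow the same rule as the mean because a transformation with b > 0 preserves the ordering of the data.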
Rescaling data: x* = a + bx, b > 0 (cont.)
- weights of 80 men age 19 to 24, of average height (5'8" to 5'10")
  - x̄ = 82.36 kg
  - min = 54.30 kg
  - max = 161.50 kg
  - range = 107.20 kg
  - s = 18.35 kg
- Change from kilograms to pounds: x* = 2.2x (a = 0, b = 2.2)
  - x̄* = 2.2(82.36) = 181.19 pounds
  - min* = 2.2(54.30) = 119.46 pounds
  - max* = 2.2(161.50) = 355.3 pounds
  - range* = 2.2(107.20) = 235.84 pounds
  - s* = 2.2(18.35) = 40.37 pounds
Example of x* = a + bx
4 student heights in inches (x data): 62, 64, 74, 72
x̄ = 68 inches
s = 5.89 inches
Suppose we want centimeters instead: x* = 2.54x (a = 0, b = 2.54)
Not necessary! (UNC method): convert the 4 student heights to centimeters and recompute
157.48 = 2.54(62)
162.56 = 2.54(64)
187.96 = 2.54(74)
182.88 = 2.54(72)
x̄* = 172.72 centimeters
s* = 14.9606 centimeters
Go directly to this (NCSU method): note that
x̄* = 2.54x̄ = 2.54(68) = 172.72
s* = 2.54s = 2.54(5.89) = 14.9606
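Both routes to the centimeter summaries can be compared in code. This is a minimal sketch using the four heights from the example; the variable names `long_way` and `short_way` are illustrative only.

```python
from statistics import mean, stdev

heights_in = [62, 64, 74, 72]   # the four student heights, in inches
a, b = 0.0, 2.54                # x* = a + b*x converts inches to centimeters

# Long way: transform every value, then summarize the transformed data.
heights_cm = [a + b * x for x in heights_in]
long_way = (mean(heights_cm), stdev(heights_cm))

# Short way: transform the summary statistics directly.
short_way = (a + b * mean(heights_in), b * stdev(heights_in))

print(f"long way:  mean = {long_way[0]:.2f} cm, s = {long_way[1]:.4f} cm")
print(f"short way: mean = {short_way[0]:.2f} cm, s = {short_way[1]:.4f} cm")
```

Computed at full precision, s* comes out slightly below the slide's 14.9606, because the slide multiplies the already rounded s = 5.89 by 2.54.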
Example of x* = a + bx
x data: Percent returns from 4 investments during 2003: 5%, 4%, 3%, 6%
x̄ = 4.5%
s = 1.29%
Inflation during 2003: 2%
x* data: Inflation-adjusted returns, x* = x - 2% (a = -2, b = 1)
Not necessary!: compute each adjusted return and recompute
3% = 5% - 2%
2% = 4% - 2%
1% = 3% - 2%
4% = 6% - 2%
x̄* = 10%/4 = 2.5%
s* = s = 1.29%
Go directly to this:
x̄* = x̄ - 2% = 4.5% - 2% = 2.5%
s* = s = 1.29% (note that s* ≠ s - 2%!)
Example
- Original data x: Jim Bob’s jumbo watermelons from his garden have the following weights (lbs):
  23, 34, 38, 44, 48, 55, 55, 68, 72, 75
  s = 17.12; Q1 = 38, Q3 = 68; IQR = 68 - 38 = 30
- Melons over 50 lbs are priced differently; the amount each melon is over (or under) 50 lbs is:
  x* = x - 50 (x* = a + bx, a = -50, b = 1)
  -27, -16, -12, -6, -2, 5, 5, 18, 22, 25
  s* = 17.12; Q1* = 38 - 50 = -12, Q3* = 68 - 50 = 18
  IQR* = 18 - (-12) = 30
NOTE: s* = s, IQR* = IQR
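The watermelon example can be verified with a short script. The sketch assumes that quartiles are the medians of the lower and upper halves of the sorted data; the slides do not name a quartile convention, but this one reproduces Q1 = 38 and Q3 = 68. The helper `quartiles` is an illustrative name.

```python
from statistics import median, stdev

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (an assumed convention)."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the middle
    upper = ordered[(n + 1) // 2 :]    # values above the middle
    return median(lower), median(upper)

weights = [23, 34, 38, 44, 48, 55, 55, 68, 72, 75]   # lbs, from the example
shifted = [x - 50 for x in weights]                  # x* = x - 50

q1, q3 = quartiles(weights)
q1_star, q3_star = quartiles(shifted)

s, s_star = stdev(weights), stdev(shifted)
iqr, iqr_star = q3 - q1, q3_star - q1_star

print(f"s = {s:.2f}, s* = {s_star:.2f}; IQR = {iqr}, IQR* = {iqr_star}")
```

The quartiles move by -50 along with every data value, while the two measures of spread stay where they were.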
Z-scores: a special linear transformation a + bx
z = (x - x̄)/s = -x̄/s + (1/s)x = a + bx, where a = -x̄/s, b = 1/s
Example. At a community college, if a student takes x credit hours the tuition is x* = $250 + $35x. The credit hours taken by students in an Intro Stats class have mean x̄ = 15.7 hrs and standard deviation s = 2.7 hrs.
Question 1. A student’s tuition charge is $941.25. What is the z-score of this tuition?
x̄* = $250 + $35(15.7) = $799.50; s* = $35(2.7) = $94.50
z = (941.25 - 799.50)/94.50 = 141.75/94.50 = 1.5
Z-scores: a special linear transformation a + bx (cont.)
Question 2. Roger is a student in the Intro Stats class who has a course load of x = 13 credit hours. The z-score is z = (13 - 15.7)/2.7 = -2.7/2.7 = -1.
What is the z-score of Roger’s tuition?
Roger’s tuition is x* = $250 + $35(13) = $705.
Since x̄* = $250 + $35(15.7) = $799.50 and s* = $35(2.7) = $94.50,
z = (705 - 799.50)/94.50 = -94.50/94.50 = -1
The z-score does not depend on the unit of measurement. This is why z-scores are so useful!!
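The invariance of the z-score can be confirmed for Roger's numbers. This is a minimal sketch using the values from the tuition example; the names are illustrative, and the comparison allows for floating-point rounding.

```python
def z_score(value, mean, sd):
    """Standardize a value: distance from the mean in units of sd."""
    return (value - mean) / sd

# Credit-hour summaries and the tuition rule x* = a + b*x from the example.
mean_hours, sd_hours = 15.7, 2.7
a, b = 250.0, 35.0

# Summaries of tuition follow from the linear-transformation rules (b > 0).
mean_tuition = a + b * mean_hours
sd_tuition = b * sd_hours

roger_hours = 13
roger_tuition = a + b * roger_hours

z_hours = z_score(roger_hours, mean_hours, sd_hours)
z_tuition = z_score(roger_tuition, mean_tuition, sd_tuition)

print(f"z (credit hours) = {z_hours:.4f}, z (tuition) = {z_tuition:.4f}")
```

Algebraically, the shift a cancels in the numerator and the factor b cancels between numerator and denominator, so any transformation with b > 0 leaves z unchanged.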
SUMMARY: Linear Transformations x* = a + bx
[Figure: frequency histograms of Assembly Time (seconds) and Assembly Time (minutes).]
Linear transformations do not affect the shape of the distribution of the data
- for example, if the original data is right-skewed, the transformed data is right-skewed
SUMMARY: Shifting and Rescaling data, x* = a + bx, b > 0
original data x1, x2, x3, ...  →  transformed data x1*, x2*, x3*, ...

summary statistics (original)    summary statistics (transformed)
mean x̄                          new mean x̄* = a + b·x̄
median m                         new median m* = a + b·m
1st quartile Q1                  new Q1* = a + b·Q1
3rd quartile Q3                  new Q3* = a + b·Q3
st dev s                         new st dev s* = b·s
var. s²                          new var. s*² = b²·s²
IQR                              new IQR* = b·IQR
End of Chapter 5 Part 1.
Next: Part 2 Normal Models