Chapter 5 Part 1: Using the Mean and Standard Deviation Together

Topics: z-scores; the 68-95-99.7 rule; changing units (shifting and rescaling data).

Z-scores: Standardized Data Values

A z-score measures the distance of a data value from the mean in units of the standard deviation.

z-score corresponding to y:

\[ z = \frac{y - \bar{y}}{s} \]

where
  y = original data value
  \(\bar{y}\) = the sample mean
  s = the sample standard deviation
  z = the z-score corresponding to y

If the data have mean \(\bar{y}\) and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean \(\bar{y}\).

Exam 1: \(\bar{y}_1\) = 88, \(s_1\) = 6; exam 1 score: 91
Exam 2: \(\bar{y}_2\) = 88, \(s_2\) = 10; exam 2 score: 92
Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5, \qquad z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools: 2013 ($ millions)

School       Support   y - ybar   z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.48

Mean = 9.1000, s = 3.5697
Sum of the deviations = 0; sum of the z-scores = 0 (up to rounding).

In a recent year the mean tuition at 4-yr public colleges/universities in the U.S. was $6185 with a standard deviation of $1804. In NC the tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

68-95-99.7 rule

The 68-95-99.7 rule links the mean and standard deviation (numerical summaries) to the histogram (graphical summary).

The 68-95-99.7 rule applies only to mound-shaped data:
  approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\)
  approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\)
  almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\)

[Figure: 68-95-99.7 rule, 68% within 1 standard deviation of the mean. Mound-shaped curve with 34% of the area on each side of the mean, between \(\bar{y} - s\) and \(\bar{y} + s\).]

[Figure: 68-95-99.7 rule, 95% within 2 standard deviations of the mean. Mound-shaped curve with 47.5% of the area on each side of the mean, between \(\bar{y} - 2s\) and \(\bar{y} + 2s\).]

Example: textbook costs

The 50 costs, sorted:
286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

n = 50, \(\bar{y}\) = 375.48, s = 42.72

Example: textbook costs (cont.)

Interval about the mean                                     Values inside   Percentage   Rule
1 sd: \((\bar{y} - s,\ \bar{y} + s)\) = (332.76, 418.20)        32 of 50         64%        68%
2 sd: \((\bar{y} - 2s,\ \bar{y} + 2s)\) = (290.04, 460.92)      48 of 50         96%        95%
3 sd: \((\bar{y} - 3s,\ \bar{y} + 3s)\) = (247.32, 503.64)      50 of 50        100%        99.7%

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:
1. 10
2. 15
3. 20
4. 40

Changing Units of Measurement

How shifting and rescaling data affect data summaries

Shifting and rescaling: linear transformations

Original data: \(x_1, x_2, \ldots, x_n\)
Linear transformation: x* = a + bx (intercept a, slope b)
The intercept a shifts the data; the slope b changes the scale.

[Figure: the line x* = a + bx plotted against x, with intercept a and slope b.]

Linear Transformations

Examples: changing
1. from feet (x) to inches (x*): x* = 12x
2. from dollars (x) to cents (x*): x* = 100x
3. from degrees Celsius (x) to degrees Fahrenheit (x*): x* = 32 + (9/5)x
4. from ACT (x) to SAT (x*): x* = 150 + 40x
5. from inches (x) to centimeters (x*): x* = 2.54x

Shifting data only: b = 1, x* = a + x

Adding the same value a to each value in the data set changes the mean, median, Q1 and Q3 by a.
The standard deviation, IQR and variance are NOT CHANGED.
Everything shifts together. The spread of the items does not change.

Shifting data only: b = 1, x* = a + x (cont.)

Weights of 80 men age 19 to 24 of average height (5'8" to 5'10"): \(\bar{x}\) = 82.36 kg
NIH recommends a maximum healthy weight of 74 kg.
To compare their weights to the recommended maximum, subtract 74 kg from each weight: x* = x - 74 (a = -74, b = 1)
\(\bar{x}^* = \bar{x} - 74\) = 8.36 kg
1. No change in shape
2. No change in spread
3. Shift by 74

Shifting and Rescaling data: x* = a + bx, b > 0

Original x data: \(x_1, x_2, x_3, \ldots, x_n\)
x* data: x* = a + bx, giving \(x_1^*, x_2^*, x_3^*, \ldots, x_n^*\)

Summary statistic (x data)       New summary statistic (x* data)
mean \(\bar{x}\)                 \(\bar{x}^* = a + b\bar{x}\)
median m                         m* = a + bm
1st quartile Q1                  Q1* = a + bQ1
3rd quartile Q3                  Q3* = a + bQ3
standard deviation s             s* = bs
variance \(s^2\)                 \(s^{*2} = b^2 s^2\)
IQR                              IQR* = b IQR

Rescaling data: x* = a + bx, b > 0 (cont.)
Weights of 80 men age 19 to 24, of average height (5'8" to 5'10"):
\(\bar{x}\) = 82.36 kg
min = 54.30 kg
max = 161.50 kg
range = 107.20 kg
s = 18.35 kg

Change from kilograms to pounds: x* = 2.2x (a = 0, b = 2.2)
\(\bar{x}^*\) = 2.2(82.36) = 181.19 pounds
min* = 2.2(54.30) = 119.46 pounds
max* = 2.2(161.50) = 355.3 pounds
range* = 2.2(107.20) = 235.84 pounds
s* = 2.2(18.35) = 40.37 pounds

Example of x* = a + bx

4 student heights in inches (x data): 62, 64, 74, 72
\(\bar{x}\) = 68 inches, s = 5.89 inches
Suppose we want centimeters instead: x* = 2.54x (a = 0, b = 2.54)

"UNC method" (not necessary!): convert every height, then recompute the summaries.
4 student heights in centimeters:
157.48 = 2.54(62)
162.56 = 2.54(64)
187.96 = 2.54(74)
182.88 = 2.54(72)
\(\bar{x}^*\) = 172.72 centimeters
s* = 14.9606 centimeters

"NCSU method" (go directly to this): transform the summaries themselves.
\(\bar{x}^* = 2.54\bar{x}\) = 2.54(68) = 172.72
s* = 2.54s = 2.54(5.89) = 14.9606

Example of x* = a + bx

x data: percent returns from 4 investments during 2003: 5%, 4%, 3%, 6%
\(\bar{x}\) = 4.5%, s = 1.29%
Inflation during 2003: 2%
x* data: inflation-adjusted returns, x* = x - 2% (a = -2, b = 1)

Not necessary: adjust every return, then recompute the summaries.
3% = 5% - 2%
2% = 4% - 2%
1% = 3% - 2%
4% = 6% - 2%
\(\bar{x}^*\) = 10%/4 = 2.5%
s* = s = 1.29%

Go directly to this:
\(\bar{x}^* = \bar{x} - 2\%\) = 4.5% - 2% = 2.5%
s* = s = 1.29% (note that s* ≠ s - 2%!)

Example

Original data x: Jim Bob's jumbo watermelons from his garden have the following weights (lbs):
23, 34, 38, 44, 48, 55, 55, 68, 72, 75
s = 17.12; Q1 = 38, Q3 = 68; IQR = 68 - 38 = 30

Melons over 50 lbs are priced differently; the amount each melon is over (or under) 50 lbs is
x* = x - 50 (x* = a + bx, a = -50, b = 1):
-27, -16, -12, -6, -2, 5, 5, 18, 22, 25
s* = 17.12; Q1* = 38 - 50 = -12, Q3* = 68 - 50 = 18
IQR* = 18 - (-12) = 30
NOTE: s* = s, IQR* = IQR

Z-scores: a special linear transformation a + bx

\[ z = \frac{x - \bar{x}}{s} = -\frac{\bar{x}}{s} + \frac{1}{s}\,x = a + bx, \qquad \text{where } a = -\frac{\bar{x}}{s},\quad b = \frac{1}{s} \]

Example. At a community college, if a student takes x credit hours the tuition is x* = $250 + $35x. The credit hours taken by students in an Intro Stats class have mean \(\bar{x}\) = 15.7 hrs and standard deviation s = 2.7 hrs.

Question 1.
A student's tuition charge is $941.25. What is the z-score of this tuition?

\(\bar{x}^*\) = $250 + $35(15.7) = $799.50; s* = $35(2.7) = $94.50

\[ z = \frac{941.25 - 799.50}{94.50} = \frac{141.75}{94.50} = 1.5 \]

Z-scores: a special linear transformation a + bx (cont.)

Example. At a community college, if a student takes x credit hours the tuition is x* = $250 + $35x. The credit hours taken by students in an Intro Stats class have mean \(\bar{x}\) = 15.7 hrs and standard deviation s = 2.7 hrs.

Question 2. Roger is a student in the Intro Stats class who has a course load of x = 13 credit hours. The z-score of his course load is z = (13 - 15.7)/2.7 = -2.7/2.7 = -1. What is the z-score of Roger's tuition?

Roger's tuition is x* = $250 + $35(13) = $705.
Since \(\bar{x}^*\) = $250 + $35(15.7) = $799.50 and s* = $35(2.7) = $94.50,

\[ z = \frac{705 - 799.50}{94.50} = \frac{-94.50}{94.50} = -1 \]

The z-score does not depend on the unit of measurement. This is why z-scores are so useful!

SUMMARY: Linear Transformations x* = a + bx

[Figure: two histograms of the same data, Assembly Time (seconds) and Assembly Time (minutes), with Frequency on the vertical axis.]

Linear transformations do not affect the shape of the distribution of the data. For example, if the original data is right-skewed, the transformed data is right-skewed.

SUMMARY: Shifting and Rescaling data, x* = a + bx, b > 0

Original data: \(x_1, x_2, x_3, \ldots\)
Transformed data: \(x_1^*, x_2^*, x_3^*, \ldots\)

Summary statistic (original)     New summary statistic (transformed)
mean \(\bar{x}\)                 \(\bar{x}^* = a + b\bar{x}\)
median m                         m* = a + bm
1st quartile Q1                  Q1* = a + bQ1
3rd quartile Q3                  Q3* = a + bQ3
standard deviation s             s* = bs
variance \(s^2\)                 \(s^{*2} = b^2 s^2\)
IQR                              IQR* = b IQR

End of Chapter 5 Part 1. Next: Part 2, Normal Models
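
The rules in this part can be checked numerically. The sketch below is only an illustration, not part of the original slides: it uses Python's standard `statistics` module, and the helper names `z_scores`, `linear_transform` and `share_within` are hypothetical names chosen for this example. It reuses the four student heights and the 50 textbook costs from the examples above, and it assumes the sample standard deviation (dividing by n - 1), which matches the values of s quoted on the slides.

```python
import statistics


def z_scores(values):
    """Standardize each value: (y - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample sd, divides by n - 1
    return [(y - mean) / s for y in values]


def linear_transform(values, a, b):
    """Apply x* = a + b*x to every value (shift by a, rescale by b)."""
    return [a + b * x for x in values]


def share_within(values, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [y for y in values if mean - k * s < y < mean + k * s]
    return len(inside) / len(values)


# Student heights example: inches to centimeters, x* = 2.54x (a = 0, b = 2.54).
heights_in = [62, 64, 74, 72]
heights_cm = linear_transform(heights_in, a=0, b=2.54)

mean_cm = statistics.mean(heights_cm)   # should equal 2.54 * mean of inches
sd_cm = statistics.stdev(heights_cm)    # should equal 2.54 * sd of inches

# z-scores are the same in either unit, and they add to zero.
z_in = z_scores(heights_in)
z_cm = z_scores(heights_cm)

# Textbook costs example: compare the data with the 68-95-99.7 rule.
textbook_costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(textbook_costs, k):.0%}")
```

Computing the summaries from the converted heights (the long way) and transforming the original summaries directly give the same mean and standard deviation, which is the point of the "go directly to this" shortcut in the examples above.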