Rescaling and shifting
• A fancy way of changing one variable to another
• Main concepts involve:
  – Adding or subtracting a number (shifting)
  – Multiplying or dividing by a number (rescaling)

Where have you seen this before?
• Going from Fahrenheit to Celsius
  – C = (5/9)*(F - 32)
• Going from Celsius to Fahrenheit
  – F = [(9/5)*C] + 32
• Going from pounds to kilograms
  – 1 lb = 0.45359237 kg
• Going from kilograms to pounds
  – 1 kg = 2.204622622 lb

What does adding a constant do to data?
• All measures of position (5 number summary, mean) will increase (if adding) or decrease (if subtracting) by the constant
• All measures of spread (range, IQR, standard deviation) STAY THE SAME

Example
• Say we have the following temperatures (in Fahrenheit): 32, 34, 33, 36, 38, 38, 21
  – 5 number summary:
    • Min: 21
    • Q1: 32
    • Median: 34
    • Q3: 38
    • Max: 38
  – IQR = 6
  – s = 5.84

Example (cont’d)
• Now say we subtract 32 from each data value
• Temperatures become: 0, 2, 1, 4, 6, 6, -11
  – 5 number summary:
    • Min: -11
    • Q1: 0
    • Median: 2
    • Q3: 6
    • Max: 6
  – IQR = 6
  – s = 5.84

Example (cont’d)
• Comparing the two, we can see that IQR and s didn’t change when we subtracted 32 from each temperature
• Every value in the 5 number summary decreased by 32
• Bottom line: shifting data DOES NOT change the spread

What does multiplying or dividing by a number do to data?
• Changes the:
  – position
  – spread
• If we multiply all the data by a number, measures of position and measures of spread are multiplied by that number
• If we divide all the data by a number, measures of position and measures of spread are divided by that number

Example (cont’d)
• Say we multiply the previous (shifted) temperatures by (5/9)
• The temperatures of the original data are now in degrees Celsius: 0, 1.11, 0.56, 2.22, 3.33, 3.33, -6.11

Example (cont’d)
• For the Celsius data:
  – 5 number summary:
    • Min: -6.11
    • Q1: 0
    • Median: 1.11
    • Q3: 3.33
    • Max: 3.33
  – IQR = 3.33
  – s = 3.246

Example (cont’d)
• We can see both measures of position and measures of spread change
• All measures of position and spread were multiplied by (5/9)
• Bottom line: rescaling data DOES change spread

Standardizing variables
• This is just a special application of shifting and rescaling
• We shift by subtracting the mean
• We scale by dividing by the standard deviation

Standardizing variables
$z = \frac{y - \bar{y}}{s}$
• z has no units (just a number)
• Puts variables on same scale
  – Mean (center) at 0
  – Standard deviation (spread) of 1
• Does not change shape of distribution

Standardizing variables
• z = # of standard deviations away from mean
  – Negative z: number is below mean
  – Positive z: number is above mean

Why standardize variables?
• It is a way to find how many standard deviations from the mean something is
• It is a way to compare an individual value to a data set
• It is a way to compare two values measured on different scales

Standardizing Variables
• Height of women: $\bar{y} = 66$, $s_y = 2.5$
• Height of men: $\bar{x} = 70$, $s_x = 3$
• I am 67 inches tall
• My friend Dirk is 72 inches tall
• Who is taller (comparatively)?

Standardizing Variables
$z = \frac{y - \bar{y}}{s_y} = \frac{67 - 66}{2.5} = 0.4$
$z = \frac{x - \bar{x}}{s_x} = \frac{72 - 70}{3} = 0.67$

Standardizing Variables
• I am 0.4 standard deviations above mean height for women
• Dirk is 0.67 standard deviations above mean height for men
• Dirk is taller (comparatively)

SAT vs. ACT
• You took SAT and scored 550
• Your friend took ACT and scored 30
• Which score is better?
  – SAT has mean 500 and standard deviation 100
  – ACT has mean 18 and standard deviation 6

SAT vs. ACT
• Your score: $z = \frac{550 - 500}{100} = 0.5$
• Friend’s score: $z = \frac{30 - 18}{6} = 2$
• Your friend scored better on ACT than you did on SAT

Heights of 150 Stat 101 Women
Heights             #
59.5 < X ≤ 60.5     3
60.5 < X ≤ 61.5     3
61.5 < X ≤ 62.5    10
62.5 < X ≤ 63.5    12
63.5 < X ≤ 64.5    14
64.5 < X ≤ 65.5    16
65.5 < X ≤ 66.5    25
66.5 < X ≤ 67.5    25
67.5 < X ≤ 68.5    15
68.5 < X ≤ 69.5    10
69.5 < X ≤ 70.5     7
70.5 < X ≤ 71.5     7
71.5 < X ≤ 72.5     1
72.5 < X ≤ 73.5     2

Height of 150 Stat 101 Women
• Distribution
  – Shape
    • Symmetric
    • Unimodal
    • Bell-Shaped
  – Center around 66.5
  – Spread from 59.5 to 73.5
• Model with a Normal Distribution

Normal Distributions
• Bell Curve
• Physical Characteristics
  – Ex. Height
  – Ex. Weight
  – Ex. Length of wings of birds
• Most important distribution in statistics

Normal Distributions
• Two parameters (not calculated)
  – Mean µ (pronounced “meeoo”)
    • Locates center of curve
    • Splits curve in half
  – Standard deviation σ (pronounced “sigma”)
    • Controls spread of curve
    • Ruler of distribution
• Write as N(µ,σ)

Standard Normal Distribution
• Puts all normal distributions on same scale
$z = \frac{y - \mu}{\sigma}$
  – z has center (mean) at 0
  – z has spread (standard deviation) of 1

Standard Normal Distribution
• z = # of standard deviations away from mean µ
  – Negative z: number is below the mean
  – Positive z: number is above the mean
• Written as N(0,1)
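
The standardizing arithmetic in these notes (heights, SAT vs. ACT) can be checked with a few lines of Python. This is only a sketch: the helper name `z_score` and the variable names are made up for illustration and do not come from the slides. It uses the standard library's `statistics` module, whose `stdev` is the sample standard deviation s. The last part standardizes the Fahrenheit temperatures from the earlier example to show that the resulting z values have mean 0 and standard deviation 1.

```python
import statistics


def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`.

    Hypothetical helper for illustration: shift by the mean, then
    rescale by the standard deviation.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Heights: women have mean 66 and s = 2.5, men have mean 70 and s = 3.
z_me = z_score(67, 66, 2.5)    # 0.4
z_dirk = z_score(72, 70, 3)    # about 0.67

# Test scores: SAT has mean 500 and sd 100, ACT has mean 18 and sd 6.
z_sat = z_score(550, 500, 100)  # 0.5
z_act = z_score(30, 18, 6)      # 2.0

# Standardize the Fahrenheit temperatures from the shifting example.
temps_f = [32, 34, 33, 36, 38, 38, 21]
ybar = statistics.mean(temps_f)
s = statistics.stdev(temps_f)   # sample standard deviation
z_temps = [z_score(t, ybar, s) for t in temps_f]

print("me:", round(z_me, 2), "Dirk:", round(z_dirk, 2))
print("SAT:", z_sat, "ACT:", z_act)
print("sd of standardized temperatures:", round(statistics.stdev(z_temps), 6))
```

Because z has no units, the larger z identifies the comparatively larger value even when the raw numbers are on different scales (inches for two different groups, or SAT points vs. ACT points).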