Download Lecture Slides for Linear Transformations

Rescaling and shifting
• A fancy way of changing one variable to
another
• Main concepts involve:
– Adding or subtracting a number (shifting)
– Multiplying or dividing by a number (rescaling)
Where have you seen this before?
• Going from Fahrenheit to Celsius
– C = (5/9)*(F-32)
• Going from Celsius to Fahrenheit
– F = [(9/5)*C]+32
• Going from pounds to kilograms
– 1 lb = 0.45359237 kg
• Going from kilograms to pounds
– 1 kg = 2.204622622 lbs
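The conversions above can be sketched in Python as follows. The function names are illustrative and do not come from the slides; the constants are the ones listed above.

```python
# Unit conversions written as shift-and-rescale operations.
LB_TO_KG = 0.45359237  # kilograms per pound (value from the slide)


def fahrenheit_to_celsius(f):
    """Shift by 32, then rescale by 5/9."""
    return (5 / 9) * (f - 32)


def celsius_to_fahrenheit(c):
    """Rescale by 9/5, then shift by 32."""
    return (9 / 5) * c + 32


def pounds_to_kilograms(lb):
    """Pure rescaling: no shift is involved."""
    return lb * LB_TO_KG


def kilograms_to_pounds(kg):
    """Inverse rescaling: divide by the same constant."""
    return kg / LB_TO_KG


print(fahrenheit_to_celsius(212))
print(celsius_to_fahrenheit(0))
print(kilograms_to_pounds(1))
```

Note that the temperature conversions need both a shift and a rescale, while the weight conversions only rescale.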
What does adding a constant do to data?
• All measures of position (5 number
summary, mean) will increase (if adding) or
decrease (if subtracting) by the constant
• All measures of spread (range, IQR,
standard deviation) STAY THE SAME
Example
• Say we have the following temperatures (in
Fahrenheit): 32, 34, 33, 36, 38, 38, 21
– 5 number summary:
• Min: 21
• Q1: 32
• Median: 34
• Q3: 38
• Max: 38
– IQR= 6
– s = 5.84
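These summary values can be checked with a short Python sketch. It assumes the quartiles follow the default "exclusive" method of `statistics.quantiles`, which happens to reproduce the slide's Q1 and Q3 for this data; other software may use a different quartile rule and give slightly different answers. The helper name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles use statistics.quantiles with its default
    'exclusive' method, which matches the values on the slide.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


temps_f = [32, 34, 33, 36, 38, 38, 21]  # Fahrenheit temperatures from the slide

low, q1, median, q3, high = five_number_summary(temps_f)
iqr = q3 - q1
s = statistics.stdev(temps_f)  # sample standard deviation (divides by n - 1)

print(low, q1, median, q3, high)
print(iqr, round(s, 2))
```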
Example (cont’d)
• Now say we subtract 32 from each data value
• Temperatures become: 0, 2, 1, 4, 6, 6, -11
– 5 number summary:
• Min: -11
• Q1: 0
• Median: 2
• Q3: 6
• Max: 6
– IQR= 6
– s = 5.84
Example (cont’d)
• Comparing the two, we can see that IQR and s
did not change when we subtracted 32 from each
temperature
• Each value in the 5 number summary decreased
by 32
• Bottom line: shifting data DOES NOT
change the spread
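A minimal Python sketch of the same comparison, assuming sample standard deviation and the default quartile method of `statistics.quantiles` (the helper `iqr` is illustrative):

```python
import math
import statistics


def iqr(data):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


temps_f = [32, 34, 33, 36, 38, 38, 21]  # original data from the slide
shifted = [t - 32 for t in temps_f]     # subtract the constant 32

# Measures of position move by the constant...
mean_change = statistics.mean(shifted) - statistics.mean(temps_f)
median_change = statistics.median(shifted) - statistics.median(temps_f)

# ...while measures of spread are untouched.
same_sd = math.isclose(statistics.stdev(shifted), statistics.stdev(temps_f))
same_iqr = math.isclose(iqr(shifted), iqr(temps_f))

print(shifted)
print(mean_change, median_change)
print(same_sd, same_iqr)
```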
What does multiplying or dividing by a number do to data?
• Changes the:
– position
– spread
• If we multiply all the data by a number, measures
of position and measures of spread are multiplied
by that number
• If we divide all the data by a number, measures of
position and measures of spread are divided by
that number
Example (cont’d)
• Say we multiply the shifted temperatures
(after subtracting 32) by (5/9)
• The temperatures of the original data are
now in degrees Celsius: 0, 1.11, 0.56, 2.22,
3.33, 3.33, -6.11
Example (cont’d)
• For the Celsius data:
– 5 number summary:
• Min: -6.11
• Q1: 0
• Median: 1.11
• Q3: 3.33
• Max: 3.33
– IQR = 3.33
– s = 3.246
Example (cont’d)
• We can see both measures of position and
measures of spread change
• All measures of position and spread were
multiplied by (5/9)
• Bottom line: rescaling data DOES change
spread
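The rescaling step can be sketched in Python as below. It starts from the shifted temperatures (original Fahrenheit values minus 32) and assumes sample standard deviation and the default quartile method of `statistics.quantiles`; the helper `iqr` is illustrative.

```python
import math
import statistics


def iqr(data):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


shifted = [0, 2, 1, 4, 6, 6, -11]        # Fahrenheit temperatures minus 32
factor = 5 / 9
temps_c = [factor * t for t in shifted]  # rescale to degrees Celsius

# Position and spread are both multiplied by the same factor.
median_scaled = math.isclose(statistics.median(temps_c),
                             factor * statistics.median(shifted))
sd_scaled = math.isclose(statistics.stdev(temps_c),
                         factor * statistics.stdev(shifted))
iqr_scaled = math.isclose(iqr(temps_c), factor * iqr(shifted))

print([round(c, 2) for c in temps_c])
print(round(statistics.stdev(temps_c), 3), round(iqr(temps_c), 2))
print(median_scaled, sd_scaled, iqr_scaled)
```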
Standardizing variables
• This is just a special application of shifting
and rescaling
• We shift by subtracting the mean
• We scale by dividing by the standard
deviation
Standardizing variables
$$z = \frac{y - \bar{y}}{s}$$
• z has no units (just a number)
• Puts variables on same scale
– Mean (center) at 0
– Standard deviation (spread) of 1
• Does not change shape of distribution
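As a sketch of what standardizing does, the Python below converts a data set to z-scores and checks that the result has mean 0 and standard deviation 1. It reuses the Fahrenheit temperatures from the earlier example; the function name `standardize` is illustrative. Standardizing the same temperatures in Celsius gives the same z-scores, which is one way to see that z has no units.

```python
import statistics


def standardize(data):
    """Return z-scores: subtract the mean, then divide by the sample SD."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(y - mean) / s for y in data]


temps_f = [32, 34, 33, 36, 38, 38, 21]             # Fahrenheit data from the example
temps_c = [(5 / 9) * (t - 32) for t in temps_f]    # same temperatures in Celsius

z_f = standardize(temps_f)
z_c = standardize(temps_c)

print([round(z, 2) for z in z_f])
print(statistics.mean(z_f), statistics.stdev(z_f))  # approximately 0 and 1
```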
Standardizing variables
• z = # of standard deviations away from
mean
– Negative z – number is below mean
– Positive z – number is above mean
Why standardize variables?
• It is a way to find how many standard
deviations from the mean something is
• It is a way to compare an individual value
to a data set
• It is a way to compare two values measured
on different scales
Standardizing Variables
• Height of women: $\bar{y} = 66$, $s_y = 2.5$
• Height of men: $\bar{x} = 70$, $s_x = 3$
• I am 67 inches tall
• My friend Dirk is 72 inches tall
• Who is taller (comparatively)?
Standardizing Variables
$$z = \frac{y - \bar{y}}{s_y} = \frac{67 - 66}{2.5} = 0.4$$
$$z = \frac{x - \bar{x}}{s_x} = \frac{72 - 70}{3} \approx 0.67$$
Standardizing Variables
• I am 0.4 standard deviations above mean
height for women
• Dirk is 0.67 standard deviations above mean
height for men
• Dirk is taller (comparatively)
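The height comparison can be reproduced with a small Python helper. The function name `z_score` is illustrative; the means and standard deviations are the ones given for women and men above.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


z_me = z_score(67, mean=66, sd=2.5)   # compared with women's heights
z_dirk = z_score(72, mean=70, sd=3)   # compared with men's heights

print(round(z_me, 2), round(z_dirk, 2))
print("Dirk is taller (comparatively)" if z_dirk > z_me else "I am taller (comparatively)")
```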
SAT vs. ACT
• You took SAT and scored 550
• Your friend took ACT and scored 30
• Which score is better?
– SAT has mean 500 and standard deviation 100
– ACT has mean 18 and standard deviation 6
SAT vs. ACT
• Your score
$$z = \frac{550 - 500}{100} = 0.5$$
• Friend’s score
$$z = \frac{30 - 18}{6} = 2$$
• Your friend scored better on ACT than you
did on SAT
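The same comparison extends to any number of scores on different scales. A minimal sketch, using the SAT and ACT figures from the slides (the dictionary layout and labels are illustrative):

```python
# Each entry: label -> (score, mean of that test, standard deviation of that test)
scores = {
    "you (SAT)": (550, 500, 100),
    "friend (ACT)": (30, 18, 6),
}

# Standardize every score so they all sit on the same unitless scale.
z_scores = {
    name: (score - mean) / sd
    for name, (score, mean, sd) in scores.items()
}

best = max(z_scores, key=z_scores.get)  # label with the largest z-score

print(z_scores)
print(best)
```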
Heights of 150 Stat 101 Women
Heights            #     Heights            #
59.5 < X ≤ 60.5    3     66.5 < X ≤ 67.5    25
60.5 < X ≤ 61.5    3     67.5 < X ≤ 68.5    15
61.5 < X ≤ 62.5    10    68.5 < X ≤ 69.5    10
62.5 < X ≤ 63.5    12    69.5 < X ≤ 70.5    7
63.5 < X ≤ 64.5    14    70.5 < X ≤ 71.5    7
64.5 < X ≤ 65.5    16    71.5 < X ≤ 72.5    1
65.5 < X ≤ 66.5    25    72.5 < X ≤ 73.5    2
Height of 150 Stat 101 Women
• Distribution
– Shape
• Symmetric
• Unimodal
• Bell-Shaped
– Center around 66.5
– Spread from 59.5 to 73.5
• Model with a Normal Distribution
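One way to check the center of this distribution is to summarize the frequency table directly. The sketch below is an approximation: it treats every height in a bin as if it sat at the bin midpoint, which is an assumption and not something the slides do. The two tallest bins meet at 66.5, and the midpoint-based mean comes out close to that value.

```python
# (lower bound, upper bound, count) for each height bin in the table
bins = [
    (59.5, 60.5, 3), (60.5, 61.5, 3), (61.5, 62.5, 10), (62.5, 63.5, 12),
    (63.5, 64.5, 14), (64.5, 65.5, 16), (65.5, 66.5, 25), (66.5, 67.5, 25),
    (67.5, 68.5, 15), (68.5, 69.5, 10), (69.5, 70.5, 7), (70.5, 71.5, 7),
    (71.5, 72.5, 1), (72.5, 73.5, 2),
]

total = sum(count for _, _, count in bins)

# Approximation: place every observation at the midpoint of its bin.
approx_mean = sum((lo + hi) / 2 * count for lo, hi, count in bins) / total

# Bins with the highest count show where the peak of the distribution sits.
peak = max(count for _, _, count in bins)
modal_bins = [(lo, hi) for lo, hi, count in bins if count == peak]

print(total)
print(round(approx_mean, 1))
print(modal_bins)
```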
Normal Distributions
• Bell Curve
• Physical Characteristics
– Ex. Height
– Ex. Weight
– Ex. Length of wings of birds
• Most important distribution in statistics
Normal Distributions
• Two parameters (specified for the model, not calculated from data)
– Mean µ (pronounced “meeoo”)
• Locates center of curve
• Splits curve in half
– Standard deviation σ (pronounced “sigma”)
• Controls spread of curve
• Ruler of distribution
• Write as N(µ,σ)
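The roles of the two parameters can be illustrated with `statistics.NormalDist` from the Python standard library. The values µ = 66 and σ = 2.5 below are hypothetical, borrowed from the women's heights example purely for illustration.

```python
from statistics import NormalDist

# Hypothetical model N(66, 2.5); the parameter values are for illustration only.
heights = NormalDist(mu=66, sigma=2.5)

# The mean splits the curve in half: half of the area lies to its left.
area_left_of_mean = heights.cdf(66)

# The curve is symmetric: equal height one standard deviation either side.
left = heights.pdf(66 - 2.5)
right = heights.pdf(66 + 2.5)

# A larger sigma spreads the curve out, so its peak is lower.
wider = NormalDist(mu=66, sigma=5)

print(area_left_of_mean)
print(left, right)
print(heights.pdf(66) > wider.pdf(66))
```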
Standard Normal Distribution
• Puts all normal distributions on same scale
$$z = \frac{y - \mu}{\sigma}$$
– z has center (mean) at 0
– z has spread (standard deviation) of 1
Standard Normal Distribution
• z = # of standard deviations away from
mean µ
– Negative z, number is below the mean
– Positive z, number is above the mean
• Written as N(0,1)
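A short sketch of why standardizing puts all normal distributions on the same scale: the area below a value y under N(µ,σ) equals the area below its z-score under N(0,1). The model N(66, 2.5) and the value y = 67 are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

model = NormalDist(mu=66, sigma=2.5)  # hypothetical N(mu, sigma)
standard = NormalDist()               # defaults to N(0, 1)

y = 67
z = (y - model.mean) / model.stdev    # standardize y

# Same probability on either scale.
p_original = model.cdf(y)
p_standard = standard.cdf(z)

print(z)
print(p_original, p_standard)
print(math.isclose(p_original, p_standard))
```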