MAT 1000
Mathematics in Today's World
Last Time
We looked at the standard deviation, a measurement
of the spread of a distribution.
We introduced a special type of distribution, the
normal distribution. These highly symmetric
distributions are very common.
We saw how, using only the mean and standard
deviation, we can find the first and third quartiles of
a normal distribution.
Today
Using the mean and standard deviation, we can find
out much more about a normal distribution.
In particular, we will be able to easily find any of the
percentiles of the distribution. To do so, we need to
find standard scores, or z-scores.
First, we address the question of why normal
distributions are so common.
Today
Note: Today’s material is not in the
textbook.
Examples of normal distributions
• Physical characteristics like height or weight.
• The annual returns on the S&P 500 over the last 50
  years.
• Cars in the parking lot of a mall.
• How long it takes a kernel of popcorn to pop in the
  microwave.
Why are normal distributions so
common?
Normal distributions are “bell” shaped, so most of the
data is close to the center (the mean), and only rarely
are there numbers far from the mean.
We expect this distribution whenever there are many
conflicting forces that tend to cancel each other out.
This is the case in lots of situations.
Why are normal distributions so
common?
Example
What forces can affect stock returns during a year?
Usually lots of little things: new products, bad publicity,
changing government regulations, even the weather.
If we combine the returns of 500 companies, then all of
these small factors tend to cancel out. This means the
S&P 500 return will usually be close to the average.
Why are normal distributions so
common?
Example
What forces determine a person’s height?
Lots of reasons. There are genetic factors, but things
like childhood nutrition or illness also play a role.
With lots of small forces that tend to conflict, it’s no
surprise that most people tend to be close to average
height.
Why are normal distributions so
common?
Isn’t it true that in any data set most of the data will be
close to the mean?
Absolutely not!
Suppose 10 people take a test. Five score 0, and five
score 100. The mean is 50, but nobody is close to that.
Why do people believe that most of the data in a
distribution should be close to the mean? Precisely
because normal distributions are so common.
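The test-score example is easy to check by hand, but a minimal Python sketch makes the point concrete. The variable names are only illustrative; the ten scores are the ones from the example.

```python
# Ten test scores from the example: five 0s and five 100s.
scores = [0] * 5 + [100] * 5

mean = sum(scores) / len(scores)
# Distance from the mean to the nearest actual score.
nearest_gap = min(abs(score - mean) for score in scores)

print(mean)         # 50.0
print(nearest_gap)  # 50.0
```

Every score is 50 points away from the mean, so none of the data is close to it.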
Percentiles
The median and the first and third quartiles are
examples of what are called percentiles.
For any number P between 0 and 100, we can find the
Pth percentile of a distribution.
By definition, P percent of the data is less than the Pth
percentile.
For example, Q1 could also be called the 25th
percentile: 25% of the data is less than Q1.
Percentiles
Example
The heights of adult men in the US are normally
distributed with mean 69.3 in. (5′ 9") and standard
deviation 2.9 in.
We will see that the 90th percentile of this distribution
is 73.1 in. (6′ 1").
This tells us that a man who is 6′ 1" is taller than 90%
of the men in the US.
Percentiles
Example
On the other hand, a man who is 66 in. tall (5′ 6") has
a height equal to about the 14th percentile.
So 14% of the adult men in the US are less than 5′ 6"
tall.
Percentiles
Percentiles tell us what percent of the data is below a
number. What if we want to know what percent is
above that number?
Example
If 14% of adult men in the US are shorter than 5′ 6",
what percent are taller than 5′ 6" ?
The percent of men shorter than 5′ 6" plus the percent
of men taller than 5′ 6" adds up to 100%.
Percentiles
Example
Why? Think of it this way: any man is either taller than
5′ 6" or he is not. (With an accurate enough ruler we
can assume no one is exactly 5′ 6".)
(% of men shorter than 5′ 6" ) + (% of men taller than 5′ 6" ) = 100%
14% + (% of men taller than 5′ 6" ) = 100%
% of men taller than 5′ 6" = 100% − 14%
% of men taller than 5′ 6" = 86%
Standard scores
For a data value in a normally distributed data set, we
can find its percentile by first finding its standard
score.
Let’s call the data value $x_i$. With our usual notation $\bar{x}$
for the mean and $s$ for the standard deviation, the
standard score (also called the z-score) is:
$z = \frac{x_i - \bar{x}}{s}$
Standard scores
Example
As I said earlier, the heights of adult men in the US are
normally distributed with mean 69.3 in. (5′ 9") and
standard deviation 2.9 in.
What is the standard score for a man who is 73.1 in.
(6′ 1") tall?
$z = \frac{73.1 - 69.3}{2.9} \approx 1.31$
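The same calculation can be written as a short Python sketch. The function name `standard_score` is just a label chosen for this illustration; the numbers are the ones from the height example.

```python
def standard_score(value, mean, sd):
    """Return the standard score (z-score) of a data value."""
    return (value - mean) / sd

# Heights of adult men in the US: mean 69.3 in., standard deviation 2.9 in.
z = standard_score(73.1, 69.3, 2.9)
print(round(z, 2))  # 1.31
```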
Finding percentiles
Example
The standard score for a man who is 73.1 in (6′ 1") tall
is 𝑧 = 1.31.
Using the standard score we can consult a table to tell
us the percentile.
The table uses standard scores rounded to the nearest
tenth, so we need to look up the percentile
corresponding to 𝑧 = 1.3
Table of percentiles

Standard            Standard            Standard
Score   Percentile  Score   Percentile  Score   Percentile
−3.4     0.03       −1.1    13.57       1.2     88.49
−3.3     0.05       −1.0    15.87       1.3     90.32
−3.2     0.07       −0.9    18.41       1.4     91.92
−3.1     0.10       −0.8    21.19       1.5     93.32
−3.0     0.13       −0.7    24.20       1.6     94.52
−2.9     0.19       −0.6    27.42       1.7     95.54
−2.8     0.26       −0.5    30.85       1.8     96.41
−2.7     0.35       −0.4    34.46       1.9     97.13
−2.6     0.47       −0.3    38.21       2.0     97.73
−2.5     0.62       −0.2    42.07       2.1     98.21
−2.4     0.82       −0.1    46.02       2.2     98.61
−2.3     1.07        0.0    50.00       2.3     98.93
−2.2     1.39        0.1    53.98       2.4     99.18
−2.1     1.79        0.2    57.93       2.5     99.38
−2.0     2.27        0.3    61.79       2.6     99.53
−1.9     2.87        0.4    65.54       2.7     99.65
−1.8     3.59        0.5    69.15       2.8     99.74
−1.7     4.46        0.6    72.58       2.9     99.81
−1.6     5.48        0.7    75.80       3.0     99.87
−1.5     6.68        0.8    78.81       3.1     99.90
−1.4     8.08        0.9    81.59       3.2     99.93
−1.3     9.68        1.0    84.13       3.3     99.95
−1.2    11.51        1.1    86.43       3.4     99.97
Finding percentiles
Example
The percentile is 90, so that means a height of 73.1 in
(6′ 1") tall is the 90th percentile of all heights of
American men.
In other words, a man who is 6′ 1" is taller than 90% of
the men in the US.
Note
Use the same table for standard scores from any data
set.
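If no table is at hand, the percentile for a standard score can be computed directly. The sketch below assumes the table lists values of the standard normal cumulative distribution function, which Python can evaluate with `math.erf`. The function name `percentile_from_z` is made up for this illustration.

```python
import math

def percentile_from_z(z):
    """Percent of a normal distribution lying below standard score z."""
    # Standard normal CDF written in terms of the error function.
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Standard score 1.3, from the height example.
print(round(percentile_from_z(1.3), 2))  # 90.32
```

Because the input is a standard score, the same function works for any normally distributed data set, just as the same table does.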
Finding percentiles
Example
Scores on the SAT math exam are normally distributed
with a mean of 500 points and a standard deviation of
100 points. What percent of test takers score below
450? What percent are below 600?
We need to find the percentiles. Start with the
standard scores:
$z = \frac{450 - 500}{100} = -0.5$
$z = \frac{600 - 500}{100} = 1$
Now find the percentiles from the table.
Finding percentiles
Example
The percentile corresponding to a standard score of
−0.5 is 30.85, and the percentile corresponding to a
standard score of 1 is 84.13.
This means that (roughly) 31% of test takers score
below 450 on the SAT math exam, and 84% are below
600.
What percent of test takers score between 450 and
600?
Finding percentiles
Example
We can find the percent who score between 450 and 600 using
the fact that 31% of test takers are below 450 and 84% are
below 600.
Take the number of people who score below 600 and subtract
the number who scored below 450. The result is the number
who scored between 450 and 600.
The same is true for percentages:
(% below 600) − (% below 450) = (% between 450 and 600)
So the percent who score between 450 and 600 is:
84% − 31% = 53%
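Here is a minimal Python sketch of the same subtraction. It assumes the percentiles come from the standard normal cumulative distribution function (computed with `math.erf`) and rounds each standard score to the nearest tenth, as the table does. The helper name `percent_below` is hypothetical.

```python
import math

def percent_below(value, mean, sd):
    """Percent of a normal distribution lying below `value`."""
    z = round((value - mean) / sd, 1)  # the table uses z to the nearest tenth
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# SAT math: mean 500, standard deviation 100.
below_450 = percent_below(450, 500, 100)
below_600 = percent_below(600, 500, 100)
between = below_600 - below_450

print(round(below_450, 2), round(below_600, 2), round(between, 2))
# 30.85 84.13 53.28
```

Working with unrounded percentiles gives 53.28%, which agrees with the 53% found from the rounded values.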
Comparing percentiles
Using the mean and standard deviation of a normal
distribution, we can find the percentile of any data
value from that distribution.
Percentiles are also very useful for comparing data
values from different distributions.
Is a 600 on the SAT math test better or worse than a
26 on the ACT math test?
We can’t compare the numbers—the SAT is out of 800
and the ACT is out of 36.
Comparing percentiles
Instead of comparing the numbers, we compare these
test scores using percentiles.
Scores on the SAT are normally distributed with a
mean of 500 and standard deviation of 100. Scores on
the ACT are normally distributed with a mean of 18
and standard deviation of 6.
Find the standard scores:
SAT: $z = \frac{600 - 500}{100} = 1$
ACT: $z = \frac{26 - 18}{6} = \frac{8}{6} \approx 1.3$
Comparing percentiles
From the table, a standard score of 1 is the 84th
percentile, while a standard score of 1.3 is the 90th
percentile. So a 26 on the ACT is better than a 600 on
the SAT.
In what sense is it a better score?
Percentiles describe these scores relative to all the
other test takers.
Scoring higher than 90% of the people who took a test
is better than scoring higher than 84%.
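The comparison can be sketched in Python as well. As before, this assumes the table values come from the standard normal cumulative distribution function and rounds standard scores to the nearest tenth. The helper name `percentile` is only illustrative.

```python
import math

def percentile(value, mean, sd):
    """Percentile of `value` in a normal distribution with this mean and sd."""
    z = round((value - mean) / sd, 1)  # match the table's one-decimal z
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

sat = percentile(600, 500, 100)  # SAT math: mean 500, sd 100
act = percentile(26, 18, 6)      # ACT math: mean 18, sd 6

print(round(sat, 2), round(act, 2))  # 84.13 90.32
print("ACT score is better" if act > sat else "SAT score is better")
# ACT score is better
```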
Another normal distribution
One of the most important examples of a normal
distribution is the sampling distribution of statistics.
In a sample survey, we choose a sample and compute
a statistic.
A different sample would have given a different
statistic.
If we consider every possible sample, we would have a
distribution of statistics (which numbers occur, and
how often they occur).
Another normal distribution
It turns out that if our sample size is large enough, the
distribution of statistics will be normal.
What is a large enough sample?
A general rule of thumb is a sample size of 30.
Another normal distribution
Example
In 2012 Barack Obama won the presidential election
with 51.1% of the vote.
In the run-up to the election, there were many polls of
likely voters.
These polls were producing statistics to estimate a
parameter: the proportion of all voters who were
going to vote for Obama. Now, we know the value of
this parameter to be 51.1%.
Another normal distribution
Example
Suppose a polling company sampled 100 voters
before the election. It turns out that the distribution
of statistics for a sample size of 100 is normal with
mean 51.1% and standard deviation 5%.
(We’ll see the formulas for these later in the course.)
As decimals these are 0.511 and 0.05.
Another normal distribution
Example
In what percent of samples of 100 would more than
50% of the sample support Obama?
This is a question about percentiles.
As a decimal 50% is 0.5
Find the standard score:
$z = \frac{0.5 - 0.511}{0.05} = -0.22 \approx -0.2$
Another normal distribution
Example
From the table, the corresponding percentile is 42.07.
This means that in 42.07% of samples of 100 voters,
the proportion who supported Obama would have
been less than 50%.
To answer our question we must subtract the
percentile from 100:
100 − 42.07 = 57.93
Another normal distribution
Example
So, in about 58% of the possible samples of 100
voters, we would have seen more than 50% of the
sample supporting Obama.
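The whole calculation fits in a few lines of Python. This is a sketch under the same assumptions as the slides: a normal sampling distribution with mean 0.511 and standard deviation 0.05, and a standard score rounded to the nearest tenth to match the table. The variable names are illustrative.

```python
import math

mean, sd = 0.511, 0.05   # sampling distribution for samples of 100
cutoff = 0.5             # 50% of the sample

z = round((cutoff - mean) / sd, 1)  # -0.22 rounds to -0.2
percent_below = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
percent_above = 100 - percent_below

print(z)                        # -0.2
print(round(percent_below, 2))  # 42.07
print(round(percent_above, 2))  # 57.93
```

Skipping the rounding step changes the answer slightly, because the table only lists standard scores to one decimal place.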
Another normal distribution
This graphic illustrates the idea of a sampling
distribution of statistics (denoted $\hat{p}$). Imagine taking
many samples of size 100 from a population with
parameter 0.511.
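We cannot really take every possible sample, but we can imitate the idea with a simulation. The sketch below is an illustration, not part of the lecture's method: the number of simulated polls (10,000) and the fixed random seed are assumptions chosen so the run is quick and repeatable.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed (assumption) so results are repeatable
p = 0.511                # population parameter from the example
n = 100                  # sample size
num_samples = 10_000     # assumption: how many polls we simulate

def sample_proportion():
    """Poll n simulated voters and return the proportion of supporters."""
    supporters = sum(rng.random() < p for _ in range(n))
    return supporters / n

p_hats = [sample_proportion() for _ in range(num_samples)]

print(round(statistics.mean(p_hats), 3))   # should be close to 0.511
print(round(statistics.stdev(p_hats), 3))  # should be close to 0.05
```

The mean and standard deviation of the simulated statistics should land near 0.511 and 0.05, the values quoted for samples of 100.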