Curves and Histograms
The Normal Distribution
The 68–95–99.7 Rule
Two Variable Statistics
Mathematics 1000, Winter 2008
Lecture 4
Sheng Zhang
Department of Mathematics
Wayne State University
January 16, 2008
S. Zhang
Lecture 4
Announcement
Monday is Martin Luther King Day
NO CLASS
Today’s Topics
1. Curves and Histograms
2. The Normal Distribution
3. The 68–95–99.7 Rule
4. Two Variable Statistics
   Scatterplots
From histograms to curves
A histogram consists of several rectangles, each of the same
width, all resting on a common baseline. The height of each
rectangle represents the number of data points in a given range.
For large data sets, we can have many rectangles, each quite
narrow. If they are narrow enough, the histogram smooths out,
and its top resembles a curve rather than a jagged line.
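The effect of narrowing the rectangles can be tried out with a short Python sketch. The data set here is hypothetical (10,000 made-up finishing times in minutes, generated at random), and `histogram_counts` is an illustrative helper, not something from the lecture.

```python
import random
from collections import Counter

def histogram_counts(data, bin_width):
    """Count how many data points fall in each bin [k*w, (k+1)*w)."""
    counts = Counter(int(x // bin_width) for x in data)
    return {k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical data set: 10,000 made-up finishing times in minutes.
random.seed(0)
times = [random.gauss(270, 45) for _ in range(10_000)]

# Same data, three bin widths: narrower bins give more, shorter rectangles.
for width in (60, 30, 10):
    counts = histogram_counts(times, width)
    print(f"bin width {width:2d} min: {len(counts)} rectangles, "
          f"tallest has {max(counts.values())} points")
```

Every data point lands in exactly one bin, so the counts always add up to the size of the data set; only the number and height of the rectangles change.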
Figure: Number of finishers per hour, New York Marathon, 2005. Horizontal axis: finishing time in one-hour bins, from 2 – 2:59 to 8 – 8:59. Vertical axis: number of finishers, with ticks at 5000, 10000 and 15000.
Figure: Number of finishers per half-hour, New York Marathon, 2005. Horizontal axis: finishing time in half-hour bins, from 2 to 8 hours. Vertical axis: number of finishers, with ticks at 2000, 4000, 6000 and 8000.
Figure: Number of finishers per ten minutes, New York Marathon, 2005. Horizontal axis: finishing time in ten-minute bins, from 2 – 2:09 to 8 – 8:09. Vertical axis: number of finishers, with ticks from 500 to 3000.
Figure: New York Marathon results, five-minute by five-minute histogram.
Figure: Per capita GDP by country in 115 poor countries. Horizontal axis: per capita GDP in dollars. Vertical axis: number of countries, from 0 to 90.
Figure: Per capita GDP by country in 115 poor countries. Horizontal axis: per capita GDP in thousands of dollars. Vertical axis: number of countries, from 0 to 60.
Figure: Per capita GDP by country in 115 poor countries. Horizontal axis: per capita GDP in hundreds of dollars, with ticks from 5 to 45. Vertical axis: number of countries, from 0 to 40.
Figure: Per capita GDP by country in 115 poor countries. Horizontal axis: per capita GDP in hundreds of dollars, with ticks from 4 to 48. Vertical axis: number of countries, from 0 to 15.
Interpreting curves
When a distribution is given by a curve, the rectangles have
vanished. The area below the curve over a given horizontal range
represents the proportion of the data that lies within that range.
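As a rough illustration of reading a proportion off a curve, the Python sketch below approximates areas with many thin trapezoids, much like very narrow histogram rectangles. The triangular `curve` is a hypothetical example chosen only because its areas are easy to check by hand; it is not a curve from the lecture.

```python
def curve(x):
    """Hypothetical triangular curve on [0, 2]; total area under it is 1."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with thin trapezoids."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

total_area = area_under(curve, 0, 2)      # all of the data
proportion = area_under(curve, 0.5, 1.5)  # data between 0.5 and 1.5
print(round(total_area, 3), round(proportion, 3))
```

For this curve the whole area is 1 and the area between 0.5 and 1.5 is 0.75, so three quarters of the data would lie in that range.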
Special types of curves
With a large number of data points, distributions tend to
resemble curves.
There are many possible curves that can serve as “models” for
how the data should lie.
We will only consider one of them.
Example of a normal distribution
Properties of the normal distribution
symmetric
mean and median are the same
quartiles lie about 2/3 of a standard deviation from the mean
satisfies the 68–95–99.7 rule
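These properties can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module with mean 0 and standard deviation 1; that particular choice of mean and standard deviation is an assumption made for convenience.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

median = z.inv_cdf(0.50)   # the point with half the data below it
q1 = z.inv_cdf(0.25)       # first quartile
q3 = z.inv_cdf(0.75)       # third quartile

print(f"mean = {z.mean}, median = {median:.4f}")
print(f"quartiles at {q1:.4f} and {q3:.4f} standard deviations")

# Symmetry: the curve has the same height at equal distances from the mean.
print(abs(z.pdf(-1.3) - z.pdf(1.3)) < 1e-12)
```

The quartiles come out near 0.67 standard deviations on either side of the mean, which is the "about 2/3" in the list above.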
The 68 – 95 – 99.7 Rule
In a normal distribution with mean x̄ and standard deviation s:
68% of the observations lie between x̄ − s and x̄ + s
95% of the observations lie between x̄ − 2s and x̄ + 2s
99.7% of the observations lie between x̄ − 3s and x̄ + 3s
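The three percentages in the rule are rounded values. A minimal Python sketch, using the standard library's `NormalDist` and an illustrative helper named `proportion_within`, shows where they come from:

```python
from statistics import NormalDist

def proportion_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.1%}")
```

This prints 68.3%, 95.4% and 99.7%. The result does not depend on the mean or standard deviation chosen, which is why the rule applies to every normal distribution.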
Application
ACT scores approximately follow a normal distribution with
mean 20.8 and standard deviation 4.8.
This means that 68% of such scores lie between 16 ( = 20.8 − 4.8) and 25.6 ( = 20.8 + 4.8).
So 32% lie outside that range.
Since the distribution is symmetric, roughly half of the 32% will
lie below 16. That is, 16% of scores will lie below 16.
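The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: it takes the mean and standard deviation from the slide and compares the rule-of-thumb answer with the area under an exactly normal curve, which is an idealization of real ACT scores.

```python
from statistics import NormalDist

act = NormalDist(mu=20.8, sigma=4.8)   # values from the slide

low, high = act.mean - act.stdev, act.mean + act.stdev
rule_below = (1 - 0.68) / 2            # half of the 32% outside the range
exact_below = act.cdf(low)             # area under the normal curve below 16

print(f"one standard deviation range: {low:.1f} to {high:.1f}")
print(f"below {low:.1f}: rule says {rule_below:.0%}, "
      f"normal curve gives {exact_below:.1%}")
```

The two answers agree to within a fraction of a percentage point, so the 16% from the rule is a good working estimate.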
Exercise
If adult women’s heights are distributed normally with a mean of
64.5 inches and a standard deviation of 2.5 inches, what
proportion of women will be under 59.5 inches tall?
Answer: 59.5 inches is 2 standard deviations below the mean
( 64.5 − 2 × 2.5 = 59.5 ). So 5% ( = 100% − 95%) of women will have
heights more than 2 standard deviations from the mean. Half of those
will be shorter and half taller, so 2.5% will be under 59.5 inches tall.
WSU application
If there are 10,000 women at this university, how many would
we expect to be under 59.5 inches tall?
Answer: About 250, 2.5% of 10,000.
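The height exercise and this head count can be worked through together in Python. The sketch below takes the mean, standard deviation and cutoff from the exercise, and assumes heights are exactly normal so that `NormalDist` from the standard library can give a comparison figure.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)   # inches, values from the exercise
cutoff = 59.5
n_women = 10_000

z = (cutoff - heights.mean) / heights.stdev   # standard deviations from the mean
rule_share = (1 - 0.95) / 2                   # half of the 5% beyond 2 standard deviations
exact_share = heights.cdf(cutoff)             # area under the curve below the cutoff

print(f"cutoff is {z:+.1f} standard deviations from the mean")
print(f"rule: {rule_share:.1%} -> about {rule_share * n_women:.0f} women")
print(f"normal curve: {exact_share:.2%} -> about {exact_share * n_women:.0f} women")
```

The rule gives 2.5%, or about 250 women. The exact area under the curve is slightly smaller because 95% is itself a rounded figure.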
Calculators
The Chapter 6 material will require the calculator more
intensively than the material we covered up to now.
If you need help with using the calculator, then you should be
sure to get the one that we support.
The quiz instructors and the tutors in the Mathematics
Resource Center may well be able to help you with calculators.
Scatterplots
Some applications of two-variable statistics
Do students who spend more time studying get better
grades?
Do people who smoke tend to die earlier?
What is the relationship between the amount of carbon
dioxide in the atmosphere and the average global
temperature?
Two variable statistics
One variable statistics:
Study shape, center, spread, look for outliers.
Step 1: Draw a picture.
Two variable statistics:
Study patterns, relationship (correlation and regression),
look for outliers.
Step 1: Draw a picture.
Overview of next four lectures
Scatterplots (today)
Regression lines
Correlation and regression
Interpretation
Example 1 from the text
Student  Beers  BAC      Student  Beers  BAC
1        5      0.10     9        3      0.02
2        2      0.03     10       5      0.05
3        9      0.19     11       4      0.07
4        8      0.12     12       6      0.10
5        3      0.04     13       5      0.085
6        7      0.095    14       7      0.09
7        3      0.07     15       1      0.01
8        5      0.06     16       4      0.05
We plot one variable on the horizontal axis, another on the
vertical axis.
The variable on the horizontal axis is called the explanatory
variable.
The variable on the vertical axis is called the response
variable.
Key question: How much does the explanatory variable explain
the response variable?
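To see how the Example 1 data turn into a scatterplot, here is a small Python sketch that draws a rough character-grid plot with beers (explanatory) across and BAC (response) up the side. The function name `text_scatterplot` and the grid spacing are illustrative choices; a graphing calculator or plotting program would normally do this job.

```python
# (beers, BAC) pairs for students 1-16, from Example 1.
data = [(5, 0.10), (2, 0.03), (9, 0.19), (8, 0.12), (3, 0.04), (7, 0.095),
        (3, 0.07), (5, 0.06), (3, 0.02), (5, 0.05), (4, 0.07), (6, 0.10),
        (5, 0.085), (7, 0.09), (1, 0.01), (4, 0.05)]

def text_scatterplot(points, x_max=9, y_max=0.20, y_step=0.005):
    """Return a rough character-grid scatterplot: x across, y up the side."""
    rows = round(y_max / y_step)
    grid = [[" "] * (x_max + 1) for _ in range(rows + 1)]
    for x, y in points:
        # Explanatory variable picks the column, response picks the row.
        grid[round(y / y_step)][x] = "*"
    lines = []
    for r in range(rows, -1, -1):   # largest response value on the top line
        lines.append(f"{r * y_step:5.3f} |" + " ".join(grid[r]))
    lines.append(" " * 6 + "+" + "-" * (2 * x_max + 1))
    lines.append(" " * 7 + " ".join(str(x) for x in range(x_max + 1)))
    return "\n".join(lines)

print(text_scatterplot(data))
```

Each student contributes one star, placed by number of beers horizontally and blood alcohol content vertically.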
Figure: Beer and Blood Alcohol (scatterplot). Horizontal axis: Beers, from 0 to 10. Vertical axis: Blood alcohol content, from 0.00 to 0.20.
State versus national standards
There is a national test to measure the proficiency of fourth
graders in mathematics.
Most students are not proficient in mathematics according to
this measure.
Each of the states has its own separate proficiency test.
Let’s compare how students did on their two tests.
Smoking and mortality
A much debated example: Does smoking cause health
problems, or do people likely to have bad health just tend to
smoke more?
Smoking and mortality scatterplot
Mortality rate from coronary heart disease and number of
cigarettes smoked per day
Figure: scatterplot for 40-49 year old males. Horizontal axis: Cigarettes per day, from 0 to 50. Vertical axis: Death rate, from 0 to 375.
Source: 1969 Surgeon General’s Report