Boxplots, Standard Deviation

Enter these data into
your calculator!!!
A researcher measured 30 newly hatched chicks and recorded
their weights in grams as shown below.
79.5 87.5 88.5 89.2 91.6 84.5 82.1 82.3 85.7 89.8
84.0 84.8 88.2 88.2 82.9 89.8 89.2 94.1 88.0 91.1
91.8 87.0 87.7 88.0 85.4 94.4 91.3 86.4 85.7 86.0
Boxplots,
Standard Deviation
Section 9.7b 
Last day of notes
for PreCalculus!!
Boxplots
Boxplot (Box-and-Whisker Plot) – a graphical
representation of the five-number summary of a data set.
Consists of a central rectangle (box) that extends from the
first quartile to the third quartile, with a vertical segment
marking the median. Line segments (whiskers) extend from the
ends of the box to the minimum and maximum values.
Practice with Boxplots
Let’s create a boxplot for the five-number summary for male life
expectancies in South American nations from last class:
59.0,64.1,68.75,71.65,72.6
59.0
Min
55
64.1
68.75 71.65
72.6
Q1
60
Med
65
Q3
70
Are these data skewed in any way???
Max
75
80
 Skewed Left!!!
Practice with Boxplots
Draw boxplots for the male and female data for life expectancies
in South American nations and describe the information displayed.
59.0,64.1,68.75,71.65,72.6
Females: 66.2,70.25,74.5,77.7,79.4
Males:
Males:
Females:
55
60
65
70
75
80
The middle half of female life expectancies are all greater
than the median of male life expectancies. The median
life expectancy for women is greater than the maximum
for the men.
Practice with Boxplots
Create a boxplot for Roger Maris’s annual home run totals:
14, 28,16,39,61,33, 23, 26,8,13,9,5
Five-Number Summary: 5,11,19.5,30.5,61
[Boxplot above an axis from 0 to 70: Min = 5, Q1 = 11, Med = 19.5, Q3 = 30.5, Max = 61]
Is Maris’s 61 home run total an outlier??? How can we tell???
Practice with Boxplots
Our “Rule of Thumb”:
A number in a data set can be considered an outlier if it is more
than 1.5 x IQR below the first quartile or above the third quartile.
Five-Number Summary
for Maris’s home run totals:
5,11,19.5,30.5,61
IQR = 30.5 – 11 = 19.5
Q3 + 1.5 x IQR = 30.5 + (1.5)(19.5) = 59.75
Since 61 > 59.75, our new rule identifies it as an outlier.
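The same rule of thumb can be checked in Python. This sketch takes Q1 and Q3 from the five-number summary on the slide; the helper name `outlier_fences` is an illustrative choice.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs that lie k x IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]
q1, q3 = 11, 30.5                      # quartiles from the slide
low, high = outlier_fences(q1, q3)

# Any value outside the fences counts as an outlier under the rule of thumb.
outliers = [x for x in maris if x < low or x > high]
print(low, high, outliers)
```

The lower fence is negative here, so no home run total can be an outlier on the low end.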
Practice with Boxplots
When dealing with outliers, sometimes a modified boxplot is
used, showing the outliers as isolated points…
Two boxplots for Roger Maris’s annual home run totals
[Figure: regular boxplot and modified boxplot, drawn above a shared axis from 0 to 70]
Variance and
Standard Deviation
These are measures of variability that are better indicators than
the interquartile range…
The standard deviation of the numbers $x_1, x_2, \ldots, x_n$ is
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
where $\bar{x}$ denotes the mean. The variance is $\sigma^2$, the square
of the standard deviation.
Note: The standard deviation is generally not very resistant…
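To see the formula and the note about resistance in action, here is a small sketch using Maris's home run totals from the boxplot slides. The function name `population_sd` is illustrative; Python's `statistics.pstdev` computes the same quantity and is used as a cross-check.

```python
from math import sqrt
from statistics import pstdev

def population_sd(values):
    """Standard deviation with divisor n, following the formula above."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]

sigma_all = population_sd(maris)
variance_all = sigma_all ** 2          # the variance is the square of sigma

# Dropping the single extreme value (61) changes sigma noticeably,
# which is what "not very resistant" means.
sigma_without_61 = population_sd([x for x in maris if x != 61])

print(f"sigma with 61:    {sigma_all:.2f}")
print(f"sigma without 61: {sigma_without_61:.2f}")
print(f"library check:    {pstdev(maris):.2f}")
```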
Variance and
Standard Deviation
Most calculators actually give two standard deviations, the
other denoted as s:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
The difference is that $\sigma$ gives the true parameter only when
the data come from the whole population. If the data
come from a sample, then the s formula actually gives a better
estimate of the parameter…
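A short sketch of the two calculator values side by side, again on Maris's home run totals. It relies on Python's `statistics.pstdev` (divisor n) and `statistics.stdev` (divisor n - 1); treating the totals as a sample is only an assumption made for the comparison.

```python
from math import sqrt
from statistics import pstdev, stdev

maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]
n = len(maris)

sigma = pstdev(maris)   # divides the sum of squared deviations by n
s = stdev(maris)        # divides the same sum by n - 1

# The two values differ only by the factor sqrt(n / (n - 1)),
# so s is always a little larger than sigma.
rescaled = sigma * sqrt(n / (n - 1))

print(f"sigma = {sigma:.3f}, s = {s:.3f}, rescaled sigma = {rescaled:.3f}")
```

The gap between the two shrinks as n grows, so the choice matters most for small samples.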
Variance and
Standard Deviation
A researcher measured 30 newly hatched chicks and recorded
their weights in grams as shown below.
79.5 87.5 88.5 89.2 91.6 84.5 82.1 82.3 85.7 89.8
84.0 84.8 88.2 88.2 82.9 89.8 89.2 94.1 88.0 91.1
91.8 87.0 87.7 88.0 85.4 94.4 91.3 86.4 85.7 86.0
Based on the sample, estimate the mean and standard deviation
for the weights of newly hatched chicks. Are these measures
useful in this case, or should we use the five-number summary?
First, enter the data into your calculator → L1
Then, choose STAT → CALC → 1-Var Stats → ENTER
Variance and
Standard Deviation
A researcher measured 30 newly hatched chicks and recorded
their weights in grams as shown below.
Mean = $\bar{x}$ = 87.49 grams
Standard Deviation = $S_x$ = 3.510 grams
Because these data have no real outliers or skewness,
the mean and standard deviation are appropriate measures
(there is no need to include a five-number summary).
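For readers without a graphing calculator, the same two statistics can be obtained in Python. This sketch uses the 30 chick weights from the slides and `statistics.stdev`, which uses the n - 1 divisor like the calculator's Sx.

```python
from statistics import mean, stdev

# Weights (grams) of the 30 newly hatched chicks from the slides
chicks = [
    79.5, 87.5, 88.5, 89.2, 91.6, 84.5, 82.1, 82.3, 85.7, 89.8,
    84.0, 84.8, 88.2, 88.2, 82.9, 89.8, 89.2, 94.1, 88.0, 91.1,
    91.8, 87.0, 87.7, 88.0, 85.4, 94.4, 91.3, 86.4, 85.7, 86.0,
]

x_bar = mean(chicks)   # sample mean
s_x = stdev(chicks)    # sample standard deviation (divisor n - 1)

print(f"n = {len(chicks)}, mean = {x_bar:.2f} g, Sx = {s_x:.3f} g")
```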
Our last new info in
Chapter 9…
These Distributions are
Normal…
First, use your calculator to create a histogram of the data on
weight of newly hatched chicks from last class (use Xscl = 2
and window [ 75, 98 ] by [ 0, 10 ] ):
79.5 87.5 88.5 89.2 91.6 84.5 82.1 82.3 85.7 89.8
84.0 84.8 88.2 88.2 82.9 89.8 89.2 94.1 88.0 91.1
91.8 87.0 87.7 88.0 85.4 94.4 91.3 86.4 85.7 86.0
Frequency Table:
78.0-79.9   1
80.0-81.9   0
82.0-83.9   3
84.0-85.9   6
86.0-87.9   5
88.0-89.9   9
90.0-91.9   4
92.0-93.9   0
94.0-95.9   2
[Histogram of the chick weights]
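The frequency table can be rebuilt, and a rough text histogram drawn, with the sketch below. It assumes classes of width 2 starting at 78.0, matching the table on this slide rather than the calculator window.

```python
chicks = [
    79.5, 87.5, 88.5, 89.2, 91.6, 84.5, 82.1, 82.3, 85.7, 89.8,
    84.0, 84.8, 88.2, 88.2, 82.9, 89.8, 89.2, 94.1, 88.0, 91.1,
    91.8, 87.0, 87.7, 88.0, 85.4, 94.4, 91.3, 86.4, 85.7, 86.0,
]

start, width, bins = 78.0, 2.0, 9   # classes 78.0-79.9, 80.0-81.9, ..., 94.0-95.9
counts = [0] * bins
for w in chicks:
    counts[int((w - start) // width)] += 1   # index of the class containing w

# Print one row per class, with a bar of asterisks as a sideways histogram.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo:.1f}-{lo + width - 0.1:.1f}  {c:>2}  {'*' * c}")
```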
What do you notice about this histogram?
The distribution is roughly
symmetric, with no strong outliers
or skewness. Most of the data
cluster around a central point.
→ The distribution is approximately NORMAL!!!
Normal Distributions
In math-land, “normal” is actually a technical term…
Graph the given function in the window [–3, 3] by [0, 1]:
$$f(x) = e^{-x^2/2}$$
This curve, called the Gaussian curve or normal curve, is a
precise mathematical model for normal behavior.
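If no grapher is handy, a quick table of values gives the shape of the curve. This is a minimal sketch that evaluates the function over the same window, [–3, 3].

```python
from math import exp

def f(x):
    """The normal (Gaussian) curve from the slide: e^(-x^2 / 2)."""
    return exp(-x ** 2 / 2)

# Tabulate the curve at whole-number x-values in the window [-3, 3].
for x in range(-3, 4):
    print(f"{x:>2}  {f(x):.4f}")
```

The values peak at 1 when x = 0, mirror each other on either side, and fall off quickly toward 0, which is why the window [0, 1] fits the whole curve.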
A great many naturally-occurring phenomena yield a
normal distribution when displayed as a histogram.
Examples???
The 68-95-99.7 Rule
If the data for a population are normally distributed with mean $\mu$
and standard deviation $\sigma$, then
• Approximately 68% of the data lie between $\mu - 1\sigma$ and $\mu + 1\sigma$
• Approximately 95% of the data lie between $\mu - 2\sigma$ and $\mu + 2\sigma$
• Approximately 99.7% of the data lie between $\mu - 3\sigma$ and $\mu + 3\sigma$
The 68-95-99.7 Rule
About 68% of the data in any normal distribution
lie within 1 standard deviation of the mean…
The 68-95-99.7 Rule
About 95% of the data in any normal distribution
lie within 2 standard deviations of the mean…
The 68-95-99.7 Rule
About 99.7% of the data in any normal distribution
lie within 3 standard deviations of the mean…
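The three percentages are rounded values, and they can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (version 3.8 or later).

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Share of a normal population lying within k standard deviations of the mean
shares = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]

for k, share in zip((1, 2, 3), shares):
    print(f"within {k} SD: {100 * share:.2f}%")
```

Because every normal distribution is a shifted and rescaled copy of this one, the same shares hold for any mean and standard deviation.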
Returning to the data for newly hatched chicks:
79.5 87.5 88.5 89.2 91.6 84.5 82.1 82.3 85.7 89.8
84.0 84.8 88.2 88.2 82.9 89.8 89.2 94.1 88.0 91.1
91.8 87.0 87.7 88.0 85.4 94.4 91.3 86.4 85.7 86.0
What are the mean and standard deviation???
Mean = $\bar{x}$ = 87.49 grams
Standard Deviation = $S_x$ = 3.510 grams
Based on these data, would a chick weighing 95 grams be in the
top 2.5% of all newly hatched chicks?
We assume that the weights of newly hatched chicks are
normally distributed in the whole population. Since we do
not know the mean and standard deviation for the whole
population (the parameters $\mu$ and $\sigma$), we use $\bar{x}$ and $S_x$
as estimates.

Because about 95% of the data lie within 2 standard
deviations of the mean, about 2.5% of the data must be beyond this limit on
either end. To be in the top 2.5%, a chick will have to weigh
at least 2 standard deviations more than the mean:
$\bar{x} + 2S_x = 87.49 + 2(3.51) = 94.51$ grams
Since 95 > 94.51, a 95-gram chick is indeed in the top 2.5%!!!
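The cutoff calculation can be sketched in Python, along with a cross-check against the normal model itself. The sketch assumes, as the slide does, that the sample mean and Sx stand in for the population parameters; `statistics.NormalDist` supplies the model.

```python
from statistics import NormalDist

x_bar, s_x = 87.49, 3.51          # sample estimates from the slide

# 68-95-99.7 rule: the top 2.5% starts about 2 standard deviations above the mean.
cutoff = x_bar + 2 * s_x
print(f"cutoff = {cutoff:.2f} g, 95 g is above it: {95 > cutoff}")

# Cross-check: share of the modeled population heavier than 95 g.
top_share = 1 - NormalDist(x_bar, s_x).cdf(95)
print(f"modeled share above 95 g: {100 * top_share:.1f}%")
```

The second figure comes from the exact normal curve rather than the rounded rule, so it need not match the rule's 2.5% exactly.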
Proctor measures the mean and standard deviation of his Ch. 9
test to be 44.3 points and 3.7 points, respectively. Assuming
the test scores fall on a normal distribution, answer the following:
1. Approximately 68% of all students earned scores between
what two numbers?
Between 40.6 and 48 points
2. Yolanda earned a 33.2 on the test, meaning that she scored
in the bottom _______ percent of students taking the test.
Bottom 0.15 percent
3. Pip earned a 51.7 on the test, meaning that she scored better
than what percentage of students taking the test?
Better than 97.5 percent of students
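The three answers follow from converting each score to a number of standard deviations from the mean and then applying the 68-95-99.7 rule. A minimal sketch of that arithmetic, using only the values given in the problem (the variable names are illustrative):

```python
mean, sd = 44.3, 3.7   # test mean and standard deviation from the problem

# 1. About 68% of scores lie within 1 standard deviation of the mean.
low, high = mean - sd, mean + sd

# 2 and 3. How many standard deviations from the mean is each score?
z_yolanda = (33.2 - mean) / sd    # about -3
z_pip = (51.7 - mean) / sd        # about +2

# 99.7% lie within 3 SDs, so half of the remaining 0.3% lies below -3 SDs.
bottom_yolanda = (100 - 99.7) / 2
# 95% lie within 2 SDs, so that 95% plus the lower 2.5% tail lies below +2 SDs.
below_pip = 95 + (100 - 95) / 2

print(f"1. between {low:.1f} and {high:.1f} points")
print(f"2. z = {z_yolanda:.1f}, bottom {bottom_yolanda:.2f} percent")
print(f"3. z = {z_pip:.1f}, better than {below_pip:.1f} percent")
```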