Chapter 3
Section 3
Measures of variation
Measures of Variation
• Example 3 – 18
Suppose we wish to test two experimental brands of outdoor paint to see
how long each will last before fading. We have six cans of each brand
to test. Let's find the mean for each brand.
Brand A (time in months)   Brand B (time in months)
10                         35
60                         45
50                         30
30                         35
40                         40
20                         25
Measures of Variation
Brand A: (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35 months
Brand B: (35 + 45 + 30 + 35 + 40 + 25)/6 = 210/6 = 35 months
Measures of Variation
• So even though the
means are the same for
both brands, the
spread, or variation, is
quite different. By
comparing the ranges of
each you can see that
Brand B is more
consistent.
[Figure: plot comparing the Brand A and Brand B values]
• Range of Brand A: 60 - 10 = 50 months
• Range of Brand B: 45 - 25 = 20 months
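The mean and range calculations above can be checked with a short Python sketch. The function names `mean` and `value_range` are illustrative choices, not part of the slides.

```python
# Lasting times in months, taken from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name, "mean =", mean(data), "range =", value_range(data))
```

Both brands have the same mean, but the ranges differ, which is the point of the example.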
Measures of Variation
1. Find the mean.
2. Subtract the mean from each data value.
3. Square each result.
4. Find the sum of the squares.
5. Divide the sum by N to get the variance.
6. Take the square root of the variance to get the standard deviation.
Measures of Variation
Variance
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Standard Deviation
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
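As a rough sketch, the six steps and the population formulas can be written out in Python using the paint data from Example 3-18. The function names are assumptions made for illustration.

```python
import math

# Lasting times in months, taken from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def population_variance(values):
    mu = sum(values) / len(values)          # Step 1: find the mean
    deviations = [x - mu for x in values]   # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]  # Step 3: square each result
    total = sum(squares)                    # Step 4: sum of the squares
    return total / len(values)              # Step 5: divide by N


def population_std(values):
    # Step 6: square root of the variance
    return math.sqrt(population_variance(values))


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name, round(population_variance(data), 1), round(population_std(data), 1))
```

For Brand A the sum of squares is 1750, giving a variance of about 291.7 and a standard deviation of about 17.1 months. For Brand B the sum of squares is 250, giving about 41.7 and 6.5 months.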
Variance
• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation ($x_i$)
and the mean ($\bar{x}$ for the sample and $\mu$ for the population).
The variance is the average of the squared differences between the
observations and the mean value.
For the population: $$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
For the sample: $$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation
• The Standard Deviation of a data set is the
square root of the variance.
• The standard deviation is measured in the
same units as the data, making it easy to
interpret.
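Computing a standard deviation
For the population: $$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
For the sample: $$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$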
Shortcut or computational formulas for $s^2$ and $s$
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$
$$s = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}}$$
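A minimal sketch of the shortcut formula, applied to the Brand A data from Example 3-18 and compared against Python's built-in sample variance. The function name `sample_variance_shortcut` is an illustrative assumption.

```python
import math
import statistics

# Brand A lasting times in months, taken from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]


def sample_variance_shortcut(values):
    """Sample variance via [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = sample_variance_shortcut(brand_a)
s = math.sqrt(s2)
print(s2, round(s, 2))
print(statistics.variance(brand_a))  # definition-based sample variance
```

Here the sum of X is 210 and the sum of X squared is 9100, so the variance is (6 · 9100 - 210²)/(6 · 5) = 350, the same value the definition gives.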
Variance and Standard Deviation for Grouped Data
1. Make a table as shown and find the midpoint of each class.
2. Multiply the frequency by the midpoint.
3. Multiply the frequency by the square of the midpoint.
4. Find the sums of columns B, D, and E.
5. Substitute in the formula. (See next slide)
6. Take the square root to get the standard deviation.
A: Class   B: Frequency (f)   C: Midpoint (Xm)   D: f · Xm   E: f · Xm^2
Formula
$$s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)}$$
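The grouped-data procedure can be sketched as follows. The classes are assumed to be those of the systolic blood pressure example later in the chapter, with frequencies 24, 62, 72, 26, 12, 4, which total 200 and match that example's cumulative column. The midpoints are the averages of the class boundaries.

```python
import math

# Assumed data: midpoints of the classes 89.5-104.5, ..., 164.5-179.5
# and their frequencies (total n = 200).
midpoints = [97.0, 112.0, 127.0, 142.0, 157.0, 172.0]
frequencies = [24, 62, 72, 26, 12, 4]


def grouped_variance(freqs, mids):
    """Sample variance for grouped data using the computational formula."""
    n = sum(freqs)                                            # sum of column B
    sum_fx = sum(f * xm for f, xm in zip(freqs, mids))        # sum of column D
    sum_fx2 = sum(f * xm ** 2 for f, xm in zip(freqs, mids))  # sum of column E
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


s2 = grouped_variance(frequencies, midpoints)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))
```

Because every value in a class is treated as if it sat at the midpoint, the result is an approximation of the variance of the raw data.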
Coefficient of Variation
Just divide the standard deviation by the mean and multiply by 100.
Computing the coefficient of variation:
For the population: $$CVar = \frac{\sigma}{\mu} \cdot 100\%$$
For the sample: $$CVar = \frac{s}{\bar{X}} \cdot 100\%$$
Measures of Variance
• The coefficient of variation, denoted CVar, is the standard deviation
divided by the mean. The result is expressed as a percentage.
• The coefficient of variation is used when you want to compare the
standard deviations of two different types of variables, such as
variables measured in different units.
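A small sketch of the calculation, using the population standard deviation. The paint data from Example 3-18 are reused only for illustration (both brands are in months, so the comparison here is not between different units), and the function name is an assumption.

```python
import math

# Lasting times in months, taken from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def coefficient_of_variation(values):
    """Population CVar: (sigma / mu) * 100, expressed as a percentage."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
    return sigma / mu * 100


cv_a = coefficient_of_variation(brand_a)
cv_b = coefficient_of_variation(brand_b)
print(round(cv_a, 1), round(cv_b, 1))
```

Brand A comes out at roughly 48.8% and Brand B at roughly 18.4%, so Brand A varies more relative to its mean.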
Measures of Variance
• Range Rule of Thumb:
– A rough estimate of the standard deviation is $s \approx \frac{\text{range}}{4}$
• The range rule of thumb is only an
approximation and should be used when the
distribution of the data values is unimodal and
roughly symmetric.
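To see how rough the estimate can be, the sketch below compares range/4 with the sample standard deviation for the paint data of Example 3-18. The function name is illustrative, and six values per brand is a very small sample.

```python
import statistics

# Lasting times in months, taken from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def range_rule_estimate(values):
    """Range rule of thumb: s is roughly (max - min) / 4."""
    return (max(values) - min(values)) / 4


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    estimate = range_rule_estimate(data)
    actual = statistics.stdev(data)  # sample standard deviation
    print(name, "estimate =", estimate, "actual =", round(actual, 2))
```

The estimates (12.5 and 5.0) are in the right neighborhood but noticeably below the computed values, which is why the rule is described as only an approximation.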
Chebyshev’s Theorem
• Chebyshev was a Russian mathematician.
• Chebyshev’s theorem:
The proportion of values from a data set that will fall within k standard
deviations of the mean will be at least $1 - \frac{1}{k^2}$, where k is a
number greater than 1 (k is not necessarily an integer).
Chebyshev’s Theorem
• The theorem states that at least three-fourths, or 75%, of the data
values will fall within 2 standard deviations of the mean of the data
set. This result is found by substituting k = 2 in the expression.
• Furthermore, the theorem states that at least eight-ninths, or 88.89%,
of the data will fall within 3 standard deviations of the mean.
Chebyshev’s Theorem
• The theorem can be applied to any
distribution regardless of its shape.
• How to use Chebyshev’s theorem to find out
information.
Chebyshev’s Theorem
Example
1. The mean price of houses in a certain
neighborhood is $50,000, and the standard
deviation is $10,000. Find the price range for
which at least 75% of the houses will sell.
– We do this by adding and subtracting 2 times the
standard deviation.
Chebyshev’s Theorem
We are given that μ = $50,000 and that σ = $10,000.
So,
$50,000 + 2($10,000) = $50,000 + $20,000 = $70,000
and
$50,000 - 2($10,000) = $50,000 - $20,000 = $30,000
Hence at least 75% of the houses will sell for between $30,000 and $70,000.
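The house-price calculation can be sketched as a small helper. The name `chebyshev_interval` and its return layout are assumptions made for this illustration.

```python
def chebyshev_interval(mean, std_dev, k):
    """Return (low, high, min_proportion) for k standard deviations, k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    low = mean - k * std_dev
    high = mean + k * std_dev
    min_proportion = 1 - 1 / k ** 2
    return low, high, min_proportion


low, high, proportion = chebyshev_interval(50_000, 10_000, 2)
print(low, high, proportion)
```

With k = 2 the interval runs from 30000 to 70000 and the guaranteed proportion is 0.75.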
Chebyshev’s Theorem
• A survey of local companies found that the
mean amount of travel allowance for
executives was $0.25 per mile. The standard
deviation was $0.02. Using Chebyshev’s
theorem, find the minimum percentage of the
data that will fall between $0.20 and $0.30.
Step 1 – Subtract the mean from the larger value.
$0.30 - $0.25 = $0.05
Step 2 – Divide the difference by the standard deviation to get k.
$$k = \frac{0.05}{0.02} = 2.5$$
Step 3 – Use Chebyshev's theorem to find the percentage.
$$1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 1 - \frac{1}{6.25} = 1 - 0.16 = 0.84$$
or 84%
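The three steps can be sketched in Python. This assumes, as in the example, that the interval is symmetric about the mean, so the upper limit alone determines k. The function name is illustrative.

```python
def chebyshev_min_percent(mean, std_dev, upper):
    """Minimum percentage of data between (2*mean - upper) and upper."""
    difference = upper - mean    # Step 1: subtract the mean
    k = difference / std_dev     # Step 2: divide by the standard deviation
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return (1 - 1 / k ** 2) * 100  # Step 3: apply the theorem


percent = chebyshev_min_percent(0.25, 0.02, 0.30)
print(round(percent, 2))
```

The result is rounded before printing because the decimal inputs are not stored exactly in floating point.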
The Empirical (Normal) Rule
• Chebyshev's theorem applies to any distribution regardless of shape.
However, when a distribution is bell-shaped (or what is called normal),
the following statements, which make up the empirical rule, are true.
1. Approximately 68% of the data values fall within 1 standard deviation
of the mean.
2. Approximately 95% of the data values fall within 2 standard deviations
of the mean.
3. Approximately 99.7% of the data values fall within 3 standard
deviations of the mean.
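As a check on the three percentages, the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2). The sketch below computes it next to Chebyshev's lower bound for comparison. The function names are illustrative.

```python
import math


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Chebyshev's guaranteed minimum fraction (meaningful for k > 1)."""
    return 1 - 1 / k ** 2


for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4), round(chebyshev_lower_bound(k), 4))
```

The normal values (about 0.6827, 0.9545, and 0.9973) are well above Chebyshev's bounds because the empirical rule assumes a bell shape, while Chebyshev's theorem assumes nothing about the shape.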
Chapter 3
Section 4
Measures of Position
Standard Scores
• “You can’t compare apples and oranges.” But
with Statistics it can be done to some extent.
• Example: comparing a score on a music test with a score on an English
exam, where the two tests can differ in
– Number of questions
– Value of each question
– And so on
Z Score or Standard Score
• The z-score uses the mean and the standard
deviation
• Definition – A z score or standard score for a
value is obtained by subtracting the mean
from the value and dividing the result by the
standard deviation. The symbol for the
standard score is z.
• The z score represents the number of standard deviations a value lies
above or below the mean.
Z Score or Standard Score
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For a sample: $$z = \frac{X - \bar{X}}{s}$$
For a population: $$z = \frac{X - \mu}{\sigma}$$
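A minimal sketch of the z score formula. The scores, means, and standard deviations below are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Hypothetical data: a music test and an English exam with different scales.
music_z = z_score(65, 50, 10)
english_z = z_score(30, 25, 5)
print(music_z, english_z)
```

With these made-up numbers the music score is 1.5 standard deviations above its mean and the English score is 1.0 above, so the music score is the higher relative position even though the raw scales differ.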
Measures of position
Percentiles
• Percentiles divide the data set into 100 equal parts.
• Percentiles are used to compare individuals’
test scores with national test scores.
• Percentiles are not to be confused with the
percent grade you receive on a test.
Percentiles
• Percentiles are represented by $P_1, P_2, P_3, \ldots, P_{99}$
and divide the distribution into 100 groups.
Percentiles Example
Systolic Blood Pressure
The frequency distribution for the systolic blood pressure readings (in
millimeters of mercury, mm Hg) of 200 randomly selected college students
is shown here. Construct a percentile graph.
A: Class boundaries   B: Frequency   C: Cumulative frequency   D: Cumulative percent
89.5-104.5            24
104.5-119.5           62
119.5-134.5           72
134.5-149.5           26
149.5-164.5           12
164.5-179.5            4
Total                 200
Percentiles Example
Steps:
Step 1 Find the cumulative frequencies and place them in column C.
A: Class boundaries   B: Frequency   C: Cumulative frequency   D: Cumulative percent
89.5-104.5            24             24
104.5-119.5           62             86
119.5-134.5           72             158
134.5-149.5           26             184
149.5-164.5           12             196
164.5-179.5            4             200
Total                 200
Percentiles Example
Steps:
Step 2 Find the cumulative percentages and place them in column D. To do
this step use the formula
$$\text{Cumulative \%} = \frac{\text{cumulative frequency}}{n} \cdot 100\%$$
A: Class boundaries   B: Frequency   C: Cumulative frequency   D: Cumulative percent
89.5-104.5            24             24
104.5-119.5           62             86
119.5-134.5           72             158
134.5-149.5           26             184
149.5-164.5           12             196
164.5-179.5            4             200
Total                 200
Percentiles Example
Steps:
Step 3 Graph the data, using class boundaries for the x axis and the
percentages for the y axis.
A: Class boundaries   B: Frequency   C: Cumulative frequency   D: Cumulative percent
89.5-104.5            24             24                        12
104.5-119.5           62             86                        43
119.5-134.5           72             158                       79
134.5-149.5           26             184                       92
149.5-164.5           12             196                       98
164.5-179.5            4             200                       100
Total                 200
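Steps 1 and 2 can be reproduced with a short sketch. The frequencies 62 and 72 are used for the second and third classes because they reproduce the cumulative frequencies 86 and 158 and the total of 200. The variable names are illustrative.

```python
from itertools import accumulate

# Column B: class frequencies for the blood pressure readings.
frequencies = [24, 62, 72, 26, 12, 4]

# Column C: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Column D: cumulative frequency divided by n, times 100.
n = cumulative[-1]
cumulative_percent = [c * 100 / n for c in cumulative]

print(cumulative)
print(cumulative_percent)
```

The two printed lists match columns C and D of the table.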
Percentiles Example
Percentile Formula
The percentile corresponding to a given value X is computed using the
following formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$
Percentile Example
Test Scores
A teacher gives a 20-point test to 10 students.
The scores are shown here. Find the percentile
rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Percentile Example
• Step 1
– Arrange the data in order from lowest to highest:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
• Step 2
– Substitute into the formula
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$
Since there are 6 values below 12, the solution is:
$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$
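The percentile rank formula can be sketched directly. The function name `percentile_rank` is an illustrative choice.

```python
# Test scores from the example.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]


def percentile_rank(values, x):
    """Percentile rank: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


rank = percentile_rank(scores, 12)
print(rank)
```

Sorting is not needed for the count itself. Arranging the data in order, as in Step 1, just makes the count easier to do by hand.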
Percentile Example
Find the value corresponding to a given
percentile.
How do we do this?
Percentile Example
• Using the values from the previous example: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Step 1:
Arrange the data:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2:
Compute:
$$c = \frac{n \cdot p}{100}$$
where n = total number of values and p = percentile
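The slides stop at computing c. The sketch below finishes the lookup under an assumed convention that is common in introductory textbooks but is not stated here: if c is not a whole number, round it up and take the value in that position; if c is a whole number, average the values in positions c and c + 1.

```python
import math

# Test scores from the previous example.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]


def value_at_percentile(values, p):
    """Value for the p-th percentile (0 < p < 100), assumed convention."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100, exclusive")
    data = sorted(values)   # Step 1: arrange the data
    n = len(data)
    c = n * p / 100         # Step 2: compute c
    if c != int(c):
        # Assumption: round up and count to that position (1-based).
        return data[math.ceil(c) - 1]
    # Assumption: average the c-th and (c + 1)-th values (1-based).
    c = int(c)
    return (data[c - 1] + data[c]) / 2


print(value_at_percentile(scores, 25))  # c = 2.5, so use the 3rd value
print(value_at_percentile(scores, 60))  # c = 6, so average 6th and 7th values
```

Other conventions for the final step exist, so results from software packages may differ slightly from this sketch.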
• Time for work. This will be due next week, but you will have the
option to turn it in today.
• Also, if you have the worksheet from yesterday, you may turn that in.
• Turn in all work to the box.
• Complete Problems 10-22 from Chapter 3, Section 3 of the book, pp. 153-154.
Quartiles and Deciles
• Quartiles, similar to percentiles, divide a data set into four groups,
separated by $Q_1, Q_2, Q_3$.
Note that $Q_1$ is the same as the 25th percentile.
How do you find the data values that correspond to $Q_1, Q_2, Q_3$?
Quartiles and Deciles
1. Arrange the data in order from lowest to
highest.
2. Find the median of the data values. This is
the value for 𝑄2 .
3. Find the Median of the data values that fall
below 𝑄2 . This is the value for 𝑄1 .
4. Find the Median of the data values that fall
above 𝑄2 . This is the value for 𝑄3 .
Quartiles and Deciles
• Example
Find 𝑄1 , 𝑄2 , 𝑎𝑛𝑑 𝑄3 for the data set:
15, 13, 6, 5, 12, 50, 22, 18
1. Arrange the data in order
5, 6, 12, 13, 15, 18, 22, 50
2. Find the median. It lies between 13 and 15, so
$$Q_2 = \frac{13 + 15}{2} = 14$$
Quartiles and Deciles
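Find $Q_1$, $Q_2$, and $Q_3$ for the data set:
15, 13, 6, 5, 12, 50, 22, 18
3. Find the median of the values below $Q_2$ and of the values above $Q_2$.
Below: 5, 6, 12, 13, so $Q_1 = \frac{6 + 12}{2} = 9$
Above: 15, 18, 22, 50, so $Q_3 = \frac{18 + 22}{2} = 20$
Thus $Q_1 = 9$, $Q_2 = 14$, and $Q_3 = 20$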
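The quartile procedure can be sketched as below. One assumption is made: "values below Q2" and "values above Q2" are taken by position in the sorted list, so for an odd number of values the middle one is left out of both halves.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)           # Step 1: arrange the data
    n = len(data)
    q2 = median(data)               # Step 2: median of all values
    lower = data[: n // 2]          # values positioned below the median
    upper = data[(n + 1) // 2:]     # values positioned above the median
    return median(lower), q2, median(upper)  # Steps 3 and 4


q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)
```

For the example data this gives 9, 14, and 20, matching the hand calculation.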
Quartiles and Deciles
• Interquartile Range (IQR):
– This is defined as the difference between $Q_3$ and $Q_1$,
$IQR = Q_3 - Q_1$, and is the range of the middle 50% of the data.
Quartiles and Deciles
• Deciles
– Just like percentiles and quartiles, deciles divide a
data set into 10 groups, denoted 𝐷1 , 𝐷2 , … , 𝐷9
On page 151 there is a summary table
Quartiles and Deciles
• Outliers – An outlier is an extremely high or
low value when compared with the rest of the
data values.
Chapter 3
Section 4
Exploratory Data Analysis
Exploratory Data Analysis
• Exploratory data analysis is used to examine
data to find out what information can be
discovered about the data such as the center
and the spread.
Exploratory Data Analysis
The five number summary and boxplots
1. The lowest value of the data set (i.e. minimum)
2. 𝑄1
3. The median
4. 𝑄3
5. The highest value of the data set (i.e. maximum)
These values are called the five number summary.
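A brief sketch that collects the five values, reusing the data set from the quartiles example (15, 13, 6, 5, 12, 50, 22, 18). The quartiles are taken as medians of the lower and upper halves by position, which is an assumption about how ties with the median are handled.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    q1 = statistics.median(data[: n // 2])       # median of the lower half
    q3 = statistics.median(data[(n + 1) // 2:])  # median of the upper half
    return data[0], q1, statistics.median(data), q3, data[-1]


summary = five_number_summary([15, 13, 6, 5, 12, 50, 22, 18])
iqr = summary[3] - summary[1]  # interquartile range, Q3 - Q1
print(summary, iqr)
```

These five numbers are exactly what is needed to draw a boxplot, and the difference between the fourth and second gives the interquartile range.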
Exploratory Data Analysis
• Boxplot – A boxplot is a graph of a data set
obtained by drawing a horizontal line from the
minimum data value to 𝑄1 , drawing a
horizontal line from 𝑄3 to the maximum data
value, and drawing a box whose vertical sides
pass through 𝑄1 and 𝑄3 with a vertical line
inside the box passing through the median or
𝑄2 .
Exploratory Data Analysis
• How to construct a box plot
1. Find the five number summary for the data values.
2. Draw a horizontal axis with a scale such that it
includes the maximum and minimum data values.
3. Draw a box whose vertical sides go through 𝑄1 and
𝑄3 , and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left
side of the box and a line from the maximum to the
right side of the box.
Exploratory Data Analysis
Information obtained from a boxplot
1. a) If the median is near the center of the box, the distribution is
      approximately symmetric.
   b) If the median falls to the left of the center of the box, the
      distribution is positively skewed.
   c) If the median falls to the right of the center, the distribution is
      negatively skewed.
2. a) If the lines are about the same length, the distribution is
      approximately symmetric.
   b) If the right line is longer than the left line, the distribution is
      positively skewed.
   c) If the left line is longer than the right line, the distribution is
      negatively skewed.
Exploratory Data Analysis
• Resistant Statistic – these statistics are less affected by outliers.
Examples are the median and the interquartile range.