Welcome to MM207!
Unit 9 Seminar
End of term deadlines
• Final Project due Tuesday, by 11:59 pm ET
• Unit 10 contains several discussion questions and an
internet resource. While these are not graded, you
should complete them.
Discussion Question #1
List one statistical concept that you will use in your
profession. Why?
Discussion Question #2
Which specific statistical concepts are still unclear? What
do you need to make them clearer?
Discussion Question #3
List a specific statistical concept that you would feel
comfortable explaining to another person. Why do you feel you
have mastered this concept?
Example: Identifying Sampling Techniques
You are doing a study to determine the opinion of
students at your school regarding stem cell research.
Identify the sampling technique used.
1. You divide the student population with respect
to majors and randomly select and question
some students in each major.
Solution:
Stratified sampling (the students are divided into
strata (majors) and a sample is selected from each
major)
Larson/Farber 4th ed.
Example: Identifying Sampling Techniques
2. You assign each student a number and generate
random numbers. You then question each student
whose number is randomly selected.
Solution:
Simple random sample (each sample of the same
size has an equal chance of being selected and
each student has an equal chance of being
selected.)
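The two techniques can be contrasted in a short Python sketch. The roster, the list of majors, and the function names below are invented for illustration and are not part of the textbook example.

```python
import random
from collections import defaultdict

# Hypothetical roster: (student_id, major) pairs invented for illustration.
MAJORS = ["Biology", "History", "Nursing", "Business"]
roster = [(sid, MAJORS[sid % len(MAJORS)]) for sid in range(1, 201)]

def simple_random_sample(students, k, rng):
    """Every group of k students is equally likely to be chosen."""
    return rng.sample(students, k)

def stratified_sample(students, per_stratum, rng):
    """Group students by major (the strata), then sample within each major."""
    strata = defaultdict(list)
    for student in students:
        strata[student[1]].append(student)
    chosen = []
    for major in sorted(strata):
        chosen.extend(rng.sample(strata[major], per_stratum))
    return chosen

rng = random.Random(207)  # fixed seed so the run is repeatable
srs = simple_random_sample(roster, 20, rng)
stratified = stratified_sample(roster, 5, rng)
print(len(srs), "students in the simple random sample")
print(len(stratified), "students in the stratified sample")
```

The stratified sample is guaranteed to include every major, while the simple random sample may over- or under-represent a major by chance.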
Example: Comparing z-Scores from
Different Data Sets
In 2007, Forest Whitaker won the Best Actor Oscar at
age 45 for his role in the movie The Last King of
Scotland. Helen Mirren won the Best Actress Oscar at
age 61 for her role in The Queen. The mean age of all
best actor winners is 43.7, with a standard deviation of
8.8. The mean age of all best actress winners is 36, with
a standard deviation of 11.5. Find the z-score that
corresponds to the age for each actor or actress. Then
compare your results.
Solution: Comparing z-Scores from
Different Data Sets
• Forest Whitaker
z = (x − μ)/σ = (45 − 43.7)/8.8 ≈ 0.15
0.15 standard deviations above the mean
• Helen Mirren
z = (x − μ)/σ = (61 − 36)/11.5 ≈ 2.17
2.17 standard deviations above the mean
Solution: Comparing z-Scores from
Different Data Sets
Forest Whitaker: z = 0.15
Helen Mirren: z = 2.17
The z-score corresponding to the age of Helen Mirren
is more than two standard deviations from the mean,
so it is considered unusual. Compared to other Best
Actress winners, she is relatively older, whereas the
age of Forest Whitaker is only slightly higher than the
average age of other Best Actor winners.
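As a quick check of the arithmetic, the z-scores can be recomputed in Python. The function name and the "more than two standard deviations" cutoff as a code test are illustrative choices; the numbers come from the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# (age, mean age of winners, standard deviation) from the example
winners = {
    "Forest Whitaker": (45, 43.7, 8.8),
    "Helen Mirren": (61, 36, 11.5),
}

z_scores = {}
for name, (age, mean, sd) in winners.items():
    z = z_score(age, mean, sd)
    z_scores[name] = z
    label = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: z = {z:.2f} ({label})")
```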
Distinguishable Permutations
• The number of distinguishable permutations of n
objects, where n1 are of one type, n2 are of another
type, and so on, is
n! / (n1! ∙ n2! ∙ n3! ∙∙∙ nk!)
where n1 + n2 + n3 +∙∙∙+ nk = n
Example: Distinguishable Permutations
A building contractor is planning to develop a
subdivision that consists of 6 one-story houses, 4 two-story
houses, and 2 split-level houses. In how many
distinguishable ways can the houses be arranged?
Solution:
• There are 12 houses in the subdivision
• n = 12, n1 = 6, n2 = 4, n3 = 2
12! / (6! ∙ 4! ∙ 2!) = 13,860 distinguishable ways
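The house-arrangement count can be verified with a few lines of Python. The helper name is an illustrative choice; it simply evaluates n! / (n1! ∙ n2! ∙∙∙ nk!) with integer arithmetic.

```python
from math import factorial

def distinguishable_permutations(*group_sizes):
    """Count orderings of n objects made up of groups of identical items."""
    n = sum(group_sizes)
    count = factorial(n)
    for size in group_sizes:
        count //= factorial(size)  # each step divides evenly
    return count

# 6 one-story, 4 two-story, and 2 split-level houses
houses = distinguishable_permutations(6, 4, 2)
print(houses)
```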
Example: Finding Probabilities
You have 11 letters consisting of one M, four Is, four
Ss, and two Ps. If the letters are randomly arranged in
order, what is the probability that the arrangement spells
the word Mississippi?
Solution: Finding Probabilities
• There is only one favorable outcome
• There are
11! / (1! ∙ 4! ∙ 4! ∙ 2!) = 34,650
distinguishable permutations of the given letters
(11 letters with 1, 4, 4, and 2 like letters)
P(Mississippi) = 1/34,650 ≈ 0.000029
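A small Python sketch can derive the letter counts directly from the word instead of entering them by hand. Using Counter and Fraction here is an illustrative choice, not something the slide prescribes.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

word = "mississippi"
letter_counts = Counter(word)  # how many times each letter appears

# n! divided by the factorial of each letter's count
arrangements = factorial(len(word))
for count in letter_counts.values():
    arrangements //= factorial(count)

# Exactly one arrangement spells the word
probability = Fraction(1, arrangements)
print(arrangements)
print(float(probability))
```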
Example: Graphing a Binomial
Distribution
Fifty-nine percent of households in the U.S. subscribe to
cable TV. You randomly select six households and ask
each if they subscribe to cable TV. Construct a
probability distribution for the random variable x. Then
graph the distribution. (Source: Kagan Research, LLC)
Solution:
• n = 6, p = 0.59, q = 0.41
• Find the probability for each value of x
Solution: Graphing a Binomial
Distribution
x    P(x)
0    0.005
1    0.041
2    0.148
3    0.283
4    0.306
5    0.176
6    0.042
Histogram: Subscribing to Cable TV (x-axis: Households, 0 to 6; y-axis: Probability, 0 to 0.35)
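The probabilities in the table follow from the binomial formula P(x) = C(n, x) ∙ p^x ∙ q^(n − x). A minimal Python sketch, with an illustrative function name and a rough text histogram standing in for the graph:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.59  # six households, 59% subscribe to cable TV
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    bar = "#" * round(prob * 100)  # one mark per percentage point
    print(f"{x}  {prob:.3f}  {bar}")
```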
Mean, Variance, and Standard Deviation
• Mean: μ = np
• Variance: σ² = npq
• Standard Deviation: σ = √(npq)
Example: Finding the Mean, Variance,
and Standard Deviation
In Pittsburgh, Pennsylvania, about 56% of the days in a
year are cloudy. Find the mean, variance, and standard
deviation for the number of cloudy days during the
month of June. Interpret the results and determine any
unusual values. (Source: National Climatic Data Center)
Solution: n = 30, p = 0.56, q = 0.44
Mean: μ = np = 30∙0.56 = 16.8
Variance: σ² = npq = 30∙0.56∙0.44 ≈ 7.4
Standard Deviation: σ = √(npq) = √(30∙0.56∙0.44) ≈ 2.7
Solution: Finding the Mean, Variance, and
Standard Deviation
μ = 16.8, σ² ≈ 7.4, σ ≈ 2.7
• On average, there are 16.8 cloudy days during the
month of June.
• The standard deviation is about 2.7 days.
• Values that are more than two standard deviations
from the mean are considered unusual.
   16.8 − 2(2.7) = 11.4: a June with 11 cloudy days
   would be unusual.
   16.8 + 2(2.7) = 22.2: a June with 23 cloudy
   days would also be unusual.
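The same calculation can be sketched in Python. Note that this version keeps the standard deviation unrounded, so the cutoffs come out near 11.36 and 22.24 rather than the slide's 11.4 and 22.2; the whole-day conclusions are the same. Variable names are illustrative.

```python
from math import sqrt

n, p = 30, 0.56  # days in June, proportion of cloudy days
q = 1 - p

mean = n * p
variance = n * p * q
std_dev = sqrt(variance)

# "Unusual" here means more than two standard deviations from the mean
low = mean - 2 * std_dev
high = mean + 2 * std_dev
unusual = [days for days in range(n + 1) if days < low or days > high]

print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {std_dev:.1f}")
print(f"usual range: {low:.2f} to {high:.2f} cloudy days")
```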
Sample Size
• Given a c-confidence level and a margin of error E,
the minimum sample size n needed to estimate p is
n = p̂q̂ (zc / E)²
• This formula assumes you have an estimate for p̂
and q̂.
• If not, use p̂ = 0.5 and q̂ = 0.5.
Example: Sample Size
You are running a political campaign and wish to
estimate, with 95% confidence, the proportion of
registered voters who will vote for your candidate. Your
estimate must be accurate within 3% of the true
population. Find the minimum sample size needed if
1. no preliminary estimate is available.
Solution:
Because you do not have a preliminary estimate
for p̂, use p̂ = 0.5 and q̂ = 0.5.
Solution: Sample Size
• c = 0.95, zc = 1.96, E = 0.03
n = p̂q̂ (zc / E)² = (0.5)(0.5)(1.96 / 0.03)² ≈ 1067.11
Round up to the nearest whole number.
With no preliminary estimate, the minimum sample
size should be at least 1068 voters.
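A minimal Python sketch of the sample-size formula follows. The function name and its default of p̂ = 0.5 are illustrative; the critical value is computed from the standard normal distribution rather than read from a table, so zc is about 1.95996 instead of the rounded 1.96.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(confidence, margin_of_error, p_hat=0.5):
    """Smallest n that estimates p within the margin of error."""
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)  # two-tailed critical value
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z_c / margin_of_error) ** 2)  # always round up

n_needed = min_sample_size(0.95, 0.03)
print(n_needed)
```

Rounding up rather than to the nearest integer matters: a sample of 1067 would fall just short of the required precision.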
z-Test for a Population Proportion
• A statistical test for a population proportion.
• Can be used when a binomial distribution is given
such that np ≥ 5 and nq ≥ 5.
• The test statistic is the sample proportion p̂.
• The standardized test statistic is z.
z = (p̂ − μ_p̂) / σ_p̂ = (p̂ − p) / √(pq/n)
Using a z-Test for a Proportion p
Verify that np ≥ 5 and nq ≥ 5
1. State the claim mathematically and verbally. Identify the null
and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Sketch the sampling distribution.
4. Determine any critical value(s).
   Use Table 5 in Appendix B.
Using a z-Test for a Proportion p
5. Determine any rejection region(s).
6. Find the standardized test statistic.
   In symbols: z = (p̂ − p) / √(pq/n)
7. Make a decision to reject or fail to reject the null
hypothesis.
   In symbols: If z is in the rejection region, reject H0.
   Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
Example: Hypothesis Test for
Proportions
Zogby International claims that 45% of people in the
United States support making cigarettes illegal within
the next 5 to 10 years. You decide to test this claim and
ask a random sample of 200 people in the United States
whether they support making cigarettes illegal within the
next 5 to 10 years. Of the 200 people, 49% support this
law. At α = 0.05, is there enough evidence to reject the
claim?
Solution:
• Verify that np ≥ 5 and nq ≥ 5.
np = 200(0.45) = 90 and nq = 200(0.55) = 110
Solution: Hypothesis Test for Proportions
• H0: p = 0.45
• Ha: p ≠ 0.45
• α = 0.05
• Rejection Region: z < −1.96 or z > 1.96 (area 0.025 in each tail)
• Test Statistic
z = (p̂ − p) / √(pq/n) = (0.49 − 0.45) / √((0.45)(0.55)/200) ≈ 1.14
• Decision: Fail to reject H0 (z ≈ 1.14 is not in the rejection region)
At the 5% level of significance,
there is not enough evidence to
reject the claim that 45% of
people in the U.S. support
making cigarettes illegal within
the next 5 to 10 years.
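The steps of the two-tailed test can be sketched in Python as follows. The function name and return format are illustrative, and the critical value is computed from the standard normal distribution instead of being looked up in a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(p_hat, p0, n, alpha):
    """Two-tailed z-test of H0: p = p0 against Ha: p != p0."""
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np and nq must both be at least 5")
    z = (p_hat - p0) / sqrt(p0 * q0 / n)            # standardized test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # boundary of rejection region
    return z, critical, abs(z) > critical

z, critical, reject = proportion_z_test(p_hat=0.49, p0=0.45, n=200, alpha=0.05)
decision = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, critical values = ±{critical:.2f}, decision: {decision}")
```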
Example: Using Technology to Find a
Regression Equation
Use a technology tool to find the
equation of the regression line for
the Old Faithful data.
Duration, x   Time, y   Duration, x   Time, y
1.8           56        3.78          79
1.82          58        3.83          85
1.9           62        3.88          80
1.93          56        4.1           89
1.98          57        4.27          90
2.05          57        4.3           89
2.13          60        4.43          89
2.3           57        4.47          86
2.37          61        4.53          89
2.82          73        4.55          86
3.13          76        4.6           92
3.27          77        4.63          91
3.65          77
Solution: Using Technology to Find a
Regression Equation
[Figure: graph of the Old Faithful data; axis ticks 1 and 5 (x), 50 and 100 (y)]
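The technology output for this slide did not survive as text. As a stand-in, here is a minimal Python sketch that fits a least-squares line to the 25 Old Faithful data pairs from the example. The function and variable names are illustrative, and the textbook may have used a calculator or spreadsheet instead.

```python
# Old Faithful data from the example: duration (x) and time (y)
durations = [1.8, 1.82, 1.9, 1.93, 1.98, 2.05, 2.13, 2.3, 2.37, 2.82,
             3.13, 3.27, 3.65, 3.78, 3.83, 3.88, 4.1, 4.27, 4.3, 4.43,
             4.47, 4.53, 4.55, 4.6, 4.63]
times = [56, 58, 62, 56, 57, 57, 60, 57, 61, 73,
         76, 77, 77, 79, 85, 80, 89, 90, 89, 89,
         86, 89, 86, 92, 91]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = least_squares_line(durations, times)
print(f"y-hat = {slope:.3f}x + {intercept:.3f}")
```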
Example: Predicting y-Values Using
Regression Equations
The regression equation for the advertising expenses (in
thousands of dollars) and company sales (in thousands
of dollars) data is ŷ = 50.729x + 104.061. Use this
equation to predict the expected company sales for the
following advertising expenses. (Recall from section 9.1
that x and y have a significant linear correlation.)
1. 1.5 thousand dollars
2. 1.8 thousand dollars
3. 2.5 thousand dollars
Solution: Predicting y-Values Using
Regression Equations
ŷ = 50.729x + 104.061
1. 1.5 thousand dollars
ŷ =50.729(1.5) + 104.061 ≈ 180.155
When the advertising expenses are $1500, the
company sales are about $180,155.
2. 1.8 thousand dollars
ŷ =50.729(1.8) + 104.061 ≈ 195.373
When the advertising expenses are $1800, the
company sales are about $195,373.
Solution: Predicting y-Values Using
Regression Equations
3. 2.5 thousand dollars
ŷ =50.729(2.5) + 104.061 ≈ 230.884
When the advertising expenses are $2500, the
company sales are about $230,884.
Prediction values are meaningful only for x-values in
(or close to) the range of the data. The x-values in the
original data set range from 1.4 to 2.6. So, it would
not be appropriate to use the regression line to predict
company sales for advertising expenditures such as 0.5
($500) or 5.0 ($5000).
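The predictions and the range restriction can be combined in a short Python sketch. The function name is illustrative, and refusing every x-value outside 1.4 to 2.6 is a stricter rule than the slide's "in (or close to) the range"; that simplification is an assumption of this sketch.

```python
X_MIN, X_MAX = 1.4, 2.6  # range of advertising expenses in the original data

def predict_sales(ad_expense):
    """Predicted sales (thousands of dollars) from y-hat = 50.729x + 104.061."""
    if not X_MIN <= ad_expense <= X_MAX:
        # Assumption of this sketch: refuse to extrapolate at all
        raise ValueError(f"{ad_expense} is outside the data range")
    return 50.729 * ad_expense + 104.061

for x in (1.5, 1.8, 2.5):
    print(f"x = {x}: predicted sales of about {predict_sales(x):.3f} thousand dollars")
```

Calling predict_sales(0.5) or predict_sales(5.0) raises a ValueError, mirroring the warning against predicting outside the range of the data.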