1 Final Review
2 Review
2.1 CI 1-propZint
Scenario 1
A TV manufacturer claims in its warranty brochure that in the past not
more than 10 percent of its TV sets needed any repair during the first two
years of operation. To test the validity of this claim, a government testing
agency selected a random sample of 100 sets and found that 14 sets required
some repair within the first two years of operation.
1. What is the critical value for this 95% confidence interval?
2. What is the standard error of this confidence interval?
3. What is the margin of error?
4. Set up a 95% confidence interval estimate of the population proportion
of TV sets that need repair in the first two years of operation.
5. What conclusion can we draw from this confidence interval?
6. Interpret the 95% confidence interval.
7. What sample size should be taken if the agency wants 95% confidence
when the margin of error is 0.05?
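The arithmetic behind questions 1 to 4 and 7 can be sketched as follows. This is a minimal sketch, not an answer key: it assumes the usual large-sample (Wald) interval for one proportion, and it assumes the sample proportion is used as the planning value for the sample-size question (a conservative alternative would use 0.5). All variable names are illustrative.

```python
import math
from statistics import NormalDist

n, repairs = 100, 14      # sample size and sets needing repair (from the scenario)
confidence = 0.95

p_hat = repairs / n
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
std_err = math.sqrt(p_hat * (1 - p_hat) / n)              # SE based on p_hat
margin = z_crit * std_err
ci = (p_hat - margin, p_hat + margin)

# Sample size for a 0.05 margin of error.
# Assumption: p_hat is used as the planning value for the proportion.
target_margin = 0.05
n_needed = math.ceil(z_crit**2 * p_hat * (1 - p_hat) / target_margin**2)

print(f"z* = {z_crit:.3f}, SE = {std_err:.4f}, ME = {margin:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}); n needed: {n_needed}")
```

Under these assumptions the interval runs from roughly 0.072 to 0.208, so the claimed value of 0.10 lies inside it.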
2.2 CI 2-independent samples
Scenario 2
The purchasing director for an industrial factory is investigating the
possibility of purchasing a new milling machine. She determines that the new
machine will be purchased if there is evidence that the parts it produces have a
higher breaking strength than those from the old machine. The sample standard
deviation of the breaking strength for the old machine is 10 kilograms
and for the new machine is 9 kilograms. A sample of 25 parts taken from
the old machine indicated a sample mean of 65 kilograms, whereas a similar
sample of 25 from the new machine indicated a sample mean of 72 kilograms.
1. What are the degrees of freedom?
2. What is the critical value for this 95% confidence interval?
3. What is the standard error of this confidence interval?
4. What is the margin of error?
5. Set up a 95% confidence interval for the population difference between
the two means.
6. What conclusion can we draw from this confidence interval?
7. Interpret the 95% confidence interval.
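A rough sketch of the two-sample interval follows. It assumes the pooled-variance (equal variances) procedure, which gives 48 degrees of freedom; an unequal-variance (Welch) procedure would give a slightly different df. The critical value is hard-coded from a t table rather than computed, and the difference is taken as new minus old; both choices are assumptions of this sketch.

```python
import math

n_old, mean_old, sd_old = 25, 65.0, 10.0   # old machine (from the scenario)
n_new, mean_new, sd_new = 25, 72.0, 9.0    # new machine (from the scenario)

df = n_old + n_new - 2     # pooled-variance degrees of freedom (assumption)
t_crit = 2.0106            # t table value, 97.5th percentile, 48 df (hard-coded)

pooled_var = ((n_old - 1) * sd_old**2 + (n_new - 1) * sd_new**2) / df
std_err = math.sqrt(pooled_var * (1 / n_old + 1 / n_new))
diff = mean_new - mean_old                 # new minus old
margin = t_crit * std_err
ci = (diff - margin, diff + margin)

print(f"df = {df}, SE = {std_err:.4f}, ME = {margin:.4f}")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these inputs the whole interval lies above zero, which is the feature to look at when drawing a conclusion.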
2.3 CI 1 sample T
Scenario 3
Suppose an independent testing agency has been contracted to determine
whether the contracting company should use a gasoline additive. The current
gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30 vehicles
from the company’s fleet produced a sample average of 19.34 mpg and a
sample standard deviation of 5.2 mpg.
1. What are the degrees of freedom?
2. What is the critical value for this 95% confidence interval?
3. What is the standard error of this confidence interval?
4. What is the margin of error?
5. Set up a 95% confidence interval for the population average MPG
with the gasoline additive.
6. What conclusion can we draw from this confidence interval?
7. Interpret the 95% confidence interval.
8. What sample size should be taken if the agency wants 95% confidence
when the margin of error is 1.5?
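The one-sample t interval can be sketched in the same way. The critical value is again hard-coded from a t table (29 degrees of freedom). For the sample-size question the sketch assumes the normal critical value and the sample standard deviation as planning values, which is a common approximation and not the only possible approach.

```python
import math
from statistics import NormalDist

n, mean, sd = 30, 19.34, 5.2   # from the scenario
df = n - 1
t_crit = 2.0452                # t table value, 97.5th percentile, 29 df (hard-coded)

std_err = sd / math.sqrt(n)
margin = t_crit * std_err
ci = (mean - margin, mean + margin)

# Sample size for a margin of error of 1.5 mpg.
# Assumption: z replaces t, and the sample sd stands in for the population sd.
z_crit = NormalDist().inv_cdf(0.975)
n_needed = math.ceil((z_crit * sd / 1.5) ** 2)

print(f"df = {df}, SE = {std_err:.4f}, ME = {margin:.4f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}); n needed: {n_needed}")
```

The current mileage of 18.5 mpg falls inside the resulting interval.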
2.4 CI paired t
Scenario 4
Suppose a shoe company wants to test material for the soles of shoes. For
each pair of shoes the new material is placed on one shoe and the old material
is placed on the other shoe. After a given period of time a random sample
of 10 pairs of shoes is selected. The wear is measured on a 10-point scale
(higher is better), with the following results: the average of the differences
is 0.3 and its standard deviation is 1.767.
1. What are the degrees of freedom?
2. What is the critical value for this 95% confidence interval?
3. What is the standard error of this confidence interval?
4. What is the margin of error?
5. Set up a 95% confidence interval for the population mean difference of paired
observations of shoe soles.
6. What conclusion can we draw from this confidence interval?
7. Interpret the 95% confidence interval.
8. What sample size should be taken if the agency wants 95% confidence
when the margin of error is 0.6?
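For the paired design the interval is a one-sample t interval on the differences. The sketch below wraps the calculation in a small helper (a hypothetical name, `t_interval`) and takes the critical value from a t table for 9 degrees of freedom. The sample-size step assumes the normal critical value as a planning approximation.

```python
import math
from statistics import NormalDist

def t_interval(mean, sd, n, t_crit):
    """Return (standard error, margin of error, lower, upper) for a t interval."""
    std_err = sd / math.sqrt(n)
    margin = t_crit * std_err
    return std_err, margin, mean - margin, mean + margin

n_pairs, mean_diff, sd_diff = 10, 0.3, 1.767   # from the scenario
t_crit = 2.2622     # t table value, 97.5th percentile, 9 df (hard-coded)

std_err, margin, lower, upper = t_interval(mean_diff, sd_diff, n_pairs, t_crit)

# Pairs needed for a margin of error of 0.6 (assumption: z used in place of t).
z_crit = NormalDist().inv_cdf(0.975)
n_needed = math.ceil((z_crit * sd_diff / 0.6) ** 2)

print(f"SE = {std_err:.4f}, ME = {margin:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}); pairs needed: {n_needed}")
```

Because the interval contains zero, it does not by itself show a difference between the two materials.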
2.5 Hypothesis test 1-propZint
Scenario 1
A TV manufacturer claims in its warranty brochure that in the past not
more than 10 percent of its TV sets needed any repair during the first two
years of operation. To test the validity of this claim, a government testing
agency selected a random sample of 100 sets and found that 14 sets required
some repair within the first two years of operation. The company uses a 5%
level of significance.
1. How many tails does this test have?
2. What are the hypotheses?
3. What is the standard error of the proportion?
4. What is the test statistic?
5. What is the p-value?
6. What conclusion can we draw from this test?
7. What is the critical value?
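A sketch of the one-proportion z test is given below. It assumes an upper-tailed test (H0: p ≤ 0.10 against H1: p > 0.10), since the agency is checking whether the repair rate exceeds the claim. Unlike the confidence interval, the standard error here uses the hypothesized proportion.

```python
import math
from statistics import NormalDist

n, repairs = 100, 14   # from the scenario
p0 = 0.10              # claimed maximum repair rate (null value)
alpha = 0.05

p_hat = repairs / n
std_err = math.sqrt(p0 * (1 - p0) / n)     # uses p0, not p_hat
z_stat = (p_hat - p0) / std_err
p_value = 1 - NormalDist().cdf(z_stat)     # upper-tail area (assumed direction)
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value
reject = z_stat > z_crit

print(f"SE = {std_err:.4f}, z = {z_stat:.4f}, p-value = {p_value:.4f}")
print(f"critical value = {z_crit:.3f}, reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 lead to the same decision.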
2.6 Hypothesis test 2-independent samples
Scenario 2
The purchasing director for an industrial factory is investigating the
possibility of purchasing a new milling machine. She determines that the new
machine will be purchased if there is evidence that the parts it produces have a
higher breaking strength than those from the old machine. The sample standard
deviation of the breaking strength for the old machine is 10 kilograms
and for the new machine is 9 kilograms. A sample of 25 parts taken from
the old machine indicated a sample mean of 65 kilograms, whereas a similar
sample of 25 from the new machine indicated a sample mean of 72 kilograms.
The director uses a 5% level of significance.
1. How many tails does this test have?
2. What are the hypotheses?
3. What is the test statistic?
4. What are the degrees of freedom?
5. What is the p-value?
6. Should you reject the null hypothesis (decision)?
7. What conclusion can we draw from this test?
8. What is the critical value?
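The test statistic and decision for this scenario can be sketched as below. The sketch assumes a pooled-variance t test with an upper-tailed alternative (new mean greater than old mean) and a critical value hard-coded from a t table. A p-value would need a t-distribution CDF (for example `scipy.stats.t.sf`), which is left out here to keep the example dependency-free.

```python
import math

n_old, mean_old, sd_old = 25, 65.0, 10.0   # old machine (from the scenario)
n_new, mean_new, sd_new = 25, 72.0, 9.0    # new machine (from the scenario)

df = n_old + n_new - 2     # pooled-variance degrees of freedom (assumption)
t_crit = 1.6772            # t table value, 95th percentile, 48 df (hard-coded)

pooled_var = ((n_old - 1) * sd_old**2 + (n_new - 1) * sd_new**2) / df
std_err = math.sqrt(pooled_var * (1 / n_old + 1 / n_new))
t_stat = (mean_new - mean_old) / std_err   # H0: no difference in means
reject = t_stat > t_crit                   # upper-tailed decision rule

print(f"df = {df}, t = {t_stat:.4f}, critical value = {t_crit}")
print(f"reject H0: {reject}")
```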
2.7 Hypothesis test 1 sample T
Scenario 3
Suppose an independent testing agency has been contracted to determine
whether the contracting company should use a gasoline additive. The current
gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30
vehicles from the company’s fleet produced a sample average of 19.34 mpg
and a sample standard deviation of 5.2 mpg. Is there evidence that putting
an additive into the gasoline of the company vehicles will improve their
performance (i.e., MPG)? The company uses a 5% level of significance.
1. How many tails does this test have?
2. What are the hypotheses?
3. What is the test statistic?
4. What are the degrees of freedom?
5. What is the p-value?
6. Should you reject the null hypothesis (decision)?
7. What conclusion can we draw from this test?
8. What is the critical value?
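The following sketch works through the one-sample t test, assuming an upper-tailed alternative (mean MPG greater than 18.5). Because the Python standard library has no t-distribution CDF, the p-value is approximated by integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are illustrative, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3         # half the area lies above zero

n, mean, sd, mu0 = 30, 19.34, 5.2, 18.5    # from the scenario
alpha = 0.05
df = n - 1

t_stat = (mean - mu0) / (sd / math.sqrt(n))
p_value = t_upper_tail(t_stat, df)         # upper-tailed test (assumed direction)
reject = p_value < alpha

print(f"df = {df}, t = {t_stat:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

For reference, the one-tailed 5% critical value from a t table with 29 degrees of freedom is about 1.699, which gives the same decision.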
2.8 Hypothesis test paired t
Scenario 4
Suppose a shoe company wants to test material for the soles of shoes. For
each pair of shoes the new material is placed on one shoe and the old material
is placed on the other shoe. After a given period of time a random sample
of 10 pairs of shoes is selected. The wear is measured on a 10-point scale
(higher is better), with the following results: the average of the differences
is 0.3 and its standard deviation is 1.767.
1. How many tails does this test have?
2. What are the hypotheses?
3. What is the test statistic?
4. What are the degrees of freedom?
5. What is the p-value?
6. Should you reject the null hypothesis (decision)?
7. What conclusion can we draw from this test?
8. What is the critical value?
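A short sketch of the paired t test follows. The scenario does not say how the differences were formed or which direction is of interest, so the sketch checks the statistic against both a one-tailed and a two-tailed critical value, each hard-coded from a t table for 9 degrees of freedom.

```python
import math

n_pairs, mean_diff, sd_diff = 10, 0.3, 1.767   # from the scenario
df = n_pairs - 1

t_crit_one_tail = 1.8331   # t table value, 95th percentile, 9 df (hard-coded)
t_crit_two_tail = 2.2622   # t table value, 97.5th percentile, 9 df (hard-coded)

std_err = sd_diff / math.sqrt(n_pairs)
t_stat = mean_diff / std_err               # H0: mean difference is 0

reject_one_tail = t_stat > t_crit_one_tail
reject_two_tail = abs(t_stat) > t_crit_two_tail

print(f"df = {df}, SE = {std_err:.4f}, t = {t_stat:.4f}")
print(f"reject (one-tailed): {reject_one_tail}, reject (two-tailed): {reject_two_tail}")
```

With these numbers the decision is the same under either choice of tails.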
2.9 χ² test
Scenario 5
Suppose the head of the HR division of a mid-sized company wants to
determine if she should let the Red Cross hold a blood-donation day in the
company cafeteria. She takes a random sample of size 49. The following
contingency table is constructed.
            Blood Donor Status
            Yes     No     Total
Men           5     17        22
Women         7     20        27
Total        12     37        49
1. What are the hypotheses?
2. What is the test statistic?
3. What are the degrees of freedom?
4. What is the p-value?
5. Should you reject the null hypothesis (decision)?
6. What conclusion can we draw from this test?
7. What is the expected value for cell row 2 column 2?
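The expected counts and the test statistic for the contingency table can be sketched as below, assuming the usual chi-square test of independence without a continuity correction. The p-value line relies on a shortcut that is valid only for 1 degree of freedom, where the chi-square upper tail equals `erfc(sqrt(chi2 / 2))`.

```python
import math

observed = [[5, 17],    # men:   donor yes, donor no
            [7, 20]]    # women: donor yes, donor no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: upper-tail area of the chi-square distribution.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected[row 2][col 2] = {expected[1][1]:.4f}")
print(f"chi-square = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The 5% critical value for 1 degree of freedom from a chi-square table is 3.841, far above the statistic computed here.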
2.10 SLR
Scenario 6
A statistician for an American automobile manufacturer would like to
develop a statistical model for predicting the delivery time (the days between
placing the order and the actual delivery of the new car) of custom-ordered
new automobiles. The statistician believes there is a linear relationship
between the number of options ordered on a car and the delivery time. A
random sample of 16 cars is selected with the following results.
[Figure: scatter plot "Options Ordered vs Delivery Time" (x-axis: Options Ordered, y-axis: Delivery Time)]
[Figure: residual plot "Residuals vs Fitted" (x-axis: Fitted values, y-axis: Residuals)]

Regression Statistics
Multiple R        0.9785
R square          0.9575
Adj R sq          0.9545
Standard error    3.0446
Observations      16

ANOVA
              df    SS         MS         F        Significance F
Regression     1    2927.23    2927.23    315.8    0
Residual      14     129.77       9.27
Total         15    3057.00

Coefficients
                  Coefficient   Std error   t Stat     p-value   Low 95%   Up 95%
intercept         21.9254       1.5908      13.7823    0.0       18.51     25.34
optionsOrdered     2.0687       0.1164      17.7707    0.0        1.819     2.3184
1. Identify which variable is the X, independent, or explanatory variable.
2. Identify which variable is the Y, dependent, or response variable.
3. Describe the pattern of points as they appear on the graph.
4. What kind of relationship do you see?
5. Are there any "outliers"?
6. Describe the strength and direction of the correlation.
7. Compare this relationship with the pattern of points on the scatter
diagram between the two variables.
8. Write the specific estimated regression equation for this problem.
9. Using the estimated regression equation, predict the average delivery
time for a car with 16 options ordered.
10. Is the previous prediction an extrapolation?
11. Interpret the slope estimate, that is, explain what it means in terms of
this problem.
12. Compute the coefficient of determination, that is, how much of the
variation in delivery time is accounted for by this regression model. Express
your answer as a percent. What measure did you use to answer this question?
13. What is the standard error of the estimated regression line? Include
the unit of measurement in your answer.
14. Using a 5% level of significance, is there evidence of a linear relationship between delivery time and options ordered? Be sure to state the
hypotheses, test statistic, p-value, and the conclusion.
15. Give a 95% confidence interval for the true (i.e., population) slope.
16. If the original correlation coefficient between these two variables were
not known, how could it be calculated using the statistics in the regression output? How do you determine the sign of the correlation
coefficient?
17. Describe what you see on the residual plot.
18. For the data set, look at the 9th pair of observations (Options, Time)
or (12, 44). Calculate the residual, i.e., e_i = Y_i − Ŷ_i.
19. Is the model a good fit for the data? Be sure to state your decision and
give the reasons that support your decision.
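Several of the numerical questions above (8, 9, 12, 15, 16, and 18) can be checked with a few lines of arithmetic on the regression output. The sketch below uses the rounded coefficients printed in the output, so results may differ in the last digit from what software with unrounded values would report. The t critical value for 14 degrees of freedom is hard-coded from a table, and all names are illustrative.

```python
import math

intercept, slope = 21.9254, 2.0687        # from the Coefficients table
slope_se = 0.1164                         # standard error of the slope
ss_regression, ss_total = 2927.23, 3057.00
t_crit = 2.1448    # t table value, 97.5th percentile, 14 df (hard-coded)

def predict(options):
    """Estimated delivery time (days) for a given number of options."""
    return intercept + slope * options

pred_16 = predict(16)                                # question 9
r_squared = ss_regression / ss_total                 # question 12
r = math.copysign(math.sqrt(r_squared), slope)       # question 16: sign follows the slope
slope_ci = (slope - t_crit * slope_se,
            slope + t_crit * slope_se)               # question 15
t_slope = slope / slope_se                           # question 14 test statistic
residual_9 = 44 - predict(12)                        # question 18: observed minus fitted

print(f"prediction at 16 options: {pred_16:.2f} days")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}, t = {t_slope:.2f}")
print(f"95% CI for slope: ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
print(f"residual for (12, 44): {residual_9:.4f}")
```

The interval for the slope reproduces the Low 95% and Up 95% columns of the output, which is a useful consistency check on the table.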
2.11 MLR
Scenario 7
Suppose a consumer organization wanted to develop a model to predict
gasoline mileage as measured by miles per gallon (MPG) based on the horsepower of the car’s engine and the weight of the car. A sample of 50 recent
car models was selected, with the results summarized below.
Descriptive Statistics
              MPG       Horsepower   Weight
Mean          28.5      90.8         2756.5
Std Err       1.16      3.85         89.81
Std Dev       8.17      27.26        635.05
Variance      66.77     743.04       403289.76
Minimum       15.5      48           1755
Maximum       46.6      165          4360
Sum           1427.1    4542         137826
Count         50        50           50

Regression Statistics
Multiple R        0.8657
R square          0.7494
Adj R sq          0.7388
Standard error    4.1766
Observations      50

Correlation Coefficient
        MPG        HP        WT
MPG     1
HP      -0.7882    1
WT      -0.8248    0.7419    1

Min - Max
x-variable    Min     Max
HP            48      165
WT            1755    4360

ANOVA
              df    SS         MS         F          Significance F
Regression     2    2451.97    1225.99    70.2813    0
Residual      47     819.87      17.44
Total         49    3271.84

Coefficients
              Coefficient   Std error   t Stat     p-value   Low 95%    Up 95%
intercept     58.1508       2.6582      21.8780    0.0       52.81      63.50
Horsepower    -0.1175       0.0326      -3.6003    0.0008    -0.1832    -0.0519
Weight        -0.0069       0.0014      -4.9035    0.0       -0.0097    -0.0041
1. Identify which variables are the X, independent, or explanatory variables.
2. Identify which variable is the Y, dependent, or response variable.
3. Describe the strength and direction of the correlation.
4. Write the specific estimated regression equation for this problem.
5. Using the estimated regression equation predict the average MPG for
a car that has 60 HP and weighs 2000 lbs.
6. Is the previous prediction extrapolation?
7. Interpret each slope estimate, that is, explain what it means in terms of
this problem.
8. Determine the coefficient of multiple determination, that is, how much of
the variation in MPG is accounted for by this regression model. Express your
answer as a percent. What measure did you use to answer this question?
9. What is the standard error of the estimated regression line? Include
the unit of measurement in your answer.
10. Using a 5% level of significance, is there evidence of a linear relationship
between MPG and WT? Be sure to state the hypotheses, test statistic,
p-value, and the conclusion.
11. Give a 95% confidence interval for the true (i.e., population) slope of
MPG and HP.
12. For the data set, look at the 1st set of observations (MPG, HP, WT)
or (43.1, 48, 1985). Calculate the residual, i.e., e_i = Y_i − Ŷ_i.
13. Is the model a good fit for the data? Be sure to state your decision and
give the reasons that support your decision.
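As with the simple regression, the numerical parts of questions 4 to 6, 8, and 12 come straight from the output. The sketch below uses the rounded coefficients shown in the table, so values may differ slightly from software output. The extrapolation check only compares each predictor with its own observed range; it cannot tell whether the combination of HP and WT lies inside the region covered by the data.

```python
b0, b_hp, b_wt = 58.1508, -0.1175, -0.0069     # from the Coefficients table
ss_regression, ss_total = 2451.97, 3271.84
ranges = {"hp": (48, 165), "wt": (1755, 4360)} # from the Min - Max table

def predict(hp, wt):
    """Estimated MPG for a car with the given horsepower and weight (lbs)."""
    return b0 + b_hp * hp + b_wt * wt

pred = predict(60, 2000)                        # question 5
within_range = (ranges["hp"][0] <= 60 <= ranges["hp"][1]
                and ranges["wt"][0] <= 2000 <= ranges["wt"][1])   # question 6
r_squared = ss_regression / ss_total            # question 8
residual_1 = 43.1 - predict(48, 1985)           # question 12: observed minus fitted

print(f"predicted MPG at 60 HP, 2000 lbs: {pred:.2f}")
print(f"each predictor within its observed range: {within_range}")
print(f"R^2 = {r_squared:.4f}, residual for first observation: {residual_1:.4f}")
```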
Questions?