You construct a 95% confidence interval for the mean
time taken to process a new insurance policy. The interval
is (11, 12) days.
Which of the following statements is correct?
1. Only 5% of all policies take less than 11 or more than
12 days to process
2. Only 5% of all policies take between 11 and 12 days
to process
3. The probability is 0.95 that all policies take between
11 and 12 days to process
4. About 95 out of every 100 such intervals constructed
from random samples of the same size will contain
the population mean processing time
Which of the following statements about CI is incorrect?
1. If we keep the sample size fixed, the CI gets wider as
we increase the confidence level.
2. A CI for a mean always contains the sample mean.
3. If we keep the confidence level fixed, the CI gets
narrower as we increase the sample size.
4. If the population standard deviation increases, the
confidence interval decreases in width.
5. If the confidence intervals for two means do not
overlap, there is evidence that the two population
means are different.
A Canadian railway company claims that its trains block
crossings for 8 min per train on average. To examine this
claim, the blocking times of 10 randomly selected trains were
recorded. The average was 9.05 min. The population standard
deviation of the times is known to be 0.9 min.
1. Test the company’s claim at a 5% significance level.
2. Provide the 95% confidence interval.
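The sketch below checks the arithmetic for this exercise in Python (standard library only). Because σ is known, it uses a z statistic. It assumes a two-sided alternative, H0: μ = 8 versus Ha: μ ≠ 8, since the problem only asks to test the claim; a one-sided reading (Ha: μ > 8) would halve the p-value. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
mu0 = 8.0      # claimed mean blocking time (min)
xbar = 9.05    # sample mean (min)
sigma = 0.9    # known population SD (min)
n = 10
alpha = 0.05

se = sigma / sqrt(n)                           # standard error of the sample mean
z = (xbar - mu0) / se                          # z statistic (sigma known)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # assumption: two-sided alternative

z_star = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96
ci = (xbar - z_star * se, xbar + z_star * se)  # 95% confidence interval

print(f"z = {z:.3f}, p-value = {p_value:.5f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these numbers z is about 3.69 and the interval is roughly (8.49, 9.61) min; the claimed 8 min lies outside it, which agrees with the test rejecting H0 at the 5% level.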
The average growth of a certain variety of pine tree is
10.1 inches in three years. A biologist claims that a new
variety will have greater three-year growth. An SRS of
25 trees of the new variety has an average three-year growth
of 10.8 inches. Assume that the population standard
deviation of the three-year growth of the new variety
is 2.1 inches.
1. Write the appropriate hypotheses.
2. Test the hypotheses.
3. What is your conclusion?
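A minimal Python sketch of the pine-tree test, assuming the one-sided alternative suggested by the biologist's claim (Ha: μ > 10.1) and a 5% significance level, which the problem does not state.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 10.1     # mean three-year growth of the existing variety (in)
xbar = 10.8    # sample mean of the new variety (in)
sigma = 2.1    # population SD of the new variety, assumed known (in)
n = 25
alpha = 0.05   # assumption: significance level not given in the problem

# H0: mu = 10.1  versus  Ha: mu > 10.1 (the claim is "greater" growth)
z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)   # one-sided: upper-tail probability only

print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value comes out near 0.048, just under 0.05, so under the assumed α = 0.05 the data support the biologist's claim; at α = 0.01 they would not.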
Which of the following statements is correct?
1. The p-value measures the probability that the
hypothesis is true.
2. The larger the p-value, the stronger the evidence
against the null hypothesis.
3. A large p-value indicates that the data are consistent
with the alternative hypothesis.
4. An extremely small p-value indicates that the actual
data differ markedly from what would be expected if the null
hypothesis were true.
Statistics 111 - Lecture 12
Introduction to Inference
More Hypothesis Testing
Tests and Intervals
• There is a close connection between confidence intervals and two-sided hypothesis tests
• A 100·C % confidence interval contains likely values for a population parameter, like the population mean μ
• The interval is centered around the sample mean $\bar{X}$
• The width of the interval is a multiple of $s/\sqrt{n}$
• An α-level hypothesis test rejects the null hypothesis that μ = μ₀ if the test statistic T has a p-value less than α, where
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
Example: NYC blackout baby boom
• Births per day from two weeks in August 1966
• Test statistic:
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{433.6 - 430}{39.4/\sqrt{14}} = 0.3419$$
p-value for NYC dataset
• Under the null hypothesis, T has a t distribution with 13 df: P(T ≤ -0.342) = 0.3689 and P(T ≥ 0.342) = 0.3689
• Since our alternative hypothesis was two-sided, our p-value is the sum of both tail probabilities
• p-value is 0.73796
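Python's standard library has no t-distribution CDF, so this sketch approximates the tail area by integrating the t density with Simpson's rule, a numerical stand-in for the printed t table (with SciPy installed, `scipy.stats.t.sf` is the usual library route). The helper names `t_pdf` and `t_upper_tail` are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|): 0.5 minus the area from 0 to |t| (Simpson's rule, even steps)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values from the slide
xbar, mu0, s, n = 433.6, 430, 39.4, 14

T = (xbar - mu0) / (s / sqrt(n))
df = n - 1
one_tail = t_upper_tail(T, df)
p_value = 2 * one_tail   # two-sided alternative: add both tail probabilities

print(f"T = {T:.4f}, one tail = {one_tail:.4f}, p-value = {p_value:.4f}")
```

The result, about 0.738, matches the slide's p-value up to rounding.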
• The difference between our sample mean and the hypothesized population mean μ₀ = 430 had a p-value of 0.7379, so we did not reject the null hypothesis at an α-level of 0.05
• We could have also calculated a 100·(1-α)% = 95% confidence interval:
$$\left(\bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right) = \left(433.6 - 2.16\cdot\frac{39.4}{\sqrt{14}},\ 433.6 + 2.16\cdot\frac{39.4}{\sqrt{14}}\right) = (410.855, 456.345)$$
• What is the meaning of this 95% confidence interval?
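One way to see what "95% confidence" means is to repeat the procedure many times. This Python sketch first reproduces the slide's interval using its t* = 2.16, then draws many samples of size 14 from a hypothetical normal population (mean 430, SD 39.4, chosen only for illustration) and counts how often the interval captures the true mean. The helper `t_interval` and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

T_STAR = 2.16  # t critical value for 13 df at 95% confidence (from the slide)

def t_interval(xbar, s, n, t_star=T_STAR):
    """Return the interval xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# The interval on the slide
lo, hi = t_interval(433.6, 39.4, 14)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")

# Coverage simulation with a hypothetical normal population
random.seed(1)
true_mu, true_sd, n, reps = 430, 39.4, 14, 10_000
hits = 0
for _ in range(reps):
    sample = [random.gauss(true_mu, true_sd) for _ in range(n)]
    a, b = t_interval(mean(sample), stdev(sample), n)
    hits += a <= true_mu <= b   # True counts as 1
coverage = hits / reps
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The coverage lands close to 0.95: in the long run about 95% of intervals built this way contain μ, while any single interval, such as (410.855, 456.345), either contains μ or does not.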
Another Example: Calcium in the Diet
• Calcium is the most abundant mineral in the body, and one of the most important. The recommended daily allowance (RDA) for adults is 850 mg/day
• Random sample of 18 people below the poverty level: sample mean intake $\bar{X}$ = 747.4 mg/day
• Do the data support the claim that people below the poverty level have a different calcium intake from the RDA?
Hypothesis Test for Calcium
• Let μ be the mean calcium intake for people below the poverty line
• Null hypothesis is that calcium intake for people below the poverty line is not different from the RDA: μ = μ₀ = 850 mg/day
• Two-sided alternative hypothesis: μ ≠ 850 mg/day
• To calculate the test statistic, we use the sample standard deviation s = 188 mg:
$$T = \frac{747.4 - 850}{188/\sqrt{18}} = -2.315$$
• Need p-value: if μ = 850, what is the probability of getting a sample mean as extreme as (or more extreme than) 747.4?
p-value for Calcium
• We have a two-sided alternative, so the p-value includes the tail probabilities of the t distribution with 17 df on both sides: P(T ≤ -2.315) = 0.017 and P(T ≥ 2.315) = 0.017
• Looking up the probability in the t table, we see that the two-sided p-value is 0.017 + 0.017 = 0.034
• Since the p-value is less than 0.05, we can reject the null hypothesis
• Conclusion: people below the poverty line have significantly (at the α = 0.05 level) different calcium intake than the RDA
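The same calculation for the calcium data, as a standard-library Python sketch. As a stand-in for the t table, the tail probability is approximated by integrating the t density with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helper names, repeated here so the block runs on its own.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|) via Simpson's rule on [0, |t|] (numerical stand-in for a table)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values from the slides
xbar, mu0, s, n, alpha = 747.4, 850, 188, 18, 0.05

T = (xbar - mu0) / (s / sqrt(n))
p_value = 2 * t_upper_tail(T, n - 1)   # two-sided alternative, 17 df

print(f"T = {T:.3f}, two-sided p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The computed p-value is close to the table-based 0.034 on the slide and below 0.05, so the conclusion is the same.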
Confidence Interval for Calcium
• Alternatively, we calculate a confidence interval for the calcium intake of people below the poverty line
• Use confidence level 100·C = 100·(1-α) = 95%
• With 17 df, a 95% confidence level means critical value t* = 2.109
$$\left(747.4 - 2.109\cdot\frac{188}{\sqrt{18}},\ 747.4 + 2.109\cdot\frac{188}{\sqrt{18}}\right) = (653.946, 840.854)$$
• Since our hypothesized value μ₀ = 850 mg is not in the 95% confidence interval, we can reject that hypothesis right away!
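A short Python check of the interval arithmetic, using the slide's critical value t* = 2.109 for 17 df. The helper `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Confidence interval xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

T_STAR_17DF = 2.109   # 95% critical value for 17 df, from the slide
lo, hi = t_interval(747.4, 188, 18, T_STAR_17DF)
mu0 = 850             # hypothesized mean (RDA)

print(f"95% CI: ({lo:.3f}, {hi:.3f})")
print("850 outside CI: reject H0" if not lo <= mu0 <= hi else "850 inside CI")
```

The margin is 2.109 · 188/√18 ≈ 93.45, so the interval is symmetric about 747.4 and stops well short of 850.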
Tests and Intervals
• If our confidence level C is equal to 1 - α, where α is the significance level of the hypothesis test, then we have the following connection between tests and intervals:
A two-sided hypothesis test rejects the null hypothesis (μ = μ₀) if our hypothesized value μ₀ falls outside the confidence interval for μ
• So, if we have already calculated a confidence interval for μ, then we can test any hypothesized value μ₀ just by checking whether or not μ₀ is in the interval!
Cautions about Hypothesis Tests
• Statistical significance does not necessarily mean real (practical) significance
• If the sample size is large, even very small differences can have a low p-value
• Lack of significance does not necessarily mean that the null hypothesis is true
• If the sample size is small, there could be a real difference that we are not able to detect
• Many assumptions went into our hypothesis tests
• The presence of outliers, small sample sizes, etc. make our assumptions less realistic
• We will try to address some of these problems next class
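To illustrate the first caution, this Python sketch holds a hypothetical, practically tiny difference fixed (a sample mean 0.5 above μ₀ with σ = 10; these numbers are invented for illustration) and shows how the two-sided z-test p-value shrinks as the sample size grows. The helper `two_sided_z_p` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_p(diff, sigma, n):
    """Two-sided p-value of a z test for an observed mean difference."""
    z = diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical setting: the same small difference at three sample sizes
diff, sigma = 0.5, 10.0
p_values = {n: two_sided_z_p(diff, sigma, n) for n in (25, 2_500, 250_000)}
for n, p in p_values.items():
    print(f"n = {n:>7}: p-value = {p:.4g}")
```

The difference is identical in every row; only the sample size changes, taking z from 0.25 to 2.5 to 25 and the p-value from about 0.80 to about 0.012 to essentially zero.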
Small Samples
• We have used the sample standard deviation and the t distribution to drop our assumption of a known population SD
• However, even t distribution intervals/tests are not as accurate if the data are skewed or have influential outliers
• Rough guidelines from your textbook:
• Large samples (n > 40): the t distribution can be used even for strongly skewed data or with outliers
• Intermediate samples (n > 15): the t distribution can be used except for strongly skewed data or in the presence of outliers
• Small samples (n < 15): the t distribution can only be used if the data do not have skewness or outliers
• What can we do for small samples of skewed data?
Techniques for Small Samples
• One option: use a log transformation on the data
• Taking the logarithm of the data can often make it look more symmetric
• Another option: non-parametric tests like the sign test
• Not required for this course, but mentioned in the textbook if you’re interested
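A Python sketch of the log-transformation idea, using simulated right-skewed data (a hypothetical lognormal sample with μ = 0 and σ = 1, not course data) and a simple moment-based skewness measure. The helper `skewness` and the seed are illustrative choices.

```python
import random
from math import log
from statistics import mean, pstdev

def skewness(xs):
    """Sample skewness: average cubed standardized deviation."""
    m, sd = mean(xs), pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

random.seed(2)
# Hypothetical right-skewed data: lognormal draws (mu = 0, sigma = 1)
data = [random.lognormvariate(0, 1) for _ in range(500)]
logged = [log(x) for x in data]   # log of lognormal data is normal

print(f"skewness of raw data: {skewness(data):.2f}")
print(f"skewness after log:   {skewness(logged):.2f}")
```

The raw values show strong positive skewness, while the logged values are close to symmetric (skewness near 0), which is what makes t procedures on the log scale more trustworthy for this kind of data.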