"It is a capital mistake to theorize before one has data."
Sir Arthur Conan Doyle
11.1 INFERENCE FOR THE MEAN OF A POPULATION (Pages 586-605)
OVERVIEW: In the last chapter we were given the standard deviation of the
population. In reality, one frequently does not know the standard deviation of the
population from which a random sample was obtained. When sample sizes are
small and the population standard deviation is not known, statisticians make use
of the t-distribution, which is bell-shaped but differs from the normal distribution.
Hence, there is a t-distribution table that differs from the normal distribution table.
As sample sizes get larger, the t-distribution approaches the normal distribution
in shape. In a one-sample test, the number of degrees of freedom (needed to use the
table) is one less than the sample size.
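The claim that the t-distribution approaches the normal distribution can be checked numerically. The sketch below is only an illustration, not part of the textbook method: `t_cdf` and `t_star` are hypothetical helper names, the t density is integrated with Simpson's rule, and t* is found by bisection instead of being read from Table C.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for a t-distribution, via Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Critical value t* with central area `confidence`, found by bisection."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)  # normal critical value for 95% confidence
critical_values = {n: t_star(0.95, n - 1) for n in (5, 10, 30, 100, 1000)}
for n, value in critical_values.items():
    print(f"n = {n:4d}, df = {n - 1:3d}: t* = {value:.3f}  (z* = {z_star:.3f})")
```

The printed t* values shrink toward z* as n grows, which is why the normal table is a reasonable stand-in only for large samples.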
A few important things to note in this situation:
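- When the standard deviation parameter is estimated from a sample, the resulting statistic
  is called the ______________.
  The ______________ of the sample mean $\bar{x}$ = ______________, where n = ______________
  and $s_x$ = ______________.
- If a sample has mean $\bar{x}$, the one-sample t-statistic is
  $t = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$.
- A confidence interval calculated for a t-statistic has the form
  $\bar{x} \pm t^* \dfrac{s_x}{\sqrt{n}}$,
  where t* is the appropriate value from the t-distribution table (Table C in the back of the book).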
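The quantities in the notes above can be written as three small functions. This is a minimal sketch: the function names and the summary statistics are hypothetical, chosen only to keep the arithmetic easy, and t* is typed in from a t table instead of being computed.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic, with n - 1 degrees of freedom."""
    return (xbar - mu0) / standard_error(s, n)

def t_interval(xbar, s, n, t_star):
    """Confidence interval xbar +/- t* * s / sqrt(n)."""
    margin = t_star * standard_error(s, n)
    return xbar - margin, xbar + margin

# Hypothetical summary statistics (not from the worksheet).
xbar, s, n, mu0 = 50.0, 10.0, 25, 46.0
t_star_95 = 2.064  # table value for 95% confidence, df = 24

se = standard_error(s, n)
t = one_sample_t(xbar, mu0, s, n)
low, high = t_interval(xbar, s, n, t_star_95)
print(f"SE = {se}, t = {t}, interval = ({low:.3f}, {high:.3f})")
```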
Assumptions:
1.
2.
3.
Example 1 (Single sample):
An SRS of 75 male adults living in a particular suburb was taken to study the amount of time
they spent per week doing rigorous exercise. It indicated a mean of 73 minutes with a standard
deviation of 21 minutes.
(a) Calculate a 95% confidence interval of the mean for all males in the suburb.
(b) Interpret this interval in words.
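Response to (a):
Assumptions:
$\mu$ = ______________
$\bar{x}$ = ______________
$s_x$ = ______________
C = ______________
dof = ______________
t* = ______________
t-confidence interval = $\bar{x} \pm t^* \dfrac{s_x}{\sqrt{n}}$ = ______________
The desired 95% confidence interval is ______________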
Response to (b):
We are 95% confident that the true mean time of rigorous exercise is in the interval
( ______ , ______ ), since 95% of the confidence intervals from all samples of size 75
from this population will contain the value of the true population mean of rigorous
exercise time. Consequently, we are 95% confident that our interval is one of those that
contains $\mu$.
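The arithmetic for Example 1 can be checked with a few lines of Python. This is a sketch under one stated assumption: many printed t tables have no row for df = 74, so the code uses the next smaller row, df = 60, where t* = 2.000 for 95% confidence. A calculator that uses df = 74 exactly will give a slightly narrower interval.

```python
import math

n, xbar, s = 75, 73.0, 21.0   # summary statistics from Example 1
df = n - 1                     # 74 degrees of freedom

# Assumption: t* is taken from the df = 60 row of a t table
# (the nearest row below df = 74), giving t* = 2.000 for 95% confidence.
t_star = 2.000

se = s / math.sqrt(n)          # standard error of the sample mean
margin = t_star * se           # margin of error
low, high = xbar - margin, xbar + margin
print(f"SE = {se:.3f}, margin = {margin:.3f}, interval = ({low:.2f}, {high:.2f})")
```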
Example 2
The belief is that the mean number of hours per week of part-time work of high school
seniors in a city is 10.6 hours. Data from a SRS of 50 seniors indicated that their mean
number of hours of part-time work was 12.5 with a standard deviation of 1.3. Test
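whether these data cast doubt on the current belief ($\alpha = 0.05$).
$\mu$ = ______________
Assumptions: ______________
We will use a one-sample t-test.
H0: ______________        H0: ______________
Ha: ______________        Ha: ______________
Test statistic:
$t = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$, with n - 1 degrees of freedom.
t = ______________
t = ______________
Using the TI-83, the command to find ______________ is tcdf( ______________ ).
Therefore, tcdf( ______________ ) = ______________
Since this is a 2-tail situation, our P-value is 2( ______________ ) = ______________
With a P-value this small, one would ______________ a null hypothesis stating $\mu$ = 10.6.
In other words, since the P-value is ______________ than $\alpha$, there is sufficient evidence to doubt that the
mean number of hours of part-time work per week for high school seniors in this city is 10.6.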
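The Example 2 calculation can also be checked away from the calculator. The sketch below is an approximation of the TI-83 `tcdf(lower, upper, df)` command written with Simpson's rule; the Python function `tcdf` here is a hypothetical stand-in, and the upper bound of 100 replaces the very large number usually typed on the calculator for "infinity".

```python
import math

def tcdf(lower, upper, df, steps=2000):
    """Area under the t density between lower and upper (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = (upper - lower) / steps  # steps must be even for Simpson's rule
    total = pdf(lower) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lower + i * h)
    return total * h / 3

# Summary statistics and significance level from Example 2.
n, xbar, s, mu0, alpha = 50, 12.5, 1.3, 10.6, 0.05
df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))

# Assumption: 100 stands in for the calculator's "infinity" upper bound.
one_tail = tcdf(abs(t), 100.0, df)
p_value = 2 * one_tail           # two-tail test
print(f"t = {t:.2f}, df = {df}, two-sided P-value = {p_value:.2e}")
print("P-value < alpha:", p_value < alpha)
```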
Facts about the t-distribution
1.
2.
3.
Comparative studies are more convincing than single-sample investigations, which often
leads us to use an alternative design.
Matched Pairs Design – subjects are matched in pairs and each treatment is given to one subject
in each pair, or a before-after observation is made on the same subjects.
Example 3 (Matched Pairs):
Assume that the following data represents a pretest, posttest situation. In a situation such as this,
one calculates a difference column and tests the null hypothesis that the mean of the difference
column is 0. In a 2-tail test, the alternative hypothesis is that the mean of the differences is not
zero.
Individual #   After Score (A)   Before Score (B)   Diff. = (A) - (B)
 1             78                73                 ______
 2             92                86                 ______
 3             68                70                 ______
 4             69                62                 ______
 5             76                72                 ______
 6             80                83                 ______
 7             91                85                 ______
 8             79                76                 ______
 9             86                86                 ______
10             59                54                 ______
Mean of difference column = ______________
s for difference column = ______________
SE = standard error = $\dfrac{s}{\sqrt{n}}$ = ______________
Degrees of freedom = ______________
t-statistic = ______________
Using the TI-83, ______________. In other words,
we conclude that there is about a ______% chance that we would get a difference as large as
3.1 if the numbers in the difference column came from a population with $\mu$ = 0. Since this
probability is ______________, one would probably ______________ a null hypothesis H0: $\mu$ = 0. Since we are
using a 2-tail test, the P-value is 2tcdf( ______________ ) = ______________, or about ______________
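The difference column and the summary quantities for Example 3 can be computed directly from the table. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the P-value step is left to tcdf on the calculator as in the notes.

```python
import math
from statistics import mean, stdev

after  = [78, 92, 68, 69, 76, 80, 91, 79, 86, 59]   # Score (A)
before = [73, 86, 70, 62, 72, 83, 85, 76, 86, 54]   # Score (B)

diffs = [a - b for a, b in zip(after, before)]   # Diff. = (A) - (B)
n = len(diffs)
d_bar = mean(diffs)             # mean of the difference column
s_d = stdev(diffs)              # sample standard deviation (divides by n - 1)
se = s_d / math.sqrt(n)         # standard error of the mean difference
df = n - 1                      # degrees of freedom
t = (d_bar - 0) / se            # test statistic for H0: mean difference = 0

print(diffs)
print(f"mean = {d_bar:.1f}, s = {s_d:.3f}, SE = {se:.3f}, df = {df}, t = {t:.3f}")
```

Working with the single column of differences is what turns the matched pairs design into an ordinary one-sample t problem.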
Rules for using the t-test
Ideally the population will have a normal distribution, but for times when this is not given:
- For sample sizes less than 15, the t-procedures can be used if the data are close to normal. Do not
use t-procedures if the data are clearly non-normal or if outliers are present.
- For sample sizes 15 or greater, t-procedures can be safely used except in the presence of
outliers or strong skewness.
- For sample sizes 40 or greater, t-procedures can be used even if the data are heavily skewed.
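The three sample-size bands above can be summarized as a small decision helper. This is only a rule-of-thumb sketch: the function name and its flags are hypothetical, and deciding whether data are "close to normal" or "strongly skewed" still requires looking at a plot of the data.

```python
def t_procedures_ok(n, close_to_normal=True, outliers=False, strong_skew=False):
    """Rule-of-thumb check following the three sample-size bands above."""
    if n < 15:
        # Small samples: data must look close to normal, with no outliers.
        return close_to_normal and not outliers
    if n < 40:
        # Moderate samples: avoid outliers and strong skewness.
        return not outliers and not strong_skew
    # Large samples: the rule above allows even heavily skewed data.
    return True

print(t_procedures_ok(10, close_to_normal=False))   # False
print(t_procedures_ok(20, strong_skew=True))        # False
print(t_procedures_ok(50, strong_skew=True))        # True
```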
1. A SRS of 40 packages of light bulbs indicated that the mean number of defective
bulbs in each package was 0.79 with a standard deviation of 0.2. The packages
contained 4 bulbs each.
a. Define the parameter of interest.
b. Construct a 90% confidence interval based on these data. List your assumptions.
c. Interpret this interval in words.
2. The choice between a z-test and a t-test for a population mean depends primarily on:
(a) the sample size
(b) the level of significance
(c) whether a 1 or 2 tailed test is indicated
(d) whether $s_x$ or $\sigma$ is given
3. The P-value of a significance test is the probability that:
(A) the decision resulting from the test is correct
(B) 95% of the confidence intervals will contain the parameter of interest
(C) the null hypothesis is true
(D) the alternative hypothesis is true
(E) none of these describes the P-value
4. If the P-value of a test is less than the level of significance, then which of the following is a correct
conclusion?
(A) The value of the test statistic is in the rejection region for the test
(B) The sample size should be increased to decrease the margin of error
(C) The null hypothesis is true
(D) The corresponding confidence interval will contain the hypothesized value of the
parameter in the null hypothesis
(E) None of these is a valid conclusion
5. A bank makes an offer of no credit fee to an SRS of 40 customers. It then compares the amount
customers charged this year with no credit fee to last year’s charged amounts with the credit fee. The
mean increase is $332 and the standard deviation is $108.
a) Is there significant evidence at the 1% level that the mean amount charged increases under the no
credit fee offer?
b) Back up your test with a confidence interval.
a)
H0: ______________
Ha: ______________
b)
6. The guidance office of a school wants to test the claim of an SAT test preparation company that
students who complete their course will improve their SAT Math score by at least 50 points. Ten
members of the junior class who have had no SAT preparation but have taken the SAT once were
selected at random and agreed to participate in the study. All took the course and re-took the SAT at the
next opportunity. The results of the test indicated:
-----------------------------------------------------------------------
Student    1     2     3     4     5     6     7     8     9     10
-----------------------------------------------------------------------
Before     475   512   492   465   523   560   610   477   501   420
-----------------------------------------------------------------------
After      500   540   512   530   533   603   691   512   489   458
-----------------------------------------------------------------------
Is there sufficient evidence to support the prep course company’s claim that scores will improve by at
least 50 points at the 5% level of significance?
Let’s use the TI-83 to run our test and confirm our answers in exercise #5:
Stat/Tests/2: T-Test
Highlight Stats in order to manually load the data needed
Whoomp, there it is!
Will the TI-83 also CONFIRM my confidence interval?
Stat/Tests/8: TInterval
Highlight Stats to manually input the information needed
Voila, we have the interval
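For anyone without a TI-83 at hand, the two calculator commands can be imitated from summary statistics. This is a sketch, not the calculator's actual algorithm: `t_area`, `t_test_stats`, and `t_interval_stats` are hypothetical names, areas come from Simpson's rule, and t* comes from bisection. Exercise 5 is read here as H0: $\mu$ = 0 against Ha: $\mu$ > 0 for the mean increase, with a 99% interval; both readings are assumptions.

```python
import math

def t_area(lower, upper, df, steps=2000):
    """Area under the t density between lower and upper (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = (upper - lower) / steps  # steps must be even for Simpson's rule
    total = pdf(lower) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lower + i * h)
    return total * h / 3

def t_test_stats(mu0, xbar, s, n):
    """Imitates T-Test with Stats input; returns t, df, upper-tail P-value."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    # Assumption: |t| + 100 stands in for an infinite upper bound.
    return t, df, t_area(t, abs(t) + 100.0, df)

def t_interval_stats(xbar, s, n, confidence):
    """Imitates TInterval with Stats input; t* is found by bisection."""
    df = n - 1
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * t_area(0.0, mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    margin = ((lo + hi) / 2) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Summary statistics from exercise 5 (mean increase 332, s = 108, n = 40).
t, df, p_value = t_test_stats(0.0, 332.0, 108.0, 40)
low, high = t_interval_stats(332.0, 108.0, 40, 0.99)  # 99% level is an assumption
print(f"t = {t:.2f}, df = {df}, upper-tail P-value = {p_value:.2e}")
print(f"99% interval for the mean increase: ({low:.1f}, {high:.1f})")
```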