4 – Inference Using t-Distributions
(Ch. 2 of text)
4.1 - Estimation of a Population Mean (μ)
In your introductory statistics course(s) you should have examined confidence intervals
as a means of estimating population parameters, e.g. the population mean (μ). The
confidence interval for the population mean is summarized below.

100(1 − α)% Confidence Interval for μ (e.g. α = .05 gives 95% confidence)

The basic form of a confidence interval is as follows:

(estimate) ± (table value) × SE(estimate)

For the population mean (μ) we have

$$\bar{X} \pm (t \text{ table value}) \cdot SE(\bar{X}) \qquad \text{or} \qquad \bar{X} \pm t\,\frac{s}{\sqrt{n}}$$
The appropriate columns in the t-distribution table (Appendix A, Table A.2) for the
different confidence intervals are as follows:
90% Confidence look in the .95 column (if n is “large” we can use 1.645)
95% Confidence look in the .975 column (if n is “large” we can use 1.960)
99% Confidence look in the .995 column (if n is “large” we can use 2.576)
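The same interval can be computed outside of JMP. The following Python sketch (not part of the original notes, with made-up data values) mirrors the formula above, using scipy's t quantiles in place of Table A.2.

```python
import numpy as np
from scipy import stats

# Hypothetical sample values, for illustration only
x = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 12.7, 9.9])

n = len(x)
xbar = x.mean()                  # estimate of the population mean
se = x.std(ddof=1) / np.sqrt(n)  # SE(X-bar) = s / sqrt(n)

conf = 0.95
# For 95% confidence this is the .975 quantile (the ".975 column" of Table A.2)
t_table = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)

lower, upper = xbar - t_table * se, xbar + t_table * se
print(f"{conf:.0%} CI for the mean: ({lower:.2f}, {upper:.2f})")
```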
4.2 – Review of the Basic Steps in a Hypothesis Test
Before we look at hypothesis testing for a single population mean we will examine the
five basic steps in a hypothesis test and introduce some important terminology and
concepts.
Steps in a Hypothesis Test
1.
2.
3.
4.
5.
4.3 - Hypothesis Test for a Single Population Mean (μ)
Null Hypothesis (H₀)        Alternative Hypothesis (Hₐ)        p-value area
H₀: μ = μ₀                  Hₐ: μ > μ₀                         Upper-tail
H₀: μ = μ₀                  Hₐ: μ < μ₀                         Lower-tail
H₀: μ = μ₀                  Hₐ: μ ≠ μ₀                         Two-tailed (perform test using CI for μ)
Test Statistic (in general)
In general the basic form of most test statistics is given by:

$$\text{Test Statistic} = \frac{(\text{estimate}) - (\text{hypothesized value})}{SE(\text{estimate})} \qquad \text{(think “z-score”)}$$
which measures the discrepancy between the estimate from our sample and the
hypothesized value under the null hypothesis.
Intuitively, if our sample-based estimate is “far away” from the hypothesized value
assuming the null hypothesis is true, we will reject the null hypothesis in favor of the
alternative or research hypothesis. Extreme test statistic values occur when our estimate
is a large number of standard errors away from the hypothesized value under the null.
The p-value is the probability that, by chance variation alone, we would get a test statistic
as extreme or more extreme than the one observed, assuming the null hypothesis is true.
If this probability is “small” then we have evidence against the null hypothesis; in other
words, we have evidence to support our research hypothesis.
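The notes find p-values with the t-Probability Calculator in JMP; as a rough equivalent (illustrative numbers only, not taken from any example here), the tail areas can be computed directly from a t-distribution in Python:

```python
from scipy import stats

t_stat, df = 2.10, 24  # hypothetical test statistic and degrees of freedom

p_upper = stats.t.sf(t_stat, df)            # upper-tail: P(T >= t)
p_lower = stats.t.cdf(t_stat, df)           # lower-tail: P(T <= t)
p_two   = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed
print(p_upper, p_lower, p_two)
```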
Type I and Type II Errors (α & β)

                              Truth
Decision                H₀ true                 Hₐ true
Reject H₀
Fail to Reject H₀
Example: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA
EPA guidelines suggest that drinking water should not have a perchlorate level exceeding
4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface,
and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road
flares in the Morgan Hill area from 1955 to 1996, is the source of the perchlorate
contamination in this area.
Suppose you are a resident of the Morgan Hill area. Which alternative do you want well
testers to use, and why?
H₀: μ ≤ 4 ppb
Hₐ: μ > 4 ppb
or
H₀: μ ≥ 4 ppb
Hₐ: μ < 4 ppb
Test Statistic for Testing a Single Population Mean (μ) ~ (t-test)

$$t = \frac{\bar{X} - \mu_0}{SE(\bar{X})} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t\text{-distribution with } df = n - 1.$$
Assumptions:
When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when
our sample size is sufficiently “large”. How large the sample size needs to be
depends upon how “non-normal” the population distribution is.
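For reference alongside the JMP workflow used below, here is a minimal one-sample t-test sketch in Python; the data array and null value are placeholders, not the course data.

```python
import numpy as np
from scipy import stats

# Placeholder sample; substitute the data of interest
x = np.array([15.2, 11.8, 16.4, 13.1, 14.7, 12.9, 15.8, 13.6])

mu_0 = 14.0  # hypothesized population mean under H0

# Lower-tailed test of H0: mu = mu_0 vs Ha: mu < mu_0
t_stat, p_value = stats.ttest_1samp(x, popmean=mu_0, alternative="less")

print(f"t = {t_stat:.3f}, df = {len(x) - 1}, p-value = {p_value:.4f}")
```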
Example 1: Length of Stay in a Nursing Home (Datafile: LOS.JMP)
In the past the average number of nursing home days required by elderly patients before
they could be released to home care was 17 days. It is hoped that a new program will
reduce this figure. Do these data support the research hypothesis?
Data (length of stay, in days): 3, 5, 12, 7, 22, 6, 2, 18, 9, 8, 20, 15, 3, 36, 38, 43, 43

• Normality does not appear to be satisfied here!
• Notice the CI for the mean length of stay is (8.38 days, 22.49 days).
Hypothesis Test:
1) H₀:                         Hₐ:
2) Choose α
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
To perform a t-test in JMP, select Test Mean from the LOS pull-down menu and enter
the value for the mean under the null hypothesis, 17.0 in this example.
Conclusion:
In JMP: The hypothesized mean is the value for the population mean under the
null hypothesis. If normality is questionable or the sample size is small,
a nonparametric test may be more appropriate. We will discuss
nonparametric tests later in the course.
The graphic on the left is obtained by selecting P-value animation from the
pull-down menu next to Test Mean=value. Click Low Side for a lower-tail test;
similarly for the other two types of alternatives.
4.4 - Comparing Two Population Means Using Dependent or
Paired Samples (Section 2.2.4 pgs. 35-37)
When using dependent samples each observation from population 1 has a one-to-one
correspondence with an observation from population 2. One of the most common cases
where this arises is when we measure the response on the same subjects before and after
treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes
we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race,
gender, socio-economic status, height, weight, etc... to control for the influence these
characteristics might have on the response of interest. When this is done we say that we
are “controlling for the effects of race, gender, etc...”. By using matched-pairs of subjects
we are in effect removing the effect of potential confounding factors, thus giving us a
clearer picture of the difference between the two populations being studied.
DATA FORMAT

Matched Pair     X_1i      X_2i      d_i = X_1i − X_2i
1                X_11      X_21      d_1
2                X_12      X_22      d_2
3                X_13      X_23      d_3
...              ...       ...       ...
n                X_1n      X_2n      d_n

For the sample paired differences (the d_i's), find the sample mean (d̄) and standard deviation (s_d).
The general hypotheses are
H₀: μ_d = μ₀
Hₐ: μ_d > μ₀   or   Hₐ: μ_d < μ₀   or   Hₐ: μ_d ≠ μ₀
Note: While 0 is usually used as the hypothesized mean
difference under the null, we can actually hypothesize any
size difference for the mean of the paired differences that
we want. For example, if we wanted to show that a certain diet
resulted in at least a 10 lb. decrease in weight, then we
could test whether the paired differences d = (initial weight) −
(after-diet weight) have mean greater than 10
(Hₐ: μ_d > 10 lbs.).
Test Statistic for a Paired t-Test

$$t = \frac{(\text{estimate of mean paired difference}) - (\text{hypothesized mean difference})}{SE(\text{estimate})} = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}} \sim t\text{-distribution with } df = n - 1$$

where μ₀ = the hypothesized value for the mean paired difference under the null hypothesis.

100(1 − α)% CI for μ_d:

$$\bar{d} \pm t\,\frac{s_d}{\sqrt{n}}$$

where t comes from the appropriate quantile of the t-distribution with df = n − 1.
This interval has a 100(1 − α)% chance of covering the true mean paired difference.
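A paired t-test is just a one-sample t-test applied to the differences, which is easy to sketch in Python; the before/after arrays and the hypothesized difference of 10 below are hypothetical, echoing the diet note above.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same n subjects
before = np.array([210.0, 195.0, 230.0, 185.0, 200.0, 221.0])
after  = np.array([196.0, 188.0, 214.0, 176.0, 187.0, 205.0])

d = before - after   # paired differences d_i
mu_0 = 10            # hypothesized mean paired difference under H0

# Upper-tailed paired test of H0: mu_d = 10 vs Ha: mu_d > 10
t_stat, p_value = stats.ttest_1samp(d, popmean=mu_0, alternative="greater")

# 95% CI for the mean paired difference: d-bar +/- t * s_d / sqrt(n)
n = len(d)
se = d.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (d.mean() - t_crit * se, d.mean() + t_crit * se)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```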
Example: Effect of Captopril on Blood Pressure (Datafile: Blood.JMP)
In order to estimate the effect of the drug Captopril on blood pressure (both systolic and
diastolic), the drug is administered to a random sample of n = 15 subjects. Each subject's
blood pressure was recorded before taking the drug and then 30 minutes after taking the
drug. The data are shown below.
Syspre – initial systolic blood pressure
Syspost – systolic blood pressure 30 minutes after taking the drug
Diapre – initial diastolic blood pressure
Diapost – diastolic blood pressure 30 minutes after taking the drug
Research Questions:
• Is there evidence to suggest that Captopril results in a systolic blood pressure
decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
• Is there evidence to suggest that Captopril results in a diastolic blood pressure
decrease of at least 5 mmHg on average in patients 30 minutes after taking it?
For each blood pressure we need to consider paired differences of the form
d_i = BPpre_i − BPpost_i. For paired differences defined this way, positive values
correspond to a reduction in blood pressure ½ hour after taking Captopril. To
answer the research questions above we need to conduct the following hypothesis tests:
H₀: μ_(syspre − syspost) ≤ 10 mmHg          and          H₀: μ_(diapre − diapost) ≤ 5 mmHg
Hₐ: μ_(syspre − syspost) > 10 mmHg                       Hₐ: μ_(diapre − diapost) > 5 mmHg
Below are the relevant statistical summaries of the paired differences for both blood
pressure measurements.
The t-statistics for both tests are given below.
Systolic BP
Diastolic BP
We can use the t-Probability Calculator in JMP to find the associated p-values or better
yet use JMP to conduct the entire t-test.
Systolic Blood Pressure
Diastolic Blood Pressure
Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to
suggest that taking Captopril will result in a mean decrease in systolic blood pressure
exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure
exceeding 5 mmHg (p = _______). Furthermore, we estimate that the mean change in
systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg,
and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
4.5 – Comparing Two Pop. Means Using Independent Samples
(Section 2.3 pgs. 37 – 44)
Example 1: Prior Knowledge of Instructor and Lecture Rating
(Datafile: Instructor Rating Study)
How powerful are rumors? Frequently, students ask friends and/or look at instructor
evaluations to decide if a class is worth taking. Kelley (1950) found that instructor
reputation has a profound impact on actual teaching ratings. Towler and Dipboye (1998)
replicated and extended this study by asking: “Does an instructor's prior reputation affect
student ratings?”
Towler, A., & Dipboye, R. L. (1998). “The effect of instructor reputation and need for cognition on student
behavior”
Experimental Design:
Subjects were randomly assigned to one of two conditions. Before viewing the lecture,
students were given a summary of the instructor's prior teaching evaluations. There were
two conditions: Charismatic instructor and Punitive instructor.
Summary given in the "Charismatic instructor" condition:
Frequently at or near the top of the academic department in all teaching categories. Professor S was always
lively and stimulating in class, and commanded respect from everyone. In class, she always encouraged
students to express their ideas and opinions, however foolish or half-baked. Professor S was always
innovative. She used differing teaching methods and frequently allowed students to experiment and be
creative. Outside the classroom, Professor S was always approachable and treated students as individuals.
Summary given in the "Punitive instructor" condition:
Frequently near the bottom of the academic department in all important teaching categories. Professor S did
not show an interest in students' progress or make any attempt to sustain student interest in the subject.
When students asked questions in class, they were frequently told to find the answers for themselves. When
students felt they had produced a good piece of work, very rarely were they given positive feedback. In
fact, Professor S consistently seemed to grade students harder than other lecturers in the department.
Then all subjects watched the same twenty-minute lecture given by the exact same lecturer. Following the
lecture, subjects rated the lecturer. Subjects answered three questions about the leadership qualities of the
lecturer. A summary rating score was computed and used as the variable "rating" here.
Research Question: Does an instructor's prior reputation affect student ratings of a lecture
given by a professor?
Summary Statistics
Charismatic condition: x̄_C = 2.613, s_C = .533, n_C = 25
Punitive condition: x̄_P = 2.236, s_P = .543, n_P = 24
Intuitive Decision
In order to determine whether or not the null or alternative hypothesis is true, you could
review the summary statistics for the variable you are interested in testing across the two
groups. Remember, these summary statistics and/or graphs are only for the observations you
sampled; to make decisions about all observations of interest, we must apply some
inferential technique (i.e. hypothesis tests or confidence intervals).
One of the best graphical displays for this situation is side-by-side boxplots. To get
side-by-side boxplots, select Analyze > Fit Y by X. Place Prior Info in the X box and
Rating in the Y box. Place mean diamonds & histograms on the plot; we may also
want to jitter the points. The more separation there is in the mean diamonds, the more
likely we are to reject the null hypothesis (i.e. the data tend to support the alternative
hypothesis).
To answer the question of interest formally we need inferential tools for comparing the
mean rating given to the lecture when students are told the professor is a charismatic
individual vs. the mean rating given when students are given the punitive prior opinion of
the instructor, i.e. compare μ_charismatic to μ_punitive.
Hypothesis Testing (μ₁ vs. μ₂)
The general null hypothesis says that the two population means are equal, or equivalently
that their difference is zero. The alternative or research hypothesis can be any one of the
three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can
perform the test by using a confidence interval for the difference in the population means
and determining whether 0 is contained in the confidence interval.
H₀: μ₁ = μ₂, or equivalently (μ₁ − μ₂) = hypothesized difference (typically 0)
Hₐ: μ₁ > μ₂, or equivalently (μ₁ − μ₂) > hypothesized difference (upper-tail)
or
Hₐ: μ₁ ≠ μ₂, or equivalently (μ₁ − μ₂) ≠ hypothesized difference (two-tailed, USE CI!)
etc....
Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\text{hypothesized difference})}{SE(\bar{X}_1 - \bar{X}_2)} \sim t\text{-distribution with appropriate degrees of freedom}$$

where SE(X̄₁ − X̄₂) and the degrees of freedom for the t-distribution come from one of
the two cases described below.
Confidence Interval for the Difference in the Population Means

100(1 − α)% Confidence Interval for (μ₁ − μ₂):

$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$

where t comes from the t-table with appropriate degrees of freedom (see the two cases below).
Case 1 ~ Equal Population Variances/Standard Deviations (σ₁² = σ₂² = σ², the common variance to both populations)

Rule of Thumb for Checking Variance Equality: If the larger sample variance is more
than twice the smaller sample variance, do not assume the variances are equal.
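As a tiny illustration of this rule of thumb, using the sample standard deviations reported earlier for the instructor-rating example (s_C = .533, s_P = .543):

```python
# Sample standard deviations from the instructor-rating example above
s_c, s_p = 0.533, 0.543

variance_ratio = max(s_c, s_p) ** 2 / min(s_c, s_p) ** 2
# Rule of thumb: if the larger sample variance is more than twice the smaller,
# do not assume equal variances (use the unequal-variance t-test instead)
equal_variances_ok = variance_ratio <= 2
print(f"variance ratio = {variance_ratio:.2f}, assume equal variances: {equal_variances_ok}")
```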
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are “large”.
Case 1 – Equal Variances (cont’d)
Assuming the assumptions listed above are all satisfied we have the following for the
standard error of the difference in the sample means.
$$SE(\bar{X}_1 - \bar{X}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \ \text{ if } n_1 \neq n_2 \qquad \text{or} \qquad s_p^2 = \frac{s_1^2 + s_2^2}{2} \ \text{ if } n_1 = n_2$$

s_p² is called the “pooled estimate of the common variance (σ²)”. The degrees of
freedom for the t-distribution in this case is df = n₁ + n₂ − 2.
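These pooled formulas can be checked outside of JMP using only the summary statistics reported earlier for the instructor-rating example (x̄_C = 2.613, s_C = .533, n_C = 25; x̄_P = 2.236, s_P = .543, n_P = 24). The Python sketch below computes s_p², the SE, df, and a two-tailed p-value, and cross-checks the result with scipy's summary-statistics version of the pooled t-test.

```python
import numpy as np
from scipy import stats

# Summary statistics from the instructor-rating example
xbar_c, s_c, n_c = 2.613, 0.533, 25   # Charismatic condition
xbar_p, s_p, n_p = 2.236, 0.543, 24   # Punitive condition

# Pooled estimate of the common variance and SE of the difference in means
sp_sq = ((n_c - 1) * s_c**2 + (n_p - 1) * s_p**2) / (n_c + n_p - 2)
se_diff = np.sqrt(sp_sq * (1 / n_c + 1 / n_p))

df = n_c + n_p - 2
t_stat = (xbar_c - xbar_p) / se_diff        # hypothesized difference = 0
p_two = 2 * stats.t.sf(abs(t_stat), df)     # two-tailed p-value

# Same pooled test computed directly from the summary statistics
t_check, p_check = stats.ttest_ind_from_stats(xbar_c, s_c, n_c,
                                              xbar_p, s_p, n_p,
                                              equal_var=True)

print(f"t = {t_stat:.3f}, df = {df}, two-tailed p = {p_two:.4f}")
print(f"scipy check: t = {t_check:.3f}, p = {p_check:.4f}")
```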
Example 1: Prior Knowledge of Instructor and Lecture Rating (cont’d)
Case 1 – Equal Variances
To perform the “pooled t-Test” select the
Means/Anova/Pooled t option from the
Oneway Analysis pull-down menu.
Case 2 – Unequal Variances
If you do not want to assume the population
variances are equal, then select the t Test
option.
To formally test whether we can assume the
population variances are equal, select UnEqual
Variances from the pull-down menu.
t-Test Results from JMP
Discussion:
In the previous example we chose to use a pooled t-test assuming the population
variances were equal based upon the visual evidence and applying the “rule of thumb”.
To formally test this assumption, choose the UnEqual Variances option from the
Oneway Analysis pull-down menu. The results are shown below.
Interpretation of Results
Example 2: Normal Human Body Temperatures Females vs. Males
(Datafile: Bodytemp.JMP)
Do men and women have the same normal body temperature? Putting this into a
statement involving parameters that can be tested:
H₀: μ_F = μ_M, or (μ_F − μ_M) = 0
Hₐ: μ_F ≠ μ_M, or (μ_F − μ_M) ≠ 0
μ_F = mean body temperature for females.
μ_M = mean body temperature for males.
Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.
Checking Assumptions
Assessing Normality of the Two Sampled Populations (Assumption 2)
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu as shown below.
Normality appears to
be satisfied here.
Checking the Equality of the Population Variances
To test the equality of the population variances select Unequal Variances from the
Oneway Analysis pull-down menu.
The test is:
H₀: σ_F = σ_M
Hₐ: σ_F ≠ σ_M
JMP gives four different tests for examining the equality of population variances. To use
the results of these tests, simply examine the resulting p-values. If any/all are less than .10
or .05, then worry about the assumption of equal variances and use the unequal-variance
t-Test instead of the pooled t-Test.
p-values for testing variances
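JMP reports four variance tests, and not all of them have direct counterparts in other software. As a hedged sketch, Levene's test (with median centering, i.e. the Brown-Forsythe version) and Bartlett's test can be run in Python on the two samples; the arrays below are placeholders, not the body-temperature data.

```python
import numpy as np
from scipy import stats

# Placeholder samples for the two groups (substitute the actual data)
females = np.array([98.4, 98.8, 98.2, 99.0, 98.6, 98.7, 98.3])
males   = np.array([98.1, 98.5, 97.9, 98.6, 98.3, 98.0, 98.2])

# H0: sigma_F = sigma_M vs Ha: sigma_F != sigma_M
lev_stat, lev_p = stats.levene(females, males, center="median")  # Brown-Forsythe version
bart_stat, bart_p = stats.bartlett(females, males)

# Large p-values (> .10 or .05) give no evidence against equal variances
print(f"Levene p = {lev_p:.3f}, Bartlett p = {bart_p:.3f}")
```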
Example 2: Normal Human Body Temperatures Females vs. Males (cont’d)
To perform the two-sample t-Test for independent samples:
• assuming equal population variances, select the Means/Anova/Pooled t option
from the Oneway-Analysis pull-down menu.
• assuming unequal population variances, select t-Test from the Oneway-Analysis
pull-down menu.
Because we have no evidence
against the equality of the
population variances
assumption we will use a
pooled t-Test to compare the
population means.
Several new boxes of output will appear below the graph once the appropriate option has
been selected, some of which we will not concern ourselves with. The relevant box for us,
labeled t-Test, is shown below for the mean body temperature comparison.
Because we have concluded
that the equality of variance
assumption is reasonable for
these data we can refer to the
output for the t-Test assuming
equal variances.
• What is the test statistic value for this test?
• What is the p-value?
• What is your decision for the test?
• Write a conclusion for your findings.
• Interpretation of the CI for (μ_F − μ_M)
Case 2 - Unequal Population Variances/Standard Deviations (σ₁ ≠ σ₂)
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
(This can be formally tested, or use the rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are “large”.
Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \sim t\text{-distribution with } df \text{ given by the formula below}$$

where SE(X̄₁ − X̄₂) is as defined below.

100(1 − α)% Confidence Interval for (μ₁ − μ₂):

$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$

where

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{rounded down to the nearest integer.}$$

The t-quantiles are the same as those we have seen previously.
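A sketch of this unequal-variance (Welch) test in Python, with placeholder samples; scipy's equal_var=False option uses the same Satterthwaite df formula shown above (without the rounding down).

```python
import numpy as np
from scipy import stats

# Placeholder samples from the two groups
group1 = np.array([17.2, 18.1, 16.5, 19.0, 17.8, 18.4, 16.9])
group2 = np.array([12.1, 11.8, 12.6, 11.5, 12.9, 12.3, 11.9, 12.0])

# Welch's t-test: variances are NOT pooled
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Satterthwaite degrees of freedom, as in the formula above (before rounding down)
n1, n2 = len(group1), len(group2)
v1, v2 = group1.var(ddof=1) / n1, group2.var(ddof=1) / n2
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f}")
```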
Example: Cell Radii of Malignant vs. Benign Breast Tumors
(Datafile: Breast-Diag.JMP)
These data come from a study of breast tumors conducted at the University of
Wisconsin-Madison. The goal was to determine if malignancy of a tumor could be established by
using shape characteristics of cells obtained via fine needle aspiration (FNA) and
digitized scanning of the cells. The sampled tumor cells were examined under an
electron microscope and a variety of cell shape characteristics were measured.
One of the goals of the study was to determine which cell characteristics are most useful
for discriminating between benign and malignant tumors.
The variables in the data file are:
• ID = patient identification number (not used)
• Diagnosis = diagnosis determined by biopsy: B = benign or M = malignant
• Radius = radius (mean of distances from center to points on the perimeter)
• Texture = texture (standard deviation of gray-scale values)
• Smoothness = smoothness (local variation in radius lengths)
• Compactness = compactness (perimeter^2 / area - 1.0)
• Concavity = concavity (severity of concave portions of the contour)
• Concavepts = concave points (number of concave portions of the contour)
• Symmetry = symmetry (measure of symmetry of the cell nucleus)
• FracDim = fractal dimension ("coastline approximation" - 1)
Medical literature citations:
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17, No. 2, pages 77-87, April 1995.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 1995;130:511-516.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26:792-796, 1995.
See also:
http://www.cs.wisc.edu/~olvi/uwmp/mpml.html
http://www.cs.wisc.edu/~olvi/uwmp/cancer.html
In this example we focus on the potential differences in the cell radius between benign
and malignant tumor cells.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of
the benign tumors. The summary statistics support this, with sample means/medians of
roughly 17 and 12 units, respectively. The 95% CI's for the mean cell radius for the two
tumor groups do not overlap, which further supports that a significant difference in the cell
radii exists.
Testing the Equality of Population Variances
Because we conclude that the population variances are unequal, we should use the non-pooled version of the two-sample t-test. No one does this by hand, so we will use JMP.
Conclusion: