Introduction to Hypothesis Testing Using the Normal Distribution, the T Test and SAS®
Arthur L. Carpenter
California Occidental Consultants
Key Words: normal, t test, hypothesis, statistics, testing
The distribution that a sample follows is often due to characteristics of the population being sampled. The number of times a particular outcome occurs in a series of rolls of a die or flips of a coin follows the binomial distribution, while classroom grades often follow a bell-shaped, or normal, distribution (curve). Prior knowledge of these distributions and of the characteristics of the populations from which the samples are drawn can be very useful to the researcher.
Introduction
The basic statistical concepts of hypothesis testing using the normal distribution and the t test will be presented to the non-statistician. The workshop is designed to assist the manager and researcher who must, in the course of events, come into contact with statisticians and statistical analyses. Within the last few years, the study and application of statistics has become a very specialized field and, as with many specialties, it has developed its own jargon and ways of doing things.
A large number of experiments result in values that are continuous. The population density per square mile could be any value from zero to several thousand, including non-integers. The distribution of people's heights is also not discrete. Height and density must be measured on a continuous scale and result in continuous probability distributions. One of the most important distributions in the field of statistics is the normal distribution. Its graph, the bell-shaped normal curve, is often well known even by people who know little or nothing about statistics. This distribution, which depends on two parameters (mean and variance), has been found to describe countless experimental samples.
Managers and many researchers are not always in a position to keep up with the application of the latest statistical techniques. Many of them may have had a statistics class or two during their college career; however, one or two courses of undergraduate statistics does not a statistician make. The manager must therefore interface with statisticians and/or the statistical results of analyses without sufficient tools to realize the maximum benefit. This workshop will supply the manager or researcher with the vocabulary and jargon necessary to communicate with statisticians and the tools to read and understand basic computer-generated output from statistical analyses.
[Figure: Two normal curves with equal variances and unequal means.]
The Normal Probability Distribution
Probability distributions play an important role, not only in understanding the relationships within the data, but also as part of the testing of hypotheses. When taking samples from a population, usually some outcomes are more likely than others. A plot of the likelihood or probability of each of the outcomes gives a curve known as the probability distribution.
[Figure: Two normal curves with equal means and unequal variances.]
It is not unusual for samples taken in field or laboratory experiments to follow a known probability distribution.
[Figure: Two normal curves with unequal means and unequal variances.]
When taking large samples from a normally distributed population, it is reasonable to treat the standardized sample mean as normally distributed, even when the population variance is estimated from the sample. This is not the case for smaller samples. As the sample size (N) drops below about 30, the standardized sample mean more closely follows the t distribution.
We need a way to standardize the many possible normal curves so that they can be compared to each other as well as to tables of probabilities. The distribution of a normal random variable with a mean of 0 and a variance of 1 is known as the standard normal distribution. Any normal distribution can be transformed into a standard normal by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{x - \text{mean}}{\text{standard deviation}}$$

The t distribution looks very much like the normal; however, it has heavier tails and an additional parameter, the degrees of freedom. As the degrees of freedom decrease, the distribution becomes wider and flatter. The degrees of freedom for a given sample is one less than the sample size (d.f. = N - 1). The degrees of freedom (DF) allow the test to account for the uncertainty introduced when the population variance is estimated by the sample variance.
The probability of a given t value is determined in much the same way as it is for a normal distribution. Tables are available in most statistics books, and SAS has a built-in function, PROBT, that calculates the probability directly.
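To see the heavier tails numerically, the sketch below compares the probability of landing more than 2 units from the center for several degrees of freedom with the corresponding normal value. It is written in Python rather than SAS and assumes Python 3.8 or later (for `statistics.NormalDist`). Because the Python standard library has no t distribution, it approximates the t cumulative probability (the quantity PROBT returns) by integrating the t density with Simpson's rule; `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Hypothetical helper: density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=10_000):
    """Hypothetical helper: P(T <= t) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


# Two-tailed probability of a value more than 2 units from the center.
for df in (4, 14, 29):
    print(f"t, d.f.={df:2d}: {2 * (1 - t_cdf(2.0, df)):.4f}")
print(f"normal:      {2 * (1 - NormalDist().cdf(2.0)):.4f}")
```

As the degrees of freedom grow, the t tail probability shrinks toward the normal value of about 0.0455.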
The transformation to a standard normal can be done using PROC STANDARD, or it can be done in the DATA step.
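The same transformation that PROC STANDARD or a DATA step applies can be sketched in Python (an illustration, not SAS code). The sketch assumes the sample standard deviation (divisor N - 1) is used and borrows the non-union YEARS values from Example 5 purely as sample input; `standardize` is a hypothetical helper name.

```python
from statistics import mean, stdev


def standardize(values):
    """Hypothetical helper: convert each value to Z = (x - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, divisor N - 1
    return [(x - m) / s for x in values]


years = [19, 21, 30, 25, 21, 23]  # non-union YEARS values from Example 5
z_scores = standardize(years)
print([round(z, 3) for z in z_scores])
# After standardizing, the values have mean 0 and standard deviation 1.
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))
```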
The probability levels associated with most tests of hypotheses are based on the area under the normal curve. This area can be equated to probability because the total area under the curve is 1, so the area over any interval lies between 0 and 1, just as a probability must.
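The link between area and probability can be checked numerically. In this Python sketch (a stand-in for SAS's PROBNORM, assuming Python 3.8+ for `statistics.NormalDist`), the area under the standard normal density between -1 and 1, computed with Simpson's rule, matches the difference of two cumulative probabilities, and the total area is 1. The helper names `normal_pdf` and `area` are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x):
    """Hypothetical helper: standard normal density (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def area(f, a, b, steps=1000):
    """Hypothetical helper: Simpson's rule area under f between a and b (steps even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


std_normal = NormalDist()  # standard normal distribution
by_area = area(normal_pdf, -1.0, 1.0)
by_cdf = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(round(by_area, 4), round(by_cdf, 4))
print(round(area(normal_pdf, -8.0, 8.0), 6))  # essentially the whole curve
```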
The formula for the T statistic is similar to that of the Z:

$$T = \frac{\text{sample mean} - \text{population mean}}{\text{STD}/\sqrt{n}}$$

where STD is the sample standard deviation and n is the sample size.
The probability tables found in the back of most basic statistics books are based on the standard normal curve. Fortunately, SAS users need not look up values in tables, since SAS provides the tables in the form of functions (for example, PROBNORM for the normal distribution and PROBT for the t distribution).
Example 1
The average time to failure for a disk drive is 2000 hours, with a standard deviation of 300 hours. Assuming that times to failure are normally distributed, find the probability that a disk drive will fail in less than 1200 hours.

DATA _NULL_;
   Z = (1200 - 2000) / 300;
   PR = PROBNORM(Z);
   PUT Z= PR=;
RUN;

(Z=-2.667 PR=0.00383)

Example 2
A manufacturer of computer hard disks collected failure time information on 15 of its disks. If the sampled disks had an average failure time of 2000 hours and a standard deviation of 300 hours, find the probability that the true mean time to failure is less than 1200 hours. Compare the results to those obtained in Example 1.

DATA _NULL_;
   DF = 15 - 1;
   T = (1200 - 2000) / (300/SQRT(15));
   PR = PROBT(T,DF);
   PUT T= PR=;
RUN;

(T=-10.33 PR=3.13E-8)
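For readers without SAS, the two calculations can be reproduced in Python. This is a sketch, not the author's method: `NormalDist().cdf` (Python 3.8+) plays the role of PROBNORM, and because the standard library has no t distribution, the hypothetical `t_cdf` helper approximates PROBT by Simpson's-rule integration of the t density.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Hypothetical helper: density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=10_000):
    """Hypothetical helper: P(T <= t) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


# Example 1: mean and standard deviation treated as known, so use the normal curve.
z = (1200 - 2000) / 300
pr_normal = NormalDist().cdf(z)

# Example 2: mean and standard deviation estimated from 15 disks, so use t with 14 d.f.
df = 15 - 1
t = (1200 - 2000) / (300 / math.sqrt(15))
pr_t = t_cdf(t, df)

print(f"Z={z:.3f} PR={pr_normal:.5f}")
print(f"T={t:.2f} PR={pr_t:.3g}")
```

The t probability is far smaller because Example 2 concerns the mean of 15 failure times, whose standard error (300/sqrt(15), about 77 hours) is much smaller than the 300-hour spread of individual drives in Example 1.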
Hypothesis Testing
Example 3
The process used to produce widgets has been shown to be normally distributed with a mean of 25 widgets per hour and a standard deviation of 4.33 widgets. A new process has been suggested which may be better (more widgets per hour can be produced). It is anticipated that the new process will produce 32.5 widgets per hour with the same standard deviation. What is the probability of producing 32.5 or more widgets per hour if the true mean is 25? Does this suggest that Widgets Inc. should convert to the new process? How sure are we of the recommendation?
The testing of statistical hypotheses is very important to the researcher who needs to make decisions or inferences based on experimental results. According to Walpole and Myers (1972), "A statistical hypothesis is an assumption or statement, that may or may not be true, concerning one or more populations." Before the researcher can reach a statistical conclusion, statistical tests need to be run. Before the tests can be conducted, the hypotheses must be stated.
Because of the increasing statistical sophistication of those who evaluate the validity of experimental results, especially governmental regulatory boards, experimenters must also become more sophisticated. Although this is great for statisticians, a formal statistical test is not always necessary.
H0: There is no difference between processes.
H1: The new process is better.

The alternative hypothesis (H1) may be either one sided, as in this example, or two sided (the two processes are not equal).
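The choice between a one-sided and a two-sided alternative changes the probability that is reported. The Python sketch below (not SAS; `statistics.NormalDist`, Python 3.8+, stands in for PROBNORM) uses the Example 3 numbers to compute both.

```python
from statistics import NormalDist

std_normal = NormalDist()

z = (32.5 - 25) / 4.33                            # Example 3 standardized difference
one_sided = 1 - std_normal.cdf(z)                 # H1: the new process is better
two_sided = 2 * (1 - std_normal.cdf(abs(z)))      # H1: the processes are not equal

print(f"Z={z:.3f} one-sided={one_sided:.4f} two-sided={two_sided:.4f}")
```

Because the normal curve is symmetric, the two-sided probability is exactly twice the one-sided probability.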
Consider, for example, the case where the CAP scores (California Academic Proficiency tests of the quality of a school) for graduating seniors in two schools differ by 150 points (with a standard deviation of 25). We KNOW that the scores of the two schools are different, and we DON'T need a statistical test to tell us. Just by inspection we can say that the two schools scored differently. If, however, a statement must be made as to the probability that the schools are the same, a statistical test must be made. It is easy to say the schools are different, but how sure are we? And how different are they?
DATA _NULL_;
   Z = (32.5 - 25) / 4.33;
   PR = 1 - PROBNORM(Z);
   PUT Z= PR=;
RUN;

(Z=1.732 PR=0.0416)

Statistical tests based on specific hypotheses allow the researcher to quantify the probability of reaching a mistaken conclusion. There are two ways of committing an error when dealing with a hypothesis (usually referred to as a null hypothesis). We can conclude that the null hypothesis is false when it is really true, or we can conclude that the null hypothesis is true when it is really false. These are known as Type I and Type II errors.

TYPE I: Reject the null hypothesis when it is true.
TYPE II: Accept the null hypothesis when it is false.
The probability level of .04 is low enough to reject the null hypothesis at the customary .05 level. The new process looks like it could be better: if the true mean were still 25, a result this large would occur only about 4 times in 100, so in rejecting the null hypothesis we accept about a 4 in 100 risk of a Type I error. However, we do not know any of the true statistics of the new process. We might recommend a trial study.
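The statement that such a result would occur only about 4 times in 100 under the old process can be checked by simulation. The Python sketch below is an illustration, not SAS; it assumes Python 3.8+ for `statistics.NormalDist`, and the seed and number of trials are arbitrary choices. It draws many hours from the old process and counts how often 32.5 or more widgets are produced.

```python
import random
from statistics import NormalDist

random.seed(1)  # arbitrary fixed seed so the run is repeatable

old_mean, sd = 25, 4.33   # the old process (the null hypothesis)
cutoff = 32.5             # hourly output anticipated for the new process
trials = 200_000          # arbitrary number of simulated hours

# Simulate hours from the OLD process and count how often output reaches 32.5.
hits = sum(random.gauss(old_mean, sd) >= cutoff for _ in range(trials))
simulated = hits / trials
exact = 1 - NormalDist(old_mean, sd).cdf(cutoff)  # same quantity as 1 - PROBNORM(Z)

print(f"simulated: {simulated:.4f}   exact: {exact:.4f}")
```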
It is possible to make a Type II error when we do not reject the null hypothesis. Failure to reject the null hypothesis is NOT the same as accepting it as true! A Type II error is made when the null hypothesis is accepted as true when it is actually false.
The probability of making a Type II error is designated by the Greek letter beta (β), and (1 - β) is known as the power of the test. Although the power should always be calculated, in practice it rarely is, as it is necessary to make some assumptions about the alternative distribution.
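As an illustration of those assumptions, the sketch below computes β and the power for Example 3. It assumes that the new process really does average 32.5 widgets per hour with the same standard deviation of 4.33, that a single hourly count is the observation (as in Example 3), and a one-sided test at α = .05. It is a Python sketch using `statistics.NormalDist` (Python 3.8+), not SAS code.

```python
from statistics import NormalDist

alpha = 0.05
mu0, mu1, sd = 25, 32.5, 4.33   # null mean, ASSUMED alternative mean, common SD

z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value, about 1.645
cutoff = mu0 + z_crit * sd                 # reject H0 when output exceeds this
beta = NormalDist(mu1, sd).cdf(cutoff)     # P(fail to reject H0 | alternative true)
power = 1 - beta

print(f"cutoff={cutoff:.2f} beta={beta:.3f} power={power:.3f}")
```

Under these assumptions the power is only about one half: even if the new process is as good as anticipated, a single hour's count would fail to reject the null hypothesis almost as often as not.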
The probability of committing a Type I error is called the level of significance of the test and is usually designated by the Greek letter alpha (α); hence the term alpha level. A customary alpha level is 5% (α = .05).
Given the wide variety of types of experimental designs, it is not surprising that there are also a great many ways to test hypotheses. Three of the more common classes of tests are tests of location (e.g., equality of means), tests of independence, and tests of dispersion (e.g., equality of variances).
The result is not significant at the .05 level; however, it would have been at the .10 level.
The t test can also be used to determine whether the means of two samples are different. The samples do not have to have the same sample size, and the underlying variances do not need to be the same. PROC TTEST is often used for these types of comparisons. It automatically produces a two-tailed probability and checks the assumption of equality of variance.
Tests of location are usually tests of equality of means and include the Z test, t test, analysis of variance (ANOVA), and analysis of covariance (ANCOVA). The chi-square test (there are several tests that use the chi-square statistic) is a test of independence of two or more sets of data. Tests of dispersion, or variance, include the F test, Bartlett's test, Cochran's test, and others.
Example 5
The years of service of union and
non-union workers were collected and are
to be compared.
DATA UNION;
   INPUT UNION $ YEARS @@;
   CARDS;
Y 25 Y 26 Y 30 Y 25 Y 31 Y 27
Y 24 N 19 N 21 N 30 N 25 N 21 N 23
;
The true underlying distribution is rarely known in practice. Usually samples are collected, and the population parameters are estimated from sample statistics (e.g., the sample mean and sample variance). The variability associated with the sample mean is quantified using the standard error (the standard deviation of the sample mean, estimated as the sample standard deviation divided by the square root of N). The uncertainty inherent in the estimated statistics (mean and variance) must be taken into consideration when calculating the tests of hypotheses.
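The standard error column of a PROC TTEST listing is the sample standard deviation divided by the square root of N. The Python sketch below (not SAS; `standard_error` is a hypothetical helper) applies that definition to the Example 5 seniority data.

```python
import math
from statistics import mean, stdev


def standard_error(values):
    """Hypothetical helper: estimated standard deviation of the sample mean, s / sqrt(N)."""
    return stdev(values) / math.sqrt(len(values))


non_union = [19, 21, 30, 25, 21, 23]   # Example 5, UNION = N
union = [25, 26, 30, 25, 31, 27, 24]   # Example 5, UNION = Y

for label, years in (("N", non_union), ("Y", union)):
    print(label, len(years), round(mean(years), 6),
          round(stdev(years), 6), round(standard_error(years), 6))
```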
PROC TTEST DATA=UNION;
   CLASS UNION;
   VAR YEARS;
   TITLE1 'EXAMPLE 5';
   TITLE2 'COMPARISON OF SENIORITY';
RUN;
EXAMPLE 5
COMPARISON OF SENIORITY

TTEST PROCEDURE

Variable: YEARS

UNION  N       Mean    Std Dev  Std Error    Minimum    Maximum
N      6  23.166666   3.920034   1.600347  19.000000  30.000000
Y      7  26.857143   2.672612   1.010152  24.000000  31.000000

Variances        T     DF   Prob>|T|
Unequal    -1.9501    8.6     0.0844
Equal      -2.0110   11.0     0.0695

For H0: Variances are equal, F' = 2.15   DF = (5,6)   Prob>F' = 0.3782

t Tests

This test is used when sampling from normal populations whose variance is unknown and must be estimated from the sample. The distinction matters most when the sample size is small (i.e., fewer than 50); with large samples the Z test gives nearly the same answer, and when the population variance is known the Z test can be used directly. The T statistic is:

$$T = \frac{\text{sample mean} - \text{population mean}}{\text{STD}/\sqrt{n}}$$

Example 4
The manufacturer of glass microscope lenses wants to be sure that her processing produces no more than 25 'pits' per slide. She randomly selects 5 slides and counts the pits on each. Her sample yields a mean of 28 pits and a standard deviation of 4.33. Is there reason to suspect that the process is indeed the pits?

H0: The mean number of pits is 25.
H1: The mean number of pits is greater than 25.

DATA _NULL_;
   T = (28 - 25)/(4.33/SQRT(5));
   DF = 5 - 1;
   PR = 1 - PROBT(T,DF);
   PUT T= DF= PR=;
RUN;

(T=1.55 DF=4 PR=0.0981)
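Example 4 can be reproduced outside SAS with the Python sketch below, which is not the author's code. The hypothetical `t_cdf` helper approximates PROBT by Simpson's-rule integration of the t density, and the upper tail is used because H1 states that the mean is greater than 25.

```python
import math


def t_pdf(x, df):
    """Hypothetical helper: density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=10_000):
    """Hypothetical helper: P(T <= t) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


n, sample_mean, sample_sd = 5, 28, 4.33   # Example 4 sample summary
mu0 = 25                                  # hypothesized mean under H0

t = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
df = n - 1
pr = 1 - t_cdf(t, df)   # upper tail, matching H1: mean greater than 25

print(f"T={t:.2f} DF={df} PR={pr:.4f}")
```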
Paired comparisons are of use when two
determinations have been made on each
replicate. This technique is often used
in before and after studies.
The F test used by PROC TTEST is itself subject to assumptions and becomes unreliable if the underlying populations are not normally distributed.
In Example 6, PROC TTEST is not used because the samples are paired. A DATA step is used to create a variable holding the difference, and the difference is compared to zero. The PRT option in PROC MEANS can be used to make the comparison. PRT is always a two-tailed test of the hypothesis that the mean difference is zero.
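The paired comparison reduces to a one-sample t test on the differences, with a two-tailed probability as PRT reports. The Python sketch below is a hedged illustration, not SAS: the before/after readings are hypothetical (the TTEST.EX6 data are not listed), chosen only so that the differences have a mean of 0.1 and a standard error of about 0.063, like the Example 6 output. `paired_t` and `t_cdf` are hypothetical helpers, with `t_cdf` approximating PROBT by numerical integration.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Hypothetical helper: density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=10_000):
    """Hypothetical helper: P(T <= t) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def paired_t(before, after):
    """Hypothetical helper: one-sample t test of the differences (after - before) against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    t = d_bar / se
    p_two_tailed = 2 * (1 - t_cdf(abs(t), n - 1))
    return d_bar, se, t, p_two_tailed


# Hypothetical readings; NOT the TTEST.EX6 data, which the paper does not list.
before = [1.0, 1.2, 1.1, 1.3, 1.4]
after = [1.3, 1.1, 1.2, 1.4, 1.5]

d_bar, se, t, p = paired_t(before, after)
print(f"Mean={d_bar:.5f} StdErr={se:.5f} T={t:.5f} Prob>|T|={p:.4f}")
```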
Other, more sophisticated procedures can easily be programmed in the DATA step; however, they are outside the scope of this workshop.
Recent research has shown that most parametric tests are fairly robust (resistant) to departures from the assumptions of normality and equality of variance. This is especially true if the sample sizes are large and equal.
Example 6
A detergent manufacturer would like to show that Brand X really does make whites brighter. A measure of reflected light was recorded before and after washing.

DATA DIFF; SET TTEST.EX6;
   DIFF = AFTER - BEFORE;
RUN;

PROC MEANS DATA=DIFF MEAN N STDERR T PRT;
   VAR DIFF;
   TITLE 'EXAMPLE 6';
   TITLE2 'PAIRED T TEST';
RUN;

EXAMPLE 6
PAIRED T TEST

Analysis Variable : DIFF

N Obs   N      Mean   Std Error         T   Prob>|T|
    5   5   0.10000     0.06324   1.58113     0.1890

ABOUT THE AUTHOR
Arthur L. Carpenter has over fourteen years of experience as a statistician and data analyst and has served as a senior consultant with California Occidental Consultants, CALOXY, since 1983. His publications list includes a number of papers and posters presented at SUGI, and he has developed and presented several courses and seminars on statistics and SAS programming.

CALOXY offers SAS contract programming and in-house SAS training nationwide. This workshop was adapted from the three-day course "PC/SAS Introduction to Statistics for Managers".

Arthur L. Carpenter
California Occidental Consultants
4239 Serena Avenue
Oceanside, CA 92056-5018
(619) 724-8579
REFERENCES

Benjamin, Jack and C. Allin Cornell, Probability, Statistics, and Decision for Civil Engineers, McGraw-Hill Book Company, 1970.

Schlotzhauer, Sandra D. and Ramon C. Littell, SAS System for Elementary Statistical Analysis, SAS Institute, Inc., 1987.

Walpole, Ronald E. and Raymond H. Myers, Probability and Statistics for Engineers and Scientists, Macmillan Company, New York, 1972.
One of the assumptions of most tests of hypotheses concerning equality of means is that the variances of the populations being sampled are equal. Often the validity of this assumption is unknown, but it can be tested. PROC TTEST makes this comparison automatically. When the variances are unequal, adjustments are made to the t test results, including an approximate (non-integer) number of degrees of freedom.
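The two rows of the Example 5 output can be reproduced by hand. In the Python sketch below (`two_sample_t` is a hypothetical helper), the Equal row uses a pooled variance with nx + ny - 2 degrees of freedom, and the Unequal row uses separate variances with the Satterthwaite approximation for the degrees of freedom. That this is exactly the adjustment PROC TTEST applies is an assumption, although it does reproduce the 8.6 shown. F' is taken as the larger sample variance divided by the smaller.

```python
import math
from statistics import mean, variance

non_union = [19, 21, 30, 25, 21, 23]   # Example 5, UNION = N
union = [25, 26, 30, 25, 31, 27, 24]   # Example 5, UNION = Y


def two_sample_t(x, y):
    """Hypothetical helper: pooled and unequal-variance t statistics plus F'."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)   # sample variances, divisor n - 1
    diff = mean(x) - mean(y)

    # Equal variances: pool the two estimates.
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t_equal = diff / math.sqrt(pooled * (1 / nx + 1 / ny))
    df_equal = nx + ny - 2

    # Unequal variances: separate estimates, Satterthwaite approximate d.f.
    ax, ay = vx / nx, vy / ny
    t_unequal = diff / math.sqrt(ax + ay)
    df_unequal = (ax + ay) ** 2 / (ax ** 2 / (nx - 1) + ay ** 2 / (ny - 1))

    # F' for the test of equal variances: larger variance over smaller.
    f_prime = max(vx, vy) / min(vx, vy)
    return t_equal, df_equal, t_unequal, df_unequal, f_prime


t_equal, df_equal, t_unequal, df_unequal, f_prime = two_sample_t(non_union, union)
print(f"Unequal: T={t_unequal:.4f} DF={df_unequal:.1f}")
print(f"Equal:   T={t_equal:.4f} DF={df_equal}")
print(f"F'={f_prime:.2f}")
```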
TRADEMARK INFORMATION
SAS is a registered trademark of the SAS
Institute, Inc., Cary, NC, USA.