Instructor Notes (t-test)
An introduction to statistics usually covers t-tests, ANOVAs, and Chi-Square. For EPSY 5601,
we introduce t-tests and provide just some information about the others.
The t-test is one type of inferential statistic. It is used to determine whether there is a
significant difference between the means of two groups. With all inferential statistics, we
assume the dependent variable (remember Variables Notes in Unit 8) fits a normal distribution.
When we assume a normal distribution exists, we can identify the probability of a particular
outcome. We specify the level of probability (alpha level, level of significance, p) we are willing
to accept before we collect data (p < .05 is a common value that is used). After we collect data
we calculate a test statistic with a formula. We compare our test statistic with a critical value
found on a table to see if our results fall within the acceptable level of probability. Modern
computer programs calculate the test statistic for us and also provide the exact probability of
obtaining that test statistic with the number of subjects we have.
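To make that sequence concrete, here is a minimal sketch in Python (the course itself uses Excel and SPSS; the scores below are invented for illustration) showing the test statistic and the exact probability computed by software:

from scipy import stats

# invented reading-achievement scores for two groups (illustration only)
group_a = [72, 85, 78, 90, 66, 81, 77, 88]
group_b = [70, 74, 69, 82, 64, 71, 73, 75]

alpha = 0.05                                           # level of significance chosen before collecting data
t_stat, p_value = stats.ttest_ind(group_a, group_b)    # test statistic and its exact probability

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("significant" if p_value < alpha else "not significant")
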
Remember: Once you have learned the correlation coefficient (r) for your sample, you need to
determine what the likelihood is that the r value you found occurred by chance (Unit 3). In other
words, does the relationship you found in your sample really exist in the population or were your
results a fluke? In the case of a t-test, the question is whether the difference between the two
means in your sample occurred by chance or really exists in your population.
t-test
When the difference between two population averages is being investigated, a t-test is used. In
other words, a t-test is used when we wish to compare two means. The scores must be measured
on an interval or ratio scale (remember Variables Notes in Unit 8). We would use a t-test if we
wished to compare the reading achievement of boys and girls. With a t-test, we have one
independent variable and one dependent variable. The independent variable (gender in this case)
can only have two levels (male and female). The dependent variable would be reading
achievement. If the independent variable had more than two levels, then we would use a one-way
analysis of variance (ANOVA).
The test statistic that a t-test produces is a t-value. Conceptually, t-values are an extension of
z-scores. In a way, the t-value represents how many standard units the means of the two groups
are apart.
With a t-test, the researcher wants to state with some degree of confidence that the obtained
difference between the means of the sample groups is too great to be a chance event and that
some difference also exists in the population from which the sample was drawn. In other words,
the difference that we might find between the boys' and girls' reading achievement in our sample
might have occurred by chance, or it might exist in the population. If our t-test produces a t-value
that results in a probability of .01, we say that the likelihood of getting the difference we found
by chance would be 1 in 100. We could say that it is unlikely that our results occurred by
chance and that the difference we found in the samples probably exists in the populations from
which they were drawn.
Five factors contribute to whether the difference between two groups' means can be
considered significant:
1. How large is the difference between the means of the two groups? Other factors being
equal, the greater the difference between the two means, the greater the likelihood that a
statistically significant mean difference exists. If the means of the two groups are far
apart, we can be fairly confident that there is a real difference between them.
2. How much overlap is there between the groups? This is a function of the variation within
the groups. Other factors being equal, the smaller the variances of the two groups under
consideration, the greater the likelihood that a statistically significant mean difference
exists. We can be more confident that two groups differ when the scores within each
group are close together.
3. How many subjects are in the two samples? The size of the sample is extremely
important in determining the significance of the difference between means. With
increased sample size, means tend to become more stable representations of group
performance. If the difference we find remains constant as we collect more and more
data, we become more confident that we can trust the difference we are finding.
4. What alpha level is being used to test the mean difference (how confident do you want to
be about your statement that there is a mean difference)? A larger alpha level requires less
difference between the means. It is much harder to find differences between groups when
you are only willing to have your results occur by chance 1 out of 100 times (p < .01) as
compared to 5 out of 100 times (p < .05).
5. Is a directional (one-tailed) or non-directional (two-tailed) hypothesis being tested? Other
factors being equal, smaller mean differences result in statistical significance with a
directional hypothesis. For our purposes we will use non-directional (two-tailed)
hypotheses.
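As a rough illustration of the first three factors (not part of the original notes; the numbers are made up and Python's scipy stands in for the course's Excel spreadsheet), the sketch below shows how a larger mean difference, less spread within groups, and more subjects all tend to shrink the p-value:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_for(mean_diff, sd, n):
    """Simulate two groups and return the two-tailed p-value."""
    a = rng.normal(0, sd, n)
    b = rng.normal(mean_diff, sd, n)
    return stats.ttest_ind(a, b).pvalue

print(p_for(mean_diff=2,  sd=10, n=20))    # small difference, noisy, few subjects: p is typically large
print(p_for(mean_diff=10, sd=10, n=20))    # larger difference between the means: p typically shrinks
print(p_for(mean_diff=10, sd=5,  n=20))    # less overlap (smaller variances): p typically shrinks more
print(p_for(mean_diff=10, sd=5,  n=200))   # more subjects: p typically shrinks further
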
Dr. Del Siegle created an Excel spreadsheet that performs t-tests and a PowerPoint
presentation that explains how to enter data and read the output (see Unit 12).
Assumptions Underlying the t Test
1. The samples have been randomly drawn from their respective populations
2. The scores in the population are normally distributed
3. The scores in the populations have the same variance (s1 = s2). Note: We use a different
calculation for the standard error if the variances are not equal.
Three Types of t-tests

Pair-difference t-test (a.k.a. t-test for dependent groups, correlated t-test)
df= n (number of pairs) - 1
This is concerned with the difference between the average scores of a single sample of
individuals who are assessed at two different times (such as before treatment and after
treatment). It can also compare average scores of samples of individuals who are paired
in some way (such as siblings, mothers, daughters, persons who are matched in terms of a
particular characteristic).

t-test for Independent Samples (with two options)
This is concerned with the difference between the averages of two populations. Basically,
the procedure compares the averages of two samples that were selected independently of
each other, and asks whether those sample averages differ enough to believe that the
populations from which they were selected also have different averages. An example
would be comparing math achievement scores of an experimental group with a control
group.
1. Equal Variance (Pooled-variance t-test): df = n (total of both groups) - 2
Note: Used when both samples have the same number of subjects or when
s1 = s2 (Levene or F-max tests have p > .05).
2. Unequal Variance (Separate-variance t-test): df depends on a formula, but a
rough estimate is one less than the size of the smaller group.
Note: Used when the samples have different numbers of subjects and they
have different variances -- s1 ≠ s2 (Levene or F-max tests have p < .05).
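A hedged sketch of all three forms using Python's scipy (the data are invented; the course's Excel spreadsheet does the same work):

from scipy import stats

pre  = [12, 15, 11, 14, 13, 16]            # the same people measured before and after treatment
post = [14, 17, 13, 15, 15, 18]

experimental = [78, 85, 92, 70, 88, 81, 79]
control      = [72, 69, 75, 71, 80, 68, 74]

# 1. Pair-difference (dependent) t-test: df = number of pairs - 1
print(stats.ttest_rel(pre, post))

# 2. Independent samples, equal variance (pooled): df = n1 + n2 - 2
print(stats.ttest_ind(experimental, control, equal_var=True))

# 3. Independent samples, unequal variance (separate-variance): df from a formula
print(stats.ttest_ind(experimental, control, equal_var=False))
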
How do I decide which type of t-test to use?
Note: The F-Max test can be substituted for the Levene test. The spreadsheet used in this class uses the F-Max.
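The same decision can be scripted. This is only a sketch with made-up data, using scipy's Levene test to choose between the pooled- and separate-variance forms as described above:

from scipy import stats

group1 = [23, 25, 28, 30, 24, 26, 27]
group2 = [22, 35, 18, 40, 20, 38, 25]

lev = stats.levene(group1, group2)
equal_var = lev.pvalue > 0.05              # p > .05: treat the variances as equal (pooled t-test)

result = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene p = {lev.pvalue:.3f}, equal_var = {equal_var}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
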
Type I and II errors

Type I error -- reject a null hypothesis that is really true (with tests of difference this
means that you say there was a difference between the groups when there really was not a
difference). The probability of making a Type I error is the alpha level you choose. If you
set your probability (alpha level) at p < .05, then there is a 5% chance that you will make
a Type I error. You can reduce the chance of making a Type I error by setting a smaller
alpha level (p < .01). The problem with this is that as you lower the chance of making a
Type I error, you increase the chance of making a Type II error.

Type II error -- fail to reject a null hypothesis that is false (with tests of differences this
means that you say there was no difference between the groups when there really was
one).
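A small simulation (not from the notes; the population values are invented) illustrates the Type I error rate: when the null hypothesis really is true, roughly alpha (here 5%) of repeated t-tests will reject it anyway:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, trials = 0.05, 2000

false_positives = 0
for _ in range(trials):
    a = rng.normal(50, 10, 30)             # both samples come from the SAME population,
    b = rng.normal(50, 10, 30)             # so any "significant" result is a Type I error
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(f"Type I error rate: {false_positives / trials:.3f}")   # close to alpha = .05
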
Hypotheses (some ideas...)
1. Non-directional (two-tailed)
Research Question: Is there a (statistically) significant difference between males and
females with respect to math achievement?
H0: There is no (statistically) significant difference between males and females with
respect to math achievement.
HA: There is a (statistically) significant difference between males and females with
respect to math achievement.
2. Directional (one-tailed)
Research Question: Do males score significantly higher than females with respect to
math achievement?
H0: Males do not score significantly higher than females with respect to math
achievement.
HA: Males score significantly higher than females with respect to math achievement.
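For what it is worth, here is a short sketch (invented scores; scipy 1.6 or later is assumed for the alternative argument) contrasting the two kinds of hypotheses. When the difference falls in the predicted direction, the one-tailed p-value is half the two-tailed p-value:

from scipy import stats

males   = [82, 75, 90, 68, 88, 79, 85, 73]
females = [70, 72, 65, 80, 68, 74, 69, 71]

two_tailed = stats.ttest_ind(males, females, alternative="two-sided")  # HA: a difference exists
one_tailed = stats.ttest_ind(males, females, alternative="greater")    # HA: males score higher

print(f"two-tailed p = {two_tailed.pvalue:.3f}")
print(f"one-tailed p = {one_tailed.pvalue:.3f}")    # half the two-tailed p here
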
Also, remember Variables notes in Unit 8…
The basic idea for calculating a t-test is to find the difference between the means of the two
groups and divide it by the STANDARD ERROR (OF THE DIFFERENCE) -- which is the
standard deviation of the distribution of differences.
Just for your information: A CONFIDENCE INTERVAL for a two-tailed t-test is calculated by
multiplying the CRITICAL VALUE times the STANDARD ERROR and adding and subtracting
that to and from the difference of the two means.
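Here is a by-hand sketch of those two calculations for the pooled two-sample case (the numbers are invented; the course spreadsheet performs the equivalent steps):

import math
from scipy import stats

group1 = [78, 85, 92, 70, 88, 81, 79]
group2 = [72, 69, 75, 71, 80, 68, 74]
n1, n2 = len(group1), len(group2)
m1, m2 = sum(group1) / n1, sum(group2) / n2
var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)

# standard error of the difference (standard deviation of the distribution of differences)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = difference between the means divided by the standard error of the difference
t_value = (m1 - m2) / se_diff
df = n1 + n2 - 2

# confidence interval = mean difference plus/minus critical value times standard error
critical = stats.t.ppf(0.975, df)          # two-tailed critical value for alpha = .05
ci_low  = (m1 - m2) - critical * se_diff
ci_high = (m1 - m2) + critical * se_diff

print(f"t = {t_value:.3f}, df = {df}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
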
EFFECT SIZE is used to calculate practical difference. If you have several thousand subjects, it
is very easy to find a statistically significant difference. Whether that difference is practical or
meaningful is another question. This is where effect size becomes important. With studies
involving group differences, effect size is the difference of the two means divided by the
standard deviation of the control group (or the average standard deviation of both groups if you
do not have a control group). Generally, effect size is only important if you have statistical
significance. An effect size of .2 is considered small, .5 is considered medium, and .8 is
considered large.
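A short sketch of that calculation (invented scores; dividing by the control group's standard deviation as described above):

import statistics

experimental = [78, 85, 92, 70, 88, 81, 79]
control      = [72, 69, 75, 71, 80, 68, 74]

mean_diff = statistics.mean(experimental) - statistics.mean(control)
effect_size = mean_diff / statistics.stdev(control)   # control group's standard deviation

print(f"effect size = {effect_size:.2f}")   # roughly: .2 small, .5 medium, .8 large
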
A bit of history...
W. S. Gosset (1908) first published the t-test. He worked at the Guinness Brewery in Dublin and
published under the pseudonym Student. The test was called the Student t-test (later shortened to
the t-test).
t-tests can be easily computed with Excel or SPSS. For this course, use Del Siegle's t-test Excel
spreadsheet, which does a very nice job of calculating t-values and other pertinent information.
What is the difference between STANDARD DEVIATION and STANDARD ERROR?
The standard deviation is a measure of the variability of a single sample of observations. Let's
say we have a sample of 10 plant heights. We can say that our sample has a mean height of 10
cm and a standard deviation of 5 cm. The 5 cm can be thought of as a measure of the average
distance of each individual plant height from the mean of the plant heights.
The standard error, on the other hand, is a measure of the variability of a set of means. Let's say
that instead of taking just one sample of 10 plant heights from a population of plant heights we
take 100 separate samples of 10 plant heights. We calculate the mean of each of these samples
and now have a sample (usually called a sampling distribution) of means. The standard deviation
of this set of mean values is the standard error.
In lieu of taking many samples one can estimate the standard error from a single sample. This
estimate is derived by dividing the standard deviation by the square root of the sample size. How
good this estimate is depends on the shape of the original distribution of sampling units (the
closer to normal the better) and on the sample size (the larger the sample the better).
The standard error turns out to be an extremely important statistic, because it is used both to
construct confidence intervals around estimates of population means (the confidence interval is
the standard error times the critical value of t) and in significance testing.
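A simulation sketch of the distinction (the plant heights here are simulated, not real data): the standard deviation of many sample means roughly agrees with the single-sample estimate SD / sqrt(n).

import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(10, 5, 100_000)        # plant heights: mean 10 cm, SD 5 cm

n = 10
sample_means = [rng.choice(population, n).mean() for _ in range(100)]
print(f"SD of 100 sample means (the standard error): {np.std(sample_means, ddof=1):.2f}")

one_sample = rng.choice(population, n)
print(f"Estimate from one sample, SD / sqrt(n):      {one_sample.std(ddof=1) / np.sqrt(n):.2f}")
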
t-test with Excel
Returns the probability associated with a t-test. Use the TTEST function to determine whether
two samples are likely to have come from the same two underlying populations that have the
same mean.
Syntax
TTEST(array1, array2, tails, type)
Array1 is the first data set.
Array2 is the second data set.
Tails specifies the number of distribution tails. If tails = 1, TTEST uses the one-tailed
distribution. If tails = 2, TTEST uses the two-tailed distribution.
Type is the kind of t-test to perform:
1 - Paired
2 - Two-sample equal variance (homoscedastic)
3 - Two-sample unequal variance (heteroscedastic)
Remarks
If array1 and array2 have a different number of data points, and type = 1 (paired), TTEST
returns the #N/A error value.
The tails and type arguments are truncated to integers.
If tails or type is non-numeric, TTEST returns the #VALUE! error value.
If tails is any value other than 1 or 2, TTEST returns the #NUM! error value.
Example
TTEST({3,4,5,8,9,1,2,4,5},{6,19,3,2,14,4,5,17,1},2,1) equals 0.196016
Notes prepared by Del Siegle, Ph.D.
Neag School of Education - University of Connecticut