CLASS NOTES: Introduction to Analysis of Variance
CONCEPT
CALCULATION/EXAMPLES
Analysis of Variance: A
hypothesis testing procedure that is
used to evaluate mean differences
b/t 2 or more treatments (or
populations).
* ANOVA uses sample data to
draw general conclusions about a
population (sound familiar?)
* The goal of ANOVA is to
determine whether the mean
differences observed among the
samples provide enough evidence
to conclude that there are mean
differences among the populations, i.e.,
whether or not there is a difference
b/t treatment methods.
* ANOVA can be used w/ either an
independent-measures or a repeated
measures design.
* Analysis means dividing into
smaller parts. Because we are
analyzing the variance between means,
this process is called
Analysis of Variance, or ANOVA.
Factor: The variable (independent
or quasi-independent) that
designates the groups being
compared. An independent variable
is one that is manipulated by the
researcher, whereas a quasi-independent
variable is one that is
not manipulated by the researcher.
APPLICATION
* ANOVA is similar to t-tests in
that both test for mean differences,
but t-tests are limited to 2
treatments whereas ANOVA can
compare two or more treatments.
* Remember that the independent
variable is the variable manipulated
by the researcher.
* The ability to combine different
factors & to mix different designs
w/in one study provides researchers
w/ the flexibility to develop studies
that address scientific questions
that could not be answered by a
single design using a single factor.
Independent factor: Temperature,
where the researcher will
manipulate the temperature in the
room
Quasi-independent factor: Age,
where the researcher will compare
a treatment for different age
groups.
Remember that in a quasi-experiment the
independent variable is not
manipulated or created by the
experimenter; it already exists in
nature, such as “age.”
Ex: (independent measures)
measuring differences in a trial
medication: evaluating differences b/t
the medication given at 10mg, 50mg
& 100mg.
Variance: A measure of variability
that indicates how far all of the
scores in a distribution vary from
the mean.
Between-Treatments Variance:
The variance b/t treatment
conditions or the difference b/t the
overall means of the treatment
conditions.
sample set 1: 10mg
sample set 2: 50 mg
sample set 3: 100mg
There are two possible explanations
for differences b/t treatment
conditions or differences b/t
treatment condition means:
1) treatment effect: The difference
is caused by the treatments.
* which would correspond w/ your
alternative hypothesis
2) chance: The difference is simply
due to chance.
* error
There are 2 primary sources for
chance differences:
1) Individual differences:
There are variances b/t
scores w/in each sample
group b/c there are
different participants w/
different scores in each
sample.
2) Experimental error: There is
always potential for some
degree of error in general.
When computing the between-treatment
variance, we are
measuring differences that could be
caused by either individual
differences or experimental error.
By analyzing these differences, we
can establish how big the
differences are when there is no
treatment effect involved, or how
much difference (or variance)
occurs by chance alone (null
hypothesis).
Within-Treatment Variance: The
variance within each sample.
Ex: (independent measures)
evaluating the differences b/t
measures for sample set 1
These are differences or variances b/t
scores for individuals even
though they may have received
exactly the same treatment.
Not only are there variations we are
looking for between treatments, but
also variations within the individual
sample sets.
This is not a complete list of all
notations & formulas, but those that
are different from earlier notations
& also those that are new. There
are other notations & formulas used
in ANOVA that are not listed here,
but that you are already familiar w/.
ANOVA Notation & Formulas
Example:
k = Number of treatment
conditions
3 different treatments for
Alzheimer’s Disease
k = 3 (treatment conditions)
If the scores (or participants) in
each treatment condition are not
equal (i.e., 5 for one treatment, 4 in
another, 6 in another); then you can
identify a specific sample by using
a subscript (ex: n2 = number of
scores for treatment condition #2)
n = Number of scores in each
treatment
If there are 5 scores in each
treatment condition, n = 5
N = Total # of scores in the entire
study
If there are 9 scores altogether (3
treatment conditions, 3 scores in
each treatment condition), then N =
9
T = Total value of each treatment
condition
T = ∑X
C = Correction Factor
C = (∑X1 + ∑X2 + ∑X3 + …)² / N
G = The sum of all the scores in the
research study (the grand total)
G = ∑X1 + ∑X2 + ∑X3 + …
(add up all N scores, or add up the
treatment totals)
SS = Sum of Squares
SS = ∑X² – (∑X)² / N
MS = Mean square
MS = SS / df
dfT = Degrees of Freedom TOTAL
dfT = N – 1
This is the total df formula for
ANOVA.
This formula looks familiar, yes?
The Sum of Squares formula is the
numerator portion of our standard
deviation formula!
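As a quick check that the computational SS formula matches the definitional one (the numerator of the standard deviation), here is a minimal sketch in Python, assuming NumPy is available; the five scores are taken from sample 1 of the example at the end of these notes.

import numpy as np

X = np.array([0, 1, 3, 1, 0])                          # sample 1 from the example at the end of these notes
computational = (X**2).sum() - X.sum()**2 / X.size     # SS = ∑X² – (∑X)² / N
definitional = ((X - X.mean())**2).sum()               # SS = ∑(X – M)²
print(computational, definitional)                     # both print 6.0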
You will be plugging different
values into the SS & df places
depending upon which MS you are
calculating: between treatments,
within treatments, between subjects
or error.
There are variations to this formula
based upon the following factors:
Repeated vs. Independent Measures
ANOVA, between-treatment
variance, between-subject variance
& error. See additional materials on
Repeated & Independent Measures
for these specific df formulas. Also,
see the full example for calculations of
each df.
DISCLAIMER: There are no
universally accepted notations
for ANOVA & you may find
that other sources use other
symbols. But for the purposes of
this class, these are the symbols
that we will be using &
recognizing w/ ANOVA.
Hypothesis Testing with ANOVA
One-Way Between-Subjects F-Ratio
Step 1:
State Your Hypothesis
Null hypothesis (Ho) states that
there is no difference b/t treatments
Alternative hypothesis (H1) states
that at least one population mean
is different from another.
Here is where the alternative
hypothesis looks different from the
other alternative hypotheses. Since
we are comparing more than 2
treatment methods, there are a
number of different alternatives
that can exist (A > B but B = C;
A ≠ B ≠ C; A = B but B < C, etc.).
So all we need to “reject the null”
is for at least one difference b/t
treatments to exist, since the null
hypothesis would require no
variance b/t the means.
* Researchers usually have some
idea of what difference they are
looking for in their study.
Step 2:
Set the Criteria for a Decision
a. Set your alpha level
α = .05
Remember that most researchers
choose from alpha levels of .05, .01
or .001.
b. Calculate the degrees of
freedom (this is for
independent measures; see the
Class Notes for repeated-measures
ANOVA for the specific df
formulas related to repeated
measures).
1. df Total
dft = dfw + dfb
or
dft = N – 1
dft = total degrees of freedom
dfw = degrees of freedom within
treatment groups
dfb = degrees of freedom between
treatments
2. df Within Treatments
dfw = Σ(n – 1)
or
dfw = N – k
The total number of all values or
scores in the entire study minus the
number of treatment conditions.
If you were to have an N of 9 (9
total scores) & 3 treatment methods,
then your within-samples df would
be 9 – 3 = 6.
3. df Between Treatments
dfb = k – 1
The number of treatment conditions (k)
minus 1. So if there are 3 treatment
conditions, then 3 – 1 = 2.
c. Determine your critical
region. Look in the back of
your textbook in the
appendices for the Critical
Values of F for the
analysis of variance.
For the F table, the row across the
top indicates the between-treatments
degrees of freedom &
the column on the far left indicates
the within-treatments degrees of
freedom. Put your finger on these
two values & bring them together
to indicate the F-value that
separates the critical region from
the null region (one value in regular
type for α = .05; an F-value in bold
to represent α = .01).
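If you want to check the table lookup programmatically, here is a minimal sketch in Python, assuming scipy is available; the N = 9, k = 3 numbers are the example values from step b above.

from scipy import stats

N, k = 9, 3                                  # example from step b: 9 total scores, 3 treatment conditions
df_between = k - 1                           # 2
df_within = N - k                            # 6
f_crit_05 = stats.f.ppf(1 - .05, df_between, df_within)
f_crit_01 = stats.f.ppf(1 - .01, df_between, df_within)
print(f_crit_05, f_crit_01)                  # critical F-values for α = .05 & .01 (roughly 5.14 & 10.9)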
As in correlation & t-statistics, we
look to the distribution to
determine the cut-off point between
the critical region (the tail of the
distribution where, should the F-ratio
fall, the outcome would be to
reject the null hypothesis, meaning
that at least one sample mean is
different from the others) & the
null region (the body portion of the
distribution where, should the F-ratio
fall, the outcome would be to
fail to reject the null hypothesis,
indicating that there is no
difference b/t the sample means).
Step 3:
Collect Your Data & Compute
your Sample Statistics
ANOVA Formula For Between
Treatment Variance
For each treatment condition,
calculate…
a) ΣX, ΣX² & (ΣX)²
Review example in text
b) n
n = number of participants/ sources
in each sample
c) M
The mean for each treatment
condition
For the full or TOTAL set of
values, calculate…
d) N
N = The total number of all scores
in the experiment
e) M
The mean of the treatment means,
found by adding the mean values
together & dividing by the number of
treatments.
f) ΣX, ΣX² & (ΣX)²
For the total set of values (see
example for calculations)
Sum of Squares
g) SSt or Total
SSt = ΣX² – (ΣX)² / N
Using your total values.
h) SSb or between treatments
SSb = Σg [ (ΣXg)² / ng ] – (ΣX)² / N
The sum of squares between
treatment groups. The Sigma w/ the
‘g’ (Σg) indicates that you should
repeat the quantity for each treatment
& add the results together; the
subscript ‘g’ represents the value
associated w/ each treatment group.
i) SSw or within treatments
SSw = Σg [ ΣXg² – (ΣXg)² / ng ]
The sum of squares within each
treatment group, calculated for each
treatment condition & then added
together.
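To make the summation notation concrete, here is a minimal sketch in Python, assuming NumPy is available, that computes SSt, SSb & SSw for the hypothetical data given at the end of these notes.

import numpy as np

samples = [np.array([0, 1, 3, 1, 0]),    # treatment 1
           np.array([4, 3, 6, 3, 4]),    # treatment 2
           np.array([1, 2, 2, 0, 0])]    # treatment 3
allX = np.concatenate(samples)
N = allX.size

SSt = (allX**2).sum() - allX.sum()**2 / N                              # 46.0
SSb = sum(s.sum()**2 / s.size for s in samples) - allX.sum()**2 / N    # 30.0
SSw = sum((s**2).sum() - s.sum()**2 / s.size for s in samples)         # 16.0
print(SSt, SSb, SSw)    # note that SSb + SSw adds back up to SSt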
Mean Square
j) MSb
MSb = SSb / dfb
The mean square for between
groups is the sum of squares for
between groups divided by the
degrees of freedom for between
groups.
k) MSw
MSw = SSw / dfw
The mean square for within groups
is the sum of squares for within
groups divided by the degrees of
freedom for within groups.
The F-Ratio
Requirements for Using the F-Ratio:
* The sample groups have been
randomly & independently selected
* There is a normal distribution in
the population from which the
samples are selected.
* The data are on an interval (or
ratio) scale.
* The within-group variances of the
samples should be fairly similar.
F = Variance between treatments / Variance w/in treatments
F = MSb / MSw
* Remember your t-statistic:
t = (obtained difference from sample means) / (difference expected by chance)
* Notice that the F-ratio is based on
variance instead of sample mean
difference. Again, b/c there are a
number of different possibilities
that can exist b/t the different
means in the case of ANOVA, we
calculate the overall variance
b/t the sample means.
This property, called homogeneity
of variance, simply means that
ANOVA demands sample groups
that do not differ too much w/
regard to their internal variabilities.
* The numerator of the ratio
measures the actual difference
obtained from the sample data, &
the denominator measures the
difference that would be expected
if there is no treatment effect.
* When the treatment has no effect
& any difference is simply due to
chance (the effect size is “0”), then
the F-ratio should be around “1.00”
* When the treatment does have an
effect, then the numerator (b/t
treatment differences) should be
larger than those due to chance
(differences due to chance), so your
F-ratio should be noticeably larger
than 1.00.
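To see the “F around 1.00 when there is no treatment effect” idea in action, here is a small simulation sketch in Python, assuming NumPy & scipy are available; the group sizes & population values are made up for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate the null case: 3 treatment groups of n = 5 all drawn from the
# same population, so any mean differences are due to chance alone.
f_values = []
for _ in range(10_000):
    groups = [rng.normal(loc=50, scale=10, size=5) for _ in range(3)]
    f, p = stats.f_oneway(*groups)
    f_values.append(f)

print(np.mean(f_values))   # averages out near 1 (about 1.2 for df = 2, 12)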
The Distribution of F-Ratios
Once you have computed your F-Ratio score…
Since there are 2 df’s, the F-ratio is
expressed w/ both, e.g.:
df = 2, 12
The first df listed is your between-treatments
df & the second is your
within-treatments df.
1) Go to the F distribution table
with your between-treatments df
(calculated from the
numerator portion of your F-Ratio
formula) & your within-treatments
df (from the
denominator portion of your
F-Ratio formula).
2) Locate these 2 df’s on the
table (the numerator df is listed in
the row across the top whereas the
denominator df is listed in the
column on the left).
3) Connect these 2 values
together in the middle.
Regular-type scores give you
the critical value for an alpha
level of .05. Bold will give
you the critical value for
alpha level .01.
MSwithin & Pooled Variance
Just as in the t-statistic where each
SS was added for the numerator &
divided by each df added as the
denominator. Same here, you just
keep adding however many “SS’s”
& “df’s” you have according to
how many sample groups you have
in your research study.
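A minimal sketch of that pooling in Python, assuming NumPy is available and using three hypothetical samples w/ unequal n:

import numpy as np

samples = [np.array([2, 4, 6]),           # n = 3
           np.array([1, 3, 5, 7]),        # n = 4
           np.array([0, 2, 4, 6, 8])]     # n = 5

SS_each = [((s - s.mean())**2).sum() for s in samples]   # SS1, SS2, SS3
df_each = [s.size - 1 for s in samples]                  # df1, df2, df3
MS_within = sum(SS_each) / sum(df_each)                  # ΣSS / Σdf
print(MS_within)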
Step 4:
Make a decision
Based upon your results, you will
either reject the null, meaning at
least one mean was different from
the others, or you will fail to reject
the null, meaning that there were no
differences b/t the different
treatment conditions or means.
Example of ANOVA
Time points (columns): Before Therapy, After Therapy, 6 Months After Therapy
Therapy Technique (rows):
Therapy 1 (Group #1): scores for group #1 measured before Therapy #1, after Therapy #1, & 6 months after Therapy #1
Therapy 2 (Group #2): scores for group #2 measured before Therapy #2, after Therapy #2, & 6 months after Therapy #2

Hypothetical Data (temperature example)
Treatment 1, 50º (sample 1): 0, 1, 3, 1, 0    M = 1
Treatment 2, 70º (sample 2): 4, 3, 6, 3, 4    M = 4
Treatment 3, 90º (sample 3): 1, 2, 2, 0, 0    M = 1
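Pulling the whole procedure together on the hypothetical data above, here is a short sketch in Python, assuming NumPy & scipy are available, that computes the F-ratio by hand & then checks it against scipy’s built-in one-way ANOVA.

import numpy as np
from scipy import stats

samples = [np.array([0, 1, 3, 1, 0]),   # Treatment 1 (50º), M = 1
           np.array([4, 3, 6, 3, 4]),   # Treatment 2 (70º), M = 4
           np.array([1, 2, 2, 0, 0])]   # Treatment 3 (90º), M = 1
allX = np.concatenate(samples)
N, k = allX.size, len(samples)          # N = 15, k = 3

SSb = sum(s.sum()**2 / s.size for s in samples) - allX.sum()**2 / N   # 30.0
SSw = sum((s**2).sum() - s.sum()**2 / s.size for s in samples)        # 16.0
MSb = SSb / (k - 1)                     # df between = 2
MSw = SSw / (N - k)                     # df within = 12
print(MSb / MSw)                        # F = 11.25 with df = 2, 12

print(stats.f_oneway(*samples))         # same F-ratio from scipy’s one-way ANOVA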