Research Questions, Variables, and Hypotheses
RCS 6740
6/1/05
1
Review
• What are research questions?
• What are variables?
  • Definition
  • Function
  • Measurement Scale
2
Hypotheses
• OK, now that we know how to set up a research project by posing research questions and labeling variables, it is time to move on to a more formal way of structuring and interpreting research.
3
Hypotheses Definitions
• Hypotheses are predictions about the relationship among two or more variables or groups, based on a theory or previous research (Pittenger, 2003).
• Hypotheses are assumptions or theories that a researcher makes and tests.
• Why are hypotheses important?
4
Importance of Hypotheses
• Hypotheses:
  • Direct our observations
    • Identify the variables to be examined and the data to be collected
  • Describe a relationship among variables
    • Can state that as one variable increases, the other will decrease; as one variable increases, the other will increase; and so on
  • Refer to populations
    • Hypotheses help researchers infer that the results from a sample will translate to a population
5
4 Functions of Hypotheses
• Hypotheses can:
  • Estimate population characteristics
  • Correlate variables
  • Display differences among two or more populations
  • Show possible cause and effect
• What research designs relate to each of these 4 functions?
6
Symbols Used in Hypotheses
• M = mean
• µ (mu, pronounced "mew") = population mean
• Roman letters (e.g., A, B, C, D) are used to represent statistics
• Greek letters (e.g., α, β) are used to represent parameters
• α = significance level; the probability of committing a Type I error (e.g., α = .05)
• p = probability value (e.g., p = .05)
• Null hypothesis: H0: µ1 - µ2 = 0, or H0: µ1 = µ2
• Alternative hypothesis: H1: µ1 - µ2 ≠ 0, or H1: µ1 ≠ µ2
7
Types of Hypotheses
• Research hypotheses
• Statistical hypotheses
8
Research Hypotheses
• Research hypothesis: a statement of the relationship among two or more variables or groups.
• The acceptance or non-acceptance of a research hypothesis is based on resolving a logical alternative with a null hypothesis.
• Example: Students who are taking RCS 6740 will score higher on Exam 2 than students who are not taking RCS 6740.
9
Research Hypotheses Cont.
• Research hypotheses can be stated as directional or non-directional.
• Directional hypotheses predict the specific relationship among two or more variables or groups:
  • Students who are taking RCS 6740 will score higher on Exam 2 than students who are not taking RCS 6740
    H0: µ1 ≤ µ2    H1: µ1 > µ2
  • IQ scores will correlate in a positive manner with self-esteem scores
  • Cats will bark less frequently than dogs
10
Research Hypotheses Cont.
• Non-directional hypotheses predict that there will be differences among two or more groups, but do not specify the direction of the differences:
  • Men and women will differ in their recall of phone numbers
  • Scores on the Geriatric Depression Scale will differ between people with stroke and people with Alzheimer’s disease
  • IQ scores will correlate with self-esteem scores
11
Research Hypotheses Cont.
• Your turn!
• Come up with some directional and non-directional research hypotheses
• Share with the class
12
Statistical Hypotheses
• Statistical hypotheses are mathematical or logical statements that help researchers interpret the results of research.
• Statistical hypotheses consist of the null hypothesis (H0), the hypothesis of no difference, and the alternative hypothesis (H1 or HA), which is similar in form to the research hypothesis.
• Null: H0: µ1 - µ2 = 0
• Alternative: H1: µ1 - µ2 ≠ 0
• Written-out examples:
  • Null: There will be no difference in Exam 2 scores between students who are taking RCS 6740 and students who are not taking RCS 6740
  • Alternative: There will be a difference in Exam 2 scores between students who are taking RCS 6740 and students who are not taking RCS 6740
13
Statistical Hypotheses Cont.
• Remember, and this is important:
  • The null hypothesis always implies that there is no relationship or statistical difference between variables or groups
  • The alternative hypothesis implies that there is a meaningful relationship among variables or groups
14
Testing Hypotheses
• We only test the null hypothesis; we do not test the research hypothesis.
• “Our decision about the null hypothesis is the only link between what our statistical models tell us about probability, and the decision we make concerning the research hypothesis” (Williams, 1986, p. 55).
15
Testing Hypotheses Cont.
• When a variable has more than one operational definition, more than one null hypothesis may be associated with one research hypothesis.
• We use a variety of statistical procedures to test null hypotheses. The choice of procedure depends on a variety of factors, including the research hypothesis, the data, the sampling strategy, and what we want to be able to say as a result of our testing.
16
Types of Tests
• Statistical procedures that are commonly used for hypothesis testing include correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression, multivariate analysis of variance (MANOVA), t-tests, and chi-square.
• Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values.
17
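To make the link between a procedure, its test statistic, and its p-value concrete, here is a minimal Python sketch using SciPy; the exam-score data and group labels are invented for illustration and are not from the lecture.

```python
# Minimal illustration of how two common procedures each yield a test
# statistic and an associated p-value. Data are made up for demonstration.
from scipy import stats

rcs6740  = [88, 92, 85, 91, 95, 89]   # hypothetical Exam 2 scores, students taking RCS 6740
controls = [80, 84, 79, 88, 82, 86]   # hypothetical scores, students not taking RCS 6740

# Independent-samples t-test: the test statistic is t, with its p-value.
t_stat, p_val = stats.ttest_ind(rcs6740, controls)
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")

# One-way ANOVA on the same two groups: the test statistic is F, with its p-value.
f_stat, p_anova = stats.f_oneway(rcs6740, controls)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```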
Types of Tests
• Multivariate procedures, like MANOVA, use a variety of test statistics with interesting names, like Wilks’ lambda. These are then related to a more common test statistic, like F.
• The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value. These p-values mean the same thing across test statistics.
18
Error Types
• In hypothesis testing, we must contend with two types of errors -- Type I and Type II.
• Errors are mistakes that we can make when judging the null hypothesis.
  • A Type I error is what happens when the tested hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive.
  • A Type II error is what happens when a false tested hypothesis is not rejected (Hays, 1986). (It is when you don’t find something that is, in fact, there.) A Type II error is a false negative.
19
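As an illustration of what a Type I error means in practice, here is a small, hedged simulation sketch in Python (the population values and sample sizes are arbitrary): when the null hypothesis is actually true, tests at alpha = .05 still reject it roughly 5% of the time.

```python
# Simulation sketch (not from the lecture): both groups are drawn from the
# same population, so H0 is true, yet we still "find" significant differences
# about alpha proportion of the time -- Type I errors (false positives).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
false_positives = 0
n_studies = 2000

for _ in range(n_studies):
    a = rng.normal(loc=100, scale=15, size=30)   # same population...
    b = rng.normal(loc=100, scale=15, size=30)   # ...so H0 is actually true
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1                     # rejecting a true H0 = Type I error

print(f"Observed Type I error rate: {false_positives / n_studies:.3f} (expected about {alpha})")
```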
Error Types Cont.
• Alpha (α) is the level of probability (pre-set by the researcher) that the tested hypothesis will be falsely rejected. Alpha is the pre-set risk of a Type I error.
  • In other words, alpha is the degree of risk that you accept, in advance of conducting the study, that what you find will be an error.
• Beta (β) is the probability (often neglected by the researcher) that a false null hypothesis will not be rejected.
  • Beta is the probability that you won’t find what you are looking for if, in fact, it is really there.
20
Error Types Cont.
• The picture on the next slide, which is modified from Hays (1986) and Ferguson and Takane (1989), illustrates the relationships among Type I and Type II errors and alpha and beta.
21
Error Types Cont.
• Error Types Chart

  Decision                                  H0 is True          H1 is True
  Reject H0                                 Type I error (α)    Correct (1 - β)
  Fail to reject (decide in favor of H0)    Correct (1 - α)     Type II error (β)
22
Error Types Cont.
• Link to a real-world example of error types:
http://www.intuitor.com/statistics/T1T2Errors.html
23
Power, Effect Size, and Measurement
STATISTICAL POWER
• Statistical power is “the probability of rejecting a null hypothesis that is, in fact, false” (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988).
• In our fish story, it is the probability of finding fish in Lake Alice, if they are in fact there. In terms of beta (the probability of a Type II error), statistical power = 1 - beta.
24
Statistical Power (cont.)
• Statistical power is a function of “the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]...” (Cohen, 1988, p. 4).
• Considering the complex interrelationships of the above criteria, one can say that the researcher can easily set alpha, but cannot easily set beta. Alpha and beta are inversely, but not perfectly, related.
  • Lowering alpha increases beta and lowers power.
  • Increasing alpha decreases beta and increases power.
25
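A brief, hedged sketch of this trade-off using the statsmodels power routines; the effect size and group size below are arbitrary illustrations, not values from the lecture.

```python
# Sketch of the alpha-beta-power trade-off for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.01, 0.05, 0.10):
    power = analysis.power(effect_size=0.5, nobs1=30, alpha=alpha, ratio=1.0)
    print(f"alpha = {alpha:.2f} -> power = {power:.3f}, beta = {1 - power:.3f}")
# Lowering alpha lowers power (raises beta); raising alpha does the reverse.
```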
Power, Effect Size, and Measurement Cont.
• Statistical power is then related to:
  • Sample size
  • Effect size
  • Statistical design (including number of groups, 1- vs. 2-tailed tests)
  • Significance criteria
26
Power, Effect Size, and Measurement Cont.
EFFECT SIZE
• Effect size (ES) refers to the amount of common variance between the independent variable(s) (IV) and the dependent variable(s) (DV), or the degree to which changes in the IV(s) result in changes in the DV(s).
27
Effect Size Cont.
• For example, if I am interested in the differences in competitive closure rate between rehabilitation counselors with master’s degrees in rehabilitation counseling and those with bachelor’s or unrelated master’s degrees, my effect size would be the size of the difference between the means of the two groups.
• Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test that I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not receive the intervention.
28
Effect Size Cont.
• Similarly, if I wanted to examine the impact of a specific course on research anxiety, effect size could be the difference in the mean research-anxiety scores between an experimental group who completed the course and a control group who did not.
29
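One common way to put such mean differences on a standard scale is Cohen's d, the difference between two group means expressed in pooled-standard-deviation units. The sketch below uses invented research-anxiety scores purely for illustration; it is not an analysis from the lecture.

```python
# Minimal sketch of Cohen's d: mean difference divided by the pooled SD.
import numpy as np

course_group  = np.array([42, 38, 35, 40, 36, 39])   # hypothetical anxiety scores after the course
control_group = np.array([48, 45, 50, 44, 47, 46])   # hypothetical anxiety scores without the course

mean_diff = course_group.mean() - control_group.mean()
pooled_sd = np.sqrt(((course_group.var(ddof=1) * (len(course_group) - 1)) +
                     (control_group.var(ddof=1) * (len(control_group) - 1))) /
                    (len(course_group) + len(control_group) - 2))
cohens_d = mean_diff / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")   # negative d here means lower anxiety in the course group
```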
Effect Size Cont.
• Here is a large problem: effect size depends on what measure we use to operationalize the construct.
• For example, effect size depends on the net we use, the test we select, etc. Actual effect sizes may be much larger than observed effect sizes.
• What might be considered a moderate to large effect in a laboratory situation may appear as a small effect in the real world, where you can’t control numerous sources of extraneous variance (e.g., variability in individual characteristics, treatment implementation, environmental characteristics) (Cohen, 1988). Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).
30
Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER
• This is just a conceptual introduction. We will return to validity of measurement in a future lecture.
• All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size. As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.
31
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
• Consider the elephant fable with the researchers who mapped different parts of the elephant. Their descriptions of the elephant differed considerably.
• What we see in research depends, at least in part, on what facet(s) of the construct of interest is (are) operationalized by our outcome measure(s). It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operational bias.
• In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.
32
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
• Now, consider measuring the same elephant with portable X-ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don’t resemble the descriptions provided by the previous group of researchers.
  • This part of the elephant fable illustrates how what we see is shaped by our method of observation or measurement.
• Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.
33
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
• The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection.
• The following telescope model depicts the interrelation. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is then attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).
34
Power, Effect Size, and Measurement Cont.
• Telescope Model (figure): the actual effect size is viewed through measurement, research design, and statistical design, yielding the observed effect size.
35
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
• Apparent effect size is further attenuated when the research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of assumptions of statistical procedures can further attenuate effect size. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens, or false positive results).
• The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct. This complication is illustrated in the following figure.
36
Power, Effect Size, and Measurement Cont.
• Telescope Model 2 (figure)
37
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
• Validity is a key element of the relationship of effect size, measurement, and design. Qualitative methods can further the valid operationalization of constructs. Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.
• Units of measurement should be carefully considered in planning research. Researchers studying the social and cultural context of behavior have questioned the reductionist tradition of separating acts, actors, and audiences, as well as the tendency to study behaviors without consideration of social and cultural mediation (see, e.g., Trueba, Rodriguez, Zou, & Cintron, 1993; Wertsch, 1991).
38
PRE-ANALYSIS STATISTICAL POWER
• Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning.
1. Estimate the effect size from past research and the type of experimental design planned. When you are unsure, underestimate effect size so as to overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated.
2. Decide on the exact statistical test and significance criterion.
3. Determine an acceptable level of power; .80 is nice, but .70 may be acceptable in some circumstances.
39
PRE-ANALYSIS STATISTICAL POWER
4. Use power tables for that statistical test, or an appropriate computer program, to determine the number of subjects required for the specified significance criterion and desired level of power (a sketch follows this slide).
5. If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or the statistical design if necessary to obtain adequate power.
• Recall, the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist).
(Szymanski & Parker, 1992)
40
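A hedged sketch of step 4 using a computer program (statsmodels) rather than power tables; the effect size, alpha, and power values below are illustrative only, not values taken from the lecture.

```python
# Solve for the required sample size per group for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

needed_n = TTestIndPower().solve_power(effect_size=0.4,   # estimated from past research (step 1)
                                       alpha=0.05,        # significance criterion (step 2)
                                       power=0.80,        # desired level of power (step 3)
                                       ratio=1.0)         # equal group sizes
print(f"Approximately {needed_n:.0f} subjects are needed in each group.")
```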
Power, Effect Size, and Measurement Cont.
ALPHA INFLATION
• Multiple comparisons can increase alpha, the probability of a Type I error. Recall the fish story. The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha is computed as 1 - (1 - alpha)^n, where n is the number of comparisons.
41
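A small worked sketch of the experiment-wise alpha formula, assuming an illustrative per-comparison alpha of .05.

```python
# Experiment-wise alpha = 1 - (1 - alpha)^n for n comparisons.
alpha = 0.05
for n_comparisons in (1, 3, 5, 10, 20):
    experimentwise = 1 - (1 - alpha) ** n_comparisons
    print(f"{n_comparisons:2d} comparisons -> experiment-wise alpha = {experimentwise:.3f}")
# With 10 comparisons at alpha = .05, the chance of at least one Type I error is about .40.
```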
ALPHA INFLATION
• As we discussed, one way to guard against alpha inflation is to use a Bonferroni-type procedure and to split alpha by the number of comparisons. There are a variety of such procedures that can be used (see, e.g., Marascuilo & Serlin, 1988) according to the relative importance of the tested hypotheses.
• The problem with reducing alpha is that it inflates beta. In situations in which alpha inflation is accepted due to a problem with power, one must look to replications for confidence in the findings.
42
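A minimal sketch of splitting alpha Bonferroni-style across an assumed five comparisons, showing that the experiment-wise alpha stays near the original level.

```python
# Bonferroni-type split: divide alpha by the number of comparisons.
alpha = 0.05
n_comparisons = 5
per_test_alpha = alpha / n_comparisons                        # .01 per comparison
experimentwise = 1 - (1 - per_test_alpha) ** n_comparisons    # about .049
print(f"Per-test alpha: {per_test_alpha:.3f}, experiment-wise alpha: {experimentwise:.3f}")
```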
Power, Effect Size, and Measurement Cont.
• See Power Tables:
http://fsweb.berry.edu/academic/education/vbissonnette/tables/tables.html
43
Test Statistics, Probability, and Significance
• In order to test a hypothesis, we compare the obtained value of a test statistic (e.g., the obtained F) to a critical value of the test statistic (e.g., a critical F) that is associated with the preset significance level (alpha).
• If the obtained value of the test statistic is greater than the critical value, we determine that there is a significant difference or relationship.
44
Test Statistics, Probability, and Significance Cont.
• Test statistic: the specific statistic (i.e., the tool) that is chosen to test the null hypothesis. Examples include F, t, and r.
• Obtained value: the actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p.
• Critical value: the value of the test statistic that is associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.
45
Test Statistics, Probability, and Significance Cont.
• Probability value: the probability that observed relationships or differences are due to chance.
• Alpha: alpha is also known as the significance level or rejection region. It is the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic.
  • In other words, alpha is our predetermined risk that differences that we declare to be real are actually due to chance.
46
Test Statistics, Probability, and Significance Cont.
• Obtained probability (p): also known as the significance of the test statistic. It is the “probability that the data could have arisen if H0 were true” (Cohen, 1994, p. 998).
• Significance: what happens when the obtained probability p is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic.
  • Test statistic corresponds to: probability value
  • Critical value corresponds to: significance level (alpha)
  • Obtained value corresponds to: obtained or actual probability (p)
  • Note that larger obtained values of test statistics are generally associated with smaller values of p.
• If obtained value > critical value, then * significance *
• If p < alpha, then * significance *
47
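To show that the two decision rules agree, here is a hedged Python sketch using the t distribution; the obtained t, degrees of freedom, and alpha are assumed values chosen only for illustration.

```python
# Two equivalent decision rules: obtained statistic vs. critical value, and p vs. alpha.
from scipy import stats

alpha, df = 0.05, 28          # assumed significance level and degrees of freedom
t_obtained = 2.40             # assumed obtained value of the test statistic
t_critical = stats.t.ppf(1 - alpha / 2, df)          # two-tailed critical value
p_obtained = 2 * stats.t.sf(abs(t_obtained), df)     # two-tailed p for the obtained t

print(f"critical t = {t_critical:.3f}, obtained t = {t_obtained:.3f}, p = {p_obtained:.4f}")
print("Significant" if t_obtained > t_critical else "Not significant")   # rule 1: obtained > critical
print("Significant" if p_obtained < alpha else "Not significant")        # rule 2: p < alpha
```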
Revisit the Bell Curve
48
Test Statistics, Probability, and Significance Cont.
• WHETHER YOU ARE LOOKING AT OBTAINED VALUES OF TEST STATISTICS IN RELATION TO CRITICAL VALUES OR YOU ARE LOOKING AT ACTUAL PROBABILITY LEVELS, IT IS IMPORTANT TO NOTE THAT TEST STATISTICS AND THEIR ASSOCIATED PROBABILITIES ONLY TELL US THE PROBABILITY THAT A DIFFERENCE OR RELATIONSHIP OCCURRED BY CHANCE.
• THESE STATISTICS DO NOT TELL US THE SIZE OF GROUP DIFFERENCES OR THE STRENGTH OF RELATIONSHIPS.
49
Steps in Hypothesis Testing for Quantitative Research Designs
• Hypothesis testing is a 4-phase procedure:
  • Phase I: Research Hypotheses, Design, and Variables
  • Phase II: Statistical Hypotheses
  • Phase III: Hypothesis Testing
  • Phase IV: Decision/Interpretation
50
Phase I: Research Hypotheses, Design, and Variables
1. State your research hypotheses.
2. Decide on a research design based on your research problem, your hypotheses, and what you really want to be able to say about your results (e.g., if you want to say that A caused B, you will need an experimental or time-series design; if probable cause is sufficient, a quasi-experimental design would be appropriate).
3. Operationally define your variables. Recall that one variable can have more than one operational definition.
51
Phase II: Statistical Hypotheses
1. Consider your chosen statistical procedures.
2. Write one statistical null hypothesis for each operational definition of each variable that reflects the statistical operations to be performed.
52
Phase III: Hypothesis Testing
• Complete the following steps for each statistical null hypothesis (a worked sketch follows this slide):
1. Select a significance level (alpha).
2. Compute the value of the test statistic (e.g., F, r, t).
3. Compare the obtained value of the test statistic with the critical value associated with the selected significance level, or compare the obtained p-value with the pre-selected alpha value.
4. If the obtained value of the test statistic is greater than the critical value (or if the obtained p-value is less than the pre-selected alpha value), reject the null hypothesis. If the obtained value is less than the critical value of the test statistic, fail to reject the null hypothesis.
• Another way of looking at it: if p is less than or equal to alpha, reject the null hypothesis.
53
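A worked sketch of these Phase III steps, using a Pearson correlation as the test statistic; the IQ and self-esteem scores below are invented for illustration and are not from the lecture.

```python
# Phase III walk-through with a correlation: select alpha, compute r and p, decide.
from scipy import stats

iq          = [98, 105, 110, 95, 120, 102, 115, 108]    # hypothetical IQ scores
self_esteem = [30, 34, 36, 28, 40, 33, 38, 35]          # hypothetical self-esteem scores

alpha = 0.05                                                 # Step 1: select a significance level
r_obtained, p_obtained = stats.pearsonr(iq, self_esteem)     # Step 2: compute the test statistic
print(f"r = {r_obtained:.3f}, p = {p_obtained:.4f}")

# Steps 3 and 4: compare the obtained p with the pre-selected alpha and decide.
if p_obtained <= alpha:
    print("Reject the null hypothesis of no correlation.")
else:
    print("Fail to reject the null hypothesis.")
```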
Phase IV: Decision/Interpretation
1. For each research hypothesis, consider the decisions regarding the statistical null hypotheses.
2. For each research hypothesis, consider qualitative contextual information relating to potential plausibility.
3. Cautiously explain your findings with respect to the research hypotheses.
4. List and discuss the limitations (threats to valid inference).
• Note: Null hypothesis testing is currently under scrutiny (see, e.g., Cohen, 1994; Kirk, 1996).
• It is generally recommended that you report the effect size along with the value of the test statistic and the p-value. An alternative is to report confidence intervals.
54
Points to Consider about Hypothesis Testing
FISHING IN LAKE ALICE
• We don’t prove the null hypothesis. If you go fishing on Lake Alice and you don’t catch fish, you cannot conclude that there are no fish in the lake!!!
55
Points to Consider about Hypothesis Testing Cont.
• What circumstances might keep us from finding fish in the lake? Possible problems include:
1. wrong or insensitive outcome measures (using the large net for small fish),
2. sampling problems (looking in the wrong part of the lake), or
3. methodological problems (scooping the top of the lake rather than the bottom, where the fish hang out).
56
Points to Consider about Hypothesis Testing Cont.
• Returning to hypothesis testing: failure to reject the null hypothesis cannot be interpreted as proof that no differences or relationships exist.
• Existing differences or relationships might be obscured by:
1. insensitive outcome measures (the wrong fishnet),
2. inappropriate statistical designs,
3. poor sampling strategies, and
4. low statistical power.
57
Questions about Hypotheses
?
58
Class Activity
59