RESEARCH DESIGN AND ANALYSIS
Jan B. Engelmann, Ph.D.
Department of Psychiatry and Behavioral Sciences
Emory University School of Medicine
Contact: [email protected]
Brief review
- Central tendency
- Spread
Brief review
[Scatter plot: GPA (Y) vs. mean hours of studying/week (X); trend line y = 0.3037x + 1.8757, R² = 0.6736]
Inferential statistics
- Descriptive statistics are useful for describing data.
- But they will not tell us whether the difference between 2 means is due to chance or significant. A difference may simply be due to sampling error and may not be reproducible.
- To test whether we are observing a true effect of our treatment on behavior, we need inferential statistics, such as:
  - t test
  - ANOVA
  - Correlation/regression
CORRELATION
Associations are important
- We all try to make connections between events in the world.
- This is an important task our brain accomplishes by way of forming associations.
- These associations are the basis of many behaviors that ensure our survival. E.g.:
  - Coming close to someone with an illness increases our chances of getting ill.
  - If you touch a hot stove, you will burn yourself.
  - Predicting your partner's mood state when certain events occur.
Correlation
- Correlation refers to whether the relationship between 2 variables is positive, negative, or zero.
- The degree of association between two variables can be statistically measured.
- The Pearson product-moment correlation coefficient:
  - The Pearson r is a statistic that quantifies the extent to which two variables X and Y are associated, and whether their association is positive, negative, or zero.
  - It assesses the degree to which X and Y vary together (covary).
  - It measures linear relatedness between X and Y.
Naming Conventions
- Correlations are based on pairs of variables, X and Y.
  - The X variable is the independent (predictor) variable.
  - The Y variable is the dependent (criterion) variable.
  - Changes in X predict changes in Y.
- A word of caution: correlation does not imply causation. More on this later.
An example: extraversion
- Extraverts are individuals who show sociable characteristics:
  - They are outgoing in their demeanor, are drawn to new people, and seek new experiences.
- Introverts are on the other end of this continuum:
  - They are generally shy and less sociable, especially in novel settings.
- We are interested in developing a new scale of extraversion.
  - In it, we ask our participants how they would react in various social situations.
- How could we test the validity of our measure?
An example: extraversion
- One week after completion of the questionnaire, participants take part in a staged social interaction: a party.
- We measure how many people participants choose to interact with.
- We expect a certain relationship between our personality scores and observed social behavior:
  - We expect a positive correlation.
  - This indicates that our inventory is predictive of real-world interactions.
Types of relationships
- Positive relationship = as the value of X increases (decreases), the corresponding value of Y increases (decreases).
  - X and Y values change in the same direction.
- Negative relationship = as the value of X increases (decreases), the corresponding value of Y decreases (increases).
  - X and Y values change in opposite directions.
Types of relationships
Positive correlation (+)
[Scatter plot: Interaction Behavior (Y) vs. Extraversion Score (X); trend line y = 0.3861x + 0.9774, R² = 0.8521]
Types of relationships
Negative correlation (-)
[Scatter plot: Type II diabetes index (Y) vs. hours exercise/week (X); trend line y = -0.6894x + 13.58, R² = 0.8526]
Types of relationships
- Zero correlation (0)
- Strength of correlation: r varies between -1 and +1.
[Scatter plot: Y Criterion vs. X Predictor; trend line y = -0.0472x + 0.5368, R² = 0.0017]
Calculating r
- The sum of squares is integral to the vast majority of inferential statistical tests.
- The Pearson r is no exception:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

  (The two terms under the square root are the sums of squares for X and for Y.)
- The cool thing is:
  - The numerator is referred to as the covariance: another type of SS, indicating how X and Y vary together.
  - The denominator is an SS term indicating how X and Y vary separately.
You can do this now!
- You learned all the steps yesterday.
- Here is the equation in simplified form:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X \cdot SS_Y}}$$

- Let's try it: data set Extroversion on the course webpage (see the sketch below).
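A minimal Python sketch of this computation; the numbers are made-up stand-ins, since the Extroversion data set itself lives on the course webpage and is not reproduced here:

```python
# Pearson r from sums of squares, using made-up numbers in place of
# the course's Extroversion data set.
x = [5, 8, 12, 15, 20]   # hypothetical extraversion scores (X, predictor)
y = [2, 4, 5, 7, 9]      # hypothetical party interactions (Y, criterion)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Numerator: how X and Y vary together.
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Denominator: how X and Y vary separately (SSx and SSy).
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

r = sp / (ss_x * ss_y) ** 0.5
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")   # strongly positive by construction
```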
Magnitude of r
- > ±0.80: very strong
- 0.60 – 0.79: strong
- 0.40 – 0.59: moderate
- 0.20 – 0.39: weak
- < 0.20: very weak
- Coefficient of determination = r²: indicates the proportion of variability in one variable that can be accounted for by the other variable (worked example below).
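Applying this to the extraversion scatter plot shown earlier, which reported R² = 0.8521:

$$r = \sqrt{0.8521} \approx 0.92$$

So r falls in the "very strong" band, and roughly 85% of the variability in interaction behavior is accounted for by extraversion scores.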
A note on causality
- Correlation does not imply causation!
- Example: a study on contraceptive use in Taiwan.
  - Researchers found that the single best predictor of contraceptive use was the number of electrical appliances owned.
  - The third-variable problem: socioeconomic status and education level lead to more income that can be spent on toasters and such.
- So there are three problems in correlational research:
  - We do not know the direction of the effect: X can cause a change in Y.
  - Y can cause a change in X.
  - A third variable can cause a change in both X and Y.
HYPOTHESIS TESTING
Hypothesis testing
- Hypothesis testing is the process by which decisions are made concerning the values of parameters.
- The decision to be made is whether our results are due to small chance fluctuations alone, or whether differences are in fact present.
  - E.g., does the stress of divorce lead to behavioral problems in children?
  - A sample of 5 children from divorced households was drawn and tested on the Achenbach Youth Self-Report scale.
  - A mean score of 56 was obtained from our sample.
  - We know the population mean is 50.
- Is this difference large enough to conclude that stress from divorce produces behavioral problems?
  - What is the likelihood of obtaining a sample mean of 56 if our sample was drawn from a normal population?
The sampling distribution of a statistic
- Inferential statistics rely heavily on sampling distributions.
- A sampling distribution is the distribution of a statistic over repeated sampling from a population.
  - It tells us what values to expect for a particular statistic under predefined conditions.
  - Because we often work with means, we focus here on the sampling distribution of the mean.
- The sampling distribution of the mean is the distribution of sample means over repeated sampling from one population.
What is the sampling distribution of the mean?
- Typically, sampling distributions are derived mathematically.
- Conceptually, it is the distribution of the means of an infinite number of random samples (with a specified N) drawn under specified conditions.
  - Conditions: population mean = 50, standard deviation = 10.
- Samples are repeatedly drawn from the same population and a distribution is plotted (simulated in the sketch below).
  - For each sample, the mean is calculated and recorded.
  - After doing this a large number of times, we will be able to draw the sampling distribution of the mean.
  - We will find that this distribution is Gaussian (bell-shaped), i.e., normally distributed.
  - We will also find that most observations cluster around the population mean.
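A minimal simulation sketch of this procedure in Python, assuming numpy is available; the conditions are the ones above, with a hypothetical sample size of N = 5:

```python
import numpy as np

# Repeatedly draw samples of size N from a population with mean 50 and
# standard deviation 10, and record each sample mean.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 50, 10, 5, 100_000

sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(sample_means.mean())   # clusters around the population mean, 50
print(sample_means.std())    # close to sigma / sqrt(n), about 4.47
```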
Illustration of the sampling distribution
[Diagram: samples are repeatedly drawn from a population with μ = 50 and σ = 10; each sample's mean is plotted to build up the sampling distribution of the mean]
Sampling distribution of the mean
- From the sampling distribution of the mean, we can tell that certain values are very likely, while others are highly unlikely.
  - E.g., most values cluster around the population mean. Such values are likely to be obtained in any given sample.
  - Others are at the tails of the distribution. These are very unlikely sample means in any given sample.
  - So the sampling distribution tells us what values to expect in our sample IF we in fact obtained a sample from this population.
  - We can even assign an exact probability to obtaining a certain sample mean if we sampled from a given population.
Hypothesis testing revisited
- Now that we know about sampling distributions, we can do some hypothesis testing.
- How can we test the likelihood of obtaining a sample mean of 56, if this sample was drawn from a population with a mean of 50?
  - 56 surely is larger than 50, but is this difference significant?
  - If we did in fact sample from a population with a mean of 50, the probability of obtaining a sample mean of at least 56 is 0.09 (checked in the sketch below).
    - About 10% of the time we would get such a sample mean: likely the same population.
  - How about a sample mean of 62? The probability of that is 0.0037: likely a different population.
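These two probabilities can be reproduced from the normal sampling distribution of the mean, using the example conditions from earlier (μ = 50, σ = 10, N = 5); scipy is assumed to be available:

```python
from scipy.stats import norm

# Example conditions: population mean 50, standard deviation 10, N = 5.
mu, sigma, n = 50, 10, 5
se = sigma / n ** 0.5                  # standard error of the mean, ~4.47

# Probability of a sample mean at least this large under the null.
print(norm.sf(56, loc=mu, scale=se))   # ~0.09   -> likely same population
print(norm.sf(62, loc=mu, scale=se))   # ~0.0037 -> likely different population
```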
Hypothesis testing steps
1. Set up a research hypothesis.
  - Children under the stress of divorce are more likely to exhibit behavioral problems than normal children.
  - The sample was not drawn from a population of normal children.
2. Set up a null hypothesis (H0).
  - Children under the stress of divorce are equally likely to exhibit behavioral problems as normal children.
  - The sample was drawn from a population of "normal" children, whose parameters we know.
3. Obtain a random sample of children under stress.
Hypothesis testing steps
4. Obtain the sampling distribution of the mean, assuming that H0 is true.
5. Given the sampling distribution, calculate the probability of obtaining a sample mean at least as large as the observed value.
6. On the basis of this probability, reject or accept H0.
  - Accept H0 = the sample was drawn from the normal population.
  - Reject H0 = the sample was drawn from a particularly stressed population.
Why test the null hypothesis?
1. Fisher: "We can never prove something to be true, but we can prove something to be false."
  - Observing 4000 cows with one head does not prove the statement "every cow has one head" right.
  - Finding one cow with two heads, however, proves it wrong.
2. Practicality: in experiments, we typically want to show that some value (treatment) is greater or smaller than another (control).
  - Research hypotheses are typically not specific enough.
  - E.g., the population mean of children stressed by divorce could be 62, 63, or greater or smaller than that. We do not know.
  - Having a specific null hypothesis remedies this problem and allows us to obtain a sampling distribution.
Sampling distributions of test statistics
- Test statistics are the results of statistical tests, such as:
  - t tests (t), ANOVAs (F), correlations (r), etc.
- These tests are specific statistical procedures used to infer the statistical significance of results.
- They have their own sampling distributions, which are obtained and used in the same way as the sampling distribution of the mean.
  - We will never actually do this ourselves; they are inferred mathematically.
Decision making (Type I and II errors)
- The decision we are making, no matter the statistic, is whether to accept or reject H0.
- Once we know the conditional probability of obtaining a sample under the null, we can make such a decision.
  - The decision to reject depends on the significance level.
  - If this probability is < 0.05, we reject H0.
  - α-level = 0.05.
  - This means that we erroneously reject H0 on 5% of the occasions we conduct this experiment (Type I error; simulated in the sketch below).
  - This standard has been adopted as sufficiently unlikely by behavioral scientists.
  - But we can also erroneously fail to reject H0 (Type II error).
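A small simulation sketch of the Type I error rate, assuming numpy and scipy are available: when the null is really true, a test at α = 0.05 should reject about 5% of the time.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Every sample truly comes from the same N(50, 10) population, so any
# rejection of H0 (popmean = 50) is a Type I error.
rng = np.random.default_rng(1)
mu, sigma, n, reps, alpha = 50, 10, 5, 10_000, 0.05

rejections = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, n)
    if ttest_1samp(sample, popmean=mu).pvalue < alpha:
        rejections += 1

print(rejections / reps)   # ~0.05: the Type I error rate
```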
THE T-TEST
Good things come from beer
- Student's t test.
  - The t test was developed as a method to improve beer quality.
- There are 3 applications of t tests (see the sketch after this list):
  1. One-sample t test: comparison of a sample mean to a population mean.
  2. Independent-samples t test: comparison of sample means from control and experimental groups.
  3. Paired-samples t test: employed in repeated-measures designs to test the effect of a treatment.
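A quick sketch of all three applications using scipy's t-test functions; the data are made up for illustration and do not come from the lecture:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel

rng = np.random.default_rng(2)
control = rng.normal(50, 10, 20)        # hypothetical control-group scores
treated = rng.normal(56, 10, 20)        # hypothetical experimental-group scores
post = control + rng.normal(3, 5, 20)   # hypothetical post-treatment re-test

print(ttest_1samp(treated, popmean=50))  # 1. sample mean vs. population mean
print(ttest_ind(treated, control))       # 2. control vs. experimental group
print(ttest_rel(control, post))          # 3. repeated measures on the same people
```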
Assumptions underlying the t test
1. The population the sample data are drawn from is normally distributed.
2. Data are randomly sampled from the population, so that we can generalize back to the population.
3. Data need to be on an interval/ratio scale, so that we can calculate the mean.
4. Samples need to have equal variances.
What the t test does
- The t test assesses whether any measurable difference exists between means.
- Absence of a difference is indicated by a t statistic of 0.
  - Both positive and negative t values are possible, indicating which mean is larger.
- How great is the difference between sample means relative to the standard error?
- Let us look at an equation.
The t test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\text{error}}$$

- The t test basically contrasts the difference between 2 means with the standard error.
  - The standard error is a measure of our sampling error.
  - Based on the type of comparison we are making, the error term changes.
- So the t test provides a numerical value of the extent (ratio) to which the effect of our manipulation exceeds the amount of error we can expect from sampling.
The error term revisited
- The error term represents a combined index of the average deviation from the mean behavior within each group.
- This is referred to as within-group variability.
- It is explained by 2 sources of error:
  1. Random error (sampling error/individual differences).
  2. Experimental error, due to failures on the experimenter's part.
- If the difference between the means exceeds our estimated sampling (random) error, we conclude that the difference is due to our manipulation, not chance.
- If the difference does not exceed the estimated random error, we conclude that our findings are due to chance.
One-sample t test
- One-sample t tests are used to compare an observed mean with a hypothesized value assumed to represent the population.
- Typically employed to test whether observed scores deviate from an already established pattern (see the worked sketch below).

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

- Degrees of freedom: df = N - 1
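A worked sketch of the one-sample formula in plain Python; the scores are hypothetical, chosen so that the sample mean is 56, matching the divorce example (population mean 50):

```python
import math

# Hypothetical Achenbach scores for 5 children of divorce, with mean 56.
scores = [52, 54, 56, 58, 60]
mu = 50                          # known population mean

n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
se = sd / math.sqrt(n)           # estimated standard error of the mean

t = (mean - mu) / se
df = n - 1
print(f"t({df}) = {t:.2f}")      # t(4) = 4.24 for these made-up scores
```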
Independent-samples t test
- Most common, because it allows us to verify whether a difference between means exists.
  - Did our experimental manipulation have an effect?
- This type of test is employed to analyze between-subjects designs.
  - Did the independent variable create an observable change in behavior, as measured by the dependent variable?
  - The denominator is the standard error of the difference between the means (see the worked sketch below).

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

- Degrees of freedom: df = (N1 + N2) - 2
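A worked sketch with made-up scores; the standard error of the difference is built from the two groups' sums of squares via the pooled variance, a common choice consistent with the equal-variance assumption above:

```python
import math

# Made-up scores for an experimental and a control group.
experimental = [12, 14, 11, 15, 13]
control = [9, 10, 8, 11, 10]

def sum_of_squares(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(experimental), len(control)
df = n1 + n2 - 2

# Pool the two groups' sums of squares into one variance estimate, then
# form the standard error of the difference between the means.
pooled_var = (sum_of_squares(experimental) + sum_of_squares(control)) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff = sum(experimental) / n1 - sum(control) / n2
t = mean_diff / se_diff
print(f"t({df}) = {t:.2f}")      # t(8) = 3.90 for these made-up scores
```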
Conceptual model of the t test
- Between-groups variation = the difference of interest.
  - E.g., the difference between 2 means.
- Within-groups variation = a summary of error sources.
  - This almost always involves the calculation of SS.
Let us do some statistics
- Data set handout.
- Problem 1:
  1. Use Excel to conduct the t test.
  2. Use SPSS to conduct the t test.
  3. Plot the results in Excel and SPSS.
OUR EXPERIMENT
What we tested
- The effect of level of optimism on the efficacy of a simple mood induction.
- What did we do?
- How do we test this?
  - Measures we obtained:
    1. Pre- and post-induction mood.
    2. LOT-R. (What is that?)
  - What we want to test:
    1. Was the mood induction effective?
    2. Did personality have an effect on efficacy?