Parametric statistics
922
Outline
Measuring the accuracy of the mean
 Practical notes for practice
 Inferential statistics

 T-test
 ANOVA
Measuring the accuracy of the mean
The mean is the simplest statistical model that we use
 This statistic predicts the likely score of a person
 The mean is a summary statistic

Measuring the accuracy of the mean

The model we choose (mean / median / mode) should represent the state of the real world
 Does the model represent the world precisely?
 The mean is a perfect representation only if all the scores we collect are the same as the mean.
Mean

When the mean is a perfect fit, there is no difference between the mean and each data point.

Child   Score
1       10
2       10
3       10
4       10
5       10
6       10
7       10
8       10
Mean    80/8 = 10
Mean

Usually, there are differences between the mean and the raw scores.
If the mean is representative of the data, these differences are small.

Child   Score
1       10
2       9
3       8
4       12
5       8
6       11
7       10
8       12
Mean    80/8 = 10
Deviation

The difference between the model prediction (= mean) and each raw score is the deviation.

Child   Score   Mean   Deviation
1       10      10
2       9       10
3       8       10
4       12      10
5       8       10
6       11      10
7       10      10
8       12      10
Mean    10
Deviation
Compute the deviation of each score from the mean
 Measure the overall deviation (sum)

Child   Score   Mean   Deviation (mean - score)
1       10      10      0
2       9       10      1
3       8       10      2
4       12      10     -2
5       8       10      2
6       11      10     -1
7       10      10      0
8       12      10     -2
Mean    10             Sum: 0
Deviation

Raw score   Mean   Deviation   Squared dev.
10          10      0          0
9           10      1          1
8           10      2          4
12          10     -2          4
8           10      2          4
11          10     -1          1
10          10      0          0
12          10     -2          4
Sum                 0          18
Deviation

The sum of squared deviations (also called the sum of squared errors) is a good measure of the accuracy of the mean.
Except that it gets bigger with more scores.

Deviation   Squared dev.
 0          0
 1          1
 2          4
-2          4
 2          4
-1          1
 0          0
-2          4
Sum: 0      18
Variance
Divide the sum of squared deviations by the number of scores minus 1

Sum of squared deviations   18
Number of scores (N)        8
N-1                         7
Variance                    18/7 = 2.57
Standard deviation          √2.57 = 1.60


We can compare variance across samples
Square root of variance is standard deviation
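
The calculation on the preceding slides can be retraced in a few lines of Python. This is only a sketch using the eight scores from the example table; the variable names are chosen here for illustration and are not part of the original slides.

```python
import math

# The eight scores from the example table in the slides
scores = [10, 9, 8, 12, 8, 11, 10, 12]

n = len(scores)
mean = sum(scores) / n

# Deviation as tabulated in the slides: mean minus raw score
deviations = [mean - s for s in scores]
sum_of_deviations = sum(deviations)          # always 0 around the mean

# Squaring removes the sign, so the deviations no longer cancel out
sum_of_squares = sum(d ** 2 for d in deviations)

# Divide by N-1 (not N) because the sample is used to estimate the population
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(f"mean={mean}, sum of squares={sum_of_squares}, "
      f"variance={variance:.2f}, SD={sd:.2f}")
```

The printed variance and SD should agree with the values in the table above (18/7 and its square root).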
Accuracy of the mean
Sum of squared deviations (sum of
squared errors), variance and standard
deviation all measure the same thing:
variability of the data
 Standard deviation (SD) measures how well the mean represents the data: a small SD indicates that the data points are close to the mean

Standard Deviation (SD)
Small SD: data points close to the mean

             Mean   SD
Sentence 1   7      0.2
Sentence 2   5      0.5
Sentence 3   2      0.1
Sentence 4   2      0.5
Sentence 5   5      0.2
Sentence 6   6      0.2
Sentence 7   4      0.1
Sentence 8   6      0.2
Standard Deviation (SD)
Large SD: data points far from the mean

             Mean   SD
Sentence 1   7      1.5
Sentence 2   5      2.5
Sentence 3   2      1.5
Sentence 4   2      2.5
Sentence 5   5      3
Sentence 6   6      1.5
Sentence 7   4      2.5
Sentence 8   6      3.5
Why use number of scores
minus 1?

We are using a sample to estimate the
variance in the population
Population?
Sample-population
The intended population of
psycholinguistic research can be all
people / all children aged 3 / etc.
 Actually, we collect data only from a
sample of the population we are interested
in.
 We use the sample to make a guess about
the linguistic behavior of the relevant
population.

Sample - population
Size of the sample
 The mean as a model is resistant to sampling variation: different samples from the same population usually have a similar mean

Why use number of scores
minus 1?
We are using a sample to estimate the
variance in the population
 Variance in the sample: observations can vary, e.g. (5, 6, 2, 9, 3), mean=5
 But: suppose we assume that the sample mean is the same as the population mean (mean=5)

Why use number of scores
minus 1?
For the next sample, not all observations
are free to vary.
 For a sample of (5, 7, 1, 8, ?) we already
need to assume that the mean is 5. (?=4)

Why use number of scores
minus 1?
This does not mean we fix the value of the
observation, but simply that for various
statistics we have to calculate the number
of observations that are free to vary.
 This number is called: Degrees of freedom
and it must be one less than the sample
size (N-1).
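
The "free to vary" idea can be checked with the slide's own example. The sketch below assumes, as the slide does, that the mean is fixed at 5; the variable names are illustrative only.

```python
# Four observations are free to vary; the fifth is then determined
known_observations = [5, 7, 1, 8]
assumed_mean = 5                       # assumption taken from the slide
n = len(known_observations) + 1        # sample size including the unknown score

# The total must equal mean * N, so the last score has no freedom left
missing_observation = assumed_mean * n - sum(known_observations)

degrees_of_freedom = n - 1
print(missing_observation, degrees_of_freedom)
```

Once the mean is fixed, only N-1 of the N scores can be chosen freely, which is why the variance is divided by N-1.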

To summarize

The mean represents the sample

The sample represents the population
Many samples - population
Theoretically, if we take several samples
from the same population
 Each sample will have its own Mean and
SD
 If the samples are taken from the same
population, they are expected to be
reasonably similar.

Many samples

             Mean
Population   10
Sample 1     9
Sample 2     11
Sample 3     10
Sample 4     12
Sample 5     9
Sample 6     10
Sample 7     11
Sample 8     8

Mean   Frequency
8      1
9      2
10     3
11     2
12     1
Sampling distribution

Mean   Frequency
8      1
9      2
10     3
11     2
12     1

The average of all sample means will give the value for the population mean
Sampling distribution

How accurate is a sample likely to be?
Calculate the SD of the sampling distribution
This is called the standard error of the mean (SE)
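
As a rough illustration, the frequency table of sample means shown on the slides can be expanded into a list, from which the mean and SD of the sampling distribution follow directly. This is a sketch only: using the N-1 version of the SD (`statistics.stdev`) is an assumption made here for consistency with the earlier slides.

```python
import statistics

# Frequency table of sample means, copied from the slide
frequency = {8: 1, 9: 2, 10: 3, 11: 2, 12: 1}

# Expand the table into one entry per observed mean
sample_means = [m for m, count in frequency.items() for _ in range(count)]

# The average of the sample means estimates the population mean
mean_of_means = statistics.mean(sample_means)

# The SD of the sampling distribution is the standard error of the mean
standard_error = statistics.stdev(sample_means)

print(mean_of_means, round(standard_error, 2))
```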
Standard Error (SE)


We do not collect many samples; instead we compute the SE from a single sample
SE = SD / √N

             Mean   SD
Sentence 1   7      0.2
Sentence 2   5      0.5
Sentence 3   2      0.1
Sentence 4   2      0.5
Sentence 5   5      0.2
Sentence 6   6      0.2
Sentence 7   4      0.1
Sentence 8   6      0.2
Standard Error (SE)

Mean   SD    SE
7      0.2   0.07
5      0.5   0.18
2      0.1   0.04
2      0.5   0.18
5      0.2   0.07
6      0.2   0.07
4      0.1   0.04
6      0.2   0.07
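
The SE column can be reproduced by dividing each SD by the square root of the sample size. The slides do not state N for this table; the sketch below assumes N = 8, which is the value that reproduces the tabulated SEs.

```python
import math

n = 8  # assumption: not stated on the slide, inferred from the SE values

# SD column from the table above
sds = [0.2, 0.5, 0.1, 0.5, 0.2, 0.2, 0.1, 0.2]

# SE = SD / sqrt(N), rounded to two decimals as in the table
standard_errors = [round(sd / math.sqrt(n), 2) for sd in sds]

print(standard_errors)
```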
Why are the samples different?

Source of the variance:
 Different population, or
 Sampling error (random effect, can be calculated)

Can we take results from the sample to
make generalizations about the
population?
Accuracy of sample means
Calculate the boundaries within which
most sample means will fall.
 Suppose that, looking at the means of 100 samples, the lowest mean is 2 and the highest mean is 7.
 The mean of any additional sample will most likely fall within these limits.

Confidence Interval
The limits within which a certain percent (typically we look at 95%) of sample means will fall.
 If we collect 100 samples, about 95 of them are expected to have a mean within the confidence interval
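
One common way to obtain such limits is mean ± critical value × SE. The sketch below applies this to the eight scores used earlier in the slides and takes the critical value from the normal distribution (about 1.96 for 95%). That choice is an assumption made for simplicity: for a sample this small, a critical value from the t-distribution would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev

# The eight scores from the earlier example table
scores = [10, 9, 8, 12, 8, 11, 10, 12]

n = len(scores)
sample_mean = mean(scores)
standard_error = stdev(scores) / math.sqrt(n)   # SE = SD / sqrt(N)

# Two-sided 95% interval: 2.5% is cut off in each tail
critical_value = NormalDist().inv_cdf(0.975)    # normal approximation (assumption)
half_width = critical_value * standard_error

lower, upper = sample_mean - half_width, sample_mean + half_width
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```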

PRACTICE

Experiment:
 Compare children with specific language impairment (SLI) and children who are typically developing (TD).

Hypothesis:
 Effect of word order: SVO vs. VSO

Task: repeat a sentence
       Correct?
SVO    YES/NO
VSO    YES/NO

30 children; each was presented with 10 sentences (5 SVO, 5 VSO).

          SLI   age    gender   SVO1   SVO2   VSO1   VSO2
Child 1   1     4;02   1        1      0      1      0
Child 2   0     3;04   1        0      1      1      0
Child 3   1     4;06   1        0      0      1      0
Child 4   0     3;11   0        1      1      1      0
Compute:
Mean?
Frequency?

             Mean   SD
SLI   SVO
      VSO
TD    SVO
      VSO
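
The same kind of summary can be sketched in Python for the four children and four items shown in the sample table (not the full data set of 30 children and 10 sentences). The data layout and names below are illustrative choices, not part of the original exercise.

```python
from statistics import mean

# One entry per child: (SLI flag, SVO item scores, VSO item scores),
# copied from the sample table; 1 = correct repetition, 0 = incorrect
children = [
    (1, [1, 0], [1, 0]),   # Child 1
    (0, [0, 1], [1, 0]),   # Child 2
    (1, [0, 0], [1, 0]),   # Child 3
    (0, [1, 1], [1, 0]),   # Child 4
]

def summarize(sli_flag, order_index):
    """Return (frequency of correct answers, mean) for one group and word order."""
    scores = [s for child in children if child[0] == sli_flag
              for s in child[order_index]]
    return sum(scores), mean(scores)

summary = {
    ("SLI", "SVO"): summarize(1, 1),
    ("SLI", "VSO"): summarize(1, 2),
    ("TD", "SVO"): summarize(0, 1),
    ("TD", "VSO"): summarize(0, 2),
}

for (group, order), (frequency, proportion) in summary.items():
    print(group, order, frequency, proportion)
```

With 0/1 scoring, the mean of a condition is simply the proportion of correct repetitions.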
Basic analysis with Excel
Descriptive statistics: Sum, Average,
Percentage
 Drawing graphs
 Parametric statistics: Mean, Standard
Deviation, t-test
 Smart sheets: COUNTIF

INFERENTIAL STATISTICS
Statistical hypothesis
Hypothesis for the effect of a linguistic
phenomenon
 Findings from a sample
 Do the findings support the hypothesis?
 Do they show a linguistic effect?

To answer this, we consider a null hypothesis
The null hypothesis (H0)
H0 = the experiment has no effect
 The purpose of statistical inference is to test whether this hypothesis can be rejected
 H1 = the mean of the population affected by the experiment is different from the mean of the general population

Rejecting the null hypothesis
Compare the mean of the sample to two
populations (under H1 or H0).
 We cannot show the sample belongs to
the population under H1.
 All we can do is compare the sample to
population under H0 and consider the
likelihood that it belongs to it.

Rejecting the null hypothesis
Check if our sample belongs to the
population under H1 or H0
 Consider confidence interval, SE
 Compare means
 Compare variance

Level of significance (alpha)
Is the difference between the sample and
the population big enough to reject H0?
 Determine a critical value (alpha) as the criterion for including the sample in the population
 Typically alpha = 0.05, so H0 is rejected when p < 0.05

Parametric statistics
Variables are on interval scale (at least)
 Compute means of raw scores (several items per condition)

 t-tests
 ANOVA
 ANCOVA
t-Tests
t-tests are used in order to compare two samples and decide whether they are significantly different or not.
 The t-value represents the difference between the means of the two samples, taking into consideration the degree to which these means could differ by chance.

t-Tests
The degree to which the means could
differ by chance is the Standard Error (SE)
 We do not calculate the t-value ourselves,
but we use it to determine the effect of the
experiment on the sample.
 How do we know if the t-value is
significant (p<0.05)?

t-Tests
Every sample belongs to a different t-curve. This depends on the degrees of freedom (df = N-1).
 Check the table of critical values of Student's t-distribution for the relevant df. We report the df together with the t-value:
 t(32)=1.15

Types of t-tests:

Matched/Paired/Dependent t-test - compares two sets of scores for the same sample, or scores of matched samples. The samples can be of equal or unequal variance.

Independent (two-sample) t-test - compares two different samples (on the same test). Samples can be of equal or unequal size, and of equal or unequal variance. The df for the independent t-test (assuming equal variances) is (Nx-1)+(Ny-1) = Nx+Ny-2.
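
Although in practice the t-value comes from a statistics package, both types of test can be sketched by hand to show what the numbers mean. In the sketch below, `x` holds the eight scores from the earlier slides, while `y` is a hypothetical second set of scores invented for illustration. The independent test assumes equal variances (pooled variance), and the function names are illustrative.

```python
import math
from statistics import mean, stdev, variance

x = [10, 9, 8, 12, 8, 11, 10, 12]   # scores from the earlier slides
y = [7, 8, 6, 9, 7, 8, 6, 9]        # hypothetical second set of scores

def independent_t(a, b):
    """t-value and df for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    df = (na - 1) + (nb - 1)
    # Pool the two sample variances, weighting each by its df
    pooled_variance = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_difference = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_difference, df

def paired_t(a, b):
    """t-value and df for two sets of scores from the same (or matched) sample."""
    differences = [ai - bi for ai, bi in zip(a, b)]
    n = len(differences)
    se_difference = stdev(differences) / math.sqrt(n)
    return mean(differences) / se_difference, n - 1

t_independent, df_independent = independent_t(x, y)
t_paired, df_paired = paired_t(x, y)

print(f"independent: t({df_independent})={t_independent:.2f}")
print(f"paired:      t({df_paired})={t_paired:.2f}")
```

In both cases the t-value is a difference between means divided by a standard error; whether it is significant still has to be looked up for the relevant df.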
ANOVA - Comparing Means of
more than two samples
ANOVA - Analysis of variance. It considers
within group variability as well as random
between group variability and nonrandom
between group variability.
 The type of ANOVA depends on the
research design - the number of
independent and dependent variables.





One-Way ANOVA - one independent variable with more than two levels, one dependent variable
Two-Way Independent ANOVA - two independent variables, with different participants in all groups (each person contributes one score).
Two-Way Repeated Measures ANOVA - everything comes from the same participants
Two-Way Mixed ANOVA - one independent variable is tested on the same participants, the other on different participants


The F-score is the result of dividing the between-group variance (which takes into consideration random and non-random variance) by the within-group variance.
For every F-score we can determine the significance based on the dfs (between groups and within groups).
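
For a one-way design, the F-score can be sketched by hand as follows. The three groups of scores are hypothetical and were chosen only to keep the arithmetic easy to follow; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical scores for three independent groups
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]

all_scores = [score for group in groups for score in group]
grand_mean = mean(all_scores)

# Between-group variability: how far each group mean is from the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1

# Within-group variability: how far each score is from its own group mean
ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
df_within = len(all_scores) - len(groups)

# F = between-group variance / within-group variance
f_score = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between},{df_within})={f_score:.2f}")
```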
Post hoc comparisons





Post-hoc comparisons are used to find out where the differences that yielded a significant result come from.
Tukey Test - used when the sample sizes are the same.
Scheffe Test - used with unequal sample sizes, but could be used with equal sample sizes
Bonferroni correction - when there are multiple comparisons, the level of significance is divided by the number of tests to avoid family-wise errors.
These tests can also be used to test unplanned comparisons
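
The Bonferroni correction is simple enough to sketch directly. The comparison labels and p-values below are hypothetical and serve only to show how the corrected level of significance is applied.

```python
alpha = 0.05

# Hypothetical p-values from three pairwise comparisons
p_values = {"A vs B": 0.010, "A vs C": 0.030, "B vs C": 0.200}

# Divide the level of significance by the number of tests
corrected_alpha = alpha / len(p_values)

# A comparison counts as significant only if p is below the corrected level
significant = {pair: p < corrected_alpha for pair, p in p_values.items()}

print(round(corrected_alpha, 4), significant)
```

Note that "A vs C" would pass the uncorrected 0.05 criterion but does not pass the corrected one.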
ANCOVA - Analysis of
covariance
Allows introducing covariates (factors
other than the experimental design which
might influence the results) into the
ANOVA.