Intro to Inferential Statistics

Inferential Statistics
Comparison of Group Means
Normal curve
Characteristic curve
Bell curve
Normal distribution
Gaussian distribution
Continuous probability distribution
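(Figure: normal curve with equivalent scales)
Area under the curve between adjacent standard deviations, left to right:
0.13% | 2.14% | 13.59% | 34.13% | 34.13% | 13.59% | 2.14% | 0.13%
Standard Deviations (z scores):   -3     -2     -1      0     +1     +2     +3
Cumulative Percentages:          0.1%   2.3%  15.9%  50.0%  84.1%  97.7%  99.9%
IQ Scores:                         55     70     85    100    115    130    145
SAT Scores (sd 209):              400    608    817   1026   1235   1444   1600
Percentile Equivalents (tick marks): 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99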
Some Definitions
• A population is the largest group to which you wish
to generalize.
• Summaries of the measures of the characteristics of
individuals in a population are called parameters.
• A group selected from the population for study is
called a sample.
• Summaries of the measures of the characteristics of
individuals in a sample are called statistics.
Normal curves:
1. Lots of measures in the
middle—fewer toward the
extremes (bell shaped).
2. Symmetrical
3. Mean, median, and mode
are the same
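The percentages under the normal curve can be recomputed rather than read off the figure. The following is a minimal sketch using Python's standard library; the IQ mean of 100 and sd of 15 are read from the IQ scale in the figure, and the function name is only illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sd 1

def area_between(lo, hi):
    """Percent of the area under the standard normal curve between two z scores."""
    return 100 * (z.cdf(hi) - z.cdf(lo))

for lo, hi in [(0, 1), (1, 2), (2, 3)]:
    print(f"{lo} to {hi} sd: {area_between(lo, hi):.2f}%")
print(f"beyond 3 sd: {100 * (1 - z.cdf(3)):.2f}%")

# Percentile rank of an individual IQ score of 115 (one sd above the mean).
iq = NormalDist(mu=100, sigma=15)
print(f"IQ 115 percentile rank: {100 * iq.cdf(115):.1f}")
```

Because the curve is symmetrical, the same areas apply on the negative side of the mean.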
Often distributions aren’t normal.
(Figures: a negatively skewed distribution and a positively skewed distribution)
• Distributions where the bulk of the responses are weighted in one
direction are called skewed distributions.
• If the majority of the data are toward higher values, with a long tail
toward lower values, the distribution is negatively skewed.
• If the majority of the data are toward lower values, with a long tail
toward higher values, the distribution is positively skewed.
Skewed distributions behave badly
(Figure: a skewed distribution with the mode, median, and mean marked at different points)
• The mean, median, and mode are not identical.
• The distribution is not symmetrical.
• The relationship between the standard deviation and the percent of the
responses under the curve is not a constant.
Kurtosis
• The tendency of a distribution to be sharper (more peaked) or flatter
than a normal curve is called kurtosis.
Why Worry About the Shape
• The inferential statistics covered here are based on the
assumption that the data are normally distributed.
• Since we are using samples of populations, some
variation from normal is to be expected because
of sample size.
• The best way to decide if distributions are not
normal (problematic) is to produce a histogram
and look at it.
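Producing a histogram and looking at it can be sketched in a few lines. The example below is an illustration only: the scores are simulated, the helper names are made up for this sketch, and the skewness number is a supplement to the eyeball check, not a replacement for it.

```python
import random
from statistics import mean, median, pstdev

def skewness(data):
    """Population skewness: near 0 if symmetric, above 0 with a long tail toward
    higher values, below 0 with a long tail toward lower values."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def text_histogram(data, bins=10, width=40):
    """Print a crude text histogram so the shape can be eyeballed."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins or 1
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / step), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.1f} | {'#' * round(width * c / max(counts))}")

rng = random.Random(0)  # fixed seed; the scores below are simulated, not real data
normal_scores = [rng.gauss(100, 15) for _ in range(2000)]
skewed_scores = [rng.expovariate(1 / 15) for _ in range(2000)]  # long tail toward high values

text_histogram(skewed_scores)
print("skewness, normal scores:", round(skewness(normal_scores), 2))
print("skewness, skewed scores:", round(skewness(skewed_scores), 2))
print("mean vs median, skewed scores:",
      round(mean(skewed_scores), 1), round(median(skewed_scores), 1))
```

In the skewed set the mean is pulled toward the long tail and no longer matches the median, which is the "behaving badly" described above.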
Inferences about the relationship of a sample to a population
are based on normal distributions.
(Figure: population curve marked in standard deviations from -3 to +3, showing the population mean and the mean of a group of 30)
We already know
how to compute
percentile rank for
an individual.
What would happen
if I looked at a group
of 30 instead of an
individual?
Frequency Distribution for Groups of 30
(Figure: frequency distribution of group means around the population mean)
Compute the means of groups of 30
randomly selected from the population.
Theoretically, a new frequency
distribution will appear,
representing the means of
groups of 30.
Sampling Distribution of the Mean
(Figure: the population distribution and the sampling distribution for random groups of 30 (n = 30), both centered on the population mean)
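The idea of building a sampling distribution can be tried out by simulation. This is a minimal sketch under stated assumptions: the population is simulated (mean 100 and sd 15 are arbitrary choices), and 5,000 random groups stand in for "all possible" groups of 30.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 test scores (mean 100, sd 15 are assumed values).
population = [rng.gauss(100, 15) for _ in range(100_000)]

# Draw many random groups of 30 and record each group's mean.
group_means = [mean(rng.sample(population, 30)) for _ in range(5_000)]

print("population mean:     ", round(mean(population), 2))
print("mean of group means: ", round(mean(group_means), 2))
print("population sd:       ", round(pstdev(population), 2))
print("sd of group means:   ", round(pstdev(group_means), 2))
```

The group means center on the population mean but spread out far less than individual scores do.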
Sample Sizes
(Figure: sampling distributions for n=30, n=100, and n=1000 drawn over the population distribution)
If the sample size were really large,
then it would be unlikely to get
sample group means that were very
far from the population mean.
If the sample size were really
small, then sample group means
would be distributed similarly
to the population.
As the sample size increases, the
Sampling Distribution of the Mean gets narrower.
Standard Error of the Mean
(Figure: sampling distributions for n=30, n=100, and n=1000 drawn over the population distribution)
The standard error is the
standard deviation of the
population adjusted for
sample size.
The standard deviation of a Sampling Distribution is called the Standard Error.
For any distribution the larger the sample size the
smaller the numerical standard error.
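"Adjusted for sample size" is usually written as the population standard deviation divided by the square root of n. A short sketch of that formula, using the SAT standard deviation of 209 from the normal curve figure and the three sample sizes shown in the slides (the function name is illustrative):

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: population sd divided by sqrt(sample size)."""
    return sd / sqrt(n)

sd = 209  # SAT standard deviation quoted in the normal curve figure
for n in (30, 100, 1000):
    print(f"n={n:4d}  standard error = {standard_error(sd, n):.1f}")
```

Each increase in n shrinks the standard error, which is why the sampling distribution gets narrower.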
The standard error of the
sampling distribution is a
function of the sample size.
(Figures: sampling distributions for n=30 and n=100 on the test scale, each with 1 standard error marked)
Standard Error
(Figure: sampling distribution (n = 30) with 1 and 2 standard errors marked from the sampling distribution mean; areas 50.0%, 34.13%, and 13.59%)
The relationship of standard error to a
sampling distribution is the same as
standard deviation to a normal distribution.
Remember this is not a distribution of scores on the test.
It is a distribution of the means of
randomly selected groups of 30.
What do we have so far?
• Normal distributions are useful because of
the fixed relationship of the sd to the area under the curve.
• When a sampling distribution of the mean is built from a
population, it is approximately normally distributed.
• Standard error is the population standard deviation
adjusted for group size.
• The relation of the standard error to a sampling
distribution is the same as a standard deviation to a
population distribution.
Measuring Group Means Against the
Sampling Distribution
(Figure: sampling distribution (n = 30) with a group mean 1 standard error above the sampling distribution mean. Area below it: 50.0% + 34.13%. Probability of a higher mean: 15.87% (.16))
At any given Standard Error it is possible to compute the likelihood
that a higher or lower group mean could have occurred by chance
(remember the relationship of z-scores to percentile rank).
Measuring Group Means Against the
Sampling Distribution
(Figure: sampling distribution (n = 30) with a group mean 2 standard errors above the sampling distribution mean. Area below it: 50.0% + 34.13% + 13.59%. Probability of a higher mean: 2.28% (.02))
Remember that each group mean that got into the
sampling distribution represents a group that was
randomly selected.
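The 15.87% and 2.28% figures can be reproduced by treating the sampling distribution as a normal curve whose standard deviation is the standard error. A minimal sketch; the population mean of 100 and standard error of 2.74 are assumed illustration values, and the function name is hypothetical.

```python
from statistics import NormalDist

def prob_higher_mean(group_mean, population_mean, standard_error):
    """Chance that a randomly selected group would have a mean at least this high,
    assuming the sampling distribution is normal."""
    sampling_dist = NormalDist(mu=population_mean, sigma=standard_error)
    return 1 - sampling_dist.cdf(group_mean)

# Assumed values for illustration: population mean 100, standard error 2.74
# (roughly what sd 15 gives for groups of 30).
pop_mean, se = 100, 2.74
for k in (1, 2):
    p = prob_higher_mean(pop_mean + k * se, pop_mean, se)
    print(f"{k} standard error(s) above the mean: p = {p:.4f}")
```

This is the same lookup as z score to percentile rank, only applied to a group mean instead of an individual score.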
Probability
• How unlikely does the occurrence of a
group mean have to be before we would say
that it is so unlikely that it couldn’t have
happened by chance? It must have occurred
for some other reason.
Significance
(Figure: sampling distribution with the probability of a higher mean marked)
If the probability that a given group mean would occur by
chance in a sampling distribution is very small then the
occurrence of that group mean is said to be significant.
To use the z-score analogy: when the percentile ranking of a
group mean is really high (or really low), it is significant. It
is significant because it is unlikely to occur randomly.
Significance
(Figure: sampling distribution with the 5% (.05) and 1% (.01) cutoffs marked)
Most social science research declares that group means
occurring by chance less than 5% of the time are significant.
Most medical research uses 1% or much less.
• With a z-score we used a table to look up percentile rank. Because the
distribution is now a sampling distribution based on a specific
group size, the z-table won’t work.
• A new table has to be generated for each possible group size to test
the percentile rank of the group mean comparison.
• Fortunately, the computer
does this for you.
(Figure: cutoffs for means that occur randomly 5% of the time (.05) and 1% of the time (.01))
The probability (p) that a given score or higher could have
appeared by chance is .01 (1%, or 1 in 100 times),
substantially less than .05 (5%, or 5 in 100 times [1 in 20]).
t-Critical
• The sampling distribution of the mean is a different shape
for every sample size (n).
• The point on the distribution where a given mean becomes
significant is called t-critical.
• In the old days we would compute a t-score (analogous to
a z-score) and then look up the significance point on a
table.
• Now you can read the p value directly to get the same
information.
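For reference, the t-score from the "old days" can be computed by hand. The sketch below shows the one-sample version, which compares one group mean with a known population mean; the scores and the population mean of 100 are made up for illustration. Turning the t-score into a p value still needs a t table or a statistics package, which this sketch does not attempt.

```python
from math import sqrt
from statistics import mean, stdev

def t_score(sample, population_mean):
    """One-sample t statistic: how many estimated standard errors the
    sample mean lies from the population mean."""
    n = len(sample)
    return (mean(sample) - population_mean) / (stdev(sample) / sqrt(n))

# Made-up scores for one group of 10 students; population mean 100 is assumed.
scores = [104, 110, 98, 107, 112, 101, 109, 105, 99, 108]
print("t =", round(t_score(scores, 100), 2),
      "with", len(scores) - 1, "degrees of freedom")
```

The degrees of freedom (n - 1) are what select the right table, which is why the critical value changes with sample size.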
Inquiry Science
• Mary wants to know if teaching science with
inquiry techniques produces better results than more
traditional methods. She teaches a unit in her class
using inquiry science methods and then compares
the end-of-chapter test scores of her students with
those of students in John’s class, who have been taught the
same unit with more traditional methods.
• What is the probability that the group mean
differences could have appeared by chance?
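One direct way to estimate that probability is a permutation test: shuffle the class labels many times and count how often chance alone produces a difference as large as the observed one. This is a swapped-in technique for illustration, not the t-test the slides build toward, and the scores below are invented.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, trials=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the observed
    one (in either direction) appears when group labels are shuffled at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / trials

# Made-up end-of-chapter scores; not data from the slides.
mary_inquiry = [88, 92, 79, 85, 91, 84, 90, 87, 93, 86]
john_traditional = [78, 85, 74, 80, 83, 77, 82, 79, 86, 75]

p = permutation_p_value(mary_inquiry, john_traditional)
print("estimated p value:", p)
```

A small p value here says only that chance is an unlikely explanation; other differences between the two classes (teacher, students, schedule) remain design questions.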
Brown Study
Hypothesis Testing
Analysis based on normal
distributions
• Dependent variable—the measurement.
Outcome data that will be compared.
• Independent variable—how comparison
groups are identified. The intervention
variable.
• Hypothesis: A tentative explanation that accounts for
a set of facts and can be tested by further
investigation.
• Problem statement: The purpose of this study is to
examine the impact of Direct Instruction reading
programs on district test scores.
– Hypothesis: Students who participate in direct instruction
reading programs will have higher scores than those who
do not.
Proof
• Gather information on the dependent variable (test scores) and
compare the scores based on the independent variable (which
scores come from students in which program: DI or not).
• What if most DI students were better but not all?
• What about all the students for whom you do not have
scores?
• Proof means it holds for all cases. You cannot prove your hypothesis.
What happens if this is reversed?
• If your hypothesis says that there will be a difference,
could you test to see if there was not a difference?
• Null hypothesis: stating that no differences would be found
when you really are interested in finding them.
• If the hypothesis says that something is true then the null
hypothesis says that it is not true.
Hypothesis Testing
• Hypothesis: Kids with DI will have higher scores
than those from other programs.
• Null Hypothesis: There will be no difference
between kids with DI and those from other
programs.
• You gather evidence and decide if the evidence says
that direct instruction students had better scores.
• You show that the claim "the hypothesis is not true" is itself not true.
• You reject the null hypothesis.
Using Null Hypotheses
• Use a sample (otherwise we would use
parameters)
• Stop talking about individuals and only
compare groups (mean and standard
deviation)
Hypothesis Testing
(Figure: mean of kids without DI and mean of kids with DI)
In statistical parlance, we want to know whether the two groups come from
the same population (fail to reject the null) or represent samples from
different populations (reject the null).
Hypothesis Testing
(Figure: mean of kids without DI and mean of kids with DI, plotted against the sampling distribution of the mean for the sample size of the kids without DI)
What is the probability that the group mean difference could
have appeared by chance?
• Measure a bunch of people and if there is little difference
among groups (the difference was likely to appear by chance)
you say, I have been unable to find evidence that differences
exist.
• You have been unable to reject the null hypothesis. You have
no evidence that your independent variable had any impact.
• If the differences among groups are large enough you say the
differences are very unlikely to have happened by chance.
• You have been able to reject the null hypothesis. Your
independent variable becomes, for now, the best explanation
for the differences.
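The two outcomes above reduce to comparing a p value with the chosen cutoff. A tiny sketch of that decision rule, with a hypothetical function name and example p values chosen only for illustration:

```python
def decide(p_value, alpha=0.05):
    """Translate a p value into the language used for null hypotheses."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

for p in (0.30, 0.04, 0.004):
    print(f"p = {p}: at .05, {decide(p)}; at .01, {decide(p, alpha=0.01)}")
```

Note that the same p value of .04 leads to different conclusions at the two cutoffs, so the cutoff has to be chosen before looking at the result.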
Why do this?
• Inferential statistics uses samples of populations to
determine if differences in group means could
occur by chance.
• Inferential statistics is not used to prove
hypotheses. It is used to decide whether null
hypotheses can be rejected.
Think about this as steps:
1. You want to know that something is true but you cannot
absolutely know because you can’t test all cases.
2. Instead you focus on the opposite of what you want to know
is true (the null hypothesis).
3. Start gathering evidence.
4. If enough cases show that something is true then saying it was
not true is false.
5. You have rejected the null hypothesis.
6. For the time being, the thing you wanted to show is true (your
hypothesis) is the best explanation.
Ok, one more semantic step and we are done:
• What we are interested in knowing is whether something
made two groups different.
• The evidence that you gather to see if that is true is the means
and standard deviations of the groups.
• The null hypothesis always says there will be no difference
between the groups.
• If there is a very low probability that the difference in the
groups could occur by chance, that is the evidence that the
null hypothesis is wrong.
• Low probabilities (p values) count as evidence for the thing you wanted
to show (your hypothesis): chance alone is an unlikely explanation
for the difference. You have rejected the null hypothesis.
Significance (Alpha Level)
(Figure: cutoffs at 5% (.05) and 1% (.01))
Choosing the point at which we say that something is
significant is subjective. It is up to the researcher. Why let
this subjective-sounding judgment enter the process?
Being Wrong
(Figure: test group mean relative to the 5% (.05) cutoff)
• We say that occurring randomly less than 5% of the time is
really unlikely, so it is not random. But when chance really is the
explanation, that statement will be wrong 5% of the time.
• Type 1 Error: Saying it is not random when it was.
• We say that occurring randomly more than 5% of the time
is too likely, so we say chance is the best explanation. But
sometimes real differences occur even though they look
like chance.
• Type 2 Error: Saying it is random when it was not.
Reducing Being Wrong
• Reduce Type 1 errors by lowering the alpha level or using
more conservative calculations.
• Reduce Type 2 errors by increasing the sample size.
• Reduce all errors by improving the study design (validity).
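Both kinds of error can be watched in a simulation. The sketch below is an illustration under stated assumptions: scores are simulated from a normal population with a known sd of 15, the test is a simple one-tailed z test (not a t-test), and the "real" effect of 5 points is invented.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, n, alpha=0.05, null_mean=100, sd=15,
                   trials=2_000, seed=0):
    """Fraction of simulated groups whose mean is significantly higher than
    null_mean, using a one-tailed z test with a known population sd."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # z score leaving alpha in the upper tail
    se = sd / sqrt(n)
    rejections = 0
    for _ in range(trials):
        group_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if (group_mean - null_mean) / se > cutoff:
            rejections += 1
    return rejections / trials

# No real difference: every rejection is a Type 1 error (rate should be near alpha).
print("Type 1 rate, alpha .05:", rejection_rate(true_mean=100, n=30, alpha=0.05))
print("Type 1 rate, alpha .01:", rejection_rate(true_mean=100, n=30, alpha=0.01))

# A real difference of 5 points: every non-rejection is a Type 2 error.
for n in (10, 30, 100):
    print(f"n={n:3d}  Type 2 rate:", round(1 - rejection_rate(true_mean=105, n=n), 3))
```

Lowering alpha cuts Type 1 errors, and larger samples cut Type 2 errors, matching the first two bullets above.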
Excel Example
On the course resource site download:
Excel Files: Area Test
Compare the pre and post scores using a paired t-test.
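For readers without the Excel file, the arithmetic behind a paired t-test can be sketched as follows. The pre and post scores are made up and are not the contents of the Area Test file; the function name is illustrative, and converting t to a p value still requires a t table or statistics software.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic: mean of the per-student differences divided by
    the standard error of those differences."""
    if len(pre) != len(post):
        raise ValueError("pre and post need one score per student, in the same order")
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Made-up pre and post scores for eight students.
pre = [12, 15, 9, 14, 11, 13, 10, 16]
post = [15, 18, 10, 17, 15, 14, 13, 19]
print("t =", round(paired_t(pre, post), 2),
      "with", len(pre) - 1, "degrees of freedom")
```

The test is "paired" because each student serves as their own comparison, so only the differences enter the calculation.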