RMTD 404
Lecture 3
Distributions and Probability
• From this point on, we are going to work extensively with distributions and probabilities of events.
• Just about every situation that we deal with in statistics involves estimating probabilities of events given what we know about a distribution (e.g., the probability of heads when flipping a coin).
• Most of the time those distributions are not known, so we make assumptions about them.
• Based on those assumptions, we estimate the probability or likelihood of observing a specific event or series of events.
• The primary distribution we have been discussing is the normal distribution.
Distributions and Probability (recap)
• A normal distribution has the following characteristics:
– unimodal: peaked in the middle.
– symmetrical: the left and right sides of the distribution are mirror images.
– bell-shaped: probabilities taper in the tails of the distribution.
– unlimited: the tails of the distribution extend to infinity in both directions.
• The normal distribution is useful for several reasons:
– It is a shape that we see frequently in real-world data.
– We know the probability density function for the normal curve.
– Many of the statistics we deal with take a normally distributed shape under repeated sampling.
– Many statistical procedures have been developed that rely on an assumption of normally distributed data.
Distributions and Probability (recap)
• Recall that we can standardize a variable by linearly transforming it to have a mean of 0 and a standard deviation (and variance) of 1.
• When you standardize a normally distributed variable, the resulting distribution (with a mean of 0 and variance of 1) is called the standard normal curve, represented as N(0,1). The standard normal curve is useful because it simplifies the interpretation of probabilities of events; most tables of normal curve probabilities are based on the standard normal curve.
• A “score” in a standard normal distribution is called a z score, and we can compute z scores via the following transformation:

z = (X - μ) / σ

• We interpret a z score as the number of standard deviation units that a particular element lies from the mean of the distribution. This is apparent from the form of the linear transformation that we apply to obtain z scores. Because of what we know about the standard normal curve, we can identify important probabilities associated with particular z scores:
 P(z  0) = .50
 P(0  z  1) = .34
 P(-1  z  1) = .68
 P(z  1) = .5 + .34 = .84
 P(-2  z  2) = .9544
 P(-1.96  z  1.96) = .95
 P(z  -1.96 and z  1.96)
= 1 - P(-1.96  z  1.96)
= .05
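These landmark probabilities are easy to verify with software. A minimal sketch using Python's scipy.stats (assuming SciPy is available):

# Verifying the landmark standard normal probabilities listed above.
from scipy.stats import norm

print(norm.cdf(0))                        # P(z <= 0)              -> 0.5000
print(norm.cdf(1) - norm.cdf(0))          # P(0 <= z <= 1)         -> 0.3413
print(norm.cdf(1) - norm.cdf(-1))         # P(-1 <= z <= 1)        -> 0.6827
print(norm.cdf(1))                        # P(z <= 1)              -> 0.8413
print(norm.cdf(2) - norm.cdf(-2))         # P(-2 <= z <= 2)        -> 0.9545
print(norm.cdf(1.96) - norm.cdf(-1.96))   # P(-1.96 <= z <= 1.96)  -> 0.9500
print(2 * norm.cdf(-1.96))                # P(z <= -1.96 or z >= 1.96) -> 0.0500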
Z Table
Although statistical software
packages routinely compute
the proportion of area under a
normal curve for you, it is
useful to learn how to read
tables of those values.
This table gives three
proportions associated with a
variety of z scores.
Z Table
[Table figure: z scores with three associated proportions in columns 2–4; areas for negative z values follow by symmetry.]
Z Table
As you can see, we can get the
area between any two z scores by
adding and subtracting areas that
cumulatively make up the area
we’re interested in.
For example, what area is
associated with each of the
following statements?
P(1 < z < 2) = ?
P(-1 < z < 2) = ?
P(z < -2 or z > 2.5) = ?
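These areas can be computed as sums and differences of cumulative areas, exactly as the table logic above suggests. A brief sketch with scipy.stats:

# Areas between z scores as sums and differences of cumulative areas.
from scipy.stats import norm

print(norm.cdf(2) - norm.cdf(1))       # P(1 < z < 2)          -> ~0.1359
print(norm.cdf(2) - norm.cdf(-1))      # P(-1 < z < 2)         -> ~0.8186
print(norm.cdf(-2) + norm.sf(2.5))     # P(z < -2 or z > 2.5)  -> ~0.0290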
Other distributions we’ll use
• t Distribution
• F Distribution
• Chi-Square Distribution
Sampling Distributions and Hypothesis Testing
• We are now going to begin discussing how to use what we know about the distributional properties of the statistics and parameters we are interested in.
• We want to determine the likelihood that an observed statistic came from some hypothetical population and make judgments based on this likelihood; this is basic hypothesis testing.
Sampling Distributions and Hypothesis Testing
• Two important terms:
– Sampling distribution: the distribution of the values of a particular statistic over hypothetically infinite repeated samples of equal size taken from the same population.
– Standard error: the standard deviation of the sampling distribution.
• We are often interested in the mean, so with this information we want to know what the sample mean would look like over an infinite number of experiments, as the simulation sketch below illustrates.
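To make these definitions concrete, here is a small simulation sketch (the population values, sample size, and seed are illustrative assumptions, not from the lecture) showing that the standard deviation of many sample means matches the theoretical standard error σ/√n:

# Approximating the sampling distribution of the mean by repeated sampling.
import numpy as np

rng = np.random.default_rng(404)            # arbitrary seed
mu, sigma, n, reps = 500, 100, 25, 100_000  # hypothetical population and design

# Draw many samples of equal size from the same population and record the
# mean of each; the spread of these means estimates the standard error.
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(sample_means.std(ddof=1))   # empirical standard error, ~20
print(sigma / np.sqrt(n))         # theoretical standard error: 100/5 = 20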
Sampling Distributions and Hypothesis Testing
[Figure: a sampling distribution with regions labeled “Very likely” near the center, “Less likely” farther from it, and “Unlikely” in the tails.]
Sampling Distributions and Hypothesis Testing
• Steps for testing a hypothesis:
1. We generate a research hypothesis (in words), a theory-based prediction. When written symbolically, the research hypothesis is called the alternative hypothesis (a.k.a. H1 or Ha).
2. We pretend that the data were chosen from a population with known characteristics. That is, we create a null hypothesis (H0), one that, based on our theory, we believe to be incorrect.
3. We gather data (e.g., randomly sample people, randomly assign them to treatments, expose them to the treatments, and measure their responses to the treatments).
4. We compute the characteristics of the sampling distribution of the statistic assuming that the null hypothesis is true (e.g., μ = 0, σ = 1).
Sampling Distributions and Hypothesis Testing
5. We calculate the probability of obtaining a statistic as extreme as or more extreme than the one observed, based on the sampling distribution.
6. We decide whether the observed probability of that value (or a more extreme one) is too remote to support our theory.
– If the probability of obtaining the observed statistic is very small, then we reject our null hypothesis and retain our alternative hypothesis. That is, we retain our theory.
– If the probability of obtaining the observed statistic is not small, then we retain our null hypothesis and fail to support our alternative hypothesis. That is, we fail to support our theory (this does not mean our theory is false!).
7. We make a substantive (word- and theory-driven) interpretation of the statistical test.
*Knowing the shape of a sampling distribution allows us to determine the probability of observing a particular test or sample statistic under the assumption that the null hypothesis is true.
Example
• GRE quantitative scores are believed to be normally distributed with a mean of 500 and a standard deviation of 100.
• Suppose you have a student who participated in a new GRE preparation course. Advocates of the course claim that its success will demonstrate that quantitative GRE scores can be altered by targeted study.
• The developers of the GRE claim that the course will not work because the quantitative GRE test measures skills that must be developed over a long period of study.
• As you can see, there is a controversy: one position suggests that the student who has experienced the preparation course will perform better than average, and the other suggests that the student's performance will be “typical.”
Example
• To go about determining whether this student is better than “typical,” we state our research hypothesis (in this example, from the perspective of the proponents of the preparation course): this student has a higher than average quantitative test score.
• Symbolically, we write the research hypothesis as the alternative hypothesis:
– Ha: μX > μ0, i.e., μtest prep > μtypical, i.e., μtest prep > 500
– (The population from which this observation came has a mean greater than the typical mean of 500.)
• Then we state the converse of the alternative hypothesis as our null hypothesis:
– H0: μX ≤ μ0, i.e., μtest prep ≤ μtypical, i.e., μtest prep ≤ 500
– (i.e., this student has “typical” quantitative skills.)
• Note that the alternative and null hypotheses refer to parameters rather than statistics.
Example
Here’s a picture of our decision-making framework. Note that we need to identify only one point on the GRE scale where we believe that the possibility is too remote to be reasonable, a value that is too high to be believable. What value would you choose?
[Figure: the null distribution with its upper tail labeled “Too remote to be plausible.”]
• Instead, maybe this score is part of another population.
Example
• Suppose that we record the student's GRE score, and it equals 740. This observation seems more consistent with the claims of the course advocates than with the claims of the test developers.
• The question now becomes: under the H0 assumption that this student is typical, how unusual is a score of 740 (or greater) on the quantitative section of the GRE?
• That is, how unlikely is a score as or more extreme than 740? As specified in our decision-making framework, we want to talk about the magnitude of the score, relative to the population mean, by considering only the upper tail of the null distribution.
Example
• We can use z scores to compute this probability:
P(x ≥ 740) = P(z ≥ (740 - 500)/100) = P(z ≥ 2.4) ≈ .01
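A quick check of this p-value with scipy.stats (the unrounded value is about .0082, which the slide rounds to .01):

# p-value for a score of 740 or higher under H0: scores ~ N(500, 100).
from scipy.stats import norm

z = (740 - 500) / 100                     # z = 2.4
print(norm.sf(z))                         # P(z >= 2.4) -> ~0.0082
print(norm.sf(740, loc=500, scale=100))   # same p-value without standardizing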
Sampling Distributions and Hypothesis Testing – Rejecting the Null Hypothesis
• Hence, we would observe a score as or more remote than 740 less than 1% of the time in the population of “typical” GRE test takers.
• We refer to this probability as a p-value: the probability of obtaining a score as extreme as or more extreme than the one observed, under the assumption that the null hypothesis is true.
• If we believe that this score is too improbable to have occurred by chance, then we would reject our null hypothesis and retain our alternative and research hypotheses, concluding that this student is not typical.
• If we do not believe that this is too improbable to have occurred by chance, then we would retain our null hypothesis.
• *Typically, we don't conclude that our null hypothesis is true; we simply conclude that we don't have sufficient evidence to support our research hypothesis.
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values
• Two points are important:
– First, our decision-making criterion is somewhat arbitrary; different people might use different criteria to define “improbable.”
– Second, because we are making retain/reject decisions based on probabilities, we might be making a mistake; we could reject the null hypothesis when it is in fact true.
• Researchers have adopted the convention that observations that would occur less than 5% of the time under the null hypothesis are improbable enough to reject the null hypothesis.
• Other, less common levels are 1% (a stricter rule, because it requires a more unusual result to “reject”) and 10% (a more lenient rule, because it requires a less unusual result to “reject”).
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values
• This rejection level (a.k.a. significance level) indicates how unlikely an event must be before we reject the null hypothesis.
• So, by the conventional standard, the probability (p-value) must be .05 or less.
• Two related terms:
– Rejection region: the area(s) under the sampling distribution where events are unlikely enough to warrant rejecting the null hypothesis.
– Critical value: the raw score associated with the boundary of the rejection region.
• In the GRE score example, the critical value equals 664 (verified in the sketch below):
– CV = x such that P(z > zx) < .05
– CV = x where z = 1.64
– CV = 100(1.64) + 500 = 664
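The same computation in scipy.stats; note that the exact z cutting off the upper 5% is 1.645, so the unrounded critical value is about 664.5 (the slide rounds z to 1.64):

# Critical value for a one-tailed test at alpha = .05 under N(500, 100).
from scipy.stats import norm

z_crit = norm.ppf(0.95)                     # -> ~1.645
print(100 * z_crit + 500)                   # back to the GRE scale -> ~664.5
print(norm.ppf(0.95, loc=500, scale=100))   # the same CV in one step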
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values
• In our case, H0 is μtest prep = 500. The critical value for the extreme area under the curve is 664.
• Because our observed GRE score of 740 falls in the rejection region (i.e., beyond the critical value), we reject the null hypothesis and conclude that the alternative hypothesis is true.
[Figure: the null distribution with CV = 664 separating “Retain null” from “Reject null”; the observed x = 740 lies in the rejection region.]
Recap
• So far, we've introduced concepts that allow us to test the null hypothesis in two ways.
1. Compute critical values (by converting the relevant z score or scores associated with α to the raw score scale) and compare the observed statistic to the critical value(s).
– If the observed statistic is more extreme than the critical value(s), then reject the null hypothesis.
2. Compute the p-value of the observed raw score (by converting the observed raw score to a z score and finding the probability of that z score in the normal curve table) and compare the p-value to the chosen α.
– If the p-value is smaller than the chosen α, then reject the null hypothesis.
Errors
• Since α equals the probability of incorrectly rejecting the null hypothesis, (1 - α) equals the probability of correctly retaining the null hypothesis.
• In our example, we would correctly retain the null hypothesis 1 - .05 = .95, or 95%, of the time.
• *The level α corresponds to the critical value (here 664) and represents the probability of rejecting a true null hypothesis.
[Figure: the null distribution split at CV = 664 into an area of 1 - α (retain) and α (reject), with the observed x = 740 and its p-value in the upper tail.]
Errors
• Now consider this figure, which contains an arbitrarily chosen alternative distribution (shaded). This is one of many possible distributions that could have generated the observed score.
• When we retain the null hypothesis, we can make another type of error if this alternative is true.
• In this example, we may incorrectly reject the alternative distribution and retain the null distribution with a very high probability.
[Figure: overlapping null and alternative distributions, with the area labeled β under the alternative distribution on the “retain” side of the critical value.]
• This type of error is called a Type II error and is represented as the beta level (β) of the hypothesis test.
• β represents the probability of incorrectly rejecting a true alternative distribution, i.e., of incorrectly retaining the null.
Errors
• Recall that (1 - α) represents the probability of correctly retaining the null hypothesis. On the other hand, (1 - β) represents the probability of correctly rejecting the null hypothesis.
• This probability is given a special name: statistical power, or simply power.
[Figure: the area labeled 1 - β under the alternative distribution beyond the critical value.]
• Power only applies when H0 is false. That is, when H0 is true, we cannot correctly reject it!
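Because power depends on which alternative distribution is true, any calculation has to assume a specific alternative. A sketch assuming an arbitrarily chosen alternative mean of 700 (hypothetical, with the same SD of 100) and the critical value of 664 from the example:

# Type II error (beta) and power (1 - beta) for a hypothetical alternative.
from scipy.stats import norm

cv = 664                  # critical value from the GRE example
mu_alt, sigma = 700, 100  # assumed alternative distribution (hypothetical)

beta = norm.cdf(cv, loc=mu_alt, scale=sigma)  # P(retain H0 | alternative true)
print(beta, 1 - beta)                         # beta ~ .36, power ~ .64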
Summary of Errors
• The table below summarizes the nature of statistical errors and the corresponding symbols.

                                Truth
  Decision     H0 True                    H0 False
  Reject H0    Type I error (α)           Power (1 - β)
  Retain H0    Correct decision (1 - α)   Type II error (β)

• However, also realize that we will typically NOT know what the “Truth” is; if we did, we would not need to use statistics in our decision-making.
• Hence, estimating statistical power requires us to make a lot of assumptions.
One and two-tailed tests
• Our GRE example considered only one tail of the null distribution as fair game for rejecting the null hypothesis. That is, the observed score could only have been greater than the population mean.
• A one-tailed (a.k.a. directional) hypothesis test allows you to focus all of your attention on differences in one tail of the null distribution.
• Your null hypothesis would state that the parameter you are interested in is equal to some value or more extreme in the direction opposite your prediction (e.g., H0: μX ≤ 0 or H0: μX ≥ 0, depending on the expected direction), and your alternative hypothesis would state that the parameter is greater than or less than that value (e.g., H1: μX > 0 or H1: μX < 0, respectively).
[Figure: a one-tailed test in which one tail of the null distribution is labeled “irrelevant” and the rejection region in the other tail is labeled “improbable.”]
One and two-tailed tests
 If you cannot confidently predict the direction of the expected
difference, you should focus your attention on both tails of
the null distribution. In this case, you would perform a twotailed (a.k.a non-directional) test.
 Your null hypothesis would state that the parameter you are
interested in equals some value (e.g., H0: μX = 0), and your
alternative hypothesis would state that the parameter is
simply not equal to that value (e.g., H1: μX  0).
 A two-tailed test would be appropriate either when
(1) no theory exists for making a prediction about the direction
of
observed differences, or
(2) two competing theories predict the opposite outcomes.
 *Many researchers use two-tailed tests even though they are
seldom warranted.
One and two-tailed tests
• When you choose a two-tailed test, you choose to divide your Type I error rate (α) between both tails of the null distribution. As a result, you choose critical values for rejecting the null hypothesis that define the most extreme α/2 proportion in each tail.
[Figure: a two-tailed test with rejection regions of α/2 in each tail.]
One and two-tailed tests
• By using a one-tailed hypothesis test, you require a less extreme critical value; all of α lies in a single tail of the distribution.
• Hence, when α = .05 in a one-tailed (directional) test, the 5% of the null distribution that constitutes the rejection region lies in the single tail that is relevant to the hypothesis test.
• On the other hand, in a two-tailed (non-directional) test, the 5% of the null distribution that constitutes the rejection region is divided between the two tails (2.5% each), as the sketch below shows.
[Figure: a one-tailed rejection region of α versus two-tailed rejection regions of α/2 in each tail.]
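A final sketch comparing the critical z values implied by the two approaches at α = .05:

# One-tailed vs. two-tailed critical z values at alpha = .05.
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha))       # one-tailed:  z = ~1.645
print(norm.ppf(1 - alpha / 2))   # two-tailed:  z = ~±1.960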