Week 8, Part I
Using the Standard Normal Curve
The Standard Normal
Distribution


We have learned about the standard normal
distribution: a normal curve with a mean of zero and
a standard deviation of 1.
How can you convert any normal distribution into a
standard normal distribution?
The Standard Normal
Distribution

To convert any normal distribution into a standard
normal distribution, convert the scores to z-scores:



First, subtract the mean from each score.
Then, divide by the standard deviation.
If you perform this procedure on every score in a normal
distribution, the result is a standard normal distribution.
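The two-step conversion can be sketched in a few lines of Python (the raw scores here are hypothetical):

```python
from statistics import mean, stdev

# Hypothetical raw scores
scores = [82, 90, 75, 88, 95, 70]

mu = mean(scores)      # step 1 subtracts this mean...
sd = stdev(scores)     # ...step 2 divides by this standard deviation
z_scores = [(x - mu) / sd for x in scores]

# The resulting z-scores have mean 0 and standard deviation 1
```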
The Standard Normal
Distribution

What is so great about a standard normal
distribution?
The Standard Normal
Distribution

The standard normal distribution can be
easily used to determine the probability
associated with a particular range of values
on a normally distributed variable.
The Standard Normal
Distribution

How does it work?



By “partitioning” the area under the normal curve into sub-areas
corresponding to values within and outside the range of values.
The area within the range of values corresponds to the probability
of having values within that range.
The area outside the range of values corresponds to the probability
of having values outside that range.
The Standard Normal
Distribution

For example, the shaded area below
corresponds to the probability of obtaining a
value that is from 0 to 1 standard deviations
above the mean.
The Standard Normal
Distribution

That probability, it turns out, is .3413.


How do I know?
I look it up in a Normal Distribution Table
(Statistical Table B in the back of the book!)
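The table entry can also be reproduced in code; this sketch uses Python's statistics.NormalDist (available in Python 3.8+) in place of Statistical Table B:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# P(0 < z < 1): area between the mean and one SD above it
p = std_normal.cdf(1.0) - std_normal.cdf(0.0)
print(round(p, 4))  # 0.3413
```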
The Standard Normal
Distribution

What is the probability of having a value from
0 to 1.96 standard deviations above the
mean?
The Standard Normal
Distribution


Answer: .4750
Next question: What is the probability of
having a value that is more than 1.96
standard deviations above the mean?
The Standard Normal
Distribution

There are two ways to get this answer.

First, since we just saw that the probability of a z-score between 0 and +1.96 is .4750
and we know that the overall probability of a z-score that is greater than 0 is .5000,
we can answer the question using subtraction:


.5000 - .4750 = .0250.
Second, we can look up the answer in the Normal Distribution Table provided by the
book. The column “C” entry next to 1.96 is .0250.
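Both routes give the same number. In code (with statistics.NormalDist standing in for the printed table):

```python
from statistics import NormalDist

z = NormalDist()

# Route 1: subtract the 0-to-1.96 area from the upper half (.5000)
route1 = 0.5 - (z.cdf(1.96) - z.cdf(0.0))

# Route 2: the upper-tail area beyond 1.96 directly (the "Column C" value)
route2 = 1.0 - z.cdf(1.96)

print(round(route1, 4), round(route2, 4))  # 0.025 0.025
```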
The Standard Normal
Distribution

One important extension of this “partitioning” procedure is that it
applies equally to areas to the left of the mean (i.e., to ranges of
scores below the mean).


What is the probability of having a value between 0 and 1.96 standard
deviations below the mean (i.e., the probability of having a z-score between
0 and -1.96)?
Answer: the same as the probability of having a z-score between 0 and
+1.96: .475.
The Standard Normal
Distribution

A second important extension of this “partitioning” procedure
is that it applies to ranges that include the mean. Of
particular interest, we can quickly obtain the probability that
an observation falls within a certain distance Z from the
mean. To do this, we simply add: p(from 0 to Z)+p(from –Z
to 0) = 2*p(from 0 to Z).
The Standard Normal
Distribution

What is the probability that an observation falls within 1 standard
deviation of the mean?



First, translate the question: what is the probability of having a z-score from -1.00 to +1.00?
Answer: 2*P(0<z<+1.00) = 2*.3413 = .6826.
This is the familiar result that 68% of the observations in a normally
distributed population fall within one standard deviation of the mean.
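The doubling step can be checked directly; the table-based .6826 differs from the exact value in the last digit only because .3413 is itself rounded:

```python
from statistics import NormalDist

z = NormalDist()
p_within_1sd = z.cdf(1.0) - z.cdf(-1.0)
print(round(p_within_1sd, 4))  # 0.6827 (vs. 2 * .3413 = .6826 from the table)
```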
The Standard Normal
Distribution

Given the answer to the previous question (.6826),
you should be able to say what is the probability that
an observation falls more than one standard
deviation from the mean. Well?
The Standard Normal
Distribution

Since the area under the normal curve equals 1 and
the probability of having a z-score from -1 to
+1=.6826, the probability of having |z|>1.00 is
given by 1 - .6826 = .3174.
The Standard Normal
Distribution

Let’s take another example. What is the probability
that an observation from a normal distribution falls
within 1.96 standard deviations of the mean?
The Standard Normal
Distribution

First, translate the question: What is the probability
of having a z-score between -1.96 and +1.96?

P(-1.96<z<+1.96) = ?
The Standard Normal
Distribution

Then, solve the problem:


P(-1.96<z<+1.96) = 2*P(0<z<+1.96)=2*P(-1.96<z<0)
= 2*.4750 = .9500.
There is a “95% probability” that an observation from a
standard normal distribution falls within 1.96 standard
deviations of the mean.

(That is why 1.96 is such a special number.)
The Standard Normal
Distribution

Based on what we have done so far, what is the
probability that an observation from a normal
distribution falls more than 1.96 standard deviations
from the mean?
The Standard Normal
Distribution

Since P(-1.96<z<+1.96) = .9500 and the total area under
the curve equals 1, the answer is 1 - .9500 = .0500: five
percent of observations fall more than 1.96 standard
deviations from the mean.
The Standard Normal
Distribution

Another extension: even though the table only lists probabilities for
ranges bounded by either 0 or infinity, you can use subtraction to obtain
probabilities associated with ranges not bounded by either of these.


What is the probability that an observation from a normal distribution falls
between 1 and 1.96 standard deviations above the mean?
Translation: p(1.00<z<1.96) = ?
The Standard Normal
Distribution

p(1.00<z<1.96) = p(0<z<1.96) – p(0<z<1.00)
= .4750 - .3413 = .1337.
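The subtraction of sub-areas can be verified the same way:

```python
from statistics import NormalDist

z = NormalDist()

# p(1.00 < z < 1.96) as a difference of two table-style areas
p = (z.cdf(1.96) - z.cdf(0.0)) - (z.cdf(1.00) - z.cdf(0.0))
# equivalently: z.cdf(1.96) - z.cdf(1.00)
print(round(p, 4))  # 0.1337
```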
The Standard Normal
Distribution

This can be seen graphically:
Partitioning in reverse:
finding critical values of z


So far we have seen that you can easily
identify probabilities under the normal
curve that correspond to ranges of z-scores
by looking them up in the
Normal Distribution Table.
We can also perform the same
procedure in reverse.
Partitioning in reverse:
finding critical values of z

Rather than start with a range of z
scores and determine an associated
probability, we can start with a given
probability and determine the z-scores
that define the associated range. We
call these z-scores “critical values” of z,
denoted by Z.
Partitioning in reverse:
finding critical values of z

For example, we might have the following
question: what is the value of z that defines
a distance above the mean in which only
1% of the observations fall?


How many standard deviations above the mean
do only 1% of observations fall?
What is the value, Z, of z for which p(z > Z) = .01?
Partitioning in reverse:
finding critical values of z

To answer this type of question, we also
consult the Normal Distribution Table. But
now we look in Column C rather than
Column A and we find the value for Z
corresponding to the entry nearest .01.

Turns out that the answer is +2.33.
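The reverse lookup is an inverse-CDF computation; NormalDist.inv_cdf (Python 3.8+) plays the role of scanning Column C for the nearest entry:

```python
from statistics import NormalDist

# Z such that P(z > Z) = .01, i.e. P(z < Z) = .99
Z = NormalDist().inv_cdf(0.99)
print(round(Z, 2))  # 2.33
```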
Partitioning in reverse:
finding critical values of z


Note that we have to take the sign of Z into account.
Once again, due to the symmetry of the normal
distribution, the fact that the area under the normal curve
equals 1.00, and the additive property of probabilities, if +2.33
is the value of Z above which 1% of the observations fall,
then the following must also be true:





+2.33 is the value of Z below which 99% of the observations fall.
-2.33 is the value of Z below which 1% of the observations fall.
-2.33 is the value of Z above which 99% of the observations fall.
±2.33 is the value of Z that demarcates a symmetrical range around
the mean outside which 2% of all observations fall.
±2.33 is the value of Z that demarcates a symmetrical range around
the mean within which 98% of all observations fall.
Partitioning in reverse:
finding critical values of z

What is the Z that demarcates a range about the
mean beyond which only 5% of observations fall?


To answer this, first divide the probability of falling
outside the range by 2, since the observations can fall in
either tail of the distribution. .05/2 = .025.
Next, find .025 in Column C of the Normal Distribution
Table. What is the corresponding Z score?
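In code, the two-tailed lookup amounts to: halve the outside probability, then take the inverse CDF of 1 minus that half:

```python
from statistics import NormalDist

alpha = 0.05                            # two-tailed: .025 in each tail
Z = NormalDist().inv_cdf(1 - alpha / 2)  # the table's Column C lookup, reversed
print(round(Z, 2))  # 1.96
```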
Some terminology



We refer to the probability of falling outside a
particular range of z-scores as α (alpha).
We use Zα to denote the critical value of Z for a
particular probability, α.
We refer to the area that is outside the range of z-scores as the “critical region.”
Some terminology

We distinguish between “one-tailed α” and “two-tailed α.”

One-tailed α means the entire critical region is contained in one of the tails (either positive or
negative) of the distribution.
Two-tailed α means the critical region is equally divided between both tails (positive and negative)
of the distribution.
Remember: the Normal Distribution Table presents one-tailed α’s in Column C. To look
up a Zα corresponding to a particular two-tailed α, first divide the two-tailed α by 2.

E.g., to determine Zα for two-tailed α = .01, find the Zα for one-tailed α = .005. Looking up .005 in
Column C, the closest entry corresponds to 2.58 in Column A. The probability that an observation
is at least 2.58 standard deviations above the mean is .005. 2.58 is the critical value of Z defining a
two-tailed critical region of .01, i.e., the value of z that defines a symmetrical range about the
mean beyond which 1 percent of the observations fall.
Remember!


The first step when working with normal
distributions is to convert the original scores (“x-scores”) into z-scores. In order to use the Normal
Distribution Table you must have z-scores.
Once you convert x-scores into z-scores, you can
turn a question about an x-score into a question
about a z-score. Then you can answer the
question by partitioning the area under the
standard normal curve.
Remember!


When doing these types of problems,
always start by drawing a normal curve,
labeling the appropriate Zα values, and
shading the appropriate area. That helps
you intuitively understand what you are
looking for.
Let’s go through an example: chapter 6,
exercise 16.
Week 8, Part II
Sampling Distributions and the
Central Limit Theorem
Sampling distributions



We have seen how the area under the
standard normal curve can be partitioned in
order to determine the probability that an
observation from a normal distribution falls
within or outside particular ranges of values.
How is this useful for making statistical
inferences?
To see how, we must learn about the concept
of a sampling distribution.
Sampling distributions


Recall that we use statistics to make inferences about a
population based on a sample. For example, we might be
interested in estimating a mean in the population (e.g., mean
age in a population of children) based on a sample from that
population.
By convention, we use Greek letters to denote population
characteristics:



μx denotes the population mean on the variable x.
σx denotes the population standard deviation on the variable x.
An estimate of the mean age in the population based on one
sample – a “point estimate” – is only an estimate. We use
Roman letters to denote sample characteristics:

x̄ denotes the sample mean on the variable x.
sx denotes the sample standard deviation on the variable x.
Sampling distributions



We usually have only one sample from which to estimate μx. But
imagine that we could take multiple samples of a given size, N.

Say that x is age and μx in our population of children is 4.5.

It would not be unreasonable, were we to take 5 samples of 6
individual children, to obtain the sample means shown below
(see Ritchey, p. 192).
Note that none of these means (none of these “point estimates”)
give us the true mean, but each of them is pretty close to the true
mean.
In fact, we are more likely to get point estimates that are close to
the mean than point estimates that are far from the mean, but in
some cases we nonetheless will get point estimates far from the
true mean.

From the population for this example, we might get an
estimate of 0.4.
x̄1 = 4.0
x̄2 = 5.5
x̄3 = 4.3
x̄4 = 5.3
x̄5 = 4.7
Sampling distributions

Now imagine that rather than just five samples of 6 individuals from
our population of children, we took a very large number of samples.
For example, say we took 10,000 samples and each time we wrote
down the sample mean.




We would have 10,000 sample means, which would themselves form a
frequency distribution.
Just as we can convert any frequency distribution into a probability
distribution, we can convert this frequency distribution of sample means
into a probability distribution of sample means.
Now say we took every possible sample of 6 individuals from our
population and computed the sample mean for each. These sample
means would also form a distribution.
We call the probability distribution of the point estimates of some
population parameter (such as μx) drawn from all the possible samples
of a given size a sampling distribution.

If our parameter of interest is μx, then we are interested in the sampling
distribution of sample means.
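A sampling distribution can be approximated by brute force. This sketch assumes a hypothetical normally distributed population with mean 4.5 (as in the example above) and repeats the draw-a-sample-of-6 step 10,000 times:

```python
import random
from statistics import mean

random.seed(42)                      # for reproducibility
MU, SIGMA, N = 4.5, 1.0, 6           # hypothetical population parameters

# Draw 10,000 samples of size 6 and keep each sample mean
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(10_000)
]

# The 10,000 means form their own (empirical) distribution,
# centered near the true population mean of 4.5
print(round(mean(sample_means), 2))
```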
The Central Limit Theorem


Because sampling distributions involve a very large number of
samples, they are often hypothetical. But statisticians have studied
the properties of sampling distributions by taking large numbers of
samples of given sizes from populations with known means and other
parameters of interest.
They discovered that sampling distributions have some very important
properties, which are captured by the “Central Limit Theorem”:
1. μx̄ = μx
2. σx̄ = σx / √N
3. The distribution is normal (for N > 120) or approximately
normal (for N < 120).
The Central Limit Theorem
1. μx̄ = μx
2. σx̄ = σx / √N
3. The distribution is normal (for N > 120) or approximately
normal (for N < 120).

In other words:
1. The mean of the sampling distribution of sample means is the population
mean.
2. The standard deviation of the sampling distribution of sample means,
which we call the standard error, is the population standard deviation
divided by the square root of the sample size.
3. The sampling distribution of sample means is normally distributed if N is
greater than 120. If N is less than 120, it is almost normally distributed.
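The first two claims can be checked by simulation (a sketch with a hypothetical population, not a proof):

```python
import random
from statistics import mean, pstdev

random.seed(7)
MU, SIGMA, N = 10.0, 2.0, 25          # hypothetical population and sample size

# 5,000 sample means of samples of size 25
means = [mean(random.gauss(MU, SIGMA) for _ in range(N)) for _ in range(5_000)]

print(round(mean(means), 1))    # close to MU (claim 1)
print(round(pstdev(means), 1))  # close to SIGMA / sqrt(N) = 0.4 (claim 2)
```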
The Central Limit Theorem

The fact that the standard error is the population standard deviation
divided by the square root of the sample size implies that the larger
the sample size, the closer (on average) a point estimate of the
population mean will be to the true population mean.

To see this, look at figure 7-4 on p. 198 of Ritchey.

This confirms the basic intuition that larger samples are better than
smaller samples for estimating population parameters.

Since we don’t usually know the population standard deviation
(otherwise, why would we need to look at a sample?) we approximate
the standard error using the sample standard deviation:
sx̄ = sx / √(n − 1)
The Central Limit Theorem


The most useful aspect of the central limit theorem is the third point:
the sampling distribution of sample means is (at least approximately)
normal.
This property holds regardless of the shape of the population
distribution of x.

To see this – and to get a better intuitive understanding of sampling
distributions – run the demo at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
The Central Limit Theorem

What this means is that you can use the normal distribution to make
inferences about the population parameters of variables even if the
original variables are not normally distributed!

We do this by partitioning the area under the normal curve in a
manner similar to what we’ve already been doing.
The t distribution

In fact, we use a slightly different distribution called the t distribution.

To convert x scores to t scores, we use the same formula as we do to
convert them to z scores:
xx
tx =
sx

The t distribution is very similar to the normal distribution


They are identical for sample sizes of 120 or greater
For smaller sample sizes they are flatter (see Figure 7-7 in Ritchey, p.201)
The t distribution



The t distribution varies by sample size – what we
call “degrees of freedom” (N-1)
We use the t distribution table (Statistical Table C) in a
way similar to one of the ways we use the normal
distribution table:
We identify critical values of t (Tα) corresponding to
particular critical regions (probabilities of falling
outside the range defined by Tα).
The t distribution

To use the t distribution table, we start with a pre-determined α; for
example, α = .05.

We then determine the critical value of t (Tα) associated with the
corresponding degrees of freedom.

Remember: df = n − 1 for the t distribution.
We must also keep in mind whether we are dealing with a one-tailed
or a two-tailed α.
Proportions


When our x variable is a dichotomous variable, we
can think of the mean of x within a population as the
proportion of the population falling in one of the
categories (call it “success”).
(Review) This follows logically if we code the variable
as a dummy variable, assigning a 1 to the category
denoting success and 0 to the category denoting
failure.


For example, say we convert the variable “gender” (1 for
women, 2 for men) into a dummy variable called “woman”
equalling 1 for women, 0 for men.
The mean of this variable in a population of interest will give
us the proportion of women, which is equivalent to the
probability that a randomly drawn individual is a woman.
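The dummy-coding trick in miniature (hypothetical data):

```python
# Hypothetical gender codes: 1 = woman, 2 = man
gender = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]

# Recode as a dummy variable: 1 = woman ("success"), 0 = man ("failure")
woman = [1 if g == 1 else 0 for g in gender]

# The mean of the dummy variable IS the proportion of women
proportion_women = sum(woman) / len(woman)
print(proportion_women)  # 0.5
```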
Proportions

It also follows from the characteristics of
dummy variables that the population
standard deviation is given by:
σP = √(PQ / N) = √(P(1 − P) / N)
where P denotes the proportion of successes
(probability of success) and Q denotes the
proportion of failures (probability of failure).
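The formula in code, with hypothetical values of P and N:

```python
P = 0.5          # hypothetical proportion of successes
Q = 1 - P        # proportion of failures
N = 100          # hypothetical sample size

# Standard deviation of a population proportion: sqrt(PQ / N)
sigma_P = (P * Q / N) ** 0.5
print(sigma_P)   # 0.05
```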