Chapter 18 – Like a
Ton of Bricks
…Hope You’re Insured.
Sampling Distributions Again
In this world, many distributions are not
normally distributed.
 Some distributions might be, but we
cannot verify it.
 Usually, in fact, we can only verify this with
a census.
 A census is dumb.

Sampling Distributions Again
These distributions are referred to as
parent distributions.
 This is because they have babies.
 Self-respecting statisticians call the babies child
distributions.
 I prefer to think of them as underloved
babies, just like those poor, sweet robot
babies.

Sampling Distributions Again
Each sample size creates a separate child
distribution.
 In other words, there is one distribution
for samples of size 2 and another for
samples of size 3.
 And another for samples of size 4.
 Also, samples of size 5 get a separate
distribution.

Sampling Distributions Again
Also 6.
 Also 7.
 Also 12.
 Also 18.
 Also 107.
 And many, many more.
 Like…one for each whole number, I
suppose.

Sampling Distributions Again
Each one of these child distributions is
just a little closer to normal than the
original parent distribution.
 The exception is a parent distribution
which starts off normally distributed
already.
 The child distributions would still,
however, also be normal.
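
A minimal simulation sketch of the "closer to normal" claim, assuming a strongly right-skewed exponential parent (my choice, not one from the slides). The helper names `skewness` and `simulate_sample_means` are hypothetical. Skewness is 0 for a symmetric curve, so it should shrink toward 0 as the sample size grows.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)


def simulate_sample_means(sample_size, n_samples, rng):
    """Draw n_samples samples from an exponential parent; return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(18)  # fixed seed so the run is repeatable
skew_by_size = {}
for n in (1, 2, 5, 30):  # n = 1 is just the parent distribution itself
    means = simulate_sample_means(n, 20_000, rng)
    skew_by_size[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_size[n]:.2f}")
```

This only approximates each child distribution with 20,000 random samples; the real thing would need every possible sample.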

Sampling Distributions Again
The sampling distribution lets us calculate things
based on a normal curve for distributions that
might not even be normal.
 It is technically the collection of every possible
mean from every single potential sample of that
particular size.
 You would never want to create a sampling
distribution ever.
 An actual sampling distribution is even dumber
than a census.
 The concept of a sampling distribution, however,
is handy.

Sampling Distributions Again
These distributions focus on the average
of the sample, and as such outliers tend
to have their effect diluted.
 This can be by counterbalancing outliers.
 This can also be the fact that most of the
elements of the sample will be more
typical values.
 This leads to a smaller standard deviation
than the original distribution had.

Sampling Distributions Again
Much in the way that other types of
children will often resemble their parents,
sampling distributions have some things in
common with their parent distributions.
 Sampling distributions have the same
mean as the original population.
 The idea is that the mean of all the
sample means is the same as the original
mean was in the first place.

Waw!?!
Each individual number in the sampling
distribution is the mean of a sample.
 So the sampling distribution is actually an
enormous gathering of means.
 A sample mean convention, as it were.
 And if we took every single possible mean
and averaged them, they would average to
the original mean of the parent
distribution.
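
For a parent small enough to list by hand, the "sample mean convention" can actually be built. The sketch below uses a made-up five-value population and a hypothetical helper `all_sample_means`, enumerates every possible sample of a given size, and checks that the means average back to the parent mean. Fractions keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 8, 12, 20]  # made-up parent values, for illustration only
pop_mean = Fraction(sum(population), len(population))


def all_sample_means(values, sample_size):
    """Mean of every possible sample of the given size (no repeats)."""
    return [
        Fraction(sum(sample), sample_size)
        for sample in combinations(values, sample_size)
    ]


for n in (2, 3, 4):
    means = all_sample_means(population, n)
    mean_of_means = sum(means) / len(means)
    print(f"n = {n}: {len(means)} samples, mean of means = {mean_of_means}, "
          f"matches parent mean: {mean_of_means == pop_mean}")
```

Even with five values there are already 10 samples of size 2, which hints at why nobody builds a real sampling distribution for a real population.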

We Care….Why?
These sampling distributions look a lot
like Normie.
 In the absence of paternity tests, we
cannot prove anything, however.
 The resemblance to Normie makes the
math easier.
 Well, once we review how to do z-score
type stuff.

Mean and Standard Deviation
We established that the sample means in a
sampling distribution tend to centralize
around the true mean of the parent
distribution.
 This centralizing effect causes the standard
deviation of the sampling distribution to be
smaller.
 For a quantitative variable, in fact, we can
take the original standard deviation and
divide by the square root of n, which is the
sample size.
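
The divide-by-root-n rule can be checked exactly on a tiny made-up population. This sketch assumes the draws are independent, so it samples with replacement and lists every ordered sample. The helper `sd_of_sample_means` is a hypothetical name.

```python
import math
import statistics
from itertools import product

population = [2, 4, 4, 6, 9]  # made-up parent values, for illustration only
sigma = statistics.pstdev(population)  # standard deviation of the parent


def sd_of_sample_means(values, n):
    """SD of the means of every ordered sample of size n, drawn with replacement."""
    means = [sum(sample) / n for sample in product(values, repeat=n)]
    return statistics.pstdev(means)


for n in (1, 2, 3, 4):
    observed = sd_of_sample_means(population, n)
    predicted = sigma / math.sqrt(n)
    print(f"n = {n}: observed SD = {observed:.4f}, sigma / sqrt(n) = {predicted:.4f}")
```

Sampling without replacement from a population this small would give a slightly smaller spread, which is the kind of problem the 10% Condition is meant to rule out.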

Mean and Standard Deviation
For proportional data (which we will
spend the next 4 chapters discussing) our
standard deviation will be the square root
of the quotient of the product of the
proportion and its complement divided by
the sample size.
 Waw?!?
 It is actually a simple formula:
$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
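
In code, the word salad above boils down to one line: the square root of p times (1 - p), divided by n. A minimal sketch, with `proportion_sd` as a hypothetical helper name and p = 0.5 as a made-up proportion:

```python
import math


def proportion_sd(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    q = 1 - p  # the complement of the proportion
    return math.sqrt(p * q / n)


for n in (25, 100, 400):
    print(f"p = 0.5, n = {n:>3}: SD = {proportion_sd(0.5, n):.4f}")
```

Each time the sample size quadruples, the standard deviation is cut in half, because n sits under a square root.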

The Central Limit Theorem
The Central Limit Theorem is a huge deal
in Statistics.
 If you were to go into a statistician’s bar
and talk crap about this theorem, there
would be a brawl.
 It basically states what we were just
talking about, but I will recap it.

The Central Limit Theorem
Sampling distributions are more normal than
their parent distributions, unless the parent
was normally distributed.
◦ In which case the sampling distribution stays
normal.
 Sampling distributions have the same mean
as the parent distribution and have a
standard deviation which is the standard
deviation of the parent distribution divided
by the square root of the sample size.

The Central Limit Theorem
This is important: a distribution that does
not start normal will never have exactly normal
sampling distributions…they will only be
close, at best.
 As a rule of thumb, a sample size of 30 is
usually enough to make the sampling
distribution of nearly any parent
distribution roughly normal.
 There are also conditions to check.

Conditions?
In order to make our assumption of having a
roughly normal sampling distribution something
other than hubris or arrogance, we have to
meet a few requirements.
 If we can do this, we can bypass the usual effect
that assuming has on you and me.
 This is just plain exciting to me, since usually
making assumptions just ends up biting you in
the donkey.

The Conditions
Randomizing Condition: The sample must
be random.
 Independence Condition: Each thing in
the sample must be independent of the others.
 The 10% Condition: The sample must be
less than 10% of the population.
 The Large Enough Sample Condition: The
sample must be large enough.

That Last One Seemed Vague
True.
 It is not really vague, though; it just changes
based on which kind of variable we are
looking at.
 We will clarify it when we get there, but the
basic idea is that for proportional data it is
having at least 10 successes and 10 failures,
and for quantitative data you need to have a
distribution that lacks extreme skew or
outliers.
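
The two conditions that come down to arithmetic can be sketched for proportional data as below. The function name `check_proportion_conditions` and the example numbers are made up for illustration. Randomization and independence depend on how the data were collected, so no formula can check them.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% and Large Enough Sample conditions for a proportion.

    n: sample size, p: proportion of successes, population_size: parent size.
    """
    successes = n * p
    failures = n * (1 - p)
    return {
        # The sample must be less than 10% of the population.
        "ten_percent": n < 0.10 * population_size,
        # At least 10 successes and at least 10 failures.
        "large_enough": successes >= 10 and failures >= 10,
    }


small = check_proportion_conditions(n=50, p=0.1, population_size=10_000)
bigger = check_proportion_conditions(n=200, p=0.1, population_size=10_000)
print("n = 50: ", small)   # only about 5 successes, so too small
print("n = 200:", bigger)  # about 20 successes and 180 failures
```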

Standard Error
P.S. – Instead of calling the standard
deviation of the sampling distribution a
standard deviation, we will instead call it a
standard error.
 This is because it is not really a deviation
unless we calculate every mean; instead it
represents how far the value from one
particular sample tends to land from the
true mean due to sampling error.

Assignments
Chapter 18 – 25 and 27, then 5 and 17.
 Due Tuesday.
 I will be giving more homework due
Thursday so don’t fall behind.
 Midterm project presentations are in just
over two weeks.
 There will be a chapter 18+19 Quiz next
week.
 Read chapter 19 for Monday.

Quiz Bullet Points
Be able to use z-scores to find probabilities for
individuals.
 Be able to use z-scores to find probabilities for
sample averages.
 Be able to use z-scores to find probabilities for
sample proportions.
 Be able to find a confidence interval for the true
proportion based on a sample.
 Be able to find the sample size in order to get a
desired margin of error.
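
The first two bullet points differ only in which standard deviation goes into the z-score. A hedged sketch with made-up numbers (parent mean 100, standard deviation 15, samples of size 36), assuming the parent itself is roughly normal so the individual calculation is fair:

```python
import math
from statistics import NormalDist

parent_mean = 100.0  # made-up values, for illustration only
parent_sd = 15.0
n = 36
cutoff = 105.0

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Individual: the z-score uses the parent standard deviation.
z_individual = (cutoff - parent_mean) / parent_sd
p_individual = 1 - standard_normal.cdf(z_individual)

# Sample average: the z-score uses the standard deviation divided by sqrt(n).
sd_of_means = parent_sd / math.sqrt(n)
z_mean = (cutoff - parent_mean) / sd_of_means
p_mean = 1 - standard_normal.cdf(z_mean)

print(f"P(one individual > {cutoff}) = {p_individual:.4f} (z = {z_individual:.2f})")
print(f"P(sample mean   > {cutoff}) = {p_mean:.4f} (z = {z_mean:.2f})")
```

The same cutoff is far less likely for a sample average than for one individual, because averaging shrinks the spread.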