Probability and Statistics
LECTURE 6
SAMPLING DISTRIBUTIONS
Outline
• The importance of sampling distribution
• Repeated sampling
• Sampling distribution of sample mean
• Using simulation to understand sampling distribution
• Central limit theorem
Adapted from http://www.prenhall.com/mcclave
[Diagram: Statistical Methods]

Importance of sampling distribution
• The basis of statistical inference
• Basis for understanding hypothesis testing, estimation, etc.
Repeated sampling
Example
• We wish to estimate the population mean
• Select a random sample
• Find the sample mean (e.g. $\bar{x} = 20$) and use it as an estimate
• If other people select different samples and find markedly different sample means, would we trust our estimate?

Repeated sampling
• The same problem, but: if everyone else selects different samples, their results are close to our result
• The sampling distribution of the sample mean gives an idea of how sample means vary between samples
• Sample mean: just one particular sample statistic
Example of sampling distribution
• Given a population of salaries of 5 employees: 2, 5, 7, 8, 10 (in hundred dollars/month)
• Imagine the population mean is unknown; we wish to estimate the population mean salary

Example of sampling distribution
• We select a random sample of 3 salaries
• Denote the mean of the random sample: $\bar{X}$
• Before sample selection, does $\bar{X}$ represent a fixed value or a random variable?
Example of sampling distribution
If $\bar{X}$ represents a variable that can change in value:
• How many possible values can it take?
• What is the probability of each value?
• What if we use a sample size of n = 4?
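A minimal sketch of the enumeration these questions point to, assuming samples are drawn without replacement so that each of the C(5, 3) = 10 subsets of the salary population is equally likely (sampling with replacement would give a different distribution). It lists every possible value of $\bar{X}$ with its exact probability for n = 3 and n = 4. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Salaries of the 5 employees, in hundred dollars/month
POPULATION = [2, 5, 7, 8, 10]


def sampling_distribution_of_mean(population, n):
    """Exact distribution of the sample mean over all size-n samples.

    Assumes sampling without replacement, every subset equally likely.
    Returns {mean: probability} with exact Fractions, sorted by mean.
    """
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}


if __name__ == "__main__":
    for n in (3, 4):
        dist = sampling_distribution_of_mean(POPULATION, n)
        print(f"n = {n}: {len(dist)} possible values of the sample mean")
        for mean, prob in dist.items():
            print(f"  {float(mean):6.3f}  with probability {prob}")
```

With n = 3 the sample mean takes 8 distinct values (17/3 and 20/3 each occur in 2 of the 10 samples); with n = 4 it takes 5 values, each with probability 1/5. In both cases the probability-weighted average of the possible means is 6.4, the population mean.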
Sampling distribution of sample mean
• Probability distribution of all of the possible values of the sample mean for a given size sample selected from a population
• What if we change the sample size?
Questions
• Is there a sampling distribution of the median?
• Is there a sampling distribution of the variance?

Example of sampling distribution of variance
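Both questions can be explored with the same enumeration used for the sample mean, since any statistic computed from a random sample has its own sampling distribution. A sketch, again assuming sampling without replacement from the salary population with n = 3; `sample_variance` uses divisor n - 1 (an assumption, as the slides do not say which divisor), and both helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

# Same salary population (hundred dollars/month)
POPULATION = [2, 5, 7, 8, 10]


def sample_variance(sample):
    """Sample variance with divisor n - 1, computed exactly with Fractions."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def sampling_distribution(population, n, statistic):
    """Exact distribution of `statistic` over all size-n samples.

    Assumes sampling without replacement, every subset equally likely.
    """
    samples = list(combinations(population, n))
    counts = Counter(statistic(s) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}


if __name__ == "__main__":
    print("median:", sampling_distribution(POPULATION, 3, median))
    print("variance:", sampling_distribution(POPULATION, 3, sample_variance))
```

For n = 3 the sample median can only be 5, 7 or 8, with probabilities 3/10, 4/10 and 3/10.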
In general
Sampling distribution is a probability distribution of all of the possible values of a sample statistic for a given size sample selected from a population.

Activity: exploring sampling distributions via simulation
• Use the applet on the webpage: http://www.rossmanchance.com/applets/OneSample.html
• 1st population: math scores of 15892 high school students
• Let's observe:
  - Histogram of population
  - Mean of population
  - SD of population
Activity
• Now we will develop the sampling distribution of sample means (for example, by selecting 10000 samples or more) for n = 2, 10, 30, and 100.
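A minimal sketch of the same experiment without the applet. The applet's math-score data are not reproduced here, so the code builds a hypothetical population of 15892 values drawn from a normal distribution with mean 500 and SD 100 (both numbers are made up). For each n it draws 10000 samples without replacement and reports the mean and SD of the sample means next to the population mean and σ/√n. The helper name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

# Hypothetical stand-in for the applet's population of 15892 math scores:
# values drawn from a normal distribution (mean 500 and SD 100 are made up).
rng = random.Random(2024)
POPULATION = [rng.gauss(500, 100) for _ in range(15892)]


def simulate_sample_means(population, n, reps, rng):
    """Draw `reps` random samples of size n (without replacement); return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


if __name__ == "__main__":
    mu = statistics.fmean(POPULATION)
    sigma = statistics.pstdev(POPULATION)
    print(f"population: mean = {mu:.2f}, SD = {sigma:.2f}")
    for n in (2, 10, 30, 100):
        means = simulate_sample_means(POPULATION, n, 10_000, rng)
        print(
            f"n = {n:3d}: mean of sample means = {statistics.fmean(means):.2f}, "
            f"SD of sample means = {statistics.stdev(means):.2f}, "
            f"sigma/sqrt(n) = {sigma / n ** 0.5:.2f}"
        )
```

As n grows, the mean of the sample means should stay close to the population mean while their SD shrinks roughly like σ/√n; a histogram of `means` for each n shows the shape the applet displays.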
Observations
Let's write down our observations:
• Many sampling distributions (one for each n)
• Shape of the sampling distribution
• Mean of the sampling distribution (and compare it with the mean of the population)
• SD of the sampling distribution (and compare it with the SD of the population)
• The difference between the sampling distribution and the population
Activity
• Now let's choose a different population (a non-normal population) provided by the website:
  - Repeat what we have done
  - Write down our observations
  - When does the sampling distribution become approximately normal?
• Clearly distinguish between the population and the sampling distribution.

Activity
• Homework: experiment with other populations on the website to deepen your understanding of sampling distributions.
• Question: Is there a sampling distribution of another statistic?
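The non-normal version of the experiment can be sketched the same way. Here the population is a hypothetical right-skewed one (20000 draws from an exponential distribution with mean 10), not one of the website's populations, and the shape of each sampling distribution is summarized by its skewness (0 for a symmetric distribution). For independent draws from an exponential population, the skewness of the sample mean falls roughly like 2/√n, so the printed values should move toward 0 as n increases. The helper names are hypothetical.

```python
import random
import statistics

# Hypothetical right-skewed population (exponential, mean 10); not the website's data
rng = random.Random(7)
POPULATION = [rng.expovariate(1 / 10) for _ in range(20_000)]


def skewness(values):
    """Moment coefficient of skewness: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_sample_means(population, n, reps, rng):
    """Draw `reps` random samples of size n (without replacement); return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


if __name__ == "__main__":
    print(f"population skewness = {skewness(POPULATION):.2f}")
    for n in (2, 10, 30, 100):
        means = simulate_sample_means(POPULATION, n, 10_000, rng)
        print(f"n = {n:3d}: skewness of sample means = {skewness(means):.2f}")
```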
Theorem I
• If a random sample is selected from a normal population, the sampling distribution of the sample mean is normal.
• Demonstrated by the applet with the population of math scores.

Theorem II: Central Limit Theorem
• If a random sample is selected from a non-normal population, the sampling distribution of the sample mean is approximately normal for large sample sizes.
• Demonstrated by the applet with a skewed population.
Theorem II: Central Limit Theorem
Practical guideline:
• If the population is nearly normal, a sample of size n = 5 will probably be large enough to ensure that $\bar{X}$ is approximately normal.
• If the population is symmetric, a sample of size n = 20 to 25 is enough for the Central Limit Theorem (CLT) to hold.
• For most moderately skewed distributions, a sample size of around 30 is traditionally considered sufficiently large for the CLT to hold. This is a rule of thumb, not a definitive number.
• For very skewed distributions or distributions with outliers, the sample size required for the CLT to hold may be much larger than 30.

Properties of sampling distribution of mean
• The relationship between the mean of the population and the mean of all sample means: $\mu_{\bar{X}} = \mu$
• The relationship between the SD of the population and the SD of all sample means: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Sampling error
• Difference between a sample statistic and the corresponding population parameter
• Important when making inferences about the population

Standard error of mean
• SD of sample means, $\sigma_{\bar{X}}$
• Represents (approx.) the average deviation of sample means from the center, where the center = population mean
• Represents (approx.) the average error when using the sample mean to estimate the population mean
• So-called standard error of mean: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ (if $n/N \le 0.05$, where $N$ is the population size)

Finite population correction factor
• In cases where $n/N > 0.05$, the standard error of mean is:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$
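A sketch that checks the standard-error formulas against an exact enumeration, under the same without-replacement assumption as the salary example. The standard error is σ/√n, multiplied by the finite population correction √((N - n)/(N - 1)) when n/N > 0.05, with σ the population SD (divisor N). For the salary population (N = 5, n = 3, so n/N = 0.6) the exact SD of all sample means matches the corrected formula, not σ/√n alone. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations
from statistics import pvariance

POPULATION = [2, 5, 7, 8, 10]  # salaries, hundred dollars/month


def standard_error(sigma, n, N=None):
    """Standard error of the mean, sigma / sqrt(n), with the finite
    population correction sqrt((N - n) / (N - 1)) applied when n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


def exact_sd_of_sample_means(population, n):
    """SD of the sample mean over all size-n subsets (without replacement)."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    return math.sqrt(pvariance(means))


if __name__ == "__main__":
    sigma = math.sqrt(pvariance(POPULATION))  # population SD, divisor N
    N, n = len(POPULATION), 3
    print("formula with FPC:   ", standard_error(sigma, n, N))
    print("formula without FPC:", sigma / math.sqrt(n))
    print("exact enumeration:  ", exact_sd_of_sample_means(POPULATION, n))
```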
Finding probability of sample mean
• First, check that the sampling distribution of the sample mean is normal or nearly so.
• If so, convert to Z to find the probability:
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
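The two steps above can be sketched as a small helper, assuming the normality check has already been done. The function name `prob_sample_mean_between` is hypothetical; it uses Python's `statistics.NormalDist` for the standard normal CDF and applies the finite population correction only when a population size N is given and n/N > 0.05.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(a, b, mu, sigma, n, N=None):
    """P(a <= X-bar <= b) via the Z conversion.

    Assumes the sampling distribution of X-bar is normal or nearly so
    (check this first). If the population size N is given and n/N > 0.05,
    the finite population correction is applied to the standard error.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))
    z_a = (a - mu) / se
    z_b = (b - mu) / se
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(z_b) - z.cdf(z_a)
```

For example, `prob_sample_mean_between(-1, 1, mu=0, sigma=1, n=1)` is the familiar P(-1 ≤ Z ≤ 1) ≈ 0.683.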
Exercise 1
You're an operations analyst for AT&T. Long-distance telephone calls are normally distributed with $\mu$ = 8 min. and $\sigma$ = 2 min. If you select a random sample of 25 calls, what is the probability that the sample mean would be between 7.8 and 8.2 minutes?

Solution
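A quick numeric check of Exercise 1, assuming the population of calls is large enough that no finite population correction is needed (n/N ≤ 0.05). Because call lengths are normal, Theorem I gives a normal sampling distribution for any n. The standard error is 2/√25 = 0.4 min, so the bounds become Z = ±0.5 and the probability is about 0.383.

```python
from math import sqrt
from statistics import NormalDist

# Exercise 1: call lengths are normally distributed (given)
mu, sigma, n = 8, 2, 25           # minutes, minutes, number of calls
se = sigma / sqrt(n)              # standard error of the mean: 2 / 5 = 0.4
z_low = (7.8 - mu) / se           # about -0.5
z_high = (8.2 - mu) / se          # about +0.5
std_normal = NormalDist()         # mean 0, SD 1
p = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(f"SE = {se:.3f}, Z from {z_low:.2f} to {z_high:.2f}, P = {p:.4f}")
```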
Exercise 2
You're an operations analyst for company A. The distribution of long-distance telephone calls is symmetric but non-normal with $\mu$ = 8 min. and $\sigma$ = 2 min. If you select a random sample of 30 calls, what is the probability that the sample mean lies between 7.8 and 8.2 minutes?

Solution
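Exercise 2 differs only in the population shape and the sample size. The population is symmetric and n = 30 exceeds the practical guideline of n = 20 to 25 for symmetric populations, so $\bar{X}$ is treated as approximately normal (an approximation, unlike Exercise 1). Again assuming no finite population correction, the standard error is 2/√30 ≈ 0.365 min, Z ≈ ±0.55, and the probability is about 0.416.

```python
from math import sqrt
from statistics import NormalDist

# Exercise 2: symmetric but non-normal population, n = 30, so the CLT
# guideline (n = 20 to 25 for symmetric populations) is taken to apply
mu, sigma, n = 8, 2, 30
se = sigma / sqrt(n)              # about 0.365
z_low = (7.8 - mu) / se           # about -0.55
z_high = (8.2 - mu) / se          # about +0.55
std_normal = NormalDist()         # mean 0, SD 1
p = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(f"SE = {se:.3f}, Z from {z_low:.2f} to {z_high:.2f}, P = {p:.4f}")
```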
Conclusion
• The importance of sampling distribution
• Repeated sampling
• Sampling distribution of sample mean
• Using simulation to understand sampling distribution
• Central limit theorem