Sampling and Sampling Distributions: Sampling Distribution Basics



Transcript
Chapter 7
Sampling and Sampling Distributions
Sampling Distribution Basics
• Sample statistics (the mean and standard
deviation are examples) vary from sample to
sample.
• Sample statistics are computed from random
variables drawn from a population and, as such, are
random variables themselves.
• A sampling distribution is simply a probability
distribution of a sample statistic.
BIT 5724
Sampling Distributions
• Generally we do not know the mean or variance
of a random variable; and
• Often the purpose of sampling is to estimate
parameters (mean, variance, etc.) of a
population. We use samples because:
– The population is too large for a census;
– It is too expensive to conduct a census; and/or
– The units must be destroyed in order to test the
variable(s) of interest, i.e. destructive testing.
Definitions
• A parameter is a numerical descriptive
measure of a population. It is calculated
from the observations in the population.
• A sample statistic is a numerical
descriptive measure of a sample. It is
calculated from the observations in the
sample.
Sample Statistics
• Sample mean (used to estimate the population
mean - a parameter);
• Sample median;
• Sample variance (used to estimate the
population variance - another parameter);
• Sample standard deviation (derived from the
sample variance and used to estimate the
population standard deviation - another parameter).
Example
• We want to estimate the population mean:
– Two possible sample statistics: the sample mean $\bar{x}$ and the sample median $m$.
– Which one should be used? For example, toss a die
three times and let x be the number of dots showing
on the up face. Suppose we have 2, 2, and 6 come
up:
• Expected value (of the population) is: $\mu = 3.5$
• Mean of x is: $\bar{x} = 10/3 \approx 3.33$
• While median is: $m = 2$
• Which is closer to the true mean (expected value)?
Example, cont.
– What if we had sample measurements of 3, 4,
and 6?
• Expected value (of the population) is still: $\mu = 3.5$
• Mean of x is: $\bar{x} = 13/3 \approx 4.33$, while median is: $m = 4$
• Now which is closer to the true mean (expected
value)?
Sampling Statistics
• Since sampling statistics are random
variables, they must be compared on the
basis of their probability distributions - the
collection of values and associated
probabilities of each statistic that would
be obtained if the sampling experiment
were repeated a very large number of
times.
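For the three-toss die example, the experiment does not have to be repeated many times: all 216 equally likely samples can be listed, which gives the exact sampling distributions of the mean and the median. The sketch below is an illustration only; the helper names (`sample_mean`, `sample_median`, `expectation_and_variance`) are made up for this example and it assumes a fair six-sided die.

```python
from fractions import Fraction
from itertools import product
from statistics import median

faces = range(1, 7)                        # assumption: a fair six-sided die
samples = list(product(faces, repeat=3))   # all 216 equally likely samples of size 3

def sample_mean(s):
    return Fraction(sum(s), len(s))

def sample_median(s):
    return Fraction(median(s))

def expectation_and_variance(statistic):
    """Exact mean and variance of a statistic over every possible sample."""
    values = [statistic(s) for s in samples]
    expected = sum(values) / len(values)
    variance = sum((v - expected) ** 2 for v in values) / len(values)
    return expected, variance

mean_ev, mean_var = expectation_and_variance(sample_mean)
median_ev, median_var = expectation_and_variance(sample_median)

print("sample mean:   E =", mean_ev, " Var =", mean_var)
print("sample median: E =", median_ev, " Var =", median_var)
```

Both statistics are centered on 3.5 here, but the sampling distribution of the mean is less spread out, which is the kind of comparison the slide describes.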
Definitions
• The sampling distribution for a sample
statistic (calculated from a sample of n
measurements) is the probability
distribution for the statistic; or
• The sampling distribution is a function that
gives the probability of every possible
value of a sample statistic for specified
population and sample size.
More Definitions
• A point estimator of a population parameter is a
rule or formula that tells us how to use the
sample data to create a single number that can
be used as an estimate of the population
parameter.
• If a sample statistic has a sampling distribution
with a mean equal to the population parameter
the statistic is intended to estimate, the statistic
is said to be an unbiased estimator of the
parameter.
And More Definitions
• If the mean of the sampling distribution is not
equal to the parameter, the statistic is said to be
a biased estimator of the parameter.
Sampling Distribution of the
Sample Mean
• Often we are interested in making an
inference about the mean of some
population, $\mu$. The sample mean $\bar{x}$ is a
good choice as the estimator for $\mu$.
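The difference between a biased and an unbiased estimator, as defined above, can be checked exactly on a small case. The sketch below is not from the slides: it assumes a fair die as the population, samples of size 2, and compares two versions of the sample variance (divisor n - 1 versus divisor n). The function name `expected_variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)   # assumption: population is a fair six-sided die
n = 2                 # sample size, chosen small so every sample can be listed

pop_mean = Fraction(sum(faces), 6)
pop_var = sum((Fraction(f) - pop_mean) ** 2 for f in faces) / 6

def expected_variance(divisor):
    """Mean of the sampling distribution of sum((x - xbar)^2) / divisor."""
    samples = list(product(faces, repeat=n))
    total = Fraction(0)
    for s in samples:
        xbar = Fraction(sum(s), n)
        sum_sq = sum((x - xbar) ** 2 for x in s)
        total += sum_sq / divisor
    return total / len(samples)

print("population variance:", pop_var)
print("E[variance, divisor n-1]:", expected_variance(n - 1))  # matches the parameter
print("E[variance, divisor n]:  ", expected_variance(n))      # falls short of it
```

With divisor n - 1 the mean of the sampling distribution equals the population variance (unbiased); with divisor n it does not (biased).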

Point Estimates
• $\bar{x}$ estimates $\mu$
• $s$ estimates $\sigma$
Variability among Samples
[Figure: two samples on an axis running from 23 to 29 mpg, with sample means of 23.5 mpg and 27.5 mpg]
Distribution for the Mean
Normal Distribution Revisited
[Figure: useful probabilities for normal distributions: 68%, 95%, 99%]
The Mean and Standard Deviation of the
Sampling Distribution of $\bar{x}$
• Regardless of the shape of the population relative
frequency distribution:
– The mean of the sampling distribution of $\bar{x}$ will equal
$\mu$, the mean of the sampled population.
– The standard deviation of the sampling distribution of
$\bar{x}$ will equal $\sigma$, the standard deviation of the sampled
population, divided by the square root of the sample
size n:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
(often referred to as the standard error of the mean)
• Confidence intervals assume that the sample means
are normally distributed.
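The relation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked with a small simulation. The sketch below is illustrative only: the population (a fair die), the sample size of 25, the number of repetitions, and the seed are all arbitrary assumptions, not values from the slides.

```python
import random
from statistics import pstdev

random.seed(0)                 # fixed seed so the run is repeatable

faces = [1, 2, 3, 4, 5, 6]     # assumption: population is a fair die (not normal)
sigma = pstdev(faces)          # population standard deviation
n = 25                         # sample size (arbitrary)
reps = 20_000                  # number of repeated samples (arbitrary)

# Draw many samples of size n and record each sample mean.
sample_means = [sum(random.choices(faces, k=n)) / n for _ in range(reps)]

observed_se = pstdev(sample_means)     # spread of the simulated sample means
theoretical_se = sigma / n ** 0.5      # sigma / sqrt(n)

print("mean of sample means:", sum(sample_means) / reps)
print("observed standard error:   ", observed_se)
print("theoretical standard error:", theoretical_se)
```

The simulated sample means should center near the population mean of 3.5, and their standard deviation should land close to the theoretical standard error.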
Standard Error of the Mean
• A statistic that measures the variability of your
estimate is the standard error of the mean.
• It differs from the sample standard deviation
because
– the sample standard deviation is a measure
of the variability of data
– the standard error of the mean is a measure
of the variability of sample means.
• Standard error of the mean = $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Example
• Let x be a normally distributed random
variable with a mean of 89 and a standard
deviation of 12:
– What is the probability that the mean of a
sample of size n=19 will be between 85 and
93?
– What is the probability that the mean of a
sample of size n=40 will exceed 91?
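Answer to First Part
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. So, $\sigma_{\bar{x}} = \frac{12}{\sqrt{19}} = 2.753$
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{85 - 89}{2.753} = -1.45$
And, $z = \frac{93 - 89}{2.753} = 1.45$
$P(-1.45 \le z \le 1.45) = 0.4265 + 0.4265 = 0.8530$
With $n = 29$: $P(-1.8 \le z \le 1.8) = 0.9266$
Answer to Second Part
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. So, $\sigma_{\bar{x}} = \frac{12}{\sqrt{40}} = 1.897$
So, $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{91 - 89}{1.897} = 1.05$
$P(z > 1.05) = 0.500 - 0.3531 = 0.1469$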
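The two answers above can be cross-checked without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `sampling_dist` is hypothetical. Because it does not round z to two decimals, its results may differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 89, 12   # population mean and standard deviation from the example

def sampling_dist(n):
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n))

# First part: P(85 <= xbar <= 93) with n = 19
p_between = sampling_dist(19).cdf(93) - sampling_dist(19).cdf(85)

# Second part: P(xbar > 91) with n = 40
p_exceeds = 1 - sampling_dist(40).cdf(91)

print("P(85 <= xbar <= 93), n=19:", round(p_between, 4))
print("P(xbar > 91), n=40:       ", round(p_exceeds, 4))
```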
Example
• The population of orders for printing jobs at a
print shop is approximately normal with a mean
of 200 pages and a standard deviation of 40
pages. The shop is almost out of paper and it
has five orders that must be finished before a
shipment of paper can be expected. If the shop
has 1,200 sheets of paper left, what is the
probability that the five orders will not exhaust
the stock of paper?
• Hint: Find $P(\bar{x} \le 240)$, since 1,200 sheets spread over 5 orders is 240 pages per order.
Answer
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. So, $\sigma_{\bar{x}} = \frac{40}{\sqrt{5}} = 17.889$
$z = \frac{240 - 200}{17.889} = 2.236$
$P(z \le 2.236) = 0.500 + 0.4875 = 0.9875$
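The print shop question can be framed either as a statement about the mean order size or as a statement about the total pages in five orders. The sketch below computes both and should give the same probability; it assumes the five orders are independent draws from the stated normal population, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 200, 40    # pages per order: population mean and standard deviation
n, stock = 5, 1200     # number of orders and sheets of paper on hand

# View 1: the mean of 5 orders must not exceed 1200 / 5 = 240 pages.
p_via_mean = NormalDist(mu, sigma / sqrt(n)).cdf(stock / n)

# View 2: the total of 5 independent orders is normal with
# mean n * mu and standard deviation sigma * sqrt(n).
p_via_total = NormalDist(n * mu, sigma * sqrt(n)).cdf(stock)

print("P(mean <= 240):  ", round(p_via_mean, 4))
print("P(total <= 1200):", round(p_via_total, 4))
```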
Example
• Let x be a random variable with a mean of 1,200
and a standard deviation of 20:
– What is the probability that the mean of a sample of
size 80 will exceed 1,202?
– What is the probability that the mean of a sample of
size 50 will be less than 1,202?
– If the probability that the mean of a sample of size n
will exceed 1,201 is 0.25, what must n equal?

Answers
• Part 1 - 0.1867
• Part 2 - 0.7611
• Part 3 - 180
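The three answers can be reproduced approximately with `statistics.NormalDist`. This is a sketch under one stated assumption: x is not said to be normal here, so the normal model for the sample mean is justified only by the large sample sizes. Part 3 solves $z = (1201 - \mu)/(\sigma/\sqrt{n})$ for n; with the unrounded z the result comes out near 182, while rounding z to 0.67 as a table would gives the slide's 180.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 1200, 20

# Part 1: P(xbar > 1202) for n = 80
p1 = 1 - NormalDist(mu, sigma / sqrt(80)).cdf(1202)

# Part 2: P(xbar < 1202) for n = 50
p2 = NormalDist(mu, sigma / sqrt(50)).cdf(1202)

# Part 3: find n so that P(xbar > 1201) = 0.25.
# The z value with 25% of the area above it is the 75th percentile.
z = NormalDist().inv_cdf(0.75)
n_required = (sigma * z / (1201 - mu)) ** 2

print("Part 1:", round(p1, 4))
print("Part 2:", round(p2, 4))
print("Part 3: n is about", round(n_required, 1))
```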
Central Limit Theorem
• If a random sample of n observations is
selected from a population, when n is
sufficiently large, the sampling distribution
of $\bar{x}$ will be approximately a normal
distribution. Typically, a sample size of $n \ge 30$
is considered large enough. The larger the
sample size n, the better the normal
approximation.
Central Limit Theorem, Illustrated
Normality and the Central Limit Theorem
• To satisfy the assumption of normality, you can
do one of the following:
– verify that the population distribution is
approximately normal
– apply the central limit theorem
• The central limit theorem states that the distribution of
sample means is approximately normal, regardless of
the population distribution’s shape, if the sample size is
large enough.
• “Large enough” is usually approximately 30
observations. It is more if the data are heavily skewed,
and fewer if the data are symmetric.
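A quick simulation can show the central limit theorem at work on a skewed population. The sketch below is an illustration under assumptions not in the slides: the population is exponential with rate 1 (mean 1, standard deviation 1), the sample size is 30, and the seed and repetition count are arbitrary. If the sample means are roughly normal, about 95% of them should fall within 1.96 standard errors of the population mean.

```python
import random
from math import sqrt

random.seed(1)              # fixed seed so the run is repeatable

n, reps = 30, 20_000        # sample size and number of repeated samples
mu = sigma = 1.0            # exponential(rate=1) has mean 1 and sd 1
se = sigma / sqrt(n)        # standard error of the mean

def draw_sample_mean():
    """Mean of one sample of n draws from the skewed population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

means = [draw_sample_mean() for _ in range(reps)]

# Share of sample means within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print("share within 1.96 standard errors:", round(within, 3))
```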
Sampling Distribution of the
Proportion
• We are often interested in making an inference
about the proportion of some population, p.
• Examples:
– Proportion of freshmen who graduate from Virginia
Tech in four years.
– Proportion of defective items in a lot.
– Proportion of a set of loans that will become
nonperforming.
The Sample Proportion and Standard
Deviation of the Number of Successes
• The sample proportion $\hat{p}$ is the value of the
random variable x divided by the sample
size: $\hat{p} = \frac{x}{n}$
• The standard deviation of the sampling
distribution is: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
Normal Approximation to the Sampling
Distribution of the Proportion
• Rules: $np \ge 5$ and $n(1 - p) \ge 5$
• Z-value for sampling distribution for $\hat{p}$: $Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$
Example
• If a sample of size 100 is taken from a
population of size 1000 and the population
contains 300 successes:
– What is the probability that the sample
proportion of successes will be 0.35 or more?
– What is the probability that the sample
proportion of successes will be between 0.25
and 0.45?
Answers
• Part a:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.3(1 - 0.3)}{100}} = 0.0458$
$z = \frac{0.35 - 0.30}{0.0458} = 1.09$
$P(\hat{p} \ge 0.35) = P(z \ge 1.09) = 0.5 - 0.3621 = 0.1379$
• Part b:
$P(0.25 \le \hat{p} \le 0.45) = P(-1.09 \le z \le 3.28) = 0.3621 + 0.5 = 0.8621$
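The proportion example can be checked the same way as the examples for the mean. The sketch below applies the rule-of-thumb check and the normal approximation from the slides using `statistics.NormalDist`; the variable names are illustrative, and, like the worked answer, it ignores the fact that the population is finite.

```python
from math import sqrt
from statistics import NormalDist

p = 300 / 1000   # population proportion of successes
n = 100          # sample size

# Rule of thumb for using the normal approximation.
assert n * p >= 5 and n * (1 - p) >= 5

se = sqrt(p * (1 - p) / n)        # standard deviation of the sample proportion
dist = NormalDist(p, se)          # approximate sampling distribution of p-hat

p_a = 1 - dist.cdf(0.35)                 # Part a: P(p-hat >= 0.35)
p_b = dist.cdf(0.45) - dist.cdf(0.25)    # Part b: P(0.25 <= p-hat <= 0.45)

print("standard error:", round(se, 4))
print("Part a:", round(p_a, 4))
print("Part b:", round(p_b, 4))
```

Small differences from the slide values come from rounding z to two decimals when a table is used.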
Example
• An advertising campaign for a new perfume has
a goal of reaching 50% of the women in the
target group. Suppose a national sample of 300
women from the target group is drawn to see
how the campaign is working. 129 women in
the group (a sample proportion of 129/300 = 0.43)
can recall seeing an ad or commercial
for the new perfume. If the population
proportion was 0.50, what is the probability of
observing a sample proportion of 0.43 or less in
a sample of 300?
Answer
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.5(1 - 0.5)}{300}} = 0.0289$
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.43 - 0.5}{0.0289} = -2.42$
$P(\hat{p} \le 0.43) = P(z \le -2.42) = 0.5 - 0.4922 = 0.0078$
From Here To Inference
• The primary function of getting a sampling
distribution is to produce a statistical inference.
• Probability distributions allow us to make
probability statements about values of a random
variable. Thus, knowledge of the population and
its parameters allows us to use the probability
distribution to make probability statements about
individual members of the population.
From Here To Inference (cont.)
• With sampling distributions, knowledge of the
parameters and some information about the
distribution allow us to make probability statements
about a sample statistic.
• In applying both probability distributions and
sampling distributions, we must know the value of
relevant parameters, a highly unlikely circumstance.
In the real world, parameters are almost always
unknown because they represent descriptive
measurements about extremely large populations.
• Statistical inference addresses this problem: now
we will assume that most population parameters are
unknown.