Chapter 7
Sampling Distributions
Introduction
• In real life, calculating population parameters
directly is usually prohibitive because
populations are very large.
• Rather than investigating the whole population,
we take a sample, calculate a statistic related to
the parameter of interest, and make an
inference.
• The sampling distribution of the statistic is the
tool that tells us how close the statistic is to the
parameter.
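[Diagram: sampling techniques take a sample from the population;
statistics ($\bar{x}$, $s^2$) computed from the sample are used, through
statistical procedures, to make inferences about the population
parameters ($\mu$, $\sigma^2$).]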
Sampling
Example:
• A pollster is sure that the responses to his
“agree/disagree” question will follow a binomial
distribution, but p, the proportion of those who “agree” in
the population, is unknown.
• An agronomist believes that the yield per acre of a
variety of wheat is approximately normally distributed,
but the mean $\mu$ and the standard deviation $\sigma$ of the
yields are unknown.
• If you want the sample to provide reliable information
about the population, you must select your sample in a
certain way!
Types of Sampling Methods/Techniques
The sampling plan or experimental design determines the amount of
information you can extract, and often allows you to measure the
reliability of your inference.
Sampling
• Probability samples: simple random, stratified, cluster, systematic
• Non-probability samples: judgement, quota, chunk
Simple Random Sampling
Sampling Plan:
1. Simple random sampling is a method of sampling
that allows each possible sample of size n an equal
probability of being selected.
Example
•There are 89 students in a statistics class. The
instructor wants to choose 5 students to form a
project group. How should he proceed?
1. Give each student a number from 01 to
89.
2. Choose 5 pairs of random digits from
the random number table.
3. If 00, a number from 90 to 99, or a number
already chosen comes up, choose another number.
4. The five students with those numbers
form the group.
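The table-based procedure above can also be sketched in code. This is a minimal illustration, assuming Python's standard `random` module stands in for the random number table; the function name `choose_project_group` and the seed are hypothetical choices, not part of the original example.

```python
import random

def choose_project_group(class_size=89, group_size=5, seed=None):
    """Pick a simple random sample of student numbers 1..class_size."""
    rng = random.Random(seed)  # seed is only for repeatability (an assumption)
    # rng.sample draws without replacement, so every possible group of
    # group_size students is equally likely and no number can repeat.
    return sorted(rng.sample(range(1, class_size + 1), group_size))

group = choose_project_group(seed=7)
print(group)
```

Drawing without replacement plays the role of step 3: numbers outside 01 to 89 never appear, and no student is picked twice.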
Other Sampling Techniques
• There are several other sampling plans that
still involve randomization:
2. Stratified random sample: Divide the population
into subpopulations or strata and select a simple
random sample from each stratum.
3. Cluster sample: Divide the population into
subgroups called clusters; select a simple random
sample of clusters and take a census of every
element in the cluster.
4. 1-in-k systematic sample: Randomly select one
of the first k elements in an ordered population,
and then select every k-th element thereafter.
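The three plans above differ only in where the randomization is applied. The sketch below illustrates that difference under assumed inputs: the population of 100 labelled units, the two strata, the ten clusters, and all function names are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Simple random sample of clusters, then a census of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(population, k, rng):
    """1-in-k sample: random start among the first k, then every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)  # fixed seed only so the run is repeatable
population = list(range(1, 101))  # hypothetical units labelled 1..100
strata = {"first 40": population[:40], "last 60": population[40:]}
clusters = {c: population[10 * c:10 * (c + 1)] for c in range(10)}

by_stratum = stratified_sample(strata, 3, rng)
from_clusters = cluster_sample(clusters, 2, rng)
every_tenth = systematic_sample(population, 10, rng)
print(by_stratum, from_clusters, every_tenth, sep="\n")
```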
Examples
• Divide West Malaysia into states and
take a simple random sample within each state.
• Divide West Malaysia into states and take a simple random
sample of 5 states.
• Divide a city into city blocks, choose a simple random
sample of 10 city blocks, and interview all who
live there.
• Choose an entry at random from the phone book, and select
every 50th number thereafter.
Non-Random Sampling Plans
• There are several other sampling plans that do not
involve randomization. They should NOT be
used for statistical inference!
1. Convenience sample: A sample that can be taken easily
without random selection.
   • People walking by on the street
2. Judgment sample: The sampler decides who will and
won’t be included in the sample.
3. Quota sample: The makeup of the sample must reflect
the makeup of the population on some selected
characteristic.
   • Race, ethnic origin, gender, etc.
Types of Samples
• Sampling can occur in two types of practical
situations:
1. Observational studies: The data existed before you
decided to study it. Watch out for:
   • Nonresponse: Are the responses biased because
     only opinionated people responded?
   • Undercoverage: Are certain segments of the
     population systematically excluded?
   • Wording bias: The question may be too
     complicated or poorly worded.
2. Experimentation: The data are generated by
imposing an experimental condition or treatment on
the experimental units.
   • Hypothetical populations can make random
     sampling difficult if not impossible.
   • Samples must sometimes be chosen so that the
     experimenter believes they are representative of
     the whole population.
   • Samples must behave like random samples!
Sampling Distributions
• Numerical descriptive measures calculated from
the sample are called statistics.
• Statistics vary from sample to sample and hence
are random variables.
• The probability distributions for statistics are
called sampling distributions.
• In repeated sampling, they tell us what values of
the statistics can occur and how often each value
occurs.
Sampling Distributions
Definition: The sampling distribution of a statistic is
the probability distribution for the possible values of the
statistic that results when random samples of size n are
repeatedly drawn from the population.
Population: 3, 5, 2, 1
Draw samples of size n = 3 without replacement.

Possible samples    $\bar{x}$
3, 5, 2             10/3 = 3.33
3, 5, 1             9/3 = 3
3, 2, 1             6/3 = 2
5, 2, 1             8/3 = 2.67

Each value of x-bar is equally likely, with probability 1/4.
[Figure: bar chart of p(x-bar) against x-bar, four bars of height 1/4.]
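The enumeration above is small enough to reproduce exactly. A minimal sketch, assuming the standard library's `itertools.combinations` is an acceptable stand-in for listing samples drawn without replacement (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 5, 2, 1]
n = 3

# Every unordered sample of size n drawn without replacement.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Each sample is equally likely, so P(x-bar) = (count of samples) / (total).
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in counts.items()}

for sample, mean in zip(samples, means):
    print(sample, round(float(mean), 2))
print(distribution)
```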
1. Sampling Distribution of the Mean…
• A fair die is thrown infinitely many times, with the random
variable X = number of spots on any throw. The probability distribution
of X is:

x       1     2     3     4     5     6
P(x)    1/6   1/6   1/6   1/6   1/6   1/6

…and the mean and standard deviation are:
$$\mu = \sum x\,p(x) = 1(\tfrac{1}{6}) + 2(\tfrac{1}{6}) + \dots + 6(\tfrac{1}{6}) = 3.5$$
$$\sigma = \sqrt{\sum (x - \mu)^2\,p(x)} = 1.71$$
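As a quick check of these two numbers, the sums can be evaluated exactly. This is a small sketch using exact fractions; it also shows that 1.71 is the square root of the variance 35/12 (about 2.92), i.e. the standard deviation of a single throw.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mu = sum(x * p for x in faces)               # mean: sum of x * p(x)
var = sum((x - mu) ** 2 * p for x in faces)  # variance: sum of (x - mu)^2 * p(x)
sigma = sqrt(var)                            # standard deviation

print(mu, var, round(sigma, 2))
```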
Sampling Distribution of Two Dice
• A sampling distribution is created by looking at
all samples of size n = 2 (i.e. two dice) and their means…
• While there are 36 possible samples of size 2, there are only 11 values for
$\bar{x}$, and some (e.g. $\bar{x}$ = 3.5) occur more frequently than others
(e.g. $\bar{x}$ = 1).
Sampling Distribution of Two Dice…
• The sampling distribution of $\bar{x}$ is shown below:

$\bar{x}$    $P(\bar{x})$
1.0          1/36
1.5          2/36
2.0          3/36
2.5          4/36
3.0          5/36
3.5          6/36
4.0          5/36
4.5          4/36
5.0          3/36
5.5          2/36
6.0          1/36

[Figure: bar chart of $p(\bar{x})$ against $\bar{x}$, rising from 1/36 at 1.0
to a peak of 6/36 at 3.5 and falling back to 1/36 at 6.0.]
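The table can be regenerated by listing all 36 ordered outcomes of two dice. A minimal sketch with illustrative variable names; it also computes the mean and variance of $\bar{x}$ directly from the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in outcomes)

# Sampling distribution of the mean of two dice.
dist = {m: Fraction(c, len(outcomes)) for m, c in sorted(counts.items())}
for m, c in sorted(counts.items()):
    print(float(m), f"{c}/36")

mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())
print(mean_xbar, var_xbar)
```

The variance comes out as half the single-die variance of 35/12, in line with dividing the population variance by n = 2.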
Example
Throw two fair dice. Based on all possible
samples, the mean and standard deviation of $\bar{x}$
can also be calculated as:
Mean: $\mu_{\bar{x}} = \mu = 3.5$
Std Dev: $\sigma_{\bar{x}} = \sigma/\sqrt{2} = 1.71/\sqrt{2} = 1.21$
Sampling Distribution of the Mean

n     $\mu_{\bar{x}}$    $\sigma^2_{\bar{x}}$
5     3.5                .5833 (= $\sigma^2/5$)
10    3.5                .2917 (= $\sigma^2/10$)
25    3.5                .1167 (= $\sigma^2/25$)

Notice that $\sigma^2_{\bar{x}}$ is smaller than $\sigma^2$.
The larger the sample size, the smaller $\sigma^2_{\bar{x}}$.
Therefore, $\bar{x}$ tends to fall closer to $\mu$ as the sample
size increases.
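The shrinking variance can also be seen by simulation. The sketch below is an illustration only: it assumes the population is a single fair die (variance 35/12), and the number of repetitions and the seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

POP_VARIANCE = 35 / 12  # variance of one throw of a fair die

def simulate_sample_means(n, reps=20000, seed=0):
    """Return reps sample means, each from n throws of a fair die."""
    rng = random.Random(seed)
    return [fmean([rng.randint(1, 6) for _ in range(n)]) for _ in range(reps)]

results = {}
for n in (5, 10, 25):
    means = simulate_sample_means(n)
    # (simulated mean, simulated variance, theoretical variance sigma^2 / n)
    results[n] = (fmean(means), pvariance(means), POP_VARIANCE / n)
    print(n, results[n])
```

The simulated variances should land close to the theoretical values, up to simulation noise.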
Central Limit Theorem
Central Limit Theorem: If random samples of n
observations are drawn from a nonnormal population with
finite mean $\mu$ and standard deviation $\sigma$, then, when n is large,
the sampling distribution of the sample mean $\bar{x}$ is
approximately normally distributed, with mean $\mu$ and
standard deviation $\sigma/\sqrt{n}$. The approximation becomes
more accurate as n becomes large.
How Large is Large?
• If the population is normal, then the sampling
distribution of $\bar{x}$ will also be normal, no matter what
the sample size.
• When the population is approximately symmetric, the
distribution becomes approximately normal for
relatively small values of n.
• When the population is skewed, a common rule of thumb is
that the sample size must be at least 30 before the
sampling distribution of $\bar{x}$ becomes approximately normal.
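The effect of sample size on a skewed population can be sketched by simulation. The example below assumes an exponential population with mean 1 (my choice of skewed population, not one from the slides) and uses the sample skewness of the simulated means as a rough indicator of how far from symmetric the sampling distribution still is.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: average cubed standardized deviation."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, reps=10000, seed=0):
    """Means of reps samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (1, 5, 30)}
for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```

The skewness should shrink toward 0 (the value for a normal curve) as n grows, although it does not vanish at n = 30.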
The Sampling Distribution of the
Sample Mean
• A random sample of size n is selected from a
population with mean $\mu$ and standard deviation $\sigma$.
• The sampling distribution of the sample mean $\bar{x}$ will
have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
• If the original population is normal, the sampling
distribution will be normal for any sample size.
• If the original population is nonnormal, the sampling
distribution will be approximately normal when n is large.
The standard deviation of x-bar is sometimes called the
STANDARD ERROR (SE).
Sampling Distribution of the Mean
Central Limit Theorem
Given a population with mean $\mu$ and variance $\sigma^2$, the sampling
distribution of $\bar{x}$ will have:
Mean: $\mu_{\bar{x}} = \mu$
Variance: $\sigma^2_{\bar{x}} = \sigma^2/n$
Standard deviation (standard error of the mean): $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
As n increases, the shape of the distribution approaches the
normal (whatever the shape of the population).
Finding Probabilities for
the Sample Mean
If the sampling distribution of $\bar{x}$ is normal or
approximately normal, standardize or rescale the
interval of interest in terms of
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Find the appropriate area using the Z table.
Example: A random
sample of size n = 16 from
a normal distribution with
$\mu$ = 10 and $\sigma$ = 8.
$$P(\bar{x} > 12) = P\left(z > \frac{12 - 10}{8/\sqrt{16}}\right) = P(z > 1) = 1 - .8413 = .1587$$
Example
A soda filling machine is supposed to fill cans of
soda with 12 fluid ounces. Suppose that the fills are actually
normally distributed with a mean of 12.1 oz and a standard
deviation of 0.2 oz. What is the probability that the average
fill for a 6-pack of soda is less than 12 oz?
$$P(\bar{x} < 12) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{12 - 12.1}{.2/\sqrt{6}}\right) = P(z < -1.22) = .1112$$
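Both worked examples can be checked without a Z table. The sketch below assumes Python's `statistics.NormalDist` as the normal-curve calculator; `sample_mean_dist` is a hypothetical helper name. Because the code does not round z to two decimals, the soda result comes out near 0.110 rather than the table value .1112.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of x-bar: normal, mean mu, sd sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# n = 16 from a normal population with mu = 10, sigma = 8: P(x-bar > 12)
p_upper = 1 - sample_mean_dist(10, 8, 16).cdf(12)

# Six-pack of soda, mu = 12.1 oz, sigma = 0.2 oz: P(x-bar < 12)
p_lower = sample_mean_dist(12.1, 0.2, 6).cdf(12)

print(round(p_upper, 4), round(p_lower, 4))
```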
Exercise
The time that a laptop’s battery pack
can function before recharging is needed
is normally distributed with a mean of 6
hours and a standard deviation of 1.8
hours. A random sample of 25 laptops
with this type of battery pack is selected and
tested. What is the probability that the
mean time until recharging is needed is at
least 7 hours?
The characteristics of the sampling
distribution of a statistic:
• The distribution of values is
obtained by means of repeated
sampling
• The samples are all of size n
• The samples are drawn from the
same population
2. Sampling Distribution of a Proportion…
The proportion in the sample is denoted "p-hat":
$$\hat{p} = \frac{\text{number of "successes"}}{n}$$
The proportion in the population (parameter) is denoted p.
$\hat{p}$ is to p as $\bar{x}$ is to $\mu$.
Types of response variables
Response type:
• Quantitative: sums, averages
• Categorical: counts, proportions
Prior chapters have focused on quantitative response
variables. We now focus on categorical response variables.
The Sampling Distribution of
the Sample Proportion
The Central Limit Theorem can be used to
conclude that the binomial random variable X with
mean np and standard deviation $\sqrt{np(1-p)}$ is
approximately normal when n is large.
The sample proportion, $\hat{p} = x/n$, is simply a rescaling
of the binomial random variable X, dividing it by n.
From the Central Limit Theorem, the sampling
distribution of $\hat{p}$ will also be approximately
normal, with a rescaled mean and standard
deviation.
Approximating the Binomial with the Normal
• Under certain conditions, a binomial random variable
has a distribution that is approximately normal.
• When n is large, and p is not too close to zero or one,
areas under the normal curve with mean np and
standard deviation $\sqrt{np(1-p)}$ can be used to
approximate binomial probabilities.
• Make sure that np and n(1-p) are both greater than 5
to avoid inaccurate approximations!
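The rule of thumb can be illustrated by comparing exact binomial probabilities with their normal approximations. In the sketch below the two (n, p, k) cases are hypothetical, and the 0.5 continuity correction is a standard refinement that I have added; it is not mentioned in the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) random variable."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a 0.5 continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)

def rule_of_thumb_ok(n, p):
    """Both np and n(1 - p) greater than 5."""
    return n * p > 5 and n * (1 - p) > 5

errors = {}
for n, p, k in [(100, 0.4, 45), (20, 0.05, 0)]:  # hypothetical cases
    errors[(n, p)] = abs(binomial_cdf(k, n, p) - normal_approx_cdf(k, n, p))
    print(n, p, rule_of_thumb_ok(n, p), round(errors[(n, p)], 4))
```

The case that fails the rule (np = 1) shows a visibly larger approximation error than the case that passes it.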
The Sampling Distribution of
the Sample Proportion
• A random sample of size n is selected from a binomial
population with parameter p.
• The sampling distribution of the sample proportion,
$\hat{p} = x/n$, will have mean p and standard deviation
$\sqrt{pq/n}$, where q = 1 - p.
• If n is large, and p is not too close to zero or one, the
sampling distribution of $\hat{p}$ will be approximately
normal.
The standard deviation of p-hat is sometimes
called the STANDARD ERROR (SE) of p-hat.
Finding Probabilities for
the Sample Proportion
If the sampling distribution of $\hat{p}$ is normal or
approximately normal (both np > 5 and n(1-p) > 5),
standardize or rescale the interval of interest in terms of
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
Find the appropriate area using the Z table.
Example: A random
sample of size n = 100
from a binomial
population with p = 0.4.
$$P(\hat{p} > .5) = P\left(z > \frac{.5 - .4}{\sqrt{.4(.6)/100}}\right) = P(z > 2.04) = 1 - .9793 = .0207$$
Example
The soda bottler in the previous example claims
that only 5% of the soda cans are underfilled.
A quality control technician randomly samples 200 cans
of soda. What is the probability that more than 10% of
the cans are underfilled?
n = 200
U: underfilled can
p = P(U) = 0.05, q = 0.95
np = 10, nq = 190: OK to use the normal approximation.
$$P(\hat{p} > .10) = P\left(z > \frac{.10 - .05}{\sqrt{.05(.95)/200}}\right) = P(z > 3.24) = 1 - .9994 = .0006$$
This would be very unusual, if indeed p = .05!
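The two sample-proportion examples can be reproduced the same way. A minimal sketch, assuming `statistics.NormalDist` in place of the Z table; `prob_p_hat_exceeds` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def prob_p_hat_exceeds(threshold, p, n):
    """P(p-hat > threshold) under the normal approximation."""
    se = sqrt(p * (1 - p) / n)  # standard error of p-hat
    return 1 - NormalDist(p, se).cdf(threshold)

p_first = prob_p_hat_exceeds(0.5, 0.4, 100)    # n = 100, p = 0.4
p_soda = prob_p_hat_exceeds(0.10, 0.05, 200)   # n = 200, p = 0.05

print(round(p_first, 4), round(p_soda, 4))
```

Small differences from the table-based answers come from rounding z to two decimals.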
Sampling Distribution of the Difference
Between Two Averages
• Theorem: If independent samples of size $n_1$ and $n_2$ are drawn at
random from two populations with means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the
difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally
distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}}$$
is approximately a standard normal variable.
Example
• Starting salaries for MBA grads at two universities are normally
distributed with the following means and standard deviations.
Samples from each school are taken…

                  University 1     University 2
Mean              RM 62,000/yr     RM 60,000/yr
Std. Dev.         RM 14,500/yr     RM 18,300/yr
Sample size, n    50               60

• What is the sampling distribution of $\bar{x}_1 - \bar{x}_2$?
• What is the probability that a sample mean of U1 students will
exceed the sample mean of U2 students?
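One way to check the two questions above numerically is sketched below. It assumes `statistics.NormalDist` as the normal-curve calculator; the variable names are illustrative, and amounts are in RM per year.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50  # University 1
mu2, sigma2, n2 = 60_000, 18_300, 60  # University 2

# Sampling distribution of x-bar1 - x-bar2: normal with this mean and sd.
mean_diff = mu1 - mu2
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# P(x-bar1 - x-bar2 > 0): the U1 sample mean exceeds the U2 sample mean.
p_u1_exceeds_u2 = 1 - NormalDist(mean_diff, sd_diff).cdf(0)

print(mean_diff, round(sd_diff, 1), round(p_u1_exceeds_u2, 4))
```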
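Example
mean:
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 62{,}000 - 60{,}000 = 2000$
and standard deviation:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{14{,}500^2}{50} + \frac{18{,}300^2}{60}} = 3128.3$
$$P(\bar{x}_1 - \bar{x}_2 > 0) = P\left(\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > \frac{0 - 2000}{3128}\right) = P(z > -.64) = .5 + .2389 = .7389$$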
Sampling Distribution of the Difference Between Two
Sample Proportions
• Theorem: If independent samples of size $n_1$ and $n_2$ are drawn at
random from two populations, where the proportions of observations with the
characteristic of interest in the two populations are $p_1$ and $p_2$
respectively, then the sampling distribution of the difference
between sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normally
distributed with mean and variance given by
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \quad\text{and}\quad \sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2},$$
where $q_i = 1 - p_i$. Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$
is approximately a standard normal variable.
• Example: It is known that 16% of the households in Community A
and 11% of the households in Community B have internet access at
home. If 200 households and 225 households are selected at
random from Community A and Community B respectively,
compute the probability that the difference between the
two sample proportions is at least 0.10.
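A possible way to set up this calculation is sketched below. It rests on two assumptions of mine: the difference is read as (Community A minus Community B) being at least 0.10, not an absolute difference, and the standard error is the usual $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$ for independent samples. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_a, n_a = 0.16, 200  # Community A
p_b, n_b = 0.11, 225  # Community B

mean_diff = p_a - p_b
sd_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# P(p-hat_A - p-hat_B >= 0.10) under the normal approximation.
prob_at_least_10 = 1 - NormalDist(mean_diff, sd_diff).cdf(0.10)

print(round(mean_diff, 2), round(sd_diff, 4), round(prob_at_least_10, 4))
```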
Sampling Distribution of $S^2$
• Theorem: If $S^2$ is the variance of a random sample of size n taken
from a normal population having the variance $\sigma^2$, then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with v = n - 1 degrees of freedom.
For v = 7 (the subscript is the area to the right of the value):
$\chi^2_{0.05} = 14.067$
$\chi^2_{0.95} = 2.167$
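The theorem can be illustrated by simulation: draw many samples of size n = 8 (so v = 7) from a normal population, compute $(n-1)S^2/\sigma^2$ for each, and compare the simulated 95th and 5th percentiles with 14.067 and 2.167. The standard normal population, the repetition count, and the seed are arbitrary assumptions made for this sketch.

```python
import random

def chi2_statistic(sample, sigma2):
    """(n - 1) * S^2 / sigma^2 for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance
    return (n - 1) * s2 / sigma2

def simulate(n=8, mu=0.0, sigma=1.0, reps=20000, seed=0):
    """Sorted values of the statistic over reps normal samples of size n."""
    rng = random.Random(seed)
    return sorted(
        chi2_statistic([rng.gauss(mu, sigma) for _ in range(n)], sigma**2)
        for _ in range(reps)
    )

stats = simulate()
upper_5_percent_point = stats[int(0.95 * len(stats))]  # area 0.05 to the right
lower_5_percent_point = stats[int(0.05 * len(stats))]  # area 0.95 to the right
print(round(upper_5_percent_point, 2), round(lower_5_percent_point, 2))
```

The simulated percentiles should fall near the tabulated critical values, within simulation noise.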