Lecture 7
Sampling and Sampling Distributions
• In finance, economics, or any other area of concern, it is usually impossible to access the entire population data, mainly because of money and time restrictions.

Population:
• Population = the complete set of all items about which information is desired.
• N = population size; it can be very large or even infinite.
• Parameter = a specific characteristic of the population, such as the mean, variance, or standard deviation.
• If the data set is the entire population, then the population mean is:
$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$
• The variance of the population is:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

Collect Data: Sampling
• Sample = an observed SUBSET of the POPULATION.
• n = sample size, such that n < N.
• Random Sampling = the procedure to select "n" objects from a population of size "N" with equal chance (probability) of selection for each member of the population.
• Sample Statistic = a specific characteristic of a sample!!
• If the data set is from a sample, then the sample mean and variance are as follows:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sampling Distributions of SAMPLE MEANS
Different samples may result in different sample means.
Example: Let's consider the following population: 1, 2, 3, 4.
N = 4, $\mu = 2.5$ and $\sigma^2 = 1.25$
Let's consider all possible samples of size 2:
4C2 = 4!/[2!(4-2)!] = 6 is the total number of possible samples.
Sample 1: 1, 2 → Mean of sample 1: $\bar{x}_1 = (1 + 2)/2 = 1.5$
Sample 2: 1, 3 → Mean of sample 2: $\bar{x}_2 = (1 + 3)/2 = 2$
Sample 3: 1, 4 → Mean of sample 3: $\bar{x}_3 = (1 + 4)/2 = 2.5$
Sample 4: 2, 3 → Mean of sample 4: $\bar{x}_4 = (2 + 3)/2 = 2.5$
Sample 5: 2, 4 → Mean of sample 5: $\bar{x}_5 = (2 + 4)/2 = 3$
Sample 6: 3, 4 → Mean of sample 6: $\bar{x}_6 = (3 + 4)/2 = 3.5$
See: different samples may result in different sample means!!
Each sample has an equal chance of occurrence, so the selection probability of each sample is 1/6.
Sample    Sample mean    Probability
1, 2      1.5            1/6
1, 3      2              1/6
1, 4      2.5            1/6
2, 3      2.5            1/6
2, 4      3              1/6
3, 4      3.5            1/6
Let's see what the average of the sample means is, i.e. $E(\bar{x})$:
$E(\bar{x}) = \sum_{\bar{x}} \bar{x}\,P(\bar{x}) = 1.5(1/6) + 2(1/6) + \cdots + 3.5(1/6) = 2.5$
We can generalize this result as:
$E(\bar{x}) = \sum_{\bar{x}} \bar{x}\,P(\bar{x}) = \mu$
• What is the variance of the sample means?
$Var(\bar{x}) = E(\bar{x} - \mu)^2 = \sum_{\bar{x}} (\bar{x} - \mu)^2 P(\bar{x}) = (1.5 - 2.5)^2 (1/6) + \cdots + (3.5 - 2.5)^2 (1/6) = 0.42$
• What is the relation between $Var(\bar{x})$ and $\sigma^2$?
• If the population size is small, then:
$Var(\bar{x}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$
Here $(N - n)/(N - 1)$ is the correction factor for a finite population.
• If the population size is large, then:
$Var(\bar{x}) = \frac{\sigma^2}{n}$
• In our example:
$Var(\bar{x}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right) = \frac{1.25}{2}\left(\frac{4 - 2}{4 - 1}\right) = \frac{1.25}{2}\left(\frac{2}{3}\right) = 0.42$
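The example above can be checked by brute force. The following is a minimal sketch, assuming sampling without replacement as in the slides; the variable names are illustrative and not part of the lecture. It enumerates all 6 samples and compares the variance of the sample means with the finite population correction formula, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4]
N = len(population)
n = 2  # sample size

# Population parameters: mu = 5/2 and sigma^2 = 5/4 (i.e. 2.5 and 1.25).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All possible samples of size n drawn without replacement (4C2 = 6).
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]
prob = Fraction(1, len(samples))  # every sample is equally likely

# Mean and variance of the sampling distribution of the sample mean.
expected_mean = sum(m * prob for m in means)
var_mean = sum((m - mu) ** 2 * prob for m in means)

# Variance predicted by the finite population correction formula.
fpc_var = sigma2 / n * Fraction(N - n, N - 1)

print("E(x-bar)   =", float(expected_mean))
print("Var(x-bar) =", round(float(var_mean), 2))
print("FPC value  =", round(float(fpc_var), 2))
```

Both variance figures come out as 5/12, which is the 0.42 reported in the slides after rounding.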
What is the distribution of the Sample Mean?
• In general, when the population size is large, we learned that:
$E(\bar{x}) = \mu_{\bar{x}} = \sum_{\bar{x}} \bar{x}\,P(\bar{x}) = \mu$
$Var(\bar{x}) = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
• Consider our example:
[Figure: bar chart of sample probability (vertical axis, 0 to 0.35) against sample mean (horizontal axis, 1.5 to 3.5)]
• Looking at the figure, we see that the distribution of the sample mean is symmetric around the mean (it looks similar to the Normal Distribution!!!)
A Bunch of Proofs and the Central Limit Theorem:
• Let's consider a population composed of elements $X_1, X_2, \ldots, X_N$ with mean $\mu$ and variance $\sigma^2$.
• 1) When we pick a RANDOM sample of size "n", which is $X_1, X_2, \ldots, X_n$, these X random variables are INDEPENDENT of each other!!
• The sample mean is actually nothing but a linear combination of independent random variables. So:
$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}E(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}[E(X_1) + E(X_2) + \cdots + E(X_n)] = \frac{1}{n}\,n\mu = \mu$
$Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 Var(X_1 + \cdots + X_n) = \left(\frac{1}{n}\right)^2 (n\sigma^2) = \frac{\sigma^2}{n}$
• 2) If a population is composed of elements $X_1, X_2, \ldots, X_N$ with mean $\mu$ and variance $\sigma^2$, and is Normally distributed, then:
$\bar{x} \sim N(\mu, \sigma^2/n)$
• 3) Central Limit Theorem (CLT): generalizes this property.
• IF the SAMPLE SIZE "n" is LARGE (n >= 30), then approximately $\bar{x} \sim N(\mu, \sigma^2/n)$
• See: the real distribution of X does not have to be known, nor does it have to be Normal.
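The CLT statement can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with rate 1 (so mean 1 and variance 1, and clearly not Normal), and the seed, the number of replications, and all variable names are arbitrary choices made here, not part of the lecture.

```python
import random
import statistics
from math import sqrt

random.seed(7)  # fixed seed so that the run is repeatable

n = 30                 # sample size, the usual "large n" threshold
replications = 20000   # number of independent samples to draw
mu, sigma2 = 1.0, 1.0  # mean and variance of the assumed Exp(1) population

# Draw many samples of size n and record each sample mean.
sample_means = []
for _ in range(replications):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Share of sample means within mu +/- 1.96 standard errors.
half_width = 1.96 * sqrt(sigma2 / n)
share_inside = sum(abs(m - mu) <= half_width for m in sample_means) / replications

print("mean of sample means    :", round(mean_of_means, 3))
print("variance of sample means:", round(var_of_means, 4), "vs sigma^2/n =", round(sigma2 / n, 4))
print("share within 1.96 SE    :", round(share_inside, 3))
```

If the Normal approximation is reasonable, the first figure should sit near $\mu$, the second near $\sigma^2/n$, and the third near 0.95, even though the underlying population is strongly skewed.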
How do we benefit from the CLT?
• If n is large, then $\bar{x} \sim N(\mu, \sigma^2/n)$
Example 1: The weights of people traveling by air in some region have a mean of 163 pounds and a standard deviation of 18 pounds.
What is the probability that the average weight of 36 people will be greater than 167 pounds?
Information about the population: $\mu = 163$ pounds, $\sigma = 18$ pounds
n = 36 > 30 → CLT, then $\bar{x} \sim N(\mu_{\bar{x}} = 163,\ \sigma_{\bar{x}}^2 = \sigma^2/n = 18^2/36)$
$P(\bar{x} > 167) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{167 - 163}{\sqrt{18^2/36}}\right) = P(Z > 1.33) = 1 - P(Z < 1.33) = 0.0918$
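The same probability can be computed without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 163.0         # population mean (pounds)
sigma = 18.0       # population standard deviation (pounds)
n = 36             # sample size
threshold = 167.0  # average weight of interest

standard_error = sigma / sqrt(n)       # 18 / 6 = 3
z = (threshold - mu) / standard_error  # 4 / 3
prob = 1.0 - NormalDist().cdf(z)       # upper tail probability

print(f"z = {z:.2f}, P(sample mean > {threshold}) = {prob:.4f}")
```

Because the code keeps z unrounded (4/3 rather than 1.33), its result differs from the table-based 0.0918 in the fourth decimal place.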
ACCEPTANCE INTERVALS
• When we observe a sample mean $\bar{x}$, we know that it comes from a Normal distribution when n is large: $\bar{x} \sim N(\mu, \sigma^2/n)$
• So we can use the "Empirical Rule".
EMPIRICAL RULE: For many LARGE populations the empirical rule provides the following approximations (in our case with mean $\mu_{\bar{x}}$ and standard deviation $\sigma_{\bar{x}}$):
Approximately 68% of the observations are in the interval: $\mu_{\bar{x}} \pm \sigma_{\bar{x}}$
Approximately 95% of the observations are in the interval: $\mu_{\bar{x}} \pm 2\sigma_{\bar{x}}$
****Almost all of the observations are in the interval: $\mu_{\bar{x}} \pm 3\sigma_{\bar{x}}$
If we consider the third rule, it says that $\bar{x}$ will be in the interval $[\mu_{\bar{x}} - 3\sigma_{\bar{x}},\ \mu_{\bar{x}} + 3\sigma_{\bar{x}}]$ with almost 100% probability.
For the Normal Distribution we can find the EXACT boundaries of the confidence intervals!!
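For a Normal distribution the three empirical-rule percentages can be computed exactly rather than approximated. A minimal sketch using the standard library's `statistics.NormalDist`; the dictionary name `coverage` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probability that a Normal variable falls within k standard deviations of its mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```

The results are close to the 68%, 95% and "almost all" figures quoted in the rule.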
Confidence Intervals
• Example: Let's consider that we are informed that health insurance claims have a historical mean of $4000 and a standard deviation of $2000. You take a random sample of 100. What is the 95% confidence interval for the sample mean? Interpret the result.
• μ = $4000, σ = $2000
• Here we will find the 95% confidence interval.
• The 100(1-α)% confidence interval is in general equal to: $\mu_{\bar{x}} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$
• Here $Z_{\alpha/2}$ is the standard normal table value when the upper tail probability is $\alpha/2$. In our case $\alpha = 1 - 0.95 = 0.05$, so $\alpha/2 = 0.025$. Thus $Z_{0.025} = 1.96$.
• $P(-z \le Z \le z) = 0.95$, here z = 1.96 and $Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$, so:
$P\left(-1.96 \le \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \le 1.96\right) = 0.95$
$P(\mu_{\bar{x}} - 1.96\,\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 1.96\,\sigma_{\bar{x}}) = 0.95$
• Thus with 95% probability (confidence) we can say that the sample mean $\bar{x}$ lies between
$\mu_{\bar{x}} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = [4000 - 1.96 \cdot 2000/\sqrt{100},\ 4000 + 1.96 \cdot 2000/\sqrt{100}] = [3608, 4392]$
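The interval for the insurance-claims example can be reproduced as follows. This is a sketch using `statistics.NormalDist.inv_cdf` in place of the printed Z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 4000.0        # historical mean claim (dollars)
sigma = 2000.0     # historical standard deviation (dollars)
n = 100            # sample size
confidence = 0.95

alpha = 1.0 - confidence
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point, about 1.96
standard_error = sigma / sqrt(n)             # 2000 / 10 = 200

lower = mu - z * standard_error
upper = mu + z * standard_error
print(f"z = {z:.2f}, interval = [{lower:.0f}, {upper:.0f}]")
```

Note that the standard deviation entering the interval is the standard error $\sigma/\sqrt{n} = 200$, not the population value of 2000.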
Sampling Distributions of Sample Variance
• The variance of the population is:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
• The sample variance is:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• If "n" is a small proportion of "N", i.e. (n/N) is small, i.e. N is large, then:
$E(s^2) = \sigma^2$
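The property $E(s^2) = \sigma^2$ can be checked on the small population 1, 2, 3, 4 used earlier. The sketch below makes one explicit assumption that is not in the slides: samples are drawn WITH replacement, so that the draws are independent and mimic sampling a small fraction of a very large population. The helper `sample_variance` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
N = len(population)
n = 2  # sample size

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # 5/4, i.e. 1.25

def sample_variance(sample):
    """Sample variance with the (n - 1) divisor, computed exactly."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1)

# All 4^2 = 16 equally likely ordered samples drawn with replacement.
samples = list(product(population, repeat=n))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print("E(s^2)  =", float(expected_s2))
print("sigma^2 =", float(sigma2))
```

Averaged over all 16 samples, $s^2$ equals $\sigma^2$ exactly in this setup.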
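CONFIDENCE INTERVALS
• The 100(1-α)% confidence interval for the sample mean:
$\bar{x}: \mu_{\bar{x}} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$
$P(\mu_{\bar{x}} - Z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{X} \le \mu_{\bar{x}} + Z_{\alpha/2}\,\sigma_{\bar{x}}) = 0.95$ (that is, $1 - \alpha$ with $\alpha = 0.05$)
• It means $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\right)$, so:
$P(\mu - Z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{X} \le \mu + Z_{\alpha/2}\,\sigma_{\bar{x}}) = 0.95$
NOTE: we consider that the "n" observations are taken from a NORMALLY distributed POPULATION.
• The 100(1-α)% confidence interval for the population mean:
$\mu: \bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$
$P(\bar{x} - Z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\sigma_{\bar{x}}) = 0.95$
Here $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$; if we know $\sigma$, then:
$P\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 0.95$
• If we know the population standard deviation $\sigma$, then plug it into the CONFIDENCE INTERVAL!!!! And USE the standard NORMAL table to find $Z_{\alpha/2}$.
• If we do NOT know $\sigma$, then we can use the SAMPLE VARIANCE, $s^2$, as an estimator of the population variance. It means:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• As we know, $s^2$ is a consistent estimator, i.e. $s^2 \to \sigma^2$ as n grows.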
• If we do NOT know $\sigma$ and use the SAMPLE VARIANCE, $s^2$, as an estimator, then we do NOT use the standard Normal distribution but the "Student's t" distribution with (n-1) degrees of freedom to find $t_{\alpha/2}$:
$P\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 0.95$
• NOTE: we consider that the "n" observations are taken from a NORMALLY distributed POPULATION. We cannot use the N(0,1) table since the population variance is NOT known.
Some Properties of Student's t distribution
• It is symmetric around its mean of 0.
• It approaches the standard Normal distribution as n increases (as a rule of thumb, the two are close for n > 30).
Examples
• Example 8.3 from textbook: (if we know the population variance)
Suppose that shopping times for customers at a local grocery store are normally distributed. A random sample of 16 shoppers in the local grocery store had a mean of 25 minutes. Assume $\sigma = 6$ minutes. Find the standard error of the sample mean, the margin of error, and the width of a 95% confidence interval for the population mean.
• Standard Error = Standard Deviation of the sample mean
• Standard Error of the sample mean = $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{16}} = 1.5$
• Margin of Error = $Z_{\alpha/2}\,\sigma_{\bar{x}} = Z_{0.05/2}(1.5) = Z_{0.025}(1.5) = 1.96(1.5) = 2.94$
• Width of the 95% confidence interval = 2 * Margin of Error = 2*(2.94) = 5.88
• The 95% confidence interval is:
$\mu: \bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = 25 \pm 2.94$
$P(25 - 2.94 \le \mu \le 25 + 2.94) = P(22.06 \le \mu \le 27.94) = 0.95$
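The arithmetic of Example 8.3 can be written out as a short script. This is a sketch only; the critical value is rounded to two decimals on purpose so that it matches the 1.96 read from the table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 25.0  # minutes
sigma = 6.0         # known population standard deviation (minutes)
n = 16
confidence = 0.95

alpha = 1.0 - confidence
# Round to two decimals to mimic a printed Z table (1.96).
z = round(NormalDist().inv_cdf(1.0 - alpha / 2.0), 2)

standard_error = sigma / sqrt(n)  # 6 / 4 = 1.5
margin = z * standard_error       # 1.96 * 1.5
width = 2.0 * margin
lower, upper = sample_mean - margin, sample_mean + margin

print(f"standard error  = {standard_error:.2f}")
print(f"margin of error = {margin:.2f}")
print(f"width           = {width:.2f}")
print(f"95% interval    = [{lower:.2f}, {upper:.2f}]")
```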
• Example 8.5 from textbook: (if we do NOT know the population variance)
Gasoline prices rose drastically during the early years of this century. Suppose that a recent study was conducted using truck drivers with equivalent years of experience to test run 24 trucks of a particular model over the same highway. Estimate the population mean fuel consumption for this truck model with 90% confidence if the fuel consumption, in miles per gallon, for these 24 trucks was:
15.5, 21, 18.5, 19.3, 19.7, …., 21.8
What do we know about the population here? Nothing; we do not know the population variance.
Since n = 24, we will use the sample variance to estimate the population variance.
Note: we should assume that the population is Normal.
How can we test this assumption?
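$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{15.5 + \cdots + 21.8}{24} = 18.68$
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(15.5 - 18.68)^2 + \cdots + (21.8 - 18.68)^2}{24 - 1} = 2.873$
$s = 1.695$
$P\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 0.90$
$\mu: \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 18.68 \pm t_{0.05,\,23}\frac{1.695}{\sqrt{24}} = 18.68 \pm (1.714)(0.346)$
$\mu: 18.68 \pm 0.5930 = [18.09, 19.27]$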
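The final step of Example 8.5 can be reproduced from the summary statistics alone, since the full list of 24 observations is elided in the slides. In this sketch the critical value 1.714 is simply the $t_{0.05,\,23}$ figure quoted in the example; the Python standard library has no Student's t quantile function, so it is entered by hand. The variable names are illustrative.

```python
from math import sqrt

n = 24
sample_mean = 18.68  # miles per gallon, from the worked example
sample_sd = 1.695    # square root of the sample variance 2.873
t_crit = 1.714       # t_{0.05, 23}, taken from the worked example (table value)

standard_error = sample_sd / sqrt(n)
margin = t_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"standard error  = {standard_error:.3f}")
print(f"margin of error = {margin:.3f}")
print(f"90% interval    = [{lower:.2f}, {upper:.2f}]")
```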