Chapter 1 (part 3)
BASIC STATISTICS
SAMPLING DISTRIBUTIONS
- Introduction
- Point Estimation
- Sampling Distribution of $\bar{x}$
- Sampling Distribution for the Difference between Two Means
- Sampling Distribution of $\hat{p}$
- Sampling Distribution for the Difference between Two Proportions
Introduction

A sampling distribution is the distribution of all possible values of a sample statistic for samples of a given size selected from a population.

A sample is a portion of the population.

For example, suppose you sample 50 students from your college regarding their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.
The reason we select a sample is to collect data to
answer a research question about a population.
The sample results provide only estimates of the
values of the population characteristics.
The reason is simply that the sample contains only a
portion of the population.
With proper sampling methods, the sample results
can provide “good” estimates of the population
characteristics.
Point Estimation
Point estimation is a form of statistical inference.
In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
$s$ is the point estimator of the population standard deviation $\sigma$.
$\hat{p}$ is the point estimator of the population proportion $p$.
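To make the three point estimators concrete, here is a minimal Python sketch using the standard-library `statistics` module. The GPA values and the characteristic "GPA of at least 3.0" are hypothetical, made up only to show the calculation.

```python
import statistics

# Hypothetical sample of GPAs (illustrative data, not from the text)
gpas = [3.2, 2.8, 3.6, 3.0, 2.5, 3.9, 3.1, 2.7]

x_bar = statistics.mean(gpas)   # point estimate of the population mean mu
s = statistics.stdev(gpas)      # sample standard deviation (n - 1 divisor), estimates sigma

# Hypothetical characteristic: GPA of at least 3.0
x = sum(1 for g in gpas if g >= 3.0)
p_hat = x / len(gpas)           # point estimate of the population proportion p

print(f"x_bar = {x_bar:.4f}, s = {s:.4f}, p_hat = {p_hat:.4f}")
```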
Relationships between the population distribution and the sampling distribution of the sample mean (figure: population distribution vs. sampling distribution of the sample mean):
- The mean of the sample means equals the population mean.
- The dispersion of the sampling distribution is narrower than that of the population distribution.
- The sampling distribution tends to become bell-shaped and to approximate the normal distribution.
Sampling Distribution of the Sample Mean, $\bar{X}$

The probability distribution of the sample mean $\bar{X}$ is called its sampling distribution.

It lists the various values that $\bar{X}$ can assume and the probability of each value of $\bar{X}$.
Mean and Standard Deviation of the Sample Mean, $\bar{X}$
Mean of the sample mean: $\mu_{\bar{x}} = \mu$, where $\mu$ = mean of the population
Standard deviation (standard error) of the sample mean: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ = standard deviation of the population and $n$ = sample size
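A quick simulation can make $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ concrete. The sketch below assumes a normal population with μ = 8 and σ = 3 (borrowed from Example 1), draws many samples of size 36 with Python's `random` module, and compares the spread of the sample means with σ/√n. The helper name `standard_error` is hypothetical.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Assumed population for the check: normal with mu = 8 and sigma = 3; samples of size n = 36
mu, sigma, n = 8.0, 3.0, 36
rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Draw 10,000 samples of size n and record each sample mean
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(10_000)]

print("mean of the sample means:", round(statistics.fmean(means), 3))
print("sd of the sample means:  ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(n):         ", standard_error(sigma, n))
```

The simulated values should land close to 8 and 0.5; the exact digits depend on the seed.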
The Central Limit Theorem

If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. The corresponding standard normal variable is
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
This approximation improves with larger samples.
Sample Mean Sampling Distribution: If the Population is not Normal

We can apply the Central Limit Theorem:
◦ Even if the population is not normal, sample means from the population will be approximately normal as long as the sample size is large enough.
(figure: as the sample size n gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population)
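The following sketch illustrates the Central Limit Theorem for a non-normal population. It assumes an exponential population with mean 2, which is strongly right-skewed, and measures how skewed the sample means are for n = 2, 5 and 30. For this particular population the theoretical skewness of the sample mean is 2/√n, so the values should shrink toward 0 (the value for a symmetric, normal-looking distribution) as n grows.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


rng = random.Random(1)  # fixed seed for reproducibility
results = {}
for n in (2, 5, 30):
    # Assumed population: exponential with rate 0.5 (mean 2), strongly right-skewed
    means = [statistics.fmean(rng.expovariate(0.5) for _ in range(n)) for _ in range(10_000)]
    results[n] = skewness(means)
    print(f"n = {n:2d}: skewness of the sample means = {results[n]:.2f}")
```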
Properties and Shape of the Sampling Distribution of the Sample Mean, $\bar{X}$
1. $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
   - Sample size n ≥ 30 (approximately normal by the Central Limit Theorem)
   - If σ² is unknown, it is estimated by s²
2. $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
   - Sample size n < 30 (sample is from a normal population)
   - σ² is known / σ is known
3. $T = \dfrac{\bar{X} - \mu}{\sqrt{s^2/n}} \sim t_{n-1}$
   - Sample size n < 30 (sample is from a normal population)
   - σ² is unknown / σ is unknown
   - t-distribution with n − 1 degrees of freedom
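The three cases above amount to a small decision rule. The sketch below encodes it; the function name `sampling_model_for_mean` is hypothetical, and it assumes (as the t result requires) that a small sample with unknown σ comes from a normal population.

```python
def sampling_model_for_mean(n: int, sigma_known: bool, population_normal: bool) -> str:
    """Return the model for the sample mean given by the three cases above (a sketch)."""
    if n >= 30:
        # Case 1: large sample, Central Limit Theorem; s^2 stands in for sigma^2 if sigma is unknown
        return "N(mu, sigma^2/n)" if sigma_known else "N(mu, s^2/n)"
    if not population_normal:
        # Small sample from a non-normal population: none of the three cases applies
        raise ValueError("n < 30 requires a normal population")
    if sigma_known:
        # Case 2: small sample from a normal population, sigma known
        return "N(mu, sigma^2/n)"
    # Case 3: small sample from a normal population, sigma unknown
    return f"t with {n - 1} degrees of freedom"


print(sampling_model_for_mean(36, sigma_known=True, population_normal=False))  # N(mu, sigma^2/n)
print(sampling_model_for_mean(16, sigma_known=False, population_normal=True))  # t with 15 degrees of freedom
```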
Example 1
Suppose a population has mean μ = 8 and standard deviation σ = 3.
Suppose a random sample of size n = 36 is selected. What is the
probability that the sample mean is between 7.8 and 8.2?
Solution:
Even if the population is not normally distributed, the central limit theorem can be used (n > 30), so the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = 8$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$.
$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$
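Example 1 can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+), which evaluates the normal CDF directly instead of using a Z table. This is a minimal sketch that relies on the same CLT approximation as the solution.

```python
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / n ** 0.5                  # standard error = 3 / 6 = 0.5
sampling_dist = NormalDist(mu, se)     # approximate distribution of the sample mean (CLT)

prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)
print(round(prob, 4))                  # 0.3108
```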
Example 2
The amount of time required to change the oil and filter of any vehicle is normally distributed with a mean of 45 minutes and a standard deviation of 10 minutes. A random sample of 16 cars is selected.
1. What is the standard error of the sample mean?
2. What is the probability that the sample mean is between 45 and 52 minutes?
3. What is the probability that the sample mean is between 39 and 48 minutes?
4. Find the two values between which the middle 95% of all sample means lie.
Solution:
• X: the amount of time required to change the oil and filter of any vehicle, $X \sim N(45, 10^2)$, $n = 16$
• $\bar{X}$: the mean amount of time required to change the oil and filter of any vehicle, $\bar{X} \sim N\left(45, \dfrac{10^2}{16}\right)$

a) Standard error = standard deviation of $\bar{X}$: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{16}} = 2.5$
b) $P(45 < \bar{X} < 52) = P\left(\dfrac{45 - 45}{2.5} < Z < \dfrac{52 - 45}{2.5}\right) = P(0 < Z < 2.8) = 0.4974$
c) $P(39 < \bar{X} < 48) = P\left(\dfrac{39 - 45}{2.5} < Z < \dfrac{48 - 45}{2.5}\right) = P(-2.4 < Z < 1.2) = 0.4918 + 0.3849 = 0.8767$
d) $P(a < \bar{X} < b) = 0.95$
$P\left(\dfrac{a - 45}{2.5} < Z < \dfrac{b - 45}{2.5}\right) = 0.95$
$P(z_a < Z < z_b) = 0.95$
From the table: $z_a = -1.96$, $z_b = 1.96$
$\dfrac{a - 45}{2.5} = -1.96 \Rightarrow a = 40.1$
$\dfrac{b - 45}{2.5} = 1.96 \Rightarrow b = 49.9$
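The four parts of Example 2 can be reproduced with `statistics.NormalDist`; part d) uses `inv_cdf` in place of looking up ±1.96 in the table. A minimal sketch:

```python
from statistics import NormalDist

mu, sigma, n = 45, 10, 16
se = sigma / n ** 0.5                          # a) 10 / 4 = 2.5
xbar = NormalDist(mu, se)                      # X-bar ~ N(45, 2.5^2) because the population is normal

p_b = xbar.cdf(52) - xbar.cdf(45)              # b) P(45 < X-bar < 52)
p_c = xbar.cdf(48) - xbar.cdf(39)              # c) P(39 < X-bar < 48)
a, b = xbar.inv_cdf(0.025), xbar.inv_cdf(0.975)  # d) bounds of the middle 95%

print(se, round(p_b, 4), round(p_c, 4), round(a, 1), round(b, 1))
# 2.5 0.4974 0.8767 40.1 49.9
```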
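Sampling Distribution for the Difference between Two Means
- The sampling distribution of the difference between two sample means, $\bar{X}_1 - \bar{X}_2$, from two independent random samples can be written as
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
where the mean of $\bar{X}_1 - \bar{X}_2$ is $\mu_1 - \mu_2$ and the variance of $\bar{X}_1 - \bar{X}_2$ is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.
- Hence,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$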
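As a sanity check on the mean and variance of $\bar{X}_1 - \bar{X}_2$, the sketch below simulates two independent samples repeatedly. The population parameters are borrowed from Example 3, and both populations are assumed normal purely for convenience; the variance formula itself relies on the two samples being independent.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility
# Assumed normal populations (parameters borrowed from Example 3)
mu1, s1, n1 = 36300, 200, 36
mu2, s2, n2 = 36100, 300, 49

diffs = []
for _ in range(10_000):
    m1 = statistics.fmean(rng.gauss(mu1, s1) for _ in range(n1))  # sample mean from population 1
    m2 = statistics.fmean(rng.gauss(mu2, s2) for _ in range(n2))  # independent sample mean from population 2
    diffs.append(m1 - m2)

print("simulated mean:", round(statistics.fmean(diffs), 1), " formula:", mu1 - mu2)
print("simulated var: ", round(statistics.pvariance(diffs), 1), " formula:", round(s1**2 / n1 + s2**2 / n2, 3))
```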
Example 3
A taxi company purchased two brands of tires, brand A and brand B.
It is known that the mean distance travelled before the tires wear
out is 36300 km for brand A with standard deviation of 200 km
while the mean distance travelled before the tires wear out is
36100 km for brand B with standard deviation of 300 km. Random samples of 36 tires of brand A and 49 tires of brand B are taken. What is the probability that the
a) difference between the mean distance travelled before the
tires of brand A and brand B wear out is at most 300 km?
b) mean distance travelled by tires with brand A is larger than the
mean distance travelled by tires with brand B before the tires wear
out?
Solution:
$\bar{X}_1$: the mean distance travelled before the tires of brand A wear out
$\bar{X}_2$: the mean distance travelled before the tires of brand B wear out
$$\bar{X}_1 - \bar{X}_2 \sim N\left(36300 - 36100,\ \frac{200^2}{36} + \frac{300^2}{49}\right)$$
$$\bar{X}_1 - \bar{X}_2 \sim N(200,\ 2947.846)$$
a) $P(|\bar{X}_1 - \bar{X}_2| \le 300) = P(-300 \le \bar{X}_1 - \bar{X}_2 \le 300) = P\left(\dfrac{-300 - 200}{\sqrt{2947.846}} \le Z \le \dfrac{300 - 200}{\sqrt{2947.846}}\right) = P(-9.21 \le Z \le 1.84) = 0.9671$
b) $P(\bar{X}_1 > \bar{X}_2) = P(\bar{X}_1 - \bar{X}_2 > 0) = P\left(Z > \dfrac{0 - 200}{\sqrt{2947.846}}\right) = P(Z > -3.68) = 0.9999$
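A minimal check of Example 3 with `statistics.NormalDist`. Note that `NormalDist` is parameterised by the standard deviation, so the variance 2947.846 is square-rooted first.

```python
from statistics import NormalDist

mean_diff = 36300 - 36100                  # mu1 - mu2 = 200
var_diff = 200**2 / 36 + 300**2 / 49       # sigma1^2/n1 + sigma2^2/n2
diff = NormalDist(mean_diff, var_diff ** 0.5)  # NormalDist takes a standard deviation, not a variance

p_a = diff.cdf(300) - diff.cdf(-300)       # a) P(|X1-bar - X2-bar| <= 300)
p_b = 1 - diff.cdf(0)                      # b) P(X1-bar - X2-bar > 0)

print(round(var_diff, 3), round(p_a, 4), round(p_b, 4))
```

Because the code does not round z to two decimals before evaluating the CDF, part a) can differ from the table answer 0.9671 in the fourth decimal place.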
Sampling Distribution of the Sample Proportion, $\hat{p}$

The population and sample proportions, denoted by $p$ and $\hat{p}$ respectively, are calculated as
$$p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n}$$
where
- N = total number of elements in the population;
- X = number of elements in the population that possess a specific characteristic;
- n = total number of elements in the sample; and
- x = number of elements in the sample that possess a specific characteristic.

For large values of n (n ≥ 30), the sampling distribution of $\hat{p}$ is very close to a normal distribution:
$$\hat{p} \sim N\left(p, \frac{pq}{n}\right), \quad q = 1 - p$$
Mean and Standard Deviation of the Sample Proportion
Mean of the sample proportion: $\mu_{\hat{p}} = p$
Standard deviation of the sample proportion: $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$
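The sketch below checks $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$ by simulation. The values p = 0.40 and n = 200 are assumptions chosen for illustration, each sample is generated as n independent yes/no outcomes with success probability p, and `sd_sample_proportion` is a hypothetical helper name.

```python
import math
import random
import statistics


def sd_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of p-hat: sqrt(p * q / n) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


rng = random.Random(3)  # fixed seed for reproducibility
p, n = 0.40, 200        # assumed population proportion and sample size

# Each p-hat = (number of successes in n trials) / n
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(5_000)]

print("mean of p-hat:", round(statistics.fmean(p_hats), 4), " p =", p)
print("sd of p-hat:  ", round(statistics.pstdev(p_hats), 4), " formula =", round(sd_sample_proportion(p, n), 4))
```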
Example 4
If the true proportion of voters who support Proposition A is $p = 0.40$, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
Solution:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$
$$P(0.40 < \hat{p} < 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} < Z < \frac{0.45 - 0.40}{0.03464}\right) = P(0 < Z < 1.44) = 0.4251$$
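Example 4 as a short `statistics.NormalDist` sketch, using the same normal approximation to the distribution of $\hat{p}$:

```python
import math
from statistics import NormalDist

p, n = 0.40, 200
sd = math.sqrt(p * (1 - p) / n)     # standard deviation of p-hat
p_hat = NormalDist(p, sd)           # normal approximation to the distribution of p-hat

prob = p_hat.cdf(0.45) - p_hat.cdf(0.40)
print(round(sd, 5), round(prob, 4))
```

The result is slightly larger than the table answer 0.4251 because z is not rounded to 1.44.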
Example 5
The National Survey of Engagement shows about 87% of freshmen and seniors rate their college experience as “good” or “excellent”. Assume this result is true for the current population of freshmen and seniors. Let $\hat{p}$ be the proportion of freshmen and seniors in a random sample of 900 who hold this view. Find the mean and standard deviation of the sampling distribution of $\hat{p}$.
Solution:
Let $p$ be the proportion of all freshmen and seniors who rate their college experience as “good” or “excellent”. Then $p = 0.87$ and $q = 1 - p = 1 - 0.87 = 0.13$.
The mean of the sampling distribution of $\hat{p}$ is: $\mu_{\hat{p}} = p = 0.87$
The standard deviation of $\hat{p}$ is: $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{0.87(0.13)}{900}} = 0.011$
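The arithmetic of Example 5 in a few lines of Python (a minimal sketch):

```python
import math

p, n = 0.87, 900
q = 1 - p
mean_p_hat = p                          # mu_{p-hat} = p
sd_p_hat = math.sqrt(p * q / n)         # sigma_{p-hat} = sqrt(pq / n)
print(mean_p_hat, round(sd_p_hat, 3))   # 0.87 0.011
```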
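Sampling Distribution for the Difference between Two Proportions
- The sampling distribution of the difference between two sample proportions, $\hat{P}_1 - \hat{P}_2$, from two independent random samples can be written as
$$\hat{P}_1 - \hat{P}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$
where the mean of $\hat{P}_1 - \hat{P}_2$ is $p_1 - p_2$ and the variance of $\hat{P}_1 - \hat{P}_2$ is $\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}$.
- Hence,
$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$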
Example 6
A certain change in a process for the manufacture of component parts was considered. It was found that 75 out of 1500 items from the existing procedure were defective and 80 out of 2000 items from the new procedure were defective. If a random sample of 49 items is taken from the existing procedure and a random sample of 64 items is taken from the new procedure, what is the probability that
a) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure?
b) the proportions differ by at most 0.015?
c) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure by at least 0.02?
Solution:
$\hat{P}_N$: the proportion of defective items from the new procedure
$\hat{P}_E$: the proportion of defective items from the existing procedure
$p_N = \dfrac{80}{2000} = 0.04$, $\quad \hat{P}_N \sim N\left(0.04, \dfrac{0.04(0.96)}{64}\right)$
$p_E = \dfrac{75}{1500} = 0.05$, $\quad \hat{P}_E \sim N\left(0.05, \dfrac{0.05(0.95)}{49}\right)$
$$\hat{P}_N - \hat{P}_E \sim N\left(0.04 - 0.05,\ \frac{0.05(0.95)}{49} + \frac{0.04(0.96)}{64}\right)$$
$$\hat{P}_N - \hat{P}_E \sim N(-0.01,\ 0.0016)$$
(the variance 0.00157 is rounded to 0.0016)
a) $P(\hat{P}_N > \hat{P}_E) = P(\hat{P}_N - \hat{P}_E > 0) = P\left(Z > \dfrac{0 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z > 0.25) = 0.4013$
b) $P(|\hat{P}_N - \hat{P}_E| \le 0.015) = P(-0.015 \le \hat{P}_N - \hat{P}_E \le 0.015) = P\left(\dfrac{-0.015 - (-0.01)}{\sqrt{0.0016}} \le Z \le \dfrac{0.015 - (-0.01)}{\sqrt{0.0016}}\right) = P(-0.125 \le Z \le 0.625) = 0.2838$
c) $P(\hat{P}_N - \hat{P}_E \ge 0.02) = P\left(Z \ge \dfrac{0.02 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z \ge 0.75) = 0.2266$
Exercises:
1. Assume that the weights of all packages of a certain
brand of cookies are normally distributed with a mean
of 32 ounces and a standard deviation of 0.3 ounce.
Find the probability that the sample mean weight of a
random sample of 20 packages of this brand of cookies
will be between 31.8 and 31.9 ounces.
Answer: 0.0667
BQT 173
Institut Matematik Kejuruteraan,
UniMAP
2. According to the BBMG Conscious Consumer Report,
51% of the adults surveyed said that they are willing to pay
more for products with social and environmental benefits
despite the current tough economic times (USA TODAY,
June 8, 2009). Suppose this result is true for the current
population of adult Americans. There is a sample 1050
adult Americans who will hold the said opinion. Find the
probability that the value of sample proportion is between
0.53 and 0.55.
Answer: 0.0921
3. It is known that 30% and 35% of the residents in Taman Sutera and Bandar Mas, respectively, subscribe to the New Straits Times newspaper. If a random sample of 50 newspaper readers is taken from Taman Sutera and a random sample of 50 readers is taken from Bandar Mas, what is the probability that the sample proportion of New Straits Times subscribers in Taman Sutera is larger than that in Bandar Mas?
Answer: 0.2981
End of Chapter 1