Sampling
Random sampling
Each possible sample has an equal chance of being selected.
[Figure: a sample is drawn from the population by sampling.]
2 main types of random sampling
- Sampling with replacement (infinite population)
- Sampling without replacement (finite population)
In addition, sampling methods can also be classified according to the application, as follows.
1. Simple random sampling (a probability sampling method)
- A simple random sample (SRS) from a finite population gives each possible sample set an equal
probability of being selected.
- An SRS from an infinite population requires that all sample observations be statistically independent.
2. Stratified sampling
- A stratified sample is obtained by forming strata in the population and from each stratum, selecting a
simple random sample.
[Figure: Stratified sampling. The population is stratified into group 1, group 2 and group 3, and the sample is drawn from each group.]
3. Cluster sampling
A cluster sample is obtained by selecting a set of clusters from a population on the basis of simple random sampling.
The sample is formed by taking a census of each selected cluster.
4. Systematic sampling
A systematic sample is formed by selecting one unit at random and then selecting additional units at
evenly spaced intervals (unit intervals or time intervals) until the sample has been formed.
5. Judgment sampling
A judgment sample is obtained by having an expert who is familiar with the population characteristics
select units from the population.
6. Convenience sampling
A convenience sample is obtained by selecting “convenient” population units.
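The probability-based schemes above (simple random, stratified, cluster and systematic sampling) can be sketched in a few lines of Python. This is only an illustration: the population of 60 numbered units, the three equal groups (reused as both strata and clusters) and all function names are assumptions made for the example, not part of the notes. Judgment and convenience sampling are left out because they do not rely on a random mechanism.

```python
import random

def simple_random_sample(population, n, rng, replace=False):
    """Draw n units; with replacement uses choices, without uses sample."""
    if replace:
        return rng.choices(population, k=n)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick clusters by SRS, then take a census of each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = list(range(1, 61))  # hypothetical population of 60 units
strata = {
    "group 1": population[:20],
    "group 2": population[20:40],
    "group 3": population[40:],
}

srs = simple_random_sample(population, 6, rng)
strat = stratified_sample(strata, 2, rng)
clus = cluster_sample(strata, 1, rng)
syst = systematic_sample(population, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```

Passing `replace=True` to `simple_random_sample` switches to sampling with replacement, in which the same unit may appear more than once.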
PARAMETERS
- Characteristics of the population
- Computed from all the individuals of the population
- Represented by Greek letters (such as μ)
STATISTICS
- Characteristics of the samples
- Computed from the samples drawn from the population
- Represented by English letters (such as $\bar{x}$)
Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean μ and variance σ². Then, if n
is sufficiently large, $\bar{x}$ has approximately a normal distribution with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \sigma^2 / n$.
- The CLT tells us that the sampling distribution of $\bar{x}$ becomes increasingly close to a normal
distribution as the sample size increases, and that in the limit of infinite sample size the
(standardized) sampling distribution of $\bar{x}$ is exactly normal.
- The CLT can be used with any form of population distribution, provided the population variance is finite.
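A small simulation can make the CLT statement concrete. The sketch below assumes an exponential population with mean 1 and variance 1 (chosen here only because it is strongly skewed), draws many samples of size n = 30, and compares the mean and variance of the resulting sample means with μ and σ²/n. The population, the sample size and the number of replications are assumptions of this example.

```python
import random
import statistics

def sample_means(draw, n, replications, rng):
    """Return the means of `replications` independent samples of size n."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(replications)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 30
# Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
means = sample_means(lambda r: r.expovariate(1.0), n, 5000, rng)

mean_of_means = statistics.fmean(means)   # should be close to mu = 1
var_of_means = statistics.variance(means)  # should be close to sigma^2 / n = 1/30
# Share of sample means within 1.96 standard errors of mu (about 0.95 if normal).
se = (1.0 / n) ** 0.5
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print(mean_of_means, var_of_means, within)
```

Even though the population is far from normal, the share of sample means within 1.96 standard errors of μ should come out near 0.95, which is what a normal sampling distribution predicts.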
Statistical Inference
Goal – to make inferences about a population based on a subset of it
Idea – A sample is drawn from the population.
– A sample statistic is used to draw inferences about the population parameter, θ
Statistical inference has two branches:
- Estimation (point estimation and interval estimation)
- Hypothesis testing
Estimation – a sample statistic is used to estimate the value of θ
Hypothesis Testing
– Hypothesize a value of θ
– Use the sample information to make a decision about the hypothesized value
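[Figure: Point estimator. The sampling process draws a sample $(x_1, x_2, \ldots, x_n)$ of size n from a population with parameter θ, and the estimator is $\hat{\theta} = g(x_1, x_2, \ldots, x_n)$.]
[Figure: Interval estimator. The sampling process draws a sample $(x_1, x_2, \ldots, x_n)$ of size n from a population with parameter θ, and the estimator is the interval $(\hat{\theta}_L, \hat{\theta}_U)$, where $\hat{\theta}_L = g_1(x_1, x_2, \ldots, x_n)$ and $\hat{\theta}_U = g_2(x_1, x_2, \ldots, x_n)$.]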
Chapter 6
Point Estimation
θ = parameter
$\hat{\theta}$ = estimator of θ
Unbiased minimum variance (UMV) criterion
Used for selecting the best estimator
Based on 2 factors
- unbiasedness
- minimum variance (among all the unbiased estimators of θ, choose the one with minimum variance)
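1. Unbiasedness
An estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$ for every possible value of θ
(i.e. the average of the values of $\hat{\theta}$ computed over all possible samples of size n is θ).
[Figure: sampling distributions of two estimators. $\hat{\theta}_1$ is an unbiased estimator, $E(\hat{\theta}_1) = \theta$; $\hat{\theta}_2$ is biased, $E(\hat{\theta}_2) \neq \theta$.]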
2. Unbiased minimum variance estimator
An estimator $\hat{\theta}$ of a population parameter θ is called an unbiased minimum variance (UMV)
estimator if the expected value of $\hat{\theta}$ is θ and, among all unbiased estimators of θ, $\hat{\theta}$ has the least amount
of variability.
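The two criteria can be illustrated by simulation. The sketch below assumes a normal population (μ = 10, σ = 2, samples of size 25, all chosen only for this example) and compares two estimators of μ: the sample mean and the sample median. For a normal population both are unbiased, so the comparison comes down to which one varies less from sample to sample.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible
mu, sigma, n, replications = 10.0, 2.0, 25, 4000  # illustrative values

mean_estimates, median_estimates = [], []
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean_estimates.append(statistics.fmean(sample))
    median_estimates.append(statistics.median(sample))

# Unbiasedness: both averages should be close to mu.
avg_mean = statistics.fmean(mean_estimates)
avg_median = statistics.fmean(median_estimates)
# Minimum variance: the sample mean should show the smaller variance.
var_mean = statistics.variance(mean_estimates)
var_median = statistics.variance(median_estimates)
print(avg_mean, avg_median, var_mean, var_median)
```

Under these assumptions the sample mean is the estimator the UMV criterion would select, since it is unbiased and shows the smaller variance of the two.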
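[Figure: sampling distributions of two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of θ; $\hat{\theta}_1$ is the UMV estimator.]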
Ch 7 Interval Estimation
Interval estimator of population mean
A 100(1 – α)% confidence interval for the mean μ of a normal population when the value of σ is known is:
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
The sample size n necessary to ensure an interval length L is obtained from $L = 2 z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$:
$n = \left( 2 z_{\alpha/2} \cdot \frac{\sigma}{L} \right)^2$
Derivation
1. Let $x_1, \ldots, x_n$ be a random sample from a normal population having mean μ and standard deviation σ.
2. To find a confidence interval (CI) for any parameter θ, we have to look for an estimator $\hat{\theta}$ that:
(a) has approximately a normal distribution,
(b) is unbiased, and
(c) has a known standard deviation $\sigma_{\hat{\theta}}$.
In this case, $\bar{x}$ is an estimator of the population mean μ that can be used to find the confidence interval
because it satisfies all the conditions:
(a) $\bar{x}$ has a normal distribution (each $X_i$ is from a normal population),
(b) $\bar{x}$ is an unbiased estimator of μ,
(c) $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
3. Transform $\bar{x}$ to its standardized value Z, because Z has a standard normal distribution and can therefore be used
conveniently to find the confidence interval:
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
4. To find the confidence interval, the value of Z must lie between two values a and b:
$a < Z < b$
$a < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < b$ ------------------------ (1) Confidence Interval
5. For a 100(1 – α)% confidence interval, the area between a and b is 1 – α:
$P\left( a < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < b \right) = 1 - \alpha$
[Figure: standard normal curve centered at 0, with area 1 – α between a and b and area α/2 in each tail.]
Thus, the shaded area on each side is α/2.
From Z notation, $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$.
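Thus, (1) becomes
$-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}$
Solving for μ, we obtain the confidence interval for the population mean μ:
$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$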
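$z_\alpha$ Notation
$z_\alpha$ denotes the value on the measurement axis for which the area to the right of $z_\alpha$ is equal to α: $P(Z \geq z_\alpha) = \alpha$.
[Figure: standard normal curve centered at 0, with area α to the right of $z_\alpha$.]
Similarly, $z_{\alpha/2}$ denotes the value for which the area to the right is equal to α/2.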
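Critical values such as $z_{\alpha/2}$ are usually read from a normal table, but they can also be computed. A minimal sketch using `NormalDist` from Python's standard `statistics` module is given below; the helper name `z_critical` is an assumption of this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_alpha, the point with area alpha to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha)

for confidence in (0.90, 0.95, 0.99):
    alpha = 1.0 - confidence
    print(f"{confidence:.0%}: z_(alpha/2) = {z_critical(alpha / 2):.3f}")
```

The computed value for 99% confidence is about 2.576; normal tables often list it as 2.575 or 2.58, so small differences in the last digit of an interval are to be expected.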
Interpreting a confidence interval
A 95% confidence interval for μ means that, in the long run, 95% of the computed CIs will contain μ.
[Figure: confidence intervals (1), (2), (3), ... computed from a long sequence of replications of an experiment (experiment 1, experiment 2, experiment 3, ...), drawn against the true value of μ; 95% of the intervals contain μ and 5% do not.]
Example: Suppose the 95% CI for μ is (79.3, 80.7).
RIGHT – (79.3, 80.7) may be one of the 95% of confidence intervals that contain μ.
WRONG – μ is between 79.3 and 80.7 with probability 0.95.
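The long-run interpretation can be checked by simulation. The sketch below repeats an experiment many times, builds the known-σ interval each time, and counts how often the interval contains the true μ. The values μ = 80, σ = 2 and n = 30 are arbitrary choices for this illustration, as is the function name `coverage`.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, replications, rng):
    """Fraction of replications whose known-sigma z interval contains mu."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(replications):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / replications

rng = random.Random(7)  # fixed seed so the sketch is reproducible
# mu = 80, sigma = 2, n = 30 are arbitrary illustrative values.
rate = coverage(mu=80.0, sigma=2.0, n=30, confidence=0.95, replications=10000, rng=rng)
print(rate)
```

In the simulation μ never changes; what varies from replication to replication is the interval. That is why the 95% refers to the procedure and not to any single computed interval.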
Example 1
Extensive monitoring of a computer time-sharing system has suggested that
response time to a particular editing command is normally distributed with standard deviation
25 milliseconds. A new operating system has been installed, and it is desired to estimate the
true average response time μ for the new environment. Assuming that response times are still
normally distributed with σ = 25, what sample size is necessary to ensure that the resulting
95% confidence interval has a length of (at most) 10?
Sol.
95 = 100(1 – α), so α = 0.05
$n = \left( 2 z_{\alpha/2} \cdot \frac{\sigma}{L} \right)^2 = \left( 2 z_{0.025} \cdot \frac{25}{10} \right)^2 = \left( 2 (1.96) \cdot \frac{25}{10} \right)^2 = 96.04$
Therefore, a sample size of 97 is required (n must be an integer, so 96.04 is rounded up).
Note - The smaller the desired length L, the larger n must be
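The sample-size calculation of Example 1 can be reproduced with a short function. This is a sketch under the same assumptions as the example (normal population, known σ); the function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, length, confidence):
    """Smallest n whose known-sigma z interval has total length <= `length`."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_exact = (2.0 * z * sigma / length) ** 2
    # Round up: any smaller integer n would give an interval longer than `length`.
    return n_exact, math.ceil(n_exact)

n_exact, n = required_sample_size(sigma=25.0, length=10.0, confidence=0.95)
print(round(n_exact, 2), n)

# Halving the desired length roughly quadruples the required sample size.
_, n_half = required_sample_size(sigma=25.0, length=5.0, confidence=0.95)
print(n_half)
```

The second call illustrates the note above: because n grows with 1/L², asking for half the length requires about four times as many observations.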
Ex. 2 Suppose that when a signal having value μ is transmitted from location A, the value received at
location B is normally distributed with mean μ and variance 4. To reduce error, suppose the same value is
sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% CI
for μ.
Sol.
$\bar{x} = \frac{81}{9} = 9$, σ = 2, n = 9
100(1 – α) = 95, so α = 0.05
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 9 \pm z_{0.025} \cdot \frac{2}{\sqrt{9}} = 9 \pm 1.96 \cdot \frac{2}{\sqrt{9}} = (7.69, 10.31)$
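The interval of Ex. 2 can be checked directly from the nine received values. The sketch below assumes, as the example does, a normal population with known σ = 2; the function name `z_interval` is an assumption of this illustration.

```python
from statistics import NormalDist, fmean

def z_interval(data, sigma, confidence):
    """Known-sigma confidence interval for the mean of a normal population."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    xbar = fmean(data)
    half_width = z * sigma / len(data) ** 0.5
    return xbar - half_width, xbar + half_width

received = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]  # values from Ex. 2
low, high = z_interval(received, sigma=2.0, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Note that σ is passed in as a known constant; the spread of the nine observed values plays no role in this interval.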
Example 3 A machine is producing ball bearings with diameters of 0.5 inches. Based on lengthy experience
with the machine, it is known that the standard deviation of the bearing diameters is 0.005 inches. A sample of 25 ball
bearings is selected, and their average diameter is found to be 0.498 inches. Determine a 99% confidence
interval for the population average of ball bearing diameters.
Sol. σ = 0.005, n = 25, $\bar{x}$ = 0.498
For a 99% confidence interval, 100(1 – α) = 99, so α = 0.01
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 0.498 \pm z_{0.005} \cdot \frac{0.005}{\sqrt{25}} = 0.498 \pm 2.575 \cdot \frac{0.005}{\sqrt{25}} = (0.4954, 0.5006)$
Interval estimator for population mean μ
- for large samples (n ≥ 30)
- any population distribution
- σ is not known
When n is large, a 100(1 – α)% confidence interval for the mean μ of any population distribution is:
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Proof
Let $X_1, X_2, \ldots, X_n$ be a random sample from any population having mean μ and standard
deviation σ.
If n is large, the Central Limit Theorem implies that $\bar{x}$ has approximately a normal distribution.
Thus $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ has approximately a standard normal distribution,
so that $P\left( -z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2} \right) \approx 1 - \alpha$.
Therefore the confidence interval is $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.
When n is large, s will be close to σ. Therefore we can use s to estimate σ if σ is unknown.
The confidence interval is then $\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$.
Example 5 A sample of 56 research cotton samples resulted in a sample average
percentage elongation of 8.17 and a sample standard deviation of 1.42. Find a 95% large-sample confidence interval for the true average percentage elongation μ.
Solution
95 = 100(1 – α), so α = 0.05
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 8.17 \pm z_{0.025} \cdot \frac{1.42}{\sqrt{56}} = 8.17 \pm 1.96 \cdot \frac{1.42}{\sqrt{56}} = (7.80, 8.54)$
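Example 5 only gives summary statistics, so a check of the large-sample interval can work from $\bar{x}$, s and n alone. The sketch below is one possible way to write it; the function name `large_sample_interval` and the explicit n ≥ 30 guard are assumptions of this illustration.

```python
from statistics import NormalDist

def large_sample_interval(xbar, s, n, confidence):
    """Large-sample z interval for a mean, with s standing in for sigma."""
    if n < 30:
        raise ValueError("the large-sample interval assumes n >= 30")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * s / n ** 0.5
    return xbar - half_width, xbar + half_width

low, high = large_sample_interval(xbar=8.17, s=1.42, n=56, confidence=0.95)
print(round(low, 2), round(high, 2))
```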
Interval estimator of population mean μ
- for small samples
- normal distribution
- σ is not known
A 100(1 – α)% confidence interval for the mean μ is:
$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$
A small-sample interval for μ
If the population random variable x is normally distributed, then we know that $\bar{x}$ will also be normally
distributed.
Thus $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is a standard normal variable.
When n is small, s is no longer likely to be close to σ.
If the standard deviation σ must be estimated by s, then the standardized variable
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
will no longer be a standard normal variable.
* The sampling distribution of t is called the “Student” t distribution with n – 1 degrees of freedom.
Properties of t distributions
Let $t_\nu$ denote the density function curve for ν degrees of freedom.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
3. As ν increases, the spread of the $t_\nu$ curve decreases.
4. As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve.
[Figure: the z curve and a $t_\nu$ curve, both centered at 0.]
$t_{\alpha,\nu}$ Notation
Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with ν degrees of freedom
to the right of $t_{\alpha,\nu}$ is α; $t_{\alpha,\nu}$ is called a t critical value.
[Figure: $t_\nu$ curve centered at 0, with area α to the right of $t_{\alpha,\nu}$.]
Ex. 6 Four determinations of the percentage of methanol in a certain solution yielded $\bar{x}$ = 8.34%, s = 0.03%.
Assuming (approximate) normality of the population of determinations, find a 95% confidence interval for μ.
Sol. $\bar{x}$ = 8.34, s = 0.03, σ unknown, so use t
95 = 100(1 – α), so α = 0.05
$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} = 8.34 \pm t_{0.025,\, 4-1} \cdot \frac{0.03}{\sqrt{4}} = 8.34 \pm 3.182 \cdot \frac{0.03}{\sqrt{4}} = (8.292, 8.388)$
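Python's standard library has no t quantile function, so t critical values normally come from a table or from a statistics package (for instance `scipy.stats.t.ppf`). As a self-contained alternative, the sketch below integrates the t density numerically with Simpson's rule and finds $t_{\alpha,\nu}$ by bisection, then rechecks Ex. 6. All function names, the number of integration steps and the search range are assumptions of this illustration, and it is only meant for tail areas α below 0.5.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus symmetry."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def t_critical(alpha, df):
    """t_{alpha, df}: point with area alpha (< 0.5) to its right, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if 1.0 - t_cdf(mid, df) > alpha:
            lo = mid  # upper-tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

def t_interval(xbar, s, n, confidence):
    """Small-sample t interval for the mean of a normal population."""
    t = t_critical((1.0 - confidence) / 2.0, n - 1)
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = t_interval(xbar=8.34, s=0.03, n=4, confidence=0.95)
print(round(t_critical(0.025, 3), 3), round(low, 3), round(high, 3))
```

With only 3 degrees of freedom the critical value is noticeably larger than the corresponding z value of 1.96, which is property 2 of the t curves in numerical form.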
Ex 8 The use of a small amount of carbon in producing certain steels is beneficial. Too much carbon,
however, could be detrimental. Consequently, an upper confidence limit on the mean carbon content of a
carbon steel is needed. Let X denote the number of pounds of carbon in a ton of carbon steel (lb/ton). A
sample of size n = 15 is obtained, and the sample mean of X is found to be 20 lb/ton with a standard deviation of
0.60 lb/ton. Determine a 95% upper confidence limit for the mean of X.
Sol.
$\bar{x} + t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} = 20 + t_{0.025,\, 15-1} \cdot \frac{0.60}{\sqrt{15}} = 20 + 2.145 \cdot \frac{0.60}{\sqrt{15}}$ = ……. lb/ton
Note - The two-sided critical value $t_{0.025,\,14}$ = 2.145 used here gives a conservative upper limit; a one-sided 95% upper limit would use $t_{0.05,\,14}$ = 1.761.
We are 95% confident that the average amount of carbon does not exceed about 20.3 lb/ton.
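The effect of the critical value on the upper limit in Ex 8 can be seen by evaluating the limit twice. The sketch below uses the value 2.145 from the worked solution and, as an assumption not quoted in the notes, the one-sided table value $t_{0.05,\,14}$ = 1.761; the function name `upper_limit` is also hypothetical.

```python
import math

# Critical values from a t table with 14 degrees of freedom.
T_0025_14 = 2.145  # two-sided 95% value, as used in the worked solution
T_005_14 = 1.761   # one-sided 95% value (assumption: not quoted in the notes)

def upper_limit(xbar, s, n, t_value):
    """Upper confidence limit xbar + t * s / sqrt(n)."""
    return xbar + t_value * s / math.sqrt(n)

two_sided_based = upper_limit(20.0, 0.60, 15, T_0025_14)
one_sided = upper_limit(20.0, 0.60, 15, T_005_14)
print(round(two_sided_based, 2), round(one_sided, 2))
```

The limit based on 2.145 is the more conservative of the two, since it puts only 2.5% of the probability above the limit rather than 5%.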
Confidence Interval for population proportion
A large-sample 100(1 – α)% confidence interval for a population proportion p is
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
where $\hat{p} = \frac{x}{n}$, n = sample size, x = the observed number of successes, and $\hat{q} = 1 - \hat{p}$.
A sample counts as large when $n\hat{p} \geq 5$ and $n\hat{q} \geq 5$.
Proof
Let
p = the proportion of successes in the population (population proportion)
n = sample size
x = # of successes in the sample
For a small sample, x has a binomial distribution with E(x) = np and $\sigma_x^2 = np(1 - p)$ (from Ch. 3).
For a large sample, the distribution of x can be approximated by a normal distribution.
The estimator of p is $\hat{p} = \frac{x}{n} = \frac{1}{n} \cdot x$, where $\frac{1}{n}$ is a constant and x is approximately normally distributed.
Therefore $\hat{p}$ also has approximately a normal distribution.
$\hat{p}$ is an estimator that can be used to find the confidence interval because
(1) $\hat{p}$ has approximately a normal distribution,
(2) $\hat{p}$ is an unbiased estimator because $\mu_{\hat{p}} = p$:
$\mu_{\hat{p}} = E\left( \frac{x}{n} \right) = \frac{1}{n} E(x) = \frac{1}{n} \cdot np = p$
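(3) $\sigma_{\hat{p}}^2 = V\left( \frac{x}{n} \right) = \left( \frac{1}{n} \right)^2 V(x) = \left( \frac{1}{n} \right)^2 np(1 - p) = \frac{p(1 - p)}{n}$, so $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
Transform $\hat{p}$ to its standardized value (z):
$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$
For a 100(1 – α)% confidence interval,
$-z_{\alpha/2} < \frac{\hat{p} - p}{\sigma_{\hat{p}}} < z_{\alpha/2}$
Solving for p, we obtain
$\hat{p} - z_{\alpha/2} \cdot \sigma_{\hat{p}} < p < \hat{p} + z_{\alpha/2} \cdot \sigma_{\hat{p}}$
Because p is unknown, $\sigma_{\hat{p}}$ is estimated by $\sqrt{\hat{p}(1 - \hat{p})/n}$, which gives
$\hat{p} - z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
the confidence interval for the population proportion p.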
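The two moments used in the proof, $E(\hat{p}) = p$ and $V(\hat{p}) = p(1 - p)/n$, can be checked by simulating many samples. The values p = 0.7 and n = 270 below are illustrative choices for this sketch, as are the variable names.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is reproducible
p, n, replications = 0.7, 270, 5000  # illustrative values chosen for this sketch

p_hats = []
for _ in range(replications):
    # x = number of successes in n independent trials with success probability p
    successes = sum(rng.random() < p for _ in range(n))
    p_hats.append(successes / n)

mean_p_hat = statistics.fmean(p_hats)
var_p_hat = statistics.variance(p_hats)
print(mean_p_hat, p)                # should be close to each other
print(var_p_hat, p * (1 - p) / n)   # should be close to each other
```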
Example 9 The 1983 Tylenol poisoning episode and other similar incidents have focused
attention on the desirability of packaging various commodities in a tamper-resistant manner.
In a survey of consumer attitudes toward such packaging, of the 270 consumers surveyed,
189 indicated that they would be willing to pay extra for tamper-resistant packaging. Let p
denote the proportion of all consumers who would pay extra for such packaging. Find a 95%
confidence interval for p.
Solution
$\hat{p} = \frac{x}{n} = 189/270 = 0.700$
$\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.7(0.3)}{270}} = 0.0279$
For a 95% confidence interval, use $z_{0.05/2} = z_{0.025} = 1.96$
$\hat{p} \pm z_{0.025} \cdot \sigma_{\hat{p}} = 0.700 \pm (1.96)(0.0279) = (0.645, 0.755)$
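The interval of Example 9 can be reproduced from the two counts alone. The sketch below also checks the large-sample rule stated earlier ($n\hat{p} \geq 5$ and $n\hat{q} \geq 5$) before computing anything; the function name `proportion_interval` is an assumption of this illustration.

```python
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample z interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("large-sample rule n*p_hat >= 5 and n*q_hat >= 5 not met")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * (p_hat * q_hat / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

low, high = proportion_interval(successes=189, n=270, confidence=0.95)
print(round(low, 3), round(high, 3))
```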