Chapter 8
Fundamental Sampling Distributions and Data Descriptions
8.1 Random sampling
Population and Samples
Population: The totality of observations with which we are concerned, whether their
number be finite or infinite, constitutes what we call a population.
Def 8.1
A population consists of the totality of the observations with which we are
concerned.
Def 8.2
A sample is a subset of a population
Any sampling procedure that produces inferences that consistently overestimate or
consistently underestimate some characteristic of the population is said to be
biased. To eliminate any possibility of bias in the sampling procedure, it is desirable
to choose a random sample in the sense that the observations are made independently
and at random.
Def 8.3
Let X1, X2, …, Xn be n independent random variables, each having the same
probability distribution f(x). We then define X1, X2, …, Xn to be a random sample of
size n from the population f(x) and write its joint probability distribution as
$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$
8.2 Some Important Statistics
Statistics are computed from a random sample to elicit information about the unknown population parameters.
Def 8.4
Any function of random variables constituting a random sample is called a
statistic.
Central Tendency in the Sample; The Sample Mean
The most commonly used statistics for measuring the center of a set of data are
the mean, median, and mode.
Def 8.5
If X1, X2, …, Xn represent a random sample of size n, then the sample mean is
defined by the statistic
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
The Sample Variance
Def 8.6
If X1, X2, …, Xn represent a random sample of size n, then the sample
variance is defined by the statistic
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Ex 8.1 :
A comparison of coffee prices at 4 randomly selected grocery stores in San Diego
showed increases from the previous month of 12, 15, 17 and 20 cents for a 1 pound
bag.
Find the variance of this random sample of price increases.
Sol:
$$\bar{x} = \frac{12 + 15 + 17 + 20}{4} = \frac{64}{4} = 16$$
$$s^2 = \frac{\sum_{i=1}^{4} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{4} (x_i - 16)^2}{3} = \frac{(12-16)^2 + (15-16)^2 + (17-16)^2 + (20-16)^2}{3} = \frac{34}{3}$$
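As a quick numerical check of Ex 8.1, the following Python sketch computes the sample mean and sample variance directly from Def 8.5 and Def 8.6. The helper names `sample_mean` and `sample_variance` are illustrative and do not come from the notes. Exact fractions are used so that 34/3 appears without rounding.

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n (Def 8.5)."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance with divisor n - 1 (Def 8.6)."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

increases = [12, 15, 17, 20]  # price increases in cents from Ex 8.1
print(sample_mean(increases))      # 16
print(sample_variance(increases))  # 34/3
```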
Theorem 8.1 (important)
If S² is the variance of a random sample of size n, then
$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}.$$
Proof:
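By definition,
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i^2 - 2\bar{X}X_i + \bar{X}^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 \right].$$
Substituting $\bar{X} = \sum_{i=1}^{n} X_i / n$,
$$S^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - 2 \left( \frac{\sum_{i=1}^{n} X_i}{n} \right) \sum_{i=1}^{n} X_i + n \left( \frac{\sum_{i=1}^{n} X_i}{n} \right)^2 \right] = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}.$$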
Def 8.7 :
The sample standard deviation, denoted by S, is the positive square root of the sample
variance
Ex 8.2 :
Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout
caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.
Sol
Hint: use
$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}.$$
With n = 6,
$$\sum_{i=1}^{n} X_i^2 = 3^2 + 4^2 + 5^2 + 6^2 + 6^2 + 7^2 = 171, \qquad \sum_{i=1}^{n} X_i = 3 + 4 + 5 + 6 + 6 + 7 = 31.$$
$$S^2 = \frac{6 \cdot 171 - (31)^2}{6(6-1)} = \frac{13}{6}$$
Sample standard deviation: $s = \sqrt{13/6} \approx 1.47$
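The shortcut formula of Theorem 8.1 is easy to check in code. The sketch below applies it to the trout data of Ex 8.2 and compares the result with Python's standard-library `statistics.variance`, which uses the divisor n − 1. The function name `variance_shortcut` is an illustrative choice, not something from the notes.

```python
import math
import statistics

def variance_shortcut(xs):
    """Sample variance via Theorem 8.1: [n*sum(x^2) - (sum x)^2] / [n(n-1)]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

trout = [3, 4, 5, 6, 6, 7]  # data from Ex 8.2
s2 = variance_shortcut(trout)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))  # 2.1667 1.472
# The definition-based library routine should agree with the shortcut.
assert math.isclose(s2, statistics.variance(trout))
```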
Exercises 8.2 : 8.1, 8.3, 8.7, 8.13, 8.15
8.3 Data Displays and Graphical Methods
Box-and-Whisker Plot (Boxplot)
This plot encloses the interquartile range of the data in a box that has the median
displayed within. The interquartile range has as its extremes the 75th percentile
(upper quartile) and the 25th percentile (lower quartile).
A boxplot can give the viewer information about which observations may be
outliers.
If the distance of an observation from the box exceeds 1.5 times the interquartile range, the
observation may be labeled as an outlier.
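The 1.5 × IQR rule can be sketched in a few lines of Python. The data set below is hypothetical (it is not from the notes), and the quartile convention is an assumption: textbooks and software compute quartiles in slightly different ways, so the fences may shift a little under another convention.

```python
import statistics

def boxplot_outliers(data):
    """Return (q1, q3, outliers) using the 1.5 * IQR rule."""
    # Quartile conventions differ between textbooks and software;
    # method="inclusive" is an assumption made for this sketch.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return q1, q3, outliers

sample = [3, 4, 5, 6, 6, 7, 30]  # hypothetical data, not from the notes
q1, q3, outliers = boxplot_outliers(sample)
print(q1, q3, outliers)  # 4.5 6.5 [30]
```

Here the interquartile range is 2, so any observation below 1.5 or above 9.5 is flagged.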
Quantile Plot
Def 8.8
A quantile of a sample, q(f ) is a value for which a specified fraction f of the data values is
less than or equal to q(f ).
8.4 Sampling Distributions
Def 8.10
The probability distribution of a statistic is called a sampling distribution.
8.5 Sampling Distribution of Means
Theorem 8.2 (Central Limit Theorem)
If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean µ and
finite variance σ², then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},$$
as n → ∞, is the standard normal distribution n(z; 0, 1).
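A small simulation can make the Central Limit Theorem concrete. The sketch below is only an illustration under assumed settings that do not come from the notes: the population is exponential with mean 1 and standard deviation 1 (clearly non-normal), the sample size is 50, and 2000 samples are drawn. If the theorem applies, the standardized means should have mean near 0, standard deviation near 1, and roughly 95% of them should fall within ±1.96.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 1.0, 1.0   # mean and standard deviation of an Exponential(1) population
N, REPS = 50, 2000     # sample size and number of simulated samples (arbitrary choices)

def standardized_mean(n):
    """Draw one sample of size n and return Z = (xbar - mu) / (sigma / sqrt(n))."""
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return (xbar - MU) / (SIGMA / math.sqrt(n))

z_values = [standardized_mean(N) for _ in range(REPS)]
inside = sum(abs(z) < 1.96 for z in z_values) / REPS

print(round(statistics.fmean(z_values), 2))  # should be near 0
print(round(statistics.stdev(z_values), 2))  # should be near 1
print(round(inside, 3))                      # should be near 0.95
```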
Ex 8.6 :
An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed, with mean equal to 800 hours and a standard
deviation of 40 hours.
Find the probability that a random sample of 16 bulbs will have an average life of less
than 775 hours
Sol
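Population distribution: µ = 800, σ = 40
Sampling distribution of the mean: n = 16, $\sigma_{\bar{X}} = 40 / \sqrt{16} = 10$
$$P(\bar{X} < 775) = P\left( Z < \frac{775 - 800}{10} \right) = P(Z < -2.5) = 0.0062$$
(from the normal table, p. 670)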
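The table lookup in Ex 8.6 can be reproduced with the standard normal CDF from Python's standard library. This is a minimal sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 16          # population mean, population sd, sample size
sigma_xbar = sigma / math.sqrt(n)   # standard deviation of the sample mean: 10.0
z = (775 - mu) / sigma_xbar         # -2.5
p = NormalDist().cdf(z)             # P(Z < -2.5)
print(sigma_xbar, z, round(p, 4))   # 10.0 -2.5 0.0062
```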
Inference on the Population Mean
One very important application of the central limit theorem is the determination of
reasonable values of the population mean µ
Ex 8.7 :
An important manufacturing process produces cylindrical component parts for the
automotive industry. It is important that the process produce parts having a population
mean of 5.0 mm. The engineer involved conjectures that the population mean is 5.0 mm.
100 parts are produced. The population standard deviation is known to be σ = 0.1,
and the sample average is $\bar{x}$ = 5.027 mm.
Does this sample information appear to support or refute the conjecture?
Sol:
$$P[\,|\bar{X} - 5| \ge 5.027 - 5.0\,] = P[(\bar{X} - 5) \ge 0.027] + P[(\bar{X} - 5) \le -0.027] = 2P\left( Z > \frac{0.027}{0.1 / \sqrt{100}} \right) = 2P(Z > 2.7) = 0.007$$
If µ = 5.0, an $\bar{x}$ that is 0.027 mm or more from the mean would occur by chance in only about 7 in 1000 experiments.
As a result, this experiment with $\bar{x}$ = 5.027 does not give supporting evidence for the
conjecture that µ = 5.0.
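The two-sided probability in Ex 8.7 can be checked the same way. The sketch below assumes only the numbers stated in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0, sigma, n = 5.0, 0.1, 100   # conjectured mean, known sd, sample size
xbar = 5.027                    # observed sample average

z = abs(xbar - mu0) / (sigma / math.sqrt(n))   # about 2.7
p_two_sided = 2 * (1 - NormalDist().cdf(z))    # P(|Xbar - 5| >= 0.027)
print(round(z, 2), round(p_two_sided, 3))      # 2.7 0.007
```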
Sampling Distribution of the Difference Between Two Averages
Theorem 8.3
If independent samples of size n1 and n2 are drawn at random from two
populations with means µ1 and µ2 and variances σ1² and σ2², respectively, then the sampling
distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally
distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$
is approximately a standard normal variable.
Ex 8.8 :
Eighteen specimens are painted using type A paint, and the drying time in hours is recorded
for each. The same is done with type B. The population standard deviations are both known to
be 1.0.
Assuming that the mean drying time is equal for the two types of paint, find
$$P(\bar{X}_A - \bar{X}_B > 1.0),$$
where $\bar{X}_A$ and $\bar{X}_B$ are the average drying times for samples of size $n_A = n_B = 18$.
Sol
Since the mean drying time is assumed equal for the two types of paint,
$$\mu_{\bar{X}_A - \bar{X}_B} = 0, \qquad \sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}.$$
$$P(\bar{X}_A - \bar{X}_B > 1.0) = P\left( Z > \frac{1.0 - 0}{\sqrt{1/9}} \right) = P(Z > 3.0) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013$$
Ex 8.9 :
The television picture tubes of manufacturers A and B have the following lifetimes (in years):
Manufacturer          A      B
Mean lifetime         6.5    6.0
Standard deviation    0.9    0.8
Sample size           36     49
What is the probability that a random sample of 36 tubes from A will have a mean lifetime
that is at least 1 year more than the mean lifetime of a random sample of 49 tubes from B?
Sol:
$$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5 - 6.0 = 0.5 \quad \text{and} \quad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36} + \frac{0.64}{49}} = 0.189$$
$$z = \frac{1.0 - 0.5}{0.189} = 2.65$$
$$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z > 2.65) = 1 - P(Z < 2.65) = 1 - 0.9960 = 0.0040$$
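Theorem 8.3 translates into a short helper that covers both Ex 8.8 and Ex 8.9. The function name `prob_diff_exceeds` and its parameter names are illustrative, not from the notes. For Ex 8.8 only the difference of the means matters, so the common mean is set to 0.0 as an arbitrary placeholder.

```python
import math
from statistics import NormalDist

def prob_diff_exceeds(d, mu1, mu2, sigma1, sigma2, n1, n2):
    """Approximate P(Xbar1 - Xbar2 > d) using Theorem 8.3."""
    mean_diff = mu1 - mu2
    sd_diff = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (d - mean_diff) / sd_diff
    return z, 1 - NormalDist().cdf(z)

# Ex 8.8: equal means (the common value 0.0 is arbitrary), both sds 1.0, n = 18 each.
z_paint, p_paint = prob_diff_exceeds(1.0, 0.0, 0.0, 1.0, 1.0, 18, 18)
# Ex 8.9: picture tubes from manufacturers A and B.
z_tube, p_tube = prob_diff_exceeds(1.0, 6.5, 6.0, 0.9, 0.8, 36, 49)

print(round(z_paint, 2), round(p_paint, 4))  # 3.0 0.0013
print(round(z_tube, 2), round(p_tube, 4))    # 2.65 0.004
```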
8.6 Sampling Distribution of S²
If a random sample of size n is drawn from a normal population with mean µ and
variance σ², and the sample variance is computed, we obtain a value of the statistic S².
Theorem 8.4
If S² is the variance of a random sample of size n taken from a normal population
having the variance σ², then the statistic
$$\chi^2 = \frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with ν = n − 1 degrees of freedom.
Ex 8.10 :
A manufacturer of car batteries guarantees that his batteries will last, on the average, 3
years with a standard deviation of 1 year. If five of these batteries have lifetimes of
1.9, 2.4, 3.0, 3.5, and 4.2 years, is the manufacturer still convinced that his batteries
have a standard deviation of 1 year? Assume that the battery lifetime follows a normal
distribution.
Sol
First compute the sample variance:
$$s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)} = \frac{5 \cdot 48.26 - 15^2}{5 \cdot 4} = 0.815$$
Then
$$\chi^2 = \frac{(n-1) s^2}{\sigma^2} = \frac{(5-1)(0.815)}{1} = 3.26$$
This value lies within the range that contains 95% of the chi-squared values with 4 degrees of freedom (0.484 to 11.143), so the assumption that σ = 1 year is reasonable.
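The arithmetic of Ex 8.10 can be verified with the standard library. Python's standard library has no chi-squared quantile function, so the two cut-off values below are hard-coded from a chi-squared table for ν = 4. Treat them as table look-ups, not computed results.

```python
import statistics

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # battery lifetimes in years (Ex 8.10)
sigma2_claimed = 1.0                     # claimed variance (standard deviation 1 year)

n = len(lifetimes)
s2 = statistics.variance(lifetimes)      # sample variance, divisor n - 1
chi2 = (n - 1) * s2 / sigma2_claimed     # Theorem 8.4 statistic, nu = n - 1 = 4

# Table values for nu = 4: 2.5% of the area lies below 0.484 and 2.5% above 11.143.
LOWER, UPPER = 0.484, 11.143
print(round(s2, 3), round(chi2, 2), LOWER < chi2 < UPPER)  # 0.815 3.26 True
```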
8.7 t-Distribution
In many experimental scenarios, knowledge of σ is no more reasonable than
knowledge of the population mean µ. Often, in fact, an estimate of σ must be supplied
by the same sample information that produced the sample average $\bar{x}$.
If σ is unknown and only µ is specified, we use the statistic
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
n ≥ 30: T is approximately standard normal
n < 30: T follows a t-distribution
If the sample size is large enough (n ≥ 30), the distribution of T does not differ
considerably from the standard normal distribution.
Corollary
Let X1, X2, …, Xn be independent random variables that are all normal with mean
µ and standard deviation σ. Let
$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \quad \text{and} \quad S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1}.$$
Then the random variable
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has a t-distribution with ν = n − 1 degrees of freedom.
What does the t-Distribution look like?
Ex 8.11 :
Find the t-value with ν = 14 degrees of freedom that leaves an area of 0.025 to the left, and
therefore an area of 0.975 to the right.
Sol:
$$t_{0.975} = -t_{0.025} = -2.145$$
Ex 8.12 :
Find $P(-t_{0.025} < T < t_{0.05})$.
Sol:
$$P(-t_{0.025} < T < t_{0.05}) = 1 - 0.05 - 0.025 = 0.925$$
Ex 8.13
Find k such that P(k < T < −1.761) = 0.045 for a random sample of size 15 selected from a
normal distribution, where
$$T = \frac{\bar{X} - \mu}{s / \sqrt{n}}.$$
Sol
From Table A.4 with ν = 14, $t_{0.05} = 1.761$, so $-t_{0.05} = -1.761$.
Let $k = -t_{\alpha}$. Then 0.045 = 0.05 − α, so α = 0.005.
Hence $k = -t_{0.005} = -2.977$ and
P(−2.977 < T < −1.761) = 0.045
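The table-based answers in Ex 8.12 and Ex 8.13 can be cross-checked numerically. Python's standard library has no t-distribution, so the sketch below integrates the t density with Simpson's rule. This is an illustrative check under that assumption, not the method used in the notes, and the function names `t_pdf` and `t_prob` are made up for the example. The critical values 2.145, 1.761, and 2.977 are the table entries for ν = 14.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_prob(a, b, nu, steps=2000):
    """P(a < T < b) by composite Simpson integration (steps must be even)."""
    h = (b - a) / steps
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    return total * h / 3

nu = 14
print(round(t_prob(-2.145, 1.761, nu), 3))   # Ex 8.12: 0.925
print(round(t_prob(-2.977, -1.761, nu), 3))  # Ex 8.13: 0.045
```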