North Carolina State University
STAT 370: Probability and Statistics for Engineers
[Section 002]
Announcements
• HW 11 (sampling dist and C.I., 10pt) due Apr 27 @
11:59PM
• HW 12 (hypothesis testing, 10pt) due Apr 27 @
11:59PM
• Final (100 pt): Mon, May 7 @ 8AM-11AM,
comprehensive
Instructor: Hua Zhou
Harrelson Hall 210
11:45AM-1:00PM, Apr 18, 2012
Plan
• Last time:
Sampling distribution, CLT
• Today:
Continue with sampling distribution, CLT, confidence
interval
Sampling distribution for the sample mean
• The mean of the sampling distribution of X̄ is the
mean of the population µ, regardless of the size of the
sample or the shape of the population.
• This result tells us that X̄ is unbiased for µ, i.e. it
doesn’t systematically overestimate or
underestimate, but gives the right answer on
average.
• The variance of the sampling distribution of X̄ is σ²/n,
where σ² is the population variance and n is the sample size.
• Means are less variable than individual
observations;
• Means of larger samples are more precise than
means of small samples.
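The two facts above (mean of X̄ equals µ, variance of X̄ equals σ²/n) can be checked by simulation. The sketch below is only an illustration and is not part of the lecture: the exponential population with mean 2, the sample sizes 4 and 25, and the helper name `sample_means` are all assumptions chosen for the example.

```python
import random
from statistics import fmean

POP_MEAN = 2.0            # assumed exponential population: mean 2 ...
POP_VAR = POP_MEAN ** 2   # ... and variance 4 (illustrative choice)

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    rate = 1.0 / POP_MEAN
    return [fmean([rng.expovariate(rate) for _ in range(n)])
            for _ in range(reps)]

results = {}
for n in (4, 25):
    means = sample_means(n, reps=20000)
    avg = fmean(means)
    var = fmean([(m - avg) ** 2 for m in means])
    results[n] = (avg, var)
    # Compare the simulated values with mu and sigma^2 / n.
    print(f"n={n}: mean of means={avg:.3f} (mu={POP_MEAN}), "
          f"variance of means={var:.3f} (sigma^2/n={POP_VAR / n:.3f})")
```

The population here is strongly skewed, yet the average of the sample means should stay close to µ for both sample sizes, while the variance shrinks roughly in proportion to 1/n.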
Sampling distribution for the sample mean
• If the population is normally distributed then the
sampling distribution of X̄ is also normally distributed
– The sampling distribution is a tighter bell curve than the
population’s bell curve.
• If the population is not normally distributed, what is the
sampling distribution of X̄?
Central Limit Theorem (CLT)
• Let X1, X2, ..., Xn be a sample from X, which has mean µ
and variance σ². If the sample size (n) is large then the
sampling distribution of X̄ is at least approximately
normal with mean µ and variance σ²/n, regardless of
the population distribution.
Equivalently:
Z = (X̄ - µ) / (σ / √n)
has approximately a standard normal distribution.
This is called the Central Limit Theorem (CLT).
Note: n ≥ 30 is considered large enough for CLT to apply.
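A small simulation can make the CLT statement concrete. This is a sketch under assumptions that are not in the slides: the population is taken to be exponential with mean 1 and standard deviation 1, n = 30, and the helper name `standardized_means` is hypothetical. If the standardized sample mean is close to standard normal, about 95% of its values should fall within ±1.96.

```python
import random
from math import sqrt
from statistics import fmean

def standardized_means(n, reps, seed=1):
    """Return Z = (xbar - mu) / (sigma / sqrt(n)) for `reps` simulated samples."""
    rng = random.Random(seed)
    mu = sigma = 1.0  # assumed exponential(1) population: mean 1, sd 1
    zs = []
    for _ in range(reps):
        xbar = fmean([rng.expovariate(1.0) for _ in range(n)])
        zs.append((xbar - mu) / (sigma / sqrt(n)))
    return zs

zs = standardized_means(n=30, reps=20000)
coverage = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"average Z = {fmean(zs):.3f}")
print(f"fraction of |Z| <= 1.96: {coverage:.3f} (standard normal gives about 0.95)")
```

The agreement is only approximate at n = 30 because the exponential population is skewed; larger n should bring the fraction closer to the normal value.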
In class exercise
The serum HDL cholesterol level of females 20-29 years
old is normally distributed with a mean of 53 and standard
deviation of 13.4:
(a) What is the probability that a randomly selected
female 20-29 years of age will have a serum
cholesterol above 60?
(b) What is the probability that a random sample of
16 females 20-29 years old will have average
serum cholesterol above 60?
(c) What is the probability that a random sample of
100 females 20-29 years old will have average
serum cholesterol above 60?
Solution
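• (a) P(X > 60) = P(Z > 7/13.4) = P(Z > 0.52) = 0.3015
• (b) P(X̄ > 60) = P(Z > 7/(13.4/√16)) = P(Z > 2.09) = 0.0183
• (c) P(X̄ > 60) = P(Z > 7/(13.4/√100)) = P(Z > 5.224) = 8.76E-8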
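The three probabilities in this solution can be reproduced numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `prob_mean_above` is a hypothetical helper, not something from the course. Values computed this way may differ slightly from table-based answers, since tables round z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_above(threshold, mu, sigma, n=1):
    """P(sample mean of n observations > threshold) for a normal population."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    return 1.0 - NormalDist(mu, se).cdf(threshold)

p_single = prob_mean_above(60, mu=53, sigma=13.4)        # part (a), n = 1
p_n16 = prob_mean_above(60, mu=53, sigma=13.4, n=16)     # part (b)
p_n100 = prob_mean_above(60, mu=53, sigma=13.4, n=100)   # part (c)

for label, p in (("(a)", p_single), ("(b)", p_n16), ("(c)", p_n100)):
    print(f"{label} {p:.4g}")
```

The probability of exceeding 60 drops quickly as n grows because the standard deviation of the sample mean is 13.4/√n.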
In class exercise
The serum HDL cholesterol level of females 20-29 years
old has a mean of 53 and standard deviation of 13.4:
(a) What is the probability that a randomly selected
female 20-29 years of age will have a serum
cholesterol above 60?
(b) What is the probability that a random sample of
16 females 20-29 years old will have average
serum cholesterol above 60?
(c) What is the probability that a random sample of
100 females 20-29 years old will have average
serum cholesterol above 60?
Roulette
Players bet $1 that the ball will land in a red slot and
win $1 if it does. Let Xi be the net winnings on the i-th
day.
(a) What are the distribution, mean and variance of Xi?
(b) Suppose you play once per day for 365 days. What
does the CLT say about your average winnings?
(c) What is the probability that your average winnings are
positive?
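Solution (In class exercise, population not assumed normal)
• (a) cannot compute since the population
distribution is unknown
• (b) cannot compute because the sample size (n = 16) is
too small for CLT to apply
• (c) approximately 8.76E-8 (n = 100 is large enough for CLT to apply)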
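Solution (Roulette)
• (a) Xi is +1 with probability p and -1 with probability 1 - p;
E(Xi) = -0.0526, Var(Xi) = 0.9972
(these values are consistent with p = 18/38, i.e. 18 red slots out of 38)
• (b) average payoff after 365 days is approximately
normal with mean -0.0526 and standard deviation
0.0523
• (c) P(X̄ > 0) = 0.1562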
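The roulette numbers can be recomputed in a few lines. This sketch assumes a wheel with 18 red slots out of 38; the slides do not state the wheel layout, and that value is inferred here only because it reproduces E(Xi) = -0.0526. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P_RED = 18 / 38   # ASSUMPTION: 18 red slots out of 38, inferred from the stated mean
N_DAYS = 365

# Net winnings are +1 with probability P_RED and -1 otherwise.
mean_x = (+1) * P_RED + (-1) * (1 - P_RED)
var_x = 1.0 - mean_x ** 2          # E(X^2) = 1 because X is always +1 or -1

# CLT: the average over 365 days is approximately normal.
sd_avg = sqrt(var_x / N_DAYS)
p_positive = 1.0 - NormalDist(mean_x, sd_avg).cdf(0.0)

print(f"E(Xi) = {mean_x:.4f}, Var(Xi) = {var_x:.4f}")
print(f"sd of average = {sd_avg:.4f}")
print(f"P(average > 0) = {p_positive:.4f}")
```

The last value may differ from the slide's 0.1562 in the third decimal, since a table lookup rounds z to two decimals.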
Take home exercise
• The scores of high school seniors on the ACT
college entrance examination in 2003 had mean
µ=20.8 and standard deviation σ=4.8. The
distribution of the scores is normal.
a) What is the approximate probability that a single
student randomly chosen from all those taking the test
scores 23 or higher?
b) Take a SRS of 25 students who took the test. What
are the mean and standard deviation of the sample
mean score of these 25 students?
c) What is the approximate probability that the mean
score of these students is 23 or higher?
Take home exercise
• A $1 bet in a state lottery game pays $500 if the 3 digit
number you choose exactly matches the winning
number, which is drawn at random. Here’s the
distribution of the payoff X:
Payoff X       $0       $500
Probability    0.999    0.001
a) What are the mean and standard deviation of X?
b) Joe buys a lottery ticket every day for 60 days. What
does the CLT say about the distribution of Joe’s average
payoff after 60 days?
Take home exercise
• An automatic grinding machine in an auto parts plant
prepares axles with a target diameter µ=40.125 mm. The
machine has some variability so the standard deviation
of the diameters is σ=0.002 mm.
A sample of 40 axles is inspected and the sample mean
diameter is recorded.
Find the probability that the sample mean diameter
differs from the target value by 0.004 mm or more.
Take home exercise
• The scores of high school seniors on the ACT college
entrance examination in 2003 had a mean µ=20.8 and
standard deviation σ=4.8.
a) What is the probability that a single student
randomly chosen from all those taking the test scores
23 or higher?
b) Now take a sample of 35 students who took the
test. What are the mean and standard deviation of the
sample mean score of these 35 students?
c) What is the probability that the sample mean score
of these students is 23 or higher?
Sampling Distributions (cont’d)
Population distribution has mean µ and standard deviation σ
• The sample mean X̄ has mean µ and standard deviation σ/√n
• If the population distribution is normal with mean µ and
standard deviation σ, then X̄ is also normal.
• If the population distribution is not normal, then X̄ is
approximately normal when n>=30
T-distribution
• Challenge: What happens to the sampling distribution of
the mean of a sample if we don’t know the true σ
(population standard deviation)?
• If the population distribution is normal with mean µ and
with unknown standard deviation, then X̄ ???
• If the population distribution is not normal and with
unknown standard deviation, then X̄ ???
• The sampling distribution will be different.
T – distribution
• Suppose that a simple random sample (SRS) of size n is
drawn from a N(µ, σ²) population. Then the sampling distribution of
the statistic
t = (X̄ - µ) / (S / √n)
is the T-distribution with (n-1) degrees of freedom
(Tn-1), where S is the sample standard deviation.
Remarks:
1. Note that the true standard deviation was estimated
by the sample standard deviation.
2. If the underlying population is not normal, then the
distribution of t is approximately Tn-1, for large n (n≥30).
T – distribution
• Density
[Figure: density curve of the T-distribution, not reproduced in this transcript]
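To see why replacing σ by S changes the sampling distribution, one can simulate the t statistic for small normal samples and compare it with the z statistic that uses the true σ. This is an illustrative sketch only: the values µ = 0, σ = 1, n = 5 and the function name `simulate_z_and_t` are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import fmean

def simulate_z_and_t(mu, sigma, n, reps, seed=2):
    """Return lists of z = (xbar-mu)/(sigma/sqrt(n)) and t = (xbar-mu)/(S/sqrt(n))."""
    rng = random.Random(seed)
    zs, ts = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # Sample standard deviation S with the usual n - 1 divisor.
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        zs.append((xbar - mu) / (sigma / sqrt(n)))
        ts.append((xbar - mu) / (s / sqrt(n)))
    return zs, ts

zs, ts = simulate_z_and_t(mu=0.0, sigma=1.0, n=5, reps=20000)
frac_z = sum(abs(z) > 1.96 for z in zs) / len(zs)
frac_t = sum(abs(t) > 1.96 for t in ts) / len(ts)
print(f"fraction |z| > 1.96: {frac_z:.3f}")
print(f"fraction |t| > 1.96: {frac_t:.3f}")
```

With only 4 degrees of freedom the t statistic should land outside ±1.96 noticeably more often than the z statistic, which reflects the heavier tails of the T-distribution.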
T – distribution
• The T-distribution, discovered by William S. Gosset in
1908, has the following characteristics:
• Symmetric / Bell-shaped
• Mean = 0
• Width of the distribution or the flatness of the distribution is
determined by degrees of freedom (df)
• Flatter than the standard normal distribution
• As the degrees of freedom (df) increase, the T-distribution
looks more like a Normal
• Cutoff points are larger than those for the normal
distribution
• For example, for the 97.5th percentile:
t_20 = 2.086
t_30 = 2.042
t_50 = 2.009
t_∞ = 1.96
Summary on Sampling Distributions
Population distribution has mean µ and standard deviation σ
• The sample mean X̄ has mean µ and standard deviation σ/√n
• If the population distribution is normal with mean µ and
standard deviation σ, then X̄ is also normal.
• If the population distribution is normal with mean µ and
unknown standard deviation, then t = (X̄ - µ) / (S / √n) has a
t-distribution with df n-1.
• If the population distribution is not normal, then X̄ is
approximately normal (σ known), and t is approximately
t-distributed (σ unknown), when n>=30 (Central Limit Theorem).
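The 97.5th percentile cutoffs quoted for the T-distribution (2.086, 2.042, 2.009, approaching 1.96) can be recomputed without a statistics package. The sketch below is one possible approach, not the method used in the course: it integrates the standard Student t density with Simpson's rule and inverts the CDF by bisection. The function names and the step and iteration counts are arbitrary choices for illustration.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=1000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df, hi=50.0):
    """Bisection for the p-th quantile; assumes 0.5 <= p and the answer is below `hi`."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

cutoffs = {df: t_quantile(0.975, df) for df in (20, 30, 50, 10**6)}
for df, q in cutoffs.items():
    print(f"df={df}: 97.5th percentile = {q:.3f}")
```

The very large df value stands in for the limiting normal case, where the cutoff should be close to 1.96.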
Remarks
• The sampling distribution of sample mean is especially
applicable to problems on confidence intervals and
hypothesis testing.