8.1
Chapter 8: Fundamental Sampling Distributions and
Data Descriptions
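[Diagram: a sample is taken from the population (“Take Sample”), and inference is made from the sample back to the population (“Inference”).]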
We have spent the last three chapters discussing quantities
in the population (PDFs, E(X), Var(X), …). Now, we are
going to continue talking about population quantities, but
also about how to take a sample and how to summarize the
sample itself. Understanding this chapter will be key to
understanding how we make inferences from the sample
to the population! From this chapter, it is important to learn
the following:
• Understand all components of the diagram above.
While we will not necessarily be doing the inference
part in this chapter, you will have a basic idea of how this
will be done.
• Sample mean and variance
• Sampling distributions – what they are, how they can
be used, the central limit theorem
• Chi-squared PDF and t-PDF
© 2005 Christopher R. Bilder
8.2
8.1-8.2: Random Sampling and Some Important
Statistics
Below are part of the notes from Chapter 1. Parts have
been added to update the notes relative to what we have
discussed so far this semester.
Most of what we will be doing in this course centers around
trying to understand a set of information (data).
This set of information is from ALL objects in a population.
Often, this set of information is so big that obtaining all of
the information about these objects is extremely difficult.
In other cases, the information is hard to obtain, which
makes it impossible to obtain all of it.
To understand the population, we usually will take a
representative subset of information from the population.
The subset is a smaller set of information that is usually
more manageable. The objects from the population
used in this subset form the sample. Typically, we want
the objects in the sample to be very similar to
(representative of) the objects in the population.
8.3
Example: Suppose we are interested in estimating the
average GPA of all students at UNL. How would we do this?
Assume we do not have access to any student records.
1. Define the population of interest
The population is all UNL students
Problems obtaining information on all of the students:
• >20,000 students
• Students drop out and enroll late
2. Define a characteristic or random variable of interest
Let X denote GPA
3. Define the parameter of interest
Parameter: Numerical summary measure used to
describe a population characteristic.
The parameter is the population average here. The
average is often called the “mean” and denoted by the
Greek letter “mu”, $\mu$.
There could be other parameters that we are interested
in as well, for example, the variance or standard
deviation.
8.4
[Diagram: the population of student GPAs (for example, 3.6, 2.4, 2.7, 2.8, 3.9, 3.4, 1.2, 2.9, 3.2, 4.0, …) with population mean $\mu$.]
To find the average GPA, we could add up ALL of the
GPAs of students in the population and divide by the
total number of students.
Why would this be difficult to do?
4. The role of PDFs
By assuming that the student GPAs follow
a particular PDF, we can often simplify the problem to
some extent. For example, suppose we could
assume the GPA PDF is as plotted below:
8.5
[Figure: GPA population probability distribution example. The PDF f(x) is plotted against x (GPA) for x from 0 to 4, with f(x) ranging from 0 to about 0.7.]
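Note that the above PDF is for a random variable X =
4Y where Y has a Beta PDF with $\alpha = 5$ and $\beta = 2$ (see
#12 on p. 175). The PDF of Y is
$$f(y) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y^{\alpha-1}(1-y)^{\beta-1} & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$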
By making the assumption about the PDF, we can
somewhat simplify the problem of examining the GPAs.
For example, we can find E(X) = $\mu$ and Var(X) = $\sigma^2$
using properties discussed in Chapter 4. Note that for
this PDF,
$$E(X) = E(4Y) = \frac{4\alpha}{\alpha+\beta} = \frac{4 \cdot 5}{5+2} = \frac{20}{7} = 2.8571$$
and
$$Var(X) = Var(4Y) = \frac{4^2\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2} = \frac{4^2 \cdot 5 \cdot 2}{(5+2+1)(5+2)^2} = 0.4082.$$
8.6
Questions:
• How realistic is this PDF assumption?
• How could we check if this is a reasonable
assumption?
• What can we do if this assumption is not correct?
In real-life applications, the parameters $\alpha$, $\beta$, $\mu$, and $\sigma^2$ are
never really known. Thus, we need to take a sample in
order to estimate them.
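The mean and variance quoted for this PDF can be checked with a few lines of code. The sketch below is not part of the original Excel materials; the function name `scaled_beta_mean_var` is made up for illustration, and it simply applies the standard Beta mean and variance formulas together with E(cY) = cE(Y) and Var(cY) = c²Var(Y).

```python
# Mean and variance of X = scale * Y, where Y has a Beta(alpha, beta) PDF.
# Standard Beta results: E(Y) = a/(a+b), Var(Y) = a*b / ((a+b+1)*(a+b)^2).

def scaled_beta_mean_var(alpha, beta, scale):
    """Return (E(X), Var(X)) for X = scale * Y with Y ~ Beta(alpha, beta)."""
    total = alpha + beta
    mean_y = alpha / total
    var_y = alpha * beta / ((total + 1) * total ** 2)
    # E(cY) = c E(Y) and Var(cY) = c^2 Var(Y)
    return scale * mean_y, scale ** 2 * var_y


# GPA example: X = 4Y with alpha = 5 and beta = 2
mu, sigma2 = scaled_beta_mean_var(alpha=5, beta=2, scale=4)
print(round(mu, 4), round(sigma2, 4))
```

The two printed numbers should agree with the 2.8571 and 0.4082 found by hand above.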
5. Define the sample
Suppose a representative sample of 20 students is
taken from the population. Each student’s GPA is a
random variable denoted by X. The particular students
who are chosen out of the population have random
variable GPAs denoted by X1, X2, …, X20. Each of
these random variables has the PDF f(xi) for
i=1,…,20.
Once we know what the actual GPAs are for these
students, say 2.9, 3.4, …, we have the observed values
of the random variables since we are “observing”
particular students’ GPAs. These observed values are
denoted by x1, x2, …, x20. The observed values are
also called observations.
8.7
6. Define the statistic
Statistic: Numerical summary measure used to
describe a sample characteristic.
Note: A statistic estimates a parameter
The statistic is the sample mean here which estimates
the population mean, $\mu$. The sample mean is
$$\bar{X} = \frac{\sum_{i=1}^{20} X_i}{20}.$$
When the actual GPAs are observed, then
$$\bar{x} = \frac{\sum_{i=1}^{20} x_i}{20}$$
is a symbolic way to denote an actual numerical value.
Please remember the discussion about random
variables (X – denoted by capital letters) and their
observed values (x – denoted by lowercase letters) in
Chapter 3. I know that this can be confusing!
There are other statistics we may want to calculate as
well. For example, the sample variance is
8.8
$$S^2 = \frac{\sum_{i=1}^{20} (X_i - \bar{X})^2}{20-1}$$
and it estimates the population variance, $\sigma^2$.
There are also statistics which estimate $\alpha$ and $\beta$. One
way to derive these statistics is through maximum
likelihood estimation which is the subject of Section
9.15.
7. Random sample
How do we take the sample?
Random sample: Select n items from a population
where each has an equal chance of being chosen.
There are other ways to take the sample, but each of
them aims to produce a sample that is representative of the
population.
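As a rough illustration of what "each has an equal chance of being chosen" looks like in practice, here is a small sketch. The population is entirely hypothetical (20,000 made-up GPAs generated from the Beta-based PDF used earlier, not real UNL records), and the seed value is arbitrary.

```python
import random

rng = random.Random(2005)  # fixed, arbitrary seed so the sketch is reproducible

# Hypothetical population: 20,000 made-up GPAs, X = 4Y with Y ~ Beta(5, 2).
population = [round(4 * rng.betavariate(5, 2), 3) for _ in range(20000)]

# random.sample draws without replacement, and every set of 20 students
# is equally likely to be the one selected.
sample = rng.sample(population, 20)

sample_mean = sum(sample) / len(sample)
population_mean = sum(population) / len(population)
print(len(sample), round(sample_mean, 3), round(population_mean, 3))
```

In a real application the population mean on the last line would not be available; it is computed here only because the population was simulated.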
8.9
[Diagram: “Take Sample”. A sample of GPAs, with sample mean $\bar{X}$, is drawn from the population of GPAs, with population mean $\mu$. Only a few example GPAs are shown.]
In the above diagram, I could not fit all >20,000 GPAs
in the population and all 20 GPAs in the sample!
Taking a random sample ensures that X1 is
independent of all the other Xi’s. Similarly, X2 is
independent of all of the other Xi’s and so on. Using
the independence property discussed in Section 3.4,
the joint PDF for X1, X2, …, X20 is
f(x1, x2,…,x20) = f(x1)f(x2)f(x3)…f(x20).
In terms of the GPA problem, what does it mean for X1
to be independent of X2 to be independent of X3 ….?
8. Inference
8.10
Inference: A deduction or conclusion about the
population based on the sample.
Based upon the statistic in the sample, we will make
inferences about the parameter in the population with a
certain level of accuracy. This level of accuracy can be
quantified through the use of probability. We will begin
to discuss how this is done in this chapter!
[Diagram: the same sampling diagram as on p. 8.9, now with an “Inference” arrow pointing from the sample (sample mean $\bar{X}$) back to the population (population mean $\mu$).]
Questions:
• The sample mean GPA, $\bar{X}$, estimates the population
mean GPA, $\mu$. Is the sample mean GPA equal to the
population mean GPA?
8.11
• How accurate is the sample mean GPA in estimating the
population mean GPA?
Statistical science allows us to measure this
accuracy. In this chapter we will start to learn how to
measure it.
• Suppose another random sample is taken. Is the
sample mean GPA going to be the same?
• Is the sample mean a random variable?
• What would happen if a random sample is not taken?
Suppose only College of Business students are
sampled.
8.12
Important statistics and definitions
Definition 8.4 – Any function of the random variables
constituting a random sample is called a statistic.
Definition 8.5: If X1, X2, …, Xn represent a random sample of
size n, then the sample mean is defined by the statistic
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Notes:
• This statistic estimates the population mean, $\mu$.
• This measures “central” location of all possible values
of the random variable. Other measures of central
location include the median (50% of all values are less
than it and 50% are greater than it) and the mode (most
frequent value).
Definition 8.6: If X1, X2, …, Xn represent a random sample of
size n, then the sample variance is defined by the statistic
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Notes:
8.13
• This statistic estimates the population variance, $\sigma^2$.
• The sample standard deviation, S, is the positive
square root of $S^2$.
• See p. 199 for a re-expression of the formula (I
recommend against using it due to numerical
inaccuracies which can occur when using it).
• Where does this formula come from?
o Remember that $\sigma^2 = E[(X-\mu)^2]$. Put into words, this
is the expected average squared deviation of the
random variable X from the population mean.
o Notice that the $(X_i - \bar{X})^2$ part plays the role of $(X-\mu)^2$.
o The $\sum_{i=1}^{n} \_\_\_ \,/\, (n-1)$ part plays the role of E[ ].
Remember in the discrete random variable case,
we sum over ALL possible values of X to find E[ ].
For the sample, we are going to sum over all
values observed. Since the “expected value” is
what we would expect on average to happen
(remember that f(x) is like a weight), we divide by
n-1 in $S^2$ to find the average squared deviation.
The reason for dividing by n-1 instead of n will be
discussed in Section 9.3.
• I usually will put a problem on an exam which asks you
to compute this quantity by hand.
Side note: All or most observed values should be within 2 to 3
standard deviations of the mean:
8.14
$\bar{X} \pm 2S$ or $\bar{X} \pm 3S$
Example: Sample mean and variance GPA
(sample_mean_var_GPA.xls)
Below is an actual sample of GPAs where the PDF is
defined on p. 8.5. Thus, x1 = 1.656, x2 = 1.417, …, x20 =
3.375
  i    GPA    (xi-xbar)^2
  1   1.656     1.733
  2   1.417     2.420
  3   2.810     0.026
  4   3.328     0.126
  5   3.745     0.597
  6   3.325     0.124
  7   3.338     0.134
  8   2.899     0.005
  9   3.549     0.332
 10   3.426     0.205
 11   2.726     0.061
 12   2.186     0.618
 13   3.044     0.005
 14   3.385     0.170
 15   3.678     0.498
 16   2.721     0.063
 17   3.351     0.143
 18   3.069     0.009
 19   2.424     0.301
 20   3.375     0.162

Simple calculations
Sample size       20
Sample mean       2.973
Sample variance   0.4070
Sample s.d.       0.6380

Functions in Excel
Sample mean       2.973
Sample variance   0.4070
Sample s.d.       0.6380

Rule of thumb   lower   upper
2 s.d.          1.697   4.249
3 s.d.          1.059   4.886
8.15
$$\bar{x} = \frac{1.656 + 1.417 + \cdots + 3.375}{20} = 2.973$$
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{(1.656-2.973)^2 + \cdots + (3.375-2.973)^2}{20-1} = 0.4070$$
$$s = +\sqrt{s^2} = +\sqrt{0.4070} = 0.6380$$
Rule of thumb with 2 standard deviations:
$$\bar{x} - 2s = 2.973 - 2 \times 0.6380 = 1.697$$
$$\bar{x} + 2s = 2.973 + 2 \times 0.6380 = 4.249$$
Examine how the above range for GPAs corresponds to
the plot on p. 8.5. Of course, GPAs cannot be greater
than 4. Also, notice how 18 of 20 observations fall within
$\bar{x} \pm 2s$ and 20 of 20 observations fall within $\bar{x} \pm 3s$.
Below is a screen capture of the formulas used in Excel
to calculate these quantities. Note that Chris Malone’s
Excel Instructions website contains help on some of
these functions. For example,
http://www.statsteacher.com/excel/analyses/mean.html
contains information about the AVERAGE() function.
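For readers without Excel, the same quantities can be reproduced with Python's standard `statistics` module. This is only a sketch and not part of the original spreadsheet; the helper `count_within` is a made-up name. Because the GPAs listed in the notes are rounded to three decimals, the results may differ from the spreadsheet values in the last digit.

```python
import statistics

# The 20 observed GPAs from the example, as listed in the notes
gpas = [1.656, 1.417, 2.810, 3.328, 3.745, 3.325, 3.338, 2.899, 3.549, 3.426,
        2.726, 2.186, 3.044, 3.385, 3.678, 2.721, 3.351, 3.069, 2.424, 3.375]

n = len(gpas)
x_bar = statistics.mean(gpas)
s2 = statistics.variance(gpas)  # sample variance: divides by n - 1
s = statistics.stdev(gpas)      # positive square root of s2


def count_within(k):
    """Number of observations inside x_bar - k*s to x_bar + k*s."""
    return sum(x_bar - k * s <= x <= x_bar + k * s for x in gpas)


print(n, round(x_bar, 3), round(s2, 3), round(s, 3))
print(count_within(2), count_within(3))
```

The second line of output gives the number of observations inside the 2 and 3 standard deviation ranges, which can be compared with the 18 of 20 and 20 of 20 noted above.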
8.16
8.17
8.3: Data Displays and Graphical Methods
This section describes ways to view the observed values
from the sample graphically. This helps to understand
the distribution, mean, variance, and other summary
measures calculated based on the observed values.
Histograms
A plot of the frequency distribution of the observed
values in the sample.
Example: Sample mean and variance GPA
(sample_mean_var_GPA.xls)
Classes: 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4

Bin    Frequency
0      0
0.5    0
1      0
1.5    1
2      1
2.5    2
3      4
3.5    9
4      3
More   0

Be very careful with interpreting these classes. For
example, “Classes = 4” means >3.5 and ≤4. Also,
“Classes = 3.5” means >3 and ≤3.5.
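The class convention described above (each class includes its upper limit but not its lower limit) can be sketched in code as follows. This is an illustration only, not how Excel computes the table internally; the variable names are made up.

```python
import bisect

gpas = [1.656, 1.417, 2.810, 3.328, 3.745, 3.325, 3.338, 2.899, 3.549, 3.426,
        2.726, 2.186, 3.044, 3.385, 3.678, 2.721, 3.351, 3.069, 2.424, 3.375]

upper_limits = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]
labels = [str(u) for u in upper_limits] + ["More"]
counts = [0] * len(labels)  # the last slot counts values above 4 ("More")

for x in gpas:
    # bisect_left returns the index of the first upper limit that is >= x,
    # so a value equal to a limit is counted in that limit's class.
    counts[bisect.bisect_left(upper_limits, x)] += 1

for label, count in zip(labels, counts):
    print(label, count)
```

The printed frequencies can be compared line by line with the Bin/Frequency table.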
8.18
[Figure: Histogram for GPA sample. Frequency (0 to 10) is plotted against the GPA classes 0, 0.5, 1, …, 4, and More.]
Also, be very careful with lining up these bars with the
classes! See my red arrows above for help.
Compare this to the GPA PDF on p. 8.5. Are their
shapes similar?
Box plots
There are a few different ways to draw these types of
plots. Below is an example set of box plots for four
different samples (not necessarily from the same
population) to be used for our definition of a box plot.
8.19
Notes:
• The sample 25th percentile is the value such that
approximately 25 percent of the observed values are
below it and 75 percent are above it. The value is often
denoted as Q1. See p. 204 for more information on its
exact calculation.
• The sample median (50th percentile) and sample 75th
percentile = Q3 are similarly defined.
• The “box” in the middle of each box plot shows the
range of the middle 50 percent of the observed values.
• The sample mean is also plotted. Notice that it does
not necessarily need to equal the sample median.
• Lines to the left of the box and to the right of the box
are drawn out to values as shown above. Most
observed values are expected to fall within this range.
This serves a similar purpose to the rule of thumb for
the number of standard deviations within which the
data lie from the mean.
8.20
• Observed values outside of the horizontal lines are often
called outliers since they are outside of the range we
would expect them to fall within.
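The quartiles that define the box can be computed as sketched below for the GPA sample. One caution: `statistics.quantiles` uses its own interpolation rule (the default "exclusive" method), which may not match the exact calculation on p. 204 of the textbook, so treat the values as approximate. The limits for the lines of the box plot are not computed here because their definition is only given in the figure.

```python
import statistics

gpas = [1.656, 1.417, 2.810, 3.328, 3.745, 3.325, 3.338, 2.899, 3.549, 3.426,
        2.726, 2.186, 3.044, 3.385, 3.678, 2.721, 3.351, 3.069, 2.424, 3.375]

# n=4 asks for the three cut points that split the data into quarters
q1, median, q3 = statistics.quantiles(gpas, n=4)
iqr = q3 - q1  # width of the "box" (middle 50 percent of the data)

print(round(q1, 3), round(median, 3), round(q3, 3), round(iqr, 3))
print(round(statistics.mean(gpas), 3))  # need not equal the median
```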
Dot plots
Below is an example dot plot. Each plot symbol denotes
an observed value. These values are “jittered” in the
vertical direction to help avoid overlapping.
Unfortunately, Excel does not have an easy way to
create these plots. The box_dot_plot.xls file serves as a
“template” for drawing box and dot plots. The file can
create box and dot plots for up to four different samples
with sample sizes less than 500. This file was created by
Chris Malone from Winona State University and me.
8.21
Example: Sample mean and variance GPA
(sample_mean_var_GPA.xls, box_dot_plot.xls, and
data_summary.xls)
I copied and pasted the observed values from
sample_mean_var_GPA.xls into the DATA sheet of
box_dot_plot.xls. I deleted the data which was in
columns 2-4.
The plots are on the box and dot plots sheet.
8.22
Notes:
• Notice how the two observed values outside of the
$\bar{X} \pm 2S$ range show up as outliers here!
• Compare the dot plot to the histogram on p. 8.18.
• Be careful about the scale of the x-axis on both plots.
Typically, you will want to make them exactly the same
so that you can compare the two plots!
8.23
Below are the results from using data_summary.xls.
Note that the box plot drawn here is done a little
differently. No outliers are shown since the dot plot
gives that information. The lines on both sides of the
box plot are only drawn out to the limits shown on p.
8.19 or to the smallest or largest value within the limits.
Notice the right side line is drawn only out to the
maximum value. The left side line is drawn out to the full
limits.
8.24
Example: Investments (Data only contained in invest.xls)
Below are box and dot plots for monthly investment
returns from May 1991 to May 2001 for three different
stock indexes. For example, a value of 0.10 means a
10% profit was made for a particular month. The
different sizes of the outliers do not mean anything (error
in file).
• Which investment has more variability?
• Which has the larger mean return?
8.25
8.4-8.5: Sampling Distributions and the Sampling
Distribution of Means
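Suppose X1, X2, …, Xn is a random sample from a population
with PDF possibly unknown. Also, suppose E(Xi) = $\mu$ and
Var(Xi) = $\sigma^2$ for i = 1, …, n. What are $E(\bar{X})$ and $Var(\bar{X})$?
$$E(\bar{X}) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n} E\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu$$
$$Var(\bar{X}) = Var\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
(The step from $Var\left(\sum X_i\right)$ to $\sum Var(X_i)$ uses the independence of the Xi.)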
Notes:
• If it is not clear why the above statements are true,
please review Section 4.3 of the notes.
• Let’s examine $E(\bar{X}) = \mu$ more closely.
o Suppose a random sample was taken of size n
and $\bar{X}$ was found. Most likely, it will not be exactly
equal to $\mu$, but you would expect it to be somewhat
close.
o Suppose another random sample was taken of
size n and $\bar{X}$ was found. Most likely, it will not be
exactly equal to $\mu$ or the past $\bar{X}$, but you would
expect it to be somewhat close.
8.26
o Suppose this process of taking random samples of
size n and calculating $\bar{X}$ was repeated an infinite
number of times. If you were to find the average
value of ALL of these $\bar{X}$’s, it would be $\mu$.
o We will examine a computer simulation of this
process shortly.
• Let’s examine $Var(\bar{X}) = \sigma^2/n$ more closely.
o The larger the sample size, the SMALLER $Var(\bar{X})$
becomes. Why?
o We will examine a computer simulation of why
$Var(\bar{X}) = \sigma^2/n$ shortly.
• Often, you will see the use of $\mu_{\bar{X}}$ to mean $E(\bar{X})$ and $\sigma^2_{\bar{X}}$
to mean $Var(\bar{X})$. This corresponds to the notation
introduced in Section 3.4 when we had multiple random
variables.
Central Limit Theorem
So far, we have been concerned about the PDF for a
random variable X. Remember the example shown in
Chapter 6 (p. 6.41) of what can happen if the PDF
assumption is wrong. We have examined ways to help
determine whether the assumed PDF is reasonable or to fix the
assumption (e.g., look at a histogram, change the parameter
values of the PDF).
8.27
There are other ways to get around making an
assumption about the PDF for X. Instead, we can take a
sample from a population (without knowing the PDF) and
calculate a statistic. By the Central Limit Theorem
(CLT), we can use a normal PDF approximation to the
statistic’s PDF – NO MATTER WHAT THE PDF WAS
FOR THE POPULATION!!! The CLT is one of the most
important concepts to take from this course!!!
Below is the CLT expressed in terms of one statistic, the
sample mean.
Central Limit Theorem – If $\bar{X}$ is the mean of a random
sample of size n taken from a population with mean $\mu$ and
finite variance $\sigma^2$, then the limiting form of the PDF for
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$, is the standard normal PDF. Equivalently, one can
say that $\bar{X}$ has an approximate normal PDF with mean $\mu$ and
variance $\sigma^2/n$ for a large sample size.
Questions:
1) What does this CLT mean?
8.28
• No matter what the PDF is for the Xi (i=1,…,n), $\bar{X}$ has
approximately a normal PDF (with mean $\mu$ and variance
$\sigma^2/n$) provided the sample size, n, is sufficiently large.
• Probabilities involving $\bar{X}$ can be found with the normal
PDF in a similar way as probabilities were found for
one random variable, X, in Chapter 6.
• If the sample size, n, is not sufficiently large,
$\bar{X}$ still has a normal PDF if we can assume each Xi has the
same normal PDF.
2) How large of a sample size is needed for the CLT to
work?
• This is dependent on the PDF for the Xi (i=1,…,n).
• As a general rule of thumb, n ≥ 30 should work for most
PDFs for the Xi (i=1,…,n). However, smaller sample
sizes may work as well.
3) Why is this formula for Z used?
• In Chapter 6, we showed that a normal random
variable, X, with E(X) = $\mu$ and Var(X) = $\sigma^2$ could be
transformed to a STANDARD normal random variable,
Z, using $Z = (X-\mu)/\sigma$.
• Remember that $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$. Thus,
using the same type of transformation as described in
the last bullet, we obtain $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$.
4) Why does $\bar{X}$ have a PDF?
• $\bar{X}$ is a random variable since it is the average of other
random variables.
8.29
• Note that $\bar{X}$ varies from sample to sample to sample.
One can quantify how these $\bar{X}$'s are "distributed"
(possible values they can take on) using a PDF!
5) The PDF of a statistic is often called a sampling
distribution since it comes about through taking a
sample from a population.
6) There is still one problem with the CLT – you need to
know $\mu$ and $\sigma^2$. How to get around this problem will be
discussed in future sections and chapters.
7) Many, many other statistics can have their PDF
approximated by a CLT.
• Let Yi = 0 or 1 with E(Yi) = p for i=1,…,n where the Yi
are independent (thus, the Yi are Bernoulli random
variables). Then $\bar{Y} = \sum_{i=1}^{n} Y_i / n = \hat{P}$ is the sample
proportion of 1’s or successes. Note that $E(\bar{Y}) = p$ and
$Var(\bar{Y}) = p(1-p)/n$. Thus,
$$Z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}}$$
can be approximated by a standard normal PDF. This is the
same result as shown in Section 6.5 (divide the
expression in that section by n in the numerator and
denominator).
• Theorem 8.3 – to be discussed later.
Example: UNL GPA 1,000 samples (CLT_GPA_ex.xls)
The following is done in the Excel file:
8.30
• 1,000 samples of size 20 (assume the CLT holds) are
taken from the population with PDF shown below.
[Figure: GPA population probability distribution example; the same PDF f(x) of x (GPA) shown on p. 8.5.]
Each row in the SAMPLES sheet represents a sample
of size 20. There are 1,000 rows. Remember that $\mu$ =
2.8571 and $\sigma^2$ = 0.4082.
• The sample mean is calculated for each sample so
that there are 1,000 $\bar{X}$’s.
• A histogram is constructed of the 1,000 sample means
to determine if the PDF of the $\bar{X}$’s is approximately normal
(as the CLT says it should be).
The histogram below is found in the HISTOGRAM
FOR X_BAR sheet:
8.31
[Figure: Simulated Distribution of X_bar. Histogram of Frequency (0 to 160) by Classes.]
The histogram below comes from using
data_summary.xls with the 1,000 X ’s.
8.32
A normal PDF with mean 2.856 and variance 0.0211 is
plotted on the histogram.
• The mean and variance of the 1,000 $\bar{X}$’s are given to
see if they are approximately equal to $E(\bar{X}) = \mu$ and
$Var(\bar{X}) = \sigma^2/n$. Note that if ALL possible
samples of size 20 were taken, the mean and variance
of all of them would be $E(\bar{X})$ and $Var(\bar{X})$. In this case,
we are just taking 1,000 samples. For this particular
example, it was shown on p. 8.5 that $\mu$ = 2.8571 and
$\sigma^2$ = 0.4082. Thus, $E(\bar{X})$ = 2.8571 and $Var(\bar{X}) = \sigma^2/n$ =
0.4082/20 = 0.02041. Below are the mean and
variance of the 1,000 $\bar{X}$’s. These are calculated on
the E(X_BAR) AND VAR(X_BAR) sheet.

Mean                 2.8560
Variance             0.0211
Standard Deviation   0.1453
Min                  2.3472
Max                  3.3083
Number of means      1000

• The probability of $\bar{X}$ being between 3 and 4 is
approximated from the 1,000 $\bar{X}$’s. Thus, the
proportion of $\bar{X}$’s between 3 and 4 is found. The
proportion is 0.165 (see the use of
8.33
=(COUNTIF(U17:U1016,"<4")-COUNTIF(U17:U1016,"<3"))/1000 in the SAMPLES
sheet)
• The CLT says the probability that $\bar{X}$ is between 3 and
4 can be approximated by a normal distribution with
mean $\mu$ and variance $\sigma^2/n$.
P(3 < $\bar{X}$ < 4) = P($\bar{X}$ < 4) – P($\bar{X}$ < 3) can be found with the
Excel function:
NORMDIST(4,2.8571,SQRT(0.4082/20),TRUE)-NORMDIST(3,2.8571,SQRT(0.4082/20),TRUE)
The resulting probability is 0.1586.
Equivalently, one can find this probability through
$$\begin{aligned}
P(3 < \bar{X} < 4) &= P\left(\frac{3 - 2.8571}{\sqrt{0.4082/20}} < \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} < \frac{4 - 2.8571}{\sqrt{0.4082/20}}\right) \\
&= P(1.0003 < Z < 7.9999) \\
&= P(Z < 7.9999) - P(Z < 1.0003) \\
&= 1 - 0.8414 \\
&= 0.1586
\end{aligned}$$
Transforming the probability to be in terms of a
standard normal random variable, Z, always used to be
done when using standard normal
distribution tables. This is no longer needed when
using software such as Excel.
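The CLT calculation above can also be sketched outside of Excel. The block below uses `statistics.NormalDist` from the Python standard library as a stand-in for NORMDIST; the variable names are made up for illustration.

```python
import math
from statistics import NormalDist

mu, sigma2, n = 2.8571, 0.4082, 20

# CLT approximation: X_bar is approximately normal with mean mu
# and variance sigma2 / n
x_bar_dist = NormalDist(mu, math.sqrt(sigma2 / n))

# P(3 < X_bar < 4) = P(X_bar < 4) - P(X_bar < 3)
p = x_bar_dist.cdf(4) - x_bar_dist.cdf(3)

# The same bounds expressed on the standard normal (Z) scale
z_low = (3 - mu) / x_bar_dist.stdev
z_high = (4 - mu) / x_bar_dist.stdev

print(round(p, 4), round(z_low, 4), round(z_high, 4))
```

No transformation to Z is needed to get the probability; the Z values are computed only to compare with the hand calculation.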
Notice how close the probability resulting from the CLT
is to the probability resulting from the simulated PDF of
$\bar{X}$.
8.34
Thus, the CLT allows us to calculate these
probabilities without taking 1,000 samples of size 20,
finding the mean, finding the variance, … . Of course,
taking 1,000 samples of size 20 is not feasible for the
vast majority of real-life applications!
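The simulation in CLT_GPA_ex.xls can be imitated with a short script. This is a sketch under stated assumptions rather than a copy of the spreadsheet: it draws its own random numbers (with an arbitrary seed), so the results will be close to, but not identical to, the Excel values such as 2.8560, 0.0211, and 0.165.

```python
import random
import statistics

rng = random.Random(8)  # arbitrary seed, chosen only for reproducibility
n, number_of_samples = 20, 1000


def one_sample_mean():
    """Mean of one random sample of n GPAs, where X = 4Y and Y ~ Beta(5, 2)."""
    sample = [4 * rng.betavariate(5, 2) for _ in range(n)]
    return sum(sample) / n


means = [one_sample_mean() for _ in range(number_of_samples)]

mean_of_means = statistics.mean(means)     # compare with mu = 2.8571
var_of_means = statistics.variance(means)  # compare with sigma^2/n = 0.02041
prop_3_to_4 = sum(3 < m < 4 for m in means) / number_of_samples

print(round(mean_of_means, 4), round(var_of_means, 4), prop_3_to_4)
```

Increasing `number_of_samples` should bring the simulated mean and variance closer to $\mu$ and $\sigma^2/n$, while the proportion between 3 and 4 can be compared with the CLT approximation of 0.1586.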
Since we thoroughly discussed finding probabilities with the
normal PDF in Chapter 6, many of the same techniques for
finding these probabilities apply here. Remember the main
advantage of using $\bar{X}$ is that you do not need to know
the PDF for X!
Example: Healthy Choice (health_choice.xls)
Healthy Choice claims that it fills boxes on average with
24 oz. of cereal and the standard deviation is 2 oz. of
cereal. Suppose an FDA official wants to find out if boxes
of Healthy Choice cereal have the advertised weight of
24 oz. The FDA official randomly samples 36 boxes of
cereal.
Suppose Healthy Choice is making cereal with $\mu$ = 24 oz.
of cereal and $\sigma$ = 2 oz. of cereal.
1) What is the approximate probability the sample mean
weight is greater than 23 oz.?
8.35
Notice that nothing is said about the PDF for each box
here!!!
[Figure: Normal PDF for mean = 24 and s.d. = 0.3333. f(X_bar) is plotted against X_bar for X_bar from 22 to 26.]
Note that $\mu$ = 24 and $\sigma/\sqrt{n} = 2/\sqrt{36}$ = 0.3333.
Find P($\bar{X}$ > 23) = 1 – P($\bar{X}$ < 23). The Excel function is
1-NORMDIST(23,24,0.3333,TRUE)
The resulting probability is 0.9987.
2) What is the approximate probability the sample mean
weight of the boxes is between 23 and 25 oz.?
Find P(23 < $\bar{X}$ < 25). The Excel function is
NORMDIST(25,24,0.3333,TRUE)-NORMDIST(23,24,0.3333,TRUE)
The resulting probability is 0.9973.
8.36
3) The company will be fined if the sample mean weight
of the boxes is not within 1 oz. of the advertised true
mean. What is the approximate probability the
company will receive a fine?
Find P($\bar{X}$ < 23 or $\bar{X}$ > 25). This is 1 – P(23 < $\bar{X}$ < 25) =
1 – 0.9973 = 0.0027.
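The three Healthy Choice probabilities can be reproduced with the sketch below, which again uses `statistics.NormalDist` in place of NORMDIST. The variable names are made up for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 24, 2, 36

# By the CLT, X_bar is approximately normal with mean mu and
# standard deviation sigma / sqrt(n) = 2/6
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

p_greater_23 = 1 - x_bar_dist.cdf(23)                 # question 1
p_between = x_bar_dist.cdf(25) - x_bar_dist.cdf(23)   # question 2
p_fine = 1 - p_between                                # question 3

print(round(p_greater_23, 4), round(p_between, 4), round(p_fine, 4))
```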
The following theorem is another way to express the CLT,
but for a different statistic.
Theorem 8.3: If independent samples of size n1 and n2 are
drawn at random from two populations, discrete or
continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$,
respectively, then the sampling distribution of the difference
between sample means, $\bar{X}_1 - \bar{X}_2$, is approximately a normal
PDF with mean and variance given by:
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 \quad \text{and} \quad Var(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is approximately a standard normal random variable for large n1 and n2. Equivalently,
one can say that $\bar{X}_1 - \bar{X}_2$ has an approximate normal PDF with
mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ for large samples.

8.37
Notes:
• n1 ≥ 30 and n2 ≥ 30 is usually a large enough sample so
that the CLT holds.
• It is often of interest to compare two population means to
see which one is larger.
• This result will be used a lot in Sections 9.8 and 10.8.
• Probabilities can be found using the normal PDF here
in a similar manner as done when there was only one
sample mean. Please see the textbook for examples if
you are not sure exactly how to do this.
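As a rough sketch of how Theorem 8.3 might be used, the block below builds the approximate normal PDF for $\bar{X}_1 - \bar{X}_2$ and finds one probability. The function name `diff_of_means_dist` and all of the numbers are hypothetical and chosen only for illustration; they do not come from the textbook or the notes.

```python
import math
from statistics import NormalDist


def diff_of_means_dist(mu1, mu2, var1, var2, n1, n2):
    """Approximate normal PDF for X1_bar - X2_bar (Theorem 8.3)."""
    mean = mu1 - mu2
    variance = var1 / n1 + var2 / n2
    return NormalDist(mean, math.sqrt(variance))


# Hypothetical populations: means 24 and 23, variances 4 and 9,
# with samples of size 36 taken independently from each.
d = diff_of_means_dist(mu1=24, mu2=23, var1=4, var2=9, n1=36, n2=36)

# Approximate probability that the first sample mean exceeds the second
p_first_larger = 1 - d.cdf(0)
print(round(d.mean, 4), round(d.variance, 4), round(p_first_larger, 4))
```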
Question: Why isn’t there a covariance term in
$$Var(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\,?$$
8.38
8.6: Sampling Distribution of S2
The chi-squared PDF is another PDF which is often used
in statistics. Below is its definition from Section 6.8:
Chi-squared PDF – The continuous random variable X has a
chi-squared PDF, with $\nu$ degrees of freedom, if its PDF is
given by
$$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $\nu$ is a positive integer.
Mean and variance of chi-squared random variable:
E(X) = $\mu$ = $\nu$ and Var(X) = $\sigma^2$ = 2$\nu$
Notes:
• The chi-squared PDF is a gamma PDF with $\alpha = \nu/2$ and
$\beta = 2$.
• $\nu$ is a parameter. Different shapes of the PDF result
from different values of $\nu$. The reason why $\nu$ is called
the “degrees of freedom” will be explained shortly.
• Chi-squared PDF could equivalently be expressed as
$\chi^2$ PDF.
• We often want to find quantiles or percentiles from this
PDF. For example, the c value which results from
P(X<c) = 0.95 is called the 0.95 quantile and the 95th
percentile for the PDF. Thus, the area to the LEFT of c
is 0.95 underneath the curve.
8.39
Symbolically, this c value
is typically denoted as $\chi^2_{0.05}$, using the notation from our
book. Be very careful with this notation since the area
to the RIGHT is given in the subscript.
Example: Chi-squared PDF (chi_square_dist.xls)
chi_square_dist.xls is an interactive file which allows you
to see the PDF for different values of the degrees of
freedom.
To find the quantile, the Excel function is CHIINV(area to
the right, degrees of freedom).
8.40
To find probabilities, the Excel function is CHIDIST(x,
degrees of freedom) which gives P(X>x) = ___. Notice
this probability is the opposite of what most of the
other functions in Excel discussed so far give for
continuous PDFs. For example, GAMMADIST(x, $\nu$/2, 2,
TRUE) gives P(X<x).
To find f(x), there is no specific function. Instead, the
relationship with the gamma PDF can be used and
GAMMADIST(x, $\nu$/2, 2, FALSE) will find f(x) for a
specified $\nu$.
Just as a reminder, remember how one would find these
probabilities using regular integration! For example,
$$P(X > 15.98717) = \int_{15.99}^{\infty} \frac{1}{2^{10/2}\,\Gamma(10/2)}\, x^{10/2-1} e^{-x/2}\, dx = 0.1.$$
In Maple,
> assume(x>0);
> assume(nu>0,nu::integer);
> f(x):=1/(2^(nu/2)*GAMMA(nu/2)) * x^(nu/2-1) * exp(-x/2);
[Maple echoes the definition of f(x~) in typeset form.]
> int(eval(f(x),nu=10),x = 15.98717..infinity);
8.41
.1000002634
> 1-stats[statevalf, cdf, chisquare[10]](15.98717);
.1000002634
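The same integral can be approximated numerically without Maple. The sketch below is one possible approach, not the method used in the notes: it codes the chi-squared PDF directly and integrates it with Simpson's rule. The function names are made up, and the integration is only intended for $\nu$ > 2, where the PDF is smooth and bounded near 0.

```python
import math


def chi2_pdf(x, nu):
    """Chi-squared PDF with nu degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    # Work on the log scale to avoid overflow for large nu
    log_f = ((nu / 2 - 1) * math.log(x) - x / 2
             - (nu / 2) * math.log(2) - math.lgamma(nu / 2))
    return math.exp(log_f)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def chi2_upper_tail(x, nu):
    """P(X > x), found as 1 minus the area under the PDF from 0 to x."""
    return 1 - simpson(lambda t: chi2_pdf(t, nu), 0.0, x)


p = chi2_upper_tail(15.98717, nu=10)
print(round(p, 6))
```

This plays the role of CHIDIST(15.98717, 10) in Excel, which also returns the area to the right.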
Table A.5 on p. 674-5 gives some of the probabilities for a
specified $\nu$. Below is part of the table for $\nu$ = 10, where each
value has the stated area to its right:

Area to the right   Value
0.995                2.156
0.99                 2.558
0.98                 3.059
0.975                3.247
0.95                 3.940
0.9                  4.865
0.8                  6.179
0.75                 6.737
0.7                  7.267
0.5                  9.342
0.3                 11.781
0.25                12.549
0.2                 13.442
0.1                 15.987
0.05                18.307
0.025               20.483
0.02                21.161
0.01                23.209
0.005               25.188
0.001               29.588
You are not responsible for knowing how to use this
table.
In addition to obtaining the PDF for $\bar{X}$, we may want the PDF
for $S^2$, the sample variance. In order to do this, we do need
to make the assumption that X1, X2, …, Xn are a random
sample from a normal PDF with E(X) = $\mu$ and Var(X) = $\sigma^2$.
Below is the result:
8.42
Theorem 8.4: If $S^2$ is the variance of a random sample of
size n taken from a normal population having the variance
$\sigma^2$, then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(n-1)\,\dfrac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i-\bar{X})^2}{\sigma^2}$$
has a chi-squared PDF with $\nu$ = n-1 degrees of freedom.
Pf:
Unfortunately, there are important theorems which
we skipped in Chapter 7 which are needed here.
One of the theorems says that if X1, X2, …, Xn are
independent normal random variables with the same mean and
variance, then Y = X1+X2+…+Xn is also a normal
random variable with E(Y) = $\sum_{i=1}^{n} E(X_i) = n\mu$ and
Var(Y) = $\sum_{i=1}^{n} Var(X_i) = n\sigma^2$ (we already had seen the
E( ) and Var( ) part from Section 4.3). Also, suppose
that $Z_i = (X_i-\mu)/\sigma$ for i=1,…,n. Then Zi has a standard
normal PDF. And, Z1 + Z2 + … + Zn has a
normal PDF with mean 0 and variance n.
A second important theorem showed that $Z_i^2$ has a
chi-squared PDF with $\nu$=1 degree of freedom. Thus,
the square of a standard normal random variable is
a chi-squared random variable with 1 degree of
freedom!
8.43
And, the sum $Z_1^2 + Z_2^2 + \cdots + Z_n^2$ has a chi-squared PDF with $\nu$=n degrees of freedom.
Using the above theorems, we can note the
following:
$$\sum_{i=1}^{n} Z_i^2 = \frac{\sum_{i=1}^{n}(X_i-\mu)^2}{\sigma^2}$$
has a chi-squared PDF with $\nu$=n.
Notice that the numerator, $\sum_{i=1}^{n}(X_i-\mu)^2$, can be rewritten
as $\sum_{i=1}^{n}\left[(X_i-\bar{X})+(\bar{X}-\mu)\right]^2$ since $\bar{X}$ is just being added
and subtracted. Multiplying this out results in
$$\begin{aligned}
\sum_{i=1}^{n}\left[(X_i-\bar{X})+(\bar{X}-\mu)\right]^2
&= \sum_{i=1}^{n}(X_i-\bar{X})^2 + \sum_{i=1}^{n}(\bar{X}-\mu)^2 + 2\sum_{i=1}^{n}(X_i-\bar{X})(\bar{X}-\mu) \\
&= \sum_{i=1}^{n}(X_i-\bar{X})^2 + n(\bar{X}-\mu)^2 + 2(\bar{X}-\mu)\sum_{i=1}^{n}(X_i-\bar{X}) \\
&= \sum_{i=1}^{n}(X_i-\bar{X})^2 + n(\bar{X}-\mu)^2 + 2(\bar{X}-\mu)\cdot 0
\end{aligned}$$
since $\sum_{i=1}^{n}(X_i-\bar{X}) = \sum_{i=1}^{n}X_i - n\bar{X} = \sum_{i=1}^{n}X_i - \sum_{i=1}^{n}X_i = 0$.
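8.44
Then $\sum_{i=1}^{n} (X_i - \bar{X} + \bar{X} - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$.
Also,
$$\begin{aligned}
\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} \\
&= \frac{(n-1)}{\sigma^2} \cdot \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} \\
&= \frac{(n-1)S^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}.
\end{aligned}$$
Thus,
$$\frac{(n-1)S^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}$$
has a chi-squared PDF with $\nu = n$ degrees of freedom. And, $\frac{(n-1)S^2}{\sigma^2}$ has a
chi-squared PDF with $\nu = n-1$ since $\frac{(\bar{X} - \mu)^2}{\sigma^2/n}$ has a
chi-squared PDF with $\nu = 1$ (each $X_i$ has a normal PDF with
mean $\mu$ and variance $\sigma^2$, so $\bar{X}$ has a normal PDF with mean $\mu$ and variance $\sigma^2/n$, and $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is standard normal).
This last step also relies on $\bar{X}$ and $S^2$ being independent for a normal random sample, which is not shown here.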
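The result in Theorem 8.4 can be checked informally by simulation. Below is a minimal sketch in Python (not part of the original Excel or Maple work for these notes). The function name and the values n=5, mean 10, standard deviation 2 are made up for illustration. It checks the sum-of-squares split used in the proof and compares the simulated mean and variance of (n-1)S^2/sigma^2 to the chi-squared values nu and 2*nu, which is a standard fact about the chi-squared PDF rather than something derived above.

```python
import random


def simulate_variance_statistic(n=5, mu=10.0, sigma=2.0, reps=40000, seed=1):
    """Simulate (n-1)S^2/sigma^2 for normal samples and check the proof's identity.

    All default values are illustrative assumptions, not values from the notes.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    worst_gap = 0.0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        ss_about_mu = sum((xi - mu) ** 2 for xi in x)
        ss_about_xbar = sum((xi - xbar) ** 2 for xi in x)
        # Identity from the proof:
        # sum (Xi - mu)^2 = sum (Xi - Xbar)^2 + n (Xbar - mu)^2
        gap = abs(ss_about_mu - (ss_about_xbar + n * (xbar - mu) ** 2))
        worst_gap = max(worst_gap, gap)
        # (n-1) S^2 / sigma^2 is the same as sum (Xi - Xbar)^2 / sigma^2
        values.append(ss_about_xbar / sigma ** 2)
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return worst_gap, mean, var


if __name__ == "__main__":
    gap, mean, var = simulate_variance_statistic()
    print("largest gap in the identity:", gap)
    print("simulated mean:", mean, "(chi-squared with nu = 4 has mean 4)")
    print("simulated variance:", var, "(chi-squared with nu = 4 has variance 8)")
```

The simulated mean should land near n-1 = 4, not n = 5, which is the one degree of freedom given up by using the sample mean in place of the population mean.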
What are degrees of freedom?
Suppose the sum of three numbers is 6. In order to
know all of the three numbers, you only need to know 2
of them and the sum. For example,
 2005 Christopher R. Bilder
8.45
X1 = 1 (pick)
X2 = 2 (pick)
X3 = 3 (cannot vary)
Sum = 6
The degrees of freedom are 2 for the sum. Similarly, the
degrees of freedom are n-1 for $\bar{X}$ and $S^2$: the deviations $X_i - \bar{X}$ used in $S^2$ must sum to 0, so only n-1 of them can vary. The chi-squared PDF and other PDFs build these values into
their PDF formulas as parameters. Typically, the
degrees of freedom will always be known since we know
the sample size. Thus, these PDFs can be easier to
work with.
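The counting argument above can be written out in a few lines. This is only a sketch in Python with made-up helper names. It shows that the last number is forced once the others and the sum are known, and that the last deviation from the sample mean is forced by the other n-1 deviations.

```python
def last_value_from_sum(known_values, total):
    """Return the one remaining value forced by the known values and the sum."""
    return total - sum(known_values)


def deviations(sample):
    """Return the deviations Xi - Xbar for a list of numbers."""
    xbar = sum(sample) / len(sample)
    return [x - xbar for x in sample]


if __name__ == "__main__":
    # X1 = 1 and X2 = 2 are picked, so X3 is forced by the sum of 6.
    print("X3 =", last_value_from_sum([1, 2], 6))
    d = deviations([4.0, 7.0, 1.0, 8.0])  # made-up sample
    # The deviations sum to 0, so the last one is minus the sum of the others.
    print("deviations:", d, "sum:", sum(d))
```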
 2005 Christopher R. Bilder
8.46
8.7: t-distribution
W.S. Gosset story
Theorem 8.5: Let Z be a standard normal random variable
and V a chi-squared random variable with $\nu$ degrees of
freedom. If Z and V are independent, then the PDF of the
random variable T, where
$$T = \frac{Z}{\sqrt{V/\nu}}$$
is given by
$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2} \quad \text{for } -\infty < t < \infty.$$
This is known as the t-distribution with $\nu$ degrees of freedom.
pf: See a book which would be used for STAT 463 or
STAT 872.
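To make the formula for h(t) concrete, here is a minimal sketch of it in Python. The function name t_pdf is my own choice. It uses math.lgamma (the log of the gamma function) instead of the gamma function itself so that large degrees of freedom do not overflow. That is an implementation choice and not part of the theorem.

```python
import math


def t_pdf(t, nu):
    """PDF h(t) of the t-distribution with nu degrees of freedom."""
    # log of Gamma((nu+1)/2) / (Gamma(nu/2) * sqrt(pi * nu))
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    # log of (1 + t^2/nu)^(-(nu+1)/2)
    log_kernel = -(nu + 1) / 2 * math.log1p(t * t / nu)
    return math.exp(log_const + log_kernel)


if __name__ == "__main__":
    for t in (-2.0, 0.0, 2.0):
        print("h(%.1f) with nu = 5:" % t, t_pdf(t, 5))
```

The PDF is symmetric about 0, so h(-2) and h(2) print the same value.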
Example: Compare t and standard normal PDF
(t_stand_norm.xls)
As you can see from the plots below, the t-distribution is
very similar to the standard normal PDF. The main
difference is that there is more area underneath the
“tails” (ends) of the t-distribution. As the degrees of
freedom, $\nu$, become larger, the difference between the
two PDFs becomes extremely small.
8.47
Often, the standard normal PDF can be used in place of the t-distribution when $\nu$ is not small. In fact, as $\nu$ goes to
infinity the t-distribution becomes the standard normal PDF!
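The comparison of the two PDFs can also be done numerically. The sketch below (Python, with function names of my own choosing) evaluates the t PDF minus the standard normal PDF at the center (t = 0) and in the tail (t = 3) for a few illustrative degrees of freedom. It is a rough stand-in for the plots, not a reproduction of t_stand_norm.xls.

```python
import math


def t_pdf(t, nu):
    """PDF of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def standard_normal_pdf(z):
    """PDF of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def pdf_gap(t, nu):
    """t PDF minus standard normal PDF at the same point."""
    return t_pdf(t, nu) - standard_normal_pdf(t)


if __name__ == "__main__":
    for nu in (1, 5, 30, 100, 1000):  # illustrative choices
        print("nu = %4d  gap at 0: %+.5f  gap at 3: %+.5f"
              % (nu, pdf_gap(0.0, nu), pdf_gap(3.0, nu)))
```

The gap at 3 is positive (the t-distribution is higher in the tails), the gap at 0 is negative (it is lower in the center), and both shrink as the degrees of freedom grow.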
Example: Finding probabilities from a t-distribution
(t_prob.xls)
 2005 Christopher R. Bilder
8.48
To find P(T>t), use the TDIST(t, ν, 1) function in Excel, where ν is the degrees of freedom.
For example, P(T>1.96) with $\nu = 5$ degrees of freedom is
0.0536.
Notes:
1) Note that TDIST(t, ν, 2) will provide 2P(T>t).
The reason why Excel has this option will be
discussed in Chapter 10.
2) Please be careful: the function finds
P(T>t), NOT P(T<t)! Again, the function finds
1-F(t), not F(t).
3) Excel will not accept a negative value of t!
Instead, you need to use the symmetry of the
PDF to find the probability. Thus, P(T<-1.96) =
P(T>1.96).
4) Below is a drawn in version of what has been
found above.
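The probabilities above can be cross-checked outside of Excel. Below is a minimal sketch in Python that finds P(T>t) by integrating the t PDF with Simpson's rule. The function names and the number of integration steps are my own assumptions, and this is not how Excel computes TDIST(). It is only meant to reproduce the same numbers for moderate values of t.

```python
import math


def t_pdf(t, nu):
    """PDF of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_upper_tail(t, nu, steps=2000):
    """P(T > t), computed as 0.5 minus the integral of the PDF from 0 to t."""
    if t == 0:
        return 0.5
    h = t / steps  # h is negative for negative t, which flips the integral's sign
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):  # Simpson's rule weights 4, 2, 4, ..., 4
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


if __name__ == "__main__":
    p = t_upper_tail(1.96, 5)
    print("P(T > 1.96), like TDIST(1.96, 5, 1):", p)
    print("2P(T > 1.96), like TDIST(1.96, 5, 2):", 2 * p)
    # Unlike TDIST(), this sketch accepts a negative t directly.
    print("P(T < -1.96) = 1 - P(T > -1.96):", 1 - t_upper_tail(-1.96, 5))
```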
To find t in P(T>t) = 0.05, the TINV(probability*2, ν)
function can be used. For example, to find t in P(T>t) =
0.0536 with $\nu = 5$, use TINV(0.0536*2, 5). BE VERY
CAREFUL! Notice that I needed to multiply the
probability value by 2. The reason why Excel has this
property will be discussed in Chapter 10.
8.49
Notes:
1) We can say “1.96 is the 0.9464 quantile from a t-distribution with 5 degrees of freedom.” Also, we
could say “1.96 is the 94.64th percentile from a t-distribution with 5 degrees of freedom.”
2) t in P(T>t) = $\alpha$ is often denoted by $t_{\alpha,\nu}$, for $\nu$
degrees of freedom and $\alpha$ as the area to the right
of it. Thus, $P(T > t_{0.0536,5}) = 0.0536$.
3) $t_{\alpha,\nu} = -t_{1-\alpha,\nu}$. Why?
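The quantile lookup and the notes above can be sketched in Python as well. The code below finds t in P(T>t) = alpha by bisection on a numerically integrated tail probability. The function names, the bracketing scheme, and the tolerance are my own assumptions. This is not the algorithm Excel uses for TINV(), and the fixed number of integration steps is only adequate for moderate values of t.

```python
import math


def t_pdf(t, nu):
    """PDF of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_upper_tail(t, nu, steps=2000):
    """P(T > t) via Simpson's rule on [0, t] and the symmetry of the PDF."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


def t_upper_quantile(alpha, nu, tol=1e-10):
    """Find t with P(T > t) = alpha, i.e. t_{alpha,nu} in the notes."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if alpha == 0.5:
        return 0.0
    if alpha > 0.5:  # use the symmetry in note 3
        return -t_upper_quantile(1.0 - alpha, nu, tol)
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > alpha:  # widen until the answer is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:  # bisection: the tail probability decreases in t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    t = t_upper_quantile(0.0536, 5)
    print("t with P(T > t) = 0.0536, like TINV(0.0536*2, 5):", t)
    print("quantile level 1 - alpha:", 1 - 0.0536)
    print("P(T > -t):", t_upper_tail(-t, 5))
```

Note that no multiplication by 2 is needed here because the function works directly with the area to the right of t.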
Examine how to work with the t-distribution through the
use of the spreadsheet below.
 2005 Christopher R. Bilder
8.50
Table A.4 on p.672-3 gives some of the probabilities that
one could calculate for some degrees of freedom. We
will not use the table in this class.
Just as a reminder, remember how one would find these
probabilities using regular integration! For example,
$$P(T > 1.96) = \int_{1.96}^{\infty} \frac{\Gamma[(5+1)/2]}{\Gamma(5/2)\sqrt{5\pi}} \left(1 + \frac{t^2}{5}\right)^{-(5+1)/2} dt \approx 0.0536.$$
In Maple,
> f(t):=GAMMA((nu+1)/2)/(GAMMA(nu/2)
*sqrt(Pi*nu)) * (1+t^2/nu)^(-(nu+1)/2);
 2005 Christopher R. Bilder
8.51
$$f(t) := \frac{\Gamma\left(\frac{1}{2}\nu + \frac{1}{2}\right) \left(1 + \frac{t^2}{\nu}\right)^{\left(-\frac{1}{2}\nu - \frac{1}{2}\right)}}{\Gamma\left(\frac{1}{2}\nu\right) \sqrt{\pi\nu}}$$
> int(eval(f(t),nu=5),t=1.96..infinity);
.05364397625
> 1-stats[statevalf, cdf,
studentst[5]](1.96);
.0536439763
The CLT says that
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
can be approximated by a
standard normal PDF for large n (sample size). Below are
some problems:
1) Typically, $\mu$ and $\sigma$ will not be known.
2) What if n is small?
Below is part of the solution to the problems:
Corollary: Let $X_1, X_2, \ldots, X_n$ be independent random
variables that are all normal with mean $\mu$ and standard
deviation $\sigma$. Let
$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \quad \text{and} \quad S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1}.$$
8.52
Then the random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with
$\nu = n-1$ degrees of freedom.
Notes:
1) $\sigma$ in the CLT has been replaced with S in the
corollary. This makes the statistic more realistic
because S can be calculated from a sample.
2) The standard normal distribution is not being used
anymore. Instead, T does have a t-distribution
EXACTLY provided X1, X2, …, Xn are independent
random variables with the same normal PDF. No
matter what the sample size, T has the t-distribution!
3) From 2, the distributional assumptions about X1, X2,
…, Xn still limit us somewhat. Remember the CLT
held NO MATTER what the PDF for X1, X2, …, Xn.
However, the t-distribution still often serves as a nice
approximation for the PDF of T in many situations.
4) Suppose n is very large, what happens to the PDF of
T?
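Note 2 can be checked informally by simulation. The sketch below (Python, with made-up function names and an illustrative small sample size of n=5) draws many normal samples, computes T for each, and compares the simulated P(T>2) with the t-distribution value for n-1 = 4 degrees of freedom and with the standard normal value. The mean 24, standard deviation 2, cutoff 2, number of repetitions, and seed are all assumptions for the demonstration.

```python
import math
import random


def t_pdf(t, nu):
    """PDF of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_upper_tail(t, nu, steps=2000):
    """P(T > t) via Simpson's rule on [0, t] and the symmetry of the PDF."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


def normal_upper_tail(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def simulate_t_tail(n=5, mu=24.0, sigma=2.0, cutoff=2.0, reps=50000, seed=2005):
    """Estimate P(T > cutoff) where T = (Xbar - mu) / (S / sqrt(n))."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    count = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        if (xbar - mu) / (s / math.sqrt(n)) > cutoff:
            count += 1
    return count / reps


if __name__ == "__main__":
    print("simulated P(T > 2) with n = 5:", simulate_t_tail())
    print("t-distribution with nu = 4:   ", t_upper_tail(2.0, 4))
    print("standard normal:              ", normal_upper_tail(2.0))
```

With a sample this small, the simulated probability should sit close to the t-distribution value and well away from the standard normal value.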
Example: Healthy Choice (health_choice.xls)
Healthy Choice cereal fills its boxes on average with 24
oz. of cereal and the standard deviation is 2 oz. of
cereal. Suppose an FDA official wants to find out if boxes
of Healthy Choice cereal have the advertised weight of
24 oz. The FDA official randomly samples 36 boxes of
cereal.
8.53
SUPPOSE the individual weight for each box is
normally distributed with the same $\mu$ and $\sigma$.
Suppose Healthy Choice is making cereal with $\mu = 24$ oz.
of cereal. In the sample of size 36, the SAMPLE
STANDARD DEVIATION was s=2 oz. of cereal. What is
the probability the sample mean weight is greater than
23 oz.?
What is different here from p. 8.34?
1) Each box’s weight has the same normal PDF. Last
time, nothing was said about the PDFs.
2) The sample gave us a sample standard deviation of
s=2. Thus, we did not use the assumption of $\sigma = 2$
as we did last time.
3) Although $n \geq 30$, meaning the CLT should work o.k., I
am going to use the t-distribution here.
$$\begin{aligned}
P(\bar{X} > 23) &= P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} > \frac{23 - 24}{2/\sqrt{36}}\right) \\
&= P(T > -3) \\
&= P(T < 3) \quad \text{by the symmetry of the PDF} \\
&= 1 - P(T > 3) \quad \text{need to do in order to use TDIST()} \\
&= 1 - 0.002474 \\
&= 0.997526
\end{aligned}$$
The Excel function used was: =1-TDIST(3,35,1)
8.54
On p. 8.34 we found the probability to be 0.998650
using the CLT.
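The two answers for the Healthy Choice example can be reproduced side by side. Below is a minimal sketch in Python with function names of my own choosing. It is not the calculation in health_choice.xls. The t probability is found by numerical integration of the t PDF, and the CLT probability uses the standard normal tail through math.erfc.

```python
import math


def t_pdf(t, nu):
    """PDF of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_upper_tail(t, nu, steps=2000):
    """P(T > t) via Simpson's rule on [0, t] and the symmetry of the PDF."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


def prob_sample_mean_exceeds(value, mu, s, n):
    """Return (t statistic, P(Xbar > value)) using the t-distribution with n-1 df."""
    t_stat = (value - mu) / (s / math.sqrt(n))
    return t_stat, t_upper_tail(t_stat, n - 1)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    t_stat, p_t = prob_sample_mean_exceeds(23.0, 24.0, 2.0, 36)
    print("t statistic:", t_stat)
    print("P(Xbar > 23) with the t-distribution, 35 df:", p_t)
    print("P(Xbar > 23) with the CLT:", normal_upper_tail(t_stat))
```

With n = 36 the two probabilities differ only in the third decimal place, which matches the comparison to p. 8.34.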
 2005 Christopher R. Bilder