Stat 280 Lab 9: Law of Large Numbers and Central
Limit Theorem
Objectives: This lab is designed to introduce the Law of Large Numbers and
Central Limit Theorem.
Directions: Follow the instructions below, answering all questions. Your answers
should be in the form of a brief report (MS Word), to be handed in to the instructor
before you leave. Please include plots and descriptive statistics in your report.
Background:
Law of Large Numbers:
Suppose you draw independent observations at random from any population with
finite mean μ. How accurately do you want to estimate μ? As the number of
observations drawn increases, the mean xbar of the observed values eventually
approaches the mean μ of the population. This is called the law of large numbers.
The law of large numbers says broadly that the average results of many independent
observations are stable and predictable. A grocery store deciding how many gallons
of milk to stock and a fast-food restaurant deciding how many beef patties to
prepare can predict demand even though their many customers make independent
decisions. The law of large numbers says that these many individual decisions will
produce a stable result.
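To see this stabilizing behavior outside MINITAB, here is a minimal Python sketch that tracks the running mean of Uniform(0,1) draws. The function name `running_means`, the seed, and the number of draws are arbitrary choices made for this illustration and are not part of the lab.

```python
import random

def running_means(n_draws, seed=280):
    """Return the running mean after each of n_draws Uniform(0,1) draws."""
    rng = random.Random(seed)   # seed is an arbitrary assumption, for repeatability
    total = 0.0
    means = []
    for i in range(1, n_draws + 1):
        total += rng.random()       # one independent Uniform(0,1) observation
        means.append(total / i)     # mean of the first i observations
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: xbar = {means[n - 1]:.4f}")
```

The printed values should settle near the population mean 0.5 as n grows, although any single run can wander a little at small n.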
Central Limit Theorem:
Draw a simple random sample of size n from any population with mean μ and finite
standard deviation σ. When n is large, the sampling distribution of the sample
mean is approximately normal:
xbar ~ N(μ, σ/√n)
More generally, the central limit theorem says that the distribution of a sum or
average of many small random quantities is close to normal. The central limit theorem
allows us to use normal probability calculations to answer questions about sample
means from many observations even when the population distribution is not
normal.
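The following Python sketch illustrates the statement with a population that is clearly not normal. It assumes an Exponential(1) population (mean 1, standard deviation 1), a sample size of 50, and 2000 repeated samples. All of these values, and the seed, are assumptions chosen for the illustration rather than anything prescribed by the lab.

```python
import math
import random
import statistics

rng = random.Random(9)   # arbitrary seed, for repeatability
n = 50                   # sample size (assumed for this illustration)
reps = 2000              # number of repeated samples (assumed)
mu, sigma = 1.0, 1.0     # mean and standard deviation of Exponential(1)

# One sample mean per replication, each from a right-skewed population.
xbars = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]

se = sigma / math.sqrt(n)  # standard deviation of xbar according to the CLT
within = sum(1 for x in xbars if abs(x - mu) <= se) / reps

print(f"mean of sample means: {statistics.fmean(xbars):.3f} (mu = {mu})")
print(f"sd of sample means:   {statistics.stdev(xbars):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within one sigma/sqrt(n) of mu: {within:.3f} (about 0.683 for a normal)")
```

If the normal approximation is reasonable, the share of sample means within one σ/√n of μ should land near 0.68 even though the individual observations are strongly skewed.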
In this lab you will examine the distribution of sample averages for sample sizes of 1,
5, and 30 from a non-normal distribution, the continuous uniform distribution. The
mean and variance of a uniform(a,b) random variable are (a + b)/2 and (b – a)^2/12, respectively.
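For reference, the uniform(a,b) formulas follow from integrating against the density 1/(b - a) on the interval [a, b]. A short derivation, using only standard calculus:

```latex
\begin{align*}
E[X]   &= \int_a^b \frac{x}{b-a}\,dx   = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2},\\
E[X^2] &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3},\\
\operatorname{Var}(X) &= E[X^2] - \bigl(E[X]\bigr)^2
        = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4}
        = \frac{(b-a)^2}{12}.
\end{align*}
```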
1. Start MINITAB and create thirty sets of 300 observations from the Uniform(0,1)
distribution. You can do this by selecting Calc -> Random Data -> Uniform and
specifying 300 rows, and Store in Columns c1-c30. Set the Lower Endpoint to 0.0
and the Upper Endpoint to 1.0.
a) What is the population mean?
b) What is the population variance and standard deviation?
Now suppose that you want to estimate the mean of this population, but that you
didn't know that the distribution was Uniform(0,1). To estimate the mean you
could:
i) estimate μ with Xbar from a sample of size 1
ii) estimate μ with Xbar from a sample of size 5
iii) estimate μ with Xbar from a sample of size 30
Based on the Law of Large Numbers and the Central Limit Theorem, we expect to
see:
i) as n gets larger, the standard deviation of Xbar gets smaller, and
ii) as n gets larger, the distribution of Xbar begins to approach the normal
distribution.
To check these distributional properties, we will need to repeat the calculation of
Xbar many times. We will repeat the calculation of Xbar 300 times for each sample
size (note: keep this number of replications distinct in your mind from the
sample size, 1, 5, or 30, on which the Xbar is based).
2. Computing Xbar1, Xbar5, Xbar30. Create a new column labeled Xbar5 at the
top of the worksheet in column 32 but do not enter any values. Instead, select
Calc -> Row Statistics from the top menu, and select the button for Mean (that is,
the sample mean). Set the input variables to c1-c5 and store the result in 'Xbar5'.
Repeat this process, creating Xbar30 in column 33, with input variables c1-c30.
Create a column labeled Xbar1 in column 31; you can use any one of the columns
c1 to c30 as Xbar1, since the mean of a single observation is the observation itself.
Each row now shows a single replication of computing sample means for samples of
size 1, 5, and 30, along with the original data.
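Steps 1 and 2 can be mirrored outside MINITAB. The Python sketch below is only an analogue of the worksheet operations, not a replacement for them. The names `worksheet`, `ROWS`, and `COLS` and the seed are assumptions introduced for the example.

```python
import random
import statistics

rng = random.Random(2809)   # arbitrary seed, for repeatability
ROWS, COLS = 300, 30        # 300 replications (rows), columns c1-c30

# Step 1: a worksheet of Uniform(0,1) values, stored as one list per row.
worksheet = [[rng.random() for _ in range(COLS)] for _ in range(ROWS)]

# Step 2: row statistics. Each row gives one replication of each sample mean.
xbar1 = [row[0] for row in worksheet]                      # any single column will do
xbar5 = [statistics.fmean(row[:5]) for row in worksheet]   # mean of c1-c5
xbar30 = [statistics.fmean(row) for row in worksheet]      # mean of c1-c30

for name, col in (("Xbar1", xbar1), ("Xbar5", xbar5), ("Xbar30", xbar30)):
    print(f"{name:7s} mean = {statistics.fmean(col):.3f}  sd = {statistics.stdev(col):.3f}")
```

Each list has 300 entries, one per replication, which matches the worksheet layout in which every row holds one value each of Xbar1, Xbar5, and Xbar30.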
Before plotting the histograms for the observed values of Xbar1, Xbar5, and
Xbar30, you will create a column in MINITAB for the histogram bin boundaries.
Otherwise MINITAB will use its own bin sizes for each of the distributions (Xbar1,
Xbar5, and Xbar30) and the different bins will make the results confusing.
Select Calc -> Make Patterned Data -> Simple Set of Numbers and Store the Result
in c34. From first value: 0, to last value: 1, in steps of: .025. Leave the values for
List Each Value and List the whole sequence set to 1.
3. Looking at Xbar1, Xbar5, Xbar30.
To see the action of the central limit theorem (CLT) and law of large numbers
(LLN), we will look at histograms and quantile-quantile plots of repeated samples of
each of these. Select Graph -> Histogram. You can specify all three histograms in
the same command.
Click on the options button and set the type of histogram to Percent and Define
Intervals, Midpoint/Cutpoint positions: c34.
Look at the histograms for the Xbar values. Comment on the changing shape as n
goes from 1 to 5 to 30. What happens to the spread?
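The effect of using one shared set of bin boundaries can also be sketched in Python. The example below builds cutpoints from 0 to 1 in steps of 0.025 and reports percent-scale histograms for sample means of size 1, 5, and 30. The helper names `percent_histogram` and `sample_means` and the seed are hypothetical, chosen for this illustration.

```python
import random
import statistics

rng = random.Random(34)   # arbitrary seed, for repeatability
STEP = 0.025
N_BINS = 40               # 0 to 1 in steps of 0.025 gives 40 bins
cutpoints = [i * STEP for i in range(N_BINS + 1)]

def percent_histogram(values):
    """Percent of values falling in each of the shared bins on [0, 1]."""
    counts = [0] * N_BINS
    for v in values:
        # min() keeps a value at or extremely near 1.0 in the last bin
        counts[min(int(v / STEP), N_BINS - 1)] += 1
    return [100.0 * c / len(values) for c in counts]

def sample_means(n, reps=300):
    """reps sample means, each from n fresh Uniform(0,1) draws."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

for n in (1, 5, 30):
    pct = percent_histogram(sample_means(n))
    occupied = sum(1 for p in pct if p > 0)
    print(f"n = {n:2d}: {occupied} of {N_BINS} bins occupied, tallest bin = {max(pct):.1f}%")
```

Because the bins are identical for all three sample sizes, a shrinking count of occupied bins reflects a real reduction in spread rather than a change of scale.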
4. Normal Probability Plots
Histograms are not always the greatest way to check distributional shape. In fact,
the pattern can change quite a bit when you change the histogram boundaries just a
little. To augment this view, we will use quantile-quantile plots. We will plot the
empirical quantiles of Xbar1 or Xbar5 or Xbar30 against the theoretical quantiles of
Z, a Normal(0,1) random variable. That is, we will plot the quantile Xbar1(p) against
the quantile Z(p) for various values of p.
How do we tell whether the fit is good? We will use our experience with plotting
truly normal data. To gain some experience, we will first generate some normally
distributed values using MINITAB, and look at plots for those data sets. If the data
are normal, the quantile plot should be close to a straight line.
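The construction of a quantile-quantile plot can be sketched in Python as follows. The plotting positions p = (i - 0.5)/n are one common convention and are an assumption here; MINITAB may use a different one. The correlation of the plotted points is used only as a rough numerical stand-in for judging straightness by eye, and the function names are hypothetical.

```python
import math
import random
import statistics

def qq_points(values):
    """Pair each sorted value with the Normal(0,1) quantile at p = (i - 0.5)/n."""
    n = len(values)
    z = statistics.NormalDist()   # standard normal, mean 0 and sd 1
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sorted(values)

def pearson_r(x, y):
    """Correlation of the Q-Q points; values near 1 mean a nearly straight plot."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(35)   # arbitrary seed, for repeatability
normal_data = [rng.gauss(0.0, 1.0) for _ in range(300)]
uniform_data = [rng.random() for _ in range(300)]

r_normal = pearson_r(*qq_points(normal_data))
r_uniform = pearson_r(*qq_points(uniform_data))
print(f"normal sample:  r = {r_normal:.4f}")
print(f"uniform sample: r = {r_uniform:.4f}")
```

A truly normal sample should give a correlation very close to 1, while a uniform sample typically gives a somewhat lower value because its tails are shorter than those of the normal distribution.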
Create four new columns with 300 normally distributed (μ = 0, σ² = 1) values:
select Calc -> Random Data -> Normal, specify 300 rows, and Store in
Columns c35-c38. Set the Mean to 0.0 and the Standard deviation to 1.0.
Now select Graph -> Probability Plot. Under variables select C35 as the
column and look at the resulting plot. Repeat the procedure to plot C36, C37, and
C38. All can be on the screen at the same time, to give you a feel for how much
deviation from a straight line you might expect when the data truly are normal.
Now you are ready to look at normal plots of the Xbar data. Create normal
probability plots for Xbar1, Xbar5, and Xbar30. Do each graph separately and
compare. What do you see? Do you see the CLT in action?
Finally, look at the average and standard deviation values printed in the normal
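probability plot window for Xbar1, Xbar5, Xbar30. Jot them down in the following
table:
           Xbar1     Xbar5     Xbar30
Average    ______    ______    ______
Std Dev    ______    ______    ______
Compare them with μ = 0.5 and σ = sqrt(1/12). Recall that the standard deviation of
Xbar based on a sample of size n is σ/√n.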
Do you see the LLN in action?
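For the comparison, the theoretical values follow from the Uniform(0,1) mean and variance together with the σ/√n rule from the Central Limit Theorem. A minimal Python sketch of the arithmetic, in which the name `expected_sd` is introduced only for this example:

```python
import math

mu = 0.5                    # mean of Uniform(0,1): (0 + 1)/2
sigma = math.sqrt(1 / 12)   # sd of Uniform(0,1): sqrt((1 - 0)^2 / 12)

# Theoretical standard deviation of Xbar for each sample size used in the lab.
expected_sd = {n: sigma / math.sqrt(n) for n in (1, 5, 30)}

for n, sd in expected_sd.items():
    print(f"n = {n:2d}: expected average = {mu:.3f}, expected sd of Xbar = {sd:.4f}")
```

The observed averages and standard deviations from 300 replications will not match these values exactly, but they should be close.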
Sources: www.me.psu.edu/lamancusa/Quality/Module3/L3PREVU.doc and
Introduction to the Practice of Statistics (Moore, McCabe)