Lab 6. Central Limit Theorem. Control Charts
www.nmt.edu/~olegm/283labs/Lab6stat.pdf
Note: the menus and other things you will read or type on the computer are in italics. Attach the
printouts whenever needed.
The Central Limit Theorem (CLT) is a very important result that creates a foundation for statistical inference to follow. We will take a look at how the CLT works for
samples of different sizes, and at one of its applications: Control Charts for monitoring.
1 Central Limit Theorem
We have learned the following facts about the sampling distribution of the sample
mean $\bar{X}$:
1. $\mu_{\bar{X}} = \mu$ (population mean), $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
2. CLT: As n is getting large, the shape of the $\bar{X}$ distribution gets closer to the normal
distribution.
How large is “large”? The answer depends on the shape of the original population
distribution. If the original distribution is Normal, then even n = 1 is enough! We
will consider a few examples.
First, consider the exponential distribution, given by the density curve
$f(x) = \frac{1}{\mu} \exp\left(-\frac{x}{\mu}\right)$ for $x > 0$
It is a classic skewed distribution. It describes, for example, the lifetime of a radioactive atom, with µ being the average lifetime. Shown below is the exponential density
with µ = 5, for which the standard deviation is also σ = 5.
[Figure: exponential density f(x) with µ = 5, plotted for x from 0 to 20.]
Suppose a sample of size n = 4 is drawn from such a distribution. How will the means
of such samples behave? To answer this question using the simulation approach, we
will draw 200 such samples and make the histogram of their sample means.
Do Calc → Random Data → Exponential and then generate 200 rows of data, stored in n = 4 columns, with mean 5.
We have obtained 200 rows that will represent our 200 samples.
To find the value of $\bar{X}$ for each row, use Calc → Row Statistics, select Mean and store
the result in C1.
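For readers who want to cross-check the Minitab simulation, the same steps can be sketched in Python. This is only a minimal sketch under stated assumptions: the function name simulate_sample_means and the fixed seed are illustrative choices, not part of the lab, and only the standard library is used.

```python
import random
import statistics

def simulate_sample_means(n, num_samples=200, mu=5.0, seed=0):
    """Draw num_samples samples of size n from an exponential
    distribution with mean mu and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    means = []
    for _ in range(num_samples):
        # expovariate takes the rate, which is 1 / mean
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        means.append(statistics.mean(sample))  # analogue of the row mean
    return means

means_n4 = simulate_sample_means(n=4)
print("mean of the sample means:   ", statistics.mean(means_n4))
print("st.dev. of the sample means:", statistics.stdev(means_n4))
```

Calling simulate_sample_means(n=30) gives the larger-sample version of the experiment; a histogram of the returned list plays the role of the histogram of C1.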
Problem 1
(a) Make a histogram of C1. Describe the shape of the sampling distribution you
obtained.
(b) Find descriptive statistics for C1 and fill in the left half of the table below:
                     Actual n = 4   Theoretical n = 4   Actual n = 30   Theoretical n = 30
Mean                 ____________   µ = 5               _____________   µ = 5
Standard Deviation   ____________   σ/√n = ______       _____________   σ/√n = ______
(c) Repeat the experiment for n = 30. Fill in the rest of the table.
(d) Make your conclusions:
• As the sample size increased, how did the mean and standard deviation of
the sampling distribution change?
• As the sample size increased, how did the shape of the sampling distribution
change?
Problem 2
The file DailyEQ.txt describes daily numbers of global earthquakes of magnitude 5 and
higher during 1973-1999. (More recent data have several spikes in earthquake activity, most recently in March 2011.)
(a) Obtain the histogram of the daily numbers. Does it look Normal?
(b) Find the estimates of µ and σ.
(c) Note that an important assumption for CLT is that the data are independent.
In our case, this will mean that a high number of earthquakes on one day does
not imply a high number of earthquakes the next day.
Make a scatterplot of the original series with the lagged series (Stat → Time
series → Lag) like we did with Dow Jones data earlier, to see if the data are
close to being independent.
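The lag comparison can also be sketched in Python as a rough check. This sketch assumes a plain list of daily counts; the counts below are synthetic stand-ins, not the DailyEQ.txt data, and lag_pairs and pearson_r are hypothetical helper names.

```python
import random

def lag_pairs(series, lag=1):
    """Pair each value with the value `lag` days later (lag must be >= 1)."""
    return list(zip(series[:-lag], series[lag:]))

def pearson_r(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Synthetic, independently generated counts (an assumption for illustration)
rng = random.Random(1)
counts = [rng.randint(0, 10) for _ in range(1000)]

pairs = lag_pairs(counts, lag=1)  # these are the points of the lag scatterplot
print("lag-1 correlation:", pearson_r(pairs))
```

A correlation near zero is consistent with independence but does not prove it; the scatterplot itself should still be inspected for patterns.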
(d) Now, obtain the “monthly” average numbers of earthquakes per day (these will
be our $\bar{X}$). To do this, first arrange the data into a block of 30 columns (these
will be representing days of the “month” and we will ignore the fact that a
month can be longer or shorter than 30 days):
• Create a repeating series of numbers 1, 2, 3,..., 30, ...,1, 2, 3, ..., 30 in C2
using Calc → Make Patterned data → Simple
• Use Manip → Unstack columns, Unstack data in C1, Using subscripts in
C2.
• Use Calc → Row statistics like in Lab 5 to find the sample means of
“Months”.
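The unstack-and-average steps above amount to averaging consecutive blocks of 30 values. A minimal Python sketch follows; block_means is a hypothetical helper, and dropping an incomplete final block is an assumption of this sketch, not something the lab specifies.

```python
def block_means(values, block_size=30):
    """Average consecutive blocks of block_size values.

    An incomplete final block is dropped (an assumption of this sketch).
    """
    num_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]

# Tiny illustration with blocks of 3 instead of 30
print(block_means([1, 2, 3, 4, 5, 6, 7], block_size=3))  # [2.0, 5.0]
```

Applied to the daily series with block_size=30, the returned list corresponds to the column of "monthly" sample means.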
(e) Make a histogram of the obtained monthly averages. Are they close to Normal?
(f) Calculate the standard deviation of monthly averages. Compare it to the theoretical
value of $\sigma/\sqrt{n}$.
2 Control Charts
Control charts are used to monitor industrial production, pollution and many other
processes. They may make use of the Central Limit Theorem (shape of the $\bar{X}$ distribution
is close to the normal distribution). Thus, about 99.7% of the time the sample mean
will be contained within $3\sigma_{\bar{X}}$ of the population mean. An engineer will collect several
(typically n = 2 to 9) observations in a time period (hour, day etc.) and compute
their sample mean. Assume that the production parameters mean µ and st.dev. σ
are known historically.
If, during a particular time period, the sample mean falls outside the interval between
LCL (lower control limit) = $\mu - 3\sigma/\sqrt{n}$ and
UCL (upper control limit) = $\mu + 3\sigma/\sqrt{n}$,
then we will say that the process is statistically out of control.
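The control limit rule can be illustrated with a short Python sketch. The numbers used (µ = 10, σ = 2, n = 4 and the five sample means) are made up for illustration and are not from the lab data; control_limits and out_of_control are hypothetical helper names.

```python
import math

def control_limits(mu, sigma, n):
    """Return (LCL, UCL) = mu -/+ 3 * sigma / sqrt(n)."""
    half_width = 3 * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

def out_of_control(sample_means, mu, sigma, n):
    """Indices of the periods whose sample mean falls outside [LCL, UCL]."""
    lcl, ucl = control_limits(mu, sigma, n)
    return [i for i, m in enumerate(sample_means) if m < lcl or m > ucl]

# Hypothetical process: mu = 10, sigma = 2, subgroups of n = 4
print(control_limits(10.0, 2.0, 4))                               # (7.0, 13.0)
print(out_of_control([9.5, 10.8, 13.4, 6.9, 10.1], 10.0, 2.0, 4)) # [2, 3]
```

Here the third and fourth periods (indices 2 and 3) would be declared out of control, since 13.4 is above the UCL and 6.9 is below the LCL.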
Even when nothing is wrong, we could still occasionally observe high or low
values of $\bar{X}$ (due to natural variability) and announce that the process is out of control. This will happen about 100% − 99.7% = 0.3% of the time. Since it is expensive to halt
production, we’d like to be conservative and keep this percentage low.
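The 0.3% figure is a rounded value of the two-sided normal tail probability beyond 3 standard deviations. A minimal Python sketch of that calculation, assuming $\bar{X}$ is exactly normal (normal_cdf and false_alarm_rate are hypothetical helper names):

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def false_alarm_rate(k=3.0):
    """Probability that a normal sample mean lands more than k
    standard errors away from mu when the process is in control."""
    return 2.0 * (1.0 - normal_cdf(k))

print(false_alarm_rate(3.0))  # about 0.0027, i.e. roughly 0.3%
```

Using narrower limits would raise this rate noticeably: false_alarm_rate(2.0) is about 0.0455, or roughly 4.6%.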
Minitab gives you an option to enter µ and σ when they are available. Otherwise, it
will estimate µ and σ based on all the observations pooled together. (Minitab generally estimates σ using the sample standard deviations or ranges of each subgroup.)
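As a rough illustration of estimating µ and σ from the subgroups themselves, the sketch below uses one common textbook estimator: the grand mean of the subgroup means for µ, and the average subgroup range divided by the tabulated constant d2 for σ. This is an assumption chosen for illustration; Minitab's default estimator may differ, and estimate_mu_sigma is a hypothetical helper name.

```python
import statistics

# Textbook d2 constants for subgroup sizes 2 through 9
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326,
      6: 2.534, 7: 2.704, 8: 2.847, 9: 2.970}

def estimate_mu_sigma(subgroups):
    """Estimate (mu, sigma) from equal-sized subgroups using the
    grand mean and the average-range method (R-bar / d2)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    grand_mean = statistics.mean([statistics.mean(g) for g in subgroups])
    mean_range = statistics.mean([max(g) - min(g) for g in subgroups])
    return grand_mean, mean_range / D2[n]

# Made-up subgroups of size 3, for illustration only
mu_hat, sigma_hat = estimate_mu_sigma([[1, 2, 3], [2, 3, 4]])
print(mu_hat, sigma_hat)
```

The estimates can then be plugged into the LCL and UCL formulas in place of the historical µ and σ.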
Problem 3
In a company that manufactures playground equipment, an engineer monitors the
drill that bores holes in wood. Each drill should be set to a depth within 5 tenths of
a millimeter of the desired depth. The engineer recorded 5 measurements per day for
a particular drill; they are the errors (deviations from the desired depth) in tenths of
a millimeter.
Historically, the mean error was 0.2 and the standard deviation of the error was 3.1.
Use the data in Cranksh.txt.
(a) Construct the control chart using Stat → Control Charts → Variables Charts
for Subgroups → Xbar, and then, using Xbar options → Parameters, type in the
historical mean of 0.2 and historical σ of 3.1.
(b) Confirm Minitab’s calculation of the UCL and LCL based on the formulas
above.
(c) Is the system in statistical control? Explain.
(d) Now let Minitab estimate µ and σ. Can the system be considered in statistical control now?