Lab 6. Central Limit Theorem. Control Charts
www.nmt.edu/~olegm/283labs/Lab6stat.pdf

Note: the menus and other things you will read or type on the computer are in italics. Attach the printouts whenever needed.

The Central Limit Theorem (CLT) is a very important result that lays the foundation for the statistical inference to follow. We will take a look at how the CLT works for samples of different sizes, and at one of its applications: control charts for monitoring.

1 Central Limit Theorem

We have learned the following facts about the sampling distribution of the sample mean $\bar{X}$:

1. $\mu_{\bar{X}} = \mu$ (population mean), $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
2. CLT: As n gets large, the shape of the $\bar{X}$ distribution gets closer to the normal distribution.

How large is “large”? The answer depends on the shape of the original population distribution. If the original distribution is Normal, then even n = 1 is enough! We will consider a few examples.

First, consider the exponential distribution, given by the density curve

$$f(x) = \frac{1}{\mu} \exp\left(-\frac{x}{\mu}\right) \quad \text{for } x > 0$$

It is a classical skewed distribution. It describes, for example, the lifetime of a radioactive atom, with µ being the average lifetime. Shown below is the exponential density with µ = 5; its standard deviation is also σ = 5.

[Figure: exponential density f(x) with µ = 5, plotted for x from 0 to 20]

Suppose a sample of size n = 4 is drawn from such a distribution. How will the means of such samples behave? To answer this question using the simulation approach, we will draw 200 such samples and make the histogram of their sample means.

Do Calc → Random Data → Exponential and generate 200 rows of data in 4 columns, with mean 5. We have obtained 200 rows that will represent our 200 samples. To find the value of $\bar{X}$ for each row, use Calc → Row Statistics, select Mean and store the result in C1.

Problem 1

(a) Make a histogram of C1. Describe the shape of the sampling distribution you obtained.
(b) Find descriptive statistics for C1 and fill in the left half of the table below:

                      Actual,    Theoretical,              Actual,    Theoretical,
                      n = 4      n = 4                     n = 30     n = 30
  Mean                           µ = 5                                µ = 5
  Standard Deviation             $\sigma/\sqrt{n}$ =                  $\sigma/\sqrt{n}$ =

(c) Repeat the experiment for n = 30. Fill in the rest of the table.

(d) Make your conclusions:
• As the sample size increased, how did the mean and standard deviation of the sampling distribution change?
• As the sample size increased, how did the shape of the sampling distribution change?

Problem 2

The file DailyEQ.txt describes daily numbers of global earthquakes of magnitude 5 and higher during 1973-1999.¹

¹ More recent data have several spikes in earthquake activity, most recently March 2011.

(a) Obtain the histogram of the daily numbers. Does it look Normal?

(b) Find the estimates of µ and σ.

(c) Note that an important assumption for the CLT is that the data are independent. In our case, this means that a high number of earthquakes on one day does not imply a high number of earthquakes the next day. Make a scatterplot of the original series against the lagged series (Stat → Time series → Lag), like we did with the Dow Jones data earlier, to see if the data are close to being independent.

(d) Now, obtain the “monthly” average numbers of earthquakes per day (these will be our $\bar{X}$). To do this, first arrange the data into a block of 30 columns (these will represent the days of the “month”; we will ignore the fact that a month can be longer or shorter than 30 days):
• Create a repeating series of numbers 1, 2, 3, ..., 30, ..., 1, 2, 3, ..., 30 in C2 using Calc → Make Patterned data → Simple
• Use Manip → Unstack columns, Unstack data in C1, Using subscripts in C2.
• Use Calc → Row statistics like in Lab 5 to find the sample means of the “months”.

(e) Make a histogram of the obtained monthly averages. Are they close to Normal?

(f) Calculate the standard deviation of the monthly averages. Compare it to the theoretical value of $\sigma/\sqrt{n}$.
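The simulation idea behind Problem 1 can also be sketched outside Minitab. The Python snippet below is only an illustration and is not part of the lab: it uses the Python standard library, and the names simulate_sample_means and theoretical_sd, as well as the fixed seed, are assumptions made for this sketch. It draws 200 samples of size n from an exponential distribution with µ = 5 and compares the standard deviation of the sample means with $\sigma/\sqrt{n}$.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=200, mu=5.0, seed=0):
    """Return the means of num_samples samples of size n, each drawn
    from an exponential distribution with mean mu."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    means = []
    for _ in range(num_samples):
        # random.expovariate expects the rate, which is 1/mu
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


def theoretical_sd(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / n ** 0.5


if __name__ == "__main__":
    for n in (4, 30):
        means = simulate_sample_means(n)
        print(
            f"n = {n}: mean of sample means = {statistics.fmean(means):.3f}, "
            f"actual sd = {statistics.stdev(means):.3f}, "
            f"theoretical sd = {theoretical_sd(5.0, n):.3f}"
        )
```

The simulated values will differ from your Minitab output because the random draws differ; only the general pattern (mean near 5, spread shrinking as n grows) should agree.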

2 Control Charts

Control charts are used to monitor industrial production, pollution and many other processes. They may make use of the Central Limit Theorem (the shape of the $\bar{X}$ distribution is close to the normal distribution). Thus, about 99.7% of the time the sample mean will be contained within $3\sigma_{\bar{X}}$ of the population mean.

An engineer will collect several (typically n = 2 to 9) observations in a time period (hour, day etc.) and compute their sample mean. Assume that the production parameters, mean µ and standard deviation σ, are known historically. If, during a particular time period, the sample mean falls outside the interval between

LCL (lower control limit) = $\mu - 3\sigma/\sqrt{n}$
UCL (upper control limit) = $\mu + 3\sigma/\sqrt{n}$

then we will say that the process is statistically out of control.

Even when nothing is wrong, we could still occasionally observe high or low values of $\bar{X}$ (due to natural variability) and announce that the process is out of control. This will happen 100% − 99.7% = 0.3% of the time. Since it is expensive to halt production, we'd like to be conservative and keep this percentage low.

Minitab gives you an option to enter µ and σ when they are available. Otherwise, it will estimate µ and σ based on all the observations pooled together. (Minitab generally estimates σ using the sample standard deviations or ranges of each subgroup.)

Problem 3

In a company that manufactures playground equipment, an engineer monitors the drill that bores holes in wood. Each drill should be set to a depth within 5 tenths of a millimeter of the desired depth. The engineer recorded 5 measurements per day for a particular drill; they are the errors (deviations from the desired depth) in tenths of a millimeter. Historically, the mean error was 0.2 and the standard deviation of the error was 3.1.
Use the data in Cranksh.txt

(a) Construct the control chart using Stat → Control Charts → Variables Charts for Subgroups → Xbar, and then, using Xbar options → Parameters, type in the historical mean of 0.2 and the historical σ of 3.1.

(b) Confirm Minitab's calculation of the UCL and LCL based on the formulas above.

(c) Is the system in statistical control? Explain.

(d) Now let Minitab estimate µ and σ. Can the system be considered in statistical control now?
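
As a cross-check on the control limit formulas from Section 2, here is a minimal Python sketch. It is not part of the lab and does not reproduce Minitab's own procedure. The function names control_limits and out_of_control are assumptions made for this illustration, and the list of subgroup means is hypothetical: it is NOT taken from Cranksh.txt.

```python
import math


def control_limits(mu, sigma, n, k=3.0):
    """Return (LCL, UCL) = mu -/+ k * sigma / sqrt(n) for subgroups of size n."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


def out_of_control(subgroup_means, lcl, ucl):
    """Return 1-based positions of subgroup means outside [lcl, ucl]."""
    return [
        i for i, m in enumerate(subgroup_means, start=1)
        if m < lcl or m > ucl
    ]


if __name__ == "__main__":
    # Historical values from Problem 3: mu = 0.2, sigma = 3.1, n = 5 per day
    lcl, ucl = control_limits(mu=0.2, sigma=3.1, n=5)
    print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")

    # Hypothetical daily means, for illustration only (not the lab data)
    example_means = [0.0, 4.5, -4.0, 1.2]
    print("Out-of-control days:", out_of_control(example_means, lcl, ucl))
```

To apply this to the real data, the subgroup means would have to be computed from the 5 daily measurements in Cranksh.txt; when µ and σ are estimated from the data instead, Minitab's limits will generally differ from these.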