Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Introduction This paper documents the results of a large number of Monte Carlo simulations exploring the behavior of different variable data statistical control charts. We were performing experiments to determine if various variable data statistical control charts performed significantly differently against three common distributions. This paper consists of five sections and an appendix. This first section, the introduction, provides an overview of the techniques used during this project and also gives the reader the overall format of this paper. The next section contains the purpose and gives the goals of this effort. The third section discusses the method by which the project was performed and the results generated. The section after that presents this effort’s results. The fifth section contains the conclusions drawn from those results. The appendix contains listings of the software tools used to generate the results. This introduction discusses the general use of statistical control charts for variable data, details the mean and dispersion charts we explored in this project, gives examples of the three data distributions we worked with and explains the use of operating characteristic charts to compare the power of different control charts. The next paragraph gives a brief discussion of the general use of control charts. The charts There are many different types of charts used in statistical process control, such as X-bar, Range, Standard Deviation, Variance, XmR, u, c, and p. Each of these charts is useful for monitoring a specific type of data. In general, data can be categorized in two ways. Variable data comes from physical measurements that have a continuous range, such as length or weight. Attribute data is usually binary, (such as pass/fail,) a count, (such as number of defects,) or a rate, (such as number of defects per inspected item). This paper focuses on the statistical process control charts that handle variable rather than attribute data. There are two general categories of variable data charts. These charts focus on a measure of the underlying process’s central tendency or a measure of the underlying process’s dispersion. Both types will be discussed in the following paragraphs. Central tendency charts First, we will discuss the charts that handle the central tendency of the process. While there are several possible measures of central tendency: mean, median, and mode, the statistical process community uses the mean in almost all cases. The X-Bar chart is the chart that works with the mean of a given process’s variable data. The simplest XBar chart consists of a centerline and two limit lines. The centerline denotes the computed average of the measurements being controlled (more on this later). The area between the two limit lines is where most of the process’ measurements should fall. In the typical statistical process control application, one would expect 99.73% (three standard deviations) of the process’s measurements to be within the limit lines. A sample X-Bar chart is shown on the next page. The green line in the center is the estimate of the process mean. The upper and lower red lines are the upper and lower control limits. In this example a single point, number 44, is out of control. In order to compute the proper values for the limit lines, we need the standard deviation of the process’s measurements. Since there is no way to determine the population standard deviation without exhaustively measuring the entire population of the process under examination, we need a good estimate for the population standard deviation. The statistical process community uses two different methods of obtaining this estimate. The first is based on the range of the process measurements. The second is based on the standard deviation of the process measurements. Obtaining the estimate from the range is operationally simple, but the accuracy of the 1 estimate suffers if large numbers of process samples are taken. Estimates based on the standard deviation are somewhat more difficult to compute, but become more and more accurate as more samples are taken. This paper addresses X-Bar charts made by using the sample standard deviation to approximate the population standard deviation. While exploring both methods was technically feasible, resource limitations forced the experimenter to choose a single method. (During the presentation Professor Tsao specifically asked about the relative performance of the X-Bar chart based on range verses the X-Bar chart based on the standard deviation, so a single chart was created and is included as the second appendix.) Xbar Chart 12 UCL 11.5 11 Measurements 10.5 10 9.5 9 8.5 LCL 44 8 0 10 20 30 40 50 Samples 60 70 80 90 100 Dispersion charts Next, we discuss the charts that handle the dispersion of the process. Again, there are several possible measures of the dispersion: range, standard deviation and variance; the statistical process community uses the range and standard deviation in almost all cases. This project explored control charts based on all three measures of dispersion. Charts using the range are commonly used because they are simple to construct and work well when small sample sizes are used. Standard deviation charts are a bit more complex to construct but are thought to have better performance with large sample sizes. Standard deviation charts also handle variable sample sizes. We also used control charts based on the variance. Variance control charts are a bit more complex to construct than standard deviation charts, but it was thought that they would perform better against non-normal distributions. An example s chart is shown on the next page. In this example the process’s average standard deviation is given in green. The upper control limit is given in red. In this instance, the lower control limit is zero, so it is the horizontal axis. This example does not have any points out of control. The distributions In the perfect world, measuring the output of a process would yield exactly the same number every time. The product would be precisely the correct size and there would be no noise in the measurement. Unfortunately, many natural and artificial events occur during a process. These events blur the exact size and precise measurement we would like to have. When one looks at a large number of these blurred measurements, an underlying structure becomes clear. These underlying structures, determined by the gross effects of these events, are probability distributions. 2 One nice thing about a process’s underlying probability distribution is that it strongly reflects the process’s overall environment. A sharp change in the environment will change the measurements and these measurements in turn, will form a different distribution. For example, if a given process is running well and only subject to natural variations, its underlying distribution will be a normal distribution with a characteristic mean and standard deviation. However, if some part of the process slowly wears out, it is very likely that the mean of the underlying distribution will change. So by monitoring characteristics of a process’s underlying distribution, one can detect changes in the process itself. This project focused on three probability distributions, the normal, the uniform and the exponential distributions. The most common distribution is the normal distribution. The normal distribution occurs when there are a large number of unknown events occurring, each of which has a small overall impact on the process. The normal distribution has a well-known bell shape. The uniform distribution is flat and occurs when there is a single event that has equal chances of being at any one of a number of states. The roll of a single die with an equal chance of being one through six is a good example of a uniform distribution. The exponential distribution occurs when there are a large number of potential events, each with a equal likelihood of happening, but each event happens independently of the others. Examples of these distributions are shown in the next three pages. S Chart 4 3.5 UCL S ta ndard Deviat ion 3 2.5 2 1.5 1 0.5 0 LCL 0 10 20 30 40 50 60 S ample Number 70 80 90 100 3 The Normal Distribution Central Limit Theorem for 10000 samples of size 10: Normal Distribution 1600 1400 1400 1200 1200 Number of counts Number of counts Normal distribution mean = 10, std dev = 1 1600 1000 800 1000 800 600 600 400 400 200 200 0 6 7 8 9 10 Units 11 12 13 14 0 6 7 8 9 10 Units 11 12 13 14 The graph on the left is a histogram of 10000 samples of data drawn from a normal distribution with a mean of ten and a standard deviation of one. The graph on the right is a histogram of the mean of 10000 samples of size ten drawn from the same distribution. A most likely estimate of a normal distribution is also given in red. Both charts are on the same scale. They illustrate one behavior of the central limit theorem. As the number of grouped measurements drawn from a given distribution increase, the resulting standard deviation decreases. 4 The Uniform Distribution Uniform d is tribution mean = 10, s td dev = 1 Central Limit Theorem for 10000 samples of size 10: Uniform Distribution 6 00 1500 5 00 Number of counts Number of count s 4 00 3 00 2 00 1000 500 1 00 0 7 8 9 10 value 11 12 13 0 7 8 9 10 Value 11 12 13 The graph on the left is a histogram of 10000 samples taken from a uniform distribution with a mean of 10 and a standard deviation of 1. The uniform distribution is “flat.” Every value between the minimum and maximum has an equal chance of occurring. The graph on the right is a histogram of the mean of 10000 samples of size ten drawn from the same distribution. A most likely estimate of a normal distribution is given in red. Both charts are on the same horizontal scale, but they have different vertical scales. These graphs illustrate the main behavior of the central limit theorem. While the samples are taken from a flat distribution, as they are grouped together, the resulting distribution approaches a normal distribution. This implies that when using large sample sizes all underlying distributions can be treated as normal. While this makes intuitive sense with relatively benign, symmetric distributions like the normal and the uniform distributions, the next graphs show similar results using the highly non-symmetric exponential distribution. 5 Exponential 5000 Central Limit Theorem for 10000 samples of size 10: Exponential Distribution 5000 4500 4500 4000 4000 3500 3500 Number of counts Number of counts Exponential distribution, mean 10, std dev = 1 3000 2500 2000 3000 2500 2000 1500 1500 1000 1000 500 500 0 0 9 10 11 12 13 14 15 Value 16 17 18 19 20 9 10 11 12 13 14 15 Value 16 17 18 19 20 The graph on the left is a histogram of 10000 samples taken from an exponential distribution with a mean of 10 and a standard deviation of one. The graph on the right is a histogram of the mean of 10000 samples of size ten drawn from the same distribution. A most likely estimate of a normal distribution is given in red. Both charts are on the same scale. The exponential distribution is highly non-symmetric with both the median and mode much lower than the mean. Intuitively there is nothing in the left histogram that would indicate a “bump” forming around the mean of ten. However, the distribution generated by taking groups of ten samples is becoming much more symmetric with a decided peak around ten. 6 Operating Characteristics Charts Since part of this project was to compare the “power” or “performance” of the various control charts under different circumstances, we needed some way to measure their characteristics. We choose to do this by comparing the different charts’ operating characteristics. An operating characteristic chart shows how likely a given method is to miss a valid change in process as a function of how big a change was made. For example, if a process changes ten percent and the way we are trying to detect changes watches for 99 percent changes it is not very likely that our method would detect the change. The same method would do a lot better if the random change were 100 percent. An example chart is given below: OC Chart for X-bar based on s 1 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 sigma shift 4 5 6 The horizontal axis gives how large a change there is in the process. In this case the change is measured in how many standard deviations the process mean changed. The vertical axis shows how likely the chart was to miss the change. In this example there are seven lines denoting the characteristics of this chart for sample sizes of 2,3,5,10,20,30 and 50 working from right to left. Looking at the blue line (rightmost) we see that with only two samples, this control chart would need more than a two standard deviation shift before it had a 50 percent chance of detecting the change. With 50 samples (the black, leftmost, line) the same chart would need less than 8/10ths of a standard deviation change for a 50 percent chance. Goal The goal of this project was to explore how the different variable-data control charts perform given nonnormal distributions and to compare the performance of the different dispersion graphs. To meet this goal first we looked at how the standard deviation-based X-Bar chart handled the three different distributions discussed above as a function of sample size. To meet the second portion of the goal, we then looked at how each of the dispersion charts handled the normal distribution as a function of sample size. Finally we looked at how each of the dispersion charts handled each of the distributions as a function of sample size. 7 The results of these experiments are given in the body of this report. Procedure This section describes how the project’s data and results were generated. We will discuss how the data and results for the mean and standard deviation shifts were obtained. Process Mean Data The overall data and result generation process consisted of four steps. For each sample size we would generate a set of baseline data. This step mimicked the normal collection of thirty good data points from which to establish the process’s mean upper and lower control limits. An example of the baseline data is given below: Xbar Ch art 12 UC L 1 1. 5 Mea sureme nt s 11 1 0. 5 10 9. 5 9 8. 5 LC L 44 8 0 10 20 30 40 50 S ample s 60 70 80 90 10 0 We started with using thirty points to compute these parameters but soon changed to 100. At thirty points, “false alarm” in the baseline would significantly skew the remainder of the data. For example, the graph above shows the calculated process mean to be about 10.2. The random number generator that we used to generate this data was set for a mean of 10. Moving to 100 points corrected this skew whether or not there were “false alarm” points. After we had computed the process mean, upper and lower control limits we would then generate 10000 points of experimental data for each process shift. Some experimenting was needed to determine which shifts would generate a reasonable OC chart. We settled on shifts of 0.1 standard deviation from 0.1 until 4.0 and then single points at 5.0 and 6.0 standard deviations. An example of a three-sigma process mean shift is given below. Finally, at each shift we would compare each of the 10000 points of experimental data against the upper and lower control limits established for the baseline data. The operating characteristic measurement for that shift was determined by dividing the number of out of control points by 10000 and subtracting the result from one. 8 Xbar Chart: Normal Distribution with a 3 Sigma Shift 15 42 53 71 75 82 86 90 55 98 65 46 60 41 33 93 100 51 57 63 64 68 78 83 87 92 38 4547 7073 48 5254 5962 69 7476 9597 37 43 44 7779 88 94 80 31 58 3234 39 66 72 49 85 91 99 14 13 UCL Measurements 12 11 10 9 8 LCL 7 0 10 20 30 40 50 Samples 60 70 80 90 100 Process dispersion shift data The method for generating the process dispersion shift data was the same as for the process mean shift with two major differences. The first difference is that instead of shifting the process mean by a number of standard deviations, we are changing the standard deviation to a percentage or a multiple of the baseline value. The second difference is that while in the process mean case we only had one chart, X-Bar, in the dispersion case we have three charts, range, standard deviation and variance. In the interest of symmetry, an example baseline s chart and three times the standard deviation s chart are shown below: S Chart 4 3.5 UCL S ta ndard Deviat ion 3 2.5 2 1.5 1 0.5 0 LCL 0 10 20 30 40 50 60 S ample Number 70 80 90 100 9 S Chart 6 79 85 5 6365 37 36 55 40 Standard Deviation 4 50 46 UCL 95 62 66 80 3 2 1 0 LCL 0 10 20 30 40 50 60 Sample Number 70 80 90 100 Results This section summarizes the results obtained from the data generated by the procedure given in the previous section. We will first look at the operating characteristic graphs generated by the mean shift data. Then we will examine the operating characteristic graphs generated by the process dispersion shifts. Mean shifts The first sets of results are from the process mean shifting experiment. There is one graph each showing how the X-Bar chart based on the standard deviation performs against the three distributions at different sample sizes. Then there is a comparative performance chart that shows the performance against the three distributions on one chart for three sample sizes. 10 Normal OC Chart for X-bar based on s 1 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 sigma shift 4 5 6 This chart shows the operating characteristics for a X-Bar chart based on the standard deviation with an underlying normal distribution (mean ten, standard deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. The slight “wobbles” in the lines are due to the Monte Carlo simulation. All seven sample sizes show very low rates of false alarms (they all intercept the vertical axis very near 100 percent.) 11 Uniform OC Chart for X-bar based on s 1 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 sigma shift 4 5 6 This chart shows the operating characteristics for a X-Bar chart based on the standard deviation with an underlying uniform distribution (mean ten, standard deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. Again, note that since the lines intercept the vertical axis near one the false alarm rate is very low. 12 Exponential OC Ch art fo r X-ba r bas ed on s 1 proba bility of n ot det ect ing th e s hift 0. 9 0. 8 0. 7 0. 6 0. 5 0. 4 0. 3 0. 2 0. 1 0 0 1 2 3 s ig ma s hift 4 5 6 This chart shows the operating characteristics for a X-Bar chart based on the standard deviation with an underlying exponential distribution (mean ten, standard deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. Inspecting where the lines intersect the vertical axis we see that they don’t all intersect very near to one. This means that we would expect a non-trivial false alarm rate. Viewed from this graph, we would expect about a two percent false alarm rate as opposed to well under one half of one percent for a normal distribution. 13 Comparative Performance This page shows an operating characteristic chart that gives the comparative performance of the X-Bar chart based on the standard deviation with respect to the three distributions and sample size. The different distributions, normal, uniform and exponential are denoted by different colors. The sample sizes, 2, 5 and 30 move from right to left. OC Chart for X-bar based on s 1 black = normal blue = uniform red = exponential 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 sigma shift 4 5 6 This chart shows the operating characteristic chart for the X-Bar chart based on standard deviation with three different underlying distributions for sample sizes of 2, 5 and 30. From this result it seems that the XBar chart based on standard deviation is relatively insensitive to underlying distribution. Dispersion shifts The next sets of results are from the process dispersion changing experiment. There is one graph each showing the operating characteristics of the different dispersion charts against the normal distribution at different sample sizes. Then there are three comparative performance charts that show the performance of the three dispersion charts against the three distributions for three sample sizes. 14 Range OC Chart for range chart 1 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 2 0.4 0.3 3 0.2 5 0.1 20 10 0 0 1 2 3 sigma shift 4 5 6 This first chart shows the operating characteristics of the range chart against a normal distribution of mean ten and standard distribution of one for sample sizes of 2, 3, 5, 10, and 20. Note that this chart only handles shrinking ranges for sample sizes of 10 and 20. 15 Standard Deviation OC Chart for s chart 1 0.9 probability of not detecting the shift 0.8 0.7 0.6 0.5 2 0.4 0.3 3 0.2 5 0.1 20 50 30 10 0 0 1 2 3 sigma shift 4 5 6 This first chart shows the operating characteristics of the standard deviation chart against a normal distribution of mean ten and standard distribution of one for sample sizes of 2, 3, 5, 10, 20.and 50. Note that this chart only handles shrinking standard deviations for sample sizes of 10, 20 , 30, and 50. 16 Variance O C Cha rt for s 2 ch art 1 proba bility of n ot detecting th e s hift 0. 9 0. 8 0. 7 0. 6 0. 5 2 0. 4 0. 3 3 50 0. 2 30 0. 1 20 5 10 0 0 1 2 3 s ig ma s hift 4 5 6 This first chart shows the operating characteristics of the variance chart against a normal distribution of mean ten and standard distribution of one for sample sizes of 2, 3, 5, 10, 20, 30 and 50. This chart handles shrinking variances down to a sample size of five. If the Monte Carlo simulation had a smaller sigma shift size than 0.1 this chart may have handled a sample size of three. 17 Comparative Performance The next three graphs show the comparative performance of the range, standard deviation and variance charts against the three distributions for three sample sizes. 18 Range O C for rang e ch art 1 b lack = no rma l blu e = unifo rm red = exp on ential prob ab ilit y of n ot d et ecting t he s h ift 0. 9 0. 8 0. 7 0. 6 0. 5 0. 4 0. 3 0. 2 0. 1 0 0 1 2 3 4 s igma s h ift 5 6 7 8 This graph shows the relative performance of the range chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and 20. 19 Standard Deviation OC fo r s ch art 1 blac k = normal b lu e = u niform red = e xpo nen tial proba bility of n ot det ect ing th e s hift 0. 9 0. 8 0. 7 2 0. 6 0. 5 0. 4 2 0. 3 2 5 0. 2 0. 1 20 20 20 5 5 0 0 1 2 3 4 s ig ma s hift 5 6 7 8 This graph shows the relative performance of the standard deviation chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and 20. 20 Variance OC for s 2 ch art 1 b lack = no rma l blu e = unifo rm red = exp on ential prob ab ilit y of n ot d et ecting t he s h ift 0. 9 0. 8 2 0. 7 0. 6 0. 5 2 0. 4 2 0. 3 5 0. 2 20 20 20 0. 1 5 5 0 0 1 2 3 4 s igma s h ift 5 6 7 8 This graph shows the relative performance of the variance chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and 20. 21 Conclusion This section documents the conclusions reached based on the results presented in the previous section. X-bar chart conclusion The result from the comparative performance of the X-Bar chart based on the standard deviation against the three distributions shows that the X-Bar chart based on the standard deviation is relatively insensitive to the underlying distribution. The largest differences seen in that chart are from the extreme distribution, the exponential, at low sample sizes. This leads strong credence to the belief that the Central Limit Theorem is a strong actor in the X-Bar chart. Dispersion chart conclusions The individual operating characteristic graphs show that the variance chart has the best performance for narrowing distributions and that the standard deviation graph has the best performance for widening distributions. The comparative performance graphs show that all three dispersion charts are very sensitive to the underlying distribution. 22 Appendix One: Matlab scripts Matlab scripts generated during this project. New Matlab scripts OC_curve.m subgroup_sizes = [2 5 30]; %subgroup_sizes = [2 3 5 10 20 30 50]; points = 10000; group_result = []; for subgroup_size = subgroup_sizes % subgroup_size % generate the baseline 30 points %baseline = unifrnd(10-sqrt(3),10+sqrt(3),subgroup_size,100); baseline = normrnd(10,1,subgroup_size,100); % baseline = exprnd(1,subgroup_size,100)+9; baseline_means = mean(baseline); baseline_grand_ave = mean(baseline_means'); baseline_ranges = max(baseline) - min(baseline); baseline_ave_range = mean(baseline_ranges'); baseline_stds = std(baseline); baseline_ave_std = mean(baseline_stds'); %xbar_UCL_range = baseline_grand_ave + 0.58 * baseline_ave_range; %xbar_LCL_range = baseline_grand_ave - 0.58 * baseline_ave_range; xbar_UCL_std = baseline_grand_ave + A3(subgroup_size)* baseline_ave_std; xbar_LCL_std = baseline_grand_ave - A3(subgroup_size)* baseline_ave_std; shifts = [0:.1:4 5 6]; mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [1 10 1], [subgroup_size points]); group_result = [group_result; mid_result]; mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [2 10 1], [subgroup_size points]); group_result = [group_result; mid_result]; mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [3 10 1], [subgroup_size points]); group_result = [group_result; mid_result]; end 23 % plot(shifts, group_result); for gag = [1:3:9] plot(shifts, group_result(gag,:),'k:', ... shifts,group_result(gag+1,:),'b--', ... shifts, group_result(gag+2,:),'r-'); hold on end title('OC Chart for X-bar based on s'); xlabel('sigma shift'); ylabel('probability of not detecting the shift'); legend('black = normal','blue = uniform','red = exponential',0) hold off OC_dis_curve.m subgroup_sizes = [2 5 20]; points = 10000; method = 1; dist = 1; % number of Monte Carlo points % 1 = range, 2=std, 3=var % 1 = normal, 2 = uniform, 3 = exponential group_result = []; for subgroup_size = subgroup_sizes subgroup_size if dist == 1 baseline = normrnd(10,1,subgroup_size,100); elseif dist == 2 baseline = unifrnd(10-sqrt(3),10+sqrt(3),subgroup_size,100); elseif dist == 3 baseline = exprnd(1,subgroup_size,100)+9; end if method == 1 baseline_ranges = max(baseline) - min(baseline); baseline_ave_range = mean(baseline_ranges'); UCL = baseline_ave_range * bD4(subgroup_size); LCL = baseline_ave_range * bD3(subgroup_size); elseif method == 2 baseline_stds = std(baseline); baseline_ave_std = mean(baseline_stds'); UCL = baseline_ave_std * B4(subgroup_size); LCL = baseline_ave_std * B3(subgroup_size); elseif method == 3 baseline_var = var(baseline); baseline_ave_var = mean(baseline_var); UCL = baseline_ave_var * chi2inv((1-(1-0.9974)/2),subgroup_size 1) / (subgroup_size - 1); LCL = baseline_ave_var * chi2inv((1-0.9974)/2,subgroup_size - 1) / (subgroup_size - 1); end 24 shifts = [.1:.1:3 3.5:.5:8]; mid_result = spc_dis_shifter(shifts, [LCL UCL], [1 10 1], [subgroup_size points], method); group_result = [group_result; mid_result]; mid_result = spc_dis_shifter(shifts, [LCL UCL], [2 10 1], [subgroup_size points], method); group_result = [group_result; mid_result]; mid_result = spc_dis_shifter(shifts, [LCL UCL], [3 10 1], [subgroup_size points], method); group_result = [group_result; mid_result]; end for gag = [1:3:9] plot(shifts, group_result(gag,:),'k:', ... shifts,group_result(gag+1,:),'b-', ... shifts, group_result(gag+2,:),'r--'); hold on end if method == 1 title('OC for range chart'); elseif method == 2 title('OC for s chart'); elseif method == 3 title('OC for s2 chart'); end xlabel('sigma shift'); ylabel('probability of not detecting the shift'); legend('black = normal','blue = uniform','red = exponential',0) hold off Spc_shifter.m function [result] = spc_shifter(shifts, cntl_lim, dist, sizes) % function [result] = spc_shifter(shifts, cntl_lim, dist) % % shifts = [ a row vector of how far in nominal std dev/var to shift the mean ] % cntl_lim = [LCL UCL] % dist = [ 1=normal, 2=uniform, 3=exponential mean std_dev ] % % result = [ a row vector of the Monte Carlo probability of not detecting the % shifts] [m n] = size(shifts); result = []; dist for shift = shifts 25 % generate shifted distribution if dist(1) == 1 shift_dist = normrnd(dist(2) + shift, dist(3), sizes(1), sizes(2)); elseif dist(1) == 2 shift_dist = unifrnd(dist(2)-sqrt(3*dist(3)) + shift, dist(2)+sqrt(3*dist(3)) + shift, sizes(1), sizes(2)); elseif dist(1) == 3 shift_dist = exprnd(1, sizes(1), sizes(2))+ dist(2) + shift - 1; end shift_means = mean(shift_dist); counts = sum(1*((shift_means < cntl_lim(1)) | (cntl_lim(2) < shift_means))); counts = 1 - counts/sizes(2); result = [result counts]; end Spc_dis_shifter.m function [result] = spc_dis_shifter(shifts, cntl_lim, dist, sizes, method) % function [result] = spc_dis_shifter(shifts, cntl_lim, dist) % % shifts = [ a row vector of how far in nominal std dev/var to shift the mean ] % cntl_lim = [LCL UCL] % dist = [ 1=normal, 2=uniform, 3=exponential mean std_dev ] % % result = [ a row vector of the Monte Carlo probability of not detecting the % shifts] % method = 1 = range, 2 = std, 3 = var result = []; for shift = shifts % generate shifted distribution if dist(1) == 1 shift_dist = normrnd(dist(2), shift * dist(3), sizes(1), sizes(2)); elseif dist(1) == 2 shift_dist = unifrnd(dist(2)-sqrt(3*dist(3)*shift), dist(2)+sqrt(3*dist(3)*shift), sizes(1), sizes(2)); elseif dist(1) == 3 shift_dist = exprnd(dist(3)*shift, sizes(1), sizes(2))+ dist(2) + 10 - dist(3)*shift; end if method == 1 center = max(shift_dist)-min(shift_dist); elseif method == 2 26 center = std(shift_dist); elseif method == 3 center = var(shift_dist); end counts = sum(1*((center < cntl_lim(1)) | (cntl_lim(2) < center))); counts = 1 - counts/sizes(2); result = [result counts]; end d2.m function y = d2(num) % function y = d2(num) %num r = [2:1:25 30:5:45 50:10:100]; d2m = [ 1.128, 1.693, 2.059, 2.326, ... 2.534, 2.704, 2.847, 2.970, 3.078, ... 3.173, 3.258, 3.336, 3.407, 3.472, ... 3.532, 3.588, 3.640, 3.689, 3.735, ... 3.778, 3.819, 3.858, 3.895, ... 3.931, 4.086, 4.213, 4.322, 4.415 ... 4.498, 4.639, 4.755, 4.854, 4.939, ... 5.015]; index = find (r == num); y = d2m(index); d3.m function y = d3(num) % function y = d3(num) %num r = [2:1:25 30:5:45 50:10:100]; d3m = [.8525 .8884 .8798 .8641 ... .8480 .8332 .8198 .8078 .7971 .7873 .7785 .7704 .7630 .7562 .7499 .7441 .7386 .7335 .7287 .7242 .7199 .7159 .7121 ... .7084 .6927 .6799 .6692 .6601 .6521 .6389 .6283 .6194 .6118 .6052]; ... ... ... ... ... index = find (r == num); y = d3m(index); 27 bD3.m function y = bD3(num, sigma) % function y = bD3(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end temp = num >= 7; y = temp .* (1 - sigma .* d3(num)./d2(num)); function y = D4(num, sigma) % function y = D4(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end y = 1 + sigma .* d3(num)./d2(num); A2.m function y = A2(num, sigma) % function y = A2(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end 28 y = sigma ./ (d2(num) .* sqrt(num)); A3.m function y = A3(num, sigma) % function y = A3(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end y = sigma ./ (c4(num) .* sqrt(num)); A4.m function y = A4(num, sigma) % function y = A4(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end y = sigma ./ (d4(num) .* sqrt(num)); B3.m function y = B3(num, sigma) % function y = B3(num) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end 29 if isempty(sigma) sigma = 3.0; end temp = (num >= 6); y = temp .* (1.0 - sigma .* sqrt(1 - c4(num) .* c4(num))./ c4(num)); B4.m function y = B4(num, sigma) % function y = B4(num, sigma) % % num = the sample size % sigma (opt) = the number of standard deviations to put the control % limit if nargin < 2 sigma = 3.0; end if isempty(sigma) sigma = 3.0; end y = 1.0 + sigma .* sqrt(1 - c4(num) .* c4(num))./ c4(num); c2.m function result = c2(num) % result = c2(num) % % this is the small c2 constant for SPC result = sqrt(2./num) .* gamma(num./2)./gamma((num-1)./2); c4.m function result = c4(num) % result = c4(num) % % this is the small c4 constant for SPC result = c2(num) .* sqrt(num ./(num-1)); 30 Modified Matlab scripts Xbarplot2.m function [outliers, h] = xbarplot2(data,conf,specs) %XBARPLOT X-bar chart for monitoring the mean. % XBARPLOT(DATA,CONF,SPECS) produces an xbar chart of % the grouped responses in DATA. The rows of DATA contain % replicate observations taken a a given time. The rows % should be in time order. % % CONF (optional) is the confidence level of the upper and % lower plotted confidence limits. CONF is 0.99 by default. % This means that 99% of the plotted points should fall % between the control limits. % % SPECS (optional) is a two element vector for the lower and % upper specification limits of the response. % % OUTLIERS = XBARPLOT(DATA,CONF,SPECS) returns a vector of % indices to the rows where the mean of DATA is out of control. % % [OUTLIERS, H] = XBARPLOT(DATA,CONF,SPECS) also returns a vector % of handles, H, to the plotted lines. % % % B.A. Jones 9/30/94 Copyright (c) 1993-98 by The MathWorks, Inc. $Revision: 2.6 $ $Date: 1997/11/29 01:47:10 $ if nargin < 2 conf = 0.9974; end if isempty(conf) conf = 0.9974; end [m,n] = size(data); if m > 30 m = 30; end xbar = mean(data(1:m,:)')'; avg = mean(xbar); s = sqrt(sum(sum(((data(1:m,:) - xbar(:,ones(n,1))).^2)))./(m*(n1))); tinverse = tinv(conf,m*(n-1)); UCL = avg + tinverse*s./sqrt(n-1); LCL = avg - tinverse*s./sqrt(n-1); tmp = NaN; 31 [m,n] = size(data); xbar = mean(data')'; incontrol = tmp(1,ones(1,m)); outcontrol = incontrol; greenpts = find(xbar > LCL & xbar < UCL); redpts = find(xbar <= LCL | xbar >= UCL); incontrol(greenpts) = xbar(greenpts); outcontrol(redpts) = xbar(redpts); samples = (1:m); hh = plot(samples,xbar,'k-',samples,UCL(ones(m,1),:),'r',samples,avg(ones(m,1),:),'g-',... samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',... samples,outcontrol,'r+'); if any(redpts) for k = 1:length(redpts) text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k))); end end whitebg(gcf,'white'); t1 = text(0.5,UCL,'UCL','Color','r'); t2 = text(0.5,LCL,'LCL','Color','r'); title('Xbar Chart','Color','w'); if nargin == 3 set(gca,'NextPlot','add'); LSL = specs(1); USL = specs(2); t3 = text(m + 0.5,USL,'USL','Color','r'); t4 = text(m + 0.5,LSL,'LSL','Color','r'); hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-'); set(gca,'NextPlot','replace'); hh = [hh; hh1]; end if nargout > 0 outliers = redpts; end if nargout == 2 h = hh; end set(hh([3 5 6]),'LineWidth',2); xlabel('Samples'); ylabel('Measurements'); 32 Schart2.m function [outliers, h] = schart2(data,conf,specs) %SCHART S chart for monitoring the standard deviation. % SCHART(DATA,CONF,SPECS) produces an S chart of % the grouped responses in DATA. The rows of DATA contain % replicate observations taken a a given time. The rows % must be in time order. % % CONF (optional) is the confidence level of the upper and % lower plotted confidence limits. CONF is 0.99 by default. % This means that 99% of the plotted points should fall % between the control limits. % % SPECS (optional) is a two element vector for the lower and % upper specification limits of the response. % % OUTLIERS = SCHART(DATA,CONF,SPECS) returns a vector of % indices to the rows where the standard deviation of DATA is % out of control. % % [OUTLIERS, H] = SCHART(DATA,CONF,SPECS) also returns a vector % of handles, H, to the plotted lines. % % Reference: Montgomery, Douglas, Introduction to Statistical Quality Control, John Wiley & Sons 1991 p. 235. % % % B.A. Jones 2-13-95 Copyright (c) 1993-98 by The MathWorks, Inc. $Revision: 2.6 $ $Date: 1997/11/29 01:46:43 $ if nargin < 2 conf = 0.9974; end if isempty(conf) conf = 0.9974; end ciprob = 1-(1-conf)/2; [m,n] = size(data); if m > 30 m = 30; end xbar s sbar = mean(data(1:m,:)')'; = (std(data(1:m, :)'))'; = mean(s); c4 = sqrt(2/(n-1)).*gamma(n/2)./gamma((n-1)/2); cicrit = tinv(ciprob,n-1) b3 = 1 - cicrit*sqrt(1-c4*c4)/c4; b4 = 1 + cicrit*sqrt(1-c4*c4)/c4; %chi2crit = chi2inv([(1-conf)/2 1-(1-conf)/2],n-1); %sigmaci = sbar*sqrt((n-1)./chi2crit) 33 LCL = b3*sbar; if LCL < 0, LCL = 0; end UCL = b4*sbar; tmp = NaN; [m,n] = size(data); s = std(data')'; incontrol = tmp(1,ones(1,m)); outcontrol = incontrol; greenpts = find(s > LCL & s < UCL); redpts = find(s <= LCL | s >= UCL); incontrol(greenpts) = s(greenpts); outcontrol(redpts) = s(redpts); samples = (1:m); hh = plot(samples,s,'k-',samples,UCL(ones(m,1),:),'r',samples,sbar(ones(m,1),:),'g-',... samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',... samples,outcontrol,'r+'); if any(redpts) for k = 1:length(redpts) text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k))); end end whitebg(gcf,'white'); t1 = text(0.5,UCL,'UCL','Color','k'); t2 = text(0.5,LCL,'LCL','Color','k'); title('S Chart','Color','k'); if nargin == 3 set(gca,'NextPlot','add'); LSL = specs(1); USL = specs(2); t3 = text(m + 0.5,USL,'USL','Color','k'); t4 = text(m + 0.5,LSL,'LSL','Color','k'); hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-'); set(gca,'NextPlot','replace'); hh = [hh; hh1]; end if nargout > 0 outliers = redpts; end if nargout == 2 h = hh; end 34 set(hh([3 5 6]),'LineWidth',2); xlabel('Sample Number'); ylabel('Standard Deviation'); S2chart.m function [outliers, h] = s2chart(data,conf,specs) %S2CHART S2 chart for monitoring the variance. % SCHART(DATA,CONF,SPECS) produces an S chart of % the grouped responses in DATA. The rows of DATA contain % replicate observations taken a a given time. The rows % must be in time order. % % CONF (optional) is the confidence level of the upper and % lower plotted confidence limits. CONF is 0.99 by default. % This means that 99% of the plotted points should fall % between the control limits. % % SPECS (optional) is a two element vector for the lower and % upper specification limits of the response. % % OUTLIERS = SCHART(DATA,CONF,SPECS) returns a vector of % indices to the rows where the standard deviation of DATA is % out of control. % % [OUTLIERS, H] = SCHART(DATA,CONF,SPECS) also returns a vector % of handles, H, to the plotted lines. % % Reference: Montgomery, Douglas, Introduction to Statistical Quality Control, John Wiley & Sons 1991 p. 235. % % % B.A. Jones 2-13-95 Copyright (c) 1993-98 by The MathWorks, Inc. $Revision: 2.6 $ $Date: 1997/11/29 01:46:43 $ if nargin < 2 conf = 0.9973; end if isempty(conf) conf = 0.9973; end ciprob = 1-(1-conf)/2; [m,n] = size(data); if m > 30 m = 30; end xbar = mean(data(1:m,:)')'; s2 = (var(data(1:m, :)'))'; s2bar = mean(s2); %c4 = sqrt(2/(n-1)).*gamma(n/2)./gamma((n-1)/2); 35 %cicrit = tinv(ciprob,n-1); %b3 = 1 - cicrit*sqrt(1-c4*c4)/c4; %b4 = 1 + cicrit*sqrt(1-c4*c4)/c4; %chi2crit = chi2inv([(1-conf)/2 1-(1-conf)/2],n-1); %sigmaci = sbar*sqrt((n-1)./chi2crit) %LCL = b3*sbar; LCL = s2bar * chi2inv((1-conf)/2,n - 1) / (n - 1); if LCL < 0, LCL = 0; end %UCL = b4*sbar; UCL = s2bar * chi2inv(ciprob,n - 1) / (n - 1); tmp = NaN; [m,n] = size(data); s = std(data')'; incontrol = tmp(1,ones(1,m)); outcontrol = incontrol; greenpts = find(s > LCL & s < UCL); redpts = find(s <= LCL | s >= UCL); incontrol(greenpts) = s(greenpts); outcontrol(redpts) = s(redpts); samples = (1:m); hh = plot(samples,s,'k-',samples,UCL(ones(m,1),:),'r',samples,s2bar(ones(m,1),:),'g-',... samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',... samples,outcontrol,'r+'); if any(redpts) for k = 1:length(redpts) text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k))); end end whitebg(gcf,'white'); t1 = text(0.5,UCL,'UCL','Color','k'); t2 = text(0.5,LCL,'LCL','Color','k'); title('S2 Chart','Color','k'); if nargin == 3 set(gca,'NextPlot','add'); LSL = specs(1); USL = specs(2); t3 = text(m + 0.5,USL,'USL','Color','k'); t4 = text(m + 0.5,LSL,'LSL','Color','k'); hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-'); set(gca,'NextPlot','replace'); hh = [hh; hh1]; end 36 if nargout > 0 outliers = redpts; end if nargout == 2 h = hh; end set(hh([3 5 6]),'LineWidth',2); xlabel('Sample Number'); ylabel('Variance'); Appendix 2: Special result Professor Tsao specifically asked about the performance of the X-Bar based on range and X-Bar based on the standard deviation with an underlying exponential distribution. The following chart was generated to answer his question. It shows the operating characteristic curve for both charts against an exponential distribution at sample sizes of two, five and twenty. OC Chart for X-bar based on range vs based on std dev 1 black = range red = std dev probability of not detecting the shift 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 20 0 0.5 1 5 1.5 sigma shift 2 2 2.5 3 Note that there is no significance in performance between the range and standard deviation based X-Bar charts. 37