Introduction
This paper documents the results of a large number of Monte Carlo simulations exploring the behavior of
different statistical control charts for variable data. The experiments were designed to determine whether
these charts perform significantly differently against three common probability distributions: the normal,
the uniform and the exponential.
This paper consists of five sections and an appendix. This first section, the introduction, provides an
overview of the techniques used during this project and also gives the reader the overall format of this
paper. The next section states the purpose and goals of this effort. The third section describes
the procedure by which the data and results were generated. The fourth section presents
the results. The fifth section contains the conclusions drawn from those results. The appendix
contains listings of the software tools used to generate the results.
This introduction discusses the general use of statistical control charts for variable data, details the mean
and dispersion charts we explored in this project, gives examples of the three data distributions we worked
with and explains the use of operating characteristic charts to compare the power of different control charts.
The next paragraph gives a brief discussion of the general use of control charts.
The charts
There are many different types of charts used in statistical process control, such as X-bar, Range, Standard
Deviation, Variance, XmR, u, c, and p. Each of these charts is useful for monitoring a specific type of data.
In general, data can be categorized in two ways. Variable data comes from physical measurements that
have a continuous range, such as length or weight. Attribute data is usually binary (such as pass/fail), a
count (such as number of defects), or a rate (such as number of defects per inspected item). This paper
focuses on the statistical process control charts that handle variable rather than attribute data.
There are two general categories of variable data charts. These charts focus on a measure of the underlying
process’s central tendency or a measure of the underlying process’s dispersion. Both types will be
discussed in the following paragraphs.
Central tendency charts
First, we will discuss the charts that handle the central tendency of the process. Although there are several
possible measures of central tendency (mean, median, and mode), the statistical process control community uses the
mean in almost all cases.
The X-Bar chart is the chart that works with the mean of a given process’s variable data. The simplest X-Bar
chart consists of a centerline and two limit lines. The centerline denotes the computed average of the
measurements being controlled (more on this later). The area between the two limit lines is where most of
the plotted points should fall. In the typical statistical process control application, the limit lines are placed
three standard deviations on either side of the centerline, so one would expect 99.73% of the points from an
in-control, normally distributed process to fall within them. A
sample X-Bar chart is shown below. The green line in the center is the estimate of the process
mean. The upper and lower red lines are the upper and lower control limits. In this example a single point,
number 44, is out of control.
In order to compute the proper values for the limit lines, we need the standard deviation of the process’s
measurements. Since there is no way to determine the population standard deviation without exhaustively
measuring the entire population of the process under examination, we need a good estimate for the
population standard deviation.
The statistical process community uses two different methods of obtaining this estimate. The first is based
on the range of the process measurements. The second is based on the standard deviation of the process
measurements. Obtaining the estimate from the range is operationally simple, but the accuracy of the
estimate suffers when large sample sizes are used. Estimates based on the standard deviation
are somewhat more difficult to compute, but become more and more accurate as more samples are taken.
This paper addresses X-Bar charts made by using the sample standard deviation to approximate the
population standard deviation. While exploring both methods was technically feasible, resource limitations
forced the experimenter to choose a single method. (During the presentation Professor Tsao specifically
asked about the relative performance of the X-Bar chart based on range versus the X-Bar chart based on the
standard deviation, so a single chart was created and is included as the second appendix.)
[Figure: Xbar Chart. Measurements (vertical axis, 8 to 12) plotted against Samples (horizontal axis, 0 to 100), with the UCL and LCL marked; point 44 is flagged as out of control.]
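The X-Bar limits described above can be sketched in a few lines of Matlab. This is a minimal illustration rather than the project's script: the subgroup size n, the number of baseline subgroups k and all variable names are assumptions, and the baseline is drawn with the base function randn instead of a toolbox generator.

```matlab
% Sketch: X-Bar control limits from the average subgroup standard deviation.
% n, k and every variable name here are illustrative assumptions.
n = 5;                              % subgroup (sample) size
k = 100;                            % number of baseline subgroups
baseline = 10 + 1 * randn(n, k);    % one subgroup per column, mean 10, std dev 1

xbar  = mean(baseline);             % subgroup means (1-by-k)
s     = std(baseline);              % subgroup standard deviations (1-by-k)
grand = mean(xbar);                 % centerline
sbar  = mean(s);                    % average subgroup standard deviation

% c4 removes the bias of s as an estimator of sigma for normal data
c4 = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2);
A3 = 3 / (c4 * sqrt(n));            % three-sigma factor for the subgroup mean

UCL = grand + A3 * sbar;
LCL = grand - A3 * sbar;
flagged = find(xbar > UCL | xbar < LCL);   % indices of out-of-control points
fprintf('center = %.3f, LCL = %.3f, UCL = %.3f, flagged = %d\n', ...
        grand, LCL, UCL, numel(flagged));
```

For n = 5 the factor A3 evaluates to about 1.427, which matches the commonly tabulated value.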
Dispersion charts
Next, we discuss the charts that handle the dispersion of the process. Again, there are several possible
measures of the dispersion: range, standard deviation and variance; the statistical process community uses
the range and standard deviation in almost all cases.
This project explored control charts based on all three measures of dispersion. Charts using the range are
commonly used because they are simple to construct and work well when small sample sizes are used.
Standard deviation charts are a bit more complex to construct but are thought to have better performance
with large sample sizes. Standard deviation charts also handle variable sample sizes. We also used control
charts based on the variance. Variance control charts are a bit more complex to construct than standard
deviation charts, but it was thought that they would perform better against non-normal distributions.
An example s chart is shown on the next page. In this example the process’s average standard deviation is
given in green. The upper control limit is given in red. In this instance, the lower control limit is zero, so it
is the horizontal axis. This example does not have any points out of control.
The distributions
In a perfect world, measuring the output of a process would yield exactly the same number every time.
The product would be precisely the correct size and there would be no noise in the measurement.
Unfortunately, many natural and artificial events occur during a process. These events blur the exact size
and precise measurement we would like to have. When one looks at a large number of these blurred
measurements, an underlying structure becomes clear. These underlying structures, determined by the gross
effects of these events, are probability distributions.
One nice thing about a process’s underlying probability distribution is that it strongly reflects the process’s
overall environment. A sharp change in the environment will change the measurements, and these
measurements, in turn, will form a different distribution. For example, if a given process is running well and
only subject to natural variations, its underlying distribution will typically be a normal distribution with a
characteristic mean and standard deviation. However, if some part of the process slowly wears out, it is
very likely that the mean of the underlying distribution will change. So by monitoring characteristics of a
process’s underlying distribution, one can detect changes in the process itself.
This project focused on three probability distributions: the normal, the uniform and the exponential.
The most common distribution is the normal distribution. The normal distribution occurs
when there are a large number of unknown events occurring, each of which has a small overall impact on
the process. The normal distribution has a well-known bell shape. The uniform distribution is flat and
occurs when a single event has an equal chance of being in any one of a number of states. The
roll of a single die with an equal chance of being one through six is a good example of a (discrete) uniform
distribution. The exponential distribution describes the time between events that occur independently of one
another at a constant average rate. It is strongly skewed, with most values falling below the mean.
Examples of these distributions are shown in the next three pages.
[Figure: S Chart. Standard Deviation (vertical axis, 0 to 4) plotted against Sample Number (horizontal axis, 0 to 100), with the UCL marked and the LCL at 0.]
The Normal Distribution
[Figure, left panel: Normal distribution mean = 10, std dev = 1. Right panel: Central Limit Theorem for 10000 samples of size 10: Normal Distribution. Both panels plot Number of counts against Units (6 to 14).]
The graph on the left is a histogram of 10000 samples of data drawn from a normal distribution with a mean of ten and a standard deviation of one. The graph on
the right is a histogram of the means of 10000 samples of size ten drawn from the same distribution. The most likely normal distribution fitted to the data is also given
in red. Both charts are on the same scale. They illustrate one consequence of the central limit theorem: as the number of measurements grouped into each sample
increases, the standard deviation of the sample means decreases.
The Uniform Distribution
[Figure, left panel: Uniform distribution mean = 10, std dev = 1. Right panel: Central Limit Theorem for 10000 samples of size 10: Uniform Distribution. Both panels plot Number of counts against Value (7 to 13).]
The graph on the left is a histogram of 10000 samples taken from a uniform distribution with a mean of 10 and a standard deviation of 1. The uniform
distribution is “flat”: every value between the minimum and maximum has an equal chance of occurring. The graph on the right is a histogram of the means of
10000 samples of size ten drawn from the same distribution. The most likely normal distribution fitted to the data is given in red. Both charts are on the same
horizontal scale, but they have different vertical scales. These graphs illustrate the main behavior of the central limit theorem. While the samples are taken from a
flat distribution, as they are grouped together, the distribution of their means approaches a normal distribution. This implies that, with large enough sample sizes, the
means of samples from any underlying distribution with finite variance can be treated as approximately normal. While this makes intuitive sense with relatively benign, symmetric distributions like the normal and the
uniform distributions, the next graphs show similar results using the highly non-symmetric exponential distribution.
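A uniform distribution on the interval [a, b] has standard deviation (b - a)/sqrt(12), so a mean of 10 and a standard deviation of 1 correspond to endpoints of 10 - sqrt(3) and 10 + sqrt(3), the values used in the appendix scripts. The short sketch below checks this numerically; the variable names are illustrative assumptions and the base function rand stands in for a toolbox generator.

```matlab
% Sketch: endpoints of a uniform distribution with mean 10 and std dev 1.
mu = 10;
sigma = 1;
a = mu - sqrt(3) * sigma;           % lower endpoint
b = mu + sqrt(3) * sigma;           % upper endpoint
theory_std = (b - a) / sqrt(12);    % standard deviation of U(a, b)

x = a + (b - a) * rand(1, 100000);  % 100000 draws from U(a, b)
fprintf('theory std = %.4f, sample mean = %.4f, sample std = %.4f\n', ...
        theory_std, mean(x), std(x));
```

Note that the half-width scales linearly with the standard deviation: doubling sigma doubles sqrt(3) * sigma.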
Exponential
[Figure, left panel: Exponential distribution, mean 10, std dev = 1. Right panel: Central Limit Theorem for 10000 samples of size 10: Exponential Distribution. Both panels plot Number of counts (0 to 5000) against Value (9 to 20).]
The graph on the left is a histogram of 10000 samples taken from an exponential distribution shifted to have a mean of 10 and a standard deviation of one. The graph on the
right is a histogram of the means of 10000 samples of size ten drawn from the same distribution. The most likely normal distribution fitted to the data is given in red.
Both charts are on the same scale. The exponential distribution is highly non-symmetric, with both the median and the mode below the mean. Intuitively
there is nothing in the left histogram that would indicate a “bump” forming around the mean of ten. However, the distribution generated by taking groups of ten
samples is becoming much more symmetric with a decided peak around ten.
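The behavior in these histograms can be reproduced with a short sketch. It assumes the same shifted exponential used in the appendix scripts (mean 10, standard deviation 1) but generates it by inverse-transform sampling with the base function rand, and it computes skewness by hand. The helper name skew and the other variable names are illustrative assumptions.

```matlab
% Sketch: sample means of a skewed distribution become more symmetric.
points = 10000;                     % number of samples
n = 10;                             % measurements grouped into each sample

% exponential with mean 1 via inverse transform, shifted to mean 10
raw = -log(rand(n, points)) + 9;    % n-by-points, one sample per column
single = raw(1, :);                 % 10000 individual measurements
means  = mean(raw);                 % 10000 means of samples of size ten

% moment-based skewness (zero for a symmetric distribution)
skew = @(x) mean((x - mean(x)).^3) / std(x, 1)^3;

fprintf('individuals: std = %.3f, skewness = %.2f\n', std(single), skew(single));
fprintf('means of %d: std = %.3f, skewness = %.2f\n', n, std(means), skew(means));
```

The standard deviation of the means should come out near 1/sqrt(10), and their skewness should be well below that of the individual measurements.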
Operating Characteristics Charts
Since part of this project was to compare the “power” or “performance” of the various control charts under
different circumstances, we needed some way to measure their characteristics. We chose to do this by
comparing the different charts’ operating characteristics.
An operating characteristic chart shows how likely a given method is to miss a real change in the process as a
function of how big the change is. For example, if a process changes by ten percent and our detection method
only reacts to changes of about 99 percent, the method is not very likely to detect
the change. The same method would do much better if the change were 100 percent. An example
chart is given below:
[Figure: OC Chart for X-bar based on s. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6).]
The horizontal axis gives how large a change there is in the process. In this case the change is measured in
how many standard deviations the process mean changed. The vertical axis shows how likely the chart was
to miss the change. In this example there are seven lines denoting the characteristics of this chart for sample
sizes of 2, 3, 5, 10, 20, 30 and 50, working from right to left. Looking at the blue (rightmost) line we see that
with a sample size of only two, this control chart would need more than a two standard deviation shift before it had
a 50 percent chance of detecting the change. With a sample size of 50 (the black, leftmost line) the same chart
would need less than eight tenths of a standard deviation change for a 50 percent chance.
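For comparison with the simulated curves, an idealized operating characteristic can be written in closed form. The sketch below assumes normal data, a known process standard deviation and exact three-sigma limits, which is a simplification of the experiment (the project estimated its limits from baseline data). Under those assumptions the probability of missing a shift of delta standard deviations with sample size n is Phi(3 - delta*sqrt(n)) - Phi(-3 - delta*sqrt(n)). The names Phi, beta and oc are illustrative.

```matlab
% Sketch: idealized OC curves for an X-Bar chart with known sigma.
Phi  = @(z) 0.5 * erfc(-z / sqrt(2));          % standard normal CDF
beta = @(delta, n) Phi(3 - delta .* sqrt(n)) - Phi(-3 - delta .* sqrt(n));

shifts = [0:0.1:4 5 6];                        % shifts in standard deviations
sizes  = [2 3 5 10 20 30 50];                  % sample sizes
oc = zeros(numel(sizes), numel(shifts));
for i = 1:numel(sizes)
    oc(i, :) = beta(shifts, sizes(i));         % one OC curve per sample size
end

% shift at which the chance of missing the change is about 50 percent
delta50 = 3 ./ sqrt(sizes);
for i = 1:numel(sizes)
    fprintf('n = %2d: 50 percent point near %.2f sigma\n', sizes(i), delta50(i));
end
```

Under these assumptions the 50 percent point is about 2.12 standard deviations for a sample size of two and about 0.42 for a sample size of 50, consistent with the description of the example chart.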
Goal
The goal of this project was to explore how the different variable-data control charts perform given non-normal
distributions and to compare the performance of the different dispersion charts.
To meet the first portion of this goal, we looked at how the standard deviation-based X-Bar chart handled the three
different distributions discussed above as a function of sample size.
To meet the second portion of the goal, we then looked at how each of the dispersion charts handled the
normal distribution as a function of sample size. Finally, we looked at how each of the dispersion charts
handled each of the distributions as a function of sample size.
The results of these experiments are given in the body of this report.
Procedure
This section describes how the project’s data and results were generated. We will discuss how the data and
results for the mean and standard deviation shifts were obtained.
Process Mean Data
The overall data and result generation process consisted of four steps. For each sample size we would first
generate a set of baseline data. This step mimicked the normal collection of thirty good data points from
which to establish the process’s mean and its upper and lower control limits. An example of the baseline data is
given below:
[Figure: Xbar Chart. Measurements (vertical axis, 8 to 12) plotted against Samples (horizontal axis, 0 to 100), with the UCL and LCL marked; point 44 is flagged as out of control.]
We started by using thirty points to compute these parameters but soon changed to 100. At thirty points, a
“false alarm” in the baseline would significantly skew the remainder of the data. For example, the graph
above shows the calculated process mean to be about 10.2. The random number generator that we used to
generate this data was set for a mean of 10. Moving to 100 points corrected this skew whether or not there
were “false alarm” points.
After we had computed the process mean and the upper and lower control limits, we would then generate 10000
points of experimental data for each process shift. Some experimenting was needed to determine which
shifts would generate a reasonable OC chart. We settled on shifts in steps of 0.1 standard deviation from 0 to
4.0, followed by single points at 5.0 and 6.0 standard deviations. An example of a three-sigma process mean
shift is given below.
Finally, at each shift we would compare each of the 10000 points of experimental data against the upper
and lower control limits established for the baseline data. The operating characteristic measurement for that
shift was determined by dividing the number of out-of-control points by 10000 and subtracting the result
from one.
[Figure: Xbar Chart: Normal Distribution with a 3 Sigma Shift. Measurements (vertical axis, 7 to 15) plotted against Samples (horizontal axis, 0 to 100), with the UCL and LCL marked; the flagged out-of-control points all lie between samples 31 and 100.]
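The four steps above can be condensed into the following sketch for a single sample size and a normal distribution. It is not the project's OC_curve.m: it uses the base function randn, and the variable names (n, points, oc and so on) are illustrative assumptions.

```matlab
% Sketch: Monte Carlo operating characteristic of an X-Bar chart (normal data).
n = 5;                                  % sample (subgroup) size
points = 10000;                         % experimental points per shift

% steps 1 and 2: baseline of 100 subgroups, then centerline and limits
baseline = 10 + randn(n, 100);
c4 = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2);
A3 = 3 / (c4 * sqrt(n));
center = mean(mean(baseline));
sbar = mean(std(baseline));
LCL = center - A3 * sbar;
UCL = center + A3 * sbar;

% steps 3 and 4: shift the mean, then count the points inside the limits
shifts = [0:0.1:4 5 6];
oc = zeros(size(shifts));
for i = 1:numel(shifts)
    data = 10 + shifts(i) + randn(n, points);   % shifted process
    m = mean(data);                             % plotted subgroup means
    oc(i) = sum(m >= LCL & m <= UCL) / points;  % fraction of shifts missed
end
plot(shifts, oc);
xlabel('sigma shift');
ylabel('probability of not detecting the shift');
```

Counting the points inside the limits and dividing by the number of points is the same as subtracting the out-of-control fraction from one.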
Process dispersion shift data
The method for generating the process dispersion shift data was the same as for the process mean shift with
two major differences. The first difference is that instead of shifting the process mean by a number of
standard deviations, we are changing the standard deviation to a percentage or a multiple of the baseline
value. The second difference is that while in the process mean case we only had one chart, X-Bar, in the
dispersion case we have three charts, range, standard deviation and variance.
In the interest of symmetry, an example baseline s chart and an s chart with three times the baseline standard
deviation are shown below:
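[Figure: S Chart (baseline). Standard Deviation (vertical axis, 0 to 4) plotted against Sample Number (horizontal axis, 0 to 100), with the UCL marked and the LCL at 0.]
[Figure: S Chart (three times the baseline standard deviation). Standard Deviation (vertical axis, 0 to 6) plotted against Sample Number (horizontal axis, 0 to 100), with the UCL marked and the LCL at 0; a number of points between samples 36 and 95 are flagged as out of control.]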
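The dispersion experiment can be sketched the same way for the s chart. This is an illustration under assumptions rather than the project's script: it handles only the normal distribution, uses the base function randn, clamps the lower factor B3 at zero with max instead of a sample-size test, and all variable names are hypothetical.

```matlab
% Sketch: Monte Carlo operating characteristic of an s chart (normal data).
n = 10;                                 % sample (subgroup) size
points = 10000;                         % experimental points per scale factor

c4 = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2);
B3 = max(0, 1 - 3 * sqrt(1 - c4^2) / c4);   % lower limit factor, never negative
B4 = 1 + 3 * sqrt(1 - c4^2) / c4;           % upper limit factor

sbar = mean(std(10 + randn(n, 100)));   % baseline average standard deviation
LCL = B3 * sbar;
UCL = B4 * sbar;

scales = [0.1:0.1:3 3.5:0.5:8];         % multiples of the baseline std dev
oc = zeros(size(scales));
for i = 1:numel(scales)
    data = 10 + scales(i) * randn(n, points);   % mean fixed, spread changed
    s = std(data);                              % plotted subgroup std devs
    oc(i) = sum(s >= LCL & s <= UCL) / points;  % fraction of changes missed
end
fprintf('B3 = %.3f, B4 = %.3f, miss rate at scale 1 = %.4f\n', B3, B4, oc(10));
```

For a sample size of ten the factors come out near the commonly tabulated values of 0.284 and 1.716. With the lower limit above zero, the chart can flag a narrowing distribution as well as a widening one.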
Results
This section summarizes the results obtained from the data generated by the procedure given in the
previous section. We will first look at the operating characteristic graphs generated by the mean shift data.
Then we will examine the operating characteristic graphs generated by the process dispersion shifts.
Mean shifts
The first sets of results are from the process mean shifting experiment. There is one graph each showing
how the X-Bar chart based on the standard deviation performs against the three distributions at different
sample sizes. Then there is a comparative performance chart that shows the performance against the three
distributions on one chart for three sample sizes.
Normal
[Figure: OC Chart for X-bar based on s. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6).]
This chart shows the operating characteristics for an X-Bar chart based on the standard deviation with an underlying normal distribution (mean ten, standard
deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. The slight “wobbles” in the lines are due to the Monte Carlo simulation. All seven sample sizes show
very low rates of false alarms (they all intercept the vertical axis very near 100 percent).
Uniform
[Figure: OC Chart for X-bar based on s. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6).]
This chart shows the operating characteristics for an X-Bar chart based on the standard deviation with an underlying uniform distribution (mean ten, standard
deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. Again, note that since the lines intercept the vertical axis near one, the false alarm rate is very low.
Exponential
[Figure: OC Chart for X-bar based on s. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6).]
This chart shows the operating characteristics for an X-Bar chart based on the standard deviation with an underlying exponential distribution (mean ten, standard
deviation one) at sample sizes of 2, 3, 5, 10, 20, 30, and 50. Inspecting where the lines intersect the vertical axis, we see that they do not all intersect very near to
one. This means that we would expect a non-trivial false alarm rate. Judging from this graph, we would expect about a two percent false alarm rate, as opposed to
well under one half of one percent for a normal distribution.
Comparative Performance
This page shows an operating characteristic chart that gives the comparative performance of the X-Bar
chart based on the standard deviation with respect to the three distributions and sample size. The different
distributions, normal, uniform and exponential are denoted by different colors. The sample sizes, 2, 5 and
30 move from right to left.
[Figure: OC Chart for X-bar based on s. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6). Legend: black = normal, blue = uniform, red = exponential.]
This chart shows the operating characteristics of the X-Bar chart based on standard deviation with
three different underlying distributions for sample sizes of 2, 5 and 30. From this result it seems that the X-Bar
chart based on standard deviation is relatively insensitive to the underlying distribution.
Dispersion shifts
The next sets of results are from the process dispersion changing experiment. There is one graph each
showing the operating characteristics of the different dispersion charts against the normal distribution at
different sample sizes. Then there are three comparative performance charts that show the performance of
the three dispersion charts against the three distributions for three sample sizes.
Range
[Figure: OC Chart for range chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6); the curves are labeled with sample sizes 2, 3, 5, 10 and 20.]
This first chart shows the operating characteristics of the range chart against a normal distribution of mean ten and standard deviation of one for sample sizes
of 2, 3, 5, 10, and 20. Note that this chart only detects shrinking ranges for sample sizes of 10 and 20. For sample sizes below seven the lower control limit is zero, so a
narrowing distribution can never fall below it.
Standard Deviation
[Figure: OC Chart for s chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6); the curves are labeled with sample sizes 2, 3, 5, 10, 20, 30 and 50.]
This chart shows the operating characteristics of the standard deviation chart against a normal distribution of mean ten and standard deviation of one for
sample sizes of 2, 3, 5, 10, 20, 30 and 50. Note that this chart only detects shrinking standard deviations for sample sizes of 10, 20, 30, and 50. For sample sizes
below six the lower control limit is zero.
Variance
[Figure: OC Chart for s2 chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 6); the curves are labeled with sample sizes 2, 3, 5, 10, 20, 30 and 50.]
This chart shows the operating characteristics of the variance chart against a normal distribution of mean ten and standard deviation of one for sample
sizes of 2, 3, 5, 10, 20, 30 and 50. This chart detects shrinking variances down to a sample size of five. If the Monte Carlo simulation had used a sigma shift step
smaller than 0.1, this chart might also have shown detection at a sample size of three.
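One reason the variance chart can react to narrowing distributions at small sample sizes is that its chi-square based lower limit is never exactly zero. The sketch below tabulates the limit factors used in the appendix (coverage 0.9974) as multiples of the average baseline variance. To stay within base Matlab it assumes the identity chi2inv(p, v) = 2 * gammaincinv(p, v/2). The helper name chi2q and the other variable names are illustrative.

```matlab
% Sketch: chi-square limit factors for the variance chart.
alpha = 1 - 0.9974;                       % total tail probability
chi2q = @(p, v) 2 * gammaincinv(p, v / 2);   % chi-square quantile, v deg. freedom

sizes = [2 3 5 10 20 30 50];              % sample sizes
dof = sizes - 1;                          % degrees of freedom
lower = chi2q(alpha / 2, dof) ./ dof;     % LCL as a multiple of average variance
upper = chi2q(1 - alpha / 2, dof) ./ dof; % UCL as a multiple of average variance

for i = 1:numel(sizes)
    fprintf('n = %2d: LCL factor = %.6f, UCL factor = %.3f\n', ...
            sizes(i), lower(i), upper(i));
end
```

The lower factor is positive for every sample size but becomes very small as the sample size drops, so only large reductions in variance fall below it.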
Comparative Performance
The next three graphs show the comparative performance of the range, standard deviation and variance
charts against the three distributions for three sample sizes.
Range
[Figure: OC for range chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 8). Legend: black = normal, blue = uniform, red = exponential.]
This graph shows the relative performance of the range chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and 20.
Standard Deviation
[Figure: OC for s chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 8); the curves are labeled with sample sizes 2, 5 and 20. Legend: black = normal, blue = uniform, red = exponential.]
This graph shows the relative performance of the standard deviation chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and
20.
Variance
[Figure: OC for s2 chart. Probability of not detecting the shift (vertical axis, 0 to 1) plotted against sigma shift (horizontal axis, 0 to 8); the curves are labeled with sample sizes 2, 5 and 20. Legend: black = normal, blue = uniform, red = exponential.]
This graph shows the relative performance of the variance chart against the normal, uniform and exponential distributions for sample sizes of 2, 5 and 20.
Conclusion
This section documents the conclusions reached based on the results presented in the previous section.
X-bar chart conclusion
The comparative performance results for the X-Bar chart based on the standard deviation against the
three distributions show that this chart is relatively insensitive to the
underlying distribution. The largest differences seen in that chart are from the extreme distribution, the
exponential, at low sample sizes. This lends strong credence to the belief that the Central Limit Theorem is
a strong factor in the X-Bar chart.
Dispersion chart conclusions
The individual operating characteristic graphs show that the variance chart has the best performance for
narrowing distributions and that the standard deviation graph has the best performance for widening
distributions.
The comparative performance graphs show that all three dispersion charts are very sensitive to the
underlying distribution.
Appendix One: Matlab scripts
Matlab scripts generated during this project.
New Matlab scripts
OC_curve.m
subgroup_sizes = [2 5 30];
%subgroup_sizes = [2 3 5 10 20 30 50];
points = 10000;                 % number of Monte Carlo points per shift
group_result = [];
for subgroup_size = subgroup_sizes
    %subgroup_size
    % generate the baseline: 100 subgroups, one per column
    %baseline = unifrnd(10-sqrt(3),10+sqrt(3),subgroup_size,100);
    baseline = normrnd(10,1,subgroup_size,100);
    %baseline = exprnd(1,subgroup_size,100)+9;
    baseline_means = mean(baseline);
    baseline_grand_ave = mean(baseline_means');
    baseline_ranges = max(baseline) - min(baseline);
    baseline_ave_range = mean(baseline_ranges');
    baseline_stds = std(baseline);
    baseline_ave_std = mean(baseline_stds');
    %xbar_UCL_range = baseline_grand_ave + 0.58 * baseline_ave_range;
    %xbar_LCL_range = baseline_grand_ave - 0.58 * baseline_ave_range;
    xbar_UCL_std = baseline_grand_ave + A3(subgroup_size) * baseline_ave_std;
    xbar_LCL_std = baseline_grand_ave - A3(subgroup_size) * baseline_ave_std;
    shifts = [0:.1:4 5 6];
    % one row of results per distribution: 1 = normal, 2 = uniform, 3 = exponential
    mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [1 10 1], ...
                             [subgroup_size points]);
    group_result = [group_result; mid_result];
    mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [2 10 1], ...
                             [subgroup_size points]);
    group_result = [group_result; mid_result];
    mid_result = spc_shifter(shifts, [xbar_LCL_std xbar_UCL_std], [3 10 1], ...
                             [subgroup_size points]);
    group_result = [group_result; mid_result];
end
%plot(shifts, group_result);
% rows gag, gag+1, gag+2 hold the three distributions for one sample size
for gag = [1:3:9]
    plot(shifts, group_result(gag,:),'k:', ...
         shifts, group_result(gag+1,:),'b--', ...
         shifts, group_result(gag+2,:),'r-');
    hold on
end
title('OC Chart for X-bar based on s');
xlabel('sigma shift');
ylabel('probability of not detecting the shift');
legend('black = normal','blue = uniform','red = exponential','Location','best')
hold off
OC_dis_curve.m
subgroup_sizes = [2 5 20];
points = 10000;     % number of Monte Carlo points
method = 1;         % 1 = range, 2 = std, 3 = var
dist = 1;           % 1 = normal, 2 = uniform, 3 = exponential
group_result = [];
for subgroup_size = subgroup_sizes
    subgroup_size
    % generate the baseline: 100 subgroups, one per column
    if dist == 1
        baseline = normrnd(10,1,subgroup_size,100);
    elseif dist == 2
        baseline = unifrnd(10-sqrt(3),10+sqrt(3),subgroup_size,100);
    elseif dist == 3
        baseline = exprnd(1,subgroup_size,100)+9;
    end
    % control limits for the chosen dispersion chart
    if method == 1
        baseline_ranges = max(baseline) - min(baseline);
        baseline_ave_range = mean(baseline_ranges');
        UCL = baseline_ave_range * bD4(subgroup_size);
        LCL = baseline_ave_range * bD3(subgroup_size);
    elseif method == 2
        baseline_stds = std(baseline);
        baseline_ave_std = mean(baseline_stds');
        UCL = baseline_ave_std * B4(subgroup_size);
        LCL = baseline_ave_std * B3(subgroup_size);
    elseif method == 3
        baseline_var = var(baseline);
        baseline_ave_var = mean(baseline_var);
        UCL = baseline_ave_var * chi2inv(1-(1-0.9974)/2, subgroup_size - 1) ...
              / (subgroup_size - 1);
        LCL = baseline_ave_var * chi2inv((1-0.9974)/2, subgroup_size - 1) ...
              / (subgroup_size - 1);
    end
    shifts = [.1:.1:3 3.5:.5:8];
    % one row of results per distribution: 1 = normal, 2 = uniform, 3 = exponential
    mid_result = spc_dis_shifter(shifts, [LCL UCL], [1 10 1], ...
                                 [subgroup_size points], method);
    group_result = [group_result; mid_result];
    mid_result = spc_dis_shifter(shifts, [LCL UCL], [2 10 1], ...
                                 [subgroup_size points], method);
    group_result = [group_result; mid_result];
    mid_result = spc_dis_shifter(shifts, [LCL UCL], [3 10 1], ...
                                 [subgroup_size points], method);
    group_result = [group_result; mid_result];
end
for gag = [1:3:9]
    plot(shifts, group_result(gag,:),'k:', ...
         shifts, group_result(gag+1,:),'b-', ...
         shifts, group_result(gag+2,:),'r--');
    hold on
end
if method == 1
    title('OC for range chart');
elseif method == 2
    title('OC for s chart');
elseif method == 3
    title('OC for s2 chart');
end
xlabel('sigma shift');
ylabel('probability of not detecting the shift');
legend('black = normal','blue = uniform','red = exponential','Location','best')
hold off
Spc_shifter.m
function [result] = spc_shifter(shifts, cntl_lim, dist, sizes)
% function [result] = spc_shifter(shifts, cntl_lim, dist, sizes)
%
% shifts = [ a row vector of how far in nominal std dev to shift the mean ]
% cntl_lim = [LCL UCL]
% dist = [ 1=normal, 2=uniform, 3=exponential mean std_dev ]
% sizes = [ subgroup size, number of Monte Carlo points ]
%
% result = [ a row vector of the Monte Carlo probability of not
%            detecting the shifts ]
result = [];
dist
for shift = shifts
    % generate shifted distribution, one subgroup per column
    if dist(1) == 1
        shift_dist = normrnd(dist(2) + shift, dist(3), sizes(1), sizes(2));
    elseif dist(1) == 2
        % a half-width of sqrt(3)*std_dev gives the requested standard deviation
        shift_dist = unifrnd(dist(2)-sqrt(3)*dist(3) + shift, ...
                             dist(2)+sqrt(3)*dist(3) + shift, sizes(1), sizes(2));
    elseif dist(1) == 3
        % unit-mean exponential moved so that its mean is dist(2) + shift
        shift_dist = exprnd(1, sizes(1), sizes(2)) + dist(2) + shift - 1;
    end
    shift_means = mean(shift_dist);
    % count the subgroup means outside the control limits
    counts = sum(1*((shift_means < cntl_lim(1)) | (cntl_lim(2) < shift_means)));
    counts = 1 - counts/sizes(2);
    result = [result counts];
end
Spc_dis_shifter.m
function [result] = spc_dis_shifter(shifts, cntl_lim, dist, sizes, method)
% function [result] = spc_dis_shifter(shifts, cntl_lim, dist, sizes, method)
%
% shifts = [ a row vector of multiples of the nominal std dev ]
% cntl_lim = [LCL UCL]
% dist = [ 1=normal, 2=uniform, 3=exponential mean std_dev ]
% sizes = [ subgroup size, number of Monte Carlo points ]
% method = 1 = range, 2 = std, 3 = var
%
% result = [ a row vector of the Monte Carlo probability of not
%            detecting the shifts ]
result = [];
for shift = shifts
    % generate the distribution with its standard deviation scaled by shift
    if dist(1) == 1
        shift_dist = normrnd(dist(2), shift * dist(3), sizes(1), sizes(2));
    elseif dist(1) == 2
        % a half-width of sqrt(3)*sigma gives a uniform with std dev sigma
        shift_dist = unifrnd(dist(2)-sqrt(3)*dist(3)*shift, ...
                             dist(2)+sqrt(3)*dist(3)*shift, sizes(1), sizes(2));
    elseif dist(1) == 3
        % exponential with std dev dist(3)*shift, moved so its mean stays at dist(2)
        shift_dist = exprnd(dist(3)*shift, sizes(1), sizes(2)) + dist(2) ...
                     - dist(3)*shift;
    end
    % dispersion statistic of each subgroup
    if method == 1
        center = max(shift_dist)-min(shift_dist);
    elseif method == 2
        center = std(shift_dist);
    elseif method == 3
        center = var(shift_dist);
    end
    % count the subgroup statistics outside the control limits
    counts = sum(1*((center < cntl_lim(1)) | (cntl_lim(2) < center)));
    counts = 1 - counts/sizes(2);
    result = [result counts];
end
d2.m
function y = d2(num)
% function y = d2(num)
%num
r = [2:1:25 30:5:45 50:10:100];
d2m = [ 1.128, 1.693, 2.059, 2.326, ...
2.534, 2.704, 2.847, 2.970, 3.078, ...
3.173, 3.258, 3.336, 3.407, 3.472, ...
3.532, 3.588, 3.640, 3.689, 3.735, ...
3.778, 3.819, 3.858, 3.895, ...
3.931, 4.086, 4.213, 4.322, 4.415 ...
4.498, 4.639, 4.755, 4.854, 4.939, ...
5.015];
index = find (r == num);
y = d2m(index);
d3.m
function y = d3(num)
% function y = d3(num)
%num
r = [2:1:25 30:5:45 50:10:100];
d3m = [.8525 .8884 .8798 .8641 ...
.8480 .8332 .8198 .8078 .7971 ...
.7873 .7785 .7704 .7630 .7562 ...
.7499 .7441 .7386 .7335 .7287 ...
.7242 .7199 .7159 .7121 ...
.7084 .6927 .6799 .6692 .6601 ...
.6521 .6389 .6283 .6194 .6118 ...
.6052];
index = find (r == num);
y = d3m(index);
bD3.m
function y = bD3(num, sigma)
% function y = bD3(num)
%
% num = the sample size
% sigma (opt) = the number of standard deviations to put the control
%               limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
temp = num >= 7;
y = temp .* (1 - sigma .* d3(num)./d2(num));
bD4.m
function y = bD4(num, sigma)
% function y = bD4(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations to put the control
%               limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
y = 1 + sigma .* d3(num)./d2(num);
A2.m
function y = A2(num, sigma)
% function y = A2(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations at which to put the
%               control limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
y = sigma ./ (d2(num) .* sqrt(num));
A3.m
function y = A3(num, sigma)
% function y = A3(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations at which to put the
%               control limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
y = sigma ./ (c4(num) .* sqrt(num));
A4.m
function y = A4(num, sigma)
% function y = A4(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations at which to put the
%               control limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
y = sigma ./ (d4(num) .* sqrt(num));
B3.m
function y = B3(num, sigma)
% function y = B3(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations at which to put the
%               control limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
temp = (num >= 6);
y = temp .* (1.0 - sigma .* sqrt(1 - c4(num) .* c4(num))./ c4(num));
B4.m
function y = B4(num, sigma)
% function y = B4(num, sigma)
%
% num = the sample size
% sigma (opt) = the number of standard deviations at which to put the
%               control limit
if nargin < 2
sigma = 3.0;
end
if isempty(sigma)
sigma = 3.0;
end
y = 1.0 + sigma .* sqrt(1 - c4(num) .* c4(num))./ c4(num);
c2.m
function result = c2(num)
% result = c2(num)
%
% this is the small c2 constant for SPC
result = sqrt(2./num) .* gamma(num./2)./gamma((num-1)./2);
c4.m
function result = c4(num)
% result = c4(num)
%
% this is the small c4 constant for SPC
result = c2(num) .* sqrt(num ./(num-1));
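For reference, the helper functions listed above can be summarized in closed form. This is only a restatement of the listings, not an independent derivation. The notation is assumed here: n stands for num, k stands for sigma (3 by default), and d_2(n), d_3(n) are the tabulated values in d2.m and d3.m.

```latex
\begin{align*}
c_2(n) &= \sqrt{\frac{2}{n}}\,\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}, &
c_4(n) &= c_2(n)\sqrt{\frac{n}{n-1}}
        = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}, \\
A_2 &= \frac{k}{d_2(n)\sqrt{n}}, &
A_3 &= \frac{k}{c_4(n)\sqrt{n}}, \\
D_3 &= 1 - k\,\frac{d_3(n)}{d_2(n)} \quad (\text{set to } 0 \text{ for } n < 7), &
D_4 &= 1 + k\,\frac{d_3(n)}{d_2(n)}, \\
B_3 &= 1 - k\,\frac{\sqrt{1 - c_4(n)^2}}{c_4(n)} \quad (\text{set to } 0 \text{ for } n < 6), &
B_4 &= 1 + k\,\frac{\sqrt{1 - c_4(n)^2}}{c_4(n)}.
\end{align*}
```

As a quick check, with k = 3 and n = 5 the table gives d_2 = 2.326, so A_2 = 3 / (2.326 * sqrt(5)), which is approximately 0.577.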
Modified Matlab scripts
Xbarplot2.m
function [outliers, h] = xbarplot2(data,conf,specs)
%XBARPLOT2 X-bar chart for monitoring the mean.
%   XBARPLOT2(DATA,CONF,SPECS) produces an xbar chart of
%   the grouped responses in DATA. The rows of DATA contain
%   replicate observations taken at a given time. The rows
%   should be in time order.
%
%   CONF (optional) is the confidence level of the upper and
%   lower plotted confidence limits. CONF is 0.9974 by default.
%   This is the fraction of the plotted points that should fall
%   between the control limits.
%
%   SPECS (optional) is a two element vector for the lower and
%   upper specification limits of the response.
%
%   OUTLIERS = XBARPLOT2(DATA,CONF,SPECS) returns a vector of
%   indices to the rows where the mean of DATA is out of control.
%
%   [OUTLIERS, H] = XBARPLOT2(DATA,CONF,SPECS) also returns a vector
%   of handles, H, to the plotted lines.
%
%   B.A. Jones 9/30/94
%   Copyright (c) 1993-98 by The MathWorks, Inc.
%   $Revision: 2.6 $ $Date: 1997/11/29 01:47:10 $
if nargin < 2
conf = 0.9974;
end
if isempty(conf)
conf = 0.9974;
end
[m,n] = size(data);
if m > 30
m = 30;
end
xbar = mean(data(1:m,:)')';
avg = mean(xbar);
s = sqrt(sum(sum(((data(1:m,:) - xbar(:,ones(n,1))).^2)))./(m*(n-1)));
tinverse = tinv(conf,m*(n-1));
UCL = avg + tinverse*s./sqrt(n-1);
LCL = avg - tinverse*s./sqrt(n-1);
tmp = NaN;
[m,n] = size(data);
xbar = mean(data')';
incontrol = tmp(1,ones(1,m));
outcontrol = incontrol;
greenpts = find(xbar > LCL & xbar < UCL);
redpts = find(xbar <= LCL | xbar >= UCL);
incontrol(greenpts) = xbar(greenpts);
outcontrol(redpts) = xbar(redpts);
samples = (1:m);
hh = plot(samples,xbar,'k-',samples,UCL(ones(m,1),:),'r',samples,avg(ones(m,1),:),'g-',...
samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',...
samples,outcontrol,'r+');
if any(redpts)
for k = 1:length(redpts)
text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k)));
end
end
whitebg(gcf,'white');
t1 = text(0.5,UCL,'UCL','Color','r');
t2 = text(0.5,LCL,'LCL','Color','r');
title('Xbar Chart','Color','w');
if nargin == 3
set(gca,'NextPlot','add');
LSL = specs(1);
USL = specs(2);
t3 = text(m + 0.5,USL,'USL','Color','r');
t4 = text(m + 0.5,LSL,'LSL','Color','r');
hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-');
set(gca,'NextPlot','replace');
hh = [hh; hh1];
end
if nargout > 0
outliers = redpts;
end
if nargout == 2
h = hh;
end
set(hh([3 5 6]),'LineWidth',2);
xlabel('Samples');
ylabel('Measurements');
Schart2.m
function [outliers, h] = schart2(data,conf,specs)
%SCHART2 S chart for monitoring the standard deviation.
%   SCHART2(DATA,CONF,SPECS) produces an S chart of
%   the grouped responses in DATA. The rows of DATA contain
%   replicate observations taken at a given time. The rows
%   must be in time order.
%
%   CONF (optional) is the confidence level of the upper and
%   lower plotted confidence limits. CONF is 0.9974 by default.
%   This is the fraction of the plotted points that should fall
%   between the control limits.
%
%   SPECS (optional) is a two element vector for the lower and
%   upper specification limits of the response.
%
%   OUTLIERS = SCHART2(DATA,CONF,SPECS) returns a vector of
%   indices to the rows where the standard deviation of DATA is
%   out of control.
%
%   [OUTLIERS, H] = SCHART2(DATA,CONF,SPECS) also returns a vector
%   of handles, H, to the plotted lines.
%
%   Reference: Montgomery, Douglas, Introduction to Statistical
%   Quality Control, John Wiley & Sons 1991 p. 235.
%
%   B.A. Jones 2-13-95
%   Copyright (c) 1993-98 by The MathWorks, Inc.
%   $Revision: 2.6 $ $Date: 1997/11/29 01:46:43 $
if nargin < 2
conf = 0.9974;
end
if isempty(conf)
conf = 0.9974;
end
ciprob = 1-(1-conf)/2;
[m,n] = size(data);
if m > 30
m = 30;
end
xbar = mean(data(1:m,:)')';
s = (std(data(1:m, :)'))';
sbar = mean(s);
c4 = sqrt(2/(n-1)).*gamma(n/2)./gamma((n-1)/2);
cicrit = tinv(ciprob,n-1);
b3 = 1 - cicrit*sqrt(1-c4*c4)/c4;
b4 = 1 + cicrit*sqrt(1-c4*c4)/c4;
%chi2crit = chi2inv([(1-conf)/2 1-(1-conf)/2],n-1);
%sigmaci = sbar*sqrt((n-1)./chi2crit)
LCL = b3*sbar;
if LCL < 0, LCL = 0; end
UCL = b4*sbar;
tmp = NaN;
[m,n] = size(data);
s = std(data')';
incontrol = tmp(1,ones(1,m));
outcontrol = incontrol;
greenpts = find(s > LCL & s < UCL);
redpts = find(s <= LCL | s >= UCL);
incontrol(greenpts) = s(greenpts);
outcontrol(redpts) = s(redpts);
samples = (1:m);
hh = plot(samples,s,'k-',samples,UCL(ones(m,1),:),'r',samples,sbar(ones(m,1),:),'g-',...
samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',...
samples,outcontrol,'r+');
if any(redpts)
for k = 1:length(redpts)
text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k)));
end
end
whitebg(gcf,'white');
t1 = text(0.5,UCL,'UCL','Color','k');
t2 = text(0.5,LCL,'LCL','Color','k');
title('S Chart','Color','k');
if nargin == 3
set(gca,'NextPlot','add');
LSL = specs(1);
USL = specs(2);
t3 = text(m + 0.5,USL,'USL','Color','k');
t4 = text(m + 0.5,LSL,'LSL','Color','k');
hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-');
set(gca,'NextPlot','replace');
hh = [hh; hh1];
end
if nargout > 0
outliers = redpts;
end
if nargout == 2
h = hh;
end
set(hh([3 5 6]),'LineWidth',2);
xlabel('Sample Number');
ylabel('Standard Deviation');
S2chart.m
function [outliers, h] = s2chart(data,conf,specs)
%S2CHART S2 chart for monitoring the variance.
%   S2CHART(DATA,CONF,SPECS) produces an S2 chart of
%   the grouped responses in DATA. The rows of DATA contain
%   replicate observations taken at a given time. The rows
%   must be in time order.
%
%   CONF (optional) is the confidence level of the upper and
%   lower plotted confidence limits. CONF is 0.9973 by default.
%   This is the fraction of the plotted points that should fall
%   between the control limits.
%
%   SPECS (optional) is a two element vector for the lower and
%   upper specification limits of the response.
%
%   OUTLIERS = S2CHART(DATA,CONF,SPECS) returns a vector of
%   indices to the rows where the variance of DATA is
%   out of control.
%
%   [OUTLIERS, H] = S2CHART(DATA,CONF,SPECS) also returns a vector
%   of handles, H, to the plotted lines.
%
%   Reference: Montgomery, Douglas, Introduction to Statistical
%   Quality Control, John Wiley & Sons 1991 p. 235.
%
%   B.A. Jones 2-13-95
%   Copyright (c) 1993-98 by The MathWorks, Inc.
%   $Revision: 2.6 $ $Date: 1997/11/29 01:46:43 $
if nargin < 2
conf = 0.9973;
end
if isempty(conf)
conf = 0.9973;
end
ciprob = 1-(1-conf)/2;
[m,n] = size(data);
if m > 30
m = 30;
end
xbar = mean(data(1:m,:)')';
s2 = (var(data(1:m, :)'))';
s2bar = mean(s2);
%c4 = sqrt(2/(n-1)).*gamma(n/2)./gamma((n-1)/2);
%cicrit = tinv(ciprob,n-1);
%b3 = 1 - cicrit*sqrt(1-c4*c4)/c4;
%b4 = 1 + cicrit*sqrt(1-c4*c4)/c4;
%chi2crit = chi2inv([(1-conf)/2 1-(1-conf)/2],n-1);
%sigmaci = sbar*sqrt((n-1)./chi2crit)
%LCL = b3*sbar;
LCL = s2bar * chi2inv((1-conf)/2,n - 1) / (n - 1);
if LCL < 0, LCL = 0; end
%UCL = b4*sbar;
UCL = s2bar * chi2inv(ciprob,n - 1) / (n - 1);
tmp = NaN;
[m,n] = size(data);
s = var(data')';  % per-subgroup variances, in the same units as the variance limits
incontrol = tmp(1,ones(1,m));
outcontrol = incontrol;
greenpts = find(s > LCL & s < UCL);
redpts = find(s <= LCL | s >= UCL);
incontrol(greenpts) = s(greenpts);
outcontrol(redpts) = s(redpts);
samples = (1:m);
hh = plot(samples,s,'k-',samples,UCL(ones(m,1),:),'r',samples,s2bar(ones(m,1),:),'g-',...
samples,LCL(ones(m,1),:),'r-',samples,incontrol,'b+',...
samples,outcontrol,'r+');
if any(redpts)
for k = 1:length(redpts)
text(redpts(k) + 0.5,outcontrol(redpts(k)),num2str(redpts(k)));
end
end
whitebg(gcf,'white');
t1 = text(0.5,UCL,'UCL','Color','k');
t2 = text(0.5,LCL,'LCL','Color','k');
title('S2 Chart','Color','k');
if nargin == 3
set(gca,'NextPlot','add');
LSL = specs(1);
USL = specs(2);
t3 = text(m + 0.5,USL,'USL','Color','k');
t4 = text(m + 0.5,LSL,'LSL','Color','k');
hh1 = plot(samples,LSL(ones(m,1),:),'g',samples,USL(ones(m,1),:),'g-');
set(gca,'NextPlot','replace');
hh = [hh; hh1];
end
if nargout > 0
outliers = redpts;
end
if nargout == 2
h = hh;
end
set(hh([3 5 6]),'LineWidth',2);
xlabel('Sample Number');
ylabel('Variance');
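The control limits computed in s2chart can be written compactly as follows. This is a restatement of the listing, using notation assumed here: alpha = 1 - conf, n is the subgroup size, \overline{s^2} is s2bar, and \chi^2_{p,\,n-1} denotes the p-quantile returned by chi2inv(p, n-1).

```latex
\[
\mathrm{LCL} = \overline{s^2}\,\frac{\chi^2_{\alpha/2,\,n-1}}{n-1},
\qquad
\mathrm{UCL} = \overline{s^2}\,\frac{\chi^2_{1-\alpha/2,\,n-1}}{n-1}
\]
```

These limits rest on (n-1)S^2/sigma^2 following a chi-square distribution with n-1 degrees of freedom, which holds for normally distributed data. For the uniform and exponential cases it is only an approximation.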
Appendix 2: Special result
Professor Tsao specifically asked about the performance of the X-Bar chart based on the range and the X-Bar
chart based on the standard deviation when the underlying distribution is exponential. The following chart
was generated to answer his question. It shows the operating characteristic curves for both charts against an
exponential distribution at sample sizes of two, five and twenty.
[Figure: OC Chart for X-bar based on range vs based on std dev. Horizontal axis: sigma shift (0 to 3).
Vertical axis: probability of not detecting the shift (0 to 1). Black = range, red = std dev. Curves are
labeled with sample sizes 2, 5 and 20.]
Note that there is no significant difference in performance between the range and standard deviation based
X-Bar charts.
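The quantity on the vertical axis can be estimated by simple counting. The expression below assumes the same approach as the dispersion-chart listing in Appendix One (the lines that compute counts): N simulated subgroups are drawn from the shifted distribution and T_j is the plotted statistic for subgroup j. The symbols N, T_j, delta and \hat{\beta} are notation introduced here, not taken from the scripts.

```latex
\[
\hat{\beta}(\delta) \;=\; 1 - \frac{1}{N}\sum_{j=1}^{N}
\mathbf{1}\!\left[\, T_j < \mathrm{LCL} \ \text{or}\ T_j > \mathrm{UCL} \,\right]
\]
```

Here delta is the size of the shift in units of sigma, so \hat{\beta}(\delta) is the fraction of simulated subgroups that stay inside the control limits after the shift.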