SMU EMIS 7364
NTU TO-570-N
Statistical Quality Control
Dr. Jerrell T. Stracener, SAE Fellow

Analysis of Process Capability
Statistical Analysis
Updated: 1/28/04

The Situation
In many situations, our knowledge is limited to the information that can be obtained from data that have been obtained or that will be obtained.

The Problem
The challenge is to obtain the maximum information from the data and to arrive at the most accurate conclusions.

Nature of Data
Most data are characterized by variation rather than being deterministic, due to variation in
• Processes and materials
• Product/Manufacturing
• Inspection & Measurement
• Operation
• Environment
• etc.

Need
Methods and techniques are needed for analysis of data that account for
• Variation in the data
• Uncertainty in conclusions

Statistics
• Statistics is the science of analyzing data and drawing conclusions
• Statistical methods and techniques provide tools for:
- experimental design
- analysis of data
- making inferences

Example
The number of defects per inspected PC-X, based on a random sample of 15 from a day's production, is:
1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1
(i) Analyze these data and present your results.
(ii) Estimate the probability that a randomly selected PC will have at least 3 defects.

Example
Twenty-five customers selected at random were asked to rate their overall satisfaction, a measure of quality, with the PC-X. Five factors were ranked by each selected customer. Each factor was assigned a rank between 1 and 10, with 10 indicating the highest level of customer satisfaction. The ratings were averaged over the five factors for each customer. The results are:
7.7 5.5 9.3 6.5 7.5
5.2 7.7 8.5 6.0 8.8
6.2 8.6 7.1 8.0 7.9
7.8 5.9 9.6 8.3 6.7
7.6 6.9 7.3 9.1 7.8
(i) Analyze the survey data and present your results.
(ii) Estimate the proportion of the customer population whose average satisfaction rating is at least 9.

Example Solution
a. X = number of defects per…
Since X represents a count, it is a discrete random variable.
(i) Sample mean = $\bar{x}$ = 15/15 = 1
Sample mode = 1
Sample median = $x_{0.5}$ = 1
Ratio of mean to median = 1/1 = 1
Sample range = $X_{max} - X_{min}$ = 3 - 0 = 3
Sample variance = $s^2$ = 0.714286
Sample standard deviation = $s$ = 0.845154

Example Solution
[Figure: histogram of the number of defects; horizontal axis: x, vertical axis: frequency]

Example Solution
The sample could be from a Poisson distribution with probability mass function
$p(x) = \frac{\mu^x e^{-\mu}}{x!}$ for x = 0, 1, 2, ...
Here we estimate $\mu$ as $\hat{\mu} = \bar{x} = 1$, so that
$\hat{p}(x) = \frac{e^{-1}}{x!}$ for x = 0, 1, 2, ...

Example Solution
Then we compare $\hat{p}(x)$ with the discrete relative frequency distribution as follows:

x    frequency f    f/n        $\hat{p}(x)$
0    4              0.26667    0.36788
1    8              0.53333    0.36788
2    2              0.13333    0.18394
3    1              0.06667    0.06131
4    0              0          0.01533

Example Solution
(ii) We can estimate $P(X \ge 3)$ from the sample relative frequency:
$\hat{P}(X \ge 3) = \frac{1}{15} = 0.067$
or by using the Poisson distribution:
$\hat{P}(X \ge 3) = 1 - \hat{P}(X \le 2) = 1 - [\hat{p}(0) + \hat{p}(1) + \hat{p}(2)] = 0.0803$

Example Solution
b. (i) Sample data analysis indicates that the sample may be from a normal distribution, $N(\mu, \sigma)$. Estimates of $\mu$ and $\sigma$ are
$\hat{\mu} = \bar{x} = 7.5$ and $\hat{\sigma} = \sqrt{\frac{n-1}{n}}\, s = 1.1562$

Example Solution
Data analysis:
$\bar{x}$ = 7.5
$x_{0.5}$ = 7.7
$\bar{x}/x_{0.5}$ = 0.974
R = 4.4
$s^2$ = 1.3925
$s$ = 1.18004

Example Solution
[Figure: histogram of the customer satisfaction ratings; horizontal axis: x, vertical axis: frequency]

Example Solution
(ii) By using the Normal distribution:
$\hat{P}(X \ge 9) = P\left(Z \ge \frac{9 - 7.5}{1.1562}\right) = P(Z \ge 1.2973) \approx 1 - 0.901475 = 0.098525$
(The value 0.901475 is the standard normal table entry for z = 1.29; evaluating at z = 1.2973 directly gives approximately 0.0973.)
Or, by using the Binomial distribution:
$\hat{P}(X \ge 9) = \frac{\#\,\text{values} \ge 9}{n} = \frac{3}{25} = 0.12$

• Basic Concepts
• Analysis of Location, or Central Tendency
• Analysis of Variability
• Analysis of Shape

Population vs. Sample
Population: the total of all possible values (measurements, counts, etc.) of a particular characteristic for a specific group of objects.
Sample: a part of a population selected according to some rule or plan.
Why sample?
- Population does not exist
- Sampling and testing is destructive

Sampling
Characteristics that distinguish one type of sample from another:
• the manner in which the sample was obtained
• the purpose for which the sample was obtained

Types of Samples
Simple Random Sample
The sample $X_1, X_2, \ldots, X_n$ is a random sample if $X_1, X_2, \ldots, X_n$ are independent, identically distributed random variables.
Remark: Each value in the population has an equal and independent chance of being included in the sample.

Analysis of Data
• Data represents the entire population
Statistical analysis is primarily descriptive.
• Data represents a sample from the population
Statistical analysis
- describes the sample
- provides information about the population

Analysis of Location or Central Tendency
• Sample (Arithmetic) Mean
• Sample Midrange
• Sample Mode
• Sample Median
• Sample Percentiles

Sample Mean
• Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• Remarks:
Most frequently used statistic
Easy to understand
May be misleading due to extreme values

Sample Mode
• Definition: Most frequently occurring value in the sample
• Remarks:
A sample may have more than one mode
The mode may not be a central value
Not well understood, nor frequently used

Sample Median
• Formula:
$x_{0.5} = \begin{cases} x_k, & \text{if } n \text{ is odd and } k = (n+1)/2 \\ \frac{x_k + x_{k+1}}{2}, & \text{if } n \text{ is even and } k = n/2 \end{cases}$
where the sample values $X_1, X_2, \ldots, X_n$ are arranged in numerical order
• Remarks:
Not well understood, nor accepted
All sample data does not appear to be utilized
Not affected by extreme values

Analysis of Variability
• Sample Range
• Sample Variance
• Sample Standard Deviation
• Sample Coefficient of Variation

Sample Range
• Formula: $R = X_{max} - X_{min}$
where $X_{max}$ is the largest value in the sample and $X_{min}$ is the smallest sample value
• Remarks:
Easy to determine
Easily understood
Determined by extreme values
Does not use all sample data

Sample Variance & Standard Deviation
• Sample Variance
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$
• Sample Standard Deviation
$s = (\text{sample variance})^{1/2}$
• Remarks:
Most frequently used measure of variability
Not well understood

Sample Coefficient of Variation
• Sample Coefficient of Variation
$CV_s = \frac{s}{\bar{x}}$
• Remarks:
Relative measure of variation
Used for comparing the variation in two samples of data that are measured in two different units

Analysis of Shape
• Skewness
• Kurtosis

Estimate of Skewness
$x_r = \frac{\bar{x}}{x_{0.5}}$
For a unimodal distribution, $x_r$ is an indicator of distribution shape:
$x_r \ll 1$ indicates skewed to the left
$x_r \approx 1$ indicates symmetric
$x_r \gg 1$ indicates skewed to the right

Comparison of Distribution Skewness
• Normal: $\sqrt{\beta_1} = 0$
• Exponential: $\sqrt{\beta_1} = 2$

Estimation of Skewness
• Estimate of skewness of a distribution from a random sample:
$\sqrt{\hat{\beta}_1} = \sqrt{b_1} = \frac{m_3}{m_2^{3/2}}$
where
$m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$, $m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Estimation of Kurtosis
• Estimate of kurtosis of a distribution ($\beta_2$) from a random sample:
$\hat{\beta}_2 = b_2 = \frac{m_4}{m_2^{2}}$
where
$m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$, $m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Comparison of Kurtosis
[Figure: density curves f(x) compared; $\beta_2 = 3$ (normal distribution), $\beta_2 = 1.8$ (uniform distribution)]

Presentation of Data
• Time Series Graph or Run Chart
• Stem-and-Leaf Plot
• Digidot Plot
• Box Plot
• Frequency Distribution
• Histogram and Relative Frequency

Time Series Graph or Run Chart
• A plot of the data set $x_1, x_2, \ldots, x_n$ in the order in which the data were obtained
• Used to detect trends or patterns in the data over time

Stem-and-Leaf Plots
• A quick way to obtain an informative visual representation of the set of data $x_1, x_2, \ldots, x_n$ for which each $x_i$ consists of at least two digits
• Steps for constructing a stem-and-leaf display
(1) Select one or more leading digits for the stem values. The trailing digits become the leaves.
(2) List possible stem values in a vertical column.
(3) Record the leaf for every observation beside the corresponding stem value.
(4) Indicate the units for stems and leaves someplace in the display.
• The stem-and-leaf display does not take the time order of the observed data into account

Stem-and-Leaf Plot - Example
Here are test scores for 25 students:
69, 55, 80, 95, 94, 98, 51, 70, 93, 57, 62, 52, 52,
58, 61, 51, 64, 67, 78, 68, 69, 68, 96, 73, 71
The first step is to place the numbers in order from least to greatest:
51, 51, 52, 52, 55, 57, 58, 61, 62, 64, 67, 68, 68,
69, 69, 70, 71, 73, 78, 80, 93, 94, 95, 96, 98

Stem-and-Leaf Plot - Example
Now create the graph:

Test Scores
5 | 1122578
6 | 12478899
7 | 0138
8 | 0
9 | 34568

The numbers on the left side of the vertical line are the stems. The numbers on the right side are the leaves. In this graph, the stems are the tens digits and the leaves are the units digits. In this case, 9|3 represents a score of 93.

Digidot Plot
A combination of the time series graph with the stem-and-leaf display

Box Plot
• A pictorial summary used to describe the most prominent statistical features of the data set $x_1, x_2, \ldots, x_n$, including its:
- Center or location
- Spread or variability
- Extent and nature of any deviation from symmetry
- Identification of 'outliers'

Box Plot
• Shows only certain statistics rather than all the data, namely
- median
- quartiles
- smallest and greatest values in the distribution
• Immediate visuals of a box plot are the center, the spread, and the overall range of the distribution

Box Plot
Given the following random sample of size 25:
38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16, 22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98
Arranged in order from least to greatest:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36, 37, 38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98

Box Plot
• First, find the median, the value exactly in the middle of an ordered set of numbers.
The median is 37.
• Next, we consider only the values to the left of the median:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36
• We now find the median of this set of numbers. The median for this group is (22 + 25)/2 = 23.5, which is the lower quartile.

Box Plot
• Now consider the values to the right of the median:
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
The median for this set is (60 + 86)/2 = 73, which is the upper quartile.
• We are now ready to find the interquartile range (IQR), which is the difference between the upper and lower quartiles: 73 - 23.5 = 49.5.

Box Plot
The lower quartile is 23.5
The median is 37
The upper quartile is 73
The interquartile range is 49.5
[Figure: box plot on a scale from 0 to 100, marking the lower extreme, lower quartile, median, upper quartile, and upper extreme]

Histogram
A graph of the observed frequencies in the data set $x_1, x_2, \ldots, x_n$ versus data magnitude, to visually indicate its statistical properties, including
- shape
- location or central tendency
- scatter or variability

Guidelines for Constructing Histograms
• If the data $x_1, x_2, \ldots, x_n$ are from a discrete random variable with possible values $y_1, y_2, \ldots, y_k$, count the number of occurrences of each value of y and associate the frequency $f_i$ with $y_i$, for i = 1, …, k

Guidelines for Constructing Histograms
• If the data $x_1, x_2, \ldots, x_n$ are from a continuous random variable
- select the number of intervals or cells, r, to be a number between 3 and 20; as an initial value use $r = \sqrt{n}$, where n is the number of observations
- establish r intervals of equal width, starting just below the smallest value of x
- count the number of values of x within each interval to obtain the frequency associated with each interval
- construct the graph by plotting $(f_i, i)$ for i = 1, 2, …, r

Statistical Process Control - Histograms
Possible answers for a Cliff-like histogram
• Hiding data that should be outside the specification
• Supplier is screening the product before shipment
• Lower specification is a physical limit like zero thickness, but this is not normally the case
[Figure: cliff-like histogram between the lower and upper specification limits]

Statistical Process Control - Histograms
Possible answers for a Bimodal histogram
• Two primary sources of process variation
• The process is stable, but it has experienced a large shift during the time the data were collected
[Figure: bimodal histogram between the lower and upper specification limits]

Statistical Process Control - Histograms
Possible answers for a Comb-like histogram
• Insufficient data collected
• Too many classes displayed
• Process is unstable
• Process is stable but is multimodal
[Figure: comb-like histogram between the lower and upper specification limits]

Statistical Process Control - Histograms
Possible answers for a Skewed histogram
• May be the natural result of the process
• For a machined part, the equipment may be losing tolerance or tools may be wearing out
• The process is shifting slowly to the side with the long tail
[Figure: skewed histogram between the lower and upper specification limits]

Statistical Process Control - Histograms
By including specification limits on a histogram, the amount of data that falls outside of the specification limits can be easily seen.
[Figure: histogram of frequency versus the measured characteristic, with lower and upper specification limits marked]

Probability Plotting
• Data are plotted on special graph paper designed for a particular distribution
- Normal
- Weibull
- Lognormal
- Exponential
• If the assumed model is adequate, the plotted points will tend to fall in a straight line
• If the model is inadequate, the plot will not be linear and the type & extent of departures can be seen
• Once a model appears to fit the data reasonably well, percentiles and parameters can be estimated from the plot

Probability Plotting Procedure
• Step 1: Obtain special graph paper, known as probability paper, designed for the distribution under examination. Weibull, Lognormal and Normal paper are available at: http://www.weibull.com/GPaper/index.htm
• Step 2: Rank the sample values from smallest to largest in magnitude, i.e., $X_1 \le X_2 \le \ldots \le X_n$.
Probability Plotting General Procedure
• Step 3: Plot the $X_i$'s on the paper versus
$\hat{F}(x_i) = 100 \cdot \frac{i - 0.3}{n + 0.4}$ or $\hat{F}(x_i) = \frac{i - 0.3}{n + 0.4}$,
depending on whether the marked axis on the paper refers to the % or the proportion of observations. The axis of the graph paper on which the $X_i$'s are plotted will be referred to as the observational scale, and the axis for $\hat{F}(x_i) = 100 \cdot \frac{i - 0.3}{n + 0.4}$ as the cumulative scale.
• Step 4: If a straight line appears to fit the data, draw a line on the graph, 'by eye'.
• Step 5: Estimate the model parameters from the graph.

Weibull Probability Plotting Paper
If $T \sim W(\beta, \theta)$, the cumulative probability distribution function is
$F(t) = 1 - e^{-(t/\theta)^{\beta}}$
We now need to linearize this function into the form y = ax + b:

Weibull Probability Plotting Paper
Then
$\ln[1 - F(t)] = \ln e^{-(t/\theta)^{\beta}}$
$\ln[1 - F(t)] = -\left(\frac{t}{\theta}\right)^{\beta}$
$\ln\{-\ln[1 - F(t)]\} = \beta \ln\left(\frac{t}{\theta}\right)$
$\ln \ln \frac{1}{1 - F(t)} = \beta \ln t - \beta \ln \theta$
which is the equation of a straight line of the form y = ax + b,

Weibull Probability Plotting Paper
where
$y = \ln \ln \frac{1}{1 - F(t)}$, $a = \beta$, $x = \ln t$ and $b = -\beta \ln \theta$, i.e.,

Weibull Probability Plotting Paper
$y = \beta x - \beta \ln \theta$,
which is a linear equation with a slope of $\beta$ and an intercept of $-\beta \ln \theta$. Now the x- and y-axes of the Weibull probability plotting paper can be constructed. The x-axis is simply logarithmic, since $x = \ln t$, and the y-axis follows
$y = \ln \ln \frac{1}{1 - F(t)}$

Weibull Probability Plotting Paper
[Figure: Weibull probability plotting paper; vertical axis: cumulative probability (in %), horizontal axis: x]

Probability Plotting - example
To illustrate the process, let 10, 20, 30, 40, 50, and 80 be a random sample of size n = 6.

Probability Plotting - example
We need estimates of $F(x)$ corresponding to each of the sample values in order to plot the data on the Weibull probability paper. These estimates are obtained with what are called median ranks.

Probability Plotting - example
Median ranks represent the 50% confidence level ("best guess") estimate for the true value of F(t), based on the total sample size and the order number (first, second, etc.) of the data.
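
As a hedged illustration of Steps 2 to 5, the sketch below computes the plotting positions (i - 0.3)/(n + 0.4) for the sample 10, 20, 30, 40, 50, 80 and applies the Weibull linearization, with x = ln t and y = ln ln[1/(1 - F)]. It swaps the 'by eye' line of Step 4 for an ordinary least-squares line, which is an assumption of this sketch and not the slides' procedure; the function names are hypothetical.

```python
import math

def median_ranks(n):
    """Plotting positions (i - 0.3) / (n + 0.4) for order numbers i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_line_fit(sample):
    """Fit y = a*x + b on the linearized Weibull scale by least squares.

    x = ln(t), y = ln(ln(1 / (1 - F))). The slope estimates beta and the
    intercept equals -beta * ln(theta), so theta = exp(-intercept / slope).
    Least squares stands in for the 'by eye' line (assumption of this sketch).
    """
    ordered = sorted(sample)                      # Step 2: rank the sample
    ranks = median_ranks(len(ordered))            # Step 3: cumulative scale
    x = [math.log(t) for t in ordered]
    y = [math.log(-math.log(1.0 - f)) for f in ranks]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                             # Step 4: fitted line
    intercept = y_bar - slope * x_bar
    beta = slope                                  # Step 5: parameters
    theta = math.exp(-intercept / slope)
    return ranks, beta, theta

sample = [10, 20, 30, 40, 50, 80]
ranks, beta_hat, theta_hat = weibull_line_fit(sample)
for t, f in zip(sorted(sample), ranks):
    print(f"x = {t:3d}   F_hat = {100 * f:5.1f}%")
print(f"beta_hat = {beta_hat:.3f}, theta_hat = {theta_hat:.2f}")
```

The fitted slope and scale depend on the fitting method, so values read from a hand-drawn line on probability paper may differ somewhat from the least-squares values printed here.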
Probability Plotting - example
There is an approximation that can be used to estimate median ranks, called Benard's approximation. It has the form:
$\hat{F}(x_i) = MR_i = \frac{i - 0.3}{n + 0.4}$
where n is the sample size and i is the sample order number. Tables of median ranks can be found in many statistics and reliability texts.

Probability Plotting - example
Based on Benard's approximation, we can now calculate $\hat{F}(x)$ for each observed value of X. These are shown in the following table:

x     $\hat{F}(x)$
10    10.9%
20    26.6%
30    42.2%
40    57.8%
50    73.4%
80    89.1%

For example, for x = 20,
$\hat{F}(20) = \frac{2 - 0.3}{6 + 0.4} \times 100\% = 26.6\%$

Probability Plotting - example
Now that we have y-coordinate values to go with the x-coordinate sample values, we can plot the points $(x, \hat{F}(x))$ on Weibull probability paper.
[Figure: sample points on Weibull probability paper; vertical axis: $\hat{F}(x)$ (in %), horizontal axis: x]

Probability Plotting - example
The line represents the estimated relationship between x and F(x):
[Figure: fitted line through the sample points on Weibull probability paper; vertical axis: $\hat{F}(x)$ (in %), horizontal axis: x]

Probability Plotting - example
In this example, the points on Weibull probability paper fall in a fairly linear fashion, indicating that the Weibull distribution provides a good fit to the data. If the points did not seem to follow a straight line, we might want to consider using another probability distribution to analyze the data.

[Figure: Probability Paper - Normal]
[Figure: Probability Paper - Lognormal]
[Figure: Probability Paper - Exponential]

Probability Plotting Exercise
Given the following random sample of size n = 8, which probability distribution provides the best fit?

i      1     2     3     4     5      6      7      8
x_i    79.4  88.1  91.1  98.7  104.2  105.1  106.5  112.0

40 Specimens
40 specimens are cut from a plate for tensile tests.
The tensile tests were made, resulting in tensile strength, x, as follows:

i    1     2     3     4     5     6     7     8     9     10
x    48.5  54.7  47.8  56.9  54.8  57.9  44.9  53.0  54.7  46.7

i    11    12    13    14    15    16    17    18    19    20
x    55.0  55.7  49.9  54.8  49.7  58.9  52.7  57.8  46.8  49.2

i    21    22    23    24    25    26    27    28    29    30
x    53.1  49.1  55.6  46.2  52.0  56.6  52.9  52.2  54.1  42.3

i    31    32    33    34    35    36    37    38    39    40
x    54.6  49.9  44.5  52.9  54.4  60.2  50.2  57.4  54.8  61.2

Perform a statistical analysis of the tensile strength data.

40 Specimens
Time Series plot:
[Figure: time series plot of tensile strength versus specimen number, 1 to 40]
By visual inspection of the time series plot, there seems to be no trend.

40 Specimens
Using the descriptive statistics function in Excel, the following were calculated:

Descriptive Statistics
Count                 40
Sum                   2104.82
Mean                  52.62
Standard Error        0.70
Median                53.03
Standard Deviation    4.45
Sample Variance       19.83
Kurtosis              -0.39
Skewness              -0.34
Range                 18.84
Minimum               42.35
Maximum               61.18

40 Specimens
Using the histogram feature of Excel, the following frequencies were calculated:

Bin     Frequency
40      0
45      3
50      10
55      16
60      9
More    2

and the graph:
[Figure: Histogram of Tensile Strengths; horizontal axis: bin, vertical axis: frequency]

40 Specimens
Box Plot
The lower quartile is 49.45
The median is 53.03
The upper quartile is 55.3
The interquartile range is 5.86
[Figure: box plot of tensile strength on a scale from 40 to 65, marking the lower extreme, lower quartile, median, average, upper quartile, and upper extreme]

40 Specimens
[Figure: Normal Probability Plot of tensile strength; vertical axis: cumulative probability from 0.10% to 99.90%, horizontal axis: 40 to 65]

40 Specimens
[Figure: LogNormal Probability Plot of tensile strength; vertical axis: cumulative probability from 0.10% to 99.90%, horizontal axis: 10 to 100, logarithmic]

40 Specimens
[Figure: Weibull Probability Plot of tensile strength; vertical axis: cumulative probability from 0.10% to 99.90%, horizontal axis: 41 to 61]

40 Specimens
From looking at the Histogram and the Normal Probability Plot, we see that the tensile strength can be estimated by a normal distribution.
The tensile strength distribution can be estimated by
$X \sim N(\hat{\mu} = 52.62,\ \hat{\sigma} = 4.45)$
[Figure: estimated density $\hat{f}(x)$ and estimated cumulative distribution $\hat{F}(x)$ of tensile strength]
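
The analysis of the 40 specimens can be approximated with a short script. The sketch below is a minimal illustration using only Python's standard library. It assumes the tensile strengths exactly as listed above, which are rounded to one decimal, so its results can differ in the second decimal from the Excel output. It computes the moment-based shape estimates m3/m2^(3/2) and m4/m2^2 from the earlier slides, which are not the same bias-adjusted quantities that Excel reports as Skewness and Kurtosis. The helper name and the threshold of 45 are hypothetical.

```python
import statistics

# Tensile strengths as listed on the slide (rounded to one decimal there).
strength = [
    48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
    55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
    53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
    54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2,
]

def central_moment(data, order):
    """Return (1/n) * sum((x_i - x_bar) ** order)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** order for x in data) / len(data)

n = len(strength)
mean = statistics.mean(strength)
median = statistics.median(strength)
s = statistics.stdev(strength)              # sample standard deviation, divisor n - 1
sample_range = max(strength) - min(strength)
ratio = mean / median                       # mean-to-median shape indicator

m2 = central_moment(strength, 2)
root_b1 = central_moment(strength, 3) / m2 ** 1.5   # skewness estimate (normal: 0)
b2 = central_moment(strength, 4) / m2 ** 2          # kurtosis estimate (normal: 3)

# Normal model with the sample estimates plugged in.
fitted = statistics.NormalDist(mu=mean, sigma=s)
threshold = 45.0                            # hypothetical value, for illustration only
p_below = fitted.cdf(threshold)

print(f"n = {n}, mean = {mean:.2f}, median = {median:.2f}, s = {s:.2f}")
print(f"range = {sample_range:.1f}, mean/median = {ratio:.3f}")
print(f"sqrt(b1) = {root_b1:.2f}, b2 = {b2:.2f}")
print(f"Estimated P(X <= {threshold}) under the normal model = {p_below:.4f}")
```

A mean-to-median ratio near 1, a skewness estimate near 0, and a kurtosis estimate not far from 3 are consistent with the slides' choice of a normal model, although the probability plots remain the main evidence for that choice.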