Systems Engineering Program
Department of Engineering Management, Information and Systems
EMIS 7370/5370 STAT 5340: PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS
Statistical Analysis - Graphical Techniques
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering
Stracener_EMIS 7370/STAT 5340_Sum 08_07.03.08

• Time Series Graph or Run Chart
• Box Plot
• Histogram and Relative Frequency Histogram
• Frequency Distribution
• Probability Plotting

Time Series Graph or Run Chart
• A plot of the data set x1, x2, …, xn in the order in which the data were obtained
• Used to detect trends or patterns in the data over time

Box Plot
• A pictorial summary used to describe the most prominent statistical features of the data set x1, x2, …, xn, including its:
  - center or location
  - spread or variability
  - extent and nature of any deviation from symmetry
  - identification of 'outliers'
• Shows only certain statistics rather than all the data, namely:
  - median
  - quartiles
  - smallest and greatest values in the sample
• The features immediately visible in a box plot are the center, the spread, and the overall range of the data

Given the following random sample of size 25:
38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16, 22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98
Arranged in order from least to greatest:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36, 37, 38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98

• First, find the median, the value exactly in the middle of an ordered set of numbers. The median is 37.
• Next, consider only the values to the left of the median:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36
We now find the median of this set of numbers.
The median of this group is (22 + 25)/2 = 23.5, which is the lower quartile.
• Now consider the values to the right of the median:
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
The median of this set is (60 + 86)/2 = 73, which is the upper quartile.
• We are now ready to find the interquartile range (IQR), the difference between the upper and lower quartiles: 73 - 23.5 = 49.5.

• Lower quartile: 23.5
• Median: 37
• Upper quartile: 73
• Interquartile range: 49.5
• Mean: 45.1
[Figure: box plot of the sample on a 0 to 100 scale, marking the lower extreme, lower quartile, median, mean, upper quartile, and upper extreme]

Histogram
A graph of the observed frequencies in the data set x1, x2, …, xn versus data magnitude, used to visually indicate the statistical properties of the data, including its:
  - shape
  - location or central tendency
  - scatter or variability

Guidelines for Constructing Histograms – Discrete Data
• If the data x1, x2, …, xn are from a discrete random variable with possible values y1, y2, …, yk, count the number of occurrences of each value yi and associate the frequency fi with yi, for i = 1, …, k. Note that $\sum_{i=1}^{k} f_i = n$.

Guidelines for Constructing Histograms – Continuous Data
• If the data x1, x2, …, xn are from a continuous random variable:
  - select the number of intervals or cells, r, to be between 3 and 20; as an initial value, use $r = \sqrt{n}$, where n is the number of observations
  - establish r intervals of equal width, starting just below the smallest value of x
  - count the number of values of x within each interval to obtain the frequency associated with each interval
  - construct the graph by plotting the frequency fi against interval i, for i = 1, 2, …, r
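The quartile procedure from the box plot example above (find the median, then the median of the values on each side of it) can be sketched in Python as follows. This is a minimal illustration of that particular convention only; the function name `box_plot_summary` is our own, and many statistical packages use other quartile definitions (for example, interpolation between order statistics), which can give slightly different quartiles.

```python
from statistics import mean, median


def box_plot_summary(data):
    """Return the statistics shown on a box plot, using the quartile rule
    from the example: Q1 and Q3 are the medians of the values strictly
    below and strictly above the position of the overall median."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # left of the median (excludes it when n is odd)
    upper = xs[n - half:]    # right of the median (excludes it when n is odd)
    q1, q3 = median(lower), median(upper)
    return {
        "lower extreme": xs[0],
        "lower quartile": q1,
        "median": median(xs),
        "upper quartile": q3,
        "upper extreme": xs[-1],
        "IQR": q3 - q1,
        "mean": mean(xs),
    }


sample = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
          22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98]
for name, value in box_plot_summary(sample).items():
    print(f"{name:>15}: {value}")
```

For the sample of size 25 this reproduces the values above: lower quartile 23.5, median 37, upper quartile 73, IQR 49.5, and mean 45.12.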
Histogram and Relative Frequency Example
To illustrate the construction of a relative frequency distribution, consider the following data, which represent the lives of 40 car batteries of a given type, recorded to the nearest tenth of a year. The batteries were guaranteed to last 3 years.

Car Battery Lives (years)
2.2 3.4 2.5 3.3 4.7 4.1 1.6 4.3 3.1 3.8
3.5 3.1 3.4 3.7 3.2 4.5 3.2 3.3 3.8 3.6
2.9 4.4 3.2 2.6 3.9 3.7 3.1 3.3 4.1 3.0
3.0 4.7 3.9 1.9 4.2 2.6 3.7 3.1 3.4 3.5

For this example, following the guidelines for constructing a histogram, 7 classes with a class width of 0.5 were selected. The frequency and relative frequency distributions of the data are shown in the following table.

Relative Frequency Distribution of Battery Lives
Class interval   Class midpoint   Frequency, f   Relative frequency
1.5-1.9          1.7               2             0.050
2.0-2.4          2.2               1             0.025
2.5-2.9          2.7               4             0.100
3.0-3.4          3.2              15             0.375
3.5-3.9          3.7              10             0.250
4.0-4.4          4.2               5             0.125
4.5-4.9          4.7               3             0.075
Total                             40             1.000

Histogram and Relative Frequency
The following diagram is a relative frequency histogram of the battery lives, with an approximate estimate of the probability density function superimposed.
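The frequency and relative frequency table above can be reproduced with a short Python sketch. It assumes the class scheme stated in the example (7 classes of width 0.5 with the first lower limit at 1.5) and counts in integer tenths of a year, since the lives are recorded to the nearest 0.1; the helper name `frequency_table` is our own. It also prints the initial guideline value r = sqrt(n) for comparison with the 7 classes chosen.

```python
from math import sqrt

# Lives of the 40 car batteries (years), as listed above.
lives = [2.2, 3.4, 2.5, 3.3, 4.7, 4.1, 1.6, 4.3, 3.1, 3.8,
         3.5, 3.1, 3.4, 3.7, 3.2, 4.5, 3.2, 3.3, 3.8, 3.6,
         2.9, 4.4, 3.2, 2.6, 3.9, 3.7, 3.1, 3.3, 4.1, 3.0,
         3.0, 4.7, 3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5]


def frequency_table(data, start, width, r):
    """Count data in r equal-width classes whose first lower limit is `start`.

    Values are converted to integer tenths so that class limits such as
    1.9/2.0 are not affected by floating-point rounding.
    """
    start_t, width_t = round(start * 10), round(width * 10)
    freqs = [0] * r
    for x in data:
        k = (round(x * 10) - start_t) // width_t
        if not 0 <= k < r:
            raise ValueError(f"value {x} lies outside the classes")
        freqs[k] += 1
    rows = []
    for k, f in enumerate(freqs):
        lo = (start_t + k * width_t) / 10            # lower class limit, e.g. 1.5
        hi = (start_t + (k + 1) * width_t - 1) / 10  # upper class limit, e.g. 1.9
        rows.append((lo, hi, (lo + hi) / 2, f, f / len(data)))
    return rows


print(f"initial guideline r = sqrt(n) = {sqrt(len(lives)):.2f}")
rows = frequency_table(lives, start=1.5, width=0.5, r=7)
for lo, hi, mid, f, rel in rows:
    print(f"{lo:.1f}-{hi:.1f}  midpoint {mid:.1f}  f = {f:2d}  rel = {rel:.3f}")
```

Since sqrt(40) is about 6.3, the choice of 7 classes is consistent with the guideline's starting value.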
[Figure: relative frequency histogram; horizontal axis: Battery Lives (years), class midpoints 1.7 to 4.7; vertical axis: Relative Frequency, 0.000 to 0.400]

Probability Plotting
• Data are plotted on special graph paper designed for a particular distribution, such as:
  - Normal
  - Weibull
  - Lognormal
  - Exponential
• If the assumed model is adequate, the plotted points will tend to fall along a straight line.
• If the model is inadequate, the plot will not be linear, and the type and extent of the departures can be seen.
• Once a model appears to fit the data reasonably well, percentiles and parameters can be estimated from the plot.

Probability Plotting Procedure
• Step 1: Obtain special graph paper, known as probability paper, designed for the distribution under examination. Weibull, lognormal, and normal paper are available at: http://www.weibull.com/GPaper/index.htm
• Step 2: Rank the sample values from smallest to largest in magnitude, i.e., $X_1 \le X_2 \le \cdots \le X_n$.
• Step 3: Plot the Xi's on the paper versus
  $\hat{F}(x_i) = \frac{i - 0.3}{n + 0.4} \times 100$ or $\hat{F}(x_i) = \frac{i - 0.3}{n + 0.4}$,
  depending on whether the marked axis on the paper refers to the percentage or the proportion of observations. The axis of the graph paper on which the Xi's are plotted will be referred to as the observational scale, and the axis for $\hat{F}(x_i)$ as the cumulative scale.
• Step 4: If a straight line appears to fit the data, draw a line on the graph 'by eye'.
• Step 5: Estimate the model parameters from the graph.
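Steps 2 and 3 can be sketched in Python as below: sort the sample, then attach to the i-th ordered value the plotting position (i - 0.3)/(n + 0.4), either as a percentage or as a proportion to match the paper's cumulative scale. The function name `plotting_positions` is hypothetical, and the data are the six values used in the example that follows, deliberately given out of order so that the ranking step has something to do.

```python
def plotting_positions(sample, percent=True):
    """Steps 2-3 of the procedure: rank the sample and pair each ordered
    value X_i with F_hat(x_i) = (i - 0.3) / (n + 0.4)."""
    xs = sorted(sample)                   # Step 2: X1 <= X2 <= ... <= Xn
    n = len(xs)
    scale = 100.0 if percent else 1.0     # % axis or proportion axis
    return [(x, scale * (i - 0.3) / (n + 0.4))
            for i, x in enumerate(xs, start=1)]


for x, f_hat in plotting_positions([50, 10, 80, 20, 40, 30]):
    print(f"x = {x:>3}   F_hat = {f_hat:.1f}%")
```

The printed percentages round to the median-rank values tabulated in the example.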
Weibull Probability Plotting Paper
If $T \sim W(\beta, \theta)$, the cumulative distribution function is
$F(t) = 1 - e^{-(t/\theta)^{\beta}}$
We now need to linearize this function into the form y = ax + b. Then
$\ln[1 - F(t)] = \ln\left[e^{-(t/\theta)^{\beta}}\right] = -\left(\frac{t}{\theta}\right)^{\beta}$
$\ln\left[\frac{1}{1 - F(t)}\right] = \left(\frac{t}{\theta}\right)^{\beta}$
$\ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\} = \beta \ln t - \beta \ln \theta$
which is the equation of a straight line of the form y = ax + b, where
$y = \ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\}$, $a = \beta$, $x = \ln t$, and $b = -\beta \ln \theta$,
i.e.,
$y = \beta x - \beta \ln \theta$,
which is a linear equation with a slope of β and an intercept of $-\beta \ln \theta$. Now the x- and y-axes of the Weibull probability plotting paper can be constructed. The x-axis is simply logarithmic, since x = ln(t), and the y-axis is scaled according to $y = \ln\{\ln[1/(1 - F(t))]\}$.
[Figure: Weibull probability plotting paper; vertical axis: cumulative probability (in %); horizontal axis: x]

Probability Plotting - Example
To illustrate the process, let 10, 20, 30, 40, 50, and 80 be a random sample of size n = 6.

We need estimates of F(x) corresponding to each of the sample values in order to plot the data on Weibull probability paper. These estimates are obtained with what are called median ranks.

Median ranks represent the 50% confidence level ("best guess") estimate of the true value of F(t), based on the total sample size and the order number (first, second, etc.) of each data point.

There is an approximation that can be used to estimate median ranks, called Benard's approximation.
It has the form:
$\hat{F}(x_i) = MR_i = \frac{i - 0.3}{n + 0.4}\,(100\%)$
where n is the sample size and i is the sample order number. Tables of median ranks can be found in many statistics and reliability texts.

Based on Benard's approximation, we can now calculate $\hat{F}(x_i)$ for each observed value of X. These are shown in the following table:

i    xi    F̂(xi)
1    10    10.9%
2    20    26.6%
3    30    42.2%
4    40    57.8%
5    50    73.4%
6    80    89.1%

For example, for x2 = 20,
$\hat{F}(20) = \frac{2 - 0.3}{6 + 0.4} \times 100\% = 26.6\%$

Now that we have y-coordinate values to go with the x-coordinate sample values, we can plot the points $(x_i, \hat{F}(x_i))$ on Weibull probability paper.
[Figure: the six points plotted on Weibull paper; vertical axis: F̂(x) (in %); horizontal axis: x]

The line represents the estimated relationship between x and F(x):
[Figure: the plotted points with a fitted straight line; vertical axis: F̂(x) (in %); horizontal axis: x]

In this example, the points on Weibull probability paper fall in a fairly linear fashion, indicating that the Weibull distribution provides a good fit to the data. If the points did not seem to follow a straight line, we might want to consider using another probability distribution to analyze the data.
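Steps 4 and 5 fit the line 'by eye'. As a numerical stand-in (our substitution, not the method on the slides), the sketch below fits $y = \beta x - \beta \ln \theta$ by ordinary least squares to the Weibull-paper coordinates $x = \ln t$ and $y = \ln\{\ln[1/(1 - \hat{F})]\}$, using Benard's median ranks, then reads β from the slope and θ from the intercept. The function name `weibull_plot_fit` is hypothetical. Regressing y on x follows the form of the line derived above; some reliability software regresses x on y instead, which gives somewhat different estimates.

```python
from math import exp, log


def weibull_plot_fit(sample):
    """Least-squares line through the Weibull-paper points.

    x = ln t, y = ln ln[1 / (1 - F_hat)], F_hat from Benard's approximation.
    Slope estimates beta; intercept = -beta * ln(theta).
    """
    ts = sorted(sample)
    n = len(ts)
    xs = [log(t) for t in ts]
    ys = [log(log(1.0 / (1.0 - (i - 0.3) / (n + 0.4)))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the fitted line
    intercept = y_bar - beta * x_bar      # equals -beta * ln(theta)
    theta = exp(-intercept / beta)
    return beta, theta


beta_hat, theta_hat = weibull_plot_fit([10, 20, 30, 40, 50, 80])
print(f"beta_hat = {beta_hat:.2f}, theta_hat = {theta_hat:.1f}")
```

A quick sanity check is to feed in values generated exactly from a Weibull CDF at the median-rank positions; the fit should then return the generating β and θ.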
[Figures: Probability Plotting - Example (two further slides); Probability Paper - Normal; Probability Paper - Lognormal; Probability Paper - Exponential]

Example - Probability Plotting
Given the following random sample of size n = 8, which probability distribution provides the best fit?

i    xi
1    79.40968
2    88.12093
3    91.06394
4    98.73094
5    104.1536
6    105.1019
7    106.5036
8    112.0354

40 Specimens
Forty specimens are cut from a plate for tensile tests. The tensile tests were made, resulting in the following tensile strengths, x:

No.  x      No.  x      No.  x      No.  x
1    48.5   11   55.0   21   53.1   31   54.6
2    54.7   12   55.7   22   49.1   32   49.9
3    47.8   13   49.9   23   55.6   33   44.5
4    56.9   14   54.8   24   46.2   34   52.9
5    54.8   15   49.7   25   52.0   35   54.4
6    57.9   16   58.9   26   56.6   36   60.2
7    44.9   17   52.7   27   52.9   37   50.2
8    53.0   18   57.8   28   52.2   38   57.4
9    54.7   19   46.8   29   54.1   39   54.8
10   46.7   20   49.2   30   42.3   40   61.2

Perform a statistical analysis of the tensile strength data.

Time series plot:
[Figure: time series plot of tensile strength (vertical axis 30.0 to 65.0) versus specimen number (0 to 40)]
By visual inspection of the time series plot, there seems to be no trend.
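The summary statistics reported next can be approximated from the tabulated strengths with Python's standard `statistics` module, as sketched below. One assumption to keep in mind: the table lists values to one decimal place, while the Excel output (for example, a minimum of 42.35) was evidently computed from unrounded measurements, so this sketch agrees with it only approximately.

```python
import statistics as st

# Tensile strengths of the 40 specimens, in specimen order (one decimal, as tabulated).
strength = [48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
            55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
            53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
            54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2]

summary = {
    "count": len(strength),
    "minimum": min(strength),
    "maximum": max(strength),
    "range": max(strength) - min(strength),
    "sum": sum(strength),
    "mean": st.mean(strength),
    "median": st.median(strength),
    "sample variance": st.variance(strength),   # divisor n - 1
    "standard deviation": st.stdev(strength),   # square root of the above
}
for name, value in summary.items():
    print(f"{name:>18}: {value:.2f}")
```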
Using the Descriptive Statistics tool in Excel, the following were calculated:

Descriptive Statistics
Count                 40
Minimum               42.35
Maximum               61.18
Range                 18.84
Sum                   2104.82
Mean                  52.62
Median                53.03
Sample Variance       19.83
Standard Deviation    4.45
Kurtosis              2.51
Skewness              -0.34

Using the Histogram feature of Excel, the following frequencies were calculated:

Bin (upper limit)   Frequency
40                  0
45                  3
50                  10
55                  16
60                  9
More                2

[Figure: Histogram of Tensile Strengths; frequency (0 to 18) for bins 40, 45, 50, 55, 60, More]

From the histogram and the normal probability plot, we see that the tensile strength can be estimated by a normal distribution.

Box Plot
• Lower quartile: 49.45
• Median: 53.03
• Mean: 52.6
• Upper quartile: 55.3
• Interquartile range: 5.86
[Figure: box plot of tensile strength on a 40 to 65 scale, marking the lower extreme, lower quartile, median, mean, upper quartile, and upper extreme]

[Figure: Normal Probability Plot of tensile strength; cumulative scale 0.10% to 99.90%; horizontal axis 40 to 65]
[Figure: LogNormal Probability Plot of tensile strength; cumulative scale 0.10% to 99.90%; horizontal axis 10 to 100]
[Figure: Weibull Probability Plot of tensile strength; cumulative scale 0.10% to 99.90%; horizontal axis 41 to 61]

The tensile strength distribution can be estimated by $X \sim N(\hat{\mu} = 52.62,\ \hat{\sigma} = 4.45)$.
[Figure: fitted normal CDF F̂(x) (0 to 1) and density f̂(x), plotted for x from 49 to 55]
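Judging the straightness of a probability plot by eye can be supplemented with a single number. The sketch below (our addition, not part of the original analysis) computes, for normal, lognormal, and Weibull paper, the correlation coefficient between the plotted coordinates, using Benard's plotting positions; a value closer to 1 indicates a straighter plot, which is a rough guide rather than a formal test. It then builds the fitted model $N(\hat{\mu}, \hat{\sigma})$ from the sample mean and standard deviation with `statistics.NormalDist` and evaluates it at x = 45, a point chosen purely for illustration. The data are the one-decimal tabulated values, so μ̂ and σ̂ differ slightly from the Excel figures.

```python
from math import log
from statistics import NormalDist, mean, stdev

strength = [48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
            55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
            53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
            54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2]


def pearson_r(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


xs = sorted(strength)
n = len(xs)
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]   # Benard plotting positions
z = [NormalDist().inv_cdf(p) for p in F]                # standard normal quantiles

# (observational-scale values, linearized cumulative-scale values) per paper type
papers = {
    "normal": (xs, z),
    "lognormal": ([log(x) for x in xs], z),
    "Weibull": ([log(x) for x in xs], [log(-log(1 - p)) for p in F]),
}
ppcc = {name: pearson_r(u, v) for name, (u, v) in papers.items()}
for name, r in ppcc.items():
    print(f"{name:>9} paper: r = {r:.4f}")

# Fitted normal model from the sample mean and sample standard deviation.
fitted = NormalDist(mean(strength), stdev(strength))
print(f"mu_hat = {fitted.mean:.2f}, sigma_hat = {fitted.stdev:.2f}")
print(f"estimated P(X <= 45) = {fitted.cdf(45):.3f}")
```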