IŞIKIE
IE 256 - Engineering Statistics
Fundamental Sampling Distributions and Data Descriptions
IE256 Engineering Statistics – Summer 2010

Graphical Methods and Data Description

Summarizing or characterizing the nature of collections of data is very important. Display of data can enhance statistical inference about scientific systems.

Stem-and-Leaf Plot

Consider the data on the “life” of 40 similar car batteries, recorded to the nearest tenth of a year, given in the following table. The batteries are guaranteed to last 3 years.

Car Battery Life
2.2  4.1  3.5  4.5  3.2  3.7  3.0  2.6
3.4  1.6  3.1  3.3  3.8  3.1  4.7  3.7
2.5  4.3  3.4  3.6  2.9  3.3  3.9  3.1
3.3  3.1  3.7  4.4  3.2  4.1  1.9  3.4
4.7  3.8  3.2  2.6  3.9  3.0  4.2  3.5

First, we split each observation into two parts, a stem and a leaf: the stem is the digit preceding the decimal point and the leaf is the digit following it. For example, for the number 3.7 the digit 3 is the stem and the digit 7 is the leaf. These data can be represented using four stems: 1, 2, 3, and 4. The leaves are listed next to each stem.

Stem-and-Leaf Plot of Battery Life
Stem  Leaf                       Frequency
1     69                          2
2     25669                       5
3     0011112223334445567778899  25
4     11234577                    8

The stem-and-leaf plot above contains only four stems and consequently does not provide an adequate picture of the distribution. To remedy this, we need to increase the number of stems in the plot. One simple way is to split each stem in two, recording the leaves 0, 1, 2, 3, and 4 opposite the first stem and the leaves 5, 6, 7, 8, and 9 opposite the second stem.
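The splitting rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `split_stem_and_leaf` and the markers `*` (leaves 0 to 4) and `•` (leaves 5 to 9) are choices made here, and the data are the 40 battery lives from the table above.

```python
from collections import defaultdict

# Battery lifetimes (years) from the Car Battery Life table.
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]


def split_stem_and_leaf(values):
    """Group one-decimal values by stem, splitting each stem into a low
    half (leaves 0-4, marked '*') and a high half (leaves 5-9, marked '•')."""
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # integer tenths avoid float round-off
        stem, leaf = divmod(tenths, 10)   # 37 -> stem 3, leaf 7
        half = "*" if leaf <= 4 else "•"
        rows[(stem, half)].append(leaf)
    return dict(rows)


plot = split_stem_and_leaf(battery_life)

# Print the rows in stem order, low half before high half.
for (stem, half), leaves in sorted(
    plot.items(), key=lambda kv: (kv[0][0], kv[0][1] == "•")
):
    leaf_text = "".join(str(leaf) for leaf in leaves)
    print(f"{stem}{half}  {leaf_text:<16} {len(leaves)}")
```

Working in integer tenths is a design choice that keeps values such as 2.2 from being misclassified by floating-point error; the row lengths printed in the last column are the class frequencies.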
Stem-and-Leaf Plot of Battery Life
Stem  Leaf             Frequency
1•    69                2
2*    2                 1
2•    5669              4
3*    001111222333444  15
3•    5567778899       10
4*    11234             5
4•    577               3

Frequency Histogram

The use of a frequency distribution is another effective way to summarize data. The data are grouped into classes or intervals, and the number of data points belonging to each interval gives the frequency distribution. Dividing each class frequency by the total number of observations, we obtain the proportion of the observations in each class, that is, the relative frequency. The information provided by the relative frequency distribution in tabular form is easier to grasp if presented graphically.

Relative Frequency Distribution of Battery Life
Class Interval  Class Midpoint  Frequency f  Relative Frequency
1.5-1.9         1.7              2           0.050
2.0-2.4         2.2              1           0.025
2.5-2.9         2.7              4           0.100
3.0-3.4         3.2             15           0.375
3.5-3.9         3.7             10           0.250
4.0-4.4         4.2              5           0.125
4.5-4.9         4.7              3           0.075

[Figure: relative frequency histogram of battery life. Horizontal axis: Battery Life (class midpoints 1.7 to 4.7); vertical axis: Relative Frequency.]

Skewness of Data

Distributions can be classified according to their tails. A distribution with equal tails is said to be symmetric; one with a longer and/or heavier left tail is said to be skewed to the left, and one with a longer and/or heavier right tail is said to be skewed to the right. A frequency histogram may be classified by skewness in the same way.

[Figure: three histograms labeled skewed to the left, symmetric, and skewed to the right.]

While the median divides the data into two parts, the lower and upper halves, we can divide the data further. Quartiles divide the data into four parts.
The third quartile separates the upper quarter (25%) of the data from the rest, the second quartile is the median, and the first quartile separates the lower quarter from the upper 75% of the data. For example, consider a data set with 40 observations (n = 40). One quarter of the data equals 10 observations. The first quartile must separate the lower 10 observations from the upper 30, so the midpoint of $x_{(10)}$ and $x_{(11)}$ is the first quartile.

First quartile: $\dfrac{x_{(10)} + x_{(11)}}{2}$
Second quartile (median): $\dfrac{x_{(20)} + x_{(21)}}{2}$
Third quartile: $\dfrac{x_{(30)} + x_{(31)}}{2}$

The data can be divided even further by computing percentiles of the distribution. Percentiles divide the data into one hundred parts. For example, the 95th percentile separates the highest 5% from the bottom 95%, while the 10th percentile separates the bottom 10% from the top 90%. Consider a data set with 40 observations (n = 40); 5% of the data consists of 2 observations.

95th percentile = midpoint between $x_{(38)}$ and $x_{(39)}$
10th percentile = midpoint between $x_{(4)}$ and $x_{(5)}$

Box-and-Whisker Plot

A box-and-whisker plot encloses the interquartile range of the data in a box that has the median displayed within. The interquartile range runs from the 25th percentile (lower quartile) to the 75th percentile (upper quartile). In addition to the box, “whiskers” extend to the largest and smallest observations.
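As a rough illustration of the midpoint rules above, the sketch below computes the quartiles, two percentiles, and the five numbers a box-and-whisker plot needs for the 40 battery lives from the stem-and-leaf example. The helper name `midpoint` is a choice made here, and the rule is used exactly as stated for n = 40; library routines may use other interpolation conventions and give slightly different values.

```python
# Battery lifetimes (years) from the stem-and-leaf example, n = 40.
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]


def midpoint(sorted_values, k):
    """Midpoint of the k-th and (k+1)-th order statistics (k is 1-based)."""
    return (sorted_values[k - 1] + sorted_values[k]) / 2


x = sorted(battery_life)
n = len(x)

q1 = midpoint(x, n // 4)          # between x(10) and x(11)
median = midpoint(x, n // 2)      # between x(20) and x(21)
q3 = midpoint(x, 3 * n // 4)      # between x(30) and x(31)
p10 = midpoint(x, 4)              # between x(4) and x(5)
p95 = midpoint(x, 38)             # between x(38) and x(39)

# Five-number summary used to draw a box-and-whisker plot.
five_numbers = (x[0], q1, median, q3, x[-1])
iqr = q3 - q1                     # width of the box

print("min, q1, median, q3, max:", five_numbers)
print("10th and 95th percentiles:", p10, p95)
print("interquartile range:", round(iqr, 2))
```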
Consider the nicotine data (Exercise 1.21) given below:

1.09  0.85  1.86  1.82  1.40  1.92  1.24  1.90
1.79  1.64  2.31  1.58  1.68  2.46  2.09  1.79
2.03  1.51  1.88  1.75  2.28  1.70  1.64  2.08
1.63  1.74  2.17  0.72  1.67  2.37  1.47  2.55
1.69  1.37  1.75  1.97  2.11  1.85  1.93  1.69

min = 0.72, q1 = 1.64, median = 1.77, q3 = 2.00, max = 2.55

[Figure: box-and-whisker plot of the nicotine data on an axis from 0.50 to 2.50, with the box from 1.64 to 2.00, the median at 1.77, and whiskers extending to 0.72 and 2.55.]

A variation called the box plot can give the viewer information about which observations may be outliers. One common procedure for detecting outliers uses a multiple of the interquartile range. For example, if the distance of an observation from the box exceeds 1.5 times the interquartile range (in either direction), the observation may be labeled an outlier.

[Figure: box plot of the nicotine data on an axis from 0.50 to 2.50.]

Populations and Samples

Definition 8.1. A population consists of the totality of the observations with which we are concerned.

Populations can be infinite or finite. We call a population “population f(x)” if each observation in the population is a value of a random variable X having probability distribution f(x). Examples: normal population, binomial population.

The statistician is interested in arriving at conclusions concerning a population when it is impossible to observe the entire population. Therefore, we must depend on a subset of observations, which brings us to the notion of sampling.

Definition 8.2. A sample is a subset of a population.

To eliminate any bias in the sampling procedure, it is desirable to choose a random sample. Let $X_i$, i = 1, 2, …, n, represent the ith sample value we observe from a population f(x).
The random variables $X_i$ will constitute a random sample with values $x_1, x_2, \ldots, x_n$ if the measurements are obtained by repeating the experiment n independent times under the same conditions.

Definition 8.3. Let $X_1, X_2, \ldots, X_n$ be n independent random variables, each having the same probability distribution f(x). We define $X_1, X_2, \ldots, X_n$ to be a random sample of size n from the population f(x) and write its joint distribution as

$$f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$$

[Figure: normal population with mean µ, with the sample values X1, X2, X3, X4, X5 marked along the axis.]

Some Important Statistics

Definition 8.4. Any function of the random variables constituting a random sample is called a statistic.

In this course we are interested in the following major statistics:
-Sample mean
-Sample variance

Definition 8.5. If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample mean is defined by the statistic

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Definition 8.6. If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample variance is defined by the statistic

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Definition. The sample mode is the element that is observed most often in the sample.

Definition. Let $X_{(i)}$ be the i-th value when the sample data are sorted in increasing order. Then the sample median is defined as follows:

$$\tilde{X} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,} \\[4pt] \tfrac{1}{2}\left( X_{(n/2)} + X_{(n/2+1)} \right) & \text{if } n \text{ is even.} \end{cases}$$

Example: Consider the following measurements, in liters, for two samples of orange juice bottled by companies A and B:

Sample A  0.97  1.00  0.94  1.03  1.06
Sample B  1.06  1.01  0.88  0.91  1.14

-Compute the sample means, the sample medians, and the sample variances.

Definition 8.7.
The sample standard deviation, denoted by S, is the positive square root of the sample variance.

Sampling Distributions

A statistic is a random variable that depends only on the observed sample; hence it must have a probability distribution.

Definition 8.10. The probability distribution of a statistic is called a sampling distribution.

Sampling Distribution of Means

Suppose that a random sample of n observations is taken from a normal population with mean µ and variance σ². Each observation $X_i$, i = 1, 2, ..., n, of the random sample will then have the same normal distribution as the population being sampled. By the reproductive property of the normal distribution (see Theorem 7.11), we conclude that

$$\bar{X} = \frac{1}{n}\left( X_1 + X_2 + \cdots + X_n \right)$$

has a normal distribution with mean

$$\mu_{\bar{X}} = E(\bar{X}) = \frac{1}{n}\,(\underbrace{\mu + \mu + \cdots + \mu}_{n \text{ terms}}) = \mu$$

and variance

$$\sigma^2_{\bar{X}} = \mathrm{Var}(\bar{X}) = \frac{1}{n^2}\,(\underbrace{\sigma^2 + \sigma^2 + \cdots + \sigma^2}_{n \text{ terms}}) = \frac{\sigma^2}{n}$$

If we are sampling from a population with unknown distribution, either finite or infinite, the distribution of $\bar{X}$ will still be approximately normal with mean µ and variance σ²/n, provided that the sample size is large. This AMAZING!!! result is an immediate consequence of the following theorem, called the central limit theorem.

Theorem 8.2. Central Limit Theorem: If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean µ and finite variance σ², then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

as $n \to \infty$, is the standard normal distribution, n(z; 0, 1).

The normal approximation for $\bar{X}$ will generally be good if n ≥ 30. If n < 30, the approximation is good only if the population is not too different from a normal distribution.
If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow a normal distribution exactly, no matter how small the sample size. The sample size n = 30 is a guideline for using the central limit theorem. However, as the statement of the theorem implies, the presumption of normality on the distribution of $\bar{X}$ becomes more accurate as n grows larger. (Figure 8.7 p246)

Example 8.6. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Note: If we were to repeat this experiment (sample 16 bulbs and compute the sample mean) many times, what proportion of the sample averages would be less than 775 hours? How rare an event is $\bar{X} \le 775$ when µ = 800? These are questions regarding the distribution of the sample mean.

Note: The probability that a single light bulb lasts less than 775 hours is different from the probability that the average of 16 light bulbs is less than 775 hours.

[Figure: distribution of X with µ = 800 and σ = 40, with the area P(X ≤ 775) shaded to the left of 775.]
[Figure: distribution of $\bar{X}$ with $\mu_{\bar{X}} = 800$ and $\sigma_{\bar{X}} = 40/\sqrt{16} = 10$, with the area $P(\bar{X} \le 775)$ shaded to the left of 775.]

For a single bulb:

$$P(X \le 775) = P\left( \frac{X - \mu}{\sigma} \le \frac{775 - \mu}{\sigma} \right) = P\left( Z \le \frac{775 - 800}{40} \right) = P(Z \le -0.625) = 0.2660$$

For the mean of 16 bulbs:

$$P(\bar{X} \le 775) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{775 - \mu}{\sigma/\sqrt{n}} \right) = P\left( Z \le \frac{775 - 800}{40/\sqrt{16}} \right) = P(Z \le -2.5) = 0.0062$$

Only 62 out of 10000 samples (experiments) will result in a sample average less than or equal to 775 hours if in fact the population mean is 800 hours and the population standard deviation is 40 hours. On the other hand, if µ, the population mean, truly were 785 hours, $\bar{X} \le 775$ would not be a rare event.
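The probabilities in Example 8.6 can be checked numerically without a normal table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are choices made here, and the last line evaluates the "what if µ were 785" scenario mentioned above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800.0, 40.0, 16      # population mean, std. deviation, sample size

single_bulb = NormalDist(mu, sigma)             # distribution of one lifetime X
sample_mean = NormalDist(mu, sigma / sqrt(n))   # distribution of the mean of 16

p_single = single_bulb.cdf(775)     # P(X <= 775) for one bulb
p_mean = sample_mean.cdf(775)       # P(Xbar <= 775) for the average of 16 bulbs

# Same event if the true population mean were 785 hours instead of 800.
p_mean_785 = NormalDist(785.0, sigma / sqrt(n)).cdf(775)

print(f"P(X <= 775)              = {p_single:.4f}")
print(f"P(Xbar <= 775)           = {p_mean:.4f}")
print(f"P(Xbar <= 775 | mu=785)  = {p_mean_785:.4f}")
```

The standard deviation passed for the sample mean is σ/√n = 10, which is why the same cutoff of 775 hours is 2.5 standard deviations below the mean for the average but only 0.625 for a single bulb.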
$$P(\bar{X} \le 775 \mid \mu = 785) = P\left( Z \le \frac{775 - 785}{40/\sqrt{16}} \right) = P(Z \le -1.0) = 0.1587$$

Inferences on the population mean

One very important application of the central limit theorem is the determination of reasonable values of the population mean µ. Topics such as hypothesis testing, estimation, quality control, and others make use of the central limit theorem.

Example 8.7. An important manufacturing process produces cylindrical component parts for the automotive industry. It is important that the process produces parts having a mean diameter of 5.0 millimeters. The engineer involved conjectures that the population mean is 5.0 millimeters. An experiment is conducted in which 100 parts produced by the process are selected randomly and the diameter is measured on each. It is known that the population standard deviation is σ = 0.1 millimeters. The experiment indicates a sample average diameter of $\bar{x} = 5.027$ millimeters.
(a) Does this sample information appear to support or refute the engineer’s conjecture?
(b) What is the probability that the sample mean deviates from µ by as much as 0.027?

$$P(|\bar{X} - \mu| \ge 0.027) = P(\bar{X} - \mu \ge 0.027 \ \text{or}\ \bar{X} - \mu \le -0.027)$$
$$= P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \frac{0.027}{\sigma/\sqrt{n}} \ \text{or}\ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{-0.027}{\sigma/\sqrt{n}} \right)$$
$$= P\left( Z \ge \frac{0.027}{0.01} \ \text{or}\ Z \le \frac{-0.027}{0.01} \right) = P(Z \ge 2.7) + P(Z \le -2.7) = 2\,P(Z \ge 2.7) = 2(0.0035) = 0.007$$

[Figure: standard normal curve on an axis from -3.0 to 3.0, with the two tail areas P(Z ≤ -2.7) and P(Z ≥ 2.7) shaded.]

Sampling Distribution of the Difference between Two Averages

In the previous example, the engineer was interested in supporting a conjecture regarding a single population mean. A far more important application involves two populations. A scientist or engineer is interested in a comparative experiment in which two manufacturing methods, 1 and 2, are to be compared. Suppose we have two populations, the first with mean µ1 and variance σ1², and the second with mean µ2 and variance σ2².
Let the statistic $\bar{X}_1$ be the mean of a random sample of size n1 from population 1, and let the statistic $\bar{X}_2$ be the mean of a random sample of size n2 from population 2. What can we say about the sampling distribution of the difference $\bar{X}_1 - \bar{X}_2$ for repeated samples of size n1 and n2?

Theorem 8.3. If independent samples of size n1 and n2 are drawn at random from two populations with means µ1 and µ2 and variances σ1² and σ2², respectively, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Hence

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}}$$

is approximately a standard normal variable.

Remark. If both n1 and n2 are greater than or equal to 30, the normal approximation for the distribution of $\bar{X}_1 - \bar{X}_2$ is very good when the underlying distributions are not too far from normal. However, even when n1 and n2 are less than 30, the normal approximation is reasonably good except when the populations are decidedly nonnormal. Of course, if both populations are normal, then $\bar{X}_1 - \bar{X}_2$ has a normal distribution no matter what the sizes of n1 and n2 are.

Example 8.8. Two independent experiments are being run in which two different types of paint are compared. Eighteen specimens are painted using type A, and the drying time, in hours, is recorded for each. The same is done with type B. The populations are both approximately normal, and the standard deviations are both known to be 1.0 hour. Assuming that the mean drying time is equal for the two types of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average drying times for samples of size nA = nB = 18.
[Figure: sampling distribution of $\bar{X}_A - \bar{X}_B$, centered at µA − µB = 0, with the area to the right of 1.0 shaded.]

From the sampling distribution of $\bar{X}_A - \bar{X}_B$, we know that the distribution is approximately normal with mean and variance

$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$
$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$

$$z = \frac{1 - (\mu_A - \mu_B)}{\sqrt{1/9}} = \frac{1 - 0}{1/3} = 3$$

$$P(\bar{X}_A - \bar{X}_B > 1.0) = P(Z > z) = 1 - P(Z \le z) = 1 - P(Z \le 3.0) = 1 - 0.9987 = 0.0013$$

Example 8.9. The television tubes of manufacturer A have a mean life of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year.

Population 1  Population 2
µ1 = 6.5      µ2 = 6.0
σ1 = 0.9      σ2 = 0.8
n1 = 36       n2 = 49

What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?

The sampling distribution of $\bar{X}_1 - \bar{X}_2$ will be approximately normal and will have mean and standard deviation

$$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5 - 6.0 = 0.5$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36} + \frac{0.64}{49}} = 0.189$$

$$z = \frac{1.0 - 0.5}{0.189} = 2.65$$

$$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z > 2.65) = 1 - P(Z \le 2.65) = 1 - 0.9960 = 0.0040$$

Normal Approximation to Binomial

Recall the binomial distribution function

$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}$$

Probabilities associated with binomial experiments are readily obtainable from the formula b(x; n, p) or from Table A.1 when n is small. However, it is very difficult to obtain these probabilities when n is large. For instance, consider computing b(20; 100, 0.3) using your calculator. Therefore, when n is large we use the normal approximation.
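To make the b(20; 100, 0.3) remark concrete, here is a small sketch that evaluates the binomial formula directly and compares it with a normal approximation. Two things are assumptions of this illustration rather than part of the slide: the helper name `binom_pmf`, and the use of a half-unit continuity correction (treating X = 20 as the interval 19.5 to 20.5) with the binomial mean np and variance np(1 − p).

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(x, n, p):
    """Binomial probability b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p, x = 100, 0.3, 20

# Exact value from the binomial formula (easy for a computer, tedious by hand).
exact = binom_pmf(x, n, p)

# Normal approximation with mean np and variance np(1 - p).
# Assumption of this sketch: X = 20 is approximated by the area
# between 19.5 and 20.5 under the normal curve.
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(x + 0.5) - approx_dist.cdf(x - 0.5)

print(f"exact  b(20; 100, 0.3) = {exact:.5f}")
print(f"normal approximation   = {approx:.5f}")
```

The two numbers should agree to roughly the third decimal place, which is the sense in which the approximation replaces a laborious hand calculation.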
The normal approximation is also convenient because the cumulative distribution of the normal is tabulated.

Theorem 6.2. If X is a binomial random variable with mean µ = np and variance σ² = np(1 − p), then the limiting form of the distribution of

$$Z = \frac{X - np}{\sqrt{np(1 - p)}}$$

as n → ∞, is the standard normal distribution n(z; 0, 1).

It turns out that the approximation with µ = np and σ² = np(1 − p) is very accurate if n is large and p is not too close to 0 or 1. As a rule of thumb, we use this approximation if np ≥ 5 and n(1 − p) ≥ 5.

Let us consider X ~ b(x; 15, 0.4) to illustrate the normal approximation to the binomial distribution. Here µ = np = 6 and σ² = np(1 − p) = 3.6, so σ = 1.897. Let us compute P(X = 4) and P(7 ≤ X ≤ 9) by using the normal approximation.

When computing the probabilities, we need to include the continuity correction of 0.5.

[Figure: histogram of b(x; 15, 0.4) for x = 0, 1, ..., 15, with probabilities on a vertical axis from 0.00 to 0.25.]

Exact value:

$$P(X = 4) = b(4; 15, 0.4) = 0.1268$$

Normal approximation:

$$P(X = 4) \approx P\left( \frac{3.5 - 6}{\sqrt{3.6}} < Z < \frac{4.5 - 6}{\sqrt{3.6}} \right) = P(-1.32 < Z < -0.79) = P(Z < -0.79) - P(Z < -1.32) = 0.2148 - 0.0934 = 0.1214$$

Exact value, where b* denotes the cumulative binomial sum:

$$P(7 \le X \le 9) = \sum_{x=7}^{9} b(x; 15, 0.4) = b^*(9; 15, 0.4) - b^*(6; 15, 0.4) = 0.9662 - 0.6098 = 0.3564$$

Normal approximation:

$$P(7 \le X \le 9) \approx P\left( \frac{6.5 - 6}{1.897} < Z < \frac{9.5 - 6}{1.897} \right) = P(0.26 < Z < 1.85) = P(Z < 1.85) - P(Z < 0.26) = 0.9678 - 0.6026 = 0.3652$$

Example 6.15. The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known to have contracted this disease, what is the probability that fewer than 30 will survive? Let the binomial variable X represent the number of patients that survive.
Since n = 100, we should obtain fairly accurate results using the normal-curve approximation.

Sampling Distribution of S²

Recall the definition of the sample statistic S².

Definition 8.6. If $X_1, X_2, X_3, \ldots, X_n$ represent a random sample of size n, then the sample variance is defined by the statistic

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

If a random sample of size n is drawn from A NORMAL POPULATION with mean µ and variance σ², and the sample variance is computed, we obtain a value of the statistic S². We shall proceed to consider the distribution of the statistic $(n - 1)S^2 / \sigma^2$.

Theorem 8.4. If S² is the variance of a random sample of size n drawn from a normal population having the variance σ², then the statistic

$$\chi^2 = \frac{(n - 1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with ν = n − 1 degrees of freedom.

But what is a chi-squared distribution?

Theorem. If $Z_1, Z_2, \ldots, Z_n$ are independent random variables, each having the standard normal distribution, then

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

has a chi-squared distribution with n degrees of freedom.

Then why does the distribution in Theorem 8.4 have ν = n − 1 degrees of freedom? What is a “degree of freedom”?

The probability that a random sample produces a χ² value greater than some specified value is equal to the area under the curve to the right of this value. It is customary to let $\chi^2_\alpha$ represent the χ² value above which we find an area of α. This is illustrated below by the shaded region:

[Figure: chi-squared density on an axis from 0.0 to 30.0, with the area α to the right of $\chi^2_\alpha$ shaded.]

Table A.5 on pages 755-756 of the textbook gives values of $\chi^2_\alpha$ for various values of α and ν. The areas, α, are the column headings, the degrees of freedom, ν, are given in the left column, and the table entries are the χ² values.
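Theorem 8.4 can be made plausible with a small simulation. The sketch below is only an illustration under stated assumptions: the population parameters (µ = 0, σ = 2), the sample size n = 8, the number of repetitions, and the seed are all hypothetical choices made here. It relies on the standard facts that a chi-squared variable with ν degrees of freedom has mean ν and variance 2ν, so the simulated values of (n − 1)S²/σ² should average about n − 1 = 7, not n = 8.

```python
import random
from statistics import mean, variance

random.seed(256)                 # fixed seed so the run is repeatable

mu, sigma, n = 0.0, 2.0, 8       # hypothetical normal population and sample size
reps = 20000                     # number of simulated samples


def scaled_sample_variance():
    """Draw one normal sample of size n and return (n - 1) S^2 / sigma^2."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * variance(sample) / sigma**2


values = [scaled_sample_variance() for _ in range(reps)]

# For a chi-squared variable with nu = n - 1 degrees of freedom,
# the mean is nu and the variance is 2 * nu.
print("simulated mean:    ", round(mean(values), 2), " (theory:", n - 1, ")")
print("simulated variance:", round(variance(values), 2), " (theory:", 2 * (n - 1), ")")
```

One degree of freedom is lost because the deviations are measured from the sample mean rather than from µ, which is what the simulated average near n − 1 reflects.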
The χ² value with 7 degrees of freedom, leaving an area of 0.05 to the right, is $\chi^2_{0.05} = 14.07$.

Example: Assume a random variable Y has a chi-squared distribution with 9 degrees of freedom. What is the probability that
(a) 12.24 < Y < 16.92
(b) 21.67 < Y

Example: Assume a population is normally distributed with mean µ and variance σ². A random sample of 11 items is drawn from the population and the sample variance s² is computed. What is the probability that the sample variance s² is at least 1.6 times the population variance σ²?

Example: Assume a population is normally distributed with a mean of 100 and a standard deviation of 5 units. A random sample of 11 items is drawn from the population and the sample standard deviation s is computed. What is the probability that
(a) 6 < s
(b) 3 < s < 7

Example 8.10. A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be convinced that his batteries have a standard deviation of 1 year? Assume that the battery lifetime follows a normal distribution.

First find the sample variance:

$$s^2 = \frac{(5)(1.9^2 + 2.4^2 + 3.0^2 + 3.5^2 + 4.2^2) - (1.9 + 2.4 + 3.0 + 3.5 + 4.2)^2}{(5)(4)} = \frac{(5)(48.26) - (15)^2}{(5)(4)} = 0.815$$

Then compute

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(5 - 1)(0.815)}{1} = 3.26$$

Now, what is your conclusion?
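The arithmetic in Example 8.10 can be reproduced as follows. This is a sketch, not the course's prescribed procedure: the band limits 0.484 and 11.143 are the chi-squared table values for 4 degrees of freedom that leave 2.5% in each tail (the same values marked on the figure for this example), and reading "inside the band" as "no reason to doubt σ = 1" is one common interpretation.

```python
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # battery lifetimes in years
sigma_claimed = 1.0                      # manufacturer's claimed std. deviation

n = len(lifetimes)
s2 = variance(lifetimes)                         # sample variance, divisor n - 1
chi2 = (n - 1) * s2 / sigma_claimed**2           # statistic from Theorem 8.4

# Table values for nu = n - 1 = 4 degrees of freedom (assumed 95% band):
# chi^2_0.975 = 0.484 and chi^2_0.025 = 11.143.
lower, upper = 0.484, 11.143
within_band = lower <= chi2 <= upper

print(f"s^2   = {s2:.3f}")
print(f"chi^2 = {chi2:.2f}")
print("inside the central 95% band:", within_band)
```

Here `statistics.variance` uses the n − 1 divisor, so it matches Definition 8.6 rather than the population variance.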
[Figure: chi-squared density with ν = 5 − 1 = 4 degrees of freedom on an axis from 0.0 to 20.0. An area of 2.5% lies to the left of $\chi^2_{0.975} = 0.484$ and an area of 2.5% lies to the right of $\chi^2_{0.025} = 11.143$.]

t-Distribution

Theorem 8.5. Let Z be a standard normal random variable and V a chi-squared random variable with ν degrees of freedom. If Z and V are independent, then the distribution of the random variable T, where

$$T = \frac{Z}{\sqrt{V / \nu}}$$

is given by the density function

$$h(t) = \frac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)\sqrt{\pi \nu}} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu + 1)/2}, \quad -\infty < t < \infty$$

This is known as the t-distribution with ν degrees of freedom.

Now consider a sample of size n drawn from a normal population and remember that

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

has the standard normal distribution. For the same sample, consider the following statistic related to the variance of the sample:

$$V = \frac{(n - 1) S^2}{\sigma^2}$$

The random variable V has a chi-squared distribution with (n − 1) degrees of freedom. Since $\bar{X}$ and S² are independent for samples from a normal population, Theorem 8.5 applies, and

$$T = \frac{(\bar{X} - \mu) / (\sigma / \sqrt{n})}{\sqrt{\dfrac{(n - 1) S^2 / \sigma^2}{n - 1}}} = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

has a t-distribution with (n − 1) degrees of freedom.

Corollary 8.1. Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean µ and standard deviation σ. Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Then the random variable $T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$ has a t-distribution with ν = n − 1 degrees of freedom.

Below, you can see a graph of the t-distribution with various degrees of freedom. Observe that the t-distribution approaches the standard normal distribution as the sample size n increases.
[Figure: density curves of Z and of T with 15, 4, and 2 degrees of freedom, on an axis from -5.0 to 5.0.]

It is customary to let $t_\alpha$ represent the t-value above which we find an area equal to α. Hence, the t-value with 10 degrees of freedom leaving an area of 0.05 to its right is t = 1.812. Since the t-distribution is symmetric about a mean of zero, we have $t_{1-\alpha} = -t_\alpha$; that is, the t-value leaving an area of 1 − α to the right, and therefore an area of α to the left, is equal to the negative of the t-value that leaves an area of α in the right tail of the distribution. That is, $t_{0.95} = -t_{0.05}$, $t_{0.99} = -t_{0.01}$, and so forth.

For T with 10 degrees of freedom: $P(T > t_{0.05}) = P(T > 1.812) = 0.05$

[Figure: t density with 10 degrees of freedom on an axis from -4.0 to 4.0, with the area of 0.05 to the right of $t_{0.05} = 1.812$ shaded.]

Example 8.11. What is the t-value with ν = 14 degrees of freedom that leaves an area of
(a) 0.025 to the right
(b) 0.975 to the right
(c) 0.75 to the right

Example 8.13. Find k such that P(k < T < -1.761) = 0.045 for a random sample of size 15 selected from a normal distribution and $T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$.

Example 8.14. A chemical engineer claims that the population mean yield of a certain batch process is 500 grams per milliliter of raw material. To check this claim he samples 25 batches each month. If the computed t-value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with his claim. What conclusion should he draw from a sample that has a mean $\bar{x} = 518$ grams per milliliter and a sample standard deviation s = 40 grams per milliliter? Assume the distribution of yields to be approximately normal.

Example 8.50. A manufacturing firm claims that the batteries used in their electronic games will last an average of 30 hours.
To maintain this average, 16 batteries are tested each month. If the computed t-value falls between $-t_{0.025}$ and $t_{0.025}$, the firm is satisfied with its claim. What conclusion should the firm draw from a sample that has a mean of 27.5 hours and a standard deviation of 5 hours? Assume the distribution of battery lives to be approximately normal.

F-Distribution

The F-distribution is used in the comparison of sample variances. The statistic F is defined to be the ratio of two independent chi-squared random variables, each divided by its number of degrees of freedom. Hence, we can write

$$F = \frac{U / \nu_1}{V / \nu_2}$$

where U and V are independent random variables having chi-squared distributions with ν1 and ν2 degrees of freedom, respectively.

Theorem 8.6. Let U and V be two independent random variables having chi-squared distributions with ν1 and ν2 degrees of freedom, respectively. Then the distribution of the random variable $F = \dfrac{U / \nu_1}{V / \nu_2}$ is given by the density

$$h(f) = \begin{cases} \dfrac{\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{f^{(\nu_1/2) - 1}}{(1 + \nu_1 f/\nu_2)^{(\nu_1 + \nu_2)/2}} & \text{if } f > 0, \\[8pt] 0 & \text{if } f \le 0. \end{cases}$$

This is known as the F-distribution with ν1 and ν2 degrees of freedom.

The curve of the F-distribution depends not only on the two parameters ν1 and ν2 but also on the order in which we state them.

[Figure: density curves of F(10, 30) and F(6, 10) on a horizontal axis from 0.00 to 4.00.]

We let $f_\alpha$ be the f value above which we find an area equal to α. Table A.6 in your textbook gives values of $f_\alpha$ only for α = 0.05.

[Figure: density of F(6, 20) with the area α to the right of $f_\alpha(6, 20)$ shaded.]

Theorem 8.7. Writing $f_\alpha(\nu_1, \nu_2)$ for $f_\alpha$ with ν1 and ν2 degrees of freedom, we obtain

$$f_{1-\alpha}(\nu_2, \nu_1) = \frac{1}{f_\alpha(\nu_1, \nu_2)}$$

Example.
Find $f_{0.95}(6, 10)$.

Suppose that random samples of size n1 and n2 are selected from two normal populations with variances σ1² and σ2², respectively. Define two random variables U and V:

$$U = \frac{(n_1 - 1) S_1^2}{\sigma_1^2} \qquad V = \frac{(n_2 - 1) S_2^2}{\sigma_2^2}$$

We know that U and V are random variables having chi-squared distributions with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom, respectively. Recall that

$$F = \frac{U / \nu_1}{V / \nu_2} = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$

is an F-distributed random variable with ν1 and ν2 degrees of freedom, as stated in the following theorem.

Theorem 8.8. If S1² and S2² are the variances of independent random samples of size n1 and n2 taken from normal populations with variances σ1² and σ2², respectively, then

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$

has an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.

Example: If S1² and S2² represent the variances of independent random samples of size n1 = 8 and n2 = 12, taken from normal populations with equal variances, find $P(S_1^2 / S_2^2 < 4.89)$. What if σ1² = 4 and σ2² = 2?

Some Exercises

Exercise 8.54. Pull-strength tests on 10 soldered leads for a semiconductor device yield the following results, in pounds of force required to rupture the bond:

19.8, 12.7, 13.2, 16.9, 10.6, 18.8, 11.1, 14.3, 17.0, 12.5

Another set of 8 leads was tested after encapsulation to determine whether the pull strength has been increased by encapsulation of the device, with the following results:

24.9, 22.8, 23.6, 22.1, 20.4, 21.6, 21.8, 22.5

Comment on the evidence concerning equality of the two population variances.
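As a starting point for Exercise 8.54, the sketch below computes the two sample variances and their ratio. It assumes, as Theorem 8.8 requires, that both samples come from normal populations; under equal population variances the ratio S1²/S2² is then a value of an F random variable with ν1 = 9 and ν2 = 7 degrees of freedom. The variable names are choices made here, and the comparison with a critical value from Table A.6 is left to the reader.

```python
from statistics import variance

# Pull strength (pounds of force) before and after encapsulation.
before = [19.8, 12.7, 13.2, 16.9, 10.6, 18.8, 11.1, 14.3, 17.0, 12.5]
after = [24.9, 22.8, 23.6, 22.1, 20.4, 21.6, 21.8, 22.5]

s1_sq = variance(before)      # sample variance S1^2, divisor n1 - 1
s2_sq = variance(after)       # sample variance S2^2, divisor n2 - 1

# Under equal population variances, F = S1^2 / S2^2 (Theorem 8.8).
f_ratio = s1_sq / s2_sq
nu1, nu2 = len(before) - 1, len(after) - 1

print(f"S1^2 = {s1_sq:.3f}  (nu1 = {nu1})")
print(f"S2^2 = {s2_sq:.3f}  (nu2 = {nu2})")
print(f"F    = {f_ratio:.3f}")
```

A ratio far from 1 suggests unequal variances; how far is "far" depends on the tabulated f values for (9, 7) degrees of freedom.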
Example: The expression level of a gene is measured in a number of control subjects and patients. The values measured in controls are 10, 12, 11, 15, 13, 11, 12, and the values measured in patients are 12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12. Is the variance different between controls and patients?

Exercise 8.67. The breaking strength of a certain rivet used in a machine engine has a mean of 5000 psi and a standard deviation of 400 psi. A random sample of 36 rivets is taken. Consider the distribution of $\bar{X}$, the sample mean breaking strength.
(a) What is the probability that the sample mean falls between 4800 psi and 5200 psi?
(b) What sample size n would be necessary in order to have $P(4900 < \bar{X} < 5100) = 0.99$?

Exercise 8.52. A maker of a certain brand of low-fat cereal bars claims that their average saturated fat content is 0.5 gram. In a random sample of 8 cereal bars of this brand, the saturated fat content was 0.6, 0.7, 0.7, 0.3, 0.4, 0.5, 0.4, and 0.2. Would you agree with the claim? Assume a normal distribution.

Exercise 8.74. Suppose a filling machine is used to fill cartons with a liquid product. The specification that is strictly enforced for the filling machine is 9 ± 1.5 oz. If any carton is produced with weight outside these bounds, it is considered defective. It is hoped that at least 99% of cartons will meet these specifications. With the conditions µ = 9 and σ = 1, what proportion of cartons from the process are defective? If changes are made to reduce variability, what must σ be reduced to in order to meet the specification with probability 0.99? Assume a normal distribution for the weight.

Exercise 8.75. Consider the situation in 8.74.
Suppose a considerable quality effort is conducted to tighten the variability in the system. Following the effort, a random sample of size 40 is taken, and the sample variance is s² = 0.188 ounces². Do we have strong evidence that σ² has been reduced below 1.0?

Example: A sample of 10 fiber-reinforced polymer (FRP) strips mechanically fastened to highway bridges was tested for bearing strength. The strength measurement Y (in megapascals, MPa) was recorded for each strip. Assume that Y is normally distributed with variance σ² = 100.
a) Find the probability that s² is less than 16.92.
b) The data for the experiment are listed below. Do these data tend to contradict or support the assumption that σ² = 100?

240.9, 248.8, 215.7, 133.6, 231.4, 230.9, 225.3, 247.3, 235.5, 238.0
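For part b) of the FRP example, a first step is to compute the sample variance and the statistic (n − 1)s²/σ² from Theorem 8.4. The sketch below does only that arithmetic; the variable names are choices made here, and the interpretation is left open. For reference, 16.92 is the tabulated $\chi^2_{0.05}$ value for 9 degrees of freedom, so it gives a sense of scale for the statistic.

```python
from statistics import mean, variance

# Bearing strength (MPa) of the 10 FRP strips.
strength = [240.9, 248.8, 215.7, 133.6, 231.4, 230.9, 225.3, 247.3, 235.5, 238.0]
sigma_sq_assumed = 100.0            # assumed population variance

n = len(strength)
y_bar = mean(strength)              # sample mean
s_sq = variance(strength)           # sample variance, divisor n - 1
chi2 = (n - 1) * s_sq / sigma_sq_assumed   # chi-squared statistic, nu = n - 1 = 9

print(f"sample mean     = {y_bar:.2f} MPa")
print(f"sample variance = {s_sq:.2f} MPa^2")
print(f"chi^2 statistic = {chi2:.2f} with {n - 1} degrees of freedom")
```

Most of the sum of squares comes from the single observation 133.6, so it is worth asking whether that value is an outlier before drawing a conclusion about σ².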