
III Modeling Random Behavior

A. Probability

1. Overview

Statisticians use probability to model uncertainty. Consider these statements:
• The probability that the next batch of TiO2 (white pigment) is unacceptable is .01.
• There is a 25% chance our firm will get the IBM order.

In each case, what we mean by a probability is

$$\frac{\text{``size'' of a set of interest}}{\text{``size'' of the set of all possible outcomes}}$$

Some notation will aid our discussion. An event is a set of possible outcomes of interest. The sample space, S, is the set of all possible outcomes. If A is an event, the probability that A occurs is

$$P(A) = \frac{\text{size of } A}{\text{size of } S}$$

Note:
• 0 ≤ P(A) ≤ 1
• P(S) = 1
• Probability may be either objective (based on prior experience) or purely subjective.
• Ultimately, the accuracy of specific probabilities depends on assumptions.
• If the assumptions upon which we base a specific probability are wrong, then we should not expect the specific probability to be any good.

Example: Making Nickel Battery Plate

A particular process for making nickel battery plate requires an operator to sift nickel powder into a frame. The process uses a very tight weight specification, which is difficult to meet. The supervisor monitored the last 1000 attempts made by an operator. The operator successfully met the specification 379 times. One way to get the probability of a successful attempt is

$$P(\text{successful attempt}) = \frac{379}{1000} = .379$$

The supervisor noted, however, that the operator seemed to get better over time. In this case, the supervisor may believe that the actual probability is something larger than .379. A perfectly reasonable, but subjective, estimate of the probability of a successful attempt is 0.4.

2. Making Inferences Using Probabilities

Suppose the supervisor really believes that the probability of a successful attempt is 0.4. Suppose further that out of the next 50 attempts, the operator is never successful. Would you now believe that the probability of a success is still 0.4? OF COURSE NOT!
Consider another scenario. Suppose the first attempt is unsuccessful. Do you have good reason to believe that the true probability of a success is not 0.4? Suppose the first two attempts are unsuccessful. Suppose the first three are unsuccessful. At what point do we begin to believe that the probability really is not 0.4?

The answer lies in calculating the probability of seeing y unsuccessful attempts in a row, assuming a success probability of 0.4. If the attempts are independent, this probability is (0.6)^y; for example, three failures in a row have probability (0.6)^3 = 0.216. Once that probability is small enough, we can reasonably conclude that the true probability is not 0.4.

3. Conditional Probability

Often, two events are related. Knowing the relationship between the two events allows us to model the behavior of one event in terms of the other. Conditional probability quantifies the chances of one event occurring given that the other occurs. We denote the probability that an event A occurs given that the event B has occurred by P(A | B).

The key to conditional probability: the intersection of the two events defines the nature of their relationship. This concept is best illustrated by an example.

Example: Personal Computers

A major manufacturer of personal computers has introduced a new p.c.
• As with most new products, there seem to be some problems.
• This manufacturer offers a one-year warranty on this model.

Let
• A be the event that the hard drive on a specific computer fails within one year.
• B be the event that the floppy drive on a specific computer fails within one year.

Consider a specific computer whose floppy drive has failed. In this case, we know that the event B has occurred. What now is the probability that this same computer will have its hard drive fail? What we seek is P(A | B).

Note: once we know that B has occurred, the sample space of interest is restricted to B. Similarly, once we know that B has occurred, the set of interest is restricted to that portion of A which resides in B, A ∩ B.
$$P(A \mid B) = \frac{\text{size of the set of interest } (A \cap B)}{\text{size of the set of all possible outcomes } (B)}
= \frac{\text{size of } (A \cap B) \cdot \dfrac{1}{\text{size of } S}}{\text{size of } B \cdot \dfrac{1}{\text{size of } S}}
= \frac{\dfrac{\text{size of } (A \cap B)}{\text{size of } S}}{\dfrac{\text{size of } B}{\text{size of } S}}
= \frac{P(A \cap B)}{P(B)}$$

Definition: Conditional Probability

Let A and B be events in S. The conditional probability of B given that A has occurred is

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

if P(A) > 0. Similarly, the conditional probability of A given that B has occurred is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

if P(B) > 0.

Example: Personal Computers - Continued

The reliability engineers have determined that:
P(A) = .02
P(B) = .05
P(A ∩ B) = .01

Note: P(A ∩ B) is the probability that both the hard and floppy drives on a specific computer fail within one year.

The conditional probability that the hard drive fails given that the floppy drive fails is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{.01}{.05} = .2$$

As a result, if we know that the floppy drive failed on a given machine, then the probability that the hard drive will also fail is 20%.

4. Independence

In many engineering situations, two events have no real relationship. Knowing that one event has occurred offers no new information about the chances the other will occur. We call two such events independent. Independence is important for a number of reasons:
• many engineering events either are independent or are close enough for a first approximation
• independence provides a powerful basis for modeling the joint behavior of several events
• the formal concept of a random sample assumes that the observations are independent.

Definition: Independence

Let A, B be events in S.
A and B are said to be independent if

P(A | B) = P(A)

Similarly, if A and B are independent, then

P(B | A) = P(B)

Example: Personal Computers - Continued

Recall:
P(A) = .02
P(B) = .05
P(A | B) = .20

Note: the hard drive failing and the floppy drive failing are not independent events because P(A | B) ≠ P(A).

Why? Many personal computer designs use the floppy drive as an expensive air filter. As the floppy drive gets dirty, it becomes more likely to fail. Also, as the floppy drive gets dirty, the p.c. does not vent heat as well, which increases the likelihood that the hard drive fails.

5. Basic Rules of Probability

1. 0 ≤ P(A) ≤ 1.

2. If Ø is the empty set, then P(Ø) = 0.

3. The Probability of Complements
If A is an event in some sample space S, then the complement of the set A relative to S is the set of outcomes in S which are not in A. We denote the complement of A by Ā.

P(Ā) = 1 - P(A)

4. The Additive Law of Probability
If A and B are events in S, then the union of A and B, denoted by A ∪ B, is the set of outcomes either in A or in B or in both. The Additive Law of Probability is

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

If A and B are mutually exclusive, then A ∩ B = Ø and P(A ∩ B) = 0; thus,

P(A ∪ B) = P(A) + P(B)

5. The Multiplicative Law of Probability
If A and B are events in S, with P(A) > 0 and P(B) > 0, then

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Thus,

P(A ∩ B) = P(A) • P(B | A)

Similarly,

P(A ∩ B) = P(B) • P(A | B)

If A and B are independent, then

P(A | B) = P(A)
P(B | A) = P(B)

Thus, if A and B are independent, then

P(A ∩ B) = P(A) • P(B)

This property is a very powerful result, making independence quite important for finding the probabilities associated with the intersections of events.

6. Simplest Form of the Law of Total Probability
Let A and B be events in S. We may partition B into two parts:
• that which overlaps A, A ∩ B, and
• that which overlaps Ā, Ā ∩ B

Thus,

P(B) = P(A ∩ B) + P(Ā ∩ B)
     = P(A) • P(B | A) + P(Ā) • P(B | Ā)

7. The Simplest Form of Bayes Rule
Let A and B be events in S. Suppose we are given P(A), P(B | A), and P(B | Ā). Then

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B \mid A)}{P(A) \cdot P(B \mid A) + P(\bar{A}) \cdot P(B \mid \bar{A})}$$

Example: Toothpaste Containers

A toothpaste company uses four injection molding processes to make its toothpaste containers. These are older pieces of equipment and subject to problems.

Event   Description                                     Prob
A       Machine 1 has a problem on any specific day     0.1
B       Machine 2 has a problem on any specific day     0.2
C       Machine 3 has a problem on any specific day     0.05
D       Machine 4 has a problem on any specific day     0.05

What is the probability that no problems occur on any specific day?

Note: no problems means
• Machine 1 has no problems, Ā, and
• Machine 2 has no problems, B̄, and
• Machine 3 has no problems, C̄, and
• Machine 4 has no problems, D̄.

Thus, we seek the probability of an intersection. If we can assume independence, then the probability of the intersection is the product of the individual probabilities.

$$\begin{aligned}
P(\text{no problems}) &= P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \\
&= P(\bar{A}) \cdot P(\bar{B}) \cdot P(\bar{C}) \cdot P(\bar{D}) \\
&= (1 - 0.1)(1 - 0.2)(1 - 0.05)(1 - 0.05) \\
&= (.9)(.8)(.95)(.95) \\
&= 0.6498
\end{aligned}$$

B. Discrete Random Variables

1. Overview

Let Y be the number of problems that occur on a given day.

What does Y = 0 mean? No problems, which is $\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}$; thus,

P(Y = 0) = .6498

What does Y = 1 mean? Exactly one problem, which is

$A \cap \bar{B} \cap \bar{C} \cap \bar{D}$ or $\bar{A} \cap B \cap \bar{C} \cap \bar{D}$ or $\bar{A} \cap \bar{B} \cap C \cap \bar{D}$ or $\bar{A} \cap \bar{B} \cap \bar{C} \cap D$

Note: each one of these events is mutually exclusive of the others; thus,

$$P(Y = 1) = P(A \cap \bar{B} \cap \bar{C} \cap \bar{D}) + P(\bar{A} \cap B \cap \bar{C} \cap \bar{D}) + P(\bar{A} \cap \bar{B} \cap C \cap \bar{D}) + P(\bar{A} \cap \bar{B} \cap \bar{C} \cap D)$$

When all is said and done,

P(Y = 1) = 0.30305

In a similar manner, we can show that

P(Y = 2) = 0.04455
P(Y = 3) = 0.00255
P(Y = 4) = 0.00005

Y is an example of a random variable. We describe the behavior of a random variable by its distribution.
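The probabilities for Y quoted above can be checked by enumerating all 16 combinations of "problem" and "no problem" across the four machines. The following is a minimal Python sketch, assuming (as the example does) that the machines are independent; the names `problem_probs` and `count_distribution` are illustrative and do not come from the notes.

```python
from itertools import product

# Daily problem probabilities for machines 1-4 (events A, B, C, D in the example).
problem_probs = [0.1, 0.2, 0.05, 0.05]

def count_distribution(probs):
    """Return [P(Y = 0), ..., P(Y = k)] where Y counts the machines with a problem.

    Assumes the machines behave independently, so the probability of each
    combination of outcomes is the product of the individual probabilities.
    """
    pmf = [0.0] * (len(probs) + 1)
    # Each outcome is a tuple of 0/1 flags: 1 means that machine has a problem.
    for outcome in product([0, 1], repeat=len(probs)):
        prob = 1.0
        for has_problem, p in zip(outcome, probs):
            prob *= p if has_problem else 1.0 - p
        # The combinations are mutually exclusive, so their probabilities add.
        pmf[sum(outcome)] += prob
    return pmf

pmf = count_distribution(problem_probs)
for y, p in enumerate(pmf):
    print(f"P(Y = {y}) = {p:.5f}")
```

Enumeration is practical here only because there are four machines; it is meant to mirror the "intersection, then add the mutually exclusive cases" reasoning of the example rather than to be an efficient method.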
Every random variable has a cumulative distribution function, F(y), defined by

F(y) = P(Y ≤ y)

In our case

y       0         1         2         3         4
F(y)    0.64980   0.95285   0.99740   0.99995   1.00000

There are two types of random variables:
• Discrete, which have a countable number of possible values
• Continuous, which take values over a continuum (an uncountable number of possible values).

Discrete random variables have a probability function, p(y), defined by

p(y) = P(Y = y)

For our example

y       0         1         2         3         4
p(y)    0.64980   0.30305   0.04455   0.00255   0.00005

2. Expected Values

Random variables and their distributions provide a way to model random behavior and populations. Parameters are important characteristics of populations. For example,
• the typical number of problems which occur each day
• the variability in the number of problems which occur.

We have already outlined a distribution which describes this number. We can use this distribution to define measures of typical value and of variability.

Let Y be the discrete random variable of interest. For example, let Y be the number of problems which occur on any given day with the injection molding process for toothpaste tubes. A measure of the typical value for Y is the population mean, μ, or the expected value of Y:

$$\mu = E(Y) = \sum_{y} y \cdot p(y)$$

A measure of the variability of Y is the population variance, σ², defined by

$$\sigma^2 = E(Y^2) - \mu^2 \quad \text{where} \quad E(Y^2) = \sum_{y} y^2 \cdot p(y)$$

What are the units of the population variance? They are the square of the units of Y. As a result, we often use the population standard deviation, σ, as a measure of variability, where

$$\sigma = \sqrt{\sigma^2}$$

Many texts note that virtually all of the data for a particular distribution should fall in the interval μ ± 3σ (the empirical rule). In general, we should take this recommendation with a grain of salt because very skewed or heavy-tailed distributions are exceptions. The empirical rule does point out that we can begin to describe the behavior of many distributions with just two measures:
• the population mean and
• the population standard deviation.
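The definitions of μ, σ², and σ above translate directly into a few sums over the probability function. Here is a minimal Python sketch using the p(y) table for the toothpaste example; the function name `discrete_moments` is an illustrative choice, not part of the notes.

```python
import math

# Probability function p(y) for the number of problems per day (table above).
pmf = {0: 0.64980, 1: 0.30305, 2: 0.04455, 3: 0.00255, 4: 0.00005}

def discrete_moments(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(y * p for y, p in pmf.items())              # E(Y)
    second_moment = sum(y ** 2 * p for y, p in pmf.items())  # E(Y^2)
    variance = second_moment - mean ** 2                   # E(Y^2) - mu^2
    return mean, variance, math.sqrt(variance)

mu, var, sigma = discrete_moments(pmf)

# Probability that Y lands inside mu +/- 3 sigma (the empirical-rule interval).
lower, upper = mu - 3 * sigma, mu + 3 * sigma
coverage = sum(p for y, p in pmf.items() if lower <= y <= upper)

print(f"mu = {mu:.3f}, sigma^2 = {var:.3f}, sigma = {sigma:.3f}")
print(f"P({lower:.3f} <= Y <= {upper:.3f}) = {coverage:.4f}")
```

For this distribution the interval happens to capture the values 0, 1, and 2, so the coverage is just F(2) from the cumulative distribution table.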
Many engineers commonly evaluate their data using this notion of the mean plus or minus three standard deviations.

Example: Number of problems with an injection molding process for toothpaste tubes.

y        p(y)       y•p(y)     y²    y²•p(y)
0        0.64980    0          0     0
1        0.30305    0.30305    1     0.30305
2        0.04455    0.08910    4     0.17820
3        0.00255    0.00765    9     0.02295
4        0.00005    0.00020    16    0.00080
Total               0.4              0.505

$$\mu = E(Y) = \sum_{y} y \cdot p(y) = 0.4$$

$$E(Y^2) = \sum_{y} y^2 \cdot p(y) = 0.505$$

$$\sigma^2 = E(Y^2) - \mu^2 = 0.505 - (0.4)^2 = .345$$

$$\sigma = \sqrt{\sigma^2} = 0.587$$

Note: μ ± 3σ = 0.4 ± 3(0.587) = (-1.361, 2.161)

The chances of seeing data within this interval are 99.74%.

3. Binomial Distribution

The manufacturer of nickel battery plate has imposed a tight initial weight specification which is difficult to meet. Consider the next three attempts made by an operator who has a 40% chance of being successful.

Let S represent a successful attempt.
Let F represent a failed attempt.
Let Y represent the number of successful attempts she makes.

Consider the probability that exactly two out of these three attempts are successful, i.e., P(Y = 2). The possible ways she can get exactly two successful attempts are

(SSF) (SFS) (FSS)

Since these events are mutually exclusive, the probability of exactly two successful attempts is

P(Y = 2) = P(SSF) + P(SFS) + P(FSS)

In this situation, we can reasonably assume that each attempt is independent of the others. Let p be the probability that she succeeds in meeting the weight specification on any given attempt. Thus, p = 0.4. Let q = 1 - p be the probability that she fails. In this specific case, q = .6.
Since each attempt is independent of the others,

P(SSF) = P(S) • P(S) • P(F) = p • p • q = p² • q = 0.096
P(SFS) = P(S) • P(F) • P(S) = p • q • p = p² • q = 0.096
P(FSS) = P(F) • P(S) • P(S) = q • p • p = p² • q = 0.096

As a result,

P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
         = p² • q + p² • q + p² • q
         = 3 • p² • q
         = (number of ways to get 2 successes out of 3 attempts) • p² • q
         = 3 (0.096)
         = 0.288

In general, if she makes n total attempts, the probability that she succeeds exactly y times is

$$P(Y = y) = (\text{number of ways, } y \text{ successes out of } n) \cdot p^{y} \cdot q^{n-y}$$

We commonly use the binomial coefficient $\binom{n}{y}$ to denote the number of ways to get y successes from n total attempts. We define $\binom{n}{y}$ by

$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$

By definition, 0! = 1.

We now can write the probability of obtaining exactly y successes out of n total attempts as

$$P(Y = y) = \binom{n}{y} p^{y} q^{n-y} = \frac{n!}{y!\,(n-y)!}\, p^{y} q^{n-y}$$

Consider an experiment which meets the following conditions:
1. the experiment consists of a fixed number of trials, n;
2. each trial can result in one of only two possible outcomes: a “success” or a “failure”;
3. the probability, p, of a “success” is constant for each trial;
4. the trials are independent; and
5. the random variable of interest, Y, is the number of successes over the n trials.

If these conditions hold, then Y is said to follow a binomial distribution with parameters n and p. The probability function for a binomial random variable is

$$p(y) = \binom{n}{y} p^{y} q^{n-y} = \frac{n!}{y!\,(n-y)!}\, p^{y} q^{n-y}$$

The mean, variance, and standard deviation are

$$\mu = E(Y) = np \qquad \sigma^2 = npq \qquad \sigma = \sqrt{npq}$$

Example

NASA downloads massive data files from a specific satellite three times a day. Historically, the probability that the data file is corrupted during transmission is .10. Consider a day's set of transmissions. What is the probability that exactly two data files are corrupted?

Let Y = number of files corrupted.
$$\begin{aligned}
P(Y = 2) = p(2) &= \binom{n}{y} p^{y} q^{n-y} = \binom{3}{2} (.1)^{2} (.9)^{1} \\
&= \frac{3!}{2!\,(1!)} (.1)^{2} (.9) \\
&= \frac{3 \cdot 2!}{2!} (.1)^{2} (.9) \\
&= 3 (.1)^{2} (.9) \\
&= 0.027
\end{aligned}$$

Find the mean number of files corrupted.

μ = np = 3(.1) = .3

Find the variance and standard deviation for the number of files corrupted.

σ² = npq = 3(.1)(.9) = .27

$$\sigma = \sqrt{\sigma^2} = \sqrt{.27} = 0.52$$

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 0.3 ± 3(0.52) = (-1.26, 1.86)

As a result, we should rarely see 2 or more corrupted files.

4. Poisson Distribution

Many engineering problems require us to model the random behavior of small counts. For example, a manufacturer of nickel-hydrogen batteries ran into a problem with cells shorting out prematurely. Each cell used 60 nickel plates. The manufacturer and its customer cut open several cells and discovered that the problem cells all had plates with “blisters” while the good cells did not.

Two possible approaches:
• Classify each plate as either conforming (blister free) or non-conforming (one or more blisters).
  -- Model with a binomial distribution.
  -- Reduces the data into either acceptable or not acceptable.
  -- Often ignores the subtleties in the data.
• Count the number of blisters on each plate.
  -- Conforming plates have counts of 0.
  -- Non-conforming plates have counts of 1 or more.
  -- A plate with many blisters truly is defective and does short out a cell.
  -- A plate with only one blister may function perfectly well.

Counting the number of blisters provides more information about the specific problem.

The Poisson distribution often proves useful for modeling small counts. Let λ be the rate of these counts. If Y follows a Poisson distribution, then

$$p(y) = P(Y = y) = \begin{cases} \dfrac{\lambda^{y} e^{-\lambda}}{y!} & y = 0, 1, 2, \ldots \\[1ex] 0 & \text{otherwise} \end{cases}$$

with

$$\mu = E(Y) = \lambda \qquad \sigma^2 = \lambda$$

Example: Consider a maintenance manager of an industrial facility. Historically, a certain department averages six repairs per week.
What is the probability that during a randomly selected week, this department will require only two repairs?

Let Y = number of repairs.

$$P(Y = 2) = \frac{\lambda^{y} e^{-\lambda}}{y!} = \frac{(6)^{2} e^{-6}}{2!} = \frac{36}{2}\left(e^{-6}\right) = 0.0446$$

What is the probability of at least one repair?

$$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \frac{(6)^{0} e^{-6}}{0!} = 1 - e^{-6} = 1 - .0025 = .9975$$

What is the expected number of repairs?

μ = E(Y) = λ = 6

What are the variance and standard deviation for the number of repairs?

σ² = λ = 6

$$\sigma = \sqrt{\sigma^2} = \sqrt{6} = 2.45$$

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 6 ± 3(2.45) = (-1.35, 13.35)

As a result, we should rarely see 14 or more repairs in any given week.

C. Continuous Random Variables

1. Overview

The continuous random variables studied in this course have probability density functions, f(y).

Some Properties of f(y):

1. $f(y) \ge 0$
2. $\int_{-\infty}^{\infty} f(y)\,dy = 1$
3. $F(y_0) = P(Y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy$
4. $P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} f(y)\,dy = F(y_2) - F(y_1)$
5. $P(Y = y_0) = 0$, since $P(Y = y_0) = \int_{y_0}^{y_0} f(y)\,dy = 0$

A very important example of a continuous random variable is one which follows an exponential distribution. The exponential distribution often provides an excellent model for describing the behavior of equipment lifetimes.

Example: The times between repairs for an ethanol-water distillation column are well modeled by an exponential distribution, which has the form

$$f(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0,\ \lambda > 0 \\ 0 & \text{otherwise} \end{cases}$$

where λ is the rate of repairs. In this case, λ = .001 repairs/hr. Thus, this column, on the average, requires 1 repair every 1000 hours of operation.

What is the probability that the next time to repair will be less than 100 hours from the previous repair?
$$P(Y \le y_0) = \int_{0}^{y_0} f(y)\,dy$$

For our example,

$$P(Y \le y_0) = \int_{0}^{y_0} \lambda e^{-\lambda y}\,dy = -e^{-\lambda y}\Big|_{0}^{y_0} = 1 - e^{-\lambda y_0}$$

In our case, λ = .001 and y0 = 100; thus,

$$P(Y \le 100) = 1 - e^{-(.001)(100)} = 1 - e^{-.1} = 1 - 0.905 = 0.095$$

What is the probability that the time between repairs will be between 500 and 1500 hours?

$$P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} \lambda e^{-\lambda y}\,dy = -e^{-\lambda y}\Big|_{y_1}^{y_2} = e^{-\lambda y_1} - e^{-\lambda y_2}$$

In this case, y1 = 500 and y2 = 1500; thus,

$$P(500 \le Y \le 1500) = e^{-0.5} - e^{-1.5} = 0.383$$

2. Expected Values Revisited

For a continuous random variable, Y, the expected value is

$$\mu = E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy$$

The variance of Y is once again

$$\sigma^2 = E(Y^2) - \mu^2 \quad \text{where} \quad E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy$$

Once again, the standard deviation is

$$\sigma = \sqrt{\sigma^2}$$

Example: The time between repairs

We said these times were well modeled by an exponential distribution with λ = .001 repairs/hr.

$$f(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0,\ \lambda > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_{0}^{\infty} y \lambda e^{-\lambda y}\,dy = \frac{1}{\lambda} = \frac{1}{.001} = 1000 \text{ hours}$$

$$E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy = \int_{0}^{\infty} y^2 \lambda e^{-\lambda y}\,dy = \frac{2}{\lambda^2}$$

$$\sigma^2 = E(Y^2) - \mu^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{\lambda^2}} = \frac{1}{\lambda}$$

In our case,

$$\sigma^2 = \left(\frac{1}{.001}\right)^2 = 1{,}000{,}000 \qquad \sigma = \frac{1}{.001} = 1000$$

3. Relationship of Distributions and Data Displays

Distributions can provide a powerful basis for modeling the random behavior of important characteristics of interest. Formal statistical analyses require certain assumptions about the underlying distribution of the data. Typically, these assumptions center on the “shape” of the data. Appropriate data displays provide a quick and easy way to check these assumptions, especially the stem-and-leaf display and the histogram.

The theoretical shape of a stem-and-leaf display for a given set of data is
• the probability function, p(y), for a discrete random variable, and
• the pdf, f(y), for a continuous random variable.

Example: Times Between Industrial Accidents

Lucas (1985) analyzed the times between accidents at an industrial facility.
We can model these times by an exponential distribution with λ = 0.05. The following plot graphs the pdf for this specific distribution. Consider overlaying an appropriately scaled plot of the pdf on a histogram of the data. This plot indicates that the exponential distribution does provide a reasonable basis for modeling these times.

4. The Normal Distribution

The normal distribution is the single most important distribution in classical statistics. Many naturally occurring phenomena are well modeled by this distribution. Let Y be a normally distributed random variable; its pdf is given by

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2}}$$

Note: the pdf depends on the parameters
• μ, the population mean
• σ², the population variance

Thus,

E(Y) = μ
var(Y) = σ²

The plot of the pdf is single peaked, centered at μ, and symmetric, and its tails die out rapidly.
• 68.3% of the area under the curve falls within the interval μ ± σ
• 95.4% falls within μ ± 2σ
• 99.7% falls within μ ± 3σ

We can find any probabilities we need through the standard normal random variable. The standard normal distribution has
• μ = 0
• σ² = 1

We denote a standard normal random variable by Z. The values listed in Table I of the Appendix are P(Z ≤ z0). Thus,

P(Z ≤ 1.96) = 0.9750

Consider P(Z > z0):

P(Z > 2.33) = 1 - P(Z ≤ 2.33) = 1 - .9901 = .0099

Finally, consider P(z1 < Z ≤ z2). For example, P(-1.00 ≤ Z ≤ 1.50) is

P(Z ≤ 1.50) - P(Z ≤ -1.00) = .9332 - .1587 = .7745

We often need to use the Z-value associated with specific “tail” areas of the standard normal distribution. Let $z_\alpha$ represent the Z-value associated with a right-hand “tail area” of α; that is, $z_\alpha$ is that value for Z such that

$$P(Z > z_\alpha) = \alpha$$

As a result, $z_\alpha$ is that value from the table which satisfies

$$1.0 - P(Z \le z_\alpha) = \alpha \quad \text{or} \quad P(Z \le z_\alpha) = 1.0 - \alpha$$

so the value in the body of the table is 1.0 - α.

For example, $z_{0.025}$ is that Z such that P(Z ≤ $z_{0.025}$) = 1.0 - 0.025 = 0.975.
Looking into the body of the table, we obtain $z_{0.025}$ = 1.96.

We can transform any normal random variable, Y, to a standard normal, Z, by

$$Z = \frac{Y - \mu}{\sigma}$$

By subtracting μ, which is the expected value of Y, we recenter the random variable around 0, so the expected value of Z is 0. By dividing by σ, we rescale the random variable so that the variance is 1. The resulting Z value represents the number of standard deviations a value of the random variable lies from its mean.

Example: Suppose that you are an engineer assigned to the bottling department of the Busch Beer Company. A particular 12 oz. bottling machine is known to dispense beer according to a normal distribution with a mean of 12 oz and a variance of .04 oz² (so σ = .2 oz). What is the probability that this machine dispenses more than 12.5 oz?

Let Y be the amount dispensed. We seek

$$\begin{aligned}
P(Y > 12.5) &= P(Y - \mu > 12.5 - \mu) \\
&= P\left(\frac{Y - \mu}{\sigma} > \frac{12.5 - \mu}{\sigma}\right) \\
&= P\left(Z > \frac{12.5 - 12.0}{.2}\right) \\
&= P(Z > 2.5) \\
&= 1 - P(Z \le 2.5) \\
&= 1 - .9938 \\
&= .0062
\end{aligned}$$

What is the probability that between 11.75 and 12.5 oz. are dispensed?

$$\begin{aligned}
P(11.75 \le Y \le 12.5) &= P\left(\frac{11.75 - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{12.5 - \mu}{\sigma}\right) \\
&= P\left(\frac{11.75 - 12.0}{.2} \le Z \le \frac{12.5 - 12.0}{.2}\right) \\
&= P(-1.25 \le Z \le 2.5) \\
&= P(Z \le 2.5) - P(Z \le -1.25) \\
&= .9938 - .1056 \\
&= .8882
\end{aligned}$$

D. Random Behavior of Means

1. The Sample Mean

Definition: Sample Mean

Let y1, y2, …, yn be a sample of n observations. The sample mean, $\bar{y}$, is given by

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

The sample mean is a measure of the typical value for a data set. It represents the “center of gravity”.

Example: Battery Plate Porosities

Nickel-hydrogen (Ni-H) batteries use a nickel plate as their anode. A critical quality characteristic is the plate's porosity, which controls the interface of the anode with the potassium hydroxide electrolyte solution.
A recent random sample of ten porosities yielded:

79.1 79.5 79.3 79.3 78.8 79.0 79.2 79.7 79.0 79.2

The sample mean is

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{792.1}{10} = 79.21$$

2. Random Samples

Definition: Random Sample

Let y1, y2, …, yn be a sample of n observations taken from some population. If these observations are independent of each other and if each observation follows the same distribution, then y1, y2, …, yn is said to be a random sample.

All the distribution theory of classical statistics is based upon this concept of a random sample.

3. Central Limit Theorem

Consider taking a series of random samples, all of size n, from some population, and calculating $\bar{y}$ for each one. Since the data are random, $\bar{y}$ is also a random variable!

An important question: What is its distribution?

If the population from which we sample is normal, then $\bar{y}$ also follows a normal distribution. But how often do you know that the population really is normal? Very rarely!

The Central Limit Theorem: Better Known as the Statistician's Full Employment Act.

Consider a population with mean μ and variance σ². As the sample size, n, approaches infinity, the distribution of

$$Z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$$

approaches the standard normal distribution.

Bottom line: If n is sufficiently large, then $\bar{y}$ approximately follows a normal distribution with
• mean μ
• “standard error” $\sigma / \sqrt{n}$
• Z represents the number of standard errors $\bar{y}$ lies from μ.

What is the catch? What constitutes sufficiently large?

If the parent population is normal, n = 1 is large enough.

If the population is symmetric and the tails die out rapidly, then n = 3-5 is large enough.

A classic example is the uniform distribution. Note:
• The distribution is symmetric.
• It does not have a unique peak.
• When its tails die, they die!

In this case, sample sizes of 6-12 are considered adequate for applying the Central Limit Theorem.

As the parent distribution looks less and less normal, the sample size required to assume the Central Limit Theorem gets larger.
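One informal way to see the Central Limit Theorem at work for the uniform case is to simulate it. The sketch below is only an illustration under stated assumptions: it samples from a Uniform(0, 1) parent (mean 0.5, variance 1/12), uses n = 12, and fixes a random seed for repeatability. The function name `simulate_sample_means`, the number of repetitions, and the seed are arbitrary choices, not part of the notes.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` random samples of size n from Uniform(0, 1); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

n, reps = 12, 5000
mu = 0.5                         # mean of the Uniform(0, 1) parent
sigma = math.sqrt(1.0 / 12)      # standard deviation of the Uniform(0, 1) parent
std_error = sigma / math.sqrt(n)  # sigma / sqrt(n)

means = simulate_sample_means(n, reps)

# Standardize each sample mean: Z = (ybar - mu) / (sigma / sqrt(n)).
z_values = [(m - mu) / std_error for m in means]
coverage = sum(abs(z) <= 1.96 for z in z_values) / reps

print(f"mean of sample means:   {statistics.mean(means):.4f} (theory {mu})")
print(f"sd of sample means:     {statistics.stdev(means):.4f} (theory {std_error:.4f})")
print(f"share with |Z| <= 1.96: {coverage:.3f} (normal theory 0.950)")
```

If the normal approximation is reasonable at this sample size, the share of standardized means within ±1.96 should land near 95%. Rerunning with a more skewed parent distribution is a simple way to explore how the required n grows.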
Important point: When determining if the sample size is big enough, we need to look at the distribution for the parent population.

In practice, what must we check to see whether the Central Limit Theorem applies?
• Stem-and-Leaf displays
• Normal Probability Plots

4. Normal Probability Plot

The normal probability plot is a simple graphical tool for assessing whether the data come close to following a normal distribution. Many software packages generate it automatically. If the data follow a normal distribution, the normal probability plot should look like a straight line. Significant deviations from the straight line suggest that the data are not “well-behaved”.

A reasonable question: How straight is straight? Many analysts use the “fat pencil” rule. For a suitably scaled plot, if we can cover the points with a fat pencil, the line is straight enough.

Example: The Plate Porosities

The Stem-and-Leaf Display

Stem     Leaves   No.   Depth
78.•:    8        1     1
79.*:    001      3     4
79.t:    2233     4
79.f:    5        1     2
79.s:    7        1     1

The Normal Probability Plot

[Figure: normal probability plot of the porosities; vertical axis: quantiles of standard normal (-2 to 2); horizontal axis: y (78.8 to 79.7)]

For a sample size of 10, we should feel reasonably comfortable assuming the Central Limit Theorem in this case.

5. Using the Central Limit Theorem

Suppose the historic standard deviation for these porosities has been 0.25. Suppose further that the target porosity is 79.0. A reasonable question: What is the probability that we see a sample mean for 10 porosities ≥ 79.21 (the observed sample mean from our sample)?

We seek $P(\bar{y} \ge 79.21)$.

$$P(\bar{y} \ge 79.21) = P\left(\frac{\bar{y} - \mu}{\sigma / \sqrt{n}} \ge \frac{79.21 - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z \ge \frac{79.21 - \mu}{\sigma / \sqrt{n}}\right)$$

In our case
μ = 79.0
σ = 0.25
n = 10

Thus,

$$\begin{aligned}
P(\bar{y} \ge 79.21) &= P\left(Z \ge \frac{79.21 - 79.0}{0.25 / \sqrt{10}}\right) \\
&= P(Z \ge 2.66) \\
&= 1 - [P(Z \le 2.66)] \\
&= 1 - .9961 \\
&= .0039
\end{aligned}$$

Note: it is a very rare event to see an average of ten porosities greater than or equal to 79.21 when the true mean porosity is 79.0.
But we actually observed an average of 79.21, which suggests that the true mean porosity, at least for the time period studied, is larger than 79.0.

E. Random Behavior of Means, Variance Unknown

1. The Sample Variance

When the variance, σ², is known, the Central Limit Theorem suggests that

$$\frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$$

follows a standard normal distribution if n is big enough.

What would seem to be a logical thing to do when σ² is unknown? ESTIMATE IT!

Definition: The Sample Variance

Let y1, y2, …, yn be a random sample of n observations. The sample variance, s², is defined by

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Note:
• s² looks like an “average”. In fact, it is the “average” squared deviation from $\bar{y}$, using n-1 instead of n in the denominator.
• The reason for using n-1 will be discussed later.
• s² ≥ 0

The sample standard deviation, s, is

$$s = \sqrt{s^2}$$

The computational form of s² is

$$s^2 = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)}$$

Example: Thicknesses of Silicon Wafers

A major semiconductor manufacturer grinds wafers in batches of 31. For this particular product, suppose that the target thickness is 244 μm. A random sample yielded the following results:

240 243 250 253 248

The sample mean, sample variance, and the sample standard deviation are

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{1234}{5} = 246.8$$

$$s^2 = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)} = \frac{5(304662) - (1234)^2}{5(4)} = 27.7$$

$$s = \sqrt{s^2} = \sqrt{27.7} = 5.263$$

2. The t Distribution

Question: What distribution does

$$\frac{\bar{y} - \mu}{s / \sqrt{n}}$$

follow?

If the data come from a parent distribution which follows a normal distribution, then

$$\frac{\bar{y} - \mu}{s / \sqrt{n}}$$

follows a t distribution with n-1 degrees of freedom.

1. The t statistic represents the number of estimated standard errors a given value for $\bar{y}$ lies from its mean.
2. The t distribution is a shorter, squatter version of the Z distribution.
3. As n gets sufficiently large, the t distribution with n-1 degrees of freedom is well approximated by the Z distribution.
4. The t statistic is well known to be “robust” to the normality assumption.
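The sample variance and the t statistic can be computed for the wafer sample in a few lines. The sketch below is a minimal Python illustration that evaluates both the definitional and the computational form of s² and then the t statistic against the 244 μm target; the variable names are illustrative only.

```python
import math

# Wafer thickness sample (micrometers) and target thickness from the example.
thicknesses = [240, 243, 250, 253, 248]
target = 244

n = len(thicknesses)
ybar = sum(thicknesses) / n

# Definitional form: sum of squared deviations from ybar, divided by n - 1.
s2_definition = sum((y - ybar) ** 2 for y in thicknesses) / (n - 1)

# Computational form: [n * sum(y^2) - (sum(y))^2] / [n (n - 1)].
sum_y = sum(thicknesses)
sum_y2 = sum(y * y for y in thicknesses)
s2_computational = (n * sum_y2 - sum_y ** 2) / (n * (n - 1))

s = math.sqrt(s2_definition)

# t statistic: number of estimated standard errors ybar lies from the target.
t_stat = (ybar - target) / (s / math.sqrt(n))

print(f"ybar = {ybar:.1f}")
print(f"s^2 (definition) = {s2_definition:.2f}, s^2 (computational) = {s2_computational:.2f}")
print(f"s = {s:.3f}, t = {t_stat:.2f}")
```

The two forms of s² agree up to floating-point rounding. The computational form needs only the running totals Σy and Σy², which is why it appears in hand-calculation settings.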
In general, we feel comfortable using the t statistic whenever we sample from a “well-behaved” distribution. As the sample size gets bigger, the parent distribution can be less and less well-behaved.

Example: Thicknesses of Silicon Wafers, Continued

For this particular product, suppose that the target thickness is 244 μm. A random sample yielded the following results:

240 243 250 253 248

We already have found
• $\bar{y}$ = 246.8
• s² = 27.7
• s = 5.263

Our t statistic is

$$t = \frac{\bar{y} - \mu}{s / \sqrt{n}} = \frac{246.8 - 244}{5.263 / \sqrt{5}} = 1.19$$

This t value suggests that the observed sample mean is quite close to the target value.

F. The Normal Approximation to the Binomial Distribution

Recall the binomial distribution.
• Y represents the number of “successes” in n trials.
• p is the probability of a success on any given trial.
• n is the total number of trials.
• μ = E(Y) = np.
• σ² = np(1-p) = npq.

It can be shown that as n gets large, the distribution of

$$\frac{Y - np}{\sqrt{np(1-p)}}$$

approaches a standard normal.

Bottom line: If n is sufficiently large, the binomial distribution is well approximated by a normal.

What is sufficiently large? General Rule of Thumb: n is sufficiently large if
• np > 5 (the expected number of successes), and
• n(1-p) > 5 (the expected number of failures).

There is a slight catch. Let y0 be an integer. Consider P(Y = y0). Remember, Y follows a binomial distribution, which is discrete; therefore, P(Y = y0) > 0 for 0 ≤ y0 ≤ n. But P(Y = y0) = 0 for a normal random variable. What should we do?

Recall my example of my height. To be 6’1” tall means that someone is between 6’0½” and 6’1½” tall. We shall do the same thing now. (This is called a continuity correction.)

Let Y* be a normally distributed random variable with
• mean np
• variance npq.

Note:
1. Y* has the same mean and the same variance as Y, the original binomial random variable.
2. $\dfrac{Y^* - np}{\sqrt{np(1-p)}}$ is a standard normal random variable.

We can approximate P(Y = a) by

$$P(Y = a) \approx P\left(a - \tfrac{1}{2} \le Y^* \le a + \tfrac{1}{2}\right)$$

Similarly,
• $P(Y \le a) \approx P(Y^* \le a + .5)$
• $P(Y < a) \approx P(Y^* \le a - .5)$
• $P(Y \ge a) \approx P(Y^* \ge a - .5)$
• $P(Y > a) \approx P(Y^* \ge a + .5)$

Example: Consider a production line of decorative bricks. Historically, the probability that any given brick is rejected is 0.01. Suppose an inspector examines 1000 bricks per day. What is the probability that she rejects fewer than 2 bricks?

Let Y = number of defective bricks found. Note, we seek P(Y < 2).

$$\begin{aligned}
P(Y < 2) &\approx P(Y^* \le 2 - .5) \\
&= P\left(\frac{Y^* - np}{\sqrt{np(1-p)}} \le \frac{1.5 - np}{\sqrt{np(1-p)}}\right) \\
&= P\left(Z \le \frac{1.5 - 1000(0.01)}{\sqrt{1000(0.01)(.99)}}\right) \\
&= P(Z \le -2.70) \\
&= .0035
\end{aligned}$$

What is the probability that she rejects between 8 and 13 bricks, inclusive? We seek P(8 ≤ Y ≤ 13).

$$\begin{aligned}
P(8 \le Y \le 13) &\approx P(8 - .5 \le Y^* \le 13 + .5) \\
&= P\left(\frac{7.5 - np}{\sqrt{np(1-p)}} \le \frac{Y^* - np}{\sqrt{np(1-p)}} \le \frac{13.5 - np}{\sqrt{np(1-p)}}\right) \\
&= P\left(\frac{7.5 - 10}{\sqrt{10(.99)}} \le Z \le \frac{13.5 - 10}{\sqrt{10(.99)}}\right) \\
&= P(-0.79 \le Z \le 1.11) \\
&= .8665 - .2148 \\
&= .6517
\end{aligned}$$
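The continuity-corrected normal approximation for the brick example can be sketched in Python as follows. This is an illustration under stated assumptions: the standard normal cdf is computed from `math.erf` instead of being read from a table, so the results differ slightly from the table-based values because z is not rounded to two decimals. The exact binomial sums are included only for comparison, and the helper names `normal_cdf` and `binom_pmf` are illustrative.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(y, n, p):
    """Exact binomial probability of y successes in n trials."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)

n, p = 1000, 0.01
mu = n * p                           # np
sigma = math.sqrt(n * p * (1.0 - p))  # sqrt(np(1 - p))

# P(Y < 2) ~ P(Y* <= 2 - .5)
approx_lt2 = normal_cdf((1.5 - mu) / sigma)
exact_lt2 = sum(binom_pmf(y, n, p) for y in range(0, 2))

# P(8 <= Y <= 13) ~ P(8 - .5 <= Y* <= 13 + .5)
approx_8_13 = normal_cdf((13.5 - mu) / sigma) - normal_cdf((7.5 - mu) / sigma)
exact_8_13 = sum(binom_pmf(y, n, p) for y in range(8, 14))

print(f"P(Y < 2):       normal approx {approx_lt2:.4f}, exact binomial {exact_lt2:.4f}")
print(f"P(8 <= Y <= 13): normal approx {approx_8_13:.4f}, exact binomial {exact_8_13:.4f}")
```

Comparing the two columns gives a feel for where the approximation is adequate: it tracks the exact value reasonably well near the center of the distribution and can differ noticeably in the far tail, as in the P(Y < 2) case.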
