Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 4: Commonly Used Distributions 1 Section 4.1: The Bernoulli Distribution We use the Bernoulli distribution when we have an experiment which can result in one of two outcomes. One outcome is labeled “success,” and the other outcome is labeled “failure.” The probability of a success is denoted by p. The probability of a failure is then 1 – p. Such a trial is called a Bernoulli trial with success probability p. 2 X ~ Bernoulli(p) For any Bernoulli trial, we define a random variable X as follows: If the experiment results in a success, then X = 1. Otherwise, X = 0. It follows that X is a discrete random variable, with probability mass function p(x) defined by: x 1 p p( x) P( X x) 1 p x0 0 otherwise 3 Figure 4.1 Bernoulli(p=0.5) Bernoulli(p=0.8) Examples 1 and 2 1. Toss a coin. If we define heads to be the success, then p is the probability that the coin comes up heads. For a fair coin, p = 1/2. 2. Select a component from a population of components, some of which are defective. Define success to be a defective component, then p is the proportion of defective components in the population. 3. Roll a six-sided die. Let success be when you roll a 4. 5 Mean and Variance If X ~ Bernoulli(p), then 1 E[ X ] x p( x) i 0 X = 0(1- p) + 1(p) = p 1 E[( X ) ] ( x ) p( x) 2 2 2 . i 0 (0 p) (1 p) (1 p) ( p) p(1 p) 2 X 2 2 6 Example 3 Ten percent of components manufactured by a certain process are defective. A component is chosen at random. Let X = 1 if the component is defective, and X = 0 otherwise. 1. What is the distribution of X? 2. Find the mean and variance of X? 7 Section 4.2: The Binomial Distribution If a total of n Bernoulli trials are conducted, and The trials are independent. Each trial has the same success probability p. X is the number of successes in the n trials. then X has the binomial distribution with parameters n and p, denoted X ~ Bin(n,p). In other words, X counts the number of successes in n independent trials (experiments) 8 Binomial R.V.: pmf, mean, and variance If X ~ Bin(n, p), the probability mass function of X is (derive): n! x n x p (1 p ) , x 0,1,..., n p( x) P( X x) x!(n x)! 0, otherwise Mean (derive): X = np Variance (derive): np(1 p) 2 X 9 More on the Binomial • Assume n independent Bernoulli trials are conducted. • Each trial has probability of success p. • Let Y1, …, Yn be defined as follows: Yi = 1 if the ith trial results in success, and Yi = 0 otherwise. (Each of the Yi has the Bernoulli(p) distribution.) • Now, let X represent the number of successes among the n trials. So, X = Y1 + …+ Yn . This shows that a binomial random variable can be expressed as a sum of Bernoulli random variables. It also helps us calculate the mean and variance of a Binomial RV 10 Figure 4.2 X ~ Bin(10,0.4) X ~ Bin(20,0.1) Example 6 A large industrial firm allows a discount on any invoice that is paid within 30 days. Of all invoices, 10% receive the discount. In a company audit, 12 invoices are sampled at random. What is the probability that fewer than 4 of the 12 sampled invoices receive the discount? • Assume that invoices are independent of one another 12 Another Use of the Binomial Assume that a finite population contains items of two types, successes and failures, and that a simple random sample is drawn from the population. Then if the sample size is no more than 5% of the population, the binomial distribution may be used to model the number of successes. 13 Example 5 A lot contains several thousand components, 10% of which are defective. 15 components are sampled from the lot. Let X represent the number of defective components in the sample. • What is the distribution of X? • How many components do we expect to be defective? • What is that standard deviation of the number of defective components? • A lot is only accepted if no defects are found in the inspected items. How many components should be inspected so that the chance of accepting a lot when the lot contains 10% defective items is less than 5%? 14 Estimate of p To estimate p, conduct n independent trials and let X = the number of success. If X ~ Bin(n, p), then the sample proportion pˆ X / n is used to estimate the (true) success probability p. Bias is the difference: pˆ p An estimator is unbiased if: E[ bias ] = 0 E[estimator - parameter estimating] = 0 or E[estimator] = parameter estimating 15 Estimate of p Is the sample proportion pˆ X / n an unbiased estimator of p? p E[ pˆ ] (derive) p The uncertainty (standard error) in p̂ is: (derive) pˆ p (1 p ) . n In practice, when computing , we substitute p̂ for p, since p is unknown. 16 Example 7 In a sample of 100 newly manufactured automobile tires, 7 are found to have minor flaws on the tread. If four newly manufactured tires are selected at random and installed on a car, estimate the probability that none of the four tires have a flaw, and find the uncertainty in this estimate. 17 Section 4.3: The Poisson Distribution One way to think of the Poisson distribution is as an approximation to the binomial distribution when n is large and p is small. We can approximate the binomial mass function with a quantity λ = np; this λ is the parameter in the Poisson distribution. x n x e n x lim p (1 p) , x 0,1, n x x ! np 18 Poisson R.V.: pmf, mean, and variance If X ~ Poisson(λ), the probability mass function of X is e x , for x = 0, 1, 2, ... p ( x) P( X x) x ! 0, otherwise Mean: X = λ Variance: X2 Note: X must be a discrete random variable and λ must be a positive constant. 19 Figure 4.3 Poisson(1) Poisson(10) Example 8 Particles are suspended in a liquid medium at a concentration of 6 particles per mL. A large volume of the suspension is thoroughly agitated, and then 3 mL are withdrawn. What is the probability that exactly 15 particles are withdrawn? • is thought of as a rate, the rate that event X occurs per some unit, on average (e.g. 6 particles per mL). • However, to choose the value of lambda properly, we must take into account the units of the question. Here, the question is, P(15 particles in 3 mL) • So, what is the average rate of particles per 3 mL? 21 Example 9 A machine randomly drops Hershey’s Kisses into a crate of 50 Easter baskets. How many kisses should the machine drop so that the probability of selecting a random basket without any kisses is 0.03 or less? • What is probability question? • What is the rate? 22 Section 4.4: Some Other Discrete Distributions • Consider a finite population containing two types of items, which may be called successes and failures. • A simple random sample is drawn from the population. • Each item sampled constitutes a Bernoulli trial. • As each item is selected, the probability of successes in the remaining population decreases or increases, depending on whether the sampled item was a success or a failure. • This scenario is sometimes called Sampling without replacement. 23 Hypergeometric • The trials are not independent, so the number of successes in the sample does not follow a binomial distribution. • The distribution that properly describes the number of successes is the hypergeometric distribution. 24 Hypergeometric pmf Assume a finite population contains N items, of which R are classified as successes and N – R are classified as failures. Assume that n items are sampled from this population, and let X represent the number of successes in the sample. Then X has a hypergeometric distribution with parameters N, R, and n, which can be denoted X ~ H(N, R, n). The probability mass function of X is R N R x n x , if max(0, R n N ) x min(n, R) p( x) P( X x) N n 0, otherwise 25 Mean and Variance If X ~ H(N, R, n), then nR Mean of X: X N R N n R Variance of X: n 1 N N N 1 2 X 26 Example 10 • Of 50 buildings in an industrial park, 12 have electrical code violations. If 10 buildings are selected at random for inspection, what is the probability that exactly 3 of the 10 have code violations? What is the mean and variance of X? • Pick 5 cards from a standard deck, (a) what is distribution for the number of 8’s that are drawn? (b) What is the probability of drawing exactly two aces? • There are 5 cokes, 5 sprites, 3 pepsi’s, and 2 dr.pepper’s in a cooler. Pick 3 drinks at random, (a) what is distribution for the number of dr. peppers that are drawn? (b) What is the probability of drawing exactly one dr. pepper? 27 Geometric Distribution • Assume that a sequence of independent Bernoulli trials is conducted, each with the same probability of success, p. • Let X represent the number of trials up to and including the first success. • Then X is a discrete random variable, which is said to have the geometric distribution with parameter p. • We write X ~ Geom(p). 28 Geometric R.V.: pmf, mean, and variance If X ~ Geom(p), then The pmf of X is p(1 p) x 1 , x 1,2,... p ( x) P( X x) 0, otherwise The mean of X is 1 X p . 1 p The variance of X is 2 . p 2 X 29 Example 11 Products are inspected on an assembly line. It is known that approximately 10% of products have a defect. What is the probability that the first defect occurs on the 5th inspection. 1. Define a RV 2. What is the distribution of X? 3. Answer the question 30 Section 4.8: The Uniform Distribution The uniform distribution has two parameters, a and b, with a < b. If X is a random variable with the continuous uniform distribution then it is uniformly distributed on the interval (a, b). We write X ~ U(a,b). The pdf is 1 , a xb f ( x) b a 0, otherwise 31 Mean and Variance If X ~ U(a, b). Then the mean is (derive): ab μX 2 and the variance is (derive): (b a ) σ . 12 2 2 X 32 Example 19 When a motorist stops at a red light at a certain intersection, the waiting time for the light to turn green, in seconds, is uniformly distributed on the interval (0, 30). Find the probability that the waiting time is between 10 and 15 seconds. 33 Section 4.5: The Normal Distribution The normal distribution (also called the Gaussian distribution) is by far the most commonly used distribution in statistics. This distribution provides a good model for many, although not all, continuous populations. The normal distribution is continuous rather than discrete. The mean of a normal population may have any value, and the variance may have any positive value. 34 Normal R.V.: pdf, mean, and variance The probability density function of a normal population with mean and variance 2 is given by 1 f ( x) e ( x ) / 2 , x 2 2 2 If X ~ N(, 2), then the mean and variance of X are given by X X2 2 35 68-95-99.7% Rule This figure represents a plot of the normal probability density function with mean and standard deviation . Note that the curve is symmetric about , so that is the median as well as the mean. It is also the case for the normal population. About 68% of the population is in the interval . About 95% of the population is in the interval 2. About 99.7% of the population is in the interval 3. 36 Standard Units •The proportion of a normal population that is within a given number of standard deviations of the mean is the same for any normal population. •For this reason, when dealing with normal populations, we often convert from the units in which the population items were originally measured to standard units. •Standard units tell how many standard deviations an observation is from the population mean. 37 Standard Normal Distribution In general, we convert to standard units by subtracting the mean and dividing by the standard deviation. Thus, if x is an item sampled from a normal population with mean and variance 2, the standard unit equivalent of x is the number z, where z = (x - )/. The number z is sometimes called the “z-score” of x. The z-score is an item sampled from a normal population with mean 0 and standard deviation of 1. This normal distribution is called the standard normal distribution. 38 Example 13 Aluminum sheets used to make beverage cans have thicknesses that are normally distributed with mean 10 and standard deviation 1.3. A particular sheet is 10.8 thousandths of an inch thick. Find the z-score. 39 Example 13 cont. The thickness of a certain sheet has a z-score of -1.7. Find the thickness of the sheet in the original units of thousandths of inches. 40 Finding Areas Under the Normal Curve The proportion of a normal population that lies within a given interval is equal to the area under the normal probability density above that interval. This would suggest integrating the normal pdf, but this integral does not have a closed form solution. So, the areas under the curve are approximated numerically and are available in Table A.2. (See website link for the normal table)This table provides area under the curve for the standard normal density. We can convert any normal into a standard normal so that we can compute areas under the curve. The table gives the area in the left-hand tail of the curve. Other areas can be calculated by subtraction or by using the fact that the total area under the curve is 1. 41 Example 14 Find the area under normal curve to the left of z = 0.47. Find the area under the curve to the right of z = 1.38. 42 Example 15 Find the area under the normal curve between z = 0.71 and z = 1.28. What z-score corresponds to the 75th percentile of a normal curve? 43 Example 15b SAT Math scores are Normally distributed with a mean of 550 and standard deviation 27. a) What is the probability that a student scores less than 610? More than 500? Between 510 and 610? b) Suppose administrators want to use the SAT score to determine if a student is “at risk”. At risk, are students in the lower 7th percentile. What scores should be used as a cut-off for admission to the “at risk” group. c) If you select 10 people at random, what is the probability that exactly 3 people scored 610 or higher? 44 Estimating the Parameters If X1,…,Xn are a random sample from a N(,2) distribution, is estimated with the sample mean and 2 is estimated with the sample variance. As with any sample mean, the uncertainty in X is / n which we replace with s / n , if is unknown. The mean is an unbiased estimator of . 45 Linear Functions of Normal Random Variables Let X ~ N(, 2) and let a ≠ 0 and b be constants. Then aX + b ~ N(a + b, a22). Let X1, X2, …, Xn be independent and normally distributed with means 1, 2,…, n and variances σ12 , σ 22 ,K , σ n2 . Let c1, c2,…, cn be constants, and c1 X1 + c2 X2 +…+ cnXn be a linear combination. Then c1 X1 + c2 X2 +…+ cnXn ~ N(c11 + c2 2 +…+ cnn, ,c12σ12 c22σ 22 L cn2σ n2 ) 46 Example 16 A chemist measures the temperature of a solution in oC. The measurement is denoted C, and is normally distributed with mean 40oC and standard deviation 1oC. oF by the equation F = The measurement is converted to to oF 1.8C + 32. What is the distribution of F? What is the probability that a randomly selected sample is less than 104.5 oF 47 Distributions of Functions of Normals Let X1, X2, …, Xn be independent and normally distributed with mean and variance 2. Then σ2 X ~ N μ, . n Let X and Y be independent, with X ~ N(X, 2 Y ~ N(Y, σY ). Then σ X2 ) and X Y ~ N ( μ X μY , σ σ ) 2 X 2 Y X Y ~ N ( μ X μY , σ σ ) 2 X 2 Y 48 Example 16b A length of board is Normally distributed with mean 20 cm and standard deviation 4 cm. Select 16 boards at random. What is the probability distribution for the average length of these 16 boards? What is the probability that the average board length is more than 21.3 cm? 49 Example 16c A length of board that is Normally distributed with mean 20 cm and standard deviation 4 cm is joined endto-end with another board that is Normally distributed with mean 30 cm and standard deviation 3 cm. Find the probability that the total length of such a component is less than 45 cm. 50 Section 4.7: The Exponential Distribution The exponential distribution is a continuous distribution that is sometimes used to model the time that elapses before an event occurs. Such a time is often called a waiting time. The probability density of the exponential distribution involves a parameter, which is a positive constant λ whose value determines the density function’s location and shape. We write X ~ Exp(λ). 51 Exponential R.V.: pdf, cdf, mean and variance The pdf of an exponential r.v. is e x , x 0 f ( x) . 0, otherwise The cdf of an exponential r.v. is 0, x 0 F ( x) . x 1 e , x 0 The mean of an exponential r.v. is X 1/ . The variance of an exponential r.v. is 1/ . 2 X 2 52 Figure 4.17 Example 17 The lifetime of a component (in months) is exponentially distributed with lambda=0.2. 1. What is the probability that a randomly select component lasts longer that 8 months? 2.Between 4 and 10 months? 3.What is the mean lifetime? 4.What is the standard deviation of the lifetime? 5.What is the 3rd quartile of lifetimes? 54 Example 17b The time between trains is exponentially distributed with mean 5 minutes. A train just departed, what is the probability that the next train arrives in less than a minute? 1. What is the probability that a randomly select component lasts longer that 8 months? 2.Between 4 and 10 months? 3.What is the mean lifetime? 4.What is the standard deviation of the lifetime? 5.What is the 3rd quartile of lifetimes? 55 Lack of Memory Property Interesting, but not responsible for. The exponential distribution has a property known as the lack of memory property: If T ~ Exp(λ), and t and s are positive numbers, then P(T > t + s | T > s) = P(T > t). Suppose the time between trains is exponentially distributed with mean 5 minutes. Given that you have waited for 3 minutes, what is the probability that takes less than a minute from now to arrive? The answer is that it is simply the probability that the train arrives in less than a minute. In other words, it doesn’t matter how long you have waited. 56 Section 4.11: The Central Limit Thereom The Central Limit Theorem Let X1,…,Xn be a random sample from a population with mean and variance 2 . Let X X1 L X n be the sample mean. n Let Sn = X1+…+Xn be the sum of the sample observations. Then if n is sufficiently large, 2 X ~ N , n and S n ~ N (n , n ) approximately. 2 For most populations, if the sample size is greater than 30, the Central Limit Theorem approximation is good. 57 Example 22 The manufacturer of a certain part requires two different machine operations. The time on machine 1 has mean 0.4 hours and standard deviation 0.1 hours. The time on machine 2 has mean 0.45 hours and standard deviation 0.15 hours. The times needed on the machines are independent. Suppose that 65 parts are manufactured. What is the distribution of the total time on machine 1? On machine 2? What is the probability that the total time used by both machines together is between 50 and 55 hours? 58