Probability Distributions

A variable (A, B, x, y, etc.) can take any of a specified set of values. When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.

Generally, statisticians use a capital letter to represent a random variable and a lower-case letter to represent one of its values. For example, let $X$ denote a random variable and $x$ one of its values. Then $P(X = x)$, often abbreviated $P(x)$, refers to the probability that the random variable $X$ is equal to the particular value $x$. As an example, $P(X = 1)$ refers to the probability that the random variable $X$ is equal to 1.

- All the probabilities must be between 0 and 1: $0 \le P(X = x) \le 1$.
- The sum of the probabilities of the outcomes must be 1: $\sum P(X = x) = 1$.

Probability Distributions
- Discrete Probability Distributions
  - Binomial
  - Poisson
- Continuous Probability Distributions
  - Normal

All possible outcomes of an experiment comprise a set that is called the sample space. We are interested in some numerical description of the outcome. For example, when we toss a coin 3 times and are interested in the number of heads, a numerical value of 0, 1, 2, or 3 is assigned to each sample point. These values may be thought of as the values assumed by a random variable $X$, which in this case represents the number of heads when a coin is tossed 3 times. So we could write $x_1 = 0$, $x_2 = 1$, $x_3 = 2$ and $x_4 = 3$.

Example 1

Suppose you flip a coin two times. This simple statistical experiment can have four possible outcomes: $S = \{HH, HT, TH, TT\}$. Now, let the variable $X$ represent the number of heads that result from this experiment. The variable $X$ can take on the values 0, 1, or 2. In this example, $X$ is a random variable because its value is determined by the outcome of a statistical experiment.
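As a minimal sketch, the sample space of Example 1 can be enumerated in Python and the number of heads counted in each outcome. The helper name `heads_distribution` is hypothetical, and a fair coin (all outcomes equally likely) is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_flips):
    """Return {number of heads: probability} for n_flips flips of a fair coin.

    Assumption: every outcome in the sample space is equally likely.
    """
    # Sample space, e.g. ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T') for two flips.
    sample_space = list(product("HT", repeat=n_flips))
    # X = number of heads in each outcome.
    counts = Counter(outcome.count("H") for outcome in sample_space)
    total = len(sample_space)
    return {x: Fraction(counts[x], total) for x in sorted(counts)}


dist = heads_distribution(2)
for x, p in dist.items():
    print(f"P(X = {x}) = {p}")
```

Exact fractions are used instead of floats so that the probabilities visibly sum to 1.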
The probability distribution of the experiment in Example 1 is given by:

| Number of heads, $x$ | 0 | 1 | 2 |
|---|---|---|---|
| Probability, $P(X = x)$ | 0.25 | 0.50 | 0.25 |

A cumulative probability refers to the probability that the value of a random variable falls within a specified range. What is the probability that the coin flips would result in one or fewer heads? It is the probability that the experiment results in zero heads plus the probability that it results in one head:

$$P(X \le 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75$$

A cumulative probability distribution is given by:

| Number of heads, $x$ | Probability, $P(X = x)$ | Cumulative probability, $P(X \le x)$ |
|---|---|---|
| 0 | 0.25 | 0.25 |
| 1 | 0.50 | 0.75 |
| 2 | 0.25 | 1.00 |

Example 2

Suppose a die is tossed. What is the probability that the die will land on 5?

When a die is tossed, there are 6 possible outcomes, represented by $S = \{1, 2, 3, 4, 5, 6\}$. Each possible outcome is a value of the random variable $X$, and each outcome is equally likely to occur. Therefore, $P(X = 5) = 1/6$.

Example 3

Suppose we repeat the die-tossing experiment described in Example 2. This time, we ask: what is the probability that the die will land on a number that is smaller than 5?

This problem involves a cumulative probability. The probability that the die will land on a number smaller than 5 is:

$$P(X < 5) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{2}{3}$$

Binomial Distribution

A binomial experiment is a statistical experiment that has the following properties:

- The random experiment consists of $n$ identical trials.
- Each trial can result in one of two outcomes, which we denote by success, S, or failure, F.
- The trials are independent.
- The probability of success is constant from trial to trial. We denote the probability of success by $p$; the probability of failure is $q = 1 - p$.

Examples:
1. The number of heads in 10 tosses of a coin.
2. The number of sixes when 7 dice are tossed.
3. A firm bidding for contracts will either get a contract or not.

A binomial experiment consists of $n$ identical trials with probability of success $p$ in each trial. The probability of $x$ successes in $n$ trials is given by

$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{\,n-x}, \qquad x = 0, 1, 2, \ldots, n$$

The Mean and Variance of X

If $X \sim B(n, p)$, then

- Mean: $\mu = E(X) = np$
- Variance: $\sigma^2 = V(X) = np(1 - p) = npq$
- Standard deviation: $\sigma = \sqrt{npq}$

where $n$ is the total number of trials, $p$ is the probability of success and $q$ is the probability of failure.

Example 4

Given that $X \sim b(12, 0.4)$, find
a) $P(X = 2)$
b) $P(X = 3)$
c) $P(X = 4)$
d) $P(2 \le X < 5)$
e) $E(X)$
f) $\mathrm{Var}(X)$

Answer
a) $P(X = 2) = {}^{12}C_{2}\,(0.4)^{2}(0.6)^{10} = 0.0639$
b) $P(X = 3) = {}^{12}C_{3}\,(0.4)^{3}(0.6)^{9} = 0.1419$
c) $P(X = 4) = {}^{12}C_{4}\,(0.4)^{4}(0.6)^{8} = 0.2128$
d) $P(2 \le X < 5) = P(X = 2) + P(X = 3) + P(X = 4) = 0.0639 + 0.1419 + 0.2128 = 0.4186$
e) $E(X) = np = 12(0.4) = 4.8$
f) $\mathrm{Var}(X) = \sigma^2 = npq = 12(0.4)(0.6) = 2.88$

When the probabilities provided in the tables are in the cumulative form $P(X \le x)$, other probabilities can be obtained by subtraction, for example $P(X = x) = P(X \le x) - P(X \le x - 1)$ and $P(X > x) = 1 - P(X \le x)$.

Exercises

1. A machine produces parts of which 5% are defective. If a random sample of ten parts produced by this machine contains more than one defective part, the machine is shut down for repairs. Find the probability that the machine will be shut down for repairs based on this sampling plan. (Answer: 0.0861)
2. According to the USA Snapshot® "Knowing drug addicts," 45% of Americans know somebody who became addicted to a drug other than alcohol. Assuming this to be true, what is the probability that, out of a group of 30 randomly selected Americans:
   a) exactly 15 know somebody who became addicted to a drug? (Answer: 0.124)
   b) at most 15 know somebody who became addicted to a drug? (Answer: 0.769)
   c) more than 15 know somebody who became addicted to a drug? (Answer: 0.231)
   d) between 10 and 15 (inclusive) know somebody who became addicted to a drug? (Answer: 0.70)
3. Suppose that you take a five-question multiple-choice quiz by guessing. Each question has four possible answers (a, b, c, d), only one of which is correct.
   a) What is the probability that you guess more than half of the answers correctly? (Answer: 0.104)
   b) What is the probability that the first question is answered correctly by guessing? (Answer: 0.25)

Solutions (here binompdf(n, p, x) gives $P(X = x)$ and binomcdf(n, p, x) gives $P(X \le x)$):

1) P(x > 1) = P(x ≥ 2) = 1 - binomcdf(10, .05, 1) ≈ 1 - 0.9139 ≈ 0.0861
2) (a) P(x = 15) = binompdf(30, .45, 15) ≈ 0.12425 ≈ 0.124
   (b) P(x ≤ 15) = binomcdf(30, .45, 15) ≈ 0.76909 ≈ 0.769
   (c) P(x ≥ 16) = 1 - binomcdf(30, .45, 15) ≈ 0.23091 ≈ 0.231
   (d) P(10 ≤ x ≤ 15) = binomcdf(30, .45, 15) - binomcdf(30, .45, 9) ≈ 0.69968 ≈ 0.700
3) (a) P(x ≥ 3) = 1 - binomcdf(5, .25, 2) ≈ 0.10352 ≈ 0.104
   (b) P(first question answered correctly by guessing) = 1/4 = 0.25

Poisson Distribution

A random variable $X$ has a Poisson distribution, and is referred to as a Poisson random variable, if and only if its probability distribution is given by

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, \ldots$$

Here $\lambda$ is the long-run mean number of events for the specific time or space dimension of interest. The dimension can be a length, an area, a place, a time interval, or a combination of them. A random variable $X$ having a Poisson distribution can also be written as $X \sim Po(\lambda)$, with $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.

Examples:
- The number of cars passing a toll booth in one hour.
- The number of defects in a square meter of fabric.
- The number of network errors experienced in a day.

Example 6

Given that $X \sim Po(4.8)$, find
a) $P(X = 0)$
b) $P(X = 9)$
c) $P(X \ge 1)$

Answer
a) $P(X = 0) = \dfrac{e^{-4.8}\,4.8^{0}}{0!} = 0.0082$
b) $P(X = 9) = \dfrac{e^{-4.8}\,4.8^{9}}{9!} = 0.0307$
c) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.0082 = 0.9918$

Example 7

Suppose that the number of errors in a piece of software has a Poisson distribution with parameter $\lambda = 3$. Find
a) the probability that a piece of software has no errors.
b) the probability that there are three or more errors in a piece of software.
c) the mean and variance of the number of errors.

Answer
a) $P(X = 0) = \dfrac{e^{-3}\,3^{0}}{0!} = e^{-3} = 0.050$
b) $P(X \ge 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2) = 1 - \left(\dfrac{e^{-3}3^{0}}{0!} + \dfrac{e^{-3}3^{1}}{1!} + \dfrac{e^{-3}3^{2}}{2!}\right) = 1 - e^{-3}\left(1 + 3 + \dfrac{9}{2}\right) = 1 - 0.423 = 0.577$
c) $E(X) = \mathrm{Var}(X) = \lambda = 3$

Example 8

[Problem statement missing from the source.] Use $\lambda = 2.4 \times 4 = 9.6$.

Exercise

The number of industrial injuries per working week in a particular factory is known to follow a Poisson distribution with mean 0.5. Find the probability that
(a) in a particular week there will be:
   (i) fewer than 2 accidents (Answer: 0.9098)
   (ii) more than 2 accidents (Answer: 0.0144)
(b) in a three-week period there will be no accidents. (Answer: 0.223)

Exercise

On average, 1 computer in 800 crashes during a severe thunderstorm. A certain company had 4,000 working computers when the area was hit by a severe thunderstorm.
a) Compute the expected value and variance of the number of crashed computers. (Answer: 5, 4.994)
b) Compute the probability that fewer than 10 computers crashed. (Answer: 0.968)
c) Compute the probability that exactly 10 computers crashed. (Answer: 0.018)

Let $X$ be the number of crashed computers. This is the number of "successes" (crashed computers) out of 4,000 "trials" (computers), with probability of success 1/800. Thus, $X$ has a binomial distribution with parameters $n = 4000$ and $p = 1/800$, so $E(X) = np = 5$ and $\mathrm{Var}(X) = npq = 4.994$. Because $n$ is large and $p$ is small, the probabilities in (b) and (c) can be approximated with a Poisson distribution with $\lambda = np = 5$.

Continuous Probability Distributions

A continuous variable involves a measurement of something, such as the height of a person, the weight of a newborn baby, or the length of time a car battery lasts. The probabilities are represented by areas under continuous curves (probability densities, or continuous distributions). Probability densities are characterized by the fact that the area under the curve between any two values $a$ and $b$ gives the probability that a random variable having this continuous distribution will take on a value in the interval from $a$ to $b$. The total area under the curve must equal one.

Normal Distribution

A continuous random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma^2$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$, if its probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$

If $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $V(X) = \sigma^2$.

The normal curve is:
- Bell-shaped.
- Symmetrical.
- The mean, median and mode are equal.
- Location is determined by the mean, $\mu$.
- Spread is determined by the standard deviation, $\sigma$.

Rules of Data Dispersion

The standard normal curve has mean 0 and standard deviation 1. If a dataset follows a normal distribution, then by the empirical rule:
1. 68% of the observations will lie within one standard deviation of the mean, $\mu \pm \sigma$.
2. 95% of the observations will lie within two standard deviations of the mean, $\mu \pm 2\sigma$.
3. 99.7% of the observations will lie within three standard deviations of the mean, $\mu \pm 3\sigma$.

By varying the parameters $\mu$ and $\sigma$, we obtain infinitely many different normal distributions. Standardizing converts any normal distribution to the standard normal distribution.

Standard normal distribution

The normal distribution with parameters $\mu = 0$ and $\sigma^2 = 1$ is called the standard normal distribution. A random variable that has a standard normal distribution is called a standard normal random variable and is denoted by $Z \sim N(0, 1)$.

If $X$ is a normal random variable with $E(X) = \mu$ and $V(X) = \sigma^2$, the standard normal random variable is defined as

$$Z = \frac{X - \mu}{\sigma}, \qquad E(Z) = 0, \quad V(Z) = 1$$

- The total area under the standard normal curve is 1.
- The standard normal curve is symmetric about 0.
- Almost all the area under the standard normal curve lies between -3 and 3.

Example 10

Determine the probability or area for the portions of the normal distribution described.
a) $P(0 \le Z \le 0.45)$
b) $P(-2.02 \le Z \le 0)$
c) $P(Z \le 0.87)$
d) $P(2.1 \le Z \le 3.11)$
e) $P(1.5 \le Z \le 2.55)$

Answer
a) $P(0 \le Z \le 0.45) = 0.1736$
b) $P(-2.02 \le Z \le 0) = 0.47831$
c) $P(Z \le 0.87) = 0.5 + 0.3078 = 0.8078$

Example 11

Determine $z$ such that
a) $P(Z > z) = 0.25$
b) $P(Z > z) = 0.36$
c) $P(Z < z) = 0.983$
d) $P(Z < z) = 0.89$

Answer
a) $P(Z > 0.6745) = 0.25$
b) $P(Z > 0.3585) = 0.36$
c) $P(Z < 2.1201) = 0.983$
d) $P(Z < 1.2265) = 0.89$

Example 12

Suppose $X$ has a normal distribution, $X \sim N(25, 25)$, so that $\mu = 25$ and $\sigma = 5$. Find
a) $P(24 < X \le 35)$
b) $P(X \ge 20)$

Answer
a) $P(24 < X \le 35) = P\left(\dfrac{24 - 25}{5} < Z \le \dfrac{35 - 25}{5}\right) = P(-0.2 < Z \le 2) = 0.0793 + 0.4772 = 0.5565$
b) $P(X \ge 20) = P\left(Z \ge \dfrac{20 - 25}{5}\right) = P(Z \ge -1) = 0.5 + 0.3413 = 0.8413$

When the number of observations or trials $n$ in a binomial experiment is relatively large, the normal probability distribution can be used to approximate binomial probabilities.
A convenient rule is that such an approximation is acceptable when $n \ge 30$ and both $np \ge 5$ and $nq \ge 5$.

Definition

Given a random variable $X \sim b(n, p)$, if $n \ge 30$ and both $np \ge 5$ and $nq \ge 5$, then approximately $X \sim N(np, npq)$, with

$$Z = \frac{X - np}{\sqrt{npq}}$$

A continuity correction is needed when a continuous curve is used to approximate a discrete probability distribution. The value 0.5 is added or subtracted as a continuity correction factor (c.c) according to the form of the probability statement:

a) $P(X = x) \xrightarrow{\text{c.c}} P(x - 0.5 < X < x + 0.5)$
b) $P(X \le x) \xrightarrow{\text{c.c}} P(X < x + 0.5)$
c) $P(X < x) \xrightarrow{\text{c.c}} P(X < x - 0.5)$
d) $P(X \ge x) \xrightarrow{\text{c.c}} P(X > x - 0.5)$
e) $P(X > x) \xrightarrow{\text{c.c}} P(X > x + 0.5)$

Example

In a certain country, 45% of registered voters are male. If 300 registered voters from that country are selected at random, find the probability that at least 155 are male.

Solutions

$X$ is the number of male voters, so $X \sim b(300, 0.45)$. Since $n = 300 \ge 30$, $np = 300(0.45) = 135 \ge 5$ and $nq = 300(0.55) = 165 \ge 5$, the normal approximation applies with $npq = 300(0.45)(0.55) = 74.25$:

$$P(X \ge 155) \xrightarrow{\text{c.c}} P(X > 155 - 0.5) = P(X > 154.5) = P\left(Z > \frac{154.5 - 135}{\sqrt{74.25}}\right) = P(Z > 2.26) = 0.01191$$

Exercise

Suppose that 5% of the population over 70 years old has disease A. Suppose a random sample of 9600 people over 70 is taken. What is the probability that fewer than 500 of them have disease A?

When the mean of a Poisson distribution is relatively large, the normal probability distribution can be used to approximate Poisson probabilities. A convenient rule is that such an approximation is acceptable when $\lambda > 10$.

Definition

Given a random variable $X \sim Po(\lambda)$, if $\lambda > 10$, then approximately $X \sim N(\lambda, \lambda)$, with

$$Z = \frac{X - \lambda}{\sqrt{\lambda}}$$

Example

A grocery store has an ATM inside. An average of 5 customers per hour come to use the machine. What is the probability that more than 30 customers come to use the machine between 8.00 am and 5.00 pm?

Solutions

$X$ is the number of customers who come to use the ATM in 9 hours, so $X \sim Po(45)$. Since $\lambda = 45 > 10$, approximately $X \sim N(45, 45)$:

$$P(X > 30) \xrightarrow{\text{c.c}} P(X > 30 + 0.5) = P(X > 30.5) = P\left(Z > \frac{30.5 - 45}{\sqrt{45}}\right) = P(Z > -2.16) = 0.98461$$
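The two normal-approximation examples above (the voters and the ATM customers) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical helpers, and the standard normal CDF is computed from `math.erf` rather than read from a table.

```python
import math


def phi(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_binomial_at_least(n, p, x):
    """Approximate P(X >= x) for X ~ b(n, p), using P(X > x - 0.5)."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return 1.0 - phi((x - 0.5 - mean) / sd)


def exact_binomial_at_least(n, p, x):
    """Exact P(X >= x) for X ~ b(n, p), summing the binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x, n + 1))


def normal_approx_poisson_more_than(lam, x):
    """Approximate P(X > x) for X ~ Po(lam), using P(X > x + 0.5)."""
    return 1.0 - phi((x + 0.5 - lam) / math.sqrt(lam))


# Voter example: X ~ b(300, 0.45), P(X >= 155).
voters_approx = normal_approx_binomial_at_least(300, 0.45, 155)
voters_exact = exact_binomial_at_least(300, 0.45, 155)

# ATM example: X ~ Po(45), P(X > 30).
atm_approx = normal_approx_poisson_more_than(45, 30)

print(f"voters: approx {voters_approx:.4f}, exact {voters_exact:.4f}")
print(f"ATM:    approx {atm_approx:.4f}")
```

The approximations come out close to the table-based answers 0.01191 and 0.98461. Small differences are expected because the tables round $z$ to two decimal places, whereas the code uses the unrounded value.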