Probability 3.

Two sorts of random variables are of interest:

DISCRETE: the number of outcomes is countable.
CONTINUOUS: the number of outcomes is infinite (not countable).

Random variables are often described with probability distribution functions. These are graphs, tables, or formulas that allow probabilities to be computed.

A few common discrete probability distributions are:

Uniform: $P(Y=y) = 1/(\text{number of outcomes})$ (all outcomes are equally likely).

Binomial: $P(Y=y) = {}_nC_y \, \pi^{y} (1-\pi)^{n-y}$, where:
${}_nC_y$ is the number of combinations of n things taken y at a time,
n is the number of trials,
y is the number of successes,
$\pi$ is the probability of succeeding in one trial,
and in each trial the only two outcomes are success and failure (coded 1 and 0).

Poisson: $P(Y=y) = \dfrac{\mu^{y} e^{-\mu}}{y!}$, $y = 0, 1, 2, \ldots$ (for example, the number of people waiting in line at a teller), where $\mu$ is the population mean of Y.

These are a few of many discrete distributions and are used when there are only a few possible outcomes: number of defects on a circuit board, number of tumors in a mouse, pregnant or not, dead or alive, number of accidents at an intersection, and so on.

A useful continuous probability distribution is the Normal. There are several others which we will use more (Student's t, F, $\chi^2$), but they are all based upon the Normal.

The Normal (N) is the 'bell-shaped curve'. If you identify the mean ($\mu$) of y (the random variable) along a number line, then N will be highest at $\mu$ and symmetric about $\mu$ (the mean, median, and mode are all the same).

The height of N at any value of y can be computed from the following function:

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$

where $\mu$ is the population mean of y and $\sigma$ is the standard deviation. (In this formula $\pi$ is the constant 3.14159..., not the binomial success probability.) So, N can have any mean and variance.

A probability cannot be computed for a single value of y, but only for an interval (say y between a and b), by integrating f(y) over the interval (a, b):

$$P(a < y < b) = \int_a^b f(y)\,dy$$

Probabilities, then, are defined as the area under the N curve between a and b.
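The formulas above can be tried out directly. What follows is a minimal Python sketch, not part of the original notes: the function names are made up for illustration, and the example inputs (4 trials, a success probability of 0.5, a Poisson mean of 2) are arbitrary values chosen only to exercise the formulas. The area under the Normal curve is obtained from the error function `math.erf` rather than by numerical integration.

```python
import math


def binomial_pmf(y, n, pi):
    """P(Y = y) for n trials with success probability pi in each trial."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with population mean mu."""
    return mu**y * math.exp(-mu) / math.factorial(y)


def normal_pdf(y, mu, sigma):
    """Height of the Normal curve at y (a density, not a probability)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(y, mu, sigma):
    """Area under the Normal curve to the left of y."""
    return 0.5 * (1 + math.erf((y - mu) / (sigma * math.sqrt(2))))


def normal_interval_prob(a, b, mu, sigma):
    """P(a < Y < b): area under the Normal curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


if __name__ == "__main__":
    # Illustrative inputs only; none of these numbers come from the notes.
    print(binomial_pmf(2, 4, 0.5))                        # 0.375
    print(round(poisson_pmf(0, 2.0), 4))                  # 0.1353
    print(round(normal_interval_prob(-1, 1, 0, 1), 4))    # 0.6827
```

The last line shows the familiar result that roughly 68% of a Normal population lies within one standard deviation of the mean.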
For example: the 15-second pulse rate has a mean of 20 and a standard deviation of 2, and it is Normally distributed. What is the probability of selecting someone at random with a pulse rate greater than 22? (Or, what percent of people have a pulse rate greater than 22?) Easy: integrate f(y) from 22 to $\infty$! It's easy with a computer, but by hand it is a bit awkward.

The process for computing this probability is to change from the pulse rate distribution to the standard Normal and look it up in a table. The standard Normal is called Z and has a mean of 0 and a standard deviation of 1. In shorthand notation, Z is N(0, 1) while pulse rate is N(20, 2), where the second number is the standard deviation. The solution to the problem above involves 5 steps.

Pulse rate probability example: 5 steps:

1. Draw a picture of the problem: [Figure: Normal curve centered at 20, with the area of interest to the right of 22.]
2. Express the problem in the following form (y represents pulse rate): $P(22 < y) = p$
3. Subtract the mean from each component inside the ( ). Where it's a number, subtract the number; where it's a symbol, subtract the symbol.
4. Divide each component by the standard deviation, likewise:

$$P\left(\frac{22-20}{2} < \frac{y-\mu}{\sigma}\right) = P(1 < z) = p$$

5. Look up the value for z in the table in Appendix 2: $P(1 < z) = .1587$

It isn't always that easy... practice the following (answers in parentheses):

1. P( y < 18 ) = ? (.1587)
2. P( y < 22 ) = ? (.8413)
3. P( 18 < y < 22 ) = ? (.6826)
4. P( y < 20 ) = ? (.5000)

Pulse rate probability example: another:

Another interesting aspect of the problem is this: if pulse rate is N(20, 2), above what value will 1/3 (33%) of people fall?

1. Draw a picture: [Figure: Normal curve centered at 20, with an area of interest of .33 to the right of the unknown value ?.]
2. Express in the form (we need to solve for ?): $P(y > ?) = .33$
3 & 4. Subtract the mean and divide by the standard deviation, which gives:

$$P\left(\frac{y-\mu}{\sigma} > \frac{?-20}{2}\right) = P\left(z > \frac{?-20}{2}\right) = .33$$

5. From Appendix table 2, we also know that $P(z > .44) = .33$, so:

$$P\left(z > \frac{?-20}{2}\right) = .33 = P(z > .44)$$

which implies that

$$\frac{?-20}{2} = .44, \quad \text{so} \quad ? = 2(.44) + 20 = 20.88$$

So, 33% of people have a pulse rate greater than 20.88.

There is much more marvelous information about N in the text and the notes. I recommend you look over these resources, work some problems, and call if you have a question.

All this raises the question: why so much interest in the Normal distribution? That leads to the next discussion, the sampling distribution for the mean and the Central Limit Theorem.

Consider drawing repeated samples from the same population. All the samples are the same size and the mean is computed for each. Obviously, the means will not all be the same.

END Probability 3.
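As a supplement to the pulse rate examples in this section, the table lookups can be checked by computer. This is a minimal sketch, not part of the original notes, and it assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The variable names are chosen here for illustration. The only inputs are the mean of 20 and standard deviation of 2 stated in the example.

```python
from statistics import NormalDist

# Pulse rate model from the example: mean 20, standard deviation 2.
pulse = NormalDist(mu=20, sigma=2)

# Practice problems: areas under the curve.
p_above_22 = 1 - pulse.cdf(22)              # P(y > 22)
p_below_18 = pulse.cdf(18)                  # P(y < 18)
p_between = pulse.cdf(22) - pulse.cdf(18)   # P(18 < y < 22)
p_below_20 = pulse.cdf(20)                  # P(y < 20)

# Second example: the value above which 33% of people fall is the
# point with 67% of the area to its left.
cutoff = pulse.inv_cdf(1 - 0.33)

print(round(p_above_22, 4))   # 0.1587
print(round(p_below_18, 4))   # 0.1587
print(round(p_between, 4))    # 0.6827 (the table gives .6826 because of rounding)
print(round(p_below_20, 4))   # 0.5
print(round(cutoff, 2))       # 20.88
```

The cutoff agrees with the hand calculation because the table value z = .44 is the two-decimal rounding of the exact standard Normal value used by `inv_cdf`.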