Normal Probabilities, Standardized Values

Computers provide an easy means for calculating probabilities from a variety of probability distributions. In the "old" days (and "old" is in quotes because this method is still in textbooks), tables of computed values were used. This old technique, it turns out, still has some value, and so in this tutorial we'll go through both techniques.

Using R to compute normal probabilities

Sample Problem: The height of adult humans is "known" to be normally distributed, with a mean of 60" and an SD of 3". Suppose we randomly select an adult human. Find the probability that this person is taller than 64".

Notation: We could define a random variable X to be the height of this randomly selected person. We are then asking to find P(X > 64).

All probability calculations use a few simple facts.
• First, since P(-infty < X < +infty) = 1 (I'm using "infty" to represent infinity) -- in other words, X has to be some real number -- we have P(X < a) + P(X >= a) = 1.
• Hence P(X > a) = 1 - P(X <= a).
• This also implies P(a < X < b) = P(X < b) - P(X < a). If you think of P(a < X < b) as representing the area under a curve between the limits a and b, this should make geometric sense.
• Finally, for continuous distributions, P(X = a) = 0, and so P(X < a) = P(X <= a).

If you type help("pnorm") into R, you will see that R has four functions dealing directly with the normal distribution. The one we're interested in here is pnorm. This is how it works: pnorm(a, mu, sd) = P(X <= a) when X follows a normal distribution with mean mu and SD sd. Hence

P(X > 64) = 1 - P(X < 64) = 1 - pnorm(64, 60, 3) = 1 - 0.9087888 = 0.0912112

Also, we could easily calculate P(57 < X < 63) = pnorm(63, 60, 3) - pnorm(57, 60, 3) = 0.6827.

We can also solve the "inverse" problem. That is, given that the distribution of X is N(60, 3) -- a normal distribution with mean 60 and SD 3 -- find x such that P(X < x) = 0.05. The value of x that solves this equation is called a quantile. R computes these with the qnorm command:

> qnorm(.05, 60, 3)
[1] 55.06544

Note that we've now used three of the four commands given in the help menu: rnorm we used in class to generate random numbers from a normal distribution, pnorm computes probabilities, and qnorm computes quantiles. The fourth command, dnorm, gives you the value of the density curve. Thus, you can plot the normal density function with these commands:

> x <- seq(-4, 4, by = .1)
> plot(x, dnorm(x))
> lines(x, dnorm(x))

The lines command superimposes the smooth curve onto the dots. If you want the plot without the individual points shown, type:

plot(x, dnorm(x), type = "n")
lines(x, dnorm(x))

One last word: R uses the same format for many other probability distributions. So if, for example, you wanted to work with the exponential distribution, you could type

help.search("exponential")

which would point you to the help topic "Exponential". Typing

help("Exponential")

would in turn show you the commands dexp, pexp, qexp, and rexp. If you had typed help("exponential") -- note the lower-case e -- you would have seen:

> help("exponential")
Error in help("exponential") : No documentation for `exponential' in specified packages and libraries:
you could try `help.search("exponential")'
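To pull these pieces together, here is a short sketch that simply recaps this section's calculations in one place, using the same height example (mean 60", SD 3"); the values in the comments are the ones quoted above.

mu <- 60; s <- 3                      # height: mean 60", SD 3"
1 - pnorm(64, mu, s)                  # P(X > 64), about 0.0912
pnorm(63, mu, s) - pnorm(57, mu, s)   # P(57 < X < 63), about 0.6827
qnorm(0.05, mu, s)                    # the 5% quantile, 55.06544
dnorm(60, mu, s)                      # height of the density curve at the mean
rnorm(5, mu, s)                       # five random heights drawn from N(60, 3)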
The "Old School" Method

The old-school method uses tabled values. Almost every statistics and probability textbook has at least one such table in the back. Even with computers, these tables are still very useful.

This technique makes use of the following fact: linear transformations of a normally distributed random variable are still normal. Remember also the rules for the expectation and standard deviation of linear transformations of random variables. If X is a random variable and a and b are constants:

E(aX + b) = aE(X) + b
SD(aX + b) = |a| SD(X)

Putting these three rules together allows us to use one table to find all normal probabilities. The trick is to convert your normal random variable to a "standard" normal random variable. A standard normal random variable is one for which the mean is 0 and the SD is 1.

Suppose X has the distribution N(60, 3). Then define Z = (X - 60)/3. Then

E(Z) = E((X - 60)/3) = (1/3)(E(X) - 60) = (1/3)(60 - 60) = 0
SD(Z) = (1/3) SD(X) = (1/3)(3) = 1

Also note, by algebra, that P(X < a) = P(X - 60 < a - 60) = P((X - 60)/3 < (a - 60)/3) = P(Z < (a - 60)/3).

The upshot is this rule: if X is N(mu, sd), then to calculate P(X < a), just standardize everything and calculate P(Z < (a - mu)/sd).

These standardized variables, Z = (X - mu)/sd, are very useful in other contexts because they have a nice meaning: they measure how many standard units away from the mean the variable X is. Say, for example, that my height in standard units is 1.0. This means I'm 1 SD above average. If the average is 60 and an SD is 3, I can conclude that I'm 63 inches tall.

Why use standard units and not the original units in which a variable was measured? Because standardized variables are "unitless", and sometimes there's an advantage to this. To use a famous example: suppose you scored 60 on the first midterm and 75 on the second. On which midterm did you do better? The naive answer is the second, because your score was higher. But if you're graded on a curve, then what matters is how you did relative to others. So if I tell you that the mean on both midterms was 50, but the SD on midterm 1 was 5 and on midterm 2 it was 20, then the situation is different. On the first midterm you scored (60 - 50)/5 = 2 SDs above average. On the second you scored only (75 - 50)/20 = 1.25 SDs above average. You did better on the first midterm.

This is a very general technique for comparing two variables that are measured in different units. For example, do tall people tend to be heavier? It makes no sense to compare height to weight directly, since the units are different. But if you standardize first, you remove the units and a comparison is possible. This is exactly what the correlation coefficient does, as we shall see later.

Finally, note that you can go "backwards". That is, if Z follows a N(0, 1) distribution, then X = mu + sd*Z follows a N(mu, sd) distribution.

Homework

Find each of the probabilities below, first by using a table in the back of a book, and second by using R.

1. X represents the observed weight of a penny, measured on a particular scale. It follows a normal distribution with mean 1 gram and SD 0.001 grams.
a) P(X > 1.0009)
b) P(X < 1.015)
c) P(0.997 < X < 1.003)
d) Suppose you're told that the probability of seeing a weight less than or equal to x is 99.9%. What is x?

2. The "empirical rule" as stated in class goes something like this: about 68% of your observations will be within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3. Another way to state the empirical rule is to say: most distributions are approximately normal. To see this, let Z be a N(0, 1) random variable and find:
a) P(-1 < Z < 1)
b) P(-2 < Z < 2)
c) P(-3 < Z < 3)
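Separately from the homework, if you want to convince yourself in R that standardizing does not change the probability, and that the "backwards" transformation really does produce a N(mu, sd) variable, here is a minimal sketch using the height example again (mean 60, SD 3):

mu <- 60; s <- 3
a <- 64
pnorm(a, mu, s)        # P(X < 64), computed directly
pnorm((a - mu) / s)    # P(Z < (64 - 60)/3), after standardizing -- the same number

z <- rnorm(10000)      # 10,000 draws from a standard normal, N(0, 1)
x <- mu + s * z        # go "backwards": mu + sd * Z
mean(x)                # should be close to 60
sd(x)                  # should be close to 3

The first two calls print the same value (about 0.909, as computed earlier), which is the standardization rule in action.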