The Normal Distribution

Alan T. Arnholt
Department of Mathematical Sciences
Appalachian State University
Spring 2006 R Notes
Copyright © 2006 Alan T. Arnholt

Outline

Continuous Random Variables
    Overview of Continuous Random Variables
Normal Distribution
    Overview of Normal Distribution
The R Script

Continuous Random Variable

Recall that a discrete random variable can assume only a countable number of outcomes. When the set of possible values of a random variable X is an entire interval of numbers, we say that X is a continuous random variable. For example, if we randomly select a 12 ounce can of beer and measure its actual fluid contents X, then X is a continuous random variable because any value for X between 0 and the capacity of the beer can is possible.

Properties of Continuous Random Variables

Continuous Probability Density Functions' Properties

The function f(x) is a pdf for the continuous random variable X, defined over the set of real numbers ℝ, if

1. $f(x) \ge 0$ for $-\infty < x < \infty$.
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$. (The total area under the probability density curve is 1.00, which corresponds to 100%.)
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$. (The area under the density curve between a and b.)

Graphical Illustration of Continuous Distribution

[Figure: three density panels with shaded areas $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$, $P(X \le b) = \int_{-\infty}^{b} f(x)\,dx$, and $P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$.]
Figure: Graphical illustration of P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a).

Normal Distribution

• The normal or Gaussian distribution is probably the most important distribution in statistical applications, because many numerical populations have distributions that can be approximated with the normal distribution.
• Examples of approximately normal distributions include physical characteristics such as the heights and weights of members of a particular species. Further, certain statistics, such as the mean, follow an approximate normal distribution when certain conditions are satisfied.

Normal PDF

Normal Distribution X ∼ N(µ, σ)

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty, \]

where −∞ < µ < ∞ and 0 < σ < ∞.

E[X] = µ
Var[X] = σ²

Three Different Normal Distributions

Figure: Three normal distributions each with an increasing σ value as read from left to right.
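The pdf properties and the stated mean and variance can be checked numerically. The following is only a minimal sketch: the parameter values µ = 2 and σ = 1.5, the interval endpoints, and the helper name normal_pdf are assumptions chosen for illustration and do not come from the slides.

```r
# Hypothetical parameter values, chosen only for illustration.
mu <- 2
sigma <- 1.5

# Normal pdf written out from the formula on the Normal PDF slide.
normal_pdf <- function(x) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2)
}

# The hand-written density should agree with R's built-in dnorm().
x <- c(-1, 0.5, 2, 4)
max_diff <- max(abs(normal_pdf(x) - dnorm(x, mean = mu, sd = sigma)))

# Property 2: the total area under the density is 1.
total_area <- integrate(normal_pdf, -Inf, Inf)$value

# Property 3: P(a <= X <= b) is the area between a and b,
# which should match the difference of two cdf values.
a <- 1
b <- 3
area_ab <- integrate(normal_pdf, a, b)$value
cdf_diff <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

# Mean and variance recovered by numerical integration.
mean_x <- integrate(function(x) x * normal_pdf(x), -Inf, Inf)$value
var_x <- integrate(function(x) (x - mu)^2 * normal_pdf(x), -Inf, Inf)$value

print(c(total_area = total_area, area_ab = area_ab, cdf_diff = cdf_diff,
        mean_x = mean_x, var_x = var_x))
```

Up to numerical integration error, total_area should be close to 1, area_ab close to cdf_diff, mean_x close to µ, and var_x close to σ².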
Standard Normal Distribution

A normal random variable with µ = 0 and σ = 1, often denoted Z, is called a standard normal random variable. The cdf of X ∼ N(µ, σ) can be written in terms of the standard normal distribution, as in (2), by first standardizing X with the change of variable formula in (1).

\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \tag{1} \]

\[ F(x) = P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-z^2/2}\,dz \tag{2} \]

Graphical representation for computing P(a ≤ X ≤ b)

[Figure: for X ∼ N(µ, σ), three panels on the x scale show the areas P(a ≤ X ≤ b), P(X ≤ b), and P(X ≤ a); each is paired with the equivalent area on the z scale, namely $P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right)$, $P\left(Z \le \frac{b-\mu}{\sigma}\right)$, and $P\left(Z \le \frac{a-\mu}{\sigma}\right)$.]

Example

Scores on a particular standardized test follow a normal distribution with a mean of 100 and a standard deviation of 10.

(a) What is the probability that a randomly selected individual will score between 90 and 115?
(b) What score does one need to be in the top 10%?
(c) Find the constant c such that P(105 ≤ X ≤ c) = 0.10.
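Before working the example, the standardization in (1) and (2) can be checked numerically. This is a minimal sketch only: the values µ = 50, σ = 5, and the points in x are hypothetical, and the variable names are chosen for illustration.

```r
# Hypothetical values, chosen only for illustration.
mu <- 50
sigma <- 5
x <- c(40, 47.5, 50, 58)

# Change of variable (1): convert x values to z values.
z <- (x - mu) / sigma

# cdf of X evaluated directly on the original scale ...
p_direct <- pnorm(x, mean = mu, sd = sigma)

# ... and via the standard normal cdf, as in (2).
p_standardized <- pnorm(z)

print(cbind(x, z, p_direct, p_standardized))
print(all.equal(p_direct, p_standardized))
```

The two probability columns should agree, which is why a single standard normal table (or pnorm() with its default arguments) suffices for any N(µ, σ).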
Solution

To find P(90 ≤ X ≤ 115), we first draw a picture representing the desired area, such as the one in Figure 4. Note that finding the area between 90 and 115 is equivalent to finding the area to the left of 115 and subtracting from it the area to the left of 90. In other words, P(90 ≤ X ≤ 115) = P(X ≤ 115) − P(X ≤ 90). To find P(X ≤ 115) and P(X ≤ 90), we standardize using (1). That is,

\[ P(X \le 115) = P\left(Z \le \frac{115 - 100}{10}\right) = P(Z \le 1.5) \]

and

\[ P(X \le 90) = P\left(Z \le \frac{90 - 100}{10}\right) = P(Z \le -1.0). \]

Using pnorm()

The R function pnorm() finds P(X ≤ x) given X ∼ N(µ, σ).

1. The default arguments for pnorm() are pnorm(q, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE).
2. For more information, read the help file for pnorm() by typing ?pnorm at the R prompt.
3. Note that the default values for pnorm() are µ = 0 and σ = 1.

Solution Continued

Using the R function pnorm(), we find the areas to the left of 1.5 and −1.0 to be 0.9332 and 0.1587, respectively. Consequently,

P(90 ≤ X ≤ 115) = P(−1.0 ≤ Z ≤ 1.5) = P(Z ≤ 1.5) − P(Z ≤ −1.0) = 0.9332 − 0.1587 = 0.7745.
    > pnorm(1.5, mean = 0, sd = 1)
    [1] 0.9331928
    > pnorm(-1, mean = 0, sd = 1)
    [1] 0.1586553
    > pnorm(1.5) - pnorm(-1)
    [1] 0.7745375
    > pnorm(115, 100, 10) - pnorm(90, 100, 10)
    [1] 0.7745375

Graphical representation for finding P(90 ≤ X ≤ 115) given X ∼ N(100, 10)

[Figure: density of X ∼ Normal(µ = 100, σ = 10) with the region between 90 and 115 shaded. The area between 90 and 115 is 0.7745.]
Figure: Graphical representation for finding P(90 ≤ X ≤ 115) given X ∼ N(100, 10).

[Figure: for X ∼ N(100, 10), three panels on the x scale show P(90 ≤ X ≤ 115), P(X ≤ 115), and P(X ≤ 90); each is paired with the equivalent area on the z scale, namely $P\left(\frac{90-100}{10} \le Z \le \frac{115-100}{10}\right) = P(-1 \le Z \le 1.5)$, $P\left(Z \le \frac{115-100}{10}\right) = P(Z \le 1.5)$, and $P\left(Z \le \frac{90-100}{10}\right) = P(Z \le -1)$.]

Solution Part (b)

Finding the value c such that 90% of the area is to its left is equivalent to finding the value c such that 10% of the area is to its right. That is, finding the value c that satisfies P(X ≤ c) = 0.90 is equivalent to finding the value c such that P(X ≥ c) = 0.10. We solve

\[ P(X \le c) = P\left(Z = \frac{X - 100}{10} \le \frac{c - 100}{10}\right) = 0.90 \]

for c. Using the R function qnorm(), we find the Z value (1.2816) such that 90% of the area in the distribution is to the left of that value. Consequently, to be in the top 10%, a score must be at least 1.2816 standard deviations above the mean. Set

\[ \frac{c - 100}{10} = 1.2816 \]

and solve for c ⇒ c = 112.816. To be in the top 10%, one needs to score 112.816 or higher.
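Part (b) can also be phrased directly in terms of the upper tail. The sketch below is not taken from the slides; it assumes only the lower.tail argument listed among the pnorm() defaults, which qnorm() also accepts, and the variable names are illustrative.

```r
mu <- 100
sigma <- 10

# Cutoff with 90% of the area to its left.
c_lower <- qnorm(0.90, mean = mu, sd = sigma)

# Same cutoff, requested as the point with 10% of the area to its right.
c_upper <- qnorm(0.10, mean = mu, sd = sigma, lower.tail = FALSE)

# Check: the area to the right of the cutoff should be 0.10.
upper_area <- pnorm(c_lower, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(c_lower = c_lower, c_upper = c_upper, upper_area = upper_area))
```

Both calls should return the same cutoff, since P(X ≤ c) = 0.90 and P(X ≥ c) = 0.10 describe the same value of c.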
Using qnorm()

The function qnorm() finds a quantile of a normal distribution. It has the same default values as pnorm().

    > qnorm(.90)
    [1] 1.281552
    > qnorm(.90, 100, 10)
    [1] 112.8155

• pnorm() finds P(X ≤ x), the area to the left of x (a number between 0 and 1).
• qnorm() finds the value c such that P(X ≤ c) equals a given area (a number between 0 and 1).
• The first argument of pnorm() is x, while the first argument of qnorm() is the area to the left of c. pnorm() returns the cdf evaluated at x, while qnorm() returns the inverse cdf evaluated at the given area.

Solution Part (c)

P(105 ≤ X ≤ c) = 0.10 is the same as

\[ P(X \le c) = 0.10 + P(X \le 105) = 0.10 + P\left(Z \le \frac{105 - 100}{10}\right). \]

Since

\[ P\left(Z \le \frac{105 - 100}{10}\right) = P(Z \le 0.5) = 0.6915, \]

it follows that P(X ≤ c) = 0.7915. The value c is found by solving

\[ P(X \le c) = P\left(Z = \frac{X - 100}{10} \le \frac{c - 100}{10}\right) = 0.7915, \]

which gives

\[ \frac{c - 100}{10} = 0.8116 \;\Rightarrow\; c = 108.116. \]

Note that a Z value of 0.8116 has 79.15% of its area to the left of that value.

Solution Part (c) with R

The solution to P(105 ≤ X ≤ c) = 0.10 when X ∼ N(100, 10) is the same as the solution to P(X ≤ c) = 0.10 + P(X ≤ 105), which can be computed as

    > qnorm(.10 + pnorm(105, 100, 10), 100, 10)
    [1] 108.1151

The small difference from the hand-computed 108.116 comes from rounding the intermediate values 0.6915 and 0.8116.

Link to the R Script

• Go to my web page: Script for Normal Distribution
• Homework: problems 3.53 - 3.63
• See me if you need help!
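The linked script is not reproduced here. As a rough cross-check of part (c), the following sketch solves P(105 ≤ X ≤ c) = 0.10 numerically with uniroot() and compares the root with the closed-form qnorm() answer. The helper name excess_area and the search interval [105, 150] are assumptions made for illustration.

```r
mu <- 100
sigma <- 10

# Closed-form answer, as in "Solution Part (c) with R".
c_closed <- qnorm(0.10 + pnorm(105, mu, sigma), mu, sigma)

# Hypothetical helper: P(105 <= X <= upper) minus the target area 0.10.
excess_area <- function(upper) {
  pnorm(upper, mu, sigma) - pnorm(105, mu, sigma) - 0.10
}

# The helper is negative at 105 and positive at 150, so a root lies between.
c_root <- uniroot(excess_area, lower = 105, upper = 150, tol = 1e-9)$root

print(c(c_closed = c_closed, c_root = c_root))
```

The two values should agree closely, which confirms that adding 0.10 to P(X ≤ 105) and inverting the cdf answers the original interval question.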