Probability Distributions, Cumulative Distributions and Quantiles in R

H. D. Vinod ∗

April 8, 2014

∗ Professor of Economics, Fordham University, Bronx, New York, USA 10458. E-mail: vinod@fordham.edu.

Abstract

We give a very elementary description of probability density functions (pdf), cumulative distribution functions and quantiles by using R graphics tools.

1 Probability Density Functions

The following R code clears R's memory, replaces the prompt with a blank, and prints the date. It is a good way to start an R session.

    rm(list=ls())   # remove all objects from the workspace
    options(prompt = " ", continue = " ", width = 68,
      useFancyQuotes = FALSE)   # blank prompts, 68-column output
    print(date())   # print the current date and time

Now we are ready to begin with a mathematical description of the standard Normal density. It is a continuous probability density function (pdf) defined by

$$ f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}. \tag{1} $$

The standard Normal density, written z ∼ N(0,1), is one of the most important densities in Statistics. It is related to several other densities. For example, a Chi-square variable with n degrees of freedom (df) is created by adding the squares of n independently distributed z's. Even though z can be negative, its square cannot be. Formally, given that z_1, z_2, ..., z_n are independent of each other and each is distributed as in eq. (1), we construct the Chi-square variable with n df as follows.

$$ \chi^2(n) = \sum_{i=1}^{n} z_i^2. \tag{2} $$

Gosset, who worked for the Guinness Brewery in Dublin, invented the Student's t distribution in 1908. Perhaps because he did not want his employers to know about his extracurricular interest in mathematical statistics, he published under the pseudonym 'Student'. The t variable is the ratio of z to a denominator defined as the square root of an independent Chi-square divided by its degrees of freedom.

$$ t = \frac{z}{\sqrt{\chi^2(n)/n}}. \tag{3} $$

Another way to write the t density with n degrees of freedom is

$$ f(t) = \frac{(1 + t^2/n)^{-(n+1)/2}}{\sqrt{n}\, B(1/2,\, n/2)}, \qquad \text{where } B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \tag{4} $$

denotes the beta function, based on the gamma function defined as

$$ \Gamma(a) = \int_0^{\infty} e^{-u} u^{a-1} \, du. \tag{5} $$

The gamma function is a generalization of the factorial function. R has a function called 'gamma' to evaluate the integral in eq. (5). In fact, when a is a positive integer, "gamma(a)" equals "factorial(a-1)."

Now we turn to creating plots for the Normal and t densities.

    z=seq(-4,4,by=.1)   # grid of points on the horizontal axis
    y=dnorm(z)          # standard Normal density on the grid
    plot(z,y,type="l", main="Standard Normal and t (df=3) Densities")
    y2=dt(z,df=3)       # Student's t density with 3 df
    lines(z,y2,type="l",col="red")   # add the t density in red

The output of the above code is in Figure 1, with the standard Normal density in black and the t density in red. One can see that Student's t density is very similar to the standard Normal density, except that the t density has an additional parameter called degrees of freedom (df). Each new choice of df produces a new t density. Our figure has df=3. If df=100 or larger, the t density is almost the same as the standard Normal.

Exercise: Note that all code shown above is ready to be copied and pasted into R. Change df=3 to various values (e.g. df=30) and see the two curves together.

Figure 1: Standard Normal Density and Student's t Density. [Plot titled "Standard Normal and t (df=3) Densities"; horizontal axis z from -4 to 4, vertical axis y from 0.0 to 0.4.] Note: t density in red color has fatter tails.

2 Cumulative Density and Quantiles

In addition to the probability density, we are also interested in cumulative probabilities. These are readily computed as follows. Note that 'pnorm' in R computes the cumulative probabilities and 'qnorm' computes the quantiles.

    z=seq(-4,4,by=.1)
    y=pnorm(z)          # cumulative probabilities of the standard Normal
    plot(z,y,type="l", main="Standard Normal and Cumulative Densities")
    y2=dnorm(z)         # density, added in red for comparison
    lines(z,y2,type="l",col="red")
    qnorm(seq(0.2,1,by=.2))   # quantiles at 0.2, 0.4, 0.6, 0.8, 1

The last line of the above code computes the quantiles of the Normal for the cumulative probabilities 0.2, 0.4, 0.6, 0.8 and 1. The final quantile given by R is 'Inf' or infinity, because a cumulative probability of 1 is reached only in the limit.
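The link between the density, the cumulative probability and the quantile can also be checked numerically. The following is a minimal sketch added for illustration, not part of the original code: it uses base R's `integrate` to recover `pnorm` as the area under `dnorm`, and confirms that `qnorm` undoes `pnorm`. The names `q`, `area` and `p`, and the value 1.5, are arbitrary choices made here.

```r
# Cumulative probability at q is the area under the density to the left of q.
q <- 1.5                       # arbitrary illustrative point
area <- integrate(dnorm, lower = -Inf, upper = q)$value
print(all.equal(area, pnorm(q), tolerance = 1e-6))

# qnorm undoes pnorm: qnorm(p) is the point whose cumulative probability is p.
p <- seq(0.2, 0.8, by = 0.2)
print(all.equal(pnorm(qnorm(p)), p))

# A cumulative probability of 1 is reached only in the limit.
print(qnorm(1))                # Inf
```

Numerical integration is used here only to make the definition concrete; in practice `pnorm` is the function to call.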
The call qnorm(seq(0.2,1,by=.2)) prints:

    [1] -0.8416212 -0.2533471  0.2533471  0.8416212        Inf

It is important to understand that the quantile for any given cumulative probability is obtained by starting from that probability on the vertical axis, drawing a horizontal line until it meets the cumulative probability curve, and then drawing a vertical line down to the horizontal axis. See Figure 2. The median of the standard Normal is zero, which is verified by starting at 0.5 on the vertical axis.

Now one can do similar plots for the Student's t distribution. The notation is standardized in R: 'dt' is for the density of t, 'pt' for the cumulative distribution function of t, and 'qt' for the quantiles of the t density.

    z=seq(-4,4,by=.1)
    y=pt(z,df=3)        # cumulative probabilities of t with 3 df
    plot(z,y,type="l", main="Student's t (df=3) and Cumulative Densities")
    y2=dt(z,df=3)       # t density, added in red for comparison
    lines(z,y2,type="l",col="red")
    qt(seq(0.2,1,by=.2),df=3)   # quantiles at 0.2, 0.4, 0.6, 0.8, 1

The last line of the above code computes the quantiles of the t for the cumulative probabilities 0.2, 0.4, 0.6, 0.8 and 1. The final quantile given by R is again 'Inf' or infinity. See Figure 3 for the t density graphics.

    [1] -0.9784723 -0.2766707  0.2766707  0.9784723        Inf

Figure 2: Standard Normal Density and Cumulative Density in Same Plot. [Plot titled "Standard Normal and Cumulative Densities"; horizontal axis z from -4 to 4, vertical axis y from 0.0 to 1.0.] Note: You can read the quantiles of Normal along the horizontal axis.

Figure 3: t Density (df=3) and Cumulative Density in Same Plot. [Plot titled "Student's t (df=3) and Cumulative Densities"; horizontal axis z from -4 to 4, vertical axis y from 0.0 to 1.0.] Note: You can read the quantiles of t along the horizontal axis.
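The formulas of Section 1 can be compared with R's built-in functions. The sketch below is an illustration added here, not the author's code. It assumes a hypothetical helper `t_density` that codes eq. (4) directly with base R's `beta`, checks that `gamma(a)` equals `factorial(a-1)` for a few positive integers, and simulates the ratio in eq. (3) (z divided by the square root of an independent Chi-square over its degrees of freedom). The seed and the number of draws `N` are arbitrary.

```r
# Hypothetical helper: eq. (4) coded directly, using base R's beta().
t_density <- function(t, n) {
  (1 + t^2 / n)^(-(n + 1) / 2) / (sqrt(n) * beta(1 / 2, n / 2))
}

z <- seq(-4, 4, by = 0.1)
print(all.equal(t_density(z, n = 3), dt(z, df = 3)))

# Gamma function versus factorial for positive integers a.
a <- 1:6
print(all.equal(gamma(a), factorial(a - 1)))

# Simulate eq. (3) with independent Normal and Chi-square draws.
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 3
N <- 100000          # arbitrary number of draws
t_sim <- rnorm(N) / sqrt(rchisq(N, df = n) / n)

# Empirical quantiles should be close to qt() at the same probabilities.
probs <- seq(0.2, 0.8, by = 0.2)
print(round(quantile(t_sim, probs), 3))
print(round(qt(probs, df = n), 3))
```

The simulated quantiles agree with `qt` only up to sampling error, so the last two printed rows should be close but not identical.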