Probability Distributions, Cumulative Distributions and
Quantiles in R
H. D. Vinod ∗
April 8, 2014
Abstract
We give a very elementary description of probability distribution functions (pdf),
cumulative distribution functions, and quantiles by using R graphics tools.
1 Probability Distribution Functions
The following R code clears R memory, removes the prompt, and prints the date. It is a
good way to start an R session.
rm(list=ls())
options(prompt = " ", continue = " ", width = 68,
useFancyQuotes = FALSE)
print(date())
Now we are ready to begin with a mathematical description of the standard Normal density.
It is a continuous probability distribution function (pdf) defined by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}. \qquad (1)$$
The standard normal density, written z ∼ N(0,1), is one of the most important densities in
Statistics. It is related to several other densities.
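As a quick check of eq. (1), the formula can be typed into R directly and compared with the built-in ‘dnorm’. The following is only a sketch: the names f_manual, same_density and total_area are illustrative and not part of the original text.

```r
# Evaluate eq. (1) by hand on a grid and compare with R's dnorm().
z <- seq(-4, 4, by = 0.1)
f_manual <- exp(-z^2 / 2) / sqrt(2 * pi)

# TRUE when the hand-coded formula and dnorm() agree up to numerical tolerance.
same_density <- isTRUE(all.equal(f_manual, dnorm(z)))
print(same_density)

# A density should integrate to one over the whole real line.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(total_area)
```

If the two agree, ‘dnorm’ can be used in place of the formula in everything that follows.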
For example, the Chi-square density with n degrees of freedom (df) arises by adding
n independently distributed squares of z. Even though z can be negative, its square cannot
be negative. Formally, given that $z_1, z_2, \ldots, z_n$ are independent of each other, and each
distributed as in eq. (1), we construct a Chi-square variable with n degrees of freedom as follows.
$$\chi^2(n) = \sum_{i=1}^{n} z_i^2. \qquad (2)$$
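The construction in eq. (2) can be imitated by simulation. The sketch below assumes n = 5, a seed of 123 and 20000 replications, all chosen only for illustration. It sums squared standard Normal draws and compares the result with R's own Chi-square functions.

```r
set.seed(123)   # arbitrary seed, used only so the run is reproducible
n <- 5          # degrees of freedom (illustrative choice)
reps <- 20000   # number of simulated Chi-square draws (illustrative choice)

# Each row holds n independent N(0,1) draws; the sum of their squares
# across the row is one Chi-square draw with n degrees of freedom, as in eq. (2).
z <- matrix(rnorm(reps * n), nrow = reps, ncol = n)
chisq_sim <- rowSums(z^2)

# The theoretical mean of a Chi-square variable equals its df,
# so the sample mean should be close to n.
print(mean(chisq_sim))

# Simulated quartiles next to the theoretical ones from qchisq().
probs <- c(0.25, 0.5, 0.75)
print(rbind(simulated = quantile(chisq_sim, probs),
            theoretical = qchisq(probs, df = n)))
```

The two rows of the printed table should be close, though not identical, because the first row is subject to simulation error.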
∗ Professor of Economics, Fordham University, Bronx, New York, USA 10458. E-mail: vinod@fordham.edu.
W. S. Gosset, who worked for the Guinness Brewery in Dublin, introduced the Student’s t distribution in 1908. Perhaps because he did not want his employers to know about
his extracurricular interest in mathematical statistics, he published under the pseudonym
‘Student’. The t variable is obtained from the ratio of z to a denominator defined as the
square root of an independent Chi-square variable divided by its degrees of freedom.
$$t = \frac{z}{\sqrt{\chi^2(n)/n}} \qquad (3)$$
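The ratio in eq. (3) can also be checked by simulation. This is a sketch under stated assumptions: n = 3, a seed of 456 and 50000 replications are arbitrary illustrative choices, and the Chi-square draws are generated independently of z with ‘rchisq’.

```r
set.seed(456)   # arbitrary seed, used only so the run is reproducible
n <- 3          # degrees of freedom (illustrative choice)
reps <- 50000   # number of simulated draws (illustrative choice)

z <- rnorm(reps)                # standard Normal numerator
chisq <- rchisq(reps, df = n)   # Chi-square draws, independent of z
t_sim <- z / sqrt(chisq / n)    # the ratio in eq. (3)

# Simulated quantiles next to the theoretical t quantiles from qt().
probs <- c(0.2, 0.4, 0.6, 0.8)
print(rbind(simulated = quantile(t_sim, probs),
            theoretical = qt(probs, df = n)))
```

The simulated row should be close to the ‘qt’ row, with small differences due to simulation error.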
The t density with n degrees of freedom can be written as
$$f(t) = \frac{(1 + t^2/n)^{-(n+1)/2}}{\sqrt{n}\, B(1/2, n/2)}, \quad \text{where } B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \qquad (4)$$
denotes the beta function based on the gamma function defined as
$$\Gamma(a) = \int_0^\infty e^{-u} u^{a-1}\, du. \qquad (5)$$
The gamma function is a generalization of the factorial function. R has a function called
‘gamma’ to evaluate the above integral in eq. (5). In fact, when a is a positive integer, “gamma(a)”
equals “factorial(a-1).”
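These statements are easy to verify in R. The sketch below compares ‘gamma’ with ‘factorial’, evaluates the integral in eq. (5) numerically for one non-integer value, and codes eq. (4) with R's ‘beta’ function. The names a0, tval and f_t, and the choices a0 = 2.5 and n = 3, are illustrative only.

```r
# gamma(a) equals factorial(a - 1) for positive integers a.
a <- 1:6
print(cbind(a, gamma = gamma(a), factorial = factorial(a - 1)))

# Numerical evaluation of the integral in eq. (5) for a non-integer a.
a0 <- 2.5
gamma_integral <- integrate(function(u) exp(-u) * u^(a0 - 1),
                            lower = 0, upper = Inf)$value
print(c(integral = gamma_integral, builtin = gamma(a0)))

# Eq. (4) coded with beta() and compared with the built-in dt().
n <- 3
tval <- seq(-4, 4, by = 0.1)
f_t <- (1 + tval^2 / n)^(-(n + 1) / 2) / (sqrt(n) * beta(1 / 2, n / 2))
same_t_density <- isTRUE(all.equal(f_t, dt(tval, df = n)))
print(same_t_density)
```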
Now we turn to creating plots for the Normal and t densities.
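z=seq(-4,4,by=.1) # grid of z values
y=dnorm(z) # standard Normal density
plot(z,y,type="l", main="Standard Normal and t (df=3) Densities")
y2=dt(z,df=3) # Student's t density with df=3
lines(z,y2,type="l",col="red") # overlay the t density in red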
The output of the above code for the standard normal density, z ∼ N(0,1), and the t density
is in Figure 1. One can see that Student’s t density is very similar to the standard Normal
density, except that the t density has an additional parameter called degrees of freedom (df).
Each new choice of df will produce a new t density. Our figure has df=3.
If df=100 or larger, the t density is almost the same as the standard Normal.
Exercise: Note that all code shown here is ready to be copied and pasted into
R. Change df=3 to various values (e.g. df=30) and see the two curves together.
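One possible numerical companion to this exercise is sketched below. It measures how far the t density is from the standard Normal density for a few df values, and compares the probability of falling more than 3 units from zero. The df values 3, 30 and 100 and the cutoff 3 are illustrative choices, as are the names max_gap, tail_t and tail_norm.

```r
z <- seq(-4, 4, by = 0.1)
dfs <- c(3, 30, 100)   # illustrative choices of degrees of freedom

# Largest vertical gap between the t density and the Normal density on the grid.
max_gap <- sapply(dfs, function(d) max(abs(dt(z, df = d) - dnorm(z))))
names(max_gap) <- paste0("df=", dfs)
print(max_gap)

# Two-sided tail probability beyond 3 in absolute value: t versus Normal.
tail_t <- 2 * pt(-3, df = dfs)
names(tail_t) <- paste0("df=", dfs)
tail_norm <- 2 * pnorm(-3)
print(c(tail_t, normal = tail_norm))
```

The gap shrinks as df grows, and the t tail probabilities stay above the Normal one. To see the curves themselves, each dt(z, df=d) can be overlaid with ‘lines’ exactly as in the plotting code above.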
Figure 1: Standard Normal Density and Student’s t Density
[Plot: “Standard Normal and t (df=3) Densities”; horizontal axis z from −4 to 4, vertical axis y from 0.0 to 0.4.]
Note: t density in red color has fatter tails
2 Cumulative Density and Quantiles
In addition to the probability density, we are also interested in cumulative probabilities.
These are readily computed as follows. Note that ‘pnorm’ in R computes the cumulative
probabilities and ‘qnorm’ computes the quantiles.
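z=seq(-4,4,by=.1) # grid of z values
y=pnorm(z) # cumulative probabilities of the standard Normal
plot(z,y,type="l", main="Standard Normal and Cumulative Densities")
y2=dnorm(z) # standard Normal density
lines(z,y2,type="l",col="red") # overlay the density in red
qnorm(seq(0.2,1,by=.2)) # quantiles at probabilities 0.2, 0.4, 0.6, 0.8, 1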
The last line of the above code computes the quantiles of the normal for the cumulative
probabilities 0.2, 0.4, 0.6, 0.8 and 1. The final quantile given by R is ‘Inf’ or infinity,
because a cumulative probability of 1 is reached only as z grows without bound.
[1] -0.8416212 -0.2533471  0.2533471  0.8416212        Inf
It is important to understand how the quantile is obtained for any given cumulative
probability: start from the given cumulative probability on the vertical axis, draw a horizontal
line until it meets the cumulative probability function, and then draw a vertical line down to the
horizontal axis. See Figure 2. The median of f (z) is zero, which is verified by starting at 0.5
on the vertical axis.
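The graphical reading described here says that the quantile function is the inverse of the cumulative probability function. A minimal sketch of this idea follows. The probabilities in p and the use of ‘uniroot’ to solve for a quantile by hand are illustrative choices, not part of the original code.

```r
# Quantiles for a few cumulative probabilities.
p <- c(0.2, 0.4, 0.5, 0.6, 0.8)
q <- qnorm(p)
print(q)

# Going back up from each quantile to the curve recovers the probability.
print(pnorm(q))

# The same quantile found "by hand": solve pnorm(x) = 0.8 for x.
root <- uniroot(function(x) pnorm(x) - 0.8,
                interval = c(-4, 4), tol = 1e-9)$root
print(c(uniroot = root, qnorm = qnorm(0.8)))
```

The middle element of q is the median, obtained by starting at 0.5 on the vertical axis.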
Now one can do similar plots for the Student’s t distribution. The notation is standardized
in R. ‘dt’ is for the density of t, ‘pt’ for the cumulative distribution of t, and ‘qt’ for quantiles of the t density.
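z=seq(-4,4,by=.1) # grid of z values
y=pt(z,df=3) # cumulative probabilities of t with df=3
plot(z,y,type="l", main="Student's t (df=3) and Cumulative Densities")
y2=dt(z,df=3) # t density with df=3
lines(z,y2,type="l",col="red") # overlay the density in red
qt(seq(0.2,1,by=.2),df=3) # quantiles at probabilities 0.2, 0.4, 0.6, 0.8, 1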
The last line of the above code computes the quantiles of the t density for the cumulative
probabilities 0.2, 0.4, 0.6, 0.8 and 1. The final quantile given by R is ‘Inf’ or infinity. See
Fig. 3 for the t density graphics.
[1] -0.9784723 -0.2766707  0.2766707  0.9784723        Inf
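The fatter tails of the t density also show up in its quantiles. The following sketch compares upper quantiles of t with those of the standard Normal. The probability 0.975 and the df values 3, 30 and 100 are illustrative choices, as are the names q_t and q_upper.

```r
p <- 0.975             # illustrative upper-tail cumulative probability
dfs <- c(3, 30, 100)   # illustrative choices of degrees of freedom

# t quantiles for each df, next to the Normal quantile at the same probability.
q_t <- qt(p, df = dfs)
names(q_t) <- paste0("df=", dfs)
print(round(c(q_t, normal = qnorm(p)), 4))

# Quantiles stay finite for probabilities below 1 and become Inf exactly at 1.
q_upper <- qt(c(0.99, 0.999, 1), df = 3)
print(q_upper)
```

The t quantiles exceed the Normal quantile and move toward it as df increases.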
Figure 2: Standard Normal Density and Cumulative Density in Same Plot
[Plot: “Standard Normal and Cumulative Densities”; horizontal axis z from −4 to 4, vertical axis y from 0.0 to 1.0.]
Note: You can read the quantiles of Normal along the horizontal axis
Figure 3: t Density (df=3) and Cumulative Density in Same Plot
[Plot: “Student's t (df=3) and Cumulative Densities”; horizontal axis z from −4 to 4, vertical axis y from 0.0 to 1.0.]
Note: You can read the quantiles of t along the horizontal axis