Quantifying Population Attributes
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/∼bchu
(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for
details. Examples will be given and explained in the class.)
1 Objectives
It is well-known that the behavior of a random variable is characterized through the probability
density function (pdf). However, usually, one is interested in specific attributes of a pdf. This is
true because it is through specific attributes that one comprehends the entire pdf, and also because
it is easier to draw inferences about a specific attribute than about the entire pdf. Accordingly, this
lecture examines several popular attributes that are useful in economic statistics.
2 Measure of Symmetry
We begin this section with a question: “Where is the ‘middle’ of a normal distribution?” Since the
normal distribution has a bell shape, the obvious answer to this question is that the ‘middle’ of the
distribution is the mean (or µ).
In general, let f denote the pdf of X. If there exists a value θ ∈ R such that f (θ + x) = f (θ − x)
for every x ∈ R, then X is a symmetric random variable and θ is its center of symmetry.
Another example that I am going to explain in the class is the uniform random variable. X ∼
Uniform[a, b] has the center of symmetry (a + b)/2.
Definition 1. Let X be a random variable. If there exists a value θ ∈ R such that the random
variables X − θ and θ − X have the same distribution, then X is a symmetric random variable and
θ is its center of symmetry.
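As a rough numerical illustration of the condition f (θ + x) = f (θ − x), the following Python sketch checks it on a grid for the Uniform[a, b] pdf. The names `uniform_pdf` and `is_symmetric_about`, the grid, the tolerance, and the values a = 2, b = 6 are assumptions made for this illustration only.

```python
def uniform_pdf(x, a, b):
    """pdf of Uniform[a, b]: constant 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def is_symmetric_about(pdf, theta, half_width, steps=1000, tol=1e-12):
    """Check pdf(theta + x) == pdf(theta - x) on a grid of x in [0, half_width].

    A grid check can refute symmetry but cannot prove it; treat it as a
    sanity check only.
    """
    for i in range(steps + 1):
        x = half_width * i / steps
        if abs(pdf(theta + x) - pdf(theta - x)) > tol:
            return False
    return True


a, b = 2.0, 6.0  # illustrative endpoints


def f(x):
    return uniform_pdf(x, a, b)


print(is_symmetric_about(f, (a + b) / 2, half_width=5.0))  # candidate centre (a + b)/2
print(is_symmetric_about(f, a, half_width=5.0))            # the endpoint a is not a centre
```

The first call should report that (a + b)/2 passes the check, while the second shows that an arbitrary point such as a fails it.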
3 Quantiles
Definition 2. Let X be a continuous random variable and let α ∈ (0, 1). If q = q(X, α) is such that
P (X < q) = α and P (X > q) = 1 − α, then q is called an α population quantile of X. (Note that
I hereafter write quantile instead of population quantile for brevity.)
Example 1. Suppose that X ∼ Uniform[a, b] has the pdf f . Then q is the value in [a, b] for which
α = P (X < q) = Area[a,q) (f ) = (q − a)/(b − a), i.e., q = a + α(b − a).
Example 2. Suppose that X has the pdf
\[
f(x) = \begin{cases} x/2, & x \in [0, 2], \\ 0, & \text{otherwise}. \end{cases}
\]
Then q is the value in (0, 2) for which α = P (X < q) = Area[0,q] (f ) = (1/2)(q − 0)(q/2 − 0) = q²/4,
i.e., q = 2√α.
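The closed form in Example 2 can be cross-checked numerically. The sketch below inverts the cdf F(q) = q²/4 by bisection and compares the result with 2√α. The function names `cdf_example2` and `quantile_by_bisection`, the tolerance, and the chosen values of α are assumptions for illustration.

```python
import math


def cdf_example2(q):
    """cdf of the pdf f(x) = x/2 on [0, 2]: F(q) = q**2 / 4 for q in [0, 2]."""
    if q <= 0.0:
        return 0.0
    if q >= 2.0:
        return 1.0
    return q * q / 4.0


def quantile_by_bisection(cdf, alpha, lo, hi, tol=1e-10):
    """Solve cdf(q) = alpha on [lo, hi], assuming cdf is continuous and
    strictly increasing on that interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for alpha in (0.25, 0.50, 0.90):
    numeric = quantile_by_bisection(cdf_example2, alpha, 0.0, 2.0)
    closed_form = 2.0 * math.sqrt(alpha)
    print(alpha, round(numeric, 6), round(closed_form, 6))
```

Bisection is used here only because it needs nothing beyond the cdf itself; any root finder would do for a continuous, strictly increasing cdf.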
Definition 3. The mid-point of the interval of all values of the α = 0.5 quantile is called the
population median.
The above definition of quantiles applies only to continuous random variables. A general definition of quantiles is stated as follows:
Definition 4. Let X be a random variable and let α ∈ (0, 1). If q = q(X, α) is such that P (X <
q) ≤ α and P (X > q) ≤ 1 − α, then q is called an α quantile of X. Moreover, the first, second, and
third quantiles of X (usually called quartiles), denoted q1 (X), q2 (X), and q3 (X), are the α = 0.25,
α = 0.50, and α = 0.75 quantiles of X. The second quantile is also called the median.
A useful property of the median is that it is insensitive to the influence of outliers in data. I
shall give an example.
Example 3. Let Xk denote a discrete random variable that assumes values in {−1, 0, 1, 10^k } for
k = 1, 2, 3, . . . . Suppose that Xk has the pdf

x         −1      0       1       10^k
pk (x)    0.19    0.60    0.19    0.02

Since P (Xk < 0) = 0.19 and P (Xk > 0) = 0.21 are both at most 0.5, the median of Xk is
q2 (Xk ) = 0, which does not depend on k.
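To see the general definition of a quantile at work in Example 3, the following sketch tests each support point against the two inequalities P (X < q) ≤ α and P (X > q) ≤ 1 − α with α = 0.5. The helper names `pmf_xk` and `is_quantile` are hypothetical; exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def pmf_xk(k):
    """Probabilities of X_k from Example 3, stored as exact fractions."""
    return {
        -1: Fraction(19, 100),
        0: Fraction(60, 100),
        1: Fraction(19, 100),
        10 ** k: Fraction(2, 100),
    }


def is_quantile(pmf, q, alpha):
    """q is an alpha quantile if P(X < q) <= alpha and P(X > q) <= 1 - alpha."""
    below = sum(p for x, p in pmf.items() if x < q)
    above = sum(p for x, p in pmf.items() if x > q)
    return below <= alpha and above <= 1 - alpha


half = Fraction(1, 2)
for k in (1, 2, 6):
    pmf = pmf_xk(k)
    medians = [x for x in pmf if is_quantile(pmf, x, half)]
    print(k, medians)
```

Only support points are tested here, which is enough for this example because the single median is itself a support point.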
Definition 5. Let X be a random variable with the first and third quantiles q1 and q3 . The interquantile range of X, which is a measure of dispersion, is the quantity iqr(X) = q3 − q1 .
Suppose that we have a sample of data, {X1 , . . . , Xn }. By arranging the data in ascending order,
we obtain the order statistics X(1) ≤ X(2) ≤ · · · ≤ X(n) . Let p = k/(n + 1); then X(k) is the p-th sample
quantile (the sample estimate of the p-th population quantile).
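The order-statistic rule can be sketched in a few lines of Python. The function name `sample_quantile_table` and the data values are made up for illustration; the rule p = k/(n + 1) is the one stated in the text.

```python
def sample_quantile_table(data):
    """Return (p, X_(k)) pairs, where X_(k) is the k-th order statistic
    and p = k / (n + 1) is the quantile level it estimates."""
    ordered = sorted(data)  # X_(1) <= X_(2) <= ... <= X_(n)
    n = len(ordered)
    return [(k / (n + 1), ordered[k - 1]) for k in range(1, n + 1)]


sample = [7.1, 2.4, 9.8, 4.0, 5.5, 3.3, 8.2, 6.6, 1.9]  # made-up data, n = 9
for p, value in sample_quantile_table(sample):
    print(f"p = {p:.1f}  ->  X_(k) = {value}")
```

With n = 9 the levels are p = 0.1, 0.2, . . . , 0.9, so the fifth order statistic serves as the sample median.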
4 Mean and Variance
Like the population median, the population mean has an appealing interpretation that commends
its use as a measure of centrality. If X is a symmetric random variable with center of symmetry
θ, then µ = E(X) = θ and q2 = θ, so the population mean and the population median agree. In
general, this is not the case. If X is NOT symmetric, then one should think carefully about whether
one is interested in the population mean or the population median. Of course, computing both
the mean and the median is highly recommended. However, the mean is rather sensitive to the
influence of outliers. For Xk in Example 3, E(Xk ) = 2 · 10^(k−2), which grows without bound as k increases.
The variance of X, which is defined as σ² = E[(X − µ)²] (σ is the standard deviation), measures
dispersion in squared units. Just as it is natural to use the median and the interquantile range
together, so it is natural to use the mean and the standard deviation together. Like the mean,
the standard deviation is extremely sensitive to the influence of outliers. For Xk in Example 3,
σ² = 0.38 + 196 · 100^(k−2).
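The two closed forms quoted for Xk can be verified directly from the table in Example 3. The sketch below computes the mean and the variance with exact fractions and compares them with 2 · 10^(k−2) and 0.38 + 196 · 100^(k−2); the function name `mean_and_variance` is an assumption made for this illustration.

```python
from fractions import Fraction


def mean_and_variance(k):
    """Exact mean and variance of X_k from the table in Example 3."""
    pmf = {
        -1: Fraction(19, 100),
        0: Fraction(60, 100),
        1: Fraction(19, 100),
        10 ** k: Fraction(2, 100),
    }
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


for k in (1, 2, 3):
    mean, variance = mean_and_variance(k)
    # Closed forms quoted in the text, written with exact fractions.
    mean_formula = 2 * Fraction(10) ** (k - 2)
    variance_formula = Fraction(38, 100) + 196 * Fraction(100) ** (k - 2)
    print(k, float(mean), float(variance),
          mean == mean_formula, variance == variance_formula)
```

Both quantities grow rapidly with k, in contrast to the median, which stays at 0.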
Now I provide a financial application of the mean and the variance in measuring expected return
and risk.
Example 4. Suppose that you are interested in buying a risky asset and a riskless asset with returns
R and rf respectively. Your initial capital, W = 1, is allocated between these assets (α in the risky
asset and 1 − α in the riskless asset). The expected payoff is µ(α) = αE(R) + (1 − α)rf , and the
risk, which is measured by the variance, is σ²(α) = α² E[(R − E(R))²]. Suppose that you wish to
receive an expected payoff of $1.50. What proportion, α, should you put in the risky asset?
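Answer: You want to choose α such that, for a given expected payoff, the risk is minimized, i.e.,
min_{α∈[0,1]} σ²(α) subject to µ(α) = 1.5. This is a standard constrained optimization problem with a
quadratic objective and a linear constraint. Introducing a Lagrange multiplier λ for the constraint
leads to the following system of equations:
\[
\begin{cases} 2\alpha\sigma^2 - \lambda\mu + \lambda r_f = 0, \\ \alpha\mu + (1 - \alpha) r_f = 1.5, \end{cases}
\]
where µ = E(R) and σ² = E[(R − E(R))²].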
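A minimal sketch of how the two equations above can be solved: the second equation alone determines α, and the first then gives λ. The function name `solve_allocation` and the numerical inputs (µ = 2.0, rf = 1.0, σ² = 0.5) are hypothetical, chosen only so that α lands in [0, 1]; they are not values from the lecture.

```python
def solve_allocation(mu, r_f, sigma2, target):
    """Solve the two first-order conditions
        2*alpha*sigma2 - lam*mu + lam*r_f = 0,
        alpha*mu + (1 - alpha)*r_f = target,
    for (alpha, lam). Requires mu != r_f."""
    if mu == r_f:
        raise ValueError("mu must differ from r_f")
    alpha = (target - r_f) / (mu - r_f)      # from the payoff constraint
    lam = 2.0 * alpha * sigma2 / (mu - r_f)  # from the first equation
    return alpha, lam


# Made-up inputs, chosen only so that the answer lies in [0, 1].
mu, r_f, sigma2 = 2.0, 1.0, 0.5
alpha, lam = solve_allocation(mu, r_f, sigma2, target=1.5)
risk = alpha ** 2 * sigma2  # sigma^2(alpha) at the chosen allocation
print(alpha, lam, risk)
```

Because there is a single risky asset, the payoff constraint leaves no freedom in α; the sketch does not check that α lies in [0, 1], which has to be verified separately for other inputs.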
There are other useful population attributes that you may need to know.
Definition 6. Let X denote a random variable with mean µ and variance σ². The skewness of X
is given by
\[
\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3},
\]
and the kurtosis (more precisely, the excess kurtosis) of X is given by
\[
\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} - 3.
\]
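The quantities γ1 and γ2 of Definition 6 are easy to compute for a discrete distribution. The sketch below does so for a small symmetric distribution and for Xk from Example 3 with k = 1; the function name `skewness_and_kurtosis` and the symmetric example are assumptions made for illustration.

```python
def skewness_and_kurtosis(pmf):
    """Return (gamma_1, gamma_2) for a discrete distribution given as a
    dict {value: probability}; gamma_2 is the fourth standardized moment
    minus 3."""
    mean = sum(x * p for x, p in pmf.items())

    def central(r):
        # r-th central moment E[(X - mean)^r]
        return sum((x - mean) ** r * p for x, p in pmf.items())

    sigma2 = central(2)
    gamma_1 = central(3) / sigma2 ** 1.5
    gamma_2 = central(4) / sigma2 ** 2 - 3.0
    return gamma_1, gamma_2


symmetric = {-1: 0.25, 0: 0.50, 1: 0.25}           # made-up symmetric example
skewed = {-1: 0.19, 0: 0.60, 1: 0.19, 10: 0.02}    # X_k from Example 3, k = 1
print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(skewed))
```

The symmetric distribution has γ1 = 0, whereas the rare large value in the second distribution produces a positive γ1.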
5 Exercises
1. Consider the function g : R −→ R defined by
\[
g(x) = \begin{cases} 0, & x < 0, \\ x, & x \in [0, 1], \\ 1, & x \in [1, 2], \\ 3 - x, & x \in [2, 3], \\ 0, & x > 3. \end{cases}
\]
Let f (x) = cg(x), where c is an undetermined constant.
(a) For what value of c is f a probability density function (pdf)?
(b) Suppose that a continuous random variable X has the pdf f . Compute P (1.5 < X < 2.5).
(c) Compute E(X).
(d) Let F denote the cdf of X. Compute F (1).
(e) Determine the 0.90 quantile of f .
2. Suppose that X is a continuous random variable with the pdf
\[
f(x) = \begin{cases} 0, & x < 0, \\ x, & x \in (0, 1), \\ (3 - x)/4, & x \in (1, 3), \\ 0, & x > 3. \end{cases}
\]
(a) Compute q2 (X), the median.
(b) Which is greater, q2 (X) or E(X)? Explain your reasoning.
(c) Compute P (0.5 < X < 1.5).
(d) Compute iqr(X), the interquantile range.
3. A random variable X ∼ Uniform(5, 15) has the mean µ = 10 and the variance σ² = 25/3. Let
Y denote a normal random variable with the same mean and variance.
(a) Consider X. What is the ratio of its interquantile range to its standard deviation?
(b) Consider Y . What is the ratio of its interquantile range to its standard deviation?
4. For each of the following random variables, discuss whether the median or the mean would
be a more useful measure of centrality:
(a) The return of a share.
(b) The lifetime of 75-watt light bulbs.