Quantifying Population Attributes

Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/~bchu

(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in class.)

1 Objectives

It is well known that the behavior of a random variable is characterized by its probability density function (pdf). Usually, however, one is interested in specific attributes of a pdf, both because it is through specific attributes that one comprehends the entire pdf and because it is easier to draw inferences about a specific attribute than about the entire pdf. Accordingly, this lecture examines several popular attributes that are useful in economic statistics.

2 Measure of Symmetry

We begin this section with a question: "Where is the 'middle' of a normal distribution?" Since the normal distribution has a bell shape, the obvious answer to this question is that the 'middle' of the distribution is the mean, $\mu$. In general, let $f$ denote the pdf of $X$. If there exists a value $\theta \in \mathbb{R}$ such that $f(\theta + x) = f(\theta - x)$ for every $x \in \mathbb{R}$, then $X$ is a symmetric random variable and $\theta$ is its center of symmetry. Another example that I am going to explain in class is the uniform random variable: $X \sim \mathrm{Uniform}[a, b]$ has the center of symmetry $(a + b)/2$.

Definition 1. Let $X$ be a random variable. If there exists a value $\theta \in \mathbb{R}$ such that the random variables $X - \theta$ and $\theta - X$ have the same distribution, then $X$ is a symmetric random variable and $\theta$ is its center of symmetry.

3 Quantiles

Definition 2. Let $X$ be a continuous random variable and let $\alpha \in (0, 1)$. If $q = q(X, \alpha)$ is such that $P(X < q) = \alpha$ and $P(X > q) = 1 - \alpha$, then $q$ is called an $\alpha$ population quantile of $X$. (Hereafter I write "quantile" instead of "population quantile" for brevity.)

Example 1. Suppose that $X \sim \mathrm{Uniform}[a, b]$ has the pdf $f$.
Then $q$ is the value in $[a, b]$ for which
$$\alpha = P(X < q) = \mathrm{Area}_{[a,q)}(f) = \frac{q - a}{b - a},$$
i.e., $q = a + \alpha(b - a)$.

Example 2. Suppose that $X$ has the pdf
$$f(x) = \begin{cases} x/2, & x \in [0, 2], \\ 0, & \text{otherwise.} \end{cases}$$
Then $q$ is the value in $(0, 2)$ for which
$$\alpha = P(X < q) = \mathrm{Area}_{[0,q)}(f) = \frac{1}{2}\,(q - 0)\,(q/2 - 0) = \frac{q^2}{4},$$
i.e., $q = 2\sqrt{\alpha}$.

Definition 3. The mid-point of the interval of all values of the $\alpha = 0.5$ quantile is called the population median.

The above definition of quantiles applies only to continuous random variables. A general definition of quantiles is stated as follows:

Definition 4. Let $X$ be a random variable and let $\alpha \in (0, 1)$. If $q = q(X, \alpha)$ is such that $P(X < q) \le \alpha$ and $P(X > q) \le 1 - \alpha$, then $q$ is called an $\alpha$ quantile of $X$.

Moreover, the first, second, and third quartiles of $X$, denoted $q_1(X)$, $q_2(X)$, and $q_3(X)$, are the $\alpha = 0.25$, $\alpha = 0.50$, and $\alpha = 0.75$ quantiles of $X$. The second quartile is also called the median. A useful property of the median is that it is insensitive to the influence of outliers in data. I shall give an example.

Example 3. Let $X_k$ denote a discrete random variable that assumes values in $\{-1, 0, 1, 10^k\}$ for $k = 1, 2, 3, \ldots$. Suppose that $X_k$ has the probability mass function

    x          -1      0       1       10^k
    p_k(x)     0.19    0.60    0.19    0.02

Since $P(X_k < 0) = 0.19 \le 0.5$ and $P(X_k > 0) = 0.21 \le 0.5$, the median of $X_k$ is $q_2(X_k) = 0$, which does not depend on $k$.

Definition 5. Let $X$ be a random variable with first and third quartiles $q_1$ and $q_3$. The interquartile range of $X$, which is a measure of dispersion, is the quantity $\mathrm{iqr}(X) = q_3 - q_1$.

Suppose that we have a sample of data, $\{X_1, \ldots, X_n\}$. By sorting the data in ascending order, we obtain the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. Let $p = k/(n+1)$; then $X_{(k)}$ is the $p$-th sample quantile (the sample estimate of the $p$-th population quantile).

4 Mean and Variance

Like the population median, the population mean has an appealing interpretation that commends its use as a measure of centrality.
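
To see how differently the two measures of centrality react to an extreme value, the following Python sketch revisits the random variable $X_k$ of Example 3. It is only an illustration: the helper names `pmf`, `is_quantile`, and `mean` are hypothetical and not part of the course material. It checks the condition of Definition 4 for $q = 0$, $\alpha = 0.5$ and computes the mean directly from the table, using exact fractions to avoid rounding issues.

```python
from fractions import Fraction

def pmf(k):
    """Probability table of X_k from Example 3 as {value: probability}."""
    return {-1: Fraction(19, 100), 0: Fraction(60, 100),
            1: Fraction(19, 100), 10**k: Fraction(2, 100)}

def is_quantile(p, q, alpha):
    """Definition 4: P(X < q) <= alpha and P(X > q) <= 1 - alpha."""
    below = sum(prob for x, prob in p.items() if x < q)
    above = sum(prob for x, prob in p.items() if x > q)
    return below <= alpha and above <= 1 - alpha

def mean(p):
    """E(X) = sum of x * P(X = x) over the support."""
    return sum(x * prob for x, prob in p.items())

for k in (1, 2, 3):
    p = pmf(k)
    # columns: k, whether 0 is a median of X_k, E(X_k)
    print(k, is_quantile(p, 0, Fraction(1, 2)), float(mean(p)))
# prints:
# 1 True 0.2
# 2 True 2.0
# 3 True 20.0
```

The median stays at 0 for every $k$, while the mean grows by a factor of 10 each time the outlying value $10^k$ does.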
If $X$ is a symmetric random variable with center of symmetry $\theta$, then $\mu = E(X) = \theta$ and $q_2 = \theta$, so the population mean and the population median agree. In general, this is not the case. If $X$ is NOT symmetric, then one should think carefully about whether one is interested in the population mean or the population median. Of course, computing both the mean and the median is highly recommended. However, the mean is rather sensitive to the influence of outliers. For $X_k$ in Example 3, $E(X_k) = 2 \cdot 10^{k-2}$.

The variance of $X$, which is defined as $\sigma^2 = E[(X - \mu)^2]$ ($\sigma$ is the standard deviation), measures dispersion in squared units. Just as it is natural to use the median and the interquartile range together, so it is natural to use the mean and the standard deviation together. Like the mean, the standard deviation is extremely sensitive to the influence of outliers. For $X_k$ in Example 3, $\sigma^2 = 0.38 + 196 \cdot 100^{k-2}$.

Now I provide a financial application of the mean and the variance in measuring expected return and risk.

Example 4. Suppose that you are interested in buying a risky asset and a riskless asset with returns $R$ and $r_f$ respectively. Your initial capital, $W = 1$, is allocated between these assets ($\alpha$ in the risky asset and $1 - \alpha$ in the riskless asset). The expected payoff is $\mu(\alpha) = \alpha E(R) + (1 - \alpha) r_f$, and the risk, which is measured by the variance, is $\sigma^2(\alpha) = \alpha^2 E[(R - E(R))^2]$. Suppose that you wish to receive an expected payoff of $1.50. What proportion, $\alpha$, should you put in the risky asset?

Answer: You want to choose $\alpha$ such that, for a given expected payoff, the risk is minimized, i.e.,
$$\min_{\alpha \in [0,1]} \sigma^2(\alpha) \quad \text{subject to} \quad \mu(\alpha) = 1.5.$$
This is a standard constrained optimization problem with a quadratic objective and a linear constraint (not a linear program). Introducing a Lagrange multiplier $\lambda$ for the constraint leads to the following equations:
$$2\alpha\sigma^2 - \lambda\mu + \lambda r_f = 0,$$
$$\alpha\mu + (1 - \alpha) r_f = 1.5,$$
where $\mu = E(R)$ and $\sigma^2 = E[(R - E(R))^2]$.

There are other useful population attributes that you may need to know.

Definition 6.
Let $X$ denote a random variable with mean $\mu$ and variance $\sigma^2$. The skewness of $X$ is given by
$$\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3},$$
and the (excess) kurtosis of $X$ is given by
$$\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} - 3.$$

5 Exercises

1. Consider the function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(x) = \begin{cases} 0, & x < 0, \\ x, & x \in [0, 1], \\ 1, & x \in [1, 2], \\ 3 - x, & x \in [2, 3], \\ 0, & x > 3. \end{cases}$$
Let $f(x) = c\,g(x)$, where $c$ is an undetermined constant.
(a) For what value of $c$ is $f$ a probability density function (pdf)?
(b) Suppose that a continuous random variable $X$ has the pdf $f$. Compute $P(1.5 < X < 2.5)$.
(c) Compute $E(X)$.
(d) Let $F$ denote the cdf of $X$. Compute $F(1)$.
(e) Determine the 0.90 quantile of $f$.

2. Suppose that $X$ is a continuous random variable with the pdf
$$f(x) = \begin{cases} 0, & x < 0, \\ x, & x \in (0, 1), \\ (3 - x)/4, & x \in (1, 3), \\ 0, & x > 3. \end{cases}$$
(a) Compute $q_2(X)$, the median.
(b) Which is greater, $q_2(X)$ or $E(X)$? Explain your reasoning.
(c) Compute $P(0.5 < X < 1.5)$.
(d) Compute $\mathrm{iqr}(X)$, the interquartile range.

3. A random variable $X \sim \mathrm{Uniform}(5, 15)$ has the mean $\mu = 10$ and the variance $\sigma^2 = 25/3$. Let $Y$ denote a normal random variable with the same mean and variance.
(a) Consider $X$. What is the ratio of its interquartile range to its standard deviation?
(b) Consider $Y$. What is the ratio of its interquartile range to its standard deviation?

4. For each of the following random variables, discuss whether the median or the mean would be a more useful measure of centrality:
(a) The return of a share.
(b) The lifetime of 75-watt light bulbs.
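
Quantile answers for piecewise pdfs like those in Exercises 1 and 2 can be checked numerically. The following Python sketch shows one possible way, demonstrated on the pdf of Example 2, for which the closed form $q = 2\sqrt{\alpha}$ is known. The function names `cdf`, `quantile`, and `example2_pdf` are hypothetical, and the approach (midpoint-rule integration plus bisection) is just one simple choice; it assumes the pdf vanishes outside a known interval $[lo, hi]$.

```python
import math

def cdf(pdf, x, lo, n=2000):
    """Approximate F(x) = integral of pdf over [lo, x] by the midpoint rule."""
    if x <= lo:
        return 0.0
    h = (x - lo) / n
    return h * sum(pdf(lo + (i + 0.5) * h) for i in range(n))

def quantile(pdf, alpha, lo, hi, tol=1e-8):
    """Solve F(q) = alpha on [lo, hi] by bisection.

    Assumes the pdf is zero outside [lo, hi], so that F(lo) = 0 and
    F(hi) = 1, and that F is continuous and increasing on [lo, hi].
    """
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2
        if cdf(pdf, mid, lo) < alpha:
            a = mid  # too little probability to the left: move right
        else:
            b = mid  # enough probability to the left: move left
    return (a + b) / 2

def example2_pdf(x):
    """pdf of Example 2: x/2 on [0, 2], zero otherwise."""
    return x / 2 if 0 <= x <= 2 else 0.0

alpha = 0.25
q = quantile(example2_pdf, alpha, 0.0, 2.0)
# The numerical value and the closed form 2*sqrt(alpha) should agree closely.
print(q, 2 * math.sqrt(alpha))
```

To check an exercise, replace `example2_pdf` with the relevant pdf and set `lo` and `hi` to the endpoints of its support.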