Economics 140A
Random Variables
Probability
In everyday usage, probability expresses a degree of belief about an event or statement with a number between 0 and 1. This definition of probability is subjective in nature because different individuals assign different probabilities to an event. We work with an objective definition of probability. Our objective definition of the probability that an event occurs is given by the limit of the empirical frequency of the event as the number of replications of the experiment, from which the event can occur, increases indefinitely. The assignment of probability does not differ across individuals.
Probability is a subject that can be studied independently of statistics; it
forms the foundation of statistics. For example, what is the probability that a
head comes up twice in a row if we toss an unbiased coin? The answer, .25, is
calculated without need of statistical inference.
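To illustrate the frequency interpretation, the following short sketch (in Python; not part of the original notes, with an arbitrary seed and replication count) simulates repeated tosses of two fair coins and reports the empirical frequency of two heads, which settles near .25 as the number of replications grows.

import random

random.seed(0)                      # arbitrary seed for reproducibility
replications = 100_000              # number of times the experiment is run
two_heads = 0
for _ in range(replications):
    tosses = [random.choice("HT") for _ in range(2)]   # toss a fair coin twice
    if tosses == ["H", "H"]:
        two_heads += 1
print(two_heads / replications)     # empirical frequency, close to 0.25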
Axioms of Probability
Definitions of a few commonly used terms follow. These terms inevitably remain vague until they are illustrated.
Sample space. The set of all the possible outcomes of an experiment.
Event. A subset of the sample space.
Simple Event. An event that cannot be a union of other events.
Composite Event. An event that is not a simple event.
Example 1.
Experiment. Tossing a coin twice.
Sample space: $\{HH, HT, TH, TT\}$.
The event that a tail occurs at least once: $HT \cup TH \cup TT$.
Example 2.
Experiment. Reading the temperature (F) at UCSB at noon on November 1.
Sample Space. Real Interval (0,100).
Events of interest are intervals contained in the sample space.
A probability is a nonnegative number we assign to every event. The axioms of
probability are the rules we agree to follow when we assign probabilities. (Often,
Venn diagrams are used to determine relations among the probabilities assigned
to different sets.)
Axioms of Probability
(1) $P(A) \geq 0$ for any event $A$.
(2) $P(S) = 1$, where $S$ is the sample space.
(3) If $\{A_i\}$, $i = 1, 2, \ldots$, are mutually exclusive (that is, $A_i \cap A_j = \emptyset$ for all $i \neq j$), then $P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$.
The first two rules are consistent with everyday use of the word probability. The third rule is consistent with the frequency interpretation of probability, for relative frequency follows the same rule. If, at the roll of a die, $A$ is the event that the die shows 1 pip and $B$ is the event that the die shows 2 pips, the relative frequency of $A \cup B$ is the sum of the relative frequencies of $A$ and $B$. We want probability to follow the same rule.
If the sample space is discrete, as in example 1, it is possible to assign probability to every event (that is, every possible subset of the sample space) in a way that is consistent with the probability axioms. If the sample space is continuous, however, as in example 2, it is not possible to do so. In such a case we restrict attention to a smaller class of events to which we can assign probabilities in a manner consistent with the axioms. For example, the class of all intervals contained in (0,100) and their unions satisfies the condition.
Conditional Probability
The concept of conditional probability is intuitively easy to understand. For example, it makes sense to talk about the conditional probability that two pips show in the roll of a die given that an even number of pips is showing. In the frequency interpretation, this conditional probability can be regarded as the limit of the ratio of the number of times two pips show to the number of times an even number of pips shows. In general, we consider the "conditional probability of $A$ given $B$," denoted by $P(A|B)$, for any pair of events $A$ and $B$ in a sample space, provided $P(B) > 0$, and establish axioms that govern the calculation of conditional probabilities.
Axioms of Conditional Probability
(In the following it is assumed that $P(B) > 0$.)
(1) $P(A|B) \geq 0$ for any event $A$.
(2) $P(A|B) = 1$ for any event $B \subset A$.
(3) If $\{A_i \cap B\}$, $i = 1, 2, \ldots$, are mutually exclusive, then $P(A_1 \cup A_2 \cup \cdots \mid B) = P(A_1|B) + P(A_2|B) + \cdots$.
(4) If $H \subset B$ and $G \subset B$ and $P(G) \neq 0$, then $\frac{P(H|B)}{P(G|B)} = \frac{P(H)}{P(G)}$.
Axioms (1)-(3) are analogous to the corresponding axioms of probability. They imply that we may treat conditional probability just like probability by regarding $B$ as the sample space. Axiom (4) is justified by observing that because whenever $H$ or $G$ occurs $B$ occurs, the relative frequency of $H$ versus $G$ remains the same before and after $B$ is known to have occurred.
As explained in the background notes, axiom (1) is redundant; that is, axiom (1) follows from the other three axioms. Axioms (2)-(4) can also be written as a single, more complicated axiom
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{for any pair of events $A$ and $B$ such that } P(B) > 0. \qquad (0.1)$$
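As an illustration of equation (0.1), the sketch below (hypothetical, not from the notes) computes the conditional probability that a fair die shows two pips given that an even number of pips is showing: $P(A \cap B)/P(B) = (1/6)/(1/2) = 1/3$.

from fractions import Fraction

outcomes = range(1, 7)                       # sample space {1, ..., 6}
p = {s: Fraction(1, 6) for s in outcomes}    # equal probabilities

A = {2}                                      # event: two pips show
B = {2, 4, 6}                                # event: an even number of pips shows

P_B = sum(p[s] for s in B)
P_A_and_B = sum(p[s] for s in A & B)
print(P_A_and_B / P_B)                       # 1/3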
Statistical Independence
We first define the concept of statistical (stochastic) independence for a pair of events. Henceforth it will be referred to simply as independence.
Definition (Pairwise). Events $A$ and $B$ are said to be independent if $P(A) = P(A|B)$.
The term independence has a clear intuitive meaning. It means that the probability of occurrence of $A$ is not affected by whether or not $B$ has occurred. Because of (0.1), the above equality is equivalent to $P(B) = P(B|A)$ (provided $P(A) > 0$) or to $P(A)P(B) = P(A \cap B)$.
In many cases two events are not independent. For example, the probability that the Chicago Bulls win a game is not independent of whether Michael Jordan plays.
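The product condition $P(A)P(B) = P(A \cap B)$ is easy to check directly. A minimal sketch (my own example, assuming two independent tosses of a fair coin) verifies that "heads on the first toss" and "heads on the second toss" are independent, while "heads on the first toss" and "at least one tail" are not.

from fractions import Fraction

sample_space = ["HH", "HT", "TH", "TT"]
p = {s: Fraction(1, 4) for s in sample_space}     # fair, independent tosses

def prob(event):
    return sum(p[s] for s in event)

A = {s for s in sample_space if s[0] == "H"}      # heads on the first toss
B = {s for s in sample_space if s[1] == "H"}      # heads on the second toss
C = {s for s in sample_space if "T" in s}         # at least one tail

print(prob(A) * prob(B) == prob(A & B))           # True: independent
print(prob(A) * prob(C) == prob(A & C))           # False: not independent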
Statistics
If we know all the features of a random mechanism that generates data, then
we use probability theory to construct probabilities. In most cases (and in all cases
of empirical interest in the real world) we do not know all features of the random
mechanism that has generated the data we are studying. Statistics is the science
of observing data and making inferences about the characteristics of a random
mechanism that has generated the data. The literal translation of econometrics
is measurement in economics, which means measuring the characteristics of a
random mechanism that has generated the data under study. As such, we can
think of econometrics as drawing on results from statistics.
Random Variables
To make mathematical analysis tractable, the statistician assigns numbers to objects (for example 1 to heads and 0 to tails in our coin tossing example). A random mechanism whose outcomes are real numbers is called a random variable.
Definition. A random variable is a variable that takes values according to a certain probability distribution.
A discrete random variable takes a countable (finite or countably infinite) number of real numbers with preassigned probabilities. A continuous random variable takes a continuum of values in the real line according to the rule determined by a density function. A third type of random variable is formed as a mixture of discrete and continuous random variables. The term probability distribution captures a broader concept that refers to either a set of discrete probabilities or a density function.
Example. The roll of a fair die is an example of a univariate discrete random
variable.
Sample space: $\{1, 2, 3, 4, 5, 6\}$, each outcome with probability $\frac{1}{6}$.
$Z_1$ = the number of pips showing.
$Z_2 = 1$ if the number of pips is odd, $Z_2 = 0$ if it is even.
Each outcome in the sample space maps to a value of each random variable. The random variable $Z_1$ can hardly be distinguished from the sample space. Note that the probability distribution of $Z_2$ can be derived from the sample space: $P(Z_2 = 1) = \frac{1}{2}$ and $P(Z_2 = 0) = \frac{1}{2}$.
The probability distribution of a discrete random variable is completely characterized by the equation
$$P(Z = z_i) = p_i, \quad i = 1, 2, \ldots, n. \qquad (0.2)$$
For a discrete random variable the probability of a specific value can be nonzero: $p_i \geq 0$. Because one of the possible outcomes must always occur, $\sum_{i=1}^{n} p_i = 1$; $n$ may be infinite in some cases.
As we have not yet been precise about a density function, we need to formalize our earlier definition of a continuous random variable.
Definition. If there is a nonnegative function $f(y)$ defined over the whole line such that
$$P(y_1 \leq Y \leq y_2) = \int_{y_1}^{y_2} f(y)\,dy,$$
for any $y_1$, $y_2$ satisfying $y_1 \leq y_2$, then $Y$ is a continuous random variable and $f(y)$ is the density function for $Y$. One of the possible outcomes must always occur (probability axiom (2)), so $\int_{-\infty}^{\infty} f(y)\,dy = 1$. The definition of an integral implies that the probability that a continuous random variable takes any single value is zero, so it does not matter whether $<$ or $\leq$ is used within the probability bracket. In most practical applications, $f(y)$ is continuous except for possibly a finite number of discontinuities. For such a function the Riemann integral exists, and therefore $f(y)$ is a density function.
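As a numerical check of this definition, the sketch below (my own illustration; the standard Gaussian density and the interval are arbitrary choices, and scipy is assumed to be available) integrates a density over an interval and over the whole line.

import numpy as np
from scipy import integrate

def f(y):
    # standard Gaussian density
    return np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)

# P(y1 <= Y <= y2) as the integral of the density over [y1, y2]
p, _ = integrate.quad(f, -1.0, 1.0)
print(p)                                   # about 0.683

total, _ = integrate.quad(f, -np.inf, np.inf)
print(total)                               # 1.0, consistent with axiom (2)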
The cumulative distribution function, $F$, gives the probability that the random variable takes a value less than or equal to a number $b$. The cumulative distribution function is obtained by accumulating all the relevant probabilities.
For a continuous random variable
$$P(Y \leq b) = F(b) = \int_{-\infty}^{b} f(y)\,dy.$$
For a discrete random variable
$$P(Z \leq b) = G(b) = \sum_{i:\, z_i \leq b} P(Z = z_i).$$
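A minimal sketch of the discrete case (my own example, using the fair die): $G(b)$ is built by accumulating $P(Z = z_i)$ over the values $z_i \leq b$.

from fractions import Fraction

pmf = {z: Fraction(1, 6) for z in range(1, 7)}      # fair die

def G(b):
    # cumulative distribution function of the discrete random variable Z
    return sum(p for z, p in pmf.items() if z <= b)

print(G(3))      # 1/2
print(G(6))      # 1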
Location
The location of a random variable describes a typical value of the random variable. There are three common measures of location. The first is the mean, or expected value. We use the notation $EY$ to denote the expected value of the random variable $Y$, where $E$ is the expectation operator. (An operator denotes a specific mathematical procedure.) For a continuous random variable
$$EY = \int_{-\infty}^{\infty} y\, f(y)\,dy = \mu_Y.$$
For a discrete random variable
$$EZ = \sum_{i=1}^{N} z_i\, P(Z = z_i) = \mu_Z.$$
We see that the mathematical procedure is to take the sum of each possible value
multiplied by the probability of the value.
Example. Let $Z$ be the number of pips resulting from one roll of a fair die. The mean of $Z$ is 3.5. Note that 3.5 pips are never observed, so the mean is not necessarily a value the random variable can actually take.
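A quick check of this example (hypothetical code, not part of the notes) computes $EZ = \sum_i z_i\, P(Z = z_i)$ for the fair die.

from fractions import Fraction

pmf = {z: Fraction(1, 6) for z in range(1, 7)}      # fair die
EZ = sum(z * p for z, p in pmf.items())             # expected value
print(EZ)                                            # 7/2, i.e. 3.5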
Two important features of the expectations operator are apparent. First, the expectations operator is linear, that is, $E(Y_1 + Y_2) = EY_1 + EY_2$. To verify for the discrete case (taking $Z_1$ and $Z_2$ to be independent for simplicity, so that the joint probability factors; the result holds in general),
$$E(Z_1 + Z_2) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} (z_{1i} + z_{2j})\, P(Z_1 = z_{1i})\, P(Z_2 = z_{2j})$$
$$= \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} z_{1i}\, P(Z_1 = z_{1i})\, P(Z_2 = z_{2j}) + \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} z_{2j}\, P(Z_1 = z_{1i})\, P(Z_2 = z_{2j})$$
$$= \sum_{i=1}^{n_1} z_{1i}\, P(Z_1 = z_{1i}) + \sum_{j=1}^{n_2} z_{2j}\, P(Z_2 = z_{2j}),$$
where the third line follows from the fact that $\sum_{j=1}^{n_2} P(Z_2 = z_{2j}) = \sum_{i=1}^{n_1} P(Z_1 = z_{1i}) = 1$. To verify for the continuous case, simply replace summation signs with integral signs.
The second feature is that the expectation of a deterministic variable is simply the deterministic variable. For example,
$$E(5) = \int 5\, f_Y(y)\,dy = 5 \int f_Y(y)\,dy = 5.$$
Because $EY$ is simply a number, like 5, we have
$$E(EY) = \int (EY)\, f_Y(y)\,dy = EY \int f_Y(y)\,dy = EY.$$
The mean is the balance point of the data in that the cumulative distance
from the mean of those observations above the mean is equal to the cumulative
distance from the mean of those observations below the mean. Because the mean
is the balance point, it is extremely sensitive to outliers. (Diagram, with a teeter-totter, the following two examples: $\{10, 12, 17\}$, mean equals 13, and $\{30, 35, 38, 57\}$, mean equals 40; as the last value shifts out, the entire sample is forced to the left of the balance point, the mean.)
A more robust measure of location, one that is generally not as sensitive to
outliers, is the median. The median is the value (not necessarily unique) that half
of the observations equal or fall below. Finally, the mode is an alternative way
to represent typical values. The mode is the value (not necessarily unique) with
the highest probability (for discrete random variables) or the highest point of the
density function (for continuous random variables).
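To illustrate the relative robustness of these location measures, a small sketch (my own numbers, using Python's statistics module) shows how an outlier moves the mean much more than the median.

import statistics

data = [10, 12, 17]
with_outlier = [10, 12, 170]          # last value shifted far out

print(statistics.mean(data), statistics.median(data))                    # 13, 12
print(statistics.mean(with_outlier), statistics.median(with_outlier))    # 64, 12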
Variance
While location describes a typical value of the random variable, we need some measure of dispersion to describe how likely we are to get a typical value. The variance of a random variable is a common measure of dispersion. To ensure that positive and negative values do not cancel out, we study the squared difference between possible values of $Y$ and the expected value of $Y$.
Definition: The variance of $Y$ is the average squared distance between $Y$ and $EY$:
$$Var(Y) = E(Y - EY)^2 = E[Y^2 - 2Y\,EY + (EY)^2] = E(Y^2) - 2(EY)^2 + (EY)^2 = E(Y^2) - (EY)^2,$$
where the second equality follows from the fact that expectation is a linear operator and EY is not random.
Clearly the variance is nonnegative. Often the square root of the variance, termed the standard deviation, is reported rather than the variance. To accommodate the fact that the variance is nonnegative, the standard deviation is defined to be the positive square root of the variance.
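Continuing the die example, the sketch below (hypothetical) computes $Var(Z)$ both as $E(Z - EZ)^2$ and as $E(Z^2) - (EZ)^2$ to confirm that the two expressions agree.

from fractions import Fraction

pmf = {z: Fraction(1, 6) for z in range(1, 7)}          # fair die
EZ = sum(z * p for z, p in pmf.items())
EZ2 = sum(z**2 * p for z, p in pmf.items())

var_direct = sum((z - EZ)**2 * p for z, p in pmf.items())
var_shortcut = EZ2 - EZ**2
print(var_direct, var_shortcut)                          # both 35/12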
Interpretation of the standard deviation is not as straightforward as interpretation of the mean. To help in interpretation we use Chebyshev's inequality, which states that in any set of data at least $1 - \frac{1}{k^2}$ of the data are within $k$ standard deviations of the mean. For $k = 2$ we find that at least $\frac{3}{4}$ of the data lie within 2 standard deviations of the mean. (The bound is not always sharp: For a Gaussian random variable roughly 95 percent of the data lie within 2 standard deviations of the mean and nearly $\frac{2}{3}$ of the data lie within one standard deviation.)
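A quick numerical check of Chebyshev's bound (my own sketch, using a simulated Gaussian sample and numpy; the seed and sample size are arbitrary) compares the empirical share of observations within $k$ standard deviations of the mean with the bound $1 - 1/k^2$.

import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
y = rng.normal(loc=0.0, scale=1.0, size=100_000)

for k in (1, 2, 3):
    share = np.mean(np.abs(y - y.mean()) <= k * y.std())
    bound = 1 - 1 / k**2
    print(k, round(share, 3), ">=", bound)     # e.g. k=2: about 0.954 >= 0.75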
As with the mean, the variance is sensitive to outliers. A more robust measure
of dispersion is the interquartile range, which is defined as the range encompassed
by the middle half of the data.
Note that the final expression for the variance involved the term $E(Y^2)$. From the above definition for the expected value of $Y$ we infer that
$$E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy.$$
The expressions for $EY$ and $E(Y^2)$ are examples of moments. For any positive value of $k$ (note $k$ does not need to be an integer)
$$E(Y^k) = \int_{-\infty}^{\infty} y^k f(y)\,dy.$$
We have seen that the moments for $k = 1, 2$ are of interest to us. Note the distinction between the second moment and the variance. The second moment equals the variance only if the first moment is zero.
Standardization
Standardization is the process by which a random variable with mean $\mu$ and variance $\sigma^2$ is transformed to a random variable with mean zero and variance one. Because random variables are typically standardized to allow one to use statistical tables, in our analysis we often standardize random variables. Suppose
$$Y \sim N(\mu, \sigma^2),$$
where the notation indicates that $Y$ is distributed as a Gaussian random variable with $EY = \mu$ and the variance of $Y$ equal to $\sigma^2$.
The first step in standardization is to ensure that the transformed random variable has mean 0. To do so, we subtract $\mu$:
$$E(Y - \mu) = EY - E\mu = EY - \mu = EY - EY = 0.$$
The (transformed) random variable $Y - \mu$ has mean 0. The second step in standardization is to ensure that the transformed mean zero random variable has variance 1. To do so, we divide through by the standard deviation, $\sigma$:
$$Var\left(\frac{Y-\mu}{\sigma}\right) = E\left(\frac{Y-\mu}{\sigma} - E\frac{Y-\mu}{\sigma}\right)^2 = E\left(\frac{Y-\mu}{\sigma} - 0\right)^2 = \frac{1}{\sigma^2}\,E(Y-\mu)^2 = 1,$$
where $E\frac{Y-\mu}{\sigma} = 0$. The transformed random variable $\frac{Y-\mu}{\sigma}$ has the distribution
$$\frac{Y-\mu}{\sigma} \sim N(0, 1).$$
The second step in the standardization is a special case of the rule
$$Var(cY) = c^2\,Var(Y),$$
for any constant $c$.
Standardized variables are useful for comparisons within groups. Let $Y^s = \frac{Y - \mu}{\sigma}$. The standardized variable $Y^s$ measures how many standard deviations $Y$ is above or below its mean. If $Y$ equals its mean, then $Y^s$ is equal to 0. If $Y$ is 2 standard deviations above its mean, then $Y^s$ is equal to 2. If $Y$ is one standard deviation below its mean, then $Y^s$ is equal to $-1$.
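The sketch below (my own illustration, with made-up values of $\mu = 100$ and $\sigma = 15$) standardizes a variable and reads off how many standard deviations each value lies from the mean.

# Standardize Y with (assumed) mean 100 and standard deviation 15.
mu, sigma = 100.0, 15.0

def standardize(y):
    return (y - mu) / sigma

for y in (100.0, 130.0, 85.0):
    print(y, standardize(y))     # 0.0, 2.0, -1.0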
Multivariate Distributions
What if we have more than one random variable? The joint distribution of several random variables is a multivariate distribution. Consider two continuous random variables $Y$ and $W$. The multivariate density function $f_{Y,W}(y, w)$ is defined over $\mathbb{R}^2$. The individual, or marginal, density functions are $f_Y(y)$ and $f_W(w)$, and the conditional density functions are $f_{Y|W}(y|w)$ and $f_{W|Y}(w|y)$. Recall
$$f_{Y,W}(y, w) = f_{Y|W}(y|w)\, f_W(w).$$
Conditional moments are constructed as
$$E(Y \mid W = w) = \int y\, f_{Y|W}(y|w)\,dy.$$
An important relation between conditional moments and unconditional moments is the
Law of Iterated Expectations
$$EY = E_W\, E(Y|W).$$
Proof. From the definition of the joint density function
$$EY = \int\!\!\int y\, f_{Y,W}(y, w)\,dw\,dy = \int\!\!\int y\, f_{Y|W}(y|w)\, f_W(w)\,dw\,dy = \int\left[\int y\, f_{Y|W}(y|w)\,dy\right] f_W(w)\,dw = E_W\, E(Y|W).$$
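A simulation sketch (my own construction, using numpy; the conditional model $E(Y|W = w) = 2 + 3w$ and the seed are arbitrary assumptions) illustrates the law of iterated expectations: draw $W$, draw $Y$ given $W$, and compare the overall mean of $Y$ with the mean of the conditional means.

import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n = 200_000

w = rng.choice([0.0, 1.0], size=n)             # W is 0 or 1 with equal probability
y = rng.normal(loc=2.0 + 3.0 * w, scale=1.0)   # assumed model: E(Y | W = w) = 2 + 3w

lhs = y.mean()                                  # estimate of EY
rhs = np.mean(2.0 + 3.0 * w)                    # estimate of E_W E(Y|W)
print(lhs, rhs)                                 # both close to 3.5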
Independence
We have already defined the concept of independence for a pair of events. We now extend the concept of independence to a pair of random variables. If $Y$ and $W$ are unrelated, in the sense that the probability of $Y$ taking on a certain value is not related to the probability of $W$ taking on a certain value, then $Y$ and $W$ are independent. If $Y$ and $W$ are independent, then
$$E[YW] = E[Y]\,E[W],$$
and
$$f_{Y|W}(y|w) = f_Y(y) \;\Rightarrow\; f_{Y,W}(y, w) = f_Y(y)\, f_W(w).$$
Covariance
The covariance is the expected product of deviations from means,
$$Cov(Y, W) = E[(Y - EY)(W - EW)].$$
The covariance captures only the linear relationship. For example, if $Y = W^2$ and $W$ takes the values $\{-3, -2, -1, 0, 1, 2, 3\}$, then $Y$ takes the values $\{9, 4, 1, 0, 1, 4, 9\}$. If $W$ takes each value with equal probability, then $EW = 0$ and $EY = 4$. The covariance between $W$ and $Y$ is
$$\frac{1}{7} \sum_{i=1}^{7} (w_i - 0)(y_i - 4) = 0.$$
Thus there is no linear relation between $W$ and $Y$, but there is clearly a nonlinear relation. When $W$ is independent of $Y$ there is no relation between $W$ and $Y$:
Relation:
i) If $W$ is independent of $Y$, then there is no relation between $W$ and $Y$: no relation $\Rightarrow$ no linear relation $\Rightarrow$ $Cov(Y, W) = 0$.
ii) If the covariance between $W$ and $Y$ is zero, then there is no linear relation: no linear relation $\not\Rightarrow$ no relation.
Note that covariance is not scale invariant (the value of the estimated covariance depends on the units in which the random variables are measured). The correlation coefficient provides a scale invariant measure of the linear relation between $W$ and $Y$:
$$\rho_{YW} = \frac{Cov(Y, W)}{\sigma_Y\, \sigma_W},$$
where $-1 \leq \rho_{YW} \leq 1$.
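The sketch below (hypothetical) reproduces the $Y = W^2$ example numerically: the covariance between $W$ and $Y$ is zero even though the two variables are perfectly, but nonlinearly, related.

from fractions import Fraction

w_values = [-3, -2, -1, 0, 1, 2, 3]
p = Fraction(1, 7)                                     # equal probabilities

EW = sum(w * p for w in w_values)                      # 0
EY = sum(w**2 * p for w in w_values)                   # 4
cov = sum((w - EW) * (w**2 - EY) * p for w in w_values)
print(EW, EY, cov)                                     # 0, 4, 0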
Variance of a Sum
The variance of a sum of two random variables is
$$Var(Y + W) = E[(Y + W) - E(Y + W)]^2 = E[(Y - EY) + (W - EW)]^2$$
$$= E[Y - EY]^2 + E[W - EW]^2 + 2E[(W - EW)(Y - EY)] = Var(Y) + Var(W) + 2Cov(Y, W).$$
Similarly,
$$Var(Y - Z) = Var(Y) + Var(Z) - 2Cov(Y, Z).$$
When $Cov(Y, W) = 0$,
$$Var(Y + W) = Var(Y) + Var(W).$$
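A simulation check (my own sketch, with an arbitrarily chosen pair of correlated Gaussian variables) compares the sample variance of $Y + W$ with $Var(Y) + Var(W) + 2Cov(Y, W)$.

import numpy as np

rng = np.random.default_rng(2)                     # arbitrary seed
n = 100_000
w = rng.normal(size=n)
y = 0.5 * w + rng.normal(size=n)                   # Y correlated with W by construction

lhs = np.var(y + w)
rhs = np.var(y) + np.var(w) + 2 * np.cov(y, w, ddof=0)[0, 1]
print(lhs, rhs)                                    # essentially identical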
Central Limit Theorems
A remarkably large class of random variables can be well approximated by a
single distribution, the Gaussian distribution. What types of random variables?
Why, the types we typically encounter in applied work, such as the average height of students, the average income gain from college, or the average effect of interest rates on investment. The key word is average.
Theorem. If $Z$ is a standardized sum of $n$ independent identically distributed random variables with a finite, nonzero standard deviation, then the probability distribution of $Z$ approaches the Gaussian distribution as $n$ increases.
The beauty of the theorem is that the conditions do not need to be taken
literally. For many applications, n need be only 20 or 30 for the approximation to work reasonably well. Also, the restriction that the data be independent
identically distributed random variables is not needed, although the sample size
at which the approximation works reasonably well increases as the heterogeneity
and dependence increase.
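A short simulation (my own sketch, using a heavily skewed exponential distribution chosen arbitrarily) shows how standardized sample means move toward the Gaussian shape as $n$ grows, as the theorem predicts.

import numpy as np

rng = np.random.default_rng(3)                       # arbitrary seed

def standardized_means(n, draws=50_000):
    # standardized averages of n i.i.d. exponential(1) random variables
    x = rng.exponential(scale=1.0, size=(draws, n))
    means = x.mean(axis=1)
    return (means - 1.0) / (1.0 / np.sqrt(n))        # exponential(1): mean 1, sd 1

for n in (2, 30):
    z = standardized_means(n)
    # share within one standard deviation; approaches about 0.68 (the Gaussian value) as n grows
    print(n, np.mean(np.abs(z) <= 1).round(3))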