Answer the following questions with proofs.

Probabilities

1. What are the axioms of probability (discrete case)? How do you justify them using frequencies?
2. In an urn there are 3 balls numbered 1 and one ball numbered 2.
   2.1. How do you construct the probability space?
   2.2. You are paid $1 if you pull out a ball numbered 2 and nothing if you pull out a ball numbered 1. What do you call the resulting variable? Find its mean and variance.
3. In the situation of the previous exercise you are given an opportunity to pull a ball from the urn twice. Construct the probability space and give examples of events for which you would use:
   3.1. the addition rule,
   3.2. the complement rule and
   3.3. the multiplication rule.
   3.4. Give the necessary definitions (disjoint, complementary and independent events).
4. Draw a cross-table for two discrete random variables (each having two values) and fill it with numbers that make sense.
   4.1. What do the probabilities in the main body of the table mean?
   4.2. What do the probabilities in the margins mean?
   4.3. What is the relation between the probabilities in the main body and in the margins? Why?
   4.4. How would you use the table to explain the notion of conditional probability?
   4.5. Explain intuitively the notion of independence of random variables. How is independence of variables defined in terms of the cross-table?
5. CDF and density
   5.1. Give a simple example motivating interest in cumulative probabilities.
   5.2. Define the CDF and write down its properties (monotonicity and the limits at $\pm\infty$).
   5.3. Define a density of a continuous random variable and write down its properties (total area under the graph and nonnegativity).
   5.4. What is the relationship between the CDF and the density?

Mean value and variance

6. Give the definition of the expected (mean) value in the discrete case.
   6.1. How would you justify it? How do you relate it to usual averages?
   6.2. Prove its linearity.
   6.3. What is the relationship between linearity, additivity and homogeneity of means?
   6.4. What is the expected value of a constant?
7. Give the definition of covariance, state and prove its properties:
   7.1. Alternative expression
   7.2. Linearity (with respect to the 1st argument when the 2nd is fixed and vice versa)
   7.3. Symmetry
   7.4. Covariance of a variable with a constant
   7.5. Do you know other objects in Math with similar properties?
8. Give the definition of variance and interpret it geometrically. Give an example of two data sets with the same ranges but different variances. State and prove the properties:
   8.1. Variance of a linear combination
   8.2. Homogeneity
   8.3. Additivity
   8.4. Alternative expression
   8.5. Variance of a variable does not change when a constant is added to that variable
   8.6. Nonnegativity (discrete case)
9. Give the definition of standard deviation, state and prove its properties:
   9.1. Homogeneity
   9.2. Nonnegativity
   9.3. Triangle inequality*
10. Give the definition of independence of two random variables, state and prove its consequences (discrete case):
    10.1. Multiplicativity of expected values
    10.2. If $X, Y$ are independent, what can be said about their covariance and correlation?
    10.3. If $X, Y$ are independent, what can be said about the variance of their sum?

Linear transformations of random variables

11. Scaling and centering
    11.1. If $Y = X - EX$, what is the mean of $Y$?
    11.2. If $X$ is a random variable such that $\sigma(X) \neq 0$, what is the variance of $Y = X/\sigma(X)$?
    11.3. If $X$ is a random variable such that $\sigma(X) \neq 0$, what are the mean and standard deviation of $Y = \dfrac{X - EX}{\sigma(X)}$?
12. In a table give formulas for the three main statistical characteristics of a population (population mean, variance and correlation) and of a sample (sample mean, variance and correlation), showing the relationship between them.
13. Give the definition and interpret statistically and geometrically the properties of correlation:
    13.1. What is the range of correlation?
    13.2. What happens if $\rho = 1$?
    13.3. What happens if $0 < \rho < 1$?
    13.4. What happens if $\rho = 0$?
    13.5. What happens if $\rho = -1$?
    13.6. What happens if $-1 < \rho < 0$?
14. Prove that the sample variance is an unbiased estimator of the population variance.

Special cases

15. Binomial distribution
    15.1. A fair coin is flipped three times. Construct the sample space and indicate the values of "number of successes" and "proportion of successes".
    15.2. Give the definition of the Bernoulli random variable and find its mean and variance.
    15.3. Give the definition of a binomial distribution, find its mean and variance.
    15.4. Give the definition of proportion of successes for the binomial distribution, find its mean and variance.
    15.5. What is the probability of $x$ successes in $n$ independent trials from the same Bernoulli population? Prove the formula.
    15.6. How is the CLT used for counting the number of successes for the binomial distribution?
16. Sums and sample means of i.i.d. variables
    16.1. Give the definition of identically distributed random variables. What can be said about their means and variances?
    16.2. What are the mean and variance of a sum $S$ of $n$ independent identically distributed variables?
    16.3. What are the mean and variance of a sample mean $\bar{X}$ of $n$ independent identically distributed variables?

Normal variable and its derivatives

17. Give the definition and properties of a standard normal variable
    17.1. Formula and graph of the density
    17.2. Mean value
    17.3. Variance and second moment*
    17.4. Fourth moment*
18. Give the definition and properties of a normal variable
    18.1. Mean value and variance
    18.2. How are the mean and variance reflected in geometrical properties of the density of a normal variable?
19. Give the definition and properties of $\chi^2$ variables
    19.1. Mean and variance of a chi-square variable with one d.o.f.
    19.2. Mean and variance of a chi-square variable with $n$ d.o.f.
20. Confidence intervals
    20.1. How do you find a confidence interval for $X \sim N(\mu, \sigma^2)$?
    20.2. How do you find a confidence interval for a sample mean if $\sigma^2$ is known?
    20.3. How do you find a confidence interval for a sample mean if $\sigma^2$ is unknown?
    20.4. How do you find a confidence interval for a sample standard deviation?
21. Portfolio theory
    21.1. What is the relationship between stock prices and stock returns?
    21.2. Prove that a return on a portfolio consisting of two stocks is a linear combination of returns on those stocks, the coefficients being the shares of the investment in the two stocks.
    21.3. What are the consequences for their means and standard deviations?
22. Give the motivation, definition and prove the properties of the uniform distribution:
    22.1. Density
    22.2. Cumulative distribution function
    22.3. Mean
    22.4. Variance
23. Give the definition and describe the properties of the t distribution.
24. Give the definition and describe the properties of the F distribution.

Characteristics of random vectors

25. How is expected value generalized for random vectors? State and prove generalizations of linearity, additivity and homogeneity.
26. How is covariance generalized for random vectors? State and prove generalizations of the alternative expression and linearity.
Do you think there is symmetry?
27. Define variance in the vector case. State and prove its properties, including nonnegativity.
28. Write out $V(X)$ in terms of its elements. Does it simplify when the mean of $X$ is zero?
29. Define a standard normal vector and derive its properties.
30. Define a normal vector and derive its properties.

Regression

31. Describe simple regression:
    31.1. Algebraic and probabilistic forms of dependence
    31.2. What is the minimal condition on the error term? Why is it imposed?
    31.3. Using an example, show how one equation leads to many equations. How are they written in matrix form?
    31.4. Graphically illustrate Gauss' idea
32. Derive the OLS estimator formula using the FOC.
33. What are the classical assumptions? Describe and comment on each.
34. Under the classical assumptions:
    34.1. What is the distribution of the dependent vector?
    34.2. What is the distribution of the OLS estimator?
35. How do you test the estimator of one regression coefficient for significance?

Matrix algebra

36. List properties of real numbers related to addition and indicate how they generalize to matrices, with all the necessary details:
    36.1. Commutativity
    36.2. Associativity
    36.3. Existence of 0
    36.4. Existence of an opposite element
37. Does commutativity apply to multiplication of matrices? Explain.
38. Explain the definition of a product of two matrices. What is the reason for such a weird definition? Does it possess associativity? Give an example of an application of associativity.
39. Show that a nonzero square matrix may not have an inverse. What matrices have inverses?
40. Give the rules for calculating determinants of matrices of the 2nd and 3rd orders. Give the general properties of determinants that you know.
41. What properties of inverse matrices do you know? Which of them can you prove? When you don't know the proof, indicate an application. What can you say about the inverses of the null and identity matrices? Generalize the product inversion formula to the case of $n$ factors.
42. What properties of transposed matrices do you know? Which of them can you prove? When you don't know the proof, indicate an application.
43. In a table list the matrix operations you know; for each of them indicate the restrictions on the arguments and the dimension of the result.
44. What is a symmetric matrix? What properties does it have?
45. Explain the definition of order in the set of matrices. How would you check positive definiteness of a matrix? Can you indicate two matrices that are not comparable?
46. Check that for any matrix $X$ the matrix $A = X'X$ exists and is symmetric. Besides, if $\det A \neq 0$, then the inverse of $A$ is also symmetric.
47. For a column vector $x$ find the products $x'x$ and $xx'$.
48. Prove that if $A \geq B$, then all diagonal elements of $A$ are greater than or equal to those of $B$.
49. If $\det X'X \neq 0$, then the matrix $P = X(X'X)^{-1}X'$ is symmetric and satisfies $P^2 = P$. Prove this.
50. Calculate the expression for $x'Ax$ where $x$ is a 2-D vector and $A$ is of size $2 \times 2$.
51. Derive the rule for vector differentiation of the linear form $r'x$.
52. Derive the rule for vector differentiation of the quadratic form $x'Ax$.
53. Prove that if the matrix $X$ is of size $n \times m$ and $Y$ is of size $n \times 1$, then $Y'XX'Y$ is a nonnegative scalar.

Answers

1. Probabilities
   1.1. Axioms:
        1.1.1. $0 \leq p_i \leq 1$ for all $i$ (intuitively, probabilities represent proportions or percentages of occurrences of events).
        1.1.2. Completeness: $p_1 + \dots + p_n = 1$ (intuitively, if this sum is less than 1, then the sample space does not include all possible outcomes).
   1.2. Addition rule: in general, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, because otherwise the intersection would be counted twice. For disjoint events $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$. Demonstrate on Venn diagrams.
   1.3. If $\Omega$ denotes the sample space, then the complement of $A$ contains all points of the sample space that don't belong to $A$: $A^c = \Omega \setminus A$. Complement rule: $P(A^c) = 1 - P(A)$. Proof: $A \cup A^c = \Omega$ and $A \cap A^c = \emptyset$, so by completeness and additivity $P(A) + P(A^c) = P(\Omega) = 1$, whence $P(A^c) = 1 - P(A)$. Demonstrate on Venn diagrams.
   1.4. Definition: the conditional probability of $A$ given $B$ (such that $P(B) \neq 0$) is defined by $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$. It implies the multiplication rule: $P(A \cap B) = P(A \mid B)\,P(B)$. For independent events the multiplication rule reduces to $P(A \cap B) = P(A)\,P(B)$, so that $P(A \mid B) = P(A)$ (intuitively, knowledge of $B$ does not affect the probability of $A$).
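For the frequency justification of the axioms in answer 1.1, here is a minimal Python sketch (the urn setup matches question 2; the variable names and trial count are mine). The relative frequencies each land in [0, 1] and sum to 1, mirroring the two axioms:

```python
import random

# Urn from question 2: three balls numbered 1 and one ball numbered 2.
urn = [1, 1, 1, 2]
trials = 100_000

counts = {1: 0, 2: 0}
for _ in range(trials):
    counts[random.choice(urn)] += 1

# Relative frequencies approximate the probabilities p_1 = 3/4, p_2 = 1/4.
freqs = {ball: n / trials for ball, n in counts.items()}
print(freqs)                # each frequency lies in [0, 1]
print(sum(freqs.values()))  # frequencies sum to 1 (completeness)
```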
2. Definition: $EX = x_1 p_1 + \dots + x_n p_n$ in the discrete case and $EX = \int x\,p(x)\,dx$ in the continuous case (intuitively, each value's weight in the sum is determined by its probability).
   2.1. Linearity: $E(aX + bY) = a\,EX + b\,EY$. Proof: $E(aX + bY) = (ax_1 + by_1)p_1 + \dots + (ax_n + by_n)p_n = a(x_1 p_1 + \dots + x_n p_n) + b(y_1 p_1 + \dots + y_n p_n) = a\,EX + b\,EY$.
   2.2. Additivity: $E(X + Y) = EX + EY$. Proof: put $a = b = 1$ in the linearity property.
   2.3. Homogeneity: $E(aX) = a\,EX$. Proof: put $b = 0$ in the linearity property.
   2.4. $Ec = cp_1 + \dots + cp_n = c(p_1 + \dots + p_n) = c$ by the completeness axiom.
3. Definition: $\operatorname{cov}(X, Y) = E(X - EX)(Y - EY)$.
   3.1. Alternative expression: $\operatorname{cov}(X, Y) = EXY - EX\,EY$. Proof: $\operatorname{cov}(X, Y) = E(XY - X\,EY - (EX)Y + EX\,EY) = EXY - EX\,EY - EX\,EY + EX\,EY = EXY - EX\,EY$.
   3.2. Linearity with respect to the 1st argument (when the 2nd is fixed): $\operatorname{cov}(aX + bY, Z) = a\operatorname{cov}(X, Z) + b\operatorname{cov}(Y, Z)$. Proof: by the alternative expression, $\operatorname{cov}(aX + bY, Z) = E(aX + bY)Z - E(aX + bY)\,EZ = aEXZ + bEYZ - (aEX + bEY)EZ = a(EXZ - EX\,EZ) + b(EYZ - EY\,EZ) = a\operatorname{cov}(X, Z) + b\operatorname{cov}(Y, Z)$. Linearity with respect to the 2nd argument (when the 1st is fixed): $\operatorname{cov}(X, aY + bZ) = a\operatorname{cov}(X, Y) + b\operatorname{cov}(X, Z)$. The proof is similar.
   3.3. Symmetry: the covariance doesn't change when the arguments change places: $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$. Proof: $\operatorname{cov}(X, Y) = E(X - EX)(Y - EY) = E(Y - EY)(X - EX) = \operatorname{cov}(Y, X)$.
   3.4. The covariance of any random variable with a constant is zero: $\operatorname{cov}(X, c) = E(X - EX)(c - Ec) = 0$.
4. Definition: variance is the expected value of the squared deviations from the mean or, alternatively, the covariance of the variable with itself: $V(X) = E(X - EX)^2 = \operatorname{cov}(X, X)$.
   4.1. Variance of a linear combination: $V(aX + bY) = a^2 V(X) + 2ab\operatorname{cov}(X, Y) + b^2 V(Y)$ (a numerical check appears after item 5 below). Proof: $V(aX + bY) = E(aX + bY)^2 - [E(aX + bY)]^2 = E(a^2 X^2 + 2abXY + b^2 Y^2) - [a^2 (EX)^2 + 2ab\,EX\,EY + b^2 (EY)^2] = a^2 [EX^2 - (EX)^2] + 2ab[EXY - EX\,EY] + b^2 [EY^2 - (EY)^2] = a^2 V(X) + 2ab\operatorname{cov}(X, Y) + b^2 V(Y)$.
   4.2. Homogeneity: $V(aX) = a^2 V(X)$. Proof: put $b = 0$ in the previous property.
   4.3. Additivity: if $\operatorname{cov}(X, Y) = 0$, then taking $a = b = 1$ gives $V(X + Y) = V(X) + V(Y)$. In particular, this is true if $X, Y$ are independent.
   4.4. Alternative expression: $V(X) = \operatorname{cov}(X, X) = EX^2 - (EX)^2$.
   4.5. The variance of a variable does not change when a constant is added to that variable: $V(X + c) = E(X + c - E(X + c))^2 = E(X + c - EX - c)^2 = V(X)$.
   4.6. Nonnegativity: $V(X) \geq 0$ because the squared deviations from the mean $(x_i - EX)^2$ are nonnegative and the probabilities are nonnegative, so $V(X) = E(X - EX)^2 = p_1 (x_1 - EX)^2 + \dots + p_n (x_n - EX)^2 \geq 0$.
5. Definition: $\sigma(X) = \sqrt{V(X)}$ is called the standard deviation.
   5.1. Homogeneity: $\sigma(aX) = \sqrt{V(aX)} = \sqrt{a^2 V(X)} = |a|\sqrt{V(X)} = |a|\,\sigma(X)$.
   5.2. Nonnegativity: $\sigma(X) \geq 0$ because an arithmetic square root is applied.
   5.3. Triangle inequality*: $\sigma(X + Y) \leq \sigma(X) + \sigma(Y)$. Proof: since the correlation satisfies $|\rho| \leq 1$ (item 16), we have $|\operatorname{cov}(X, Y)| \leq \sigma(X)\sigma(Y)$. Hence $V(X + Y) = V(X) + 2\operatorname{cov}(X, Y) + V(Y) \leq \sigma^2(X) + 2\sigma(X)\sigma(Y) + \sigma^2(Y) = (\sigma(X) + \sigma(Y))^2$, and taking square roots gives $\sigma(X + Y) \leq \sigma(X) + \sigma(Y)$.
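The identity in answer 4.1 lends itself to a quick numerical check. A sketch assuming NumPy is available; the simulated data and the coefficients $a, b$ are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated samples standing in for X and Y.
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)
a, b = 2.0, -3.0

# Left side: variance of the linear combination computed directly.
lhs = np.var(a * x + b * y)
# Right side: a^2 V(X) + 2ab cov(X, Y) + b^2 V(Y), as in answer 4.1.
# bias=True makes np.cov use the same 1/n normalization as np.var.
rhs = (a**2 * np.var(x)
       + 2 * a * b * np.cov(x, y, bias=True)[0, 1]
       + b**2 * np.var(y))
print(lhs, rhs)  # the two numbers agree up to floating-point error
```

Because the identity holds for the empirical moments themselves, the agreement is exact up to rounding, not merely approximate.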
6. Scaling and centering
   6.1. If $Y = X - EX$, then $EY = EX - EX = 0$.
   6.2. If $Y = X/\sigma(X)$, then $V(Y) = \frac{1}{\sigma^2(X)} V(X) = 1$.
   6.3. If $Y = \frac{X - EX}{\sigma(X)}$, then $EY = \frac{1}{\sigma(X)}(EX - EX) = 0$ and $V(Y) = V\left(\frac{X - EX}{\sigma(X)}\right) = \frac{1}{\sigma^2(X)} V(X - EX) = \frac{V(X)}{\sigma^2(X)} = 1$.
7. Fill the table with numbers that make sense, for example:

               X = 1   X = 2   X = 3   total
      Y = 0     0.1     0.2     0.1     0.4
      Y = 1     0.2     0.3     0.1     0.6
      total     0.3     0.5     0.2     1

   7.1. The probabilities in the main body of the table are the probabilities of the joint events $(X = 1, Y = 0), \dots, (X = 3, Y = 1)$.
   7.2. The probabilities in the lower margin are the probabilities of the values of $X$; those in the right margin are the probabilities of the values of $Y$.
   7.3. Sums of probabilities across rows give the probabilities in the right margin; sums across columns give the probabilities in the lower margin.
8. Definition of independence of two random variables: $X$ takes values $x_1, \dots, x_n$ with probabilities $p_1 = P(X = x_1), \dots, p_n = P(X = x_n)$; $Y$ takes values $y_1, \dots, y_m$ with probabilities $q_1 = P(Y = y_1), \dots, q_m = P(Y = y_m)$; the pair $(X, Y)$ takes values $(x_1, y_1), \dots, (x_n, y_m)$ (joint events) with probabilities $p_{ij} = P(X = x_i, Y = y_j)$. $X, Y$ are called independent if the probabilities of all joint events are equal to the products of the variables' own probabilities: $p_{ij} = p_i q_j$ for all $i, j$. Demonstrate in a cross-table taking $n = 2$, $m = 3$.
   8.1. Multiplicativity of expected values: if $X, Y$ are independent, then $EXY = EX\,EY$. Proof: $EXY = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j p_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j p_i q_j = \left(\sum_{i=1}^{n} x_i p_i\right)\left(\sum_{j=1}^{m} y_j q_j\right) = EX\,EY$.
   8.2. If $X, Y$ are independent, then their covariance and correlation are zero. Proof: $\operatorname{cov}(X, Y) = EXY - EX\,EY = EX\,EY - EX\,EY = 0$, and $\rho = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\sigma(Y)} = 0$.
   8.3. Additivity: if $X, Y$ are independent, then the variance of their sum equals the sum of their variances: $V(X + Y) = V(X) + V(Y)$.
9. Sums and sample means of i.i.d. variables
   9.1. Definition of identically distributed random variables (discrete case): $X_1, \dots, X_n$ are called identically distributed if they take the same values with the same probabilities:

               X_1   ...   X_n   probability
      values   x_1   ...   x_1   p_1
               ...   ...   ...   ...
               x_m   ...   x_m   p_m

   This happens when the sample is drawn from the same population. Therefore their means and variances are equal to the mean $\mu$ and variance $\sigma^2$, respectively, of the population.
   9.2. If $S = X_1 + \dots + X_n$, then by linearity $ES = EX_1 + \dots + EX_n = n\mu$, and by independence $V(S) = V(X_1) + \dots + V(X_n) = n\sigma^2$, where $\mu$ and $\sigma^2$ are the mean and variance of the population.
   9.3. For the sample mean $\bar{X} = \frac{X_1 + \dots + X_n}{n} = \frac{S}{n}$ we have $E\bar{X} = \frac{ES}{n} = \mu$ and $V(\bar{X}) = \frac{V(S)}{n^2} = \frac{\sigma^2}{n}$.
10. Binomial distribution
    10.1. Definition. A discrete random variable is completely defined by the table of its values and corresponding probabilities. For the Bernoulli variable:

          X      1   0
          X^2    1   0
          prob   p   q = 1 - p

    $EX = 1 \cdot p + 0 \cdot q = p$, and $V(X) = EX^2 - (EX)^2 = 1 \cdot p + 0 \cdot q - p^2 = p(1 - p)$.
    10.2. Definition. $n$ independent trials from the same Bernoulli population give $X_1, \dots, X_n$, which are all distributed as $X$ in the above table. The sum $S = X_1 + \dots + X_n$ is said to have a binomial distribution. $ES = np$, $V(S) = np(1 - p)$.
    10.3. The sample mean $\bar{X} = \frac{X_1 + \dots + X_n}{n} = \frac{S}{n}$ in this case is the proportion of successes, because $S$ may take values from 0 to $n$ and the sample mean may take values from 0 to 1. $E\bar{X} = \frac{np}{n} = p$, $V(\bar{X}) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$.
    10.4. Since the trials are independent, the probability of one particular combination of $x$ ones and $n - x$ zeros is $p^x q^{n-x}$ by the multiplication rule. Since the total number of such combinations is $C_n^x$ and they cannot occur at the same time, by the addition rule $P(S = x) = C_n^x p^x q^{n-x}$.
    10.5. Theorem. For large $n$ the binomial distribution is approximately normal: since $ES = np$ and $V(S) = np(1 - p)$, by the CLT $P(S \leq x \mid n, p) \approx P(X \leq x)$, where $X \sim N(np,\, np(1 - p))$.
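Answers 10.1 to 10.4 can be confirmed by simulation. A standard-library-only Python sketch; the parameters n, p and the trial count are illustrative choices of mine:

```python
import math
import random

n, p = 10, 0.3
trials = 200_000

# Simulate S = X_1 + ... + X_n, a sum of n Bernoulli(p) variables.
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, n * p)            # ES   = np
print(var, n * p * (1 - p))   # V(S) = np(1-p)

# Exact formula from answer 10.4: P(S = x) = C(n, x) p^x q^(n-x).
x = 3
exact = math.comb(n, x) * p**x * (1 - p) ** (n - x)
print(samples.count(x) / trials, exact)
```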
11. The three main statistical characteristics for a population and a sample:

    Characteristic       Population                                            Sample
    Mean                 $EX = x_1 p_1 + \dots + x_n p_n$                      $\bar{X} = \frac{X_1 + \dots + X_n}{n}$; obtained by taking $p_1 = \dots = p_n = 1/n$
    Variance             $V(X) = p_1 (x_1 - EX)^2 + \dots + p_n (x_n - EX)^2$  $s^2 = \frac{(X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$; obtained with $\bar{X}$ instead of $EX$ and $1/(n-1)$ instead of $p_1, \dots, p_n$
    Std. deviation       $\sigma(X) = \sqrt{V(X)}$                             $s = \sqrt{s^2}$

12. Definition: a standard normal variable $z$ is a continuous random variable with values in $\mathbb{R}$ and density $p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$.
    12.1. Symmetry: $p(-x) = p(x)$ (show on the graph); limits at infinity: $\lim_{x \to \pm\infty} p(x) = 0$.
    12.2. Mean value: $Ez = 0$ because each positive value is balanced by a negative value with the same probability.
    12.3. Theorem. $Ez^2 = 1$ (second moment). Hence $V(z) = Ez^2 - (Ez)^2 = 1$.
    12.4. Theorem. $Ez^4 = 3$ (fourth moment).
13. Definition: a linear function $X = az + b$ of a standard normal variable is called a normal variable.
    13.1. $EX = aEz + b = b$, and $V(X) = V(az) = a^2 V(z) = a^2$.
    13.2. Positive $b$ moves the mean to the right; negative $b$ moves the mean to the left. The smaller $a^2$, the higher the concentration of $X$ around its mean (show on the graph).
14. Definition: if $u_1, \dots, u_n$ are standard normal and independent, then the sum of squares $\chi^2_n = u_1^2 + \dots + u_n^2$ is called a $\chi^2$ variable with $n$ degrees of freedom.
    14.1. $E\chi^2_1 = Eu^2 = 1$, and $V(\chi^2_1) = E[(\chi^2_1)^2] - (E\chi^2_1)^2 = Eu^4 - 1 = 3 - 1 = 2$.
    14.2. $E\chi^2_n = E(u_1^2 + \dots + u_n^2) = n$, and $V(\chi^2_n) = V(u_1^2) + \dots + V(u_n^2) = 2n$.
    14.3. Theorem. $\frac{(n-1)s^2}{\sigma^2}$ is distributed as chi-square with $n - 1$ d.o.f. Consequently $E\frac{(n-1)s^2}{\sigma^2} = E\chi^2_{n-1} = n - 1$, so $Es^2 = \sigma^2$.
15. Confidence intervals
    15.1. If $X \sim N(\mu, \sigma^2)$, then $z = \frac{X - \mu}{\sigma}$ is standard normal and therefore $P(a \leq X \leq b) = P\left(\frac{a - \mu}{\sigma} \leq z \leq \frac{b - \mu}{\sigma}\right)$. If $\alpha$ is given, $a$ and $b$ can be found from $P\left(z \leq \frac{a - \mu}{\sigma}\right) = P\left(z \geq \frac{b - \mu}{\sigma}\right) = \alpha/2$ using statistical tables. Then $P\left(\frac{a - \mu}{\sigma} \leq z \leq \frac{b - \mu}{\sigma}\right) = 1 - \alpha$. Show all areas on the graph.
    15.2. The population distribution may not be normal, but the Central Limit Theorem states that for large $n$ the sample mean will be almost normal. $E\bar{X} = \mu$ and $V(\bar{X}) = \frac{\sigma^2}{n}$, so by the CLT $P(a \leq \bar{X} \leq b) \approx P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \leq z \leq \frac{b - \mu}{\sigma/\sqrt{n}}\right)$.
    15.3. Idea: replace $\sigma$ by its unbiased estimator $s$. Theorem. $t_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed as Student's t with $n - 1$ d.o.f. The resulting interval is $P(a \leq \bar{X} \leq b) = P\left(\frac{a - \mu}{s/\sqrt{n}} \leq t_{n-1} \leq \frac{b - \mu}{s/\sqrt{n}}\right)$ (see the sketch after item 16).
    15.4. $P(a \leq s^2 \leq b) = P\left(\frac{(n-1)a}{\sigma^2} \leq \chi^2_{n-1} \leq \frac{(n-1)b}{\sigma^2}\right)$.
16. Definition: for random variables with nonzero standard deviations, $\rho = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\sigma(Y)}$. Geometrically, $\rho$ is the cosine of the angle between $X - EX$ and $Y - EY$. Statistically, $\rho$ is a measure of co-movement of the variables $X, Y$ (the book says it is a measure of the linear relationship between them, which is actually true only in the extreme cases $\rho = \pm 1$).
    16.1. The range of correlation is $[-1, 1]$ because it is a cosine.
    16.2. If $\rho = 1$, then the angle between $X - EX$ and $Y - EY$ is 0, so $Y - EY = a(X - EX)$ with some $a > 0$. Denoting $b = EY - aEX$ and rearranging, we see that $Y$ is a linear function of $X$ with a positive slope: $Y = aX + b$. Positive increases in $X$ are exactly matched by positive increases in $Y$.
    16.3. If $0 < \rho < 1$, then, on average, positive increases in $X$ are associated with positive increases in $Y$. The angle between $X - EX$ and $Y - EY$ belongs to $(0, \pi/2)$.
    16.4. If $\rho = 0$, then the angle between $X - EX$ and $Y - EY$ is $\pi/2$. On average, there is no association in the movement of $X$ and $Y$.
    16.5. If $\rho = -1$, then the angle between $X - EX$ and $Y - EY$ is $\pi$, so $Y - EY = a(X - EX)$ with some $a < 0$. Denoting $b = EY - aEX$ and rearranging, we see that $Y$ is a linear function of $X$ with a negative slope: $Y = aX + b$. Positive increases in $X$ are exactly matched by negative changes in $Y$.
    16.6. If $-1 < \rho < 0$, then, on average, positive increases in $X$ are associated with negative changes in $Y$. The angle between $X - EX$ and $Y - EY$ belongs to $(\pi/2, \pi)$.
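Answer 15.3 translates directly into a computation. A sketch assuming SciPy is installed; the sample itself is simulated and the confidence level is an illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=30)  # illustrative sample
alpha = 0.05

n = len(data)
xbar = data.mean()
s = data.std(ddof=1)  # unbiased estimator of sigma (answer 14.3)

# Answer 15.3: (xbar - mu) / (s / sqrt(n)) ~ t with n-1 d.o.f.
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_crit * s / np.sqrt(n)
print(f"{1 - alpha:.0%} CI for the mean: "
      f"[{xbar - half:.3f}, {xbar + half:.3f}]")
```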
17. Definition: a regression is an equation of the form $Y = aX + b + e$, where $Y, X, e$ are random variables; $Y, X$ are observed (can be measured), $e$ is unobservable, and the coefficients $a, b$ are unknown. $e$ is called an error. Regression is an approximate linear relationship between $Y$ and $X$.
    17.1. Main condition on the error term: $Ee = 0$. Because of this condition, the regression model is correct on average: $EY = aEX + b$.
    17.2. Technical condition on the error term: $\operatorname{cov}(X, e) = 0$. It simplifies the derivation of estimators for the coefficients: $\operatorname{cov}(Y, X) = \operatorname{cov}(aX + b + e, X) = a\operatorname{cov}(X, X) + \operatorname{cov}(b, X) + \operatorname{cov}(e, X) = aV(X)$, so $\hat{a} = \frac{\operatorname{cov}(Y, X)}{V(X)} = \rho\,\frac{\sigma(Y)}{\sigma(X)}$ and $\hat{b} = EY - \hat{a}\,EX$ (see the sketch at the end of this section).
    17.3. Since the mean of the error is 0, for prediction purposes the error is assumed to be 0, and the estimated coefficients are plugged into the equation: $Y_{\text{predicted}} = \hat{a}X + \hat{b}$.
18. If $P_0$ denotes the stock price at the beginning of the period and $P_1$ denotes its price at the end of the period, then the rate of return for the period is defined by $r = \frac{P_1 - P_0}{P_0}$, so that $P_1 = (1 + r)P_0$. $P_0$ is assumed non-stochastic (usually it is the current price); the future price $P_1$ is unknown and random. $EP_1 = (1 + Er)P_0$ and $V(P_1) = V(rP_0) = P_0^2 V(r)$.
19. Let $P_A, P_B$ denote the prices of stocks $A, B$. The portfolio consists of $n_A$ shares of $A$ and $n_B$ shares of $B$, so the value of the portfolio is $M = n_A P_A + n_B P_B$. The numbers of shares are assumed fixed and the prices random. Then $EM = n_A EP_A + n_B EP_B$ and $V(M) = n_A^2 V(P_A) + n_B^2 V(P_B) + 2 n_A n_B \operatorname{cov}(P_A, P_B)$.
20. Definition: a uniform distribution $U$ is a continuous random variable such that (a) the probability of $U$ taking values outside some interval $(a, b)$ is 0, and (b) any value inside that interval is equally likely.
    20.1. The density of $U$ is identically equal to $\frac{1}{b - a}$ inside $(a, b)$ and is 0 outside that interval.
    20.2. The cumulative distribution function $F(x)$ of $U$ is zero for $x \leq a$, is the linear function $F(x) = \frac{x - a}{b - a}$ for $a \leq x \leq b$, and is identically 1 after that.
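The moment estimators $\hat{a} = \operatorname{cov}(Y, X)/V(X)$ and $\hat{b} = EY - \hat{a}\,EX$ from answer 17.2 can be tried on simulated data. A NumPy sketch with coefficients of my own choosing; the estimates should land near the true values:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data satisfying Y = aX + b + e with Ee = 0 and cov(X, e) = 0.
a_true, b_true = 1.5, 4.0
x = rng.normal(size=10_000)
e = rng.normal(scale=0.5, size=10_000)
y = a_true * x + b_true + e

# Moment estimators from answer 17.2.
a_hat = np.cov(y, x, bias=True)[0, 1] / np.var(x)
b_hat = y.mean() - a_hat * x.mean()
print(a_hat, b_hat)  # close to the true coefficients 1.5 and 4.0
```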