Answer the following questions with proofs
Probabilities
1. What are the axioms of probability (discrete case)? How do you justify them using
frequencies?
2. In an urn there are 3 balls numbered 1 and one ball numbered 2.
2.1. How do you construct the probability space?
2.2. You are paid $1 if you pull out a ball numbered 2 and nothing if you pull out
a ball numbered 1. What do you call the resulting variable? Find its mean
and variance
3. In the situation of the previous exercise you are given an opportunity to pull a ball
from the urn twice. Construct the probability space and give examples of events
for which you would use:
3.1. the addition rule,
3.2. the complement rule and
3.3. the multiplication rule.
3.4. Give the necessary definitions (disjoint, complementary and independent
events)
4. Draw a cross-table for two discrete random variables (each having two values)
and fill it with numbers that make sense
4.1. What do the probabilities in the main body of the table mean?
4.2. What do the probabilities in the margins mean?
4.3. What is the relation between the probabilities in the main body and margins?
Why?
4.4. How would you use the table to explain the notion of conditional probability?
4.5. Explain intuitively the notion of independence of random variables. How is
independence of variables defined in terms of the cross-table?
5. CDF and density
5.1. Give a simple example motivating interest in cumulative probabilities
5.2. Define the CDF and write down its properties (monotonicity and the limits at ±∞).
5.3. Define a density of a continuous random variable and write down its
properties (total area under the graph and nonnegativity).
5.4. What is the relationship between CDF and density?
Mean value and variance
6. Give the definition of the expected (mean) value in the discrete case
6.1. How would you justify it? How do you relate it to usual averages?
6.2. Prove its linearity
6.3. What is the relationship between linearity, additivity and homogeneity of
means?
6.4. What is the expected value of a constant?
7. Give the definition of covariance, state and prove its properties:
7.1. Alternative expression
7.2. Linearity (with respect to the 1st argument when the 2nd is fixed and vice
versa)
7.3. Symmetry
7.4. Covariance of a variable with a constant
7.5. Do you know other objects in Math with similar properties?
8. Give the definition of variance and interpret it geometrically. Give an example of
two data sets with the same ranges but different variances. State and prove
properties:
8.1. Variance of a linear combination
8.2. Homogeneity
8.3. Additivity
8.4. Alternative expression
8.5. Variance of a variable does not change when a constant is added to that
variable
8.6. Nonnegativity (discrete case)
9. Give the definition of standard deviation, state and prove its properties:
9.1. Homogeneity
9.2. Nonnegativity
9.3. Triangle inequality*
10. Give the definition of independence of two random variables, state and prove its
consequences (discrete case):
10.1. Multiplicativity of expected values
10.2. If X , Y are independent, what can be said about their covariance and
correlation?
10.3. If X , Y are independent, what can be said about the variance of their sum?
Linear transformations of random variables
11. Scaling and centering
11.1. If Y = X − EX, what is the mean of Y?
11.2. If X is a random variable such that σ(X) ≠ 0, what is the variance of
Y = X / σ(X)?
11.3. If X is a random variable such that σ(X) ≠ 0, what are the mean and
standard deviation of Y = (X − EX) / σ(X)?
12. In a table give formulas for the three main statistical characteristics of a
population (population mean, variance and correlation) and sample (sample mean,
variance and correlation) showing the relationship between them
13. Give the definition and interpret statistically and geometrically the properties of
correlation:
13.1. What is the range of correlation?
13.2. What happens if ρ = 1?
13.3. What happens if 0 < ρ < 1?
13.4. What happens if ρ = 0?
13.5. What happens if ρ = −1?
13.6. What happens if −1 < ρ < 0?
14. Prove that the sample variance is an unbiased estimator of the population variance
Special cases
15. Binomial distribution
15.1. A fair coin is flipped three times. Construct the sample space and indicate
the values of “number of successes” and “proportion of successes”
15.2. Give the definition of the Bernoulli random variable and find its mean and
variance
15.3. Give the definition of a binomial distribution, find its mean and variance
15.4. Give the definition of proportion of successes for the binomial distribution,
find its mean and variance
15.5. What is the probability of x successes in n independent trials from the
same Bernoulli population? Prove the formula
15.6. How is the CLT used for counting the number of successes for the binomial
distribution?
16. Sums and sample means of i.i.d. variables
16.1. Give the definition of identically distributed random variables. What can be
said about their means and variances?
16.2. What are the mean and variance of a sum S of n independent identically
distributed variables?
16.3. What are the mean and variance of a sample mean X of n independent
identically distributed variables?
Normal variable and its derivatives
17. Give the definition and properties of a standard normal variable
17.1. Formula and graph of the density
17.2. Mean value
17.3. Variance and second moment*
17.4. Fourth moment*
18. Give the definition and properties of a normal variable
18.1. Mean value and variance
18.2. How are the mean and variance reflected in geometrical properties of the
density of a normal variable?
19. Give the definition and properties of χ² variables
19.1. Mean and variance of a chi-square variable with one d.o.f.
19.2. Mean and variance of a chi-square variable with n d.o.f.
20. Confidence intervals
20.1. How do you find a confidence interval for X ~ N(μ, σ²)?
20.2. How do you find a confidence interval for a sample mean if σ² is known?
20.3. How do you find a confidence interval for a sample mean if σ² is unknown?
20.4. How do you find a confidence interval for a sample standard deviation?
21. Portfolio theory
21.1. What is the relationship between stock prices and stock returns?
21.2. Prove that a return on a portfolio consisting of two stocks is a linear
combination of returns on those stocks, the coefficients being the shares of
the investment in the two stocks.
21.3. What are the consequences for their means and standard deviations?
22. Give the motivation, definition and prove the properties of uniform distribution:
22.1. Density
22.2. Cumulative distribution function
22.3. Mean
22.4. Variance
23. Give the definition and describe the properties of the t distribution
24. Give the definition and describe the properties of the F distribution
Characteristics of random vectors
25. How is expected value generalized for random vectors? State and prove
generalizations of linearity, additivity and homogeneity.
26. How is covariance generalized for random vectors? State and prove
generalizations of the alternative expression and linearity. Do you think there is
symmetry?
27. Define variance in the vector case. State and prove its properties, including nonnegativity
28. Write out V ( X ) in terms of its elements. Does it simplify when the mean of X is
zero?
29. Define a standard normal vector and derive its properties
30. Define a normal vector and derive its properties
Regression
31. Describe simple regression:
31.1. Algebraic and probabilistic forms of dependence
31.2. What is the minimal condition on the error term? Why is it imposed?
31.3. Using an example show how one equation leads to many equations. How are
they written in a matrix form?
31.4. Graphically illustrate Gauss’ idea
32. Derive the OLS estimator formula using FOC
33. What are the classical assumptions? Describe and comment each
34. Under classical assumptions:
34.1. What is the distribution of the dependent vector?
34.2. What is the distribution of the OLS estimator?
35. How do you test for significance the estimator of one regression coefficient?
Matrix algebra
36. List properties of real numbers related to addition and indicate how they
generalize to matrices, with all the necessary details
36.1. Commutativity
36.2. Associativity
36.3. Existence of 0
36.4. Existence of an opposite element
37. Does commutativity apply to multiplication of matrices? Explain
38. Explain the definition of a product of two matrices. What is the reason for such a
weird definition? Does it possess associativity? Give an example of an application of
associativity
39. Show that a nonzero square matrix may not have an inverse. What matrices have
inverses?
40. Give the rules for calculating determinants of matrices of the 2nd and 3rd orders.
Give general properties of determinants that you know
41. What properties of inverse matrices do you know? Which of them can you prove?
When you don’t know the proof, indicate an application. What can you say about
the inverses of the null and identity matrices? Generalize the product inversion
formula to the case of n factors.
42. What properties of transposed matrices do you know? Which of them can you
prove? When you don’t know the proof, indicate an application
43. In a table list matrix operations you know, indicate for each of them the
restrictions on the arguments and the dimension of the result
44. What is a symmetric matrix? What properties does it have?
45. Explain the definition of order in the set of matrices. How would you check
positive definiteness of a matrix? Can you indicate two matrices that are not
comparable?
46. Check that for any matrix X the matrix A = X′X exists and is symmetric. Besides,
if det A ≠ 0, then the inverse of A is also symmetric.
47. For a column-vector x find the products x′x and xx′.
48. Prove that if A ≥ B, then all diagonal elements of A are greater than or equal to
those of B.
49. If det(X′X) ≠ 0, then the matrix P = X(X′X)⁻¹X′ is symmetric and satisfies
P² = P. Prove this
50. Calculate the expression for x′Ax where x is a 2-D vector and A is of size 2×2
51. Derive the rule for vector differentiation ∂(r′λ)/∂λ
52. Derive the rule for vector differentiation ∂(λ′Aλ)/∂λ
53. Prove that if the matrix X is of size n×m and Y is of size n×1, then Y′XX′Y is a
nonnegative scalar
Answers
1. Probabilities
1.1. Axioms:
1.1.1. 0 ≤ pi ≤ 1 for all i (intuitively, they represent proportions or
percentages of occurrences of events)
1.1.2. Completeness: p1 + ... + pn = 1 (intuitively, if this sum is less than 1,
then the sample space does not include all possible outcomes)
1.2. In general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) because the intersection is
counted twice. For disjoint events P(A ∩ B) = 0 and
P(A ∪ B) = P(A) + P(B). Demonstrate on Venn diagrams
1.3. If Ω denotes the sample space, then the complement of A contains all
points in the sample space that don't belong to A: Ac = Ω \ A.
Complement rule: P(Ac) = 1 − P(A). Proof: A ∪ Ac = Ω, A ∩ Ac = ∅. By
completeness and additivity P(A) + P(Ac) = P(Ω) = 1, so P(Ac) = 1 − P(A).
Demonstrate on Venn diagrams
1.4. Definition: the conditional probability of A given B (such that P(B) ≠ 0) is
defined by P(A | B) = P(A ∩ B) / P(B). It implies the multiplication rule:
P(A ∩ B) = P(A | B) P(B). For independent events the multiplication rule
reduces to P(A ∩ B) = P(A) P(B), so P(A | B) = P(A) (intuitively,
knowledge of B does not affect the probability of A).
2. Definition: EX = x1 p1 + ... + xn pn in the discrete case and EX = ∫ x p(x) dx,
the integral being taken over (−∞, +∞), in the continuous case (intuitively, each
value's importance in the sum is determined by its probability)
2.1. Linearity: E(aX + bY) = aEX + bEY. Proof (here xi and yi denote the values
taken by X and Y on the i-th outcome of the sample space):
E(aX + bY) = (ax1 + by1)p1 + ... + (axn + byn)pn
= ax1 p1 + ... + axn pn + by1 p1 + ... + byn pn = aEX + bEY
2.2. Additivity: E(X + Y) = EX + EY. Proof: put a = b = 1 in the linearity
property
2.3. Homogeneity: E(aX) = aEX. Proof: put b = 0 in the linearity property
2.4. Ec = cp1 + ... + cpn = c(p1 + ... + pn) = c by the completeness axiom
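As a quick numerical illustration of the discrete definition and of linearity, the sketch below (with made-up values and probabilities) computes both sides of E(aX + bY) = aEX + bEY:

```python
import numpy as np

# Hypothetical discrete distribution: values of X and Y on the same outcomes,
# with outcome probabilities p (they sum to 1 by the completeness axiom).
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0])
p = np.array([0.5, 0.3, 0.2])

def E(values, probs):
    """Expected value in the discrete case: EX = x1*p1 + ... + xn*pn."""
    return np.sum(values * probs)

a, b = 2.0, -3.0
lhs = E(a * x + b * y, p)          # E(aX + bY)
rhs = a * E(x, p) + b * E(y, p)    # aEX + bEY
print(lhs, rhs)                    # the two numbers coincide, illustrating linearity
```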
3. Definition: cov(X, Y) = E(X − EX)(Y − EY)
3.1. Alternative expression: cov(X, Y) = EXY − (EX)(EY). Proof:
cov(X, Y) = E(XY − X·EY − (EX)·Y + (EX)(EY))
= EXY − EX·EY − EX·EY + EX·EY = EXY − EX·EY
3.2. Linearity with respect to the 1st argument (when the 2nd is fixed):
cov(aX + bY, Z) = a·cov(X, Z) + b·cov(Y, Z).
Proof: by the alternative expression
cov(aX + bY, Z) = E(aX + bY)Z − E(aX + bY)·EZ
= E(aXZ + bYZ) − (aEX + bEY)·EZ
= aEXZ + bEYZ − aEX·EZ − bEY·EZ
= a·cov(X, Z) + b·cov(Y, Z)
Linearity with respect to the 2nd argument (when the 1st is fixed):
cov(X, aY + bZ) = a·cov(X, Y) + b·cov(X, Z). The proof is similar
3.3. Symmetry: covariance doesn't change when the arguments change
places: cov(X, Y) = cov(Y, X). Proof:
cov(X, Y) = E(X − EX)(Y − EY)
= E(Y − EY)(X − EX) = cov(Y, X)
3.4. Covariance of any random variable with a constant is zero:
cov(X, c) = E(X − EX)(c − Ec) = 0
4. Definition: variance is the expected value of squared deviations from the mean
or, alternatively, the covariance of the variable with itself:
V(X) = E(X − EX)² = cov(X, X)
4.1. Variance of a linear combination:
V(aX + bY) = a²V(X) + 2ab·cov(X, Y) + b²V(Y). Proof:
V(aX + bY) = E(aX + bY)² − E(aX + bY)·E(aX + bY)
= E(a²X² + 2abXY + b²Y²) − [(aEX + bEY)(aEX + bEY)]
= a²EX² + 2abEXY + b²EY² − [a²(EX)² + 2abEX·EY + b²(EY)²]
= a²[EX² − (EX)²] + 2ab[EXY − EX·EY] + b²[EY² − (EY)²]
= a²V(X) + 2ab·cov(X, Y) + b²V(Y)
4.2. Homogeneity: V(aX) = a²V(X). Proof: put b = 0 in the previous property
4.3. Additivity: if cov(X, Y) = 0 and a = b = 1, then V(X + Y) = V(X) + V(Y). In
particular, this is true if X, Y are independent
4.4. Alternative expression: V(X) = cov(X, X) = EX² − (EX)²
4.5. Variance of a variable does not change when a constant is added to that
variable: V(X + c) = E(X + c − E(X + c))² = E(X + c − EX − c)² = V(X)
4.6. Nonnegativity: V(X) ≥ 0 because the squared deviations from the mean
(xi − EX)² are nonnegative and probabilities are nonnegative, so
V(X) = E(X − EX)² = p1(x1 − EX)² + ... + pn(xn − EX)² ≥ 0
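The formula V(aX + bY) = a²V(X) + 2ab·cov(X, Y) + b²V(Y) can be checked numerically on the same kind of made-up discrete distribution (a minimal sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0])
p = np.array([0.5, 0.3, 0.2])   # joint outcome probabilities, summing to 1

E = lambda v: np.sum(v * p)                 # discrete expected value
cov = lambda u, v: E(u * v) - E(u) * E(v)   # alternative expression for covariance
var = lambda v: cov(v, v)                   # variance as covariance of a variable with itself

a, b = 2.0, -3.0
lhs = var(a * x + b * y)
rhs = a**2 * var(x) + 2 * a * b * cov(x, y) + b**2 * var(y)
print(lhs, rhs)   # equal up to rounding error
```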
5. Definition: σ(X) = √V(X) is called the standard deviation
5.1. Homogeneity: σ(aX) = √V(aX) = √(a²V(X)) = |a|·√V(X) = |a|·σ(X)
5.2. Nonnegativity: σ(X) ≥ 0 because an arithmetic square root is applied
5.3. Triangle inequality*: σ(X + Y) ≤ σ(X) + σ(Y). Proof:
|ρ| = |cov(X, Y)| / (σ(X)σ(Y)) ≤ 1  ⇒  |cov(X, Y)| ≤ σ(X)σ(Y)  ⇒
V(X + Y) = V(X) + 2cov(X, Y) + V(Y)
≤ σ²(X) + 2σ(X)σ(Y) + σ²(Y) = (σ(X) + σ(Y))²  ⇒
σ(X + Y) ≤ σ(X) + σ(Y)
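A small simulation-based check of the triangle inequality σ(X + Y) ≤ σ(X) + σ(Y) (a sketch; the data are randomly generated and np.std plays the role of σ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two dependent random variables sampled many times
x = rng.normal(size=100_000)
y = 0.7 * x + rng.normal(size=100_000)   # correlated with x on purpose

sigma = lambda v: v.std()                # population-style standard deviation
print(sigma(x + y), "<=", sigma(x) + sigma(y))   # the inequality holds on the sample
```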
6. Scaling and centering
6.1. EY = EX − EX = 0
6.2. V(Y) = V(X / σ(X)) = V(X) / σ²(X) = 1
6.3. EY = (EX − EX) / σ(X) = 0. V(Y) = V((X − EX) / σ(X)) = V(X) / σ²(X) = 1
7. Fill the table with numbers that make sense (see the sketch below for a filled-in example):

            X = 1   X = 2   X = 3
    Y = 0     .       .       .
    Y = 1     .       .       .

7.1. The probabilities in the main body of the table mean probabilities of joint
events (X = 1, Y = 0), ..., (X = 3, Y = 1)
7.2. The probabilities in the lower margin are the probabilities of the values of X ;
those in the right margin are the probabilities of the values of Y
7.3. Sums of probabilities across rows give probabilities in the right margin, sums
across columns give probabilities in the lower margin
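A cross-table with made-up joint probabilities, its margins, the conditional probabilities and the independence check can be reproduced in a few lines (a sketch; the numbers are arbitrary but consistent):

```python
import numpy as np

# Rows correspond to Y = 0, 1; columns to X = 1, 2, 3.
joint = np.array([[0.10, 0.20, 0.10],    # P(X = j, Y = 0)
                  [0.20, 0.30, 0.10]])   # P(X = j, Y = 1)
assert abs(joint.sum() - 1.0) < 1e-12    # completeness

p_x = joint.sum(axis=0)   # lower margin: probabilities of the values of X
p_y = joint.sum(axis=1)   # right margin: probabilities of the values of Y
print("P(X):", p_x, " P(Y):", p_y)

# Conditional probability P(X = j | Y = 0) = P(X = j, Y = 0) / P(Y = 0)
print("P(X | Y = 0):", joint[0] / p_y[0])

# Independence would mean joint[i, j] == p_y[i] * p_x[j] for all i, j
print("independent?", np.allclose(joint, np.outer(p_y, p_x)))
```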
8. Definition of independence of two random variables:
X takes values x1, ..., xn with probabilities p1 = P(X = x1), ..., pn = P(X = xn),
Y takes values y1, ..., ym with probabilities q1 = P(Y = y1), ..., qm = P(Y = ym),
the pair (X, Y) takes values (x1, y1), ..., (xn, ym) (joint events) with probabilities
pij = P(X = xi, Y = yj).
X, Y are called independent if the probabilities of all joint events are equal to the
products of the variables' own probabilities: pij = pi qj for all i, j. Demonstrate in a
cross-table taking n = 2, m = 3
8.1. Multiplicativity of expected values: if X, Y are independent, then
EXY = EX·EY. Proof (the sums run over i = 1, ..., n and j = 1, ..., m):
EXY = Σi Σj xi yj pij = Σi Σj xi yj pi qj = (Σi xi pi)(Σj yj qj) = EX·EY
8.2. If X, Y are independent, then their covariance and correlation are zero. Proof:
cov(X, Y) = EXY − EX·EY = EX·EY − EX·EY = 0,  ρ = cov(X, Y) / (σ(X)σ(Y)) = 0
8.3. Additivity: if X, Y are independent, then the variance of their sum equals the
sum of their variances, V(X + Y) = V(X) + V(Y)
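These consequences can be illustrated by simulating two variables that are independent by construction (a sketch; the sample analogues only approximate the population values):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500_000)   # generated independently of y
y = rng.uniform(-1.0, 1.0, size=500_000)

cov_xy = np.cov(x, y, ddof=0)[0, 1]
print("cov(X, Y)   ≈", cov_xy)                   # close to 0
print("V(X + Y)    ≈", np.var(x + y))
print("V(X) + V(Y) ≈", np.var(x) + np.var(y))    # nearly the same number
```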
9. Sums and sample means of i.i.d. variables
9.1. Definition of identically distributed random variables (discrete case):
X1, ..., Xn are called identically distributed if they take the same values
with the same probabilities:

        X1    ...   Xn    p
        x1    ...   x1    p1
        ...   ...   ...   ...
        xm    ...   xm    pm

This happens when the sample is drawn from the same population.
Therefore their means and variances are equal to the mean μ and variance
σ², respectively, of the population
9.2. If S = X1 + ... + Xn, then by linearity ES = EX1 + ... + EXn = nμ, and by
independence V(S) = V(X1) + ... + V(Xn) = nσ², where μ and σ² are the
mean and variance of the population
9.3. For the sample mean X̄ = (X1 + ... + Xn)/n = S/n we have EX̄ = ES/n = μ and
V(X̄) = V(S)/n² = σ²/n
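ES = nμ, V(S) = nσ², EX̄ = μ and V(X̄) = σ²/n can be checked by simulating many samples of i.i.d. draws from a made-up discrete population (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
values, probs = np.array([0.0, 1.0, 4.0]), np.array([0.2, 0.5, 0.3])   # hypothetical population
mu  = np.sum(values * probs)
var = np.sum(values**2 * probs) - mu**2

n, reps = 25, 200_000
samples = rng.choice(values, p=probs, size=(reps, n))
S = samples.sum(axis=1)          # sums of n i.i.d. draws
Xbar = samples.mean(axis=1)      # sample means

print(S.mean(), "≈", n * mu, "  ", S.var(), "≈", n * var)
print(Xbar.mean(), "≈", mu, "  ", Xbar.var(), "≈", var / n)
```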
10. Binomial distribution
10.1. Definition. A discrete random variable is completely defined by the table of
its values and corresponding probabilities. For the Bernoulli variable

        X    p          X²
        1    p          1
        0    q = 1 − p  0

EX = 1·p + 0·q = p,  V(X) = EX² − (EX)² = 1·p + 0·q − p² = p(1 − p)
10.2. Definition. n independent trials from the same Bernoulli population give
X1, ..., Xn, which are all distributed as X in the above table. The sum
S = X1 + ... + Xn is said to have a binomial distribution.
ES = np,  V(S) = np(1 − p)
10.3. The sample mean X̄ = (X1 + ... + Xn)/n = S/n in this case is the proportion of
successes, because S may take values from 0 to n and the sample mean
may take values from 0 to 1. EX̄ = np/n = p,  V(X̄) = np(1 − p)/n² = p(1 − p)/n
10.4. Since the trials are independent, the probability of one particular
combination of x ones and n − x zeros is p^x q^(n−x) by the multiplication rule.
Since the total number of such combinations is C(n, x) and they cannot occur at the
same time, by the addition rule P(S = x) = C(n, x) p^x q^(n−x)
10.5. Theorem. For large n the binomial distribution is approximately normal.
ES = np, V(S) = np(1 − p), so by the CLT
P(S ≤ x | n, p) ≈ P(X ≤ x | μ = np, σ² = np(1 − p)) where X is a normal variable
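The exact binomial probability P(S = x) = C(n, x) p^x q^(n−x) and the normal (CLT) approximation of a cumulative probability can be compared directly (a sketch using math.comb and the error function; no claim is made about how the original course computed these):

```python
import math

def binom_pmf(x, n, p):
    """P(S = x) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_cdf(t, mean, var):
    """CDF of N(mean, var) via the error function."""
    return 0.5 * (1 + math.erf((t - mean) / math.sqrt(2 * var)))

n, p, x = 100, 0.3, 35
exact  = sum(binom_pmf(k, n, p) for k in range(x + 1))    # P(S <= x), exact
approx = normal_cdf(x + 0.5, n * p, n * p * (1 - p))      # CLT with continuity correction
print(exact, approx)   # the two numbers are close for large n
```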
11. Relationship between the main statistical characteristics of a population and of a sample:

    Mean (or average, or expected value):
        population  EX = x1 p1 + ... + xn pn
        sample      X̄ = (X1 + ... + Xn)/n, which obtains with p1 = ... = pn = 1/n
    Variance (or sample variance):
        population  V(X) = E(X − EX)² = p1(x1 − EX)² + ... + pn(xn − EX)²
        sample      s² = [(x1 − X̄)² + ... + (xn − X̄)²] / (n − 1), which obtains with X̄
                    instead of EX and 1/(n − 1) instead of p1, ..., pn
    Standard deviation (or sample standard deviation):
        population  σ(X) = √V(X)
        sample      s = √s²
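The division by n − 1 rather than n is exactly what makes s² unbiased (question 14); a simulation sketch with made-up population parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0                      # population variance of the simulated data
n, reps = 10, 200_000
data = rng.normal(loc=5.0, scale=np.sqrt(sigma2), size=(reps, n))

s2_unbiased = data.var(axis=1, ddof=1)   # divide by n - 1
s2_biased   = data.var(axis=1, ddof=0)   # divide by n

print(s2_unbiased.mean())   # ≈ sigma2
print(s2_biased.mean())     # ≈ sigma2 * (n - 1) / n, systematically too small
```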
12. Definition: a standard normal variable z is a continuous random variable with
values in R and density p(x) = (1/√(2π)) e^(−x²/2)
12.1. Symmetry: p(−x) = p(x) (show on the graph); limits at infinity:
lim p(x) = 0 as x → ±∞
12.2. Mean value: Ez = 0 because each positive value is balanced with a negative
value with the same probability
12.3. Theorem. Ez² = 1 (second moment). ⇒ V(z) = Ez² − (Ez)² = 1
12.4. Theorem. Ez⁴ = 3 (fourth moment)
13. Definition: a linear function X = az + b of a standard normal variable is called a
normal variable
13.1. EX = aEz + b = b,  V(X) = V(az) = a²V(z) = a²
13.2. Positive b moves the mean to the right; negative b moves the mean to the
left. The smaller a², the higher the concentration of X around its mean (show
on the graph)
14. Definition: if u1, ..., un are independent standard normal variables, then the sum of
squares χ²_n = u1² + ... + un² is called a χ² variable with n degrees of freedom.
14.1. Eχ²_1 = Eu² = 1,  V(χ²_1) = E[(χ²_1)²] − (Eχ²_1)² = Eu⁴ − 1 = 3 − 1 = 2
14.2. Eχ²_n = E(χ²_1 + ... + χ²_1) = n,  V(χ²_n) = V(χ²_1) + ... + V(χ²_1) = 2n
14.3. Theorem. (n − 1)s²/σ² is distributed as chi-square with n − 1 d.o.f. ⇒
E[(n − 1)s²/σ²] = Eχ²_{n−1} = n − 1  ⇒  Es² = σ²
15. Confidence intervals
15.1. If X ~ N(μ, σ²), then z = (X − μ)/σ is standard normal and therefore
P(a < X < b) = P((a − μ)/σ < z < (b − μ)/σ). If α is given, a and b can be found from
P(z < (a − μ)/σ) = P(z > (b − μ)/σ) = α/2 using statistical tables. Then
P((a − μ)/σ < z < (b − μ)/σ) = 1 − α. Show all areas on the graph
15.2. The population distribution may not be normal, but the Central Limit
Theorem states that for large n the sample mean will be almost normal.
EX̄ = μ and V(X̄) = σ²/n. ⇒ By the CLT
P(a < X̄ < b) ≈ P((a − μ)/(σ/√n) < z < (b − μ)/(σ/√n))
15.3. Idea: replace σ by its estimator s (recall that s² is an unbiased estimator of σ²).
Theorem. t_{n−1} = (X̄ − μ)/(s/√n) is distributed as Student's t distribution with
n − 1 d.o.f. The resulting interval is
P(a < X̄ < b) = P((a − μ)/(s/√n) < t_{n−1} < (b − μ)/(s/√n))
15.4. P(a < s² < b) = P((n − 1)a/σ² < χ²_{n−1} < (n − 1)b/σ²)
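A sketch of the three interval constructions for the sample mean and the sample variance, using quantiles from scipy.stats (the data and confidence level are made up; if scipy is unavailable, the normal quantile 1.96 can be hard-coded):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=3.0, size=40)   # hypothetical sample
n, alpha = x.size, 0.05
xbar, s = x.mean(), x.std(ddof=1)

# Known sigma: xbar ± z_{alpha/2} * sigma / sqrt(n)
sigma = 3.0
z = stats.norm.ppf(1 - alpha / 2)
print("known sigma:  ", xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

# Unknown sigma: xbar ± t_{n-1, alpha/2} * s / sqrt(n)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print("unknown sigma:", xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))

# Variance: (n-1)s^2 / chi2_upper < sigma^2 < (n-1)s^2 / chi2_lower
lo = (n - 1) * s**2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
hi = (n - 1) * s**2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print("variance:     ", lo, hi)
```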
2 
 
16. Definition: for random variables with nonzero standard deviations
ρ = cov(X, Y) / (σ(X)σ(Y)). Geometrically, ρ is the cosine of the angle between
X − EX and Y − EY. Statistically, ρ is a measure of co-movement of the variables
X, Y (the book says it is a measure of linear relationship between them, which is
actually true only in the extreme cases ρ = ±1)
16.1. The range of correlation is [−1, 1] because it is a cosine
16.2. If ρ = 1, then the angle between X − EX and Y − EY is 0 ⇒
Y − EY = a(X − EX) with some a > 0. Denoting b = EY − aEX and
rearranging we see that Y is a linear function of X with a positive slope:
Y = aX + b. Positive increases in X are exactly matched by positive
increases in Y
16.3. If 0 < ρ < 1, then, on average, positive increases in X are associated with
positive increases in Y. The angle between X − EX and Y − EY belongs to
(0, π/2)
16.4. If ρ = 0, then the angle between X − EX and Y − EY is π/2. On average,
there is no association in the movements of X and Y
16.5. If ρ = −1, then the angle between X − EX and Y − EY is π ⇒
Y − EY = a(X − EX) with some a < 0. Denoting b = EY − aEX and
rearranging we see that Y is a linear function of X with a negative slope:
Y = aX + b. Positive increases in X are exactly matched by negative
changes in Y
16.6. If −1 < ρ < 0, then, on average, positive increases in X are associated with
negative changes in Y. The angle between X − EX and Y − EY belongs to
(π/2, π)
17. Definition: regression is an equation of the form Y = aX + b + e where Y, X, e are
random variables, Y, X are observed (can be measured), e is unobservable and the
coefficients a, b are unknown. e is called the error. Regression is an approximate
linear relationship between Y and X
17.1. Main condition on the error term: Ee = 0. Because of this condition, the
regression model is correct on average: EY = aEX + b
17.2. Technical condition on the error term: cov(X, e) = 0. It simplifies the
derivation of estimators for the coefficients:
cov(Y, X) = cov(aX + b + e, X) = a·cov(X, X) + cov(b, X) + cov(e, X) = aV(X)
⇒ â = cov(Y, X)/V(X) = ρ·σ(Y)/σ(X),  b̂ = EY − â·EX
17.3. Since the mean of the error is 0, for prediction purposes it is assumed to be 0.
The estimated coefficients are plugged into the equation ⇒ Ypredicted = âX + b̂
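The estimators â = cov(Y, X)/V(X) and b̂ = EY − â·EX have the obvious sample analogues; a sketch that fits a simple regression on simulated data (the "true" coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=200)
e = rng.normal(scale=1.5, size=200)      # unobservable error with Ee = 0
y = 2.0 * x + 1.0 + e                    # true a = 2, b = 1 for this simulation

# Sample analogues of cov(Y, X) and V(X)
a_hat = np.cov(y, x, ddof=0)[0, 1] / np.var(x)
b_hat = y.mean() - a_hat * x.mean()
print(a_hat, b_hat)                      # close to 2 and 1

y_predicted = a_hat * x + b_hat          # the error is replaced by its mean, 0
```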
18. If P0 denotes the stock price at the beginning of the period and P1 denotes its price at
the end of the period, then the rate of return for the period is defined by
r = (P1 − P0)/P0 ⇒ P1 = (1 + r)P0. P0 is assumed non-stochastic (usually it is the
current price). The future price P1 is unknown and random. EP1 = (1 + Er)P0,
V(P1) = V(rP0) = P0²·V(r).
19. Let PA, PB denote the prices of stocks A, B. The portfolio consists of nA shares of A
and nB shares of B. The value of the portfolio is M = nA·PA + nB·PB. The numbers
of shares are assumed fixed and the prices random. EM = nA·EPA + nB·EPB,
V(M) = nA²·V(PA) + nB²·V(PB) + 2nA·nB·cov(PA, PB)
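The portfolio formulas follow from linearity of means and the variance of a linear combination; a numerical sketch with simulated correlated prices (the covariance matrix and share counts are made up):

```python
import numpy as np

rng = np.random.default_rng(8)
cov_matrix = np.array([[4.0, 1.5],
                       [1.5, 9.0]])                     # hypothetical covariance of (PA, PB)
prices = rng.multivariate_normal([100.0, 50.0], cov_matrix, size=500_000)
PA, PB = prices[:, 0], prices[:, 1]
nA, nB = 3, 10                                          # fixed numbers of shares

M = nA * PA + nB * PB                                   # portfolio value
lhs_var = M.var()
rhs_var = nA**2 * PA.var() + nB**2 * PB.var() + 2 * nA * nB * np.cov(PA, PB, ddof=0)[0, 1]
print(M.mean(), "≈", nA * PA.mean() + nB * PB.mean())
print(lhs_var, "≈", rhs_var)
```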
20. Definition: a uniform distribution U is a continuous random variable such that
a) the probability of U taking values outside some interval (a, b) is 0 and b) any
value inside that interval is equally likely.
20.1. The density of U is identically equal to 1/(b − a) inside (a, b) and is 0 outside
that interval.
20.2. The cumulative distribution function F(x) of U is zero for x ≤ a, is the linear
function F(x) = (x − a)/(b − a) for a ≤ x ≤ b, and is identically 1 after that
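The density 1/(b − a), the piecewise-linear CDF and, for reference, the mean (a + b)/2 and variance (b − a)²/12 asked for in 22.3-22.4 can be checked against a simulation (a sketch; the endpoints a, b are made up):

```python
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(9)
u = rng.uniform(a, b, size=1_000_000)

density = 1.0 / (b - a)                                # constant density inside (a, b)
cdf = lambda t: np.clip((t - a) / (b - a), 0.0, 1.0)   # F(x) = (x - a)/(b - a) on [a, b]

print((u <= 3.5).mean(), "≈", cdf(3.5))    # empirical vs theoretical CDF
print(u.mean(), "≈", (a + b) / 2)          # mean of the uniform distribution
print(u.var(),  "≈", (b - a)**2 / 12)      # variance of the uniform distribution
```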