Financial Noise and Overconfidence
Ubiquitous News and Advice
 CNBC
 WSJ
 Jim Cramer
 Assorted talking heads
 How good can it be?
Thought Experiment
 Listen to CNBC every day
 Choose ten stocks from recommendations
 Buy stocks
 How good can this strategy be?
Thought Experiment
 10 stocks x 260 business days = 2,600 bets
 Assume the bets are independent
 How good does the information have to be such that the probability
of losing money is 1 in 10,000?
Detour: Gaussian Distribution
 Random Variable X depends upon two parameters:
‣ mean value: μ
‣ uncertainty (standard deviation): σ

$$\mathrm{Prob}(a \le X \le b) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\, dx$$
Detour: Gaussian Distribution
 Universal “Bell Curve” Distribution
Prob(X < 0) depends only on the ratio μ / σ:
$$\mathrm{Prob}(X < 0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\mu/\sigma} \exp\!\left(-\frac{x^2}{2}\right)\, dx$$
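A quick numerical check of this claim (a minimal sketch using SciPy's normal CDF; the (μ, σ) pairs are arbitrary illustrations, not figures from the talk):

```python
from scipy.stats import norm

# For a Gaussian with mean mu and standard deviation sigma,
# Prob(X < 0) = norm.cdf(0, loc=mu, scale=sigma) = norm.cdf(-mu / sigma).
for mu, sigma in [(1.0, 2.0), (10.0, 20.0), (0.5, 5.0)]:
    print(mu / sigma, norm.cdf(0, loc=mu, scale=sigma))

# The first two cases share mu/sigma = 0.5 and print the same probability
# (~0.31); only the ratio matters, not mu or sigma separately.
```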
The Central Limit Theorem
 {Xi}i = 1..N a set of r.v.:
‣ Identically distributed
‣ Independent
‣ All moments are finite
 If X = x1 + x2 + … + xN then:
‣ X has a Gaussian distribution with:
• μ = N * average(x1)
• σ = √N * stddev(x1)

$$\mathrm{Prob}(a \le X \le b) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\, dx$$
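A small simulation illustrating these scaling rules (a sketch; the uniform distribution, seed, and trial count are arbitrary choices, with N = 2,600 taken from the thought experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2600          # number of i.i.d. terms, as in the thought experiment
trials = 20_000   # number of simulated sums

# Simulate X = x_1 + ... + x_N with each x_i uniform on (-1, 1).
sums = np.array([rng.uniform(-1, 1, size=N).sum() for _ in range(trials)])

# CLT scaling: mean(X) ~ N * mean(x_i) = 0, stddev(X) ~ sqrt(N) * stddev(x_i).
print(sums.mean())                 # close to 0
print(sums.std(), np.sqrt(N / 3))  # uniform(-1, 1) has variance 1/3; both ~29.4
```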
The Central Limit Theorem
 X = x1 + x2 + … + xN is a Gaussian r.v. with:
‣ mean(X) = μ = N * average(x1)
‣ stddev(X) = σ = √N * stddev(x1)
 For a Gaussian r.v., the probability that it is greater than zero is just a function of the ratio of the mean to the stddev: Prob(X > 0) = f(μ / σ)
In our experiment, our annual profit/loss will have a ratio of mean to stddev √2,600 ≈ 50 times greater than that of a single stock bet. This is the advantage of aggregation.
How Good is the Information?
Assume Prob(Xannual < 0) ≈ 1 / 10,000
 mean(Xannual) ≈ 3.7 * stddev(Xannual)
 mean(Xind) ≈ (3.7 / 50) * stddev(Xind) ≈ 0.07 * stddev(Xind)
 Prob(Xind < 0) ≈ 47%
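These numbers can be reproduced directly (a minimal sketch using SciPy; the 2,600 bets and the 1-in-10,000 target come from the thought experiment above):

```python
from math import sqrt
from scipy.stats import norm

n_bets = 10 * 260                    # 10 stocks x 260 business days
z_annual = norm.isf(1 / 10_000)      # mean/stddev ratio needed for a 1-in-10,000 loss

# Aggregating n independent bets multiplies the mean/stddev ratio by sqrt(n),
# so each individual bet only needs a ratio sqrt(n) times smaller.
z_per_bet = z_annual / sqrt(n_bets)

print(z_annual)               # ~3.7
print(z_per_bet)              # ~0.07
print(norm.cdf(-z_per_bet))   # ~0.47: each bet can lose ~47% of the time
```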
Probability That Advice Is Right?
 Any given piece of advice can’t be much more than 53% reliable,
compared to 50% for a random flip of a coin
 Otherwise, we could make a strategy that almost never loses
money by just aggregating all the different pieces of advice
 Contrast “my price target for this stock is 102 by mid-November”
with “I think it’s 53–47% that this goes up vs. down”
 Precision here greatly exceeds possible accuracy
 Overconfidence
 No error bars
Probability That Advice Is Right?
 If we want Prob(Xannual < 0) ≈ 1 / 1,000,000 then:
‣ Prob(Xind < 0) ≈ 46%, instead of 47%
 A computer algorithm (or team of analysts) predicting something every day for all ≈ 2,000 U.S. stocks, instead of just 10 stocks, would need only Prob(Xind < 0) ≈ 49.75% to get Prob(Xannual < 0) to 1 in 1,000,000 (see the sketch below)!
 The existence of such a strategy seems unlikely
 Statements expressed with certainty about the market need to be
viewed with skepticism
 Low signal-to-noise ratio
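The sketch referenced above, repeating the same calculation for the 1-in-1,000,000 target and for a universe of roughly 2,000 stocks:

```python
from math import sqrt
from scipy.stats import norm

z_annual = norm.isf(1 / 1_000_000)   # ~4.75

# 10 stocks, one bet per business day:
print(norm.cdf(-z_annual / sqrt(10 * 260)))      # ~0.46

# ~2,000 U.S. stocks, one bet per business day:
print(norm.cdf(-z_annual / sqrt(2_000 * 260)))   # ~0.497, i.e. ~49.75%
```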
Where Are Our Assumptions Wrong?
 Transaction costs matter
 Individual stock bets almost certainly not independent
By placing bets on linear combinations of individual stocks, one can partially overcome correlations between bets, but this reduces the effective number of bets.
 These aren't just technical details; they can make the difference between making and losing money.
Honesty Is The Best Policy
Despite these caveats, the big picture is correct
‣ Only small effects exist
‣ The data is very noisy
‣ One can find statistical significance only by aggregating lots of data
 As a result there is great value in:
‣ Clean data
‣ Quantitative sophistication
‣ Discipline
‣ Honesty
Overconfidence and Overfitting
 Humans are good at learning patterns when:
‣ Patterns are persistent
‣ Signal-to-noise ratio isn’t too bad
 Examples include: language acquisition, human motor control,
physics, chemistry, biology, etc.
 Humans are bad at finding weak patterns that come and go as
markets evolve
 They tend to overestimate their abilities to find patterns
Further Consequences of Aggregation
If the profit/loss distribution is Gaussian, then only the mean and the standard deviation matter
 Consider 3 strategies:
‣ Invest all your money in the S&P 500
‣ Invest all your money in the Nasdaq
‣ 50% in the Nasdaq and 50% in short-term U.S. Government
Bonds
 Over the past 40 years, these strategies returned approximately:
‣ 7% +/- 20%
‣ 8% +/- 35%
‣ 5.5% +/- 17.5%
Diversifying Risk
 Imagine we can lend or borrow at some risk-free rate r
 Suppose we have an amount of capital c, which we distribute:
‣ X in S&P 500
‣ Y in Nasdaq
‣ Z in “cash” (where c = X + Y + Z)
 Our investment returns are:
‣ rP = X*rSP + Y*rND + Z*r
‣ rP = X*(rSP – r) + Y*(rND – r) + c*r
‣ E(rP) = X*E(rSP – r) + Y*E(rND – r) + c*r
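The second line follows from the first because Z = c − X − Y; a quick numerical check (all numbers below are illustrative, not figures from the talk):

```python
# Verify X*rSP + Y*rND + Z*r == X*(rSP - r) + Y*(rND - r) + c*r when Z = c - X - Y.
c, X, Y = 1.0, 0.6, 0.3          # illustrative capital split
Z = c - X - Y
rSP, rND, r = 0.07, 0.08, 0.03   # illustrative returns and risk-free rate

lhs = X * rSP + Y * rND + Z * r
rhs = X * (rSP - r) + Y * (rND - r) + c * r
print(lhs, rhs)                  # both 0.069: the two forms are the same
```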
Consequences of Diversification
 As long as E(rSP) and/or E(rND) exceeds the risk-free rate, r, we
can target any desired return
 The price for high return is high volatility
One measure of a strategy is the ratio of its return above the risk-free rate to its volatility
 Investing everything in Nasdaq gave the best return of the three
strategies in the original list
Assuming the correlation between ND and SP is 0.85, the optimal mixture of investments gives:
Y / X ≈ −0.2
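One way to arrive at this ratio is to maximize excess return per unit of volatility (the measure defined above) and then scale the weights to a chosen risk level. The sketch below does this with the 40-year figures quoted earlier; the helper name best_mix is mine, and the 3% risk-free rate is an assumption (it is not stated on these slides, but it is consistent with the ~10.1% figure on the next slide):

```python
import numpy as np

def best_mix(mu_sp, mu_nd, sig_sp, sig_nd, rho, r, target_vol):
    """Weights (X, Y) maximizing excess return / volatility, scaled to target_vol."""
    excess = np.array([mu_sp - r, mu_nd - r])
    cov = np.array([[sig_sp**2, rho * sig_sp * sig_nd],
                    [rho * sig_sp * sig_nd, sig_nd**2]])
    w = np.linalg.solve(cov, excess)          # optimal direction ~ cov^-1 * excess
    w *= target_vol / np.sqrt(w @ cov @ w)    # lever to the chosen volatility
    return w

# 40-year figures from the earlier slide; r = 3% is an assumption.
X, Y = best_mix(0.07, 0.08, 0.20, 0.35, rho=0.85, r=0.03, target_vol=0.35)
print(Y / X)                                         # ~ -0.2
print(X, Y)                                          # ~ 2.4 and ~ -0.5
print(X * (0.07 - 0.03) + Y * (0.08 - 0.03) + 0.03)  # ~ 0.10, roughly the 10.1% quoted
```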
Consequences of Diversification
For our example, we want Y ≈ −0.2*X, but we get to choose the value of X. We can choose the level of risk!
 If we choose X ≈ 2.4 and Y ≈ −0.5, we get the same risk as investing everything in the Nasdaq, but our return is ≈ 10.1% rather than 8%
 Returns are meaningless if not quoted with:
‣ Volatility
‣ Correlations
 Why does everyone just quote absolute returns?
Past Results Do Not Guarantee Future Results
 If we only look back over the past 20 years, the numbers change:
‣ 3.5% +/- 20% for S&P
‣ 7% +/- 35% for Nasdaq
‣ 5.0% +/- 17.5% for 50% Nasdaq and 50% risk-free
 The same optimal weighting calculation gives X ≈ −2.6 and Y ≈ 1.9, which gives the same risk as Nasdaq but with a return of 9.3%!
 40 years of data suggests that an investor should go long S&P and short Nasdaq, but 20 years of data suggests the opposite. If we used the 40-year weights on the last 20 years, we'd end up making 2%.
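Continuing the sketch above (same assumed 3% risk-free rate), the 20-year figures flip the sign of the weights:

```python
# Re-run best_mix from the earlier sketch with the 20-year figures.
X, Y = best_mix(0.035, 0.07, 0.20, 0.35, rho=0.85, r=0.03, target_vol=0.35)
print(X, Y)                                            # ~ -2.6 and ~ 1.9
print(X * (0.035 - 0.03) + Y * (0.07 - 0.03) + 0.03)   # ~ 0.09, roughly the 9.3% quoted
```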
Application to the Credit Crisis
 Securitization
‣ Bundles of loans meant to behave independently
‣ CDOs are slices of securitized bundles
‣ Rating agencies suggested 1 in 10,000 chance of investments
losing money
‣ All the loans defaulted together, rather than independently!
Credit Models Were Not Robust
 Even if mortgages were independent, the process of securitization
can be very unstable.
 Thought Experiment:
‣ Imagine a mortgage pays out $1 unless it defaults and then
pays $0.
‣ All mortgages are independent and have a default probability
of 5%.
‣ What happens to default probabilities when one bundles
mortgages?
Mortgage Bundling Primer: Tranches
 Combine the payout of 100 mortgages to make 100 new
instruments called “Tranches.”
 Tranche 1 pays $1 if no mortgage defaults.
Tranche 2 pays $1 if at most one mortgage defaults.
 Tranche i pays $1 if the number of defaulted mortgages < i.
 So far all we have done is transformed the cashflow.
 What are the default rates for the new Tranches?
Mortgage Bundling Primer: Tranches
 If each mortgage has default rate p=0.05 then the ith Tranche has
default rate:
$$p_T(i) = \sum_{j=i}^{100} \binom{100}{j}\, p^j (1-p)^{100-j}$$
 Where pT(i) is the default probability for the ith Tranche.
 For the first Tranche pT(1) = 99.4%, i.e., it’s very likely to default.
But for the 10th Tranche pT(10) = 2.8%.
 By the 10th Tranche we have created an instrument that is safer
than the original mortgages.
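These tranche default rates are easy to check (a minimal sketch using SciPy's binomial survival function; n = 100 mortgages and p = 0.05 are the values from the slide):

```python
from scipy.stats import binom

def tranche_default_prob(i, n=100, p=0.05):
    """Probability that at least i of n independent mortgages default."""
    return binom.sf(i - 1, n, p)   # survival function: P(#defaults >= i)

print(tranche_default_prob(1))    # ~0.994: the first tranche almost always defaults
print(tranche_default_prob(10))   # ~0.028: the 10th tranche is safer than a single mortgage
```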
Securitizing Tranches: CDOs
 That was fun, let’s do it again!
 Take 100 type-10 Tranches and bundle them up together using the
same method.
 The default rate for the kth type-10 Tranche is then:
$$p_{CDO}(k) = \sum_{j=k}^{100} \binom{100}{j}\, p_T(10)^j \left(1 - p_T(10)\right)^{100-j}$$
The default probability of the 10th type-10 Tranche is then pCDO(10) ≈ 0.05%, only 1/100 the default probability of the original mortgages!
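Continuing the sketch above, applying the same construction a second time reproduces the re-securitized figure (under the same independence assumption as the slide):

```python
# Securitize 100 type-10 tranches with the same method as before.
p_t10 = tranche_default_prob(10)          # ~0.028, from the previous sketch
p_cdo10 = binom.sf(10 - 1, 100, p_t10)    # 10th tranche of the tranche pool
print(p_cdo10)                            # roughly 0.05%, ~1/100 of the original 5%
```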
Why Did We Do This: Risk-Averse Investors
 After making these manipulations we still only have the same cash
flow from 10,000 mortgages, so why did we do it?
 Some investors will pay a premium for very low risk investments.
 They may even be required by their charter to only invest in
instruments rated very low risk (AAA) by rating agencies.
 They do not have direct access to the mortgage market, so they
can’t mimic securitization themselves.
Are They Really So Safe?
 These results are very sensitive to the assumptions.
If the underlying mortgages actually have a default probability of 6% (a 20% increase), then the 10th Tranches have a default probability of 7.8% (a 275% increase).
 Worse, the 10th type-10 Tranches will have a default probability of
25%, a 50,000% increase!
 These models are not robust to errors in the assumptions!
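The same sketch shows the sensitivity: bumping the underlying default probability from 5% to 6% and repeating both steps gives the numbers quoted above.

```python
# Continuing the sketch above: stress the underlying default rate from 5% to 6%.
p_t10_stressed = tranche_default_prob(10, p=0.06)          # ~0.078, up from ~0.028
p_cdo10_stressed = binom.sf(10 - 1, 100, p_t10_stressed)   # ~0.25, up from ~0.0005
print(p_t10_stressed, p_cdo10_stressed)
```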
Application to the Credit Crisis
 Connections with thought experiment:
‣ Overconfidence (bad estimate of error bars)
‣ Insufficient independence between loans (bad use of Central
Limit Theorem)
‣ Illusory diversification (just one big bet that houses wouldn't lose value)
About the D. E. Shaw Group
 Founded in 1988
 Quantitative and qualitative investment strategies
 Offices in North America, Europe, and Asia
 1,500 employees worldwide
 Managing approximately $22 billion (as of April 1, 2010)
About What I Do
 Quantitative Analyst (“Quant”)
 Most Quants have a background in math, physics, EE, CS, or
statistics
‣ Many hold a Ph.D. in one of these subjects
 We work on:
‣ Forecasting future prices
‣ Reducing transaction costs
‣ Modeling/managing risk