Financial Noise and Overconfidence

Ubiquitous News and Advice
‣ CNBC, WSJ, Jim Cramer, assorted talking heads
‣ How good can it be?

Thought Experiment
‣ Listen to CNBC every day
‣ Choose ten stocks from the recommendations
‣ Buy the stocks
‣ How good can this strategy be?

Thought Experiment
‣ 10 stocks x 260 business days = 2,600 bets
‣ Assume the bets are independent
‣ How good does the information have to be for the probability of losing money over the year to be 1 in 10,000?

Detour: Gaussian Distribution
A random variable X depends upon two parameters:
‣ mean value: \mu
‣ uncertainty (standard deviation): \sigma
\mathrm{Prob}(a \le X \le b) = \int_a^b \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right) dx

Detour: Gaussian Distribution
The universal "bell curve" distribution. Prob(X < 0) depends only on the ratio \mu / \sigma:
\mathrm{Prob}(X < 0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\mu/\sigma} \exp\!\left( -\frac{x^2}{2} \right) dx

The Central Limit Theorem
Let {X_i}, i = 1..N, be a set of random variables that are:
‣ Identically distributed
‣ Independent
‣ Finite in all their moments
If X = x_1 + x_2 + ... + x_N, then X has a Gaussian distribution with:
• \mu = N * average(x_1)
• \sigma = \sqrt{N} * stddev(x_1)

The Central Limit Theorem
X = x_1 + x_2 + ... + x_N is a Gaussian random variable with:
‣ mean(X) = \mu = N * average(x_1)
‣ stddev(X) = \sigma = \sqrt{N} * stddev(x_1)
For a Gaussian random variable, the probability that it is greater than zero is just a function of the ratio of the mean to the stddev: Prob(X > 0) = f(\mu / \sigma).
In our experiment, our annual profit/loss will have a ratio of mean to stddev \sqrt{2600} ≈ 50 times greater than any one of our stock bets. This is the advantage of aggregation.

How Good Is the Information?
‣ Assume Prob(X_annual < 0) ≈ 1 / 10,000
‣ Then mean(X_annual) ≈ 3.7 * stddev(X_annual)
‣ So mean(X_ind) ≈ (3.7 / 50) * stddev(X_ind) ≈ 0.07 * stddev(X_ind)
‣ Prob(X_ind < 0) ≈ 47%

Probability That Advice Is Right?
‣ Any given piece of advice can't be much more than 53% reliable, compared to 50% for a random flip of a coin.
‣ Otherwise, we could build a strategy that almost never loses money just by aggregating all the different pieces of advice.
‣ Contrast "my price target for this stock is 102 by mid-November" with "I think it's 53–47% that this goes up vs. down."
‣ Precision here greatly exceeds possible accuracy.
‣ Overconfidence: no error bars.

Probability That Advice Is Right?
‣ If we want Prob(X_annual < 0) ≈ 1 / 1,000,000, then Prob(X_ind < 0) ≈ 46%, instead of 47%.
‣ A computer algorithm (or a team of analysts) predicting something every day for all 2,000 U.S. stocks, instead of just 10 stocks, would need only Prob(X_ind < 0) ≈ 49.75% to get Prob(X_annual < 0) down to 1 in 1,000,000!
‣ The existence of such a strategy seems unlikely.
‣ Statements expressed with certainty about the market need to be viewed with skepticism.
‣ Low signal-to-noise ratio.
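These tail probabilities follow directly from inverting the Gaussian. A minimal sketch of the arithmetic, assuming independent bets and measuring each bet's edge in units of its own standard deviation (the function names are illustrative, not from the slides):

    # Python: required per-bet edge for a target annual loss probability.
    from statistics import NormalDist

    G = NormalDist()  # standard Gaussian

    def required_edge(n_bets, p_lose_year):
        # Annual P/L needs mean/stddev = z, where Prob(Z < -z) = p_lose_year;
        # by the CLT, the per-bet ratio is smaller by a factor of sqrt(n_bets).
        z = -G.inv_cdf(p_lose_year)   # ~3.7 for 1 in 10,000
        return z / n_bets ** 0.5

    def p_single_bet_loses(n_bets, p_lose_year):
        return G.cdf(-required_edge(n_bets, p_lose_year))

    print(p_single_bet_loses(2_600, 1e-4))    # ~0.470: advice need be only ~53% right
    print(p_single_bet_loses(2_600, 1e-6))    # ~0.463
    print(p_single_bet_loses(520_000, 1e-6))  # ~0.497: 2,000 stocks x 260 days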
Where Are Our Assumptions Wrong?
‣ Transaction costs matter.
‣ Individual stock bets are almost certainly not independent.
‣ By placing bets on linear combinations of individual stocks one can partially overcome correlations between bets, but this reduces the effective number of bets.
‣ These aren't just technical details. They can make the difference between making and losing money.

Honesty Is The Best Policy
Despite the caveats, the big picture is correct:
‣ Only small effects exist.
‣ The data is very noisy.
‣ One can find statistical significance only by aggregating lots of data.
As a result there is great value in:
‣ Clean data
‣ Quantitative sophistication
‣ Discipline
‣ Honesty

Overconfidence and Overfitting
Humans are good at learning patterns when:
‣ Patterns are persistent
‣ The signal-to-noise ratio isn't too bad
Examples include language acquisition, human motor control, physics, chemistry, and biology.
Humans are bad at finding weak patterns that come and go as markets evolve, and they tend to overestimate their abilities to find such patterns.

Further Consequences of Aggregation
If the profit/loss distribution is Gaussian, then only the mean and the standard deviation matter.
Consider 3 strategies:
‣ Invest all your money in the S&P 500
‣ Invest all your money in the Nasdaq
‣ 50% in the Nasdaq and 50% in short-term U.S. Government bonds
Over the past 40 years, these strategies returned approximately:
‣ 7% +/- 20%
‣ 8% +/- 35%
‣ 5.5% +/- 17.5%

Diversifying Risk
Imagine we can lend or borrow at some risk-free rate r.
Suppose we have an amount of capital c, which we distribute:
‣ X in the S&P 500
‣ Y in the Nasdaq
‣ Z in "cash" (where c = X + Y + Z)
Our investment returns are:
‣ rP = X*rSP + Y*rND + Z*r
‣ Substituting Z = c – X – Y: rP = X*(rSP – r) + Y*(rND – r) + c*r
‣ E(rP) = X*E(rSP – r) + Y*E(rND – r) + c*r

Consequences of Diversification
‣ As long as E(rSP) and/or E(rND) exceeds the risk-free rate r, we can target any desired return.
‣ The price for high return is high volatility.
‣ One measure of a strategy is the ratio of the return above the risk-free rate to the volatility of the strategy (the Sharpe ratio).
‣ Investing everything in the Nasdaq gave the best return of the three strategies in the original list.
‣ Assuming the correlation between ND and SP is 0.85, the optimal mixture of investments gives Y ≈ –0.2*X.

Consequences of Diversification
‣ For our example, we want Y ≈ –0.2*X, but we get to choose the value of X. We can choose the level of risk!
‣ If we choose X ≈ 2.4 and Y ≈ –0.5, we get the same risk as investing everything in the Nasdaq, but our return is 10.1% rather than 8%.
‣ Returns are meaningless if not quoted with volatility and correlations.
‣ Why does everyone just quote absolute returns?

Past Results Do Not Guarantee Future Results
If we look back over only the past 20 years, the numbers change:
‣ 3.5% +/- 20% for the S&P
‣ 7% +/- 35% for the Nasdaq
‣ 5.0% +/- 17.5% for 50% Nasdaq and 50% risk-free
The same optimal weighting calculation gives X ≈ –2.6 and Y ≈ 1.9, which gives the same risk as the Nasdaq but with a return of 9.3%!
40 years of data suggest that an investor should go long the S&P and short the Nasdaq, but 20 years of data suggest the opposite. If we used the 40-year weights on the last 20 years, we'd end up making 2%.
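A sketch of the weighting calculation behind these numbers. The slides don't state the risk-free rate; roughly 3% is assumed here, since that is what makes the 50/50 Nasdaq/bond figures consistent. The best ratio of excess return to volatility is achieved by weights proportional to Cov^{-1} times the excess returns:

    # Python: optimal S&P/Nasdaq mix at a chosen risk level (numpy assumed).
    import numpy as np

    def optimal_mix(excess, vols, corr, target_vol):
        # Tangency direction w ~ Cov^-1 @ excess, scaled to the target volatility.
        cov = np.outer(vols, vols) * np.array([[1.0, corr], [corr, 1.0]])
        w = np.linalg.solve(cov, excess)
        return w * target_vol / np.sqrt(w @ cov @ w)

    r, vols, corr = 0.03, np.array([0.20, 0.35]), 0.85  # [S&P, Nasdaq]; r is assumed

    # Past 40 years: 7% +/- 20% and 8% +/- 35%.
    ex = np.array([0.07 - r, 0.08 - r])
    w = optimal_mix(ex, vols, corr, target_vol=0.35)
    print(w, w @ ex + r)   # ~[2.4, -0.5], return ~10% at Nasdaq-level risk

    # Past 20 years: 3.5% +/- 20% and 7% +/- 35%.
    ex = np.array([0.035 - r, 0.07 - r])
    w = optimal_mix(ex, vols, corr, target_vol=0.35)
    print(w, w @ ex + r)   # ~[-2.6, 1.9], return just over 9%

Note how the sign of the optimal S&P/Nasdaq tilt flips between the two samples, which is exactly the instability the next slides warn about.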
Application to the Credit Crisis
Securitization:
‣ Bundles of loans meant to behave independently
‣ CDOs are slices of securitized bundles
‣ Rating agencies suggested a 1 in 10,000 chance of the investments losing money
‣ All the loans defaulted together, rather than independently!

Credit Models Were Not Robust
Even if mortgages were independent, the process of securitization can be very unstable.
Thought Experiment:
‣ Imagine a mortgage pays out $1 unless it defaults, in which case it pays $0.
‣ All mortgages are independent and have a default probability of 5%.
‣ What happens to default probabilities when one bundles mortgages?

Mortgage Bundling Primer: Tranches
Combine the payout of 100 mortgages to make 100 new instruments called "tranches":
‣ Tranche 1 pays $1 if no mortgage defaults.
‣ Tranche 2 pays $1 if at most one mortgage defaults.
‣ Tranche i pays $1 if the number of defaulted mortgages is less than i.
So far all we have done is transform the cash flow. What are the default rates for the new tranches?

Mortgage Bundling Primer: Tranches
If each mortgage has default rate p = 0.05, then the i-th tranche has default rate
p_T(i) = \sum_{j=i}^{100} \binom{100}{j} p^j (1 - p)^{100 - j}
where p_T(i) is the default probability of the i-th tranche.
For the first tranche, p_T(1) = 99.4%, i.e., it is very likely to default. But for the 10th tranche, p_T(10) = 2.8%. By the 10th tranche we have created an instrument that is safer than the original mortgages.

Securitizing Tranches: CDOs
That was fun; let's do it again! Take 100 type-10 tranches and bundle them together using the same method. The default rate for the k-th tranche of type-10 tranches is then
p_{CDO}(k) = \sum_{j=k}^{100} \binom{100}{j} p_T(10)^j (1 - p_T(10))^{100 - j}
The default probability of the 10th type-10 tranche is then p_CDO(10) = 0.05%. Only 1/100 the default probability of the original mortgages!

Why Did We Do This: Risk-Averse Investors
After these manipulations we still have only the same cash flow from 10,000 mortgages, so why did we do it?
‣ Some investors will pay a premium for very low-risk investments.
‣ They may even be required by their charter to invest only in instruments rated very low risk (AAA) by rating agencies.
‣ They do not have direct access to the mortgage market, so they can't mimic securitization themselves.

Are They Really So Safe?
These results are very sensitive to the assumptions. If the underlying mortgages actually have a default probability of 6% (a 20% increase), then the 10th tranches have a default probability of 7.8% (nearly a threefold increase). Worse, the 10th type-10 tranches will have a default probability of 25%, a 50,000% increase! These models are not robust to errors in the assumptions. (The sketch at the end of this deck reproduces these numbers.)

Application to the Credit Crisis
Connections with the thought experiment:
‣ Overconfidence (bad estimates of error bars)
‣ Insufficient independence between loans (bad use of the Central Limit Theorem)
‣ Illusory diversification (just one big bet that houses wouldn't lose value)

About the D. E. Shaw Group
‣ Founded in 1988
‣ Quantitative and qualitative investment strategies
‣ Offices in North America, Europe, and Asia
‣ 1,500 employees worldwide
‣ Managing approximately $22 billion (as of April 1, 2010)

About What I Do
‣ Quantitative Analyst ("Quant")
‣ Most quants have a background in math, physics, EE, CS, or statistics; many hold a Ph.D. in one of these subjects
We work on:
‣ Forecasting future prices
‣ Reducing transaction costs
‣ Modeling/managing risk
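As a closing check, a minimal sketch that reproduces the tranche and CDO numbers from the securitization slides above; it computes exact binomial tails, assuming 100 independent loans (or tranches) per bundle:

    # Python: default probabilities for tranches and tranches-of-tranches.
    from math import comb

    def tranche_default_prob(i, p, n=100):
        # Tranche i defaults iff at least i of the n underlying instruments default.
        return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(i, n + 1))

    p_t10 = tranche_default_prob(10, 0.05)   # ~0.028 (vs. 0.05 per mortgage)
    print(tranche_default_prob(1, 0.05))     # ~0.994: tranche 1 almost surely defaults
    print(tranche_default_prob(10, p_t10))   # ~0.0005: 10th tranche of type-10 tranches

    # Sensitivity: bump the mortgage default rate from 5% to 6%.
    q_t10 = tranche_default_prob(10, 0.06)   # ~0.079
    print(tranche_default_prob(10, q_t10))   # ~0.25: a ~500x jump from 0.0005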