Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Session #1 Statistical Concepts and Market Returns Dr. Himanshu Joshi Summarizing Data using Frequency Distribution • A Frequency distribution is a tabular display of data summarized into a relatively small number of intervals. • Frequency distribution help in the analysis of large amount of statistical data, and they work with all types of measurement scales. Frequency Distribution • Rates of returns are the fundamental units that analyst and portfolio managers use for making investment decisions, and we can use frequency distribution to summarize rate of return. Holding Period Return Rt = (Pt – Pt-1 + Dt)/Pt-1 Where: Pt = price per share at the end of period t. Pt-1 = price per share at the end of time t-1, the time immediately preceding time period t Dt = cash distribution received during time period t. Holding Period Return Equation for holding period return can be used to define the holding period return on any asset for a day, week, month, or year simply by changing the interpretation of the time interval between successive values of the time index, t. Common Stock: the distribution is a dividend. Bonds: the distribution is a coupon payment. Holding Period Return: Two Important Characteristics Element of Time: Holding period return has an element of time attached to it. For example: if a monthly time interval is used between successive observations for price, then the rate of return is monthly figure. No Currency Unit Attached: It has no currency unit attached to it. The return will hold regardless of the currency in which prices are denominated. Continuously Compounded Returns • Rt = 100% x ln (Pt/Pt-1) Construction of Frequency Distribution Sort the Data in ascending Order. Calculate the range of the data, defined as Range = Maximum Value – Minimum Value. Decide on the number of intervals in the frequency distribution, k. Determine interval width as Range/k Determine the interval by successively adding the interval width to the minimum value, to determine the ending point of intervals, stopping after reaching an interval that include maximum value. Count the number of observations falling in each interval. Construct a table of intervals listed from smallest to largest that show the number of observations falling in each interval. Construction of a Frequency Distribution • The actual number of observations in a given interval is called the absolute frequency, or simply frequency. The frequency distribution is the list of intervals together with the corresponding measure of frequency. Construction of a Frequency Distribution: An Example • Suppose we have 12 observations sorted in ascending order: -4.57, -4.04,-1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.468, 7.16, and 11.43. • Range = Maximum – Minimum = 11.43 – (4.57) = 16 • If we decide number of intervals to be 4. • Then interval width = range/k = 16/4 =4 Construction of a Frequency Distribution • Table 1. Endpoints of Intervals -4.57+4 = -0.57 -0.57+4 = 3.43 3.43 + 4= 7.43 7.43 +4= 11.43 • Table 2. Frequency Distribution Interval Absolute Frequency A -4.57<=Observation < -0.57 3 B -0.57<=Observation<3.43 4 C 3.43<=Observation<7.43 4 D 7.43<=Observation<11.43 1 S&P 500, 1926-2002, Total Annual Return Frequncy 8 7 6 5 4 3 2 1 0 Frequncy (-30%) to (-28%) (-28%) to (-26%) (-26%) to (-24%) (-24%) to (-22%) (-22%) to (-20%) (-20%) to (-18%) (-18%) to (-16%) (-16%) to (-14%) (-14%) to (-12%) (-12%) to (-10%) (-10%) to (-8%) (-8%) to (-6%) (-6%) to (-4%) (-4%) to (-2%) (-2%) to (0%) 0% to 2% 2% t0 4% 4% to 6% 6% to 8% 8% to 10% 10% to 12% 12% to 14% 14% to 16% 16% to 18% 18% to 20% 20% to 22% 22% to 24% 24% to 26% 26% to 28% 28% to 30% 30% to 32% 32% to 34% 34% to 36% 36% to 38% 38 % to 40% 40% to 42% 42% to 44% S&P 500, 1926-2002, Total Monthly Returns Absolute Frequency 200 180 160 140 120 100 80 60 Absolute Frequency 40 20 0 Interpretation of Frequency Distribution Exercise on Construction of Frequency Distribution Relative Frequency • The relative frequency is the absolute frequency of each interval divided by the total number of observations. Using Pivot Table/Grouping to Create Frequency Distribution Step#1. Select/activate any cell in the data base, and insert a pivot table. Step#2. Drag and Drop Return field in Row Labels and Value Boxes. Step#3. Select any return in the row area of the pivot table and open the Grouping Dialog box by right clicking on the cell. Step#4. Excel has already entered the lowest and highest values in the Return field in the starting and ending at boxes. It has also suggested a step increment in the box next to By. Select the Interval Gap and Click OK. Frequency Distribution Using Excel Pivot Tables Count of Return 2007 (%) -8.33--6.33 1 -2.33--0.33 4 -0.33-1.67 7 1.67-3.67 11 3.67-5.67 6 5.67-7.67 7 7.67-9.67 4 9.67-11.67 6 11.67-13.67 7 13.67-15.67 3 15.67-17.67 8 17.67-19.67 2 19.67-21.67 8 21.67-23.67 5 25.67-27.67 1 27.67-29.67 1 31.67-33.67 2 37.67-39.67 1 39.67-41.67 1 43.67-45.67 1 Grand Total 86 Total 12 10 8 6 Total 4 2 0 -8.33--6.33 -2.33--0.33 -0.33-1.67 1.67-3.67 3.67-5.67 5.67-7.67 7.67-9.67 9.67-11.67 11.67-13.67 13.67-15.67 15.67-17.67 17.67-19.67 19.67-21.67 21.67-23.67 25.67-27.67 27.67-29.67 31.67-33.67 37.67-39.67 39.67-41.67 43.67-45.67 Row Labels Measures of Central Tendency 1. Arithmetic Mean: Analysts and portfolio managers often want one number that describes a representative possible outcome of an investment decision. The arithmetic mean is by far the most frequently used measure of the middle or centre of data. Definition: Sum of Observations/Number of Observations. Arithmetic Mean • Population Mean μ = ∑i=1N Xi / N N= Number of observations in population. • Sample Mean X- = ∑i=1n Xi /n n= Number of observations in sample. Cross Sectional vs. Time Series Mean • Cross Sectional Mean: Sample might be the 2013 return on equity (ROE) for 30 companies in the BSE Sensex. In this case we calculate mean ROE in 2013 as an average across 30 largest market cap firms in BSE. • When we examine the characteristics of some units at a specific point in time, we are examining Cross-Sectional-Data. Cross Sectional vs. Time Series Mean • If our sample consists of historical monthly returns on BSE Sensex for the past five years, then we have a time series data. • Panel Data: Mix of Cross-Sectional Data and Time Series Data. Properties of Arithmetic Mean • Arithmetic mean can be linked to the centre of gravity of an object. • Deviations from the arithmetic mean forms the foundation for more complex concepts of risks like: variance, skewness, kurtosis, and heteroscedasticity. • Advantage of Arithmetic Mean: it uses all information about size and magnitude of the observation. • Limitation: Sensitivity to extreme values, e.g., Returns = 2%, 3%, 6%, and -80%. The Geometric Mean (G.M).. • The geometric mean is most frequently used to average rates of change over time to compute the growth rate of a variable. • In Investment and valuation, we frequently use the G.M to average a time series of rates of return on an asset or a portfolio or to compute growth rate of financial variables such as earning or sales. The Geometric Mean.. • G = n√X1. X2. X3……..Xn with Xi >= 0 for i= 1,2,3,….n The above equation has a solution, and geometric mean exists, only if the product under the radical sign is non-negative. Q. Is that O.K for Stock returns? Q. What can be most negative value of Stock Return? Geometric Mean Return Formula • RG = [∏t=1T (1+Rt)1/T] – 1 Example.. GM and AM.. Q. As a mutual fund analyst, you are examining, as of early 2016, the most recent five years to total returns for two U.S large-cap value equity mutual funds: Year SLAX PRFDX 2002 16.2% 9.2% 2013 20.3% 3.8% 2014 9.3% 13.1% 2015 -11.1% 1.6% 2016 -17.0% -13.0% GM and AM Fund Arithmetic Mean Geometric Mean Difference (BPS) SLASX 3.54% 2.43% 111 PRFDX 2.94% 2.53% 41 •Geometric Mean Return is less than the Arithmetic Mean Return. In fact GM is always less than the AM. •The only time AM and GM are equal when there is no variability in the observations- that is when all the observations in the series are same. •In general, the difference between AM and GM increases with the variability in the period by period observations. •Returns of SLASX are more variable than the PRFDX, and consequently the spread between AM and GM is larger for SLASX. •Although SLASX has higher AM return, PRFDX has the higher GM return. How the Analyst should interpret this result? • AM vs. GM Example.. • A hypothetical investment in a single stock costs $100. one year later, the stock is trading at $200. at the end of the second year, the stock price falls back to the original purchase price of $100. No dividends are paid during the period of past two years. • The arithmetic mean return reflects the average of the one-year returns, while geometric return is compound rate of return. Solve for AM and GM Quartile (1/4), Quintile (1/5), Deciles (1/10) and Percentiles (1/100) • Let y be the value at or below which y percent of the distribution lies, or the yth percentile. • For example, P18 is the point below which 18 percent of the observations lie; 100-18 =82% are greater than P18.) • The formula for the position of a percentile in an array with n entries sorted in ascending order is: Ly = (n+1) * (y/100) Quartile and Percentile • Calculate the 10th and 90th percentiles. • Calculate 1st, 2nd, and 3rd quartiles. • What is the median? Quintiles in Investment Practice • Investment analysts use quintiles every day to rank performance-performance of portfolios and stocks. • We can illustrate the use of quintiles, in particular quartiles, in investment research using the example of Bauman, Conover, and Miller (1998). • That study compared the performance of international growth stocks to value stocks. • Typically growth stocks are defined as those for which the market price (MPS) is relatively low in relation to EPS, BPS, and DPS. • Growth stocks on the other hand, have comparatively high prices in relation to those same measures. Quintiles in Investment Practice • The Bauman et al. classification criteria were following valuation measures: price to earnings, price to cash flow, price of book value, dividend yield (D/P). • They assigned one fourth of the of the total sample with the lowest P/E to quartile 1, and the 1/4th with the highest P/E to Quartile to Q4. The stocks with the second highest P/E formed Q3 and stocks with second lowest P/E formed Q2. Interpretation.. • Small company portfolio had a median market cap of $46.6 million AND the large company portfolio had a median value of $2472.3 million. • Large companies were 50 times larger than small companies, yet their mean returns were less than the half those of small companies. • Overall, Bauman et al. found two effects. First, international value stocks outperformed international growth stocks. Second, international small stocks outperformed international large stocks. • Next step for analysis: how value and growth stocks performed while controlling for size? • This involved constructing 16 different value/growth and size portfolios (4x4), they found that value stocks outperformed growth stocks except when market capitalization was very small. Measures of Risk 1. Mean Absolute Deviation (MAD) = ∑i=1n |Xi – X-|/n 2. Population Variance and Standard Deviation 3. Sample Variance and Standard Deviation. Note: Standard Deviation is dispersion from the arithmetic mean, and not the geometric mean. Semi-variance & Semi-deviation • Variance or standard deviation of returns take account of returns above and below mean, but investors are concerned only with the downside risk, for example, returns below the mean. • Semi-variance Computation: 1. Calculate sample mean 2. Identify observations that are smaller than mean (discarding observations equal to or greater than mean); suppose n* are smaller than the mean. 3. Compute the sum of squared negative deviations from mean (using n* observations that are smaller than mean) 4. Divide the sum of squared negative deviations by n* -1. ∑for all Xi<X- (Xi – X-)2/(n2 – 1) Coefficient of Variation • We may sometimes find it difficult to interpret what Std. Deviation means in terms of the relative degree of variability of different set of data, however, either because the data sets have markedly different means or because the datasets have different units of measurement. • The coefficient of Variation is useful in such situations: • CV = s/X- or CV = SD/Mean CV The Sharpe Ratio • Sh= (RP- - RF)/Sp • Where RP = Return on Portfolio, RF = Risk Free Return, Sp = S.D of the portfolio • Although CV was designed as a measure of relative dispersion, its inverse reveals something about return per unit of risk because the standard deviation of returns is commonly used as measure of investment risk. Sharpe Ratio • Finance theory tells us that in the long run, investors should be compensated with additional mean return above risk free rate for bearing additional risk, at least if the risky portfolio is well diversified. Sharpe Ratio Moments and Central Moments • The shape of a probability distribution can be described by the “moments” of the distribution. • Raw moments are measured relative to an expected value raised to the appropriate power. • The first moment is the mean of the distribution, which is expected value of returns: E(Rk) = ∑ni=1 pi x Rik Moments and Central Moments • Central moments are measured relative to the mean (i.e., central around the mean). The kth central moment is defined as: • E(R- μ)k = 𝒏𝒊=𝟏 𝑷𝒊 𝑹𝒊 − 𝝁 𝒌 • Since central moments are measured relative to the mean, the first central moment equals zero and is, therefore, not typically used. The Second Central Moment (Variance) • The second central moment is the variance of the distribution, which measures the dispersion of data. • Variance = σ2 = E[(R – μ)2] • Since the moments higher than the second central moment can be difficult to interpret, they are typically standardized by dividing the central moment by σk. The Third Central Moment (Skew) • The third moment measures the departure from symmetry in the distribution. This moment will be equal to zero for a symmetric distribution (such as normal distribution). • Third central moment = E [ (R – μ)3] • The skewness statistics is the standardized third central moment. Skewness refers to the extent to which the distribution of data is not symmetric around its mean. • Skewness = E [(R – μ)3]/σ3 The Fourth Central Moment • The fourth central moment measures the degree of clustering in the distribution. • Fourth central moment = E[(R - μ)4] • The kurtosis statistics is the standardized fourth central moment of the distribution. Kurtosis refers to the degree of peakedness or clustering in the data distribution and is calculated as: • Kurtosis = E[(R - μ)4]/σ4 The Fourth Central Moment (Kurtosis) • Kurtosis for the normal distribution equals roughly 3. therefore, the excess kurtosis for any distribution equals: • Excess Kurtosis = Kurtosis - 3 Symmetry and Skewness in Returns Distribution • Mean and Variance may not adequately describe an investment’s distribution of returns. • In calculation of variance, for example, the deviations around the mean are squared, so do not know whether large deviations are likely to be positive and negative. • One important characteristic of interest to analyst is the degree of symmetry in return distribution. Normal Distribution • Its mean and median are equal. • It is completely described by two parametersits mean and variance. • Roughly 68.3% of its observations lie between +/- 1 SD from mean, 95.5% of its observations lie between +/-2 SD from mean and 99.7% of observations lie between +/-3 SD from mean. • A distribution not normal is called Skewed. Normal Distribution Skewness • Nonsymmetrical distributions may be either positively or negatively skewed and result from occurrence of outliers in the data set. • Outliers are observations with extraordinary large values, either positive or negative. • A Positive Skewness • A Positively skewed distribution is characterized by many outliers in the upper region, or right tail. • The mean is affected by the outliers, is a positively skewed distribution, there are large positive outliers which will “pull” the mean upward, or more positive. Exp. 100 products, 99 sold for Rs.1,00,000, and 1 sold for Rs. 10,00,000 • Mode = Rs. 1,00,000, Median = Rs.1,00,000. • Mean = 99,00,000+10,00,000/100= 1,09,00,000/100 = Rs. 1,09,000 Positively Skewed Distribution A Negatively Skewed Distribution • A negatively skewed distribution has a disproportionately large number of outliers on the lower side of the tail (left tail). • For a negatively skewed, unimodal distribution, the mean is less than the median, which is less than the mode. In this case large, negative outlier tend to “Pull” the mean downwards (to the left). • Exp. 100 products, 99 sold for 1,00,000 and 1 sold for 10,000. Mean = {99,00,000 + 10,000}/ 100 = 99,100, while mode and median = 1,00,000. Negatively Skewed Distribution Skewness • Positive Skewness: A Return distribution with positive skew has frequent small losses and a few extreme gains. It has a long tail on its right side. Mode< Median<Mean • Negative Skewness: A Return distribution with negative skew has frequent small gains and few extreme losses. Mean<Median<Mode Skewness of a Sample • SK = [n/(n-1)(n-2)] x ∑i=1n (Xi – X-)3 S3 Q. Calculating Skewness for a Mutual Fund. Skewness.. Skewness Interpretation of Skewness.. • Based on this small sample of mutual fund, the distribution of annual returns for the fund appears to be approximately symmetric (very slightly negatively skewed). • The negative and position deviations are equally frequent and large positive deviations approximately offset negative deviations. Kurtosis • Another way in which a return distribution might differ from a normal distribution is by having more returns clustered closely around the mean (being more Peaked) and more returns with large deviations from the mean (having fatter Tails). • Relative to normal distribution, such a distribution has a greater percentage of small deviations from mean return (more small surprises) and a greater percentage of extremely large deviations from the mean return (more big surprises). • Most investors would perceive a greater chance of extremely large deviations from the mean as increasing risk. Kurtosis • Kurtosis is the measure that tells us when a distribution is more or less peaked than a normal distribution. • A distribution that is more peaked than normal is called leptokurtic. (more frequent extremely large surprises) • A distribution that is less peaked than the normal is called platykurtic. • Identical to normal distribution – mesokurtic. Kurtosis Kurtosis Excess Kurtosis Interpretation of Kurtosis • The Distribution seems to be mesokurtic, based on sample excess kurtosis close to zero. With skewness and excess kurtosis both close to zero, annual returns seems to have been normally distributed. Kurtosis- Interpretation • Most research about the distribution of the securities returns has shown that returns are not normally distributed. • Actual securities returns tend to exhibit both skewness and kurtosis. • Skewness and kurtosis are critical concept of risk management because when securities returns are modeled using an assumed normal distribution, the predictions from the models will not take into account the potential for extremely large, negative outcomes. Skewness-Kurtosis- Interpretation • In fact, most risk managers put very little emphasis on the mean and standard deviation of a distribution and focus more on the distribution of the returns in the tails of the distribution – that is where the risk is. • In general, greater positive kurtosis and more negative skew in returns distributions indicates increased risk. Covariance and Correlation • Covariance (Ri, Rj) = E{[Ri – E(Ri)][Rj – E(Rj)]} • Correlation (Ri, Rj) = Covar (Ri, Rj) /σ(Ri) σ(Rj) Coskewness and Cokurtosis • Previously, we identified moments and central moments for mean and variance. In a similar fashion, we can identify cross central movements for the concept of covariance. • The third central movement is called Coskewness and fourth one is called Cokurtosis. Coskewness and Cokurtosis Stock Returns Stocks Time 1 2 3 1 0.00% -2.40% -12.60% -12.60% -1.20% -12.60% 2 -2.40% -12.60% -5.30% -5.30% -7.50% -5.30% 3 -12.60% 2.40% 0.00% -2.40% -5.10% -1.20% 4 -5.30% -5.30% -2.40% 12.60% -5.30% 5.10% 5 2.40% 0.00% 2.40% 0.00% 1.20% 1.20% 6 5.30% 5.30% 5.30% 5.30% 5.30% 5.30% 7 12.60% 12.60% 12.60% 2.40% 12.60% 7.50% Mean 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% Median 0.00% 0.00% 0.00% 0.00% -1.20% 1.20% Mode #N/A #N/A #N/A 4 Portfolio A #N/A #N/A Portfolio B #N/A Variance 0.0064 0.0064 0.0064 0.0064 0.0050 0.0050 Skewness 0.0000 0.0000 0.0000 0.0000 0.9538 -0.9538 Kurtosis 0.4931 0.4931 0.4931 0.4931 0.3141 0.3141 Covriance 1,2 0.0031 Covariance 3,4 0.0031 Coskewness and Cokurtosis • Although returns vary over time, the mean, variance, skewness and kurtosis of all stocks returns are same under this scenario. • In addition, the covariance between returns of stock 1,2 and stock 3,4 are same. • By combining stock 1,2 in portfolio A, and stock 3,4 in portfolio B, we find that return of portfolio A and B have the same mean and variance. • However, these combined returns do not have the same skewness (i.e., the Coskewness between stocks in the portfolio is different). Coskewness and Cokurtosis • The reason for this difference is that the ranking of returns over time (e.g., from best to worst) is different for each stock, and when combined in a portfolio, these differences skew the portfolio returns distributions. • For example, the worst return for Stock 1 occurred during time period 3, but in portfolio A, the worst return occurred during time period 2. • Similarly, the best return for stock 4 occurred during the time period 4, but in portfolio B, the best return occurred during the time period 7. Coskewness and Cokurtosis • From a risk management point of view, it is helpful to know that the worst outcome in Portfolio B is 1.7 times greater than the worst outcome in Portfolio A. • So, although the mean and variance of these portfolios are equal, shortfall risk expectations can differ depending upon time period. • This is important information to know, however most risk models choose to ignore the effects of Coskewness and Cokurtosis. The reason being that as the number of variables increase, the number of Coskewness and Cokurtosis terms will increase rapidly, making data more difficult to analyze. • Practitioners instead opt to use more tractable risk models, such as GARCH, which captures the essence of Coskewness and Cokurtosis by incorporating time-varying volatility and/or time varying correlation. Using GM and AM Why GM is appropriate for making investment statements about past performance? Why arithmetic mean is appropriate for making investment statements in a forward looking context?