Measuring Surprises

It is common in academic research to estimate an unexpected value (or surprise) of a variable; e.g., is this year's income higher or lower than expected? We can use a regression model to give us this expectation. In the Ball and Brown article, the authors estimate unexpected net income as the residual of a simple regression of current net income on last year's net income:

NI_t = α + β NI_{t-1} + ε

This regression produces α̂ and β̂, which are estimates of the true unobservable parameters α and β. The regression may be estimated for a single firm (e.g., years 2000 through 2014 for Microsoft; this is a time-series regression, giving estimates for a single firm), or for a larger sample of firms over one or more years (for all publicly traded companies, this is called a cross-sectional regression). There are advantages to both methods.

With the estimated α̂ and β̂, we can calculate the expected value of NI for year t given the year t-1 value. For example, suppose the year t-1 NI value is 0.08, α̂ is 0, and β̂ is 0.7; the expected value is then 0.7 * 0.08 = 0.056. If the actual value of NI observed in year t is 0.09, the difference between the two is 0.09 - 0.056 = 0.034: income in year t was higher than expected. This difference is called the residual, and it is commonly used as a proxy for an unexpected amount. Ball and Brown use a similar method to calculate unexpected returns.

Ball and Brown also use a naïve model in which the net income surprise is simply measured as the difference in net income between years t and t-1. In the simple example above, the change in net income is 0.09 - 0.08 = 0.01. Note that this assumes that α is 0 and β is 1.

Earnings Management (Abnormal or Discretionary Accruals)

We will see several papers that explore aggressive accounting choices made by managers, i.e., earnings management. Earnings management is often assumed to be most observable in accruals (accounts receivable, inventory, etc.).
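The surprise calculation above can be sketched in Python. This is a minimal illustration, not the Ball and Brown procedure itself: the net income series is made up, and the expected-value step reuses the handout's example numbers (α̂ = 0, β̂ = 0.7, NI values of 0.08 and 0.09).

```python
# Sketch of an earnings-surprise calculation via a simple regression.
# The net income series below is fabricated for illustration.

def ols(x, y):
    """Simple OLS of y on x: returns (alpha_hat, beta_hat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    beta = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical time series of (scaled) net income for one firm.
ni = [0.05, 0.06, 0.055, 0.07, 0.065, 0.08]

# Time-series regression of NI_t on NI_{t-1}.
lagged, current = ni[:-1], ni[1:]
alpha_hat, beta_hat = ols(lagged, current)

# Expected NI for year t given year t-1, using the handout's numbers:
# alpha_hat = 0, beta_hat = 0.7, NI_{t-1} = 0.08, actual NI_t = 0.09.
expected = 0 + 0.7 * 0.08   # 0.056
surprise = 0.09 - expected  # 0.034 -- income higher than expected
print(round(expected, 3), round(surprise, 3))
```

The residual (0.034) is the piece the model did not predict, which is exactly what gets used as the "surprise" proxy.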
A regression model can be used to estimate the expected and unexpected values of a variable of interest. To study earnings management, many papers estimate "discretionary accruals" using a regression framework and interpret them as the portion of earnings that is attributable to evil managers exercising their diabolical will. Discretionary accruals are estimated in a regression using accruals as the dependent variable. A regression equation expresses a dependent variable (accruals) as a function of several explanatory or independent variables. To estimate discretionary accruals, we could estimate the following regression:

Accruals_t = α + β1 REV_t + β2 PPE_t + β3 ROA_t + ε

where accruals is estimated as the difference between net income and operating cash flow; REV is estimated as the change in revenue less the change in accounts receivable; and ROA is net income divided by average assets. According to this equation, accruals are a function of 1) the change in revenues less the change in receivables (REV), 2) PPE, 3) performance as measured by ROA, and 4) an unknown error term ε.

This regression equation can be estimated with Ordinary Least Squares (OLS) in Excel or a number of other software packages. OLS estimates the coefficients by minimizing the sum of squared residuals, i.e., the squared distances between the observations and the regression line. The coefficient estimates themselves are often of interest: the value of the intercept (α̂) and the β̂ coefficients. There may be a hypothesized value, and the closeness of the estimated coefficient to this value tells us something about the relationship between the variables examined and whether a particular hypothesis is correct.

With the estimated coefficients, you can compute the predicted values. That is, given the values of the explanatory variables, we can calculate what accruals are expected to be. If you have ever used the Z Score, which predicts bankruptcy, this is exactly what is being done.
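A multivariate version of this estimation can be sketched with NumPy. All firm data below is fabricated, and the coefficients used to generate it are arbitrary; the point is only the mechanics of estimating the equation and recovering residuals.

```python
# Illustrative sketch of an accruals regression estimated by OLS.
# Every number here is made up for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Fabricated explanatory variables for a cross-section of firms.
rev = rng.normal(0.05, 0.10, n)  # change in revenue less change in AR
ppe = rng.normal(0.40, 0.15, n)  # property, plant & equipment
roa = rng.normal(0.04, 0.08, n)  # net income / average assets

# Fabricated accruals: an assumed relationship plus noise.
accruals = 0.01 + 0.2 * rev - 0.05 * ppe + 0.3 * roa + rng.normal(0, 0.02, n)

# OLS for: Accruals = alpha + b1*REV + b2*PPE + b3*ROA + e
X = np.column_stack([np.ones(n), rev, ppe, roa])
coef, *_ = np.linalg.lstsq(X, accruals, rcond=None)

predicted = X @ coef
residuals = accruals - predicted  # the "discretionary" portion
print(np.round(coef, 3))
```

The `residuals` vector is what a discretionary-accruals paper would carry forward: the part of accruals the explanatory variables cannot account for.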
The original Z Score coefficients were estimated using a regression, and they can be applied to other data to give a score; this is the predicted value. Of course, the predicted value will not generally equal the observed value (e.g., some firms with high Z scores may not go bankrupt, while some firms with low Z scores may still go bankrupt). The difference between the observed (or actual) value and the predicted value is called the residual. Residuals are analyzed to see whether an estimated model has statistical power to explain the dependent variable. The residual is what is unexplained by the model.

In many papers, the residual plays an important role. It is assumed to represent managerial discretion. Extreme values of the residual are interpreted as managers manipulating earnings. A large positive residual indicates earnings are managed up, while a negative residual indicates earnings are managed down. This is admittedly a crude measure of manipulation; there may be a large residual simply because the regression model excludes a key variable. In some instances, we may not care whether earnings are being managed up or down, just whether there is any funny business going on. In that case, we may take the absolute value of the residual.

When a regression model is estimated, each estimated coefficient has a measure of variation called the standard error. This is like the standard deviation of the estimate: a lower standard error means we have higher confidence in the coefficient estimate. The coefficient estimate can be divided by its standard error to give us a t-value. A t-value that is large enough (e.g., > 2) indicates that the coefficient estimate is significantly positive. The t-value corresponds to a p-value: the probability of observing a coefficient estimate this far from zero if the true, unobserved coefficient were actually zero. A t-value of 2.00 roughly corresponds to a p-value of 0.05 (roughly a 5% chance of seeing an estimate this large if the true coefficient were NOT different from zero).
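The t-value and p-value arithmetic can be sketched directly. The coefficient and standard error below are hypothetical, and the p-value uses the standard normal approximation (close to the t-distribution in large samples) rather than an exact t-distribution lookup.

```python
# Sketch: turning a coefficient estimate and its standard error into
# a t-value and an approximate two-sided p-value.
import math

def two_sided_p(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

coef_estimate = 0.30   # hypothetical coefficient
standard_error = 0.15  # hypothetical standard error

t_value = coef_estimate / standard_error  # 2.0
p_value = two_sided_p(t_value)            # about 0.046, roughly 0.05

print(round(t_value, 2), round(p_value, 3))
```

A t-value of 2 giving a p-value just under 0.05 matches the rule of thumb in the text.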
A t-value of around 2.5 would correspond to a p-value of about 0.01. The actual p-values depend on the sample size (large sample sizes give smaller standard errors and thus more statistical power, i.e., lower p-values).

Real Earnings Management

In addition to manipulating accruals, managers may take real actions (directly affecting cash flow) to manipulate earnings. In the literature, researchers have explored: reductions in R&D and advertising expenditures (discretionary expenditures); increases in inventory production to allocate fixed costs over more units, effectively reducing Cost of Goods Sold and increasing income; and channel stuffing (providing discounts to increase sales). Using methods similar to those above, we can estimate regression equations to give estimates of "surprises" in each of these.

So, in sum:

Dependent variables are what we are trying to explain (in this paper, the key dependent variable is managerial discretion, which is estimated as the absolute value of the residual of another regression).

Independent variables are variables hypothesized to be related to a dependent variable.

A regression is the method of estimating the coefficients on the independent variables in predicting the dependent variable.

Predicted values are the result of combining the estimated coefficients and the values of the independent variables; they tell us the best guess of the dependent variable given the values of the independent variables.

Residuals are what is not explained by the regression model. A residual is the distance of a dependent variable observation from the regression line.

Standard errors tell us how precise the coefficient estimates are. A large standard error (relative to the coefficient estimate) tells us the estimate is not very precise, and the significance is low.

A t-value is the coefficient estimate divided by its standard error. Large values indicate the coefficient is significant.

A p-value corresponds to a t-value and tells us the probability of observing an estimate that large if the true coefficient were zero.
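The sample-size point can be made concrete with the standard error of a mean, se = s / sqrt(n). The standard deviation below is made up; the sketch only shows how se shrinks as n grows.

```python
# Sketch: standard error shrinks with sample size (se = s / sqrt(n)).
import math

s = 0.10  # hypothetical sample standard deviation

se_small = s / math.sqrt(25)   # n = 25  -> 0.02
se_large = s / math.sqrt(100)  # n = 100 -> 0.01

# Quadrupling the sample size halves the standard error, so the same
# coefficient estimate would produce a t-value twice as large.
print(se_small, se_large)
```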
Discussion of Statistics:
a. Tests of Significance
   i. Normal distribution ("moments" of the distribution: N(µ, σ²))
      1. Probability density function: pdf
      2. Cumulative distribution function: cdf
   ii. Mean: the average, the first moment, computed as the sum of values divided by the number of observations. The sample average x̄ (read as "x-bar") is an estimate of the unknown population mean (µ, the Greek letter mu).
   iii. Variance: the second moment, computed as the average squared difference between each value and the mean (σ², where σ is the Greek letter sigma). When estimating the variance from a sample (not the entire population), the sample estimate s² divides by n-1 rather than n. (You would prove in a statistics class that dividing by n-1 gives an unbiased estimate of σ².)
   iv. Standard deviation: the square root of the variance. Interpretation: under normality, 68% of all observations fall within 1 STD of the mean, 95% fall within 2 STD, and 99.7% fall within 3 STD.
   v. Standard error: the standard deviation divided by sqrt(NOBS); gives us a sense of the precision of the estimate of the mean. Much of statistics and econometrics (statistics applied to economics) deals with a variety of issues known to affect the standard error.
   vi. Significance of an estimate: the probability that an estimate is different from (or equal to, depending on how you frame it) a hypothesized value. This corresponds to an area under the curve of a pdf or cdf.
   vii. t-distribution: arises when estimating the normal distribution from a sample; it has fatter tails, but approaches the normal distribution as n approaches infinity.
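The moments in the outline can be computed with Python's statistics module on a small made-up sample; note that statistics.variance uses the n-1 divisor described in item iii.

```python
# Sketch of the sample moments from the outline, on fabricated data.
import math
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(x)     # x-bar, the estimate of mu
var = statistics.variance(x)  # sample variance s^2 (divides by n-1)
std = math.sqrt(var)          # sample standard deviation
se = std / math.sqrt(len(x))  # standard error of the mean

print(mean, round(var, 3))
```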