Measuring Surprises
It is common in academic research to estimate an unexpected value (or surprise) of a variable; e.g., is
this year's income higher or lower than expected? We can use a regression model to give us this
expectation. In the Ball and Brown article, the authors estimate unexpected net income as the residual
of a simple regression of current net income on last year's net income:
NI_t = α + β·NI_{t-1} + ε
This regression calculates α̂ and β̂, which are estimates of the true, unobservable parameters α and β.
This regression may be estimated for a single firm (e.g., years 2000 through 2014 for Microsoft; this is a
time-series regression, giving the estimates for a single firm), or for a larger sample of firms (for all
publicly traded companies, this is called a cross-sectional regression) over one or more years. There are
advantages to both methods.
With the estimated α̂ and β̂, we can calculate the expected value of NI for year t given the year t-1 value.
For example, suppose the year t-1 NI value is 0.08, α̂ is 0, and β̂ is 0.7; the expected value is then 0.7 *
0.08 = 0.056. If the actual value of NI observed in year t is 0.09, the difference between the two is
0.09 - 0.056 = 0.034: income in year t was higher than expected. This difference is called a residual. The
residual is commonly used as a proxy for an unexpected amount. Ball and Brown use a similar method to
calculate unexpected returns.
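As a sketch, the time-series estimation and the worked example above can be reproduced in Python with numpy (the net income series below is made up purely for illustration):

```python
import numpy as np

# Illustrative (made-up) net income series for one firm, years t = 1..n
ni = np.array([0.05, 0.06, 0.055, 0.07, 0.08, 0.075, 0.09, 0.085, 0.10, 0.095])

# Regress NI_t on NI_{t-1}: NI_t = alpha + beta * NI_{t-1} + eps
x = ni[:-1]   # NI_{t-1}
y = ni[1:]    # NI_t
beta_hat, alpha_hat = np.polyfit(x, y, 1)  # polyfit returns slope first

# Expected NI for the latest year, and its residual (the "surprise")
expected = alpha_hat + beta_hat * ni[-2]
surprise = ni[-1] - expected

# The worked example from the text: alpha-hat = 0, beta-hat = 0.7
expected_example = 0 + 0.7 * 0.08     # 0.056
surprise_example = 0.09 - expected_example  # 0.034
```

The naive Ball and Brown model is the special case alpha-hat = 0, beta-hat = 1, in which the surprise is simply the change `ni[-1] - ni[-2]`.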
Ball and Brown use a naïve model in which the net income surprise is simply measured as the difference
between net income in years t and t-1. In the simple example above, the change in net income is 0.09 - 0.08 =
0.01. Note that this assumes that α is 0 and β is 1.
Earnings Management (Abnormal or Discretionary Accruals)
We will see several papers that explore aggressive accounting choices made by managers โ€“ earnings
management. Earnings management is often assumed to be most observable in accruals (accounts
receivable, inventory, etc.). A regression model can be used to estimate the expected and unexpected
values of a variable of interest.
To study earnings management many papers estimate โ€œdiscretionary accrualsโ€ using a regression
framework and interpret them as the portion of earnings that is attributable to evil managers exercising
their diabolical will. Discretionary accruals are estimated in a regression using accruals as the
dependent variable. A regression equation expresses a dependent variable (accruals) as a function of
several explanatory or independent variables.
To estimate discretionary accruals, we could estimate the following regression:
๐ด๐‘๐‘๐‘Ÿ๐‘ข๐‘Ž๐‘™๐‘ ๐‘ก = ๐›ผ + ๐›ฝ1 ๐‘…๐ธ๐‘‰๐‘ก + ๐›ฝ2 ๐‘ƒ๐‘ƒ๐ธ๐‘ก + ๐›ฝ3 ๐‘…๐‘‚๐ด๐‘ก + ๐œ–
where accruals is estimated as the difference between net income and operating cash flow; REV
is estimated as the change in revenue less the change in accounts receivable; ROA is net income
divided by average assets.
According to this equation, accruals are a function of 1) assets 2) Change in Revenues/AR 3) PPE and 4)
performance as measured by ROA and 5) an unknown error term ๐œ–.
This regression equation can be estimated with Ordinary Least Squares (OLS) in Excel or a number of
other software packages. OLS estimates the coefficients by minimizing the sum of squared residuals,
i.e., the squared distances between the observed values and the regression line.
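As a sketch, the accruals regression above can be estimated with numpy's least-squares routine. The firm data and "true" coefficient values below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical cross-section of firms

# Made-up explanatory variables, matching the text's definitions
rev = rng.normal(0.05, 0.10, n)   # change in revenue less change in AR
ppe = rng.normal(0.40, 0.15, n)   # property, plant, and equipment
roa = rng.normal(0.03, 0.08, n)   # net income / average assets

# Simulate accruals with known (hypothetical) coefficients plus noise
accruals = 0.02 + 0.5 * rev - 0.1 * ppe + 0.3 * roa + rng.normal(0, 0.02, n)

# OLS: minimize the sum of squared residuals
X = np.column_stack([np.ones(n), rev, ppe, roa])
coefs, *_ = np.linalg.lstsq(X, accruals, rcond=None)
alpha_hat, b_rev, b_ppe, b_roa = coefs

# Predicted (expected) accruals, and residuals (the discretionary-accruals proxy)
predicted = X @ coefs
residuals = accruals - predicted
```

With an intercept included, the OLS residuals average to zero by construction, so only their cross-sectional variation is informative.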
The coefficient estimates are often of interest -- the value of the intercept (α̂) and the slope estimates (the β̂'s). There
may be a hypothesized value and the closeness of the estimated coefficient to this value tells us
something about the relationship between the variables examined, and whether a particular hypothesis is
correct.
With the estimated coefficients, you can compute the predicted values. That is, given the values of the
explanatory variables, we can calculate what the accruals are expected to be. If you have ever used the
Z Score, which predicts bankruptcy, this is exactly what is being done. The original Z Score coefficients
were estimated using a regression, and they can be used on other data to give a score -- this is the
predicted value.
Of course, the predicted value will not generally be equal to the observed value (e.g. some firms with high
Z scores may not go bankrupt, while some firms with low Z scores may still go bankrupt). The difference
between the observed (or actual) value and the predicted value is called the residual. Residuals are
analyzed to see whether an estimated model has statistical power to explain the dependent variable. The
residual is what is unexplained by the model.
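A small sketch of reusing previously estimated coefficients on new data, in the spirit of the Z Score discussion above (all coefficient and variable values here are hypothetical):

```python
# Hypothetical coefficients, estimated earlier on a different sample
# (this mirrors how the original Z Score coefficients are reused on new data)
alpha_hat, b1, b2, b3 = 0.02, 0.5, -0.1, 0.3

# One new firm-year's explanatory variables (made-up values)
rev, ppe, roa = 0.06, 0.35, 0.04

predicted = alpha_hat + b1 * rev + b2 * ppe + b3 * roa  # expected accruals
observed = 0.05                                         # actual accruals
residual = observed - predicted                         # the unexplained part
```

Here the predicted value is 0.027, so the residual of 0.023 is the portion of accruals the model cannot explain.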
In many papers, the residual plays an important role. It is assumed to represent managerial
discretion. Extreme values of the residual are interpreted as managers manipulating earnings. A large
positive residual indicates earnings are managed up, while a negative residual indicates earnings are
managed down. This is admittedly a crude measure of manipulation. There may be a large residual
simply because the regression model excludes a key variable. In some instances, we may not care if
earnings are being managed up or down, just whether there is any funny business going on. In this case,
we may take the absolute value of the residual.
When a regression model is estimated, each estimated coefficient has a measure of variation called the
standard error; this is like the standard deviation of the estimate, and a lower standard error means the
coefficient is estimated more precisely. The coefficient estimate can be divided by its standard error to
give us a t-value. A t-value that is large enough in absolute value (e.g. |t| > 2) indicates that the coefficient
estimate is significantly different from zero. The t-value corresponds to a p-value. The p-value tells us the
probability of observing a coefficient estimate this far from zero if the true, unobserved coefficient were
actually zero. A t-value of 2.00 roughly corresponds to a p-value of 0.05, and a t-value of around 2.5
corresponds to a p-value of about 0.01. The actual p-values depend on the sample size (large sample sizes
give smaller standard errors and thus more statistical power -- lower p-values).
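A minimal sketch of these calculations for a simple one-variable regression on simulated data. The p-value here uses the normal approximation to the t-distribution (reasonable at moderate sample sizes; the exact value would use the t-distribution with n - 2 degrees of freedom):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)  # true slope is 0.8

# Simple OLS slope and intercept
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

# Standard error of the slope: sqrt(s^2 / sum((x - xbar)^2)),
# where s^2 = sum of squared residuals / (n - 2)
resid = y - (alpha + beta * x)
s2 = (resid ** 2).sum() / (n - 2)
se_beta = math.sqrt(s2 / ((x - x.mean()) ** 2).sum())

t_value = beta / se_beta

# Two-sided p-value under the normal approximation:
# P(|Z| > t) = erfc(|t| / sqrt(2)) for a standard normal Z
p_value = math.erfc(abs(t_value) / math.sqrt(2))
```

With a true slope this far from zero, the t-value comfortably exceeds 2 and the p-value is well below 0.05.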
Real Earnings Management
In addition to manipulating accruals, managers may take real actions (directly affecting cash flow) to
manipulate earnings. In the literature, researchers have explored: reductions in R&D and Advertising
expenditures (discretionary expenditures); increases in inventory production to allocate fixed costs over
more units, effectively reducing Cost of Goods Sold and increasing income; channel stuffing (providing
discounts to increase sales). Using methods similar to those above, we can estimate regression
equations to give estimates of โ€œsurprisesโ€ of each of these.
So in sum:
Dependent variables are what we are trying to explain (in this paper the key dependent variable is
managerial discretion, which is estimated as the absolute value of the residual of another regression)
Independent variables are variables hypothesized to be related to a dependent variable.
A regression is the method of estimating the coefficients of independent variables in predicting
dependent variables.
Predicted values are the result of using the estimated coefficients and the independent variables -- they
tell us the best guess of the dependent variable given the values of the independent variables.
Residuals are what is not explained by the regression model; a residual is the distance an observed value
of the dependent variable is from the regression line.
Standard errors tell us how precise the coefficient estimates are. A large standard error (large relative to
the coefficient estimate) tells us the estimate is not very precise, and the significance is low.
A t-value is the coefficient estimate divided by its standard error. Values that are large in absolute value
indicate the coefficient is significant.
A p-value corresponds to a t-value, and tells us the probability of seeing an estimate that extreme if the true coefficient were zero.
I. Discussion of Statistics:
a. Tests of Significance
i. Normal Distribution ("Moments" of the distribution: N(µ, σ²))
1. Probability density function: pdf
2. Cumulative distribution function: cdf
ii. Mean (average): the first moment, computed as the sum of values divided by the
number of observations. The sample average x̄ (read as x-bar) is an estimate of
the unknown population mean (µ, the Greek letter mu).
iii. Variance: the second moment, computed as the average squared difference
between each value and the mean (σ², where σ is the Greek letter sigma). When
estimating the variance from a sample (not the entire population), the sum of
squared differences is divided by n-1 rather than n to give the sample estimate s².
(You would prove in a statistics class that dividing by n-1 gives you an unbiased
estimate of σ².)
iv. Standard Deviation: the square root of the variance. Interpretation: under
normality, 68% of all observations fall within 1 STD of the mean, 95% fall within
2 STD of the mean, and 99.7% fall within 3 STD of the mean.
v. Standard Error: the standard deviation divided by sqrt(NOBS); it gives us a sense of
the precision of the estimate of the mean. Much of statistics and econometrics
(statistics applied to economics) deals with a variety of issues known to affect
the standard error.
vi. Significance of an estimate: the probability that an estimate is different from
(or equal to, depending on how you frame it) a hypothesized value. This
corresponds to the area under the curve of a pdf or cdf.
vii. T-Distribution: arises when the parameters of the normal distribution must be
estimated from a sample; it has fatter tails, but approaches the normal
distribution as n approaches infinity.
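The sample moments above (mean, variance, standard deviation, standard error) can be computed directly; a quick Python sketch with a made-up sample:

```python
import math

# A small made-up sample
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n                                 # x-bar, the sample average
var = sum((x - mean) ** 2 for x in data) / (n - 1)   # s^2, note the n-1 denominator
std = math.sqrt(var)                                 # sample standard deviation
se = std / math.sqrt(n)                              # standard error of the mean
```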