Managerial Economics & Decision Sciences Department
Developed for Business Analytics II, session one – statistics appendix
statistical models: hypotheses, tests & confidence intervals
© 2016 Kellogg School of Management | Managerial Economics and Decision Sciences Department | Business Analytics II

random variables

► Economic and business environments are “plagued” with uncertainty: a fancy name for the idea that there are several possible outcomes, each occurring with a certain probability (likelihood). We represent this type of uncertainty through random variables.

► Rolling a die. Let X be the side that shows up. For a fair die each side shows up with equal probability 1/6, so X is represented by:

  outcome (X):    1     2     3     4     5     6
  probability:    1/6   1/6   1/6   1/6   1/6   1/6

► Stock returns. Let R be the daily return on IBM shares. Here the possible outcomes are “too many” to list as above. R is a continuous random variable, and in such cases the representation consists of:
  - the range of possible outcomes for R
  - the likelihood f(r) of each possible outcome r
The likelihood is not a probability per se, but it measures how likely an outcome is: if f(r1) > f(r2), returns are more likely to fall around r1 than around r2.
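The outcome/probability table for the die can be mirrored in a few lines of code. This is a minimal Python sketch (standard library only); the helper name `prob` is hypothetical and simply adds up the probabilities of the outcomes that satisfy an event.

```python
from fractions import Fraction

# Outcome -> probability table for a fair six-sided die (X = side that shows up)
die = {side: Fraction(1, 6) for side in range(1, 7)}

def prob(event, dist):
    """Probability of an event, given as a predicate on outcomes (hypothetical helper)."""
    return sum(p for outcome, p in dist.items() if event(outcome))

total = sum(die.values())                  # probabilities of all outcomes sum to 1
p_even = prob(lambda s: s % 2 == 0, die)   # Pr[X is even] = 3 * 1/6
print(total, p_even)                       # 1 1/2
```

Exact fractions are used so that the probabilities sum to exactly 1 rather than to a floating-point approximation.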
density functions

► The function f is called the density function. You are probably already familiar with one such function: the bell-shaped “distribution” of a normal random variable. STATA provides a very simple way to plot the standard normal (zero mean, unit standard deviation) density function:

  twoway function normalden(x), range(-4 4)

Figure 1. Normal density function

normalden(x) is STATA's built-in standard normal density function. All you need to specify is the range over which to plot it: range(a b) evaluates the function for x ranging from a to b. To obtain the value of normalden at a particular point, say x = 1, use display normalden(1).

Remark: The density is symmetric around the mean (which is 0 for the standard normal). A fact that is harder to see (or check) is that the area under the curve equals 1; this is the continuous counterpart of the property that the probabilities of all possible outcomes sum to 1.
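Both properties in the remark can be checked numerically. The sketch below is Python (standard library only), not STATA: `normalden` is a hypothetical helper that evaluates the standard normal density formula under the same name as STATA's function, and `area` approximates the area under a curve with the trapezoid rule. Truncating the range to [-8, 8] is an assumption; the density beyond that range is negligible.

```python
import math
from statistics import NormalDist

def normalden(x):
    """Standard normal density (mean 0, sd 1): exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def area(f, a, b, n=2000):
    """Approximate the area under f on [a, b] with the trapezoid rule."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

print(round(normalden(1), 7))                            # 0.2419707
print(math.isclose(normalden(1), NormalDist().pdf(1)))   # True: agrees with the stdlib density
print(normalden(1.5) == normalden(-1.5))                 # True: symmetric around 0
print(round(area(normalden, -8, 8), 6))                  # 1.0: total area under the curve
```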
► Normal distribution. A more general normal distribution has mean μ and standard deviation σ. Its density is connected to the standard normal density by

  $$f(x \mid \mu, \sigma) = \frac{1}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma} \,\middle|\, 0, 1\right)$$

which in STATA reads 1/σ*normalden((x-μ)/σ). STATA provides a very simple way to plot the normal density function for various means and standard deviations:

  twoway function miunormal = normalden(x-1), range(-4 4)

Here we plot the normal density for mean 1 and standard deviation 1. The “miunormal” in front of the normalden function is simply a label.

  twoway function sigmanormal = 0.5*normalden(x/2), range(-4 4)

Here we plot the normal density for mean 0 and standard deviation 2 (so 1/σ = 0.5). The “sigmanormal” in front of the normalden function is simply a label.

  twoway function msnormal = 1/sigma*normalden((x-miu)/sigma), range(-4 4)

This is the general template: replace miu and sigma with the mean and standard deviation you want (for example, miu = 0 and sigma = 2 reproduces the previous plot). The “msnormal” in front is again just a label. Don't forget to set the range you want!

► Normal(1,1): the shape stays symmetric, now around the new mean; the curve is simply shifted horizontally toward the new mean.
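The rescaling identity above can be verified term by term. In this Python sketch (standard library only), `normalden` and `normal_density` are hypothetical helpers: the first is the standard normal density, the second applies f(x | μ, σ) = (1/σ) f((x − μ)/σ | 0, 1). The checks mirror the two STATA expressions used above and compare against `statistics.NormalDist(mu, sigma).pdf`.

```python
import math
from statistics import NormalDist

def normalden(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_density(x, mu, sigma):
    """Normal density with mean mu and sd sigma via f(x|mu,sigma) = (1/sigma) f((x-mu)/sigma | 0,1)."""
    return normalden((x - mu) / sigma) / sigma

for x in (-2.0, 0.0, 1.0, 3.5):
    # normalden(x-1) is the mean-1, sd-1 density
    assert math.isclose(normal_density(x, 1, 1), normalden(x - 1))
    # 0.5*normalden(x/2) is the mean-0, sd-2 density
    assert math.isclose(normal_density(x, 0, 2), 0.5 * normalden(x / 2))
    # agrees with the standard library's normal density
    assert math.isclose(normal_density(x, 1, 2), NormalDist(1, 2).pdf(x))

# Shifting the mean moves the peak to x = mu without changing its height.
print(normal_density(1, 1, 1) == normalden(0))   # True
```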
Figure 2. Normal density function (different means)
Figure 3. Normal density function (different variances)

► Normal(0,2): the shape changes, with a lower “peak” at the mean and fatter tails. The intuition is straightforward: a higher standard deviation makes outcomes further away from the mean more likely.

Remark: To plot several curves on the same graph, list each function in its own set of parentheses, e.g. twoway (function normalden(x), range(-4 4)) (function normalden(x-1), range(-4 4)).

► The t-distribution. The t-distribution, also known as Student's distribution, is very useful for testing hypotheses. It has one parameter, df (degrees of freedom). For the moment, let's visualize the distribution using STATA (replace df with a number):

  twoway function tden(df,x), range(-4 4)

Figure 4. t-distribution density function (different means)
Figure 5. t-distribution density function (different df)

Remark: On the left df = 2, while on the right we added a t-distribution with df = 20. In both cases the resemblance to the (standard) normal distribution is quite striking!
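To put numbers on the comparison in the figures, this Python sketch evaluates the t density at the center (t = 0) and in the tail (t = 3) for df = 2 and df = 20, next to the standard normal. The helper `tden` is hypothetical: it implements the standard textbook formula Γ((df+1)/2) / (√(df·π) Γ(df/2)) · (1 + t²/df)^(−(df+1)/2), using `math.lgamma` for numerical stability. The intent is to show that the t density has a lower peak and fatter tails than the normal, and that it moves toward the normal as df grows.

```python
import math

def normalden(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def tden(df, t):
    """Student t density with df degrees of freedom (textbook formula)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

for df in (2, 20):
    # height at the center and in the tail, for each df
    print(f"t, df={df}: center {tden(df, 0):.4f}, tail at 3 {tden(df, 3):.4f}")
print(f"normal:     center {normalden(0):.4f}, tail at 3 {normalden(3):.4f}")
```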
cumulative functions

► Given a random variable X, the function F defined by

  $$F_X(x_0) = \Pr[X \le x_0]$$

is called the cumulative distribution function associated with X. F gives the probability that the random variable is less than or equal to a given cutoff level. Graphically, $F_X(x_0)$ is the area under the density curve f to the left of x0. STATA lets you obtain the cumulative distribution function directly.

Figure 6. The Normal distribution density function

The curve represents the density of the standard normal distribution (mean 0, standard deviation 1). STATA's normal(-1) returns 0.15865525, which is exactly the shaded area in the diagram.

quiz Using the above result, suppose you are asked for $F_X(0)$ when X is normal with mean 1 and standard deviation 1. Will this number be greater or smaller than the result above?
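The statement that normal(-1) is "the area to the left of -1" can be checked two ways in Python: with the closed form Φ(x) = ½(1 + erf(x/√2)) and by integrating the density directly. The helper names `normal`, `normalden` and `area_left` are hypothetical and only mirror STATA's names; cutting the left end of the integral at -10 is an assumption, since the density there is negligible.

```python
import math
from statistics import NormalDist

def normal(x):
    """Standard normal CDF, Pr[Z <= x], via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normalden(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def area_left(x0, lo=-10.0, n=20000):
    """Trapezoid-rule area under the standard normal density from lo up to x0."""
    h = (x0 - lo) / n
    inner = sum(normalden(lo + i * h) for i in range(1, n))
    return h * (0.5 * normalden(lo) + inner + 0.5 * normalden(x0))

print(round(normal(-1), 8))                          # 0.15865525
print(abs(area_left(-1) - normal(-1)) < 1e-7)        # True: same shaded area from the density
print(math.isclose(normal(-1), NormalDist().cdf(-1)))  # True
```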
Answer: A change in the mean does not change the shape of the normal density, only its “location”: it shifts the curve horizontally.

Figure 7. A shift in the Normal distribution density function

  twoway function normalden(x), range(-4 4)
  twoway function normalden(x-1), range(-3 5)

In both cases we are calculating the area to the left of the point (mean - 1): in the first case to the left of 0 - 1 = -1 for a normal with mean 0, and in the second case to the left of 1 - 1 = 0 for a normal with mean 1. The result is the same in both cases! The number is neither greater nor smaller: $F_X(0) = 0.15865525$.

► The connection between a normal distribution with mean μ and standard deviation σ and the standard normal distribution:

  $$F(x \mid \mu, \sigma) = F\!\left(\frac{x-\mu}{\sigma} \,\middle|\, 0, 1\right)$$

In STATA: normal((x-miu)/sigma), with miu and sigma replaced by numbers.

► It is very easy to calculate the probability that the random variable X is greater than a given cutoff x0:

  $$\Pr[X > x_0] = 1 - F_X(x_0)$$

Figure 8. Area to the right of a given cutoff

For a standard normal and x0 = 1, this is 1 - normal(1).

Remark: The relation above follows from the fact that the total area under the density curve f is 1. Since $F_X(x_0)$ is the area under this curve to the left of x0, the area to the right of x0, which is $\Pr[X > x_0]$, must be $1 - F_X(x_0)$.
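The quiz answer, the standardization formula and the complement rule can all be confirmed with Python's `statistics.NormalDist`. This is a minimal sketch; the helper `F` is hypothetical and implements F(x | μ, σ) = F((x − μ)/σ | 0, 1) on top of the standard normal CDF.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal: mean 0, sd 1
X = NormalDist(mu=1, sigma=1)    # the quiz variable

def F(x, mu, sigma):
    """Normal CDF via standardization: F(x | mu, sigma) = F((x - mu)/sigma | 0, 1)."""
    return Z.cdf((x - mu) / sigma)

print(round(X.cdf(0), 8), round(Z.cdf(-1), 8))  # quiz: 0.15865525 0.15865525
print(round(F(0, 1, 1), 8))                     # 0.15865525, same number via standardization
print(round(1 - Z.cdf(1), 8))                   # Pr[Z > 1] = 1 - F(1): 0.15865525
```

The last line equals the first by the symmetry of the standard normal around 0: the area to the right of 1 matches the area to the left of -1.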
► For the t distribution, STATA provides a function that directly calculates the probability that t is greater than the cutoff x (the “right tail” area): ttail(df,x)

Figure 9. Area to the right of a given cutoff

Remark: STATA's ttail(2,1), the probability that a t variable with 2 degrees of freedom is greater than 1, returns 0.21132487. This is the area under the t-distribution density curve to the right of 1. To calculate the area to the left of a given cutoff x, use 1 - ttail(df,x).

► The cumulative function answers the question: “what is the area under the density function to the left of a given cutoff point x0?”
► The inverse function answers the question: “what is the cutoff x0 for which the area under the density function to the left of x0 equals a given area α?”
► These two questions are really “two sides of the same coin”: the cutoff x0 and the area α are connected by

  $$F_X(x_0) = \alpha$$

The cumulative function solves for α when x0 is given, while the inverse function solves for x0 when α is given.
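The sketch below reproduces the ttail values quoted above and shows the cumulative/inverse round trip, in Python rather than STATA. The helper `ttail` is hypothetical, not STATA's implementation: it uses the symmetry of the t density around 0, so Pr[T > x] = 0.5 − (area between 0 and x), and approximates that area with Simpson's rule. The inverse step uses `statistics.NormalDist.inv_cdf` for the normal case.

```python
import math
from statistics import NormalDist

def tden(df, t):
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def ttail(df, x, n=2000):
    """Right-tail area Pr[T > x] = 0.5 - (area under tden between 0 and x).

    The area between 0 and x is approximated with Simpson's rule on n (even) subintervals.
    """
    h = x / n
    s = tden(df, 0) + tden(df, x)
    s += 4 * sum(tden(df, (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(tden(df, 2 * i * h) for i in range(1, n // 2))
    return 0.5 - s * h / 3

print(round(ttail(2, 1), 8))        # 0.21132487: area to the right of 1
print(round(1 - ttail(2, 1), 8))    # 0.78867513: area to the left of 1

# Two sides of the same coin: F maps x0 to alpha, the inverse maps alpha back to x0.
Z = NormalDist()
alpha = Z.cdf(-1)                   # area to the left of x0 = -1
print(round(Z.inv_cdf(alpha), 6))   # -1.0
```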
In STATA, given a number α between 0 and 1, we find x0 using:
  invnormal(α)      (standard normal: area α to the left of x0)
  invttail(df,α)    (t distribution: area α to the right of x0)

Remark: invttail(df,α) returns the x0 such that the area under the distribution curve to the right of x0 is exactly α. Since the density curve of the t distribution is symmetric around 0, the area to the left of -x0 exactly equals the area to the right of x0; hence -invttail(df,α) is the cutoff with area α to its left.

► Below is an example for α = 0.09175171 (you may check that ttail(2,2) = 0.09175171). invttail(2,0.09175171) returns (approximately) 2, and -invttail(2,0.09175171) returns -2.

Figure 10. Inverse cumulative function for t distribution
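An inverse function can be computed by searching for the cutoff that produces the requested area. This Python sketch assumes df = 2 only, because then the right tail has the known closed form Pr[T > x] = 1/2 − x / (2√(2 + x²)), which matches ttail(2,1) = 0.21132487 and ttail(2,2) = 0.09175171 above. The helpers `t2_tail` and `inv_right_tail` are hypothetical and use plain bisection. This is a sketch of the idea, not how STATA computes invttail.

```python
import math
from statistics import NormalDist

def t2_tail(x):
    """Right-tail area Pr[T > x] for a t distribution with 2 df (closed form)."""
    return 0.5 - x / (2 * math.sqrt(2 + x * x))

def inv_right_tail(tail, alpha, lo=-1e6, hi=1e6, iters=200):
    """Find x0 with tail(x0) = alpha by bisection; tail must be decreasing in x."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail(mid) > alpha:
            lo = mid      # too much area to the right: move the cutoff right
        else:
            hi = mid      # too little area to the right: move the cutoff left
    return (lo + hi) / 2

x0 = inv_right_tail(t2_tail, 0.09175171)   # analogue of invttail(2, 0.09175171)
print(round(x0, 6), round(-x0, 6))          # 2.0 -2.0
print(round(NormalDist().inv_cdf(0.15865525), 4))  # analogue of invnormal(0.15865525): -1.0
```

By symmetry, the area to the left of -x0 equals α, which is why -x0 serves as the left-tail cutoff.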