Business Analytics II - Winter 2016 - Session 1

Managerial Economics & Decision Sciences Department
Developed for business analytics II
statistical models: hypotheses, tests & confidence intervals
session one – statistics appendix
- random variables
- density functions
- cumulative functions
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II
random variables
► Economic and business environments are “plagued” with uncertainty – a fancy name to capture the idea that
there are several possible outcomes each occurring with a certain probability (likelihood). We represent this type
of uncertainty through random variables.
► Roll the dice. Let X represent the side that shows up; for a fair die each side shows up with equal probability 1/6,
thus the representation:
  1     2     3     4     5     6     ← possible outcomes
 1/6   1/6   1/6   1/6   1/6   1/6    ← probability of occurrence
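The table above can be illustrated with a small simulation. This is only a sketch: the variable name roll, the seed, and the number of rolls are illustrative assumptions, and runiformint() requires Stata 14 or later.

```stata
* Hypothetical simulation of 6,000 rolls of a fair die
clear
set seed 2016
set obs 6000
generate roll = runiformint(1, 6)
* Each side should show up in roughly 1/6 of the rolls
tabulate roll
```

With more rolls the observed frequencies should get closer to 1/6 each.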
► Stock returns. Let R represent the daily stock returns for IBM shares. The problem here is that the possible
outcomes are “too many” to allow a representation as above. This is a continuous random variable and in such cases
the representation consists of:
- the range of the possible outcomes: −∞ < R < +∞
- the likelihood of each possible outcome: f(r)
The likelihood is not a probability per se but indicates how likely a certain outcome is to appear: if
f(r1) > f(r2) then the returns are more likely to be around r1 than around r2.
density functions
► The function f is called the density function, and you are probably already familiar with one such function – the
bell shape “distribution” for a normal random variable.
STATA provides a very simple way to plot the standard normal (zero mean and unit standard deviation) density function:
twoway function normalden(x), range(-4 4)
Figure 1. Normal density function
normalden(x) is the built-in standard normal density function
in STATA; all you need to specify is the range over which you
want to plot the function, i.e. range(a b) indicates that the
function will be evaluated for x ranging from a to b. To
obtain the value of normalden at a certain point x, say x = 1, use
display normalden(1).
Remark: The density is symmetric around the mean (which is 0
for the standard normal); a fact that is a bit more difficult to see
(or check) is that the area under the curve is equal to 1 (this is
the continuous counterpart of the property that the sum of
probabilities of all possible outcomes is 1).
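The two properties in the remark can be checked numerically. The sketch below evaluates the density at a few points and approximates the area under the curve between -4 and 4 with normal(), STATA's cumulative function for the standard normal; the cutoffs ±4 are an illustrative choice that matches the plotting range.

```stata
* Density at the mean (about 0.3989) and at x = 1 (about 0.2420)
display normalden(0)
display normalden(1)
* Symmetry around 0: same value as normalden(1)
display normalden(-1)
* Area under the curve between -4 and 4 (about 0.9999, so almost all of the total area of 1)
display normal(4) - normal(-4)
```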
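► Normal distribution. A more general normal distribution is one with mean μ and standard deviation σ. The
connection between the density functions (for general and standard) is given by
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma} \,\Big|\, 0, 1\right)$$
In STATA the standard normal density at the standardized value is normalden((x - μ)/σ), so the general density is 1/σ*normalden((x - μ)/σ).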
STATA provides a very simple way to plot the normal density function for various means and standard deviations:
twoway function miunormal = normalden(x-1), range(-4 4)
Here we plot the normal density for mean 1 and standard deviation 1. The “miunormal” in front of the normalden function is simply for
labeling purposes.
twoway function sigmanormal = 1/2*normalden(x/2), range(-4 4)
Here we plot the normal density for mean 0 and standard deviation 2. The “sigmanormal” in front of the normalden function is simply for
labeling purposes. Don’t forget to add the range you want!
twoway function msnormal = 1/sigma*normalden((x-miu)/sigma), range(-4 4)
This is the general form for mean miu and standard deviation sigma; replace miu and sigma with the numbers you want before running the
command. The “msnormal” in front of the normalden function is simply for labeling purposes. Don’t forget to add the range you want!
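One way to avoid retyping the numbers is to store them in local macros. The sketch below assumes the illustrative values miu = 1 and sigma = 2 (not taken from the slides) and also checks the scaling formula against the three-argument form normalden(x, m, s).

```stata
* Illustrative values; change them as needed
local miu = 1
local sigma = 2
twoway function msnormal = 1/`sigma'*normalden((x-`miu')/`sigma'), range(-5 7)
* Density at x = 0 computed from the standard normal (about 0.1760)
display 1/`sigma'*normalden((0-`miu')/`sigma')
* Same value from the built-in three-argument form
display normalden(0, `miu', `sigma')
```

Local macros only live within a single do-file run (or a single interactive session), so the lines should be executed together.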
► Normal(1,1): The shape is maintained as symmetric around the new mean; basically you
perform a shift of the curve towards the new mean.
Figure 2. Normal density function (different means)
► Normal(0,2): The shape is changed: lower “peak” at the mean and fatter tails. The intuition is
fairly straightforward: a higher standard deviation means that the outcomes further away from the
mean become more likely.
Figure 3. Normal density function (different variances)
Remark: To plot several curves on the same graph, simply list each function you want to plot within its own pair of
parentheses.
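A possible version of such an overlay is sketched below; the legend labels are illustrative, and the /// line continuations assume the command is run from a do-file. The display lines put numbers on the "lower peak, fatter tails" intuition.

```stata
twoway (function normalden(x), range(-4 4)) ///
       (function normalden(x-1), range(-4 4)) ///
       (function 1/2*normalden(x/2), range(-4 4)), ///
       legend(order(1 "Normal(0,1)" 2 "Normal(1,1)" 3 "Normal(0,2)"))
* Peak at the mean: Normal(0,2) is half as high as the standard normal
display normalden(0)
display 1/2*normalden(0/2)
* Tail at x = 3: Normal(0,2) is much higher than the standard normal
display normalden(3)
display 1/2*normalden(3/2)
```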
► The t-distribution. This is a very useful distribution, also known as the Student distribution, for testing
hypotheses. The t-distribution has one parameter df – degrees of freedom. For the moment let’s visualize the
distribution using STATA.
twoway function tden(df,x), range(-4 4)
Figure 4. t-distribution density function (different means)
Figure 5. t-distribution density function (different df)
Remark: On the left df = 2 while on the right we added a t-distribution with df = 20. In both cases the resemblance to the
(standard) normal distribution is quite striking!
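To see how close the curves are, one can overlay the two t densities and the standard normal. This is a sketch with illustrative legend labels, meant to be run from a do-file because of the /// continuations.

```stata
twoway (function tden(2,x), range(-4 4)) ///
       (function tden(20,x), range(-4 4)) ///
       (function normalden(x), range(-4 4)), ///
       legend(order(1 "t, df = 2" 2 "t, df = 20" 3 "standard normal"))
* Height at 0: lowest for df = 2, closer to the normal for df = 20
display tden(2,0)
display tden(20,0)
display normalden(0)
* Height at 3: the t with df = 2 has a visibly fatter tail than the normal
display tden(2,3)
display normalden(3)
```

The comparison suggests that the t density approaches the standard normal density as df grows.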
cumulative functions
► Given a random variable X the function F is called the cumulative distribution function associated to X and is
defined as
FX(x0) = Pr[X ≤ x0]
The function F basically provides the probability that the random variable will be less than or equal to a given cutoff level.
Graphically, FX(x0) is the area under the f (density function) curve to the left of x0. STATA allows you to obtain directly
the cumulative distribution function.
Figure 6. The Normal distribution density function
Using STATA’s display normal(-1) we obtain
0.15865525, which is the shaded area in the
diagram. The curve represents the density for the
standard normal distribution (mean is zero and standard
deviation is 1).
quiz: Using the above result, suppose you are asked what FX(0) is when X is normal
with mean 1 and standard deviation 1. Will this number be greater or smaller than the
result above?
Answer: A change in mean does not change the shape of the normal density, just the “location”, i.e. it shifts
the curve horizontally. In both cases we are calculating the area to the left of the point (mean - 1):
in the first case to the left of 0 - 1 = -1 for a normal with mean 0, and in the
second case to the left of 1 - 1 = 0 for a normal with mean 1.
The result is the same in both cases!
Figure 7. A shift in the Normal distribution density function
twoway function normalden(x), range(-4 4)
twoway function normalden(x-1), range(-3 5)
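The answer can be confirmed numerically. Since normal() is the cumulative function of the standard normal, the sketch below standardizes the cutoff 0 by subtracting the mean 1 and dividing by the standard deviation 1.

```stata
* Standard normal, cutoff -1
display normal(-1)
* Normal with mean 1 and standard deviation 1, cutoff 0
display normal((0-1)/1)
```

Both lines display the same value, about 0.1587.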
► The connection between a normal distribution with mean μ and standard deviation σ and a standard normal
distribution:
$$F(x \mid \mu, \sigma) = F\!\left(\frac{x-\mu}{\sigma} \,\Big|\, 0, 1\right)$$
In STATA: normal((x-miu)/sigma)
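As a worked instance of this connection, the sketch below assumes a normal variable with mean 1 and standard deviation 2 (illustrative numbers, not from the slides) and computes the probability that it is at most 0.

```stata
* Pr[X <= 0] for X normal with mean 1 and standard deviation 2
* Standardized cutoff: (0 - 1)/2 = -0.5, so the result is about 0.3085
local miu = 1
local sigma = 2
display normal((0-`miu')/`sigma')
```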
► It is very easy to calculate the probability that the
random variable X is greater than a given cutoff x0:
Pr[X > x0] = 1 − FX(x0)
For a standard normal and cutoff x0 = 1: 1 - normal(1)
Figure 8. Area to the right of a given cutoff
Remark: The relation above follows from the fact that the total
area under the density curve f is 1. Since FX(x0) is the area
under this curve to the left of x0, the area to the right of x0,
which is really Pr[X > x0], must be 1 − FX(x0).
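A quick check of the right-tail calculation is sketched below. The second part assumes an illustrative normal variable with mean 1 and standard deviation 2, whose cutoff 3 standardizes to 1.

```stata
* Right tail of the standard normal beyond 1 (about 0.1587)
display 1 - normal(1)
* By symmetry this matches the left tail below -1
display normal(-1)
* Right tail beyond 3 for a normal with mean 1 and standard deviation 2
display 1 - normal((3-1)/2)
```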
► For the t distribution STATA provides a function that allows you to calculate directly the probability that t is
greater than the cutoff x (thus the “right tail” area):
ttail(df,x)
Figure 9. Area to the right of a given cutoff
Remark: Using STATA’s ttail(2,1), which gives the
probability that a t-variable with 2 degrees of freedom is
greater than 1, we get the answer 0.21132487, which is the
area under the t-distribution density curve to the right of 1.
To calculate the area to the left of some given cutoff x, we use
of course 1 - ttail(df,x).
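The right-tail and left-tail areas for this example can be displayed directly. The last line is an extra illustration of the symmetry of the t density around 0.

```stata
* Area to the right of 1 for df = 2 (about 0.2113)
display ttail(2,1)
* Area to the left of 1 (about 0.7887)
display 1 - ttail(2,1)
* By symmetry, the area to the right of -1 equals the area to the left of 1
display ttail(2,-1)
```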
► The cumulative function provides the answer to the question: “what is the area, under the density function, to the
left of a given cutoff point x0?”
► The inverse function will answer the question: “what is the cutoff x0 for which the area, under the density function,
to the left of x0 is equal to a given area α?”
► Obviously these two questions are really “two sides of the same coin”: the connection between the cutoff x0
and area α is simply
FX(x0) = α
The cumulative function solves for α when x0 is given while the inverse function solves for x0 when α is given. In
STATA, given a number α between 0 and 1 we find x0 using:
invnormal(α)
invttail(df,α)
Remark: If you use invttail(df,α) you will obtain the x0 such that the area, under the density curve, to the
right of x0 is exactly α. Since the density curve for the t distribution is symmetric around 0, the area to the
left of -x0 exactly equals the area to the right of x0.
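The "two sides of the same coin" idea can be checked by feeding the output of a cumulative or tail function into its inverse. The sketch below reuses cutoffs that appear elsewhere in this appendix.

```stata
* invnormal undoes normal: this returns the cutoff -1 (up to rounding)
display invnormal(normal(-1))
* invttail undoes ttail: this returns the cutoff 2 (up to rounding)
display invttail(2, ttail(2,2))
* Left-tail cutoff for the same area, using symmetry: about -2
display -invttail(2, ttail(2,2))
```

Note the asymmetry in conventions: invnormal takes a left-tail area, while invttail takes a right-tail area.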
► Below is an example for α = 0.09175171 (you may check that ttail(2,2) = 0.09175171).
Figure 10. Inverse cumulative function for t distribution
In the figure, -invttail(2,0.09175171) marks the left cutoff and invttail(2,0.09175171) marks the right cutoff.