Business Statistics 41000 Fact Sheet
List of topics
• basic probability
• mutually exclusive versus independent
• odds versus probability
• law of total probability
• independence
• Bayes’ rule
• expected value
• variance/standard deviation
• correlation
• linear combinations
• normal distribution
• 68-95-99.7
• binomial distribution
• normal approximation to the binomial
• line of best-fit
• histograms
• scatterplots
• cumulative distribution functions
• Simpson’s paradox
• mean reversion
Most, but not all, of these topics are described formally via the formulas below.
Probability rules
• Probabilities of mutually exclusive events add together. When you roll a die it cannot come up both 2 and 5,
so the probability of it being 2 or 5 is P(2) + P(5).
• The overlap formula: P(A or B) = P(A) + P(B) − P(A and B).
• Definition of conditional probability: P(A | B) = P(A and B) / P(B).
• Compositional form of joint probability: P(A and B) = P(A | B)P(B).
• Law of total probability:
P(A) = P(A and B) + P(A and not-B) = P(A | B)P(B) + P(A | not-B)P(not-B).
This generalizes to any number of mutually disjoint events Bj, which we can express with summation notation:
P(A) = Σ_j P(A and Bj) = Σ_j P(A | Bj)P(Bj).
• Putting the compositional form of joint probability together with the definition of conditional probability leads to Bayes' rule (illustrated numerically after this list):
P(A | B) = P(A and B) / P(B) = P(B | A)P(A) / P(B).
• If events A and B are independent: P(A and B) = P(A)P(B) and P(A | B) = P(A).
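As a quick numerical check of the law of total probability and Bayes' rule, here is a minimal Python sketch using a hypothetical medical-testing example; the prevalence and error rates below are invented for illustration.

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_disease = 0.01            # P(A)
p_pos_given_disease = 0.95  # P(B | A)
p_pos_given_healthy = 0.05  # P(B | not-A)

# Law of total probability: P(B) = P(B | A)P(A) + P(B | not-A)P(not-A)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(A | B) = P(B | A)P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(positive) = {p_pos:.4f}")                           # 0.0590
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")   # about 0.1610
```

Note how a positive result from a fairly accurate test still leaves the probability of disease well below one half, because the disease is rare.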
Expectations (mean, standard deviation, correlation)
• The expected value of a random variable, denoted E(X) and often abbreviated µX, is a weighted sum:
µX = E(X) = Σ_x x P(X = x),
where the sum is taken over all of the values that X can attain.
• More generally, a function g(X) has expected value
E(g(X)) = Σ_x g(x) P(X = x).
For example, if g(X) = X² then the formula becomes E(X²) = Σ_x x² P(X = x). If it is g(X, Y) = XY then the formula is E(XY) = Σ_{x,y} x·y P(X = x, Y = y), where the sum is over all possible combinations of x and y.
• The variance of a random variable X, denoted V(X), is the expectation of the function g(X) = (X − E(X))², so
V(X) = Σ_x (x − µX)² P(X = x).
• A computational shortcut for the variance is the expression V(X) = E(X²) − E(X)².
• The standard deviation of a random variable X, commonly denoted σX, is the square root of its variance: σX = √V(X).
• The correlation between two random variables X and Y, denoted corr(X, Y), is defined as:
corr(X, Y) = Σ_x Σ_y (x − µX)(y − µY) P(X = x, Y = y) / (σX σY).
• A shortcut formula for the correlation (checked numerically in the sketch after this list) is:
corr(X, Y) = (E(XY) − E(X)E(Y)) / (σX σY) = cov(X, Y) / (σX σY).
• The expectation of a weighted sum is the weighted sum of the expectations, meaning that E(aX + bY ) =
aE(X) + bE(Y ), for any numbers a and b.
• This formula generalizes to sums of more than two terms, which we can express using summation notation:
E(Σ_{i=1}^n ci Xi) = Σ_{i=1}^n ci E(Xi).
• The weighted sum aX + bY of two random variables, for any numbers a and b, has variance given by the expression
V(aX + bY) = a²V(X) + b²V(Y) + 2ab cov(X, Y).
• When the variables are independent this equation simplifies to V(aX + bY) = a²V(X) + b²V(Y).
• Again, this formula generalizes to sums of more than two terms, meaning for independent random variables Xi, i = 1, . . . , n we can write:
V(Σ_{i=1}^n ci Xi) = Σ_{i=1}^n ci² V(Xi).
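The expectation, variance and correlation formulas above can be verified directly on a small joint distribution. A minimal sketch, using a made-up joint pmf over pairs (x, y):

```python
from math import sqrt

# A made-up joint pmf over (x, y) pairs; probabilities sum to 1.
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

mu_x = sum(x * p for (x, y), p in pmf.items())   # E(X)
mu_y = sum(y * p for (x, y), p in pmf.items())   # E(Y)

# Computational shortcut: V(X) = E(X^2) - E(X)^2
var_x = sum(x**2 * p for (x, y), p in pmf.items()) - mu_x**2
var_y = sum(y**2 * p for (x, y), p in pmf.items()) - mu_y**2

# Shortcut for covariance and correlation: cov(X, Y) = E(XY) - E(X)E(Y)
cov_xy = sum(x * y * p for (x, y), p in pmf.items()) - mu_x * mu_y
corr_xy = cov_xy / sqrt(var_x * var_y)

print(mu_x, mu_y, var_x, var_y, corr_xy)

# Variance of a weighted sum: V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab cov(X, Y)
a, b = 2, 3
print(a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy)
```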
Common distributions
• Normal. If X ∼ N(µ, σ²) then E(X) = µ and V(X) = σ². The mean, median and mode are all µ. X is a continuous variable taking any value over the whole real line.
• Binomial. If Y ∼ Bin(n, p), then Y reflects the number of successes in n independent trials with success probability p. E(Y) = np and V(Y) = np(1 − p). For large values of n, say over 50, the binomial probability mass function is very close to the density of a N(np, np(1 − p)) distribution (illustrated in the sketch after this list). Binomial probabilities are given by:
Pr(k) = (n choose k) p^k (1 − p)^(n−k),
where
(n choose k) = n! / (k!(n − k)!)
and n! = n · (n − 1) · (n − 2) · · · 2 · 1.
• 68-95-99.7 rule. If a distribution is approximately normal, then approximately 68% of the probability mass is within 1 standard deviation of the mean, 95% is within 2 standard deviations of the mean, and 99.7% is within 3 standard deviations of the mean.
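To see the normal approximation to the binomial at work, one can compare exact binomial probabilities against the matching normal distribution with a continuity correction. A sketch using scipy.stats; the choices n = 100 and p = 0.3 are arbitrary.

```python
from scipy.stats import binom, norm

n, p = 100, 0.3
mu = n * p                          # np = 30
sigma = (n * p * (1 - p)) ** 0.5    # sqrt(np(1-p)), about 4.58

for k in (25, 30, 35):
    exact = binom.pmf(k, n, p)
    # Continuity correction: P(Y = k) is approximated by P(k - 0.5 < X < k + 0.5)
    approx = norm.cdf(k + 0.5, mu, sigma) - norm.cdf(k - 0.5, mu, sigma)
    print(k, round(exact, 4), round(approx, 4))
```

The exact and approximate probabilities agree closely for n this large.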
Simple linear regression
• For any two random variables X and Y, the values of a and b minimizing the expected squared loss E((Y − (a + bX))²) are:
β1 = ρ σY / σX and β0 = µY − β1 µX,
where ρ = corr(X, Y), µX = E(X), µY = E(Y), σX = √V(X) and σY = √V(Y).
• To get the least-squares line of best fit between an outcome variable y and a predictor variable x, apply the previous result to the empirical distribution, substituting in sample versions of the necessary quantities: replace µX with x̄, the sample mean of x, and µY with ȳ, the sample mean of y. Likewise, compute the sample standard deviations and the sample correlation. In other words, the least-squares estimate (checked numerically in the sketch below) is:
β̂1 = ρ̂ σ̂y / σ̂x and β̂0 = ȳ − β̂1 x̄.
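A minimal check that these moment formulas reproduce the least-squares fit; the data here are simulated, and np.polyfit is used only as an independent reference.

```python
import numpy as np

# Simulate data from a known line: y = 2 + 1.5x + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)

# beta1-hat = rho-hat * (s_y / s_x); the std normalization cancels in the ratio.
rho_hat = np.corrcoef(x, y)[0, 1]
b1 = rho_hat * y.std() / x.std()
b0 = y.mean() - b1 * x.mean()        # beta0-hat = ybar - beta1-hat * xbar

slope, intercept = np.polyfit(x, y, 1)
print(b1, slope)       # the two slope estimates agree
print(b0, intercept)   # and so do the intercepts
```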
1 Hypothesis tests and confidence intervals based on the normal distribution
Conducting hypothesis tests and constructing confidence intervals concerning mean parameters of normal distributions requires knowing the variance of the statistic. In practice we have to estimate this variance, and the procedure we use to estimate it will depend on the sort of analysis being performed. We have several specific cases to consider (a worked example follows the list below). In the following, let σ̂² denote the estimated variance of the data and n denote the sample size.
• simple mean: use the observed data variance divided by the sample size: σ̂²/n.
• hypothesis test for simple proportion: σ0² = p0(1 − p0)/n, where p0 is the population proportion under the null hypothesis.
• confidence interval for simple proportion: σ̂² = p̂(1 − p̂)/n, where p̂ is the observed proportion.
• simple difference between two means: σ̂² = σ̂x²/nx + σ̂y²/ny, where σ̂x² and σ̂y² denote the observed data variance of the two respective groups and nx and ny are the respective sample sizes.
• difference between two proportions: σ̂² = p̂x(1 − p̂x)/nx + p̂y(1 − p̂y)/ny.
• difference between two means, paired samples: the observed variance of the differences, divided by n.
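As a worked example, here is a 95% confidence interval for a difference between two proportions using the variance formula above; the counts are invented for illustration.

```python
from math import sqrt

x_successes, n_x = 60, 100   # group x: 60 successes out of 100
y_successes, n_y = 45, 100   # group y: 45 successes out of 100

p_x, p_y = x_successes / n_x, y_successes / n_y
se = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)

diff = p_x - p_y
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # 1.96 is the 97.5% normal quantile
print(f"difference = {diff:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```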
2 Simple linear regression
The main outputs of any linear regression software are point estimates and standard errors for the regression coefficient and the intercept parameters. These can be used to construct confidence intervals and perform hypothesis tests because the sampling distributions of the least squares estimators are b0 ∼ N(β0, s0²) and b1 ∼ N(β1, s1²) for the intercept and slope respectively. The software reports b0 and b1 as estimated coefficients along with their standard errors, s0 and s1. From there, apply what you know about normal hypothesis testing, as in the sketch below.
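For example, given software output for the slope, a two-sided test of H0: β1 = 0 and a 95% confidence interval look like the following; the estimate and standard error are made-up numbers standing in for real regression output.

```python
from scipy.stats import norm

b1, s1 = 1.42, 0.31   # hypothetical slope estimate and its standard error

z = b1 / s1                            # test statistic for H0: beta1 = 0
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided p-value
ci = (b1 - 1.96 * s1, b1 + 1.96 * s1)  # 95% confidence interval

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for beta1: ({ci[0]:.2f}, {ci[1]:.2f})")
```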