Business Statistics 41000 Fact Sheet

List of topics

• basic probability
• mutually exclusive versus independent
• odds versus probability
• law of total probability
• independence
• Bayes' rule
• expected value
• variance/standard deviation
• correlation
• linear combinations
• normal distribution
• 68-95-99.7
• binomial distribution
• normal approximation to the binomial
• line of best fit
• histograms
• scatterplots
• cumulative distribution functions
• Simpson's paradox
• mean reversion

Most, but not all, of these topics are described formally via the formulas below.

Probability rules

• Probabilities of mutually exclusive events add together. When you roll a die it cannot come up both 2 and 5, so the probability of it being 2 or 5 is P(2) + P(5).
• The overlap formula: P(A or B) = P(A) + P(B) − P(A and B).
• Definition of conditional probability:
  P(A | B) = P(A and B) / P(B).
• Compositional form of joint probability: P(A and B) = P(A | B) P(B).
• Law of total probability:
  P(A) = P(A and B) + P(A and not-B) = P(A | B) P(B) + P(A | not-B) P(not-B).
  This generalizes to any number of mutually disjoint events Bj, which we can express with summation notation:
  P(A) = Σ_j P(A and Bj) = Σ_j P(A | Bj) P(Bj).
• Putting the compositional form of joint probability together with the definition of conditional probability leads to Bayes' rule:
  P(A | B) = P(A and B) / P(B) = P(B | A) P(A) / P(B).
• If events A and B are independent: P(A and B) = P(A) P(B) and P(A | B) = P(A).

Expectations (mean, standard deviation, correlation)

• The expected value of a random variable, denoted E(X) and often shortened to µX, is a weighted sum:
  µX = E(X) = Σ_x x P(X = x),
  where the sum is taken over all of the values that X can attain.
• More generally, a function g(X) has expected value
  E(g(X)) = Σ_x g(x) P(X = x).
  For example, if g(X) = X², the formula becomes E(X²) = Σ_x x² P(X = x).
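As a concrete illustration of the preceding bullet, the short Python sketch below (the fair six-sided die is an assumed example) computes E(X) and E(X²) exactly as weighted sums over the distribution:

```python
from fractions import Fraction

# Distribution of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum over x of x * P(X = x)
EX = sum(x * p for x, p in pmf.items())

# E(g(X)) with g(X) = X^2: E(X^2) = sum over x of x^2 * P(X = x)
EX2 = sum(x**2 * p for x, p in pmf.items())

print(EX)   # 7/2, i.e. 3.5
print(EX2)  # 91/6
```

Using `Fraction` keeps the weighted sums exact rather than approximating them in floating point.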
  If g(X, Y) = XY, the formula is
  E(XY) = Σ_{x,y} x · y P(X = x, Y = y),
  where the sum is over all possible combinations of x and y.
• The variance of a random variable X, denoted V(X), is the expectation of the function g(X) = (X − E(X))², so
  V(X) = Σ_x (x − µX)² P(X = x).
• A computational shortcut for the variance is the expression V(X) = E(X²) − E(X)².
• The standard deviation of a random variable X, commonly denoted σX, is the square root of its variance: σX = √V(X).
• The correlation between two random variables X and Y, denoted corr(X, Y), is defined as:
  corr(X, Y) = Σ_x Σ_y (x − µX)(y − µY) P(X = x, Y = y) / (σX σY).
• A shortcut formula for the correlation is:
  corr(X, Y) = (E(XY) − E(X) E(Y)) / (σX σY) = cov(X, Y) / (σX σY).
• The expectation of a weighted sum is the weighted sum of the expectations, meaning that E(aX + bY) = a E(X) + b E(Y) for any numbers a and b.
• This formula generalizes to sums of more than two terms, which we can express using summation notation:
  E(Σ_{i=1}^n ci Xi) = Σ_{i=1}^n ci E(Xi).
• The weighted sum aX + bY of two random variables, for any numbers a and b, has variance given by the expression
  V(aX + bY) = a² V(X) + b² V(Y) + 2ab cov(X, Y).
• When the variables are independent this equation simplifies to V(aX + bY) = a² V(X) + b² V(Y).
• Again, this formula generalizes to sums of more than two terms, meaning for independent random variables Xi, i = 1, …, n, we can write:
  V(Σ_{i=1}^n ci Xi) = Σ_{i=1}^n ci² V(Xi).

Common distributions

• Normal. If X ∼ N(µ, σ²), then E(X) = µ and V(X) = σ². The mean, median and mode are all µ. X is a continuous variable taking any value over the whole real line.
• Binomial. If Y ∼ Bin(n, p), then Y reflects the number of successes in n independent tosses with success probability p. E(Y) = np and V(Y) = np(1 − p). For large values of n, say over 50, the binomial probability mass function is very close to the density of a N(np, np(1 − p)) distribution.
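A quick numeric check of this approximation can be sketched in Python; the choice n = 100 and p = 0.3 is purely illustrative. Exact binomial probabilities, computed from factorials, are compared against the N(np, np(1 − p)) density:

```python
import math

# Exact binomial pmf: Pr(k) = n! / (k!(n-k)!) * p^k * (1-p)^(n-k)
def binom_pmf(k, n, p):
    choose = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return choose * p**k * (1 - p)**(n - k)

# Normal density with mean mu and variance var
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative check: n = 100, p = 0.3, so the approximating normal is N(30, 21)
n, p = 100, 0.3
for k in (25, 30, 35):
    exact = binom_pmf(k, n, p)
    approx = normal_pdf(k, n * p, n * p * (1 - p))
    print(k, round(exact, 4), round(approx, 4))
```

At each k near the mean the two numbers agree to about three decimal places, which is what "very close" means in practice for n this large.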
  Binomial probabilities are given by:
  Pr(k) = (n choose k) p^k (1 − p)^(n−k),
  where (n choose k) = n! / (k!(n − k)!) and n! = n · (n − 1) · (n − 2) ⋯ 2 · 1.
• 68-95-99.7 rule. If a distribution is approximately normal, then (approximately) 68% of the probability mass is within 1 standard deviation of the mean, 95% is within about 2 standard deviations of the mean, and 99.7% is within about 3 standard deviations of the mean.

Simple linear regression

• For any two random variables X and Y, the intercept a = β0 and slope b = β1 minimizing the expected squared loss E((Y − (a + bX))²) are:
  β1 = ρ σY / σX and β0 = µY − β1 µX,
  where ρ = corr(X, Y), µX = E(X), µY = E(Y), σX = √V(X) and σY = √V(Y).
• To get the least-squares line of best fit between an outcome variable y and a predictor variable x, apply the previous result to the empirical distribution, substituting in sample versions of the necessary quantities: replace µX with x̄, the sample mean of x, and µY with ȳ, the sample mean of y. Likewise, compute the sample standard deviations and the sample correlation. In other words, the least-squares estimate is:
  β̂1 = ρ̂ σ̂y / σ̂x and β̂0 = ȳ − β̂1 x̄.

1 Hypothesis tests and confidence intervals based on the normal distribution

Conducting hypothesis tests and constructing confidence intervals concerning mean parameters of normal distributions requires knowing the variance of the statistic. In practice we have to estimate this variance, and the procedure we use to estimate it will depend on the sort of analysis being performed. We have several specific cases to consider. In the following, let σ̂² denote the estimated variance of the data and n denote the sample size.

• simple mean: use the observed data variance divided by the sample size: σ̂²/n.
• hypothesis test for simple proportion: σ0² = p0(1 − p0)/n, where p0 is the population proportion under the null hypothesis.
• confidence interval for simple proportion: σ̂² = p̂(1 − p̂)/n, where p̂ is the observed proportion.
• simple difference between two means: σ̂² = σ̂x²/nx + σ̂y²/ny, where σ̂x² and σ̂y² denote the observed data variance of the two respective groups and nx and ny are the respective sample sizes.
• difference between two proportions: σ̂² = p̂x(1 − p̂x)/nx + p̂y(1 − p̂y)/ny.
• difference between two means, paired samples: the observed variance of the differences, divided by n.

2 Simple linear regression

The main outputs of any linear regression software are point estimates and standard errors for the regression coefficient and the intercept parameters. These can be used to construct confidence intervals and perform hypothesis tests because the sampling distributions of the least-squares estimators are b0 ∼ N(β0, s0²) and b1 ∼ N(β1, s1²) for the intercept and slope respectively. The software reports the estimated coefficients b0 and b1 along with their standard errors, s0 and s1. From there, apply what you know about normal hypothesis testing.
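Putting the proportion formulas into practice, here is a minimal Python sketch; the counts (140 successes out of n = 400) and the null value p0 = 0.30 are hypothetical numbers chosen for illustration. Note that the test uses the null variance p0(1 − p0)/n while the interval uses the observed variance p̂(1 − p̂)/n, matching the two bullets above:

```python
import math

# Hypothetical data for illustration: 140 successes in 400 trials,
# testing the null hypothesis p0 = 0.30.
n, successes, p0 = 400, 140, 0.30
p_hat = successes / n  # observed proportion, 0.35

# Hypothesis test for a simple proportion: standard error from the null variance
se_null = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null

# Confidence interval for a simple proportion: standard error from the observed variance
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat)  # approximate 95% interval

print(round(z, 3))                     # z-statistic against p0
print(tuple(round(v, 3) for v in ci))  # approximate 95% confidence interval
```

Here z is a little above 2, so by the 68-95-99.7 rule the observed proportion would be a surprising draw if p0 = 0.30 were true, and correspondingly 0.30 falls just outside the 95% interval.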