ENGI 3423 Suggestions for Formula Sheets

Some formulæ for ENGI 3423 Probability and Statistics

On these pages are many more formulae than will fit comfortably on your allocation of two sheets in the final examination. Examine these suggestions together with your own notes, in order to construct a more concise set of formula sheets that will work best for you.

Descriptive Statistics

median = middle value in the ordered set
mode = most frequently occurring value
mean: $\bar{x} = \frac{1}{n}\sum x_i$
variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$  or  $s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)}$

From a frequency table ($f_i$ occurrences of distinct value $x_i$ observed):
$n = \sum f_i$,  mean $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$,
$s^2 = \frac{1}{n-1}\sum f_i (x_i - \bar{x})^2$  or  $s^2 = \frac{n\sum f_i x_i^2 - (\sum f_i x_i)^2}{n(n-1)}$

Quartiles: in the ordered set $\{x_1, x_2, \ldots, x_n\}$ the quartiles are $x_{n/4}$ and $x_{3n/4}$ (interpolate as necessary)
Histogram: (area of a bar) = (rel. freq. of that interval)
Boxplot: [diagram]

Laws of Probability

Odds [on] = r and probability = p:  $p = \frac{r}{1+r}$,  $r = \frac{p}{1-p}$
deMorgan's laws:  $\sim(A \cup B) = \sim A \cap \sim B$  and  $\sim(A \cap B) = \sim A \cup \sim B$
General addition law of probability:  $P[A \cup B] = P[A] + P[B] - P[A \cap B]$
General multiplication law of probability:  $P[A \cap B] = P[A]\,P[B \mid A] = P[B]\,P[A \mid B]$
Events A, B are independent iff $P[A \cap B] = P[A]\,P[B]$
Events A, B are incompatible (mutually exclusive) iff $P[A \cap B] = 0$

If $\{E_1, E_2, \ldots, E_n\}$ is a partition, then
Total Probability Law:  $P[A] = \sum_i P[A \cap E_i] = \sum_i P[A \mid E_i]\,P[E_i]$
Bayes' Theorem:  $P[E_k \mid A] = \dfrac{P[A \mid E_k]\,P[E_k]}{\sum_i P[A \mid E_i]\,P[E_i]}$

Counting Techniques

Number of distinct ways of selecting r objects from n objects:
  with replacement, order of selection matters:  $n^r$
  without replacement, order of selection matters:  ${}^nP_r = \dfrac{n!}{(n-r)!}$
  without replacement, order of selection doesn't matter:  ${}^nC_r = \dfrac{n!}{r!\,(n-r)!}$
Number of distinct ways to partition n objects into piles of $r_1, r_2, \ldots, r_k$ objects:
  $\dbinom{n}{r_1\ r_2\ \cdots\ r_k} = \dfrac{n!}{r_1!\, r_2! \cdots r_k!}$

Conditions for p(x) to be a probability mass function (p.m.f.):
  $p(x) \ge 0 \ \forall x$  and  $\sum_{\text{all } x} p(x) = 1$  (coherence condition)
Conditions for f(x) to be a probability density function (p.d.f.):
  $f(x) \ge 0 \ \forall x$  and  $\int_{-\infty}^{\infty} f(x)\,dx = 1$  (coherence condition)

Cumulative Distribution Function (c.d.f.):
  Discrete:  $F(x) = \sum_{y \le x} p(y)$  (staircase)
  Continuous:  $F(x) = \int_{-\infty}^{x} f(t)\,dt$  (ogive)
  $P[a \le X \le b] = F(b) - F(a)$
Median:  $F(\tilde{x}) = \tfrac{1}{2}$    Lower quartile:  $F(x_L) = \tfrac{1}{4}$    Upper quartile:  $F(x_U) = \tfrac{3}{4}$

Expected value:  $E[X] = \sum_x x\,p(x)$  or  $\int_{-\infty}^{\infty} x\,f(x)\,dx$;   $E[h(X)] = \sum_x h(x)\,p(x)$  or  $\int_{-\infty}^{\infty} h(x)\,f(x)\,dx$
Variance:  $\sigma^2 = V[X] = E[(X-\mu)^2] = E[X^2] - (E[X])^2$

Discrete Uniform distribution:
  $p(x) = \dfrac{1}{n}$ for $x \in \{x_1, x_2, \ldots, x_n\}$,  0 otherwise

Binomial distribution bin(n, p):
  $P[X = x] = b(x;\,n,\,p) = {}^nC_x\, p^x (1-p)^{\,n-x}$,   $\mu = np$,   $\sigma^2 = np(1-p)$
  $P[X \le x] = B(x;\,n,\,p) = b(0;\,n,\,p) + b(1;\,n,\,p) + \cdots + b(x;\,n,\,p)$
Four conditions for X to be binomial:
  1) Each trial has a complementary pair of outcomes;
  2) p = P[success] is the same for all trials;
  3) Trials are all independent of each other;
  4) The number of trials, n, is fixed.

Poisson distribution Poisson($\mu$):
  $p(x;\,\mu) = \dfrac{e^{-\mu}\mu^x}{x!}$ for $x = 0, 1, 2, \ldots$;  0 otherwise;   mean $= \mu$,  $\sigma^2 = \mu$
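As a rough numerical check of the binomial and Poisson formulas above, the short Python sketch below evaluates $b(x;\,n,\,p)$, $B(x;\,n,\,p)$ and $p(x;\,\mu)$ directly from the formulas and cross-checks them against scipy.stats; the values n = 10, p = 0.3, x = 4 and $\mu$ = 2.5 are illustrative assumptions only, not course data.

    # Sketch: direct evaluation of the binomial and Poisson formulas,
    # with scipy.stats used only as an independent cross-check.
    from math import comb, exp, factorial
    from scipy import stats

    n, p, x = 10, 0.3, 4                                   # hypothetical values
    b = comb(n, x) * p**x * (1 - p)**(n - x)               # b(x; n, p) = nCx p^x (1-p)^(n-x)
    B = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))  # B(x; n, p)
    print(b, stats.binom.pmf(x, n, p))                     # should agree
    print(B, stats.binom.cdf(x, n, p))                     # should agree
    print(n * p, n * p * (1 - p))                          # mu = np, sigma^2 = np(1-p)

    mu = 2.5                                               # hypothetical Poisson mean
    pois = exp(-mu) * mu**x / factorial(x)                 # p(x; mu) = e^{-mu} mu^x / x!
    print(pois, stats.poisson.pmf(x, mu))                  # should agree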
[Hypergeometric distribution – not directly examinable in this semester]:
M successes among N items in the population; a sample of size n is drawn.
The probability of observing x successes in the sample is
  $h(x;\,n,\,M,\,N) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$ for integer x in $\max(0,\ n-(N-M)) \le x \le \min(n,\ M)$;  0 otherwise,
  $\mu = np$,   $\sigma^2 = np(1-p)\,\dfrac{N-n}{N-1}$,  where $p = \dfrac{M}{N}$

Joint Probability Mass Functions:
  $p(x, y) = P[(X = x) \cap (Y = y)]$
Marginal p.m.f.'s:  $p_X(x) = \sum_y p(x, y)$,   $p_Y(y) = \sum_x p(x, y)$
Conditional p.m.f.'s:  $p_{Y|X}(y \mid x) = \dfrac{p(x, y)}{p_X(x)}$,   $p_{X|Y}(x \mid y) = \dfrac{p(x, y)}{p_Y(y)}$
Covariance:  $\mathrm{Cov}[X, Y] = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - E[X]E[Y]$;   $V[X] = \mathrm{Cov}[X, X] \ge 0$
Correlation:  $\rho_{X,Y} = \dfrac{\mathrm{Cov}[X, Y]}{\sqrt{V[X]\,V[Y]}}$;   $-1 \le \rho \le 1$
Independence $\Rightarrow \rho = 0$  (but $\rho = 0$ does not imply independence)

Continuous Uniform distribution U(a, b):
  $f(x) = \dfrac{1}{b-a}$ for $a \le x \le b$,  0 otherwise
  $P[X \le x] = F(x) = 0$ for $x < a$;   $\dfrac{x-a}{b-a}$ for $a \le x \le b$;   1 for $x > b$
  $\mu = \dfrac{a+b}{2}$,   $\sigma^2 = \dfrac{(b-a)^2}{12}$

Exponential distribution:
  $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$;  0 for $x < 0$
  $P[X \le x] = F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$;  0 for $x < 0$
  $P[X > x] = e^{-\lambda x}$ ($x \ge 0$);   $\mu = \sigma = 1/\lambda$

Normal distribution N($\mu$, $\sigma^2$):
  To convert $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0, 1)$:  $Z = \dfrac{X - \mu}{\sigma}$;
  then $P[a \le X \le b] = P[z_a \le Z \le z_b] = \Phi(z_b) - \Phi(z_a)$

Combinations of Random Quantities:
  $E[aX + bY] = a\,E[X] + b\,E[Y]$  and  $V[aX + bY] = a^2 V[X] + 2ab\,\mathrm{Cov}[X, Y] + b^2 V[Y]$
  for all constants a, b and all random quantities X, Y.
  Special case, when X and Y are independent (or at least uncorrelated):
  $E[X \pm Y] = E[X] \pm E[Y]$  and  $V[X \pm Y] = V[X] + V[Y]$

Central Limit Theorem:
  For any random quantity X with $E[X] = \mu$, $V[X] = \sigma^2$ and sufficiently large sample size n,
  $\bar{X} \sim N\!\left(\mu,\ \dfrac{\sigma^2}{n}\right)$  and  $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Bayesian Confidence Intervals:
  If prior information is $X \sim N(\mu_0, \sigma_0^2)$ and the data are from a random sample of size n with mean $\bar{x}$ and standard deviation s, then
  calculate the weights  $w_0 = \dfrac{1}{\sigma_0^2}$,  $w_d = \dfrac{n}{s^2}$,  $w_f = w_0 + w_d$
  and the updated mean and variance  $\mu_f = \dfrac{w_0\mu_0 + w_d\bar{x}}{w_f}$,  $\sigma_f^2 = \dfrac{1}{w_f}$.
  Posterior dist'n:  $N(\mu_f, \sigma_f^2)$.
  Then the $(1-\alpha)\times 100\%$ confidence interval estimate for $\mu$ is
  $\mu_f - t_{\alpha/2,\,n-1}\,\sigma_f \ \le\ \mu \ \le\ \mu_f + t_{\alpha/2,\,n-1}\,\sigma_f$
  [Note: if $\sigma$ is known, then it replaces s, and t becomes z. Otherwise, the true number of degrees of freedom on t is actually a number between n-1 and $\infty$. The interval shown above is a conservative approximation.]

Classical Confidence Intervals:
  Estimate $V[\bar{X}]$ by $\dfrac{s^2}{n}$.
  Then the $(1-\alpha)\times 100\%$ confidence interval estimate for $\mu$ is
  $\bar{x} - t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}} \ \le\ \mu \ \le\ \bar{x} + t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$

Classical One Sample Hypothesis Tests:
  $H_0: \mu = \mu_0$.  Select the appropriate $H_a$ and level of significance $\alpha$.

  Two tailed test, $H_a: \mu \ne \mu_0$:
    Method 1:  Evaluate $c_L,\ c_U = \mu_0 \mp t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$.  Reject $H_0$ iff $\bar{x} < c_L$ or $\bar{x} > c_U$.
    Method 2:  Reject $H_0$ iff $|t_{\mathrm{obs}}| > t_{\alpha/2,\,n-1}$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
    Method 3:  Reject $H_0$ iff $2\,P[T \ge |t_{\mathrm{obs}}|] < \alpha$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

  Upper tailed test, $H_a: \mu > \mu_0$:
    Method 1:  Evaluate $c = \mu_0 + t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$.  Reject $H_0$ iff $\bar{x} > c$.
    Method 2:  Reject $H_0$ iff $t_{\mathrm{obs}} > t_{\alpha,\,n-1}$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
    Method 3:  Reject $H_0$ iff $P[T \ge t_{\mathrm{obs}}] < \alpha$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

  Lower tailed test, $H_a: \mu < \mu_0$:
    Method 1:  Evaluate $c = \mu_0 - t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$.  Reject $H_0$ iff $\bar{x} < c$.
    Method 2:  Reject $H_0$ iff $t_{\mathrm{obs}} < -t_{\alpha,\,n-1}$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
    Method 3:  Reject $H_0$ iff $P[T \le t_{\mathrm{obs}}] < \alpha$, where $t_{\mathrm{obs}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

  Modify the above if the sample size is large and/or $\sigma^2$ is known.

Classical Two Sample Hypothesis Tests:
  Paired if the equal-size samples are pairs of observations on the same set of individuals. Unpaired otherwise – but the samples must be uncorrelated.

  Paired Test:  Find all differences $d_i = x_i - y_i$, their sample mean $\bar{d}$ and sample standard deviation $s_D$.  Proceed as in the one sample case, with $\mu_D$ replacing $\mu$.
    Upper tailed method 1 example:  Test $H_0: \mu_D = \Delta_0$ vs. $H_a: \mu_D > \Delta_0$.  Choose $\alpha$.
    Evaluate $c = \Delta_0 + t_{\alpha,\,n-1}\,\dfrac{s_D}{\sqrt{n}}$.  Reject $H_0$ iff $\bar{d} > c$.
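The classical confidence interval and the one-sample t-test (Methods 1–3) above can be checked numerically. The Python sketch below assumes an illustrative sample and an illustrative $\mu_0 = 5.0$ (neither comes from the course), and uses scipy.stats only as a cross-check of the hand formulas.

    # Sketch: classical (1-alpha)100% CI and two-tailed one-sample t-test
    # on made-up data; scipy.stats.ttest_1samp is an independent cross-check.
    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.3, 5.2, 4.9])   # hypothetical sample
    n, alpha, mu_0 = len(x), 0.05, 5.0
    xbar, s = x.mean(), x.std(ddof=1)                # ddof=1 gives the (n-1) divisor
    t_crit = stats.t.ppf(1 - alpha / 2, n - 1)       # t_{alpha/2, n-1}

    # (1-alpha)100% confidence interval for mu
    ci = (xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))

    # Method 1: rejection region;  Method 2: |t_obs| vs t_crit;  Method 3: p-value vs alpha
    cL, cU = mu_0 - t_crit * s / np.sqrt(n), mu_0 + t_crit * s / np.sqrt(n)
    t_obs = (xbar - mu_0) / (s / np.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t_obs), n - 1)
    print(ci, (cL, cU), t_obs, p_value)
    print(stats.ttest_1samp(x, mu_0))                # library cross-check of t_obs and p-value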
Unpaired Test, Large Samples, Variances Possibly Unequal:
  Upper tailed method 1 example:  Test $H_0: \mu_1 - \mu_2 = \Delta_0$ vs. $H_a: \mu_1 - \mu_2 > \Delta_0$.  Choose $\alpha$.
  Degrees of freedom:  $\nu_1 = n_1 - 1$,  $\nu_2 = n_2 - 1$  and  $\nu = \nu_1 + \nu_2$.
  Evaluate $c = \Delta_0 + t_{\alpha,\,\nu}\,\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.  Reject $H_0$ iff $\bar{x}_1 - \bar{x}_2 > c$.
  With large samples, $t_{\alpha,\,\nu}$ may be replaced by $z_\alpha$.

Unpaired Test, Small Samples, $\sigma_1 = \sigma_2$:
  The random quantities must follow normal distributions (at least approximately).
  Upper tailed method 1 example:  Test $H_0: \mu_1 - \mu_2 = \Delta_0$ vs. $H_a: \mu_1 - \mu_2 > \Delta_0$.  Choose $\alpha$.
  Degrees of freedom:  $\nu_1 = n_1 - 1$,  $\nu_2 = n_2 - 1$  and  $\nu = \nu_1 + \nu_2$.
  Evaluate the pooled sample variance  $s_P^2 = \dfrac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu}$.
  Evaluate $c = \Delta_0 + t_{\alpha,\,\nu}\, s_P \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.  Reject $H_0$ iff $\bar{x}_1 - \bar{x}_2 > c$.

Unpaired Test, Small Samples, $\sigma_1 \ne \sigma_2$:  [beyond the scope of this course]

Simple Linear Regression:
  Model:  $Y = \beta_1 x + \beta_0 + \varepsilon$,  $\varepsilon \sim N(0, \sigma^2)$.
  Check for a linear trend with constant error variance.
  Find the summary statistics:  $n$, $\sum x$, $\sum y$, $\sum x^2$, $\sum xy$, $\sum y^2$.
  Evaluate  $nS_{xx} = n\sum x^2 - \left(\sum x\right)^2$,  $nS_{xy} = n\sum xy - \sum x \sum y$,  $nS_{yy} = n\sum y^2 - \left(\sum y\right)^2$,
  then  $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{nS_{xy}}{nS_{xx}}$,   $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \dfrac{\sum y - \hat{\beta}_1 \sum x}{n}$.
  The line of best fit in the least squares sense is  $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$.

  Entries in the ANOVA table:
    $SSR = \sum(\hat{y} - \bar{y})^2 = \dfrac{(nS_{xy})^2}{n\,(nS_{xx})}$,   $SST = \sum(y - \bar{y})^2 = \dfrac{nS_{yy}}{n}$,   $SSE = \sum(y - \hat{y})^2 = SST - SSR$,
    $MSR = \dfrac{SSR}{1}$,   $s^2 = MSE = \dfrac{SSE}{n-2}$,   $f = t^2 = \dfrac{MSR}{MSE}$

    Source   d.f.    Sum Sq.   Mean Sq.   f
    R        1       SSR       MSR        MSR/MSE
    E        n - 2   SSE       MSE
    T        n - 1   SST

  Variance of the slope $\hat{\beta}_1$:  $s_b^2 = \dfrac{MSE}{S_{xx}} = \dfrac{(nS_{xx})(nS_{yy}) - (nS_{xy})^2}{(n-2)\,(nS_{xx})^2}$

  Coefficient of Determination:  $R^2 = \dfrac{SSR}{SST} = \dfrac{(nS_{xy})^2}{(nS_{xx})(nS_{yy})}$
  Correlation coefficient:  $r = \mathrm{sign}(\hat{\beta}_1)\,\sqrt{R^2}$

  Hypothesis tests on the linear association of X and Y:
  To test $H_0: \rho = 0$ vs. $H_a: \rho \ne 0$ (or, equivalently, $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$), use any of
    $t = \dfrac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \dfrac{\hat{\beta}_1 - 0}{s_b} = \dfrac{nS_{xy}\,\sqrt{n-2}}{\sqrt{(nS_{xx})(nS_{yy}) - (nS_{xy})^2}}$,  or  $t^2 = f = \dfrac{MSR}{MSE}$,
  compared against $t_{\alpha/2,\,n-2}$.
  To test $H_0: \beta_1 = \beta_{10}$ vs. $H_a: \beta_1 > \beta_{10}$, use
    $t = \dfrac{\hat{\beta}_1 - \beta_{10}}{s_b}$  or  $t = \dfrac{\left(nS_{xy} - \beta_{10}\,(nS_{xx})\right)\sqrt{n-2}}{\sqrt{(nS_{xx})(nS_{yy}) - (nS_{xy})^2}}$,
  compared against $t_{\alpha,\,n-2}$.

  The $(1-\alpha)\times 100\%$ confidence interval estimate for $\mu_{Y|x_0} = E[Y \mid x = x_0]$ is
    $\hat{\beta}_0 + \hat{\beta}_1 x_0 \ \mp\ t_{\alpha/2,\,n-2}\; s\,\sqrt{\dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{nS_{xx}}}$
  The $(1-\alpha)\times 100\%$ prediction interval estimate for $Y \mid x = x_0$ is
    $\hat{\beta}_0 + \hat{\beta}_1 x_0 \ \mp\ t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{nS_{xx}}}$

Decision Tree
  [See examples in problem sets 3, 4, 5, past exams and additional exercises.]
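As a rough numerical check of the simple linear regression and ANOVA formulas above, the Python sketch below computes $nS_{xx}$, $nS_{xy}$, $nS_{yy}$, the fitted coefficients, $R^2$ and the t statistic for $H_0: \beta_1 = 0$ on an illustrative data set (the x and y values are assumptions, not course data), cross-checked against scipy.stats.linregress.

    # Sketch: least-squares fit and related quantities from the summary-statistic
    # formulas, with scipy.stats.linregress as an independent cross-check.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical data
    y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
    n = len(x)

    nSxx = n * np.sum(x**2) - np.sum(x)**2            # n S_xx
    nSyy = n * np.sum(y**2) - np.sum(y)**2            # n S_yy
    nSxy = n * np.sum(x * y) - np.sum(x) * np.sum(y)  # n S_xy

    b1 = nSxy / nSxx                                  # slope estimate beta1-hat
    b0 = (np.sum(y) - b1 * np.sum(x)) / n             # intercept estimate beta0-hat

    SST = nSyy / n
    SSR = nSxy**2 / (n * nSxx)
    SSE = SST - SSR
    MSE = SSE / (n - 2)                               # s^2
    sb = np.sqrt(n * MSE / nSxx)                      # standard error of the slope
    t = b1 / sb                                       # test statistic for H0: beta1 = 0
    R2 = nSxy**2 / (nSxx * nSyy)

    res = stats.linregress(x, y)                      # library cross-check
    print(b1, b0, R2, t)
    print(res.slope, res.intercept, res.rvalue**2, res.slope / res.stderr)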