Summary of Formulas and Concepts

Descriptive Statistics (Ch. 1-4)

Definitions

Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume a population consists of N values.

Sample: An observed subset of population values. We assume a sample consists of n values.

Graphic Summaries

Dotplot: A graphic to display the original data. Draw a number line, and put a dot on it for each observation. For identical or really close observations, stack the dots.

[Example dotplot: stacked dots over a number line running from 40 to 90.]

Histogram: A graphic that displays the shape of numeric data by grouping it in intervals.
1. Choose evenly spaced categories.
2. Count the number of observations in each group.
3. Draw a bar graph with the height of each bar equal to the number of observations in the corresponding interval.

Stem and leaf plot: Similar to a dotplot. Data are grouped according to their leading digits (the stems, in the middle column), and the last digit of each observation is used as a plotting symbol (the leaves, playing the role of the dots in a dotplot). The left column is a cumulative count of observations from each end of the data toward the middle; the bracketed number is the count of observations in the row that contains the median.

    2   4 | 57
    5   5 | 133
   13   5 | 56777899
   23   6 | 0222334444
   39   6 | 5667777788888899
  (18)  7 | 000000111222222334
   38   7 | 5555555666668899999999
   16   8 | 0022333444

Numeric Summaries of Data

Suppose that we have n observations, labeled x_1, x_2, ..., x_n. Then Σ_{i=1}^n x_i means

  Σ_{i=1}^n x_i = x_1 + x_2 + x_3 + ... + x_n.

Some other relations are:

  Σ_{i=1}^n f(x_i) = f(x_1) + f(x_2) + ... + f(x_n), for any function f,
  Σ_{i=1}^n c x_i = c Σ_{i=1}^n x_i, for any constant c,
  Σ_{i=1}^n (a x_i + b y_i) = a Σ_{i=1}^n x_i + b Σ_{i=1}^n y_i, for any constants a and b.

Measures of Location: numeric summaries of the center of a distribution.

Mean (or average):
  µ = (1/N) Σ_{i=1}^N x_i   (population)
  x̄ = (1/n) Σ_{i=1}^n x_i   (sample)

Median: the middle observation of the sorted data if n is odd, otherwise the average of the two middle values.

Measures of Dispersion: numeric summaries of the spread or variation of a distribution.

Variance and Standard Deviation:
  σ² = (1/N) Σ_{i=1}^N (x_i − µ)²      (population)
  s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²   (sample)
Population and sample standard deviations (σ and s) are just the square roots of these quantities.

Interpreting σ (Chebyshev's rule): for any population,
  at least 75% of the observations lie within 2σ of µ,
  at least 89% of the observations lie within 3σ of µ,
  at least 100(1 − 1/m²)% of the observations lie within mσ of the mean µ.

IQR (Interquartile Range): The distance between the (n+1)/4-th and 3(n+1)/4-th observations in an ordered dataset. These two values are called the first and third quartiles.

Measure of symmetry, skewness:
  skewness = [Σ_{i=1}^n (x_i − x̄)³ / n] / s³
Negative means left skewed, 0 means symmetric, positive means right skewed.

Measure of heavy tails, kurtosis:
  kurtosis = [Σ_{i=1}^n (x_i − x̄)⁴ / n] / s⁴ − 3
A normal distribution has a kurtosis of 3, so by subtracting 3 we make interpretations relative to the normal: positive values mean a sharper peak, and negative values mean a flatter top, than a normal distribution.
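These summaries translate directly into code. Below is a minimal Python sketch (standard library only); the dataset and the helper name summaries are invented for illustration, and the quartiles are located at the (n+1)/4 and 3(n+1)/4 positions defined above, interpolating between neighboring order statistics when the position is not a whole number.

  import math

  def summaries(xs):
      # Numeric summaries from the formula sheet; xs is a list of observations.
      n = len(xs)
      xs = sorted(xs)
      mean = sum(xs) / n
      mid = n // 2
      # Median: middle value if n is odd, average of the two middle values if even.
      median = xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2
      s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance (n - 1 divisor)
      s = math.sqrt(s2)
      def order_stat(pos):
          # Interpolated pos-th smallest observation, pos counted from 1.
          i = int(pos) - 1
          frac = pos - int(pos)
          return xs[-1] if i + 1 >= n else xs[i] + frac * (xs[i + 1] - xs[i])
      q1 = order_stat((n + 1) / 4)
      q3 = order_stat(3 * (n + 1) / 4)
      skew = (sum((x - mean) ** 3 for x in xs) / n) / s ** 3
      kurt = (sum((x - mean) ** 4 for x in xs) / n) / s ** 4 - 3   # relative to normal
      return {"mean": mean, "median": median, "s": s,
              "IQR": q3 - q1, "skewness": skew, "kurtosis": kurt}

  print(summaries([62, 75, 71, 58, 80, 67, 70, 73, 66, 77]))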
Box-and-whisker plot: A graphic that summarizes the data using the median and quartiles, and displays outliers. Good for comparing several groups of data.

[Box-and-whisker figure: a box runs from the first quartile to the third quartile with a line at the median; the whisker extends to the largest value below Q3 + 1.5 IQR; observations beyond that are plotted individually (o) as outliers.]

Probability (Ch. 5)

Definitions and Set Theory

Random experiment: A process leading to at least two possible outcomes with uncertainty as to which will occur.
Basic outcome: A possible outcome of the random experiment.
Sample space: The set of all basic outcomes.
Event: A set of basic outcomes from the sample space. An event is said to occur if any one of its constituent basic outcomes occurs.

Combining events: let A and B be two events.

  Term           Symbol   Pronounced   Meaning
  Union          A ∪ B    A or B       A occurs or B occurs or both occur
  Intersection   A ∩ B    A and B      A occurs and B occurs
  Complement     Ā        not A        A does not occur

[Venn diagrams: shaded A ∩ B; shaded A ∪ B; shaded Ā; A and B mutually exclusive; A, B, C collectively exhaustive.]

Probability Postulates:
1. If A is any event in the sample space S, 0 ≤ P(A) ≤ 1.
2. Let A be an event in S and let O_i denote the basic outcomes. Then P(A) = Σ_A P(O_i), where the summation extends over all the basic outcomes in A.
3. P(S) = 1.

Probability rules for combining events:
  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
  P(Ā) = 1 − P(A)

Conditional Probability:
  P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
  P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

Independence: Two events are statistically independent if and only if P(A ∩ B) = P(A) P(B), or equivalently P(A|B) = P(A) and P(B|A) = P(B).

General case: events E_1, E_2, ..., E_k are independent if and only if
  P(E_1 ∩ E_2 ∩ ... ∩ E_k) = P(E_1) P(E_2) ... P(E_k).

Bivariate Probability

Probabilities of outcomes for bivariate events:

        B_1            B_2            B_3
  A_1   P(A_1 ∩ B_1)   P(A_1 ∩ B_2)   P(A_1 ∩ B_3)   P(A_1)
  A_2   P(A_2 ∩ B_1)   P(A_2 ∩ B_2)   P(A_2 ∩ B_3)   P(A_2)
        P(B_1)         P(B_2)         P(B_3)

P(A_1 ∩ B_2) is the probability of A_1 and B_2 both occurring. P(A_1) = P(A_1 ∩ B_1) + P(A_1 ∩ B_2) + P(A_1 ∩ B_3) is the marginal probability that A_1 occurs. If we think of A_1, A_2 as a group of attributes A, and B_1, B_2, B_3 as a group of attributes B, then A and B are independent only if every one of {A_1, A_2} is independent of every one of {B_1, B_2, B_3}.

Discrete Random Variables (Ch. 6-7)

Definitions

Random variable (r.v.): A variable that takes on numerical values determined by the outcome of a random experiment.
Discrete random variable: A r.v. that can take on no more than a countable number of values.
Continuous random variable: A r.v. that can take any value in an interval.
Notation: An upper case letter (e.g. X) will represent a r.v.; a lower case letter (e.g. x) will represent one of its possible values.

Discrete Probability Distributions

The probability function, P_X(x), of a discrete r.v. X gives the probability that X takes the value x:
  P_X(x) = P(X = x),
where the function is evaluated at all possible values of x.
Properties:
1. P_X(x) ≥ 0 for any value x.
2. Σ_x P_X(x) = 1.

Cumulative probability function, F_X(x_0), of a r.v. X:
  F_X(x_0) = P(X ≤ x_0) = Σ_{x ≤ x_0} P_X(x).
Properties:
1. 0 ≤ F_X(x) ≤ 1 for any x.
2. If a < b, then F_X(a) ≤ F_X(b).

Expectation of Discrete Random Variables

Expected value of a discrete r.v.:
  E(X) = µ_X = Σ_x x P_X(x).
For any function g(X),
  E(g(X)) = Σ_x g(x) P_X(x).
Variance of a discrete r.v.:
  Var(X) = σ_X² = E((X − µ_X)²) = Σ_x (x − µ_X)² P_X(x) = E(X²) − µ_X².
The standard deviation of X is σ_X.

Plug-in rules: let X be a r.v., and a and b constants. Then
  E(a + bX) = a + b E(X)
  Var(a + bX) = b² Var(X).
This only works for linear functions.
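The expectation formulas are equally mechanical in code. A short Python sketch follows; the probability function used here is a made-up example, not one from the text.

  import math

  # A discrete r.v. given as a dict {value: probability}, i.e. P_X(x).
  pmf = {0: 0.2, 1: 0.5, 2: 0.3}                 # hypothetical example
  assert abs(sum(pmf.values()) - 1) < 1e-12      # probabilities must sum to 1

  mu = sum(x * p for x, p in pmf.items())        # E(X) = sum of x P_X(x)
  ex2 = sum(x ** 2 * p for x, p in pmf.items())  # E(X^2)
  var = ex2 - mu ** 2                            # Var(X) = E(X^2) - mu^2
  print(mu, var, math.sqrt(var))                 # mean, variance, standard deviation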
Jointly Distributed Discrete Random Variables

Joint probability function: Suppose X and Y are r.v.'s. Their joint probability function gives the probability that simultaneously X = x and Y = y:
  P_{X,Y}(x, y) = P({X = x} ∩ {Y = y})
Properties:
1. P_{X,Y}(x, y) ≥ 0 for any pair (x, y).
2. Σ_x Σ_y P_{X,Y}(x, y) = 1.

Marginal probability function:
  P_X(x) = Σ_y P_{X,Y}(x, y).

Conditional probability function:
  P_{Y|X}(y|x) = P_{X,Y}(x, y) / P_X(x).

Independence: X and Y are independent if and only if
  P_{X,Y}(x, y) = P_X(x) P_Y(y) for all possible (x, y) pairs.

Expectation: Let X and Y be r.v.'s, and g(X, Y) any function. Then
  E(g(X, Y)) = Σ_x Σ_y g(x, y) P_{X,Y}(x, y).

Conditional expectation: Let X and Y be r.v.'s, and suppose we know the conditional distribution of X given Y = y, labeled P_{X|Y}(x|y). Then
  E(X | Y = y) = Σ_x x P_{X|Y}(x|y).

Covariance: If E(X) = µ_X and E(Y) = µ_Y,
  Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = Σ_x Σ_y (x − µ_X)(y − µ_Y) P_{X,Y}(x, y)
            = E(XY) − µ_X µ_Y = [Σ_x Σ_y x y P_{X,Y}(x, y)] − µ_X µ_Y.
If two r.v.'s are independent, their covariance is zero. The converse is not necessarily true.

Correlation:
  ρ_{XY} = Cor(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
−1 ≤ ρ_{XY} ≤ 1 always; ρ_{XY} = ±1 if and only if Y = a + bX (a, b constants).

Plug-in rules: Let X and Y be r.v.'s, and a, b constants. Then
  E(aX + bY) = a E(X) + b E(Y)
  Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

Binomial Distribution

Bernoulli trials: A sequence of repeated experiments are Bernoulli trials if:
1. The result of each trial is either a success or a failure.
2. The probability p of a success is the same for all trials.
3. The trials are independent.

If X is the number of successes in n Bernoulli trials, X is a binomial random variable. It has probability function
  P_X(x) = (n choose x) p^x (1 − p)^(n−x),
where (n choose x) counts the number of ways of getting x successes in n trials. The formula is
  (n choose x) = n! / (x! (n − x)!),
where n! = n × (n−1) × (n−2) × ... × 2 × 1.
Mean and variance: E(X) = np, Var(X) = np(1 − p).

Continuous Random Variables (Ch. 7)

Probability Distributions

Probability density function: A function f_X(x) of the continuous r.v. X with the following properties:
1. f_X(x) ≥ 0 for all values of x.
2. P(a ≤ X ≤ b) = the area under f_X(x) between a and b, if a < b.
3. The total area under the curve is 1.
4. The area under the curve to the left of any value x is F_X(x), the probability that X does not exceed x.

Cumulative distribution function: Same as before. P(a ≤ X ≤ b) = F_X(b) − F_X(a) (provided a < b).

Expectations, variances, covariances, etc.: Same rules as for discrete r.v.'s. The summation (Σ) is replaced by the integral (∫), which is not necessary for this course.

Normal Distribution

Probability density function:
  f_X(x) = (1 / √(2πσ²)) e^{−(x − µ)² / (2σ²)}
for constants µ and σ such that −∞ < µ < ∞ and 0 < σ < ∞.
Mean and variance: E(X) = µ, Var(X) = σ².
Notation: X ∼ N(µ, σ²) means X is normal with mean µ and variance σ². If Z ∼ N(0, 1), we say it has a standard normal distribution.
If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1). Thus
  P(a < X < b) = P((a − µ)/σ < Z < (b − µ)/σ) = F_Z((b − µ)/σ) − F_Z((a − µ)/σ).

Central Limit Theorem

Let X_1, X_2, ..., X_n be n independent r.v.'s, each with identical distributions, mean µ and variance σ². As n becomes large,
  X̄ ∼ N(µ, σ²/n)
  Σ_{i=1}^n X_i = nX̄ ∼ N(nµ, nσ²).
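The binomial probability function and normal-curve areas above can both be evaluated with Python's standard library. In this sketch the standard normal c.d.f. F_Z is computed from the error function via the identity F_Z(z) = (1 + erf(z/√2))/2 (a standard fact, not from the text); the numbers in the example calls are invented.

  import math

  def binom_pmf(x, n, p):
      # P_X(x) = (n choose x) p^x (1 - p)^(n - x)
      return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

  def phi(z):
      # Standard normal c.d.f. F_Z(z), via the error function.
      return 0.5 * (1 + math.erf(z / math.sqrt(2)))

  def normal_prob(a, b, mu, sigma):
      # P(a < X < b) = F_Z((b - mu)/sigma) - F_Z((a - mu)/sigma)
      return phi((b - mu) / sigma) - phi((a - mu) / sigma)

  print(binom_pmf(3, 10, 0.4))        # P(X = 3) for X ~ Binomial(10, 0.4)
  print(normal_prob(40, 60, 50, 10))  # X ~ N(50, 10^2): about 0.683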
Sampling & Sampling Distributions

Simple random sample (or random sample): A method of randomly drawing n objects which are independent and identically distributed (I.I.D.).

Statistic: A function of the sample information.

Sampling distribution of a statistic: The probability distribution of the values a statistic can take, over all possible samples of a fixed size n.

Sampling distribution of the mean: Suppose X_1, ..., X_n are a random sample from some population with mean µ_X and variance σ_X². The sample mean is
  X̄ = (1/n) Σ_{i=1}^n X_i.
It has the following properties:
1. E(X̄) = µ_X.
2. It has standard deviation σ_X̄ = σ_X / √n.
3. If the population distribution is normal, X̄ ∼ N(µ_X, σ_X̄²) = N(µ_X, σ_X²/n).
4. If the population distribution is not normal, but n is large, then (3) is roughly true.

Sampling distribution of a proportion: Suppose the r.v. X is the number of successes in a binomial sample of n trials, whose probability of success is p. The sample proportion is p̂ = X/n. It has the following properties:
1. E(p̂) = p.
2. It has standard deviation σ_p̂ = √(p(1 − p)/n).
3. If n is large (np(1 − p) > 9, or roughly n ≥ 40), p̂ ∼ N(p, σ_p̂²) = N(p, p(1 − p)/n).

Point Estimation

Estimator: A random variable that depends on the sample information and whose realizations provide approximations to an unknown population parameter.
Estimate: A specific realization of an estimator.
Point estimator: An estimator that is a single number.
Point estimate: A specific realization of a point estimator.
Bias: Let θ̂ be an estimator of the parameter θ. The bias in θ̂ is Bias(θ̂) = E(θ̂) − θ. If the bias is 0, θ̂ is an unbiased estimator.
Efficiency: Let θ̂_1 and θ̂_2 be two estimators of θ, based on the same sample. Then θ̂_1 is more efficient than θ̂_2 if Var(θ̂_1) < Var(θ̂_2).

Interval Estimation

Confidence interval: Let θ be an unknown parameter. Suppose that from sample information we can find random variables A and B such that P(A < θ < B) = 1 − α. If the observed values are a and b, then (a, b) is a 100(1 − α)% confidence interval for θ. The quantity (1 − α) is called the probability content of the interval.

Student's t distribution: Given a random sample of n observations with mean X̄ and standard deviation s, from a normal population with mean µ, the random variable
  T = (X̄ − µ) / (s/√n)
follows the Student's t distribution with (n − 1) degrees of freedom. For n > 30, the t distribution is quite close to a N(0, 1) distribution.

Confidence intervals:

  Data                                              Parameter    100(1 − α)% C.I.
  N(µ, σ²), σ² known                                µ            x̄ ± z_{α/2} σ/√n
  mean µ, σ² unknown, n > 30                        µ            x̄ ± z_{α/2} s/√n
  N(µ, σ²), σ² unknown                              µ            x̄ ± t_{n−1,α/2} s/√n
  Binomial(n, p), np(1 − p) > 9 (roughly n ≥ 40)    p            p̂ ± z_{α/2} √(p̂(1 − p̂)/n)
  n matched pairs, difference ∼ N(µ_X − µ_Y, σ²)    µ_X − µ_Y    d̄ ± t_{n−1,α/2} s_d/√n
  2 independent samples, means µ_X, µ_Y,            µ_X − µ_Y    x̄ − ȳ ± z_{α/2} √(s_x²/n_x + s_y²/n_y)
    variances unknown, n > 30
  2 independent samples, means µ_X, µ_Y,            µ_X − µ_Y    x̄ − ȳ ± t_{n*,α/2} √(s_x²/n_x + s_y²/n_y)
    variances unknown
  2 independent samples, Binomial(n_x, p_x),        p_x − p_y    p̂_x − p̂_y ± z_{α/2} √(p̂_x(1 − p̂_x)/n_x + p̂_y(1 − p̂_y)/n_y)
    Binomial(n_y, p_y)

Notes for the table:
1. All quantities in the C.I. column are either known constants or observed sample quantities.
2. P(Z > z_{α/2}) = α/2 for Z ∼ N(0, 1).
3. P(T > t_{n−1,α/2}) = α/2 for T ∼ Student's t with (n − 1) d.f.
4. s, s_x, s_y, s_d are observed sample standard deviations corresponding to x_i (single sample), x_i, y_i, and d_i = x_i − y_i respectively.
5. p̂, p̂_x, p̂_y are the observed sample proportions corresponding to x_i (single sample), x_i, and y_i respectively.
6. d̄ is the sample mean corresponding to d_i = x_i − y_i.
7. n, n_x, n_y are the total, x and y sample sizes.
8. n* = (s_x²/n_x + s_y²/n_y)² / [ (s_x²/n_x)²/(n_x − 1) + (s_y²/n_y)²/(n_y − 1) ].
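As a sketch of how rows of this table are used, the Python fragment below computes the third interval (the t interval for µ) and the binomial interval for p. SciPy is assumed only for the z_{α/2} and t_{n−1,α/2} critical values; the data values are invented.

  import math
  from scipy import stats

  def mean_ci_t(xs, alpha=0.05):
      # x̄ ± t_{n-1, α/2} s/√n : normal data, σ² unknown.
      n = len(xs)
      xbar = sum(xs) / n
      s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
      half = stats.t.ppf(1 - alpha / 2, n - 1) * s / math.sqrt(n)
      return xbar - half, xbar + half

  def proportion_ci(successes, n, alpha=0.05):
      # p̂ ± z_{α/2} √(p̂(1 - p̂)/n) : requires np(1 - p) > 9, roughly n ≥ 40.
      phat = successes / n
      half = stats.norm.ppf(1 - alpha / 2) * math.sqrt(phat * (1 - phat) / n)
      return phat - half, phat + half

  print(mean_ci_t([5.1, 4.9, 5.6, 5.3, 4.7, 5.2]))  # 95% C.I. for µ
  print(proportion_ci(36, 120))                     # 95% C.I. for p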
Also note: The text gives a different formula for comparing two means with small sample sizes. It requires that the two variances be the same, which may not be the case. Unless you are sure the variances are equal, it is safer to use the approximation given here (the formula with n*). If you are sure the variances are equal, using the book's formula is fine.

Estimating the sample size: If you want a 100(1 − α)% interval of ±L (i.e. of length 2L), choose n as follows:

  Situation               n
  normal, σ known         n = z_{α/2}² σ² / L²
  Bernoulli, worst case   n = 0.25 z_{α/2}² / L²

Hypothesis Testing

Null hypothesis (H_0): The hypothesis we assume to be true unless there is sufficient evidence to the contrary.
Alternative hypothesis (H_1): The hypothesis we test the null against. If there is evidence that H_0 is false, we accept H_1.
Type I error: Rejecting a true H_0.
Type II error: Not rejecting a false H_0.
Significance level: P(reject H_0 | H_0 true) = P(type I error).
Power: The probability of rejecting a null hypothesis that is false. Note that this depends on the true value of the parameter.
P-value: The smallest significance level at which a null hypothesis can be rejected. This is a measure of how likely the data are, if H_0 is true.

Notes for the following table (in addition to the comments for C.I.'s):
1. The first three tests are examples of one-sided (> and < alternatives) and two-sided (≠ alternative) tests. The remaining tests all have a > alternative, but are easily adaptable to either of the other two alternatives.
2. In the last test, p̂_0 = (n_x p̂_x + n_y p̂_y) / (n_x + n_y).
3. In all the tests, we are comparing the unknown parameter (such as µ, p or µ_X − µ_Y) to constants (µ_0, p_0, and D_0).

Hypothesis tests with significance level α:

  Data                                      H_0                  H_1               Reject H_0 if
  N(µ, σ²), σ² known                        µ = µ_0 or µ ≤ µ_0   µ > µ_0           (x̄ − µ_0)/(σ/√n) > z_α
  same                                      µ = µ_0 or µ ≥ µ_0   µ < µ_0           (x̄ − µ_0)/(σ/√n) < −z_α
  same                                      µ = µ_0              µ ≠ µ_0           (x̄ − µ_0)/(σ/√n) not in (−z_{α/2}, z_{α/2})
  mean µ, σ² unknown, n > 30                µ = µ_0 or µ ≤ µ_0   µ > µ_0           (x̄ − µ_0)/(s/√n) > z_α
  N(µ, σ²), σ² unknown, n < 30              µ = µ_0 or µ ≤ µ_0   µ > µ_0           (x̄ − µ_0)/(s/√n) > t_{n−1,α}
  Binomial(n, p), np(1 − p) > 9             p = p_0 or p ≤ p_0   p > p_0           (p̂ − p_0)/√(p_0(1 − p_0)/n) > z_α
    (roughly n ≥ 40)
  n matched pairs, difference               µ_X − µ_Y = D_0 or   µ_X − µ_Y > D_0   (d̄ − D_0)/(s_d/√n) > t_{n−1,α}
    ∼ N(µ_X − µ_Y, σ²)                        µ_X − µ_Y ≤ D_0
  2 independent samples, means µ_X, µ_Y,    µ_X − µ_Y = D_0 or   µ_X − µ_Y > D_0   (x̄ − ȳ − D_0)/√(s_x²/n_x + s_y²/n_y) > z_α
    variances unknown, n_x > 30, n_y > 30     µ_X − µ_Y ≤ D_0
  2 independent normal samples, means       µ_X − µ_Y = D_0 or   µ_X − µ_Y > D_0   (x̄ − ȳ − D_0)/√(s_x²/n_x + s_y²/n_y) > t_{n*,α}
    µ_X, µ_Y, variances unknown               µ_X − µ_Y ≤ D_0
  2 independent samples,                    p_x − p_y = 0 or     p_x − p_y > 0     (p̂_x − p̂_y)/√(p̂_0(1 − p̂_0)(1/n_x + 1/n_y)) > z_α
    Binomial(n_x, p_x), Binomial(n_y, p_y)    p_x − p_y ≤ 0
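The last two tests in the table translate into code as follows. This is only a sketch: the sample data are invented, SciPy supplies the t and normal c.d.f.'s, and the p-values are one-sided, matching the > alternatives in the table. The degrees of freedom n* come from note 8 of the C.I. table (SciPy accepts fractional degrees of freedom).

  import math
  from scipy import stats

  def welch_test(xs, ys, d0=0.0):
      # H0: µ_X - µ_Y ≤ D0 vs H1: µ_X - µ_Y > D0; variances unknown, not assumed equal.
      nx, ny = len(xs), len(ys)
      xbar, ybar = sum(xs) / nx, sum(ys) / ny
      sx2 = sum((x - xbar) ** 2 for x in xs) / (nx - 1)
      sy2 = sum((y - ybar) ** 2 for y in ys) / (ny - 1)
      se2 = sx2 / nx + sy2 / ny
      t = (xbar - ybar - d0) / math.sqrt(se2)
      # n* from note 8 of the C.I. table.
      df = se2 ** 2 / ((sx2 / nx) ** 2 / (nx - 1) + (sy2 / ny) ** 2 / (ny - 1))
      return t, df, 1 - stats.t.cdf(t, df)          # statistic, d.f., one-sided p-value

  def two_proportion_test(x_succ, nx, y_succ, ny):
      # H0: p_x - p_y ≤ 0 vs H1: p_x - p_y > 0, with pooled estimate p̂_0.
      px, py = x_succ / nx, y_succ / ny
      p0 = (x_succ + y_succ) / (nx + ny)            # p̂_0 = (n_x p̂_x + n_y p̂_y)/(n_x + n_y)
      z = (px - py) / math.sqrt(p0 * (1 - p0) * (1 / nx + 1 / ny))
      return z, 1 - stats.norm.cdf(z)               # statistic, one-sided p-value

  print(welch_test([23, 25, 28, 30, 26], [20, 22, 25, 21, 24, 23]))
  print(two_proportion_test(45, 100, 30, 100))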