Ismor Fischer, 5/29/2012

2.4 Summary (Compare with first page of §2.3.)

POPULATION
Random Variable X, numerical (X discrete or X continuous), with Distribution of X.
Parameters:
♦ Mean µ (“mu”)
♦ Variance σ²
♦ Standard Deviation σ (“sigma”)

Statistical Inference

SAMPLE, size n
Estimators µ̂ and σ̂ can be calculated via the following statistics:
♦ Mean x̄
♦ Variance s²
♦ Standard Deviation s

[Figure: Distribution of X (population), with µ and σ marked; Density Histogram (sample), showing the relative frequency of each xi, with x̄ and s marked.]

Comments:

The population mean µ and variance σ² are defined in terms of expected value:

$$\mu = E[X] = \sum_{\text{all } x} x \, f(x)$$

$$\sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 \, f(x)$$

if X is discrete (with corresponding “integration formulas” if X is continuous), where f(x) is the probability of value x occurring in the population, i.e., P(X = x). (Later...)

If n is used instead of n − 1 in the denominator of s², the expected value is always less than σ². Consistent under- (or over-) estimation of a parameter by a statistic is called bias. The formulas given for the sample mean and variance are unbiased estimators.

Chebyshev’s Inequality

Whatever the shape of the distribution, at least 75% of the values lie within ±2 standard deviations of the mean, at least 89% lie within ±3 standard deviations, etc. More generally, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the values lie within ±k standard deviations of the mean. (Note that k > 1, but it need not be an integer!)

[Figure: Pafnuty Chebyshev (1821-1894). Distribution with axis marked at µ − 3σ, µ − 2σ, µ − 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ; ≥ 75% of values within µ ± 2σ, ≥ 89% within µ ± 3σ.]

Exercise: Suppose that a population of individuals has a mean age of µ = 40 years, and standard deviation of σ = 10 years. At least how much of the population is between 20 and 60 years old? Between 15 and 65 years old? What symmetric age interval about the mean is guaranteed to contain at least half the population?

Note: If the distribution is bell-shaped, then approximately 68% lie within ±1σ, approximately 95% lie within ±2σ, and approximately 99.7% lie within ±3σ.
For other multiples of σ, percentages can be obtained via software or tables. These percentages are much sharper than Chebyshev’s general result, which can be overly conservative, and they can be used to check whether a distribution is reasonably bell-shaped for use in subsequent testing procedures. (Later...)
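
As a rough illustration of the two kinds of percentages discussed above, the following Python sketch evaluates Chebyshev's guarantee 1 − 1/k² and compares it with the bell-shaped case. The function names (chebyshev_lower_bound, chebyshev_k_for_coverage, normal_coverage) are hypothetical helpers introduced here and are not part of the original notes. The bell-shaped figures assume an exactly normal distribution and are computed with the standard-library function math.erf. The sample numbers are those of the exercise (µ = 40, σ = 10).

```python
import math


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean.

    Holds for any distribution shape; the notes state it for k > 1.
    """
    if k <= 1:
        raise ValueError("the bound is stated only for k > 1")
    return 1.0 - 1.0 / k**2


def chebyshev_k_for_coverage(p):
    """Smallest k for which Chebyshev guarantees a fraction of at least p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Solve 1 - 1/k^2 = p for k.
    return 1.0 / math.sqrt(1.0 - p)


def normal_coverage(k):
    """Fraction within k standard deviations, assuming an exactly normal shape."""
    return math.erf(k / math.sqrt(2.0))


# Assumed values, taken from the exercise: mean age 40, standard deviation 10.
mu, sigma = 40.0, 10.0

# Symmetric intervals about the mean, converted to multiples of sigma.
for low, high in [(20.0, 60.0), (15.0, 65.0)]:
    k = (high - mu) / sigma
    print(f"{low:.0f} to {high:.0f}: k = {k}, at least {chebyshev_lower_bound(k):.1%}")

# Interval guaranteed to contain at least half the population.
k_half = chebyshev_k_for_coverage(0.5)
print(f"at least 50%: {mu - k_half * sigma:.2f} to {mu + k_half * sigma:.2f}")

# Chebyshev's guarantee versus the bell-shaped percentages.
for k in (2.0, 3.0):
    print(f"k = {k}: Chebyshev >= {chebyshev_lower_bound(k):.1%}, "
          f"normal = {normal_coverage(k):.2%}")
```

With these inputs the Chebyshev bounds come out to 75% for k = 2 and 84% for k = 2.5, and the half-population interval corresponds to k = √2 ≈ 1.41. The normal figures for k = 2 and k = 3 are noticeably higher than the Chebyshev guarantees, which is the sense in which the general bound is conservative.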