USC3002 Picturing the World Through Mathematics
Wayne Lawton, Department of Mathematics, S14-04-04, 65162749

Theme for Semester I, 2008/09: The Logic of Evolution, Mathematical Models of Adaptation from Darwin to Dawkins

PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypothesis
4. Test Statistics for Gaussian Hypotheses
   Sample Mean for Parameter Estimation
   z-Test and t-Test Statistics
   Rejection/Critical Region for z-Test Statistic
   Hypothesis Test for Mean Height
5. General Hypotheses Tests
   Type I and Type II Errors
   Null and Alternative Hypotheses
6. Assign Tutorial Problems

POPULATIONS AND SAMPLES
Population - a specified collection of quantities: e.g. heights of males in a country, glucose levels of a collection of blood samples, batch yields of an industrial compound for a chemical plant over a specified time with and without the use of a catalyst.
Sample Population - a population from which samples are taken to be used for statistical inference.
Sample - the subset of the sample population consisting of the samples that are taken.

SAMPLE POPULATION PARAMETERS
Sample: $X_1, \ldots, X_n$
Sample Parameters:
Sample Size: $n$
Sample Mean: $\mu_n = [X_1 + X_2 + \cdots + X_n]/n$
Sample Variance: $\sigma_n^2 = [(X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2]/n$
Sample Standard Deviation: $\sigma_n = \sqrt{\sigma_n^2}$

SAMPLE POPULATION PARAMETERS
Theorem 1 The variance of a population is related to its mean and average squared values by
$$\sigma_n^2 = [X_1^2 + X_2^2 + \cdots + X_n^2]/n - \mu_n^2$$
Proof Since $(X_k - \mu_n)^2 = X_k^2 - 2\mu_n X_k + \mu_n^2$,
$$n\,\sigma_n^2 = (X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2$$
$$= X_1^2 + \cdots + X_n^2 - 2\mu_n (X_1 + \cdots + X_n) + n\mu_n^2$$
$$= X_1^2 + \cdots + X_n^2 - 2n\mu_n^2 + n\mu_n^2 \quad \text{(Why?)}$$
$$= X_1^2 + \cdots + X_n^2 - n\mu_n^2$$
Question How can the proof be completed?

STATISTICAL HYPOTHESES
are assertions about a population that describe some statistical properties of the population.
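Typically, statistical hypotheses assert that a population consists of independent samples of a random variable that has a certain type of distribution, and some of the parameters that describe this distribution may be specified. For Gaussian distributions there are four possibilities:
   Neither the mean nor the variance is specified.
   Only the variance is specified.
   Only the mean is specified.
   Both the mean and the variance are specified.

TEST STATISTICS for Hypothesis with Gaussian Distributions
For $\mu$ unknown and $\sigma$ known, the sample mean
$$\mu_n = [X_1 + X_2 + \cdots + X_n]/n$$
is Gaussian with mean $\mu$ and variance $\sigma^2/n$.
Proof (Outline) We let $\langle Y \rangle$ denote the mean of a random variable $Y$. Then clearly
$$\langle \mu_n \rangle = [\langle X_1 \rangle + \langle X_2 \rangle + \cdots + \langle X_n \rangle]/n = \mu$$
Independence and Theorem 1 give
$$\mathrm{variance}(\mu_n) = \langle (\mu_n - \mu)^2 \rangle = \langle [(X_1 - \mu) + \cdots + (X_n - \mu)]^2 \rangle / n^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle (X_i - \mu)(X_j - \mu) \rangle / n^2 = \sigma^2/n$$
where $\sigma^2 = \mathrm{variance}(X_i)$, $i = 1, \ldots, n$. (By independence the terms with $i \neq j$ vanish, leaving $n$ terms each equal to $\sigma^2$.)

PARAMETER ESTIMATION for Hypothesis with Gaussian Distributions
For $\mu$ unknown and $\sigma$ known, the sample mean
$$\mu_n = [X_1 + X_2 + \cdots + X_n]/n$$
can be used to estimate the mean $\mu$, since the estimate error $\mu_n - \mu$ is unbiased,
$$\langle \mu_n - \mu \rangle = 0$$
and converges in the statistical sense that
$$\text{standard deviation}(\mu_n - \mu) = \sigma/\sqrt{n} \to 0 \text{ as } n \to \infty$$

MORE TEST STATISTICS for Hypothesis with Gaussian Distributions
$\mu$, $\sigma$ known: the One Sample z-Test statistic
$$z_n = (\mu_n - \mu)/(\sigma/\sqrt{n})$$
is a Gaussian random variable with mean 0, variance 1.
$\mu$ known, $\sigma$ unknown: the One Sample t-Test statistic
$$t_n = (\mu_n - \mu)/(\sigma_n/\sqrt{n})$$
is a t-distributed random variable with n-1 degrees of freedom. (For the statistic to be exactly t-distributed, the $\sigma_n$ used here is computed with divisor $n-1$ rather than $n$.)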
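
The two test statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four sample values are hypothetical and do not come from the lecture, the t statistic uses the usual n-1 divisor for the sample standard deviation, and the upper-tail probability of a standard Gaussian is obtained from the standard library function `math.erfc`.

```python
import math

def sample_mean(xs):
    """Sample mean mu_n = (X_1 + ... + X_n) / n."""
    return sum(xs) / len(xs)

def sample_std(xs, ddof=0):
    """Sample standard deviation; ddof=0 divides by n, ddof=1 by n-1."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))

def z_statistic(xs, mu, sigma):
    """One sample z statistic: (mu_n - mu) / (sigma / sqrt(n)), sigma known."""
    return (sample_mean(xs) - mu) / (sigma / math.sqrt(len(xs)))

def t_statistic(xs, mu):
    """One sample t statistic, using the n-1 divisor (an assumption here)."""
    s = sample_std(xs, ddof=1)
    return (sample_mean(xs) - mu) / (s / math.sqrt(len(xs)))

def upper_tail(z):
    """Probability that a mean 0, variance 1 Gaussian is at least z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical data, chosen only to exercise the functions.
heights = [170.0, 174.0, 178.0, 182.0]
z = z_statistic(heights, mu=174.0, sigma=6.0)
t = t_statistic(heights, mu=174.0)
print("z =", z, "upper tail =", upper_tail(z))
print("t =", t, "with", len(heights) - 1, "degrees of freedom")
```

The z statistic can be compared directly with a Gaussian tail probability, whereas the t statistic has to be compared with a t-distribution table for n-1 degrees of freedom, which this sketch does not attempt to compute.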
z-TEST STATISTIC ALPHAS
$$\alpha(z) = \int_z^{\infty} p(x)\,dx, \qquad p(x) = e^{-x^2/2}/\sqrt{2\pi}$$

     z      alpha(z)          alpha    z(alpha)
   0.0000    0.5000          0.0500    1.6449
   0.2000    0.4207          0.0400    1.7507
   0.4000    0.3446          0.0300    1.8808
   0.6000    0.2743          0.0200    2.0537
   0.8000    0.2119          0.0100    2.3263
   1.0000    0.1587          0.0050    2.5758
   1.2000    0.1151          0.0040    2.6521
   1.4000    0.0808          0.0030    2.7478
   1.6000    0.0548          0.0020    2.8782
   1.8000    0.0359          0.0010    3.0902
   2.0000    0.0228          0.0005    3.2905
   2.2000    0.0139          0.0004    3.3528
   2.4000    0.0082          0.0003    3.4316
   2.6000    0.0047          0.0002    3.5401
   2.8000    0.0026          0.0001    3.7190
   3.0000    0.0013          0.0001    3.8906
(The last alpha is 0.00005, which displays as 0.0001 at four decimal places.)

CRITICAL REGION FOR alpha=0.05
(Figure not reproduced.)

HEIGHT HISTOGRAMS
(Figure not reproduced.)

HYPOTHESIS TEST FOR MEAN HEIGHT
You suspect that the height of males in a country has increased due to diet or a Martian conspiracy. You aim to support your Alternative Hypothesis by testing the
Null Hypothesis: $\mu = 174.204$ cm, $\sigma = 6.509$ cm.
You compute a sample mean $\mu_{20} = 177.115$ cm using 20 samples, then compute
$$z_{20} = (\mu_{20} - 174.204 \text{ cm})/(1.4555 \text{ cm}) = 2.000$$
where $1.4555 \text{ cm} = \sigma/\sqrt{20}$. If the Null Hypothesis is true, the probability that $z_{20} \geq 2.000$ is $\alpha(2) = .0228$.
Question Should the Null Hypothesis be rejected?

GENERAL HYPOTHESES TESTS
involve
Type I Error: $\alpha$ = prob. of rejecting the null hypothesis if it is true, also called the significance level.
Type II Error: $\beta$ = prob. of failing to reject the null hypothesis if it is false; $1 - \beta$ is also called the power of a test. It requires an Alternative Hypothesis that determines the distribution of the test statistic.
and more complicated test statistics, such as the One Sample t-Test statistic, whose distribution is determined even though the distribution of the Gaussian random samples used to compute it is not.

Homework 5. Due Monday 20.10.08
1.
Compute the power of a hypothesis test whose null hypothesis is that in vufoil #13, the alternative hypothesis asserts that heights are normally distributed with mean $\mu + 3.386$ cm and standard deviation $\sigma$, where $\mu$ and $\sigma$ are the same as for the null hypothesis, 20 samples are used, and the significance $\alpha = .05$.
Suggestion: if the alternative hypothesis is true, what is the distribution of the test statistic
$$z_{20} = (\mu_{20} - \mu)/(\sigma/\sqrt{20})\,?$$
What is the probability that $z_{20} \geq z(\alpha)$?
2. Use a t-statistic table to describe how to test the null hypothesis that heights are normal with mean $\mu$ and unknown variance based on 20 samples.

EXTRA TOPIC: CONFIDENCE INTERVALS
Given a sample mean $\mu_n$ for large $n$ we can assume, by the central limit theorem, that it is Gaussian with
mean $\mu$ = mean of the original population, and
variance $\sigma^2/n = \frac{1}{n} \times$ variance of the original population.
Furthermore, $\sigma^2 \approx \sigma_n^2$ = sample variance, and if the population is {0,1}-valued then $\sigma^2 = \mu(1 - \mu)$.
We say that $\mu \in [a, b]$ with confidence
$$c = \int_a^b p(x)\,dx$$
where $p(x)$ is the probability density of a Gaussian with mean $\mu_n$ and standard deviation $\sigma/\sqrt{n} \approx \sigma_n/\sqrt{n}$.
Theorem If $\mu$ is a random variable unif. on $[-L, L]$ then Bayes Theorem implies
$$c = \lim_{L \to \infty} p(\mu \in [a, b] \mid \mu_n)$$

EXTRA TOPIC: TWO SAMPLE TESTS
A null hypothesis may assert that two populations have the same means; a special case for {0,1}-valued populations asserts equality of population proportions. Under these assumptions, and if the variances of both populations are known, hypothesis testing uses the Two-Sample z-Test Statistic
$$z = \frac{\mu_n - \tilde{\mu}_{\tilde{n}}}{\sqrt{\sigma^2/n + \tilde{\sigma}^2/\tilde{n}}}$$
where $\mu_n$, $\sigma^2$, $n$ are the sample mean, variance, and sample size for one population, and the tildes denote the same quantities for the other. For unknown variances and other cases consult:
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

EXTRA TOPIC: CHI-SQUARED TESTS
are used to determine goodness-of-fit for various distributions. They employ test statistics of the form
$$\chi^2 = \sum_{i=1}^{d} (\mathrm{obs}_i - \mathrm{exp}_i)^2 / \mathrm{exp}_i$$
where the $\mathrm{obs}_i$ are independent observations and the null hypothesis implies that the expected value $\langle \mathrm{obs}_i \rangle = \mathrm{exp}_i$ and that $\chi^2$ is chi-squared distributed with d-1 degrees of freedom.
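
The chi-squared statistic defined above can be computed as in the following minimal Python sketch. The function names and the die-roll counts are hypothetical and chosen only for illustration; the resulting value would still need to be compared with a chi-squared table for d-1 degrees of freedom at the chosen significance level.

```python
def expected_counts(total, ratios):
    """Expected counts under a null hypothesis given as a ratio, e.g. 1:1:1."""
    ratio_sum = sum(ratios)
    return [total * r / ratio_sum for r in ratios]

def chi_squared_statistic(observed, expected):
    """Sum over categories of (obs_i - exp_i)^2 / exp_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, null hypothesis that it is fair.
observed = [18, 22, 20, 25, 15, 20]
expected = expected_counts(sum(observed), [1, 1, 1, 1, 1, 1])
chi2 = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1
print("chi-squared =", chi2, "with", degrees_of_freedom, "degrees of freedom")
```

Deriving the expected counts from the total and a ratio keeps the null hypothesis explicit, so the same two functions apply to any claimed ratio of categories.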
Example [1, p. 216] A geneticist claims that four species of fruit flies should appear in the ratio 1:3:3:9. Suppose that the sample of 4000 flies contained 226, 764, 733, and 2277 flies of each species, respectively. For alpha = .1, is there sufficient evidence to reject the geneticist's claim?
Answer: The expected values are 250, 750, 750, 2250, hence
$$\chi^2 = (226 - 250)^2/250 + (764 - 750)^2/750 + (733 - 750)^2/750 + (2277 - 2250)^2/2250 = 3.27$$
NO, since for 3 deg. freed. and alpha = .1 the critical value is 6.251, which exceeds 3.27.

EXTRA TOPIC: POISSON APPROXIMATION
The Binomial Distribution
$$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1 - a)^{n-k}$$
is the probability that k events happen in n trials if $a$ = prob. that an event happens in one trial. It has mean $\mu = na$ and variance $\sigma^2 = na(1 - a)$.
If $a \ll 1$ and $k \ll n$ then
$$B(0) = (1 - \mu/n)^n \approx e^{-\mu}$$
$$B(1) = \frac{n!}{(n-1)!}\, a\, (1 - \mu/n)^{n-1} \approx \mu\, e^{-\mu}$$
$$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1 - \mu/n)^{n-k} \approx \frac{\mu^k}{k!}\, e^{-\mu}$$
The right side is the Poisson Distribution.

REFERENCES
1. Martin Sternstein, Statistics, Barron's College Review Series, New York, 1996. Survey textbook; covers probability distributions, hypotheses tests, populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses, New York, 1959. Detailed development of the Neyman-Pearson theory of hypotheses testing.
3. J. Neyman and E. S. Pearson, Joint Statistical Papers, Cambridge University Press, 1967. Source materials.
4. Jan von Plato, Creating Modern Probability, Cambridge University Press, 1994. Charts the history and development of modern probability theory.