USC3002 Picturing the World Through Mathematics
Wayne Lawton, Department of Mathematics, S14-04-04, 65162749, [email protected]
Theme for Semester I, 2007/08: The Logic of Evolution, Mathematical Models of Adaptation from Darwin to Dawkins

MOTIVATION
Probability and Statistics play an increasingly crucial role in evolution research.
http://www.springer.com/east/home/life+sci/bioinformatics?SGWID=5-10031-22-34952257-0
http://www-stat.stanford.edu/~susan/courses/s366/
http://findarticles.com/p/articles/mi_qa3746/is_199904/ai_n8829021/pg_16

SOURCE OF LECTURE VUFOILS
SP2170 Doing Science Lecture 3: Random Variables, Distributions, Inductive & Abductive Reasoning, Experiments

REFERENCES
[1] Rudolf Carnap, An Introduction to the Philosophy of Science, Dover, N.Y., 1995.
[2] Leong Yu Kang, Living With Mathematics, McGraw Hill, Singapore, 2004. (GEM Textbook) (1 Reasoning, 2 Counting, 3 Graphing, 4 Clocking, 5 Coding, 6 Enciphering, 7 Chancing, 8 Visualizing)

MATLAB Demo: Random Variables & Distributions
Discuss Topics in Chap. 2-4 in [1], Chap. 1, 7 in [2].
Bayes' Theorem & The Envelope Problem; Deductive, Inductive, and Abductive Reasoning.
Assign computational tutorial problems.

RANDOM VARIABLES
The number that faces up on an 'unloaded' die rolled on a flat surface is in the set { 1, 2, 3, 4, 5, 6 }, and each number has the same probability, hence 1/6.
After rolling a die, the number is fixed for those who know it but remains an unknown, or random variable, to those who do not know it.
Even while the die is still rolling, a person with a laser sensor connected to a sufficiently powerful computer may be able to predict with some accuracy the number that will come up. This happened and the Casino was not amused !

MATLAB PSEUDORANDOM VARIABLES
The MATLAB (software) function rand generates decimal numbers that, as displayed in MATLAB's default short format, have the form d / 10000 and behave as if d were a random variable with values in the set {0,1,2,…,9999}, each with equal probability. It is a pseudorandom variable.
It provides an approximation of a random variable x with values in the interval [0,1] of real numbers such that for all 0 < a < b < 1 the probability that x is in the interval [a,b] equals b-a = length of [a,b]. Such random variables are called uniformly distributed.

PROBABILITY DISTRIBUTIONS
Random variables with values in a set of integers are described by discrete distributions.
Uniform (Die): $\mathrm{Prob}(x = k) = 1/6$ for $k = 1,\dots,6$.
Binomial: $\mathrm{Prob}(x = k) = \frac{n!}{(n-k)!\,k!}\, a^k (1-a)^{n-k}$ for $k = 0,1,\dots,n$, where an event that has probability a in each trial occurs k times out of a maximum of n times, and $k! = 1 \cdot 2 \cdots (k-1) \cdot k$ is called k factorial.
Poisson: $\mathrm{Prob}(x = k) = a^k \exp(-a) / k!$ for $k = 0, 1, 2, \dots$, the probability that k atoms of radium decay if a is the average number of atoms expected to decay.

PROBABILITY DISTRIBUTIONS
Random variables with values in a set of real numbers are described by continuous distributions.
Uniform over the interval [0,1]: $\mathrm{Prob}(x \in [a,b]) = \int_a^b 1\,dx = b - a$ for $0 \le a \le b \le 1$.
Gaussian or Normal: $\mathrm{Prob}(x \in [a,b]) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$, here $\mu$ = mean, $\sigma$ = standard deviation, $\sigma^2$ = variance.

MATLAB HELP COMMAND
>> help rand
RAND Uniformly distributed random numbers. RAND(N) is an N-by-N matrix with random entries, chosen from a uniform distribution on the interval (0.0,1.0). RAND(M,N) is a M-by-N matrix with random entries.
>> help hist
HIST Histogram. N = HIST(Y) bins the elements of Y into 10 equally spaced containers and returns the number of elements in each container. If Y is a matrix, HIST works down the columns. N = HIST(Y,M), where M is a scalar, uses M bins.

MATLAB DEMONSTRATION 1
[Figure: four histograms, each with horizontal axis from 0 to 1]
Why do these histograms look different?
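One way to explore the question is to compare bin counts from independent small samples with those from a much larger sample. The following is only a sketch: the slide does not show the commands behind the four histograms, so the sample sizes (100 and 100000) and the choice of 10 bins are assumptions made for illustration.

```matlab
% Sketch: bin counts of small uniform samples fluctuate from run to run.
% ASSUMPTIONS: sample sizes 100 and 100000 and nbins = 10 are illustrative
% choices; the slide does not state how its histograms were produced.
nbins = 10;
small1 = hist(rand(100,1), nbins);      % counts for one sample of 100
small2 = hist(rand(100,1), nbins);      % counts for an independent sample
large  = hist(rand(100000,1), nbins);   % counts for a much larger sample

% Relative frequencies: each should be near 1/nbins = 0.1
freq_small1 = small1 / 100;
freq_small2 = small2 / 100;
freq_large  = large / 100000;

% Largest deviation from the ideal value 0.1 in each case
dev_small1 = max(abs(freq_small1 - 1/nbins));
dev_small2 = max(abs(freq_small2 - 1/nbins));
dev_large  = max(abs(freq_large  - 1/nbins));
disp([dev_small1 dev_small2 dev_large])
```

Running this several times should show the two small-sample deviations changing noticeably between runs while the large-sample deviation stays close to zero, which suggests that the visible differences between small histograms are sampling fluctuations.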
MATLAB DEMONSTRATION 2
>> x = rand(10000,1);
>> hist(x,41)
[Figure: histogram of x with 41 bins on the interval from 0 to 1]

MORE MATLAB HELP COMMANDS
>> help randn
RANDN Normally distributed random numbers. RANDN(N) is an N-by-N matrix with random entries, chosen from a normal distribution with mean zero, variance one and standard deviation one. RANDN(M,N) is a M-by-N matrix with random entries.
>> help sum
SUM Sum of elements. For vectors, SUM(X) is the sum of the elements of X. For matrices, SUM(X) is a row vector with the sum over each column.
Example: sum([3 1; 4 5]) = [7 6]

MATLAB DEMONSTRATION 3
>> s = -4:.001:4;
>> plot(s,exp(-s.^2/2)/(sqrt(2*pi)))
>> grid
[Figure: bell-shaped curve on the interval from -4 to 4 with peak height about 0.4]

MATLAB DEMONSTRATION 3
>> x = randn(10000,1);
>> hist(x,41)
[Figure: histogram of x with 41 bins, horizontal axis from -5 to 4]

MATLAB DEMONSTRATION 3
>> x = rand(5000,10000);
>> y = sum(x);
>> hist(y,41)
[Figure: histogram of y with 41 bins, horizontal axis from 2420 to 2580]

CENTRAL LIMIT THEOREM
The sum of N real-valued random variables y = x(1) + x(2) + … + x(N) is itself a random variable. If the x(j) are independent and have the same distribution (with finite variance), then as N increases the distribution of y gets closer and closer to a Gaussian distribution.
The mean of this Gaussian distribution = N times the (common) mean of the x(j).
The variance of this Gaussian distribution = N times the (common) variance of the x(j).

CONDITIONAL PROBABILITY
Recall that on my die the 'numbers' 1 and 4 are red and the numbers 2, 3, 5, 6 are blue.
I roll one die without letting you see how it rolls. What is the probability that I rolled a 4 ?
I repeat the procedure BUT tell you that the number is red. What is the probability that I rolled a 4 ?
This probability is called the conditional probability that x = 4 given that x is red (i.e.
x in {1,4}).
$\mathrm{Prob}(A \mid B)$ = Prob of event A given event B

CONDITIONAL PROBABILITY
If A and B are two events then $A \cap B$ denotes the event that BOTH event A and event B happen.
Common sense implies the following LAW: $\mathrm{Prob}(A \cap B) = \mathrm{Prob}(B)\,\mathrm{Prob}(A \mid B)$
Example Consider the roll of a die. Let A be the event x = 4 and let B be the event x is red (= 1 or 4). Then
$\mathrm{Prob}(A \cap B) = \mathrm{Prob}(A) = 1/6$, $\mathrm{Prob}(B) = 1/3$, $\mathrm{Prob}(A \mid B) = 1/2$
Question What does the LAW say here ?

BAYES' THEOREM
http://en.wikipedia.org/wiki/Bayes'_theorem
For an event A, $A^c$ denotes the event not A.
Question Why does $\mathrm{Prob}(A) + \mathrm{Prob}(A^c) = 1$ ?
Prob(A) and Prob(B) are called marginal probabilities.
Question Why does $\mathrm{Prob}(B) = \mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A) + \mathrm{Prob}(B \mid A^c)\,\mathrm{Prob}(A^c)$ ?
Question Why does $\mathrm{Prob}(A \mid B) = \mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A) / \mathrm{Prob}(B)$ ?

INDUCTIVE & ABDUCTIVE REASONING
http://en.wikipedia.org/wiki/Inductive_reasoning
Inductive reasoning is the process of reasoning in which the premises of an argument support the conclusion but do not ensure it. This is in contrast to Deductive reasoning, in which the conclusion is necessitated by, or reached from, previously known facts.
http://en.wikipedia.org/wiki/Abductive_reasoning
Abductive reasoning is the process of reasoning to the best explanations. In other words, it is the reasoning process that starts from a set of facts and derives their most likely explanations.
The philosopher Charles Peirce introduced abduction into modern logic. In his works before 1900, he mostly uses the term to mean the use of a known rule to explain an observation, e.g., "if it rains the grass is wet" is a known rule used to explain that the grass is wet. He later used the term to mean creating new rules to explain new observations, emphasizing that abduction is the only logical process that actually creates anything new. Namely, he described the process of science as a combination of abduction, deduction and implication, stressing that new knowledge is only created by abduction.
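Returning to the red/blue die of the CONDITIONAL PROBABILITY slides, the LAW and Bayes' theorem can be checked numerically by simulating many rolls. This is a minimal sketch, not part of the original lecture: the number of rolls N and all variable names are assumptions chosen for illustration.

```matlab
% Sketch: check Prob(A and B) = Prob(B) * Prob(A | B) for the red/blue die.
% A is the event x = 4; B is the event x is red (x = 1 or x = 4).
% ASSUMPTION: N = 100000 rolls is an arbitrary illustrative choice.
N = 100000;
x = ceil(6 * rand(N,1));           % pseudorandom die rolls in {1,...,6}
A = (x == 4);
B = (x == 1) | (x == 4);

pB       = sum(B) / N;             % estimate of Prob(B), exact value 1/3
pAandB   = sum(A & B) / N;         % estimate of Prob(A and B), exact value 1/6
pAgivenB = sum(A & B) / sum(B);    % estimate of Prob(A | B), exact value 1/2

% Bayes' theorem: Prob(A | B) = Prob(B | A) * Prob(A) / Prob(B)
pA       = sum(A) / N;
pBgivenA = sum(A & B) / sum(A);    % equals 1, since a 4 is always red
bayes    = pBgivenA * pA / pB;     % should agree with pAgivenB
disp([pB pAandB pAgivenB bayes])
```

The estimates only approximate the exact values 1/3, 1/6 and 1/2, but the Bayes expression agrees with the direct estimate of Prob(A | B) up to rounding, because both reduce to the same ratio of counts.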
EXPERIMENTS
http://www.holah.karoo.net/experimental_method.htm
Carnap p. 41 [1] "One of the great distinguishing features of modern science, as compared to the science of earlier periods, is its emphasis on what is called the "experimental method"."
Question How does the experimental method differ from the method of observation ?
Question What fields favor the experimental method, what fields do not, and why ?
Ideal Gas Law - one of the greatest experiments !

TUTORIAL QUESTIONS
Question 1. The uniform distribution on [0,1] has mean ½ and variance 1/12. Use the Central Limit Theorem to compute the mean and variance of the random variable y whose histogram is shown in vufoil # 13.
Question 2. I roll a die to get a random variable x in {1,2,3,4,5,6}, then put x dollars in one envelope and 2x dollars in another envelope, then flip a coin to decide which envelope to give you (so that you receive the smaller or larger amount with equal probability). Use Bayes' Theorem to compute the probability that you received the smaller amount CONDITIONED on YOUR FINDING THAT YOU HAVE 1,2,3,4,5,6,8,10,12 dollars. Then use these conditional probabilities to explain the Envelope Paradox.

SOURCE OF LECTURE VUFOILS
USC2170 Lecture 4: Hypothesis Testing

PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypothesis
4. Test Statistics for Gaussian Hypotheses
   Sample Mean for Parameter Estimation
   z-Test and t-Test Statistics
   Rejection/Critical Region for z-Test Statistic
   Hypothesis Test for Mean Height
5. General Hypotheses Tests
   Type I and Type II Errors
   Null and Alternative Hypotheses
6. Assign Tutorial Problems

POPULATIONS AND SAMPLES
Population - a specified collection of quantities: e.g.
heights of males in a country, glucose levels of a collection of blood samples, batch yields of an industrial compound for a chemical plant over a specified time with and without the use of a catalyst.
Sample Population - a population from which samples are taken to be used for statistical inference.
Sample - the subset of the sample population consisting of the samples that are taken.

SAMPLE POPULATION PARAMETERS
Sample: $X_1, \dots, X_n$
Sample Parameters
Sample Size: $n$
Sample Mean: $\mu_n = [X_1 + X_2 + \cdots + X_n]/n$
Sample Variance: $\sigma_n^2 = [(X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2]/n$
Sample Standard Deviation: $\sigma_n = \sqrt{\sigma_n^2}$

SAMPLE POPULATION PARAMETERS
Theorem 1 The variance of a population is related to its mean and average squared values by
$\sigma_n^2 = [X_1^2 + X_2^2 + \cdots + X_n^2]/n - \mu_n^2$
Proof Since $(X_k - \mu_n)^2 = X_k^2 - 2\mu_n X_k + \mu_n^2$,
$n\,\sigma_n^2 = (X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2$
$= X_1^2 + \cdots + X_n^2 - 2\mu_n (X_1 + \cdots + X_n) + n\mu_n^2$
$= X_1^2 + \cdots + X_n^2 - 2n\mu_n^2 + n\mu_n^2$   Why ?
$= X_1^2 + \cdots + X_n^2 - n\mu_n^2$
Question How can the proof be completed ?

STATISTICAL HYPOTHESES
are assertions about a population that describe some statistical properties of the population. Typically, statistical hypotheses assert that a population consists of independent samples of a random variable that has a certain type of distribution, and some of the parameters that describe this distribution may be specified.
For Gaussian distributions there are four possibilities:
Neither the mean nor the variance is specified.
Only the variance is specified.
Only the mean is specified.
Both the mean and the variance are specified.

TEST STATISTICS for Hypotheses with Gaussian Distributions
The sample mean, for $\mu$ unknown and $\sigma$ known,
$\mu_n = [X_1 + X_2 + \cdots + X_n]/n$
is Gaussian with mean $\mu$ and variance $\sigma^2/n$.
Proof (Outline) We let $\langle Y \rangle$ denote the mean of a random variable Y. Then clearly
$\langle \mu_n \rangle = [\langle X_1 \rangle + \langle X_2 \rangle + \cdots + \langle X_n \rangle]/n = \mu$
Independence and Theorem 1 give
$\mathrm{variance}(\mu_n) = \langle (\mu_n - \mu)^2 \rangle = \langle [(X_1 - \mu) + \cdots + (X_n - \mu)]^2 \rangle / n^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle (X_i - \mu)(X_j - \mu) \rangle = \sigma^2/n$,
since independence makes the terms with $i \ne j$ vanish and each of the n remaining terms equals $\sigma^2$.

PARAMETER ESTIMATION for Hypotheses with Gaussian Distributions
The sample mean, for $\mu$ unknown and $\sigma$ known,
$\mu_n = [X_1 + X_2 + \cdots + X_n]/n$
can be used to estimate the mean $\mu$, since the estimate error $\mu_n - \mu$ is unbiased,
$\langle \mu_n - \mu \rangle = 0$,
and converges in the statistical sense that
standard deviation$(\mu_n - \mu) = \sigma/\sqrt{n} \to 0$

MORE TEST STATISTICS for Hypotheses with Gaussian Distributions
$\mu$, $\sigma$ known: The One Sample z-Test statistic for $\mu$,
$z_n = (\mu_n - \mu)/(\sigma/\sqrt{n})$,
is a Gaussian random variable with mean 0, variance 1.
$\mu$ known, $\sigma$ unknown: The One Sample t-Test statistic for $\mu$,
$t_n = (\mu_n - \mu)/(\sigma_n/\sqrt{n})$,
is a t-distributed random variable with n-1 degrees of freedom (exactly so when the sample standard deviation is computed with divisor n-1 rather than n).

z-TEST STATISTIC ALPHAS
$\alpha(z) = \int_z^{\infty} p(x)\,dx$, where $p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$

z        alpha(z)
0        0.5000
0.2000   0.4207
0.4000   0.3446
0.6000   0.2743
0.8000   0.2119
1.0000   0.1587
1.2000   0.1151
1.4000   0.0808
1.6000   0.0548
1.8000   0.0359
2.0000   0.0228
2.2000   0.0139
2.4000   0.0082
2.6000   0.0047
2.8000   0.0026
3.0000   0.0013

alpha     z(alpha)
0.0500    1.6449
0.0400    1.7507
0.0300    1.8808
0.0200    2.0537
0.0100    2.3263
0.0050    2.5758
0.0040    2.6521
0.0030    2.7478
0.0020    2.8782
0.0010    3.0902
0.0005    3.2905
0.0004    3.3528
0.0003    3.4316
0.0002    3.5401
0.0001    3.7190
0.00005   3.8906

CRITICAL REGION FOR alpha=0.05
[Figure]

HEIGHT HISTOGRAMS
[Figure]

HYPOTHESIS TEST FOR MEAN HEIGHT
You suspect that the height of males in a country has increased due to diet or a Martian conspiracy. You aim to support your Alternative Hypothesis by testing the
Null Hypothesis $\mu = 174.204$ cm, $\sigma = 6.509$ cm
You compute a sample mean $\mu_{20} = 177.115$ cm using 20 samples, then compute
$z_{20} = (\mu_{20} - 174.204\ \mathrm{cm}) / (1.4555\ \mathrm{cm}) = 2.000$
where $1.4555\ \mathrm{cm} = \sigma/\sqrt{20}$. If the Null Hypothesis is true, the probability that $z_{20} \ge 2.000$ is $\alpha(2) = .0228$
Question Should the Null Hypothesis be rejected ?

GENERAL HYPOTHESES TESTS involve
Type I Error: $\alpha$ = prob. of rejecting the null hypothesis if it is true, also called the significance level
Type II Error: $\beta$ = prob. of failing to reject the null hypothesis if it is false; $1 - \beta$ is also called the power of a test and requires an Alternative Hypothesis that determines the distribution of the test statistic.
and more complicated test statistics, such as the One Sample t-Test statistic, whose distribution is determined even though the distributions of the Gaussian random samples used to compute it are not.

TUTORIAL QUESTIONS
1. Compute the power of a hypothesis test whose null hypothesis is that in vufoil #13, whose alternative hypothesis asserts that heights are normally distributed with mean $\mu + 3.386$ cm and standard deviation $\sigma$, where $\mu$ and $\sigma$ are the same as for the null hypothesis, 20 samples are used, and the significance is $\alpha = .05$.
Suggestion: if the alternative hypothesis is true, what is the distribution of the test statistic $z_{20} = (\mu_{20} - \mu)/(\sigma/\sqrt{20})$ ? What is the probability that $z_{20} \ge z(\alpha)$ ?
2. Use a t-statistic table to describe how to test the null hypothesis that heights are normal with mean $\mu$ and unknown variance based on 20 samples.

EXTRA TOPIC: CONFIDENCE INTERVALS
Given a sample mean $\mu_n$ for large n we can assume, by the central limit theorem, that it is Gaussian with
mean $\mu$ = mean of the original population and
variance $\sigma^2/n = \frac{1}{n} \times$ variance of the original population.
Furthermore, $\sigma^2 \approx \sigma_n^2$ = sample variance, and if the population is {0,1}-valued, $\sigma^2 = \mu(1-\mu)$.
We say that $\mu \in [a,b]$ with confidence
$c = \int_a^b p(x)\,dx$
where p(x) is the probability density of a Gaussian with mean $\mu_n$ and standard deviation $\sigma/\sqrt{n} \approx \sigma_n/\sqrt{n}$.
Theorem If $\mu$ is a random variable unif. on [-L,L] then Bayes' Theorem implies
$c = \lim_{L \to \infty} p(\mu \in [a,b] \mid \mu_n)$

EXTRA TOPIC: TWO SAMPLE TESTS
A null hypothesis may assert that two populations have the same means; a special case for {0,1}-valued populations asserts equality of population proportions. Under these assumptions, and if the variances of both populations are known, hypothesis testing uses the Two-Sample z-Test Statistic
$z = (\mu_n - \tilde{\mu}_{\tilde{n}}) \Big/ \sqrt{\frac{\sigma^2}{n} + \frac{\tilde{\sigma}^2}{\tilde{n}}}$
where $\mu_n$, $\sigma^2$, $n$ are the sample mean, variance, and sample size for one population, and the symbols with tildes are those for the other.
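The Two-Sample z-Test statistic above can be evaluated in a few lines. The sketch below uses hypothetical inputs: it simply reuses the mean-height numbers from the earlier slide as if they were two independent samples of 20, which is an assumption made purely for illustration and not data from the lecture. The tail probability alpha(z) is computed with MATLAB's built-in erfc, using the identity alpha(z) = erfc(z/sqrt(2))/2.

```matlab
% Sketch: Two-Sample z-Test statistic with known variances.
% ASSUMPTION: all numbers below are hypothetical, borrowed from the
% mean-height slide for illustration only.
mu1 = 177.115;  sigma1 = 6.509;  n1 = 20;   % sample mean, std dev, size (population 1)
mu2 = 174.204;  sigma2 = 6.509;  n2 = 20;   % sample mean, std dev, size (population 2)

% z = (difference of sample means) / (std dev of that difference)
z = (mu1 - mu2) / sqrt(sigma1^2/n1 + sigma2^2/n2);

% One-sided tail probability alpha(z) = integral of p(x) from z to infinity
alpha = 0.5 * erfc(z / sqrt(2));

fprintf('z = %.4f, alpha(z) = %.4f\n', z, alpha);
```

With these hypothetical inputs the statistic is smaller than the one-sample value 2.000 by a factor of about sqrt(2), because the difference of two sample means has twice the variance of a single sample mean.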
For unknown variances and other cases consult:
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

EXTRA TOPIC: CHI-SQUARED TESTS
are used to determine goodness-of-fit for various distributions. They employ test statistics of the form
$\chi^2 = \sum_{i=1}^{d} (\mathrm{obs}_i - \mathrm{exp}_i)^2 / \mathrm{exp}_i$
where the $\mathrm{obs}_i$ are independent observations and the null hypothesis implies that the expected value $\langle \mathrm{obs}_i \rangle = \mathrm{exp}_i$ and that $\chi^2$ has a chi-squared distribution with d-1 degrees of freedom.
Example [1,p.216] A geneticist claims that four species of fruit flies should appear in the ratio 1:3:3:9. Suppose that the sample of 4000 flies contained 226, 764, 733, and 2277 flies of each species, respectively. For alpha = .1, is there sufficient evidence to reject the geneticist's claim ?
Answer: The expected values are 250, 750, 750, 2250, hence
$\chi^2 = (226-250)^2/250 + (764-750)^2/750 + (733-750)^2/750 + (2277-2250)^2/2250 \approx 3.27$
NO, since for 3 degrees of freedom and alpha = .1 the critical value is 6.251, which exceeds 3.27.

EXTRA TOPIC: POISSON APPROXIMATION
The Binomial Distribution
$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1-a)^{n-k}$
is the probability that k events happen in n trials if a = prob. that an event happens in one trial.
It has mean $\mu = na$ and variance $\sigma^2 = na(1-a)$.
If $a \ll 1$ and $k \ll n$ then
$B(0) = (1 - \mu/n)^n \approx e^{-\mu}$
$B(1) = \frac{n!}{(n-1)!}\, a\, (1 - \mu/n)^{n-1} \approx \mu\, e^{-\mu}$
$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1 - \mu/n)^{n-k} \approx \frac{\mu^k}{k!}\, e^{-\mu}$
The right side is the Poisson Distribution.

REFERENCES
1. Martin Sternstein, Statistics, Barron's College Review Series, New York, 1996. Survey textbook covers probability distributions, hypotheses tests, populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses, New York, 1959. Detailed development of the Neyman-Pearson theory of hypotheses testing.
3. J. Neyman and E. S. Pearson, Joint Statistical Papers, Cambridge University Press, 1967. Source materials.
4. Jan von Plato, Creating Modern Probability, Cambridge University Press, 1994. Charts the history and development of modern probability theory.