Download Lecture slides - Columbia Statistics

Introduction to Statistics, Lecture 16

Reminder: Quiz 3 next week.

Binomial model (cont’d)

For n random Bernoulli trials, X is the number of successes. If the probability of success is p, then q = 1 - p is the probability of failure, and

$$P(X = k) = \binom{n}{k} p^k q^{n-k}$$

where $\binom{n}{k}$ is read "n choose k":

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

For example, 5! = 5 × 4 × 3 × 2 × 1, and 0! = 1.

Mean: $np$
Standard deviation: $\sqrt{np(1-p)}$

Example

A large pool of candies, 30% of them red. We randomly sample 10.
What is the probability that 5 of them are red?
What is the probability that fewer than 5 of them are red?

Normal approximation of binomial distribution

[Figure: histograms of simulated draws rbinom(50000, 10, 0.3) for Binom(10, 0.3) and rbinom(50000, 100, 0.3) for Binom(100, 0.3); vertical axis: Percent of Total.]

When n is large, so that np > 10 and nq > 10, the binomial distribution Binom(n, p) can be approximated by the normal model with mean $np$ and standard deviation $\sqrt{np(1-p)}$, written $N(np, \sqrt{np(1-p)})$.

Calculation steps

1. Check that the trials are Bernoulli trials: success/failure outcomes, a probability of success, independence.
2. Find n, the number of trials.
3. Find p, the probability of success for each trial.
4. Write down the distribution Binom(n, p).

Normal approximation of Binomial model

1. Check the normal approximation rules: np > 10 and n(1 - p) > 10.
2. Find the mean and standard deviation of Binom(n, p).
3. Use the normal distribution with the same mean and standard deviation to carry out the calculation.

What if p is so small that np < 10 even when n is large?

The Poisson distribution can be used to approximate the binomial distribution when n is large and np is small:

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad \text{where } \lambda = np$$

Poisson approximation of the binomial distribution

[Figure: probability plots of Binom(10000, 0.0001) and of Poisson with rate 1, for k = 0 to 5.]

Why study the binomial distribution and its approximations?

Consider the following scenario. In a large population, 60% of the people support the current mayor.
You randomly sample one individual.
A survey of 100 people: X is the number of people who support the mayor in this survey.
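The candy example and the two approximations above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`binom_pmf`, `binom_cdf`, `poisson_pmf`) and the comparison points (25 successes, k = 2) are illustrative choices, not part of the lecture. The normal approximation is applied without a continuity correction, which the slides do not mention.

```python
import math
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p), using n choose k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binom(n, p), summing the pmf."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson model with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Candy example: X ~ Binom(10, 0.3).
p_exactly_5 = binom_pmf(5, 10, 0.3)
p_fewer_than_5 = binom_cdf(4, 10, 0.3)  # fewer than 5 means X <= 4

# Normal approximation for Binom(100, 0.3): np = 30 > 10 and nq = 70 > 10.
n, p = 100, 0.3
normal_model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
exact_le_25 = binom_cdf(25, n, p)
normal_le_25 = normal_model.cdf(25)  # no continuity correction (assumption)

# Poisson approximation for Binom(10000, 0.0001): lambda = np = 1.
binom_k2 = binom_pmf(2, 10000, 0.0001)
poisson_k2 = poisson_pmf(2, 1.0)

print(f"P(X = 5)  = {p_exactly_5:.4f}")
print(f"P(X < 5)  = {p_fewer_than_5:.4f}")
print(f"P(X <= 25): exact {exact_le_25:.4f}, normal {normal_le_25:.4f}")
print(f"P(X = 2):   binomial {binom_k2:.5f}, Poisson {poisson_k2:.5f}")
```

The exact and approximate values should be close but not identical; the gap for the normal model shrinks as n grows.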
In the mayor scenario, for the single sampled individual:
Success: he/she supports the mayor
Failure: he/she doesn’t support the mayor
Probability of success = 60%?

Different samples out of a population

[Figure: two samples of four drawn from the same population, Sample 1 (1,2,3,4) and Sample 2 (a,b,c,d), with a table of the colors drawn (rows 1 to 4: Green/Red, Red/Green, Green/Green, Red/Green) and the sample proportion for each sample.]

Why are probability models important? Or why is the binomial distribution important?
We are starting the chapters on statistical inference NOW.

Sampling distribution models of the sample proportion

Given a simple random sample with n observations of a Bernoulli trial:
X: the number of successes
p: the probability of success
Sample proportion: $\hat{p} = X/n$

The sample proportion is a random variable, since X is a random variable that follows the binomial model.

Sampling variation: the $\hat{p}$ calculated from different random samples from the same population differs due to chance.

Parameters:
The sample size, n
The probability of success, p: the long-term relative frequency (proportion) of successes in the population, that is, the population proportion

We can study the variation of the sample proportion using a probability model under some assumptions: the sampling distribution model.

By the normal approximation ...

X is approximately normally distributed with mean $np$ and standard deviation $\sqrt{np(1-p)}$.
Then $\hat{p} = X/n$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

General situation ...

N: population size
n: sample size

To use the normal approximation of the binomial model to study the random behavior of the sample proportion, we need
n < N/10
np > 10 AND nq = n(1 - p) > 10

Sampling distribution model of sample proportion!

Example

Historically, 15% of faculty members leave campus during spring break. For a (simple) random sample of 100 faculty members, what is the chance that more than 20% of them left campus during the past spring break?

We just discussed proportions. What about the mean, the average?
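The faculty example above can be worked through with the sampling distribution model of the sample proportion. This is a sketch under the stated conditions: np and nq are checked in code, while the n < N/10 condition is assumed to hold because the slide does not give the population size N. Variable names are illustrative.

```python
import math
from statistics import NormalDist

p = 0.15          # historical proportion of faculty leaving campus
n = 100           # sample size
threshold = 0.20  # we want P(p_hat > 0.20)

# Normal approximation rules: np > 10 and n(1 - p) > 10.
# The n < N/10 rule is assumed here, since N is not given in the example.
conditions_ok = n * p > 10 and n * (1 - p) > 10

# Sampling distribution of p_hat: mean p, standard deviation sqrt(p(1-p)/n).
sd_phat = math.sqrt(p * (1 - p) / n)

# Standardize the threshold and take the upper-tail normal probability.
z = (threshold - p) / sd_phat
prob_more_than_20pct = 1 - NormalDist().cdf(z)

print(f"conditions met: {conditions_ok}")
print(f"SD(p_hat) = {sd_phat:.4f}, z = {z:.2f}")
print(f"P(p_hat > 0.20) is about {prob_more_than_20pct:.3f}")
```

Under these assumptions the chance comes out at roughly 8%, since 20% sits about 1.4 standard deviations above the mean of 15%.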
Mean and Variance of Sample mean

$X_1, \ldots, X_n$: n observations, independent, with the same probability distribution, so each has $\mu_X = \mu$ and $\sigma_X^2 = \sigma^2$.

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

so

$$\mu_{\bar{X}} = \frac{1}{n}(\mu + \cdots + \mu) = \mu$$

$$\sigma_{\bar{X}}^2 = \left(\frac{1}{n}\right)^2 (\sigma^2 + \cdots + \sigma^2) = \frac{\sigma^2}{n}$$

Central limit theorem

Let’s see an online demo first.

[Figure: population density curves alongside histograms of the sample mean for samples of 2, 4, 10, 100, and 1000 observations.]

Sampling distribution model of Sample mean

If the population distribution is $N(\mu, \sigma)$, then $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$.
If the population distribution is not normal, with mean $\mu$ and standard deviation $\sigma$, then $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ approximately when n is large.
How large is large?

In practice

Standard error: the estimated standard deviation of a sampling distribution.
Distinguish the sampling distribution from the distribution of a sample.

Reading: Chapter 18
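The mean and standard deviation of the sample mean can be illustrated with a small simulation. The sketch below is not the online demo from the lecture. It assumes a skewed population (exponential with mean 1 and standard deviation 1), and the sample sizes, repetition count, and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0
REPS = 2000  # number of simulated samples per sample size (arbitrary)


def simulate_sample_means(n: int, reps: int, rng: random.Random) -> list:
    """Draw `reps` samples of size n and return the mean of each sample."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n in (2, 10, 100):
    means = simulate_sample_means(n, REPS, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))

for n, (mean_of_means, sd_of_means) in results.items():
    print(
        f"n = {n:3d}: mean of sample means = {mean_of_means:.3f}, "
        f"SD of sample means = {sd_of_means:.3f}, "
        f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f}"
    )
```

In each row, the simulated standard deviation of the sample means should sit near $\sigma/\sqrt{n}$, and a histogram of the means would look closer to a normal curve as n grows.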
