Chapter 5

Contents
• The Concept of a Sampling Distribution
• Properties of Sampling Distributions: Unbiasedness and Minimum Variance
• The Sampling Distribution of the Sample Mean and the Central Limit Theorem
• The Sampling Distribution of the Sample Proportion

Models for the population of a variable
• We have two equivalent ways of describing a population: through relative frequencies, or through the probability distribution of a value drawn at random. The latter is called the probability model of the population.

Table: Population proportions
value        0    2    3    4
proportion  .1   .4   .3   .2

• Suppose we draw many values, say 1000, at random from this population.
• At the end, look at the proportions of 4s, 3s, 2s and 0s among the 1000 draws.
• What would these proportions be?
• The proportions would be close to the population proportions.

[Simulations: simulate1sample1, simulate1sample2, simulate1samplenormal]

• A variable of interest in a population has a distribution, described by a histogram. The population mean is $\mu$ and the s.d. is $\sigma$.
• The probability distribution of a value $X$ drawn at random from the population is another description of the population: $E(X) = \mu$ and $\sigma_X = \sigma$.
• If we build a new population by drawing one value at random, many times over, this new population will have a distribution close to the original one, with mean and s.d. close to $\mu$ and $\sigma$.

Table: Population proportions
value        3      4     5     6     7
proportion  3/16   1/4   1/8   1/4   3/16

• $\mu = 5$, $\sigma = \sqrt{2}$.
• A single draw is within $\mu \pm 1$ (that is, equals 4, 5 or 6) with probability $1/4 + 1/8 + 1/4 = 5/8$, so about 62% of draws fall there.
• Suppose that we draw two independent values and take their average $\bar X$.
• What percentage of these averages will be within $\mu \pm 1$?
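The question can be answered by brute force: enumerate every ordered pair of draws, multiply their probabilities (the draws are independent), and collect the averages. Below is a minimal Python sketch under that assumption; the helper names `mean_sd`, `dist_of_average` and `prob_within` are hypothetical, chosen only for illustration. Exact fractions are used so the results can be compared with the hand calculation: it reproduces $\mu = 5$, $\sigma = \sqrt{2}$ and $5/8$ for a single draw, and gives $\sigma_{\bar X} = 1$ and $P(4 \le \bar X \le 6) = 95/128 \approx 74\%$ for the average of two draws.

```python
import math
from fractions import Fraction
from itertools import product

# Population from the table: value -> proportion
population = {3: Fraction(3, 16), 4: Fraction(1, 4), 5: Fraction(1, 8),
              6: Fraction(1, 4), 7: Fraction(3, 16)}

def mean_sd(dist):
    """Exact mean and (float) standard deviation of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, math.sqrt(float(var))

def dist_of_average(dist, n):
    """Exact distribution of the average of n independent draws,
    built by enumerating every ordered n-tuple of values."""
    out = {}
    for draws in product(dist, repeat=n):
        prob = Fraction(1)
        for x in draws:
            prob *= dist[x]          # independence: multiply probabilities
        avg = Fraction(sum(draws), n)
        out[avg] = out.get(avg, Fraction(0)) + prob
    return out

def prob_within(dist, center, width):
    """P(|value - center| <= width) under a discrete distribution."""
    return sum(p for x, p in dist.items() if abs(x - center) <= width)

mu, sigma = mean_sd(population)
xbar = dist_of_average(population, 2)
mu2, sigma2 = mean_sd(xbar)

print(mu, sigma)                       # 5 1.4142135623730951
print(prob_within(population, 5, 1))   # 5/8
print(mu2, sigma2)                     # 5 1.0
print(prob_within(xbar, 5, 1))         # 95/128
```

The same function works for larger $n$, but the number of tuples grows as $5^n$, which is why the slides switch to simulation for 25 draws.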
• To answer this, we need the distribution of the sum of the two draws.
• The possible values of the sum are 6, 7, 8, 9, 10, 11, 12, 13, 14.
• $P(\text{sum} = 6) = P(3,3) = \frac{3}{16}\cdot\frac{3}{16}$
• $P(\text{sum} = 7) = P(3,4) + P(4,3) = \frac{3}{16}\cdot\frac{1}{4} + \frac{1}{4}\cdot\frac{3}{16}$
• $P(\text{sum} = 8) = P(3,5) + P(4,4) + P(5,3) = \frac{3}{16}\cdot\frac{1}{8} + \frac{1}{4}\cdot\frac{1}{4} + \frac{1}{8}\cdot\frac{3}{16}$
• and so on.
• Tedious computation shows $\mu_{\bar X} = 5$, $\sigma_{\bar X} = 1$.
• $\bar X$ still has the original $\mu = 5$ as its center.
• The s.d. has decreased (from $\sqrt{2}$ to 1).
• What if we take 25 draws and compute their mean $\bar X$?
• This is too tedious to calculate exactly, so we simulate to get an approximation.
• Simulation: take 25 draws and compute their mean. Repeat this process many times to create a population of $\bar X$s, then find the mean and s.d. of this population.

[Simulation: simulate 25 samples]

• Suppose we have a population that is $N(5, 2)$. Then roughly 68% of the population will be within $5 \pm 2$.
• So if we draw many values, roughly 68% of them will lie within one s.d. of the mean.
• Suppose we draw four values, calculate their average, and repeat this process many times.
• We now have a population of averages of four draws from the population.
• Question: what percentage of these averages will be within $5 \pm 2$?

[Simulation: simulate 4 samples]

• For $\bar X$ the mean is (close to) 5 and the s.d. is approximately 1.
• So about 95% of the sample means will be within $5 \pm 2$, which is two s.d.s of $\bar X$.

[Simulation: simulate 25samples]

• With 25 draws, $\bar X$ has mean (close to) 5 and s.d. approximately 0.4.
• Since 2 is five s.d.s of $\bar X$, essentially all (more than 99.99%) of the sample means will be within $5 \pm 2$.

Sampling Distribution of the Mean
• From a normal population, draw $n$ independent values and let $\bar X$ be their average.
• Since the draws are random, so is $\bar X$; hence $\bar X$ is a random variable.
• The distribution of $\bar X$ is called the sampling distribution of $\bar X$.

Statistic and parameters
• Quantities calculated for the whole population are called parameters.
• Examples: the average, median, standard deviation and percentiles of a population.
• We generally denote the population mean by $\mu$, the s.d. by $\sigma$, and so on.
• Typically these population quantities are not known, and we use the analogous quantities computed from the sample to estimate them.

Statistic and parameters
• Quantities calculated from the sample are called statistics.
• Examples: the mean of a sample, the median of a sample, the s.d. of a sample.
• Not surprisingly, the sample mean is used as an estimate of the population mean, and the sample s.d. as an estimate of the population s.d.
• These estimates satisfy some nice properties (unbiasedness and so on). We will not get into them.

Properties of the Sampling Distribution of $\bar x$
1. The mean of the sampling distribution equals the mean of the sampled population; that is, $\mu_{\bar x} = E(\bar x) = \mu$.
2. The standard deviation of the sampling distribution equals the standard deviation of the sampled population divided by the square root of the sample size; that is, $\sigma_{\bar x} = \sigma/\sqrt{n}$.

Standard Error of the Mean
The standard deviation $\sigma_{\bar x}$ is often referred to as the standard error of the mean.

• If a random sample of $n$ observations is selected from a population with a normal distribution, the sampling distribution of $\bar X$ will be a normal distribution.
• If the population is $N(\mu, \sigma)$, then $\bar X$ is $N(\mu, \sigma/\sqrt{n})$.

Central Limit Theorem
Consider a random sample of $n$ observations selected from a population (any probability distribution) with mean $\mu$ and standard deviation $\sigma$. Then, when $n$ is sufficiently large, the sampling distribution of $\bar X$ will be approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The larger the sample size, the better the normal approximation to the sampling distribution of $\bar X$.

Problems 21, 24, 29, 68, 74

Problem 21
• $n = 100$, $\mu = 30$, $\sigma = 16$.
• $\mu_{\bar X} = 30$, $\sigma_{\bar X} = 16/\sqrt{100} = 1.6$.
• $P(\bar X > 28) = P\left(Z > \frac{28 - 30}{1.6}\right) = P(Z > -1.25) = 0.8944$
• $P(22.1 < \bar X < 26.8) = P\left(\frac{22.1 - 30}{1.6} < Z < \frac{26.8 - 30}{1.6}\right) = P(-4.94 < Z < -2) = 0.0228$
• Parts (c) and (d) are similar.

Problem 24
• $\mu = 96850$, $\sigma = 30000$, $n = 50$.
• $\mu_{\bar X} = 96850$, $\sigma_{\bar X} = 30000/\sqrt{50} = 4242.64$.
• $\bar X$ is approximately normal with mean 96850 and s.d. 4242.64.
• The z-score of $\bar x = 89500$ is $\frac{89500 - 96850}{4242.64} = -1.73$.
• $P(\bar X > 89500) = P(Z > -1.73) = .9582$

Problem 29
• $\mu = .53$, $\sigma = .193$, $n = 50$.
• $\mu_{\bar X} = .53$, $\sigma_{\bar X} = .193/\sqrt{50} = .0273$.
• $\bar X$ is approximately normal with mean .53 and s.d. .0273.
• $P(\bar X > .58) = P\left(Z > \frac{.58 - .53}{.0273}\right) = P(Z > 1.832) = .0335$
• For a sample mean of .59, the z-value before tensioning (mean .53) is $\frac{.59 - .53}{.0273} = 2.2$, and after tensioning (mean .58) it is $\frac{.59 - .58}{.0273} = 0.37$. The z-value before tensioning is much larger, so the sample more likely came after tensioning.

Problem 68
• $n = 344$, $\bar x = 19.1$, $\sigma = 6$.
• $\bar X$ is approximately normal with s.d. $6/\sqrt{344} = .3235$.
• If $\mu = 18.5$: $P(\bar X > 19.1) = P\left(Z > \frac{19.1 - 18.5}{.3235}\right) = .0322$
• If $\mu = 19.5$: $P(\bar X > 19.1) = P\left(Z > \frac{19.1 - 19.5}{.3235}\right) = .8925$
• $\mu = 19.1$
• $\mu$ is less than 19.1; if not, $P(\bar X > 19.1)$ would be at least .5.

Problem 74
• (a) $\mu = 157$, $\sigma = 3$, $n = 40$; $\mu_{\bar X} = 157$, $\sigma_{\bar X} = 3/\sqrt{40} = .474$. With $\bar x = 157 - 1.3 = 155.7$: $P(\bar X < 155.7) = P\left(Z < \frac{-1.3}{.474}\right) = P(Z < -2.74) = .0031$
• (b) If $\mu = 156$ (with $\sigma = 3$), this outcome is more likely; if the mean is 158, it is less likely.
• (c) If $\mu = 157$ and $\sigma = 2$, it is less likely; if $\sigma = 6$, it is more likely.

• Let $X$ be the number of successes in $n$ independent trials, with $p$ the probability of success in each trial.
• We know that $X$ is $\text{Bin}(n, p)$.
• $X/n$ is the proportion of successes in the $n$ trials.
• $X/n$ is usually denoted by $\hat p$, because $\hat p$ serves as an estimate of $p$.

Sampling distribution of sample proportion
• $\mu_{\hat p} = E(\hat p) = p$. This follows from $E(X) = np$.
• $V(\hat p) = \frac{p(1-p)}{n}$. Again this follows from $V(X) = np(1-p)$ for a $\text{Bin}(n, p)$, since $V(X/n) = V(X)/n^2$.
• $\text{s.d.}(\hat p) = \sqrt{\frac{p(1-p)}{n}}$. We write this as $\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}$.

Sampling distribution of sample proportion
• The exact sampling distribution of $\hat p$ is quite messy, especially if $n$ is large.
• The normal approximation to the binomial lets us approximate the sampling distribution by a normal.
• If $n$ is large, the sampling distribution of $\hat p$ is approximately normal with $\mu = \mu_{\hat p} = p$ and $\sigma = \sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}$.

Problems 39, 43, 46, 69

Problem 39
• $X$ is binomial with $n = 250$, $p = .85$.
• $\hat p = X/n$, $E(\hat p) = p = .85$.
• $\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.85 \times .15}{250}} = .0226$
• So $\hat p$ is approximately normal with mean .85 and s.d. .0226.
• $P(\hat p < .9) = P\left(Z < \frac{.9 - .85}{.0226}\right) = .9864$

Problem 43
• $X$ is binomial with $n = 1000$, $p = .67$.
• $\hat p = X/n$, $E(\hat p) = p = .67$.
• $\sigma_{\hat p} = \sqrt{\frac{.67 \times .33}{1000}} = .0149$
• So $\hat p$ is approximately normal with mean .67 and s.d. .0149.
• $P(\hat p < .75) = P\left(Z < \frac{.75 - .67}{.0149}\right) \approx 1$
• $P(\hat p > .5) = P\left(Z > \frac{.5 - .67}{.0149}\right) \approx 1$

Problem 46
• For high IQ, $p = .44$; for average IQ, $p = .26$; for low IQ, $p = .14$.
• $X$ is $\text{Bin}(500, .44)$. Find $P(X > 150)$.
• There are two ways of doing this.
• Normal approximation to the binomial: $P\left(Z > \frac{150.5 - 500 \times .44}{\sqrt{500 \times .44 \times .56}}\right) \approx 1$
• $P(X > 150) = P\left(\hat p > \frac{150}{500}\right) = P\left(Z > \frac{150/500 - .44}{\sqrt{(.44 \times .56)/500}}\right) \approx 1$
• This is the same as the normal approximation but without the continuity correction.

Problem 69
• $X$ is binomial with $n = 250$, $p = .2$.
• $\hat p = X/n$, $E(\hat p) = p = .2$.
• $\sigma_{\hat p} = \sqrt{\frac{.2 \times .8}{250}} = .0253$
• So $\hat p$ is approximately normal with mean .2 and s.d. .0253.
• $p \pm 2\sigma_{\hat p} = .2 \pm (2 \times .0253) = (.1494, .2506)$
• $P(.1494 < \hat p < .2506) = P\left(\frac{.1494 - .2}{.0253} < Z < \frac{.2506 - .2}{.0253}\right) = P(-2 < Z < 2) = .954$
• So roughly 95% of samples would have a $\hat p$ that falls in the interval (.1494, .2506).
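All of the problems above follow one pattern: standardize with the standard error ($\sigma/\sqrt{n}$ for $\bar X$, $\sqrt{p(1-p)/n}$ for $\hat p$) and evaluate the standard normal CDF. The following is a minimal Python sketch using only the standard library, with $\Phi$ computed from `math.erf` instead of a printed table; `phi`, `se_mean` and `se_prop` are hypothetical helper names. Small differences in the last digit from the answers above are expected, since table lookups round $z$ to two decimals.

```python
import math

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def se_mean(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_prop(p, n):
    """Standard error of the sample proportion, sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Problem 21: mu = 30, sigma = 16, n = 100
se21 = se_mean(16, 100)
p21a = 1 - phi((28 - 30) / se21)                           # P(Xbar > 28)
p21b = phi((26.8 - 30) / se21) - phi((22.1 - 30) / se21)   # P(22.1 < Xbar < 26.8)

# Problem 24: mu = 96850, sigma = 30000, n = 50
p24 = 1 - phi((89500 - 96850) / se_mean(30000, 50))        # P(Xbar > 89500)

# Problem 74(a): mu = 157, sigma = 3, n = 40
p74 = phi((155.7 - 157) / se_mean(3, 40))                  # P(Xbar < 155.7)

# Problem 39: p = .85, n = 250
p39 = phi((0.9 - 0.85) / se_prop(0.85, 250))               # P(phat < .9)

# Problem 69: p = .2, n = 250; interval p +/- 2 standard errors
se69 = se_prop(0.2, 250)
interval69 = (0.2 - 2 * se69, 0.2 + 2 * se69)
p69 = phi(2) - phi(-2)                                     # P(-2 < Z < 2)

for label, value in [("21a", p21a), ("21b", p21b), ("24", p24),
                     ("74a", p74), ("39", p39), ("69", p69)]:
    print(label, round(value, 4))
print("69 interval", tuple(round(v, 4) for v in interval69))
```

Computing $\Phi$ directly also makes extreme cases such as $P(Z > -11.4)$ in Problem 43 easy to check without running off the end of a table.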
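The demonstrations labelled "simulate 4 samples" and "simulate 25samples" earlier in the chapter can be approximated with a short Monte Carlo sketch. It assumes the $N(5, 2)$ population used on those slides; the function name `summarize`, the repetition count and the seed are arbitrary choices of this sketch, not taken from the slides. For $n = 1, 4, 25$ the simulated s.d. of $\bar X$ should come out near $\sigma/\sqrt{n} = 2, 1, 0.4$, and the fraction of means within $5 \pm 2$ near 68%, 95% and essentially 100%.

```python
import random
import statistics

def summarize(n, reps=20_000, mu=5.0, sigma=2.0, seed=1):
    """Simulate `reps` means of size-n samples drawn from N(mu, sigma).

    Returns (mean of the means, s.d. of the means,
             fraction of means within mu +/- sigma).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    within = sum(mu - sigma <= m <= mu + sigma for m in means) / reps
    return statistics.fmean(means), statistics.pstdev(means), within

for n in (1, 4, 25):
    center, spread, within = summarize(n)
    print(f"n={n:2d}: mean={center:.3f}  sd={spread:.3f}  "
          f"sigma/sqrt(n)={2 / n ** 0.5:.3f}  within 5 +/- 2: {within:.4f}")
```

Swapping `rng.gauss` for a draw from a skewed distribution is one way to watch the Central Limit Theorem take effect as $n$ grows.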