MBACATÓLICA
Quantitative Methods
Miguel Gouveia, Manuel Leite Monteiro
Faculdade de Ciências Económicas e Empresariais, UNIVERSIDADE CATÓLICA PORTUGUESA
MBACatólica 2006/07

9. SAMPLING DISTRIBUTIONS

Problem
• A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 200 milliliters and a standard deviation of 15 milliliters. What is the probability that the average amount dispensed in a random sample of 36 is at least 204 milliliters:
a) if the random variable is normally distributed?
b) if the distribution is unknown?

Distribution of the sample mean
• The sample mean (computed from n observations drawn from a population) is a random variable.
• Our objective is to study the distribution of that variable and to see how it is related to the distribution of the population from which the sample was drawn.

Distribution of the sample mean
Example: samples (with replacement) of size n = 2 from a population with four values: 1, 2, 3, 4 ($\mu = 2.5$ and $\sigma^2 = 1.25$).
• Possible samples: 16

Samples                Sample means
1,1  2,1  3,1  4,1     1.0  1.5  2.0  2.5
1,2  2,2  3,2  4,2     1.5  2.0  2.5  3.0
1,3  2,3  3,3  4,3     2.0  2.5  3.0  3.5
1,4  2,4  3,4  4,4     2.5  3.0  3.5  4.0

Distribution of the sample mean

Sample mean   Nº of samples   Probability
1.0           1               1/16
1.5           2               2/16
2.0           3               3/16
2.5           4               4/16
3.0           3               3/16
3.5           2               2/16
4.0           1               1/16
Total         16              1

Distribution of the sample mean
[Figure: bar charts of the distribution of the population, f(x) for x = 1, 2, 3, 4, and of the distribution of the sample mean, f($\bar{x}$) for $\bar{x}$ = 1.0, 1.5, ..., 4.0.]

Distribution of the sample mean
$$E[\bar{X}] = \sum \bar{x}\, f(\bar{x}) = 2.5 = \mu$$
• The mean of the sample mean's distribution is the mean of the population.
Concepts of mean being used:
– $E[\bar{X}]$: expected value (parameter of the sample mean's distribution)
– $\bar{X}$: random variable
– $\mu$: parameter (parameter of the universe)

Distribution of the sample mean
$$V[\bar{X}] = \sum (\bar{x} - \mu)^2 f(\bar{x}) = 0.625 \qquad V[\bar{X}] = \frac{\sigma^2}{n} = \frac{1.25}{2}$$
• The standard deviation of the sample mean is: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
• As the sample size (n) increases, the standard deviation of the mean decreases.
• As the standard deviation (σ) decreases, the standard deviation of the mean also decreases.

Distribution of the sample mean
Population: N = 4, $\mu = 2.5$, $\sigma^2 = 1.25$
Sample mean (n = 2): $E[\bar{X}] = 2.5$, $V[\bar{X}] = 0.625$
[Figure: bar charts of f(x) for the population and of f($\bar{x}$) for the sample mean, side by side.]

Distribution of the sample mean
$$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu$$
$$V[\bar{X}] = V\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{\sigma^2 + \sigma^2 + \dots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
(The variance result uses the independence of the $X_i$.)

Distribution of the sample mean for Normal Populations
• A linear combination of independent normal random variables is itself a normal random variable.
• Application: if $X \sim N(\mu, \sigma)$, then $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ (second parameter: standard deviation).
• X×Y and X/Y do not have a normal distribution.

Problem
• A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 200 milliliters and a standard deviation of 15 milliliters. What is the probability that the average amount dispensed in a random sample of 36 is at least 204 milliliters:
a) if the random variable is normally distributed?
b) if the distribution is unknown?

Solution
a)
• X: quantity of soft drink dispensed, with µ = 200 and σ = 15.
• Sample size: n = 36
If $X \sim N(200, 15^2)$, then $\bar{X} \sim N\left(200, \frac{15^2}{36}\right)$ (second parameter: variance).
Probability that the average amount is at least 204:
$$P[\bar{X} \ge 204] = P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \frac{204 - 200}{15/\sqrt{36}}\right] = P[Z \ge 1.6] = 1 - 0.9452 = 5.48\%$$
And if the distribution were unknown?

Central Limit Theorem
• The distribution of a random variable obtained from the sum (or mean) of n independent and identically distributed (i.i.d.) random variables approaches a normal distribution as n increases.
• This result does not depend on the distribution of the population.
• If $X_1, X_2, \dots, X_n$ are n i.i.d. random variables with mean µ and variance σ², then, approximately for large n:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Central Limit Theorem
As the sample size increases, the distribution of the sample mean becomes almost Normal, independently of the population's distribution.

Central Limit Theorem
• What sample size (n) is "large enough"?
– For most population distributions, n > 30.
– For fairly symmetric distributions, n > 15 may suffice.
– For normal populations, the sampling distribution of the mean is always normal, regardless of the sample size.

Solution
b)
• X: quantity of soft drink dispensed, with µ = 200 and σ = 15.
• Sample size: n = 36
Since n is "large", approximately $\bar{X} \sim N\left(200, \frac{15^2}{36}\right)$.
Probability that the average amount is at least 204:
$$P[\bar{X} \ge 204] = P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \frac{204 - 200}{15/\sqrt{36}}\right] \approx P[Z \ge 1.6] = 1 - 0.9452 = 5.48\%$$

10. INTRODUCTION TO STATISTICAL INFERENCE

Statistical Inference
11. Point Estimation
12. Confidence Intervals
13. Hypothesis Tests

Problem
BankX plans to launch a new financial product different from all the existing ones. A sample of 25 potential investors provided the following information regarding the amount they wish to invest in the new product (normally distributed): $\sum x_i = 1000$ and $\sum (x_i - \bar{x})^2 = 9600$.
a) Compute a point estimate for the average amount invested.
b) Compute a 90% confidence interval for the average amount invested.

Parameters and Statistics
• Parameter: a numerical value that characterizes the distribution or the universe studied.
• Estimator: a random variable that can take different values depending on the particular sample drawn.
• Estimate: a number that is obtained from a specific sample.

11. Point Estimation

Estimators for the mean, variance and proportion

Population's parameter     Estimator      Estimate
Mean                 µ     $\bar{X}$      $\bar{x}$
Variance             σ²    $S^2$          $s^2$
Standard deviation   σ     $S$            $s$
Proportion           p     $f_n$          $f_n$

Estimator's properties
• Unbiasedness: an estimator is unbiased if the mean of its distribution equals the parameter.
• Efficiency: an unbiased estimator is the most efficient if its variance (around the parameter) is minimal.
• Consistency: an estimator is consistent if, as the sample size increases, its mean approaches the parameter and its variance decreases.

Unbiasedness
[Figure: sampling densities f(·) of an unbiased estimator and of a biased estimator, with µ marked on the horizontal axis.]

Efficiency
[Figure: sampling distribution of the mean and sampling distribution of the median, with µ marked on the horizontal axis.]

Consistency
[Figure: sampling densities f(·) for a large sample and for a small sample, with µ marked on the horizontal axis.]

Problem
BankX plans to launch a new financial product different from all the existing ones.
A sample of 25 potential investors provided the following information regarding the amount they wish to invest in the new product (normally distributed): $\sum x_i = 1000$ and $\sum (x_i - \bar{x})^2 = 9600$.
a) Compute a point estimate for the average amount invested.
b) Compute a 90% confidence interval for the average amount invested.

Solution
a) Compute a point estimate for the average amount invested.
$n = 25$; $\bar{x} = \frac{1000}{25} = 40$; $s^2 = \frac{9600}{24} = 400$.
Point estimate: $\hat{\mu} = \bar{x} = \frac{1000}{25} = 40$
b) Compute a 90% confidence interval for the average amount invested.

12. CONFIDENCE INTERVALS

Point Estimation vs. Confidence Intervals
[Figure: a random sample with mean $\bar{x} = 50$ is drawn from a population whose mean, µ, is unknown. Caption: "I've got 95% confidence that µ is located between 40 and 60."]

Confidence Intervals for the mean
• Example for a Normal population (or for "large" samples)
As $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, we have $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Thus:
$$P\left[-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right] = 0.95$$

Confidence Intervals for the mean
which can also be written as:
$$P\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right] = 0.95$$
• So, we have a 95% confidence interval for the mean:
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$

Interpretation of a (1−α)% confidence interval
• (1−α)% is the percentage of confidence intervals,
– from successive samples,
– all with size n,
– drawn from the same population,
that include the true value of the parameter being estimated.
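The repeated-sampling interpretation above can be checked with a small simulation. The sketch below is only illustrative: it assumes a normal population with the vending-machine parameters (µ = 200, σ = 15, n = 36), treats σ as known, fixes a random seed, and uses hypothetical helper names (`covers`, `coverage`) that do not come from the slides.

```python
import math
import random

def covers(mu, sigma, n, z, rng):
    """Draw one sample of size n and report whether its z-interval contains mu."""
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    half_width = z * sigma / math.sqrt(n)  # sigma is treated as known
    return xbar - half_width < mu < xbar + half_width

def coverage(mu=200.0, sigma=15.0, n=36, z=1.96, trials=5000, seed=1):
    """Fraction of intervals, over repeated samples, that contain mu."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    hits = sum(covers(mu, sigma, n, z, rng) for _ in range(trials))
    return hits / trials

print(coverage())  # expected to be close to 0.95
```

Any single interval either contains µ or it does not; the 95% refers to the long-run share of intervals that do.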
Interpretation of a (1−α)% confidence interval
[Figure: sampling distribution of $\bar{X}$ centred at $E[\bar{X}] = \mu$, with area $1 - \alpha$ between $\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and area $\alpha/2$ in each tail; below it, confidence intervals for 10 different samples.]
(1−α)% of the intervals contain µ and α% don't.

(1−α)% CI for the mean: Normal Pop. (or n large) and σ known
• For a Normal population (or large n) with σ known:
1. Define the level of confidence (1−α)%.
2. Collect a sample of size n.
3. Compute $\bar{x}$.
4. Obtain $z_{\alpha/2}$ from the statistical tables.
The confidence interval is given by:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Problem
• BankX plans to launch a new financial product different from all the existing ones. A sample of 25 potential investors provided the following information regarding the amount they wish to invest in the new product (normally distributed): $\sum x_i = 1000$ and $\sum (x_i - \bar{x})^2 = 9600$.
a) Compute a point estimate for the average amount invested.
b) Compute a 90% confidence interval for the average amount invested.

Solution
b) Compute a 90% CI for the average amount invested.
$n = 25$; $\bar{x} = \frac{1000}{25} = 40$; $s^2 = \frac{9600}{24} = 400$, so $s = 20$.
$\alpha = 10\% \Rightarrow z_{0.05} = 1.645$
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; 40 \pm 1.645\,\frac{20}{\sqrt{25}}$$
With $z_{0.05} = 1.645$ this gives 40 ± 6.58, i.e. (33.42, 46.58). Since σ is estimated by s and n = 25 is small, the Student's t value with 24 degrees of freedom, $t_{0.05}^{(24)} = 1.711$, is the appropriate one; it gives 40 ± 6.844:
CI for µ: (33.156, 46.844)

Conflict between credibility and precision
• Credibility – confidence level of an interval
• Precision – width of the confidence interval
• For a given sample size n:
– More precision means a narrower interval, and therefore a lower level of confidence.
– A higher level of confidence implies a wider interval (less precision).
The only way to increase the precision and the credibility of the inference simultaneously is to increase n.

Problem
• A vending machine is calibrated to pour a quantity of liquid that follows a normal distribution with variance equal to 16 ml². In a sample of 25 drinks, the average was $\bar{x} = 250$ ml. We want:
a) To construct a 95% confidence interval for the true average quantity of liquid in the served drinks;
b) To determine how many drinks should be included in a new sample if the width of the interval is to be reduced to 2 ml.

Solution
a)
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
$$250 - 1.96\frac{4}{\sqrt{25}} < \mu < 250 + 1.96\frac{4}{\sqrt{25}}$$
$$248.432 < \mu < 251.568$$
The width of the interval is 3.136 ml.

Solution
b)
$$\text{Width} = 2 \times z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$2 = 2 \times 1.96\,\frac{4}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = 7.84 \;\Rightarrow\; n = 61.47, \text{ rounded up to } n = 62$$

Problem
• Ten analysts have given the following year earnings forecasts for a stock, which are normally distributed:

Forecast ($X_i$)                 1.40  1.43  1.44  1.45  1.47  1.48  1.50
Number of analysts ($n_i$)       1     1     3     2     1     1     1

Compute a 95% confidence interval for the population mean of the forecasts.

Population's Variance unknown
• Until now we have assumed that the variance of the population was known. However, it is usually unknown and has to be estimated.
• We know that
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
is an unbiased estimator for the population variance: $E[S^2] = \sigma^2$.

Distribution of the sample mean from a Normal population with unknown σ
• If the population is Normal, is the sample mean distribution still given by $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$?
For small samples the answer is NO!
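A short simulation can make this "NO" concrete. The sketch below is an illustration under assumed settings (standard normal population, 10,000 replications, a fixed seed, and a hypothetical function name `tail_fraction`), not part of the original slides. It counts how often the studentized mean $(\bar{X} - \mu)/(S/\sqrt{n})$ falls outside ±1.96; if the N(0, 1) approximation held, that share would be about 5%.

```python
import math
import random

def tail_fraction(n, cutoff=1.96, trials=10000, seed=7):
    """Share of samples whose studentized mean falls outside +/- cutoff."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    outside = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor, as in S^2 above.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t = xbar / (s / math.sqrt(n))
        if abs(t) > cutoff:
            outside += 1
    return outside / trials

small = tail_fraction(n=5)    # noticeably above 0.05
large = tail_fraction(n=60)   # close to 0.05
print(small, large)
```

With n = 5 the share is well above 5%, which reflects the heavier tails of the statistic once σ is replaced by S; with n = 60 the normal approximation is already close.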
Distribution of the sample mean from a Normal population with unknown σ
• With σ unknown, we have a "t" distribution:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1) \qquad \text{where: } S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

t distribution (Student's distribution)
[Figure: densities of Normal (0,1), t (df = 13) and t (df = 5), plotted against z, t and centred at 0.]
• Also bell shaped
• Also symmetric
• But with wider tails

Student's t distribution

        F(x)
df      0.90    0.95    0.975    0.99     0.995
1       3.078   6.314   12.706   31.821   63.656
2       1.886   2.920   4.303    6.965    9.925
3       1.638   2.353   3.182    4.541    5.841
4       1.533   2.132   2.776    3.747    4.604
5       1.476   2.015   2.571    3.365    4.032
6       1.440   1.943   2.447    3.143    3.707
7       1.415   1.895   2.365    2.998    3.499
8       1.397   1.860   2.306    2.896    3.355
9       1.383   1.833   2.262    2.821    3.250
10      1.372   1.812   2.228    2.764    3.169
11      1.363   1.796   2.201    2.718    3.106
12      1.356   1.782   2.179    2.681    3.055
13      1.350   1.771   2.160    2.650    3.012
14      1.345   1.761   2.145    2.624    2.977
15      1.341   1.753   2.131    2.602    2.947
26      1.315   1.706   2.056    2.479    2.779
27      1.314   1.703   2.052    2.473    2.771
28      1.313   1.701   2.048    2.467    2.763
29      1.311   1.699   2.045    2.462    2.756
inf     1.282   1.645   1.960    2.326    2.576

[Figure: density of t (df = 3); the area to the right of 3.182 is 1 − 0.975.]

(1−α)% CI for the mean: Normal Pop. and σ unknown
• For a Normal population with σ unknown:
1. Define the level of confidence (1−α)%.
2. Collect a sample of size n.
3. Compute $\bar{x}$ and $s$.
4. Obtain $t_{\alpha/2}^{(n-1)}$ from the statistical tables.
The confidence interval is given by:
$$\bar{x} - t_{\alpha/2}^{(n-1)}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}^{(n-1)}\frac{s}{\sqrt{n}}$$

Problem
• Ten analysts have given the following year earnings forecasts for a stock, which are normally distributed:

Forecast ($X_i$)                 1.40  1.43  1.44  1.45  1.47  1.48  1.50
Number of analysts ($n_i$)       1     1     3     2     1     1     1

Compute a 95% confidence interval for the population mean of the forecasts.
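The computation this problem calls for can be sketched in a few lines of Python. This is a minimal illustration, not the course's own material: the function name `t_interval` is hypothetical, and the critical value 2.262 is read from the Student's t table (df = 9, F(x) = 0.975) rather than computed.

```python
import math

# Forecasts and analyst counts from the problem statement.
forecasts = {1.40: 1, 1.43: 1, 1.44: 3, 1.45: 2, 1.47: 1, 1.48: 1, 1.50: 1}

def t_interval(freq, t_crit):
    """Return (mean, sample std dev, (lower, upper)) for grouped data."""
    n = sum(freq.values())
    xbar = sum(x * k for x, k in freq.items()) / n
    # Sample variance with the n - 1 divisor.
    s2 = sum(k * (x - xbar) ** 2 for x, k in freq.items()) / (n - 1)
    half_width = t_crit * math.sqrt(s2 / n)
    return xbar, math.sqrt(s2), (xbar - half_width, xbar + half_width)

xbar, s, ci95 = t_interval(forecasts, t_crit=2.262)  # table value, df = 9
print(round(xbar, 2), round(s, 5), [round(v, 2) for v in ci95])
```

Passing `t_crit=3.250` (df = 9, F(x) = 0.995) instead gives the corresponding 99% interval.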
Solution
$\bar{x} = 1.45$; $s = 0.02789$; $n = 10$; df = 9; $t_{0.025}^{(9)} = 2.262$
$$1.45 - 2.262\,\frac{0.02789}{\sqrt{10}} \le \mu \le 1.45 + 2.262\,\frac{0.02789}{\sqrt{10}}$$
$$1.43 \le \mu \le 1.47$$
• For a 99% confidence level, the interval would be:
$$1.45 - 3.250\,\frac{0.02789}{\sqrt{10}} \le \mu \le 1.45 + 3.250\,\frac{0.02789}{\sqrt{10}}$$
$$1.421 \le \mu \le 1.479$$

Distribution of the sample mean
Normal population:
– σ known, n < 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
– σ known, n ≥ 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
– σ unknown, n ≥ 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (approximately)
– σ unknown, n < 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$
Not Normal population:
– σ known, n < 30: we don't know the distribution
– σ known, n ≥ 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ (CLT)
– σ unknown, n ≥ 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (CLT)
– σ unknown, n < 30: we don't know the distribution

Confidence interval for a proportion
• The true proportion of a population is p. The estimator of p is the sample proportion, i.e., $f_n = \frac{X}{n}$, where X is a binomial variable:
$$E[f_n] = \frac{1}{n}E[X] = \frac{np}{n} = p$$
$$V[f_n] = \frac{1}{n^2}V[X] = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$$

Confidence interval for a proportion
• For a large n:
$$\frac{f_n - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1)$$
• The confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$$

(1−α)% CI for a proportion: with large samples
1. Define the level of confidence (1−α)%.
2. Collect a sample of size n.
3. Compute $f_n$.
4. Obtain $z_{\alpha/2}$ from the statistical tables.
The confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$$

Problem
• We want to estimate the proportion of voters for a political party. 400 citizens were interviewed and 140 of them stated their intention to vote for that party. Compute a 99% confidence interval for the proportion of votes for that party.
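The large-sample interval for a proportion can be evaluated directly for this poll. The sketch below is illustrative; the function name `proportion_interval` is hypothetical, and z = 2.57 is an assumed rounding of the 99% table value (the "inf" row of the t table gives 2.576).

```python
import math

def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a proportion."""
    f = successes / n                           # sample proportion f_n
    half_width = z * math.sqrt(f * (1 - f) / n)
    return f - half_width, f + half_width

low, high = proportion_interval(140, 400, z=2.57)
print(round(low, 5), round(high, 5))
```

Using 2.576 instead of 2.57 widens the interval only in the fourth decimal place.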
Solution
$n = 400$; $f_n = 140/400 = 0.35$; $1 - f_n = 0.65$
$1 - \alpha = 0.99$; $\alpha/2 = 0.005$; $z_{\alpha/2} = 2.57$
$$0.35 - 2.57\sqrt{\frac{0.35 \times 0.65}{400}} \le p \le 0.35 + 2.57\sqrt{\frac{0.35 \times 0.65}{400}}$$
$$0.28871 \le p \le 0.41129$$

Selection of the sample size
• The sample size is a decision variable reflecting a conflict between precision and the cost of sampling.
– Very large: too expensive
– Very small: imprecise results

Selection of the sample size
• Question: for a desired minimum precision, what is the minimum sample size to be drawn?
The choice of n is affected by 3 factors:
1. The level of precision, i.e. the margin of error (which sets the interval width)
2. The level of confidence
3. The dispersion of the population

Sample size: Estimation of a proportion
• Since the confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$$
it can also be written as $f_n - e < p < f_n + e$, with e being the margin of error.

Sample size: Estimation of a proportion
• Fixing e, it is possible to obtain n as:
$$n = (z_{\alpha/2})^2\,\frac{f_n(1 - f_n)}{e^2}$$
• BUT: the value of $f_n$ is unknown before the sample is drawn. The value used for $f_n$ should be the one that maximizes p(1 − p), i.e., $f_n = 0.5$.

Problem
• Determine the minimum sample size needed to compute a 95% confidence interval for the proportion of consumers who are willing to buy a new product, with a margin of error of one percentage point.
• Recompute that sample size if you were sure that, given the high price of the product, no more than 25% of consumers would buy it.

Solution
$e = 0.01$; $\alpha = 5\%$; $z_{\alpha/2} = 1.96$
$$n = 1.96^2\,\frac{0.5 \times 0.5}{0.01^2} = 9604$$
If we knew a priori that p ≤ 0.25, then
$$n = 1.96^2\,\frac{0.25 \times 0.75}{0.01^2} = 7203$$

Sample size: Estimation of the mean
• The confidence interval is given by:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and it can be written as: $\bar{x} - e < \mu < \bar{x} + e$
Thus:
$$n = (z_{\alpha/2})^2\,\frac{\sigma^2}{e^2}$$

Sample size: Estimation of the mean
• If σ is unknown:
1. Collect a pilot sample, of smaller size, to estimate σ.
2. If the population is approximately normal: $P[\mu - 2\sigma < X < \mu + 2\sigma] \approx 0.95$ and $P[\mu - 3\sigma < X < \mu + 3\sigma] \approx 0.997$. Therefore (using past data or subjective evaluations of the population), we can "estimate":
i. σ ≈ (Percentile 97.5 − Percentile 2.5)/4
ii. σ ≈ (MAX − MIN)/6

Problem
• Suppose you want to estimate the population mean of the analysts' forecasts for next year's stock earnings to within ±0.01 with 95% confidence. On the basis of past studies, you believe the standard deviation of those forecasts to be 0.03. Find the minimum sample size needed.

Solution
$e = 0.01$; $\sigma = 0.03$; $\alpha = 5\%$; $z_{\alpha/2} = 1.96$
$$n = 1.96^2\,\frac{0.03^2}{0.01^2} = 34.6$$
We need at least 35 forecasts in our sample.
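The sample-size rule for the mean, $n = (z_{\alpha/2})^2 \sigma^2 / e^2$ rounded up to the next integer, can be written as a small helper. The sketch below is illustrative and the function name `n_for_mean` is hypothetical. It reproduces this solution and the earlier vending-machine case (σ = 4, interval width 2 ml, so e = 1).

```python
import math

def n_for_mean(z, sigma, e):
    """Smallest n for which the margin of error z * sigma / sqrt(n) is at most e."""
    return math.ceil((z * sigma / e) ** 2)

print(n_for_mean(1.96, 0.03, 0.01))  # analysts' forecasts: 34.57 rounds up to 35
print(n_for_mean(1.96, 4.0, 1.0))    # vending machine: 61.47 rounds up to 62
```

Rounding up rather than to the nearest integer is what guarantees the margin of error does not exceed e.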