BA 275 Quantitative Business Methods

Agenda
  Hypothesis Testing
    Elements of a Test
    Concept behind a Test
    Examples

Midterm Examination #1

Question 1 – A
[Figure: Histogram A, Histogram B, and Histogram C; horizontal axes run from 0 to 100.]

Question 1 – B
[Figure: Box-and-Whisker Plots A, B, and C on a common axis from 0 to 100.]

Question 1 – C
C-1. (1 point) The mean score will (circle one):
  A) Be unchanged   B) Increase by 5 points   C) Increase by 5 points   D) None of the above
C-2. (1 point) The median score will (circle one):
  A) Be unchanged   B) Increase by 5 points   C) Increase by 5 points   D) None of the above
C-3. (1 point) The standard deviation of scores will (circle one):
  A) Be unchanged   B) Increase by 5 points   C) Increase by 5 points   D) None of the above
C-4. (1 point) The interquartile range (IQR) of scores will (circle one):
  A) Be unchanged   B) Increase by 5 points   C) Increase by 5 points   D) None of the above
C-5. (1 point) The range of scores will (circle one):
  A) Be unchanged   B) Increase by 5 points   C) Increase by 5 points   D) None of the above

Question 2
Center City:    $940  $955  $965  $975  $980  $985  $999  $1,000  $1,119  $1,247
Outlying Area:  $575  $690  $694  $705  $725  $725  $745  $750  $775  $800

Summary Statistics     Center City                  Outlying Area
Count                  (A)                          (B)
Average                1,016.5                      (C)
Median                 (D) 982.5 = (980 + 985)/2    (E)
Mode                   (F)                          (G) 725
Variance               (H) 8,949.16 = (94.6)^2      3,755.6
Standard deviation     94.6                         (I) 61.28295
Minimum                (J)                          (K) 575
Maximum                (L) 1,247                    (M)
Range                  (N) 307 = 1,247 - 940        (O)
Lower quartile         965.0                        694.0
Upper quartile         (P) 1,000 = 965 + 35         750.0
Interquartile range    35.0                         (Q) 56 = 750 - 694

Question 3 – A, B, and C
[Figure: empirical rule for a bell-shaped distribution. 68% of values lie within 1s of the mean, 95% within 2s, and 99.7% within 3s; the successive bands hold 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, and 0.15%. The attendance axis reads 5k, 6k, 7k, 8k, 9k, 10k, 11k, corresponding to x̄ - 3s through x̄ + 3s.]
A. Estimate the percentage of games that have between 7,000 and 10,000 people in attendance.
   81.5% = 68% + 13.5%
B. Estimate the percentage of games that have fewer than 6,000 people in attendance.
   2.5% = 2.35% + 0.15%
C.
Estimate the percentage of games that have more than 9,000 people in attendance.
   16% = 13.5% + 2.35% + 0.15%

Question 3 – D and E
D. If we assume that the attendance follows a normal distribution with the same mean and standard deviation, estimate the percentage of games that have between 7,500 and 10,250 people in attendance.
   P(7500 < X < 10250) = P(-0.5 < Z < 2.25) = 0.9878 - 0.3085 = 0.6793
E. Again, with the same normality assumption, estimate the percentage of games that have more than 5,500 people in attendance.
   P(X < 5500) = P(Z < -2.5) = 0.0062
   P(X > 5500) = 1 - 0.0062 = 0.9938

Question 4
A bottling company uses a filling machine to fill plastic bottles with cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean μ = 298 ml and standard deviation σ = 3 ml.
A. What is the probability that an individual bottle contains less than 296 ml?
   P(X < 296) = P(Z < -0.67) = 0.2514
B. What is the probability that the mean contents of the bottles in a six-pack is less than 296 ml?
   $P(\bar{X} < 296) = P\left(Z < \frac{296 - 298}{3/\sqrt{6}}\right) = P(Z < -1.64) = 0.0505$

Question 5
A questionnaire about study habits was given to a random sample of students taking a large introductory statistics class. The sample of 25 students reported that they spent an average of 110 minutes per week studying statistics. Assume that the standard deviation is 40 minutes. Give a 90% confidence interval for the mean time spent studying statistics by students in this class.
   $110 \pm 1.645 \cdot \frac{40}{\sqrt{25}} = 110 \pm 13.16$
If we wish to reduce the margin of error to only 8 minutes while keeping the confidence level at 90%, how large a sample do we need?
   $n = \left(\frac{1.645 \times 40}{8}\right)^2 = 67.65$, so round up to n = 68.

Central Limit Theorem (CLT)
The CLT applied to Means
If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
If X follows any distribution with mean μ and variance σ², then $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$, given that n is large.
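The Question 4B and Question 5 calculations above both rest on the sampling distribution of the mean, N(μ, σ²/n). The following is a minimal sketch of those three calculations using Python's standard library. The function names (`prob_mean_below`, `z_interval`, `sample_size`) are illustrative and not part of the course material. Because the code uses exact normal quantiles instead of a rounded z table, results can differ slightly from the slide values (for example, Question 4B comes out near 0.051 instead of 0.0505).

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_mean_below(x, mu, sigma, n):
    """P(sample mean < x), assuming the sample mean ~ N(mu, sigma^2 / n)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)


def z_interval(xbar, sigma, n, confidence):
    """Two-sided z confidence interval for the mean with known sigma."""
    z = STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def sample_size(sigma, margin, confidence):
    """Smallest n whose z margin of error is at most `margin`."""
    z = STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Question 4B: six-pack mean below 296 ml
    print(round(prob_mean_below(296, 298, 3, 6), 4))
    # Question 5: 90% interval, then n needed for an 8-minute margin
    low, high = z_interval(110, 40, 25, 0.90)
    print(round(low, 2), round(high, 2))
    print(sample_size(40, 8, 0.90))
```

Rounding the sample size up with `ceil` mirrors the slide's step from 67.65 to 68: rounding down would leave the margin of error slightly above the target.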

Example 1
The number of cars sold annually by used car salespeople is normally distributed with a standard deviation of 15. A random sample of 400 salespeople was taken and the mean number of cars sold annually was found to be 75. Find the 95% confidence interval estimate of the population mean. Interpret the interval estimate.
   $\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 75 \pm 1.96 \cdot \frac{15}{\sqrt{400}} = 75 \pm 1.47$

Statistical Inference: Estimation
Population
  Research Question: What is the parameter value? Example: μ?
Sample of size n
  Example: [?] = 10,000, n = 100. What is the value of μ?
Tools (i.e., formulas): Point Estimator, Interval Estimator

Example 2: Concept behind a H.T.
A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?

Statistical Inference: Hypothesis Testing
Population
  Research Question: Is a claim about the parameter value supported? Example: "μ > 22,000"?
Sample of size n
  Example: [?] = 10,000, n = 100. Is "μ > 22,000"?
Tool (i.e., formula): Z or T score

Elements of a Test
Before collecting data:
  Hypotheses
    Null Hypothesis H0
    Alternative Hypothesis Ha
  Test Statistic
  Decision Rule (Rejection Region)
After collecting data:
  Evidence (actual observed test statistic)
  Conclusion
    Reject H0 if the evidence falls in the R.R.
    Do not reject H0 if the evidence falls outside the R.R.

Example 2 (cont'd)
A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 112 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
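
The elements of a test listed above can be traced through Example 2 in a few lines. This is a minimal sketch, not the course's own procedure: the function name `lower_tail_z_test` is illustrative, and the 5% significance level is an assumption, since Example 2 does not state one. The hypotheses are H0: μ ≥ 120 seconds against Ha: μ < 120 seconds, so the rejection region lies in the lower tail.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sided z test of H0: mu >= mu0 against Ha: mu < mu0.

    alpha=0.05 is an assumed significance level, not given in Example 2.
    Returns (observed z, critical value, reject H0?).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))       # evidence: observed test statistic
    critical = NormalDist().inv_cdf(alpha)     # decision rule: reject if z < critical
    return z, critical, z < critical


if __name__ == "__main__":
    # Sample means from Example 2 and Example 2 (cont'd), in seconds
    for xbar in (100, 112):
        z, critical, reject = lower_tail_z_test(xbar, mu0=120, sigma=28, n=30)
        verdict = "reject H0" if reject else "do not reject H0"
        print(f"xbar={xbar}: z={z:.4f}, critical={critical:.3f}, {verdict}")
```

The decision rule (the critical value) depends only on α and the direction of Ha, so it is fixed before any data arrive; only the observed z changes between the two versions of the example.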

Type I and II Errors
                                                   State of Nature
Conclusion                                         H0 is true (μ ≥ 120)   Ha is true (μ < 120)
Ha is not supported (cannot say μ < 120)           Correct                Type II error
Ha is supported (have evidence to say μ < 120)     Type I error           Correct

Chance of making a Type I error = P(Type I error) = α
Chance of making a Type II error = P(Type II error) = β

Example 3
The manager of a department store is thinking about establishing a new billing system for the store's credit customers. After a thorough financial analysis, she determines that the new system will be cost-effective only if the mean monthly account is greater than $70. A random sample of 200 monthly accounts is drawn, for which the sample mean account is $76 with a standard deviation of $30. Is there enough evidence at the 5% significance level to conclude that the new system will be cost-effective? What if the sample mean is $68? $74?

Answer Key to the Examples
Example 1: $\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 75 \pm 1.96 \cdot \frac{15}{\sqrt{400}} = 75 \pm 1.47$
Example 2: the z score of the sample mean 100 is -3.9123, meaning the sample mean is almost 4 standard deviations below the null hypothesis value μ = 120. This is strong evidence to reject H0 and to conclude that the true mean waiting time is less than 120 seconds.
Example 2 (cont'd): the z score of the sample mean 112 is only -1.56, less than 2 standard deviations below μ = 120. There is not enough evidence to reject H0.
Example 3: at α = 5%, the rejection region is: reject H0 if z > 1.645.
  The z score of 76 is 2.828. Reject H0.
  68 is below the null value μ = 70. No evidence at all to reject H0.
  The z score of 74 is 1.8856. Reject H0.
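
The Example 3 answers can be checked with a short upper-tail z test. This is a minimal sketch under stated assumptions: the function name `upper_tail_z_test` is illustrative, the sample standard deviation of $30 is used in place of σ (as the answer key's z scores do, which is reasonable with n = 200), and the p-value is an addition that the answer key does not report.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, s, n, alpha=0.05):
    """One-sided z test of H0: mu <= mu0 against Ha: mu > mu0.

    Uses the sample standard deviation s as a stand-in for sigma
    (a large-sample assumption). Returns (z, p-value, reject H0?).
    """
    z = (xbar - mu0) / (s / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)            # area to the right of z
    return z, p_value, z > critical


if __name__ == "__main__":
    # The three sample means considered in Example 3, in dollars
    for xbar in (76, 68, 74):
        z, p_value, reject = upper_tail_z_test(xbar, mu0=70, s=30, n=200)
        verdict = "reject H0" if reject else "do not reject H0"
        print(f"xbar={xbar}: z={z:.4f}, p={p_value:.4f}, {verdict}")
```

A sample mean of 68 gives a negative z score, so it can never fall in an upper-tail rejection region, which matches the answer key's "no evidence at all" remark.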