Slides 3-7 Proportion Inference
BA 275 Quantitative Business Methods

Agenda
Multiple Linear Regression: Dummy Variables
Statistical Inference: Population Proportion
    Confidence Interval Estimation
    Hypothesis Testing

Using Dummy Variables

Median Value of Homes ($000)   Location of ATM   Amount Withdrawn ($000)
X1                             X2                Y
225                            yes               120
170                            no                99
153                            yes               91
132                            no                82
237                            yes               124
187                            yes               104
245                            yes               127
125                            yes               80
215                            yes               115
170                            no                97
223                            no                117
147                            no                86
197                            yes               109
167                            no                94
210                            no                112

The slide repeats this table with the X2 column left blank.

Dummy Variable for LOCATION

$X_2 = 1$ (yes) if located in a shopping center
$X_2 = 0$ (no) if located outside a shopping center

$Y = \beta_0 + \beta_1 (\text{Value}) + \beta_2 (\text{Location}) + \varepsilon$

Fitted Model

Multiple Regression Analysis
Dependent variable: Withdrawal

Parameter   Estimate   Standard Error   T Statistic   P-Value
CONSTANT    29.6823    1.33374          22.2549       0.0000
Location    1.22822    0.545909         2.24986       0.0440
Value       0.393129   0.0073456        53.5189       0.0000

Analysis of Variance
Source          Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model           3278.42          2    1639.21       1642.59   0.0000
Residual        11.9753          12   0.997943
Total (Corr.)   3290.4           14

R-squared = 99.6361 percent
R-squared (adjusted for d.f.) = 99.5754 percent
Standard Error of Est. = 0.998971
Mean absolute error = 0.623333
Durbin-Watson statistic = 1.93022

Questions
Write down the fitted model.
Is the assumed model reliable? Why?
What is the value of $R^2$? Of the adjusted $R^2$?
To select a model, why do we prefer the adjusted $R^2$ to $R^2$?
Predict the amount of money withdrawn from an ATM located in a shopping center in a neighborhood where the median value of homes is $200,000.
If the median value of homes increases by $2,000, then the amount of money withdrawn from an ATM located in a shopping center is expected to increase by ___.
If the median value of homes is $200,000, then the amount of money withdrawn from an ATM located in a shopping center is ???; and the amount of money withdrawn from an ATM located outside a shopping center is ???. What is the difference?

Two Lines with the Same Slopes but Different Intercepts

$Y = \beta_0 + \beta_1 (\text{Value}) + \beta_2 (\text{Location}) + \varepsilon$

[Plot of Fitted Model: Withdrawal versus Value, with separate lines for Location = 0 and Location = 1]

Two Lines with Different Intercepts and Slopes

$Y = \beta_0 + \beta_1 (\text{Value}) + \beta_2 (\text{Location}) + \beta_3 (\text{Value} \times \text{Location}) + \varepsilon$

[Plot of Fitted Model: Withdrawal versus Value, with separate lines for Location = 0 and Location = 1]

Analyzing Categorical Data

Do you own an iPod? ___Yes ___No
Do you own an Xbox? ___Yes ___No
Which of the following 4 soft drinks gives you the highest satisfaction? Type A ___ Type B ___ Type C ___ Type D ___
Your gender: ____Male ____Female
Your nationality: _____

Central Limit Theorem

In the case of the sample mean:
$\bar{X} \sim N(\mu, \sigma^2 / n)$, with $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

In the case of the sample proportion:
$\hat{p} \sim N(p, p(1-p)/n)$, with $\sigma_{\hat{p}} = \sqrt{p(1-p)/n} \approx \sqrt{\hat{p}(1-\hat{p})/n}$

Formulas for Proportion

$100(1-\alpha)\%$ confidence interval estimator:
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$

Hypothesis testing, $H_0: p = p_0$:
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}}$

Example 1

A sample of 35 student information sheets shows that 9 intend to concentrate in Finance. Give a 99% confidence interval for the proportion of students in the population that intend to concentrate in Finance.

Margin of Error (m)

$\bar{X} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$
$\bar{X} \pm t_{\alpha/2} \, s / \sqrt{n}$
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$

Margin of error: the term after $\pm$ (how good is your point estimate?)

Example 2

A marketing manager for a start-up firm in Michigan wishes to discover the proportion of teenagers in Japan who own an iPod. If the manager wants a confidence interval of width 0.1, how many teenagers must be sampled? Use a conservative estimate of p. Assume that the confidence level is to be 95%.
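The confidence interval and margin-of-error formulas above can be sketched in Python. This is a minimal illustration using only the standard library; the function names (`z_critical`, `proportion_ci`, `sample_size`) are hypothetical and not part of the slides, and the sample size is rounded up so that the margin of error does not exceed the target.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z_critical(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size(margin, confidence, p=0.5):
    """Smallest n whose margin of error is at most `margin`.

    p = 0.5 is the conservative choice because p(1 - p) is largest there.
    """
    z = z_critical(confidence)
    return ceil((z / margin) ** 2 * p * (1.0 - p))


# Example 1: 9 of 35 students, 99% confidence.
low, high = proportion_ci(9, 35, 0.99)
print(f"Example 1: ({low:.3f}, {high:.3f})")

# Example 2: an interval of width 0.1 means a margin of error of 0.05.
print("Example 2: n =", sample_size(0.05, 0.95))
```

With only 35 observations the Example 1 interval is wide, which is the point of the margin-of-error slide: the half-width shrinks only in proportion to $1/\sqrt{n}$.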
Accuracy Gained by Increasing the Sample Size

$n = \left( \dfrac{1.96}{m} \right)^2 (0.5)(0.5)$

Margin of Error (B)   Sample Size (n)
7%                    196
6%                    266
5%                    384
4%                    600
3%                    1067
2%                    2401
1%                    9604

Hypothesis Testing

4 out of 5 dentists recommend Oral-B.
Scenario 1: “Hmm, I thought it was higher.”
Scenario 2: “No, it cannot be. It should be lower.”
Scenario 3: “Really? I don’t think so.”

Example 3

Three politicians are attempting to win the Democratic nomination for senator. The result of a survey of 1000 Democrats is summarized below. Do we have enough evidence to indicate that Candidate A will receive more than 50% of the vote? Assume $\alpha$ = 5%. (Use the rejection region and the p-value approaches.)

Candidate A   550
Candidate B   300
Candidate C   150

Example 4

In recent years over 70% of first-year college students responding to a national survey have identified “being well-off financially” as an important personal goal. A state university finds that 153 of a random sample of 200 of its first-year students say that this goal is important. Do we have evidence to support that more than 70% of first-year students would identify being well-off as an important personal goal? (Use the rejection region and the p-value approaches.) Assume $\alpha$ = 5%.

Example 5

A financial analyst wanted to determine the mean annual return on mutual funds. In a random sample of 15 returns she found a sample mean of 12.9% with a (sample) standard deviation of 3%. Is there evidence to claim that the mean annual return on mutual funds is greater than 12%? Assume $\alpha$ = 5%.

Answer Key to Examples 1-3

Example 1.
$\dfrac{9}{35} \pm 2.576 \sqrt{\dfrac{\frac{9}{35}\left(1 - \frac{9}{35}\right)}{35}}$

Example 2. Margin of error = m = 0.10 / 2 = 0.05.
$n = \left( \dfrac{1.96}{0.05} \right)^2 \times 0.5 \times 0.5 = 384.16 \approx 385$

Example 3. $H_0: p = 0.50$ vs. $H_a: p > 0.50$. Given $\alpha$ = 5%, the rejection region is: Reject $H_0$ if z > 1.645. The sample proportion is $\hat{p} = 550/1000 = 0.55$ and the z statistic
$z = \dfrac{0.55 - 0.5}{\sqrt{0.5 \times 0.5 / 1000}} = 3.16$
is in the rejection region. Hence, reject $H_0$. The p-value = P( z > 3.16 ) = 1 – 0.9992 = 0.0008 < $\alpha$. Again, reject $H_0$.
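The rejection-region and p-value steps used in Example 3 can be sketched as follows. This is a minimal illustration, assuming an upper-tailed test and the large-sample normal approximation; the function name `one_sided_z_test` is hypothetical and not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(successes, n, p0, alpha=0.05):
    """Upper-tailed z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # The standard error uses p0, the value assumed under H0.
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 5%
    p_value = 1.0 - NormalDist().cdf(z)
    return z, z_crit, p_value, z > z_crit


# Example 3: 550 of 1000 Democrats favor Candidate A, H0: p = 0.50.
z, z_crit, p_value, reject = one_sided_z_test(550, 1000, 0.50)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Both approaches always give the same decision: z exceeds the critical value exactly when the p-value is below $\alpha$.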
Answer Key to Examples 4-5

Example 4. $H_0: p = 0.70$ vs. $H_a: p > 0.70$. Given $\alpha$ = 5%, the rejection region is: Reject $H_0$ if z > 1.645. The sample proportion is $\hat{p} = 153/200 = 0.765$ and the z statistic
$z = \dfrac{0.765 - 0.7}{\sqrt{0.7 \times 0.3 / 200}} = 2.00$
is in the rejection region. Hence, reject $H_0$. The p-value = P( z > 2.00 ) = 1 – 0.9772 = 0.0228 < $\alpha$. Again, reject $H_0$.

Example 5. $H_0: \mu = 0.12$ vs. $H_a: \mu > 0.12$. Given a small sample with unknown population standard deviation, we use the T table (Table D) to set up the rejection region. With $\alpha$ = 5% and degrees of freedom n – 1 = 14, the rejection region is: Reject $H_0$ if t > 1.761. The t statistic
$t = \dfrac{0.129 - 0.12}{0.03 / \sqrt{15}} = 1.16$
is outside the rejection region and thus, we fail to reject $H_0$. By using Table D, we found that we would have rejected $H_0$ if $\alpha$ = 15% but would not if $\alpha$ = 10% or less. This implies that the p-value of the test is between 10% and 15%.
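The table-lookup reasoning in Example 5 can be sketched in Python. This is a minimal illustration, not the course's method of record: the helper names are hypothetical, the critical value 1.761 is the one quoted in the answer key, and 1.345 and 1.076 are standard upper-tail t-table values for 14 degrees of freedom that are assumed here because the slides do not list them.

```python
from math import sqrt

# Upper-tail critical values for 14 degrees of freedom, keyed by alpha.
# 1.761 is quoted in the answer key; 1.345 and 1.076 are assumed from a
# standard t table.
T_TABLE_DF14 = {0.15: 1.076, 0.10: 1.345, 0.05: 1.761}


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0."""
    return (sample_mean - mu0) / (s / sqrt(n))


def p_value_bracket(t, table):
    """Bound the upper-tail p-value using only tabulated critical values."""
    rejected = [a for a, crit in table.items() if t > crit]
    kept = [a for a, crit in table.items() if t <= crit]
    lower = max(kept) if kept else None         # largest alpha that fails to reject
    upper = min(rejected) if rejected else None  # smallest alpha that rejects
    return lower, upper


# Example 5: n = 15, sample mean 12.9%, sample standard deviation 3%, H0: mu = 12%.
t = t_statistic(0.129, 0.12, 0.03, 15)
reject_at_5 = t > T_TABLE_DF14[0.05]
lower, upper = p_value_bracket(t, T_TABLE_DF14)
print(f"t = {t:.2f}, reject at 5%: {reject_at_5}, "
      f"p-value between {lower} and {upper}")
```

A printed table can only bracket the p-value between two tabulated levels; an exact value would need the t distribution's CDF, which the Python standard library does not provide.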