Exam 2: A learning experience….

Scores
• Raw scores went from 68 to 147
• As percentage of total: 40% to 86%
• Scaled scores went from 60.5 to 100
• Some still left to be graded…

Range   90s  80s  70s  60s  TOTAL
Count     8   23   23    5     59

Question by Question

Question  count  min    max   avg     s      avail
1         59     4      20    13.7    4.5    20
2         59     2      20    16.4    4.1    20
3         59     2      20    12.9    4.4    20
4         59     4      34    23.2    7.5    40
5         59     4      18    10.5    4.1    20
6         59     10     20    16.0    1.7    20
7         59     1      15    8.9     3.0    15
8         59     2      10    4.3     2.7    10
9         59     0      2     0.8     0.6    5
raw       59     68     147   106.7   17.3   170
scaled    59     60.5   100   79.8    8.7    100

Data for Q1 to Q3 (n=60)

Car  Class          Displacement  Fuel Type      Hwy MPG
     (categorical)  (numerical)   (categorical)  (numerical)
1    Midsize        3.5           R              28
2    Midsize        3             R              26
3    Large          3             P              26
4    Large          3.5           P              25
.    .              .             .              .
58   Compact        6             P              20
59   Midsize        2.5           R              30
60   Midsize        2             R              32

Q1
• Expect that the size of the car engine (measured by displacement) would change based on car class (compact, midsize, large)
• H0: MU(compact) = MU(mid) = MU(large)
• Ha: not all equal
• ANOVA single factor (3 samples)
• Unstack the data, Excel data analysis

Q2
• Expect to see a relationship between car class and recommended fuel type
• Relationship between two categorical variables (car class and fuel type)
• Chi-sq independence test: 3x2 contingency table of counts, summing to 60

        Compact  Large  Midsize  Total
P       16       11     9        36
R       3        5      16       24
Total   19       16     25       60

Q3. Fuel type and mpg
• Expect that because premium gasoline is higher quality, cars for which it is recommended will get higher gas mileage (on average) than cars for which regular fuel is recommended
• H0: MU(prem) = MU(reg)
• Ha: MU(prem) > MU(reg)
• Unstack, t-test two sample

t-Test: Two-Sample Assuming Equal Variances

                              P          R
Mean                          24.33333   27.70833   (R got the higher sample mean)
Variance                      12.4       9.519928
Observations                  36         24
Pooled Variance               11.2579
Hypothesized Mean Difference  0
df                            58
t Stat                        -3.81704
P(T<=t) one-tail              0.000165   (the wrong p-value)
1 - P(T<=t) one-tail          0.999835   (the correct p-value)
t Critical one-tail           1.671553
P(T<=t) two-tail              0.000331
t Critical two-tail           2.001717

• NOTE: We guessed the wrong tail.
• Do not reject H0 in favor of THIS Ha.

Q4a Aspirin and Heart Attack
• Relationship between two 0/1 variables.
• 2x2 contingency table from the facts in the question (like lights and myopia).
• Chi-sq independence test for a two-tailed alternative.
• Halve the p-value if you want a one-tailed alternative (Paspirin < Pplacebo)

                 Aspirin  Placebo  Total
Heart Attack     104      189      293
No Heart Attack  10896    10811    21707
Total            11000    11000    22000

Q4b. How many heart attacks using new design (given Ps)
• It is easy to calculate the mean (most likely) of 250.5.
• Tell me that the actual number is a random variable
• Provide a probability distribution for that random variable
• Variance of each group's count: n * p * (1 − p)
• Normal approx to binomial

                                  Number of Heart Attacks
Group    Number  Probability      mean    variance  std dev
aspirin  16500   0.009454545      156     154.53    12.4
placebo  5500    0.017181818      94.5    92.88     9.6
TOTAL    22000                    250.5   247.40    15.7

Q4c. Will new design affect p-value?
• Yes. We will be more certain about Aspirin's effect and LESS certain about Placebo's effect.
• The test is focused on the difference.
• The gain in accuracy for aspirin is not as great as the loss in accuracy for placebo (diminishing returns)
• Our test will be less powerful.
• P-value will go up.
• 50/50 (best design) v 75/25 v 100/0 (worst design)

Q5. Is Di significantly better than El?
• Not about whether P = 0.5
• About whether P(di) = P(el)
• 2x2 chi-squared independence test

Observed
       Di  El  Total
In     10  5   15
Out    10  17  27
Total  20  22  42

Expected
       Di    El
In     7.1   7.9
Out    12.9  14.1

Distances
       Di        El
In     1.142857  1.038961
Out    0.634921  0.577201

Calculated chi-squared  3.393939
p-value                 0.065436  (2-tailed p-value)
p-value/2               0.032718  (1-tailed p-value)

Q6. Rportfolio
• Rportfolio = (R1+R2+R3)/3
• R1, R2, R3 will not be independent; the variance below is computed as if they were.

Stock       Mean Return  Variance     Standard Deviation
1           0.1          0.01         0.1
2           0.05         0.0016       0.04
3           0.2          0.16         0.4
TOTAL       0.35         0.1716       0.414    (variance: sum of variances, independent)
Rportfolio  0.117        0.019066667  0.138    (std dev: 0.414/3)

Q7.
Total (Avg) weight of n=20
• Mean = 20*μ
• Variance = 20*σ²
• Normal (sum of normals)

          One guest  Total of 20 guests
Mean      150        3000
Variance  1600       32000
Std Dev   40         178.8854

Pr(total < 3500) = NORMDIST(3500, 3000, 178.9, TRUE) = 0.9974

Family hotel means….. Weights in elevator are not independent. More likely to be under 3500.

Q8. Al and Bo
• Neither knows σ
• Both get the same statistic (X − μ)/s
• Al uses t.dist, Bo uses normdist
• The t correctly reflects extra uncertainty, giving Al a higher p-value
• Bo's cheating is rewarded with a lower p-value.

Q9
• If students don't cheat, then their IQs are independent, identically distributed N(100,15)
• The null hypothesis (mean men = mean women) IS TRUE!!!
• When H0 is true, and we do any test correctly, we reject with probability 0.05.
• We will reject H0 with probability 0.05 and fail to reject with probability 0.95
• What will happen under H0 is "easy"
• What will happen under Ha is very difficult…
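
The Q9 claim can be checked by simulation. The sketch below is not part of the exam solution. It assumes, hypothetically, groups of 30 men and 30 women (the slides give no group sizes), reads the 15 in N(100,15) as the standard deviation, and uses a two-sided z-test with known σ as one example of a correctly run test at the 0.05 level.

```python
import math
import random
from statistics import NormalDist, mean

ALPHA = 0.05
MU, SIGMA = 100.0, 15.0   # N(100, 15) from the slide; 15 is assumed to be the std dev
N_MEN = N_WOMEN = 30      # hypothetical group sizes, not given in the source


def rejects_h0(rng: random.Random) -> bool:
    """Draw both groups from the same distribution and run a two-sided z-test."""
    men = [rng.gauss(MU, SIGMA) for _ in range(N_MEN)]
    women = [rng.gauss(MU, SIGMA) for _ in range(N_WOMEN)]
    # Standard error of the difference in sample means when sigma is known
    std_err = SIGMA * math.sqrt(1 / N_MEN + 1 / N_WOMEN)
    z = (mean(men) - mean(women)) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < ALPHA


def rejection_rate(trials: int = 20_000, seed: int = 1) -> float:
    """Share of simulated exams in which H0 is rejected although it is true."""
    rng = random.Random(seed)
    return sum(rejects_h0(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"share of samples rejecting H0: {rejection_rate():.3f}")
```

Because both groups are drawn from the same distribution, H0 is true in every trial, so the printed share should land close to 0.05, up to simulation noise.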