STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES
Recitation #10 Answer Key
PROBABILITY, HYPOTHESIS TESTING, CONFIDENCE INTERVALS

Hypothesis tests

2 When a recent GSS asked, “Would you be willing to pay much higher taxes in order to protect the environment?”, 369 people answered yes and 483 answered no. Software shows the following results to analyze whether a majority or minority of Americans would answer yes:

Test of proportion = 0.5
N     Sample prop   95% CI           z-value   p-value
852   0.4331        (0.400, 0.466)   -3.91     0.000

a) Specify the hypotheses that are being tested.
b) Report and interpret the test statistic value.
c) Explain an advantage of the confidence interval shown over the significance test.

a) H0: π = 0.5, Ha: π ≠ 0.5, where π is the population proportion who would answer yes.
b) z = -3.91, so the sample proportion falls 3.91 standard errors below the null value of 0.5. Note that since df > 100, we can use z values because we treat this case as if df = ∞. The absolute value of the test statistic is greater than 1.96, which means that we can reject the null hypothesis at the 0.05 level.
c) A test merely indicates whether the particular value in H0 is plausible. It does not tell us which other potential values are plausible. The confidence interval shows the entire set of believable values. It shows the extent to which H0 may be false by showing whether the values in the interval are far from the H0 value.

3 A multiple-choice test question has four possible responses. The question is difficult, with none of the four responses being obviously wrong, yet with only one correct answer. It first occurs on an exam taken by 400 students. Only 125 answer it correctly. Test whether more people answer the question correctly than would be expected just due to chance using the p-value method.

H0: π = 0.25, Ha: π > 0.25. The alternative is one-sided because the question asks whether more students answer correctly than guessing would predict. The sample proportion is 125/400 = 0.3125.

\[ se_0 = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} = \sqrt{\frac{0.25 \times 0.75}{400}} \approx 0.02165 \]

\[ z = \frac{\hat{\pi} - \pi_0}{se_0} = \frac{0.3125 - 0.25}{0.02165} \approx 2.89 \]

The one-sided p-value is P(Z > 2.89) ≈ 0.002. Because df > 100 we use z values, and 2.89 also exceeds the critical value z0.025 = 1.96 used at the 95% confidence level. Since the p-value is well below 0.05, we reject the null hypothesis: more students answer the question correctly than would be expected just due to chance.

4 In the figure below, one interval does not contain 94.52 minutes. Does this imply that the mean cannot be 94.52?

Just as some observations occur more than 2 standard deviations from the mean, some point estimates will be more than 2 standard errors from the parameter. A confidence interval only provides a plausible range of values for a parameter. While we might say other values are implausible based on the data, this does not mean they are impossible.

5 The nutrition label on a bag of potato chips says that a 1 ounce (28 gram) serving of potato chips has 130 calories and contains ten grams of fat, with three grams of saturated fat. A random sample of 35 bags yielded a sample mean of 134 calories with a standard deviation of 17 calories. Is there evidence that the nutrition label does not provide an accurate measure of calories in the bags of potato chips? We have verified that the independence, sample size, and skew conditions are satisfied.

H0: μ = 130. Ha: μ ≠ 130.

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{134 - 130}{17 / \sqrt{35}} = 1.39 \]

Here n = 35 < 100, so we look for the t0.025 value for df = 34 in the t table. The table has no row for df = 34, so we take the closest smaller value listed, which is df = 30. The t score associated with that is 2.042 (remember, if you have df = 99 and your table only lists df = 80 and df = 100, ALWAYS take the t value for df = 80, which is the more conservative choice). Since |1.39| < 2.042, we fail to reject the null hypothesis. The data do not provide convincing evidence that the true average calorie content in bags of potato chips is different from 130 calories.

Type I and Type II errors

6 Suppose that only 10% of the time a true effect exists in medical studies. Suppose also that when an effect truly exists, there’s a 50% chance of making a Type II error and failing to detect it. Assuming these rates, could a substantial percentage of medical ‘discoveries’ actually be Type I errors?

See example 6.8 on page 165 from your book.

7 In making a decision in a test, a researcher worries about the possibility of rejecting the null hypothesis when it is actually true. Explain how to control the probability of this type of error.

By using a higher level of confidence, that is, a smaller significance level α. For instance, if he is using a 90% confidence level (α = 0.10), he should use a 95% or 99% confidence level instead, so that the confidence interval becomes wider and the probability of rejecting the null hypothesis when it is actually true becomes smaller. This comes at the cost of a higher probability of a Type II error, which occurs when we fail to reject the null hypothesis when it is actually false.

Probability

8 Below are four versions of the same game. Someone else gets to pick the version of the game, and then you get to choose how many times to flip a coin: 10 times or 100 times. Identify how many coin flips you should choose for each version of the game. It costs $1 to play each game. Explain your reasoning.
a) If the proportion of heads is larger than 0.60, you win $1.
b) If the proportion of heads is larger than 0.40, you win $1.
c) If the proportion of heads is between 0.40 and 0.60, you win $1.
d) If the proportion of heads is smaller than 0.30, you win $1.

a) 10 tosses. Fewer tosses mean more variability in the sample fraction of heads, meaning there’s a better chance of getting more than 60% heads.
b) 100 tosses. More flips mean the observed proportion of heads would often be closer to the average, 0.50, and therefore also above 0.40.
c) 100 tosses. With more flips, the observed proportion of heads would often be closer to the average, 0.50.
d) 10 tosses. Fewer flips would increase variability in the fraction of tosses that are heads, making a proportion below 0.30 more likely.

9 A 2012 Pew Research survey asked 2,373 randomly sampled registered voters their political affiliation (Republican, Democrat, or Independent) and whether or not they identify as swing voters. 35% of respondents identified as Independent, 23% identified as swing voters, and 11% identified as both.
a) Are being Independent and being a swing voter disjoint, i.e. mutually exclusive?
b) Draw a Venn diagram summarizing the variables and their associated probabilities.
c) What percent of voters are Independent but not swing voters?
d) What percent of voters are Independent or swing voters?
e) What percent of voters are neither Independent nor swing voters?
f) Is the event that someone is a swing voter independent of the event that someone is a political Independent?

a) No, there are voters who are both politically Independent and also swing voters.
b) Venn diagram below:
c) 0.35 - 0.11 = 0.24, i.e. 24%.
d) Add up the corresponding disjoint sections in the Venn diagram: 0.24 + 0.11 + 0.12 = 0.47. Alternatively, use the General Addition Rule: 0.35 + 0.23 - 0.11 = 0.47.
e) 1 - 0.47 = 0.53.
f) P(Independent) × P(swing) = 0.35 × 0.23 = 0.08, which does not equal P(Independent and swing) = 0.11, so the events are dependent.

12 The smallpox data set provides a sample of 6,224 individuals from the year 1721 who were exposed to smallpox in Boston.
Doctors at the time believed that inoculation, which involves exposing a person to the disease in a controlled form, could reduce the likelihood of death. Each case represents one person with two variables: inoculated and result. The variable inoculated takes two levels: yes or no, indicating whether the person was inoculated or not. The variable result has outcomes lived or died.

Write out, in formal notation, the probability that a randomly selected person who was not inoculated died from smallpox, and find this probability.

\[ P(\text{result = died} \mid \text{inoculated = no}) = \frac{P(\text{result = died and inoculated = no})}{P(\text{inoculated = no})} = \frac{0.1356}{0.9608} = 0.1411 \]

Determine the probability that an inoculated person died from smallpox.

\[ P(\text{result = died} \mid \text{inoculated = yes}) = \frac{P(\text{result = died and inoculated = yes})}{P(\text{inoculated = yes})} = \frac{0.0010}{0.0392} = 0.0255 \]
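The two smallpox answers both apply the definition P(A | B) = P(A and B) / P(B). The short Python sketch below shows that arithmetic. The helper name `conditional_probability` is hypothetical, and the four input proportions are assumptions: they are rounded joint and marginal proportions chosen because they reproduce the stated answers of 0.1411 and 0.0255, not values confirmed against the data set itself.

```python
def conditional_probability(p_joint: float, p_condition: float) -> float:
    """Return P(A | B) = P(A and B) / P(B)."""
    if not 0 < p_condition <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    if not 0 <= p_joint <= p_condition:
        raise ValueError("P(A and B) must lie between 0 and P(B)")
    return p_joint / p_condition


# Assumed rounded proportions (illustrative, not read from the raw data).
p_died_and_not_inoculated = 0.1356
p_not_inoculated = 0.9608
p_died_and_inoculated = 0.0010
p_inoculated = 0.0392

p_died_given_no = conditional_probability(p_died_and_not_inoculated, p_not_inoculated)
p_died_given_yes = conditional_probability(p_died_and_inoculated, p_inoculated)

print(f"P(died | not inoculated) = {p_died_given_no:.4f}")
print(f"P(died | inoculated)     = {p_died_given_yes:.4f}")
```

Under these assumed inputs, the death rate among people who were not inoculated comes out several times higher than among those who were, which is the comparison the two questions are building toward.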