Inference for Proportions

One-Sample Proportions: Confidence Intervals

Rate your confidence (0 to 100):
• Name my age within 10 years?
• Within 5 years?
• Within 1 year?
• Shooting a basketball at a wading pool, will you make the basket?
• Shooting the ball at a large trash can, will you make the basket?
• Shooting the ball at a carnival, will you make the basket?

What happens to your confidence as the interval gets smaller? The larger your confidence, the wider the interval.

Point estimate
• Uses a single statistic based on sample data to estimate a population parameter
• Simplest approach
• But not always very precise, due to variation in the sampling distribution

Confidence intervals
• Are used to estimate the unknown population parameter
• Formula: estimate ± margin of error

Margin of error
• Shows how accurate we believe our estimate is
• The smaller the margin of error, the more precise our estimate of the true parameter
• Formula:
$$E = (\text{critical value}) \times (\text{standard deviation of the statistic})$$

Assumptions:
• SRS
• Normal distribution: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
• Population is at least $10n$

Formula for confidence interval: statistic ± (critical value)(SD of statistic), where the critical value z* comes from the normal curve:
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Note: For confidence intervals, we DO NOT know p, so we MUST substitute p̂ for p, both in the SD and when checking assumptions.

Critical value (z*)
• Found from the confidence level C
• The upper z-score with probability (1 − C)/2 lying to its right under the standard normal curve

Confidence level   Tail area   z*
90%                .05         1.645
95%                .025        1.96
99%                .005        2.576

Confidence level
• Is the success rate of the method used to construct the interval
• Using this method, ____% of the time the intervals constructed will contain the true population parameter

What does it mean to be 95% confident?
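The critical values in the table and the "success rate of the method" idea can both be checked numerically. The sketch below is an illustration added to these notes, not part of the original lesson: it derives z* from the standard normal curve with Python's `statistics.NormalDist`, then draws many simulated samples from a population whose true proportion is assumed to be p = .38 (borrowed from the ghost poll purely for illustration) and counts how often the interval p̂ ± z*·SD captures p. The function names, seed, and number of repetitions are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def z_star(confidence):
    """Upper critical value: the z with (1 - confidence)/2 of the area to its right."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


def coverage(p, n, confidence, reps, seed=0):
    """Fraction of simulated intervals p_hat +/- z* * SD that contain the true p."""
    rng = random.Random(seed)
    z = z_star(confidence)
    hits = 0
    for _ in range(reps):
        # One simulated SRS: count successes among n independent trials.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # p is "unknown", so use p_hat
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / reps


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")

# Hypothetical population with p = .38 and samples of n = 1012, as in the ghost poll.
print(coverage(p=0.38, n=1012, confidence=0.95, reps=2000, seed=1))
```

Any single interval either contains p or it does not; the fraction printed at the end should land near 0.95, which is exactly what the confidence level describes.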
Candidate answers:
• "There is a 95% chance that p is contained in the confidence interval." (Incorrect: once an interval is computed, p is either in it or not.)
• "The probability that the interval contains p is 95%." (Incorrect, for the same reason.)
• "The method used to construct the interval will produce intervals that contain p 95% of the time." (Correct.)

Example: A May 2000 Gallup Poll found that 38% of a random sample of 1012 adults said that they believe in ghosts. Find a 95% confidence interval for the true proportion of adults who believe in ghosts.

Step 1: Check assumptions!
• Have an SRS of adults
• $n\hat{p} = 1012(.38) = 384.56$ and $n(1-\hat{p}) = 1012(.62) = 627.44$. Since both are greater than 10, the distribution can be approximated by a normal curve.
• Population of adults is at least 10(1012) = 10,120.

Step 2: Make calculations
$$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .38 \pm 1.96\sqrt{\frac{.38(.62)}{1012}} = (.35, .41)$$

Step 3: Conclusion in context
We are 95% confident that the true proportion of adults who believe in ghosts is between 35% and 41%.

To find sample size:
Another Gallup Poll is taken in order to measure the proportion of adults who approve of attempts to clone humans. What sample size is necessary to be within ±0.04 of the true proportion of adults who approve of attempts to clone humans with a 95% confidence interval?
$$E = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
However, since we have not yet taken a sample, we do not know a p̂ (or p) to use!

What p̂ (p) do you use when trying to find the sample size for a given margin of error?
.1(.9) = .09
.2(.8) = .16
.3(.7) = .21
.4(.6) = .24
.5(.5) = .25
By using .5 for p̂, we are using the worst-case scenario: p̂(1 − p̂) is largest at .5, so we use the largest SD in our calculations.
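A minimal Python sketch of the two calculations in this part: the one-proportion interval p̂ ± z*√(p̂(1 − p̂)/n) applied to the ghost poll, and the sample-size formula n = p̂(1 − p̂)(z*/E)², which comes from solving E = z*√(p̂(1 − p̂)/n) for n. The function names are hypothetical, and z* is passed in from the table so the results match the hand calculations.

```python
import math


def one_prop_ci(p_hat, n, z_star):
    """Confidence interval p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size(margin, z_star, p_hat=0.5):
    """Smallest n with z* * sqrt(p_hat(1 - p_hat)/n) <= margin.

    p_hat defaults to .5, the worst case that maximizes p_hat(1 - p_hat).
    """
    n = p_hat * (1 - p_hat) * (z_star / margin) ** 2
    return math.ceil(n)  # always round UP on sample size


low, high = one_prop_ci(0.38, 1012, 1.96)
print(f"Ghost poll: ({low:.2f}, {high:.2f})")        # prints Ghost poll: (0.35, 0.41)
print("Cloning poll n =", sample_size(0.04, 1.96))  # prints Cloning poll n = 601
```

Rounding up with `math.ceil` matters: 600 respondents would give a margin of error slightly larger than 0.04.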
$$E = z^*\sqrt{\frac{p(1-p)}{n}}$$
Use p̂ = .5:
$$.04 = 1.96\sqrt{\frac{.5(.5)}{n}}$$
Divide by 1.96:
$$\frac{.04}{1.96} = \sqrt{\frac{.25}{n}}$$
Square both sides:
$$\left(\frac{.04}{1.96}\right)^2 = \frac{.25}{n} \quad\Rightarrow\quad n = 600.25$$
Round up on sample size: n = 601

Hypothesis Tests: One-Sample Proportions

Example 1: Julie and Megan wonder if heads and tails are equally likely if a penny is spun. They spin pennies 40 times and get 17 heads. Should they reject the standard that pennies land heads 50% of the time?
• What is their sample proportion? p̂ = 17/40 = .425
• How can I tell if pennies really land heads 50% of the time? A hypothesis test will help me decide!
• But how do I know if this p̂ is one that I expect to happen, or is it one that is unlikely to happen?

What are hypothesis tests?
Calculations that tell us whether a value occurs by random chance or not, that is, whether it is statistically significant. Is it . . .
• a random occurrence due to variation?
• a biased occurrence due to some other reason?

Nature of hypothesis tests
• First, begin by supposing the "effect" is NOT present
• Next, see if the data provide evidence against the supposition

Example: murder trial. How does a murder trial work?
• First, assume that the person is innocent
• Then, there must be sufficient evidence to prove guilt

Steps (notice the steps are the same as for confidence intervals, except we add hypothesis statements, which you will learn today):
1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context

Assumptions for the z-test:
• Have an SRS from a binomial distribution
• Distribution is (approximately) normal: np ≥ 10 and n(1 − p) ≥ 10
• Population size N ≥ 10n
Are these the same assumptions as for confidence intervals? YES! But use the hypothesized parameter in the null hypothesis to check assumptions!

Example 1 (Julie and Megan, 17 heads in 40 spins): Are the assumptions met?
• Binomial random sample (each spin is a trial)
• 40(.5) = 20 ≥ 10 and 40(1 − .5) = 20 ≥ 10
• There is an infinite number of possible spins, which is more than 10(40) = 400

Writing hypothesis statements:
• Null hypothesis (H0): the statement being tested; this is a statement of "no effect" or "no difference"
• Alternative hypothesis (Ha): the statement that we suspect is true

The form:
Null hypothesis
H0: parameter = hypothesized value
Alternative hypothesis (one of)
Ha: parameter ≠ hypothesized value
Ha: parameter > hypothesized value
Ha: parameter < hypothesized value

Example 1 contd.: Julie and Megan wonder if heads and tails are equally likely if a penny is spun. They spin pennies 40 times and get 17 heads. Should they reject the standard that pennies land heads 50% of the time?
State the hypotheses:
H0: p = .5
Ha: p ≠ .5
where p is the true proportion of heads when a penny is spun.

Example 2: A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company's product. The radio station conducts a random sample of 400 people and finds that 90 have heard the ad and recognize the product. Is this sufficient evidence for the company to renew its contract?
State the hypotheses:
H0: p = .2
Ha: p > .2
where p is the true proportion of residents who heard the ad.

Formula for hypothesis test: test statistic = (statistic − parameter) / (SD of statistic)
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
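The test statistic above can be turned into a small function. This is a sketch of one way to code it, using Python's `statistics.NormalDist` for normal-curve areas; the function name and the `alternative` parameter are hypothetical and simply select the tail that matches Ha (doubling the one-tail area for ≠). It runs Julie and Megan's test, Joe and Jeff's 10 heads in 40 spins, and the radio-station test of Example 2.

```python
import math
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """One-sample z-test for a proportion; returns (z, p-value).

    The SD uses the hypothesized p0 from H0, not p_hat.
    alternative: "two-sided" (Ha: p != p0), "greater" (Ha: p > p0), "less" (Ha: p < p0).
    """
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))  # double the one-tail area
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return z, p_value


for label, x, n, p0, alt in [
    ("Julie & Megan", 17, 40, 0.5, "two-sided"),
    ("Joe & Jeff", 10, 40, 0.5, "two-sided"),
    ("Radio station", 90, 400, 0.2, "greater"),
]:
    z, p = one_prop_z_test(x, n, p0, alt)
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, p-value = {p:.4f} -> {decision} at alpha = .05")
```

Because the code keeps z unrounded, Julie and Megan's P-value can differ from the table-based .342 in the third or fourth decimal place.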
Test statistic for Julie and Megan's data: test statistic = (statistic − parameter) / (SD of statistic)
$$z = \frac{.425 - .5}{\sqrt{\frac{.5(1-.5)}{40}}} = -0.95$$

P-values
• The probability, computed assuming H0 is true, that the test statistic would take a value as extreme as, or more extreme than, what is actually observed

Level of significance
• Is the amount of evidence necessary before we begin to doubt that the null hypothesis is true
• Is the probability that we will reject the null hypothesis, assuming that it is true
• Denoted by α
  – Can be any value
  – Usual values: 0.1, 0.05, 0.01
  – Most common is 0.05

Statistically significant:
• The p-value is as small as or smaller than the level of significance (α)
• If p > α, "fail to reject" the null hypothesis at the α level.
• If p < α, "reject" the null hypothesis at the α level.

Facts about p-values:
• ALWAYS make the decision about the null hypothesis!
• Large p-values show support for the null hypothesis, but never that it is true!
• Small p-values show support that the null is not true.
• Double the p-value for two-tail (≠) tests
• Never "accept" the null hypothesis!

At an α level of .05, would you reject or fail to reject H0 for the given p-values?
a) .03: Reject
b) .15: Fail to reject
c) .45: Fail to reject
d) .023: Reject

Writing conclusions:
1) A statement of the decision being made (reject or fail to reject H0) & why (linkage), AND
2) A statement of the results in context (state in terms of Ha).
"Since the p-value < (>) α, I reject (fail to reject) H0. I do (do not) have statistically significant evidence to suggest that Ha."
Be sure to write Ha in context (words)!

Example 1 contd.: The decision
$$z = \frac{.425 - .5}{\sqrt{\frac{.5(1-.5)}{40}}} = -0.95, \qquad \text{P-value} = .342$$
Compare the P-value to the alpha level: .342 > .05
Since the P-value is greater than the alpha level, I fail to reject H0 that spinning a penny lands heads 50% of the time.
I do not have statistically significant evidence to suggest that the true proportion of heads when a penny is spun is different from .5.

"What? You and Jeff spun your pennies and got 10 heads out of 40 spins? Well, that's not what Meg and I got. So what now?"

You decide: Joe and Jeff decide to test the same hypothesis but gather their own evidence. They spin pennies 40 times and get 10 heads. Should they reject the standard that pennies land heads 50% of the time?

Julie and Megan: "We DID NOT reject!" Joe and Jeff: "But we DID reject!"
Who is correct? BOTH OF THEM!!! Conclusions are based on your data. It is important, however, to discuss possible ERRORS that could have been made.

Errors in hypothesis tests
Every time you make a decision, there is a possibility that an error occurred.

Murder trial revisited:
Decision                          H0 true (actually innocent)   H0 false (actually guilty)
Reject H0 (guilty)                Type I error                  Correct
Fail to reject H0 (not guilty)    Correct                       Type II error

Type I error: rejecting the null hypothesis when it is actually true. Its probability is alpha (α), the level of significance of the test.
Type II error: failing to reject the null hypothesis when it is false. Its probability is denoted by beta (β).

Example 2 revisited: A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company's product. The radio station conducts a random sample of 400 people and finds that 90 have heard the ad and recognize the product. Is this sufficient evidence for the company to renew its contract?
Assumptions:
• Have an SRS of people
• np = 400(.2) = 80 and n(1 − p) = 400(.8) = 320. Since both are greater than 10, this distribution is approximately normal.
• Population of people is at least 10(400) = 4000.
Use the parameter in the null hypothesis to check assumptions!
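The Step 1 checks can be written as a short helper. This is a hypothetical sketch, not a standard library routine: it encodes the large-counts condition (np ≥ 10 and n(1 − p) ≥ 10, with p taken from H0 for a test or p̂ for an interval) and reports the smallest population size that satisfies N ≥ 10n. It cannot check the SRS condition; that still has to come from how the data were collected.

```python
def check_one_prop_conditions(n, p, min_count=10):
    """Large-counts and 10% checks for one-proportion inference.

    For a test, pass the p from H0; for an interval, pass p_hat instead.
    Returns the expected successes and failures, whether both reach
    min_count, and the smallest population size with N >= 10n.
    The SRS condition cannot be checked numerically.
    """
    successes = n * p
    failures = n * (1 - p)
    return {
        "successes": successes,
        "failures": failures,
        "normal_ok": successes >= min_count and failures >= min_count,
        "min_population": 10 * n,
    }


# Example 2 revisited: n = 400 people, H0: p = .2
print(check_one_prop_conditions(400, 0.2))

# Example 1: n = 40 spins, H0: p = .5
print(check_one_prop_conditions(40, 0.5))
```

The two-sample slides use 5 rather than 10 as the large-counts cutoff; passing `min_count=5` matches that convention.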
H0: p = .2
Ha: p > .2
where p is the true proportion of people who heard the ad.
With p̂ = 90/400 = .225:
$$z = \frac{.225 - .2}{\sqrt{\frac{.2(.8)}{400}}} = 1.25, \qquad \text{p-value} = .1056, \qquad \alpha = .05$$
Use the parameter in the null hypothesis to calculate the standard deviation!
Since the p-value > α, I fail to reject the null hypothesis. There is not sufficient evidence to suggest that the true proportion of people who heard the ad is greater than .2.
What type of error could the radio station have made: Type I or Type II? Because the station failed to reject H0, the only possible error is a Type II error (failing to reject H0 when the true proportion really is greater than .2).

Two-Sample Proportions Inference

Sampling distributions for the difference in proportions
When tossing pennies, the probability of the coin landing on heads is 0.5. However, when spinning the coin, the probability of the coin landing on heads is 0.4. Let's investigate, using samples of n1 = n2 = 25 (the sizes implied by n1p1 = 12.5 and n2p2 = 10). Looking at the sampling distribution of the difference in sample proportions:
• What is the mean of the difference in sample proportions (flip − spin)? $\mu_{\hat{p}_f - \hat{p}_s} = 0.5 - 0.4 = 0.1$
• What is the standard deviation of the difference in sample proportions (flip − spin)? $\sigma_{\hat{p}_f - \hat{p}_s} = \sqrt{\frac{.5(.5)}{25} + \frac{.4(.6)}{25}} = 0.14$
• Can the sampling distribution of the difference in sample proportions (flip − spin) be approximated by a normal distribution? Yes, since n1p1 = 12.5, n1(1 − p1) = 12.5, n2p2 = 10, n2(1 − p2) = 15, so all are at least 5.

Assumptions:
• Two independent SRSs from populations
• Populations at least 10n
• Normal approximation for both: n1p1 ≥ 5, n1(1 − p1) ≥ 5, n2p2 ≥ 5, n2(1 − p2) ≥ 5

Formula for confidence interval: statistic ± (critical value)(SD of statistic)
$$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
The whole term after ± is the margin of error; the square root is the standard error.
Note: use p̂ when p is not known.

Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment.

What is the shape & standard error of the sampling distribution of the difference in the proportions of people with visible scars between the two groups?
Since n1p̂1 = 259, n1(1 − p̂1) = 57, n2p̂2 = 94, n2(1 − p̂2) = 325, and all > 5, the distribution of the difference in proportions is approximately normal.
$$SE = \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} \approx 0.0296$$

What is a 95% confidence interval of the difference in proportion of people who had no visible scars between the plasma compress treatment & control group?
Assumptions:
• Have 2 independent SRSs of burn patients
• Both distributions are approximately normal since n1p̂1 = 259, n1(1 − p̂1) = 57, n2p̂2 = 94, n2(1 − p̂2) = 325, and all > 5
• Population of burn patients is at least 10(316 + 419) = 7350. (Since these are all burn patients, we can add 316 + 419 = 735. If the populations are not the same, you MUST list them separately.)

Using the unrounded proportions 259/316 ≈ .8196 and 94/419 ≈ .2243:
$$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = .5953 \pm 1.96\sqrt{\frac{.8196(.1804)}{316} + \frac{.2243(.7757)}{419}} \approx (.537, .654)$$
We are 95% confident that the true difference in the proportion of people who had no visible scars between the plasma compress treatment & control group is between 53.7% and 65.4%.

Example 2: Suppose that researchers want to estimate the difference in proportions of people who are against the death penalty in Texas & in California. If the two sample sizes are the same, what size sample is needed to be within 2% of the true difference at 90% confidence?
Since both n's are the same size, you have common denominators, so add!
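A minimal Python sketch of the two-sample calculations in this part, under the same assumptions as the slides (unpooled p̂'s for intervals, p̂ = .5 as the worst case when planning). The function names are hypothetical. It reproduces the flip-versus-spin SD, the burn-center interval, and the equal-sample-size question, whose formula comes from solving E = z*√(p1(1 − p1)/n + p2(1 − p2)/n) for n.

```python
import math


def se_diff(p1, n1, p2, n2):
    """SD of p1_hat - p2_hat (a standard error when p-hats are plugged in)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def two_prop_ci(x1, n1, x2, n2, z_star):
    """(p1_hat - p2_hat) +/- z* * SE, using unpooled p-hats."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    margin = z_star * se_diff(p1_hat, n1, p2_hat, n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


def equal_n_sample_size(margin, z_star, p1=0.5, p2=0.5):
    """Smallest common n with z* * sqrt(p1(1-p1)/n + p2(1-p2)/n) <= margin."""
    n = (p1 * (1 - p1) + p2 * (1 - p2)) * (z_star / margin) ** 2
    return math.ceil(n)  # round UP


# Flip vs. spin: n1 = n2 = 25, as implied by n1p1 = 12.5 and n2p2 = 10.
print(f"SD(flip - spin) = {se_diff(0.5, 25, 0.4, 25):.2f}")

low, high = two_prop_ci(259, 316, 94, 419, 1.96)
print(f"Burn center 95% CI: ({low:.3f}, {high:.3f})")

print("Texas vs. California, n per state =", equal_n_sample_size(0.02, 1.645))
```

Using the table value z* = 1.645 keeps the sample size consistent with a hand calculation; the unrounded z* for 90% confidence is slightly smaller and can lower the answer by one.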
$$.02 = 1.645\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}} = 1.645\sqrt{\frac{.25 + .25}{n}} \quad\Rightarrow\quad n = .5\left(\frac{1.645}{.02}\right)^2 \approx 3382.5$$
Round up: n = 3383 in each state.

Example 3: Researchers comparing the effectiveness of two pain medications randomly selected a group of patients who had been complaining of a certain kind of joint pain. They randomly divided these people into two groups, and then administered the painkillers. Of the 112 people in the group who received medication A, 84 said this pain reliever was effective. Of the 108 people in the other group, 66 reported that pain reliever B was effective. (BVD, p. 435)
a) Construct separate 95% confidence intervals for the proportion of people who reported that the pain reliever was effective. Based on these intervals, how do the proportions of people who reported pain relief with medication A or medication B compare?
CI_A = (.67, .83)
CI_B = (.52, .70)
Since the intervals overlap, it appears that there is no difference in the proportion of people who reported pain relief between the two medicines.
b) Construct a 95% confidence interval for the difference between the two proportions of people who may find these medications effective.
CI = (0.017, 0.261)
Since zero is not in the interval, there is evidence of a difference in the proportion of people who reported pain relief between the two medicines.
SO, which is correct? The interval for the difference in (b). Checking whether two separate intervals overlap is not a valid way to compare two proportions, so the overlap in (a) does not show that there is no difference.

Hypothesis statements:
• H0: p1 = p2
• Ha: p1 > p2
• Ha: p1 < p2
• Ha: p1 ≠ p2
Be sure to define both p1 & p2!

Since we assume that the population proportions are equal in the null hypothesis, the variances are equal. Therefore, we pool the variances, using the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Formula for hypothesis test: test statistic = (statistic − parameter) / (SD of statistic). Since H0 says p1 = p2, p1 − p2 = 0, so
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Example 4: A forest in Oregon has an infestation of spruce moths. In an effort to control the moth, one area has been regularly sprayed from airplanes. In this area, a random sample of 495 spruce trees showed that 81 had been killed by moths. A second nearby area receives no treatment.
In this area, a random sample of 518 spruce trees showed that 92 had been killed by the moth. Do these data indicate that the proportion of spruce trees killed by the moth is different for these areas?

Assumptions:
• Have 2 independent SRSs of spruce trees
• Both distributions are approximately normal since n1p̂1 = 81, n1(1 − p̂1) = 414, n2p̂2 = 92, n2(1 − p̂2) = 426, and all > 5
• Population of spruce trees is at least 10(495 + 518) = 10,130

H0: p1 = p2
Ha: p1 ≠ p2
where p1 is the true proportion of trees killed by moths in the treated area and p2 is the true proportion of trees killed by moths in the untreated area.

Pooled proportion: p̂ = (81 + 92)/(495 + 518) = 173/1013 ≈ .17
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.1636 - .1776}{\sqrt{.17(.83)\left(\frac{1}{495} + \frac{1}{518}\right)}} \approx -0.59$$
P-value = 0.5547, α = 0.05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the proportion of spruce trees killed by the moth is different for these areas.
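The pooled test above can be sketched in a few lines of Python. This is an illustrative implementation under the slides' assumptions (two independent SRSs, H0: p1 = p2, two-sided Ha); the function name is hypothetical, and `statistics.NormalDist` supplies the normal-curve area.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 = p2 against Ha: p1 != p2.

    Returns (z, p-value). Because H0 says p1 = p2, both samples are
    pooled into one estimate p_hat = (x1 + x2) / (n1 + n2) for the SD.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    sd = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / sd  # the hypothesized p1 - p2 is 0
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided: double the tail area
    return z, p_value


# Example 4: treated area, 81 of 495 killed; untreated area, 92 of 518 killed.
z, p = two_prop_z_test(81, 495, 92, 518)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

Computing p̂1, p̂2, and the pooled p̂ from the raw counts avoids the rounding in .16 and .18, which would otherwise noticeably change z.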