Hypothesis Testing

Hypothesis testing is applied to population parameters by specifying a null hypothesis H0 that contains a null value for the population parameter: a value that would indicate a baseline, or that nothing of interest is happening ("old news", "no difference", etc.). The test is based on a point estimate (sample statistic) and assesses how unlikely it would be to obtain this sample statistic if the null parameter value were correct.

Example: According to the MA 115 B1 Intro Survey, 52 out of 108 students reported feeling stressed at the beginning of the semester. What is the approximate probability of obtaining a sample proportion of 48% or lower if the true proportion of all students who feel stressed during the semester is 85%?

Achieving statistical significance is equivalent to rejecting the idea that the observed results are plausible if the null value is correct, i.e., rejecting the null hypothesis (H0) in favor of the alternative hypothesis (Ha). Ha does not specify any single value for the true population parameter. Ha gives an open interval of possible values of the true parameter, and that interval never contains the null value.

H0: μ = μ0 (p = p0)
Ha: μ > μ0 (p > p0)   upper-sided
Ha: μ ≠ μ0 (p ≠ p0)   two-sided
Ha: μ < μ0 (p < p0)   lower-sided

5 Basic Steps in Any Hypothesis Test

Step 1: Determine the hypotheses (H0 and Ha).
Step 2: Verify the necessary data conditions and, if they are met, summarize the data into an appropriate test statistic.
Step 3: Assuming the null hypothesis (H0) is true, find the rejection rule (a rejection region or the p-value).
Step 4: Decide whether or not the result is statistically significant based on the rejection rule.
Step 5: Report the conclusion in the context of the problem (question of interest).

One Sample Hypothesis Test for Population Mean

Test: Population Mean
Scenario: 1 Sample
Data: Response: Numerical (Age, Price); Explanatory Variable: ____
Population Parameter: μ
Sample Statistic: sample mean

Can we claim that the average GPA of all BU graduates is higher than 3.0?
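The MA 115 stress example earlier in this section asks for an approximate probability but does not show the calculation. A minimal Python sketch follows, assuming the usual normal approximation to the sampling distribution of the sample proportion, with the standard error computed from the stated true proportion of 0.85. The variable names are illustrative only and do not come from the slides.

```python
from math import erfc, sqrt

n = 108            # students surveyed
p_true = 0.85      # proportion assumed true in the question
p_hat = 52 / n     # observed sample proportion, about 0.48

# Standard error of the sample proportion when the true proportion is p_true
se = sqrt(p_true * (1 - p_true) / n)

# Standardize the observed proportion
z = (p_hat - p_true) / se

# Lower-tail probability P(Z <= z) for a standard normal Z.
# erfc keeps precision for tail probabilities this small.
prob = 0.5 * erfc(-z / sqrt(2))

print(f"p_hat = {p_hat:.4f}, se = {se:.4f}, z = {z:.2f}")
print(f"P(sample proportion <= {p_hat:.2f}) is approximately {prob:.3g}")
```

The z-score is far below zero, so under this approximation a sample proportion near 48% would be essentially impossible if 85% of all students were stressed.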
Example: Testing Mean Systolic Blood Pressure

A large national study conducted in 2003 reported that the mean systolic blood pressure for males aged 50 was 130 with std = 15. In 2004, an investigator hypothesized that, due to increased stress in the workplace, faster-paced lifestyles, and poorer nutritional habits, average systolic blood pressure has increased.

Has systolic blood pressure increased in 2004, on average?

Step 1:
H0: μ = 130 ("no change")
Ha: μ > 130 (i.e., mean blood pressure increased)

Step 2:
Select a random sample from the population of interest (n = 108 males aged 50 in 2004).
Record the systolic blood pressure of each male.
Generate a point estimate for the population mean μ.
Compute an appropriate test statistic.

Consider the following cases (H0: μ = 130):
1. X̄ = 130
2. X̄ = 150
3. X̄ = 135

P(X̄ ≥ 135 | H0 true) = P(X̄ ≥ 135 | μ = 130)
P(X̄ ≥ 150 | H0 true) = P(X̄ ≥ 150 | μ = 130)

Critical Value

If the sample mean is less than the critical value, we fail to reject H0 (e.g., μ = 130 remains plausible).
If the sample mean is greater than the critical value, we reject H0 in favor of Ha (e.g., μ > 130).

Test Statistic

Instead of determining critical values for the sample mean (specific to each application), we use the CLT to standardize and produce a z-score:

Z = (X̄ - μ0) / (σ/√n)

Assuming H0 is true, Z ~ N(0,1):
If Z is close to zero, the data are consistent with H0.
If Z is large, the data favor Ha.
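The standardization described above can be sketched in Python for the three candidate sample means (130, 135, 150). The sketch assumes that the 2003 standard deviation of 15 still applies in 2004 and that n = 108. The helper name `z_score` is introduced here for illustration only.

```python
from math import sqrt

mu0 = 130      # null value of the population mean (2003 study)
sigma = 15     # standard deviation, assumed unchanged since 2003
n = 108        # sample size in 2004

# Standard error of the sample mean under the CLT
se = sigma / sqrt(n)

def z_score(xbar):
    """Standardize a sample mean under H0: mu = mu0."""
    return (xbar - mu0) / se

for xbar in (130, 135, 150):
    print(f"sample mean {xbar}: z = {z_score(xbar):.2f}")
```

A sample mean equal to the null value gives z = 0, while sample means further above 130 give larger z-scores and hence stronger evidence for Ha.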
Step 2: Compute an appropriate test statistic (n = 108; null mean = 130, std = 15).

With sample mean X̄ = 135:

Z = (X̄ - μ0) / (σ/√n) = (135 - 130) / (15/√108) = 3.46

Decision/Rejection Rule

Step 3: Assuming H0 is true, how likely is a test statistic this large? By the CLT, Z is approximately N(0,1) under H0, so tail probabilities can be read from the standard normal table.

If we set up the rule "reject H0 if Z > 1", then α = P(Type I error) = 0.1587.
If we set up the rule "reject H0 if Z > 2", then α = P(Type I error) = 0.0228.
It is better to fix α in advance!

Rejection Rule (Table 5.4)

Say α = 0.05. Then we need the point Z1-α such that P(Z ≥ Z1-α) = 0.05, which gives Z1-α = 1.645. The rejection rule is: reject H0 if Z ≥ 1.645, and fail to reject H0 if Z < 1.645.

H0: μ = μ0 (p = p0)
Ha: μ > μ0 (p > p0)   upper-sided   Reject H0 if Z ≥ Z1-α (t ≥ t1-α,df)
Ha: μ ≠ μ0 (p ≠ p0)   two-sided     Reject H0 if Z ≥ Z1-α/2 (t ≥ t1-α/2,df) or Z ≤ -Z1-α/2 (t ≤ -t1-α/2,df)
Ha: μ < μ0 (p < p0)   lower-sided   Reject H0 if Z ≤ -Z1-α (t ≤ -t1-α,df)

Decision

Step 4: The final step in the test of hypothesis is to compare the test statistic to the decision rule to draw a conclusion. The test statistic falls in the rejection region, and therefore we reject H0, because test statistic (3.46) > critical value (1.645).

Conclusion

Step 5: Based on the sample of n = 108 males, there is significant evidence, at level α = 0.05, to conclude that the mean systolic blood pressure for males aged 50 in 2004 has increased from 130.

P-value

Step 4, Option 2: Compute the p-value. The p-value is the probability of getting a test statistic as extreme or more extreme (in the direction of Ha) than the observed value of the test statistic, assuming the null hypothesis (H0) is true.

p-value = P(test statistic is as or more extreme | H0 is true)

From Step 1: H0: μ = 130 ("no change"), Ha: μ > 130 (i.e., mean blood pressure increased).
From Step 2: test statistic Z = 3.46.

p-value = P(Z ≥ 3.46 | H0 is true, so Z ~ N(0,1)) ≈ 0.0003

If the p-value ≤ α, then the result IS statistically significant, and the decision is to reject H0.
If the p-value > α, then the result IS NOT statistically significant, and the decision is to fail to reject H0.

Here the p-value ≈ 0.0003 < 0.05, so we reject H0.

Example (Student Sleep Deprivation)

A National Sleep Foundation survey found that college/university-aged students get an average of 6.8 hours of sleep each night. Sleep deprivation is common in college freshmen. At the beginning of the fall semester, based on a sample of n = 108 students in MA 115, we estimated the mean to be 7.21 hours with std = 1.12 hours. Based on the data, is there significant evidence (at α = 5%) to conclude that students tend to sleep more at the beginning of the school year?

Step 1:
Parameter: μ = mean number of hours of sleep per night for students
H0: μ = 6.8
Ha: μ > 6.8
Significance level α = 0.05

Sample data: n = 108 students in MA 115, sample mean = 7.21 hours, s = 1.12 hours.
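The remaining steps of the sleep example can be checked with a short Python sketch built from the summary numbers just stated. It assumes, as the slides do, that a Z statistic with the sample standard deviation is acceptable for n = 108. The variable names are illustrative only. Computed directly from the rounded summary values, the statistic comes out near 3.80, while the slides report 3.79. The small gap presumably reflects rounding of the inputs or intermediate steps and does not change the decision.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 6.8      # null value: national average hours of sleep
xbar = 7.21    # sample mean from the MA 115 survey
s = 1.12       # sample standard deviation
n = 108        # sample size
alpha = 0.05   # significance level

std_normal = NormalDist()  # standard normal N(0, 1)

# Step 2: test statistic
z = (xbar - mu0) / (s / sqrt(n))

# Step 3: upper-sided critical value and p-value
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)

# Step 4: decision
reject = z >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.5f}")
print("reject H0" if reject else "fail to reject H0")
```

Both routes, comparing z with the critical value and comparing the p-value with α, lead to the same decision by construction.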
Step 2: Compute the appropriate test statistic:

Z = (X̄ - μ0) / (s/√n), with X̄ = 7.21, μ0 = 6.8, s = 1.12, n = 108.

Step 3: Assuming the null hypothesis (H0) is true, define the decision rule (or rejection region). The decision rule for any hypothesis testing application depends on three factors:
(1) Whether the test is an upper-, lower-, or two-tailed test: upper-tailed, since Ha: μ > 6.8
(2) The level of significance: α = 0.05
(3) The form of the test statistic: Z statistic

For the upper-sided alternative Ha: μ > μ0, the rejection rule is: reject H0 if Z ≥ Z1-α.

The decision rule:
Reject H0 if Z ≥ 1.645.
Do not reject H0 if Z < 1.645.

Step 4: We reject H0 (μ = 6.8) because 3.79 > 1.645.
Step 5: Based on the sample of n = 108 students, there is significant evidence, at level α = 0.05, to conclude that students tend to sleep more at the beginning of the school year, on average.

Facts:
In 1997 the University of Minnesota found that students who went to school at 8:40 a.m. got higher grades than those who went to school at 7:15 a.m.
Randy Gardner holds the scientifically documented record for not sleeping: 264 hours (about eleven days) without using stimulants of any kind.
Never scientifically verified: Thai Ngoc, born 1942, claimed in 2006 to have been awake for 33 years, or 11,700 nights.

Facts:
Caffeine is often used over short periods to increase alertness and counteract the effects of sleep deprivation; however, caffeine is less effective if taken routinely.
Other strategies recommended by the American Academy of Sleep Medicine include prophylactic sleep, daytime naps, and an increase in night sleep time.

Types of Errors

Truth      Decision             Type of Error    Probability
H0 True    Reject H0            TYPE I Error     P(TYPE I) = α
H0 True    Fail to Reject H0    No Error         1 - α
Ha True    Reject H0            No Error         power = 1 - β
Ha True    Fail to Reject H0    TYPE II Error    P(TYPE II) = β
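The probabilities in the error table can be made concrete with a small Python sketch based on the blood pressure setup (μ0 = 130, std = 15, n = 108, upper-sided Z test). The Type I error rates follow directly from the rejection rule. Power and β, however, depend on what the true mean actually is, so the value 135 used below is a hypothetical alternative chosen only for illustration, and the variable names are likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal N(0, 1)

# Type I error rate for a given cutoff: P(Z > cutoff | H0 true)
alpha_cut1 = 1 - std_normal.cdf(1)   # rule "reject H0 if Z > 1"
alpha_cut2 = 1 - std_normal.cdf(2)   # rule "reject H0 if Z > 2"

# Fix alpha in advance and derive the cutoff instead
alpha = 0.05
z_crit = std_normal.inv_cdf(1 - alpha)

# Power against a hypothetical true mean (an assumption, not from the slides)
mu0, sigma, n = 130, 15, 108
mu_true = 135
shift = (mu_true - mu0) / (sigma / sqrt(n))

# If the true mean is mu_true, Z is approximately N(shift, 1),
# so P(reject H0) = P(Z >= z_crit) = 1 - Phi(z_crit - shift)
power = 1 - std_normal.cdf(z_crit - shift)
beta = 1 - power                     # P(Type II error)

print(f"alpha for Z > 1: {alpha_cut1:.4f}")
print(f"alpha for Z > 2: {alpha_cut2:.4f}")
print(f"cutoff for alpha = 0.05: {z_crit:.3f}")
print(f"power = {power:.4f}, beta = {beta:.4f}")
```

Raising the cutoff lowers α but, for a fixed alternative, also lowers power, which is the trade-off the table summarizes.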