Hypothesis Test Notes

Two Population Tests

We sometimes would like to know if one population is larger or smaller than another population. This is a two population hypothesis test.

Label which group is population 1 and which is population 2! (It does not matter which group you pick to be population 1 or 2, but however you label it, make sure you put the data into StatCrunch in that order!)

Key Question

We are comparing two populations by looking at sample data. Remember, like any hypothesis test, we have to rule out sampling variability (random chance) to be able to reject the null hypothesis.

Key Question: Why are my two samples different?

Option 1: (Random Chance) The populations are the same, and the samples are different because all random samples are different.
Option 2: (Populations are different) The samples are different because the populations are different.

In a two population hypothesis test, to determine if populations are different, we first must rule out option 1 (random chance).

How can we rule out random chance?

Important Note: You cannot just look at the two sample values. Remember sometimes a 10 pound difference is a lot and sometimes it is not a lot. Sometimes a 3% difference is a lot and sometimes it is not a lot.

Test Statistic, P-value, or Simulation to the rescue!

We are able to rule out random chance when the samples are significantly different and the probability of that significant difference happening is very low.

Large Test Statistic (T-stat or Z-stat close to +2 or higher, or close to -2 or lower)
Low P-value (P-value is close to zero or less than the significance level)
Simulation: Simulate what samples would look like when the populations are the same. (If our sample difference is in the tail, then our sample difference is significant and the probability (P-value) of that sample difference or more extreme is very low.)
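The simulation idea can be sketched in a few lines of Python. This is only an illustration, not how StatCrunch does it: the function name simulate_p_value and the two small data sets are made up for this example. The sketch assumes that "populations are the same" can be imitated by shuffling all the values together and dealing them back out into two groups at random.

```python
import random

def simulate_p_value(sample1, sample2, trials=10_000, seed=0):
    """Estimate a two-tailed P-value by shuffling the group labels.

    If the populations are the same (Option 1), the labels do not matter,
    so shuffling shows what sample differences random chance produces.
    """
    rng = random.Random(seed)
    n1 = len(sample1)
    observed = sum(sample1) / n1 - sum(sample2) / len(sample2)
    combined = list(sample1) + list(sample2)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(combined)
        g1 = combined[:n1]
        g2 = combined[n1:]
        diff = sum(g1) / len(g1) - sum(g2) / len(g2)
        # Count shuffles at least as far out in the tail as our real samples.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / trials

# Hypothetical data, invented only for illustration.
group1 = [52, 48, 55, 60, 47, 51, 58, 49]
group2 = [61, 66, 59, 70, 64, 62, 68, 65]
observed_diff, p_value = simulate_p_value(group1, group2)
print("sample difference (group 1 - group 2):", observed_diff)
print("estimated P-value:", p_value)
```

If the estimated P-value comes out below the significance level, the sample difference is in the tail of the simulated differences and random chance (Option 1) can be ruled out.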
Setting up your two population hypothesis test

Step 1: Label which group is population 1 and which is population 2 and stick to it.
For example: Population 1: women. Population 2: men.

Step 2: Null and Alternative Hypothesis (There are various ways of writing the null and alternative hypothesis. They are all equally correct and you can use any of them.)

Example Claim: The mean average salary for women (μ1) is lower than the mean average salary for men (μ2).
H0: μ1 = μ2
HA: μ1 < μ2 (claim)

Subtracting μ2 from both sides gives the form below. Remember, saying group 1 is lower than group 2 is the same as saying the difference (group 1 – group 2) is negative.
H0: μ1 - μ2 = 0
HA: μ1 - μ2 < 0 (claim)

If the data is matched pair (husband and wife, or the same person measured twice), then you will sometimes see μ1 - μ2 written as μd.
H0: μd = 0
HA: μd < 0 (claim)

Example Claim: The percentage of women (p1) is higher than the percentage of men (p2).
H0: p1 = p2
HA: p1 > p2 (claim)

Subtracting p2 from both sides gives the form below. Remember, saying group 1 is higher than group 2 is the same as saying the difference (group 1 – group 2) is positive.
H0: p1 - p2 = 0
HA: p1 - p2 > 0 (claim)

What does this mean? Each of the following pairs asks the same question: are the two populations the same (H0) or different (HA)?
H0: μ1 = μ2, HA: μ1 ≠ μ2 (claim)
H0: μ1 - μ2 = 0, HA: μ1 - μ2 ≠ 0 (claim)
H0: μd = 0, HA: μd ≠ 0 (claim)

Assumptions

2 population mean average (Check these twice, once for each sample)
Random
At least 30 or bell shaped (normal)
Matched Pair or Independent? Remember matched pair is a one-to-one pairing (not just something in common).

2 population proportion (percentage) (Check these twice, once for each sample)
Random
At least 10 successes
At least 10 failures
Two groups should be independent

Test Statistics

1 population test statistic sentence: the number of standard errors that the sample value is above or below the population value.
2 population test statistic sentence: the number of standard errors that the sample value from group 1 is above or below the sample value from group 2.

Formula for two population test statistic (Z or T):
test statistic = (sample value 1 - sample value 2) / standard error

Example: group 1: women, group 2: men
Comparing the percentage of women to the percentage of men.
Test Statistic Z = +2.48
Sample percentage from group 1 (women) is 2.48 standard errors above the sample percentage from group 2 (men).

Example: group 1: Valencia High School, group 2: Saugus High School
Compare the mean average SAT scores.
Test Statistic T = -1.06
Sample mean average for group 1 (Valencia) was 1.06 standard errors below the sample mean average for group 2 (Saugus).

StatCrunch Directions (StatCrunch uses the alternate null and alternative written with "zero")

Two Population proportion (percentage)
Stat => Proportion-Stats => Two Sample => with data or with summary

Two Population mean average (Independent groups)
Stat => T-Stats => Two Sample => with data or with summary

Two Population mean average (matched pair with raw data)
Stat => T-Stats => Paired => select the two columns

Two Population mean average (matched pair with summary data: mean of the differences, standard deviation of the differences sd, sample size n)
Stat => T-Stats => 1 sample => with summary => put in mean, standard deviation, sample size

Pool or Not to Pool? (That is the question)

1. Pooling in 2 population proportion problems (categorical data)

P-pooled is combining the number of successes and the sample sizes of your two groups into one large sample.
p-pooled = (x1 + x2) / (n1 + n2)

Note: You are allowed to pool the two sample percentages only if the population percentages are equal.

In confidence intervals we do not know if the populations are the same or not. So for 2 population proportion confidence intervals: Do not pool.

In two population proportion hypothesis tests, it is OK to pool, because you are assuming the population percentages are the same in the null hypothesis.
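As a rough sketch of the pooling rule, the Python below computes the pooled and unpooled standard errors for two sample percentages and the Z test statistic. The counts are hypothetical, and the function name two_proportion_se is made up for this example. The standard error formulas are the usual textbook ones and are an assumption here, since these notes only give the p-pooled formula.

```python
import math

def two_proportion_se(x1, n1, x2, n2):
    """Return both sample proportions plus the pooled and unpooled standard errors."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion: combine successes and sample sizes into one large sample.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error (hypothesis test: H0 assumes p1 = p2).
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Unpooled standard error (confidence interval: no assumption that p1 = p2).
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, se_pooled, se_unpooled

# Hypothetical counts, invented only for illustration.
x1, n1 = 60, 100   # group 1: successes, sample size
x2, n2 = 45, 100   # group 2: successes, sample size

p1, p2, se_pooled, se_unpooled = two_proportion_se(x1, n1, x2, n2)

# Test statistic: (sample value 1 - sample value 2) / standard error
z = (p1 - p2) / se_pooled

print("pooled standard error:  ", round(se_pooled, 4))
print("unpooled standard error:", round(se_unpooled, 4))
print("Z test statistic:       ", round(z, 2))
```

The two standard errors are close but not identical, which is why a hypothesis test and a confidence interval on the same data can report slightly different standard errors.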
(Some programs ask if you want to pool for two population proportion, but StatCrunch does this automatically. It automatically pools for the 2 population proportion hypothesis test standard error and automatically does not pool for the confidence interval standard error. You will see a slight difference in the standard error for hypothesis test versus confidence interval.)

2. Pooling the variances in 2 population mean average problems (quantitative data)

You should not pool the sample variances unless you are sure the population variances are equal. Since we rarely know the population variances, do not pool the variances in StatCrunch.

Act 11 #1 (Matched Pair with summary data)

Group 1: After ACT scores
Group 2: Before ACT scores

HA: μ1 > μ2 (claim)
H0: μ1 = μ2

Note: Alternate ways of writing the null and alternative
HA: μ1 - μ2 > 0 (claim)
H0: μ1 - μ2 = 0

HA: μd > 0 (claim)
H0: μd = 0

Two Population mean average (matched pair with summary data: mean of the differences, standard deviation of the differences sd, sample size n)
Stat => T-Stats => 1 sample => with summary => put in mean, standard deviation, sample size

T test statistic = +2.9166
The sample mean of the after scores was 2.92 standard errors above the sample mean of the before scores. After scores are significantly higher than before scores (class is effective).

P-value = 0.0044
If Ho is true, then there is a 0.0044 probability of getting the sample data (sample difference) or more extreme by random chance. (Unlikely to happen by random chance, so Ho must be wrong.)

P-value (0.0044) < sig level (0.05)
Reject Ho

Conclusion: There is significant sample evidence to support the claim that the ACT prep class is effective.
(After > Before)

Act 12 #2

Population 1: Marijuana users
Population 2: Non-marijuana users

HA: p1 > p2 (claim)
H0: p1 = p2

Note: in StatCrunch the null and alternative are written as
HA: p1 - p2 > 0 (claim)
H0: p1 - p2 = 0

Z test statistic = 6.85
The percentage of group 1 (marijuana users) was 6.85 standard errors above the percentage of group 2 (non-marijuana users). The percent of marijuana users that use other illegal drugs is significantly greater.

P-value ≈ 0 (< 0.0001)
If Ho is true, there is less than a 0.0001 probability of getting the sample data (sample difference) or more extreme by random chance. (Did not happen by random chance. Population 1 is significantly different than population 2.) Ho is wrong.

Reject Ho

Conclusion: There is significant sample evidence to support the claim that the percent of marijuana users that use other illegal drugs is higher than the percent of non-marijuana users that use other illegal drugs.
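To see why a Z test statistic of 6.85 gets reported as a P-value of essentially zero, here is a small Python sketch that turns a Z test statistic into a right-tailed P-value and applies the decision rule. The function names upper_tail_p_value and decide are made up for this illustration, and the sketch assumes the standard normal curve is the right model for the test statistic (as in a two population proportion test).

```python
import math

def upper_tail_p_value(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def decide(p_value, sig_level=0.05):
    """Decision rule: reject Ho when the P-value is below the significance level."""
    return "Reject Ho" if p_value < sig_level else "Fail to reject Ho"

# Z test statistic from Act 12 #2 (right-tailed claim: p1 > p2).
p = upper_tail_p_value(6.85)
print("P-value below 0.0001?", p < 0.0001)
print(decide(p))
```

The P-value is a very small positive number rather than exactly zero, which is why the output is best read as "< 0.0001".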