Hypothesis Test Notes
Two Population Tests
We sometimes would like to know if the mean average or percentage of one population
is larger or smaller than that of another population. This is a two population
hypothesis test.
Label which group is population 1 and which is population 2!!! (It does not matter
which group you pick to be population 1 or 2, but however you label it, make sure
you put the data into StatCrunch in that order!)
Key Question?????
We are comparing two populations by looking at sample data. Remember, like
any hypothesis test, we have to rule out sampling variability (random chance) to
be able to reject the null hypothesis.
Key Question: Why are my two samples different?
Option 1: (Random Chance) The populations are the same, and the samples are
different because all random samples are different.
Option 2: (Populations are different) The samples are different because the
populations are different.
In a two population hypothesis test, to determine if populations are different, we
first must rule out option 1 (random chance).
How can we rule out random chance???
Important Note: You cannot just look at the two sample values. Remember
sometimes a 10 pound difference is a lot and sometimes it is not a lot. Sometimes
a 3% difference is a lot and sometimes it is not a lot.
Test Statistic, P-value, or Simulation to the rescue!!
We are able to rule out random chance when the difference between the samples is
significant, meaning the probability of getting a difference that large by random
chance alone is very low.
• Large Test Statistic (T-stat or Z-stat close to +2 or higher, or close to -2 or
lower.)
• Low P-value (P-value is close to zero or less than the significance level)
• Simulate what samples would look like when the populations are the same.
(If our sample difference is in the tail, then our sample difference is
significant and the probability of getting that sample difference or more
extreme (the P-value) is very low.)
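The simulation idea above can be sketched in a few lines of Python. This is only an illustration, not the StatCrunch procedure: the function name `simulate_p_value`, the two data lists, and the number of shuffles are all made up for the example. The sketch mixes the two samples together (pretending the populations are the same), reshuffles them into two groups many times, and counts how often the shuffled difference is at least as far from zero as the real sample difference.

```python
import random

def simulate_p_value(group1, group2, num_shuffles=10000, seed=0):
    """Estimate a two-tailed P-value by shuffling the group labels.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n1 = len(group1)
    n2 = len(group2)
    observed = sum(group1) / n1 - sum(group2) / n2
    combined = list(group1) + list(group2)
    extreme = 0
    for _ in range(num_shuffles):
        # If the populations are the same, the labels do not matter,
        # so any reshuffling is an equally likely pair of samples.
        rng.shuffle(combined)
        diff = sum(combined[:n1]) / n1 - sum(combined[n1:]) / n2
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / num_shuffles

# Made-up data, not from the class activities.
group1 = [52, 48, 50, 47, 55, 49, 51, 46]
group2 = [58, 61, 55, 60, 57, 63, 59, 62]
observed, p_value = simulate_p_value(group1, group2)
print("sample difference:", observed)
print("simulated P-value:", p_value)
```

If the simulated P-value is below the significance level, the real sample difference sits in the tail of the simulated differences, and random chance is an unlikely explanation.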
Setting up your two population hypothesis test
Step 1: Label which group is population 1 and which is population 2
and stick to it.
For example:
Population 1: women
Population 2: men
Step 2: Null and Alternative Hypothesis
(There are various ways of writing the null and alternative hypotheses. They are all
equally correct, and you can use any of them.)
Example Claim: The mean average salary for women (μ1) is lower than the mean
average salary for men (μ2).
H0: μ1 = μ2
HA: μ1 < μ2 (claim)
Subtracting μ2 from both sides gives the equivalent form below. Remember, saying
group 1 is lower than group 2 is the same as saying the difference (group 1 − group 2)
is negative.
H0: μ1 − μ2 = 0
HA: μ1 − μ2 < 0 (claim)
If the data is matched pair (husband and wife, or the same person measured twice),
then you will sometimes see μ1 − μ2 written as μd.
H0: μd = 0
HA: μd < 0 (claim)
Example Claim: The percentage of women (p1) is higher than the percentage of
men (p2).
H0: p1 = p2
HA: p1 > p2 (claim)
Subtracting p2 from both sides gives the equivalent form below. Remember, saying
group 1 is higher than group 2 is the same as saying the difference (group 1 − group 2)
is positive.
H0: p1 − p2 = 0
HA: p1 − p2 > 0 (claim)
What does this mean?
H0: μd = 0
HA: μd ≠ 0 (claim)
Think: μd is another way of writing μ1 − μ2.
So
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0 (claim)
What does that mean?
H0: μ1 = μ2
HA: μ1 ≠ μ2 (claim)
So H0: μd = 0 says that the two populations are the same, and HA: μd ≠ 0 (claim)
says that they are different.
Assumptions
2 population mean average (Check these twice)
• Random
• At least 30 or bell shaped (normal)
• Matched Pair or Independent? Remember matched pair is a one-to-one
pairing (not just something in common)
2 population proportion (percentage) (Check these twice)
• Random
• At least 10 successes
• At least 10 failures
• Two groups should be independent
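The success and failure counts for a two population proportion problem are easy to check by hand, but a small Python sketch shows the idea. The function name `proportion_counts_ok` and the counts below are made up for illustration. Randomness and independence depend on how the data was collected and cannot be checked by a calculation like this.

```python
def proportion_counts_ok(successes, sample_size, minimum=10):
    """Check the 'at least 10 successes and at least 10 failures' condition.

    Hypothetical helper for illustration only.
    """
    failures = sample_size - successes
    return successes >= minimum and failures >= minimum

# Made-up counts: check the condition once for each group.
group1_ok = proportion_counts_ok(successes=45, sample_size=120)
group2_ok = proportion_counts_ok(successes=8, sample_size=110)
print("group 1 meets the count condition:", group1_ok)
print("group 2 meets the count condition:", group2_ok)
```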
Test Statistics
1 population test statistic sentence: the number of standard errors that the
sample value is above or below the population value.
2 population test statistic sentence: the number of standard errors that the
sample value from group 1 is above or below the sample value from group 2.
Formula for two population test statistic (Z or T):
test statistic = (sample value 1 − sample value 2) / standard error
Example: group 1: women , group 2: men
Comparing the percentage of women to the percentage of men.
Test Statistic Z = +2.48
Sample percentage from group 1 (women) is 2.48 standard errors above the
sample percentage from group 2 (men).
Example: group 1: Valencia High School , group 2: Saugus High School
Compare the mean average SAT scores
Test Statistic T = -1.06
Sample mean average for group 1 (Valencia) was 1.06 standard errors below the
sample mean average for group 2 (Saugus).
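As a rough sketch of the test statistic formula for two independent means, here is a short Python example. The function names and the summary numbers are made up for illustration and are not the data behind the examples above. The standard error used here is the usual unpooled one, the square root of (s1²/n1 + s2²/n2), which is an assumption about what the software computes for independent groups.

```python
import math

def unpooled_standard_error(sd1, n1, sd2, n2):
    """Standard error of (sample mean 1 - sample mean 2), variances not pooled."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def two_sample_test_statistic(sample_value_1, sample_value_2, standard_error):
    """Number of standard errors that sample value 1 is above or below sample value 2."""
    return (sample_value_1 - sample_value_2) / standard_error

# Made-up summary data for two independent groups.
se = unpooled_standard_error(sd1=100, n1=50, sd2=120, n2=60)
t_stat = two_sample_test_statistic(1480, 1505, se)
print("standard error:", round(se, 3))
print("T test statistic:", round(t_stat, 3))
```

A negative result means the sample value for group 1 is below the sample value for group 2, which is why the labeling order matters.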
StatCrunch Directions (StatCrunch uses the alternate null and alternative written with "zero")
Two Population proportion (percentage)
Stat => Proportion-Stats => Two Sample => with data or with summary
Two Population mean average (Independent groups)
Stat => T-Stats => Two Sample => with data or with summary
Two Population mean average (matched pair with raw data)
Stat => T-Stats => Paired => columns?
Two Population mean average (matched pair with summary data: mean of the
differences, standard deviation of the differences sd, sample size n)
Stat => T-Stats => 1 sample => with summary =>
put in mean, standard deviation, sample size
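For matched pair summary data, the one sample T-stat menu is treating the differences as a single sample. A minimal Python sketch of that calculation is below. The function name and the summary numbers are made up for illustration, and the sketch only computes the test statistic, not the P-value.

```python
import math

def paired_t_statistic(mean_difference, sd_difference, num_pairs):
    """T statistic for matched pairs: mean difference divided by its standard error."""
    standard_error = sd_difference / math.sqrt(num_pairs)
    return mean_difference / standard_error

# Made-up summary data: 36 pairs, mean difference 1.5, standard deviation 3.0.
t_stat = paired_t_statistic(mean_difference=1.5, sd_difference=3.0, num_pairs=36)
print("T test statistic:", t_stat)  # 1.5 / (3.0 / 6) = 3.0
```

The null hypothesis value of zero is built in here: the statistic measures how many standard errors the sample mean difference is above or below zero.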
Pool or Not to Pool? (That is the question)
1. Pooling in 2 population proportion problems (categorical data)
P-pooled is combining the number of successes and the sample sizes of your two groups
into one large sample:
p-pooled = (x1 + x2) / (n1 + n2)
Note: You are allowed to pool the two sample percentages if the population
percentages are equal.
• In confidence intervals we do not know if the populations are the same or
not. So for 2 population proportion confidence intervals: Do not pool.
• In two population proportion hypothesis tests, it is OK to pool, because you
are assuming the population percentages are the same in the null hypothesis.
(Some programs ask if you want to pool for two population proportion, but
StatCrunch does this automatically. It automatically pools for the 2 population
proportion hypothesis test standard error and automatically does not pool for
the confidence interval standard error. You will see a slight difference in the
standard error for hypothesis test versus confidence interval.)
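To see the slight difference between the pooled and unpooled standard errors, here is a short Python sketch. The function names and the counts are made up for illustration. The formulas are the standard textbook ones and are assumed, not confirmed, to match what StatCrunch reports. The P-value is for a right-tailed test (claim: p1 is higher than p2).

```python
import math
from statistics import NormalDist

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error for the hypothesis test (assumes p1 = p2 under the null)."""
    p_pooled = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

def unpooled_standard_error(x1, n1, x2, n2):
    """Standard error for the confidence interval (no assumption that p1 = p2)."""
    p1 = x1 / n1
    p2 = x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def two_proportion_z(x1, n1, x2, n2):
    """Z test statistic and right-tailed P-value using the pooled standard error."""
    z = (x1 / n1 - x2 / n2) / pooled_standard_error(x1, n1, x2, n2)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Made-up counts: 60 of 100 in group 1, 45 of 100 in group 2.
se_test = pooled_standard_error(60, 100, 45, 100)
se_interval = unpooled_standard_error(60, 100, 45, 100)
z_stat, p_value = two_proportion_z(60, 100, 45, 100)
print("pooled SE (hypothesis test):", round(se_test, 4))
print("unpooled SE (confidence interval):", round(se_interval, 4))
print("Z:", round(z_stat, 2), "P-value:", round(p_value, 4))
```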
2. Pooling the variances in 2 population mean average problems.
(Quantitative data)
You should not pool the sample variances unless you are sure the population
variances are equal. Since we rarely know the population variances, do not pool
the variances in StatCrunch.
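A quick Python sketch shows why pooling the variances can matter. The function names and numbers are made up for illustration, and the formulas are the standard textbook ones. When the sample sizes and standard deviations are very different, the pooled and unpooled standard errors can disagree a lot, and so would the test statistic.

```python
import math

def unpooled_se(sd1, n1, sd2, n2):
    """Standard error without assuming equal population variances."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def pooled_se(sd1, n1, sd2, n2):
    """Standard error that assumes the two population variances are equal."""
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Made-up summary data: a large group with small spread and a small group with large spread.
se_unpooled = unpooled_se(sd1=5, n1=40, sd2=15, n2=10)
se_pooled = pooled_se(sd1=5, n1=40, sd2=15, n2=10)
print("unpooled standard error:", round(se_unpooled, 3))
print("pooled standard error:", round(se_pooled, 3))
```

In this made-up case the pooled standard error comes out much smaller, which would inflate the test statistic. When the two groups have the same sample size and the same standard deviation, the two formulas give the same answer.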
Act 11 #1 (Matched Pair with summary data)
Group 1: After ACT scores
Group 2: Before ACT scores
HA: μ1 > μ2 (claim)
H0: μ1 = μ2
Note: Alternate way of writing null and alternative
HA: μ1 − μ2 > 0 (claim)
H0: μ1 − μ2 = 0
HA: μd > 0 (claim)
H0: μd = 0
Two Population mean average (matched pair with summary data: mean of the
differences, standard deviation of the differences sd, sample size n)
Stat => T-Stats => 1 sample => with summary =>
put in mean, standard deviation, sample size
T test statistic = +2.9166
Sample mean of the after scores was 2.92 standard errors above the sample mean of
the before scores.
After scores are significantly higher than before scores (class is effective)
P-value = 0.0044
If Ho is true, then there is a 0.0044 probability of getting the sample data (sample
difference) or more extreme by random chance.
(This is unlikely to happen by random chance, so Ho is probably wrong.)
P-value (0.0044) < sig level (0.05)
Reject Ho
Conclusion: There is significant sample evidence to support the claim that the
ACT prep class is effective. (After > Before)
Act 12/#2
Population 1: Marijuana
Population 2: Non-marijuana
HA: p1 > p2 (claim)
H0: p1 = p2
Note: in StatCrunch null and alternative
HA: p1 − p2 > 0 (claim)
H0: p1 − p2 = 0
Z test statistic = 6.85
Percentage of group 1 (marijuana users) was 6.85 standard errors above the
percentage of group 2 (non-marijuana users)
Percent of marijuana users that use other drugs is significantly greater.
P-value ≈ 0 (< 0.0001)
If Ho is true, there is almost 0 probability (less than 0.0001) of getting the sample
data (sample difference) or more extreme by random chance.
(Very unlikely to have happened by random chance. Population 1 is significantly
different from population 2.) Ho is wrong.
Reject Ho
Conclusion: There is significant sample evidence to support the claim that the percent
of marijuana users that use illegal drugs is higher than the percent of non-marijuana
users that use illegal drugs.