Hypothesis testing 101

What is a hypothesis?
A hypothesis is an idea or conjecture that we want to test to see whether it is true. In statistics
it usually concerns a population parameter. For example, we may have several ideas about a
population parameter, such as:
The average commute time is 40 minutes
The average life of the light bulb is at least 2000 hours
The average salary for men is greater than that of women
The average price for a gallon of gas in NY is equal to that in NJ
These are all ideas about population parameters (which describe the entire group) that we want to study
based on data from samples (a few observations). We cannot prove or disprove a hypothesis with
100% certainty; we can only draw conclusions based on the data we have.
What is the null hypothesis?
The null hypothesis, written as H0, states that a parameter is equal to a value or equal to another
parameter. Examples include:
The mean commute time is equal to 40 minutes
The mean life of a set of tires is 60,000 miles
The mean salary for men is the same as the mean salary for women.
Notice they all deal with equality.
What is the alternative hypothesis?
The alternative hypothesis, written as H1, states that a parameter is not equal to the value
provided or to another population parameter. This can be stated in the general sense (not equal, ≠) or
in the more specific sense (less than, <, or greater than, >). Some examples include:
The mean commute time is not equal to 40 minutes
The mean life of a set of tires is less than 60,000 miles
The mean salary for men is greater than the mean salary for women.
Notice they all deal with inequality in some way.
What is the difference between a one tailed and a two tailed test?
A two tailed test is used when the alternative hypothesis deals with the not equal to case. The one
tailed test deals specifically with the greater than or less than alternative hypothesis.
What is the difference between a left tailed and a right tailed test?
Both left and right tailed tests are forms of the one tailed hypothesis test. The left tailed deals with the
alternative hypothesis of less than, the right tailed deals with the alternative hypothesis of greater than.
What are some common words that are used when forming the alternative hypothesis?
Two Tailed (≠)
Is not equal to
Is different from
Is not the same as
Left Tailed (<)
Is less than
Is below
Is lower than
Is smaller than
Is decreased/reduced from
Right Tailed (>)
Is greater than
Is above
Is higher than
Is bigger than
Is increased
What does a hypothesis test do?
A hypothesis test cannot prove or disprove the null hypothesis. It can only tell us whether, based on
the data we have, there is enough evidence to reject the null hypothesis. We then interpret the decision
to reject or not in terms of the specific hypotheses we have.
How do we decide to reject the null hypothesis or not?
This is based on a comparison between the test statistic and a critical value. The test statistic is
calculated according to the type of test we are conducting (z, t, chi-square, etc.), and the critical value is
obtained from a reference table or from Excel. The rules for rejecting the null hypothesis depend
on the type of test we are doing. The following table provides a guideline. CV stands for critical value
(taken here as a positive number) and TS stands for test statistic.
Type of test     Reject H0 if:
Two tailed       TS > +CV or TS < -CV
Left Tailed      TS < -CV
Right Tailed     TS > +CV
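The rejection rules in the table above can be sketched as a small Python function. This is only an illustration: the function name `reject_null` and the tail labels "two", "left", and "right" are assumptions made for this sketch, and the critical value is passed in as a positive magnitude.

```python
def reject_null(ts: float, cv: float, tail: str) -> bool:
    """Return True if the rejection rule for the given type of test is met.

    ts   -- test statistic (TS)
    cv   -- critical value (CV), given as a positive magnitude
    tail -- "two", "left", or "right" (labels chosen for this sketch)
    """
    cv = abs(cv)  # guard against a negative value being passed in
    if tail == "two":
        return ts > cv or ts < -cv
    if tail == "left":
        return ts < -cv
    if tail == "right":
        return ts > cv
    raise ValueError(f"unknown tail: {tail!r}")


print(reject_null(2.5, 1.96, "two"))    # True: 2.5 > +1.96
print(reject_null(-1.0, 1.65, "left"))  # False: -1.0 is not below -1.65
```

Returning a plain True/False keeps the decision separate from the wording of the conclusion, which depends on the specific hypotheses.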
What is the alpha level?
The alpha level, or α, is the probability of making a Type I error. A Type I error occurs when you
reject the null hypothesis but it was in fact true. Obviously we don’t want to make errors, so we want to
keep this probability low. Common levels include α = 0.05, 0.10, or 0.01, that is, a 5%, 10%, or 1% chance of
making this error, respectively.
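A short simulation can make the meaning of alpha concrete. The sketch below repeatedly samples from a population where the null hypothesis really is true (mean commute time of 40 minutes), runs a two tailed z-test at α = 0.05 each time, and counts how often the null is rejected anyway. The population standard deviation of 10, the sample size of 25, the number of trials, and the seed are all assumptions chosen for illustration; the z statistic is the usual (sample mean minus hypothesized mean) divided by the standard error.

```python
import random
from math import sqrt


def type_i_error_rate(trials=10_000, n=25, mu=40.0, sigma=10.0, cv=1.96, seed=0):
    """Share of samples that reject H0: mu = 40 even though H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu) / (sigma / sqrt(n))
        if z > cv or z < -cv:  # two tailed rejection rule
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The printed share should land close to 0.05, which is the chance of a Type I error that we accepted when choosing α.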
How do I type out the hypotheses?
Typing out a hypothesis statement in equation form involves using symbols in Microsoft Word. To
type out a hypothesis, we insert an equation.
Steps:
1. Go to the Insert tab and choose Equation
2. Once you click on Equation, choose the Script option
3. Then choose the second option in the first row; this creates a template to put our
subscript in. Type H in the large box and either a 0 or a 1 in the little box, depending
on the hypothesis you are working on
4. You can find other symbols such as µ, ≠, etc. in the symbol region of the equation box. If you want
a symbol you can’t find, simply click on the down arrow to see more symbols
5. Your finished hypothesis should look something like this:
H0: µ = 5.0
HINT: To do the alternative, you can copy and paste the null and click on the equation to change
the = to the appropriate symbol
If you want to change the location of the equation, click on it, click the arrow that appears on the
right, then choose “Change to Inline”
What should be included in my hypothesis statement?
You should include the null and alternative hypotheses as well as the alpha level. An initial hypothesis
statement might look something like this:
H0: µ = 125
H1: µ ≠ 125
α = 0.05
How do I know if I should use a z-test or a t-test?
If you know the population standard deviation you should use the z-test. If you do not know the
population standard deviation but instead know the standard deviation of the sample you should use
the t-test.
How do I find the critical value?
-For a z-test
These are the critical values used for common alpha values:
Type of test     Alpha     Critical value
Left Tailed      0.10      -1.28
Left Tailed      0.05      -1.65
Left Tailed      0.01      -2.33
Right Tailed     0.10      1.28
Right Tailed     0.05      1.65
Right Tailed     0.01      2.33
Two Tailed       0.10      ±1.65
Two Tailed       0.05      ±1.96
Two Tailed       0.01      ±2.58
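The z critical values can also be computed rather than looked up. The sketch below uses `NormalDist().inv_cdf` from Python's standard `statistics` module; the function name `z_critical` and the tail labels are assumptions for this illustration. Note that the computed one tailed value for α = 0.05 is about 1.645, which reference tables round to 1.65 (or sometimes 1.64).

```python
from statistics import NormalDist


def z_critical(alpha: float, tail: str) -> float:
    """Critical value of the standard normal distribution.

    For a two tailed test the positive value is returned; use it as +/- CV.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # all of alpha in the left tail
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("left", "right", "two"):
    for alpha in (0.10, 0.05, 0.01):
        print(f"{tail:>5} tailed, alpha = {alpha:.2f}: {z_critical(alpha, tail):+.3f}")
```

Splitting alpha in half for the two tailed case is why a two tailed test at α = 0.10 shares its critical value with a one tailed test at α = 0.05.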
-For a t-test
First we must determine the degrees of freedom. This is done by taking the number of data points we
have and subtracting one.
For example, if my survey consisted of 22 data points, my degrees of freedom would be 22 - 1 = 21.
Excel will calculate critical values for a two tailed test:
-Insert a function
-Select the function TINV
-In the probability box place your alpha value
***If you are doing a one tailed test you need to compensate by doubling alpha first***
-In the deg_freedom box place your degrees of freedom
You can see the critical value appear in the box, or it will appear in the cell you selected after you push
OK.
***Some notes about critical value calculation***
If you are doing a left tailed test, you must first double alpha; this value goes in for your probability.
The critical value will then be the negative equivalent of the value provided.
If you are doing a right tailed test, be sure to double alpha before entering it into the probability
field.
If you are doing a two tailed test, there are two critical values: one is the value provided, the other is
the negative equivalent.
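For readers without Excel, the t critical value can be approximated with only Python's standard library. The sketch below is one possible approach, not the method Excel uses: it integrates the t density numerically (Simpson's rule) and searches for the cutoff by bisection. The names `t_pdf`, `upper_tail`, and `t_critical`, the tail labels, and the search settings are assumptions made for this illustration. Because the function works with the area in a single tail, it halves alpha for a two tailed test, which is the mirror image of doubling alpha for a one tailed test in TINV.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > x) for x >= 0 with Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha: float, n: int, tail: str) -> float:
    """Critical value for a sample of n data points (df = n - 1)."""
    df = n - 1
    one_tail_area = alpha / 2 if tail == "two" else alpha
    lo, hi = 0.0, 1000.0
    for _ in range(60):  # bisection: shrink [lo, hi] around the cutoff
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > one_tail_area:
            lo = mid
        else:
            hi = mid
    cv = (lo + hi) / 2
    return -cv if tail == "left" else cv  # two tailed: use as +/- cv


print(round(t_critical(0.05, 21, "two"), 3))   # about 2.086 (20 degrees of freedom)
print(round(t_critical(0.05, 22, "left"), 3))  # about -1.721 (21 degrees of freedom)
```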
How do I calculate my test statistic?
The general formula for the test statistic is
$$\text{Test value} = \frac{\text{Observed value} - \text{Expected value}}{\text{Standard error}}$$
The exact formulas depend on whether you are using a t-test or a z-test:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
$\bar{X}$ is the sample mean
$\mu$ is the hypothesized population mean (value used in the hypothesis)
$\sigma$ is the population standard deviation
$s$ is the sample standard deviation
$n$ is the number of data points/sample size
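The two test statistic formulas translate directly into code. In this minimal sketch the function names `z_statistic` and `t_statistic` are illustrative, and the commute numbers in the usage line (sample mean 43, population standard deviation 10, 25 commuters) are made up for the example.

```python
from math import sqrt


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z = (sample mean - hypothesized mean) / (population sd / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def t_statistic(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """t = (sample mean - hypothesized mean) / (sample sd / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Hypothetical commute example: H0 says the mean is 40 minutes.
print(z_statistic(43, 40, 10, 25))  # 1.5
```

The two functions differ only in which standard deviation they are given, which is exactly the difference between the z-test and the t-test.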
Where do I get all the information in the formulas from?
If you have raw data, all of the information except for the hypothesized value of the population mean
can be calculated in Excel. To do this, use the Data Analysis ToolPak to calculate descriptive statistics.
Click on the Data Analysis tool located in the Data tab (it is assumed you have already installed the Data
Analysis ToolPak). Then choose the Descriptive Statistics option.
Then highlight your data for the Input Range. Be sure to check the box marked Summary
statistics.
This will place your summary statistics in a new worksheet. Your output should look something like this.
It includes the sample mean, standard deviation, and sample size.
Column1
Mean                  131.3333
Standard Error        21.39096
Median                90
Mode                  85
Standard Deviation    98.02568
Sample Variance       9609.033
Kurtosis              5.596243
Skewness              2.229339
Range                 419
Minimum               40
Maximum               459
Sum                   2758
Count                 21
Or just the important stuff:
Cost
Mean                  131.3333333
Standard Deviation    98.02567691
Count                 21
These values go into the formula along with the hypothesized population mean (the value from your null
hypothesis).
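The same summary statistics can be reproduced outside Excel. The sketch below uses Python's standard `statistics` module on the 21 vet visit costs listed in the worked example of this handout; the variable names are assumptions for illustration. `stdev` computes the sample standard deviation (dividing by n - 1), which is the value the t-test needs.

```python
from math import sqrt
from statistics import mean, median, stdev

# The 21 vet visit costs (in dollars) from the worked example
costs = [125, 85, 109, 99, 250, 150, 212, 73, 90, 459, 153,
         90, 85, 74, 48, 67, 109, 88, 40, 72, 280]

n = len(costs)                        # Count
sample_mean = mean(costs)             # Mean
sample_sd = stdev(costs)              # Standard Deviation (sample, n - 1)
standard_error = sample_sd / sqrt(n)  # Standard Error

print(f"Count:              {n}")
print(f"Mean:               {sample_mean:.4f}")
print(f"Standard Deviation: {sample_sd:.5f}")
print(f"Standard Error:     {standard_error:.5f}")
print(f"Median:             {median(costs)}")
```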
How do I know if I reject my null hypothesis or not?
Now that you have the critical value and the test statistic, it is time to make a decision. Use this table to
determine if you should reject the null hypothesis or not.
Type of test     Reject H0 if:
Two tailed       TS > +CV or TS < -CV
Left Tailed      TS < -CV
Right Tailed     TS > +CV
What does that mean?
If you reject your null hypothesis, there is enough evidence to support the alternative hypothesis.
If you do not reject the null hypothesis, there is not enough evidence to conclude that it is false.
How do I word my explanation for my decision?
If you are rejecting the null hypothesis:
Since the test statistic of ___(fill in the value)___ is ___(fill in "larger" or "smaller", based on the type of
test)___ than the critical value of ___(fill in the value)___, we reject the null hypothesis.
If you are failing to reject the null hypothesis:
Since the test statistic of ___(fill in the value)___ is ___(fill in "larger" or "smaller", based on the type of
test)___ than the critical value of ___(fill in the value)___, we fail to reject the null
hypothesis.
How do I word the explanation as to what this all means?
If we are rejecting the null hypothesis:
There is enough evidence to reject the null hypothesis that ___(fill in what the null hypothesis is)___ and conclude
that ___(fill in what the alternative hypothesis is)___.
If we are failing to reject the null hypothesis:
There is not enough evidence to reject the null hypothesis that ___(fill in what the null hypothesis is)___.
Can you give me an example to show how it all works?
Sure I can! The data below give the total cost of the visit for each of 21 animals that visited the vet in one day.
Cost
$125
$85
$109
$99
$250
$150
$212
$73
$90
$459
$153
$90
$85
$74
$48
$67
$109
$88
$40
$72
$280
The vet would like to claim that the average vet visit costs $125.00. We will do a two tailed test to see whether
the mean cost is different from $125.
H0: µ = 125
H1: µ ≠ 125
α = 0.05
Critical Values: 2.086 and -2.086
(20 degrees of freedom for a two tailed test)
Test statistic (see the Excel calculations above for the descriptive statistics):
$$t = \frac{131.33 - 125}{98.03 / \sqrt{21}} = \frac{6.33}{21.39} \approx 0.296$$
Since the test statistic of 0.296 is smaller than the critical value of 2.086 (and larger than -2.086), we fail to
reject the null hypothesis.
Based on our data, there is not enough evidence to reject the null hypothesis that the mean cost
of a vet visit is $125.00.
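The worked example can be checked end to end with a few lines of Python. This sketch plugs in the summary statistics and the critical value quoted in the handout (mean 131.3333, standard deviation 98.02568, 21 data points, critical value 2.086); the variable names are assumptions for illustration. With these unrounded inputs the test statistic comes out at about 0.296.

```python
from math import sqrt

sample_mean = 131.3333  # from the descriptive statistics
sample_sd = 98.02568    # sample standard deviation
n = 21                  # number of vet visits
mu0 = 125.0             # hypothesized mean from H0: mu = 125
cv = 2.086              # two tailed, alpha = 0.05, 20 degrees of freedom

t = (sample_mean - mu0) / (sample_sd / sqrt(n))
reject = t > cv or t < -cv  # two tailed rejection rule

decision = "reject" if reject else "fail to reject"
print(f"t = {t:.3f}; we {decision} the null hypothesis")
# prints: t = 0.296; we fail to reject the null hypothesis
```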