Tests of significance
Confidence intervals are used when the goal of our analysis is to estimate an
unknown parameter in the population.
A second goal of a statistical analysis is to verify some claim about the process
on the basis of the data.
A test of significance is a procedure to assess the truth about a hypothesis using
the observed data. The results of the test are expressed in terms of a probability
that measures how well the data support the hypothesis.
Example: How much does a dial-up connection cost?
A consumer organization claimed that people paid $20 per month for Internet
access. A study was conducted to check the truthfulness of this claim. Data
from 50 users of commercial Internet service providers were collected in
August 2000.
These are the summary statistics computed in Excel (Data Analysis ToolPak):

Internet Fees
Mean                       20.900
Standard Error              1.081
Median                     20.000
Standard Deviation          7.646
Minimum                     8.000
Maximum                    50.000
Sum                      1045.000
Count                      50.000
Confidence Level(95.0%)     2.173   (margin of error)

Thus the 95% C.I. is
20.9 ± 2.173
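As a quick check of the table, the standard error and the interval endpoints can be recomputed from the summary statistics. This is a minimal sketch in Python rather than Excel; the variable names are illustrative, and the margin of error is taken directly from the Excel output instead of being derived from a critical value.

```python
import math

# Summary statistics copied from the Excel output above.
n = 50
mean = 20.900
std_dev = 7.646
margin = 2.173  # "Confidence Level(95.0%)" row, i.e. the margin of error

# Standard error of the mean: s / sqrt(n).
std_error = std_dev / math.sqrt(n)

# 95% confidence interval: mean plus or minus the margin of error.
ci_low = mean - margin
ci_high = mean + margin

# Critical value implied by the reported margin (margin / standard error).
implied_t = margin / std_error

print(f"standard error = {std_error:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"implied critical value = {implied_t:.2f}")
```

The recomputed standard error matches the 1.081 in the table, and the interval runs from about 18.7 to 23.1 dollars, so it contains the claimed value of $20.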
Statistical test
The researcher would like to show that the unknown average \(\mu\) (Internet
fees) is not equal to a specific value \(\mu_0\) ($20).
Hypothesis testing for one sample problem consists of three parts:
I) Null hypothesis (Ho): The unknown average \(\mu = \mu_0\)
II) Alternative hypothesis (Ha): The unknown average \(\mu\) is not equal to \(\mu_0\).
This is the hypothesis that the researcher would like to validate and it is
chosen before collecting the data!
There are three possible alternative hypotheses – choose only one!
1. Ha: \(\mu > \mu_0\)  (one-sided test)
2. Ha: \(\mu \neq \mu_0\)  (two-sided test)
3. Ha: \(\mu < \mu_0\)  (one-sided test)
III) The sample statistic computed from the data
\[ t^* = \frac{\text{sample mean} - \text{null value}}{\text{st. error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
If the null hypothesis Ho is true (the population mean is equal to the null
value \(\mu_0\)), then the t statistic is approximately t-distributed with n-1
degrees of freedom, provided the distribution of the data is roughly symmetric.
If n is large (n>30), then the t statistic is approximately standard normal
under the null hypothesis Ho.
We can use the distribution of the t statistic to measure how well the null
hypothesis explains the observed value t*!
Example
The researcher wants to test if people paid on average a fee different from
$20.
Ho: \(\mu = 20\)  (null hypothesis)
Ha: \(\mu \neq 20\)  (alternative hypothesis)
The test statistic is computed as
\[ t^* = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{20.90 - 20}{7.646/\sqrt{50}} = 0.832 \]
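The arithmetic behind t* can be reproduced in a few lines. This is only a sketch: the helper name t_statistic is made up for illustration, and the inputs are the summary statistics from the Excel table.

```python
import math

def t_statistic(sample_mean, null_value, std_dev, n):
    """One-sample t statistic: (sample mean - null value) / (s / sqrt(n))."""
    std_error = std_dev / math.sqrt(n)
    return (sample_mean - null_value) / std_error

# Internet fees example: mean 20.90, null value 20, s = 7.646, n = 50.
t_star = t_statistic(20.90, 20.0, 7.646, 50)
print(round(t_star, 3))  # 0.832
```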
How likely is the observed value of t* if the null hypothesis were true?
The answer to the question is given by computing the test p-value.
The p-value is the probability, computed assuming Ho is true, that the test
statistic is at least as extreme as its observed value.
For this example: p-value = TDIST(0.832, 49, 2) = 0.409131
Two-sided test:
If the alternative hypothesis is Ha: \(\mu \neq \mu_0\),
then the p-value is equal to the sum of the areas to the left of -|t*| and to the right
of |t*|.
In Excel: p-value “=TDIST(ABS(t*),df,2)” where df=n-1
One-sided test
If the alternative hypothesis is
1. Ha: \(\mu > \mu_0\), then the p-value is equal to the area to the right of t*.
In Excel: if t*>0 p-value “=TDIST(t*,df,1)” for df=n-1
          if t*<0 p-value “=1-TDIST(ABS(t*),df,1)”
2. Ha: \(\mu < \mu_0\), then the p-value is equal to the area to the left of t*.
In Excel: if t*<0 p-value “=TDIST(ABS(t*),df,1)” for df=n-1
          if t*>0 p-value “=1-TDIST(t*,df,1)”
These are called one-sided hypotheses, because they state that the true value is
larger (1) or smaller (2) than the hypothesized value in H0.
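The Excel rules for the three alternatives can be collected into one function. The sketch below is not Excel's implementation: tdist is a hypothetical stand-in for TDIST that integrates the t density numerically with Simpson's rule, and p_value and its "greater", "less", and "two-sided" labels are names chosen here for illustration.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def tdist(x, df, tails, steps=2000):
    """Stand-in for TDIST: P(T > x) if tails == 1, twice that if tails == 2."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Simpson's rule for the area between 0 and x (steps must be even);
    # by symmetry, P(T > x) = 0.5 - area.
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return tails * (0.5 - total * h / 3)

def p_value(t_star, df, alternative):
    """p-value for the 'greater', 'less', or 'two-sided' alternative."""
    one_tail = tdist(abs(t_star), df, 1)
    if alternative == "two-sided":
        return 2 * one_tail
    if alternative == "greater":  # area to the right of t*
        return one_tail if t_star > 0 else 1 - one_tail
    if alternative == "less":     # area to the left of t*
        return one_tail if t_star < 0 else 1 - one_tail
    raise ValueError("unknown alternative")

# Internet fees example: t* = 0.832 with df = 49.
for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value(0.832, 49, alt), 3))
```

For the Internet fees data the two-sided value comes out at roughly 0.41, and the two one-sided values add up to 1, as the areas to the right and left of t* must.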
Significance levels
In common statistical terminology:
• If p-value < 0.05, then the null hypothesis is rejected at 5% significance
level and the test result is called “statistically significant”.
• If p-value <0.01, then the null hypothesis is rejected at 1% significance
level and the test result is called “highly significant”.
• If p-value >= 0.05, then we cannot reject the null hypothesis, and the test
result is “not significant”.
Significance levels are very popular for reporting test results. However, it
is better practice to summarize the results by reporting which test was used,
the P-value, and whether the result was “statistically significant” or
“highly significant”.
Making a test of significance
Follow these steps:
1. Set up the null hypothesis H0, the hypothesis you want to test.
2. Set up the alternative hypothesis Ha, the hypothesis we accept if H0 is rejected.
3. Compute the value t* of the test statistic.
4. Compute the observed significance level P. This is the probability, calculated
assuming that H0 is true, of getting a test statistic as extreme or more extreme
than the observed one in the direction of the alternative hypothesis.
5. State a conclusion. You could choose a significance level \(\alpha\). If the P-value is
less than or equal to \(\alpha\), you conclude that the null hypothesis can be rejected at
level \(\alpha\); otherwise you conclude that the data do not provide enough evidence to
reject H0.
Warning! You can never prove a hypothesis. You can only show that its converse
is highly unlikely.
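Steps 3 to 5 can be sketched end to end for a two-sided alternative. To stay within the Python standard library, this sketch replaces TDIST with the standard normal distribution, which is an assumption justified only by the large sample (n > 30); the function name two_sided_test and the default level of 0.05 are illustrative choices.

```python
import math
from statistics import NormalDist

def two_sided_test(sample_mean, null_value, std_dev, n, alpha=0.05):
    """Steps 3-5 for Ha: mean != null_value, using the normal approximation."""
    # Step 3: test statistic.
    t_star = (sample_mean - null_value) / (std_dev / math.sqrt(n))
    # Step 4: two-sided P-value, approximating the t distribution by N(0, 1).
    p = 2 * (1 - NormalDist().cdf(abs(t_star)))
    # Step 5: compare the P-value with the chosen significance level.
    decision = "Reject H0" if p <= alpha else "Do not reject H0"
    return t_star, p, decision

# Internet fees example: H0: mean = 20 against Ha: mean != 20.
t_star, p, decision = two_sided_test(20.90, 20.0, 7.646, 50)
print(f"t* = {t_star:.3f}, P = {p:.2f}: {decision}")
```

The normal approximation gives a P-value of about 0.41, close to the t-based figure, and the conclusion is the same: the data do not provide enough evidence to reject H0.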
TDIST(x,df,tails)
x is the positive numeric value at which to evaluate the distribution.
df is an integer indicating the number of degrees of freedom.
tails specifies the number of distribution tails to return:
If tails = 1, TDIST returns the one-tailed distribution P(T>x).
If tails = 2, TDIST returns the two-tailed distribution
P(T<-x)+P(T>x)=2P(T>x).
Suppose that the researcher wanted to test if people paid a fee lower
than 20 dollars for Internet dial-up service.
The test hypotheses are:
Ho: \(\mu = 20\)
Ha: \(\mu < 20\)
The test statistic is the same:
\[ t^* = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{20.90 - 20}{7.646/\sqrt{50}} = 0.832 \]
The p-value is the probability P(T<t*). Because t*>0, it is computed as
1-TDIST(t*,df,1):
Ha: mu<20
p-value     0.795               =1-TDIST(0.832, 49, 1)
Decision    Do not reject Ho    =IF(p<0.05,"Reject Ho","Do not reject Ho")
What would a 1-tailed test look like? How can we achieve a lower p-value?
The IF function
IF(logical_test,value_if_true,value_if_false)
Logical_test is any value or expression that can be evaluated to TRUE or
FALSE.
Value_if_true is the value that is returned if logical_test is TRUE.
Value_if_false is the value that is returned if logical_test is FALSE.
For example:
IF(A10<=100,"Within budget","Over budget")
Returns "Within budget" if cell A10 is less than or equal to 100; otherwise the
function displays "Over budget".
In the example,
IF(A10=100,SUM(B5:B15),"")
if the value in cell A10 equals 100, then logical_test is TRUE
and the total value for the range B5:B15 is calculated.
Otherwise, logical_test is FALSE
and empty text ("") is returned (equivalent to a blank cell).
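For readers working outside Excel, the same two IF formulas map onto a conditional expression. This is a rough Python analogue, not part of the original slides; the function names and the sample cell values are invented for illustration.

```python
def budget_status(amount, limit=100):
    """Counterpart of IF(A10<=100,"Within budget","Over budget")."""
    return "Within budget" if amount <= limit else "Over budget"

def total_if_equal(value, cells, target=100):
    """Counterpart of IF(A10=100,SUM(B5:B15),""): sum the cells or return empty text."""
    return sum(cells) if value == target else ""

print(budget_status(80))               # Within budget
print(budget_status(120))              # Over budget
print(total_if_equal(100, [1, 2, 3]))  # 6
```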