MODULE V
TESTING OF HYPOTHESIS
HYPOTHESIS
Whenever we have a decision to make about a population characteristic, we make a hypothesis.
Some examples are:
µ > 3
or
µ ≠ 5.
Suppose that we want to test the hypothesis that µ ≠ 5. Then we can think of our opponent
suggesting that µ = 5. We call the opponent's hypothesis the null hypothesis and write:
H0: µ = 5
and we call our hypothesis the alternative hypothesis and write:
H1: µ ≠ 5
For the null hypothesis we always use equality, since we are comparing µ with a previously
determined mean.
For the alternative hypothesis, we have the choices: <, >, or ≠.
Procedures in Hypothesis Testing
When we test a hypothesis we proceed as follows:
1. Formulate the null and alternative hypothesis.
2. Choose a level of significance.
3. Determine the sample size. (Same as confidence intervals)
4. Collect data.
5. Calculate z (or t) score.
6. Utilize the table to determine if the z score falls within the acceptance region.
7. Decide to
a. Reject the null hypothesis and therefore accept the alternative
hypothesis or
b. Fail to reject the null hypothesis and therefore state that there is not
enough evidence to suggest the truth of the alternative hypothesis.
Errors in Hypothesis Tests
We define a type I error as the event of rejecting the null hypothesis when the null hypothesis
was true. The probability of a type I error (α) is called the significance level.
We define a type II error (with probability β) as the event of failing to reject the null
hypothesis when the null hypothesis was false.
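The meaning of the significance level can be illustrated by simulation. The sketch below is only an illustration under assumed values (µ = 5, σ = 1, n = 30, 2000 repetitions, all hypothetical): it draws samples for which H0 is true and counts how often a two-tailed z test at the 5% level rejects anyway. The observed fraction should be close to α.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(mu0=5.0, sigma=1.0, n=30, alpha=0.05, trials=2000, seed=0):
    """Estimate how often a two-tailed z test rejects a TRUE H0: mu = mu0."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Sample from a population where H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:                        # a type I error
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated type I error rate: {rate:.3f}")
```

Every rejection counted here is a type I error, because the data were generated with H0 true.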
Hypothesis Testing For a Population Mean
The Idea of Hypothesis Testing
Suppose we want to show that only children have an average higher cholesterol level than the
national average. It is known that the mean cholesterol level for all Americans is 190.
Construct the relevant hypothesis test:
H0: µ = 190
H1: µ > 190
We test 100 only children and find that
x̄ = 198
and suppose we know the population standard deviation
σ = 15.
Do we have evidence to suggest that only children have an average higher cholesterol level
than the national average? We have
z = (x̄ - µ)/(σ/√n) = (198 - 190)/(15/√100) = 8/1.5 = 5.33
z is called the test statistic.
Since z is so high, a sample mean this far above 190 would be extremely unlikely if H0 were
true, so we decide to reject H0 and accept H1. Therefore, we can conclude that only children
have a higher cholesterol level on the average than the national average.
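The arithmetic of the cholesterol example can be checked with a few lines of Python. This is a minimal sketch, assuming the usual large-sample formula z = (x̄ - µ)/(σ/√n); the function name z_statistic is illustrative and does not come from the notes.

```python
import math
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """z score of a sample mean under H0: mu = mu0, with sigma known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

# Values from the example: n = 100, sample mean 198, mu0 = 190, sigma = 15.
z = z_statistic(198, 190, 15, 100)

# Right-tail probability of a z at least this large when H0 is true.
p_right = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"right-tail probability = {p_right:.2e}")
```

The tail probability is far below any customary significance level, which is what "z is so high" means in practice.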
Rejection Regions
Suppose that α = .05. For a two tailed test we can draw the standard normal curve and find the
z scores that cut off an area of .025 in each tail, namely -1.96 and 1.96. We call the two
tails outside these values the rejection region, since if the value of z falls in these
regions, the observed result would be very unlikely under the null hypothesis, so we can
reject the null hypothesis.
Example
50 smokers were questioned about the number of hours they sleep each day. We want to test
the hypothesis that the smokers need less sleep than the general public which needs an average
of 7.7 hours of sleep. We follow the steps below.
A. Compute a rejection region for a significance level of .05.
B. If the sample mean is 7.5 and the population standard deviation is 0.5,
what can you conclude?
Solution
First, we write down the null and alternative hypotheses:
H0: µ = 7.7
H1: µ < 7.7
This is a left tailed test. The z-score that corresponds to .05 is -1.645. The critical region is the
area that lies to the left of -1.645. If the z-value is less than -1.645, then we will reject the null
hypothesis and accept the alternative hypothesis. If it is greater than -1.645, we will fail to
reject the null hypothesis and say that the test was not statistically significant.
We have
z = (x̄ - µ)/(σ/√n) = (7.5 - 7.7)/(0.5/√50) = -2.83
Since -2.83 is to the left of -1.645, it is in the critical region. Hence we reject the null
hypothesis and accept the alternative hypothesis. We can conclude that smokers need less
sleep.
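The two parts of the smokers example (finding the rejection region, then locating the test statistic) can be sketched in Python as follows. The critical value is computed with statistics.NormalDist from the standard library rather than read from a table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.05
mu0, x_bar, sigma, n = 7.7, 7.5, 0.5, 50   # values from the example

# A. Left-tailed rejection region: z values below the 5th percentile.
z_crit = NormalDist().inv_cdf(alpha)        # about -1.645

# B. Test statistic for the observed sample mean.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

reject_h0 = z < z_crit
print(f"critical value = {z_crit:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```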
p-values
There is another way to interpret the test statistic. In hypothesis testing, we make a yes or no
decision without discussing borderline cases. For example with α = .06, a two tailed test will
indicate rejection of H0 for a test statistic of z = 2 or for z = 6, but z = 6 is much stronger
evidence than z = 2. To show this difference we report the p-value, which is the lowest
significance level at which we would still reject H0. For a one tailed test, p is the tail area
beyond the test statistic given by the table; for a two tailed test, p is twice that tail area.
Example:
Suppose that we want to test the hypothesis with a significance level of .05 that the climate has
changed since industrialization. Suppose that the mean temperature throughout history is 50
degrees. During the last 40 years, the mean temperature has been 51 degrees and suppose the
population standard deviation is 2 degrees. What can we conclude?
We have
H0: µ = 50
H1: µ ≠ 50
We compute the z score:
z = (x̄ - µ)/(σ/√n) = (51 - 50)/(2/√40) = 3.16
The table gives us .9992
so that
p = (1 - .9992)(2) = .0016, or about .002.
Since .002 < .05, we can conclude that there has been a change in temperature.
Small p-values will result in a rejection of H0 and large p-values will result in failing to reject
H0.
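The p-value of the temperature example can be reproduced without a table. The sketch below assumes the two tailed rule stated above (twice the tail area beyond |z|) and uses statistics.NormalDist for the normal curve; it is an illustration, not part of the original notes.

```python
import math
from statistics import NormalDist

mu0, x_bar, sigma, n = 50, 51, 2, 40   # values from the example
alpha = 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Two tailed p-value: twice the area beyond |z| under the standard normal curve.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The computed p is slightly smaller than the table-based .002 because the table value .9992 is rounded.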
I. Testing of significance for single proportion
This test is used to decide whether the difference between the proportion in the sample and
the proportion in the population is significant.
If X is the number of successes in n independent trials with constant probability of success P
for each trial, then
E(X) = nP; V(X) = nPQ, where Q = 1 - P = probability of failure.
For the sample proportion p = X/n:
E(p) = E(X/n) = (1/n) E(X) = P
V(p) = PQ/n
S.E.(p) = √(PQ/n)
Test statistic Z = (p - P)/√(PQ/n)
Problems
1. A coin is tossed 256 times and 132 heads are obtained. Would you conclude that the
coin is biased?
H0: The coin is unbiased (P = 0.5)
H1: The coin is biased (P ≠ 0.5)
n = 256
Number of successes X = 132
p = proportion of successes in the sample = X/n = 132/256 = .5156
P = 0.5
Q = 1 - P = 0.5
Test statistic Z = (p - P)/√(PQ/n) = (.5156 - .5)/√(.5 × .5/256) = .0156/.03125 = .4992
Since |Z| < 1.96, we do not reject H0 at the 5% level of significance.
There is no evidence that the coin is biased.
2. A sample of 600 persons selected at random from a large city shows that the
percentage of males in the sample is 53. It is believed that the ratio of males to the total
population in the city is ½. Test whether the belief is confirmed by the observation.
H0: The ratio of males to the total population is ½ (P = ½)
H1: P ≠ ½
p = .53
P = 0.5; n = 600
Z = (p - P)/√(PQ/n) = (.53 - .5)/√((½ × ½)/600) = 1.47
Since |Z| < 1.96, we do not reject H0 at the 5% level of significance.
That is, the observation is consistent with the belief.
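Both proportion problems use the same statistic, so one small function covers them. This is a minimal sketch; the name one_proportion_z is illustrative. Note that the coin problem gives exactly 0.5 when p is not rounded, whereas the hand calculation with p rounded to .5156 gives .4992.

```python
import math

def one_proportion_z(successes, n, P):
    """Z statistic for H0: population proportion = P."""
    p = successes / n                  # sample proportion
    Q = 1 - P
    return (p - P) / math.sqrt(P * Q / n)

z_coin = one_proportion_z(132, 256, 0.5)    # problem 1: 132 heads in 256 tosses
z_males = one_proportion_z(318, 600, 0.5)   # problem 2: 53% of 600 is 318 males

for name, z in [("coin", z_coin), ("males", z_males)]:
    decision = "reject H0" if abs(z) > 1.96 else "do not reject H0"
    print(f"{name}: Z = {z:.4f} -> {decision} at the 5% level")
```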
II. Testing of significance for difference of proportions of success in
two samples:
Suppose n1 and n2 are the sizes of two samples taken from two different populations. To test
the significance of the difference between the sample proportions p1 and p2,
we set up the test statistic
Z = (p1 - p2)/√(PQ(1/n1 + 1/n2))
where P = (n1p1 + n2p2)/(n1 + n2)
and Q = 1 - P
Problems
1. In a rural area where no development was undertaken, 160 out of a sample of 250
farmers were indebted. In another area, where development work was in progress, 84
out of a sample of 150 farmers were indebted. Would you consider that the latter area is
enjoying greater prosperity, as indicated by a lower percentage of indebted farmers?
p1 = 160/250 = 0.64
p2 = 84/150 = 0.56
H0: p1 = p2
H1: p1 > p2
P = (160 + 84)/(250 + 150) = 244/400 = 0.61; Q = 0.39
Z = (p1 - p2)/√(PQ(1/n1 + 1/n2)) = (0.64 - 0.56)/√((.61)(.39)(1/250 + 1/150)) = 1.589
|Z| < 1.645 (5% level, one tailed)
H0 is accepted.
There is no significant difference between the levels of indebtedness of farmers in the two
areas.
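The pooled two-proportion statistic can be sketched as below, using the farmers' data. The function name two_proportion_z is illustrative; the one tailed critical value is computed with statistics.NormalDist instead of being read from a table.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion P."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)          # same as (n1*p1 + n2*p2)/(n1 + n2)
    Q = 1 - P
    return (p1 - p2) / math.sqrt(P * Q * (1 / n1 + 1 / n2))

# 160 of 250 farmers indebted in the first area, 84 of 150 in the second.
z = two_proportion_z(160, 250, 84, 150)
z_crit = NormalDist().inv_cdf(0.95)    # one tailed 5% critical value, about 1.645

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}")
print("reject H0" if z > z_crit else "do not reject H0")
```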
2. Out of a sample of 1000 persons, 800 persons were found to be coffee drinkers.
Subsequently, the excise duty on coffee was increased. After the increase in excise duty on
coffee seeds, 800 people were found to take coffee out of a sample of 1200. Test whether
there is any significant decrease in the consumption of coffee after the increase in excise
duty.
H0: p1 = p2
H1: p1 > p2
n1 = 1000, n2 = 1200
p1 = 800/1000 = 0.8
p2 = 800/1200 = 0.667
P = (800 + 800)/(1000 + 1200) = 8/11
Q = 3/11
Z = (p1 - p2)/√(PQ(1/n1 + 1/n2)) = (0.8 - 0.667)/√((8/11)(3/11)(1/1000 + 1/1200)) = 0.1333/0.01907 = 6.99
Since Z > 1.645, the null hypothesis is rejected at the 5% level of significance. There is a
significant decrease in the consumption of coffee.
Test III: Test of significance for Single Mean
It is used to test whether the given sample of size n has been drawn from a population with
mean µ.
Z = (x̄ - µ)/(σ/√n)
where σ is the standard deviation of the population.
If σ is not known, we use the test statistic Z = (x̄ - µ)/(s/√n), where s is the standard
deviation of the sample.
1. A test was given to a large group of boys who scored on the average 64.5 marks. The
same test was given to a group of 400 boys who scored an average of 62.5 marks with a
S.D 12.5 marks. Examine if the difference is significant.
H0: µ = 64.5
H1: µ ≠ 64.5
Here n = 400, s = 12.5, x̄ = 62.5, µ = 64.5
Z = (x̄ - µ)/(s/√n) = (62.5 - 64.5)/(12.5/20) = -3.2
|Z| = 3.2 > 1.96
Hence, we reject the null hypothesis. The difference is significant.
Test IV: Test of significance for Difference of Means
The test statistic is Z = (x̄1 - x̄2)/√(σ1²/n1 + σ2²/n2)
Under H0: µ1 = µ2, if the samples are drawn from the same population where
σ1 = σ2 = σ,
Z = (x̄1 - x̄2)/(σ √(1/n1 + 1/n2))
If σ1, σ2 are not known and σ1 ≠ σ2, the test statistic is
Z = (x̄1 - x̄2)/√(s1²/n1 + s2²/n2)
Eg: 1. The means of two large samples of sizes 2000 and 1000 are 68 gm and 67.5 gm
respectively. Can the samples be regarded as drawn from the same population of standard
deviation 2.25 gm?
n1 = 2000, x̄1 = 68 gm
n2 = 1000, x̄2 = 67.5 gm
σ = 2.25 gm
H0: µ1 = µ2
H1: µ1 ≠ µ2
Z = (x̄1 - x̄2)/(σ √(1/n1 + 1/n2)) = (68 - 67.5)/(2.25 × √(1/2000 + 1/1000)) = 0.5/(2.25 × √0.0015) = 5.74
|Z| > 1.96 and |Z| > 2.58
Reject H0 at both the 1% and 5% levels of significance.
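The large-sample comparison of two means with a common known σ can be sketched as follows. The function name two_sample_z is illustrative; the critical values 1.96 and 2.58 are the 5% and 1% two tailed values used in the notes.

```python
import math

def two_sample_z(x_bar1, x_bar2, sigma, n1, n2):
    """Z statistic for H0: mu1 = mu2 when both samples share a known sigma."""
    return (x_bar1 - x_bar2) / (sigma * math.sqrt(1 / n1 + 1 / n2))

# Sample means 68 gm and 67.5 gm, sizes 2000 and 1000, sigma = 2.25 gm.
z = two_sample_z(68, 67.5, 2.25, 2000, 1000)

print(f"Z = {z:.2f}")
for level, z_crit in [("5%", 1.96), ("1%", 2.58)]:
    verdict = "reject H0" if abs(z) > z_crit else "do not reject H0"
    print(f"{level} level: {verdict}")
```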
Test of significance for small samples
When the sample is small (n < 30), the sampling distribution in most cases may not be normal.
We will not be justified in estimating the population parameters as equal to the corresponding
sample values. Hence in the study of small samples, the test statistics will change.
Let x1, x2, x3, ..., xn be a random sample of size n from a normal population with mean µ and
variance σ². Student's t statistic is defined as
t = (x̄ - µ)/(S/√n)
where x̄ is the sample mean, µ the population mean, S the sample standard deviation and n the
sample size.
Assumptions for t-test
1. The parent population from which the samples are drawn is normal.
2. The sample observations are independent.
3. The population standard deviation σ is unknown.
Test I: t-test of significance for single mean
t = (x̄ - µ)/(S/√n)
or
t = (x̄ - µ)/(s/√(n - 1)), with n - 1 degrees of freedom,
where S² = ∑(x - x̄)²/(n - 1) and s² = ∑(x - x̄)²/n, so that the two forms give the same value.
95% confidence limits are x̄ ± t0.05 S/√n.
99% confidence limits are x̄ ± t0.01 S/√n.
Problems
1. Sandal powder is packed into packets by a machine. A random sample of 12
packets is drawn and their weights(in kg) are found to be 0.49, 0.48, 0.47,0.48,0.49,
0.50, 0.51, 0.49, 0.48, 0.50, 0.51, 0.48. Test if the average packing can be taken as
0.5 Kg.
Solution
H0: µ = 0.5
H1: µ ≠ 0.5
∑x = 0.49 + 0.48 + 0.47 + 0.48 + 0.49 + 0.50 + 0.51 + 0.49 + 0.48 + 0.50 + 0.51 + 0.48
= 5.88
x̄ = 5.88/12 = 0.49
s² = (∑x²/n) - (x̄)² = 0.24025 - 0.2401 = 0.00015
s = 0.012
|t| = |x̄ - µ|/(s/√(n - 1)) = |0.49 - 0.5|/(0.012/√11) = 2.76
d.f. = n - 1 = 11
t-value from table = 2.20
|t| > 2.20. Hence H0 is rejected. That is, the average packing cannot be taken to be 0.5 kg.
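The packet-weight problem can be recomputed directly from the raw data. The sketch below follows the convention of the notes (s² with divisor n, then s/√(n - 1) in the denominator) and takes the critical value 2.20 from the table as quoted. Because it does not round s to 0.012, it gives |t| of about 2.71 rather than 2.76; the decision is unchanged.

```python
import math

weights = [0.49, 0.48, 0.47, 0.48, 0.49, 0.50,
           0.51, 0.49, 0.48, 0.50, 0.51, 0.48]   # kg, from the problem
mu0 = 0.5
t_table = 2.20          # 5% two tailed value for 11 d.f., as quoted in the notes

n = len(weights)
x_bar = sum(weights) / n
s2 = sum((w - x_bar) ** 2 for w in weights) / n   # sample variance, divisor n
s = math.sqrt(s2)

t = (x_bar - mu0) / (s / math.sqrt(n - 1))
df = n - 1

print(f"mean = {x_bar:.2f}, s = {s:.4f}, |t| = {abs(t):.2f}, d.f. = {df}")
print("reject H0" if abs(t) > t_table else "do not reject H0")
```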
2. A sample of 20 items has mean 42 units and S.D 5 units. Test the hypothesis that it
is a random sample from a normal population with mean 45 units.
H0: µ = 45
H1: µ ≠ 45; n = 20, x̄ = 42; s = 5
|t| = |42 - 45|/(5/√19) = 2.61; d.f. = 19; table value of t = 2.09
Since |t| > 2.09, we reject H0: the sample could not have come from this population.
Test II: t-test for difference of means
The following assumptions are made in using this test.
i) Parent populations from which the samples have been drawn are normally distributed.
ii) Population variances are equal and unknown.
iii) The two samples are random and independent.
Test statistic t = (x̄1 - x̄2)/√{[(n1s1² + n2s2²)/(n1 + n2 - 2)] [1/n1 + 1/n2]}
with n1 + n2 - 2 degrees of freedom.
Problem:
1. A group of 10 rats fed on diet A and another group of 8 rats fed on diet B recorded the
following increases in weight (gms):
Diet A: 5, 6, 8, 1, 12, 4, 3, 9, 6, 10
Diet B: 2, 3, 6, 8, 10, 1, 2, 8
H0: µ1 = µ2 (the difference is not significant)
H1: µ1 > µ2
x̄1 = 6.4, x̄2 = 5, n1s1² = 102.4, n2s2² = 82
Test statistic t = (x̄1 - x̄2)/√{[(n1s1² + n2s2²)/(n1 + n2 - 2)] [1/n1 + 1/n2]}
= (6.4 - 5)/√{[(102.4 + 82)/16] [1/10 + 1/8]}
= (6.4 - 5)/√2.593125
= 0.869
d.f. = 16
Table value at 5% level is 1.75
|t| < table value. The difference is not significant. Therefore we cannot conclude that diet A is
superior to diet B.
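The diet problem can be recomputed from the raw weights. In this sketch the sums of squared deviations play the role of n1s1² and n2s2² in the formula of the notes; the function name pooled_t is illustrative, and the critical value 1.75 is the table value quoted above.

```python
import math

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 10, 1, 2, 8]

def pooled_t(x, y):
    """t statistic and degrees of freedom for H0: mu1 = mu2, equal variances."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)   # equals n1 * s1^2
    ss2 = sum((v - m2) ** 2 for v in y)   # equals n2 * s2^2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

t, df = pooled_t(diet_a, diet_b)
t_table = 1.75            # one tailed 5% value for 16 d.f., from the notes

print(f"t = {t:.3f}, d.f. = {df}")
print("significant" if t > t_table else "not significant")
```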
F-Test of significance
The F-test is a test for the equality of population variances. It is
used to test whether two independent samples have been drawn from normal populations
with the same variance, or whether the two independent estimates of the population variance
are homogeneous or not.
If s1² and s2² are the variances of two samples of sizes n1 and n2 respectively, the
estimates of the population variance based on these samples are respectively
S1² = n1s1²/(n1 - 1) and S2² = n2s2²/(n2 - 1). The quantities v1 = n1 - 1 and v2 = n2 - 1 are called the
degrees of freedom of these estimates.
F = S1²/S2², where S1² > S2² (the larger estimate is placed in the numerator).
Problems
1. In one sample of 10 observations, the sum of the squares of the deviations of the sample
values from the sample mean was 120 and in another sample of 12 observations it was 314.
Test whether this difference is significant at 5% level of significance.
H0: σ1² = σ2²
S1² = 120/9 = 13.33
S2² = 314/11 = 28.55
F = S2²/S1² = 28.55/13.33 = 2.14, since S2² > S1²
The value of F at 5% level for v1 = 11, v2 = 9 d.f. is 3.11
Since F < F0.05 we accept H0. The difference is not significant.
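The variance-ratio calculation can be sketched as a small function that always places the larger estimate in the numerator and reports the degrees of freedom in the matching order. The name f_ratio is illustrative, and the critical value 3.11 is the table value quoted in the problem.

```python
def f_ratio(ss1, n1, ss2, n2):
    """F statistic from two sums of squared deviations about the sample means.

    Returns (F, (numerator d.f., denominator d.f.)) with the larger
    variance estimate in the numerator.
    """
    v1 = ss1 / (n1 - 1)       # unbiased estimate S1^2
    v2 = ss2 / (n2 - 1)       # unbiased estimate S2^2
    if v1 >= v2:
        return v1 / v2, (n1 - 1, n2 - 1)
    return v2 / v1, (n2 - 1, n1 - 1)

# Sums of squares 120 (n = 10) and 314 (n = 12) from the problem.
F, dfs = f_ratio(120, 10, 314, 12)
f_table = 3.11            # 5% value for (11, 9) d.f., from the notes

print(f"F = {F:.2f} with d.f. {dfs}")
print("reject H0" if F > f_table else "do not reject H0")
```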
3. Two random samples gave the following results.
Sample   Size   Sample mean   Sum of squares of deviations from the mean
1        12     14            108
2        10     15            90
Test whether the samples came from the same population.
To test whether the two independent samples have been drawn from the same
normal population, we have to test
i) the equality of population means and
ii) the equality of population variances.
H0: The two samples have been drawn from the same normal population
H1: The two samples are drawn from different populations.
i) To test σ1² = σ2²
S1² = 108/11 = 9.818; S2² = 90/9 = 10
F = 10/9.818 = 1.0185
Tabulated F at 5% level for (9, 11) d.f. = 2.90
Calculated F = 1.0185 < 2.90
Hence, the samples came from populations of equal variance.
ii) To test µ1 = µ2
Pooled variance = (108 + 90)/(12 + 10 - 2) = 9.9
t = (14 - 15)/√(9.9 × 22/(10 × 12)), so |t| = 0.742 with 20 d.f.
Table value of t at 5% level for 20 d.f. = 2.086
Calculated |t| < table t value. Hence we accept the null hypothesis.
Therefore we conclude that the two samples have been drawn from the same normal
population.
Chi- Square Test of goodness of fit
This is a powerful test for testing the significance of the discrepancy between the theory and
experiment. It helps us to find if the deviation of the experiment from theory is just by chance
or it is due to inadequacy of the theory to fit the observed data.
If Oi (i = 1, 2, 3, ..., n) is a set of observed frequencies and Ei (i = 1, 2, 3, ..., n) is the
corresponding set of expected frequencies, then
χ² = ∑ (Oi - Ei)²/Ei, summed over i = 1 to n, with the condition ∑Oi = ∑Ei,
follows the chi-square distribution with (n - 1) degrees of freedom.
Eg:
1. The following figures show the distribution of digits in numbers chosen at random from a
telephone directory.
Digit:      0     1     2    3    4     5    6     7    8    9    Total
Frequency:  1026  1107  997  966  1075  933  1107  972  964  853  10000
Test whether the digits may be taken to occur equally frequently in the directory.
H0: Digits occur equally frequently
H1: Digits are not equally frequent.
Expected frequency of each digit = 10000/10 = 1000
Chi-square = (1/1000)[26² + 107² + ... + (-147)²]
= 58.54
d.f. = 10 - 1 = 9
Table value of chi-square at 5% level = 16.919
Since 58.54 > 16.919, reject the null hypothesis.
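The goodness-of-fit statistic for the telephone-directory digits can be computed as below. This is a minimal sketch; the critical value 16.919 is the table value quoted in the example, not computed by the code.

```python
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]  # digits 0..9

# Under H0 every digit is equally likely, so each expected count is total/10.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi_table = 16.919        # 5% value for 9 d.f., from the notes

print(f"chi-square = {chi_square:.2f}, d.f. = {df}")
print("reject H0" if chi_square > chi_table else "do not reject H0")
```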
Additional Questions
I. Explain the following terms:
1) Null Hypothesis
2) Alternative Hypothesis
3) Type I & Type II errors
4) Level of Significance
5) Acceptance Region & Rejection Region
II. Given that on the average 4% of insured men of age 65 die within a year and that 60 of
a particular group of 1000 such men died within a year, can this group be regarded as a
representative sample?
III. In a year there are 956 births in town A, of which 52.5% were males, while in towns A and
B combined the proportion of males in a total of 1406 births was .496. Is there any significant
difference in the proportion of male births in the two towns?
IV. A sample of 100 items selected from a lot of 2000 items gives the average diameter of the
items as 0.354 with a s.d. of 0.048. Find the 95% confidence interval for the average of the lot.
V. Intelligence tests were given to two groups of boys and girls.
Girls: Mean = 75, S.D. = 8, n1 = 60
Boys: Mean = 73, S.D. = 10, n2 = 100
Examine if the difference between mean scores is significant.