Download File - Maths Web World

TEST OF HYPOTHESIS
large samples
In many circumstances, to arrive at decisions about the population on the basis of sample
information we make assumptions about the population parameters involved. Such an
assumption is called a statistical hypothesis, which may or may not be true. The procedure
which enables us to decide, on the basis of sample results, whether a hypothesis is to be
accepted or rejected is called a Test of Hypothesis or Test of Significance.
Procedure for testing a hypothesis.
Test of Hypothesis involves the following steps:
1. Statement of hypothesis: There are two types of hypothesis: (a) Null hypothesis
(b) Alternative hypothesis.
Null Hypothesis: For applying the tests of significance, we first set up a hypothesis, a
definite statement about the population parameter. Such a hypothesis, usually a
hypothesis of no difference, is called a Null Hypothesis. It is of the form $H_0: \mu = \mu_0$.
A null hypothesis asserts that there is no significant difference
between the statistic and the population parameter, and that whatever difference is observed
is merely due to fluctuations in sampling from the same population.
Alternative Hypothesis: Any hypothesis which contradicts the Null Hypothesis is called
an Alternative Hypothesis. It is usually denoted by $H_1$. The alternative hypothesis
would be (a) $H_1: \mu \neq \mu_0$ (b) $H_1: \mu > \mu_0$ (c) $H_1: \mu < \mu_0$. Alternative
(a) is called a two-tailed alternative, (b) is called the right-tailed alternative and (c) is
called the left-tailed alternative.
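2. Specification of the level of significance: The level of significance is denoted by $\alpha$.
It is the probability of rejecting the null hypothesis when it is in fact true, so $1 - \alpha$ is
the confidence with which we accept or reject the null hypothesis. It is usually
taken as 5%.
3. Test statistic: Compute the test statistic $Z = \dfrac{t - E(t)}{S.E.(t)}$ under the null hypothesis,
where $t$ is the sample statistic, $E(t)$ its expected value and $S.E.(t)$ its standard error.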
4. Conclusion: We compare the computed value of the test statistic $Z$ with the critical value $Z_\alpha$
at the given level of significance.
If $|Z| < Z_\alpha$, i.e. if the absolute value of the calculated value of $Z$ is less than the
critical value $Z_\alpha$, we conclude that it is not significant and we accept the null
hypothesis. Otherwise we reject the null hypothesis.
CRITICAL VALUES OF Z
Level of significance                    1%             5%             10%
Critical values for two-tailed test      |Z| = 2.58     |Z| = 1.96     |Z| = 1.645
Critical values for right-tailed test    Z = 2.33       Z = 1.645      Z = 1.28
Critical values for left-tailed test     Z = -2.33      Z = -1.645     Z = -1.28
Test of significance for large samples: Suppose we wish to test the hypothesis that the
probability of success in each trial is $p$. Assuming it to be true, the mean $\mu$ and the standard
deviation $\sigma$ of the sampling distribution of the number of successes in $n$ trials are $np$ and
$\sqrt{npq}$ respectively, where $q = 1 - p$.
If $x$ is the observed number of successes in the sample and $Z$ is the standard normal variate,
then $Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - np}{\sqrt{npq}}$.
Test of significance of a single mean - Large samples:
i. The null hypothesis: $H_0: \bar{x} = \mu$, i.e. there is no significant difference between the
sample mean and the population mean, or the sample has been drawn from the parent
population.
ii. The alternative hypothesis: $H_1: \bar{x} \neq \mu$ $(\mu \neq \mu_0)$ or $H_1: \bar{x} > \mu$ $(\mu > \mu_0)$ or
$H_1: \bar{x} < \mu$ $(\mu < \mu_0)$. Since $n$ is large, the sampling distribution of $\bar{x}$ is approximately
normal.
iii. Level of significance: Set the level of significance $\alpha$.
iv. Test statistic:
Case 1: When the standard deviation $\sigma$ of the population is known. In this case, the standard
error of the mean is $S.E.(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, where $n$ = sample size and $\sigma$ = standard deviation of the
population. The test statistic is given by $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, where $\mu$ is the population mean.
Case 2: When the standard deviation $\sigma$ of the population is not known. In this case, we take
$s$, the standard deviation of the sample, to compute the standard error of the mean. It is given
by $S.E.(\bar{x}) = \dfrac{s}{\sqrt{n}}$. Hence the test statistic is given by $z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$.
v. Find the critical value $z_\alpha$ of $z$ at the level of significance $\alpha$ from the normal table.
vi. If $|z| < z_\alpha$, we accept the null hypothesis; otherwise we reject the null hypothesis.
PROBLEMS:
1. According to the norms established for a mechanical aptitude test, persons who are 18
years old have an average height of 73.2 with a standard deviation of 8.6. If 4 randomly
selected persons of that age averaged 76.7, test the hypothesis that $\mu = 73.2$ against the
alternative hypothesis $\mu > 73.2$ at the 0.01 level of significance.
2. A sample of 64 students has a mean weight of 70 kg. Can this be regarded as a sample
from a population with mean weight 56 kg and standard deviation 25 kg?
3. A sample of 900 members has a mean of 3.4 cm and S.D. 2.61 cm. Has this sample
been taken from a large population of mean 3.25 cm with S.D. 2.61 cm? If the
population is normal and its mean is unknown, find the 95% confidence limits of the true
mean.
4. A sample of 400 items is taken from a population whose standard deviation is 10. The
mean of the sample is 40. Test whether the sample has come from a population with
mean 38. Also calculate the 95% confidence interval for the population mean.
5. An ambulance service claims that it takes on average less than 10 minutes to reach
its destination in emergency calls. A sample of 36 calls has a mean of 11 minutes and a
variance of 16 (minutes squared). Test the claim at the 0.05 level of significance.
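As an illustration of the single-mean procedure, here is a minimal sketch applied to the figures of Problem 4 (n = 400, sample mean 40, population S.D. 10, hypothesised mean 38). The helper name `z_single_mean` is an assumption made for this example, and a two-tailed test at the 5% level is assumed.

```python
from math import sqrt

def z_single_mean(sample_mean, pop_mean, sd, n):
    """Large-sample z statistic for a single mean: (x_bar - mu) / (sd / sqrt(n))."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))

# Figures taken from Problem 4 above.
n, sample_mean, sigma, mu0 = 400, 40.0, 10.0, 38.0

z = z_single_mean(sample_mean, mu0, sigma, n)
z_crit = 1.96                      # two-tailed critical value at the 5% level
reject_h0 = abs(z) > z_crit

# 95% confidence limits for the population mean: x_bar +/- 1.96 * sigma / sqrt(n)
half_width = z_crit * sigma / sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)

print(f"z = {z:.2f}, reject H0: {reject_h0}")
print(f"95% confidence limits: {ci[0]:.2f} to {ci[1]:.2f}")
```

Here the computed z is well beyond 1.96, so under these assumptions the hypothesis that the population mean is 38 would be rejected.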
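TEST FOR EQUALITY OF TWO MEANS - LARGE SAMPLES:
(Test of significance for difference of means of two large samples)
Let $\bar{x}_1, \bar{x}_2$ be the sample means of two independent large random samples of sizes $n_1, n_2$ drawn
from two populations having means $\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$. To test
whether the two population means are equal, let the null hypothesis be $H_0: \mu_1 = \mu_2$ and the
alternative hypothesis be $H_1: \mu_1 \neq \mu_2$.
The standard error of the difference of the means is
$$S.E.(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $\sigma_1, \sigma_2$ are the standard deviations of the two populations.
To test whether there is any significant difference between $\bar{x}_1$ and $\bar{x}_2$ we use the
following test statistic:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
If the samples have been drawn from a population with common S.D. $\sigma$, then
$\sigma_1^2 = \sigma_2^2 = \sigma^2$ and
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
If $\sigma$ is not known, then it is estimated from the two sample variances $s_1^2, s_2^2$ by
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$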
Problems:
1. A research investigator is interested in studying whether there is a significant
difference in the salaries of MBA graduates in two metropolitan cities. A random
sample of size 100 from Mumbai yields an average income of 20150/-. Another
random sample of 60 from Chennai results in an average income of 20250/-. The
variances of the two populations are given as $\sigma_1^2 = 40000$ and $\sigma_2^2 = 32400$ respectively.
Test whether the difference is significant.
2. The mean lifetime of a sample of 10 electric bulbs was found to be 1456 hours with
S.D. of 423 hours. A second sample of 17 bulbs chosen from a different batch
showed a mean life of 1280 hours with S.D. of 398 hours. Is there a significant
difference between the means of the two batches?
3. A company claims that its bulbs are superior to those of its main competitor. A
study showed that a sample of 40 of its bulbs had a mean lifetime of 647 hours of
continuous use with S.D. 27 hours, while a sample of 40 bulbs made by its main
competitor had a mean lifetime of 638 hours of continuous use with S.D. of 31 hours.
Test the significance of the difference between the two means at the 5% level.
4. The nicotine contents in milligrams of two samples of tobacco were found to be as follows.
Find the standard error and confidence limits for the difference between the means at
the 5% level.
Sample-A: 24 27 26 23 25
Sample-B: 29 30 30 31 24 36
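The two-mean test with known population variances can be sketched as follows, using the figures of Problem 1 above (Mumbai and Chennai incomes). The function name `z_two_means` is hypothetical, and a two-tailed test at the 5% level is assumed since the problem does not state a level.

```python
from math import sqrt

def z_two_means(mean1, mean2, var1, var2, n1, n2):
    """z statistic for the difference of two large-sample means (variances known)."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Problem 1: Mumbai (n1 = 100) against Chennai (n2 = 60).
z = z_two_means(20150, 20250, 40000, 32400, 100, 60)
z_crit = 1.96                       # assumed: two-tailed test at the 5% level
significant = abs(z) > z_crit

print(f"z = {z:.3f}, significant difference: {significant}")
```

With these inputs the standard error is $\sqrt{400 + 540} = \sqrt{940}$, which gives $|z|$ of about 3.26.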
TEST OF SIGNIFICANCE FOR SINGLE PROPORTION - LARGE SAMPLES:
Suppose a large sample of size $n$ has a sample proportion $p$ of members possessing a certain
attribute. To test the hypothesis that the population proportion $P$ has a specified value $P_0$,
the test statistic
$$z = \frac{p - P}{\sqrt{PQ / n}}$$
is approximately normally distributed, where $p$ is the sample
proportion, $P$ is the population proportion and $Q = 1 - P$.
Note: (a) Limits for the population proportion $P$ are given by $p \pm 3\sqrt{\dfrac{pq}{n}}$; $q = 1 - p$.
(b) The confidence interval for the proportion $P$ for a large sample at the $\alpha$ level of significance is
$$p - z_{\alpha/2} \sqrt{\frac{PQ}{n}} < P < p + z_{\alpha/2} \sqrt{\frac{PQ}{n}}$$
where $Q = 1 - P$.
Problems:
1. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory
conformed to specifications. An examination of a sample of 200 pieces of equipment
revealed that 18 were faulty. Test his claim at the 5% level of significance.
2. In a big city 325 men out of 600 men were found to be smokers. Does this information
support the conclusion that the majority of men in this city are smokers?
3. A die was thrown 9000 times and of these 3220 yielded a 3 or 4. Is this consistent with
the hypothesis that the die was unbiased?
4. Among 900 people in a state 90 are found to be chapatti eaters. Construct 99%
confidence limits for the true proportion.
5. 20 people were attacked by a disease and only 18 survived. Will you reject the
hypothesis that the survival rate, if attacked by this disease, is 85% in favour of the
hypothesis that it is more, at the 5% level?
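A minimal sketch of the single-proportion test, applied to Problem 2 above (325 smokers among 600 men). It assumes the null hypothesis $P = 0.5$ against the right-tailed alternative $P > 0.5$ at the 5% level; the function name `z_single_proportion` is illustrative only.

```python
from math import sqrt

def z_single_proportion(successes, n, p0):
    """z statistic for a single proportion: (p - P0) / sqrt(P0 * Q0 / n)."""
    p = successes / n
    q0 = 1 - p0
    return (p - p0) / sqrt(p0 * q0 / n)

# Problem 2: 325 smokers out of 600 men; H0: P = 0.5, H1: P > 0.5 (assumed).
z = z_single_proportion(325, 600, 0.5)
z_crit = 1.645                      # right-tailed critical value at the 5% level
majority_supported = z > z_crit

print(f"z = {z:.3f}, majority supported: {majority_supported}")
```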
TEST OF SIGNIFICANCE FOR TWO PROPORTIONS - LARGE SAMPLES:
Let $p_1, p_2$ be the proportions in two large random samples of sizes $n_1, n_2$ drawn from two
populations having proportions $P_1, P_2$. To test whether the two population proportions are equal,
the null hypothesis is $H_0: P_1 = P_2$ and the alternative hypothesis is $H_1: P_1 \neq P_2$.
Assuming that the null hypothesis is true, the test statistic is defined as
$$z = \frac{p_1 - p_2}{\sqrt{pq \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}; \quad p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} \text{ and } q = 1 - p,$$
where $x_1, x_2$ are the numbers of members possessing the attribute in the two samples. The statistic $z$ is
approximately normally distributed with mean 0 and standard deviation 1.
Problems:
1. Random samples of 400 men and 600 women were asked whether they would like to
have a flyover near their residence. 200 men and 325 women were in favour of the
proposal. Test the hypothesis that proportions of men and women in favour of the
proposal are same at 5% level.
2. On the basis of their total scores, 200 candidates of a civil service examination are divided
into two groups, the upper 30% and the remaining 70%. Consider the first question of
the examination. Among the first group, 40 had the correct answer, whereas among the
second group, 80 had the correct answer. On the basis of these results, can one conclude
that the first question is not good at discriminating ability of the type being examined
here?
3. In two large populations, there are 30% and 25% respectively of fair haired people. Is
this difference likely to be hidden in samples of 1200 and 900 respectively from the two
populations?
4. In a random sample of 1000 persons from town A, 400 are found to be consumers of
wheat. In a sample of 800 from town B, 400 are found to be consumers of wheat. Do
these data reveal a significant difference between town A and town B, so far as the
proportion of wheat consumers is concerned?
5. In a city A, 20% of a random sample of 900 school boys has a certain slight physical
defect. In another city B, 18.5% of a random sample of 1600 school boys has the same
defect. Is the difference between the proportions significant at 0.05 level of significance?
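The two-proportion statistic can be sketched in code using Problem 1 above (200 of 400 men and 325 of 600 women in favour). The function name `z_two_proportions` is a hypothetical helper, not something defined in these notes.

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """z statistic for two proportions using the pooled estimate p = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)       # pooled proportion under H0: P1 = P2
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))

# Problem 1: men 200 of 400 in favour, women 325 of 600 in favour.
z = z_two_proportions(200, 400, 325, 600)
z_crit = 1.96                       # two-tailed critical value at the 5% level
reject_h0 = abs(z) > z_crit

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```

Here the pooled proportion is 0.525 and $|z|$ is about 1.29, which is below 1.96, so the hypothesis of equal proportions would be accepted at the 5% level.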
TEST OF HYPOTHESIS
SMALL samples
Degrees of Freedom: This is the number of values of a variable that may be
independently chosen. In general, the number of degrees of freedom is equal to the total number
of observations less the number of independent constraints imposed on the observations.
t-Distribution (or) Student's t-distribution: It is used for testing of hypothesis when the sample
size is small and the population S.D. $\sigma$ is not known.
If $x_1, x_2, x_3, \ldots, x_n$ is any random sample of size $n$ drawn from a normal population with
mean $\mu$ and variance $\sigma^2$, then the test statistic $t$ is defined by
$$t = \frac{\bar{x} - \mu}{S / \sqrt{n}},$$
where $\bar{x}$ = sample mean and
$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
is an unbiased estimate of $\sigma^2$. The test statistic $t = \dfrac{\bar{x} - \mu}{S / \sqrt{n}}$ is a
random variable having the t-distribution with $\nu = n - 1$ degrees of freedom.
Student's 't' Test: Let $\bar{x}$ = sample mean, $n$ = sample size, $s$ = standard deviation of the sample,
$\mu$ = mean of the population, supposed to be normal. Then Student's $t$ is defined by
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}},$$
where $s^2$ is the sample variance, $s^2 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
Note: If the standard deviation of the sample is given directly, then the test statistic is given by
$$t = \frac{\bar{x} - \mu}{S.D. / \sqrt{n - 1}}.$$
If $t_{0.05}$ is the table value of $t$ for $(n - 1)$ degrees of freedom at the 5% level of significance, then
the 95% confidence limits for $\mu$ are given by $\bar{x} \pm t_{0.05} \cdot \dfrac{S}{\sqrt{n}}$. Similarly, the 99% confidence limits for $\mu$
are given by $\bar{x} \pm t_{0.01} \cdot \dfrac{S}{\sqrt{n}}$.
Problems:
1. A sample of 26 bulbs gives a mean life of 990 hours with a S.D. of 20 hours. The
manufacturer claims that the mean life of bulbs is 1000 hours. Is the sample not up to the
standard?
2. A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with a
standard deviation of 648 psi. Use this information and the level of significance $\alpha = 0.05$
to test whether the true average compressive strength of the steel from which this sample
came is 58,000 psi.
3. A random sample of 10 boys had the following I.Q.s: 70, 120, 110, 101, 88, 83, 95, 98, 107
and 100. (a) Do these data support the assumption of a population mean I.Q. of 100?
(b) Find a reasonable range in which most of the mean I.Q. values of samples of 10 boys
lie.
4. A random sample from a company's very extensive files shows that the orders for a
certain kind of machinery were filled, respectively, in 10, 12, 19, 14, 15, 18, 11 and 13
days. Use the level of significance $\alpha = 0.01$ to test the claim that on the average such
orders are filled in 10.5 days. Choose the alternative hypothesis so that rejection of the null
hypothesis $\mu = 10.5$ days implies that it takes longer than indicated.
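As a small worked sketch of Student's t-test when the sample S.D. is given directly, the code below uses Problem 1 above (26 bulbs, mean 990, S.D. 20, claimed mean 1000). It assumes the S.D. is computed with divisor $n$, so the statistic uses $\sqrt{n - 1}$ as in the note above, and it assumes a two-tailed test at the 5% level. The critical value 2.060 is the standard table value of $t$ for 25 degrees of freedom, and the function name is illustrative.

```python
from math import sqrt

def t_from_sample_sd(sample_mean, pop_mean, sample_sd, n):
    """Student's t when the sample S.D. (divisor n) is given: (x_bar - mu) / (s / sqrt(n - 1))."""
    return (sample_mean - pop_mean) / (sample_sd / sqrt(n - 1))

# Problem 1: n = 26 bulbs, mean life 990 hours, S.D. 20 hours, claimed mean 1000 hours.
n = 26
t = t_from_sample_sd(990, 1000, 20, n)
dof = n - 1
t_crit = 2.060                      # table value, 25 d.f., two-tailed 5% level (assumed test)
reject_h0 = abs(t) > t_crit

print(f"t = {t:.2f} with {dof} d.f., reject H0: {reject_h0}")
```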
Student's 't' Test for difference of means: Let $\bar{x}, \bar{y}$ be the means of two independent samples of
sizes $n_1, n_2$ respectively drawn from two normal populations having means $\mu_1, \mu_2$. To test
whether the two population means are equal, let the null hypothesis be $H_0: \mu_1 = \mu_2$, against the
alternative hypothesis $H_1: \mu_1 \neq \mu_2$.
If $\sigma_1 = \sigma_2 = \sigma$, then an unbiased estimate $S^2$ of the common variance $\sigma^2$ is given by
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2},$$
where $s_1^2, s_2^2$ are the two sample variances.
The test statistic is given by
$$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$
which follows the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. Here
$$\bar{x} = \frac{1}{n_1} \sum_{i=1}^{n_1} x_i, \quad \bar{y} = \frac{1}{n_2} \sum_{i=1}^{n_2} y_i \quad \text{and} \quad S^2 = \frac{1}{n_1 + n_2 - 2} \left[ \sum_{i=1}^{n_1} (x_i - \bar{x})^2 + \sum_{i=1}^{n_2} (y_i - \bar{y})^2 \right]$$
or $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$.
The 95% confidence limits for the difference of two population means are
$$(\bar{x} - \bar{y}) \pm t_\alpha \cdot S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \text{ where } \alpha = 0.025.$$
The 99% confidence limits for the difference of two population means are
$$(\bar{x} - \bar{y}) \pm t_\alpha \cdot S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \text{ where } \alpha = 0.005.$$
Problems:
1. Two horses A and B were tested according to the time in seconds to run a particular track,
with the following results. Test whether the two horses have the same running capacity.
Horse-A: 28 30 32 33 33 29 34
Horse-B: 29 30 30 24 27 29
2. To examine the hypothesis that husbands are more intelligent than wives, an
investigator took a sample of 10 couples and administered them a test which measures
I.Q. The results are as follows.
Husbands: 117 105 97 105 123 109 86 78 103 107
Wives:    106 98 87 104 116 95 90 69 108 85
3. Measuring the specimens of nylon yarn taken from two machines, it was found that 8
specimens from the first machine had a mean denier of 9.67 with a standard deviation of 1.81,
while 10 specimens from the second machine had a mean denier of 7.43 with a standard
deviation of 1.48. Assuming that the populations are normal, test the hypothesis
$H_0: \mu_1 - \mu_2 = 1.5$ against $H_1: \mu_1 - \mu_2 > 1.5$ at the 0.05 level of significance.
4. Random samples of specimens of coal from two mines A and B are drawn and their heat
producing capacities were measured, yielding the following results.
Mine-A: 8350 8070 8340 8130 8260
Mine-B: 7900 8140 7920 7840 7890 7950
Is there a significant difference between the means of these two samples at the 0.01 level of
significance?
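The pooled-variance t-test for two small samples can be sketched as below, using the coal data of Problem 4 above. The function name `pooled_t` is hypothetical; the critical value 3.250 is the standard table value of $t$ for 9 degrees of freedom at the two-tailed 0.01 level.

```python
from math import sqrt
from statistics import mean

def pooled_t(x, y):
    """Return (t, degrees of freedom) for two independent samples with a common variance."""
    n1, n2 = len(x), len(y)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((v - x_bar) ** 2 for v in x)     # sum of squared deviations, sample 1
    ss_y = sum((v - y_bar) ** 2 for v in y)     # sum of squared deviations, sample 2
    s = sqrt((ss_x + ss_y) / (n1 + n2 - 2))     # pooled estimate S of the common S.D.
    t = (x_bar - y_bar) / (s * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

mine_a = [8350, 8070, 8340, 8130, 8260]
mine_b = [7900, 8140, 7920, 7840, 7890, 7950]

t, dof = pooled_t(mine_a, mine_b)
t_crit = 3.250                      # table value, 9 d.f., two-tailed 0.01 level
significant = abs(t) > t_crit

print(f"t = {t:.3f} with {dof} d.f., significant: {significant}")
```

For these data the sample means are 8230 and 7940, the pooled variance is 117600/9, and $t$ comes out at about 4.19.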
PAIRED-SAMPLE t-TEST: Suppose a business concern is interested to know whether a
particular medium of promoting sales of a product is really effective or not. In this case we have to
test whether the average sales before and after the sales promotion are equal.
If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are the pairs of sales data before and after the sales promotion in a
business concern, we apply the paired t-test to examine the significance of the difference between the two
situations. Let $d_i = x_i - y_i$ or $y_i - x_i$ for $i = 1, 2, \ldots, n$.
Then the null hypothesis is $H_0: \mu_1 = \mu_2$ (i.e. $\mu = 0$ for the mean $\mu$ of the differences): there is no
significant difference between the means in the two situations.
The alternative hypothesis is $H_1: \mu_1 \neq \mu_2$.
Assuming the null hypothesis, the test statistic is defined by
$$t = \frac{\bar{d}}{S / \sqrt{n}}; \quad \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i; \quad S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (d_i - \bar{d})^2.$$
The above statistic follows Student's t-distribution with $(n - 1)$ degrees of freedom.
Problems:
1. Scores obtained in a shooting competition by 10 soldiers before and after intensive
training are given below.
Before: 67 24 57 55 63 54 56 68 33 43
After:  70 38 58 58 56 67 68 75 42 38
Test whether the intensive training is useful at the 0.05 level of significance.
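A minimal sketch of the paired t-test on the shooting scores of the problem above. It takes $d_i$ as after minus before and assumes a right-tailed test (training raises scores) at the 0.05 level; 1.833 is the standard table value of $t$ for 9 degrees of freedom at that level, and the function name `paired_t` is illustrative.

```python
from math import sqrt
from statistics import mean

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired data, with d_i = after_i - before_i."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    d_bar = mean(d)
    s = sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))   # S with divisor n - 1
    return d_bar / (s / sqrt(n)), n - 1

before = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
after = [70, 38, 58, 58, 56, 67, 68, 75, 42, 38]

t, dof = paired_t(before, after)
t_crit = 1.833                      # table value, 9 d.f., one-tailed 0.05 level (assumed test)
training_useful = t > t_crit

print(f"t = {t:.3f} with {dof} d.f., training useful: {training_useful}")
```

The mean difference is 5 and the sum of squared deviations is 482, which gives $t$ of about 2.16.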
SNEDECOR'S F-TEST OF SIGNIFICANCE:
Let two independent random samples of sizes $n_1, n_2$ be drawn from two normal populations. To
test the hypothesis that the two population variances $\sigma_1^2, \sigma_2^2$ are equal,
let the null hypothesis be $H_0: \sigma_1^2 = \sigma_2^2$.
Then the alternative hypothesis is $H_1: \sigma_1^2 \neq \sigma_2^2$.
The estimates of $\sigma_1^2, \sigma_2^2$ are given by
$$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} \quad \text{and} \quad S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1},$$
where $s_1^2, s_2^2$ are the variances of the two samples.
Assuming that $H_0$ is true, the test statistic $F = \dfrac{S_1^2}{S_2^2}$ or $F = \dfrac{S_2^2}{S_1^2}$, according as $S_1^2 > S_2^2$ or $S_2^2 > S_1^2$,
follows the F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom. When $F = S_2^2 / S_1^2$ is used, the degrees
of freedom are taken as $(n_2 - 1, n_1 - 1)$, i.e. those of the numerator come first.
Problems:
1. The measurements of the output of two units have given the following results. Assuming
that both samples have been obtained from normal populations, test at the 10% significance
level whether the two populations have the same variance.
Unit-A: 14.1 10.1 14.7 13.7 14.0
Unit-B: 14.0 14.5 13.7 12.7 14.1
2. The following samples are measurements of the heat producing capacity of specimens of
coal from two mines.
Mine-1: 8260 8130 8350 8070 8340
Mine-2: 7950 7890 7900 8140 7920 7840
Use the 0.02 level of significance to test whether it is reasonable to assume that the
variances of the two populations sampled are equal.
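The F statistic for Problem 2 above can be computed with a short sketch like the following. It relies on `statistics.variance`, which uses the divisor $n - 1$ and so matches the estimates $S_1^2, S_2^2$; the function name `f_statistic` is an assumption of this example. The critical value is left to be read from an F table for the degrees of freedom the code reports.

```python
from statistics import variance

def f_statistic(x, y):
    """Return (F, numerator d.f., denominator d.f.) with the larger estimate on top."""
    s1_sq = variance(x)             # S1^2 = sum (x_i - x_bar)^2 / (n1 - 1)
    s2_sq = variance(y)             # S2^2 = sum (y_i - y_bar)^2 / (n2 - 1)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, len(x) - 1, len(y) - 1
    return s2_sq / s1_sq, len(y) - 1, len(x) - 1

mine_1 = [8260, 8130, 8350, 8070, 8340]
mine_2 = [7950, 7890, 7900, 8140, 7920, 7840]

f, dof_num, dof_den = f_statistic(mine_1, mine_2)
print(f"F = {f:.4f} with ({dof_num}, {dof_den}) degrees of freedom")
```

Here $S_1^2 = 15750$ and $S_2^2 = 10920$, so $F$ is about 1.44 on (4, 5) degrees of freedom.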
CHI-SQUARE TEST ($\chi^2$-test):
If a set of events $A_1, A_2, A_3, \ldots, A_n$ are observed to occur with frequencies $O_1, O_2, O_3, \ldots, O_n$
respectively and, according to probability rules, $A_1, A_2, A_3, \ldots, A_n$ are expected to occur with
frequencies $E_1, E_2, E_3, \ldots, E_n$ respectively, then $O_1, O_2, O_3, \ldots, O_n$ are called observed frequencies
and $E_1, E_2, E_3, \ldots, E_n$ are called expected frequencies. Then $\chi^2$ is defined as
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
with $(n - 1)$ degrees of freedom.
This test is used to decide whether the differences between observed and expected frequencies are significant.
Note: If the data are given in a series of $n$ numbers, then the degrees of freedom are $(n - 1)$.
In the case of a Binomial distribution, degrees of freedom = $(n - 1)$.
In the case of a Poisson distribution, degrees of freedom = $(n - 2)$.
In the case of a Normal distribution, degrees of freedom = $(n - 3)$.
Problems:
1. A pair of dice is thrown 360 times and the frequency of each sum is indicated below.
Sum:       2   3   4   5   6   7   8   9   10  11  12
Frequency: 8   24  35  37  44  65  51  42  26  14  14
Would you say that the dice are fair on the basis of the chi-square test at the 0.05 level of
significance?
2. A sample analysis of examination results of 500 students was made. It was found that
220 students had failed, 170 had secured a third class, 90 were placed in second class,
and 20 got a first class. Are these figures commensurate with the general examination
result, which is in the ratio 4:3:2:1 for the various categories respectively?
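The goodness-of-fit computation for Problem 2 above can be sketched as follows. Expected frequencies are obtained by splitting the 500 students in the ratio 4:3:2:1; the function name `chi_square` is illustrative. The problem states no level of significance, so the 0.05 level is assumed, with 7.815 the standard table value of $\chi^2$ for 3 degrees of freedom.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [220, 170, 90, 20]       # failed, third class, second class, first class
ratio = [4, 3, 2, 1]
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # 200, 150, 100, 50

chi2 = chi_square(observed, expected)
dof = len(observed) - 1
chi2_crit = 7.815                   # table value, 3 d.f., assumed 0.05 level
reject_h0 = chi2 > chi2_crit

print(f"chi-square = {chi2:.3f} with {dof} d.f., reject H0: {reject_h0}")
```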
CHI-SQUARE TEST ($\chi^2$-test) FOR INDEPENDENCE OF ATTRIBUTES:
Literally, an attribute means a quality or a characteristic. Examples of attributes are drinking,
smoking, blindness, honesty, beauty etc.
In this case the expected frequency for any cell is calculated as
$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
The test statistic
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with degrees
of freedom (no. of rows $- 1$) $\times$ (no. of columns $- 1$).
Problems:
1. Four methods are under development for making discs of a super conducting material.
Fifty discs are made by each method and they are checked for super conductivity when
cooled with liquid.
                   I-method   II-method   III-method   IV-method
Super conductors   31         42          22           25
Failures           19         8           28           25
Test the significance of the difference between the proportions of super conductors at the 0.05 level.
2. From the following data, find whether there is any significant liking in the habit of
taking soft drinks among the categories of employees.
Soft drinks   Clerks   Teachers   Officers
Pepsi         10       25         65
Thumsup       15       30         65
Fanta         50       60         30
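As a sketch of the test for independence, the code below works through Problem 1 above (four methods, super conductors against failures). The function name `chi_square_independence` is hypothetical; 7.815 is the standard table value of $\chi^2$ for 3 degrees of freedom at the 0.05 level.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

discs = [
    [31, 42, 22, 25],   # super conductors, methods I to IV
    [19, 8, 28, 25],    # failures, methods I to IV
]

chi2, dof = chi_square_independence(discs)
chi2_crit = 7.815                   # table value, 3 d.f., 0.05 level
significant = chi2 > chi2_crit

print(f"chi-square = {chi2:.2f} with {dof} d.f., significant: {significant}")
```

Every expected frequency is 30 in the first row and 20 in the second, so the statistic is $234/30 + 234/20 = 19.5$.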
CHI-SQUARE TEST ($\chi^2$-test) FOR POPULATION VARIANCES:
Suppose that a random sample $x_i$ $(i = 1, 2, \ldots, n)$ is drawn from a normal population with mean $\mu$
and variance $\sigma^2$. To test the hypothesis that the population variance $\sigma^2$ has a specified value
$\sigma_0^2$, let the null hypothesis be $H_0: \sigma^2 = \sigma_0^2$.
The test statistic is given by
$$\chi^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma_0^2} = \frac{n s^2}{\sigma_0^2},$$
where $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$ and $n s^2 = (n - 1) S^2$. Under $H_0$ this statistic follows the chi-square
distribution with $(n - 1)$ degrees of freedom.
Problem:
1. A firm manufacturing rivets wants to limit variations in their length as much as possible.
The lengths (in cm) of 10 rivets manufactured by a new process are
2.15, 1.99, 2.05, 2.12, 2.17, 2.01, 1.98, 2.03, 2.25, 1.93. Examine whether the new process
can be considered superior to the old if the old population has a standard deviation of 0.145
cm.
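The statistic for the rivet problem can be computed with a short sketch such as the one below. The function name `chi_square_variance` is illustrative. Since the problem does not state a level of significance, the code only evaluates $\chi^2$ and its degrees of freedom; the value would then be compared with the table value at whatever level is chosen.

```python
from statistics import mean

def chi_square_variance(sample, sigma0):
    """Return (chi-square, degrees of freedom): sum (x_i - x_bar)^2 / sigma0^2 with n - 1 d.f."""
    x_bar = mean(sample)
    sum_sq = sum((x - x_bar) ** 2 for x in sample)
    return sum_sq / sigma0 ** 2, len(sample) - 1

lengths = [2.15, 1.99, 2.05, 2.12, 2.17, 2.01, 1.98, 2.03, 2.25, 1.93]

chi2, dof = chi_square_variance(lengths, 0.145)
print(f"chi-square = {chi2:.3f} with {dof} d.f.")
```

For these lengths the sample mean is 2.068 and the sum of squared deviations is about 0.09096, giving $\chi^2$ of about 4.33 on 9 degrees of freedom.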