THE MULTINOMIAL DISTRIBUTION
AND ELEMENTARY TESTS FOR CATEGORICAL DATA
It is useful to have a probability model for the number of observations falling into each of k mutually exclusive classes. Such a model is given by the multinomial random variable, for which it is assumed that:
1. A total of n independent trials are made.
2. At each trial an observation falls into exactly one of k mutually exclusive classes.
3. The probabilities of falling into the k classes are p1, p2, ..., pk, where pi is the probability of falling into class i, i = 1, 2, ..., k. These probabilities are constant for all trials, with p1 + p2 + ... + pk = 1.
If k = 2, we have the Binomial distribution.
Let us define:
X1 to be the number of type 1 outcomes in the n trials,
X2 to be the number of type 2 outcomes,
.
.
Xk to be the number of type k outcomes.
As there are n trials, X1 + X2 + ... + Xk = n.
The joint probability function for these random variables can be shown to be
P(X1 = x1, X2 = x2, ..., Xk = xk) = [n! / (x1! x2! ... xk!)] p1^x1 p2^x2 ... pk^xk ,
where x1 + x2 + ... + xk = n.
For k = 2, the probability function reduces to
P(X1 = x1) = [n! / (x1! (n - x1)!)] p1^x1 (1 - p1)^(n - x1) ,
which is the Binomial probability of x1 successes in n trials, each with probability of success p1.
EXAMPLE
A simple example of multinomial trials is the tossing of a die n times. At each trial the outcome is one of the values 1, 2, 3, 4, 5 or 6, so k = 6. If n = 10, the probability of 2 ones, 2 twos, 2 threes, no fours, 2 fives and 2 sixes is
P = [10! / (2! 2! 2! 0! 2! 2!)] (1/6)^10 ≈ 0.0019.
To test hypotheses concerning the pi for this example, the null hypothesis
H0 : p1 = p2 = ... = p6 = 1/6
states that the die is fair, versus
H1 : H0 is false,
which, of course, means that the die is not fair.
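As a sanity check, the multinomial probability above can be computed directly. A minimal Python sketch (the helper name multinomial_pmf is ours, not from any particular library):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X1=x1,...,Xk=xk) = n!/(x1!...xk!) * p1^x1 ... pk^xk."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    p = 1.0
    for x, pi in zip(counts, probs):
        p *= pi ** x
    return coef * p

# Die example: n = 10 tosses, outcome counts (2, 2, 2, 0, 2, 2), fair die.
p = multinomial_pmf([2, 2, 2, 0, 2, 2], [1/6] * 6)
print(p)  # 113400 / 6**10 ≈ 0.0018754
```

With k = 2 the same function reproduces the Binomial probability, as the reduction above says it should.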
The χ² statistic can be thought of as the sum of the terms
(Xi - n pi0)² / (n pi0) , i = 1, 2, ..., k,
which will be used in testing
H0 : pi = pi0 , i = 1, 2, ..., k versus H1 : H0 is false,
where the pi0 are the hypothesized values of the pi.
In the special case of k = 2, there are two possible outcomes at each trial, which can be called success and failure. A test of
H0 : p1 = p10 versus H1 : p1 ≠ p10
is a test of the same null hypothesis as the Binomial test of H0 : p = p0. The following are the observed and expected values for this situation:

            Success      Failure       Total
Observed    X            n - X         n
Expected    n p0         n(1 - p0)     n

For an α-level test, a rejection region for testing H0 versus H1 is given by
Q1 > χ²α(1).
We know that Z = (X - n p0) / √(n p0 (1 - p0)) is approximately N(0, 1) for large n. Hence Z² is approximately χ²(1). By definition,
Q1 = (X - n p0)² / (n p0) + ((n - X) - n(1 - p0))² / (n(1 - p0)).
Since (n - X) - n(1 - p0) = -(X - n p0), combining the two terms over the common denominator gives
Q1 = (X - n p0)² / (n p0 (1 - p0)) = Z²,
and using this equivalence, Q1 > χ²α(1) if and only if |Z| > z(α/2).
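The algebraic identity Q1 = Z² can be verified numerically. A small sketch with hypothetical numbers (n = 50 trials, X = 31 successes, p0 = 0.5):

```python
from math import sqrt

# Hypothetical two-cell data: n trials, X successes, hypothesized p0.
n, X, p0 = 50, 31, 0.5

# Standardized Binomial count and the two-cell chi-square statistic.
Z = (X - n * p0) / sqrt(n * p0 * (1 - p0))
Q1 = (X - n * p0) ** 2 / (n * p0) + ((n - X) - n * (1 - p0)) ** 2 / (n * (1 - p0))

print(Q1, Z ** 2)  # both equal 2.88 for these numbers
```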
GOODNESS-OF-FIT TESTS
Thus far all our statistical inferences have involved population parameters: means, variances and proportions. Now we make inferences about the entire population distribution. A sample is taken, and we want to test a null hypothesis of the general form
H0 : the sample is from a specified distribution.
The alternative hypothesis is always of the form
H1 : the sample is not from the specified distribution.
A test of H0 versus H1 is called a goodness-of-fit test.
Two tests are used to evaluate goodness of fit:
1. The χ² test, which is based on an approximate χ² statistic.
2. The Kolmogorov-Smirnov (K-S) test. This is called a nonparametric test, because it uses a test statistic that makes no assumptions about the distribution.
The χ² test is best for testing discrete distributions, and the K-S test is best for continuous distributions.
Goodness of Fit
A goodness-of-fit test attempts to determine whether a conspicuous discrepancy exists between the observed cell frequencies and those expected under H0. A useful measure of the overall discrepancy is given by
Q = Σ (O - E)² / E ,
where O and E symbolize an observed frequency and the corresponding expected frequency. The discrepancy in each cell is measured by the squared difference between the observed and the expected frequencies, divided by the expected frequency.
The χ² statistic was originally proposed by Karl Pearson (1857-1936), who found its distribution for large n to be approximately a χ² distribution with degrees of freedom = k - 1. Because of this distribution, the statistic is denoted by χ² and is called Pearson's χ² statistic for goodness of fit.
Null hypothesis: H0 : pi = pi0 ; i = 1, 2, ..., k
Alternative: H1 : at least one pi is not equal to its specified value.
Test statistic:
Q = Σ (Xi - n pi0)² / (n pi0) , summed over i = 1, ..., k.
Rejection region:
Q > χ²α(k - 1), the upper-α point of the χ² distribution with d.f. = k - 1.
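The statistic is simple to compute directly. A minimal sketch (the function name is ours), applied to hypothetical die-tossing counts:

```python
def pearson_q(observed, expected):
    """Pearson's goodness-of-fit statistic  Q = sum (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical die data: 60 tosses, H0: fair die, so each expected count is 10.
obs = [8, 12, 11, 7, 13, 9]
q = pearson_q(obs, [10] * 6)
print(q)  # 2.8, to be compared with a chi-square critical value, d.f. = k - 1 = 5
```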
The chi-square statistic, first proposed by Karl Pearson in 1900, begins with the Binomial case.
Let X1 ~ BIN(n, p1), where 0 < p1 < 1. According to the CLT,
Z = (X1 - n p1) / √(n p1 (1 - p1))
is approximately N(0, 1) for large n, particularly when n p1 ≥ 5 and n(1 - p1) ≥ 5. As you know, Q1 = Z² ≈ χ²(1).
If we let X2 = n - X1 and p2 = 1 - p1, then, because (X1 - n p1)² = (X2 - n p2)²,
Q1 = (X1 - n p1)² / (n p1) + (X2 - n p2)² / (n p2).
Pearson then constructed an expression similar to Q1 involving X1, X2, ..., Xk-1 and Xk = n - X1 - X2 - ... - Xk-1, which we denote by Qk-1. Hence,
Qk-1 = Σ (Xi - n pi)² / (n pi) , summed over i = 1, ..., k,
which for large n is approximately χ²(k - 1).
EXAMPLE
We observe n = 85 values of a random variable X that is thought to have a Poisson distribution, obtaining:

x          0     1     2     3     4     5
Frequency  41    29    9     4     1     1

The sample average is the appropriate estimate of λ = E(X). It is given by
λ̂ = (0·41 + 1·29 + 2·9 + 3·4 + 4·1 + 5·1) / 85 = 68/85 = 0.8.
The expected frequencies for the first three cells are n pi, i = 0, 1, 2:
85 p0 = 85 P(X=0) = 85 (0.449) = 38.2
85 p1 = 85 P(X=1) = 85 (0.360) = 30.6
85 p2 = 85 P(X=2) = 85 (0.144) = 12.2
The expected frequency for the combined cell {3, 4, 5} is:
85 (0.047) = 4.0.
WHY? The cells x = 3, 4, 5 are combined because their individual expected frequencies are too small for the χ² approximation to be reliable.
The computed Q3, with k = 4 cells after combination, is about 2.1, which does not exceed χ²(0.05) with 2 degrees of freedom (d.f. = k - 1 - 1 = 2, one d.f. being lost because λ was estimated). Hence there is no reason to reject
H0 : the sample is from a Poisson distribution
versus
H1 : the sample is not from a Poisson distribution.
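The computations of this example can be reproduced directly. A sketch in Python; note that the tail cell here is taken as P(X ≥ 3) rather than the rounded value 0.047, so Q3 differs slightly in the second decimal:

```python
from math import exp, factorial

freq = {0: 41, 1: 29, 2: 9, 3: 4, 4: 1, 5: 1}   # observed data, n = 85
n = sum(freq.values())
lam = sum(x * f for x, f in freq.items()) / n    # sample mean estimates lambda

def pois(x):
    """Poisson pmf with the estimated rate."""
    return exp(-lam) * lam ** x / factorial(x)

# Cells: {0}, {1}, {2} and the combined tail {3, 4, 5, ...}
obs = [freq[0], freq[1], freq[2], freq[3] + freq[4] + freq[5]]
exp_counts = [n * pois(0), n * pois(1), n * pois(2),
              n * (1 - pois(0) - pois(1) - pois(2))]

q = sum((o - e) ** 2 / e for o, e in zip(obs, exp_counts))
print(lam, q)  # lambda-hat = 0.8, Q3 ≈ 2.1 < 5.99 = chi-square(0.05), d.f. 2
```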
EXERCISE
The number X of telephone calls received each minute at a certain switchboard in the middle of a working day is thought to have a Poisson distribution. Data were collected, and the results were as follows:

x          0     1     2     3     4     5     6
Frequency  40    66    41    28    9     3     1

Fit a Poisson distribution. Then find the estimated expected value of each cell after combining {4, 5, 6} to make one cell.
Compute Q4, since k = 5, and compare it to χ²(0.05) with 3 degrees of freedom.
Why do we use three degrees of freedom?
Do we accept or reject the Poisson distribution?
CONTINGENCY TABLES
In many cases, data can be classified into categories on the basis of two criteria. For example, a radio receiver may be classified as having low, average, or high fidelity and as having low, average, or high selectivity; or graduating engineering students may be classified according to their starting salary and their grade-point average.
In a contingency table, the statistical question is whether the row criteria and column criteria are independent. The null and alternative hypotheses are
H0 : the row and column criteria are independent
H1 : the row and column criteria are associated
Consider a contingency table with r rows and c columns. The number of elements in the sample that are observed to fall into row class i and column class j is denoted by Xij.
The row sum for the ith row is Ri = Σj Xij, and the column sum for the jth column is Cj = Σi Xij. The total number of observations in the entire table is n = Σi Σj Xij.
The contingency table for the general case is given below:
The General r x c Contingency Table

X11    X12  ...  X1j  ...  X1c  |  R1
X21    X22  ...  X2j  ...  X2c  |  R2
 .      .         .         .   |   .
Xi1    Xi2  ...  Xij  ...  Xic  |  Ri
 .      .         .         .   |   .
Xr1    Xr2  ...  Xrj  ...  Xrc  |  Rr
--------------------------------+----
C1     C2   ...  Cj   ...  Cc   |  n
There are several probabilities of importance associated with the table. The probability of an element's being in row class i and column class j in the population is denoted by pij. The probability of being in row class i is denoted by pi•, and the probability of being in column class j is denoted by p•j.
Null and alternative hypotheses regarding the independence of these probabilities are stated as follows:
H0 : pij = pi• p•j for all pairs (i, j)
versus
H1 : H0 is false.
As pij, pi•, and p•j are all unknown, it is necessary to estimate these probabilities:
p̂i• = Ri / n and p̂•j = Cj / n ,
and, under the hypothesis of independence, pij = pi• p•j, so pij would be estimated by
p̂ij = (Ri / n)(Cj / n).
The expected number of observations in cell (i, j) is n pij. Under the null hypothesis, the estimate of this expected number is
Eij = n p̂ij = Ri Cj / n.
The chi-square statistic is computed as
Q = Σi Σj (Xij - Eij)² / Eij ,
which for large n is approximately χ² with (r - 1)(c - 1) degrees of freedom.
The actual critical region is given by
Q > χ²α((r - 1)(c - 1)).
If the computed Q gets too large, namely exceeds χ²α((r - 1)(c - 1)), we reject the hypothesis that the two attributes are independent.
EXAMPLE
Ninety graduating male engineers were classified by two attributes: grade-point average (low, average, high) and initial salary (low, high). The following results were obtained.

                    Grade-Point Average
Salary       Low    Average    High    Total
Low          15     18         7       40
High         5      22         23      50
Total        20     40         30      90

SOLUTION
The expected counts are Eij = Ri Cj / 90:
E11 = 40·20/90 = 8.89 ; E12 = 40·40/90 = 17.78 ; E13 = 40·30/90 = 13.33
E21 = 50·20/90 = 11.11 ; E22 = 50·40/90 = 22.22 ; E23 = 50·30/90 = 16.67
Q = (15-8.89)²/8.89 + (18-17.78)²/17.78 + (7-13.33)²/13.33 + (5-11.11)²/11.11 + (22-22.22)²/22.22 + (23-16.67)²/16.67 ≈ 12.98
Since Q = 12.98 > χ²(0.05) with (2-1)(3-1) = 2 d.f. = 5.99, we reject H0.
What does this mean? Grade-point average and initial salary are associated, not independent.
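The solution above can be checked with a few lines of code; a sketch that computes the expected counts Eij = Ri Cj / n and the statistic Q:

```python
# Chi-square test of independence for the engineers example (2 x 3 table).
table = [[15, 18, 7],   # low salary
         [5, 22, 23]]   # high salary

r, c = len(table), len(table[0])
row = [sum(rw) for rw in table]
col = [sum(table[i][j] for i in range(r)) for j in range(c)]
n = sum(row)

q = 0.0
for i in range(r):
    for j in range(c):
        e = row[i] * col[j] / n          # E_ij = R_i C_j / n
        q += (table[i][j] - e) ** 2 / e

df = (r - 1) * (c - 1)
print(q, df)  # Q ≈ 12.98 with 2 d.f.; 5.99 is the 0.05 critical value, so reject H0
```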
EXERCISES
1. Tests of the fidelity and the selectivity of 190 radios produced the results shown in the following table:

                      Fidelity
Selectivity    Low    Average    High
Low            7      12         31
Average        35     59         18
High           15     13         0

Use the 0.01 level of significance to test the null hypothesis that fidelity is independent of selectivity.
2. A test of the equality of two or more multinomial distributions can be made by using calculations that are associated with a contingency table. For example, n = 100 light bulbs were taken at random from each of three brands and were graded as A, B, C, or D.

                 Grade
Brand     A     B      C     D     Totals
1         27    42     21    10    100
2         23    39     25    13    100
3         22    36     23    19    100
Totals    72    117    69    42    300

Clearly, we want to test the equality of three multinomial distributions, each with k = 4 cells. Since under H0 the probability of falling into a particular grade category is independent of brand, we can test this hypothesis by computing
Q = Σi Σj (Xij - Eij)² / Eij , with Eij = Ri Cj / 300,
and comparing it with χ²α((3 - 1)(4 - 1)) = χ²α(6). Use α = 0.05.
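A sketch of the computation for this table (expected counts under H0 are Eij = 100·Cj/300):

```python
# Testing equality of three multinomial distributions (light-bulb example).
table = [[27, 42, 21, 10],
         [23, 39, 25, 13],
         [22, 36, 23, 19]]   # rows: brands 1-3; columns: grades A-D

col = [sum(row[j] for row in table) for j in range(4)]
n = sum(col)                      # 300 bulbs in all
q = 0.0
for row in table:
    ni = sum(row)                 # 100 bulbs per brand
    for j, x in enumerate(row):
        e = ni * col[j] / n       # expected count under H0
        q += (x - e) ** 2 / e

print(q)  # Q ≈ 4.39 < 12.59 = chi-square(0.05) with 6 d.f.: no reason to reject H0
```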
ANALYSIS OF VARIANCE
The Analysis of Variance (ANOVA, sometimes AOV) is a generalization of the two-sample t-test, so that the means of k > 2 populations may be compared.
ANalysis Of VAriance was first suggested by Sir Ronald Fisher, pioneer of the theory of design of experiments, who was later professor of genetics at Cambridge University. The F-test is named in honor of Fisher.
The name Analysis of Variance stems from the somewhat surprising fact that a set of computations on several variances is used to test the equality of several means. Ironically, the term ANOVA may therefore appear to be a misnomer: the procedure is not concerned with analyzing variances as such, but with analyzing variation among means.
DEFINITION:
ANOVA, or one-factor analysis of variance, is a procedure to test the hypothesis that several populations have the same means.
FUNCTION:
Using analysis of variance, we will be able to make inferences about whether our samples are drawn from populations having the same means.
INTRODUCTION
The Analysis of Variance (ANOVA) is a statistical technique used to compare the locations (specifically, the expectations) of k > 2 populations. The study of ANOVA involves the investigation of rather complex statistical models, which are interesting both statistically and mathematically.
Two designs are considered here. The first is referred to as a one-way classification or a completely randomized design. The second is called a two-way classification or a randomized block design.
The basic idea behind the term "ANOVA" is that the total variability of all the observations can be separated into distinct portions, each of which can be assigned a particular source or cause. This decomposition of the variability permits statistical estimation and tests of hypotheses.
Suppose that we are interested in k populations, from each of which we sample n observations. The observations are denoted by
Yij , i = 1, 2, ..., k ; j = 1, 2, ..., n ,
where Yij represents the jth observation from population i. A basic null hypothesis to test is
H0 : µ1 = µ2 = ... = µk ,
that is, all the populations have the same expectation. The ANOVA method for testing this null hypothesis is based on an F statistic.
THE COMPLETELY RANDOMIZED DESIGN WITH EQUAL SAMPLE SIZES
First we will consider comparison of the true expectations of k > 2 populations, sometimes referred to as the k-sample problem. For simplicity of presentation, we will assume initially that an equal number of observations are randomly sampled from each population. These observations are denoted by:
Y11 , Y12 , ...... , Y1n
Y21 , Y22 , ...... , Y2n
.
.
Yk1 , Yk2 , ...... , Ykn
where Yij represents the jth observation out of the n randomly sampled observations from the ith population. Hence, Y12 would be the second observation from the first population.
In the completely randomized design, the observations are assumed to:
1. Come from normal populations
2. Come from populations with the same variance
3. Have possibly different expectations, µ1 , µ2 , ... , µk
These assumptions are expressed mathematically as follows:
Yij ~ NOR (µi , σ²) ; i = 1, 2, ..., k ; j = 1, 2, ..., n    (*)
This equation is equivalent to
Yij = µi + εij , with εij ~ NID (0, σ²) ,
where N represents "normally", I represents "independently" and D represents "distributed". The 0 means E(εij) = 0 for all pairs of indices i and j, and σ² means that Var(εij) = σ² for all such pairs.
The parameters µ1 , µ2 , ... , µk are the expectations of the k populations, about which inference is to be made.
The initial hypotheses to be tested in the completely randomized design are:
H0 : µ1 = µ2 = ... = µk
versus
H1 : µi ≠ µj for some pair of indices i ≠ j    (**)
The null hypothesis states that all of the k populations have the same expectation. If this is true, then we know from equation (*) that all of the Yij observations have the same normal distribution, and we are observing not n observations from each of k populations but nk observations all from the same population.
The random variable Yij may be written as
Yij = µ + αi + εij ,
where, defining µ = (1/k) Σi µi (the average of the population expectations), the treatment effects are
αi = µi - µ , so that Σi αi = 0.
The hypotheses in equation (**) may be restated as
H0 : α1 = α2 = ... = αk = 0
versus
H1 : αi ≠ 0 for at least one i.    (***)
The observation Yij has expectation
E(Yij) = µ + αi .
The parameters α1, ..., αk are differences or deviations of the individual population expectations µi from this common part µ. If all of the µi are equal (say to µ), then all of the deviations αi are zero, because αi = µi - µ = 0. Hence, the null hypothesis in equation (***) means that all expectations consist only of the common part µ.
The total variability of the observations is
SST = Σi Σj (Yij - Ȳ••)² , where Ȳ•• is the mean of all of the observations.
It can be shown that
Σi Σj (Yij - Ȳ••)² = n Σi (Ȳi• - Ȳ••)² + Σi Σj (Yij - Ȳi•)² ,
where Ȳi• represents the average of the observations from the ith population, that is, Ȳi• = (1/n) Σj Yij.
The last equation is represented by
SST = SSA + SSE ,
where SST represents the total sum of squares, SSA represents the sum of squares due to differences among populations or treatments, and SSE represents the sum of squares that is unexplained or said to be "due to error".
The results of an ANOVA are usually reported in an analysis of variance table:
ANOVA Table for the Completely Randomized Design with Equal Sample Sizes:

Source of Variation               Degrees of Freedom    Sum of Squares    Mean Square          F
Among populations or treatments   k - 1                 SSA               MSA = SSA/(k-1)      MSA/MSE
Error                             k(n - 1)              SSE               MSE = SSE/(k(n-1))
Total                             kn - 1                SST

For an α-level test, a reasonable critical region for the alternative hypothesis in equation (**) is
F = MSA/MSE > Fα(k - 1, k(n - 1)).
THE COMPLETELY RANDOMIZED DESIGN WITH UNEQUAL SAMPLE SIZES
In many studies in which the expectations of k > 2 populations are compared, the samples from each population are not ultimately of equal size, even in cases where we attempt to maintain equal sample sizes. For example, suppose we decide to compare three teaching methods using three classes of students. The teachers of the classes each agree to use one of the three teaching methods. The plan for the comparison is to give a common examination to all of the students in each class after two months of instruction. Even if the classes are initially of the same size, they may differ after two months because students have dropped out for one reason or another. Thus we need a way to analyze the k-sample problem when the samples are of unequal sizes.
In the case of UNEQUAL SAMPLE SIZES, the observations are denoted by:
Y11 , Y12 , ...... , Y1n1
Y21 , Y22 , ...... , Y2n2
.
.
Yk1 , Yk2 , ...... , Yknk
where Yij represents the jth observation from the ith population. For the ith population there are ni observations. In the case of equal sample sizes, ni = n for i = 1, 2, ..., k.
The model assumptions are the same for the unequal sample size case as for the equal sample size case. The Yij are assumed to:
1. Come from normal populations
2. Come from populations with the same variance
3. Have possibly different expectations, µ1, µ2, ..., µk
These assumptions are expressed formally as
Yij ~ NOR (µi , σ²) ; i = 1, 2, ..., k ; j = 1, 2, ..., ni
or as Yij = µi + εij , with εij ~ NID (0, σ²).
The first null and alternative hypotheses to test are exactly the same as those in the previous section, namely
H0 : µ1 = µ2 = ... = µk
versus
H1 : µi ≠ µl for some pair of indices i ≠ l.
The model for the completely randomized design may be represented as
Yij = µ + αi + εij , with Σi ni αi = 0 and εij ~ NID (0, σ²).
In this case the overall mean µ is given by
µ = Σi (ni / N) µi , where N = Σi ni is the total number of observations.
Here µ is a weighted average of the population expectations µi, where the weights are ni / N, the proportion of observations coming from the ith population.
The hypotheses can also be restated as
H0 : α1 = α2 = ... = αk = 0
versus
H1 : αi ≠ 0 for at least one i.
The observation Yij has expectation E(Yij) = µ + αi. If H0 is true, then E(Yij) = µ, hence all of the Yij have a common distribution. Thus Yij ~ NOR (µ, σ²) under H0. The total variability of the observations is again partitioned into two portions by
Σi Σj (Yij - Ȳ••)² = Σi ni (Ȳi• - Ȳ••)² + Σi Σj (Yij - Ȳi•)² ,
or
SST = SSA + SSE ,
where, as before, Ȳi• represents the average of the observations from the ith population, N is the total number of observations, and Ȳ•• is the average of all the observations.
Again, SST represents the total sum of squares, SSA represents the sum of squares due to differences among populations or treatments, and SSE represents the sum of squares due to error.
The numbers of degrees of freedom satisfy
TOTAL = TREATMENTS + ERROR
(N - 1) = (k - 1) + (N - k)
The mean square among treatments and the mean square for error are equal to the appropriate sums of squares divided by the corresponding degrees of freedom. That is,
MSA = SSA / (k - 1) and MSE = SSE / (N - k).
It can be shown that MSE is an unbiased estimate of σ², that is, E(MSE) = σ²; similarly, under H0, E(MSA) = σ².
Under the null hypothesis,
F = MSA / MSE
has an F-distribution with (k - 1) and (N - k) degrees of freedom. Finally, we reject the null hypothesis at significance level α if
F > Fα(k - 1, N - k).
ANOVA TABLE for the Completely Randomized Design with Unequal Sample Sizes

SOURCE                            dof      SS     MS                 F
Among populations or treatments   k - 1    SSA    MSA = SSA/(k-1)    MSA/MSE
Error                             N - k    SSE    MSE = SSE/(N-k)
Total                             N - 1    SST

Sometimes SSA is denoted SSTR, SSE is denoted SSER, and SST is denoted SSTO.
SUMMARY NOTATION FOR A CRD

POPULATIONS (TREATMENTS)    1     2     3     ......    k
MEAN                        µ1    µ2    µ3    ......    µk
VARIANCE                    σ²    σ²    σ²    ......    σ²

INDEPENDENT RANDOM SAMPLES  1     2     3     ......    k
SAMPLE SIZE                 n1    n2    n3    ......    nk
SAMPLE TOTALS               T1    T2    T3    ......    Tk
SAMPLE MEANS                Ȳ1•   Ȳ2•   Ȳ3•   ......    Ȳk•

Total number of measurements N = n1 + n2 + n3 + ... + nk
ANOVA F-TEST FOR A CRD with k treatments
H0 : µ1 = µ2 = ... = µk
(i.e., there is no difference in the treatment means)
versus
Ha : At least two of the treatment means differ.
Test statistic: F = MSTR / MSER
Rejection region: F > Fα(k - 1, N - k)
PARTITIONING OF THE TOTAL SUM OF SQUARES FOR THE COMPLETELY RANDOMIZED DESIGN
The total sum of squares (SSTO) splits into the sum of squares for treatments (SSTR) and the sum of squares for error (SSER).
FORMULAS FOR THE CALCULATIONS IN THE CRD
CM = correction for the mean = (total of all observations)² / N
SSTO = total sum of squares = (sum of squares of all observations) - CM = Σ y² - CM
SSTR = sum of squares for treatments
     = (sum of squares of treatment totals with each square divided by the number of observations for that treatment) - CM
     = Σi Ti² / ni - CM
SSER = sum of squares for error = SSTO - SSTR
where k is the number of treatments and N is the total number of observations.
EXAMPLE
Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer, and so on), the number of students varied from group to group. Do the data shown in the table below present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques?

DATA FOR EXAMPLE
              1        2        3        4
              65       75       59       94
              87       69       78       89
              73       83       67       80
              79       81       62       88
              81       72       83
              69       79       76
                       90
Total Ti      454      549      425      351
ni            6        7        6        4
Mean          75.67    78.43    70.83    87.75

SOLUTION
CM = (1779)² / 23 = 137601.78
SSTO = Σ y² - CM = 139511 - 137601.78 = 1909.22
SSTR = 454²/6 + 549²/7 + 425²/6 + 351²/4 - CM = 712.59
SSER = SSTO - SSTR = 1196.63
The mean squares for treatments and error are
MSTR = SSTR / (k - 1) = 712.59 / 3 = 237.53
MSER = SSER / (N - k) = 1196.63 / 19 = 62.98
The test statistic for testing H0 : µ1 = µ2 = µ3 = µ4 is
F = MSTR / MSER = 237.53 / 62.98 = 3.77
The critical value of F for α = 0.05 with (3, 19) degrees of freedom is F(0.05) = 3.13. Since 3.77 > 3.13, we reject H0.
CONCLUSION: The data provide sufficient evidence of a difference in mean achievement among the four teaching techniques.
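The sums of squares and the F statistic for this example can be reproduced as follows (a sketch using the CM formulas above):

```python
# One-way ANOVA for the teaching-techniques example (unequal sample sizes).
groups = [
    [65, 87, 73, 79, 81, 69],
    [75, 69, 83, 81, 72, 79, 90],
    [59, 78, 67, 62, 83, 76],
    [94, 89, 80, 88],
]

k = len(groups)
N = sum(len(g) for g in groups)
total = sum(sum(g) for g in groups)
cm = total ** 2 / N                                  # correction for the mean
ssto = sum(y ** 2 for g in groups for y in g) - cm
sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cm
sser = ssto - sstr

mstr = sstr / (k - 1)
mser = sser / (N - k)
f = mstr / mser
print(sstr, sser, f)  # SSTR ≈ 712.6, SSER ≈ 1196.6, F ≈ 3.77 > 3.13 = F(0.05; 3, 19)
```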
THE RANDOMIZED BLOCK DESIGN
The randomized block design implies the presence of two qualitative independent variables, "blocks" and "treatments". Consequently, the total sum of squares of deviations of the response measurements about their mean may be partitioned into three parts: the sums of squares for blocks, for treatments, and for error.

CRD:  SSTO = SSTR + SSER
RBD:  SSTO = SSTR + SSBL + SSER
Definition:
A randomized block design is a design devised to compare the means for k treatments utilizing b matched blocks of k experimental units each. Each treatment appears once in every block.
The observations in a RBD can be represented by an array of the following type:
Y11 , Y12 , ...... , Y1b
Y21 , Y22 , ...... , Y2b
.
.
Yt1 , Yt2 , ...... , Ytb
where Yij is said to be the observation from block j on treatment i. In the RBD, the assumption about Yij is that
Yij ~ NOR (µ + αi + βj , σ²) ; i = 1, 2, ..., t ; j = 1, 2, ..., b    (i)
with Σi αi = 0 and Σj βj = 0.
As equation (i) shows, it is assumed that there are t different treatments and b blocks. Hence the expectation of Yij decomposes as
E(Yij) = µ + αi + βj ,
where µ is the overall effect, βj is the block effect, and αi is the treatment effect.
One task is to test the null hypothesis
H0 : α1 = α2 = ... = αt = 0 ,
which states that there are no treatment differences.
Here, the ith treatment mean is Ȳi• = (1/b) Σj Yij, the jth block mean is Ȳ•j = (1/t) Σi Yij, and the overall mean is Ȳ•• = (1/bt) Σi Σj Yij.
The total variability partitions as
Σi Σj (Yij - Ȳ••)² = b Σi (Ȳi• - Ȳ••)² + t Σj (Ȳ•j - Ȳ••)² + Σi Σj (Yij - Ȳi• - Ȳ•j + Ȳ••)² ,
which can be abbreviated as
SSTO = SSTR + SSBL + SSER
The degrees of freedom are partitioned as follows:
dof TO = dof TR + dof BL + dof ER
bt - 1 = (t - 1) + (b - 1) + (b - 1)(t - 1)
If the null hypothesis of no treatment differences, H0 : α1 = ... = αt = 0, is true, then both MSTR and MSER are unbiased estimates of σ². It can further be shown that, under H0,
F = MSTR / MSER ~ F(t - 1, (b - 1)(t - 1)).
Hence, using an α-level test, we reject H0 in favor of H1 if
F = MSTR / MSER > Fα(t - 1, (b - 1)(t - 1)).
For analogous reasons, a test of
H0 : β1 = β2 = ... = βb = 0 versus H1 : βj ≠ 0 for at least one j
can be carried out using the critical region
F = MSBL / MSER > Fα(b - 1, (b - 1)(t - 1)).
Data Structure of a RBD with b blocks and k treatments

                     TREATMENTS
BLOCK      1      2      3      ......    k      Block means
1          Y11    Y21    Y31    ......    Yk1    Ȳ•1
2          Y12    Y22    Y32    ......    Yk2    Ȳ•2
.
.
b          Y1b    Y2b    Y3b    ......    Ykb    Ȳ•b
Treatment
means      Ȳ1•    Ȳ2•    Ȳ3•    ......    Ȳk•
GENERAL FORM OF THE RANDOMIZED BLOCK DESIGN (TREATMENT i IS DENOTED BY Ai)

BLOCK    1     2     ......    b
         A1    A1    ......    A1
         A2    A2    ......    A2
         A3    A3    ......    A3
         .     .               .
         Ak    Ak    ......    Ak

Although we show the treatments in order within the blocks, in practice they would be assigned to the experimental units in a random order (thus the name randomized block design).
FORMULAS FOR CALCULATIONS IN RBD
CM = (total of all observations)² / N
SSTO = Σ y² - CM
SSTR = Σi Ti² / b - CM   (Ti = total for treatment i)
SSBL = Σj Bj² / k - CM   (Bj = total for block j)
SSER = SSTO - SSTR - SSBL
where
N = total number of observations
b = number of blocks
k = number of treatments
ANOVA Summary Table for RBD

SOURCE        DOF              SS      MS      F
Treatments    k - 1            SSTR    MSTR    MSTR/MSER
Blocks        b - 1            SSBL    MSBL    MSBL/MSER
Error         N - k - b + 1    SSER    MSER
TOTAL         N - 1            SSTO
EXAMPLE
A study was conducted in a large city to compare the supermarket prices of the four leading brands of coffee at the end of the year. Ten supermarkets in the city were selected, and the price per pound was recorded for each brand.
1. Set up the test of the null hypothesis that the mean prices of the four brands sold in the city were the same at the end of the year. Use α = 0.05.
2. Calculate the F statistic.
3. Do the data provide sufficient evidence to indicate a difference in the mean prices for the four brands of coffee?
                        BRAND
SUPERMARKET    A        B        C        D        TOTALS
1              $2.43    $2.47    $2.47    $2.41    9.78
2              2.48     2.52     2.53     2.48     10.01
3              2.38     2.44     2.42     2.35     9.59
4              2.40     2.47     2.46     2.39     9.72
5              2.35     2.42     2.44     2.32     9.53
6              2.43     2.49     2.47     2.42     9.81
7              2.55     2.62     2.64     2.56     10.37
8              2.41     2.49     2.47     2.39     9.76
9              2.53     2.60     2.59     2.49     10.21
10             2.35     2.43     2.44     2.36     9.58
TOTALS         24.31    24.95    24.93    24.17    98.36
SOLUTION
Treatments:
H0 : µ1 = µ2 = µ3 = µ4
H1 : at least two brands have different mean prices
Test statistic: F = MSTR / MSER = 0.016667 / 0.00017963 = 92.8
The dof for the test statistic are k - 1 = 3 and N - k - b + 1 = 27, and F(0.05; 3, 27) = 2.96.
Since the calculated F > F(0.05), there is very strong evidence that at least two of the mean prices of the four coffee brands differ.
Blocks:
H0 : mean coffee prices are the same for all ten supermarkets
H1 : mean coffee prices differ for at least two supermarkets
Test statistic: F = MSBL / MSER = 0.019390 / 0.00017963 = 107.9
The dof for the test statistic are b - 1 = 9 and N - k - b + 1 = 27, and F(0.05; 9, 27) = 2.25. Again H0 is rejected.
ANOVA TABLE

SOURCE       DOF    SS         MS            F
Treatment    3      0.05000    0.016667      92.8
Block        9      0.17451    0.019390      107.9
Error        27     0.00485    0.00017963
TOTAL        39     0.22936
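A sketch reproducing the coffee-price analysis from the raw data (the supermarket-1 price for brand C is taken as $2.47, the value consistent with the printed row and column totals):

```python
# Randomized block ANOVA for the coffee-price example (10 blocks, 4 treatments).
prices = [  # rows = supermarkets (blocks), columns = brands A-D
    [2.43, 2.47, 2.47, 2.41],
    [2.48, 2.52, 2.53, 2.48],
    [2.38, 2.44, 2.42, 2.35],
    [2.40, 2.47, 2.46, 2.39],
    [2.35, 2.42, 2.44, 2.32],
    [2.43, 2.49, 2.47, 2.42],
    [2.55, 2.62, 2.64, 2.56],
    [2.41, 2.49, 2.47, 2.39],
    [2.53, 2.60, 2.59, 2.49],
    [2.35, 2.43, 2.44, 2.36],
]

b, k = len(prices), len(prices[0])
N = b * k
total = sum(map(sum, prices))
cm = total ** 2 / N
ssto = sum(y ** 2 for row in prices for y in row) - cm
sstr = sum(sum(row[j] for row in prices) ** 2 for j in range(k)) / b - cm
ssbl = sum(sum(row) ** 2 for row in prices) / k - cm
sser = ssto - sstr - ssbl

f_tr = (sstr / (k - 1)) / (sser / (N - k - b + 1))
f_bl = (ssbl / (b - 1)) / (sser / (N - k - b + 1))
print(round(sstr, 5), round(ssbl, 5), round(f_tr, 1), round(f_bl, 1))
```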
NONPARAMETRIC TESTS
The majority of hypothesis tests discussed so far have made inferences about population parameters, such as the mean and the proportion. These parametric tests have used the parametric statistics of samples that came from the population being tested.
To formulate these tests, we made restrictive assumptions about the populations from which we drew our samples. For example, we assumed that our samples either were large or came from normally distributed populations. But populations are not always normal. And even if a goodness-of-fit test indicates that a population is approximately normal, we cannot always be sure we are right, because the test is not 100 percent reliable.
Fortunately, statisticians have developed useful techniques that do not make restrictive assumptions about the shape of the population distribution. These are known as distribution-free or, more commonly, nonparametric tests. Many analysts use nonparametric statistical procedures in preference to their parametric counterparts.
The hypotheses of a nonparametric test are concerned with something other than the value of a population parameter. A large number of these tests exist, but this section will examine only a few of the better known and more widely used ones:
NONPARAMETRIC TESTS:
- Sign test
- Wilcoxon signed-rank test
- Mann-Whitney test (Wilcoxon rank-sum test)
- Run test
- Kruskal-Wallis test
- Kolmogorov-Smirnov test
- Lilliefors test
THE SIGN TEST
The sign test is used to test hypotheses about the median µ̃ of a continuous distribution. The median of a distribution is a value of the random variable X such that the probability is 0.5 that an observed value of X is less than or equal to the median, and the probability is 0.5 that an observed value of X is greater than or equal to the median. That is, P(X ≤ µ̃) = P(X ≥ µ̃) = 0.5.
Since the normal distribution is symmetric, the mean of a normal distribution equals the median. Therefore, the sign test can be used to test hypotheses about the mean of a normal distribution.
Let X denote a continuous random variable with median µ̃, and let X1, X2, ..., Xn denote a random sample of size n from the population of interest. If µ̃0 denotes the hypothesized value of the population median, then the usual forms of the hypotheses to be tested can be stated as follows:
H0 : µ̃ = µ̃0 versus H1 : µ̃ > µ̃0   (right-tailed test)
H0 : µ̃ = µ̃0 versus H1 : µ̃ < µ̃0   (left-tailed test)
H0 : µ̃ = µ̃0 versus H1 : µ̃ ≠ µ̃0   (two-tailed test)
Form the differences
Di = Xi - µ̃0 , i = 1, 2, ..., n.
Now if the null hypothesis H0 : µ̃ = µ̃0 is true, any difference Di is equally likely to be positive or negative. An appropriate test statistic is the number of these differences that are positive, say S+. Therefore, to test the null hypothesis we are really testing that the number of plus signs is a value of a Binomial random variable with parameter p = 0.5.
A p-value for the observed number of plus signs s+ can be calculated directly from the Binomial distribution. For the right-tailed alternative H1 : µ̃ > µ̃0, the p-value is P(S+ ≥ s+ | p = 0.5); if this p-value is less than or equal to some preselected significance level α, we reject H0 and conclude H1 is true.
To test the left-tailed alternative H1 : µ̃ < µ̃0, we reject H0 if the p-value P(S+ ≤ s+ | p = 0.5) is less than or equal to α.
The two-sided alternative may also be tested. If the hypotheses are H0 : µ̃ = µ̃0 vs H1 : µ̃ ≠ µ̃0, the p-value is
p = 2 P(S+ ≤ min(s+, s-) | p = 0.5),
where s- is the number of negative differences.
It is also possible to construct a table of critical values for the sign test. As before, let s+ denote the number of the differences Di that are positive and let s- denote the number of the differences that are negative. Let s = min(s+, s-); tables give critical values s*α for the sign test that ensure that P(type I error) = α. If the observed value of the test statistic satisfies s ≤ s*α, then the null hypothesis H0 : µ̃ = µ̃0 should be rejected and H1 : µ̃ ≠ µ̃0 accepted.
If the alternative is H1 : µ̃ > µ̃0, then reject H0 if s- ≤ s*α. If the alternative is H1 : µ̃ < µ̃0, then reject H0 if s+ ≤ s*α. The level of significance of a one-sided test is one-half the value for a two-sided test.
TIES IN THE SIGN TEST
Since the underlying population is assumed to be continuous, there is a zero probability that we will find a "tie", that is, a value of Xi exactly equal to µ̃0. When ties occur, they should be set aside and the sign test applied to the remaining data.
THE NORMAL APPROXIMATION
When p = 0.5, the Binomial distribution is well approximated by a normal distribution when n is at least 10. Thus, since the mean of the Binomial is np = 0.5n and the variance is np(1 - p) = 0.25n, the distribution of S+ is approximately normal with mean 0.5n and variance 0.25n whenever n is moderately large. Therefore, in these cases the null hypothesis H0 : µ̃ = µ̃0 can be tested using the statistic
Z = (S+ - 0.5n) / (0.5 √n).
Critical/rejection regions for α-level tests of H0 : µ̃ = µ̃0 versus the usual alternatives are given in this table:

Alternative       Critical/Rejection Region
H1 : µ̃ > µ̃0      Z > zα
H1 : µ̃ < µ̃0      Z < -zα
H1 : µ̃ ≠ µ̃0      |Z| > z(α/2)
THE WILCOXON SIGNED-RANK TEST
The sign test makes use only of the plus and minus signs of the differences between the observations and the median µ̃0 (or the plus and minus signs of the differences between the observations in the paired case). Frank Wilcoxon devised a test procedure that uses both direction (sign) and magnitude. This procedure is now called the Wilcoxon signed-rank test.
The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under these assumptions, the mean equals the median.
Description of the test:
We are interested in testing H0 : µ = µ0 versus H1 : µ ≠ µ0.
Assume that X1, X2, ..., Xn is a random sample from a continuous and symmetric distribution with mean/median µ.
Compute the differences Di = Xi - µ0 , i = 1, 2, ..., n.
Rank the absolute differences |Di|, and then give the ranks the signs of their corresponding differences.
Let W+ be the sum of the positive ranks and W- be the absolute value of the sum of the negative ranks, and let W = min(W+, W-).
Tables give critical values of W, say w*α. Then:
1. If H1 : µ ≠ µ0, the value of the test statistic is W = min(W+, W-); reject H0 if W ≤ w*α.
2. If H1 : µ > µ0, reject H0 if W- ≤ w*α.
3. If H1 : µ < µ0, reject H0 if W+ ≤ w*α.
LARGE-SAMPLE APPROXIMATION
If the sample size is moderately large (n > 20), then it can be shown that W+ (or W-) has approximately a normal distribution with
mean n(n + 1)/4
and
variance n(n + 1)(2n + 1)/24.
Therefore, a test of H0 : µ = µ0 can be based on the Wilcoxon signed-rank test statistic
Z = (W+ - n(n + 1)/4) / √(n(n + 1)(2n + 1)/24).
Theorem: The probability distribution of W+ when H0 is true, based on a random sample of size n, satisfies
E(W+) = n(n + 1)/4 and Var(W+) = n(n + 1)(2n + 1)/24.
Proof: Let Ui = 1 if the difference with absolute rank i is positive and Ui = 0 otherwise, so that W+ = Σi i·Ui. For a given i, under H0 the discrepancy has a 50:50 chance of being "+" or "-", so the Ui are independent with P(Ui = 1) = 1/2, E(Ui) = 1/2 and Var(Ui) = 1/4. Hence
E(W+) = Σi i/2 = n(n + 1)/4 and Var(W+) = Σi i²/4 = n(n + 1)(2n + 1)/24,
where we used Σi i = n(n + 1)/2 and Σi i² = n(n + 1)(2n + 1)/6.
PAIRED OBSERVATIONS
The Wilcoxon signed-rank test can be applied to paired data. Let (X1j, X2j), j = 1, 2, ..., n be a collection of paired observations from two continuous distributions that differ only with respect to their means. The distribution of the differences Dj = X1j - X2j is then continuous and symmetric.
The null hypothesis is H0 : µ1 = µ2, which is equivalent to H0 : µD = 0.
To use the Wilcoxon signed-rank test, the differences are first ranked in ascending order of their absolute values, and then the ranks are given the signs of the differences. Let W+ be the sum of the positive ranks and W- be the absolute value of the sum of the negative ranks, and W = min(W+, W-).
If the alternative is H1 : µ1 ≠ µ2 and the observed value satisfies W ≤ w*α, then H0 is rejected and H1 accepted. If H1 : µ1 > µ2, then reject H0 if W- ≤ w*α. If H1 : µ1 < µ2, reject H0 if W+ ≤ w*α.
EXAMPLE
Eleven students were randomly selected from a large statistics class, and their numerical grades on two successive examinations were recorded.

Student    Test 1    Test 2    Difference    Rank of |D|    Signed rank
1          94        85        9             8              8
2          78        65        13            10             10
3          89        92        -3            4              -4
4          62        56        6             7              7
5          49        52        -3            4              -4
6          78        74        4             6              6
7          80        79        1             1              1
8          82        84        -2            2              -2
9          62        48        14            11             11
10         83        71        12            9              9
11         79        82        -3            4              -4

Use the Wilcoxon signed-rank test to determine whether the second test was more difficult than the first. Use α = 0.1.

SOLUTION:
The sum of the positive ranks is W+ = 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52. Using the large-sample approximation with n = 11,
Z = (52 - 11·12/4) / √(11·12·23/24) = (52 - 33) / √126.5 = 1.69.
Since 1.69 > z(0.10) = 1.28, we reject H0 and conclude that the second test was more difficult than the first.
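The ranking and the large-sample statistic for this example can be reproduced as follows (midranks are used for the tied |D| = 3 values):

```python
from math import sqrt

test1 = [94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79]
test2 = [85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82]
d = [a - b for a, b in zip(test1, test2)]

# Rank |d| with midranks for ties, then attach the signs of the differences.
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0.0] * len(d)
i = 0
while i < len(order):
    j = i
    while j < len(order) and abs(d[order[j]]) == abs(d[order[i]]):
        j += 1
    midrank = (i + 1 + j) / 2          # average of positions i+1 .. j
    for idx in order[i:j]:
        ranks[idx] = midrank
    i = j

w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
n = len(d)
z = (w_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(w_plus, round(z, 2))  # W+ = 52, Z ≈ 1.69 > 1.28 = z(0.10): reject H0
```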
EXAMPLE
Ten newly married couples were randomly selected, and each
husband and wife were independently asked the question of how
many children they would like to have. The following information
was obtained.
COUPLE
1
2
3
4
5
6
7
8
9
10
WIFE X
HUSBAND Y
3
2
2
3
1
2
0
2
0
0
1
2
2
1
2
3
2
1
0
2
Using the sign test, is test reason to believe that wives want fewer
children than husbands?
Assume a maximum size of type I error of 0,05
SOLUTION
First set up H0 and H1 :
H0 : p = 0.5 versus H1 : p < 0.5
Couple:  1  2  3  4  6  7  8  9  10   (couple 5 is a tie and is discarded)
Sign:    +  −  −  −  −  +  −  +  −
There are three + signs.
Under H0 , S ~ BIN(9 , 1/2)
P(S ≤ 3) = 0.2539
At level α = 0.05, since 0.2539 > 0.05, do not reject H0 .
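The binomial tail probability used here can be verified directly (a minimal sketch):

```python
from math import comb

# P(S <= 3) when S ~ Binomial(9, 1/2): the sign-test p-value
p = sum(comb(9, k) for k in range(4)) / 2 ** 9
print(round(p, 4))  # 0.2539
```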
THE WILCOXON RANK-SUM TEST
Suppose that we have two independent continuous populations X1
and X2 with means µ1 and µ2. Assume that the distributions of X1
and X2 have the same shape and spread, and differ only (possibly)
in their means.
The Wilcoxon rank-sum test can be used to test the hypothesis
H0 : µ1 = µ2. This procedure is sometimes called the Mann-Whitney test or Mann-Whitney U test.
Description of the Test
Let X11 , X12 , … , X1n1 and X21 , X22 , … , X2n2 be two independent
random samples of sizes n1 ≤ n2 from the continuous populations
X1 and X2. We wish to test the hypotheses :
H0 : µ1 = µ2
versus H1 : µ1 ≠ µ2
The test procedure is as follows. Arrange all n1 + n2 observations in
ascending order of magnitude and assign ranks to them. If two or
more observations are tied, then use the mean of the ranks that
would have been assigned if the observations differed.
Let W1 be the sum of the ranks in the smaller sample (1), and
define W2 to be the sum of the ranks in the other sample.
Then,
W1 + W2 = (n1 + n2)(n1 + n2 + 1)/2
Now if the sample means do not differ, we expect the sums of the
ranks to be nearly equal for both samples after adjusting for the
difference in sample size. Consequently, if the sums of the ranks differ
greatly, we conclude that the means are not equal.
Referring to the table with the appropriate sample sizes n1 and n2 , the
critical value wα can be obtained.
H0 : µ1 = µ2 is rejected if either of the observed values
w1 or w2 is less than or equal to wα .
If H1 : µ1 < µ2 , then reject H0 if w1 ≤ wα .
For H1 : µ1 > µ2 , reject H0 if w2 ≤ wα .
LARGE-SAMPLE APPROXIMATION
When both n1 and n2 are moderately large, say, greater than 8, the
distribution of W1 can be well approximated by the normal
distribution with mean
µW1 = n1(n1 + n2 + 1)/2
and variance
σ²W1 = n1 n2 (n1 + n2 + 1)/12
Therefore, for n1 and n2 > 8, we could use
Z0 = (W1 − µW1)/σW1
as a statistic, and the critical region is :
|z0| > zα/2  (two-tailed test)
z0 > zα  (upper-tail test)
z0 < −zα  (lower-tail test)
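A minimal sketch of this normal approximation (the function name `rank_sum_z` is ours):

```python
import math

def rank_sum_z(w1, n1, n2):
    """Standardize the rank sum W1 using its large-sample mean and variance."""
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w1 - mean) / math.sqrt(var)
```

For the salary example that follows, `rank_sum_z(117, 12, 12)` gives approximately −1.91.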
EXAMPLE
A large corporation is suspected of sex discrimination in the salaries
of its employees. From employees with similar responsibilities and
work experience, 12 male and 12 female employees were randomly
selected ; their annual salaries in thousands of dollars are as follows :
Females  22.5  19.8  20.6  24.7  23.2  19.2  18.7  20.9  21.6  23.5  20.7  21.6
Males    21.9  21.6  22.4  24.0  24.1  23.4  21.2  23.9  20.5  24.5  22.3  23.6
Is there reason to believe that these random samples come from
populations with different distributions ? Use α = 0.05.
SOLUTION
H0 : f1(x) = f2(x)  →  WHAT DOES THIS MEAN?
The random samples come from populations with the same distribution.
H1 : f1(x) ≠ f2(x)
Combine the two samples and rank the salaries :
SEX  SALARY  RANK
F    18.7    1
F    19.2    2
F    19.8    3
M    20.5    4
F    20.6    5
F    20.7    6
F    20.9    7
M    21.2    8
M    21.6    10
F    21.6    10
F    21.6    10
M    21.9    12
M    22.3    13
M    22.4    14
F    22.5    15
F    23.2    16
M    23.4    17
F    23.5    18
M    23.6    19
M    23.9    20
M    24.0    21
M    24.1    22
M    24.5    23
F    24.7    24
Suppose we take the sample of females; its rank sum is
R1 = RF = 117
(The slides work with the equivalent Mann-Whitney U statistic; it leads
to the same z value.)
With n1 = n2 = 12, the large-sample statistic is
z0 = (117 − 150)/√300 = −1.91
At α = 0.05 the two-tailed critical values are ±1.96.
Since |−1.91| = 1.91 < 1.96, do not reject (accept) H0 .
MEANING: at the 0.05 level there is no evidence that the salary
distributions of female and male employees differ.
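The rank sum RF = 117 and z ≈ −1.91 can be reproduced from the raw salaries (a sketch; mid-ranks handle the three tied values of 21.6):

```python
import math

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]

pooled = sorted(females + males)

def mid_rank(v):
    # mean of the 1-based ranks occupied by v in the pooled ordering
    first = pooled.index(v) + 1
    return first + (pooled.count(v) - 1) / 2

w1 = sum(mid_rank(v) for v in females)   # rank sum of the female sample
n1 = n2 = 12
mean = n1 * (n1 + n2 + 1) / 2            # 150.0
sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (w1 - mean) / sd
print(w1, round(z, 2))  # 117.0 -1.91
```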
KOLMOGOROV – SMIRNOV TEST
The Kolmogorov-Smirnov (K-S) test is conducted by comparing the
hypothesized and sample cumulative distribution functions.
A cumulative distribution function is defined as
F(x) = P(X ≤ x)
and the sample cumulative distribution function, S(x), is defined as the
proportion of sample values that are less than or equal to x.
The K-S test should be used instead of the chi-square goodness-of-fit
test to determine if a sample is from a specified continuous distribution.
To illustrate how S(x) is computed, suppose we have the following
10 observations :
110, 89, 102, 80, 93, 121, 108, 97, 105, 103.
We begin by placing the values of x in ascending order, as follows :
80, 89, 93, 97, 102, 103, 105, 108, 110, 121.
Because x = 80 is the smallest of the 10 values, the proportion of
values of x that are less than or equal to 80 is S(80) = 0.1.
x     S(x)
80    0.1
89    0.2
93    0.3
97    0.4
102   0.5
103   0.6
105   0.7
108   0.8
110   0.9
121   1.0
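The S(x) column can be generated mechanically (a sketch; the helper name `ecdf` is ours):

```python
data = [110, 89, 102, 80, 93, 121, 108, 97, 105, 103]

def ecdf(x, sample):
    """Empirical cdf S(x): the proportion of sample values <= x."""
    return sum(v <= x for v in sample) / len(sample)

for v in sorted(data):
    print(v, ecdf(v, data))
```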
The test statistic D is the maximum absolute difference between
the two cdf’s over all observed values.
The range of D is 0 ≤ D ≤ 1, and the formula is
D = max |S(x) − F(x)|
where x = each observed value
S(x) = observed cdf at x
F(x) = hypothesized cdf at x
Let X(1) , X(2) , … , X(n) denote the ordered observations of a
random sample of size n, and define the sample cdf as
S(x) = (number of sample values ≤ x)/n ,
the proportion of the sample values less than or equal to x.
The Kolmogorov-Smirnov statistic is defined to be
D = max |S(x) − F(x)| ,
the maximum taken over all x.
For size α of the type I error, the critical region is of the form
D ≥ Dn,α ,
where Dn,α is the tabulated critical value.
EXAMPLE 1
A state vehicle inspection station has been designed so that
inspection time follows a uniform distribution with limits of 10
and 15 minutes.
A sample of 10 duration times during low and peak traffic
conditions was taken. Use the K-S test with α = 0.05 to
determine if the sample is from this uniform distribution. The
times are :
11.3  10.4  9.8  12.6  14.8  13.0  14.3  13.3  11.5  13.6
SOLUTION
1. H0 : the sample comes from the Uniform(10, 15) distribution
versus
H1 : the sample does not come from the Uniform(10, 15) distribution
2. The sample cumulative distribution function S(x) is computed from
the ordered data, and the hypothesized cdf is F(x) = (x − 10)/5 for
10 ≤ x ≤ 15.
Results of the K-S calculation :
Observed time x   S(x)   F(x)   |S(x) − F(x)|
 9.8              0.10   0.00   0.10
10.4              0.20   0.08   0.12
11.3              0.30   0.26   0.04
11.5              0.40   0.30   0.10
12.6              0.50   0.52   0.02
13.0              0.60   0.60   0.00
13.3              0.70   0.66   0.04
13.6              0.80   0.72   0.08
14.3              0.90   0.86   0.04
14.8              1.00   0.96   0.04
D = max |S(x) − F(x)| = 0.12, attained at x = 10.4.
From the table with n = 10 and α = 0.05, D10,0.05 = 0.41.
[Sketch: the density f(D), with the rejection region α = P(D ≥ D0) in the upper tail beyond D0.]
Since 0.12 < 0.41, do not reject H0 : the data are consistent with the
Uniform(10, 15) distribution.
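The table and D = 0.12 can be reproduced as follows (a sketch; F(x) = (x − 10)/5, clipped to [0, 1], is the Uniform(10, 15) cdf, and S(x) = i/n at the i-th ordered observation since this sample has no ties):

```python
times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]
n = len(times)

def f_uniform(x):
    # Uniform(10, 15) cdf, clipped to [0, 1]
    return min(max((x - 10) / 5, 0.0), 1.0)

# maximum |S(x) - F(x)| over the observed values
d = max(abs(i / n - f_uniform(x))
        for i, x in enumerate(sorted(times), start=1))
print(round(d, 2))  # 0.12
```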
EXAMPLE 2
Suppose the following ten observations :
110, 89, 102, 80, 93, 121, 108, 97, 105, 103
were drawn from a normal distribution, with mean µ = 100 and
standard deviation σ = 10.
Our hypotheses for this test are
H0 : Data were drawn from a normal distribution, with µ = 100
and σ = 10.
versus
H1 : Data were not drawn from a normal distribution, with µ = 100
and σ = 10.
SOLUTION
F(x) = P(X ≤ x)
x     F(x)
80    P(X ≤ 80) = P(Z ≤ −2.0) = 0.0228
89    P(X ≤ 89) = P(Z ≤ −1.1) = 0.1357
93    P(X ≤ 93) = P(Z ≤ −0.7) = 0.2420
97    P(X ≤ 97) = P(Z ≤ −0.3) = 0.3821
102   P(X ≤ 102) = P(Z ≤ 0.2) = 0.5793
103   P(X ≤ 103) = P(Z ≤ 0.3) = 0.6179
105   P(X ≤ 105) = P(Z ≤ 0.5) = 0.6915
108   P(X ≤ 108) = P(Z ≤ 0.8) = 0.7881
110   P(X ≤ 110) = P(Z ≤ 1.0) = 0.8413
121   P(X ≤ 121) = P(Z ≤ 2.1) = 0.9821
x     F(x)     S(x)   |F(x) − S(x)|
80    0.0228   0.1    0.0772
89    0.1357   0.2    0.0643
93    0.2420   0.3    0.0580
97    0.3821   0.4    0.0179
102   0.5793   0.5    0.0793 = D
103   0.6179   0.6    0.0179
105   0.6915   0.7    0.0085
108   0.7881   0.8    0.0119
110   0.8413   0.9    0.0587
121   0.9821   1.0    0.0179
If α = 0.05, then with n = 10 the critical value from the table is 0.409.
The decision rule is : reject H0 if D > 0.409.
Since D = 0.0793 < 0.409, do not reject H0 (accept H0).
That is, the data are consistent with a normal distribution with µ = 100
and σ = 10.
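The same check for the normal hypothesis, with Φ computed via `math.erf` (a sketch):

```python
import math

def normal_cdf(x, mu, sigma):
    # Phi((x - mu)/sigma) via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

data = sorted([110, 89, 102, 80, 93, 121, 108, 97, 105, 103])
n = len(data)
d = max(abs((i + 1) / n - normal_cdf(x, 100, 10)) for i, x in enumerate(data))
print(round(d, 4))  # 0.0793
```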
LILLIEFORS TEST
In most applications where we want to test for normality, the
population mean and the population variance are unknown.
In order to perform the K-S test, however, we must assume that
those parameters are known. This motivates the Lilliefors test,
which is quite similar to the K-S test.
The major difference between the two tests is that, with the Lilliefors
test, the sample mean x̄ and the sample standard deviation s are
used instead of µ and σ to calculate F(x).
EXAMPLE
A manufacturer of automobile seats has a production line that
produces an average of 100 seats per day. Because of new
government regulations, a new safety device has been installed,
which the manufacturer believes will reduce average daily output.
A random sample of 15 days’ output after the installation of the
safety device is shown :
93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95
The daily production was assumed to be normally distributed.
Use the Lilliefors test to examine that assumption, with α = 0.01.
SOLUTION
As in the K-S test, to compute S(x) first place the data in ascending
order :
x     S(x)
88    1/15 = 0.067
91    2/15 = 0.133
92    3/15 = 0.200
93    4/15 = 0.267
94    6/15 = 0.400
95    8/15 = 0.533
96    9/15 = 0.600
98    10/15 = 0.667
101   13/15 = 0.867
103   14/15 = 0.933
105   15/15 = 1.000
From the data above we obtain x̄ = 96.47 and s = 4.85.
F(x) is then computed as F(x) = P(Z ≤ (x − x̄)/s) ; for example,
F(88) = P(Z ≤ (88 − 96.47)/4.85) = P(Z ≤ −1.75) = 0.0401
and similarly for x = 91, 92, … , 103, 105.
Finally, the results are summarized as follows :
x     F(x)     S(x)    |F(x) − S(x)|
88    0.0401   0.067   0.0269
91    0.1292   0.133   0.0038
92    0.1788   0.200   0.0212
93    0.2358   0.267   0.0312
94    0.3050   0.400   0.0950
95    0.3821   0.533   0.1509 = D
96    0.4602   0.600   0.1398
98    0.6255   0.667   0.0415
101   0.8238   0.867   0.0432
103   0.9115   0.933   0.0215
105   0.9608   1.000   0.0392
From the table of critical values for the Lilliefors test, with α = 0.01
and n = 15, Dtab = 0.257.
Since D = 0.1509 < 0.257, accept (do not reject) H0 : the normality
assumption is reasonable.
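The Lilliefors computation differs from the K-S example only in estimating µ and σ from the sample (a sketch; note the n − 1 denominator for s, and that the exact D differs slightly from the table's 0.1509 because the slides round each z score to two decimals):

```python
import math

data = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
n = len(data)
mean = sum(data) / n                                         # about 96.47
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # about 4.85

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

d = max(abs(sum(v <= x for v in data) / n - normal_cdf(x, mean, s))
        for x in data)
print(round(mean, 2), round(s, 2), round(d, 2))
```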
TEST BASED ON RUNS
A sample taken from a population is usually assumed to have been drawn at random.
The runs test evaluates the null hypothesis
H0 : the order of the sample data is random
The alternative hypothesis is simply the negation of H0. There is no
comparable parametric test to evaluate this null hypothesis.
The order in which the data is collected must be retained so that the
runs may be developed.
DEFINITIONS :
1. A run is defined as a sequence of the same symbol.
Two symbols are defined, and each run must contain at least one
of them.
2. A run of length j is defined as a sequence of j observations, all
belonging to the same group, that is preceded or followed by
observations belonging to a different group.
For illustration, the ordered sequence by the sex of the employee is
as follows :
FFF M FFF MM FF MMM FF M F MMMMM F
For the sex of the employee the ordered sequence exhibits runs of F’s
and M’s.
The sequence begins with a run of length three, followed by a run of
length one, followed by another run of length three, and so on.
The total number of runs in this sequence is 11.
Let R be the total number of runs observed in an ordered sequence of
n1 + n2 observations, where n1 and n2 are the respective sample sizes.
The possible values of R are 2, 3, 4, …. (n1 + n2 ).
The only question to ask prior to performing the test is, Is the sample
size small or large?
We will use the guideline that a small sample has n1 and n2 less than
or equal to 15.
The table gives the lower (rL) and upper (rU) critical values of the
distribution f(r), with α/2 = 0.025 in each tail.
132
[Sketch: the distribution f(r), with acceptance region AR between the critical values rL and rU.]
If n1 or n2 exceeds 15, the sample is considered large, in which case
a normal approximation to f(r) is used to test H0 versus H1.
The mean and variance of R are determined to be
µR = 2n1n2/(n1 + n2) + 1
and
σR² = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)]
and the normal approximation is
Z = (R − µR)/σR
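Counting runs and applying the normal approximation can be sketched as follows (the helper name `runs_test` is ours; it uses the standard runs-test mean 2n1n2/(n1 + n2) + 1 and the matching variance, and the sequence is the F/M employee illustration above):

```python
import math

def runs_test(seq):
    """Return (R, mean_R, sd_R) for a sequence over exactly two symbols."""
    r = 1 + sum(a != b for a, b in zip(seq, seq[1:]))  # count the runs
    symbols = sorted(set(seq))
    n1, n2 = seq.count(symbols[0]), seq.count(symbols[1])
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return r, mean, math.sqrt(var)

r, mu, sd = runs_test("FFFMFFFMMFFMMMFFMFMMMMMF")
print(r)  # 11
```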
THE KRUSKAL - WALLIS H TEST
The Kruskal – Wallis H test is the nonparametric equivalent of the
Analysis of Variance F test.
It tests the null hypothesis that all k populations possess the same
probability distribution against the alternative hypothesis that the
distributions differ in location – that is, one or more of the
distributions are shifted to the right or left of the others.
The advantage of the Kruskal-Wallis H test over the F test is that
we need to make no assumptions about the nature of the sampled
populations.
A completely randomized design specifies that we select
independent random samples of n1, n2 , …. nk observations from
the k populations.
To conduct the test, we first rank all
n = n1 + n2 + n3 + … + nk observations and compute the rank sums,
R1 , R2 , … , Rk for the k samples.
The ranks of tied observations are averaged in the same manner as for
the Wilcoxon rank-sum test.
Then, if H0 is true, and if the sample sizes n1 , n2 , … , nk each equal 5
or more, the test statistic
H = [12/(n(n + 1))] Σ (Ri²/ni) − 3(n + 1)
will have a sampling distribution that can be approximated by a
chi-square distribution with (k − 1) degrees of freedom.
Large values of H imply rejection of H0 .
Therefore, the rejection region for the test is
H > χ²α ,
where χ²α is the value that locates α in the upper tail of the
chi-square distribution with (k − 1) degrees of freedom.
The test is summarized in the following :
KRUSKAL – WALLIS H TEST
FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS
H0 : The k population probability distributions are identical
H1 : At least two of the k population probability distributions
differ in location
Test statistic :
H = [12/(n(n + 1))] Σ (Ri²/ni) − 3(n + 1)
where,
ni = Number of measurements in sample i
Ri = Rank sum for sample i, where the rank of each measurement
is computed according to its relative magnitude in the totality
of data for the k samples.
n = Total sample size = n1 + n2 + … +nk
Rejection Region : H > χ²α with (k − 1) degrees of freedom
Assumptions :
1. The k samples are random and independent
2. There are 5 or more measurements in each sample
3. The observations can be ranked
No assumptions have to be made about the shape of the population
probability distributions.
Example
Independent random samples of three different brands of magnetron
tubes (the key components in microwave ovens) were subjected to
stress testing, and the number of hours each operated without repair
was recorded. Although these times do not represent typical life
lengths, they do indicate how well the tubes can withstand extreme
stress. The data are shown in the table below. Experience has shown
that the distributions of life lengths for manufactured products are
often nonnormal, thus violating the assumptions required for the
proper use of an ANOVA F test.
Use the Kruskal-Wallis H test to determine whether evidence exists to
conclude that the brands of magnetron tubes tend to differ in length
of life under stress. Test using α = 0.05.
BRAND
A:  36  48   5   67  53
B:  49  33  60    2  55
C:  71  31  140  59  42
SOLUTION
Rank all 15 observations together and sum the ranks within each of
the 3 samples :

A    rank    B    rank    C    rank
36   5       49   8       71   14
48   7       33   4       31   3
5    2       60   12      140  15
67   13      2    1       59   11
53   9       55   10      42   6

R1 = 36    R2 = 35    R3 = 49
H0 : the population probability distributions of length of life under
stress are identical for the three brands of magnetron tubes.
versus
H1 : at least two of the population probability distributions differ in
location
Test statistic :
H = [12/(15 · 16)] [(36² + 35² + 49²)/5] − 3(16) = 49.22 − 48 = 1.22
Since H = 1.22 < χ²0.05 = 5.99 (chi-square with k − 1 = 2 degrees of
freedom), do not reject H0 : there is insufficient evidence to conclude
that the brands of magnetron tubes differ in length of life under stress.
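The H statistic for the magnetron-tube example can be verified directly (a sketch; these data have no ties, so plain ranks suffice):

```python
brands = {
    "A": [36, 48, 5, 67, 53],
    "B": [49, 33, 60, 2, 55],
    "C": [71, 31, 140, 59, 42],
}
pooled = sorted(v for vals in brands.values() for v in vals)
rank = {v: i + 1 for i, v in enumerate(pooled)}  # no ties in these data
n = len(pooled)

# H = [12/(n(n+1))] * sum(Ri^2/ni) - 3(n+1)
h = 12 / (n * (n + 1)) * sum(
    sum(rank[v] for v in vals) ** 2 / len(vals) for vals in brands.values()
) - 3 * (n + 1)
print(round(h, 2))  # 1.22
```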