Statistical Inference with SCILAB
By
Gilberto E. Urroz, Ph.D., P.E.
Distributed by
infoClearinghouse.com
©2001 Gilberto E. Urroz
All Rights Reserved
A "zip" file containing all of the programs in this document (and other
SCILAB documents at InfoClearinghouse.com) can be downloaded at the
following site:
http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip
The author's SCILAB web page can be accessed at:
http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html
Please report any errors in this document to: [email protected]
STATISTICAL INFERENCE
Definitions
Estimation of Confidence Intervals
Sampling distribution of the mean
Examples of confidence intervals for the mean
Confidence interval for proportions
Example of proportion confidence interval for a large sample
Example of proportion confidence interval for a small sample
Sampling Distribution of Differences and Sums of Statistics
Confidence intervals
Interval Estimation for the Variance
Hypothesis Testing
Procedure for hypothesis testing
Errors in hypothesis testing
Power of hypothesis testing
Selecting the values of α and β
Hypothesis testing involving mean values
Hypothesis testing on one mean
Case I: Knowing σ, or large sample if σ unknown
Case II: Small sample with unknown σ
A function to perform hypothesis testing on one mean
Examples of hypothesis testing on one mean
Hypothesis testing on one proportion
Examples of hypothesis testing on one proportion
Hypothesis testing on two means
Testing the difference between two means using known variances
Testing the difference between two means when the variances are unknown but equal
Testing the difference between two means when the variances are unknown and unequal
A user-defined SCILAB function for hypothesis testing on two means
Examples of application of function htestmu2
Testing the difference between two proportions
A function for hypothesis testing on two proportions
Examples of application of function htestprop2
Characteristic and power equations
Hypothesis testing on one variance
A function for hypothesis testing on one variance
Examples of application of function htestsigma1
Hypothesis testing on two variances
A function for hypothesis testing with two variances
Examples of application of function htestsigma2
Chi-square criteria for goodness of fitting
Examples of goodness-of-fitting for the normal distribution
Examples of goodness-of-fitting for the beta distribution
Chi-square criteria for R×C tables
Exercises
REFERENCES (FOR ALL SCILAB DOCUMENTS AT INFOCLEARINGHOUSE.COM)
Statistical Inference
Statistical inference involves the analysis of estimators of population parameters based on the
statistics of samples, as well as the testing of hypotheses about those parameters. In this
chapter we define point estimators and learn how to produce confidence intervals about those
point estimators. We also introduce hypothesis testing on one or two means, and on one or
two variances. Finally, we present some applications of the Chi-square distribution for
statistical inference.
Definitions
A population constitutes the collection of all conceivable results of a random process, while a sample is a subset of a population. Typically, it is very difficult or impractical to evaluate the entire population for a given parameter. Therefore, we select one or more samples out of the population to analyze. In order for a sample to be representative of the population, it must be random, i.e., each element of the population should have the same probability of being chosen. If this condition is not fulfilled, the sample is said to be biased, and the information obtained from such a sample will most likely be useless in estimating population parameters.
In Chapter … we introduced the concept of random variables and their probability distributions.
A measurement on a given population follows a given probability distribution. If the
distribution depends on a parameter θ, a random sample of observations { X1, X2, …, Xn } of size
n can be used to estimate θ. Each observation X1, X2, …, Xn, represents a random variable.
The joint probability distribution of the n observations is referred to as a sampling distribution.
A statistic of a sample is a function of the observations that does not contain any unknown parameter, e.g., the mean of the sample. Statistics of a sample provide a means of estimating parameters of the population from which the sample originated. Thus, a single value of a given sample statistic, say θ̂, constitutes a point estimator of the corresponding population parameter, θ. A confidence interval is an interval that contains the parameter θ at a certain level of probability.
Estimation of Confidence Intervals
A confidence interval is determined by two statistics, Cl and Cu, which define an interval
containing the parameter θ with a certain level of probability. The end points of the interval
are known as confidence limits, and the interval (Cl,Cu) is known as the confidence interval.
Let (Cl,Cu) be a confidence interval containing an unknown parameter θ. The confidence level or confidence coefficient is the quantity (1 − α), where 0 < α < 1, such that Pr[Cl < θ < Cu] = 1 − α. This relationship defines two-sided confidence limits. A lower one-sided confidence interval is defined by Pr[Cl < θ] = 1 − α. An upper one-sided confidence interval is defined by Pr[θ < Cu] = 1 − α. Typical values of α are 0.01, 0.05, and 0.1, corresponding to confidence levels of 0.99, 0.95, and 0.90, respectively.
Sampling distribution of the mean
Let X̄ be the mean of a random sample of size n drawn from a population with known standard deviation σ. The 100(1−α)% [i.e., 99%, 95%, 90%, etc.] central two-sided confidence interval for the population mean µ is

\left( \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\,,\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right),

where zα/2 is a standard normal variate that is exceeded with a probability of α/2. The location of the value zα/2 is illustrated in the figure below using the plot of the standard normal probability density function.

The standard error of the sample mean is σ/√n. The one-sided upper and lower 100(1−α)% confidence limits for the population mean µ are, respectively, X̄ + zα·σ/√n and X̄ − zα·σ/√n.
The previous result assumes that the standard deviation of the population, σ, is known. If the population standard deviation is not known, the standardized sample mean (X̄ − µ)/(S/√n) follows the Student's t distribution with ν = n − 1 degrees of freedom, where n is the size of the random sample. If n > 30, the Student t distribution can be approximated by the standard normal distribution, φ(z). A sample with size n > 30 is called a large sample.
Let X̄ and S be the mean and standard deviation of a random sample of size n drawn from a population that follows the normal distribution with unknown standard deviation σ. The 100(1−α)% [i.e., 99%, 95%, 90%, etc.] central two-sided confidence interval for the population mean µ is

\left( \bar{X} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\,,\ \bar{X} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} \right),

where t_{n−1,α/2} is Student's t variate with n − 1 degrees of freedom and probability α/2 of exceedance.

The one-sided upper and lower 100(1−α)% confidence limits for the population mean µ are, respectively, X̄ + t_{n−1,α}·s/√n and X̄ − t_{n−1,α}·s/√n.
Examples of confidence intervals for the mean
Example 1 - Known population variance. A sample of 25 fuses is used to determine the electric
current at which the fuse fails. The average current for the 25 fuses is calculated to be 180.5
mA. If the sample is known to come from a factory such that the standard deviation of the
current at failure point is 5 mA, determine the 95% confidence interval for the mean value of
the electric current.
The data provided is translated as n = 25, x̄ = 180.5, σ = 5, α = 0.05. To calculate the confidence interval we need to find zα/2 from P(Z > zα/2) = α/2, or P(Z ≤ zα/2) = 1 − α/2, where Z ~ Normal(0,1), i.e., the standard normal distribution. To calculate zα/2 we can use SCILAB's own cdfnor function with the following call:
z_alpha_2 = cdfnor('X',0,1,1-alpha/2,alpha/2)
or, if you have the statistical toolbox STIXBOX available, you can use:
z_alpha_2 = qnorm(1-alpha/2,0,1)
The SCILAB calculations for this problem will proceed as follows:
-->n=25;xbar=180.5;sigma=5;alpha=0.05;
-->z_alpha_2=cdfnor('X',0,1,1-alpha/2,alpha/2)
z_alpha_2 = 1.959964
-->CL=xbar-z_alpha_2*sigma/sqrt(n)
CL = 178.54004
-->CU=xbar+z_alpha_2*sigma/sqrt(n)
CU = 182.45996
Alternatively, zα/2 can be calculated using STIXBOX's function qnorm:
-->z_alpha_2 = qnorm(1-alpha/2,0,1)
z_alpha_2 = 1.959964
Example 2 - Small sample with unknown population variance. A sample of 10 carbon composite cylinders indicates that the mean value of the carbon content in each cylinder is 0.65, with a sample standard deviation of 0.05. Determine the 90% confidence interval for the carbon content.
The data provided is interpreted as follows: n = 10, x̄ = 0.65, s = 0.05, α = 0.10. To calculate the confidence interval we need to find tn-1,α/2 from P(T > tα/2) = α/2, or P(T ≤ tα/2) = 1 − α/2, where T ~ Student t(ν = n−1), i.e., the Student t distribution with ν = n−1 degrees of freedom. To calculate tn-1,α/2 we can use SCILAB's own cdft function with the following call:
t_alpha_2=cdft("T",n-1,1-alpha/2,alpha/2)
or, if you have the statistical toolbox STIXBOX available, you can use:
t_alpha_2 = qt(1-alpha/2,n-1)
The SCILAB calculations for this problem will proceed as follows:
-->n=10;xbar=0.65;s=0.05;alpha=0.10;
-->t_alpha_2 = cdft('T',n-1,1-alpha/2,alpha/2)
t_alpha_2 = 1.8331129
-->CU=xbar+t_alpha_2*s/sqrt(n)
CU = .6789841
-->CL=xbar-t_alpha_2*s/sqrt(n)
CL = .6210159
The value tα/2 can be obtained using STIXBOX's function qt:
-->t_alpha_2 = qt(1-alpha/2,n-1)
t_alpha_2 = 1.8331129
Confidence interval for proportions
Let X ~ Bernoulli(p), where p is the probability of success; then E[X] = p and Var[X] = p(1−p). If an experiment involving X is repeated n times and k successful outcomes are recorded, then an estimate of p is given by p̂ = k/n, while the standard error of p̂ is

\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}.

In practice, the sample estimate p̂ replaces p in the standard error formula. For a large sample size, n > 30, with np > 5 and n(1−p) > 5, the sampling distribution of p̂ is very nearly normal, i.e., the 100(1−α)% central two-sided confidence interval for the population proportion p is (p̂ − zα/2·σp̂ , p̂ + zα/2·σp̂).
Example of proportion confidence interval for a large sample
Suppose an irrigation engineer keeps track of the number of days, during a 90-day period in late spring and early summer, on which significant rainfall is available, so that an irrigation sprinkler system in an orchard does not need to be activated. Observations taken at random during 70 days in the last three summers indicate that enough rainfall was recorded on only 20 of those 70 days. Determine the 90% confidence interval for the proportion p of the number of days on which enough rainfall is available in the orchard.
Estimate for p and the standard error are calculated as:
-->k=20;n=90;alpha=0.10;
-->p_hat = k/n
p_hat = .2222222
-->sigma_p_hat =sqrt(p_hat*(1-p_hat)/n)
sigma_p_hat = .0438228
The parameter zα/2 is obtained from:
-->z_alpha_2 = qnorm(1-alpha/2,0,1)
z_alpha_2 = 1.6448536
The lower and upper limits of the confidence interval are:
-->CL=p_hat-z_alpha_2*sigma_p_hat
CL = .1501401
-->CU=p_hat+z_alpha_2*sigma_p_hat
CU = .2943043
For a small sample, n < 30, we can estimate a confidence interval using (p̂ − t_{n−1,α/2}·σp̂ , p̂ + t_{n−1,α/2}·σp̂).
Example of proportion confidence interval for a small sample
The same engineer has kept data for early fall rainfall. His records, however, include only 25 days in the last three years, and they indicate that on 20 out of those 25 days there was sufficient rainfall to turn off the sprinkler system. Estimate the 90% confidence interval for the proportion of days with sufficient rainfall. The following is the SCILAB solution for this problem:
-->k=20;n=25;alpha=0.10;
-->p_hat = k/n
p_hat = .8
-->sigma_p_hat =sqrt(p_hat*(1-p_hat)/n)
sigma_p_hat = .08
-->t_alpha_2 = cdft('T',n-1,1-alpha/2,alpha/2)
t_alpha_2 = 1.7108821
-->CL=p_hat-t_alpha_2*sigma_p_hat
CL = .6631294
-->CU=p_hat+t_alpha_2*sigma_p_hat
CU = .9368706
Sampling Distribution of Differences and Sums of Statistics
Let S1 and S2 be independent statistics from two populations, based on samples of sizes n1 and n2, respectively. Also, let the means and standard errors of the sampling distributions of those statistics be µS1 and µS2, and σS1 and σS2, respectively.

The difference between the statistics from the two populations has a sampling distribution with mean

\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2},

and standard error

\sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}.

The sum of the statistics S1 + S2 has mean

\mu_{S_1 + S_2} = \mu_{S_1} + \mu_{S_2},

and standard error

\sigma_{S_1 + S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}.

Estimators for the mean and standard deviation of the difference and sum of the statistics S1 and S2 are given by:

\hat{\mu}_{S_1 \pm S_2} = \bar{X}_1 \pm \bar{X}_2, \qquad \hat{\sigma}_{S_1 \pm S_2} = \sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}}.
Confidence intervals
For large samples, i.e., n1 ≥ 30 and n2 ≥ 30, and assuming that the variances σ²S1 and σ²S2 are known, the confidence intervals for the difference and sum of the statistics S1, S2 are given by

\left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}}\,,\ (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}} \right)

and

\left( (\bar{X}_1 + \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}}\,,\ (\bar{X}_1 + \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}} \right),

respectively.

If one of the samples is small, i.e., n1 < 30 or n2 < 30, or if the variances σ²S1 and σ²S2 are unknown, the confidence intervals for the difference and sum of the statistics S1, S2 are given by

\left( (\bar{X}_1 - \bar{X}_2) - t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\,,\ (\bar{X}_1 - \bar{X}_2) + t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right)

and

\left( (\bar{X}_1 + \bar{X}_2) - t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\,,\ (\bar{X}_1 + \bar{X}_2) + t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right),

respectively, where ν = n1 + n2 − 2 is the number of degrees of freedom of the t_ν variate.
Examples of confidence intervals for sum and difference of means
An industrial process consists of two consecutive steps taking times X1 and X2, respectively, for completion. Measurements from 20 repetitions indicate that the first step takes an average of X̄1 = 45 minutes with a standard deviation of S1 = 10 minutes, while measurements from 15 repetitions indicate that the second step takes an average of X̄2 = 65 minutes with a standard deviation of S2 = 5 minutes. Determine the 99% confidence interval for the total process time, XT = X1 + X2.
Using SCILAB we proceed as follows:
-->n1 = 20; X1bar = 45; S1 = 10; n2 = 15; X2bar = 65; S2 = 5;
-->XTbar = X1bar + X2bar
XTbar = 110.
-->sigmaTbar = sqrt(S1^2/n1+S2^2/n2)
sigmaTbar = 2.5819889
-->t_alpha_2 = cdft('T',n1+n2-1,1-alpha/2,alpha/2)
t_alpha_2 = 1.6909243
-->CL=XTbar-t_alpha_2*sigmaTbar
CL = 105.63405
-->CU=XTbar+t_alpha_2*sigmaTbar
CU = 114.36595
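Note that the session above appears to reuse alpha = 0.10 from the previous example (t_alpha_2 = 1.69 is a 90% two-sided critical value) and uses n1+n2-1 degrees of freedom in cdft. For the 99% interval requested, with ν = n1 + n2 − 2 degrees of freedom as defined above, one would instead compute (a sketch; output not shown):

-->alpha = 0.01;
-->t_alpha_2 = cdft('T',n1+n2-2,1-alpha/2,alpha/2)
-->CL = XTbar - t_alpha_2*sigmaTbar
-->CU = XTbar + t_alpha_2*sigmaTbar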
Interval Estimation for the Variance
Consider a random sample X1, X2, ..., Xn of independent normally distributed variables with mean µ, variance σ², and sample mean X̄. The statistic

\hat{S}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2

is an unbiased estimator of the variance σ². The quantity

(n-1)\frac{\hat{S}^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2

follows the χ² distribution with ν = n − 1 degrees of freedom.

The (1 − α)·100% two-sided confidence interval for the variance σ² is found from

P\left[\chi^2_{n-1,1-\alpha/2} \le (n-1)\frac{\hat{S}^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2}\right] = 1 - \alpha,

as illustrated in the figure below.
The confidence interval for the population variance σ² is therefore

\left(\frac{(n-1)\hat{S}^2}{\chi^2_{n-1,\alpha/2}}\,,\ \frac{(n-1)\hat{S}^2}{\chi^2_{n-1,1-\alpha/2}}\right),

where χ²_{n−1,α/2} and χ²_{n−1,1−α/2} are the values that a χ²_{n−1} variable exceeds with probabilities α/2 and 1 − α/2, respectively.

The one-sided upper confidence limit for σ² is defined as

\frac{(n-1)\hat{S}^2}{\chi^2_{n-1,1-\alpha}}.
Two-sided and upper 99% confidence limit for the standard deviation
Suppose that the compressive strengths of 40 test concrete cubes have an estimated standard
deviation of 5.02 N/mm2. We will determine the two-sided and the upper 99% confidence
limits as follows:
-->n=40;s=5.02;alpha=0.01;
-->Chi_alpha_2 = cdfchi('X',n-1,1-alpha/2,alpha/2)
Chi_alpha_2 = 65.475571
-->Chi_1_alpha_2=cdfchi('X',n-1,alpha/2,1-alpha/2)
Chi_1_alpha_2 = 19.995868
-->CL=(n-1)*s^2/Chi_alpha_2
CL = 15.010417
-->CU=(n-1)*s^2/Chi_1_alpha_2
CU = 49.150935
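The two limits just computed, CL and CU, are limits for the variance σ². The one-sided upper limit defined above, and the corresponding limits for the standard deviation, can be obtained by continuing the same session (a sketch; output not shown):

-->Chi_1_alpha = cdfchi('X',n-1,alpha,1-alpha)   //chi-square value exceeded with probability 1-alpha
-->CU1 = (n-1)*s^2/Chi_1_alpha                   //one-sided upper 99% limit for the variance
-->sqrt([CL CU CU1])                             //corresponding limits for the standard deviation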
Hypothesis Testing
A hypothesis is a declaration made about a population (for instance, with respect to its mean).
Acceptance of the hypothesis is based on a statistical test on a sample taken from the
population. The consequent action and decision-making are called hypothesis testing.
The process consists of taking a random sample from the population and making a statistical hypothesis about the population. If the observations do not support the model or theory postulated, the hypothesis is rejected. However, if the observations are in agreement, then the hypothesis is not rejected, but it is not necessarily accepted.
Associated with the decision is a level of significance α . This is complementary to the
probability used earlier for setting confidence limits.
The initial assumption of a significance level removes any subjectivity in the decision making
process, i.e., two or more investigators will reach the same conclusion based on the same
data. Hypothesis testing therefore involves procedures for rejecting or not rejecting a statement, and assessing the chances of making incorrect decisions of either kind, i.e., rejecting the hypothesis when it is true or accepting it when it is false.
Procedure for hypothesis testing
The procedure for hypothesis testing involves the following six steps:
1. Declare a null hypothesis, H0. This is the hypothesis to be tested. For example, H0 : µ1 − µ2 = 0, i.e., we hypothesize that the mean value of population 1 and the mean value of population 2 are the same. If H0 is true, any observed difference in means is attributed to errors in random sampling.

2. Declare an alternative hypothesis, H1. For the example under consideration, it could be H1 : µ1 − µ2 ≠ 0. [Note: this is what we really want to test.]

3. Determine or specify a test statistic, T. In the example under consideration, T will be based on X̄1 − X̄2, the difference of the observed means.

4. Use the known (or assumed) distribution of the test statistic, T.

5. Define a rejection region (the critical region, R) for the test statistic based on a preassigned significance level α.

6. Use observed data to determine whether the computed value of the test statistic is within or outside the critical region. If the test statistic is within the critical region, then we say that the quantity we are testing is significant at the 100α percent level.

Notes:

1. For the example under consideration, the alternative hypothesis H1 : µ1 − µ2 ≠ 0 produces what is called a two-tailed test. If the alternative hypothesis is H1 : 0 < µ1 − µ2, or H1 : µ1 − µ2 < 0, then we have a one-tailed test.

2. The probability of rejecting the null hypothesis when it is true is equal to the level of significance, i.e., Pr[T ∈ R | H0] = α.
Errors in hypothesis testing
In hypothesis testing we use the terms Type I and Type II errors to define the cases in which a true hypothesis is rejected or a false hypothesis is accepted (not rejected), respectively. Let T = value of the test statistic, R = rejection region, and A = acceptance region, so that R ∪ A = Ω and R ∩ A = ∅, where Ω is the parameter space for T. The probabilities of making an error of Type I or of Type II are as follows:

Rejecting a true hypothesis, P[Type I error] = P[T ∈ R | H0] = α.

Not rejecting a false hypothesis, P[Type II error] = P[T ∈ A | H1] = β.

Now, let's consider the cases in which we make the correct decision:

Not rejecting a true hypothesis, P[Not(Type I error)] = P[T ∈ A | H0] = 1 − α.

Rejecting a false hypothesis, P[Not(Type II error)] = P[T ∈ R | H1] = 1 − β.
Power of hypothesis testing
The complement of β is called the power of the test of the null hypothesis H0 vs. the alternative H1. The power of a test is used, for example, to determine a minimum sample size to restrict errors.
Selecting the values of α and β
A typical value of the level of significance (or probability of Type I error) is α = .05 , (i.e.,
incorrect rejection once in 20 times on the average). If the consequences of a Type I error are
more serious, choose smaller values of α , say 0.01 or even 0.001.
The value of β , i.e., the probability of making an error of Type II, depends on α , the sample
size n , and on the true value of the parameter tested. Thus, the value of β is determined
after the hypothesis testing is performed. It is customary to draw graphs showing β or the
power of the test ( 1 − β ) as a function of the true value of the parameter tested. These
graphs are called operating characteristic curves or power function curves, respectively.
Hypothesis testing involving mean values
Hypothesis testing on one mean
Suppose you want to test the hypothesis that the mean of a population is equal to a certain
value, i.e., H0 : µ = µ0 , at a significance level α .
We could use three different alternative hypotheses for the test. These are:

Two-tailed test: H1 : µ ≠ µ0
One-tailed tests: H1 : µ0 < µ, or H1 : µ < µ0.

A sample of size n is taken from the population, yielding a mean value x̄ and a standard deviation sx.
Case I: Knowing σ, or large sample if σ unknown

Assuming that we know the standard deviation of the population, σ, we use the standard normal score

Z = (X̄ − µ)/(σ/√n)

as the test statistic. The particular value for the test is

z0 = (x̄ − µ0)/(σ/√n).

If the standard deviation of the population, σ, is unknown, but the sample is large (n ≥ 30), we can still use the standard normal score as the test statistic, replacing σ with sx, i.e.,

z0 = (x̄ − µ0)/(sx/√n).
Let Φ( z ) be the CDF of the standard normal distribution, i.e., Z~N(0,1).
Two-tailed test

If using a two-tailed test we will find the value of zα/2 from

Pr[Z > zα/2] = 1 − Φ(zα/2) = α/2,  or  Φ(zα/2) = 1 − α/2.

We will reject the null hypothesis, H0, if z0 > zα/2 or if z0 < −zα/2. In other words, the rejection region is R = { |z0| > zα/2 }, while the acceptance region is A = { |z0| < zα/2 }.
One-tailed test

If using a one-tailed test we will find the value of zα from

Pr[Z > zα] = 1 − Φ(zα) = α,  or  Φ(zα) = 1 − α.

Reject the null hypothesis, H0, if zα < z0 and H1 : µ0 < µ, or if z0 < −zα and H1 : µ < µ0.

Case II: Small sample with unknown σ
If the standard deviation of the population, σ, is unknown and n < 30 (small sample), we use the Student's t score

t = (X̄ − µ)/(Sx/√n),

with ν = n − 1 degrees of freedom, as the test statistic. The particular value for the test is

t0 = (x̄ − µ0)/(sx/√n).

Let Fν(t) be the CDF of the Student's t variate with ν degrees of freedom, i.e., t ~ Student's t(ν).
Two-tailed test

If using a two-tailed test we will find the value of tα/2 from

Pr[t > tα/2] = 1 − Fν(tα/2) = α/2,  or  Fν(tα/2) = 1 − α/2.

Reject the null hypothesis, H0, if t0 > tα/2 or if t0 < −tα/2. In other words, the rejection region is R = { |t0| > tα/2 }, while the acceptance region is A = { |t0| < tα/2 }.
One-tailed test

If using a one-tailed test we will find the value of tα from

Pr[t > tα] = 1 − Fν(tα) = α,  or  Fν(tα) = 1 − α.

Reject the null hypothesis, H0, if tα < t0 and H1 : µ0 < µ, or if t0 < −tα and H1 : µ < µ0.
A function to perform hypothesis testing on one mean
The following function, htestmu1, can be used to perform hypothesis testing on one mean.
The possible calls to the function are:
[xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,xbar,s,sigma,n)
[xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x)
A listing of the function follows:
function [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x,s,sigma,n)
//Hypothesis testing on one mean. Possible function calls:
//
// [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,xbar,s,sigma,n)
// [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x)
//
// altype can be 'one' - for one-sided alternative hypothesis,
// or 'two' - for two sided alternative hypothesis
// alpha = level of significance (typical values = 0.01,0.05,0.10)
// mu0   = value of population mean being tested, H0:mu = mu0
// x     = mean value of a sample (xbar) or vector representing the sample
// if x = mean value, then s = standard deviation of sample
// if x = vector representing sample, s = standard deviation of population
// if x = mean value, sigma = standard deviation of population and
//        n = sample size
if altype<>'one' & altype<>'two' then
error('htestmu1 - select type of alternative hypothesis = one or two');
abort;
end;
[nargout,nargin] = argn(0)
if nargin == 5 then
if length(x)<1 then
error('htestmu1 - x must be a vector');
abort;
else
sigma = s;
n = length(x);
xbar = mean(x);
s = st_deviation(x);
end;
else
xbar = x;
end;
printf(' \n');
printf('Hypothesis testing on one mean: ' ...
+ altype + '-side alternative hypothesis.\n')
printf(' \n');
if sigma > 0 & altype=='one' then
Ta = cdfnor('X',0,1,1-alpha,alpha);
T0 = (xbar-mu0)/(sigma/sqrt(n));
printf('Test parameter used: z');
elseif sigma >0 & altype=='two' then
Ta = cdfnor('X',0,1,1-alpha/2,alpha/2);
T0 = (xbar-mu0)/(sigma/sqrt(n));
printf('Test parameter used: z');
elseif sigma <=0 & n>=30 & altype=='one' then
Ta = cdfnor('X',0,1,1-alpha,alpha);
T0 = (xbar-mu0)/(s/sqrt(n));
printf('Test parameter used: z');
elseif sigma <=0 & n<30 & altype=='one' then
Ta = cdft('T',n-1,1-alpha,alpha);
T0 = (xbar-mu0)/(s/sqrt(n));
printf('Test parameter used: t');
elseif sigma <=0 & n>=30 & altype=='two' then
Ta = cdfnor('X',0,1,1-alpha/2,alpha/2);
T0 = (xbar-mu0)/(s/sqrt(n));
printf('Test parameter used: z');
else
Ta = cdft('T',n-1,1-alpha/2,alpha/2);
T0 = (xbar-mu0)/(s/sqrt(n));
printf('Test parameter used: t');
end;
if altype == 'two' then
if T0>Ta | T0<-Ta then
printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
else
printf('Do not reject the null hypothesis H0 : mu = %f\n',mu0);
end
else
if T0>Ta then
printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
printf('if the alternative hypothesis is H1 : mu > %f\n',mu0);
elseif T0<-Ta
printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
printf('if the alternative hypothesis is H1 : mu < %f\n',mu0);
else
printf('Do not reject the null hypothesis H0 : mu = %f\n',mu0);
end
end;
printf(' \n');
Examples of hypothesis testing on one mean
In the following examples, values of n, x̄, sx, µ0, and α are provided and the hypothesis testing is performed by using function htestmu1.
Example 1. Test the null hypothesis H0:µ = 2.0 against a one-sided alternative hypothesis using data from a sample of size 15, with a sample mean of 2.5 and a sample standard deviation sx = 3.5, at a significance level α = 0.05. Assume that the population standard deviation is known, σ = 1.5.
-->n=15;xbar=2.5;sx=3.5;mu0=2.0;alpha=0.05;sigma=1.5;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : mu = 2.000000
T0 = 1.2909944
Ta = 1.6448536
s = 3.5
xbar = 2.5
Example 2. Test the same null hypothesis as in Example 1 but using a two-sided alternative
hypothesis.
-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : mu = 2.000000
T0 = 1.2909944
Ta = 1.959964
s = 3.5
xbar = 2.5
Example 3. Test the null hypothesis H0:µ = 6.0 against a one-sided alternative hypothesis based on a sample of size 45 (large sample) with a sample mean of 12.3 and a sample standard deviation of 2.0 at a significance level of 0.05. The population standard deviation is not known.
-->n=45;xbar=12.3;sx=2.0;mu0=6.0;alpha=0.05;sigma=0.0;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 6.000000
if the alternative hypothesis is H1 : mu > 6.000000
T0 = 21.130842
Ta = 1.6448536
s = 2.
xbar = 12.3
Example 4. Test the null hypothesis of Example 3 against a two-sided hypothesis.
-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 6.000000
T0 = 21.130842
Ta = 1.959964
s = 2.
xbar = 12.3
Example 5. Test the null hypothesis H0:µ = 14.0 against a one-sided alternative hypothesis
based on a sample of size 10 (small sample) with a sample mean of 11 and a sample standard
deviation of 1.5 at a level of significance of 0.01. The standard deviation of the population is
unknown.
-->n=10;xbar=11;sx=1.5;mu0=14.0;alpha=0.01;sigma=0.0;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 14.000000
if the alternative hypothesis is H1 : mu < 14.000000
T0 = - 6.3245553
Ta = 2.8214379
s = 1.5
xbar = 11.
Example 6. Test the null hypothesis of Example 5 against a two-sided alternative hypothesis.
-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)
Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 14.000000
T0 = - 6.3245553
Ta = 3.2498355
s = 1.5
xbar = 11.
Example 7. For the vector of data, X, generated below, test the null hypothesis H0:µ = 23
against a one-sided alternative hypothesis at the significance level α = 0.01. The population
standard deviation is assumed to be known, σ = 5.
-->X = int(100*rand(1,10))
X = ! 30.   93.   21.   31.   36.   29.   56.   48.   33.   59. !
-->alpha = 0.01;mu0=23;sigma=5;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,X,sigma)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 23.000000
if the alternative hypothesis is H1 : mu > 23.000000
T0 = 13.028584
Ta = 2.3263479
s = 21.313532
xbar = 43.6
Example 8. Test the null hypothesis of Example 7 against a two-sided alternative hypothesis.
-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,X,sigma)
Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 23.000000
T0 = 13.028584
Ta = 2.5758293
s = 21.313532
xbar = 43.6
Example 9. Test the null hypothesis of Example 7 against a one-sided alternative hypothesis assuming that the population standard deviation is not known.
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,X,-1)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 23.000000
if the alternative hypothesis is H1 : mu > 23.000000
T0 = 3.0564112
Ta = 2.8214379
s = 21.313532
xbar = 43.6
Example 10. Test the null hypothesis of Example 7 against a two-sided alternative hypothesis
assuming that the population standard deviation is not known.
-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,X,-1)
Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: t
Do not reject the null hypothesis H0 : mu = 23.000000
T0 = 3.0564112
Ta = 3.2498355
s = 21.313532
xbar = 43.6
Hypothesis testing on one proportion
Suppose that we want to test the null hypothesis H0 : p = p0, where p represents the probability of obtaining a successful outcome in any given repetition of a Bernoulli trial. To test the hypothesis, we perform n repetitions of the experiment and find that k successful outcomes are recorded. Thus, an estimate of p is given by p̂ = k/n.

The standard deviation for the sample will be estimated as

s_p = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{k(n-k)}{n^3}}.

Assume that the Z score,

Z = (p̂ − p0)/sp,

follows the standard normal distribution, i.e., Z ~ N(0,1). The particular value of the statistic to test is

z0 = (p̂ − p0)/sp.
We could use three different alternative hypotheses for the test. These are:

Two-tailed test: H1 : p ≠ p0
One-tailed tests: H1 : p0 < p, or H1 : p < p0.
Two-tailed test

If using a two-tailed test we will find the value of zα/2 from

Pr[Z > zα/2] = 1 − Φ(zα/2) = α/2,  or  Φ(zα/2) = 1 − α/2.

Reject the null hypothesis, H0, if z0 > zα/2 or if z0 < −zα/2. In other words, the rejection region is R = { |z0| > zα/2 }, while the acceptance region is A = { |z0| < zα/2 }.
One-tailed test

If using a one-tailed test we will find the value of zα from

Pr[Z > zα] = 1 − Φ(zα) = α,  or  Φ(zα) = 1 − α.

Reject the null hypothesis, H0, if zα < z0 and H1 : p0 < p, or if z0 < −zα and H1 : p < p0.
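The examples that follow call a function htestprop1, whose listing does not appear in this section. A minimal sketch consistent with the call [p_hat,sigma,Ta,T0] = htestprop1(altype,alpha,p0,k,n) used below and with the procedure just described could look as follows (this listing is an assumption, not necessarily the author's original function):

function [p_hat,sigma,Ta,T0] = htestprop1(altype,alpha,p0,k,n)
//Hypothesis testing on one proportion, H0: p = p0, given k successful
//outcomes out of n repetitions. altype = 'one' or 'two' selects the type
//of alternative hypothesis; alpha = level of significance.
p_hat = k/n;                       //estimate of the proportion
sigma = sqrt(p_hat*(1-p_hat)/n);   //standard error of the estimate
T0 = (p_hat-p0)/sigma;             //test statistic z0
printf('Hypothesis testing on one proportion: '+altype+'-side alternative hypothesis.\n');
printf('Test parameter used: z\n');
if altype == 'two' then
Ta = cdfnor('X',0,1,1-alpha/2,alpha/2);
if T0>Ta | T0<-Ta then
printf('Reject the null hypothesis H0 : p = %f\n',p0);
else
printf('Do not reject the null hypothesis H0 : p = %f\n',p0);
end
else
Ta = cdfnor('X',0,1,1-alpha,alpha);
if T0>Ta then
printf('Reject the null hypothesis H0 : p = %f\n',p0);
printf('if the alternative hypothesis is H1 : p > %f\n',p0);
elseif T0<-Ta then
printf('Reject the null hypothesis H0 : p = %f\n',p0);
printf('if the alternative hypothesis is H1 : p < %f\n',p0);
else
printf('Do not reject the null hypothesis H0 : p = %f\n',p0);
end
end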
Examples of hypothesis testing on one proportion
Example 1. Test the null hypothesis H0:p0 = 0.25 against a one-sided alternative hypothesis
based on 100 repetitions of a test out of which 20 successful outcomes are recorded using a
significance level of 0.05.
-->k=20;n=100;p0=0.25;alpha=0.05;
-->[p_hat,sigma,Ta,T0]=htestprop1('one',alpha,p0,k,n)
Hypothesis testing on one proportion: one-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : p = .250000
T0 = - 1.25
Ta = 1.6448536
sigma = .04
p_hat = .2
Example 2. Test the same null hypothesis as in Example 1 against a two-sided alternative
hypothesis.
-->[p_hat,sigma,Ta,T0]=htestprop1('two',alpha,p0,k,n)
Hypothesis testing on one proportion: two-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : p = .250000
T0 = - 1.25
Ta = 1.959964
sigma = .04
p_hat = .2
Hypothesis testing on two means
Assume that we have two populations, population 1 and population 2, with mean values µ1 and µ2, respectively, and with standard deviations σ1 and σ2, respectively. A sample of size n1 is taken out of population 1, yielding a mean value x̄1 and standard deviation s1. Similarly, a sample of size n2 is taken out of population 2, yielding a mean value x̄2 and standard deviation s2.
Testing the difference between two means using known variances
If both population 1 and population 2 are normal, the statistic

Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}

has a N(0,1) distribution. The standard error of the difference between the two means is

\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.

The criterion for rejection of the null hypothesis, H0: µ1 − µ2 = δ, is the same as for a single mean value, with µ = µ1 − µ2 = δ.
Testing the difference between two means when the variances are unknown but equal

This could be the case in which the two samples are taken from the same population, or when there is evidence that the standard deviations of the two populations are the same. In this case, we obtain a "pooled estimate" of the common standard deviation of the two populations, σ, as

s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.

Then, the random variable

T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}}

has the Student's t distribution with ν = n1 + n2 − 2 degrees of freedom.
Testing the difference between two means when the variances are unknown and unequal
For observations taken from normal populations with unknown and unequal variances, the statistic

T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

has an approximate Student's t distribution with

\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}

degrees of freedom.

For the last two cases, in which a t parameter is used for the test, the criterion for rejection of the null hypothesis, H0: µ1 − µ2 = δ, is the same as for a single mean value, with µ = µ1 − µ2 = δ.
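As a quick numerical check of this degrees-of-freedom formula (the sample summaries are reused from the earlier two-step process example, and the truncation to an integer mirrors the expression used inside function htestmu2, listed in the next section):

//Approximate degrees of freedom for unknown, unequal variances
n1 = 20; s1 = 10; n2 = 15; s2 = 5;   //sample sizes and standard deviations
v1 = s1^2/n1; v2 = s2^2/n2;
nu = int((v1 + v2)^2/(v1^2/(n1-1) + v2^2/(n2-1)))   //truncated to an integer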
A user-defined SCILAB function for hypothesis testing on two means
The procedure for hypothesis testing on two means is coded in the following function,
htestmu2, which has possible calls
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2()
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(Xdata)
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(X1data,X2data)
Xdata, X1data, and X2data are row vectors of data. X1Info is a vector that contains the sample size, n1, the mean value, x1bar, and the standard deviation, s1, of sample 1. X2Info is a vector containing n2, x2bar, and s2. The value sp represents the standard deviation for the two samples, which could be the value σX1-X2 or sp as defined above. The value nu represents the degrees of freedom of the Student's t distribution, if a t parameter is used in the test. T0 is the actual value of the z or t parameter used in the test. Ta represents zα, zα/2, tα, or tα/2, depending on the test parameter used and on the type of alternative hypothesis (one- or two-sided) used.
The function operates interactively requesting information from the user and provides verbose
information on the type of test parameter and alternative hypothesis used, as well as providing
a recommendation about the rejection or not-rejection of the null hypothesis.
If the function call [X1Info,X2Info,sp,nu,T0,Ta] = htestmu2() is used, the user will be prompted
for the summary information on the samples, i.e., the samples sizes, mean values, and
standard deviations. If the function call [X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(Xdata) is
used, the user is asked to identify the vector Xdata as sample 1 or sample 2, and then is
prompted for the summary data for the other sample. Finally, if the function call
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(X1data,X2data) is used, the function calculates the
sample summary data all by itself.
The function will also prompt the user for the following information:

• The difference of means to be tested, i.e., δ = µ1 − µ2
• The level of confidence of the test, i.e., α
• The type of alternative hypothesis to be used, i.e., one- or two-sided
• The standard deviations of the populations, σ1 and σ2, if known
• For unknown σ1 and σ2, whether the user suspects that σ1 and σ2 are equal or not; this helps the function select the t parameter to use.
A listing of the function is shown below.
function [X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
[nargout,nargin]=argn(0)
if nargin == 0 then
X1in = input('For sample 1 enter n, xbar, s :')
n1 = X1in(1);x1bar = X1in(2);s1 = X1in(3);
X2in = input('For sample 2 enter n, xbar, s :')
n2 = X2in(1);x2bar = X2in(2);s2 = X2in(3);
elseif nargin == 1 then
disp('You entered a vector as input to the function.')
disp('Do you want this vector to represent sample 1 or 2?')
idsample = input(' ')
if idsample == 1 then
n1 = length(X1); x1bar = mean(X1); s1 = st_deviation(X1);
printf('n1 = %g   x1bar = %g   s1 = %g',n1,x1bar,s1)
X2in = input('For sample 2 enter n, xbar, s :')
n2 = X2in(1);x2bar = X2in(2);s2 = X2in(3);
else
n2 = length(X1); x2bar = mean(X1); s2 = st_deviation(X1);
printf('n2 = %g   x2bar = %g   s2 = %g',n2,x2bar,s2)
X1in = input('For sample 1 enter n, xbar, s :')
n1 = X1in(1);x1bar = X1in(2);s1 = X1in(3);
end
else
n1 = length(X1); x1bar = mean(X1); s1 = st_deviation(X1);
printf('n1 = %g   x1bar = %g   s1 = %g',n1,x1bar,s1)
n2 = length(X2); x2bar = mean(X2); s2 = st_deviation(X2);
printf('n2 = %g   x2bar = %g   s2 = %g',n2,x2bar,s2)
end
X1Info = [n1,x1bar,s1]; X2Info = [n2,x2bar,s2];
delta = ...
input('Enter the difference of population means to be tested:');
disp('Enter the level of confidence, alpha, for the test:')
disp('(Typical values: 0.01, 0.05, 0.10)')
alpha = input(' ');
disp('Enter the type of alternative hypothesis to test:')
disp('   1 - one-sided    2 - two-sided');
atype = input(' ');
disp('Enter population standard deviations, sigma1 & sigma2.')
disp('Note: Enter zero if sigma1 or sigma2 are unknown.')
sigmas = input('');
if sigmas == 0 then
sigma1 = 0; sigma2 = 0;
else
sigma1 = sigmas(1); sigma2 = sigmas(2);
end
if sigma1<=0 | sigma2<=0 then
disp('Do you suspect that the unknown population standard')
disp('deviations are equal? If so enter 1, otherwise enter 0')
answer = input('')
if answer == 1 then
nu=int(((s1^2/n1)+(s2^2/n2))^2/((s1^2/n1)^2/(n1-1)+(s2^2/n2)^2/(n2-1)));
sp=sqrt(s1^2/n1+s2^2/n2);
T0=((x1bar-x2bar)-delta)/sp;ttype = ' t';
else
nu=n1+n2-2;
sp=sqrt(((n1-1)*s1^2+(n2-1)*s2^2)/nu);
T0=((x1bar-x2bar)-delta)/(sp*sqrt(1/n1+1/n2));ttype = ' t';
end
if atype == 1 then
Ta = cdft('T',nu,1-alpha,alpha)
else
Ta = cdft('T',nu,1-alpha/2,alpha/2)
end
else
sp=sqrt(sigma1^2/n1+sigma2^2/n2);
T0=((x1bar-x2bar)-delta)/sp;ttype = ' z';nu=[];
if atype == 1 then
Ta = cdfnor('X',0,1,1-alpha,alpha)
else
Ta = cdfnor('X',0,1,1-alpha/2,alpha/2)
end
end;
if atype == 1 then
printf('Hypothesis testing on two means: one-sided test.\n')
printf('Test parameter used:' + ttype +'\n');printf(' \n');
if T0<-Ta then
printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
printf('if the alternative hypothesis is H1:mu1-mu2<%f. \n',delta);
elseif T0>Ta
printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
printf('if the alternative hypothesis is H1:mu1-mu2>%f. \n',delta);
else
printf('Do not reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
end
else
printf('Hypothesis testing on two means: two-sided test.\n')
printf('Test parameter used:' + ttype +'\n');printf(' \n');
if T0<-Ta | T0>Ta then
printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
else
printf('Do not reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
end
end;
Examples of application of function htestmu2
Example 1. Two samples taken from two different populations are described by the statistics n1 = 100, x̄1 = 2.3, n2 = 75, x̄2 = 2.5. The populations are known to have the standard deviations σ1 = 5.5 and σ2 = 3.0. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of confidence α = 0.05 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. [Note: since the populations' standard deviations are given, we do not need to use the samples' standard deviations (which are not given, anyway). In this case we simply enter them as zero when prompted by function htestmu2.] The user's input requested by function htestmu2 follows each prompt:
-->//Solution to Example 1 - part (a)
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2()
For sample 1 enter n, xbar, s :
100, 2.3, 0.0
For sample 2 enter n, xbar, s :
75, 2.5, 0.0
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
5.5 3.0
Hypothesis testing on two means: two-sided test.
Test parameter used: z
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.959964
T0 = - .3076923
nu = []
sp = .65
X2Info = ! 75.   2.5   0. !
X1Info = ! 100.   2.3   0. !
-->//Solution to Example 1 - part (b)
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2()
For sample 1 enter n, xbar, s :
100, 2.3, 0.0
For sample 2 enter n, xbar, s :
75, 2.5, 0.0
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
5.5 3.0
Hypothesis testing on two means: one-sided test.
Test parameter used: z
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.6448536
T0 = - .3076923
nu = []
sp = .65
X2Info = ! 75.   2.5   0. !
X1Info = ! 100.   2.3   0. !
Example 2. Sample 1 is given by X1 = [2.4,3.2,1.1,2.5,4.2,3.6], while sample 2 is characterized by the statistics n2 = 8, x̄2 = 3.2, s2 = 0.5. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of confidence α = 0.10 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be different.
-->//Example 2 - part(a)
-->X1 = [2.4,3.2,1.1,2.5,4.2,3.6]
X1 = ! 2.4   3.2   1.1   2.5   4.2   3.6 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 6   x1bar = 2.83333   s1 = 1.08566
For sample 2 enter n, xbar, s :
8 3.2 0.5
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0
Hypothesis testing on two means: two-sided test.
Test parameter used: t
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.7822876
T0 = - .8507016
nu = 12.
sp = .7980880
X2Info = ! 8.   3.2   .5 !
X1Info = ! 6.   2.8333333   1.0856642 !
-->//Example 2 - part(b)
-->X1 = [2.4,3.2,1.1,2.5,4.2,3.6]
X1 = ! 2.4   3.2   1.1   2.5   4.2   3.6 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 6   x1bar = 2.83333   s1 = 1.08566
For sample 2 enter n, xbar, s :
8 3.2 0.5
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0
Hypothesis testing on two means: one-sided test.
Test parameter used: t
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.3562173
T0 = - .8507016
nu = 12.
sp = .7980880
X2Info = ! 8.   3.2   .5 !
X1Info = ! 6.   2.8333333   1.0856642 !
Example 3. Sample 2 is given by X2 = [12.4,13.2,11.1,12.5,14.2,13.6], while sample 1 is characterized by the statistics n1 = 15, x̄1 = 16.2, s1 = 2.0. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of confidence α = 0.01 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be the same.
-->//Example 3 - part(a)
-->X2= [12.4,13.2,11.1,12.5,14.2,13.6]
X2 = ! 12.4   13.2   11.1   12.5   14.2   13.6 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
n2 = 6   x2bar = 12.8333   s2 = 1.08566
For sample 1 enter n, xbar, s :
15 16.2 2.0
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
1
Hypothesis testing on two means: two-sided test.
Test parameter used: t
Reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 2.9207816
T0 = 4.9471778
nu = 16.
sp = .6805227
X2Info = ! 6.   12.833333   1.0856642 !
X1Info = ! 15.   16.2   2. !
-->//Example 3 - part(b)
-->X2= [12.4,13.2,11.1,12.5,14.2,13.6]
X2 = ! 12.4   13.2   11.1   12.5   14.2   13.6 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
n2 = 6   x2bar = 12.8333   s2 = 1.08566
For sample 1 enter n, xbar, s :
15 16.2 2.0
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
1
Hypothesis testing on two means: one-sided test.
Test parameter used: t
Reject the null hypothesis H0:mu1-mu2=0.000000,
if the alternative hypothesis is H1:mu1-mu2>0.000000.
Ta = 2.5834872
T0 = 4.9471778
nu = 16.
sp = .6805227
X2Info = ! 6.   12.833333   1.0856642 !
X1Info = ! 15.   16.2   2. !
Example 4. Given samples X1 = [3.2, 3.1, 3.0, 3.2] and X2 = [2.8, 3.0, 2.9, 2.7, 3.1], test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of confidence α = 0.01 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be different.
-->//Example 4 - part(a)
-->X1=[3.2,3.1,3.0,3.2], X2=[2.8,3.0,2.9,2.7,3.1]
X1 = ! 3.2   3.1   3.   3.2 !
X2 = ! 2.8   3.   2.9   2.7   3.1 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
n1 = 4   x1bar = 3.125   s1 = .0957427
n2 = 5   x2bar = 2.9   s2 = .15811
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0
Hypothesis testing on two means: two-sided test.
Test parameter used: t
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 3.4994833
T0 = 2.4852506
nu = 7.
sp = .1349603
X2Info = ! 5.   2.9   .1581139 !
X1Info = ! 4.   3.125   .0957427 !
-->//Example 4 - part(b)
-->X1=[3.2,3.1,3.0,3.2], X2=[2.8,3.0,2.9,2.7,3.1]
X1 = ! 3.2   3.1   3.   3.2 !
X2 = ! 2.8   3.   2.9   2.7   3.1 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
n1 = 4   x1bar = 3.125   s1 = .0957427
n2 = 5   x2bar = 2.9   s2 = .15811
Enter the difference of population means to be tested:
0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0
Hypothesis testing on two means: one-sided test.
Test parameter used: t
Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 2.9979516
T0 = 2.4852506
nu = 7.
sp = .1349603
X2Info = ! 5.   2.9   .1581139 !
X1Info = ! 4.   3.125   .0957427 !
Testing the difference between two proportions
Suppose that we want to test the null hypothesis H0 : p1 − p2 = p0, where the p's represent the probabilities of obtaining a successful outcome in any given repetition of a Bernoulli trial for two populations, 1 and 2. To test the hypothesis, we perform n1 repetitions of the experiment from population 1 and n2 repetitions from population 2, and find that k1 and k2 successful outcomes, respectively, are recorded. Thus, estimates of p1 and p2 are given, respectively, by

p̂1 = k1/n1 and p̂2 = k2/n2.

The standard deviations for the samples will be estimated, respectively, as

s_1 = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}} = \sqrt{\frac{k_1(n_1-k_1)}{n_1^3}}, \qquad s_2 = \sqrt{\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{k_2(n_2-k_2)}{n_2^3}},

and the standard deviation of the difference of proportions is estimated from

s_p = \sqrt{s_1^2 + s_2^2}.

Assume that the Z score,

Z = ((p̂1 − p̂2) − p0)/sp,

follows the standard normal distribution, i.e., Z ~ N(0,1). The particular value of the statistic to test is

z_0 = \frac{k_1/n_1 - k_2/n_2 - p_0}{s_p}.
We could use three different alternative hypotheses for the test. These are:

Two-tailed test: H1 : p1 − p2 ≠ p0
One-tailed tests: H1 : p0 < p1 − p2, or H1 : p1 − p2 < p0.
Two-tailed test

If using a two-tailed test we will find the value of zα/2 from

Pr[Z > zα/2] = 1 − Φ(zα/2) = α/2,  or  Φ(zα/2) = 1 − α/2.

Reject the null hypothesis, H0, if z0 > zα/2 or if z0 < −zα/2. In other words, the rejection region is R = { |z0| > zα/2 }, while the acceptance region is A = { |z0| < zα/2 }.
One-tailed test

If using a one-tailed test we will find the value of zα from

Pr[Z > zα] = 1 − Φ(zα) = α,  or  Φ(zα) = 1 − α.

Reject the null hypothesis, H0, if zα < z0 and H1 : p0 < p1 − p2, or if z0 < −zα and H1 : p1 − p2 < p0.
A function for hypothesis testing on two proportions
Function htestprop2 performs hypothesis testing on two proportions, based on measurements
that show k1 successful outcomes out of n1 repetitions, and k2 successful outcomes out of n2
repetitions. The null hypothesis is H0:p1-p2=p0.
function [p1,p2,s1,s2,sp,z0,za] = htestprop2(atype,k1,n1,k2,n2,p0,alpha)
//Hypothesis testing in two proportions. Test the null hypothesis
//H0:p1-p2=p0, given k1, k2 successful outcomes out of n1, n2
//repetitions, respectively. Significance level = alpha.
//Variable atype represents the type of alternative hypothesis, i.e.,
//atype = 'one' for one-sided test, atype = 'two' for two-sided test
p1 = k1/n1; p2 = k2/n2;
s1 = sqrt(p1*(1-p1)/n1);
s2 = sqrt(p2*(1-p2)/n2);
sp = sqrt(s1^2+s2^2);
z0 = (p1-p2-p0)/sp;
printf('Hypothesis testing on two proportions:'+atype+'-sided test.')
if atype == 'one' then
za = cdfnor('X',0,1,1-alpha,alpha)
if z0>za then
printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
printf('if the alternative hypothesis is H1:p1-p2>%g \n',p0)
elseif z0<-za then
printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
printf('if the alternative hypothesis is H1:p1-p2<%g \n',p0)
else
printf('Do not reject the null hypothesis H0:p1-p2=%g \n',p0)
end
else
za = cdfnor('X',0,1,1-alpha/2,alpha/2)
if z0>za | z0<-za then
printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
else
printf('Do not reject the null hypothesis H0:p1-p2=%g \n',p0)
end
end
Examples of application of function htestprop2
Example 1. Test the null hypothesis H0:p1-p2 = 0.3 at a significance level α = 0.1, based on the values k1 = 10, n1 = 200, k2 = 45, n2 = 100. (a) Use a one-sided test. (b) Use a two-sided test.
-->getf('htestprop2')
-->//part (a)
-->[p1,p2,s1,s2,sp,z0,za]=htestprop2('one',10,200,45,100,0.3,0.1)
Hypothesis testing on two proportions:one-sided test.
Reject the null hypothesis H0:p1-p2= .3
if the alternative hypothesis is H1:p1-p2< .3
za = 1.2815516
z0 = - 13.44043
sp = .0520817
s2 = .0497494
s1 = .0154110
p2 = .45
p1 = .05
-->//part (b)
-->[p1,p2,s1,s2,sp,z0,za]=htestprop2('two',10,200,45,100,0.3,0.1)
Hypothesis testing on two proportions:two-sided test.
Reject the null hypothesis H0:p1-p2= .3
za = 1.6448536
z0 = - 13.44043
sp = .0520817
s2 = .0497494
s1 = .0154110
p2 = .45
p1 = .05
Characteristic and power equations
Consider the two-tailed test: H0: µ = µ0, H1: µ ≠ µ0. Suppose that it is correct to reject H0
because the true value of µ is µ0 + c, where c is a constant. The probability β of a Type II
error is given by

β = Φ( z_α/2 − c·sqrt(n)/σ ) − Φ( −z_α/2 − c·sqrt(n)/σ ),
where Φ(z) is the CDF of the standard normal distribution. Notice that the probability β is a
function β = f(α, n, c/σ). Curves representing β vs. µ are called characteristic curves.

The complement of β is the power function = 1 - β = probability of rejecting the null
hypothesis when it is not true:

Power = 1 − β = 1 − [ Φ( z_α/2 − c·sqrt(n)/σ ) − Φ( −z_α/2 − c·sqrt(n)/σ ) ].
Characteristic and power curves are shown below for α=0.05, and c/σ= 0.25, 0.50, 0.875, 1.0.
//Script to plot characteristic and power curves for alpha = 0.05
alpha = 0.05; z = cdfnor('X',0,1,1-alpha/2,alpha/2);
n=[0:1:100]; cs = [0.25,0.50,0.875,1.0];
b=zeros(length(n),length(cs));p=b;
for i = 1:length(n)
for j = 1:length(cs)
b(i,j) = ...
cdfnor('PQ',z-cs(j)*sqrt(n(i)),0,1) -cdfnor('PQ',-z-cs(j)*sqrt(n(i)),0,1);
p(i,j)=1-b(i,j);
end;
end;
xset('window',1);minn=min(n);maxn=max(n);minb=min(b);maxb=max(b);
rect1=[minn,minb,maxn,maxb];
plot2d([n',n',n',n'],[b(:,1),b(:,2),b(:,3),b(:,4)],[1,2,3,4],...
'111','c/sigma=0.25@c/sigma=0.50@c/sigma=0.875@c/sigma=1',rect1);
xtitle('Characteristic curves for alpha = 0.05','n','beta');
xset('window',2);minp=min(p);maxp=max(p);rect2=[minn,minp,maxn,maxp];
plot2d([n',n',n',n'],[p(:,1),p(:,2),p(:,3),p(:,4)],[1,2,3,4],...
'111','c/sigma=0.25@c/sigma=0.50@c/sigma=0.875@c/sigma=1',rect2);
xtitle('Power curves for alpha = 0.05','n','power');
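A single (β, power) pair can also be evaluated directly, outside the plotting loops; a minimal
sketch with assumed values of α, n and c/σ (chosen only for illustration):

//Sketch: Type II error probability and power for the two-tailed test on a mean
//(assumed values of alpha, n and c/sigma)
alpha = 0.05; n = 30; cs = 0.5;              //cs stands for c/sigma
za2 = cdfnor('X',0,1,1-alpha/2,alpha/2);     //z_(alpha/2)
beta = cdfnor('PQ',za2-cs*sqrt(n),0,1) - cdfnor('PQ',-za2-cs*sqrt(n),0,1);
power = 1 - beta;
printf('beta = %g, power = %g\n',beta,power);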
Hypothesis testing on one variance
Suppose that a sample of size n is taken out of a population of mean µ and variance σ². The
sample yields a mean x̄ and variance sx². We will use these data to test the null hypothesis,
H0: σ² = σ0², at a level of confidence α. The test statistic to be used is a chi-square statistic,

χ0² = (n − 1)·sx²/σ0² = ν·sx²/σ0²,

where ν = n − 1 represents the degrees of freedom of a χ² distribution.

Let Fν(χ²) = Pr[Χ² < χ²] be the CDF corresponding to the chi-square distribution with ν
degrees of freedom.
Two-tailed test
In this case, the alternate hypothesis is H1: σ² ≠ σ0², and we will reject the null hypothesis
if χ0² > χ²_α/2, or if χ0² < χ²_1-α/2, where

Pr[Χ² > χ²_α/2] = 1 − Fν(χ²_α/2) = α/2,  and  Pr[Χ² > χ²_1-α/2] = 1 − Fν(χ²_1-α/2) = 1 − α/2.
One-tailed test
We consider two possibilities:

(1) If the alternate hypothesis is H1: σ² > σ0², then we will reject the null hypothesis if
χ0² > χ²_α, where Pr[Χ² > χ²_α] = 1 − Fν(χ²_α) = α.

(2) If the alternate hypothesis is H1: σ² < σ0², then we will reject the null hypothesis if
χ0² < χ²_1-α, where Pr[Χ² > χ²_1-α] = 1 − Fν(χ²_1-α) = 1 − α.
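A minimal sketch of these decisions in SCILAB, with an assumed sample size, sample variance and
hypothesized variance (numbers chosen only for illustration):

//Sketch: chi-square test on one variance with hypothetical numbers
n = 15; s2 = 12.0; sigma0_2 = 9.0; alpha = 0.05;  //s2 is the sample variance s^2
nu = n - 1;
X0 = nu*s2/sigma0_2;                        //test statistic
Xu = cdfchi('X',nu,1-alpha/2,alpha/2);      //chi^2_(alpha/2)   (upper critical value)
Xl = cdfchi('X',nu,alpha/2,1-alpha/2);      //chi^2_(1-alpha/2) (lower critical value)
printf('X0 = %g, lower = %g, upper = %g\n',X0,Xl,Xu);
//Two-tailed decision: reject H0 if X0 > Xu or X0 < Xl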
A function for hypothesis testing on one variance
The following function, htestsigma1, can be used to test the null hypothesis H0:σ2 = σ02, at the
level of significance α. There are two possible calls to the function:
[n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,var,n)
[n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x)
In the first call the sample variance (s2) and the sample size (n) are given, besides the type of
alternative hypothesis (altype, which could be equal to ‘one’ or ‘two’ corresponding to one- or
two-sided tests), the level of significance, α, and the value of σ02. In the second call, instead
of providing the sample variance and sample size, the user provides the actual sample as a
vector x.
The function returns n and s, the sample's size and standard deviation, as well as the chi-square
test parameter, X0 = χ0², and the values Xa and X1a, which represent χ²α and χ²1-α,
respectively, if using a one-sided test, or χ²α/2 and χ²1-α/2, respectively, if using a two-sided
test.
A listing of the function follows:
function [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x,n)
//Hypothesis testing on one variance. Possible function calls:
//
// [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,var,n)
// [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x)
//
// altype   can be 'one' - for one-sided alternative hypothesis,
//          or 'two' - for two-sided alternative hypothesis
// alpha    = level of significance (typical values = 0.01,0.05,0.10)
// sigma0_2 = value of the population variance being tested,
//            i.e., H0:sigma^2 = sigma0_2
// x        = sample variance (s^2) or vector containing the sample (x)
//            if x = sample variance, n = sample size
// X0       = test statistic
// Xa       = X_(alpha/2) for altype='two' or X_alpha for altype='one'
// X1a      = X_(1-alpha/2) for 'two' or X_(1-alpha) for 'one'
if altype<>'one' & altype<>'two' then
error('htestsigma1 - select type of alternative hypothesis = one or two');
abort;
end;
[nargout,nargin] = argn(0)
if nargin == 4 then
if length(x)<1 then
error('htestsigma1 - x must be a vector');
abort;
else
n = length(x);
s = st_deviation(x);
end
else
s = sqrt(x);
end;
printf(' \n');
printf('Hypothesis testing on one variance: ' ...
+ altype + '-sided alternative hypothesis.\n')
printf(' \n');
X0 = (n-1)*s^2/sigma0_2;
if altype == 'one' then
Xa = cdfchi('X',n-1,1-alpha,alpha);
X1a = cdfchi('X',n-1,alpha,1-alpha);
if X0>Xa then
printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
printf('if the alternative hypothesis is H1:sigma^2>%g \n',sigma0_2);
elseif X0<X1a then
printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
printf('if the alternative hypothesis is H1:sigma^2<%g \n',sigma0_2);
else
printf('Do not reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
end
else
Xa = cdfchi('X',n-1,1-alpha/2,alpha/2);
X1a = cdfchi('X',n-1,alpha/2,1-alpha/2);
if X0>Xa | X0<X1a then
printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
else
printf('Do not reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
end
end;
Examples of application of function htestsigma1
Example 1. A sample of size 10 produces a variance of 20. With a confidence level of 0.05,
test the null hypothesis H0:σ02=25, using a two-sided test.
-->alpha=0.05; sigma0_2 = 25; var = 20; n = 10;
-->[n,s,X0,Xa,X1a]=htestsigma1('two',alpha,sigma0_2,var,n)
Hypothesis testing on one variance: two-sided alternative hypothesis.
Do not reject the null hypothesis H0:sigma^2=25
X1a = 2.7003895
Xa = 19.022768
X0 = 7.2
s = 4.472136
n = 10.
Example 2. Given the sample X = [3.5 2.2 1.5 4.2 3.2 1.4 5.6 2.3 4.8], with a confidence level
of 0.05, test the null hypothesis H0:σ0²=25, using a two-sided test.
-->alpha=0.05; sigma0_2 = 25;
-->X = [3.5 2.2 1.5 4.2 3.2 1.4 5.6 2.3 4.8]
X = ! 3.5  2.2  1.5  4.2  3.2  1.4  5.6  2.3  4.8 !
-->[n,s,X0,Xa,X1a]=htestsigma1('two',alpha,sigma0_2,X)
Hypothesis testing on one variance: two-sided alternative hypothesis.
Reject the null hypothesis H0:sigma^2=25
X1a = 2.1797307
Xa = 17.534546
X0 = .6939556
s = 1.4726205
n = 9.
Hypothesis testing on two variances
If two populations are normal and independent samples of sizes n1 and n2 are drawn from them,
then the statistic

F = (s1²/σ1²) / (s2²/σ2²)

follows the F distribution with n1−1 degrees of freedom for the numerator and n2−1 degrees of
freedom for the denominator.
We can test the null hypothesis, H0: σ1²/σ2² = 1, against the two-tailed alternate hypothesis,
H1: σ1²/σ2² ≠ 1, where σ1² and σ2² are the variances of populations 1 and 2, respectively,
by taking samples from the two populations and evaluating their variances, s1² and s2². Let
the sizes of samples 1 and 2 be n1 and n2, respectively, and let α be the level of confidence.

Let sM² and sm² be the largest and smallest of the variances s1² and s2², respectively.
Calculate the statistic F0 = sM²/sm², and the quantile F_α/2, with the appropriate degrees of
freedom for numerator and denominator. Reject the null hypothesis if F0 > F_α/2.

If the alternate hypothesis is H1: σ1²/σ2² > 1, use the statistic F0 = s1²/s2², and reject the
null hypothesis if F0 > Fα.

If, on the other hand, the alternate hypothesis is H1: σ1²/σ2² < 1, use the statistic F0 = s2²/s1²,
and reject the null hypothesis if F0 > Fα.
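As a minimal sketch of the two-tailed version of this test (the sample sizes and variances below
are assumed only for illustration):

//Sketch: two-tailed F test on two variances with hypothetical numbers
n1 = 12; s1_2 = 6.5; n2 = 10; s2_2 = 2.0; alpha = 0.10;
if s1_2 >= s2_2 then
  sM2 = s1_2; nM = n1; sm2 = s2_2; nm = n2;
else
  sM2 = s2_2; nM = n2; sm2 = s1_2; nm = n1;
end
F0 = sM2/sm2;                                 //test statistic
Fa2 = cdff('F',nM-1,nm-1,1-alpha/2,alpha/2);  //F_(alpha/2) quantile
printf('F0 = %g, F_(alpha/2) = %g\n',F0,Fa2);
//Decision: reject H0:sigma1^2/sigma2^2 = 1 if F0 > Fa2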
A function for hypothesis testing with two variances
The following function, htestsigma2, can be used for hypothesis testing in two variances.
Possible calls to the function are:
[X1Info,X2Info,nuInfo,F0,Fa]= htestsigma2()
[X1Info,X2Info,nuInfo,F0,Fa]= htestsigma2(Xdata)
[X1Info,X2Info,nuInfo,F0,Fa]= htestsigma2(X1data,X2data)
Xdata, X1data, and X2data are row vectors of data. X1Info is a vector that contains the sample
size, n1, and the standard deviation, s1, of sample 1. X2Info is a vector containing n2, and s2.
The vector nuInfo contains the degrees of freedom for the numerator and denominator,
respectively, of the F distribution. F0 is the F parameter used in the test. Fa represents Fα.
The function operates interactively requesting information from the user and provides verbose
information on the type of test parameter and alternative hypothesis used, as well as providing
a recommendation about the rejection or not-rejection of the null hypothesis.
If the function call [X1Info,X2Info,nuInfo,F0,Fa]= htestsigma2() is used, the user will be
prompted for the summary information on the samples, i.e., the samples sizes and standard
deviations. If the function call [X1Info,X2Info,nuInfo,F0,Fa]= htestsigma2(Xdata) is used, the
user is asked to identify the vector Xdata as sample 1 or sample 2, and then is prompted for
the summary data for the other sample. Finally, if the function call
[X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2(X1data,X2data) is used, the function calculates the
sample summary data all by itself.
The function will also prompt the user for the following information:
•  The level of confidence of the test, i.e., α
•  The type of alternative hypothesis to be used, i.e., one- or two-sided
•  The type of one-sided alternative hypothesis to be tested.
A listing of the function is shown below.
function [X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
[nargout,nargin]=argn(0)
if nargin == 0 then
X1in = input('For sample 1 enter n, s :')
n1 = X1in(1);s1 = X1in(2);
X2in = input('For sample 2 enter n, s :')
n2 = X2in(1);s2 = X2in(2);
elseif nargin == 1 then
disp('You entered a vector as input to the function.')
disp('Do you want this vector to represent sample 1 or 2?')
idsample = input(' ')
if idsample == 1 then
n1 = length(X1); s1 = st_deviation(X1);
printf('n1 = %g   s1 = %g',n1,s1)
X2in = input('For sample 2 enter n, s :')
n2 = X2in(1);s2 = X2in(2);
else
n2 = length(X1); s2 = st_deviation(X1);
printf('n2 = %g   s2 = %g',n2,s2)
X1in = input('For sample 1 enter n, s :')
n1 = X1in(1);s1 = X1in(2);
end
else
n1 = length(X1); s1 = st_deviation(X1);
printf('n1 = %g   s1 = %g',n1,s1)
n2 = length(X2); s2 = st_deviation(X2);
printf('n2 = %g   s2 = %g',n2,s2)
end
X1Info = [n1,s1]; X2Info = [n2,s2];
disp('Enter the level of confidence, alpha, for the test:')
disp('(Typical values: 0.01, 0.05, 0.10)')
alpha = input(' ');
disp('Enter the type of alternative hypothesis to test:')
disp('   1 - one-sided     2 - two-sided');
atype = input(' ');
if atype == 1 then
disp('Enter the type of one-sided alternative hypothesis to test:')
disp('   1  -  H1:sigma1^2/sigma2^2>1');
disp('   2  -  H1:sigma1^2/sigma2^2<1');
onetype = input(' ');
printf('Hypothesis testing on two variances: one-sided test.\n')
if onetype == 1 then
nuInfo = [n1-1,n2-1];
F0 = (s1/s2)^2; Fa = cdff('F',n1-1,n2-1,1-alpha,alpha)
printf('The alternative hypothesis is H1:sigma1^2/sigma2^2>1');
if F0>Fa then
printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
else
printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
end
else
nuInfo = [n2-1,n1-1];
F0 = (s1/s2)^2; Fa = cdff('F',n2-1,n1-1,1-alpha,alpha);
printf('The alternative hypothesis is H1:sigma1^2/sigma2^2<1.\n');
if F0>Fa then
printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
else
printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
end
end
else
printf('Hypothesis testing on two variances: two-sided test.\n')
if s1>=s2 then
sM = s1; nM = n1; sm = s2; nm = n2;
else
sM = s2; nM = n2; sm = s1; nm = n1;
end;
nuInfo = [nM-1,nm-1];
F0 = (sM/sm)^2; Fa = cdff('F',nM-1,nm-1,1-alpha/2,alpha/2);
if F0>Fa then
printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1. \n');
else
printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1. \n');
end
end;
Examples of application of function htestsigma2
In the following examples, user input is given in italics.
Example 1. Given two samples with n1 = 25, s1 = 2.3, n2 = 15, s2 = 3.2, test the null hypothesis
H0:σ1²/σ2² = 1, at a significance level of 0.10, against (a) a two-sided hypothesis; (b) a one-sided
hypothesis, H1:σ1²/σ2² > 1; and, (c) a one-sided hypothesis, H1:σ1²/σ2² < 1.
-->//Part (a) - Two sided hypothesis
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s :
25, 2.3
For sample 2 enter n, s :
15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Hypothesis testing on two variances: two-sided test.
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 2.1297969
F0 = 1.9357278
nuInfo = ! 14.  24. !
X2Info = ! 15.  3.2 !
X1Info = ! 25.  2.3 !
-->//Part (b) - One-sided hypothesis, H1: sigma1^2/sigma2^2 > 1
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s :
25, 2.3
For sample 2 enter n, s :
15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter the type of one-sided alternative hypothesis to test:
1  -  H1:sigma1^2/sigma2^2>1
2  -  H1:sigma1^2/sigma2^2<1
1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 1.937663
F0 = .5166016
nuInfo = ! 24.  14. !
X2Info = ! 15.  3.2 !
X1Info = ! 25.  2.3 !
-->//Part (c) - One-sided hypothesis, H1: s1^2/s2^2 < 1
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s :
25, 2.3
For sample 2 enter n, s :
15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter the type of one-sided alternative hypothesis to test:
1  -  H1:sigma1^2/sigma2^2>1
2  -  H1:sigma1^2/sigma2^2<1
2
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2<1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 1.7974154
F0 = .5166016
nuInfo = ! 14.  24. !
X2Info = ! 15.  3.2 !
X1Info = ! 25.  2.3 !
Example 2. Given the sample X1 = [3.2, 2.1, 4.5, 6.2, 3.4], and a second sample with n2=10,
s2=0.5, test the null hypothesis H0:σ1²/σ2² = 1, at a significance level of 0.05, against (a) a two-sided
hypothesis; (b) a one-sided hypothesis, H1:σ1²/σ2² > 1.
--> Example 2 - part (a)
--> X1 = [3.2, 2.1, 4.5, 6.2, 3.4]
X1 = ! 3.2  2.1  4.5  6.2  3.4 !
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 5
s1 = 1.55145
For sample 2 enter n, s :
10, 0.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Hypothesis testing on two variances: two-sided test.
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 4.7180785
F0 = 9.628
nuInfo = ! 4.  9. !
X2Info = ! 10.  .5 !
X1Info = ! 5.  1.5514509 !
--> Example 2 - part (b)
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 5
s1 = 1.55145
For sample 2 enter n, s :
10, 0.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter the type of one-sided alternative hypothesis to test:
1  -  H1:sigma1^2/sigma2^2>1
2  -  H1:sigma1^2/sigma2^2<1
1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 3.6330885
F0 = 9.628
nuInfo = ! 4.  9. !
X2Info = ! 10.  .5 !
X1Info = ! 5.  1.5514509 !
Example 3. Given the sample X2 = [0.9,11.1,0.2,3.4,5.6,2.1,8.2,3.2] and sample 1 with n1 = 22
and s1 = 1.5, test the null hypothesis H0:σ1²/σ2² = 1, at a significance level of 0.01, against (a)
a two-sided hypothesis; (b) a one-sided hypothesis, H1:σ12/ σ22 > 1.
-->X2 = [0.9,11.1,0.2,3.4,5.6,2.1,8.2,3.2]
X2 = ! .9  11.1  .2  3.4  5.6  2.1  8.2  3.2 !
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
2
n2 = 8
s2 = 3.7485
For sample 1 enter n, s :
22, 1.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Hypothesis testing on two variances: two-sided test.
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 4.1789302
F0 = 6.245
nuInfo = ! 7.  21. !
X2Info = ! 8.  3.7484997 !
X1Info = ! 22.  1.5 !
--> Example 3 - part (b)
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
2
n2 = 8
s2 = 3.7485
For sample 1 enter n, s :
22, 1.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter the type of one-sided alternative hypothesis to test:
1  -  H1:sigma1^2/sigma2^2>1
2  -  H1:sigma1^2/sigma2^2<1
1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 6.1323795
F0 = .1601281
nuInfo = ! 21.  7. !
X2Info = ! 8.  3.7484997 !
X1Info = ! 22.  1.5 !
Example 4 - Using the samples X1 and X2, defined in examples 2 and 3, respectively, test the
null hypothesis H0:σ1²/σ2² = 1, at a significance level of 0.10, against (a) a two-sided
hypothesis; (b) a one-sided hypothesis, H1:σ1²/σ2² > 1.
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
n1 = 5
s1 = 1.55145
n2 = 8
s2 = 3.7485
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
2
Hypothesis testing on two variances: two-sided test.
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 6.0942109
F0 = 5.837661
nuInfo = ! 7.  4. !
X2Info = ! 8.  3.7484997 !
X1Info = ! 5.  1.5514509 !
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
n1 = 5
s1 = 1.55145
n2 = 8
s2 = 3.7485
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided
2 - two-sided
1
Enter the type of one-sided alternative hypothesis to test:
1  -  H1:sigma1^2/sigma2^2>1
2  -  H1:sigma1^2/sigma2^2<1
1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 2.9605341
F0 = .1713015
nuInfo = ! 4.  7. !
X2Info = ! 8.  3.7484997 !
X1Info = ! 5.  1.5514509 !
Chi-square criteria for goodness of fitting
Function histnorm, introduced in Chapter 15, produces a histogram for a given data set, x,
based on a number of class boundaries, xclass. The function, whose call is
[chi2,cm,f] = histnorm(x,xclass)
returns vectors of class marks, cm, and frequency, f, as well as the parameter chi2
corresponding to a chi-square statistic calculated as
χ² = Σ_{i=1}^{k} (fi − fci)² / fci ,
where fi is the actual frequency count for the ith class, fci is the estimated frequency count
obtained from the normal distribution for the ith class, and k is the number of classes in the
frequency distribution.
The parameter χ2, defined above, follows the chi-square distribution with ν = k-3 degrees of
freedom, where k is the number of classes in the histogram. To produce a fitting of the
normal distribution based on a sample of size n we use not only the sample size, n, but also the
mean value,x, and the sample standard deviation, s. Thus, the number of degrees of
freedom is k-3, since three parameters are already known in the data fitting.
The idea of goodness of fitting for the normal distribution, for example, means to test the
hypothesis H0: {the data fits the normal distribution with µ = x̄ and σ = s}, tested against the
alternative hypothesis H1: {the data does not fit the normal distribution with µ = x̄ and σ = s}.
The latter is a form of one-sided alternative hypothesis. Given a significance level α, we
calculate the parameter χ2 based on the observed and predicted frequencies, and compare its
value against the parameter χ2α obtained from the chi-square distribution with k-3 degrees of
freedom. If χ2> χ2α, we reject the null hypothesis H0.
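As a minimal sketch of the decision step alone (with made-up observed and fitted class counts,
not produced by histnorm), the comparison can be carried out as follows:

//Sketch: chi-square goodness-of-fit decision from observed and fitted counts
//(hypothetical frequency vectors, for illustration only)
f  = [4 9 15 12 6 4];            //observed class frequencies
fc = [5 10 14 13 5 3];           //frequencies predicted by the fitted distribution
k  = length(f); nu = k - 3;      //degrees of freedom for a normal fit
chi2 = sum(((f-fc).^2)./fc);     //chi-square parameter
alpha = 0.05;
chi_a = cdfchi('X',nu,1-alpha,alpha);
if chi2 > chi_a then
  printf('Reject H0: the data does not fit the distribution.\n');
else
  printf('Do not reject H0.\n');
end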
Examples of goodness-of-fitting for the normal distribution
Example 1. Consider the sample x loaded below into SCILAB. Its frequency distribution is to be
obtained using the class boundaries indicated in vector xclass. The following SCILAB
commands are used to generate the test statistic for testing the hypothesis H0: {the data fits
the normal distribution with µ = x̄ and σ = s}:
-->x=[2.3,3.2,1.1,4.5,6.2,8.4,1.3,2.2,4.5,3.6,2.2,1.0];
-->min(x),max(x)
ans = 1.
ans = 8.4
-->xclass = [1:0.5:9]; k = length(xclass)-1, nu = k-3
k = 16.
nu = 13.
-->[chi2,xmark,f] = histnorm(x,xclass);
-->chi2
 chi2 = 33.263408
-->alpha = 0.1;
-->chi_a = cdfchi('X',nu,1-alpha,alpha)
chi_a = 19.811929
The results, for α = 0.10, are χ2 = 33.263408 and χ2α = 19.811929. Because χ2 > χ2α, we reject the null
hypothesis that the data in vector x belongs to a normal distribution. The histogram is shown
in the following figure.
Example 2. In this second example the data analyzed is generated from a normal distribution
using function grand. The null hypothesis H0: {the data fits the normal distribution with µ = x̄
and σ = s} is tested at a level of significance of 0.01.
-->X=grand(1,200,'nor',350,100);
-->min(X),max(X)
ans = 119.33031
ans = 655.10351
-->Xclass=[100:50:700];k=length(Xclass)-1,nu=k-3
k = 12.
nu = 9.
-->[chi2,Xmark,Xfreq] = histnorm(X,Xclass);
-->chi2
chi2 = 10.142254
-->alpha = 0.01;
-->chi_a = cdfchi('X',nu,1-alpha,alpha)
chi_a = 21.665994
The results are χ2 = 10.142254, χ2α = 21.665994. Because χ2< χ2α, we cannot reject the null
hypothesis that the data in vector x belongs to a normal distribution. The histogram is shown
in the following figure.
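For reference, the expected class counts that such a normal fit compares against can be
generated directly from the normal CDF; the sketch below reuses the sample parameters of this
example, but it is only an illustrative reconstruction and not the listing of histnorm itself
(see Chapter 15 for the actual function):

//Sketch: expected class counts under a fitted normal distribution
//(assumed sample and class boundaries, as in the example above)
x = grand(1,200,'nor',350,100);                 //hypothetical sample
xclass = [100:50:700]; k = length(xclass)-1;
n = length(x); xbar = mean(x); sx = st_deviation(x);
pk = cdfnor('PQ',xclass,xbar*ones(xclass),sx*ones(xclass));  //CDF at each class boundary
fc = n*(pk(2:k+1) - pk(1:k));                   //expected frequency in each class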
Examples of goodness-of-fitting for the beta distribution
The approach followed in function histnorm for checking the goodness-of-fit of a sample to the
normal distribution can be used, in general, for other probability distributions. The key is to
determine the parameters of the distribution based on statistics of the data. For the normal
distribution, X ~ N(µ,σ), for example, we use the parameters µ = x̄ and σ = sx. In this section
we present a function, histbeta, that can be used to check the goodness-of-fit of data sets to
the beta distribution. The beta distribution, introduced in Chapter 15, requires two
parameters, α and β, which can be obtained from a sample by making µ = x̄ and σ = sx and
solving the following two equations
µX = x̄ = α/(α + β) ,   σX² = sx² = α·β / [ (α + β + 1)(α + β)² ] .
The solution can be accomplished numerically by using SCILAB’s function fsolve. The listing of
function histbeta, incorporating such numerical solution, follows. Notice that the function
returns not only the parameter χ2 (chi2), the class mark (cmark), and the frequency count
(fcount), but also the parameters of the beta distribution α (a) and β (b).
function [a,b,chi2,cmark,fcount]=histbeta(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the beta distribution that best fit the data
//Note: the beta distribution works only for data between 0 and 1
//
//Typical call: [a,b,chi2,cmark,fcount] = histbeta(x,xclass)
//where cmark = class marks, fcount = frequency count,
//      a, b  = fitted beta parameters alpha and beta,
//      chi2  = chi-square parameter for the fitting
if min(x)<0 | max(x)>1 then
error('histbeta - sample contains data outside of [0,1]');
abort;
end;
[m n] = size(x);        //Sample size
[m nB] = size(xclass);  //Number of class boundaries
k = nB - 1;             //Number of classes
//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end
//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;
//Accumulate frequency counts
for ii = 1:n
if x(ii) < xclass(1)
fbelow = fbelow + 1;
elseif x(ii) > xclass(nB)
fabove = fabove + 1;
else
for jj = 1:k
if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1)
fcount(jj) = fcount(jj) +1;
end
end
end
end
//Calculate sample size, mean, standard deviation, and
//minimum and maximum values for the plot
nn = sum(fcount);
xbar = mean(x);
sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);
//Calculate values of a (alpha) and b (beta)
deff('[w]=ff(xx)',['ff1=xbar-xx(1)/(xx(1)+xx(2))';...
'ff2=sx^2-xx(1)*xx(2)/((xx(1)+xx(2)+1)*(xx(1)+xx(2))^2)';...
'w=[ff1;ff2]']);
xx0 = [1;1];xxs=fsolve(xx0,ff); a = xxs(1); b = xxs(2);
//Calculate predicted frequencies
pk = [];
for j = 1:k+1
pk = [pk cdfbet("PQ",xclass(j),1-xclass(j),a,b)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;
//Calculate chi-square parameter
chi2=0;
for j = 1:length(fc)
chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;
//Produce beta distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
pkk = [pkk cdfbet("PQ",xx(j),1-xx(j),a,b)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;
//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];
//plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with beta distribution','x','frequency');
//end function histbeta
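Incidentally, the two moment equations solved numerically inside histbeta also admit a
closed-form solution (a standard method-of-moments identity, not part of the original function):
α = x̄[x̄(1 − x̄)/sx² − 1] and β = (1 − x̄)[x̄(1 − x̄)/sx² − 1]. A minimal sketch, with assumed
sample moments:

//Sketch: closed-form method-of-moments estimates of the beta parameters
//(assumed sample mean and standard deviation, for illustration only)
xbar = 0.35; sx = 0.15;
c = xbar*(1-xbar)/sx^2 - 1;    //common factor in both estimates
a = xbar*c;                    //alpha estimate
b = (1-xbar)*c;                //beta estimate
printf('alpha = %g, beta = %g\n',a,b);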
For this function we would be testing the hypothesis H0: {the data fits the beta distribution},
tested against the alternative hypothesis H1: {the data does not fit the beta distribution}.
As with the test of the normal distribution testing, given a significance level α, we calculate
the parameter χ2 based on the observed and predicted frequencies, and compare its value
against the parameter χ2α obtained from the chi-square distribution with k-3 degrees of
freedom. If χ2> χ2α, we reject the null hypothesis H0. Please notice that the beta distribution
is used for data whose values are between 0 and 1 only.
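When a sample contains values outside [0,1], it can be mapped into that interval before calling
histbeta; the linear rescaling below is one possible choice (a sketch with a hypothetical data
vector, assuming histbeta has already been loaded with getf):

//Sketch: rescale an arbitrary sample into [0,1] before fitting a beta distribution
x = [12.3 15.6 9.8 20.1 17.4 11.2 14.8 13.9];   //hypothetical data
y = (x - min(x))/(max(x) - min(x));             //now 0 <= y <= 1
yclass = [0:0.25:1];                            //class boundaries for the rescaled data
[a,b,chi2,cm,f] = histbeta(y,yclass);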
Example 1. Data from a uniform distribution
-->rand('info')
ans = uniform
-->X=rand(1,25);min(X),max(X)
ans = .0002211
ans = .9329616
-->Xclass=[0:0.1:1];k=length(Xclass)-1, nu = k - 3
k = 10.
nu = 7.
-->[a,b,chi2,Xmark,Xfreq]=histbeta(X,Xclass)
Xfreq = ! 2.  1.  5.  4.  0.  2.  5.  2.  3.  1. !
Xmark = ! .05  .15  .25  .35  .45  .55  .65  .75  .85  .95 !
chi2 = 9.6996796
b = 1.127886
a = 1.0598387
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 18.475307
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 14.06714
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 12.017037
At confidence levels of α = 0.01, 0.05, and 0.10, we cannot reject the null hypothesis that the
data fits a beta distribution.
Example 2. Data generated from a normal distribution. Data X is generated from a normal
distribution. Data Y is obtained from X so that values of Y to are between 0 and 1.
-->rand('normal')
-->X = rand(1,100);min(X), max(X)
ans = - 2.0552251
ans = 1.9347752
-->Y = (X-min(X))/(max(X)-min(X));min(Y),max(Y)
ans = 0.
ans = 1.
-->Yclass = [0:0.1:1];k = length(Yclass)-1, nu = k - 3
k = 10.
nu = 7.
-->[a,b,chi2,Ymark,Yfreq]=histbeta(Y,Yclass)
Yfreq = ! 1.  4.  12.  10.  21.  17.  12.  10.  6.  6. !
Ymark = ! .05  .15  .25  .35  .45  .55  .65  .75  .85  .95 !
chi2 = 8.9939089
b = 1.9850001
a = 2.2248757
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 18.475307
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 14.06714
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 12.017037
As in Example 1, at confidence levels of α = 0.01, 0.05, and 0.10, we cannot reject the null
hypothesis that the data fits a beta distribution.
Example 3 - Data generated from a beta distribution with α = 0.5 and β = 6.
-->X = grand(1,50,'bet',0.5,6);min(X),max(X)
ans = .0001030
ans = .3289092
-->Xclass = [0:0.05:0.35]; k = length(Xclass)-1, nu = k-3
k = 6.
nu = 3.
-->[a,b,chi2,Xmark,Xfreq]=histbeta(X,Xclass)
Xfreq = ! 25.  10.  5.  3.  3.  3. !
Xmark = ! .025  .075  .125  .175  .225  .275 !
chi2 = 3.4613321
b = 8.0386858
a = .6906572
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 11.344867
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 7.8147279
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 6.2513886
As expected, at confidence levels of α = 0.01, 0.05, and 0.10, we cannot reject the null
hypothesis that the data fits a beta distribution.
The three examples above have produced goodness-of-fit results that indicate that we should
not reject the hypothesis that the data belongs to a beta distribution. These results indicate
the versatility of the distribution and the variety of shapes it can fit by using the values of α
and β obtained from the set of equations
µX = x̄ = α/(α + β) ,   σX² = sx² = α·β / [ (α + β + 1)(α + β)² ] .
On the other hand, if we force the values of α and β, rather than using those from the two
equations above, we may find situations where the hypothesis of the data fitting the required
beta distribution must be rejected. To try such cases we modify function histbeta to create
function histbeta1 which requires that the values of α and β be given by the user:
function [chi2,cmark,fcount]=histbeta1(x,xclass,a,b)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the beta distribution that best fit the data
//Note: the beta distribution works only for data between 0 and 1
//
//Typical call: [chi2,cmark,fcount] = histbeta1(x,xclass,a,b)
//where cmark = class marks, fcount = frequency count,
//      a, b  = given beta parameters alpha and beta,
//      chi2  = chi-square parameter for the fitting
if min(x)<0 | max(x)>1 then
error('histbeta - sample contains data outside of [0,1]');
abort;
end;
[m n] = size(x);        //Sample size
[m nB] = size(xclass);  //Number of class boundaries
k = nB - 1;             //Number of classes
//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end
//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;
//Accumulate frequency counts
for ii = 1:n
if x(ii) < xclass(1)
fbelow = fbelow + 1;
elseif x(ii) > xclass(nB)
fabove = fabove + 1;
else
for jj = 1:k
if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1)
fcount(jj) = fcount(jj) +1;
end
end
end
end
//Calculate sample size, mean, standard deviation, and
//minimum and maximum values for the plot
nn = sum(fcount);
xbar = mean(x);
sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);
//Calculate predicted frequencies
pk = [];
for j = 1:k+1
pk = [pk cdfbet("PQ",xclass(j),1-xclass(j),a,b)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;
//Calculate chi-square parameter
chi2=0;
for j = 1:length(fc)
chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;
//Produce beta distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
pkk = [pkk cdfbet("PQ",xx(j),1-xx(j),a,b)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;
//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];
//plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with beta distribution','x','frequency');
//end function histbeta1
Applying function histbeta1 to the data of Example 3 shown above, with α = 1 and β = 5,
produces the following results:
-->[chi2,Xmark,Xfreq]=histbeta1(X,Xclass,1,5)
Xfreq = ! 25.  10.  5.  3.  3.  3. !
Xmark = ! .025  .075  .125  .175  .225  .275 !
chi2 = 28.784445
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 11.344867
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 7.8147279
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 6.2513886
We will reject the null hypothesis that the data follows the beta distribution with α = 1 and β =
5, under levels of significance α = 0.01, 0.05, or 0.10. [Note: be careful distinguishing the level
of confidence α from the beta distribution parameter α]. A plot of the sample histogram and
the fitted data is shown below.
Chi-square criteria for R×C tables
The terms R×C tables refers to tables summarizing frequency counts of observations classified
according to two different criteria. For example, the following table summarizes the
frequency counts of slope damages due to rain summarized by a geotechnical engineer. The
slope damages are classified according to the depth of erosion into three categories (0-2.5 cm,
2.5-5.0 cm, > 5.0 cm) or according to the percentage of vegetation cover into four categories
(0-25%, 25%-50%, 50%-75%, and 75%-100%).
                               vegetation cover
erosion depth        0-25%   25%-50%   50%-75%   75%-100%   Totals
0-2.5 cm               17       23        15          6        61
2.5 cm - 5.0 cm        24       13        40         10        87
> 5.0 cm               20       10        20          5        55
Totals                 61       46        75         21       203
If the vegetation cover and erosion depth criteria are independent, the expected frequency
counts for each of the cells in the table can be calculated by multiplying the corresponding row
total times the corresponding column total and dividing by the overall total (203, in this case).
For example, the expected frequency count for erosion depth of 2.5 cm - 5.0 cm and
vegetation cover of 50%-75% will be 87× 75/203 = 32.14. This procedure follows from the
calculation of probabilities for independent events, i.e., if A = { erosion depth of 2.5 cm - 5.0
cm } and B = { vegetation cover of 50%-75% }, P(A) = 87/203, P(B) = 75/203, and P(A∩B) =
P(A)P(B) = (87/203)(75/203) = 87×75/203². Since a probability represents a relative frequency,
the actual frequency count will be the probability multiplied by the total number of
occurrences (i.e., 203) to produce (87×75/203²)×203 = 87×75/203 = 32.14.
The chi-square criteria can be used to determine how well the predicted frequency counts fcij
approximate the measured frequency counts fij. The chi-square statistic to be used is
χ² = Σ_{i=1}^{n} Σ_{j=1}^{m} (fij − fcij)² / fcij ,
for i=1,2,…,n (rows in the table) and j=1,2,…,m (columns in the table). The parameter thus
defined will follow the chi-square distribution with ν = (n-1)⋅(m-1) degrees of freedom.
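Note that the full matrix of expected counts can also be obtained in a single step as the outer
product of the row and column totals divided by the grand total; a minimal sketch using the
observed counts from the table above:

//Sketch: expected R x C frequencies as an outer product of the marginal totals
fObs = [17,23,15,6;24,13,40,10;20,10,20,5];   //observed counts from the table above
TR = sum(fObs,'c');                           //row totals (column vector)
TC = sum(fObs,'r');                           //column totals (row vector)
TT = sum(fObs);                               //grand total
fPred = TR*TC/TT;                             //expected counts under independence
chi2 = sum(((fObs-fPred).^2)./fPred);         //chi-square statistic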
The following function, RC, calculates the predicted frequency count, the chi-square statistics,
the degrees of freedom, and provides a recommendation regarding the rejection or not
rejection of the null hypothesis H0:{criteria for the R×C table are independent}. A listing of
the function follows:
function [nu,chi_a,chi2,fPred] = RC(fObs,alpha)
//Determines the chi-square statistic for an RxC table
//passed on to the function as a matrix fObs. The
//function calculates the predicted frequency counts also.
TR = sum(fObs,'c'); TC = sum(fObs,'r'); TT = sum(fObs);
[n m] = size(fObs); fPred = zeros(fObs);
chi2 = 0.0;
for i = 1:n
for j = 1:m
fPred(i,j) = TR(i)*TC(j)/TT;
chi2 = chi2 + (fObs(i,j)-fPred(i,j))^2/fPred(i,j);
end;
end;
nu = (n-1)*(m-1);
chi_a=cdfchi('X',nu,1-alpha,alpha);
if chi2 > chi_a then
printf('Reject the null hypothesis H0:independent criteria.')
else
printf('Do not reject the null hypothesis H0:independent criteria')
end;
As an example, we will use the R×C table presented earlier to check the hypothesis of
independence of the classification criteria:
-->fObs = [17,23,15,6;24,13,40,10;20,10,20,5]
fObs =
!  17.   23.   15.    6. !
!  24.   13.   40.   10. !
!  20.   10.   20.    5. !
-->[nu,chi_a,chi2,fPred]=RC(fObs,0.1)
Reject the null hypothesis H0:independent criteria
fPred =
!  18.330049   13.82266    22.536946   6.3103448 !
!  26.142857   19.714286   32.142857   9.        !
!  16.527094   12.463054   20.320197   5.6896552 !
chi2  = 14.524802
chi_a = 10.644641
nu    = 6.
Exercises
[1]. A sample of 150 electric components is tested for temperature response by measuring the
temperature of the component, X, after 10 minutes of operation. The mean value of the
temperature for the sample is found to be x̄ = 86°F with a standard deviation of sx = 5.5°F. The
records from the factory indicate that the standard deviation of the 10-minute temperature
measurements for the entire population of electric components is σ = 10°F. Obtain confidence
intervals for the mean value of the 10-minute temperature measurement for the population of
electric components, µ, using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[2]. A sample of 10 electric bulbs is used to determine the number of on-off cycles to produce
failure of the filament. The data shows a mean value of 1500 cycles with a standard deviation
of 50 cycles. Obtain confidence intervals for the mean number of cycles to failure for the
population of electric bulbs, µ, using levels of confidence of (a) α = 0.01, (b) α = 0.05, and
(c) α = 0.10.
[3]. Records of traffic accidents in the main road through a small town shows that in the last
300 days there have been 10 days where a major closure of the road has been registered.
Obtain confidence intervals for the proportion of days of closure of the road using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[4]. Measurements of the specific density in 10 soil samples show values of 1.25, 1.30, 1.45,
1.55, 1.20, 1.23, 1.90, 1.40, 1.35, 1.40. Obtain confidence intervals for the mean value of the
specific density using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[5]. In order to determine the need to keep a service station open after the regular closing
hour of 5:00 pm, a test is carried out in which the station is kept open for an extra hour for 20
business days. The number of clients visiting the service station after 5 pm during those 20
days are the following:
2 3 5 6 3 2 1 0 3 4 2 5 7 8 7 6 5 8 2 3
Obtain confidence intervals for the mean value of the number of clients visiting the service
station after 5:00 pm using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[6]. In reference to problem [5], a successful day is one in which 4 or more clients show up at
the service station after 5:00 pm. Obtain confidence intervals for the proportion of successful
days using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[7]. A routine construction project consists of two stages. If T1 represents the time required
to complete stage 1 and T2 the time required to complete stage 2, determine the confidence
interval for the total construction time if for samples of sizes n1 = 10 and n2 = 8, the mean
completion times aret1 = 25 days andt2 = 40 days, with standard deviations s1 = 3 days and s2
= 5 days . Use levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[8]. To put together a mechanical component a factory needs to produce two separate
metallic pieces that then get assembled together. The time required to finish the first piece
has a mean value of t1 = 12 minutes, while the second piece requires an average of t2 = 14.0
minutes. The corresponding standard deviations are deviations s1 = 3 days and s2 = 5 days. The
sample sizes used in the measurements are n1 = 50 and n2 = 45. Of interest for optimizing the
operation of the factory is the difference between the times of completion of the two pieces,
i.e., T1-T2. Obtain confidence intervals for the time difference using levels of confidence of (a)
α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[9]. Obtain confidence intervals for the variance σ2 for the data of problem [1] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[10]. Obtain confidence intervals for the variance σ2 for the data of problem [2] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[11]. Obtain confidence intervals for the variance σ2 for the data of problem [3] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[12]. Obtain confidence intervals for the variance σ2 for the data of problem [4] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[13]. Obtain confidence intervals for the variance σ2 for the data of problem [5] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[14]. Obtain confidence intervals for the variance σ2 for the data of problem [6] using levels of
confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[15]. Obtain confidence intervals for the variance σ2 for each of the standard deviations in
problem [7] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[16]. Obtain confidence intervals for the variance σ2 for each of the standard deviations in
problem [8] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[17]. A sample of 20 measurements of the density of a liquid indicates a mean value of x̄ =
3.25 mg/l with a standard deviation of sx = 0.25 mg/l. The manufacturer of the liquid claims
that the mean density of the population is µ = 3.30 mg/l. Should the manufacturer’s claim be
rejected at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α = 0.10? {Note: the
alternative hypothesis is µ ≠ 3.30 mg/l}.
[18]. A more detailed study for the liquid density of problem [17] includes measurements in
300 liquid samples. This study reveals that the mean value of the 300 samples is x̄ = 3.35
mg/l with a standard deviation of sx = 0.15 mg/l. Should the manufacturer’s claim be rejected
at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α = 0.10, based on the new evidence?
[19]. Tests of a new pavement are conducted by measuring the time required for different
cars to stop after reaching speeds of 35 mph. The following stop times are recorded for 12 car
tests:
12.5 s, 24.3 s, 18.7 s, 15.6 s, 18.2 s, 12.4 s , 23.2 s, 40.3 s, 18.2 s, 19.3 s, 15.4 s, 14.4 s
Test the claim that the mean stop time is 15.5 s against the alternative hypothesis that the actual
mean stop time is larger than 15.5 s at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α =
0.10.
[20]. Twenty specimens of light-weight concrete taken from a concrete manufacturer indicate
that, in general, the standard deviation of the concrete density is 150 kg/m3. A sample of 20
concrete cubes shows a mean value of 1200 kg/m3 with a standard deviation of 100 kg/m3.
Test the null hypothesis that the mean value of the population of concrete densities is 1250
kg/m3 against the alternative hypothesis that the mean value of the population of concrete
densities is less than 1250 kg/m3.
[21]. A remote sensing device that measures the potential evapotranspiration of crops is
purported to produce accurate results 90% of the time. To check this claim, 10 crop test sites
are instrumented in the ground for measuring evapotranspiration and overflights with the
remove sensing device are scheduled to compare measurements. It is found that the remote
sensing device produced accurate measurements, as compared with the ground-based
measurements, in 7 out of the 10 measurements performed. Should we reject the assertion
that the remote sensing device is accurate 90% of the time based on these data, against an
alternative hypothesis that the device is accurate less than 90% of the data? Use confidence
levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[22]. A monitoring site in a small stream is checked daily to verify that the levels of a certain
contaminant produced by a farm operation are kept below the allowed limit.
The local
regulating agency records indicate that the levels of the contaminant were in violation of
regulations for 5 out of the last 35 days. Test the hypothesis that the farm operation violates
the regulations only 10% of the time against the alternative hypothesis that the farm operation
violates the regulations more than 10% of the time. Use confidence levels of (a) α = 0.01, (b) α
= 0.05, and (c) α = 0.10.
[23]. A new regulation regarding the amount of ozone produced by cars is being considered. A
sample of 40 cars is picked up at random for testing, and it is found that 12 of those cars
produce larger ozone concentrations than is considered safe by the local regulating agency.
The agency hypothesizes that 20% of the cars currently on the road produce excessive amounts
of ozone. Based on the data described above, should we accept this hypothesis against the
alternative hypothesis that the proportion of cars in violation of the ozone levels is not 20%?
Use confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[24]. Traffic studies are performed at two intersections of a city to determine whether a new
turn-signal is necessary. The study is aimed at determining the average number of left turns at
each intersection during a selected period of 1 hour. Intersection 1 is monitored through 20
consecutive days for 1 hour showing an average of 12.6 cars turning left with a standard
deviation of 3.2 cars. Intersection 2 is monitored through 10 consecutive days at the same 1-hour
period and it shows an average of 10.8 cars turning left with a standard deviation of 2.5
cars. It is hypothesized that the difference in the mean values of the populations of cars
turning left at intersections 1 and 2 is 2, i.e., H0:µ1 - µ2 = 2. Test this null hypothesis against
the alternative hypothesis H1:µ1 - µ2 ≠ 2, using confidence levels of (a) α = 0.01, (b) α = 0.05,
and (c) α = 0.10. Assume that the variances of the populations of left-turning cars in each
intersection are unknown and unequal.
[25]. Precision parts are delivered from two factories that use the same type of machine for
manufacturing the parts. A sample of 100 parts from factory number 1 shows an average
length of 12.5 cm with a standard deviation of 0.25 cm, while a sample of 50 parts from factory
number 2 shows an average length of 11.9 cm with a standard deviation of 0.30 cm. Test the
null hypothesis H0:µ1 - µ2 = 0 against the alternative hypothesis H1:µ1 - µ2 > 0, using confidence
levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the standard deviation of the
populations are unknown but equal.
[26]. Measurements of the chlorine levels out of 11 sample bottles from site number 1 indicate
an average of 2.5 mg/l with a standard deviation of 0.2 mg/l. In a neighboring site (site
number 2) the following values of chlorine concentrations are measured (in mg/l):
1.2 4.3 2.5 3.2 1.7 2.8 4.5 6.3 7.2
Test the null hypothesis H0:µ1 - µ2 = 0 against the alternative hypothesis H1:µ1 - µ2 < 0, using
confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the variances of
the population of chlorine concentrations are unknown and unequal.
[27]. The manager of a service station is trying to estimate the average time of service for a
particular task. He monitors the service time at two specific periods during the day obtaining
the following data (in minutes):
•
•
For period number 1: 5.0 12.5 15.0 8.5 6.2 7.8 11.4 12.5 10.0 9.2 8.7 11.2
For period number 2: 8.2 7.5 6.7 9.0 11.2 14.3 8.7 6.3 9.2
Test the null hypothesis H0:µ1 - µ2 = 0 against the alternative hypothesis H1:µ1 - µ2 < 0, using
confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the variances of
the populations of service times are unknown but equal.
[28]. The traffic study of problem [24] is repeated with the purpose of determining the
proportion of cars through each intersection that perform a left turn during the selected
period. A sample from intersection 1 indicates that out of 450 cars counted, 60 made a left
turn during the period of study. On the other hand, at intersection 2, it was determined that
50 out of 300 cars made a left turn. Let p1 and p2 represent the proportions of the population
of cars making left turns at intersections 1 and 2, respectively. Test the null hypothesis
H0:p1-p2 = 0.2 against the alternative hypothesis H1:p1-p2 ≠ 0.2, using confidence levels of (a) α =
0.01, (b) α = 0.05, and (c) α = 0.10.
[29]. A computer manufacturer is testing the proportion of defective chips received from two
different factories.
A sample of 1000 chips from factory number 1 shows a total of 25
defective chips, while a sample of 300 chips from factory number 2 shows a total of 10
defective chips. Let p1 and p2 represent the proportions of the population of defective chips
from factories 1 and 2, respectively.
Test the null hypothesis H0:p1-p2 = 0 against the
alternative hypothesis H1:p1-p2 ≠ 0, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c)
α = 0.10.
[30]. Plot the characteristic and power curves for the hypotheses tests of problems [13]
through [23] using a suitable range of population mean values µ and a confidence level of 0.05.
[31]. The standard deviation of a sample of 25 measurements of soil density is found to be 25
kg/m3. Let σ2 be the variance of the population of soil specimens from which the sample was
taken. Test the null hypothesis H0: σ2 = 30 kg/m3, against the alternative hypothesis H1: σ2 ≠
30 kg/m3, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[32]. The standard deviation of a sample of 25 measurements of soil density is found to be 25
kg/m3. Let σ2 be the variance of the population of soil specimens from which the sample was
taken. Test the null hypothesis H0: σ2 = 900 kg2/m6, against the alternative hypothesis H1: σ2 ≠
900 kg2/m6, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[33]. Repeat the hypothesis test of problem [32] if the alternative hypothesis is H1: σ2 < 900
kg2/m6.
[35]. The following data set represents measurements of hydrocarbon concentration (mg/l)
out of specimens taken from wells in a contaminated site:
3.5 5.6 2.3 4.5 8.5 2.3 4.5 1.2 5.6 3.2
Let σ2 be the variance of the population of water specimens from which the sample was taken.
Test the null hypothesis H0: σ2 = 30 kg/m3, against the alternative hypothesis H1: σ2 ≠ 30 kg/m3,
using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.
[36]. Two samples of car speeds are taken at a selected site of a major highway. The first
sample, consisting of 40 measurements, shows a standard deviation of 10 mph while the second
sample, consisting of 20 measurements, shows a standard deviation of 5 mph. Test the null
hypothesis H0: σ12/σ22 = 1 against the alternative hypothesis (a) H1: σ12/σ22 < 1, (b) H1: σ12/σ22 >
1, and, (c) H1: σ12/σ22 ≠ 1, for a confidence level of α = 0.01.
[37]. Laboratory tests are performed on a batch of water specimens to detect the
concentration of coliforms producing the following results (mg/l):
0.10 0.20 0.35 0.15 0.25 0.05
0.23 0.35
0.42
Refer to this data set as sample 1. A previous batch of 5 specimens, taken the previous day
(sample 2), showed a mean value of coliform concentration of 0.20 mg/l with a standard
deviation of 0.05 mg/l. Test the null hypothesis H0: σ12/σ22 = 1 against the alternative
hypothesis (a) H1: σ12/σ22 < 1, (b) H1: σ12/σ22 > 1, and, (c) H1: σ12/σ22 ≠ 1, for a confidence level
of α = 0.05.
[38]. Two batches of erosion control tests are performed to determine the effectiveness of a
new type of hydromulch in controlling erosion at construction sites. The reported rates of
erosion (lb/acre/hr) for the two batches are:
batch 1 = { 175 276 280 125 456 235 172 180 235 }
batch 2 = { 150 175 350 120 275 178 200 }
Test the null hypothesis H0: σ12/σ22 = 1 against the alternative hypothesis (a) H1: σ12/σ22 < 1, (b)
H1: σ12/σ22 > 1, and, (c) H1: σ12/σ22 ≠ 1, for a confidence level of α = 0.05.
[39]. The following data set represents the diameter in mm of a sample of sand grains.
6.81  3.51  5.02  3.95  4.24  3.57  5.00  4.59  5.39  5.39
2.66  2.64  4.33  3.50  2.81  3.42  2.41  5.86  3.11  4.14
3.78  2.96  2.53  6.18  2.35  5.45  2.96  4.12  4.54  3.44
4.29  4.12  4.30  5.23  3.44  3.61  3.47  2.57  6.47  3.86
3.66  1.16  4.83  4.33  4.29  5.66  4.67  5.11  3.65  3.58
4.00  3.41  2.58  3.20  5.08  3.83  3.47  4.21  3.36  3.43
(a) Use user-defined function histnorm to check the hypothesis that the data follows the
normal distribution if the data is grouped into 10 classes of the same width at a level of
confidence α = 0.05.
(b) Use user-defined function histbeta to check the hypothesis that the data follows the
beta distribution if the data is grouped into 10 classes of the same width at a level of
confidence α = 0.10.
[40]. Write a SCILAB function, along the lines of functions histnorm and histbeta that produces
the chi-square parameter needed to test the goodness-of-fit of a vector of data for the Weibull
distribution.
Use this function to check the hypothesis that the data from problem [39]
follows the Weibull distribution at a level of confidence α = 0.05.
[41]. Write a SCILAB function, along the lines of functions histnorm and histbeta, that
produces the chi-square parameter needed to test the goodness-of-fit of a vector of data for
the exponential distribution. Use this function to check the hypothesis that the data from
problem [39] follows the exponential distribution at a significance level of α = 0.05.
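Similarly, a sketch of the exponential CDF (the helper name and parameter name are assumptions)
could be used to build the expected class frequencies for this problem:

deff('[F]=cdfexpo(x,mu)','F = 1 - exp(-x./mu)')   // exponential CDF with mean mu
// mu can be estimated by mean(d), the sample average of the data of problem [39]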
[42]. Samples of a particular species of fish are taken out of four fishing ponds and tested for
whirling disease. The table below summarizes the number of fish that tested positive and
negative for the disease in the four ponds. Use user-defined function RC to test the hypothesis
that the two criteria for classification in the table, namely, pond of origin and test result, are
independent at a significance level α = 0.05.
Location    Positive    Negative
Pond 1         122         289
Pond 2          11          26
Pond 3           3           8
Pond 4          28          33
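The user-defined RC function requested in the problem should be used for the test; as a rough
cross-check, however, the chi-square statistic for this table can also be computed with a few
direct SCILAB statements. The sketch below uses arbitrary variable names and built-in functions
only.

O = [122 289; 11 26; 3 8; 28 33];      // observed counts (rows = ponds, columns = positive/negative)
rowT = sum(O,'c');                      // row totals (column vector)
colT = sum(O,'r');                      // column totals (row vector)
N = sum(O);                             // grand total
E = rowT*colT/N;                        // expected counts under independence
chi2 = sum(((O-E).^2)./E);              // chi-square statistic
nu = (size(O,1)-1)*(size(O,2)-1);       // degrees of freedom = (R-1)(C-1)
chi2crit = cdfchi("X",nu,0.95,0.05);    // critical value for alpha = 0.05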
[43]. The following table is based on a number of laboratory tests on soil samples. The
samples are classified according to their typical grain size as sand, silt, or clay. Tests are then
performed in a rainfall simulator and the soil samples are classified according to a high,
medium, or low erosion potential. Use user-defined function RC to test the hypothesis that
the two criteria for classification in the table, namely, soil type and erosion potential, are
independent at a significance level α = 0.10.
              Erosion potential
Soil type     Low    Medium    High
Sand           45      12        8
Silt           23      10       10
Clay           12       5        2