Statistical Inference with SCILAB

By Gilberto E. Urroz, Ph.D., P.E.

Distributed by infoClearinghouse.com

©2001 Gilberto E. Urroz
All Rights Reserved

A "zip" file containing all of the programs in this document (and other SCILAB documents at InfoClearinghouse.com) can be downloaded at the following site:

http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip

The author's SCILAB web page can be accessed at:

http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html

Please report any errors in this document to: [email protected]

STATISTICAL INFERENCE
  Definitions
  Estimation of Confidence Intervals
    Sampling distribution of the mean
    Examples of confidence intervals for the mean
    Confidence interval for proportions
    Example of proportion confidence interval for a large sample
    Example of proportion confidence interval for a small sample
    Sampling Distribution of Differences and Sums of Statistics
    Confidence intervals
    Interval Estimation for the Variance
  Hypothesis Testing
    Procedure for hypothesis testing
    Errors in hypothesis testing
    Power of hypothesis testing
    Selecting the values of α and β
  Hypothesis testing involving mean values
    Hypothesis testing on one mean
      Case I: Knowing σ, or large sample if σ unknown
      Case II: Small sample with unknown σ
    A function to perform hypothesis testing on one mean
    Examples of hypothesis testing on one mean
    Hypothesis testing on one proportion
    Examples of hypothesis testing on one proportion
  Hypothesis testing on two means
    Testing the difference between two means using known variances
    Testing the differences between two means when the variances are unknown but equal
    Testing the difference between two means when the variances are unknown and unequal
    A user-defined SCILAB function for hypothesis testing on two means
    Examples of application of function htestmu2
  Testing the difference between two proportions
    A function for hypothesis testing on two proportions
    Examples of application of function htestmu2
  Characteristic and power equations
  Hypothesis testing on one variance
    A function for hypothesis testing on one variance
    Examples of application of function htestsigma1
  Hypothesis testing on two variances
    A function for hypothesis testing with two variances
    Examples of application of function htestsigma2
  Chi-square criteria for goodness of fitting
    Examples of goodness-of-fitting for the normal distribution
    Examples of goodness-of-fitting for the beta distribution
  Chi-square criteria for R×C tables
  Exercises
REFERENCES (FOR ALL SCILAB DOCUMENTS AT INFOCLEARINGHOUSE.COM)

Download at InfoClearinghouse.com © 2001 Gilberto E. Urroz All rights reserved

Statistical Inference

Statistical inference involves the analysis of estimators of population parameters based on the statistics of samples, as well as the testing of hypotheses about those parameters. In this chapter we define point estimators and learn how to produce confidence intervals about those point estimators. We also introduce hypothesis testing on one or two means, and on one or two variances. Finally, we present some applications of the Chi-square distribution for statistical inference.

Definitions

A population constitutes the collection of all conceivable results of a random process, while a sample is a subset of a population. Typically, it is very difficult or impractical to evaluate the entire population for a given parameter. Therefore, we select one or more samples out of the population to analyze. In order for the sample to be representative of the population, it must be random, i.e., each element of the population should have the same probability of being chosen.
If this condition is not fulfilled, the sample is said to be biased, and the information obtained from such a sample will most likely be useless in estimating population parameters.

In Chapter … we introduced the concept of random variables and their probability distributions. A measurement on a given population follows a given probability distribution. If the distribution depends on a parameter θ, a random sample of observations { X1, X2, …, Xn } of size n can be used to estimate θ. Each observation X1, X2, …, Xn represents a random variable. The joint probability distribution of the n observations is referred to as a sampling distribution.

A statistic of a sample is a function of the observations that does not contain any unknown parameter, e.g., the mean of the sample. Statistics of a sample provide a means of estimating parameters of the population from which the sample originated. Thus, a single value of a given sample statistic, say $\hat{\theta}$, constitutes a point estimator of the corresponding population parameter, θ. A confidence interval is an interval that contains the parameter θ at a certain level of probability.

Estimation of Confidence Intervals

A confidence interval is determined by two statistics, Cl and Cu, which define an interval containing the parameter θ with a certain level of probability. The end points of the interval are known as confidence limits, and the interval (Cl,Cu) is known as the confidence interval.

Let (Cl,Cu) be a confidence interval containing an unknown parameter θ. The confidence level or confidence coefficient is the quantity (1 - α), where 0 < α < 1, such that

Pr[ Cl < θ < Cu ] = 1 - α.

This relationship defines two-sided confidence limits. A lower one-sided confidence interval is defined by

Pr[ Cl < θ ] = 1 - α.

An upper one-sided confidence interval is defined by

Pr[ θ < Cu ] = 1 - α.

Typical values of α are 0.01, 0.05, and 0.1, corresponding to confidence levels of 0.99, 0.95, and 0.90, respectively.
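As a rough illustration of how the choice of α sets the width of a two-sided interval, the sketch below tabulates the standard normal value that is exceeded with probability α/2 for the three typical values of α. It assumes, for illustration only, that the statistic of interest is standard normal, and it calls SCILAB's cdfnor in the same way as the examples in this chapter. The variable names (alphas, z) are hypothetical.

```scilab
// Sketch: standard normal critical values for the typical values of alpha.
// Assumption (illustration only): the statistic follows N(0,1).
alphas = [0.01 0.05 0.10];      // typical significance levels
z = zeros(1, 3);                // storage for the critical values
for i = 1:length(alphas)
    a = alphas(i);
    // value exceeded with probability a/2, i.e., where the CDF equals 1 - a/2
    z(i) = cdfnor("X", 0, 1, 1 - a/2, a/2);
    mprintf("alpha = %4.2f  confidence level = %4.2f  z = %f\n", a, 1 - a, z(i));
end
```

A smaller α (a higher confidence level) gives a larger critical value and therefore a wider interval.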
Sampling distribution of the mean

Let $\bar{X}$ be the mean of a random sample of size n drawn from a population with known standard deviation σ. The 100(1-α)% [i.e., 99%, 95%, 90%, etc.] central two-sided confidence interval for the population mean µ is

$$\left( \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right),$$

where $z_{\alpha/2}$ is a standard normal variate that is exceeded with a probability of α/2. The location of the value $z_{\alpha/2}$ is illustrated in the figure below using the plot of the standard normal probability density function.

[Figure: standard normal probability density function showing the location of $z_{\alpha/2}$.]

The standard error of the sample mean, $\bar{X}$, is $\sigma/\sqrt{n}$. The one-sided upper and lower 100(1-α)% confidence limits for the population mean µ are, respectively:

$$\bar{X} + z_{\alpha}\,\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \bar{X} - z_{\alpha}\,\frac{\sigma}{\sqrt{n}}.$$

The previous result assumes that the standard deviation of the population, σ, is known. If the population standard deviation is not known, the standardized sample mean, $(\bar{X} - \mu)/(S/\sqrt{n})$, follows the Student's t distribution with ν = n − 1 degrees of freedom, where n is the size of a random sample. If n > 30, the Student t distribution can be approximated by the standard normal distribution, φ(z). A sample with size n > 30 is called a large sample.

Let $\bar{X}$ and S be the mean and standard deviation of a random sample of size n drawn from a population that follows the normal distribution with unknown standard deviation σ. The 100(1-α)% [i.e., 99%, 95%, 90%, etc.] central two-sided confidence interval for the population mean µ is

$$\left( \bar{X} - t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}},\; \bar{X} + t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}} \right),$$

where $t_{n-1,\alpha/2}$ is Student's t variate with n − 1 degrees of freedom and probability α/2 of exceedance. The one-sided upper and lower 100(1-α)% confidence limits for the population mean µ are, respectively, as follows:

$$\bar{X} + t_{n-1,\alpha}\,\frac{S}{\sqrt{n}} \quad \text{and} \quad \bar{X} - t_{n-1,\alpha}\,\frac{S}{\sqrt{n}}.$$

Examples of confidence intervals for the mean

Example 1 - Known population variance.
A sample of 25 fuses is used to determine the electric current at which the fuse fails. The average current for the 25 fuses is calculated to be 180.5 mA. If the sample is known to come from a factory such that the standard deviation of the current at failure point is 5 mA, determine the 95% confidence interval for the mean value of the electric current.

The data provided translate to n = 25, $\bar{X}$ = 180.5, σ = 5, α = 0.05. To calculate the confidence interval we need to find $z_{\alpha/2}$ from P(Z > $z_{\alpha/2}$) = α/2, or P(Z < $z_{\alpha/2}$) = 1 - α/2, where Z ~ Normal(0,1), i.e., the standard normal distribution.

To calculate $z_{\alpha/2}$ we can use SCILAB's own cdfnor function with the following call:

z_alpha_2 = cdfnor("X",0,1,1-alpha/2,alpha/2)

or, if you have the statistical toolbox STIXBOX available, you can use:

z_alpha_2 = qnorm(1-alpha/2,0,1)

The SCILAB calculations for this problem proceed as follows:

-->n=25;xbar=180.5;sigma=5;alpha=0.05;
-->z_alpha_2=cdfnor('X',0,1,1-alpha/2,alpha/2)
 z_alpha_2 = 1.959964
-->CL=xbar-z_alpha_2*sigma/sqrt(n)
 CL = 178.54004
-->CU=xbar+z_alpha_2*sigma/sqrt(n)
 CU = 182.45996

Alternatively, $z_{\alpha/2}$ can be calculated using STIXBOX's function qnorm:

-->z_alpha_2 = qnorm(1-alpha/2,0,1)
 z_alpha_2 = 1.959964

Example 2 - Small sample with unknown population variance.

A sample of 10 carbon composite cylinders indicates that the mean value of the carbon content in each cylinder is 0.65, with a sample standard deviation of 0.05. Determine the 90% confidence interval for the carbon content.

The data provided are interpreted as follows: n = 10, $\bar{X}$ = 0.65, s = 0.05, α = 0.10. To calculate the confidence interval we need to find $t_{n-1,\alpha/2}$ from P(T > $t_{n-1,\alpha/2}$) = α/2, or P(T < $t_{n-1,\alpha/2}$) = 1 - α/2, where T ~ Student t(ν = n-1), i.e., the Student t distribution with ν = n-1 degrees of freedom.
To calculate $t_{n-1,\alpha/2}$ we can use SCILAB's own cdft function with the following call:

t_alpha_2 = cdft("T",n-1,1-alpha/2,alpha/2)

or, if you have the statistical toolbox STIXBOX available, you can use:

t_alpha_2 = qt(1-alpha/2,n-1)

The SCILAB calculations for this problem proceed as follows:

-->n=10;xbar=0.65;s=0.05;alpha=0.10;
-->t_alpha_2 = cdft('T',n-1,1-alpha/2,alpha/2)
 t_alpha_2 = 1.8331129
-->CU=xbar+t_alpha_2*s/sqrt(n)
 CU = .6789841
-->CL=xbar-t_alpha_2*s/sqrt(n)
 CL = .6210159

The value $t_{n-1,\alpha/2}$ can also be obtained using STIXBOX's function qt:

-->t_alpha_2 = qt(1-alpha/2,n-1)
 t_alpha_2 = 1.8331129

Confidence interval for proportions

Let X ~ Bernoulli(p), where p is the probability of success; then E[X] = p and Var[X] = p(1-p). If an experiment involving X is repeated n times and k successful outcomes are recorded, then an estimate of p is given by $\hat{p} = k/n$, while the standard error of $\hat{p}$ is

$$\sigma_{\hat{p}} = \sqrt{\frac{p\,(1-p)}{n}}.$$

In practice, the sample estimate for p, i.e., $\hat{p}$, replaces p in the standard error formula. For a large sample size, n > 30, and np > 5 and n(1-p) > 5, the sampling distribution of $\hat{p}$ is very nearly normal. The 100(1-α)% central two-sided confidence interval for the population proportion p is

$$\left( \hat{p} - z_{\alpha/2}\,\sigma_{\hat{p}},\; \hat{p} + z_{\alpha/2}\,\sigma_{\hat{p}} \right).$$

Example of proportion confidence interval for a large sample

Suppose an irrigation engineer keeps track of the number of days, during a 90-day period in late spring and early summer, on which rainfall is sufficient that an irrigation sprinkler system in an orchard does not need to be activated. Observations taken at random during 90 days in the last three summers indicate that enough rainfall was recorded on only 20 of those 90 days. Determine the 90% confidence interval for the proportion p of days on which enough rainfall is available in the orchard.
The estimate for p and the standard error are calculated as:

-->k=20;n=90;alpha=0.10;
-->p_hat = k/n
 p_hat = .2222222
-->sigma_p_hat =sqrt(p_hat*(1-p_hat)/n)
 sigma_p_hat = .0438228

The parameter $z_{\alpha/2}$ is obtained from:

-->z_alpha_2 = qnorm(1-alpha/2,0,1)
 z_alpha_2 = 1.6448536

The lower and upper limits of the confidence interval are:

-->CL=p_hat-z_alpha_2*sigma_p_hat
 CL = .1501401
-->CU=p_hat+z_alpha_2*sigma_p_hat
 CU = .2943043

For a small sample, n < 30, we can estimate a confidence interval using:

$$\left( \hat{p} - t_{n-1,\alpha/2}\,\sigma_{\hat{p}},\; \hat{p} + t_{n-1,\alpha/2}\,\sigma_{\hat{p}} \right).$$

Example of proportion confidence interval for a small sample

The same engineer has kept data on early fall rainfall. His records, however, include only 25 days in the last three years, and they indicate that on 20 of those 25 days there was sufficient rainfall to turn off the sprinkler system. Estimate the 90% confidence interval for the proportion of days with sufficient rainfall.

The following is the SCILAB solution for this problem:

-->k=20;n=25;alpha=0.10;
-->p_hat = k/n
 p_hat = .8
-->sigma_p_hat =sqrt(p_hat*(1-p_hat)/n)
 sigma_p_hat = .08
-->t_alpha_2 = cdft('T',n-1,1-alpha/2,alpha/2)
 t_alpha_2 = 1.7108821
-->CL=p_hat-t_alpha_2*sigma_p_hat
 CL = .6631294
-->CU=p_hat+t_alpha_2*sigma_p_hat
 CU = .9368706

Sampling Distribution of Differences and Sums of Statistics

Let S1 and S2 be independent statistics from two populations based on samples of sizes n1 and n2, respectively. Also, let the respective means and standard errors of the sampling distributions of those statistics be $\mu_{S_1}$ and $\mu_{S_2}$, and $\sigma_{S_1}$ and $\sigma_{S_2}$, respectively.

The differences between the statistics from the two populations have a sampling distribution with mean

$$\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2},$$

and standard error

$$\sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}.$$

The sum of the statistics, S1 + S2, has a mean

$$\mu_{S_1 + S_2} = \mu_{S_1} + \mu_{S_2},$$

and standard error

$$\sigma_{S_1 + S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}.$$

Estimators for the mean and standard deviation of the difference and sum of the statistics S1 and S2 are given by:

$$\hat{\mu}_{S_1 \pm S_2} = \bar{X}_1 \pm \bar{X}_2; \qquad \hat{\sigma}_{S_1 \pm S_2} = \sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}}.$$

Confidence intervals

For large samples, i.e., 30 ≤ n1 and 30 ≤ n2, and assuming that the variances $\sigma_{S_1}^2$ and $\sigma_{S_2}^2$ are known, the confidence intervals for the difference and sum of the statistics S1, S2 are given by:

$$\left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}},\; (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}} \right)$$

and

$$\left( (\bar{X}_1 + \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}},\; (\bar{X}_1 + \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_{S_1}^2}{n_1} + \frac{\sigma_{S_2}^2}{n_2}} \right),$$

respectively.

If one of the samples is small, i.e., n1 < 30 or n2 < 30, or if the variances $\sigma_{S_1}^2$ and $\sigma_{S_2}^2$ are unknown, the confidence intervals for the difference and sum of the statistics S1, S2 are given by:

$$\left( (\bar{X}_1 - \bar{X}_2) - t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\; (\bar{X}_1 - \bar{X}_2) + t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right)$$

and

$$\left( (\bar{X}_1 + \bar{X}_2) - t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\; (\bar{X}_1 + \bar{X}_2) + t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right),$$

respectively, where ν = n1 + n2 − 2 is the number of degrees of freedom in the variate $t_{\nu}$.

Examples of confidence intervals for sum and difference of means

An industrial process consists of two consecutive steps taking times X1 and X2, respectively, for completion. Measurements from 20 repetitions indicate that the first step takes an average of $\bar{X}_1$ = 45 minutes with a standard deviation of S1 = 10 minutes, while measurements from 15 repetitions indicate that the second step takes an average of $\bar{X}_2$ = 65 minutes with a standard deviation of S2 = 5 minutes. Determine the 99% confidence interval for total process time, XT = X1 + X2.
Using SCILAB we proceed as follows, with α = 0.01 for the 99% confidence level and ν = n1 + n2 − 2 = 33 degrees of freedom:

-->n1 = 20; X1bar = 45; S1 = 10; n2 = 15; X2bar = 65; S2 = 5; alpha = 0.01;
-->XTbar = X1bar + X2bar
 XTbar = 110.
-->sigmaTbar = sqrt(S1^2/n1+S2^2/n2)
 sigmaTbar = 2.5819889
-->t_alpha_2 = cdft('T',n1+n2-2,1-alpha/2,alpha/2);
-->CL=XTbar-t_alpha_2*sigmaTbar;
-->CU=XTbar+t_alpha_2*sigmaTbar;

These commands give t_alpha_2 of approximately 2.733, so the 99% confidence interval for the total process time is approximately (102.94, 117.06) minutes.

Interval Estimation for the Variance

Consider a random sample X1, X2, ..., Xn of independent normally distributed variables with mean µ, variance σ², and sample mean $\bar{X}$. The statistic

$$\hat{S}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

is an unbiased estimator of the variance σ². The quantity

$$(n-1)\,\frac{\hat{S}^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

follows the χ² distribution with ν = n - 1 degrees of freedom. The (1 − α)·100% two-sided confidence interval for the variance σ² is found from

$$P\left[ \chi^2_{n-1,1-\alpha/2} \le (n-1)\,\frac{\hat{S}^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2} \right] = 1 - \alpha,$$

as illustrated in the figure below.

[Figure: χ² probability density function illustrating the two-sided interval.]

The confidence interval for the population variance σ² is therefore

$$\left( \frac{(n-1)\,\hat{S}^2}{\chi^2_{n-1,\alpha/2}},\; \frac{(n-1)\,\hat{S}^2}{\chi^2_{n-1,1-\alpha/2}} \right),$$

where $\chi^2_{n-1,\alpha/2}$ and $\chi^2_{n-1,1-\alpha/2}$ are the values that a $\chi^2_{n-1}$ variable exceeds with probabilities α/2 and 1 − α/2, respectively.

The one-sided upper confidence limit for σ² is defined as

$$\frac{(n-1)\,\hat{S}^2}{\chi^2_{n-1,1-\alpha}}.$$

Two-sided and upper 99% confidence limit for the standard deviation

Suppose that the compressive strengths of 40 test concrete cubes have an estimated standard deviation of 5.02 N/mm2. We will determine the two-sided 99% confidence limits as follows:

-->n=40;s=5.02;alpha=0.01;
-->Chi_alpha_2 = cdfchi('X',n-1,1-alpha/2,alpha/2)
 Chi_alpha_2 = 65.475571
-->Chi_1_alpha_2=cdfchi('X',n-1,alpha/2,1-alpha/2)
 Chi_1_alpha_2 = 19.995868
-->CL=(n-1)*s^2/Chi_alpha_2
 CL = 15.010417
-->CU=(n-1)*s^2/Chi_1_alpha_2
 CU = 49.150935

The values CL and CU are confidence limits for the variance σ². The corresponding limits for the standard deviation σ are their square roots, approximately 3.87 and 7.01 N/mm2.

Hypothesis Testing

A hypothesis is a declaration made about a population (for instance, with respect to its mean). Acceptance of the hypothesis is based on a statistical test on a sample taken from the population. The consequent action and decision-making are called hypothesis testing.

The process consists of taking a random sample from the population and making a statistical hypothesis about the population. If the observations do not support the model or theory postulated, the hypothesis is rejected. However, if the observations are in agreement, then the hypothesis is not rejected, but it is not necessarily accepted.

Associated with the decision is a level of significance α. This is complementary to the probability used earlier for setting confidence limits. The initial assumption of a significance level removes any subjectivity in the decision-making process, i.e., two or more investigators will reach the same conclusion based on the same data.

Hypothesis testing therefore involves procedures for rejecting or not rejecting a statement, and the chances of making incorrect decisions of either kind, i.e., rejecting if the hypothesis is true or accepting if the hypothesis is false.

Procedure for hypothesis testing

The procedure for hypothesis testing involves the following six steps:

1. Declare a null hypothesis, H0. This is the hypothesis to be tested. For example, H0: µ1 − µ2 = 0, i.e., we hypothesize that the mean value of population 1 and the mean value of population 2 are the same. If H0 is true, any observed difference in means is attributed to errors in random sampling.

2. Declare an alternative hypothesis, H1. For the example under consideration, it could be H1: µ1 − µ2 ≠ 0. [Note: this is what we really want to test.]

3. Determine or specify a test statistic, T.
In the example under consideration, T will be based on $\bar{X}_1 - \bar{X}_2$, the difference of observed means.

4. Use the known (or assumed) distribution of the test statistic, T.

5. Define a rejection region (the critical region, R) for the test statistic based on a pre-assigned significance level α.

6. Use observed data to determine whether the computed value of the test statistic is within or outside the critical region. If the test statistic is within the critical region, then we say that the quantity we are testing is significant at the 100α percent level.

Notes:

1. For the example under consideration, the alternative hypothesis H1: µ1 − µ2 ≠ 0 produces what is called a two-tailed test. If the alternative hypothesis is H1: 0 < µ1 − µ2, or H1: µ1 − µ2 < 0, then we have a one-tailed test.

2. The probability of rejecting the null hypothesis when it is true is equal to the level of significance, i.e., Pr[ T ∈ R | H0 ] = α.

Errors in hypothesis testing

In hypothesis testing we use the terms errors of Type I and Type II to define the cases in which a true hypothesis is rejected or a false hypothesis is accepted (not rejected), respectively. Let T = value of test statistic, R = rejection region, A = acceptance region; thus, R ∩ A = ∅ and R ∪ A = Ω, where Ω = the parameter space for T. The probabilities of making an error of Type I or of Type II are as follows:

Rejecting a true hypothesis, P[Type I error] = P[ T ∈ R | H0 ] = α.
Not rejecting a false hypothesis, P[Type II error] = P[ T ∈ A | H1 ] = β.

Now, let's consider the cases in which we make the correct decision:

Not rejecting a true hypothesis, P[Not(Type I error)] = P[ T ∈ A | H0 ] = 1 - α.
Rejecting a false hypothesis, P[Not(Type II error)] = P[ T ∈ R | H1 ] = 1 - β.

Power of hypothesis testing

The complement of β, i.e., 1 - β, is called the power of the test of the null hypothesis H0 vs. the alternative H1. The power of a test is used, for example, to determine a minimum sample size to restrict errors.
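To make β and the power concrete, the following sketch evaluates both for an upper-tailed test on one mean with known σ. It assumes the test statistic is the standard normal score (x̄ − µ0)/(σ/√n) and that the true population mean equals a hypothetical value mu1. All numerical values and variable names are illustrative assumptions, not results from this document.

```scilab
// Sketch: Type II error probability and power for an upper-tailed z test.
// H0: mu = mu0 vs. H1: mu > mu0, sigma known. All values are hypothetical.
mu0 = 2.0; mu1 = 2.5;          // hypothesized mean and assumed true mean
sigma = 1.5; n = 15; alpha = 0.05;

// critical value: H0 is rejected when z0 > z_alpha
z_alpha = cdfnor("X", 0, 1, 1 - alpha, alpha);

// when the true mean is mu1, z0 is normal with this mean and unit variance
shift = (mu1 - mu0)/(sigma/sqrt(n));

// beta_II = Pr[z0 <= z_alpha | mu = mu1]; the second output is 1 - beta_II
[beta_II, power] = cdfnor("PQ", z_alpha - shift, 0, 1);
mprintf("beta = %f, power = %f\n", beta_II, power);
```

With these assumed values the power is below one half, which suggests that a larger sample would be needed to detect a shift of this size reliably.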
Selecting the values of α and β

A typical value of the level of significance (or probability of Type I error) is α = 0.05 (i.e., incorrect rejection once in 20 times on the average). If the consequences of a Type I error are more serious, choose smaller values of α, say 0.01 or even 0.001.

The value of β, i.e., the probability of making an error of Type II, depends on α, the sample size n, and on the true value of the parameter tested. Thus, the value of β is determined after the hypothesis testing is performed. It is customary to draw graphs showing β or the power of the test (1 − β) as a function of the true value of the parameter tested. These graphs are called operating characteristic curves or power function curves, respectively.

Hypothesis testing involving mean values

Hypothesis testing on one mean

Suppose you want to test the hypothesis that the mean of a population is equal to a certain value, i.e., H0: µ = µ0, at a significance level α. We could use three different alternative hypotheses for the test. These are:

Two-tailed test: H1: µ ≠ µ0
One-tailed tests: H1: µ0 < µ, or H1: µ < µ0.

A sample of size n is taken from the population yielding a mean value, $\bar{x}$, and a standard deviation, $s_x$.

Case I: Knowing σ, or large sample if σ unknown

Assuming that we know the standard deviation of the population σ, we use the standard normal score

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

as the test statistic. The particular value for the test is

$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

If the standard deviation of the population, σ, is unknown, but the sample is large (30 ≤ n), we can still use the standard normal score as the test statistic, but replacing σ with $s_x$, i.e.,

$$z_0 = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}.$$

Let Φ(z) be the CDF of the standard normal distribution, i.e., Z ~ N(0,1).
Two-tailed test

If using a two-tailed test we will find the value of $z_{\alpha/2}$ from

$$\Pr[Z > z_{\alpha/2}] = 1 - \Phi(z_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{or} \quad \Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2}.$$

We will reject the null hypothesis, H0, if $z_{\alpha/2} < z_0$ or if $z_0 < -z_{\alpha/2}$. In other words, the rejection region is R = { $|z_0| > z_{\alpha/2}$ }, while the acceptance region is A = { $|z_0| < z_{\alpha/2}$ }.

One-tailed test

If using a one-tailed test we will find the value of $z_{\alpha}$ from

$$\Pr[Z > z_{\alpha}] = 1 - \Phi(z_{\alpha}) = \alpha, \quad \text{or} \quad \Phi(z_{\alpha}) = 1 - \alpha.$$

Reject the null hypothesis, H0, if $z_{\alpha} < z_0$ and H1: µ0 < µ, or if $z_0 < -z_{\alpha}$ and H1: µ < µ0.

Case II: Small sample with unknown σ

If the standard deviation of the population, σ, is unknown and n < 30 (small sample), we use the Student's t score

$$t = \frac{\bar{X} - \mu}{S_x/\sqrt{n}},$$

with ν = n − 1 degrees of freedom, as the test statistic. The particular value for the test is

$$t_0 = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}.$$

Let $F_{\nu}(t)$ be the CDF of the Student's t variate with ν degrees of freedom, i.e., t ~ Student's t(ν).

Two-tailed test

If using a two-tailed test we will find the value of $t_{\alpha/2}$ from

$$\Pr[t > t_{\alpha/2}] = 1 - F_{\nu}(t_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{or} \quad F_{\nu}(t_{\alpha/2}) = 1 - \frac{\alpha}{2}.$$

Reject the null hypothesis, H0, if $t_{\alpha/2} < t_0$ or if $t_0 < -t_{\alpha/2}$. In other words, the rejection region is R = { $|t_0| > t_{\alpha/2}$ }, while the acceptance region is A = { $|t_0| < t_{\alpha/2}$ }.

One-tailed test

If using a one-tailed test we will find the value of $t_{\alpha}$ from

$$\Pr[t > t_{\alpha}] = 1 - F_{\nu}(t_{\alpha}) = \alpha, \quad \text{or} \quad F_{\nu}(t_{\alpha}) = 1 - \alpha.$$

Reject the null hypothesis, H0, if $t_{\alpha} < t_0$ and H1: µ0 < µ, or if $t_0 < -t_{\alpha}$ and H1: µ < µ0.

A function to perform hypothesis testing on one mean

The following function, htestmu1, can be used to perform hypothesis testing on one mean.
The possible calls to the function are:

[xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,xbar,s,sigma,n)
[xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x,sigma)

A listing of the function follows:

function [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x,s,sigma,n)
//Hypothesis testing on one mean. Possible function calls:
//
// [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,xbar,s,sigma,n)
// [xbar,s,Ta,T0] = htestmu1(altype,alpha,mu0,x,sigma)
//
// altype can be 'one' - for one-sided alternative hypothesis,
//            or 'two' - for two-sided alternative hypothesis
// alpha = level of significance (typical values = 0.01,0.05,0.10)
// mu0   = value of population mean being tested, H0:mu = mu0
// x     = mean value of a sample (xbar) or vector representing the sample
// if x = mean value, then s = standard deviation of sample,
//    sigma = standard deviation of population and n = sample size
// if x = vector representing sample, the fifth argument (s) holds the
//    standard deviation of the population
// In both cases, use sigma <= 0 if the population value is unknown.

if altype<>'one' & altype<>'two' then
   error('htestmu1 - select type of alternative hypothesis = one or two');
end;

[nargout,nargin] = argn(0);

if nargin == 5 then
   // x is the sample itself: a vector with at least two elements
   if length(x)<2 then
      error('htestmu1 - x must be a vector');
   else
      sigma = s;             // fifth argument is the population sigma
      n = length(x);
      xbar = mean(x);
      s = st_deviation(x);   // sample standard deviation (st_deviation is the
                             // name used in the SCILAB release of this document)
   end;
else
   xbar = x;
end;

printf(' \n');
printf('Hypothesis testing on one mean: ' ...
+ altype + '-side alternative hypothesis.\n')
printf(' \n');

// Select the critical value Ta and the test statistic T0
if sigma > 0 & altype=='one' then
   Ta = cdfnor('X',0,1,1-alpha,alpha);
   T0 = (xbar-mu0)/(sigma/sqrt(n));
   printf('Test parameter used: z\n');
elseif sigma > 0 & altype=='two' then
   Ta = cdfnor('X',0,1,1-alpha/2,alpha/2);
   T0 = (xbar-mu0)/(sigma/sqrt(n));
   printf('Test parameter used: z\n');
elseif sigma <= 0 & n>=30 & altype=='one' then
   Ta = cdfnor('X',0,1,1-alpha,alpha);
   T0 = (xbar-mu0)/(s/sqrt(n));
   printf('Test parameter used: z\n');
elseif sigma <= 0 & n<30 & altype=='one' then
   Ta = cdft('T',n-1,1-alpha,alpha);
   T0 = (xbar-mu0)/(s/sqrt(n));
   printf('Test parameter used: t\n');
elseif sigma <= 0 & n>=30 & altype=='two' then
   Ta = cdfnor('X',0,1,1-alpha/2,alpha/2);
   T0 = (xbar-mu0)/(s/sqrt(n));
   printf('Test parameter used: z\n');
else
   // sigma unknown, small sample, two-sided alternative
   Ta = cdft('T',n-1,1-alpha/2,alpha/2);
   T0 = (xbar-mu0)/(s/sqrt(n));
   printf('Test parameter used: t\n');
end;

// Report the decision
if altype == 'two' then
   if T0>Ta | T0<-Ta then
      printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
   else
      printf('Do not reject the null hypothesis H0 : mu = %f\n',mu0);
   end
else
   if T0>Ta then
      printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
      printf('if the alternative hypothesis is H1 : mu > %f\n',mu0);
   elseif T0<-Ta then
      printf('Reject the null hypothesis H0 : mu = %f\n',mu0);
      printf('if the alternative hypothesis is H1 : mu < %f\n',mu0);
   else
      printf('Do not reject the null hypothesis H0 : mu = %f\n',mu0);
   end
end;
printf(' \n');
endfunction

Examples of hypothesis testing on one mean

In the following examples, values of n, $\bar{x}$, $s_x$, µ0, and α are provided, and the hypothesis testing is performed by using function htestmu1.

Example 1. Test the null hypothesis H0: µ = 2.0 against a one-sided alternative hypothesis using data from a sample of size 15, with a sample mean of 2.5 and sample standard deviation sx = 3.5, for a significance level α = 0.05. Assume that the population standard deviation is known, σ = 1.5.

-->n=15;xbar=2.5;sx=3.5;mu0=2.0;alpha=0.05;sigma=1.5;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : mu = 2.000000
 T0 = 1.2909944
 Ta = 1.6448536
 s = 3.5
 xbar = 2.5

Example 2. Test the same null hypothesis as in Example 1 but using a two-sided alternative hypothesis.

-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : mu = 2.000000
 T0 = 1.2909944
 Ta = 1.959964
 s = 3.5
 xbar = 2.5

Example 3. Test the null hypothesis H0: µ = 6.0 against a one-sided alternative hypothesis based on a sample of size 45 (large sample) with a sample mean of 12.3 and a sample standard deviation of 2.0 at a significance level of 0.05. The population standard deviation is not known.

-->n=45;xbar=12.3;sx=2.0;mu0=6.0;alpha=0.05;sigma=0.0;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 6.000000
if the alternative hypothesis is H1 : mu > 6.000000
 T0 = 21.130842
 Ta = 1.6448536
 s = 2.
 xbar = 12.3

Example 4. Test the null hypothesis of Example 3 against a two-sided hypothesis.

-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 6.000000
 T0 = 21.130842
 Ta = 1.959964
 s = 2.
 xbar = 12.3

Example 5. Test the null hypothesis H0: µ = 14.0 against a one-sided alternative hypothesis based on a sample of size 10 (small sample) with a sample mean of 11 and a sample standard deviation of 1.5 at a level of significance of 0.01. The standard deviation of the population is unknown.

-->n=10;xbar=11;sx=1.5;mu0=14.0;alpha=0.01;sigma=0.0;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 14.000000
if the alternative hypothesis is H1 : mu < 14.000000
 T0 = - 6.3245553
 Ta = 2.8214379
 s = 1.5
 xbar = 11.

Example 6. Test the null hypothesis of Example 5 against a two-sided alternative hypothesis.

-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,xbar,sx,sigma,n)

Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 14.000000
 T0 = - 6.3245553
 Ta = 3.2498355
 s = 1.5
 xbar = 11.

Example 7. For the vector of data, X, generated below, test the null hypothesis H0: µ = 23 against a one-sided alternative hypothesis at the significance level α = 0.01. The population standard deviation is assumed to be known, σ = 5.

-->X = int(100*rand(1,10))
 X =
! 30. 93. 21. 31. 36. 29. 56. 48. 33. 59. !
-->alpha = 0.01;mu0=23;sigma=5;
-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,X,sigma)

Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 23.000000
if the alternative hypothesis is H1 : mu > 23.000000
 T0 = 13.028584
 Ta = 2.3263479
 s = 21.313532
 xbar = 43.6

Example 8. Test the null hypothesis of Example 7 against a two-sided alternative hypothesis.

-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,X,sigma)

Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: z
Reject the null hypothesis H0 : mu = 23.000000
 T0 = 13.028584
 Ta = 2.5758293
 s = 21.313532
 xbar = 43.6

Example 9. Test the null hypothesis of Example 7 against a one-sided alternative hypothesis assuming that the population standard deviation is not known.

-->[xbar,s,Ta,T0]=htestmu1('one',alpha,mu0,X,-1)
Hypothesis testing on one mean: one-side alternative hypothesis.
Test parameter used: t
Reject the null hypothesis H0 : mu = 23.000000
if the alternative hypothesis is H1 : mu > 23.000000
T0 = 3.0564112
Ta = 2.8214379
s = 21.313532
xbar = 43.6

Example 10. Test the null hypothesis of Example 7 against a two-sided alternative hypothesis assuming that the population standard deviation is not known.

-->[xbar,s,Ta,T0]=htestmu1('two',alpha,mu0,X,-1)

Hypothesis testing on one mean: two-side alternative hypothesis.
Test parameter used: t
Do not reject the null hypothesis H0 : mu = 23.000000
T0 = 3.0564112
Ta = 3.2498355
s = 21.313532
xbar = 43.6

Hypothesis testing on one proportion

Suppose that we want to test the null hypothesis, H0 : p = p0, where p represents the probability of obtaining a successful outcome in any given repetition of a Bernoulli trial. To test the hypothesis, we perform n repetitions of the experiment, and find that k successful outcomes are recorded. Thus, an estimate of p is given by $\hat{p} = k/n$. The standard deviation for the sample will be estimated as

$$s_p = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} = \sqrt{\frac{k\,(n-k)}{n^3}}$$

Assume that the Z score, $Z = (\hat{p} - p_0)/s_p$, follows the standard normal distribution, i.e., Z ~ N(0,1). The particular value of the statistic to test is $z_0 = (\hat{p} - p_0)/s_p$.

We could use three different alternate hypotheses for the test. These are:

Two-tailed test: H1 : p ≠ p0
One-tailed tests: H1 : p0 < p , or H1 : p < p0 .

Two-tailed test

If using a two-tailed test we will find the value of $z_{\alpha/2}$ from

$$\Pr[Z > z_{\alpha/2}] = 1 - \Phi(z_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{or} \quad \Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2}.$$

Reject the null hypothesis, H0, if $z_{\alpha/2} < z_0$ or if $z_0 < -z_{\alpha/2}$.

In other words, the rejection region is $R = \{ z_{\alpha/2} < |z_0| \}$, while the acceptance region is $A = \{ |z_0| < z_{\alpha/2} \}$.
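A short worked instance of the two-tailed procedure may help. The numbers are illustrative (k = 20 successes in n = 100 trials, p0 = 0.25, α = 0.05, the same values used in the first proportion example of this section); the critical value 1.96 is the standard normal quantile for α/2 = 0.025:

```latex
\hat{p} = \frac{k}{n} = \frac{20}{100} = 0.2, \qquad
s_p = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} = \sqrt{\frac{0.2 \times 0.8}{100}} = 0.04, \qquad
z_0 = \frac{\hat{p}-p_0}{s_p} = \frac{0.2 - 0.25}{0.04} = -1.25

|z_0| = 1.25 < z_{\alpha/2} \approx 1.96 \;\Rightarrow\; \text{do not reject } H_0
```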
One-tailed test

If using a one-tailed test we will find the value of $z_\alpha$ from

$$\Pr[Z > z_\alpha] = 1 - \Phi(z_\alpha) = \alpha, \quad \text{or} \quad \Phi(z_\alpha) = 1 - \alpha.$$

Reject the null hypothesis, H0, if $z_\alpha < z_0$ and H1 : p0 < p, or if $z_0 < -z_\alpha$ and H1 : p < p0.

Examples of hypothesis testing on one proportion

Example 1. Test the null hypothesis H0:p = 0.25 against a one-sided alternative hypothesis based on 100 repetitions of a test out of which 20 successful outcomes are recorded, using a significance level of 0.05.

-->k=20;n=100;p0=0.25;alpha=0.05;
-->[p_hat,sigma,Ta,T0]=htestprop1('one',alpha,p0,k,n)

Hypothesis testing on one proportion: one-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : p = .250000
T0 = - 1.25
Ta = 1.6448536
sigma = .04
p_hat = .2

Example 2. Test the same null hypothesis as in Example 1 against a two-sided alternative hypothesis.

-->[p_hat,sigma,Ta,T0]=htestprop1('two',alpha,p0,k,n)

Hypothesis testing on one proportion: two-side alternative hypothesis.
Test parameter used: z
Do not reject the null hypothesis H0 : p = .250000
T0 = - 1.25
Ta = 1.959964
sigma = .04
p_hat = .2

Hypothesis testing on two means

Assume that we have two populations, population 1 and population 2, with mean values µ1 and µ2, respectively, and with standard deviations σ1 and σ2, respectively. A sample of size n1 is taken out of population 1 yielding a mean value $\bar{x}_1$ and standard deviation s1. Similarly, a sample of size n2 is taken out of population 2 yielding a mean value $\bar{x}_2$ and standard deviation s2.

Testing the difference between two means using known variances

If both population 1 and population 2 are normal, the statistic

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

has a N(0,1) distribution. The standard error of the difference between the two means is:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
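The z statistic for two means with known population standard deviations can be evaluated directly, without any interactive prompts. The following is a minimal sketch, not part of the original function library: the variable names (se, z0, za2) are chosen here for illustration, and the sample values are those of the first htestmu2 example in this section.

```scilab
// Sketch: two-sample z test with known population standard deviations.
// Variable names are illustrative; values come from Example 1 of htestmu2.
n1 = 100; x1bar = 2.3; sigma1 = 5.5;
n2 = 75;  x2bar = 2.5; sigma2 = 3.0;
delta = 0;       // hypothesized difference mu1 - mu2
alpha = 0.05;    // level of significance

se  = sqrt(sigma1^2/n1 + sigma2^2/n2);     // standard error of X1bar - X2bar
z0  = ((x1bar - x2bar) - delta)/se;        // test statistic
za2 = cdfnor('X',0,1,1-alpha/2,alpha/2);   // two-sided critical value z_(alpha/2)

if abs(z0) > za2 then
   disp('Reject H0')
else
   disp('Do not reject H0')
end
```

For a one-sided test the same statistic would be compared with the quantile obtained from cdfnor('X',0,1,1-alpha,alpha) instead.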
The criteria for rejection of the null hypothesis, H0:µ1 − µ2 = δ, are the same as for a single mean value, µ = µ1 − µ2 = δ.

Testing the differences between two means when the variances are unknown but equal

This could be the case in which the two samples are taken from the same population, or when there is evidence that the standard deviations of two different populations are the same. In this case, we obtain a "pooled estimate" of the common standard deviation of the two populations, σ, as:

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}.$$

Then, the random variable

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has the Student's t distribution with ν = n1 + n2 − 2 degrees of freedom.

Testing the difference between two means when the variances are unknown and unequal

For observations taken from normal populations with unknown and unequal variances, the statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

has an approximate Student's t distribution with

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

degrees of freedom.

For the last two cases, in which a t parameter is used for the test, the criteria for rejection of the null hypothesis, H0:µ1 − µ2 = δ, are the same as for a single mean value, µ = µ1 − µ2 = δ.

A user-defined SCILAB function for hypothesis testing on two means

The procedure for hypothesis testing on two means is coded in the following function, htestmu2, which has possible calls

[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2()
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(Xdata)
[X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(X1data,X2data)

Xdata, X1data, and X2data are row vectors of data. X1Info is a vector that contains the sample size, n1, the mean value, x1bar, and the standard deviation, s1, of sample 1. X2Info is a vector containing n2, x2bar, and s2. The value sp represents the standard deviation used for the two samples, which could be the standard error of the difference of means or the pooled estimate sp, as defined above. The value nu represents the degrees of freedom of the Student's t distribution, if a t parameter is used in the test.
T0 is the actual value of the z or t parameter used in the test. Ta represents zα, zα/2, tα, or tα/2, depending on the test parameter used and on the type of alternative hypothesis (one- or two-sided) used.

The function operates interactively, requesting information from the user, and provides verbose information on the type of test parameter and alternative hypothesis used, as well as a recommendation about the rejection or non-rejection of the null hypothesis.

If the function call [X1Info,X2Info,sp,nu,T0,Ta] = htestmu2() is used, the user will be prompted for the summary information on the samples, i.e., the sample sizes, mean values, and standard deviations. If the function call [X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(Xdata) is used, the user is asked to identify the vector Xdata as sample 1 or sample 2, and then is prompted for the summary data for the other sample. Finally, if the function call [X1Info,X2Info,sp,nu,T0,Ta] = htestmu2(X1data,X2data) is used, the function calculates the sample summary data by itself.

The function will also prompt the user for the following information:

• The difference of means to be tested, i.e., δ = µ1 − µ2
• The level of significance of the test, i.e., α
• The type of alternative hypothesis to be used, i.e., one- or two-sided
• The standard deviations of the populations, σ1 and σ2, if known
• For unknown σ1 and σ2, the function asks whether the user suspects that the values of σ1 and σ2 are equal or not. This helps the function select the t parameter to use.

A listing of the function is shown below.
function [X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
[nargout,nargin]=argn(0)
//Collect or compute the summary statistics of the two samples
if nargin == 0 then
   X1in = input('For sample 1 enter n, xbar, s :')
   n1 = X1in(1);x1bar = X1in(2);s1 = X1in(3);
   X2in = input('For sample 2 enter n, xbar, s :')
   n2 = X2in(1);x2bar = X2in(2);s2 = X2in(3);
elseif nargin == 1 then
   disp('You entered a vector as input to the function.')
   disp('Do you want this vector to represent sample 1 or 2?')
   idsample = input(' ')
   if idsample == 1 then
      n1 = length(X1); x1bar = mean(X1); s1 = st_deviation(X1);
      printf('n1 = %g x1bar = %g s1 = %g',n1,x1bar,s1)
      X2in = input('For sample 2 enter n, xbar, s :')
      n2 = X2in(1);x2bar = X2in(2);s2 = X2in(3);
   else
      n2 = length(X1); x2bar = mean(X1); s2 = st_deviation(X1);
      printf('n2 = %g x2bar = %g s2 = %g',n2,x2bar,s2)
      X1in = input('For sample 1 enter n, xbar, s :')
      n1 = X1in(1);x1bar = X1in(2);s1 = X1in(3);
   end
else
   n1 = length(X1); x1bar = mean(X1); s1 = st_deviation(X1);
   printf('n1 = %g x1bar = %g s1 = %g',n1,x1bar,s1)
   n2 = length(X2); x2bar = mean(X2); s2 = st_deviation(X2);
   printf('n2 = %g x2bar = %g s2 = %g',n2,x2bar,s2)
end
X1Info = [n1,x1bar,s1]; X2Info = [n2,x2bar,s2];
//Prompt for the parameters of the test
delta = ...
   input('Enter the difference of population means to be tested:');
disp('Enter the level of confidence, alpha, for the test:')
disp('(Typical values: 0.01, 0.05, 0.10)')
alpha = input(' ');
disp('Enter the type of alternative hypothesis to test:')
disp(' 1 - one-sided 2 - two-sided');
atype = input(' ');
disp('Enter population standard deviations, sigma1 & sigma2.')
disp('Note: Enter zero if sigma1 or sigma2 are unknown.')
sigmas = input('');
if sigmas == 0 then
   sigma1 = 0; sigma2 = 0;
else
   sigma1 = sigmas(1); sigma2 = sigmas(2);
end
if sigma1<=0 | sigma2<=0 then
   //Unknown population standard deviations: use a t parameter
   disp('Do you suspect that the unknown population standard')
   disp('deviations are equal? If so enter 1, otherwise enter 0')
   answer = input('')
   //As listed, answer == 1 uses the separate-variance standard error
   //with approximate degrees of freedom, and answer == 0 uses the
   //pooled estimate with nu = n1+n2-2.
   if answer == 1 then
      nu=int(((s1^2/n1)+(s2^2/n2))^2/((s1^2/n1)^2/(n1-1)+(s2^2/n2)^2/(n2-1)));
      sp=sqrt(s1^2/n1+s2^2/n2);
      T0=((x1bar-x2bar)-delta)/sp;ttype = ' t';
   else
      nu=n1+n2-2;
      sp=sqrt(((n1-1)*s1^2+(n2-1)*s2^2)/nu);
      T0=((x1bar-x2bar)-delta)/(sp*sqrt(1/n1+1/n2));ttype = ' t';
   end
   if atype == 1 then
      Ta = cdft('T',nu,1-alpha,alpha)
   else
      Ta = cdft('T',nu,1-alpha/2,alpha/2)
   end
else
   //Known population standard deviations: use a z parameter
   sp=sqrt(sigma1^2/n1+sigma2^2/n2);
   T0=((x1bar-x2bar)-delta)/sp;ttype = ' z';nu=[];
   if atype == 1 then
      Ta = cdfnor('X',0,1,1-alpha,alpha)
   else
      Ta = cdfnor('X',0,1,1-alpha/2,alpha/2)
   end
end;
//Report the decision
if atype == 1 then
   printf('Hypothesis testing on two means: one-sided test.\n')
   printf('Test parameter used:' + ttype +'\n');printf(' \n');
   if T0<-Ta then
      printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
      printf('if the alternative hypothesis is H1:mu1-mu2<%f. \n',delta);
   elseif T0>Ta then
      printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
      printf('if the alternative hypothesis is H1:mu1-mu2>%f. \n',delta);
   else
      printf('Do not reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
   end
else
   printf('Hypothesis testing on two means: two-sided test.\n')
   printf('Test parameter used:' + ttype +'\n');printf(' \n');
   if T0<-Ta | T0>Ta then
      printf('Reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
   else
      printf('Do not reject the null hypothesis H0:mu1-mu2=%f, \n',delta);
   end
end;

Examples of application of function htestmu2

Example 1. Two samples taken from two different populations are described by the statistics n1 = 100, x1bar = 2.3, n2 = 75, x2bar = 2.5. The populations are known to have the standard deviations σ1 = 5.5 and σ2 = 3.0. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of significance α = 0.05 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis.
[Note: since the populations' standard deviations are given, we do not need to use the samples' standard deviations (which are not given, anyway). In this case we simply enter them as zero when prompted by function htestmu2.] The user's responses to the prompts of function htestmu2 are shown after each prompt:

-->//Solution to Example 1 - part (a)
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2()
For sample 1 enter n, xbar, s : 100, 2.3, 0.0
For sample 2 enter n, xbar, s : 75, 2.5, 0.0
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
5.5 3.0

Hypothesis testing on two means: two-sided test.
Test parameter used: z

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.959964
T0 = - .3076923
nu = []
sp = .65
X2Info = ! 75. 2.5 0. !
X1Info = ! 100. 2.3 0. !

-->//Solution to Example 1 - part (b)
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2()
For sample 1 enter n, xbar, s : 100, 2.3, 0.0
For sample 2 enter n, xbar, s : 75, 2.5, 0.0
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.05
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
5.5 3.0

Hypothesis testing on two means: one-sided test.
Test parameter used: z

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.6448536
T0 = - .3076923
nu = []
sp = .65
X2Info = ! 75. 2.5 0. !
X1Info = ! 100. 2.3 0. !

Example 2.
Sample 1 is given by X1 = [2.4,3.2,1.1,2.5,4.2,3.6], while sample 2 is characterized by the statistics n2 = 8, x2bar = 3.2, s2 = 0.5. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of significance α = 0.10 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be different.

-->//Example 2 - part(a)
-->X1 = [2.4,3.2,1.1,2.5,4.2,3.6]
X1 =
! 2.4 3.2 1.1 2.5 4.2 3.6 !

-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 6 x1bar = 2.83333 s1 = 1.08566
For sample 2 enter n, xbar, s : 8 3.2 0.5
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0

Hypothesis testing on two means: two-sided test.
Test parameter used: t

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.7822876
T0 = - .8507016
nu = 12.
sp = .7980880
X2Info = ! 8. 3.2 .5 !
X1Info = ! 6. 2.8333333 1.0856642 !

-->//Example 2 - part(b)
-->X1 = [2.4,3.2,1.1,2.5,4.2,3.6]
X1 =
! 2.4 3.2 1.1 2.5 4.2 3.6 !

-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
1
n1 = 6 x1bar = 2.83333 s1 = 1.08566
For sample 2 enter n, xbar, s : 8 3.2 0.5
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.10
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0

Hypothesis testing on two means: one-sided test.
Test parameter used: t

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 1.3562173
T0 = - .8507016
nu = 12.
sp = .7980880
X2Info = ! 8. 3.2 .5 !
X1Info = ! 6. 2.8333333 1.0856642 !

Example 3. Sample 2 is given by X2 = [12.4,13.2,11.1,12.5,14.2,13.6], while sample 1 is characterized by the statistics n1 = 15, x1bar = 16.2, s1 = 2.0. Test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of significance α = 0.01 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be the same.

-->//Example 3 - part(a)
-->X2= [12.4,13.2,11.1,12.5,14.2,13.6]
X2 =
! 12.4 13.2 11.1 12.5 14.2 13.6 !

-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
2
n2 = 6 x2bar = 12.8333 s2 = 1.08566
For sample 1 enter n, xbar, s : 15 16.2 2.0
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard deviations are equal?
If so enter 1, otherwise enter 0
1

Hypothesis testing on two means: two-sided test.
Test parameter used: t

Reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 2.9207816
T0 = 4.9471778
nu = 16.
sp = .6805227
X2Info = ! 6. 12.833333 1.0856642 !
X1Info = ! 15. 16.2 2. !

-->//Example 3 - part(b)
-->X2= [12.4,13.2,11.1,12.5,14.2,13.6]
X2 =
! 12.4 13.2 11.1 12.5 14.2 13.6 !

-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
2
n2 = 6 x2bar = 12.8333 s2 = 1.08566
For sample 1 enter n, xbar, s : 15 16.2 2.0
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
1

Hypothesis testing on two means: one-sided test.
Test parameter used: t

Reject the null hypothesis H0:mu1-mu2=0.000000,
if the alternative hypothesis is H1:mu1-mu2>0.000000.
Ta = 2.5834872
T0 = 4.9471778
nu = 16.
sp = .6805227
X2Info = ! 6. 12.833333 1.0856642 !
X1Info = ! 15. 16.2 2. !

Example 4. Given samples X1 = [3.2, 3.1, 3.0, 3.2], and X2 = [2.8, 3.0, 2.9, 2.7, 3.1], test the null hypothesis H0:µ1-µ2 = 0 (i.e., δ = 0) at the level of significance α = 0.01 against (a) a two-sided alternative hypothesis, and (b) a one-sided alternative hypothesis. The standard deviations of the populations are unknown, and suspected to be different.

-->//Example 4 - part(a)
-->X1=[3.2,3.1,3.0,3.2], X2=[2.8,3.0,2.9,2.7,3.1]
X1 =
! 3.2 3.1 3. 3.2 !
X2 =
! 2.8 3. 2.9 2.7 3.1 !
-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
n1 = 4 x1bar = 3.125 s1 = .0957427
n2 = 5 x2bar = 2.9 s2 = .15811
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
2
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0

Hypothesis testing on two means: two-sided test.
Test parameter used: t

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 3.4994833
T0 = 2.4852506
nu = 7.
sp = .1349603
X2Info = ! 5. 2.9 .1581139 !
X1Info = ! 4. 3.125 .0957427 !

-->//Example 4 - part(b)
-->X1=[3.2,3.1,3.0,3.2], X2=[2.8,3.0,2.9,2.7,3.1]
X1 =
! 3.2 3.1 3. 3.2 !
X2 =
! 2.8 3. 2.9 2.7 3.1 !

-->[X1Info,X2Info,sp,nu,T0,Ta]=htestmu2(X1,X2)
n1 = 4 x1bar = 3.125 s1 = .0957427
n2 = 5 x2bar = 2.9 s2 = .15811
Enter the difference of population means to be tested: 0
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
0.01
Enter the type of alternative hypothesis to test:
1 - one-sided 2 - two-sided
1
Enter population standard deviations, sigma1 & sigma2.
Note: Enter zero if sigma1 or sigma2 are unknown.
0
Do you suspect that the unknown population standard
deviations are equal? If so enter 1, otherwise enter 0
0

Hypothesis testing on two means: one-sided test.
Test parameter used: t

Do not reject the null hypothesis H0:mu1-mu2=0.000000,
Ta = 2.9979516
T0 = 2.4852506
nu = 7.
sp = .1349603
X2Info = ! 5. 2.9 .1581139 !
X1Info = ! 4. 3.125 .0957427 !
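As a cross-check on Example 4, the unequal-variance statistic and its approximate degrees of freedom can be computed directly, without prompts. This is only a sketch under stated assumptions: the variable names (v1, v2, se, t0, ta2) are introduced here for illustration, the sample variances are computed from their definition rather than through a library routine, and the degrees of freedom are truncated with int as in the htestmu2 listing. The results need not coincide with the session above, which reports ν = n1 + n2 − 2 = 7.

```scilab
// Sketch: unequal-variance (approximate t) statistic for two samples.
// All names below are illustrative and not part of htestmu2.
X1 = [3.2,3.1,3.0,3.2];  X2 = [2.8,3.0,2.9,2.7,3.1];
delta = 0;       // hypothesized difference mu1 - mu2
alpha = 0.01;    // level of significance
n1 = length(X1); n2 = length(X2);

v1 = sum((X1-mean(X1)).^2)/(n1-1);   // sample variance s1^2
v2 = sum((X2-mean(X2)).^2)/(n2-1);   // sample variance s2^2

se = sqrt(v1/n1 + v2/n2);                        // standard error of the difference
t0 = ((mean(X1)-mean(X2)) - delta)/se;           // test statistic
nu = int((v1/n1 + v2/n2)^2/ ...
     ((v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1)));     // approximate degrees of freedom
ta2 = cdft('T',nu,1-alpha/2,alpha/2);            // two-sided critical value

if abs(t0) > ta2 then
   disp('Reject H0')
else
   disp('Do not reject H0')
end
```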
Testing the difference between two proportions

Suppose that we want to test the null hypothesis, H0 : p1 − p2 = p0, where the p's represent the probability of obtaining a successful outcome in any given repetition of a Bernoulli trial for two populations 1 and 2. To test the hypothesis, we perform n1 repetitions of the experiment from population 1, and find that k1 successful outcomes are recorded. Similarly, n2 repetitions from population 2 yield k2 successful outcomes. Thus, estimates of p1 and p2 are given, respectively, by $\hat{p}_1 = k_1/n_1$ and $\hat{p}_2 = k_2/n_2$.

The standard deviations for the samples will be estimated, respectively, as

$$s_1 = \sqrt{\frac{\hat{p}_1\,(1-\hat{p}_1)}{n_1}} = \sqrt{\frac{k_1\,(n_1-k_1)}{n_1^3}}, \quad \text{and} \quad s_2 = \sqrt{\frac{\hat{p}_2\,(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{k_2\,(n_2-k_2)}{n_2^3}},$$

and the standard deviation of the difference of proportions is estimated from:

$$s_p = \sqrt{s_1^2 + s_2^2}$$

Assume that the Z score, $Z = ((\hat{p}_1 - \hat{p}_2) - p_0)/s_p$, follows the standard normal distribution, i.e., Z ~ N(0,1). The particular value of the statistic to test is

$$z_0 = \frac{\dfrac{k_1}{n_1} - \dfrac{k_2}{n_2} - p_0}{s_p}.$$

We could use three different alternate hypotheses for the test. These are:

Two-tailed test: H1 : p1 − p2 ≠ p0
One-tailed tests: H1 : p0 < p1 − p2 , or H1 : p1 − p2 < p0 .

Two-tailed test

If using a two-tailed test we will find the value of $z_{\alpha/2}$ from

$$\Pr[Z > z_{\alpha/2}] = 1 - \Phi(z_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{or} \quad \Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2}.$$

Reject the null hypothesis, H0, if $z_{\alpha/2} < z_0$ or if $z_0 < -z_{\alpha/2}$.

In other words, the rejection region is $R = \{ z_{\alpha/2} < |z_0| \}$, while the acceptance region is $A = \{ |z_0| < z_{\alpha/2} \}$.

One-tailed test

If using a one-tailed test we will find the value of $z_\alpha$ from

$$\Pr[Z > z_\alpha] = 1 - \Phi(z_\alpha) = \alpha, \quad \text{or} \quad \Phi(z_\alpha) = 1 - \alpha.$$

Reject the null hypothesis, H0, if $z_\alpha < z_0$ and H1 : p0 < p1 − p2, or if $z_0 < -z_\alpha$ and H1 : p1 − p2 < p0.

A function for hypothesis testing on two proportions

Function htestprop2 performs hypothesis testing on two proportions, based on measurements that show k1 successful outcomes out of n1 repetitions, and k2 successful outcomes out of n2 repetitions. The null hypothesis is H0:p1-p2=p0.

function [p1,p2,s1,s2,sp,z0,za] = htestprop2(atype,k1,n1,k2,n2,p0,alpha)
//Hypothesis testing on two proportions. Test the null hypothesis
//H0:p1-p2=p0, given k1, k2 successful outcomes out of n1, n2
//repetitions, respectively. Significance level = alpha.
//Variable atype represents the type of alternative hypothesis, i.e.,
//atype = 'one' for one-sided test, atype = 'two' for two-sided test
p1 = k1/n1; p2 = k2/n2;
s1 = sqrt(p1*(1-p1)/n1); s2 = sqrt(p2*(1-p2)/n2);
sp = sqrt(s1^2+s2^2);
z0 = (p1-p2-p0)/sp;
printf('Hypothesis testing on two proportions:'+atype+'-sided test.\n')
if atype == 'one' then
   za = cdfnor('X',0,1,1-alpha,alpha)
   if z0>za then
      printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
      printf('if the alternative hypothesis is H1:p1-p2>%g \n',p0)
   elseif z0<-za then
      printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
      printf('if the alternative hypothesis is H1:p1-p2<%g \n',p0)
   else
      printf('Do not reject the null hypothesis H0:p1-p2=%g \n',p0)
   end
else
   za = cdfnor('X',0,1,1-alpha/2,alpha/2)
   if z0>za | z0<-za then
      printf('Reject the null hypothesis H0:p1-p2=%g \n',p0)
   else
      printf('Do not reject the null hypothesis H0:p1-p2=%g \n',p0)
   end
end

Examples of application of function htestprop2

Example 1. Test the null hypothesis H0:p1-p2 = 0.3 at a significance level α = 0.10 based on the values k1 = 10, n1 = 200, k2 = 45, n2 = 100. (a) Use a one-sided test. (b) Use a two-sided test.

-->getf('htestprop2')

-->//part (a)
-->[p1,p2,s1,s2,sp,z0,za]=htestprop2('one',10,200,45,100,0.3,0.1)
Hypothesis testing on two proportions:one-sided test.
Reject the null hypothesis H0:p1-p2= .3
if the alternative hypothesis is H1:p1-p2< .3
za = 1.2815516
z0 = - 13.44043
sp = .0520817
s2 = .0497494
s1 = .0154110
p2 = .45
p1 = .05

-->//part (b)
-->[p1,p2,s1,s2,sp,z0,za]=htestprop2('two',10,200,45,100,0.3,0.1)
Hypothesis testing on two proportions:two-sided test.
Reject the null hypothesis H0:p1-p2= .3
za = 1.6448536
z0 = - 13.44043
sp = .0520817
s2 = .0497494
s1 = .0154110
p2 = .45
p1 = .05

Characteristic and power equations

Consider the two-tailed test: H0 : µ = µ0 , H1 : µ ≠ µ0 . Suppose that it is correct to reject H0 because the true value of µ is µ0 + c, where c is a constant. The probability β of a Type II error is given by

$$\beta = \Phi\left(z_{\alpha/2} - \frac{c\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{c\sqrt{n}}{\sigma}\right),$$

where Φ(z) is the CDF of the standard normal distribution. Notice that the probability β is a function $\beta = f(\alpha, n, c/\sigma)$. Curves representing β vs. µ are called characteristic curves.

The complement of β is the power function = 1 - β = probability of rejecting the null hypothesis when it is not true.

$$\text{Power} = 1 - \beta = 1 - \left[\Phi\left(z_{\alpha/2} - \frac{c\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{c\sqrt{n}}{\sigma}\right)\right].$$

Characteristic and power curves are shown below for α = 0.05, and c/σ = 0.25, 0.50, 0.875, 1.0.

//Script to plot characteristic and power curves for alpha = 0.05
alpha = 0.05; z = cdfnor('X',0,1,1-alpha/2,alpha/2);
n=[0:1:100]; cs = [0.25,0.50,0.875,1.0];
b=zeros(length(n),length(cs));p=b;
for i = 1:length(n)
   for j = 1:length(cs)
      b(i,j) = cdfnor('PQ',z-cs(j)*sqrt(n(i)),0,1) ...
             -cdfnor('PQ',-z-cs(j)*sqrt(n(i)),0,1);
      p(i,j)=1-b(i,j);
   end;
end;
xset('window',1);minn=min(n);maxn=max(n);minb=min(b);maxb=max(b);
rect1=[minn,minb,maxn,maxb];
plot2d([n',n',n',n'],[b(:,1),b(:,2),b(:,3),b(:,4)],[1,2,3,4],...
'111','c/sigma=0.25@c/sigma=0.50@c/sigma=0.875@c/sigma=1',rect1);
xtitle('Characteristic curves for alpha = 0.05','n','beta');
xset('window',2);minp=min(p);maxp=max(p);rect2=[minn,minp,maxn,maxp];
plot2d([n',n',n',n'],[p(:,1),p(:,2),p(:,3),p(:,4)],[1,2,3,4],...
'111','c/sigma=0.25@c/sigma=0.50@c/sigma=0.875@c/sigma=1',rect2);
xtitle('Power curves for alpha = 0.05','n','power');

Hypothesis testing on one variance

Suppose that a sample of size n is taken out of a population of mean µ and variance $\sigma^2$. The sample yields a mean $\bar{x}$ and variance $s_x^2$. We will use these data to test the null hypothesis, $H_0: \sigma^2 = \sigma_0^2$, at a level of significance α. The test statistic to be used is a chi-square statistic,

$$\chi_0^2 = \frac{(n-1)\,s_x^2}{\sigma_0^2} = \frac{\nu\,s_x^2}{\sigma_0^2},$$

where ν = n − 1 represents the degrees of freedom of a $\chi^2$ distribution.

Let $F_\nu(\chi^2) = \Pr[\mathrm{X}^2 < \chi^2]$ be the CDF corresponding to the chi-square distribution with ν degrees of freedom.

Two-tailed test

In this case, the alternate hypothesis is $H_1: \sigma^2 \neq \sigma_0^2$. We will reject the null hypothesis if $\chi^2_{\alpha/2} < \chi_0^2$, or if $\chi_0^2 < \chi^2_{1-\alpha/2}$, where

$$\Pr[\chi^2_{\alpha/2} < \mathrm{X}^2] = 1 - F_\nu(\chi^2_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{and} \quad \Pr[\chi^2_{1-\alpha/2} < \mathrm{X}^2] = 1 - F_\nu(\chi^2_{1-\alpha/2}) = 1 - \frac{\alpha}{2}.$$

One-tailed test

We consider two possibilities:

(1) If the alternate hypothesis is $H_1: \sigma_0^2 < \sigma^2$, then we will reject the null hypothesis if $\chi^2_\alpha < \chi_0^2$, where

$$\Pr[\chi^2_\alpha < \mathrm{X}^2] = 1 - F_\nu(\chi^2_\alpha) = \alpha.$$

(2) If the alternate hypothesis is $H_1: \sigma^2 < \sigma_0^2$, then we will reject the null hypothesis if $\chi_0^2 < \chi^2_{1-\alpha}$, where

$$\Pr[\chi^2_{1-\alpha} < \mathrm{X}^2] = 1 - F_\nu(\chi^2_{1-\alpha}) = 1 - \alpha.$$

A function for hypothesis testing on one variance

The following function, htestsigma1, can be used to test the null hypothesis $H_0: \sigma^2 = \sigma_0^2$, at the level of significance α.
There are two possible calls to the function:

[n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,var,n)
[n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x)

In the first call the sample variance (s^2) and the sample size (n) are given, besides the type of alternative hypothesis (altype, which could be equal to 'one' or 'two', corresponding to one- or two-sided tests), the level of significance, α, and the value of $\sigma_0^2$. In the second call, instead of providing the sample variance and sample size, the user provides the actual sample as a vector x.

The function returns n and s, the sample's size and standard deviation, as well as the chi-square test parameter, X0 = $\chi_0^2$, and the values Xa and X1a, which represent $\chi^2_\alpha$ and $\chi^2_{1-\alpha}$, respectively, if using a one-sided test, or $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$, respectively, if using a two-sided test. A listing of the function follows:

function [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x,n)
//Hypothesis testing on one variance. Possible function calls:
//
// [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,s2,n)
// [n,s,X0,Xa,X1a] = htestsigma1(altype,alpha,sigma0_2,x)
//
// altype can be 'one' - for one-sided alternative hypothesis,
//            or 'two' - for two-sided alternative hypothesis
// alpha = level of significance (typical values = 0.01,0.05,0.10)
// sigma0_2 = value of population variance being tested,
//            i.e., H0:sigma^2 = sigma0^2
// x = sample variance (s^2) or vector containing sample (x)
// if x = sample variance, n = sample size
// X0 = test statistic
// Xa = X_alpha/2 for altype='two' or X_alpha for altype='one'
// X1a = X_(1-alpha/2) for 'two' or X_(1-alpha) for 'one'
if altype<>'one' & altype<>'two' then
   error('htestsigma1 - select type of alternative hypothesis = one or two');
   abort;
end;
[nargout,nargin] = argn(0)
if nargin == 4 then
   //Fourth argument is the sample itself
   if length(x)<1 then
      error('htestsigma1 - x must be a vector');
      abort;
   else
      n = length(x);
      s = st_deviation(x);
   end
else
   //Fourth argument is the sample variance
   s = sqrt(x);
end;
printf(' \n');
printf('Hypothesis testing on one variance: ' ...
       + altype + '-side alternative hypothesis.\n')
printf(' \n');
X0 = (n-1)*s^2/sigma0_2;
if altype == 'one' then
   Xa = cdfchi('X',n-1,1-alpha,alpha);
   X1a = cdfchi('X',n-1,alpha,1-alpha);
   if X0>Xa then
      printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
      printf('if the alternative hypothesis is H1:sigma^2>%g \n',sigma0_2);
   elseif X0<X1a then
      printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
      printf('if the alternative hypothesis is H1:sigma^2<%g \n',sigma0_2);
   else
      printf('Do not reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
   end
else
   Xa = cdfchi('X',n-1,1-alpha/2,alpha/2);
   X1a = cdfchi('X',n-1,alpha/2,1-alpha/2);
   if X0>Xa | X0<X1a then
      printf('Reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
   else
      printf('Do not reject the null hypothesis H0:sigma^2=%g \n',sigma0_2);
   end
end;

Examples of application of function htestsigma1

Example 1. A sample of size 10 produces a variance of 20. With a significance level of 0.05, test the null hypothesis H0:σ^2 = 25, using a two-sided test.

-->alpha=0.05; sigma0_2 = 25; var = 20; n = 10;
-->[n,s,X0,Xa,X1a]=htestsigma1('two',alpha,sigma0_2,var,n)

Hypothesis testing on one variance: two-side alternative hypothesis.

Do not reject the null hypothesis H0:sigma^2=25
X1a = 2.7003895
Xa = 19.022768
X0 = 7.2
s = 4.472136
n = 10.

Example 2. Given the sample X = [3.5 2.2 1.5 4.2 3.2 1.4 5.6 2.3 4.8], with a significance level of 0.05, test the null hypothesis H0:σ^2 = 25, using a two-sided test.
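The statistic for Example 2 can be checked by hand. The sketch below assumes the usual unbiased sample variance (divisor n − 1) for the nine data values, whose mean is about 3.1889, and then substitutes into the chi-square statistic with ν = n − 1 = 8:

```latex
s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \approx \frac{17.349}{8} \approx 2.1686,
\qquad
\chi_0^2 = \frac{(n-1)\,s_x^2}{\sigma_0^2} = \frac{8 \times 2.1686}{25} \approx 0.6940
```

A value this small falls in the lower tail of the chi-square distribution, so the comparison that matters in the two-sided test is the one against the lower quantile.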
-->alpha=0.05; sigma0_2 = 25;
-->X = [3.5 2.2 1.5 4.2 3.2 1.4 5.6 2.3 4.8]
X =
! 3.5 2.2 1.5 4.2 3.2 1.4 5.6 2.3 4.8 !

-->[n,s,X0,Xa,X1a]=htestsigma1('two',alpha,sigma0_2,X)

Hypothesis testing on one variance: two-side alternative hypothesis.

Reject the null hypothesis H0:sigma^2=25
X1a = 2.1797307
Xa = 17.534546
X0 = .6939556
s = 1.4726205
n = 9.

Hypothesis testing on two variances

If two populations are normal and independent samples of sizes n1 and n2 are drawn from them, then the statistic

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

follows the F distribution with n1-1 degrees of freedom for the numerator and n2-1 degrees of freedom for the denominator.

We can test the null hypothesis, $H_0: \sigma_1^2/\sigma_2^2 = 1$, against the two-tailed alternate hypothesis, $H_1: \sigma_1^2/\sigma_2^2 \neq 1$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of populations 1 and 2, respectively, by taking samples from the two populations and evaluating their variances, $s_1^2$ and $s_2^2$. Let the sizes of samples 1 and 2 be n1 and n2, respectively, and let α be the level of significance.

Let $s_M^2$ and $s_m^2$ be the largest and smallest of the variances $s_1^2$ and $s_2^2$, respectively. Calculate the statistic $F_0 = s_M^2/s_m^2$, and the quantile $F_{\alpha/2}$, with the appropriate degrees of freedom for numerator and denominator. Reject the null hypothesis if $F_0 > F_{\alpha/2}$.

If the alternate hypothesis is $H_1: \sigma_1^2/\sigma_2^2 > 1$, use the statistic $F_0 = s_1^2/s_2^2$, and reject the null hypothesis if $F_0 > F_\alpha$.

If, on the other hand, the alternate hypothesis is $H_1: \sigma_1^2/\sigma_2^2 < 1$, use the statistic $F_0 = s_2^2/s_1^2$, and reject the null hypothesis if $F_0 > F_\alpha$.

A function for hypothesis testing with two variances

The following function, htestsigma2, can be used for hypothesis testing on two variances.
Possible calls to the function are:

   [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2()
   [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2(Xdata)
   [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2(X1data,X2data)

Xdata, X1data, and X2data are row vectors of data. X1Info is a vector that contains the sample size, n1, and the standard deviation, s1, of sample 1. X2Info is a vector containing n2 and s2. The vector nuInfo contains the degrees of freedom for the numerator and denominator, respectively, of the F distribution. F0 is the F statistic used in the test. Fa represents $F_\alpha$ (or $F_{\alpha/2}$ for a two-sided test).

The function operates interactively, requesting information from the user. It reports the type of test statistic and alternative hypothesis used, and recommends whether or not to reject the null hypothesis.

If the function call [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2() is used, the user will be prompted for the summary information on the samples, i.e., the sample sizes and standard deviations.

If the function call [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2(Xdata) is used, the user is asked to identify the vector Xdata as sample 1 or sample 2, and is then prompted for the summary data for the other sample.

Finally, if the function call [X1Info,X2Info,nuInfo,F0,Fa] = htestsigma2(X1data,X2data) is used, the function calculates the summary data for both samples by itself.

The function will also prompt the user for the following information:

• The significance level of the test, α (the prompt calls it the level of confidence)
• The type of alternative hypothesis to be used, i.e., one- or two-sided
• The type of one-sided alternative hypothesis to be tested.

A listing of the function is shown below.
function [X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
[nargout,nargin]=argn(0)
if nargin == 0 then
   //No data vectors: ask for the summary data of both samples
   X1in = input('For sample 1 enter n, s :')
   n1 = X1in(1); s1 = X1in(2);
   X2in = input('For sample 2 enter n, s :')
   n2 = X2in(1); s2 = X2in(2);
elseif nargin == 1 then
   //One data vector: ask which sample it represents
   disp('You entered a vector as input to the function.')
   disp('Do you want this vector to represent sample 1 or 2?')
   idsample = input(' ')
   if idsample == 1 then
      n1 = length(X1); s1 = st_deviation(X1);
      printf('n1 = %g s1 = %g\n',n1,s1)
      X2in = input('For sample 2 enter n, s :')
      n2 = X2in(1); s2 = X2in(2);
   else
      n2 = length(X1); s2 = st_deviation(X1);
      printf('n2 = %g s2 = %g\n',n2,s2)
      X1in = input('For sample 1 enter n, s :')
      n1 = X1in(1); s1 = X1in(2);
   end
else
   //Two data vectors: compute the summary data of both samples
   n1 = length(X1); s1 = st_deviation(X1);
   printf('n1 = %g s1 = %g\n',n1,s1)
   n2 = length(X2); s2 = st_deviation(X2);
   printf('n2 = %g s2 = %g\n',n2,s2)
end
X1Info = [n1,s1]; X2Info = [n2,s2];
disp('Enter the level of confidence, alpha, for the test:')
disp('(Typical values: 0.01, 0.05, 0.10)')
alpha = input(' ');
disp('Enter the type of alternative hypothesis to test:')
disp(' 1 - one-sided 2 - two-sided');
atype = input(' ');
if atype == 1 then
   disp('Enter the type of one-sided alternative hypothesis to test:')
   disp(' 1 - H1:sigma1^2/sigma2^2>1');
   disp(' 2 - H1:sigma1^2/sigma2^2<1');
   onetype = input(' ');
   printf('Hypothesis testing on two variances: one-sided test.\n')
   if onetype == 1 then
      nuInfo = [n1-1,n2-1];
      F0 = (s1/s2)^2;
      Fa = cdff('F',n1-1,n2-1,1-alpha,alpha);
      printf('The alternative hypothesis is H1:sigma1^2/sigma2^2>1\n');
      if F0>Fa then
         printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
      else
         printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
      end
   else
      nuInfo = [n2-1,n1-1];
      //For H1:sigma1^2/sigma2^2<1 the statistic is s2^2/s1^2
      F0 = (s2/s1)^2;
      Fa = cdff('F',n2-1,n1-1,1-alpha,alpha);
      printf('The alternative hypothesis is H1:sigma1^2/sigma2^2<1.\n');
      if F0>Fa then
         printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
      else
         printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.\n');
      end
   end
else
   printf('Hypothesis testing on two variances: two-sided test.\n')
   //The larger variance goes in the numerator
   if s1>=s2 then
      sM = s1; nM = n1; sm = s2; nm = n2;
   else
      sM = s2; nM = n2; sm = s1; nm = n1;
   end;
   nuInfo = [nM-1,nm-1];
   F0 = (sM/sm)^2;
   Fa = cdff('F',nM-1,nm-1,1-alpha/2,alpha/2);
   if F0>Fa then
      printf('Reject the null hypothesis H0:sigma1^2/sigma2^2=1. \n');
   else
      printf('Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1. \n');
   end
end;

Examples of application of function htestsigma2

In the following examples, the user's input is the value shown after each prompt.

Example 1. Given two samples with n1 = 25, s1 = 2.3, n2 = 15, s2 = 3.2, test the null hypothesis $H_0: \sigma_1^2/\sigma_2^2 = 1$, at a significance level of 0.10, against (a) a two-sided hypothesis; (b) a one-sided hypothesis, $H_1: \sigma_1^2/\sigma_2^2 > 1$; and (c) a one-sided hypothesis, $H_1: \sigma_1^2/\sigma_2^2 < 1$.

-->//Part (a) - Two sided hypothesis
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s : 25, 2.3
For sample 2 enter n, s : 15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.10
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 2
Hypothesis testing on two variances: two-sided test.
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 2.1297969
F0 = 1.9357278
nuInfo = ! 14. 24. !
X2Info = ! 15. 3.2 !
X1Info = ! 25. 2.3 !

-->//Part (b) - One-sided hypothesis, H1: sigma1^2/sigma2^2 > 1
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s : 25, 2.3
For sample 2 enter n, s : 15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.10
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 1
Enter the type of one-sided alternative hypothesis to test:
 1 - H1:sigma1^2/sigma2^2>1
 2 - H1:sigma1^2/sigma2^2<1
 1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 1.937663
F0 = .5166016
nuInfo = ! 24. 14. !
X2Info = ! 15. 3.2 !
X1Info = ! 25. 2.3 !

-->//Part (c) - One-sided hypothesis, H1: sigma1^2/sigma2^2 < 1
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2()
For sample 1 enter n, s : 25, 2.3
For sample 2 enter n, s : 15, 3.2
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.10
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 1
Enter the type of one-sided alternative hypothesis to test:
 1 - H1:sigma1^2/sigma2^2>1
 2 - H1:sigma1^2/sigma2^2<1
 2
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2<1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 1.7974154
F0 = .5166016
nuInfo = ! 14. 24. !
X2Info = ! 15. 3.2 !
X1Info = ! 25. 2.3 !

Note that the value of F0 printed in part (c) is $s_1^2/s_2^2 = 0.5166$. The statistic prescribed for this alternative hypothesis is $F_0 = s_2^2/s_1^2 = 1.9357$ (the value found in part (a)). Since 1.9357 > Fa = 1.7974, the null hypothesis is rejected in favor of $H_1: \sigma_1^2/\sigma_2^2 < 1$ at the 0.10 level.

Example 2. Given the sample X1 = [3.2, 2.1, 4.5, 6.2, 3.4], and a second sample with n2 = 10, s2 = 0.5, test the null hypothesis $H_0: \sigma_1^2/\sigma_2^2 = 1$, at a significance level of 0.05, against (a) a two-sided hypothesis; (b) a one-sided hypothesis, $H_1: \sigma_1^2/\sigma_2^2 > 1$.

-->// Example 2 - part (a)
-->X1 = [3.2, 2.1, 4.5, 6.2, 3.4]
X1 =
! 3.2 2.1 4.5 6.2 3.4 !
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
 1
n1 = 5 s1 = 1.55145
For sample 2 enter n, s : 10, 0.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.05
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 2
Hypothesis testing on two variances: two-sided test.
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 4.7180785
F0 = 9.628
nuInfo = ! 4. 9. !
X2Info = ! 10. .5 !
X1Info = ! 5. 1.5514509 !

-->// Example 2 - part (b)
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
 1
n1 = 5 s1 = 1.55145
For sample 2 enter n, s : 10, 0.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.05
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 1
Enter the type of one-sided alternative hypothesis to test:
 1 - H1:sigma1^2/sigma2^2>1
 2 - H1:sigma1^2/sigma2^2<1
 1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 3.6330885
F0 = 9.628
nuInfo = ! 4. 9. !
X2Info = ! 10. .5 !
X1Info = ! 5. 1.5514509 !

Example 3. Given the sample X2 = [0.9,11.1,0.2,3.4,5.6,2.1,8.2,3.2] and sample 1 with n1 = 22 and s1 = 1.5, test the null hypothesis $H_0: \sigma_1^2/\sigma_2^2 = 1$, at a significance level of 0.01, against (a) a two-sided hypothesis; (b) a one-sided hypothesis, $H_1: \sigma_1^2/\sigma_2^2 > 1$.

-->X2 = [0.9,11.1,0.2,3.4,5.6,2.1,8.2,3.2]
X2 =
! .9 11.1 .2 3.4 5.6 2.1 8.2 3.2 !
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
 2
n2 = 8 s2 = 3.7485
For sample 1 enter n, s : 22, 1.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.01
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 2
Hypothesis testing on two variances: two-sided test.
Reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 4.1789302
F0 = 6.245
nuInfo = ! 7. 21. !
X2Info = ! 8. 3.7484997 !
X1Info = ! 22. 1.5 !

-->// Example 3 - part (b)
-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X2)
You entered a vector as input to the function.
Do you want this vector to represent sample 1 or 2?
 2
n2 = 8 s2 = 3.7485
For sample 1 enter n, s : 22, 1.5
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.01
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 1
Enter the type of one-sided alternative hypothesis to test:
 1 - H1:sigma1^2/sigma2^2>1
 2 - H1:sigma1^2/sigma2^2<1
 1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 6.1323795
F0 = .1601281
nuInfo = ! 21. 7. !
X2Info = ! 8. 3.7484997 !
X1Info = ! 22. 1.5 !

Example 4. Using the samples X1 and X2, defined in examples 2 and 3, respectively, test the null hypothesis $H_0: \sigma_1^2/\sigma_2^2 = 1$, at a significance level of 0.10, against (a) a two-sided hypothesis; (b) a one-sided hypothesis, $H_1: \sigma_1^2/\sigma_2^2 > 1$.

-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
n1 = 5 s1 = 1.55145
n2 = 8 s2 = 3.7485
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.10
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 2
Hypothesis testing on two variances: two-sided test.
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 6.0942109
F0 = 5.837661
nuInfo = ! 8. 5. !
X2Info = ! 8. 3.7484997 !
X1Info = ! 5. 1.5514509 !

In this run nuInfo lists the sizes of the samples with the larger and the smaller variance, 8 and 5. The corresponding degrees of freedom, used to obtain Fa, are 7 and 4.

-->[X1Info,X2Info,nuInfo,F0,Fa]=htestsigma2(X1,X2)
n1 = 5 s1 = 1.55145
n2 = 8 s2 = 3.7485
Enter the level of confidence, alpha, for the test:
(Typical values: 0.01, 0.05, 0.10)
 0.10
Enter the type of alternative hypothesis to test:
 1 - one-sided 2 - two-sided
 1
Enter the type of one-sided alternative hypothesis to test:
 1 - H1:sigma1^2/sigma2^2>1
 2 - H1:sigma1^2/sigma2^2<1
 1
Hypothesis testing on two variances: one-sided test.
The alternative hypothesis is H1:sigma1^2/sigma2^2>1
Do not reject the null hypothesis H0:sigma1^2/sigma2^2=1.
Fa = 2.9605341
F0 = .1713015
nuInfo = ! 4. 7. !
X2Info = ! 8. 3.7484997 !
X1Info = ! 5. 1.5514509 !

Chi-square criteria for goodness of fitting

Function histnorm, introduced in Chapter 15, produces a histogram for a given data set, x, based on a number of class boundaries, xclass. The function, whose call is

   [chi2,cm,f] = histnorm(x,xclass)

returns vectors of class marks, cm, and frequency, f, as well as the parameter chi2 corresponding to a chi-square statistic calculated as

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - fc_i)^2}{fc_i},$$

where $f_i$ is the actual frequency count for the i-th class, $fc_i$ is the estimated frequency count obtained from the normal distribution for the i-th class, and k is the number of classes in the frequency distribution.

The parameter $\chi^2$, defined above, follows the chi-square distribution with ν = k-3 degrees of freedom, where k is the number of classes in the histogram. To fit the normal distribution to a sample we use not only the sample size, n, but also the mean value, $\bar{x}$, and the sample standard deviation, s. Thus, the number of degrees of freedom is k-3, since three quantities obtained from the data are already used in the fitting.
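The calculation of the chi-square statistic just described can be sketched as follows. This is an illustrative Python cross-check, not the author's histnorm: the helper names are hypothetical, the data are made up, and the expected counts are assumed to be simply n times the normal probability of each class (histnorm may scale them differently, for example to account for data outside the class range).

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the normal distribution N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_normal_counts(xclass, n, mu, sigma):
    """Expected count in each class [xclass[i], xclass[i+1]) for n observations."""
    cdf = [normal_cdf(b, mu, sigma) for b in xclass]
    return [n * (cdf[i + 1] - cdf[i]) for i in range(len(xclass) - 1)]


def chi_square_statistic(observed, expected):
    """Sum of (f_i - fc_i)^2 / fc_i over all classes."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(observed, expected))


# Made-up illustration: 100 observations, four classes, fitted mean 0 and s = 1
xclass = [-2.0, -1.0, 0.0, 1.0, 2.0]
observed = [12, 36, 35, 14]
expected = expected_normal_counts(xclass, 100, 0.0, 1.0)
chi2 = chi_square_statistic(observed, expected)
nu = len(observed) - 3  # k - 3 degrees of freedom, as in the text
print(expected, chi2, nu)
```

The value of chi2 would then be compared with the chi-square quantile for nu degrees of freedom, obtained from a table or from SCILAB's cdfchi.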
Testing goodness of fit for the normal distribution, for example, means testing the hypothesis H0: {the data fits the normal distribution with $\mu = \bar{x}$ and $\sigma = s$} against the alternative hypothesis H1: {the data does not fit the normal distribution with $\mu = \bar{x}$ and $\sigma = s$}. The latter is a form of one-sided alternative hypothesis. Given a significance level α, we calculate the parameter $\chi^2$ based on the observed and predicted frequencies, and compare its value against the parameter $\chi^2_\alpha$ obtained from the chi-square distribution with k-3 degrees of freedom. If $\chi^2 > \chi^2_\alpha$, we reject the null hypothesis H0.

Examples of goodness-of-fitting for the normal distribution

Example 1. Consider the sample x loaded below into SCILAB. Its frequency distribution is to be obtained using the class boundaries indicated in vector xclass. The following SCILAB commands are used to generate the test statistic for testing the hypothesis H0: {the data fits the normal distribution with $\mu = \bar{x}$ and $\sigma = s$}. The variable alpha is not set in this session; the critical value shown corresponds to alpha = 0.10.

-->x=[2.3,3.2,1.1,4.5,6.2,8.4,1.3,2.2,4.5,3.6,2.2,1.0];
-->min(x),max(x)
ans = 1.
ans = 8.4
-->xclass = [1:0.5:9]; k = length(xclass)-1, nu = k-3
k = 16.
nu = 13.
-->[chi2,xmark,f] = histnorm(x,xclass);
-->chi2
chi2 = 33.263408
-->chi_a = cdfchi('X',nu,1-alpha,alpha)
chi_a = 19.811929

The results are $\chi^2$ = 33.263408, $\chi^2_\alpha$ = 19.811929. Because $\chi^2 > \chi^2_\alpha$, we reject the null hypothesis that the data in vector x belongs to a normal distribution. The histogram is shown in the following figure.

Example 2. In this second example the data analyzed is generated from a normal distribution using function grand. The null hypothesis H0: {the data fits the normal distribution with $\mu = \bar{x}$ and $\sigma = s$} is tested at a level of significance of 0.01.

-->X=grand(1,200,'nor',350,100);
-->min(X),max(X)
ans = 119.33031
ans = 655.10351
-->Xclass=[100:50:700];k=length(Xclass)-1,nu=k-3
k = 12.
nu = 9.
-->[chi2,Xmark,Xfreq] = histnorm(X,Xclass);
-->chi2
chi2 = 10.142254
-->alpha = 0.01;
-->chi_a = cdfchi('X',nu,1-alpha,alpha)
chi_a = 21.665994

The results are $\chi^2$ = 10.142254, $\chi^2_\alpha$ = 21.665994. Because $\chi^2 < \chi^2_\alpha$, we cannot reject the null hypothesis that the data in vector X belongs to a normal distribution. The histogram is shown in the following figure.

Examples of goodness-of-fitting for the beta distribution

The approach followed in function histnorm for checking the goodness-of-fit of a sample to the normal distribution can be used, in general, for other probability distributions. The key is to determine the parameters of the distribution based on statistics of the data. For the normal distribution, X ~ N(µ,σ), for example, we use the parameters $\mu = \bar{x}$ and $\sigma = s_x$. In this section we present a function, histbeta, that can be used to check the goodness-of-fit of data sets to the beta distribution.

The beta distribution, introduced in Chapter 15, requires two parameters, α and β, which can be obtained from a sample by making $\mu = \bar{x}$ and $\sigma = s_x$ and solving the following two equations:

$$\mu_X = \bar{x} = \frac{\alpha}{\alpha+\beta}, \qquad \sigma^2 = s_x^2 = \frac{\alpha\cdot\beta}{(\alpha+\beta+1)(\alpha+\beta)^2}.$$

The solution can be accomplished numerically by using SCILAB's function fsolve. The listing of function histbeta, incorporating such numerical solution, follows. Notice that the function returns not only the parameter $\chi^2$ (chi2), the class marks (cmark), and the frequency count (fcount), but also the parameters of the beta distribution α (a) and β (b).

function [a,b,chi2,cmark,fcount]=histbeta(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass.
//It also produces a histogram of the
//data and the beta distribution that best fit the data
//Note: the beta distribution works only for data between 0 and 1
//
//Typical call: [a,b,chi2,cm,f] = histbeta(x,xclass)
//where cm = class marks, f = frequency count,
//      chi2 = chi-square parameter for the fitting,
//      a, b = fitted parameters of the beta distribution

if min(x)<0 | max(x)>1 then
   error('histbeta - sample contains data outside of [0,1]');
   abort;
end;

[m n] = size(x);          //Sample size
[m nB] = size(xclass);    //Number of class boundaries
k = nB - 1;               //Number of classes

//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
   cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end

//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;

//Accumulate frequency counts
//Note: a value equal to the last boundary xclass(nB) is not counted in any class
for ii = 1:n
   if x(ii) < xclass(1) then
      fbelow = fbelow + 1;
   elseif x(ii) > xclass(nB) then
      fabove = fabove + 1;
   else
      for jj = 1:k
         if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1) then
            fcount(jj) = fcount(jj) +1;
         end
      end
   end
end

//Calculate sample size, mean, standard deviation, and
//minimum and maximum values for the plot
nn = sum(fcount);
xbar = mean(x);
sx = st_deviation(x);
xmin = min(xclass);
xmax = max(xclass);

//Calculate values of a (alpha) and b (beta)
deff('[w]=ff(xx)',['ff1=xbar-xx(1)/(xx(1)+xx(2))';...
   'ff2=sx^2-xx(1)*xx(2)/((xx(1)+xx(2)+1)*(xx(1)+xx(2))^2)';...
   'w=[ff1;ff2]']);
xx0 = [1;1]; xxs=fsolve(xx0,ff);
a = xxs(1); b = xxs(2);

//Calculate predicted frequencies
pk = [];
for j = 1:k+1
   pk = [pk cdfbet("PQ",xclass(j),1-xclass(j),a,b)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;

//Calculate chi-square parameter
chi2=0;
for j = 1:length(fc)
   chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;

//Produce beta distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
   pkk = [pkk cdfbet("PQ",xx(j),1-xx(j),a,b)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;

//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount);
ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];

//plot the histogram and beta curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with beta distribution','x','frequency');
//end function histbeta

For this function we would be testing the hypothesis H0: {the data fits the beta distribution} against the alternative hypothesis H1: {the data does not fit the beta distribution}. As with the test for the normal distribution, given a significance level α, we calculate the parameter $\chi^2$ based on the observed and predicted frequencies, and compare its value against the parameter $\chi^2_\alpha$ obtained from the chi-square distribution with k-3 degrees of freedom. If $\chi^2 > \chi^2_\alpha$, we reject the null hypothesis H0.

Please notice that the beta distribution is used only for data whose values are between 0 and 1.

Example 1. Data from a uniform distribution

-->rand('info')
ans = uniform
-->X=rand(1,25);min(X),max(X)
ans = .0002211
ans = .9329616
-->Xclass=[0:0.1:1];k=length(Xclass)-1, nu = k - 3
k = 10.
nu = 7.
-->[a,b,chi2,Xmark,Xfreq]=histbeta(X,Xclass)
Xfreq = ! 2. 1. 5. 4. 0. 2. 5. 2. 3. 1. !
Xmark =
column 1 to 8
! .05 .15 .25 .35 .45 .55 .65 .75 !
column 9 to 10
! .85 .95 !
chi2 = 9.6996796
b = 1.127886
a = 1.0598387
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 18.475307
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 14.06714
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 12.017037

At significance levels of α = 0.01, 0.05, and 0.10, we cannot reject the null hypothesis that the data fits a beta distribution.

Example 2. Data generated from a normal distribution. Data X is generated from a normal distribution. Data Y is obtained from X by rescaling so that the values of Y are between 0 and 1.

-->rand('normal')
-->X = rand(1,100);min(X), max(X)
ans = - 2.0552251
ans = 1.9347752
-->Y = (X-min(X))/(max(X)-min(X));min(Y),max(Y)
ans = 0.
ans = 1.
-->Yclass = [0:0.1:1];k = length(Yclass)-1, nu = k - 3
k = 10.
nu = 7.
-->[a,b,chi2,Ymark,Yfreq]=histbeta(Y,Yclass)
Yfreq = ! 1. 4. 12. 10. 21. 17. 12. 10. 6. 6. !
Ymark =
column 1 to 8
! .05 .15 .25 .35 .45 .55 .65 .75 !
column 9 to 10
! .85 .95 !
chi2 = 8.9939089
b = 1.9850001
a = 2.2248757
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 18.475307
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 14.06714
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 12.017037

As in Example 1, at significance levels of α = 0.01, 0.05, and 0.10, we cannot reject the null hypothesis that the data fits a beta distribution.

Example 3. Data generated from a beta distribution with α = 0.5 and β = 6.

-->X = grand(1,50,'bet',0.5,6);min(X),max(X)
ans = .0001030
ans = .3289092
-->Xclass = [0:0.05:0.35]; k = length(Xclass)-1, nu = k-3
k = 6.
nu = 3.
-->[a,b,chi2,Xmark,Xfreq]=histbeta(X,Xclass)
Xfreq = ! 25. 10. 5. 3. 3. 3. !
Xmark = ! .025 .075 .125 .175 .225 .275 !
chi2 = 3.4613321
b = 8.0386858
a = .6906572

The output shows k = 6 classes with the last class mark at 0.275, so the last class boundary actually used is 0.30. The largest observation, 0.3289, therefore falls outside the classes, and the frequencies in Xfreq add up to 49 rather than 50.

-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 11.344867
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 7.8147279
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 6.2513886

As expected, at significance levels of α = 0.01, 0.05, and 0.10, we cannot reject the null hypothesis that the data fits a beta distribution.

The three examples above have produced goodness-of-fit results that indicate that we should not reject the hypothesis that the data belongs to a beta distribution. These results indicate the versatility of the distribution and the variety of shapes it can fit by using the values of α and β obtained from the set of equations

$$\mu_X = \bar{x} = \frac{\alpha}{\alpha+\beta}, \qquad \sigma^2 = s_x^2 = \frac{\alpha\cdot\beta}{(\alpha+\beta+1)(\alpha+\beta)^2}.$$

On the other hand, if we force the values of α and β, rather than using those from the two equations above, we may find situations where the hypothesis of the data fitting the required beta distribution must be rejected. To try such cases we modify function histbeta to create function histbeta1, which requires that the values of α and β be given by the user:

function [chi2,cmark,fcount]=histbeta1(x,xclass,a,b)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass.
//It also produces a histogram of the
//data and the beta distribution with the given parameters a and b
//Note: the beta distribution works only for data between 0 and 1
//
//Typical call: [chi2,cm,f] = histbeta1(x,xclass,a,b)
//where cm = class marks, f = frequency count,
//      chi2 = chi-square parameter for the fitting

if min(x)<0 | max(x)>1 then
   error('histbeta1 - sample contains data outside of [0,1]');
   abort;
end;

[m n] = size(x);          //Sample size
[m nB] = size(xclass);    //Number of class boundaries
k = nB - 1;               //Number of classes

//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
   cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end

//Initialize frequency counts to zero
fcount=zeros(1,k);
fbelow=0; fabove=0;

//Accumulate frequency counts
for ii = 1:n
   if x(ii) < xclass(1) then
      fbelow = fbelow + 1;
   elseif x(ii) > xclass(nB) then
      fabove = fabove + 1;
   else
      for jj = 1:k
         if x(ii)>= xclass(jj) & x(ii)< xclass(jj+1) then
            fcount(jj) = fcount(jj) +1;
         end
      end
   end
end

//Calculate sample size, mean, standard deviation, and
//minimum and maximum values for the plot
nn = sum(fcount);
xbar = mean(x);
sx = st_deviation(x);
xmin = min(xclass);
xmax = max(xclass);

//Calculate predicted frequencies
pk = [];
for j = 1:k+1
   pk = [pk cdfbet("PQ",xclass(j),1-xclass(j),a,b)];
end;
p_in_classes = pk(k+1)-pk(1);
pxclass = pk(2:k+1) - pk(1:k);
fc = pxclass*nn*p_in_classes;

//Calculate chi-square parameter
chi2=0;
for j = 1:length(fc)
   chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;

//Produce beta distribution for data
Dx = (xmax-xmin)/100;
xx = [xmin:Dx:xmax];
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
   pkk = [pkk cdfbet("PQ",xx(j),1-xx(j),a,b)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*p_in_classes*nn*100/k;

//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount);
ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];

//plot the histogram and beta curve
xp = xclass(1:k);
xset('window',1);xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',[xmin ymin xmax ymax]);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with beta distribution','x','frequency');
//end function histbeta1

Applying function histbeta1 to the data of Example 3 shown above, with α = 1 and β = 5, produces the following results:

-->[chi2,Xmark,Xfreq]=histbeta1(X,Xclass,1,5)
Xfreq = ! 25. 10. 5. 3. 3. 3. !
Xmark = ! .025 .075 .125 .175 .225 .275 !
chi2 = 28.784445
-->alpha=0.01;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 11.344867
-->alpha=0.05;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 7.8147279
-->alpha=0.1;chi_a=cdfchi('X',nu,1-alpha,alpha)
chi_a = 6.2513886

We reject the null hypothesis that the data follows the beta distribution with α = 1 and β = 5 at significance levels α = 0.01, 0.05, and 0.10. [Note: be careful to distinguish the significance level α from the beta distribution parameter α.] A plot of the sample histogram and the fitted data is shown below.

Chi-square criteria for R×C tables

The term R×C table refers to a table summarizing frequency counts of observations classified according to two different criteria. For example, the following table summarizes the frequency counts of slope damages due to rain recorded by a geotechnical engineer. The slope damages are classified according to the depth of erosion into three categories (0-2.5 cm, 2.5-5.0 cm, > 5 cm) and according to the percentage of vegetation cover into four categories (0-25%, 25%-50%, 50%-75%, and 75%-100%).
                              vegetation cover
erosion depth        0-25%   25%-50%   50%-75%   75%-100%   Totals
0-2.5 cm               17       23        15         6         61
2.5 cm - 5.0 cm        24       13        40        10         87
> 5.0 cm               20       10        20         5         55
Totals                 61       46        75        21        203

If the vegetation cover and erosion depth criteria are independent, the expected frequency counts for each of the cells in the table can be calculated by multiplying the corresponding row total times the corresponding column total and dividing by the overall total (203, in this case). For example, the expected frequency count for erosion depth of 2.5 cm - 5.0 cm and vegetation cover of 50%-75% will be 87×75/203 = 32.14. This procedure follows from the calculation of probabilities for independent events, i.e., if A = {erosion depth of 2.5 cm - 5.0 cm} and B = {vegetation cover of 50%-75%}, then P(A) = 87/203, P(B) = 75/203, and P(A∩B) = P(A)P(B) = (87/203)(75/203) = 87×75/203². Since a probability represents a relative frequency, the expected frequency count will be the probability multiplied by the total number of occurrences (i.e., 203) to produce (87×75/203²)×203 = 87×75/203 = 32.14.

The chi-square criteria can be used to determine how well the predicted frequency counts $fc_{ij}$ approximate the measured frequency counts $f_{ij}$. The chi-square statistic to be used is

$$\chi^2 = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{(f_{ij} - fc_{ij})^2}{fc_{ij}},$$

for i = 1,2,…,n (rows in the table) and j = 1,2,…,m (columns in the table). The parameter thus defined will follow the chi-square distribution with ν = (n-1)⋅(m-1) degrees of freedom.

The following function, RC, calculates the predicted frequency counts, the chi-square statistic, and the degrees of freedom, and provides a recommendation regarding the rejection or non-rejection of the null hypothesis H0: {criteria for the R×C table are independent}. A listing of the function follows:

function [nu,chi_a,chi2,fPred] = RC(fObs,alpha)
//Determines the chi-square statistic for an RxC table
//passed on to the function as a matrix fObs. The
//function calculates the predicted frequency counts also.
TR = sum(fObs,'c');     //Row totals
TC = sum(fObs,'r');     //Column totals
TT = sum(fObs);         //Overall total
[n m] = size(fObs);
fPred = zeros(fObs);
chi2 = 0.0;
for i = 1:n
   for j = 1:m
      fPred(i,j) = TR(i)*TC(j)/TT;
      chi2 = chi2 + (fObs(i,j)-fPred(i,j))^2/fPred(i,j);
   end;
end;
nu = (n-1)*(m-1);
chi_a=cdfchi('X',nu,1-alpha,alpha);
if chi2 > chi_a then
   printf('Reject the null hypothesis H0:independent criteria.\n')
else
   printf('Do not reject the null hypothesis H0:independent criteria\n')
end;

As an example, we will use the R×C table presented earlier to check the hypothesis of independence of the classification criteria:

-->fObs = [17,23,15,6;24,13,40,10;20,10,20,5]
fObs =
! 17. 23. 15. 6. !
! 24. 13. 40. 10. !
! 20. 10. 20. 5. !
-->[nu,chi_a,chi2,fPred]=RC(fObs,0.1)
Reject the null hypothesis H0:independent criteria
fPred =
! 18.330049 13.82266 22.536946 6.3103448 !
! 26.142857 19.714286 32.142857 9. !
! 16.527094 12.463054 20.320197 5.6896552 !
chi2 = 14.524802
chi_a = 10.644641
nu = 6.

Exercises

[1]. A sample of 150 electric components is tested for temperature response by measuring the temperature of the component, X, after 10 minutes of operation. The mean value of the temperature for the sample is found to be $\bar{x}$ = 86 °F with a standard deviation of $s_x$ = 5.5 °F. The records from the factory indicate that the standard deviation of the 10-minute temperature measurements for the entire population of electric components is σ = 10 °F. Obtain confidence intervals for the mean value of the 10-minute temperature measurement for the population of electric components, µ, using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[2]. A sample of 10 electric bulbs is used to determine the number of on-off cycles to produce failure of the filament. The data shows a mean value of 1500 cycles with a standard deviation of 50 cycles.
Obtain confidence intervals for the mean number of on-off cycles to failure for the population of electric bulbs, µ, using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[3]. Records of traffic accidents on the main road through a small town show that in the last 300 days there have been 10 days on which a major closure of the road was registered. Obtain confidence intervals for the proportion of days of closure of the road using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[4]. Measurements of the specific density in 10 soil samples show values of 1.25, 1.30, 1.45, 1.55, 1.20, 1.23, 1.90, 1.40, 1.35, 1.40. Obtain confidence intervals for the mean value of the specific density using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[5]. In order to determine the need to keep a service station open after the regular closing hour of 5:00 pm, a test is carried out in which the station is kept open for an extra hour for 20 business days. The numbers of clients visiting the service station after 5:00 pm during those 20 days are the following:

   2 3 5 6 3 2 1 0 3 4 2 5 7 8 7 6 5 8 2 3

Obtain confidence intervals for the mean value of the number of clients visiting the service station after 5:00 pm using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[6]. In reference to problem [5], a successful day is one in which 4 or more clients show up at the service station after 5:00 pm. Obtain confidence intervals for the proportion of successful days using (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[7]. A routine construction project consists of two stages.
If T1 represents the time required to complete stage 1 and T2 the time required to complete stage 2, determine the confidence interval for the total construction time if, for samples of sizes n1 = 10 and n2 = 8, the mean completion times are t̄1 = 25 days and t̄2 = 40 days, with standard deviations s1 = 3 days and s2 = 5 days. Use levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[8]. To put together a mechanical component, a factory needs to produce two separate metallic pieces that are then assembled together. The time required to finish the first piece has a mean value of t̄1 = 12 minutes, while the second piece requires an average of t̄2 = 14.0 minutes. The corresponding standard deviations are s1 = 3 minutes and s2 = 5 minutes. The sample sizes used in the measurements are n1 = 50 and n2 = 45. Of interest for optimizing the operation of the factory is the difference between the times of completion of the two pieces, i.e., T1 - T2. Obtain confidence intervals for the time difference using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[9]. Obtain confidence intervals for the variance σ² for the data of problem [1] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[10]. Obtain confidence intervals for the variance σ² for the data of problem [2] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[11]. Obtain confidence intervals for the variance σ² for the data of problem [3] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[12]. Obtain confidence intervals for the variance σ² for the data of problem [4] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[13]. Obtain confidence intervals for the variance σ² for the data of problem [5] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[14].
Obtain confidence intervals for the variance σ² for the data of problem [6] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[15]. Obtain confidence intervals for the variance σ² for each of the standard deviations in problem [7] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[16]. Obtain confidence intervals for the variance σ² for each of the standard deviations in problem [8] using levels of confidence of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[17]. A sample of 20 measurements of the density of a liquid indicates a mean value of x̄ = 3.25 mg/l with a standard deviation of sx = 0.25 mg/l. The manufacturer of the liquid claims that the mean density of the population is µ = 3.30 mg/l. Should the manufacturer's claim be rejected at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α = 0.10? {Note: the alternative hypothesis is µ ≠ 3.30 mg/l}.

[18]. A more detailed study for the liquid density of problem [17] includes measurements in 300 liquid samples. This study reveals that the mean value of the 300 samples is x̄ = 3.35 mg/l with a standard deviation of sx = 0.15 mg/l. Should the manufacturer's claim be rejected at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α = 0.10, based on the new evidence?

[19]. Tests of a new pavement are conducted by measuring the time required for different cars to stop after reaching speeds of 35 mph. The following stop times are recorded for 12 car tests:

12.5 s, 24.3 s, 18.7 s, 15.6 s, 18.2 s, 12.4 s, 23.2 s, 40.3 s, 18.2 s, 19.3 s, 15.4 s, 14.4 s

Test the null hypothesis that the mean stop time is 15.5 s against the alternative hypothesis that the actual mean stop time is larger than 15.5 s at levels of confidence of (a) α = 0.01, (b) α = 0.05, (c) α = 0.10.

[20].
Twenty specimens of light-weight concrete taken from a concrete manufacturer indicate that, in general, the standard deviation of the concrete density is 150 kg/m³. A sample of 20 concrete cubes shows a mean value of 1200 kg/m³ with a standard deviation of 100 kg/m³. Test the null hypothesis that the mean value of the population of concrete densities is 1250 kg/m³ against the alternative hypothesis that the mean value of the population of concrete densities is less than 1250 kg/m³.

[21]. A remote sensing device that measures the potential evapotranspiration of crops is purported to produce accurate results 90% of the time. To check this claim, 10 crop test sites are instrumented in the ground for measuring evapotranspiration, and overflights with the remote sensing device are scheduled to compare measurements. It is found that the remote sensing device produced accurate measurements, as compared with the ground-based measurements, in 7 out of the 10 measurements performed. Should we reject the assertion that the remote sensing device is accurate 90% of the time based on these data, against an alternative hypothesis that the device is accurate less than 90% of the time? Use confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[22]. A monitoring site in a small stream is checked daily to verify that the levels of a certain contaminant produced by a farm operation are kept below the allowed limit. The local regulating agency records indicate that the levels of the contaminant were in violation of regulations for 5 out of the last 35 days. Test the hypothesis that the farm operation violates the regulations only 10% of the time against the alternative hypothesis that the farm operation violates the regulations more than 10% of the time. Use confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[23]. A new regulation regarding the amount of ozone produced by cars is being considered.
A sample of 40 cars is picked at random for testing, and it is found that 12 of those cars produce larger ozone concentrations than is considered safe by the local regulating agency. The agency hypothesizes that 20% of the cars currently on the road produce excessive amounts of ozone. Based on the data described above, should we accept this hypothesis against the alternative hypothesis that the proportion of cars in violation of the ozone levels is not 20%? Use confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[24]. Traffic studies are performed at two intersections of a city to determine whether a new turn-signal is necessary. The study is aimed at determining the average number of left turns at each intersection during a selected period of 1 hour. Intersection 1 is monitored through 20 consecutive days for 1 hour, showing an average of 12.6 cars turning left with a standard deviation of 3.2 cars. Intersection 2 is monitored through 10 consecutive days at the same 1-hour period, and it shows an average of 10.8 cars turning left with a standard deviation of 2.5 cars. It is hypothesized that the difference in the mean values of the populations of cars turning left at intersections 1 and 2 is 2, i.e., H0: µ1 - µ2 = 2. Test this null hypothesis against the alternative hypothesis H1: µ1 - µ2 ≠ 2, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the variances of the populations of left-turning cars in each intersection are unknown and unequal.

[25]. Precision parts are delivered from two factories that use the same type of machine for manufacturing the parts. A sample of 100 parts from factory number 1 shows an average length of 12.5 cm with a standard deviation of 0.25 cm, while a sample of 50 parts from factory number 2 shows an average length of 11.9 cm with a standard deviation of 0.30 cm.
Test the null hypothesis H0: µ1 - µ2 = 0 against the alternative hypothesis H1: µ1 - µ2 > 0, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the standard deviations of the populations are unknown but equal.

[26]. Measurements of the chlorine levels out of 11 sample bottles from site number 1 indicate an average of 2.5 mg/l with a standard deviation of 0.2 mg/l. In a neighboring site (site number 2) the following values of chlorine concentrations are measured (in mg/l):

1.2 4.3 2.5 3.2 1.7 2.8 4.5 6.3 7.2

Test the null hypothesis H0: µ1 - µ2 = 0 against the alternative hypothesis H1: µ1 - µ2 < 0, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the variances of the population of chlorine concentrations are unknown and unequal.

[27]. The manager of a service station is trying to estimate the average time of service for a particular task. He monitors the service time at two specific periods during the day, obtaining the following data (in minutes):

• For period number 1: 5.0 12.5 15.0 8.5 6.2 7.8 11.4 12.5 10.0 9.2 8.7 11.2
• For period number 2: 8.2 7.5 6.7 9.0 11.2 14.3 8.7 6.3 9.2

Test the null hypothesis H0: µ1 - µ2 = 0 against the alternative hypothesis H1: µ1 - µ2 < 0, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10. Assume that the variances of the populations of service times are unknown but equal.

[28]. The traffic study of problem [24] is repeated with the purpose of determining the proportion of cars through each intersection that perform a left turn during the selected period. A sample from intersection 1 indicates that out of 450 cars counted, 60 made a left turn during the period of study. On the other hand, at intersection 2, it was determined that 50 out of 300 cars made a left turn. Let p1 and p2 represent the proportions of the population of cars making left turns at intersections 1 and 2, respectively.
Test the null hypothesis H0: p1 - p2 = 0.2 against the alternative hypothesis H1: p1 - p2 ≠ 0.2, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[29]. A computer manufacturer is testing the proportion of defective chips received from two different factories. A sample of 1000 chips from factory number 1 shows a total of 25 defective chips, while a sample of 300 chips from factory number 2 shows a total of 10 defective chips. Let p1 and p2 represent the proportions of the population of defective chips from factories 1 and 2, respectively. Test the null hypothesis H0: p1 - p2 = 0 against the alternative hypothesis H1: p1 - p2 ≠ 0, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[30]. Plot the characteristic and power curves for the hypothesis tests of problems [13] through [23] using a suitable range of population mean values µ and a confidence level of 0.05.

[31]. The standard deviation of a sample of 25 measurements of soil density is found to be 25 kg/m³. Let σ² be the variance of the population of soil specimens from which the sample was taken. Test the null hypothesis H0: σ² = 30 kg/m³ against the alternative hypothesis H1: σ² ≠ 30 kg/m³, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[32]. The standard deviation of a sample of 25 measurements of soil density is found to be 25 kg/m³. Let σ² be the variance of the population of soil specimens from which the sample was taken. Test the null hypothesis H0: σ² = 900 kg²/m⁶ against the alternative hypothesis H1: σ² ≠ 900 kg²/m⁶, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[33]. Repeat the hypothesis test of problem [32] if the alternative hypothesis is H1: σ² < 900 kg²/m⁶.

[35].
The following data set represents measurements of hydrocarbon concentration (mg/l) out of specimens taken from wells in a contaminated site:

3.5 5.6 2.3 4.5 8.5 2.3 4.5 1.2 5.6 3.2

Let σ² be the variance of the population of water specimens from which the sample was taken. Test the null hypothesis H0: σ² = 30 kg/m³ against the alternative hypothesis H1: σ² ≠ 30 kg/m³, using confidence levels of (a) α = 0.01, (b) α = 0.05, and (c) α = 0.10.

[36]. Two samples of car speeds are taken at a selected site of a major highway. The first sample, consisting of 40 measurements, shows a standard deviation of 10 mph, while the second sample, consisting of 20 measurements, shows a standard deviation of 5 mph. Test the null hypothesis H0: σ1²/σ2² = 1 against the alternative hypothesis (a) H1: σ1²/σ2² < 1, (b) H1: σ1²/σ2² > 1, and (c) H1: σ1²/σ2² ≠ 1, for a confidence level of α = 0.01.

[37]. Laboratory tests are performed on a batch of water specimens to detect the concentration of coliforms, producing the following results (mg/l):

0.10 0.20 0.35 0.15 0.25 0.05 0.23 0.35 0.42

Refer to this data set as sample 1. A previous batch of 5 specimens, taken the previous day (sample 2), showed a mean value of coliform concentration of 0.20 mg/l with a standard deviation of 0.05 mg/l. Test the null hypothesis H0: σ1²/σ2² = 1 against the alternative hypothesis (a) H1: σ1²/σ2² < 1, (b) H1: σ1²/σ2² > 1, and (c) H1: σ1²/σ2² ≠ 1, for a confidence level of α = 0.05.

[38]. Two batches of erosion control tests are performed to determine the effectiveness of a new type of hydromulch in controlling erosion at construction sites. The reported rates of erosion (lb/acre/hr) for the two batches are:

batch 1 = { 175 276 280 125 456 235 172 180 235 }
batch 2 = { 150 175 350 120 275 178 200 }

Test the null hypothesis H0: σ1²/σ2² = 1 against the alternative hypothesis (a) H1: σ1²/σ2² < 1, (b) H1: σ1²/σ2² > 1, and (c) H1: σ1²/σ2² ≠ 1, for a confidence level of α = 0.05.

[39].
The following data set represents the diameter in mm of a sample of sand grains.

6.81 3.51 5.02 3.95 4.24 3.57 5.00 4.59 5.39 5.39
2.66 2.64 4.33 3.50 2.81 3.42 2.41 5.86 3.11 4.14
3.78 2.96 2.53 6.18 2.35 5.45 2.96 4.12 4.54 3.44
4.29 4.12 4.30 5.23 3.44 3.61 3.47 2.57 6.47 3.86
3.66 1.16 4.83 4.33 4.29 5.66 4.67 5.11 3.65 3.58
4.00 3.41 2.58 3.20 5.08 3.83 3.47 4.21 3.36 3.43

(a) Use user-defined function histnorm to check the hypothesis that the data follow the normal distribution if the data are grouped into 10 classes of the same width, at a level of confidence α = 0.05.
(b) Use user-defined function histbeta to check the hypothesis that the data follow the beta distribution if the data are grouped into 10 classes of the same width, at a level of confidence α = 0.10.

[40]. Write a SCILAB function, along the lines of functions histnorm and histbeta, that produces the chi-square parameter needed to test the goodness-of-fit of a vector of data for the Weibull distribution. Use this function to check the hypothesis that the data from problem [39] follow the Weibull distribution at a level of confidence α = 0.05.

[41]. Write a SCILAB function, along the lines of functions histnorm and histbeta, that produces the chi-square parameter needed to test the goodness-of-fit of a vector of data for the exponential distribution. Use this function to check the hypothesis that the data from problem [39] follow the exponential distribution at a level of confidence α = 0.05.

[42]. Samples of a particular species of fish are taken out of four fishing ponds and tested for swirling disease. The table below summarizes the number of fish that tested positive and negative for the disease in the four ponds. Use user-defined function RC to test the hypothesis that the two criteria for classification in the table, namely, pond of origin and test result, are independent at a level of confidence α = 0.05.
Location    Positive   Negative
Pond 1         122        289
Pond 2          11         26
Pond 3           3          8
Pond 4          28         33

[43]. The following table is based on a number of laboratory tests on soil samples. The samples are classified according to the typical grain size as sand, lime, or clay. Tests are then performed in a rainfall simulator and the soil samples are classified according to a high, medium, or low erosion potential. Use user-defined function RC to test the hypothesis that the two criteria for classification in the table, namely, soil type and erosion potential, are independent at a level of confidence α = 0.10.

               Erosion potential
Soil type    Low    Medium    High
Sand          45      12        8
Lime          23      10       10
Clay          12       5        2
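
For the R×C problems [42] and [43], it may help to see that the double loop in function RC can also be written without loops. The following is only a sketch under stated assumptions, not the author's implementation: the name RCvec is hypothetical, and the function is assumed to take the same inputs as RC (a matrix of observed frequencies fObs and a level alpha). It relies on sum(fObs,'c') returning a column vector of row totals and sum(fObs,'r') a row vector of column totals, so that their matrix product has the same size as fObs.

```scilab
// Hypothetical vectorized variant of RC (illustrative sketch only)
function [nu,chi_a,chi2,fPred] = RCvec(fObs,alpha)
   TR = sum(fObs,'c');                      // row totals, n x 1
   TC = sum(fObs,'r');                      // column totals, 1 x m
   TT = sum(fObs);                          // grand total
   [n,m] = size(fObs);
   fPred = TR*TC/TT;                        // outer product: expected frequencies, n x m
   chi2 = sum(((fObs-fPred).^2)./fPred);    // chi-square statistic
   nu = (n-1)*(m-1);                        // degrees of freedom
   chi_a = cdfchi('X',nu,1-alpha,alpha);    // critical value
   if chi2 > chi_a then
      printf('Reject the null hypothesis H0: independent criteria\n');
   else
      printf('Do not reject the null hypothesis H0: independent criteria\n');
   end
endfunction

// Same table as the worked example given before the exercises
fObs = [17,23,15,6;24,13,40,10;20,10,20,5];
[nu,chi_a,chi2,fPred] = RCvec(fObs,0.1);
```

Applied to the table of the worked example, this sketch should agree with the values reported there (nu = 6, chi_a = 10.644641, chi2 = 14.524802), since it computes the same quantities as the loop version in a different order.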