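CHAPTER 2: Building Empirical Models

Basic Statistical Concepts

Consider this situation: the tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. The experimenter has collected 10 observations on strength for each formulation. The data are shown in Table 2.1.
• Each individual observation is called a run.
• The fluctuation (noise) among runs is called experimental error.
• The presence of error implies that the response variable is a random variable, which can be discrete or continuous.

Figure: Dot diagram for the data in Table 2.1
What can you conclude from the dot diagram? Where is the general location, or central tendency, of the data?

Other graphical methods
• Histogram: useful when the data are fairly numerous.
• Box plot (or box-and-whisker plot): the box spans the lower quartile (25th percentile) to the upper quartile (75th percentile), and a line inside the box marks the median.

Probability Distributions
• The probability structure of a random variable y is described by its probability distribution.
• If y is discrete, its distribution is given by the probability function of y, p(y).
• If y is continuous, its distribution is given by the probability density function, f(y).

Mean, Variance and Expected Value
• The mean μ of a probability distribution is a measure of its central tendency or location:
$$\mu = \begin{cases} \int_{-\infty}^{\infty} y\, f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} y\, p(y) & y \text{ discrete} \end{cases}$$
• We may also express the mean in terms of the expected value of the random variable y, $\mu = E(y)$, where E denotes the expected value operator.
• The variability or dispersion of a probability distribution can be measured by the variance, defined as
$$\sigma^2 = \begin{cases} \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} (y - \mu)^2\, p(y) & y \text{ discrete} \end{cases}$$
• Note that the variance can be expressed entirely in terms of expectation, because
$$\sigma^2 = E\left[(y - \mu)^2\right]$$
• Finally, the variance is used so extensively that it is convenient to define a variance operator V such that
$$V(y) = E\left[(y - \mu)^2\right] = \sigma^2$$

Elementary results
• If y is a random variable with mean μ and variance σ², and c is a constant, then:
$$E(c) = c, \qquad E(y) = \mu, \qquad E(cy) = cE(y) = c\mu$$
$$V(c) = 0, \qquad V(y) = \sigma^2, \qquad V(cy) = c^2 V(y) = c^2\sigma^2$$
• If $y_1$ and $y_2$ are random variables with $E(y_1) = \mu_1$, $V(y_1) = \sigma_1^2$, $E(y_2) = \mu_2$ and $V(y_2) = \sigma_2^2$, then:
$$E(y_1 + y_2) = \mu_1 + \mu_2$$
$$V(y_1 + y_2) = V(y_1) + V(y_2) + 2\,\mathrm{Cov}(y_1, y_2)$$
where
$$\mathrm{Cov}(y_1, y_2) = E\left[(y_1 - \mu_1)(y_2 - \mu_2)\right]$$
Covariance is a measure of the linear association between $y_1$ and $y_2$. If $y_1$ and $y_2$ are independent, then $\mathrm{Cov}(y_1, y_2) = 0$.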
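The definitions of E(y) and V(y) above can be checked numerically. The following Python sketch evaluates both directly from the probability function p(y) of a discrete random variable and confirms the scaling results $E(cy) = c\mu$ and $V(cy) = c^2\sigma^2$. The fair six-sided die and the constant c = 3 are hypothetical choices made only for illustration. They are not part of the portland cement data.

```python
from fractions import Fraction


def expectation(pmf, g=lambda y: y):
    """E[g(y)] = sum of g(y) * p(y) over the support of a discrete random variable."""
    return sum(g(y) * p for y, p in pmf.items())


def variance(pmf):
    """V(y) = E[(y - mu)^2], computed directly from the definition."""
    mu = expectation(pmf)
    return expectation(pmf, lambda y: (y - mu) ** 2)


# Hypothetical distribution: a fair die, p(y) = 1/6 for y = 1, ..., 6.
die = {y: Fraction(1, 6) for y in range(1, 7)}

mu = expectation(die)
var = variance(die)
# Shortcut form of the same quantity: V(y) = E(y^2) - mu^2
var_shortcut = expectation(die, lambda y: y * y) - mu ** 2

# Elementary results for a constant c: E(cy) = c*mu and V(cy) = c^2 * sigma^2
c = 3
scaled = {c * y: p for y, p in die.items()}  # distribution of the new variable c*y

print(mu, var, var_shortcut)                   # 7/2 35/12 35/12
print(expectation(scaled), variance(scaled))   # 21/2 105/4
```

Exact fractions from the standard-library `fractions` module are used so that the definition, the shortcut form, and the scaling rule can be compared without floating-point rounding.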
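We may also show that
$$V(y_1 - y_2) = V(y_1) + V(y_2) - 2\,\mathrm{Cov}(y_1, y_2)$$
and that, if $y_1$ and $y_2$ are independent,
$$V(y_1 \pm y_2) = \sigma_1^2 + \sigma_2^2, \qquad E(y_1 y_2) = E(y_1)\,E(y_2) = \mu_1 \mu_2$$

Inferences About Differences in Means, Randomized Designs
• Hypothesis testing
• Choice of sample size
• Confidence intervals
• The case where $\sigma_1^2 \neq \sigma_2^2$
• The case where $\sigma_1^2$ and $\sigma_2^2$ are known
• Comparing a single mean to a specified value

Hypothesis testing
• Let us reconsider the portland cement experiment.
• In general, we can regard the two formulations (unmodified and modified mortar) as two levels of the factor "formulation".
• Let $y_{11}, y_{12}, \ldots, y_{1n_1}$ represent the $n_1$ observations from the first factor level, and let $y_{21}, y_{22}, \ldots, y_{2n_2}$ represent the $n_2$ observations from the second factor level.
• We describe the results of the experiment with a model. A simple statistical model is
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$
where $y_{ij}$ is the jth observation from factor level i, $\mu_i$ is the mean of the response at the ith factor level, and $\varepsilon_{ij}$ is a normal random variable representing the random error of the ijth observation.

1) Statistical hypotheses
A statistical hypothesis is a statement either about the parameters of a probability distribution or about the parameters of a model. The decision-making procedure about a hypothesis is called hypothesis testing. For example, in the portland cement experiment we may think that the mean tension bond strengths of the two mortar formulations are equal. This may be stated formally as
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
Two kinds of error are possible: a type I error (rejecting H0 when it is true) and a type II error (failing to reject H0 when it is false):
$$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
$$\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$
Power is the probability of rejecting the null hypothesis H0 when the alternative hypothesis H1 is true:
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$$

2) The two-sample t-test
• The appropriate test statistic for comparing two treatment means in a completely randomized design is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
where $\bar{y}_1$ and $\bar{y}_2$ are the sample means, $n_1$ and $n_2$ are the sample sizes, $S_p^2$ is the pooled estimate of the common variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and $S_1^2$ and $S_2^2$ are the individual sample variances.
• To determine whether to reject $H_0: \mu_1 = \mu_2$, we compare $t_0$ to the t distribution with $n_1 + n_2 - 2$ degrees of freedom.
• If $|t_0| > t_{\alpha/2,\, n_1 + n_2 - 2}$, where $t_{\alpha/2,\, n_1 + n_2 - 2}$ is the upper α/2 percentage point of the t distribution with $n_1 + n_2 - 2$ degrees of freedom, we reject H0 and conclude that the mean strengths of the two formulations of portland cement mortar differ.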
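• This test procedure is called the two-sample t-test.
• For the one-sided alternative hypothesis $H_1: \mu_1 > \mu_2$, H0 would be rejected if $t_0 > t_{\alpha,\, n_1 + n_2 - 2}$.
• For $H_1: \mu_1 < \mu_2$, H0 would be rejected if $t_0 < -t_{\alpha,\, n_1 + n_2 - 2}$.

Example: from the portland cement data, with $\bar{y}_1 = 16.76$, $\bar{y}_2 = 17.92$, $S_p = 0.284$ and $n_1 = n_2 = 10$,
$$t_0 = \frac{16.76 - 17.92}{0.284\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -9.13$$
Because $|t_0| = 9.13 > t_{0.025,\, 18} = 2.101$, H0 is rejected at α = 0.05, and we conclude that the mean tension bond strengths of the two formulations differ.

3) P-values
• One way to report the results of a hypothesis test is to state that the null hypothesis was or was not rejected at a specified α-value, or level of significance.
• For example, in the portland cement mortar experiment we can say that $H_0: \mu_1 = \mu_2$ was rejected at the 0.05 level of significance.
• This conclusion is often inadequate, because it gives no idea whether the computed value of the test statistic was just barely in the rejection region or far into it. Moreover, some decision makers might be uncomfortable with α = 0.05.
• The P-value approach overcomes these difficulties.
• The P-value is the smallest level of significance that would lead to rejection of the null hypothesis, that is, the smallest α at which the data are significant. Once the P-value is known, the decision maker can judge the significance of the data at any chosen α.
• The exact P-value is not easy to compute by hand, but it can be approximated from a t table. For the portland cement mortar example there are $n_1 + n_2 - 2 = 18$ degrees of freedom. The smallest tail-area probability in the t-distribution table is 0.0005, for which $t_{0.0005,\, 18} = 3.922$.
• Now $|t_0| = 9.13 > 3.922$ (H0 is rejected), and because the alternative hypothesis is two-sided, the P-value must be less than 2(0.0005) = 0.001.

4) Normal probability plot
A normal probability plot is a graphical technique for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data.
• How to construct: rank the observations from smallest to largest, $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$, and plot each ordered observation $y_{(j)}$ against its observed cumulative frequency $(j - 0.5)/n$, $j = 1, 2, \ldots, n$, on normal probability paper.
• How to interpret: if the plotted points lie approximately along a straight line, the hypothesized (normal) distribution adequately describes the data.

Choice of sample size
• The choice of sample size and the probability of type II error, β, are closely related.
• Suppose we are testing
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
and that the means are in fact not equal. Because $H_0: \mu_1 = \mu_2$ is not true, we are concerned about wrongly failing to reject H0.
• β depends on the true difference in means, $\delta = \mu_1 - \mu_2$.
• A graph of β against δ is called the operating characteristic curve, or O.C. curve.
• Generally, β decreases as the sample size increases.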
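• Consequently, a given difference δ is easier to detect with a larger sample.

Figure: O.C. curves for the two-sided t-test with α = 0.05 when σ1 and σ2 are unknown but equal (σ1 = σ2 = σ), showing β against d for several values of n*, where
$$d = \frac{|\mu_1 - \mu_2|}{2\sigma} = \frac{|\delta|}{2\sigma}, \qquad n^* = 2n - 1$$
and n is the sample size from each population ($n_1 = n_2 = n$).
From the curves:
• The greater the difference in means, the smaller the β error.
• As the sample size increases, β gets smaller.

How to use the O.C. curve to choose the sample size
• Suppose that δ = 0.1; then
$$d = \frac{|\mu_1 - \mu_2|}{2\sigma} = \frac{0.1}{2\sigma}$$
• If σ = 0.25, then d = 0.1/(2 × 0.25) = 0.2.
• If we want to reject the null hypothesis 95% of the time when $\mu_1 - \mu_2 = 0.1$, then β = 0.05, and reading the curve at β = 0.05 and d = 0.2 yields n* = 15.
• Since $n^* = 2n - 1$, we get n = (n* + 1)/2 = (15 + 1)/2 = 8.

Confidence intervals
A confidence interval is an interval within which the value of the parameter or parameters in question would be expected to lie. For an unknown parameter such as μ, we find two statistics L and U such that
$$P(L \le \mu \le U) = 1 - \alpha$$
The interval $L \le \mu \le U$ is called a 100(1 − α) percent confidence interval for μ. L and U are called the lower and upper confidence limits, and 1 − α is called the confidence coefficient. If α = 0.05, the interval is called a 95% confidence interval for μ.

How to calculate a confidence interval for the difference in means:
$$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\, n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\, n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
is a 100(1 − α) percent confidence interval for $\mu_1 - \mu_2$.

Example
For the portland cement mortar example discussed earlier, the 95% confidence interval estimate of the difference in mean tension bond strength is
$$16.76 - 17.92 - (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}} \;\le\; \mu_1 - \mu_2 \;\le\; 16.76 - 17.92 + (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}}$$
$$-1.16 - 0.27 \;\le\; \mu_1 - \mu_2 \;\le\; -1.16 + 0.27$$
$$-1.43 \;\le\; \mu_1 - \mu_2 \;\le\; -0.89$$
Thus the confidence interval is $\mu_1 - \mu_2 = -1.16 \pm 0.27$ kgf/cm². In other words, the estimated difference in mean strength is −1.16 kgf/cm², and the accuracy of this estimate is ±0.27 kgf/cm².

The case where $\sigma_1^2 \neq \sigma_2^2$
If we are testing
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
and cannot assume that the variances are equal, the test statistic becomes
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
which is compared to the t distribution with degrees of freedom calculated as follows:
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$

The case where $\sigma_1^2$ and $\sigma_2^2$ are known
If both variances are known, the hypotheses
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
may be tested using the statistic
$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
and H0 is rejected if $|Z_0| > Z_{\alpha/2}$, the upper α/2 percentage point of the standard normal distribution.

Comparing a single mean to a specified value
If we are testing
$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0$$
the test statistic is
$$Z_0 = \frac{\bar{y} - \mu_0}{\sigma / \sqrt{n}}$$
when σ is known (reject H0 if $|Z_0| > Z_{\alpha/2}$), or
$$t_0 = \frac{\bar{y} - \mu_0}{S / \sqrt{n}}$$
compared to the t distribution with n − 1 degrees of freedom when σ is unknown. The corresponding 100(1 − α) percent confidence interval for μ is
$$\bar{y} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{y} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
when σ is known, or
$$\bar{y} - t_{\alpha/2,\, n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{y} + t_{\alpha/2,\, n-1}\frac{S}{\sqrt{n}}$$
when σ is unknown.

Regression Model and Empirical Model
• Suppose there is a single dependent variable, or response, y that depends on k independent or regressor variables, for example $x_1, x_2, \ldots, x_k$.
• The relationship between y and these k variables is characterized by a mathematical model called a regression model.
• The regression model is the basis of an empirical model, that is, a model created from experimental observations.

Linear Regression Model
• Suppose we wish to develop an empirical model relating the viscosity of a polymer, y, to the temperature, $x_1$, and the catalyst feed rate, $x_2$:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
• This is a multiple linear regression model: it has more than one regressor, and it is linear in the unknown parameters $\beta_0$, $\beta_1$ and $\beta_2$.
• The βs are called regression coefficients, and the xs are the predictor (regressor) variables.
• In general, any regression model that is linear in the parameters is a linear regression model, regardless of the shape of the response surface it generates. For example, a model with an interaction term,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$
is still a linear regression model.
• Estimating the parameters of a multiple linear regression model is called model fitting.
• The typical method is the method of least squares.

Least squares estimation of the parameters
Given n > k observations $y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i$, the least squares estimates of the βs minimize
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$

Matrix Approach to Multiple Linear Regression
In matrix notation the model is $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is the n × p model matrix (a column of 1s followed by the regressor columns) and p = k + 1. The least squares estimator solves the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
The fitted model is $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$, and the residuals are $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$.

Properties of the least squares estimators and estimation of σ²
• $\hat{\boldsymbol{\beta}}$ is unbiased, $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$, with covariance matrix $\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$.
• σ² is estimated from the error (residual) sum of squares:
$$SS_E = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e}, \qquad \hat{\sigma}^2 = \frac{SS_E}{n - p}$$

Hypothesis Testing in Multiple Regression
• Test for significance of regression: test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against $H_1: \beta_j \neq 0$ for at least one j using
$$F_0 = \frac{SS_R / k}{SS_E / (n - k - 1)} = \frac{MS_R}{MS_E}$$
where $SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the regression sum of squares; reject H0 if $F_0 > F_{\alpha,\, k,\, n-k-1}$.
• Tests on individual regression coefficients: test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ using
$$t_0 = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}$$
where $C_{jj}$ is the diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ corresponding to $\hat{\beta}_j$; reject H0 if $|t_0| > t_{\alpha/2,\, n-p}$.
• Tests on groups of coefficients: partition $\boldsymbol{\beta}$ into $\boldsymbol{\beta}_1$ (r coefficients) and $\boldsymbol{\beta}_2$, and test $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ using the extra sum of squares
$$F_0 = \frac{SS_R(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2) / r}{MS_E}, \qquad SS_R(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2) = SS_R(\boldsymbol{\beta}) - SS_R(\boldsymbol{\beta}_2)$$
and reject H0 if $F_0 > F_{\alpha,\, r,\, n-p}$.

Confidence Intervals in Multiple Regression
• On an individual regression coefficient:
$$\hat{\beta}_j - t_{\alpha/2,\, n-p}\sqrt{\hat{\sigma}^2 C_{jj}} \;\le\; \beta_j \;\le\; \hat{\beta}_j + t_{\alpha/2,\, n-p}\sqrt{\hat{\sigma}^2 C_{jj}}$$
• On the mean response at a point $\mathbf{x}_0$ (written with a leading 1), where $\hat{y}(\mathbf{x}_0) = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$:
$$\hat{y}(\mathbf{x}_0) - t_{\alpha/2,\, n-p}\sqrt{\hat{\sigma}^2\, \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \;\le\; \mu_{y \mid \mathbf{x}_0} \;\le\; \hat{y}(\mathbf{x}_0) + t_{\alpha/2,\, n-p}\sqrt{\hat{\sigma}^2\, \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}$$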
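The least squares fit of a model such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ can be sketched in plain Python. The sketch forms the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$, solves them by Gaussian elimination, and estimates σ² as $SS_E/(n - p)$ with p = k + 1. Several choices here are assumptions for illustration only. The temperature/feed-rate values, the "true" coefficients 2, 3 and −1.5, and the error values are all hypothetical, and the helper names are not from any library.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(x_rows, y):
    """Return (beta_hat, residuals, sigma2_hat) for the model y = X beta + error."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]  # leading 1 for beta0
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    beta = solve(XtX, Xty)  # normal equations: X'X beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    n, p = len(X), len(beta)
    sse = sum(e * e for e in residuals)
    return beta, residuals, sse / (n - p)  # sigma^2 estimate = SS_E / (n - p)


# Hypothetical data: x1 = temperature, x2 = catalyst feed rate.
x_rows = [(80, 8), (93, 9), (100, 10), (82, 12), (90, 11), (99, 8), (81, 8), (96, 10)]
# Hypothetical responses from assumed coefficients 2, 3, -1.5 plus small fixed errors.
errors = [0.4, -0.3, 0.1, -0.2, 0.5, -0.6, 0.2, -0.1]
y = [2 + 3 * x1 - 1.5 * x2 + e for (x1, x2), e in zip(x_rows, errors)]

beta_hat, resid, sigma2_hat = fit_least_squares(x_rows, y)
print("beta_hat =", [round(b, 3) for b in beta_hat])
print("sigma2_hat =", round(sigma2_hat, 4))
```

Solving the normal equations gives the same $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ without forming the inverse explicitly. For larger or ill-conditioned problems, a QR-based routine such as NumPy's `numpy.linalg.lstsq` is usually the safer choice.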