ST 370 Probability and Statistics for Engineers
Sampling Distributions and Statistical Inference

Sampling Distributions

When you carry out an experiment and measure some quantity, the resulting value is regarded as one particular value of a random variable, and the probability distribution of that random variable governs the probabilities of the different possible values.

When the experiment is part of a factorial design or a regression design, you observe the values of several random variables, each of which has its own probability distribution.

Any quantity calculated from the observed values, such as an estimate of one of the parameters, is called a statistic. Because it is a function of the observed values of random variables, a statistic is also a random variable.

As a random variable, a statistic has a probability distribution, called its sampling distribution, and the standard deviation of that distribution is called the standard error of the statistic.

The sampling distribution of a parameter estimate is the key to making statistical inferences about the parameter that it estimates.

Factorial designs

Consider for example the replicated two-factor design, for which the statistical model is

\[ Y_{i,j,k} = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j} + \epsilon_{i,j,k}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n, \]

where the $\epsilon_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables, each distributed as $N(0, \sigma^2)$.

An equivalent way to write the model is: the $Y_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables, normally distributed with common variance $\sigma^2$ and expected values

\[ E(Y_{i,j,k}) = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n. \]

With constraints such as

\[ \tau_1 = \beta_1 = 0, \qquad (\tau\beta)_{i,1} = 0, \; i = 1, \dots, a, \qquad (\tau\beta)_{1,j} = 0, \; j = 1, \dots, b, \]

the least squares estimates of the parameters can be found.

Regression designs

The statistical model is

\[ Y_i = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_k x_{i,k} + \epsilon_i, \]

where the $\epsilon_i$, $i = 1, \dots, n$, are independent random variables, each distributed as $N(0, \sigma^2)$.

Again, an equivalent way to write the model is: the $Y_i$, $i = 1, \dots, n$, are independent random variables, normally distributed with common variance $\sigma^2$ and expected values

\[ E(Y_i) = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_k x_{i,k}. \]

General linear model

Any factorial model may be written as a regression model by using indicator variables, so the regression model is the more general form.

The key sampling distribution results for the least squares estimates are:

Each parameter estimate $\hat\beta_j$ has a Gaussian distribution, with expected value equal to $\beta_j$ (the estimates are unbiased).

The standard error of each estimate is of the form $a_j \times \sigma$, where $a_j$ is a quantity that can be calculated from the design, and does not depend on the unknown parameters.

So

\[ \frac{\hat\beta_j - \beta_j}{a_j \sigma} \sim N(0, 1). \]

The residual mean square $s^2$ has the property that

\[ \frac{\nu s^2}{\sigma^2} \]

has the chi-squared distribution with $\nu$ degrees of freedom, which is a special case of the Gamma distribution.
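The Gaussian sampling distribution result above can be checked by simulation. The following is a minimal sketch, assuming a simple linear regression with one predictor, for which the design factor of the slope is $a_1 = 1/\sqrt{S_{xx}}$ with $S_{xx} = \sum_i (x_i - \bar{x})^2$. The design points, the parameter values, and the function names `slope_estimate` and `simulate_slopes` are hypothetical choices made only for illustration.

```python
import math
import random
import statistics


def slope_estimate(x, y):
    """Least squares slope for simple linear regression."""
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slopes(x, beta0, beta1, sigma, reps, seed=0):
    """Repeat the experiment `reps` times and collect the slope estimates."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # Each response is its expected value plus an independent N(0, sigma^2) error.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(slope_estimate(x, y))
    return slopes


# Hypothetical design and parameter values (assumptions, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
beta0, beta1, sigma = 2.0, 0.5, 1.5

# a1 depends only on the design points, not on the unknown parameters.
xbar = statistics.fmean(x)
a1 = 1.0 / math.sqrt(sum((xi - xbar) ** 2 for xi in x))
theoretical_se = a1 * sigma

slopes = simulate_slopes(x, beta0, beta1, sigma, reps=20000)
print("mean of slope estimates:", statistics.fmean(slopes))
print("sd of slope estimates:  ", statistics.stdev(slopes))
print("a1 * sigma:             ", theoretical_se)
```

If the result holds, the mean of the simulated slopes should be close to the true slope (unbiasedness) and their standard deviation should be close to $a_1 \sigma$ (the standard error).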
Here $\nu$ is the degrees of freedom for residuals. Also, $s^2$ is independent of the least squares estimates $\hat\beta_j$. So

\[ T = \frac{\hat\beta_j - \beta_j}{a_j s} \sim \text{Student's } t \text{ with } \nu \text{ degrees of freedom.} \]

The "standard error" reported by software is the estimated standard error $a_j s$.

Confidence intervals

\[
\begin{aligned}
1 - \alpha &= P\left(|T| \le t_{\alpha/2,\nu}\right) \\
&= P\left(-t_{\alpha/2,\nu} \le \frac{\hat\beta_j - \beta_j}{a_j s} \le t_{\alpha/2,\nu}\right) \\
&= P\left(-t_{\alpha/2,\nu} \times a_j s \le \hat\beta_j - \beta_j \le t_{\alpha/2,\nu} \times a_j s\right) \\
&= P\left(-\hat\beta_j - t_{\alpha/2,\nu} \times a_j s \le -\beta_j \le -\hat\beta_j + t_{\alpha/2,\nu} \times a_j s\right) \\
&= P\left(\hat\beta_j + t_{\alpha/2,\nu} \times a_j s \ge \beta_j \ge \hat\beta_j - t_{\alpha/2,\nu} \times a_j s\right)
\end{aligned}
\]

That is, the probability that $\beta_j$ lies between the random limits $\hat\beta_j \pm t_{\alpha/2,\nu} \times a_j s$ is $1 - \alpha$, and

\[ \left(\hat\beta_j - t_{\alpha/2,\nu} \times a_j s, \; \hat\beta_j + t_{\alpha/2,\nu} \times a_j s\right) \]

is a $100(1 - \alpha)\%$ confidence interval for $\beta_j$.

Hypothesis tests

Similarly, under $H_0 : \beta_j = \beta_{j0}$, the probability of finding as large a value as

\[ t_{\text{obs}} = \frac{\hat\beta_j - \beta_{j0}}{a_j s} \]

is

\[ P(|T| \ge |t_{\text{obs}}|), \]

where $T \sim$ Student's $t$ with $\nu$ degrees of freedom. This is the P-value reported by software.

Nested models

Write $SS_{E,\text{reduced}}$ and $SS_{E,\text{full}}$ for the residual sums of squares of the two models, where the "reduced" model is nested within the "full" model.

The extra sum of squares is

\[ SS_{R,\text{extra}} = SS_{E,\text{reduced}} - SS_{E,\text{full}}, \]

and if this is large, the $r$ additional predictors have explained a substantial additional amount of variability.
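To make the t-based confidence interval and test statistic from the preceding slides concrete, here is a minimal sketch for the slope of a simple linear regression. The data are invented for illustration, the design factor $a_1 = 1/\sqrt{S_{xx}}$ is specific to the one-predictor case, and the critical value $t_{0.025,8} = 2.306$ is taken from a standard t table rather than computed.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares estimates of the slope and intercept.
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual mean square s^2 with nu = n - 2 residual degrees of freedom.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
nu = n - 2
s = math.sqrt(sum(e * e for e in residuals) / nu)

# Estimated standard error a1 * s, where a1 depends only on the design.
a1 = 1.0 / math.sqrt(sxx)
se_b1 = a1 * s

# 95% confidence interval; 2.306 is the tabulated t_{0.025, 8} (assumed, not computed).
t_crit = 2.306
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

# Observed t statistic for H0: beta1 = 0.
beta1_null = 0.0
t_obs = (b1 - beta1_null) / se_b1

print("slope estimate:", b1)
print("estimated standard error:", se_b1)
print("95% confidence interval:", (lower, upper))
print("t_obs:", t_obs)
```

The P-value $P(|T| \ge |t_{\text{obs}}|)$ needs the cumulative distribution function of Student's t, which the Python standard library does not provide, so it is left out of this sketch.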
The key properties of these sums of squares are:

$SS_{R,\text{extra}}$ and $SS_{E,\text{full}}$ are independent;

$SS_{E,\text{full}}/\sigma^2$ follows the chi-squared distribution with $\nu$ degrees of freedom, where $\nu$ is the residual degrees of freedom for the full model.

Under the null hypothesis that the added predictors all have zero coefficients, $SS_{R,\text{extra}}/\sigma^2$ follows the chi-squared distribution with $r$ degrees of freedom.

Consequently, under that null hypothesis, the F-statistic

\[ F = \frac{SS_{R,\text{extra}}/r}{MS_{E,\text{full}}} \]

follows Fisher's F-distribution with $r$ and $\nu$ degrees of freedom.

The "test for significance of regression" is a special case, where the reduced model has no predictors, only an intercept.

In this case, and in the general nested model comparison, the P-value reported by software is

\[ P(F \ge F_{\text{obs}}), \]

where $F \sim$ Fisher's F with $r$ and $\nu$ degrees of freedom.
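The nested-model F-statistic can be sketched for the simplest case, the test for significance of regression with a single predictor. Here the reduced model has only an intercept, the full model is a straight line, and $r = 1$. The data are invented for illustration. With one added predictor, the F-statistic equals the square of the t statistic for testing that the slope is zero, which the sketch checks numerically.

```python
import math

# Hypothetical data, invented for illustration only.
x = [float(i) for i in range(1, 11)]
y = [3.1, 2.8, 4.9, 4.2, 6.8, 5.9, 7.4, 9.1, 8.3, 10.2]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Full model: intercept plus slope.
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Reduced model: intercept only, so every fitted value is the sample mean.
sse_reduced = sum((yi - ybar) ** 2 for yi in y)

r = 1          # number of additional predictors in the full model
nu = n - 2     # residual degrees of freedom for the full model

ss_extra = sse_reduced - sse_full
mse_full = sse_full / nu
f_obs = (ss_extra / r) / mse_full

# t statistic for H0: beta1 = 0, using the estimated standard error of the slope.
se_b1 = math.sqrt(mse_full / sxx)
t_obs = b1 / se_b1

print("extra sum of squares:", ss_extra)
print("F_obs:", f_obs)
print("t_obs squared:", t_obs ** 2)
```

The P-value $P(F \ge F_{\text{obs}})$ would come from the F-distribution with $r$ and $\nu$ degrees of freedom. It is omitted here because the standard library has no F-distribution function.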
