ST 370
Probability and Statistics for Engineers
Sampling Distributions
When you carry out an experiment and measure some quantity, the
resulting value is regarded as one particular value of a random
variable, and the probability distribution of that random variable
governs the probabilities of the different possible values.
When the experiment is part of a factorial design or a regression
design, you observe the values of several random variables, each of
which has its own probability distribution.
Sampling Distributions and Statistical Inference
Any quantity calculated from the observed values, such as an
estimate of one of the parameters, is called a statistic.
Because it is a function of the observed values of random variables, a
statistic is also a random variable.
As a random variable, a statistic has a probability distribution, called
its sampling distribution, and the standard deviation of that
distribution is called the standard error of the statistic.
The sampling distribution of a parameter estimate is the key to
making statistical inferences about the parameter that it estimates.
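The idea that a statistic has its own distribution can be checked by simulation. The following is a minimal sketch, assuming a normal population with made-up values for the mean, standard deviation, and sample size (none of these numbers come from the course). It repeats the experiment many times, computes the sample mean each time, and compares the spread of those means with the known standard error $\sigma/\sqrt{n}$ of a mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mu, sigma, n = 10.0, 2.0, 25    # hypothetical population mean, sd, and sample size
reps = 20000                    # number of simulated repetitions of the experiment

# Each row is one repetition of the experiment: n independent N(mu, sigma^2) values.
samples = rng.normal(mu, sigma, size=(reps, n))

# The statistic: one sample mean per repetition.
means = samples.mean(axis=1)

# The spread of the statistic across repetitions approximates its standard error.
simulated_se = means.std(ddof=1)
theoretical_se = sigma / np.sqrt(n)   # standard error of a mean of n observations

print(f"simulated SE   = {simulated_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

A histogram of `means` would show the sampling distribution itself; here only its standard deviation is examined.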
Factorial designs
Consider for example the replicated two-factor design, for which the
statistical model is
$$Y_{i,j,k} = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j} + \epsilon_{i,j,k}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n,$$
where
$\epsilon_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$,
are independent random variables, each distributed as $N(0, \sigma^2)$.
An equivalent way to write the model is:
$Y_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$,
are independent random variables, normally distributed with common
variance $\sigma^2$ and expected values
$$E(Y_{i,j,k}) = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n.$$
With constraints such as
$$\tau_1 = \beta_1 = 0, \qquad (\tau\beta)_{i,1} = 0, \; i = 1, \dots, a, \qquad \text{and} \qquad (\tau\beta)_{1,j} = 0, \; j = 1, \dots, b,$$
the least squares estimates of the parameters can be found.
Regression designs
The statistical model is
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_k x_{i,k} + \epsilon_i$$
where $\epsilon_i$, $i = 1, \dots, n$, are independent random variables, each
distributed as $N(0, \sigma^2)$.
Again, an equivalent way to write the model is: $Y_i$, $i = 1, \dots, n$, are
independent random variables, normally distributed with common
variance $\sigma^2$ and expected values
$$E(Y_i) = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_k x_{i,k}.$$
General linear model
Any factorial model may be written as a regression model by using
indicator variables, so the regression model is the more general form.
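As an illustration of how indicator variables turn a factorial model into a regression model, here is a minimal sketch for a replicated two-factor design. The numbers of levels, the replicate count, and the cell means are made up for the example. The first level of each factor is treated as the baseline, which corresponds to constraints of the form $\tau_1 = \beta_1 = 0$, $(\tau\beta)_{i,1} = 0$, $(\tau\beta)_{1,j} = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 2, 3, 4     # hypothetical numbers of levels and replicates

# Factor levels for every observation, ordered with i slowest and k fastest.
i_idx, j_idx, _ = np.meshgrid(np.arange(a), np.arange(b), np.arange(n), indexing="ij")
i_idx, j_idx = i_idx.ravel(), j_idx.ravel()

# Indicator columns; the first level of each factor is the baseline.
A = np.stack([(i_idx == i).astype(float) for i in range(1, a)], axis=1)
B = np.stack([(j_idx == j).astype(float) for j in range(1, b)], axis=1)
# Interaction columns are products of the main-effect indicators.
AB = np.stack([A[:, i] * B[:, j] for i in range(a - 1) for j in range(b - 1)], axis=1)
X = np.column_stack([np.ones(a * b * n), A, B, AB])

# Simulated responses with made-up cell means and sigma = 1.
cell_means = rng.normal(50.0, 5.0, size=(a, b))
y = cell_means[i_idx, j_idx] + rng.normal(0.0, 1.0, size=a * b * n)

# Ordinary least squares on the indicator-variable design matrix.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# With all interactions included, each fitted value is the sample mean of its cell.
observed_cell_means = y.reshape(a, b, n).mean(axis=2)
print(np.allclose(fitted.reshape(a, b, n)[:, :, 0], observed_cell_means))
```

The design matrix has $ab$ columns (one intercept, $a-1$ and $b-1$ main-effect indicators, and $(a-1)(b-1)$ interaction columns), so the intercept estimate is the sample mean of the baseline cell.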
The key sampling distribution results for the least squares estimates
are:
Each parameter estimate $\hat\beta_j$ has a Gaussian distribution, with
expected value equal to $\beta_j$ (they are unbiased).
The standard error of each estimate is of the form $a_j \times \sigma$, where
$a_j$ is a quantity that can be calculated from the design, and does
not depend on the unknown parameters.
So
$$\frac{\hat\beta_j - \beta_j}{a_j \sigma} \sim N(0, 1).$$
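The slides do not give a formula for $a_j$. A standard result for least squares with design matrix $X$ is that $a_j$ is the square root of the $j$-th diagonal entry of $(X^\top X)^{-1}$; the sketch below assumes that formula and checks the standardized estimates by simulation. The design, coefficients, and error standard deviation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 30, 1.5                       # hypothetical sample size and error sd
beta = np.array([2.0, 0.5, -1.0])        # hypothetical true coefficients

# A fixed design: intercept plus two predictors.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

# a_j depends only on the design matrix, not on beta or sigma.
XtX_inv = np.linalg.inv(X.T @ X)
a = np.sqrt(np.diag(XtX_inv))

# Repeat the experiment many times with the same design.
reps = 20000
eps = rng.normal(0.0, sigma, size=(reps, n))
Y = X @ beta + eps                        # shape (reps, n), one experiment per row
beta_hats = Y @ X @ XtX_inv               # row r holds (X'X)^{-1} X'y for experiment r

# Standardized estimates; these should behave like N(0, 1) draws.
z = (beta_hats - beta) / (a * sigma)
print("means:", z.mean(axis=0).round(3))
print("sds:  ", z.std(axis=0, ddof=1).round(3))
```

Each column of `z` should have mean near 0 and standard deviation near 1, whatever values are chosen for `beta` and `sigma`.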
The residual mean square $s^2$ has the property that
$$\frac{\nu s^2}{\sigma^2}$$
has the chi-squared distribution with $\nu$ degrees of freedom,
which is a special case of the Gamma distribution. Here $\nu$ is the
degrees of freedom for residuals.
Also, $s^2$ is independent of the least squares estimates $\hat\beta_j$.
So
$$T = \frac{\hat\beta_j - \beta_j}{a_j s} \sim \text{Student's } t \text{ with } \nu \text{ degrees of freedom.}$$
The “standard error” reported by software is the estimated standard
error $a_j s$.
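The chi-squared claim for the residual mean square can also be checked by simulation. This is a minimal sketch with a made-up straight-line design; it takes the residual degrees of freedom to be $\nu = n - p$, where $p$ is the number of columns of the design matrix, and uses the facts that a chi-squared variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 12, 2.0                       # hypothetical sample size and error sd
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])     # intercept and slope
p = X.shape[1]
nu = n - p                               # residual degrees of freedom

# Hat matrix: projects a response vector onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T

reps = 20000
Y = X @ np.array([1.0, 3.0]) + rng.normal(0.0, sigma, size=(reps, n))
resid = Y - Y @ H                        # H is symmetric, so each row is y - Hy

# Residual mean square for each simulated experiment.
s2 = (resid ** 2).sum(axis=1) / nu

q = nu * s2 / sigma**2                   # should follow chi-squared with nu df
print(f"mean of q     = {q.mean():.2f} (chi-squared mean is {nu})")
print(f"variance of q = {q.var(ddof=1):.2f} (chi-squared variance is {2 * nu})")
```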
Confidence intervals
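$$\begin{aligned}
1 - \alpha &= P\left(|T| \le t_{\alpha/2,\nu}\right) \\
&= P\left(-t_{\alpha/2,\nu} \le \frac{\hat\beta_j - \beta_j}{a_j s} \le t_{\alpha/2,\nu}\right) \\
&= P\left(-t_{\alpha/2,\nu} \times a_j s \le \hat\beta_j - \beta_j \le t_{\alpha/2,\nu} \times a_j s\right) \\
&= P\left(-\hat\beta_j - t_{\alpha/2,\nu} \times a_j s \le -\beta_j \le -\hat\beta_j + t_{\alpha/2,\nu} \times a_j s\right) \\
&= P\left(\hat\beta_j + t_{\alpha/2,\nu} \times a_j s \ge \beta_j \ge \hat\beta_j - t_{\alpha/2,\nu} \times a_j s\right)
\end{aligned}$$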
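That is, the probability that $\beta_j$ lies between the random limits
$$\hat\beta_j \pm t_{\alpha/2,\nu} \times a_j s$$
is $1 - \alpha$, and
$$\left(\hat\beta_j - t_{\alpha/2,\nu} \times a_j s, \; \hat\beta_j + t_{\alpha/2,\nu} \times a_j s\right)$$
is a $100(1 - \alpha)\%$ confidence interval for $\beta_j$.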
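A confidence interval of this form can be computed directly from one data set. The sketch below uses made-up straight-line data and assumes, as a standard result not stated in the slides, that $a_j$ is the square root of the $j$-th diagonal entry of $(X^\top X)^{-1}$. The critical value comes from `scipy.stats.t.ppf`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 20
x = np.linspace(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1.0, n)    # made-up data for illustration

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

nu = n - X.shape[1]                           # residual degrees of freedom
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / nu)               # square root of the residual mean square

a = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))  # design-only factors a_j (assumed formula)
se = a * s                                    # estimated standard errors a_j * s

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=nu)    # upper alpha/2 point of Student's t

lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se
for j in range(len(beta_hat)):
    print(f"beta_{j}: estimate {beta_hat[j]:.3f}, 95% CI ({lower[j]:.3f}, {upper[j]:.3f})")
```

For simple linear regression, `se[1]` should agree with the slope standard error reported by `scipy.stats.linregress`.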
Hypothesis tests
Similarly, under $H_0 : \beta_j = \beta_{j0}$, the probability of finding a
value at least as extreme as
$$t_{\text{obs}} = \frac{\hat\beta_j - \beta_{j0}}{a_j s}$$
is
$$P(|T| \ge |t_{\text{obs}}|)$$
where $T \sim$ Student's $t$ with $\nu$ degrees of freedom.
This is the P-value reported by software.
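The two-sided P-value can be computed from the observed $t$ statistic with the survival function of Student's $t$. The following is a sketch on made-up data with a weak slope, testing the hypothesized value 0 for the slope; the variable names `j` and `beta_j0` are introduced here for illustration, and the formula for $a_j$ (square root of the diagonal of $(X^\top X)^{-1}$) is an assumption not stated in the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 15
x = np.linspace(0, 10, n)
y = 3.0 + 0.1 * x + rng.normal(0, 1.0, n)     # made-up data with a weak slope

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

nu = n - X.shape[1]                            # residual degrees of freedom
s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / nu)
a = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))   # assumed formula for a_j

j, beta_j0 = 1, 0.0                            # test H0: slope equals 0
t_obs = (beta_hat[j] - beta_j0) / (a[j] * s)

# P(|T| >= |t_obs|): twice the upper-tail probability, by symmetry of the t distribution.
p_value = 2 * stats.t.sf(abs(t_obs), df=nu)
print(f"t_obs = {t_obs:.3f}, P-value = {p_value:.4f}")
```

With `beta_j0 = 0.0` this should match the `pvalue` returned by `scipy.stats.linregress(x, y)`; other hypothesized values only change `beta_j0`.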
Nested models
Write $SS_{E,\text{reduced}}$ and $SS_{E,\text{full}}$ for the residual sums of squares of the
two models, where the “reduced” model is nested within the “full”
model and the full model contains $r$ additional predictors.
The extra sum of squares is
$$SS_{R,\text{extra}} = SS_{E,\text{reduced}} - SS_{E,\text{full}}$$
and if this is large, the $r$ additional predictors have explained a
substantial additional amount of variability.
The key properties of these sums of squares are:
$SS_{R,\text{extra}}$ and $SS_{E,\text{full}}$ are independent;
$SS_{E,\text{full}}/\sigma^2$ follows the chi-squared distribution with $\nu$ degrees of
freedom, where $\nu$ is the residual degrees of freedom for the full
model.
Under the null hypothesis that the added predictors all have zero
coefficients, $SS_{R,\text{extra}}/\sigma^2$ follows the chi-squared distribution,
with $r$ degrees of freedom.
Consequently, under that null hypothesis, the $F$-statistic
$$F = \frac{SS_{R,\text{extra}}/r}{MS_{E,\text{full}}}$$
follows Fisher's $F$-distribution with $r$ and $\nu$ degrees of freedom.
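The nested-model comparison can be sketched as follows. The data, the predictors `x1`, `x2`, `x3`, and the helper `sse` are all made up for the example; the full-model mean square is taken to be the full-model residual sum of squares divided by $\nu$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))                     # three hypothetical predictors
y = 2.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(0, 1.0, n)    # x3 has no real effect


def sse(X, y):
    """Residual sum of squares from a least squares fit of y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return float(resid @ resid)


X_reduced = np.column_stack([np.ones(n), x1])            # nested within the full model
X_full = np.column_stack([np.ones(n), x1, x2, x3])

r = X_full.shape[1] - X_reduced.shape[1]                 # number of added predictors
nu = n - X_full.shape[1]                                 # residual df of the full model

sse_reduced = sse(X_reduced, y)
sse_full = sse(X_full, y)
ssr_extra = sse_reduced - sse_full                       # extra sum of squares

f_obs = (ssr_extra / r) / (sse_full / nu)                # F statistic
p_value = stats.f.sf(f_obs, r, nu)                       # P(F >= f_obs)
print(f"F = {f_obs:.3f} on {r} and {nu} df, P-value = {p_value:.4g}")
```

Because the reduced model's columns are a subset of the full model's, `ssr_extra` cannot be negative apart from rounding error.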
The “test for significance of regression” is a special case, where the
reduced model has no predictors, only an intercept.
In this case, and in the general nested model comparison, the P-value
reported by software is
$$P(F \ge F_{\text{obs}})$$
where $F \sim$ Fisher's $F$ with $r$ and $\nu$ degrees of freedom.
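For the special case of a single predictor, the test for significance of regression can be sketched as below, on made-up data. The intercept-only fit is the sample mean, so the reduced-model residual sum of squares is the total sum of squares about the mean. As a cross-check, with one added predictor the $F$ statistic equals the square of the slope's $t$ statistic; the formula used for $a_j$ in that check is the standard $(X^\top X)^{-1}$ result, which the slides do not state.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 15
x = np.linspace(0, 10, n)
y = 3.0 + 0.1 * x + rng.normal(0, 1.0, n)          # made-up data

X_full = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X_full, y, rcond=None)

sse_full = float(np.sum((y - X_full @ beta_hat) ** 2))
sse_reduced = float(np.sum((y - y.mean()) ** 2))    # intercept-only model fits the mean

r = 1                                               # one predictor added to the intercept
nu = n - X_full.shape[1]                            # residual df of the full model

f_obs = ((sse_reduced - sse_full) / r) / (sse_full / nu)
p_value = stats.f.sf(f_obs, r, nu)                  # P(F >= f_obs)

# Cross-check: t statistic for H0: slope = 0, using the assumed formula for a_j.
a_slope = np.sqrt(np.linalg.inv(X_full.T @ X_full)[1, 1])
t_obs = beta_hat[1] / (a_slope * np.sqrt(sse_full / nu))

print(f"F = {f_obs:.4f}, t^2 = {t_obs**2:.4f}, P-value = {p_value:.4f}")
```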