Download (s/sqrt(n)) - People Server at UNCW

• Inference about the mean of a population of
measurements (μ) is based on the standardized
value of the sample mean (Xbar).
• The standardization involves subtracting the
mean of Xbar and dividing by the standard
deviation of Xbar. Recall that:
– Mean of Xbar is μ ; and
– Standard deviation of Xbar is σ/sqrt(n), where σ is
the population standard deviation
• Thus we have (Xbar - μ)/(σ/sqrt(n)) which has a
Z distribution if:
– Population is normal and σ is known ; or if
– n is large so CLT takes over…
• But what if the population standard deviation σ is
unknown? Then we replace σ by the sample standard
deviation s, and the standardized Xbar,
(Xbar - μ)/(s/sqrt(n)), doesn’t have a Z distribution
anymore, but a so-called t-distribution with n-1
degrees of freedom…
• Since σ is unknown, the standard deviation of
Xbar, σ/sqrt(n), is unknown. We estimate it by
the so-called standard error of Xbar, s/sqrt(n),
where s = the sample standard deviation.
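A minimal sketch of the standard error and the standardized Xbar, using only Python's standard library. The sample values and the hypothesized mean mu0 are made up for illustration; they do not come from the textbook.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 4.7, 5.5]

n = len(sample)
xbar = statistics.mean(sample)   # sample mean, Xbar
s = statistics.stdev(sample)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)            # standard error of Xbar, s/sqrt(n)

mu0 = 5.0                        # hypothetical value of the population mean
t_stat = (xbar - mu0) / se       # standardized Xbar, using s in place of sigma
df = n - 1                       # degrees of freedom of the t-distribution

print(f"Xbar = {xbar:.4f}, s = {s:.4f}, s.e. = {se:.4f}")
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Note that statistics.stdev is the sample version (denominator n - 1); statistics.pstdev would be the population version and is not what the standard error uses.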
• There is a t-distribution for every value of the
sample size; we’ll use t(k) to stand for the
particular t-distribution with k degrees of
freedom. There are some properties of these
t-distributions that we should note…
• Every t-distribution looks like a N(0,1) distribution; i.e., it
is centered and symmetric around 0 and has the same
characteristic “bell” shape. However, the standard
deviation of t(k), sqrt(k/(k-2)) for k > 2, is greater than 1,
the s.d. of Z, so the t(k) density curve is more spread out
than Z. As k increases, t(k) gets closer to N(0,1).
Probabilities involving r.v.s that have the t(k)
distributions are given by areas under the t(k) density
curve; Table D in the back of our book gives us the
probabilities we need…
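A quick numerical look at the spread of t(k), as a sketch. The helper name t_sd is hypothetical; it simply evaluates sqrt(k/(k-2)), which is only defined for k > 2.

```python
import math


def t_sd(k):
    """Standard deviation of the t(k) distribution, sqrt(k/(k-2)), for k > 2."""
    if k <= 2:
        raise ValueError("the standard deviation of t(k) requires k > 2")
    return math.sqrt(k / (k - 2))


# The s.d. is always above 1 (the s.d. of Z) and shrinks toward 1 as k grows.
for k in (3, 5, 10, 30, 100):
    print(f"t({k}): s.d. = {t_sd(k):.4f}")
```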
• The good news is that everything we’ve already
learned about constructing confidence intervals
and testing hypotheses about the population mean μ
carries through when the population standard
deviation σ is unknown, with t in place of Z…
• So e.g., a 95% confidence interval for μ based
on a SRS from a population with unknown σ is
Xbar +/- t*(s.e.(Xbar))
Recall that s.e.(Xbar) = s/sqrt(n). Here t* is the
appropriate tabulated value from Table D for the
t(n-1) distribution, so that the area between -t*
and +t* is .95
• As we did before, if we change the level of
confidence then the value of t* must change
appropriately…
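As a sketch of how the interval Xbar +/- t*(s.e.(Xbar)) could be computed without a table: the code below finds t* by numerically integrating the t(k) density (Simpson's rule) and bisecting. This is one possible stand-in for Table D, not the textbook's method; the function names and the sample are hypothetical.

```python
import math
import statistics


def t_pdf(x, k):
    """Density of the t(k) distribution at x."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    return math.exp(log_c) / math.sqrt(k * math.pi) * (1 + x * x / k) ** (-(k + 1) / 2)


def central_area(t, k, steps=2000):
    """Area under the t(k) density between -t and +t (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, k) + t_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 2 * total * h / 3  # area on [0, t], doubled by symmetry


def t_star(conf, k):
    """Critical value t* such that the area between -t* and +t* is conf."""
    lo, hi = 0.0, 1.0
    while central_area(hi, k) < conf:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):  # bisection on the central area
        mid = (lo + hi) / 2
        if central_area(mid, k) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(sample, conf=0.95):
    """Xbar +/- t* * s/sqrt(n), with t* taken from t(n-1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = t_star(conf, n - 1) * se
    return xbar - margin, xbar + margin


# Hypothetical sample, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 4.7, 5.5]
for conf in (0.90, 0.95, 0.99):
    low, high = t_interval(sample, conf)
    print(f"{conf:.0%} CI: ({low:.3f}, {high:.3f})")
```

Raising the confidence level raises t*, so the interval gets wider for the same data.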
• Similarly, we may test hypotheses using this
t-distributed standardized Xbar… e.g., to test the
H0: μ = μ0 against Ha: μ > μ0 we use
(Xbar - μ0)/(s/sqrt(n)) which has a
t-distribution with n-1 df, assuming the null
hypothesis is true. See page 422 (7.1, 3/7) for a
complete summary of hypothesis testing in the
case of “the one-sample t-test” …
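A sketch of the one-sided test of H0: mean = mu0 against Ha: mean > mu0. The P-value is the area under the t(n-1) density to the right of the observed t, computed here by Simpson's rule instead of a table lookup. The function names, the sample, and mu0 = 5.0 are all hypothetical.

```python
import math
import statistics


def t_pdf(x, k):
    """Density of the t(k) distribution at x."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    return math.exp(log_c) / math.sqrt(k * math.pi) * (1 + x * x / k) ** (-(k + 1) / 2)


def upper_tail(t, k, steps=2000):
    """P(T >= t) for T ~ t(k), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    half = total * h / 3  # area between 0 and |t|
    return 0.5 - half if t >= 0 else 0.5 + half


def one_sample_t_test(sample, mu0):
    """Return (t statistic, df, one-sided P-value for Ha: mean > mu0)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - mu0) / se
    return t_stat, n - 1, upper_tail(t_stat, n - 1)


# Hypothetical sample and null value, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 4.7, 5.5]
t_stat, df, p_value = one_sample_t_test(sample, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}, one-sided P-value = {p_value:.4f}")
```

For Ha: mean < mu0 the P-value would be the lower tail instead, and for a two-sided alternative it would be twice the tail area beyond |t|.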
• HW: Read section 7.1 thru p. 433; go over all the
examples carefully and answer the HW questions
following them: #7.1-7.9. Work on the following
problems (p.441 ff) (use software as needed):
#7.15-7.22, 7.25, 7.32, 7.35-7.37, 7.41.
Is there a difference in aggressive behavior of patients on
"moon days" compared with "non-moon days"?
• To summarize the analysis:
– when the data comes in matched pairs, the analysis is
performed on the differences between the paired
measurements
– then use the t-statistic with n-1 d.f. (n = # of pairs) to
construct confidence intervals and test hypotheses on
the true mean difference.
• In a matched pairs design, subjects are matched
in pairs and the outcomes are compared within
each matched pair. A coin toss could determine
which of the two subjects gets the treatment and
which gets the control… One special kind of
matched pairs design is when a subject acts as
his/her own control, as in a before/after study…
See example 7.7 on page 428ff (7.1, 4/7). Note
that the paired observations (# of aggressive
behaviors) are subtracted and the difference in
scores becomes the single number analyzed
with a one-sample t-statistic with n-1 df, where
n=the number of pairs… see the top of page 431
and the next page for a summary of the process.
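The matched pairs process can be sketched in a few lines: subtract within each pair, then treat the differences as a single sample. The function name paired_t and the data are hypothetical; the numbers are invented and are not the data of Example 7.7.

```python
import math
import statistics


def paired_t(first, second):
    """Return (differences, t statistic, df) for H0: true mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("matched pairs need the same number of observations")
    diffs = [a - b for a, b in zip(first, second)]  # one number per pair
    n = len(diffs)                                  # n = number of pairs
    se = statistics.stdev(diffs) / math.sqrt(n)     # s.e. of the mean difference
    t_stat = statistics.mean(diffs) / se
    return diffs, t_stat, n - 1


# Hypothetical paired measurements, invented for illustration only.
treatment = [12, 15, 9, 14, 11, 13]
control = [10, 12, 9, 11, 8, 12]

diffs, t_stat, df = paired_t(treatment, control)
print("differences:", diffs)
print(f"t = {t_stat:.3f} with {df} df")
```

The two original columns are never analyzed separately; only the differences enter the t-statistic, which is why the degrees of freedom are (number of pairs) - 1.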
• HW: Read through p.433. Go over Example 7.7,
then do #7.32, 7.35, 7.41.
• Read the section on Robustness of the t
procedures (starting p.432 (7.1, 5/7))… note
the definition of the statistical term robust –
essentially, a statistic is robust if it is insensitive
to violations of the assumptions made when the
statistic is used. For example, the t-statistic
requires normality of the population… how
sensitive is the t-statistic to violations of
normality?? Look at the practical guidelines for
inference on a single mean at bottom of p.432…
– If the sample size is < 15, use the t procedures if the
data are close to normal.
– If the sample size is >= 15 then unless there is strong
non-normality or outliers, t procedures are OK
– If the sample size is large (say n >= 40) then even if
the distribution is skewed, t procedures are OK
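The three guidelines above can be written as a small decision rule. This is only a literal encoding of the bullets, under the assumption that the analyst has already judged the shape of the data (e.g. from a plot) and passes those judgments in as True/False flags; the function and parameter names are hypothetical.

```python
def t_procedures_ok(n, close_to_normal, strong_nonnormality_or_outliers):
    """Apply the sample-size guidelines for using t procedures on one mean."""
    if n >= 40:
        # Large sample: OK even if the distribution is skewed.
        return True
    if n >= 15:
        # Moderate sample: OK unless strong non-normality or outliers.
        return not strong_nonnormality_or_outliers
    # Small sample (n < 15): OK only if the data are close to normal.
    return close_to_normal


print(t_procedures_ok(10, close_to_normal=True, strong_nonnormality_or_outliers=False))
print(t_procedures_ok(20, close_to_normal=False, strong_nonnormality_or_outliers=True))
print(t_procedures_ok(45, close_to_normal=False, strong_nonnormality_or_outliers=False))
```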