MT2004
Olivier GIMENEZ
Telephone: 01334 461827
E-mail: [email protected]
Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html
So far, we have used data-driven statistical methods, i.e. used the data to
answer questions in direct ways
The rest of the module - from Section 7 - deals with
CAPTURING PATTERNS IN THE DATA USING MODELS:
Modelling Step
ANALYSING THESE MODELS TO ANSWER QUESTIONS:
Estimation and Inference Step
7. Basic Normal theory & the
Central Limit Theorem
7.1 Basic properties of normal distributions
See Section 2.2.2
[Figure: normal density curves f(y) for (µ = 0, σ = 1), (µ = 3, σ = 0.5), (µ = 3, σ = 1) and (µ = 1, σ = 2), plotted for x between −4 and 7]
7.1.1 Linear transformation of a normal r.v.
Let X be a random variable with mean µ and variance σ²
Let Y = a + b X, then
E(Y) = ?
V(Y) = ?
7.1.1 Linear transformation of a normal r.v.
Let X be a random variable with mean µ and variance σ²
Let Y = a + b X, then
E(Y) = a + b E(X) = a + bµ
V(Y) = b² V(X) = b²σ²
7.1.1 Linear transformation of a normal r.v.
Now, if X is normally distributed with mean µ and variance σ², then
Y = a + b X is normally distributed too.
In other words, any linear transformation of a normal random variable is
again a normal random variable.
And more precisely, according to the previous slide,
X ~ N(µ, σ²) ⇒ Y = a + bX ~ N(a + bµ, b²σ²)
Proof: homework
7.1.1 Linear transformation of a normal r.v.
Now, suppose that X ~ N(µ, σ²), and consider Z = (X − µ)/σ.
What is the distribution of Z?
7.1.1 Linear transformation of a normal r.v.
Write Z = (X − µ)/σ = (−µ/σ) + (1/σ)X
and remember that X ~ N(µ, σ²) ⇒ Y = a + bX ~ N(a + bµ, b²σ²),
so by identification, with a = −µ/σ and b = 1/σ, we obtain
E(Z) = −µ/σ + µ/σ = 0 and V(Z) = σ²/σ² = 1
Finally Z ~ N(0, 1)
7.1.1 Linear transformation of a normal r.v.
Result: X ~ N(µ, σ²) ⇒ Z = (X − µ)/σ ~ N(0, 1)
7.1.1 Linear transformation of a normal r.v.
Very useful result for working out probabilities associated with
any normal distribution.
Idea: transform to the standard normal distribution N(0,1) and
use the published tables for probabilities associated with N(0,1).
E.g.:
z        0.0      0.5      1.0      2.0      2.5      3.0
Pr(Z>z)  0.5000   0.30854  0.15866  0.02275  0.00621  0.00135
(See Table 5 of the K & Y Tables)
7.1.1 Linear transformation of a normal r.v.
Example:
Calculate the probability that a random variable X ~ N(3, 4) takes
a value between 4 and 5
7.1.1 Linear transformation of a normal r.v.
Example:
Calculate the probability that a random variable X ~ N(3, 4) takes
a value between 4 and 5
We wish to compute Pr(4 ≤ X ≤ 5).
Using the result above, we have that Z = (X − 3)/2 ~ N(0, 1)
So Pr(4 ≤ X ≤ 5) = Pr((4−3)/2 ≤ (X−3)/2 ≤ (5−3)/2) = Pr(1/2 ≤ Z ≤ 1)
Finally Pr(4 ≤ X ≤ 5) = Pr(Z ≤ 1) − Pr(Z ≤ 1/2)
= (1 − 0.15866) − (1 − 0.30854) = 0.14988
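This probability can be checked directly in R (a quick sketch; note that pnorm takes the standard deviation, not the variance):
> pnorm(5, mean=3, sd=2) - pnorm(4, mean=3, sd=2)   # P(4 <= X <= 5)
[1] 0.1498823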
7.1.2 Sums of independent normal random
variables
Sums of (normal) r.v's occur frequently in statistical theory (e.g.
mean, variance...). Distribution?
If X1, X2 independent with Xi ~ N(µi, σi²), i = 1, 2,
then X1 + X2 ~ N(µ1 + µ2, σ1² + σ2²)
Extension: X1,…,Xn independent r.v.'s with Xi ~ N(µi, σi²),
i = 1,…,n, and a1,…,an constants, then a1X1 + … + anXn ~ N(Σaiµi, Σai²σi²)
7.1.2 Sums of independent normal random
variables
In particular, if X1,…,Xn are i.i.d. N(µ, σ²), then taking ai = 1/n gives the
sample mean: X̄ = ΣXi/n ~ N(µ, σ²/n)
7.2 The Central Limit Theorem
We've just seen that the mean of n independent identically normally
distributed r.v's is itself normally distributed.
The Central Limit Theorem (CLT) states that
the mean of n i.i.d. r.v's from any distribution is
approximately normally distributed for large
enough n.
7.2 The Central Limit Theorem
More formally: if X1,…,Xn are i.i.d. r.v.'s with mean µ and variance σ²,
then for large enough n,
X̄ = ΣXi/n ≈ N(µ, σ²/n), i.e. (X̄ − µ)/(σ/√n) ≈ N(0, 1),
whatever the distribution of the Xi.
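As a quick illustration (a sketch; the exponential distribution and the sample sizes are arbitrary choices), we can simulate sample means in R from a decidedly non-normal distribution and check that their histogram looks bell-shaped:
> # means of 1000 samples of size n = 30 from an exponential distribution
> means = replicate(1000, mean(rexp(30, rate=1)))
> hist(means)   # roughly bell-shaped around the true mean 1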
7.2 The Central Limit Theorem
Example: A bridge can hold at most 400 vehicles if they are bumper-to-bumper and stationary. The mean weight of vehicles using the
bridge is 2.5 tonnes with a standard deviation of 2.0 tonnes. What is
the probability that the maximum design load of 1100 tonnes will be
exceeded in a traffic jam?
7.2 The Central Limit Theorem
Example: A bridge can hold at most 400 vehicles if they are bumper-to-bumper and
stationary. The mean weight of vehicles using the bridge is 2.5 tonnes with a
standard deviation of 2.0 tonnes. What is the probability that the maximum design
load of 1100 tonnes will be exceeded in a traffic jam?
Let Xi be the weight of a vehicle, i = 1,…,n. Here, we have that n = 400,
µ = 2.5 t and σ = 2.0 t.
The probability that the maximum design load of 1100 tonnes will be
exceeded in a traffic jam is given by Pr(ΣiXi > 1100).
We'd like to use the CLT: X1,…,Xn i.i.d. r.v.'s with mean µ and variance σ², so
ΣiXi ≈ N(nµ, nσ²) = N(1000, 1600)
Hence Pr(ΣiXi > 1100) ≈ Pr(Z > (1100 − 1000)/√1600) = Pr(Z > 2.5) = 0.00621
How to use Tables?
Table of the Standard Normal Distribution
values inside the table = areas under Z ~ N(0,1) between −∞ and z,
i.e. Φ(z) = P(Z ≤ z)
Example 1: to determine Φ(1.96) = P(Z ≤ 1.96), i.e. the area under the
curve between −∞ and 1.96, look in the intersecting cell for the row
labelled 1.90 and the column labelled 0.06. The area under the curve
is 0.975.
Example 2: Find z such that Φ(z) = 0.95. P(Z ≤ 1.64) = 0.9495 and
P(Z ≤ 1.65) = 0.9505, so that z = 1.645.
Example 3: Φ(−1.23) = 1 − Φ(1.23) = 1 − 0.8907 = 0.1093
7.3 Approximating other distributions by normal distributions
The CLT also provides the justification for approximating several other
distributions by a normal distribution.
We consider two examples, the Binomial and the Poisson distributions.
The Binomial probability distribution is:
P(X = x) = (n choose x) pˣ(1 − p)ⁿ⁻ˣ, x = 0, 1, …, n
It becomes hard to evaluate for large n as the factorials in the
binomial coefficient 'explode'.
However, the CLT can be used to overcome this problem
7.3 Approximating other distributions by normal distributions
We first note that if X ~ Bin(n,p), then X can be written as a sum of n
independent Bernoulli r.v.'s: X = X1 + … + Xn, where Xi ~ Bin(1,p).
Each Xi has mean p and variance p(1 − p).
Thus the CLT implies that X ≈ N(np, np(1 − p))
Alternatively: (X − np)/√(np(1 − p)) ≈ N(0, 1)
In real life, the approximation will be good enough when np and
n(1 − p) are both reasonably large (a common rule of thumb is at least 5)
7.3 Approximating other distributions by normal distributions
X  Bin(n,p); X = X1 + ... + Xn, where Xi  Bin(1,p); each Xi has mean
p and variance p(1-p).
Alternatively:
Example: The probability of annual survival of a bird species is 0.4.
Suppose we are studying a population of n = 200 individuals. What is
the probability that less than 50% of the population survives the current
year?
7.3 Approximating other distributions by normal distributions
Example: The probability of annual survival of a bird species is 0.4. Suppose we are studying a
population of n = 200 individuals. What is the probability that less than 50% of the population
survives the current year?
Let Xi be the random variable 'individual i survives the year'; we have
that Xi ~ Bin(1, 0.4); each Xi has mean 0.4 and variance 0.4(1 − 0.4). Then X
= X1 + … + Xn is the total number of surviving individuals, X ~ Bin(200, 0.4).
As: np = 200 × 0.4 = 80 and np(1 − p) = 200 × 0.4 × 0.6 = 48
Via the CLT: X ≈ N(80, 48)
So that P(X < 100) ≈ P(Z < (100 − 80)/√48) ≈ P(Z < 3)
7.3 Approximating other distributions by normal distributions
Example: The probability of annual survival of a bird species is 0.4. Suppose we are studying a
population of n = 200 individuals. What is the probability that less than 50% of the population
survives the current year?
So that P(X < 100) ≈ P(Z < 3)
with Z ~ N(0,1). Using tables for the standard normal distribution,
we have that P(Z < 3) = 0.9987.
Without invoking the CLT, we would need to compute
P(X<100) = P(X=0) + P(X=1) + ... + P(X=99)
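In R, the exact Binomial computation is one call, and we can compare it with the CLT approximation (a quick sketch):
> pbinom(99, 200, 0.4)          # exact P(X < 100) = P(X <= 99)
> pnorm((100 - 80)/sqrt(48))    # CLT approximation, about 0.998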
7.3 Approximating other distributions by normal distributions
The Poisson probability function is:
P(X = x) = e⁻�James λˣ/x!, x = 0, 1, 2, … with parameter λ > 0
It becomes hard to evaluate for high values of λ as x! gets huge.
However, the CLT can be used to overcome this problem.
We note first that if X1, X2 are independent with Xi ~ Pois(λi), i = 1, 2, then
X1 + X2 ~ Pois(λ1 + λ2) (to be proved in Honours)
7.3 Approximating other distributions by normal distributions
We note that if X ~ Pois(λ), then X can be written as a sum of n
independent Poisson r.v.'s: X = X1 + … + Xn, where Xi ~ Pois(λ/n).
Each Xi has mean λ/n and variance λ/n (mean = variance: homework)
Thus the CLT implies that X ≈ N(λ, λ)
So that (X − λ)/√λ ≈ N(0, 1)
7.3 Approximating other distributions by normal distributions
Example: Find the probability that a Poisson distributed r.v. with mean
25 takes a value in the range 26 to 30.
7.3 Approximating other distributions by normal distributions
Example: Find the probability that a Poisson distributed r.v. with mean 25 takes a
value in the range 26 to 30.
If X  Pois(25), we need to calculate P(26  X  30)
The CLT tells us that
Using tables for the standard normal distribution, we have that
P(26  X  30) = P(0.2  Z  1) = P(Z  1) - P(Z  0.2)
= 0.8413 – 0.5793 = 0.262
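Again, a quick check in R against the exact Poisson probabilities (a sketch):
> ppois(30, 25) - ppois(25, 25)   # exact P(26 <= X <= 30)
> pnorm(1) - pnorm(0.2)           # CLT approximation, = 0.262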
8. Practical Applications of
Normal Distributions
Why are normal distributions so important?
1 – The CLT shows that sums of i.i.d. r.v.'s tend towards normality,
even if the r.v.'s are non-normal
2 – Many data sets are well modelled by a normal distribution (i.e. it
describes the data adequately): heights of people, IQ scores…
3 – Easy to work with mathematically (integrals, tables…)
4 – Statistical procedures based on a normality assumption are often
insensitive to small violations of the assumption (e.g. ANOVA, see a
later section)
5 – Non-normal distributions can often be transformed to approximate
normality
8.1 Testing for normality
Before using the normal distribution as a model for data, e.g. to perform
tests about the mean of a population, we need to decide whether
or not the random sample under investigation could have been drawn
from a normal distribution.
There are analytical tests (Pearson, Kolmogorov…) but we will focus
on a graphical method here.
We won't be able to prove normality; we can only fail to reject the
hypothesis that the data come from a normal distribution
(hypothesis testing philosophy, finite random sample)
8.1 Testing for normality
First idea: use a histogram, and compare with what we would
expect for a normal distribution, i.e. bell-shaped, symmetric with a
single peak (unimodal)
…
8.1 Testing for normality
[Figure: histograms of random samples
(n = 30) from N(3, var = 25) vs the
density curve of N(Σxi/n, s²)]
Difficult to conclude normality,
because of sampling variability
8.1 Testing for normality
Second idea:
Remember that any normal r.v. is a linear transformation of a
standard normal r.v.
So if y1,…,yn is a random sample from any normal r.v. (N(µ, σ²),
say) and z1,…,zn a random sample from a N(0,1),
then plot the sorted y values against the sorted z values.
We would get something close to a straight line because
Y = µ + σZ
8.1 Testing for normality
[Figure: plots of sorted random samples
(n = 30) from N(3, var = 25)
against sorted samples from N(0,1)]
Difficult to conclude normality,
because of sampling variability
8.1 Testing for normality
Third idea: to overcome the problem of variability, use an
‘idealised’/theoretical average sample from N(0,1), the normal
scores
8.1 Testing for normality
If Z  N(0,1), by definition of the cumulative distribution
function/(lower) quantile, we have that:
P(Z  z10% quantile) = Φ(z10% quantile) = 0.10
P(Z  z20% quantile) = Φ(z20% quantile) = 0.20
…
P(Z  z100% quantile) = Φ(z100% quantile) = 1.00
Meaning that, on average, we expect 10% of the data points to lie
below the 10% quantile of the c.d.f., 20% below the 20% quantile,
…, and 100% below the 100% quantile.
8.1 Testing for normality
Consider a sample of 10 points e.g. from N(0,1)
We’ve got 10 probability intervals corresponding to the
quantiles:
[0,0.1], [0.1,0.2], [0.2,0.3], [0.3,0.4], [0.4,0.5]
[0.5,0.6], [0.6,0.7], [0.7,0.8], [0.8,0.9], [0.9,1.0]
For convenience, consider the mid-point of each interval
(i-0.5)/10, i=1,…,10
The normal scores are obtained by computing Φ⁻¹((i − 0.5)/10), i = 1,…,10, where Φ is the c.d.f. of the N(0,1)
8.1 Testing for normality
Consider a sample of 10 points e.g. from N(0,1)
Φ⁻¹((1 − 0.5)/10) = Φ⁻¹(0.05) = −1.645
8.1 Testing for normality
Consider a sample of 10 points e.g. from N(0,1)
Φ⁻¹((2 − 0.5)/10) = Φ⁻¹(0.15) = −1.036
8.1 Testing for normality
Consider a sample of 10 points e.g. from N(0,1)
Finally…
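In R, all ten normal scores come from one call to qnorm (a quick sketch):
> qnorm((1:10 - 0.5)/10)   # normal scores for a sample of size 10
[1] -1.6448536 -1.0364334 -0.6744898 -0.3853205 -0.1256613  0.1256613
[7]  0.3853205  0.6744898  1.0364334  1.6448536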
8.1 Testing for normality
Idea: plot the sorted observed y values against the normal scores,
then check visually for linearity
Example: Early in the 20th century, a Colonel L.A. Waddell
collected 32 skulls from Tibet. He collected 2 groups: 17 from
graves in Sikkim and 15 from a battlefield near Lhasa. Here are
maximum skull length measurements (in mm) for the Lhasa group:
182, 180, 191, 184, 181, 173, 189, 175, 196, 200, 185, 174, 195,
197, 182. Before doing anything with these data (e.g. testing for a
difference in the mean skull length between the 2 groups), we need
to check for normality first.
Example

Sample     Sorted sample   i = 1,…,15   (i−0.5)/15   Φ⁻¹((i−0.5)/15)
quantiles  quantiles
182        173              1           0.03         −1.83
180        174              2           0.10         −1.28
191        175              3           0.17         −0.97
184        180              4           0.23         −0.73
181        181              5           0.30         −0.52
173        182              6           0.37         −0.34
189        182              7           0.43         −0.17
175        184              8           0.50          0.00
196        185              9           0.57          0.17
200        189             10           0.63          0.34
185        191             11           0.70          0.52
174        195             12           0.77          0.73
195        196             13           0.83          0.97
197        197             14           0.90          1.28
182        200             15           0.97          1.83
Example
x = seq(1, 15)
y = qnorm((x-0.5)/15)  # calculates Φ⁻¹((i−0.5)/15), theoretical quantiles
o = c(173,174,175,180,181,182,182,184,185,189,191,195,196,197,200)
plot(y, o, xlab="Theoretical quantiles", ylab="Sample quantiles")
Example
Alternatively, use R command qqnorm
skull.lhasa = c(182,180,191,184,181,173,189,175,196,200,185,174,195,197,182)
qqnorm(skull.lhasa)
Example
To help check linearity of a normal scores plot, add a straight line.
‘qqline’ adds a line to a normal quantile-quantile plot which passes through the
first and third quartiles.
qqnorm(skull.lhasa)
qqline(skull.lhasa)
Further examples
Sample from a distribution with more probability in the tails and centre
of the distribution and less in the 'shoulders' than the normal
distribution
Further examples
Sample from a positively skewed distribution
Further examples
Sample from a negatively skewed distribution
Further examples
Sample from a normal distribution
8.2 Using a normal distribution as a model
when the variance 2 is known (z-test)
We wish to learn something about a population on the basis of a
random sample from that population
Why? Because it is often impractical to work with the whole
population, so we test hypotheses about the population on the basis
of a sample drawn from it
We assume here that the population may be modelled using a normal
distribution with unknown mean µ but known variance σ²
For example, suppose one wants to investigate the IQ of students at
St Andrews. As you cannot study the whole population, you need to
measure the IQs of a random sample of students.
In this example, we could ask what the average IQ of St Andrews
students is, or test the hypothesis that it is greater than some value
8.2.1 Hypothesis testing: parametric approach
General approach for hypothesis testing:
1 – Define a null hypothesis H0 and an alternative hypothesis H1
2 – Choose a test statistic which will distinguish between H0 and H1
by taking ‘extreme’ values if H1 is true, and moderate values
otherwise
3 – Find the distribution of the test statistic under H0
4 – Using this distribution, determine the probability of obtaining a
test statistic at least as ‘extreme’ as the one observed under H0. This
is the p-value of the data under the test
5 – Conclude: a very low p-value suggests that H0 is false, otherwise
we cannot reject H0
8.2.1 Hypothesis testing: parametric approach
And in the particular case of a normally distributed population with
known variance?
Define x1,…,xn a set of independent observations from a population
which can be modelled by a normal distribution with unknown mean
µ and known variance σ².
Using these observations, we are able to test hypotheses about the
mean µ of the whole population, e.g.
1 – H0: µ = µ0 against H1: µ ≠ µ0
2 – A 'good' test statistic is the mean of the observations Σxi/n,
which will tend to be close to µ0 under H0 but further away under H1
8.2.1 Hypothesis testing: parametric approach
1 – H0: µ = µ0 against H1: µ ≠ µ0 - TWO-SIDED TEST
2 – A 'good' test statistic is the mean of the observations Σxi/n,
which will tend to be close to µ0 under H0 but further away under H1
3 – Distribution of the test statistic under H0 ?
8.2.1 Hypothesis testing: parametric approach
3 – The distribution of the test statistic under H0 is
X̄ = ΣXi/n ~ N(µ0, σ²/n), i.e. Z = (X̄ − µ0)/(σ/√n) ~ N(0, 1)
4 - We use a significance level of α = 5%, and extreme values on
either side of the mean are of interest since H1 does not distinguish
between them
[Figure: standard normal density, with acceptance region between −zα/2 and zα/2]
The appropriate range of values is P(−zα/2 ≤ Z ≤ zα/2) = 1 − α = 0.95, i.e.
P(Z ≤ zα/2) − P(Z ≤ −zα/2) = 2P(Z ≤ zα/2) − 1 = 0.95
P(Z ≤ zα/2) = 0.975 so zα/2 = 1.96
8.2.1 Hypothesis testing: parametric approach
4 - At the significance level α = 5%, the region of acceptance of H0
is [−1.96, 1.96]
5 - So if the observed value of the test statistic is outside this region,
we reject H0; otherwise we cannot.
ALTERNATIVELY,
4 - We can calculate what proportion of values are at least as
improbable as the observed value under H0, i.e. the p-value.
In other words, we want to compute the p-value
P(Z ≤ −|zobs| or Z ≥ |zobs|) = 2(1 − Φ(|zobs|))
5 - Finally, if the p-value is < 0.05, we reject H0 (outside the
acceptance region); otherwise we cannot.
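In R, for an observed value of the test statistic, the two-sided p-value is computed as follows (a sketch; zobs = 2.1 is just an illustrative value):
> zobs = 2.1
> 2*(1 - pnorm(abs(zobs)))   # two-sided p-value
[1] 0.03572884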
8.2.1 Hypothesis testing: parametric approach
H0: µ = µ0 against H1: µ > µ0 - ONE-SIDED TEST
[Figure: standard normal density, with rejection region above zα]
P(Z ≤ zα) = 1 − α = 0.95, so zα = 1.645; we accept H0 if zobs < 1.645
ALTERNATIVELY
P(Z ≥ zobs) = 1 − Φ(zobs), and reject H0 if this p-value is < 0.05
8.2.1 Hypothesis testing: parametric approach
H0: µ = µ0 against H1: µ < µ0 - ONE-SIDED TEST
[Figure: standard normal density, with rejection region below −zα]
P(Z ≥ −zα) = 1 − α = 0.95, so −zα = −1.645; we accept H0 if zobs > −1.645
ALTERNATIVELY
P(Z ≤ zobs) = Φ(zobs), and reject H0 if this p-value is < 0.05
8.2.1 Hypothesis testing: parametric approach
Example - Tutorial 4, Question 5a
Test at the 5% level the hypothesis that the mean spinning time has
not been affected by lubrication, against the alternative that it has
been increased. Report the p-value.
Test H0: µ = 150 against H1: µ > 150. We have that n = 5 and Σxi/n = 162.
Hence, with the known standard deviation σ = 10 from the question:
zobs = (162 − 150)/(10/√5) = 2.68 > z0.05 = 1.645
We thus reject H0 at the 5% significance level.
The p-value is 1 − Φ(2.68) = 1 − 0.9963 = 0.0037 << 0.05, so we reject
H0
8.2.2 The power of a test
There are two types of error that can be made when hypothesis
testing:
                 REALITY
DECISION         H0 true         H0 false
Accept H0        OK              Type II error
Reject H0        Type I error    OK
Type I error: Reject H0 when it is true; P(type I error) = α
Type II error: Accept H0 when it is false; P(type II error) = ?
We'd like to minimize both types of error, but the problem is that
when α decreases, P(type II error) increases and vice-versa.
So we fix α at a given value, and try to minimize P(type II error)
8.2.2 The power of a test
Definition: The power of a test is the probability of rejecting H0
when it is false.
P(type II error) = P(Accept H0 | H0 false), so power = 1 − P(type II
error).
So once α is fixed, we'll do our best to increase the power of the test.
Example: Consider H0: µ = µ0 vs H1: µ > µ0
8.2.2 The power of a test
If the true mean is µ1 > µ0, the power of this one-sided z-test is
1 − Φ(zα − √n(µ1 − µ0)/σ)
Note: If n increases, Φ(zα − √n(µ1 − µ0)/σ) decreases and the power
increases; in other words, for a given α, the power increases as the
sample size increases, and so does your ability to detect an
alternative hypothesis
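For illustration, the power is easily computed in R (a sketch; µ0 = 0, µ1 = 1, σ = 2 and n = 25 are made-up values):
> alpha = 0.05; mu0 = 0; mu1 = 1; sigma = 2; n = 25
> 1 - pnorm(qnorm(1 - alpha) - sqrt(n)*(mu1 - mu0)/sigma)   # power, about 0.80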
8.2.3 Confidence intervals
You have already calculated confidence intervals by computer-intensive
methods. Here, we'll do it analytically.
An x% confidence interval for a parameter (µ here) is an interval
having probability x% of including the true value of this parameter.
The parameter µ being estimated is a fixed quantity, but the interval
is random.
In other words, it means that x% of intervals calculated in the same
way for similar samples will include the true value.
Regarding the example of n observations from a normal population
with known variance σ²
8.2.3 Confidence intervals
Example of n observations from a normal population with known
variance σ²: since (X̄ − µ)/(σ/√n) ~ N(0,1),
P(X̄ − zα/2 σ/√n ≤ µ ≤ X̄ + zα/2 σ/√n) = 1 − α,
so a 95% confidence interval for µ is x̄ ± 1.96 σ/√n
8.2.3 Confidence intervals
Example: The wavelengths of light pulses from a semiconductor
laser are approximately normally distributed, with variance
calculated by theory to be 100nm2. The mean wavelength for
individual lasers varies. Measurements of 100 pulses from a laser
give an average wavelength of 598nm. Find a 95% confidence
interval for the mean wavelength of the laser.
8.2.3 Confidence intervals
Example: The wavelengths of light pulses from a semiconductor laser are
approximately normally distributed, with variance calculated by theory to be
100 nm². The mean wavelength for individual lasers varies. Measurements of 100
pulses from a laser give an average wavelength of 598 nm.
Here n = 100, x̄ = 598 and σ = √100 = 10, so the 95% confidence
interval for the mean wavelength of the laser is
598 ± 1.96 × 10/√100 = 598 ± 1.96 = (596.04, 599.96) nm
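The interval is quickly checked in R (sketch):
> 598 + c(-1, 1)*qnorm(0.975)*10/sqrt(100)
[1] 596.04 599.96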
9. Distributions derived from
normal distributions
In the previous section, we assumed that the variance of the whole
population was known
Unlikely to be the case…
So we need methods for the case where both the mean and the variance
of the whole population are unknown
To develop the theory underlying such methods, we first need to
introduce some other distributions related to the normal
distribution
Namely, the χ², t and F distributions
9.1 χ² distributions
Definition: if Z1,…,Zn are i.i.d. N(0,1) r.v.'s, then
X = Z1² + … + Zn² ~ χ²n,
the χ² (chi-squared) distribution with n degrees of freedom;
E(χ²n) = n and V(χ²n) = 2n.
Sums of independent χ² r.v.'s are also χ²: if X ~ χ²n and Y ~ χ²m are
independent, then X + Y ~ χ²n+m (additivity).
9.1 χ² distributions
Upper quantile = value above which some specified proportion of
the area of a p.d.f. lies
9.1 2 distributions
The 5% upper quantile of a 25 is x such Pr(25  x) = 0.05
9.1 2 distributions
The 5% upper quantile of a 25 is x such Pr(25  x) = 0.05 or
alternatively Pr(25  x) = 0.95 i.e. the lower 95% quantile
9.1 2 distributions
Pr(25  x) = 0.95 (the lower 95% quantile) is obtained using the R
command: > qchisq(0.95,5) # cumulative d. f.
[1] 11.07050
9.1 2 distributions
Example: Suppose that X, Y, and Z are coordinates in 3-dimensional
space which are independently distributed as N(0,1),
with all measurements in cm. What is the probability that the point
(X,Y,Z) lies more than 3 cm from the origin?
9.1 χ² distributions
Example: Suppose that X, Y, and Z are coordinates in 3-dimensional space which
are independently distributed as N(0,1), with all measurements in cm. What is
the probability that the point (X,Y,Z) lies more than 3 cm from the origin?
The squared distance from the origin is X² + Y² + Z² ~ χ²3, so
P(distance > 3) = P(X² + Y² + Z² > 9) = P(χ²3 > 9)
= 1 − pchisq(9, 3) ≈ 0.029
9.2 The F distributions
Definition: if U ~ χ²df1 and V ~ χ²df2 are independent, then
F = (U/df1)/(V/df2) ~ Fdf1,df2,
the F distribution with df1 and df2 degrees of freedom.
9.2 The F distributions
The 5% upper quantile of an Fdf1,df2 is x such that Pr(Fdf1,df2 ≥ x) = 0.05
Use Tables or the R command qf(0.95,df1,df2) (lower 95% quantile)
9.2 The F distributions
So if we have a table with the upper quantiles, we can also get the
lower quantiles as follows.
Remember that:
Upper quantile = value above which some specified proportion
of the area of a p.d.f. lies
Lower quantile = value below which some specified proportion
of the area of a p.d.f. lies
9.2 The F distributions
So if we have a table with the upper quantiles, we can also get the
lower quantiles as follows. If X ~ Fn,k, then 1/X ~ Fk,n, so that
P(Fn,k ≤ 1/x) = P(Fk,n ≥ x),
i.e. the upper (1 − α) quantile of Fn,k (or lower α quantile of Fn,k) is the
inverse of the upper α quantile of Fk,n: Fn,k;1−α = 1/Fk,n;α
9.2 The F distributions
Example: Given that F3,2;0.025 = 39.17, find F2,3;0.975 (i.e. lower 0.025
= 1-0.975 quantile of the F2,3 distribution)
F2,3;0.975 = 1/ F3,2;0.025 = 1/39.17 = 0.0255
R commands
> x = seq(0, 6, by=0.01)   # grid of x values
> par(mfrow=c(2,1))
> plot(x, df(x,2,3), xlab="", ylab="", type='l')
> title("pdf F(2,3)")
> plot(x, df(x,3,2), xlab="", ylab="", type='l')
> title("pdf F(3,2)")
9.3 The t distributions
Definition: if Z ~ N(0,1) and U ~ χ²n are independent, then
T = Z/√(U/n) ~ tn,
the t distribution with n degrees of freedom.
9.3 The t distributions
The shape of the p.d.f. of tn depends on n
9.3 The t distributions
Looks like a normal distribution, but with more of the probability in
the centre and the tails; see the graph for t1, e.g. (top left)
9.3 The t distributions
tn;α is the upper α quantile of the t distribution with n degrees of
freedom
9.3 The t distributions
Use tables or R, e.g. qt(0.95,8) (= 1.859548) gives the lower
95% quantile of the t distribution with 8 degrees of freedom
(i.e. the upper 5% quantile). For large degrees of freedom, the t
quantiles approach the normal ones: qt(0.95,5000) = 1.645158…
10 Using t distributions
To derive the distribution of the statistic testing hypotheses about
the mean of a normal population with unknown variance, we
need a key result on the joint distribution of the sample mean and
the sample variance
Remember that: X̄ = ΣXi/n ~ N(µ, σ²/n), i.e. (X̄ − µ)/(σ/√n) ~ N(0, 1)
10 Using t distributions
Key result: if X1,…,Xn are i.i.d. N(µ, σ²), then the sample mean X̄ and
the sample variance S² = Σ(Xi − X̄)²/(n − 1) are independent, and
(n − 1)S²/σ² ~ χ²n−1
10 Using t distributions
Combining the two results, let
T = (X̄ − µ)/(S/√n) = [(X̄ − µ)/(σ/√n)] / √[((n − 1)S²/σ²)/(n − 1)] ~ tn−1
The quantity T depends on the population mean µ but not on the
unknown variance σ².
So this statistic will be useful to test hypotheses about the mean
of normal populations with unknown variance
10.2 One-sample t-tests and
confidence intervals
One sample t-tests:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
We assume normality.
Question: at the 1% significance level, could this data set be
considered as a random sample from a population with mean 75?
In other words (Step 1 of the hypothesis testing strategy):
H0: µ = 75 against H1: µ ≠ 75
Your turn. Perform step 2 (find a 'good' test statistic) and step 3
(derive its distribution)
10.2 One-sample t-tests and
confidence intervals
One sample t-tests:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
Step 1: H0: µ = 75 against H1: µ ≠ 75
Step 2: ΣXi/n − µ0 is a good candidate since it takes 'extreme'
values if H1 is true, and moderate values if H0 is true.
Step 3: under H0, T = (X̄ − µ0)/(S/√n) ~ tn−1
Step 4: it's a 2-sided test, so we will reject H0 if
tobs ≤ −tn−1;α/2 or tobs ≥ tn−1;α/2 (graphical representation)
10.2 One-sample t-tests and
confidence intervals
One sample t-tests:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
Step 1: H0: µ = 75 against H1: µ ≠ 75
Step 2: ΣXi/n − µ0 is a good candidate since it takes 'extreme'
values if H1 is true, and moderate values if H0 is true.
If one-sided test, H1: µ < µ0, we reject if tobs ≤ −tn−1;α
If one-sided test, H1: µ > µ0, we reject if tobs ≥ tn−1;α
10.2 One-sample t-tests and
confidence intervals
One sample t-tests:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
So we will reject if tobs ≥ 2.7045 or if tobs ≤ −2.7045
P-value using R:
> 2*pt(tobs,38) # (tobs < 0, so we double the c.d.f. at tobs – 2-sided test)
[1] 0.003799049
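The observed value of the test statistic itself can be computed in R from the summary statistics (a sketch):
> tobs = (70.31 - 75)/sqrt(90.219/39)
> tobs              # about -3.08
> 2*pt(tobs, 38)    # the p-value above, 0.0038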
10.2 One-sample t-tests and
confidence intervals
Confidence interval:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
We'd like to build a 99% confidence interval for µ; we're
looking for the values of µ for which we would accept H0
We know that: T = (X̄ − µ)/(S/√n) ~ tn−1
10.2 One-sample t-tests and
confidence intervals
Confidence interval:
39 observations on pulse rates (heart beats/minute) of Indigenous
Peruvians had sample mean 70.31 and sample variance 90.219.
So we would accept any value of µ such that |x̄ − µ|/(s/√n) ≤ tn−1;α/2,
i.e. the 99% confidence interval is
x̄ ± t38;0.005 × s/√n = 70.31 ± 2.7045 × √(90.219/39) = (66.20, 74.42)
75 is outside the confidence interval, so we would reject H0 at the 1% significance level
10.2 One-sample t-tests and
confidence intervals
Confidence interval:
With R, a 95% confidence interval is obtained as follows:
> ciu = 70.31 + qt(0.975,38)*sqrt(90.219/39)
> cil = 70.31 - qt(0.975,38)*sqrt(90.219/39)
> c(cil,ciu)
[1] 67.23099 73.38901
And the 99% confidence interval is obtained as
> c(70.31 - qt(0.995,38)*sqrt(90.219/39), 70.31 + qt(0.995,38)*sqrt(90.219/39))
10.3 Paired t-tests
Consider two samples of observations (Xi,Yi)
Consider the case: the two measurements (Xi,Yi) are made on the
same unit i
We wish to test if the two population means are equal
Example: measurement of left and right wing length of birds
Should not be treated as independent!!!!!
Obviously, length of left wing and length of right wing both tend to
be large for large birds: dependent measurements
Idea: work with the differences between the two measurements on
each unit, i.e. Xi − Yi, in order to go back to a one-sample t-test
10.3 Paired t-tests
Example: corneal thickness in microns for both eyes of patients
who have glaucoma in one eye
Glaucoma  488 478 480 426 440 410 458 460
Healthy   484 478 492 444 436 398 464 476
Obviously, the corneal thickness is likely to be similar in the two
eyes of any patient – dependent observations
Consider di = glaucomai − healthyi. We will assume that this new
random sample is drawn from a normal distribution N(µd, σd²), and
we wish to test: H0: µd = 0 vs H1: µd ≠ 0
Σdi = −32; Σdi² = 936, so d̄ = −4 and
s² = (Σdi² − (Σdi)²/n)/(n − 1) = (936 − 128)/7 = 115.43
10.3 Paired t-tests
Example: corneal thickness in microns for both eyes of patients
who have glaucoma in one eye
H0: d=0 vs H1: d0
di = -32 ; di2 = 936, s2 = 115.43 and t7;0.025 = 2.3646 (see Tables)
tobs > - t7;0.025 and tobs < t7;0.025 meaning that tobs is in the region of
acceptance of H0
t
-t/2
t/2
10.3 Paired t-tests
Example: corneal thickness in microns for both eyes of patients
who have glaucoma in one eye
H0: d=0 vs H1: d0
di = -32 ; di2 = 936, s2 = 115.43 and t7;0.025 = 2.3646 (see Tables)
tobs > - t7;0.025 and tobs < t7;0.025 meaning that tobs is in the region of
acceptance of H0
At the 5% significance level, we fail to reject H0, so there is
apparently no difference between the good eye and the diseased eye
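With the raw data, the whole test is one call to t.test in R (a sketch of the equivalent computation):
> glaucoma = c(488, 478, 480, 426, 440, 410, 458, 460)
> healthy = c(484, 478, 492, 444, 436, 398, 464, 476)
> t.test(glaucoma, healthy, paired=TRUE)   # same as t.test(glaucoma - healthy)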
10.4 Two-sample t-tests
Now, we want to deal with two sets of data and compare, e.g., their
means
We consider that the two random samples are drawn from normal
distributions with unknown but equal variances.
More formally: X1,…,Xn i.i.d. N(µX, σ²) and Y1,…,Ym i.i.d. N(µY, σ²),
the two samples being independent of each other.
10.4 Two-sample t-tests
We consider that the two random samples are drawn from normal
distributions with unknown but equal variances.
We know that the distributions of the sample means of the two
samples are:
X̄ ~ N(µX, σ²/n) and Ȳ ~ N(µY, σ²/m)
so that (using results on sums of normal r.v.'s)
X̄ − Ȳ ~ N(µX − µY, σ²(1/n + 1/m))
As usual, we'd like to relate this distribution to a standard normal
random variable…
10.4 Two-sample t-tests
We consider that the two random samples are drawn from normal distributions
with unknown but equal variances.
We have that:
(X̄ − Ȳ − (µX − µY)) / (σ√(1/n + 1/m)) ~ N(0, 1)
Obviously, if we assume that σ is known, we can test hypotheses
about the difference in means between the two groups (see the one-sample
case – z-test).
But we assume that σ is unknown. So we need to do again what
we've done for the t-test (one-sample test about the mean with
unknown variance).
10.4 Two-sample t-tests
More precisely, first find the distribution of the pooled sample variance.
We note that:
(n − 1)S²X/σ² ~ χ²n−1, where S²X = Σ(Xi − X̄)²/(n − 1)
10.4 Two-sample t-tests
Similarly, we have that:
(m − 1)S²Y/σ² ~ χ²m−1, where S²Y = Σ(Yj − Ȳ)²/(m − 1)
10.4 Two-sample t-tests
Putting the two latter results together, we have that, using the
additivity of χ² r.v.'s:
((n − 1)S²X + (m − 1)S²Y)/σ² ~ χ²n+m−2
Note that the above quantity can be written as (n + m − 2)S²p/σ²,
where:
S²p = ((n − 1)S²X + (m − 1)S²Y)/(n + m − 2)
is called the pooled sample variance.
10.4 Two-sample t-tests
Remember that we have:
(X̄ − Ȳ − (µX − µY))/(σ√(1/n + 1/m)) ~ N(0, 1) and (n + m − 2)S²p/σ² ~ χ²n+m−2,
with the two quantities independent.
10.4 Two-sample t-tests
So let the test statistic T be
T = (X̄ − Ȳ − (µX − µY)) / (Sp√(1/n + 1/m))
which is actually the ratio of the following distributions:
N(0, 1) / √(χ²n+m−2/(n + m − 2))
i.e. a t distribution with n + m − 2 degrees of freedom!
10.4 Two-sample t-tests
Now we can see that T can be re-written as follows:
T = (X̄ − Ȳ − (µX − µY)) / √(S²p(1/n + 1/m))
The quantity T depends on the population means µX and µY but not
on the unknown variance σ².
This statistic is thus useful to test hypotheses about the difference in
means between the 2 populations.
10.4 Two-sample t-tests
Example: Consider two random samples from 2 normal
distributions:
x = 11 10 14 12 13 and y = 8 3 4 9
Test the hypothesis that the two population means are equal against
the alternative hypothesis that they are not.
10.4 Two-sample t-tests
Example: Consider two random samples from 2 normal
distributions:
x = 11 10 14 12 13 and y = 8 3 4 9
Test the hypothesis that the two population means are equal against
the alternative hypothesis that they are not.
We wish to test H0: µX = µY against H1: µX ≠ µY
s²p = (10 + 26)/7 = 36/7, and Σxi/n = 12, Σyj/m = 6
tobs = (12 − 6)/√((36/7)(1/5 + 1/4)) = 3.94 > t7;0.025 = 2.3646
There is evidence to reject H0 at the 5% significance level.
In other words, the two population means are different
10.4 Two-sample t-tests
Using R:
> x=c(11,10,14,12,13)
> y=c(8,3,4,9)
> # pooled standard deviation:
> pooledsd=sqrt(((5-1)*var(x)+(4-1)*var(y))/(5+4-2))
> # observed value of the test statistic:
> tobs=(mean(x)-mean(y))/(pooledsd*sqrt(1/5+1/4))
> tobs
[1] 3.944053
> # p-value of the 2-sided test
> 2*(1-pt(tobs,5+4-2))
[1] 0.005574311
10.5 Testing equality of variances
Motivation: to apply the two-sample t-test of Section 10.4, we need
to check that the two samples come from normal distributions with
the same variance
Consider X1,…,Xn and Y1,…,Ym two random samples drawn from
normal distributions. We also assume independence.
Let 2X and 2Y be the population variances of the two random
samples.
Remember the strategy of hypothesis testing:
Step 1: We wish to test H0: σ²X = σ²Y vs H1: σ²X ≠ σ²Y
Step 2: We need to find a ‘good’ test statistic, i.e. a function of the
data that takes ‘extreme’ values if H1 is true, and moderate values if
H0 is true.
10.5 Testing equality of variances
We've seen that:
(n − 1)S²X/σ²X ~ χ²n−1 and (m − 1)S²Y/σ²Y ~ χ²m−1
So what about the ratio S²X/S²Y ?
10.5 Testing equality of variances
If you work it out a little bit, you get, under H0: σ²X = σ²Y = σ², the
following test statistic:
F = S²X/S²Y
Under the null hypothesis the terms involving σ² cancel.
If the alternative hypothesis is true, i.e. if σ²X ≠ σ²Y, then the value
of the test statistic above will be small or large depending on
whether σ²X < σ²Y or σ²X > σ²Y.
10.5 Testing equality of variances
Step 3: Now we need the distribution of this test statistic under H0.
By definition of an F distribution, we have that:
(S²X/σ²)/(S²Y/σ²) = [((n − 1)S²X/σ²)/(n − 1)] / [((m − 1)S²Y/σ²)/(m − 1)] ~ Fn−1,m−1
that is: S²X/S²Y ~ Fn−1,m−1
or S²Y/S²X ~ Fm−1,n−1,
using the main property of F distributions.
10.5 Testing equality of variances
Step 4: We will reject the null hypothesis if the observed value of
this test statistic is greater than the upper quantile of the appropriate
F distribution (using the Tables or program R).
Note that it is enough to compare the larger of the two test statistics
described on the previous slide with the upper quantile of the
appropriate distribution.
Example: consider two samples, one of size 11 and the other of
size 16, from two normal distributions. The sample variance of the
first is 20 and the sample variance of the second is 30. At the 5%
level, is there evidence to reject the hypothesis that the two
populations have the same variance? Note that F15,10;0.025=3.522
10.5 Testing equality of variances
Example: consider two samples, one of size 11 and the other of size 16, from
two normal distributions. The sample variance of the first is 20 and the
sample variance of the second is 30. At the 5% level, is there evidence to
reject the hypothesis that the two populations have the same variance? Note
that F15,10;0.025 = 3.522
1) We wish to test H0: σ²X = σ²Y vs H1: σ²X ≠ σ²Y, where X has sample
size 11 and Y has sample size 16, with respectively s²X = 20 and
s²Y = 30. This is a test of equality of variances.
2) To perform it, we calculate the observed value of the test
statistic (the larger one): fobs = s²Y/s²X = 30/20 = 1.5
3) We need to compare this observed value to the 2.5% upper
quantile of an F distribution with 15 and 10 degrees of freedom,
i.e. F15,10;0.025, which is equal to 3.522
10.5 Testing equality of variances
Example: consider two samples, one of size 11 and the other of size 16 from
two normal distributions. The sample variance of the first is 20 and the
sample variance of the second is 30. At the 5% level, is there evidence to
reject the hypothesis that the two populations have the same variance? Note
that F15,10;0.025 = 3.522
4) fobs = 1.5 < F15,10;0.025 = 3.522
5) So there is no evidence to reject the null hypothesis. We fail to
reject the equality of variances.
Note: We might now consider testing whether the two population
means are different.
Sections 10.4 and 10.5 might be useful for Prac3 (two random
samples from the same normal distribution?)
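A minimal check of this F-test in R from the summary statistics (a sketch):
> fobs = 30/20              # larger sample variance on top: 1.5
> qf(0.975, 15, 10)         # upper 2.5% quantile F15,10;0.025, = 3.522
> fobs < qf(0.975, 15, 10)  # TRUE: cannot reject equal variances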
Examples
Example 1:
The data are the number of moths caught during the night by 11
traps of one style and 8 traps of a second style.
Trap type 1: 41 34 33 36 40 25 31 37 34 30 38
Trap type 2: 52 57 62 55 64 57 56 55
We assume that the two samples of measurements are taken at
random from a normal population, and we ask if the variances
of the two populations are equal (5% significance level).
Examples
Solution - Example 1:
We wish to test H0 12 = 22 vs H1: 12  22 (test of equality of
variances)
We have that n1 = 11, n2 = 8, so 1=10 and 2=7
s12=21.87 and s22=15.36
So Fobs = s12/s22 = 21.87/15.36=1.42
F10,7;0.025=4.76
Fobs<F10,7;0.025, therefore we cannot reject H0 at the 5% level.
Examples
Example 2:
The data are human blood-clotting times (in minutes) of individuals
given one of two different drugs.
Given drug B: 8.8 8.4 7.9 8.7 9.1 9.6
Given drug G: 9.9 9.0 11.1 9.6 8.7 10.4 9.5
We assume that the two samples of measurements are taken at
random from a normal population, and we ask if blood of
persons treated with drug B has the same mean clotting time as
does blood from persons treated with drug G (5% significance
level).
Examples
Solution - Example 2:
We wish to test H0 1 = 2 vs H1: 1  2 (test of equality of means)
We have that n1 = 6, n2 = 7, (n1-1)s12=1.6950, (n2-1)s22=4.0171. So
s2 = (1.6950+4.0171)/(6+7-2)=0.5193
We get tobs = (8.75-9.74)/(0.40)=-2.475
t11;0.025=2.201
tobs<-t11;0.025, therefore reject H0 at the 5% level.
The p-value of this test is P(T<-2.475)=1-F(2.475) where F cdf of
the t-distribution, and 1-F(2.475) = P(T>2.475) which lies
between 0.025 and 0.01. We double those values (2-sided test),
and we find that the p-value is between 0.05 and 0.02, so we
reject H0 at the 5% significance level.
Examples
Example 3:
The data are weight changes of humans, tabulated after
administration of a drug proposed to result in weight loss. Each
weight change (in kg) is the weight after minus the weight
before drug administration.
Data: 0.2 -0.5 -1.3 -1.6 -0.7 0.4 -0.1 0.0 -0.6 -1.1 -1.2 -0.8
We assume that the sample of measurements is taken at random
from a normal population, and we ask whether a weight loss
occurs after the drug is taken (5% significance level).
Examples
Solution - Example 3:
We wish to test H0 1 = 0 vs H1: 1 < 0 (test of equality of means)
We have that n = 12, s2=0.4008,  Xi/n=-0.61.
We get tobs = (-0.61-0)/(0.4008/12)=-0.61/0.18=-3.389
t11;0. 05=1.7959
tobs<-t11;0.05, therefore reject H0 at the 5% level.
The p-value of this test is P(T<-3.389)=1-F(3.389) where F cdf of
the t-distribution, and 1-F(3.389) = P(T>3.389) which lies
between 0.005 and 0.001, so we reject H0 at the 5% significance
level.
Examples
Example 4:
We consider an experiment designed to test whether a new fertilizer
results in an increase of more than 250kg/ha in crop yield over
the old fertilizer. 9 pairs of test plots were set up with same
environmental conditions (paired samples…).
New fertilizer: 2250 2410 2260 2200 2360 2320 2240 2300 2090
Old fertilizer: 1920 2020 2060 1960 1960 2140 1980 1940 1790
Test the hypothesis at the 5% significance level.
Examples
Solution - Example 4:
We first consider the differences di=newi-oldi
dj: 330 390 200 240 400 180 260 360 300
We wish to test H0 d = 0 vs H1: d > 0 (test of equality of means for
paired samples)
We have that n = 9, s=80.6,  di/n=295.6.
We get tobs = (295.6-250)/(80.6/(12))=1.695
t8;0. 05=1.8595
tobs<t8;0.05, therefore do not reject H0 at the 5% level.
The p-value of this test is P(T>1.695) which lies between 0.1 and
0.05, so we cannot reject H0 at the 5% significance level.