Statistics 512 Notes 8: The Monte Carlo Method
The t-test
Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and unknown
distribution. Consider the hypotheses
$H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$.
If the distribution of the $X_i$ is normal (with unknown
variance), then a test with exact size 0.05 is to use the test
statistic
$t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
and the rejection region $\{t : t \geq t_{\alpha, n-1}\}$ [where $t_{\alpha, n-1}$ is the
$(1-\alpha)$ quantile of the t-distribution with $n-1$ degrees of
freedom, i.e., $\alpha = P(T \geq t_{\alpha, n-1})$]. This is called the t-test.
When the distribution of $X_i$ is normal, the test has exact
size $\alpha$ because when $\mu = \mu_0$,
$t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
has a t-distribution with $n-1$ degrees of freedom.
When the distribution of $X_i$ is not normal, the test does not
necessarily have exact size 0.05. However, for large $n$,
$P_{\mu_0}\left( \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq t_{\alpha, n-1} \right) \approx P_{\mu_0}\left( \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq z_{\alpha} \right) \approx P(Z \geq z_{\alpha}) = \alpha$
because of the Central Limit Theorem, so the t-test has
approximate size 0.05 for large samples for any distribution
of $X_i$.
Note the difference between the rejection rules
$\{t : t \geq t_{\alpha, n-1}\}$ and $\{t : t \geq z_{\alpha}\}$. The large sample rule
$\{t : t \geq z_{\alpha}\}$ has approximate size $\alpha$, while $\{t : t \geq t_{\alpha, n-1}\}$ has
exact size $\alpha$; of course, for the exact size we have to assume that
$X_i$ has a normal distribution. In practice, we may not be
willing to assume that the population is normal. In general,
t critical values are larger than z critical values (i.e.,
$t_{\alpha, n-1} > z_{\alpha}$), so the t-test is conservative relative to the large
sample test. So in practice, many statisticians use the
t-test even if they do not believe the data is normally
distributed. Note that $\lim_{n \to \infty} t_{\alpha, n-1} = z_{\alpha}$.
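As a quick numerical check of these claims, here is a minimal R sketch
(the sample sizes chosen are arbitrary, for illustration only) comparing
the t and z critical values at alpha = 0.05:
alpha=.05;
n=c(5,20,100,1000);
qt(1-alpha,n-1);     # t critical values t_{alpha,n-1}: 2.132, 1.729, 1.660, 1.646
qnorm(1-alpha);      # z critical value z_alpha: 1.645
# The t critical values exceed z_alpha and approach it as n grows.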
How well does the t-test work in moderate sized samples
when the data is not normal, i.e., what is its true size in
moderate sized samples?
Example 5.8.5: Consider the following contaminated
normal distribution: 75% of the time an observation is
generated by a standard normal distribution while 25% of
the time it is generated by a normal distribution with mean
0 and standard deviation 25. We call this distribution
contaminated normal distribution A. Suppose a random
sample of size 20 is generated from contaminated normal
distribution A. The mean of $X_i$ is 0, so $H_0: \mu = 0$ is true.
What is the true size of the nominal size 0.05 t-test
(reject the null hypothesis when $t \geq t_{.05,19} = 1.729$, which
would have size 0.05 for a normal distribution) for random
samples of size 20 from contaminated normal distribution A?
Let $f(x)$ denote the density of contaminated normal
distribution A and let
$t(X_1, \ldots, X_n) = \frac{\bar{X}}{S / \sqrt{n}}$.
The true size of the t-test for contaminated normal
distribution A is
$\int \cdots \int I\{t(x_1, \ldots, x_{20}) \geq 1.729\} f(x_1) \cdots f(x_{20}) \, dx_1 \cdots dx_{20}$   (1)
where $I\{t(x_1, \ldots, x_{20}) \geq 1.729\} = 1$ if $t(x_1, \ldots, x_{20}) \geq 1.729$ and
0 otherwise. We can write (1) as
$E[I\{t(X_1, \ldots, X_{20}) \geq 1.729\}]$
where the expectation is with respect to random samples
from contaminated normal distribution A.
The Monte Carlo method:
Consider a function $g(X)$ of a random vector $X$ where
$X$ has density $f(x)$. Consider the expected value of
$g(X)$:
$E[g(X)] = \int g(x) f(x) \, dx$.
Suppose we take an iid random sample $X_1, \ldots, X_n$ from the
density $f(x)$. Then by the law of large numbers
$\frac{\sum_{i=1}^n g(X_i)}{n} \stackrel{P}{\rightarrow} E[g(X)]$.
The Monte Carlo method is to estimate $E[g(X)]$ by
$\hat{E}[g(X)] = \frac{\sum_{i=1}^n g(X_i)}{n}$.
The standard error of the estimate is
$SE(\hat{E}[g(X)]) = \frac{\sqrt{\sum_{i=1}^n \left( g(X_i) - \frac{\sum_{i=1}^n g(X_i)}{n} \right)^2}}{n}$.
By the Central Limit Theorem, an approximate 95%
confidence interval for $E[g(X)]$ is
$\hat{E}[g(X)] \pm 1.96 \, SE(\hat{E}[g(X)])$.
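To make these formulas concrete, here is a minimal R sketch (the function
name mcest and the example choice of g are not from the notes; they are
assumptions for illustration):
mcest=function(gvals){
# gvals is the vector g(X_1),...,g(X_n) computed from n simulated draws
n=length(gvals);
est=mean(gvals);                   # Monte Carlo estimate of E[g(X)]
se=sqrt(sum((gvals-est)^2))/n;     # standard error, as in the formula above
ci=c(est-1.96*se,est+1.96*se);     # approximate 95% confidence interval
list(estimate=est,standarderror=se,confint=ci);
}
# Example: estimate E[X^2] for X ~ N(0,1) (true value 1)
# mcest(rnorm(10000)^2)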
Example: Monte Carlo estimation of $\pi$
Define the unit square as a square centered at (0.5,0.5) with
sides of length 1 and the unit circle as the circle centered at
the origin with a radius of length 1. The ratio of the area of
the unit circle that lies in the first quadrant to the area of the
unit square is $\pi/4$.
Let $U_1$ and $U_2$ be iid uniform(0,1) random variables. Let
$g(U_1, U_2) = 1$ if $(U_1, U_2)$ is in the unit circle and 0
otherwise. Then $E[g(U_1, U_2)] = \pi/4$.
Monte Carlo method: Repeat the experiment of drawing
$U_1$ and $U_2$ as iid uniform(0,1) random variables n times
and estimate $\pi$ by
$\hat{\pi} = \frac{4 \sum_{i=1}^n g(U_{i1}, U_{i2})}{n}$.
In R, the command runif(n) draws n iid uniform (0,1)
random variables.
Here is an R function for estimating pi:
piest=function(n){
#
# Obtains the estimate of pi and its standard
# error for the simulation discussed in Example 5.8.1
#
# n is the number of simulations
#
u1=runif(n);
u2=runif(n);
cnt=rep(0,n);                  # indicator that (u1,u2) falls inside the unit circle
chk=u1^2+u2^2-1;
cnt[chk<0]=1;
phat=mean(cnt);                # estimates pi/4
est=4*phat;
se=4*sqrt(phat*(1-phat)/n);    # SE of est, based on the proportion phat
list(estimate=est,standarderror=se);
}
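For example, one might call the function as follows (the number of
simulations is arbitrary):
piest(100000)
# Returns the estimate of pi (close to 3.14) and its standard error,
# which shrinks at rate 1/sqrt(n).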
Back to Example 5.8.5:
What is the true size of the nominal size 0.05 t-test for random
samples of size 20 from contaminated normal distribution A?
We want to estimate
$E[I\{t(X_1, \ldots, X_{20}) \geq 1.729\}]$.
Monte Carlo method:
$\hat{E}[I\{t(X_1, \ldots, X_{20}) \geq 1.729\}] = \frac{\sum_{i=1}^n I\{t(x_{i,1}, \ldots, x_{i,20}) \geq 1.729\}}{n}$
where $(x_{i,1}, \ldots, x_{i,20})$ is a random sample of size 20 from the
contaminated normal distribution A.
How to draw a random observation from the contaminated
normal distribution A?
(1) Draw a Bernoulli random variable B with p=0.75;
(2) If B=1, draw a random observation from the
standard normal distribution. If B=0, draw a
random observation from the normal distribution
with mean 0 and standard deviation 25.
In R, the command rnorm(n,mean=0,sd=1) draws a random
sample of size n from the normal distribution with the
specified mean and SD. The command rbinom(n,size=1,p)
draws a random sample of size n from the Bernoulli
distribution with probability of success p.
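The function empalphacn below calls a helper rcn(n, eps, sigmac) to draw
a sample from the contaminated normal distribution. The notes do not show
its code, so here is a minimal sketch following steps (1)-(2) above (the
implementation details are an assumption):
rcn=function(n,eps,sigmac){
#
# Draws a random sample of size n from the contaminated normal
# distribution: with probability 1-eps a standard normal draw,
# with probability eps a draw from the normal with mean 0 and sd sigmac
#
b=rbinom(n,size=1,prob=1-eps);   # b=1: standard normal; b=0: contaminating normal
b*rnorm(n,mean=0,sd=1)+(1-b)*rnorm(n,mean=0,sd=sigmac);
}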
R function for obtaining Monte Carlo estimate
$\hat{E}[I\{t(X_1, \ldots, X_{20}) \geq 1.729\}]$:
empalphacn=function(nsims){
#
# Obtains the empirical level of the test discussed in
# Example 5.8.5
#
# nsims is the number of simulations
#
sigmac=25;              # sd of the contaminating normal
eps=.25;                # contamination proportion
alpha=.05;
n=20;
tc=qt(1-alpha,n-1);     # critical value t_{.05,19}=1.729
ic=0;                   # counts rejections
for(i in 1:nsims){
samp=rcn(n,eps,sigmac);
ttest=(sqrt(n)*mean(samp))/var(samp)^.5;
if(ttest>tc){
ic=ic+1;
}
}
empalp=ic/nsims;        # empirical size of the test
err=1.96*sqrt((empalp*(1-empalp))/nsims);
list(empiricalalpha=empalp,error=err);
}
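A usage sketch (the number of simulations is arbitrary; rcn must be
defined first, for example as sketched above):
empalphacn(10000)
# Returns the Monte Carlo estimate of the true size of the test and the
# half-width of its approximate 95% confidence interval.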
Generating random observations with given cdf F
Theorem 5.8.1: Suppose the random variable U has a
uniform (0,1) distribution. Let F be the cdf of a random
variable with a continuous distribution function. Then the
1
random variable X  F (U ) has cdf F.
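As an illustration of Theorem 5.8.1, here is a minimal R sketch using the
exponential(1) distribution, whose cdf is F(x) = 1 - exp(-x) so that
F^{-1}(u) = -log(1-u) (the choice of distribution is an assumption for
illustration):
u=runif(10000);
x=-log(1-u);     # F^{-1}(u) for the exponential(1) cdf
mean(x);         # should be close to 1, the mean of the exponential(1) distribution
# Compare with R's built-in sampler: mean(rexp(10000))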