Notes 8 - Wharton Statistics
Statistics 512 Notes 8: The Monte Carlo Method

The t-test

Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and unknown distribution. Consider the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0.$$

If the distribution of the $X_i$ is normal (with unknown variance), then a test with exact size $\alpha$ (here $\alpha = 0.05$) uses the test statistic

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

and the rejection region $\{t : t \ge t_{\alpha,n-1}\}$, where $t_{\alpha,n-1}$ is the $(1-\alpha)$ quantile of the t-distribution with $n-1$ degrees of freedom, i.e., $P(T \ge t_{\alpha,n-1}) = \alpha$. This is called the t-test.

When the distribution of $X_i$ is normal, the test has exact size $\alpha$ because, when $\mu = \mu_0$, $t = (\bar{X} - \mu_0)/(S/\sqrt{n})$ has a t-distribution with $n-1$ degrees of freedom.

When the distribution of $X_i$ is not normal, the test does not necessarily have exact size 0.05. However, for large $n$,

$$P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge t_{\alpha,n-1}\right) \approx P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge z_\alpha\right) \approx P(Z \ge z_\alpha) = \alpha$$

because of the Central Limit Theorem, so the t-test has approximate size 0.05 for large samples for any distribution of $X_i$.

Note the difference between the rejection rules $\{t : t \ge t_{\alpha,n-1}\}$ and $\{t : t \ge z_\alpha\}$. The large sample rule $\{t : t \ge z_\alpha\}$ has approximate size $\alpha$, while $\{t : t \ge t_{\alpha,n-1}\}$ has exact size $\alpha$. Of course, for the exact result we have to assume that $X_i$ has a normal distribution, and in practice we may not be willing to assume that the population is normal. In general, t critical values are larger than z critical values (i.e., $t_{\alpha,n-1} > z_\alpha$), so the t-test is conservative relative to the large sample test. For this reason, many statisticians use the t-test even if they do not believe the data are normally distributed. Note that $\lim_{n \to \infty} t_{\alpha,n-1} = z_\alpha$.

How well does the t-test work in moderate sized samples when the data are not normal, i.e., what is its true size in moderate sized samples?

Example 5.8.5: Consider the following contaminated normal distribution: 75% of the time an observation is generated by a standard normal distribution, while 25% of the time it is generated by a normal distribution with mean 0 and standard deviation 25. We call this distribution contaminated normal distribution A.
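To get a feel for how far contaminated normal distribution A is from a standard normal, its standard deviation and a tail probability can be computed directly from the mixture definition. The following is a small sketch; the variable names and the cutoff of 3 are choices made here for illustration and do not come from the notes.

```r
# Mixture definition of distribution A: N(0, 1) with probability 0.75,
# N(0, 25^2) with probability 0.25.
eps    <- 0.25   # contamination proportion
sigmac <- 25     # standard deviation of the contaminating component

# Both components have mean 0, so the variance is the weighted average
# of the component variances: 0.75 * 1 + 0.25 * 625 = 157.
sd_A <- sqrt((1 - eps) * 1 + eps * sigmac^2)

# P(|X| > 3) under distribution A and under a standard normal.
tail_A <- (1 - eps) * 2 * pnorm(-3) + eps * 2 * pnorm(-3 / sigmac)
tail_N <- 2 * pnorm(-3)

print(c(sd_A = sd_A, tail_A = tail_A, tail_N = tail_N))
```

The much heavier tails of distribution A are what make the sample standard deviation unstable in samples of size 20, which is why the size of the t-test is in question.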
Suppose a random sample of size 20 is generated from contaminated normal distribution A. The mean of $X_i$ is 0, so $H_0$ is true (with $\mu_0 = 0$). What is the true size of the nominal size 0.05 t-test (reject the null hypothesis when $t \ge t_{.05,19} = 1.729$, which would have size 0.05 for a normal distribution) for random samples of size 20 from contaminated normal distribution A?

Let $f(x)$ denote the density of contaminated normal distribution A and let

$$t(X_1, \ldots, X_n) = \frac{\bar{X}}{S/\sqrt{n}}.$$

The true size of the t-test for contaminated normal distribution A is

$$\int \cdots \int I\{t(x_1, \ldots, x_{20}) \ge 1.729\}\, f(x_1) \cdots f(x_{20})\, dx_1 \cdots dx_{20} \qquad (1)$$

where $I\{t(x_1, \ldots, x_{20}) \ge 1.729\} = 1$ if $t(x_1, \ldots, x_{20}) \ge 1.729$ and 0 otherwise. We can write (1) as

$$E[I\{t(x_1, \ldots, x_{20}) \ge 1.729\}]$$

where the expectation is with respect to random samples from contaminated normal distribution A.

The Monte Carlo method

Consider a function $g(X)$ of a random vector $X$, where $X$ has density $f(x)$. Consider the expected value of $g(X)$:

$$E[g(X)] = \int g(x) f(x)\, dx.$$

Suppose we take an iid random sample $X_1, \ldots, X_n$ from the density $f(x)$. Then by the law of large numbers,

$$\frac{\sum_{i=1}^n g(X_i)}{n} \xrightarrow{P} E[g(X)].$$

The Monte Carlo method is to estimate $E[g(X)]$ by

$$\hat{E}[g(X)] = \frac{\sum_{i=1}^n g(X_i)}{n}.$$

The standard error of the estimate is

$$S_{\hat{E}[g(X)]} = \frac{1}{\sqrt{n}} \sqrt{\frac{1}{n} \sum_{i=1}^n \left( g(X_i) - \frac{\sum_{i=1}^n g(X_i)}{n} \right)^2}.$$

By the Central Limit Theorem, an approximate 95% confidence interval for $E[g(X)]$ is

$$\hat{E}[g(X)] \pm 1.96\, S_{\hat{E}[g(X)]}.$$

Example: Monte Carlo estimation of $\pi$

Define the unit square as the square centered at (0.5, 0.5) with sides of length 1 and the unit circle as the circle centered at the origin with radius 1. The ratio of the area of the part of the unit circle that lies in the first quadrant to the area of the unit square is $\pi/4$. Let $U_1$ and $U_2$ be iid uniform(0,1) random variables. Let $g(U_1, U_2) = 1$ if $(U_1, U_2)$ is in the unit circle and 0 otherwise. Then $E[g(U_1, U_2)] = \pi/4$.
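The general recipe (estimate, standard error, approximate 95% interval) can be sketched for any vector of simulated $g$ values. The helper name `mc_summary`, the seed, and the number of draws are hypothetical choices made here, not part of the notes. The standard error uses the divisor-$n$ sample variance; for a 0/1 indicator this reduces to $\hat{p}(1-\hat{p})/n$, and using $n-1$ instead would make little difference for large $n$.

```r
# Hypothetical helper: summarize simulated values g(X_1), ..., g(X_n).
mc_summary <- function(gvals) {
  n   <- length(gvals)
  est <- mean(gvals)                       # Monte Carlo estimate of E[g(X)]
  se  <- sqrt(mean((gvals - est)^2) / n)   # standard error (divisor-n variance)
  c(estimate = est, se = se,
    lower = est - 1.96 * se,               # approximate 95% confidence limits
    upper = est + 1.96 * se)
}

# Illustration with the quarter-circle indicator, whose mean is pi/4.
set.seed(512)                              # arbitrary seed for reproducibility
n  <- 10000
u1 <- runif(n)
u2 <- runif(n)
g  <- as.numeric(u1^2 + u2^2 < 1)          # 1 if the point is inside the unit circle

res <- mc_summary(g)
print(res)       # estimate and interval for pi/4
print(4 * res)   # the same quantities rescaled to the scale of pi
```

Because the estimate, standard error, and interval limits all scale linearly, multiplying the summary by 4 gives the corresponding quantities for $\pi$ itself.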
Monte Carlo method: Repeat the experiment of drawing iid uniform(0,1) random variables $U_1$ and $U_2$ $n$ times, and estimate $\pi$ by

$$4\, \frac{\sum_{i=1}^n g(U_{i1}, U_{i2})}{n}.$$

In R, the command runif(n) draws n iid uniform(0,1) random variables. Here is a function for estimating $\pi$:

    piest = function(n) {
      #
      # Obtains the estimate of pi and its standard
      # error for the simulation discussed in Example 5.8.1
      #
      # n is the number of simulations
      #
      u1 = runif(n)
      u2 = runif(n)
      cnt = rep(0, n)
      chk = u1^2 + u2^2 - 1
      cnt[chk < 0] = 1                       # 1 if the point is inside the unit circle
      phat = mean(cnt)                       # estimates pi/4
      est = 4 * phat                         # estimate of pi
      se = 4 * sqrt(phat * (1 - phat) / n)   # standard error on the scale of pi
      list(estimate = est, standard = se)
    }

Note that the standard error must be computed from the proportion phat, not from the rescaled estimate 4*phat: the latter is greater than 1, so the quantity under the square root would be negative.

Back to Example 5.8.5: What is the true size of the nominal size 0.05 t-test for random samples of size 20 from contaminated normal distribution A? We want to estimate

$$E[I\{t(x_1, \ldots, x_{20}) \ge 1.729\}].$$

Monte Carlo method:

$$\hat{E}[I\{t(x_1, \ldots, x_{20}) \ge 1.729\}] = \frac{\sum_{i=1}^n I\{t(x_{i,1}, \ldots, x_{i,20}) \ge 1.729\}}{n}$$

where $(x_{i,1}, \ldots, x_{i,20})$ is the $i$th random sample of size 20 from contaminated normal distribution A and $n$ is the number of simulated samples.

How do we draw a random observation from contaminated normal distribution A?
(1) Draw a Bernoulli random variable B with p = 0.75.
(2) If B = 1, draw a random observation from the standard normal distribution. If B = 0, draw a random observation from the normal distribution with mean 0 and standard deviation 25.

In R, the command rnorm(n,mean=0,sd=1) draws a random sample of size n from the normal distribution with the specified mean and SD. The command rbinom(n,size=1,p) draws a random sample of size n from the Bernoulli distribution with probability of success p.
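The notes' simulation function calls rcn(n, eps, sigmac) but does not show its definition. One way such a sampler could be written from steps (1) and (2) is sketched below. The argument names follow that call; the body is an assumption and not necessarily the implementation the notes rely on.

```r
# Sketch of a contaminated normal sampler (assumed implementation).
# n      : number of observations to draw
# eps    : contamination proportion (0.25 for distribution A)
# sigmac : standard deviation of the contaminating normal (25 for distribution A)
rcn <- function(n, eps, sigmac) {
  b   <- rbinom(n, size = 1, prob = 1 - eps)  # step (1): B = 1 with probability 1 - eps
  sds <- ifelse(b == 1, 1, sigmac)            # step (2): sd 1 if B = 1, sigmac if B = 0
  rnorm(n, mean = 0, sd = sds)                # rnorm accepts a vector of standard deviations
}

set.seed(512)                                 # arbitrary seed for reproducibility
x <- rcn(100000, eps = 0.25, sigmac = 25)
# The theoretical standard deviation is sqrt(0.75 * 1 + 0.25 * 625) = sqrt(157).
print(c(mean = mean(x), sd = sd(x), theoretical_sd = sqrt(157)))
```

Drawing all the Bernoulli indicators at once and passing a vector of standard deviations to rnorm avoids an explicit loop over observations.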
R function for obtaining the Monte Carlo estimate $\hat{E}[I\{t(x_1, \ldots, x_{20}) \ge 1.729\}]$:

    empalphacn = function(nsims) {
      #
      # Obtains the empirical level of the test discussed in
      # Example 5.8.5
      #
      # nsims is the number of simulations
      #
      # rcn(n, eps, sigmac) is assumed to return n draws from the
      # contaminated normal distribution; it is not defined in these notes.
      #
      sigmac = 25
      eps = .25
      alpha = .05
      n = 20
      tc = qt(1 - alpha, n - 1)              # critical value t_{.05,19} = 1.729
      ic = 0                                 # number of rejections
      for (i in 1:nsims) {
        samp = rcn(n, eps, sigmac)
        ttest = (sqrt(n) * mean(samp)) / var(samp)^.5
        if (ttest > tc) {
          ic = ic + 1
        }
      }
      empalp = ic / nsims                    # Monte Carlo estimate of the true size
      err = 1.96 * sqrt((empalp * (1 - empalp)) / nsims)   # 95% margin of error
      list(empiricalalpha = empalp, error = err)
    }

Generating random observations with given cdf F

Theorem 5.8.1: Suppose the random variable $U$ has a uniform(0,1) distribution. Let $F$ be a continuous distribution function. Then the random variable $X = F^{-1}(U)$ has cdf $F$.
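As a quick check of Theorem 5.8.1, consider the exponential distribution with cdf $F(x) = 1 - e^{-\lambda x}$, whose inverse is $F^{-1}(u) = -\log(1-u)/\lambda$. The sketch below applies this inverse to uniform draws and compares the empirical cdf with the exact one. The choice of the exponential distribution, the rate, the seed, and the evaluation points are assumptions made here for illustration.

```r
set.seed(512)              # arbitrary seed for reproducibility
n    <- 100000
rate <- 2                  # lambda, an illustrative choice

u <- runif(n)              # U ~ uniform(0,1)
x <- -log(1 - u) / rate    # X = F^{-1}(U) for F(x) = 1 - exp(-rate * x)

# Compare the empirical cdf of x with the exact cdf at a few points.
pts <- c(0.1, 0.5, 1, 2)
emp <- sapply(pts, function(p) mean(x <= p))
print(cbind(point = pts, empirical = emp, exact = pexp(pts, rate)))
```

The empirical and exact columns should agree closely, with discrepancies of the order expected from Monte Carlo error at this sample size.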