Download Lab 1:

Lab 1

Note: This lab is repetitive. The same tasks are performed each time with a new distribution. You might want to set this up in a script so that you can easily copy and paste for each subsequent part.

The Normal Distribution

Simulate 1000 samples of size N=50 from the standard normal distribution. Organize them into a 50 x 1000 array.

    dat <- rnorm(50*1000,0,1)
    dim(dat) <- c(50,1000)
    N <- 50

1a) Plot a histogram and density curve of the 50,000 data values.

    hist(dat,freq=FALSE)
    den <- density(dat)
    lines(den)

Using apply, calculate the mean of each of the 1000 samples.

    mean.vec <- apply(dat,2,mean)

1b) Plot a histogram and empirical density curve of the 1,000 sample means.

    hist(mean.vec,freq=FALSE)
    den <- density(mean.vec)
    lines(den)

1c) Calculate the theoretical standard deviation of the mean for N=50. Overlay the histogram from 1b) with the theoretical normal distribution of the sample means.

    lines(den$x,dnorm(den$x,0,sqrt(1/N)),col="red")

Using apply, calculate the sd of each sample, and calculate the t-statistic for each sample.

    sd.vec <- apply(dat,2,sd)
    t.vec <- (mean.vec-0)/(sd.vec/sqrt(N))

Verify that your t-statistic matches the one returned by the built-in function.

1d) What fraction of the 1000 tests would reject the null hypothesis of mu=0 with size alpha=.1? With alpha=.05?

    # two-sided test with alpha = .05
    sum(abs(t.vec)>qt(.975,N-1))/1000

1e) What fraction of the 90% confidence intervals are below the true mean? Above the true mean?

    # number of intervals whose lower limit is above the true mean (0)
    below.lower <- sum(0<mean.vec-qt(.95,N-1)*sd.vec/sqrt(N))
    below.lower
    # number of intervals whose upper limit is below the true mean (0)
    above.upper <- sum(0>mean.vec-qt(.05,N-1)*sd.vec/sqrt(N))
    above.upper
    # fraction of intervals that miss the true mean
    (below.lower + above.upper)/1000

Lognormal Distribution with small N

2) Simulate 1000 samples of size N=30 from the lognormal distribution with parameters mu=0 and sigma=1. Repeat 1a)-1e) with this distribution. For the null hypothesis, however, do not use mu=0 as the hypothesized mean. Note: you will need to know the true mean and standard deviation of the lognormal distribution.
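As a hedged sketch of the setup for problem 2, the true mean and standard deviation can be computed from the standard lognormal moment formulas, mean = exp(mu + sigma^2/2) and variance = (exp(sigma^2) - 1) * exp(2*mu + sigma^2). The names meanlog, sdlog, true.mean, and true.sd are illustrative choices, not part of the lab.

```r
# Log-scale parameters of the lognormal (values from problem 2)
meanlog <- 0
sdlog   <- 1

# True mean and standard deviation on the original scale
true.mean <- exp(meanlog + sdlog^2 / 2)
true.sd   <- sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2))

# Simulate 1000 samples of size N = 30, one sample per column
N <- 30
dat <- matrix(rlnorm(N * 1000, meanlog, sdlog), nrow = N, ncol = 1000)

# t-statistics for the null hypothesis that the mean equals true.mean
mean.vec <- apply(dat, 2, mean)
sd.vec   <- apply(dat, 2, sd)
t.vec    <- (mean.vec - true.mean) / (sd.vec / sqrt(N))
```

The only change from part 1 is that the hypothesized mean in the t-statistic is true.mean rather than 0.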
You can find the formula for the mean and variance of a lognormal on the help page for the lognormal distribution:

    help(Lognormal)

Lognormal Distribution with large N

3) Simulate 1000 samples of size N=500 from the lognormal distribution with mu=0 and sigma=1. Repeat 1a)-1e) with this distribution. Comment on the effect of skewness in the t-test under different sample sizes. Is the effect of skewness the same for both sides of the confidence interval? How so?

Spatial data with normal distribution

4) Create a spatially correlated data set. You may just copy and paste the call to grf below. Later in the course, you will know what all of this means.

    library(geoR)
    sim1 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",nsim=1)
    image(sim1)

Create a spatially uncorrelated data set.

    sim2 <- grf(n=100,grid="reg",cov.pars=c(1,.0),cov.model="gaussian",nsim=1)
    image(sim2)

Visually, what is the difference between spatially correlated and spatially uncorrelated data?

5) Now simulate 1000 spatial random fields using

    sim3 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",nsim=1000)

The data are stored in a 100 x 1000 array, sim3$data. Repeat 1a) through 1e) using these 1000 simulations.

5f) What is the standard deviation of the 1000 sample means? What sample size of independent data would give you a sample mean with this standard error? Comment on the effect of spatial correlation on traditional statistical inference.

Bootstrap of Lognormal Distribution with small n

6) We will return to problem 2). The problem is that the t-statistic does not have a t-distribution. I have written a function here that calculates the t-statistic, performs bootstrapping, and then returns quantiles of that t-statistic.

    # bootstrap function
    boot.t.quantile <- function(x,B,alpha){
      # x = data
      # B = number of bootstrap repetitions
      # alpha = vector of quantiles, e.g. c(.05,.95)
      # draw B resamples of x, one per column
      boot.dat <- sample(x,length(x)*B,replace=TRUE)
      dim(boot.dat) <- c(length(x),B)
      mx <- mean(x)
      boot.mean <- apply(boot.dat,2,mean)
      boot.std <- apply(boot.dat,2,sd)
      boot.se <- boot.std / sqrt(length(x))
      # t-statistic of each resample, centered at the sample mean
      boot.t <- (boot.mean - mx)/ boot.se
      # return the requested quantiles
      quantile(boot.t,alpha,names=FALSE)
    }

Now simulate data, and use the bootstrap function to return t-quantiles for each simulation.

    N <- 30
    dat <- rlnorm(N*1000,0,1)
    dim(dat) <- c(N,1000)
    # Create empty storage
    boot.t <- matrix(0,nrow=1000,ncol=2)
    for (i in 1:1000){
      boot.t[i,] <- boot.t.quantile(dat[,i],200,c(.05,.95))
    }

Now use these t-quantiles instead of those from the t-distribution to calculate 90% confidence intervals. As in 1e), what fraction of the 90% confidence intervals are below the true mean? Above the true mean?
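One possible way to carry out this last step is sketched below. It assumes the usual bootstrap-t convention, in which the upper bootstrap quantile sets the lower confidence limit and the lower quantile sets the upper limit. The names n.sim, true.mean, se.vec, lower, and upper are illustrative, and the seed is fixed only to make the sketch reproducible.

```r
set.seed(1)  # assumption: fixed seed for reproducibility only

# Bootstrap quantiles of the t-statistic (same logic as the lab's function)
boot.t.quantile <- function(x, B, alpha) {
  n <- length(x)
  boot.dat <- matrix(sample(x, n * B, replace = TRUE), nrow = n, ncol = B)
  boot.se <- apply(boot.dat, 2, sd) / sqrt(n)
  boot.t <- (colMeans(boot.dat) - mean(x)) / boot.se
  quantile(boot.t, alpha, names = FALSE)
}

N <- 30
n.sim <- 1000
true.mean <- exp(0.5)  # mean of the lognormal with mu = 0, sigma = 1
dat <- matrix(rlnorm(N * n.sim, 0, 1), nrow = N, ncol = n.sim)

# 5% and 95% bootstrap t-quantiles for every simulated sample
boot.t <- matrix(0, nrow = n.sim, ncol = 2)
for (i in 1:n.sim) {
  boot.t[i, ] <- boot.t.quantile(dat[, i], 200, c(.05, .95))
}

mean.vec <- colMeans(dat)
se.vec <- apply(dat, 2, sd) / sqrt(N)

# Bootstrap-t limits: upper quantile gives the lower limit, and vice versa
lower <- mean.vec - boot.t[, 2] * se.vec
upper <- mean.vec - boot.t[, 1] * se.vec

below.lower <- mean(true.mean < lower)  # true mean falls below the interval
above.upper <- mean(true.mean > upper)  # true mean falls above the interval
c(below.lower, above.upper, below.lower + above.upper)
```

Comparing the two one-sided miss rates with those from the t-distribution intervals in problem 2 shows whether the bootstrap quantiles balance the errors on the two sides.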
