Lab 1:
Note: This lab is repetitive. The same tasks are performed, each time with a new
distribution. You might want to set this up in a script so that you can easily use copy and
paste for each subsequent part.
The Normal Distribution
Simulate 1000 samples of size N=50 from the standard normal distribution. Organize
them into a 50 x 1000 array.
N <- 50
dat <- rnorm(N*1000,0,1)
dim(dat) <- c(N,1000)
1a) Plot a histogram and density curve of the 50,000 data values.
hist(dat,freq=FALSE)
den <- density(dat)
lines(den)
Using apply, calculate the mean of each of the 1000 samples.
mean.vec <- apply(dat,2,mean)
1b) Plot a histogram and empirical density curve of the 1,000 sample means.
hist(mean.vec,freq=FALSE)
den <- density(mean.vec)
lines(den)
1c) Calculate the theoretical standard deviation of the mean for N=50. Overlay the
histogram from 1b) with the theoretical normal distribution of the sample means.
lines(den$x,dnorm(den$x,0, sqrt(1/N)),col="red")
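The theoretical standard deviation asked for in 1c) is sigma/sqrt(N), which is 1/sqrt(50) for the standard normal. A minimal sketch that compares it with the empirical standard deviation of the simulated means follows; the seed and the names theo.sd and emp.sd are illustrative choices, not part of the lab.

```r
set.seed(1)                        # seed chosen only for reproducibility
N <- 50
dat <- matrix(rnorm(N * 1000, 0, 1), nrow = N)
mean.vec <- apply(dat, 2, mean)    # one mean per sample (column)
theo.sd <- 1 / sqrt(N)             # sigma / sqrt(N) with sigma = 1
emp.sd <- sd(mean.vec)             # spread of the 1000 simulated means
c(theoretical = theo.sd, empirical = emp.sd)
```

The two values should be close but not identical, since emp.sd is itself estimated from 1000 simulated means.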
Using apply, calculate the sd of each sample, and calculate the t-statistic for each sample.
sd.vec <- apply(dat,2,sd)
t.vec <- (mean.vec-0)/(sd.vec/sqrt(N))
Verify that your t-statistic matches the one returned by the built-in function.
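One way to do this check, sketched under the assumption that the built-in function meant here is t.test, is to pull the statistic out of the test object for every sample and compare it with the hand-computed vector. The name t.builtin is illustrative.

```r
set.seed(1)                        # seed chosen only for reproducibility
N <- 50
dat <- matrix(rnorm(N * 1000, 0, 1), nrow = N)
mean.vec <- apply(dat, 2, mean)
sd.vec <- apply(dat, 2, sd)
t.vec <- (mean.vec - 0) / (sd.vec / sqrt(N))
# t.test() returns a list; its $statistic element holds the t-statistic
t.builtin <- apply(dat, 2, function(x) t.test(x, mu = 0)$statistic)
all.equal(t.vec, unname(t.builtin))
```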
1d) What fraction of the 1000 tests would reject the null hypothesis of mu=0 with size
alpha=.1? With alpha = .05?
# alpha = .05 (two-sided); for alpha = .1 use qt(.95,N-1) instead
sum(abs(t.vec)>qt(.975,N-1))/1000
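1e) What fraction of the 90% confidence intervals lie entirely below the true mean?
Entirely above the true mean?
# true mean (0) falls below the lower limit: the interval lies above the true mean
below.lower <- sum(0<mean.vec-qt(.95,N-1)*sd.vec/sqrt(N))
below.lower
# true mean (0) falls above the upper limit: the interval lies below the true mean
above.upper <- sum(0>mean.vec-qt(.05,N-1)*sd.vec/sqrt(N))
above.upper
# fraction of intervals that miss the true mean on either side
(below.lower + above.upper)/1000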
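Since the later parts repeat these steps with new data, the calculations from 1d) and 1e) can be collected in one function. The sketch below is one possible arrangement; the function name sim.summary and its arguments are hypothetical, and it assumes each column of dat is one sample and mu0 is the true mean.

```r
# Hypothetical helper: rejection rate and CI miss rates for a matrix
# whose columns are samples, tested against the true mean mu0.
sim.summary <- function(dat, mu0, alpha = 0.10) {
  N <- nrow(dat)
  mean.vec <- apply(dat, 2, mean)
  se.vec <- apply(dat, 2, sd) / sqrt(N)
  t.vec <- (mean.vec - mu0) / se.vec
  tq <- qt(1 - alpha / 2, N - 1)          # two-sided critical value
  c(reject = mean(abs(t.vec) > tq),
    ci.above.truth = mean(mu0 < mean.vec - tq * se.vec),
    ci.below.truth = mean(mu0 > mean.vec + tq * se.vec))
}

set.seed(1)                               # seed chosen only for reproducibility
res <- sim.summary(matrix(rnorm(50 * 1000), nrow = 50), mu0 = 0)
res
```

The rejection rate of the size-alpha test equals the sum of the two miss rates of the (1 - alpha) interval, so the three numbers give a quick consistency check.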
Lognormal Distribution with small N
2) Simulate 1000 samples of size N=30 from the lognormal distribution with parameters
mu=0 and sigma=1. Repeat 1a)-1e) with this distribution. For the null hypothesis,
however, do not use a mean of 0; test against the true mean of the lognormal
distribution instead. Note: you will need to know the true mean and standard deviation
of the lognormal distribution. You can find the formulas for the mean and variance of a
lognormal on the help page for the lognormal distribution:
help(Lognormal)
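The formulas given on that help page are mean = exp(mu + sigma^2/2) and variance = (exp(sigma^2) - 1) * exp(2*mu + sigma^2). A short sketch evaluating them for mu=0 and sigma=1 follows; the names ln.mean and ln.sd are illustrative.

```r
mu <- 0
sigma <- 1
# true mean and standard deviation of the lognormal distribution
ln.mean <- exp(mu + sigma^2 / 2)
ln.sd <- sqrt((exp(sigma^2) - 1) * exp(2 * mu + sigma^2))
c(mean = ln.mean, sd = ln.sd)
```

Here ln.mean replaces 0 in the t-statistic, and ln.sd/sqrt(N) replaces sqrt(1/N) in the theoretical overlay of 1c).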
Lognormal Distribution with large N
3) Simulate 1000 samples of size N=500 from the lognormal distribution with mu=0 and
sigma = 1. Repeat 1a)-1e) with this distribution.
Comment on the effect of skewness on the t-test under different sample sizes. Is the
effect of skewness the same for both sides of the confidence interval? How so?
Spatial data with normal distribution
4) Create a spatially correlated data set. You may just copy and paste the call to grf
below. Later in the course, you will know what all of this means.
library(geoR)
sim1 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",
nsim=1)
image(sim1)
Create a spatially uncorrelated data set.
sim2 <- grf(n=100,grid="reg",cov.pars=c(1,.0),cov.model="gaussian",
nsim=1)
image(sim2)
Visually, what is the difference between spatially correlated and spatially uncorrelated
data?
5) Now simulate 1000 spatial random fields using
sim3 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",
nsim=1000)
The data are stored in a 100 x 1000 array sim3$data.
Repeat 1a) through 1e) using these 1000 simulations.
5f) What is the standard deviation of the 1000 sample means? What sample size of
independent data would give you a sample mean with this standard error?
Comment on the effect of spatial correlation on traditional statistical inference.
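The equivalent independent sample size follows from inverting se = sigma/sqrt(n), which gives n = (sigma/se)^2. The sketch below applies this to independent normal data as a sanity check, so that it runs without geoR. For the spatial case one would replace dat with sim3$data, assuming a marginal standard deviation of 1 as the first entry of cov.pars suggests. The name n.eff is illustrative.

```r
set.seed(1)                        # seed chosen only for reproducibility
N <- 100
sigma <- 1                         # assumed marginal standard deviation
dat <- matrix(rnorm(N * 1000, 0, sigma), nrow = N)
se.mean <- sd(apply(dat, 2, mean)) # sd of the 1000 sample means
n.eff <- (sigma / se.mean)^2       # independent sample size with this se
n.eff
```

For independent data n.eff should come out near N; a value well below N for correlated data means the sample carries less information than its size suggests.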
Bootstrap of Lognormal Distribution with small n
6) We will return to problem 2). The problem is that the t-statistic does not have a
t-distribution. I have written a function here that calculates the t-statistic, performs
bootstrapping, and then returns quantiles of that t-statistic.
# bootstrap function
boot.t.quantile <- function(x,B,alpha){
  # x = data
  # B = number of bootstrap repetitions
  # alpha = vector of quantiles, e.g. c(.05,.95)
  # draw B resamples of x, one per column
  boot.dat <- sample(x,length(x)*B,replace=TRUE)
  dim(boot.dat) <- c(length(x),B)
  mx <- mean(x)
  boot.mean <- apply(boot.dat,2,mean)
  boot.std <- apply(boot.dat,2,sd)
  boot.se <- boot.std / sqrt(length(x))
  # t-statistic of each resample, centered at the original sample mean
  boot.t <- (boot.mean - mx)/ boot.se
  # return the requested quantiles of the bootstrap t-statistics
  quantile(boot.t,alpha,names=FALSE)
}
Now simulate data, and use the bootstrap function to return t-quantiles for each
simulation.
N<-30
dat <- rlnorm(N*1000,0,1)
dim(dat) <- c(N,1000)
# Create empty storage
boot.t <- matrix(0,nrow=1000,ncol=2)
for (i in 1:1000){
boot.t[i,] <- boot.t.quantile(dat[,i],200,c(.05,.95))
}
Now use these t-quantiles instead of those from the t-distribution to calculate 90%
confidence intervals. As in 1e), what fraction of the 90% confidence intervals lie
below the true mean? Above the true mean?
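A minimal sketch of this last step, assuming the usual bootstrap-t interval in which the upper bootstrap quantile sets the lower limit and the lower quantile sets the upper limit. It repeats a compact version of the bootstrap function so that it runs on its own; the names lower, upper, frac.ci.above and frac.ci.below are illustrative.

```r
boot.t.quantile <- function(x, B, alpha) {
  # B resamples of x, one per column
  boot.dat <- matrix(sample(x, length(x) * B, replace = TRUE), nrow = length(x))
  boot.se <- apply(boot.dat, 2, sd) / sqrt(length(x))
  boot.t <- (apply(boot.dat, 2, mean) - mean(x)) / boot.se
  quantile(boot.t, alpha, names = FALSE)
}

set.seed(1)                        # seed chosen only for reproducibility
N <- 30
dat <- matrix(rlnorm(N * 1000, 0, 1), nrow = N)
true.mean <- exp(0.5)              # lognormal mean for mu = 0, sigma = 1

# 1000 x 2 matrix: column 1 = 5% quantile, column 2 = 95% quantile
boot.t <- t(apply(dat, 2, boot.t.quantile, B = 200, alpha = c(.05, .95)))

mean.vec <- apply(dat, 2, mean)
se.vec <- apply(dat, 2, sd) / sqrt(N)
lower <- mean.vec - boot.t[, 2] * se.vec   # uses the 95% quantile
upper <- mean.vec - boot.t[, 1] * se.vec   # uses the 5% quantile

frac.ci.above <- mean(true.mean < lower)   # interval entirely above the truth
frac.ci.below <- mean(true.mean > upper)   # interval entirely below the truth
c(above = frac.ci.above, below = frac.ci.below)
```

Comparing these two fractions with the corresponding ones from problem 2) shows how much the bootstrap quantiles change the coverage on each side.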