Lab 1:
Note: This lab is repetitive. The same tasks are performed, each time with a new distribution.
You might want to set this up in a script so that you can easily use copy and paste for each
subsequent part.
The Normal Distribution
Simulate 1000 samples of size N=50 from the standard normal distribution. Organize them into
a 50 x 1000 array.
dat <- rnorm(50*1000,0,1)
dim(dat) <- c(50,1000)
N <- 50
1a) Plot a histogram and density curve of the 50,000 data.
hist(dat,freq=FALSE)
den <- density(dat)
lines(den)
Using apply, calculate the mean of each of the 1000 samples.
mean.vec <- apply(dat,2,mean)
1b) Plot a histogram and empirical density curve of the 1,000 sample means.
hist(mean.vec,freq=FALSE)
den <- density(mean.vec)
lines(den)
1c) Calculate the theoretical standard deviation of the mean for N=50. Overlay the histogram
from 1b) with the theoretical normal distribution of the sample means.
lines(den$x,dnorm(den$x,0, sqrt(1/N)),col="red")
Using apply, calculate the sd of each sample, and calculate the t-statistic for each sample.
sd.vec <- apply(dat,2,sd)
t.vec <- (mean.vec-0)/(sd.vec/sqrt(N))
Verify that your t-statistic matches the one returned by the built-in function.
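One way to do this check is sketched below. It assumes the built-in function meant here is t.test from the stats package, and the fixed seed is only there to make the sketch reproducible. The hand-computed statistics are compared with the statistic element returned by t.test for every column.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
N <- 50
dat <- matrix(rnorm(N*1000, 0, 1), nrow=N)

# Hand-computed t-statistics, as in the lab
mean.vec <- apply(dat, 2, mean)
sd.vec <- apply(dat, 2, sd)
t.vec <- (mean.vec - 0)/(sd.vec/sqrt(N))

# t-statistics from t.test (assumed to be the built-in function referred to)
t.builtin <- apply(dat, 2, function(x) t.test(x, mu=0)$statistic)

# TRUE if the two sets agree up to numerical tolerance
all.equal(t.vec, unname(t.builtin))
```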
1d) What fraction of the 1000 tests would reject the null hypothesis of mu=0 with size alpha=.1?
With alpha = .05?
sum(abs(t.vec)>qt(.975,N-1))/1000
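The line above covers alpha = .05 only. A minimal sketch that handles both levels follows; reject.rate is a hypothetical helper name, not part of the lab, and the seed is an assumption added for reproducibility. For a two-sided test of size alpha, the critical value is the 1 - alpha/2 quantile of the t-distribution with N-1 degrees of freedom.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
N <- 50
dat <- matrix(rnorm(N*1000, 0, 1), nrow=N)
t.vec <- apply(dat, 2, function(x) mean(x)/(sd(x)/sqrt(N)))

# Hypothetical helper: fraction of two-sided tests rejecting at size alpha
reject.rate <- function(t.vec, alpha, df) {
  mean(abs(t.vec) > qt(1 - alpha/2, df))
}

rate.10 <- reject.rate(t.vec, .10, N-1)
rate.05 <- reject.rate(t.vec, .05, N-1)
c(alpha.10=rate.10, alpha.05=rate.05)
```

For normal data both fractions should land close to their nominal sizes.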
1e) What fraction of the 90% confidence intervals lie below the true mean? Above the true mean?
below.lower <- sum(0<mean.vec-qt(.95,N-1)*sd.vec/sqrt(N))
below.lower
above.upper <- sum(0>mean.vec-qt(.05,N-1)*sd.vec/sqrt(N))
above.upper
(below.lower + above.upper)/1000
Lognormal Distribution with small N
2) Simulate 1000 samples of size N=30 from the lognormal distribution with parameters mu=0
and sigma=1. Repeat 1a)-1e) with this distribution. For the null hypothesis, however, do not
use mu=0; use the true mean of the distribution. Note that you will need to know the true mean and standard deviation of the lognormal
distribution. You can find the formula for the mean and variance of a lognormal on the help page
for the lognormal distribution:
help(Lognormal)
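As a sketch of what that help page gives: for a lognormal with parameters mu and sigma, the mean is exp(mu + sigma^2/2) and the variance is (exp(sigma^2) - 1) * exp(2*mu + sigma^2). The variable names true.mean and true.sd are illustrative choices, and the Monte Carlo comparison with a fixed seed is added here only as a sanity check.

```r
mu <- 0
sigma <- 1

# Theoretical mean and standard deviation of the lognormal
true.mean <- exp(mu + sigma^2/2)
true.sd <- sqrt((exp(sigma^2) - 1) * exp(2*mu + sigma^2))

# Sanity check against a large simulated sample
set.seed(1)                      # assumption: fixed seed for reproducibility
x <- rlnorm(1e6, mu, sigma)
c(theory=true.mean, simulated=mean(x))
```

The value true.mean then replaces 0 in the t-statistic and in the confidence interval checks.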
Lognormal Distribution with large N
3) Simulate 1000 samples of size N=500 from the lognormal distribution with mu=0 and sigma =
1. Repeat 1a)-1e) with this distribution.
Comment on the effect of skewness in the t-test under different sample sizes. Is the effect of
skewness the same for both sides of the confidence interval? How so?
Spatial data with normal distribution
4) Create a spatially correlated data set. You may just copy and paste the call to grf below.
Later in the course, you will know what all of this means.
library(geoR)
sim1 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",nsim=1)
image(sim1)
Create a spatially uncorrelated data set.
sim2 <- grf(n=100,grid="reg",cov.pars=c(1,.0),cov.model="gaussian",nsim=1)
image(sim2)
Visually, what is the difference between spatially correlated and spatially uncorrelated data?
5) Now simulate 1000 spatial random fields using
sim3 <- grf(n=100,grid="reg",cov.pars=c(1,.25),cov.model="gaussian",nsim=1000)
The data are stored in a 100 x 1000 array sim3$data.
Repeat 1a) through 1e) using these 1000 simulations.
5f) What is the standard deviation of the 1000 sample means? What sample size of independent
data would give you a sample mean with this standard error?
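The second question can be answered by inverting the standard error formula. For n independent observations with variance sigma^2, the standard error of the mean is sigma/sqrt(n), so an observed standard deviation s of the sample means corresponds to n = sigma^2/s^2. The sketch below assumes sigma^2 = 1, taking the first element of cov.pars as the variance of the field; effective.n is a hypothetical helper name. To keep the sketch runnable without geoR, it is demonstrated on independent normal data, where the answer should come out near the actual sample size.

```r
# Hypothetical helper: sample size of independent data whose mean has the
# same standard error as the observed sample means
effective.n <- function(mean.vec, sigma2 = 1) {
  sigma2 / var(mean.vec)
}

# Stand-in data: 1000 independent samples of size 50 (not spatial data)
set.seed(1)                      # assumption: fixed seed for reproducibility
indep <- matrix(rnorm(50*1000), nrow=50)
n.eff <- effective.n(apply(indep, 2, mean))
n.eff
```

With the spatial simulations, the same helper would be called as effective.n(apply(sim3$data,2,mean)).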
Comment on the effect of spatial correlation on traditional statistical inference.
Bootstrap of Lognormal Distribution with small n
6) We will return to problem 2). The problem is that the t-statistic does not follow a t-distribution.
I have written a function here that calculates the t-statistic, performs bootstrapping, and then
returns quantiles of that t-statistic.
# bootstrap function
boot.t.quantile <- function(x,B,alpha){
  # x = data
  # B = number of bootstrap repetitions
  # alpha = vector of quantile probabilities, e.g. c(.05,.95)
  # Draw B resamples of x with replacement, one resample per column
  boot.dat <- sample(x,length(x)*B,replace=TRUE)
  dim(boot.dat) <- c(length(x),B)
  mx <- mean(x)
  boot.mean <- apply(boot.dat,2,mean)
  boot.std <- apply(boot.dat,2,sd)
  boot.se <- boot.std / sqrt(length(x))
  # t-statistic of each resample, centered at the original sample mean
  boot.t <- (boot.mean - mx)/ boot.se
  # Return the requested quantiles of the bootstrap t-statistics
  quantile(boot.t,alpha,names=FALSE)
}
Now simulate data, and use the bootstrap function to return t-quantiles for each simulation.
N <- 30
dat <- rlnorm(N*1000,0,1)
dim(dat) <- c(N,1000)
# Create empty storage: one row per simulation, one column per quantile
boot.t <- matrix(0,nrow=1000,ncol=2)
for (i in 1:1000){
  boot.t[i,] <- boot.t.quantile(dat[,i],200,c(.05,.95))
}
Now use these t-quantiles instead of those from the t-distribution to calculate 90% confidence
intervals. Repeat 1e): what fraction of the 90% confidence intervals lie below the true mean?
Above the true mean?
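A minimal sketch of this last step follows, assuming the usual bootstrap-t construction in which the upper t-quantile sets the lower limit and the lower t-quantile sets the upper limit. The helper is restated compactly so the block runs on its own; boot.q, lower, upper, frac.below and frac.above are illustrative names, and the seed is an assumption added for reproducibility.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
N <- 30
n.sim <- 1000
true.mean <- exp(0 + 1^2/2)      # mean of the lognormal with mu=0, sigma=1

# Compact restatement of the bootstrap helper from this section
boot.t.quantile <- function(x, B, alpha) {
  boot.dat <- matrix(sample(x, length(x)*B, replace=TRUE), nrow=length(x))
  boot.se <- apply(boot.dat, 2, sd) / sqrt(length(x))
  boot.t <- (apply(boot.dat, 2, mean) - mean(x)) / boot.se
  quantile(boot.t, alpha, names=FALSE)
}

dat <- matrix(rlnorm(N*n.sim, 0, 1), nrow=N)
mean.vec <- apply(dat, 2, mean)
se.vec <- apply(dat, 2, sd) / sqrt(N)

# One row per simulation: columns are the 5% and 95% bootstrap t-quantiles
boot.q <- t(apply(dat, 2, boot.t.quantile, B=200, alpha=c(.05,.95)))

# Bootstrap-t limits: quantiles swap sides relative to the t-statistic
lower <- mean.vec - boot.q[,2]*se.vec
upper <- mean.vec - boot.q[,1]*se.vec

frac.below <- mean(upper < true.mean)   # interval lies below the true mean
frac.above <- mean(lower > true.mean)   # interval lies above the true mean
c(below=frac.below, above=frac.above)
```

Comparing these two fractions with the corresponding ones from the t-distribution intervals shows whether the bootstrap quantiles balance the two sides better.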