Exam A
Part I: Estimating probabilities
Favrskov school transport data
In this part we will use the dataset Favrskov available for download at the course website
(remember to read the description of the dataset when you download it):
Favrskov <- read.csv("Favrskov.dat", sep = "")
• Do a cross tabulation of the variables klassetrin and transport.
• Estimate the probability of transport="cyklet".
• Make a 95% confidence interval for the probability of transport="cyklet".
• Estimate the probability of transport="cyklet" given that klassetrin="indskolingen".
• What do these probabilities satisfy if transport and klassetrin are statistically independent?
Part II: Sampling distributions and the central limit theorem
This is a purely theoretical exercise where we investigate the random distribution of
samples from a known population.
Waiting times in a queue
We start by sampling data from the so-called exponential distribution - also called the
negative exponential distribution. The exponential distribution is the most common
distribution used to describe the waiting time between arrivals in a queue. It has one
parameter, which is the number of arrivals per time unit, also called the arrival rate. In our
case we set it to 1 arrival per time unit. Since the arrival rate in our theoretical population
is 1, the mean waiting time for the population will be 𝜇 = 1. Furthermore, it can be shown
that the standard deviation is 𝜎 = 1.
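As a quick numerical check of these two values, the sketch below relies on the fact that an exponential distribution with rate λ has mean 1/λ and standard deviation 1/λ. It is not part of the exam; the names rate, w and mu and the seed are illustrative assumptions.

```r
# Illustrative check (not part of the exam): for rate = 1 the exponential
# distribution has mean 1/rate = 1 and standard deviation 1/rate = 1.
set.seed(1)                      # arbitrary seed, only for reproducibility
rate <- 1
w <- rexp(100000, rate = rate)   # a large sample of waiting times

mean(w)  # should be close to 1/rate
sd(w)    # should be close to 1/rate

# The theoretical mean can also be obtained by numerically integrating
# t * f(t) over [0, Inf), where f is the exponential density dexp().
mu <- integrate(function(t) t * dexp(t, rate = rate), 0, Inf)$value
mu
```

With a sample this large the simulated mean and standard deviation should both land near 1, but they will not be exactly 1.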
The following commands randomly sample 25 waiting times y and calculate their mean
y_bar.
y <- rexp(25, rate = 1)
y
##  [1] 0.45679646 0.34425511 0.31303231 0.96295424 0.85885484 0.34043880
##  [7] 1.05863436 1.42195494 0.87617110 0.15980615 3.78236116 1.41779058
## [13] 2.32052960 0.76473657 0.07438345 0.73128123 1.55005892 0.10552004
## [19] 0.84637155 0.33567092 1.45525104 0.14039747 0.33064175 0.47482768
## [25] 0.60719950
y_bar <- mean(y)
y_bar
## [1] 0.8691968
Note: Since this is a random sample from the population, your numbers will be different. Try
rerunning the commands a few times.
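If reproducible numbers are wanted, for example to compare results with someone else, the random number generator can be seeded before sampling. A minimal sketch, assuming an arbitrary seed value of 42; the names y1 and y2 are illustrative only and not part of the exam:

```r
# Seeding the random number generator makes the "random" sample repeatable.
set.seed(42)                 # arbitrary seed (an assumption, any integer works)
y1 <- rexp(25, rate = 1)

set.seed(42)                 # same seed again
y2 <- rexp(25, rate = 1)

identical(y1, y2)            # TRUE: the same seed gives the same sample
```

Without a call to set.seed, each run draws a fresh sample, which is what the note above describes.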
The following commands replicate the sampling experiment 1000 times and save the
result as a matrix (y) with 25 rows and 1000 columns, as well as the mean (y_bar) for each
of the 1000 replications:
y <- replicate(1000, rexp(25, rate = 1))
y_bar <- colMeans(y)
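To see what these two objects look like, the sketch below repeats the two commands and inspects the results. The checks are illustrative and not part of the exam.

```r
# Each column of y is one replication (one sample of 25 waiting times).
y <- replicate(1000, rexp(25, rate = 1))
y_bar <- colMeans(y)

dim(y)          # 25 1000: 25 rows, 1000 columns
length(y_bar)   # 1000: one mean per replication

# The first entry of y_bar is the mean of the first sample (first column).
all.equal(y_bar[1], mean(y[, 1]))
```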
Make a histogram of all the sampled waiting times (try experimenting with the number
of breaks):
hist(y, breaks = 40)
• Explain how a histogram is constructed.
• Does this histogram look like a normal distribution?
Now we focus on the mean waiting times y_bar.
• Use the central limit theorem to predict the mean, standard deviation and approximate distribution of y_bar.
• Use this result to predict the quartiles of y_bar.
• Compare the predicted values of mean, standard deviation and quartiles with the observed values (you can use summary and sd to calculate these from y_bar).
• Make a histogram of the sample means (y_bar). Does it look like a normal distribution?
• Make a boxplot of the sample means and explain how a boxplot is constructed.
Finally, consider the theoretical boxplot of a general normal distribution with mean 𝜇 and
standard deviation 𝜎, and find the probability of being an outlier according to the 1.5⋅IQR
criterion:
• First find the 𝑧-score of the lower/upper quartile. I.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the lower/upper quartile.
• Use this to find the IQR (expressed in terms of 𝜎).
• Now find the 𝑧-score of the maximal extent of the whisker. I.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the endpoint of lower/upper whisker.
• Find the probability of being an outlier.
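For reference, the 1.5⋅IQR criterion itself can be illustrated on data: an observation is flagged as an outlier if it lies more than 1.5⋅IQR below the lower quartile or above the upper quartile. The sketch below applies this to a single simulated sample of waiting times, not to the normal distribution asked about above. The names x, lower_fence, upper_fence and outliers and the seed are illustrative assumptions. Note that boxplot() computes its quartiles as hinges via fivenum(), which may differ slightly from quantile() for some sample sizes.

```r
# Illustration of the 1.5 * IQR outlier criterion on one simulated sample.
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rexp(25, rate = 1)              # 25 waiting times, as in the text

q <- unname(quantile(x, c(0.25, 0.75)))  # lower and upper quartile
iqr <- q[2] - q[1]                       # interquartile range

lower_fence <- q[1] - 1.5 * iqr      # whiskers cannot extend below this
upper_fence <- q[2] + 1.5 * iqr      # whiskers cannot extend above this

# Observations outside the fences are the ones drawn as separate points.
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```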