
Exam

It is highly recommended that you answer the exam using Rmarkdown (you can simply use the exam Rmarkdown file as a starting point).

Part I: Estimating probabilities

Favrskov school transport data

In this part we will use the Favrskov dataset (remember to read the description of the dataset when you download it):

    Favrskov <- read.csv("http://asta.math.aau.dk/dan/static/datasets?file=Favrskov.dat", sep = "")

• Do a cross tabulation of the variables klassetrin and transport.
• Estimate the probability of transport="cyklet".
• Make a 95% confidence interval for the probability of transport="cyklet".
• Estimate the probability of transport="cyklet" given that klassetrin="indskolingen".
• What do these probabilities satisfy if transport and klassetrin are statistically independent?

Part II: Sampling distributions and the central limit theorem

This is a purely theoretical exercise where we investigate the random distribution of samples from a known population.

Waiting times in a queue

We start by sampling data from the so-called exponential distribution, also called the negative exponential distribution. The exponential distribution is the most common distribution used to describe the waiting time between arrivals in a queue. It has one parameter, the number of arrivals per time unit, also called the arrival rate. In our case we set it to 1 arrival per time unit. Since the arrival rate in our theoretical population is 1, the mean waiting time for the population is 𝜇 = 1. Furthermore, it can be shown that the standard deviation is 𝜎 = 1.

The following commands randomly sample 25 waiting times y and calculate their mean y_bar:
    y <- rexp(25, rate = 1)
    y

    ##  [1] 0.11440572 0.82283062 0.59501424 1.24641370 0.10522128 0.08347796
    ##  [7] 2.36595784 0.81090968 1.02974101 0.02916098 0.21030060 1.89092012
    ## [13] 0.36983323 1.93451403 3.61961480 0.85696115 0.84391031 0.98985420
    ## [19] 0.34308119 1.36999000 2.09756011 0.49074162 0.07010304 2.54954789
    ## [25] 0.42302329

    y_bar <- mean(y)
    y_bar

    ## [1] 1.010524

Note: Since this is a random sample from the population, your numbers will be different. Try rerunning the commands a few times.

The following command replicates the sampling experiment 1000 times and saves the result as a matrix (y) with 25 rows and 1000 columns:

    y <- replicate(1000, rexp(25, rate = 1))

The mean (y_bar) is calculated for each of the 1000 replications (i.e. each entry in y_bar is the average of the 25 values in the corresponding column):

    y_bar <- colMeans(y)

Make a histogram of all the sampled waiting times using a command like histogram(as.numeric(y), breaks = 40) inserted in a new code chunk (try experimenting with the number of breaks):

• Explain how a histogram is constructed.
• Does this histogram look like a normal distribution?

Now we focus on the mean waiting times y_bar.

• Based on the known population parameters 𝜇 = 1 and 𝜎 = 1, what are the mean, standard deviation and approximate distribution of y_bar according to the CLT?
• What are the theoretical quartiles based on this approximate distribution of y_bar?
• Compare the predicted values of the mean, standard deviation and quartiles with the observed values (you can use favstats to calculate these from y_bar).
• Make a histogram of the sample means (y_bar). Does it look like a normal distribution?
• Make a boxplot of the sample means and explain how a boxplot is constructed.
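For reference, the central limit theorem that the questions on y_bar rely on can be stated in general form as below. This is a sketch of the standard textbook statement, not something taken from the exam text: the sample size n and the notation \bar{y} for the sample mean are assumptions introduced here, and the normal distribution is written with its variance as second parameter.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
E(\bar{y}) = \mu, \qquad
\operatorname{sd}(\bar{y}) = \frac{\sigma}{\sqrt{n}}, \qquad
\bar{y} \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad \text{for large } n .
```

The first two identities hold for any sample size; only the normal approximation depends on n being large.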
Part III: Theoretical boxplot for a normal distribution

Finally, consider the theoretical boxplot of a general normal distribution with mean 𝜇 and standard deviation 𝜎, and find the probability of being an outlier according to the 1.5⋅IQR criterion:

• First find the 𝑧-score of the lower/upper quartile, i.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the lower/upper quartile.
• Use this to find the IQR (expressed in terms of 𝜎).
• Now find the 𝑧-score of the maximal extent of the whisker, i.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the endpoint of the lower/upper whisker.
• Find the probability of being an outlier.
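The 1.5⋅IQR criterion referred to in Part III is not spelled out in the text. A sketch of the conventional definition is given below; the symbols Q_1 and Q_3 for the lower and upper quartile are notation assumed here, not taken from the exam.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
y \text{ is an outlier if } \quad
y < Q_1 - 1.5\,\mathrm{IQR}
\quad \text{or} \quad
y > Q_3 + 1.5\,\mathrm{IQR} .
```

Under this convention the whiskers can extend at most to Q_1 − 1.5⋅IQR and Q_3 + 1.5⋅IQR, which is the "maximal extent of the whisker" mentioned in the third bullet.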
