
Exam

It is highly recommended that you answer the exam using Rmarkdown (you can simply use the exam Rmarkdown file as a starting point).

Part I: Estimating probabilities

Remember to load the mosaic package first:

    library(mosaic)

Chile referendum data

In this part we will use the dataset Chile. Remember to read the description of the dataset as well as the Wikipedia entry about the background.

    Chile <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=Chile.dat", header=TRUE, quote="\"")

NB: This dataset has several missing values (NA). To remove these when you use tally you can add the argument useNA = "no".

• Do a cross tabulation of the variables vote and sex.
• Estimate the probability of vote=N.
• Make a 95% confidence interval for the probability of vote=N.
• Estimate the probability of vote=N given that sex=F.
• What would these probabilities satisfy if vote and sex were statistically independent?

Part II: Sampling distributions and the central limit theorem

This is a purely theoretical exercise where we investigate the random distribution of samples from a known population.

Waiting times in a queue

We start by sampling data from the so-called exponential distribution, also called the negative exponential distribution. The exponential distribution is the most common distribution used to describe the waiting time between arrivals in a queue. It has one parameter, which is the number of arrivals per time unit, also called the arrival rate. In our case we set it to 1 arrival per time unit. Since the arrival rate in our theoretical population is 1, the mean waiting time for the population will be 𝜇 = 1. Furthermore, it can be shown that the standard deviation is 𝜎 = 1.

The following commands randomly sample 25 waiting times y and calculate their mean y_bar:
    y <- rexp(25, rate = 1)
    y

    ##  [1] 0.43411264 2.96845468 0.68169148 1.54653673 0.29007238 2.00201953
    ##  [7] 0.20025879 0.78435122 2.31660386 1.02436731 2.40607838 0.71508338
    ## [13] 1.18766120 2.93045280 2.79151352 0.01393748 0.88599114 0.67177111
    ## [19] 0.45262537 1.54055438 0.32013741 0.18889305 2.64166444 0.01879488
    ## [25] 1.06047869

    y_bar <- mean(y)
    y_bar

    ## [1] 1.202964

Note: Since it is a random sample from the population, your numbers will be different. Try to rerun the commands a few times.

The following command replicates the sampling experiment 1000 times and saves the result as a matrix y with 25 rows and 1000 columns:

    y <- replicate(1000, rexp(25, rate = 1))

The mean y_bar is calculated for each of the 1000 replications (i.e. each entry in y_bar is the average of the 25 values in the corresponding column):

    y_bar <- colMeans(y)

Make a histogram of all the sampled waiting times using a command like histogram(as.numeric(y), breaks = 40) inserted in a new code chunk (try experimenting with the number of breaks):

• Explain how a histogram is constructed.
• Does this histogram look like a normal distribution?

Now we focus on the mean waiting times y_bar.

• Based on the known population parameters 𝜇 = 1 and 𝜎 = 1, what are the mean, standard deviation and approximate distribution of y_bar according to the CLT?
• What are the theoretical quartiles based on this approximate distribution of y_bar?
• Compare the predicted values of mean, standard deviation and quartiles with the observed values (you can use favstats to calculate these from y_bar).
• Make a histogram of the sample means (y_bar). Does it look like a normal distribution?
• Make a boxplot of the sample means and explain how a boxplot is constructed.
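
The comparison between CLT predictions and simulated values in Part II can be sketched as follows. This is only a minimal illustration in base R, not a model answer. The seed and the variable names (clt_mean, clt_sd, clt_quartiles, obs_mean, obs_sd, obs_quartiles) are assumptions introduced here, and base R's sd and quantile stand in for favstats from mosaic.

```r
# Sketch only: base R, no mosaic required.
# The seed is an arbitrary choice for reproducibility, not part of the exam.
set.seed(1)

n     <- 25   # sample size used in the text
mu    <- 1    # population mean (given in the text)
sigma <- 1    # population standard deviation (given in the text)

# CLT approximation: y_bar is roughly normal with
# mean mu and standard deviation sigma / sqrt(n)
clt_mean      <- mu
clt_sd        <- sigma / sqrt(n)
clt_quartiles <- qnorm(c(0.25, 0.75), mean = clt_mean, sd = clt_sd)

# Simulated sample means, exactly as in the text
y     <- replicate(1000, rexp(n, rate = 1))
y_bar <- colMeans(y)

# Observed summaries to set beside the CLT values
obs_mean      <- mean(y_bar)
obs_sd        <- sd(y_bar)
obs_quartiles <- quantile(y_bar, probs = c(0.25, 0.75), names = FALSE)

# Side-by-side comparison: one row per summary, one column per source
comparison <- data.frame(
  quantity  = c("mean", "sd", "lower quartile", "upper quartile"),
  predicted = c(clt_mean, clt_sd, clt_quartiles),
  observed  = c(obs_mean, obs_sd, obs_quartiles)
)
print(comparison)
```

With 1000 replications, the observed column should be close to the predicted one but will not match it exactly. Changing the seed or the number of replications shows how much the observed values vary.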
Part III: Theoretical boxplot for a normal distribution

Finally, consider the theoretical boxplot of a general normal distribution with mean 𝜇 and standard deviation 𝜎, and find the probability of being an outlier according to the 1.5⋅IQR criterion:

• First find the 𝑧-score of the lower/upper quartile. I.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the lower/upper quartile.
• Use this to find the IQR (expressed in terms of 𝜎).
• Now find the 𝑧-score of the maximal extent of the whisker. I.e. the value of 𝑧 such that 𝜇 ± 𝑧𝜎 is the endpoint of the lower/upper whisker.
• Find the probability of being an outlier.
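
The four steps above can be traced numerically with the standard normal distribution, since the 𝑧-scores do not depend on 𝜇 or 𝜎. The following is a hedged sketch in base R rather than an official solution. The variable names (z_q, iqr_sigma, z_w, p_outlier) are illustrative assumptions.

```r
# Sketch only: everything is expressed in units of sigma around mu,
# so the standard normal (mean 0, sd 1) is sufficient.

# Step 1: z-score of the upper quartile (the lower one is -z_q by symmetry)
z_q <- qnorm(0.75)

# Step 2: IQR in units of sigma, i.e. the distance from -z_q to +z_q
iqr_sigma <- 2 * z_q

# Step 3: maximal whisker extent = quartile + 1.5 * IQR, in units of sigma
z_w <- z_q + 1.5 * iqr_sigma

# Step 4: probability of falling beyond either whisker limit
p_outlier <- 2 * pnorm(-z_w)

print(c(z_q = z_q, iqr_sigma = iqr_sigma, z_w = z_w, p_outlier = p_outlier))
```

Because the whisker limits are symmetric around 𝜇, the two tail probabilities are equal, which is why a single call to pnorm is doubled.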
