4.4 notes - Fitting models to data
Example 4.4.1 - Call lengths
In RStudio, import the following data file:
http://wwwp.cord.edu/faculty/reber/data/315/calls.txt
The data lists the lengths (in seconds) of 31,288 calls made to a customer service center
over the course of a month.
a. Let’s begin by graphing the data:
hist(calls$Length)
Which model(s) would be appropriate for call lengths? Propose a model, find MLE’s
for the parameters, and sketch the histogram with the PDF of the model overlaid.
Qualitatively assess how well your model fits the data.
b. Use your model to find the probability that a call will last under three minutes.
c. Use your model to find the probability that a call will last more than five minutes.
d. Propose a second model, find MLE’s for the parameters, and sketch the histogram
with the PDF of the model overlaid. Qualitatively assess how well your model fits
the data. Which model is “better?” Can you tell?
e. Use your second model to re-calculate the probabilities in (b) and (c). Is there a
marked difference in the results?
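One possible way to work through parts (a) to (c) in R is sketched below, assuming an exponential model is the one proposed (a gamma or lognormal model would be an equally reasonable candidate for part (d)). So that the sketch runs without the course data file, it uses simulated stand-in data named calls.sim with a made-up rate; both are assumptions for illustration only, and substituting calls$Length gives the actual analysis. For an exponential model with rate λ, the MLE is 1 / x̄.

```r
# Stand-in data (ASSUMPTION): simulated call lengths in seconds,
# with a hypothetical rate. Replace calls.sim with calls$Length
# to analyze the real data.
set.seed(315)
calls.sim <- rexp(31288, rate = 1/200)

# MLE for the exponential rate: lambda-hat = 1 / sample mean
lambda.hat <- 1 / mean(calls.sim)

# (a) Histogram on the density scale with the fitted PDF overlaid
hist(calls.sim, freq = FALSE, breaks = 50,
     main = "Call lengths with fitted exponential PDF")
curve(dexp(x, rate = lambda.hat), add = TRUE, lwd = 2)

# (b) P(call lasts under three minutes), i.e. under 180 seconds
p.under.3min <- pexp(180, rate = lambda.hat)

# (c) P(call lasts more than five minutes), i.e. over 300 seconds
p.over.5min <- pexp(300, rate = lambda.hat, lower.tail = FALSE)
```

Using freq = FALSE matters here: the PDF only lines up with the histogram when the bars are drawn on the density scale rather than as counts.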
Example 4.4.2 - Accuracy of MLE’s
In our last example we used an MLE to estimate the average length of calls at a
call center. How accurate is this estimate? How could we get an estimate of an
estimate’s accuracy?
Example 4.4.2 (cont.)
Continuing our previous example, the data lists the lengths (in seconds) of 31,288 calls
made to a customer service center over the course of a month.
a. Re-create a histogram of the data. Comment on the distribution of call lengths.
b. Take a random sample of 500 calls from the data and assign it to variable sample.calls. Draw a histogram and compare it to your histogram from part (a).
> sample.calls = sample(calls$Length, 500)
> hist(sample.calls, freq=FALSE, main="Histogram of Sample Call Lengths")
c. Calculate the mean and standard deviation of your sample. Compare values with
your neighbor. (Note: If your neighbor’s not in this class, just compare with the
person sitting next to you.)
> mean(sample.calls)
> sd(sample.calls)
d. Take a random sample of 50 calls from the data and assign it to variable sample.calls.
Draw a histogram and compare it to your histogram from parts (a) and (b).
e. Calculate the mean and standard deviation of your sample. Compare values with
your neighbor.
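Comparing values with a neighbor is a way of seeing how much the sample mean changes from one sample to the next. The sketch below repeats that comparison many times for both sample sizes. It uses simulated stand-in data named calls.sim with a made-up rate (an assumption, so the code runs without the data file); substitute calls$Length for the real data. The choice of 1000 repetitions is arbitrary.

```r
# Stand-in data (ASSUMPTION): simulated call lengths in seconds.
set.seed(315)
calls.sim <- rexp(31288, rate = 1/200)

# Draw 1000 samples of each size and record each sample mean
means.500 <- replicate(1000, mean(sample(calls.sim, 500)))
means.50  <- replicate(1000, mean(sample(calls.sim, 50)))

# Spread of the sample means: smaller samples vary more
sd(means.500)
sd(means.50)

# Histograms of the two collections of sample means
hist(means.500, freq = FALSE, main = "Sample means, n = 500")
hist(means.50,  freq = FALSE, main = "Sample means, n = 50")
```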
Comments:
Theoretical Aside
f. Let x1, x2, ..., xn be a random sample that we wish to model using a continuous
model with PDF f(x), where E(x) = µ and SD(x) = σ.
(a) Find E(x̄).
(b) Find V(x̄) and SD(x̄).
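A sketch of one way to reach these results, assuming the observations in the random sample are independent and identically distributed (the independence is what lets the variances add):

```latex
\begin{align*}
E(\bar{x}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
            = \frac{1}{n}\, n\mu = \mu \\
V(\bar{x}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(x_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
SD(\bar{x}) &= \sqrt{V(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```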
Comments:
g. Let’s consider the following statistic:
$$z_n = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Comments:
Theorem (The Central Limit Theorem). Let x1, x2, ..., xn denote a random
sample from a model with E(x) = µ < ∞ and 0 < SD(x) = σ < ∞. Then
$$\lim_{n \to \infty} P\left(a < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < b\right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx$$
Here the integrand on the right side is the PDF of the standard normal probability
model (µ = 0, σ = 1). A more general form of the PDF is
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
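The theorem can be checked numerically. The sketch below simulates sample means from an exponential model and records how often the standardized mean lands between a = −2 and b = 2. The rate λ = 1/200 and the helper name coverage are assumptions made up for illustration; they are not taken from the call data.

```r
set.seed(315)
lambda <- 1/200      # hypothetical rate, chosen for illustration
mu     <- 1 / lambda # mean of the exponential model
sigma  <- 1 / lambda # SD of the exponential model

# Proportion of standardized sample means falling in (-2, 2)
coverage <- function(n, reps = 10000) {
  z <- replicate(reps,
                 (mean(rexp(n, rate = lambda)) - mu) / (sigma / sqrt(n)))
  mean(z > -2 & z < 2)
}

sapply(c(5, 50, 500), coverage)  # simulated proportions for each n
pnorm(2) - pnorm(-2)             # normal limit, about 0.9545
```

As n grows, the simulated proportions should settle near the normal value, even though the exponential model itself is strongly skewed.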
h. Use the CLT to construct an interval for x̄ when a = −2 and b = 2 for the exponential model.
i. Use the CLT to construct an interval for λ when a = −2 and b = 2 for the exponential model.
j. Take a random sample of measurements from the calls data for each of n = 5, 50, 500.
Construct a 95% confidence interval in each case. Use the class’s results to comment
on the accuracy of these intervals.
k. Use the CLT to construct a 95% confidence interval formula for λ in a Poisson
model.
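For part (j), one reading is that the interval is for λ under an exponential model; that reading is an assumption, since the exercise does not name the parameter. In the exponential model µ = σ = 1/λ, so solving −z < (x̄ − 1/λ) / (1/(λ√n)) < z for λ gives (1 − z/√n)/x̄ < λ < (1 + z/√n)/x̄, with z ≈ 1.96 for 95% (parts (h) and (i) use z = 2). The sketch below applies this to simulated stand-in data named calls.sim; the data, its rate, and the helper name lambda.ci are hypothetical, and calls$Length can be substituted for the real data.

```r
# Stand-in data (ASSUMPTION): simulated call lengths in seconds.
set.seed(315)
calls.sim <- rexp(31288, rate = 1/200)

# CLT-based interval for lambda in an exponential model:
# (1 - z/sqrt(n)) / xbar < lambda < (1 + z/sqrt(n)) / xbar
lambda.ci <- function(x, z = qnorm(0.975)) {
  n <- length(x)
  xbar <- mean(x)
  c(lower    = (1 - z / sqrt(n)) / xbar,
    estimate = 1 / xbar,
    upper    = (1 + z / sqrt(n)) / xbar)
}

# One interval for each sample size in part (j); columns are n = 5, 50, 500
sapply(c(5, 50, 500), function(n) lambda.ci(sample(calls.sim, n)))
```

The CLT is a large-sample result, so the n = 5 interval in particular should be read with caution.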
HW:
a. Show that, for the normal model, E(x) = µ and SD(x) = σ.
b. Use the CLT to construct a 95% interval formula for p in a binomial model.
c. Use the CLT to construct a 95% interval formula for p in a geometric model.