CS 1538: Introduction to Simulation
Homework 5
Introduction
In this assignment, you will perform both input modeling and output analysis. When answering
the following questions, please show all work (equations, explanations, etc.) not just the final
answer. List any software you use and explain what you used it for.
Input Modeling
1. The following data are generated randomly from a Normal distribution. Compute the
maximum-likelihood estimators for µ and σ².
-0.3182
-0.0895
0.3708
2.0838
0.3755
0.1913
-1.8758
1.1832
-0.0495
0.6597
1.3174
2.8889
0.0492
-2.0548
0.7597
0.7215
-0.5848
0.2236
0.2184
-0.8680
0.0155
-1.2310
1.0263
-3.3413
0.3406
1.7381
0.1327
1.4639
-0.5454
-1.4124
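SOLUTION:
For the Normal distribution, the maximum-likelihood estimators are the sample
average and the variance computed with divisor n (not n − 1):
µ = sample average = 0.11298
Sample variance (divisor n − 1, as returned by Excel) = 1.643
σ² = (n − 1)/n × 1.643 = (29/30) × 1.643 = 1.588
Sample average and variance calculated using Excel.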
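As a cross-check, the estimators can be recomputed with a short Python sketch (standard library only). It reports the variance with divisor n, which is the maximum-likelihood form, alongside the divisor n − 1 form that spreadsheet sample-variance functions return. The variable names are illustrative and not part of the assignment.

```python
# Data from Input Modeling problem 1 (30 observations).
data = [
    -0.3182, -0.0895, 0.3708, 2.0838, 0.3755, 0.1913, -1.8758, 1.1832,
    -0.0495, 0.6597, 1.3174, 2.8889, 0.0492, -2.0548, 0.7597, 0.7215,
    -0.5848, 0.2236, 0.2184, -0.8680, 0.0155, -1.2310, 1.0263, -3.3413,
    0.3406, 1.7381, 0.1327, 1.4639, -0.5454, -1.4124,
]

n = len(data)
mu_hat = sum(data) / n                        # MLE of the mean
ss = sum((x - mu_hat) ** 2 for x in data)     # sum of squared deviations
sigma2_mle = ss / n                           # MLE of the variance (divisor n)
sigma2_unbiased = ss / (n - 1)                # sample variance (divisor n - 1)

print(f"n = {n}")
print(f"mu_hat = {mu_hat:.5f}")
print(f"variance, divisor n     = {sigma2_mle:.3f}")
print(f"variance, divisor n - 1 = {sigma2_unbiased:.3f}")
```

The two variance figures differ only by the factor (n − 1)/n, so the gap shrinks as the sample grows.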
2. The highway between Atlanta, Georgia and Athens, Georgia has a high incidence of
accidents along its 100 kilometers. Public safety officers say that the occurrence of
accidents along the highway is randomly (uniformly) distributed, but the news media
says otherwise. The Georgia Department of Public Safety published records for the
month of September. These records indicated the point at which 30 accidents involving
an injury or death occurred, as shown below (the data points representing the distance
from the city limits of Atlanta). Use the Kolmogorov-Smirnov test to discover whether
the distribution of location of accidents is uniformly distributed. Use the level of
significance α = 0.05.
88.3
91.7
98.8
32.4
20.6
76.6
40.7
67.3
90.1
87.8
73.1
73.2
36.3
7.0
17.2
69.8
21.6
27.3
27.3
45.2
23.7
62.6
6.0
87.6
36.8
23.3
97.4
99.7
45.3
87.2
SOLUTION:
Null hypothesis: The data follows a uniform distribution.
Alternative hypothesis: The data does not follow a uniform distribution.
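Scale each location to [0, 1] by dividing by 100 and sort the values.
D+ = max(i/n − x(i)) = 0.047
D− = max(x(i) − (i − 1)/n) = 0.172
D = max(D+, D−) = 0.172
Critical value for α = 0.05 and n = 30: Dα = 0.240
Since D < Dα, we fail to reject the null hypothesis. The data give no
evidence that accident locations depart from a uniform distribution.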
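The K-S statistic can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; it assumes the hypothesized distribution is Uniform(0, 100), so each location is divided by 100 before comparison, and it takes the critical value 0.240 from the solution above rather than computing it.

```python
# Accident locations (km from Atlanta) from Input Modeling problem 2.
locations = [
    88.3, 91.7, 98.8, 32.4, 20.6, 76.6, 40.7, 67.3, 90.1, 87.8,
    73.1, 73.2, 36.3, 7.0, 17.2, 69.8, 21.6, 27.3, 27.3, 45.2,
    23.7, 62.6, 6.0, 87.6, 36.8, 23.3, 97.4, 99.7, 45.3, 87.2,
]

n = len(locations)
u = sorted(x / 100.0 for x in locations)   # scale to [0, 1] and rank

# enumerate() is 0-based, so rank i (1-based) is index + 1.
d_plus = max((i + 1) / n - x for i, x in enumerate(u))
d_minus = max(x - i / n for i, x in enumerate(u))
d = max(d_plus, d_minus)

d_crit = 0.240                             # tabulated value, alpha = 0.05, n = 30
reject = d > d_crit

print(f"D+ = {d_plus:.3f}, D- = {d_minus:.3f}, D = {d:.3f}")
print("reject H0" if reject else "fail to reject H0")
```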
3. The time required for 50 different employees to compute and record the number of hours
worked during the week was measured, with the following results in minutes. Use the
chi-square test to test the hypothesis that these service times are exponentially distributed.
Use six intervals. Use the level of significance α = 0.05.
Employee  Time (min)   Employee  Time (min)   Employee  Time (min)   Employee  Time (min)
    1        1.88         14        0.79         27        1.49         40        4.29
    2        0.54         15        0.21         28        0.66         41        0.80
    3        1.90         16        0.80         29        2.03         42        5.50
    4        0.15         17        0.26         30        1.00         43        4.91
    5        0.02         18        0.63         31        0.39         44        0.35
    6        2.81         19        0.36         32        0.34         45        0.36
    7        1.50         20        2.03         33        0.01         46        0.90
    8        0.53         21        1.42         34        0.10         47        1.03
    9        2.62         22        1.28         35        1.10         48        1.73
   10        2.67         23        0.82         36        0.24         49        0.38
   11        3.53         24        2.16         37        0.26         50        0.48
   12        0.53         25        0.05         38        0.45
   13        1.80         26        0.04         39        0.17
SOLUTION:
Null hypothesis: The data follows an exponential distribution.
Alternative hypothesis: The data does not follow an exponential
distribution.
Sample mean = 1.206
Estimated λ = 1/1.206 = 0.829
Since we want 6 intervals, each interval will hold 1/6 (16.67%) of the
probability mass.
F(x)     x                 Observed freq   Expected freq   (O-E)²/E
0.1667   [0, 0.220)              8            8.3333         0.013
0.3333   [0.220, 0.489)         11            8.3333         0.853
0.5      [0.489, 0.836)          9            8.3333         0.053
0.6667   [0.836, 1.325)          5            8.3333         1.333
0.8333   [1.325, 2.161)         10            8.3333         0.333
1.000    [2.161, ∞)              7            8.3333         0.213
The interval endpoints in the x column come from the inverse CDF,
F⁻¹(p) = −ln(1 − p)/λ, evaluated at the cumulative probabilities in the F(x)
column. So, for the first row, it's −ln(1 − 0.1667)/0.829 = 0.2199.
Since we created intervals with equal probability mass, the expected
frequency column is equal across each interval. It can be calculated as
1/6*(sample size) = 1/6*50 = 8.3333
C = the sum of the last column = 2.8
Degrees of freedom = k − s − 1 = 6 − 1 − 1 = 4
k = number of bins
s = number of estimated parameters
Critical value = χ² (α = 0.05, 4 degrees of freedom) = 9.488
Since C < 9.488, we fail to reject the null hypothesis. The data are
consistent with exponentially distributed service times with rate 0.829 per
minute.
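The equal-probability chi-square procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used to produce the table: it uses only the standard library, keeps λ unrounded (so endpoints may differ from the table in the last digit), and hard-codes the critical value 9.488 from the solution instead of looking it up.

```python
import math

# Service times (minutes) for the 50 employees in Input Modeling problem 3.
times = [
    1.88, 0.54, 1.90, 0.15, 0.02, 2.81, 1.50, 0.53, 2.62, 2.67, 3.53, 0.53, 1.80,
    0.79, 0.21, 0.80, 0.26, 0.63, 0.36, 2.03, 1.42, 1.28, 0.82, 2.16, 0.05, 0.04,
    1.49, 0.66, 2.03, 1.00, 0.39, 0.34, 0.01, 0.10, 1.10, 0.24, 0.26, 0.45, 0.17,
    4.29, 0.80, 5.50, 4.91, 0.35, 0.36, 0.90, 1.03, 1.73, 0.38, 0.48,
]

k = 6                                   # number of equal-probability intervals
n = len(times)
lam = n / sum(times)                    # MLE of the exponential rate, 1 / mean

# Interval endpoints from the inverse CDF: F^-1(p) = -ln(1 - p) / lambda.
edges = [0.0] + [-math.log(1 - i / k) / lam for i in range(1, k)] + [math.inf]

observed = [
    sum(1 for t in times if lo <= t < hi)
    for lo, hi in zip(edges, edges[1:])
]
expected = n / k                        # same for every interval by construction
chi2 = sum((o - expected) ** 2 / expected for o in observed)

df = k - 1 - 1                          # k bins, one estimated parameter
chi2_crit = 9.488                       # tabulated value, alpha = 0.05, df = 4
reject = chi2 > chi2_crit

print(f"lambda = {lam:.3f}")
print("observed =", observed)
print(f"chi2 = {chi2:.3f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

Changing `times`, `k`, and the critical value is enough to rerun the same check on another data set, such as the 30 transactions in problem 4.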
4. At a small store, you record the service time (in minutes) for 30 transactions (shown
below). How are these service times distributed? Develop and test a suitable model.
Use one of the goodness-of-fit tests to decide. Use the level of significance α = 0.05.
4.6093
2.4541
2.7272
2.6083
9.5841
3.8583
1.1305
8.7227
4.8375
1.7347
5.2921
1.1191
2.1489
0.2878
2.9482
11.8326
2.9973
3.0065
2.6055
6.7692
3.3745
4.1410
2.3380
1.6949
12.1024
14.3956
2.0066
2.6579
0.3578
4.1612
SOLUTION:
Since we’re dealing with service times, it’s good to start with the
exponential distribution.
Null Hypothesis: The service times follow an exponential distribution.
Alt. Hypothesis: The service times do not follow an exponential
distribution.
Sample mean = 4.28346
Estimated λ = 1/4.28346 = 0.233
I’ll use the Chi-squared goodness-of-fit test with 5 intervals. Thus, each
interval will have 20% probability mass.
F(x)   x                   Observed freq   Expected freq   (O-E)²/E
0.2    [0, 0.9558)               2               6          2.666667
0.4    [0.9558, 2.1881)          6               6          0
0.6    [2.1881, 3.9249)         11               6          4.166667
0.8    [3.9249, 6.894)           6               6          0
1      [6.894, ∞)                5               6          0.166667
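C = the sum of the last column = 7
Degrees of freedom = k − s − 1 = 5 − 1 − 1 = 3
Critical value = χ² (α = 0.05, 3 degrees of freedom) = 7.815
Since C < 7.815, we fail to reject the null hypothesis. The data are
consistent with exponentially distributed service times with rate 0.233 per
minute.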
Output Analysis
1. The small store from #4 above wants its service time to be faster, closer to 2.5
minutes. You implement a simulation of the store. Over 10 runs, you record the average
service time for 30 customers:
6.4678
5.6306
5.3717
6.9439
3.7322
5.2842
5.6135
3.9969
6.1131
4.2907
The store owner has thoughts on how to improve service times. You implement these
thoughts in the simulated system and rerun the simulation for 10 runs of 30 customers
each. You do your best to keep the same random numbers in run i this time as run i had
for the first run. The average service times recorded were:
3.1879
2.9015
2.7428
3.9970
3.3992
3.4287
3.5636
4.7058
3.1817
3.1958
Is there a difference in service times? Construct the appropriate 95% confidence interval
to decide.
SOLUTION:
Since the runs of the two simulations used the same random numbers, we
can use the paired-t approach. To do this, subtract run i of the second
simulation from run i of the first, then compute a confidence interval
for that set of differences.
Differences (original − improved):
3.2799   2.7291   2.6289   2.9469   0.333
1.8555   2.0499   -0.709   2.9314   1.0949
Sample mean = 1.914
Sample variance = 1.691
Standard error = sqrt(1.691/10) = 0.411
t(α/2, n−1) = t(0.025, 9) = 2.262
Confidence interval: 1.914 ± 2.262 × 0.411 = (0.984, 2.844)
Since the confidence interval does not contain zero, we are 95% confident
that the mean service times of the two systems differ. Since the whole
interval is positive, the mean service time of the original system is
larger than that of the improved system.
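The paired-t interval can be checked with a short Python sketch (standard library only). Two assumptions are worth flagging: the `improved` list is ordered so that entry i pairs with entry i of `original`, which is the pairing that reproduces the ten differences listed in the solution, and the t value 2.262 is taken from the solution rather than computed.

```python
import math
from statistics import mean, variance

# Average service times per run; entry i of each list is the same run number.
original = [6.4678, 5.6306, 5.3717, 6.9439, 3.7322,
            5.2842, 5.6135, 3.9969, 6.1131, 4.2907]
improved = [3.1879, 2.9015, 2.7428, 3.9970, 3.3992,
            3.4287, 3.5636, 4.7058, 3.1817, 3.1958]

diffs = [a - b for a, b in zip(original, improved)]
n = len(diffs)

d_bar = mean(diffs)
s2 = variance(diffs)              # sample variance, divisor n - 1
se = math.sqrt(s2 / n)            # standard error of the mean difference

t_crit = 2.262                    # tabulated t for alpha/2 = 0.025, 9 df
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
contains_zero = ci[0] <= 0.0 <= ci[1]

print(f"mean difference = {d_bar:.3f}, variance = {s2:.3f}, SE = {se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("interval contains zero" if contains_zero else "interval excludes zero")
```

Pairing by run is what lets common random numbers pay off: the variance of the differences is what enters the interval, not the sum of the two separate variances.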
2. The store owner is curious if her improvements bring the service time close enough to 2.5
minutes. Using the data above for the improvements, construct a 95% confidence
interval to decide.
SOLUTION:
The confidence interval for the simulation of the improvement is:
Sample mean = 3.430
Sample variance = 0.322
Standard error = sqrt(0.322/10) = 0.179
t(α/2, n−1) = t(0.025, 9) = 2.262
Confidence interval: 3.430 ± 2.262 × 0.179 = (3.024, 3.836)
Since the confidence interval does not contain 2.5, we are 95% confident
that the improvement will not bring the service time close to 2.5 minutes.
3. The store owner wants to know the 95% confidence interval on the service times from
your original simulation run (before the improvements). After seeing the range, she is
disappointed that it is too large. She wants the confidence interval to be within 15
seconds (0.25 minutes). How many simulation runs are needed to have a confidence
interval that is 30 seconds wide?
SOLUTION:
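For the 10 original runs:
Sample mean = 5.344
Sample variance S² = 1.122
Standard error = sqrt(1.122/10) = 0.335
t(α/2, n−1) = t(0.025, 9) = 2.262
Confidence interval: 5.344 ± 2.262 × 0.335 = (4.59, 6.10)
The half-width of 0.758 minutes is well above the 0.25-minute target. Using
the sample variance of the original runs as the initial estimate:
n ≥ (z(α/2) × S / ε)² = (1.960 × sqrt(1.122) / 0.25)² = 68.96, rounded up to 69
At least 69 runs are needed. Since 10 have already been run, we need 59
more.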
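The run-count formula n ≥ (z × S / ε)² can be sketched in Python. This version is a check under stated assumptions rather than a definitive answer: it takes S² from the ten original (pre-improvement) runs, since the question concerns the original simulation, uses z = 1.960, and targets a half-width ε of 0.25 minutes.

```python
import math
from statistics import variance

# Average service times from the ten original (pre-improvement) runs.
original = [6.4678, 5.6306, 5.3717, 6.9439, 3.7322,
            5.2842, 5.6135, 3.9969, 6.1131, 4.2907]

s2 = variance(original)           # sample variance, divisor n - 1
z = 1.960                         # standard normal quantile for 95% confidence
eps = 0.25                        # desired half-width in minutes (15 seconds)

n_exact = (z ** 2) * s2 / eps ** 2
n_required = math.ceil(n_exact)   # round up to a whole number of runs
extra_runs = max(0, n_required - len(original))

print(f"S^2 = {s2:.3f}")
print(f"n >= {n_exact:.2f}, so {n_required} runs in total")
print(f"additional runs needed: {extra_runs}")
```

Because z is smaller than the matching t quantile for small n, this is an initial estimate. The interval should be recomputed with the t distribution once the extra runs are in.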