Assignment #1 – This assignment has two parts

Part I – Probability, the Normal Distribution, and the Binomial Distribution
a. Recall the Jimmy Rhodes Auto Supply problem from Lecture #0 review
notes.
Question 1: What is the probability of stock-outs occurring during the 2-day lead
time using the theoretical model?
The probability of stock-outs is the probability that demand during the 2-day lead time
exceeds the quantity (20 gallons) on hand, i.e., P(X>20):
(Source: Professor Schroeder’s Lecture #0 Page20)
The corresponding Z-score for 20 gallons is computed as
Z = (X − µ)/σ = (20 − 15)/6 ≈ 0.83
Then
P(X>20) = P(Z>0.83) = 0.2033 = 20.33%
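The table lookup above can be checked numerically. The following is a minimal sketch, assuming the N(15, 6) demand model stated in Question 4; the variable names are illustrative only. It also shows that rounding Z to two decimals, as a printed Z-table requires, shifts the answer slightly.

```python
from statistics import NormalDist

# Assumption: 2-day demand is Normal with mean 15 and standard deviation 6 gallons.
demand = NormalDist(mu=15, sigma=6)
on_hand = 20  # gallons in stock during the lead time

z = (on_hand - demand.mean) / demand.stdev
p_stockout = 1 - demand.cdf(on_hand)         # P(X > 20) with the unrounded Z
p_table = 1 - NormalDist().cdf(round(z, 2))  # same, with Z rounded to 0.83 as in a Z-table

print(f"Z = {z:.4f}")
print(f"P(X > 20) = {p_stockout:.4f} (unrounded Z), {p_table:.4f} (Z = 0.83)")
```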
Question 2: What % of the simulated demands exceeds 20, i.e., what is the
“probability” of stocking out based on the simulated data?
From the random simulation (see Appendix A Table in Columns B and C), there are 9
demands out of 40 that are greater than 20. Therefore, the percentage of the simulated
demands that exceed 20 is
9/40*100% = 22.5%
Based on the simulated data, the “probability” of stocking out is 22.5%.
Question 3: Are your answers to questions 1 and 2 the same? Comment/Explain.
The answers to Questions 1 and 2 are different. In Question 1, the two-day demand was
assumed to follow a normal distribution, so the probability of a stock-out during the
2-day lead time was computed for the entire population; whereas in Question 2, the
percentage of stock-outs was computed from a sample of 40 simulated demands. A
percentage computed from a sample usually deviates from the corresponding population
probability. (Remember, probability is a long-run
percentage, i.e., the percentage as determined from an “infinitely” large number of trials).
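The long-run idea can be illustrated with a short simulation. This is a sketch only, assuming the same N(15, 6) demand model; the function name, the seed, and the sample sizes are hypothetical choices, and the simulated demands are not the worksheet values.

```python
import random

def stockout_fraction(n, mu=15, sigma=6, on_hand=20, seed=1):
    """Fraction of n simulated Normal demands that exceed the stock on hand."""
    rng = random.Random(seed)  # arbitrary seed, used only to make the run repeatable
    return sum(rng.gauss(mu, sigma) > on_hand for _ in range(n)) / n

# Small samples wander around the theoretical value; large ones settle near it.
for n in (40, 400, 4_000, 200_000):
    print(f"n = {n:>7}: {stockout_fraction(n):.4f}")
```

With only 40 draws the fraction can land well away from the theoretical probability; as n grows it should settle close to it.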
Question 4: Use both the theoretical distribution, N(15, 6), and the simulated data
to calculate the middle 95% of demand values. Show your work.
Using the theoretical distribution:
The middle 95% of a normal distribution lies within about 2 (more precisely, 1.96)
standard deviations of the mean. That is,
P(-1.96 ≤ Z ≤ 1.96) = 0.95
Since X = µ + Zσ
X1 = 15+(-1.96)*6 = 3.24 gallons
X2 = 15 + (1.96)*6 = 26.76 gallons
Therefore, the middle 95% of demand values are between 3.24 and 26.76 gallons.
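The same interval can be obtained from the inverse of the normal CDF instead of a Z-table. A minimal sketch, assuming the N(15, 6) model from the question:

```python
from statistics import NormalDist

# Assumption: demand is Normal with mean 15 and standard deviation 6 gallons.
demand = NormalDist(mu=15, sigma=6)

z = NormalDist().inv_cdf(0.975)  # upper cutoff of the middle 95%, about 1.96
lower = demand.inv_cdf(0.025)    # equals 15 - z * 6
upper = demand.inv_cdf(0.975)    # equals 15 + z * 6

print(f"z = {z:.2f}, middle 95% of demand: {lower:.2f} to {upper:.2f} gallons")
```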
Using the simulated data:
Based on the above figure, we need the 2.5th and 97.5th percentiles of the simulated
motor oil demand, as shown below:
One Variable Summary    Motor Oil Demand (Data Set #1)
Mean                    14
Std. Dev.               8
2.50%                   0
97.50%                  27
The simulation shows that the middle 95% of demand values are between 0 and 27
gallons. You could, of course, have counted from your simulated data. The 2.5th
percentile is in position (41)(.025) = 1.025 ≈ 1 in the data set. That means, find the
smallest data value in your simulated demands. Likewise, the 97.5th percentile is in
position (41)(.975) = 39.975 ≈ 40. That means, find the largest data value in the
simulated demands.
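The counting rule described above, position = (n + 1) × percentile, can be sketched as follows. The helper name and the stand-in data are hypothetical; spreadsheet add-ins may interpolate between positions and so report slightly different percentiles.

```python
def value_at_percentile(data, p):
    """Return the value at position (n + 1) * p, rounded to the nearest whole position."""
    ordered = sorted(data)
    n = len(ordered)
    position = round((n + 1) * p)        # 1-based position in the sorted data
    position = min(max(position, 1), n)  # keep the position inside the data set
    return ordered[position - 1]

# Hypothetical stand-in for 40 simulated demands (not the worksheet values).
demands = list(range(100, 140))
print(value_at_percentile(demands, 0.025))  # position 1: the smallest value
print(value_at_percentile(demands, 0.975))  # position 40: the largest value
```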
b. Binomial distributions. Let’s simulate some observations from a Binomial
distribution with probability of success equal to 0.25.
Question 5: If we simulate 40 random observations, how many successes do we
expect to see?
n = 40
p = 0.25
E = n*p = 40*0.25 = 10
We expect to see 10 successes.
Question 6: How many successes are there in your simulated data? Is your
answer the same as your answer to Question 5? Comment/Explain. What
proportion (%) of successes is this?
There are 14 successes in the simulation (see Appendix A Table in Column D), which is
different from the answer to Question 5. The expected value for successes (from
Question 5) is the average number of successes out of 40 observations assuming the
theoretical probability of success (0.25). In our simulation, since we have only included
40 trials, our observed probability of success has not stabilized at the theoretical
probability of success.
In the simulation, 14/40*100% = 35% of the observations are successes, which is higher
than the given 25% probability of success.
Question 7: What is the exact theoretical probability of observing fewer than 2
successes in 40 trials if the probability of success is 0.25? What is the
approximate theoretical probability of observing fewer than 2 successes in 40
trials when the probability of success is 0.25 using the Normal approximation to
the Binomial? Explain how you could use simulation to answer such probability
questions?
Let X = the number of successes observed.
The exact theoretical probability of observing fewer than 2 successes in 40 trials is:
P(X<2) = P(X=0) + P(X=1)
= 40C0(0.25)^0(1-0.25)^40 + 40C1(0.25)^1(1-0.25)^(40-1)
= 1.441*10^-4
≈ 0
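The theoretical expected value and standard deviation are:
µ = np = 40(0.25) = 10
σ = √(np(1−p)) = √(40(0.25)(0.75)) ≈ 2.74
By using the normal approximation to the Binomial, with the continuity correction applied at X = 2,
Z = ((X − 0.5) − µ)/σ = (1.5 − 10)/2.74 ≈ −3.10
P(Z<-3.10) = 0.0010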
The approximate theoretical probability of observing fewer than 2 successes is 0.0010.
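Both calculations can be reproduced in a few lines. This is a sketch under the stated n = 40 and p = 0.25; the helper `binom_pmf` is an illustrative name. Because Z is not rounded here, the approximate probability comes out marginally below the table value of 0.0010.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.25

def binom_pmf(k):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact: fewer than 2 successes means 0 or 1 success.
exact = binom_pmf(0) + binom_pmf(1)

# Normal approximation with continuity correction: P(X < 2) is taken as P(Y < 1.5).
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (1.5 - mu) / sigma
approx = NormalDist().cdf(z)

print(f"exact = {exact:.4e}, sigma = {sigma:.2f}, z = {z:.2f}, approx = {approx:.4f}")
```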
To determine the simulated % of times that fewer than 2 successes is observed would
require multiple simulations of data for the Binomial distribution with n = 40 and p =
0.25 in each case. For each simulated 40 trials, count the number of successes. Use the
number of successes from the multiple simulations to find the % of your simulations that
had count of successes less than 2.
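The simulation procedure described above might look like the following sketch. The function name, the number of simulations, and the seed are assumptions made for illustration; since the event is rare (about 1 in 7,000), many repetitions are needed before any such sample is likely to appear.

```python
import random

def fraction_fewer_than(k, n=40, p=0.25, simulations=20_000, seed=1):
    """Share of simulated Binomial(n, p) samples with fewer than k successes."""
    rng = random.Random(seed)  # arbitrary seed for repeatability
    hits = 0
    for _ in range(simulations):
        # One simulated data set: n trials, each a success with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        if successes < k:
            hits += 1
    return hits / simulations

print(f"Simulated P(X < 2): {fraction_fewer_than(2):.5f}")
```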
Appendix A Excel Worksheet for Part I
B: Motor Oil Demand    C: Demand>20? (1=Yes, 0=No)    D: Binomial Data (P=0.25)
11                     0                              1
21                     1                              1
10                     0                              0
21                     1                              0
18                     0                              1
3                      0                              0
25                     1                              0
13                     0                              1
10                     0                              1
0                      0                              1
31                     1                              0
24                     1                              0
12                     0                              0
17                     0                              0
24                     1                              1
19                     0                              0
17                     0                              0
6                      0                              1
15                     0                              1
11                     0                              1
17                     0                              0
6                      0                              0
13                     0                              0
9                      0                              0
2                      0                              0
10                     0                              1
1                      0                              1
21                     1                              1
19                     0                              0
13                     0                              0
27                     1                              0
16                     0                              0
2                      0                              0
17                     0                              0
14                     0                              0
24                     1                              0
10                     0                              1
18                     0                              0
8                      0                              0
18                     0                              0
                       Count: 9                       Count: 14
Part II – Parameter versus Statistic, and the distribution of X versus the
distribution of X̄
Question 8: Our sample size is only 15. Why is the small sample size not a
problem? Define the theoretical distribution of X and the sampling distribution
of X̄.
The small sample size is not a problem because the original population is Normal.
The theoretical distribution of X is the normal distribution with a mean of 100 and a
standard deviation of 5. The sampling distribution of X̄ is also the normal distribution
with mean of 100, but has standard error of
σ/√n = 5/√15 ≈ 1.291
Question 9: Define Parameter. Define Statistic.
The one variable summary for X (Data Set #1) and X̄ (Data Set #2) is shown below:
One Variable Summary    X, Sample 1 (Data Set #1)    Sample Avg, X̄ (Data Set #2)
Mean                    100.84                       99.77
Std. Dev.               4.39                         1.32
Skewness                -0.28                        0.25
Median                  101.86                       99.60
Minimum                 94.10                        97.65
Maximum                 106.83                       102.69
Count                   15.00                        50.00
1st Quartile            96.03                        98.76
3rd Quartile            103.35                       100.66
A parameter is a numerical summary measure that describes the population, i.e., the set of
ALL possible items of interest. A statistic is a numerical summary measure that
describes a sample, i.e., a subset of items selected from the population.
Question 10: Use the results of the one variable summary for X and the definition
of the theoretical distribution of X to illustrate the difference between a parameter
and a statistic.
The parameters of the original variable, X, (i.e., of the population) are µ = 100 and σ = 5.
In the above “One Variable Summary” table for Data Set #1 (Sample 1) the summary
numbers given are statistics because they are based on only 1 sample of 15 observations.
Their values are different from the parameter values. The sample average, x̄, has value
100.84, and the sample standard deviation, s, has value 4.39.
Question 11: Define σ, s, and σX̄. Use your results to illustrate the difference
between them.
σ is the standard deviation of the population. s is the standard deviation of the
sample. σX̄ is the standard deviation of the sample average, X̄ (the standard error).
From our previous discussion, σ is 5, s is 4.39, and σX̄ is 1.291.
Question 12: What is the theoretical probability that X is less than 97.5? What %
of the values in your first sample is less than 97.5? Comment on the disparity.
Since the random variable X follows a normal distribution with µ = 100 and σ = 5, the
theoretical probability that X is less than 97.5 is
P(X < 97.5) = P(Z < (97.5 − 100)/5) = P(Z < −0.50) = 0.3085 = 30.85%
In the first sample, 4 out of 15 simulated values are less than 97.5. That is, 26.7% of the
values are less than 97.5. The theoretical probability and the actual percentage from the
sample are fairly close, but not equal. There is usually a difference between the theoretical
probability and the percentage computed from a set of sample data. Again, theoretical
probability assumes an “infinite” number of observations; we only have 15 in a single
sample.
Question 13: What is the theoretical probability that X̄ is less than 97.5? What
% of the 50 average values is less than 97.5? Again, comment on the disparity.
The theoretical probability that X̄ is less than 97.5 is 0.0262 as calculated below.
σX̄ = σ/√n = 5/√15 = 1.291
P(X̄ < 97.5) = P(Z < (97.5 − 100)/1.291) = P(Z < −1.94) = 0.0262 = 2.62%
None of the 50 average values in my simulations is less than 97.5. Therefore, 0% of the
50 average values is less than 97.5. The disparity, again, is accounted for by the
relatively small number (50) of repetitions.
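To see how the number of repetitions affects the observed percentage, the sampling of averages can be simulated. This is a sketch only, assuming X is Normal with µ = 100 and σ = 5 and samples of size 15; the function name, seed, and repetition counts are hypothetical, and the simulated averages are not those of Data Set #2.

```python
import random
from statistics import NormalDist, stdev

MU, SIGMA, N = 100, 5, 15
std_error = SIGMA / N ** 0.5                     # about 1.291
p_theory = NormalDist(MU, std_error).cdf(97.5)   # P(X-bar < 97.5), unrounded Z

def simulate_means(reps, seed=1):
    """Averages of `reps` simulated samples, each of N Normal observations."""
    rng = random.Random(seed)  # arbitrary seed for repeatability
    return [sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(reps)]

print(f"theoretical P(X-bar < 97.5) = {p_theory:.4f}")
for reps in (50, 5_000):
    means = simulate_means(reps)
    below = sum(m < 97.5 for m in means) / reps
    print(f"{reps:>5} averages: {below:.4f} below 97.5, std. dev. {stdev(means):.3f}")
```

With 50 averages, the expected count below 97.5 is only about 1, so observing none is unremarkable.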
Histograms:
Question 14: Explain why the probability found in question 13 is so much
smaller than the probability found in question 12.
The probability found in question 13 is much smaller than the probability found in
question 12 because the standard deviation used in question 13 (the standard error of
X̄, 1.291) is much smaller than the standard deviation used in question 12 (σ = 5). For
the same deviation from the mean (100 – 97.5 = 2.5), the smaller standard deviation
makes the absolute value of the Z-score much larger in question 13. Sample averages
vary less than individual observations, so far fewer values of X̄ than of X fall below
97.5. Therefore, the probability found in question 13 is much smaller than the
probability found in question 12.
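The comparison can be made concrete by computing both Z-scores side by side. A minimal sketch, assuming µ = 100, σ = 5, and n = 15 as given; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n, cutoff = 100, 5, 15, 97.5
std_error = sigma / n ** 0.5         # standard deviation of the sample average

z_x = (cutoff - mu) / sigma          # Z-score for a single observation X
z_xbar = (cutoff - mu) / std_error   # Z-score for the sample average X-bar

std_normal = NormalDist()
p_x = std_normal.cdf(z_x)
p_xbar = std_normal.cdf(z_xbar)

print(f"X:     z = {z_x:.2f}, P = {p_x:.4f}")
print(f"X-bar: z = {z_xbar:.2f}, P = {p_xbar:.4f}")
```

The same 2.5-unit deviation is half a standard deviation for X but nearly two standard errors for X̄.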