Julian Archer
To: Dr. Findsen
STAT 511 – Section 3
Project #1
Question #1(a)
Experimental Procedure:
Step #1: Use “roll-dice-online.com” to generate 100 random paired rolls of two 6-sided dice.
Step #2: Transfer the 100 observations for the paired dice rolls into Excel.
Step #3: Compute the total of each paired roll (i.e. for each row of data).
Step #4: Count the rows where the total in Step #3 is 12. This count was 2.
Step #5: Estimate the probability of a total of 12 by dividing the count from Step #4 by 100, giving 0.02.
Step #6: Compute the absolute difference of each paired roll (i.e. for each row of data).
Step #7: Count the rows where the absolute difference in Step #6 is 4. This count was 11.
Step #8: Estimate the probability of an absolute difference of 4 by dividing the count from Step #7 by 100, giving 0.11.
Step #9: Compute the probability of a total of 12 or an absolute difference of 4. A total of 12 requires (6, 6), whose absolute difference is 0, so the two events cannot occur together and the values from Step #5 and Step #8 can simply be added. This gives 0.13.
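The Excel procedure above can be mirrored in a short Python simulation, offered only as a sketch: it assumes Python's `random.randint` as a stand-in for roll-dice-online.com, and the helper `simulate_q1a` and its seed are hypothetical, so its counts will not match the 2 and 11 recorded above. Because the two events are disjoint, the OR proportion equals the sum of the two proportions.

```python
import random


def simulate_q1a(n_rolls=100, seed=1):
    """Simulate n_rolls paired rolls of two 6-sided dice and estimate
    P(total = 12), P(|difference| = 4) and P(total = 12 or |difference| = 4)."""
    # Assumption: random.randint stands in for roll-dice-online.com; the seed is arbitrary.
    rng = random.Random(seed)
    rolls = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(n_rolls)]
    count_12 = sum(1 for a, b in rolls if a + b == 12)       # Steps 3-4
    count_d4 = sum(1 for a, b in rolls if abs(a - b) == 4)   # Steps 6-7
    count_either = sum(1 for a, b in rolls if a + b == 12 or abs(a - b) == 4)
    return count_12 / n_rolls, count_d4 / n_rolls, count_either / n_rolls


p12, pd4, p_either = simulate_q1a()
print(p12, pd4, p_either)
```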
Theoretical Calculation:
Number of ways to get a total of 12 when rolling two 6-sided dice: $N(X) = 1$, namely (6, 6)
Total number of possible outcomes when rolling two 6-sided dice: $N = 36$
Probability of getting a total of 12: $P(X) = \frac{N(X)}{N} = \frac{1}{36} \approx 0.0278$
Number of ways to get an absolute difference of 4 when rolling two 6-sided dice: $N(D) = 4$, namely (1, 5), (5, 1), (2, 6), and (6, 2)
Probability of getting an absolute difference of 4: $P(D) = \frac{N(D)}{N} = \frac{4}{36} \approx 0.1111$
Probability of getting a total of 12 or an absolute difference of 4 (the events are disjoint, so $P(X \cap D) = 0$):
$$P(X \cup D) = P(X) + P(D) = 0.0278 + 0.1111 = 0.1389$$
Answer Comparison: In the experiment, the probability of getting a total of 12 or an absolute difference of 4 was slightly smaller than in theory (0.13 versus 0.1389).
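As a check on the counting, one can enumerate all 36 ordered outcomes with exact fractions. This Python sketch (the helper names are illustrative) reproduces N(X) = 1, N(D) = 4, an empty intersection, and a union probability of 5/36 ≈ 0.1389.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two 6-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def total_12(o):
    return o[0] + o[1] == 12


def diff_4(o):
    return abs(o[0] - o[1]) == 4


p_x = prob(total_12)                                    # 1/36
p_d = prob(diff_4)                                      # 4/36 = 1/9
p_both = prob(lambda o: total_12(o) and diff_4(o))      # 0: the events are disjoint
p_union = prob(lambda o: total_12(o) or diff_4(o))      # 5/36
print(p_x, p_d, p_both, p_union, round(float(p_union), 4))
```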
Question #1(b)
Experimental Procedure:
Step #1: Use “roll-dice-online.com” to generate 100 random paired rolls of two 6-sided dice.
Step #2: Transfer the 100 observations for the paired dice rolls into Excel.
Step #3: Using a logical test on each row of data, count the rows in which at least one six occurred. This count was 26.
Step #4: Estimate the probability of at least one six by dividing the count from Step #3 by 100, giving 0.26.
Step #5: Compute the absolute difference of each paired roll (i.e. for each row of data).
Step #6: Count the rows where the absolute difference in Step #5 is 4. This count was 13.
Step #7: Estimate the probability of an absolute difference of 4 by dividing the count from Step #6 by 100, giving 0.13.
Step #8: Compute the probability of at least one six or an absolute difference of 4. Adding the values from Step #4 and Step #7 gives 0.39, but unlike in Question #1(a) these events can occur together (e.g. a roll of (2, 6)), so simple addition double-counts such rows. The correct estimate is the number of rows satisfying either condition (26 + 13 minus the rows satisfying both) divided by 100, which is at most 0.39.
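To carry out Step #8 without double-counting, one can count rows where either condition holds (an OR test), or equivalently subtract the rows where both hold. A Python sketch of that counting logic follows; the function name `union_count`, the seed, and the use of `random.randint` in place of roll-dice-online.com are assumptions for illustration.

```python
import random


def union_count(rolls):
    """Count rows with at least one six, with |difference| = 4, with both,
    and with either. Each row is counted once in the OR count, so rows where
    both conditions hold are not double-counted."""
    count_six = sum(1 for a, b in rolls if 6 in (a, b))
    count_d4 = sum(1 for a, b in rolls if abs(a - b) == 4)
    count_both = sum(1 for a, b in rolls if 6 in (a, b) and abs(a - b) == 4)
    count_either = sum(1 for a, b in rolls if 6 in (a, b) or abs(a - b) == 4)
    # Inclusion-exclusion: the OR count equals six + d4 - both.
    assert count_either == count_six + count_d4 - count_both
    return count_six, count_d4, count_both, count_either


# Assumption: random.randint stands in for roll-dice-online.com; the seed is arbitrary.
rng = random.Random(2)
rolls = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(100)]
print(union_count(rolls))
```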
Theoretical Calculation:
Number of ways to get at least one 6 when rolling two 6-sided dice: $N(X) = 11$
Total number of possible outcomes when rolling two 6-sided dice: $N = 36$
Probability of getting at least one 6: $P(X) = \frac{N(X)}{N} = \frac{11}{36} \approx 0.3056$
Number of ways to get an absolute difference of 4 when rolling two 6-sided dice: $N(D) = 4$
Probability of getting an absolute difference of 4: $P(D) = \frac{N(D)}{N} = \frac{4}{36} \approx 0.1111$
Number of ways to get at least one 6 and an absolute difference of 4, namely (2, 6) and (6, 2): $N(X \cap D) = 2$, so $P(X \cap D) = \frac{2}{36} \approx 0.0556$
Probability of getting at least one 6 or an absolute difference of 4:
$$P(X \cup D) = P(X) + P(D) - P(X \cap D) = \frac{11}{36} + \frac{4}{36} - \frac{2}{36} = \frac{13}{36} \approx 0.3611$$
Answer Comparison: The experimental value of 0.39 was obtained by simply adding the two estimated probabilities, so it double-counts any rows that showed both a six and an absolute difference of 4. It is therefore an upper bound on the experimental proportion, which should be compared with the theoretical value of 0.3611.
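The inclusion-exclusion calculation can be verified by brute-force enumeration of the 36 outcomes. In this Python sketch (helper names are illustrative), the assertion checks that P(X) + P(D) - P(X ∩ D) equals the directly counted probability of the union.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered outcomes


def prob(event):
    """Exact probability of an event given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def at_least_one_six(o):
    return 6 in o


def diff_4(o):
    return abs(o[0] - o[1]) == 4


p_x = prob(at_least_one_six)                                  # 11/36
p_d = prob(diff_4)                                            # 4/36
p_both = prob(lambda o: at_least_one_six(o) and diff_4(o))    # 2/36: (2, 6) and (6, 2)
p_union = p_x + p_d - p_both                                  # inclusion-exclusion
assert p_union == prob(lambda o: at_least_one_six(o) or diff_4(o))
print(p_union, round(float(p_union), 4))  # 13/36 0.3611
```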
Question #2(a)
Experimental Procedure:
Step #1: Use “randomizer.org/form.htm” to generate 100 observations of 3 cards drawn randomly (without replacement) from one suit of playing cards.
Parameters:
How many sets of numbers do you want to generate: 100
How many numbers per set: 3
Number range (e.g., 1-50): 1 to 13
Do you wish each number in a set to remain unique: Yes
Step #2: Transfer the 100 observations in Step#1 into Excel.
Step #3: Count the rows (i.e. sets of three cards) containing no face cards. This count was 53.
Step #4: Estimate the probability of getting no face cards among the three cards drawn by dividing the count from Step #3 by 100, giving 0.53.
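A simulation of the randomizer.org draws can serve as a sanity check for both Question #2(a) (unique numbers) and #2(b) (numbers may repeat). This Python sketch assumes that the numbers 11, 12, and 13 represent the face cards J, Q, and K (the mapping is not stated in the procedure); it uses `random.sample` for unique numbers and `random.randint` for repeated ones, and the helper name and seed are hypothetical.

```python
import random

FACE_CARDS = {11, 12, 13}  # assumption: 11, 12, 13 stand for J, Q, K


def no_face_proportion(n_sets=100, replace=False, seed=3):
    """Fraction of 3-card sets (cards numbered 1-13) containing no face card."""
    rng = random.Random(seed)  # arbitrary seed, stands in for randomizer.org
    hits = 0
    for _ in range(n_sets):
        if replace:
            hand = [rng.randint(1, 13) for _ in range(3)]  # numbers may repeat
        else:
            hand = rng.sample(range(1, 14), 3)             # each number unique
        if FACE_CARDS.isdisjoint(hand):
            hits += 1
    return hits / n_sets


print(no_face_proportion(), no_face_proportion(replace=True))
```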
Theoretical Calculation:
Using a hypergeometric distribution, the parameters are as follows (outcomes: not a face card = success, face card = failure):
Finite population: $N = 13$; number of successes in the population: $M = 10$
Fixed sample size: $n = 3$; number of successes in the sample: $x = 3$
$$h(x = 3; n = 3, M = 10, N = 13) = P(X = 3) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} = \frac{\binom{10}{3}\binom{3}{0}}{\binom{13}{3}} = \frac{120}{286} \approx 0.4196$$
$$\sigma_X = \sqrt{\frac{N-n}{N-1} \cdot n \cdot \frac{M}{N}\left(1 - \frac{M}{N}\right)} = \sqrt{\frac{13-3}{13-1} \cdot 3 \cdot \frac{10}{13}\left(1 - \frac{10}{13}\right)} \approx 0.6662$$
Answer Comparison: In the experiment, the probability of drawing 3 non-face cards was larger than the theoretical value (0.53 versus 0.4196), a difference of approximately 0.17 standard deviations of X, i.e. (0.53 - 0.4196) / 0.6662 ≈ 0.17. Note that $\sigma_X$ describes the spread of the number of non-face cards in a single three-card draw; the standard error of a proportion estimated from 100 draws is $\sqrt{0.4196(1 - 0.4196)/100} \approx 0.0493$, against which the observed difference is about 2.2 standard errors.
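The hypergeometric probability and standard deviation can be reproduced with `math.comb` (Python 3.8+). The sketch also computes the standard error of a proportion estimated from 100 independent draws, sqrt(p(1 - p)/100); that comparison is an addition for illustration, not part of the original calculation.

```python
from math import comb, sqrt

# Population, successes (non-face cards), sample size, observed successes.
N, M, n, x = 13, 10, 3, 3

# Hypergeometric pmf: C(M, x) C(N - M, n - x) / C(N, n)
p_hyper = comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Standard deviation of X with the finite-population correction (N - n)/(N - 1)
sd_hyper = sqrt((N - n) / (N - 1) * n * (M / N) * (1 - M / N))

# Standard error of a proportion estimated from 100 independent draws
se_prop = sqrt(p_hyper * (1 - p_hyper) / 100)
z = (0.53 - p_hyper) / se_prop  # observed proportion 0.53 from Step #4

print(round(p_hyper, 4), round(sd_hyper, 4), round(se_prop, 4), round(z, 2))
```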
Question #2(b)
Experimental Procedure:
Step #1: Use “randomizer.org/form.htm” to generate 100 observations of 3 cards drawn randomly (with replacement) from one suit of playing cards.
Parameters:
How many sets of numbers do you want to generate: 100
How many numbers per set: 3
Number range (e.g., 1-50): 1 to 13
Do you wish each number in a set to remain unique: No
Step #2: Transfer the 100 observations in Step#1 into Excel.
Step #3: Count the rows (i.e. sets of three cards) containing no face cards. This count was 45.
Step #4: Estimate the probability of getting no face cards among the three cards drawn by dividing the count from Step #3 by 100, giving 0.45.
Theoretical Calculation:
Using a binomial distribution, the parameters are as follows (outcomes: not a face card = success, face card = failure):
Sequence of independent trials: $n = 3$
Probability of success: $p = 10/13$
$$b(x = 3; n = 3, p = 10/13) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{3}{3}\left(\frac{10}{13}\right)^3\left(\frac{3}{13}\right)^0 \approx 0.4552$$
$$\sigma_X = \sqrt{np(1-p)} = \sqrt{3 \cdot \frac{10}{13}\left(1 - \frac{10}{13}\right)} \approx 0.7298$$
Answer Comparison: In the experiment, the probability of drawing 3 non-face cards was slightly smaller than the theoretical value (0.45 versus 0.4552). The two are close: the difference is (0.45 - 0.4552) / 0.7298 ≈ -0.0071 standard deviations of X, or, measured against the standard error of a proportion estimated from 100 draws, $\sqrt{0.4552(1 - 0.4552)/100} \approx 0.0498$, about -0.10 standard errors.
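The binomial values can be checked the same way. This Python sketch evaluates the pmf with `math.comb`, the standard deviation sqrt(np(1 - p)), and, as an added comparison, the standard error of a proportion from 100 draws.

```python
from math import comb, sqrt

n, p, x = 3, 10 / 13, 3  # trials, P(non-face card), observed successes

p_binom = comb(n, x) * p**x * (1 - p) ** (n - x)  # equals (10/13)^3
sd_binom = sqrt(n * p * (1 - p))

# Standard error of a proportion estimated from 100 independent draws
se_prop = sqrt(p_binom * (1 - p_binom) / 100)
z = (0.45 - p_binom) / se_prop  # observed proportion 0.45 from Step #4

print(round(p_binom, 4), round(sd_binom, 4), round(se_prop, 4), round(z, 2))
```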
Question #2(c)
Experimental Procedure:
Step #1: Use “randomizer.org/form.htm” to generate 100 observations of 3 cards drawn randomly (with replacement) from one suit of playing cards.
Parameters:
How many sets of numbers do you want to generate: 100
How many numbers per set: 3
Number range (e.g., 1-50): 1 to 13
Do you wish each number in a set to remain unique: No
Step #2: Transfer the 100 observations in Step#1 into Excel.
Step #3: Count the rows in which the first of the three cards drawn was not a face card (i.e. for each row of data). This count was 71.
Step #4: Estimate the probability that the first card drawn is not a face card by dividing the count from Step #3 by 100, giving 0.71.
Theoretical Calculation:
Using a geometric distribution, the parameters are as follows (outcomes: not a face card = success, face card = failure):
Sequence of independent trials: $n = 3$
Probability of success: $p = 10/13$
$x$ = number of trials until the 1st success ($r = 1$); we are interested in $x = 1$ trial.
$$g(x = 1; p = 10/13) = p(1-p)^{x-1} = \frac{10}{13}\left(\frac{3}{13}\right)^{0} \approx 0.7692$$
$$\sigma_X = \sqrt{\frac{1-p}{p^2}} = \sqrt{\frac{3/13}{(10/13)^2}} = \sqrt{0.39} \approx 0.6245$$
Answer Comparison: In the experiment, the probability of drawing a non-face card on the first trial was a bit smaller than the theoretical value (0.71 versus 0.7692). The difference is (0.71 - 0.7692) / 0.6245 ≈ -0.095 standard deviations of X. Measured against the standard error of a proportion estimated from 100 draws, $\sqrt{0.7692(1 - 0.7692)/100} \approx 0.0421$, the difference is about -1.4 standard errors.
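For the geometric case, this Python sketch evaluates p(1 - p)^(x - 1) at x = 1 and the standard deviation of the number of trials until the first success, sqrt((1 - p)/p^2); the standard-error comparison for 100 draws is again an added illustration.

```python
from math import sqrt

p, x = 10 / 13, 1  # P(non-face card); first success on trial 1

p_geom = p * (1 - p) ** (x - 1)   # geometric pmf, x = trial of the first success
sd_geom = sqrt((1 - p) / p**2)    # SD of the number of trials to the first success

# Standard error of a proportion estimated from 100 independent draws
se_prop = sqrt(p_geom * (1 - p_geom) / 100)
z = (0.71 - p_geom) / se_prop     # observed proportion 0.71 from Step #4

print(round(p_geom, 4), round(sd_geom, 4), round(se_prop, 4), round(z, 2))
```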
Question 3(a)
Step #1: Using “http://www.random.org/gaussian-distributions/”, generate two samples of size 10, two samples of size 100, and one sample of size 1000.
Use these parameters:
The distribution's mean should be 0.0 (limits ±1,000,000) and its standard deviation 1.0 (limits ±1,000,000).
The numbers should have 2 significant digits (minimum 2, maximum 20).
Step #2: For each of the sets of random numbers generated, transfer the values into MATLAB and place
them in an array. Call these arrays X1, X2, X3, X4, and X5, respectively.
Step #3: Generate Q-Q plots for X1, X2, X3, X4, and X5, respectively. Also compute the standard deviation and mean of each dataset so that they can be compared with the parameters set above. This is easily done with a few lines of code in MATLAB.
Note: after much searching, to no avail, and experimenting with numbers, I have deduced that MATLAB uses this formula for the percentile at which the i-th of n ordered observations is plotted: $\text{Percentile} = 100\left(\frac{i - 0.5}{n}\right)$th.
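To illustrate the percentile formula, this Python sketch pairs each ordered observation with the standard normal quantile at plotting position (i - 0.5)/n, using `statistics.NormalDist.inv_cdf` (Python 3.8+). It assumes the (i - 0.5)/n positions deduced above; `qq_points` is a hypothetical helper, not MATLAB's `qqplot`.

```python
from statistics import NormalDist


def qq_points(sample):
    """Pair each ordered observation with the standard normal quantile at
    plotting position (i - 0.5)/n, i.e. the 100(i - 0.5)/n-th percentile."""
    n = len(sample)
    ordered = sorted(sample)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))


for z, y in qq_points([0.3, -1.2, 0.8, 2.1, -0.4]):
    print(round(z, 3), y)
```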
This is what I got below:
[Figure: Plot (X1) and Plot (X2), QQ Plot of Sample Data versus Standard Normal (quantiles of input sample vs. standard normal quantiles); sample size = 10 each. Values shown: σ = 0.9457 and 1.078; μ = -0.2391 and -0.7778.]
This plot is not normal; it has an S-shape. There appears to be a large deviation from the straight line in the lower and upper tails (standard normal quantiles). The standard deviation and mean of the X1 data set are not 1 and 0, respectively (the parameters set above). Overall, it appears that the data may have a long (heavy) tail and be slightly bimodal. I can't really tell skewness from this.
This plot, like the X1 plot, is not normal; it has an S-shape. There appears to be a large deviation from the straight line in the lower and upper tails (standard normal quantiles). The standard deviation and mean of the X2 data set are not 1 and 0, respectively (the parameters set above). Overall, it appears that the data may have a long (heavy) tail and be slightly bimodal. I can't really tell skewness from this.
[Figure: Plot (X3) and Plot (X4), QQ Plot of Sample Data versus Standard Normal; sample size = 100 each. Values shown: σ = 1.091 and 0.9939; μ = -0.1199 and 0.09536.]
The X3 plot is somewhat normal; it has a fairly straight-line shape. There is still some deviation from the nominal standard deviation and mean, but it appears smaller than in the X1 and X2 plots. With more data points, the sample appears to approach a normal distribution.
The X4 plot is also somewhat normal, with a fairly straight-line shape. There is still some deviation from the nominal standard deviation and mean, but it appears smaller than in the X1 and X2 plots, and even than in X3, which also used 100 points. Again, with more data points the sample appears to approach a normal distribution.
[Figure: Plot (X5), QQ Plot of Sample Data versus Standard Normal; sample size = 1000. Values shown: σ = 1.009, μ = -0.004889.]
This plot is very close to normal; it follows a straight line, and its standard deviation and mean are very close to the nominal values. Compared with X1, X2, X3, and X4, this plot gives the best approximation to a normal distribution.
Overall, the more data points there are, the more closely the sample distribution resembles the parent distribution (established by the parameters set above). A simple explanation is that a small sample can contain a few extreme values that pull the sample mean and standard deviation away from their population values. The mean of 0 and standard deviation of 1 describe the entire population, so the larger the sample drawn from that population, the more representative the sample values become.
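The effect of sample size can be illustrated by drawing samples of 10, 100, and 1000 values from N(0, 1) and comparing their means and standard deviations with 0 and 1. In this Python sketch `random.gauss` stands in for random.org and the seeds are arbitrary; the last column shows the typical error of the sample mean, 1/sqrt(n), which shrinks as n grows.

```python
import random
import statistics


def sample_stats(n, seed):
    """Mean and sample standard deviation of n draws from N(0, 1)."""
    rng = random.Random(seed)  # assumption: stands in for random.org
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(sample), statistics.stdev(sample)


for n in (10, 100, 1000):
    mean, sd = sample_stats(n, seed=n)
    print(n, round(mean, 3), round(sd, 3), round(1 / n**0.5, 3))
```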
Question 3(b)
Step #1: For each of the sample data sets in the Project1.xlx file, transfer the values into MATLAB and
place them in an array. Call these arrays S1, S2, and S3, respectively.
Step #2: Generate Q-Q plots for S1, S2, and S3, respectively. Also compute the standard deviation and mean of each dataset so that they can be compared with the standard deviation and mean of the normal line fitted to the data. This is easily done with a few lines of code in MATLAB.
This is what I got below:
[Figure: Plot (S1) and Plot (S2), QQ Plot of Sample Data versus Standard Normal; sample size = 150 each.]
Plots S1 and S2 are clearly not normal; S1 has an S-shape and S2 a deep curve. Plot S1 appears to have somewhat short (light) tails, while Plot S2 appears to be positively skewed. Both deviate strongly from the straight line that a normal Q-Q plot would follow.
[Figure: Plot (S3), QQ Plot of Sample Data versus Standard Normal; sample size = 20.]
Plot S3, on the other hand, seems roughly normally distributed: apart from the first and last points, the data follow the straight line fairly well.
Question 3(c)
Only plot S3 seemed normal (if we disregard the first two data points and the last one), so I will use it for this explanation.
[Figure: Plot (S3), QQ Plot of Sample Data versus Standard Normal (sample size = 20), with the fitted red line and guide lines marking rise = (8.8 - 1.6) = 7.2 and run = (1 - (-1)) = 2.]
Yes, it is possible to use the Q-Q plot to determine the mean and standard deviation of a sample that is normally distributed. This works because the p-th quantile of a normal distribution with mean μ and standard deviation σ is μ + σz_p, where z_p is the corresponding standard normal quantile, so plotting the sample quantiles against z_p gives a straight line with intercept μ and slope σ. Because we assume the sample is normally distributed, it is theoretically represented by the straight red line in the graph above. Using this graph as a guide, this is how we do it. For the mean of the sample, read off the value of the line at the middle of the plot, where the standard normal quantile is 0. I provided dashed green lines and an arrow as a guide. At this middle point, reading from the y-axis, the sample mean appears to be roughly 5.1, so the mean of the sample is about 5.1. The standard deviation is the slope of the straight line, i.e. the “rise” of the y-values divided by the “run” of the x-values. I have provided dashed purple lines as guides. The rise is roughly 8.8 minus 1.6, while the run is 1 minus -1 (i.e. rise = 7.2 and run = 2), so the slope of the red line is 7.2 / 2 = 3.6. Hence the standard deviation of the sample is about 3.6. Again, all of this assumes the points are normally distributed.
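The graphical method above can be expressed numerically: fit a straight line to the Q-Q points, then read the intercept (the value at standard normal quantile 0) as the mean and the slope as the standard deviation. This Python sketch uses an ordinary least-squares line through (i - 0.5)/n plotting positions; that choice is an assumption for illustration and may differ from the reference line MATLAB's `qqplot` draws. `qq_line_estimates` is a hypothetical helper.

```python
from statistics import NormalDist


def qq_line_estimates(sample):
    """Fit y = intercept + slope * z by least squares to the normal Q-Q points;
    the intercept estimates the mean and the slope the standard deviation."""
    n = len(sample)
    y = sorted(sample)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    intercept = y_bar - slope * z_bar
    return intercept, slope


# The rise-over-run reading from the plot: slope = 7.2 / 2.
print(round((8.8 - 1.6) / (1 - (-1)), 2))  # 3.6
```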
How do these values compare to the theoretical values in MATLAB (i.e. those of the fitted normal line)?
σ = 3.606
μ = 5.125
These values are pretty close to what I computed. Note, however, that the actual standard deviation and mean of the S3 dataset were 4.535 and 4.826, respectively. As I said before, the data are only somewhat normal based on a visual check, hence the difference between the sample values and the theoretical values.
Question 3(d)
Step #1: Using “http://www.random.org/integers/”, generate 10 samples of 100 uniformly distributed integers on the interval [1, 10].
Step #2: For each dataset, transfer the values into MATLAB and place them in an array. Call these arrays U1,…,U10, respectively.
Step #3: Using MATLAB, take the element-wise average of U1 alone (which is just the U1 array), then of U1 and U2, then of U1,…,U6, and then of U1,…,U10. Call these new arrays A1U, A2U, A6U, and A10U, respectively.
Step #4: Generate Q-Q plots and histograms for A1U, A2U, A6U, and A10U, respectively. Also compute the standard deviation and mean of each dataset so that they can be compared with the theoretical standard deviation and mean, i.e. those of the normal line fitted to the data. This is easily done with a few lines of code in MATLAB.
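For comparison with the sample values, the central limit theorem predicts that averaging k independent columns keeps the mean at 5.5 and shrinks the standard deviation to σ/√k, where σ = sqrt((10^2 - 1)/12) ≈ 2.872 for integers uniform on {1, …, 10}. These population-based predictions are not the same as the fitted-line values labelled “Theoretical” in the results. This Python sketch assumes the random.org integers are uniform on {1, …, 10}; `random.randint`, the helper names, and the seed are illustrative stand-ins.

```python
import random
import statistics
from math import sqrt

# Assumption: random.org integers are uniform on {1, ..., 10}.
values = range(1, 11)
pop_mean = statistics.fmean(values)   # 5.5
pop_sd = statistics.pstdev(values)    # sqrt((10**2 - 1) / 12), about 2.872


def predicted_sd(k):
    """CLT-based SD of the element-wise average of k independent columns."""
    return pop_sd / sqrt(k)


def simulate_average(k, rows=100, seed=6):
    """Element-wise average of k columns of uniform integers, like A1U ... A10U."""
    rng = random.Random(seed)
    columns = [[rng.randint(1, 10) for _ in range(rows)] for _ in range(k)]
    return [sum(row) / k for row in zip(*columns)]


for k in (1, 2, 6, 10):
    avg = simulate_average(k)
    print(k, round(predicted_sd(k), 3), round(statistics.stdev(avg), 3))
```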
This is what I got below:
[Figure: Plot (A1U): histogram (left; frequency vs. x values) and Q-Q plot (right) for the A1U dataset, i.e. the average of 1 uniform distribution.]
Theoretical: σ = 3.182, μ = 5.25
Sample: σ = 2.846, μ = 5.33
This is clearly not normally distributed; it is also bimodal (see the histogram and the S-shape of the Q-Q plot). The theoretical and sample means and standard deviations are somewhat “off” from each other.
[Figure: Plot (A2U): histogram (left; frequency vs. x values) and Q-Q plot (right) for the A2U dataset, i.e. the average of 2 uniform distributions.]
Theoretical: σ = 2.121, μ = 5.5
Sample: σ = 2.036, μ = 5.515
This is somewhat normally distributed (see the histogram and the relatively straight line on the Q-Q plot). The theoretical and sample standard deviations are a little off, but the sample mean is very close to the theoretical value.
[Figure: Plot (A6U): histogram (left; frequency vs. x values) and Q-Q plot (right) for the A6U dataset, i.e. the average of 6 uniform distributions.]
Theoretical: σ = 1.061, μ = 5.583
Sample: σ = 1.188, μ = 5.3
This is close to normally distributed (see the histogram and the relatively straight line on the Q-Q plot). The theoretical and sample standard deviations and means are a little off.
[Figure: Plot (A10U): histogram (left; frequency vs. x values) and Q-Q plot (right) for the A10U dataset, i.e. the average of 10 uniform distributions.]
Theoretical: σ = 0.9899, μ = 5.6
Sample: σ = 0.9528, μ = 5.57
This is a bit closer to a normal distribution than the previous three graphs (see the histogram and the relatively straight line, which captures more data points on the Q-Q plot). The theoretical and sample standard deviations and means are very close to each other.
Question 3(e)
Step #1: Using MATLAB, generate 50 sets of random exponential values, each with a mean of 5 and an array size of 100×1 (i.e. 100 rows, 1 column). Name the arrays E1, E2, …, E50, respectively. As requested, here is an example of the code for one of the arrays:
E1 = exprnd(5, [100,1]);
Step #2: Using MATLAB, average progressively more of the arrays, i.e. E1+…+E10, E1+…+E20, and so on, and call these new arrays A10E, A20E, and so on, respectively. E.g. A6E = (E1+E2+E3+E4+E5+E6)/6.
Step #3: Generate Q-Q plots for A10E, A20E, and so on. This is easily done with a few lines of code in MATLAB, e.g. qqplot(A6E).
Answer: About 45 columns need to be averaged before the resulting distribution appears normal. This is more than in part 3(d) because the exponential distribution with a mean of 5 is strongly right-skewed: most of its values lie near zero, with a long tail to the right. By the central limit theorem, the average of independent columns approaches a normal distribution as more columns are averaged, and the skewness of the average of k independent, identically distributed columns is the parent skewness divided by √k. The exponential distribution has skewness 2, so its averages remain noticeably skewed until many columns have been averaged, about 45 in this case. The uniform distribution used in part 3(d) gives every value equal probability and is already symmetric (skewness 0), so its averages only need to develop a peak in the center, as in a normal distribution; hence fewer columns need to be averaged.
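The skewness argument can be checked by simulation. This Python sketch averages k columns of exponential values with mean 5 (using `random.expovariate(1 / 5)` in place of MATLAB's `exprnd(5, ...)`) and compares the sample skewness of the averages with the theoretical value 2/√k; the helper names, row count, and seed are illustrative assumptions.

```python
import random
from math import sqrt


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5 (population-style moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2**1.5


def average_of_exponential_columns(k, rows, mean=5.0, seed=7):
    """Element-wise average of k columns of exponential values (like A10E)."""
    rng = random.Random(seed)  # assumption: stands in for MATLAB's exprnd
    columns = [[rng.expovariate(1 / mean) for _ in range(rows)] for _ in range(k)]
    return [sum(row) / k for row in zip(*columns)]


for k in (1, 10, 45):
    avg = average_of_exponential_columns(k, rows=10_000)
    # Theory: the skewness of an average of k i.i.d. exponentials is 2 / sqrt(k).
    print(k, round(2 / sqrt(k), 3), round(skewness(avg), 3))
```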