Project 1 Lecture Notes - University of Arizona Math

Random Sampling
• In the real world, most random variables (r.v.'s) in practical
applications are continuous and have no general closed-form
formula for f_X(x) and F_X(x).
• We may approximate the density function by taking a random
sample with a large enough sample size, n, and plotting the
relative frequencies within the sample.
Random Sampling
• Examples:
  • Suppose you wanted to know more about the GPAs of students
  enrolled at the U of A.
  • Rather than look up every individual student, you can take a
  small sample of randomly selected students and use their GPAs
  to estimate the GPAs of the entire student body.
  • Taking a poll of registered voters for the presidential
  election.
Random Sampling
• The whole idea behind random sampling is to let a part
represent the whole.
Random Sampling
• Estimate distribution and properties of a random
variable by taking a random sample
• We assume that all observations are independent of
one another
• Collect a sample with n items (a random sample of
size n)
Random Sampling
• If we take a large enough sample, the histogram
approximates the distribution of the data
• This gives us an approximate p.m.f. graph for a finite r.v.
or an approximate p.d.f. graph for a continuous r.v.
• In addition, the max, min, & mean of a sample
should be similar to the max, min, & mean of the
variable
Random Sampling
• Suppose that X is the number of assembly line
stoppages that occur during an 8-hour shift in
our manufacturing plant.
• We could obtain a random sample of size 10 by
watching the line for 10 different shifts and
recording the number of stoppages during
each eight hour shift.
Random Sampling
• The table below shows the number of work
stoppages for various shifts:

Shift observed    Number of stoppages
 1                 2
 2                11
 3                 6
 4                 8
 5                 6
 6                 5
 7                10
 8                 4
 9                 8
10                 3
• A histogram plot of these stoppages gives us a
pictorial representation of how this chaotic
data behaves.
Random Sampling
• The relative frequency histogram plot is shown
below:
Random Sampling
• Our histogram plot of the relative frequencies
for the work stoppage example can be used to
approximate the p.m.f. for this situation
• Of course, if we increase the number of
observations of shifts, our p.m.f. will be more
accurate.
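The relative frequencies behind such a histogram can be sketched in a few lines of Python. This is only an illustration, not part of the original Excel workflow: the function name relative_frequencies is hypothetical, and the data are the 10 shift counts from the table above.

```python
from collections import Counter

# Stoppages observed in the 10 shifts listed in the table.
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

def relative_frequencies(sample):
    """Return {value: count / n} for each distinct value in the sample."""
    n = len(sample)
    counts = Counter(sample)
    return {value: counts[value] / n for value in sorted(counts)}

# Each relative frequency approximates P(X = value).
approx_pmf = relative_frequencies(stoppages)
for value, freq in approx_pmf.items():
    print(f"P(X = {value}) is approximately {freq:.1f}")
```

With only 10 observations the estimate is coarse: the values 6 and 8 each get 0.2, every other observed value gets 0.1, and unobserved values get no weight at all.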
Random Sampling
• From our example, we can also look at the average
number of work stoppages:
$$E(X) \approx \bar{x} = \frac{1}{10}(2 + 11 + 6 + 8 + 6 + 5 + 10 + 4 + 8 + 3) = \frac{63}{10} = 6.3$$
• The average we just found is sometimes called the
sample mean and can be found using:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Random Sampling
• Suppose that the assembly line discussed in
Example 1 runs 24 hours per day, with workers
in three shifts. The sheet Numbers in the Excel
file Stoppages.xls contains records of the
number of stoppages per shift for nine months
(819 shifts).
Random Sampling
• Computations in that sheet show that the
number of stoppages in the sample ranged
from 0 to 14, with a mean of 5.78.
• The sample in Stoppages.xls is much larger
than the one of size 10 that we considered in
the previous example.
• Hence, we would replace the earlier estimate
of 6.3 for E(X) with the new estimate of 5.78.
Random Sampling
• A histogram plot of the relative frequencies
also gives us a good estimate of the p.m.f.
[Figure: "Sample Data" relative frequency histogram; x-axis: Stoppages (0 to 14); y-axis: Relative Frequency (0.000 to 0.200)]
Random Sampling
• We can also use a large sample to approximate
the p.d.f. for a continuous random variable.
• The plant manager wants to better understand the
delays caused by stoppages of the assembly line.
• She is specifically interested in how long they last.
Random Sampling
• Let T be the length of time, in minutes, that a
randomly selected stoppage will last.
• The duration of each of the 4,734 stoppages
that occurred during the 819 shifts was
recorded.
• This provides a random sample of observations
of the continuous random variable T. The
times are shown in the sheet Times in the Excel
file Stoppages.xls.
Random Sampling
• The histogram of times is converted to relative
frequencies. We would like to treat this as a p.d.f.
• This means the total area must be 1!
• To do this the area of each rectangle of our
histogram must equal the relative frequency.
Random Sampling
• Because our bins already have width 2, we must adjust
the heights of the bars so that the area of each
rectangle equals the relative frequency of its bin.
• This is done by dividing each relative frequency
by the bin width.
• For example, the bar whose bin label is 7 has
a height of 0.07; the area of this bar is 2 * 0.07 =
0.14, which is the relative frequency for this bin.
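The height adjustment can be sketched in Python as follows. The function density_heights and the sample durations are hypothetical (they are not the Stoppages.xls data); the point is only that dividing each relative frequency by the bin width makes the total area equal 1.

```python
def density_heights(sample, start, bin_width, num_bins):
    """Return bar heights so that height * bin_width = relative frequency."""
    counts = [0] * num_bins
    for t in sample:
        index = int((t - start) // bin_width)
        if 0 <= index < num_bins:
            counts[index] += 1
    n = len(sample)
    # relative frequency = count / n; height = relative frequency / bin width
    return [count / n / bin_width for count in counts]

# Hypothetical stoppage durations in minutes, for illustration only.
times = [0.5, 1.2, 1.9, 2.4, 3.1, 3.3, 4.8, 5.0, 6.7, 7.9]
bin_width = 2

heights = density_heights(times, start=0, bin_width=bin_width, num_bins=4)
total_area = sum(h * bin_width for h in heights)
print(heights)
print(total_area)
```

The same rule reproduces the slide's example: a relative frequency of 0.14 in a bin of width 2 gives a bar of height 0.14 / 2 = 0.07.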
Random Sampling
• An example of a histogram from the Excel file
Stoppages.xls is shown below
[Figure: "TIMES" histogram; x-axis: t (bin labels 1 to 79); y-axis: Approx. f_T(t) (0.00 to 0.08)]
Random Sampling
• We can create an approximate p.d.f. by connecting
the midpoints of the bins at the top of each bar
• Although the actual p.d.f. is smooth, this graph
gives a good representation
[Figure: "TIMES" histogram with the bar-top midpoints connected; x-axis: t (bin labels 1 to 79); y-axis: Approx. f_T(t) (0.00 to 0.08)]
Random Sampling
• Simulations can be used as a predictor
• They can give an estimate of what might happen over many
trials
• The estimate is generally accurate when the number of trials is large
Random Sampling
• Simulation for finite choices
• Use the RANDBETWEEN function in Excel
• Chooses a random integer between two values, inclusive
• Ex. =RANDBETWEEN(1,5) would return a value equal
to 1, 2, 3, 4, or 5
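For readers working outside Excel, a rough Python analogue of RANDBETWEEN is random.randint, which also includes both endpoints. The wrapper name randbetween below is just an illustrative label.

```python
import random

def randbetween(low, high):
    """Return a random integer n with low <= n <= high (both ends included)."""
    return random.randint(low, high)

# Draw many values; every one should be 1, 2, 3, 4, or 5.
draws = [randbetween(1, 5) for _ in range(1000)]
print(sorted(set(draws)))
```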
Random Sampling
• Sample of RANDBETWEEN function
Random Sampling
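• The VLOOKUP function will find information in a table:
  =VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)
  • lookup_value: value to find in the leftmost column
  • table_array: location of the table (database)
  • col_index_num: number of the column to return (1, 2, 3, …)
  • range_lookup: usually left blank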
Random Sampling
• Focus on the Project:
• In the Excel file SampleData.xls on my website, we find
that there are 615 arrival values for the 9 a.m. hour.
This is a random sample of the random variable A.
Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
615               0.00           0.48        2.75           2.75
Random Sampling
• Focus on the Project:
• In order to graph the p.m.f. and p.d.f. the heights of
the bars must be adjusted
• Take relative frequency (percentage) and divide by the
bin width to give the new height
• This ensures the area under the graph will equal 1
Random Sampling
• Focus on the Project:
• p.m.f. approximating p.d.f. for variable A
[Figure: "PMF approximating PDF"; x-axis: Arrival Times (9 a.m.), 0 to 2.8; y-axis: Height, 0 to 2]
Random Sampling
• Focus on the Project:
• Approximate p.d.f. and actual p.d.f. for variable A
[Figure: "Approximate PDF and Actual PDF"; x-axis: Arrival Times (9 a.m.), 0 to 3; y-axis: Height, 0 to 2.5]
Random Sampling
• Focus on the Project:
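• Note (since $\mu_A \approx 0.48$):
$$f_A(a) = \begin{cases} 0 & \text{for } a < 0 \\ \dfrac{1}{0.48}\, e^{-a/0.48} & \text{for } a \ge 0 \end{cases}$$
$$F_A(a) = \begin{cases} 0 & \text{for } a < 0 \\ 1 - e^{-a/0.48} & \text{for } a \ge 0 \end{cases}$$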
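The exponential p.d.f. and c.d.f. for A can be written as small Python functions. This is a sketch that assumes the exponential model with mean 0.48 taken from the 9 a.m. sample; the names exp_pdf, exp_cdf, and MEAN_A are illustrative.

```python
import math

def exp_pdf(a, mean):
    """Exponential p.d.f.: 0 for a < 0, (1/mean) * e^(-a/mean) for a >= 0."""
    return 0.0 if a < 0 else math.exp(-a / mean) / mean

def exp_cdf(a, mean):
    """Exponential c.d.f.: 0 for a < 0, 1 - e^(-a/mean) for a >= 0."""
    return 0.0 if a < 0 else 1.0 - math.exp(-a / mean)

MEAN_A = 0.48  # sample mean of the 9 a.m. arrival data

# Estimated probability that the arrival value is at most 1.
p_within_one = exp_cdf(1.0, MEAN_A)
print(round(p_within_one, 4))
```

Passing a different mean (for example, the 2.20 reported for the 9 p.m. hour) gives the corresponding functions for another variable.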

Random Sampling
• Focus on the Project:
• We find that there are 130 arrival values for the 9
p.m. hour. This is a random sample of the random
variable B.
Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
130               0.00           2.20        9.22           9.22
Random Sampling
• Focus on the Project:
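• Note (since $\mu_B \approx 2.20$), we get the following for 9 p.m.:
$$f_B(b) = \begin{cases} 0 & \text{for } b < 0 \\ \dfrac{1}{2.20}\, e^{-b/2.20} & \text{for } b \ge 0 \end{cases}$$
$$F_B(b) = \begin{cases} 0 & \text{for } b < 0 \\ 1 - e^{-b/2.20} & \text{for } b \ge 0 \end{cases}$$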
Random Sampling
• Focus on the Project:
• For the service times, we get
Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
8356              0.50           1.17        7.72           7.22
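The summary values in these tables (count, minimum, mean, maximum, range) can be reproduced with a short Python helper. The function summarize and the four sample times are hypothetical; the actual service time data live in the course's Excel files.

```python
def summarize(times):
    """Return count, min, mean, max, and range (max - min) of a sample."""
    low, high = min(times), max(times)
    return {
        "count": len(times),
        "min": low,
        "mean": sum(times) / len(times),
        "max": high,
        "range": high - low,
    }

# Hypothetical service times, for illustration only.
service_times = [0.5, 1.0, 1.5, 3.0]
summary = summarize(service_times)
print(summary)
```

Note that the range equals the maximum only when the minimum is 0, which is why the service time table shows a range of 7.22 (7.72 - 0.50) rather than 7.72.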
Random Sampling
• Focus on the Project:
• For the service times, we get the following graph
[Figure: "Service Time PMF approximating PDF"; x-axis: Service Times, 0 to 8; y-axis: Height, 0 to 1.2]
Random Sampling
• Focus on the Project:
• You may not change the name of the Excel file
Queue Focus.xls (remember that you MUST
download the updated Queue Focus.xls file)
• You may not delete or insert any rows, columns,
or cells
Random Sampling
• Focus on the Project:
• Copy and paste your service time data from your
team data into the Excel file Queue Focus.xls on the
sheet Data in cells H45:H???
• The range will end at the row of your last service time
• You should also extend the numbering in column G to cover
every row of service times
Random Sampling
• Focus on the Project:
• Note: The sample data contains 8356 service time records
• This is the formula that exists in cell E83 (and continues to
cell E282) in Queue Focus.xls in the sheet Random Sampling:
=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,7634),Data!$G$45:Data!$H$7678,2),"")
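The idea behind the VLOOKUP(RANDBETWEEN(...)) combination is to pick a random row number and return the service time stored in that row. A minimal Python sketch of that idea follows; the function draw_service_time and the small table are hypothetical stand-ins for columns G and H of the Data sheet.

```python
import random

def draw_service_time(table):
    """Pick a random index from 1 to n and return the time stored with it."""
    n = len(table)
    index = random.randint(1, n)   # plays the role of RANDBETWEEN(1, n)
    lookup = dict(table)           # maps index (column 1) to time (column 2)
    return lookup[index]           # plays the role of VLOOKUP(index, table, 2)

# Hypothetical (index, service time) rows, for illustration only.
table = [(1, 0.75), (2, 1.10), (3, 0.50), (4, 2.30), (5, 1.45)]

# Simulate service times for 200 customers by resampling the recorded data.
simulated = [draw_service_time(table) for _ in range(200)]
print(simulated[:5])
```

Because every recorded time is equally likely to be drawn, the simulated values follow the same relative frequencies as the sample itself.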
Random Sampling
• Focus on the Project:
• The formula should be modified as follows:
Original:
=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,7634),Data!$G$45:Data!$H$7678,2),"")
Modified:
=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,your last service time),Data!$G$45:Data!$H$your last service time cell,2),"")
Random Sampling
• Focus on the Project:
• Change the value in cell D80 (number of customers)
of Queue Focus.xls in the sheet Random Sampling
to the number in your team’s Excel file in cell F18
(number of customers simulation must
accommodate)
Random Sampling
• Focus on the Project: (What to do)
• Perform all steps discussed in this “focus on the
project” section (DO NOT rename the Queue
Focus.xls file)
• Find formulas for f_A, F_A, f_B, and F_B