LINKÖPINGS TEKNISKA HÖGSKOLA
Department of Mathematics (Matematiska Institutionen)
Mathematical Statistics (Matematisk Statistik)
TAMS11: Introduction to Probability and Statistics
Computer Laboratory 1
Attendance at the computer laboratories is compulsory. To pass the laboratory, you need to complete
all the exercises and show your results to the tutor. The work should be carried out in groups of two
or three. The aim is to give experience in analysing data using the package Minitab, particularly
in situations where it is not convenient to make the computations by hand. The computer package
also gives graphical illustrations. Note: bring the formula collection, course notes and a calculator to the
exercises. The NT-Windows version of Minitab will be used. First log in to Windows, then start
the latest version of Minitab from the Start menu.
• Useful Information about Minitab: Minitab works with data stored in columns C1, C2, ...,
constants K1, K2, ... and matrices M1, M2, .... The contents of the columns can be seen in the
data window, which also has a row where the columns may be named.
• The Command Language: To use the command language, one first clicks in the session window
and then selects Editor - Enable Commands. The prompt
MTB >
then appears in the session window, and one may type in commands. A command is executed by
pressing the Return key.
• The Menu:
File contains the commands for opening, saving, printing, etc.
Edit has the usual editor functions for cut, paste, etc.
Calc contains different mathematical functions.
Stat contains various methods of statistical analysis.
Graph gives various types of diagrams and plots.
—————————————————————————————————————————————
Exercise 1: Simulation of Rolling Dice
We simulate the scores obtained by rolling a six-sided die. Enter the following commands:
set c1
1:6
end
set c2
6(1)
end
let c2=c2/6
Now go to Calc - Random Data - Discrete and generate 600 rolls of the die in c3, with Values in:
c1 and Probabilities in: c2. Then enter the command
table c3
In the session window, you should see the frequencies f1, ..., f6 of the scores 1, ..., 6.
To compute the sample average x̄ for c3, one enters the command
mean c3
To compute the sample standard deviation s for c3 one enters the command
stdev c3
How do these compare with the theoretical values µ and σ?
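For comparison with the Minitab output, the same experiment can be sketched in Python. This is only an illustration that assumes Python 3's standard random and statistics modules. The helper name roll_die is hypothetical and the seed is arbitrary. The theoretical values follow from µ = (1 + 2 + ... + 6)/6 and σ² = E(X²) − µ².

```python
import random
import statistics
from collections import Counter

def roll_die(n_rolls, seed=None):
    """Simulate n_rolls of a fair six-sided die (compare Calc - Random Data - Discrete)."""
    rng = random.Random(seed)
    return rng.choices(range(1, 7), weights=[1 / 6] * 6, k=n_rolls)

rolls = roll_die(600, seed=1)
freq = Counter(rolls)               # frequencies f1, ..., f6, like "table c3"
xbar = statistics.mean(rolls)       # sample average, like "mean c3"
s = statistics.stdev(rolls)         # sample standard deviation, like "stdev c3"

# Theoretical values for a fair die: mu = E(X), sigma^2 = E(X^2) - mu^2
faces = range(1, 7)
mu = sum(faces) / 6
sigma = (sum(x * x for x in faces) / 6 - mu ** 2) ** 0.5

for face in faces:
    print(f"f{face} = {freq[face]}")
print(f"xbar = {xbar:.3f}, s = {s:.3f}, mu = {mu}, sigma = {sigma:.3f}")
```

For a fair die, µ = 3.5 and σ = √(35/12) ≈ 1.71. The values of x̄ and s from 600 rolls should therefore usually lie close to these numbers.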
Exercise 2: Approximation of an Integral by Simulation
Integrals that cannot be computed by hand may be computed approximately using numerical methods. One of these is the Monte Carlo method, which uses random variables.
Suppose we want to find |A|, the area of a region A that is contained wholly within the rectangle
(a, b) × (0, c). It is easy to generate random points that are uniformly distributed within
this rectangle. If A is the region over which we want to integrate, then a point lands in A
with probability $\frac{|A|}{(b-a)c}$.
To generate a uniformly distributed point within the rectangle, one chooses the x-coordinate at random
according to a uniform distribution on the interval (a, b) and, independently, the y-coordinate
according to a uniform distribution on the interval (0, c).
Suppose we choose n points, and let $f_n$ denote the number of them that land in the region A. Then, by
the law of large numbers,
$$\frac{f_n}{n} \approx \frac{|A|}{(b-a)\,c},$$
where |A| denotes the area of the region A. It follows that
$$|A| \approx (b-a)\,c\,\frac{f_n}{n}.$$
As n increases, the right-hand side gives a better approximation of |A|.
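The estimate |A| ≈ (b − a)c·fn/n can be sketched in Python as follows. The function name hit_or_miss_area and its arguments are hypothetical and were chosen for this illustration only. The region A is described by a test function inside(x, y) that returns True for points in A. The example uses the triangle below the line y = x in the unit square, whose area is exactly 0.5.

```python
import random

def hit_or_miss_area(inside, a, b, c, n, seed=None):
    """Estimate |A| by (b - a) * c * f_n / n.

    inside(x, y) must return True when the point (x, y) lies in A,
    and A must lie wholly within the rectangle (a, b) x (0, c).
    """
    rng = random.Random(seed)
    f_n = 0
    for _ in range(n):
        x = rng.uniform(a, b)   # x-coordinate uniform on (a, b)
        y = rng.uniform(0, c)   # y-coordinate uniform on (0, c), independent of x
        if inside(x, y):
            f_n += 1            # count the points that land in A
    return (b - a) * c * f_n / n

# Example: A = {(x, y): 0 < y < x} in the unit square, so |A| = 0.5
print(hit_or_miss_area(lambda x, y: y < x, 0.0, 1.0, 1.0, 100_000, seed=0))
```

The fraction fn/n has a standard error of order 1/√n. This is why the approximation improves only slowly as n grows.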
Random techniques for numerical integration are known as Monte Carlo methods. Our basic Monte
Carlo method is now applied to the computation of the integral
$$I = \int_0^{0.5} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$
where $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ is the density function of the N(0, 1) distribution.
Viewing this as an area, as described above, set a = 0 and b = 0.5. Find a value for c so that you have
a rectangle which contains the area under the Gaussian curve between x = 0 and x = 0.5.
First, let n = 20 and go to Calc - Random Data - Uniform.... Generate 20 observations
(x-coordinates) from a uniform distribution in column c1, with Lower endpoint = 0 and Upper
endpoint = 0.5. Repeat for the y-coordinates and put them into c2, with Lower endpoint = 0 and, as
Upper endpoint, the value of c you calculated, so that the rectangle contains the whole area under the Gaussian curve.
Now write in the session window:
let c3=(1/sqrt(2*3.14))*expo(-c1**2/2)
let c4=c2<c3
sum c4
This gives $f_{20}$, the number of points that land in the region corresponding to the integral.
What is the resulting approximate value for I?
Repeat for n = 100 and then for n = 1000.
For X ∼ N(0, 1), the value of I = P(0 < X < 0.5) can also be found from tables of the standard normal
distribution. What is its value according to the tables?
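Below is a Python sketch of the whole of Exercise 2. It assumes only the standard library, and the helper names phi and estimate_I are hypothetical. The density is decreasing on (0, 0.5), so the sketch takes c equal to the density at x = 0. This is one valid choice of height for the rectangle. Instead of using tables, the sketch computes the reference value with math.erf, using P(0 < X < x) = ½ erf(x/√2).

```python
import math
import random

def phi(x):
    """Density of the N(0, 1) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def estimate_I(n, seed=None):
    """Hit-or-miss estimate of I = integral of phi over (0, 0.5) using n points."""
    a, b = 0.0, 0.5
    # phi is decreasing on (0, 0.5), so c = phi(0) gives a rectangle
    # (a, b) x (0, c) that contains the whole area under the curve.
    c = phi(0.0)
    rng = random.Random(seed)
    f_n = 0
    for _ in range(n):
        x = rng.uniform(a, b)     # plays the role of column c1
        y = rng.uniform(0.0, c)   # plays the role of column c2
        if y < phi(x):            # like "let c4=c2<c3"
            f_n += 1
    return (b - a) * c * f_n / n

# Reference value: P(0 < X < 0.5) = erf(0.5 / sqrt(2)) / 2
exact = 0.5 * math.erf(0.5 / math.sqrt(2))

for n in (20, 100, 1000):
    print(f"n = {n:4d}: I is approximately {estimate_I(n, seed=n):.4f}")
print(f"reference value: {exact:.4f}")
```

With n = 20 the estimate can only take the values (b − a)c·k/20 for whole numbers k, which is one reason the small-n results are coarse.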
Exercise 3: The Central Limit Theorem
Here we simulate the average of n independent random variables from two continuous distributions:
the exponential distribution and the uniform distribution.
For the exponential, let µ = 4 (that is, λ = 1/4).
For the uniform, set $a = 4 - 4\sqrt{3} \approx -2.93$ and $b = 4 + 4\sqrt{3} \approx 10.93$. With these choices, the two
distributions should have the same expected value and the same variance. Check this!
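The check asked for above follows from the standard formulas E(X) = 1/λ and V(X) = 1/λ² for the exponential distribution, and E(Y) = (a + b)/2 and V(Y) = (b − a)²/12 for the uniform distribution on (a, b):

```latex
\begin{aligned}
X \sim \mathrm{Exp}(\lambda = 1/4):&\quad E(X) = \frac{1}{\lambda} = 4, \qquad V(X) = \frac{1}{\lambda^2} = 16,\\
Y \sim U(a, b),\ a = 4 - 4\sqrt{3},\ b = 4 + 4\sqrt{3}:&\quad E(Y) = \frac{a + b}{2} = 4, \qquad
V(Y) = \frac{(b - a)^2}{12} = \frac{(8\sqrt{3})^2}{12} = \frac{192}{12} = 16.
\end{aligned}
```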
By the central limit theorem, when n is large we can approximate the distribution of the sum (and hence of the average) of n independent, identically distributed random variables by a normal distribution. We now check how well this approximation
works by simulating n observations from each of these distributions.
First, let n = 5. Go to Calc - Random Data - Exponential... and generate 1000 observations
in each of the columns c1-c5, choosing mean = 4.
Each row now contains five observations from the exponential distribution with µ = 4. Now go to
Calc - Row Statistics, choose the alternative Mean with Input variables c1-c5, and store the
results in c6. (The average of each row is put into c6.)
To see what the distribution of the 1000 averages looks like, go to Stat - Basic Statistics - Display Descriptive Statistics, enter c6, then go to Graphs... and choose
Histogram of data, with normal curve. A normal distribution curve is added to the
histogram that appears. Also look at the values in the column to the right. The skewness is a measure
of the asymmetry of the histogram; the normal distribution has skewness 0.
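For reference, the block below gives one common definition of the sample skewness of m observations, the adjusted Fisher-Pearson coefficient. We believe this is the version Minitab reports, but that is an assumption. The block also gives the theoretical skewness of the average of n independent observations. For the exponential case this is a standard result: a sum of n independent exponential variables has a gamma distribution. The exponential averages should therefore be positively skewed, with the skewness shrinking as n grows. The uniform averages are symmetric for every n.

```latex
\begin{aligned}
G_1 &= \frac{m}{(m-1)(m-2)} \sum_{i=1}^{m} \left( \frac{x_i - \bar{x}}{s} \right)^3, \\
\operatorname{skew}(\bar{X}_n) &= \frac{2}{\sqrt{n}} \quad (\text{exponential: } n = 5 \Rightarrow 0.894,\ n = 50 \Rightarrow 0.283), \\
\operatorname{skew}(\bar{X}_n) &= 0 \quad (\text{uniform, any } n).
\end{aligned}
```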
Now go to Calc - Random Data - Uniform... and generate 1000 observations from the
U(−2.93, 10.93) distribution, with Lower endpoint = -2.93 and Upper endpoint = 10.93. Repeat what
you did for the exponential distribution, using the observations from the uniform distribution.
Repeat the procedure, with the necessary modifications, for the exponential and uniform distributions,
when n = 50.
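For comparison, the whole of Exercise 3 can be sketched in Python with the standard library. The function names row_means and skewness are hypothetical. row_means mimics Calc - Row Statistics. skewness uses the adjusted Fisher-Pearson formula, which may differ slightly from the definition a particular Minitab version reports. Note that random.expovariate takes the rate λ = 1/4, not the mean.

```python
import math
import random
import statistics

def row_means(draw, n, rows, rng):
    """Like Calc - Row Statistics (Mean): average n draws in each of `rows` rows."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(rows)]

def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness of the values in xs."""
    m = len(xs)
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return m / ((m - 1) * (m - 2)) * sum(((x - xbar) / s) ** 3 for x in xs)

A, B = 4 - 4 * math.sqrt(3), 4 + 4 * math.sqrt(3)
DISTRIBUTIONS = {
    "exponential": lambda rng: rng.expovariate(1 / 4),   # mean 4, rate lambda = 1/4
    "uniform": lambda rng: rng.uniform(A, B),            # U(-2.93, 10.93)
}

rng = random.Random(2024)
for name, draw in DISTRIBUTIONS.items():
    for n in (5, 50):
        means = row_means(draw, n, 1000, rng)
        print(f"{name:11s} n={n:2d}: mean={statistics.fmean(means):.3f} "
              f"sd={statistics.stdev(means):.3f} (theory {4 / math.sqrt(n):.3f}) "
              f"skewness={skewness(means):.3f}")
```

If the approximation works, the averages should have mean near 4 and standard deviation near 4/√n. The uniform averages should be close to symmetric already at n = 5. The exponential averages stay noticeably right-skewed at n = 5 and become more symmetric by n = 50.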
What are your conclusions, i.e. how good is the central limit theorem approximation in each case?
—————————————————————————————————————————————
Control Sheet
Names:
Exercise 1
f1 = . . .
f2 = . . .
f3 = . . .
f4 = . . .
f5 = . . .
f6 = . . .
average value x̄ = . . .
expectation E(X) = . . .
OK . . . . . .
Exercise 2
Figure:
Approximations of I:
n = 20 : . . .
n = 100 : . . .
n = 1000 : . . .
I = P(0 < X < 0.5) = . . . (from the tables)
OK . . . . . .
Exercise 3
Show your results to the tutor.
OK . . . . . .