
LINKÖPINGS TEKNISKA HÖGSKOLA
Matematiska Institutionen
Matematisk Statistik

TAMS11: Introduction to Probability and Statistics
Computer Laboratory 1

Attendance at the computer laboratories is compulsory. To pass the laboratory, you need to complete all the exercises and show your results to the tutor. The work should be carried out in groups of two or three.

The aim is to give experience in analysing data using the package Minitab, particularly in situations where it is not convenient to make the computations by hand. The computer package also gives graphical illustrations.

Note! Bring the formula collection, course notes and a calculator to the exercises.

The NT-Windows version of Minitab will be used. FIRST go into WINDOWS. Then choose the latest version of MINITAB from the ‘Start’ menu.

• Useful Information about Minitab: Minitab works with data stored in columns C1, C2, ..., constants K1, K2, ..., and matrices M1, M2, ... . The contents of the columns can be seen in the data window. There is also a row in which the columns can be named.

• The Command Language: To use the command language, first click on the session window and then choose Editor - Enable Commands. The prompt MTB > then appears in the session window and commands can be typed in. A command is executed by pressing the Return key.

• The Menu:
File contains the commands for opening, saving, printing, etc.
Edit has the usual editor functions for cut, paste, etc.
Calc contains different mathematical functions.
Stat contains various methods of statistical analysis.
Graph gives various types of diagrams and plots.

----------------------------------------------------------------------

Exercise 1: Simulation of Rolling Dice

We simulate the scores obtained by rolling a six-sided die. Enter the following commands:

    set c1
    1:6
    end
    set c2
    6(1)
    end
    let c2=c2/6

Column c1 now holds the possible scores 1, ..., 6 and column c2 holds the probability 1/6 for each score.

Now go to Calc - Random Data - Discrete and generate 600 rolls of the die in c3, with Values in: c1 and Probabilities in: c2.
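The sampling step above can also be sketched outside Minitab as a cross-check. The following is a minimal Python sketch using only the standard library; it is not part of the lab, and the function name `simulate_rolls` and the seed are hypothetical choices made for illustration.

```python
import random


def simulate_rolls(n_rolls=600, seed=None):
    """Draw n_rolls scores from a fair six-sided die."""
    rng = random.Random(seed)
    values = [1, 2, 3, 4, 5, 6]      # plays the role of column c1
    probabilities = [1 / 6] * 6      # plays the role of column c2
    # Sample with replacement according to the given probabilities,
    # as Calc - Random Data - Discrete is described as doing for c3.
    return rng.choices(values, weights=probabilities, k=n_rolls)


rolls = simulate_rolls(600, seed=1)  # the seed is an arbitrary assumption
print(rolls[:10])
```

Fixing the seed only makes the run repeatable; a different seed gives a different set of 600 rolls.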
Enter the command

    table c3

In the session window, you should see the frequencies f1, ..., f6 for the scores 1, ..., 6.

To compute the sample average x̄ for c3, enter the command

    mean c3

To compute the sample standard deviation s for c3, enter the command

    stdev c3

How do these compare with the theoretical values µ and σ?

Exercise 2: Approximation of an Integral by Simulation

Integrals that cannot be computed by hand may be computed approximately using numerical methods. One of these is the Monte Carlo method, which uses random variables.

Suppose we want to find |A|, the area of a region A that is contained wholly within a rectangle (a, b) × (0, c). It is easy to generate randomly chosen points that are distributed uniformly within the rectangle. If A is the region over which we want to integrate, then a point lands in A with probability

$$\frac{|A|}{(b-a)\,c}.$$

To generate a uniform distribution within the rectangle, choose the x-coordinate according to a uniform distribution on the interval (a, b) and the y-coordinate according to a uniform distribution on the interval (0, c).

Suppose we choose n points. Let $f_n$ denote the number that have landed in the region A. Then, by the law of large numbers,

$$\frac{f_n}{n} \approx \frac{|A|}{(b-a)\,c},$$

where |A| denotes the area of the region A. It follows that

$$|A| \approx (b-a)\,c\,\frac{f_n}{n}.$$

As n increases, the right-hand side gives a better approximation of |A|.

Random techniques for numerical integration are known as Monte Carlo methods. Our basic Monte Carlo method is now applied to the computation of the integral

$$I = \int_0^{0.5} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$

where $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ is the density function of an N(0, 1) distribution. Viewing this as an area, as described above, set a = 0 and b = 0.5. Find a value for c so that you have a rectangle which contains the area under the Gaussian curve between x = 0 and x = 0.5.

First, let n = 20 and go to Calc - Random Data - Uniform....
Make 20 observations (x-coordinates) from a uniform distribution in column c1. Let Lower endpoint = 0 and Upper endpoint = 0.5. Repeat for the y-coordinates and put them into c2, where the upper endpoint is the value you calculated, so that the rectangle contains the whole area under the Gaussian curve.

Now write in the session window:

    let c3=(1/sqrt(2*3.14))*expo(-c1**2/2)
    let c4=c2<c3
    sum c4

Column c4 contains 1 for each point that lies below the curve and 0 otherwise, so its sum is the number of points $f_{20}$ which land in the area corresponding to the integral. What is the resulting approximate value for I? Repeat for n = 100 and then for n = 1000.

For X ∼ N(0, 1), the value of I = P(0 < X < 0.5) may be found by looking up tables. What is its value, using the probability tables?

Exercise 3: The Central Limit Theorem

Here we simulate the average of n independent random variables from two continuous distributions: the exponential distribution and the uniform distribution. For the exponential, let µ = 4 (that is, λ = 1/4). For the uniform, set $a = 4 - 4\sqrt{3} \approx -2.93$ and $b = 4 + 4\sqrt{3} \approx 10.93$. This ought to imply that the expected values and variances are the same for each distribution. Check this!

By the central limit theorem, we can approximate the distribution of the sum (and hence of the average) of n independent identically distributed random variables by a normal distribution. We now check whether this approximation really works, by simulating n observations from each of these distributions.

First, let n = 5. Go to Calc - Random Data - Exponential... and create 1000 observations in each of the columns c1-c5. Choose mean = 4. Each row then contains five observations from the exponential distribution with µ = 4.

Now go to Calc - Row Statistics and choose the alternative mean for c1-c5 (Input variables) and Store Results in c6. (The average value for each row is put into c6.)

To see how the distribution of the 1000 average values appears graphically, go to Stat - Basic Statistics - Display Descriptive Statistics, write in c6, go in under Graphs...
and choose Histogram of data, with normal curve. A normal distribution curve has been added to the histogram that appears. Also observe the values in the column to the right. The skewness is a measure of the asymmetry of the histogram. The normal distribution has a skewness of 0.

Now go to Calc - Random Data - Uniform... and make 1000 observations from a U(−2.93, 10.93) distribution. Let Lower endpoint = -2.93 and Upper endpoint = 10.93. Repeat what you did for the exponential distribution, using the observations from the uniform distribution.

Repeat the procedure, with the necessary modifications, for the exponential and uniform distributions when n = 50. What are your conclusions (i.e. how good is the central limit theorem approximation)?

----------------------------------------------------------------------

Control Sheet

Names:

Exercise 1
f1 = . . .    f2 = . . .    f3 = . . .    average value x̄ = . . .
f4 = . . .    f5 = . . .    f6 = . . .    expectation E(X) = . . .
OK . . . . . .

Exercise 2
Figure:
Approximations of I:    n = 20 : . . .    n = 100 : . . .    n = 1000 : . . .
I = P(0 < X < 0.5) = . . . (from the tables)
OK . . . . . .

Exercise 3
Show your results to the tutor.
OK . . . . . .
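As a cross-check on Exercise 2, the hit-or-miss estimate of I can be sketched in Python with the standard library. This is only a sketch under stated assumptions, not the lab's own procedure: the function names are hypothetical, the rectangle height c = 0.4 is an assumed choice (any value at least as large as the maximum of the density on (0, 0.5) would do), and `math.pi` is used where the Minitab commands use 3.14.

```python
import math
import random


def normal_density(x):
    """Density of the N(0, 1) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def hit_or_miss(n, a=0.0, b=0.5, c=0.4, seed=None):
    """Estimate the area under normal_density on (a, b) from n random points.

    c is the assumed height of the bounding rectangle (a, b) x (0, c).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)        # x-coordinate, like column c1
        y = rng.uniform(0.0, c)      # y-coordinate, like column c2
        if y < normal_density(x):    # point lies under the curve, like c4
            hits += 1
    # |A| is approximately (b - a) * c * f_n / n
    return (b - a) * c * hits / n


for n in (20, 100, 1000):
    print(n, hit_or_miss(n, seed=n))  # seeds are arbitrary assumptions
```

The estimates for small n vary noticeably from run to run, which is the point of repeating the exercise for n = 20, 100 and 1000.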

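The row-average experiment of Exercise 3 can likewise be sketched in Python. The helper names `row_means` and `skewness` are hypothetical, the seed is arbitrary, and the skewness below is the simple moment-based definition, which may differ slightly from the value Minitab reports.

```python
import random
import statistics


def row_means(draw, n, rows=1000):
    """Return `rows` averages, each over n independent calls to draw()."""
    return [statistics.fmean([draw() for _ in range(n)]) for _ in range(rows)]


def skewness(data):
    """Moment-based skewness: the mean of the cubed standardised values."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)


rng = random.Random(0)  # arbitrary seed, assumed for repeatability
for n in (5, 50):
    # Exponential with mean 4 corresponds to rate lambda = 1/4.
    exp_means = row_means(lambda: rng.expovariate(1 / 4), n)
    uni_means = row_means(lambda: rng.uniform(-2.93, 10.93), n)
    print(n,
          round(statistics.fmean(exp_means), 2), round(skewness(exp_means), 2),
          round(statistics.fmean(uni_means), 2), round(skewness(uni_means), 2))
```

Comparing the skewness of the averages for n = 5 and n = 50 gives a numerical counterpart to the histograms with the fitted normal curve.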