STAT 10020 Minitab - Lab 2

Part 1 - Probability

1. Consider the sample space when two dice are thrown together. One die is green and the other is red. There are 36 outcomes: (1, 1), (1, 2) ... (1, 6), (2, 1) ... (6, 6). Enter these sample points into a Minitab worksheet. Let C1 represent the number on the green die and let C2 represent the number on the red die. Make sure that each of the 36 possible green-red pairs appears exactly once in your worksheet. We are interested in the sum of the numbers on the two dice, so make a new column (C3) called Result containing the sum of C1 and C2 (remember we saw how to do this in Lab 1).

2. Now we want to get the probability of each result in C3 occurring. Use code like this:

MTB > TALLY C3;
SUBC > COUNTS;
SUBC > PERCENTS.

Here, we are asking Minitab to display the number of times each value is seen in C3. We also ask it to display the percentage of the time each value is seen. Notice the syntax we use. A semicolon at the end of a line tells Minitab not to process the command immediately but instead to let us enter a subcommand. The full stop at the end of the third line tells Minitab that we have finished entering subcommands and it is time to process them.

Look at the Session window. To get the probability of a particular result (4, for example), divide the number of times it occurs (the number in the Counts column) by the number of points in the sample space. How does this probability relate to the percentage column?

Fill in the probability of each result in the following table:

Result    Probability
2
3
4
5
6
7
8
9
10
11
12

3. Suppose we want to roll two dice 50 times and see what pattern the results follow. We could roll them by hand and record the result each time, but Minitab gives us an easier way to do it. Picking a number at random from the Result column is really the same as throwing a pair of dice and recording the result.
So we could write each number in the Result column on a piece of paper, put all 36 pieces into a hat, and pull one out. Or we could just ask Minitab to choose a number from the Result column at random, repeat that 50 times, and put all 50 numbers into a new column.

Click on Calc – Random Data – Sample from Columns. You want to simulate throwing two dice and recording the sum of the two. In the dialogue box, enter the number of throws you want in the sample (50) and the column to sample from. (Which column is the sample space for such an experiment?) Tick the box to sample with replacement, and enter the next available column (in this case, C4) to store the results. You should see 50 numbers appear in C4. Why did we choose to sample with replacement?

Now we're going to compare the percentage of times each result occurred with the probability of each result occurring. Type in the Session window:

MTB > TALLY C3 C4;
SUBC > PERCENTS.

Compare the two columns. Are the percentages shown similar for both?

Repeat the simulation, but take a sample of 5,000 (store the results in C5). This is equivalent to rolling a pair of dice 5,000 times. Tally the results in this new column and compare them with the results in C3. What do you notice? Is there a difference between the pattern of numbers in C4 and C5?

4. Now consider an experiment where two coins are tossed. We are interested in the number of heads: 0, 1 or 2. In a new worksheet or in new columns of your current worksheet, enter the sample space for this experiment, with one column representing coin 1 and another representing coin 2. Use the techniques you have just learned to find the probability of the following outcomes:

i. Exactly 1 head:
ii. At least 1 head:

Use Minitab to simulate 10 tosses of 2 coins. What percentage of the time does exactly 1 head occur? What percentage of the time does at least 1 head occur? Do these agree with the probabilities you calculated above? Repeat the simulation, but simulate 100 and then 1,000 tosses. How do the percentages of each outcome occurring agree with the probabilities now? Why is this?
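For anyone who wants to check the Part 1 reasoning outside Minitab, here is a minimal Python sketch. It is an illustration only and not part of the Minitab lab. The helper names (tally, exact_probabilities, sample_with_replacement) are hypothetical and are not Minitab commands. The sketch builds the dice and coin sample spaces, computes each probability as its count divided by the number of sample points, and mimics Calc – Random Data – Sample from Columns with replacement.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction


def tally(values):
    """Return {value: (count, percent)}, similar in spirit to TALLY with COUNTS and PERCENTS."""
    counts = Counter(values)
    n = len(values)
    return {v: (counts[v], 100 * counts[v] / n) for v in sorted(counts)}


def exact_probabilities(column):
    """Probability of each value = its count / number of points in the sample space."""
    return {v: Fraction(c, len(column)) for v, (c, _) in tally(column).items()}


def sample_with_replacement(column, size, seed=None):
    """Draw `size` values at random from `column`, putting each one back after it is drawn."""
    rng = random.Random(seed)
    return rng.choices(column, k=size)


# Two dice: one row per (green, red) pair, like columns C1 and C2 in the worksheet.
dice_space = list(itertools.product(range(1, 7), repeat=2))
result = [green + red for green, red in dice_space]  # the Result column (C3)

# Two coins: one row per (coin 1, coin 2) pair; record the number of heads in each row.
# Pass `heads` to the same functions to study the coin experiment.
coin_space = list(itertools.product("HT", repeat=2))
heads = [pair.count("H") for pair in coin_space]

if __name__ == "__main__":
    exact = exact_probabilities(result)
    for size in (50, 5000):
        simulated = tally(sample_with_replacement(result, size))
        print(f"--- {size} simulated throws ---")
        for value, prob in exact.items():
            # A value may never come up in a small sample, so default to 0%.
            pct = simulated.get(value, (0, 0.0))[1]
            print(f"{value:>2}: exact {float(prob):.3f}, simulated {pct / 100:.3f}")
```

The draws are random, so your simulated numbers will change from run to run unless you pass a fixed `seed`. Comparing the 50-throw and 5,000-throw output side by side mirrors the comparison of C4 and C5 above.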
Part 2 – Data Analysis

1. Go to the class page (www.ucd.ie/statdept/classpages/stat_10020.html) and download the data set called Pulse.mtw. Save it in the Minitab folder on your H drive. (Remember, this is the folder where you saved your work from Lab 1.)

2. In Minitab, click on File – Open Worksheet to open the worksheet. You will see data in several columns from an experiment to study pulse rate. In this experiment, volunteers recorded their resting pulse rate (shown in the column Pulse1). They were then randomised into two groups: one group ran for one minute and the other sat still for one minute (recorded in the column Ran). They then took their pulses again (recorded in Pulse2). Other data, including height, weight, sex and some background information, are also recorded.

3. We want a histogram of the variable Pulse1. Click on Graph – Histogram and choose Simple Histogram. In the dialogue box that you see, double-click on Pulse1.

4. Now draw a dotplot of the same data (click on Graph – Dotplot). From the two graphs you've drawn, can you locate the lowest and highest values of resting pulse rate?

5. Look at the Ran column. It contains 1 or 2, depending on whether the subject ran or not. We're going to try to decide which value means that the subject ran by looking at the second pulse measurement. Go to Graph – Dotplot and choose With Groups. In the dialogue box, choose the variable you want to graph (Pulse2) and the variable you want to group by (Ran). Which group do you think ran for one minute?

6. Next we are going to compare descriptive statistics for the two groups. There are two ways of doing this. The first is to separate the data in the Pulse2 column according to the number in the Ran column. Click on Data - Unstack Columns. In the dialogue box, enter the variable you want to separate (Pulse2) and the variable that determines how to separate it (Ran). Now you should have two new columns containing the values from Pulse2.
Use Stat - Basic Statistics - Display Descriptive Statistics and select Pulse2_1 and Pulse2_2 as the variables. Compare the means and medians. Does this agree with the impression you got from the dotplot earlier?

7. There is a simpler way to do this. Go to Stat - Basic Statistics - Display Descriptive Statistics again. This time, select Pulse2 as the variable, check the box marked "By variable" and put Ran in this box. Click OK, and you should see the descriptive statistics displayed separately for those who did and did not run. Are they the same as your results from Step 6? (They should be!)

8. Now we'll compare the heights of male and female volunteers. You can use the method described in either Step 6 or Step 7, whichever makes more sense to you. How does the mean height of males compare with that of females?

9. Save your work. Click on File - Save Project. Remember to save your work in your home directory, which is shown as your student number or by the letter H. If you save it anywhere else, you won't be able to access it again.

ASSIGNMENT

Submit this assignment at the beginning of your next lab, 2 weeks from today. Assignments must be submitted before class. Any that are completed during class will be considered late and you will not receive credit for them. Include appropriate output from Minitab to support your answers. Remember to put your name and student number, as well as the lab time and room you attend, on your assignment.

1. Consider an experiment where counters numbered 1 to 5 are placed in each of two bags. One counter is drawn from each bag and then replaced. In a new Minitab worksheet, produce the sample space for the experiment. (Hint: you should have 25 outcomes in total.)
a) What is the probability of drawing any given number from the first bag?
b) Suppose we want to add the numbers on the two counters we drew. What code would you use in Minitab to make a third column containing the sum of the numbers on the two counters?
Call this column "Total".
c) What is the probability of getting a total of 3?
d) What is the probability of getting a total of 4 or less?
e) What is the probability of getting a total of 5 or more?

2. Simulate 50 runs of this experiment.
a) What percentage of the time was the result 3?
b) What percentage of the time was the result 4 or less?
c) What percentage of the time was the result 5 or more?
Simulate 10,000 runs of this experiment.
d) What percentage of the time was the result 3?
e) What percentage of the time was the result 4 or less?
f) What percentage of the time was the result 5 or more?
g) Why are the percentages closer to the true probabilities now?

3. Using the data from the simulation of 50 trials, draw a dotplot. Do you expect it to be left or right skewed or symmetric? Is it in fact left or right skewed, or is it symmetric?

REVISION SUMMARY

After this lab you should be able to:
- Tally a column (using both counts and percentages)
- Use the subcommand function
- Take a random sample from a column
- Generate graphs and summary statistics
- Unstack data into new columns.
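The unstack-then-summarise idea from Part 2 (Steps 6 and 7) comes down to grouping the values in one column by the labels in another. Below is a minimal Python sketch that assumes plain lists in place of worksheet columns. The helper names unstack and describe_by are hypothetical and are not Minitab commands. The numbers are made up for illustration and are not taken from Pulse.mtw.

```python
from statistics import mean, median


def unstack(values, groups):
    """Split `values` into one list per distinct group label (the idea behind Data - Unstack Columns)."""
    out = {}
    for value, group in zip(values, groups):
        out.setdefault(group, []).append(value)
    return out


def describe_by(values, groups):
    """N, mean and median of `values` for each group (like Display Descriptive Statistics with a By variable)."""
    return {
        g: {"N": len(v), "Mean": mean(v), "Median": median(v)}
        for g, v in sorted(unstack(values, groups).items())
    }


# Hypothetical toy numbers for illustration only, NOT taken from Pulse.mtw.
measurement = [5, 3, 8, 6, 2, 7]
group_label = [1, 2, 1, 2, 1, 2]

if __name__ == "__main__":
    for group, stats in describe_by(measurement, group_label).items():
        print(group, stats)
```

Because describe_by simply unstacks first and then summarises each group, both routes give the same statistics, which is the point of the check at the end of Step 7.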