MATH1002 Introduction to Statistics - Minitab Workshop 2

1. TV viewing data.

A study was carried out to investigate TV viewing patterns, in which a random sample of 120 people of different ages, genders and educational backgrounds were interviewed. The resulting data are stored in the Minitab worksheet tvhrs.mtw in the folder \\uk-ac-man-ss7\vol3\shared\eps\Maths\MATH10002. This is the shared data on the CLIP image. The variables in columns C1-C9 are:

Column   Name      Description
C1-T     ID        Subject identification number
C2       AgeGrp    Age group of subject: 1 = < 18 years old, 2 = 18-30 years old, 3 = > 30 years old
C3       Age       Age of subject
C4       Gender    1 = male, 2 = female
C5       Sesame    Watched Sesame Street or not: 1 = no, 2 = yes
C6       HrsTv     Hours spent watching TV in a week
C7       HrsMTV    Hours spent watching MTV
C8       HrsNews   Hours spent watching news
C9       Educa     Educational background (7 categories)

Open the worksheet using File > Open Worksheet. When it is open, click on the blue bar at the top of the session window. Then click on Editor > Enable Commands so that the Minitab prompt “MTB >” appears in the session window. When you now use the drop-down menus to perform an action, the commands that you have implicitly used will appear in the session window. You can also type your own commands after the prompt.

We will first compare males and females in terms of how many hours they spend watching TV in a week. Use Stat > Basic Statistics > Display Descriptive Statistics and select HrsTv by Gender. Choose the statistics and graphs you wish to use for comparative purposes. For the statistics, choose the mean, the standard deviation and those needed for a five-number summary (minimum, first quartile, median, third quartile and maximum). For the graphs, select only Boxplot of data. Use these results to comment on the differences between males and females in terms of the time they spend watching TV in a week and the distributions of these times.
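As a cross-check on what Display Descriptive Statistics reports, the same grouped summary can be sketched outside Minitab. The Python below is only an illustration under stated assumptions: the (HrsTv, Gender) values are made up and are not taken from tvhrs.mtw, and the function names are hypothetical. Quartile conventions differ between packages, so the quartiles computed here may not agree exactly with Minitab's.

```python
import statistics

# Hypothetical (HrsTv, Gender) pairs standing in for columns C6 and C4.
# These values are invented for illustration; they are NOT from tvhrs.mtw.
rows = [
    (12.0, 1), (20.5, 1), (8.0, 1), (15.0, 1), (30.0, 1), (10.0, 1),
    (14.0, 2), (9.5, 2), (22.0, 2), (11.0, 2), (16.0, 2), (13.0, 2),
]


def summarise(values):
    """Return the mean, sample standard deviation and five-number summary."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


def summarise_by_group(pairs):
    """Split the values by group code, then summarise each group."""
    groups = {}
    for value, group in pairs:
        groups.setdefault(group, []).append(value)
    return {group: summarise(vals) for group, vals in sorted(groups.items())}


summary = summarise_by_group(rows)
for gender, stats in summary.items():
    print(gender, stats)
```

Replacing the Gender codes with the AgeGrp codes gives the three-group comparison in the same way.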
Use the boxplots to comment further on the main features of the male and female distributions (including whether they appear to be skewed or symmetric) and the differences between them.

We now want to produce separate histograms with a Normal fit and separate Normal probability plots to examine the Normality of the male data and of the female data. For this purpose it is most convenient to extract the data in C6 into two columns: one for males and one for females. To do this use Data > Copy > Columns to Columns. Choose C6 to copy from and C11 in the current worksheet to copy to. To select males first, click on “subset data”. Highlight “specify which rows to include” and “rows that match”. Click on “condition” and type “Gender = 1” as the condition to select males. Click on OK in each dialogue box to perform the action. Then repeat this process to copy the female (Gender = 2) data into C12.

Now produce histograms and Normal probability plots for the males and the females. Comment on the main features you see in these plots. Do they indicate that the male data are a random sample from a Normal distribution? Comment on whether the female data appear to be Normal.

It is also sensible to analyze the data in C6 on the hours spent watching TV separately for the three age groups in C2 (AgeGrp). To do so, carry out the above steps again, but this time compare the data for the three age groups using summary statistics and graphics. You should ignore sex in this part of the analysis. Comment in detail on your results.

2. Simulating the distribution of the sample mean.

In this part we will simulate 100 random samples from an Exponential distribution and look at the distribution of the sample means for different sample sizes. You should see the central limit theorem take effect as the sample size n increases. Open a new project, or at least a new worksheet, for this part using File > New. As above, enable commands in the session window.
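For reference, the result being illustrated can be stated compactly. The notation below (μ for the mean of the Exponential distribution, σ for its standard deviation, and X̄ₙ for the mean of a sample of size n) is not used in the handout and is introduced here as an assumption. The statement relies on the standard fact that an Exponential distribution's standard deviation equals its mean.

```latex
\[
X_1,\dots,X_n \overset{\text{iid}}{\sim} \text{Exponential with mean } \mu
\quad\Longrightarrow\quad
\sigma = \operatorname{sd}(X_i) = \mu ,
\]
\[
\operatorname{E}(\bar{X}_n) = \mu , \qquad
\operatorname{sd}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}} = \frac{\mu}{\sqrt{n}} , \qquad
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\mu^{2}}{n}\right)
\ \text{for large } n .
\]
```

The first two identities hold exactly for every n; only the Normal approximation depends on n being large.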
First, we want to create 100 random samples, each of size n = 30, from the Exponential distribution with mean equal to 10.0. To do this use Calc > Random Data > Exponential. We want to generate 30 rows of data in columns C1-C100, and the scale parameter is to be 10.0.

It is more convenient here if each sample is placed along one row of the worksheet instead of down a column. To do this, copy the data in C1-C100 into a 30x100 matrix, transpose the matrix and then copy the transposed matrix back into the columns of the worksheet; each sample will then be on a separate row. The commands to do this are:

  Data > Copy > Columns to Matrix. Copy from columns C1-C100 into matrix M1.
  Calc > Matrices > Transpose. Transpose M1 and store the result in M2.
  Data > Copy > Matrix to Columns. Copy from M2 and store the data in columns C111-C140.
  Delete columns C1-C100 by typing ERASE C1-C100 at the prompt in the session window.

We will use columns C111-C115 only for samples of size 5, columns C111-C120 for samples of size 10 and columns C111-C140 for samples of size 30.

Use Calc > Row Statistics and choose the mean as the statistic to be calculated. For samples of size 5, use C111-C115 as the input variables and store the result in C151. It will contain 100 numbers, each being the mean of a sample of size 5 from the Exponential distribution whose mean equals 10. Investigate the distribution of the means in C151 using summary statistics, a histogram with a Normal pdf superimposed and a Normal probability plot. Do the sample means appear to be Normally distributed? Also, compare the mean and standard deviation of the means in C151 with what you would expect theoretically.

Now repeat the calculation of row statistics for samples of size 10 using C111-C120, storing the resulting sample means in C152, and investigate their Normality. Finally, repeat for samples of size 30 using C111-C140, storing the sample means in C153.
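The simulation described above can also be sketched in a few lines of Python, which may help in checking that the Minitab output looks plausible. This is a minimal sketch rather than a replacement for the Minitab steps: the seed is an arbitrary assumption made for reproducibility, the variable names are hypothetical, and the random numbers will not match those generated by Minitab.

```python
import math
import random
import statistics

MEAN = 10.0        # mean (scale parameter) of the Exponential distribution
N_SAMPLES = 100    # number of simulated samples, one per worksheet row
MAX_SIZE = 30      # largest sample size, matching columns C111-C140
SIZES = (5, 10, 30)

rng = random.Random(2010)  # arbitrary seed (an assumption, for reproducibility)

# Each inner list plays the role of one worksheet row holding 30 values.
# expovariate takes the rate, which is 1 / mean.
samples = [[rng.expovariate(1.0 / MEAN) for _ in range(MAX_SIZE)]
           for _ in range(N_SAMPLES)]

results = {}
for n in SIZES:
    # Row means over the first n values, like Calc > Row Statistics
    # applied to the first n of the columns C111-C140.
    means = [statistics.mean(row[:n]) for row in samples]
    results[n] = {
        "means": means,
        "mean_of_means": statistics.mean(means),
        "sd_of_means": statistics.stdev(means),
        "theoretical_sd": MEAN / math.sqrt(n),
    }

for n, r in results.items():
    print(f"n={n:2d}  mean of means={r['mean_of_means']:.2f}  "
          f"sd of means={r['sd_of_means']:.2f}  "
          f"theoretical sd={r['theoretical_sd']:.2f}")
```

As in the worksheet layout, the samples of size 5 and 10 reuse the first values of each sample of size 30, so the three sets of means are not independent of one another.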
Again, examine whether the sample means are Normally distributed. How are the shape, centre and spread of each histogram related to the sample size n? What conclusions do you draw from these comparisons?

Assessment:

A written report on part of this workshop forms the first assessed coursework for this part of the module MATH1002. Specifically, you first need to write a report on your analysis of the hours spent watching TV for the three age groups in the dataset. You should not include any discussion of your analysis of the differences between the sexes. Secondly, you need to report on the results of your simulations from the Exponential distribution. Both reports should be submitted as one piece of work. It should include the relevant numerical and graphical output from Minitab as well as your comments and conclusions on the results, and may be prepared using Microsoft Word or written by hand. If you choose to write it by hand, please ensure that it is clear and readable. The report should be all your own work. It should be handed in at the Student Support Office in Room G204 of the Alan Turing Building by Friday 26th March 2010.