MATH1002 Introduction to Statistics - Minitab
Workshop 2
1. TV viewing data. A study was carried out to investigate TV viewing patterns in
which a random sample of 120 people of different ages, genders and educational
backgrounds were interviewed. The resulting data are stored in the Minitab worksheet
tvhrs.mtw in the folder \\uk-ac-man-ss7\vol3\shared\eps\Maths\MATH10002. This is
the shared data on the CLIP image. The variables in columns C1-C9 are:
Column   Name      Description
C1-T     ID        Subject identification number
C2       AgeGrp    Age group of subject: 1 = < 18 years old, 2 = 18-30 years old, 3 = > 30 years old
C3       Age       Age of subject
C4       Gender    1 = male, 2 = female
C5       Sesame    Watched Sesame Street or not: 1 = no, 2 = yes
C6       HrsTv     Hours spent watching TV in a week
C7       HrsMTV    Hours spent watching MTV
C8       HrsNews   Hours spent watching news
C9       Educa     Educational background (7 categories)
Open the worksheet using File > Open Worksheet. Once it is open, click on the blue
bar at the top of the session window. Then click on Editor > Enable Commands so that
the Minitab prompt “MTB >” appears in the session window. When you now
use the drop-down menus to perform some action, you will see the commands that you
have implicitly used appear in the session window. You can even type your own
commands after the prompt.
We will first look at comparing males and females in terms of how many hours they
spend watching TV in a week. Use Stat > Basic Statistics > Display Descriptive
Statistics and select HrsTv by gender. Choose the particular statistics and graphs you
wish to use for comparative purposes. For the statistics you need to choose the mean,
standard deviation and those needed for a five-number summary (minimum, first quartile,
median, third quartile and maximum). For the graphs, just select
Boxplot of data. Use these results to comment on the differences between males and
females in terms of the times they spend watching TV in a week and the distributions of
these times.
Use the boxplots to comment further on the main features of the male and
female distributions (including whether they appear to be skewed or symmetric) and the
differences between them.
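For anyone who wants to cross-check the Minitab output by hand, the grouped summary can be sketched in Python. This is only an illustration and not part of the workshop: the hours and gender values are made up (they are not the tvhrs.mtw data) and summarise is a hypothetical helper. The standard library's default "exclusive" quartile method interpolates at position p(n + 1), which should agree with the quartiles Minitab reports, but that is worth confirming on the real data.

```python
import statistics

# Made-up illustration data: weekly TV hours and gender codes (1 = male, 2 = female).
hours = [12, 20, 7, 15, 30, 9, 18, 25, 11, 14]
gender = [1, 2, 1, 1, 2, 2, 1, 2, 1, 2]


def summarise(values):
    """Return the mean, standard deviation and five-number summary of values."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Split the hours by gender code and summarise each group separately.
by_gender = {}
for code in sorted(set(gender)):
    group = [h for h, g in zip(hours, gender) if g == code]
    by_gender[code] = summarise(group)

for code, stats in by_gender.items():
    print(code, stats)
```

The same loop works for the three age groups if the gender codes are replaced by the AgeGrp codes.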
We now want to compute separate histograms with a Normal fit and separate Normal
probability plots to examine the Normality of the male data and the female data. With
this aim, it will be most convenient if we extract the data in C6 into two columns – one
for males and one for females. To do this use Data > Copy > Columns to Columns.
Then choose C6 to copy from and C11 in the current worksheet to copy to. To select
males first, click on “subset data”. Highlight “specify which rows to include” and “rows
that match”. Click on “condition” and type “Gender = 1” as the condition to select males.
Click on OK in each dialogue box to perform the action. You then need to repeat this
process to copy the female (gender=2) data into C12.
Now produce histograms and normal probability plots for the males and the females.
Comment on the main features you see in these plots. Do they indicate that the male
data are a random sample from a Normal distribution? Comment on whether the female
data appear to be Normal.
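To see what a Normal probability plot is built from, the following Python sketch pairs the ordered data with Normal scores; points lying close to a straight line suggest Normality. It is an illustration only. The plotting positions (i - 0.3)/(n + 0.4) are an assumption chosen for the sketch, and Minitab's own default may differ. The sample values are made up, and normal_scores and correlation are hypothetical helpers.

```python
from statistics import NormalDist


def normal_scores(n):
    """Standard Normal quantiles at assumed plotting positions (i - 0.3)/(n + 0.4)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation coefficient, written out in full."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up weekly TV hours for one group (not the tvhrs.mtw data).
sample = [7, 9, 11, 12, 14, 15, 18, 20, 25, 30]
ordered = sorted(sample)
scores = normal_scores(len(ordered))

# Each (score, value) pair is one point on the probability plot.
points = list(zip(scores, ordered))
r = correlation(scores, ordered)
print(f"correlation between ordered data and Normal scores: {r:.3f}")
```

A correlation close to 1 corresponds to a nearly straight plot, but the plot itself should still be inspected for curvature, which indicates skewness.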
It is also sensible to analyze the data in C6 on the hours spent watching TV separately
for the three age groups in C2 (AgeGrp). Carry out the above steps again, but this time
compare the data for the three age groups using summary
statistics and graphics. You should ignore gender in this part of the analysis. Comment in
detail on your results.
2. Simulating the distribution of the sample mean. In this part we will simulate 100
random samples from an Exponential distribution and look at the distribution of the
means for different sample sizes. You should see the central limit theorem take effect as
the sample size n increases. You should open a new project or at least a new worksheet
for this part using File > New. As above, enable commands in the session window.
Firstly then, we want to create 100 random samples, of size n=30 each, from the
Exponential distribution with mean equal to 10.0. To do this use Calc > Random
Data > Exponential. We want to generate 30 rows of data in columns C1-C100 and the scale
parameter is to be 10.0.
It is more convenient here if the samples are placed each along one row of the worksheet
instead of down the columns. To do this you need to copy the data in C1-C100 into a
30x100 matrix, transpose the matrix and then copy the transposed matrix back into the
columns of the worksheet and then each sample will be on a separate row. The
commands to do this are:
Data > Copy > Columns to matrix. Copy from columns C1-C100 into matrix M1.
Calc > Matrices > Transpose. Transpose M1 and store the result in M2.
Data > Copy > Matrix to columns. Copy from M2 and store the data in columns C111-C140.
Delete columns C1-C100 by typing ERASE C1-C100 at the prompt in the session
window.
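The effect of the copy, transpose and copy-back steps can be sketched in Python on a tiny example. This is an illustration only, not Minitab code: three made-up columns of four values stand in for the 100 columns of 30 values, and the variable names are hypothetical.

```python
# Three made-up "columns", each holding one sample of four values
# (standing in for C1-C100, each holding one sample of 30 values).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# zip(*columns) gathers the j-th entry of every sample into one group,
# which is the transpose: new column j holds observation j of every sample.
new_columns = [list(col) for col in zip(*columns)]

# Reading across row k of the new columns recovers the original sample k.
row_0 = [col[0] for col in new_columns]

print(new_columns)
print(row_0)
```

After the transpose there are as many new columns as there were observations per sample (30 in the workshop), and each worksheet row holds one complete sample.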
We will use columns C111-C115 only for samples of size 5, columns C111-C120 for
samples of size 10 and columns C111-C140 for samples of size 30.
Calc > Row Statistics. Choose the mean as the statistic to be calculated. For samples
of size 5 use C111-C115 as input variables and store the result in C151. It will contain
100 numbers, each corresponding to the mean of a sample of size 5 from the Exponential
distribution whose mean equals 10.
Investigate the distribution of the means in C151 using summary statistics, a histogram
with a Normal pdf superimposed and a Normal probability plot. Do the sample means
appear to be Normally distributed? Also, compare the mean and standard deviation of
the means in C151 with what you would expect theoretically.
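As a reminder of the theoretical values for that comparison (the notation here is ours, not the handout's): an Exponential distribution with mean 10 also has standard deviation 10, so the mean of n independent observations has the following mean and standard deviation.

```latex
\begin{align*}
E(\bar{X}) &= \mu = 10, \\
\operatorname{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{n}}, \\
n = 5:&\quad \frac{10}{\sqrt{5}} \approx 4.47, \qquad
n = 10:\quad \frac{10}{\sqrt{10}} \approx 3.16, \qquad
n = 30:\quad \frac{10}{\sqrt{30}} \approx 1.83.
\end{align*}
```

With only 100 simulated means, the observed mean and standard deviation will be close to these values but will not match them exactly.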
Now repeat the calculation of row statistics for samples of size 10 using C111-C120 and
investigate the Normality of the ensuing sample means which you can store in C152 this
time.
Finally, repeat for samples of size 30 using C111-C140 storing the sample means in
C153. Again, examine whether the sample means are Normally distributed.
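The whole simulation can also be sketched outside Minitab. The Python version below is an illustration only: the seed is arbitrary, the variable names are hypothetical, and the numbers it produces will differ from those in your Minitab worksheet because the random draws are different.

```python
import random
import statistics

random.seed(1002)  # arbitrary seed so that the sketch is reproducible

MEAN = 10.0       # mean of the Exponential distribution
N_SAMPLES = 100   # number of samples (worksheet rows)
MAX_SIZE = 30     # largest sample size (columns C111-C140)

# Each inner list is one sample of size 30, i.e. one worksheet row.
# random.expovariate takes the rate (1 / mean), not the mean itself.
samples = [
    [random.expovariate(1.0 / MEAN) for _ in range(MAX_SIZE)]
    for _ in range(N_SAMPLES)
]

results = {}
for n in (5, 10, 30):
    # Use only the first n observations in each row, as with C111-C115 and so on.
    means = [statistics.mean(row[:n]) for row in samples]
    results[n] = {
        "mean_of_means": statistics.mean(means),
        "sd_of_means": statistics.stdev(means),
        "theoretical_sd": MEAN / n ** 0.5,
    }

for n, summary in results.items():
    print(n, {key: round(value, 3) for key, value in summary.items()})
```

Note that, as in the workshop, the samples of size 5 and 10 reuse the first observations of the samples of size 30, so the three sets of means are not independent of one another.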
How are the shape, centre and spread of each histogram related to the sample size n? What
conclusions do you draw from these comparisons?
Assessment: A written report on part of this workshop forms the first assessed
coursework for this part of the module MATH1002. More specifically, firstly you need
to write a report on your analysis of the hours spent watching TV for the three age groups
in the dataset. You should not include any discussion of your analysis of the differences
between the sexes. Secondly, you need to report on the results of your simulations from
the Exponential distribution.
Both reports should be submitted as one whole piece of work. It should include the
relevant numerical and graphical output from Minitab as well as your comments and
conclusions on the results, and may be prepared using Microsoft Word or written by hand.
If you choose to write it by hand, please ensure that it is clear and
readable. The report should be all your own work.
It should be handed in at the Student Support Office in Room G204 of the Alan Turing
Building by Friday 26th March 2010.