Minitab Demonstrations
by Bruce E. Trumbo
Department of Statistics
CSU Hayward
Part 1 — IQ Scores
Setup
In this demonstration you will use the Minitab worksheet MTBDEM.MTW. You need to open this
worksheet from disk so that it is ready for use within Minitab. Start Minitab. In the menus at the top of the
screen use FILE ➯ Open Worksheet; then select the file name from its disk location. (Notice that there are
things other than worksheets that can be opened.) This worksheet contains the data for Parts 1-5 of these
notes. Ordinarily, the data for each part would be put into its own separate worksheet, but we put them all
together here to minimize file handling in these introductory sessions.
If you do not see the Minitab prompt MTB > in the Session window when you start Minitab, then do
the following. Click anywhere on the Session window so that it is "active" as indicated by a colored bar at
the top. Then in the menus use EDITOR ➯ Enable Commands. (Be sure to use EDITOR, not EDIT.)
The Data
In this part you will use the following columns. Their exact contents will be explained as we go along.
c12 Origl IQ
c13 Final IQ
The data for Part 1 are IQ scores of 250 high school students in the San Francisco Bay Area, collected for a master's thesis in Educational Psychology at CSU Hayward.
Exploration of the Data
We use several graphical and numerical methods to explore the IQ data.
Dotplots
The dotplot is one of the simplest graphical devices. Each observation is represented by a dot
appropriately placed along a horizontal axis. If several observations have the same (or nearly the same)
value, they are stacked vertically.
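To make the stacking idea concrete, here is a minimal Python sketch of a character dotplot. It is not Minitab's algorithm: the function name `char_dotplot`, the plot width, and the scores are all assumptions made for illustration, and every value is assumed to lie between `lo` and `hi`.

```python
from collections import Counter

def char_dotplot(values, lo, hi, width=60):
    """Return a text dotplot: one '.' per observation, stacked above its column."""
    # Map each value to a column index between 0 and width - 1.
    scale = (width - 1) / (hi - lo)
    counts = Counter(round((v - lo) * scale) for v in values)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        row = "".join("." if counts.get(col, 0) >= level else " "
                      for col in range(width))
        rows.append(row.rstrip())
    rows.append("-" * width)  # horizontal axis
    return "\n".join(rows)

# Hypothetical scores, not the worksheet data.
scores = [88, 95, 95, 100, 100, 100, 104, 112, 150]
print(char_dotplot(scores, lo=60, hi=160))
```

Values that round to the same column are stacked, which is why nearly equal observations share a stack.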
In Minitab you might make a dotplot in either of two ways:
• You may type the command DOTPlot, followed by the column identifier (here 'Origl IQ'
  or c12). Use the apostrophe (on most keyboards near the RETURN key) for both beginning and
  ending single quotes; in Minitab, never use the left-single-quote (on most keyboards at the left of
  the top row of keys, near the 1). Minitab does not distinguish between capital and small letters in
  commands. We capitalize the first four letters here to signify that they are the only ones required.
  (If a command name has more than four letters, you need to type only the first four letters. But you
  may type the entire command name if you like.)
• Alternatively, you may follow the menu path GRAPH ➯ Character ➯ Dotplot, and then select
  c12 (Origl IQ). In these notes the menu path for Windows is shown at the beginning of each
  display on a gray background, followed by the corresponding command in courier typeface.
Copyright © 1993, 1996, 2003 by Bruce E. Trumbo. All rights reserved. Intended for use in the Statistics Department at California State University, Hayward. Please request permission for other uses.
Notes on commands and menus: Minitab is command-based software. The menus are just a way
to generate commands. For the most basic procedures, if you know the command name, it is
probably easier to type it after the program-generated prompt MTB >.
However, if you don't know the name of the command you need for a procedure, the menus are
laid out logically enough that they may help you to find the procedure you want. Also, for
complex procedures (especially graphical ones with many possibilities for annotations and
options) it is often easier to use menus.
GRAPH ➯ Character ➯ Dotplot, select 'Origl IQ'
MTB > dotp 'Origl IQ'
.
:
::
: .
:: : : .:
:: : : ::
:::::.::::
:::::::::::
: :::::::::::
:.::::::::::::: :
::::::::::::::: :
:: :::::::::::::::::
..:.::::::::::::::::::::.:.
.
.
-----+---------+---------+---------+---------+---------+-Origl IQ
    60        90        120       150       180       210
From this dotplot of the data, we see that most of the IQ scores lie between 70 and 130, with a few outside
this interval on both sides. However, the striking thing is the extreme IQ score of almost 200. From what
we know about IQ scores we suppose this is an error.
Boxplots
The boxplot of a dataset is based on the "five-number summary" of the observations. From smallest to
largest these five numbers are:
• The minimum
• The lower quartile (lower end of box)
• The median (symbol within box)
• The upper quartile (upper end of box), and
• The maximum.
Notice that the "middle half" of the observations fall within the box of the boxplot.
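As a rough illustration, the five-number summary can be computed in a few lines of Python. The scores below are hypothetical, and the quartile rule used here (positions based on n + 1, via `method="exclusive"`) is an assumption; quartile conventions differ among textbooks and software, so results for small samples may not match other tools exactly.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="exclusive" places the cut points at (n + 1)/4, (n + 1)/2, 3(n + 1)/4.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical scores, not the worksheet data.
scores = [72, 85, 90, 96, 100, 104, 112, 118, 131]
summary = five_number_summary(scores)
print(summary)  # (72, 87.5, 100.0, 115.0, 131)
```

In this small sample five of the nine scores (90 through 112) lie between the two quartiles, which is the "middle half" that the box encloses.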
An outlier is a value that falls relatively far away from the rest of the values in a dataset. Minitab can
make two styles of boxplot—using character or standard graphics and using pixel or professional
graphics. We begin with character graphics. A Minitab character graphics boxplot signals "probable"
outliers with the symbol O and "possible" ones with *.
GRAPH ➯ Character ➯ Boxplot, select 'Origl IQ'
MTB > GSTD
MTB > boxp 'Origl IQ'
------------------I +
I-------*
O
-------------+---------+---------+---------+---------+---------+-Origl IQ
            60        90        120       150       180       210
This boxplot explicitly highlights the extreme value we noticed in the dotplot, and labels it as a probable
outlier. (It turns out that the observation marked * is not an error, but indicates an exceptionally bright
student.)
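The notes do not state the rule that separates "possible" from "probable" outliers. A common convention, assumed here and not confirmed by this handout, uses fences at 1.5 and 3 interquartile ranges beyond the quartiles. The sketch below applies that convention to the quartiles reported in the descriptive statistics of this part (Q1 = 90, Q3 = 112); the function name `classify` is hypothetical.

```python
def classify(value, q1, q3):
    """Label a value using inner (1.5 IQR) and outer (3 IQR) fences (assumed rule)."""
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    if value < outer_lo or value > outer_hi:
        return "O"       # probable outlier
    if value < inner_lo or value > inner_hi:
        return "*"       # possible outlier
    return "inside"      # within the inner fences

Q1, Q3 = 90.0, 112.0     # quartiles of Origl IQ from the Minitab output
for v in (58, 150, 196):
    print(v, classify(v, Q1, Q3))
```

Under this assumed rule the inner fences are 57 and 145 and the outer fences are 24 and 178, so 196 is labeled O, a score of 150 is labeled *, and the minimum of 58 is not flagged. That agrees with the symbols in the boxplot above.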
A professional graphic boxplot runs vertically instead of horizontally, does not distinguish between
probable and possible outliers, and can be embellished in various ways (most conveniently using menus)
that we do not discuss here. The procedure and the resulting graphic are shown below.
GRAPH ➯ Boxplot, y = 'Origl IQ', x unspecified (left blank).
MTB > GPRO
MTB > boxp 'Origl IQ'
[Figure: vertical boxplot of Origl IQ; the vertical axis runs from 50 to 200.]
Note: (a) When Minitab starts up, it is in professional graphics mode. If you make only
professional graphs, then you do not need to use the commands GPRO and GSTD to switch
between modes. Even though character graphs are somewhat less precise, they are often just as
effective, often take up less room on the page, can be easily labeled using a word processor, and
take up less file space. (b) When cutting a character graph from Minitab and pasting it into a
word processor, always cut one blank line above and below the graph you want to move, and be
sure to format the graph using a monospace font such as courier in order to preserve the
spacing.
Numerical Descriptive Statistics
Minitab makes it easy to compute a number of numerical descriptive statistics for a dataset. (Your output
may look a little different, depending on the release of Minitab used.)
STAT ➯ Basic ➯ Descriptive, select 'Origl IQ'
MTB > desc 'Origl IQ'
                N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
Origl IQ      250   100.32   100.00   100.21    16.52     1.04

              MIN      MAX       Q1       Q3
Origl IQ    58.00   196.00    90.00   112.00
The crucial information here is the maximum value MAX = 196. This is the exact numerical value of the
outlier seen in the dotplot and the boxplot above.
Notes on other descriptive statistics shown above (check your textbook for the definitions):
• N = 250, the sample size. (Minitab uses N here, but most texts use n for sample size and N for
  population size.)
• The sample MEAN = 100.32. (Most texts use X̄ or Ȳ for the sample mean.)
• The sample MEDIAN = 100.00.
• The sample standard deviation STDEV = 16.52.
• TRMEAN stands for the trimmed mean of the sample, computed by ignoring the highest 5%
  and lowest 5% of the data and averaging the middle 90%; this quantity is not as sensitive to
  erratic extreme values as is the mean.
• Q1 and Q3 are the lower and upper quartiles of the sample.
• SEMEAN is the (estimated) standard error of the mean, equal to the sample standard
  deviation divided by the square root of the sample size; this quantity is used in statistical
  inference.
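Two of these definitions are easy to check by hand or in code. The Python sketch below verifies SEMEAN from the printed STDEV and N, and shows a trimmed mean on a small hypothetical sample. The helper names are illustrative, and the way the 5% is rounded to a whole number of observations is an assumption that may differ from Minitab's rule.

```python
import math

def standard_error(stdev, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev / math.sqrt(n)

def trimmed_mean(values, proportion=0.05):
    """Average after dropping the given proportion of values from each end."""
    data = sorted(values)
    k = round(len(data) * proportion)   # observations dropped per end (assumed rounding)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Check against the Minitab output: STDEV = 16.52, N = 250.
print(round(standard_error(16.52, 250), 2))   # 1.04

# Hypothetical sample of 20 scores with one wild value.
sample = list(range(90, 109)) + [196]
print(sum(sample) / len(sample), trimmed_mean(sample))
```

In the hypothetical sample the ordinary mean is pulled up to 103.85 by the single value 196, while the trimmed mean stays at 99.5.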
Data Cleaning
In the actual situation upon which these data are based, the researcher rechecked the original list of IQ
scores and found that the value 196 resulted from a data input error; the correct value is 96. The data in c13
('Final IQ') are identical to those in c12 except that this error has been corrected. Now we repeat our work,
using the corrected data.
Notice that the dotplot below uses a different scale, appropriate to the span of the corrected data. (For
variety we have designated the column with the corrected data as c13 instead of 'Final IQ' in the
command. Column names are often easier to remember, but column numbers are easier to type. If a
column has a name, Minitab will always use it in the output—no matter whether you used its name or its
number in the command.)
GRAPH ➯ Character ➯ Dotplot, select 'Final IQ'
MTB > GSTD
MTB > dotp c13
.::
. : .
::: :.: : :.:
: ::::::: : ::::
. : :::::::::::::::
:: ::::::::::::::::::.. :
.. ::::::::::::::::::::::: :.
. .:..::.::::::::::::::::::::::::::: :.
.
-----+---------+---------+---------+---------+---------+-Final IQ
    60        80        100       120       140       160
Here is a comparison of the numerical descriptive statistics for the incorrect and corrected IQ data.
(Notice that descriptive statistics can be computed for more than one column at a time.)
STAT ➯ Basic ➯ Descriptive, select both 'Origl IQ' and 'Final IQ'
MTB > desc 'Origl IQ' 'Final IQ'
                N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
Origl IQ      250   100.32   100.00   100.21    16.52     1.04
Final IQ      250   99.920  100.000  100.076   15.367    0.972

              MIN      MAX       Q1       Q3
Origl IQ    58.00   196.00    90.00   112.00
Final IQ   58.000  150.000   90.000  112.000
The incorrect observation changed the mean by 0.4 of an IQ point (giving 100.3 compared with a correct
mean of 99.9), the trimmed mean by about 0.1 of an IQ point, and the median not at all. The largest
discrepancy is in the standard deviation, which the error inflated from 15.37 to 16.52, more than a full IQ point.
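These shifts can be reproduced from the summary statistics alone, without the raw data. The following Python sketch starts from the corrected mean and standard deviation, swaps the true value 96 for the mistyped 196, and recomputes both statistics. The variable names are illustrative, and small rounding differences are expected because the inputs are themselves rounded.

```python
import math

n = 250
mean_ok, sd_ok = 99.920, 15.367   # 'Final IQ' values from the Minitab output
wrong, right = 196.0, 96.0        # mistyped value and true value

# Recover the sum and the sum of squares of the corrected data.
total_ok = n * mean_ok
sumsq_ok = (n - 1) * sd_ok**2 + n * mean_ok**2

# Replace the true value with the mistyped one.
total_bad = total_ok - right + wrong
sumsq_bad = sumsq_ok - right**2 + wrong**2

mean_bad = total_bad / n
sd_bad = math.sqrt((sumsq_bad - n * mean_bad**2) / (n - 1))
print(round(mean_bad, 2), round(sd_bad, 2))   # 100.32 16.52
```

The mean moves by (196 - 96)/250 = 0.4, while the standard deviation is affected more because the error enters through its square.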
Comments
Unlike "textbook" examples, real data almost always contain some errors. In beginning to study a dataset
it is well to use a number of graphical and numerical devices to screen the data for unreasonable and
inconsistent values.
Using a computer with statistical software such as Minitab, we find it easy to take such a critical look
at a dataset before we try to draw conclusions from it—even if the sample size is fairly large, as in the
present case. Consider for a moment how much work would be required to duplicate the work shown in
Part 1 if we had to do it using pencil, graph paper, and a hand calculator.