Lab 3
Analyzing Normally Distributed Data
Overview of Lab Session:
In this lab session we analyze real-world medical data, and test how well the data appear to be
normally distributed. We begin by using SPSS to generate “fake” normally distributed data, that
is, data that are randomly selected from a perfectly normal distribution. We qualitatively compare
the results of this ideal sample to medical data from a study of ICU patients published in JASA:
Lemeshow, S., Teres, D., Avrunin, J. S., Pastides, H. (1988). Predicting
the Outcome of Intensive Care Unit Patients. Journal of the American
Statistical Association, 83, 348-356.
This research article is available on the database JSTOR which you can access using your library
account. [Go to www.csulb.edu/~library, click on Databases by title, then find JSTOR; you will
be prompted to enter your campus ID and library password. Now search JSTOR for the article.]
The article is also linked to the course website; you must first be logged on to your library account
to link to the article.
The data we work with in this lab are part of the ICU data used in the above study. The data are for
200 patients (represented by the rows of the data matrix). For each patient there are 21 measured
characteristics (represented in the columns of the data matrix). These include ID, STA (Vital
Status), AGE, SEX, RACE, SER (service at ICU admission), CAN (cancer), CRN (chronic renal
failure), …, SYS (systolic blood pressure), HRA (heart rate), …
I. Get Started
Use your ID number and password to log onto the computer.
Launch SPSS by clicking on Start, All Programs, Classes, then SPSS.
Load the ICU data from www.csulb.edu/~saleem/Course-F08-503/Data/icu.sav
II. Generate “Fake” Normally Distributed Data
STEP 1 Generate “Fake” Normally Distributed Data using Compute Variable
Use SPSS to generate a random sample of size 200 from a “perfect” normal distribution. The last
command in the set below generates normally distributed data with mean 100 and standard
deviation 20.
From the top menu choose Transform > Compute Variable.
In the Target Variable window type FN (this is the name of the new variable)
In the Numeric Expression window type: Normal(20)+100
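For readers who want to see what this expression does outside SPSS, the same idea can be sketched in Python. This is only an illustration and not part of the lab procedure: it assumes that Normal(20) draws from a normal distribution with mean 0 and standard deviation 20, so that adding 100 shifts the center to 100. The function name make_fake_normal and the seed value are hypothetical.

```python
import random
import statistics

def make_fake_normal(n=200, mean=100.0, sd=20.0, seed=503):
    """Draw n values from a normal distribution, mimicking Normal(20)+100."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    # rng.gauss(0, sd) plays the role of Normal(20); adding mean shifts the center
    return [rng.gauss(0.0, sd) + mean for _ in range(n)]

fn = make_fake_normal()
print(len(fn))                         # number of simulated cases
print(round(statistics.mean(fn), 2))   # sample mean: near 100, but not exactly 100
print(round(statistics.stdev(fn), 2))  # sample standard deviation: near 20
```

Because the sample is random, its mean and standard deviation will be close to 100 and 20 without matching them exactly, which is what you should also expect from the FN variable in SPSS.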
STEP 2 Generate a Histogram for the FN Data using Chart Builder and Superimpose a
Normal Curve
From the top menu choose Graphs > Chart Builder
From Gallery > Choose from, select Histogram
Highlight the FN variable and drag it to the x-Axis under the histogram chart
Go to Element Properties > Set Parameters
For Anchor First Bin choose Automatic
For Bin Sizes choose Automatic
Check the Display normal curve box
Click on Continue and OK
The output should contain a histogram with the appropriate normal curve.
Double click on the histogram in the output window to get to Chart Editor.
Choose Element > Show data labels.
This last command labels each bar in the histogram with the percentage of the data that the bar
represents.
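The comparison that the labeled histogram invites can be sketched in Python: for each bar, compute the percentage of the data in that bin and the percentage a normal curve would place there. This is a rough sketch under stated assumptions, not a reproduction of SPSS: the bin width of 10 is arbitrary (SPSS chooses its own bins under Automatic), the curve is assumed to use the sample mean and standard deviation, and histogram_percentages is a hypothetical helper name.

```python
import random
from statistics import NormalDist, mean, stdev

def histogram_percentages(values, bin_width=10.0):
    """Return (bin_start, percent_of_data, percent_under_fitted_normal) per bin."""
    # Assumption: the superimposed curve uses the sample mean and standard deviation
    fitted = NormalDist(mean(values), stdev(values))
    start = (min(values) // bin_width) * bin_width  # first bin starts on a multiple of bin_width
    rows = []
    while start <= max(values):
        end = start + bin_width
        count = sum(1 for v in values if start <= v < end)
        actual = 100.0 * count / len(values)
        expected = 100.0 * (fitted.cdf(end) - fitted.cdf(start))
        rows.append((start, actual, expected))
        start = end
    return rows

rng = random.Random(503)  # hypothetical seed; stands in for the FN variable
sample = [rng.gauss(100.0, 20.0) for _ in range(200)]
rows = histogram_percentages(sample)
for start, actual, expected in rows:
    print(f"{start:6.0f} to {start + 10:6.0f}  data {actual:5.1f}%  normal {expected:5.1f}%")
```

Bars whose data percentage sits close to the normal percentage are the ones where the curve visually hugs the histogram.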
The Chart Builder, Element Properties, and Element Properties: Set Parameters windows are
shown below.
III. Analyzing the Systolic Pressure Data
In this part of the lab we repeat Step 2 for the systolic pressure (SYS) variable from the ICU data.
Observe how closely the real-world SYS data exhibit the characteristics of a normal distribution.
STEP 2 Generate a Histogram for the SYS Data using Chart Builder.
From the top menu choose Graphs > Chart Builder
From Gallery > Choose from, select Histogram
Highlight the SYS variable and drag it to the x-Axis under the histogram chart
Go to Element Properties > Set Parameters
For Anchor First Bin choose Automatic
For Bin Sizes choose Automatic
Check the Display normal curve box
Click on Continue and OK
The output should contain a histogram with the appropriate normal curve.
Double click on the histogram in the output window to get to Chart Editor.
Choose Element > Show data labels.
This last command labels each bar in the histogram with the percentage of the data that the bar
represents.
Name ________________
Worksheet for Lab 3
Use the SPSS output to answer the following questions.
PART I
1. What are the mean and standard deviation of the FN data? Mean ______________,
   Standard deviation ______________.
2. Fill in the following table with the appropriate percentages. In the “Normally Distributed”
   column put the percentage of the data that should lie in the given interval if the data were
   perfectly normally distributed. In the “Actual FN data” column put the actual percentage of
   the FN data in the given interval.

   Interval              Normally Distributed    Actual FN data
   Between 80 and 120    ____________________    ______________
   Greater than 140      ____________________    ______________
   Greater than 160      ____________________    ______________
   Less than 140         ____________________    ______________
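
As a cross-check on the two columns of this table, the calculation behind each one can be sketched in Python. This is a hedged illustration rather than the expected method for the lab: the “Normally Distributed” column is taken from the cumulative distribution of a normal with mean 100 and standard deviation 20, and the “Actual” column is counted from a simulated stand-in sample, not from your SPSS output. The names expected_percent and actual_percent and the seed are hypothetical.

```python
import random
from statistics import NormalDist

ideal = NormalDist(mu=100.0, sigma=20.0)  # the "perfect" distribution behind FN

def expected_percent(low=None, high=None):
    """Percent of the ideal distribution between low and high (None means unbounded)."""
    lower = 0.0 if low is None else ideal.cdf(low)
    upper = 1.0 if high is None else ideal.cdf(high)
    return 100.0 * (upper - lower)

def actual_percent(values, low=None, high=None):
    """Percent of observed values strictly between low and high."""
    def inside(v):
        return (low is None or v > low) and (high is None or v < high)
    return 100.0 * sum(1 for v in values if inside(v)) / len(values)

intervals = {
    "Between 80 and 120": (80, 120),
    "Greater than 140": (140, None),
    "Greater than 160": (160, None),
    "Less than 140": (None, 140),
}

rng = random.Random(503)  # hypothetical seed; your SPSS sample will differ
fn = [rng.gauss(100.0, 20.0) for _ in range(200)]
for label, (low, high) in intervals.items():
    print(f"{label:20s} {expected_percent(low, high):6.2f} {actual_percent(fn, low, high):6.2f}")
```

The interval endpoints 80, 120, 140, and 160 are one, two, and three standard deviations from the mean, which is why the first column can also be read from a standard normal table.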
3. How well do the FN data conform to a normal distribution? Answer this question based
   on (1) how well the normal curve fits the histogram and (2) the results in the table in
   Question 2.
PART II
4. What are the mean and standard deviation of the SYS data? Mean ______________,
   Standard deviation ______________.
5. Sketch a rough graph of the histogram and the normal curve from the SPSS output.
6. Do the SYS data appear to be normally distributed? Answer this question based on the
   histogram and the superimposed normal curve.
7. Find the interval that represents the mean systolic pressure plus or minus one standard
   deviation. Interval: ( __________, __________ )
8. What percent of the systolic pressures would you expect to lie in the interval in Question 7?
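
The arithmetic behind Questions 7 and 8 can be sketched in Python. This is an illustration only: it uses the FN parameters from Part I (mean 100, standard deviation 20) together with a second, purely hypothetical pair of values, not the SYS results, which you should read from your own SPSS output. The function names are hypothetical as well.

```python
from statistics import NormalDist

def one_sd_interval(mean_value, sd_value):
    """Return the interval (mean - sd, mean + sd)."""
    return (mean_value - sd_value, mean_value + sd_value)

def expected_percent_inside(interval, mean_value, sd_value):
    """Percent of a normal distribution with the given mean and sd inside the interval."""
    low, high = interval
    dist = NormalDist(mean_value, sd_value)
    return 100.0 * (dist.cdf(high) - dist.cdf(low))

# Part I parameters (from the lab) and one made-up pair for comparison
fn_interval = one_sd_interval(100.0, 20.0)
other_interval = one_sd_interval(130.0, 30.0)  # hypothetical values, not the SYS data

fn_share = expected_percent_inside(fn_interval, 100.0, 20.0)
other_share = expected_percent_inside(other_interval, 130.0, 30.0)
print(fn_interval, round(fn_share, 2))
print(other_interval, round(other_share, 2))
```

Both shares come out the same, because the percentage within one standard deviation of the mean does not depend on which mean and standard deviation the normal distribution has.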