Basics of Biostatistics for Health Research
Session 2 – February 14th, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences
& Department of Psychiatry
• Go to “www.ucalgary.ca/~patten”
• Scroll to the bottom.
• Right click to download the files described
as being “for PGME Students”
– One is a dataset
– One is a data dictionary
• Save them on your desktop
Open the Datafile
The task from last week…
• Create a 95% exact binomial confidence
interval for the proportion of people in the
Framingham data with > H.S. education
Review of Last Week’s Task
• “use”
• “generate”
• “recode”
• “tabulate”
• “ci”
The actual commands…
generate highschool = educ
recode highschool 1/2=0 3/4=1
tabulate highschool
ci highschool, binomial
Creating a “do” file…
The “do file” editor
Executing a “do” file
What is a “do” file?
• It is a text file – you can copy and paste
from the output window in Stata, or from a
word processor
• It is a computer program that consists of
actual commands and therefore doesn’t
need a compiler
• Others would call it a “macro”
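As an illustration, last week's commands could be collected into a do file along these lines. This is only a sketch: the file name framingham.dta is a placeholder (the slides do not give the dataset's actual name), and the comments are added here for explanation.

```stata
* lastweek.do: 95% exact binomial CI for > H.S. education
* NOTE: "framingham.dta" is a hypothetical file name; use the name
* of the dataset you saved on your desktop
use "framingham.dta", clear

* copy educ, then collapse it to 0 (levels 1-2) and 1 (levels 3-4)
generate highschool = educ
recode highschool 1/2=0 3/4=1

* check the recode, then get the exact binomial interval
tabulate highschool
ci highschool, binomial
```

Besides the do file editor, a saved do file can also be run by typing do followed by its file name in the command window. In Stata 14 and later the documented form of the last command is ci proportions highschool; the form above is the one used in these slides.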
Different Types of Data
• One type of distinction
– Nominal (e.g. sex, race)
– Ordinal (e.g. rating scales)
– Cardinal (e.g. physical measures)
• Another type of distinction
– Categorical (e.g. # of pregnancies)
– Continuous (e.g. height, weight)
Body Mass Index (BMI)
The BMI in our Data Set
This is an example of
a continuous variable
Changing Data Types in Stata
(e.g. continuous to categorical)
• recode bmi x/y=z
• This will recode all values of the variable
bmi having values from x to y to a single
value equal to z.
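Note that recode changes the variable in place, so the original BMI values would be lost. A minimal sketch of one way to avoid this, assuming the cutpoints used later in these slides, is to let recode write its result to a new variable with the generate() option (owo_copy is a hypothetical name chosen for this example):

```stata
* recode into a new variable; bmi itself is left unchanged
recode bmi (0/25 = 0) (25.01/100 = 1), generate(owo_copy)

* check the result, including any missing values
tabulate owo_copy, missing
```

This does the same job as copying the variable with generate first and then recoding the copy.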
Interpretation of BMI
• Underweight: < 18.5
• Normal weight: 18.5 to 25
• Overweight: >25 to 30
• Obese: 30+
• Your task: Make a “do file” that calculates
a 95% confidence interval for the proportion
of the population that are overweight or
obese.
Example of Code for this…
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1
tab owo, missing
ci owo, binomial
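One caveat about the recode rule above: a value strictly between 25 and 25.01 would match neither range and would keep its original value. If BMI is stored to two decimal places this cannot happen, but a hedged alternative that avoids the gap is to build the indicator from a logical expression (owo2 is a hypothetical name, used so that it does not clash with owo):

```stata
* 1 if BMI is above 25, 0 otherwise; missing BMI stays missing
generate byte owo2 = (bmi > 25) if !missing(bmi)

tabulate owo2, missing
ci owo2, binomial
```

The if !missing(bmi) qualifier matters because Stata treats missing values as larger than any number, so bmi > 25 would otherwise be true for them.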
Another Task…
• Add a use command to your do file
• Save your “do file” on the desktop using a
descriptive file name of your choice
• Exit Stata
• Open Stata again
• Open the “do file” editor and select your do
file
• Execute your “do file”
The Power of “do files”
• Task: Calculate an exact 95% CI for the
proportion of the population that are obese
(BMI > 30)
• IMPORTANT: do NOT start from scratch
as we did before – try to do this by editing
your do file.
For Example…
Before:
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1
tab owo, missing
ci owo, binomial
After:
generate owo = bmi
generate obese = bmi
recode owo 0/25 = 0 25.01/100 = 1
recode obese 0/30 = 0 30.01/100 = 1
tab owo, missing
tab obese, missing
ci owo obese, binomial
Starting a Log File
Closing a Log File
Another Task…
• Start a log file
• Run your “do file”
• Close and save the resulting log file on your
desktop
• Open your log file
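The slides use the menus for this, but the same steps can be typed as commands. A minimal sketch, assuming hypothetical file names session2.log and bmi_task.do:

```stata
* close any log that is still open; capture hides the error if none is
capture log close

* start a plain-text log, overwriting an older file of the same name
log using "session2.log", text replace

* run the do file; everything it prints goes into the log
do "bmi_task.do"

* stop logging and save the file
log close
```

Because the log is plain text, it can afterwards be opened in any text editor, or shown in the Results window with type "session2.log".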
“do file” Etiquette
• When you put an * at the start of a line in a
“do file”, Stata will ignore that line
• Use this to…
– Add descriptive comments to your code
– Remove commands that you don’t want now,
but might want later
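For instance, a sketch of the overweight-or-obese do file with a descriptive comment added and the table switched off might look like this:

```stata
* Proportion overweight or obese (BMI > 25)
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1

* table not needed for now; delete the * to bring it back
* tab owo, missing

ci owo, binomial
```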
E.g. Without the Tables…
Review…
• Make a value label for obesity
• Attach this value label to the variable
representing obesity
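A minimal sketch of these two steps, assuming the variable obese from the earlier do file (0 = BMI up to 30, 1 = BMI above 30); the label name obeselbl is a hypothetical choice:

```stata
* define a value label: a mapping from codes to text
label define obeselbl 0 "Not Obese" 1 "Obese"

* attach the label to the variable
label values obese obeselbl

* tables now show the text instead of 0 and 1
tabulate obese, missing
```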
Making a Graphic
The Pie Chart Dialogue Box
Find the variable that you made
Unedited Output
The Graph Editor
Here is a good place to start
See if you can do these things…
• Change the color of the pie
• Add a title
• Add a comment
• Change the background
• Create a work of art
Save in a Standard Format
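The pie chart and the export can also be done from a do file rather than through the dialogue box and graph editor. A sketch, assuming the obese variable made earlier; the title, note text and file name are illustrative only:

```stata
* one slice per category of obese, with a title and a comment
graph pie, over(obese) title("Obesity (BMI > 30)") note("Course dataset")

* save the current graph as a PNG; the format follows the file extension
graph export "obese_pie.png", replace
```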
Back to BMI
• You may not wish to categorize variables
like this
• Measures of central tendency
– Mode
– Median
– Mean
• Different types of graphs are useful for
examining continuous variables
– Box plots
– Histograms
Box Plots
Terminology
• Median: value with 50% of observations
above and 50% below.
• Interquartile range (IQR): the range from
the 25th to the 75th percentile – contains the
middle 50% of observations
• Adjacent values (whiskers): the most
extreme observations lying within 1.5x the
IQR of the box (the 25th and 75th percentiles)
• Outliers: anything outside of the adjacent
values
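These quantities can be worked out by hand from the percentiles that summarize stores. The sketch below assumes the bmi variable; the scalar names q1, q3 and iqr are hypothetical, and the percentiles used by the box plot itself may differ slightly in how they are interpolated.

```stata
* detail makes summarize store percentiles in r()
summarize bmi, detail
scalar q1  = r(p25)
scalar q3  = r(p75)
scalar iqr = q3 - q1

display "IQR         = " iqr
display "Lower limit = " q1 - 1.5*iqr
display "Upper limit = " q3 + 1.5*iqr

* observations beyond the limits would be drawn as outliers
count if (bmi < q1 - 1.5*iqr | bmi > q3 + 1.5*iqr) & !missing(bmi)
```

The whiskers end at the most extreme observations inside these limits, not at the limits themselves.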
Calculating Summary Stats
• Calculate summary stats for BMI
• Calculate the mean BMI
• Calculate median BMI
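The slides show these steps through the menus; as a hedged sketch, command equivalents for the bmi variable could be:

```stata
* n, mean, standard deviation, minimum and maximum
summarize bmi

* adds percentiles, including the median (50%)
summarize bmi, detail

* mean with its standard error and 95% confidence interval
mean bmi

* a compact table of chosen statistics
tabstat bmi, statistics(n mean sd median p25 p75)
```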
Make a Box (and whisker) Plot
The Boxplot Dialogue Box
Select BMI from the dropdown list
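As a sketch, the same plot can be requested with a command; the second line assumes the sex variable used later in the session:

```stata
* a single box plot of BMI
graph box bmi

* side-by-side box plots, one per category of sex
graph box bmi, over(sex)
```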
Introducing Histograms
The Histogram Dialogue Box
• Select the variable
• Select the number of bins
A Task for You to Do…
• Make 3 histograms of BMI
– In one use the default number of bins
– In one use a larger number
– In one, use a smaller number
• Save your favorite histogram
• Open it in the graph editor, give it a title and
improve its appearance
• Save it in a standard form (e.g. png, jpg, tif)
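A sketch of this task in command form. The bin counts 50 and 10, the title and the file name are arbitrary choices for illustration:

```stata
* default number of bins
histogram bmi

* a larger and a smaller number of bins
histogram bmi, bin(50)
histogram bmi, bin(10)

* add a title to the preferred version and export it
histogram bmi, bin(50) title("Distribution of BMI")
graph export "bmi_hist.png", replace
```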
Assessing Normality with a Histogram
The distribution is not quite
normal, but close
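To make this comparison easier, a normal curve can be drawn over the histogram. A minimal sketch follows; the quantile plot is an additional check that the slides do not mention:

```stata
* histogram with a normal density using the sample mean and SD
histogram bmi, normal

* quantile-normal plot: points near the line suggest approximate normality
qnorm bmi
```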
Is BMI Higher in Men or
Women?
• We could use confidence intervals to assess
this…
• E.g. with the “mean” command
Here is the dialogue box…
Once you’ve selected BMI, click this
The dialogue box, continued..
Enter sex as a group variable
The output
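. mean bmi, over(sex)

Mean estimation                     Number of obs    =   11575

            1: sex = 1
            2: sex = 2

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
bmi          |
           1 |   26.20382   .0484566      26.10883     26.2988
           2 |   25.62873   .0559382      25.51909    25.73838
--------------------------------------------------------------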
It looks better with value labels
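. mean bmi, over(sex)

Mean estimation                     Number of obs    =   11575

          Men: sex = Men
        Women: sex = Women

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
bmi          |
         Men |   26.20382   .0484566      26.10883     26.2988
       Women |   25.62873   .0559382      25.51909    25.73838
--------------------------------------------------------------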
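The two outputs above suggest that sex is coded 1 for men and 2 for women. Under that assumption, a sketch of how the labelled version could be produced (sexlbl is a hypothetical label name):

```stata
* map the codes of sex to text (assumes 1 = men, 2 = women)
label define sexlbl 1 "Men" 2 "Women"
label values sex sexlbl

* mean BMI with a 95% confidence interval for each sex
mean bmi, over(sex)
```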
Statistical Tests
• Start with a hypothesis that an “effect”
exists
– In this case, that there is an effect of sex on
BMI
• Assume that the effect DOES NOT exist
– This is the null hypothesis
• Find the probability of the observed results,
or results more extreme, given the null hypothesis
– This is what the “test” calculates for you (the p-value)
• If this probability is below a pre-chosen
threshold (the alpha value), reject the null
The t-test (assumptions)
• The variables are approximately normally
distributed
• The standard deviations of the two groups
are approximately equal
• The two samples are independent
Using summarize similarly
• Use summarize with “by” in the dialogue
box
• Use histograms with a normal density plot
and the “by” tab in the dialogue box
Your task: use these two techniques to assess
the t-test assumptions.
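In command form, the two techniques might look like the sketch below, assuming the bmi and sex variables:

```stata
* summary statistics, including the standard deviation, for each sex
bysort sex: summarize bmi

* one histogram per sex, each with a normal density overlaid
histogram bmi, normal by(sex)
```

The first command helps with the equal standard deviations assumption, the second with approximate normality within each group.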
Variance Comparisons
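A formal comparison of the two group variances is available as a command. A minimal sketch, assuming the bmi and sex variables:

```stata
* variance ratio test: are the SDs of BMI equal in men and women?
sdtest bmi, by(sex)
```

A small p-value here points towards unequal variances, in which case the unequal option of the t-test is the safer choice.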
The t-test
The t-test dialogue box
The output
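. ttest bmi, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     Men |    5004    26.20382    .0484566    3.427767    26.10882    26.29881
   Women |    6571    25.62873    .0559382    4.534443    25.51908    25.73839
---------+--------------------------------------------------------------------
combined |   11575    25.87735    .0381332     4.10264     25.8026     25.9521
---------+--------------------------------------------------------------------
    diff |            .5750831    .0740075                .4300158    .7201504
------------------------------------------------------------------------------
    diff = mean(Men) - mean(Women)                                t =   7.7706
Ho: diff = 0                     Satterthwaite's degrees of freedom =  11572.4

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000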
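To see where the t statistic comes from, the figures in the output can be checked with display, which works as a calculator. With unequal variances, the standard error of the difference is the square root of the sum of the two squared group standard errors. A sketch using the numbers printed above:

```stata
* standard error of the difference, from the two group standard errors
display sqrt(.0484566^2 + .0559382^2)

* t statistic: difference in means divided by its standard error
display .5750831 / .0740075
```

The results should come out close to the .0740075 and 7.7706 shown in the output.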
Two group tests for proportions..
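As a sketch of the command form, assuming obese is coded 0/1 and sex has two categories:

```stata
* two-sample test that the proportion obese is equal in men and women
prtest obese, by(sex)
```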
You can also do this with tab
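tab obese sex, exact

           |          sex
     obese |       Men      Women |     Total
-----------+----------------------+----------
 Not Obese |     4,405      5,610 |    10,015
     Obese |       599        961 |     1,560
-----------+----------------------+----------
     Total |     5,004      6,571 |    11,575

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000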
Your Final Task for Today
• Create a “do file” that …
– Reads in the data
– Recodes BMI to a categorical variable for
obesity
– Tests whether obesity differs between men and
women
• Create a log file to store the results
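One possible shape for such a do file is sketched below. The file names are hypothetical placeholders, the obesity indicator is built with a logical expression rather than recode, and Fisher's exact test is only one of the tests shown in this session that would fit.

```stata
* final_task.do: does obesity (BMI > 30) differ between men and women?
* NOTE: the file names below are hypothetical placeholders
capture log close
log using "final_task.log", text replace

* read in the data
use "framingham.dta", clear

* recode BMI to a 0/1 obesity indicator, keeping missing BMI as missing
generate byte obese = (bmi > 30) if !missing(bmi)
label define obeselbl 0 "Not Obese" 1 "Obese"
label values obese obeselbl

* compare the proportion obese in men and women
tabulate obese sex, exact column

log close
```

The column option adds column percentages, so the proportion obese within each sex can be read directly from the table.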