Basics of Biostatistics for Health Research
Session 2 – February 14th, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry
[email protected]

• Go to “www.ucalgary.ca/~patten”
• Scroll to the bottom.
• Right click to download the files described as being “for PGME Students”
  – One is a dataset
  – One is a data dictionary
• Save them on your desktop

Open the Datafile

The task from last week…
• Create a 95% exact binomial confidence interval for the proportion of people in the Framingham data with more than a high school (H.S.) education

Review of Last Week’s Task
• “use”
• “generate”
• “recode”
• “tabulate”
• “ci”

The actual commands…
    generate highschool = educ
    recode highschool 1/2=0 3/4=1
    tabulate highschool
    ci highschool, binomial

Creating a “do” file…
[screenshot: steps 1–3]

The “do file” editor

Executing a “do” file

What is a “do” file?
• It is a text file – you can copy and paste from the output window in Stata, or from a word processor
• It is a computer program made up of the same commands you would type interactively, and therefore doesn’t need a compiler
• Others would call it a “macro”

Different Types of Data
• One type of distinction
  – Nominal (e.g. sex, race)
  – Ordinal (e.g. rating scales)
  – Cardinal (e.g. physical measures)
• Another type of distinction
  – Categorical (e.g. # of pregnancies)
  – Continuous (e.g. height, weight)

Body Mass Index (BMI)

The BMI in our Data Set
This is an example of a continuous variable

Changing Data Types in Stata (e.g. continuous to categorical)
• recode bmi x/y=z
• This recodes every value of the variable bmi from x to y (inclusive) to the single value z.

Interpretation of BMI
• Underweight: < 18.5
• Normal weight: 18.5 to 25
• Overweight: > 25 to 30
• Obese: > 30
• Your task: Make a “do file” that calculates a 95% confidence interval for the proportion of the population that are overweight or obese.
Example of Code for this…
    generate owo = bmi
    recode owo 0/25 = 0 25.01/100 = 1
    tab owo, missing
    ci owo, binomial

Another Task…
• Add a use command to your do file
• Save your “do file” on the desktop using a descriptive file name of your choice
• Exit Stata
• Open Stata again
• Open the “do file” editor and select your do file
• Execute your “do file”

The Power of “do files”
• Task: Calculate an exact 95% CI for the proportion of the population that are obese (BMI > 30)
• IMPORTANT: do NOT start from scratch as we did before – try to do this by editing your do file.

For Example…
Original do file:
    generate owo = bmi
    recode owo 0/25 = 0 25.01/100 = 1
    tab owo, missing
    ci owo, binomial

Edited do file:
    generate owo = bmi
    generate obese = bmi
    recode owo 0/25 = 0 25.01/100 = 1
    recode obese 0/30 = 0 30.01/100=1
    tab owo, missing
    tab obese, missing
    ci owo obese, binomial

Starting a Log File
[screenshot: steps 1–3]

Closing a Log File
[screenshot: steps 1–3]

Another Task…
• Start a log file
• Run your “do file”
• Close and save the resulting log file on your desktop
• Open your log file

“do file” Etiquette
• When you add an * before a line in a “do file”, Stata will ignore that line
• Use this to…
  – Add descriptive comments to your code
  – Remove commands that you don’t want now, but might want later

E.g. Without the Tables…

Review…
• Make a value label for obesity
• Attach this value label to the variable representing obesity

Making a Graphic

The Pie Chart Dialogue Box
1 Find the variable that you made
2

Unedited Output

The Graph Editor
Here is a good place to start

See if you can do these things…
• Change the color of the pie
• Add a title
• Add a comment
• Change the background
• Create a work of art

Save in a Standard Format

Back to BMI
• May not wish to categorize variables like this
• Measures of central tendency
  – Mode
  – Median
  – Mean
• Different types of graphs are useful for examining continuous variables
  – Box plots
  – Histograms

Box Plots

Terminology
• Median: value with 50% of observations above and 50% below.
• Interquartile range (IQR)
  – contains the middle 50% of observations
  – runs from the 25th to the 75th percentile, i.e. one quartile on either side of the median
• Adjacent values (whiskers)
  – the most extreme observations that lie within 1.5 x the IQR of the box
• Outliers: anything outside of the adjacent values

Calculating Summary Stats
• Calculate summary stats for BMI
• Calculate the mean BMI
• Calculate median BMI

Make a Box (and whisker) Plot

The Boxplot Dialogue Box
1 Select BMI from the dropdown list
2

Introducing Histograms
[screenshot: steps 1–2]

The Histogram Dialogue Box
Select the variable here
Select the bin# here

A Task for You to Do…
• Make 3 histograms of BMI
  – In one use the default number of bins
  – In one use a larger number
  – In one, use a smaller number
• Save your favorite histogram
• Open it in the graph editor, give it a title and improve its appearance
• Save it in a standard format (e.g. png, jpg, tif)

Assessing Normality with a Histogram
The distribution is not quite normal, but close

Is BMI Higher in Men or Women?
• We could use confidence intervals to assess this…
• E.g.
[screenshot: steps 1–3]

Here is the dialogue box…
Once you’ve selected BMI, click this

The dialogue box, continued..
Enter sex as a group variable

The output
    . mean bmi, over(sex)

    Mean estimation                     Number of obs = 11575

                1: sex = 1
                2: sex = 2

            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    bmi          |
               1 |   26.20382   .0484566      26.10883     26.2988
               2 |   25.62873   .0559382      25.51909    25.73838

It looks better with value labels
    . mean bmi, over(sex)

    Mean estimation                     Number of obs = 11575

              Men: sex = Men
            Women: sex = Women

            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    bmi          |
             Men |   26.20382   .0484566      26.10883     26.2988
           Women |   25.62873   .0559382      25.51909    25.73838

Statistical Tests
• Start with a hypothesis that an “effect” exists
  – In this case, that there is an effect of sex on BMI
• Assume that the effect DOES NOT exist
  – This is the null hypothesis
• Find the probability of the observed results, or results more extreme, given the null hypothesis
  – This is what the “test” calculates for you
• If that probability is below the chosen alpha value, the null is unlikely: reject it

The t-test (assumptions)
• The variable is approximately normally distributed in each group
• The standard deviations of the two groups are approximately equal
• The two samples are independent

Using summarize similarly
• Use summarize with “by” in the dialogue box
• Use histograms with a normal density plot and the “by” tab in the dialogue box
Your task: use these two techniques to assess the t-test assumptions.

Variance Comparisons
[screenshot: steps 1–3]

The t-test
[screenshot: steps 1–3]

The t-test dialogue box
[screenshot: steps 1–3, one item marked “optional”]

The output
    . ttest bmi, by(sex) unequal

    Two-sample t test with unequal variances

       Group |    Obs       Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+-----------------------------------------------------------------
         Men |   5004   26.20382   .0484566    3.427767    26.10882   26.29881
       Women |   6571   25.62873   .0559382    4.534443    25.51908   25.73839
    ---------+-----------------------------------------------------------------
    combined |  11575   25.87735   .0381332     4.10264     25.8026    25.9521
    ---------+-----------------------------------------------------------------
        diff |          .5750831   .0740075                .4300158   .7201504

        diff = mean(Men) - mean(Women)                          t =  7.7706
    Ho: diff = 0            Satterthwaite's degrees of freedom = 11572.4

        Ha: diff < 0            Ha: diff != 0              Ha: diff > 0
     Pr(T < t) = 1.0000    Pr(|T| > |t|) = 0.0000     Pr(T > t) = 0.0000

Two group tests for proportions..
[screenshot: steps 1–3]

You can also do this with tab
    tab obese sex, exact

               |          sex
         obese |       Men      Women |     Total
    -----------+----------------------+----------
     Not Obese |     4,405      5,610 |    10,015
         Obese |       599        961 |     1,560
    -----------+----------------------+----------
         Total |     5,004      6,571 |    11,575

               Fisher's exact =                 0.000
       1-sided Fisher's exact =                 0.000

Your Final Task for Today
• Create a “do file” that …
  – Reads in the data
  – Recodes BMI to a categorical variable for obesity
  – Tests whether obesity differs between men and women
• Create a log file to store the results
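
One possible shape for the final task is sketched below. It only strings together commands already used in this session (use, generate, recode, tab with the exact option) plus Stata's log, label define and label values commands. The file names "pgme_data.dta" and "session2_obesity.log" and the label name obeselbl are placeholders invented for illustration, not names given in the session; substitute the dataset you saved on your desktop. The sketch assumes the variables bmi and sex exist as in the earlier slides.

```stata
* Final task sketch: does obesity differ between men and women?
* ASSUMPTION: "session2_obesity.log" is a hypothetical log file name.
* ASSUMPTION: "pgme_data.dta" is a placeholder; use the dataset you downloaded.

* Start a log file so the results are stored
log using "session2_obesity.log", replace text

* Read in the data
use "pgme_data.dta", clear

* Recode BMI to a categorical variable for obesity (obese = BMI > 30)
generate obese = bmi
recode obese 0/30 = 0 30.01/100 = 1

* Make a value label and attach it (label name is illustrative)
label define obeselbl 0 "Not Obese" 1 "Obese"
label values obese obeselbl

* Check the recode, showing missing values as well
tab obese, missing

* Test whether obesity differs by sex (Fisher's exact test),
* with column percentages to compare the two groups
tab obese sex, exact column

* Close the log file
log close
```

Starting the log inside the do file, rather than from the menu, means every run of the do file produces a fresh record of the output; the replace option overwrites the previous log, so drop it if you want Stata to stop rather than overwrite an existing file.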