M.E.T.U. STATISTICS
FALL 2011-2012
Dr. Ozlem Ilk
STAT 462 - Case Studies

For each of the following datasets, read the data description and decide what type of data it is, which complexities arise (missing data, correlation structure, the response and covariate types, ...), what the possible questions of interest are, and which approaches and analyses are appropriate for the data. Then pick two of the datasets and carry out some analyses. For the advanced topics we covered in class, you don’t need to do any confirmatory analysis, but you can investigate the data in simple ways through exploratory data analysis. Keep an informal report of your findings and answers.

DATASET 1: (Data is given in breastcancer.txt)

A data set on breast cancer in women was constructed from a web repository at the National Cancer Institute. Only females diagnosed between ages 50 and 69 between 1990 and 1999 were included. These ages represent the highest incidence of breast cancer.

AGE AT DIAGNOSIS: 1 = 50-59, 2 = 60-69
RACE: 1 = White, 2 = Black
RADIATION: 1 = Yes, 2 = No
CHEMOTHERAPY: 1 = Yes, 2 = No
HORMONE: 1 = Yes, 2 = No
NODAL STATUS: 1 = Positive, 2 = Negative
STAGE OF THE DISEASE: 0 = 0, 1 = I, 2 = IIA, 3 = IIB, 4 = IIIA, 5 = IIIB, 6 = IV
HISTOLOGICAL TYPE (1st): 0 = Papillary, 1 = Ductal NOS, 2 = Medullary, 3 = Tubular, 4 = Mucinous, 5 = Cribriform, 6 = Lobular, 7 = Mixed Ductal/lobular, 8 = Other, 9 = Adenoid Cystic
LAST CANCER STATUS: 1 = no evidence of this cancer, 2 = evidence of this cancer, 3 = no evidence of this cancer but another present, 4 = unknown if cancer is present
SIZE OF THE INVASIVE CANCER (cm)
TIME TO RECURRENCE (months)
FOLLOW UP TIME: 1 = 0-12 months, 2 = 13-24, 3 = 25-36, 4 = 37-48, 5 = 49-60, 6 = 61+
VITAL STATUS: 1 = Alive, 2 = Dead
TREATMENT USED: 1 = rad, chem, hor; 2 = rad, chem; 3 = rad, hor; 4 = chem, hor; 5 = chem; 6 = hor; 7 = rad; 8 = none

DATASET 2: (Dataset is given in iyfp.dat)

Iowa Youth and Families Project

Please do not use this data outside this class.
You need written permission from Iowa Research Park for other uses of it.

The goal of this study is to understand the impacts of economic hardship on family members' well-being. The project started in 1989 with 451 Iowa families. Targets were 7th graders in 1989, with two married biological parents and a sibling within four years of age. Subjects were followed between 1989 and 1999.

ID
TOS: target out of school in 1994? 0 = No, 1 = Yes
Gender: 1 = Male, 0 = Female
Household Size
Income: total income of household
Per capita Income
Material Needs: 1 = No problem, ..., 5 = Lots of problems
Ends Meet: 1 = No problem, ..., 5 = Lots of problems
NEE: negative economic events, 0 = No, 1 = Yes
Cutbacks: 0 = None, 1 = Some, 2 = Lots
Concerns: 0 = None, 1 = Some, 2 = Lots
Canxiety: categorized anxiety, 0 = No, 1 = Yes
Manxiety: mean anxiety
Chostility: categorized hostility, 0 = No, 1 = Yes
Mhostility: mean hostility
CDepression: categorized depression, 0 = No, 1 = Yes
Mdepression: mean depression
NLE(adj): adjusted negative life events, 0 = None, 1 = Some, 2 = Lots
Parent: if the parents are separated, with whom the teenager is staying, 0 = mother, 1 = father
Time: year

DATASET 3: (Data is given in lungcancer.txt)

Mayo Clinic Lung Cancer Data: Patients with lung cancer at Mayo Clinic are observed. Performance scores rate how well the patient can perform usual daily activities.

Variables used:
inst: Institution code
time: Survival time in days
status: censoring status, 1 = censored, 2 = dead
age: Age in years
sex: 1 = Male, 2 = Female
ph.ecog: ECOG performance score (0 = good, 5 = dead)
ph.karno: Karnofsky performance score (0 = bad, 100 = good) rated by physician
pat.karno: Karnofsky performance score rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months

DATASET 4: (Data is given in psid.dat)

This is a random sample from the Panel Study of Income Dynamics (PSID). This study was conducted by the Survey Research Center at the University of Michigan.
The goal is to investigate the changes in annual income and the effects of covariates, such as gender, on income.

AGE IN 1968
YEARS OF EDUCATION
GENDER: 0 = Female, 1 = Male
INCOME: natural logarithm of annual income
TIME: year the measurement is taken
ID
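
As one possible starting point for the exploratory analysis of a longitudinal dataset such as Dataset 4, the sketch below averages log income by gender and year and counts how many measurements each subject contributes. The column layout of psid.dat is not described here, so the tuple layout (id, gender, year, log income), the function names, and the handful of records are assumptions made purely for illustration; the values are invented and are not taken from the data file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: one row per person per year.
# These values are made up for illustration; they are NOT from psid.dat.
rows = [
    # (id, gender, year, log_income)   gender: 0 = Female, 1 = Male
    (1, 0, 1968, 8.0),
    (1, 0, 1969, 8.2),
    (2, 1, 1968, 9.0),
    (2, 1, 1969, 9.4),
    (3, 0, 1968, 8.4),
]


def mean_by_group(records):
    """Return {(gender, year): mean log income} for long-format records."""
    groups = defaultdict(list)
    for _subject_id, gender, year, log_income in records:
        groups[(gender, year)].append(log_income)
    return {key: mean(values) for key, values in sorted(groups.items())}


def observations_per_subject(records):
    """Return {id: number of measurements}; unequal counts signal an unbalanced panel."""
    counts = defaultdict(int)
    for subject_id, *_rest in records:
        counts[subject_id] += 1
    return dict(counts)


summary = mean_by_group(rows)
counts = observations_per_subject(rows)

for (gender, year), value in summary.items():
    print(f"gender={gender} year={year} mean log income={value:.2f}")
print("measurements per subject:", counts)
```

Grouped means like these only describe the marginal trend. Because the same subject appears in several years, repeated measurements are correlated, which is one of the complexities the assignment asks you to identify before choosing an analysis.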