
M.E.T.U. STATISTICS
FALL 2011-2012
Dr. Ozlem Ilk
STAT 462 - Case Studies
For each of the following datasets, read the data description and decide what type of
data it is, which complexities arise (missing data, correlation structure, the response and
covariate types, ...), what the possible questions of interest are, and which approaches
and analyses are appropriate for the data. Then pick two of the datasets and carry out
some analyses. For the advanced topics we covered in class, you don't need to do any
confirmatory analysis, but you can investigate the data in simple ways through
exploratory data analysis. Keep an informal report of your findings and answers.
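As a first pass at the questions above (missing data, variable types, correlation structure), a small profiling routine can help. The following is only a minimal sketch: the function name `profile`, the missing-value markers, and the toy rows are all assumptions made for illustration, and the actual files may be delimited and coded differently.

```python
from collections import Counter

# Assumption: missing values are coded by one of these markers; check each file.
MISSING_MARKERS = {"", "NA", "."}

def profile(rows, id_column=None):
    """Summarize a list of row dicts (column name -> raw string value)."""
    report = {}
    for column in rows[0]:
        observed = [r[column] for r in rows if r[column] not in MISSING_MARKERS]
        report[column] = {
            "n_missing": len(rows) - len(observed),
            "n_distinct": len(set(observed)),
        }
    max_rows = None
    if id_column is not None:
        # More than one row per subject signals repeated measures,
        # i.e. observations that are likely correlated within a subject.
        max_rows = max(Counter(r[id_column] for r in rows).values())
    return report, max_rows

# Made-up rows for illustration only; not taken from any course dataset.
toy_rows = [
    {"id": "1", "gender": "0", "income": "9.8"},
    {"id": "1", "gender": "0", "income": "NA"},
    {"id": "2", "gender": "1", "income": "10.1"},
]
report, max_rows_per_subject = profile(toy_rows, id_column="id")
print(report["income"], max_rows_per_subject)
```

A column with few distinct values is probably categorical or ordinal, and more than one row per subject suggests longitudinal data.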
DATASET 1: (Data is given in breastcancer.txt)
A data set on breast cancer in women was constructed from a web repository at the
National Cancer Institute. Only females diagnosed between the ages of 50 and 69
during 1990-1999 were included. These ages represent the highest incidence of
breast cancer.
AGE AT DIAGNOSIS          1 = 50-59
                          2 = 60-69
RACE                      1 = White
                          2 = Black
RADIATION                 1 = Yes
                          2 = No
CHEMOTHERAPY              1 = Yes
                          2 = No
HORMONE                   1 = Yes
                          2 = No
NODAL STATUS              1 = Positive
                          2 = Negative
STAGE OF THE DISEASE      0 = 0
                          1 = I
                          2 = IIA
                          3 = IIB
                          4 = IIIA
                          5 = IIIB
                          6 = IV
HISTOLOGICAL TYPE (1st)   0 = Papillary
                          1 = Ductal NOS
                          2 = Medullary
                          3 = Tubular
                          4 = Mucinous
                          5 = Cribriform
                          6 = Lobular
                          7 = Mixed Ductal/lobular
                          8 = Other
                          9 = Adenoid Cystic
LAST CANCER STATUS        1 = no evidence of this cancer
                          2 = evidence of this cancer
                          3 = no evidence of this cancer but another present
                          4 = unknown if cancer is present
SIZE OF THE INVASIVE CANCER (cm)
TIME TO RECURRENCE (months)
FOLLOW UP TIME            1 = 0-12 months
                          2 = 13-24
                          3 = 25-36
                          4 = 37-48
                          5 = 49-60
                          6 = 61+
VITAL STATUS              1 = Alive
                          2 = Dead
TREATMENT USED            1 = rad, chem, hor
                          2 = rad, chem
                          3 = rad, hor
                          4 = chem, hor
                          5 = chem
                          6 = hor
                          7 = rad
                          8 = none
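The TREATMENT USED codes look like a summary of the RADIATION, CHEMOTHERAPY and HORMONE indicators. If that reading is right (it is an assumption, the description does not say so explicitly), the derived variable can be checked for consistency against the three indicators. The sketch below uses hypothetical lowercase field names and made-up records; it is one possible data-quality check, not a required step.

```python
# Indicator coding from the data description: 1 = Yes, 2 = No.
# Keys are (radiation, chemotherapy, hormone) as booleans.
TREATMENT_CODE = {
    (True, True, True): 1,     # rad, chem, hor
    (True, True, False): 2,    # rad, chem
    (True, False, True): 3,    # rad, hor
    (False, True, True): 4,    # chem, hor
    (False, True, False): 5,   # chem
    (False, False, True): 6,   # hor
    (True, False, False): 7,   # rad
    (False, False, False): 8,  # none
}

def expected_treatment(radiation, chemotherapy, hormone):
    """Treatment code implied by the three 1/2 indicators.

    Any value other than 1 (including a missing code) is treated as "No".
    """
    flags = tuple(code == 1 for code in (radiation, chemotherapy, hormone))
    return TREATMENT_CODE[flags]

def inconsistent_rows(records):
    """Indices of records whose treatment code disagrees with the indicators."""
    return [
        i for i, r in enumerate(records)
        if expected_treatment(r["radiation"], r["chemotherapy"], r["hormone"])
        != r["treatment"]
    ]

# Made-up records for illustration only (field names are hypothetical).
toy_records = [
    {"radiation": 1, "chemotherapy": 2, "hormone": 1, "treatment": 3},
    {"radiation": 2, "chemotherapy": 2, "hormone": 2, "treatment": 7},
]
print(inconsistent_rows(toy_records))
```

Rows flagged this way are worth inspecting by hand before any modelling, since they may reflect coding errors or missing indicator values.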
DATASET 2: (Data is given in iyfp.dat)
Iowa Youth and Families Project
Please do not use this data outside this class. You need written permission from
Iowa Research Park for other uses of it.
The goal of this study is to understand the impact of economic hardship on family
members' well-being. The project started in 1989 with 451 Iowa families. The targets
were 7th graders in 1989 with two married biological parents and a sibling within four
years of age. Subjects were followed between 1989 and 1999.
ID
TOS: target out of school in 1994? 0 - No, 1 - Yes
Gender: 0 - Female, 1 - Male
Household Size
Income: total income of household
Per Capita Income
Material Needs: 1 - No problem, ..., 5 - Lots of problems
Ends Meet: 1 - No problem, ..., 5 - Lots of problems
NEE: negative economic events, 0 - No, 1 - Yes
Cutbacks: 0 - None, 1 - Some, 2 - Lots
Concerns: 0 - None, 1 - Some, 2 - Lots
Canxiety: categorized anxiety, 0 - No, 1 - Yes
Manxiety: mean anxiety
Chostility: categorized hostility, 0 - No, 1 - Yes
Mhostility: mean hostility
CDepression: categorized depression, 0 - No, 1 - Yes
Mdepression: mean depression
NLE(adj): adjusted negative life events, 0 - None, 1 - Some, 2 - Lots
Parent: if the parents are separated, the parent with whom the teenager is staying, 0 - mother, 1 - father
Time: year
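Because each subject is measured in several years, one simple exploratory summary is the proportion of subjects with a binary outcome equal to 1 in each year. The sketch below does this for the categorized anxiety variable. The field names (`id`, `time`, `canxiety`), the use of `None` for a missing value, and the records themselves are assumptions made for illustration.

```python
from collections import defaultdict

def yearly_proportion(records, outcome="canxiety", time="time"):
    """Proportion of observed outcomes equal to 1, separately for each year."""
    counts = defaultdict(lambda: [0, 0])  # year -> [n_yes, n_observed]
    for r in records:
        value = r.get(outcome)
        if value is None:
            continue  # skip missing outcomes rather than counting them as "No"
        counts[r[time]][0] += int(value == 1)
        counts[r[time]][1] += 1
    return {year: yes / n for year, (yes, n) in sorted(counts.items())}

# Made-up records for illustration only; not taken from iyfp.dat.
toy_records = [
    {"id": 1, "time": 1989, "canxiety": 0},
    {"id": 2, "time": 1989, "canxiety": 1},
    {"id": 1, "time": 1990, "canxiety": 1},
    {"id": 2, "time": 1990, "canxiety": None},
]
proportions = yearly_proportion(toy_records)
print(proportions)
```

The number of observed outcomes per year is also worth reporting, since dropout over a ten-year follow-up can change who contributes to each proportion.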
DATASET 3: (Data is given in lungcancer.txt)
Mayo Clinic Lung Cancer Data: Patients with lung cancer at Mayo Clinic are observed.
Performance scores rate how well the patient can perform usual daily activities.
Variables used:
inst: Institution code
time: Survival time in days
status: Censoring status, 1 = censored, 2 = dead
age: Age in years
sex: 1 = Male, 2 = Female
ph.ecog: ECOG performance score (0 = good, 5 = dead)
ph.karno: Karnofsky performance score (0 = bad, 100 = good) rated by physician
pat.karno: Karnofsky performance score rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months
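Since survival time is censored for some patients, a plain average of `time` would be misleading. One standard exploratory summary for data of this kind is the Kaplan-Meier estimate of the survival function. The sketch below is a bare-bones version using the status coding given above (2 = dead is the event, 1 = censored). The function name and the toy values are assumptions for illustration, and in practice a statistical package would also give confidence limits.

```python
def kaplan_meier(times, statuses, event_code=2):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0  # everyone observed at time t (deaths and censored)
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1] == event_code)
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Made-up survival times (days) and statuses; not taken from lungcancer.txt.
toy_times = [5, 8, 8, 12, 20]
toy_status = [2, 2, 1, 2, 1]
curve = kaplan_meier(toy_times, toy_status)
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function separately by sex or by ECOG score gives a quick visual comparison of groups before any formal testing.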
DATASET 4: (Data is given in psid.dat)
This is a random sample from the Panel Study of Income Dynamics (PSID). The study
was conducted by the Survey Research Center at the University of Michigan. The goal is
to investigate changes in annual income and the effects of covariates, such as gender,
on income.
AGE IN 1968
YEARS OF EDUCATION
GENDER: 0 - Female, 1 - Male
INCOME: natural logarithm of annual income
TIME: year the measurement is taken
ID
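Because INCOME is recorded on the natural log scale, the slope of a subject's log income over time is roughly that subject's proportional income change per year. One simple way to explore change over time is to fit a separate least-squares line for each subject and look at the distribution of slopes. The sketch below assumes hypothetical field names (`id`, `time`, `income`) and made-up values. It is an exploratory device, not a substitute for a model that accounts for within-subject correlation.

```python
from collections import defaultdict

def subject_slopes(records):
    """Least-squares slope of log income on year, one per subject."""
    by_id = defaultdict(list)
    for r in records:
        by_id[r["id"]].append((r["time"], r["income"]))
    slopes = {}
    for subject, points in by_id.items():
        n = len(points)
        if n < 2:
            continue  # a single observation carries no information on change
        mean_t = sum(t for t, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in points)
        if sxx == 0:
            continue  # all measurements taken in the same year
        sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
        slopes[subject] = sxy / sxx
    return slopes

# Made-up log incomes for illustration only; not taken from psid.dat.
toy_records = [
    {"id": 1, "time": 1968, "income": 9.0},
    {"id": 1, "time": 1969, "income": 9.1},
    {"id": 1, "time": 1970, "income": 9.2},
    {"id": 2, "time": 1968, "income": 8.5},
    {"id": 2, "time": 1970, "income": 8.9},
    {"id": 3, "time": 1968, "income": 9.5},
]
slopes = subject_slopes(toy_records)
print({k: round(v, 3) for k, v in slopes.items()})
```

Comparing the slopes by gender, or plotting them against years of education, is a quick informal check on which covariates might matter.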