10/2/15

Tools and Algorithms in Bioinformatics
GCBA815, Fall 2015
Week-6
Working with TCGA Data

Kristin Wipfler, PhD Candidate (Guda lab)
Department of Genetics, Cell Biology & Anatomy
University of Nebraska Medical Center

What is TCGA?
● A joint effort of the National Cancer Institute and the National Human Genome Research Institute
● The goal is to accelerate our understanding of the molecular basis of cancer
● Has a wide variety of array-based and next generation sequencing data types available from a variety of platforms
● Freely available

Cancer                                                             # of cases
Acute myeloid leukemia                                             200
Adrenocortical carcinoma                                           80
Bladder urothelial carcinoma                                       412
Brain lower grade glioma                                           516
Breast invasive carcinoma                                          1098
Cervical squamous cell carcinoma & endocervical adenocarcinoma     308
Cholangiocarcinoma                                                 36
Colon adenocarcinoma                                               461
Esophageal carcinoma                                               185
Glioblastoma                                                       528
Head and neck squamous cell carcinoma                              528
Kidney chromophobe                                                 66
Kidney renal clear cell carcinoma                                  536
Kidney renal papillary cell carcinoma                              291
Liver hepatocellular carcinoma                                     291
Lung adenocarcinoma                                                521
Lung squamous cell carcinoma                                       504
Lymphoid neoplasm diffuse large B-cell lymphoma                    48
Mesothelioma                                                       87
Ovarian serous cystadenocarcinoma                                  586
Pancreatic adenocarcinoma                                          185
Pheochromocytoma and paraganglioma                                 179
Prostate adenocarcinoma                                            498
Rectum adenocarcinoma                                              171
Sarcoma                                                            261
Skin cutaneous melanoma                                            470
Stomach adenocarcinoma                                             443
Testicular germ cell tumors                                        150
Thymoma                                                            124
Thyroid carcinoma                                                  507
Uterine carcinosarcoma                                             57
Uterine corpus endometrial carcinoma                               548
Uveal melanoma                                                     80

Data Types
● Clinical data
● DNA sequencing
● miRNA sequencing
● Protein expression
● mRNA sequencing
● Total RNA sequencing
● Array-based expression
● DNA methylation
● Copy number

Clinical
● Collected by the Biospecimen Core Resource
● Biotab format
  ▪ tab-delimited
  ▪ convenient
  ▪ easy to sort and manipulate
● Includes a wide variety of clinical information
  ▪ demographics
  ▪ drug treatment
  ▪ radiation treatment

DNA Sequencing
● Whole genome .bam files (controlled access)
● Exome sequencing .bam files (controlled access)
● Somatic mutations .maf files (open access) and .vcf files (controlled access)

RNA Sequencing
● miRNAseq
● mRNAseq
● Total RNAseq
● Each one includes .bam files (controlled access) and .txt files including calculated expression signals (open access)

Protein Expression
● MD Anderson Reverse Phase Protein Array (RPPA) Core Facility
  ▪ High throughput assay
  ▪ Antibodies printed across a slide
  ▪ Quantifies the amount of protein in multiple samples simultaneously
● About 150 proteins assessed
● .txt files containing normalized protein expression for each gene, per sample

Array-based Expression
● Gene expression (DNA microarray)
● miRNA
● Includes
  ▪ Raw signals per probe
  ▪ Normalized signals per probe
  ▪ Expression calls for genes, per sample

DNA Methylation
● Bisulfite sequencing
  ▪ .bam and .vcf files (controlled access)
  ▪ whole genome methylation calls .bed files (open access)
● Array-based
  ▪ .idat files with raw signal intensities
  ▪ .txt files with beta values

Copy Number
● SNP array
  ▪ raw data .CEL files (controlled access)
  ▪ .txt files with normalized copy number data
● CN array
  ▪ .txt files with raw signals per probe
  ▪ .txt files with copy number alterations for aggregated regions per sample

How do you download it?
● Data matrix
● Bulk download
● HTTP directories
● File search
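
Once a clinical file has been downloaded, the "easy to sort and manipulate" point from the Clinical slide can be tried out directly. The following is a minimal sketch, assuming a Biotab file is a plain tab-delimited table with one header row. The column names (patient_id, gender, age_at_diagnosis) and the three records are made up for illustration and are not real TCGA fields or patients. Real Biotab files may carry extra header rows that would need to be skipped first.

```python
import csv
import io

# Made-up stand-in for a downloaded Biotab clinical file. The column names
# and values are illustrative assumptions, not real TCGA records.
SAMPLE_BIOTAB = (
    "patient_id\tgender\tage_at_diagnosis\n"
    "P-001\tFEMALE\t61\n"
    "P-002\tMALE\t47\n"
    "P-003\tFEMALE\t55\n"
)


def read_biotab(handle):
    """Read a tab-delimited table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def sort_by_numeric(rows, column):
    """Return the rows sorted ascending by a column that holds integer text."""
    return sorted(rows, key=lambda row: int(row[column]))


# io.StringIO stands in for open("some_clinical_file.txt", newline="").
rows = read_biotab(io.StringIO(SAMPLE_BIOTAB))

# Sort patients by age and pick out a subgroup by a demographic field.
by_age = sort_by_numeric(rows, "age_at_diagnosis")
females = [row["patient_id"] for row in rows if row["gender"] == "FEMALE"]

for row in by_age:
    print(row["patient_id"], row["gender"], row["age_at_diagnosis"])
```

To run this on a real file, replace the io.StringIO object with an opened file handle and substitute the column names that actually appear in that file's header.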