Computational Analysis of Transcript Identification Using GenBank
Slides by Terry Clark

Differentiation of hematopoietic cells
[Figure: pluripotent stem cell giving rise to myeloid cells (neutrophil, monocyte, erythrocyte, platelet, eosinophil, basophil) and lymphoid cells (B cell, T cell)]

Genome-wide gene expression
SAGE (Serial Analysis of Gene Expression)
Figure 1. Schematic illustration of the SAGE process. Jes Stollberg et al. Genome Res. 2000; 10: 1241-1248

SAGE & GLGI Overview
[Flowchart. Labels: mRNA; SAGE: quantitative analysis of expressed genes by collecting tags; SPGI: identify most of expressed genes, collect cDNA clones; match GenBank, with outcomes no match, multi-match, and single-match; GLGI: extend tags into longer 3' cDNAs; gene identification]

What is the chance of duplicate tags?
• We can assume we are drawing randomly from the set of all sequences over the 4-letter alphabet of the given tag length
• This is the same problem as having unique overlaps in the contig matching problem for shotgun sequencing

Random Model
The random model does not reflect the biological process
• Genes evolve by duplication as well as point mutation
• Many motifs are repeated
• Function widgets at work?
• The result is a strong bias in observed biological sequences, not a uniform distribution as the simple model hopes.
• Here are some numbers ….

SAGE tags match to many genes (Tags from Hashimoto S, et al. Blood 94:837, 1999)

Tag          Matched gene number   Matched genes (only up to 10 shown)
CCTGTAATCC   405                   Hs.267557,Hs.240615,Hs.231705,Hs.283045,Hs.236713,Hs.232277,Hs.181553,Hs.262716,Hs.181392,Hs.220696
GTGAAACCCC   305                   Hs.282868,Hs.170225,Hs.184220,Hs.194021,Hs.231625,Hs.171830,Hs.270571,Hs.270572,Hs.272193,Hs.283921
CCACTGCACT   174                   Hs.118778,Hs.256868,Hs.96023,Hs.31575,Hs.47517,Hs.200451,Hs.271222,Hs.253240,Hs.270018,Hs.270415
ACTTTTTCAA   44                    Hs.16426,Hs.10669,Hs.75155,Hs.28166,Hs.13975,Hs.79136,Hs.111334,Hs.133430,Hs.79356,Hs.239100
TTGGGGTTTC   9                     Hs.231375,Hs.273127,Hs.275603,Hs.175173,Hs.276612,Hs.224773,Hs.62954,Hs.182771,Hs.276326
TGCACGTTTT   8                     Hs.199160,Hs.279943,Hs.36927,Hs.5338,Hs.169793,Hs.83450,Hs.173902,Hs.183506
TGTGTTGAGA   5                     Hs.284136,Hs.275865,Hs.275221,Hs.274466,Hs.181165
CCCGTCCGGA   5                     Hs.276353,Hs.277498,Hs.277573,Hs.276350,Hs.180842
TTGGTCCTCT   4                     Hs.12328,Hs.108124,Hs.9739,Hs.112845
CTGACCTGTG   3                     Hs.277477,Hs.181244,Hs.77961
TACCTGCAGA   3                     Hs.100000,Hs.256957,Hs.253884
AGGCTACGGA   3                     Hs.119122,Hs.211582,Hs.183297
GGGCTGGGGT   3                     Hs.183698,Hs.118757,Hs.90436
CCCTGGGTTC   2                     Hs.52891,Hs.111334
CACAAACGGT   2                     Hs.2043,Hs.195453
GTGAAGGCAG   2                     Hs.4221,Hs.77039
GGGCATCTCT   2                     Hs.75061,Hs.76807
ATGGCTGGTA   2                     Hs.254246,Hs.182426
CGCCGCCGGC   2                     Hs.182825,Hs.132753
AGGGCTTCCA   2                     Hs.29797,Hs.276544
TTGGTGAAGG   2                     Hs.278674,Hs.75968
GTGGCCACGG   1                     Hs.112405
GTTCACATTA   1                     Hs.84298
TGGTGTTGAG   1                     Hs.275865
CCCATCGTCC   1                     Hs.151604
GTTGTGGTTA   1                     Hs.75415
TTGTAATCGT   1                     Hs.125078
CCCACAACCT   1                     Hs.252136
GAGGGAGTTT   1                     Hs.76064
CCAGAACAGA   1                     Hs.111222

Tag Frequency Groups for 10-base Tag Set Containing 878,938 Tags for UniGene Human

Unique Tags among 878,938 EST Derived Tags

Unique Tags among 32,851 Gene Derived Tags

Converting tag into longer 3' sequence
[Diagram. Labels: 5' end; 3' end; SAGE tag; 3' longer sequence]

Generation of Longer 3' cDNA for Gene Identification (GLGI)
[Diagram. Labels: SAGE tag; NNNNNNNNNN / nnnnnnnnnn (10 bases); sense extension; antisense extension; TAAAAAAAAAAACTCGCCGGCGAA / ATTTTTTTTTTTGAGCGGCCGCTT; TGAGCGGCCGCTT; hundred bases]

UniGene Human 3' Part Length Distribution

Myeloid Tag Matches with UniGene Human SAGE Tag Reference Database

SAGE Tag Processing with GIST
k-mer tree

GIST Performance with Improved IO

Conspirators
Terry Clark
Andrew Huntwork
Josef Jurek
L. Ridgway Scott
Sanggyu Lee
Janet D. Rowley
San Ming Wang
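
Supplementary sketch: the question "What is the chance of duplicate tags?" can be worked through numerically under the simple random model the slides describe, where every tag is drawn uniformly from all sequences over the 4-letter alphabet. The function names below are hypothetical and are not part of GIST or any tool named in the slides. Using the 32,851 gene-derived tag count as the input is an assumption made for illustration only; the slides stress that real sequences are strongly biased, so the uniform model should understate duplication.

```python
ALPHABET_SIZE = 4  # A, C, G, T


def tag_space(tag_length: int) -> int:
    """Number of distinct tags of the given length over the 4-letter alphabet."""
    return ALPHABET_SIZE ** tag_length


def prob_tag_shared(num_genes: int, tag_length: int) -> float:
    """Probability that one gene's tag is also the tag of at least one
    other gene, assuming each gene's tag is drawn uniformly at random."""
    n_space = tag_space(tag_length)
    # Each of the other (num_genes - 1) genes misses this tag
    # independently with probability (1 - 1/n_space).
    return 1.0 - (1.0 - 1.0 / n_space) ** (num_genes - 1)


def expected_distinct_tags(num_genes: int, tag_length: int) -> float:
    """Expected number of distinct tags seen after num_genes uniform draws."""
    n_space = tag_space(tag_length)
    # A given tag is hit at least once with probability 1 - (1 - 1/n_space)^n;
    # summing over all n_space tags gives the expectation.
    return n_space * (1.0 - (1.0 - 1.0 / n_space) ** num_genes)


if __name__ == "__main__":
    n = 32_851  # illustrative input: gene-derived tag count quoted in the slides
    for length in (10, 12, 14):
        print(length, tag_space(length),
              prob_tag_shared(n, length),
              expected_distinct_tags(n, length))
```

Comparing these uniform-model values with the observed match counts in the "SAGE tags match to many genes" table is one way to see how far biological sequence data departs from the random model.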