Introduction to Genome Research
Bioinformatics Research Center, Institute of Biomedical Sciences, ACADEMIA SINICA
莊樹諄
www.sinica.edu.tw/~trees/bioinformatics
E-mail: [email protected]
90/4/9 pm (9 April 2001), 中研院生物資訊中心 (Bioinformatics Research Center, Academia Sinica; BRC)

Outline
- Introduction
- Some Research Topics
- Related Links and Resources
- Bioinformatics Research Center (BRC)

Chromosome
[Figure: chromosome]

Introduction
[Figure: from DNA sequence to function. A gene (5'→3') consists of exons (coding regions) and introns; the mRNA carries a 5' UTR, the ORF and a 3' UTR; cDNA is complementary DNA; DNA → RNA → protein → function.]

Introduction
A DNA nucleotide consists of phosphoric acid, deoxyribose and a nitrogenous base.
Nitrogenous bases:
- Purines: Adenine (A), Guanine (G)
- Pyrimidines: Cytosine (C), Thymine (T)
DNA sequence: A, C, G, T (4 letters)
RNA sequence: A, C, G, U (uracil, U, replaces T) (4 letters)

Double-stranded DNA
5' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 3'
3' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 5'
Codons: ACC GTG TGG CAG TGC ACA GGT ATT TGG CCA TAG ACA
4^3 = 64 codons encode 20 amino acids (plus stop signals).

Introduction
DNA sequence: A, C, G, T (4 letters)
RNA sequence: A, C, G, U (4 letters)
Amino acid sequence: 20 letters

Genetic code (rows: first position, 5'; columns: second position; within each cell, the third position (3') is U, C, A, G in that order):
First \ Second | U | C | A | G
U | Phe(F) Phe(F) Leu(L) Leu(L) | Ser(S) ×4 | Tyr(Y) Tyr(Y) Stop Stop | Cys(C) Cys(C) Stop Trp(W)
C | Leu(L) ×4 | Pro(P) ×4 | His(H) His(H) Gln(Q) Gln(Q) | Arg(R) ×4
A | Ile(I) Ile(I) Ile(I) Met(M) | Thr(T) ×4 | Asn(N) Asn(N) Lys(K) Lys(K) | Ser(S) Ser(S) Arg(R) Arg(R)
G | Val(V) ×4 | Ala(A) ×4 | Asp(D) Asp(D) Glu(E) Glu(E) | Gly(G) ×4

Introduction: 6-frame translations ("-" marks a stop codon)
5'3' Frame 1: aagctgatcgatcgattttagatagagaaaaaact  K L I D R F - I E K K
5'3' Frame 2: aagctgatcgatcgattttagatagagaaaaaact  S - S I D F R - R K N
5'3' Frame 3:
aagctgatcgatcgattttagatagagaaaaaact  A D R S I L D R E K T
3'5' Frame 1: agttttttctctatctaaaatcgatcgatcagctt  S F F S I - N R S I S
3'5' Frame 2: agttttttctctatctaaaatcgatcgatcagctt  V F S L S K I D R S A
3'5' Frame 3: agttttttctctatctaaaatcgatcgatcagctt  F F L Y L K S I D Q L

Introduction
Gene: exons and introns
cDNA databases:
- EST (Expressed Sequence Tags) DB
- HGI (Human Gene Index) DB
- UniGene DB

Introduction: Human Genome Sequencing (as of 2/11/2001)
- Draft: 61.0%
- Finished: 32.5%
- Total: 93.5%

Chromosome gap
[Figure: chromosome gaps]

Introduction: Genome database (~3×10^9 bp)
HTGS (High Throughput Genomic Sequences)
- Phase 0: single- or few-pass reads of a single clone (not contigs).
- Phase 1: unfinished; contigs may be unordered and unoriented, with gaps.
- Phase 2: unfinished; ordered, oriented contigs, with or without gaps.
- Phase 3: finished, no gaps (with or without annotations).

Introduction
Size range (kb) | Contigs | Aggregate size (kb) | Percent of total
<30 | 44 | 666 | 0.1%
30–100 | 479 | 32,172 | 4.9%
100–250 | 1,628 | 260,933 | 39.9%
250–500 | 421 | 144,518 | 22.1%
500–1000 | 145 | 98,623 | 15.1%
>1000 | 43 | 116,557 | 17.8%
Total | 2,760 | 653,471 | 100.0%

Outline: Some Research Topics

Number of human genes
- Early estimate: 60,000–100,000
- From Ch22: ~45,000
- From ESTs: ~140,000
- From Ch22 and HGI-5.0: ~120,000 (adjusted for Ch22 being 1.38-fold gene-rich, after an extensive cleaning and assembly process)
- Science, 2/16/2001: ~30,000
There are many more genes awaiting discovery within the sequence.

Some Research Topics
- Alternative Splicing
- Genome Annotation
- Gene Signature
- Human Diversity

Genomic Sequence Variations
Human genome: 3×10^9 bp
Single Nucleotide Polymorphisms (SNPs): 10^6–10^7 (gSNP)
- Gene, coding region: cSNP
- Gene, non-coding region: rSNP, iSNP
- Intergenic region: nSNP
Functional variants: ~5%
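The 6-frame translation slide and the cSNP classification on the slides that follow both reduce to codon lookups. The sketch below is a minimal Python illustration, not the BRC's own software. It assumes the standard genetic code (the table above) written as a 64-letter string in T, C, A, G order and marks stops with "-" as the slide does. The polar/nonpolar grouping (`POLAR`) is an assumption chosen to agree with the slide's labels for C, Y and W; textbooks differ on residues such as Cys and Gly. All function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES, PYRIMIDINES = set("AG"), set("CT")
# Assumption: grouping chosen to agree with the slide (C and Y polar, W nonpolar).
POLAR = set("STCYNQDEKRH")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, so the result reads 5'->3' on the other strand."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def translate(seq: str, stop: str = "-") -> str:
    """Translate complete codons from position 0; stops become `stop`, unknown codons 'X'."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        protein.append(stop if aa == "*" else aa)
    return "".join(protein)


def six_frame(seq: str) -> list[str]:
    """Return the 5'3' frames 1-3, then the 3'5' frames 1-3 (reverse complement)."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    return [translate(seq[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


def classify_csnp(codon: str, position: int, alt: str) -> tuple[str, str, str, str]:
    """Classify a single-base change at `position` (1-3) of `codon`."""
    codon, alt = codon.upper().replace("U", "T"), alt.upper().replace("U", "T")
    ref_base = codon[position - 1]
    if alt == ref_base:
        raise ValueError("alt must differ from the reference base")
    mutant = codon[:position - 1] + alt + codon[position:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif "*" in (ref_aa, alt_aa):
        effect = "non-synonymous, stop gained or lost"
    elif (ref_aa in POLAR) == (alt_aa in POLAR):
        effect = "non-synonymous, conservative"
    else:
        effect = "non-synonymous, non-conservative"
    # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
    same_class = {ref_base, alt} <= PURINES or {ref_base, alt} <= PYRIMIDINES
    return ref_aa, alt_aa, effect, "transition" if same_class else "transversion"


if __name__ == "__main__":
    labels = [f"{strand} Frame {n}" for strand in ("5'3'", "3'5'") for n in (1, 2, 3)]
    for label, protein in zip(labels, six_frame("aagctgatcgatcgattttagatagagaaaaaact")):
        print(label, " ".join(protein))
    print(classify_csnp("tgt", 3, "g"))
```

Run as a script, it prints the six frames shown on the slide, for example `5'3' Frame 1 K L I D R F - I E K K`, followed by `('C', 'W', 'non-synonymous, non-conservative', 'transversion')` for the tgt → tgg change.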
Gene-based SNPs
[Figure: Gene 1 and Gene 2 with labels P1, P2, exon and intron, marking cSNP, iSNP, rSNP and nSNP positions]

Human Diversity: SNP (Single Nucleotide Polymorphism), cSNP (coding SNP)
Example sequence: acccgctcgtcgct tgt cggctaattgcgcgaat (codon tgt = Cys, C)
- Synonymous (silent): tgt → tgc, still Cys (C)
- Non-synonymous, conservative: tgt → tat, Cys (C, polar) → Tyr (Y, polar)
- Non-synonymous, non-conservative: tgt → tgg, Cys (C, polar) → Trp (W, nonpolar)

Human Diversity: SNP (Single Nucleotide Polymorphism), cSNP (coding SNP)
Purines (A/G) and pyrimidines (C/T)
- Transition: A ↔ G or C ↔ T (purine to purine, or pyrimidine to pyrimidine)
- Transversion: A/G ↔ C/T (purine to pyrimidine, or vice versa)
CD-CV: common diseases, common variants (the hypothesis that common diseases are largely caused by common variants).

Pseudogene
Ch22: 134 pseudogenes (134 of 679, ≈19.7%)
- Processed pseudogenes (derived from cDNA; 82% of the 134 pseudogenes):
  a) a single block
  b) lack the characteristic intron–exon structure
- Spliced pseudogenes: segments of duplicated gene families

Repetitive Sequence
- Centromere, telomere
- Tandem repeats
  - Minisatellites (Variable Number Tandem Repeats, VNTRs): 15–100 bp repeat unit
  - Microsatellites (Short Tandem Repeats, STRs): 2–5 bp repeat unit
  - α-satellite: at centromeres
  - Telomere repeats
- Interspersed repeats
  - SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR
  - LINEs (Long Interspersed Elements): LINE1, LINE2

Outline: Related Links and Resources

Related Links and Resources
- TIGR (The Institute for Genomic Research): http://www.tigr.org/
- NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
- Sanger (Ensembl): http://www.ensembl.org/
- Japan Science and Technology Corporation, Advanced Lifescience Information System (JST-ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl

Related Links and Resources
- Gene prediction programs:
  http://www.bork.embl-heidelberg.de/genepredict.html
  http://linkage.rockefeller.edu/wli/gene/programs.html
- ExPASy Translate tool: http://expasy.nhri.org.tw/tools/dna.html
- Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html

Outline: Bioinformatics Research Center (BRC)

[Figure: BRC network layout: firewall, local server, Lab. 1, Lab. 2, Lab. 3]

CRASA: Complexity Reduction Algorithm for Sequence Analysis
Applications: genome annotation, alternative splicing, SNPs (Single Nucleotide Polymorphisms), cDNA database
Genome sequences: chromosomes 1–22, X, Y

CRASA: Complexity Reduction Algorithm for Sequence Analysis
Environment
- PC cluster: 10 PCs (Pentium III, 667 MHz) and 1 server
- OS: Windows 2000 (NT)
- Hard disks: IDE with RAID support
- Database: DB2
Algorithm
- Progressive processing: pyramid structure
- Pattern match
- Direct search
- Parallel processing

Parallel Processing
- Sorting and assembling: CPU-bound
- Server network: I/O-bound
- Hard disk: I/O-bound
[Figure: a query distributed across processors p1, p2, p3]

Bioinformatics: Computer Science and Biology
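The Parallel Processing slide shows one query spread across processors p1, p2 and p3, but the deck does not say how CRASA divides the work. The sketch below is therefore a generic illustration, not CRASA itself (its pyramid-structured progressive processing is not modeled). It assumes the genome is cut into overlapping chunks and each worker runs a direct exact-match search on its chunk. It uses threads from Python's standard `concurrent.futures` so it runs anywhere; because of the GIL, real CPU parallelism would need `ProcessPoolExecutor` or separate machines such as the 10-PC cluster above. `find_all`, `split_with_overlap` and `parallel_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(text: str, pattern: str, offset: int = 0) -> list[int]:
    """Direct (exact) search: every start position of pattern in text, shifted by offset."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(offset + start)
        start = text.find(pattern, start + 1)  # +1 so overlapping matches are kept
    return hits


def split_with_overlap(seq: str, n_parts: int, overlap: int) -> list[tuple[int, str]]:
    """Cut seq into about n_parts chunks; each chunk also carries the next `overlap` bases."""
    size = max(1, -(-len(seq) // n_parts))  # ceiling division
    return [(start, seq[start:start + size + overlap]) for start in range(0, len(seq), size)]


def parallel_search(seq: str, pattern: str, n_workers: int = 3) -> list[int]:
    """Search each chunk in its own worker (p1, p2, p3, ...) and merge the hit positions."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    seq, pattern = seq.upper(), pattern.upper()
    # An overlap of len(pattern) - 1 means a match starting inside a chunk's own region
    # is always complete in that chunk, and no match is found by two chunks.
    chunks = split_with_overlap(seq, n_workers, len(pattern) - 1)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = pool.map(lambda chunk: find_all(chunk[1], pattern, chunk[0]), chunks)
        hits = [pos for chunk_hits in per_chunk for pos in chunk_hits]
    return sorted(hits)
```

For example, `parallel_search("aagctgatcgatcgattttagatagagaaaaaact", "gat")` returns `[5, 9, 13, 20]` (0-based), the same positions a single-pass `find_all` reports. The overlap is the key design choice: without it, a match straddling a chunk boundary (here the one at position 13) would be missed.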