Bioinformatics & Biology
Andreas Gisel
IITA – Bioscience & Bioinformatics
A member of CGIAR consortium
www.iita.org

Andreas Gisel
Bioinformatics specialist at
- IITA
- Institute for Biomedical Technologies, Italy
- Novartis SA, pharmaceutical company, Switzerland
Trained as Molecular Biologist
- Novartis SA, pharmaceutical company, Switzerland
- University of California, Berkeley
- Federal Institute of Technology, Switzerland

Bioinformatics - definition
Bio – Biology, Life Sciences
Informatics – computational sciences
[Diagram labels: DATA, INTERPRETATIONS, RESULTS, Knowledge, Data Repositories]

Basic Biology
Life
The condition that distinguishes organisms from inorganic objects and dead organisms, being manifested by
- growth through metabolism,
- reproduction,
- and the power of adaptation to environment through changes originating internally.

Basic Biology
Cell
The cell (from Latin cella, meaning "small room") is the basic structural, functional, and biological unit of all known living organisms. A cell is the smallest unit of life that can replicate independently, and cells are often called the "building blocks of life".

Basic Biology
- DNA (deoxyribonucleic acid), carrier of information
- RNA (ribonucleic acid), transporter of information
- Protein, functional unit orchestrating cell functionality and cell division

Basic Biology - DNA
DNA (deoxyribonucleic acid), carrier of information
- Up to hundreds of millions of nucleotides
- 4 nucleotides: Guanine, Cytosine, Adenine, Thymine

Basic Biology - RNA
RNA (ribonucleic acid), transporter of information
- 4 nucleotides: Guanine, Cytosine, Adenine, Uracil
- RNA folding
- Alternative splicing

Basic Biology - Protein
Protein, functional unit orchestrating cell functionality and cell division
- 20 amino acids

Basic Biology – Genetic Code
How to translate the 4-letter code of the RNA into a 20-letter protein code?
It requires a minimum of three nucleotides to "spell out" one amino acid, because two nucleotides give only 4 × 4 = 16 combinations, fewer than the 20 amino acids. Three is indeed the number that is actually used. Any single set of three nucleotides is called a codon, and the set of all possible three-nucleotide combinations is called "the genetic code" or "triplet code." There are sixty-four different combinations or codons (4 × 4 × 4 = 64).
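
The codon arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a production translator: it assumes the standard genetic code (the compact 64-letter amino-acid string follows the conventional U, C, A, G ordering), and the names CODON_TABLE and translate are hypothetical helpers chosen for this example.

```python
from itertools import product

# RNA bases in the conventional U, C, A, G ordering.
BASES = "UCAG"

# Standard genetic code, one letter per codon in the order
# UUU, UUC, UUA, UUG, UCU, ... GGG ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4 x 4 x 4 = 64 codons mapped to their amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                 # number of codons
print(len(set(AMINO_ACIDS) - {"*"}))    # number of distinct amino acids
print(translate("AUGUCUGGGCUU"))        # first four codons of an example RNA
```

The table shows why the code is redundant: 64 codons cover only 20 amino acids plus the stop signal, so most amino acids are encoded by more than one codon.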

Basic Biology - Systems
[Figure slides; images not preserved]

Basic Biology - Sequencing
Sequencing is the process of determining the precise order of nucleotides within a DNA molecule or the order of amino acids in a protein molecule.
- First fully sequenced bio-sequence: amino acid sequence of insulin (51 aa), 1955
- First fully sequenced nucleic acid: tRNA (75 nt), 1965
- First DNA: Bacteriophage (5375 nt), 1977
- DNA sequencing
  - Sanger sequencing technology (1975)
  - Pyrosequencing (Next Generation Sequencing, 2004)

DNA
ATGTCTGGGCTTGTGGGCTTGCTTGTGGGTCTTGTGCTGGTGGGTT
CTGTTAGCTCAGCGAAATTCGATGAGCTATTTCAACCCGGCTGGGC
RNA
AUGUCUGGGCUUGUGGGCUUGCUUGUGGGUCUUGUGCUGGUGGGUU
CUGUUAGCUCAGCGAAAUUCGAUGAGCUAUUUCAACCCGGCUGGGC
Protein
MVGMDLFKCVMMIMVLVVSCGEAVSGAKFDELYRSSWAMDHCVNEG
EVTKLKLDNYSGAGFESRSKYLFGKVSIQIKLVEGDSAGTVTAFYM
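
The DNA and RNA example sequences on the sequencing slide differ only in that every T is written as U. A minimal Python sketch, assuming the DNA shown is the coding strand (so that transcription reduces to a T-to-U substitution; the function name transcribe is a hypothetical helper for this example), checks this for the first line of each sequence:

```python
def transcribe(dna: str) -> str:
    """Return the RNA spelling of a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


# First line of the DNA and RNA examples from the sequencing slide.
SLIDE_DNA = "ATGTCTGGGCTTGTGGGCTTGCTTGTGGGTCTTGTGCTGGTGGGTT"
SLIDE_RNA = "AUGUCUGGGCUUGUGGGCUUGCUUGUGGGUCUUGUGCUGGUGGGUU"

matches = transcribe(SLIDE_DNA) == SLIDE_RNA
print(matches)  # True when the RNA line is the T -> U spelling of the DNA line
```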

Biological Data
- Descriptions
- Pictures
- Sequences
  - Protein
  - RNA
  - DNA
- Structures
  - Protein
  - RNA
- Interactions
- Expressions

Data Explosion
- Sequences: NGS (Next Generation Sequencing), up to 600'000'000'000 bases (600 gigabases) per experiment
- Expressions: up to 1 million data points per experiment

Data Analysis – DNA/RNA sequences
Sequence without knowledge connected to it is meaningless!
What to do?
- Sequence similarity
- Finding genes and regulatory elements
- Functional analysis of genes
- Homology
- Polymorphism

Data Analysis
So we need bioinformatics tools and reference data
- Hardware – computing infrastructure (CPU, RAM, storage)
- Tools – programs that process your data
- Reference data – databases for existing data
- Internet – connection to external databases

Bioinformatics @ IITA
Hardware – computing infrastructure (CPU, RAM, storage)
HP Blade, with:
- 3 blades, each with two 16-core processors (AMD Opteron Processor 6272)
- 384 GB RAM
- 2 TB direct-attached storage (DAS)
- 8 TB network-attached storage (NAS)
The operating system is Ubuntu 14.04.1 LTS, installed via Bio-Linux 8.

Data Analysis – Data Format
- FASTA
- GenBank
- EMBL

Data Analysis - Tools
- Sequence similarity
  - Pairwise sequence alignment
    - BLAST (Basic Local Alignment Search Tool), Needleman-Wunsch, Smith-Waterman
      http://blast.ncbi.nlm.nih.gov/Blast.cgi
      http://www.ebi.ac.uk/ena/search/#Search
  - Multiple sequence alignment
    - ClustalW, T-Coffee, MUSCLE
- Sequence analysis
  - EMBOSS (http://emboss.sourceforge.net/what/), UGENE

Data Analysis - Databases
- Sequence databases
  - DNA
    - EMBL - ENA – Ensembl (http://www.ebi.ac.uk/ena)
    - NCBI – Entrez – GEO – UCSC (http://www.ncbi.nlm.nih.gov/nucleotide/)
    - DDBJ (http://www.ddbj.nig.ac.jp/)
    - Ensembl (Plant)
    - Phytozome
  - Protein
    - UniProt, SwissProt, TrEMBL
  - RNA
    - Rfam, miRBase, Silva
- Pattern & Structure databases
- Annotation databases
- Diseases
- etc.

Bioinformatics
and this is only the …..
... enjoy bioinformatics!!
[email protected]