Introduction to Bioinformatics
Bioinformatics & Biology
Andreas Gisel
IITA – Bioscience & Bioinformatics
A member of the CGIAR Consortium
www.iita.org
Andreas Gisel
Bioinformatics specialist at IITA
Institute for Biomedical Technologies, Italy
Novartis SA, pharmaceutical company, Switzerland
Trained as a molecular biologist:
University of California, Berkeley
Federal Institute of Technology, Switzerland
Bioinformatics – definition
Bio – Biology, Life Sciences
Informatics – computational sciences
DATA
INTERPRETATIONS
RESULTS
Bio + informatics = Bioinformatics
Bioinformatics – definition
Bio – Biology, Life Sciences
Informatics – computational sciences
DATA
INTERPRETATIONS
RESULTS
Knowledge
Data Repositories
Basic Biology
Life
The condition that distinguishes organisms from inorganic objects and dead organisms, manifested by:
- growth through metabolism,
- reproduction,
- and the power of adaptation to the environment through changes originating internally.
Basic Biology
Cell
The cell (from Latin cella, meaning "small room") is the basic structural, functional, and biological unit of all known living organisms. A cell is the smallest unit of life that can replicate independently, and cells are often called the "building blocks of life".
Basic Biology
DNA (Deoxyribonucleic acid), carrier of information
RNA (Ribonucleic acid), transporter of information
Protein, the functional unit orchestrating cell functions and cell division
Basic Biology – DNA
DNA (Deoxyribonucleic acid), carrier of information
Up to hundreds of millions of nucleotides
4 nucleotides:
- Adenine (A)
- Cytosine (C)
- Guanine (G)
- Thymine (T)
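Because DNA uses only these four letters, a DNA sequence is usually handled as a plain text string. As a minimal sketch, assuming uppercase or lowercase input and a hypothetical helper name `base_composition` (not from any library), the code below counts each nucleotide and rejects any symbol outside A, C, G and T.

```python
from collections import Counter

DNA_ALPHABET = "ACGT"  # the four DNA nucleotides

def base_composition(seq: str) -> dict[str, int]:
    """Count A, C, G and T in a DNA string (hypothetical helper)."""
    seq = seq.upper()
    invalid = set(seq) - set(DNA_ALPHABET)
    if invalid:
        raise ValueError(f"not a DNA sequence, unexpected symbols: {sorted(invalid)}")
    counts = Counter(seq)
    # Report every nucleotide, including those that do not occur at all
    return {base: counts.get(base, 0) for base in DNA_ALPHABET}

print(base_composition("ATGTCTGGG"))  # {'A': 1, 'C': 1, 'G': 4, 'T': 3}
```

Rejecting unknown symbols early catches common mistakes such as passing an RNA sequence (with U) where DNA is expected.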
Basic Biology – RNA
RNA (Ribonucleic acid), transporter of information
4 nucleotides:
- Adenine (A)
- Cytosine (C)
- Guanine (G)
- Uracil (U), which replaces the thymine (T) of DNA
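Since RNA differs from DNA in this alphabet only by uracil taking the place of thymine, the RNA copy of a gene can be written down from the DNA coding strand by swapping T for U; the DNA and RNA sequences on the sequencing slide follow this convention. The sketch below assumes exactly that simplification (same 5'->3' orientation as the coding strand, no template-strand base pairing) and uses a hypothetical function name `transcribe`.

```python
def transcribe(dna: str) -> str:
    """Return the RNA version of a coding-strand DNA sequence.

    Simplification: the RNA is written in the same 5'->3' orientation as the
    DNA coding strand, so transcription reduces to replacing thymine (T) with
    uracil (U). Modelling the template strand would also require base pairing.
    """
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("input must contain only A, C, G and T")
    return dna.replace("T", "U")

print(transcribe("ATGTCTGGGCTTGTG"))  # AUGUCUGGGCUUGUG
```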
Basic Biology – RNA
- RNA folding
- Alternative splicing
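RNA folding means that a single RNA strand pairs with itself to form a secondary structure. Practical folding tools rely on thermodynamic energy models; the sketch below instead uses the much simpler Nussinov dynamic-programming algorithm, which only maximizes the number of nested base pairs. It assumes A-U, G-C and the G-U wobble pair as allowed pairs and a minimum hairpin loop of three unpaired bases; the function name is hypothetical.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_max_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov algorithm)."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best number of pairs within the subsequence rna[i..j]
    dp = [[0] * n for _ in range(n)]
    # Only spans with at least min_loop unpaired bases inside can form a pair
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Leave base i or base j unpaired
            best = max(dp[i + 1][j], dp[i][j - 1])
            # Pair i with j, enclosing rna[i+1..j-1]
            if (rna[i], rna[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)
            # Split into two independent substructures
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_max_pairs("GGGAAACCC"))  # 3 (a hairpin with an AAA loop)
```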
Basic Biology – Protein
Protein, the functional unit orchestrating cell functions and cell division
20 amino acids
Basic Biology – Genetic Code
How is the 4-letter code of RNA translated into the 20-letter code of proteins?
Two nucleotides give only 4 × 4 = 16 combinations, too few for 20 amino acids, so a minimum of three nucleotides is needed to "spell out" one amino acid, and this is indeed the number that is actually used.
Any single set of three nucleotides is called a codon, and the set of all possible three-nucleotide combinations is called "the genetic code" or "triplet code". There are sixty-four different combinations, or codons (4 × 4 × 4 = 64).
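The 64 codons can be generated mechanically as all 4 × 4 × 4 combinations of the RNA letters and then mapped to amino acids. The sketch below assumes the standard genetic code (NCBI translation table 1), written as the commonly used 64-character string in U, C, A, G order with `*` for stop codons; `CODON_TABLE` and `translate` are hypothetical names.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 4 x 4 x 4 = 64 codons, each mapped to one amino acid or '*' (stop)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna: str, to_stop: bool = True) -> str:
    """Translate an mRNA sequence codon by codon, starting at its first base.

    DNA letters are accepted (T is read as U). Unknown codons raise KeyError.
    """
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*" and to_stop:
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)

print(len(CODON_TABLE))                # 64
print(translate("AUGUCUGGGCUUGUG"))    # MSGLV
```

Note that 64 codons encode only 20 amino acids plus stop, so most amino acids are specified by more than one codon.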
Basic Biology – Systems
Basic Biology – Sequencing
Sequencing is the process of determining the precise order of nucleotides within a DNA molecule, or the order of amino acids in a protein molecule.
- First fully sequenced biological sequence: amino acid sequence of insulin (51 aa), 1955
- First fully sequenced nucleic acid: tRNA (75 nt), 1965
- First DNA genome: bacteriophage ΦX174 (5,375 nt), 1977
- DNA sequencing technologies:
  - Sanger sequencing (1975)
  - Pyrosequencing (Next Generation Sequencing, 2004)
Basic Biology – Sequencing
DNA
ATGTCTGGGCTTGTGGGCTTGCTTGTGGGTCTTGTGCTGGTGGGTT
CTGTTAGCTCAGCGAAATTCGATGAGCTATTTCAACCCGGCTGGGC
RNA
AUGUCUGGGCUUGUGGGCUUGCUUGUGGGUCUUGUGCUGGUGGGUU
CUGUUAGCUCAGCGAAAUUCGAUGAGCUAUUUCAACCCGGCUGGGC
Protein
MVGMDLFKCVMMIMVLVVSCGEAVSGAKFDELYRSSWAMDHCVNEG
EVTKLKLDNYSGAGFESRSKYLFGKVSIQIKLVEGDSAGTVTAFYM
Biological Data
- Descriptions
- Pictures
- Sequences
  - Protein
  - RNA
  - DNA
- Structures
  - Protein
  - RNA
- Interactions
- Expressions
Data Explosion
NGS (Next Generation Sequencing): up to 600,000,000,000 bases (600 gigabases) per experiment
Up to 1 million data points per experiment
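To put the 600,000,000,000 bases per NGS experiment into storage terms, the arithmetic below assumes one byte per base for plain-text sequence (as in FASTA) and, for comparison, a packed encoding of 2 bits per base (enough for four letters). These are illustrative assumptions; real sequencer output (for example FASTQ) also stores read names and per-base quality scores, so actual files are considerably larger.

```python
BASES_PER_EXPERIMENT = 600_000_000_000  # upper bound quoted for one NGS experiment

def storage_gb(n_bases: int, bits_per_base: int) -> float:
    """Storage in gigabytes (10**9 bytes) for n_bases at a given encoding."""
    return n_bases * bits_per_base / 8 / 1e9

print(storage_gb(BASES_PER_EXPERIMENT, 8))  # 600.0 (one ASCII character per base)
print(storage_gb(BASES_PER_EXPERIMENT, 2))  # 150.0 (2 bits per base: A, C, G, T)
```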
Data Analysis – DNA/RNA sequences
A sequence without associated knowledge is meaningless!
What to do?
- Sequence similarity
- Finding genes and regulatory elements
- Functional analysis of genes
- Homology
- Polymorphism
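In its simplest computational form, finding genes starts with open reading frames (ORFs): stretches that begin with the start codon ATG and end at the first in-frame stop codon (TAA, TAG or TGA). The sketch below is a deliberately minimal ORF scan over the three forward reading frames only; real gene finders also scan the reverse complement and, in eukaryotes, model introns and splicing. The function name and the `min_codons` threshold (counting start and stop codons) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, str]]:
    """Return (start, sequence) for ATG...stop ORFs in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]  # include the stop codon
                if len(orf) // 3 >= min_codons:
                    orfs.append((start, orf))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATAGCC"))  # [(2, 'ATGAAATAG')]
```

An ATG without a downstream in-frame stop codon is not reported, since the reading frame never closes.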
Data Analysis
So we need bioinformatics tools and reference data:
Hardware – computing infrastructure (CPU, RAM, storage)
Tools – programs that process your data
Reference data – databases of existing data
Internet – connection to external databases
Bioinformatics @ IITA
Hardware – computing infrastructure (CPU, RAM, storage)
HP Blade system with:
- 3 blades, each with two 16-core processors (AMD Opteron Processor 6272)
- 384 GB RAM
- 2 TB direct-attached storage (DAS)
- 8 TB network-attached storage (NAS)
The operating system is Ubuntu 14.04.1 LTS, installed via Bio-Linux 8.
Data Analysis – Data Format
FASTA
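A FASTA file is plain text: each record starts with a header line beginning with `>` (identifier plus optional description), followed by one or more lines of sequence. The sketch below is a minimal parser under the assumption that headers are unique (a repeated header would overwrite the earlier record); `parse_fasta` is a hypothetical name, and in practice libraries such as Biopython (`Bio.SeqIO`) provide full parsers.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Sequence lines following a header are joined, so wrapped sequences
    are returned as one string. Blank lines are ignored.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

example = ">seq1 example\nATGTCTGGG\nCTTGTG\n>seq2\nAUGUCU\n"
print(parse_fasta(example))  # {'seq1 example': 'ATGTCTGGGCTTGTG', 'seq2': 'AUGUCU'}
```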
Data Analysis – Data Format
GenBank
Data Analysis – Data Format
EMBL
Data Analysis – Tools
- Sequence similarity
  - Pairwise sequence alignment: BLAST, Needleman-Wunsch, Smith-Waterman
  - Multiple sequence alignment: ClustalW, T-Coffee, MUSCLE
- Sequence analysis: EMBOSS, UGENE
Data Analysis – Tools
BLAST (Basic Local Alignment Search Tool): pairwise sequence alignment for sequence similarity searches
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/ena/search/#Search
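BLAST gains its speed by first looking for short exact word matches ("seeds") between the query and a database sequence and only then extending promising seeds into alignments. The sketch below illustrates only that seeding idea for nucleotide sequences, with a hypothetical word size `k`; it is not BLAST itself (no extension, no scoring matrix, no E-values), and the function names are hypothetical.

```python
from collections import defaultdict

def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in seq to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)

def find_seeds(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a k-mer matches exactly."""
    index = kmer_index(subject, k)  # index the subject once
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds

print(find_seeds("GATTACA", "CCGATTACAGG", k=5))  # [(0, 2), (1, 3), (2, 4)]
```

All three seeds in the example lie on the same diagonal (subject position minus query position = 2), which is the kind of cluster a BLAST-style search would extend into a full local alignment.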
Data Analysis – Tools
Pairwise sequence alignment algorithms:
- Needleman-Wunsch (global alignment)
- Smith-Waterman (local alignment)
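Both algorithms fill a dynamic-programming matrix in which each cell holds the best score for aligning two prefixes. Needleman-Wunsch scores the alignment end to end, while Smith-Waterman lets scores restart at zero so that the best-matching local region wins. The sketch below computes only the optimal score (no traceback) and assumes a simple hypothetical scoring scheme (match +1, mismatch -1, linear gap penalty -2); real tools use substitution matrices and affine gap penalties.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch (global alignment).
    local=True:  Smith-Waterman (local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # Smith-Waterman: never drop below zero
                best = max(best, score)
            F[i][j] = score
    return best if local else F[-1][-1]

print(alignment_score("GATTACA", "GATACA"))                     # 4 (6 matches, 1 gap)
print(alignment_score("TTTGATTACATTT", "GATTACA", local=True))  # 7
```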
Data Analysis – Tools
Multiple sequence alignment: ClustalW, T-Coffee, MUSCLE
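Tools such as ClustalW, T-Coffee and MUSCLE return a multiple sequence alignment in which gaps (`-`) are inserted so that equivalent positions line up in columns. Computing the alignment itself does not fit in a short example; the sketch below assumes an already aligned, hypothetical set of equal-length sequences and derives a simple majority-rule consensus and the fraction of fully conserved columns. The function names are hypothetical.

```python
from collections import Counter

def consensus(alignment: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences ('-' marks a gap).

    Ties go to the residue seen first in the column, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))

def conserved_fraction(alignment: list[str]) -> float:
    """Fraction of columns in which every sequence has the same non-gap residue."""
    columns = list(zip(*alignment))
    if not columns:
        raise ValueError("empty alignment")
    conserved = sum(1 for col in columns if len(set(col)) == 1 and col[0] != "-")
    return conserved / len(columns)

# Hypothetical, already aligned input (as an MSA tool might produce)
aligned = ["GATTACA", "GAT-ACA", "GCTTACA"]
print(consensus(aligned))                     # GATTACA
print(round(conserved_fraction(aligned), 3))  # 0.714
```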
Data Analysis – Tools
Sequence analysis: EMBOSS
http://emboss.sourceforge.net/what/
Data Analysis – Tools
Sequence analysis: UGENE
Data Analysis – Databases
- Sequence databases
  - DNA
    - EMBL – ENA – Ensembl
    - NCBI – Entrez – GEO – UCSC
    - DDBJ
    - Phytozome
  - Protein
    - UniProt, Swiss-Prot, TrEMBL
  - RNA
    - Rfam, miRBase, SILVA
- Pattern & structure databases
- Annotation databases
- Diseases
- etc.
Data Analysis – Databases
DNA sequence databases: EMBL – ENA – Ensembl, NCBI – Entrez – GEO – UCSC, DDBJ
http://www.ddbj.nig.ac.jp/
http://www.ncbi.nlm.nih.gov/nucleotide/
http://www.ebi.ac.uk/ena
Data Analysis – Databases
DNA sequence databases: NCBI – Entrez – GEO – UCSC
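NCBI databases can also be queried programmatically through the Entrez Programming Utilities (E-utilities). The sketch below builds an `efetch` request that returns one record as FASTA text, using the documented `db`, `id`, `rettype` and `retmode` parameters; the helper names and the accession in the usage line are hypothetical placeholders. Only `fetch_record` touches the network, and automated use should respect NCBI's usage guidelines (request rate limits, optional API key).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(accession: str, db: str = "nucleotide", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for a single record."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)

def fetch_record(accession: str, **kwargs: str) -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, **kwargs), timeout=30) as response:
        return response.read().decode("utf-8")

# "YOUR_ACCESSION" is a placeholder, not a real record identifier
print(efetch_url("YOUR_ACCESSION"))
```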
Data Analysis – Databases
DNA sequence databases: Ensembl Plants
Data Analysis – Databases
DNA sequence databases: Phytozome
Data Analysis – Databases
Protein sequence databases: UniProt, Swiss-Prot, TrEMBL
Data Analysis – Databases
RNA sequence databases: Rfam, miRBase, SILVA
Bioinformatics
… and this is only the …
… enjoy bioinformatics!