Introduction to Bioinformatics
234525-236523
Lecturer: Dr. Yael Mandel-Gutfreund
Teaching Assistants:
Martin Akerman
Sivan Bercovici
Course web site: http://webcourse.cs.technion.ac.il/234525
What is Bioinformatics?
Course Objectives
• To introduce the bioinformatics discipline
• To make the students familiar with the major biological questions which can be addressed by bioinformatics tools
• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)
Course Structure and Requirements
1. Class structure
• 2 hours lecture
• 1 hour tutorial
2. Homework
• Homework projects will be given every second week
• The homework will be done in pairs
• 5/5 homework projects submitted
3. A final project will be conducted and submitted in pairs
Grading
• 30% homework assignments
• 70% final project
Literature list
• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced Reading
• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
What is Bioinformatics?
“The field of science in which biology, computer science, and information technology merge to form a single discipline”
Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
From a purely lab-based science to an information science
Bioinformatics
Bio = Informatics
Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
21st century:
Genome → Transcriptome → Proteome
Genome
• Chromosomal DNA of an organism
• Coding and non-coding DNA
• Genome size and number of genes do not necessarily determine organism complexity
Transcriptome
• Complete collection of all possible mRNAs (including splice variants) of an organism.
• Regions of an organism’s genome that get transcribed into messenger RNA.
• Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.
Proteome
• The complete collection of proteins that can be produced by an organism.
• Can be studied either as a static entity (the sum of all possible proteins) or as a dynamic entity (all proteins found at a specific time point)
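The path from gene to mRNA to protein described in the slides above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the course: the function names `transcribe` and `translate` and the toy gene are assumptions made for the example, and the codon table is the standard genetic code written in compact form.

```python
# Standard genetic code in compact form: amino acids are listed for codons
# ordered TTT, TTC, TTA, TTG, TCT, ... with bases taken in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: nine codons followed by a stop codon (TAA).
gene = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)
print(protein)
```

A real transcriptome or proteome is far more than a string replacement (splicing, regulation and modification are ignored here); the sketch only shows the information flow between the three levels.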
From DNA to Genome
Timeline, 1955 to 2000 (events in order):
• Watson and Crick DNA model
• First protein sequence
• First protein structure
• First bacterial genome (Haemophilus influenzae), 1995
• Yeast genome
• First human genome draft, 2000
Complete Genomes
             2008   2007
Total         706    456
Eukaryotes     78     43
Bacteria      578    383
Archaea        50     29
Perhaps not surprising!
How human are chimps?
Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.
What’s Next?
The “post-genomics” era
• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics
Goal: to understand the living cell
Annotation
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function
Features marked on the sequence: TF binding site, promoter, transcription start site, ribosome binding site
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
ORF = Open Reading Frame
CDS = Coding Sequence
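Finding an open reading frame is the simplest form of the gene identification step described above. The following is a minimal sketch, assuming that an ORF is just a start codon (ATG) followed in the same frame by a stop codon; real gene finders also use promoters, ribosome binding sites and statistical models. The function name `find_orfs`, the `min_codons` parameter and the toy sequence are hypothetical. The toy sequence reuses the codons ATG CAA TAT GGA CAA TTG from the slide, with a stop codon and flanking bases added for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts every codon of the ORF, including start and stop.
    Only the three forward reading frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: slide codons plus an assumed stop codon (TAA) and flanking bases.
toy = "GGATGCAATATGGACAATTGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A fuller version would also scan the reverse complement, since genes can lie on either strand.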
Comparative genomics
Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACCA
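One simple way to quantify such an alignment is to count identical positions. The sketch below assumes the three rows are already aligned, with every gap written as a single "-" and no spaces, so that all rows have the same length. The function name `aligned_identity` is hypothetical, and skipping gap columns is one convention among several.

```python
def aligned_identity(seq_a: str, seq_b: str, gap: str = "-"):
    """Return (matches, compared) for two rows of an alignment.

    Columns where either row has a gap are skipped, so `compared`
    counts only columns in which both sequences have a base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


# Alignment rows from the slide, with gaps normalized to single dashes.
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

for name, other in (("chimp", chimp), ("mouse", mouse)):
    matches, compared = aligned_identity(human, other)
    print(f"human vs {name}: {matches}/{compared} identical bases")
```

On this short example the chimp row matches the human row at more positions than the mouse row does, which is the kind of signal comparative genomics builds on.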
Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.
Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.
Functional genomics
Understanding the function of genes and other parts of the genome
A network of interactions can be built for all proteins in an organism.
A large network of 8184 interactions among 4140 S. cerevisiae proteins.
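A protein interaction network is commonly stored as a graph in which proteins are nodes and interactions are edges. The sketch below is an assumption about representation, not the method behind the yeast network: `build_network`, `mean_degree` and the protein names P1 to P4 are hypothetical. It also works out the average number of partners per protein implied by the figures quoted above.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network as a dict mapping each protein to its partners."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def mean_degree(n_proteins: int, n_interactions: int) -> float:
    """Average number of partners per protein; each interaction involves two proteins."""
    return 2 * n_interactions / n_proteins


# Hypothetical toy data: four proteins, four pairwise interactions.
toy_interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(toy_interactions)
degrees = {protein: len(partners) for protein, partners in network.items()}
print(degrees)

# Figures quoted on the slide: 8184 interactions among 4140 proteins.
print(round(mean_degree(4140, 8184), 2))
```

With the quoted numbers each protein has on average just under four interaction partners, although in real networks the partners are usually spread very unevenly.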
Structural genomics
Assigning the structures of all proteins
• Protein complexes
• Biologic processes
• Shape and electrostatics
• Active sites
• Fold
• Evolutionary relationship
• Protein-ligand complexes
• Functional sites
Resources and Databases
The different types of data are collected in databases
– Sequence databases
– Structural databases
– Databases of experimental results
All databases are connected
Sequence databases
• Gene database
• Genome database
• SNPs database
• Disease related mutation database
Gene database
• Give information on gene function
• Alternative splicing of genes
– Alternative patterns of exons included to create the gene product
• ESTs (expressed sequence tags)
Genome Databases
• Data organized by species
• Clones assembled into contiguous pieces (‘contigs’) or whole chromosomes
• Information on non-coding regions
• Relativity
Genome Browsers
• Annotation adds value to sequence
• Easy “walk” through the genome
• Comparative genomics
Genome Browsers
• UCSC Genome Browser: http://genome.ucsc.edu/
• Ensembl Genome Browser: http://www.ensembl.org
• WormBase: http://www.wormbase.org/
• AceDB: http://www.acedb.org/
• Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
• FlyBase: http://flybase.bio.indiana.edu/
SNP database
Single Nucleotide Polymorphisms (SNPs)
• A single-base difference at a single position between two individuals of the same species
• Play an important role in differentiation and disease
Sickle Cell Anemia
• Caused by a single-base swap of an A for a T, which puts valine in place of glutamic acid in the beta chain of hemoglobin
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
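The records above are in FASTA format: a header line starting with ">" followed by one or more lines of sequence. A minimal reader can be sketched as follows; the function name `parse_fasta` is hypothetical, and the example input reuses the two headers from the slide with the sequences cut short to keep the example small.

```python
def parse_fasta(text: str):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    The header is the text after ">"; sequence lines are joined without
    line breaks. Lines before the first header are ignored.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Headers as on the slide; sequences truncated for this example.
example = """>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACC
TCAAACAGACACCATGGTGCATCTGACTCCTGA
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGG
"""

records = parse_fasta(example)
for header, sequence in records:
    accession = header.split("|")[3]  # fourth "|"-separated field of this header style
    print(accession, len(sequence))
```

Splitting the header on "|" works for this older NCBI header style only; other databases format their headers differently.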
Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
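Comparing the healthy and diseased sequences position by position is enough to locate the SNP and see its effect on the protein. The sketch below is an illustration under stated assumptions: the two excerpts start at the ATG start codon of the beta globin coding sequence, and the variant excerpt carries the A to T change in the codon whose residue the protein lines show changing from E to V. The function names `point_differences` and `codon_change` are hypothetical.

```python
# Standard genetic code in compact form (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def point_differences(reference: str, variant: str):
    """Return (position, reference_base, variant_base) for every mismatch (0-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


def codon_change(reference: str, variant: str, position: int):
    """Describe the codon containing `position`, assuming the sequence starts in frame.

    Returns (codon_number, ref_codon, ref_amino_acid, var_codon, var_amino_acid),
    with codon_number counted from 1 at the start codon.
    """
    start = position - position % 3
    ref_codon = reference[start:start + 3]
    var_codon = variant[start:start + 3]
    return (
        position // 3 + 1,
        ref_codon,
        CODON_TABLE[ref_codon],
        var_codon,
        CODON_TABLE[var_codon],
    )


# Assumed excerpts: first 11 codons of the coding sequence, starting at ATG.
healthy = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAGTCTGCC"

differences = point_differences(healthy, sickle)
for position, ref_base, var_base in differences:
    print(position, ref_base, var_base, codon_change(healthy, sickle, position))
```

Codon numbering here counts the start codon as codon 1, so the affected codon is number 7; numbering based on the mature protein, which lacks the initial methionine, gives position 6.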
Disease Databases
• Genes are involved in disease
• Many diseases are well studied
• Description of diseases and what is known about them is stored
Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available due to techniques such as NMR and X-ray crystallography
Databases of Experimental Results
• Data such as experimental microarray images (expression data)
• Proteomic data
• Metabolic pathways, protein-protein interaction data, regulatory networks
• Etc.
Literature Databases
PubMed
http://www.ncbi.nlm.nih.gov/PubMed
Service of the National Library of Medicine
• MEDLINE publication database
– Over 17,000 journals
– 15 million citations since 1950
Putting it all Together
• Each database contains specific information
• Like other biological systems, these databases are also interrelated
Database map (figure), grouped by data type:
• PROTEIN: PIR, SWISS-PROT
• DISEASE: OMIM, OMIA
• ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
• MOTIFS: BLOCKS, Pfam, Prosite
• GENOMIC DATA: GenBank, DDBJ, EMBL
• ESTs: dbEST, unigene
• GENES: LocusLink, RefSeq, AllGenes, GDB
• SNPs: dbSNP
• GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
• STRUCTURE: PDB, MMDB, SCOP
• PATHWAY: KEGG, COG
• LITERATURE: PubMed