NCBI Bioinformatics Workshop
Rabat, Morocco 2012
What is Bioinformatics?
Bioinformatics is the application of information technology to
the field of molecular biology.
The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in
1970 for the study of informatic processes in biotic systems.
Its primary use since at least the late 1980s has been in
genomics and genetics, particularly in those areas of
genomics involving large-scale DNA sequencing.
Bioinformatics now entails the creation and advancement of
databases, algorithms, computational and statistical
techniques, and theory to solve formal and practical problems
arising from the management and analysis of biological data.
Wikipedia
What is NCBI?
On November 4, 1988, President Ronald
Reagan signed the Health Omnibus Programs Extension
Act, creating the National Center for
Biotechnology Information as part of the National
Library of Medicine at NIH. Its mandate:
• Create automated systems for knowledge
about molecular biology, biochemistry, and
genetics.
• Perform research into advanced methods of
analyzing and interpreting molecular
biology data.
• Enable biotechnology researchers and
medical care personnel to use the systems
and methods developed.
History of molecular biology
1860 Genetics: Gregor Mendel
discovered that genes determine the
characteristics of an organism and that genes are
passed to children from both parents
1944 Molecular biology:
Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that the DNA
molecule stores the genes
1962 Nobel Prize: James Watson, Francis Crick,
Maurice Wilkins (Rosalind Franklin)
1970 Central Dogma (first stated in 1958)
restated by Francis Crick in Nature.
Central Dogma of molecular biology
The central dogma of molecular biology was first enunciated by Francis Crick in 1958 and restated in a Nature paper published in 1970.
The general transfers describe the normal flow of biological information: DNA can be copied to DNA
(DNA replication), DNA information can be copied into mRNA (transcription), and proteins can
be synthesized using the information in mRNA as a template (translation).
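The two information transfers named above, transcription and translation, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the input is assumed to be the coding (sense) strand, and the codon table is the standard genetic code written in the usual T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Transcription: copy the coding-strand DNA into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read mRNA codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Real translation starts at a start codon within a longer transcript; the sketch assumes the sequence is already in frame.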
Does the central dogma still stand?
Koonin EV. Biol Direct. 2012 Aug 23;7(1):27. [Epub ahead of print]
History of biotechnology
1590 The microscope is invented by Janssen
1675 Leeuwenhoek discovers protozoa and bacteria
1879 Flemming discovers chromatin, rod-like structures in the cell nucleus, later called
‘chromosomes’
1885 The Escherichia coli bacterium is discovered (a major research and production tool for
biotechnology)
1942 The electron microscope is used to identify and characterize a bacteriophage, a virus
that infects bacteria.
1953 Watson and Crick reveal the three-dimensional structure of DNA.
1973 Cohen and Boyer perform the first successful recombinant DNA experiment, using
bacterial genes.
1983 The Polymerase Chain Reaction (PCR) technique is invented.
1995 First bacterial genome is sequenced by whole genome shotgun technology
2001 The sequence of the human genome is published in Science and Nature, making it
possible for researchers all over the world to begin developing treatments.
2005 Next Generation Sequencing: Illumina, MiSeq, Ion Torrent, PacBio
History of Bioinformatics
Sequence database
1960 – Margaret Dayhoff collected sequences in a database that later became PIR
1980 – EMBL (ENA); 1982 – GenBank; 1984 – DDBJ; 1986 – Swiss-Prot
Sequence comparison
1970 – Needleman-Wunsch global pairwise alignment
1973 – multiple alignment
1981 – Smith-Waterman local alignment
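To make the global pairwise alignment entry concrete, here is a minimal sketch of the Needleman-Wunsch dynamic programming recurrence in Python. It computes only the optimal alignment score, not the alignment itself. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration; they do not come from the slides.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two residues, or gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

Smith-Waterman local alignment uses the same table but floors every cell at zero and reports the highest-scoring cell rather than the last one.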
Database searches by sequence similarity
1988 – FASTA by Pearson and Lipman
1990 – BLAST by Altschul, Gish, Lipman
Text search and retrieval system
1990 – Entrez designed by Lipman and Benson
Algorithms
Gene prediction
Protein structure
Hidden Markov Model
Clustering
Trees
Problem Solving
[Diagram: problem-solving cycle with the labels Data, Data management, Analysis, Visualization, Model, Interpretation, Hypothesis, Experiment, Validation]
For every complex problem, there is an answer that is clear, simple, and wrong…
- H. L. Mencken
ROC curve analysis
Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993)
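A ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under the curve (AUC) summarizes it in one number. The sketch below is a minimal pure-Python illustration with made-up labels and scores; the function names are hypothetical, and it assumes binary labels (1 = positive, 0 = negative) with at least one example of each class.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold downward."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    fpr, tpr = [0.0], [0.0]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items sharing one score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        fpr.append(false_pos / negatives)
        tpr.append(true_pos / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Illustrative data only: 1 = true homolog, 0 = unrelated sequence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(labels, scores)
print(round(auc(fpr, tpr), 3))
```

An AUC of 1.0 means every positive outranks every negative; 0.5 is what random scoring would give.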
Challenges in Computational Biology
Multiple alignments and phylogenetic trees
Protein structure prediction
Genome assembly and annotation
Homology searches
Challenging issues in Bioinformatics
• Data management
– processing, storage
– accuracy (high-throughput, low quality)
– search and retrieval
– presentation
• Data analysis
– algorithms
– statistical techniques
• Simulation modeling and prediction
– parameter estimation
– prediction accuracy
NCBI mission: discovery initiative
[Diagram labels: NCBI, Search, Analysis, Visualization, Validation]
What is GenBank?
NCBI’s Primary Sequence Database
• Nucleotide-only sequence database
• Archival in nature
– Historical
– Reflective of submitter point of view (subjective)
– Redundant
• GenBank Data
– Direct submissions (traditional records)
– Batch submissions (EST, GSS, STS)
– FTP accounts (genome data)
• Three collaborating databases
– GenBank
– DNA Database of Japan (DDBJ)
– European Molecular Biology Laboratory (EMBL) Database
Sequence Databases
[Diagram: labs and sequencing centers submit sequences to GenBank, which is updated ONLY by submitters. RefSeq is updated continually by NCBI. Other labels: Curators, Genome Assembly, UniGene, Algorithms]
Next Generation Sequencing
NGS produces a lot of data
Information retrieval
NCBI Discovery initiative
Entrez Search and retrieval system
Vice President Gore 1997
"From a computer in the comfort of your own home or from one in your neighborhood library,
you will be able to access timely and accurate information. Already 30,000 people a day are
using MEDLINE. By making it more accessible -- free and private -- we can increase that number
many times over."
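Entrez can also be queried programmatically. The slides do not name a programmatic interface, so the following is a sketch that assumes NCBI's public E-utilities `esearch` endpoint and its `db`, `term`, `retmax`, and `retmode` parameters. The helper name `build_esearch_url` is hypothetical. The code only builds the request URL and performs no network access.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed here; not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an Entrez text-search URL for the given database and query."""
    params = {
        "db": db,           # Entrez database to search, e.g. pubmed or nucleotide
        "term": term,       # query text, same syntax as the web search box
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(build_esearch_url("BRCA1 human"))
```

Fetching that URL returns a list of record identifiers rather than the records themselves.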
Improve information retrieval
Add:
• links
• filters
• related information
Rescuing Zero-Result PubMed Searches
[Chart comparing 2008 and 2011. Labels: zero-result searches rescued by spelling (19% improvement); zero-result searches rescued by spelling, auto-complete, gene sensor, citation sensor/Hydra (37% improvement); unassisted searches; 16% of all PubMed searches]
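One way a zero-result query can be rescued by spelling is to suggest the closest term from a known vocabulary. The sketch below illustrates that idea with Python's standard `difflib` module. It is not NCBI's actual method, which the slides do not describe; the function name, the vocabulary, and the similarity cutoff are all assumptions.

```python
import difflib


def suggest_spelling(query, vocabulary, cutoff=0.8):
    """Return the vocabulary term most similar to the query, or None."""
    matches = difflib.get_close_matches(
        query.lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Hypothetical vocabulary of indexed terms.
vocabulary = ["genomics", "bioinformatics", "proteomics"]
print(suggest_spelling("bioinformatcs", vocabulary))
```

A production system would rank suggestions using query logs and term frequencies as well as string similarity.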
Sequence analysis
Visualization
NCBI Bioinformatics Workshop 2009
NCBI Bioinformatics Workshop 2011