Introduction to: Nucleotide Database
Presented by: Leila Mirzapour
Nucleotide Database
URL: http://www.ncbi.nlm.nih.gov/nucleotide/
• The Nucleotide database is a collection of sequences from several sources,
including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript
sequence data provide the foundation for biomedical research and
discovery.
What is GenBank?
• GenBank® is the NIH genetic sequence database, an annotated collection
of all publicly available DNA sequences (Nucleic Acids Research, 2011
Jan;39(Database issue):D32-7). There are approximately 126,551,501,141
bases in 135,440,924 sequence records in the traditional GenBank divisions
and 191,401,393,188 bases in 62,715,288 sequence records in the WGS
division as of April 2011.
• The complete release notes for the current version of GenBank are
available on the NCBI FTP site. A new release is made every two months.
GenBank is part of the International Nucleotide Sequence Database
Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the
European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
These three organizations exchange data on a daily basis.
Access to GenBank
There are several ways to search and retrieve data from GenBank.
• Search GenBank for sequence identifiers and annotations with Entrez
Nucleotide, which is divided into three divisions: CoreNucleotide (the
main collection), dbEST (Expressed Sequence Tags), and dbGSS (Genome
Survey Sequences).
• Search and align GenBank sequences to a query sequence using BLAST
(Basic Local Alignment Search Tool). BLAST searches CoreNucleotide,
dbEST, and dbGSS independently; see BLAST info for more information
about the numerous BLAST databases.
• Search, link, and download sequences programmatically using NCBI E-utilities.
GenBank Data Usage
• The GenBank database is designed to provide and encourage access within
the scientific community to the most up-to-date and comprehensive DNA
sequence information. Therefore, NCBI places no restrictions on the use or
distribution of the GenBank data. However, some submitters may claim
patent, copyright, or other intellectual property rights in all or a portion of
the data they have submitted. NCBI is not in a position to assess the
validity of such claims, and therefore cannot provide comment or
unrestricted permission concerning the use, copying, or distribution of the
information contained in GenBank.
RefSeq
• The Reference Sequence (RefSeq) collection aims to provide a
comprehensive, integrated, non-redundant, well-annotated set of sequences,
including genomic DNA, transcripts, and proteins. RefSeq sequences form a
foundation for medical, functional, and diversity studies; they provide a stable
reference for genome annotation, gene identification and characterization,
mutation and polymorphism analysis (especially RefSeqGene records),
expression studies, and comparative analyses.
Nucleotide Tools
• Submit to GenBank
• LinkOut
• E-Utilities
• BLAST
• Batch Entrez
LinkOut
• LinkOut is a service that allows you to link directly from PubMed and
other NCBI databases to a wide range of information and services beyond
the NCBI systems. LinkOut aims to facilitate access to relevant online
resources in order to extend, clarify, and supplement information found in
NCBI databases. Examples of LinkOut Resources include full-text
publications, biological databases, consumer health information, research
tools, and more.
• All links are specially assigned to specific database records. When
accessing a link through LinkOut, no additional searching should be
necessary to access the relevant resource that has been linked to the record.
Online resources that may be valuable to users of PubMed and other NCBI
databases are encouraged to participate in LinkOut.
E-utilities
• The Entrez Programming Utilities (E-utilities) are a set of eight server-side
programs that provide a stable interface into the Entrez query and database
system at the National Center for Biotechnology Information (NCBI). The
E-utilities use a fixed URL syntax that translates a standard set of input
parameters into the values necessary for various NCBI software
components to search for and retrieve the requested data. The E-utilities are
therefore the structured interface to the Entrez system, which currently
includes 38 databases covering a variety of biomedical data, including
nucleotide and protein sequences, gene records, three-dimensional
molecular structures, and the biomedical literature.
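The fixed URL syntax described above can be sketched as follows. This is a minimal illustration, not NCBI's own client code: the helper name `eutils_url`, the search term, and the accession `NM_007294` are assumptions chosen for the example. The block only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public base address of the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, **params):
    """Build an E-utilities URL: base + utility name + '?' + encoded parameters.

    `eutils_url` is a hypothetical helper written for this example.
    """
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


# ESearch: find records in the nucleotide database that match a query term.
search_url = eutils_url(
    "esearch",
    db="nucleotide",
    term="BRCA1[Gene Name] AND human[Organism]",  # illustrative query
    retmax=5,
)

# EFetch: retrieve one record as FASTA text (accession is illustrative).
fetch_url = eutils_url(
    "efetch", db="nucleotide", id="NM_007294", rettype="fasta", retmode="text"
)

print(search_url)
print(fetch_url)
```

Either URL can then be opened in a browser or passed to any HTTP client; the response format depends on the utility and on the `rettype` and `retmode` parameters.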
BLAST
• BLAST programs use a heuristic search algorithm. The programs use the
statistical methods of Karlin and Altschul.
• BLAST programs were designed for fast database searching, with minimal
sacrifice of sensitivity to distantly related sequences.
BLAST Programs
BLAST is actually a family of programs:
• BLASTN – Nucleotide query searching a nucleotide database.
• BLASTP – Protein query searching a protein database.
• BLASTX – Translated nucleotide query sequence (6 frames) searching a
protein database.
• TBLASTN – Protein query searching a translated nucleotide (6 frames)
database.
• TBLASTX – Translated nucleotide query (6 frames) searching a translated
nucleotide (6 frames) database.
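The "6 frames" mentioned for BLASTX, TBLASTN, and TBLASTX are the three reading frames of the sequence plus the three reading frames of its reverse complement. The sketch below illustrates the idea using the standard genetic code. It is not BLAST's own implementation, and the function names are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Complement every base, then reverse the sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the translations of frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCC"))
```

For the toy input `ATGGCC`, frame +1 reads `ATG GCC` and gives `MA`, while frame -1 reads the reverse complement `GGC CAT` and gives `GH`.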

BLAST method
• Compare query to each sequence in database
• Use heuristic to speed pairwise comparison
• Create 'sequence abstraction' by listing exact and similar words
   • On the fly for the query
   • In advance for the database
• Find similar words between query and each database sequence
• Extend such words to obtain high-scoring segment pairs (HSPs)
• Calculate statistics analytically
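The word-matching and extension steps above can be illustrated with a toy seed-and-extend search. This is a simplified sketch, not the actual BLAST algorithm: it matches exact words only (no "similar" words), extends without gaps, and skips the Karlin-Altschul statistics. The function names, the scoring values, and the `drop` cutoff are assumptions made for the example.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_one_way(query, subject, q, s, step, match, mismatch, drop):
    """Walk from (q, s) in direction `step` without gaps.

    Returns (best score gain, number of letters kept). The walk stops once
    the running score falls `drop` or more below the best score seen.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= drop:
            break
        q += step
        s += step
    return best, best_len


def find_hsps(query, subject, k=3, match=1, mismatch=-1, drop=2):
    """Return HSPs as (score, query start, subject start, length), best first."""
    index = index_words(subject, k)        # built in advance for the database
    hsps = set()
    for q in range(len(query) - k + 1):    # words listed on the fly for the query
        for s in index.get(query[q:q + k], []):
            right, rlen = extend_one_way(
                query, subject, q + k, s + k, 1, match, mismatch, drop)
            left, llen = extend_one_way(
                query, subject, q - 1, s - 1, -1, match, mismatch, drop)
            score = k * match + left + right
            hsps.add((score, q - llen, s - llen, k + llen + rlen))
    return sorted(hsps, reverse=True)


print(find_hsps("ACGTTGCA", "TTACGTAGCATT"))
```

With these toy sequences, the three word hits (ACG, CGT, GCA) all extend to the same segment pair: query positions 0 to 7 against subject positions 2 to 9, with seven matches and one mismatch, giving a score of 6. The call therefore prints `[(6, 0, 2, 8)]`.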
Batch Entrez
• Use Batch Entrez to upload a file of GIs or accession numbers from the
Nucleotide or Protein databases, or upload a list of record identifiers from
other Entrez databases.
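A file for upload can be prepared with a few lines of code. The sketch below assumes that Batch Entrez expects plain text with one identifier per line. The helper name `write_id_file`, the file name, and the accession numbers are illustrative assumptions.

```python
from pathlib import Path


def write_id_file(identifiers, path):
    """Write unique, non-empty identifiers to `path`, one per line.

    Duplicates are dropped and first-seen order is kept.
    Hypothetical helper written for this example.
    """
    cleaned = (item.strip() for item in identifiers)
    unique = list(dict.fromkeys(item for item in cleaned if item))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


# Illustrative accession numbers; replace with your own GIs or accessions.
ids = ["NM_000546", " NM_007294 ", "NM_000546", ""]
print(write_id_file(ids, "batch_entrez_ids.txt"))
```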
INSDC
• The International Nucleotide Sequence Databases (INSD) have been
developed and maintained collaboratively between DDBJ, ENA, and
GenBank for over 18 years.
• The INSDC advisory board, the International Advisory Committee, is
made up of members of each of the databases' advisory bodies. At their
most recent meeting, members of this committee unanimously endorsed
and reaffirmed the existing data-sharing policy of the three databases that
make up the INSDC, which is stated below.
• Individuals submitting data to the international sequence databases should
be aware of INSDC policy.
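The four best prayers for the turn of the year: first, a prayer for the appearance of the Peerless One; second, that the whole nation be free of harm and sorrow; third, that we reach the peak of perfection; fourth, that every pocket be full of money, but lawfully earned ...
Happy New Year in advance.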