Seminar on bioinformatics
by
S. Jhansi Rani
M.Pharmacy, II Semester
Department of Industrial Pharmacy
University College of Pharmaceutical Sciences
Kakatiya University
Warangal
Contents:
Introduction
Databases
DNA sequence data
Biological data
Molecular biology
DNA and RNA
Bioinformatics software
Personalized medicine
Single Nucleotide Polymorphism
Molecular modelling
Drug docking
Applications
Conclusion
References
BIOINFORMATICS
• Bioinformatics has been defined as a means of analyzing, comparing, graphically displaying, modeling, storing, systemizing, searching and ultimately distributing biological information, including sequences, structures and functions.
• Bioinformatics is a serious attempt to understand what it means when we say that genes code for physiological traits such as intelligence, brown hair or susceptibility to cancer.
DATABASES
A database is a collection of information organized so that it can easily be accessed, managed and updated. One common classification groups databases by type of content: bibliographic, full-text, numeric and image databases.
When the information is biological, such as nucleotide or protein sequences, secondary or 3D structures, metabolic pathways, microarray data and scientific publications, the database is called a biological database.
• Primary biological databases contain sequence or structure information alone.
Primary nucleotide sequence databases: GenBank, EMBL, DDBJ
Primary protein sequence databases: PIR-PSD, Swiss-Prot, TrEMBL
Primary structure database: PDB
• Secondary (derived) structure databases classify proteins on the basis of their structure.
Secondary structure databases: CATH, SCOP, DSSP, FSSP, DALI
Nucleotide sequence database: GenBank
Protein sequence database: Swiss-Prot
Structure database: PDB
GenBank - The nucleotide sequence database
This is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at NCBI. These three organizations exchange data on a daily basis.
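Records from these primary databases can be downloaded in several formats; one simple and widely used format is FASTA, where each record is a '>' header line followed by one or more sequence lines. The following is a minimal sketch of reading such text into a dictionary, assuming well-formed input; the function name `parse_fasta` and the example records are hypothetical, made up for illustration.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary.

    Assumes every record starts with a '>' header line followed by one or
    more sequence lines; blank lines are skipped.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:  # close the last record
        records[header] = "".join(chunks)
    return records


# Hypothetical records, for illustration only.
example = """>seq1 hypothetical nucleotide record
ACGTGGTAGA
GACCCTGTGT
>seq2 another hypothetical record
TTGACA
"""
print(parse_fasta(example))
```

Multi-line sequences are joined into one string per record. A repeated header would overwrite the earlier record, which is acceptable for a sketch but not for production parsing.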
UniProtKB/Swiss-Prot - The protein sequence database
This is a manually annotated protein knowledgebase established and maintained by the UniProt Consortium, a collaboration between the Swiss Institute of Bioinformatics (SIB) and the Department of Bioinformatics and Structural Biology of the University of Geneva, the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) of Georgetown University Medical Center.
PDB - The structural database
Protein Data Bank
• This is an international archive of 3D structural information for biological macromolecules.
• The PDB is managed by the RCSB (Research Collaboratory for Structural Bioinformatics).
• Secondary structure information specifies the positions of alpha helices, beta sheets and turns in a protein, while tertiary structure information covers interactions such as disulphide bonds, hydrogen bonds and salt bridges.
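The DSSP database listed earlier assigns a one-letter secondary-structure code to each residue (for example H for alpha helix, E for beta strand, T for turn). A minimal sketch of turning such a per-residue string into helix, strand and turn positions follows; it assumes only these three codes matter and treats every other character as coil. The function name and the 20-residue assignment string are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed subset of DSSP-style codes; any other character is treated as coil.
LABELS = {"H": "alpha helix", "E": "beta strand", "T": "turn"}


def find_segments(ss: str) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for each run of H, E or T.

    Positions are 1-based and inclusive, like residue numbering.
    """
    segments = []
    pos = 1
    for code, run in groupby(ss):
        length = len(list(run))
        if code in LABELS:
            segments.append((LABELS[code], pos, pos + length - 1))
        pos += length
    return segments


# Hypothetical per-residue assignment for a 20-residue peptide.
assignment = "--HHHHHHH--TT-EEEEE-"
for seg in find_segments(assignment):
    print(seg)
```

`itertools.groupby` groups consecutive identical characters, so each run of the same code becomes one segment.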
Accessing biological databases
Biological information can be retrieved from databases through:
Entrez
SRS (Sequence Retrieval System)
DBGET
Entrez
The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal, that allows users to search many discrete health sciences databases at the NCBI website.
Entrez can efficiently retrieve related sequences, structures and references, and can display gene and protein sequences and chromosome maps.
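NCBI also exposes Entrez programmatically through its E-utilities web service, whose ESearch endpoint returns the IDs of records matching a query. The sketch below only builds an ESearch URL and does not contact NCBI; the helper name `esearch_url` and the query "human insulin" are hypothetical examples.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an Entrez ESearch URL that lists record IDs matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("nucleotide", "human insulin", retmax=5)
print(url)
```

Opening the printed URL (for example with `urllib.request.urlopen`) returns an XML list of matching IDs. NCBI asks clients to keep to a few requests per second, so real scripts should throttle their calls.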
SRS (Sequence Retrieval System) is a system for integrating heterogeneous databases.
DBGET is an integrated database retrieval system for major biological databases.
What Kinds of Information?
Bioinformatics deals with any type of data that is of interest to biologists
– DNA and protein sequences
– Images of microarrays
– Raw data collected from any type of field or laboratory experiment
– Articles from the literature and databases of citations.
COMPONENTS OF BIOINFORMATICS
Creation of databases: This involves organizing, storing and managing biological data sets. The databases are accessible to researchers, who can consult existing information and submit new entries.
Development of algorithms and statistics: This involves developing tools and resources to determine the relationships among members of large data sets, e.g. comparing new protein sequence data with existing protein sequences.
Analysis of data and interpretation: This involves the appropriate use of databases to analyse data and interpret the results in a biologically meaningful manner.
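One simple measure used when comparing a new protein sequence with an existing one is percent identity over an alignment. A minimal sketch, assuming the two sequences are already aligned to equal length with '-' for gaps; conventions for counting gap columns differ between tools, and here a column with a gap in only one sequence counts as non-identical. The peptides in the test are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are skipped; a gap in only one
    sequence counts as a non-identical column.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))
```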
Molecular Biology
• Signals are received at the cell surface and eventually travel to the nucleus.
• Transcription factors convert the signal into a change in the expression of a gene.
• The gene's messenger RNA is translated into protein in the cytoplasm, where the protein can effect further changes in the cell.
Overview of DNA and RNA
• Deoxyribonucleic acid (DNA) is a macromolecular chain of nucleotides that serves as the basic carrier of genetic information and can be replicated (copied).
DNA can be represented as a sequence of nucleotide bases.
DNA
• DNA sequences are typically thousands to millions of bases long. DNA usually consists of two strands of complementary nucleotide sequences that are base-paired to each other. Human nuclear DNA is linear, but DNA can also form a circular molecule, as in most bacterial chromosomes and plasmids.
• A hypothetical double-stranded DNA molecule can be represented as
ACGTGGTAGAGACCCTGTGTGATAGACCACGGGTA
TGCACCATCTCTGGGACACACTATCTGGTGCCCAT
because A pairs with T and C pairs with G.
Here A = adenine, C = cytosine, G = guanine, T = thymine.
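The pairing rules above are enough to derive the second strand from the first. A minimal sketch: `complement` reproduces the bottom strand exactly as it is written in the example. Because the two strands are antiparallel, if the top strand is written 5' to 3' (the usual convention), the bottom strand read 5' to 3' is its reverse complement. The function names are hypothetical.

```python
# Watson-Crick pairing rules from the text: A-T and C-G.
PAIR = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, aligned under the original strand."""
    strand = strand.upper()
    if set(strand) - set("ACGT"):
        raise ValueError("only A, C, G and T are allowed")
    return strand.translate(PAIR)


def reverse_complement(strand: str) -> str:
    """Complementary strand read in the opposite (antiparallel) direction."""
    return complement(strand)[::-1]


top = "ACGTGGTAGAGACCCTGTGTGATAGACCACGGGTA"
print(complement(top))  # TGCACCATCTCTGGGACACACTATCTGGTGCCCAT
```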
RNA
Written as a sequence, RNA looks the same as DNA except that T is replaced by U, the uracil nucleotide.
Organisms are further classified into two types:
Eukaryotes: organisms whose DNA is enclosed in a cell nucleus, e.g. humans.
Prokaryotes: organisms whose DNA is not enclosed in a nucleus, e.g. bacteria.
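The RNA rule given above (T replaced by U) can be sketched in one line. In the cell, RNA is synthesised from the template strand, so the resulting transcript matches the coding (non-template) strand with U in place of T; this sketch therefore assumes its input is a coding-strand sequence. The function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence for a coding-strand DNA sequence (T -> U)."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("only A, C, G and T are allowed")
    return dna.replace("T", "U")


print(transcribe("ACGTGGTAGA"))  # ACGUGGUAGA
```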
GENOMICS
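Genome: the complete set of genetic instructions for making an organism.
Genomics: the attempt to analyze or compare the entire genetic content of a species.
• Human DNA is made up of about 3 billion chemical base pairs.
• Early estimates put the number of human genes at about 30,000; current estimates are closer to 20,000 protein-coding genes.
• There are estimated to be about 100,000 proteins.
• Changes in a single base pair are responsible for many genetic defects; sickle-cell anaemia, for example, results from a single-base change in the beta-globin gene.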
Branches of genomics
Comparative genomics: comparing the genomes of different species of organisms to understand them
Functional genomics: identifying genes and their respective functions
Structural genomics: determining or predicting the 3D structures of proteins, which in turn helps predict their functions
BIOINFORMATICS SOFTWARE
GCG
• Genetics Computer Group
• The Wisconsin Package for sequence analysis consists of more than 130 integrated programs
• Web-based, command-line and X Window analysis
SeqWeb
– Database searching and retrieval for GCG
– Comparison
– Protein analysis
– Mapping
– Pattern recognition
EMBOSS
• EMBOSS (European Molecular Biology Open Software Suite) is a free, open-source package of around 100 bioinformatics programs for tasks such as:
– Sequence alignment
– Database searching with sequence patterns
– Protein motif identification
RASMOL
• RasMol is a molecular graphics program intended for the visualization of proteins, nucleic acids and small molecules.
• It displays the molecule on the screen in a variety of color schemes and molecular representations.
• The loaded molecule can be shown as wireframe bonds, cylinder (stick) bonds, space-filling spheres or molecular ribbons.
BLAST
• Basic Local Alignment Search Tool (BLAST)
• A collection of sequence-search programs
• Software version 2.1.13 offered by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health
• Compares nucleotide or protein sequences to sequence databases
• Finds regions of local similarity between sequences
• Calculates the statistical significance of matches
• Helps infer functional relationships between sequences and identify members of gene families
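BLAST finds local similarity quickly by seeding from short word matches instead of filling a full dynamic-programming table. The exact local-alignment score that it approximates is given by the Smith-Waterman algorithm, sketched below. This is not BLAST itself, and the scoring values (match +2, mismatch -1, gap -2, linear gap penalty) are illustrative assumptions; real protein searches use substitution matrices such as BLOSUM62. The function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment start anywhere: this is what makes it "local".
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "ACGTGCA"))  # 12: seven matches and one gap
```

Keeping only two rows makes memory linear in the length of `b`. Recovering the alignment itself would need the full table and a traceback.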
PERSONALISED MEDICINE
• A lifelong, individually tailored health care approach to the detection, prevention and treatment of disease based on knowledge of an individual's precise genetic profile.
• The promise of pharmacogenomics is that both the choice of a drug and its dose will be determined by the individual's genetic make-up, leading to personalised, more efficacious and less harmful drug therapy.
SINGLE NUCLEOTIDE POLYMORPHISM (SNP)
• Scattered throughout the human genome are millions of discrete, one-letter variations known as SNPs.
• Most SNPs are benign, with no effect on gene structure or expression.
• But a subset of these variations provides crucial links to disease-causing genes, either because they directly alter a gene's activity or because they help pinpoint the location of such a disease-related gene.
• SNPs are also found in genes for drug-metabolizing enzymes, where they influence an individual's ability to process a drug properly.
• The sequence of bases in DNA varies from person to person, giving every human being individual characteristics.
• SNPs are variations in DNA at a single base.
• SNPs that serve as genetic markers for disease can guide personalized medicine for a wide variety of diseases.
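At its simplest, spotting single-base variation means comparing an individual's sequence with a reference, position by position. The sketch below assumes two already-aligned, equal-length sequences and lists every single-base difference. Real SNP calling works from sequencing reads with quality scores, and calling a difference a polymorphism requires population data, so this shows only the core idea. The names and sequences are hypothetical.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each single-base difference.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, s)
        for pos, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if r != s
    ]


ref = "ACGTGGTAGAGACC"  # hypothetical reference segment
smp = "ACGTGATAGAGTCC"  # hypothetical individual's segment
print(find_snps(ref, smp))  # [(6, 'G', 'A'), (12, 'A', 'T')]
```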
MOLECULAR MODELLING
Cn3D
• Software from the United States National Library of Medicine, used to view three-dimensional structures from NCBI's Entrez retrieval service.
• It simultaneously displays structure, sequence and alignment.
• What sets Cn3D apart from other software is its ability to correlate structure and sequence information. For example, a scientist can quickly find the residues in a crystal structure that correspond to known disease mutations, or conserved active-site residues from a family of sequence homologs.
• Cn3D displays structure-structure alignments along with their structure-based sequence alignments, to emphasize which regions of a group of related proteins are most conserved in structure and sequence.
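Cn3D does this correlation internally. The underlying idea, going from a residue number in the sequence (such as a known mutation site) to a position in the structure, can be sketched with a PDB-format file, which stores one atom per ATOM/HETATM record in fixed columns. The column positions below follow the PDB file-format specification. The function names and the two coordinate lines are hypothetical examples, not data from a real entry.

```python
from typing import Dict, Iterable, Optional, Tuple

Coord = Tuple[float, float, float]


def parse_atom(line: str) -> Optional[Dict]:
    """Parse one ATOM/HETATM record using fixed PDB column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),   # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],             # column 22: chain identifier
        "res_seq": int(line[22:26]),   # columns 23-26: residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def residue_ca(lines: Iterable[str], chain: str, res_seq: int) -> Optional[Coord]:
    """Return the C-alpha coordinates of one residue, e.g. a known mutation site."""
    for line in lines:
        atom = parse_atom(line)
        if atom and atom["chain"] == chain and atom["res_seq"] == res_seq and atom["atom"] == "CA":
            return atom["xyz"]
    return None


# Hypothetical coordinate records, for illustration only.
lines = [
    "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 50.00",
    "ATOM      2  CA  MET A   1      39.010  20.120  28.500  1.00 50.00",
]
print(residue_ca(lines, "A", 1))
```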
Drug docking:
In the field of molecular modeling, docking is a method that predicts the preferred orientation of one molecule relative to a second when they bind to each other to form a stable complex.
Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules.
Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets in order to predict, in turn, the affinity and activity of the small molecule.
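Binding affinity is commonly reported as a dissociation constant Kd, which is related to the standard binding free energy by ΔG° = RT ln(Kd / 1 M). A minimal sketch of that conversion follows, useful for reading docking scores and measured affinities on the same scale. This is a standard thermodynamic relation, not a docking algorithm. The function name and the 298.15 K default temperature are assumptions.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant (mol/L).

    Uses dG = R * T * ln(Kd / 1 M): tighter binding (smaller Kd) gives a more
    negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


for kd in (1e-6, 1e-9):  # micromolar and nanomolar binders
    print(f"Kd = {kd:g} M -> dG = {binding_free_energy(kd):.1f} kJ/mol")
```

At room temperature, a micromolar binder corresponds to roughly -34 kJ/mol and a nanomolar binder to roughly -51 kJ/mol. Each thousand-fold gain in affinity therefore adds about 17 kJ/mol of binding free energy.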
Applications:
Drug design
• Understanding how structures bind other molecules (function)
• Designing inhibitors
• Docking and structure modelling
Conclusion:
During the past decade there have been tremendous technical advances
in the life and medical sciences. Nowhere have such advances been more
dramatic than in the fields of genome sequencing and protein
identification. Along with these advances has come a flood of genetic
and biochemical data. With public databases now containing billions of data entries, a robust analytical approach to handling these data and assessing their biological significance has become paramount.
References:
Hooman H. Rashidi and Lukas K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine, pp. 1-33.
Imtiyaz Alam Khan, Elementary Bioinformatics, pp. 1-40.
K. Kasturi and K. Sri Lakshmi, Bioinformatics: A Practical Manual.
www.ncbi.nlm.nih.gov
www.bioinformatics.org
www.valdo.com
www.pharmainfo.com
THANK YOU