Protein sequence database

SEMINAR ON BIOINFORMATICS
BY S. JHANSI RANI
M.PHARMACY II SEMESTER
DEPARTMENT OF INDUSTRIAL PHARMACY
UNIVERSITY COLLEGE OF PHARMACEUTICAL SCIENCES
KAKATIYA UNIVERSITY, WARANGAL

Contents:
• Introduction
• Databases
• DNA sequence data
• Biological data
• Molecular biology
• DNA and RNA
• Bioinformatics software
• Personalized medicine
• Single Nucleotide Polymorphism
• Molecular modelling
• Drug docking
• Applications
• Conclusion
• References

BIOINFORMATICS
• Bioinformatics has been defined as a means for analyzing, comparing, graphically displaying, modeling, storing, systemizing, searching, and ultimately distributing biological information, which includes sequences, structures and functions.
• Bioinformatics is a serious attempt to understand what it means when we say that genes code for physiological traits such as intelligence, brown hair, or susceptibility to cancer.

DATABASES
A database is a collection of information organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to type of content: bibliographic, full-text, numeric, and image. If the information is biological data, such as nucleotide or protein sequences, their secondary or 3D structures, metabolic pathways, microarray data or scientific publications, the database is called a biological database.

• Primary biological databases give information about sequence or structure alone.
  Primary nucleotide sequence databases: GenBank, EMBL, DDBJ
  Primary protein sequence databases: PIR-PSD, Swiss-Prot, TrEMBL
  Primary structure database: PDB
• Secondary structure databases give information on the classification of proteins based on their structure.
  Secondary structure databases: CATH, SCOP, DSSP, FSSP, DALI

Nucleotide sequence database: GenBank
Protein sequence database: Swiss-Prot
Structure database: PDB

GenBank: the nucleotide sequence database
This is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

UniProtKB/Swiss-Prot: the protein sequence database
This is a manually annotated protein knowledgebase established and maintained by the UniProt Consortium, a collaboration between the Swiss Institute of Bioinformatics (SIB) and the Department of Bioinformatics and Structural Biology of Geneva University, the European Bioinformatics Institute (EBI), and the Georgetown University Medical Center's Protein Information Resource (PIR).

PDB: the structural database (Protein Data Bank)
• This is an international archive of 3D structural information for biological macromolecules.
• PDB is managed by the RCSB (Research Collaboratory for Structural Bioinformatics).
• An entry describes the secondary-structure features of the molecule. In particular, it specifies the positions of alpha helices, beta sheets and turns in the protein, and it specifies tertiary interactions such as disulphide bonds, hydrogen bonds and salt bridges.

Accessing biological databases
One can access biological information from databases through:
• Entrez
• SRS (Sequence Retrieval System)
• DBGET

Entrez
The Entrez global query cross-database search system is a powerful federated search engine, or web portal, that allows users to search many discrete health sciences databases at the NCBI website. Entrez can efficiently retrieve related sequences, structures, and references. It can also provide views of gene and protein sequences and chromosome maps.

SRS
SRS (Sequence Retrieval System) is a system for integrating heterogeneous databases.
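
Of the access systems listed here, Entrez can also be queried from a program through NCBI's E-utilities web service. The following is a minimal sketch, assuming the `efetch` endpoint with its `db`, `id`, `rettype` and `retmode` parameters. The helper name `efetch_url` and the accession string are illustrative choices, not part of the original seminar, and the snippet only builds the request URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, record_id: str, rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch request URL for one record (hypothetical helper)."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# "NP_000508.1" is used purely as an example accession string.
url = efetch_url("protein", "NP_000508.1")
print(url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, would request the record in FASTA format; NCBI's usage guidelines on request rates should be checked before automating such calls.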

DBGET
DBGET is an integrated database retrieval system for major biological databases.

What Kinds of Information?
Bioinformatics deals with any type of data that is of interest to biologists:
• DNA and protein sequences
• Images of microarrays
• Raw data collected from any type of field or laboratory experiment
• Articles from the literature and databases of citations

COMPONENTS OF BIOINFORMATICS
• Creation of databases: this involves the organization, storage and management of biological data sets. The databases are accessible to researchers, who can consult the existing information and submit new entries.
• Development of algorithms and statistics: this involves the development of tools and resources to determine the relationships among the members of large data sets, e.g. comparison of new protein sequence data with already existing protein sequences.
• Analysis of data and interpretation: the appropriate use of databases to analyse the data and interpret the results in a biologically meaningful manner.

Molecular Biology
• Signals are received at the cell surface and eventually travel to the nucleus.
• Transcription factors convert the signal into a change in the expression of a gene.
• The gene transcripts (mRNA) are translated into proteins in the cytoplasm, where they can effect further changes in the cell.

The Overview of DNA and RNA
Deoxyribonucleic acid (DNA) is a macromolecular chain of nucleotides that serves as the basic carrier of genetic information and is able to self-replicate. DNA can be represented as a sequence of nucleotide bases.

DNA
DNA sequences are typically thousands to millions of bases long. DNA usually consists of two strands of complementary nucleotide sequences that are base-paired to each other. DNA in humans forms a linear chain, but DNA can also form a circular molecule.
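
The base pairing between the two complementary strands can be sketched in a few lines of code. This is only an illustration, assuming a strand written with the four letters A, C, G and T; the function names `complement` and `reverse_complement` are hypothetical, and the 35-base strand is the hypothetical double-stranded example used in this section.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the partner strand, written in the same left-to-right order."""
    strand = strand.upper()
    unknown = set(strand) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected base(s): {sorted(unknown)}")
    return strand.translate(PAIRING)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(strand)[::-1]


top = "ACGTGGTAGAGACCCTGTGTGATAGACCACGGGTA"
print(top)
print(complement(top))
```

The second printed line is the strand that pairs, base by base, with the first. `reverse_complement` is included because partner strands are conventionally read in the opposite direction.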

A hypothetical double-stranded DNA molecule can be represented as

ACGTGGTAGAGACCCTGTGTGATAGACCACGGGTA
TGCACCATCTCTGGGACACACTATCTGGTGCCCAT

because A pairs with T and C pairs with G, and vice versa. Here A = adenine, C = cytosine, G = guanine and T = thymine.

RNA
RNA is represented in the same way as DNA, except that T is replaced by U, which stands for the uracil nucleotide.

Organisms are further classified into two types:
• Eukaryotes: higher-order organisms whose DNA is enclosed in a cell nucleus, e.g. humans.
• Prokaryotes: organisms whose DNA is not enclosed in a nucleus, e.g. bacteria.

GENOMICS
• Genome: the complete set of genetic instructions for making an organism.
• Genomics attempts to analyze or compare the entire genetic content of species.
• About 3 billion chemical base pairs make up human DNA.
• There are roughly 20,000 to 25,000 protein-coding genes (earlier estimates put the figure at about 30,000).
• There are about 100,000 proteins.
• Changes in a single base pair are responsible for many defects.

Genomics
• Comparative genomics: understanding the genomes of different species of organisms.
• Functional genomics: identification of genes and their respective functions.
• Structural genomics: determination of protein structures, supporting predictions about the functions of proteins.

BIOINFORMATICS SOFTWARE

GCG
• Genetics Computer Group
• The Wisconsin Package for Sequence Analysis
• Consists of 130+ integrated programs
• Web-based (SeqWeb), command-line and X Window analysis interfaces
• Program categories: database searching and retrieval, comparison, protein analysis, mapping, pattern recognition

EMBOSS
EMBOSS (European Molecular Biology Open Software Suite) provides more than 100 bioinformatics programs, covering tasks such as:
• Sequence alignment
• Database search with a sequence pattern
• Protein motif identification

RASMOL
• RasMol is a molecular graphics program intended for the visualization of proteins, nucleic acids and small molecules.
• It displays the molecule on the screen in a variety of color schemes and molecule representations.
• The loaded molecule can be shown as wireframe bonds, cylinders, stick bonds, space-filling spheres or molecular ribbons.
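
Sequence alignment, listed among the software functions in this section, can be illustrated with a small local alignment scorer. The sketch below uses the classic Smith-Waterman dynamic-programming method with a linear gap penalty; it is not the code of any package named here, and the function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0
    for ch_a in a:
        curr = [0]              # first column is always 0
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            cell = max(0, diag, up, left)  # 0 lets an alignment restart anywhere
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTACGTTT", "GGACGTGG"))
print(local_alignment_score("AAAGGG", "AAACGGG"))
```

Only the score is returned, which keeps memory use to two rows of the table; recovering the aligned region itself would require storing the full table and tracing back from the best cell.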

BLAST
• Basic Local Alignment Search Tool (BLAST)
• A collection of software programs
• Software version 2.1.13, offered by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health
• Compares nucleotide or protein sequences to sequence databases
• Finds regions of local similarity between sequences
• Calculates the statistical significance of matches
• Helps infer functional relationships between sequences and identify members of gene families

PERSONALISED MEDICINE
• A lifelong, individually tailored health care approach to the detection, prevention and treatment of disease, based on knowledge of an individual's precise genetic profile.
• The promise of pharmacogenomics is that both the choice of a drug and its dose will be determined by the individual's genetic make-up, leading to personalised, more efficacious and less harmful drug therapy.

SINGLE NUCLEOTIDE POLYMORPHISM (SNP)
• The sequence of bases in DNA varies from person to person, resulting in the individual characteristics of every human being.
• SNPs are variations in DNA at a single base. Scattered throughout the human genome are millions of these discrete, one-letter variations.
• Most SNPs are benign, with no effect on gene structure or expression.
• A subset of these variations provides crucial links to disease-causing genes, either because they directly alter a gene's activity or because they help pinpoint the location of a disease-related gene.
• SNPs are also found in genes for drug-metabolizing enzymes, influencing an individual's ability to process a drug properly.
• SNPs that serve as genetic markers for identifying disease can lead to personalized medicines for a wide variety of diseases.

MOLECULAR MODELLING

Cn3D
• Software from the United States National Library of Medicine.
• Used to view three-dimensional structures from NCBI's Entrez retrieval service.
• It simultaneously displays structure, sequence, and alignment.
• What sets Cn3D apart from other software is its ability to correlate structure and sequence information. For example, a scientist can quickly find the residues in a crystal structure that correspond to known disease mutations, or to conserved active-site residues from a family of sequence homologs.
• Cn3D displays structure-structure alignments along with their structure-based sequence alignments, to emphasize which regions of a group of related proteins are most conserved in structure and sequence.

Drug docking:
In the field of molecular modeling, docking is a method that predicts the preferred orientation of one molecule relative to a second when the two are bound to each other to form a stable complex. Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules. Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets in order to predict, in turn, the affinity and activity of the small molecule.

Applications:
Designing drugs
• Understanding how structures bind other molecules (function)
• Designing inhibitors
• Docking and structure modeling

Conclusion:
During the past decade there have been tremendous technical advances in the life and medical sciences. Nowhere have such advances been more dramatic than in the fields of genome sequencing and protein identification. Along with these advances has come a flood of genetic and biochemical data. With public databases now containing billions of data entries, the need for a robust, analytical approach to handling these data with respect to their biological significance becomes paramount.

References:
• Hooman H. Rashidi and Lukas K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine, pp. 1-33.
• Imtiyaz Alam Khan, Elementary Bioinformatics, pp. 1-40.
• K. Kasturi and K. Sri Lakshmi, Bioinformatics: A Practical Manual.
• www.ncbi.nlm.nih.gov
• www.bioinformatics.org
• www.valdo.com
• www.pharmainfo.com

THANK YOU
