Dhananjay Bhole
Project Student, Bioinformatics Centre, University of Pune
Email: [email protected]
Contact no. 9850123212

Topics to be covered
Bioinformatics: definition, history, scope, goal, importance and limitations
Computers in biology and medicine: what is a computer; minicomputers and mainframe computers; applications of computers in biology
Database concepts
Biological databases: types of biological databases, DNA databases, protein databases
What are genomics and proteomics
Human Genome Project: history and importance

What is Bioinformatics
Definition by the NIH Biomedical Information Science and Technology Initiative Consortium:
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

History
A brief chronology of landmark events in the development of bioinformatics:
The first major bioinformatics project: in 1965 Margaret Dayhoff developed the first protein sequence database, the Atlas of Protein Sequence and Structure.
In the early 1970s, the Brookhaven National Laboratory established the Protein Data Bank (PDB) to archive three-dimensional protein structures.
The first sequence alignment algorithm: Needleman and Wunsch, 1970.
The first protein secondary structure prediction algorithm: Chou and Fasman, 1974.

History cont…
In the early 1980s, GenBank was established.
Fast database searching algorithms were developed, such as FASTA by William Pearson and BLAST by Stephen Altschul and coworkers.
The start of the Human Genome Project in the late 1980s provided a major boost for the development of bioinformatics.
The development and increasingly widespread use of the Internet in the 1990s made instant access to, and exchange and dissemination of, biological data possible.

History cont…
The fundamental reason bioinformatics gained prominence as a discipline was the advance of genome studies, which produced unprecedented amounts of biological data.
The explosion of genomic sequence information created a sudden demand for efficient computational tools to manage and analyze the data.
The development of these tools drew on knowledge from a wide range of disciplines, including mathematics, statistics, computer science, information technology, and molecular biology.
The merger of these disciplines created an information-oriented field in biology, now known as bioinformatics.

Scope
Bioinformatics consists of two complementary subfields: the development of computational tools and databases, and the application of these tools and databases to generate biological knowledge that helps us better understand living systems.
Tool development includes writing software for sequence, structural, and functional analysis, and the construction and curation of biological databases.

Scope cont…
These tools are used in three areas of genomic and molecular biological research:
Molecular sequence analysis
Molecular structural analysis
Molecular functional analysis
The analysis of biological data often raises new problems and challenges, which in turn spur the development of new and better computational tools.

Scope cont…
Sequence analysis includes sequence alignment, sequence database searching, motif and pattern discovery, gene and promoter finding, reconstruction of evolutionary relationships, and genome assembly and comparison.
Structural analysis includes protein and nucleic acid structure analysis, comparison, classification, and prediction.
Functional analysis includes gene expression profiling, protein–protein interaction prediction, protein subcellular localization prediction, metabolic pathway reconstruction, and simulation.

Goal
The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular level. By analyzing raw molecular sequence and structural data, bioinformatics research can generate new insights and provide a “global” perspective of the cell.
The functions of a cell can be better understood by analyzing sequence data. Cellular functions are mainly performed by proteins, whose capabilities are ultimately determined by their sequences. Solving functional problems with sequence-based, and sometimes structure-based, approaches has therefore proved to be a fruitful endeavor.

Applications
Apart from molecular biology, bioinformatics has a major impact on many areas of biotechnology and biomedical science, for example knowledge-based drug design, forensic DNA analysis, and agricultural biotechnology.
Computational studies of protein–ligand interactions provide a rational basis for the rapid identification of novel leads for synthetic drugs. Knowledge of the three-dimensional structure of a protein allows molecules to be designed that bind to the receptor site of the target protein with high affinity and specificity. Such an informatics-based approach can significantly reduce the time and cost needed to develop drugs with higher potency, fewer side effects, and less toxicity.

Applications cont…
In forensics, results from molecular phylogenetic analysis have been accepted as evidence in criminal courts. For example, Bayesian and likelihood-based statistical methods have been applied to DNA evidence to assess forensic identity.
Genomics and bioinformatics are also poised to revolutionize healthcare through personalized and customized medicine.
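The likelihood-based reasoning used for forensic identity can be illustrated with the simplest textbook case. The Python sketch below is a minimal illustration, not a casework method: it assumes a single-source DNA profile, Hardy-Weinberg equilibrium, loci inherited independently, and made-up allele frequencies. Real forensic statistics also account for population substructure, relatives, mixed samples and uncertainty in the frequency estimates. The function names are hypothetical.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency at one locus.

    Only p given: homozygote, frequency p^2.  p and q given: heterozygote, frequency 2pq.
    """
    return p * p if q is None else 2 * p * q

def likelihood_ratio(profile):
    """LR = P(evidence | suspect is the source) / P(evidence | a random unrelated person is).

    For a single-source match the numerator is 1 and the denominator is the
    random match probability: the product of per-locus genotype frequencies,
    which assumes the loci are independent.
    """
    random_match_probability = prod(genotype_frequency(*alleles) for alleles in profile)
    return 1 / random_match_probability

def posterior_odds(prior_odds, lr):
    """Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds."""
    return lr * prior_odds

# Hypothetical allele frequencies: one heterozygous locus, one homozygous locus
profile = [(0.1, 0.2), (0.5,)]
print(likelihood_ratio(profile))  # approximately 100, since RMP = 0.04 x 0.25 = 0.01
```

With more loci the product shrinks quickly, which is why multi-locus profiles can give very large likelihood ratios; the Bayesian step then combines that ratio with whatever prior odds the court considers.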
Bioinformatics tools are being used in agriculture as well. Plant genome databases and gene expression profile analyses have played an important role in developing new crop varieties with higher productivity and greater resistance to disease.

Limitations
Bioinformatics depends on experimental science to produce the raw data for analysis.
Bioinformatics predictions are not formal proofs of any concept. They do not replace traditional experimental research methods that actually test hypotheses.
The quality of bioinformatics predictions depends on the quality of the data and the sophistication of the algorithms used.
Bioinformatics is by no means a mature field. Most algorithms lack the capability and sophistication to truly reflect reality, and they often make predictions that make no sense in a biological context. Errors in sequence alignment, for example, can affect the outcome of structural or phylogenetic analysis.
The outcome of computation also depends on the computing power available. Many accurate but exhaustive algorithms cannot be used because they are too slow, so less accurate but faster algorithms have to be used instead. This is a necessary trade-off between accuracy and computational feasibility.

Computers in biology and medicine
What is a computer: A computer is an automatic electronic device that performs arithmetic and logical operations.
Types of computers: microcomputers, minicomputers (including workstations) and mainframe computers.
Microcomputers: common small computers for personal use, e.g. personal desktop or laptop computers.
Minicomputers: larger computers or workstations used for commercial purposes, e.g. servers in a small computer lab. Their operating systems and architectures arose in the 1970s and 1980s, but minicomputers are generally not considered mainframes.
Mainframe computers: Mainframes (often colloquially called "Big Iron") are computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, enterprise resource planning (ERP), and financial transaction processing. Most large-scale computer system architectures were firmly established in the 1960s.

Applications of computers in biology
To store vast, diverse, and complex life sciences data
To provide fast and easy access to biological data
To make biological information more understandable and useful through visualization tools
To analyze biological data with mathematical and computational approaches, addressing theoretical and experimental questions in biology

Basic database concepts
Any form of information, whether on paper or in electronic form, may be referred to as data. In computing, data means any electronic file, whatever its format: database records, text, images, audio and video. Everything the computer reads and writes can be considered data, except the instructions of a program that are executed (software).
The term data is the plural of "datum," a single item of data. Technically, data are raw facts and figures, such as orders and payments, which are processed into information, such as balance due and quantity on hand.
A common misconception is that software is also data. Software is executed, or run, by the computer; data are "processed." Thus, software causes the computer to process data.

Basic database concepts cont…
What is a database? A database is a computerized archive used to store and organize data so that information can be retrieved easily by a variety of search criteria. The chief objective in developing a database is to organize data into a set of structured records that enable easy retrieval of information.
Database management systems (DBMS) are collections of tools used to manage databases.
Four basic functions are performed by all DBMS:
Create, modify, and delete data structures, e.g. tables
Add, modify, and delete data
Retrieve data selectively
Generate reports based on data

Database components
Databases are composed of related tables, and tables are composed of fields and records.
Field: an area within a record reserved for a specific piece of data, e.g. customer number, customer name, street address, city, state, phone, current balance. A field is defined by:
Field name
Data type
  Character: text, including such things as telephone numbers and zip codes
  Numeric: numbers that can be manipulated with math operators
  Date: calendar dates that can be manipulated mathematically
Size: the amount of space reserved for storing the data

Database components cont…
Record: the collection of values for all the fields pertaining to one entity, i.e. a person, product, company, transaction, etc.
Table: a collection of related records, e.g. employee, product, customer, and orders tables. In a table, records are represented as rows and fields as columns.
Relationships: three types of relationship can exist between tables:
One-to-One
One-to-Many
Many-to-Many
The most common relationships in relational databases are One-to-Many and Many-to-Many.
Key fields: for two tables to be related, they must share a common field. In a One-to-Many relationship, the common field (key field) in the "one" table must be a primary key; the same field in the "many" table is called the foreign key.
Primary key: a field, or a combination of two or more fields, whose value uniquely identifies each record.
Foreign key: in the "many" records of an Order table, the foreign key identifies the unique record in the Customer table with which each order is associated.
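The four DBMS functions and the primary/foreign key relationship described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a production schema: the customer and orders tables, their columns and the sample names are hypothetical, chosen to mirror the Customer/Order example in the text. The table is called orders because ORDER is a reserved word in SQL.

```python
import sqlite3

def customer_orders_demo():
    """Walk through the four basic DBMS functions on a One-to-Many schema."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    # 1. Create data structures. customer_id is the primary key of the "one" table;
    #    orders.customer_id is the foreign key in the "many" table.
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customer(customer_id), amount REAL NOT NULL)"
    )

    # 2. Add, modify, and delete data (each row is a record, each column a field)
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai")])
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])
    con.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 11")
    con.execute("DELETE FROM orders WHERE order_id = 12")

    # The foreign key rejects an order for a customer that does not exist
    try:
        con.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
        fk_enforced = False
    except sqlite3.IntegrityError:
        fk_enforced = True

    # 3. Retrieve data selectively
    in_pune = con.execute("SELECT name FROM customer WHERE city = ?", ("Pune",)).fetchall()

    # 4. Generate a report: order count and total per customer
    #    (LEFT JOIN keeps customers who have no orders)
    report = con.execute(
        "SELECT c.name, COUNT(o.order_id), COALESCE(SUM(o.amount), 0) "
        "FROM customer c LEFT JOIN orders o ON o.customer_id = c.customer_id "
        "GROUP BY c.customer_id ORDER BY c.customer_id"
    ).fetchall()
    con.close()
    return in_pune, report, fk_enforced

print(customer_orders_demo())
```

After the update and delete, the report should show two orders for Asha and none for Ravi, and the order pointing at a non-existent customer is rejected by the foreign key constraint.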
Biological databases
Need: As the volume of genomic data grows, sophisticated computational methods are required to manage the data deluge. The very first challenge of the genomics era is to store and handle this staggering volume of information by establishing and using computer databases. Developing databases to handle the vast amount of molecular biological data is therefore a fundamental task of bioinformatics.

Types of biological databases
Overview: There are over 1,000 public and commercial biological databases. They usually contain genomics and proteomics data, and databases are also used in taxonomy. The data are mainly nucleotide sequences of genes or amino acid sequences of proteins, together with information about function, structure, location on the chromosome, clinical effects of mutations, and similarities between biological sequences.

Biological databases cont…
Biological databases are generally of three types:
Sequence databases
Structure databases
Functional databases
They can further be classified as DNA databases and protein databases.

Biological databases cont…
The most important public databases for molecular biology:
Primary sequence databases
Meta-databases
Genome browsers
Specialized databases
Expression, regulation and pathway databases
Protein sequence databases
Protein structure databases
Microarray databases
Protein-protein interaction databases
Reference: Most important public databases for molecular biology, http://www.kokocinski.net/bioinformatics/databases.php

DNA sequence databases
Some well-known DNA sequence databases: NCBI, EMBL, DDBJ.
NCBI: the National Center for Biotechnology Information, part of the National Library of Medicine, National Institutes of Health, USA.
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for a better understanding of the molecular processes affecting human health and disease.

DNA sequence databases cont…
EMBL: the European Molecular Biology Laboratory, headquartered in Heidelberg, Germany. Its nucleotide sequence database is maintained by the European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK. It archives up-to-date, detailed information about biological macromolecules such as nucleotide and protein sequences.

DNA sequence databases cont…
DDBJ (DNA Data Bank of Japan): began DNA data bank activities in earnest in 1986 at the National Institute of Genetics (NIG), with the endorsement of the Ministry of Education, Science, Sport and Culture. It also provides users worldwide with many tools for data retrieval and analysis, developed at DDBJ and elsewhere.

DNA sequence databases cont…
Database collaboration: NCBI, EMBL and DDBJ collaborate internationally, exchanging data and information over the Internet and regularly holding two meetings, the International DNA Data Banks Advisory Meeting and the International DNA Data Banks Collaborative Meeting. As a result, the three data banks share virtually the same data at any given time.

Protein sequence databases
Protein sequence databases: Swiss-Prot, TrEMBL.
Swiss-Prot: a manually curated database of protein sequences. Swiss-Prot was created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. It strives to provide reliable protein sequences with a high level of annotation, such as the description of a protein's function, its domain structure, post-translational modifications, variants, etc.
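Swiss-Prot entries are commonly downloaded as FASTA text. In UniProtKB FASTA files the header line begins with a database code, sp for reviewed Swiss-Prot entries and tr for unreviewed TrEMBL entries, followed by the accession and the entry name. The minimal Python sketch below splits FASTA text into records and reads that prefix. It ignores the rest of the header (organism, gene name and so on), the function names are illustrative rather than part of any UniProt tool, and the example records are made up.

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs; sequences may span several lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def describe_uniprot_header(header):
    """Read the db|accession|entry_name prefix of a UniProtKB FASTA header."""
    db, accession, rest = header.split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    section = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}.get(db, "unknown")
    return {"accession": accession, "entry_name": entry_name, "section": section}

# Hypothetical records in UniProtKB header layout (accessions and sequences are made up)
example = """>sp|X00001|DEMO1_EXAMPLE Demo protein 1 OS=Example organism
MKTAYIAKQR
QISFVKSHFS
>tr|X00002|X00002_EXAMPLE Demo protein 2 OS=Example organism
MSTNPKPQRK
"""
for header, seq in parse_fasta(example):
    print(describe_uniprot_header(header)["section"], len(seq))
```

The first record is reported as a reviewed Swiss-Prot entry with a 20-residue sequence joined from two lines, and the second as an unreviewed TrEMBL entry.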
The UniProt consortium is a collaboration between the Swiss Institute of Bioinformatics, the European Bioinformatics Institute and the Protein Information Resource (PIR). Together these groups produce the UniProt Knowledgebase (UniProtKB), the world's most comprehensive catalogue of information on proteins.

Protein sequence databases cont…
TrEMBL (Translated EMBL): a database of protein sequences translated from the coding sequences in the EMBL nucleotide sequence database. It archives the same kind of information as Swiss-Prot, but its entries are annotated automatically rather than manually curated.

Genomics and proteomics
What are a gene, a genome and genomics?
Gene: a segment of DNA on a chromosome that codes for one or more functional proteins.
Genome: the gene complement of an organism. A genome sequence comprises the information of the entire genetic material of an organism.
Genomics: the science that studies entire genomes, including gene organization such as gene order, gene arrangement, gene ontology, etc. The goal of genomics is to determine the complete DNA sequence of all the genetic material in an organism's genome.

Structural genomics and functional genomics
Structural genomics: the systematic effort to obtain a complete structural description of a defined set of molecules, ultimately of an organism's entire proteome. Structural genomics projects apply X-ray crystallography and NMR spectroscopy in a high-throughput manner, and also use bioinformatics (in silico) approaches to model the structures of nucleic acids and proteins.
Functional genomics: aims to determine the function of the proteome (the protein complement encoded by an organism's entire genome). It expands the scope of biological investigation from studying single genes or proteins to studying all genes or proteins at once in a systematic fashion, using large-scale experimental methods combined with statistical analysis of the results.
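Since a gene is a stretch of DNA that codes for a protein, one of the first computational steps in annotating a genome sequence is to look for open reading frames (ORFs): runs of codons that start with ATG and end at a stop codon. The Python sketch below scans the three forward reading frames and translates with the standard genetic code. It is a simplified illustration under stated assumptions: it ignores the reverse strand, introns and alternative start codons, and the function names and the min_codons threshold are hypothetical. Practical gene finders add statistical models on top of this kind of scan.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}  # TAA, TAG, TGA

def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unrecognised codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def find_orfs(dna, min_codons=2):
    """Return (start, end) slices of ATG...stop ORFs in the three forward frames.

    end includes the stop codon; min_codons counts the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

seq = "CCATGAAATTTTAGCC"
for start, end in find_orfs(seq):
    print(start, end, translate(seq[start:end]))  # 2 14 MKF*
```

The toy sequence contains one ORF in the third frame (ATG AAA TTT TAG), which translates to Met-Lys-Phe followed by a stop.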
Comparative genomics
Comparative genomics is the analysis and comparison of genomes from different species, to better understand how species have evolved and to determine the function of genes and noncoding regions of the genome.
Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (called exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.
Comparative genomics uses computer programs that can line up multiple genomes and look for regions of similarity among them, e.g. BLAST. It is also applied to study the phylogenetic relationships and evolution of different organisms, using tools such as PHYLIP.

Proteome and proteomics
Proteome: the protein complement expressed by a genome. While the genome is largely static, the proteome changes continually in response to external and internal events.
Proteomics: the study of the entire set of proteins produced by an organism and how they interact. It encompasses the identification and quantification of proteins, and the effects of their modifications, interactions, activities, and functions, including during disease states and treatment.

Human Genome Project
What was the Human Genome Project?
The Human Genome Project (HGP) was the international, collaborative research program whose goal was the complete mapping and understanding of all the genes of human beings. The HGP was the natural culmination of the history of genetics.
In the HGP, researchers deciphered the human genome in three major ways:
1. determining the order, or "sequence," of all the bases in our genome's DNA;
2. making maps that show the locations of genes for major sections of all our chromosomes;
3. producing what are called linkage maps, complex versions of the type originated in early Drosophila research, through which inherited traits (such as those for genetic disease) can be tracked over generations.

Human Genome Project cont…
Early HGP analyses suggested that there are probably between 30,000 and 40,000 human genes; analysis of the finished sequence later lowered the estimate of protein-coding genes to roughly 20,000 to 25,000. The completed human sequence now makes it possible to identify their locations. This ultimate product of the HGP has given the world a resource of detailed information about the structure, organization and function of the complete set of human genes.
The International Human Genome Sequencing Consortium published the first draft of the human genome in the journal Nature in February 2001, with the sequence of the entire genome's three billion base pairs some 90 percent complete. The full sequence was completed and published in April 2003.
… from the outset. Another major component of the HGP, and an ongoing component of the National Human Genome Research Institute (NHGRI), is therefore devoted to the analysis of the ethical, legal and social implications (ELSI) of our newfound genetic knowledge, and the subsequent development of policy options for public consideration.

Techniques in HGP
The tools created through the HGP also help to characterize the entire genomes of several other organisms used extensively in biological research, such as mice, fruit flies and flatworms. These efforts support each other, because most organisms have many similar, or "homologous," genes with similar functions.
These techniques include:
DNA sequencing
The use of restriction fragment length polymorphisms (RFLP)
Yeast artificial chromosomes (YAC)
Bacterial artificial chromosomes (BAC)
The polymerase chain reaction (PCR)
Electrophoresis
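The RFLP technique rests on a simple idea: a restriction enzyme cuts DNA only at its recognition site, so a single base change that destroys (or creates) a site changes the lengths of the fragments that electrophoresis separates. The Python sketch below simulates a digest of linear DNA. It assumes EcoRI, which recognises GAATTC and cuts between the G and the first A; the digest function name and the two alleles are hypothetical, made up for illustration.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting linear DNA at every occurrence of site.

    The defaults model EcoRI, which recognises GAATTC and cuts after the G (G^AATTC);
    cut_offset is the cut position measured from the start of the site.
    """
    dna = dna.upper()
    cuts, start = [], 0
    while (pos := dna.find(site, start)) != -1:
        cuts.append(pos + cut_offset)
        start = pos + 1
    bounds = [0] + cuts + [len(dna)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Two hypothetical alleles that differ by one base in the second EcoRI site (GAATTC -> GATTTC)
allele_a = "AAAAGAATTCAAAAAAGAATTCAAA"
allele_b = "AAAAGAATTCAAAAAAGATTTCAAA"
print(digest(allele_a))  # [5, 12, 8]
print(digest(allele_b))  # [5, 20]
```

On a gel, allele A would give three bands and allele B two; differences of this kind are what let RFLP markers follow inheritance when building linkage maps.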