Bioinformatics

The Genome
The hereditary information that an organism passes to its offspring is represented in each of its cells in the form of DNA molecules. The totality of this information is called the genome of the organism. In humans the genome consists of nucleotides. The genome is the totality of DNA stored in the chromosomes, and it is typical of each species. The genome contains most of the information needed to specify an organism's properties.

Genetics: the study of heredity.

Proteins: a protein is a very large biological molecule composed of a chain of smaller molecules called amino acids.

DNA
DNA was discovered in 1869. Most of the DNA in cells is contained in the chromosomes. DNA is chemically very different from protein. DNA is structured as a double helix consisting of two long strands that wind around a common axis. Each strand is a very long chain of nucleotides of four types: A, C, T and G. The linear ordering of the nucleotides determines the genetic information.

A major task of molecular biology is to:
- extract the information contained in the genomes of different organisms;
- elucidate the structure of the genome;
- apply this knowledge to the diagnosis and, ultimately, the treatment of genetic diseases (about 4000 such diseases in humans have been identified);
- explain the process and mechanisms of evolution by comparing the genomes of different species.

These tasks require the invention of new algorithms, and bioinformatics supports all of the above objectives.

Biopolymer: a macromolecule in a living organism that is formed by linking together many smaller molecules, as a protein is formed from amino acids or DNA from nucleotides.

Sequencing: determining the primary structure of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence, which succinctly summarizes much of the atomic-level structure of the sequenced molecule. Examples include DNA sequencing, RNA sequencing and protein sequencing.
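Because a DNA strand is a linear ordering of four nucleotide types, a sequence can be handled on a computer as a string over the alphabet A, C, G, T. The following is a minimal Python sketch, assuming plain upper- or lower-case strings; real sequence files also use ambiguity codes such as N, which this sketch simply rejects. The function names `validate_dna`, `base_composition` and `gc_content` are illustrative, not from any library, and GC content is included only as one example of a simple summary statistic computed from a sequence.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def validate_dna(seq: str) -> str:
    """Upper-case the sequence and reject anything outside A, C, G, T."""
    seq = seq.upper()
    bad = set(seq) - DNA_ALPHABET
    if bad:
        raise ValueError(f"non-DNA characters: {sorted(bad)}")
    return seq


def base_composition(seq: str) -> dict[str, int]:
    """Count how often each of the four nucleotide types occurs."""
    counts = Counter(validate_dna(seq))
    return {base: counts[base] for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    comp = base_composition(seq)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


if __name__ == "__main__":
    s = "atgcgcta"
    print(base_composition(s))  # {'A': 2, 'C': 2, 'G': 2, 'T': 2}
    print(gc_content(s))  # 0.5
```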
The Human Genome Project (HGP) is an international scientific research project with the goal of determining the sequence of chemical base pairs that make up human DNA, and of identifying and mapping all of the genes of the human genome from both a physical and a functional standpoint. It remains the world's largest collaborative biological project. Put simply, its goal was the complete mapping and understanding of all the genes of human beings; all our genes together are known as our "genome". The HGP was the natural culmination of the history of genetics research. It has revealed that there are about 20,500 human genes, and the completed human sequence now makes it possible to identify their locations. This ultimate product of the HGP has given the world a resource of detailed information about the structure, organization and function of the complete set of human genes.

When the Human Genome Project began in 1990, it was understood that to meet the project's goals the speed of DNA sequencing would have to increase and its cost would have to come down. Over the life of the project virtually every aspect of DNA sequencing was improved. It took the project approximately four years to sequence its first billion bases but just four months to sequence the second billion. During January 2003 alone, 1.5 billion bases were sequenced. As the speed of DNA sequencing increased, the cost decreased from 10 dollars per base in 1990 to 10 cents per base at the conclusion of the project in April 2003. Although the Human Genome Project is officially over, improvements in DNA sequencing continue to be made. Researchers are experimenting with new sequencing methods that have the potential to sequence a human genome in just a matter of weeks for a few thousand dollars. DNA sequencing performed on an industrial scale has produced a vast amount of data to analyze.
In August 2005 it was announced that the three largest public collections of DNA and RNA sequences together store one hundred billion bases, representing over 165,000 different organisms. As sequence data began to pile up, the need for new and better methods of sequence analysis became critical. Bioinformatics is the branch of biology that is concerned with the acquisition, storage, and analysis of the information found in nucleic acid and protein sequence data. Computers and bioinformatics software are the tools of the trade.

Genetic data represent a treasure trove for researchers and companies interested in how genes contribute to our health and well-being. Almost half of the genes identified by the Human Genome Project have no known function. Researchers are using bioinformatics to identify genes, establish their functions, and develop gene-based strategies for preventing, diagnosing, and treating disease. A DNA sequencing reaction produces a sequence that is several hundred bases long, whereas gene sequences typically run for thousands of bases. The largest known gene is the one associated with Duchenne muscular dystrophy; it is approximately 2.4 million bases in length. In order to study genes, scientists therefore first assemble long DNA sequences from a series of shorter overlapping sequences. Scientists enter their assembled sequences into genetic databases so that other scientists may use the data. Since the sequences of the two DNA strands are complementary, it is only necessary to enter the sequence of one DNA strand into a database. By selecting an appropriate computer program, scientists can use sequence data to look for genes, get clues to gene functions, examine genetic variation, and explore evolutionary relationships. Bioinformatics is a young and dynamic science: new bioinformatics software is being developed while existing software is continually updated.
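Two points in this paragraph translate directly into code. Because the two strands are complementary, the second strand can always be recomputed from the stored one as its reverse complement; and a longer sequence can be rebuilt from shorter overlapping reads. The sketch below uses the textbook greedy merge under strong simplifying assumptions: error-free reads, all taken from the same strand, no repeats, and a hypothetical minimum overlap of 3 bases. Real assemblers use far more robust methods. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read: they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "TACGTTAG"]))  # ['ATGGCGTACGTTAG']
```

In the usage example, the three hypothetical reads overlap by five bases each and merge back into the single sequence ATGGCGTACGTTAG.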
BIOINFORMATICS:
Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of the sequences and structural information as well as methods to access, search, visualize and retrieve the information. Sequence data can be used to predict the functions of newly identified genes, estimate evolutionary distance in phylogeny reconstruction, determine the active sites of enzymes, construct novel mutations and characterize alleles of genetic diseases, to name just a few uses.

Sequence data facilitates:
- Analysis of the organization of genes and genomes and their evolution.
- Prediction of protein sequence from DNA sequence, which in turn makes it possible to predict protein properties, structure, and function (proteins are rarely sequenced in their entirety today).
- Identification of regulatory elements in genes or RNAs.
- Identification of mutations that lead to disease, etc.

Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. There are three important sub-disciplines within bioinformatics involving computational biology: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information. One of the simpler tasks in bioinformatics concerns the creation and maintenance of databases of biological information.
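Predicting a protein sequence from DNA amounts to reading the coding sequence three bases (one codon) at a time and looking each codon up in the genetic code. The sketch below assumes the standard genetic code, a sequence that is already in frame starting at its first base, and no introns; `translate` and `CODON_TABLE` are hypothetical names used only for illustration.

```python
BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering (TTT, TTC, TTA, TTG, TCT, ...).
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into one-letter amino-acid codes."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):  # complete codons only
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X: codon containing non-ACGT letters
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # MA
    print(translate("ATGGCCTAA", to_stop=False))  # MA*
```

Here ATG is read as methionine (M), GCC as alanine (A), and TAA is a stop codon, which ends translation unless `to_stop=False`.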
Nucleic acid sequences (and the protein sequences derived from them) make up the majority of these databases. While the storage and organization of millions of nucleotides is far from trivial, designing a database and developing an interface whereby researchers can both access existing information and submit new entries is only the beginning. The most pressing tasks in bioinformatics involve the analysis of sequence information. Computational Biology is the name given to this process, and it involves the following:
- Finding the genes in the DNA sequences of various organisms.
- Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
- Clustering protein sequences into families of related sequences and the development of protein models.
- Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.

Data mining is the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better-characterized organisms. For example, new insight into the molecular basis of a disease may come from investigating the function of homologs of the disease gene in model organisms. Equally exciting is the potential for uncovering phylogenetic relationships and evolutionary patterns. The process of evolution has produced DNA sequences that encode proteins with very specific functions. It is possible to predict the three-dimensional structure of a protein using algorithms derived from our knowledge of physics and chemistry and, most importantly, from the analysis of other proteins with similar amino acid sequences.

Definition of Bioinformatics
Roughly, bioinformatics describes any use of computers to handle biological information.
In practice the definition used by most people is narrower; to them, bioinformatics is a synonym for "computational molecular biology", the use of computers to characterize the molecular components of living things. "Classical" bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."

Introduction
Bioinformatics derives knowledge from computer analysis of biological data. These data can consist of the information stored in the genetic code, but also of experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. Bioinformatics and computational biology involve the use or development of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level.
The core principle of these techniques is the use of computing resources to solve problems on scales of magnitude far too great for human discernment. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. Bioinformatics has evolved into a full-fledged multidisciplinary subject that integrates developments in information and computer technology as applied to biotechnology and the biological sciences. Bioinformatics uses computer software tools for database creation, data management, data warehousing, data mining and global communication networking. Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of the sequences and structural information as well as methods to access, search, visualize and retrieve the information. Bioinformatics concerns the creation and maintenance of databases of biological information whereby researchers can both access existing information and submit new entries. Functional genomics, biomolecular structure, proteome analysis, cell metabolism, biodiversity, downstream processing in chemical engineering, and drug and vaccine design are some of the areas in which bioinformatics is an integral component.
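Sequence alignment, the first of the research efforts listed above, is usually introduced through global alignment by dynamic programming, the Needleman-Wunsch algorithm. The sketch below uses a deliberately simple, assumed scoring scheme (match +1, mismatch -1, gap -1); protein alignment in practice uses substitution matrices such as BLOSUM62 and affine gap penalties. The function name is illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table fill costs time proportional to the product of the two sequence lengths, which is why database-scale searches rely on faster heuristics instead of aligning the query against every entry this way.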
Sub-disciplines within bioinformatics
There are three important sub-disciplines within bioinformatics involving computational biology:
- the development of new algorithms and statistics with which to assess relationships among members of large data sets;
- the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and
- the development and implementation of tools that enable efficient access and management of different types of information.

Activities in bioinformatics
We can split the activities in bioinformatics into two areas: (1) the organization and (2) the analysis of biological data.

Organization activity in Bioinformatics
- The creation of databases of biological information
- The maintenance of these databases

Analysis activity in Bioinformatics
- Development of methods to predict the structure and/or function of newly discovered proteins and structural RNA[1] sequences.
- Clustering protein sequences into families of related sequences and the development of protein models.
- Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.

[1] Ribonucleic acid: a long linear polymer of nucleotides found in the nucleus but mainly in the cytoplasm of a cell, where it is associated with microsomes; it transmits genetic information from DNA to the cytoplasm and controls certain chemical processes in the cell.

Aims of Bioinformatics:
The aims of bioinformatics are basically three-fold. The first is the organization of data in such a way that researchers can access existing information and submit new entries as they are produced. While data creation is an essential task, the information stored in these databases is useless unless analysed; thus the purpose of bioinformatics extends well beyond mere volume control. The second is to develop tools and resources that help in the analysis of data.
For example, having sequenced a particular protein, it is of interest to compare it with previously characterized sequences. This requires more than just a straightforward database search: programs such as BLAST must consider what constitutes a biologically significant resemblance. Development of such resources requires extensive knowledge of computational theory as well as a thorough understanding of biology. The third aim is the use of these tools to analyse individual systems in detail, frequently comparing them with a few related ones.

Bioinformatics and its scope:
Bioinformatics uses advances in computer science, information science, computer and information technology, and communication technology to solve complex problems in the life sciences, particularly in biotechnology. Data capture, data warehousing and data mining have become major issues for biotechnologists and biological scientists because of the sudden growth of quantitative data in biology, such as complete genomes of biological species (including the human genome), protein sequences, protein 3-D structures, metabolic pathway databases, cell line and hybridoma information, and biodiversity-related information. Advances in information technology, particularly the Internet, are being used to gather and access ever-increasing information in biology and biotechnology. Functional genomics, proteomics, discovery of new drugs and vaccines, molecular diagnostic kits and pharmacogenomics are some of the areas in which bioinformatics has become an integral part of research and development. Knowledge of multimedia databases, of tools for data analysis, and of the modeling of molecules and biological systems on computer workstations and in a network environment has become essential for any student of bioinformatics. Bioinformatics, a multidisciplinary area, has grown so much that it is now divided into molecular bioinformatics, organal bioinformatics and species bioinformatics.
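The remark about BLAST can be made more concrete. BLAST-style tools begin by finding short word matches ("seeds") between the query and database sequences, then extend those seeds into alignments and assess their statistical significance. The sketch below implements only the first step in its simplest form, exact matches of an assumed word length k = 3, and merely counts seeds per database entry; it does not extend seeds or compute significance, which is exactly the harder part the text refers to. The toy protein sequences and the function names `build_kmer_index` and `seed_hits` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database: dict[str, str], k: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Map every k-letter word to the (sequence id, offset) positions where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def seed_hits(query: str, index: dict[str, list[tuple[str, int]]], k: int = 3) -> dict[str, int]:
    """Count exact k-word matches (seeds) between the query and each database entry."""
    hits: dict[str, int] = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for seq_id, _offset in index.get(query[pos:pos + k], []):
            hits[seq_id] += 1
    # Most seeds first; a real tool would extend seeds and score significance instead.
    return dict(sorted(hits.items(), key=lambda item: -item[1]))


if __name__ == "__main__":
    toy_db = {"seqA": "MKTAYIAKQR", "seqB": "MKTAHHHHHH", "seqC": "GGGGSSSSGG"}
    print(seed_hits("MKTAYIAK", build_kmer_index(toy_db)))  # {'seqA': 6, 'seqB': 2}
```

Building the index once lets every later query skip most of the database; the price is that raw seed counts say nothing about whether a resemblance is biologically significant.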
Issues related to biodiversity and the environment, the cloning of higher animals such as Dolly and Polly, and tissue culture and cloning of plants have shown that bioinformatics is not only a support branch of science but also a subject that directs the future course of research in biotechnology and the life sciences. The importance and usefulness of bioinformatics has been recognized by many industries in the last few years. Therefore, large bioinformatics R&D divisions are being established in many pharmaceutical companies, biotechnology companies and even other conventional industries dealing with biology. Bioinformatics is thus rated as the number one career in the field of biosciences.

In short, bioinformatics deals with database creation, data analysis and modeling. Data capture is done not only from printed material but also from network resources. Databases in biology are generally in multimedia form, organized in the relational database model. Modeling is done not only on single biological molecules but also on multiple systems, thus requiring the use of high-performance computing systems.

The Potential of Bioinformatics:
The potential of bioinformatics in the identification of useful genes leading to the development of new gene products, drug discovery and drug development has led to a paradigm shift in biology and biotechnology: these fields are becoming more and more computationally intensive. The new paradigm now emerging is that all the genes will be known "in the sense of being resident in databases available electronically", and that the starting point of biological investigation will be theoretical: a scientist will begin with a theoretical conjecture and only then turn to experiment to follow or test the hypothesis. With a much deeper understanding of biological processes at the molecular level, bioinformatics scientists have developed new techniques to analyse genes on an industrial scale, resulting in a new area of science known as 'Genomics'.
The shift from the biology of single genes to genomics has resulted in the development of strategies, from lab techniques to computer programmes, to analyse whole batches of genes at once. Genomics is revolutionizing drug development, gene therapy, and our entire approach to health care and human medicine. Genomic discoveries are being translated into practical biomedical results through bioinformatics applications. Work on proteomics and genomics will continue using highly sophisticated software tools and data networks that can carry multimedia databases. Thus, research will focus on the development of multimedia databases in various areas of the life sciences and biotechnology. There will be an urgent need for software tools for data mining, analysis and modelling, and downstream processing. Data security, data transfer and data compression, and automatic checks on data accuracy and correctness will also be major research areas of bioinformatics. The use of virtual reality in drug design, metabolic pathway design, and unicellular organism design, paving the way to the design and modification of multicellular organisms, will be among the challenges that bioinformatics scientists and specialists have to tackle. It has now been universally recognized that bioinformatics is the key to the new grand data-intensive molecular biology that will take us into the 21st century.

Bioinformatics - Industry Overview
The bioinformatics industry has grown to keep up with the information explosion, growing at 25-50% a year. In 2000, the US market research company Oscar Gruss estimated that the value of the bioinformatics industry would touch $2 billion, and it seemed likely that this figure would be reached within the coming year. Now the demand for individuals capable of doing bioinformatics is soaring: industry's demand for scientists with skills in bioinformatics far exceeds the supply of qualified specialists in the field.
Therefore, companies are developing methods of spotting potential bioinformatics experts and then training them on the job.

Bioinformatics and drug discovery:
In recent years, we have seen an explosion in the amount of biological information that is available. Various databases are doubling in size every 15 months, and we now have the complete genome sequences of more than 100 organisms. It appears that the ability to generate vast quantities of data has surpassed the ability to use this data meaningfully. The pharmaceutical industry has embraced genomics as a source of drug targets. It also recognises that bioinformatics is crucial for validating these potential drug targets and for determining which ones are the most suitable for entering the drug development pipeline. Recently, there has been a change in the way that medicines are being developed due to our increased understanding of molecular biology. In the past, new synthetic organic molecules were tested in animals or in whole-organ preparations. This has been replaced with a molecular target approach, in which in-vitro screening of compounds against purified, recombinant proteins or genetically modified cell lines is carried out with high throughput. This change has come about as a consequence of better and ever-improving knowledge of the molecular basis of disease. All marketed drugs today target only about 500 gene products. The elucidation of the human genome, which was estimated to contain around 30,000 genes, presents immense new opportunities for drug discovery and simultaneously creates a potential bottleneck regarding the choice of targets to support the drug discovery pipeline. The major advances in genomics and sequencing mean that finding an attractive target is no longer a problem; finding the targets that are most likely to succeed has become the challenge. The focus of bioinformatics in the drug discovery process has therefore shifted from target identification to target validation.
The accumulation of this information in databases about potential targets means that pharmaceutical companies can save themselves much of the time, effort and expense of bench work on targets that will ultimately fail. The information that is gathered helps to classify the different targets into families and subfamilies. It also classifies the behaviour of the different molecules in a biochemical and cellular context.

Bioinformatics and computational biology:
Bioinformatics and computational biology each maintain close interactions with the life sciences to realize their full potential. Bioinformatics applies principles of information sciences and technologies to make the vast, diverse, and complex life sciences data more understandable and useful. Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Although bioinformatics and computational biology are distinct, there is also significant overlap and activity at their interface.

Biocomputing
Biocomputing is often used as a catch-all term covering the whole area at the intersection of biology and computation, although many other terms are used to name the same area. We can distinguish four (non-disjoint) sub-fields:
- Bioinformatics: this includes management of biological databases, data mining and data modeling, as well as IT tools for data visualization.
- Computational Biology: this includes efforts to solve biological problems with computational tools (such as modeling, algorithms, heuristics).
- DNA[2] computing and nano-engineering: this includes models and experiments that use DNA (and other) molecules to perform computations.
- Computations in living organisms: this is concerned with constructing computational components in living cells, as well as with studying computational processes taking place daily in living organisms.

Computational Biology
Computational Biology is the application of core technologies of computer science (e.g.
algorithms, artificial intelligence, databases, etc.) to problems arising from biology. Computational biology is particularly exciting today because the problems are large enough to motivate efficient algorithms, and the demands of biology on computational science are increasing. The most pressing tasks in bioinformatics involve the analysis of sequence information. Computational Biology is the name given to this process, and it involves the following:
- Finding the genes in the DNA sequences of various organisms.
- Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
- Clustering protein sequences into families of related sequences and the development of protein models.
- Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.

[2] Deoxyribonucleic acid: a long linear polymer found in the nucleus of a cell, formed from nucleotides and shaped like a double helix; associated with the transmission of genetic information. "DNA is the king of molecules."

Conclusion:
The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of sequence, structure, literature and other biological data, to obtain a clearer insight into the fundamental biology of organisms, and to use this information to enhance the standard of life for mankind. It is being used now, and will be in the foreseeable future, in molecular medicine to help produce better and more customised medicines to prevent or cure diseases; it has environmental benefits in identifying waste-cleanup bacteria; and in agriculture it can be used to produce high-yield, low-maintenance crops. These are just a few of the many benefits bioinformatics will help develop. The influence of genomics and bioinformatics will not be limited to science: it will influence society in many ways, from crop cultivation and food production to health care and life insurance.
It will also reach from crime investigation and personal identification to computer chip fabrication and the development of genetic modification law.