What is bioinformatics?

Long definition: The study of the application of computer and statistical techniques to the management of biological information, including the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.

Short definition: The management, analysis, and visualization of molecular, cellular, and genomic information.

[Diagram of related fields: Molecular Biology, Computational Biology, Bioinformatics, Computer Science, Genomics]

Genomics: what is it?

The development and application of genetic mapping, sequencing, and computation (bioinformatics) to analyze the genomes of organisms.

Sub-fields of genomics:
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and of non-gene sequences).
3. Comparative genomics: comparison of genomes across species. Includes structural and functional genomics, and evolutionary genomics.

COMPARATIVE GENOMICS: Brief Review

Definition

A comparison of gene numbers, gene locations, and the biological functions of genes in the genomes of different organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.

A Few Terms

Homology: the relationship between any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes or proteins with similar sequences) that can be traced to a common ancestor of the two organisms during evolution. Homologues can be orthologues, paralogues, or xenologues.

Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.

Paralogues are homologues produced by duplication within a genome followed by subsequent divergence. They often have different functions.
Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues. The functions of xenologues are quite often similar.

Analogues are non-homologous genes or proteins that have arisen by convergence from unrelated ancestors. They have similar functions although they are unrelated in either sequence or structure.

Comparative Genomics

Two very large problems are immediately apparent in undertaking the sequencing of entire genomes. First, the vast number of species and the much larger size of some genomes make sequencing every genome in full an impractical approach to understanding genome structure. Second, within a given species most individuals are genetically distinct in a number of ways. What does it actually mean, for example, to "sequence a human genome"? The genomes of two genetically distinct individuals differ in DNA sequence by definition. These two problems, and the potential for other novel applications, have given rise to new approaches which, taken together, constitute the field of comparative genomics.

Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can be applied to other, even distantly related, organisms. Comparative genomics enables information gained from facile model systems to be applied to agricultural and medical problems. The nature and significance of the differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype, by combining comparative genomics with morphological and physiological studies.

The Role of Bioinformatics in Identification of Drug Targets from Bacterial and Fungal Genomes
Dr. Andrew E. DePristo, Director of Bioinformatics, Genome Therapeutics Corporation

Bacterial genomes are appearing at an ever-increasing rate, with a September 1999 listing by NCBI indicating 16 completed, 10 being annotated, and 55 being sequenced. Fungal genomes and proteomes are less prevalent, with one complete, a few nearly complete, and large collections of cDNA sequences available for about five organisms. This presentation will discuss the use of this bacterial and fungal genomic diversity, along with high-throughput bioinformatics tools, to attach confidence to certain functional predictions and to allow identification and targeting of essential genes that are unique to specific organisms.

Methods (WET)

Introduction

A DNA walk of a genome shows how the frequency of each nucleotide within a pairing couple (G versus C, T versus A) changes locally. The analysis measures the local proportion of G within the GC content and of T within the TA content. Lobry was the first to propose this analysis (1996, 1999). Two complementary representations can be derived from the DNA walk: the cumulative TA-skew and the cumulative GC-skew analyses.

Aim: By reading this description of the algorithm, a reader not trained in genomics should be able to redraw our graphs, using the basic genometric data file that is posted on our web resource for each organism as a zip file (.zip).

1) DNA walk

1.1) Drawing a DNA walk by reading a sequence file nucleotide by nucleotide

A simple algorithm draws a DNA walk by assigning a direction to each nucleotide. We propose the following assignment, slightly different from Lobry's (Lobry, 1999): T, C, A, and G correspond to the East, South, West, and North directions, respectively. Reading the sequence nucleotide by nucleotide and following this rule, a path emerges on the graph (Figure 1).
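The direction rule of section 1.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the names `STEPS` and `dna_walk` are made up here, and the choice to leave the pointer in place for any symbol other than A, C, G, or T is an assumption, since the text does not say how such symbols are handled.

```python
# Direction rule from section 1.1: T -> East, C -> South, A -> West, G -> North.
# Each direction is a unit step (dx, dy) on the graph.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        # Assumption: symbols other than A, C, G, T (e.g. N) do not move the pointer.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    # First five nucleotides of the example sequence.
    for point in dna_walk("GTCTG"):
        print(point)
```

The returned points can be passed to any plotting tool or pasted into a spreadsheet to draw the curve.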
Figure 1: DNA walk of the sequence GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC. Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).

1.2) Drawing a DNA walk by slicing a sequence file into small windows

A quick way to draw this kind of graph, suggested by Lobry (1996), is to cut the genome into windows of equal length.

Figure 2: DNA walk of the same sequence as in Figure 1 (GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC). The sequence was sliced into 5-nucleotide windows. Only the fifth nucleotide of each window is plotted. We can also work with the mean values of the window…

Comment: this method is not as precise as the first one. It can be used with spreadsheet software without affecting the final resolution of the curve at the genome level.

2) The cumulative TA- and GC-skew analyses

2.1) Drawing a cumulative TA- or GC-skew analysis by reading a sequence file nucleotide by nucleotide

Cumulative TA-skew analysis: Assign a direction to each nucleotide: A, T, C, and G correspond to S, N, nd (no direction), and nd, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If an A or a T was read, a further step is added, southward or northward, respectively.

Cumulative GC-skew analysis: Assign a direction to each nucleotide: A, T, C, and G correspond to nd, nd, S, and N, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If a C or a G was read, a further step is added, southward or northward, respectively.

Methods (dry)

Bioinformatics: its tools (software)

Computational analysis in drug target discovery

Shannon entropy is a measure of variation or change over a time series. Genes that exhibit significant changes are regarded as good target candidates.
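As a rough illustration of the Shannon entropy criterion, the sketch below bins a gene's expression time series into equal-width levels and computes H = -Σ p·log2(p) over the bin frequencies. The text does not say how the series is discretized, so the equal-width binning, the default of three bins, the function name `shannon_entropy`, and the sample values are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(series, n_bins=3):
    """Entropy in bits of a time series after equal-width binning (assumed scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a flat series shows no variation at all
    width = (hi - lo) / n_bins
    # Map each value to a bin index in 0..n_bins-1; the maximum falls in the last bin.
    bins = [min(int((value - lo) / width), n_bins - 1) for value in series]
    total = len(series)
    return -sum((c / total) * math.log2(c / total) for c in Counter(bins).values())


if __name__ == "__main__":
    # Hypothetical expression levels for two genes over six time points.
    print(shannon_entropy([2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))  # flat gene
    print(shannon_entropy([0.1, 0.9, 2.5, 4.8, 2.2, 0.3]))  # changing gene
```

Under this scheme a gene whose level never changes scores zero, and a gene whose level visits every bin equally often scores the maximum, log2(n_bins).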
Clustering is a method for grouping patterns by similarities in their shapes.

GCG History (tools)

Founded in 1982 as a service of the Department of Genetics at the University of Wisconsin, GCG became a private company in 1990 and was acquired by Oxford Molecular Group in 1997. The company was one of the pioneers of bioinformatics, and its Wisconsin Package sequence analysis tools are widely used and well regarded throughout the pharmaceutical and biotechnology industries and in academia. To support enterprise bioinformatics efforts, GCG developed SeqStore, its Oracle-based data management system. Desktop solutions are delivered to bench scientists through products such as MacVector and OMIGA.

GCG Wisconsin Package

Molecular biologists worldwide use the GCG® Wisconsin Package® as their software of choice for comprehensive sequence analysis. The Wisconsin Package meets research needs across disciplines, project teams, and labs to provide an enterprise-wide solution. Based on published algorithms from the fields of mathematical and computational biology, the Package includes tools for: comparison; database searching and retrieval; DNA/RNA secondary structure; editing and publication; evolution; fragment assembly; gene finding and pattern recognition; importing and exporting; mapping; primer selection; protein analysis; and translation.

PAUP* version 4.0 is a major upgrade and new release of the software package for inference of evolutionary trees, for use in Macintosh, Windows, UNIX/VMS, or DOS-based formats. The influence of high-speed computer analysis of molecular, morphological, and/or behavioral data to infer phylogenetic relationships has expanded well beyond its central role in evolutionary biology, now encompassing applications in areas as diverse as conservation biology, ecology, and forensic studies.
The success of previous versions of PAUP (Phylogenetic Analysis Using Parsimony) has made it the most widely used software package for the inference of evolutionary trees.

Target Validation

Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound. Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.

Target validation & identification

Computer-based drug design: Beginning with the protein engineering and analysis tools, we can identify and evaluate the target. Then, with that information, we may attack the target with a variety of tools to identify novel drug candidates. The complete suite of software products provides a seamless environment in which to work more efficiently and quickly.

A computational component analyzes genomic sequences, producing 3D and functional annotations. Once annotated, sequences can be identified as potential drug targets for development. X-ray crystallography has become a central tool in modern drug and target discovery. These annotations, made from knowledge of predicted protein structure, are an important component in identifying potential targets, thereby facilitating successful and competitive drug discovery.

Outcomes/Benefits

Provides "first pass" information on the function of the putative protein based on the existence of conserved protein sequence motifs. Advances in computer software technologies (bioinformatics) have also made comparative analysis of genomes an extremely powerful approach for functional genomics.
These studies can also reveal insights into the recruitment of enzymes in a pathway.

Comparative genomics will help us to understand the genetic basis of diversity in organisms, both speciation and variation, events that are important aspects of evolutionary biology. It also provides a powerful way to analyze sequence data. Indeed, there is already a long list of 'model' organisms, which allow comparative analyses in a variety of ways.

The very small vertebrate genome of the pufferfish provides a simple and economical way of comparing sequence data from mammals and fish. The two groups represent a large evolutionary divergence, which permits the identification of essential elements that are still present in both. These elements include genes and the associated machinery that controls their expression; elements that, in many cases, have survived the test of time.