IGR-ANNOT: A Multiagent System for InterGenic Regions Annotation

Sandro Camargo, João Valiati, Luis Otávio Álvares, Paulo Engel, Sergio Ceroni

Introduction
• The exponential growth of genomic data has made computational tools for analyzing these data indispensable.
• Sequencing a new genome does not answer every question about the organism. Progress is more likely to come from comparing the genomes of different organisms.
• Many tools and techniques exist for comparing complete genomes and coding regions, but there is a lack of techniques for comparing non-coding regions of DNA, which contain regulatory elements.
• Many of the differences between species may be attributed to changes in the regulation of transcription and translation.
• Transcription and translation are often regulated via elements that lie in intergenic regions.

InterGenic Regions
• An intergenic region is defined as the sequence between the translational stop of one gene and the translational start of the next gene.
• Obtaining the intergenic regions of an organism requires:
– the complete genome of the organism (the nucleotide sequence);
– information about the coding regions (start and stop positions, orientation, and name).
• We chose to work with GenBank files because they contain all the information needed to identify coding regions, and this information is used to infer the corresponding information about intergenic regions.
• The feature table format is based on a tabular approach and consists of the following items:
– Feature Key: a single word or abbreviation indicating the functional group;
– Location: instructions for finding the feature;
– Qualifiers: auxiliary information about the feature.

An example of a feature in the feature table:

Key    Location/Qualifiers
CDS    23..400
       /product="alcohol dehydrogenase"
       /gene="adhI"
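The definition of an intergenic region given above can be illustrated with a minimal Python sketch. It is not the IGR-ANNOT implementation (which the slides say is written in Perl). The names `Gene`, `IntergenicRegion`, and `intergenic_regions` are hypothetical, and the sketch assumes a linear sequence, 1-based inclusive coordinates as in GenBank locations, and that a gap is simply the stretch between two consecutive coding regions regardless of strand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical record for one coding region (not an IGR-ANNOT type)."""
    name: str
    start: int   # 1-based, inclusive, as in a GenBank location
    stop: int    # 1-based, inclusive
    strand: str  # "+" or "-"; kept for completeness, not used below


@dataclass
class IntergenicRegion:
    """Hypothetical record for the gap between two neighbouring genes."""
    left_gene: str
    right_gene: str
    start: int
    stop: int
    sequence: str


def intergenic_regions(genome: str, genes: List[Gene]) -> List[IntergenicRegion]:
    """Return the gaps between consecutive genes on a linear sequence."""
    ordered = sorted(genes, key=lambda g: g.start)
    if not ordered:
        return []
    regions = []
    prev = ordered[0]  # gene with the right-most end seen so far
    for gene in ordered[1:]:
        if gene.start > prev.stop + 1:
            gap_start, gap_stop = prev.stop + 1, gene.start - 1
            regions.append(IntergenicRegion(
                prev.name, gene.name, gap_start, gap_stop,
                genome[gap_start - 1:gap_stop],  # convert to 0-based slice
            ))
        # Overlapping or nested genes leave no gap; keep the furthest end.
        if gene.stop > prev.stop:
            prev = gene
    return regions


# Toy data: two 9-nt "genes" separated by the 4-nt gap "CCGG".
genome = "ATGAAATAG" + "CCGG" + "ATGTTTTAA"
genes = [Gene("geneA", 1, 9, "+"), Gene("geneB", 14, 22, "+")]
for region in intergenic_regions(genome, genes):
    print(region.left_gene, region.right_gene,
          region.start, region.stop, region.sequence)
```

On a circular genome, the region between the last and the first gene would also have to be considered; the sketch leaves that case out.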
• InterGenic Regions naming convention: IGR-O-G1-G2, where O = {F|R|B|X} depending on the orientations of the previous and next genes, and G1 and G2 are the names of the coding regions whose regulatory information the intergenic region contains.
• Intergenic regions are written in the GenBank file format using the feature key misc_feature.
• According to the GenBank file format description, this feature key is used to annotate regions of biological interest that cannot be described by any other feature key.

IGR-ANNOT Engineering Process
• The multiagent approach is particularly attractive for this problem because:
– the information content is heterogeneous;
– the information can be distributed;
– much of the annotation work for each gene can be done by different laboratories using different annotation methodologies.
• We used MASE and AgentTool to model the agents.
• User Interface Agent (UIA)
• File Reader Agents (FRA)
• Gene Agents (GA)
• InterGenic Regions Agents (IGRA)
• File Writer Agents (FWA)
• We implemented this architecture in Perl, so it can be run on any suitable platform.
• Perl has many features, such as string manipulation facilities, that make it a very attractive language for working with DNA sequences.
• In addition, complete packages exist for implementing multiagent systems.

Results and Discussion
• We have used IGR-ANNOT extensively to create intergenic region annotations for several genomes of the Mycoplasmataceae family.
• To obtain a graphical view of the annotations created by our tool, we used the Artemis tool.
• The next figures show the Mycoplasma hyopneumoniae 232 genome.
Len1  Len2  %Idy   Mhy                               Mhy232
458   458   99.34  IGR-FMP04451_oppB-1               IGR-R-oppB
345   346   99.42  IGR-FMP0611_MHP0054               IGR-F-mhp057
574   572   98.26  IGR-XMP07135_rpsOMP01224_MHP0106  IGR-X-mhp275rps15
307   316   93.99  IGR-XMP09826_MHP0309MP03567_baiH  IGR-X-mhp321baiH
1156  1157  98.02  IGR-RMP03198_MHP0344              IGR-R-mhp354
1037  1033  94.49  IGR-BMP18658_MHP0508MP05045_pdhC  IGR-B-mhp502aceF
395   395   99.49  IGR-BMP07145_deoCMP12669_gyrA     IGR-B-deoC-gyrA
528   543   96.69  IGR-F-MP02519_lgt                 IGR-R-lgt

Conclusions
• The system is now in successful use by biologists at UFRGS.
• Applying IGR-ANNOT provides an easy way to compare intergenic regions among different organisms.
• Despite the positive results achieved so far on genomes of the Mycoplasmataceae family, further tests will be performed, mainly on more complex genomes.

Future Work
• Create an environment for comparing intergenic regions.
• IGR-ANNOT will be made publicly available to other biologists on the web at www.inf.ufrgs.br/~scamargo, in the software section.