Investigation of the role of expanded gene families
Mycobacterium tuberculosis, the causative agent of tuberculosis, remains the leading infectious disease agent, causing millions of deaths each year. In addition, the emergence of extensively drug-resistant tuberculosis (XDR-TB) strains shows that the organism can sustain its pathogenic success by mutating drug targets. The organism's ability to evolve drug resistance together with enhanced pathogenicity appears to derive, at least in part, from gene duplication. This evolutionary mechanism adds copies of existing genetic material, giving the organism an extra copy of a gene and an opportunity to exploit one of the copies for neofunctionalization. This project aims to identify the expanded gene families in Mycobacterium tuberculosis and to investigate the potential contribution of gene duplication events to pathogenicity. The availability of the complete genome sequence of Mycobacterium tuberculosis strain H37Rv, along with other microbial genomes, provided an opportunity to compare organisms and find major differences in the expansion of their gene families.
To identify gene duplicates in tuberculosis complex organisms, protein signature and sequence data for 77 selected organisms were retrieved from the InterPro and UniProtKB databases. Perl scripts were written to cluster proteins with the same protein signature data (common InterPro matches, and thus common protein functions or domains) into duplicate gene sets. A duplicate gene cluster contains only proteins that show complete domain identity over their entire length; proteins that differ in even a single domain were excluded from the cluster.
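The signature-based clustering step described above can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the project, and it assumes that "complete domain identity" means two proteins carry exactly the same set of InterPro signatures. The function name, the input layout, and the identifiers (P1, IPR_A, and so on) are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def cluster_by_signature(protein_signatures):
    """Group proteins whose signature sets are identical.

    protein_signatures maps a protein ID to the set of InterPro
    signature accessions matched by that protein (assumed layout).
    Returns {signature_set: sorted protein IDs} for groups of two or more.
    """
    groups = defaultdict(list)
    for protein_id, signatures in protein_signatures.items():
        if not signatures:
            # Proteins without any InterPro match cannot be clustered this way.
            continue
        # frozenset makes the grouping independent of signature order.
        groups[frozenset(signatures)].append(protein_id)
    # A single protein is not a duplicate set, so keep groups of size >= 2.
    return {
        signature_set: sorted(members)
        for signature_set, members in groups.items()
        if len(members) >= 2
    }


# Hypothetical example input.
example = {
    "P1": {"IPR_A", "IPR_B"},
    "P2": {"IPR_B", "IPR_A"},  # same signatures as P1
    "P3": {"IPR_A"},           # differs from P1 by one domain, so excluded
    "P4": set(),               # no InterPro match
    "P5": {"IPR_C"},
    "P6": {"IPR_C"},
}
clusters = cluster_by_signature(example)
```

Here P3 shares one signature with P1 and P2 but lacks the other, so it is kept out of their cluster, mirroring the exclusion rule stated above.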
In addition to the InterPro data, the complete protein sequences of each organism were clustered into related sets by running BlastClust at different percentage identities over varying lengths of the sequences. Proteins common to both the InterPro duplicate gene clusters and the sequence-based duplicate gene clusters were treated as potential duplicate genes for each organism and used to generate a final gene duplication matrix. The matrix displays the degree of protein family expansion across the pathogenic and non-pathogenic groups. A preliminary analysis of the results brought to light many important duplicate gene sets that have expanded only in the tuberculosis complex group and may be candidates for involvement in pathogenicity. Further, whole-genome clustering of protein sequences across all the selected organisms is being performed to cross-verify the results and to include duplicate genes missed by the signature clustering method (not all proteins in a genome have InterPro matches, so not all can be clustered using signatures). The selected duplicate gene sets that have expanded in tuberculosis complex organisms and in no other organism potentially contain functions that expanded to confer a subfunctionalization or neofunctionalization benefit on the organism. Such functions may therefore play an important role in the organism’s pathogenicity and will be considered for further evolutionary studies.
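A minimal Python sketch of how the two lines of evidence might be combined into a duplication matrix follows. It assumes that a protein counts as a potential duplicate when it sits in a signature cluster and also in a sequence cluster of two or more members, and that a family is "expanded only in" a group when at least one organism of that group has a nonzero count and every other organism has zero. All function names, family labels, and organism names are hypothetical; the sequence clusters are passed in as plain lists rather than parsed from BlastClust output.

```python
def confirmed_family_sizes(signature_clusters, sequence_clusters):
    """Count, per family, the proteins supported by both clustering methods.

    signature_clusters: {family label: list of protein IDs}
    sequence_clusters: list of lists of protein IDs
    """
    in_sequence_cluster = {
        protein
        for cluster in sequence_clusters
        if len(cluster) >= 2  # singletons are not duplicates
        for protein in cluster
    }
    sizes = {}
    for family, members in signature_clusters.items():
        confirmed = [p for p in members if p in in_sequence_cluster]
        if len(confirmed) >= 2:
            sizes[family] = len(confirmed)
    return sizes


def duplication_matrix(sizes_by_organism):
    """Build rows = families, columns = organisms, values = copy counts."""
    families = sorted({f for sizes in sizes_by_organism.values() for f in sizes})
    organisms = sorted(sizes_by_organism)
    rows = [[sizes_by_organism[o].get(f, 0) for o in organisms] for f in families]
    return families, organisms, rows


def expanded_only_in(families, organisms, rows, target_group):
    """Families with copies in the target group and none elsewhere."""
    selected = []
    for family, row in zip(families, rows):
        counts = dict(zip(organisms, row))
        in_target = any(counts[o] > 0 for o in organisms if o in target_group)
        elsewhere = any(counts[o] > 0 for o in organisms if o not in target_group)
        if in_target and not elsewhere:
            selected.append(family)
    return selected


# Hypothetical example with two organisms.
sizes_by_organism = {
    "tb_complex_1": confirmed_family_sizes(
        {"fam_A": ["a1", "a2", "a3"], "fam_B": ["b1", "b2"]},
        [["a1", "a2", "a3"], ["b1"], ["b2"]],
    ),
    "non_pathogen_1": confirmed_family_sizes(
        {"fam_B": ["x1", "x2"]},
        [["x1", "x2"]],
    ),
}
families, organisms, rows = duplication_matrix(sizes_by_organism)
tb_specific = expanded_only_in(families, organisms, rows, {"tb_complex_1"})
```

In this toy input, fam_B is dropped for tb_complex_1 because its two proteins fall into separate sequence clusters, which shows why requiring agreement between the two methods gives a more conservative set than either method alone.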