Investigation of the role of expanded gene families

Mycobacterium tuberculosis, the causative agent of tuberculosis, remains one of the
leading infectious disease agents, causing millions of deaths each year. In addition, the
emergence of extensively drug-resistant tuberculosis strains (XDR TB) illustrates the
survival strategies the organism adopts, such as mutating drug targets, to maintain its
pathogenicity. The ability of the organism to evolve drug resistance together with
enhanced pathogenicity appears, at least in part, to be provided by the mechanism of
gene duplication. This evolutionary mechanism generates additional copies of existing
genetic material, thereby providing the organism with an extra copy of a gene and
creating an opportunity to exploit one of the copies for neofunctionalization. This
project aims to identify the expanded gene families in Mycobacterium tuberculosis and
investigate the potential contribution of gene duplication events to pathogenicity.
The availability of the complete genome sequence of Mycobacterium tuberculosis strain
H37Rv, along with other microbial genomes, provided us with an opportunity to compare
gene family expansion across different organisms and to find the major differences.
To identify gene duplicates in tuberculosis complex organisms, protein signature and
sequence data from 77 selected organisms were retrieved from the InterPro and
UniProtKB databases. Perl scripts were written to cluster proteins with the same
protein signature data (common InterPro matches, and thus common protein functions or
domains) into duplicate gene sets. A duplicate gene cluster was defined as a set of
proteins showing complete domain identity over their entire length. Proteins lacking
identity in even a single domain were excluded from the duplicate gene clusters.
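The signature-based grouping described above can be sketched as follows. This is a minimal illustration in Python rather than the Perl scripts used in the project, which are not shown here. It assumes that "complete domain identity" can be approximated by two proteins having exactly the same set of InterPro entries; the function name and the toy accessions are hypothetical.

```python
from collections import defaultdict


def cluster_by_signature(protein_to_interpro):
    """Group proteins whose sets of InterPro entries are identical.

    Assumption: identical entry sets stand in for complete domain identity.
    """
    groups = defaultdict(list)
    for accession, entries in protein_to_interpro.items():
        if not entries:
            continue  # no InterPro match, so no signature to cluster on
        groups[frozenset(entries)].append(accession)
    # A duplicate gene set needs at least two member proteins.
    return {sig: sorted(members)
            for sig, members in groups.items() if len(members) >= 2}


# Hypothetical toy annotations (not real data).
toy = {
    "P1": ["IPR000001", "IPR000002"],
    "P2": ["IPR000002", "IPR000001"],  # same entries as P1, different order
    "P3": ["IPR000001"],               # differs from P1 by one entry: excluded
    "P4": [],                          # unannotated: skipped
    "P5": ["IPR000003"],
    "P6": ["IPR000003"],
}
clusters = cluster_by_signature(toy)
```

Using a set ignores domain order and repeat counts; a stricter variant could key on an ordered tuple of entries instead.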
In addition to the InterPro data, the complete protein sequences of each individual
organism were clustered into related sets by running BlastClust at a range of
percentage identity thresholds over varying sequence length coverages. The proteins
common to both the InterPro duplicate gene clusters and the sequence-based duplicate
gene clusters were treated as potential duplicate genes for each organism and used to
generate a final gene duplication matrix. The resulting matrix shows the degree of
protein family expansion across the pathogenic and non-pathogenic groups.
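One way to combine the two clusterings and tabulate the result is sketched below in Python. It assumes the sequence clusters are available as text with one cluster per line and whitespace-separated identifiers, and that a protein counts as a potential duplicate when it shares both a signature cluster and a sequence cluster with at least one other protein. All function names, family labels, and identifiers are hypothetical.

```python
def parse_cluster_lines(text):
    """Parse one-cluster-per-line text (whitespace-separated IDs) into sets."""
    return [set(line.split()) for line in text.splitlines() if line.strip()]


def consensus_duplicates(signature_clusters, sequence_clusters):
    """Keep, per family, the proteins supported by both clusterings."""
    consensus = {}
    for family, sig_members in signature_clusters.items():
        supported = set()
        for seq_members in sequence_clusters:
            shared = set(sig_members) & seq_members
            if len(shared) >= 2:  # at least two proteins agree in both methods
                supported |= shared
        if supported:
            consensus[family] = sorted(supported)
    return consensus


def duplication_matrix(consensus_by_organism):
    """Rows are families, columns are organisms, values are copy counts."""
    organisms = sorted(consensus_by_organism)
    families = sorted({f for c in consensus_by_organism.values() for f in c})
    matrix = {
        fam: [len(consensus_by_organism[org].get(fam, [])) for org in organisms]
        for fam in families
    }
    return organisms, matrix


# Hypothetical toy input for two organisms.
sig_a = {"famX": ["A1", "A2", "A3"], "famY": ["A4", "A5"]}
seq_a = parse_cluster_lines("A1 A2\nA3 A4\nA5\n")
sig_b = {"famY": ["B1", "B2"]}
seq_b = parse_cluster_lines("B1 B2\n")

consensus_by_organism = {
    "organism_a": consensus_duplicates(sig_a, seq_a),
    "organism_b": consensus_duplicates(sig_b, seq_b),
}
organisms, matrix = duplication_matrix(consensus_by_organism)
```

In this toy case A3 shares a signature with A1 and A2 but falls in a different sequence cluster, so only A1 and A2 are retained for famX.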
A preliminary analysis of the results brought to light many important duplicate gene
sets that have expanded only in the tuberculosis complex group and may be important
candidates for involvement in pathogenicity. Further, whole-genome clustering of
protein sequences across all the selected organisms is being performed to cross-verify
the results and to recover duplicate genes missed by the signature clustering method
(not all proteins in a genome have InterPro matches, so not all can be clustered using
signatures). The selected duplicate gene sets that have expanded in tuberculosis
complex organisms and not in any other organism potentially contain functions that
expanded to confer a subfunctionalization or neofunctionalization benefit on the
organism. Such functions may therefore play an important role in the organism's
pathogenicity and will be considered for further evolutionary studies.
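The selection of families expanded only in the tuberculosis complex group could be expressed as a filter over the gene duplication matrix. The sketch below is an assumption about how such a filter might look, not the project's actual procedure: a family is kept when it reaches a minimum copy number in at least one organism of the target group and stays below it in every organism outside the group. The threshold, organism labels, and counts are hypothetical.

```python
def expanded_only_in(organisms, matrix, target_group, min_copies=2):
    """Return families expanded in the target group and nowhere else."""
    selected = []
    for family, counts in matrix.items():
        inside = [n for org, n in zip(organisms, counts) if org in target_group]
        outside = [n for org, n in zip(organisms, counts) if org not in target_group]
        expanded_inside = any(n >= min_copies for n in inside)
        absent_outside = all(n < min_copies for n in outside)
        if expanded_inside and absent_outside:
            selected.append(family)
    return sorted(selected)


# Hypothetical toy matrix: rows are families, columns follow `organisms`.
organisms = ["tbc_1", "tbc_2", "other_1"]
matrix = {
    "famX": [3, 2, 0],  # expanded only in the target group
    "famY": [2, 2, 4],  # also expanded elsewhere
    "famZ": [1, 1, 1],  # not expanded anywhere
}
candidates = expanded_only_in(organisms, matrix, {"tbc_1", "tbc_2"})
```

Requiring expansion in every target organism rather than at least one (replacing `any` with `all`) would give a stricter candidate list.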