Gilbert - C-MORE

Use Case Template (revised from EarthCube version 1.1)
Summary Information Section
Use Case Name
Using population genomes to analyse taxon-specific functional constraints
Contact(s)
Jack A. Gilbert: [email protected]
Naseer Sangwan: [email protected]
Chris Marshall: [email protected]
Melissa Dsouza: [email protected]
Pamela Weisenhorn: [email protected]
Overarching Science Driver
To understand how translational fine-tuning shapes microbial genome evolution in natural environments
Science Objectives, Outcomes, and/or Measures of Success
(i) Create a habitat-specific database of population-level orthologous genes with pre-calculated metrics (codon bias and dN/dS).
(ii) Create new workflows and analysis pipelines to compute codon bias and dN/dS values across fragmented metagenome assemblies representing complex environments, e.g. soil and sediment.
(iii) Create new normalization methods for accurate correlation between the dN/dS and codon bias values of population-level genes.
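Objective (i) lists codon bias among the pre-calculated metrics, but the use case does not say which codon-bias index would be stored. As an assumption, the minimal sketch below uses relative synonymous codon usage (RSCU) under the standard genetic code (NCBI translation table 1) for a single in-frame coding sequence, using only the Python standard library. The function name `rscu` is illustrative, not part of any existing pipeline.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons into synonymous families by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed codon count / (amino-acid count / family size),
    so a value of 1.0 means no preference among synonymous codons.
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Codons with ambiguous bases (e.g. 'N') are not in the table and are ignored.
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for aa, family in SYNONYMS.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU undefined
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values
```

For example, in `ATG GCT GCT GCC GCA GCG TAA` alanine occurs five times across four synonymous codons, so GCT gets an RSCU of 1.6 and GCC, GCA and GCG get 0.8 each. Gene-level indices such as the effective number of codons could be stored instead; the choice is left open by the use case.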
Key people and their roles
Jack A. Gilbert: Lead PI
Naseer Sangwan: Postdoctoral researcher
Chris Marshall: Postdoctoral researcher
Pamela B. Weisenhorn: Postdoctoral researcher
Melissa Dsouza: Postdoctoral researcher
Basic Flow
1. Quality trimming and de novo assembly of shotgun metagenome datasets
2. Binning metagenome contigs into population genomes (pan-genomes)
3. Gene calling on contig bins representing population genomes
4. Identification of orthologous genes between population genomes
5. Cross-validation of orthologous genes (e.g. length cut-off, sequencing errors)
6. Calculating pairwise dN/dS and codon bias values
7. Normalization and calculation of pairwise correlation between dN/dS and codon bias
profiles
8. Demarcate and functionally characterize protein pairs under positive and/or negative selection
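Steps 7 and 8 do not name a correlation statistic or a criterion for calling selection. As an assumption, the sketch below correlates per-pair dN/dS and codon-bias values with Spearman's rank correlation (computed as the Pearson correlation of average ranks, so tied values are handled) and labels each pair by whether its dN/dS point estimate lies above or below 1. It deliberately omits the normalization in step 7, for which the use case says no suitable method yet exists; all function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant series has no defined rank correlation
    return sxy / math.sqrt(sxx * syy)


def selection_label(dn_ds):
    """Naive label from a dN/dS point estimate (no significance test)."""
    if dn_ds > 1:
        return "positive"
    if dn_ds < 1:
        return "negative (purifying)"
    return "neutral"
```

A dN/dS point estimate above or below 1 is only suggestive, especially for short fragments; a real analysis would attach a statistical test before assigning a protein pair to either selection class.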
Critical Existing Cyberinfrastructure
o Alignable Tight Genome Clusters (ATGC) database of prokaryote genomes (contains genomes of cultured isolates)
o Integrated Microbial Genomes (IMG) (e.g. can be used to retrieve orthologous genes)
o MicroScope pipeline (note: has a size limit for annotation)
Critical Cyberinfrastructure Not in Existence
o Central database of population genomes, i.e. genomes reconstructed from metagenomes
o New algorithms for calculating codon bias and dN/dS across short protein-coding sequences
o Accurate normalization method that can handle variation in average genome size across populations
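For reference when developing the algorithms listed above, the classic Nei-Gojobori (1986) counting method with a Jukes-Cantor correction can be sketched as follows. This is a standard textbook method, not the new algorithm this use case calls for. It assumes two codon-aligned, in-frame coding sequences and the standard genetic code, skips codons containing gaps or ambiguous bases, counts point mutations to stop codons as nonsynonymous when counting sites, and does nothing to correct for sequencing errors, length variation or genome size. Function names are illustrative.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Sum over positions of the fraction of the 3 point mutations that are synonymous."""
    aa = CODON_TABLE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Mean (synonymous, nonsynonymous) differences over all mutational
    pathways from c1 to c2 that avoid stop codons; None if none exist."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    paths = []
    for order in permutations(positions):
        current, syn, nonsyn, ok = c1, 0, 0, True
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODON_TABLE[nxt] == "*":
                ok = False  # pathway passes through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if ok:
            paths.append((syn, nonsyn))
    if not paths:
        return None
    return (sum(s for s, _ in paths) / len(paths),
            sum(n for _, n in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor multiple-hit correction; undefined (NaN) when p >= 0.75."""
    if p >= 0.75:
        return math.nan
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguous bases, and stop codons.
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue
        diffs = codon_differences(c1, c2)
        if diffs is None:
            continue
        s_mean = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s_mean
        N += 3 - s_mean
        Sd += diffs[0]
        Nd += diffs[1]
    dS = jukes_cantor(Sd / S) if S else math.nan
    dN = jukes_cantor(Nd / N) if N else math.nan
    if math.isnan(dS) or math.isnan(dN) or dS == 0:
        return dN, dS, math.nan  # ratio undefined
    return dN, dS, dN / dS
```

On short fragments the proportion of synonymous differences can easily reach 0.75 (where the correction returns NaN) or stay at 0 (where the ratio is undefined), which is one concrete form of the short-sequence problem this use case raises.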
Activity Diagram
To be developed during the workshop.
Problems/Challenges
1. How can habitat-specific gene pool information be accessed?
Recommendation: create a comprehensive portal that can store such datasets.
2. High-throughput methods to screen orthologous genes across multiple population genomes
a. Some methods exist, but they are specific to genome sequences of cultured microbes.
b. Recommendation: develop new methods, or modify existing ones, to target genome bins representing a mix of strains or species.
3. How can accurate evolutionary rates and codon bias be calculated on short protein-coding sequences?
a. Some methods exist, but they have not been validated against the errors and biases introduced during metagenome data analysis, e.g. length variation and variation in average genome size.
b. Recommendation: develop new methods to calculate and normalize the dN/dS and codon bias profiles of population genomes, e.g. methods that account for variation in average genome size.
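For problem 2, a common baseline for pairwise ortholog calls between two gene sets is reciprocal best hits (RBH). The sketch below is an assumption-laden illustration, not a method from this use case: it presumes all-vs-all similarity searches were already run in both directions and saved as BLAST tabular output (`-outfmt 6`, default 12 columns, alignment length in column 4 and bit score in column 12), and it applies a minimum alignment length as a stand-in for the length cut-off in step 5 of the basic flow. The function names and the `min_length` parameter are hypothetical.

```python
def best_hits(blast_lines, min_length=0):
    """Best-scoring subject per query from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        aln_length = int(fields[3])   # column 4: alignment length
        bitscore = float(fields[11])  # column 12: bit score
        if aln_length < min_length:
            continue  # hypothetical length cut-off (basic flow step 5)
        # Keep the highest bit score; on ties the first hit seen is kept.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, min_length=0):
    """Sorted gene pairs (a, b) where each is the other's best hit."""
    forward = best_hits(a_vs_b, min_length)
    reverse = best_hits(b_vs_a, min_length)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

RBH only compares two genome bins at a time and assumes one gene copy per lineage, so on its own it does not address the mixed-strain bins described in 2b.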
References
-Ran W, Kristensen DM, Koonin EV. (2014). Coupling Between Protein Level Selection and Codon Usage Optimization in the Evolution of Bacteria and Archaea. mBio 5:e00956-14.
-Nielsen R. (2005). Molecular signatures of natural selection. Annu Rev Genet 39:197-218.
Notes