ALLBIO training workshop
Analysing thousands of bacterial
genomes: gene annotation,
metabolism, regulation
Marseille, June 22-25, 2014
Course book
CONTENTS
RSAT TUTORIALS
1. Phylogenetic profiling: co-occurrence between genes across genomes
Test case 1: Escherichia coli
Motivation
Protocol
2. References
RSAT TUTORIALS
1. Phylogenetic profiling: co-occurrence between genes across genomes
Test case 1: Escherichia coli
Motivation
As a first case study, we will generate and analyse phylogenetic profiles for
Escherichia coli, because it is one of the best-characterized bacterial genomes.
Protocol
Connect to the pedagogix server with PuTTY (Windows users only):

Start the PuTTY software. You should get a window as shown below.

Type 139.124.66.4 in the field Host Name (or IP address).

Press Open.

You will be prompted for a login. Enter your login (you should have received
one by mail) and press the Return key.

You will then be prompted for a password, and asked to confirm that you trust
the host key of the server. Accept it by typing 'Y'.
Make a directory to store the phylogenetic profile analysis:
mkdir Ecoli
Change directory to Ecoli:
cd Ecoli
Main command:
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth [#] -task [species|bbh|profiles|network|all]
OPTIONS:
-v: verbosity; displays varying levels of status messages while the program is running.
-org: name of the organism for which you want to construct the phylogenetic profiles.
-taxon: reference taxon used to generate the phylogenetic profiles.
-depth: depth of the taxonomic hierarchy. Some taxonomic groups are much more strongly
represented than others among the ~2,500 bacterial genomes currently available on RSAT
(e.g. many strains of E. coli). To reduce this redundancy, this parameter selects a
subset of species.
-task: task to perform for the selected organism (species, bbh, profiles, network
or all).
1) -task species
Selection of the genomes/organisms for creating phylogenetic profiles:
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task species
Output: list of species selected for the phylogenetic profiles.
ls results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5
Explore the exported table
more results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_selected_species.tab
Command to count the number of selected species at different taxonomic depths:
grep -v '^;\|#' results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_selected_species.tab | wc -l
-depth    Number of selected species
2         55
3         116
4         207
5         368
6         657
Interpretation: increasing the -depth level selects more species for the phylogenetic
profiles. This can lead to over-representation of particular taxa and may therefore
bias the results. Increasing the -depth level also increases the program execution
time.
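The counts in the table above can be gathered in one pass once the species task has been run for each depth. The loop below is a minimal sketch, not part of the original protocol. It assumes that the result folders for depths other than 5 follow the same naming pattern as the depth 5 folder shown above, and it uses a stricter filter than the grep command above (only lines that start with ';' or '#' are skipped). The helper name count_rows is hypothetical.

```shell
# Count the data rows of a table, skipping lines that start with ';' or '#'.
count_rows() {
  grep -v -c -e '^;' -e '^#' "$1"
}

org=Escherichia_coli_K_12_substr__MG1655_uid57779

# Assumption: every depth uses the folder/file naming seen for depth 5.
for depth in 2 3 4 5 6; do
  f=results/phylo_profiles/${org}_vs_Bacteria_depth${depth}/${org}_Bacteria_depth${depth}_selected_species.tab
  if [ -f "$f" ]; then
    printf '%s\t%s\n' "$depth" "$(count_rows "$f")"
  else
    printf '%s\tnot computed yet\n' "$depth"
  fi
done
```

Depths for which the species task has not been run are reported as "not computed yet" instead of causing an error.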
2) -task bbh
To define orthologs for the genes of your genome of interest (in this example
Escherichia_coli_K_12_substr__MG1655_uid57779), we use the BBH (bidirectional
best hit) approach. This task detects all bidirectional best hits (from the genome-blast
results) between each protein of the query genome and each genome of the reference taxon.
The result is a table with one row per BBH, with different scores.
Detect the bidirectional best hits
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task bbh
Explore the BBH table
more results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_bbh_len50_id10_bits_e1e-05.tab
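To make the bidirectional best hit criterion concrete, here is a small illustrative sketch for a single pair of genomes. It is not the RSAT implementation. The function name bbh_pairs, the three-column input layout (query, subject, score) and the toy identifiers are all assumptions made for this example, and no length, identity or e-value filtering is applied.

```shell
# Input: tab-separated hits in both directions (query, subject, score).
# Output: pairs in which each protein is the best-scoring hit of the other.
bbh_pairs() {
  awk -F'\t' '
    # Keep, for every query, the subject with the highest score (first one wins ties).
    !($1 in score) || $3 + 0 > score[$1] { score[$1] = $3 + 0; best[$1] = $2 }
    END {
      for (q in best) {
        s = best[q]
        # Reciprocity test; the string comparison prints each pair only once.
        if ((s in best) && best[s] == q && (q "") < (s "")) print q "\t" s
      }
    }
  ' "$1" | sort
}

# Toy data (made up for illustration): eco* = query genome, spp* = reference genome.
hits=$(mktemp)
printf 'ecoA\tsppX\t200\necoA\tsppY\t150\necoB\tsppY\t90\nsppX\tecoA\t190\nsppY\tecoA\t140\n' > "$hits"
bbh_pairs "$hits"
rm -f "$hits"
```

In the toy data, ecoA and sppX are each other's best hit and are reported as a pair. ecoB has sppY as its best hit, but the best hit of sppY is ecoA, so this pair is not reported.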
3) -task profiles
This task converts the BBH file to phylogenetic profile tables. A phylogenetic profile
table contains one row per protein-coding gene of the query organism, and one column per
genome of the reference taxon. The program exports a series of separate
phylogenetic profiles whose cell values indicate various statistics.
Create the phylogenetic profiles
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task profiles
List the files
ls results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/
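The conversion from a list of hits to a presence/absence table can be sketched as follows. This is only an illustration of the idea behind the boolean profile, not the RSAT code. The function name bbh_to_profile, the two-column input (gene, genome) and the toy identifiers are assumptions. Unlike the real profile table, this sketch only lists genes that have at least one hit.

```shell
# Input: tab-separated pairs (gene, genome), one line per detected ortholog.
# Output: a table with one row per gene and one 0/1 column per genome.
bbh_to_profile() {
  awk -F'\t' -v OFS='\t' '
    {
      # Remember genes and genomes in order of first appearance.
      if (!($1 in gene_seen))   { gene_seen[$1] = 1;   genes[++ng] = $1 }
      if (!($2 in genome_seen)) { genome_seen[$2] = 1; genomes[++nc] = $2 }
      present[$1, $2] = 1
    }
    END {
      line = "gene"
      for (j = 1; j <= nc; j++) line = line OFS genomes[j]
      print line
      for (i = 1; i <= ng; i++) {
        line = genes[i]
        for (j = 1; j <= nc; j++)
          line = line OFS ((genes[i], genomes[j]) in present ? 1 : 0)
        print line
      }
    }
  ' "$1"
}

# Toy data (made up for illustration).
pairs=$(mktemp)
printf 'geneA\tgenome1\ngeneA\tgenome2\ngeneB\tgenome2\n' > "$pairs"
bbh_to_profile "$pairs"
rm -f "$pairs"
```

With the toy data, geneA gets the profile 1 1 and geneB the profile 0 1.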
Visualizing and clustering phylogenetic profiles:
3.1) Download GENE-E from:
https://www.broadinstitute.org/cancer/software/GENE-E/download.html
3.2) Open GENE-E
3.2.1) File -> Open File
3.2.2) Select
results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_profiles_len50_id10_e1e-05_boolean_names.tab
3.2.3) A pop-up window will appear (shown below).
Click on the second row and third column (shown in grey).
3.2.4) Press the “OK” button.
3.2.5) Now cluster the profiles using tools -> clustering -> hierarchical
clustering (as shown below).
3.2.6) A pop-up window will let you select the clustering parameters.
Select “Cluster columns” (species) and “Cluster rows” (genes) and press
“OK”.
Output:
Here you can see the clusters of phylogenetic profiles (to see the full
heatmap, press the window icon as shown below).
Red means that orthologs of a given gene (row) are present in a
genome (column), and blue means that they are absent.
4) -task network
This task creates a co-occurrence network from the phylogenetic profiles. The
network is built by computing the similarity between the profiles of each gene pair. The
network contains one node per gene, and one edge is created for each gene pair
passing a threshold on the selected similarity metric.
Create the co-occurrence network
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task network
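The principle of turning profiles into a network can be illustrated on a tiny boolean table. The text above does not say which similarity metric phylo-profiles uses, so the sketch below swaps in the Jaccard index (genomes where both genes are present, divided by genomes where at least one is present) purely as an example. The function name profile_network, the input layout (a header line, then one gene per row with 0/1 columns) and the toy data are assumptions.

```shell
# $1 = boolean profile table; $2 = minimal Jaccard similarity for an edge.
# Output: one line per edge (gene1, gene2, similarity).
profile_network() {
  awk -F'\t' -v min="$2" '
    NR == 1 { next }   # skip the header line
    { n++; name[n] = $1; ncol = NF; for (j = 2; j <= NF; j++) p[n, j] = $j + 0 }
    END {
      for (a = 1; a < n; a++)
        for (b = a + 1; b <= n; b++) {
          both = 0; either = 0
          for (j = 2; j <= ncol; j++) {
            if (p[a, j] && p[b, j]) both++
            if (p[a, j] || p[b, j]) either++
          }
          if (either > 0 && both / either >= min)
            printf "%s\t%s\t%.2f\n", name[a], name[b], both / either
        }
    }
  ' "$1"
}

# Toy profiles (made up for illustration).
prof=$(mktemp)
printf 'gene\tg1\tg2\tg3\ngeneA\t1\t1\t0\ngeneB\t1\t1\t0\ngeneC\t0\t0\t1\n' > "$prof"
profile_network "$prof" 0.5
rm -f "$prof"
```

Here geneA and geneB have identical profiles and are linked by an edge, while geneC shares no genome with either of them and stays unconnected. The all-against-all comparison grows quadratically with the number of genes, so this sketch is only meant for small examples.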
5) -task all
Runs all tasks (species, bbh, profiles, network), one after the other.
phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task all
Visualizing the network in Cytoscape.
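If you prefer to run the tasks one by one (for example to inspect the intermediate files), the sequence can be scripted. The sketch below only reuses the options documented above. The function name run_tasks and its dry/run switch are hypothetical conveniences, and whether a step-by-step run gives exactly the same result as -task all has not been checked here.

```shell
org=Escherichia_coli_K_12_substr__MG1655_uid57779

# Usage: run_tasks dry|run task [task ...]
# "dry" only prints the commands; "run" executes them and stops at the first failure.
run_tasks() {
  mode=$1; shift
  for task in "$@"; do
    cmd="phylo-profiles -v 1 -org $org -taxon Bacteria -depth 5 -task $task"
    if [ "$mode" = "run" ]; then
      $cmd || return 1
    else
      echo "$cmd"
    fi
  done
}

# Print the four commands without executing them.
run_tasks dry species bbh profiles network
```

Replace dry with run on the server where phylo-profiles is installed to execute the commands.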
2. References
Van Helden, J., Toussaint, A., and Thieffry, D. (2011). Bacterial Molecular
Networks (Humana Press Inc.).
Salgado, H., Moreno-Hagelsieb, G., Smith, T., and Collado-Vides, J. (2000).
Operons in Escherichia coli: genomic analyses and predictions. Proc Natl
Acad Sci U S A 97, 6652–6657.