ALLBIO training workshop
Analysing thousands of bacterial genomes: gene annotation, metabolism, regulation
Marseille, June 22-25, 2014
Course book

CONTENTS

RSAT TUTORIALS
1. Phylogenetic profiling: co-occurrence between genes across genomes
   Test case 1: Escherichia coli
      Motivation
      Protocol
2. References

RSAT TUTORIALS

1. Phylogenetic profiling: co-occurrence between genes across genomes

Test case 1: Escherichia coli

Motivation

As a first study case, we will generate and analyse phylogenetic profiles for Escherichia coli, because it is one of the best-characterized bacterial genomes.

Protocol

Connect to the pedagogix server through PuTTY (Windows users only):

Start the PuTTY software. The PuTTY configuration window opens.
Type 139.124.66.4 in the field Host Name (or IP address).
Press Open. You will be prompted for a login.
Enter your login (you should have received one by mail) and press the Return key.
You will then be prompted for a password, and will have to confirm that you trust the host key of the server. Accept it by typing 'Y'.

Make a directory to store the phylogenetic profile analysis, then move into it:

    ## Make a directory named Ecoli
    mkdir Ecoli
    ## Change directory to Ecoli
    cd Ecoli

Main command:

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth [#] -task [species|bbh|profiles|network|all]

OPTIONS:

-v: verbosity; displays varying levels of status messages while the program is running.
-org: the name of the organism for which you want to construct the phylogenetic profile.
-taxon: the reference taxon used to generate the phylogenetic profiles.
-depth: depth of the taxonomic hierarchy. Some taxonomic groups are much more strongly represented than others among the ~2,500 bacterial genomes currently available on RSAT (e.g. many strains of E. coli). To reduce this redundancy, this parameter selects a subset of species.
-task: the task to perform for the selected organism (species, bbh, profiles, network, or all).

1) -task species

Select the genomes/organisms used to build the phylogenetic profiles:

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task species

Output: the list of species selected for the phylogenetic profiles.

    ls results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5

Explore the exported table:

    more results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_selected_species.tab

Count the number of selected species at a given taxonomic depth (here, depth 5):

    grep -v '^;\|#' results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_selected_species.tab | wc -l

    -depth    Number of selected species
    2         55
    3         116
    4         207
    5         368
    6         657

Interpretation: as the -depth level increases, more species are selected for the phylogenetic profiles. This can lead to over-representation of particular taxa, and may therefore bias the results. A higher -depth level also increases the execution time of the program.

2) -task bbh

To define orthologs for the genes of your genome of interest (in this example Escherichia_coli_K_12_substr__MG1655_uid57779), we use the BBH (bidirectional best hit) approach. The bbh task detects all bidirectional best hits (from the genome-blast results) between each protein of the query genome and each genome of the reference taxon.
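The bidirectional best hit criterion can be illustrated with a minimal Python sketch. This is not the RSAT implementation: the gene identifiers and scores are invented toy data, and ranking hits by a single score (here imagined as a BLAST bit score) is an assumption. The thresholds that phylo-profiles applies are not reproduced.

```python
def best_hits(hits):
    """Keep, for each query protein, the subject with the highest score.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def bidirectional_best_hits(forward_hits, reverse_hits):
    """Return (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)   # query genome -> reference genome
    reverse = best_hits(reverse_hits)   # reference genome -> query genome
    pairs = []
    for query, (subject, _score) in sorted(forward.items()):
        if subject in reverse and reverse[subject][0] == query:
            pairs.append((query, subject))
    return pairs


# Toy data: hypothetical gene identifiers and scores (assumption).
forward_hits = [
    ("ecoA", "refX", 250.0),
    ("ecoA", "refY", 90.0),
    ("ecoB", "refY", 120.0),
    ("ecoC", "refX", 80.0),
]
reverse_hits = [
    ("refX", "ecoA", 245.0),
    ("refY", "ecoA", 95.0),
    ("refY", "ecoB", 118.0),
]

bbh_pairs = bidirectional_best_hits(forward_hits, reverse_hits)
print(bbh_pairs)  # [('ecoA', 'refX'), ('ecoB', 'refY')]
```

In this toy example ecoC is not reported: its best hit is refX, but the best hit of refX in the query genome is ecoA, so the relationship is not reciprocal.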
The result of the bbh task is a table with one row per BBH, with different scores.

Compute the BBH table:

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task bbh

Read the BBH table:

    more results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_bbh_len50_id10_bits_e1e-05.tab

3) -task profiles

This task converts the BBH file into phylogenetic profile tables. A phylogenetic profile table contains one row per protein-coding gene of the query organism, and one column per genome of the reference taxon. The program exports a series of separate phylogenetic profile tables whose cell values indicate various statistics.

Create the phylogenetic profiles:

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task profiles

List the files:

    ls results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/

Visualizing and clustering phylogenetic profiles:

3.1) Download GENE-E from: https://www.broadinstitute.org/cancer/software/GENE-E/download.html
3.2) Open GENE-E.
3.2.1) File -> Open File
3.2.2) Select results/phylo_profiles/Escherichia_coli_K_12_substr__MG1655_uid57779_vs_Bacteria_depth5/Escherichia_coli_K_12_substr__MG1655_uid57779_Bacteria_depth5_profiles_len50_id10_e1e-05_boolean_names.tab
3.2.3) A pop-up window appears. Click on the cell in the second row and third column (shown in grey).
3.2.4) Press the “OK” button.
3.2.5) Now cluster the profiles using Tools -> Clustering -> Hierarchical clustering.
3.2.6) A pop-up window appears for selecting the clustering parameters.
Select “Cluster columns” (species) and “Cluster rows” (genes), then press “OK”.

Output: the heatmap shows the clusters of phylogenetic profiles (to see the full heatmap, press the window icon). Red means that an ortholog of the given gene (row) is present in the genome (column); blue means that it is absent.

4) -task network

This task creates a co-occurrence network from the phylogenetic profiles. The network is built by computing the similarity between the profiles of each gene pair. The network contains one node per gene, and one edge is created for each gene pair passing a threshold on the selected similarity metric.

Create the co-occurrence network:

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task network

5) -task all

Runs all tasks (species, bbh, profiles, network), one after the other.

    phylo-profiles -v 1 -org Escherichia_coli_K_12_substr__MG1655_uid57779 -taxon Bacteria -depth 5 -task all

Visualizing the network in Cytoscape.

2. References

Van Helden, J., Toussaint, A., and Thieffry, D. (2011). Bacterial Molecular Networks (Humana Press Inc.).
Salgado, H., Moreno-Hagelsieb, G., Smith, T., and Collado-Vides, J. (2000). Operons in Escherichia coli: genomic analyses and predictions. Proc Natl Acad Sci U S A 97, 6652–6657.
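Supplementary sketch: from boolean profiles to a co-occurrence network

As an illustration of the network step of section 1 (-task network), the following minimal Python sketch compares boolean phylogenetic profiles pairwise and keeps the gene pairs whose similarity passes a threshold. It is not the RSAT implementation. The Jaccard index, the threshold value, and the gene names are assumptions chosen for the example; the similarity metrics actually offered by phylo-profiles are not described in this course book.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity between two boolean profiles of equal length."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def cooccurrence_edges(profiles, threshold):
    """Return (gene_a, gene_b, score) for each gene pair with score >= threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[gene_a], profiles[gene_b])
        if score >= threshold:
            edges.append((gene_a, gene_b, score))
    return edges


# Toy profiles (hypothetical genes): one value per reference genome,
# 1 = an ortholog is present in that genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 1],
    "geneC": [0, 0, 1, 0, 1],
}

edges = cooccurrence_edges(profiles, threshold=0.7)
print(edges)  # [('geneA', 'geneB', 0.75)]
```

Here geneA and geneB co-occur in three of the four genomes where at least one of them is present (similarity 0.75), so they are linked. geneC shares almost no genomes with either of them and remains an isolated node.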