Scenario 5 Analysis: Discovery of possible regulatory motifs

What follows is a simulation of the proposed graphical interface. As you go through the simulation, please consider what capabilities you would want to serve your research and annotation interests. A narrative to help you go through the simulation appears in a red-bordered box.

To begin:
1. Click on Slide Show (on the upper toolbar)
2. Click View Show
3. Click the Continue button

You’ve decided you want to know what regulates the expression of nif genes, encoding the machinery for nitrogen fixation. Here’s your strategy:
• Collect nif genes from Anabaena PCC 7120 into set
• Include in set orthologs of the Anabaena genes
• Extract 5’ sequences from all genes in set
• Analyze set of 5’ sequences for motifs
• (Search for other genes with same motifs)

The main menu offers Build set, Display set, Modify set, and Set operation. Click on Build set to begin finding orfs with the desired specifications.

Choose set type: All open reading frames of, All amino acid sequences of, All intergenic regions of, Human-annotated orfs of, Private set, Public set

The first goal is to find all open reading frames within Anabaena PCC 7120 annotated as nif genes, so click on All open reading frames of.

Choose database: Arthrobacter platensis, Gloeobacter violaceus, Microcystis aeruginosa, Nostoc punctiforme, Anabaena PCC 7120, Prochlorococcus MED4, Prochlorococcus MIT9313, Prochlorococcus S120, Synechococcus PCC6301, Synechococcus PCC7942, Synechococcus WH, Synechocystis PCC 6803, Thermosynechococcus, Trichodesmium; or one of the groups Unicellular, Filamentous, All

Click on Anabaena PCC 7120.
The specification so far reads: All items in All open reading frames of Anabaena PCC 7120, such that: (nothing yet). The specification bar offers Variable, Data, Operation, Function, and Done.

You want to compare the description of each orf with “nif”. To get a tool to extract the description, click on Function.

Choose function: Closest ortholog of, Protein product of, Upstream region of, Downstream region of, Description of, Category of, Annotation level of

Click on Description of.

Op: =, includes, excludes

You want to find orfs whose description includes the word “nif”. Click on includes.

Type description term(s): nif

You can type in any characters to search for. For this simulation, the term “nif” is provided. Press the Enter key.

The specification now reads: All items in All open reading frames of Anabaena PCC 7120, such that: Description of (item) includes nif

No more specifications. Press the Done button.

If this were a complicated search, you might want to save the specifications as a script.
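To give a feel for what a saved specification might amount to, here is a minimal sketch in Python. It is not the proposed interface's script format (none is defined here). The `Orf` class, the `description_includes` function, and the last orf in the toy list are all invented for illustration, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """A hypothetical record for one open reading frame."""
    orf_id: str
    description: str


def description_includes(orfs, term):
    """Return the orfs whose description contains `term` (case ignored, by assumption)."""
    needle = term.lower()
    return [orf for orf in orfs if needle in orf.description.lower()]


# Toy stand-in for "All open reading frames of Anabaena PCC 7120".
anabaena_orfs = [
    Orf("Anab7120:all0688", "hupS [NiFe] uptake hydrogenase small subunit"),
    Orf("Anab7120:alr0874", "nifH2 dinitrogenase reductase"),
    Orf("Anab7120:alr1407", "nifV1 homocitrate synthase"),
    Orf("Anab7120:example0001", "hypothetical protein"),  # invented entry
]

# "All items in <database> such that: Description of (item) includes nif"
nif_set = description_includes(anabaena_orfs, "nif")
for orf in nif_set:
    print(orf.orf_id, orf.description)
```

Keeping the specification as a function rather than as a stored list of results means the same search can be rerun when the annotation changes.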
In this case, just save the results by clicking on Save only results (the other choice is Save results and script).

Type name of set: 7120 nif genes

All orfs of Anabaena whose descriptions include “nif” will be collected into a set. You can name the set anything you want. For this simulation, a name is provided. Press the Enter key.

Set: 7120 nif genes
  Anab7120:all0687  hupL [NiFe] uptake hydrogenase large subunit, C terminus
  Anab7120:all0687  hupL [NiFe] uptake hydrogenase large subunit, N terminus
  Anab7120:all0688  hupS [NiFe] uptake hydrogenase small subunit
  Anab7120:alr0692  similar to nifU
  Anab7120:alr0874  nifH2 dinitrogenase reductase
  Anab7120:asr1309  similar to nifU
  Anab7120:alr1407  nifV1 homocitrate synthase
  Anab7120:asr1408  nifZ iron-sulfur cofactor synthesis
  Anab7120:asr1409  nifT
  << more items >>

This is the result of the search. The set is displayed both as a list of orfs and a graphical representation of the genetic neighborhood of each orf. You can find out more about an orf by clicking its name or its arrow. For now, just press Continue.

This search, like most, is only a beginning.
It brought up some unintended hits (“nif” found “NiFe”). More seriously, it brought up many genes probably in the middle of operons and unlikely to be preceded by regulatory motifs. The genetic neighborhood gives clues as to operon structure. Select the two orfs most likely to begin operons by clicking on the circles next to alr0874 and alr1407.

Let’s suppose you proceed in a like fashion through the rest of the list. Press Done.

Set: 7120 nif genes
  Anab7120:alr0874  nifH2 dinitrogenase reductase
  Anab7120:alr1407  nifV1 homocitrate synthase
  Anab7120:all1438  nifE nitrogenase Fe/Mo cofactor
  Anab7120:all1455  nifH dinitrogenase reductase
  Anab7120:all1517  nifB nitrogen fixation protein
  Anab7120:alr2968  nifV2 homocitrate synthase

Display set options: Show orf ID, Show gene name, Show description, Show coordinates, Show graphic, Show neighbors: +/- 1, Show map

The set now consists of the six Anabaena nif genes that you judged most likely to be preceded by transcriptional signals. It might be interesting to see where this set is located on the genome. To do this, click Display set, then make some room by clicking on Show graphic.
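The unintended “NiFe” hits mentioned above come from plain substring matching. One way to reduce them, sketched here in Python, is to match gene names instead: a case-sensitive pattern for “nif” followed by a capital letter and optional digits. The pattern is an assumption about how nif genes are named in the descriptions, not a rule of the proposed interface. It also does nothing about operon position, which still has to be judged from the genetic neighborhood.

```python
import re

# Assumed naming pattern: lowercase "nif", one capital letter, optional digits.
NIF_GENE = re.compile(r"\bnif[A-Z]\d*\b")

# Descriptions copied from the search result shown earlier.
descriptions = {
    "Anab7120:all0688": "hupS [NiFe] uptake hydrogenase small subunit",
    "Anab7120:alr0692": "similar to nifU",
    "Anab7120:alr0874": "nifH2 dinitrogenase reductase",
    "Anab7120:alr1407": "nifV1 homocitrate synthase",
    "Anab7120:asr1409": "nifT",
}

# Keep only the orfs whose description names a nif gene.
strict_hits = [
    orf_id for orf_id, text in descriptions.items() if NIF_GENE.search(text)
]
print(strict_hits)
```

The hydrogenase subunit drops out because “[NiFe]” starts with a capital N, while “similar to nifU” is still kept, so a stricter search narrows the list without replacing inspection by eye.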
Replace the space-consuming description with coordinates by clicking on Show description and then Show coordinates, and finally click on Show map.

With descriptions hidden and coordinates shown, the set reads:

Set: 7120 nif genes
  Anab7120:alr0874  nifH2  1008496 -> 1009389
  Anab7120:alr1407  nifV1  1671878 -> 1673011
  Anab7120:all1438  nifE   1696389 <- 1697831
  Anab7120:all1455  nifH   1713396 <- 1714283
  Anab7120:all1517  nifB   1776670 <- 1778097
  Anab7120:alr2968  nifV2  3609625 -> 3611012
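If you wanted to work with such coordinates outside the interface, they are easy to parse. The Python sketch below assumes that “->” marks a gene on the direct strand and “<-” one on the complementary strand. That reading is an assumption rather than something the simulation states, and the function names are invented for illustration.

```python
def parse_coordinates(text):
    """Split 'start -> end' or 'start <- end' into (start, end, strand)."""
    left, arrow, right = text.split()
    if arrow not in ("->", "<-"):
        raise ValueError(f"unrecognized arrow in {text!r}")
    strand = "+" if arrow == "->" else "-"  # assumed meaning of the arrows
    return int(left), int(right), strand


def gene_length(text):
    """Length in bp, counting both end coordinates."""
    start, end, _ = parse_coordinates(text)
    return end - start + 1


# Coordinates copied from the set display above.
coordinates = {
    "nifH2": "1008496 -> 1009389",
    "nifV1": "1671878 -> 1673011",
    "nifE": "1696389 <- 1697831",
    "nifH": "1713396 <- 1714283",
    "nifB": "1776670 <- 1778097",
    "nifV2": "3609625 -> 3611012",
}

for gene, text in coordinates.items():
    start, end, strand = parse_coordinates(text)
    print(gene, strand, start, end, gene_length(text))
```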
The map shows the six orfs of the set 7120 nif genes on the Anabaena chromosome (6413771 bp).

Four of the six putative nif operons are clustered near 1.7 Mb... but back to business. Our idea was to extend the set to include orthologs in other nitrogen-fixing cyanobacteria. To do this, click Set operation, then Transformations, then Ortholog of.

Set operation: Maintenance, Set operations, Analysis tools, Discovery tools, Transformations

Transformations: Closest ortholog of, Ortholog of, Protein product of, Upstream region of, Downstream region of

Choose set type: All open reading frames of, All amino acid sequences of, All intergenic regions of, Human-annotated orfs of, Public set, Private set

You want the orthologs of the orfs in the set you just made. This set is yours, a private set, as opposed to certain sets that are available to all users. Click Private set.

Choose set: 7120 IS895 seqs, 7120 nif genes, 7120 STTR7 regions, Light-specific genes, Npun STTR7 regions

The list of choices will consist of whatever sets you may have created. Choose the one you just made: 7120 nif genes.

The operation now reads: Orthologs of ( Private set 7120 nif genes ) in ...

At present, the set of filamentous cyanobacteria includes just the nitrogen-fixing strains Nostoc punctiforme, Trichodesmium erythreum, and Anabaena. Click on Filamentous.
Choose database: Arthrobacter platensis, Gloeobacter violaceus, Microcystis aeruginosa, Nostoc punctiforme, Anabaena PCC 7120, Prochlorococcus MED4, Prochlorococcus MIT9313, Prochlorococcus S120, Synechococcus PCC6301, Synechococcus PCC7942, Synechococcus WH8102, Synechocystis PCC 6803, Thermosynechococcus, Trichodesmium erythreum; or one of the groups Unicellular, Filamentous, All

The operation now reads: Orthologs of ( Private set 7120 nif genes ) in Filamentous

Type name of set: all nif genes

All orthologs of the selected nif genes will be combined and saved in a set of your choice. For this simulation, a name is provided. Press the Enter key.

Set: all nif genes
  Anab7120:alr0874  nifH2 dinitrogenase reductase
  Anab7120:alr1407  nifV1 homocitrate synthase
  Anab7120:all1438  nifE nitrogenase Fe/Mo cofactor
  Anab7120:all1455  nifH dinitrogenase reductase
  Anab7120:all1517  nifB nitrogen fixation protein
  Anab7120:alr2968  nifV2 homocitrate synthase
  NostPunc:637.025  nifH2 dinitrogenase reductase
  NostPunc:510.011  nifV1 homocitrate synthase
  NostPunc:651.072  nifE nitrogenase Fe/Mo cofactor
  NostPunc:510.021  nifB nitrogen fixation protein
  << more items >>

The set now consists of nif genes from all filamentous cyanobacteria. From this set we want to extract the upstream sequences. Click on Set operation, then click on Transformations and Upstream region of.

Choose set type: All open reading frames of, Human-annotated orfs of, Public set, Private set

Again you want the orfs from a set you made yourself, so click on Private set.
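A transformation such as Ortholog of can be thought of as a function applied to every member of a set. The Python sketch below illustrates the idea with a hypothetical lookup table. How the interface would actually decide orthology is not specified here, and the two Anabaena-to-Nostoc pairings are assumed only because the gene names match in the listing above.

```python
# Hypothetical ortholog table: (query orf, target genome) -> ortholog orf.
# The pairings are assumed from matching gene names, for illustration only.
ORTHOLOGS = {
    ("Anab7120:alr0874", "Nostoc punctiforme"): "NostPunc:637.025",  # nifH2
    ("Anab7120:alr1407", "Nostoc punctiforme"): "NostPunc:510.011",  # nifV1
}

# The "Filamentous" group as named in the narrative.
FILAMENTOUS = ["Anabaena PCC 7120", "Nostoc punctiforme", "Trichodesmium erythreum"]


def orthologs_of(orf_set, genomes):
    """Return the query orfs followed by their known orthologs in `genomes`."""
    result = list(orf_set)  # the original members stay in the combined set
    for orf in orf_set:
        for genome in genomes:
            hit = ORTHOLOGS.get((orf, genome))
            if hit is not None and hit not in result:
                result.append(hit)
    return result


all_nif = orthologs_of(["Anab7120:alr0874", "Anab7120:alr1407"], FILAMENTOUS)
print(all_nif)
```

Because the output is again a list of orfs, it can be fed straight into the next transformation, which is what makes chaining set operations convenient.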
Choose set: 7120 IS895 seqs, 7120 nif genes, 7120 STTR7 regions, all nif genes, Light-specific genes, Npun STTR7 regions

The set you just defined magically appears on the list (no chance for misspelling). Click on it.

The operation now reads: Upstream region of ( Private set all nif genes )

Type name of set: all nif genes – 5’

Give this new set of 5’ regions a descriptive name (done here for you). Press the Enter key.

Set: all nif genes – 5’
  Anab7120.C:1006982-1008496d
  Anab7120.C:1671462-1671878d
  Anab7120.C:1697832-1698138c
  Anab7120.C:1713264-1713395c
  Anab7120.C:1778098-1779034c
  Anab7120.C:3609273-3609624d
  NostPunc.637:37288-37376d
  NostPunc.510:15955-16325d
  NostPunc.651:60311-60584c
  NostPunc.510:5239-6338c
  << more items >>

The resulting set consists of sequences, not orfs, and so the elements are defined by coordinates. Clicking on a coordinate brings up the sequence display (see Scenario 6). Clicking on a graph of an orf brings up the orf’s annotation page. Click Continue.

The final step in this procedure is to analyze the set of upstream sequences of nif genes, hoping to find a common motif. Click on Set operation, then Analysis tools. Tools based on Position-Specific Scoring Matrices (PSSMs) are most often used for the task. Click on one of these: Meme.
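For readers unfamiliar with position-specific scoring matrices, the Python sketch below shows what one is. It is not Meme, which has to discover the motif in unaligned sequences. This sketch starts from motif instances that are already aligned, builds log-odds scores per position, and uses them to find the best-matching window in a sequence. The sites, the test sequence, the uniform background, and the pseudocount are all invented for illustration.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds scores per position from equal-length, pre-aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def score(pssm, window):
    """Sum of the per-position scores of `window` (same width as the PSSM)."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_window(pssm, sequence):
    """Start index of the highest-scoring window in `sequence`."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    return max(starts, key=lambda i: score(pssm, sequence[i:i + width]))


sites = ["TGTAAT", "TGTAAC", "TGTGAT", "TGTAAT"]  # invented motif instances
pssm = build_pssm(sites)
upstream = "CCGGTGTAATCCGG"                       # invented upstream sequence
start = best_window(pssm, upstream)
print(start, upstream[start:start + len(pssm)])
```

Scanning every upstream region in a set with such a matrix is also one way the optional last step of the strategy, searching for other genes with the same motif, could be approached.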
Analysis tools: Align, PSSM: Gibbs sampler, PSSM: Meme, Make HMM

Choose set type: Public set, Private set

Click Private set and then all nif genes – 5’ to give Meme the set of 5’ sequences.

Choose set: 7120 IS895 seqs, 7120 nif genes, 7120 STTR7 regions, all nif genes, all nif genes – 5’, Npun STTR7 regions

The operation now reads: PSSM: Meme of ( Private set all nif genes – 5’ )

Type name of results: PSSM:all nif – 5’

Give the results a name, press Enter, and the task is accomplished.

Scenario 5 Analysis: Discovery of possible regulatory motifs

Summary
• The interface facilitates operations on sets of genes and sequences
• The interface puts at your disposal powerful tools (that already exist), without the need to figure out a different computer environment
• Taken together, these capabilities let those not particularly adept at computer programming focus on the function of noncoding sequences

But don’t be fooled: the interface does not yet exist. That’s the point of the proposal!