Exercise 1: Mining transcriptomics data

In the PATRIC (https://www.patricbrc.org/) database, search for VBIEscCol129921_2790,
find the corresponding recA entry, and go to the gene page.

Click on the Transcriptomics tab and find the conditions (at least 3 or 4) where this gene is
the least and the most expressed. What do these conditions have in common?

Find the genes that are positively correlated with recA, meaning they are co-expressed with it.
Download the list of the corresponding protein sequences in FASTA format (keep
that window open; you will need it later).
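
To make "positively correlated" concrete, here is a minimal sketch of a Pearson correlation between expression profiles. It is not PATRIC's own implementation, and the gene names geneA and geneB and all values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression values across five conditions (made up).
profiles = {
    "recA": [0.1, 2.0, 3.1, 0.2, 1.5],
    "geneA": [0.0, 1.8, 2.9, 0.1, 1.2],  # rises and falls with recA
    "geneB": [2.5, 0.3, 0.1, 2.2, 0.9],  # moves opposite to recA
}

# Correlate every other gene with recA and keep the positive ones.
correlations = {
    name: pearson(profiles["recA"], values)
    for name, values in profiles.items()
    if name != "recA"
}
co_expressed = [name for name, r in correlations.items() if r > 0]
print(co_expressed)
```

A gene whose profile follows recA across conditions gets a coefficient close to +1; a gene that moves in the opposite direction gets a negative coefficient and is left out.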

What do all these genes have in common? To find out, even if you know nothing about
gene names, you can:
• Click on the pathway link on that page in PATRIC.
• Download the list of correlated genes as an Excel sheet. Copy the RefSeq locus tag
column and paste it into the DAVID functional annotation tool
(https://david.ncifcrf.gov/summary.jsp). Choose Locus Tag as the identifier
and submit your list.
• What process are the genes co-expressed with recA involved in? Why did the
PATRIC pathway tool give you poor results?
• In the MicrobesOnline (http://www.microbesonline.org) database, find the recA gene
from Escherichia coli and click on the “E” link. Click on the + correlated genes (let it run
while we continue; we will get back to it later). Do you find the same result as with
PATRIC?
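
If you prefer not to copy the column by hand, the locus tags can be pulled out of the downloaded table with a few lines of Python. This is only a sketch: it assumes the sheet was saved as CSV, and the column name "RefSeq Locus Tag" and the example rows are assumptions, so check the header of your own file.

```python
import csv
import io

def locus_tags(table_text, column="RefSeq Locus Tag"):
    """Return the non-empty, de-duplicated values of one column, in file order."""
    reader = csv.DictReader(io.StringIO(table_text))
    seen = []
    for row in reader:
        tag = (row.get(column) or "").strip()
        if tag and tag not in seen:
            seen.append(tag)
    return seen

# Hypothetical table contents (made-up genes and tags).
example = """Gene,RefSeq Locus Tag,Correlation
geneA,TAG_0001,0.91
geneB,TAG_0002,0.88
geneC,,0.80
geneA,TAG_0001,0.91
"""
tags = locus_tags(example)
print("\n".join(tags))  # one tag per line, ready to paste
```

Rows without a locus tag are skipped, since an empty identifier cannot be matched by an annotation tool.
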
Exercise 2: Identifying regulators

Paste the list of FASTA sequences of the recA co-regulated proteins you found in PATRIC into
the GenBrowser P2RP (http://www.p2rp.org), a tool to identify prokaryotic regulators.
Which gene in the list is a regulator? Note its name and RefSeq ID for future use.
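
Before pasting, it can help to check that the FASTA download is complete. The sketch below parses FASTA text and reports each record's identifier and length; the two records are invented placeholders, not real PATRIC output.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical two-record FASTA text (made-up sequences).
example = """>protein_1 hypothetical example
MAIDENK
QKALAA
>protein_2 hypothetical example
MKALTAR
"""
records = read_fasta(example)
for header, seq in records:
    print(header.split()[0], len(seq))
```

The number of records should match the number of co-expressed genes you selected in PATRIC.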

Go to RegPrecise (http://regprecise.lbl.gov/RegPrecise/) and find the page for the
regulator you identified in the first step above, in E. coli.
Get the list of genes that are predicted to be regulated by this regulator, based on the presence
of a specific DNA binding site upstream of the transcription unit start. Download the list of
RefSeq IDs.


To know whether this is a positive or a negative regulator, find the regulon entry for this
regulator in RegulonDB (http://regulondb.ccg.unam.mx/). If this gene is deleted, do you
predict that the regulated genes will be upregulated or downregulated? You can also
find this regulator in Prodoric (http://prodoric.tu-bs.de/) and follow the Prodonet link to see
the network (Java must be working).

From the PATRIC database home page, go to the Organisms tab and select Escherichia.
Then go to the Transcriptomics tab.
Filter the experiments with the gene name of the regulator you identified above. Two
experiments should come up; select the one with PubMed ID 11333217 and view the gene
list.

In the “Filter by one or more keywords or locus tags” box, paste all the RefSeq locus tags of
the genes that are regulated by this regulator (that you got from RegPrecise) and look at the
data using the heatmap view. Do you find the expected results? You can check the paper
where this study was published to help you.
From PATRIC you can access the GEO datasets. You can use the curated ones
(accessions starting with GDS) or do the analysis in GEO2R yourself. Your goal is to
visualize how the expression of recA changes with UV in the WT and in the mutant.
For those who are fast: you can paste the same list of genes into http://colombos.net/ and
play with the analysis tools.
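
Expression changes such as the UV response are usually summarized as a log2 fold change. A minimal sketch follows; the intensities are arbitrary placeholders chosen only to show the arithmetic and are not the results of the study.

```python
import math

def log2_fold_change(treated, control):
    """log2(treated / control); positive means higher signal after treatment."""
    return math.log2(treated / control)

# Placeholder recA signal intensities (made up, arbitrary units).
signal = {
    "WT": {"no_UV": 100.0, "UV": 400.0},
    "mutant": {"no_UV": 200.0, "UV": 300.0},
}
changes = {
    strain: log2_fold_change(values["UV"], values["no_UV"])
    for strain, values in signal.items()
}
for strain, change in changes.items():
    print(f"{strain}: log2 fold change = {change:.2f}")
```

A value of 1 means a doubling and a value of 0 means no change, so comparing the two strains amounts to comparing their log2 fold changes for recA.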


Exercise 3: Mining phenotype data






For E. coli, the Keio collection is available. It is a set of individual deletions of every
non-essential gene. This collection was profiled under a set of growth conditions and
chemical stresses. Go to EcoliWiki (http://ecoliwiki.net/tools/chemgen/) to access these
data. Then choose growth data, type recA in the strain box, and click SUBMIT.
What do the chemicals under which growth of the recA strain is most affected have in common?
Choose correlation among strains and leave strain 2 empty. Look at the list of genes with
correlation coefficient >0.4 and click SUBMIT. What does this suggest about the function of
the protein?
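
If you export the correlation list, the >0.4 cutoff can be applied and the hits ranked as sketched below. The strain names and coefficients are hypothetical; only the threshold comes from the exercise.

```python
def strong_partners(correlations, threshold=0.4):
    """Return (strain, r) pairs with r above the threshold, strongest first."""
    hits = [(name, r) for name, r in correlations.items() if r > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical correlation coefficients against the recA deletion strain.
example = {"strainA": 0.72, "strainB": 0.41, "strainC": 0.40, "strainD": -0.10}
hits = strong_partners(example)
for name, r in hits:
    print(name, r)
```

The comparison is strict, so a strain at exactly 0.40 is excluded.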
Only a few organisms have mutant collections available (see
http://ogee.medgenius.info/browse/), but TnSeq technology now allows us to obtain
fitness data in a wide range of organisms and conditions (see PMID 26336012 for a
review).
MicrobesOnline has fitness data for four organisms. Find the recA gene in Zymomonas
mobilis subsp. mobilis ZM4. Click on the fitness (F) data and look at the (+) fitness
profile. Looking at the genes that have the highest correlation, what does it tell you about
TnSeq experiments compared to chemical genomics?
A more comprehensive (~20 organisms in 300 conditions) and very user-friendly TnSeq
analysis platform can be found at http://fit.genomics.lbl.gov/cgi-bin/myFrontPage.cgi.
Compare the cofitness data for recA from E. coli and Shewanella oneidensis MR-1.
Why do you think the data from MR-1 seem so much more informative? You can read
the original paper (http://mbio.asm.org/content/6/3/e00306-15.full) or
the help page (http://fit.genomics.lbl.gov/cgi-bin/help.cgi) to help you answer.
Exercise 4: Data integration and enrichment analyses, when it works



We have analyzed the expression and fitness data linked to recA; now we are going
to explore platforms that integrate different types of omics data.
Go to inetbio.org. Find what this platform predicts as the function of recA. Look at the
top-ranking predicted GO_biological_process_term and its score, and explain what type of
evidence is used to make this prediction.
Go to Genemania (http://genemania.org/), look at the E. coli recA network, and find how to
color the top 5 enrichment terms. You can do the same with the human one and see how
conserved the RecA function is between E. coli and human!
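
An enrichment score asks whether a term is found in a gene list more often than chance would predict. One common way to frame this is a hypergeometric test, sketched below. Each platform uses its own statistic and multiple-testing correction, so this is an illustration of the idea rather than a reproduction of any tool's method, and the genome and term sizes in the example are invented.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn at random from a
    population in which `annotated` genes carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 4000 genes in the genome, 40 annotated with a term,
# 20 genes in the list, 10 of which carry the term.
p = enrichment_p_value(population=4000, annotated=40, sample=20, overlap=10)
print(f"p = {p:.3g}")
```

A small p-value means that the overlap is unlikely to arise by chance, which is why a term annotating only 1% of the genome but half of the list ranks at the top.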
Exercise 5: Data integration and enrichment analyses II, when it does not work
Using only comparative genomics tools and different types of omics data, we were able to predict
that the yeast YLR143W protein was the missing last enzyme of diphthamide synthesis,
DPH6 (https://biologydirect.biomedcentral.com/articles/10.1186/1745-6150-7-32). Here we are
going to visit the different sites to reproduce how some of these predictions were made.


Chemical genomics data are available for yeast in the Yeast Fitness database
(http://fitdb.stanford.edu). Look at the data for YLR143W. What does this tell you about
the function of the protein?
Now let's see if the integrative databases can predict the function. Go first to the String
database (there, to go back in time, you should turn off text mining). Then try the Stitch
database (http://stitch.embl.de),
GeneMania (http://genemania.org), BioPixie (http://imp.princeton.edu), and
ConsensusPathDB-yeast (http://cpdb.molgen.mpg.de/YCPDB). For all these databases,
look at the network analysis and at what the enrichment analysis gives you. How good are these
databases at predicting the function of YLR143W? Why do you think they are not as
successful as a human for this protein, while they were for RecA?