Download Functional Genomics I: Transcriptomics and

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Heritability of IQ wikipedia , lookup

X-inactivation wikipedia , lookup

RNA interference wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Short interspersed nuclear elements (SINEs) wikipedia , lookup

NEDD9 wikipedia , lookup

Oncogenomics wikipedia , lookup

Public health genomics wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Metagenomics wikipedia , lookup

History of genetic engineering wikipedia , lookup

Pathogenomics wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Essential gene wikipedia , lookup

Microevolution wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Designer baby wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Gene expression programming wikipedia , lookup

Mir-92 microRNA precursor family wikipedia , lookup

Genome evolution wikipedia , lookup

Gene wikipedia , lookup

Genome (book) wikipedia , lookup

Genomic imprinting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Minimal genome wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Epigenetics of human development wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene expression profiling wikipedia , lookup

Transcript
ExploringTranscriptomicdata
1. ExploringRNAsequencedatainPlasmodiumfalciparum.
Note:Forthisexerciseusehttp://www.plasmodb.org
a.
Find all genes in P. falciparum that are up-regulated during the later stages of the
intraerythrocyticcycle.
- Hint:Usethefoldchangesearchforthedataset“Transcriptomeduringintraerythrocytic
development (Bartfai et al.)”. For this data set, synchronized Pf3D7 parasites were
assayedbyRNA-seqat8time-pointsduringtheiRBCcycle.Wewanttofindgenesthat
areup-regulatedinthelatertimepoints(30,35,40hours)usingtheearlytimepoints(5,
10,15,20,25hours)asreference.
There are a number of parameters to manipulate in this search. As you modify
parametersontheleftsidenotethedynamichelpontherightside.Seescreenshots.
-
Direction:thedirectionofchangeinexpression.Chooseup-regulated.
-
Fold Change>= the intensity of difference in expression needed before a gene is
returnedbythesearch.Choose12butfeelfreetomodifythis.
-
Betweeneachgene’sAVERAGEexpressionvalue: Thisparameterappearsonceyouhave
chosentwoReferenceSamplesanddefinestheoperationappliedtoreferencesamples.
Foldchangeiscalculatedastheratiooftwovalues(expressioninreference)/(expression
incomparison).Whenyouchoosemultiplesamplestoserveasreference,wegenerate
one number for the fold change calculation by using the minimum, maximum, or
average.Chooseaverage
-
Reference Sample: the samples that will serve as the reference when comparing
expressionbetweensamples.choose5,10,15,20,25
-
Andit’sAVERAGEexpressionvalue: Thisistheoperationappliedtocomparisonsamples.
seeexplanationabove.Chooseaverage
-
ComparisonSample:thesamplethatyouarecomparingtothereference.Inthiscaseyou
areinterestedingenesthatareup-regulatedinlatertimepointschoose30,35,40
b.
-
For the genes returned by the search, how does the RNA-sequence data compare to
microarraydata?
Hint:PlasmoDBcontainsdatafromasimilarexperimentthatwasanalyzedbymicroarray
insteadofRNAsequencing.Thisexperimentiscalled:Erythrocyticexpressiontimeseries
(3D7, DD2, HB3) (Bozdech et al. and Linas et al.) or Pf-iRBC 48hr for shorter column
headings.TodirectlycomparethedataforgenesreturnedbytheRNA-seqsearchthat
youjustran,addthecolumncalled“Pf-iRBC48hr-Graph”.
OPTIONAL: You can also run a fold change search using this experiment to compare
resultsonagenomescale.Addasteptoyourstrategyandintersecttheresultsofafold
changesearchusingthe“Erythrocyticexpressiontimeseries(3D7,Dd2,HB3)(Bozdechet
al.andLinasetal.)”experiment(undermicroarrayevidence).Configureitsimilarlytothe
RNA-seq experiment although you will probably need to make the fold change smaller
(try2or3)duetothedecreaseddynamicrangeofmicroarrayscomparedtoRNA-seq.
2. ExploringmicroarraydatainTriTrypDB.
Note:Forthisexerciseusehttp://www.tritrypdb.org
a. Find T. cruzi protein coding genes that are upregulated in amastigotes compared to
trypomastigotes.Gotothetranscriptexpressionsectionthenselectmicroarray.Choose
the fold change (FC) search for the data set called: Transcriptomes of Four Life-Cycle
Stages(Minningetal.).
-
Selectthedirectionofregulation,yourreferencesampleandyourcomparisonsample.For
thefoldchangekeepthedefaultvalue2.
-
Howmanygenesdidyoufind?Dotheresultsseem
plausible?
-
Are any of these genes also up-regulated in the
replicative insect stage (epimastigotes)? How can
you find this out? (Hint: add a step and run a
microarray search comparing expression of
epimastigotestometacyclics).
-
Dothesegeneshaveorthologsinotherkinetoplastids?(Hint:addastepandrunanortholog
transformonyourresults).
-
How many orthologs exist in L. braziliensis? (Hint: look at the filter table between the
strategy panel and your result list. Click on the number in of gene to view results from a
specific species). Explore your results. Scan the product descriptions for this list of genes.
Did you find anything interesting? Perhaps a GO enrichment analysis would support your
ideas.
3. FindinggenesbasedonRNAseqevidenceandinferringfunctionofhypotheticalgenes.
Note:Usehttp://plasmodb.orgforthisexercise.
a.
Find all genes in P. falciparum that are up-regulated at least 50-fold in ookinetes
compared to other stages: “Transcriptomes of 7 sexual and asexual life stages (LopezBarragan et al.)”. For this search select “average” for the operation applied on the
referencesamples.
b.
-
The above search will give you all genes that are up-regulated by 50 fold in ookinetes
comparedtotheotherstages.Despitethehighfoldchange,somegenesinthelistmay
behighlyexpressedintheotherstages.Howcanyouremovegenesfromthelistthatare
highlyexpressedintheotherstages?
Hint:RunasearchforgenesbasedonRNASeqevidencefromthesameexperiment,but
thistimeselectthepercentilesearch:P.f.sevenstages-RNASeq(percentile).What
minimalpercentilevaluesshouldyouchoose?40–100%?
c.
Which metabolic pathways are represented in this gene list? Hint: add a step and
transform results to pathways. How does this result compare to running a pathways
enrichmentonstep2?
d.
Whathappensifyourevisethefirststepandmodifythefolddifferencetoalowervalue-
10forexample?
e.
PlasmoDBalsohasanexperimentexamininggeneexpressionduringsexualdevelopment
inPlasmodiumberghei(rodentmalaria).Canyoudetermineiftherearegenesthatare
up-regulatedinbothhumanandrodentookinetes(comparedtoallotherstages)?Hint:
startbydeletingthelaststepyouaddedinthisexercise(transformtopathways).Todo
thisclickoneditthendeleteinthepopup.Next,addstepsfortheP.bergheiexperiments
“PbergheiANKA5asexualandsexualstagetranscriptomesRNASeq”.Notethatyouwill
have to use a nested strategy or by running a separate strategy then combining both
strategies.
.
4.
FindgenesthatareessentialinprocyclicsbutnotinbloodformT.brucei.
Note:forthisexerciseusehttp://TriTrypDB.org.
-
FindthequeryforHighThroughputPhenotyping.Thinkabouthowtosetupthisquery
(Hint:youwillhavetosetupatwo-stepstrategy).Rememberyoucanplayaroundwith
the parameters but there is no one correct way of setting them up – try the default
parametersfirstandselectthe“inducedprocyclics”asthecomparisonsample.
-
Next add a step and run the same search except this time select the “induced
bloodstreamform”samples.
-
Howdidyoucombinetheresults?Rememberyouwanttofindgenesthatareessential
inprocyclicsandnotinbloodform.
5.
a.
FindingoocystexpressedgenesinT.gondiibasedonmicroarrayevidence.
Note:Forthisexerciseusehttp://toxodb.org
Findgenesthatareexpressedat10foldhigherlevelsinoneoftheoocyststagesthanin
any other stage in the “Oocyst, tachyzoite, and bradyzoite developmental expression
profiles (M4) (John Boothroyd)” microarray experiment. In this example, the maximum
expression value between genes in the reference and comparison groups was used to
determinethefolddifference.
b.
Addasteptolimitthissetofgenestoonlythoseforwhichallthenon-oocyststagesare
expressedbelow50thpercentile…ielikelynotexpressedatthosestages.(Hint:afteryou
clickonaddstepfindthesameexperimentundermicroarrayexpressionandchosethe
percentilesearch).
-
Selectthe4non-oocystsamples.
-
We want all to have less than 50th percentile so set minimum percentile to 0 and
maximumpercentileto50.
-
-
c.
Sincewewantallofthemtobeinthisrange,
chooseALLinthe“MatchesAnyorAllSelected
Samples”.
Toviewthegraphsinthefinalresulttable,
turnonthecolumnscalled“Tg-M4LifeCycle
Stages–graph”and“Tg-M4LifeCycleStage
%ile-graph”(insidethe“Tg-LifeCycle”Microarray).
Revise the first step of this strategy and compare the maximum expression of the
referencesamplestotheminimumofthecomparisonsamples.
-
Does this result
convincing?Why?
look
cleaner/more
-
Wouldyouconsiderthesegenestobeoocyst
specific?
Save this strategy so that you can use it for an
exercisewearedoinglaterduringthecourse.
d. Revisethefirststepofthisstrategytofindgenesthatare3foldhigherinday4oocyststhan
anyotherlifecyclestageinthisexperiment.
-
Doallthesegeneshaveday4oocystsastheglobalmaximumtimepoint?
-
Notethatwestillhavethesteptolimitthepercentileofnon-oocystsamplesto<=50th
percentile.Whathappensifyourevisethissteptoalsoincludetheunsporulatedandday
10 oocyst samples in this percentile range? Do you get more of fewer results back?
Why?
6. ComparingRNAabundanceandProteinabundancedata.
Note:forthisexerciseusehttp://TriTrypDB.org.
InthisexercisewewillcomparethelistofgenesthatshowdifferentialRNAabundancelevels
between procyclic and blood form stages in T. brucei with the list of genes that show
differentialproteinabundanceinthesesamestages.
Findgenesthataredown-regulated2-foldinprocyclicformcells.Gotothesearchpage
forGenesbyMicroarrayEvidenceandselectthefoldchangesearchforthe“Expression
profilingoffivelifecyclestages(MarilynParsons)”experimentandconfigurethesearch
to return protein-coding genes that are down-regulated 2 fold in procyclic form (PCF)
relative to the Blood Form reference sample. Since there are two PCF samples, it is
reasonabletochoosebothandaveragethem.
a.
b.
Add a step to compare with quantitative protein expression. Select protein expression
then “Quantitative Mass Spec Evidence” and the "Quantitative phosphoproteomes of
bloodstream and procyclic forms (Tb427) (Urbaniak et al.)" experiment. Configure this
searchtoreturngenesthataredown-regulatedinprocyclicformrelativetobloodform.
c.
d.
e.
Howmanygenesareintheintersection?Doesthismakesense?Makecertainthatyou
setthedirectionscorrectly.
Try changing directions and compare up-regulated genes/proteins. (Hint: revise the
existing strategy … you might want to duplicate it so you can keep both). When you
change one of the steps but not the other do you have any genes in the intersection?
Whymightthisbe?
Can you think of ways to provide more confidence (or cast a broader net) in the
microarray step? (Hint: you could insert steps to restrict based on percentile or add a
RNASequencingstepthathasthesamesamples).
7.FindgeneswithevidenceofphosphorylationinintracellularToxoplasmatachyzoites.
Forthisexerciseusehttp://www.toxodb.org
PhosphorylatedpeptidescanbeidentifiedbysearchingtheappropriateexperimentsintheMass
SpecEvidencesearchpage.
7a.
Find all genes with evidence of phosphorylation in intracellular tachyzoites. Select the
“Infectedhostcell,phosphopeptide-enriched(peptidediscoveryagainstTgME49)”sampleunder
theexperimentcalled“Tachyzoitephosphoproteomefrompurifiedparasiteorinfectedhostcell
(RH)(Treecketal.)”
7b.
Removeallgeneswithphosphorylationevidencefrompurifiedtachyzoites.
7c.
Removeallgenesthatarealsopresentinthephosphopeptide-depletedfractions(select
bothintracellularandextracellular).
7d.
Exploreyourresults.Whatkindsofgenesdidyoufind?Hint:usetheProductdescription
wordcolumnorperformaGOenrichmentanalysisofyourresults.Couldyouachievethissame
105geneswithatwostepstrategy?Hint:removedepletedandtachozoiteproteinsinonestep
ratherthantwo.
7e.
Areanyofthesegeneslikelytobesecreted?Hint:addastepsearchingforgeneswith
secretorysignalpeptides.
7f.
Pickoneortwoofthehypotheticalgenesinyourresultsandvisittheirgenepages.Can
youinferanythingabouttheirfunction?Hint:exploretheproteinandexpressionsections.
7g.
Whataboutpolymorphismdata?GobacktoyourstrategyandaddcolumnsforSNPdata
found under the population biology section. Explore the gene page for the gene that has the
most number of non-synonymous SNPs. Hint: you can sort the columns by clicking on the
up/downarrowsnexttothecolumnnames.