Last Update: 12/23/2016
GEP Annotation Report
Note: For each gene described in this annotation report, you should also prepare the
corresponding GFF, transcript and peptide sequence files as part of your submission.
Student name: Wilson Leung
Student email: [email protected]
Faculty advisor: Sarah C. R. Elgin
College/university: Washington University in St. Louis
Project details
Project name: contig10
Project species: D. biarmipes
Date of submission: 08/04/2014
Size of project in base pairs: 43,013
Number of genes in project: 3
Does this report cover all of the genes or is it a partial report? Partial report
If this is a partial report, please indicate the region of the project covered by this report:
From base 25,000 to base 28,000
Instructions for project with no genes
If you believe that the project does not contain any genes, please provide the
following evidence to support your conclusion:
1. Perform a BLASTX search of the entire contig sequence against the non-redundant
(nr) protein database. Provide an explanation for any significant (E-value < 1e-5)
hits to known genes in the nr database as to why they do not correspond to real
genes in the project.
2. For each Genscan prediction, perform a BLASTP search using the predicted amino
acid sequence against the nr protein database using the strategy described above.
3. Examine the gene expression tracks (e.g., RNA-Seq) for evidence of transcribed
regions that do not correspond to alignments to known D. melanogaster proteins.
Perform a BLASTX search against the nr database using these genomic regions to
determine if they show sequence similarity to known or predicted proteins in the nr
database.
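
The significance cutoff in step 1 can be applied mechanically once the search results are saved as text. The following is a minimal sketch, assuming the hits were exported in the BLAST tabular format (-outfmt 6), where the E-value is the 11th column. The subject names and scores in SAMPLE_HITS are made up for illustration and are not results from this project.

```python
# Hypothetical BLAST tabular output (-outfmt 6). Subject IDs and scores are
# invented for illustration; they are not results from contig10.
SAMPLE_HITS = """\
contig10\thypothetical_protein_A\t62.5\t120\t45\t0\t25700\t26059\t1\t120\t3e-40\t150
contig10\thypothetical_protein_B\t31.0\t58\t40\t0\t26410\t26583\t10\t67\t0.002\t38.1
contig10\thypothetical_protein_C\t45.2\t84\t46\t0\t27100\t27351\t5\t88\t1e-05\t52.0
"""

E_VALUE_CUTOFF = 1e-5  # threshold stated in the report instructions


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, e_value) pairs whose E-value is strictly below the cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        e_value = float(fields[10])  # column 11 of the default tabular format
        if e_value < cutoff:
            hits.append((subject_id, e_value))
    return hits


if __name__ == "__main__":
    for subject_id, e_value in significant_hits(SAMPLE_HITS):
        print(f"{subject_id}\t{e_value:g}")
```

Each hit that passes the filter still needs a written explanation in the report; the filter only identifies which hits require one.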
Complete the following Gene Report Form for each gene in your project. Copy and paste
the sections below to create as many copies as needed within this report. Be sure to
create enough Isoform Report Forms within your Gene Report Form for all isoforms.
Gene report form
Gene name (e.g., D. biarmipes eyeless): D. biarmipes CG31997
Gene symbol (e.g., dbia_ey): dbia_CG31997
Approximate location in project (from 5’ end to 3’ end): 25673-27471
Number of isoforms in D. melanogaster: 2
Number of isoforms in this project: 2
Complete the following table for all the isoforms in this project:
Name(s) of unique isoform(s)        List of isoforms with identical
based on coding sequence            coding sequences
CG31997-PB                          CG31997-PA
Note: For isoforms with identical coding sequence, you only need to complete the
Isoform Report Form for one of these isoforms (i.e. using the name of the isoform listed
in the left column of the table above). However, you should generate GFF, transcript,
and peptide sequence files for ALL isoforms, irrespective of whether they have
identical coding sequences as other isoforms.
Consensus sequence errors report form
Complete this section if you have identified errors in the project consensus sequence.
All of the coordinates reported in this section should be relative to the coordinates of
the original project sequence.
Location(s) in the project sequence with consensus errors: NA
Isoform report form
Complete this report form for each unique isoform listed in the table above (copy and
paste to create as many copies of this Isoform Report Form as needed):
Gene-isoform name (e.g., dbia_ey-PA): dbia_CG31997-PB
Names of the isoforms with identical coding sequences as this isoform: dbia_CG31997-PA
Is the 5’ end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 5’ end:
Is the 3’ end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 3’ end:
1. Gene Model Checker checklist
Enter the coordinates of your final gene model for this isoform into the Gene Model
Checker and paste a screenshot of the checklist results below:
Note: For projects with consensus sequence errors, report the exon coordinates relative
to the original project sequence. Include the VCF file you have generated above when
you submit the gene model to the Gene Model Checker. The Gene Model Checker will use
this VCF file to automatically revise the submitted exon coordinates.
2. View the gene model on the Genome Browser
Use the custom track feature from the Gene Model Checker to capture a screenshot of your
gene model shown on the Genome Browser for your project. Zoom in so that only this
isoform is in the screenshot. (See page 12 of the Gene Model Checker user guide on how to
do this; you can find the guide under “Help” → “Documentations” → “Web Framework” on
the GEP website at http://gep.wustl.edu.)
Include the following evidence tracks in the screenshot if they are available.
1. A sequence alignment track (D. mel Proteins or Other RefSeq)
2. At least one gene prediction track (e.g., Genscan)
3. At least one RNA-Seq track (e.g., RNA-Seq Alignment Summary)
4. A comparative genomics track (e.g., Conservation, D. mel. Net Alignment)
Paste a screenshot of your gene model as shown on the Genome Browser below:
Low-frequency RNA-Seq exon junctions not annotated:
The evidence from the RNA-Seq TopHat tracks and Multiz alignments suggests that there
might be additional isoforms because of alternative splicing at the 5' end of this gene (red
arrows in the screenshot above). However, most of the TopHat junctions are supported by
fewer than 10 reads. Hence there is insufficient evidence to postulate the presence of
multiple novel isoforms in D. biarmipes compared to D. melanogaster.
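
The read-support criterion described above can be expressed as a simple filter. This is a minimal sketch, assuming the junctions are available as BED-style records in which the fifth column (score) holds the number of supporting reads, as in the junctions.bed file written by TopHat. The junction coordinates and read counts below are hypothetical and are not taken from the contig10 tracks.

```python
MIN_READS = 10  # support threshold used in the report text

# Hypothetical junction records: chrom, start, end, name, score (supporting reads).
# These values are invented for illustration only.
SAMPLE_JUNCTIONS = """\
contig10\t25650\t25900\tJUNC00000001\t4
contig10\t26200\t26650\tJUNC00000002\t57
contig10\t25500\t25900\tJUNC00000003\t9
contig10\t26900\t27050\tJUNC00000004\t10
"""


def split_by_support(bed_text, min_reads=MIN_READS):
    """Split junctions into (well_supported, low_support) lists by read count."""
    well_supported, low_support = [], []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and BED track header lines
        chrom, start, end, name, score = line.split("\t")[:5]
        record = (name, int(start), int(end), int(score))
        if int(score) >= min_reads:
            well_supported.append(record)
        else:
            low_support.append(record)
    return well_supported, low_support


if __name__ == "__main__":
    well, low = split_by_support(SAMPLE_JUNCTIONS)
    print("well supported:", [r[0] for r in well])
    print("low support:", [r[0] for r in low])
```

A junction with low read support is not necessarily wrong; the filter only flags junctions that need corroborating evidence before they are used to propose a new isoform.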
Extra CDS predicted by the SNAP gene predictor:
The SNAP gene finder predicted a CDS at 26,502-26,584 (blue arrow in the screenshot above)
between the first and second CDSs of CG31997. The RNA-Seq Alignment Summary track
shows that the region surrounding this prediction has low (<20 reads) RNA-Seq read coverage
and that the region is adjacent to a hAT DNA transposon fragment (see screenshot below).
An NCBI BLASTX search of the genomic region surrounding the SNAP CDS prediction
(contig10:26400-26700) against the nr database did not detect any significant (E-value <
1e-5) sequence similarity to known proteins in the nr database (see screenshot below).
An NCBI BLASTN search of this region against the nt database detected five significant
matches to predicted mRNAs in Drosophila suzukii (see screenshot below).
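
As a quick consistency check on the coordinates quoted in this section, the sketch below computes the length of the SNAP prediction and confirms that it lies inside both the approximate gene span and the window used for the BLAST searches. It assumes the reported coordinates are 1-based and inclusive; that convention is an assumption and is not stated in the report. The Interval helper is a hypothetical name introduced for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed interval; 1-based inclusive coordinates are assumed here."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Interval") -> bool:
        return self.start <= other.start and other.end <= self.end


# Coordinates as quoted in the report.
GENE_SPAN = Interval(25673, 27471)     # approximate location of CG31997
SNAP_CDS = Interval(26502, 26584)      # extra CDS predicted by SNAP
BLAST_WINDOW = Interval(26400, 26700)  # region used for BLASTX and BLASTN

if __name__ == "__main__":
    print("SNAP CDS length (bp):", SNAP_CDS.length)
    print("BLAST window length (bp):", BLAST_WINDOW.length)
    print("SNAP CDS inside gene span:", GENE_SPAN.contains(SNAP_CDS))
    print("SNAP CDS inside BLAST window:", BLAST_WINDOW.contains(SNAP_CDS))
```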
The E-values for these D. suzukii matches range from 2e-10 to 1e-06 and correspond to
three different predicted genes (LOC108013970, LOC108011950, and LOC108014610). All
of these matches are RefSeq predictions that have not been experimentally confirmed.
There are no significant matches to RefSeq records that have been experimentally
confirmed and no significant matches to sequences in species other than D. suzukii.
Collectively, while we cannot reject the possibility that this region of contig10 contains
an untranslated region of a nearby gene, there is insufficient evidence to postulate a novel
isoform of CG31997 compared to D. melanogaster. Given the proximity of this feature to the
hAT DNA transposon and the multiple matches to predicted transcripts in D. suzukii, an
alternative explanation is that the feature is part of a transposon in D. biarmipes and D.
suzukii. Hence we have omitted this predicted CDS in our annotation of the CG31997
ortholog in D. biarmipes.
3. Alignment between the submitted model and the D. melanogaster ortholog
Show an alignment between the protein sequence for your gene model and the protein
sequence from the putative D. melanogaster ortholog. You can either use the protein
alignment generated by the Gene Model Checker (available through the “View protein
alignment” link under the “Dot Plot” tab) or you can generate a new alignment using the
“Align two or more sequences” feature (bl2seq) at the NCBI BLAST website. Paste a
screenshot of the protein alignment below:
4. Dot plot between the submitted model and the D. melanogaster ortholog
Paste a screenshot of the dot plot of your submitted model against the putative D.
melanogaster ortholog (generated by the Gene Model Checker). Provide an explanation
for any anomalies on the dot plot (e.g., large gaps, regions with no sequence similarity).
Note: Large vertical and horizontal gaps near exon boundaries in the dot plot often
indicate that an incorrect splice site might have been picked. Please re-examine these
regions and provide a detailed justification as to why you have selected this particular set of
donor and acceptor sites.
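
To illustrate why extra or missing residues show up as gaps in a dot plot, here is a minimal sketch of a word-match dot plot between two peptide sequences. It is not the method used by the Gene Model Checker; it is a simplified exact-match version, and both sequences are invented for illustration. A shift in the diagonal offset (i - j) between matching words marks the point where one sequence contains residues that the other lacks.

```python
from collections import Counter


def dot_plot(seq_x, seq_y, word=3):
    """Return the set of (i, j) positions where words of length `word` match exactly."""
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    dots = set()
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            dots.add((i, j))
    return dots


def diagonal_offsets(dots):
    """Count dots per diagonal; the offset i - j changes where an insertion occurs."""
    return Counter(i - j for i, j in dots)


# Hypothetical peptides: SUBMITTED carries eight extra residues relative to REFERENCE.
REFERENCE = "MKTLLVAGSPQRWEDNHCFYIK"
SUBMITTED = "MKTLLVAG" + "DGEHKNQA" + "SPQRWEDNHCFYIK"

if __name__ == "__main__":
    dots = dot_plot(SUBMITTED, REFERENCE)
    for offset, count in sorted(diagonal_offsets(dots).items()):
        print(f"diagonal offset {offset}: {count} matching words")
```

In this toy example the matches fall on two diagonals whose offsets differ by eight, which corresponds to the eight inserted residues. Real dot plots usually score similarity rather than exact identity, so weakly conserved regions appear as faint or broken diagonals instead of clean gaps.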
The dot plot shows that the last two CDSs of CG31997-PB are highly conserved between
the proposed D. biarmipes gene model and the D. melanogaster ortholog. Examination of
the protein alignment at the end of the second and third CDSs indicates that the amino acids
have similar chemical properties even though they are not identical. In addition, the
lengths of these two CDSs are the same between D. biarmipes and D. melanogaster.
The dot plot shows that the beginning of the first CDS of CG31997-PB is only weakly
conserved between D. biarmipes and D. melanogaster. In addition, the dot plot shows that
the first CDS of the D. biarmipes gene model is longer than the orthologous CDS in D.
melanogaster. The protein alignment shows that there are 8 additional amino acids within
the first CDS in the proposed D. biarmipes gene model compared to D. melanogaster.
Examination of this region in the GEP UCSC Genome Browser shows that there is only one
methionine in frame +2 that could serve as the start codon for CG31997-PB (see screenshot
below). The expansion of this CDS is consistent with the BLASTX alignment, the N-SCAN
gene prediction, and the available RNA-Seq data. Consequently, our annotation has
expanded the size of this CDS (1_10820_0) in order to retain this isoform in D. biarmipes.
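
The search for a usable start codon in a given reading frame can be sketched as follows. The function walks the codons of one forward frame and keeps only the ATG codons that occur after the last in-frame stop codon, since an ATG upstream of a stop cannot begin the open reading frame that continues into the CDS. This is a minimal sketch under two assumptions: the frame number is defined relative to the first base of the supplied sequence (the Genome Browser assigns frames from genomic coordinates), and the input ends inside the conserved part of the CDS. The example sequence is made up and is not taken from contig10.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_in_frame(seq, frame):
    """Yield (1-based start position, codon) for each complete codon in frame 1, 2, or 3."""
    for offset in range(frame - 1, len(seq) - 2, 3):
        yield offset + 1, seq[offset:offset + 3].upper()


def candidate_start_codons(seq, frame):
    """Return 1-based positions of ATG codons located after the last in-frame stop codon."""
    candidates = []
    for position, codon in codons_in_frame(seq, frame):
        if codon in STOP_CODONS:
            candidates = []  # an upstream stop rules out every earlier ATG
        elif codon == "ATG":
            candidates.append(position)
    return candidates


# Hypothetical sequence: frame 2 reads TAA GCA ATG AAG CCT GGA CTT.
EXAMPLE_SEQUENCE = "CTAAGCAATGAAGCCTGGACTT"

if __name__ == "__main__":
    for frame in (1, 2, 3):
        print(f"frame +{frame}:", candidate_start_codons(EXAMPLE_SEQUENCE, frame))
```

When the function returns a single position for the frame of interest, as described for CG31997-PB above, the choice of start codon is unambiguous; multiple candidates would have to be weighed against the alignment and expression evidence.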