Last Update: 12/23/2016

GEP Annotation Report

Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission.

Student name: Wilson Leung
Student email: [email protected]
Faculty advisor: Sarah C. R. Elgin
College/university: Washington University in St. Louis

Project details

Project name: contig10
Project species: D. biarmipes
Date of submission: 08/04/2014
Size of project in base pairs: 43,013
Number of genes in project: 3
Does this report cover all of the genes or is it a partial report? Partial report
If this is a partial report, please indicate the region of the project covered by this report: From base 25,000 to base 28,000

Instructions for project with no genes

If you believe that the project does not contain any genes, please provide the following evidence to support your conclusion:

1. Perform a BLASTX search of the entire contig sequence against the non-redundant (nr) protein database. Provide an explanation for any significant (E-value < 1e-5) hits to known genes in the nr database as to why they do not correspond to real genes in the project.
2. For each Genscan prediction, perform a BLASTP search using the predicted amino acid sequence against the nr protein database using the strategy described above.
3. Examine the gene expression tracks (e.g., RNA-Seq) for evidence of transcribed regions that do not correspond to alignments to known D. melanogaster proteins. Perform a BLASTX search against the nr database using these genomic regions to determine if they show sequence similarity to known or predicted proteins in the nr database.

Complete the following Gene Report Form for each gene in your project. Copy and paste the sections below to create as many copies as needed within this report. Be sure to create enough Isoform Report Forms within your Gene Report Form for all isoforms.
Gene report form

Gene name (e.g., D. biarmipes eyeless): D. biarmipes CG31997
Gene symbol (e.g., dbia_ey): dbia_CG31997
Approximate location in project (from 5’ end to 3’ end): 25673-27471
Number of isoforms in D. melanogaster: 2
Number of isoforms in this project: 2

Complete the following table for all the isoforms in this project:

Name(s) of unique isoform(s)      List of isoforms with
based on coding sequence          identical coding sequences
CG31997-PB                        CG31997-PA

Note: For isoforms with identical coding sequence, you only need to complete the Isoform Report Form for one of these isoforms (i.e. using the name of the isoform listed in the left column of the table above). However, you should generate GFF, transcript, and peptide sequence files for ALL isoforms, irrespective of whether they have identical coding sequences as other isoforms.

Consensus sequence errors report form

Complete this section if you have identified errors in the project consensus sequence. All of the coordinates reported in this section should be relative to the coordinates of the original project sequence.

Location(s) in the project sequence with consensus errors: NA

Isoform report form

Complete this report form for each unique isoform listed in the table above (copy and paste to create as many copies of this Isoform Report Form as needed):

Gene-isoform name (e.g., dbia_ey-PA): dbia_CG31997-PB
Names of the isoforms with identical coding sequences as this isoform: dbia_CG31997-PA
Is the 5’ end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 5’ end:
Is the 3’ end of this isoform missing from the end of the project? No
If so, how many exons are missing from the 3’ end:

1. Gene Model Checker checklist

Enter the coordinates of your final gene model for this isoform into the Gene Model Checker and paste a screenshot of the checklist results below:

Note: For projects with consensus sequence errors, report the exon coordinates relative to the original project sequence. Include the VCF file you have generated above when you submit the gene model to the Gene Model Checker. The Gene Model Checker will use this VCF file to automatically revise the submitted exon coordinates.

2. View the gene model on the Genome Browser

Use the custom track feature from the Gene Model Checker to capture a screenshot of your gene model shown on the Genome Browser for your project. Zoom in so that only this isoform is in the screenshot. (See page 12 of the Gene Model Checker user guide on how to do this; you can find the guide under “Help” → “Documentations” → “Web Framework” on the GEP website at http://gep.wustl.edu.)

Include the following evidence tracks in the screenshot if they are available.

1. A sequence alignment track (D. mel Proteins or Other RefSeq)
2. At least one gene prediction track (e.g., Genscan)
3. At least one RNA-Seq track (e.g., RNA-Seq Alignment Summary)
4. A comparative genomics track (e.g., Conservation, D. mel. Net Alignment)

Paste a screenshot of your gene model as shown on the Genome Browser below:

Low-frequency RNA-Seq exon junctions not annotated:

The evidence from the RNA-Seq TopHat tracks and Multiz alignments suggests that there might be additional isoforms because of alternative splicing at the 5' end of this gene (red arrows in the screenshot above). However, most of the TopHat junctions are supported by fewer than 10 reads. Hence there is insufficient evidence to postulate the presence of multiple novel isoforms in D. biarmipes compared to D. melanogaster.

Extra CDS predicted by the SNAP gene predictor:

The SNAP gene finder predicted a CDS at 26,502-26,584 (blue arrow in screenshot above) between the first and second CDSs of CG31997. The RNA-Seq Alignment Summary track shows that the region surrounding this prediction has low (<20 reads) RNA-Seq read coverage and that the region is adjacent to a hAT DNA transposon fragment (see screenshot below).

An NCBI BLASTX search of the genomic region surrounding the SNAP CDS prediction (contig10:26400-26700) against the nr database did not detect any significant (E-value < 1e-5) sequence similarity to known proteins in the nr database (see screenshot below). An NCBI BLASTN search of this region against the nt database detected five significant matches to predicted mRNAs in Drosophila suzukii (see screenshot below).
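The significance cutoff used for the BLAST searches above (E-value < 1e-5) can also be applied programmatically. The following is a minimal sketch, assuming the search was run with the standalone BLAST+ programs and saved in the default 12-column tabular format (`-outfmt 6`); the function name `significant_hits` and the example rows are hypothetical and are not results from this project.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # significance threshold used in this report

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, cutoff=EVALUE_CUTOFF):
    """Return the rows whose E-value is strictly below the cutoff."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    return [row for row in reader if float(row["evalue"]) < cutoff]


# Invented rows for illustration only; not results from this project.
EXAMPLE = (
    "query1\thypothetical_hit_A\t90.0\t80\t8\t0\t1\t80\t1\t80\t2e-10\t75.0\n"
    "query1\thypothetical_hit_B\t70.0\t60\t18\t0\t5\t64\t3\t62\t0.003\t30.0\n"
)

if __name__ == "__main__":
    for hit in significant_hits(io.StringIO(EXAMPLE)):
        print(hit["sseqid"], hit["evalue"])
```

Results copied from the NCBI web interface would first need to be downloaded in a tabular format with the same column order for this sketch to apply.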
The E-values for these D. suzukii matches range from 2e-10 to 1e-06 and correspond to three different predicted genes (LOC108013970, LOC108011950, and LOC108014610). All of these matches are RefSeq predictions that have not been experimentally confirmed. There are no significant matches to RefSeq records that have been experimentally confirmed and no significant matches to sequences in species other than D. suzukii.

Collectively, while we cannot reject the possibility that this region of contig10 contains an untranslated region of a nearby gene, there is insufficient evidence to postulate a novel isoform of CG31997 compared to D. melanogaster. Given the proximity of this feature to the hAT DNA transposon and the multiple matches to predicted transcripts in D. suzukii, an alternative explanation is that the feature is part of a transposon in D. biarmipes and D. suzukii. Hence we have omitted this predicted CDS in our annotation of the CG31997 ortholog in D. biarmipes.

3. Alignment between the submitted model and the D. melanogaster ortholog

Show an alignment between the protein sequence for your gene model and the protein sequence from the putative D. melanogaster ortholog. You can either use the protein alignment generated by the Gene Model Checker (available through the “View protein alignment” link under the “Dot Plot” tab) or you can generate a new alignment using the “Align two or more sequences” feature (bl2seq) at the NCBI BLAST website. Paste a screenshot of the protein alignment below:

4. Dot plot between the submitted model and the D. melanogaster ortholog

Paste a screenshot of the dot plot of your submitted model against the putative D. melanogaster ortholog (generated by the Gene Model Checker). Provide an explanation for any anomalies on the dot plot (e.g., large gaps, regions with no sequence similarity).

Note: Large vertical and horizontal gaps near exon boundaries in the dot plot often indicate that an incorrect splice site might have been picked. Please re-examine these regions and provide a detailed justification as to why you have selected this particular set of donor and acceptor sites.
The dot plot shows that the last two CDSs of CG31997-PB are highly conserved between the proposed D. biarmipes gene model and the D. melanogaster ortholog. Examination of the protein alignment at the end of the second and third CDSs indicates that the amino acids have similar chemical properties even though they are not identical. In addition, the lengths of these two CDSs are the same between D. biarmipes and D. melanogaster.

The dot plot shows that the beginning of the first CDS of CG31997-PB is only weakly conserved between D. biarmipes and D. melanogaster. In addition, the dot plot shows that the first CDS of the D. biarmipes gene model is longer than the orthologous CDS in D. melanogaster. The protein alignment shows that there are 8 additional amino acids within the first CDS in the proposed D. biarmipes gene model compared to D. melanogaster. Examination of this region in the GEP UCSC Genome Browser shows that there is only one methionine in frame +2 that could serve as the start codon for CG31997-PB (see screenshot below). The expansion of this CDS is consistent with the BLASTX alignment, the N-SCAN gene prediction, and the available RNA-Seq data. Consequently, our annotation has expanded the size of this CDS (1_10820_0) in order to retain this isoform in D. biarmipes.
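The search for candidate start codons described above was done by inspecting the Genome Browser. As an illustration only, the same check can be sketched in code, assuming a plus-strand sequence and a known in-frame codon inside the CDS; the function name `upstream_start_codons` and the toy sequence are hypothetical and are not taken from contig10.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_start_codons(seq, anchor):
    """List 0-based positions of in-frame ATG codons upstream of `anchor`.

    `anchor` is the 0-based index of the first base of a codon known to lie
    inside the coding sequence (plus strand assumed). The scan walks upstream
    one codon at a time and stops at the first in-frame stop codon or at the
    start of `seq`.
    """
    seq = seq.upper()
    starts = []
    pos = anchor
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the open reading frame cannot extend past a stop codon
        if codon == "ATG":
            starts.append(pos)
        pos -= 3
    return sorted(starts)


if __name__ == "__main__":
    # Toy sequence (invented): frame +2 reads TAA ATG GCC AAA GGG.
    toy = "CTAAATGGCCAAAGGG"
    print(upstream_start_codons(toy, anchor=13))
```

When the function returns a single position, there is only one methionine available in that open reading frame, which is the situation reported for the first CDS of CG31997-PB.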