Final Project
Global Alignment with Affine Gap Penalties
Jocelyn Hansson

The Project:
The aim of this project was to write and implement code to find the best possible alignment of two strings of nucleotides (DNA or RNA) or amino acids (proteins). Global alignments are useful for comparing different DNA or protein sequences; however, the way alignments are scored and penalized is important. Using a single constant penalty for all insertions/deletions may not produce the best possible alignment due to the nature of mutations. Mutations tend to rearrange large chunks of DNA in a single event. Therefore, in order to create alignments that abide by this property, initializing gaps should be penalized more than extending gaps. This change improves upon the constant gap alignment in that it scores strings with consecutive matches higher than multiple small, dispersed alignments.

I decided to use this global alignment function to compare the OXTR gene between species. I decided to do this because I read a lot of neuroscience papers that utilize mouse models of Autism Spectrum Disorder (ASD) in order to study the disorder and test possible treatments. The OXTR gene is thought to be relevant to ASD, as an abnormal form is found in many individuals with ASD. Additionally, this gene encodes the receptor protein for oxytocin, a hormone involved in many social processes such as bonding and maternal behavior. I thought comparing this gene across species would be useful in order to justify using such a model.

In my comparisons I aimed to find out how similar the mouse and human OXTR genes are. Is the sequence largely conserved between the species? What are the types of differences (e.g., maybe only a few insertions/deletions)? What is their score, and how does this score compare to the scores of humans/mice and other animals?
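To make the difference between constant and affine gap penalties concrete, the sketch below compares the cost of one long gap with the cost of the same number of gap positions scattered as single-letter gaps. The function names and penalty values are hypothetical and chosen only for illustration; the report does not state the penalties that were actually used.

```python
def constant_gap_cost(run_lengths, gap_penalty):
    """Constant scheme: every gap position costs the same amount."""
    return gap_penalty * sum(run_lengths)


def affine_gap_cost(run_lengths, gap_open, gap_extend):
    """Affine scheme: the first position of each gap run costs gap_open,
    and every further position in that run costs gap_extend."""
    return sum(gap_open + gap_extend * (k - 1) for k in run_lengths if k > 0)


one_long_gap = [6]          # a single gap run of length 6
dispersed_gaps = [1] * 6    # six separate gaps of length 1

# Hypothetical penalties: constant = 2, affine open = 5, affine extend = 1.
print(constant_gap_cost(one_long_gap, 2), constant_gap_cost(dispersed_gaps, 2))
# prints: 12 12
print(affine_gap_cost(one_long_gap, 5, 1), affine_gap_cost(dispersed_gaps, 5, 1))
# prints: 10 30
```

Under the constant scheme the two layouts cost the same, so the alignment has no reason to prefer one long gap. Under the affine scheme the single long gap is much cheaper, which matches the idea that one mutation event can insert or delete a large chunk of DNA.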
The Data:
The data used in this project were DNA and mRNA sequences taken from the National Center for Biotechnology Information (NCBI) online database. Here, I found the DNA sequence of this gene in humans, mice (Mus musculus), and rhesus monkeys (Macaca mulatta). I additionally found the mRNA sequence for this gene in humans, mice, rhesus monkeys, rats (Rattus norvegicus), and the boxer breed of dog (Canis lupus familiaris). The DNA and mRNA sequences did not need to be shortened or altered in any way.

Steps of Program:
First, I wrote a function to read FASTA files and output the sequence as a string.

Function to calculate scores of possible alignments
• This program required 3 tables containing the scores of possible alignments, and 3 backtrack tables to keep track of moves within each table. Each node of each table refers to index i of string 1 and index j of string 2. The value at each node [i][j] is set to the highest possible score. The corresponding backtrack table records which “move” the highest value corresponded to.
• Best possible score at indices:
  o In the upper and lower tables, scores corresponded to either initializing a gap or extending a gap
    § Ex: lower table
      • Initializing a gap: middle[i-1][j] - gap initialization penalty
      • Extending a gap: lower[i-1][j] - gap extension penalty
  o For the middle table there are 3 options at each node:
    § Match/mismatch: middle[i-1][j-1] + score
      • The score was calculated using a function from ‘scoring functions’ that used the BLOSUM62 scoring matrix
    § End a gap from either the lower or upper table
      • Ex: ending a gap on lower: middle[i][j] = lower[i][j]
      • No penalties, because ending a gap was not penalized
• As the function fills in the scores of various alignments, each backtrack table keeps track of which possibility gave the highest score, and which “move” it corresponded to
  o Again, each of the 3 tables had its own backtrack table
  o The letters used were ‘L’, ‘M’, and ‘U’
    § These referred to whichever table the score just came from.
      • Ex: initializing a gap would be marked ‘M’
      • Ex: extending a gap via the upper table would be marked ‘U’, since it was moving from the upper table to the upper table
• Maximum global alignment score:
  o Since there was no penalty for ending a gap, the maximum possible score can be found in the lower right corner of the middle table
    § Think of this as a “free ride” to the middle table

Next, a function to work out the best alignment of the strings was written.
Inputs: lower, upper, and middle backtrack tables, string 1 and string 2
• First, 2 empty strings, which would represent the alignment strings, were created
• This function worked backwards, and therefore started at the lower right corner of the middle table. In order to iterate in this direction, the values of i and j were set to the lengths of their corresponding strings
• A string called Table was created. This value keeps track of which table the function is in. As the alignment moves between tables, this string is altered. This value is initially set to ‘M’, as it starts in the middle table
• To iterate backwards, use a while statement: while backtrack does not equal the starting value
  o Use if statements to handle each possible “move” in each node of the 3 backtrack tables
  o For upper and lower tables:
    § If table value matches table:
      • Add the letter at the index of string 1/2 to the corresponding alignment string and decrement i/j
        o Upper table: j; lower table: i
        o Add a dash to the other alignment string
      • Look at the backtrack value at that index; if it corresponds to a different table, change the value of Table to the letter that corresponds to that table
      • If the backtrack value has the letter that corresponds to the current table, do not change it
  o Middle table:
    § If the backtrack value corresponds to another table, change the Table value to that table
    § If the backtrack value is M, then add the corresponding letters from strings 1 & 2 to their corresponding alignment strings
      • Decrement both i and j
      • Do not change the value of Table
Reverse both alignment strings, and then done!
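The two steps above (filling the three score tables, then walking the backtrack tables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `simple_score` is a hypothetical stand-in for the BLOSUM62 lookup, the function and variable names are invented here, and the penalty values in the usage lines are arbitrary.

```python
NEG_INF = float("-inf")


def simple_score(a, b):
    """Hypothetical stand-in for the BLOSUM62 lookup used in the report."""
    return 1 if a == b else -1


def affine_global_alignment(s1, s2, gap_open, gap_extend, score=simple_score):
    n, m = len(s1), len(s2)

    def new_table(fill):
        return [[fill] * (m + 1) for _ in range(n + 1)]

    # lower: gap in string 2 (moves along i); upper: gap in string 1 (moves along j).
    lower, upper, middle = new_table(NEG_INF), new_table(NEG_INF), new_table(NEG_INF)
    back_lower, back_upper, back_middle = new_table(None), new_table(None), new_table(None)
    middle[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # initialize a gap from middle ('M') or extend within lower ('L')
                opened = middle[i - 1][j] - gap_open
                extended = lower[i - 1][j] - gap_extend
                if opened >= extended:
                    lower[i][j], back_lower[i][j] = opened, "M"
                else:
                    lower[i][j], back_lower[i][j] = extended, "L"
            if j > 0:  # same idea for the upper table, moving along j
                opened = middle[i][j - 1] - gap_open
                extended = upper[i][j - 1] - gap_extend
                if opened >= extended:
                    upper[i][j], back_upper[i][j] = opened, "M"
                else:
                    upper[i][j], back_upper[i][j] = extended, "U"
            if i == 0 and j == 0:
                continue
            # Middle: end a gap for free ('L' or 'U') or take a match/mismatch ('M').
            options = [(lower[i][j], "L"), (upper[i][j], "U")]
            if i > 0 and j > 0:
                diagonal = middle[i - 1][j - 1] + score(s1[i - 1], s2[j - 1])
                options.append((diagonal, "M"))
            middle[i][j], back_middle[i][j] = max(options, key=lambda o: o[0])

    # Backtrack from the lower right corner of the middle table.
    a1, a2 = [], []
    i, j, table = n, m, "M"
    while i > 0 or j > 0:
        if table == "M":
            move = back_middle[i][j]
            if move == "M":
                a1.append(s1[i - 1]); a2.append(s2[j - 1])
                i -= 1; j -= 1
            else:
                table = move  # "free ride" into the lower or upper table
        elif table == "L":
            move = back_lower[i][j]
            a1.append(s1[i - 1]); a2.append("-")
            i -= 1
            table = move
        else:
            move = back_upper[i][j]
            a1.append("-"); a2.append(s2[j - 1])
            j -= 1
            table = move
    return middle[n][m], "".join(reversed(a1)), "".join(reversed(a2))


# Hypothetical penalties: open = 3, extend = 1.
print(affine_global_alignment("AAATTTCCC", "AAACCC", 3, 1))
# prints: (1, 'AAATTTCCC', 'AAA---CCC')
```

In this small example the three unmatched letters are grouped into one gap (cost 3 + 1 + 1 = 5 against 6 matches), which is the behaviour the affine penalty is meant to encourage. The loop condition here stops when both indices reach 0, a slightly different stopping test from the "starting value" check described above.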
Results:
Validation of constant vs. affine gap penalties:
My first goal was to compare the alignments of DNA sequences to their corresponding mRNA sequences with constant versus affine gap penalties. As expected, using affine gap penalties resulted in alignments that showed a large number of consecutive matches/mismatches, and consecutive insertions rather than dispersed ones. These gaps show possible introns. The rhesus monkey sequences provide a good example of this property. In both types of global alignment, the DNA alignment string consists of 1 massive string with no insertions. The mRNA alignment string with affine gap penalties is one large block of continuous text (matches/mismatches) followed by one extremely long sequence of insertions, and then a small block of text at the very end. The alignment with constant penalties has many single-letter matches, and ends with a long string of insertions.

[Figure 1 panels: A. Global alignment with constant penalties. B. Global alignment with affine gap penalties.]
Figure 1. A. shows a small section of the rhesus monkey mRNA string that came out of global alignment with constant gap penalties of mRNA and DNA. B. shows a shortened section of the mRNA alignment string from global alignment with affine gap penalties. The length of insertions has been shortened here, but the letters seen represent all the matches/mismatches seen in this alignment. The full alignments can be seen in ResultsDNA-mRNA-comparison-rhesus.docx.

Comparisons of OXTR gene:

Organism         mRNA sequence length
Rat              5,335
Mouse            4,568
Human            4,361
Dog (boxer)      1,275
Rhesus monkey    1,253
Table 1. Length of OXTR mRNA sequences of various organisms, in order of decreasing length.

Comparison       Affine gap score
Mouse-Rat        2,451
Mouse-Human      1,113
Dog-Monkey       794
Human-Rat        418
Human-Monkey     -1,919
Table 2. Maximum global alignment scores with affine gap penalties of OXTR mRNA sequences from different organisms. Comparisons are ordered by decreasing score.
As expected, the OXTR gene was very similar between humans and mice, and between mice and rats. It was unexpected, however, to see a large difference between rat and human mRNA, especially given the fact that both were similar to the mouse gene. A summary of the differences between the alignment scores of these species can be seen in Figure 2. This figure also shows the difference in length of the mRNA sequences for each comparison. Here, it becomes evident that the low alignment score between human and rat mRNA sequences can be explained in part by the different lengths of these sequences. The length comparison also highlights the weight of the similarities between the rat and mouse mRNAs. Despite the fact that this comparison has a much larger length difference than the human-mouse comparison, it has a much higher score.

[Figure 2: bar chart of affine gap alignment score, constant penalty score, and length difference for the human-mouse, human-rat, and rat-mouse comparisons.]
Figure 2. Global alignment scores between human, mouse, and rat mRNA sequences of the OXTR sequence. Both affine gap penalties and constant penalties are shown. The differences in length of the mRNAs for each comparison are also shown.

Overall, these data demonstrate that the OXTR mRNA sequence is somewhat conserved between rodents and humans. This sequence is highly conserved, however, between rats and mice. Additional comparisons demonstrate that this sequence is not highly conserved within all mammals. The dog and rhesus monkey mRNAs had much lower alignment scores with all other sequences. The primary reason for their lower alignment scores is the length of their mRNA sequences; dog and rhesus monkey OXTR mRNAs were much shorter than the same sequence in rats, mice, and humans. This property can be seen in the extremely low alignment score of human and rhesus monkey mRNAs. In order to better compare these sequences, perhaps a local alignment function should be used.