Final Project
Global Alignment with Affine Gap Penalties
Jocelyn Hansson
The Project:
The aim of this project was to write and implement code to find the best
possible alignment of 2 strings of nucleotides (DNA or RNA) or amino acids
(proteins). Global alignments are useful for comparing different DNA or
protein sequences; however, the way you score/penalize alignments is important.
Using a single constant penalty for all insertions/deletions may not produce the best
possible alignment due to the nature of mutations. Mutations tend to rearrange
large chunks of DNA in a single event. Therefore, in order to create alignments that
abide by this property, initializing gaps should be penalized more than extending
gaps. This change improves upon the constant gap alignment in that it scores
strings with consecutive matches higher than multiple small, dispersed alignments.
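To make the difference concrete, here is a minimal sketch of how the two penalty schemes price gaps. The function names and the penalty values (5 for the constant scheme, 11 to initialize and 1 to extend for the affine scheme) are assumptions chosen for illustration; the report does not state the values it used. The affine cost follows the convention used later in the report, where the first gap position pays the initialization penalty and each further position pays the extension penalty.

```python
def constant_gap_cost(length, gap_penalty):
    """Cost of one gap run when every gap position pays the same penalty."""
    return length * gap_penalty


def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap run: first position pays gap_open, the rest gap_extend."""
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# Illustrative values only (assumed, not taken from the report).
GAP_PENALTY, GAP_OPEN, GAP_EXTEND = 5, 11, 1

# Ten gap positions, either as one run of ten or as ten separate runs of one.
one_long_constant = constant_gap_cost(10, GAP_PENALTY)
ten_short_constant = 10 * constant_gap_cost(1, GAP_PENALTY)
one_long_affine = affine_gap_cost(10, GAP_OPEN, GAP_EXTEND)
ten_short_affine = 10 * affine_gap_cost(1, GAP_OPEN, GAP_EXTEND)

print("constant:", one_long_constant, "vs", ten_short_constant)
print("affine:  ", one_long_affine, "vs", ten_short_affine)
```

Under the constant scheme the two layouts cost the same, so the aligner has no reason to prefer one long gap. Under the affine scheme the single long gap is much cheaper than ten dispersed ones.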
I decided to use this global alignment function to compare the OXTR gene
between species. I decided to do this because I read a lot of neuroscience papers
that utilize mouse models of Autism Spectrum Disorder (ASD) in order to study the
disorder and test possible treatments. The OXTR gene is thought to be relevant to
ASD, as an abnormal form is found in many individuals with ASD. Additionally, this
gene encodes the protein receptor for oxytocin, a hormone involved in many
social processes such as bonding and maternal behavior. I thought comparing this
gene across species would be useful in order to justify using such a model.
In my comparisons I aimed to find out how similar the mouse and human
OXTR genes are. Is the sequence largely conserved between the species? What are
the types of differences (i.e., maybe only a few insertions/deletions)? What is
their score, and how does this score compare to the scores of humans/mice and
other animals?
The Data:
The data used in this project were sequences of DNA and mRNA taken from
the National Center for Biotechnology Information (NCBI) online database. Here, I
found the DNA sequence of this gene in humans, mice (Mus musculus), and rhesus
monkeys (Macaca mulatta). I additionally found the mRNA sequence for this gene in
humans, mice, rhesus monkeys, rats (Rattus norvegicus), and the boxer breed of
dog (Canis lupus familiaris). The DNA and mRNA sequences did not need to be
shortened or altered in any way.
Steps of Program:
First, I wrote a function to read FASTA files and output the sequence in a string.
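The report does not include the reader itself, so the following is only a sketch of what such a function might look like. The name read_fasta and the choices to return just the first record and to uppercase the sequence are assumptions.

```python
def read_fasta(path):
    """Return the first sequence in a FASTA file as a single uppercase string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if seen_header or parts:
                    break  # a second record starts here; keep only the first
                seen_header = True
                continue  # the header line is not part of the sequence
            parts.append(line.upper())
    return "".join(parts)
```

Joining the collected lines once at the end avoids repeated string concatenation, which matters for sequences that span thousands of lines.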
Function to calculate scores of possible alignments
• This program required 3 tables containing the scores of possible alignments,
  and 3 backtrack tables to keep track of moves within each table. Each node
  of each table refers to index i of string 1 and index j of string 2. The value at
  each node [i][j] is set to the highest possible score. The corresponding
  backtrack table recorded which “move” the highest value corresponded to.
• Best possible score at indices:
  o In the upper and lower tables, scores corresponded to either
    initializing a gap or extending a gap
    § Ex: lower table
      • Initializing a gap: middle[i-1][j] - gap initialization penalty
      • Extending a gap: lower[i-1][j] - gap extension penalty
  o For the middle table there are 3 options at each node:
    § Match/mismatch: middle[i-1][j-1] + score
      • The score was calculated using a function from ‘scoring
        functions’ that used the BLOSUM62 scoring matrix
    § End a gap from either the lower or upper table
      • Ex: ending a gap on lower: middle[i][j] = lower[i][j]
      • No penalties because ending a gap was not penalized
• As the function fills in the scores of various alignments, each backtrack table
  kept track of which possibility had the highest score and what “move” it
  corresponded to
  o Again, each of the 3 tables had its own backtrack table
  o The letters used were ‘L’, ‘M’, and ‘U’
    § These referred to whichever table the score just came from.
      • Ex: initializing a gap would be marked ‘M’
      • Ex: extending a gap via the upper table would be marked ‘U’
        since it was moving from the upper table to the upper
        table
• Maximum global alignment score:
  o Since there was no penalty for ending a gap, the maximum score
    possible can be found in the lower right corner of the middle table
    § Think of this as a “free ride” to the middle table
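The table rules above can be written compactly as recurrences. This is a sketch under stated assumptions. The symbols are introduced here and are not in the original: \sigma is the gap initialization penalty, \varepsilon is the gap extension penalty, and s(v_i, w_j) is the match/mismatch score for character i of string 1 and character j of string 2. The report only spells out the lower table, so the upper-table line is assumed by symmetry, and the boundary condition is the usual one rather than one stated in the report.

```latex
\begin{aligned}
\mathrm{lower}_{i,j}  &= \max\bigl\{\, \mathrm{middle}_{i-1,j} - \sigma,\;\; \mathrm{lower}_{i-1,j} - \varepsilon \,\bigr\} \\
\mathrm{upper}_{i,j}  &= \max\bigl\{\, \mathrm{middle}_{i,j-1} - \sigma,\;\; \mathrm{upper}_{i,j-1} - \varepsilon \,\bigr\} \\
\mathrm{middle}_{i,j} &= \max\bigl\{\, \mathrm{middle}_{i-1,j-1} + s(v_i, w_j),\;\; \mathrm{lower}_{i,j},\;\; \mathrm{upper}_{i,j} \,\bigr\} \\
\mathrm{middle}_{0,0} &= 0
\end{aligned}
```

With these definitions a gap of length k costs \sigma + (k-1)\varepsilon, and the maximum global alignment score is the value of middle at the lower right corner.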
Next, a function to figure out the best alignment of the strings was written.
Inputs: lower, upper, and middle backtrack tables, string 1 and string 2
• First, 2 empty strings, which would represent the alignment strings, were
  created
• This function worked backwards, and therefore started at the lower right
  corner of the middle table. In order to iterate in this direction, values of i and
  j were set to the length of their corresponding strings
• A string called Table was created. This value keeps track of which table the
  function is in. As the alignment moves between tables, this string is altered.
  This value is initially set to ‘M’ as it starts in the middle table
• To iterate backwards, use a while statement: while the backtrack value does
  not equal the starting value
  o Use if statements to make commands for each possible “move” in each
    node of the 3 backtrack tables
  o For upper and lower tables:
    § If Table value matches table:
      • Add letter at index of string 1/2 to corresponding
        alignment string and decrement i/j
        o Upper table: j, lower table: i
        o Add dash to other alignment string
      • Look at backtrack value at that index; if it corresponds
        to a different table, change value of Table to the letter that
        corresponds to that table
      • If backtrack value has the letter that corresponds to the current
        table, do not change it
  o Middle table:
    § If backtrack value corresponds to another table, change Table
      value to that table
    § If backtrack value is M, then add corresponding letters from
      strings 1 & 2 to their corresponding alignment strings
      • Decrement both i and j
      • Do not change value of Table
• Reverse both alignment strings and then done!
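The fill and backtrack steps described above can be sketched end to end as follows. This is an illustration, not the report's code. The function names, the simple match/mismatch scorer (a hypothetical stand-in for the BLOSUM62 lookup) and the default penalties of 3 and 1 are all assumptions. The loop also stops when both indices reach zero rather than testing a stored starting value.

```python
NEG_INF = float("-inf")


def match_score(a, b):
    # Hypothetical stand-in scorer; the report used BLOSUM62 instead.
    return 2 if a == b else -1


def fill_tables(s1, s2, gap_open=3, gap_extend=1):
    """Fill lower/middle/upper tables; return (best score, backtrack tables)."""
    n, m = len(s1), len(s2)

    def new_table(value):
        return [[value] * (m + 1) for _ in range(n + 1)]

    lower, middle, upper = (new_table(NEG_INF) for _ in range(3))
    back = {"L": new_table(""), "M": new_table(""), "U": new_table("")}
    middle[0][0] = 0

    def best(options):
        # Highest score wins; on ties the earliest option is kept.
        return max(options, key=lambda option: option[0])

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # lower table: consumes a letter of string 1
                lower[i][j], back["L"][i][j] = best([
                    (middle[i - 1][j] - gap_open, "M"),
                    (lower[i - 1][j] - gap_extend, "L"),
                ])
            if j > 0:  # upper table: consumes a letter of string 2
                upper[i][j], back["U"][i][j] = best([
                    (middle[i][j - 1] - gap_open, "M"),
                    (upper[i][j - 1] - gap_extend, "U"),
                ])
            if i > 0 or j > 0:  # middle table: diagonal move or free gap end
                options = [(lower[i][j], "L"), (upper[i][j], "U")]
                if i > 0 and j > 0:
                    diagonal = middle[i - 1][j - 1] + match_score(s1[i - 1], s2[j - 1])
                    options.insert(0, (diagonal, "M"))
                middle[i][j], back["M"][i][j] = best(options)
    return middle[n][m], back


def backtrack(back, s1, s2):
    """Walk the backtrack tables from the lower right corner of middle."""
    i, j = len(s1), len(s2)
    table = "M"
    aligned1, aligned2 = [], []
    while i > 0 or j > 0:
        move = back[table][i][j]
        if table == "L":  # letter from string 1 against a dash
            aligned1.append(s1[i - 1])
            aligned2.append("-")
            i -= 1
        elif table == "U":  # dash against a letter from string 2
            aligned1.append("-")
            aligned2.append(s2[j - 1])
            j -= 1
        elif move == "M":  # match/mismatch column
            aligned1.append(s1[i - 1])
            aligned2.append(s2[j - 1])
            i -= 1
            j -= 1
        table = move  # stay in, or switch to, the table the score came from
    return "".join(reversed(aligned1)), "".join(reversed(aligned2))


score, back = fill_tables("AAATTTCCC", "AAACCC")
print(score)
print(*backtrack(back, "AAATTTCCC", "AAACCC"), sep="\n")
```

Collecting the alignment columns in lists and reversing once at the end mirrors the "reverse both alignment strings" step. Plain Python lists are fine for short test strings, but tables for sequences of several thousand letters hold tens of millions of cells, so a real run would likely need a more compact representation.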
Results:
Validation of constant vs. affine gap penalties:
My first goal was to compare the alignments of DNA sequences to their
corresponding mRNA sequences with constant versus affine gap penalties. As
expected, using affine gap penalties resulted in alignments that showed a large amount of
consecutive matches/mismatches, and consecutive insertions rather than dispersed
ones. These gaps show possible introns. The rhesus monkey sequences provide a
good example of this property. In both types of global alignment, the DNA
alignment string consists of 1 massive string with no insertions. The mRNA
alignment string with affine gap penalties is one large block of continuous text
(matches/mismatches) followed by one extremely long sequence of insertions, and
then a small block of text at the very end. The alignment with constant penalties has
many single-letter matches, and ends with a long string of insertions.
A. Global alignment with constant penalties    B. Global alignment with affine gap penalties
Figure 1. A. shows a small section of the rhesus monkey mRNA string that came out of global alignment with
constant gap penalties of mRNA and DNA. B. shows a shortened section of the mRNA alignment string from
global alignment with affine gap penalties. The length of insertions has been shortened here, but the letters
seen represent all the matches/mismatches seen in this alignment. The full alignments can be seen in ResultsDNA-mRNA-comparison-rhesus.docx.
Comparisons of OXTR gene:
Organism         mRNA sequence length
Rat              5,335
Mouse            4,568
Human            4,361
Dog (boxer)      1,275
Rhesus Monkey    1,253
Table 1. Length of OXTR mRNA sequences of various organisms in order of decreasing length.
Comparison       Affine gap score
Mouse-Rat        2,451
Mouse-Human      1,113
Dog-Monkey       794
Human-Rat        418
Human-Monkey     -1,919
Table 2. Maximum global alignment scores with affine gap penalties of OXTR mRNA sequences
from different organisms. Comparisons are ordered in decreasing score.
As expected, the OXTR gene was very similar between humans and mice, and
mice and rats. It was unexpected, however, to see a large difference between rat
and human mRNA, especially given the fact that both were similar to the mouse
gene. A summary of the difference between alignment scores of these species can be
seen in Figure 2. This figure also shows the difference in length of mRNA sequence
for each comparison. Here, it becomes evident that the low alignment score
between human and rat mRNA sequences can be explained in part by the different
lengths of these sequences. The length comparison also highlights the weight of the
similarities between the rat and mouse mRNAs. Despite the fact that this
comparison has a much larger length difference than the human-mouse comparison,
it has a much higher score.
[Bar chart. Series: human-mouse, human-rat, rat-mouse. Groups: affine gap alignment scores, constant penalty score, length difference. Vertical axis: 0 to 3500.]
Figure 2. Global alignment scores between human, mouse and rat mRNA sequences of the
OXTR sequence. Both affine gap penalties and constant penalties are shown. The
difference in length of the mRNAs for each comparison is also shown.
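The length differences discussed here follow directly from Table 1, and they set a floor on the gap penalty any global alignment must pay. The sketch below works through that arithmetic. The lengths are taken from Table 1; the function names and the penalty values (11 to initialize, 1 to extend) are assumptions, since the report does not state the penalties it used.

```python
# mRNA lengths from Table 1.
MRNA_LENGTH = {
    "rat": 5335,
    "mouse": 4568,
    "human": 4361,
    "dog": 1275,
    "rhesus monkey": 1253,
}


def length_difference(a, b):
    """Number of letters by which the two mRNA sequences differ in length."""
    return abs(MRNA_LENGTH[a] - MRNA_LENGTH[b])


def min_gap_cost(a, b, gap_open, gap_extend):
    """Smallest affine gap penalty a global alignment of a and b can pay.

    The shorter sequence needs at least length_difference(a, b) gap positions,
    and the cheapest layout is one continuous run (assuming the extension
    penalty is no larger than the initialization penalty).
    """
    difference = length_difference(a, b)
    if difference == 0:
        return 0
    return gap_open + (difference - 1) * gap_extend


# Assumed penalty values, for illustration only.
GAP_OPEN, GAP_EXTEND = 11, 1

for pair in [("human", "mouse"), ("rat", "mouse"), ("human", "rat"),
             ("human", "rhesus monkey")]:
    print(pair, length_difference(*pair), min_gap_cost(*pair, GAP_OPEN, GAP_EXTEND))
```

The human and rhesus monkey mRNAs differ by 3,108 letters, so most of the human sequence must sit opposite gaps whatever the similarity of the shared region.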
Overall these data demonstrate that the OXTR mRNA sequence is somewhat
conserved between rodents and humans. This sequence is highly conserved,
however, between rats and mice. Additional comparisons demonstrate that this
sequence is not highly conserved within all mammals. The dog and rhesus monkey
mRNAs had much lower alignment scores with all other sequences. The primary
reason for their lower alignment scores is the length of their mRNA sequences; dog
and rhesus monkey OXTR mRNAs were much shorter than the same sequence in
rats, mice, and humans. This property can be seen in the extremely low alignment
score of human and rhesus monkey mRNAs. In order to better compare these
sequences, perhaps a local alignment function should be used.