Short Film
The Origin of Species: Lizards in an
Evolutionary Tree
Educator Materials
SEQUENCE ALIGNMENT INTRODUCTION USING CLUSTALX
This document can be used to introduce the basic concept of DNA sequence alignment, which is
necessary before DNA sequences can be meaningfully compared.
FORMAT OF DNA SEQUENCE INFORMATION
There are different formats for representing DNA sequences. Shown below is a partial sequence from
the dog’s cytochrome oxidase subunit I (COI) gene in FASTA format. A FASTA record starts with a “>”
character, followed by information about the sequence to the end of the first line, followed by the
DNA sequence itself.
>gi|377685879|gb|JN850779.1| Canis lupus familiaris isolate dog_3 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
TACTTTATACTTACTATTTGGAGCATGAGCCGGTATAGTAGGCACTGCCTTGAGCCTCCTCATCCGAGCC
GAACTAGGTCAGCCCGGTACTTTACTAGGTGACGATCAAATTTATAATGTCATYGTAACCGCCCATGCTT…
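The header line above packs several pieces of information into one line. As a rough sketch, assuming the pipe-delimited layout seen in this one header (identifier fields first, free-text description after the last "|"), the Python below separates the two parts. The function name split_defline is our own and is not part of any library, and other FASTA files may use a different header layout.

```python
def split_defline(defline):
    """Split a pipe-delimited FASTA header into identifier fields and a description.

    Assumes the layout of the dog COI example: everything after the last "|"
    is free text, everything before it is a list of identifier fields.
    """
    if not defline.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    body = defline[1:]
    # Star-unpacking keeps the last piece as the description and
    # collects all earlier pieces as identifier fields.
    *id_fields, description = body.split("|")
    return id_fields, description.strip()


header = (
    ">gi|377685879|gb|JN850779.1| Canis lupus familiaris isolate dog_3 "
    "cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial"
)
id_fields, description = split_defline(header)
print(id_fields)
print(description)
```

Here the accession JN850779.1 ends up as the last identifier field, and the species name and gene description end up in the free-text part.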
A file containing FASTA format sequence information may contain multiple sequences one after
another. For example:
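One way to illustrate a multi-sequence file is with the three short test sequences used in the ClustalX tutorial later in this document. The Python sketch below splits such text into (name, sequence) records. The function name parse_fasta is our own, not a library call, and the parser is deliberately minimal: it assumes well-formed input and does no validation of the bases.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ">" line closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The three test sequences from the tutorial section of this document.
TEST_FASTA = """\
>test1
AAGGAAGGAAGGAAGGAAGGAAGG
>test2
AAGGAAGGAATGGAAGGAAGGAAGG
>test3
AAGGAACGGAATGGTAGGAAGGAAGG
"""

records = parse_fasta(TEST_FASTA)
for name, seq in records:
    print(name, len(seq))
```

Each ">" line starts a new record, so the three sequences come back as three separate entries of 24, 25, and 26 bases.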
WHAT SEQUENCES DO WE CHOOSE TO COMPARE?
In modern taxonomic practice, scientists routinely analyze the DNA from specimens they collect to
obtain a “DNA barcode,” a short DNA sequence unique to a particular species, which is used to identify
the species it belongs to. For animals and many other eukaryotes, different genes have been used for
this purpose. One example is the mitochondrial cytochrome oxidase subunit I (COI) gene, which encodes
part of an enzyme that is important for cellular respiration, and the mitochondrial NADH dehydrogenase
subunit 2 (ND2) gene is another. Sequences like these are available from a wide range of species,
making it possible to use these gene sequences to explore phylogenetic relationships.
COI and ND2 are good choices for DNA barcoding because, in general, there is little variation in the
sequences of organisms within the same species, while there is significant variation in the sequences of
organisms from different species. Therefore, they provide a unique sequence signature for a particular
species, and are suitable for comparing phylogenetic relationships between species.
Because these sequences are so similar within the same species, these genes are not a good choice for
studying variations within the same species, or even among species that have very recently speciated.
www.BioInteractive.org
Published March 2014
Updated April 2015
COI sequences also have a low mutation rate among many species of plants and cannot be used for DNA
barcoding or phylogenetic comparisons of those species.
The sequence includes the NADH dehydrogenase subunit 2 gene along with adjacent sequences that
include some transfer RNA genes. The ND2 gene is one of several genes that are often used for genetic
fingerprinting in animals. It is suitable for this purpose because it is conserved enough that the gene is
shared among a diverse group of animals, yet different enough between different animals to examine
evolutionary relationships by comparing DNA sequences.
WHICH PROGRAM TO USE
To teach the basics of sequence alignment, we recommend ClustalX if you can install software on your
computer. To generate a phylogeny, using www.phylogeny.fr is simpler.
• ClustalX has a graphical interface that is intuitive, and it is an excellent tool for illustrating the
concept and the process of sequence alignment. ClustalX is a freely available installed program,
which has an advantage (no reliance on the internet) and a disadvantage (requires program
installation) in the classroom setting. Its algorithm is also a little dated, and there are other
programs that do a better job of generating phylogenies; however, it is sufficient as a
demonstration of how to generate phylogenetic trees from DNA sequence alignments. The
phylogeny generated requires another freely available program, NJplot, to print or view.
• www.Phylogeny.fr is a web-based tool for generating phylogenies. Using the default settings,
phylogeny.fr is simple to use, and it uses a different alignment generator called MUSCLE. The
website generates a phylogeny that can be saved as different graphic files. However, the display
of the alignment is not as intuitive as in ClustalX.
ALIGNMENT TUTORIAL AND TREE GENERATION VIA CLUSTALX
Software and Files
Install ClustalX, which is available at http://www.clustal.org/clustal2/. (For Windows, download
clustalx-2.1-win.msi; for Mac OS, download clustalx-2.1-macosx.dmg.) Next, install NJplot, which is
available at http://pbil.univ-lyon1.fr/software/njplot.html.
Understanding sequence alignment
Let’s use ClustalX to compare DNA sequences. For this exercise, use the test sequence file test.txt, which
contains the three short DNA sequences (test1, test2, and test3) shown below:
>test1
AAGGAAGGAAGGAAGGAAGGAAGG
>test2
AAGGAAGGAATGGAAGGAAGGAAGG
>test3
AAGGAACGGAATGGTAGGAAGGAAGG
Load these sequences into ClustalX by choosing from the menu,
File -> Load Sequences, and then selecting test.txt. ClustalX
displays these sequences as shown in the illustration on the right
(PC version shown).
Before you can compare sequences, you have to “align” them,
which means lining up the sequences and sliding them past one
another until the best matching pattern is found. Alignment
allows you to examine differences between related sequences;
such differences reflect evolutionary relationships.
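To make “finding the best matching pattern” concrete, here is a minimal sketch of pairwise global alignment using the classic Needleman-Wunsch dynamic-programming method. This is not what ClustalX runs internally (ClustalX builds a multiple alignment with its own scoring), and the scores used here (+1 for a match, -1 for a mismatch, -1 per gap position) and the function name global_align are assumptions chosen only for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (aligned_a, aligned_b, score), with "-" marking gap positions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] opposite a gap
                score[i][j - 1] + gap,       # b[j-1] opposite a gap
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


test1 = "AAGGAAGGAAGGAAGGAAGGAAGG"
test2 = "AAGGAAGGAATGGAAGGAAGGAAGG"
aligned1, aligned2, best = global_align(test1, test2)
print(aligned1)
print(aligned2)
print(best)
```

With these scores, test1 and test2 line up perfectly once a single gap is placed in test1 opposite the extra T in test2, which is the kind of pattern the alignment step is meant to reveal.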
From the menu, choose Alignment -> Do Complete Alignment.
When prompted for output file names, use the default names
given and click “OK.” The screen changes to look like the
illustration on the right.
Notice that it’s a lot easier to see differences among DNA
sequences after alignment. You can figure out what kinds of
mutations have occurred in each sequence by how it compares
to the others (as shown in the labeled illustration). The number
of differences among sequences indicates how closely or
distantly related the corresponding organisms are.
[Illustration labels: Deletion, Insertion, Substitution]
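The kinds of differences named above can be counted column by column once the sequences are aligned. The sketch below does this for the three test sequences. The aligned rows are a hand-made alignment written for this example (an assumption; ClustalX may place the gaps differently in the repeated AAGG stretches), and classify_columns is a hypothetical helper name. Note that "insertion" and "deletion" are only relative to whichever sequence is treated as the reference: without knowing the ancestral sequence, a gap could reflect either event.

```python
def classify_columns(ref, other):
    """Count column types between two rows of one alignment, relative to ref."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for r, o in zip(ref, other):
        if r == "-" and o == "-":
            continue  # gap in both rows: uninformative for this pair
        if r == "-":
            counts["insertion"] += 1     # base present in other, absent in ref
        elif o == "-":
            counts["deletion"] += 1      # base present in ref, absent in other
        elif r == o:
            counts["match"] += 1
        else:
            counts["substitution"] += 1  # different bases in the same column
    return counts


# Hand-made alignment of the three test sequences (an assumption, see above).
aligned = {
    "test1": "AAGGAA-GGAA-GGAAGGAAGGAAGG",
    "test2": "AAGGAA-GGAATGGAAGGAAGGAAGG",
    "test3": "AAGGAACGGAATGGTAGGAAGGAAGG",
}

for ref_name, other_name in [("test1", "test2"), ("test1", "test3"), ("test2", "test3")]:
    result = classify_columns(aligned[ref_name], aligned[other_name])
    differences = result["substitution"] + result["insertion"] + result["deletion"]
    print(ref_name, other_name, result, "total differences:", differences)
```

Under this alignment, test1 and test2 differ in one column, test2 and test3 in two, and test1 and test3 in three.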
Based on this information alone, which two sequences do you think are
more closely related? To see if your answer was accurate, we can use
ClustalX to generate a phylogenetic tree.
From the menu, choose Trees -> Draw Tree. This creates a phylogenetic
tree file called test1.ph, which can be opened using NJplot.exe. Launch
NJplot, then from its menu, choose File -> Open, and select test1.ph.
The result shows that test1 and test2 are on the same branch of the
tree, indicating that they are more closely related to each other than to
test3.
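With only three sequences there is a single unrooted tree shape, so the interesting part is the branch lengths. As a rough illustration (this is not ClustalX's tree-building code, and ClustalX computes its distances differently), the sketch below takes simple pairwise difference counts and solves for the three branch lengths that reproduce them. The counts used (1 between test1 and test2, 3 between test1 and test3, 2 between test2 and test3) are assumptions obtained by aligning the test sequences by hand and counting every differing column, gaps included; the function name is hypothetical.

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths from the central node to tips a, b, c.

    For an unrooted three-taxon tree, each pairwise distance is the sum of
    two branches (d_ab = a + b, d_ac = a + c, d_bc = b + c); solving those
    three equations gives the formulas below.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Assumed pairwise difference counts from a hand-made alignment (see above).
d_12, d_13, d_23 = 1, 3, 2
len_test1, len_test2, len_test3 = three_taxon_branch_lengths(d_12, d_13, d_23)
print("test1:", len_test1)
print("test2:", len_test2)
print("test3:", len_test3)
```

In this toy calculation test3 sits at the end of the longest branch, while test1 and test2 are separated by a short path, which agrees with the grouping seen in NJplot.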