BIT150 – Fall 2010 Midterm – Take-home exercises
Due on Monday, November 1st at 12pm, by email to the TA ([email protected]),
submitted as Midterm_Lastname
BEFORE STARTING…
- Exams are INDIVIDUAL. Two or more unusual and identical errors found in
different exams will be considered evidence of copying.
- Make sure you CAREFULLY READ AND UNDERSTAND EACH QUESTION of
the exam. Questions to the TA are allowed ONLY about the interpretation of the
exam questions. Make sure that all parts of each question are answered.
- Follow in detail the instructions given in each question of the exam. The TA will
NOT guess your answers. Make sure you provide an appropriate key
indicating what each color, highlight, or character you use means.
Question 1    20 points
Question 2    15 points
Question 3    15 points
Question 4    25 points
Question 5    25 points
TOTAL         100 points
1. 20 Points
The cDNA sequence for Triticum monococcum gene FtsH is provided below:
>cDNA_TmFtsH_Wildtype_sequence
ATGTTTGAACTTCTGTTTGTGTCGCAGGCTGGAAGGCGGTCGTCGAGTGTGGTCTACAATGAGCTAGTGAGTACAAGTGCTTTCAGGAC
ACCTGCAAATGGCACCGGCGGAGTTCTCAAGGCGCTGCAAGAGAGGTACCGATCAAGCTACGTCGGTAGCTTCGCGCGCAGGCTACGAG
ACTTTGACACGCCAAGTGATGCCTCCCTTCTTAAAGAGATCTACAGAAGTAACCCAGAAAGGGTCGTGCAGATCTTTGAGAGCCAGCCT
TCCTTACATAACAACTCTTCGGCTCTCTCCCAGTATGTGAAGGCTCTTGTCGCTCTCGACAGGCTGGATGAAAGCCCGCTGCTTAAGAC
ATTGCAGAGAGGAATTGTCAATTCAGCAAGGGAGGAAGAAGGGTTTAGTGGCATCCCAGCATTTCAAAGTGTTGGCCGTACGACGAAAG
ATGGGGCTCTTGGTACTGCTGGTGCACCAATTCACATGGTTGCATCAGAGACTGGCCAATTCAAGGAGCAGCTTTGGCGTACCTTCCGA
AGCATTGCACTCACTTTCCTAGTAATCTCTGGCATCGGGGCTCTGATTGAAGATAGAGGAATTAGTAAAGGCCTTGGATTGCATGAAGA
GGTTCAGCCAAGCTTGGATTCGAGCACAAAATTCAGTGATGTCAAGGGGGTTGATGAAGCTAAAGCTGAACTCGAGGAAATAGTTCACT
ACCTACGAGATCCCAAGCGTTTCACACGCCTTGGTGGCAAGCTTCCAAAAGGTGTTCTACTTGTCGGCCCACCCGGGACAGGGAAAACC
ATGTTGGCAAGGGCTATCGCTGGGGAAGCTGGCGTTCCTTTCTTCTCCTGCAGCGGCAGTGAGTTTGAGGAGATGTTTGTGGGTGTCGG
GGCAAGAAGAGTGAGGGATCTATTCAGTGCAGCAAAGAAACGATCTCCATGTATAATTTTCATTGATGAAATTGATGCAATTGGTGGGA
GCAGAAACCCAAAAGATCAACAGTATATGAAGATGACCTTGAACCAGTTACTTGTTGAGCTGGATGGCTTTAAGCAGAATGATGGGATC
ATTGTAATTGCAGCAACAAACTTCCCCCAGTCACTAGATAAAGCCCTTGTTAGGCCTGGGCGTTTTGACCGTCATATTGTGGTTCCTAA
CCCAGATGTTGAGGGCCGACGGCAGATCCTGGAGACTCATATGTCAAAGGTGTTAAAAGCAGACGATGTGGATTTGATGACCATTGCCA
GGGGAACGCCTGGATTCTCAGGTGCAGACCTTGCAAACCTGGTGAACGTGGCTGCTCTCAAGGCTGCCATGGATGGAGCGAAATCTGTT
TCAATGACCGACCTCGAGTTTGCCAAGGACAGGATCATGATGGGCAGCGAGCGCAAATCAGCAGTGATATCCGACGAAAGCAGGAAGAT
GACTGCATACCATGAGGGAGGGCATGCGCTGGTCGCGATACACACGGCCGGTGCCCACCCCGTCCACAAGGCCACCATTGTTCCGAGGG
GAATGGCTCTGGGCATGGTCACGCAGCTGCCAGAGAAGGACCAGACCAGCGTGTCTAGGAAGCAGATGCTGGCAAGGCTGGACGTCTGC
ATGGGAGGGCGGGTGGCCGAGGAGCTTATATTTGGGGAGAGTGAGGTAACGTCGGGCGCGTCGTCCGACCTAAGCCAAGCGACCCGGGC
TGCCAAAGCCATGGTGACCAAGTATGGCATGAGCAAACGCGTGGGCCTTGTAGCCTACAATTATGACGACGATGGGAAGACCATGAGCA
CGCAGACGCGGGGTCTGGTGGAGCAGGAGGTGAAGGAGCTGCTGGAGACGGCCTACAACAATGCCAAGACGATCCTCACGACCCACAAC
AAGGAGCTGCACGCGCTCGCCAACGCCCTCATCGAGCGCGAGACCCTCACTGGCGCCCAGATCAAGAACCTTCTGTCGCAGGTAAACAG
CAGCAGTGACACTCAGCAGCCCCAGGCCGCTGAGGTTCCACAGCAAACACCCGCTGCCCCAGCCTCACCCCAGTCTCCAGCAGCAGCGG
CTGCGGCCGCCGCAGCAGCTGCAGCACAGCAAGCAGCGGCTCAAGCCAAAGGAGTCGCAGGCATCGGGTCCTAG
The following mutations have been found in the FtsH gene at the DNA level:
Mutation 1: G at position 1793 was mutated to T
Mutation 2: C at position 782 was mutated to T
Mutation 3: A at position 436 was mutated to G
1.1 Translate the provided wildtype cDNA into a protein sequence and paste it here.
1.2 Use the appropriate BLAST program to identify orthologous proteins.
List the program used:
To what conserved domain superfamily does this gene belong?
Provide accession number and protein of the following orthologs:
Organism             | Accession | Protein sequence
Oryza sativa         |           |
Zea mays             |           |
Arabidopsis thaliana |           |
1.3 Determine the resulting amino acid change for each base mutation and refer to
the BLOSUM62 scoring matrix to find the scores for each mutant.
(See slide 16 in Lecture 2 for BLOSUM score matrix)
Mutant | T. monococcum amino acid | Mutant amino acid | BLOSUM score
1      | G                        |                   |
2      | P                        |                   |
3      | T                        |                   |
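The codon arithmetic behind 1.1 and 1.3 can be sketched in Python. This is a minimal illustration, not the required method: the function names are hypothetical, the standard genetic code is assumed, mutation positions are assumed to be 1-based and counted from the A of the ATG, and the example uses a toy sequence rather than the FtsH cDNA.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3].upper(), "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def mutation_effect(cdna, position, ref, alt):
    """Return (codon number, wild-type aa, mutant aa) for a 1-based point mutation."""
    index = position - 1
    if cdna[index].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {position}, found {cdna[index]}")
    codon_start = index - index % 3          # first base of the affected codon
    wt_codon = cdna[codon_start:codon_start + 3].upper()
    offset = index - codon_start             # 0, 1 or 2 within the codon
    mut_codon = wt_codon[:offset] + alt.upper() + wt_codon[offset + 1:]
    return codon_start // 3 + 1, CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]


toy = "ATGCATTGGTAA"                          # hypothetical toy sequence
print(translate(toy))                         # MHW
print(mutation_effect(toy, 5, "A", "G"))      # (2, 'H', 'R')
print(mutation_effect(toy, 9, "G", "A"))      # (3, 'W', '*')
```

The wild-type and mutant residues returned by `mutation_effect` form the pair to look up in the BLOSUM62 matrix. The check against `ref` is a cheap guard against off-by-one errors in the position.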
1.4 Use MEGA to create a CLUSTAL alignment with the five orthologous proteins to
determine the most useful TILLING mutant to disrupt the function of this protein
among the three listed above. Present an image of the highlighted Variable
residues showing at least the first 25 amino acid positions.
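As a rough illustration of what "Variable residues" means in an alignment, the sketch below flags every column that contains more than one distinct symbol. It assumes the aligned sequences have already been exported from MEGA as equal-length strings; the function name and the toy alignment are hypothetical, and a gap is counted as a symbol.

```python
def variable_columns(aligned):
    """Return 1-based positions of columns holding more than one distinct symbol."""
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*aligned) walks the alignment column by column.
    return [i + 1 for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


toy_alignment = ["MAGSG-RD", "MAG--RRD", "MSGSGKRD"]  # hypothetical toy alignment
print(variable_columns(toy_alignment))  # [2, 4, 5, 6]
```

A mutated residue that falls outside this list sits in a column conserved across all the orthologs, which is one common argument for expecting a stronger effect on protein function.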
1.5 Rank the mutants above in the order you would use them to study the function
of this protein and describe your reasoning:
2. 15 Points
Consider the four proteins below, and select the most appropriate program to
perform a Multiple Sequence Alignment.
- Name the program and describe why you chose it.
- Include a publication-quality image of your alignment.
Unaligned
>SequenceA
MAGSGRDRDPLVVGRVVGDVLDAFVRSTNLKVTYGSKTVSNGCELKPSMVTHQPRVEVGGNDMRTFYTLVMVDMRDPDAPSPSDP
>SequenceB
MAGRDREPLVVGRVVGDVLDPFVRTTNLRVSYGARTVSNGCELKPSMVDMPSPPDP
>SequenceC
MSINRDPLIVSRVVGDVLDPFNRSITLKVTYGQREVTNGLDLRPSQVQNKPRVEIGGEDLRNFYTLVMVDMRDPDVPSPSNP
>SequenceD
MVGSGMQRGAPLVVGRVIGDVVDPFVRRVALRVGYASRDVANGCELRPSAIADPVMVPDAPSPSDP
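Before choosing an alignment program it can help to compare sequence lengths, since a large spread suggests that the alignment will need long internal or terminal gaps. The sketch below is a minimal FASTA reader with hypothetical function names, shown on toy input rather than on the four sequences above.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may wrap over several lines
    return records


def sequence_lengths(records):
    """Return {name: length} for every parsed record."""
    return {name: len(seq) for name, seq in records.items()}


toy = ">s1\nMAGS\nGRD\n>s2\nMAG\n"  # hypothetical toy input
print(sequence_lengths(parse_fasta(toy)))  # {'s1': 7, 's2': 3}
```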
3. 15 Points
Using the provided alignment (midterm_alignment.mas), produce the following
phylogenetic trees:
- NJ with Bootstrap values
- ME
- MP
- UPGMA
NJ – Bootstrap
ME
MP
UPGMA
By hand, create a strict consensus tree for the trees you produced above.
Strict Consensus
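A strict consensus tree keeps only the groupings that appear in every input tree. The sketch below can be used to check a hand-drawn consensus. It assumes each tree is written as nested tuples of taxon names and that all trees are rooted in the same way; unrooted trees would need bipartitions instead of clades. The function names and the toy trees are hypothetical.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names under a node (a leaf is a string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Collect the leaf set of every internal node below the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaf_set(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def strict_consensus_clades(trees):
    """Keep only the clades that are present in every input tree."""
    shared = set.intersection(*(clades(t) for t in trees))
    return sorted((sorted(c) for c in shared), key=lambda c: (len(c), c))


# Hypothetical toy trees over taxa A-E.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "B"), "D"), ("C", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))
print(strict_consensus_clades([t1, t2, t3]))  # [['A', 'B']]
print(strict_consensus_clades([t1, t3]))      # [['A', 'B'], ['D', 'E']]
```

Every clade missing from the output collapses into a polytomy in the consensus tree.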
4. 25 Points
Annotate any retroelements or genes in the following 23.5kb sequence from Triticum
monococcum.
>Tm_2010M
TGTTGACGTGTGCCTCCATGATGTGATCTCTCATGTCCAAAGGTGTTGTTTTCTATGATGCAAAACTAGTCTTGCCTTAATTGGTTCTA
GAGATTCTTCAAGAAAAGGGATTTTTCGTGTGGGTTTGTCTCGTTTAAGTTTAAACAACAGCAGGGATTTTCGTTTCTGATTCTTTCTG
TTCTGTGTCCTTGTGGCGATGTGTGAGCTCAGCTTAACAATTTCCATATAATAAGCAAATATACATCAGATGTACAGTCTGCATCTGGG
GCCCAATCTCTTTCAGATAAACTCATGCGGGCGGCTGTAAGTGTCATGTCTGTATGAGTGTAGGTTTGTGATAGCAATAGTAGGAGGCA
AACCTGAACCCTTCCAGTGGTCCCAGGAGGCCAAGTGATTGCAGATAGCCACACATCCAAACAGCCTGAATAGCATCCACTAGCTGGTG
CTCTCCTTCTCTATGCTCTAGATCAGAATTGGAGGGCTTCTGTGCAGCACCAGGATGCCCGGCAGCATCCACATCTCAGGCAAGTTATT
CTCTCTTATGATTTGTGCTGTTGTCTTGTAAAAACAAATCCGAACAAAATGACCATACTGTTATACTGGAAGCAAGAATTTTGCTGTTC
ATTTTGTTTTCATTTTACTGTGAAGCCATTAAACCACCAGTCACCGCTCCGCTGTTCTTGCAAGGTTGGTGTTCTGATTGTACTGTGCT
TTCCTCTTTCTTCACTTTCCAGTTACTAATATTATTGTTTGTATACACAAAAATCGTTTTGGACACAGTCGTTGTGGGCAAAAGAGAAT
ACCGTGGAAGCATAGGACAAGATGGTTTCTCTTTGTAAGCCTATTTTCATACTGCCTTGAGTGGCATTTAATTACTGTTACCTTTCTAC
TTCCACAAATGTACATTGAAGCTTTCTTTATGGTGCTTAAAGGAAACTTTGTAGTTCCCTCACACGGAAATATCTATGTGCAGTCCAGT
TACATCACTCCGTGACAGCACGGTAATGATGCTGCGGAATGCTGACGAAGAATTGATATCAAAAACAGGTATTATTTTTGTTGTTGGTG
GTGTTTTTATCTTTATCAGCAGACCTACAATTGATGTGCATCACGATTTCTTGAGAATCACCACAGAAGTAAAGACAAAGGAAATTGTT
GAGTTAGGAACTATGGATGTTGTTTTTACACTTGATAGTGGAGGGAAAATTATTATCCAGTTGCAATTCTTGCTCAGTGCTGAAGATCG
CAAACGCGTCCAAGAAATGGTAAGATGGCCCTTTTTACCTTGTTTGCTATTTCATCATAAGTTAAAGATCGTTTTCCCCCACAAAATGT
CCAGCTTAAGTCAAAGCATATTACTTTGTTTATTTATACTTATTTCCTATTATTCAGAGGAACTTTGCAATGAAAAGGAAACAGCAAGA
GCTGCTTGGGAATGGTCTTTATTTTCAAGGTAACCCGTAAATTATAATATTTTCAGTTCTGTCTGTTTTTCCATTGCAAGTACAAATTT
GGAGATCCATGCAACAGATTGTATGCGCTAGATGTGTGATGCCAATTTCTCCCCTACAAGGCTGAGGATACCCACTTCATGCATTAGTA
CTTCTTAAAATTCTTTCTTGTGATGTTAACATAGGCCCAGAACTGAGGCATGTACACTCATTAAACTTCTAGATTTAGTCCTAATTTTT
TGTTATACCTTAATTGCGATGCTAGTTTGTAATATTAATCATAGATAAAATATGTAATTGATATGAACAACATGTGGGAGTTCTCTTTC
TTACAGATTGCTTTCAAATATATCAGACAGCCAGTTGTCTAAGGAGCAGACTGAAAAGATCTCCGACATCCCAAGCAAAGGAGATCAAC
TTACGCTTTGGAAGAGTCTGTTGCTAGATGACTTGAAAGAGAGCGCTGTCTTCTCTGAAATAAGAGTTGATTCCCGCATGAAGGCTTCA
AAGGACCTGCTACTGCCAAGTGTTGGAAGCACTTCAAAGCTTGAAGGACCCATCATCGGCTCCAAGAAAGGGCATGGTGAACCAGAGAG
TAGAGCAAGTAGTGCAGTGAAGAAGATGATAAGTGCCTTCGAAAGCAGCCCTCCACAGGTTTTAAACCAATCAATATATTGAGAAACAA
ATAGCTTCATTCCCTGAATTATGCTTGTCAGGTTTTGGGGGGAAACACCGATTGCTTTACTGTGCACCAAGAAGGGTTTATAATGTATA
CATGTACGGAGACAAAGGAAAAATGCATGGAACCTAATTCTAACAAACTTCACATATATTTACACCATCTTCTCTTGGATTGCAGAGTC
TGCCTTCGATTACAAGGATCAAATCAGAAAGCTCATTGGAAGTGATGTCAGTTTCTTCAGAGACTGGTACCAATTCTTCAGACAAGCCT
TCTACTCCTGGCGCACCAGCGAATGCTTCCGACCGTACGCAGACAGGCCTTGTGGCCGAGACATCGGGCAAAGTGACCCTTCGTTCTGG
TGATAAGGATTCCAGTTCGAGGTCAGGAAGGCAAGTTATGTTTGGCAATAAGAAATCAAATGCATCTAGACAGATCAATCTATCAAATA
CATATGAAAGCAGAAGACGAAGCTCCAGTAGGCGCGATGAACCAGCTAAGAAGAGCATGGGAGAAGCTGACCTGATCCGTTCCAAGAAG
CGGTCTGAAGACAAGCACCGTCGCTCCATTGGTCCCTACTCGCCTGAGCAGACTAACAGTTTGGTTGCAACATCTAGCATCACCTGGAT
CCACCCACATGTTTGTATCACCACCGCGAGCCGACAGCTCAAAGATCTTGTTGAGCTTGAGCGCCTGGACCCAATGAAGTACGTAGAAC
AGAATGTCCAGGAAGACACTGACGAGGTACGGAAAACTAAACTGATGTCGATGTTCCTACCAACTGTTGGTGCTTTTACTGCTTTTTGA
GTTTGTCTTGCTGCTTCGTTCATCTTCATATGTATCACAAACCAAATTTTTGGAGCAGTGCACGAGCATCGATGAGGTGAGGCACGTGG
CTGACTCTGCGCAACGGAGTGGTGGGTTTCCAGTGTTGAATGGGTGGATGATTAACCAGGCAAGTTATTGCTTTGACCTCTCCTATCCT
ATCCTCTCTGATGCATGTTGCTCAAGAGTTTGTTTGCACATGCTAAATATGGATGTACTTGAAAACACCACTGATGCAGGGGGTGCGTG
TGGTCATTGTGATCATAGCCTGCGGTGCTGTGTTCCTCAACAACAGGTGAAGTCAGAGGACTTGCAATATCGATTAGAGGTTATTGTCG
CTCAGACTGTCAAGTAGCTGTGGAGTCCAGGATTAGGGGGTGTTCGGGTAGCCGGACTATACCTTCAGCCGGACTCCAGGACTATGAAG
ATACAAGATTGAAGACTTTGTCCCGTGTCCGGATGGGACTTTCCTTGGCGTGGAAGGCAAGCCTGGCGATGCAGATATTCAAGATCTCC
TACCATTGTAACCGACTTTGTGTAACCCTAACCCCCTCTGGTGTCTATATAAACCGGAGGGTTTTAGTCCGTAGGACAACTTCATCATA
CAACAATCATACCATAGGCTAGCTTCTAGGGTTTAGACTCCTTGATCTCGCGGTAGATCTACTCTTGTACTACCCATATCATCAATATT
AATCAAGCAGGACGTAGCCGGTCCTGAGATTTTGGGGGCCCGGGGCGAAAATAAATATGGGGCCCCTTAAAATTTTCTTTTAGCATTCT
TTGACGGAAAGTCACTTACCGTGGTCTTACCTTGGATCCCTCCTCCTGCTGGTTGTGTGAAGGTGAACGTAGATGCTGCGATGACACAG
TTGTGATTGTGTCAGAATGCAAGGAGCCGTGAGGGGGATTTCATGGGAGCAACAAATGTCATCTTGCTGGCATCCGGGGTCTTAGACAA
TGGAAGCACTTTCGATGGCGATGAATTTGCTCGGTGGCAAGGGGAGAGTGGTTAGTACTGGTTCTAGGTCAGCAATACCCCTACGATGG
TTCAGTGTGGCATTATGGGAAAACCATGGAGGAGATTACGGCAATAAAGCAAATATGATTCTCAAGAATATAAATTATGAAGGTTTCAT
CTACAGTAATAATTTGAACTAAGATGGTATAAAAACAAATGCATGAACATACACATCAACAAAACTCCCGCTGCTGCTTCCATGCATGA
AATTTATGACTTCATGATATAAATAGAATCACAAATCATACCTCTCATGAAGAGAGGAGAGAGAGAGAGTACTCAAATTAAACTTACTT
GTACTGATGCGTACACACCTGAAGATGACAATTGAGATGACAGATCGTGAGGCCTTTCACCCTTAGACCCCCATAATCCTCATGATAGT
ATTCGAACTTGCCACTGGCTGCCATGATTTTTTTTATAAGGTAACTGAAAACTCAACAAATTATCAGCCTCTAGACTGAAACAACAAGC
ATAGCTGCATAGATCGATCTCCTCATAGTCTGCTCTCCAAGTTACGCGGCAGCGACATCGTCAGCGAATCTGTCGTGGGGTGGATTTCG
ACGGTCCACCGGAGGCGAGACTTAGGGTCATTGTGAGGACGATCTGGGTGAGTCCGCAGTCGCCGGCTTGTGATTCACAATCTGGGCGA
AAAGGAATCAGCCGAAATCACAGGGAAGCAGTCAAGCGATTCACGATCTGGGCGAAAAGGATTTTGTGGGGAGGTGGCGTGATGAGTCC
AATGGCTCTGGTACGGACTGGCGACTCGGGTGGCGACGCGGCGGGTGGAGCGGCTCTGCTCGCTGGCCCCGGCGAGTCAGCGACCTGGT
TGGTGGTGCGGACGCCGGGCCGGTCGGCACGGGCGTCCCCGGAGGCGGCGCCGCGCTGGACGGGGAGCCGTTTCTGGGCTCTGGCCGTG
GGAAGCGACGATGAGGATTCGGAGGCGGAGGCGAGCGATGGCGAGGCTCGTCGTGCCGAGGCGAGACGACCGTCGATCGGCGCGTTCGT
GGCTCGGGCTGAGGAGCTAGGGGGCTCGCTCACGGCCGGGCGGCGGCGGGCCCTTTGCGCCCTGGCGGCGGCGGTGCTCGCCGACCAGG
GGTGTCTGGTCCAGCGTCGTTGGTGCGCGCGGCCTTGTTGGGGGGCGTGCGGGCGGCCTCCGGCAGCATGCGGGCTGCCTCCCATTCGC
GGGGACCGGGGCTTGCACCAGTGGTGCAGGTGGTGCCTGGGGCGGGGGTGCCGGCGGGAGCAGGGCGGCGACGGCGACGGAGGAGGCAA
CCGATCCCTGGCCTAGGGTTGGGGAGGCTGGCCGGGCTGGGCCGGTGGGCCCAAACCCAGGAAGTGAGTCAAGGTGGCCTTCGTGGGCC
GTGCGCGGTCCAGGGCGTTAGTTCCGCGGAGACGGGGCTGCGTCTGGAGGTGGGCTGCTCAGATTCGCGGGGTTGCGCGGAGAAGCCCA
GGCCTGGCCCGGCCTCGGAGCGGGCAGTACTTAAGTGGCTCTGGATCCGCCGCGGCGCCTCCGACAGCTCACTACGGTTTCCTGCCACT
CCATCTGAGGTGCGGCGCTTCGGTGGATCCAGGCGCTTCCTTGTCCCGCCCGCCTCGCGGACGGCTTTTGCGATCTTTCGCAGAGGTGG
TAGCCATGGTCTCCAAGGCGTACGGGCGTAATAACCAGAGGTCGAGTCTGTATATGTAGTCGAACAGGCGTAGCTATCAGTATATGCTC
GCACTAGTTAATTCGAGTGGGTCGGAGCGTGTCGGCTAGGAGGGTGGCTGGGGGCCTCCCCCAGCTTGGTGGCTCAAGGAGTAAGAGAG
AAAGCAGAAGAAGAAGGAGGCTTCAAGAAGAAGGGGCTGGAACTCAAGCGTAAGGAATCGTTCCCCCCCTACGGTGGTGACCGCGCCGG
CGACCCCTACTCCAAGCAGATGCTGAAGAAGCACAAGCCGGCGATGCAGGCTCCCTCCAAGGCCCCCAAGCCCTCCGCCTCCAAGGGCA
CGGCAGCGGAGCCCATTCCGGTCGAGGATGCGGGAGGGCCAGAGTGCTTCCACTGCGGGCGCACGGGCCACTACCAGAGCGAGTGCGGC
TTCAAGCAACTGTGCGTGCTCTGCAGAAACGAGGGGCATGCCTCGGCCTACTGCCCCACGCGCGACAAGCAGCTCGTGCTCCAGACAAT
GGGCCACACCTTTGCGGGTGGCGGTTTCTTCTGCCTATAATACCCGGAGAAGGCGGATGTGGCGGCGGAAGGAATGCTAGGGGCTAACG
CGGCCCTGGTGTCGGCGGCGCCCGGCGTCCTGTCCAAGGAGATCCTCGAGGCTGAGCTCCCCCATCTTTTTGAGGGCGAGTGGGACTGG
CAGGTGGCACCGTTCGATAGGGACTGCTTCTCTGTGGCTTTCCCGGACCCTGTCATGCTACGGATGGCGACGCGCAGCGGGAAGCTCTT
CCTCTCCATCAACAACATCACCGCCGACATCCGTGACGCGGTCCTCGCCGAGCCCAAGGCGCTGTCAATGCCGGAGGTTTGGGTGAAGC
TATCAGGTGTGCCTCCGAACCAGCGGCGTGTGGAGCGGCTGATGGCGGCCACGACTATGATCGGCCGGCCGTTGGTGGTGGATGAATTG
TCCCTTATCCGTTCGGGCCCGGTGCGTATGAAGTTTGCTTGTCGTGTGCCGGCCAAGCTGTGCGTATCTGTGCAAATCTGGTTCAATGG
CGAGGGCTACACCATCAAGCTCGAGCCTGAGGTGGACCAACCGCGCCCGGCGGCCCCTGCTCTCCCGCCCCCGCCGCCGCCCCCTGCGG
GTCGGGGCCCGAGGGGTCACGGCAAGGACCAGCGCAAGGACAAGGAGCCGGAGCAAGACGCCAGCATGGAGGAGGACGACTCCATCGAT
ACCGCGGCCTGGGACAAGCTCGGGATCTCTCCGTCGGGCCCTGCCGCGCTGGTGGAGGGGGTGGCGGGGCTGCGGGCTGTGCTGGGGTC
GGGCCACGAGGGGATGCCAATCTCAATCCCCAACCCCTCGCGGCTCCGTTCGGTGGGATCTCCGGCGTCCGGTGGGCGTGGCTACACCT
TGACCGGCTCAGTCAAGCCGATGAAGAAGACGGCGGTGCGCAAGGTCAGCACGGAGGTGGTGCGCACGATGGGACCGCTGGGGGCGCCT
CCCATGCCTTGGCGGTCGGTCATGGCTCTGGCGGCCCCAACCTTGCCACCCGTCGCTGACCCGCTGCTGCAAGCGGCGGCCAAGGCGGC
GATGATCTCCAAGGCCAAGCGTACAAAGACGGTGCCGGCGGCTCCGGTCCGCACAAGCTCCTGCTCCAAGGGGGCCAAGGGCAACATGC
CCTCGCTGCAGCGGGCGCAACTCCTTCAGGCGCAGAAGAACATGGAGATCTCAGGTAATCCTCCGCCTCGCTTCACGGTTCTAGGTTCC
TTTTCGGACGATCACTTGCATGAGGTGTTGGAAGCGAGCGGCGTGGTTTCTTCTTGTGAGGGGGTGTCAGGGAGCTGATTTCGCTCGTT
CGCACCAAGGAGAGGGCGCAGGCAGCGCTCGCGGCTGCTGCGGCAGCGCTTGCTGACCAGAGGCCCGGTGAGCCCCGAGGGAGGGGCCA
CCACGGCACTTTCTGTGCCCGAAGAGTGGGGTCAGGCAACAGGGGTTGCCCCCTCCCTCGCCTTGCCGTGGAAGGGTGAGGTGCGTGGC
TAAGGCGGTCACCTCGCGAGGTGTCCGGCTCCGCAATCGCGTCATCTAGATGCGGGCCATCTTTTGGAACATTCGTGGCTTTGGCCACG
CGGGGGATTTTCATTTAAGGAGTACATGCGTAGAGAGGACGTCGACATTGTTGGTTTGTAGGAAACAATCAAGGGGGATTTTCGTTTCC
ATGAGTTGCTCGCGATTGATCCTTTGGAGCGCTTTGAGTGGCAACATGTTCCGGCTGTTGGTCACTCCGGTGGTTTGTTGCTAGGTTTA
AATCGCGCTCTGTATGAGATTATCGACTGGGACGTTGGTTCGTTTTTTTATCTCAGCACATTTTAGAGTTAGGGCCTCGCGCCGCGCAT
TGGTTGTCATTCAGGTCTACGGGCCGGCCGATCACTCGAGATCGACGGAGTTCTTGGGAGAACTCCAAGCCAAAGTTAACTTCATGATT
GAGGCGGCGCTTCCTGTCCTGGTGGGAGGCGATTTCAACCTGATTAGGTCGGGTGCGGATAAGAGTAACAGTAACATAAACTGGCCTCG
GGTGGCTATGTTTAATGTTGCGATCGCCTCGATGGCCCTCAGGGAGGTGGCTAGAATGGGCGCTCGATTCACATGGACAAACAAGCAAT
TAGACCCGGTACGCTCGGTGCTGGATCGTGTGTTCATGTCTCCGGATTGGGAAATGGTTTTTCCCCTCTACACTCTGGTCGCGGAGACT
CGTATTGGGTCGGACCATGTACCACTTGTGTACTCCTCGGGGGAGGATAGAGTGAGACGTAGCCCTCGTTTTTTCTTCGAAACGGCCTA
GTTTGAGGTGCCAGGCTTCGAGGAGCTCTTTAGGGAGAAGTGGAGGGCGTGCGTGAGCCAAGTGGTACATGGTCCGACCCAATACGAAA
TATGCCCTAGAGGCCCCAAAGTGGCCCGATGGAGTTCTGGAACGCTATCGGCGGACGACTCAGGGCAAGTCTCAAGGGATGGGGCGCTA
ACCTAGGGAGGTCTGACAAAGCACATAGAGCCGCGATCCTAGCGGAGATCGCAACGATTGATTCCCAGTCTGATATACGGGACCTTTCT
GAGACGAAATGGGCCCACGGATACGACCTGGAAAGTCAGGTTGAAGCGTCGTTGTGTGCGGAGGAAGAATATTGGCATCGCCGTAGTGG
TTTGAAATGGATCCTCAAAGGGGATGCCAATACCAAATATTTTCAAGCCTATGCTAATGGCCGCCGTCGGAAATGCTCGATCCTGAGGT
TGCAATCGGAGCAGGGGCTTCTGTTGCGACAAGAGGACATCTCTCGTCATATTTATGAGTTCTACATTAAGTTAATGTGGACATGTGGG
GACCAGAGGGCTGGGATGAGCGCGGATATGTGGGAAGCCGGACAACGGGTTCTTGACGGCAAGAACGAGGGGCTGGGGCTAGCATTCCT
TTCCCGAAGAAATCGATGCCGCGCTGATGGGTATGAAGGCGGACACGGCCCCTGGCCCGGATGGGTGGCCGGTGGCAATGTTTAAACGT
TTCTGGCCCTTGCTTAGGGGCCCAATTTTCGAGATCTGCAACGGGTTTATGCACGGTTCGGTAGATATATCGCGCCTTAACTTTGGGGT
GTTATCACTCATTCCAAAGGTTCAAGGGGCCAATGACATTAGACAGTTCCGTCCCATTGCGCTCATCACCGTTCCGTTTAAAATTTGTG
CCAAAGTGTATGCGACTAGGTTAGTTCCGATTTCCCATCGTGTGATCAATTGCAACCAATCCGTGTTCATCCGTGGTCGCAATATCCTT
GAGGCCCCCCTGGCCCTTCAGGAGATGATACACGAACTCAAACGCACTAAGGAACCGGCGGTGCCGTTTAAGCTAGACTTCGAAAAGGC
ATACGATCGGGTTAATTGGAATTTTCTCCGTCAGGTGCTACTCAGCTGGGGTTTTTCCGCTGTTTGGGTGCACCGCGTCATGCAGTTGG
TCTCGGGAGGACAAACTGCTATTTCGGTGAACGGAGAAGTTGGGCACTTCTTCCGGAATAAACGGGGCCTCAGACAAGGGGATCTATTT
TCCCCCCTCCTGTTCAACTTTATTGTTGACGCGCTGTCCTCTATGCTGAGGAAAGCGGCGGAGGCTAGCCATATCAAAGGGTTGGTTGG
ACATCTCATTCCAGGGGGAGTGACCCACTTGCAGTATGCGGATGACACGCTAGTGCTGTTTCGTCCGGACCTTCATAGCATTGCTGCGG
TCAAGGCGATTCTTCTCAGTTTTGAGCTCATGTCGGGCCTCAAAATTAACTTCCACAAATGCGAGGTGCTCTCGCTGGGGATCAAGGCA
CATTCCGGATCTGCTCAACTGCAAAGTGGGCAAATTCCCGTTCATTTATCTGGGCCTCCTGGTAGACACCAAACGACCCACGATAGAGG
ATTGGGAGCCTCTATGTGCCAAAGTGAGGAATCGTGTATGTCCATGGCGGGGCAAATTTTTGTCGAAAGCAGCGAGTCTAGTGCTCACA
AATTCCAGCCTGTCTTCCTTGCCAACGTCCTCCTGGTAGACTGCTTCTTCTTGCAGAAGGGGTCCACGCCAAGTTTGACACGCCTCGTG
CCAAGTTCTTTTGGGAAGGAACTAGCCCGAGCCATAAATACCACATGGTTAAATGGGCCTGGGTGTGTCGGCCCAAGGATTTGGGGGGT
CTTGGGATCACCAATTCTAGATGGCTTAACATAGCGTTGATGTGCAAATGGATTTGGAAAATCACCCAGGGGGCCTCCGGGTTGTGGGT
TGATCTCCTAAGGGCCAAATATTTTCCTAACGGGAACTTCTTTGAAGGGAGGGCAAGGGGCTCACCCTTTTGGAATGATCTGCAGTTGA
TCAAACCAGCTTTTTCCATGGGGGCAAAATTTTCGATCAGGAACGGCAGATCCGCGCGATTCTGGACTGATCATTGGATAGGTACCCAA
CCCCTTTGGGTCGAATTCCGGGATTTGTACGATATTGCTGACGACACCGCGCTGTCGGTGGCGGATGCGCTTGCCGCGATGCCACCTGA
GATTCAGTTTAAGCGGGAACTCAACAGGCCCGAGCAGGCAAGCCTCGCGGCCTTGCTGCAACTAATCGAACCAGTGGGTCTTTTGGACC
AGTCTGATTCGGTAAGTTGGGCACTTACTAACTCGGGGAAGTTCTCGGTGAACTCCTTATACCGCAAGTTGTGTCGAGGGCCGACACAA
CCAGTGATTGCTGGTTTGTGGAAAGCGCGGATCCCTTTGAAGATCAAACTTTTCATTTGCCAACTGTTTCGCCATAGGCTCCCCACTTC
CTTGAACTAGCCAAACGTAATGGACCGGCCATGGGTCCGTGTGCGTTGTGCGGGGAACCGGAGGATGCCAATCATGTGTTTTTCCGTTG
GCCTCTAGCGAGGTTCGCGTGGAGTGCAGTCCGGACCGCGGCGGGTGTCGTTTGGGACCCGCGCTCGGCTACCGAGCTTTTCAACCTCC
TAGATGCGATTAAGGGCCCTGAATATAGGGTTATGTGGAGTTGTGTGGGGCACTTCTCTGGGCCTTATGGCGAACTAGGAACAAGTTTA
CTATAGAAGGGTGTTTTCCAAATCATCCGGCTAATATCATCTTCAAATGCAACCTCCTATTGCAGCAGTGGAGTCCGTTGGGAAGGCGC
AAGGATGCTGAGTTGATCAAGATCGCCCAACAATGACTGGTGCAAGTATATACAATGTTTAGGGAGTCATGACTTCGGTCCCTTTTGGT
TGCGTGATGAGCCTGCGTGCTTTGTACGGGCCTTGCCTTGTAACCTTGATTGGTGGCTTGTTAAGTTTCGTTTCTACTTTCGTGGTCGA
GCCGATATGGCTGTTGGTGATGTATTAAGACTTGGTATGTGGTACGCTGCTGTTGGGGCTTTATTAATCTAAAGCTGGACGTATCTGGC
GTCTTCGTTCTAAAAAAGGAATTAGCCGAAATCACTGGGAAGCAGTCAATCAATTCAGGATCGGAAAGAATACCATGATAACGCTATCG
TGGGCTACCAACTACGTGTGCATCTCAGGCCTTGTTTCTCCCGGAGCTTGTGCCAAGGGCTCACTTTTTTTCCCAGAAACTGCCATGGG
CTAATTATGTTCGTGGGAGGGGGCTCCTAGCTTTCGCGGGCCCGGGGCGGCCGCCCCTGCTGCCCCCCCACGTAGGGTTTTACCTCCAT
CAAGAGGGCCCGAACCTGGGTAAAACATTGTTTCCCTTGTCTCCTGTTACCATCCGCCTAGACGCACAGTTCGGGACCCCCTACCCGAG
ATCCGCCGGTTTTGACACCGACATTGGTGCTTTCATTGAGAGTTCCTCTGTGTCGTTGCTTTTAGTCCCGATGGCTCCTTCGATCATCA
ACAACGATGCAGTCCAGGGTGAGACTTTTCTCCCCGGACAGATCTTCGTCTTCGGCGGCTTCGCACTGCGGGCCAATTCGCTTGGCCAC
CTTGAGCAGATCGAAAGCTACGCCCCTGGCCATCAGGTCAGGTTTGGAAGCCTAAACTATACGGCTGACATCCGCGGGGACTTGATCTT
CGACGGATTCGAGCCACAGCCAAGCGCGCCGCACTGTCTCGATGGGCATGATATAGCTCTGCCGCCGAACAGCGCTTTGGAGGCCGCAC
ACACACCGGTTCCGACCATTGATTCGGAGCCTACTGCGCCGATCGAGGATCAGCGGTTGGACGTTGCCTCAGGGGCTGCGATCTCAGAG
GCGATCGAGCCGAACTCGAACCCCGCACTCCGCATGGCCCGTGACTCCGAGGAGCCGGATTCCTCTCCGAACTCCGAGCCCCCCGCGCC
CCTGCCGATCGAATCCGATTGGGCGCCGATAATGGAGTTCACCGCCGTGGACATCTTTCAGCACTCGCCCTTCGGCGACATCCTGAATT
CTCTAAAGTCTCTCTCTTTATCAGGAGAGCCCTAGCCGGACTACGGTCAGCAAGGTTGGGATACGGACGATGAAGAAATTCAAAACCCA
CCCACCACCCACTTCGTAGCCACTGTCGACGACTTAACCGACATGCTTGACTTCGACTCCGAAGACATCGACGGTATGGACGACGATGC
AGGAGACGAACAAGAACCAGCACATGTAGGGCGCTGGAAGGCCACCTCGTCATATGACATATATATGGTGGACACTCCAAAGGATGGAG
ACGGCGATGGAATAGCGGGGGACGATACCTCTAAGAAACAGCCCAAGCGCCGGCGTCAGCGGCGCCGCTCTAAATCCCGCCAAAGGAAA
AACGGTGATTCCGGCACGGGAGATAATACTACTCCGGATAGCACCGAAGAACACCCCCTCCAGCAAGAGTCAGCACAGGAGGACAGAGA
AGCCAGCCCTCACGAGAGGGTGGCGGACAAAGAGGTTGAGGACGATAATCATATGCCTCCCTCCGAAGACGAGGCAAGCCTCGACGACG
ACGAGTTCGTCGTGCCAGAGCATCTCGCCGAACAAGAGCGTTTTAAACGCAGGCTTATGGCCACGGCAAGCAGCCTCAAGAAAAAGCAG
CAACAGCTTAGAGCTGATCAGGATTTGCTAGCTGATAGATGGACTGAAGTCCTTGCGGCCGAAGAGTATGAACTCGAACGCCCCTCCAA
GAGTTACCCAAAGCACAGGCTGCTACCCCGACTAGAGGAGGAAGCACCTACATCACCAGCGCATGACATGGCCGATCGGCCACCTCGTG
GCTGCGACAGAGAGGCCTCTCGGCCCTCCACTCAAGCCATGCCCCGGCACCGCGTCAAGCATACTAAGGCACGGGAAAATGCGCCCGAC
CTGCGCGACATACTGGAGGACAAGGCAAGGCAAACAAGATCGATCTATGGATCGCGCAGGCACCCCACGGCACGTGACGGTGACCGTCA
CTCCGGATGCAATGAATCCAGCCGGGCCGAACTCAACAGACAAAGCTCCTTCAAGCTGCGTCGTGATATAGCCCAATACAGAGGCGCCG
CACACCCACTATGCTTCACAGATGAAGTAATGGATCATAAAATCCCTGACGGTTTCAAACCCATAAACATCAAATCATATGATGGCACA
ACAGATCCTGCGGTATGGATCGAGGATTATCTCCTTCATATCCACATGGCCCGCGGTGATGATCTACACGCCATCAAATACCTCCCACT
CAAACTTAAGGGACCGGCCCGGGATTGGCTTAACAGCTTGCCAGTAGACTCAATCGGTTCTTGGGAGGACCTGGAAGCCGCATTCCTTG
ACAACTTCCAGGGCACTTATGTGCGACCACCGGACGCCGATGACCTAAGCCACATAATTCAGCAGCCAGAGGAATCGGCCAGGCAATTC
TGGACACGGTTCCTAACAAAGAAAAACCAGATAGTCGACTGTCCGGACGCAAAGGCCTTAGCGGCCTTCAAGCATAATATCCGTGATGA
GTGGCTTGCCCGGCACCTGGGACAGGAAAAGCCGAAATCTATCGCAACCCTCACGACACTCATGACCCGCTTTTGCGCGGGAGAAGACA
GCTGGCTAGCTCGCAGCAACAACTTAACCAAGAACCCTGGTAATTCGAATACCAAGGACAAAAGTGACAGGTCGCGTCGGAACAAACAA
AAGCCCCGCATTAACAGCGACAGCAATGAGGATACGACAGTTAATGCCAGATTCCGAGGCTACAAACCCAGTCAACGGAAAAGGCCATT
CAAAAGAAATACTCAGGGCCCGTCCAGTTTGGACCGAATACTCGACCGCTTGTGCCAGATACATGGCAACCCCGAAAAGCCAGCCAATC
ACACCAACAGGGATTGTCAGGTGTTCAAGCAGGCAGGCAAGTTAAGAGTCGAAAACAAAGACAAGGGGCTGCATAGCGACGACGAGGAG
GAGCCCAGGCCGCCGAACAACAATGGACAAAAGGGATTTCCCCCGCAAGTGCGGACGGTGAACATGATATACGCAACCCACATCCCCAA
GAGGGAGCAGAAGCGTGCGTTACGGGACGTATATGCGATGGAGCCAGTCGCCCCAAAGTTCAACCCATGGTCCTCCTGCCCGATCACCT
TTGATCGAAGGGACCACCCCACTAGCATCCGTCACGGTGGCTTCGCCGCATTGGTTCTCGACCCAATCATTGACGGATTTCATCTCACA
AGAGTCCTCATGGACAGCGGCACAGCCTGAACCTGCTTTACTAGGATACAGTGCAAAAAATAGGCATAGATCCCTCGAGGATCAAGCCC
ACCAAAATGACCTTTAAAGGTGTCATACCAGGTGTAGAAGCCAACTATACAGGCTCAGTTACATTGGAAGTGGTCTTCGGATCTCCGGA
TAACTTCCGAAGCGAGGAGTTAATCTTCGACATAGTCCCGTTCCGTAGTGGCTATCACGCACTGCTCGGGCGAACCGCATTCGCAAAAT
TCAACGCGGTACCGCACTATGCATACCTCAAGATCAAGATGCCAGGCCCTAGAGGAGTAATCACGGTCAATGGGAACACTGAATGCTCC
CTCCGAATGGAGGAGCACACGGCAGCGCTCGCAGCAGAAGTACAAAGCAGCCTCTCTAGGCAGTTCTCCAGTTCGGCCTTCAAAAAGCC
GGACACTATCAAGCGCGCCCGGAGTACCCCACAACAAGACCGCCTGGCATGTTCTGAGCTAGCGTAGCAATGCGGCCCCAACCCTAGCC
CTCGCGATATAGCGAAACCAGTGCTTCACATACATAACTACGCTCTTGAAATACCATGGGCACAGGGGAAGGGGCACTATCACGGCACG
CCCGAAATACGGCTTAAACCGCACCAGGGGCTGCCGGATTCTTTTTTTTTCTTTTACTCTCAGGACTCCATACTTCGGACGACCCGTTC
GGCAATTCAACTGCCACACAAACGATGCAAGACCCAGGAAAGCAGACAAGCCACGCCGCATTATGGAACTCCCAGGTGGTCTCTATTGC
GAGCAGTATACCTATTTTTTAATACAATTCCGCGGCCTGCCCCTGGCCAAGACATGTAAATAGTCCAATTTCTTTTGCTTATCGCACTA
TTTGTATCGTTCCGCTTTCATAGCAGCCTTTCTATAAACAATGCATAGCTTTTTGTCTATTTTTTGCATTGTCCTCTTTTTATATATAT
GTTTATTAATAACATGTTGCATCCATACACTGTGGCACGGCAAAAATACGCCAGGGGCTTTAGTACCCATCAATATGGCGTGAGAAGTC
CGTACACTTTCACAAGTGCGGCACCCCGAACTTATAGCACTATATGCATTGGCTCCGAATCATGATTTGGGTCAATAGTTGGGTTTGCC
TGGCTCCTATGTTTTGGTGCCTTACGTTCCGCTATATCGGCTAAGGTAGCACTAGGAGAACTACTGCGATTGTGCCCCAGTTGAGCTGG
GCTGAGCACCTTAGTAGAGAAAGCTAAAACTGACTGTCATGATGAGGCGAGAGACCGGTCGCTGTTCGAGAGGTTTTTTCGAGTCCTTA
AAGACTTATGCTGCTTCGAGCGAGGAACCGGCTTTGTCCGGCCAAGGCGTGGATAGCGCCCCGAACTCGGTCTTCCGAATACTAGGGGC
TTCGCTGAAATTTTAAAATTATAGAGTTCTATGGCTAAGTGAGAGTGTTCAAGCATTATACTCCGATTGCCTTGTTCGTTGTGCTGAGT
GCCTCCCTCGACGGACCCAATCATGGGAAAAAGAGCGCTCGGGTTTATCCCGAACACCCCAGCACTAGTGGCATGGGGGCAGAAGCCGA
CGAGTGGCCATCTCTCAATTTTTTGATAAACGGCCACACAGAAAGTAATATTTTAAATTCAAGCATTGCTTAGCGCATATGAACAAGTT
TTCAGCGCACAGGATAACACGAGCGAGTTCATTCAAAAATTACATCCTTGGTACATTCATCCGCCATAAGGCGGGCACCAGCCAGAACA
TTCTTGTAATAGTTCTCGGGCTTGCGATGCTCCTTCCCCGGCGGCGGCCCGTCCCTCACAAGCTTCTCACCGTCCAGCTTACCCCAGTG
CACCTTTGCACGGGCAAGGGCCCTACGGGCACCCCTTGATGCAGACGGAGCGCTTGATGACCTCGAGCCTTGGACAAGCCTCCACCAGC
CGCCGCACCAGTCCGAAGTAGCTCCCAGGCAGGGCCTCTCCAGGCCATAGCCGAACTATGAAGCCCTTTAGGGCCTGTTCAGCCGCCTT
GTGGAGCTCGACCGGCTACTTCAGCTGGTCGCTCAAGGGCACAGGATGTCCGGCCTCAGCGTACTGAGACTCGGCTCAGTAGAATGCGG
CGGCATCGTATACGCTGCGGGGAAGATCTGCGAATGCCCCTGGAGAGCTCCGGATTCGGGTAAGTAACAAGTAGTTTACTTTTATATGT
CTGCTTTGCATGGAGAATGCCTTACCCGCCGCTATCTTTTTCACCGCATCCAATTCTTGGAGGGCCTTCTGGGCTTCGGCCTTGGCAGA
TTTGGCAGTTTTAAGGGACGCAGCAAGCTCGGACGCTCGCGTCTTCGAGTCAAGCTCCAAACTCTCATGTTTTTTCATGAGAGCCTGAA
GCTCTTGCTGCACCTCGCCGACCTGTGCCTCAAATTTCTCCCGCTCGGTGCGCTCCGTGGCCGCCTTCTTCTCGGCCTCGGACAACGCC
TGCTTGAGGGTCGCCACCTCAGTTGTGGCCCCTACAATAAGCAGTGTACTCCTGTCATTTTTTGCAATTGCGTCTTCTTATAGGCATTT
TTTTCTATAAGGTATCTCTTACCTTCTTTCTCCTCGAGCTGCCGTTTTGCAAGGCCGAGCTCTTGCTCGGTCCGCTCGAGGTCCTGCTT
CAGGGCACCCACCTCCGCAGTCAGTGCGGCGGTGGCCAGCAGCAAAGCCTGCATACGTATATTGACTCCTTTTTAGTTAGACTCCTGTG
ATATTTAATAGATCCTCTATTCGGCTTTTCTTTCCGAACGCCAAACAGAGCATCAGGGGCTACTGTCTATGCGGTAATATTTTTACATA
TTTTTTACTTACCTCGAAGCCTGTTAGAAGGCTAGCACAAGCTTCAGTCAGTCCGCTTTTGGCGGACCGAACCTTCTGGACCATCGTAC
TCATGATGGTACGGTGCTCCTCGTCGATGGAGGCGCCCTTAAGCACCTCCAACAGATTGTCCGGCGCCTCCGGATGGACGGAGGACGCT
GGCTCAATAGGCTTGCTCCTCTTGGCAGGAGTTCGCCTGCCGCGGTCCGGAACCGTTGATGATTCCGGCGCGGTGTCCGGCTTAAAGCC
GGACTTGGAGCCCTGGGGGGTCTTGTCCCCTTCACTCCTGGAGTCCGGGAGGTCGCCTCGTGGCTCCTCCTTCAAGTGAAGGAAATATG
CCCTAGAGGCAATAATAAAGTTATTATTTATTTCCTTATATCATGATAAATGTTTATTATTCATGCTAGAATTGTATTAACCGGAAACA
TAATACATGCGTGAATACATAGACAAATAGAGTGTCACTAGTATGCCTCTACTTGACTAGCTCGTTAATCAAAGATGGTTATGTTTCCT
AACCATGAACAAAGAGTTGTTATTTGATTAACGAGGTCACATCATTAGTTGAATGATCTGATTGACATGACCCATTCCATTAGCTTAGC
ACCCGATCGTTTAGTATGTTGCTATTGCTTTCTTCATGACTTATACATGTTCCTATAACTATGAGATTATGCAACTCCCGTTTACCGGA
GGAACACTTTGGGTACTACCAAACGTCACAACGTAATTGGGTGATTATAAAGGAGTACTACAGGTGTCTCCAATGGTAGATGTTGGTTA
GGGTCTGTTTGATTCAAAGGATTTTCATAGGATCTTTGAAGGATTAGAATCCTTAGGAATTTTTCCTACGTTGGTCGTTTGATTCGTAG
GATTGAATCATGTAGAATATTTTCCTAAGGATTCATTTGTACTACGTTTCACAGGAATTATAACATGCACTCCAACCTCTTGAAAGAAA
TCCTTTGTTTTTCATGTGACACAATCAAACAAACTCAAATCCTATAGGGATCCAATGAACATGCCATTCCAATTCTACTTTTTTCCTAT
TCCCGTGTTTCTGCAATCCTATGAATCAAAGAGGCCCTTAGTTGGCGTATTTCGAGATTAGGGTTTGTCACTCCGATTGTCGGAGAGGT
ATCTCTGGGCCCTCTCGGTAATACACATCACATAAGCCTTGCAAGCATTATAACTAAGATGTTAGTTGTGAGATGATGTATTACGGAAC
GAGTAAAGAGACTTGCCAGTAACGAGATTGAACTAGGTATTGGATACCGGCGATCGAATCTCGGGCAAGTAACATACCGATGACAAAGG
GAACAACGTATGTTGTTATGCGGTCTGACCGATAAAGATCTTCGTAGAATATGTAGGAGCCAATATGGGCATCCAGGTCCCGCTATTGG
TTATTGGCCGGAGACGTGTCTCGGTCATGTCTACATTGTTCTCGAACTGTAGGGTCCGCACGCTTAACGTTACGATGACAGTTATTATG
AGTTTATGCATTTTGATGTACCGAAGGTTGTTCGGAGTCCCGGATGTGATCACGGACATGACGAGGAGTCTCGAAATGGTCGAGACATA
AAGATTGATATATTGGAAGCCTATGTTTGGACATCGGAAGTGTTCCGGGTGAAATCGGGATTTTACCGGATTACCGGGAGGGTTACCGG
AACCCCCCGGGAGCCAAATGGGCCTACATGGGCCTTAGTGGAAAGGTGAAAGGGGCTGCCATGGAGGGCTGCGCGCCTCCCCCCCTCCC
CTAGTCCTATTAGGACTAGGAGAGGTGGCCGGCCACCTCTCTCTCTCTTTCCCCCTTGGAGTCCTAGTTGGAATAGGATTGGAGGGGGG
AGTCCTACTCCCGGTAGGAGTAGGACTCCTCCTGCGCCTCCCTTGCTTGGCCAGCCAGCCCTCCCCCTCTCATCCTTTATATACGGGGG
CAGGGGGCACCTCTAGACACACAAGTTGATCCTTGAGATCATTCCTTAGCCGTGTGCGGTGCCCCCTGCCACCAAATTCCACCTCGATC
ATACCGTTGTAGTGCTTAGGCGAAGCCCTGCGTCGGTAGTACATCAAGATCGTCACCACGCCGTCGTGCTGACGGAACTCTTCCTCGAC
GCTTTGCTGGATCGGAGCCCGAGGATCGTCATCGAGCTGAACGTGTGCTAAGAACTCGGAGGTGCCGGAGTAACGGTGCTTGGATCGGT
CGGATCGGGAAGACGTACGACTATTTCCTCTACGTTGTGTGTGATCGCTTCCGCAGTCGGTCTGCGTTGGTACGTAGACAACACTCTCC
CCTCTCGTTGCTATGCATCACCATGATCTTGCGTGTGCGTAGGAAATTTTTTGAAATTACTACGTTCCCTAACATCAAGGGACCACCTC
CTCTTGCCCCGGAGATTGTCGCGACGCCACTTTGGCGTCAGCGCGGCACAGGGGGAGGCAGCGGGCGGAACTGAATTCACGTCTGATGA
ATCCAGGGAGCCGCTCGACAACTCGTCGAGCCCGTCCTTGGGCGGGCTGCATGATTATATTCGACATTAGGGAAAGTTGTGCAACAAAA
GGAATATCATGAGTTACTCTGGTATCCGAACACTTACGATCTCGCCAGACGCTTGGCCCTATCCGGCCACTCCTCTTCGTCCTCGTCGG
CGTCGGGGGCGTAGTCCGGCGGAGGGGTTTTCCTCTTCTTGGACCCTTCGGCGCCCCCTATTGGGGCGGCCTTCCTCTTGTTTCCTCCC
CCCGCTGGAGGGGGAGAGGTTTCTTCTTCCTCCTCCTCGTCTTCACGGGAGGAGTGCGTCTCGGACTCGTCGGACGACGAGTCCGACAC
CACCATATGCCGGGAACTCTTTTGAGTCCCCGTGGCCTTCTTCTTCTTGGCCTTCTTCTCCGGCACCATATAAGGTGCCGGAACCAGCA
GCTTCACTAGGGGAGCAGGGGCTGGGTCTTCGGGCAAGGGAGCCGGACAGTTAATCAGTCCGGACTTCGCCTGCCAAGCCTGTCAAAGG
TGAGGGAGTTTAGATCCCGCATAGAGTCAAACTATGAGAAAACTTAACATCCTGTAAAAGATGAAAATATCTTACCTCGCCAGCAGGAC
GCTGCGAGCTGAATCCGCGGTCTTCGGAGGCGGATGTGGGAGCCTCGGCGCCTTTGGTTAGCCCCTTCCAGACATCTTTGTACGTTGTG
TCGAAGAGCCTGCTCAGAGTCCGGTGGTGCGCCGGGTTGAACTCCCACAAATTGAAGTCGCGTTCTTGACACGGAAGGATCGGGCGGAC
GAGCATGACCTGGACTACGTTGACAAGCTTGAGCTGCTTGTTCACCTAGGGACTGGATGCATTTTTGCAGTCCGGTCACCTCTTCTTCG
TCGCCCCACGACAAGCCCGTCTCCTTCCAGGACGTGAGCCGTGTAGGGGGTCCGGATCGGAATTCAGGGGCTACGATCCACTTTGGATC
GCGTGGCTCGGTGATGTAAAACCACCCTGACTGCCAGCCCTTCAAGGTCTCCACAAAGGAGCCCTCGAGGCATAGGACGTTGGCTATTT
TGCCCGCCATGGCACCTCCGCACTCCGCCTGGTTGCTGCGCACCACCTTTGGCTTGACGTTGAAGGTCTTGAGCCAGAGGCCGAAATGG
GGGTGGATGCAGAGGAAAGCCTCGCACACGACGATAAACGCCGAGATATTGAGGATGAAGTTCGGGGCCAAATCGTGGAAATCTAGGCC
ATAATAGAACATGAGCCCACGGACAAATGGGTGGAGAGGAATACCCAGTCCGCAGAGGAAGTGGGGAAGAAATACTACCCTCTCATGGG
ACCTTGGGGTGGGGAGAAGCTACCCCTCCTCAGGAAGCCGGTGCGCGATGTCTTCGGACAAGTATCCGGCCTTCCTTAGCTTTTTGACA
TGGCCCTTCGTGACGGAGGAGACCGTCCACTTGCCTCCCGCTCCGGACATTGTTGGAGAAGATTGAGGTAGGAAGTGCGGGCTTGGGCG
CTGGAGCTCGGGTGGGCAAAGGAGGAAGAAGGCGTAGGTAAAAAGGTGGATCCTTATCCCCTTATATGCGCGGATGCGACTACGCGTCC
CCACCAGCCTAGTAAAACTCGCTTGCCTCCCAAGCGTCGTGATAAATTGCACGGTTGGGTTACCCACGTCCGTATTGATGAGAATCCCG
TAAATGGGGGACACGATCTCTGCTTTGACAAGACGTGCCAAGGAAACCGCCTCGCAAAACACGCTGAGGTGGAAAAGTGAAAACGATTC
GAATAAAGGCTTGGCCGTAGTGTGATGTCACGCTGCGGAATACGTCAGCAGATTAGATTTGTGTTAATATTATTCTCTCTGTGGCAATA
CGTGGAAACTTATTTTGCAGAGCCAGACACTACTCTTGGTGTTTACAAACTTTTATGAAGAATTTGGAGGAGGAACCCGCCTTGCAATG
TCGAAGACAATCTGCGCGTCGGACTCGTCGTCATTGAAACCTGGTTCAGGGGCTATTGAGGGAGTCCTGGATTAGGGGGTGCTTGAGTA
GCCGGACTATACCTTCAGTCGGACTCCAGGACTATGAAGATGCAAGATTGAAGACTTCGTCCGTGTCCGGATGGGACTTTCCTTGGCGT
GGAAGGCAAGCCTGGCGATGCGGATATTCAAGATCTCCTACCATTGTAACCGACTTTGTGTAACCCTAACnCCCTCCGGTGGTCTATAT
AAACCGGAGGGTTTTAGTTCGTAGGACAACTTCATCATACAACAATCATACCATAGGCTAGCTACTAGGGTTTAGCCTCCTTGATCTCG
TGGTAGATCTACTCTTGTACTACCCATATCATCAATATTAATCAAGCAGGACGTAGGGTTTTACCTCCATCAAGAGGGCCCAAACCTGG
GTAAAACATTGTGTCCCTTGTCTCCTGTTACCATCTGCCTAGACGCACAGTTCGGGACCCCCTACCCAAGATCCGCCGGTTTTGACACC
GACAGTAGCTGATTTGGAAGCGTCTTCATGATACCGAGCTCCCACACAGGAATGTGTGGGAGGACAGGGCACACGAGTATCCAGGATTT
TGTGTGATGTTTTTTCAGAAATGATTATTTATTCGCTCTCGCGTTCTTGTGGATGTTGTTGGCATATATACACGTGAGATTACAGGTAT
GTAAAATGACAAAAGATCTGATGAAAACCATCCTTACATATGTTGGAGGTGTGATATTGTACAAATTTGCAAAATTAACGGATGCCTCA
CAGAAGGCAGATTGGCTTTATTTTTAACGACAGATTGGCTCTTAATTTAAAGGCATTCAGCACGTAACGATTCGATAGATGTTTAGGGT
TAAAATCAACTTTGAGACGCACGATTTGTAAGAAGGCACGACAGTTTGTAAGTTGGGCTGGCTGGGCTCTAACCTAAGGAAGGAAGAAT
ATAGTACATGTATATATACAGTTGTTTTTGCTTGCAACTATCTCTTGCAATCAAGGATCGATCTAGCTGTTAATCTAGCTAGCTAGTAA
AGATAGACCTAATTAAGAAGTCGATGAGATTAACAGCTAGATCGACTTCTTAATTAGTTGGGCTGGGCCCTAACCTAAACCTAAGGAAG
GAAGGATAGATCGCGGCCCACGGCCCACCCCCACCTTAATTTCTCATTCTCTGATTCTTCTCCCTGGCGATCGCTGGCGGCTCTTCTTG
GGAGGAGGCTCAGTGGCTCCGTGCTCCAGCCAACGCCTGTTGGCCTGCAGTTCTTGCCGAGTATATTTAATACTTAGCAGTATTGAGCA
GCCTAGCTATGTAGTCTAGTATAATTTACAAAACAAATCACAGTATAAATTAAATATTACCAGAGGATAACTTGCTCGTCCCTCCAGCT
AGTTTGCTGTTGCAGTTGCGTCGATACCCTCTTCG
4.1 Use Dotter to align the complete sequence with itself. Present a picture of the
alignment and describe what you see.
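To see why repeats show up as off-diagonal lines in a self-comparison, the sketch below lists every pair of positions that share an identical k-mer. It is only a conceptual illustration with a hypothetical function name and toy sequence: it uses exact word matches on one strand, so it is far less tolerant of mismatches than a real dot-plot program and does not replace Dotter.

```python
from collections import defaultdict


def self_matches(seq, k=12):
    """Return sorted (i, j) pairs, i < j, of 0-based positions sharing an identical k-mer."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    pairs = []
    for hits in positions.values():
        # Every pair of occurrences of the same k-mer is one dot off the main diagonal.
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                pairs.append((hits[a], hits[b]))
    return sorted(pairs)


toy = "ACGTACGGTTCCCCACGTACGGTT"  # hypothetical: a 10 bp direct repeat separated by CCCC
print(self_matches(toy, k=6))  # [(0, 14), (1, 15), (2, 16), (3, 17), (4, 18)]
```

All pairs in the toy output share the offset 14, which is what a line parallel to the main diagonal means: a direct repeat whose two copies start 14 bp apart.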
4.2 Use TREP to annotate the retroelements. Include a picture of the results.
-Highlight each retroelement with a different color (one color per element).
-Where appropriate, mark the Long Terminal Repeats (LTRs) of each
retroelement by making them bold and underlined with the same highlight color
as the element that they belong to.
-Find the inversions delimiting the LTRs and indicate them in the sequence with
bold, red, underlined letters with the same highlight color as the element that
they belong to.
-Highlight in yellow Host Duplications (HD) or Target Site Duplications (TSD)
flanking the complete retroelement.
-Create a text box next to each element (or insert a comment on the first
letter of the element) and annotate its name and type of element.
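Once candidate element boundaries are known, the features requested above can be sanity-checked with a few string comparisons. The sketch below is hypothetical throughout: the function name, the coordinates, and the toy sequence are illustrative; the default 5 bp duplication length is an assumption that varies between element families; and the TG...CA termini are a common but not universal feature of LTRs.

```python
def check_ltr_element(seq, start, end, ltr_len, tsd_len=5):
    """Inspect a candidate LTR retroelement at seq[start:end] (0-based, end-exclusive)."""
    seq = seq.upper()
    left_ltr = seq[start:start + ltr_len]
    right_ltr = seq[end - ltr_len:end]
    identity = sum(a == b for a, b in zip(left_ltr, right_ltr)) / ltr_len
    # Flanking bases immediately outside the element on each side.
    left_tsd = seq[max(0, start - tsd_len):start]
    right_tsd = seq[end:end + tsd_len]
    return {
        "ltr_identity": identity,
        "tg_ca_termini": all(
            ltr.startswith("TG") and ltr.endswith("CA") for ltr in (left_ltr, right_ltr)
        ),
        "tsd_match": len(left_tsd) == tsd_len and left_tsd == right_tsd,
        "tsd": (left_tsd, right_tsd),
    }


# Hypothetical toy element: TSD + LTR + internal region + LTR + TSD.
tsd, ltr = "GATCA", "TGTTAACA"
toy = "CC" + tsd + ltr + "GGGCCCGGG" + ltr + tsd + "TT"
print(check_ltr_element(toy, start=7, end=32, ltr_len=8))
```

In real elements the two LTRs are rarely identical, so `ltr_identity` is reported as a fraction instead of a yes/no answer.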
4.3 Interpret the Dotter results in light of your final annotation from the TREP results.
Explain the order of insertion of any elements present in your sequence.
4.4 Annotate the genes you predict in the non-repetitive regions of this sequence. Show
your work.
- In different colors from any repeat elements, highlight each gene you find in the
sequence using one color per gene.
- Identify in Red font the Start (ATG), Stop (TGA) and splice site (GT or AG)
locations in the sequence.
- Create a key for your highlighting scheme (i.e., GENE1, GENE2, ELEMENT1,
ELEMENT2, etc.)
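A simple open-reading-frame scan can give a first impression of where coding regions might lie. The sketch below has hypothetical function names and a toy sequence, and it ignores introns entirely, so it cannot locate splice sites and is no substitute for gene prediction backed by BLAST evidence. It scans the three forward frames; the reverse strand can be scanned by passing the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement a DNA string; N (ambiguous base) maps to N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG..stop ORFs, 0-based and end-exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG after the previous stop
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "CCATGAAATTTGGGTAACC"  # hypothetical toy sequence
print(find_orfs(toy, min_codons=3))  # [(2, 17, 2)]
print(reverse_complement("ATGC"))    # GCAT
```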
4.5 Briefly list the steps (in order) that you took to annotate this sequence.
4.6 Provide the translated proteins of any genes you predict.
5. 25 Points
Using Pregap4 and Gap4, assemble the 14 sequences obtained from potential mutants of a
particular gene.
In the ‘Configure Modules’ tab of Pregap4:
- Sequencing Vector Clip: Unselect this option.
- Screen for Unclipped Vector: Unselect this option.
- Cloning Vector Clip: Unselect this option.
5.1 Answer the following questions:
Were all the sequences provided used to perform the assembly?
How many contigs were created?
What is the length (bp) of each contig?
5.2 Considering the reference sequence below, identify which reads (individuals) contain
the following mutations (Hint- check the trace files):
         | WT | Position | Mutation | Individual
Number 1 | C  | 508      | T        |
Number 2 | G  | 497      | A        |
Number 3 | C  | 1014     | T        |
(Mutation positions are numbered from the Start codon)
>PHD_genomicDNA
ATGGCCGGTAGGGATAGGGACCCGCTGGTGGTTGGCAGGGTTGTGGGGGACGTGCTGGACCCCTTCGTCCGGACCACCAACCTCAGGGT
GACCTTCGGGAACAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTCGCCCAGCAGCCCAGGGTTGAGGTGGGCGGCAATG
AGATGAGGACCTTCTACACACTCGTACGTACACAGTCACTATCTAATGCCAATTTATCTCTGAAAGTGCTCACCACACGCACATGATCG
ATCGAGCTCGATCTATAGTACGTGAGGGAAATTGATTTTCGATGCTTCTGTTCACATGTTTGCCTCAGCAAGCACATGACTAATGCTCC
ATCTTGCATATGTCTCTGTGCCCTCTGGTGTTGATCATGATTTTTCTATGCTTCTTCTATGTTCGGGGAGCATTTATTTTTTATGCTTC
TCTTGACATGTTTCATGTTTGTCCTAGCAAGCACACGAGTAATTAAAGCTCGATCTTAAATACTCTCTCCGTCCGAATAAATGTACTTC
TAGCTTTTGTCTTAAGTCAAAGTTTTAAAATTTTGACCAACTTTATAGGAAAAAGTAGCAGCATTTATGACACTAAATTAGTATCACTA
GATTCGTTTTGAAATGTATTTTCATAATATATCAATTTGATATTATATATGTTACTACTTATTTGTATATAGTTGGTCAAAGTTTTAAA
ACTTTGACTTAGGATAAAAACTAGAAGTACACTTATTCGTGGACGGAGGGAGTATATGCTTATGTAGGTAGTACTCTCTACTTTGATCA
TGATGTGCACGCGTTTACTGCCCGCAGGTGATGGTAGACCCAGATGCTCCAAGTCCAAGCGATCCCAACCTTAGGGAGTATCTCCACTG
GTAAGTACTAAATTTGTAACTCAGTTGAATAATTTCTCTGTCCCTAGATATACACACTAGCTCATGTGTGCGTGTGTGTGTCTACATGT
GTGTGCAGGCTTGTGACAGATATCCCCGGTACAACTGGTGCGTCGTTCGGGCAGGAGGTGATGTGCTACGAGAGCCCTCGTCCGACCAT
GGGGATCCACCGCTTCGTGCTCGTACTCTTCCAGCAGCTCGGGCGGCAGACGGTGTACGCCCCCGGGTGGCGCCAGAACTTCAACACCA
GGGACTTCGCCGAGCTCTACAACCTCGGCCCGCCTGTCGCCGCCGTCTACTTCAACTGCCAGCGTGAGGCCGGCTCCGGCGGCAGGAGG
ATGTACAATTGA
5.3 Which of the three mutations would you expect to see in the protein? _______
- Generate the mutated cDNA and protein sequences for this mutant and use them with
BLASTp against the Protein Data Bank database. Use the structure link at the right of
any significant hit and navigate to the Cn3D link to view the database protein
structure aligned with your mutant protein.
- Take a screenshot showing the location of the mutated amino acid in the worm
style rendering with secondary structure color rendering.
Mutant cDNA
Mutant Protein
Cn3D Image
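Because the reference is genomic DNA, a mutation position has to be related to the exon structure before its effect on the protein can be judged. The sketch below converts a genomic position into a coding-sequence position. The function name and the exon coordinates are hypothetical placeholders and are not derived from the PHD sequence above.

```python
def genomic_to_cds(position, exons):
    """Map a 1-based genomic position to a 1-based coding position.

    `exons` is a list of (start, end) pairs, 1-based and inclusive, in 5'->3' order.
    Returns None when the position falls outside every exon (for example, in an intron).
    """
    offset = 0  # number of coding bases in the exons already passed
    for start, end in exons:
        if start <= position <= end:
            return offset + position - start + 1
        offset += end - start + 1
    return None


toy_exons = [(1, 10), (21, 30)]  # hypothetical exon coordinates
print(genomic_to_cds(5, toy_exons))   # 5
print(genomic_to_cds(15, toy_exons))  # None (intronic)
print(genomic_to_cds(22, toy_exons))  # 12
```

A position that maps to `None` lies outside the coding sequence and would not normally change the protein, unless it disturbs splicing. A position that maps to a coding coordinate still has to be checked codon by codon, because the change may be silent.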