BIO 102 Renn_Lab#1 (in-lab handout: answer-key)
Name ___________________

Answers to Scoring in Scrabble (English Word Play)

(1a) All words that contain ‘ghi’
regex = ‘ghi’
(1) coughing (2) laughing (3) laughingly (4) laughingstock (5) outweighing (6) sighing (7) weighing (8) weighings

(1b) All words with ‘yz’ that are not immediately followed by an ‘e’ or an ‘i’
regex = ‘yz[^ei]’
(1) analyzable (2) Byzantine (3) Byzantinize (4) Byzantinizes (5) Byzantium (6) unanalyzable

(1c) All words that contain “fin” or “phin”
regex = ‘fin|phin’ or, if you want to be fancy, ‘(f|ph)in’
(1) affinities (2) affinity (3) autographing (4) Baffin (5) beefing (6) bluffing (7) briefing (8) briefings (9) burglarproofing … (155) sphinx … 165 total

Answers to Scrabble in DNA Land

(2a) All 7-mers that contain GTGAC
regex = ‘GTGAC’
TGTGACG GTGACGT GTGACGG AGTGACC GTGACCA GTGACAG … 48 total

(2b) All 7-mers that contain the dimer GC immediately followed by a letter that is not a G or a C
regex = ‘GC[^GC]’
GCTCAAT TTGCTCA TGGCACA AAGGGCT TGAAGCT … 2512 total

Answers to Direct Repeats (English Word Play)

(3a) Are there any words in which a sequence of two letters is repeated at least three times in the word?
regex = ‘(.)(.).*\1\2.*\1\2’
(1) anticompetitive (2) antidisestablishmentarianism (3) confrontation (4) confrontations (5) contentment (6) enlightenment (7) inclining (8) indoctrinating (9) infringing (10) insinuating … 23 total

Answers to Direct Repeats in DNA Land

(4a) Are there any motifs in which a sequence of two nucleotides is repeated at least three times in the motif?
regex = ‘(.)(.).*\1\2.*\1\2’
AAAAAAA AAAAAAC AAAAAAG AAAAAAT AAAACAA AAAAGAA AAAATAA AACAAAA AACACAC AAGAAAA AAGAGAG AATAAAA AATATAT ACAACAC ACACAAC … 232 total

Answers for Mirror Repeats, also called palindromes in English (English Word Play)

(5a) Are there any words in which three consecutive letters anywhere in a word are followed by the reverse of those letters anywhere in the word? (Hint: you do not need ^ (start of word) and $ (end of word) for this.)
regex = ‘(.)(.)(.).*\3\2\1’
(1) addresser (2) addressers (3) amalgamate (4) amalgamated (5) amalgamates (6) amalgamating (7) amalgamation (8) analyticity (9) assertiveness (10) assesses … 117 total

(5b) Are there any words in which a pair of letters is followed by a direct repeat of that pair of letters, followed by two occurrences of the reverse of the pair? (Each of the pairs can be zero or more letters apart.) e.g., “AB...AB...BA...BA”
regex = ‘(.)(.).*\1\2.*\2\1.*\2\1’
senselessness

(5c) Are there any words in which this pattern occurs twice: a pair of letters is followed by the reverse of that pair? (Each of the pairs can be zero or more letters apart.)
regex = ‘(.)(.).*\2\1.*\1\2.*\2\1’
(1) kinnikinnick (Look for this native plant at Eagle Fern Park during David Dalton’s field trip) (2) possessionlessness (3) Wallawalla

Answers for Mirror Repeats in DNA Land

(6a) Are there any motifs in which three consecutive nucleotides anywhere in a motif are followed by the reverse of those nucleotides anywhere in the motif? (Hint: you do not need ^ (start of motif) and $ (end of motif) for this.)
regex = ‘(.)(.)(.).*\3\2\1’
CTTTTTT AAAAAAA ATGTGTA GTGAAGT CAAAAAA ATAATAA AAAAAAA TTTTTTT ATTTTTT AAGTGAA … 702 total

(6b) Are there any motifs in which a pair of adjacent nucleotides is followed by a direct repeat of that pair of nucleotides, followed by two occurrences of the reverse of the pair?
(Each of the pairs can be zero or more nucleotides apart.) e.g., “GT...GT...TG...TG”
regex = ‘(.)(.).*\1\2.*\2\1.*\2\1’
There are none, because this regex matches a motif with a minimum size of eight (8), but all our motifs are 7-mers! If we had had 8-mers on the list, a match could have been: ACACCACA.

(6c) Are there any motifs in which this pattern occurs twice: a pair of adjacent nucleotides is followed by the reverse of that pair? (Each of the pairs can be zero or more nucleotides apart.)
regex = ‘(.)(.).*\2\1.*\1\2.*\2\1’
There are none, because this regex also matches a motif with a minimum size of eight (8), but all our motifs are 7-mers!

Answers for Pattern Matching in Large Texts

(9a) How many times do you think Darwin used the word “evolution” in his text?
I thought a lot, but I’m wrong... keep working.

(9b) Write a regex to find the correct answer.
regex = ‘evolution’
12 times {(15), but 2, 12, and 13 are missing because the word occurs 2 times in those paragraphs.} (See 9e: it really should be ‘\bevolution’, 10 times.)

(9c) Write a regex that will find any term related to evolution.
regex = ‘evol.*’

(9d) How many are there?
26 times (includes 1 in the references) {(35), but several are missing because the word occurs 2 times in those paragraphs}

(9e) Look carefully. Did you get words you weren’t expecting (“revolved”, “revolution”)?
YES

(9f) Write a regex that will avoid these words. (Hint: \s means “white space”)
regex = ‘\bevol.*’ or ‘\sevol.*’

Pattern Matching for Protein Sequences

(10a) Write a regex to find how many proteins in E. coli include the amino acid sequence GENE.
regex = ‘GENE’
9

(10b) Write a regex to find out how many proteins in E. coli include a string of at least 10 amino acids made up of the letters DARWIN in any order.
regex = ‘[DARWIN]{10,}’
9

(10c) Write a regex to find proteins in E. coli that include … (make up your own)

(10d) Is your name part of the E. coli proteome?
(If you don’t know what the word …)
None of the 20 standard amino acids has the one-letter code Z or U, so SUZY doesn’t exist, but SVSY is there 10 times.

(10e) Write a regex for an amino acid pattern that is likely to form a long (10 amino acid) alpha helix secondary structure.
regex = ‘P[MALEK]{10,}’
4

Answers for Pattern Matching for Protein Sequences in DNA

(11a) Write a regular expression that will search the genomic sequence for the possible amino acid sequence GENE.
regex = GG[AGCT]GA[AG]AA[CT]GA[AG]
20

(11b) Do you find this sequence the same number of times in the DNA as you do when you search the Protein file for GENE? If not, what is the reason?
No, it was found one extra time. Searching DNA does not require the sequence to be in the correct “reading frame”. (Note: because this is a simple Perl program, we are really only searching line by line. Any pattern that spans a line break, and is therefore split across two lines, will not be found by this script. There are easy ways to fix this.)

(11c) Write a regular expression to find a Lysine-rich region.
regex = (AA[AG]){3,}
A Lysine-rich region could also be a Lysine at every other amino acid (KNKNKN): (AA[AG].{3}){4,}, where the .{3} specifies the intervening codon. To allow some but not all amino acids to occupy the intervening positions of this “lysine-rich region”, we would have to write the regex for each allowed codon and separate them with the OR symbol |, called “pipe”.

(11d) Why might a researcher be interested in looking for secondary structure in a given DNA sequence rather than looking directly at an amino acid sequence?
This is one way to find VERY ancient homology of genes. The DNA code may be very different, and even a predicted protein code may be highly diverged, but the secondary structure of two gene products may be similar.

(11e) Why might a researcher want to write their own program rather than using a web-based tool?
See the bottom of the Protein matching page.
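You can try the English word-play patterns from (1a) to (5c) outside the lab's Perl script with a minimal Python sketch using the standard `re` module. The short word list is a hypothetical stand-in for the lab's dictionary file. The results therefore show how each pattern behaves; they do not reproduce the totals above. `re.search` looks for a match anywhere in the word, which is why none of the patterns need ^ or $.

```python
import re

# Patterns copied from the answer key above.
PATTERNS = {
    "1a": r"ghi",
    "1b": r"yz[^ei]",
    "1c": r"(f|ph)in",
    "3a": r"(.)(.).*\1\2.*\1\2",
    "5a": r"(.)(.)(.).*\3\2\1",
    "5b": r"(.)(.).*\1\2.*\2\1.*\2\1",
    "5c": r"(.)(.).*\2\1.*\1\2.*\2\1",
}

def matching_words(pattern, words):
    """Return the words that contain at least one match for pattern."""
    regex = re.compile(pattern)
    # re.search scans the whole word, so no ^ or $ anchors are needed.
    return [w for w in words if regex.search(w)]

# Hypothetical mini word list standing in for the lab's dictionary file.
words = ["coughing", "Byzantine", "analyze", "sphinx", "inclining",
         "addresser", "senselessness", "kinnikinnick", "banana"]

for name, pattern in PATTERNS.items():
    print(name, matching_words(pattern, words))
```

The backreferences \1, \2 and \3 must repeat exactly the letters captured by the earlier (.) groups. Because of this, "analyze" fails (1b): its ‘yz’ is followed by an ‘e’. "banana" fails (3a): its ‘an’ occurs only twice.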
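The claims in (6b) and (6c) can be checked by brute force: a pattern built from four pairs needs at least eight letters. This sketch assumes the motif list is every possible 7-mer (4^7 = 16,384 strings). The handout does not say how the lab's list was built, so treat that as an assumption. Under it, the (2a) pattern matches 48 of the 7-mers, the same total listed in the key.

```python
import re
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Yield every DNA string of length k (4**k of them)."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)

def count_matches(pattern, k):
    """Count how many k-mers contain at least one match for pattern."""
    regex = re.compile(pattern)
    return sum(1 for kmer in all_kmers(k) if regex.search(kmer))

print(count_matches(r"GTGAC", 7))                     # (2a): 48
print(count_matches(r"(.)(.).*\1\2.*\2\1.*\2\1", 7))   # (6b): 0
print(count_matches(r"(.)(.).*\2\1.*\1\2.*\2\1", 7))   # (6c): 0
# The 8-mer from (6b) does match once the motif is long enough.
print(bool(re.search(r"(.)(.).*\1\2.*\2\1.*\2\1", "ACACCACA")))  # True
```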
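The undercounts noted in (9b) and (9d) appear to come from counting matching lines (paragraphs) rather than individual words. ‘evol.*’ makes this worse, because the greedy .* runs on to the end of the line. A small Python sketch on a made-up sentence (not Darwin's text) shows the difference. Here \w* is swapped in for .* so that each match stops at the end of its word.

```python
import re

# A made-up line of text standing in for one paragraph of the book.
line = "Planets revolved while evolution proceeded; evolved forms followed."

# 'evol.*' starts at the first "evol" (inside "revolved") and the greedy .*
# swallows the rest of the line, so the whole line yields a single match.
greedy = re.findall(r"evol.*", line)

# '\bevol\w*' needs a word boundary before "evol" (so "revolved" is skipped)
# and stops at the end of the word, so each related word is counted.
words = re.findall(r"\bevol\w*", line)

print(len(greedy), greedy)
print(len(words), words)
```

Like the original patterns, these are case-sensitive, so a capitalized "Evolution" would not be counted.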
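(10a), (10b) and (10e) ask how many proteins contain a pattern. Each protein should therefore be counted once, no matter how many times the pattern occurs in it. Below is a minimal Python sketch. It assumes the proteome is stored in FASTA format: header lines begin with ‘>’ and are followed by sequence lines. The function names and the four-protein example are hypothetical. The parser joins each record's wrapped lines before searching, which also avoids the line-break problem noted in (11b).

```python
import re

def read_fasta(lines):
    """Parse FASTA lines into a {header: sequence} dict, joining wrapped lines."""
    records, header, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

def proteins_matching(pattern, records):
    """Return headers of proteins whose sequence matches pattern at least once."""
    regex = re.compile(pattern)
    return [h for h, seq in records.items() if regex.search(seq)]

# Hypothetical mini-proteome; protD has GENE split across a line break.
fasta = """>protA
MKGENEQLLAGENE
>protB
MSDARWINNIWRADAR
>protC
MPMALEKALEKAMLK
>protD
MKLLGE
NEKK
""".splitlines()

records = read_fasta(fasta)
print(proteins_matching(r"GENE", records))           # (10a)
print(proteins_matching(r"[DARWIN]{10,}", records))  # (10b)
print(proteins_matching(r"P[MALEK]{10,}", records))  # (10e)
```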
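For (11a) and (11b), here is a sketch of a genomic search that deals with both issues raised in the key:

- It joins wrapped lines before searching.
- It reports the reading frame (start position mod 3) of each hit, so hits outside the frame of a real gene can be told apart.

The lookahead (?=(...)) lets overlapping matches be counted. The two-line sequence is made up. The sketch searches only the strand it is given, not the reverse complement.

```python
import re

# Gly = GGN, Glu = GAA/GAG, Asn = AAT/AAC; the lookahead allows overlapping hits.
GENE_DNA = re.compile(r"(?=(GG[ACGT]GA[AG]AA[CT]GA[AG]))")

def find_gene_sites(seq):
    """Return (start, frame) for every match; frame is start % 3."""
    return [(m.start(), m.start() % 3) for m in GENE_DNA.finditer(seq)]

# Hypothetical genomic sequence wrapped over two lines, as in a FASTA file.
lines = ["ATGGGTGA", "AAACGAGTAA"]

per_line = [site for line in lines for site in find_gene_sites(line)]
joined = find_gene_sites("".join(lines))

print(per_line)  # [] : the match spans the line break
print(joined)    # [(3, 0)] : found once the lines are joined
```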
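Grouping matters in (11c). Without parentheses, {3,} repeats only the [AG] class, so ‘AA[AG]{3,}’ also accepts AAGGGG, which encodes Lys-Gly. This Python sketch compares the ungrouped pattern with the grouped versions. In the last pattern, allowing only Asn or Glu as the intervening residues is a hypothetical example of the | alternative.

```python
import re

ungrouped = re.compile(r"AA[AG]{3,}")        # "AA" then three or more A/G letters
run_of_lys = re.compile(r"(AA[AG]){3,}")     # three or more whole lysine codons
alternating = re.compile(r"(AA[AG].{3}){4,}")  # Lys-x repeated, any codon for x
# Hypothetical restriction: x may only be Asn (AA[CT]) or Glu (GA[AG]).
restricted = re.compile(r"(AA[AG](AA[CT]|GA[AG])){4,}")

# AAG GGG is Lys-Gly: one lysine, yet the ungrouped pattern accepts it.
print(bool(ungrouped.fullmatch("AAGGGG")), bool(run_of_lys.fullmatch("AAGGGG")))
print(bool(run_of_lys.fullmatch("AAAAAGAAA")))    # Lys-Lys-Lys
print(bool(alternating.fullmatch("AAAAAC" * 4)))  # (Lys-Asn) x 4
print(bool(restricted.fullmatch("AAAAAC" * 4)))   # Asn is allowed
print(bool(restricted.fullmatch("AAATGG" * 4)))   # Trp (TGG) is not allowed
```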