BIO 102 Renn_Lab#1 (in-lab handout: answer-key)
Name ___________________
Answers to Scoring in Scrabble (English Word Play)
(1a) All words that contain ‘ghi’
regex = ‘ghi’
(1) coughing
(2) laughing
(3) laughingly
(4) laughingstock
(5) outweighing
(6) sighing
(7) weighing
(8) weighings
(1b) All words with ‘yz’ that are not immediately followed by an ‘e’ or an ‘i’
regex = ‘yz[^ei]’
(1) analyzable
(2) Byzantine
(3) Byzantinize
(4) Byzantinizes
(5) Byzantium
(6) unanalyzable
(1c) All words that contain “fin” or “phin”
regex = ‘fin|phin’ or if you want to be fancy ‘(f|ph)in’
(1) affinities
(2) affinity
(3) autographing
(4) Baffin
(5) beefing
(6) bluffing
(7) briefing
(8) briefings
(9) burglarproofing
(155) sphinx … 165 total
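A minimal Python sketch of how searches like 1a to 1c could be run. The lab's own script and word list are not shown here, so the `words` sample and the helper name `matching_words` are stand-ins for illustration only.

```python
import re

def matching_words(pattern, words):
    """Return the words that contain a match for the regex anywhere."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.search(w)]

# Hypothetical sample; the lab's full word list is not reproduced here.
words = ["sighing", "analyze", "analyzable", "Byzantium",
         "sphinx", "briefing", "thing"]

print(matching_words(r"ghi", words))       # 1a
print(matching_words(r"yz[^ei]", words))   # 1b
print(matching_words(r"(f|ph)in", words))  # 1c, same hits as 'fin|phin'
```

One thing to keep in mind: ‘yz[^ei]’ requires some character after the ‘yz’, so a word that ends in ‘yz’ would not match. A negative lookahead, ‘yz(?![ei])’, would also accept that case.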
Answers to Scrabble in DNA Land
(2a) All 7-mers that contain GTGAC
regex = ‘GTGAC’
TGTGACG
GTGACGT
GTGACGG
AGTGACC
GTGACCA
GTGACAG … 48 total
(2b) All 7-mers that contain the dimer GC immediately followed by a letter that is not a G
or a C.
regex = ‘GC[^GC]’
GCTCAAT
TTGCTCA
TGGCACA
AAGGGCT
TGAAGCT … 2512 total
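The totals in 2a and 2b can be checked with a short Python sketch. It assumes the motif list is every possible 7-mer over A, C, G and T (the lab's motif file is not shown here); under that assumption the counts agree with the 48 and 2512 reported above. The name `count_matches` is hypothetical.

```python
import re
from itertools import product

# Assumption: the motif list is every possible 7-mer over A, C, G, T.
all_7mers = ["".join(p) for p in product("ACGT", repeat=7)]

def count_matches(pattern, motifs):
    """Count the motifs that contain a match for the regex."""
    compiled = re.compile(pattern)
    return sum(1 for m in motifs if compiled.search(m))

print(len(all_7mers))                        # 4**7 = 16384 motifs
print(count_matches(r"GTGAC", all_7mers))    # 2a
print(count_matches(r"GC[^GC]", all_7mers))  # 2b
```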
Answers to Direct Repeats (English Word Play)
(3a) Are there any words in which a sequence of two letters is repeated at least three
times in the word?
regex = ‘(.)(.).*\1\2.*\1\2’
(1) anticompetitive
(2) antidisestablishmentarianism
(3) confrontation
(4) confrontations
(5) contentment
(6) enlightenment
(7) inclining
(8) indoctrinating
(9) infringing
(10) insinuating …23 total
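The direct-repeat regex relies on backreferences: `(.)(.)` captures any two letters, and each `\1\2` demands the same two letters again. A small Python sketch, with the helper name `repeated_pair` made up for illustration:

```python
import re

# Two captured letters, then the same pair twice more, any distance apart.
direct_repeat = re.compile(r"(.)(.).*\1\2.*\1\2")

def repeated_pair(word):
    """Return the two-letter sequence found three times, or None."""
    m = direct_repeat.search(word)
    return m.group(1) + m.group(2) if m else None

print(repeated_pair("contentment"))
print(repeated_pair("inclining"))
print(repeated_pair("banana"))  # 'an' and 'na' each occur only twice
```

Because each copy of the pair is consumed in turn, the three occurrences cannot overlap one another.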
Answers to Direct Repeats in DNA Land
(4a) Are there any motifs in which a sequence of two nucleotides is repeated at least three
times in the motif?
regex = ‘(.)(.).*\1\2.*\1\2’
AAAAAAA
AAAAAAC
AAAAAAG
AAAAAAT
AAAACAA
AAAAGAA
AAAATAA
AACAAAA
AACACAC
AAGAAAA
AAGAGAG
AATAAAA
AATATAT
ACAACAC
ACACAAC … 232 total
Answers for Mirror Repeats, also called palindromes in English (English Word
Play)
(5a) Are there any words in which three consecutive letters anywhere in a word are
followed by the reverse of those letters anywhere in the word?
(Hint: you do not need ^ (start of word) and $ (end of word) for this).
regex = ‘(.)(.)(.).*\3\2\1’
(1) addresser
(2) addressers
(3) amalgamate
(4) amalgamated
(5) amalgamates
(6) amalgamating
(7) amalgamation
(8) analyticity
(9) assertiveness
(10) assesses… 117 total
(5b) Are there any words in which a pair of letters is followed by a direct repeat of that
pair of letters, followed by two occurrences of the reverse of the pair?
(Each of the pairs can be zero or more letters apart.) E.g., “AB...AB...BA...BA”
regex = ‘(.)(.).*\1\2.*\2\1.*\2\1’
senselessness
(5c) Are there any words in which this pattern occurs twice: a pair of letters is followed
by the reverse of that pair? (Each of the pairs can be zero or more letters apart.)
regex = ‘(.)(.).*\2\1.*\1\2.*\2\1’
(1) kinnikinnick
(Look for this native plant at Eagle Fern Park
during David Dalton’s field trip)
(2) possessionlessness
(3) Wallawalla
Answers for Mirror Repeats in DNA Land
(6a) Are there any motifs in which three consecutive nucleotides anywhere in a motif are
followed by the reverse of those nucleotides anywhere in the motif?
(Hint: you do not need ^ (start of motif) and $ (end of motif) for this).
regex = ‘(.)(.)(.).*\3\2\1’
CTTTTTT
AAAAAAA
ATGTGTA
GTGAAGT
CAAAAAA
ATAATAA
AAAAAAA
TTTTTTT
ATTTTTT
AAGTGAA … 702 total
(6b) Are there any motifs in which a pair of adjacent nucleotides is followed by a direct
repeat of that pair, followed by two occurrences of the reverse of the pair?
(Each of the pairs can be zero or more nucleotides apart.) E.g.,
“GT...GT...TG...TG”
regex = ‘(.)(.).*\1\2.*\2\1.*\2\1’
There are none because this regex matches a motif of minimum size eight (8), but all
our motifs are 7-mers! If we had had 8-mers on the list, a match could have been:
ACACCACA.
(6c) Are there any motifs in which this pattern occurs twice: a pair of adjacent
nucleotides is followed by the reverse of that pair? (Each of the pairs can
be zero or more nucleotides apart.)
regex = ‘(.)(.).*\2\1.*\1\2.*\2\1’
There are none because this regex matches a motif of minimum size eight (8), but all
our motifs are 7-mers!
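The minimum-length argument for 6b and 6c can be checked by brute force. This Python sketch assumes the motifs are drawn from all possible 7-mers; the 8-mer ACCAACCA used for 6c is my own example, not one from the handout.

```python
import re
from itertools import product

pattern_6b = re.compile(r"(.)(.).*\1\2.*\2\1.*\2\1")  # XY..XY..YX..YX
pattern_6c = re.compile(r"(.)(.).*\2\1.*\1\2.*\2\1")  # XY..YX..XY..YX

# Each pattern consumes four non-overlapping pairs, so it needs 8 letters.
all_7mers = ["".join(p) for p in product("ACGT", repeat=7)]
hits_6b = [m for m in all_7mers if pattern_6b.search(m)]
hits_6c = [m for m in all_7mers if pattern_6c.search(m)]

print(len(hits_6b), len(hits_6c))            # no 7-mer can match
print(bool(pattern_6b.search("ACACCACA")))   # the 8-mer given in 6b
print(bool(pattern_6c.search("ACCAACCA")))   # a made-up 8-mer for 6c
```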
Answers for Pattern Matching in Large Texts.
(9a) How many times do you think Darwin used the word “evolution” in his text?
I thought a lot, but I was wrong. Keep working.
(9b) Write a regex to find the correct answer.
regex = ‘evolution’
12 times {(15) but 2, 12, 13 are
missing because the word occurs 2 times in that paragraph.}
(See 9e: it really should be ‘\bevolution’, 10 times.)
(9c) Write a regex that will find any term related to evolution.
regex = ‘evol.*’
(9d) How many are there?
26 times (includes 1 in the references) {(35) but several are missing because
the word occurs 2 times in that paragraph}
(9e) Look carefully. Did you get words you weren’t expecting (“revolved”,
“revolution”)? YES
(9f) Write a regex that will avoid these words? (Hint \s means “white space”)
regex = ‘\bevol.*’ or ‘\sevol.*’
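A short Python sketch of the two effects seen in 9b to 9f: a bare ‘evolution’ also hits “revolution”, and counting matching lines undercounts when a word occurs twice on one line. The sample lines are made up for illustration; they are not Darwin's text, and the helper names are hypothetical.

```python
import re

# Hypothetical sample lines; Darwin's text is not reproduced here.
lines = [
    "The theory of evolution and the evolution of species.",
    "A revolution in thought; the planets revolved.",
    "Organs evolved slowly.",
]

def lines_matching(pattern):
    """Count lines with at least one match (a line-by-line search)."""
    return sum(1 for line in lines if re.search(pattern, line))

def occurrences(pattern):
    """Count every match, including repeats within one line."""
    return sum(len(re.findall(pattern, line)) for line in lines)

print(lines_matching(r"evolution"), occurrences(r"evolution"))
print(lines_matching(r"\bevolution"), occurrences(r"\bevolution"))
print([w for line in lines for w in re.findall(r"\bevol\w*", line)])
```

Two design notes. ‘\sevol’ needs a whitespace character before the word, so it misses a word at the very start of a line, while ‘\bevol’ does not. Also, ‘evol.*’ is greedy and runs to the end of the line, so ‘\bevol\w*’ is used here to pull out just the word.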
Pattern Matching for Protein Sequences
(10a) Write a regex to find how many proteins in E. coli include the protein code GENE.
regex = ‘GENE’
9
(10b) Write a regex to find out how many proteins in E. coli include a string of at least 10
amino acids made up of the letters in DARWIN, in any order.
regex = ‘[DARWIN]{10,}’
9
(10c) Write a regex to find proteins in E. coli that include …. (make up your own)
(10d) Is your name part of the E. coli proteome? (If you don’t know what the word
There is no amino acid with the code Z or U so SUZY doesn’t exist but SVSY is
there 10 times.
(10e) Write a regex for an amino acid pattern that is likely to form a long (10 amino acid)
alpha helix secondary structure.
regex = ‘P[MALEK]{10,}’
4
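A Python sketch of the protein searches in 10a, 10b and 10e. The three sequences are toy strings invented for this example, not entries from the E. coli proteome file, and `proteins_with` is a hypothetical helper.

```python
import re

# Hypothetical toy sequences; not taken from the E. coli proteome file.
proteins = {
    "toyA": "MKTGENELLDARWINDARWINAAKP",
    "toyB": "MSVSYPMALEKKLAEMLAKG",
    "toyC": "MKKLLPTGSS",
}

def proteins_with(pattern):
    """Return the names of proteins that contain a match for the regex."""
    compiled = re.compile(pattern)
    return sorted(name for name, seq in proteins.items() if compiled.search(seq))

print(proteins_with(r"GENE"))            # 10a: literal string
print(proteins_with(r"[DARWIN]{10,}"))   # 10b: 10+ letters from the set
print(proteins_with(r"P[MALEK]{10,}"))   # 10e: P, then 10+ of M, A, L, E, K
```

A character class such as ‘[DARWIN]’ is a set, so the order of the letters inside the brackets does not matter and each matched position may be any one of them.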
Answers for Pattern Matching for Protein Sequences in DNA
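(11a) Write a regular expression that will search the genomic sequence for the possible
amino acid sequence GENE.
regex = GG[AGCT]GA[AT]AA[CT]GA[AT]
20
(Note: glutamate (E) is encoded by GAA and GAG, so a strict pattern would be
GG[AGCT]GA[AG]AA[CT]GA[AG]. GA[AT] also accepts GAT, which encodes aspartate (D), and
misses GAG. The count may differ with the strict pattern.)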
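One way to build such a pattern is to join a degenerate codon pattern for each amino acid. The Python sketch below uses a partial, hand-written table covering only a few residues from this handout, and the function name `peptide_to_dna_regex` is hypothetical. Glutamate (E) is encoded by GAA and GAG, so it is written GA[AG] here.

```python
import re

# Partial table of degenerate codon patterns (standard genetic code).
# Only the residues needed for this example are included.
CODON_PATTERNS = {
    "G": "GG[ACGT]",  # glycine: GGA, GGC, GGG, GGT
    "E": "GA[AG]",    # glutamate: GAA, GAG
    "N": "AA[CT]",    # asparagine: AAC, AAT
    "K": "AA[AG]",    # lysine: AAA, AAG
    "D": "GA[CT]",    # aspartate: GAC, GAT
}

def peptide_to_dna_regex(peptide):
    """Concatenate the codon pattern for each amino acid letter."""
    return "".join(CODON_PATTERNS[aa] for aa in peptide)

gene_regex = peptide_to_dna_regex("GENE")
print(gene_regex)
print(bool(re.search(gene_regex, "TTGGTGAAAATGAGCC")))  # GGT GAA AAT GAG
print(bool(re.search(gene_regex, "GGTGATAATGAA")))      # GAT is Asp, not Glu
```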
(11b) Do you find this sequence the same number of times in the DNA as you do when
you search the Protein file for GENE? No, found it one extra time.
If not, what is the reason?
Searching DNA does not require the sequence to be in the correct “reading frame”.
(Note: because this is a simple Perl program, we are really only searching line by line; any
pattern that spans a “return” and is therefore broken across two lines will not be found by
this script. There are easy ways to fix this.)
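A sketch of one such fix, written in Python rather than Perl: join the lines into a single string before searching, then report each hit's start position modulo 3 as a rough reading-frame check. The wrapped sequence is invented for this example, the frame is measured from the first base of the string (an assumption), and glutamate is written as GA[AG].

```python
import re

gene_pattern = re.compile(r"GG[ACGT]GA[AG]AA[CT]GA[AG]")

# Hypothetical sequence wrapped over two lines, as in a typical sequence file.
wrapped_lines = ["ATGGGTGAAA", "ATGAGTAA"]

# Line-by-line search: a match broken by the line ending is missed.
per_line_hits = sum(len(gene_pattern.findall(line)) for line in wrapped_lines)

# Fix: join the lines into one string before searching.
sequence = "".join(line.strip() for line in wrapped_lines)

# For each hit, record (start index, start index mod 3).
# A frame of 0 means the hit is in frame with the first base.
joined_hits = [(m.start(), m.start() % 3) for m in gene_pattern.finditer(sequence)]

print(per_line_hits)
print(joined_hits)
```

Hits with a nonzero frame value are the kind of extra DNA matches that would not correspond to GENE in the translated protein.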
(11c) Write a regular expression to find a Lysine rich region.
regex = (AA[AG]){3,}
Lysine rich region could also be a Lysine every other amino acid (KNKNKN)
(AA[AG].{3}){4,}
the .{3} specifies the intervening codon.
To allow some but not all amino acids to occupy this “lysine rich region” we would have
to write the regex for each one allowed and separate them with the OR symbol | called
“pipe”.
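The three lysine-rich variants described above can be sketched in Python as follows. The restricted version allows only asparagine (AA[CT]) or glutamate (GA[AG]) as the intervening codon; that choice, the variable names and the test sequences are all made up for illustration.

```python
import re

# Three or more consecutive lysine codons (AAA or AAG).
lys_run = re.compile(r"(?:AA[AG]){3,}")

# Lysine at every other codon; .{3} is any intervening codon.
lys_alternating = re.compile(r"(?:AA[AG].{3}){4,}")

# Same, but the intervening codon must be Asn or Glu, joined with the pipe.
lys_restricted = re.compile(r"(?:AA[AG](?:AA[CT]|GA[AG])){4,}")

kkk = "AAAAAGAAA"                    # K K K
knkn = "AAAAATAAGAACAAAAATAAGAAC"    # K N K N K N K N
kpkp = "AAACCTAAGCCAAAACCTAAGCCA"    # K P K P K P K P

print(bool(lys_run.search(kkk)))
print(bool(lys_alternating.search(knkn)), bool(lys_restricted.search(knkn)))
print(bool(lys_alternating.search(kpkp)), bool(lys_restricted.search(kpkp)))
```

None of these patterns checks the reading frame, so a hit may start at any base.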
(11d) Why might a researcher be interested in looking for secondary structure in a given
DNA sequence rather than looking directly at an amino acid sequence?
This is one way to find VERY ancient homology of genes. The DNA code may be
very different, and even a predicted protein code may be highly diverged but the
secondary structure of two gene products may be similar.
(11e) Why might a researcher want to write their own program rather than using a web-based tool?
See the bottom of the Protein matching page