Welcome to Advanced Molecular Genetics, Bioinformatics, and Computational Genomics
Pattern Recognition and Gene Finding
15. Apr 27 20

Lives of the Scientist
[Figure: genomic DNA sequence listing, coordinates 3337901 to 3338851; labels: Blast, Globin, Expect = 4e-98]

Program the computer
Biology researchers do not program
10 Biology and Microbiology Depts at major universities
Why hasn't it happened?
Programming languages
An alternative

Lives of the Scientist (Part II)
Repeated sequences in bacterial genomes
[Figure: Genome of E. coli K12 str MG1655; labels: genes, REP sequences]

Algorithm to extract REP sequences
Pattern: "repeat_region ...(#...)**(#...)*..'(*..)'"
Special symbols:
...   As many of previous character as possible
#     A single digit
()    Capture what's inside
*     Any character
..    As few of previous character as necessary
'     ' or ''

We start
Using Firefox, go to: www.people.vcu.edu/~elhaij
Click: MICR 653
biobike.csbc.vcu.edu
[Screenshot: BioBIKE interface; labels: Function palette, Workspace, Results window]

General Syntax of BioBIKE
[Diagram: function box; labels: Function-name, Argument (object), Keyword object, Flag]
The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, it produces a product.

Function boxes contain the following elements:
• Function-name (e.g. SEQUENCE-OF or LENGTH-OF)
• Argument: Required, acted on by function
• Keyword clause: Optional, more information
• Flag: Optional, more (yes/no) information

… and icons to help you work with functions:
• Option icon: Brings up a menu of keywords and flags
• Action icon: Brings up a menu enabling you to execute a function, copy and paste, get information, get help, etc.
• Clear/Delete icon: Removes information you entered or removes box entirely

Functions
Sin (angle)

Length (Entity)
Entity                   Result
"icahLnlna bormA"        14
Abraham Lincoln          192
"Abraham Lincoln"        14
US-presidents            44
US-presidents            (188 170 189 163 …)
• variable vs literal
• list vs single value
• single application of a function vs iteration of a function

Nested functions
Arcsin (Sin (angle))
[Diagram: nested function boxes; labels: "transposase", Gene (npf0076)]
• Evaluated from the inside out
• A box is replaced by its value

Pitfalls (the most common error in the language)
Gene (npf0076)
CLOSE BOXES BEFORE EXECUTING
White is incompatible with execution

Mining files for data
Pattern matching
• Quick and easy
• Highly flexible
• Works great
BUT...
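The REP-extraction pattern shown earlier is written in BioBIKE's own pattern symbols. As a rough illustration only, the same idea can be sketched with Python's `re` module. The translation is an assumption, not BioBIKE's implementation: a space followed by `...` is read as "one or more spaces", `#...` as `\d+`, `**` as any two characters (the `..` between coordinates), `*..` as a non-greedy `.*?`, and `'` as either quote character. The function name `extract_repeat_regions` and the sample feature-table text are hypothetical.

```python
import re

# Assumed Python translation of the slide's pattern
#   "repeat_region ...(#...)**(#...)*..'(*..)'"
REP_PATTERN = re.compile(
    r"repeat_region +"   # literal text, then as many spaces as possible
    r"(\d+)"             # capture: a digit, then as many digits as possible
    r".."                # any two characters (the ".." between coordinates)
    r"(\d+)"             # capture: the second run of digits
    r".*?"               # any characters, as few as necessary
    r"['\"]"             # a single or double quote (assumption)
    r"(.*?)"             # capture: any characters, as few as necessary
    r"['\"]",            # closing quote
    re.DOTALL,           # let "." cross line breaks in the feature table
)


def extract_repeat_regions(text):
    """Return (start, end, note) for each repeat_region entry found."""
    return [(int(s), int(e), note) for s, e, note in REP_PATTERN.findall(text)]


# Hypothetical feature-table excerpt, made up for illustration.
SAMPLE = '''
     gene            190..255
                     /gene="thrL"
     repeat_region   5565..5669
                     /note="REP1a"
     repeat_region   12345..12400
                     /note="REP2"
'''

if __name__ == "__main__":
    for start, end, note in extract_repeat_regions(SAMPLE):
        print(start, end, note)
```

Only entries that begin with the literal text `repeat_region` are reported; the `gene` entry in the sample is skipped because the pattern never starts matching there.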
Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch death)
• Ignores lots of information

Conserved motifs of methyltransferases
Pattern: "[DS]PP[YF]"
Special symbols:
[ ]   Character set

Position-specific scoring matrices (PSSMs)
• Needs training set
What if you don’t have one?

Lives of the Scientist (Part III)
What to do with no training set?
New pattern discovery (MEME, Gibbs sampler, BioProspector)

Human sequences 5’ to transcriptional start
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin            GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E              TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14               GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17               TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19     ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1        GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2         GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.      CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin      TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin              CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
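To make "1 mismatch death" concrete, here is a minimal sketch of the methyltransferase pattern "[DS]PP[YF]" as a Python regular expression. The character-set syntax happens to be the same in Python's `re` module; the peptide fragments and the function name `find_motif` are invented for illustration and are not taken from real proteins.

```python
import re

# The slide's character-set pattern: D or S, then P, P, then Y or F.
MOTIF = re.compile(r"[DS]PP[YF]")


def find_motif(protein):
    """Return (position, matched text) pairs, positions counted from 0."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein)]


# Hypothetical peptide fragments, invented for illustration.
EXAMPLES = {
    "exact_D_Y": "GLVIDPPYNTG",
    "exact_S_F": "AKLSPPFQ",
    "one_mismatch": "GLVIDPAYNTG",  # second P replaced by A
}

if __name__ == "__main__":
    for name, fragment in EXAMPLES.items():
        print(name, find_motif(fragment))
```

The fragment with a single substitution yields no hit at all, however similar the rest of the motif is. That all-or-nothing behaviour is what a scoring matrix is meant to soften.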
How does MEME work?
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. If probability score high, remember pattern and score
Step 6. Repeat Steps 1 - 5

Best matches (Step 2):
GACAGGGCAGAA
GCCCGGGTGTTT
GCCGGGGACGCG
GCCCCCGGGCCT
GCCGCAGAGCTG

Position-dependent frequency table (Step 3):
Position   A     C     G     T
1          0.0   0.0   1.0   0.0
2          0.2   0.8   0.0   0.0
3          0.0   1.0   0.0   0.0
4          0.2   0.4   0.4   0.0
5          0.0   0.4   0.6   0.0
6          0.2   0.2   0.6   0.0
7          0.0   0.0   1.0   0.0
8          0.4   0.2   0.2   0.2
9          0.2   0.2   0.6   0.0
10         0.0   0.4   0.4   0.2
11         0.2   0.4   0.0   0.4
12         0.2   0.0   0.4   0.4
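Steps 3 and 4 can be sketched in a few lines of Python. This is a simplified illustration, not MEME's actual implementation (which refines its model by expectation maximization): it builds the position-dependent frequency table from the five matches on the slide and scores a candidate as the product of its per-position frequencies. The function names and the uniform background of 0.25 per base are assumptions made for this sketch.

```python
from math import prod

BASES = "ACGT"

# The five best matches listed on the slide (Step 2).
MATCHES = [
    "GACAGGGCAGAA",
    "GCCCGGGTGTTT",
    "GCCGGGGACGCG",
    "GCCCCCGGGCCT",
    "GCCGCAGAGCTG",
]


def frequency_table(matches):
    """Step 3: per-position base frequencies, one dict per column."""
    n = len(matches)
    return [
        {base: column.count(base) / n for base in BASES}
        for column in zip(*matches)
    ]


def match_probability(candidate, table):
    """Step 4 (simplified): product of the per-position frequencies."""
    return prod(row[base] for row, base in zip(table, candidate))


def relative_probability(candidate, table, background=0.25):
    """Probability under the table relative to a uniform background.

    The uniform background is an assumption made for this sketch.
    """
    return match_probability(candidate, table) / background ** len(candidate)


if __name__ == "__main__":
    table = frequency_table(MATCHES)
    for position, row in enumerate(table, start=1):
        print(position, [row[base] for base in BASES])
    for match in MATCHES:
        print(match, relative_probability(match, table))
```

Because the table holds raw frequencies, any candidate with a base never seen at some position scores exactly zero. Practical tools usually add small pseudocounts to avoid that; the sketch leaves them out so the numbers agree with the table on the slide.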
Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch death)
• Ignores lots of information
Position-specific scoring matrices (PSSMs)
• Needs training set
MEME, Gibbs sampler, et al. (PSSM in reverse)
• Relatively unbiased
• Can't easily handle variable-length gaps

Moral of the Stories

Are you comfortable using programming in the service of your research?
• I have absolutely no experience in computer programming
• I have a well defined background in programming
• I do not have any previous experience with computer programming
• I have minimal knowledge...
• Yes I can program in python
• I have absolutely no experience... I hope to gain more experience with programs used in bioinformatics.

Please briefly describe the nature of your research

What more do you hope to gain before the semester ends?
• Most classes are a lecture followed by a short instruction on how to do the assignment. This had not provided enough time for me to appreciate the programs being used. I want more hands-on time with the computer.

Using Firefox: www.people.vcu.edu/~elhaij
Click MICR 653

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
    [Figure label: HIV]
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
    RNAseq
    Measuring RNA through Microarrays
    [Figure: microarray workflow; labels: Spot; RNA from cell type #1 + RNA from cell type #2; Scan for red fluorescence; Scan for green fluorescence; Combine images; Type #1 RNA > Type #2 RNA; Type #2 RNA > Type #1 RNA; Type #1 RNA, Type #2 RNA. Courtesy of Inst. für Hormon-und Fortpflanzungsforschung, Universität Hamburg]
    different conditions or different replicates
    Difference in intensity chip to chip
V. CRISPRs in enteric bacteria
    [Diagram: k k g1 k g2 k g3 k g4]
    GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT
VI. Finding targets for DNA-binding proteins
    [Figure labels: OR3, OR2, cro, cI, PRM, OR1]
    CTTTTTTGTGCTCATACGTTAAATCTATCACCGCAAGGGATAAATATCTAACACCGTGCGTGTTGACTATTTTACCTCTGGCGGTGATAATGGTTGCATGTACTAAGGAGGTTGTATGGAACAACGCATA
    GAAAAAACACGAGTATGCAATTTAGATAGTGGCGTTCCCTATTTATAGATTGTGGCACGCACAACTGATAAAATGGAGACCGCCACTATTACCAACGTACATGATTCCTCCAACATACCTTGTTGCGTAT
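For question V, the sequence shown on the slide contains one stretch that recurs after a gap. A minimal sketch of how pattern matching could pick that out uses a regular-expression backreference. It assumes that a repeat is an exact copy at least 20 bases long and that the intervening spacer is 20 to 50 bases; both thresholds, and the function name `find_repeat_and_spacer`, are illustrative assumptions rather than anything stated in the lecture.

```python
import re

# Sequence copied from the CRISPR slide.
SEQ = (
    "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAG"
    "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT"
)

# (repeat)(spacer)\1 : the same text must occur again right after the spacer.
# Length limits are assumptions chosen for this illustration.
REPEAT_SPACER = re.compile(r"([ACGT]{20,})([ACGT]{20,50})\1")


def find_repeat_and_spacer(seq):
    """Return (repeat, spacer) for the first repeat-spacer-repeat unit, or None."""
    m = REPEAT_SPACER.search(seq)
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    result = find_repeat_and_spacer(SEQ)
    if result is not None:
        repeat, spacer = result
        print("repeat:", repeat, len(repeat))
        print("spacer:", spacer, len(spacer))
```

Like the other patterns in this lecture, the backreference demands an exact copy, so a repeat with even one changed base would be missed.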