Computation to Solve Problems
Welcome to Advanced Molecular Genetics, Bioinformatics, and Computational Genomics
Pattern Recognition and Gene Finding (Through software tools)

Today is the last class. Would you please tell students:
1. Please submit all assignments. Last assignment due today because your class has no assignment.
2. Please finish the course evaluation.

May 2 Apr 4

An alternative

Lives of the Scientist
• World’s Greatest Explorer
• World’s Greatest Musicologist
• World’s Greatest Microbiologist

[Figure: raw genomic DNA sequence, coordinates 3337901 to 3338851, shown with a BLAST result, Expect = 4e-98]
Globin BLAST
Expect = 4e-98

Program the computer
[Figure: short DNA sequence fragments, e.g. AATAAAGCTTTACAAA, CCAAACTCTGGCTTCA]

Biology researchers do not program
10 Biology and Microbiology Depts at major universities

Why hasn't it happened?
Programming languages

An alternative

Lives of the Scientist (Part II)
Repeated sequences in bacterial genomes
Genome of E. coli K12 str MG1655 [figure labels: genes, REP sequences]

Algorithm to extract REP sequences
The pattern is built up one piece at a time:
"repeat_region "
"repeat_region ..."
"repeat_region ...#"
"repeat_region ...#..."
"repeat_region ...(#...)"
"repeat_region ...(#...)**"
"repeat_region ...(#...)**(#...)"
"repeat_region ...(#...)**(#...)*"
"repeat_region ...(#...)**(#...)*.."
"repeat_region ...(#...)**(#...)*..'"
"repeat_region ...(#...)**(#...)*..'("
"repeat_region ...(#...)**(#...)*..'(*..)'"

Special symbols
...   As many of previous character as possible
#     A single digit
()    Capture what's inside
*     Any character
..    As few of previous character as necessary
'     ' or ''

We start
Using Firefox, go to: www.people.vcu.edu/~elhaij
Click: MICR 653

biobike.csbc.vcu.edu
[Screen labels: Function palette, Workspace, Results window]

General Syntax of BioBIKE
[Diagram labels: Function-name, Argument (object), Keyword, object, Flag]
The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, it produces a product.

Function boxes contain the following elements:
• Function-name (e.g.
SEQUENCE-OF or LENGTH-OF)
• Argument: Required, acted on by function
• Keyword clause: Optional, more information
• Flag: Optional, more (yes/no) information

… and icons to help you work with functions:
• Option icon: Brings up a menu of keywords and flags
• Action icon: Brings up a menu enabling you to execute a function, copy and paste, get information, get help, etc.
• Clear/Delete icon: Removes information you entered or removes box entirely

Functions
• Sin (angle)
• Length (entity)

Length (entity): variable vs literal
• Entities shown: "icahLnlna bormA", Abraham Lincoln, "Abraham Lincoln"
• Values shown: 14, 192, 14

Length (US-presidents): list vs single value; single application of a function vs iteration of a function
• 44
• (188 170 189 163 …)

Nested functions
• Arcsin (Sin (angle)) gives Angle
• Gene (npf0076): "transposase"
• Evaluated from the inside out
• A box is replaced by its value

Pitfalls (the most common error in the language)
• Gene (npf0076)
• CLOSE BOXES BEFORE EXECUTING
• White is incompatible with execution

Algorithm to extract REP sequences
Pattern: "repeat_region ...(#...)**(#...)*..'(*..)'"

Special symbols
...   As many of previous character as possible
#     A single digit
()    Capture what's inside
*     Any character
..    As few of previous character as necessary
'     ' or ''
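For readers more used to regular expressions, the REP-extraction pattern can be sketched in Python roughly as follows. This is a hedged translation, not BioBIKE's own implementation: it assumes "..." means "one or more, greedy", ".." means "one or more, lazy", and that the quote symbol matches either a single or a double quote. The feature text, coordinates, and the name `extract_rep_regions` are hypothetical and only illustrate the shape of a GenBank-style file.

```python
import re

# Hypothetical GenBank-style feature lines, made up for illustration only.
GENBANK_TEXT = '''     gene            190..255
                     /gene="thrL"
     repeat_region   5565..5669
                     /note="REP example 1"
     repeat_region   11500..11601
                     /note="REP example 2"
'''

QUOTE = "[\"']"  # assumed meaning of the quote symbol: ' or "

REP_PATTERN = re.compile(
    "repeat_region"   # literal text
    " +"              # " ..."  a space, as many as possible (assumed one or more)
    r"(\d+)"          # (#...)  capture a run of digits: start coordinate
    ".."              # **      any two characters (the ".." between coordinates)
    r"(\d+)"          # (#...)  capture a run of digits: end coordinate
    ".+?"             # *..     any characters, as few as necessary
    + QUOTE +         # '       opening quote
    "(.+?)"           # (*..)   capture the description, as short as possible
    + QUOTE,          # '       closing quote
    re.DOTALL,        # let "any character" cross line breaks
)


def extract_rep_regions(text):
    """Return (start, end, description) for every repeat_region feature."""
    return [(int(start), int(end), note)
            for start, end, note in REP_PATTERN.findall(text)]


for region in extract_rep_regions(GENBANK_TEXT):
    print(region)
```

The gene feature is skipped because it never matches the literal "repeat_region"; only the two repeat features are reported.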
Mining files for data
Pattern matching
• Quick and easy
• Highly flexible
• Works great
BUT...
• Unforgiving (1 mismatch death)

Conserved motifs of methyltransferases
Pattern: "[DS]PP[YF]"

Special symbols
[ ]   Character set

Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch death)
• Ignores lots of information
Position-specific scoring matrices (PSSMs)
• Needs training set
What if you don’t have one?

Lives of the Scientist (Part III)

What to do with no training set?
New pattern discovery (MEME, Gibbs sampler, BioProspector)

Human sequences 5’ to transcriptional start
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin            GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E              TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14               GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17               TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19     ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1        GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2         GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.      CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin      TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin              CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA

“TATA box”?
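To see why pattern matching is called unforgiving, here is a small Python sketch that scans the first five upstream sequences listed above with a character-set pattern in the same style as "[DS]PP[YF]". The pattern TATA[AT]A is an assumption chosen for illustration (a simplified TATA-box consensus), and the names `UPSTREAM` and `first_hits` are hypothetical.

```python
import re

# First five upstream sequences from the list above.
UPSTREAM = {
    "snRNA U1 (pU1-6)": "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "histone H1t": "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "HMG-14": "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "TP1": "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "protamine P1": "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
}

# Assumed pattern: TATA, then A or T, then A. [AT] is a character set.
TATA_PATTERN = re.compile("TATA[AT]A")


def first_hits(sequences, pattern):
    """Map each name to the 0-based start of its first match, or None."""
    hits = {}
    for name, seq in sequences.items():
        match = pattern.search(seq)
        hits[name] = match.start() if match else None
    return hits


HITS = first_hits(UPSTREAM, TATA_PATTERN)
for name, start in HITS.items():
    print(f"{name:18s} {start}")
```

Under this assumed pattern only histone H1t matches. The snRNA U1 (TATATG) and protamine P1 (TATAAC) sequences each miss by a single base and are rejected outright, which is the "1 mismatch death" problem.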
How does MEME work?

snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. If probability score high, remember pattern and score
Step 6. Repeat Steps 1 - 5

Best matches (one per sequence):
GACAGGGCAGAA
GCCCGGGTGTTT
GCCGGGGACGCG
GCCCCCGGGCCT
GCCGCAGAGCTG

Position-dependent frequency table:
Pos    A     C     G     T
 1    0.0   0.0   1.0   0.0
 2    0.2   0.8   0.0   0.0
 3    0.0   1.0   0.0   0.0
 4    0.2   0.4   0.4   0.0
 5    0.0   0.4   0.6   0.0
 6    0.2   0.2   0.6   0.0
 7    0.0   0.0   1.0   0.0
 8    0.4   0.2   0.2   0.2
 9    0.2   0.2   0.6   0.0
10    0.0   0.4   0.4   0.2
11    0.2   0.4   0.0   0.4
12    0.2   0.0   0.4   0.4
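Steps 3 and 4 can be sketched in a few lines of Python. This is a simplified illustration of the slide's description, not MEME's actual algorithm: it assumes "relative probability" means the probability of a site under the frequency table divided by its probability under a uniform background of 0.25 per base, and it uses no pseudocounts. The function names are hypothetical.

```python
# The five best matches shown above, one per sequence.
MATCHES = [
    "GACAGGGCAGAA",
    "GCCCGGGTGTTT",
    "GCCGGGGACGCG",
    "GCCCCCGGGCCT",
    "GCCGCAGAGCTG",
]


def frequency_table(sites):
    """Step 3: fraction of A, C, G, T at each position of the aligned sites."""
    count = len(sites)
    width = len(sites[0])
    return [
        {base: sum(site[pos] == base for site in sites) / count
         for base in "ACGT"}
        for pos in range(width)
    ]


def relative_probability(site, table, background=0.25):
    """Step 4 (assumed form): P(site | table) / P(site | uniform background)."""
    ratio = 1.0
    for column, base in zip(table, site):
        ratio *= column[base] / background
    return ratio


TABLE = frequency_table(MATCHES)
for pos, column in enumerate(TABLE, start=1):
    print(pos, column)
print(relative_probability(MATCHES[0], TABLE))
```

Because no pseudocounts are used, any site carrying a base never seen at some position scores exactly zero; real tools usually smooth the table to avoid that.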
Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch death)
• Ignores lots of information
Position-specific scoring matrices (PSSMs)
• Needs training set
MEME, Gibbs sampler, et al. (PSSM in reverse)
• Relatively unbiased
• Can't easily handle variable-length gaps

Moral of the Stories

Are you comfortable using programming in the service of your research?
• I have had some R experience… However, I am still a Novice
• I have no experience in programming
• I have zero experience in computer programming before this class
• I am about 60% confident in using python
• I have experience using Python, Java, Unix & DOS environments, R, mySQL/SQL, and SAS
• None… This is beyond my responsibilities in the lab.

Using Firefox, go to www.people.vcu.edu/~elhaij and click MICR 653

Scientific Questions

I. What determines the beginning of a gene?

II. Where in a bacterial genome are viruses integrated?
[Figure label: HIV]

III. Determination of short tandem repeats (STRs)

IV. Analysis of gene expression data
Metabolic correlates to N-deprivation
What enzymes of carbon metabolism are affected by N-starvation?
• Glycogen metabolism
• Pentose Phosphate Pathway (Cyanobacteria use primarily the reactions of the Pentose Phosphate Pathway to break down glucose)
• Carbon fixation
RNAseq
Measuring RNA through Microarrays
• Spot
• RNA from cell type #1 + RNA from cell type #2
• Scan for red fluorescence
• Scan for green fluorescence
• Combine images
[Figure labels: Type #1 RNA > Type #2 RNA; Type #2 RNA > Type #1 RNA; Type #1 RNA, Type #2 RNA]
Courtesy of Inst. für Hormon-und Fortpflanzungsforschung, Universität Hamburg
Different conditions or different replicates
Difference in intensity chip to chip

V. CRISPRs in enteric bacteria
[Diagram: k k g1 k g2 k g3 k g4]
GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT

VI. Finding targets for DNA-binding proteins (targets known)

VII. Finding targets for DNA-binding proteins (genes known)
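As a taste of Question V, the CRISPR fragment shown above can be taken apart with a short Python sketch. It is a brute-force illustration under stated assumptions, not the course's method: it assumes the repeat unit is simply the longest substring that occurs at least twice without overlapping, which is fine for a fragment this short but too slow and too naive for a whole genome. The names `longest_repeat` and `split_by_repeat` are hypothetical.

```python
# The CRISPR fragment from the slide for Question V.
CRISPR_SEQ = (
    "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAG"
    "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT"
)


def longest_repeat(seq):
    """Return the longest substring that occurs at least twice without overlap."""
    n = len(seq)
    for length in range(n // 2, 0, -1):          # try long candidates first
        for start in range(n - 2 * length + 1):
            unit = seq[start:start + length]
            if seq.find(unit, start + length) != -1:
                return unit
    return ""


def split_by_repeat(seq, unit):
    """Return the start positions of the repeat unit and the spacers between copies."""
    positions = []
    pos = seq.find(unit)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(unit, pos + len(unit))
    spacers = [seq[a + len(unit):b] for a, b in zip(positions, positions[1:])]
    return positions, spacers


UNIT = longest_repeat(CRISPR_SEQ)
POSITIONS, SPACERS = split_by_repeat(CRISPR_SEQ, UNIT)
print("repeat unit:", UNIT, len(UNIT))
print("positions:  ", POSITIONS)
print("spacers:    ", SPACERS)
```

On this fragment the sketch reports one repeat unit present in two copies with a single spacer between them; a real CRISPR array would have many copies and would need tolerance for slightly degenerate repeats.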