Computation to Solve Problems
Welcome to
Advanced Molecular Genetics,
Bioinformatics, and Computational Genomics
Pattern Recognition and Gene Finding
Today is the last class. Would you please tell the students:
1. Please submit all assignments. The last assignment is due today because your class has no further assignments.
2. Please finish the course evaluation.
May 2
Apr 4
Welcome to
Advanced Molecular Genetics,
Bioinformatics, and Computational Genomics
Pattern Recognition and Gene Finding
(Through software tools)
An alternative
Lives of the Scientist
World’s Greatest Explorer
World’s Greatest Musicologist
Expect = 4e-98
World’s Greatest Microbiologist
[Figure: genome sequence from positions 3337901 to 3338851, shown in blocks of 10 nucleotides]
Globin
Blast
Expect = 4e-98
AATAAAGCTTTACAAA
CCAAACTCTGGCTTCA
ATTGTGTAACCCAAGC
TTTGATTCTTTCCTCTG
TTAAATCGGATTGATT
ATCTTCATCAAGGGCA
AGACCTACAAATTTAC
Program the computer
Biology researchers do not program
Program the computer
10 Biology and Microbiology
Depts at major universities
Why hasn't it happened?
Programming languages
An alternative
Lives of the Scientist
(Part II)
Repeated sequences in bacterial genomes
Genome of E. coli K12 str. MG1655
[Figure: map of the genome showing genes and REP sequences]
Algorithm to extract REP sequences
Pattern
"repeat_region ...(#...)**(#...)*..'(*..)'"
Special symbols
...   As many of the previous character as possible
#     A single digit
()    Capture what's inside
*     Any character
..    As few of the previous character as necessary
'     ' or "
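The special symbols above behave much like standard regular-expression syntax. As a sketch, assuming "..." corresponds to a greedy one-or-more (`+`), ".." to a lazy zero-or-more (`*?`), and the quote symbol to either quote character, the pattern can be translated mechanically into a Python regex and run over an annotation file. The `to_regex` helper and the GenBank-style excerpt are hypothetical illustrations, not part of BioBIKE or a real annotation.

```python
import re

# Hypothetical translation table from the slide's symbols to Python regex
# syntax. Longer symbols come first so "..." is recognised before "..".
SYMBOLS = [
    ("...", "+"),     # as many of the previous character as possible
    ("..", "*?"),     # as few of the previous character as necessary
    ("#", r"\d"),     # a single digit
    ("*", "."),       # any character
    ("'", "[\"']"),   # a single or a double quote
    ("(", "("),       # start capturing
    (")", ")"),       # stop capturing
]


def to_regex(pattern):
    """Convert a pattern written in the slide's notation to a Python regex."""
    out, i = [], 0
    while i < len(pattern):
        for symbol, regex in SYMBOLS:
            if pattern.startswith(symbol, i):
                out.append(regex)
                i += len(symbol)
                break
        else:
            # Ordinary characters (letters, spaces, "_") match literally.
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


REP_PATTERN = "repeat_region ...(#...)**(#...)*..'(*..)'"

# Hypothetical GenBank-style excerpt (not taken from a real annotation file).
sample = """
     repeat_region   228..263
                     /note="REP element A"
     repeat_region   4000..4040
                     /note="REP element B"
"""

# DOTALL lets "any character" cross line breaks, so the note on the
# following line can be reached.
regex = re.compile(to_regex(REP_PATTERN), re.DOTALL)
for start, end, note in regex.findall(sample):
    print(start, end, note)
```

This prints the start, end, and note of each entry. Because "any character" may cross lines, an entry without a quoted note would borrow the next entry's note, and `complement(...)` coordinates are not matched at all: the same unforgiving behavior the slides warn about.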
We start
Go to: www.people.vcu.edu/~elhaij
Click: MICR 653
Using Firefox
www.people.vcu.edu/~elhaij
Click MICR 653
biobike.csbc.vcu.edu
[Figure: the BioBIKE window, with the function palette, workspace, and results window]
General Syntax of BioBIKE
[Figure: a function box showing Function-name, Argument (object), Keyword object, and Flag]
The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags.
A function may be thought of as a black box: you feed it information, and it produces a product.
General Syntax of BioBIKE
Function boxes contain the following elements:
• Function-name (e.g. SEQUENCE-OF or LENGTH-OF)
• Argument: Required, acted on by function
• Keyword clause: Optional, more information
• Flag: Optional, more (yes/no) information
General Syntax of BioBIKE
… and icons to help you work with functions:
• Option icon: Brings up a menu of keywords and flags
• Action icon: Brings up a menu enabling you to execute a function, copy and paste, get information, get help, etc.
• Clear/Delete icon: Removes information you entered or removes the box entirely
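BioBIKE boxes are not Python, but the same anatomy (one required argument, optional keyword clauses, a yes/no flag) maps onto an ordinary function signature. A minimal sketch, using a hypothetical `subsequence_of` function that is not a BioBIKE function:

```python
def subsequence_of(sequence, *, from_pos=1, to_pos=None, reverse=False):
    """Hypothetical function box.

    sequence         required argument: the object acted on
    from_pos, to_pos optional keyword clauses: more information
    reverse          optional flag: yes/no information
    Positions are 1-based and inclusive, as in sequence files.
    """
    if to_pos is None:
        to_pos = len(sequence)
    piece = sequence[from_pos - 1:to_pos]
    return piece[::-1] if reverse else piece


print(subsequence_of("ATGCGTAA"))                                      # argument only
print(subsequence_of("ATGCGTAA", from_pos=2, to_pos=4))                # + keywords
print(subsequence_of("ATGCGTAA", from_pos=2, to_pos=4, reverse=True))  # + flag
```

Leaving out the keywords and the flag falls back to defaults, just as an unopened option menu leaves a box's optional clauses unused.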
Functions
Sin
Angle
Sin (angle)
Functions
Length
Entity
Functions
Length
Entity
"icahLnlna bormA"
Abraham Lincoln
"Abraham Lincoln"
variable vs literal
15
192
15
Functions
Length
Entity
"icahLnlna bormA"
Abraham Lincoln
"Abraham Lincoln"
15
192
15
US-presidents
44
list vs single value
Functions
Length
Entity
"icahLnlna bormA"
Abraham Lincoln
"Abraham Lincoln"
US-presidents
15
192
15
(188 170 189 163 …)
single application of a function
vs
iteration of a function
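The three contrasts above (variable vs literal, list vs single value, single application vs iteration) exist in most languages. A sketch in Python, with hypothetical records: treating the slide's numbers (192, 188, 170, …) as heights in centimetres is an assumption made for illustration.

```python
# Literal: the quoted text itself. Its length is its character count.
literal = "Abraham Lincoln"
print(len(literal))

# Variable: a name bound to an object with properties of its own
# (hypothetical record; 192 is the value shown on the slide).
abraham_lincoln = {"name": "Abraham Lincoln", "height_cm": 192}
print(abraham_lincoln["height_cm"])

# A list of such records (hypothetical and truncated; the slide's full
# list has 44 entries).
us_presidents = [
    {"name": "George Washington", "height_cm": 188},
    {"name": "John Adams", "height_cm": 170},
    {"name": "Thomas Jefferson", "height_cm": 189},
    {"name": "James Madison", "height_cm": 163},
]

# Single application: one call on the list as a whole.
print(len(us_presidents))

# Iteration: the same question asked of every element in turn.
print([p["height_cm"] for p in us_presidents])
```

The same function name gives different answers depending on whether it is handed a literal, a variable, or each member of a list.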
Functions
[Figure: an Arcsin box containing a Sin box applied to Angle; the result is Angle]
Nested functions
Evaluated from the inside out
A box is replaced by its value
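Inside-out evaluation works the same way in Python: the innermost call is computed first and its value takes the place of the call. A small sketch (the angle is an arbitrary choice; arcsin undoes sin only for angles between -π/2 and π/2):

```python
import math

angle = 0.5  # radians, within [-pi/2, pi/2]

# Nested form: sin(angle) is evaluated first, then arcsin of that value.
result = math.asin(math.sin(angle))

# The same evaluation, one box at a time.
inner = math.sin(angle)   # the inner box is replaced by its value...
outer = math.asin(inner)  # ...which becomes the argument of the outer box
print(result, outer)
```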
Functions
"transposase"
Gene
(npf0076)
Functions
Gene
(npf0076)
Nested functions
Evaluated from the inside out
A box is replaced by its value
Pitfalls
(the most common error in the language)
Gene
(npf0076)
CLOSE BOXES BEFORE EXECUTING
White is incompatible with execution
Mining files for data
Pattern matching
• Quick and easy
• Highly flexible
• Works great
BUT...
• Unforgiving (1 mismatch → death)
Conserved motifs of methyltransferases
Pattern
"[DS]PP[YF]"
Special symbols
[ ]
Character set
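The bracket notation is the same as in standard regular expressions, so the motif can be searched directly. A sketch on a hypothetical protein fragment (not a real methyltransferase sequence):

```python
import re

# [DS] = D or S; [YF] = Y or F; "PP" must match exactly.
MOTIF = re.compile(r"[DS]PP[YF]")

# Hypothetical protein fragment.
protein = "MKTLDPPYAGSPPFKRNPPY"

for m in MOTIF.finditer(protein):
    print(m.start() + 1, m.group())  # 1-based position, matched residues
```

This reports DPPY at position 5 and SPPF at position 11. The final NPPY differs from the motif at a single residue and is not reported at all.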
Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch → death)
• Ignores lots of information
Position-specific scoring matrices (PSSMs)
• Needs training set
What if you don’t have one?
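A PSSM replaces all-or-nothing matching with a score per base per position, which is why it needs a training set of known sites to start from. A minimal sketch, assuming log-odds scores against a uniform background (0.25 per base) with a pseudocount of 1; the training sites are hypothetical.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds matrix from aligned, equal-length training sites.
    The pseudocount keeps unseen bases from scoring minus infinity."""
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def best_hit(pssm, seq):
    """Score every window of seq; return (best score, 0-based start)."""
    width = len(pssm)
    return max(
        (sum(pssm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    )


# Hypothetical training set of aligned sites.
training = ["TATAAA", "TATAAG", "TATATA", "TATAAA"]
pssm = build_pssm(training)
score, start = best_hit(pssm, "GCGCTATAAGCGC")
print(start, round(score, 2))
```

The window TATAAG is found even though no training site matches it exactly: the G in the last position costs points instead of disqualifying the match.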
Lives of the Scientist
(Part III)
What to do with no training set?
New pattern discovery (Meme, Gibbs sampler, BioProspector)
Human sequences 5’ to transcriptional start
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin            GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E              TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14               GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17               TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19     ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1        GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2         GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.      CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin      TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin              CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
“TATA box”?
How does Meme work?
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
GACAGGGCAGAA
GCCCGGGTGTTT
GCCGGGGACGCG
GCCCCCGGGCCT
GCCGCAGAGCTG
Position  1    2    3    4    5    6    7    8    9    10   11   12
A         0.0  0.2  0.0  0.2  0.0  0.2  0.0  0.4  0.2  0.0  0.2  0.2
C         0.0  0.8  1.0  0.4  0.4  0.2  0.0  0.2  0.2  0.4  0.4  0.0
G         1.0  0.0  0.0  0.4  0.6  0.6  1.0  0.2  0.6  0.4  0.0  0.4
T         0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.2  0.0  0.2  0.4  0.4
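Step 3 is simple counting. A sketch that rebuilds the frequency table from the five matches, assuming each entry is the number of matches carrying that base at that position divided by the number of matches (no pseudocounts):

```python
# The five best matches from Step 2.
matches = ["GACAGGGCAGAA", "GCCCGGGTGTTT", "GCCGGGGACGCG",
           "GCCCCCGGGCCT", "GCCGCAGAGCTG"]


def frequency_table(matches):
    """Fraction of matches carrying each base at each position."""
    n = len(matches)
    return [{base: sum(m[i] == base for m in matches) / n for base in "ACGT"}
            for i in range(len(matches[0]))]


table = frequency_table(matches)
for base in "ACGT":
    print(base, " ".join(f"{col[base]:.1f}" for col in table))
```

The four printed rows reproduce the A, C, G, and T rows of the table.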
How does Meme work?
snRNA U1 (pU1-6)     AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t          GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14               CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                  GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1         CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. If probability score high, remember pattern and score
Step 6. Repeat Steps 1 - 5
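The six steps can be sketched as a short loop. This is a simplified rendering of the steps as listed, not MEME's actual algorithm (MEME refines motifs by expectation maximization). Two assumptions are made: "best match" means the window with the fewest mismatches, and every window of the first sequence is tried as a candidate instead of choosing candidates at random.

```python
import math
from collections import Counter

# The first five promoter sequences.
SEQS = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
]


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_match(candidate, seq):
    """Step 2: the window of seq with the fewest mismatches to candidate."""
    w = len(candidate)
    return min((seq[i:i + w] for i in range(len(seq) - w + 1)),
               key=lambda window: mismatches(candidate, window))


def frequency_table(matches):
    """Step 3: base frequencies at each position of the matches."""
    n = len(matches)
    return [{b: c[b] / n for b in "ACGT"} for c in map(Counter, zip(*matches))]


def log_probability(matches, table):
    """Step 4: summed log-probability of the matches under the table.
    Every base in a match has frequency >= 1/n, so log(0) cannot occur."""
    return sum(math.log(table[i][b]) for m in matches for i, b in enumerate(m))


def find_motif(seqs, width):
    best = None
    first = seqs[0]
    for start in range(len(first) - width + 1):            # Step 6: repeat
        candidate = first[start:start + width]              # Step 1
        matches = [best_match(candidate, s) for s in seqs]  # Step 2
        table = frequency_table(matches)                    # Step 3
        score = log_probability(matches, table)             # Step 4
        if best is None or score > best[0]:                 # Step 5
            best = (score, candidate, matches)
    return best


score, candidate, matches = find_motif(SEQS, 12)
print(candidate, round(score, 2))
for m in matches:
    print(m)
```

Working with summed logarithms instead of multiplying probabilities keeps scores from shrinking toward zero as motifs grow longer.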
Searching for conserved motifs
Pattern matching
• Quick and easy
• Unforgiving (1 mismatch → death)
• Ignores lots of information
Position-specific scoring matrices (PSSMs)
• Needs training set
Meme, Gibbs sampler, et al (PSSM in reverse)
• Relatively unbiased
• Can't easily handle variable-length gaps
Moral of the Stories
Are you comfortable using programming in the service of your research?
I have had some R experience… However, I am still a novice.
I have no experience in programming.
I have zero experience in computer programming before this class.
I am about 60% confident in using Python.
I have experience using Python, Java, Unix & DOS environments, R, mySQL/SQL, and SAS.
None… This is beyond my responsibilities in the lab.
Scientific Questions
I. What determines the beginning of a gene?
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
HIV
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
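One way to locate short tandem repeats computationally is a regular expression with a back-reference: capture a short unit, then require further copies of exactly that unit. A sketch assuming units of 1 to 6 nt and at least three perfect copies; both thresholds are illustrative choices, and real STR callers also tolerate imperfect copies.

```python
import re


def find_strs(seq, max_unit=6, min_copies=3):
    """Return (0-based start, repeat unit, number of copies) for each
    perfect tandem repeat found in seq."""
    # ([ACGT]{1,n}?) captures the shortest workable unit; \1{k,} demands
    # at least k more back-to-back copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq)]


print(find_strs("TTCACACACAGG"))   # a dinucleotide repeat
print(find_strs("AGATAGATAGAT"))   # a tetranucleotide repeat
```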
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
Metabolic correlates to N-deprivation
What enzymes of carbon metabolism are affected by N-starvation?
[Figure: pathways of glycogen metabolism, the pentose phosphate pathway, and carbon fixation]
Cyanobacteria primarily use the reactions of the pentose phosphate pathway to break down glucose.
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
RNAseq
Measuring RNA through Microarrays
Spot
RNA from cell type #1
+
RNA from cell type #2
Scan for red fluorescence
Scan for green fluorescence
Combine images
Type #1 RNA > Type #2 RNA
Type #2 RNA > Type #1 RNA
Type #1 RNA ≈ Type #2 RNA
Courtesy of Inst. für Hormon- und Fortpflanzungsforschung, Universität Hamburg
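The three outcomes of a two-color spot are usually read from the ratio of the two channels, most conveniently on a log2 scale so that equal up- and down-regulation are symmetric around zero. A sketch assuming type #1 RNA is labeled red and type #2 green (the slide does not say which), with a two-fold cutoff and hypothetical intensities:

```python
import math


def classify_spot(red, green, threshold=1.0):
    """Compare the two channels of one spot via log2(red / green).
    Assumes type #1 = red and type #2 = green; threshold=1.0 is two-fold."""
    ratio = math.log2(red / green)
    if ratio >= threshold:
        return "Type #1 RNA > Type #2 RNA"
    if ratio <= -threshold:
        return "Type #2 RNA > Type #1 RNA"
    return "Type #1 RNA ≈ Type #2 RNA"


# Hypothetical background-corrected intensities (red, green).
for red, green in [(4000, 1000), (500, 2000), (1200, 1000)]:
    print(red, green, classify_spot(red, green))
```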
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
Different conditions or different replicates
Difference in intensity from chip to chip
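Before chips from different conditions or replicates can be compared, their overall brightness has to be put on a common scale. One simple option, shown here as an assumption rather than the course's method, is to divide every spot on a chip by that chip's median intensity:

```python
from statistics import median


def median_normalize(chips):
    """Scale each chip so that its median spot intensity becomes 1."""
    normalized = []
    for chip in chips:
        m = median(chip)
        normalized.append([value / m for value in chip])
    return normalized


# Hypothetical replicate chips: same three genes, chip 2 twice as bright.
chips = [[100, 200, 300], [200, 400, 600]]
print(median_normalize(chips))
```

After scaling, the two replicates agree, so any remaining difference between chips is more likely to reflect biology than scanning.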
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
V. CRISPRs in enteric bacteria
[Figure: a CRISPR array in which copies of a repeat (k) alternate with spacers g1, g2, g3, g4]
GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT
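If the array is a repeat (k) separated by unique spacers (g), both parts can be pulled out of the raw sequence without knowing the repeat in advance. A brute-force sketch that takes the longest substring occurring at least twice as the repeat; this is an assumption that suits a short stretch like the one above but may not coincide with repeat boundaries assigned by dedicated CRISPR tools.

```python
def longest_repeat(seq):
    """Longest substring occurring at least twice (brute force)."""
    for length in range(len(seq) - 1, 0, -1):
        seen = set()
        for i in range(len(seq) - length + 1):
            word = seq[i:i + length]
            if word in seen:
                return word
            seen.add(word)
    return ""


def spacers(seq, repeat):
    """Stretches of seq lying between consecutive copies of the repeat."""
    return seq.split(repeat)[1:-1]


array = "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT"
repeat = longest_repeat(array)
print(repeat, len(repeat))
print(spacers(array, repeat))
```

On this stretch the repeat found is 37 nt long and a single 34-nt spacer lies between its two copies.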
Scientific Questions
VI. Finding targets for DNA-binding proteins
Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
V. CRISPRs in enteric bacteria
VI. Finding targets for DNA-binding proteins (targets known)
VII. Finding targets for DNA-binding proteins (genes known)