Systematic Name Gene Name Motif ID Expert Confidence Dubious

Systematic Name
MATA1
YDR049W
YFL052W
YGR071C
YJL206C
YLR211C
YPL216W
MAL63
YER184C
YLL054C
YPR022C
YDR026C
Gene Name
Motif ID
Expert Confidence
Dubious?
Notes
0
Need to study the literature more
carefully and consult experts, but at
first glance none of these motifs
seems right.
0
Dubious
No evidence this is a TF, aside from
a poorly-scoring C2H2 zinc finger.
0
Dubious Putative zinc-cluster protein.
0
Dubious Unlikely to be true TF.
0
Seven motifs from ChIP-chip, but
none of them corresponds well to
ChIP-chip data, and none of them
resembles a GAL4 motif. 1169 has
a CGG in the middle, but too much
flanking information to be credible
without further independent
support.
0
Dubious Unlikely to be true TF.
0
Dubious Unlikely to be true TF.
136 Medium
This is an unconventional dimeric
GAL4-class motif.
512 Medium
One motif from PBMs is a
monomeric GAL4-like motif and
the other is dimeric. Medium
confidence because there is little
independent support, and both
contain the CCGG core that I
believe may be an artifact.
However, both score significantly
on ChIP-chip data. Only 512 is
significant on expression data.
526 Medium
Three motifs available, from PBMs;
two dimeric GAL4-like motifs but
with different spacings and one
monomeric. No backup data but
looks tidy. Keep all three.
588 High
Only one motif available, from
PBMs; classical yeast C2H2 motif,
and has some relationship to ChIP-chip data.
696 High
Three ChIP-chip motifs are
virtually identical in appearance;
resemble Reb1 motifs; high
correspondence to ChIP-chip data
YNR063W
804 High
YLL054C
816 Medium
YPR196W
861 High
YPR015C
871 High
YDR266C
1161 Low
YER064C
2094 Medium
YER184C
2095 Medium
YLR278C
2112 High
YGR067C
2191 High
Motifs from PBMs are virtually
identical. This is a monomeric
GAL4-like motif. 804 agrees more
with ChIP-chip data.
Three motifs available, from PBMs;
two dimeric GAL4-like motifs but
with different spacings and one
monomeric. No backup data but
looks tidy. Keep all three.
Motifs from PBMs are very similar
and are a variant monomeric
GAL4-like motif. Chose 861 as it
passes the significance threshold
against ChIP-chip data.
Only one motif available, from
PBMs; resembles the motif from
CMR3, which is a paralogous gene
(and nearly adjacent on the
chromosome). It also scores
significantly against expression
data.
Motifs from ChIP-chip do not
correspond to ChIP-chip, and there
is no other supporting data. Chose
1161 only because it looks more
reasonable. Low confidence.
PBM motif has high score on GO
because it looks a lot like Gcn4
One motif from PBMs is a
monomeric GAL4-like motif and
the other is dimeric. Medium
confidence because there is little
independent support, and both
contain the CCGG core that I
believe may be an artifact.
However, both score significantly
on ChIP-chip data. Only 512 is
significant on expression data.
Only 2112 (from PBMs) stands out;
dimeric GAL4 motif with high
score on ChIP-chip.
PBM motif is a classical C2H2
motif that has good correspondence
to ChIP-chip data. 2191
YKL222C
2192 High
YML081W
2194 High
YLL054C
2242 Medium
MATA1-MATALPHA2-dimer
a1-alpha2-dimer
1436 Medium
YKL112W
ABF1
1993 High
YMR072W
ABF2
541 Medium
YER045C
ACA1
8
YLR131C
ACE2
918 Incorrect
YLR131C
ACE2
1332 High
YDR448W
ADA2
0
Medium
corresponds best and has fewer
empty columns in the PWM.
Two motifs from PBMs resemble
monomeric GAL4-like motif. 2192
agrees best with ChIP-chip data and
expression data.
PBM motifs are a classical C2H2
motif that match each other and
have some correspondence to ChIP-chip data. 2194 has highest
correspondence to ChIP-chip.
Three motifs available, from PBMs;
two dimeric GAL4-like motifs but
with different spacings and one
monomeric. No backup data but
looks tidy. Keep all three.
Not clear that motif is optimal.
Most motifs are similar, and five
have pegged the ChIP P-value.
Choose 791- it's the highest scoring
overall, and is from PBMs
Dubious
Protein is not expected to be
sequence specific. But motif is
obtained in vitro. May need further
investigation. Give medium
confidence, but label as dubious.
Literature motif 8 is supported by
experimental investigation, and
resembles a bZIP site, but has no
other support; motif was not
obtained objectively. Can bind as
heterodimer. The highest-scoring
motif (from ChIP, 1457) has low
information content - I'm concerned
it is learning other features of
bound promoters.
Likely represents Rap1 binding site.
Highest-scoring ChIP-chip motif is
Rap1 site. MITOMI motif 1332 is
next, and resembles the classic
Swi5/Ace2 motif.
Dubious Unlikely to be true TF.
YDR216W
ADR1
576 High
YGL071W
AFT1
658 High
YPL202C
AFT2
389 High
MATALPHA1-MCM1-dimer
alpha1-MCM1-dimer
1442 Medium
YER069W
ARG5,6
1426 Medium
YMR042W
ARG80
1483 Medium
YML099C
ARG81
1506 High
YML099C
ARG81
1507 Incorrect
YDR421W
ARO80
725 High
YDR421W
ARO80
1509 High
YDR421W
ARO80
2115 High
PBM motif 576 has significant
correspondence to both ChIP-chip
and highest to expression data. And
has a classic yeast C2H2 look.
Most motifs are similar. Also very
similar to AFT2 motifs. ChIP-chip
motif 658 scores highest on both
ChIP-chip and expression data.
All motifs look similar. ChIP-chip
motif 389 scores high on ChIP-chip
data and also best on expression
data.
Not clear that motif is optimal.
Not clear that motif is optimal.
Motif 1482 is an Arg81 site. 1483,
however, is similar to Mcm1.
Choose this, give Medium
confidence.
ChIP motif 1506 correlates well
with ChIP and also with expression
data. Resembles dimeric GAL4
class motif.
Likely represents Mcm1 binding
site.
PBM motif 2115 appears
monomeric and has highest
correspondence to ChIP-chip data.
ChIP motif 1509 appears dimeric
and correlates with ChIP data.
Literature motif 725 appears
trimeric and has experimental
support. Retain all three.
PBM motif 2115 appears
monomeric and has highest
correspondence to ChIP-chip data.
ChIP motif 1509 appears dimeric
and correlates with ChIP data.
Literature motif 725 appears
trimeric and has experimental
support. Retain all three.
PBM motif 2115 appears
monomeric and has highest
correspondence to ChIP-chip data.
ChIP motif 1509 appears dimeric
YPR199C
ARR1
603 Medium
YIL130W
ASG1
807 Medium
YIL130W
ASG1
2116 Medium
YKL185W
ASH1
28
YKL185W
ASH1
648 Incorrect
YKL185W
ASH1
932 Incorrect
YGR097W
ASK10
0
YOR113W
AZF1
499 High
YKR099W
BAS1
402 High
Medium
and correlates with ChIP data.
Literature motif 725 appears
trimeric and has experimental
support. Retain all three.
Only motif 603 has significant
scores with ChIP-chip and
expression data; looks somewhat
like a YAP motif
Two PBM motifs appear to
represent monomeric and dimeric
versions of the same motif. This is
the dimeric version. No other
supporting data; hence medium
confidence.
Two PBM motifs appear to
represent monomeric and dimeric
versions of the same motif. This is
the monomeric version. No other
supporting data; hence medium
confidence. Picked 2116 because it
has a higher GO score and
expression score.
The literature motif may not
represent the full binding activity of
the protein. Also, it is not supported
by ChIP-chip. ChIP-chip identifies
Mcm1-like motifs. But, it does
score highly in both ChIP-chip and
expression. The only higher-scoring
motif has almost no information
content.
Likely represents Mcm1 binding
site.
Likely represents Mcm1 binding
site.
Dubious
I did not find any evidence that this
is a sequence-specific DNA-binding
protein.
PBM motif 499 scores as well as
the ChIP-chip motifs, but without
the circularity. No significant data
except ChIP-chip, however.
Virtually all motifs are similar, with
GAGTCA core. ChIP motif 402 has
YNL039W
YDL074C
YER159C
BDP1
BRE1
BUR6
0
0
0
Dubious
Dubious
Dubious
YKL005C
BYE1
0
Dubious
YDR423C
CAD1
2073 High
YDR423C
CAD1
2098 High
YMR280C
CAT8
33
YJR060W
CBF1
1346 High
YGR140W
YMR213W
CBF2
CEF1
0
0
YMR168C
CEP3
524 High
YLR098C
CHA4
1607 Incorrect
Medium
Dubious
Dubious
highest correspondence to both
ChIP-chip and expression data.
Unlikely to be true TF.
Unlikely to be true TF.
Unlikely to be true TF.
SGD: "Negative regulator of
transcription elongation, contains a
TFIIS-like domain and a PHD
finger, multicopy suppressor of
temperature-sensitive ess1
mutations, probably binds RNA
polymerase II large subunit". No
evidence this is a sequence-specific
TF.
Classic YAP motif in most cases.
Include examples of both
overlapping and adjacent
monomeric sites - there are
examples of both in PBM data and
they both score highly on ChIP
data. This one is overlapping.
Classic YAP motif in most cases.
Include examples of both
overlapping and adjacent
monomeric sites - there are
examples of both in PBM data and
they both score highly on ChIP
data. This one is adjacent.
Near-classic dimeric GAL4 motif.
Literature-based. Not clear this is
an optimal site but it does bind.
Seems to hit the right GO category.
Classic E-box. MITOMI motif
1346 nearly has highest
correspondence to ChIP-chip data
and is non-circular; no other
supporting data
Unlikely to be true TF.
Unlikely to be true TF.
Two PBM motifs agree. Went with
524 because it appears neater. No
other supporting data for any of
them.
Likely represents Rap1 binding site.
YLR098C
CHA4
2120 High
YER164W
CHD1
0
YOR028C
CIN5
409 High
YOR028C
CIN5
1349 Medium
YPR013C
CMR3
859 High
YER130C
COM2
534 High
YNL027W
CRZ1
516 High
YIL036W
CST6
585 High
YGL166W
CUP2
48
YPL177C
CUP9
2121 High
Medium
Two PBM motifs agree, and PBM
motif 2120 has highest
correspondence to ChIP-chip data,
even higher than the best ChIP-chip motif. Has a GAL4-like
appearance, albeit a variant.
Monomeric. (Highest scoring motif
- 1607 - is actually a Rap1 motif).
Dubious Unlikely to be true TF.
Most motifs match the classic YAP
motif. This is the best in vivo motif
(highest match to ChIP-chip).
Most motifs match the classic YAP
motif. This is the best in vitro motif
(highest match to ChIP-chip) and it
is different from the ChIP-based
motifs - might reflect homo vs.
heterodimer?
PBM motifs are very similar. No
other supporting data, but it's a
clean motif. Chose 859 because it
most closely resembles motif from
paralog YPR015c.
PBM motif 534 has the highest
correspondence to expression data.
Not much else supporting any of
the motifs, although the two PBM
motifs look about the same. Also
look like typical yeast C2H2 motifs.
PBM motif 516 scores highest on
ChIP and expression; resembles
classic literature motifs
PBM motif 585 correlates with
expression data (deletion and
overexpression). ChIP motif 1466
has higher ChIP score but is lower
on expression.
Three motifs account for three
possible spacings in the literature
motif; it is not clear that this is the
optimal site, however
MITOMI and PBM motifs are
similar. PBM motif 2121 has
slightly lower correspondence to
YKR034W
DAL80
1355 Medium
YIR023W
DAL81
53
YNL314W
DAL82
690 High
YML113W
DAT1
1416 Medium
YPL049C
DIG1
0
YER088C
DOT6
2221 High
YLR228C
ECM22
849 High
Low
ChIP data, but more significant
correspondence to expression data.
MITOMI motif 1355 has highest
correspondence to ChIP-chip. But
it's not striking; none of them are,
despite the fact that this is a classic
GATA site (GATAAG).
None of the motifs agree with each
other. The literature motif
characterization was indirect; hence
low confidence that this is the true
motif. The ChIP-chip motifs score
higher on ChIP data but that's
circular.
PBM and ChIP-chip motifs agree;
select ChIP-chip as it scores higher
on ChIP-chip although the extra A's
on the side could be either due to
the FL protein or some other in
vivo factor.
The literature (e.g. PMID:
8532535) suggests that the
sequence specificity may be more
promiscuous than the name
suggests. To my knowledge there
has not been any SELEX or PBM
demonstrating that any motif is
correct. But, it does bear some
relationship to ChIP-chip and
expression data.
Dubious
Not a TF - it binds Ste12; all the
motifs are Ste12 motifs.
PBM motif 812 most closely
resembles that of homolog TOD6,
which is well-supported; has
highest correlation to both ChIP
and expression data.
PBM motif 2122 is a monomeric
GAL4 class motif, and scores
highest on both ChIP and
expression data. 849 is a classic
dimeric GAL4 motif with lower but
still reasonable scores and is
moderately predictive across the
board.
YLR228C
ECM22
2122 High
YPL021W
ECM23
578 High
YMR176W
ECM5
0
YBR033W
EDS1
2093 High
YBR239C
ERT1
2188 Medium
YNR054C
ESF2
0
Dubious
YNL023C
FAP1
0
Dubious
YDL166C
FAP7
0
Dubious
YPR104C
YPR104C
YPR104C
YPR104C
YPR104C
YPR104C
FHL1
FHL1
FHL1
FHL1
FHL1
FHL1
406
629
893
1196
1504
1618
YPR104C
FHL1
2203 High
Dubious
Incorrect
Incorrect
Incorrect
Incorrect
Incorrect
Incorrect
PBM motif 2122 is a monomeric
GAL4 class motif, and scores
highest on both ChIP and
expression data. 849 is a classic
dimeric GAL4 motif with lower but
still reasonable scores and is
moderately predictive across the
board.
PBM motif 578 strongly resembles
that from other yeast GATA-class
TFs
Unlikely to be true TF.
PBM and ChIP-chip motifs are very
similar. PBM motif 2093 scores
most significantly on ChIP data.
Classic GAL4 class motif.
Three PBM motifs are all classic
monomeric GAL4 motifs. Chose
2188 because it has fewer
noninformative flanking positions,
and higher significance on
expression data. Also, 826 has the
CCGG core that I suspect may be
an artefact of PBMs or the DBD
clones used in these studies. The
highest-scoring ChIP motif is
circular and does not resemble a
GAL4 class binding site.
This is supposed to be a ribosome
biogenesis factor. I found no
evidence that it is a sequence-specific DNA-binding protein.
Unlikely to be true TF.
This is supposed to be a ribosome
biogenesis factor. I found no
evidence that it is a sequence-specific DNA-binding protein.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
ChIP-chip motifs are all Rap1.
PBMs identify a different motif
YIL131C
FKH1
2002 High
YNL068C
FKH2
830 High
YER109C
FLO8
67
YCL058C
FYV5
1417 Low
YGL254W
FZF1
69
YDR009W
GAL3
0
YPL248C
GAL4
1510 High
Low
Low
which also corresponds to ChIP-chip data. Selected 2203 as it scores
highest on ChIP-chip and
expression data.
Classic Forkhead motif for most of
them. 2002 strongly resembles
PBM motif but scores higher on
both ChIP (which is circular) and
expression (which is not).
Most motifs are classic Forkhead.
PBM motif 830 is one of the
highest scoring and is not circular.
Dubious
I found no evidence that this is a
sequence-specific DNA-binding
protein, i.e. that it binds directly to
DNA in vitro. The motif has a
limited relationship to ChIP-chip
data. The literature motif scores
better than the motif derived from
the ChIP-chip study. Also, the
motif is identical to that for MSS11.
Literature motif is derived from a
single promoter and while the
protein seems to have some DNA-binding activity, perhaps in
conjunction with other TFs, I find
the evidence supporting this precise
binding site incomplete, since it is
derived from a single site. Hence,
low confidence in the motif.
Literature motif is the only one that
appears credible. PBM motif I
believe is a known artifact.
Literature motif gets low
confidence however as it is based
on a single known binding
sequence.
Dubious
Gal3 is not a sequence-specific
DNA-binding protein
ChIP-chip motif 1510 resembles
literature motif, and PBM motif
875, but scores highly on ChIP and
expression data, across the board.
Note, however, that the high ChIP-chip scores stem from an
YPL248C
GAL4
2206 High
YML051W
GAL80
0
YFL021W
GAT1
962 High
YLR013W
GAT3
2128 High
YIR013C
GAT4
565 High
YEL009C
GCN4
1363 High
YPL075W
GCR1
2071 High
experiment with high negative
correlation. PBM motif 2206
appears to be a monomeric version,
scores even higher on ChIP-chip
and expression.
ChIP-chip motif 1510 resembles
literature motif, and PBM motif
875, but scores highly on ChIP and
expression data, across the board.
Note, however, that the high ChIP-chip scores stem from an
experiment with high negative
correlation. PBM motif 2206
appears to be a monomeric version,
scores even higher on ChIP-chip
and expression.
Dubious
Gal80 is not a sequence-specific
DNA-binding protein
ChIP-chip motif 962 scores higher
on both ChIP-chip and expression
data
All PBM motifs look similar, also
similar to a subset of other GATAs.
2128 scores quite highly on ChIP-chip (albeit with negative
correlation!), and also higher on
expression and OE data.
Two PBM motifs look similar, also
similar to a subset of other GATAs.
565 scores higher on expression and
OE data.
Virtually all motifs look the same.
MITOMI motif 1363 is as good as
any of the ChIP-chip motifs but not
circular; scores high across the
board.
Gcr2 is not a DNA-binding protein.
SGD: "Gcr1p is a DNA-binding
protein interacting with the
consensus sequence CTTCC,
whereas Gcr2p interacts with
Gcr1p". But, ChIP-chip motif 606
is probably the best Gcr1 motif
available (even though it came from
Gcr2 ChIP).
YNL199C
GCR2
0
YDR096W
GIS1
562 High
YER040W
GLN3
539 High
YJL103C
GSM1
856 Medium
YGL181W
GTS1
694 Low
YJL110C
GZF3
2133 High
YPR008W
HAA1
1425 Medium
YFL031W
HAC1
94
YFL031W
HAC1
1788 High
Medium
Dubious
Gcr2 is not a DNA-binding protein.
SGD: "Gcr1p is a DNA-binding
protein interacting with the
consensus sequence CTTCC,
whereas Gcr2p interacts with
Gcr1p". But, ChIP-chip motif 606
is probably the best Gcr1 motif
available (even though it came from
Gcr2 ChIP).
All motifs similar; PBM motif 562
has highest correspondence to
deletion expression data and
overexpression data
Most motifs are classic GATA or
GATAAG. PBM motif 539 scores
highest on ChIP.
Only PBM motif 856 reaches
significance, on expression. Classic
GAL4-type monomeric site. No
other data, relation to expression
not strong. Medium confidence.
None of the three motifs resembles
an AT-hook binding site. Only one
correlates with the ChIP-chip data,
but that's circular. Low confidence.
Classic GATA motif 2133 from
PBM scores highest on ChIP-chip
and expression data
Literature motif is not completely
determined, but scores highly on
ChIP-chip data. Regardless,
medium confidence.
1788 is the overall winner. But,
literature motif 94 also scores well
in ChIP-chip, despite being
somewhat different. Possible
difference in heterodimerization
partners, or proteolytic fragment?
Retain both.
1788 is the overall winner. But,
literature motif 94 also scores well
in ChIP-chip, despite being
somewhat different. Possible
difference in heterodimerization
YOL089C
HAL9
799 High
YOL089C
HAL9
2134 High
YLR256W
HAP1
2078 High
YGL237C
HAP2
695 High
YBL021C
HAP3
695 High
YKL109W
HAP4
695 High
YOR358W
HAP5
695 High
partners, or proteolytic fragment?
Retain both, score 94 as medium.
PBM motifs 799 and 2134 score
highest on ChIP-chip data; classic
dimeric and monomeric GAL4
sites, respectively.
PBM motifs 799 and 2134 score
highest on ChIP-chip data; classic
dimeric and monomeric GAL4
sites, respectively.
Literature binding site is direct
CGG repeats with a 6bp spacer
(PMID: 7958882). PBM motif 2078
gets this; it scores highest overall,
including significant scores on both
ChIP-chip and expression.
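A minimal sketch of how the site architecture described in this note could be scanned for, assuming that "direct CGG repeats with a 6bp spacer" means CGG, any six bases, then CGG on the same strand. The function name and the example sequence are hypothetical and are not taken from the curated motif itself.

```python
import re

# Assumption: CGG, six unspecified bases, CGG, all read on one strand.
# The lookahead lets overlapping sites be reported as separate hits.
CGG_N6_CGG = re.compile(r"(?=(CGG[ACGT]{6}CGG))")

def find_cgg_direct_repeats(sequence):
    """Return (start, site) pairs for every CGG-N6-CGG match in sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CGG_N6_CGG.finditer(seq)]

# Hypothetical promoter fragment, for illustration only.
print(find_cgg_direct_repeats("aaCGGttttttCGGaa"))
```

A position weight matrix such as motif 2078 carries more information than this consensus pattern; the regular expression only illustrates the spacing constraint.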
Subunit of the heme-activated,
glucose-repressed Hap2/3/4/5
CCAAT-binding complex - there
should be a single motif for all four
proteins, containing CCAAT. ChIP-chip motif 695 resembles
CCAATCA, and scores highly on
ChIP-chip, OE, and deletion
expression data.
Subunit of the heme-activated,
glucose-repressed Hap2/3/4/5
CCAAT-binding complex - there
should be a single motif for all four
proteins, containing CCAAT. ChIP-chip motif 695 resembles
CCAATCA, and scores highly on
ChIP-chip, OE, and deletion
expression data.
Subunit of the heme-activated,
glucose-repressed Hap2/3/4/5
CCAAT-binding complex - there
should be a single motif for all four
proteins, containing CCAAT. ChIP-chip motif 695 resembles
CCAATCA, and scores highly on
ChIP-chip, OE, and deletion
expression data.
Subunit of the heme-activated,
glucose-repressed Hap2/3/4/5
YCR065W
HCM1
570 High
YBL008W
HIR1
0
YOR038C
HIR2
0
YJR140C
HIR3
0
YCL067C
HMLALPHA2 2079 Medium
YCL067C
HMLALPHA2 2102 Medium
YDR174W
HMO1
2249 Low
YCR096C
HMRA2
558 Medium
CCAAT-binding complex - there
should be a single motif for all four
proteins, containing CCAAT. ChIP-chip motif 695 resembles
CCAATCA, and scores highly on
ChIP-chip, OE, and deletion
expression data.
PBM and SAAB/EMSA motifs
both look similar to standard FH
motif. PBM motif 570 has stronger
correspondence to expression data.
Dubious
Hir1,2,3 are a nucleosome assembly
complex, not TFs
Dubious
Hir1,2,3 are a nucleosome assembly
complex, not TFs
Dubious
Hir1,2,3 are a nucleosome assembly
complex, not TFs
Protein is similar to
PBX/MEIS/TGIF; both PBM
motifs have some similarity (central
ACA/TGT), so do sites in crystal
and in vivo (e.g. PMID: 1682054)
but no clear winner between the
two. Keep both PBM motifs in
curated set (2102 and 2079) but
give medium confidence - no
supporting ChIP or expression data.
Protein is similar to
PBX/MEIS/TGIF; both PBM
motifs have some similarity (central
ACA/TGT), so do sites in crystal
and in vivo (e.g. PMID: 1682054)
but no clear winner between the
two. Keep both PBM motifs in
curated set (2102 and 2079) but
give medium confidence - no
supporting ChIP or expression data.
This motif is uncharacteristic for a
Sox protein and HMG proteins
typically do not bind DNA in a
sequence specific manner. Since it
is from ChIP data it could be a
cofactor motif. Low confidence.
Should be similar to
MATALPHA2. The one PBM
YOR032C
HMS1
1498 Medium
YJR147W
HMS2
992 Low
YLR113W
HOG1
0
YMR172W
HOT1
0
YGL073W
HSF1
411 Medium
YGL073W
HSF1
476 Medium
YGL073W
HSF1
615 Medium
YGL073W
HSF1
1461 Medium
YDR225W
HTA1
0
motif is indeed related to the
MITOMI motif for MATALPHA2.
Motif 1498 scores reasonably on
ChIP. Other corroborating data are
not that convincing - medium
confidence.
The one ChIP-chip motif bears little
relationship to the ChIP data. It kind
of looks like an HNF-like site, but
still, low confidence.
Dubious
This is a signalling molecule that
associates with many TFs (see
SGD)
Dubious Unlikely to be true TF.
Four types of motifs contain TTC
monomeric core and all score
highly on both ChIP and
expression. Appear to represent
different monomeric/multimeric
binding configurations. This is the
spaced direct repeat dimeric site.
From ChIP.
Four types of motifs contain TTC
monomeric core and all score
highly on both ChIP and
expression. Appear to represent
different monomeric/multimeric
binding configurations. This is the
monomeric site. From PBM.
Four types of motifs contain TTC
monomeric core and all score
highly on both ChIP and
expression. Appear to represent
different monomeric/multimeric
binding configurations. This is the
trimeric site. From ChIP.
Four types of motifs contain TTC
monomeric core and all score
highly on both ChIP and
expression. Appear to represent
different monomeric/multimeric
binding configurations. This is the
dimeric head-to-tail site. From ChIP
and prior.
Dubious Unlikely to be true TF.
YBL003C
HTA2
0
YLR223C
IFH1
0
YJR094C
IME1
0
YGL192W
IME4
1000 Low
YDR123C
INO2
713 High
YOL108C
INO4
713 High
YOR304W
YGL133W
ISW2
ITC1
0
0
YKL032C
IXR1
0
Dubious Unlikely to be true TF.
Dubious
Cofactor of Fhl1p. No evidence for
sequence-specific DNA-binding.
Dubious
Interacts with UME6. The only
significant motif shares 5/6 bases
with the UME6 motif core
(GCCGCC)
Dubious
I could not find evidence that IME4
is a sequence-specific DNA-binding
protein. C3H1 is more typically an
RNA-binding domain or something
besides nucleic acid binding. There
is one significant ChIP-chip motif
but perhaps it binds through a
cofactor. No other supporting data.
Ino2/4 binds as a heterodimer, so
there should just be one motif for
the two proteins. All motifs appear
similar but none of them is derived
from in vitro data. Nonetheless
most motifs match a classic E-box
with some preference for flanking
bases. Motif 713 is derived from
ChIP-chip; it is not the highest-scoring ChIP-chip motif but it is
highest for OE and deletion
expression.
Ino2/4 binds as a heterodimer, so
there should just be one motif for
the two proteins. All motifs appear
similar but none of them is derived
from in vitro data. Nonetheless
most motifs match a classic E-box
with some preference for flanking
bases. Motif 713 is derived from
ChIP-chip; it is not the highest-scoring ChIP-chip motif but it is
highest for OE and deletion
expression.
Dubious Unlikely to be true TF.
Dubious Unlikely to be true TF.
Dubious
Binds cisplatin-modified DNA.
HMG domains. ChIP-chip motifs
not significant. Dubious and no
credible motif.
YER051W
JHD1
662 Low
Dubious
YNL227C
JJJ1
0
Dubious
YCL055W
KAR4
127 Low
Dubious
YNL132W
KRE33
0
Dubious
YGR040W
KSS1
0
Dubious
YLR451W
LEU3
781 High
YLR451W
LEU3
2135 High
YDR034C
LYS14
133 High
YDR034C
LYS14
865 High
YMR021C
MAC1
1540 High
This is a histone demethylase. No
evidence for direct DNA binding.
Motif 662 is significant. Include,
but give low confidence - could be
a cofactor.
Unlikely to be true TF.
Evidence for sequence specific
DNA binding seems weak, hence
low confidence
This is supposed to be a ribosome
biogenesis factor. I found no
evidence that it is a sequence-specific DNA-binding protein.
There is no evidence that Kss1 is a
sequence-specific TF.
Most motifs look similar - dimeric
GAL4 motif. Literature motif (781)
has high correspondence to ChIP-chip and expression data and is not
circular. But, PBM motif 2135,
which is a monomeric GAL4 motif,
scores highest on both ChIP-chip
and expression data.
Most motifs look similar - dimeric
GAL4 motif. Literature motif (781)
has high correspondence to ChIP-chip and expression data and is not
circular. But, PBM motif 2135,
which is a monomeric GAL4 motif,
scores highest on both ChIP-chip
and expression data.
PBM motifs are virtually identical
and appear monomeric; literature
motif is dimeric. Include both.
Choose PBM motif 865 as it
appears to have more robust CGG.
PBM motifs are virtually identical
and appear monomeric; literature
motif is dimeric. Include both.
Choose PBM motif 865 as it
appears to have more robust CGG.
Literature motif 1540 most closely
corresponds to ChIP-chip data (albeit barely significant).
YGR288W
MAL13
0
YBR297W
MAL33
0
YCR040W
MATALPHA1 1418 Low
YCR039C
MATALPHA2 1364 High
YOR298C-A
MBF1
0
YDL056W
MBP1
2138 High
MBP1-SWI6-dimer
MBP1-SWI6-dimer
0
YMR043W
MCM1
831 High
YGL197W
MDS3
0
Nothing else to gauge by, but no
reason to doubt literature motif.
None of the ChIP-chip motifs
correspond well to the data they
come from and/or resemble a GAL4
motif.
None of the ChIP-chip motifs
correspond well to the data they
come from and/or resemble a GAL4
motif.
According to PMID: 15118075,
binds the "Q site" which has
"consensus" ACAATGACAG.
Seems all that is in common is the
CAAT. I believe further study is
required.
According to PMID: 9858582, "A
comparison of the 2 binding sites in
both asg and hsg operators yields
the same consensus sequence, 5'-CATGTA-3'"; results in Figure 2 of
the same paper support a consensus
of CATGTAA. MITOMI yields
ACATG, which is the reverse
complement of most of the
literature consensus. Motif 1364 has
highest information content; use
this.
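The reverse-complement relationship mentioned in this note can be checked with a short sketch. It uses only the two sequences quoted above (CATGTAA and ACATG); the helper name is hypothetical, and the check is a plain substring test rather than a comparison of position weight matrices.

```python
# Complement table for unambiguous DNA bases only (an assumption of this
# sketch; IUPAC ambiguity codes are not handled).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(site):
    """Return the reverse complement of a DNA string."""
    return site.upper().translate(_COMPLEMENT)[::-1]

literature_consensus = "CATGTAA"  # consensus supported by the cited paper
mitomi_core = "ACATG"             # core reported for the MITOMI motif

rc = reverse_complement(literature_consensus)
print(rc)                 # TTACATG
print(mitomi_core in rc)  # True
```

ACATG is the reverse complement of CATGT, the first five bases of the literature consensus, which matches the statement that it covers "most of" that consensus.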
Dubious
This is a coactivator. I found no
evidence that it is a sequence-specific TF.
Almost all motifs look similar to
literature binding site. PBM motif
2138 scores at the top on ChIP-chip
and expression. And is non-circular.
Redundant with MBP1
Most motifs resemble a classic SRF
site. PBM motif 831 scores highly
across the board, except for
expression data where none does
well, and its scores are non-circular.
Dubious
I found no evidence that this is a
sequence-specific DNA-binding
protein. ChIP-chip motif does not
YIL128W
MET18
0
YIR017C
MET28
0
YPL038W
MET31
1370 High
YDR253C
MET32
2140 High
YNL103W
MET4
0
correlate with ChIP-chip data, or
anything else.
Dubious
I found no evidence that this is a
sequence-specific DNA-binding
protein. ChIP-chip motif does not
correlate with ChIP-chip data, or
anything else.
Like MET4, component of a
complex. SGD: "Basic leucine
zipper (bZIP) transcriptional
activator in the Cbf1p-Met4p-Met28p complex" ... "Both Met4p
and Met28p bind to DNA only in
the presence of Cbf1p, and the
presence of Cbf1p and Met4p
stimulates the binding of Met28p to
DNA (1, 2).". ChIP-chip motif 703
(CTGTGG) is clearly the Met31/32
motif. The other ChIP-chip motif is
essentially poly-A, and scores
poorly. Hence, neither of these
motifs represents the intrinsic
sequence specificity of MET28.
Need in vitro data for complexes.
Most motifs look similar. MITOMI
motif 1370 has highest overall
correlation to ChIP-chip, OE, and
deletion data.
Most motifs look similar. PBM
motif 2140 has highest
correspondence to both ChIP and
expression.
My understanding is that Met4 is a
modifier of the specificity of other
proteins. SGD states that it
"requires different combinations of
the auxiliary factors Cbf1p,
Met28p, Met31p and Met32p".
ChIP-chip motifs 1023 and 1024 I
believe are cofactor motifs; they are
E-boxes. ChIP-chip motif 689 is
different and matches Met28 and
Met32 motifs. (CTGTGG core).
Met28 is a bZIP protein, and Met32
is a C2H2. MITOMI motif for
Met32 is TGTGG. So this is the
YGR249W
MGA1
2141 Medium
YGL035C
MIG1
2142 High
YGL209W
MIG2
2143 High
YER028C
MIG3
2144 High
YER068W
MOT2
556 Medium
YMR070W
MOT3
2080 Medium
YOL116W
MSN1
1376 High
Met32 motif. I do not believe that
any of the Met4 motifs is correct.
Need to obtain motifs for
complexes.
PBM motif 2141 is similar to Hsf1
motif 476 (TTCCA). Has TTC
"core" which is shared by most
Hsf1 motifs. Scores reasonably on
ChIP data but no other supporting
information; hence "medium".
PBM motif 2142 has highest
correspondence to ChIP-chip AND
AUC for GO category "generation
of precursor metabolites and
energy". The adjacent A/T stretch,
which is also noted in the literature,
is found in ChIP-chip motif 654 and
others; however, that motif does not
sort as well for GO category
"generation of precursor
metabolites and energy" and also
scores lower for both ChIP and
expression, so it seems unlikely to
represent a key intrinsic activity of
the protein itself.
PBM motif 2143 has highest
correspondence to ChIP-chip data
PBM motif 2144 has highest
correspondence to ChIP-chip data
PBM motif 556 has high
correspondence to ChIP-chip data.
However, also resembles TATA
element, and could also be a
structural motif. RRMs normally
bind single-stranded RNA or DNA.
Give medium confidence.
PBM motif 2080 is very similar to
the literature motif and scores
highest on expression data.
Moreover, this motif explains high-scoring ChIP-chip motifs for many
other TFs, e.g. Nrg1, Yap6, Sok2
MITOMI motif 1376 has the
highest correspondence to ChIP-chip. MITOMI motif 1378 is very
YOL116W
MSN1
1378 High
YMR037C
MSN2
1380 High
YKL062W
MSN4
518 High
YMR164C
MSS11
204 Low
Dubious
YDR277C
MTH1
0
Dubious
YOR372C
NDD1
0
Dubious
YOR372C
NDD1
366 Incorrect
YLR254C
NDL1
0
YHR124W
NDT80
1464 High
Dubious
close, however, and seems to be a
circular permutation. Retain both
motifs.
MITOMI motif 1376 has the
highest correspondence to ChIP-chip. MITOMI motif 1378 is very
close, however, and seems to be a
circular permutation. Retain both
motifs.
MITOMI motif 1380 has the
highest overall correspondence to
ChIP-chip, overexpression, and
deletion data. Resembles classic
Msn2/4 motif.
PBM motif 518 resembles both the
classical MSN motif and the PBM
motif, and scores highest on both
expression and ChIP-chip.
There is no evidence that this is a
sequence-specific DNA-binding
protein, rather than a cofactor. The
motif has a limited relationship to
ChIP-chip data. The literature motif
scores better than the motif derived
from the ChIP-chip study. Also, the
motif is identical to that for FLO8.
SGD: "interacts with Rgt1p and the
Snf3p and Rgt2p glucose sensors".
There is no evidence that this is a
sequence-specific transcription
factor.
There is no evidence that this is a
sequence-specific DNA-binding
protein, either in vitro or in vivo or
in its sequence. The ChIP-chip
motif that scores most highly is
actually an MCM1 motif, which is
consistent with the role of NDD1 as
a "transcriptional activator essential
for nuclear division".
Likely represents Mcm1 binding
site.
Unlikely to be true TF.
Motif 1464 matches literature
motifs and PBM motif, and nails
YOR156C
NFI1
0
YDL002C
NHP10
502 Low
YPR052C
NHP6A
879 Medium
YBR089C-A
NHP6B
792 Medium
YGR089W
NNF2
0
YDR043C
NRG1
2148 High
sporulation on GO. It also has the
highest correspondence to ChIP-chip data.
Dubious Unlikely to be true TF.
Dubious NHP10 is an HMGB-type protein.
Known to prefer DNA ends. There
is no independent support for the
single PBM motif.
NHP6A and NHP6B are similar to
the HMGB family, which is thought
to lack sequence specificity.
However, the proteins do bend the
DNA when they bind, and so may
have some level of sequence
specificity. Essentially similar
motifs were obtained for the two
different proteins (in the same
study) and the PBM motif for
Nhp6A has a good correspondence
to ChIP-chip data. Give both
Medium confidence.
NHP6A and NHP6B are similar to
the HMGB family, which is thought
to lack sequence specificity.
However, the proteins do bend the
DNA when they bind, and so may
have some level of sequence
specificity. Essentially similar
motifs were obtained for the two
different proteins (in the same
study) and the PBM motif for
Nhp6A has a good correspondence
to ChIP-chip data. Give both
Medium confidence.
Dubious I did not find any evidence that this
is a sequence-specific DNA-binding
protein.
PBM, ChIP-chip, and literature
motifs all appear very similar, and
resemble motif for the related
protein NRG2. Choose top PBM
motif (2148). There is also a
recurring ChIP-chip motif
(TGTGCCT) which I believe is
actually the MOT3 binding site.
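To check how often a recurring word such as TGTGCCT occurs in a set of promoters, both strands need to be scanned. The sketch below does this for exact matches; the promoter sequence is made up for illustration, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str):
    """Return (position, strand) for each exact match of site on either strand."""
    seq, site = seq.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Made-up promoter fragment with one site on each strand.
promoter = "AATGTGCCTTTAGGCACAGG"
print(find_sites(promoter, "TGTGCCT"))
```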
YBR066C
NRG2
1383 High
YAL051W
OAF1
2060 Medium
YKR064W
OAF3
0
YHL020C
OPI1
0
YML065W
ORC1
1549 High
YBR060C
ORC2
0
YFL044C
OTU1
1166 Low
MITOMI motif 1383 looks like a
classic yeast C2H2 binding site
(row of G's). Also resembles motifs
obtained by both ChIP and PBMs
for related protein Nrg1.
Motif 2060 has a strong
resemblance to the literature motifs
for the Oaf1-Pip2 dimer, and scores
highly on both ChIP and expression
data. No in vitro support and it's
kind of weak looking so Medium
confidence.
I do not see how either of these
motifs could possibly be a Gal4-class binding motif. And, there is
no correspondence to any of the
data, even the ChIP-chip data from
which it is derived.
Dubious Motifs do not match and do not
explain the ChIP-chip data from
which they are derived. Motif 1049
resembles the expected UAS-INO
(Ino2/4) binding site
(CATGTGAAAT) - Opi1 acts as a
repressor by binding Ino2. I believe
this protein is a corepressor, and
Ino2/4 are the DNA-binding
factors. Dubious as sequence-specific TF.
Looks like ORC1 motif. ORC1 is
not really a TF, but it is a sequence-specific DNA-binding protein.
Dubious Unlikely to be true TF.
ChIP-chip motif 1166 has a good
relationship to ChIP-chip data, but
it is unusual for C2H2 motifs to be
A/T rich, and there is no other
support for this motif, so it could be
a cofactor, nucleosome-excluding,
TATA element, etc. In addition, it
only has a single C2H2 domain,
and is known to function as a
deubiquitylation enzyme. Low
confidence.
YDR081C
PDC2
1050 Low
YGL013C
PDR1
485 High
YGL013C
PDR1
899 Incorrect
YBL005W
PDR3
1387 Medium
YBL005W
PDR3
2062 Medium
YLR266C
PDR8
244 Medium
YLR266C
PDR8
528 Medium
YDR323C
PEP7
0
Motifs do not correlate with the
ChIP-chip data from which they were
derived. I found no other
experimental evidence that this is a
sequence-specific DNA-binding
protein. However, it does have
HTH and transposase motifs. Retain
motif 1050 but give low
confidence.
PBM motif 485 looks like a
traditional literature motif and has
highest correspondence to ChIP and
expression data. Dimeric GAL4
motif.
Likely represents Rap1 binding site.
MITOMI yields a simple GAL4
monomeric site that scores well in
ChIP-chip data. ChIP-chip yields a
dimeric site that resembles the
literature site. In vivo, PDR1 and
PDR3 may form heterodimers.
Retain both. This is the monomeric
motif.
MITOMI yields a simple GAL4
monomeric site that scores well in
ChIP-chip data. ChIP-chip yields a
dimeric site that resembles the
literature site. In vivo, PDR1 and
PDR3 may form heterodimers.
Retain both. This is the dimeric
ChIP-chip motif.
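The monomeric versus dimeric distinction can be made concrete by looking for paired CGG half-sites. The sketch below reports inverted CGG...CCG pairs and their spacer lengths, assuming exact half-site matches; the classic Gal4 site is usually written CGG-N11-CCG, but other GAL4-class proteins use other spacings and orientations, so the spacer range is a free parameter. The function name and test sequence are hypothetical.

```python
import re


def inverted_cgg_pairs(seq: str, min_spacer: int = 0, max_spacer: int = 12):
    """Return (start, spacer_length) for each CGG ... CCG inverted pair.

    start is the position of the upstream CGG; spacer_length is the number
    of bases between the two half-sites.
    """
    seq = seq.upper()
    # Lookahead patterns so that overlapping half-sites are all reported.
    cgg_starts = [m.start() for m in re.finditer("(?=CGG)", seq)]
    ccg_starts = [m.start() for m in re.finditer("(?=CCG)", seq)]
    pairs = []
    for i in cgg_starts:
        for j in ccg_starts:
            spacer = j - (i + 3)
            if min_spacer <= spacer <= max_spacer:
                pairs.append((i, spacer))
    return pairs


# Made-up sequence with one CGG-N11-CCG arrangement.
print(inverted_cgg_pairs("TTCGGATATATATATACCGTT"))
# A lone half-site yields no pairs, i.e. it looks monomeric.
print(inverted_cgg_pairs("TTCGGATATAT"))
```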
Both motifs are equally credible but
have very limited support.
Literature motif is related to the
YRR1 literature motif. PBM motif,
however, is a classic GAL4
monomer. This is the literature
motif.
Both motifs are equally credible but
have very limited support.
Literature motif is related to the
YRR1 literature motif. PBM motif,
however, is a classic GAL4
monomer. This is the PBM motif.
Dubious Unlikely to be true TF.
YKL043W
PHD1
393 High
YKL043W
PHD1
2153 High
YDL106C
PHO2
1680 Incorrect
YDL106C
PHO2
2154 High
YFR034C
PHO4
2222 High
YOR363C
YIL122W
PIP2
POG1
0
0
YLR014C
PPR1
2064 Low
High-scoring motifs are all similar,
with characteristic APSES GC core
and palindromic. PBM motifs score
highest on ChIP-seq data, while
ChIP-chip motif 393 (which
contains flanking G/C residues)
scores highest on expression data.
Retain both - possibly, the rest of
the protein contributes to binding
flanking residues. This is the ChIP
motif that scores highest on
expression data.
High-scoring motifs are all similar,
with characteristic APSES GC core
and palindromic. PBM motifs score
highest on ChIP-seq data, while
ChIP-chip motif 393 (which
contains flanking G/C residues)
scores highest on expression data.
Retain both - possibly, the rest of
the protein contributes to binding
flanking residues. This is the
higher-scoring PBM motif (2153).
Likely represents Abf1 binding site.
Motifs are largely all different from
each other. PBM motif 2154 scores
highly on ChIP data and resembles
classic TAAT homeobox core. Note
that PBM motif 794 even more
strongly resembles homeobox
(TAATTA) but scores slightly less
highly.
Almost all motifs match classic
HLH E-box. PBM motif 2222 has
highest match to both ChIP-chip
and expression data, without being
circular.
See Oaf1-Pip2-dimer
Dubious Unlikely to be true TF.
ChIP-chip motif 2064 almost
matches the literature site, which
has been confirmed by directed
experimentation, and scores highest
on most measures. But, give it low
confidence - it is not at all clear that
YKL015W
PUT3
2065 Medium
YKL015W
PUT3
2223 Medium
YPR186C
PZF1
1321 Low
YCR066W
RAD18
0
this is an optimal binding site, and
none of the scores for any of the
motifs are all that high.
Motifs vary considerably. ChIP
motif 2065 is a dimeric/(trimeric?)
GAL4-like site, and has the highest
correspondence to ChIP-chip data
(from which it is derived) and some
correspondence to expression data
(although it is not strong). PBM
motif 2223 is a monomeric GAL4-like motif and has higher
correspondence to expression data,
albeit weaker (but still good)
correspondence to ChIP-chip data.
It is possible that the actual
sequence preference is some other
arrangement of monomeric sites
that were not picked up in either
assay - score as medium
confidence.
Motifs vary considerably. ChIP
motif 2065 is a dimeric/(trimeric?)
GAL4-like site, and has the highest
correspondence to ChIP-chip data
(from which it is derived) and some
correspondence to expression data
(although it is not strong). PBM
motif 2223 is a monomeric GAL4-like motif and has higher
correspondence to expression data,
albeit weaker (but still good)
correspondence to ChIP-chip data.
It is possible that the actual
sequence preference is some other
arrangement of monomeric sites
that were not picked up in either
assay - score as medium
confidence.
This is a single literature site. The
protein almost certainly binds the
site but it has not been
demonstrated that this is an optimal
binding site.
Dubious Unlikely to be true TF.
YNL216W
RAP1
254 High
YMR075W
RCO1
1066 Low
YOR380W
RDR1
756 Medium
YOR380W
RDR1
2158 High
YCR106W
RDS1
506 High
YPL133C
RDS2
757 Medium
Most motifs look similar. ChIP-chip motif 254 has highest
correspondence to expression data.
Dubious There is no evidence that this is a
sequence-specific DNA-binding
protein rather than a chromatin
factor. The higher-scoring ChIP-chip
motif appears to have low
information content and does not
display strong correspondence to
the data it was generated from or to
expression data.
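"Low information content" can be quantified per motif position. The sketch below computes information content in bits from a matrix of base probabilities, assuming a uniform background and no small-sample correction; the example matrix is hypothetical, and this is not necessarily how the motifs in this table were assessed.

```python
import math


def column_information(column):
    """Information content (bits) of one motif position.

    column: probabilities for A, C, G, T, summing to 1.
    Assumes a uniform background, so the maximum is 2 bits.
    """
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)


def motif_information(pwm):
    """Total information content of a motif, summed over positions."""
    return sum(column_information(col) for col in pwm)


# Hypothetical 3-position motif: invariant A, uninformative N, then A or C.
pwm = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0],
]
print(motif_information(pwm))
```

A motif whose positions are mostly near 0 bits will match many sequences by chance, which is one reason such motifs tend to be given less weight.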
All motifs are related except 1851.
PBM motif 2158 is monomeric and
has highest correspondence to
ChIP-chip data. The literature motif
756 consists of two back-to-back
and slightly overlapping versions of
the monomeric PBM motif. There
is no evidence for direct binding in
this specific spacing and
orientation; however, the results of
mutations in reporters indicate that
both copies are necessary for
induction in the mutant. Retain both
motifs.
All motifs are related except 1851.
PBM motif 2158 is monomeric and
has highest correspondence to
ChIP-chip data. The literature motif
756 consists of two back-to-back
and slightly overlapping versions of
the monomeric PBM motif. There
is no evidence for direct binding in
this specific spacing and
orientation; however, the results of
mutations in reporters indicate that
both copies are necessary for
induction in the mutant. Retain both
motifs.
All motifs look similar. PBM motif
506 has a higher score on ChIP-chip
than any of the ChIP-chip-derived motifs.
All motifs contain CGG. PBM
motif 2226 appears to be a
YPL133C
RDS2
2226 Medium
YBR049C
REB1
907 High
YBR267W
REI1
489 High
YLR176C
RFX1
496 Medium
YLR176C
RFX1
1478 Medium
monomeric version of literature
motif 757. However, the paper that
produced motif 757 did not
demonstrate that this is an optimal
binding site. Retain both motifs and
give them a "medium" confidence.
All motifs contain CGG. PBM
motif 2226 appears to be a
monomeric version of literature
motif 757. However, the paper that
produced motif 757 did not
demonstrate that this is an optimal
binding site. Retain both motifs and
give them a "medium" confidence.
All motifs are similar. ChIP-chip
motif 907 has highest
correspondence to both ChIP-chip
and expression data, and strongly
resembles MITOMI and PBM
motifs.
PBM motif looks like a yeast C2H2
motif (row of C's); highly
significant relationship to ChIP-chip data.
Curious case - virtually all motifs
are similar in appearance, with a
common TGGCAAC core. They
range from what appear to be
monomers to full dimers, with
multiple partial forms. However,
none of them scores highly on both
ChIP-chip and expression data.
Select two representatives: one that
scores well on ChIP-chip, and one
that scores well on expression. This
is the one that scores most highly
on expression (close 2nd on
deletion and 1st on overexpression).
It is the only purely monomeric
motif. Give medium confidence,
since according to the literature this
protein should bind as a dimer.
Curious case - virtually all motifs
are similar in appearance, with a
common TGGCAAC core. They
range from what appear to be
YMR182C
RGM1
531 High
YKL038W
RGT1
2227 High
YHL027W
RIM101
600 High
YPL089C
RLM1
419 Medium
YPL089C
RLM1
1079 Incorrect
YGR044C
RME1
273 Low
monomers to full dimers, with
multiple partial forms. However,
none of them scores highly on both
ChIP-chip and expression data.
Select two representatives: one that
scores well on ChIP-chip, and one
that scores well on expression. This
is the one that scores most highly
on ChIP-chip. It is a dimer motif.
Give medium confidence, since it
has little relationship to expression
data.
PBM motif 531 looks like a C2H2
motif (row of G's), and scores well
on both ChIP-chip and deletion
expression data.
PBM motif 2227 is very similar to
"traditional" motif and to
monomeric GAL4 motifs, and
scores highest on ChIP-chip data.
All PBM motifs are similar.
ChIP-chip motif 600 is almost
identical to PBM motif 513, but
scores slightly higher on expression
data. Three of six motifs are very
similar.
Motif 419 has a MADS-like
appearance, and scores very highly
in ChIP-chip data, despite being
derived from the literature. Not
much correspondence to expression
however, hence Medium
confidence. ChIP-chip motif 910
does slightly better on expression
but to me is not a credible MADS
box binding site.
Likely represents Mcm1 binding
site.
Motif 273 shows similarity to RME
response elements (RREs),
GTACC(T/A)ACAAAA (in fact it
is derived from them). The fact that
RME has three C2H2 zinc fingers
and also requires an additional C-terminal region for binding in vitro,
together with its relatively large
YPR065W
ROX1
1396 High
YER169W
RPH1
547 High
YIL119C
RPI1
0
YDL020C
RPN4
1090 Incorrect
YDL020C
RPN4
1700 High
YDR303C
RSC3
580 High
footprint, are consistent with such a
large binding site. However, I gave
this motif a "low" score as there is
no systematic analysis in vivo or in
vitro indicating that these are really
the most preferred sites. It would be
valuable to redo the in vitro and in
vivo experiments under appropriate
conditions.
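The RRE consensus quoted above, GTACC(T/A)ACAAAA, can be written with the IUPAC code W for the T/A position and turned into a pattern for scanning sequences. The sketch below shows one way to do this; the function name and test sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


# GTACC(T/A)ACAAAA from the notes, with W standing for T or A.
rre = consensus_to_regex("GTACCWACAAAA")
print(rre.pattern)
print(bool(rre.search("TTGTACCTACAAAAGG")))
print(bool(rre.search("TTGTACCGACAAAAGG")))
```

An exact consensus match of this kind says nothing about whether the site is optimal, which is the concern raised in the note; that would need quantitative binding data.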
About half the motifs have a typical
ACAAT Sox core. MITOMI motif
1396 has highest correspondence to
both ChIP-chip and deletion
expression data.
About half of the motifs look
similar to each other, with GGGG
core typical of many yeast C2H2
proteins. PBM motif 547 has
meaningful scores on both ChIP-chip and mutant expression data.
I'm somewhat concerned that motif
279 lacks two A residues captured
by both PBM experiments.
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding
domain and has no known
in vitro sequence specificity. The
motif comes only from ChIP-chip
so scoring on ChIP-chip is circular.
Likely represents Reb1 binding site.
In vitro motifs do not contain the
TTT sequence on the end. But they
were derived from the DBD only.
The rest of the protein may
contribute to binding the TTT
segment. Motif 1700 has the
highest correspondence to ChIP-chip and expression and GO.
PBM motif 580 has best
correspondence to expression data -
the only significant independent
criterion - considering that the
correlations are all in the same
orientation (they are not for 2165).
All motifs look similar. Propose
YHR056C
RSC30
2164 Medium
YFR037C
RSC8
0
YJR127C
RSF2
575 High
YOL067C
RTG1
1493 Low
YOL067C
RTG1
1494 Low
YBL103C
RTG3
870 Low
that longer motifs could be due to
multiple binding sites in the same
sequence.
Arbitrary choice - all PBM motifs
look similar (and resemble motif
from homolog Rsc3). I have
downgraded this one from high to
medium because the best scoring
motif actually looks the least like
the Rsc3 motif.
Dubious Unlikely to be true TF.
No supporting data, but the PBM
motif 575 looks like a typical yeast
C2H2 motif (Adr1, which has
similar zinc fingers, Mig1, etc).
1493 and 1494 are a toss-up and
could represent different
dimerization partners, conceivably.
Similar to 1445 and 1446 above.
Retain both but give low
confidence.
1493 and 1494 are a toss-up and
could represent different
dimerization partners, conceivably.
Similar to 1445 and 1446 above.
Retain both but give low
confidence.
Only the PBM motif is a classic
HLH motif. Three different ChIP-chip-derived motifs are all diverse,
but all score highly on ChIP-chip
data! Are they motifs of other TFs?
Check. 602: GCN4; 1095: TEC1;
1096: resembles 602, but is a closer
match to CUP9/TOS8. Also hits
GCN4. According to the literature
(PMID: 9032238) the core binding
site for the Rtg1p-Rtg3p
heterodimer is 5'-GGTCAC-3'; the
only motif that resembles this is
1446. Vague resemblance to 602
and 1096. I am going to retain
1446, which represents the
literature site; PBM motif 870,
which resembles an E-box, and
ChIP-chip motif 1445, which scores
YBL103C
RTG3
1445 Low
YBL103C
RTG3
1446 Low
YOR077W
RTS2
0
highest on ChIP-chip data. But give
all low confidence.
Only the PBM motif is a classic
HLH motif. Three different ChIP-chip-derived motifs are all diverse,
but all score highly on ChIP-chip
data! Are they motifs of other TFs?
Check. 602: GCN4; 1095: TEC1;
1096: resembles 602, but is a closer
match to CUP9/TOS8. Also hits
GCN4. According to the literature
(PMID: 9032238) the core binding
site for the Rtg1p-Rtg3p
heterodimer is 5'-GGTCAC-3'; the
only motif that resembles this is
1446. Vague resemblance to 602
and 1096. I am going to retain
1446, which represents the
literature site; PBM motif 870,
which resembles an E-box, and
ChIP-chip motif 1445, which scores
highest on ChIP-chip data. But give
all low confidence.
Only the PBM motif is a classic
HLH motif. Three different ChIP-chip-derived motifs are all diverse,
but all score highly on ChIP-chip
data! Are they motifs of other TFs?
Check. 602: GCN4; 1095: TEC1;
1096: resembles 602, but is a closer
match to CUP9/TOS8. Also hits
GCN4. According to the literature
(PMID: 9032238) the core binding
site for the Rtg1p-Rtg3p
heterodimer is 5'-GGTCAC-3'; the
only motif that resembles this is
1446. Vague resemblance to 602
and 1096. I am going to retain
1446, which represents the
literature site; PBM motif 870,
which resembles an E-box, and
ChIP-chip motif 1445, which scores
highest on ChIP-chip data. But give
all low confidence.
Dubious Homolog of Kin17; not a typical
C2H2 zinc finger. Believed to be
YBL052C
SAS3
0
YOR140W
SFL1
605 Medium
YOR140W
SFL1
839 Medium
YLR403W
YLR403W
SFP1
SFP1
357 Incorrect
621 Incorrect
YLR403W
SFP1
797 High
YLR403W
SFP1
1100 Incorrect
"chromatin-associated proteins
involved in UV response and DNA
replication". No evidence for
sequence-specific DNA-binding.
Single ChIP-chip motif does not
have strong correspondence to the
data from which it is derived.
Dubious Unlikely to be true TF.
None of the motifs are highly
related to each other. But, most
share a GAAG core and are
otherwise A-rich. The PBM motif
839 in particular is compatible with
the putative binding sites that are
mutated in PMID 17594096, and it
scores well on ChIP-chip. Other
motifs may represent different
multimerization configurations.
ChIP-chip motif 605 also scores
well on ChIP-chip data, which is
circular, but I will retain it for
completeness.
None of the motifs are highly
related to each other. But, most
share a GAAG core and are
otherwise A-rich. The PBM motif
839 in particular is compatible with
the putative binding sites that are
mutated in PMID 17594096, and it
scores well on ChIP-chip. Other
motifs may represent different
multimerization configurations.
ChIP-chip motif 605 also scores
well on ChIP-chip data, which is
circular, but I will retain it for
completeness.
Likely represents Rap1 binding site.
Likely represents Rap1 binding site.
Most ChIP-seq studies identified
the Rap1 motif. PBM motif 797 is
less significant by ChIP-seq
(although still highly significant)
but is the winner across the board
for all types of expression data.
Likely represents Rap1 binding site.
YLR403W
SFP1
1710 Incorrect
YNL257C
SIP3
0
YJL089W
SIP4
573 Medium
YJL089W
SIP4
2067 Medium
YKR101W
SIR1
0
Dubious
YDL042C
SIR2
0
Dubious
YLR442C
SIR3
0
Dubious
YDR227W
SIR4
0
Dubious
Dubious
Likely represents Rap1 binding site.
Sip3 is a protein that activates "transcription
through interaction with DNA-bound Snf1p" (SGD); no DNA-binding
domain and no evidence for
direct interaction with DNA or
intrinsic sequence specificity.
PBM motif 573 is a monomeric
GAL4-type motif (others appear
dimeric) but it has good
correspondence to ChIP-chip data.
Only a few of the dimeric sites are
more significant - the motif from in
vivo analysis (PMID: 14685767)
does not score as highly as 2067
from ChIP-chip data, but they look
very similar. This is 573, the
presumed monomeric site
PBM motif 573 is a monomeric
GAL4-type motif (others appear
dimeric) but it has good
correspondence to ChIP-chip data.
Only a few of the dimeric sites are
more significant - the motif from in
vivo analysis (PMID: 14685767)
does not score as highly as 2067
from ChIP-chip data, but they look
very similar. This is 2067, the
presumed dimeric site.
There is no evidence that the SIR
proteins are sequence-specific
DNA-binding proteins. Most of the
motifs for them are Rap1 sites.
There is no evidence that the SIR
proteins are sequence-specific
DNA-binding proteins. Most of the
motifs for them are Rap1 sites.
There is no evidence that the SIR
proteins are sequence-specific
DNA-binding proteins. Most of the
motifs for them are Rap1 sites.
There is no evidence that the SIR
proteins are sequence-specific
DNA-binding proteins. Most of the
motifs for them are Rap1 sites.
YDR409W
SIZ1
0
YHR206W
SKN7
380 High
YHR206W
SKN7
583 High
YNL167C
SKO1
1401 High
YPR054W
SMK1
1875 Low
YBR182C
SMP1
864 Medium
Dubious Unlikely to be true TF.
Motifs are remarkably discordant
considering that they all resemble
each other in being G+C rich and
containing a GGCC core. Possibly
reflecting different modes of
multimerization? Include the two
that score highest on independent
data: PBM motif 583, which
represents a monomer, and ChIP-chip motif 380, which appears to
represent a dimer.
Motifs are remarkably discordant
considering that they all resemble
each other in being G+C rich and
containing a GGCC core. Possibly
reflecting different modes of
multimerization? Include the two
that score highest on independent
data: PBM motif 583, which
represents a monomer, and ChIP-chip motif 380, which appears to
represent a dimer.
The MITOMI motif 1401 is an
offset and asymmetric version of
the traditional consensus
(TGACGTCA) but has a higher
ChIP-chip and expression
correspondence than the motifs that
are more symmetric.
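The "symmetric" consensus TGACGTCA is palindromic in the DNA sense: it equals its own reverse complement, so it reads the same on both strands. The short sketch below makes that check explicit; the second, asymmetric example sequence is hypothetical and is not the consensus of motif 1401.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("TGACGTCA"))  # traditional consensus from the notes
print(is_palindromic("TGACGTAA"))  # hypothetical asymmetric variant
```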
Dubious I could not find any evidence that
this protein binds directly to DNA.
There is only one motif derived
from ChIP-chip but it bears little
relationship to the data from which
it was derived.
PBM motif 864 scores highest on
ChIP-chip and expression data. I
gave it a medium, however, because
it has low information content at
most positions, does not closely
match the literature motif (although
the literature motif does not match
ChIP-chip or expression data), and
also does not resemble that of
YDR477W
SNF1
1110 Medium
Dubious
YOR290C
YCR033W
SNF2
SNT1
0
0
Dubious
Dubious
YGL131C
SNT2
612 Low
Dubious
YOR308C
SNU66
0
Dubious
YMR016C
SOK2
404 High
YJL127C
SPT10
1880 Low
YER148W
SPT15
798 High
RLM1, which according to the
literature should be related.
Motif 1110 has a quite strong
correspondence to ChIP-chip data
(from which it is derived).
However, there seems to be no
evidence that this is a sequence-specific DNA-binding protein.
Aside from a weak relationship to
expression data there is no
corroborating evidence here (and no
DNA-binding domain).
Unlikely to be true TF.
Unlikely to be true TF.
All three motifs are derived from
the same ChIP-chip data. However,
there is no corroborating data, and
not all SANT domains are DNA-binding - or are non-specific, in
chromatin proteins. So it could be a
cofactor motif; in fact it is similar to
motifs of Stp3 and Stp4. The
protein has other chromatin-related
domains (BAH, PHD/RING).
Hence the "Low" assessment.
Unlikely to be true TF.
ChIP-chip motif 404 has highest
correspondence to both ChIP-chip
and expression data - and strongly
resembles PBM motif.
This is the protein that binds
histone promoters. The sequence
specificity is derived from the
histone promoters only so the
literature motif may be inaccurate.
Motif 1880 has higher scores
overall but does not resemble the
literature motif. Uncertain what to
do here - use 1880, but give low
confidence. Motif learned in vivo
could contain extrinsic information.
This is TATA-binding protein.
PBM motif 798 chosen because
1326 was derived from the 96-
YER161C
SPT2
1114 Low
Dubious
YKL020C
SPT23
670 Low
Dubious
YCR018C
SRD1
2232 Medium
YNL309W
STB1
0
YMR053C
STB2
710 Incorrect
YMR053C
STB2
710 Low
YDR169C
STB3
2233 High
YMR019W
STB4
2107 High
Dubious
Dubious
sequence TIRF-PBM array instead
of a full 40K PBM
I could not find any evidence that
this protein binds directly to DNA.
None of the motifs is significant.
All are from ChIP-chip. Motif 1114
chosen simply because it has the
highest numbers overall.
I could not find any evidence that
this protein binds directly to DNA.
It has an IPT domain but no REL
domain. None of the ChIP-derived
motifs scores highly on ChIP data
or anything else. Motif 670 bears
some relationship to expression
data.
PBM studies yield nearly identical
motifs. 2232 closely resembles
motif from related GATA factors
and scores highest overall. This is
an unusual motif for the GATA
class; hence medium confidence
level.
No direct evidence that this is a
DNA-binding protein. It binds Swi6
and the ChIP motifs all resemble
Swi4 binding sites.
Likely represents Reb1 binding site.
No direct evidence that this is a
DNA-binding protein. Three ChIP-derived motifs but none scores
highly by any measure. Motif 710
is an arbitrary choice - looks tidy.
STB3 binds RRPE element
(AAAAATTT) both in vivo and in
vitro (PMID 17616518). PBM
motifs 810 and 2233 strongly
resemble the RRPE element,
score significantly in deletion
expression data, and nail the GO
categories "nucleolus" and
"ribosome biogenesis". 2233 gets
slightly higher scores.
PBM motif 2107 is clearly a
dimeric GAL4-class motif, and it
YHR178W
STB5
1405 High
YHR178W
STB5
2068 Medium
YKL072W
STB6
0
YHR084W
STE12
400 High
YDR463W
STP1
660 High
blows all the other motifs out of the
water.
All motifs have CGG core and most
have CGGnG. Most ChIP-derived
motifs have no relationship to
expression data. MITOMI motif 1405
and PBM motif 514 score decently
on both ChIP-chip and expression
data, and seem to nail the GO
category (oxidative stress
response), and look like classic
Gal4 halfmers. MITOMI motif
scores slightly higher overall. This is
presumably the monomeric motif.
All motifs have CGG core and most
have CGGnG. Most ChIP-derived
motifs have no relationship to
expression data. Motif 2068 scores
highest overall; looks a bit unusual
for a Gal4 class motif but also does
well on expression data. Retain as
potential dimer motif, although it
may also incorporate extrinsic
information.
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding domain and has no known
in vitro sequence specificity. The
motif comes only from ChIP-chip
so scoring on ChIP-chip is circular.
The ChIP-chip motif looks a little
like a Rap1 motif.
All motifs but one resemble the
canonical literature site. Motif 400
is derived from ChIP-chip data (on
which it scores highest) but also
scores highest on expression data.
STP1 and 2 have very similar
DNA-binding domains. However,
they are not similar to those of
STP3 and 4. PBM motif for STP2
(800) correlates with ChIP-chip and
expression data. ChIP-chip motif
for STP1 (660) most strongly
resembles motif 800, and scores
YHR006W
STP2
2174 High
YLR375W
STP3
568 Medium
YDL048C
STP4
559 Medium
YPR086W
SUA7
1327 Low
highly on ChIP-chip data. In
addition, these motifs resemble
halfmers of literature-derived
binding sites.
STP1 and 2 have very similar
DNA-binding domains. However,
they are not similar to those of
STP3 and 4. PBM motif for STP2
(2174) correlates highest with
ChIP-chip and expression data.
ChIP-chip motif for STP1 (660)
most strongly resembles motif 800,
and scores highly on ChIP-chip
data. In addition, these motifs
resemble halfmers of literature-derived binding sites.
STP3 and 4 have very similar
DNA-binding domains. However,
they are not similar to those of
STP1 and 2; the next most closely
related are SWI5 and ACE2, with
major differences in the recognition
alpha helices. All of the STP4
motifs are different from each other
and none have any supporting data.
There is only one motif for STP3
(568) from PBM and it matches the
STP4 motif from the same study
(559) which is the basis for
choosing these two motifs.
STP3 and 4 have very similar
DNA-binding domains. However,
they are not similar to those of
STP1 and 2; the next most closely
related are SWI5 and ACE2, with
major differences in the recognition
alpha helices. All of the STP4
motifs are different from each other
and none have any supporting data.
There is only one motif for STP3
(568) from PBM and it matches the
STP4 motif from the same study
(559) which is the basis for
choosing these two motifs.
Dubious This protein is not expected to bind
DNA; it is supposed to bind DNA-
YDR310C
SUM1
383 High
YDR310C
SUM1
478 High
YGL162W
SUT1
673 Medium
YPR009W
SUT2
2236 High
YGR002C
YPL016W
YJL176C
SWC4
SWI1
SWI3
0
0
0
YER111C
SWI4
584 High
YDR146C
SWI5
569 High
YLR182W
SWI6
0
YPL128C
TBF1
2178 High
Dubious
Dubious
Dubious
Dubious
bound TBP. The TIRF-PBM data
used to generate the motif included
only 96 sequences.
This is the motif for the FL SUM1;
scores highest on ChIP-chip and
resembles the canonical literature
motif; also has some relationship to
deletion expression data
This is the motif for the SUM1
AT_hook; scores highest in deletion
expression data
Four motifs, all derived from ChIP-chip, contain CGG, but are unusual,
with degeneracy and a core of
CGGGG. Correlate somewhat with
both OE and deletion data,
however.
Highest-scoring motif (PBM) is a
classical GAL4-type monomeric
motif and is very significant in
ChIP-chip
Unlikely to be true TF.
Unlikely to be true TF.
Unlikely to be true TF.
Motif is well-characterized and
most published motifs match the
expected one. PBM motif (584)
scores highly (although not highest)
in ChIP-chip data. It is, however,
non-circular, and specifically
captures "DNA metabolic process"
in GO analysis.
PBM, ChIP-chip, and conservation
all yield similar motifs. ChIP-chip
scores highest in ChIP-chip but that
is circular. Choose PBM motif 569
which is nearly identical.
Swi6 is a cofactor, not a DNA-binding protein. These motifs are
for Mbp1 or Swi4.
All motifs, obtained by three
different means, are very similar,
although there is no ChIP or
expression support for any of them.
TBP-TFIIA
TBP-TFIIA
1328 Low
TBP-TFIIA-TFIIB
TBP-TFIIA-TFIIB
1330 Medium
TBP-TFIIB
TBP-TFIIB
1329 Medium
YBR150C
TBS1
552 High
YBR150C
TBS1
2179 High
YOR337W
TEA1
817 Medium
YBR083W
TEC1
815 High
YDR362C
TFC6
0
Went with 2178, which is the
BEEML output.
The TIRF-PBM data used to
generate the motif included only 96
sequences. Also it is curious that
there is no TATA sequence in the
logo.
The TIRF-PBM data used to
generate the motif included only 96
sequences; hence, medium
confidence.
The TIRF-PBM data used to
generate the motif included only 96
sequences; hence, medium
confidence.
Two motifs from PBMs are nearly
identical GAL4-class motifs with
defined spacing and orientation.
Motif 552 has slightly higher
scores. Two motifs from BEEML
analysis of PBM data give
monomeric motif - also give this
high confidence.
Two motifs from PBMs are nearly
identical GAL4-class motifs with
defined spacing and orientation.
Motif 552 has slightly higher
scores. Two motifs from BEEML
analysis of PBM data give
monomeric motif - also give this
high confidence.
Three motifs, all from PBMs.
Choose 817 because it has a more
robust GAL4 "CGG" core. But
there is no convincing
corroborating data for either motif
and they do not match each other.
All motifs agree, and are significant
by several criteria. PBM motif 815
has the second-highest scores
overall, and it is non-circular for in
vivo binding. Also has highest GO
score.
Dubious Unlikely to be true TF.
YBR240C
THI2
1449 High
YER063W
THO1
0
YNL139C
THO2
786 Low
YBL054W
TOD6
852 High
YGL096W
TOS8
494 High
YNL079C
TPM1
0
YOR344C
TYE7
397 High
YDL170W
UGA3
486 Medium
This is a GAL4-class protein. All
motifs are ChIP-chip derived, and no
two resemble each other. 1449 is the
only one with respectable scores on
ChIP and expression, and it also has
the appearance of a GAL4-class
motif, although the structural prior
presumably forces it to have this
property.
Dubious Unlikely to be true TF.
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding
domain and has no known
in vitro sequence specificity. The
motif comes only from ChIP-chip.
Two PBM motifs largely agree; 852
has higher correspondence to
expression data while 495 has
higher correspondence to ChIP-chip.
Use 852; its score is much higher,
including for GO.
No corroborating data on this TF,
and only one PBM motif known
and one ChIP motif. But, it
resembles TGTCA, which was also
obtained for paralog Cup9 by
multiple approaches (GTGNCA), as
well as PBM results for the
Meis/Mrg/Pknox/Tgif family,
which are the closest mammalian
homologs. The ChIP motif (1902)
does not resemble a homeodomain
binding sequence, and scores lower
on expression data.
Dubious Unlikely to be true TF.
All studies except one get canonical
HLH motif. 795 (PBM) is nearly
tied for best ChIP-chip score with
the best ChIP-chip motif. Still,
ChIP motif 397 scores higher, and
looks identical, but with fewer
flanking empty positions.
Appears to be a monomeric GAL4-class motif. Derived from PBM
YDL170W
UGA3
651 High
YPL139C
UME1
1143 Low
YDR207C
UME6
2239 High
YDR213W
UPC2
544 High
YDR520C
URC2
553 High
YPL230W
USV1
509 High
data, scores highly in ChIP-chip
data, but not as high as the dimeric
site derived from the ChIP-chip
data.
Appears to be a dimeric GAL4-class motif. Scores highest in ChIP-chip data, but is derived from the
same data. GO seems to match
known function!
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding
domain and has no known
in vitro sequence specificity. The
motif comes only from ChIP-chip.
But it has a high P-value, and the
motif has low similarity to other
motifs, with the possible exception
of Yox1. But the function of the
protein is very different from that of
Yox1. Tough call: leave as
Dubious, but give Low confidence
to motif 1143.
All motifs are similar to each other.
BEEML-PBM motif 2239 scores
highest across the board.
The SRE is bound by UPC2 and the
"canonical" sequence is
TCGTATA. However, the more
degenerate version obtained by
PBM (motif 544) scores better in
both expression analysis and OE
experiments. Newer motif 2109
scores better on ChIP-chip, but
lower on expression, and the SRE is
well-characterized. I think this one
deserves further experimental
analysis.
This is a monomeric GAL4-class
motif. Two PBM studies essentially
agree, and have some relationship
to ChIP-chip data. No other
informative data.
Two PBM studies essentially agree
on classical C2H2 GGGG-containing motif. Chose 509
YIL056W
VHR1
2091 Medium
YDR485C
VPS72
0
YML076C
WAR1
325 Low
YOR230W
WTM1
1148 Low
YOR229W
WTM2
0
YIL101C
XBP1
2039 High
YML007W
YAP1
2186 High
YHL009C
YAP3
672 High
because it scores much higher on
both ChIP and expression data.
PBM motif has high score on GO
because it looks a lot like Gcn4
Dubious Unlikely to be true TF.
None of the motifs are convincing,
but at least sequences with the
literature motif have been
experimentally confirmed to bind
the protein (even if it is not shown
that this is the optimal binding site)
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding
domain and has no known
in vitro sequence specificity. The
motifs come only from ChIP-chip
so scoring on ChIP-chip is circular.
Dubious It is not clear that this is a
sequence-specific DNA-binding
protein; it contains no DNA-binding
domain and has no known
in vitro sequence specificity. The
motif comes only from ChIP-chip
so scoring on ChIP-chip is circular.
PBM and in vitro selection-derived
motifs have highest scores across
the board. 842 is higher on GO, but
only slightly in AUC, and it has a
very large number of empty
flanking bases. 2039 (in vitro
selection) seems a reasonable
compromise - it's highest on ChIP
and almost the highest on
expression.
PBM motif 2186 looks like a
monomeric bZIP site but it has the
highest scores on both ChIP and
expression
ChIP-chip yields a classic 7-mer
Yap motif that scores well on ChIP
and significantly on expression.
Could be a heterodimer. Chose 672
over 1463 because it has a higher
YHL009C
YAP3
1411 High
YIR018W
YAP5
777 High
YIR018W
YAP5
896 Incorrect
YDR259C
YAP6
599 High
YOL028C
YAP7
1414 High
YOL028C
YAP7
1737 High
YDR451C
YHP1
716 High
YML027W
YOX1
498 High
score on expression data, which is
independent.
Mitomi yields a nearly palindromic
8-mer motif with strong similarity
to that of Yap6. PBM motif is
similar but appears to be partial.
ChIP-chip yields a classic 7-mer
Yap motif that scores well on ChIP
and significantly on expression.
Likely represents Rap1 binding site.
PBM and ChIP-chip can derive
basically the same motif, which is a
classical YAP motif. They score
similarly on all criteria. The ChIP-chip motif (599) has fewer low-information flanking bases.
8-base bZIP core. Obtained by
Mitomi, so this is a homodimer.
Higher correspondence to
unstressed ChIP-chip data. Little
literature on this protein. 1414
chosen for higher ChIP-chip overall
scores; plus, it is a palindrome as
expected for a bZIP protein.
7-base bZIP core. Obtained in
ChIP-chip studies and higher
correspondence to stressed ChIP-chip data. Possible heterodimer?
Little literature on this protein.
1737 chosen because it is largely
symmetric and has highest score for
both stressed and unstressed
Harbison data; it also has a higher GO
score.
ChIP-chip, EMSA, and one-hybrid
all arrive at a classic homeodomain
TAATTG motif. Microarray
enrichment motif (716) scores
higher on OE data from another
study than ChIP motifs do, and
does nearly as well on ChIP data.
Two PBM studies and Pramila et al.
(PMID 12464633) agree on classic
homeodomain TAATTA motif. All
three correlate with expression
YOR172W
YRM1
813 High
YOR162C
YRR1
2245 High
YJL056C
ZAP1
2097 High
change and OE. Motif 453 is not a
direct measurement so choose PBM
motif that is the same length as the
typical homeodomain footprint; 498 also correlates best with OE
data; expression scores are skewed
low by the large number of cell-cycle measurements.
Two PBM studies largely agree on
classic GAL4-class monomeric
motif. Motif 813 has indications of
spacing and orientation of dimeric
protein.
Classic monomeric GAL4-class
motif. PBM studies agree and score
significantly on Harbison data. No
other motifs have
spacing/orientation except
the one from PMID 11909958, but even the authors of
this study note that "Only half a
dyad seems to be conserved in this
consensus sequence". 2245 scores
highest in Harbison data.
Most motifs are similar but do not
exceed confidence thresholds on
any data type. PBM motif 2097 has
highest score for ChIP and
expression, and is not circular