Finding Promoters and Other
Important Genomic Sequences
Lecture 10
Introduction
• Purpose of promoter analysis
• Finding prokaryotic promoters
• Finding eukaryotic promoters
• Two basic approaches to finding promoters and other regulatory elements
• Brief reference to some other interesting sequence regions
Promoter Analysis
• If a “potential” ORF is a real gene, a promoter should be present nearby, so finding one helps to confirm the ORF.
• Promoters are essential elements of the DNA sequence. They usually lie upstream of the protein coding sequence (CDS), although some regulatory elements can lie downstream, and they are essential for the binding of RNA polymerase and the other factors that initiate transcription.
• They exist in both eukaryotic and prokaryotic organisms.
Global Sequence
Proximity of promoters
• Promoters in prokaryotes have well-defined base-pair sequences (motifs) upstream of the CDS (the true ORF). These are consensus sequences; real promoters usually differ from them at some positions:
• The Pribnow box: TATAAT at position -10
• The -35 box: TTGACA at position -35
• An AT-rich region upstream of the -35 box
– The -10/-35 positions refer to the number of bp upstream of the transcription start site (TSS, position +1).
– Analyse the E. coli Pal gene and see if you can find the promoter region and indicate the transcription start site (TSS).
Figure: ORF in a prokaryote (E. coli Pal gene), adapted from Understanding Bioinformatics 9.3
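The motif list above can be turned into a first promoter scanner. This is a minimal Python sketch, not the textbook's program: it assumes the consensus boxes TTGACA and TATAAT, allows up to one mismatch per box, and assumes a spacer of 15 to 19 bp between the end of the -35 box and the start of the -10 box. The spacer range, mismatch limit and function names are assumptions of this sketch.

```python
# Sketch: scan one DNA strand for prokaryotic -35 / -10 box pairs.
# Assumptions (not from the lecture): <= 1 mismatch per box and a
# 15-19 bp spacer between the two boxes.

MINUS35 = "TTGACA"  # -35 box consensus
MINUS10 = "TATAAT"  # Pribnow box (-10) consensus


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_promoters(seq: str, max_mm: int = 1, spacer=range(15, 20)):
    """Return (pos35, pos10, total_mismatches) for candidate box pairs.

    Positions are 0-based starts of each box on the given strand only.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS35) + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue  # no acceptable -35 box starts here
        for gap in spacer:
            j = i + 6 + gap  # start of the candidate -10 box
            if j + 6 > len(seq):
                break  # larger gaps would run off the end too
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return hits
```

For example, in `"GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGCC"` the perfect pair is reported as `(2, 25, 0)`. Only the strand you pass in is scanned; run it on the reverse complement as well to cover genes on the other strand.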
Basic promoter prediction program
• Modify your existing code to search for possible promoter regions, and determine the distance from the beginning of the promoter to the start codon.
• Analyse the region near the Pal gene (CDS and promoter) and propose any other interesting facts about the consequences of the high gene density.
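A minimal sketch of the distance step, assuming your existing scanner already gives the 0-based start of the promoter hit and the position where its -10 box ends (both inputs, and the function name, are hypothetical). It reports the first start codon found downstream. The first ATG is not guaranteed to be the real start of the ORF, so compare the result with the ORF predicted by your ORF finder.

```python
def distance_to_start_codon(seq, promoter_start, search_from, start_codons=("ATG",)):
    """Return (start_codon_index, distance) or None.

    promoter_start: 0-based start of the promoter hit (e.g. the -35 box).
    search_from:    index where the search for a start codon begins
                    (e.g. the end of the -10 box).
    distance:       bases from the beginning of the promoter to the first
                    base of the start codon.
    """
    seq = seq.upper()
    # str.find returns -1 when a codon does not occur; drop those.
    positions = [p for p in (seq.find(c, search_from) for c in start_codons) if p != -1]
    if not positions:
        return None
    start = min(positions)
    return start, start - promoter_start
```

Passing `start_codons=("ATG", "GTG", "TTG")` also accepts the alternative bacterial start codons, at the cost of more spurious matches.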
Eukaryotic gene promoters
• Eukaryotic promoters are more complex, and their regulatory elements can often be located long distances from the transcription start site (TSS):
– The core promoter is not as well defined, but it can contain:
• TATA box
• CAAT box
• GC-rich regions
– The core promoter is generally in close proximity to the transcription initiation region.
• Some programs consider a predicted promoter region correct if it lies within:
– 200 bp on the 5' side (upstream) of the TSS
– 100 bp on the 3' side (downstream) of the TSS
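The 200 bp / 100 bp criterion above can be expressed as a simple check. This sketch reads it as "the predicted TSS lies no more than 200 bp upstream or 100 bp downstream of the annotated TSS"; that reading, the function name and the strand handling are assumptions, and individual programs define their windows differently.

```python
def prediction_is_correct(predicted_tss: int, annotated_tss: int,
                          upstream: int = 200, downstream: int = 100,
                          strand: str = "+") -> bool:
    """True if the predicted TSS falls inside the accepted window.

    Coordinates are positions on the forward strand. Negative offsets are
    upstream (5'), positive offsets are downstream (3') of the annotated TSS.
    """
    offset = predicted_tss - annotated_tss
    if strand == "-":
        offset = -offset  # on the minus strand, upstream means higher coordinates
    return -upstream <= offset <= downstream
```

With an annotated TSS at 1000 on the plus strand, a prediction at 850 (150 bp upstream) counts as correct, while one at 1150 (150 bp downstream) does not.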
Promoter Analysis
• Promoter characterisation (discovering transcription factor binding patterns) takes two basic approaches (Chapter 5, Baxevanis 2005):
– Pattern-driven algorithms: depend on the existence of experimentally annotated binding-site data in bioinformatics databases.
– Care must be taken, as this approach can lead to false positives because binding sites are variable and short.
– The analysis of the results must take into account the region surrounding the “putative” promoter site.
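A quick calculation shows why short binding sites produce false positives. Assuming a random sequence with equal base frequencies (an idealisation), an exact k-bp motif is expected to occur (L - k + 1) × (1/4)^k times per strand in a sequence of length L. The function below is a hypothetical helper for that estimate.

```python
def expected_random_hits(motif_len: int, seq_len: int, both_strands: bool = True,
                         base_prob: float = 0.25) -> float:
    """Expected number of exact chance matches of a fixed motif.

    Assumes every base occurs independently with probability base_prob.
    """
    positions = max(seq_len - motif_len + 1, 0)  # windows of length motif_len
    per_position = base_prob ** motif_len        # chance one window matches
    strands = 2 if both_strands else 1
    return strands * positions * per_position
```

For a 6-bp site in 1,000,000 bp this gives about 488 chance matches on the two strands, before any mismatches are allowed, which is why the surrounding region of each hit has to be checked.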
Promoter region of eukaryotic genes
The figure below shows a number of eukaryotic promoters and their variability [Klug 7th ed]. However, it also shows their common features, such as the TATA box…
Example of pattern-driven approach
Figures A and B show the results for patterns associated with the TATA box.
Note that a score of at least -8.16 must be obtained to classify a site as a TATA box “region”.
Figures C and D are associated with the DNA CAP signals (CAP is a transcriptional activator). Do not confuse this with the 5' RNA cap (the cap and poly(A) tail).
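Threshold scores such as the -8.16 above typically come from position weight matrices. This sketch builds a log2-odds matrix from a handful of made-up TATA-like sites and scans a sequence with it. The sites, pseudocount, background and threshold are all hypothetical, so its scores are not comparable to the -8.16 used in the figures.

```python
import math

# Hypothetical aligned sites, NOT the real TATA-box data behind the figures.
SITES = ["TATAAA", "TATAAT", "TATTAA", "TAAAAA", "TATAAG"]
BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds weight matrix: one {base: score} dict per motif column."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of column scores for one window of the same length as the motif."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(seq, pwm, threshold):
    """Return (position, score) for windows scoring at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if not set(window) <= set(BASES):
            continue  # skip windows containing N or other symbols
        s = score(pwm, window)
        if s >= threshold:
            hits.append((i, round(s, 2)))
    return hits
```

Scanning `"GCGCGCTATAAAGCGC"` with a threshold of 5.0 reports only the TATAAA window at position 6. Lowering the threshold admits more variable sites, which is exactly the trade-off between sensitivity and false positives noted for pattern-driven methods.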
Promoter Analysis
• Sequence-driven algorithms: assume that shared promoter/regulatory (silencer/enhancer) functionality can be inferred from underlying conserved sequences.
– Genes that are co-regulated or co-expressed are good candidates for obtaining data for this approach.
– Co-regulated genes (switched on/off together) often share the same regulatory elements and therefore contain similar promoter/regulatory regions (an operon promoter is a simple example of a shared promoter).
– Co-expressed genes (switched on together) may also have similar promoter/regulatory regions.
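The simplest version of the sequence-driven idea is to look for words shared by the upstream regions of co-regulated genes. This sketch only counts k-mers present in a given fraction of the sequences; it ignores reverse complements, mismatches and background frequencies, which dedicated motif-discovery tools model statistically. The function names and the default k are assumptions.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Set of all k-length substrings of seq (each counted once per sequence)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(upstream_seqs, k=6, min_fraction=1.0):
    """k-mers found in at least min_fraction of the upstream sequences."""
    counts = Counter()
    for s in upstream_seqs:
        counts.update(kmers(s.upper(), k))
    needed = math.ceil(min_fraction * len(upstream_seqs))
    return sorted(km for km, n in counts.items() if n >= needed)
```

For three toy upstream regions `["GGGTTGACAGG", "CCTTGACACC", "AATTGACATT"]` the only 6-mer common to all three is `TTGACA`. Setting `min_fraction` below 1.0 tolerates genes whose copy of the motif has mutated.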
Sequence Driven Approach
• The sequence-driven approach can also be applied across species. This can help identify regulatory sites (enhancers/silencers) rather than only the RNA polymerase binding signals of the core promoter.
• Compare genes that are regulated in the same way, or have similar regulatory patterns, and compare their sequences, looking for matching segments/motifs.
• Baxevanis (p 129) highlights some problems with the interspecies approach:
– If background conservation is high, it is difficult to detect such sites.
– Some gene regions are more conserved than others.
– Some important regulatory elements are not conserved across species.
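A minimal way to compare orthologous upstream regions is a sliding-window identity scan over a pairwise alignment you have already produced. This sketch assumes two pre-aligned, equal-length strings with `-` for gaps; the window size and identity cut-off are arbitrary assumptions, and the function name is hypothetical.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.8):
    """Start indices of alignment windows with identity >= min_identity.

    aln_a, aln_b: equal-length aligned sequences; '-' marks a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(len(aln_a) - window + 1):
        a, b = aln_a[i:i + window], aln_b[i:i + window]
        # count identical, non-gap columns in this window
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        if matches / window >= min_identity:
            hits.append(i)
    return hits
```

This also illustrates the first problem listed above: between closely related species nearly every window passes a fixed cut-off, so the threshold has to be set relative to the background conservation of the region.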
Other regions: repeating elements
• Tandem repeats: sequences that are repeated many times, one copy directly after another, within the DNA sequence. These sequences are often associated with CDS regions [Baxevanis p 297].
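Exact short tandem repeats can be found with a regular expression that captures a unit and requires it to repeat immediately. This sketch only detects perfect repeats with a unit of 1 to 6 bases and at least three copies (both limits are assumptions); dedicated tools such as Tandem Repeats Finder also handle approximate repeats.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy {1,max_unit}? makes the regex try the shortest unit first,
    so CACACACA is reported as CA x 4 rather than CACA x 2.
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq)]
```

For `"GGCACACACAGG"` this returns `[(2, "CA", 4)]`.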
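• Inverted repeats: repeating sequences in which the second copy is inverted and lies on the opposite strand. They are often associated with regulatory elements.
– e.g. 5'-ATGC……GCAT-3' on the top strand: the complementary strand under GCAT reads 3'-CGTA-5', i.e. ATGC when read 5'→3'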
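On a single strand, an inverted repeat shows up as a short "arm" followed, after a spacer, by its reverse complement. This sketch searches for exact arms only; the arm length and spacer limits are assumptions, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=4, min_loop=3, max_loop=20):
    """Return (left_start, right_start) pairs where the right arm is the
    reverse complement of the left arm, separated by min_loop..max_loop bases."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits
```

For `"ATGCTTTTTGCAT"` the only hit is `(0, 9)`: ATGC at position 0 pairs with GCAT at position 9.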
• There are many other patterns that can be searched for, such as tRNA genes (refer to the E. coli Pal gene figure in Lecture 9), but these are not covered here. Interested readers can refer to chapters 9 and 10 of Understanding Bioinformatics.
• SNPs (single nucleotide polymorphisms): searching for a single bp change, for example in the CDS. These can be associated with certain diseases such as sickle cell anemia (A→T, so Glu→Val), which changes the structure of haemoglobin. They can also be used in the study of evolution and in DNA fingerprinting.
(Baxevanis chapter 7)
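The effect of a single-base substitution on the protein can be checked by comparing a reference and a variant CDS codon by codon. This sketch uses the standard genetic code and a short toy sequence that is not the real β-globin gene; it only mimics the A→T change that turns GAG (Glu) into GTG (Val) in sickle cell anemia. It assumes both sequences are in frame from position 1 and differ only by substitutions.

```python
# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def snp_effects(ref_cds, alt_cds):
    """List (position, ref_base, alt_base, codon_number, ref_aa, alt_aa)
    for every single-base difference between two in-frame CDS strings.
    Positions and codon numbers are 1-based; '?' marks an incomplete codon."""
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    if len(ref_cds) != len(alt_cds):
        raise ValueError("sequences must be the same length (substitutions only)")
    effects = []
    for pos, (r, a) in enumerate(zip(ref_cds, alt_cds)):
        if r == a:
            continue
        start = pos - pos % 3  # first base of the codon containing the change
        ref_aa = CODON_TABLE.get(ref_cds[start:start + 3], "?")
        alt_aa = CODON_TABLE.get(alt_cds[start:start + 3], "?")
        effects.append((pos + 1, r, a, start // 3 + 1, ref_aa, alt_aa))
    return effects
```

For the toy pair `"ATGCCTGAGGAGTAA"` / `"ATGCCTGTGGAGTAA"` the result is `[(8, "A", "T", 3, "E", "V")]`: an A→T change in codon 3 turning Glu into Val.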
Potential exam questions
• The search for promoters is often used to help indicate the validity of an ORF.
– Explain two approaches that can be used to find such regions. (10 marks)
– Describe the problems associated with each approach. (8 marks)