Introduction to Bioinformatics II
Lecture 6
By Ms. Shumaila Azam
• Gene: A sequence of nucleotides coding
for protein
• Gene Prediction Problem: Determine the
beginning and end positions of genes in a
genome.
Gene Prediction: Computational Challenge
aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgc
taatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcg
gctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggct
atgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccga
tgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcg
gctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgc
ggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctg
ggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcat
gcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctat
gctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcgg
ctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgaca
atgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctat
gctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaag
ctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatg
catgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggc
tatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatg
cggctatgctaagctcatgcgg
Gene!
Central Dogma: DNA -> RNA -> Protein
DNA
CCTGAGCCAACTATTGATGAA
transcription
RNA
CCUGAGCCAACUAUUGAUGAA
translation
Protein
PEPTIDE
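The DNA -> RNA -> Protein example above can be reproduced with a minimal Python sketch. It assumes the standard genetic code and treats the DNA shown as the coding strand, so transcription is just T -> U. The names `transcribe` and `translate` are illustrative helpers, not part of the lecture.

```python
import itertools

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the mRNA codon by codon from position 0; stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```

The sketch reads a single reading frame starting at the first base; a real gene finder also has to decide where translation starts.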
Gene Prediction
• Gene finding is one of the first and most
important steps in understanding the genome of
a species once it has been sequenced.
• In computational biology, gene prediction or gene
finding refers to the process of identifying the
regions of genomic DNA that encode genes. This covers:
– protein-coding genes
– RNA genes
– other functional elements such as regulatory regions
Gene Prediction
• Before genome sequencing, gene finding relied on
experiments on living cells and organisms. Statistical
analysis of the rates of homologous recombination of
several different genes could determine their order on a
chromosome, and information from many such experiments
could be combined to create a genetic map specifying the
rough location of known genes relative to each other.
• Determining that a sequence is functional should be
distinguished from determining the function of the
gene or its product.
– Confirming a gene's function still requires in vivo
experimentation, for example through gene knockout.
– Bioinformatics research is making it increasingly possible
to predict the function of a gene based on its sequence
alone.
Extrinsic approaches
• In extrinsic (or evidence-based) gene finding
systems, the target genome is searched for
sequences that are similar to extrinsic evidence in
the form of the known sequence of a messenger
RNA (mRNA) or protein product.
• Given an mRNA sequence, it is straightforward to derive
the unique DNA sequence from which it must have been
transcribed (in eukaryotes this covers the exons only,
since introns are spliced out of the mature mRNA).
• Given a protein sequence, a family of possible
coding DNA sequences can be derived by reverse
translation of the genetic code.
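Because most amino acids are encoded by more than one codon, reverse translation yields a whole family of DNA sequences rather than one. The sketch below counts and enumerates that family for a short peptide, assuming the standard genetic code. The helper names `count_coding_sequences` and `reverse_translate` are made up for illustration.

```python
import itertools
from collections import defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = defaultdict(list)
for codon_bases, amino_acid in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR[amino_acid].append("".join(codon_bases))


def count_coding_sequences(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    total = 1
    for amino_acid in protein:
        total *= len(CODONS_FOR[amino_acid])
    return total


def reverse_translate(protein):
    """Yield every DNA sequence encoding the protein (grows exponentially with length)."""
    codon_choices = (CODONS_FOR[amino_acid] for amino_acid in protein)
    for combination in itertools.product(*codon_choices):
        yield "".join(combination)


print(count_coding_sequences("PEPTIDE"))  # 1536
```

For the 7-residue peptide PEPTIDE there are already 4 x 2 x 4 x 4 x 3 x 2 x 2 = 1536 candidate coding sequences, which is why practical tools search with the protein sequence directly instead of enumerating every candidate.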
Extrinsic approaches
• Once candidate DNA sequences have been
determined, it is a relatively straightforward
algorithmic problem to efficiently search a
target genome for matches, complete or
partial, and exact or inexact.
• BLAST is a widely used system designed for
this purpose.
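To make "exact or inexact" matching concrete, here is a deliberately naive sliding-window search that reports every position where a query matches the genome with at most a given number of mismatches. This is not how BLAST works (BLAST uses seeded heuristics and handles gaps); the function name `find_matches` and the parameter `max_mismatches` are assumptions for this sketch.

```python
def find_matches(genome, query, max_mismatches=0):
    """Return (start, mismatches) for every ungapped window within the mismatch budget."""
    genome, query = genome.upper(), query.upper()
    hits = []
    for start in range(len(genome) - len(query) + 1):
        window = genome[start:start + len(query)]
        # Hamming distance: count positions where the two strings differ.
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


genome = "aatgcatgcggctatgctaatg"  # first bases of the sequence shown earlier
print(find_matches(genome, "ATGCGG"))                    # exact matches only
print(find_matches(genome, "ATGCGG", max_mismatches=2))  # tolerate up to 2 mismatches
```

The running time is proportional to genome length times query length, which is fine for a demonstration but far too slow for whole genomes.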
Ab initio approaches
• Ab initio gene prediction is an intrinsic method based
on gene content and signal detection.
• Because of the inherent expense and difficulty in
obtaining extrinsic evidence for many genes, it is also
necessary to resort to ab initio gene finding.
• The genomic DNA sequence alone is systematically
searched for certain tell-tale signs of protein-coding
genes.
• These signs can be broadly categorized as either
signals, specific sequences that indicate the presence
of a gene nearby, or content, statistical properties of
protein-coding sequence itself.
Ab initio approaches (prokaryotes)
• In the genomes of prokaryotes, genes have specific and
relatively well-understood promoter sequences
(signals).
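• The sequence coding for a protein occurs as one
contiguous open reading frame (ORF).
• Since 3 of the 64 codons are stop codons, one would
expect a stop codon approximately every 20–25 codons
(64/3 ≈ 21), or 60–75 base pairs, in a random sequence.
A much longer ORF is therefore unlikely to arise by
chance.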
• These characteristics make prokaryotic gene finding
relatively straightforward, and well-designed systems
are able to achieve high levels of accuracy.
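The ORF idea above can be sketched in a few lines: scan each of the three forward reading frames for an ATG that is followed, in frame, by a stop codon. This is a simplified illustration, not the algorithm of any particular tool. It ignores the reverse strand and alternative start codons, and `find_orfs` and `min_codons` are names chosen for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    start is the index of the A in ATG; end is exclusive and includes the stop codon.
    min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and keep scanning this frame
    return orfs


demo = "CCATGAAACCCGGGTAACC"
for frame, start, end in find_orfs(demo):
    print(frame, start, end, demo[start:end])
```

Raising `min_codons` well above the 20–25 codons expected by chance is the usual way to filter out spurious short ORFs.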
[Figure: Open Reading Frame Finder, input and output]
Ab initio approaches (eukaryotes)
• Ab initio gene finding in eukaryotes, especially
complex organisms like humans, is considerably
more challenging.
• First: the promoter and other regulatory signals
in these genomes are more complex and less
well-understood.
• Two classic examples of signals identified by
eukaryotic gene finders are CpG islands and
binding sites for a poly(A) tail.
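• Second: splicing mechanisms mean that a protein-coding
sequence is divided into several parts (exons), separated
by non-coding sequences (introns), so a gene is no longer
one contiguous ORF.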
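As an illustration of signal detection, the sketch below scores a window of DNA for CpG-island character using two statistics: GC content and the observed/expected CpG ratio. The thresholds (length at least 200 bp, GC content above 50%, ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria. They are not stated in the lecture, so treat them as an assumption, as are the function names.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    window = window.upper()
    n = len(window)
    c_count = window.count("C")
    g_count = window.count("G")
    cpg_count = window.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / n.
    obs_exp = (cpg_count * n) / (c_count * g_count) if c_count and g_count else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(window, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content and CpG-ratio thresholds to one window."""
    gc_fraction, obs_exp = cpg_stats(window)
    return len(window) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CGCGCGCGCG"))            # (1.0, 2.0)
print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

In practice the test is applied to sliding windows along the genome and overlapping positive windows are merged into islands.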
Combined approaches
• Combined approaches use both extrinsic and ab initio
methods: protein and EST (expressed sequence tag) data
are mapped to the genome to validate ab initio
predictions.
Comparative genomics approaches
• As the entire genomes of many different species are
sequenced, a promising direction in current research
on gene finding is a comparative genomics approach.
• This is based on the principle that the forces of natural
selection cause genes and other functional elements to
undergo mutation at a slower rate than the rest of the
genome.
• Genes can thus be detected by comparing the
genomes of related species.
• This approach was first applied to the mouse and
human genomes.
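The conservation principle above can be illustrated with a toy calculation: given two sequences that are already aligned without gaps, compute the fraction of identical positions in each sliding window and flag the highly conserved windows as candidate functional regions. The sequences, window size and threshold are invented for illustration; real comparative gene finders work on gapped whole-genome alignments and use more elaborate models.

```python
def window_identity(seq_a, seq_b, window=10):
    """Fraction of identical positions in each sliding window of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        scores.append(sum(x == y for x, y in pairs) / window)
    return scores


def conserved_windows(seq_a, seq_b, window=10, threshold=0.85):
    """Start positions of windows whose identity reaches the threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [start for start, score in enumerate(scores) if score >= threshold]


# Hypothetical aligned fragments: the first 10 bases are conserved, the rest has diverged.
species_a = "ATGGCCAAGT" + "TTTTTTTTTT"
species_b = "ATGGCCAAGT" + "ACGACGACGA"
print(conserved_windows(species_a, species_b))  # [0, 1]
```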
GeneMarkS