Basic Biology for Bioinformatics: genes as information
The central dogma of molecular genetics
DNA to RNA to protein to phenotype
Protein functions, synthesis and structure
RNA synthesis and processing
DNA replication
Basics of transmission genetics
Note: many of the figures used in this presentation are copyrighted. Most are taken from "Genetics: From Genes to Genomes" by Hartwell and colleagues (McGraw Hill).
Biology for bioinformatics:
Alignment of pairs of sequences
Multiple sequence alignment
Prediction of RNA secondary structure
Phylogenetic prediction
Database searching for sequences
Gene prediction
Analysis of microarray expression data
Protein classification
Protein folding / structure prediction
Genome analysis / databases
Genetic variation (haplotypes and allelic association)
What is it about DNA that allows it to carry information?
DNA polymerase (figure: Alberts et al., Fig. 6-36)
Molecular genetics: genes as information
DNA -> RNA -> protein.
DNA is digital information.
Each nucleotide carries 2 bits of information.
Implications:
Low-error propagation.
Complete representation in digital databases.
Acquisition of genetic information is the raw fuel behind the explosion of bioinformatics.
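The "2 bits per nucleotide" figure follows from there being four possible bases, and log2(4) = 2. A minimal Python sketch of that digital representation, assuming an arbitrary A=00, C=01, G=10, T=11 mapping (one of many possible choices, not a standard file format):

```python
# Sketch: store DNA digitally at 2 bits per nucleotide.
# The A=00, C=01, G=10, T=11 mapping is an arbitrary illustrative choice.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[i] is the base whose 2-bit code is i


def pack(seq: str) -> int:
    """Pack a DNA sequence into one integer, 2 bits per base (first base = highest bits)."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a sequence of known length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
packed = pack(seq)
width = 2 * len(seq)
print(f"{len(seq)} bases -> {width} bits: {packed:0{width}b}")  # 7 bases -> 14 bits: 10001111000100
assert unpack(packed, len(seq)) == seq
```

Because a run of leading A's packs to zero bits, the sequence length has to be stored alongside the packed value.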
Clelland et al. Nature 399:533. Hiding messages in DNA microdots.
"For it is not cell nuclei, not even individual chromosomes, but certain parts of certain
chromosomes from certain cells that must be isolated and collected in enormous
quantities for analysis; that would be the precondition for placing the chemist in such a
position as would allow him to analyze [the hereditary material] more minutely than
the morphologists."
- Theodor Boveri 1904
If the information in DNA is contained in single molecules, how can we know about it?
We reduce the complexity of the DNA by amplification and use the power of complementarity to detect specific sequences by hybridization.
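Hybridization relies on base pairing: a single-stranded probe binds a target strand that carries its reverse complement. A minimal sketch, assuming exact Watson-Crick matches only (real hybridization tolerates some mismatches and depends on temperature and salt) and using made-up sequences:

```python
# Sketch: locate perfect-match hybridization sites for a probe on a single-stranded target.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sites(target: str, probe: str) -> list[int]:
    """0-based positions where the target strand contains the probe's reverse complement."""
    site = reverse_complement(probe)
    k = len(site)
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == site]


# Hypothetical target and probe, not taken from any real gene.
print(probe_sites("ATGGCGTACGTTAG", "GTACGC"))  # [3]
```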
Determination of the chromosomal location of TGx in the human genome by fluorescent in situ hybridization
(from Daniel Aeschlimann's web site, Univ. of Wales: http://www.uwcm.ac.uk/study/dentistry/bds/staff/aeschlimann.htm)
Microarrays
Array
Scan
Visualize
Analyze
from Konstantin V. Krutovskii and David B. Neale 2001
"Forest Genomics for Conserving Adaptive Genetic Diversity"
Photolithographic arrays (Affymetrix; from www.affymetrix.com)
Each spot has an oligonucleotide with a distinct sequence.
Homologous proteins conserve elements of genetic information (sequence).
New gene functions can arise from pre-existing gene functions
Related genes retain sequence similarity.
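Sequence similarity between related genes is often summarized as percent identity over an alignment. Producing the alignment is a separate problem; this sketch assumes the two sequences are already aligned, skips gapped columns (one of several conventions), and uses hypothetical toy peptides:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gapped column: not counted either way
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy peptides (not real proteins): one substitution, one gap.
print(f"{percent_identity('MKTAYIAKQR', 'MKTAHIAKQ-'):.1f}% identical")
```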
Proteins: enzymes
Phenylketonuria: buildup of phenylalanine in the brain can cause intellectual disability.
Alkaptonuria
DNA to RNA to protein to phenotype
Proteins: regulators
Structural proteins
Ehlers-Danlos syndrome (joint hypermobility) is one of the
phenotypes associated with mutations in genes encoding collagen.
Proteins: what do they do?
See http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?fun=all
Hydrogen bonds within the protein and the rigidity of the peptide bond are critical determinants of protein structure.
Protein α-helix and β-sheet (Molecular Biology of the Cell, 1994, Figures 3-29 and 3-30)
NCBI provides information about proteins:
GenBank flat file format for HA oxidase
GenBank FASTA format for HA oxidase
Links to other information about HA oxidase
The HA oxidase gene and its flanking region on chromosome 3q21
OMIM: alkaptonuria is caused by mutations in HA oxidase.
Conserved Domains
The three-dimensional structure of the protein, if known, can be viewed.
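FASTA format, as offered by NCBI, is a header line beginning with ">" followed by one or more lines of sequence. A minimal reader sketch; the records below are hypothetical toy data, not the actual HA oxidase entry:

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: map each header (text after '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


example = """>toy_protein_1 hypothetical record, not a real GenBank entry
MKLSTAVQRE
GHWLDPNSTF
>toy_protein_2
MKTAYIAKQR
""".splitlines()

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

Duplicate headers overwrite earlier records in this sketch. For real work, an established reader such as Biopython's Bio.SeqIO.parse(handle, "fasta") handles more edge cases.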
Lectures 8 and 35 will cover types of mutation in detail.
Gene density in selected genomes
Species                     Genome size (Mb)   Gene #              Ave. kb of genome per gene
Escherichia coli            4.7                4,300               1.1
Saccharomyces cerevisiae    12.1               6,000               2.0
C. elegans                  97                 16,000              6.0
Arabidopsis                 115                25,500              4.5
Drosophila melanogaster     120                13,600              8.8
Homo sapiens                3,200              75,000? / 30,000    40.0 / 100.0
By contrast, CDS (coding sequence) sizes vary little, between 1.3 and 1.5 kb.
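The average-size column of the gene density table can be checked directly: the numbers are consistent with it being genome size divided by gene number (an interpretation inferred from the arithmetic, not stated on the slide). A short sketch that recomputes it; the small differences suggest the slide's values were rounded:

```python
# (genome size in Mb, gene count, kb per gene as printed on the slide)
TABLE = {
    "Escherichia coli": (4.7, 4_300, 1.1),
    "Saccharomyces cerevisiae": (12.1, 6_000, 2.0),
    "C. elegans": (97, 16_000, 6.0),
    "Arabidopsis": (115, 25_500, 4.5),
    "Drosophila melanogaster": (120, 13_600, 8.8),
    "Homo sapiens (75,000 genes)": (3_200, 75_000, 40.0),
    "Homo sapiens (30,000 genes)": (3_200, 30_000, 100.0),
}


def kb_per_gene(genome_mb: float, genes: int) -> float:
    """Average genome length per gene in kb (1 Mb = 1,000 kb)."""
    return genome_mb * 1_000 / genes


for species, (mb, genes, printed) in TABLE.items():
    print(f"{species:28s} computed {kb_per_gene(mb, genes):6.1f} kb   slide {printed:6.1f} kb")
```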
What's in the genome besides genes: introns
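Introns are transcribed but removed from the pre-mRNA by splicing. A minimal sketch, assuming exon coordinates are already known (0-based, half-open intervals) and using a hypothetical toy transcript with exons in uppercase and the intron in lowercase; the GU...AG check reflects the most common splice-site pattern, not a universal rule:

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon segments in order, dropping the introns between them."""
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        intron = pre_mrna[prev_end:next_start].upper()
        # Most (not all) introns follow the GU...AG rule; flag any that do not.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            print(f"note: intron at {prev_end}-{next_start} is not GU...AG")
    return "".join(pre_mrna[start:end] for start, end in ordered).upper()


# Hypothetical pre-mRNA: exon 1 (0-6), intron (6-18), exon 2 (18-24).
pre_mrna = "AUGGCC" + "guaagucuuuag" + "GAAUAA"
print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCGAAUAA
```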
What's in the genome besides genes: remote regulatory DNA
Lecture 14 will cover transcription in detail.
DNA to RNA to protein
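Transcription (DNA to RNA) and translation (RNA to protein) can be sketched in a few lines. This assumes the input is the coding (sense) strand, so the mRNA is the same sequence with U in place of T; it uses the standard genetic code and starts translation at the first AUG, a simplification of how ribosomes actually choose a start codon. The DNA fragment is hypothetical:

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# (first codon position varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon ends the polypeptide
        protein.append(aa)
    return "".join(protein)


dna = "CCATGGCCGAATGGTAAGC"  # hypothetical coding-strand fragment
mrna = transcribe(dna)
print(mrna, translate(mrna))  # CCAUGGCCGAAUGGUAAGC MAEW
```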
DNA must be maintained.
Natural processes can degrade the information in the DNA.
Cells and organelles (Molecular Biology of the Cell, third edition, Panel 1-1)
2C: 46 chromosomes per cell
4C: 46 chromosomes, each with 2 duplexes
4C: 92 chromosomes
Mitosis: heterozygosity is maintained.
Meiosis results in new combinations of alleles.
Mendel's laws of segregation and independent assortment come from meiosis.
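For unlinked loci, segregation sends each allele of a heterozygous locus to half the gametes, and independent assortment combines loci independently. A small sketch that enumerates gamete frequencies under those two rules (it deliberately ignores linkage and recombination):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype: list[tuple[str, str]]) -> dict[str, Fraction]:
    """Gamete frequencies for unlinked loci.

    `genotype` lists the two alleles at each locus, e.g. [("A", "a"), ("B", "b")].
    Every combination of one allele per locus is equally likely.
    """
    counts = Counter("".join(combo) for combo in product(*genotype))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


print(gametes([("A", "a"), ("B", "b")]))  # AB, Ab, aB, ab at 1/4 each
```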
Recombination [figure: crossing over between the A/a and B/b loci]
Measuring rates of recombination.
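Recombination rates are usually measured as the fraction of recombinant progeny. A sketch with hypothetical testcross counts (an AB/ab parent crossed to an ab/ab tester, each progeny class named by the gamete contributed by the AB/ab parent):

```python
def recombination_fraction(counts: dict[str, int], parental_types: set[str]) -> float:
    """Fraction of progeny whose class is not one of the parental types."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no progeny counted")
    recombinant = sum(n for cls, n in counts.items() if cls not in parental_types)
    return recombinant / total


# Hypothetical counts: AB and ab are parental, Ab and aB are recombinant.
counts = {"AB": 45, "ab": 43, "Ab": 7, "aB": 5}
rf = recombination_fraction(counts, parental_types={"AB", "ab"})
print(f"recombination fraction = {rf:.2f}")
```

Unlinked loci give a value near 0.5; by the usual convention, 1% recombination corresponds to about 1 centimorgan for closely linked loci.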
Formal definition of linkage disequilibrium
If two loci have alleles $A_1, A_2$ with frequencies $p_1, p_2$ and $B_1, B_2$ with frequencies $q_1, q_2$, there are four possible haplotypes ($A_1B_1$, $A_1B_2$, $A_2B_1$, and $A_2B_2$). Let their frequencies be $f_{11}, f_{12}, f_{21}, f_{22}$.
If there is no linkage disequilibrium, then $f_{11} = p_1 q_1$, $f_{12} = p_1 q_2$, and so on.
There are a number of measures of linkage disequilibrium. One of them is $D = f_{11} f_{22} - f_{12} f_{21}$.
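A sketch computing D from the four haplotype frequencies. Because the frequencies sum to 1, D can equivalently be written as f11 - p1 q1, the departure of A1B1 from its no-disequilibrium expectation; the example values are made up:

```python
def linkage_disequilibrium(f11: float, f12: float, f21: float, f22: float):
    """Return (p1, q1, D) from haplotype frequencies f_ij of A_i B_j (must sum to 1)."""
    if abs(f11 + f12 + f21 + f22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = f11 + f12  # frequency of allele A1
    q1 = f11 + f21  # frequency of allele B1
    D = f11 * f22 - f12 * f21
    return p1, q1, D


# Hypothetical haplotype frequencies with strong association between A1 and B1.
p1, q1, D = linkage_disequilibrium(0.4, 0.1, 0.1, 0.4)
print(f"p1 = {p1:.2f}, q1 = {q1:.2f}, D = {D:.2f}, f11 - p1*q1 = {0.4 - p1 * q1:.2f}")
```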
Interpreting allelic association
The general case is an isolated population with high frequencies (p and r, respectively) of both a disease-causing allele D1 and an unlinked marker allele M1. Descendants of people who move from that population to a second population with different allele frequencies will show association between D1 and M1 even though the two loci are not linked.
Village: the disease-causing allele is at high frequency in a small village (p = 0.02, r = 0.5).
City: affected people in a nearby city (p = 0.0001, r = 0.1) are more likely to have other alleles, such as M1, that are found at elevated frequencies in that village, merely because they have ancestors from that village.
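The village/city example can be made quantitative. A sketch assuming D1 and M1 are independent within each source population and that a hypothetical fraction m of the city's gene pool descends from the village (m is not given on the slide); the allele frequencies are the slide's. Mixing alone produces D = m(1 - m)(p_v - p_c)(r_v - r_c), nonzero whenever both frequencies differ, with no linkage involved:

```python
def admixed_association(m: float, p_v: float, r_v: float, p_c: float, r_c: float):
    """Return (P(M1), P(M1 | D1), D) after mixing village (v) and city (c) gene pools.

    Assumes D1 and M1 are independent within each source population.
    m is the fraction of the mixed gene pool descended from the village.
    """
    p = m * p_v + (1 - m) * p_c                 # D1 frequency after mixing
    r = m * r_v + (1 - m) * r_c                 # M1 frequency after mixing
    f11 = m * p_v * r_v + (1 - m) * p_c * r_c   # gametes carrying both D1 and M1
    return r, f11 / p, f11 - p * r


m = 0.05  # hypothetical village ancestry fraction
r, r_given_d1, D = admixed_association(m, p_v=0.02, r_v=0.5, p_c=0.0001, r_c=0.1)
print(f"P(M1) = {r:.3f}, P(M1 | D1) = {r_given_d1:.3f}, D = {D:.6f}")
```

D1-carrying gametes are enriched for M1 because they disproportionately trace back to the village.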
Biology for bioinformatics:
Alignment of pairs of sequences
Multiple sequence alignment
Prediction of RNA secondary structure
Phylogenetic prediction
Database searching for sequences
Gene prediction
Analysis of microarray expression data
Protein classification
Protein folding / structure prediction
Genome analysis / databases
Genetic variation (haplotypes and allelic association)
Next time: more about the status of those problems and current state-of-the-art methods.
Tutorial II:
Monday, May 10, 2118 CSIC, 2:00 - 3:45