Download Bioinformatics Molecular Genetics

Document related concepts

Biochemistry wikipedia , lookup

Transformation (genetics) wikipedia , lookup

DNA supercoil wikipedia , lookup

Real-time polymerase chain reaction wikipedia , lookup

Genomic library wikipedia , lookup

Gene regulatory network wikipedia , lookup

Genetic code wikipedia , lookup

RNA polymerase II holoenzyme wikipedia , lookup

Genetic engineering wikipedia , lookup

Epitranscriptome wikipedia , lookup

Eukaryotic transcription wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Community fingerprinting wikipedia , lookup

Biosynthesis wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Molecular cloning wikipedia , lookup

RNA-Seq wikipedia , lookup

Point mutation wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Non-coding DNA wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Gene expression wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Gene wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Molecular ecology wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Molecular evolution wikipedia , lookup

Transcript
Bioinformatics
Molecular Genetics
David Gilbert
Bioinformatics Research Centre
www.brc.dcs.gla.ac.uk
Department of Computing Science, University of Glasgow
Molecular genetics
•
•
•
•
•
•
Cells
Macromolecules (DNA, RNA & proteins)
Replication
Genes
Expression (transcription & translation)
Function
Resources:
• http://www.ornl.gov/hgmis/education/students.html
US government resources
• http://www.ornl.gov/hgmis/publicat/primer/prim1.html
Primer section on molecular genetics
(c) David Gilbert 2008
Molecular Genetics
2
DNA
Genes to systems
"gene"
mRNA
Protein
sequence
Folded
Protein
(c) David Gilbert 2008
Molecular Genetics
3
The three kingdoms of organisms
from http://whyfiles.org/022critters/archaea.html
(c) David Gilbert 2008
Molecular Genetics
4
Divisions of life
• Prokaryotes: organisms without a cell nucleus (= karyon), or any other
membrane-bound organelles
– Archaebacteria The kingdom (or "domain") of single-celled organisms that
live under extreme environmental conditions and have distinctive biochemical
features.
– Bacteria (s: bacterium) A single-celled organism. Found throughout nature
and can be beneficial or pathogenic.
• Eukaryotes (also "eucaryotes"): all organisms with complex cells which have a
nucleus (in which the genetic material is organized ) and organelles.
– animals, plants, & fungi, (mostly multicellular), plus protists, (many are
unicellular)
NB: Viruses - not free living. Microscopic parasite that infects cells in biological organisms.
Can reproduce only by invading and controlling other cells as they lack the cellular
machinery for self-reproduction.
(c) David Gilbert 2008
Molecular Genetics
5
Unicellular vs multicellular
• Unicellular
– can be eukaryotes, e.g. Yeast
(c) David Gilbert 2008
Molecular Genetics
6
Animal and Plant cells
(c) David Gilbert 2008
Molecular Genetics
7
The cell
from http://www.accessexcellence.org/AB/GG/cell.html
(c) David Gilbert 2008
Molecular Genetics
8
The genome
•
Genome: the whole hereditary information of an organism that is encoded in the DNA (or, for
some viruses, RNA). Includes both the genes and the non-coding sequences.
•
Term was first coined in 1920 by Hans Winkler, Prof of Botany, Uni Hamburg, Germany.
•
More precisely, the genome of an organism is a complete DNA sequence of one set of
chromosomes; for example, one of the two sets that a diploid individual carries in every somatic
cell.
The term genome can be applied specifically to mean the complete set of nuclear DNA (i.e., the
nuclear genome) but can also be applied to organelles that contain their own DNA, as with the
mitochondrial genome or the chloroplast genome.
Sequence of a genome of a sexually reproducing species: one set of autosomes & one of each
type of sex chromosome.
“Genome sequence" may be a composite from the chromosomes of various individuals.
The study of the global properties of genomes of related organisms is usually referred to as
genomics, which distinguishes it from genetics which generally studies the properties of single
genes or groups of genes.
Every cell (except sex cells & mature red blood cells) contains the complete genome of an
organism
•
•
•
•
•
[wikipaedia]
(c) David Gilbert 2008
Molecular Genetics
9
Chromosomes
• DNA packaged into individual chromosomes (plus proteins)
• prokaryotes (single-cell organisms, no nucleus) - single
circular chromosome
• eukaryote (organisms with nuclei) - species specific number
and size of chromosomes
• Viruses: anything goes - circular (single or double
stranded), linear, DNA, RNA
(c) David Gilbert 2008
Molecular Genetics
10
Chromosomes
• DNA packaged in one or more large macromolecules called chromosomes.
• Chromosome (Greek chroma = color and soma = body) : a very long,
continuous piece of DNA, which contains many genes, regulatory elements
and other intervening nucleotide sequences.
• In the chromosomes of eukaryotes, the DNA exists in a quasi-ordered
structure inside the nucleus, where it wraps around histones (structural
proteins), and where this composite material is called chromatin.
• Prokaryotes do not possess histones or nuclei.
• In its relaxed state, the DNA can be accessed for transcription, regulation, and
replication.
• Chromosomes were first observed by Karl Wilhelm von Nägeli in 1842 and
their behavior later described in detail by Walther Flemming in 1882. In 1910,
Thomas Hunt Morgan proved that chromosomes are the carriers of genes.
[wikipaedia]
(c) David Gilbert 2008
Molecular Genetics
11
DNA &
Chromosomes
www.ogm-info.com/adn.html
(c) David Gilbert 2008
Molecular Genetics
12
Some model genetic organisms
Name
Genome BP
Genes
Chromosomes
HSV1 (Herpes virus)
1.5x105
70
1
Escherichia Coli
4.6x106
4,300
1
Saccharomyces
cerevisiae
1.2x107
5,900
16
Caenorhabditis
Elegans
Drosophila
melanogaster
Arabidopsis
Thalania
Mus Musculus
1.0x108
19,100
6
1.8x108
13,600
6
1.2x108
25,500
5
2.5x109
?30,000
20+X/Y
Homo sapiens
2.9x109
?30,000
22+X/Y
(c) David Gilbert 2008
Molecular Genetics
13
Image from Human Genome Project http://www.ornl.gov/hgmis
DNA, the molecule of life
22,000
(c) David Gilbert 2008
Molecular Genetics
14
DNA
Genes to proteins
"gene"
mRNA
Protein
sequence
Folded
Protein
(c) David Gilbert 2008
Molecular Genetics
15
Molecular building blocks
DNA = deoxyribonucleic acid, Σ={A,C,G,T}
Double-stranded sequences of bases, Pairings A-T, C-G.
Transcription:
DNA
↓
RNA
•
•
C-A-T-G-T-C-C-A
T-G-G-A-C-A-T-G
•
C-A-U-G-U-C-C-A
RNA = ribonucleic acid, Σ={A,C,G,U}
Sequence of bases. Pairings = A-U, C-G, G-U, ….
Context Free / Context Sensitive Grammar
Translation
RNA
↓
Amino-acids
Proteins, |Σ | =?
E-R-L-N-T-A-S-I-P
Sequence (chain) of amino-acids
(c) David Gilbert 2008
Molecular Genetics
16
DNA & RNA
base
DNA = deoxyribonucleic acid
RNA = ribonucleic acid
P
Biological macromolecules built as long linear chains of chemical
components - nucleotides = sugar+phosphate+base
P
Bases: adenine (A), cytosine (C), guanine (G), thymine (T), uracil (U)
4 letter alphabet:
DNA = A C G T (adenine cytosine
guanine
thymine)
RNA = A C G U (adenine
guanine
uracil)
(c) David Gilbert 2008
cytosine
Molecular Genetics
17
Nucleotides
G
Guanine
base
P
O
HO
P
O
Adenine
A
Cytosine
C
Thymine
T
_
OH
phosphate
deoxyribose
(sugar)
bases
(c) David Gilbert 2008
Molecular Genetics
18
DNA/RNA chains
Biological macromolecule built as long linear chain of chemical
components = nucleotides connected by phosphodiester bonds
P
5’
A
C
P
C
P
T
P
G
P
3’
(c) David Gilbert 2008
Molecular Genetics
19
Sequence in FASTA format
>1HBB:D HEMOGLOBIN A - CHAIN D
1
atggtgcacc tgactcctga ggagaagtct
51 caaggtgaac gtggatgaag ttggtggtga
101 tggtctaccc ttggacccag aggttctttg
151 actcctgatg ctgttatggg caaccctaag
201 agtgctcggt gcctttagtg atggcctggc
251 gcacctttgc cacactgagt gagctgcact
301 cctgagaact tcaggctcct gggcaacgtg
351 tcactttggc aaagaattca ccccaccagt
401 tggtggctgg tgtggctaat gccctggccc
(c) David Gilbert 2008
gccgttactg
ggccctgggc
agtcctttgg
gtgaaggctc
tcacctggac
gtgacaagct
ctggtctgtg
gcaggctgcc
acaagtatca
Molecular Genetics
ccctgtgggg
aggctgctgg
ggatctgtcc
atggcaagaa
aacctcaagg
gcacgtggat
tgctggccca
tatcagaaag
ctaa
20
Base-ambiguity
symbols
(for information)
(c) David Gilbert 2008
IUB symbol
Represented bases
A
A
C
C
G
G
T/U
T
M
A, C
R
A, G
W
A, T
S
C, G
Y
C, T
K
G, T
V
A, C, G
H
A, C, T
D
A, G, T
B
C, G, T
X/N
A, C, G, T
Molecular Genetics
21
DNA
complementarity
(base-pairing)
A-T
C-G
(c) David Gilbert 2008
Molecular Genetics
22
5’
(c) David Gilbert 2008
Molecular Genetics
5’ T-G-G-A-C-A-T-G 3’
C-A-T-G-T-C-C-A 3’
Complentary base-pairing &
double-helix
119D.pdb
23
AAAAGAAAAGGTTAGAAAGATGAGAGATGATAAAGGGTCCATTTGAGGTTAGGTAATATGGTTTGGTATCCCTGTAGTTAAAAGTTTTTGT
CTTATTTTAGAATACTGTGACTATTTCTTTAGTATTAATTTTTCCTTCTGTTTTCCTCATCTAGGGAACCCCAAGAGCATCCAATAGAAGC
TGTGCAATTATGTAAAATTTTCAACTGTCTTCCTCAAAATAAAGAAGTATGGTAATCTTTACCTGTATACAGTGCAGAGCCTTCTCAGAAG
CACAGAATATTTTTATATTTCCTTTATGTGAATTTTTAAGCTGCAAATCTGATGGCCTTAATTTCCTTTTTGACACTGAAAGTTTTGTAAA
AGAAATCATGTCCATACACTTTGTTGCAAGATGTGAATTATTGACACTGAACTTAATAACTGTGTACTGTTCGGAAGGGGTTCCTCAAATT
TTTTGACTTTTTTTGTATGTGTGTTTTTTCTTTTTTTTTAAGTTCTTATGAGGAGGGAGGGTAAATAAACCACTGTGCGTCTTGGTGTAAT
TTGAAGATTGCCCCATCTAGACTAGCAATCTCTTCATTATTCTCTGCTATATATAAAACGGTGCTGTGAGGGAGGGGAAAAGCATTTTTCA
ATATATTGAACTTTTGTACTGAATTTTTTTGTAATAAGCAATCAAGGTTATAATTTTTTTTAAAATAGAAATTTTGTAAGAAGGCAATATT
AACCTAATCACCATGTAAGCACTCTGGATGATGGATTCCACAAAACTTGGTTTTATGGTTACTTCTTCTCTTAGATTCTTAATTCATGAGG
AGGGTGGGGGAGGGAGGTGGAGGGAGGGAAGGGTTTCTCTATTAAAATGCATTCGTTGTGTTTTTTAAGATAGTGTAACTTGCTAAATTTC
TTATGTGACATTAACAAATAAAAAAGCTCTTTTAATATTAGATAA
DNA
(c) David Gilbert 2008
Molecular Genetics
24
What happens to DNA?
DNA
replication
(2 copies of the DNA)
DNA
transcription
RNA
DNA
DNA
translation
(Eventually the cell divides
into 2 cells - mitosis)
DNA
(c) David Gilbert 2008
Protein
synthesis
Protein
= double-stranded DNA
Molecular Genetics
25
Image from Human Genome Project http://www.ornl.gov/hgmis
DNA replication prior to cell division
[anim 12.1 & 2]
(c) David Gilbert 2008
Molecular Genetics
26
DNA Replication
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
27
The gene
• Basic unit of heredity
• Sequence of bases carrying the information
required to construct a particular protein
• A gene encodes a protein or an RNA
• Estimated number of genes:
–
–
–
–
humans & mice:
100,000
28,000-35,000
20,000-22,000
!!!
C. elegans (worm):
19,000
S. cerevisiae (yeast):
6,000
Tuberculosis microbe:
4,000
(c) David Gilbert 2008
Molecular Genetics
28
Gene
•
•
•
•
•
•
•
•
Genes are the units of heredity in living organisms.
Encoded in the organism's genetic material (usually DNA or RNA), & control the
development and behavior of the organism.
During reproduction, the genetic material is passed on from the parent(s) to the offspring.
Genetic material can also be passed between un-related individuals (e.g. via transfection, or
on viruses). (!)
Genes encode the information necessary to construct the chemicals (proteins etc.) needed for
the organism to function.
The word "gene" (coined 1909 by Danish botanist Wilhelm Johannsen) comes from the
Greek genos ("origin") and is shared by many disciplines, including classical genetics,
molecular genetics, evolutionary biology and population genetics. Because each discipline
models the biology of life differently, the usage of the word gene varies between disciplines.
It may refer to either material or conceptual entities.
Molecular biology: the segments of DNA which cells transcribe into RNA and translate, at
least in part, into proteins.
The Sequence Ontology project defines a gene as: "A locatable region of genomic sequence,
corresponding to a unit of inheritance, which is associated with regulatory regions,
transcribed regions and/or other functional sequence regions".
(c) David Gilbert 2008
Molecular Genetics
29
Nature vs Nurture
•
•
•
•
•
•
Common speech: "gene" often used to refer to the hereditary cause of a trait, disease
or condition—as in "the gene for obesity."
Biologists might refer to an allele or a mutation that has been implicated in or is
associated with obesity. (many factors other than genes decide whether a person is
obese or not: eating habits, exercise, prenatal environment, upbringing, culture and the
availability of food, for example.)
Allele: one of a number of viable DNA codings of the same gene occupying a given
locus (position) on a chromosome. In an organism which has two copies of each of its
chromosomes (diploid organism), 2 alleles make up the individual's genotype.
Very unlikely that variations within a single gene—or single genetic locus—fully
determine one's genetic predisposition for obesity.
Interplay between genes and environment, the influence of many genes—appear to be
the norm with regard to many and perhaps most ("complex" or "multi-factoral") traits.
Phenotype: the characteristics that result from this interplay.
(c) David Gilbert 2008
Molecular Genetics
30
Central Dogma
•
The central dogma of information flow in biology essentially states that the sequence of amino
acids making up a protein and hence its structure (folded state) and thus its function, is
determined by transcription from DNA via RNA.
•
“This states that once ‘information’ has passed into protein it cannot get out again. In
more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic
acid to protein may be possible, but transfer from protein to protein, or from protein to
nucleic acid is impossible. Information means here the precise determination of
sequence, either of bases in the nucleic acid or of amino acid residues in the protein.”
Francis Crick, On Protein Synthesis, in Symp. Soc. Exp. Biol. XII, 138-167 (1958)
•
(Nothing said explicitly about transfer from RNA to DNA)
(c) David Gilbert 2008
Molecular Genetics
31
DNA (gene) → RNA → Protein
control
statement
Termination
(stop)
TATA box
start
control
statement
gene
Ribosome
binding
5’ utr
Transcription (RNA polymerase)
3’ utr
mRNA
Translation (tRNA on the Ribosome)
Protein
(c) David Gilbert 2008
Molecular Genetics
32
Mutations
• alterations of DNA sequence
• can lead to loss or gain of function
• source of developmental problem and disease
• motor of evolution
(c) David Gilbert 2008
Molecular Genetics
33
RNA - ribonucleic acid
structure similar to DNA but
single stranded (except local base pairing)
sugar is ribose
thymine (T) is replaced by uracil (U)
various roles: mRNA, rRNA, tRNA, etc! (newly discovered)
(information storage, catalysis, transfer, regulation, …)
(c) David Gilbert 2008
Molecular Genetics
34
Secondary structure of RNA
base-pairing
A-U
C-G
G-U
G-A
G-G
A-A
(c) David Gilbert 2008
tRNA
Molecular Genetics
35
Transcription
process of copying DNA to RNA
• operated by RNA polymerase
• starts at transcription start; ends at stop signal
5’
DNA
T
T
C
A
A
G
3’
5’
3’
(c) David Gilbert 2008
3’
5’
DNA
RNA
U
U
C
A
A
G
DNA
Molecular Genetics
3’
5’
36
Transcription mechanism
RNA
polymerase
1
start
DNA
promoter
RNA
nucleotides
2
DNA
stop
RNA
3
DNA
stop
[anim 4.3]
(c) David Gilbert 2008
Molecular Genetics
37
Transcription of DNA into RNA
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
38
Open Reading Framme
• Open Reading Frame (ORF): any sequence of DNA
or RNA that can be translated into a protein.
• In a gene ORFs are
– located between the start-code sequence (initiation codon)
and the stop-code sequence (termination codon).
– part of the sequence that will be translated by the
ribosomes
– long and continue over gaps, or introns.
• However, short ORFs can also occur by chance
outside of genes. Usually ORFs outside genes are not
very long and terminate after a few codons.
(c) David Gilbert 2008
Molecular Genetics
39
Promoters are important for gene expression
(c) David Gilbert 2008
Molecular Genetics
40
Promotor, transcription factor, enhancer,…
enhancer
enhancer-binding protein
DNA
TF
TF
promotor
(c) David Gilbert 2008
Molecular Genetics
exon
41
Transcription factors, promotors, enhancers, silencers
•
•
•
•
•
•
Transcription factor: a protein that binds DNA at a specific promoter or enhancer region or site,
where it regulates transcription. Transcription factors can be selectively activated or
deactivated by other proteins
Promoter: a DNA sequence that enables a gene to be transcribed. The promoter is recognized
by RNA polymerase, which then initiates transcription.
Promoters represent critical elements that can work in concert with other regulatory regions
(enhancers, silencers, boundary elements/insulators) to direct the level of transcription of a
given gene.
Enhancer: a short region of DNA that can be bound with proteins (namely, the trans-acting
factors, much like a set of transcription factors) to enhance transcription levels of genes (hence
the name) in a gene-cluster. An enhancer does not need to be particularly close to the genes it
acts on, but it is on the same chromosome.
An enhancer does not need to bind close to the transcription initiation site to affect its
transcription, as some have been found to bind several hundred thousand base pairs upstream or
downstream of the start site. Enhancers can also be found within introns. An enhancer's
orientation may even be reversed without affecting its function. Furthermore, an enhancer may
be excised and inserted elsewhere in the chromosome, and still affect gene transcription.
Silencer: a DNA sequence capable of binding transcription regulation factors termed
repressors. Upon binding, RNA polymerase is prevented from initiating transcription thus
decreasing or fully suppressing RNA synthesis
(c) David Gilbert 2008
Molecular Genetics
42
Promoter sites
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
43
Transcription switches
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
44
RNA splicing
gene
exon1
intron1
exon2
intron2
exon3
intron3
intron1
exon2
intron2
exon3
intron4
exon5
transcription
initial transcript (pre-mRNA)
exon1
exon4
intron3
exon4
intron4
exon5
splicing
spliced mRNA
exon1
exon2
exon3
exon4
exon5
Equivalent to cDNA
(coding DNA without introns)
(c) David Gilbert 2008
Molecular Genetics
45
Alternative splicing
gene
exon1
intron1
exon2
intron2
exon3
intron3
exon4
intron4
transcription
initial transcript (pre-mRNA)
exon1
intron1
exon2
intron2
exon2
protein1
(c) David Gilbert 2008
intron3
exon4
intron4
alternative splicing
mRNA1
exon1
exon3
exon3
exon4
exon1
exon2
Altered structure and function
Molecular Genetics
exon5
exon5
mRNA2
exon3
exon4
exon5
protein2
46
Alternative splicing
http://sequence-www.stanford.edu/group/research/arabidopsis/splicing_big.html
(c) David Gilbert 2008
Molecular Genetics
47
Pseudogene
•
A pseudogene is a nucleotide sequences that is similar to a normal gene, but is not expressed as a
functional protein.
•
Several scenarios have been proposed under which a pseudogene might arise:
1.
A gene duplication event may mean that a genome has two copies of a gene when it only requires one.
Deactivating mutations in one copy of the gene would then not be selected against. In addition, the
duplication event may not have been complete, so they might have incomplete promoters. These
pseudogenes are called duplicated.
2.
Fragments of the mRNA transcript of a gene may be spontaneously reverse transcribed and inserted into
chromosomal DNA (called retrotransposition). These pseudogenes are called processed. Since these
pseudogenes lack the promoters of normal genes, they are never expressed.
3.
A gene may "die" during evolution if the selective pressure for its function is removed. For example, the
environment in which a species lives could change such that the gene product is no longer necessary, or
even harmful. Deactivating mutations in the gene would then no longer be selected against.
(c) David Gilbert 2008
Molecular Genetics
48
Proteins
• macromolecules composed of one or more
polypeptides
• polypeptide = polymer (“many-units”) of amino
acids
• 20 different amino acids can be used to build a
polypeptide
• Thus ‘polypeptide’ is a ‘string composed from a
20-character alphabet’
(c) David Gilbert 2008
Molecular Genetics
49
Amino acids
(c) David Gilbert 2008
Molecular Genetics
50
Amino acid
codes
(c) David Gilbert 2008
One-letter code
Three-letter-code
Name
1
A
Ala
Alanine
2
C
Cys
Cysteine
3
D
Asp
Aspartic Acid
4
E
Glu
Glutamic Acid
5
F
Phe
Phenylalanine
6
G
Gly
Glycine
7
H
His
Histidine
8
I
Ile
Isoleucine
9
K
Lys
Lysine
10
L
Leu
Leucine
11
M
Met
Methionine
12
N
Asn
Asparagine
13
P
Pro
Proline
14
Q
Gln
Glutamine
15
R
Arg
Arginine
16
S
Ser
Serine
17
T
Thr
Threonine
18
V
Val
Valine
19
W
Trp
Tryptophan
20
Y
Tyr
Tyrosine
Molecular Genetics
51
Translation
synthesis of protein from mRNA
• operated by ribosomes
• read RNA by groups of 3 nucleotides = codons
• reading frame = groupings of codons
• starts with start codon; ends with stop codon
5’
N
(c) David Gilbert 2008
U
U
C
Phe (F)
Molecular Genetics
3’
C
RNA
Polypeptide
(amino-acid chain)
52
How to code amino-acids using nucleotides?
• 4-letter alphabet of nucleotides (DNA, RNA)
• 20-letter alphabet of amino-acids used in
proteins
• What is the connection mathematically?
• What are the implications of this?
(c) David Gilbert 2008
Molecular Genetics
53
The genetic code
First
Position
(5’ end)
T
C
A
G
T
TTT
TTC
TTA
TTG
CTT
CTC
CTA
CTG
ATT
ATC
ATA
ATG
GTT
GTC
GTA
GTG
(c) David Gilbert 2008
Phe
Phe
Leu
Leu
Leu
Leu
Leu
Leu
Ile
Ile
Ile
Met*
Val
Val
Val
Val
C
TCT
TCC
TCA
TCG
CCT
CCC
CCA
CCG
ACT
ACC
ACA
ACG
GCT
GCC
GCA
GCG
Second
Position
A
Ser
TAT
Ser
TAC
Ser
TAA
Ser
TAG
Pro
CAT
Pro
CAC
Pro
CAA
Pro
CAG
Thr
AAT
Thr
AAC
Thr
AAA
Thr
AAG
Ala
GAT
Ala
GAC
Ala
GAA
Ala
GAG
Molecular Genetics
Tyr
Tyr
Stop
Stop
His
His
Gln
Gln
Asn
Asn
Lys
Lys
Asp
Asp
Glu
Glu
G
TGT
TGC
TGA
TGG
CGT
CGC
CGA
CGG
AGT
AGC
AGA
AGG
GGT
GGC
GGA
GGG
Cys
Cys
Stop
Trp
Arg
Arg
Arg
Arg
Ser
Ser
Arg
Arg
Gly
Gly
Gly
Gly
Third
Position
(3’ end)
T
C
A
G
T
C
A
G
T
C
A
G
T
C
A
G
54
Transfer RNA
Ser
acceptor stem
acceptor stem
1EVV.pdb
anti-codon
U C A
A G U
anti-codon
codon
5’
(c) David Gilbert 2008
3’
Molecular Genetics
55
Translation
synthesis of protein from mRNA
• operated by ribosomes
• read RNA by groups of 3 nucleotides = codons
• reading frame = groupings of codons
• starts with start codon; ends with stop codon
5’
N
(c) David Gilbert 2008
U
U
C
Phe (F)
Molecular Genetics
3’
C
RNA
Polypeptide
(amino-acid chain)
56
Translation mechanism
1
ribosome
binding site
RNA
start codon
stop codon
ORF
ribosome
2
start codon
RNA
ribosome
ORF
stop codon
amino acids
protein
ribosome
3
RNA
start codon
ORF
protein
[anim 4.5]
(c) David Gilbert 2008
stop codon
Molecular Genetics
57
Non-random use of synonymous codons
Bacterial (highly expressed) genes
A.Acid
Codon
Number
/1000
Fraction
Gly
GGG
13
1.89
0.02
Gly
GGA
3
0.44
0.00
Gly
GGU
365
52.99
0.59
Gly
GGC
238
34.55
0.38
Glu
GAG
108
15.68
0.22
Glu
GAA
394
57.20
0.78
Asp
GAU
149
21.63
0.33
Asp
GAC
298
43.26
0.67
Val
GUC
93
13.50
0.16
Val
GUA
146
21.20
0.26
Val
GUU
289
41.96
0.51
Val
GUC
38
5.52
0.07
(c) David Gilbert 2008
Molecular Genetics
58
NCBI
Nucleotide banner
My NCBI
[Sign In] [Register]
PubMed
Nucleotide Protein
Genome
Structure
Search for
Limits
Preview/Index
History
Display
Show
Range: from
to
Reverse complemented strand
1: J02799. Reports E.coli icd gene e...[gi:146431]
PMC
Taxonomy
Clipboard
Details
Features:
Links
SNP
CDD
OMIM
MGC
HPRD
Books
STS
tRNA
LOCUS
DEFINITION
ACCESSION
VERSION
KEYWORDS
SOURCE
ORGANISM
ECOICD
1568 bp
DNA
linear
BCT 26-APR-1993
E.coli icd gene encoding isocitrate dehydrogenase, complete cds.
J02799
J02799.1 GI:146431
icd gene; isocitrate dehydrogenase.
Escherichia coli
Escherichia coli
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
Enterobacteriaceae; Escherichia.
REFERENCE
1 (bases 1 to 1568)
AUTHORS
Thorsness,P.E. and Koshland,D.E.Jr.
JOURNAL
Unpublished (1987)
REFERENCE
2 (bases 291 to 1538)
AUTHORS
Thorsness,P.E. and Koshland,D.E. Jr.
TITLE
Inactivation of isocitrate dehydrogenase by phosphorylation is
mediated by the negative charge of the phosphate
JOURNAL
J. Biol. Chem. 262 (22), 10422-10425 (1987)
PUBMED
3112144
COMMENT
Original source text: E.coli DNA, clone pTK512.
Draft entry and printed copy of sequence [2],[1] kindly provided by
P.E.Thorsness, 01-JUL-1987.
FEATURES
Location/Qualifiers
source
1..1568
/organism="Escherichia coli"
/mol_type="genomic DNA"
/db_xref="taxon:562"
CDS
291..1541
/note="isocitrate dehydrogenase (icd; EC 1.1.1.42)"
/codon_start=1
/transl_table=11
/protein_id="AAA24006.1"
/db_xref="GI:146432"
(c) David Gilbert 2008
Molecular Genetics
59
/translation="MESKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPA
MLKVVDAAVEKAYKGERKISWMEIYTGEKSTQVYGQDVWLPAETLDLIREYRVAIKGP
LTTPVGGGIRSLNVALRQELDLYICLRPVRYYQGTPSPVKHPELTDMVIFRENSEDIY
AGIEWKADSADAEKVIKFLREEMGVKKIRFPEHCGIGIKPCSEEGTKRLVRAAIEYAI
ANDRDSVTLVHKGNIMKFTEGAFKDWGYQLAREEFGGELIDGGPWLKVKNPNTGKEIV
IKDVIADAFLQQILLRPAEYDVIACMNLNGDYISDALAAQVGGIGIAPGANIGDECAL
FEATHGTAPKYAGQDKVNPGSIILSAEMMLRHMGWTEAADLIVKGMEGAINAKTVTYD
FERLMDGAKLLKCSEFGDAIIENM"
ORIGIN
MluI site; 25.3 min on K12 map.
1 cgcgtggcgt ggttttcagg tttacgcctg gtagaacgtt gcgagctgaa tcgcttaacc
61 tggtgatttc taaaagaagt tttttgcatg gtattttcag agattatgaa ttgccgcatt
121 atagcctaat aacgcgcatc tttcatgacg gcaaacaata gggtagtatt gacaagccaa
181 ttacaaatca ttaacaaaaa attgctctaa agcatccgta tcgcaggacg caaacgcata
241 tgcaacgtgg tggcagacga gcaaaccagt agcgctcgaa ggagaggtga atggaaagta
301 aagtagttgt tccggcacaa ggcaagaaga tcaccctgca aaacggcaaa ctcaacgttc
361 ctgaaaatcc gattatccct tacattgaag gtgatggaat cggtgtagat gtaaccccag
421 ccatgctgaa agtggtcgac gctgcagtcg agaaagccta taaaggcgag cgtaaaatct
481 cctggatgga aatttacacc ggtgaaaaat ccacacaggt ttatggtcag gacgtctggc
541 tgcctgctga aactcttgat ctgattcgtg aatatcgcgt tgccattaaa ggtccgctga
601 ccactccggt tggtggcggt attcgctctc tgaacgttgc cctgcgccag gaactggatc
661 tctacatctg cctgcgtccg gtacgttact atcagggcac tccaagcccg gttaaacacc
721 ctgaactgac cgatatggtt atcttccgtg aaaactcgga agacatttat gcgggtatcg
781 aatggaaagc agactctgcc gacgccgaga aagtgattaa attcctgcgt gaagagatgg
841 gggtgaagaa aattcgcttc ccggaacatt gtggtatcgg tattaagccg tgttcggaag
901 aaggcaccaa acgtctggtt cgtgcagcga tcgaatacgc aattgctaac gatcgtgact
961 ctgtgactct ggtgcacaaa ggcaacatca tgaagttcac cgaaggagcg tttaaagact
1021 ggggctacca gctggcgcgt gaagagtttg gcggtgaact gatcgacggt ggcccgtggc
1081 tgaaagttaa aaacccgaac actggcaaag agatcgtcat taaagacgtg attgctgatg
1141 cattcctgca acagatcctg ctgcgtccgg ctgaatatga tgttatcgcc tgtatgaacc
1201 tgaacggtga ctacatttct gacgccctgg cagcgcaggt tggcggtatc ggtatcgccc
1261 ctggtgcaaa catcggtgac gaatgcgccc tgtttgaagc cacccacggt actgcgccga
1321 aatatgccgg tcaggacaaa gtaaatcctg gctctattat tctctccgct gagatgatgc
1381 tgcgccacat gggttggacc gaagcggctg acttaattgt taaaggtatg gaaggcgcaa
1441 tcaacgcgaa aaccgtaacc tatgacttcg agcgtctgat ggatggcgct aaactgctga
1501 aatgttcaga gtttggtgac gcgatcatcg aaaacatgta atgccgtagt ttgttaaatt
1561 tattaacg
//
(c) David Gilbert 2008
Molecular Genetics
60
HEADER
COMPND
COMPND
SOURCE
AUTHOR
AUTHOR
REVDAT
REVDAT
JRNL
JRNL
JRNL
JRNL
JRNL
JRNL
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
…
REMARK
REMARK
REMARK
…
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
REMARK
(c)
OXIDOREDUCTASE (NAD(A)-CHOH(D))
30-MAY-90
5ICD
ISOCITRATE DEHYDROGENASE (E.C.1.1.1.42) COMPLEX WITH
2 MG2+ AND ISOCITRATE
(ESCHERICHIA $COLI)
J.H.HURLEY,A.M.DEAN,J.L.SOHL,D.E.KOSHLAND *JUNIOR,
2 R.M.STROUD
2
15-JUL-93 5ICDA
1
SHEET
1
15-OCT-91 5ICD
0
AUTH
J.H.HURLEY,A.M.DEAN,J.L.SOHL,D.E.KOSHLAND *JUNIOR,
AUTH 2 R.M.STROUD
TITL
REGULATION OF AN ENZYME BY PHOSPHORYLATION AT THE
TITL 2 ACTIVE SITE
REF
SCIENCE
V. 249 1012 1990
REFN
ASTM SCIEAS US ISSN 0036-8075
038
1
1 REFERENCE 1
1 AUTH
J.H.HURLEY,A.M.DEAN,D.E.KOSHLAND *JUNIOR,R.M.STROUD
1 TITL
CATALYTIC MECHANISM OF /NADP+$-*DEPENDENT
1 TITL 2 ISOCITRATE DEHYDROGENASE: IMPLICATIONS FROM THE
1 TITL 3 STRUCTURES OF MAGNESIUM-*ISOCITRATE AND /NADP+$
1 TITL 4 COMPLEXES
1 REF
BIOCHEMISTRY
V. 30 8671 1991
1 REFN
ASTM BICHAW US ISSN 0006-2960
033
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICDA
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
2
3
4
5
6
7
1
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
2 RESOLUTION. 2.5 ANGSTROMS.
3
5ICD
5ICD
5ICD
46
47
48
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICDA
54
55
56
57
58
59
60
61
62
63
64
65
66
2
4 THE ASYMMETRIC UNIT IS A MONOMER. THE FUNCTIONAL DIMER CAN
4 BE GENERATED BY APPLYING THE SYMMETRY OPERATOR (Y, X, -Z)
4 TO THE COORDINATES PRESENTED IN THIS ENTRY.
5
5 RESIDUE 192 WAS INCORRECTLY REFINED AS ASP. IT HAS BEEN
5 CHANGED TO GLU AND THE SIDE CHAIN ATOMS BEYOND CB HAVE BEEN
5 DELETED.
6
6 RESIDUES MET 1 AND GLU 2 ARE MISSING FROM THE STRUCTURE.
6 SOME OF THE SIDE CHAIN ATOMS OF RESIDUES GLN 10, GLN 17,
6 ASN 18, LYS 20, ASP 81, GLU 182, LYS 186, GLU 204, ASP 259,
6 LYS 273, GLU 331, AND LYS 344 COULD NOT BE LOCATED IN THE
6 DENSITY MAPS.
7
David
Gilbert 2008
Molecular Genetics
61
REMARK
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
SEQRES
7 CORRECTION. CORRECT FORMAT OF SHEET RECORDS. 15-JUL-93.
1
416 MET GLU SER LYS VAL VAL VAL PRO ALA GLN GLY LYS LYS
2
416 ILE THR LEU GLN ASN GLY LYS LEU ASN VAL PRO GLU ASN
3
416 PRO ILE ILE PRO TYR ILE GLU GLY ASP GLY ILE GLY VAL
4
416 ASP VAL THR PRO ALA MET LEU LYS VAL VAL ASP ALA ALA
5
416 VAL GLU LYS ALA TYR LYS GLY GLU ARG LYS ILE SER TRP
6
416 MET GLU ILE TYR THR GLY GLU LYS SER THR GLN VAL TYR
7
416 GLY GLN ASP VAL TRP LEU PRO ALA GLU THR LEU ASP LEU
8
416 ILE ARG GLU TYR ARG VAL ALA ILE LYS GLY PRO LEU THR
9
416 THR PRO VAL GLY GLY GLY ILE ARG SER LEU ASN VAL ALA
10
416 LEU ARG GLN GLU LEU ASP LEU TYR ILE CYS LEU ARG PRO
11
416 VAL ARG TYR TYR GLN GLY THR PRO SER PRO VAL LYS HIS
12
416 PRO GLU LEU THR ASP MET VAL ILE PHE ARG GLU ASN SER
13
416 GLU ASP ILE TYR ALA GLY ILE GLU TRP LYS ALA ASP SER
14
416 ALA ASP ALA GLU LYS VAL ILE LYS PHE LEU ARG GLU GLU
15
416 MET GLY VAL LYS LYS ILE ARG PHE PRO GLU HIS CYS GLY
16
416 ILE GLY ILE LYS PRO CYS SER GLU GLU GLY THR LYS ARG
17
416 LEU VAL ARG ALA ALA ILE GLU TYR ALA ILE ALA ASN ASP
18
416 ARG ASP SER VAL THR LEU VAL HIS LYS GLY ASN ILE MET
19
416 LYS PHE THR GLU GLY ALA PHE LYS ASP TRP GLY TYR GLN
20
416 LEU ALA ARG GLU GLU PHE GLY GLY GLU LEU ILE ASP GLY
21
416 GLY PRO TRP LEU LYS VAL LYS ASN PRO ASN THR GLY LYS
22
416 GLU ILE VAL ILE LYS ASP VAL ILE ALA ASP ALA PHE LEU
23
416 GLN GLN ILE LEU LEU ARG PRO ALA GLU TYR ASP VAL ILE
24
416 ALA CYS MET ASN LEU ASN GLY ASP TYR ILE SER ASP ALA
25
416 LEU ALA ALA GLN VAL GLY GLY ILE GLY ILE ALA PRO GLY
26
416 ALA ASN ILE GLY ASP GLU CYS ALA LEU PHE GLU ALA THR
27
416 HIS GLY THR ALA PRO LYS TYR ALA GLY GLN ASP LYS VAL
28
416 ASN PRO GLY SER ILE ILE LEU SER ALA GLU MET MET LEU
29
416 ARG HIS MET GLY TRP THR GLU ALA ALA ASP LEU ILE VAL
30
416 LYS GLY MET GLU GLY ALA ILE ASN ALA LYS THR VAL THR
31
416 TYR ASP PHE GLU ARG LEU MET ASP GLY ALA LYS LEU LEU
32
416 LYS CYS SER GLU PHE GLY ASP ALA ILE ILE GLU ASN MET
(c) David Gilbert 2008
Molecular Genetics
5ICDA
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
3
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
62
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
ATOM
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
(c) David Gilbert 2008
N
CA
C
O
CB
OG
N
CA
C
O
CB
CG
CD
CE
NZ
N
CA
C
O
CB
CG1
CG2
N
CA
C
O
CB
CG1
CG2
N
CA
C
O
SER
SER
SER
SER
SER
SER
LYS
LYS
LYS
LYS
LYS
LYS
LYS
LYS
LYS
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
VAL
3
3
3
3
3
3
4
4
4
4
4
4
4
4
4
5
5
5
5
5
5
5
6
6
6
6
6
6
6
7
7
7
7
94.998
95.429
96.454
97.036
94.224
94.552
96.650
97.856
98.520
99.620
97.552
98.668
98.376
97.200
95.914
97.860
98.445
99.368
99.166
97.295
96.274
96.551
100.485
101.528
101.411
101.934
102.838
104.091
102.797
100.655
100.456
101.670
102.130
42.191
40.983
41.317
40.419
40.251
39.002
42.596
42.962
44.122
44.548
43.353
42.597
42.058
41.055
41.720
44.623
45.659
44.978
43.839
46.513
46.818
45.804
45.651
45.162
46.084
47.209
45.299
45.158
44.166
45.655
46.549
46.602
45.550
11.357
10.663
9.581
8.970
10.042
9.430
9.245
8.519
9.242
8.890
7.088
6.343
4.935
4.851
5.009
10.300
11.131
12.166
12.611
11.784
10.703
12.868
12.444
13.338
14.514
14.448
12.561
13.391
11.535
15.547
16.700
17.625
18.072
Molecular Genetics
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
62.19
64.90
67.08
70.87
64.93
62.85
65.06
60.85
53.37
53.84
65.57
72.00
76.77
81.18
81.33
46.52
46.32
45.91
46.98
44.70
42.01
50.64
43.14
38.76
39.00
39.37
39.41
36.07
41.88
39.30
42.34
42.23
43.38
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
5ICD
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
63
DNA &
Chromosomes
www.ogm-info.com/adn.html
(c) David Gilbert 2008
Molecular Genetics
64
Diploidy
(Image from D.Leader)
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
65
Centromeres and Telomeres
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
66
Recombination & meiosis
•
•
•
•
•
•
•
Meiosis is the process that transforms one diploid cell into four haploid cells in
eukaryotes in order to redistribute the diploid's cell's genome.
Forms the basis of sexual reproduction and can only occur in eukaryotes.
The diploid cell's genome, composed of ordered structures of coiled DNA called
chromosomes, is replicated once and separated twice, producing four sets of haploid
cells each containing half of the original cell's chromosomes.
These resultant haploid cells will fertilize with other haploid cells of the opposite
gender to form a diploid cell again.
The cyclical process of separation by meiosis and genetic recombination through
fertilization is called the life cycle.
The result is that the offspring produced during germination after meiosis will have a
slightly different blueprint which has instructions for the cells to work, contained in the
DNA. This allows sexual reproduction to occur.
Biochemically, meiosis uses many similar processes that mitosis (cell division) uses in
order to manipulate the redistribution of chromosomes, but with a vastly different
outcome.
(c) David Gilbert 2008
Molecular Genetics
67
Recombination
• Genetic recombination: the transmission-genetic process by which the
combinations of alleles observed at different loci (plural of locus) in
two parental individuals become shuffled in offspring individuals.
• Such shuffling can be the result of recombination via intrachromososomal recombination (crossing over) and via interchromososomal recombination (independent assortment).
• Recombination therefore only shuffles already existing genetic
variation and does not create new variation at the involved loci.
• In molecular biology, recombination generally refers to the molecular
process by which genetic variation found associated at two different
places in a continuous piece of DNA becomes disassociated (shuffled).
In this process one or both of the genetic variants are replaced by
different variants found at same two places in a second DNA molecule.
One mechanism leading to such molecular recombination is
chromosomal crossing over.
(c) David Gilbert 2008
Molecular Genetics
68
Recombination
(Image from D.Leader)
(c) David Gilbert 2008
Molecular Genetics
69
•
•
•
•
Recombination
Can have replication errors here
In meiosis, the precursor cells of the sperm
or ova must multiply and at the same time
reduce the number of chromosomes to one
full set.
During the early stages of cell division in
meiosis, two chromosomes of a
homologous pair may exchange segments,
producing genetic variations in germ cells.
Genes that lie far apart are likely to end up
on two different chromosomes. On the
other hand, genes that lie very close
together are less likely to be separated by a
break and crossing-over.
Genes that tend to stay together during
recombination are said to be linked.
Sometimes, one gene in a linked pair
serves as a "marker" that can be used by
geneticists to infer the presence of the
other (often, a disease-causing gene).
http://www.accessexcellence.org/RC/VL/GG/comeiosis.html
(c) David Gilbert 2008
Molecular Genetics
70
Sexual reproduction
Can have replication errors here
X
Diploid progeny
(c) David Gilbert 2008
Molecular Genetics
71
Cell levels
Literature
PubMed
Systems Behaviour
OMIM
System boundary
Metabolome space
…
Metabolic
networks
…
metabolite1
metabolite2
…
aMAZE
BIND
Brenda
Signalling
Pathways
KEGG
DIP
metabolite3
Proteome space
Protein-protein
Interaction
protein1
Complex1-3
ENZYME
CATH
Prodom
protein2
LIGAND
Klotho
GO
MIPS
…
GelBANK
InterPro
SwissSCOP
2DPAGE
PIR
PDB
protein3
SwissProt
Transcriptome space
RegulonDB
…
GXD
PathDB
TRANSPATH
WIT2
RNA2
Gene
RNA1
Regulatory
Pathways
RNA3
gene1
Molecular Genetics
gene3
Rfam
RNA
Sequence
ArrayExpress Database
Genome space
gene2
(c) David Gilbert 2008
SMD
NDB
COG
MethDB
…
TRANSFAC
TIGR
Ensembl
GenBank/
DDBJ/
EMBL
UCSC
Genome
Browser
72
The bigger story
sequence
Sequencing (fast)
structure
Structure determination
(slow, not all)
function
gene product
⊗ ⊗
Bioengineering?
function
Network/organism
(c) David Gilbert 2008
Molecular Genetics
Biological assays
in-vitro
Biological assays
in-vivo…
73
Summary - message
•
•
•
•
•
•
•
•
•
DNA → RNA → Protein
Central Dogma
Transcription
Translation
Genetic code, codon
“Easy” to sequence DNA
Can compute RNA and Protein sequences from DNA
Protein structure ?
Biological function?
– How does it all “work”?
(c) David Gilbert 2008
Molecular Genetics
74