Introduction to Statistical Genetics
2006.2.24 to 2006.6.16
Professors 高振宏, 程毅豪, and 杜憶萍
Course Outline I
• Week 1: Course Overview, Basic Knowledge of Genome Biology,
Basic Principles of Population Genetics
• Week 2: Linkage Analysis for Family Data – I
• Week 3: Linkage Analysis for Family Data – II
• Week 4: Introduction to Microarray Data Analysis
• Week 5: Nature of Discrete Genetic Data & Estimating
Frequencies
• Week 6: Disequilibrium & Diversity
• Week 7: Population Structure, Individual Identification &
Outcrossing And Selection
• Week 8: Linkage
• Week 9: Midterm
Course Outline II
• Week 10: Phylogeny Reconstruction & Quantitative
Genetics I
• Week 11: Quantitative Genetics II
• Week 12: QTL mapping I
• Week 13: QTL mapping II
• Week 14: Population-based Association Analysis
• Week 15: Family-based Association Analysis
• Week 16: Multipoint Association Analysis
• Week 17: Genomewide Association Analysis
Thomas Andrew Knight
(1759-1838)
Thomas Andrew Knight was the first man to practice large-scale, systematic
strawberry breeding, which produced two famous varieties: the
Downton and the Elton. As a founder and long-time president of
England's Royal Horticultural Society, he encouraged others to breed
better varieties of fruits and vegetables.
Thomas Andrew Knight
• Knight's father was a Herefordshire clergyman who died
when his son was five years old. The boy's education
was neglected, and until he was nine he remained
almost illiterate. Since he was unable to read as a child,
he concentrated his curiosity on the plant and animal life
on the family estate. One day, says a story, he saw a
gardener planting beans. The boy asked why the man
was planting sticks of wood and was told they would
grow up to be beans. The gardener's prediction came
true. Knight immediately planted his pocket knife and
waited in anticipation for the miraculous growth of new
knives. When the experiment failed he sat down to
consider the difference in the two cases. Already he was
engrossed with the mysteries of the vital processes in
plants, a preoccupation which would lead later to his
reputation as a brilliant plant physiologist.
Downton (1817)
Elton (1828)
Knight didn’t count,
Mendel did count.
Gregor Mendel
1822-1884
By the 1890s, the invention of better
microscopes allowed biologists to discover
the basic facts of cell division and sexual
reproduction. The focus of genetics
research then shifted to understanding what
really happens in the transmission of
hereditary traits from parents to children. A
number of hypotheses were suggested to
explain heredity, but Gregor Mendel, a little-known
Central European monk, was the only
one who got it more or less right. His ideas
had been published in 1866 but largely went
unrecognized until 1900, which was long
after his death. His early adult life was spent
in relative obscurity doing basic genetics
research and teaching high school
mathematics, physics, and Greek in Brno
(now in the Czech Republic). In his later
years, he became the abbot of his monastery
and put aside his scientific work.
Great because it is simple
James Watson (1928-)
Francis Crick (1916-2004)
Slides 15—36 are edited from
and
Bonnie Berger
MIT
The human genome
• The cell is the fundamental working
unit of every living organism.
• Humans: trillions of cells (multicellular);
other organisms like yeast: one cell
(unicellular).
• Cells are of many different types (e.g.
blood, skin, nerve cells), but all can be
traced back to a single cell, the
fertilized egg.
Nucleus
Eukaryota: More on Morphology
The human genome in numbers
• 23 pairs of chromosomes;
• 2 meters of DNA;
• 3,000,000,000 bp;
• genetic map length of about 35 M (males 27 M, females 44 M);
• 30,000-40,000 genes.
The human genome
• The genome, or blueprint for all
cellular structures and activities in our
body, is encoded in DNA molecules.
• Each cell contains a complete copy of
the organism’s genome.
The human genome
• The human genome is distributed
along 23 pairs of chromosomes:
– 22 autosomal pairs;
– the sex chromosome pair, XX for
females and XY for males.
• In each pair, one chromosome is
paternally inherited, the other
maternally inherited (cf. meiosis).
The human genome
• Chromosomes are made of compressed
and entwined DNA.
• A (protein-coding) gene is a segment
of chromosomal DNA that directs the
synthesis of a protein.
DNA
• A deoxyribonucleic acid (DNA) molecule is a
double-stranded polymer composed of four basic
molecular units called nucleotides.
• Each nucleotide comprises a phosphate group, a
deoxyribose sugar, and one of four nitrogenous bases:
adenine (A), guanine (G), cytosine (C), and
thymine (T).
• The two chains are held together by hydrogen
bonds between the nitrogenous bases.
• Base-pairing occurs according to the following
rule: G pairs with C, and A pairs with T.
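The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the names `PAIRING`, `complement`, and `reverse_complement` and the example sequence are assumptions, not part of the original slides.

```python
# Watson-Crick pairing rule from the slide: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(strand)[::-1]


example = "ATGCGT"  # hypothetical sequence, for illustration only
print(complement(example))          # TACGCA
print(reverse_complement(example))  # ACGCAT
```

Because the two strands run in opposite directions, the reverse complement is usually the form written down for the partner strand.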
Genes control the making of cell parts
• The gene is a fundamental unit of inheritance
– The genome's DNA contains tens of thousands of genes
– Each gene governs the making of one functional element,
one “part” of the cell machine
– Every time a “part” must be made, a piece of the genome
is copied, transported, and used as a blueprint
• RNA is a temporary copy
– The medium for transporting genetic information from
the DNA information repository to the protein-making
machinery is an RNA molecule
– The more parts are needed, the more copies are made
– Each mRNA only lasts a limited time before degradation
The genetic code
• DNA: sequence of four different nucleotides.
• Protein: sequence of twenty different amino
acids.
• The correspondence between DNA’s four-letter
alphabet and a protein’s twenty-letter alphabet is
specified by the genetic code, which relates
nucleotide triplets or codons to amino acids.
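A quick count shows why triplets are the natural codon length: two-letter words over four bases give only 16 combinations, fewer than the 20 amino acids, while three-letter words give 64. The sketch below just enumerates them; the variable names are illustrative assumptions.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet of bases is a possible codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(BASES) ** 2)  # 16: doublets are too few to encode 20 amino acids
print(len(codons))      # 64: triplets are enough, with redundancy to spare
```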
Big Picture
Basic human genetics
• 46 chromosomes:
22 pairs of autosomal chromosomes and
2 sex chromosomes
• Double-stranded DNA
• 4 bases: A = Adenine, T = Thymine, G = Guanine, C = Cytosine
• Approximately 3,000,000,000 base pairs in the human genome
[Figure: a chromosome with its p-arm, centromere, and q-arm labeled]
The Central Dogma of Molecular
Biology
Basic Principles of Population
Genetics
Reference: Kenneth Lange
Mathematical and Statistical
Methods for Genetic Analysis
Mendel’s experiment data
Trait             Dominant form   Recessive form   Dominant count   Recessive count
stem length       tall            short                       787               277
pod shape         inflated        constricted                 882               299
seed shape        round           wrinkled                   5474              1850
seed colour       yellow          green                      6022              2001
flower position   axial           terminal                    651               207
flower colour     purple          white                       705               224
pod colour        green           yellow                      428               152
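The counts on this slide can be checked against the 3:1 ratio that Mendel's first law predicts. The sketch below simply divides the dominant count by the recessive count for each trait; the dictionary and function names are illustrative assumptions.

```python
# (dominant count, recessive count) for each trait, as listed on the slide.
MENDEL_COUNTS = {
    "stem length": (787, 277),
    "pod shape": (882, 299),
    "seed shape": (5474, 1850),
    "seed colour": (6022, 2001),
    "flower position": (651, 207),
    "flower colour": (705, 224),
    "pod colour": (428, 152),
}


def dominant_to_recessive_ratio(dominant: int, recessive: int) -> float:
    """Ratio of dominant to recessive phenotype counts."""
    return dominant / recessive


for trait, (dom, rec) in MENDEL_COUNTS.items():
    ratio = dominant_to_recessive_ratio(dom, rec)
    print(f"{trait:16s} {dom:5d} : {rec:5d}   ratio = {ratio:.2f}")
```

Every ratio falls close to 3, which is the point of "Mendel did count".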
Mendel’s First Law
• First generation: RR x rr
• Second generation: Rr x Rr (self cross)
• Third generation: RR + Rr (3/4), rr (1/4)
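The 3/4 to 1/4 split follows from each parent passing on one of its two alleles with probability 1/2. A minimal sketch that enumerates the four equally likely allele combinations is given below; the function name `cross` is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Genotype probabilities among offspring of two single-locus genotypes.

    Each parent transmits one of its two alleles with probability 1/2,
    so the four allele combinations are equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


print(cross("RR", "rr"))  # every offspring is Rr

offspring = cross("Rr", "Rr")
dominant = offspring["RR"] + offspring["Rr"]
print(dominant, offspring["rr"])  # 3/4 1/4
```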
Mendel’s Second Law
Two traits are inherited independently
What if the traits are not
independent?
Genetic and physical maps
• Physical distance: number of base pairs
(bp).
• Genetic distance: expected number of
crossovers between two loci, per chromatid,
per meiosis.
Measured in Morgans (M) or centiMorgans
(cM).
• 1 cM ≈ 1 million bp (1 Mb).
Definition
• The genetic map distance (in units of Morgans)
between two loci is defined as the expected
(average) number of crossovers occurring on a
single chromosome (in a gamete) between the two
loci.
Ex: Chromosome 1:
Physical length: 263 Mb
Female map length: 3.76 M = 376 cM
Male map length: 2.21 M = 221 cM
Note: 1 Mb ≈ 1 cM
Crossover, Recombination
[Figure: the mother's and father's chromosomes, and the chromosomes of sibling 1 showing a crossover]
Recombination: crossovers occur an odd number of times between the two loci
Haldane mapping function: the recombination frequency between two genes as a function of their distance d
$$\theta(d) = \frac{1 - e^{-2\lambda d}}{2}$$
Assumes that crossovers along a chromosome occur as a Poisson process with rate $\lambda$
Haldane Mapping Function
• Assume crossovers occur as a Poisson
process along the chromosome
rate: $\lambda$
physical distance: d
[Figure: loci A and B separated by distance d]
Haldane Mapping Function
• $\theta_{AB}$ = P(recombination between A and B)
= P(an odd number of crossovers between A and B)
$$= \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k+1} e^{-\lambda d}}{(2k+1)!}$$
Haldane Mapping Function
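$$e^{\lambda d} = \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k}}{(2k)!} + \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k+1}}{(2k+1)!}$$
$$e^{-\lambda d} = \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k}}{(2k)!} - \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k+1}}{(2k+1)!}$$
$$\Rightarrow\ \frac{e^{\lambda d} - e^{-\lambda d}}{2} = \sum_{k=0}^{\infty} \frac{(\lambda d)^{2k+1}}{(2k+1)!}$$
$$\Rightarrow\ \theta_{AB} = e^{-\lambda d} \cdot \frac{e^{\lambda d} - e^{-\lambda d}}{2} = \frac{1 - e^{-2\lambda d}}{2}$$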
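The closed form can be checked numerically against the term-by-term Poisson sum over odd crossover counts. The sketch below is an illustration under the slide's Poisson assumption; the function names and the default rate `lam = 1.0` are assumptions, not part of the slides.

```python
import math


def haldane_theta(d: float, lam: float = 1.0) -> float:
    """Closed-form recombination fraction (1 - exp(-2 * lam * d)) / 2."""
    return (1.0 - math.exp(-2.0 * lam * d)) / 2.0


def odd_poisson_probability(d: float, lam: float = 1.0, terms: int = 40) -> float:
    """P(odd number of crossovers), summed directly from the Poisson pmf."""
    mean = lam * d
    return sum(
        math.exp(-mean) * mean ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )


for d in (0.01, 0.1, 0.5, 2.0):
    print(d, haldane_theta(d), odd_poisson_probability(d))
```

For small distances the recombination fraction is close to `lam * d`, and for large distances it approaches 1/2 without exceeding it.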
• The following five slides serve as a
reference for basic human
genetics terminology.
1.2 Genetics Background
The cells of all organisms, from bacteria to humans, contain one or more sets
of a basic DNA complement that is unique to the species. This fundamental
complement of DNA is called a genome. The genome may be subdivided
into chromosomes, each of which is a very long single continuous DNA
molecule. In its turn, a chromosome can be demarcated along its length
into thousands of functional regions called genes. The word gene was
originally used for the unit factor of heredity. In modern terminology, a gene
is a specific coding sequence of DNA. The alternate forms of a gene are
called alleles. Two persons who share alleles inherited from a common ancestor
are said to share those alleles identical by descent, abbreviated as IBD. The pair of alleles in
an individual constitutes that individual’s genotype. The expression of a
particular genotype is called a phenotype.
Sperm and egg are created in a process called meiosis by splitting
the chromosome pairs in half and creating cells with only twenty-three single
chromosomes. When an embryo is formed from an egg and a sperm cell,
it again has a full set of twenty-three pairs, with half of each pair coming
from mother and half from the father. In meiosis, homologous chromosomes
pair up, and they may exchange genetic material between them during a
process called crossover. A chromosome in a gamete, which is a mixture
of the two homologous chromosomes in the parent, can be modeled in the
following way. It starts with either homologous chromosome randomly, moves
a random distance along this chromosome and then switches to the other
chromosome. It moves another random distance, and switches again. This
process continues until the end of the chromosome is reached.
There are two kinds of distance metric for chromosomes. Physical distances
are measured as the number of base pairs (abbreviated as bp) between two
points. The units for physical distances are bp and kb (1000 bp). Genetic
distances are defined as the expected number of crossovers between two
points, measured in Morgans. Another common unit for genetic distances is
the cM (centiMorgan). Different models of the crossover process give
different genetic distances. The most popular one is the Haldane model, which
says that the random distance to the next crossover is an exponential random
variable. This implies that the crossovers along the chromosome form a
Poisson process. The genetic length of the human genome is
about 35 Morgans. See Ott (1991).
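The Haldane model described above is easy to simulate: draw exponential waiting distances (mean 1 Morgan) until the interval is exhausted, and call the gamete recombinant when the number of crossovers is odd. The sketch below is only an illustration; the function names, the seed, and the 20 cM interval are assumptions chosen for the example.

```python
import math
import random


def count_crossovers(d: float, rng: random.Random) -> int:
    """Count crossovers in an interval of genetic length d (in Morgans).

    Waiting distances between crossovers are exponential with mean 1 Morgan,
    so the count over the interval is Poisson with mean d.
    """
    position = rng.expovariate(1.0)
    crossovers = 0
    while position < d:
        crossovers += 1
        position += rng.expovariate(1.0)
    return crossovers


def simulated_recombination_fraction(d: float, meioses: int, seed: int = 0) -> float:
    """Fraction of simulated gametes with an odd number of crossovers."""
    rng = random.Random(seed)
    recombinants = sum(count_crossovers(d, rng) % 2 for _ in range(meioses))
    return recombinants / meioses


d = 0.2  # 20 cM, an arbitrary illustrative distance
print(simulated_recombination_fraction(d, 20000))
print((1 - math.exp(-2 * d)) / 2)  # Haldane value for comparison
```

The simulated fraction should sit close to the Haldane value, and below the map distance of 0.2 because double crossovers restore the parental arrangement.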
If two alleles on the same parental chromosomes are passed to the offspring
together, one says that there is no recombination between them; otherwise,
one says that there is recombination. Another way to explain recombination
is that there is an odd number of crossovers between two genes. When two genes
are inherited independently of each other, the probabilities for recombination
and no recombination are equal, i.e., ½. Two genes are linked if the
recombination frequency between them is smaller than ½. (Notice that the
recombination frequency is never greater than ½.) A mapping function
is a mapping between the recombination frequency and genetic distance for
two loci. For example, under the Haldane model, the mapping function for
Hardy-Weinberg Equilibrium
• The genotype frequencies reach steady
states through the generations.
• Assumptions:
– 1) Infinite population size
– 2) Discrete generations
– 3) Random mating
– 4) No selection
– 5) No migration
– 6) No mutation
– 7) Equal initial genotype frequencies in the 2 sexes.
Hardy-Weinberg Equilibrium
• Consider a single locus with two alleles (A, a),
the possible genotypes are (AA, Aa, aa)
• Question: How do the genotype frequencies
propagate through the generations?
Genotype frequencies by generation:
Generation   AA     Aa     aa
0            U0     2V0    W0
1            U1     2V1    W1
....         ....   ....   ....
n            Un     2Vn    Wn
P0 = P(A) = U0 + V0
Q0 = P(a) = W0 + V0 = 1 - P0
H.W. Equilibrium
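Assume random mating
$$U_1 = U_0^2 + 2\left(\tfrac{1}{2}\right) U_0 (2V_0) + \tfrac{1}{4}(2V_0)^2 = (U_0 + V_0)^2 = P_0^2$$
By symmetry, $W_1 = Q_0^2$ and $2V_1 = 2P_0Q_0$
......
$$U_2 = (U_1 + V_1)^2 = (P_0^2 + P_0Q_0)^2 = P_0^2, \qquad W_2 = Q_1^2 = Q_0^2$$
.........
$$U_n = P_0^2, \qquad 2V_n = 2P_0Q_0, \qquad W_n = Q_0^2$$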
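The derivation says that one round of random mating is enough to reach the Hardy-Weinberg proportions, after which nothing changes. A minimal numerical sketch follows; the function name and the starting frequencies are arbitrary illustrative assumptions.

```python
def next_generation(u: float, two_v: float, w: float):
    """One round of random mating; (u, two_v, w) are the AA, Aa, aa frequencies."""
    p = u + two_v / 2  # frequency of allele A
    q = w + two_v / 2  # frequency of allele a
    return p * p, 2 * p * q, q * q


genotypes = (0.5, 0.2, 0.3)  # arbitrary starting frequencies, not from the slides
for generation in range(4):
    print(generation, genotypes)
    genotypes = next_generation(*genotypes)
```

The printed frequencies change between generations 0 and 1 and stay fixed from then on, up to floating-point rounding.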
HW Equilibrium for X-linked loci
• Assume at generation n
– gene frequency for females: $q_n$
– gene frequency for males: $r_n$
$$\Rightarrow\ q_\infty = \lim_{n \to \infty} q_n = r_\infty = \tfrac{2}{3} q_0 + \tfrac{1}{3} r_0$$
HW Equilibrium for X-linked loci
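• Proof: Under similar conditions,
we have $r_n = q_{n-1}$
$$\Rightarrow\ q_n = \tfrac{1}{2}(r_{n-1} + q_{n-1})$$
$$q_n = \tfrac{1}{2}\left(r_{n-1} + a r_n + (1 - a) q_{n-1}\right)$$
$$q_n - \frac{a}{2} r_n = \frac{1 - a}{2}\left(q_{n-1} + \frac{1}{1 - a} r_{n-1}\right)$$
$$-\frac{a}{2} = \frac{1}{1 - a} \ \Rightarrow\ a^2 - a - 2 = 0 \ \Rightarrow\ a = 2,\ -1$$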
HW Equilibrium for X-linked loci
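• a = 2
$$q_n - r_n = -\tfrac{1}{2}(q_{n-1} - r_{n-1}) \ \Rightarrow\ \lim_{n \to \infty}(q_n - r_n) = 0$$
• a = -1
$$q_n + \tfrac{1}{2} r_n = q_{n-1} + \tfrac{1}{2} r_{n-1} = \cdots = q_0 + \tfrac{1}{2} r_0$$
$$\Rightarrow\ \lim_{n \to \infty} q_n = q_\infty = \frac{2q_0 + r_0}{3}$$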
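The X-linked recursion can be iterated directly to watch the female and male frequencies oscillate toward their common limit. The sketch below assumes the recursion from the proof (males copy the previous female frequency, females average the two); the function name and starting values are illustrative assumptions.

```python
def x_linked_step(q: float, r: float):
    """Advance one generation: q is the female, r the male allele frequency."""
    # Sons get their X from the mother; daughters get one X from each parent.
    return (q + r) / 2, q


q0, r0 = 0.2, 0.8  # arbitrary starting frequencies, not from the slides
q, r = q0, r0
for generation in range(30):
    q, r = x_linked_step(q, r)

print(q, r)
print((2 * q0 + r0) / 3)  # predicted common limit
```

Unlike the autosomal case, equilibrium is approached gradually, with the difference between the sexes halving and changing sign each generation.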
Linkage Equilibrium
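Locus A: alleles $\{A_i\}$, frequencies $\{p_i\}$
Locus B: alleles $\{B_j\}$, frequencies $\{q_j\}$
$P_n(A_iB_j)$: haplotype frequency of $A_iB_j$ in the $n$-th generation
$\theta$: recombination frequency, $\theta = \tfrac{1}{2}(\theta_m + \theta_f)$, $0 \le \theta \le \tfrac{1}{2}$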
Linkage Equilibrium
$$P_n(A_iB_j) = (1 - \theta) P_{n-1}(A_iB_j) + \theta p_i q_j$$
$$P_n(A_iB_j) - p_i q_j = (1 - \theta)\left[P_{n-1}(A_iB_j) - p_i q_j\right] = (1 - \theta)^n \left[P_0(A_iB_j) - p_i q_j\right] \to 0 \quad \text{if } \theta > 0$$
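The geometric decay of the deviation from $p_i q_j$ can be confirmed by iterating the recursion and comparing with the closed form. The sketch below is an illustration; the function name and all numerical values are arbitrary assumptions.

```python
def haplotype_frequency(p0_ab: float, p_i: float, q_j: float, theta: float, n: int) -> float:
    """Iterate P_n = (1 - theta) * P_{n-1} + theta * p_i * q_j for n generations."""
    freq = p0_ab
    for _ in range(n):
        freq = (1 - theta) * freq + theta * p_i * q_j
    return freq


# Arbitrary illustrative values, not from the slides.
p_i, q_j, p0_ab, theta = 0.3, 0.4, 0.25, 0.1
for n in (0, 1, 10, 50):
    deviation = haplotype_frequency(p0_ab, p_i, q_j, theta, n) - p_i * q_j
    closed_form = (1 - theta) ** n * (p0_ab - p_i * q_j)
    print(n, deviation, closed_form)
```

With tight linkage (small theta) the deviation decays slowly, which is why nearby loci can stay associated for many generations.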
Selection: reproduction capacity
• E.g.
let $W_{A/A}$, $W_{A/a}$, $W_{a/a}$ (fitness) be the expected
genetic contributions to the next generation for
the given genotypes.
W.L.O.G.
let $W_{A/a} = 1$, $W_{A/A} = 1 - r$, $W_{a/a} = 1 - s$,
where $r, s \le 1$
Selection
• Let $p_n$ be the allele frequency of A at
generation n, and $q_n = 1 - p_n$ that of allele a
$$\bar{W}_n = (1 - r) p_n^2 + 2 p_n q_n + (1 - s) q_n^2 = 1 - r p_n^2 - s q_n^2$$
Selection
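$$\begin{aligned}
\Delta p_n &= p_{n+1} - p_n \\
&= \frac{(1 - r) p_n^2 + p_n q_n}{\bar{W}_n} - p_n \\
&= \frac{(1 - r) p_n^2 + p_n q_n - (1 - r p_n^2 - s q_n^2) p_n}{\bar{W}_n} \\
&= \frac{p_n^2 + p_n q_n - p_n + r (p_n^2 - p_n) p_n + s p_n q_n^2}{\bar{W}_n} \\
&= \frac{p_n q_n (-r p_n + s q_n)}{\bar{W}_n} \\
&= \frac{p_n q_n \left[s - (r + s) p_n\right]}{\bar{W}_n}
\end{aligned}$$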
Selection
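• To reach the equilibrium state
$$\Delta p = 0 \ \Rightarrow\ p = 1 \ \text{or}\ 0 \ \text{or}\ \frac{s}{r + s}$$
Assume r and s have different signs
• if $r > 0$, $s \le 0$: $\Delta p_n < 0$ whenever $0 < p_n < 1$, so $p_n \to 0$ as $n \to \infty$
=> extinction of A
• if $r \le 0$, $s > 0$
=> extinction of a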
Selection
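• if r, s have the same sign
$$\begin{aligned}
p_{n+1} - \frac{s}{r + s} &= p_n - \frac{s}{r + s} + \Delta p_n \\
&= \left(p_n - \frac{s}{r + s}\right) - \frac{(r + s) p_n q_n \left(p_n - \frac{s}{r + s}\right)}{1 - r p_n^2 - s q_n^2} \\
&= \left(p_n - \frac{s}{r + s}\right) \left[\frac{1 - r p_n^2 - s q_n^2 - (r + s) p_n q_n}{1 - r p_n^2 - s q_n^2}\right] \\
&= \left(p_n - \frac{s}{r + s}\right) \left[\frac{1 - r p_n - s q_n}{1 - r p_n^2 - s q_n^2}\right] \\
&= \left(p_n - \frac{s}{r + s}\right) \lambda_n
\end{aligned}$$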
Selection
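• if $r < 0$, $s < 0$: $\lambda_n > 1$
$$p_\infty = 1 \quad \text{if } p_0 > \frac{s}{r + s}$$
$$p_\infty = 0 \quad \text{if } p_0 < \frac{s}{r + s}$$
$$p_\infty = \frac{s}{r + s} \quad \text{if } p_0 = \frac{s}{r + s} \quad \text{(unstable equilibrium)}$$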
Selection
• If $r > 0$, $s > 0$: $\lambda_n < 1$
$$p_\infty = \frac{s}{s + r}$$
stable equilibrium
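The stable and unstable cases can be illustrated by iterating the selection recursion with fitnesses 1 - r, 1, and 1 - s for A/A, A/a, and a/a. This is a sketch under those assumptions; the function names and the selection coefficients are arbitrary illustrative choices.

```python
def next_allele_frequency(p: float, r: float, s: float) -> float:
    """One generation of selection with fitnesses 1-r (A/A), 1 (A/a), 1-s (a/a)."""
    q = 1 - p
    mean_fitness = 1 - r * p * p - s * q * q
    return ((1 - r) * p * p + p * q) / mean_fitness


def iterate(p0: float, r: float, s: float, generations: int) -> float:
    """Allele frequency of A after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, r, s)
    return p


# Arbitrary illustrative selection coefficients, not from the slides.
r, s = 0.2, 0.6
print(s / (r + s))               # predicted interior equilibrium
print(iterate(0.05, r, s, 500))  # r, s > 0: approaches s/(r+s) from below
print(iterate(0.95, r, s, 500))  # and from above

# r, s < 0: the interior equilibrium 0.5 is unstable.
print(iterate(0.6, -0.5, -0.5, 200))  # drifts toward 1
print(iterate(0.4, -0.5, -0.5, 200))  # drifts toward 0
```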
Heterozygote advantage (r, s both positive)
• Geneticists have suggested that several
recessive diseases are maintained at high
frequency by the mechanism of heterozygote
advantage.
• The best evidence favoring this hypothesis
exists for sickle cell anemia. A single dose of
the sickle cell gene appears to confer
protection against malaria.
Sickle Cell Anemia
normal hemoglobin Hb
2 alpha and 2 beta chains
form a 4 chain tetramer
Sickle Cell Anemia
beta chains bind with other
beta chains in RBC when
deoxygenated
–polymerization occurs
–Hb polymers distort RBC
into sickled shapes
–vaso-occlusion