The human genome - The Galton Institute

Genome organisation, DNA fingerprinting, VNTRs…
Andrew P Read, Emeritus Professor of Human Genetics, University of Manchester
Centre for Genomic Medicine, St Mary’s Hospital, Manchester
Galton Institute teachers’ day, Nowgen, 30th June 2015
The human genome: things to ask
• How big is it?
• Why is it organised into chromosomes?
• What is in it?
• How does it work?
• How does my genome compare to others?
How big is it?
Weight of DNA in a (diploid) cell = 6.9 pg = 6.9 × 10⁻¹² g
Mol. wt of a base pair (incl. phosphates) ≈ 650
So 6 × 10²³ bp would weigh 650 g
(6 × 10²³ / 650) × 6.9 × 10⁻¹² bp would weigh 6.9 × 10⁻¹² g
No. of base pairs in a diploid cell ≈ 6.4 × 10⁹
The (haploid) human genome is ca. 3,200,000,000 base pairs of DNA
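The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch using the slide's rounded inputs (6 × 10²³ for Avogadro's number, 650 for the molecular weight of a base pair, 6.9 pg of DNA per cell); the function and variable names are illustrative.

```python
# Estimate the number of base pairs in a diploid human cell from its DNA mass.
AVOGADRO = 6e23            # base pairs per mole, rounded as on the slide
BP_MOLAR_MASS_G = 650.0    # grams per mole of base pairs (incl. phosphates)
DNA_MASS_G = 6.9e-12       # 6.9 pg of DNA in a diploid cell


def base_pairs_from_mass(mass_g: float) -> float:
    """Convert a mass of double-stranded DNA (in grams) to a base-pair count."""
    moles_of_bp = mass_g / BP_MOLAR_MASS_G
    return moles_of_bp * AVOGADRO


diploid_bp = base_pairs_from_mass(DNA_MASS_G)
haploid_bp = diploid_bp / 2  # one copy of the genome

print(f"diploid cell:   {diploid_bp:.2e} bp")
print(f"haploid genome: {haploid_bp:.2e} bp")
```

The haploid figure comes out within about 1% of the 3,200,000,000 bp quoted for the human genome.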
The human mitochondrial genome
• 16,569 bp DNA
• 13 protein-coding genes
• 2 ribosomal RNA genes
• 22 transfer RNA genes
The great majority of mitochondrial proteins are encoded in the nucleus.
www.mitomap.org
Why is it organised into chromosomes?
3,200,000,000 base pairs of DNA
Base pairs are 0.34 nm apart
Length of DNA in a (diploid) cell:
Length = (3.2 × 10⁹) × 2 × (0.34 × 10⁻⁹) m ≈ 2 metres!
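The same length estimate as a short Python sketch, using only the numbers on the slide (3.2 × 10⁹ bp per haploid genome, 0.34 nm between base pairs). The names are illustrative.

```python
# Contour length of the DNA in one diploid human cell.
HAPLOID_BP = 3.2e9        # base pairs in one copy of the genome
RISE_PER_BP_M = 0.34e-9   # spacing between adjacent base pairs, in metres


def dna_length_m(n_bp: float, rise_m: float = RISE_PER_BP_M) -> float:
    """Length in metres of a DNA molecule (or set of molecules) of n_bp base pairs."""
    return n_bp * rise_m


diploid_length_m = dna_length_m(2 * HAPLOID_BP)  # two copies of the genome
print(f"DNA per diploid cell: {diploid_length_m:.2f} m")  # prints 2.18 m
```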
Why is it organised into chromosomes?
A normal diploid human cell contains 2 m of DNA
Magnify 1,000,000 ×:
• Human cell nucleus, 10 × 10 × 10 μm → lecture room, 10 × 10 × 10 m
• 2 m of DNA → 2,000 km of string!
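The lecture-room analogy is a straight linear scaling, which can be sketched as below. The inputs (a 10 μm nucleus, 2 m of DNA, a magnification of one million) are the slide's; the variable names are illustrative.

```python
# Scale the nucleus and its DNA up by the same linear factor.
MAGNIFICATION = 1_000_000

nucleus_side_m = 10e-6   # nucleus modelled as a 10 x 10 x 10 micrometre cube
dna_length_m = 2.0       # DNA in one diploid cell

room_side_m = nucleus_side_m * MAGNIFICATION
string_length_km = dna_length_m * MAGNIFICATION / 1000  # metres to kilometres

print(f"Scaled nucleus: a room about {room_side_m:.0f} m on each side")
print(f"Scaled DNA: about {string_length_km:.0f} km of string")
```

Fitting 2,000 km of string into a 10 m room without tangling is the packaging problem that chromosomes solve.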
Protein-coding genes…
[Figure: structure of a protein-coding gene, DNA → RNA → protein, with the 5’UT and 3’UT labelled]
Where is…
• the promoter?
• the transcription start site?
• the transcription termination site?
• the 5’ end of exon 3?
• the 3’ end of intron 3?
The number of exons in a gene varies widely with no evident logic
Table 3.1
Our genome looks chaotic…
• genes seem to be randomly arranged across the chromosomes
• there are very few functional clusters (like E. coli operons)
• e.g. haemoglobin: α-chain gene is on chromosome 16, β on chromosome 11
• e.g. Fanconi anaemia
• People with a balanced chromosomal translocation are usually normal and healthy (though they are at risk of having miscarriages or abnormal babies)
NCG3 Fig 2.14
Fanconi anaemia
An autosomal recessive condition with developmental abnormalities and high risk of cancer
Due to malfunction of a multiprotein complex that repairs damaged DNA
Locations of genes for 13 of the proteins in the complex
Genome browsers
• Ensembl: http://www.ensembl.org
• UCSC browser: http://genome.ucsc.edu/
Genome statistics
www.miRBase.org
The transcriptome and the proteome are much more complex than the genome
• At least 20,389 protein-coding genes… but 194,353 gene transcripts.
• One gene can encode multiple proteins by:
  • Alternative splicing (average 6.3 splice isoforms per gene)
  • Using alternative promoters (70,292 promoter-like sequences in the genome)
  • Differential post-translational modification
  • Special mechanisms, e.g. immunoglobulin genes
One gene – more than one protein
Alternative splicing
• Including / skipping an exon
• Alternative splice donor sites
• Alternative splice acceptor sites
• Alternative exons
The neurexin B gene can encode ca. 1,000 different proteins
• Promoters
• Exon that can be included or skipped
• Exon with 2 alternative 5’ splice sites
• Exon with 3 alternative 5’ splice sites
• Exon with 2 alternative 3’ splice sites
• Exon with 2 alternative 5’ splice sites; includes an alternative stop codon, producing a protein lacking the transmembrane and cytoplasmic domains encoded by exon 24.
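Independent choices multiply, which is why a single gene can yield so many products. The sketch below counts combinations for the five kinds of variable element named in the legend above, taking one of each. The number of promoters is not given in the transcript, so `N_PROMOTERS = 2` is purely an assumption for illustration, as is the simplification that every choice is independent. The real gene has more variable elements than this toy list, so the result falls well short of ca. 1,000.

```python
from math import prod

# Number of alternative forms for each variable element named in the legend
# (one element of each kind; the real gene contains more than are listed here).
choices_per_element = {
    "exon included or skipped": 2,
    "exon with 2 alternative 5' splice sites": 2,
    "exon with 3 alternative 5' splice sites": 3,
    "exon with 2 alternative 3' splice sites": 2,
    "exon with 2 alternative 5' splice sites and an alternative stop codon": 2,
}

N_PROMOTERS = 2  # ASSUMPTION for illustration only; not stated in the source


def count_isoforms(choices: dict[str, int], n_promoters: int) -> int:
    """Count distinct transcripts if every choice is made independently."""
    return n_promoters * prod(choices.values())


print(count_isoforms(choices_per_element, N_PROMOTERS))  # prints 96
```

Each further two-way choice doubles the total, so a handful of additional variable exons is enough to reach the order of a thousand isoforms.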
What is in it?
• ca. 20-25,000 protein-coding genes
• Suppose a typical protein is made of 500 amino acid residues
• It would need 1,500 nucleotides of messenger RNA to encode it
• So our genome might contain around 1,500 × 25,000 bp of coding sequence = 37.5 million bp.
• This is only about 1.17% of the total DNA of our genome!
The best current real estimate is 1.22%
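The back-of-envelope estimate above can be reproduced as follows. The inputs are the slide's round numbers (25,000 genes, 500 residues per protein, a 3.2 × 10⁹ bp genome), so this is a sketch of the reasoning, not a measurement.

```python
# Rough estimate of the protein-coding fraction of the human genome.
GENOME_BP = 3_200_000_000     # haploid genome size
N_GENES = 25_000              # upper end of the 20-25,000 range
RESIDUES_PER_PROTEIN = 500    # assumed "typical" protein length
NT_PER_CODON = 3              # three nucleotides encode one amino acid

coding_bp = N_GENES * RESIDUES_PER_PROTEIN * NT_PER_CODON
coding_fraction = coding_bp / GENOME_BP

print(f"coding sequence: {coding_bp:,} bp")          # prints 37,500,000 bp
print(f"fraction of genome: {coding_fraction:.2%}")  # prints 1.17%
```

The crude estimate lands close to the 1.22% figure quoted as the best current estimate.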
So what about the 98.8% of DNA that is noncoding?
• some is in introns of protein-coding genes
• much of it is in repetitive sequences …
… there are interspersed repeats…
From the initial report of the Human Genome Project (Nature 409: 860–921; 2001)
… and tandem repeats.
Tandem repeats
Jobling et al. Human Evolutionary Genetics 2nd edn Fig 3.16
Tandem repeats
• usually non-coding
• satellite DNA (173 bp repeat unit) in centromeric heterochromatin
• minisatellites particularly near telomeres of chromosomes
• microsatellites scattered throughout the genome
• number of repeat units in a given mini/microsatellite often varies between individuals: variable number of tandem repeats (VNTR)
3 alleles of a (CA)n microsatellite
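To make the idea of alleles that differ in repeat number concrete, here is a minimal Python sketch that counts the longest run of a repeat unit in a sequence. The three alleles and their flanking sequences are invented for illustration and are not real loci.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)


# Hypothetical alleles: identical invented flanks, different numbers of CA repeats.
FLANK_5 = "GGTTAGC"
FLANK_3 = "TTGACCG"
alleles = {
    name: FLANK_5 + "CA" * n + FLANK_3
    for name, n in [("allele 1", 12), ("allele 2", 15), ("allele 3", 17)]
}

for name, seq in alleles.items():
    n = longest_tandem_run(seq, "CA")
    print(f"{name}: (CA){n}, total length {len(seq)} bp")
```

Because each extra repeat unit adds two base pairs, the alleles differ in length, which is what allows them to be separated by size in the laboratory.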
DNA fingerprinting
Southern blot, hybridising to a probe that recognises a whole series of different minisatellite VNTRs
DNA profiling
• DNA fingerprinting has been superseded by DNA profiling
• DNA profiling uses a panel of microsatellites, amplified by PCR and genotyped on an automated DNA sequencer
• A profile using the UK SGM+ panel of 10 microsatellites plus a sex marker.
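A DNA profile can be thought of as a table giving, for each marker, the repeat counts of the two alleles a person carries. The sketch below compares two such profiles. The locus names and genotypes are hypothetical placeholders, not the real SGM+ markers or real data, and the comparison is a plain equality check with no statistical weighting.

```python
# A profile maps each locus to an unordered genotype (the two alleles carried).
def genotype(a, b):
    """Store a genotype as a sorted pair so that (14, 17) equals (17, 14)."""
    return tuple(sorted((a, b)))


def matching_loci(p: dict, q: dict) -> list[str]:
    """Loci typed in both profiles at which the genotypes are identical."""
    return sorted(locus for locus in p.keys() & q.keys() if p[locus] == q[locus])


def profiles_match(p: dict, q: dict) -> bool:
    """True if the profiles share at least one locus and agree at every shared locus."""
    shared = p.keys() & q.keys()
    return bool(shared) and all(p[locus] == q[locus] for locus in shared)


# Hypothetical data: invented locus names and repeat counts.
crime_scene = {"locus_A": genotype(14, 17), "locus_B": genotype(9, 9),
               "sex": genotype("X", "Y")}
suspect_1 = {"locus_A": genotype(17, 14), "locus_B": genotype(9, 9),
             "sex": genotype("Y", "X")}
suspect_2 = {"locus_A": genotype(14, 16), "locus_B": genotype(9, 9),
             "sex": genotype("X", "Y")}

print(profiles_match(crime_scene, suspect_1))  # prints True
print(profiles_match(crime_scene, suspect_2))  # prints False
print(matching_loci(crime_scene, suspect_2))   # prints ['locus_B', 'sex']
```

A single mismatching locus is enough to distinguish the two profiles, whereas a full match across a panel of highly variable markers is what gives a profile its discriminating power.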
Epigenetics: making it work
The 2 m of DNA in a cell nucleus is packaged, primarily by the 5 histone proteins, H1, H2A, H2B, H3, H4.
Variable covalent modification of histones (methylation, acetylation, phosphorylation…) and methylation of DNA control the packaging and hence gene activity.
The ENCODE project
Aim: to delineate all functional elements encoded in the human genome
2003 Pilot phase: 44 genomic regions, totalling 30 Mb, examined using a wide range of experimental and computational methods, with the aim of identifying all functional elements.
Nature 447: 799-816; 2007
2007 Production phase: extend to whole genome.
30 papers in September 2012 – see Nature 489: 57; 2012
The Nature ENCODE Explorer: www.nature.com/encode/
5 papers in August 2014 – see Nature 512: 374; 2014
Major findings of the ENCODE project
• The great majority of the genome is transcribed, at least at some times and in some types of cell.
• Chromatin can be classified into ‘flavours’ based on patterns of DNA methylation and covalent modification of histones, and these flavours correlate with biochemical activity.
• Some sort of function (coding, transcribed, protein-binding…) can be assigned to 80% of nucleotides in the genome, but see Graur et al., Genome Biol Evolution 5: 578-590; 2013 for a dissenting interpretation.
Regulatory elements: enhancers
Enhancers are promoter-like sequences that may be brought into contact with the promoter by DNA looping. Promoters and enhancers bind transcription factors (DNA-binding proteins that help turn genes on). Tissue-specific gene expression is largely controlled by enhancers.
ENCODE identified 399,124 sequences with enhancer-like features
Differences between genomes of normal healthy people
• There would typically be 3-4 million differences: single nucleotide variants, indels (small insertions/deletions), variable number tandem repeats, copy number variants (kilobases to megabases) …
• A very few of the differences directly determine a characteristic: e.g. sex, eye colour, blood groups…
• A larger number of the differences contribute to a characteristic, along with other genetic and environmental factors: e.g. height, blood pressure, liability to diabetes…
• The overwhelming majority have no apparent effect.
Summary and conclusions
• We are vastly more complex than Drosophila flies or Caenorhabditis worms, but we don’t have vastly more protein-coding genes.
• Only 1.2% of our genome is protein-coding sequence. Much of the rest is involved in control (there may also be some ‘junk DNA’).
• Coding sequence is controlled by an interacting network of DNA methylation (on CpG), histone modifications and non-coding RNAs.
• Cells are awash with non-coding RNAs, few of which have well understood functions.
• There’s still a very long way to go in understanding how our genomes work, but the bit we do understand is already fascinating and inspiring.