Tom Strachan • Andrew Read
Human Molecular Genetics
Fourth Edition
Chapter 9
Organization of the Human Genome
Copyright © Garland Science 2011
• Mitochondrial genome: 16,569 bp, 37 genes, 44% G+C. It has a heavy (H) strand (rich in G) and a light (L) strand (rich in C). A small section of the genome (7S DNA) is triple-stranded, due to repetitive synthesis of a short DNA segment. This region contains many of the control sequences and so is called the CR/D-loop region.
• Human cells vary in the number of mtDNA molecules they contain (typically thousands of copies per cell).
• Sperm do not contribute mtDNA to the zygote, so inheritance is strictly maternal. During mitosis, mitochondria are passed on to daughter cells by random assortment.
• mtDNA contains 37 genes: 28 use the H strand (rich in G) as their sense strand and 9 use the L strand (rich in C).
• Of the 37 mt genes: 22 are tRNA genes; 2 are rRNA genes (16S rRNA and 12S rRNA); 13 are polypeptide-coding (oxidative phosphorylation).
• Because mtDNA encodes only 13 proteins, its genetic code has drifted from the universal genetic code.
• 93% of mtDNA is coding and all genes lack introns. Some coding sequences overlap, and some lack stop codons (these are added post-transcriptionally).
• Replication of the H strand starts at the D loop and proceeds unidirectionally. About two-thirds of the way around the mtDNA, replication of the L strand starts from a new origin of replication and proceeds in the opposite direction.
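
The gene counts quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the figures are simply those given in the bullets above.

```python
# Figures taken from the bullets above; variable names are illustrative only.
MT_GENOME_BP = 16_569
genes_by_type = {"tRNA": 22, "rRNA": 2, "polypeptide": 13}
genes_by_sense_strand = {"H": 28, "L": 9}

total_genes = sum(genes_by_type.values())
# Both ways of partitioning the genes must give the same total.
assert total_genes == sum(genes_by_sense_strand.values())

coding_bp = round(MT_GENOME_BP * 0.93)     # ~93% of mtDNA is coding
bp_per_gene = MT_GENOME_BP / total_genes   # rough measure of gene density

print(total_genes, coding_bp, round(bp_per_gene))
```

On these numbers the mitochondrial genome carries roughly one gene per 450 bp, which gives a sense of how compact it is.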
• Nuclear genome:
- 3100 Mb (3.1 Gb), more than 26,000 genes (6000 of which are RNA genes). About 5% is highly conserved, including 1.1% protein-coding DNA and 4% conserved untranslated and regulatory sequences.
- Coding sequences often belong to families of related sequences generated by gene duplication, which has also produced pseudogenes and gene fragments.
- The remaining ~95% of the human genome is non-coding DNA, much of it made up of tandem repeats (head to tail) or dispersed repeats resulting from retrotransposition of RNA transcripts.
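
To put the percentages above into absolute terms, the sketch below converts them to megabases. It assumes the 3100 Mb genome size and the 1.1% and 4% fractions quoted above; the dictionary keys are illustrative labels, not terms from the text.

```python
GENOME_MB = 3100  # nuclear genome size quoted above

# Fractions quoted above; the keys are illustrative labels.
conserved_fractions = {
    "protein_coding": 0.011,
    "conserved_untranslated_and_regulatory": 0.04,
}

conserved_mb = {name: GENOME_MB * frac for name, frac in conserved_fractions.items()}
total_conserved_mb = sum(conserved_mb.values())
remaining_mb = GENOME_MB - total_conserved_mb

for name, size in conserved_mb.items():
    print(f"{name}: {size:.1f} Mb")
print(f"highly conserved total: {total_conserved_mb:.1f} Mb")
print(f"remainder: {remaining_mb:.1f} Mb")
```

Protein-coding DNA therefore amounts to only about 34 Mb of the 3100 Mb genome.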
• The human genome consists of 24 different DNA molecules, making up 24 different chromosomes (22 autosomes, X, and Y).
- Chromosome content: DNA, RNA, histones, and non-histone proteins.
- Euchromatin (2.9 Gb): gene-rich and transcriptionally active. This is the portion that was sequenced in the Human Genome Project.
- Constitutive heterochromatin (200 Mb): transcriptionally inactive and composed of long arrays of highly repetitive DNA, which are difficult to sequence. The long arrays of tandemly repeated transcription units encoding 28S, 18S, and 5.8S rRNA were not sequenced either.
- Each chromosome has some constitutive heterochromatin at the centromere, but chromosomes 1, 9, 16, and 19 have significant heterochromatin in the euchromatic region close to the centromere. Significant heterochromatin is also found on the Y chromosome and on the acrocentric chromosomes 13, 14, 15, 21, and 22.
- Base composition:
Average G+C = 41% for the euchromatic component, but there is considerable variation between chromosomes (38% G+C for chromosomes 4 and 13; 49% for chromosome 19).
Giemsa bands: dark bands have low G+C (37%); light bands have high G+C (45%).
CpG dinucleotides: why are they depleted from vertebrate DNA?
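
Both quantities in this part of the notes are easy to compute for any sequence. The sketch below is a minimal illustration, not a method taken from the textbook: `gc_content` and `cpg_obs_exp` are hypothetical helper names, the observed/expected ratio follows a commonly used convention (number of CpG times sequence length, divided by number of C times number of G), and the toy sequence is made up. A ratio well below 1 indicates CpG depletion; the usual explanation is that the cytosine in CpG is a methylation target and 5-methylcytosine tends to deaminate to thymine.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0.0
    return seq.count("CG") * len(seq) / (c * g)


toy = "ATGCCGTTAGCATCGA"  # made-up sequence, for illustration only
print(f"G+C = {gc_content(toy):.2f}, CpG obs/exp = {cpg_obs_exp(toy):.2f}")
```

On real data these functions would typically be applied over sliding windows, so that CpG islands stand out against the CpG-depleted bulk of the genome.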
• Human gene number:
- At least 26,000 (6000 of which are RNA genes).
- C. elegans (a 1 mm long worm) has 959 somatic cells and a genome 1/30 the size of the human genome, yet it contains 19,099 protein-coding genes and >1000 RNA-coding genes.
Therefore, genome complexity does not parallel biological complexity.
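
The comparison above becomes more concrete as a gene density. The following sketch assumes the 3100 Mb human genome size given earlier in these notes and treats "at least 26,000" as a lower bound; the variable names are illustrative.

```python
HUMAN_GENOME_MB = 3100        # nuclear genome size from earlier in these notes
HUMAN_TOTAL_GENES = 26_000    # "at least", so treat as a lower bound
HUMAN_RNA_GENES = 6_000

WORM_GENOME_MB = HUMAN_GENOME_MB / 30   # "1/30 that of humans"
WORM_PROTEIN_GENES = 19_099

human_protein_genes = HUMAN_TOTAL_GENES - HUMAN_RNA_GENES
human_density = human_protein_genes / HUMAN_GENOME_MB   # genes per Mb
worm_density = WORM_PROTEIN_GENES / WORM_GENOME_MB      # genes per Mb

print(f"human: {human_density:.2f} protein-coding genes/Mb")
print(f"C. elegans: {worm_density:.2f} protein-coding genes/Mb")
```

On these figures the worm has nearly as many protein-coding genes as humans, packed into a genome about 30 times smaller.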
• Human gene distribution:
- CpG islands are known to be strongly associated with genes. Gene distribution was studied by hybridizing CpG island DNA to metaphase chromosomes. The results showed that gene density is high in subtelomeric regions and that some chromosomes (19 and 22) are gene-rich while others are gene-poor (X and 18).
- Gene density correlates with Giemsa banding. Dark bands are low in G+C content and gene-poor; the reverse is true for light bands.
• Duplication of DNA segments resulted in copy-number variation and gene families
- Tandem gene duplication: arises by crossover between unequally aligned chromatids, either on homologous chromosomes (unequal crossover) or on the same chromosome (unequal sister chromatid exchange) (Fig. 9.5).
- Duplicative transposition: this involves retrotransposition.
- Gene duplication by ancestral cell fusion: invasion of a eukaryotic cell by a prokaryotic cell led to the establishment of organelles. Over time, pieces of the organelle genomes were excised and transferred to the nuclear genome. This resulted in duplication of organelle-encoded (cytoplasmic) genes in the nuclear genome.
- Large-scale subgenomic duplications: arise by chromosome translocations (segmental duplication) (Fig. 9.6).
- Whole genome duplication: comparative genomics studies confirmed that this has occurred during eukaryote evolution (e.g. in chordates).
• Organization, distribution & function of human protein-coding genes:
- Human genes show enormous variation in size and internal organization. E.g. the dystrophin gene (2.4 Mb) takes 16 hours to transcribe.
Diversity in exon-intron organization: a very small number of genes lack introns. For intron-containing genes, there is an inverse correlation between gene size and the fraction of coding DNA (Table 9.4). This is not because exons in large genes are smaller than those in small genes but because large genes have huge introns.
Diversity in repetitive DNA content: genes have repetitive DNA within introns, in flanking sequences, and, to different extents, in coding sequences.
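
The dystrophin figures quoted above imply an average transcription rate, which the short sketch below works out. The constant names are illustrative, and the result is only the average implied by 2.4 Mb in 16 hours, not a measured elongation rate.

```python
GENE_LENGTH_NT = 2_400_000   # dystrophin gene, 2.4 Mb
TRANSCRIPTION_HOURS = 16

seconds = TRANSCRIPTION_HOURS * 3600
rate_nt_per_s = GENE_LENGTH_NT / seconds
rate_kb_per_min = rate_nt_per_s * 60 / 1000

print(f"{rate_nt_per_s:.1f} nt/s, {rate_kb_per_min:.1f} kb/min")
```

This works out to roughly 40 nucleotides per second, or about 2.5 kb per minute.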
- Different proteins can be specified by overlapping transcription units:
- Overlapping genes and genes-within-genes: gene density varies between chromosomes and within regions of the same chromosome.
- In regions with high gene density, overlapping genes may be found. These are typically transcribed from opposing DNA strands, e.g. in the HLA complex (Fig. 9.7A).
- 9% of human protein-coding genes overlap, and more than 90% of such overlaps involve transcription from opposing strands. However, small protein-coding genes are sometimes located within the introns of larger genes, e.g. the neurofibromatosis type I (NF1) gene (Fig. 9.7B).
- Some protein-coding genes share a common promoter and are transcribed in opposite directions.
- Protein-coding genes often belong to families that are clustered or dispersed on multiple chromosomes:
- Examples of clustered gene families are shown in Fig. 9.8, while some gene families have copies at two or more chromosomal locations without gene clustering (Table 9.6).
- There are three classes of gene family, according to the extent of sequence identity and structural similarity of the protein products:
1- A high degree of homology over most of the length of the gene or coding sequence, e.g. the histone and the α- and β-globin gene families.
2- Members may have very low sequence homology but possess one or more common protein domains, e.g. the PAX and SOX gene families (Table 9.7).
3- Gene families defined by functionally similar short protein motifs. These encode functionally related proteins with, for example, a DEAD box (Asp-Glu-Ala-Asp) or the WD repeat (Fig. 9.9).
- Gene duplication events that give rise to multigene families also create pseudogenes and gene fragments.
- Pseudogenes are defective gene copies that contain multiple exons, while gene fragments have only limited parts of the gene sequence (sometimes a single exon).
- Pseudogenes can be:
(a) nonprocessed (e.g. Fig. 9.8 and the HLA gene family in Fig. 9.10). These may arise at unstable chromosomal locations such as pericentromeric and subtelomeric regions. These regions are prone to recombination events that can distribute duplicated gene segments to other chromosomal locations. An example of pericentromeric rearrangement is the NF1 gene (Fig. 9.11A); an example of subtelomeric rearrangement is the polycystic kidney disease gene PKD1 (Fig. 9.11B).
(b) processed, arising via retrotransposition by cellular reverse transcriptase (Fig. 9.12, Table 9.8).
RNA Genes
• Fig. 9.13 shows the functional diversity of human ncRNA (noncoding RNA).
• Table 9.9 is a compilation of all the major classes of ncRNA.
- More than 1000 human genes, mostly within large gene clusters, encode rRNA or tRNA.
- Ribosomal RNA genes:
- Two mitochondrial rRNA molecules (12S and 16S).
- Four types of cytoplasmic rRNA: three associated with the large ribosomal subunit (28S, 5.8S, and 5S) and one with the small ribosomal subunit (18S).
- The 5S rRNA genes occur in small gene clusters; the largest is a cluster of 16 genes on 1q42, close to the telomere.
- The 28S, 5.8S, and 18S rRNAs are encoded by a single multigenic transcription unit that is tandemly repeated to form megabase-sized ribosomal DNA arrays (~30-40 tandem repeats, or ~100 rRNA genes) on the short arms of each of the acrocentric chromosomes 13, 14, 15, 21, and 22.
- Transfer RNA genes:
- The mitochondrial genome has 22 tRNA genes, which make 22 different tRNA molecules.
- The nuclear genome has 516 tRNA genes, classified into 49 families based on codon specificity, that make cytoplasmic tRNAs.
- Amino acid frequency doesn't correlate with the number of tRNA genes. E.g. 30 tRNA genes specify the rare cysteine (2.25% of all amino acids in human proteins), but only 21 tRNA genes specify the more abundant proline (6.10% of the total).
- More than half of the tRNA genes (273 out of 516) reside on chromosome 6 (many clustered in a 4 Mb region) or chromosome 1. Of the 30 Cys tRNA genes, 18 are found in a 0.5 Mb stretch of chromosome 7.
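
The mismatch between tRNA gene number and amino acid usage can be made explicit by dividing one by the other. The sketch below uses only the figures quoted above; "genes per percentage point of usage" is an illustrative measure introduced here, not one from the textbook.

```python
TOTAL_TRNA_GENES = 516
ON_CHR6_OR_CHR1 = 273

trna_genes = {"Cys": 30, "Pro": 21}             # nuclear tRNA genes per amino acid
aa_frequency_pct = {"Cys": 2.25, "Pro": 6.10}   # % of residues in human proteins

share_chr6_or_chr1 = ON_CHR6_OR_CHR1 / TOTAL_TRNA_GENES
# tRNA genes per percentage point of amino acid usage (illustrative measure).
genes_per_pct = {aa: trna_genes[aa] / aa_frequency_pct[aa] for aa in trna_genes}

print(f"on chromosome 6 or 1: {share_chr6_or_chr1:.1%}")
for aa, value in genes_per_pct.items():
    print(f"{aa}: {value:.2f} tRNA genes per % usage")
```

By this measure cysteine has nearly four times as many tRNA genes per unit of usage as proline.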
• Dispersed gene families make various small nuclear RNAs that facilitate general gene expression:
- Various families of small RNA molecules (60-360 nucleotides long) play a role in assisting general gene expression, mostly at the level of post-transcriptional processing.
There are 3 types:
(i) Small nuclear RNAs (snRNAs) are U-rich and bind to various proteins to function as ribonucleoproteins (snRNPs).
(ii) snRNAs that are involved in post-transcriptional processing of rRNA precursors in the nucleolus were reclassified as small nucleolar RNAs (snoRNAs).
(iii) Small Cajal body RNAs (scaRNAs) resemble snoRNAs but are confined to Cajal bodies (coiled bodies), discrete structures in the nucleus that are involved in the maturation of snRNPs.