Genomes
Definition
Complete set of instructions for making an organism
• master blueprints for all enzymes, cellular structures & activities
An organism's complete set of DNA
Total genetic information carried by a single set of chromosomes in a haploid nucleus
Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome.
Replicative form of viral genomes
• all ssRNA viruses produce dsRNA molecules
• many linear DNA molecules become circular
Molecular weight and contour length:
• Duplex length per base pair = 3.4 Å
• Molecular weight per base pair ≈ 660 Da
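As a worked example of the two conversion factors above, the sketch below estimates the contour length and molecular weight of the lambda genome (48,502 bp). The function names are illustrative, and the factors are the rounded per-base-pair values from the slide, so the results are approximations.

```python
# Rounded conversion factors quoted on the slide.
ANGSTROM_PER_BP = 3.4   # rise of duplex DNA per base pair, in angstroms
DALTON_PER_BP = 660     # approximate mass of one base pair, in daltons


def contour_length_um(n_bp):
    """Contour length of a duplex of n_bp base pairs, in micrometres."""
    return n_bp * ANGSTROM_PER_BP / 1e4  # 1 micrometre = 10,000 angstroms


def molecular_weight_da(n_bp):
    """Approximate molecular weight of a duplex of n_bp base pairs, in daltons."""
    return n_bp * DALTON_PER_BP


lambda_bp = 48_502
print(f"lambda contour length: {contour_length_um(lambda_bp):.1f} micrometres")
print(f"lambda molecular weight: {molecular_weight_da(lambda_bp):.2e} Da")
```

With these factors the lambda genome comes out at roughly 16.5 micrometres and about 3.2 x 10^7 Da.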
Procaryotic genomes
Generally 1 circular chromosome (dsDNA)
Usually without introns
Relatively high gene density (~2500 genes per mm of E. coli DNA)
Often indigenous plasmids are present
1. Escherichia coli
2. Agrobacterium tumefaciens
Bacterial genomes: E. coli
4288 protein-coding genes:
• Average ORF 317 amino acids
• Average gene size 1000 bp
• Very compact: average distance between genes 118 bp
Contour length of genome: 1.7 mm
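The figures on this slide can be cross-checked against each other. The sketch below is a rough consistency check that uses only the numbers quoted above; the function names are made up for illustration.

```python
def gene_density_per_mm(n_genes, contour_length_mm):
    """Number of genes per millimetre of DNA."""
    return n_genes / contour_length_mm


def gene_plus_spacer_span_bp(n_genes, avg_gene_bp, avg_intergenic_bp):
    """DNA covered by all genes plus one average intergenic gap per gene."""
    return n_genes * (avg_gene_bp + avg_intergenic_bp)


density = gene_density_per_mm(4288, 1.7)
span = gene_plus_spacer_span_bp(4288, 1000, 118)
print(f"gene density: about {density:.0f} genes per mm")
print(f"genes plus spacers: about {span / 1e6:.1f} Mbp")
```

The density of roughly 2500 genes per mm agrees with the value quoted for procaryotic genomes, and genes plus spacers account for about 4.8 Mbp.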
Bacterial gene-finding: an easy problem
• Dense genomes
• Short intergenic regions
• Uninterrupted ORFs
• Conserved signals
• Abundant comparative information
• Complete genomes
Plasmids
Extrachromosomal circular DNAs
Found in bacteria, yeast and other fungi
Size varies from ~3,000 to 250,000 bp
Replicate autonomously (origin of replication)
May contain resistance genes
May be transferred from one bacterium to another
May be transferred across kingdoms
Multicopy plasmids (up to ~400 plasmids per cell)
Low-copy plasmids (1-2 copies per cell)
Plasmids may be incompatible with each other
Used as vectors that can carry a foreign gene of interest
[Figure: plasmid vector map labelled with the β-lactamase gene, ori, and the foreign gene]
Agrobacterium tumefaciens
Characteristics
• Plant parasite that causes crown gall disease
• Lives in the intercellular spaces of the plant
• Carries a large (~250 kbp) plasmid called the tumor-inducing (Ti) plasmid
• Plasmid contains genes responsible for the disease
• Wound = entry point → 10-14 days later, tumor forms
• A portion of the Ti plasmid is transferred from bacterial cells to plant cells → T-DNA (transferred DNA)
• T-DNA integrates stably into the plant genome
• The single-stranded T-DNA fragment is converted to a dsDNA fragment by the plant cell, then integrated into the plant genome
• 2 x 23 bp direct repeats play an important role in the excision and integration process
Agrobacterium tumefaciens

What is naturally encoded in T-DNA?
• Enzymes for auxin and cytokinin synthesis
- Cause a hormone imbalance → tumor formation/undifferentiated callus
- Mutants in these enzymes have been characterized
• Opine synthesis genes (e.g. octopine or nopaline)
- Carbon and nitrogen source for A. tumefaciens growth
Insertion genes
• Virulence (vir) genes
• Allow excision and integration into the plant genome
Ti plasmid of A. tumefaciens
1. Auxin, cytokinin and opine synthesis genes are transferred to the plant
2. Plant makes all 3 compounds
3. Auxins and cytokinins cause gall formation
4. Opines provide a unique carbon/nitrogen source that only A. tumefaciens can use
Fungal genomes: S. cerevisiae
First completely sequenced eukaryote genome
Very compact genome:
• Short intergenic regions
• Scarcity of introns
• Lack of repetitive sequences
Strong evidence of duplication:
• Chromosome segments
• Single genes
Redundancy: non-essential genes provide selective advantage
Eucaryotic genomes
Located on several chromosomes
Relatively low gene density (50 genes per mm of DNA in humans)
Carry organellar genome as well
Human Genomes
Rough size estimate:
• 50,000 genes x 2 kbp = 100 Mbp
• Introns = 300 Mbp?
• Regulatory regions = 300 Mbp?
• Remaining 2300 Mbp = ???
Only 5-10% of the human genome codes for genes
- function of the other DNA (mostly repetitive sequences) is unknown, but it might serve structural or regulatory roles
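The back-of-the-envelope budget on this slide can be written out as a short calculation. This is only a sketch of the slide's own rough estimates (the question marks are in the original); the function and category names are illustrative.

```python
def genome_budget_mbp(n_genes, gene_kbp, introns_mbp, regulatory_mbp, other_mbp):
    """Return the slide's genome budget as a dict of sizes in Mbp."""
    return {
        "coding": n_genes * gene_kbp / 1000,  # kbp -> Mbp
        "introns": introns_mbp,
        "regulatory": regulatory_mbp,
        "other": other_mbp,
    }


def fractions(parts):
    """Share of the total genome taken by each category."""
    total = sum(parts.values())
    return {name: size / total for name, size in parts.items()}


parts = genome_budget_mbp(50_000, 2, 300, 300, 2300)
print(f"total: {sum(parts.values()):.0f} Mbp")
for name, share in fractions(parts).items():
    print(f"{name:<10} {parts[name]:>6.0f} Mbp  {share:.1%}")
```

On these numbers the four categories add up to 3000 Mbp, and coding sequence alone is only about 3% of the total.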
Plant genomes
Plants contain three genomes (nuclear, plastid and mitochondrial)
The nuclear genetic information is divided among the chromosomes
The size of the genome is species dependent
Differences in genome size are mainly due to different numbers of repeated sequences of various sizes arranged in tandem
The genes for ribosomal RNAs occur as repetitive sequences and, together with the genes for some transfer RNAs, are present in several thousand copies
Structural genes are present in only a few copies, sometimes just a single copy. Structural genes encoding structurally and functionally related proteins often form a gene family
The DNA in the genome is replicated during interphase, before mitosis
Plant genomes: Arabidopsis thaliana
A weed growing at roadsides in central Europe
It has only 2 x 5 chromosomes
Its genome is just 70 Mbp
It has a life cycle of only 6 weeks
It contains 25,498 structural genes from 11,000 families
The structural genes are present in only a few copies, sometimes just one
Structural genes encoding structurally and functionally related proteins often form a gene family
Peculiarities of plant genomes
Huge genomes reaching tens of billions of base pairs
Numerous polyploid forms
Abundant (up to 99%) non-coding DNA, which seriously hinders sequencing, gene mapping and the design of gene contigs
Poor morphological, genetic and physical mapping of chromosomes
A large number of “small chromosomes”, in which the chromosome length does not exceed 3 μm
The difficulty of chromosomal mapping of individual genes using in situ hybridization
The number of chromosomes and the DNA content of many species are still unknown
Size of the genome in plants and in human (millions of base pairs)
Genome                  Nucleus   Plastid   Mitochondrion
Arabidopsis thaliana         70     0.156           0.370
Zea mays                   3900     0.136           0.570
Vicia faba                14500     0.120           0.290
Human                      2800         -           0.017
Organisation of the genome into chromosomes
The nuclear genome is organized into chromosomes
Chromosomes consist of essentially one long DNA helix wound around nucleosomes
At metaphase, when the genome is relatively inactive, the chromosomes are most condensed and therefore most easily observed cytologically, counted or separated
Chromosomes provide the means by which the plant genome constituents are replicated and segregated regularly in mitosis and meiosis
Large genome segments are defined by the conserved order of their constituent genes
Chromosome
Chromosome parts
1. Heterochromatin
Darkly staining portions of chromosomes, believed to be due to a high degree of coiling
a. Centromere
• ~ “middle” of chromosomes
• spindle attachment sites
b. Telomeres
• ends of chromosomes
• important for the stability of chromosome tips
2. Euchromatin
Lightly staining portion of chromosomes
It represents most of the genome
It contains most of the genes
Genome organization
Protein Coding Genes
Segment of DNA which can be transcribed and translated into an amino acid sequence
Protein Coding Genes
Transcribed region ≈ Open Reading Frame (ORF)
• long (usually >100 aa)
• similarity to “known” proteins → likely a gene
Basal signals
• Transcription, translation
Regulatory signals
• Depend on organism
• Prokaryotes vs Eukaryotes
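The idea of recognising protein-coding genes by long open reading frames can be sketched in a few lines. This is a minimal illustration, not a real gene finder: it assumes ATG as the only start codon, scans only the forward strand, and uses the ">100 aa" rule of thumb from the slide as a default threshold. The function name and parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is the index just past the stop codon. Only ORFs with at
    least `min_codons` codons (stop codon excluded) are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it at the stop codon
    return orfs


# Toy sequence: a 6-codon ORF starting at position 2 (frame 2).
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

In dense bacterial genomes with uninterrupted ORFs this kind of scan already finds most genes; in eukaryotes, introns break the reading frame, so it is far less effective.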
Protein Coding Genes
Plants contain about 10,000-30,000 structural genes
They are present in only a few copies, sometimes just one (single-copy gene)
They often form a gene family
The transcription of most structural genes is subject to very complex and specific regulation
Genes for enzymes of metabolism or protein biosynthesis, which proceed in all cells, are transcribed more often
Most genes are switched off and are activated only in certain organs, and then often only in certain cells
Many genes are only switched on at specific times
Housekeeping genes:
The genes which every cell needs for basic functions, independent of its specialization
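Prokaryotic Gene
[Figure: a prokaryotic gene. On the DNA, a promoter is followed by cistron 1, cistron 2, ... cistron N and a terminator. RNA polymerase transcribes them into a single mRNA (5’ to 3’). Ribosomes, tRNAs and protein factors translate each cistron into a separate polypeptide (N to C terminus).]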
Promoter Region on DNA
Upstream from the transcription start site
Initial binding site of RNA polymerase and initiation factors (IFs)
Promoter recognition: a prerequisite for initiation
E. coli consensus promoter regions
-35 site = TTGACA
-10 site = TATAAT (“TATA” box)
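A consensus promoter can be searched for with a simple pattern match. The sketch below looks for the -35 hexamer TTGACA followed by the -10 hexamer TATAAT (the usual E. coli -10 consensus). The 15-19 bp spacer range is an assumption made for illustration, and real promoters rarely match the consensus exactly, so this is a teaching example rather than a promoter predictor.

```python
import re

# -35 hexamer, a spacer of 15-19 bases (assumed range), then the -10 hexamer.
CONSENSUS_PROMOTER = re.compile(r"TTGACA[ACGT]{15,19}TATAAT")


def find_consensus_promoters(seq):
    """Return (position, matched sequence) for exact consensus matches."""
    return [(m.start(), m.group()) for m in CONSENSUS_PROMOTER.finditer(seq.upper())]


# Toy sequence with a perfect -35/-10 pair separated by 17 bases.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(find_consensus_promoters(toy))
```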
Eukaryotic genes
Pseudogenes
Nonfunctional copies of genes
Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
Not expressed due to mutations that produce a stop codon (nonsense or frame-shift) or prevent mRNA processing, or due to lack of regulatory sequences
Tandemly Repeated DNA
A large number of identical repeated DNA sequences
They are spread over the entire chromosome
There is within-species variation in the number of copies in allelic arrays
Variations in the lengths of tandem repeat units have been used as a source of molecular markers
It is divided into:
1. Tandemly repeated expressed DNA
2. Tandemly repeated non-expressed DNA
Tandemly Repeated Gene
Genes which are duplicated and clustered at many locations of the genome
Ribosomal 18S, 5.8S, 25S and 5S RNA genes are highly reiterated in clusters, at sites called nucleolus organizers (NOR)
There is within-species variation in the number of copies in allelic arrays
Variations in the lengths of rDNA repeat units have been used as a source of molecular markers
Tandemly repeated expressed DNAs are also observed for tRNA genes and histone genes
Tandemly Repeated Non-expressed DNA
Repetitive sequences which cannot be expressed but are found in huge amounts in the genome
• Simple-sequence DNA
• Moderately repeated DNA (mobile DNA)
Simple Sequence DNA
• Very short sequences repeated many times in tandem in large clusters
• It is also called satellite DNA
• It often lies in heterochromatin, especially in centromeres and telomeres (and others)
• It is divided into 2 groups:
Minisatellite: variable number tandem repeat (VNTR)
Microsatellite: simple sequence repeat (SSR)
• It is used in DNA fingerprinting to identify individuals
Mobile DNA
Units of DNA which are predisposed to move to another location, sometimes involving replication of the unit, with the help of products of genes on the element or on related elements
Move within genomes
Account for most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
• L1 LINE is ~5% of human DNA (~50,000 copies)
• Alu is ~5% of human DNA (>500,000 copies)
Some encode enzymes that catalyze movement
2 types:
a. Transposon
b. Retrotransposon
Transposon
Chromosomal loci capable of being transposed from one spot to another within and among the chromosomes of a complement
Movement of mobile DNA
Involves copying of the mobile DNA element and insertion into a new site in the genome
Why?
Molecular parasite: “selfish DNA”
Probably has a significant effect on evolution by facilitating gene duplication, which provides the fuel for evolution, and exon shuffling
Retrotransposon (retroelement)
Transposon-like segment of DNA
Resembles a retrovirus lacking the sequence encoding the structural envelope protein
Major component of plant genomes
Size ranges from 1 to 13 kb in length
Widely distributed over the chromosomes of many plant species
Retrovirus
A virus of higher organisms whose genome is RNA, but which can insert a DNA copy of its genome into the host chromosome
The Repetitive DNA Content of Genomes
Tandemly repeated DNA

Microsatellite
• Unit size: at most 5 bp
• ATATATATATATATATATATATAT

Minisatellite
• Unit size: up to 25 bp
• ATTGCTGTATTGCTGTATTGCTGT
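The two example sequences above can be classified automatically by finding their repeat unit. The sketch below is a simplified illustration that only handles perfect tandem arrays; the function names are made up, and the size cut-offs (up to 5 bp for microsatellites, up to 25 bp for minisatellites) are the ones given on this slide.

```python
def smallest_repeat_unit(seq):
    """Return (unit, copies) for the shortest unit that tiles seq exactly."""
    n = len(seq)
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size


def classify_tandem_repeat(seq):
    """Classify a perfect tandem array using the slide's unit-size cut-offs."""
    unit, copies = smallest_repeat_unit(seq)
    if len(unit) <= 5:
        kind = "microsatellite"
    elif len(unit) <= 25:
        kind = "minisatellite"
    else:
        kind = "other"
    return unit, copies, kind


print(classify_tandem_repeat("AT" * 12))
print(classify_tandem_repeat("ATTGCTGT" * 3))
```

Real repeat arrays usually contain mismatches and partial copies, so practical tools score approximate repeats instead of requiring an exact tiling.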
Interspersed Genome-wide Repeats
Retrotransposon
• RNA intermediate
• Retrovirus
• LTR retrotransposon (long terminal repeat)
• Non-LTR retrotransposon
• LINEs (long interspersed elements)
• SINEs (short interspersed elements)
Transposon
• DNA intermediate
Classification of transposable elements
[Figure: structural features of retrotransposons]
• SINEs (80-630 bp): end in a poly(A) tail (AAAAA)
• LINEs (6-8 kbp): Orf1, pol, poly(A) tail (AAAAA)
• Retrotransposon (4-8 kbp): LTR, gag, pol, LTR
• Endogenous retrovirus (4-8 kbp): LTR, gag, pol, env, LTR
Life cycle of LTR retrotransposons
IN, integrase; PR, protease; RT, reverse transcriptase; VLP, virus-like particle
Eukaryotic cells
Peter J. Russell, iGenetics: Copyright © Pearson Education, Inc., publishing as Benjamin Cummings.
Mitochondrial genome (mtDNA)
The number of mitochondria per plant cell can be between 50 and 2000
One mitochondrion contains 1-100 genome copies (multiple identical circular chromosomes), often one large circle and several smaller ones
Size ~15 kb in animals
Size ~200 kb to 2,500 kb in plants
mtDNA is replicated before or during mitosis
Transcription of mtDNA can yield an mRNA that does not contain the correct information for the protein to be synthesized; RNA editing occurs in plant mitochondria
Over 95% of mitochondrial proteins are encoded in the nuclear genome
Often A+T-rich genomes
Chloroplast genome (ctDNA)
Multiple circular molecules, similar to those of procaryotic cyanobacteria, although much smaller (0.001-0.1% of the size of nuclear genomes)
Cells contain many plastids, and each plastid contains many genome copies
Size ranges from 120 kb to 160 kb
The plastid genome has changed very little during evolution. Even when two plants are very distantly related, their plastid genomes are rather similar in gene composition and arrangement
Some plastid genomes contain introns
Many chloroplast proteins are encoded in the nucleus (separate signal sequence)
“Cellular” Genomes
[Figure: where genomes reside. Viruses: viral genome inside a capsid. Procaryotes: bacterial chromosome and plasmids. Eucaryotes: chromosomes (nuclear genome) in the nucleus, plus the mitochondrial and chloroplast genomes.]
Genome: all of an organism’s genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes
[Figure: ranges of genome size on a logarithmic axis from 1e1 to 1e12 nucleotides for mammals, plants, fungi, bacteria (>100), mitochondria (~100) and viruses (1024).]
Size in nucleotides. Number in ( ) = completely sequenced
What Did These Individuals Contribute to Molecular Genetics?
Anton van Leeuwenhoek
• First observed living cells under the microscope: bacteria, protists, red blood cells
Gregor Johann Mendel
• Discovered the basic laws of inheritance
Walter Sutton
• Proposed that chromosomes carry the hereditary factors
Thomas Hunt Morgan
• Discovered how genes are transmitted through chromosomes
Rosalind Elsie Franklin
• Research led to the discovery of the double helix structure of DNA
James Watson and Francis Crick
• Discovered the double helix structure of DNA
DNA’s History
1866  Gregor Mendel: laws of heredity
1900  Carl Correns, Hugo de Vries & Erich von Tschermak: rediscovery of Mendel's laws
1944  Avery, MacLeod & McCarty: genes consist of DNA
1952  Hershey and Chase: DNA as the genetic material
1953  Watson & Crick: double helix DNA
1971  Cohen & Boyer: transformation technology
1972  Berg: recombinant DNA technology
1973  Arber, Smith & Nathans: restriction enzymes
Gene
• The hereditary determinant of a specified difference between individuals
• The unit of heredity
• The unit which is passed from generation to generation following simple Mendelian inheritance
• A segment of DNA which encodes protein synthesis
• Any of the units occurring at specific points on the chromosomes, by which hereditary characters are transmitted and determined; each is regarded as a particular state of organization of the chromatin in the chromosome, consisting primarily of DNA and protein
Gene classification
[Figure: a simplified chromosome made up of coding genes, non-coding genes and intergenic regions. Coding genes → messenger RNA → proteins (structural proteins, enzymes). Non-coding genes → structural RNA (transfer RNA, ribosomal RNA, other RNA).]
Gene
Molecular definition:
DNA sequence encoding protein
What are the problems with this definition?
Gene
Some genomes are RNA instead of DNA
Some gene products are RNA (tRNA, rRNA, and others) instead of protein
Some nucleic acid sequences that do not encode gene products (non-coding regions) are necessary for production of the gene product (RNA or protein)
Coding region
Nucleotides (open reading frame) encoding the amino acid
sequence of a protein
The molecular definition of gene includes more than just the
coding region
Noncoding regions of eukaryotic genes
Regulatory regions
• RNA polymerase binding site
• Transcription factor binding sites
Introns
Polyadenylation [poly(A)] sites
Gene
Molecular definition:
Entire nucleic acid sequence necessary for the synthesis of a
functional polypeptide (protein chain) or functional RNA
Bacterial genes
Most do not have introns
Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions
Polycistronic mRNA encodes several proteins
Bacterial operon
Eukaryotic genes
Most have introns
Produce monocistronic mRNA (only one encoded protein)
Large
Eucaryotic genes
Hemoglobin beta subunit gene
Exon 1: 90 bp
Intron A: 131 bp
Exon 2: 222 bp
Intron B: 851 bp
Exon 3: 126 bp
Splicing
Introns: intervening sequences within a gene that are not translated into a protein sequence.
Exons: sequences within a gene that encode protein sequences.
Splicing: removal of introns from the primary RNA transcript (pre-mRNA).
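The effect of splicing on transcript length can be worked out from the exon and intron sizes listed for the hemoglobin beta subunit gene. The sketch below only adds up lengths; it assumes the listed segments are the whole transcribed region, which is a simplification, and the names are illustrative.

```python
# Segment lengths in bp, in gene order, as listed on the slide.
GENE_SEGMENTS = [
    ("Exon 1", 90),
    ("Intron A", 131),
    ("Exon 2", 222),
    ("Intron B", 851),
    ("Exon 3", 126),
]


def primary_transcript_length(segments):
    """Length before splicing: exons and introns together."""
    return sum(length for _, length in segments)


def spliced_length(segments):
    """Length after splicing: introns removed, exons joined."""
    return sum(length for name, length in segments if name.startswith("Exon"))


before = primary_transcript_length(GENE_SEGMENTS)
after = spliced_length(GENE_SEGMENTS)
print(f"before splicing: {before} bp, after splicing: {after} bp")
print(f"codons in the spliced exons: {after // 3}")
```

Splicing removes 982 of the 1420 bp, leaving 438 bp of exon sequence, which corresponds to 146 codons.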
Alternative splicing
mRNA from some genes can be spliced into two or more
different mRNAs
Number of genes in eukaryotes
Species                          # genes
S. cerevisiae (yeast)            ~6700
D. melanogaster (fruit fly)      ~14400
C. elegans (roundworm)           ~20600
A. thaliana (mustard weed)       ~25000
H. sapiens (human)               ~24000
P. troglodytes (chimpanzee)      ~22500
M. musculus (mouse)              ~27000
R. norvegicus (rat)              ~23400
C. familiaris (dog)              ~20400
Top Ten Terms in Molecular Genetics
1. Amino acids: the 20 building blocks of proteins, each coded for by a specific 3-base codon.
2. Allele: one of the alternative versions of a specific gene; a diploid organism carries two copies.
3. Polymorphism: a sequence variant that occurs in more than 1% of the population. Most commonly, these are single nucleotide polymorphisms (SNPs).
4. Transcription: the synthesis of an RNA copy from a sequence of DNA; the first step in gene expression.
5. Translation: the synthesis of proteins from mRNA and amino acids.
6. Gene: specific sequence of nucleotide bases that carries information for constructing proteins; exons are the regions that actually encode the protein.
7. Chromosome: physically separate microscopic units of DNA that comprise the genome.
8. Genetics: the study of the patterns of inheritance of specific traits.
9. Genomics: the study of an organism’s entire complement of genetic material and its function.
10. Proteomics: the study of an organism’s entire protein material, its structure and function.
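The terms codon, transcription and translation can be tied together in a small example. The sketch below builds the standard genetic code table, transcribes a DNA coding strand into mRNA, and translates it until the first stop codon. The input sequence is made up for illustration, and the code assumes translation starts at the first base of the sequence.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGTGCACCTGACTCCTGAGGAGTAA"  # made-up coding strand
mrna = transcribe(dna)
print(mrna)
print(translate(mrna))
```

Each group of three bases is one codon, so the 27-base example gives eight amino acids followed by a stop codon.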