Genome - Faperta UGM

Genome

Complete set of instructions for making an organism
• master blueprints for all enzymes, cellular structures & activities
An organism's complete set of DNA
The total genetic information carried by a single set of
chromosomes in a haploid nucleus
Located in every nucleus of trillions of cells
Consists of tightly coiled threads of DNA organized into
chromosomes
Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome.
Replicative form of viral genomes
• all ssRNA viruses produce dsRNA molecules
• many linear DNA molecules become circular
Molecular weight and contour length:
• Duplex length per base pair = 3.4 Å
• Molecular weight per base pair ≈ 660 Da
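As a worked example of the two constants above, the following Python sketch estimates the contour length and molecular weight of the lambda genome (48,502 bp, as quoted on this slide). It takes both constants per base pair of duplex DNA; the function names are illustrative only. With these values lambda DNA comes out at roughly 16.5 µm long and about 3.2 x 10^7 Da.

```python
# Per-base-pair constants taken from the slide (duplex DNA).
RISE_PER_BP_ANGSTROM = 3.4   # length contributed by one base pair
MASS_PER_BP_DALTON = 660.0   # approximate mass of one base pair

def contour_length_um(n_bp):
    """Contour length of a duplex of n_bp base pairs, in micrometres."""
    return n_bp * RISE_PER_BP_ANGSTROM * 1e-4  # 1 Å = 1e-4 µm

def molecular_weight_da(n_bp):
    """Approximate molecular weight of a duplex of n_bp base pairs, in daltons."""
    return n_bp * MASS_PER_BP_DALTON

LAMBDA_BP = 48_502  # lambda genome size quoted on the slide

print(f"lambda contour length: {contour_length_um(LAMBDA_BP):.2f} um")
print(f"lambda molecular weight: {molecular_weight_da(LAMBDA_BP):.3g} Da")
```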
Bacterial genomes: E. coli



4288 protein coding genes:
• Average ORF 317 amino acids
• Very compact: average distance
between genes 118bp
Numerous paralogous gene families:
38-45% of genes have arisen through duplication
Homologues:
• H. influenzae (1130 of 1703)
• Synechocystis (675 of 3168)
• M. jannaschii (231 of 1738)
• S. cerevisiae (254 of 5885)
Procaryotic genomes





Generally 1 circular chromosome (dsDNA)
Usually without introns
Relatively high gene density (~2500 genes per
mm of E. coli DNA)
Contour length of E.coli genome: 1.7 mm
Often indigenous plasmids are present
Bacterial Gene-finding
Easy problem
• Dense genomes
• Short intergenic regions
• Uninterrupted ORFs
• Conserved signals
• Abundant comparative information
• Complete genomes
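Because bacterial ORFs are uninterrupted, a first-pass gene finder can simply scan for start-to-stop stretches in each reading frame. The following is a minimal sketch under strong assumptions: it looks only at the forward strand, accepts only ATG as a start codon, and ignores the conserved signals and comparative information that real gene finders use. The function name `find_orfs` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon; `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return sorted(orfs)

toy = "CCATGAAATTTTAGGG"  # hypothetical sequence, one short ORF
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

In a eukaryotic genome the same scan would fail for most genes, because introns interrupt the reading frame.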
Genomes
Gene Content
E. coli
4000 genes × 1 kbp/gene = 4 Mbp
Genome = 4 Mbp!
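The back-of-envelope estimate above can be cross-checked against the other E. coli figures quoted on the earlier slides (4288 protein-coding genes, average ORF of 317 amino acids, contour length 1.7 mm). The sketch below only redoes that arithmetic; the function names are illustrative, and stop codons and RNA genes are ignored.

```python
# Figures quoted on the slides for E. coli
ROUGH_GENES = 4000
ROUGH_GENE_LENGTH_BP = 1000
PROTEIN_CODING_GENES = 4288
AVERAGE_ORF_AA = 317
CONTOUR_LENGTH_MM = 1.7

def rough_genome_size_bp(n_genes, gene_length_bp):
    """Genes x average gene length, as on the slide."""
    return n_genes * gene_length_bp

def coding_dna_bp(n_genes, avg_orf_aa):
    """Coding DNA implied by the average ORF length (3 nt per codon)."""
    return n_genes * avg_orf_aa * 3

def genes_per_mm(n_genes, contour_mm):
    """Gene density along the physical length of the chromosome."""
    return n_genes / contour_mm

print(rough_genome_size_bp(ROUGH_GENES, ROUGH_GENE_LENGTH_BP))   # rough estimate, bp
print(coding_dna_bp(PROTEIN_CODING_GENES, AVERAGE_ORF_AA))       # coding DNA, bp
print(round(genes_per_mm(PROTEIN_CODING_GENES, CONTOUR_LENGTH_MM)))
```

Both estimates land near 4 Mbp of coding DNA, and the density works out to about 2500 genes per mm, consistent with the figure given for procaryotic genomes.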
Plasmids
Extrachromosomal circular DNAs
[Figure: plasmid map showing the origin of replication (ori), a β-lactamase resistance gene and an inserted foreign gene]
Found in bacteria, yeast and other fungi
Size varies from ~3,000 bp to 100,000 bp
Replicate autonomously (origin of replication)
May contain resistance genes
May be transferred from one bacterium to another
May be transferred across kingdoms
Multicopy plasmids (up to ~400 plasmids per cell)
Low-copy plasmids (1-2 copies per cell)
Plasmids may be incompatible with each other
Are used as vectors that can carry a foreign gene of interest (e.g. insulin)
Agrobacterium tumefaciens

Characteristics
• Plant parasite that causes crown gall disease
• Carries a large (~250 kbp) plasmid called the tumor-inducing (Ti) plasmid
A portion of the Ti plasmid is transferred from bacterial cells to plant cells → T-DNA (transferred DNA)
Agrobacterium tumefaciens
T-DNA integrates stably into the plant genome
The single-stranded T-DNA fragment is converted to a dsDNA fragment by the plant cell
It is then integrated into the plant genome
Two 23 bp direct repeats play an important role in the excision and integration process

Agrobacterium tumefaciens



Tumor formation = hyperplasia
Hormone imbalance
Caused by A. tumefaciens
• Lives in intercellular spaces of the plant
• Plasmid contains genes responsible for the disease
• Part of the plasmid is inserted into plant DNA
• Wound = entry point → 10-14 days later, a tumor forms
Agrobacterium tumefaciens

What is naturally encoded in T-DNA?
• Enzymes for auxin and cytokinin synthesis
  - Cause hormone imbalance → tumor formation/undifferentiated callus
  - Mutants in these enzymes have been characterized
• Opine synthesis genes (e.g. octopine or nopaline)
  - Carbon and nitrogen source for A. tumefaciens growth
Insertion genes
• Virulence (vir) genes
• Allow excision and integration into plant genome
Ti plasmid of A. tumefaciens
1. Auxin, cytokinin and opine synthesis genes are transferred to the plant
2. Plant makes all 3 compounds
3. Auxins and cytokinins cause gall formation
4. Opines provide a unique carbon/nitrogen source that only A. tumefaciens can use!
Fungal genomes: S. cerevisiae




First completely sequenced
eukaryote genome
Very compact genome:
• Short intergenic regions
• Scarcity of introns
• Lack of repetitive sequences
Strong evidence of duplication:
• Chromosome segments
• Single genes
Redundancy: non-essential genes
provide selective advantage
Eucaryotic genomes
Located on several chromosomes
Relatively low gene density (50 genes per mm of
DNA in humans)
Contour length of DNA
Carry organellar genome as well
Human Genomes
Human
50,000 genes × 2 kbp = 100 Mbp
Introns = 300 Mbp?
Regulatory regions = 300 Mbp?
• Only 5-10% of the human genome codes for genes
- The function of the other DNA (mostly repetitive sequences) is unknown, but it might serve structural or regulatory roles
2300 Mbp = ???
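The accounting on this slide can be written out explicitly. The sketch below simply adds up the four figures given above; the ~3000 Mbp total is not stated on the slide and is inferred here from their sum, and the dictionary keys are illustrative labels.

```python
# Genome "budget" in Mbp, using only the numbers on the slide.
budget_mbp = {
    "genes (50,000 x 2 kbp)": 50_000 * 2_000 // 1_000_000,  # = 100 Mbp
    "introns": 300,
    "regulatory regions": 300,
    "unexplained": 2300,
}

# Assumption: the total is taken as the sum of the slide's figures.
total_mbp = sum(budget_mbp.values())

def share_percent(component):
    """Share of the inferred total genome taken up by one component."""
    return 100 * budget_mbp[component] / total_mbp

for name in budget_mbp:
    print(f"{name:24s} {budget_mbp[name]:5d} Mbp  {share_percent(name):5.1f}%")
print(f"{'total':24s} {total_mbp:5d} Mbp")
```

On these numbers about three quarters of the genome is left without an assigned function, which is the point of the "???" on the slide.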
Plant genomes








A plant cell contains three genomes (nuclear, plastid and mitochondrial)
The size of a genome is given in base pairs (bp)
Genome size is species dependent
Differences in genome size are mainly due to different numbers of repeated sequences of various sizes arranged in tandem
The genes for ribosomal RNAs occur as repetitive sequences and, together with the genes for some transfer RNAs, are present in several thousand copies
Structural genes are present in only a few copies, sometimes just a single copy. Structural genes encoding structurally and functionally related proteins often form a gene family
Genetic information is divided among the chromosomes
The DNA in the genome is replicated during interphase, before mitosis
Size of the genome in plants and in human (millions of base pairs)
Genome          Arabidopsis thaliana   Zea mays   Vicia faba   Human
Nucleus         70                     3900       14500        2800
Plastid         0.156                  0.136      0.120        -
Mitochondrion   0.370                  0.570      0.290        0.017
Plant genomes: Arabidopsis thaliana








A weed growing at roadsides in central Europe
It has only 2 x 5 chromosomes
Its nuclear genome is just 70 Mbp
It has a life cycle of only 6 weeks
A model plant for the investigation of plant function
Contains 25,498 structural genes from 11,000 families
The structural genes are present in only a few copies, sometimes just one
Structural genes encoding structurally and functionally related proteins often form a gene family
Plant genomes: Arabidopsis thaliana



Cross-phylum matches:
• Vertebrates 12%
• Bacteria / Archaea 10%
• Fungi 8%
60% have no match in non-plant
databases
Evolution involved whole genome
duplication followed by
subsequent gene loss and
extensive local gene duplications
Complex Genome DNA
~10% highly repetitive (300 Mbp)
• NOT GENES
~25% moderately repetitive (750 Mbp)
• Some genes
~25% exons and introns (800 Mbp)
40% = ?
• Regulatory regions
• Intergenic regions
Genome organization
“Nonfunctional” DNA
80 kb


Higher eukaryotes have a lot of noncoding DNA
Some has no known structural or regulatory function (no genes)
Duplicated genes



Encode closely related (homologous) proteins
Clustered together in genome
Formed by duplication of an ancestral gene followed by
mutation
Five functional genes and two pseudogenes
Pseudogenes



Nonfunctional copies of genes
Formed by duplication of ancestral gene, or
reverse transcription (and integration)
Not expressed due to mutations that produce a
stop codon (nonsense or frameshift) or prevent
mRNA processing, or due to lack of regulatory
sequences
Repetitive DNA

Moderately repeated DNA
• Tandemly repeated rRNA, tRNA and histone genes (gene
products needed in high amounts)
• Large duplicated gene families
• Mobile DNA

Simple-sequence DNA
• Tandemly repeated short sequences
• Found in centromeres and telomeres (and others)
• Used in DNA fingerprinting to identify individuals
Mobile DNA


Move within genomes
Make up most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
• L1 LINE is ~5% of human DNA (~50,000 copies)
• Alu is ~5% of human DNA (>500,000 copies)

Some encode enzymes that catalyze
movement
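The share and copy-number figures quoted for L1 and Alu imply an average element length, which the short sketch below works out. It assumes a human genome of about 3,000 Mbp, a value not stated on this slide, and the function name is illustrative. Because Alu is given as more than 500,000 copies, its result is an upper bound.

```python
# Assumption: ~3,000 Mbp human genome (not given on this slide).
GENOME_BP = 3_000_000_000

def mean_element_length_bp(percent_of_genome, copies, genome_bp=GENOME_BP):
    """Average length of one repeat copy, from genome share and copy number."""
    return genome_bp * percent_of_genome // 100 // copies

l1_length = mean_element_length_bp(5, 50_000)     # L1 LINE
alu_length = mean_element_length_bp(5, 500_000)   # Alu (upper bound)

print(f"L1 average length:  ~{l1_length} bp")
print(f"Alu average length: <= ~{alu_length} bp")
```

On these figures an average L1 copy is about 3 kb and an average Alu copy at most about 300 bp, which is the size difference behind the long/short interspersed element distinction.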
Transposition


Movement of mobile DNA
Involves copying of mobile DNA element and
insertion into new site in genome
Why?


Molecular parasite: “selfish DNA”
Probably have significant effect on evolution by
facilitating gene duplication, which provides
the fuel for evolution, and exon shuffling
Mitochondrial genome (mtDNA)








The number of mitochondria in a plant cell can be between 50 and 2000
One mitochondrion contains 1-100 genome copies (multiple identical circular chromosomes). They occur as one large and several smaller molecules
Size ~15 kb in animals
Size ~200 kb to 2,500 kb in plants
mtDNA is replicated before or during mitosis
Transcription of mtDNA can yield an mRNA that does not contain the correct information for the protein to be synthesized. RNA editing, which occurs in plant mitochondria, corrects the transcript
Over 95% of mitochondrial proteins are encoded in the
nuclear genome.
Often A+T rich genomes
Chloroplast genome (ctDNA)






Multiple circular molecules, similar to those of procaryotic cyanobacteria, although much smaller (0.001-0.1% of the size of nuclear genomes)
Cells contain many plastids and each plastid contains many genome copies
Size ranges from 120 kb to 160 kb
The plastid genome has changed very little during evolution. Even when two plants are very distantly related, their plastid genomes are rather similar in gene composition and arrangement
Some plastid genomes contain introns
Many chloroplast proteins are encoded in the nucleus (separate signal sequence)
“Cellular” Genomes
Viruses: viral genome (inside the capsid)
Procaryotes: bacterial chromosome, plasmids
Eucaryotes: chromosomes (nuclear genome, in the nucleus), mitochondrial genome, chloroplast genome
Genome: all of an organism’s genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes
[Figure: genome size ranges on a logarithmic axis from 1e1 to 1e12 nucleotides for mammals, plants, fungi, bacteria (>100), mitochondria (~100) and viruses (1024)]
Size in nucleotides. Number in ( ) = completely sequenced genomes
What Did These Individuals
Contribute to Molecular Genetics?


Anton van Leeuwenhoek
First observed living single cells
• Bacteria
• Protists
• Red blood cells
What Did These Individuals
Contribute to Molecular Genetics?


Gregor Johann Mendel
Discovered the basic laws of heredity
What Did These Individuals
Contribute to Molecular Genetics?


Walter Sutton
Proposed that chromosomes carry the units of heredity
What Did These Individuals
Contribute to Molecular Genetics?


Thomas Hunt Morgan
Discovered how genes
are transmitted through
chromosomes
What Did These Individuals
Contribute to Molecular Genetics?


Rosalind Elsie Franklin
Research led to the
discovery of the double
helix structure of DNA
What Did These Individuals
Contribute to Molecular Genetics?


James Watson and Francis Crick
Discovered the double-helix structure of DNA
DNA's History
Year   Scientists                                          Contribution
1866   Gregor Mendel                                       Laws of heredity
1900   Carl Correns, Hugo de Vries & Erich von Tschermak   Rediscovery of Mendel's laws
1944   Avery, MacLeod & McCarty                            Genes consist of DNA
1952   Hershey and Chase                                   DNA as the genetic material
1953   Watson & Crick                                      Double-helix structure of DNA
1971   Cohen & Boyer                                       Transformation technology
1972   Berg                                                Recombinant DNA technology
1973   Arber, Smith & Nathans                              Restriction enzymes
Chromosome parts
Chromatid
• sister strands after replication
• still joined at centromere
Centromere
• ~ “middle” of chromosomes
• spindle attachment sites
Telomeres
• ends of chromosomes
• important for the stability of chromosome tips
Chromosomal Regions
Heterochromatin
compact;
few genes;
largely structural role
Euchromatin
contains most of the genes.
Chromosome
Gene
• The hereditary determinant of a specified difference between individuals
• The unit of heredity
• The unit that is passed from generation to generation following simple Mendelian inheritance
• A segment of DNA that encodes a protein
• Any of the units occurring at specific points on the chromosomes, by which hereditary characters are transmitted and determined; each is regarded as a particular state of organization of the chromatin in the chromosome, consisting primarily of DNA and protein
Gene classification
[Figure: simplified chromosome divided into genes and intergenic regions]
• Coding genes → messenger RNA → proteins (structural proteins, enzymes)
• Non-coding genes → structural RNA (transfer RNA, ribosomal RNA, other RNA)
Gene
Molecular definition:
DNA sequence encoding protein
What are the problems with this
definition?
Gene
Some genomes are RNA instead of DNA
Some gene products are RNA (tRNA, rRNA,
and others) instead of protein
Some nucleic acid sequences that do not
encode gene products (noncoding regions)
are necessary for production of the gene
product (RNA or protein)
Coding region
Nucleotides (open reading frame) encoding the
amino acid sequence of a protein
The molecular definition of gene includes more
than just the coding region
Noncoding regions

Regulatory regions
• RNA polymerase binding site
• Transcription factor binding sites


Introns
Polyadenylation [poly(A)] sites
Gene
Molecular definition:
Entire nucleic acid sequence necessary for the
synthesis of a functional polypeptide (protein
chain) or functional RNA
Bacterial genes


Most do not have introns
Many are organized in operons: contiguous
genes, transcribed as a single polycistronic
mRNA, that encode proteins with related
functions
Polycistronic mRNA encodes several proteins
Bacterial operon
What would be the effect of a mutation in
the control region (a) compared to a
mutation in a structural gene (b)?
Eukaryotic genes



Most have introns
Produce monocistronic mRNA: only one
encoded protein
Large
Eucaryotic genes
Hemoglobin beta subunit gene
Exon 1     90 bp
Intron A   131 bp
Exon 2     222 bp
Intron B   851 bp
Exon 3     126 bp
Splicing
Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.
Exons: sequences within a gene that encode protein sequences
Splicing: removal of introns from the primary transcript (pre-mRNA).
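To make the definitions concrete, the sketch below splices a toy transcript laid out with the exon and intron lengths given for the hemoglobin beta subunit gene. It is an illustration only: the transcript is a string of placeholder letters rather than real sequence, the exons are treated as entirely coding (untranslated regions are ignored), and all names are hypothetical. With the slide's lengths, 1420 nt of primary transcript reduce to 438 nt of exon sequence, or 146 codons.

```python
# (name, kind, length in bp) in 5'->3' order; lengths from the hemoglobin beta slide.
SEGMENTS = [
    ("Exon 1", "exon", 90),
    ("Intron A", "intron", 131),
    ("Exon 2", "exon", 222),
    ("Intron B", "intron", 851),
    ("Exon 3", "exon", 126),
]

def build_pre_mrna(segments):
    """Toy primary transcript: 'E' marks exon positions, 'i' intron positions."""
    return "".join(("E" if kind == "exon" else "i") * length
                   for _, kind, length in segments)

def exon_coordinates(segments):
    """Return (start, end) of each exon within the primary transcript."""
    coords, pos = [], 0
    for _, kind, length in segments:
        if kind == "exon":
            coords.append((pos, pos + length))
        pos += length
    return coords

def splice(pre_mrna, exon_coords):
    """Remove introns by joining the exon slices in order."""
    return "".join(pre_mrna[start:end] for start, end in exon_coords)

pre_mrna = build_pre_mrna(SEGMENTS)
mature = splice(pre_mrna, exon_coordinates(SEGMENTS))
print(len(pre_mrna), len(mature), len(mature) // 3)
```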
Alternative splicing


Splicing is the removal of introns
Pre-mRNA from some genes can be spliced into two or more different mRNAs
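One simple way to see how a single gene can give several mRNAs is exon skipping, where internal exons are either kept or left out. The sketch below enumerates the possible exon combinations for a hypothetical four-exon gene. It assumes the first and last exons are always retained and that exon order never changes; real alternative splicing has other modes (alternative splice sites, retained introns) that are not modelled here.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """List every isoform obtainable by skipping internal exons.

    Assumes at least two exons; the first and last are always kept
    and the remaining exons stay in their original order.
    """
    first, *internal, last = exons
    isoforms = []
    for n_kept in range(len(internal), -1, -1):
        for kept in combinations(internal, n_kept):
            isoforms.append([first, *kept, last])
    return isoforms

# Hypothetical gene with two internal exons.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

With n internal exons this simple model allows up to 2^n mRNAs from one gene, although only some of these combinations are made in any real case.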