GENES AND GENOMES
This document is licensed under the
Attribution-NonCommercial-ShareAlike 2.5 Italy license,
available at
http://creativecommons.org/licenses/by-nc-sa/2.5/it/
Genetica per Scienze Naturali (Genetics for Natural Sciences)
Academic year 2008-09, Prof. S. Presciuttini
1. What is a gene?


Definition: A gene is a discrete unit of DNA (or RNA in some
viruses) that encodes a nucleic acid or protein product that contributes
to or influences the phenotype of the cell or the organism.
Genes are the functional units of chromosomal DNA. Each gene not
only encodes the structure of some cellular product, but also bears
control elements (short sequences) that determine when, where, and
how much of that product is synthesized. Most genes encode protein
products; special classes of genes encode RNA molecules.

The way genes encode proteins is indirect and involves several steps. The first
step is to copy (transcribe) the information encoded in the DNA of the gene as
a related but single-stranded molecule called messenger RNA. Subsequently
the information in the messenger RNA is translated (decoded) into a string of
amino acids called a polypeptide. The polypeptides, on their own or by
aggregating with other polypeptides and cell constituents, form the functional
proteins of the cell.
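
The two steps described above can be illustrated with a minimal Python sketch. The toy sequence, the function names, and the deliberately partial codon table are assumptions made for illustration only; a real translation would use the full 64-codon genetic code.

```python
# Minimal sketch of the two steps: DNA -> messenger RNA -> polypeptide.
# The codon table is partial (an assumption made for brevity); it covers
# only the codons that occur in the toy sequence below.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon
}


def transcribe(coding_strand):
    """Return the mRNA matching the coding (sense) DNA strand: T becomes U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    polypeptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the chain is released
            break
        polypeptide.append(amino_acid)
    return polypeptide


toy_gene = "ATGGCTTGGAAATAA"  # hypothetical 15-base coding sequence
mrna = transcribe(toy_gene)
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # ['Met', 'Ala', 'Trp', 'Lys']
```

The sketch starts from the coding strand, so transcription reduces to replacing T with U; in the cell the RNA is synthesized by base pairing against the complementary template strand.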
2. Introns and exons


Trying to pinpoint precisely what genes are is complicated by the fact
that many eukaryotic genes contain mysterious segments of DNA,
called introns, interspersed in the transcribed region of the gene.
Introns do not contain information for a functional gene product such as
protein. They are transcribed together with the coding regions
(called exons) but are then excised from the initial transcript.
Since the correct sequence in the introns (as well as in the regulatory
region) is necessary in order to generate a properly sized transcript at
the right time and place, introns (along with coding and regulatory
regions) should be considered part of the overall functional unit, in
other words, part of the gene.
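
Excision of introns from the initial transcript can be sketched as follows. The transcript, the exon coordinates, and the function name are invented for this toy example; in a real gene the exon boundaries come from annotation or from splice-site signals, not from a hand-written list.

```python
# Sketch of intron removal. Coordinates are 0-based and end-exclusive
# (Python slice convention) and are invented for this toy example.
def splice(primary_transcript, exons):
    """Join the exon segments of a transcript, discarding what lies between them."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical transcript: exons in upper case, introns in lower case.
primary_transcript = "AUGGCU" + "guaaguuuucag" + "UGGAAA" + "guacgucag" + "UAA"
exons = [(0, 6), (18, 24), (33, 36)]

mature_mrna = splice(primary_transcript, exons)
print(len(primary_transcript), len(mature_mrna))  # 36 15
print(mature_mrna)                                # AUGGCUUGGAAAUAA
```

Shifting an exon boundary by even one base changes the mature transcript, which is one way to see why the intron sequence itself matters for producing the correct product.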
4. Schematic gene structure
Generalized gene structure in prokaryotes and eukaryotes. The coding region (dark green)
is the region that contains the information for the structure of the gene product
(usually a protein). The adjacent regulatory regions (lime green) contain sequences that
are recognized and bound by proteins that make the gene's RNA and by proteins that
influence the amount of RNA made.
3. The average length of coding regions
Organism                               Average length of gene product (aa)
Vibrio cholerae (bacterium)            304
Saccharomyces cerevisiae (yeast)       477
Drosophila melanogaster (fruit fly)    492
Caenorhabditis elegans (nematode)      436
Arabidopsis thaliana (weed)            435
Homo sapiens                           497
Estimates of the average length of polypeptide chains
coded by genes of various organisms; these values have to
be multiplied by 3 in order to obtain the length of the
corresponding coding DNA. Typical values are 1,000 to
1,500 bp.
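
The conversion from polypeptide length to coding DNA length is a single multiplication, sketched here with the values from the table above. The variable and function names are illustrative, and the stop codon is not counted.

```python
# Average gene product lengths (amino acids), taken from the table above.
AVERAGE_PRODUCT_LENGTH_AA = {
    "Vibrio cholerae": 304,
    "Saccharomyces cerevisiae": 477,
    "Drosophila melanogaster": 492,
    "Caenorhabditis elegans": 436,
    "Arabidopsis thaliana": 435,
    "Homo sapiens": 497,
}


def coding_length_bp(n_amino_acids):
    """Each amino acid is specified by one codon of three bases."""
    return 3 * n_amino_acids


for organism, aa in AVERAGE_PRODUCT_LENGTH_AA.items():
    print(f"{organism}: {aa} aa -> {coding_length_bp(aa)} bp")
```

The results range from 912 bp (Vibrio cholerae) to 1,491 bp (Homo sapiens), in line with the typical values quoted above.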
5. Number of introns-exons per gene
Distribution of the number of exons among genes of three organisms.
Many eukaryotic genes contain mysterious segments of DNA, called introns, interspersed
in the region of the gene. Introns do not contain information for functional gene
product such as protein.
6. Genomes and genes
Genome                                  Group             Size (kb)     Number of genes
Eukaryotic nucleus
  Saccharomyces cerevisiae              Yeast             13,500        6,000
  Caenorhabditis elegans                Nematode          100,000       13,500
  Arabidopsis thaliana                  Plant             120,000       25,000
  Homo sapiens                          Human             3,000,000     100,000
Prokaryote
  Escherichia coli                      Bacterium         4,700         4,000
  Hemophilus influenzae                 Bacterium         1,830         1,703
  Methanococcus jannaschii              Archaeon          1,660         1,738
Viruses
  T4                                    Bacterial virus   172           300
  HCMV (herpes group)                   Human virus       229           200
Eukaryotic organelles
  S. cerevisiae mitochondria            Yeast             78            34
  H. sapiens mitochondria               Human             17            37
  Marchantia polymorpha chloroplast     Liverwort         121           136
The number of genes increases with genome size, but the trend is complicated
due to repetitive DNA and introns.
Counting genes is difficult, even in completely sequenced genomes.
The figure of 100,000 for human is substantially inflated.
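
One way to see how loosely gene number tracks genome size is to compute the average amount of DNA per gene from the figures in the table. The sketch below uses a subset of the rows; the names are illustrative, and the human row uses the table's gene count of 100,000, which the slide itself describes as inflated.

```python
# (genome, size in kb, number of genes), copied from the table above.
GENOMES = [
    ("Saccharomyces cerevisiae", 13_500, 6_000),
    ("Caenorhabditis elegans", 100_000, 13_500),
    ("Arabidopsis thaliana", 120_000, 25_000),
    ("Homo sapiens", 3_000_000, 100_000),
    ("Escherichia coli", 4_700, 4_000),
    ("H. sapiens mitochondria", 17, 37),
]


def kb_per_gene(size_kb, n_genes):
    """Average genome length per gene, in kb."""
    return size_kb / n_genes


for name, size_kb, n_genes in GENOMES:
    print(f"{name:28s} {kb_per_gene(size_kb, n_genes):7.2f} kb per gene")
```

With these inputs E. coli comes out at about 1.2 kb per gene and the human nuclear genome at 30 kb per gene; a lower human gene count would make the contrast larger still.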
7. How many genes in the human genome?




Prior to the human genome sequence, the most commonly cited
estimate for the number of protein-coding genes in the human genome
was 100,000, even though the basis of this figure was somewhat
dubious to begin with.
In 2001, when the draft sequences of the human genome were
announced, the estimates were lowered to somewhere between 30,000
and 35,000.
The completed sequence, published in 2004, provided an even lower
estimate of 20,000 to 25,000 genes.
A more recent estimate, based on comparing all the human genes with
those cataloged for dog and mouse, has lowered the number of
genes to below 20,000.
8. How to count genes



Gene-prediction programs rely heavily on identifying open reading
frames.
However, sequences that have a biological function but don't produce
a protein have been found in large quantity, and several thousand
"genes" that don't code for proteins have been reported.
Thus an open reading frame "is not enough" to identify a gene, and an
integrated catalog of protein-coding genes should be based more on
comparative evidence from different genomes.
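
To make the notion of an open reading frame concrete, here is a minimal Python sketch that scans one strand in all three frames for an ATG followed by an in-frame stop codon. It is a simplification under stated assumptions: the function name, the `min_codons` threshold, and the test sequence are hypothetical, only the given strand is scanned, and real gene-prediction programs combine ORFs with much more evidence (splice signals, codon usage, comparative data).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ORFs on the given strand.

    An ORF here is an ATG followed by an in-frame stop codon. `end` is
    exclusive and includes the stop codon. `min_codons` counts the codons
    before the stop and is an arbitrary illustrative threshold.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


sequence = "CCATGGCTTGGAAATAAGG"  # hypothetical test sequence
print(find_orfs(sequence))        # [(2, 17)]
```

A scan of this kind finds only protein-coding candidates, so RNA genes are invisible to it, and on a real genome it also reports many chance ORFs. Both limitations are reasons why an ORF alone "is not enough".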
9. Average gene length
Intron/exon statistics for various organisms
10. Plasmid genomes
Bacterial cells isolated from nature often contain small DNA elements that are not
essential for the basic operation of the bacterial cell. These elements are called
plasmids. Plasmids are symbiotic molecules that cannot survive at all outside of cells.
Even though plasmids are not part of the basic operational system of their host cells,
some are quite complex, carrying many genes, so it is quite appropriate to refer to their
distinctive DNA as a "plasmid genome." Bacterial plasmids often contain genes that are
extremely useful to the bacterial host, for example, by promoting bacterial cell fusion,
conferring antibiotic resistance, or producing toxins.
Plasmids also are occasionally found in fungal and plant cells. Most are found inside
mitochondria and chloroplasts, but some are found in nuclei or in the cytosol. Unlike
the bacterial plasmids mentioned above, these eukaryotic plasmids seem to provide no
benefits for their hosts; they seem to exist selfishly, only for the purpose of their own
propagation.
For their replication and maintenance, plasmids depend on the general cellular
machinery encoded by the host genome. Bacterial plasmids are most often circular, but
there are linear types too. In fungi and plants, linear plasmids are most common, but
circular types are known in fungi.
11. Organellar genomes




Mitochondrial and chloroplast chromosomes consist of double-stranded DNA
molecules. Individual mitochondria and chloroplasts contain multiple identical
copies of their chromosomes, and each eukaryotic cell contains several to many of
these organelles.
The organelle chromosomes contain genes specific to the functions of the organelle
concerned. Nevertheless, most of the biological functions that occur inside these
organelles are specified by genes in the nuclear genome. There is no overlap with
the nuclear genome in gene content.
Mitochondria and chloroplasts probably were originally prokaryotic cells that
entered and took up a symbiotic relationship inside another cell. Throughout
evolution most of the original prokaryotic genes were transferred to the nuclear
genome or lost.
Mitochondrial genomes can be eliminated in some organisms such as yeasts, but
most organisms cannot survive without them, so there is still mutual
interdependence between nuclear and organelle subdivisions of the genome.
Chloroplasts can be eliminated only in photosynthetic organisms that can survive by
taking in preformed nutrients from the environment (that is, that can act as
heterotrophs).
12. Most eukaryotic DNA does not include genes




Between genes there is DNA, mostly of unknown function. The size
and nature of this DNA vary with the genome.
In bacteria and fungi there is little, but in mammals the intergenic
regions can be huge.
Sequences of DNA that exist quite distant from a given gene can
affect the regulation of that gene. They could thus be considered
part of the functional gene unit, even though separated by long
segments of DNA having nothing to do with the gene in question.
In many eukaryotes some of the DNA between genes is repetitive,
consisting of several different types of units repeated throughout the
genome. Some of the repetitive DNA is dispersed; some is found in
contiguous "tandem" arrays. Repetitive DNA is also found in some
introns. The extent of this DNA is different in different species, and
indeed there is variation of repeat number within species.
13. Comparing gene densities
Schematic diagram of gene topography in four organisms.
Light green = introns; dark green = exons; white = intergenic regions
14. A small fraction of total eukaryotic DNA is coding
In mammals, only a few percent of the DNA is actually coding.
15. Different components of the human genome


Although most prokaryotic chromosomes consist almost entirely of protein-coding genes,
such elements make up a small fraction of most eukaryotic genomes.
As a prime example, the human genome might contain as few as 20,000 genes, comprising
less than 1.5% of the total genome sequence.



16. Junk DNA?
Introns account for more than a quarter of the human genome.
Pseudogenes are non-functional copies of coding genes. They include
'classical pseudogenes' (direct DNA to DNA duplicates), 'processed
pseudogenes' (copies that are reverse transcribed back into the genome
from RNA and therefore lack introns) and 'Numts' (nuclear
pseudogenes of mitochondrial origin). The human genome is estimated
to contain about 19,000 pseudogenes.
Transposable elements are divided into Class I elements, which
transpose through an RNA intermediate (long interspersed nuclear
elements, LINEs; endogenous retroviruses; short interspersed nuclear
elements, SINEs; and long terminal repeat, LTR, retrotransposons),
and Class II elements, which transpose directly from DNA to DNA
(DNA transposons and miniature inverted repeat transposable elements,
MITEs).
17. Coding sequences are needles in the haystack


It is apparent that the coding sequences are only a small part of the
genome in most eukaryotes, particularly in humans. Finding these
regions is like finding a needle in a haystack.
In addition, the genes are not uniformly distributed. There are regions
in the genome where the genes are packed together, and regions
where they are sparse, where finding genes is like finding water in a
desert.
18. Categorizing the genes in eukaryotic genomes

Classification schemes based on gene function suggest that all eukaryotes possess
the same basic set of genes, but that more complex species have a greater number of
genes in each category. For example, humans have the greatest number of genes in
all but one of the categories used in the figure, the exception being 'metabolism',
where Arabidopsis comes out on top as a result of its photosynthetic capability,
which requires a large set of genes not present in the other four genomes included in
this comparison.
This functional classification reveals other interesting features, notably that C. elegans
has a relatively high number of genes whose functions are involved in cell-cell signaling,
which is surprising given that this organism has just 959 cells. Humans, who have 10^13
cells, have only 250 more genes for cell-cell signaling.
19. Overview of the human genome












Genome size is approximately 3,200 Mb.
Gene number is approximately 20,000.
Average gene density is 1 per 100 kb (5% of DNA encodes proteins); some areas
are gene rich, others are gene deserts (0 to 64 genes per 100 kb).
Average gene size (including introns) is 27 kb; gene regions account for about 25%
of the genome.
Average coding sequence size is 1.3 kb (the DNA that encodes the average polypeptide).
Fraction of genome with coding functions is about 1.5%.
At least 50% of the genome is made of transposable elements (e.g. LINEs and Alus).
Intron number ranges from 0 (in histones) to 234 (titin, a muscle protein).
Hundreds of genes appear to have been transferred directly from bacteria to
vertebrate genomes. Mechanism unknown.
Functions have been assigned to 60% of genes.
Largest human gene is dystrophin (mutated in muscular dystrophy): 2.5 Mb (larger
than some bacterial genomes).
1077 blocks of duplicated regions in the human genome (contain 10,000 genes):
suggests genome rearrangements are common in evolution.
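
Several of the figures in this overview can be cross-checked against one another with simple arithmetic. The sketch below uses the rounded values from the list; the variable names are illustrative, and reading the "1.3 kb" figure as the coding sequence per gene is an assumption.

```python
# Rounded figures from the overview above.
GENOME_SIZE_BP = 3_200_000_000   # about 3,200 Mb
GENE_NUMBER = 20_000
AVERAGE_GENE_SIZE_BP = 27_000    # including introns
AVERAGE_CODING_SIZE_BP = 1_300   # the 1.3 kb figure, read as coding DNA per gene (assumption)

coding_fraction = GENE_NUMBER * AVERAGE_CODING_SIZE_BP / GENOME_SIZE_BP
genic_fraction = GENE_NUMBER * AVERAGE_GENE_SIZE_BP / GENOME_SIZE_BP
kb_per_gene = GENOME_SIZE_BP / GENE_NUMBER / 1_000

print(f"coding fraction: {coding_fraction:.2%}")  # 0.81%
print(f"genic fraction:  {genic_fraction:.2%}")   # 16.88%
print(f"spacing:         {kb_per_gene:.0f} kb per gene")  # 160 kb per gene
```

With 20,000 genes the coding fraction comes out just under 1%, the same order of magnitude as the quoted 1.5%. The genic fraction and the spacing come out lower in density than the quoted 25% and 1 gene per 100 kb; those two figures would fit a higher gene count, which is an inference from the arithmetic and not something the list states.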