Slide 1
Link http://www.blackwellpublishing.com/ridley/images/h_erato.jpg
http://www.nature.com/hdy/journal/v94/n1/thumbs/6800561f4th.gif
Slide 2
Aim of the course
The concept of diversity is easy to understand. We all know that all biological forms
present variation at many different levels of organization: there are different types of
cells within one organism, individuals of the same species present different
appearances (or phenotypes), and the diversity of a community or ecosystem is given
by the number of species inhabiting it.
In the present work we will focus on the study of the principles and processes that
mold variation at the species level. All observed external characters, or the external
appearance of organisms, are known as the “phenotype”. Phenotypes are, in turn,
governed by the interaction of the genetic material with the environment, to a greater
or lesser extent. The genetic material, or genetic composition, is known as the “genotype”.
In the first part of this course, we will present the organization of the genetic
information.
Then, we will present methodological approaches to study variation at the DNA level,
and, finally,
the application of molecular markers in the study and inference of ecological,
demographic and evolutionary processes.
Slide 3
The idea of transmission of characters or traits from one generation to the next has
always been intuitively present in our minds, long before the principles and material
of inheritance were disclosed. This general and vague understanding allowed the first
settlers to domesticate plants and animals from Neolithic times.
Links
Sheep: http://www.fhw.gr/chronos/01/en/gallery/intro/animals/animals1.html
Maize: http://www.athenapub.com/nwdom1.htm
http://www.wsu.edu/gened/learn-modules/top_agrev/6Domestication/domestication4.html
Slide 4
However, it wasn’t until the middle of the 19th century that the rules of inheritance were
discovered by Gregor Mendel, an Austrian monk and teacher at a local high school
who studied botany and experimented with pea crosses. He selected 22 varieties of
peas, experimented with crosses for 8 years and published his seminal work in 1866.
However, his work remained “unattended” by the scientific community for the next
35 years, as major attention was drawn to a book published a few years before: The
Origin of Species, by Charles Darwin.
Slide 5
Mendel experimented by crossing pea plants that differed in characters such as: tall or
short plants, purple or white flowers, smooth (round) or wrinkled peas, and green or
yellow peas. After crossing purple-flowered plants with white-flowered plants, he
cross-pollinated the offspring (F1). He then discovered that three-quarters of the next
generation (F2) were purple-flowered when they bloomed, and one-quarter white. There
were, then, characters that were “masked” in the first generation and passed on to the
2nd generation, which he called “recessive”, whereas he called “dominant” the characters
that “overshadowed” them. Mendel’s laws of inheritance can be summarized as follows:
Law of independent assortment: characters (factors) are inherited independently of
each other. This means that the character “tall” or “short” is independent of “purple”
or “white” flowers.
Law of segregation: characters occur in alternative forms (today we call
them alleles). They occur in pairs within individuals, one inherited from each
parent. These pairs separate (or segregate) during the production of gametes in the
parents, and recombine later on in reproduction. Each parent contributes one allele to
the offspring.
Law of dominance: for each character, one factor is dominant and the other recessive,
and in the F2 the two forms appear in a ratio of approximately 3:1. Combinations of
alleles that include the dominant form will show only that one.
Segregation and the resulting crosses can be represented in a diagram called a
“Punnett square”. Here, we represent all possible
combinations of alleles in the gametes of the parents, and all possible results of the
combination between their gametes. The proportion of offspring with a given
genotype can then be predicted in terms of probabilities. Individuals carrying two
different variants of a character (2 different alleles) are called heterozygotes, whereas
individuals that carry the same variant twice are called homozygotes. This is
a fundamental concept that we will return to later on. As a general rule, we denote
dominant alleles with capital letters.
Now we will see how the material of inheritance is organized, and how alleles are
segregated in the germline.
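The Punnett square described above can be sketched in a few lines of Python. This is only an illustration: the function names punnett and phenotypes, and the allele letters A (dominant) and a (recessive), are assumptions made for the example and do not come from the slides.

```python
from collections import Counter
from itertools import product


def punnett(parent1, parent2):
    """Count offspring genotypes for a single-character cross, e.g. punnett("Aa", "Aa")."""
    offspring = Counter()
    # Each gamete carries one allele; combine every gamete of one parent
    # with every gamete of the other (the cells of the Punnett square).
    for allele1, allele2 in product(parent1, parent2):
        # Sorting makes "aA" and "Aa" the same genotype (capital letter first).
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotypes(genotypes):
    """Collapse genotype counts into dominant and recessive phenotype counts."""
    result = Counter()
    for genotype, count in genotypes.items():
        # One dominant (capital) allele is enough to show the dominant form.
        shown = "dominant" if any(a.isupper() for a in genotype) else "recessive"
        result[shown] += count
    return result


f2 = punnett("Aa", "Aa")  # crossing two heterozygous F1 plants
print(dict(f2))
print(dict(phenotypes(f2)))
```

Crossing two heterozygotes gives the genotypes AA, Aa and aa in the proportions 1:2:1, which collapse into the 3:1 ratio of dominant to recessive phenotypes that Mendel observed in the F2.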
Links: http://history.nih.gov/exhibits/nirenberg/images/photos/01_mendel_pu.jpg
www.godrules.net/evolutioncruncher/a10a.htm
http://www.laskerfoundation.org/rprimers/gnn/timeline/1866.html
http://www.emunix.emich.edu/~rwinning/genetics/mendel4.htm
Slide 6
Nowadays we know that the inheritance material is carried in DNA (deoxyribonucleic
acid), a molecule that is organized in discrete units called chromosomes. Chromosomes
occur in pairs, and each member of the pair is inherited from one parent. The process of
meiosis is fundamental to understanding how characters are segregated. Every cell of the
organism has 2 copies of each chromosome. However, to pass on the information to the
next generation, the information has to be “halved”, as the other half has to be
provided by the other parent. This process of reduction of the genetic information
during the formation of the gametes is called meiosis. In this process, one diploid cell
gives rise to 4 haploid cells.
Prior to the meiotic division, during a period called “interphase”, the chromosomes
are composed of a centromere and 1 chromatid, and they duplicate their genetic
information, resulting in two chromatids attached by the centromere. Thus, at the end of
interphase, the cell contains 2N chromosomes with duplicated genetic information.
In the first stage of division, homologous chromosomes pair with each other and
exchange genetic material. This process is called “crossing over”:
in other words, chromatids of homologous chromosomes shuffle fragments to arrive
at a new combination of maternal and paternal genetic information. This is the major
advantage of “sex” in evolution: it provides a mechanism for generating
genetic variation. At the end of the 1st cell division, then, two cells are obtained; each
contains 1 chromosome from each homologous pair, and each chromosome contains
its information in a duplicated fashion. This first division reduces the number of
chromosomes in the cells.
The 2nd cell division separates the sister chromatids of the chromosomes and results
in 4 haploid cells, carrying only one chromosome of each pair. Ploidy is the number
of chromosome sets present in the nucleus of the cells (see also Slide 32 of this
presentation). In this case, the resulting cells are haploid, whereas the meiosis process
starts with a diploid cell.
If we consider the change in DNA content along the cell division, and we call 2c
the amount of DNA present in interphase, we see a change to 4c after duplication of the
chromatids. At the end of meiosis 1, each of the 2 resulting cells contains 2c DNA again.
However, unlike at the starting point, each cell now carries only one member of each
chromosome pair.
The 2nd division halves the amount of genetic information present in these cells,
and therefore results in 4 cells with 1c content of genetic information.
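The bookkeeping of DNA content described above (2c, 4c, 2c, 1c) can be followed step by step with a small sketch. The function name meiosis_dna_content and the stage labels are assumptions chosen for this illustration.

```python
def meiosis_dna_content(c_start=2):
    """Return (stage, DNA content per cell in units of c, number of cells)."""
    stages = []
    content, cells = c_start, 1
    stages.append(("interphase, before duplication", content, cells))

    # Every chromosome duplicates its chromatid: the DNA content doubles.
    content *= 2
    stages.append(("after duplication", content, cells))

    # Meiosis 1 separates homologous chromosomes into 2 cells.
    content //= 2
    cells *= 2
    stages.append(("end of meiosis 1", content, cells))

    # Meiosis 2 separates sister chromatids into 4 cells.
    content //= 2
    cells *= 2
    stages.append(("end of meiosis 2", content, cells))
    return stages


for stage, content, cells in meiosis_dna_content():
    print(f"{stage}: {content}c per cell, {cells} cell(s)")
```

Note that the cells at the end of meiosis 1 and the starting cell both hold 2c, but for different reasons: the starting cell has two unduplicated homologous chromosomes, whereas the cells after meiosis 1 have one duplicated chromosome of each pair.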
Now we can see the link between this process and Mendel’s laws. The sorting of
homologous chromosomes into different resulting cells (or gametes) explains the law
of segregation. We can also see that independent assortment applies to
characters that are coded for on different chromosomes. That is why pea plants with
purple or white flowers can have peas that are either smooth or wrinkled.
Link: http://genetics.gsk.com/graphics/meiosis-big.gif
Not used but important: http://www.cellsalive.com/meiosis.htm
Slide 7
The growth of all organisms also implies a process of cell division, with transfer of
the genetic information to the 2 resulting “daughter cells”. This process of duplication
of the genetic material followed by cell division is called the “cell cycle” and has
features in common with the process of meiosis. However, meiosis is a “dead end”: the
resulting cells do not enter another cycle of division.
The process by which the chromosomes are copied exactly and passed on to the
resulting cells during cell division is called mitosis.
This process occurs in all but the germline cells of our bodies. As in meiosis,
the DNA of each chromosome is duplicated. Here, the sister chromatids are sorted
into the resulting cells, but DO NOT EXCHANGE GENETIC MATERIAL,
and the daughter cells contain exactly the same genetic information as the original cell
that divided in 2. These daughter cells can now start a new cycle of division.
Now that we understand the principles of transmission of inheritance, or “characters”,
we will see how the genetic information is organized: the concept of genome and
the different types of “genomes” carried by eukaryotic organisms.
Link http://genetics.gsk.com/chromosomes.htm
Slide 8
Organization of the genetic information
The complete genetic information possessed by an organism is called the GENOME.
Eukaryotic organisms not only contain genetic information in the nucleus of their
cells, where chromosomes are located, but also carry extranuclear DNA. This is
contained in organelles such as mitochondria and chloroplasts.
Link: http://www.lclark.edu/~seavey/plant%20cell72-1.jpg
For genome definition
http://www.stateofthesalmon.org/resource/glossary.asp?let=g
Slide 9
Nuclear genetic information
Nuclear DNA is organized in discrete units called chromosomes, which are visible
during the cell cycle in metaphase, when the chromatin contracts and becomes condensed
and packed in a scaffold of accompanying proteins, the histones.
The genetic information is also compartmentalized in these different units. For
example, in mammals and birds, a distinct pair of homologous chromosomes carries
the genetic information for sex determination. These are called sex
chromosomes. The rest of the chromosomes are called autosomes (here also referred to
as “somatic” chromosomes). In mammals, the X chromosome carries the information to
determine a female, whereas the Y chromosome determines that the individual that
carries it is a male. Mammalian females
carry 2 X chromosomes, and males one copy of a Y and one copy of an X
chromosome. Mammalian males are therefore called the “heterogametic sex”, because the 2
sex chromosomes are different, whereas females are the “homogametic sex”. In
contrast to mammals, the heterogametic sex in birds is the female.
In this slide, we see the chromosomes of the human species in mitotic metaphase,
when the chromosomes are composed of two sister chromatids, in other words, with
duplicated genetic information. This “set of photographed, banded chromosomes
arranged in order from largest to smallest” is called a karyotype. The human
nuclear genome, then, is composed of 22 pairs of somatic chromosomes (autosomes) plus
a pair of sex chromosomes: XY in the case of males, and XX in the case of females.
The internal structure of these metaphase chromosomes, as seen under the microscope,
is a condensed, supercoiled fiber composed of a molecule of DNA and several
accompanying proteins. The DNA molecule is wound around groups of 8 molecules of basic
proteins, the histones H2A, H2B, H3 and H4 (two of each), with histone H1 linking
neighbouring groups. This organization is called chromatin, and
shows different levels of condensation along the cell cycle. The highest level of
condensation is achieved right before cell division.
Next, we are going to look at the molecular structure of DNA.
http://www.vcbio.science.ru.nl/images/cellcycle/mchromatinpackaging_zoom.gif
karyotype : http://www.kumc.edu/gec/glossnew.html#K
Other important links
http://www.ornl.gov/sci/techresources/Human_Genome/glossary/
Slide 10
Molecular structure of DNA
Link: http://www.ashingtonhigh.northumberland.sch.uk/science/biology/DNA4.gif
DNA is a polymeric molecule composed of a string of its component units, the
nucleotides. DNA is also known as the “double helix”, as two opposing strings of
nucleotides twist in a clockwise (right-handed) manner. Each turn of the helix
contains 10 pairs of nucleotides. Each block, unit, or nucleotide is composed of a
nitrogenous base, deoxyribose (a sugar with 5 carbons, in orange) and a phosphate (in
blue), covalently attached. The carbon atoms of the sugar are numbered from 1’ to 5’,
and the orientation of the DNA strand is given by these sugar carbons. We see in the
picture that the 2 strands are organized in pairs of nucleotides and that they run in
opposite “directions” from the 5’ to the 3’ end. This orientation is important for the
processing of the DNA information that we will see later on.
Nitrogenous bases are cyclic compounds that have carbon and nitrogen atoms in their
rings. Nitrogenous bases with 2 rings are called purines, and bases with only 1
ring are pyrimidines. Adenine and Guanine are purines, and Cytosine and Thymine
are pyrimidines; they are denoted with the capital letters A, G, C and T.
In the double helix, the two chains or strands run in opposite directions and have
nucleotides that match in an ordered fashion: A with T, and C with G. The
bonds between these paired nucleotides are hydrogen bonds, which are disrupted by
high temperature.
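The pairing rules and the opposite orientation of the two strands can be written down as a short sketch. The names PAIRS, PURINES, complement_strand and base_class are assumptions made for this example.

```python
# Pairing rules of the double helix: A with T, and C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}  # bases with 2 rings; C and T have only 1 ring


def complement_strand(strand_5_to_3):
    """Return the opposite strand, also written from its 5' end to its 3' end."""
    # The strands run in opposite directions, so the partner strand
    # is read backwards relative to the given one.
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3))


def base_class(base):
    """Classify a base as a purine or a pyrimidine."""
    return "purine" if base in PURINES else "pyrimidine"


print(complement_strand("ATGC"))
print({base: base_class(base) for base in "AGCT"})
```

A consequence of these rules is that every pair joins one purine with one pyrimidine, which keeps the width of the helix constant along the molecule.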
The order of the nitrogenous bases in the DNA molecule carries the genetic
information for the make-up of all organisms. The organization and types of DNA
sequences that are the main components of the genome follow next.
Links:
http://images.webster-dictionary.org/thumb/c/cd/DNAbasePairing.png
http://www.ashingtonhigh.northumberland.sch.uk/science/biology/DNA4.gif
Slide 11
Watson and Crick
James Watson and Francis Crick were working at Cambridge University and were 24
and 36 years old respectively when they discovered the structure of DNA in 1953.
They were working alongside a team from King's College in London, Maurice
Wilkins and Rosalind Franklin, who obtained crystallographic images of DNA.
Watson and Crick imagined building blocks along a twisted ladder from the
crystallographic images obtained by Rosalind Franklin. Watson and Crick won the Nobel
Prize in 1962, shared with Wilkins. Rosalind Franklin died
of ovarian cancer in 1958. Nobel Prizes are not awarded posthumously.
www.hallucinogens.com/lsd/watson-crick.jpg
http://language.chinadaily.com.cn/biography.shtml?id=3034
http://osulibrary.orst.edu/specialcollections/coll/pauling/dna/pictures/portraitwilkins.jpg
http://nobelprize.org/medicine/laureates/1962/wilkins-bio.html
Slide 12
Nuclear DNA: coding and non-coding sequences
Nuclear DNA contains coding and non-coding stretches. The coding stretches are
composed of GENES, the basic units of hereditary material that carry the information to
give rise to a product. There are other stretches of DNA along the genome that do
not code for known products, or are just intervening sequences between coding
stretches. Generically speaking, these sequences were originally called “junk DNA”.
The designation of this DNA as “junk” is somewhat unfair, as some of these
sequences have a regulatory function, picking up signals to switch certain
genes on and off. Despite their debated functionality, they can provide invaluable
information about evolutionary processes.
Link = gene definition
http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/DefG/gene.html
Slide 13
Genes: the coding DNA
The products of genes are RNA, another nucleic acid that, in contrast to DNA, is
composed of a single strand, and in which Thymine is replaced by another pyrimidine:
Uracil.
There are three types of RNA: messenger RNA, transfer RNA and ribosomal RNA.
Messenger RNA is an intermediate between the DNA contained in the nucleus and
the machinery to construct proteins, situated in the cytoplasm of the cells. Therefore,
the function of this mRNA is to be a “messenger”, carrying the information to
construct proteins to where the “factory” is, in the cytoplasm. Messenger RNA is copied
from DNA in a process called transcription. Once in the cytoplasm, the ribosomes, built
from ribosomal RNA, “read” the genetic information and construct proteins with the help
of transfer RNA. The function of transfer RNA is to bring the blocks or units
that compose proteins and place them in position. The final product of this process is a
protein, a polymeric molecule composed of a chain of units called AMINO ACIDS. There are
20 amino acids, and each one is recognized by a specific transfer RNA.
The sizes of the different types of RNA products vary: whereas a protein-coding gene
can generate mRNA of several hundreds or thousands of nitrogenous bases (we will
call them just “bases” from now on), tRNAs are only 70-90 bases long.
This chain of events constitutes one of the fundamental dogmas of molecular biology:
genetic information is copied from DNA to RNA in the nucleus, and this information
is translated into proteins in the cytoplasm.
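The flow of information from DNA to RNA to protein can be illustrated with a minimal sketch. It relies on details not covered in this slide, so they are stated here as assumptions: the mRNA is read in groups of three bases (codons), the table used is the standard genetic code, amino acids are written in one-letter code, and the input is the DNA strand that has the same sequence as the mRNA. The names transcribe and translate are chosen for the example.

```python
BASES = "TCAG"
# Standard genetic code in the conventional TCAG order; "*" marks a stop signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    (b1 + b2 + b3).replace("T", "U"): AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Copy DNA into mRNA: same sequence, with Uracil in place of Thymine."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop signal is found."""
    protein = []
    for position in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna)
print(translate(mrna))
```

The table contains 64 codons but only 20 different amino acids, in agreement with the number given above; several codons therefore stand for the same amino acid.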
Links
http://fajerpc.magnet.fsu.edu/Education/2010/Lectures/26_DNA_Transcription_files/image006.jpg
http://genomebiology.com/content/figures/gb-2003-4-12-237-3-l.jpg
Slide 14
Organization of tRNA and rRNA genes
In the human genome, there are approximately 500 genes coding for cytoplasmic
tRNA, which are located on all chromosomes except Y and 22.
The ribosomes are composed of RNA: the large subunit is formed by the products of the
28S, 5.8S and 5S coding regions, whereas the small subunit is coded by the 18S gene.
The organization of the ribosomal genes consists of two types of clusters of repeats of
100s to 1000s of units, composed of alternating stretches of transcribed and
non-transcribed DNA. One cluster codes for the 5S RNA genes. The second cluster codes for
the other 3 ribosomal genes: 18S, 5.8S and 28S. These appear separated by transcribed
stretches, ITS1 and ITS2, that are excised post-transcriptionally.
The coding regions for ribosomal RNA are highly conserved across species along the
evolutionary range, which is why they are frequently applied in the resolution of deep
phylogenies. The ITS (Internal Transcribed Spacers) are less constrained than the
surrounding coding regions: they are prone to accumulate mutations and change more
rapidly than the ribosomal coding DNA. For this reason, they are frequently used to
resolve shallow phylogenies, above and below the species level.
The chromosomal location of the genes coding for ribosomal RNA is called the NOR, or
nucleolus organizing region. In cytological preparations with silver, these regions of
the chromosome are intensely stained due to their high transcriptional activity.
Cytogenetic evolutionary studies in the ‘60s and ‘70s used these regions to determine
homology between chromosomes of closely related species, as their position in the
chromosomes tends to be conserved across species.
http://en.wikipedia.org/wiki/Transfer_RNA
http://www.recipeland.com/facts/Transfer_RNA
http://en.wikipedia.org/wiki/Ribosomal_DNA
image http://wheat.pw.usda.gov/ggpages/bgn/19/a19-10.html
Slide 15
Genes: Organization of single copy DNA
The regions coding for these three types of RNA also differ in internal
organization and copy number in eukaryotes.
Single copy DNA codes for proteins. Upstream of the DNA sequence that will be
transcribed into messenger RNA, there is a group of regulatory sequences for
transcription that remain untranscribed. The internal structure of the transcribed region
is that of blocks of coding DNA with interspersed blocks of non-coding DNA, which are
called exons and introns respectively. These are transcribed as a single unit into
mRNA and processed within the nucleus prior to “export” to the cytoplasm.
The introns are excised; other post-transcriptional processes include, for instance, the
addition of a G at the 5’ end, and of a poly-adenine tail that will be used as a signal
for transport to the cytoplasm.
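The processing steps just described (excision of introns, addition of a G at the 5’ end and of a poly-adenine tail) can be sketched as simple string operations. This is a strong simplification: the function name process_transcript, the coordinate convention, the tail length and the short sequences are all assumptions made up for the illustration.

```python
def process_transcript(pre_mrna, exons, tail_length=20):
    """Join the exons of a primary transcript, then add a 5' G and a poly-A tail.

    exons: list of (start, end) positions, counted from 0, end not included.
    """
    # Excise the introns by keeping only the exon stretches, in order.
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # The single "G" stands in for the G added at the 5' end.
    return "G" + spliced + "A" * tail_length


# Hypothetical primary transcript: exon, intron, exon.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCAG", "UGGUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = process_transcript(pre_mrna, exons, tail_length=5)
print(mature)
```

The mature molecule no longer contains the intron, so it is shorter than the stretch of DNA from which it was transcribed, apart from the added ends.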
http://nitro.biosci.arizona.edu/courses/EEB600A-2003/lectures/lecture24/figs/pol1.jpg
Slide 16
Proteins and Gene families
Genes coding for proteins are usually called “structural genes”. The human genome
contains approximately 25 000 genes, a very small number in comparison to the size
of the genome: 3 000 million base pairs. Other genes code for other products with a
regulatory function that we will not discuss in the present course.
During the course of evolution, genes are sometimes duplicated and tend to appear
in tandem. The processes involved in the origin of these gene families range from
unequal crossing over that leads to gene duplication to structural rearrangements of
chromosome segments. The results are clusters of related genes that keep the same or a
similar function, or whose functions may even diverge.
http://www.informatics.jax.org/silver/images/figure5-5.gif
Slide 17
Gene families: concepts
As implied in the previous slide, a gene family is a set of genes related by homology.
Homology is a relationship of identity by descent: in other words, two genes share a
common ancestor. This concept applies to genes in different species (case b in the
slide) or even to gene duplications in the same genome (case a in the slide). An example
of the latter case is given by the globin genes on human chromosome 16.
A special case of homologous genes is orthologs: through the speciation process, genes
accumulate differences but retain the same function. This concept is crucial to predict
gene function in newly sequenced genomes.
Paralogs are genes that were duplicated in the same genome, but didn’t retain the
same ancestral function along the course of evolution.
Examples of these cases: an example of orthology is the ribosomal 18S gene, commonly
used to resolve deep phylogenies due to its slow evolutionary (mutation) rate. An example
of paralogy in evolution is the duplication and divergence in function of the prolactin
and somatotropin genes during vertebrate evolution.
See example in Slide 9, Class III
Slide 18
Non-coding DNA: Satellite DNA
Satellite DNA is also known as “highly repetitive DNA” or “junk DNA”. It is
composed of tandem repeats of the same sequence motif, a stretch of DNA of a few
hundred to thousands of base pairs. These repeats are usually located close to the
centromeres or in the telomeric regions of the chromosomes. The repeat motif is usually
conserved, and DNA sequence similarity in the repeat unit is generally high
among closely related species. In the ‘80s the study of satellite DNA was very popular
in evolutionary studies above the species level. The function of these repeated
stretches is unknown, but functions such as providing binding sites for proteins have
been proposed (http://en.wikipedia.org/wiki/Satellite_DNA). Initially, some function
related to aging was attributed to the telomeric repetitive DNA, which is related to its
role in protecting the vulnerable ends of the chromosomes.
In the photo, we see metaphase chromosomes of cattle that were probed with
repetitive DNA using a technique called FISH (fluorescent in situ hybridization). The
two colours represent different types of highly repetitive DNA that were
detected with specific probes.
The most widespread molecular technique applied to the study of satellite
DNA in comparative studies was the fragmentation of the DNA with restriction
enzymes, followed by separation of the fragments in agarose gels, Southern blotting
and probing with labeled known fragments of DNA.
Link: http://www.chrombios.com/PictureGal/Pages/PagesRep/Gal_Rep6.html
References
Prashad N, Cutler RG (1976) Percent satellite DNA as a function of tissue and age of
mice. Biochim Biophys Acta 418(1):1-23.
Neidle S, Parkinson GN (2003) The structure of telomeric DNA. Current
Opinion in Structural Biology 13(3):275-283.
Slide 19
Minisatellites and the origin of DNA fingerprinting
Minisatellites, another type of tandem arrangement of repeats, were discovered in 1980.
The unit of repeat, in this case, is smaller than that of satellite DNA, ranging from
a few tens to about a hundred base pairs. Minisatellites are not located in
telomeric or centromeric regions, but interspersed among genes. In humans, they are
mostly found in subtelomeric regions and the most common unit of repeat is
TTAGGG.
In some cases minisatellites have been associated with regulatory functions of gene
expression or with the origin of certain diseases of genetic origin, like fragile X in
humans (http://en.wikipedia.org/wiki/Minisatellite).
Minisatellites are characterized by high levels of polymorphism: in other words, a
very high number of alleles is usually found within species, and each allele is
defined by its NUMBER OF REPEAT UNITS. These loci are also called VNTR, or
Variable Number of Tandem Repeats.
The origin of the elevated number of alleles can be attributed to errors during DNA
duplication processes and unequal crossing over during meiosis.
The high level of variation of minisatellite loci made them very attractive in forensic
cases and in the identification of individuals for pedigree reconstruction during the
‘80s. The use of several loci in the identification of individuals proved extremely
accurate, as the probability of finding a second identical profile “by chance” could be
as low as 1 in 20 billion, depending on the number of loci used, the number of alleles per
locus and their relative frequencies in the population. A new term was then coined by
Alec Jeffreys for these techniques: DNA fingerprinting. He was the first scientist to
apply these loci in a forensic case, a paternity dispute among foreign immigrants in
England. Later on, however, other loci with even higher allelic variation replaced the
minisatellites in individual identification, ecological studies and the study of
microevolutionary processes: the microsatellites.
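To see how the number of loci and the allele frequencies drive the probability of a chance match, here is a minimal sketch. It rests on two assumptions that are not stated in this slide: genotype frequencies follow Hardy-Weinberg proportions (p*p for a homozygote, 2*p*q for a heterozygote), and the loci are independent, so their frequencies can be multiplied. The function names and the allele frequencies are hypothetical, chosen only for the example.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected frequency of a genotype from its allele frequencies.

    One frequency means a homozygote (p*p); two mean a heterozygote (2*p*q).
    """
    return p * p if q is None else 2 * p * q


def match_probability(profile):
    """Multiply the genotype frequencies over all loci of a profile."""
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical profile at 4 loci: three heterozygous loci and one homozygous locus.
profile = [(0.1, 0.05), (0.02,), (0.1, 0.1), (0.05, 0.2)]
probability = match_probability(profile)
print(f"chance of a second identical profile: 1 in {1 / probability:,.0f}")
```

Each additional locus multiplies the probability by a number well below 1, which is why a handful of highly variable loci is enough to reach the very small values mentioned above.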
The techniques to study variation at the minisatellite loci were very similar to those
applied to the study of satellites: DNA was fragmented with restriction enzymes,
electrophoresed, transferred to a membrane (process known as Southern Blot) and
probed with a labeled fragment of known DNA.
See examples in Slide 6, Class II
Slide 20
Non-coding DNA: microsatellites
Microsatellites replaced minisatellites from the ‘90s onwards. Similarly to the
previously described type of loci, microsatellites also consist of a motif repeated in
tandem, and the number of repeats of the motif also gives the allelic variation. The
repeat motifs vary between 2 and 6 base pairs, so they are also called STRs, Short
Tandem Repeats. The allelic variation is even higher than in minisatellites, implying a
higher mutation rate. The position of these motifs in the genome differs from that of
the last 2 markers: microsatellites can be found interspersed among genes, and even
within genes in intronic regions.
The application of microsatellites is wider than that of minisatellites as, besides
accurate individual identification and the inference of evolutionary processes, they are
used in genetic mapping, the inference of the relative position of loci along a
chromosome. The generation of microsatellite profiles is simpler than that of
minisatellite profiles, as it only involves
PCR reactions with primers matching the flanking regions, and resolution of the PCR
products by electrophoresis. Modern electrophoresis systems do not use
agarose gels but automated capillary electrophoresis.
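Since a microsatellite allele is defined by its number of repeats, it can be read either directly from a sequence or from the length of the PCR product. The sketch below shows both; the function names, the CA motif and the flanking sequences are hypothetical and serve only as an illustration.

```python
import re


def longest_repeat_run(sequence, motif):
    """Number of copies of the motif in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def repeats_from_fragment(fragment_length, flank_length, motif_length):
    """Infer the number of repeats from the length of a PCR product.

    flank_length is the total length of both flanking regions, primers included.
    """
    return (fragment_length - flank_length) // motif_length


# Hypothetical allele: 12 CA repeats between two 6-base flanking regions.
left_flank, right_flank = "GGATCG", "TTAGGT"
allele = left_flank + "CA" * 12 + right_flank

print(longest_repeat_run(allele, "CA"))
print(repeats_from_fragment(len(allele), len(left_flank) + len(right_flank), 2))
```

The second function mirrors what is done in practice with electrophoresis: two alleles that differ by one repeat of a 2-base motif give PCR products that differ by 2 bases in length.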
See examples in Slides 10 and 11, Class II; Slides 11 and 12, Class III
Slide 21
Mobile elements; jumping genes.
Mobile elements, transposons or “jumping genes” are fragments of DNA that can
move around to different positions in the genome of a single cell. In the process, they
may cause mutations, or increase (or decrease) the amount of DNA in the genome
(http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html).
Jumping elements were discovered in maize in 1948 by Barbara McClintock, who
noticed deletions, insertions and translocations. She won the Nobel Prize for her
discoveries 35 years later.
Slide 22
Transposons II
Nowadays we understand the process of self-propagation of these elements. There are
two basic types: retrotransposons and transposons. Their molecular structure is that of
an internal coding region flanked by terminal repeats (e.g. LTR or Long Terminal Repeats).
Transposons propagate themselves in a “cut and paste” fashion. They code for an
enzyme that excises these sequences from their site, and they then integrate into the
genome somewhere else.
Retrotransposons are elements that are copied into RNA, which, in turn, is
copied into DNA and inserted in other regions of the genome in a “copy and paste”
fashion. The process involves copying RNA into DNA with an enzyme coded by the
retrotransposon itself. This is the reverse of the flow of information known as the
“central dogma of molecular biology”.
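The difference between the two modes of propagation can be made concrete with a toy model in which the genome is a string, with the host sequence in lower case and the mobile element in capital letters. The function names, the sequences and the insertion positions are assumptions invented for this illustration.

```python
def cut_and_paste(genome, element, target):
    """Transposon: excise the element and reinsert it at position `target`
    of the genome that remains after the excision."""
    start = genome.index(element)
    remaining = genome[:start] + genome[start + len(element):]
    return remaining[:target] + element + remaining[target:]


def copy_and_paste(genome, element, target):
    """Retrotransposon: the original stays in place and a new DNA copy,
    made through an RNA intermediate, is inserted at position `target`."""
    rna = element.replace("T", "U")    # the element is copied into RNA
    new_copy = rna.replace("U", "T")   # the RNA is copied back into DNA
    return genome[:target] + new_copy + genome[target:]


genome = "acgt" + "TTAGGA" + "ccgg"  # hypothetical genome with one element
moved = cut_and_paste(genome, "TTAGGA", target=8)
copied = copy_and_paste(genome, "TTAGGA", target=len(genome))

print(moved, len(moved))
print(copied, len(copied))
```

In this model the “cut and paste” move leaves the length of the genome unchanged, whereas every “copy and paste” event adds one more copy of the element, which is one way in which these elements can increase the amount of DNA in a genome.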
The process of copying RNA into DNA is shared by retroviruses like HIV or HTLV (human
T-cell leukaemia virus), which is why these elements are thought
to have originated from viral infections.
About 40% of the human genome and 50% of the maize genome consist of
retrotransposons.
Examples of elements originated by retrotransposition are:
LINEs, or Long Interspersed Nuclear Elements. They consist of repeats of a few
hundred to 9000 base pairs and there are 850 000 in the human genome.
SINEs, or Short Interspersed Nuclear Elements. They consist of repeats of a few hundred
base pairs. The most important SINEs in the human genome are the Alu elements
(so called after the restriction enzyme that allowed their detection). They consist of
300 bp repeats and make up to 11% of our genome. Many of the Alu elements occur
within introns or structural genes.
The LINE and SINE sequences show similarity in closely related species, and they
were used in evolutionary studies in the ‘80s following a methodological procedure
similar to that applied to satellite DNA.
Links: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html
http://en.wikipedia.org/wiki/Transposons#Class_I:_Retrotransposons
Slide 23
The Other genome: mitochondrial DNA
Mitochondrial DNA (mtDNA) is a covalently closed circle of DNA, which varies in size
between approximately 12 and 20 kb in animals. This DNA is located in the
mitochondria and contains 37 genes. Thirteen of them code for proteins (enzymes) involved
in the respiratory chain that produces energy in the form of ATP. The mtDNA also codes for
22 tRNAs and 2 ribosomal RNAs, which are used to translate the mitochondria-encoded proteins.
The mtDNA also contains a region that controls transcription, which usually displays
a large amount of variation among individuals within a species. This is called the “control
region”, D-loop or hypervariable region.
In contrast to the nuclear genes, the mitochondrial genome is very compact,
lacks repetitive elements, and its genes are not structured in coding regions and
intervening non-coding regions. The lack of introns and the circular structure of the
mitochondrial genome resemble prokaryotic genomic DNA. For this reason, the
evolutionary origin of eukaryotic cells is likely endosymbiotic: one cell was
absorbed into another without being digested.
After fertilization, the mtDNA contained in sperm is destroyed; therefore this type of
DNA is inherited only maternally. There are exceptional cases in the animal kingdom
where mtDNA is frequently transmitted by the male parent, for instance in
mussels. In addition, mitochondrial DNA is not subject to recombination. The
mode of inheritance of mtDNA allows us to reconstruct maternal lineages and
trace them back in time. Similarly, Y-chromosome information
allows the reconstruction of paternal lineages. The allelic variants of
uniparentally inherited DNA present in a population or species are called
HAPLOTYPES.
Some mt genes, e.g. those coding for the 16S ribosomal RNA and COI, are frequently
used in phylogenetic reconstruction. Other genes display higher mutation rates
which, combined with the small effective population size of mtDNA, result in
genetic drift (see Slide 20, Class II) that is more pronounced than for nuclear
genes. For this reason, mtDNA is especially attractive for the study of
microevolutionary processes and population genetics. Of particular interest is its
application to phylogeography: the study of the processes controlling the geographic
distributions of lineages by constructing the genealogies of populations and genes.
See examples of lineages in Slide 26, Class II, and of phylogeographic reconstruction in
Slides 1-3, Class III.
http://www.cbmpan.gdynia.pl/Images/hex0.gif
http://en.wikipedia.org/wiki/Mitochondrial_DNA
http://en.wikipedia.org/wiki/Phylogeography
image: human mtDNA http://jon.thackray.org/biochem/dna.html
Slide 24
The genetic code
Previously we saw that the information contained in the coding regions of DNA,
either nuclear or mitochondrial, is transported by the processed messenger RNA to the
cytoplasm, where it is translated into protein. How is this information coded?
The mRNA is “read” from its 5’ end to its 3’ end by the tRNAs, three bases at a time.
The two RNAs pair through complementary triplets, the codon in the mRNA and
the anticodon in the tRNA, held together by non-covalent hydrogen bonds. After one “block” or
unit of information is read, the translation machinery moves one step along,
reads the following codon, and adds the corresponding amino acid to the growing
protein chain. Remember that the mRNA is copied from the antisense strand of DNA.
Therefore, the codons are read along the mRNA in the same sense in which they are present in
the DNA, from the 5’ to the 3’ end.
When we calculate all the possible arrangements of 4 elements taken 3 at a time
(4 x 4 x 4), we obtain a total of 64 different possibilities. However, there are only
20 amino acids to code for. As a result, most amino acids are coded by more than one
codon, and for this reason the genetic code is said to be REDUNDANT.
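The count of 64 can be checked by enumerating every ordered triplet of the four bases. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"  # the four RNA nucleotides

# Every ordered triplet of bases: 4 choices at each of 3 positions.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))      # total number of possible codons
print(codons[:4])       # the first few triplets
print("AUG" in codons)  # the START codon is one of them
```

With 64 codons and only 20 amino acids (plus the stop signals), several codons necessarily map to the same amino acid.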
In addition, some codons give START or STOP signals to the translation process. For
instance, AUG is the START codon and defines the reading frame of a
string of nucleotides.
The table in the figure shows the code for nuclear vertebrate DNA. We can see that
some amino acids are coded by four codons and others by two. A codon is said to be “four-fold
degenerate” if any nucleotide at the 3rd position determines the same amino acid.
Similarly, a “two-fold degenerate” codon determines the same amino acid when the
3rd position is occupied by either purine, or by either pyrimidine. Also, the genetic code is
tolerant of mutations that do not affect the properties of the amino acid. For instance, NUN (N
stands for any nucleotide) tends to code for hydrophobic amino acids. 0-fold
degenerate sites are those at which any change determines an amino acid change.
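The degeneracy classes described above can be counted directly from the standard (nuclear) code table. The sketch below is an illustration under stated assumptions: the table is written in DNA letters in the conventional TCAG order, and the function name is hypothetical. The function returns how many of the four bases at the 3rd position give the same amino acid, so a result of 1 corresponds to what these notes call a 0-fold degenerate site.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = STOP).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def third_position_degeneracy(codon):
    """Number of bases at the 3rd position that encode the same amino acid."""
    amino_acid = CODE[codon]
    return sum(CODE[codon[:2] + base] == amino_acid for base in BASES)


print(third_position_degeneracy("GCT"))  # alanine: any 3rd base gives alanine
print(third_position_degeneracy("CAT"))  # histidine: only T or C (pyrimidines)
print(third_position_degeneracy("ATG"))  # methionine: no alternative 3rd base

# Distribution over the 61 sense codons (STOP codons excluded).
counts = Counter(
    third_position_degeneracy(codon) for codon, aa in CODE.items() if aa != "*"
)
print(dict(counts))
```

Isoleucine, coded by three codons, falls outside the two-fold and four-fold classes and shows up as a separate class with a count of 3.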
The redundancy makes the genetic code more tolerant of mutations at the third
position. Changes in the coding region that do not affect the resulting protein are
called SILENT MUTATIONS and constitute the foundation of the Neutral Theory
of Evolution.
A few differences exist in the code between eukaryotes and prokaryotes, and
consequently, the mtDNA code is more closely related to that of prokaryotes.
genetic code http://employees.csbsju.edu/hjakubowski/classes/ch331/dna/trnacode.gif
http://en.wikipedia.org/wiki/Genetic_code
Slide 25
Examples genetic code
This is an example of the alignment of a short fragment of DNA coding for the
mitochondrial gene COI (cytochrome oxidase I) in several krill species of the genus
Euphausia.
Highlighted in yellow are the codons that present silent mutations; none of
these differences across species determines an amino acid change. The non-synonymous
substitutions are shown in red.
In the second amino acid position of the alignment we see an example of amino acid
replacement. The mutation that gave rise to the amino acid change from Glycine to
Cysteine in Euphausia gibboides is in the 1st position of the codon.
Alanine occupies the 3rd position in the amino acid chain. This is a case of 4-fold
degeneracy of the code: we can see the presence of either A, T, C or G in the third
position of the corresponding codon in different krill species.
At the end of the alignment we can see several amino acid replacements. Histidine is
replaced by Glutamine; both are polar amino acids. The codons for these two amino acids
are 2-fold degenerate. We can see that a transversion at the 3rd position changing A to
C (or vice versa) between Euphausia tenera and Euphausia triacantha still codes for
Glutamine.
Slide 26
Mutations
The concept of mutation can be simply expressed as a change in the heritable
material. The persistence of these new variants in the population or species will
ultimately depend on other factors, such as their adaptive value, and, if they are
non-adaptive, on external factors like selection, genetic drift and other
stochastic factors.
The most important changes of evolutionary significance are point mutations,
duplications, chromosomal rearrangements and polyploidization.
Slide 27
Mutations within loci
Variation within loci gives rise to new alleles. Although this type of mutation is
more commonly known as “point mutation”, there are other mechanisms that also
generate new alleles: insertions and deletions.
We previously saw point mutations in protein-coding regions. Changes in a single
nucleotide are called synonymous substitutions when they are silent, and non-synonymous
substitutions when they determine an amino acid change. Insertions and deletions in coding
regions also occur, but unless the change involves 3 nucleotides (or a multiple of 3),
the resulting shift in the reading frame is likely to make the allelic variant
deleterious for the organism.
Deletions and insertions are the mutations that generate allelic variation in
microsatellite and minisatellite loci.
Changes in RNAs can modify their secondary structure.
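The distinction between synonymous and non-synonymous substitutions can be sketched in Python by translating a codon before and after a single-base change. This is an illustrative sketch that assumes the standard nuclear code table; the function name and the example codons are hypothetical and are not taken from the krill alignment.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = STOP).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def classify_point_mutation(codon, position, new_base):
    """Substitute one base (position 0, 1 or 2) and compare the amino acids.

    A change to or from a STOP codon is also reported as non-synonymous here.
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    kind = "synonymous" if before == after else "non-synonymous"
    return mutant, before, after, kind


# 3rd-position change within the alanine codon family.
print(classify_point_mutation("GCT", 2, "C"))
# 1st-position change turning a glycine codon into a cysteine codon.
print(classify_point_mutation("GGT", 0, "T"))
```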
Slide 28
Point mutations
We briefly explained before the constitution of DNA and the organization of the
complementary bases. Mutations do not occur entirely at random. Some changes are
more frequent than others: changes to another nucleotide of the same type
(purine to purine, or pyrimidine to pyrimidine) are more frequent than changes
to a nucleotide of the other type (purine to pyrimidine, or vice versa). The first
kind of change is called a transition, whereas the second kind is called a
transversion.
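The definition above translates into a short classifier. This is a minimal sketch with illustrative names; it also counts how many of the 12 possible base changes fall into each class.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Return 'transition' or 'transversion' for a change between two DNA bases."""
    if old == new:
        raise ValueError("not a substitution: both bases are identical")
    pair = {old, new}
    same_type = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_type else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine

# Tally all ordered changes between different bases.
tally = {"transition": 0, "transversion": 0}
for old, new in permutations("ACGT", 2):
    tally[classify_substitution(old, new)] += 1
print(tally)
```

Only 4 of the 12 possible changes are transitions, so their higher observed frequency is not a matter of there being more ways to make them.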
Slide 29
Insertions – deletions
We previously saw a few examples of point mutations in protein-coding regions.
This slide shows the effect of insertions and deletions in a protein-coding region.
The example shows an insertion and the resulting shift in the reading frame of
the corresponding protein. In this case, not only are new amino acids incorporated into
the protein, but translation also terminates at an earlier point. Changes of
this type can render the resulting product non-functional and therefore
deleterious for the organism.
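The effect of a frameshift can be reproduced with a small translation sketch. The sequence below is made up for illustration and is not the one shown on the slide; the standard nuclear code table is assumed, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = STOP).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(dna):
    """Translate from the first base, codon by codon, until a STOP codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGGCACTTGACCGTAAGGGTTAA"            # hypothetical coding sequence
frameshift = original[:3] + "C" + original[3:]   # insertion of 1 base
in_frame = original[:3] + "CCC" + original[3:]   # insertion of 3 bases

print(translate(original))    # full-length product
print(translate(frameshift))  # new amino acids, then a premature STOP
print(translate(in_frame))    # one extra amino acid, frame preserved
```

In this made-up case the single-base insertion changes every codon downstream and creates an early STOP, whereas the three-base insertion only adds one amino acid.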
Slide 30
Changes in ITS1
This is an example of an insertion (or deletion) in the ITS-1 region of two individuals
of the krill species Euphausia recurva. We see that a single insertion (or deletion) of
an adenine favors or prevents, respectively, the formation of a small stem 2 bases long.
The absence of this A therefore determines the formation of a bigger loop.
The function of this secondary structure is unknown, and we cannot ascertain whether it
confers any adaptive value on the organism. Since this polymorphism is frequent in
this species, we assume it doesn’t.
We have already seen examples of gene duplication in Slide 17. We will now see other
major changes in the genome at the chromosomal level.
Slide 31
There are changes that also affect the structural organization of the genome at the
chromosomal level. Closely related species sometimes show different numbers of
chromosomes, and a few chromosomes that differ in size and shape. This is the result
of major processes of chromosomal rearrangement that involve breakpoints:
fusion, fission, inversion of fragments within the same chromosome, and translocation
of fragments between non-homologous chromosomes.
Animations (not included):
http://www.emunix.emich.edu/~rwinning/genetics/variat2a.htm
Slide 32
Ploidy
Ploidy is the number of single sets of chromosomes in a cell or organism. For
instance, we are diploid organisms as we carry 2 sets of homologous chromosomes.
Our gametes, however, are haploid cells as they carry only 1 set of chromosomes.
Cells within the same organism can have different ploidy: some of our liver cells are
octaploid.
Cells or organisms that carry more than 2 sets of chromosomes are known as
polyploids (triploid when they have 3 sets, tetraploid when they have 4 sets).
Evolutionary changes in the number of chromosome sets are very frequent among
plants, which can give rise to new species after cross-species fertilization
simply through non-disjunction of chromosomes in the first meiotic division. As a result, the
second meiotic division separates the sister chromatids of all existing chromosomes.
Therefore, the new species automatically carries all the chromosomes, and thus all the
genetic information, of the parental species.
Examples of tetraploids are Pelargonium, maize, cotton, cabbage, and leek.
Examples of hexaploids: wheat, oat.
Examples of octaploids: strawberry, sugar cane.
http://en.wikipedia.org/wiki/Polyploid#Polyploid_crops
Slide 33
Mutation rates
Mutation rate is the number of mutation events per unit of time. The units of time are
measured differently depending on the objective of the study: we can consider
generations, number of cell divisions, etc. In the early studies of mutants the rate
used to be expressed as the number of mutations per million gametes, and it was obtained
by counting the mutant individuals resulting from controlled crosses.
The rate at which new variants arise in a genome depends on the type of DNA and
genome. This rate varies greatly across species, among genes within the same species
and between the nuclear and the mitochondrial genome.
In humans, for instance, the incidence (or frequency per gamete) of mutations causing
achondroplasia (dwarfism) is 4-12 x 10^-5, and of those causing hemophilia A 2-4 x 10^-5.
In an evolutionary scenario we are interested in expressing the mutation rates in other
units, like mutations per generation.
For coding regions, the average rates per generation are approximately 10^-9 - 10^-8
per base pair (or site), 10^-6 - 10^-5 per gene, and 0.02 - 1 per genome.
Microsatellites show a mutation rate that is 10 to 100 times faster.
The rate for the mitochondrial DNA control region in humans is approximately 0.011.
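The per-site and per-gene figures above are consistent with each other if the per-gene rate is taken as the per-site rate multiplied by the length of the coding sequence. The sketch below shows that arithmetic; the gene length of 1000 bp is an assumption chosen for illustration and is not a value given in the lecture.

```python
# Assumption for illustration only: a coding region of 1000 base pairs.
GENE_LENGTH_BP = 1000

# Per-site, per-generation range quoted in the notes.
per_site_rates = (1e-9, 1e-8)

# Per-gene rate approximated as per-site rate times number of sites.
per_gene_rates = [rate * GENE_LENGTH_BP for rate in per_site_rates]

for site_rate, gene_rate in zip(per_site_rates, per_gene_rates):
    print(f"{site_rate:.0e} per site  ->  {gene_rate:.0e} per gene")
```

Under that assumption the result falls in the 10^-6 to 10^-5 per-gene range quoted above; longer or shorter genes would shift it proportionally.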
From the figures in this slide we can see that the different mutation rates in different
regions of the genome allow us to infer evolutionary processes at different levels
of organization. For instance, the mitochondrial control region and microsatellites are
very effective in the inference of historical and demographic scenarios, whereas the
more conserved nuclear protein-coding genes would be uninformative at the
population level and are used, instead, in the inference of phylogenetic relationships
between species.
http://www.zoology.ubc.ca/~whitlock/bio434/LectureNotes/06.Mutation/Mutation.html
http://biology.clc.uc.edu/courses/bio104/dna.htm
mt mutation rate http://hammerlab.biosci.arizona.edu/Publications/BonneTamir_2003.pdf
Reference:
Am J Hum Genet. 2000 May;66(5):1599-609. Epub 2000 Apr 7.
Slide 34
Molecular clocks
Tightly linked to the concept of substitution or mutation rate is the idea of “molecular
clocks”. This concept implies that the rate at which mutations occur is kept
constant along evolutionary time; in other words, the mutations “tick” at regularly
spaced intervals. Although this is not strictly true, as mutation rates vary along
and between lineages, it can give an approximate idea of speciation times.
For instance, it is widely accepted that the general rate for mtDNA genes is a change
of 2% per million years.
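Under a strict molecular clock, a divergence time follows from dividing the observed sequence divergence by the rate. The sketch below assumes that the 2% figure refers to the divergence accumulated between two lineages per million years; the function name and the 5% example value are hypothetical.

```python
# Rate quoted in the notes for mtDNA genes, in percent per million years.
RATE_PERCENT_PER_MYR = 2.0


def divergence_time_myr(percent_divergence, rate=RATE_PERCENT_PER_MYR):
    """Estimated time since divergence, in million years, under a strict clock."""
    if rate <= 0:
        raise ValueError("the rate must be positive")
    return percent_divergence / rate


# Two hypothetical lineages whose mtDNA sequences differ at 5% of sites.
print(divergence_time_myr(5.0))
```

Because rates vary along and between lineages, an estimate of this kind is only a rough approximation of the speciation time.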