Slide 1
Links: http://www.blackwellpublishing.com/ridley/images/h_erato.jpg http://www.nature.com/hdy/journal/v94/n1/thumbs/6800561f4th.gif

Slide 2
Aim of the course
The concept of diversity is easy to understand. All biological forms show variation at many levels of organization: there are different types of cells within one organism, individuals of the same species differ in appearance (their phenotypes), and the diversity of a community or ecosystem is given by the number of species inhabiting it. In this course we focus on the principles and processes that mold variation at the species level. The observable characters, or external appearance, of an organism are known as its “phenotype”. Phenotypes are, in turn, governed to a greater or lesser extent by the interaction of the genetic material with the environment. The genetic composition of an organism is known as its “genotype”. In the first part of this course we present the organization of the genetic information. Then we present methodological approaches to study variation at the DNA level and, finally, the application of molecular markers to the study and inference of ecological, demographic and evolutionary processes.

Slide 3
The idea that characters, or traits, are transmitted from one generation to the next was intuitively understood long before the principles and material of inheritance were discovered. This general and vague understanding allowed the first settlers to domesticate plants and animals from Neolithic times.
Links
Sheep: http://www.fhw.gr/chronos/01/en/gallery/intro/animals/animals1.html
Maize: http://www.athenapub.com/nwdom1.htm http://www.wsu.edu/gened/learn-modules/top_agrev/6Domestication/domestication4.html

Slide 4
However, it was not until the middle of the 19th century that the rules of inheritance were discovered.
Gregor Mendel, an Austrian monk and teacher at a local high school, studied botany and experimented with pea crosses. He selected 22 varieties of peas, experimented with crosses for 8 years, and published his seminal work in 1866. However, his work went largely unnoticed by the scientific community for the next 35 years, as most attention was drawn to a book published a few years earlier: The Origin of Species, by Charles Darwin.

Slide 5
Mendel crossed pea plants that differed in characters such as: tall or short plants, purple or white flowers, smooth or wrinkled peas, round or shriveled peas, green or yellow peas. After crossing purple-flowered plants with white-flowered plants, he cross-pollinated the offspring (F1). He found that three-quarters of the next generation (F2) were purple-flowered when they bloomed, and one-quarter white-flowered. Some characters were thus “masked” in the first generation and passed on to the second; he called these “recessive”, and he called the characters that “overshadowed” them “dominant”. Mendel’s laws of inheritance can be summarized as follows.
Law of independent assortment: characters (factors) segregate independently of each other. That is, the character “tall” or “short” is inherited independently of “purple” or “white” flowers.
Law of segregation: characters occur in alternative forms (today we call them alleles). They occur in pairs within individuals, one inherited from each parent. These pairs separate (segregate) during gamete production in the parents and are brought together again at fertilization. Each parent contributes one allele to the offspring.
Law of dominance: for each character, one factor is dominant and the other recessive; in the F2 the dominant and recessive forms appear in a ratio of approximately 3:1. Combinations of alleles that include the dominant form show only that form.
Segregation and the resulting crosses can be represented in a diagram called a “Punnett square”.
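The Punnett square for a single character can be sketched in a few lines of Python. This is only an illustration: the function names and the allele letters “P” (purple, dominant) and “p” (white, recessive) are assumptions made for the example, and each parental allele is assumed to reach a gamete with equal probability.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1, parent2):
    """Cross two single-locus genotypes, e.g. "Pp" x "Pp".

    Assumes each parent passes on either of its two alleles with equal
    probability, so every cell of the square is equally likely.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the upper-case (dominant) allele first, so "pP" == "Pp".
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


def phenotype_ratio(genotype_probs):
    """Add up genotype probabilities into dominant and recessive phenotypes."""
    dominant = sum(
        prob
        for genotype, prob in genotype_probs.items()
        if any(allele.isupper() for allele in genotype)
    )
    return {"dominant": dominant, "recessive": 1 - dominant}


# F1 x F1 cross of two heterozygous, purple-flowered plants.
f2 = punnett_square("Pp", "Pp")
print(f2)                   # genotype proportions in the F2
print(phenotype_ratio(f2))  # proportion of purple vs. white flowers
```

For the cross "Pp" x "Pp" the sketch gives one quarter PP, one half Pp and one quarter pp, that is, three-quarters of plants with the dominant phenotype and one-quarter with the recessive one.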
In a Punnett square we represent all possible combinations of alleles in the gametes of the parents, and all possible results of the combination of their gametes. The proportion of offspring with a given genotype can then be predicted in terms of probabilities. Individuals carrying two different variants of a character, that is, 2 different alleles, are called heterozygotes, whereas individuals that carry two copies of the same variant are called homozygotes. This is a fundamental concept to which we will return later. As a general rule, dominant alleles are denoted with capital letters. Now we will see how the material of inheritance is organized, and how alleles are segregated in the germline.
Links: http://history.nih.gov/exhibits/nirenberg/images/photos/01_mendel_pu.jpg www.godrules.net/evolutioncruncher/a10a.htm http://www.laskerfoundation.org/rprimers/gnn/timeline/1866.html http://www.emunix.emich.edu/~rwinning/genetics/mendel4.htm

Slide 6
Nowadays we know that the hereditary material is DNA (deoxyribonucleic acid), a molecule organized in discrete units called chromosomes. Chromosomes occur in pairs, one member of each pair being inherited from each parent. The process of meiosis is fundamental to understanding how characters are segregated. Every cell of the organism carries two copies of each chromosome. However, to pass the information on to the next generation it has to be “halved”, as the other half is provided by the other parent. This reduction of the genetic information during the formation of the gametes is called meiosis. In this process, one diploid cell gives rise to 4 haploid cells. Prior to the meiotic division, during a period called “interphase”, each chromosome consists of a centromere and one chromatid; the genetic information is then duplicated, resulting in two chromatids attached at the centromere. Thus, at the end of interphase, the cell contains 2N chromosomes with duplicated genetic information.
In the first stage of division, homologous chromosomes pair with each other and exchange genetic material. This process is called “crossing over”: chromatids of homologous chromosomes exchange fragments, producing new combinations of maternal and paternal genetic information. This is regarded as a major advantage of “sex” in evolution, as it provides a mechanism for generating genetic variation. At the end of the first cell division two cells are obtained; each contains one chromosome of each homologous pair, and each chromosome still carries its information in duplicate (two chromatids). This first division therefore reduces the number of chromosomes per cell. The second cell division separates the sister chromatids of each chromosome and results in 4 haploid cells, each carrying only one chromosome of each pair. Ploidy is the number of chromosome sets present in the nucleus of a cell (see also Slide 32 of this presentation). Meiosis starts with a diploid cell, and the resulting cells are haploid. If we follow the DNA content through the divisions, and call 2c the amount of DNA present in interphase before duplication, the content rises to 4c after duplication of the chromatids. At the end of meiosis I, each of the 2 resulting cells contains 2c DNA again but, unlike the starting cell, only one member of each chromosome pair. The second division halves the DNA content once more, and therefore results in 4 cells with 1c each. Now we can see the link between this process and Mendel’s laws. The sorting of homologous chromosomes into different cells (gametes) explains the law of segregation. Independent assortment, in turn, applies to characters that are coded on different chromosomes. That is why plants with purple or white flowers can have peas that are either smooth or wrinkled.
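The bookkeeping of DNA content during meiosis can be written down as a small table in Python. This is a minimal sketch: the stage names and the function `describe_meiosis` are labels chosen for this example, and the values are simply those given in the text (2c, 4c, 2c, 1c per cell).

```python
# (stage, number of cells, DNA content per cell in units of c, chromosome sets per cell)
MEIOSIS_STAGES = [
    ("interphase, before duplication", 1, 2, 2),
    ("interphase, after duplication", 1, 4, 2),
    ("end of first division", 2, 2, 1),
    ("end of second division", 4, 1, 1),
]


def describe_meiosis(stages=MEIOSIS_STAGES):
    """Return one summary line per stage, including the total DNA over all cells."""
    lines = []
    for name, cells, dna_per_cell, sets in stages:
        ploidy = "diploid (2N)" if sets == 2 else "haploid (N)"
        total = cells * dna_per_cell
        lines.append(
            f"{name}: {cells} cell(s) x {dna_per_cell}c = {total}c in total, {ploidy}"
        )
    return lines


for line in describe_meiosis():
    print(line)
```

The total stays at 4c after duplication: the two divisions only distribute the duplicated DNA among 2 and then 4 cells, while the number of chromosome sets per cell drops from two to one at the first division.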
Link: http://genetics.gsk.com/graphics/meiosis-big.gif
Not used but important: http://www.cellsalive.com/meiosis.htm

Slide 7
Cell division in all organisms involves the transfer of the genetic information to the 2 resulting “daughter cells”. This process of duplication of the genetic information followed by cell division is called the “cell cycle”, and it has features in common with meiosis. However, meiosis is a “dead end”: its products do not enter another cycle of division. The process by which the chromosomes are copied exactly and passed on to the resulting cells during cell division is called mitosis. This process occurs in all but the germline cells of our bodies. As in meiosis, the DNA of each chromosome is duplicated. Here, the sister chromatids are sorted into the resulting cells but DO NOT EXCHANGE GENETIC MATERIAL, so the daughter cells contain exactly the same genetic information as the cell that divided. These daughter cells can now start a new cycle of division. Now that we understand the principles of the transmission of inheritance, or “characters”, we will see how the genetic information is organized: the concept of the genome and the different types of “genomes” carried by eukaryotic organisms.
Link: http://genetics.gsk.com/chromosomes.htm

Slide 8
Organization of the genetic information
The complete genetic information possessed by an organism is called its GENOME. Eukaryotic organisms contain genetic information not only in the nucleus of their cells, where the chromosomes are located, but also in extranuclear DNA, which is contained in organelles such as mitochondria and chloroplasts.
Link: http://www.lclark.edu/~seavey/plant%20cell72-1.jpg
For genome definition: http://www.stateofthesalmon.org/resource/glossary.asp?let=g

Slide 9
Nuclear genetic information
Nuclear DNA is organized in discrete units called chromosomes, which become visible in metaphase of the cell cycle, when the chromatin contracts, condenses and is packed in a scaffold of accompanying proteins, the histones. The genetic information is also compartmentalized among these units. For example, in mammals and birds a distinct pair of chromosomes carries the genetic information for sex determination. These are called sex chromosomes. The rest of the chromosomes are called autosomes (“somatic” chromosomes). In mammals, individuals with two X chromosomes are female, whereas the Y chromosome determines that the individual carrying it is male: females carry 2 X chromosomes, and males carry one Y and one X chromosome. Mammalian males are therefore called the “heterogametic sex”, because their 2 sex chromosomes are different, whereas females are the “homogametic sex”. In birds, by contrast, the heterogametic sex is the female. In this slide we see the chromosomes of the human species in mitotic metaphase, when each chromosome is composed of two sister chromatids, in other words, with duplicated genetic information. This “set of photographed, banded chromosomes arranged in order from largest to smallest” is called a karyotype. The human nuclear genome, then, is composed of 22 pairs of autosomes plus a pair of sex chromosomes: XY in males and XX in females. The internal structure of these metaphase chromosomes, as seen under the microscope, is a condensed, supercoiled fiber composed of a DNA molecule and several accompanying proteins. The DNA molecule is wound around groups of 8 molecules of basic proteins, the histones H2A, H2B, H3 and H4 (two of each), while histone H1 binds the DNA between these groups.
This organization is called chromatin, and it shows different levels of condensation along the cell cycle. The highest level of condensation is reached right before cell division. Next, we will look at the molecular structure of DNA.
Link: http://www.vcbio.science.ru.nl/images/cellcycle/mchromatinpackaging_zoom.gif
Karyotype: http://www.kumc.edu/gec/glossnew.html#K
Other important links: http://www.ornl.gov/sci/techresources/Human_Genome/glossary/

Slide 10
Molecular structure of DNA
Link: http://www.ashingtonhigh.northumberland.sch.uk/science/biology/DNA4.gif
DNA is a polymeric molecule composed of a string of units, the nucleotides. DNA is also known as the “double helix”, as two opposing strings of nucleotides twist around each other in a clockwise (right-handed) manner. Each turn of the helix contains 10 pairs of nucleotides. Each nucleotide is composed of a nitrogenous base, deoxyribose (a sugar with 5 carbons, in orange) and a phosphate (in blue), covalently attached. The carbon atoms of the sugar are numbered from 1’ to 5’, and the orientation of a DNA strand is given by these sugar carbons. We see in the picture that the 2 strands are organized in pairs of nucleotides and that they run in opposite “directions” from the 5’ to the 3’ end. This orientation is important for the processing of the DNA information, as we will see later on. Nitrogenous bases are cyclic compounds with carbon and nitrogen atoms in their rings. Bases with 2 rings are called purines, and bases with only 1 ring are pyrimidines. Adenine and guanine are purines; cytosine and thymine are pyrimidines. They are denoted by the capital letters A, G, C and T. In the double helix, the two strands run in opposite directions and their nucleotides pair in an ordered fashion: A with T, and C with G. These pairs are held together by hydrogen bonds, which are disrupted at high temperature.
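The pairing rule (A with T, C with G) and the opposite orientation of the two strands can be illustrated with a short Python sketch. The function name `complementary_strand` and the example sequence are assumptions made for the illustration only.

```python
# Pairing rule of the double helix: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand_5_to_3):
    """Return the opposite strand, also written from its 5' end to its 3' end.

    The paired bases are reversed because the two strands of the double
    helix run in opposite directions.
    """
    paired = [COMPLEMENT[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(paired))


print(complementary_strand("ATGC"))  # GCAT
```

Applying the function twice returns the original sequence, which is one way to check that the two strands carry the same information.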
The order of the nitrogenous bases in the DNA molecule carries the genetic information for the makeup of all organisms. The organization and types of DNA sequences that are the main components of the genome follow next.
Links: http://images.webster-dictionary.org/thumb/c/cd/DNAbasePairing.png http://www.ashingtonhigh.northumberland.sch.uk/science/biology/DNA4.gif

Slide 11
Watson and Crick
James Watson and Francis Crick were working at Cambridge University, and were 24 and 36 years old respectively, when they discovered the structure of DNA in 1953. They were working alongside a team from King’s College London, Maurice Wilkins and Rosalind Franklin, who obtained crystallographic images of DNA. From the crystallographic images obtained by Rosalind Franklin, Watson and Crick envisioned building blocks arranged along a twisted ladder. Watson, Crick and Wilkins shared the Nobel Prize in 1962. Rosalind Franklin died of ovarian cancer in 1958, and Nobel Prizes are not awarded posthumously.
Links: www.hallucinogens.com/lsd/watson-crick.jpg http://language.chinadaily.com.cn/biography.shtml?id=3034 http://osulibrary.orst.edu/specialcollections/coll/pauling/dna/pictures/portraitwilkins.jpg http://nobelprize.org/medicine/laureates/1962/wilkins-bio.html

Slide 12
Nuclear DNA: coding and non-coding sequences
Nuclear DNA contains coding and non-coding stretches. The coding stretches are composed of GENES, the basic units of hereditary material, which carry the information to give rise to a product. Other stretches of DNA along the genome do not code for known products, or are simply intervening sequences between coding stretches. Generically speaking, these sequences were originally called “junk DNA”. Calling this DNA “junk” is somewhat unfair, as some of these sequences have a regulatory function, picking up signals that switch certain genes on or off.
Despite their debated functionality, they can provide invaluable information about evolutionary processes.
Link (gene definition): http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/DefG/gene.html

Slide 13
Genes: the coding DNA
The products of genes are RNAs. RNA is another nucleic acid which, in contrast to DNA, is composed of a single strand and in which thymine is replaced by another pyrimidine, uracil. There are three types of RNA: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). Messenger RNA is an intermediate between the DNA contained in the nucleus and the machinery that builds proteins, situated in the cytoplasm of the cell. The function of the mRNA is therefore to be a “messenger”, carrying the information for building proteins to where the “factory” is, in the cytoplasm. Messenger RNA is copied from DNA in a process called transcription. Once the mRNA is in the cytoplasm, the ribosomes, which contain ribosomal RNA, “read” the genetic information and build proteins with the help of transfer RNA. The function of transfer RNA is to bring the building blocks of proteins and place them in position. The final product of this process is a protein, a polymeric molecule composed of a chain of units called AMINO ACIDS. There are 20 amino acids, and each one is recognized by a specific transfer RNA. The sizes of the different types of RNA products vary: whereas a protein-coding gene can generate an mRNA of several hundreds or thousands of nitrogenous bases (we will call them just “bases” from now on), tRNAs are only 70-90 bases long. This chain of events constitutes the central dogma of molecular biology: genetic information is copied from DNA to RNA in the nucleus, and this information is translated into proteins in the cytoplasm.
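The transcription step can be sketched in Python as base-by-base copying with uracil in place of thymine. This is an illustration only: the function name `transcribe`, the convention of writing the copied (template) DNA strand from its 3’ end, and the example sequence are assumptions made for the sketch.

```python
# Base pairing used when copying DNA into RNA: uracil (U) takes the place of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Build the mRNA, written 5'->3', from a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGA"           # DNA strand that is copied, written 3'->5'
other_dna_strand = "ATGCCT"   # its partner strand in the double helix, written 5'->3'

mrna = transcribe(template)
print(mrna)  # AUGCCU

# The mRNA carries the same sequence as the partner strand, with U instead of T.
assert mrna == other_dna_strand.replace("T", "U")
```

Reading the resulting mRNA in groups of three bases and assembling the protein is the translation step, which happens in the cytoplasm.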
Links: http://fajerpc.magnet.fsu.edu/Education/2010/Lectures/26_DNA_Transcription_files/image006.jpg http://genomebiology.com/content/figures/gb-2003-4-12-237-3-l.jpg

Slide 14
Organization of tRNA and rRNA genes
In the human genome there are approximately 500 genes coding for cytoplasmic tRNA, located on all chromosomes except Y and 22. The ribosomes are built from RNA (and proteins): the large subunit contains the products of the 28S, 5.8S and 5S coding regions, whereas the RNA of the small subunit is coded by the 18S gene. The ribosomal genes are organized in two types of clusters of 100s to 1000s of repeat units, composed of alternating stretches of transcribed and non-transcribed DNA. One cluster codes for the 5S RNA genes. The second cluster codes for the other 3 ribosomal genes: 18S, 5.8S and 28S. These are separated by transcribed stretches, ITS1 and ITS2, which are excised post-transcriptionally. The coding regions for ribosomal RNA are highly conserved across a wide evolutionary range of species, for which reason they are frequently used to resolve deep phylogenies. The ITS (Internal Transcribed Spacers) are less constrained than the surrounding coding regions: they are prone to accumulate mutations and change more rapidly than the ribosomal coding DNA. For this reason they are frequently used to resolve shallow phylogenies, above and below the species level. The chromosomal locations of the genes coding for ribosomal RNA are called NORs, or nucleolus organizing regions. In cytological preparations stained with silver, these regions of the chromosome stain intensely owing to their high transcriptional activity. Cytogenetic evolutionary studies in the ‘60s and ‘70s used these regions to determine homology between chromosomes of closely related species, as their position on the chromosomes tends to be conserved across species.
Links: http://en.wikipedia.org/wiki/Transfer_RNA http://www.recipeland.com/facts/Transfer_RNA http://en.wikipedia.org/wiki/Ribosomal_DNA
Image: http://wheat.pw.usda.gov/ggpages/bgn/19/a19-10.html

Slide 15
Genes: Organization of single copy DNA
The regions coding for these three types of RNA also differ in internal organization and copy number in eukaryotes. Single-copy DNA codes for proteins. Upstream of the DNA sequence that will be transcribed into messenger RNA there is a group of regulatory sequences that control transcription and remain untranscribed. The transcribed region consists of blocks of coding DNA interspersed with blocks of non-coding DNA, called exons and introns respectively. These are transcribed as a single unit into mRNA and processed within the nucleus prior to “export” to the cytoplasm. The introns are excised; other post-transcriptional processes include, for instance, the addition of a G at the 5’ end and of a poly-adenine tail that will be used as a signal for transport to the cytoplasm.
Link: http://nitro.biosci.arizona.edu/courses/EEB600A-2003/lectures/lecture24/figs/pol1.jpg

Slide 16
Proteins and Gene families
Genes coding for proteins are usually called “structural genes”. The human genome contains approximately 25 000 genes, a very small number in comparison to the size of the genome: 3 000 million base pairs. Other genes code for products with regulatory functions, which we will not discuss in the present course. During the course of evolution genes are sometimes duplicated, and the copies tend to appear in tandem. The processes involved in the origin of these gene families range from unequal crossing over leading to gene duplication to structural rearrangements of chromosome segments. The results are clusters of related genes that keep the same or a similar function, or whose functions may even diverge.
Link: http://www.informatics.jax.org/silver/images/figure5-5.gif

Slide 17
Gene families: concepts
As implied in the previous slide, a gene family is a set of genes related by homology. Homology is a relationship of identity by descent: in other words, two homologous genes share a common ancestor. This concept applies to genes in different species (case b in the slide) and also to duplicated genes in the same genome (case a in the slide). An example of the latter case is given by the globin genes on human chromosome 16. A special case of homologous genes is that of orthologs: through the speciation process, genes accumulate differences but retain the same function. This concept is crucial for predicting gene function in newly sequenced genomes. Paralogs are genes that were duplicated in the same genome but did not retain the same ancestral function over the course of evolution. Examples of these cases: an example of orthology is the ribosomal 18S gene, commonly used to resolve deep phylogenies owing to its slow evolutionary (mutation) rate. An example of paralogy is the duplication and divergence in function of the prolactin and somatotropin genes during vertebrate evolution. See example in Slide 9, Class III

Slide 18
Non-coding DNA: Satellite DNA
Satellite DNA is also known as “highly repetitive DNA” or “junk DNA”. It is composed of tandem repeats of the same sequence motif, a stretch of DNA of a few hundreds or thousands of base pairs. These repeats are usually located close to the centromeres or in the telomeric regions of the chromosomes. The repeat motif is usually conserved, and DNA sequence similarity in the repeat unit is generally found among closely related species. In the ‘80s the study of satellite DNA was very popular in evolutionary studies above the species level. The function of these repeated stretches is unknown, but roles such as providing binding sites for proteins have been proposed (http://en.wikipedia.org/wiki/Satellite_DNA).
Initially, some function related to aging was attributed to the telomeric repetitive DNA, in connection with its role in protecting the vulnerable ends of the chromosomes. In the photo we see metaphase chromosomes of cattle that were probed with repetitive DNA using a technique called FISH (fluorescent in situ hybridization). The two colours represent different types of highly repetitive DNA detected with specific probes. The molecular technique most widely applied to satellite DNA in comparative studies was the fragmentation of the DNA with restriction enzymes, followed by separation of the fragments in agarose gels, Southern blotting and probing with labeled known fragments of DNA.
Link: http://www.chrombios.com/PictureGal/Pages/PagesRep/Gal_Rep6.html
References
Prashad N, Cutler RG (1976) Percent satellite DNA as a function of tissue and age of mice. Biochim Biophys Acta 418(1):1-23.
Neidle S, Parkinson GN (2003) The structure of telomeric DNA. Current Opinion in Structural Biology 13(3):275-283.

Slide 19
Minisatellites and the origin of DNA fingerprinting
Minisatellites, another type of tandem arrangement of repeats, were discovered in 1980. The repeat unit in this case is smaller than that of satellite DNA, ranging from a few tens to about a hundred base pairs. Minisatellites are not restricted to telomeric or centromeric regions, but are interspersed among genes. In humans they are mostly found in subtelomeric regions, and the most common unit of repeat is TTAGGG. In some cases minisatellites have been associated with regulatory functions in gene expression or with the origin of certain genetic diseases, like fragile X in humans (http://en.wikipedia.org/wiki/Minisatellite). Minisatellites are characterized by high levels of polymorphism: in other words, a very high number of alleles is usually found within a species, and each allele is defined by its NUMBER OF REPEAT UNITS.
These loci are also called VNTRs, or Variable Number of Tandem Repeats. The elevated number of alleles can be attributed to errors during DNA duplication and to unequal crossing over during meiosis. The high level of variation of minisatellite loci made them very attractive for forensic cases and for the identification of individuals in pedigree reconstruction during the ‘80s. Using several loci to identify individuals proved extremely accurate, as the probability of finding a second identical profile “by chance” could be as low as 1 in 20 billion, depending on the number of loci used, the number of alleles per locus and their relative frequencies in the population. Alec Jeffreys then coined a new term for these techniques: DNA fingerprinting. He was the first scientist to apply these loci in a forensic case, a paternity dispute among foreign immigrants in England. Later on, however, other loci with even higher allelic variation replaced the minisatellites in individual identification and in the study of ecological and microevolutionary processes: the microsatellites. The techniques used to study variation at minisatellite loci were very similar to those applied to satellites: DNA was fragmented with restriction enzymes, electrophoresed, transferred to a membrane (a process known as Southern blotting) and probed with a labeled fragment of known DNA. See examples in Slide 6, Class II

Slide 20
Non-coding DNA: microsatellites
Microsatellites replaced minisatellites from the ‘90s onwards. Like the previously described loci, microsatellites consist of a motif repeated in tandem, and the number of repeats of the motif again defines the alleles. The repeat motifs are 2-6 base pairs long, and so these loci are also called STRs, Short Tandem Repeats. The allelic variation is even higher than in minisatellites, implying a higher mutation rate.
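Because an allele at these loci is defined by its number of repeat units, an allele can be scored by counting consecutive copies of the motif. The Python sketch below illustrates the idea on plain sequence strings; the function name `longest_repeat_run`, the motif “CA” and the two example sequences are invented for the illustration and do not correspond to a real locus.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical alleles of one locus: same flanking regions, different repeat numbers.
allele_a = "GGAT" + "CA" * 7 + "TTGC"
allele_b = "GGAT" + "CA" * 9 + "TTGC"

genotype = (longest_repeat_run(allele_a, "CA"), longest_repeat_run(allele_b, "CA"))
print(genotype)  # (7, 9): two different alleles, i.e. a heterozygote
```

In practice the alleles are scored from the length of the fragments rather than from the sequence itself, but the quantity being scored is the same: the number of repeat units between the flanking regions.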
The position of these motifs in the genome differs from that of the last 2 markers: microsatellites can be found interspersed among genes, and even within genes, in intronic regions. The application of microsatellites is wider than that of minisatellites: besides accurate individual identification and the inference of evolutionary processes, they are used in genetic mapping, the inference of the relative positions of loci along a chromosome. Generating microsatellite profiles is simpler than for minisatellites, as it only requires PCR reactions with primers matching the flanking regions, and resolution of the PCR products by electrophoresis. Modern electrophoresis systems do not use agarose gels but automated capillary electrophoresis. See examples in Slides 10 and 11, Class II; Slides 11 and 12, Class III

Slide 21
Mobile elements; jumping genes.
Mobile elements, transposons or “jumping genes” are fragments of DNA that can move to different positions in the genome of a single cell. In the process they may cause mutations, or increase (or decrease) the amount of DNA in the genome (http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html). Jumping elements were discovered in maize in 1948 by Barbara McClintock, who noticed deletions, insertions and translocations. She won the Nobel Prize for her discoveries 35 years later.

Slide 22
Transposons II
Nowadays we understand how these elements propagate themselves. There are two basic types: retrotransposons and transposons. Their molecular structure consists of an internal coding region flanked by terminal repeats (e.g. LTRs, or Long Terminal Repeats). Transposons propagate themselves in a “cut and paste” fashion. They code for an enzyme that excises the element from its site, after which it integrates somewhere else in the genome.
Retrotransposons are elements that are copied into RNA, which in turn is copied into DNA and inserted in other regions of the genome, in a “copy and paste” fashion. The process involves copying RNA into DNA with an enzyme coded by the retrotransposon itself. This is the reverse of the flow of information described by the “central dogma of molecular biology”. Copying RNA into DNA is also characteristic of retroviruses such as HIV and HTLV (human T-cell leukaemia virus), for which reason these elements are thought to have originated from viral infections. About 40% of the human genome and 50% of the maize genome consist of retrotransposons. Examples of elements originated by retrotransposition are:
LINEs, or Long Interspersed Nuclear Elements: repeats of a few hundred to 9000 base pairs, of which there are 850 000 in the human genome.
SINEs, or Short Interspersed Nuclear Elements: repeats of a few hundred base pairs. The most important SINEs in the human genome are the Alu elements (so called after the restriction enzyme that allowed their detection). They consist of 300 bp repeats and make up as much as 11% of our genome. Many Alu elements occur within the introns of structural genes.
LINE and SINE sequences are similar in closely related species, and they were used in evolutionary studies in the ‘80s, following a methodological procedure similar to that applied to satellite DNA.
Links: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html http://en.wikipedia.org/wiki/Transposons#Class_I:_Retrotransposons

Slide 23
The other genome: mitochondrial DNA
Mitochondrial DNA (mtDNA) is a covalently closed circle of DNA, which varies in size between approximately 12 and 20 kb in animals. This DNA is located in the mitochondria and contains 37 genes. Thirteen of them code for proteins (enzymes) involved in the respiratory chain that produces energy in the form of ATP.
The mtDNA also codes for 22 tRNAs and 2 ribosomal RNAs that take part in translating the mitochondria-encoded proteins. The mtDNA also contains a region that controls transcription, which usually displays a large amount of variation among individuals within species. This is called the “control region”, D-loop or hypervariable region. In contrast to the nuclear genome, the mitochondrial genome is very compact, lacks repetitive elements, and its genes are not structured in coding regions and intervening non-coding regions. The lack of introns and the circular structure of the mitochondrial genome resemble prokaryotic genomic DNA. For this reason, the evolutionary origin of mitochondria in eukaryotic cells is thought to be endosymbiotic: one cell was engulfed by another without being digested. After fertilization, the mtDNA contained in the sperm is destroyed; therefore this type of DNA is only maternally inherited. There are exceptional cases in the animal kingdom where the mtDNA is frequently transmitted by the male parent, for instance in mussels. In addition, mitochondrial DNA is not subject to recombination. This mode of inheritance allows maternal lineages to be reconstructed and traced back in time. Similarly, Y-chromosome information allows the reconstruction of paternal lineages. The allelic variants of uniparentally inherited DNA present in a population or species are called HAPLOTYPES. Some mitochondrial genes, e.g. those coding for 16S ribosomal RNA and COI, are frequently used in phylogenetic reconstruction. Other genes display higher mutation rates, which, combined with the small effective population size of mtDNA, result in genetic drift (see Slide 20, Class II) that is more pronounced than that for nuclear genes. For this reason, mtDNA is especially attractive for the study of microevolutionary processes and population genetics.
Of particular interest is its application to phylogeography: the study of the processes controlling the geographic distributions of lineages by constructing the genealogies of populations and genes. See examples of lineages in Slide 26, Class II; phylogeographic reconstruction in Slides 1-3, Class III.
http://www.cbmpan.gdynia.pl/Images/hex0.gif
http://en.wikipedia.org/wiki/Mitochondrial_DNA
http://en.wikipedia.org/wiki/Phylogeography
image: human mtDNA http://jon.thackray.org/biochem/dna.html

Slide 24 The genetic code
Previously we saw that the information contained in the coding regions of DNA, either nuclear or mitochondrial, is carried by the processed messenger RNA to the cytoplasm, where it is translated into protein. How is this information coded? The mRNA is “read” from its 5’ end to its 3’ end by the tRNAs, three bases at a time. The two RNAs pair through complementary triplets, the codon in the mRNA and the anticodon in the tRNA, held together by non-covalent hydrogen bonds. After one “block” or unit of information is read, the translation machinery moves one step along, reads the following codon, and adds the corresponding amino acid to the growing protein chain. Remember that the mRNA is copied from the antisense strand of DNA. Therefore, the codons are read along the mRNA in the same direction in which they appear in the sense strand of the DNA, from the 5’ to the 3’ end. When we calculate all the possible arrangements of 4 nucleotides taken 3 at a time (4 x 4 x 4), we obtain a total of 64 different possibilities. However, there are only 20 amino acids to code for. As a result, most amino acids are coded by more than one codon, and for this reason the genetic code is said to be REDUNDANT. In addition, some codons give START or STOP signals to the translation process. For instance, AUG is the START codon and defines the reading frame of a string of nucleotides. The table in the figure shows the code for nuclear vertebrate DNA. We can see that some amino acids are coded by four codons and others by two.
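The counting argument above can be checked with a short Python sketch. It is only an illustration, not part of the original slides: it assumes the standard (nuclear) genetic code written as a 64-letter string in UCAG order, with "*" marking STOP codons, and the names CODON_TABLE and third_position_degeneracy are our own.

```python
from collections import Counter
from itertools import product

# Assumption: standard nuclear genetic code, codons enumerated in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a STOP codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet to its one-letter amino acid (or "*").
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# How many codons code for each amino acid (redundancy of the code).
codons_per_aa = Counter(CODON_TABLE.values())


def third_position_degeneracy(codon):
    """Count the bases at the 3rd position that give the same amino acid."""
    target = CODON_TABLE[codon]
    return sum(1 for base in BASES if CODON_TABLE[codon[:2] + base] == target)


if __name__ == "__main__":
    print("total codons:", len(CODON_TABLE))
    print("amino acids:", len(set(AMINO) - {"*"}))
    for aa, n in sorted(codons_per_aa.items()):
        print(aa, n)
    print("GCU (Ala) third-position degeneracy:", third_position_degeneracy("GCU"))
    print("CAA (Gln) third-position degeneracy:", third_position_degeneracy("CAA"))
```

Under these assumptions, alanine (GCN) comes out as a four-fold degenerate family and glutamine (CAA, CAG) as a two-fold one. The mitochondrial code differs from this table in a few codons, so the counts would change slightly for mtDNA genes.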
A codon is said to be “four-fold degenerate” if any nucleotide at the 3rd position determines the same amino acid. Similarly, a “two-fold degenerate” codon determines the same amino acid when the 3rd position is occupied by any purine or by any pyrimidine. The genetic code is also tolerant to mutations that do not affect the properties of the amino acid. For instance, NUN (N stands for any nucleotide) tends to code for hydrophobic amino acids. 0-fold degenerate sites are those at which any change determines an amino acid change. The redundancy makes the genetic code more tolerant to mutations at the third position. Changes in the coding region that do not affect the resulting protein are called SILENT MUTATIONS and constitute the foundation of the Neutral Theory of Evolution. A few differences exist in the code between eukaryotes and prokaryotes, and the mtDNA code is more closely related to that of prokaryotes.
genetic code http://employees.csbsju.edu/hjakubowski/classes/ch331/dna/trnacode.gif
http://en.wikipedia.org/wiki/Genetic_code

Slide 25 Examples genetic code
This is an example of the alignment of a short fragment of DNA coding for the mitochondrial gene COI (cytochrome oxidase I) in several krill species of the genus Euphausia. Highlighted in yellow are the codons that present silent mutations; none of the observed differences across species at these codons determines an amino acid change. The nonsynonymous substitutions are shown in red. In the second amino acid position of the alignment we see an example of amino acid replacement. The mutation that gave origin to the amino acid change from glycine to cysteine in Euphausia gibboides is in the 1st position of the codon. Alanine occupies the 3rd position in the amino acid chain. This is a case of 4-fold degeneracy of the code: we can see the presence of either A, T, C or G in the third position of the corresponding codon in different krill species. At the end of the alignment we can see several amino acid replacements.
Histidine is replaced by glutamine, two polar amino acids. The codons for these two amino acids are 2-fold degenerate. We can see that a transversion at the 3rd position changing A to C (or vice versa) between Euphausia tenera and Euphausia tricantha still codes for glutamine.

Slide 26 Mutations
The concept of mutation can be simply expressed as a change in the heritable material. The persistence of these new variants in the population or species ultimately depends on other factors: on their adaptive value and, when they confer no adaptive advantage, on external factors like selection, genetic drift and other stochastic processes. The changes of greatest evolutionary significance are point mutations, duplications, chromosomal rearrangements and polyploidization.

Slide 27 Mutations within loci
Variation within loci gives origin to new alleles. Although this type of mutation is more commonly known as “point mutation”, there are other mechanisms that also generate new alleles: insertions and deletions. We previously saw point mutations in protein-coding regions. Changes in a single nucleotide are called synonymous substitutions when they are silent and non-synonymous substitutions when they determine an amino acid change. Insertions and deletions in coding regions also occur, but unless the change involves three nucleotides (or a multiple of three), the resulting shift in the reading frame usually makes the allelic variant deleterious for the organism. Deletions and insertions are the mutations that generate allelic variation at microsatellite and minisatellite loci. Changes in RNAs can cause modifications in their secondary structure.

Slide 28 Point mutations
We briefly explained before the constitution of the DNA and the organization of the complementary bases. Mutations do not occur entirely at random.
Some changes are more frequent than others: changes to another nucleotide of the same type (purine to purine, or pyrimidine to pyrimidine) are more frequent than changes that involve a change of nucleotide type (purine to pyrimidine or vice versa). Changes of the first type are called transitions, whereas changes of the second type are called transversions.

Slide 29 Insertions – deletions
We previously saw a few examples of point mutations in protein-coding regions. This slide shows the effect of insertions and deletions in a protein-coding region. The example shows an insertion and the resulting shift in the reading frame of the corresponding protein. In this case, not only are new amino acids incorporated into the protein, but the translation process also terminates at an earlier point. Changes of this type can render the resulting product non-functional and therefore deleterious for the organism.

Slide 30 Changes in ITS1
This is an example of an insertion (or deletion) in the ITS-1 region of two individuals of the krill species Euphausia recurva. We see that a single insertion (or deletion) of adenine favors or prevents, respectively, the formation of a small stem 2 bases long. The absence of this A therefore determines the formation of a bigger loop. The function of this secondary structure is unknown, and we cannot ascertain whether it confers any adaptive value to the organism. Since this polymorphism is frequent in this species, we assume it does not. We have already seen examples of gene duplication in Slide 17. We will now see other major changes in the genome at the chromosomal level.

Slide 31
There are changes that also affect the structural organization of the genome at the chromosomal level. Closely related species sometimes show different numbers of chromosomes, and a few chromosomes that differ in size and shape.
This is the result of major processes of chromosomal rearrangement that produce breakpoints, fusion, fission, inversion of fragments within the same chromosome, and translocation of fragments between non-homologous chromosomes.
Animations (not included): http://www.emunix.emich.edu/~rwinning/genetics/variat2a.htm

Slide 32 Ploidy
Ploidy is the number of single sets of chromosomes in a cell or organism. For instance, we are diploid organisms, as we carry 2 sets of homologous chromosomes. Our gametes, however, are haploid cells, as they carry only 1 set of chromosomes. Cells within the same organism can have different ploidy: our liver cells are octaploid. Cells or organisms that carry more than 2 sets of chromosomes are known as polyploids (triploids when they have 3 sets, tetraploids when they have 4 sets). Evolutionary changes in the number of chromosome sets are very frequent among plants, and can give origin to a different species after cross-species fertilization, simply by nondisjunction of chromosomes in the first meiotic division. As a result, the second meiotic division separates the sister chromatids of all existing chromosomes. Therefore, the new species will automatically carry all chromosomes, and hence all the genetic information, from the parental species. Examples of tetraploids are Pelargonium, maize, cotton, cabbage and leek. Examples of hexaploids: wheat, oat. Examples of octaploids: strawberry, sugar cane.
http://en.wikipedia.org/wiki/Polyploid#Polyploid_crops

Slide 33 Mutation rates
The mutation rate is the number of mutation events per unit of time. The unit of time is measured differently depending on the objective of the study: we can consider generations, number of cell divisions, etc. In the initial studies of mutants, the rate used to be expressed as the number of mutations per million gametes, and it was obtained by counting the mutant individuals resulting from controlled crosses.
The rate at which new variants arise in a genome depends on the type of DNA and genome. This rate varies greatly across species, among genes within the same species, and between the nuclear and the mitochondrial genome. In humans, for instance, the incidence (or frequency per gamete) of mutations causing achondroplasia (dwarfism) is 4-12 x 10^-5, and of those causing hemophilia A, 2-4 x 10^-5. In an evolutionary scenario we are interested in expressing mutation rates in other units, like mutations per generation. For coding regions, the average rates per generation are approximately 10^-9 to 10^-8 per base pair (or site), 10^-6 to 10^-5 per gene, and 0.02 to 1 per genome. Microsatellites show a mutation rate that is 10 to 100 times higher. The rate for the mitochondrial DNA control region in humans is approximately 0.011. From the figures in this slide we can see that the different mutation rates in different regions of the genome allow us to infer evolutionary processes at different levels of organization. For instance, the mitochondrial control region and microsatellites are very effective in the inference of historical and demographic scenarios, whereas the more conserved nuclear protein-coding genes would be uninformative at the population level and are used, instead, in the inference of phylogenetic relationships between species.
http://www.zoology.ubc.ca/~whitlock/bio434/LectureNotes/06.Mutation/Mutation.html
http://biology.clc.uc.edu/courses/bio104/dna.htm
mt mutation rate http://hammerlab.biosci.arizona.edu/Publications/BonneTamir_2003.pdf
Reference: Am J Hum Genet. 2000 May;66(5):1599-609. Epub 2000 Apr 7.

Slide 34 Molecular clocks
Tightly linked to the concept of substitution or mutation rate is the idea of “molecular clocks”. This concept implies that the rate at which mutations occur and are retained is constant over evolutionary time; in other words, the mutations “tick” at regularly spaced intervals.
Although this is not absolutely true, as mutation rates vary along and between lineages, it can give an approximate idea of speciation times. For instance, it is widely accepted that the general rate for mtDNA genes is a change of 2% per million years.
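As a rough illustration of how such a clock is applied, the Python sketch below converts the proportion of differing sites between two aligned sequences into an approximate divergence time. It rests on assumptions that are ours, not the slides': the 2% per million years is taken as the pairwise sequence divergence between two lineages, the rate is treated as strictly constant, and no correction is made for multiple substitutions at the same site. The function names and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def divergence_time_myr(distance, rate_percent_per_myr=2.0):
    """Approximate time since divergence, in millions of years.

    Assumes a strict clock in which pairwise divergence accumulates at
    rate_percent_per_myr percent per million years (2% for mtDNA genes,
    as quoted in the text), with no correction for multiple hits.
    """
    return (distance * 100.0) / rate_percent_per_myr


if __name__ == "__main__":
    # Made-up aligned fragments differing at 1 site out of 10.
    species_1 = "ACGTACGTAC"
    species_2 = "ACGTACGTAT"
    d = p_distance(species_1, species_2)
    print("p-distance:", d)
    print("approximate divergence time (Myr):", divergence_time_myr(d))
```

Because rates vary along and between lineages, a figure obtained this way is only a first approximation, and real fragments are far longer than the ten sites used here.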