GENOMES & GENOME EVOLUTION
Genomes and Genome Evolution - BIOL 4301/6301
What to expect and some suggestions.
– I like to think of myself as fair but reasonably tough
• I want people to do well but I’m not willing to compromise on the material or ethical
guidelines to make it happen.
– There is no extra credit. This is non-negotiable.
• Study for the exams and do well on them.
– Ask questions IN CLASS
• Makes things more interesting for me
• Others probably have the same question
• You’re paying, get your money’s worth
• Interaction with other humans tends to wake people up
• Office hours!!!!!!!!!! I have them. Take advantage.
– I am an evolutionary biologist. This class is taught from an
evolutionary perspective.
Genomes and Genome Evolution - BIOL 4301/6301
What to expect and some suggestions.
– Absorb and critique anything related to the subject. This includes but is not restricted to:
– Popular news articles, TV shows (CSI, Bones, etc.), textbooks, Wikipedia, etc.
– Genomics is everywhere. Bring in what you find for discussion.
– Website - http://www.myweb.ttu.edu/daray/Teaching.htm
– Username & password
– Again, ask questions during class
– Ask questions DURING CLASS
– Did I mention that you should ask questions during class?
– You WILL see pictures of my adorable children. This is also non-negotiable.
Course Objectives and Assumptions
• Objectives: By the end of this course you should be able
to…
• describe the methods and principles of modern
genome analysis
• describe the components and structure of viral,
prokaryotic and eukaryotic genomes
• explain the basic techniques of genome sequencing
and analysis
• describe the way genomes change over time
• apply principles of genomics to modern biological
questions
• explain the outcomes of a variety of genome projects
Course Objectives and Assumptions
• Assumptions: I am assuming that you…
• have a working knowledge of Mendelian genetics
• have a working knowledge of DNA, RNA and proteins
• understand the basic differences between eukaryotes
and prokaryotes
• have a basic understanding of the concept of a gene
• have a working knowledge of the ‘central dogma’ of
Biology
• give a rat’s behind about learning this stuff
• Have considered enrolling in Bioinformatics. While
not required, it would be a good idea to take Caleb
Phillips’ course concurrently
“No course should ever be taught the first time”
UNIT 1
FUNDAMENTAL CONCEPTS
The biggest failure of science education is…
• Most people can’t discriminate between what is
scientific and what is not scientific.
• This is due, in part, to the fact that definitions of
science tend to be fairly nebulous.
• Moreover, any moron can get a Ph.D.
Science
• A method for discovering how the world around us works
• Assumes that all things can be explained by natural
processes
• Does not allow supernatural explanations
• Why?
• Rooted in hypothesis formation, observation, testing, and
constant re-examination of evidence
• Hypotheses MUST be abandoned if they are not supported
by evidence
• The scientific community is intensely critical of its own ideas
and the ideas of others. The advantage of this isn’t that
mistakes aren’t made, it’s that this method pretty much
guarantees that mistakes are caught quickly.
Science
• Step 1. Propose as many ideas as you can think of to
explain a phenomenon then pick one or several.
• Step 2. Try to disprove it/them.
• Step 3. Allow others to try and disprove it/them.
• Basic philosophy - Ideas that survive this process are more
likely to reflect the real world than ideas that don’t.
Other belief systems
Religion
• A way of “knowing” that is not rooted in scientific
principles, but rather is based upon alternate
philosophies, mythologies, etc. Most religions
have some supernatural aspects. Many religions
are opposed to critical inquiry of the beliefs
professed.
Pseudoscience
• Any non-scientific belief system that uses
scientific jargon in an attempt to give it scientific
credence. Again, criticism of the concepts is
often discouraged.
Ways of thinking
• Of these ways of thinking, science is the “new kid on the
block”
• Science is a relatively new invention (arguably only a few
hundred years old, if that)
• But think of all the progress that’s been made in those few hundred
years because of scientific thinking
UNIT 1
FUNDAMENTAL BIOLOGICAL
CONCEPTS
Genome
• Definition depends upon organism, organelle, or virus one is talking about
• Generic definition: Minimum DNA complement that defines an organism/organelle/virus
• Organelles are not, in and of themselves, living creatures. Thus something can have a genome and not be “alive.”
• Viruses may or may not be alive, depending upon how one defines life
• The dead have genomes too.
Things with genomes
• Prokaryotes
• Monera (bacteria)
• Archaea
• Mitochondria
• Chloroplasts
• Viruses
• Eukaryotes
• Animals
• Plants
• Fungi
• Protists
Things without genomes
• Dirt
• Rocks
• Water
• Air
• Fire
• But even these things
may be contaminated
with genomic DNA
(…well, maybe not fire)
What genomes can and can’t do
• A genome constrains but does not dictate the
features of an organism
• Environmental impacts
• Toxins, exercise, exposure to disease
• Epigenetic impacts
• If someone were to clone Hitler…?
Genomics
• Study of genomes?
• Research in which robotics, automated
sequencing, and advanced computational
methods are utilized to rapidly and efficiently
characterize genomes and their components
The Central Dogma
• DNA → RNA → Protein
• Generally unidirectional
Nucleic Acids
• Ribonucleic acid (RNA) and
deoxyribonucleic acid (DNA)
• Composed of chains of
nucleotides (ribonucleotides
for RNA, deoxyribonucleotides
for DNA)
Nucleic Acids
• Deoxyribonucleic acid
• A polymer of nucleotides
linked by phosphodiester
bonds
Nucleic Acids
– Purine vs. pyrimidine
– Carbon positions
Nucleic Acids
• Deoxyribonucleic acid
• Antiparallel strands held
together by hydrogen bonds
• Strands are complementary
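The antiparallel, complementary relationship between the two strands can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_complement` and the example sequence are made up for this sketch and are not part of the course material.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    The strands are antiparallel, so the partner is the complement
    of the input read in the opposite direction.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    top = "ATGGCC"  # hypothetical sequence, written 5'->3'
    print(top, "pairs with", reverse_complement(top))
```

Applying the function twice returns the original strand, which is one quick way to check that the pairing rules are symmetric.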
DNA in 3D
Scanning-tunneling
electron micrograph
Pretty uncanny resemblance, don’t you think?
Nucleic Acids
• Deoxyribonucleic acid can denature, renature &
hybridize
• Denaturation – separation of the double helix by the
addition of heat or chemicals
• Renaturation – the reformation of double stranded DNA
from denatured DNA
• The rate at which a particular sequence will reassociate
is proportional to the number of times it is found in the
genome
• Given enough time, nearly all of the DNA in a heat
denatured DNA sample will renature.
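The link between copy number and renaturation rate can be illustrated with the standard second-order reassociation (C0t) model, in which the fraction of DNA still single stranded is 1 / (1 + k * C0t). The sketch below assumes the rate constant k scales linearly with copy number. The value of `base_k` is invented for illustration and is not a measured quantity.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def half_c0t(k: float) -> float:
    """C0t value at which half of the sequence has renatured."""
    return 1.0 / k


if __name__ == "__main__":
    base_k = 0.001  # hypothetical rate constant for a single-copy sequence
    for copies in (1, 100, 10_000):
        k = base_k * copies  # assumption: rate scales with copy number
        print(copies, half_c0t(k), fraction_single_stranded(10.0, k))
```

In this model, highly repeated sequences reach their half-renaturation point at a much lower C0t than single-copy sequences.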
Nucleic Acids
• Ribonucleic acid
• Ribose vs. deoxyribose
• Thymine = 5-methyluracil
• Usually single stranded
Nucleic Acids
• Intramolecular basepairing
• Enhanced base-pairing
capacity due to G:U bonding
• Hairpins
• Bulges
• Loops
• Stem-loop structures
• Pseudoknots
Nucleic Acids
• Complex tertiary
structures
• Much more flexible than
DNA
• Capable of base triples and
base-backbone interactions
• Often ‘molded’ by proteins
and snoRNPs
• Leads to complex 3°
structures with catalytic
capability - ribozymes
Nucleic Acids
[Figure: DNA vs. RNA backbone structure; labels NB, P, C, O, OH]
RNA World
• RNAs can have complex 3D structures
• They can store genetic information
• Some RNAs known as ribozymes can catalyze reactions
• Thus it has been hypothesized that life may have arisen first through
RNA with protein and DNA being integrated later
Replication
• DNA is replicated in a semi-conservative
fashion, i.e., each daughter molecule is
composed of one strand of the original
molecule and one newly synthesized
strand.
• DNA polymerase is the enzyme that
catalyzes synthesis of new strands out of
dNTPs.
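Semi-conservative replication can be sketched as a small bookkeeping exercise: each parental strand is kept and paired with a freshly synthesized partner. The `Strand` type, the "parental"/"new" labels, and the example sequence are assumptions made for this illustration only.

```python
from typing import NamedTuple

PAIR = str.maketrans("ACGT", "TGCA")


class Strand(NamedTuple):
    seq: str     # written 5'->3'
    origin: str  # "parental" or "new"


def synthesize_partner(template: Strand) -> Strand:
    """Build the complementary strand (5'->3') and label it as new."""
    return Strand(template.seq.translate(PAIR)[::-1], "new")


def replicate(duplex):
    """Return two daughter duplexes, each with one old and one new strand."""
    first, second = duplex
    return [
        (first, synthesize_partner(first)),
        (second, synthesize_partner(second)),
    ]


if __name__ == "__main__":
    parent = (Strand("ATGGCC", "parental"), Strand("GGCCAT", "parental"))
    for daughter in replicate(parent):
        print(daughter)
```

Each daughter duplex printed above contains exactly one "parental" and one "new" strand, which is the defining feature of the semi-conservative model.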
Replication: Key points
• DNA polymerase cannot generate a new strand without a 3’ OH on which to add a nucleotide.
• Primers are required.
• New strands generated from 5’ to 3’.
• Replication is bidirectional. Replication forks proceed from an initiation site in both directions.
• Multiple sites of initiation are found along a chromosome. Initiation sites are often AT rich as AT base pairs are less stable and thus come apart more easily.
• Okazaki fragments are generated along the lagging strand.
http://www.johnkyrk.com/DNAreplication.html
http://www.dnalc.org/resources/3d/04-mechanismof-replication-advanced.html
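Since initiation sites are often AT rich, a sliding-window scan of AT content gives a rough feel for how such regions stand out. This is a toy sketch: the window size, threshold, and test sequence are arbitrary choices, and real origins are not identified from AT content alone.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq: str, window: int = 10, threshold: float = 0.8):
    """Return (start, fraction) for each window at or above the threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: an AT-rich block flanked by GC-rich DNA.
    seq = "GCGCGCGCGC" + "ATATATATAT" + "GCGCGCGCGC"
    print(at_rich_windows(seq, window=10, threshold=0.8))
```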
RNA
• Normally single-stranded
• Generated from NTPs by RNA
polymerase using DNA as a
template (transcription)
• As with DNA replication, new strand
assembled in 5’ to 3’ direction by
phosphodiester bond formation
• RNA is inherently less stable than
DNA
Major types of RNA
• Messenger RNA (mRNA) – carries genetic
instructions (coded in DNA) from the
nucleus into the cytoplasm. mRNA
molecules are often called transcripts.
• Ribosomal RNA (rRNA) – a structural
component of ribosomes (the complexes
that are involved in assembling proteins
based upon information in mRNA
templates)
• Transfer RNA (tRNA) – acts as carrier of
amino acids during protein assembly
• Regulatory RNAs – Many groups; miRNAs,
siRNAs, CRISPR RNAs, antisense RNAs,
long non-coding RNAs
Transcription
• Generation of an RNA strand from a
DNA template
• Much of the control over cell
development comes at the
transcriptional level – All somatic
cells have same DNA but can differ
tremendously in morphology and
function
• Differential gene expression
Transcription: Key points
• Transcription starts at the promoter, a site along the DNA molecule where RNA polymerase binds.
• RNA polymerase is recruited to the promoter by transcription factors.
• New strand generated from 5’ to 3’.
• Only one of the two DNA strands serves as a template (antisense strand). The other strand (sense strand) has the same sequence as the mRNA molecule except dTMPs have been substituted with UMPs.
• Which strand is used as a template differs between genes.
• After transcription, mRNA undergoes post-transcriptional modifications. Generally, a methyl-guanosine cap is added to the 5’ end and a tail of adenosine nucleotides (poly-A tail) is added to the 3’ end.
• In eukaryotes, the mRNA undergoes post-transcriptional splicing – introns are removed and exons are spliced together.
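The sense/antisense relationship described above can be checked with a short sketch: transcribing from the template (antisense) strand should give an mRNA that matches the sense strand with U in place of T. The function names and the example sequence are illustrative assumptions. Capping, polyadenylation, and splicing are not modeled.

```python
DNA_PAIR = str.maketrans("ACGT", "TGCA")
# Template DNA base -> RNA base added opposite it.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(sense: str) -> str:
    """Antisense (template) strand of a sense strand, written 5'->3'."""
    return sense.upper().translate(DNA_PAIR)[::-1]


def transcribe(template: str) -> str:
    """Build the mRNA 5'->3' by reading the template strand 3'->5'."""
    return "".join(RNA_PAIR[base] for base in reversed(template.upper()))


if __name__ == "__main__":
    sense = "ATGGCCTAA"  # hypothetical sense strand, 5'->3'
    mrna = transcribe(template_strand(sense))
    print(mrna, mrna == sense.replace("T", "U"))
```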
Transcription models
• http://www.johnkyrk.com/DNAtranscription.html
• http://www.dnalc.org/resources/3d/13-transcription-advanced.html
A few definitions
• Precursor mRNA (pre-mRNA) or heterogeneous nuclear RNA
(hnRNA): mRNA immediately after transcription and before post-transcriptional modification
• Mature mRNA (or simply mRNA): Transcript after post-transcriptional
modifications.
• cDNA (complementary DNA): A DNA molecule generated in a reaction
catalyzed by reverse transcriptase using mature mRNA as the
template.
rRNA
• Associated with proteins to form
ribosomes
• Several different rRNAs
• Genes that code for rRNA are
typically referred to as rDNA
sequences
• rDNA sequences found in more or
less tandem repeats in genome
tRNA
• tRNA molecules deliver amino acids to ribosomes during protein synthesis (translation)
• tRNAs have considerable secondary structure due to base pairing
• Clover leaf 2D structure
• L-shaped 3D structure
• There are more than 20 tRNAs (i.e., there is some redundancy)
• tRNA structure is highly conserved (e.g., human tRNAs can function in yeast)
• http://www.myweb.ttu.edu/daray/Genomes/ribosome/ribosome/ribosome_jmol_play.html
Amino acids
• Proteins are made of chains of amino acids
• There are 20 amino acids utilized by biological systems
• Each codon in mRNA represents an amino acid or a start/stop signal
• Amino acids can be acidic (net negative charge), basic (positive charge), uncharged polar (ends carry different partial charges), and non-polar.
• Uncharged polar, acidic, and basic amino acids tend to be hydrophilic and thus are often found on the outside of proteins.
• Non-polar amino acids tend to be hydrophobic and thus are clustered in the middle of proteins.
Genetic code
Formation of a peptide bond
• At physiological pH (7.0), both the amino
and carboxyl groups are ionized.
• The peptidyl transferase ribozyme
catalyzes the formation of peptide bonds
with the concomitant release of a water
molecule.
Translation
• Construction of an amino acid chain (protein) by a
ribosome based upon the nucleotide sequence of a
mRNA molecule
• While there are minor differences between eukaryotic and
prokaryotic translation processes, most steps in
translation are well conserved.
http://www.johnkyrk.com/DNAtranslation.html
Spatial separation of transcription and translation
is seen in eukaryotes, not prokaryotes
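Translation can be sketched as a codon-by-codon lookup in the standard genetic code. The sketch below builds the 64-codon table from the conventional U, C, A, G ordering, starts at the first AUG, and stops at the first in-frame stop codon. It is a simplification: ribosomes, tRNAs, and initiation context are ignored, input is assumed to contain only A, C, G, and U, and the example mRNA is made up.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCCUUUUAAGG"))  # hypothetical mRNA
```

Because 61 codons specify only 20 amino acids, several codons map to the same letter in `CODON_TABLE`, which is the redundancy of the code.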
What is a gene?
• How do we identify a gene?
• A priori methods –
• Recognize sequence patterns within expressed genes and the regions flanking them
• Distinctive patterns of codon statistics (most obviously, a reduced frequency of stop codons)
• Proximity of start codon and known promoter sites
• GT/AG pairs at intron boundaries
• Codon usage statistics can be ‘typical’ of genes in an organism
• Use a set of known genes to identify regions with similar codon usage stats
• ‘Been there, seen that’ methods –
• Recognize regions corresponding to previously characterized genes.
• The changing definition of a ‘gene’
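The "reduced frequency of stop codons" idea behind a priori gene finding can be sketched as a simple open reading frame (ORF) scan: look for an ATG followed by a run of in-frame codons with no stop. This is a minimal sketch on one strand only, with an arbitrary `min_codons` cutoff and a made-up sequence. Real gene finders also use promoters, splice signals, and codon usage statistics.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) for ATG...stop ORFs on the given strand.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCGGGTAGCC"))  # hypothetical sequence
```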
The structure of a typical coding gene
Genes vs. alleles vs. loci
• Gene: “Region of DNA that controls a discrete hereditary
characteristic, often (but not always) corresponding to a
single protein or RNA. This definition includes the entire
functional unit, encompassing coding DNA sequences,
non-coding regulatory DNA sequences, and introns.”
• Allele: “One of a set of alternative forms of a gene.”
• Locus: “The position of a gene on a chromosome.
Different alleles of the same gene all occupy the same
locus.”
• Definitions from Alberts et al. (1994)
Recombination
• Protein-mediated (1) exchange of a DNA
region between two different DNA molecules
OR (2) replacement of a DNA region in one
molecule by DNA from another
• Almost always requires at least some
homology between sequences involved
Recombination
• Non-homologous recombination
• Duplication/deletion
Recombination
• Gene Conversion
• Non-crossover recombination –
replacement of one allele with an
alternative
• Function and impacts
• Regulation of gene expression
• Homogenization of genome
sequence
• 21-hydroxylase – 95% of
pathogenic mutations arise by
gene conversion of neighboring
pseudogene
Expression patterns
• There are ~23,000 protein coding human genes, which
can give rise to a minimal protein set
• No single cell needs to express all of those proteins
• Ex. - Lac operon in bacteria, insulin in humans
• Or may need alternate versions of them
• Alternative splicing
• The amount of a protein must also be regulated
• Overexpression of a single gene rarely causes disease but,
• Lack of expression of a single gene can cause major problems
Expression patterns
Xeroderma pigmentosum (XP)
7 distinct types, all caused by a deficient nucleotide excision repair (NER) system
Extreme sensitivity to sunlight, high incidence of skin cancer
Creams containing DNA repair enzymes help
Transcriptional regulation
• Most regulation takes place at the transcription level
• Simple in prokaryotes - Repressors, activators, the lac
operon
Transcriptional regulation
• The lac operon
• Leaky control of lacZ
• Allolactose version of lactose actually metabolized
• Allolactose acts as a ligand that turns on transcription (deactivates repressor)
• Lactose converted to allolactose using β-galactosidase
• How, if lacZ turned off?
• Leaky genes
• Every once in a while, RNA pol slips into place on the promoter in place of repressor
• Constitutive low level expression
Protein activity regulation
• Protein turnover
• Chemical modification
• Inhibition
• Allostery
Transcriptional regulation
• The lac operon regulation
• Lactose+, glucose- environment
• Allolactose acts as the ligand that turns on transcription
• Allolactose binds to lac repressor to allosterically disable binding to operator
• cAMP levels in cell inversely related to glucose levels
• Low glucose = high cAMP
• cAMP allosterically activates CAP
+lactose → +allolactose → allosteric binding to lac repressor → +lacZ expression → +lactose metabolism
-glucose → +cAMP → allosteric binding to CAP → +lacZ expression → +lactose metabolism
Transcriptional regulation
• The lac operon regulation
• Lactose-, glucose+ environment
-lactose → -allolactose → no allosteric binding to lac repressor → lacZ repression → -lactose metabolism
+glucose → -cAMP → no allosteric binding to CAP → -lacZ expression → -lactose metabolism
Transcriptional regulation
• The lac operon regulation
• Lactose+, glucose+ environment
• Repressor is inactivated, but expression is not increased by CAP
+lactose → +allolactose → allosteric binding to lac repressor → +lacZ expression → +lactose metabolism
+glucose → -cAMP → no allosteric binding to CAP → no activation of CAP → no increased lacZ expression (basal metabolism)
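The three lac operon environments can be summarized as a toy Boolean model. The expression labels ("high", "basal", "repressed (leaky)") and the function name are choices made for this sketch. The lactose-, glucose- case is not covered in the slides and is treated here as repressed on the assumption that the repressor stays bound whenever allolactose is absent.

```python
def lac_operon_state(lactose: bool, glucose: bool) -> dict:
    """Toy Boolean model of lac operon control (labels are illustrative)."""
    allolactose = lactose                    # some lactose becomes allolactose
    repressor_on_operator = not allolactose  # allolactose disables operator binding
    camp_high = not glucose                  # cAMP is inversely related to glucose
    cap_active = camp_high                   # cAMP allosterically activates CAP

    if repressor_on_operator:
        expression = "repressed (leaky)"     # only rare, low-level transcription
    elif cap_active:
        expression = "high"
    else:
        expression = "basal"
    return {
        "repressor_on_operator": repressor_on_operator,
        "cap_active": cap_active,
        "lacZ_expression": expression,
    }


if __name__ == "__main__":
    for lactose in (True, False):
        for glucose in (True, False):
            print(lactose, glucose, lac_operon_state(lactose, glucose))
```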
Transcriptional regulation
• Most regulation takes place at the transcription level
• Much more complex in eukaryotes
Transcriptional regulation
• Most regulation takes place at the transcription level
• Increased complexity in eukaryotes - β-globin regulation
Transcriptional regulation
• Gene silencing
• Imprinting – selective expression of one parental allele
• Neighboring genes, Igf2 and H19, are on and off depending on parental source
• Igf2 – Insulin-like growth factor 2
• Highly active during fetal development
• H19 – a non-coding RNA
• May act as a tumor suppressor
• What is involved in this regulation?
• Downstream enhancer
• CTCF – regulatory protein
• ICR – imprinting control region
Transcriptional regulation
• Gene silencing
• Activators bound to enhancer could potentially activate both genes
• Maternal chromosome is unmethylated in this region
• Lack of methylation allows binding of CTCF to ICR
• CTCF blocks activation of Igf2
• … allows activation of H19
• Paternal chromosome is methylated in this region
• Methylation blocks binding of CTCF to ICR
• … blocks activation of H19 via MeCP2
Transcriptional regulation
• Gene silencing
• Beckwith-Wiedemann syndrome (BWS)
• ~1/15,000 births
• Increased risk of cancer (Wilms’ tumor)
• Hemihypertrophy
• Improper imprinting
• Biallelic expression of Igf2
• No expression of H19
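The Igf2/H19 imprinting logic reduces to a single switch: whether the ICR is methylated. The sketch below encodes that switch for one allele at a time. The function name and dictionary keys are illustrative, and modeling the improper imprinting in BWS as two paternal-like (methylated) alleles is a simplifying assumption.

```python
def imprinted_expression(icr_methylated: bool) -> dict:
    """Expression of Igf2 and H19 from a single allele (toy model)."""
    ctcf_bound = not icr_methylated  # methylation blocks CTCF binding to the ICR
    return {
        "CTCF_bound_to_ICR": ctcf_bound,
        "Igf2": not ctcf_bound,      # bound CTCF blocks activation of Igf2
        "H19": ctcf_bound,           # methylation silences H19
    }


if __name__ == "__main__":
    maternal = imprinted_expression(icr_methylated=False)
    paternal = imprinted_expression(icr_methylated=True)
    print("maternal:", maternal)
    print("paternal:", paternal)

    # Assumption: BWS-like improper imprinting as two methylated alleles.
    bws = [imprinted_expression(True), imprinted_expression(True)]
    print("Igf2 alleles on:", sum(a["Igf2"] for a in bws))
    print("H19 alleles on:", sum(a["H19"] for a in bws))
```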
Translational regulation
• RNA interference
• Ligand binding to Shine-Dalgarno
• RNA lifespan
• Alternative splicing
• tRNA availability/codon usage
Supplemental review
• Review material to brush up on these subjects is available
on the course website
• Structural tutorials
• Walkthroughs of DNA synthesis, DNA replication,
Transcription, Translation, Recombination, etc.