LECTURE #12
SNPs & LINKAGE: POSITIONAL CLONING
Mapping within a gene
• can recombination happen within a gene?
• in other words – do gene mutations change the whole gene at a single stroke?
• or does a mutation change only the specific part of the gene where the mutation is located?
• how do you locate these mutations?
• these questions are answered by mapping within a gene
Seymour Benzer
• used recombination analysis to show that two different mutations that don’t complement each other (i.e. make up for each other) are located in the same gene
• proposed that crossing over (XO) can also occur within a gene = Intragenic Recombination
• spent 10 years working on this project!!
Complementation Testing
• cross-overs between homologous chromosomes with different mutations can generate:
• 1. a wild-type allele AND
• 2. a recombinant allele with both mutations
• determined through Complementation Testing
T4 bacteriophage
• Benzer chose to work with the T4 bacteriophage and the gene called rII (rII locus)
• bacteriophage are viruses that infect bacteria
• T4 infects E. coli
• head – haploid DNA genome of 200,000 bp or 120 genes
• 17 genes are essential to DNA replication
• 48 genes required for production of tail and tail fibers
• remaining genes function in the life cycle in other ways
T4 Life Cycle
• infection – introduction of T4 genome into bacterial cell
• phage can only replicate in bacterial cells
• the bacterium stops making its own proteins and begins to make T4 proteins
• 30 minutes later – lysis and release of progeny T4 (300 per bacterial cell)
[Figure: T4 life cycle timeline with stages labeled at ~2, ~6, ~17 and ~30 minutes; 300 progeny per cell]
Advantages to using bacteriophage
• each phage can produce 100 – 1000 progeny every hour (depending on environmental conditions)
• easy to produce large numbers of progeny – allows for detection of rare genetic events
• certain conditions can allow for the proliferation of only recombinant phages and the death of parental phages
T4 Bacteriophage: Experimental Protocol
• mix bacteria and bacteriophage and pour onto an agar plate – create a lawn of bacteria and phage
• death of the bacteria because of the phage leaves a clear area on the plate called a plaque
• phage released from one bacterial cell diffuse away to infect and kill neighboring bacteria, producing the plaque
• a typical plaque = 1 x 10^6 to 1 x 10^7 viral progeny
Benzer and the T4 Complementation assay
• Benzer created T4 mutants that produced large plaques because of abnormally rapid lysis of bacterial host cells
• called these mutations ‘r’ for rapid
• many of these r mutations mapped to the rII locus in the phage genome
• phage with mutations – rII-
• wild type phage – rII+
• infected two strains of E. coli – B and K(λ)
• observed different phenotypes depending on the strain infected
• note that rII- phage cannot form plaques in K(λ) bacteria
Benzer and the T4 Complementation assay
• but before Benzer could perform his experiments on recombination – he had to confirm that when he infected E. coli – two T4 phages entered the host cell
• used a simple complementation test using two different types of T4 phage
• used these phage to identify two complementation groups (i.e. genes) in the rII locus:
• group rIIA – mutations in gene rIIA
• group rIIB – mutations in gene rIIB
• used the K(λ) bacteria because only a wild type rII gene can cause lysis in this bacterial strain
Benzer and the T4 Complementation assay
• reasoning: if the two mutations were in different genes – each phage would supply the working gene product that the other lacks (complementation) – lysis of K(λ) would result = plaques
Benzer and the T4 Complementation assay
• if they were in the same gene – no lysis would occur and no plaques would result
Can Recombination occur between two mutations in a single gene?
• having confirmed that he could use two T4 phages and get them both to infect a single bacterial cell
• used two T4 phages BUT these phages had different mutations IN THE SAME GENE
• called these rII- mutations rIIA1 and rIIA2
• devised a simple and elegant test based on his observation that two T4 genomes must enter a bacterial cell for complementation to happen between these two mutations
• used the two strains of E. coli called B and K(λ)
• rII- mutant bacteriophages cannot form plaques in the K(λ) strain
Can Recombination occur between two mutations in a single gene?
• REASONING: if recombination takes place within a gene – then it will generate a wild type allele (with no rII mutations) and a mutated allele with both rIIA1 and rIIA2 mutations
• those bacteriophage with the wild type allele can lyse K(λ) bacteria and produce plaques
• those bacteriophage with both rIIA mutations cannot lyse the bacteria or produce plaques
• the presence of plaques allowed Benzer to confirm that recombination can happen within a gene
Can Recombination occur between two mutations in a single gene?
• as a control: he infected K(λ) bacteria with only one of the two T4 mutant strains
• this showed that T4 with a mutated rIIA gene is unable to lyse K(λ) bacteria and produce plaques
Trans vs. Cis
• complementation test on T4 showed two things:
• 2 mutants were in two different genes = trans configuration
• 2 mutants on the same gene = cis configuration
• complementation tests are also known as a cis-trans test
• Benzer called any complementation group identified by this test = cistron
• often used synonymously with gene
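The grouping step of the complementation test can be sketched in Python. This is only an illustration: the mutant names and the results table are made up (not Benzer's data), and the sketch assumes every pair of mutants has been tested. Two mutants go into the same cistron when their co-infection gives no plaques on K(λ).

```python
def complementation_groups(mutants, fails_to_complement):
    """Group mutants into cistrons.

    fails_to_complement: set of frozensets {m1, m2} for pairs whose
    co-infection gave NO plaques on K(lambda), i.e. same gene.
    """
    groups = []
    for m in mutants:
        for g in groups:
            # m joins a group if it fails to complement any member of it
            if any(frozenset((m, other)) in fails_to_complement for other in g):
                g.append(m)
                break
        else:
            groups.append([m])  # m complements everything seen so far
    return groups

# Hypothetical mutants and test results
mutants = ["m1", "m2", "m3", "m4"]
no_plaques = {frozenset(("m1", "m2")), frozenset(("m3", "m4"))}
groups = complementation_groups(mutants, no_plaques)
print(groups)
```

With these made-up results the mutants fall into two groups, which would correspond to two genes such as rIIA and rIIB.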
Using deletions to map mutations on the same gene
• cross a bacteriophage carrying a point mutation with a bacteriophage carrying a deletion in that region
• if the point mutation is located in the same region as the deletion – no recombination is possible → no plaques
• if the point mutation lies outside the deletion – recombination can occur
Using deletions to map mutations on the same gene
• crosses between an uncharacterized mutation and a known deletion will reveal where this mutation is = Deletion Mapping
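The deletion-mapping rule (no plaques means the mutation lies inside the deletion, plaques mean it lies outside) can be written as a small Python sketch. The deletion names, coordinates and cross results are hypothetical, and positions are in arbitrary map units.

```python
def locate_mutation(deletions, gives_plaques):
    """Narrow down where a point mutation lies.

    deletions: dict name -> (start, end) of each known deletion
               (arbitrary map units, end exclusive).
    gives_plaques: dict name -> True if the cross with that deletion
               produced wild-type recombinants (plaques on K(lambda)).
    Returns the set of map positions consistent with all crosses.
    """
    lo = min(s for s, _ in deletions.values())
    hi = max(e for _, e in deletions.values())
    candidates = set(range(lo, hi))
    for name, (start, end) in deletions.items():
        covered = set(range(start, end))
        if gives_plaques[name]:
            candidates -= covered   # recombination possible: mutation is outside
        else:
            candidates &= covered   # no recombinants: mutation is inside
    return candidates

# Hypothetical overlapping deletions and cross results
deletions = {"del1": (0, 10), "del2": (5, 15), "del3": (8, 20)}
results = {"del1": False, "del2": False, "del3": True}
interval = locate_mutation(deletions, results)
print(sorted(interval))
```

Overlapping deletions are what make the method efficient: each cross cuts the candidate interval down further.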
Benzer’s Strategy for Fine Mapping
• fine mapping – genes are made up of discrete units arranged in a linear fashion in a small
portion of a chromosome
• his strategy:
• 1. co-infect mutants to define complementation groups (genes)
• determine relationships between deletions
• 2. co-infect each rII- mutation with deletions to group them
• to localize the point mutation
• 3. look for recombination between mutations in the same deletion region
• produces a “fine map” of a specific gene
Deletion Mapping of the rII locus
• Benzer divided the rII regions into a series of intervals
• assigned a mutation to an interval by looking to see if recombination occurred to give the wild type allele and infection/plaque formation
• mapped 1612 mutations and several deletions
• hot spots = regions that spontaneously mutate more frequently
Take a Break
NEXT TOPIC: LINKAGE ANALYSIS IN HUMANS – POSITIONAL CLONING
Detecting a mutation within a gene
• cystic fibrosis
• in vitro fertilization
• blastocyst – remove a cell
• PCR to amplify chromosome 7
• perform a Southern Blot
• run the PCR product on an agarose gel and transfer to a nitrocellulose membrane
• expose the membrane to 2 probes:
• 1. wild type sequence
• 2. mutant sequence
• alternatively – could PCR the region known to have the CFTR mutation
• but what if you don’t know exactly where the mutation is in a gene??
Genotyping
• genotyping = to detect which alleles are present in the donor cells
• geneticists use an array of molecular tools to detect DNA differences among individuals
• e.g. PCR amplification of regions of genome
• DNA genotyping can then predict the possibility of a disease
Strategies for Analyzing Genomes
• 1. Genotyping protocols
• DNA fingerprinting, PCR, microarrays
• 2. Positional cloning
• application of genotyping protocols to linkage analysis in an
organism
• 1. mapping
• 2. cloning
• 3. mutation screening
• 3. Haplotype associated studies
• mapping of a disease locus in humans
• relies upon the chromosome’s evolutionary history
Positional Cloning Key Concepts
• locating a gene to a location on a chromosome – Positional Cloning
• PC is the method of choice for identifying genetic mutations underlying diseases with simple Mendelian inheritance
• it is a method of gene identification in which a gene is identified only by its approximate chromosomal location
• the candidate region for the gene is initially ID’d by linkage analysis
• PC is then used to narrow the region
Positional Cloning Key Concepts
• how does linkage mapping work in positional cloning?
• remember linked alleles co-segregate
• co-segregation increases as map distance decreases
• the frequency of recombination is proportional to map distance – RF increases as map
distance increases
• strategy: pinpoint the gene’s location on a chromosome by determining which
nearby alleles segregate with it
• the more often they segregate together – the closer the gene is to that allele’s region on
the chromosome
• PC is very effective for locating disease genes with Mendelian inheritance
• e.g. muscular dystrophy, cystic fibrosis, Huntington’s disease
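The relationship between recombination frequency and map distance can be illustrated with a short Python sketch. The counts are hypothetical, and the conversion of 1% recombination to about 1 cM is only a reasonable approximation for short distances.

```python
def recombination_frequency(recombinants, total):
    """Fraction of offspring in which marker and disease allele separated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total

def map_distance_cM(rf):
    """Approximate map distance: 1% recombination ~ 1 cM (short distances only)."""
    return rf * 100

# Hypothetical counts for two markers scored in the same 200 meioses
rf_marker_a = recombination_frequency(4, 200)    # rarely separates from the gene
rf_marker_b = recombination_frequency(30, 200)   # separates more often
print(map_distance_cM(rf_marker_a), map_distance_cM(rf_marker_b))
```

The marker with the lower recombination frequency co-segregates with the gene more often, so it is the one that lies closer to it.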
Positional Cloning Key Concepts
• PROBLEM: finding non-disease alleles in humans that are single trait genes with
straightforward Mendelian inheritance
• not enough phenotypically evident single gene traits with which to analyze linkage
• most phenotypes in humans are caused by more than one gene trait
• PROBLEM: can’t do two and three-point test crosses in humans
• BUT the entire human genome has been sequenced
• can now map with respect to genetic variations that don’t cause a visible phenotype
• e.g. map with respect to a single nucleotide polymorphism or SNP
Positional Cloning in Humans
• in humans – link a gene to genetic variations known as
polymorphisms
Polymorphisms
• members of the same species show enormous DNA variation in their genomes
• 250 kb region of the CFTR gene
• 1 difference every 1000 bp
• 250 differences total
• of the 250kb – only 4.0 kb of this region (2%) is the exons that code for the CFTR protein
• mutations in these exons are negative mutations and are not usually passed on
• variations in the intron sequences are passed on – little to no effect on the health of the individual
Polymorphisms
• variations at a position in the genome can be considered as an alternate allele of a specific locus
• originally studied using breeding and mutation studies
• now studied molecularly
• two or more alleles at a specific locus = polymorphic locus
• variations = DNA polymorphisms
• for genotyping – there are 5 classes of DNA polymorphisms:
• 1. SNP – single nucleotide polymorphisms
• 2. microsatellite & minisatellite DNA – simple sequence repeats (SSRs) or short tandem repeats (STRs)
• 3. Insertion/Deletions or DIPs
• 4. Copy # variation (CNV)/Copy # polymorphisms - VNTRs
• 5. Complex Variants
Polymorphism          Size                  Frequency – 1 per…
SNP                   1 bp                  1 kb
DIP                   1 to 100 bp           10 kb
SSR/microsatellite    1 to 10 bp repeats    30 kb
Copy # variant        10 bp to 1 Mb         3 Mb
SNPs
• single nucleotide polymorphisms
• most prevalent type of polymorphism
• about 1/700 base pairs differ in the human genome
• arise from a mutation of a single base pair
• errors in DNA replication or from a mutagen
• still a very low mutation rate – 1 in 30 million bases
• can be located anywhere in the genome – inside and outside of genes
• bi-allelic – have two forms (maternal and paternal)
• over 50 million SNPs known to date
• 15 million of them are human
• database - http://www.ncbi.nlm.nih.gov/SNP/
• most have no phenotypic effect
SNPs
SNP Variations
• most SNP variations in humans are confined to a limited number of positions
• e.g. genomes of Craig Venter and James Watson and an anonymous donor were analyzed
for known human SNPs
• each of the three men has over one million unique SNPs
• 2.6 million SNPs were shared either by two or all three of them
• most were “silent” SNPs – no effect on genes
• 5000 SNPs had an effect on the amino acid sequence of a protein
SNP Variations
• chromosome 7 – 400 kb region (base pair 116,700,001 to 117,100,000)
• vertical lines represent the locations of the SNPs
• all SNPs = human data base of SNPs
• 3.3 million SNPs used to distinguish the two genomes
• 82% of known human SNPs (i.e. all SNPs) found in Venter
• 86% of known human SNPs found in Watson
• 20 kb block of SNPs found to be in common between the two men and the human database
Detecting SNPs
• How are SNPs detected?
• 1. Sequencing – expensive
• 2. Restriction Fragment Length Polymorphisms (RFLP)
• relies on the use of restriction enzymes that cut DNA at specific sequences
• Restriction Enzyme – isolated from bacteria
• used by bacteria to “cut-up” invading DNA – e.g. from bacteriophages
• recognizes a unique sequence
• the RE cuts the phosphodiester bond between two nucleotides within that sequence
• produces either “sticky” ends or “blunt” ends
RFLP Analysis
• several EcoRI sites located within the human genome
• cutting the genome gives a specific pattern of DNA fragments when run on an agarose gel
• if one EcoRI site has an SNP – the RE will no longer cut
• OR an SNP could create an extra EcoRI site in the genome
• run the results of the restriction enzyme digest on an agarose gel
• polymorphisms result in a unique banding pattern for each DNA sample
RFLP Analysis
• three vs. two restriction enzyme sites in a genome
• allele 1 – three sites → 4 fragments
• allele 2 has lost one site due to an SNP → 3 fragments
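The site-counting logic can be tried on toy sequences in Python. The two sequences below are invented for illustration; the only real-world fact used is that EcoRI recognizes GAATTC and cuts between the G and the first A.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

def digest(dna, site=ECORI_SITE, cut_offset=1):
    """Cut a linear DNA string at every occurrence of `site`.

    EcoRI cuts between G and AATTC, hence cut_offset=1.
    Returns the list of fragment lengths, left to right.
    """
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = dna.find(site, pos + 1)
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Toy sequences (hypothetical): allele 2 differs by one base in the middle site
allele_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT" + "GAATTC" + "GG"
allele_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT" + "GAATTC" + "GG"

print(digest(allele_1))  # three sites -> four fragments
print(digest(allele_2))  # one site lost to the SNP -> three fragments
```

On a gel, the two middle fragments of allele 1 would be replaced in allele 2 by a single larger fragment, which is the band difference an RFLP analysis reads out.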
RFLP Analysis & the Southern Blot
• fragments are separated by size on an agarose gel
• the DNA is transferred to a positively charged membrane overnight (through simple
attraction of –ve DNA to the +ve membrane)
• the membrane is then incubated with “probes” to well-known SNP sequences
• the probes hybridize to the bands on the membrane and are easily visualized
Jack and Jill and RFLP
• Jack and Jill – section of the genome with two EcoRI sites
• cut their DNA samples with EcoRI
• since Jack and Jill are diploid – they have two forms of this section of DNA – i.e. two alleles
• the allele shown below – they are identical
Jack 1: -GAATTC---(8.2 kb)---GCATGCATGCATGCATGCAT---(4.2 kb)---GAATTC-
Jill 1: -GAATTC---(8.2 kb)---GCATGCATGCATGCATGCAT---(4.2 kb)---GAATTC-
Jack and Jill and RFLP
• but this allele – they are NOT
• Jack is missing the EcoRI site on the left side of the genomic fragment
Jack 2: -CCCTTC---(8.2 kb)---GCATGCATGCATGCATGCAT---(4.2 kb)---GAATTC-
Jill 2: -GAATTC---(8.2 kb)---GCATGCATGCATGCATGCAT---(4.2 kb)---GAATTC-
• therefore, when Jack and Jill have their DNA subject to RFLP analysis, they will have one band in common and
one band that does not match the other's in molecular weight:
Jack and Jill and RFLP
• to easily identify the differences – Southern blotting using a probe to a piece of DNA that lies between the
two EcoRI sites
• because Jack is missing his “left-side” EcoRI – he will have a larger DNA fragment than Jill and the probe
will make it easy to see on the gel
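A rough Python sketch of the band sizes the probe would reveal for each allele. The 12.4 kb spacing follows the slide (8.2 kb + 4.2 kb, ignoring the 20 bp in between). The position of the next EcoRI site upstream of Jack's lost site is not given, so the value used here (3.0 kb upstream) is purely an assumption.

```python
def probed_fragment_kb(site_positions_kb, probe_kb):
    """Length of the restriction fragment that contains the probe.

    site_positions_kb: EcoRI site coordinates (kb) along one allele.
    probe_kb: coordinate (kb) where the Southern blot probe hybridizes.
    """
    left = max(p for p in site_positions_kb if p <= probe_kb)
    right = min(p for p in site_positions_kb if p > probe_kb)
    return round(right - left, 1)

# Coordinates in kb. UPSTREAM is an ASSUMED site position, not from the slide.
UPSTREAM, LEFT, RIGHT = -3.0, 0.0, 12.4
PROBE = 8.2  # probe lies between the two EcoRI sites

alleles = {
    "Jack 1": [UPSTREAM, LEFT, RIGHT],
    "Jack 2": [UPSTREAM, RIGHT],        # left site lost (GAATTC -> CCCTTC)
    "Jill 1": [UPSTREAM, LEFT, RIGHT],
    "Jill 2": [UPSTREAM, LEFT, RIGHT],
}
bands = {name: probed_fragment_kb(sites, PROBE) for name, sites in alleles.items()}
print(bands)
```

Whatever the true upstream distance is, Jack's second allele gives a band larger than 12.4 kb, so he shows two bands and Jill shows one.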
Detecting SNPs
• more modern methods of detecting SNPs now exist
• allow for the detection of millions of SNPs
• 3. Microarrays:
• detection of SNP alleles at over 1 million loci in the human genome
• freely accessible database at the NCBI
SNPs and Positional Cloning of a Disease-causing gene
• GOAL: locate the position of a disease-causing gene in the human
genome
• WHY?
• better basic understanding of the disease process
• better diagnosis of the disease
• design of more specific treatments
• gene therapy to “cure” the disease
SNPs and Positional Cloning of a Disease-causing gene
• STRATEGY: collect the DNA from populations in which the disease has been
characterized
• easiest diseases to positionally clone are Mendelian
• create a pedigree chart
• look for alleles that are found in diseased individuals more often than by chance – Linkage analysis
• a lot like locating the “eye size” gene in Drosophila
• indicates that the disease gene lies nearby
• narrows the location to a chromosome or even a region on a chromosome
• use linkage analysis & SNPs to narrow the region even further
SNPs and Positional Cloning of a Disease-causing gene
• because SNPs are scattered throughout the human genome - they make it possible to test linkage to a gene with virtually any genomic location
• APPROACH:
• to effectively locate the disease gene – should use SNPs that are located every 10 centiMorgans (or map units)
• in the human genome – a cM is about 1,000,000 bp
• the human genome is 3000 cM – so 300 SNPs will “cover” the entire genome
• of these 300 – find the SNP alleles that segregate with the disease more often than by chance
• these are the linked SNPs
• if they are not linked then the SNP and disease gene will segregate with equal frequency in diseased and non-diseased individuals
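The marker-spacing arithmetic can be checked in a few lines of Python. The three constants come from the slide; the "half the spacing" remark is a simple consequence of even spacing and is added here only as an illustration.

```python
GENOME_CM = 3000          # genetic length of the human genome (from the slide)
SPACING_CM = 10           # one SNP marker every 10 cM
BP_PER_CM = 1_000_000     # rough human average: 1 cM ~ 1,000,000 bp

markers_needed = GENOME_CM // SPACING_CM
spacing_bp = SPACING_CM * BP_PER_CM
# With evenly spaced markers, a gene lying between two markers is at most
# half the spacing away from the nearer one
max_distance_to_marker_cM = SPACING_CM / 2

print(markers_needed, spacing_bp, max_distance_to_marker_cM)
```

So the first pass with 300 SNPs only places the gene to within a region of roughly 10 million bp, which is why denser SNPs are then needed in the candidate region.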
SNPs and Positional Cloning of a Disease-causing gene
• once the disease gene has been localized to a smaller region of a chromosome – use SNPs in that smaller region
• genotype the individuals of the pedigree (diseased and non-diseased) for the SNPs in this region
• determine if these new SNPs are linked SNPs
• if there is less than 1% recombination between the SNP and the disease locus – then the gene is less than 1 cM away
SNPs and Positional Cloning of a Disease-causing gene
• continue to narrow down the region until you find the SNPs that show 100% linkage with the disease
• you have identified its specific location on the chromosome
• this entire process of narrowing down is known as Positional Cloning
Huntington’s Disease
• 1st gene to be positionally cloned
• 1984
Mapping – the X chromosome
Other DNA polymorphisms: DIPs
• deletion-insertion polymorphisms
• also known as InDels
• typically a few base pairs in length
• 2nd most common form of genetic variation
• result from errors in replication or DNA repair
• if they occur in a protein coding region – can produce a shift in reading frame
• e.g. Venter genome vs. human reference seq database – 292,102 unique DIPs
• 1bp to 571 bps
• as the length of the DIP increases, their frequency decreases
• e.g. CFTR gene – DIPs once every 10 kb of DNA
• ~75% are only 1 to 2 base pairs
Other DNA polymorphisms: SSRs
• simple sequence repeats or microsatellite DNA
• also known as short tandem repeats (STRs)
• arise by random events that produce a short repetitive sequence – 4 to 5 bp units repeated
• can be highly polymorphic in the number of repeats – repeats from 10 to 100 times
• produces alleles
• e.g. maternal allele – 10 repeats, paternal allele – 25 repeats
• produced by the “stuttering” of the DNA polymerase during replication
• one of the most common SSRs is a 2 bp repeat – “CA” repeat
Other DNA polymorphisms: SSRs
• once they form – they can lengthen by the “stuttering” of the DNA polymerase during replication
• DNA polymerase pauses - top daughter strand slips and produces a loop
• replication continues and the daughter strand is replicated
• during the next round of replication when this daughter strand is the template – the “straightening” out makes for a longer template and the resulting template/daughter ds helix now has an increased # of repeats
SSRs can be detected by PCR
• SSR alleles differ in length
• e.g. maternal allele 1 – 15 repeats, paternal allele 2 – 35 repeats
• PCR using primers that flank these SSR regions
• size differences easy to see on a gel
• larger SSR allele won’t run as far into the agarose gel
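A small Python sketch of why the two alleles separate on a gel. The repeat counts and the CA repeat unit come from the slides; the 100 bp of primers plus flanking sequence is an arbitrary illustrative value.

```python
def pcr_product_length(n_repeats, repeat_unit="CA", flank_bp=100):
    """Length of the PCR product spanning an SSR.

    flank_bp is the total length of primers plus flanking sequence on both
    sides of the repeat; 100 bp is an arbitrary illustrative value.
    """
    return flank_bp + n_repeats * len(repeat_unit)

maternal = pcr_product_length(15)   # allele 1: 15 repeats
paternal = pcr_product_length(35)   # allele 2: 35 repeats
print(maternal, paternal)

# The shorter product migrates farther into the gel
farther = "maternal" if maternal < paternal else "paternal"
print(farther)
```

Because the flanking sequence is the same for both alleles, the whole size difference (here 40 bp) comes from the number of repeats.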
Polymerase Chain Reaction PCR
• used to amplify specific regions of DNA
• mimics DNA replication that takes place in cells
• uses temperature to mimic some of the enzymatic steps
• e.g. heat ds DNA to 94°C to “melt” or denature the two strands – mimics the helicase
• uses custom designed primers that target specific DNA sequences
• takes the place of primase making them
• polymerase = Taq polymerase
• heat resistant polymerase from bacteria (Thermus aquaticus)
PCR Primers
• primers are designed to hybridize with (i.e. to “anneal” to) the template DNA
• primers “flank” the target region you want to amplify
• forward primer (primer 1) – anneals to the antisense template strand
• reverse primer (primer 2) – anneals to the sense template strand
• annealing takes place when the PCR reaction is cooled from 94°C to a specific annealing temperature – usually between 55 and 65°C
• the polymerase binds to the template-primer double stranded region – just like it would inside the nucleus of a cell
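How a primer pair defines the amplified region can be sketched in Python. The template and primer sequences are made up for illustration, and the sketch assumes perfect matches only (real primers tolerate some mismatch and are usually longer).

```python
def reverse_complement(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def amplicon(sense, forward, reverse):
    """Return the region of the sense strand amplified by the primer pair.

    forward: same sequence as the sense strand, so it anneals to the antisense strand
    reverse: same sequence as the antisense strand, so it anneals to the sense strand
    Both primers are written 5'->3'. Returns None if a primer has no site.
    """
    start = sense.find(forward)
    end_site = sense.find(reverse_complement(reverse))
    if start == -1 or end_site == -1 or end_site < start:
        return None
    return sense[start:end_site + len(reverse)]

# Made-up template: flanking sequence around a (CA)5 repeat
sense = "TTGACCGTAGCTAGGCACACACACATCGGATCCTAAGTC"
forward = "GACCGTAG"
reverse = reverse_complement("GGATCCTA")  # primer 2, written 5'->3'
product = amplicon(sense, forward, reverse)
print(reverse, product, len(product))
```

The product starts at the forward primer and ends at the site of the reverse primer, so everything outside the flanking primers is left unamplified.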
Taq polymerase
• heat resistant bacterial polymerase
• capable of withstanding high denaturing temperatures like 94°C
• binds to the primer-template double stranded complex and “extends” from it
• moves along the template strand in the 3’ to 5’ direction – making “daughter” DNA that grows in the 5’ to 3’ direction
• this extension step occurs at 72°C
The PCR Reaction
• PCR is performed over multiple “cycles”
• one cycle has three temperatures
• 1. Denaturing = 94°C
• 2. Annealing = 55 to 65°C
• 3. Extension = 72°C
• after the 1st cycle – you have two ds DNA molecules
• 2nd cycle – DNA denatures → 4 single DNA strands that the primers anneal to
• amplification of the targeted region begins
• after 2 cycles – 4 ds DNA molecules
• multiple cycles result in the amplification of the targeted DNA
[Figure: CYCLE 1 and CYCLE 2, showing the original template strands and the daughter DNA strands made in each cycle]
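The doubling per cycle can be written as a one-line formula in Python. This is the ideal case only: it assumes every template is copied in every cycle, which real reactions do not quite achieve.

```python
def ds_molecules_after(cycles, starting_templates=1):
    """Ideal count of double-stranded molecules: doubles every cycle."""
    return starting_templates * 2 ** cycles

for n in (1, 2, 30):
    print(n, ds_molecules_after(n))
```

One starting molecule gives 2 after the first cycle and 4 after the second, matching the slide, and on the order of a billion copies after 30 cycles.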
SSRs can be detected by PCR
• SSR analysis produces a DNA “fingerprint”