Candidate gene screening using long-read sequencing
Jenny Ekholm, Ting Hon, Yu-Chih Tsai, David Greenberg, Tyson A. Clark, Steve Kujawa
PacBio, Menlo Park, CA USA
SMRT Sequencing overview
We have developed several candidate gene
screening applications for both neuromuscular
and neurological disorders. The power behind
these applications comes from long-read
sequencing, which allows us to access
previously unresolvable and even unsequenceable
genomic regions. SMRT Sequencing offers
uniform coverage, a lack of sequence context
bias, and very high accuracy. In addition, it can
directly detect epigenetic signatures
and characterize full-length gene transcripts
through assembly-free isoform sequencing.
Alzheimer's disease (AD): TOMM40 gene
A variable poly-T repeat at the rs10524523 SNP
within intron 6 of the TOMM40 gene, in
combination with the APOE3 allele, affects the
age of onset of AD (ref. 1).
Gene isoform characterization
Figure 1: Iso-Seq experimental and informatics pipelines.
Experimental pipeline: poly(A) mRNA; cDNA synthesis with adapters; size partitioning and PCR amplification; SMRTbell ligation; sequencing on the Sequel or PacBio RS II.
Informatics pipeline: PacBio raw sequence reads; reads of insert; classify sequence reads (remove adapters, remove artifacts); reads clustering into isoform clusters; consensus calling to give nonredundant transcript isoforms; quality filtering to give final isoforms; map to reference genome to give evidence-based gene models.
Read structure labels: SMRT adapter, 5' primer, coding sequence, poly(A) tail, 3' primer, SMRT adapter.
SampleNet: Iso-Seq Method with Clontech cDNA Synthesis Kit
DevNet: Iso-Seq wiki page
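The informatics steps named in the pipeline (reads clustering, consensus calling, quality filtering) can be illustrated with a toy sketch. This is not the Iso-Seq implementation: the grouping rule (same length and same leading bases), the `key_len` parameter, and the use of cluster size as the quality filter are all assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def cluster_reads(reads, key_len=8):
    """Toy clustering (assumption): group reads that share the same
    length and the same first `key_len` bases."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[(len(read), read[:key_len])].append(read)
    return list(clusters.values())


def consensus(cluster):
    """Column-wise majority vote over the equal-length reads of one cluster."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def final_isoforms(reads, min_support=2):
    """Keep one consensus per cluster; clusters with fewer than
    `min_support` reads are dropped (a stand-in for quality filtering)."""
    return sorted(
        {consensus(c) for c in cluster_reads(reads) if len(c) >= min_support}
    )


reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "GGGGCCCCTT"]
isoforms = final_isoforms(reads)
```

In this example the three similar reads collapse to one consensus sequence, and the singleton read is removed by the support filter.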
Figure: gDNA and transcripts from the SK-BR-3 cell line captured with the NimbleGen Oncology Panel; example: AURKA gene. Tracks: Sample 1 All, Sample 1 Phase 0, Sample 1 Phase 1.
Long reads and heterozygous SNPs allow you to phase reads into two haplotypes.
Phased transcripts reveal retained introns and skipped exons.
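A minimal sketch of how heterozygous SNPs can sort long reads into two haplotypes is shown here. It assumes the two alleles at each heterozygous position are already assigned to Phase 0 and Phase 1; the function name, the input layout, and the example positions are hypothetical and are not taken from the poster.

```python
from collections import Counter


def assign_phase(read_alleles, het_snps):
    """Assign one read to Phase 0 or Phase 1 by majority vote.

    read_alleles: {position: base observed in the read}
    het_snps:     {position: (phase0_allele, phase1_allele)} (assumed known)
    Returns 0, 1, or None when the read is uninformative or tied.
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos in het_snps:
            allele0, allele1 = het_snps[pos]
            if base == allele0:
                votes[0] += 1
            elif base == allele1:
                votes[1] += 1
    if votes[0] == votes[1]:
        return None
    return 0 if votes[0] > votes[1] else 1


# Hypothetical heterozygous positions spanned by the long reads.
het_snps = {100: ("A", "G"), 250: ("C", "T")}
phase_a = assign_phase({100: "A", 250: "C"}, het_snps)
phase_b = assign_phase({100: "G", 250: "T"}, het_snps)
```

The longer the read, the more heterozygous positions it spans, so fewer reads end up unassigned.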
In addition to calling the bases, SMRT
Sequencing uses the kinetic information from
each nucleotide to distinguish between modified
and native bases.
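As a rough illustration of using kinetic information, the sketch below compares observed interpulse durations (IPDs) at each template position with the values expected for unmodified DNA and flags positions with an elevated ratio. The threshold of 2.0, the function names, and the example numbers are assumptions chosen for illustration, not values from the poster or from SMRT Analysis.

```python
def ipd_ratios(observed, expected):
    """Ratio of observed to expected interpulse duration at each position."""
    return [obs / exp for obs, exp in zip(observed, expected)]


def flag_modified(observed, expected, threshold=2.0):
    """Return the positions whose IPD ratio is at or above `threshold`
    (the threshold value is an assumption, not a published cutoff)."""
    ratios = ipd_ratios(observed, expected)
    return [i for i, ratio in enumerate(ratios) if ratio >= threshold]


# Made-up mean IPDs (seconds) along a short template stretch.
observed = [0.5, 0.6, 2.4, 0.4]
expected = [0.5, 0.5, 0.6, 0.5]
candidates = flag_modified(observed, expected)
```

Here only the third position shows a clearly slowed polymerase, so it is the only candidate modified base.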
Figure: Example: N6-methyladenine (mA) detected on the template strand.
We successfully captured and sequenced the
associated poly-T repeat in the TOMM40 gene
and determined that the sample had
one Short (15) allele and one Very Long (34)
allele.
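The allele calls above can be sketched as measuring the longest poly-T run in a read and binning it by length. The class boundaries used here (Short up to 19 T, Long 20 to 29 T, Very Long 30 T or more) are an assumption based on the commonly cited TOMM40 poly-T classes; the poster itself only states that 15 is Short and 34 is Very Long. The function names and the flanking sequence are hypothetical.

```python
import re


def longest_poly_t(seq):
    """Length of the longest uninterrupted run of T in a read."""
    runs = re.findall(r"T+", seq.upper())
    return max((len(run) for run in runs), default=0)


def classify_poly_t(length):
    """Bin a poly-T length into a class (boundaries are an assumption)."""
    if length <= 19:
        return "Short"
    if length <= 29:
        return "Long"
    return "Very Long"


# Hypothetical reads: flanking bases around poly-T tracts of 15 and 34.
read_short = "ACG" + "T" * 15 + "GCA"
read_very_long = "ACG" + "T" * 34 + "GCA"
calls = [classify_poly_t(longest_poly_t(r)) for r in (read_short, read_very_long)]
```

Because each long read spans the whole repeat, the length is read directly from a single molecule rather than inferred from fragments.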
Schizophrenia: C4 gene
Figure: C4 gene structure. The HERV insertion in intron 9 determines whether the gene is Long or Short; a 2 aa difference in exon 26 determines the C4A or C4B isotype. Different possible combinations: L/S/C4A/C4B.
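The possible combinations can be enumerated with a short sketch that joins an isotype call with a length call. It assumes the isotype (C4A or C4B) has already been determined from exon 26 and that the size of any insertion seen in intron 9 is available in base pairs; the 7 kb figure comes from the poster, while the 50% tolerance and all names are hypothetical.

```python
HERV_SIZE_BP = 7000  # approximate HERV insertion size stated in the source


def c4_length_class(intron9_insertion_bp, tolerance=0.5):
    """Return 'L' when an insertion of roughly HERV size is present in
    intron 9, otherwise 'S'. The tolerance is an illustrative assumption."""
    return "L" if intron9_insertion_bp >= HERV_SIZE_BP * tolerance else "S"


def c4_call(isotype, intron9_insertion_bp):
    """Combine isotype and length class into a label such as 'C4A/L'."""
    if isotype not in ("C4A", "C4B"):
        raise ValueError(f"unknown isotype: {isotype}")
    return f"{isotype}/{c4_length_class(intron9_insertion_bp)}"


# All four isotype and length combinations.
ALL_COMBINATIONS = [f"{iso}/{size}" for iso in ("C4A", "C4B") for size in ("L", "S")]

example_call = c4_call("C4A", 6900)
```

A single long read that covers both intron 9 and exon 26 lets both features be called on the same gene copy.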
The C4 structural variant is highly complex (ref. 2):
- Two functionally distinct genes (isotypes): C4A
and C4B
- Both isotypes can have 1-3 functional copies
- A human endogenous retroviral (HERV) insertion
in intron 9 changes the length of the gene
C4 plays a role in signaling which connections
between neurons should be “pruned”, or removed,
as the brain develops after childhood. The
more C4 was present, the higher the risk of
developing schizophrenia. Certain versions of the
C4 gene seem to increase people’s risk of
developing schizophrenia by 27 to 50 percent.
We successfully captured and sequenced the
C4 gene as part of an MHC capture panel. We
were able to see the two different C4 isotypes
(C4A and C4B) as well as the 7 kb
HERV insertion in intron 9.
Example: Isotype C4A/L was successfully called for a
sample using long-read sequencing.
Long-read sequencing data properties
Figure panels: Accuracy and Read Length.
Data generated with a 20 kb size-selected
human library using 6-hour movies with
P6-C4 chemistry on the PacBio RS II,
analyzed with SMRT® Analysis v2.3.
Each SMRT Cell generates ~55,000 reads.
E. coli 20 kb-insert library, SMRT
Analysis v2.3.
Targeted sequencing and multiplexing
Targeted sequencing workflow
Barcoding options
1. Barcoded adapters
2. Barcoded Universal Primer
Amplification-free targeted sequencing
using CRISPR/Cas9
Repeat expansion disorders are challenging to
interrogate because of their long repetitive regions.
Using CRISPR/Cas9, we are able to access
repeat counts, interruption sequences, and
epigenetic information without introducing PCR
bias.
Figure: HTT gene in HD sample.
Figure: The Huntingtin gene CAG repeat counts in
HD patients.
Figure: Direct methylation detection of FMR1
pre-mutation sample.
The CGG repeat region appears to be heavily
methylated (5mC).
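Repeat counts and interruption sequences of the kind described for the HTT (CAG) and FMR1 (CGG) repeats can be read from a single long read with a sketch like the one below. The function names, the rule that an interruption is a gap no longer than one motif between two repeat runs, and the example reads are assumptions for illustration only.

```python
import re


def repeat_runs(seq, motif):
    """Return (start, unit_count) for each uninterrupted run of `motif`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    return [
        (m.start(), len(m.group()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]


def longest_run(seq, motif):
    """Number of repeat units in the longest uninterrupted run."""
    return max((count for _, count in repeat_runs(seq, motif)), default=0)


def interruptions(seq, motif, max_gap=None):
    """Short sequences lying between consecutive repeat runs.
    Gaps longer than `max_gap` bases (default: one motif length) are ignored."""
    seq = seq.upper()
    max_gap = len(motif) if max_gap is None else max_gap
    runs = repeat_runs(seq, motif)
    found = []
    for (start, count), (next_start, _) in zip(runs, runs[1:]):
        gap = seq[start + count * len(motif):next_start]
        if 0 < len(gap) <= max_gap:
            found.append(gap)
    return found


# Hypothetical reads covering a CAG tract and a CGG tract.
cag_read = "TTG" + "CAG" * 5 + "CAA" + "CAG" * 3 + "CCG"
cgg_read = "CGG" * 4 + "AGG" + "CGG" * 2
```

For `cag_read`, the longest run has 5 units and the only interruption is `CAA`; for `cgg_read`, the longest run has 4 units and the interruption is `AGG`.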
References
1) Roses AD, et al. (2010). A TOMM40 variable-length polymorphism predicts the age
of late-onset Alzheimer’s disease. Pharmacogenomics J. 10(5): 375-84.
2) Sekar A, et al. (2016). Schizophrenia risk from complex variation of complement
component 4. Nature. 530(7589): 177-83.
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences.
BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.