Identifying a Novel Isoform of the AZIN1 Gene by Combining High-throughput Technology with
Traditional Sequencing Methods
Brandon Holbert, Jennifer Li-Pook-Than, Donald Sharon and Michael Snyder
Department of Genetics, School of Medicine, Stanford University
300 Pasteur Dr., Stanford, CA 94305-5120, USA
ABSTRACT
MATERIALS AND METHODS
The rapid development of high-throughput sequencing technologies has made it possible to envision a world
where personalized medicine is standard healthcare practice. Personalized medicine has several advantages
over the traditional symptoms-oriented approach to medicine, the most obvious being its ability to act
preemptively. Specifically, an individual’s various “omes”, such as the genome, transcriptome, and proteome,
can be combined to create an integrative Personalized Omics Profile (iPOP), which can then be used to
prevent disease and monitor disease progression. This study analyzed the transcriptome of a 55-year-old male
by combining high-throughput and traditional sequencing methods to find and validate a novel isoform of the
AZIN1 gene in peripheral blood mononuclear cells (PBMCs). The procedure included the following steps: (1)
separation of PBMCs from whole blood; (2) extraction of RNA from PBMCs; (3) creation of cDNA libraries; (4)
whole-transcriptome sequencing with Illumina RNA-seq; (5) identification of the novel isoform with PacBio
technology; and (6) validation with Sanger sequencing. PacBio sequencing revealed a new isoform of the AZIN1
gene that contains an extra intron within exons 12 and 13. The alternative splicing event causes a shift in the
mRNA’s reading frame that would change the terminus of the resulting protein from Ser-Asp-Glu-Asp-stop to
Phe-Arg-stop. Follow-up studies could validate this finding at the protein level and then measure gene expression
of this new isoform in various tissues, subjects, and time-points. Moreover, the methods of this experiment
can be used to find similar RNA-related events among other known genes. For example, Illumina RNA-seq
detected the known isoforms of the AZIN1 gene, which were then compared with the new isoform that was
found using PacBio. This suggests that combining high-throughput technologies may be more effective than
using any of them in isolation, especially when traditional sequencing can be used as a validation tool.
Unfortunately, the interpretation of high-throughput sequencing data still requires extensive man-hours, which
limits its application to iPOP and the ultimate goal of personalized medicine. It is expected that new
technologies will eventually compensate for this, making personalized health care a realistic possibility in the
near future.
Figure 2. Sequencing the human genome using Illumina technology. Step 1: generate reads (≈ 101 bp).
Step 2: map the sample reads to the reference genome.
Alternative splicing is an RNA-related event that can
create multiple transcripts (isoforms) from a single gene
(figure 3). Differential expression of isoforms can then
lead to higher incidence of disease. For example, one
particular isoform of the AZIN1 gene modifies the
disease severity of fibrotic liver diseases such as
hepatitis C (Paris et al., 2011). Our study examined this
gene to find a new isoform that might be disease-related
and to highlight the benefits of combining multiple
sequencing technologies to accurately examine an
individual’s transcriptome.
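As a minimal sketch of the idea, the exon-skipping pattern of Figure 3 can be modeled by joining exons in different combinations. The exon names A, B, and C follow the figure labels; the sequences are invented placeholders rather than real AZIN1 exons, and the `splice` helper is hypothetical.

```python
# Toy model of alternative splicing. Exon names follow the A/B/C labels of
# Figure 3; the sequences are invented placeholders, not real AZIN1 exons.
EXONS = {"A": "ATGGCC", "B": "GATTAC", "C": "TTGTAA"}


def splice(exon_order):
    """Join the named exons, in order, into one mature transcript sequence."""
    return "".join(EXONS[name] for name in exon_order)


isoform_1 = splice(["A", "B", "C"])  # all three exons retained
isoform_2 = splice(["A", "C"])       # exon B skipped

print(isoform_1, len(isoform_1))
print(isoform_2, len(isoform_2))
```

Both isoforms come from the same gene, yet they differ in length and sequence, which is what allows them to be told apart on a gel or in long sequencing reads.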
Figure 4. Separation of PBMC layer from whole blood.
Figure 5. Isoform separation with 2% agarose gel. Lane labels: 1 Kb DNA ladder; primer set 1 (F1/R1);
primer set 2 (F1/R2); primer set 3 (F2/R2). Ladder markers: 1000, 850, 650, 500, 400, 300, and 200 bp.
Band labels: known isoforms (888 bp segment); novel isoform (637 bp segment).
Figure 6 panel labels: PBMCs; RNA extraction; cDNA synthesis (RNA-seq library formation);
1. Sequencing by synthesis (Illumina).
BACKGROUND
Personalized medicine has several advantages over
the traditional symptoms-oriented approach. It has
the potential to prevent disease in healthy
individuals as well as treat patients according to
their own molecular characteristics.
High-throughput technologies can examine an
individual’s entire biological system and the various
“omes” that contribute to an individual's phenotype,
such as the genome, epigenome, transcriptome,
proteome, metabolome, and auto-antibodyome.
These omes can be combined to create an
individual’s integrative Personalized Omics Profile
(iPOP), which can then be analyzed to determine
risk factors for disease or susceptibility toward
specific medications.
Peripheral blood mononuclear cells (PBMCs) were extracted
from whole blood of a 55-year-old male volunteer using a
Ficoll gradient (figure 4). RNA was extracted from PBMCs
using an AllPrep kit (QIAGEN). Complementary DNA
(cDNA) libraries were created from the RNA extract. cDNA
from PBMCs was then subjected to deep whole-transcriptome
sequencing using Illumina (101 base-pair reads). A candidate
novel isoform of the AZIN1 gene was then identified on the
reverse strand using Pacific Biosciences (1000+ base-pair
reads). An 888 base-pair segment of cDNA that contained
the candidate novel isoform of the AZIN1 gene was
amplified using polymerase chain reaction (PCR). The
isoforms of this cDNA were then separated by gel
electrophoresis (figure 5). Sanger sequencing was then
performed to validate the candidate novel isoform of the
AZIN1 gene. Figure 6 summarizes the step-by-step process
of transcriptome sequencing (RNA-seq) from isolated
PBMCs. Several steps (not all shown) are involved in
sequencing an entire cDNA library (Illumina) or a selected
portion of it (PacBio and Sanger sequencing).
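One way a candidate isoform could be flagged in long reads is to look for reads whose alignment to the reference contains a long run of gap characters. The sketch below illustrates that idea only; it is not the analysis used in this study. The function names, the 50-column threshold, and the toy reads are all assumptions.

```python
import re


def longest_gap(aligned_read):
    """Return (start, length) of the longest run of '-' in an aligned read, or (None, 0)."""
    best_start, best_len = None, 0
    for match in re.finditer(r"-+", aligned_read):
        length = match.end() - match.start()
        if length > best_len:
            best_start, best_len = match.start(), length
    return best_start, best_len


def flag_candidate_reads(aligned_reads, min_gap=50):
    """Names of reads whose longest gap spans at least min_gap alignment columns."""
    return [name for name, seq in aligned_reads.items()
            if longest_gap(seq)[1] >= min_gap]


# Hypothetical toy alignment: read_b lacks a 60-column segment, read_c has
# only a single-base gap of the kind a sequencing error might produce.
reads = {
    "read_a": "ACGT" + "ACGTACGTAC" * 6 + "TTGA",
    "read_b": "ACGT" + "-" * 60 + "TTGA",
    "read_c": "AC-T" + "ACGTACGTAC" * 6 + "TTGA",
}
print(flag_candidate_reads(reads))
```

Requiring a minimum gap length separates a spliced-out segment from the short, scattered gaps that single-base errors introduce.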
CONCLUSIONS
Figure 6. Step-wise representation of RNA sequencing. Panel labels: 2. Single molecule real-time
sequencing (PacBio); 3. Chain termination (Sanger).
RESULTS
A novel isoform was found within the coding region of the AZIN1 gene using single
molecule real-time sequencing (PacBio). Figure 7 shows several cDNA reads, four of
which (#3-6) are missing a large segment of cDNA within exons 12 and 13. It appears that
an alternative splicing event creates an extra intron within these exons that is 217 base
pairs (bp) long. This would change the terminus of the resulting protein from
Ser-Asp-Glu-Asp-stop to Phe-Arg-stop. We predict the resulting transcript to be 3,986 bp
long (known variants are 4,203 and 4,348 bp long). It is significant that the new splice sites
are located within the coding region of the mRNA because it suggests that the resulting
protein is directly truncated.
Figure 7. Single molecule real-time sequencing (PacBio) reveals novel intron within exons 12 and 13
of AZIN1 gene. Labels: exon 12, exon 13, 5’, 3’, new intron, new splice sites. Rows show the reference
followed by reads 1 to 8.
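The arithmetic behind the frameshift and the predicted transcript length can be checked directly. The sketch below uses only the numbers reported above (a 217 bp segment; known variants of 4,203 and 4,348 bp) and assumes, as stated, that the removed segment lies entirely within the coding region. The function name is hypothetical.

```python
NEW_INTRON_BP = 217              # length of the newly spliced-out segment
KNOWN_VARIANT_BP = [4203, 4348]  # lengths of the known AZIN1 transcript variants


def shifts_reading_frame(deleted_bp):
    """A coding-region deletion shifts the frame unless its length is a multiple of 3."""
    return deleted_bp % 3 != 0


# Subtract the new intron from each known variant to get candidate lengths.
predicted = [length - NEW_INTRON_BP for length in KNOWN_VARIANT_BP]

print(shifts_reading_frame(NEW_INTRON_BP))  # 217 = 3 * 72 + 1, so the frame shifts
print(predicted)
```

The predicted 3,986 bp transcript corresponds to the 4,203 bp variant minus the 217 bp segment. Because 217 leaves a remainder of 1 when divided by 3, every codon downstream of the new junction is read in a different frame, which is consistent with the altered C-terminus.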
Figure 1. iPOP for personalized medicine
Figure 1 summarizes an iPOP study that
compared an individual’s omics profile
between healthy (gray) and diseased (green)
states (Chen et al., 2012).
High-throughput sequencing can sequence an
entire human genome by randomly fragmenting
the DNA, amplifying and sequencing the
fragments to produce short reads, and mapping
the reads to a reference genome (figure 2).
A person’s entire
transcriptome (protein-coding and non-protein-coding
RNA) can also be sequenced this way
by using reverse transcriptase to convert the
RNA into complementary DNA (cDNA).
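The mapping step can be illustrated with a deliberately simplified sketch that places each read at its first exact match in the reference and then counts read depth per position. The reference fragment and the first four reads are taken from Figure 2; the fifth read is invented to show an unmapped case. Real aligners tolerate mismatches and use indexed search, so this shows the principle only, and the function names are hypothetical.

```python
# Reference fragment as printed in Figure 2 (first line only).
REFERENCE = "TAACGGCTAACGCCCTATTTACACAAAAGAGGGCTCCACTGGATCACTGACTTTAGCCCGGAT"


def map_reads(reads, reference):
    """Map each read to the 0-based offset of its first exact match, or None if absent."""
    positions = {}
    for read in reads:
        offset = reference.find(read)
        positions[read] = offset if offset >= 0 else None
    return positions


def coverage(positions, reference_length):
    """Count how many mapped reads overlap each reference position."""
    depth = [0] * reference_length
    for read, offset in positions.items():
        if offset is None:
            continue  # unmapped reads contribute no depth
        for i in range(offset, offset + len(read)):
            depth[i] += 1
    return depth


# Four reads from Figure 2 plus one invented read that does not map.
reads = ["CGGCTAACG", "CACAAAAGA", "AAGAGGGCT", "GATCACTGA", "TTTTTTTTT"]
mapped = map_reads(reads, REFERENCE)
depth = coverage(mapped, len(REFERENCE))

print(mapped)
print(max(depth))
```

Positions covered by more than one read are where overlapping reads agree on the underlying sequence; in RNA-seq, the same depth track doubles as a measure of expression.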
Figure 3. Alternative splicing. Labels: pre-mRNA with 5’ UTR, exons A, B, and C, introns, and 3’ UTR;
mRNA isoform 1 (exons A, B, C); mRNA isoform 2 (exons A, C).
Figure 8. Chain-termination method (Sanger) confirms new isoform of AZIN1 gene. Panels: known isoform
(exons 11, 12, 13); novel isoform (new exon/exon junction).
Figure 9. New isoform of AZIN1 gene.
Figure 10. Relative gene expression of AZIN1 exons. Tracks: novel isoform (found in this experiment);
known isoform (missing exon 3). Exon labels run from 13 to 1.
This experiment would benefit from measuring gene expression of the new isoform in various tissues, subjects,
and time-points and then validating the novel isoform at the protein level. Additional studies could also
investigate the relationship of this new isoform to disease. Moreover, next-generation sequencing technology
is still imperfect, and thus it would be prudent to investigate the pros and cons of each technology in further
detail. Even so, the potential benefits of using multiple sequencing technologies to examine RNA-related events are
promising. Figure 11 shows how transcriptome analysis can be used for personalized medicine in the future.
Under the larger umbrella of integrative personalized omics profiling (iPOP), transcriptome analysis can be
combined with other “omes” such as the genome, proteome, metabolome, and microbiome to identify
physiologic markers that might predispose an individual towards disease. This integration can then assist
healthcare by leading to early diagnosis of disease, targeted therapeutic treatments, and disease prevention.
Unfortunately, several steps are still required to make personalized medicine a reality. While the costs of
genome sequencing have fallen dramatically over the past decade, the man-hours required to
interpret such data are still extensive. Nevertheless, continued technological improvements will likely make
personalized health care attainable in the near future.
Figure 11. Transcriptome analysis for personalized medicine. Labels: Gene X mRNA isoforms 1, 2, and 3;
tissue samples from Person 1 and Person 2 with isoform proportions (40%, 30%, 30% and 70%, 20%, 10%);
at-risk isoform; consult physician.
ACKNOWLEDGEMENTS
Brandon Holbert was supported through an award from The U.S. Health Resources and Services
Administration (HRSA), D34HP16299 (Charles Mouton, PI). Special thanks to the Snyder lab and Azad
Raeisdana.
LITERATURE CITATIONS
The peaks in figure 10 (right) represent the relative amount of gene expression
of each exon of the AZIN1 gene as detected by Illumina RNA-seq.
DNAnexus was used as a genome browser where horizontal green lines
(top) represent the reverse strand (two known isoforms), read right to left.
Reference  TGATGAGCTTGATCAAATTGTGGAAAG-CT...ATGTCATTCAGTGATTGGTATGAGATGCAA...T-CCGCTGAAGCTT
READ 1     TGATGAGCT-GATCAAATTGTGGAAAG-CT...ATGTCATTCAGTGATTGGTATGAGATGCAA...TTCCGCTGAAGCTT
READ 2     TGATGAGCTTGATCAAATTGTGGAGGG-CT...ATGTCATTCAGTGATTGGTATGAGATGCAA...-T-CCGCTGAAGCT
READ 3     TGATGAGCTT--------------------...------------------------------...T-CCGCTGAAGCT
READ 4     TGATGAGCTT--------------------...------------------------------...TTCCGCTGAAGCTT
READ 5     TGATGAGCTT--------------------...------------------------------...T-CCGCTGAAGCTT
READ 6     GGTGAGGCTT--------------------...------------------------------...TTCCGCTGA-GCT
READ 7     TGATGAGCTTGATCAAATTGTGGAAAG-CT...ATGTCAT-CAGTGATTGGTATGAGATGCAA...TTCCGCTGAAGCTT
READ 8     TGATGAGCTTGATCAAATTGTGGAAAG-CT...ATGTCATTCAGTGAT-GGTATGAGATGCAA...TTCCGCTGAAGCTT
The isoform was not detected originally with
Illumina RNA-seq, but was found later using
PacBio technology. Sanger sequencing was
performed to validate the PacBio discovery.
Figure 8 shows the chromatograms that were
obtained by chain-termination (Sanger) sequencing
for a known isoform and the novel isoform. Each
nucleotide corresponds to a different
wavelength of light emitted by its
fluorescently labeled chain terminator. Figure 9
summarizes the alternative splicing event that
creates the new AZIN1 isoform.
A novel isoform of the AZIN1 gene was found by combining next-generation sequencing technology with
traditional Sanger sequencing. Illumina RNA-seq did not distinguish the new isoform that was found using
PacBio technology. However, a measurement of gene expression using Illumina RNA-seq did reveal a small
peak at exon 3 corresponding to a known isoform of the AZIN1 gene (figure 10). This compelled us to
perform additional tests, which confirmed that the new isoform is also missing exon 3. Thus, by combining
Illumina and PacBio technology, we were able to investigate the new isoform at a level that would not be
possible using either technology alone. New isoforms in other genes might be found by using Illumina
RNA-seq to find gene expression peaks that do not correspond to the reference genome. PacBio could then
determine whether the peaks represent true isoforms, and Sanger sequencing could validate the PacBio data.
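The peak-based screening suggested above could be sketched as follows: average the read depth over each exon and flag exons whose expression is unusually low relative to the others, as the small peak at exon 3 was. The depth values, exon coordinates, function names, and the 0.5 cutoff are all hypothetical; this is an illustration of the idea, not the procedure used in this study.

```python
from statistics import median


def mean_exon_coverage(depth, exons):
    """Average per-base read depth over each exon; exons maps name -> (start, end), end exclusive."""
    return {name: sum(depth[start:end]) / (end - start)
            for name, (start, end) in exons.items()}


def low_coverage_exons(exon_means, fraction=0.5):
    """Exons whose mean depth falls below `fraction` of the median exon depth."""
    cutoff = fraction * median(exon_means.values())
    return [name for name, value in exon_means.items() if value < cutoff]


# Hypothetical depth track: four 10-base exons separated by 5-base introns
# with no coverage; exon3 is expressed far below its neighbors.
depth = [40] * 10 + [0] * 5 + [38] * 10 + [0] * 5 + [6] * 10 + [0] * 5 + [42] * 10
exons = {"exon1": (0, 10), "exon2": (15, 25), "exon3": (30, 40), "exon4": (45, 55)}

means = mean_exon_coverage(depth, exons)
print(low_coverage_exons(means))
```

An exon flagged this way is only a candidate: short reads cannot show which exons travel together in one transcript, which is why long reads and Sanger validation remain necessary.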
Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, et al. (2012). Personal omics profiling reveals
dynamic molecular and medical phenotypes. Cell 148: 1293–1307.
Human PBMC Isolation and Counting Using the Scepter™ 2.0 Handheld Automated Cell
Counter (2011). Millipore Technical Publications AN3311EN00. Web. 25 July 2013.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009).
Circos: an information aesthetic for comparative genomics. Genome Res. 19: 1639–1645.
Li-Pook-Than J, Snyder M (2013). iPOP goes the world: integrated omics profiling and road toward
improved health care. Chemistry & Biology 20: 660–666.
Paris AJ, Snapir Z, Christopherson CD, Kwok SY, Lee UE, Ghiassi-Nejad Z, Kocabayoglu P,
Sninsky JJ, Llovet JM, Kahana C, Friedman SL (2011). A polymorphism that delays
fibrosis in hepatitis C promotes alternative splicing of AZIN1, reducing fibrogenesis. Hepatology
54: 2198–2207.