
Identifying a Novel Isoform of the AZIN1 Gene by Combining High-throughput Technology with Traditional Sequencing Methods

Brandon Holbert, Jennifer Li-Pook-Than, Donald Sharon and Michael Snyder

Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr., Stanford, CA 94305-5120, USA

ABSTRACT

The rapid development of high-throughput sequencing technologies has made it possible to envision a world where personalized medicine is standard healthcare practice. Personalized medicine has several advantages over the traditional symptoms-oriented approach to medicine, the most obvious being its ability to act preemptively. Specifically, an individual's various “omes”, such as the genome, transcriptome, and proteome, can be combined to create an integrative Personalized Omics Profile (iPOP), which can then be used to prevent disease and monitor disease progression. This study analyzed the transcriptome of a 55-year-old male by combining high-throughput and traditional sequencing methods to find and validate a novel isoform of the AZIN1 gene in peripheral blood mononuclear cells (PBMCs). The procedure included the following steps: (1) separation of PBMCs from whole blood, (2) extraction of RNA from PBMCs, (3) creation of cDNA libraries, (4) whole-transcriptome sequencing with Illumina RNA-seq, (5) identification of the novel isoform with PacBio technology, and (6) validation with Sanger sequencing. PacBio sequencing revealed a new isoform of the AZIN1 gene in which an additional intron within exons 12 and 13 is spliced out. This alternative splicing event shifts the mRNA's reading frame and would change the C terminus of the resulting protein from Ser-Asp-Glu-Asp-stop to Phe-Arg-stop. Follow-up studies could validate this finding at the protein level and then measure expression of the new isoform in various tissues, subjects, and time points.

Moreover, the methods of this experiment can be used to find similar RNA-related events in other known genes. For example, Illumina RNA-seq detected the known isoforms of the AZIN1 gene, which were then compared with the new isoform found using PacBio. This suggests that combining high-throughput technologies may be more effective than using any one of them in isolation, especially when traditional sequencing can serve as a validation tool. Unfortunately, interpreting high-throughput sequencing data still requires extensive man-hours, which limits its application to iPOP and the ultimate goal of personalized medicine. New technologies are expected to eventually compensate for this, making personalized health care a realistic possibility in the near future.

BACKGROUND

Personalized medicine has several advantages over the traditional symptoms-oriented approach. It has the potential to prevent disease in healthy individuals as well as to treat patients according to their own molecular characteristics. High-throughput technologies can examine an individual's entire biological system and the various “omes” that contribute to an individual's phenotype, such as the genome, epigenome, transcriptome, proteome, metabolome, and autoantibodyome. These omes can be combined to create an individual's integrative Personalized Omics Profile (iPOP), which can then be analyzed to determine risk factors for disease or susceptibility to specific medications.

Figure 1. iPOP for personalized medicine

Figure 1 summarizes an iPOP study that compared an individual's omics profile between healthy (gray) and diseased (green) states (Chen et al., 2012).

High-throughput sequencing can sequence an entire human genome by randomly fragmenting the DNA, amplifying the fragments, sequencing them as short reads, and mapping the reads to a reference genome (figure 2). A person's entire transcriptome (protein-coding and non-coding RNA) can also be sequenced this way, by using reverse transcriptase to convert the RNA into complementary DNA (cDNA).

Figure 2. Sequencing the human genome using Illumina technology (Step 1: generate reads of ≈101 bp from the sample; Step 2: map the reads to the reference genome)

Alternative splicing is an RNA-related event that can create multiple transcripts (isoforms) from a single gene (figure 3). Differential expression of isoforms can, in turn, be associated with a higher incidence of disease. For example, one particular isoform of the AZIN1 gene modifies the severity of liver fibrosis in hepatitis C (Paris et al., 2011). Our study examined this gene to find a new isoform that might be disease-related and to highlight the benefits of combining multiple sequencing technologies to accurately examine an individual's transcriptome.

Figure 3. Alternative splicing (a pre-mRNA with a 5' UTR, exons A, B and C separated by introns, and a 3' UTR is spliced into mRNA isoform 1 (A-B-C) or mRNA isoform 2 (A-C))

MATERIALS AND METHODS

Peripheral blood mononuclear cells (PBMCs) were extracted from the whole blood of a 55-year-old male volunteer using a Ficoll gradient (figure 4). RNA was extracted from the PBMCs using an AllPrep kit (QIAGEN), and complementary DNA (cDNA) libraries were created from the RNA extract. The PBMC cDNA was then subjected to deep whole-transcriptome sequencing (RNA-seq) using Illumina (101 base-pair reads). A candidate novel isoform of the AZIN1 gene was then identified on the reverse strand using Pacific Biosciences (PacBio) sequencing (reads of 1,000+ base pairs). The region of cDNA containing the candidate novel isoform (an 888 base-pair segment in the known isoforms) was amplified by polymerase chain reaction (PCR), and the amplified isoforms were separated by gel electrophoresis (figure 5). Sanger sequencing was then performed to validate the candidate novel isoform.

Figure 4. Separation of the PBMC layer from whole blood

Figure 5. Isoform separation with a 2% agarose gel (lanes: 1 Kb DNA ladder, 200–1,000 bp; primer set 1 (F1/R1); primer set 2 (F1/R2); primer set 3 (F2/R2); bands: known isoforms, 888 bp segment; novel isoform, 637 bp segment)

Figure 6 summarizes the step-by-step process of transcriptome sequencing (RNA-seq) from isolated PBMCs. Several steps (not all shown) are involved in sequencing an entire cDNA library (Illumina) or a selected portion of it (PacBio and Sanger sequencing).

Figure 6. Step-wise representation of RNA sequencing (PBMCs → RNA extraction → cDNA synthesis (RNA-seq library formation) → (1) sequencing by synthesis (Illumina), (2) single molecule real-time sequencing (PacBio), (3) chain termination (Sanger))

RESULTS

A novel isoform was found within the coding region of the AZIN1 gene using single molecule real-time sequencing (PacBio). Figure 7 shows several cDNA reads, four of which (reads 3–6) are missing a large segment of cDNA within exons 12 and 13. It appears that an alternative splicing event creates an extra intron, 217 base pairs (bp) long, within these exons. Because 217 is not a multiple of three (217 = 3 × 72 + 1), removing this segment shifts the reading frame, which would change the terminus of the resulting protein from Ser-Asp-Glu-Asp-stop to Phe-Arg-stop. We predict the resulting transcript to be 3,986 bp long (the known variants are 4,203 and 4,348 bp long). It is significant that the new splice sites are located within the coding region of the mRNA, because this suggests that the resulting protein is directly truncated.

Figure 7. Single molecule real-time sequencing (PacBio) reveals a novel intron within exons 12 and 13 of the AZIN1 gene (reference and reads 1–8 aligned 5'→3' across exons 12 and 13; new splice sites flank the gap in reads 3–6)

The isoform was not detected originally with Illumina RNA-seq, but was found later using PacBio technology. Sanger sequencing was performed to validate the PacBio discovery. Figure 8 shows the chromatograms obtained by the chain-termination (Sanger) method for a known isoform and the novel isoform. Each nucleotide is identified by the distinct wavelength of light emitted by its fluorescently labeled chain terminator.

Figure 8. Chain-termination method (Sanger) confirms the new isoform of the AZIN1 gene (chromatograms for a known isoform and for the novel isoform, which shows the new exon/exon junction)

Figure 9 summarizes the alternative splicing event that creates the new AZIN1 isoform.

Figure 9. New isoform of the AZIN1 gene

The peaks in figure 10 (right) represent the relative gene expression of each exon of the AZIN1 gene, as detected by Illumina RNA-seq. DNAnexus was used as a genome browser, in which the horizontal green lines (top) represent the reverse strand (two known isoforms), read right to left.

Figure 10. Relative gene expression of AZIN1 exons (exons 13 to 1, read 3'→5'; tracks for the novel isoform found in this experiment and for a known isoform missing exon 3)

CONCLUSIONS

A novel isoform of the AZIN1 gene was found by combining next-generation sequencing technology with traditional Sanger sequencing. Illumina RNA-seq did not distinguish the new isoform that was found using PacBio technology. However, measuring gene expression with Illumina RNA-seq did reveal a small peak at exon 3 corresponding to a known isoform of the AZIN1 gene (figure 10). This prompted additional tests, which confirmed that the new isoform is also missing exon 3. Thus, by combining Illumina and PacBio technology, we were able to investigate the new isoform at a level that would not be possible using either technology alone. New isoforms in other genes might be found by using Illumina RNA-seq to find gene expression peaks that do not correspond to the reference genome. PacBio could then determine whether these peaks represent true isoforms, and Sanger sequencing could validate the PacBio data.

This experiment would benefit from measuring expression of the new isoform in various tissues, subjects, and time points, and from validating the novel isoform at the protein level. Additional studies could also investigate the relationship of this new isoform to disease. Moreover, next-generation technology still has limitations, so it would be prudent to investigate the strengths and weaknesses of each technology in further detail. Even so, the potential benefits of using multiple sequencing technologies to examine RNA-related events are promising.

Figure 11 shows how transcriptome analysis could be used for personalized medicine in the future. Under the larger umbrella of integrative personalized omics profiling (iPOP), transcriptome analysis can be combined with other “omes”, such as the genome, proteome, metabolome, and microbiome, to identify physiologic markers that might predispose an individual toward disease. This integration can then assist healthcare by enabling early diagnosis of disease, targeted therapeutic treatments, and disease prevention. Unfortunately, several steps are still required to make personalized medicine a reality. While the cost of genome sequencing has fallen dramatically over the past decade, the man-hours required to interpret such data are still extensive. Nevertheless, continued technological improvements will likely make personalized health care attainable in the near future.

Figure 11. Transcriptome analysis for personalized medicine (relative abundance of Gene X mRNA isoforms 1–3 in tissue samples from Person 1 and Person 2; an at-risk isoform is flagged with “Consult physician”)

ACKNOWLEDGEMENTS

Brandon Holbert was supported through an award from the U.S. Health Resources and Services Administration (HRSA), D34HP16299 (Charles Mouton, PI). Special thanks to the Snyder lab and Azad Raeisdana.

LITERATURE CITATIONS

Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, et al. (2012). Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148: 1293–1307.

Human PBMC Isolation and Counting Using the Scepter™ 2.0 Handheld Automated Cell Counter (2011). Millipore Technical Publications AN3311EN00. Web. Accessed 25 July 2013.

Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009). Circos: an information aesthetic for comparative genomics. Genome Res 19: 1639–1645.

Li-Pook-Than J, Snyder M (2013). iPOP goes the world: integrated omics profiling and road toward improved health care. Chemistry & Biology 20: 660–666.

Paris AJ, Snapir Z, Christopherson CD, Kwok SY, Lee UE, Ghiassi-Nejad Z, Kocabayoglu P, Sninsky JJ, Llovet JM, Kahana C, Friedman SL (2011). A polymorphism that delays fibrosis in hepatitis C promotes alternative splicing of AZIN1, reducing fibrogenesis. Hepatology 54: 2198–2207.
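A toy exact-match mapper can illustrate the read-mapping step described under BACKGROUND (figure 2). This is a minimal sketch. It assumes error-free reads and a short hypothetical reference, and `map_reads` is our own illustrative helper. Real aligners for 101 bp Illumina reads tolerate mismatches and search an indexed genome instead of scanning a string.

```python
from typing import Dict, List


def map_reads(reference: str, reads: List[str]) -> Dict[str, List[int]]:
    """Return every 0-based position where each read matches the reference exactly.

    Toy model only: no mismatches, indels or reverse-strand matching are allowed.
    """
    hits: Dict[str, List[int]] = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            # Search again one base further on, so overlapping matches are found too.
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical 50-base reference and 9-base reads (the real Illumina reads were 101 bp).
    reference = "TAACGGCTAACGCCCTATTTACACAAAAGAGGGCTCCACTGGATCACTGA"
    reads = ["CGGCTAACG", "AACGCCCTA", "CACAAAAGA", "GATCACTGA", "GGGGGGGGG"]
    for read, positions in map_reads(reference, reads).items():
        print(read, positions if positions else "unmapped")
```

Run as a script, it reports CGGCTAACG at position 3, AACGCCCTA at 8, CACAAAAGA at 21 and GATCACTGA at 41. It marks GGGGGGGGG as unmapped.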
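Figure 3 shows that one pre-mRNA can yield several isoforms. This can be modeled by storing each exon's sequence and joining a chosen subset in order, since intron sequence never enters the mature mRNA. This is a sketch only: the exon sequences and the helper `splice` are hypothetical. Real splicing is governed by splice-site signals that this model ignores.

```python
from typing import Dict, List


def splice(exons: Dict[str, str], keep: List[str]) -> str:
    """Join the selected exons, in the order given, into a mature mRNA sequence.

    Introns are never stored or included, which mimics their removal during splicing.
    """
    missing = [name for name in keep if name not in exons]
    if missing:
        raise KeyError(f"unknown exon(s): {missing}")
    return "".join(exons[name] for name in keep)


if __name__ == "__main__":
    # Hypothetical sequences for a gene with exons A, B and C (as in figure 3).
    exons = {"A": "ATGGCA", "B": "GGTTCA", "C": "TGGTAA"}
    isoform_1 = splice(exons, ["A", "B", "C"])  # all exons retained
    isoform_2 = splice(exons, ["A", "C"])       # exon B skipped
    print(isoform_1, len(isoform_1))
    print(isoform_2, len(isoform_2))
```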
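The RESULTS distinguish a long gap shared by reads 3–6 (figure 7) from the scattered single-base gaps typical of long-read sequencing errors. The sketch below shows that distinction. It assumes the reads are already aligned to the reference as equal-length strings, with '-' marking missing bases. The 30-base cutoff and the helpers `find_long_gaps` and `gap_support` are hypothetical choices, not parameters used in this study.

```python
import re
from collections import Counter
from typing import List, Tuple


def find_long_gaps(aligned_read: str, min_len: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) alignment columns of '-' runs at least min_len long.

    Shorter runs are treated as sequencing indels; longer runs are candidate
    introns. The end column is exclusive.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"-+", aligned_read)
        if m.end() - m.start() >= min_len
    ]


def gap_support(aligned_reads: List[str], min_len: int = 30) -> Counter:
    """Count how many reads share each long gap (identical start and end columns)."""
    return Counter(gap for read in aligned_reads for gap in find_long_gaps(read, min_len))


if __name__ == "__main__":
    # Hypothetical aligned reads of equal length; the spliced read lacks 217 bases.
    spliced = "TGATGAGCTT" + "-" * 217 + "TTCCGCTGAAGCTT"
    unspliced = "TGATGAGCT-" + "A" * 217 + "TTCCGCTGAAGCTT"
    print(find_long_gaps(unspliced))
    print(gap_support([spliced, spliced, unspliced]))
```

In the example, the single '-' in the unspliced read is ignored. The 217-column gap starting at column 10 is counted twice, once for each spliced read.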
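The RESULTS rest on a small piece of arithmetic. A 217 bp segment is not a multiple of three (217 = 3 × 72 + 1), so removing it moves every downstream codon into a new frame. The sketch below shows the effect using the standard genetic code on hypothetical toy sequences, not the AZIN1 sequence. The toy uses a 4-nt segment, which leaves the same remainder of 1. It also checks that 4,203 − 217 = 3,986. We assume the predicted length was derived from the 4,203 bp variant, since the text does not say.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon from the first base, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def shifts_frame(removed_len: int) -> bool:
    """Removing a segment shifts the downstream frame unless its length is a multiple of 3."""
    return removed_len % 3 != 0


if __name__ == "__main__":
    # Hypothetical toy sequences, not the AZIN1 sequence.
    exon_a = "ATGGCT"
    new_intron = "TCTG"  # 4 nt; like 217 nt, it leaves a remainder of 1 modulo 3
    exon_b = "ATGAAGATTAAATAA"
    print(translate(exon_a + new_intron + exon_b))  # segment retained
    print(translate(exon_a + exon_b))               # segment spliced out, frame shifted
    print(shifts_frame(217), 4203 - 217)
```

As a script it prints MASDED* for the full sequence, which ends Ser-Asp-Glu-Asp-stop like the known isoform. It then prints MAMKIK* for the shortened sequence, followed by True 3986.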
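Sanger validation (figure 8) amounts to confirming that a read spans the new exon/exon junction. Because the isoform lies on the reverse strand, a sketch should check both orientations. The helpers below and the exon-end sequences in the example are hypothetical. They only illustrate the check and assume an error-free read.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_kmer(upstream: str, downstream: str, k: int = 10) -> str:
    """Sequence spanning a splice junction: last k bases before it plus first k after it."""
    return upstream[-k:] + downstream[:k]


def supports_junction(read: str, upstream: str, downstream: str, k: int = 10) -> bool:
    """True if the read, on either strand, contains the junction-spanning sequence."""
    kmer = junction_kmer(upstream, downstream, k)
    return kmer in read or kmer in reverse_complement(read)


if __name__ == "__main__":
    # Hypothetical exon ends flanking a new splice junction.
    upstream, downstream = "ACGTTAGCCA", "GGATCCTTAG"
    spliced_read = "TTT" + upstream + downstream + "AAA"
    unspliced_read = upstream + "C" * 8 + downstream  # retained sequence interrupts the junction
    print(supports_junction(spliced_read, upstream, downstream, k=5))
    print(supports_junction(reverse_complement(spliced_read), upstream, downstream, k=5))
    print(supports_junction(unspliced_read, upstream, downstream, k=5))
```

The first two checks return True, for the forward and reverse-complemented spliced read. The third returns False because retained sequence interrupts the junction.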
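The RESULTS note that each nucleotide is identified by the wavelength emitted by its dye-labeled terminator. A toy base caller makes that concrete: at each peak, it reports the base whose fluorescence channel is brightest. This sketch assumes peak positions are already known and intensities are normalized, and `call_bases` is a hypothetical helper. Real base callers such as phred also model peak spacing and assign per-base quality scores.

```python
from typing import Dict, List


def call_bases(channels: Dict[str, List[float]], peaks: List[int], min_signal: float = 0.0) -> str:
    """Call one base per peak: the dye channel with the highest intensity at that position.

    Peaks whose strongest signal is below min_signal are reported as 'N' (ambiguous).
    """
    calls = []
    for p in peaks:
        best = max(channels, key=lambda base: channels[base][p])
        calls.append(best if channels[best][p] >= min_signal else "N")
    return "".join(calls)


if __name__ == "__main__":
    # Hypothetical normalized intensities for four dye channels at four peak positions.
    channels = {
        "A": [0.9, 0.1, 0.0, 0.2],
        "C": [0.1, 0.8, 0.1, 0.1],
        "G": [0.0, 0.1, 0.7, 0.1],
        "T": [0.0, 0.0, 0.1, 0.15],
    }
    print(call_bases(channels, [0, 1, 2, 3], min_signal=0.3))
```

With the 0.3 threshold, the weak fourth peak is reported as N, giving ACGN. Without a threshold, it would be called as A.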
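The per-exon peaks in figure 10 can be approximated by averaging read coverage over each exon and scaling by the most highly covered exon. A markedly lower value, like the small exon 3 peak discussed in the CONCLUSIONS, can hint that some transcripts skip that exon. This is a sketch on hypothetical coverage values, with hypothetical helper names and an arbitrary 0.5 cutoff. It does not reproduce how the DNAnexus browser computes its display.

```python
from typing import Dict, List, Tuple


def relative_exon_expression(coverage: List[float], exons: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Mean per-base coverage of each exon, scaled so the most highly covered exon is 1.0.

    Exon coordinates are 0-based, half-open [start, end) positions in the coverage list.
    """
    means: Dict[str, float] = {}
    for name, (start, end) in exons.items():
        span = coverage[start:end]
        means[name] = sum(span) / len(span) if span else 0.0
    top = max(means.values(), default=0.0)
    return {name: (mean / top if top else 0.0) for name, mean in means.items()}


def low_exons(relative: Dict[str, float], cutoff: float = 0.5) -> List[str]:
    """Exons whose relative expression falls below a (hypothetical) cutoff."""
    return [name for name, value in relative.items() if value < cutoff]


if __name__ == "__main__":
    # Hypothetical coverage for three 10-base exons; exon B is covered far less.
    coverage = [100] * 10 + [20] * 10 + [80] * 10
    exons = {"exon A": (0, 10), "exon B": (10, 20), "exon C": (20, 30)}
    relative = relative_exon_expression(coverage, exons)
    print(relative)
    print(low_exons(relative))
```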
