Download Poster: Pyo, Chul-Woo et al. (2015) Complete resequencing of

Complete resequencing of extended genomic regions using fosmid target capture and single molecule real-time (SMRT®) long read sequencing technology.

Chul-Woo Pyo, Kevin Wang, Cynthia Vierra-Green, Ruihan Wang, Yoon Soo Pyon, Kevin Eng, Richard Hall, Swati Ranade, Daniel E. Geraghty

Fred Hutchinson Cancer Research Center, Seattle, WA
National Marrow Donor Program, Minneapolis, MN
Pacific Biosciences of California Inc., Menlo Park, CA

MHC conserved extended haplotypes and autoimmune disease

Abstract

A longstanding goal of genomic analysis is the identification of causal genetic factors contributing to disease. While the common disease/common variant hypothesis has been tested in many genome-wide association studies, few advancements in identifying causal variation have been realized, and instead recent findings point away from common variants towards aggregate rare variants as causal. A challenge is obtaining complete phased genomic sequences over extended genomic regions from sufficient numbers of cases and controls to identify all variation that could be causal for a disease. To address this, we modified methods for targeted DNA isolation using fosmid technology and single-molecule, long-sequence-read generation that combine for complete, haplotype-resolved resequencing across extended genomic subregions. As proof of principle, we validated the approach by resequencing four 800 kbp segments that span a major histocompatibility complex (MHC) common extended haplotype (CEH) associated with disease. The data revealed the extent of conservation, exposing near identity among four DR4 CEHs over conserved regions, detailing rare variation and measuring sequence accuracy. In a second test, we sequenced the complete KIR haplotypes from 8 individuals within a specific timeframe and cost. Single molecule long-read sequencing technology generated contiguous full-length fosmid sequences of 30 to 40 kb in a single read, allowing assembly of resolved haplotypes with very little data processing. All of the sequences produced from these projects were contiguous and phased, with accuracy above 99.99%. The results demonstrated that cost-effective scale-up is possible to generate scores to hundreds of phased chromosomal sequences of extended lengths that can encompass genomic regions associated with disease.

Resequencing the KIR genomic region.
Bounded by FCAR and ILTs
Genomic phase underlies expression and function
Activating and inhibitory receptors for MHC class I ligands
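To put the accuracy figure quoted in the abstract in concrete terms, the following minimal sketch converts a per-base accuracy into a Phred-scaled quality value and into an upper bound on expected errors across one 800 kbp segment. The function names are hypothetical, and the calculation assumes errors are independent and uniformly distributed, which the poster does not state.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality value."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(region_bp: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a region of region_bp bases.

    Assumes independent, uniformly distributed errors (an illustrative
    simplification, not a claim about the actual error profile).
    """
    return region_bp * (1.0 - accuracy)


if __name__ == "__main__":
    accuracy = 0.9999      # lower bound reported in the abstract
    region_bp = 800_000    # one 800 kbp resequenced segment
    print(f"Phred quality: Q{phred_from_accuracy(accuracy):.0f}")
    print(f"Expected errors over {region_bp} bp: {expected_errors(region_bp, accuracy):.0f}")
```

Because the abstract reports accuracy "above" 99.99%, both numbers are bounds: quality is at least Q40, and the expected error count per 800 kbp segment is at most about 80 bases under the stated assumption.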
Extension and intersection of target autoimmune diseases over common extended haplotypes in the MHC.
KIR gene family – Evidence of functional selection for haplotype and allelic diversity
Single tube fosmid target capture workflow.
DR4 chromosome sequences reveal segments unique to T1D
Fosmid target capture workflow.

Simplifications in the fosmid-based targeted sequencing workflow. Modifications were made as follows: (1) simplified library construction, including eliminating pulsed-field gel fractionation and implementing library bar codes; (1, lower) eliminating content mapping through single tube libraries for (3) direct clone recovery through recombineering, followed by (4) Pacific Biosciences single molecule long-read sequencing technology.
Sequence alignment and variation profile of four DR4 haplotypes. Multiple sequence alignment was performed using MAFFT. Nucleotide diversity Pi (y-axis) was measured using a sliding window (window length 100 sites, step size 100 sites) with DnaSP v5. Visualization and editing of the alignment were carried out with Geneious v5.5 (www.geneious.com). The red box encompasses sequences that are identical in cases but different from the control. The four sequences are publicly available in GenBank under accession numbers KJ657694-7.

Four steps in a modified fosmid-based targeted sequencing workflow. Modifications were made in each of the four steps as follows: (1) increased library density; (2) molecular inversion probes for content mapping; (3) direct clone recovery through recombineering; (4) Pacific Biosciences single molecule long-read sequencing technology.

Reads Spanning 40 kb Generated on PacBio RS II
[Figure. Labels recovered: 38,907 kb; 44 full-length reads with many partials >20 kb from a linearized fosmid SMRTbell™ library. KIR gene labels: 3DL3, 3DL3, 2DL3, 2DS2, 2DP1, 2DL2, 3DP1, 2DL1/S1, 2DL4, 3DL2, 3DL1, 2DS4, 3DL2.]
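The sliding-window nucleotide diversity described in the alignment caption above (window length 100 sites, step size 100 sites) can be illustrated with a short sketch. This is a simplified calculation, not a reimplementation of DnaSP: it takes Pi as the average number of pairwise differences per site, skips alignment columns that contain gaps or ambiguous bases, and uses hypothetical function names and toy sequences.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def window_pi(seqs, start, end):
    """Average pairwise differences per site over alignment columns [start, end).

    Columns where any sequence has a gap or ambiguous base are skipped
    (an assumed simplification; DnaSP offers its own gap handling).
    """
    cols = [i for i in range(start, end) if all(s[i] in VALID_BASES for s in seqs)]
    if not cols:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(1 for a, b in pairs for i in cols if a[i] != b[i])
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=100, step=100):
    """Return (window_start, pi) tuples along an aligned set of sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (start, window_pi(seqs, start, min(start + window, length)))
        for start in range(0, length, step)
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical data), with a small window for readability.
    alignment = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTTCGA"]
    for start, pi in sliding_pi(alignment, window=4, step=4):
        print(start, round(pi, 4))
```

With window and step both set to 100, the windows do not overlap, so each plotted point summarizes an independent 100-site block of the alignment.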
Full length fosmid sequences enable trivial assembly of phased haplotype sequences

References
1. Pyo, C.W., Wang, R., Vu, Q., Cereb, N., Yang, S.Y., Duh, F.M., Wolinsky, S., Martin, M.P., Carrington, M., and Geraghty, D.E. (2013). Recombinant structures expand and contract inter and intragenic diversification at the KIR locus. BMC Genomics 14, 89.
2. Shen, S., Pyo, C.W., Vu, Q., Wang, R., and Geraghty, D.E. (2013). The essential detail: the genetics and genomics of the primate immune response. ILAR Journal 54, 181-195.
3. Pyo, C.W., Wang, K., Shen, S., Li, C., Wang, R., Vu, Q., Eng, K., Bowman, B., Ranade, S., and Geraghty, D.E. (2015). Haplotype-resolved resequencing of conserved extended haplotypes associated with disease using fosmid target capture and single molecule sequencing.
4. Roe, D., Vierra-Green, C., Pyo, C.W., Eng, K., Hall, R., Spellman, S., Ranade, S., Geraghty, D.E., and Maiers, M. (2015). Diploid Sequences of the KIR Region on 8 Individuals using Single Molecule, Real-Time Sequencing.
5. Nelson, W.C., Pyo, C.W., Vogan, D., Wang, R., Pyon, Y.S., and Geraghty, D.E. (2015). An integrated genotyping approach for HLA and other complex genetic systems. Hum Immunol.

Summary
Sequences within the 325 kbp shared by the DR4 haplotypes are not by themselves causal of disease or, alternatively, point to the possibility of epigenetic causal variants. Clear boundaries distinguishing cases and controls exist, suggesting regions containing causal variants. A simple visual inspection is the only data analysis required to focus attention on a region distinguishing T1D cases from DR matched controls.
•  The biological challenge is the identification of causal genetic variation in common diseases.
•  The technical challenge in response is to obtain high quality contiguous sequences from targeted genomic regions from multiple cases and controls.
•  Fosmid target capture combined with single molecule real-time sequencing can meet the biological challenge.
•  The single-tube library approach using KIR probes was effective in recombineering, depending on characteristics of the targeting probe, further simplifying fosmid target capture.
•  Additional development will streamline the single tube fosmid workflow and extend targeting to other genomic regions.
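The case-versus-control comparison described in the summary, which the poster performs by visual inspection, can also be sketched programmatically. The example below is a hypothetical illustration rather than the authors' analysis: it scans an alignment for columns where all case haplotypes carry the same base and the control differs, then merges adjacent columns into intervals. Function names, the `max_gap` parameter, and the toy sequences are all assumptions.

```python
def case_specific_columns(cases, control):
    """Return alignment columns where all cases agree and differ from the control."""
    length = len(control)
    if any(len(c) != length for c in cases):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i
        for i in range(length)
        if len({c[i] for c in cases}) == 1 and cases[0][i] != control[i]
    ]


def merge_runs(columns, max_gap=0):
    """Merge sorted column indices into inclusive (start, end) intervals.

    Columns separated by at most max_gap unflagged columns are joined.
    """
    runs = []
    for col in columns:
        if runs and col - runs[-1][1] <= max_gap + 1:
            runs[-1][1] = col
        else:
            runs.append([col, col])
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    # Toy haplotypes (hypothetical data): three cases and one matched control.
    cases = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
    control = "ACCTACTA"
    cols = case_specific_columns(cases, control)
    print(cols)
    print(merge_runs(cols))
```

On real haplotype alignments, a nonzero `max_gap` would let isolated shared sites inside a divergent segment be absorbed into one interval, which is closer to how a boundary appears in a diversity plot.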