Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1028 From Tissue to Mutations Genetic Profiling of Colorectal Cancer LUCY MATHOT ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2014 ISSN 1651-6206 ISBN 978-91-554-9042-3 urn:nbn:se:uu:diva-231508 Dissertation presented at Uppsala University to be publicly examined in Rudbecksalen, Rudbecklaboratoriet, Dag Hammarskjölds väg 20, Uppsala, Friday, 7 November 2014 at 09:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Assc. Professor Alberto Bardelli (University of Torino). Abstract Mathot, L. 2014. From Tissue to Mutations. Genetic Profiling of Colorectal Cancer. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1028. 47 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9042-3. Comprehensive characterisation of the mutational landscapes of solid tumours is a multistep process involving the collection of suitable samples, the extraction of nucleic acids and the preparation of these materials for mutational analyses. In this thesis, I aimed to develop a streamlined process for the analysis of colorectal cancer (CRC) patient samples in order to identify novel mutations that hallmark the development of advanced disease. Papers I and II outline a technique for serial extraction of nucleic acids from frozen tissue that we developed and subsequently implemented on a robotic platform to enable high-throughput processing. The extracted nucleic acids were validated in downstream processes relevant for genetic analyses, including traditional Sanger and next generation sequencing techniques. In Paper III, we developed a genotyping method based on multiplex ligation-dependent genome amplification. The method was designed such that InDel polymorphisms of between 30 and 70 % prevalence in a European population were selected and amplified in a multiplex PCR assay. DNA from 24 patient-matched colorectal tumour and normal tissues was genotyped and paired with a high match probability. In Paper IV, we performed targeted resequencing of 107 primary CRCs, of which approximately half developed metastatic disease or had distant metastases at the time of diagnosis. We chose to analyse 676 genes based on their involvement in key signalling pathways in CRC. We found an enrichment of mutations in the Eph receptor tyrosine kinase gene family in metastatic patients, indicating a potential role for these genes in CRC metastasis. This thesis outlines a series of procedures that can be employed in a high-throughput setting for the analysis of solid tumours. We applied these methods to the analysis of colorectal tumours and propose a link between novel somatic mutations and metastatic disease. Keywords: Nucleic acid extraction, genotyping, targeted sequencing, mutation analysis, colorectal cancer, metastatic disease Lucy Mathot, Department of Immunology, Genetics and Pathology, Rudbecklaboratoriet, Uppsala University, SE-751 85 Uppsala, Sweden. © Lucy Mathot 2014 ISSN 1651-6206 ISBN 978-91-554-9042-3 urn:nbn:se:uu:diva-231508 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-231508) Éist le fuaim na habhann agus gheobhfaidh tú breac. Do mo thuismitheoirí List of Papers This thesis is based on the following papers, which are referred to in the text by their Roman numerals. I Mathot, L., Lindman, M. and Sjöblom, T. (2011) Efficient and scalable serial extraction of DNA and RNA from frozen tissue samples. Chem Commun. 47: 547-549 – Reproduced by permission of The Royal Society of Chemistry. II Mathot, L., Wallin, M. and Sjöblom, T. (2013) Automated serial extraction of DNA and RNA from biobanked tissue specimens. BMC Biotechnol, 13, 66. III Mathot, L., Falk-Sörqvist, E., Moens, L., Allen, M., Sjöblom, T. and Nilsson, M. (2012) Automated genotyping of biobank samples by multiplex amplification of insertion/deletion polymorphisms. PloS ONE, 7, e52750. IV Mathot, L.*, Ljungström, V.*, Moens, L., Adlerteg, T., FalkSörqvist, E., Mayrhofer, M., Sundström, M., Micke, P., Botling, J., Isaksson, A., Birgisson, H., Glimelius, B., Nilsson, M† and Sjöblom, T†. (2014) Somatic Eph receptor mutations in primary colorectal cancers are associated with metastasis. Manuscript. * These authors contributed equally to this work. † Equal last authors. BMC Biotechnology and PLOS ONE are open-access journals and the authors retain rights to their work. Contents Introduction ................................................................................................... 11 Cancer ....................................................................................................... 11 Genes and pathways involved in cancer development ........................ 11 Genomic instability in cancer .............................................................. 12 Characterisation of cancer genomes on a large scale ............................... 13 Sample availability and quality ............................................................ 13 Nucleic acid isolation ........................................................................... 14 Sample identification and matching..................................................... 16 Mutational analysis .............................................................................. 17 Colorectal cancer ........................................................................................... 22 Epidemiology and aetiology ................................................................ 22 Pathology and staging .......................................................................... 23 Genetic basis of disease development.................................................. 23 Instability phenotypes in colorectal cancer .......................................... 24 Candidate CRC driver mutations ......................................................... 26 Therapeutic interventions..................................................................... 27 Prognostic markers in metastatic disease ............................................. 28 Present investigations .................................................................................... 30 Aims.......................................................................................................... 30 Serial extraction of DNA and RNA from frozen tissue............................ 31 Genotyping of patient-matched tumour/normal tissue ............................. 32 Deep sequencing reveals mutations associated with CRC metastasis ...... 33 Concluding remarks and future perspectives ................................................ 35 Acknowledgements ....................................................................................... 38 References ..................................................................................................... 40 Abbreviations CIMP CIN CRC ctDNA DNA DSBR FAP FFPE HNPCC InDel MLGA MMR MSI MSS NAPI NGS PARP PCR RNA RT-PCR SNP STR T/N TNM TSG 5-FU CpG island methylator phenotype Chromosomal instability Colorectal cancer Circulating tumour DNA Deoxyribonucleic acid Double strand break repair Familial adenomatous polyposis Formalin-fixed, paraffin-embedded Hereditary non-polyposis colorectal cancer Insertion/deletion polymorphism Multiplex ligation-dependent genome amplification Mismatch repair Microsatellite instability Microsatellite stable Nucleic acid purification and isolation Next generation sequencing Poly ADP ribose polymerase Polymerase chain reaction Ribonucleic acid Reverse transcription polymerase chain reaction Single nucleotide polymorphism Short tandem repeat Tumour/normal Tumour, nodes, metastases Tumour suppressor gene 5-Fluorouracil Introduction Cancer Cancer is a genetic disease, caused by mutations in genes controlling proliferation and the regulation of cellular activities. Genetic alterations found in cancer cells include subtle alterations (point mutations, small insertions and deletions) and chromosomal aberrations (translocations, insertions and deletions). Some of these cancer-causing mutations are present in the germline, resulting in a hereditary susceptibility to cancer; however, the vast majority are somatic mutations, accumulated over time. In common solid tumours, it is estimated that an average of between thirty-three and sixty-six genes display subtle somatic alterations1. Consequently, no single gene defect causes cancer, as may be the case in other genetic diseases, such as Cystic fibrosis or Duchenne muscular dystrophy. Instead, cancer development is a multistep process, where some of the mutations acquired provide a growth advantage to the cell, causing it to proliferate uncontrollably2,3. Genes and pathways involved in cancer development Although much remains to be discovered about the complexities of cancer development, in particular in terms of the progression to advanced disease stages, many of the genes and alterations responsible have been identified. Genes involved in cancer development can be divided into oncogenes, tumour suppressor genes (TSGs) and stability genes4. Oncogenes and TSGs, when mutated, drive disease progression by stimulating cell growth through constitutive activation of a proliferation signal, or by inhibiting cell-cycle arrest or cell death, respectively. For oncogenes, the activation of one allele is generally sufficient to confer a growth advantage to the cell, e.g. activating KRAS mutations in colorectal cancer (CRC). Oncogene activation can arise by point mutations, gene amplifications, chromosomal translocations and genomic rearrangements creating fusion genes. Conversely, for TSGs, both alleles generally require inactivation, e.g. APC mutations and loss of heterozygosity in CRC. This is known as the “two-hit” hypothesis, first proposed by Knudson in 19715. Like oncogenes, the inactivation of TSGs can also occur through various mechanisms; point mutations resulting in missense or nonsense mutations that render the protein product inactive or truncated, small insertions/deletions, deletion of a chromosomal region (or entire 11 chromosomes), rearrangements disrupting the coding sequence or silencing by DNA hypermethylation. Mutations in, or the dysregulation of stability genes promote tumourigenesis indirectly, by increasing mutation rate. Stability genes maintain genomic integrity and are involved in repairing and sustaining the genome during normal replication and following exposure to mutagens6. This class of genes include those involved in base excision repair, nucleotide excision repair, mismatch repair (MMR) and double-strand break repair (DSBR). Inactivation of these stability genes leads to an overall increased mutation rate, termed genomic instability, increasing the probability for a cancer-initiating mutation. Although there are many genes mutated in cancer, only a limited number of pathways are affected. Mutations in genes belonging to the same pathway lead to similar phenotypes. Alteration events within the same pathway are generally mutually exclusive, i.e. a mutation of one gene in the pathway is sufficient and it is unlikely that two genes of the same pathway will be affected in cancer4. Understanding which pathway the mutated gene belongs to can provide opportunities for therapeutic intervention. Genomic instability in cancer Genomic instability has been described as an “enabling characteristic” in cancer whereby an increased mutation rate is often required to accelerate tumour progression2,3. The maintenance of genomic integrity is usually so well-controlled by surveillance systems involving stability genes that mutation rates in normal cells are low (1 – 2.5 × 10-8 mutations per generation7,8) and a sufficient number of mutations would take a longer time to accumulate without increasing mutability. Instability in cancer is linked with almost all solid tumours9, both hereditary and sporadic. In some hereditary cancer syndromes, a germline mutation in one of the stability genes results in a predisposition to cancer development, often at a younger age compared to sporadic cancers of the same tissue of origin. For example, mutations involved in genes regulating DSBR such as BRCA1 and BRCA2 predispose to breast cancer while germline mutations in MMR genes in hereditary non-polyposis colorectal cancer (HNPCC) predispose to CRC10. In non-inherited cancers, the situation is more complex. In many sporadic tumours, where no known predisposing condition exists, it is unclear whether an increased mutability is required for cancer initiation, or if it is sufficient for genomic instability to be acquired during progression. In addition, conflicting studies on CRCs have reported both a lack of evidence of genomic instability in early lesions while, conversely, mathematical models have predicted that instability is a required initiating event11,12. Intriguingly, some solid tumours do not show genomic instability, indicating that some 12 cancers develop with a normal mutation rate, or one that is increased by extrinsic factors. Bearing the inherent genomic instability of cancer in mind, it is evident that the delineation of driver mutations in cancer is not a trivial task, since many other genes become mutated along the way, i.e. passengers, which are not required for tumour progression. Driver mutations are those driving tumour progression, they are selected for and confer a growth advantage to the cell (an estimated 0.4 % increase in the difference between cell birth and cell death rate for each driver mutation)13. Passenger mutations in cancer are those that arise as a result of instability and increased cell division but which do not confer a growth advantage to the cell. Therefore, passenger mutations are somatic but not necessarily selected for1. As a consequence, it is important to be able to distinguish passenger mutations from mutations that are driving tumour growth, since these are the alterations that must be targeted in order to halt or decrease the rate of tumour progression. Studies suggest that the majority of solid tumours require five to eight driver mutations to develop, however, this may be an underestimation, and more driver genes are likely to be discovered as technologies improve1. Characterisation of cancer genomes on a large scale Large-scale efforts to systematically characterise many different types of solid tumours at a mutational level have been a major focus of the cancer genetics field for the past decade14–24, greatly broadening our knowledge of cancer genomes and the complexities underlying this disease. Characterisation of the cancer genome is a multi-step process, with each step presenting its own challenges and pitfalls. From the time a sample is taken from the patient to the final diagnosis or new research finding, many factors must be taken into consideration. A properly conducted study requires adequate sample storage facilities, means to extract nucleic acids in a robust fashion, quality control of these extracted components and sensitive techniques for downstream applications such as mutation detection by sequencing. Sample availability and quality Translational cancer research is dependent on the availability of a large number of samples of high quality. Gaining access to patient material has been identified as one of the main bottlenecks in large-scale cancer research efforts25. Issues with consent, ethical permits and the availability of the correct samples for each kind of analysis performed are some of the difficulties commonly encountered. Biobanking, particularly in the field of cancer research, is becoming an increasingly important resource. Tumour biobanks are specimen and patient 13 information storage facilities that are designed to collect, store and distribute samples of tumour and normal tissue26. These samples are intended for further use in fundamental and translational cancer research and greatly impact the development of targeted or personalised anti-cancer treatment. A large tumour sample repository is vital for studies such as large-scale next generation sequencing (NGS) efforts, where hundreds to thousands of tumour/normal (T/N) paired specimens are necessary for deducing less commonly mutated genes that may be involved in tumourigenesis27. Issues of biobanking in cancer research Researchers face two major problems when it comes to cancer biobanking. Firstly, that they cannot always gain access to a sufficient number of samples and secondly, that the available samples are not suitable for the required purposes28. In addition, although biospecimens stored in biobanks are generally considered to give an accurate representation of the biology of a patient's disease, this is often not the case. The biospecimen may instead reflect a combination of disease biology and the biospecimen's response to a wide range of biological stresses, thus providing irreproducible data and confounding the results of many downstream applications, which ultimately leads to the misinterpretation of artefacts29. As biospecimen quality is a critical issue for tumour banks25,30, cancer tissue biobanking involves striking a balance between maintaining diagnostic utility of tissue specimens while preserving nucleic acid integrity31. This is important since biological materials are vulnerable to rapid change once removed from the patient and need to be preserved so that the in vitro state resembles as closely as possible that of the in vivo state28. Nucleic acid isolation Nucleic acid purification and isolation (NAPI) from biological specimens is a process that has been employed by researchers for more than a century, ever since Miescher isolated “nuclein” from leukocytes in 186932. Since then, several techniques have been developed for optimal isolation of both DNA and RNA from biological sources. Methods of nucleic acid preparation from tissues When dealing with tissue samples, the specimen may be fresh, frozen or stabilised in a fixative and each type needs to be processed differently depending on the nature of the sample. Fresh and frozen tissues produce biomolecules of higher integrity than fixed specimens and require less processing prior to nucleic acid isolation33. Fresh tissues, however, are susceptible to endonuclease digestion unless processed immediately. Fixed specimens have been chemically modified to preserve biomolecule integrity for long periods of storage at room temperature. The most commonly used fixa14 tive is formaldehyde, which denatures DNA and introduces sequencing artefacts by crosslinking cytosine nucleotides34–36. Tissues fixed in formaldehyde are generally embedded in paraffin wax to maintain tissue morphology37. Hence, paraffin must be removed prior to nucleic acid isolation either by heat treatment or removal using an organic solvent such as xylene38. Formalin-fixed, paraffin-embedded (FFPE) tissue also affects the quality of RNA that can be extracted, often due to incomplete lysis and high tissue content of RNases that can degrade RNA during fixation39. To recover the cell-associated and tissue-linked molecules, in addition to removing debris and large molecules which may interfere with the extraction of nucleic acids, the sample must first be disrupted by homogenisation, using physical means, enzymatic digestion, detergents or a combination of all three. Physical disruption utilises various apparatuses and technologies, e.g. bead mill homogenisers, shaking-type bead mills, blade homogenisers, freeze fracturing, grinders etc.40. Enzymatic digestion techniques such as the use of proteinase K are widely employed for the removal of contaminating proteins and nuclease digestion. An additional advantage of proteinase K in nucleic acid isolation is that it retains activity in the presence of denaturing agents and detergents which are commonly used for destabilising and removing proteins when isolating DNA and RNA from complex biological mixtures41,42. Following sample disruption and denaturation, various techniques can be used to isolate nucleic acids (Table 1). Many of these are traditional, wellestablished methods and may be employed either in liquid phase with the application of organic solvents (phenol-chloroform followed by ethanol precipitation)43 or salting out procedures44, or in solid phase with the selective binding of the nucleic acid to a solid support (mainly silica) under chaotropic conditions45–47. High-throughput tissue processing NAPI is critical for many major analytical procedures and poor sample preparation leads to sub-optimal results in downstream processes. Recent genome sequencing efforts have led to an increase in the number of samples that nucleic acids are routinely prepared from. Consequently, there is a requirement not only for high quality biomolecules, but also for methods by which samples can be prepared in a high-throughput manner. It is also desirable that both DNA and RNA can be serially extracted from the same tissue lysate, not only to enable certain biological questions to be addressed e.g. relating genomic and transcriptomic analyses, but also to reduce consumption of both biological samples and expensive reagents. Robotic platforms provide a means for simplifying many routine laboratory procedures, such as nucleic acid extraction. High-throughput processing allows for the simultaneous preparation of many samples, thus reducing the need for time-consuming and labour-intensive protocols. Automating a pro15 cess calls for a reduction in the use of instrumentation such as centrifuges, since these components are some of the primary causes of system failures. The use of robotic platforms when extracting nucleic acids is increasing, with many suppliers of nucleic acid extraction kits offering an automatic alternative. Most of these instruments use solid phase extraction methods, where silica columns or magnetised beads facilitate the isolation of DNA and RNA based on hydrogen-binding interactions. However, magnetic separation techniques, such as the method described in Papers I and II, have an advantage over column-based techniques when it comes to automation since they obviate the need for centrifugation. Consequently, magnetic separation is a simple and efficient way to separate nucleic acid-bound particles during binding and elution, and does not generate shear forces that may lead to nucleic acid degradation48. Table 1. Established and routinely used NAPI techniques. Isolation technique Suitable sample types Phenolchloroform43 Whole blood, cultured cells, tissue homogenates Silica or glass particles45–47 Salting out44 Whole blood, cultured cells, tissue homogenates Whole blood, cultured cells, tissue homogenates Advantages Disadvantages Suitable for automation* Inexpensive, high DNA yield Time consuming, phenol is corrosive, toxic and flammable, interference downstream No Flexible, rapid, yields of high molecular weight Laborious, low yields No Non-toxic Laborious, requires centrifugation No Chromatography (anionexchange and silica gel membranes)45– Non-toxic, no mechanical Nucleic acids may Yes, but All sample disruption, bind irreversibly, centrifugation eluted nucleic types columns become may cause acids are ready overloaded problems 47,49 to use Non-toxic, no Lower yields than shearing of Magnetic bead other silica-based nucleic acids, Most sample Yes based isolatmethods, nucleic eluted nucleic types ion50–52 acids may bind acids are ready irreversibly to use *Suitability for automation based on the use of non-toxic substances and minimal additional components on the robotic platform/liquid handler. Sample identification and matching In recent years, whole genome sequencing efforts have revealed the extent of human genetic variation53. Genotyping techniques have made use of these variable regions in order to successfully differentiate and match individuals 16 at a genetic level. This is becoming increasingly important in large biobanks, where sample mix-up due to manual handling is a rare yet inevitable occurrence leading to misinterpretation of results. There are usually at least two tissue specimens per cancer patient (tumour and normal sections for comparison), and more for those with metastases. It is vital that large-scale genomic analyses utilise reliable input data, and a major feature of this requirement is that patient data is matched to the correct sample. In order to accomplish this, it is desirable that all sample pairs (i.e. tumour and normal tissue) are typed and matched following nucleic acid extraction to ensure that no mix-ups have occurred. While essential, tissue matching can be labour-intensive and time-consuming, and should ideally be carried out in an automated fashion in 96- or 384-well format in order to be feasible for large studies. Many of the available commercial kits for genotyping are simple and PCR-based but are time-consuming, may require expensive analysis software such as GeneMapper® (Applied Biosystems), or are not sufficiently discriminatory to type samples on a stand-alone basis e.g. the PCR-based Investigator DIPplex Kit from Qiagen54. Genotyping techniques commonly make use of highly polymorphic short tandem repeats (STRs) found throughout the genome. These di-, tri- or tetranucleotide repeats show a high level of allelic variation with regard to the number of repeats and can be analysed using simple PCR-based techniques. However, biallelic variations such as single nucleotide polymorphisms (SNPs) and insertion/deletion (InDel) polymorphisms have recently been a subject of growing interest in the forensic field55. Despite being less informative than STRs, InDels have a number of advantages, including (1) high abundance, (2) high variability within and between populations (3) location within regions of genomic stability (4) ability to type short fragments in degraded samples and (5) lower mutation rates than STRs. Like STRs, it is also possible to use multiplex PCR-based genotyping methods to study InDels56, which we make use of in the method described in Paper III. Mutational analysis As cancer is essentially a genetic disease, resulting from the accumulation of mutations over time, identifying genetic alterations that arise specifically in tumour cells forms the basis for studying the disease at a genetic level. In order to do this, the genomic sequence in both the tumour and normal cells must be determined and compared, thus generating a list of somatic mutations. Many of the most frequently mutated genes in a variety of cancer types have been identified in this way using traditional methods like Sanger sequencing. Technological improvements in sequencing during the past twenty years have produced more sensitive sequencing methods capable of detecting less frequently mutated genes present in subpopulations of the tumour. However, the identification of genetic mutations with a low fre17 quency of occurrence requires a large number of samples to confidently identify less common driver mutations. Next generation sequencing NGS consists of sequencing technologies that generally rely on a combination of template preparation, sequencing, imaging and genome alignment57. Several commercially available platforms e.g. SOLiD (ABI), 454 (Roche), Illumina (Illumina Inc.), PacBio RS II (Pacific Biosystems), Ion Torrent and Ion Proton (Life Technologies) use massively parallel sequencing technologies with distinct nucleotide signal detection methods to generate millions of reads of DNA/RNA sequence. Additional, more recent techniques include the Oxford Nanopore Technologies’ strand sequencing method, which aims to provide longer read lengths without using amplification, labelling or optical instrumentation; instead individual DNA strands are passed through a nanopore by a processive enzyme, generating electronic signals58,59. NGS has been widely implemented for de novo genome sequencing, targeted resequencing of exons, transcriptome sequencing and epigenome sequencing and has played a particularly important role in cancer research with worldwide efforts to catalogue all mutations found in each cancer type ongoing. Much of the large-scale somatic mutation analyses in cancer have utilised the Illumina sequencing technique17–24, requiring three major processing steps; 1) library preparation of genomic DNA fragments, 2) amplification of templates by isothermal ‘bridging’ to form DNA clusters and 3) sequencingby-synthesis where fluorescently-labelled reversible terminators are incorporated (Figure 1). Incorporated nucleotides are then identified by excitation of the fluorophores and imaging of fluorescent signals60. Illumina technology can be employed for whole genome, exome, transcriptome or targeted gene panel sequencing using a variety of DNA library preparation techniques and sequencing platforms. NGS has revealed a large number of previously unknown alterations in cancer, including point mutations, InDels, copy number alterations and structural rearrangements. However, the number of novel recurrent mutations found by NGS efforts has been low in many cancer types, partly due to the fact that many of the important tumour initiating mutations are present at a high frequency and thus suitable for detection by older techniques. NGS efforts have generated a vast quantity of data, increasing the complexity of the categorisation and classification of each cancer type on a genetic level. However, functional studies to assign a role to these mutations in cancer have not yet been completed for most of these novel alterations. Additionally, it has been demonstrated that there are a higher number of unique causal gene mutations in each tumour than previously thought and significant intraand inter-tumour heterogeneity lead to highly complex mutational profiles61. Recently, studies that elucidated mutational spectra at single cell level in breast cancer have further characterised the complexities of heterogeneity in 18 cancer, indicating that bulk analysis of tumours may conceal important lesions contained within individual tumour cell subpopulations, and even within individual tumour cells62,63. Appreciating the implications of these findings and the frequency of unique or private events within individual tumours provides us with an incentive to work towards personalised diagnostics and treatment. a. b. c. d. e. f. g. h. A T G i. C Figure 1. Illumina cluster generation and sequencing-by-synthesis chemistry. (a) DNA fragments with adaptor sequence on either end. (b) Denatured singlestranded fragments are annealed to complementary oligonucleotides on the flow-cell surface. (c) Solid-phase bridge amplification. (d) Nucleotides are incorporated to build a double-stranded bridge. (e) Templates are denatured to produce singlestranded templates anchored to the flow-cell surface. (f) The process is repeated to produce a cluster of molecules. (g) Reverse stands are removed. (h) Sequencing primers are annealed to the single-stranded template and four labelled reversible terminators are added to the flow-cell. (i) Fluorescently-labelled nucleotides are incorporated based on the sequence of the template. Targeted sequencing strategies Poor depth of coverage (the average number of times each nucleotide is sequenced) in whole genome and exome sequencing results in insufficient data for the detection of low frequency variants in samples with low tumour cell content or a high degree of heterogeneity. In addition, despite the cost reduc19 tion in NGS in recent years, storage of large datasets and data analysis requirements mean that costs remain high for comprehensive genome or exome sequencing. In cancer diagnostics, depth of coverage of approximately 1000-fold is desirable, whereas 30-fold coverage may suffice in constitutional disease testing64. In addition, the expense involved in sequencing an entire genome or exome to identify a small number of mutations that may be relevant for a clinician or to provide the basis for the choice of a particular therapy means that these strategies are not feasible outside the field of research. Instead, focus is now on the use of targeted NGS whereby a limited (and clinically relevant) gene set is selected and enriched for64. In a clinical setting, where sample volume is high but the number of targetable mutations is relatively low, it is more time and cost-effective to sequence only the genomic regions of interest65. Indeed, sequencing has already been implemented within clinical settings, albeit to a limited degree, to aid in treatment decision-making (e.g. patients with KRAS mutations are not treated with antiEGFR therapies). In a research setting, targeted resequencing of genes likely to harbour driver mutations (by assessing the ratio of non-synonymous to synonymous mutations within that gene, or an unexpected high mutation rate relative to gene size) at a greater read depth may help to identify disease-causing alterations that can be targeted by specific treatments in the future. By limiting the region of interest, a higher coverage of selected gene targets can be achieved66. This information may highlight regions of particular interest and importance in cancer and identify possible drug target strategies. There are currently a number of methods by which libraries for targeted resequencing can be prepared, most of which are based on hybrid capture (array-based or in-solution), multiplexed PCR or molecular inversion probes. In this thesis, targeted sequencing was performed using a variation of amplicon sequencing i.e. where highly multiplexed PCR is used to amplify genomic DNA that has been hybridized to a selector probe, thus facilitating the preparation of up to tens of thousands of amplicons in a single reaction67,68. Custom gene panels can be designed for the preparation of sequencing libraries for a specific application, and some companies now provide cancer panels where genes that are commonly mutated in cancer can be targeted. This enriched library can then be sequenced on a massively parallel sequencing platform and analysed. Bioinformatic analysis and mutation-calling NGS produces a large amount of data, requiring computational tools for analysis. Generally, these technologies generate fluorescent signals corresponding to the sequence template, which are recorded by an imagecapturing device, resulting in readouts of nucleotides. A bioinformatics pipeline is necessary for analysing this data in order to detect somatic variants69. In amplicon-based sequencing, this consists of: 1) base calling and quality 20 control, 2) removal or “trimming” of sequencing adaptors, 3) alignment of trimmed reads to the reference sequence, 4) amplicon mapping of aligned reads, 5) variant calling and 6) variant annotation. For somatic mutation analysis, variant calling involves the detection of variant alleles present in the tumour sample but not in the normal sample. This is a challenging task as many types of errors can produce false positives, e.g. PCR amplification errors, sequencing errors, and errors in read alignment. In addition, somatic mutation analysis in solid tumours is confounded by normal cell contamination in the tumour and the presence of tumour cell subclones. There are a number of variant-callers for somatic mutation detection that are publicly available, e.g. MuTect70 and VarScan 271. However, many others are developed and used privately within research groups or projects. Currently no best practice for somatic mutation calling exists. Discrepancies between individual callers using the same data highlight the issue that the choice of which algorithm to use must be carefully considered72. 21 Colorectal cancer Epidemiology and aetiology CRC is the third most commonly diagnosed cancer, with over 1.4 million new cancer cases diagnosed worldwide in 201273. CRC is characterised by the development of normal colonic epithelial cells into malignant polyps, a process that takes approximately 8-10 years in the absence of a known predisposing condition74. The strongest risk factor in the development of CRC is a family history of the disease, but only 10-15 % of CRCs are caused by a hereditary component. Known risk factors for sporadic disease development include environmental factors (such as diet, alcohol use and cigarette smoking), increasing age and a sedentary lifestyle74,75. The prevalence is higher in males than in females (male:female ratio of 1.2:1)73,76. It has been suggested that hormone replacement therapy in postmenopausal females has a protective effect, which may explain the lower incidence in women77. Ulcerative colitis and Crohn’s disease are inflammatory bowel diseases that account for approximately two-thirds of CRC incidence, with the risk of disease development increasing with a longer duration of illness74,78. Several syndromes are associated with an increased risk of CRC development and for each, a particular mutation (or group of mutations) has been identified which gives rise to an increased risk for the disease. Mutations in these pre-disposing genes result in the development of numerous benign growths in the colon and rectum, some of which may become cancerous. Examples of pre-disposing syndromes include familial adenomatous polyposis (FAP) caused by a mutation in the APC gene79 and HNPCC/Lynch syndrome caused by mutations in MMR genes80,81. Other, less common Mendelianly-inherited CRC syndromes include Peutz-Jeghers syndrome (mutant LKB1/STK11), juvenile polyposis (mutant SMAD4, BMPR1A), MUTYHassociated polyposis (mutant MUTYH), hereditary mixed polyposis (mutant GREM1) and more recently, inherited syndromes caused by mutations in the POLE and POLD1 genes82,83. Whilst these syndromes account for less than 5 % of all colorectal cancer cases, it is estimated that a hereditary component is implicated in 15- 30 % of CRCs. Therefore, it is reasonable to assume that many mutations predisposing to CRC remain unidentified, perhaps due to small sample sizes. It is also possible that other contributing factors accelerate disease development in families that are exposed to the same environment. 22 Pathology and staging Most CRCs are located in the sigmoid colon and rectum, however a high number are also found in other areas of the large intestine, with the particular location seemingly linked with the type of genetic instability present. Carcinomas and adenocarcinomas of the colon and rectum are classified as malignant when they have penetrated through the muscularis mucosa into the submucosa. Spread via lymphatic and blood vessels can occur early in disease progression and result in systemic disease. Tumours are clinically staged using the tumour, nodes, metastases (TNM) classification (previously Dukes classification) and staging is based on the depth of tumour penetration, the involvement of regional lymph nodes and the presence or absence of distant metastases78. These individual parameters are then evaluated collectively to give a pathological stage between 0 and IV (Figure 3). M - Distant metastases N - Lymph node T - Primary Tumour Tis T1 T2 T3 T4 Carcinoma in situ Invasion of submucosa Invasion of muscularis propria Invasion of pericolorectal tissue Invasion of peritoneum or adjacent organs N0 N1 N2 No regional node involvement Metastases in 1-3 nodes Metastasis in 4 or more nodes M0 M1 No distant metstases Distant metastases in one or more organs Colorectal Tumour M1 M0 N0 N1 T4 T1/T2 Tis T1/T2 T3 Stage 0 Stage I Stage IIA Stage IIIA Stage IIB Stage IIIB Stage IIC Stage IIIC N2 T3/T4 T3/T4 Stage IV Figure 2. Colorectal cancer staging. Staging according to the TNM classification system78,84. Genetic basis of disease development CRC develops due to a series of well-characterised mutations that arise in a particular order and affect both oncogenes and TSGs. The progression of CRC has been extensively studied and characterised on a genetic level, due to the availability of biopsies from various stages in disease progression, i.e. from benign polyps to advanced carcinomas. The evolution of benign polyps to malignant tumours was first recognised by Muto et al.85 in 1975 and the 23 genetic changes at each stage were described in 1990 by Fearon and Vogelstein86, revealing that the inactivation of three TSGs and the activation of one oncogene are required events87. Mutations in the Wnt pathway, which regulates proliferation by the phosphorylation of β-catenin, thereby preventing its accumulation in the cell, are the earliest event in CRC tumourigenesis and appear in most cases to be required for tumour initiation88. Activating mutations in the Ras-MAPK pathway affecting KRAS, BRAF or NRAS are found in up to 60 % of all CRCs89–91 while point mutations activating the kinase activity of PIK3CA, leading to activation of the PI3K-Akt pathway are found at a frequency of 15 – 25 %92. Allelic loss and point mutations in TP53 are implicated later in tumour progression, eventually affecting 60 – 70 % of colorectal tumours93. Other relatively common genes that are targeted by mutations in CRC and become inactivated include FBXW7 mutations (20 %)15,94, PTEN (10 %)95, SMAD2 (5 %), SMAD4 (10 – 15 %)96 and TGFβRII (25 %)97,98. The progression from normal epithelium to carcinoma is depicted in Figure 4, with the most frequent mutation event at each stage indicated. IV Chromosomal Instability Metastasis III TP53/SMAD2/SMAD4 Disease Stage (e) ? ? Advanced Carcinoma II (d) BAX/TFG-βRII Early Carcinoma (c) PIK3CA PTEN Large Adenoma I (b) KRAS BRAF Small Adenoma 0 APC Normal Epithelium (a) MMR germine mutation / MLH1 hypermythylation Microsatellite Instability Time Figure 3. Simplified genetic model for colorectal cancer evolution. Development of disease from stages 0-IV occurs by accumulation of mutations in (a) Wnt/MMR, (b) Ras-MAPK, (c) PI3K-Akt, (d) p53/TGFβ and (e) unknown pathways. Instability phenotypes in colorectal cancer There are two main types of genomic instability in colorectal cancer, chromosomal instability (CIN) and microsatellite instability (MSI). These CRC types are generally considered to be mutually exclusive and comprise a dif24 ferent mutational profile depending on the type of genomic instability9. However, the genes affected are grouped within the same pathways. Chromosomal instability The majority (up to 85 %) of sporadic CRCs as well as tumours from FAP patients exhibit large chromosomal abnormalities, both structural and numerical, known as CIN. The rate of chromosomal loss or gain is estimated to be in excess of 10-2 per generation in these tumours99. These chromosomal aberrations are so complex that there is no representative karyotype for CIN CRC. It is currently unknown what causes CIN, although alterations in genes regulating the formation of the mitotic spindle and segregation of chromosomes at mitosis may contribute91. Inactivation of APC has also been proposed as a potential initiator of CIN100, but this has been disputed in other studies, as some cancers with APC mutations do not lead to CIN. CIN tumours follow the “traditional pathway” for tumourigenesis, involving the progression of a precursor adenomatous polyp to an advanced carcinoma, with loss of function of APC and mutations in KRAS, PIK3CA, TP53, SMAD2 and SMAD4 being the most common alterations. CIN tumours are typically found in the distal colon and are shown to have a worse prognosis than microsatellite unstable tumours. Microsatellite instability The MMR system maintains genomic integrity by repairing replicationassociated lesions like base-base mismatches and insertion/deletion loops arising as a result of DNA polymerase slippage. Mutations in MMR genes, most commonly MLH1, MSH2, MSH6, and PMS2 give rise to MSI, where widespread insertions and deletions occur in repetitive DNA sequences known as microsatellites. Microsatellites are short repeat sequences in the genome that are particularly prone to replication errors due to slippage of polymerase machinery, leading to an overall increased rate of mutation in the genome. MSI was first recognized in HNPCC101, where approximately 95 % of tumours are MSI+. MSI also occurs in approximately 15 % of sporadic CRC, due to hypermethylation of the MLH1 promoter, resulting in loss of expression of MLH1102. MSI tumours are characterised by mutations in BRAF, TGFβRII and PTEN and are primarily located in the proximal colon, having arisen from adenomatous or serrated polyps. These types of tumours generally have a better prognosis than CIN tumours. CpG island methylator phenotype A more recently described alternative pathway progresses from a serrated hyperplastic polyp, and is characterised by a CpG island methylator phenotype (CIMP). CIMP tumours have a higher incidence of hypermethylation at multiple promoter regions, including promoter hypermethylation of MLH1, resulting in MSI103. CIMP tumours also have a high frequency of BRAF mu25 tations104. There is a strong link between features of CIMP and sporadic MSI tumours and CIMP tumours also mainly arise in the proximal colon. However, these tumours have unique clinical feature in cases that are microsatellite stable (MSS), and these patients have a particularly poor prognosis105. Other CRC subgroups Although the three instability phenotypes described above are those most commonly used to classify CRC, Jass proposed a more comprehensive characterisation of CRC with five molecular subgroups as summarised in Table 2. Group 1 is more commonly referred to as sporadic MSI-high CRC, group 4 as MSS CRC arising either sporadically, FAP-associated or MUTYHassociated polyposis, and group 5 as HNPCC. Table 2. Molecular and clinical features of CRC groups 1-5106. Feature Group 1 Group 2 Group 3 Group 4 Group 5 MSI status Methylation High +++ Stable/Low +++ Stable/Low ++ Stable +/- High +/- Precursor lesion Serrated polyp Serrated polyp Adenoma Adenoma Frequency 12 % 8% 57 % 3% Serrated polyp or adenoma 20 % In addition, recent studies have classified sporadic CRCs into three subgroups: 1) non-hypermutated (generally CIN, MSS tumours), 2) hypermutated MSI tumours and 3) hypermutated MSS tumours with somatic POLE mutations20,82, although the prognostic implications of the third subgroup are currently unknown. Candidate CRC driver mutations Although changes in the aforementioned genes and pathways are the most common alterations in CRC, other mutations appear to be involved in disease development, some of which have only come to light in the NGS era. Recurrent mutations in candidate cancer genes first identified by Sjöblom et al.14 and Wood et al.15 have since been identified by other studies, as well as a myriad of other potentially novel driver mutations20,107–112, including those described in Paper IV. Although numerous bioinformatic tools are available for determining driver mutations, several of these (such as MutSig CV113) are biased with prior knowledge like background mutation rates and replication time, meaning that novel driver mutations may be continuously bypassed. It is however, sometimes necessary to input prior knowledge when designing such tools, otherwise, large genes that can accumulate many passenger mutations in a background of genomic instability will continuously surface as putatively significant genes. Although tools for rank ordering the 26 most likely driver mutations are useful, only functional and biological studies can definitively classify a candidate cancer gene. Due to the time-consuming nature of functional studies in using cell lines and animal models, the majority of novel mutations found at an intermediate frequency of 2 – 20 % have yet to be assigned a role in disease, and their importance in CRC development remains elusive. Examples of cancer genes that are mutated at a lower frequency, but have a proposed functional role in CRC development are shown in Table 3. Although the selected genes are by no means exhaustive, considering that many hundreds of candidate cancer genes have been identified it is evident that much remains to be done on a functional level. Table 3. Examples of CRC genes mutated at low frequency with assigned functions. Gene Mutation prevalence* Functional role TCF7L2 12 % TSG, modulates proliferation, encodes transcription factor downstream of Wnt114 FAM123B 11 % TSG, negative regulator of Wnt signalling by promoting degradation of β-catenin115 ARID1A 9% PRDM2 8% FAT1 6% TSG, encodes component of SWI/SNF chromatin remodelling complex regulating gene expression116,117 TSG, histone methyltransferase, induces cell cycle arrest and apoptosis118 TSG, negative regulator of Wnt signalling encodes cadherin-like protein which binds β-catenin119 Oncogene, histone methyltransferase, enhancermediated transcriptional regulation120 * Mutation prevalence obtained from the cBioPortal for Cancer Genomics121,122. Genes that are chosen are mutated at a frequency between 2 and 20 % in CRC and have been characterised by functional studies. KMT2D 5% Therapeutic interventions Surgery is currently the main treatment for patients with potentially curable disease. Radiotherapy is also frequently employed for CRC therapy, as both a pre- and post-operative treatment for rectal cancer, or as a palliative treatment in advanced colon cancer. In Sweden, the use of preoperative radiotherapy has improved the survival of rectal cancer patients and is currently standard practice123. For advanced CRC, a number of systemic treatment options are currently used in clinical practice, which may complement resection of the tumour in advanced disease stages. These include fluoropyrimidines such as 5-fluorouracil (5-FU) and capecitabine, and cytoxic agents such as irinotecan and oxaliplatin. 27 Interestingly, the instability phenotype of the tumour can have a direct consequence for treatment efficacy, as with MSI tumours which in some studies demonstrate a lack of response to 5-FU treatment124. This suggests that the assessment of genomic instability can have worthwhile predictive consequences for therapy. Targeted therapies such as bevacizumab (a VEGF-A inhibitor) and cetuximab or panitumumab (EGFR inhibitors) may also be implemented, although it is important to note that acquired resistance has been a major issue in such targeted therapeutics125–127. Currently, it is recommended that all patients are tested for mutations in KRAS prior to treatment with EGFR blockade128 due to the lack of response in patients with KRAS mutations129. However, these patients should also be assessed for mutations in other downstream targets of EGFR signalling i.e. KRAS, NRAS, BRAF and PIK3CA to predict response due to intrinsic mechanisms of resistance130. These treatments may be used in various combinations depending on disease stage, level of patient tolerance and mutational profile131. In stage IV disease, FOLFOX (leucovorin, 5-FU and oxaliplatin) or FOLFIRI (leucovorin, 5-FU and irinotecan) in combination with cetuximab or panitumumab followed by liver resection have shown promising results for treating liver metastases in CRC132,133. However, the benefits of adjuvant treatment at different stages of disease are complicated and controversial (in particular in stage II CRC) and correct disease staging is of utmost importance, as this determines the treatment used134. In CRC, the five-year survival rate for localised disease is approximately 91 %, whereas it is only 13 % in cases where distant metastases are found135. Prognostic markers in metastatic disease CRCs primarily metastasise to the liver, likely due to the nature of portal circulation and the fact that the sinusoids in the liver do not represent a major barrier to extravasation. CRCs also metastasise to the lungs, bone and brain, although this occurs less frequently. There are four main elements to the metastatic process in solid tumours: 1) increased angiogenesis of the primary tumour, 2) intravasation of tumour cells, 3) extravasation by proteolysis of the extra-cellular matrix and 4) migration into the surrounding tissue136. A number of classifications have been proposed to illustrate the types of genes that are important during each process; metastasis initiating genes that provide a growth advantage to the primary tumour (thus favouring intravasation), metastasis progression genes that provide growth advantages both in the primary tumour and at the metastatic site and metastasis virulence genes that provide a selective growth advantage at the metastatic site only137. These distinctions may provide an explanation for the opposing views that metastatic disease is either a pre-determined or evolving process, as it may be the case that both of these hypotheses are plausible. Indeed, a recent study has confirmed that most of the genetic alterations known to be 28 early events in CRC are shared between the primary and metastatic tumours, and only a small number of mutations are unique to the metastatic tumours138. Despite the great deal of knowledge available regarding frequent mutations causing CRC, the pathogenesis of the disease is still not fully understood, in particular with regard to the development of distant metastases. Approximately 20%–25% of patients with CRC present with metastatic disease at time of diagnosis, and 20%–25% of patients will develop metastases later. This leads to a relatively high overall mortality rate of 40%– 45%128. Therefore, there is a great need for prognostic and predictive genetic markers that can help to stratify patients at higher risk of developing advanced metastatic disease and to aid in decision-making for adjuvant treatment. The prognostic value of BRAF mutations appears promising, but only for a poor overall survival139,140. Large-scale NGS efforts have identified genes where mutations appear to have a protective effect for metastatic disease development, e.g. in FBXW720. Nevertheless, no specific mutations are currently associated with disease free survival and testing for MSI and CIN are the sole predictors for the risk of relapse141. Increased knowledge of the mechanisms involved in metastasis may help us to understand what genes to search for alterations in. However, as it is often difficult to obtain biopsies of metastatic tumours (as many are not resectable), profiling these tumours for specific gene mutations remains a challenging task. We have identified a group of mutations potentially related to metastatic disease development in Eph receptor tyrosine kinases (detailed in Paper IV), although the functional role of these alterations remains to be determined. 29 Present investigations Aims The aim of this thesis was to establish robust, high-throughput methods for the analysis of CRC tissue, starting from fresh frozen tissue and leading to the identification of variants with high sensitivity. The specific aims of each paper were as follows: I Develop a method for serial nucleic acid extraction from fresh frozen tissue that is suitable for automation. II Implement the technique developed in Paper I on a robotic platform and assess the integrity of the extracted nucleic acids, in addition to their performance in downstream applications. III Establish a method for the rapid identification of biobanked samples from the same individual by targeting InDel polymorphisms. IV Perform a pathway-oriented mutational analysis of a large cohort of CRCs to discover mutations associated with metastasis using targeted deep sequencing. 30 Serial extraction of DNA and RNA from frozen tissue With increasing biobanking of biological samples, methods for large-scale extraction of nucleic acids are in demand. The lack of such techniques designed for extraction from tissues results in a bottleneck in downstream genetic analyses, particularly in the field of cancer research. Although several methods for nucleic acid extraction from tissue are available, not all are cost effective or suitable for automation due to centrifugation, disposable plastic ware and the use of toxic reagents. For those solutions available on an automated platform, one is restricted in the choice of technology due to closed systems. We aimed to develop an open automated procedure for tissue homogenisation and serial extraction of DNA and RNA into separate fractions from the same frozen tissue specimen, thus negating the need for multiple specimens from the same patient. Results and discussion A process for serial extraction of nucleic acids from the same frozen tissue sample based on magnetic silica particles with differential affinities for DNA and RNA was developed in Paper I and automated on a Tecan Freedom Evo robotic workstation in Paper II. A schematic representation of the extraction process is shown in Figure 5. Fresh-frozen human normal and tumour tissue samples (n = 864) from breast and colon were serially extracted in batches of 96 samples, and the yields and quality of DNA and RNA were determined. The DNA was evaluated in several downstream analyses, and the stability of RNA was determined after 9 months of storage. Tissue sample Lysis Selective beads Capture Wash Elute DNA RNA Figure 4. Schematic representation of the DNA/RNA extraction process. The tissue sample is lysed and DNA is bound to selective paramagnetic silica beads, washed and eluted. The process is repeated for RNA using the same lysate with DNA removed. 31 The extracted DNA performed consistently well across numerous processes including Sanger sequencing, PCR-based STR analysis, HaloPlex target enrichment and deep sequencing on an Illumina platform, and gene copynumber analysis using microarrays. The RNA has performed well in RTPCR analyses and maintains integrity upon storage. Comparing our results with established techniques, such as DNA extraction using a phenolchloroform based method; we saw no decrease in performance in downstream applications. Instead, we demonstrate an advantage of our methodology concerning reduction in the time taken to perform the extractions. The technology developed in Papers I and II enabled the simultaneous processing of many tissue samples, reducing the sample preparation bottleneck in cancer research. The open automation format also facilitated integration with upstream and downstream devices for automated sample quantitation or storage. Our technology was used for the extraction of DNA used in a large scale sequencing effort on 107 matched T/N CRC specimens, which is described in Paper IV. Genotyping of patient-matched tumour/normal tissue Characterisation of cancer genomes entails mutational analyses of vast numbers of patient-matched tumour and normal tissue samples. However, this increases the risk of patient sample mix up due to manual handling, confounding the interpretation of results. Correctly matching T/N pairs from the same patient is essential for somatic mutation analyses, as material from the normal counterpart is required in order to filter away germline polymorphisms and determine the contribution of acquired mutations to the disease. Genotyping methods are often laborious and time-consuming and scalable sample identification procedures are essential for pathology biobanks and research efforts where large numbers of patient samples are analysed simultaneously. In Paper III, we aimed to develop a method using multiplex amplification of prevalent InDel polymorphisms in order to match samples from the same individual based on their unique genetic variation. We also sought to establish a program for the rapid analysis and comparison of profiles between many samples to facilitate large-scale efforts in the field of cancer genomics. The method should also be suitable for low amounts of input DNA, as well as fragmented DNA, since these are often limiting factors when analysing DNA from FFPE specimens. Results and discussion A method originally described by Isaksson et al.142 was adapted in Paper III for InDel genotyping using multiplex ligation dependent genomic amplification (MLGA). Fifty-three prevalent InDel polymorphisms found in a European population were targeted by restriction digestion of genomic DNA fol32 lowed by ligation to selector probes and multiplex amplification by PCR with fluorescently-labelled primers. The PCR products were separated in a capillary sequencer and a peak spectrum was obtained and analysed using an in-house fragment analysis program. Twenty-four fresh frozen T/N colorectal samples from different patients were successfully matched using this method. The technique was found to be sensitive with as little input DNA as 0.3 ng. The technique was also successful when using DNA extracted from FFPE tissue, despite a lower number of amplifiable fragments. As the technique is PCR-based and a program for the simultaneous analysis of many pairs at once was developed for the assay, this technique can be scaled up for use in 96- or 384-well formats. 107 T/N CRC pairs were successfully genotyped with the assay prior to the somatic mutation analysis study described in Paper IV. Deep sequencing reveals mutations associated with CRC metastasis In Paper IV the aim was to uncover mutations associated with metastasis in CRC by employing a targeted deep sequencing strategy. The genetic model for CRC development is well defined and the stepwise progression of the disease is largely understood91. CRC progression occurs over many years, with the development from a large adenoma founder cell to an advanced carcinoma founder cell taking as long as 26.5 years143. However, there remains a large gap in our understanding of the genetic basis for the development of CRC from an advanced carcinoma to metastatic disease, a process taking only 1.8 years on average143. This presents a significant problem for clinicians, since patients presenting with stage II and II disease cannot easily be stratified into high and low risk groups, where adjuvant therapy may or may not be beneficial. This also suggests that the majority of mutations required for metastases are already present in the advanced carcinoma. As genetic events driving metastases are suspected to occur late in the disease, conventional sequencing by Sanger and low coverage NGS techniques may not have sufficient sensitivity to discover base level somatic mutations driving the development of metastases. We therefore performed a targeted deep sequencing strategy (~1000-fold coverage) in 107 CRCs in an attempt to uncover less frequent mutations that may be associated with late stage disease. Results and discussion Previous approaches for cataloguing the mutation spectrum in CRC were limited either by the number of patients analysed or a low depth of coverage14,20,107. In this study, we present pathway-oriented mutational profiles of 33 a large number of primary CRCs, of which approximately half eventually developed metastatic disease. We observed the expected frequencies of mutations in known CRC genes such as APC, TP53 and KRAS, confirming the sensitivity of our somatic mutation calling. In addition, we discovered a high mutation rate in genes such as TCF7L2, SETD1B and BMPR2 in microsatellite repeat regions in MSI tumours, suggesting a greater sensitivity of our mutation detection software in regions that are difficult to analyse. Interestingly, we uncovered a significantly higher mutation rate in the Ephrin receptor tyrosine kinase family in MSS primary CRC tumours that proceeded to develop distant metastases compared to those that showed no such progression. Due to the pattern of mutations found in these genes, we propose that they may have a metastasis suppressor role in their wild type state, and that mutations may inactivate this function, thus promoting a more aggressive disease progression. The Ephrin receptor gene family has long been thought to have a role in metastatic disease development and expression analyses have linked both over- and under-expression with cancer progression in many different tissue types. However, specific mutations associated with metastases have not yet been characterised. As a number of patients had mutations in more than one Ephrin receptor gene, we hypothesise that the inactivating effect of these mutations may be cumulative, requiring a number of mutations to promote metastases. This proposal is also supported by the high level of sequence similarity between genes in this family. In conclusion, although a significant difference was observed between metastatic and nonmetastatic patient groups, a larger number of patients would be required to confirm our findings, as well as functional analyses of the mutations found in this study. 34 Concluding remarks and future perspectives Problems associated with sample preparation and correctly identifying patient material have long been recognised, as well as the vast amounts of preanalytical variables that may confound interpretation of results. Currently, the Biorepositories and Biospecimen Research Branch (BBRB) of the National Cancer Institute in the US, and the European Organisation for Research and Treatment of cancer (EORTC) are developing guidelines for the standardisation and improvement of biospecimen handling to ensure that findings are reproducible and reflective of disease state in vivo. Unfortunately, the failure rate in drug development for the treatment of cancer indicates that much remains to be done. Studies by Amgen and Bayer HealthCare attempting to reproduce pre-clinical data have reported dismally low success rates from 11 – 25 %, even with studies from papers considered to show “landmark” findings144,145. Although one cannot only blame poor experimental design and non-standardised sample preparation for the irreproducibility of pre-clinical findings (cancer in itself is a particularly difficult disease due to both inter- and intra-patient heterogeneity), these are certainly some of the limiting factors. Problems with poor sample handling and preparation, combined with the complexity of the disease have resulted in a wealth of problems for drug development. The field of personalised medicine, once an exciting buzzword for finding the ultimate cure to cancer, has become littered with reports of resistance to therapy and relatively small gains in overall or disease-free survival130,146-150. For treatment of advanced CRC, fluoropyrimidines and cytotoxic agents, in combination with surgical resection are still the best treatment options available. Considering that these adjuvant therapies were used prior to the influx of vast quantities of knowledge on the genetic basis of CRC and that targeted treatments have not delivered the results we had hoped for, it is worth questioning the current model of drug development. There is a considerable need to find better strategies for tackling advanced stage cancer. Despite the fact that CRC is a heterogeneous disease, many patients are treated with the same therapy, a “one size fits all” approach. Improving the classification of CRCs at a genetic level, and appreciating the fact that different subtypes respond better or worse to various cytotoxic dugs may help to improve survival rates. In this thesis I attempted to address some of these issues by: 1) developing a means for the comprehensive analysis of a large number of colorectal tissues, 2) using these methods to prepare high quality input 35 materials and 3) analysing this material with the latest sequencing technologies in order to further develop our understanding of advanced stage CRC. In much of the developed world where screening measures for early disease detection in at-risk populations are in place, such as Papanicolaou smears for cervical cancer, mammography for breast cancer and colonoscopy for colorectal cancer, there have been promising results for the prevention and treatment of cancer. As distant metastases cause the majority of deaths from cancer, early detection of disease is imperative. In the Nordic countries, where a faecal occult blood screening programme has been set up to identify patients eligible for colonoscopy for CRC detection, results have been positive, with estimates showing that up to 1% of all premature deaths in a particular age-group can be affected by invitation to the programme151. In fact, during the past two decades mortality from CRC has declined, in particular in Northern and Western Europe. This has in part been attributed to improved screening resulting in earlier detection128. New sequencing technologies for the non-invasive detection of cancer in the circulation are another promising method for potentially detecting tumours at an early stage. There have been a wealth of publications in the field of circulating tumour DNA (ctDNA) in the last few years, and it is clear that as amplification techniques become less error-prone, sequencing methods become more sensitive, and mutation detection software can distinguish true from false positives in an unbiased manner; screening can successfully be performed even in populations not defined as being at-risk152-157. As sequencing costs continue to decrease, one may be able to use the mutational information acquired by sequencing primary tumours to identify cancers in the blood at an early stage, thus preventing advanced disease. This year, Bettegowda and colleagues proposed that the detection of early stage cancers by ctDNA analysis was possible in 50 % of the patients studied155, while Misale et al. demonstrated the potential of “liquid biopsies” for evaluating acquired resistance to therapy in metastatic disease158, indicating that these types of screening strategies may indeed be feasible in the future. For targeted and personalised therapies, there have been a number of successful developments e.g. Gleevec/Imatinib in the treatment of chronic myeloid leukaemia159. In addition, synthetic lethality approaches in the case of poly ADP ribose polymerase (PARP) inhibitors160,161 have shown efficacy in the treatment of the appropriate subset of patients, i.e. cancers with deficient DSBR. Efforts to move away from personalised therapy by targeting processes cancer cells depend on, e.g. by promoting oxidative damage in cancer cells by MTH1 inhibition162 are also promising, but it remains to be seen how these will behave in a clinical setting. Mechanisms of inherent and acquired resistance have caused many problems for tackling disease e.g. as is the case with PARP inhibitors163. This is particularly problematic for EGFR blockade in CRC, due to intrinsic (mutations in genes downstream of EGFR, EGFR copy number, HER3 overex36 pression) and extrinsic (genetic alterations in the Ras pathway, mutations in EGFR and amplification of MET) mechanisms of resistance130. However, as our knowledge of the mechanisms in cancer increases, ways to circumvent the problems of acquired resistance are being proposed. For example, in CRC it has recently been proposed that crizotinib could be used in combination with EGFR blockade in order to circumvent resistance due to amplification of the MET receptor in suitable patients164. In fact, few genetic alterations that are suitable targets for therapy have been identified in CRC. Therefore sequencing and other methods for profiling cancer genomes provide value, to identify possible targets and to allow a better understanding of the genetic mechanisms of innate and acquired resistance, potentially providing a means and rationale for overcoming these problems. 37 Acknowledgements This work was carried out at the Department of Immunology, Genetics and Pathology, at Uppsala University. Financial support was provided by grants from the Swedish Cancer Foundation, The Uppsala/Umeå Comprehensive Cancer Consortium (U-CAN), Bio-X VINNOVA, IMI and the Swedish Foundation for Strategic Research. I would like to express my gratitude to a number of people: First of all, I would like to thank my supervisor Tobias Sjöblom, for taking me in and giving me this great opportunity, for allowing me to develop as a scientist and for encouraging me to delve into the world of business. Thank you for the confidence you have always shown in me. To my co-supervisor Mats Nilsson, it has been a pleasure to work with you. Thank you for all of the patience you have shown and for always having a solution for everything! Thank you to all of the people working at IGP, especially the IGP administration and Christina Magnusson, for taking care of the PhD students and for always being there to listen to our problems and queries. Thanks also to past and present members of our office (the best one at Rudbeck) for making it a pleasure to come to work in the mornings. Thank you to the Uppsala Genome Center for the valuable resource you provide and to Simin Tahmasebpoor for preparing all of the tissue samples used in this thesis. To my colleagues, in particular the members of the Molcan group; Chatarina, Evangelia, Ivaylo, Muhammad, Sara, Snehansghu, Tanja, Tom and Verónica and the now-dwindling Selector group; Elin and Lotte, thank you for all the problem-solving discussions and extra-curricular activities! To all the people that have worked with me on the projects included in this thesis, thank you for some great collaborative efforts. I would particularly like to thank Bengt Glimelius for always being ready to shed some light on the clinical side of CRC. Special thanks to Monica for showing me how to 38 work PCR product-free J and Viktor for valuable discussions and light relief during the “Comedia” months. To the unofficial U-CAN group, thanks for all the fun evenings we shared together. I hope there will be many more! Thanks in particular to Tony for the imaginative email subject lines. To the ExScale team: Karin, Ammar, Sten and Anna, thank you for showing me that there is a world outside of research; working with all of you has been a wonderful and eye-opening experience. To the MGC teaching crew: Anja, Diego and Vasil, thanks for all the fun we managed to have despite the hectic schedules (which version was it again?). I’m sad that there will be no more “red red” emails! To all the friends I have made since coming to Sweden, thank you for taking me in and making me feel that I had a home away from home. To the Irish contingent in Uppsala; Nicola, Lesley, Paul, Frank, and Emma for always being a source of teabags, Cadburys and sanity, go raibh maith agaibh! Thank you Fiona, Linnea and Tanja, for becoming such dear friends to me. To my friends at home, thank you for always making it seem that it was only yesterday when we last saw each other. In particular, thank you Saoirse and Eoin for being the best friends one could ever wish for. To my “Swedish family”: Anette, Josefin, Mikael and Erik, thanks for all the support you have shown me over the past few years and for welcoming me so much into your lives. To my godparents Mary and Liam, thank you for always supporting me in my endeavours and never forgetting to send birthday cards. Johan, det går inte att beskriva hur mycket jag älskar dig för allt du gör. Tack för att du alltid finns där för mig. To my beloved family, Dad, Mam, Judith, Catherine, Tim and Johnny, for your love and all of the support you have given me, thank you. I am so lucky to have a family like you! Dad, a special thanks to you for being the reason I am doing research, you have always encouraged me to learn more. Finally, thank you to all of the patients that agreed to donate samples to these studies and made this work possible. 39 References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 40 Vogelstein, B. et al. Cancer Genome Landscapes. Science 339, 1546–1558 (2013). Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000). Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–74 (2011). Vogelstein, B. & Kinzler, K. W. Cancer genes and the pathways they control. Nat Med 10, 789–99 (2004). Knudson, A. G. Mutation and Cancer: Statistical Study of Retinoblastoma. Proc. Natl. Acad. Sci. U. S. A. 68, 820–823 (1971). Kinzler, K. W. & Vogelstein, B. Gatekeepers and caretakers. Nature 386, 761– 763 (1997). Tomlinson, I. P. M., Novelli, M. R. & Bodmer, W. F. The mutation rate and cancer. Proc. Natl. Acad. Sci. U. S. A. 93, 14800–14803 (1996). Nachman, M. W. & Crowell, S. L. Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297–304 (2000). Lengauer, C., Kinzler, K. W. & Vogelstein, B. Genetic instabilities in human cancers. Nature 396, 643–9 (1998). Hoeijmakers, J. H. J. Genome maintenance mechanisms for preventing cancer. Nature 411, 366–374 (2001). Sieber, O. M., Heinimann, K. & Tomlinson, I. P. M. Genomic instability--the engine of tumorigenesis? Nat. Rev. Cancer 3, 701–708 (2003). Michor, F., Iwasa, Y., Vogelstein, B., Lengauer, C. & Nowak, M. A. Can chromosomal instability initiate tumorigenesis? Semin. Cancer Biol. 15, 43–49 (2005). Bozic, I. et al. Accumulation of driver and passenger mutations during tumor progression. Proc. Natl. Acad. Sci. U. S. A. 107, 18545–18550 (2010). Sjoblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–74 (2006). Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–13 (2007). Stephens, P. J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400–4 (2012). Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012). Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). Cancer Genome Atlas Research Network et al. Comprehensive molecular characterization of gastric adenocarcinoma. Nature (2014). doi:10.1038/nature13480 Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–7 (2012). 21. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315–322 (2014). 22. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012). 23. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014). 24. Cancer Genome Atlas Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–15 (2011). 25. Riegman, P. H., Morente, M. M., Betsou, F., de Blasio, P. & Geary, P. Biobanking for better healthcare. Mol. Oncol. 2, 213–22 (2008). 26. Oosterhuis, J. W., Coebergh, J. W. & van Veen, E. B. Tumour banks: wellguarded treasures in the interest of patients. Nat. Rev. Cancer 3, 73–7 (2003). 27. Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–8 (2010). 28. Carter, A. & Betsou, F. Quality Assurance in Cancer Biobanking. Biopreservation Biobanking 9, 7 (2011). 29. Moore, H. M. et al. Biospecimen Reporting for Improved Study Quality. Biopreservation Biobanking 9, 57–70 (2011). 30. Hewitt, R. E. Biobanking: the foundation of personalized medicine. Curr. Opin. Oncol. 23, 112–9 (2011). 31. Mee, B. C. et al. Maintaining breast cancer specimen integrity and individual or simultaneous extraction of quality DNA, RNA and proteins from Allprotectstabilized and nonstabilized tissue samples. Biopreservation Biobanking 9, 10 (2011). 32. Choudhuri, S. Some major landmarks in the path from nuclein to human genome (1). Toxicol. Mech. Methods 16, 137–59 (2006). 33. Farragher, S. M., Tanney, A., Kennedy, R. D. & Paul Harkin, D. RNA expression analysis from formalin fixed paraffin embedded tissues. Histochem. Cell Biol. 130, 435–45 (2008). 34. Douglas, M. P. & Rogers, S. O. DNA damage caused by common cytological fixatives. Mutat. Res. 401, 77–88 (1998). 35. McGhee, J. D. & von Hippel, P. H. Formaldehyde as a probe of DNA structure. 3. Equilibrium denaturation of DNA and synthetic polynucleotides. Biochemistry (Mosc.) 16, 3267–76 (1977). 36. Williams, C. et al. A high frequency of sequence alterations is due to formalin fixation of archival specimens. Am. J. Pathol. 155, 1467–71 (1999). 37. Carpi, F. M., Di Pietro, F., Vincenzetti, S., Mignini, F. & Napolioni, V. Human DNA extraction methods: patents and applications. Recent Pat. DNA Gene Seq. 5, 1–7 (2011). 38. Goelz, S. E., Hamilton, S. R. & Vogelstein, B. Purification of DNA from formaldehyde fixed and paraffin embedded human tissue. Biochem. Biophys. Res. Commun. 130, 118–26 (1985). 39. Von Ahlfen, S., Missel, A., Bendrat, K. & Schlumpberger, M. Determinants of RNA Quality from FFPE Samples. PLoS ONE 2, (2007). 40. Schmitt, M. et al. European Organisation for Research and Treatment of Cancer (EORTC) Pathobiology Group standard operating procedure for the preparation of human tumour tissue extracts suited for the quantitative analysis of tissue-associated biomarkers. Eur J Cancer 43, 835–44 (2007). 41. Ebeling, W. et al. Proteinase K from Tritirachium album Limber. Eur. J. Biochem. FEBS 47, 91–7 (1974). 41 42. Hilz, H., Wiegers, U. & Adamietz, P. Stimulation of proteinase K action by denaturing agents: application to the isolation of nucleic acids and the degradation of ‘masked’ proteins. Eur. J. Biochem. FEBS 56, 103–8 (1975). 43. Sambrook, J. & Russell, D. Molecular cloning, a laboratory manual. (Cold Spring Harbor Laboratory Press, 2001). 44. Miller, S. A., Dykes, D. D. & Polesky, H. F. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 16, 1215 (1988). 45. Vogelstein, B. & Gillespie, D. Preparative and analytical purification of DNA from agarose. Proc. Natl. Acad. Sci. U. S. A. 76, 615–9 (1979). 46. Boom, R. et al. Rapid and simple method for purification of nucleic acids. J. Clin. Microbiol. 28, 495–503 (1990). 47. Boom, R. et al. Improved silica-guanidiniumthiocyanate DNA isolation procedure based on selective binding of bovine alpha-casein to silica particles. J. Clin. Microbiol. 37, 615–9 (1999). 48. Berensmeier, S. Magnetic particles for the separation and purification of nucleic acids. Appl Microbiol Biotechnol 73, 495–504 (2006). 49. Endres, H. N., Johnson, J. A. C., Ross, C. A., Welp, J. K. & Etzel, M. R. Evaluation of an ion-exchange membrane for the purification of plasmid DNA. Biotechnol. Appl. Biochem. 37, 259–266 (2003). 50. Bartl, K., Wenzig, P. & Kleiber, J. Simple and broadly applicable sample preparation by use of magnetic glass particles. Clin. Chem. Lab. Med. CCLM FESCC 36, 557–9 (1998). 51. Hourfar, M. K. et al. High-throughput purification of viral RNA based on novel aqueous chemistry for nucleic acid isolation. Clin Chem 51, 1217–22 (2005). 52. Hawkins, T. L., O’Connor-Morin, T., Roy, A. & Santillan, C. DNA purification and isolation using a solid-phase. Nucleic Acids Res. 22, 4543–4 (1994). 53. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006). 54. Friis, S. L. et al. Typing of 30 insertion/deletions in Danes using the first commercial indel kit--Mentype(R) DIPplex. Forensic Sci. Int. Genet. 6, e72–4 (2012). 55. Fondevila, M. et al. Forensic performance of two insertion-deletion marker assays. Int. J. Legal Med. 126, 725–37 (2012). 56. Pereira, R. et al. A new multiplex for human identification using insertion/deletion polymorphisms. Electrophoresis 30, 3682–90 (2009). 57. Metzker, M. L. Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46 (2010). 58. Schneider, G. F. & Dekker, C. DNA sequencing with nanopores. Nat. Biotechnol. 30, 326–328 (2012). 59. Oxford Nanopore Technologies. Oxford Nanopore introduces DNA ‘strand sequencing’ on the high-throughput GridION platform and presents MinION, a sequencer the size of a USB memory stick. Press Release (2012). at <https://www.nanoporetech.com/news/press-releases/view/39> 60. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–9 (2008). 61. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366, 883–92 (2012). 62. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011). 63. Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature (2014). doi:10.1038/nature13600 42 64. Hagemann, I. S., Cottrell, C. E. & Lockwood, C. M. Design of targeted, capture-based, next generation sequencing tests for precision cancer therapy. Cancer Genet. 206, 420–431 (2013). 65. Bodi, K. et al. Comparison of Commercially Available Target Enrichment Methods for Next-Generation Sequencing. J. Biomol. Tech. JBT 24, 73–86 (2013). 66. Johansson, H. et al. Targeted resequencing of candidate genes using selector probes. Nucleic Acids Res. 39, e8 (2011). 67. Dahl, F., Gullberg, M., Stenberg, J., Landegren, U. & Nilsson, M. Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments. Nucleic Acids Res. 33, e71 (2005). 68. Dahl, F. et al. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proc. Natl. Acad. Sci. U. S. A. 104, 9387–92 (2007). 69. Altmann, A. et al. A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum. Genet. 131, 1541–1554 (2012). 70. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013). 71. Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012). 72. Xu, H., DiCarlo, J., Satya, R. V., Peng, Q. & Wang, Y. Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics 15, 244 (2014). 73. Ferlay J, S. I. GLOBOCAN 2012 v1.0, Cancer Incidence and Mortality Worldwide. (2013). at <http://globocan.iarc.fr> 74. Cunningham, D. et al. Colorectal cancer. Lancet 375, 1030–47 (2010). 75. Boyle, P. & Langman, J. S. ABC of colorectal cancer: Epidemiology. BMJ 321, 805–8 (2000). 76. Jemal, A. et al. Global cancer statistics. CA. Cancer J. Clin. 61, 69–90 (2011). 77. Chlebowski, R. T. et al. Estrogen plus progestin and colorectal cancer in postmenopausal women. N. Engl. J. Med. 350, 991–1004 (2004). 78. Hamilton, S. R., A., L.A. Pathology and Genetics of Tumours of the Digestive System. (IARC Press, 2000). 79. Bodmer, W. F. et al. Localization of the gene for familial adenomatous polyposis on chromosome 5. Publ. Online 13 August 1987 Doi101038328614a0 328, 614–616 (1987). 80. Lynch, H. T. & Krush, A. J. Cancer family ‘G’ revisited: 1895-1970. Cancer 27, 1505–1511 (1971). 81. Lynch, H. T. & Smyrk, T. Hereditary nonpolyposis colorectal cancer (Lynch syndrome): An updated review. Cancer 78, 1149–1167 (1996). 82. Briggs, S. & Tomlinson, I. Germline and somatic polymerase ε and δ mutations define a new class of hypermutated colorectal and endometrial cancers. J. Pathol. 230, 148–153 (2013). 83. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. 45, 136–144 (2013). 84. National Cancer Institute. Rectal Cancer Treatment (PDQ®). National Cancer Institute at <http://www.cancer.gov/cancertopics/pdq/treatment/rectal/Health Professional/page3> 85. Muto, T., Bussey, H. J. R. & Morson, B. C. The evolution of cancer of the colon and rectum. Cancer 36, 2251–2270 (1975). 43 86. Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–67 (1990). 87. Kinzler, K. W. & Vogelstein, B. Lessons from hereditary colorectal cancer. Cell 87, 159–170 (1996). 88. Powell, S. M. et al. APC mutations occur early during colorectal tumorigenesis. Nature 359, 235–7 (1992). 89. Vogelstein, B. et al. Genetic alterations during colorectal-tumor development. N Engl J Med 319, 525–32 (1988). 90. Rajagopalan, H. et al. Tumorigenesis: RAF/RAS oncogenes and mismatchrepair status. Nature 418, 934 (2002). 91. Fearon, E. R. Molecular Genetics of Colorectal Cancer. Annu. Rev. Pathol. Mech. Dis. 6, 479–507 (2011). 92. Samuels, Y. et al. High Frequency of Mutations of the PIK3CA Gene in Human Cancers. Science 304, 554 (2004). 93. Baker, S. J. et al. p53 Gene Mutations Occur in Combination with 17p Allelic Deletions as Late Events in Colorectal Tumorigenesis. Cancer Res. 50, 7717– 7722 (1990). 94. Akhoondi, S. et al. FBXW7/hCDC4 Is a General Tumor Suppressor in Human Cancer. Cancer Res. 67, 9006–9012 (2007). 95. Danielsen, S. A. et al. Novel mutations of the suppressor gene PTEN in colorectal carcinomas stratified by microsatellite instability- and TP53 mutationstatus. Hum. Mutat. 29, E252–E262 (2008). 96. Leary, R. J. et al. Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proc. Natl. Acad. Sci. U. S. A. 105, 16224–16229 (2008). 97. Markowitz, S. et al. Inactivation of the type II TGF-beta receptor in colon cancer cells with microsatellite instability. Science 268, 1336–1338 (1995). 98. Grady, W. M. et al. Mutational Inactivation of Transforming Growth Factor β Receptor Type II in Microsatellite Stable Colon Cancers. Cancer Res. 59, 320– 324 (1999). 99. Lengauer, C., Kinzler, K. W. & Vogelstein, B. Genetic instability in colorectal cancers. Nature 386, 623–627 (1997). 100. Fodde, R. et al. Mutations in the APC tumour suppressor gene cause chromosomal instability. Nat. Cell Biol. 3, 433–8 (2001). 101. Aaltonen, L. A. et al. Clues to the pathogenesis of familial colorectal cancer. Science 260, 812–816 (1993). 102. Herman, J. G. et al. Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proc. Natl. Acad. Sci. U. S. A. 95, 6870–6875 (1998). 103. Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. Proc. Natl. Acad. Sci. U. S. A. 96, 8681–8686 (1999). 104. Kambara, T. et al. BRAF mutation is associated with DNA methylation in serrated polyps and cancers of the colorectum. Gut 53, 1137–1144 (2004). 105. Issa, J.-P. CpG island methylator phenotype in cancer. Nat. Rev. Cancer 4, 988–993 (2004). 106. Jass, J. R. Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology 50, 113–130 (2007). 107. Bass, A. J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat. Genet. 43, 964–8 (2011). 108. Kim, T. M., Laird, P. W. & Park, P. J. The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 155, 858–68 (2013). 44 109. Cajuso, T. et al. Exome sequencing reveals frequent inactivating mutations in ARID1A, ARID1B, ARID2 and ARID4A in microsatellite unstable colorectal cancer. Int. J. Cancer 135, 611–623 (2014). 110. Seshagiri, S. et al. Recurrent R-spondin fusions in colon cancer. Nature 488, 660–4 (2012). 111. Tahara, T. et al. Colorectal Carcinomas With CpG Island Methylator Phenotype 1 Frequently Contain Mutations in Chromatin Regulators. Gastroenterology 146, 530–538.e5 (2014). 112. Tuupanen, S. et al. Identification of 33 candidate oncogenes by screening for base-specific mutations. Br. J. Cancer (2014). doi:10.1038/bjc.2014.429 113. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–8 (2013). 114. Angus-Hill, M. L., Elbert, K. M., Hidalgo, J. & Capecchi, M. R. T-cell factor 4 functions as a tumor suppressor whose disruption modulates colon cell proliferation and tumorigenesis. Proc. Natl. Acad. Sci. U. S. A. 108, 4914–9 (2011). 115. Major, M. B. et al. Wilms tumor suppressor WTX negatively regulates WNT/beta-catenin signaling. Science 316, 1043–1046 (2007). 116. Patsialou, A., Wilsker, D. & Moran, E. DNA-binding properties of ARID family proteins. Nucleic Acids Res. 33, 66–80 (2005). 117. Xie, C., Fu, L., Han, Y., Li, Q. & Wang, E. Decreased ARID1A expression facilitates cell proliferation and inhibits 5-fluorouracil-induced apoptosis in colorectal carcinoma. Tumour Biol. J. Int. Soc. Oncodevelopmental Biol. Med. (2014). doi:10.1007/s13277-014-2074-y 118. Chadwick, R. B. et al. Candidate tumor suppressor RIZ is frequently involved in colorectal carcinogenesis. Proc. Natl. Acad. Sci. 97, 2662–2667 (2000). 119. Morris, L. G. T. et al. Recurrent somatic mutation of FAT1 in multiple human cancers leads to aberrant Wnt activation. Nat. Genet. 45, 253–261 (2013). 120. Guo, C. et al. KMT2D maintains neoplastic cell proliferation and global histone H3 lysine 4 monomethylation. Oncotarget 4, 2144–2153 (2013). 121. Gao, J. et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 6, pl1–pl1 (2013). 122. Cerami, E. et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2, 401– 404 (2012). 123. SRCT. Improved Survival with Preoperative Radiotherapy in Resectable Rectal Cancer. N. Engl. J. Med. 336, 980–987 (1997). 124. Bracht, K., Nicholls, A. M., Liu, Y. & Bodmer, W. F. 5-Fluorouracil response in a large panel of colorectal cancer cell lines is associated with mismatch repair deficiency. Br. J. Cancer 103, 340–346 (2010). 125. Bardelli, A. & Siena, S. Molecular Mechanisms of Resistance to Cetuximab and Panitumumab in Colorectal Cancer. J. Clin. Oncol. 28, 1254–1261 (2010). 126. Kopetz, S. et al. Phase II Trial of Infusional Fluorouracil, Irinotecan, and Bevacizumab for Metastatic Colorectal Cancer: Efficacy and Circulating Angiogenic Biomarkers Associated With Therapeutic Resistance. J. Clin. Oncol. 28, 453–459 (2010). 127. Misale, S. et al. Emergence of KRAS mutations and acquired resistance to anti-EGFR therapy in colorectal cancer. Nature 486, 532–536 (2012). 128. Schmoll, H. J. et al. ESMO Consensus Guidelines for management of patients with colon and rectal cancer. A personalized approach to clinical decision making. Ann. Oncol. 23, 2479–2516 (2012). 129. Van Cutsem, E. et al. Cetuximab and Chemotherapy as Initial Treatment for Metastatic Colorectal Cancer. N. Engl. J. Med. 360, 1408–1417 (2009). 45 130. Van Emburgh, B. O., Sartore-Bianchi, A., Di Nicolantonio, F., Siena, S. & Bardelli, A. Acquired resistance to EGFR-targeted therapies in colorectal cancer. Mol. Oncol. (2014). doi:10.1016/j.molonc.2014.05.003 131. Segal, N. H. & Saltz, L. B. Evolving Treatment of Advanced Colon Cancer. Annu. Rev. Med. 60, 207–219 (2009). 132. Adam, R. et al. Patients with initially unresectable colorectal liver metastases: is there a possibility of cure? J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 27, 1829–1835 (2009). 133. Folprecht, G. et al. Survival of patients with initially unresectable colorectal liver metastases treated with FOLFOX/cetuximab or FOLFIRI/cetuximab in a multidisciplinary concept (CELIM study). Ann. Oncol. 25, 1018–1025 (2014). 134. Gill, S. et al. Pooled analysis of fluorouracil-based adjuvant therapy for stage II and III colon cancer: who benefits and by how much? J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 22, 1797–806 (2004). 135. Siegel, R., DeSantis, C. & Jemal, A. Colorectal cancer statistics, 2014. CA. Cancer J. Clin. 64, 104–117 (2014). 136. Chambers, A. F., Groom, A. C. & MacDonald, I. C. Metastasis: Dissemination and growth of cancer cells in metastatic sites. Nat Rev Cancer 2, 563–572 (2002). 137. Nguyen, D. X. & Massagué, J. Genetic determinants of cancer metastasis. Nat. Rev. Genet. 8, 341–352 (2007). 138. Brannon, A. R. et al. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions. Genome Biol. 15, 454 (2014). 139. Roth, A. D. et al. Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC-3, EORTC 40993, SAKK 60-00 trial. J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 28, 466–474 (2010). 140. Gavin, P. G. et al. Mutation profiling and microsatellite instability in stage II and III colon cancer: an assessment of their prognostic and oxaliplatin predictive value. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 18, 6531–6541 (2012). 141. Mouradov, D. et al. Survival in stage II/III colorectal cancer is independently predicted by chromosomal and microsatellite instability, but not by specific driver mutations. Am. J. Gastroenterol. 108, 1785–1793 (2013). 142. Isaksson, M. et al. MLGA--a rapid and cost-efficient assay for gene copynumber analysis. Nucleic Acids Res 35, e115 (2007). 143. Jones, S. et al. Comparative lesion sequencing provides insights into tumor evolution. Proc. Natl. Acad. Sci. U. S. A. 105, 4283–8 (2008). 144. Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712–712 (2011). 145. Begley, C. G. & Ellis, L. M. Drug development: Raise standards for preclinical cancer research. Nature 483, 531–533 (2012). 146. Turner, N. C. & Reis-Filho, J. S. Genetic heterogeneity and cancer drug resistance. Lancet Oncol. 13, e178–185 (2012). 147. Groenendijk, F. H. & Bernards, R. Drug resistance to targeted therapies: Déjà vu all over again. Mol. Oncol. (2014). doi:10.1016/j.molonc.2014.05.004 148. Workman, P., Al-Lazikani, B. & Clarke, P. A. Genome-based cancer therapeutics: targets, kinase drug resistance and future strategies for precision oncology. Curr. Opin. Pharmacol. 13, 486–496 (2013). 46 149. Chong, C. R. & Jänne, P. A. The quest to overcome resistance to EGFRtargeted therapies in cancer. Nat. Med. 19, 1389–1400 (2013). 150. Von Manstein, V. et al. Resistance of Cancer Cells to Targeted Therapies Through the Activation of Compensating Signaling Loops. Curr. Signal Transduct. Ther. 8, 193–202 (2013). 151. Sigurdsson, J. A., Getz, L., Sjönell, G., Vainiomäki, P. & Brodersen, J. Marginal public health gain of screening for colorectal cancer: modelling study, based on WHO and national databases in the Nordic countries. J. Eval. Clin. Pract. 19, 400–407 (2013). 152. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 20, 548–54 (2014). 153. Diehl, F. et al. Circulating mutant DNA to assess tumor dynamics. Nat Med 14, 985–90 (2008). 154. Diehl, F. et al. Detection and quantification of mutations in the plasma of patients with colorectal tumors. Proc. Natl. Acad. Sci. U. S. A. 102, 16368–73 (2005). 155. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and latestage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014). 156. Forshew, T. et al. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci. Transl. Med. 4, 136ra68 (2012). 157. Swisher, E. M. et al. Tumor-specific p53 sequences in blood and peritoneal fluid of women with epithelial ovarian cancer. Am. J. Obstet. Gynecol. 193, 662–7 (2005). 158. Misale, S. et al. Blockade of EGFR and MEK Intercepts Heterogeneous Mechanisms of Acquired Resistance to Anti-EGFR Therapies in Colorectal Cancer. Sci. Transl. Med. 6, 224ra26–224ra26 (2014). 159. Druker, B. J. et al. Efficacy and Safety of a Specific Inhibitor of the BCR-ABL Tyrosine Kinase in Chronic Myeloid Leukemia. N. Engl. J. Med. 344, 1031– 1037 (2001). 160. Bryant, H. E. et al. Specific killing of BRCA2-deficient tumours with inhibitors of poly(ADP-ribose) polymerase. Nature 434, 913–917 (2005). 161. Farmer, H. et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature 434, 917–921 (2005). 162. Gad, H. et al. MTH1 inhibition eradicates cancer by preventing sanitation of the dNTP pool. Nature 508, 215–221 (2014). 163. Edwards, S. L. et al. Resistance to therapy caused by intragenic deletion in BRCA2. Nature 451, 1111–1115 (2008). 164. Bardelli, A. et al. Amplification of the MET receptor drives resistance to antiEGFR therapies in colorectal cancer. Cancer Discov. 3, 658–673 (2013). 47 Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1028 Editor: The Dean of the Faculty of Medicine A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.) Distribution: publications.uu.se urn:nbn:se:uu:diva-231508 ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2014