Download 1. A brief overview of sequencing biochemistry

Supplementary reading materials on Genome sequencing (optional)

The materials are from Mark Blaxter’s lecture notes on Sequencing strategies and Primary Analysis.

1. A brief overview of sequencing biochemistry

Modern DNA sequencing uses primer-directed extension of a DNA strand from a single-stranded template using a DNA polymerase. Primers are 18-25 bases in length. Either temperate or thermostable polymerases can be used, but thermostable polymerases (Taq polymerase and related enzymes) are the norm.

Most sequencing now uses the dideoxy termination system. While the DNA polymerase will add a dideoxynucleotide complementary to the template strand, it cannot further extend that product after the addition of a dideoxynucleotide. This biochemistry is used to produce populations of products specifically terminated at either A, G, C or T residues. These are labelled in some way and visualised after separation by electrophoresis.

One method for labelling is to use radioactive nucleotides (P32, P33 or S35) to label the oligonucleotide primer. Four reactions are performed (one each for A, G, C and T), and electrophoresed side by side in a denaturing polyacrylamide gel. The products are separated by size at single-base resolution and the sequence is read from the pattern of bands on the gel. Labelling of primers is a time-consuming step. Alternatively, a radiolabelled nucleotide triphosphate (usually S35 dCTP) can be added to reactions performed with unlabelled primer, and the products run as before. This method was the one most widely used before the introduction of automated sequencers. Sequence read lengths from radioactive gels were typically 200-450 bases from four or eight lanes (the same reactions are often electrophoresed twice: once for four hours, to resolve long fragments, and once for two hours, to resolve shorter ones).

To allow automated, nonradioactive sequencing, dye-labelled sequencing was devised.
This method uses a set of rhodamine-based fluorescent dyes that are detected after excitation with a laser. The sequencing can proceed as for radioactive labelling, with dye-labelled primer, or with dye-labelled nucleotide in a dideoxyterminator reaction. These are run (four lanes per sample) on an acrylamide gel, and the fluorescence is read by scanning with an infrared laser as the DNA products migrate past a particular point on the gel.

The availability of multiple dyes with different emission spectra led to the development of the four-dye one-lane system. Four aliquots of primer, end-labelled with the four different dyes, are used to perform the A, G, C and T reactions. These are pooled and run in a single lane of a gel. The sequenator reads the gel by using a spectrophotometer to distinguish between the different dye primers, and thus the different bases. This system has been further improved by the development of dye-labelled terminators (dideoxynucleotides) that simultaneously terminate and fluorescently tag a product. These reactions can be performed in a single tube, and run in a single lane. Currently, the four-dye systems can routinely read >600 bases/lane, and the four-lane one-dye systems can read over 1 kb per reaction.

There is continuing effort to improve both the machines for running sequence and the chemistry of the reactions. Brighter dyes that give better resolution between emission spectra, and give more even incorporation, have been developed. In terms of instrumentation, it is now possible to perform the electrophoresis and detection in a capillary tube system, resulting in much improved throughput (current machines can each do 96 reads of >400 bases every 3 hours, for only about £300,000 per machine).

Some regions of DNA are difficult to sequence due to the intrinsic properties of the DNA.
This can be compositional bias (AT versus GC content), homopolymeric runs (long stretches of a single nucleotide) or the presence of heat-stable hairpin-forming sequences that prevent or impede the passage of the polymerase. To sequence such regions it is possible to try different methods (dye primer versus dye terminator), to use nucleotide analogues (inosine instead of guanosine) and to add modifiers to the reaction mix (such as dimethylsulphoxide).

2. Strategies for sequencing

Smaller pieces of DNA (individual clones, small viruses, plasmids) can be sequenced to completion by primer walking strategies. A start point is made using a primer to a region of known sequence. A sequencing reaction is performed and the new sequence is used to design a new primer further along the molecule. This primer is used for sequencing and the process is repeated until the molecule has been completely sequenced on both strands. The problems associated with this strategy are that it is slow (400 bases at a time, and 2-6 days between sequencing events) and that it is expensive (as so many different new primers are needed). Despite these limitations, this is the standard way used by most non-genome labs for sequencing fragments longer than 400 bases.

Alternative methods devised for sequencing such small pieces of DNA involve generating nested deletions of the cloned DNA fragment, recloning these and sequencing the resultant population of clones from a universal primer site in the cloning vector. If a set of nested deletions is made, each 400 bases shorter than the previous one, a sequencing walk can be undertaken using only one primer. There are several ways of making deletions; both restriction enzymes and non-specific exonucleases can be used. This method is slow because it involves the step of making the deletion clones, which can be tricky.
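The primer-walking cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the function name `primer_walk`, the fixed 400-base read length and the 20-base primer length are hypothetical choices, and slicing a known template string stands in for the real sequencing reaction (in practice the template is, of course, unknown until it has been read).

```python
import random


def primer_walk(template, first_primer, read_length=400, primer_length=20):
    """Walk along one strand of `template`; return a list of (start, read)."""
    if read_length < primer_length:
        raise ValueError("read_length must be at least primer_length")
    reads = []
    primer = first_primer
    search_from = 0  # only look for the primer at or beyond this position
    while True:
        site = template.find(primer, search_from)
        if site < 0:
            raise ValueError("primer does not match the template")
        # The "sequencing reaction": read the bases just past the primer.
        start = site + len(primer)
        read = template[start:start + read_length]
        if not read:
            break
        reads.append((start, read))
        end = start + len(read)
        if end >= len(template):
            break  # reached the end of the molecule
        # Design the next primer from the end of the newly read sequence.
        primer = read[-primer_length:]
        search_from = end - primer_length
    return reads


# Toy usage: a random 1,500-base template, primed from its first 20 bases.
rng = random.Random(0)
template = "".join(rng.choice("ACGT") for _ in range(1500))
reads = primer_walk(template, template[:20])
print(len(reads), "reads,", len(reads) - 1, "new primers designed")
```

Every pass through the loop needs a freshly designed primer, which is why the strategy is slow and expensive. Real projects would usually place each new primer some distance back from the end of the read, where base calls are more reliable, and would repeat the walk on the opposite strand.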
Primer walking for sequencing:
1. Use a known primer to get the first sequence.
2. Use the new sequence to predict a new primer; repeat.
3. Contiguate the sequences (from both strands).

Physical mapping

A physical map is a set of cloned DNA fragments whose positions relative to each other in the genome are known. The complete DNA sequence of a gene or genome is the ultimate physical map. However, it is useful to construct intermediate-level physical maps from cloned fragments: these cloned fragments can subsequently be used for sequencing or other manipulations.

A large genome (say a bacterial genome of 3 million base pairs (3 Mb)) can be subcloned into a lambda phage vector, capable of carrying between 15 and 20 kb. Thus the minimum number of clones required to cover the genome will be about 150 to 200 (3 Mb divided by 15 to 20 kb per clone), if there is no overlap. The clones in such a library can be compared to each other, and those that overlap can be aligned and placed in position on the chromosome relative to each other. The sets of clones, called contigs or contiguated clones, can then be checked for stability, exact representation of the starting genome, etc.

It is usual to use large-insert vectors for physical mapping. Thus lambda phage (up to 20 kb), cosmid (<35 kb), bacterial artificial chromosome or BAC (<150 kb) and yeast artificial chromosome or YAC (<3 Mb) vectors are routinely used. For smaller sequencing projects it may be viable to use plasmid vectors.

A genomic library: a genome, cut into pieces and cloned as a library in a vector (red).

Building a physical map

The clones are ordered by hybridisation, fingerprinting or end-sequencing. Hybridisation methods use labelled probes to detect clones that share sequence. Probes can be generated from each end of the clone by "end rescue", or DNA fragments isolated in other ways can be used (for example, cDNA clones).
One problem with the hybridisation method is that in the presence of a significant repeat content in the genome, some probes/clone ends will fail to provide a unique link to the next segment of the genome. This method was used to generate a map of the Schizosaccharomyces pombe genome.

Hybridisation mapping:
1. Pick clones into a grid.
2. Hybridise to probe 1.
3. Hybridise to probe 2.
4. Build contigs.
In this case, two clones hybridised to both probes and thus they are predicted to overlap. Those hybridising to only one probe are predicted to extend out to the left or right.

Fingerprinting methods rely on the presence of unique restriction sites (based on unique sequence) in segments of DNA shared by two overlapping clones. DNA is prepared from clones and digested with one or two restriction enzymes to generate a set of subfragments. These are analysed on high-resolution gels, and the "fingerprint" pattern of bands is used to identify the clone. Detection of bands can be by radioactive labelling of each one, or by staining with sensitive DNA-detection dyes and visualisation using fluorescence readers. Computer algorithms are used to compare fingerprints from different clones and define overlaps. The C. elegans genome was physically mapped in this way using a cosmid clone library of 17,000 clones and a two-enzyme digest.

Fingerprinting mapping:
1. Digest clones with restriction enzyme and label, electrophorese on gel. (V = vector bands present in all clones.)
2. Determine overlap by shared patterns of bands.

In completing a physical map, it is often essential to use more than one library, and more than one cloning system. In random sampling from the library, it is possible that certain segments of the genome are not represented and others are overrepresented. This stochastic selection will result in a physical map with gaps. The gaps can be crossed by directed approaches using hybridisation selection of "bridging" clones.

However, not all DNA is equally easily cloned.
Bacteria, for example, tend to "dislike" highly repetitive sequences, and thus repetitive DNA will be underrepresented in a bacterial clone library. To overcome this differential representation problem, several solutions have been used. Vectors that have lower copy number per cell tend to yield libraries with better representation (as it is less likely that a "poisonous" sequence will kill the cell, or that a repetitive sequence will find a partner to recombine and delete with). Alternatively, different cloning hosts (bacteria versus yeast in general) have different properties, and it is often possible to recover "unclonable" DNA from an alternative host. Yeast, for example, is able to maintain AT-rich DNA more effectively than E. coli.

A portion of the C. elegans physical map. The longer lines at the top are YAC clones (their names start with a Y). The shorter items below are cosmid clones. The bold YACs are ones used in mapping cDNAs to the genome. The yellow boxed cosmids are those sequenced. Cosmids followed by a * are ones that are canonical for a set of smaller cosmids that are not displayed. The yellow bar at the bottom represents the sequenced DNA. The triangles indicate points of transposon insertion in strains of C. elegans.
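The fingerprint comparison described earlier (digest each clone, drop the vector bands common to all clones, then look for shared bands) can be sketched as follows. This is a deliberately simplified illustration, not the algorithm used for the C. elegans map: the function names, the band-size tolerance and the `min_shared` threshold are all assumptions, and real fingerprinting software scores the statistical significance of shared bands rather than applying a fixed cut-off.

```python
from itertools import combinations


def strip_vector(bands, vector_bands, tolerance=2):
    """Remove bands that match a vector band (present in every clone)."""
    return [b for b in bands
            if all(abs(b - v) > tolerance for v in vector_bands)]


def shared_bands(fp_a, fp_b, tolerance=2):
    """Count bands of fp_a that pair with a not-yet-used band of fp_b."""
    remaining = sorted(fp_b)
    shared = 0
    for band in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(band - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band may be matched only once
                break
    return shared


def overlapping_pairs(fingerprints, vector_bands=(), min_shared=3, tolerance=2):
    """Return (clone_a, clone_b, n_shared) for pairs predicted to overlap."""
    cleaned = {name: strip_vector(bands, vector_bands, tolerance)
               for name, bands in fingerprints.items()}
    pairs = []
    for a, b in combinations(sorted(cleaned), 2):
        n = shared_bands(cleaned[a], cleaned[b], tolerance)
        if n >= min_shared:
            pairs.append((a, b, n))
    return pairs


# Toy fingerprints: band sizes in arbitrary units; 1500 is the vector band.
fingerprints = {
    "c1": [120, 340, 560, 780, 900, 1500],
    "c2": [341, 559, 781, 1100, 1500],
    "c3": [200, 1101, 1300, 1500],
}
print(overlapping_pairs(fingerprints, vector_bands=[1500]))
```

In this toy data only c1 and c2 share enough bands to be called overlapping; c2 and c3 share a single band, which a fixed threshold treats as chance. Chaining such pairwise overlaps is what builds the contigs of a physical map.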
