FZ4201 Assignment I Part 1
Good against Evil
The question of the essay consists of two parts: describe and discuss the
methodology. Therefore I will start with the description of the methods and strategies
of both parties involved and will end with a discussion.
Description
The International Human Genome Sequencing Consortium
This Consortium was a collaboration of 20 groups from different countries brought
together to produce a draft human genome sequence.
In order to produce such a draft, a technique called basic shotgun sequencing was
considered. This technique could not be used with repeat-rich genomes such as the
human genome, however, because misalignment and misassembly would occur all too
frequently.
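The repeat problem can be illustrated with a toy example. The sketch below is not the software used by either project; the genome, the repeat and the reads are invented for illustration. It computes suffix-prefix overlaps between reads and lists, for each read, the reads that could follow it.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def successors(reads, min_len=4):
    """Map each read to the reads that could follow it in an assembly."""
    return {
        r: [s for s in reads if s != r and overlap(r, s, min_len) > 0]
        for r in reads
    }


# Invented toy genome: the 7-base repeat GATTACA occurs twice.
REPEAT = "GATTACA"
genome = "CCGT" + REPEAT + "AGGC" + REPEAT + "TTCG"

# Four 11-base reads taken at hand-picked positions (real reads are random).
r1, r2, r3, r4 = genome[0:11], genome[4:15], genome[11:22], genome[15:26]
graph = successors([r1, r2, r3, r4])

for read, following in graph.items():
    print(read, "->", following)
```

Both reads that end in the repeat (r1 and r3) can be followed by either read that starts with it (r2 or r4), so an assembler looking at overlaps alone cannot tell the true path r1, r2, r3, r4 from the shortcut r1, r4, which would collapse the two repeat copies into one.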
Two solutions were available:

Whole-genome shotgun analysis: this technique had been used in the past for
the repeat-poor genomes of viruses, bacteria and flies using linking
information and computational analysis to avoid misassemblies.

Hierarchical shotgun sequencing: the genome is first broken into large-insert
clones (100-200 kb), each of which is then shotgun sequenced separately. Some of
these clones may suffer rearrangement but this can be reduced by 'clone fingerprints'.
They decided to use the hierarchical shotgun sequencing technique for several
reasons [1]:
1. Once the draft sequence was complete, the ultimate frequency of
misassembly would probably be lower than with the whole-genome approach,
in which it would be more difficult to identify regions in which the assembly
was incorrect.
2. Heterozygosity and SNPs can make assembly more difficult with the
whole-genome approach, whereas in hierarchical shotgun sequencing each large
clone is derived from a single haplotype and so does not suffer from these
kinds of problems.
3. Hierarchical shotgun sequencing would be better able to deal with cloning
biases, because it would be easier to sequence under-represented sequences
afterwards.
4. Hierarchical shotgun sequencing allowed work and responsibility to be
internationally distributed.
Figure 1. Hierarchical shotgun sequencing technique used by the HGP [2].
The Human Genome Project (HGP) adopted a 'map first, sequence later' approach [1]
(figure 1). Fragments of DNA roughly 100-200 kb long, the large-insert size given
above, are produced by restriction enzymes and inserted into synthetic chromosomes
known as bacterial artificial chromosomes (BACs). These BACs are then grown in
batches and subsequently mapped onto the genome's chromosomes by looking for
distinctive marker sequences, called sequence tagged sites (STSs), whose locations
had already been pinpointed.
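The mapping idea can be sketched as a simple lookup. This is a minimal illustration and not the HGP's actual mapping pipeline; the marker names, clone names and positions are hypothetical, and each clone is assumed to hit markers on a single chromosome.

```python
# Hypothetical STS markers with already-known locations: (chromosome, position in Mb).
sts_position = {
    "sts1": ("chr1", 0.5),
    "sts2": ("chr1", 0.6),
    "sts3": ("chr1", 0.7),
    "sts4": ("chr2", 3.0),
}

# Hypothetical screening result: which markers were detected in which BAC clone.
bac_hits = {
    "bacC": ["sts4"],
    "bacB": ["sts2", "sts3"],
    "bacA": ["sts1", "sts2"],
    "bacD": [],  # no marker found, so this clone cannot be placed
}


def place_bacs(bac_hits, sts_position):
    """Return (chromosome, leftmost marker position, clone) tuples in map order."""
    placed = []
    for bac, markers in bac_hits.items():
        known = [sts_position[m] for m in markers if m in sts_position]
        if not known:
            continue  # unplaced clone
        chrom = known[0][0]
        leftmost = min(pos for c, pos in known if c == chrom)
        placed.append((chrom, leftmost, bac))
    return sorted(placed)


bac_map = place_bacs(bac_hits, sts_position)
for chrom, pos, bac in bac_map:
    print(chrom, pos, bac)
```

Clones that share a marker (here bacA and bacB both contain sts2) must overlap, which is what lets the map tie neighbouring clones together.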
Figure 2. Working backwards from the gel to the genome [3].
By working backwards (figure 2) from the physical map created by the STSs, a
fingerprint clone contig was assembled using a computer program called FPC to
analyse the restriction enzyme digestion patterns of the large-insert clones. To
minimize overlap between adjacent clones, a clone was then chosen only if all of its
restriction enzyme fragments were shared with at least one of its neighbours on
each side in the contig. Once these overlapping clones had been sequenced, the set
was called a 'sequence-clone contig' (figure 3).
Figure 3. Assembly done by FPC [1].
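The clone selection rule can be sketched with fingerprints treated as sets of fragment sizes. This is an assumption-laden illustration, not FPC's algorithm: fragment sizes are matched exactly (real gel data would need a size tolerance), the fingerprints are invented, and the rule is read as "every fragment appears in a neighbour, and the clone shares fragments with both sides".

```python
def is_supported(clone, left, right):
    """True if every fragment of `clone` is seen in a neighbouring fingerprint
    and the clone shares at least one fragment with each side."""
    covered = left | right
    return clone <= covered and bool(clone & left) and bool(clone & right)


# Invented fingerprints: each set holds restriction fragment sizes in bp.
left_clone = {1200, 3400, 5600, 800}
middle_clone = {5600, 800, 2100, 4300}
right_clone = {2100, 4300, 950, 7700}

# A clone with one fragment (9999) that no neighbour confirms, as might
# happen if the insert had been rearranged during cloning.
suspect_clone = {5600, 800, 2100, 9999}

print(is_supported(middle_clone, left_clone, right_clone))
print(is_supported(suspect_clone, left_clone, right_clone))
```

Under these assumptions the middle clone passes, because all four of its fragments are confirmed by a neighbour, while the suspect clone fails on its unconfirmed fragment.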
When all the selected clones from a fingerprint clone contig had been sequenced on
slab-gel or capillary-based devices, the sequence-clone contig was the same as the
fingerprint clone contig. After all contigs were sequenced, GigAssembler merged
them together and tried to order and orient the contigs, thereby creating the draft
genome (figure 4).
Figure 4. Assembly of the contigs into scaffolds [1].
Celera Genomics
Whole-genome strategy
Celera chose a mixed strategy [4]: a whole-genome assembly and a regional chromosome
assembly, each combining sequence data from Celera with data from the Human Genome
Project, which were publicly available.
The first strategy combined data from both parties, with the HGP data added in the
form of synthetic shotgun data. The second strategy was a compartmentalized assembly
process that first divided the Celera and HGP data into scaffolds localized to larger
chromosomal segments, which were assembled afterwards. The final step in assembling
the draft genome was to order and orient the scaffolds on the chromosomes, which was
done using physical mapping markers.
Celera's approach left out the mapping stage with the BACs and went straight to
subcloning the DNA fragments into plasmids (figure 5). This saved time and effort, but
it made the assembly more dependent on algorithms and computers.
Figure 5. Whole genome shotgun sequencing method [2].
The reason for leaving the mapping step out was explained by Gene Myers,
vice-president of informatics research with Celera, and James Weber of the Marshfield
Medical Research Foundation in Wisconsin. They argued that the process of
reassembling cloned fragments using algorithms could be applied to random cloned
fragments taken from the genome as a whole [5]. The correct position of the resulting
scaffolds on the genome was then worked out using STSs.
Discussion
Two researchers, Michael Olivier (Stanford) and George Church (Harvard), found that
the draft assemblies were similar in size, contain comparable numbers of unique
sequences…and exhibit similar statistics [6]. But because the assembly process
differed between the two projects, the gap distributions and contig sizes also
differed, and this must have had an impact on the quality of the sequence. Because
the HGP exposes more stages of assembly than Celera, providing four phases of
sequence data, it should have the more accurate sequence.
Two articles, published at the same time as the Celera and HGP papers,
discuss several features of the strategies. Coincidentally or not, the article in
Science [7] favours the Celera strategy, while the article in Nature [3] favours the
HGP strategy.
Galas [7] explains that because Celera used paired-end sequences to link contigs
together, they could put them in the right order and orientation. Because they used
plasmid clones of several known sizes for sequencing, and always generated sequence
pairs at known distances from each other, they could also place the contigs at the
right distance from each other, even when there were gaps in between.
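The distance argument comes down to simple arithmetic, sketched below. This is an illustration under stated assumptions, not Celera's assembler: the contig length, read positions and clone sizes are invented, and each pair is assumed to have its forward read in contig A and its reverse read in contig B, with A lying to the left of B.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MatePair:
    insert_size: int  # known clone size in bp
    start_in_a: int   # where the forward read starts in contig A (0-based)
    end_in_b: int     # where the reverse read ends in contig B (bp from B's left end)


def estimate_gap(len_a, pair):
    """Gap implied by one pair: clone size minus the parts lying inside A and B."""
    inside_a = len_a - pair.start_in_a
    inside_b = pair.end_in_b
    return pair.insert_size - inside_a - inside_b


# Invented example: contig A is 10,000 bp long and two clones span the gap.
LEN_A = 10_000
pairs = [
    MatePair(insert_size=2_000, start_in_a=9_200, end_in_b=700),
    MatePair(insert_size=10_000, start_in_a=4_000, end_in_b=3_500),
]

gaps = [estimate_gap(LEN_A, p) for p in pairs]
print("per-pair gap estimates:", gaps)
print("mean gap estimate:", mean(gaps))
```

The two clones have very different sizes, yet both imply the same gap, which is why several pairs of known size spanning the same two contigs give a usable distance even though the gap itself was never sequenced.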
One weakness of the HGP approach, explained by Olson [3], was that there was not
always a BAC clone to cover every part of the genome, and overlaps between clones
could have been obscured by data errors or the presence of large-scale repeats in the
genome.
So why did the researchers from the HGP choose BACs? Because the finishing
phase is easier: resolving the internal gaps and discrepancies could be done by
subcloning each BAC.
Conclusion
We could conclude that Celera used more or less the same strategy as the HGP but in
reverse order. Celera first sequenced and assembled the contigs and mapped them
afterwards, while the HGP first mapped the clones and then sequenced and assembled
them into a genome.
Both strategies had advantages and disadvantages, and it is difficult to determine
which one obtained the better genome sequence. One thing remains clear, though:
two sequences are better than one.
References
1. International Human Genome Sequencing Consortium (2001) Initial sequencing
and analysis of the human genome. Nature 409, 860–921.
2. Campbell & Heyer (2002) Sequencing whole genomes. Available from:
http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/shotgun.html
[Accessed 20th October 2004].
3. Olson, M. V. (2001) The maps: Clone by clone by clone. Nature 409, 816–818.
4. Venter, J. C. et al. (2001) The Sequence of the Human Genome. Science 291,
1304–1351.
5. Weber, J. L. & Myers, E. W. (1997) Human whole-genome shotgun sequencing.
Genome Research 7, 401–409.
6. Aach, J. et al. (2001) Computational comparison of two draft sequences of the
human genome. Nature 409, 856–859.
7. Galas, D. J. (2001) Making sense of the sequence. Science 291, 1257.