Chapter – 20: Genomics and Global Screening
Chapter - 20 Outline

Definition

History

Open reading frame (ORF)

First Genome sequence

Vectors for genome sequencing
  o YAC
  o BAC

Sequence Tagged Sites

Mapping with STSs

Shotgun-sequencing method

GeneEngine platform

Differential Display-PCR

Serial analysis of gene expression (SAGE)

DNA microarrays

Impact on Bioinformatics
Definition
The genome is the complete set of genetic material in an organism. It includes both the
genes and the non-coding sequences of the DNA. Hans Winkler, Professor of Botany at the
University of Hamburg, Germany, first used the term in 1920.
Genomics is the study of the genome and of the role of genes, alone and together, in directing
life. The field includes intensive efforts to determine the entire DNA sequence of organisms and
fine-scale genetic mapping efforts. Fred Sanger laid the foundations of genomics when he first
sequenced the complete genomes of a virus and a mitochondrion. His group established
techniques of sequencing, genome mapping, data storage, and bioinformatic analyses in the
1970s and 1980s. Microarrays are among the most important tools in genomics. The first
DNA-based genome to be sequenced in its entirety was that of bacteriophage ΦX174 (5,386 nt),
sequenced by Frederick Sanger in 1977. The first free-living organism to be sequenced was
Haemophilus influenzae (1.8 Mb) in 1995, and since then genomes have been sequenced at a
rapid pace.
Functional genomics is a field of molecular biology that uses the vast wealth of data produced
by genomic projects (such as genome sequencing projects) to describe gene (and protein)
functions and interactions. Functional genomics attempts to answer questions about the
function of DNA at the levels of genes, RNA transcripts, and protein products, for example
questions about gene transcription, translation and protein-protein interaction. The goal of
functional genomics is to understand the relationship between an organism's genome and its
phenotype. Functional genomics mostly uses multiplex techniques to measure the abundance
of many or all gene products, such as mRNAs or proteins, within a biological sample, in order
to quantify the various biological processes and improve our understanding of gene and
protein functions and interactions.
Global gene screening uses techniques such as differential display-PCR (DD-PCR), serial
analysis of gene expression (SAGE) and DNA array technology to analyze DNA samples, for
example to detect the presence of a gene or genes associated with an inherited disorder, or to
look for a genetic alteration that may indicate an increased risk of developing a specific disease
or disorder.
History
After Sanger sequenced phage ΦX174 in 1977, Craig Venter and Hamilton Smith sequenced, in
1995, the first genomes of free-living organisms: Haemophilus influenzae and Mycoplasma
genitalium. The H. influenzae genome, 1,830,137 bp long, was the first to be completely
sequenced. In 1996, the genome of baker's yeast, about 12 million bp, was sequenced. Also in
1996, the first genome of an organism from the third domain of life, the archaea, was
sequenced. In 1997, the 4.6-million-bp genome of E. coli was sequenced. In 1998, the first
animal genome, that of Caenorhabditis elegans, was sequenced.
Figure-1: DNA Sequence Trace
Milestones in Genomic Sequencing
The Human Genome Project was a 13-year mega project, launched in 1990 and completed in
2003. The International Human Genome Sequencing Consortium announced the publication of a
draft sequence and analysis of the human genome, the genetic blueprint for the human being.
Two groups published their findings: the American company Celera, led by Craig Venter, and a
huge international collaboration of distinguished scientists led by Francis Collins, director of the
National Human Genome Research Institute, U.S. In 2000, the first rough draft of the human
genome was completed. The genome has been completely sequenced using the definition
employed by the International Human Genome Project; a graphical history of the project shows
that most of the human genome was complete by the end of 2003. A draft sequence of the
mouse genome was published in 2002, and by the end of 2006, 453 complete genomes had
been sequenced. To read the complete 3.2 billion base pairs at 5 bp per second, 8 hours per
day, would take about 60 years.
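The reading-time figure is simple arithmetic. A quick check under the same assumptions (3.2 billion bp, 5 bp per second, 8 hours per day) plus one added assumption, reading every day of a 365-day year, gives just over 60 years:

```python
GENOME_BP = 3.2e9     # human genome size quoted in the text
BP_PER_SECOND = 5     # reading speed quoted in the text
HOURS_PER_DAY = 8     # reading time per day quoted in the text
DAYS_PER_YEAR = 365   # assumption: reading every day, no days off

bp_per_day = BP_PER_SECOND * 3600 * HOURS_PER_DAY
years = GENOME_BP / (bp_per_day * DAYS_PER_YEAR)
print(f"{bp_per_day:,} bp per day, about {years:.1f} years in total")
```

This prints 144,000 bp per day and about 60.9 years; allowing for days off would only lengthen the estimate.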
Open Reading Frame (ORF)
An open reading frame (ORF) is a sequence of bases that, when translated in one frame,
contains no stop codons for a relatively long distance, long enough to code for a protein. An
ORF usually starts with an ATG (or occasionally a GTG) triplet, corresponding to an AUG (GUG)
translation initiation codon, and ends with a stop codon (UAG, UAA or UGA). An open reading
frame corresponds to a gene's coding region: in a gene, the ORF lies between the start codon
(initiation codon) and the stop codon (termination codon). ORFs are usually encountered when
sifting through pieces of DNA while trying to locate a gene. Because the start codon varies in
organisms with an altered genetic code, ORFs are identified differently in those organisms. In
organisms with double-stranded DNA, the DNA sequence can be read in six reading frames,
three on each strand. The longest sequence without a stop codon usually identifies the open
reading frame. The base sequence of phage DNA therefore also tells us the amino acid
sequence of all phage proteins: one uses the genetic code to translate the DNA base sequence
of each reading frame into the corresponding amino acid sequence.
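A minimal six-frame ORF scan can be sketched as follows. It assumes the standard genetic code, treats only ATG as a start codon (ignoring GTG and the altered codes mentioned above), reports ATG-to-stop stretches of at least `min_codons` codons, and uses hypothetical function names; it illustrates the idea rather than serving as a gene finder.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order map onto this amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Scan all six reading frames for ATG...stop ORFs.

    Returns (strand, frame, start, end, protein) tuples; start/end are 0-based,
    end-exclusive coordinates on the scanned strand (end includes the stop codon).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        protein = "".join(CODON_TABLE.get(s[j:j + 3], "X")
                                          for j in range(start, i, 3))
                        orfs.append((strand, frame, start, i + 3, protein))
                    start = None                   # close it at the stop codon
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=4))
```

ORFs that run off the end of the sequence without a stop codon are ignored here, which is one of several simplifications.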
First Genome Sequence
The first DNA genome to be sequenced was that of the E. coli phage ΦX174, sequenced by
Sanger in 1977 (5,375 nt). Analysis of the open reading frames of ΦX174 revealed that some of
the phage genes overlap. In Figure-2, the coding region of gene B lies within the region of gene
A, and the coding region of gene E lies within the region of gene D. Even though these genes
occupy the same stretch of DNA, they code for different proteins because they are read in
different reading frames.
Figure-2: Phage ΦX174: Fred Sanger, 1977, 5,375 nt (first DNA genome sequenced)
(a) Each letter is a gene. (b) Overlapping reading frames: only the non-template sequence
(coding or sense strand) is shown.
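That one stretch of DNA reads as different proteins in different frames can be seen with a toy sequence. The sketch below uses a made-up 15-base sequence (not part of ΦX174) and the standard genetic code, and translates it from each of the three starting offsets:

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, frame=0):
    """Translate seq codon by codon, starting at offset frame (0, 1 or 2)."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


toy = "ATGCTGCTGCTGCTG"   # hypothetical sequence, chosen so no frame contains a stop codon
for frame in range(3):
    print(frame, translate(toy, frame))
```

Frame 0 gives MLLLL, frame 1 gives CCCC and frame 2 gives AAAA. In ΦX174 the overlapping genes are genuine ORFs with their own start and stop codons; the toy only shows that shifting the frame changes every codon.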
Genome sequencing:
Genome sequencing is the process of determining the complete DNA sequence of an organism's
genome. Biological samples such as saliva, epithelial cells, bone marrow, hair or anything else
that contains DNA-bearing cells can provide the genetic material needed for whole-genome
sequencing. Large-scale sequencing aims at sequencing very long DNA pieces, such as whole
chromosomes. Common approaches consist of cutting (with restriction enzymes) or shearing
(with mechanical forces) large DNA fragments into shorter DNA fragments, which are then
cloned, individually sequenced and put back together. The two approaches used to sequence
the human genome were the clone-by-clone approach and the shotgun approach.
Figure-3: Genome Sequencing
Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from
individual bacterial clones is sequenced and the sequence is assembled by using overlapping
DNA regions.
Clone-by-Clone Approach: map then sequence
Figure-4: Clone-by-clone Approach
In this approach, the DNA is mapped first and then sequenced. The chromosomes were mapped
and then split into sections. A rough map was drawn for each section, and the sections
themselves were then split into smaller pieces, with plenty of overlap between them. Each of
these smaller pieces was sequenced, and the overlaps were used to put the genome jigsaw
back together. Because every DNA sequence is derived from a known region, it is relatively
easy to keep track of the project and to determine where there are gaps in the sequence.
Moreover, assembling relatively short regions of DNA is efficient. However, mapping can be a
time-consuming and costly process. This approach uses yeast artificial chromosome (YAC) or
bacterial artificial chromosome (BAC) vectors for cloning. It was the approach used by the
public consortium led by Francis Collins.
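Gap tracking is straightforward here because every clone comes from a known map position. As a sketch, assume each sequenced clone is recorded as a hypothetical (start, end) coordinate pair in kilobases; sorting and merging these intervals then reveals the uncovered stretches:

```python
def find_gaps(clones, region_start, region_end):
    """Return the stretches of [region_start, region_end) not covered by any clone.

    clones: iterable of (start, end) map coordinates, end exclusive.
    """
    gaps = []
    covered_to = region_start                  # everything left of this point is covered
    for start, end in sorted(clones):
        if start > covered_to:                 # a hole between coverage and this clone
            gaps.append((covered_to, min(start, region_end)))
        covered_to = max(covered_to, end)
        if covered_to >= region_end:
            break
    if covered_to < region_end:                # uncovered tail at the right-hand end
        gaps.append((covered_to, region_end))
    return gaps


# Hypothetical clone positions in kb along a 700-kb mapped region.
clones = [(120, 300), (0, 150), (480, 600), (350, 500)]
print(find_gaps(clones, 0, 700))
```

Here the gaps are 300 to 350 kb and 600 to 700 kb, which would be the targets for further cloning.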
Shotgun Approach: sequence then map
Figure-5: Shotgun Approach
The alternative to the clone-by-clone approach is shotgun sequencing, developed by Fred
Sanger in 1982. First, all the DNA is broken into fragments. The fragments are then sequenced
at random and assembled by looking for overlaps. The advantage of whole-genome shotgun
sequencing is that it requires no prior mapping. Its disadvantage is that large genomes need vast
amounts of computing power and sophisticated software to reassemble the genome from its
fragments. Reassembling these sequenced fragments requires huge investments in IT and,
unlike the clone-by-clone approach, assemblies cannot be produced until the end of the project.
This approach uses bacterial artificial chromosome (BAC) vectors for cloning. It was the
approach used by Craig Venter.
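The assembly step can be illustrated with a greedy assembler that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a simplified sketch on error-free, single-strand toy reads with hypothetical function names; real shotgun assemblers must also handle sequencing errors, both DNA strands and repeats, which greedy merging can easily mis-assemble.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)     # candidate positions of b's first bases in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):            # the rest of a must match b's prefix
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads by their longest overlaps; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))       # drop exact duplicates, keep order
    while True:
        # Reads wholly contained in another contig add no new sequence.
        contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
        if len(contigs) < 2:
            break
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:                      # no overlaps left: separate contigs remain
            break
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ACGATGC", "AACCGGT", "GATGCA", "GGTTACG"]   # toy reads from one short sequence
print(greedy_assemble(reads))
```

The four toy reads reassemble into the single 16-base sequence they were cut from.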
Vectors for Genome Sequencing
Yeast artificial chromosome (YAC):
Yeast artificial chromosomes (YACs) were useful in mapping the human genome because each
could hold hundreds of thousands of base pairs. A YAC contains a left and a right yeast
telomere, both necessary for replication of yeast chromosomes, and a yeast centromere, which
is necessary for segregation of sister chromatids to opposite poles of the dividing yeast cell. The
centromere is placed adjacent to the left telomere, and a huge piece of DNA can be placed
between the centromere and the right telomere. The large pieces of DNA are prepared by
digesting long pieces of DNA with a restriction enzyme. The YACs with their DNA inserts are
then placed into yeast cells, where they replicate along with the normal yeast chromosomes.
Figure-6: Yeast artificial chromosome
Yeast artificial chromosome (YAC): cloning in yeast artificial chromosomes (yellow, telomere;
red C, centromere; L, left arm; R, right arm with telomere; blue, large piece of foreign DNA,
several hundred kb). A YAC can replicate in yeast and can take an insert of up to a million bp.
Even though YACs were useful for human genome mapping, they had disadvantages: they were
unstable, had low cloning efficiency and were hard to isolate from yeast. To solve these
problems, scientists started to use bacterial artificial chromosomes.
Bacterial artificial chromosome (BAC):
Bacterial artificial chromosomes (BACs) solved the problems that arose with YACs, and they
were the vector of choice for the sequencing phase of the Human Genome Project. They are
based on the F plasmid found in E. coli cells. This plasmid allows conjugation between bacterial
cells and can be transferred from an F+ cell (the donor) to an F- cell (the recipient), converting
the recipient into an F+ cell. A small piece of host DNA can be transferred into the F plasmid, or
the F plasmid can insert into the host chromosome and mobilize it to pass from the donor to the
recipient cell; thus the plasmid can accommodate large inserts of DNA. A BAC takes an insert of
about 300,000 bp. BACs were developed by Melvin Simon and colleagues in 1992. Par genes
govern the distribution of plasmids into the daughter cells and keep the plasmid copy number at
1 to 2 per cell, so BACs are stable.
Figure-7: Bacterial artificial chromosome
The figure shows the first BAC, developed by Melvin Simon and colleagues in 1992. It has the
cloning sites HindIII and BamHI (top); the chloramphenicol resistance gene (CmR), used as a
selection tool; the origin of replication (oriS); and the genes governing partition of plasmids to
daughter cells (ParA and ParB).
Sequence Tagged Sites
A sequence-tagged site (STS) is a relatively short (200 to 500 bp), easily PCR-amplified
sequence that can be specifically amplified by the polymerase chain reaction (PCR) and
detected in the presence of all other genomic sequences, and whose location in the genome is
mapped. One needs to know enough of the DNA sequence in the region being mapped to
design short primers that hybridize a few hundred base pairs apart and cause amplification of a
predictable length of DNA in between. STSs can then be easily detected by PCR using these
two primers: if an amplified DNA fragment of the proper size appears, the unknown DNA
contains the STS of interest. Because the primers must hybridize a specific number of base
pairs apart to give the right size of PCR fragment, the product size provides a check on the
specificity of hybridization. For this reason STSs are useful for constructing genetic and
physical maps from sequence data reported by many different laboratories. They serve as
landmarks on the developing physical map of a genome, and they are used in shotgun
sequencing to aid sequence assembly. The advantage of STSs over other mapping landmarks is
that the means of testing for the presence of a particular STS can be completely described as
information in a database: anyone who wishes to make copies of the marker simply looks up the
STS in the database, synthesizes the specified primers, and runs the PCR under the specified
conditions to amplify the STS from genomic DNA.
Figure-8: Sequence-Tagged Site (STS)
Here, we start with long pieces of DNA extending indefinitely in either direction. Once the
sequences of small regions of the DNA are known, we design primers that hybridize to these
regions and allow PCR to produce double-stranded fragments of predictable length. Here,
PCR primers 250 bp apart have been used. Several cycles of PCR generate many
double-stranded PCR products that are precisely 250 bp long. Electrophoresis of this product
allows one to measure its size and confirm that it is the correct one.
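The logic of an STS test can be mimicked in silico (sometimes called electronic PCR): find the forward primer, find the reverse complement of the reverse primer downstream of it, and check that the product has the expected length. This sketch assumes exact primer matches on one strand, and the primers and template in the example are made up; real PCR tolerates some mismatches.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text, pattern):
    """All start positions of pattern in text (overlaps allowed)."""
    positions, i = [], text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def pcr_products(template, forward, reverse, max_len=1000):
    """Lengths of products between forward-primer sites and downstream reverse-primer sites.

    Both primers are written 5' to 3'; the reverse primer anneals to the top strand,
    so its reverse complement is what appears in the top-strand template.
    """
    reverse_site = reverse_complement(reverse)
    products = []
    for f in find_all(template, forward):
        for r in find_all(template, reverse_site):
            length = r + len(reverse_site) - f
            if r >= f + len(forward) and length <= max_len:
                products.append(length)
    return sorted(products)


def has_sts(template, forward, reverse, expected_len):
    """True if the primer pair yields a product of exactly the expected size."""
    return expected_len in pcr_products(template, forward, reverse)


FWD, REV = "GATTACAGG", "TTGCCATGA"        # hypothetical primers
template = "CC" + FWD + "A" * 200 + reverse_complement(REV) + "GG"
print(pcr_products(template, FWD, REV))
```

The single 218-bp product plays the role of the correctly sized band on the gel.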
Mapping With STSs
STSs are useful in physical mapping, that is, in locating specific sequences in a genome.
Microsatellites are an important class of STSs that are highly polymorphic. Microsatellites are
tandem repeats of 2-6 base pair units of DNA. They are typically neutral and codominant, and
they are used as molecular markers in genetics for kinship, population and other studies. They
can also be used to study gene duplication or deletion. The most common way to detect
microsatellites is to design PCR primers that are unique to one locus in the genome and that
base-pair on either side of the repeated portion. Therefore, a single pair of PCR primers will
work for every individual in the species and will produce products of different sizes for each of
the different-length microsatellite alleles.
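Spotting candidate microsatellites in a sequence is a pattern-matching problem: a 2-6 bp unit followed immediately by further copies of itself. A regular-expression sketch, assuming an arbitrary threshold of at least four copies and skipping single-base runs (the function name is hypothetical):

```python
import re


def find_microsatellites(seq, min_copies=4):
    """Return (start, unit, copies) for tandem repeats of 2-6 bp units.

    The lazy {2,6}? tries the shortest unit first, and \\1 requires the
    following bases to be exact copies of that unit.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    results = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:               # skip homopolymer runs such as AAAA
            continue
        results.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return results


seq = "GGT" + "CA" * 6 + "TTG" + "AGAT" * 4 + "CC"   # made-up example sequence
print(find_microsatellites(seq))
```

Each extra copy of the unit lengthens the PCR product across the locus by one unit length, which is what separates alleles on a gel.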
In Figure-9, several representative BACs are shown at top left, with different symbols
representing different STSs placed at specific intervals. In step (a), the BACs are screened for
two or more widely spaced STSs (STS1 and STS4); all the BACs that contain them are shown at
top right, with the identified STSs shown in color. In step (b), each of the positive BACs is
screened for more STSs (STS2, STS3 and STS5); the colored symbols on the BACs at bottom
right denote the STSs detected in each BAC. In step (c), the STSs in each BAC are aligned to
form a contig, a set of overlapping DNAs spanning long distances. Measuring the lengths of the
BACs by pulsed-field gel electrophoresis helps to pin down the spacing between pairs of BACs.
Figure-9: Mapping with STSs
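The grouping behind STS-content mapping, placing BACs that share STSs in the same contig, can be sketched with a union-find structure. The BAC and STS names below are hypothetical, and the sketch only groups clones; ordering the STSs within each contig, as in step (c), needs additional logic.

```python
def build_contigs(bac_sts):
    """Group BACs that share an STS, directly or through a chain of BACs.

    bac_sts: dict mapping BAC name -> set of STS names detected in that BAC.
    """
    parent = {bac: bac for bac in bac_sts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving keeps trees shallow
            x = parent[x]
        return x

    first_bac_with = {}                       # STS -> first BAC seen carrying it
    for bac, markers in bac_sts.items():
        for sts in markers:
            if sts in first_bac_with:
                parent[find(bac)] = find(first_bac_with[sts])   # shared STS: same contig
            else:
                first_bac_with[sts] = bac

    groups = {}
    for bac in bac_sts:
        groups.setdefault(find(bac), []).append(bac)
    return sorted(sorted(g) for g in groups.values())


bacs = {
    "BAC_A": {"STS1", "STS2"},
    "BAC_B": {"STS2", "STS3"},
    "BAC_C": {"STS3", "STS4"},
    "BAC_D": {"STS6"},
    "BAC_E": {"STS5", "STS6"},
}
print(build_contigs(bacs))
```

BAC_A, BAC_B and BAC_C chain together through STS2 and STS3, while BAC_D and BAC_E form a second contig through STS6.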
Shotgun-sequencing Method
This method, first proposed by Craig Venter, Hamilton Smith and Leroy Hood in 1996, focuses
on the sequencing stage first and mapping afterwards. It starts with BAC clones with very large
inserts, averaging about 150 kb. The insert in each BAC is sequenced at both ends using an
automated sequencer that can easily read about 500 bases at a time, so 500 bases at each end
of the clone are determined. These 500-base sequences serve as an identity tag, called a
sequence-tagged connector (STC), for each BAC clone. Next, each clone is fingerprinted by
digesting it with a restriction enzyme, to determine the insert size and to eliminate aberrant
clones whose fragmentation patterns do not fit the consensus of the overlapping clones. Then
a seed BAC is subdivided into smaller clones in a pUC vector, with inserts averaging only about
2 kb, and sequenced. The whole BAC sequence allows identification of the 30 or more BACs
whose STCs overlap with the seed. Next, one selects the BACs with minimal overlap and
sequences them. This process is repeated with other BACs that have minimal overlap with the
second set. This process, known as BAC walking, would allow one laboratory to sequence the
whole human genome. In summary, one assembles libraries of clones with different insert sizes
and then sequences the inserts at random. This method relies on computer programs to find
areas of overlap among the sequences and piece them together. In Figure-10: (a) chromosomes
are cloned into a BAC vector, yielding a collection of 300,000 BAC clones; a 96-well microtiter
plate is shown with 96 of the clones. (b) A seed BAC is selected for sequencing. (c) The seed is
subcloned into a plasmid vector, yielding a plasmid library. (d) Three thousand of the plasmid
clones are sequenced, and the sequences are ordered by their overlaps, producing the
sequence of the whole 150-kb BAC. (e) The BACs with overlapping STCs are found, compared
by fingerprinting to select those with minimal overlap, and sequenced. This process, known as
BAC walking, creates a contig of the whole chromosome.
Figure-10: Shotgun sequencing
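Fingerprinting a clone amounts to recording the sizes of its restriction fragments. As a sketch, assuming HindIII (recognition site AAGCTT, cut after the first A) as the enzyme and exact size matching, one can digest a sequence in silico and count the bands two clones share; the function names and sequences are hypothetical.

```python
def digest(seq, site="AAGCTT", cut_offset=1):
    """Fragment lengths after cutting seq at every occurrence of site.

    Defaults assume HindIII, which cuts A^AGCTT (one base into the site).
    """
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def shared_bands(fp1, fp2, tolerance=0):
    """Count fragment sizes in fp1 that match a not-yet-used size in fp2 within tolerance."""
    remaining = list(fp2)
    shared = 0
    for size in fp1:
        for k, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[k]               # each band can be matched only once
                break
    return shared


clone_fp = digest("GGAAGCTTCCCCAAGCTTGG")
print(clone_fp, shared_bands(clone_fp, [10, 7, 25]))
```

Two clones that share many bands probably overlap; on real gels a nonzero `tolerance` would absorb sizing error.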
GeneEngine™ Platform
The process of direct analysis on the GeneEngine™ platform begins with isolation of the target
material (DNA, RNA or protein) from a biological source, followed by fluorescent tagging of the
sample material at specific sites of interest (e.g. a nucleotide sequence motif or a protein
epitope). The sample is then injected into the nanofluidic system of the GeneEngine™
Instrument, where it passes through an interrogation region consisting of several laser spots.
Each molecule is detected by laser excitation of the fluorescent tags on the molecule.
Thousands of molecules pass through the system per minute; for DNA analysis, this represents
a throughput of 10-30 million base pairs per second. The Trilogy™ technology combines
advances in nanofluidics, optical engineering and novel labeling strategies with life science
applications in research, drug discovery and development, and diagnostics. The first
applications of U.S. Genomics' platform include direct detection and analysis of RNA, small
RNAs (siRNA, miRNA, etc.) and protein molecules, as well as analyses of the molecules'
interactions.
Differential Display-PCR
Differential display, also known as DD-PCR, is a technique for identifying and analyzing altered
gene expression at the mRNA level. It can also be used to identify genes that are suppressed or
induced. In this technique, two or more samples are analyzed to compare their gene expression
patterns. The samples can be obtained from any eukaryotic organism, including plants, fish,
amphibians, reptiles, insects, yeast, fungi and mammals. A limited number of primers is used to
systematically amplify and visualize most of the mRNA in a cell. It is one of the most commonly
used techniques for identifying differentially expressed genes at the mRNA level. It was first
designed by Liang and Pardee (Science 257, 967, 1992) and Welsh et al. (NAR 20, 4965, 1992),
and its goal is to display all of the mRNAs of a cell. The method depends on PCR and
polyacrylamide gel electrophoresis (PAGE). Its advantages are its simplicity, the ability to monitor
the process at several stages, the fact that it requires only total RNA, the fact that results can be
evaluated by side-by-side comparison on polyacrylamide gels, and the fact that it yields cDNA
fragments that can be easily sequenced and identified. An overview of the method is as follows:
Treat cells or tissue → Collect RNA → Treat RNA with DNase → Split RNA into aliquots and
perform a reverse transcription reaction on each using a different primer → Perform PCR using
the cDNA subsets as template, with the specific primer together with an arbitrary primer → Load
PCR reactions on a sequencing gel → Identify induced or inhibited genes → Repeat experiments
to confirm results → Excise band from gel → Reamplify cDNA using the same PCR conditions →
Use PCR products as a probe in a Northern blot → Clone cDNAs that are positive in the Northern
blot → Screen clones to identify unique species → Identify clones that work in the Northern →
Sequence clone to obtain full-length cDNA.
Figure-11: The Differential Display-PCR method of analyzing samples
Serial analysis of gene expression (SAGE)
SAGE (serial analysis of gene expression) is an alternative method of gene expression analysis
based on sequencing rather than hybridization. SAGE relies on sequencing 10-17 base pair
tags that are unique to each gene. These tags are produced from poly(A) mRNA and ligated end
to end before sequencing. SAGE was originally developed by Dr. Victor Velculescu at the
Oncology Center of Johns Hopkins University and published in 1995. SAGE is a powerful tool
for the analysis of gene expression: it does not require a preexisting clone, it can identify and
quantify both known and new genes, and it can pick up low-abundance transcripts. Tags
produced by SAGE can be identified using high-throughput sequencing and carry enough
information to uniquely identify each mRNA transcript. Data can be searched in the SAGEmap
database (www.ncbi.nlm.nih.gov/SAGE), and the information can then be analyzed and stored
for future analysis.
Briefly, SAGE experiments proceed as follows:
• Isolate the mRNA of an input sample.
• Extract a small chunk of sequence from a defined position of each mRNA molecule.
• Link these small pieces of sequence together to form a long chain.
• Clone these chains into a vector that can be taken up by bacteria.
• Sequence these chains using modern high-throughput DNA sequencers.
• Process this data with a computer to count the small sequence tags.
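The tag-extraction and counting steps can be sketched as follows. The example assumes the classic SAGE design in which NlaIII (site CATG) is the anchoring enzyme and the 10 bases immediately 3' of the 3'-most CATG form the tag; the transcripts are made up and written as cDNA (T instead of U), and the function names are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site, the anchoring enzyme in classic SAGE
TAG_LEN = 10      # assumption: classic 10-bp tags (the text quotes 10-17 bp)


def sage_tag(transcript):
    """Return the TAG_LEN bases just 3' of the 3'-most anchor site, or None."""
    pos = transcript.rfind(ANCHOR)
    if pos == -1:
        return None                           # no anchor site: transcript yields no tag
    tag = transcript[pos + len(ANCHOR): pos + len(ANCHOR) + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def count_tags(transcripts):
    """Tally tags; each count estimates the abundance of the matching mRNA."""
    return Counter(t for t in map(sage_tag, transcripts) if t is not None)


t_a = "GGCATGAAACCCGGGTTTAAAAAA"
t_b = "CATG" + "T" * 10 + "CATG" + "GCGCGCGCGC" + "AAAA"
print(count_tags([t_a, t_a, t_b, "AAAAAA"]))
```

Only the 3'-most site counts, so the first CATG in `t_b` is ignored, and a transcript with no CATG contributes nothing.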
Figure-12: SAGE Technique
Although SAGE was originally conceived for use in cancer studies, it has been successfully
used to describe the transcriptome of other diseases and in a wide variety of organisms.
Genzyme Molecular Oncology (GMO) provided research support to KWK and has licensed the
SAGE technology from The Johns Hopkins University for commercial purposes; the technology
is freely available to academia for research purposes. Invitrogen has subsequently sublicensed
the SAGE technology from GMO for the purpose of providing a SAGE kit. The University and
researchers (VEV, LZ, BV, KWK) have a financial interest in GMO, the arrangements for which
are managed by the University in accordance with its conflict of interest policies.
DNA Microarray
DNA microarrays evolved from Southern blotting, a technique in which fragmented DNA is
attached to a substrate and then probed with a known gene or fragment. A microarray is a device
to which DNA is bound so that it can be analyzed with homologous cDNA or RNA. It measures
the amount of mRNA in a sample that corresponds to a given gene or probe DNA sequence. The
probe sequences are immobilized on a solid surface and then hybridized with fluorescently
labeled "target" mRNA. The intensity of the fluorescence of a spot is proportional to the amount
of target sequence that has hybridized to that spot, so measuring this intensity gives the
abundance of that mRNA sequence in the sample. Microarrays were first used for gene
expression profiling in 1995, and the first complete eukaryotic genome on a microarray was
published in 1997. There are two main types of array, 'microarray' and 'DNA chip', depending
on how nucleotide sequences are put onto the chip. Microarrays use presynthesized DNA (about
100 bases) as probes, whereas DNA chips use oligonucleotide probes (25 bases) synthesized in
situ. More recently, types of array have been distinguished by the number of genes that can be
measured, since DNA chips allow for an increased number of probes.
An oligonucleotide is a short nucleic acid polymer, typically with twenty or fewer bases.
A microarray works by putting a large number (up to 100,000 or more) of cDNA sequences or
synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid. An
RNA sample is labeled and hybridized, and the amount of RNA bound to each square in the grid
is measured. Finally, comparisons are made, for example between cancerous and normal tissue,
between treated and untreated samples, or across a time course. DNA microarrays can be used
to measure changes in gene expression, for mutation detection (single-base changes, such as in
one type of diabetes), polymorphism analysis, mapping (locating genes within chromosomes),
evolutionary studies (identifying common ancestors), and pharmacogenomics (the search for
therapeutic responses to drugs given the genetic profiles of patients).
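Comparisons such as cancerous versus normal tissue are usually expressed as log2 ratios of the two spot intensities. A minimal sketch, assuming background-subtracted intensities, a pseudocount of 1 to avoid division by zero, and a two-fold cut-off (all assumptions, as are the gene names and numbers):

```python
import math


def log2_ratios(test, reference, pseudocount=1.0):
    """Per-gene log2(test / reference); the pseudocount keeps zero spots finite."""
    return {gene: math.log2((test[gene] + pseudocount) / (reference[gene] + pseudocount))
            for gene in test}


def call_changes(ratios, threshold=1.0):
    """Label genes up/down/unchanged; threshold 1.0 in log2 units is a two-fold change."""
    calls = {}
    for gene, r in ratios.items():
        if r >= threshold:
            calls[gene] = "up"
        elif r <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical intensities for three spots: cancer (test) vs normal (reference) sample.
cancer = {"geneA": 799, "geneB": 99, "geneC": 499}
normal = {"geneA": 99, "geneB": 399, "geneC": 499}
ratios = log2_ratios(cancer, normal)
print(ratios, call_changes(ratios))
```

The log scale makes an eight-fold increase (+3) and an eight-fold decrease (-3) symmetric, which is why expression ratios are plotted this way.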
Figure-13: Chip making
It is a schematic diagram of a DNA microarray. The drawing represents a standard 1 inch by
3 inch glass microscope slide with an array of 5,808 tiny spots of DNA. Each dot is 200
micrometers in diameter and the distance between the dot centers is 40 micrometers. It is
possible to place more than 10,000 spots on a slide of this size.
Figure-14: Growing Oligonucleotides on a Glass Substrate
The glass is coated with a reactive group that is blocked with a photosensitive agent (red).
The blocking agent can be removed with light, so the parts of the plate that are unmasked
(blue) let light through. In the first cycle, four spots are masked, so light reaches only the two
unmasked spots. The unblocked spots are chemically coupled with a guanosine (G) nucleotide.
During the second cycle, three spots are masked and protected from light, while the other
three, including a spot from the first cycle, are unmasked and receive light. These spots are
chemically coupled with an adenosine (A) nucleotide. Thus the spot exposed in both cycles
carries the sequence G-A. In this pattern, the cycles are repeated over and over again with
different nucleotides.
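The mask cycles can be simulated by tracking which spots are unmasked in each cycle. This sketch reproduces the two cycles described above with six spots; the spot numbering and function name are hypothetical.

```python
def photolithographic_synthesis(n_spots, cycles):
    """Simulate light-directed probe synthesis.

    cycles: list of (nucleotide, unmasked_spots). In each cycle light removes the
    blocking group only at unmasked spots, so only those probes grow by one base.
    """
    probes = [""] * n_spots
    for nucleotide, unmasked in cycles:
        for spot in unmasked:
            probes[spot] += nucleotide
    return probes


# Cycle 1 exposes two spots to G; cycle 2 exposes three spots (one already carrying G) to A.
print(photolithographic_synthesis(6, [("G", {0, 1}), ("A", {1, 2, 3})]))
```

Spot 1 ends up with G-A, spots 2 and 3 with A, and spots 4 and 5 are still blank. With one cycle per nucleotide per position, any layout of 25-base probes needs at most 4 × 25 = 100 cycles.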
Figure-15: Creating DNA microarray
Figure-16: An example of the results of a DNA microarray
DNA microarrays are created by robotic machines that arrange minuscule amounts of
hundreds or thousands of gene sequences on a single microscope slide. Researchers have
a database of over 40,000 gene sequences that they can use for this purpose. When a gene
is activated, cellular machinery begins to copy certain segments of that gene. The resulting
product is known as messenger RNA (mRNA), which is the body's template for creating
proteins. The mRNA produced by the cell is complementary to, and therefore will bind to, the
original portion of the DNA strand from which it was copied. To determine which genes are
turned on and which are turned off in a given cell, a researcher first collects the messenger
RNA molecules present in that cell. The researcher then labels each mRNA molecule by
attaching a fluorescent dye and places the labeled mRNA onto a DNA microarray slide. The
messenger RNA that was present in the cell will then hybridize (bind) to its complementary
DNA on the microarray, leaving its fluorescent tag. Finally, the researcher uses a special
scanner to measure the fluorescent areas on the microarray.
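Whether a spot lights up comes down to base-pairing: a DNA probe binds an mRNA if the probe is complementary (and antiparallel) to some stretch of it. A sketch assuming perfect complementarity, with made-up probe names and sequences:

```python
def complementary_dna(rna):
    """DNA strand that base-pairs with rna, written 5' to 3' (e.g. AUGC -> GCAT)."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def lit_spots(probes, labeled_mrnas):
    """Names of spots whose DNA probe is complementary to part of a labeled mRNA."""
    lit = set()
    for name, probe in probes.items():
        for mrna in labeled_mrnas:
            if probe in complementary_dna(mrna):   # probe pairs with a segment of mRNA
                lit.add(name)
                break
    return lit


probes = {"spot1": "ATTACAATG", "spot2": "TTTTTTTT"}
print(lit_spots(probes, ["AUGGCCAUUGUAAUGGGC"]))
```

Real hybridization tolerates some mismatches and depends on temperature and salt, which this exact-match sketch ignores.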
Figure-17: DNA microarray and gene expression
Hierarchical cluster analysis of normal tissue specimens. (a) Thumbnail overview of the two-way
hierarchical cluster of 115 normal tissue specimens (columns) and 5,592 variably-expressed
genes (rows). Mean-centered gene expression ratios are depicted by a log2 pseudocolor scale
(ratio fold-change indicated); gray denotes poorly-measured data. Selected gene-expression
clusters are annotated. The dataset represented here is available as Additional data file 2. (b)
Enlarged view of the sample dendrogram. Terminal branches for samples are color-coded by
tissue type.
Shyamsundar et al. Genome Biology 2005 6:R22 doi:10.1186/gb-2005-6-3-r22
The two-way unsupervised analysis also identified clusters of coexpressed genes which
represented both tissue-specific structures and systems and coordinately regulated cellular
processes. For example, on the basis of the shared characteristics of well annotated genes in
the clusters, we identified clusters representing cell proliferation, mitochondrial ATP production,
mRNA processing, protein translation and endoplasmic reticulum-associated protein
modification and secretion. Interestingly, proliferation, mitochondrial ATP production and protein
translation were each represented by two distinct clusters of genes, suggesting that subsets of
these functions might be differentially regulated among different tissues. One gene cluster
corresponded to sequences on the mitochondrial chromosome; we interpret this feature to
reflect the relative abundance of mitochondria in each tissue sample.
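The gene-clustering half of such a two-way analysis can be sketched as average-linkage agglomerative clustering with 1 minus the Pearson correlation as the distance, a common choice for mean-centered log2 ratios. The profiles are hypothetical, the sketch clusters genes only (not samples), and it stops at a fixed number of clusters instead of drawing the full dendrogram.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two expression profiles (each is mean-centered first)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def cluster_genes(profiles, n_clusters):
    """Average-linkage clustering of genes with distance 1 - Pearson r.

    profiles: dict mapping gene name -> list of log2 ratios across samples.
    Merging stops when n_clusters remain (like cutting the dendrogram).
    """
    dist = {}
    for g, h in combinations(profiles, 2):
        dist[g, h] = dist[h, g] = 1 - pearson(profiles[g], profiles[h])

    def linkage(c1, c2):
        return sum(dist[g, h] for g in c1 for h in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in profiles]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


profiles = {  # hypothetical log2 ratios in four samples
    "g1": [2.0, 1.0, -1.0, -2.0],
    "g2": [1.8, 1.1, -0.9, -2.2],
    "g3": [-2.0, -1.0, 1.0, 2.0],
    "g4": [-1.9, -1.2, 0.8, 2.1],
}
print(cluster_genes(profiles, 2))
```

Genes g1 and g2 rise and fall together, as do g3 and g4, so they end up as two coexpressed clusters.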
Impact on Bioinformatics
Bioinformatics is the application of computer science to molecular biology. The term was first
introduced by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems.
Its main applications have been in genomics and genetics, particularly in the areas of genomics
involving large-scale DNA sequencing. Genomics produces high-throughput, high-quality data,
and bioinformatics provides the analysis and interpretation of these massive data sets. It is
impossible to separate genomics laboratory technologies from the computational tools required
for data analysis. Bioinformatics now entails the creation and advancement of databases,
algorithms, computational and statistical techniques, and theory to solve formal and practical
problems arising from the management and analysis of biological data. Common activities in
bioinformatics include mapping and analyzing DNA and protein sequences, aligning different
DNA and protein sequences to compare them, and creating and viewing 3-D models of protein
structures.
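Sequence alignment, one of the activities listed above, is classically done by dynamic programming. Below is a sketch of Needleman-Wunsch global alignment; the scoring scheme (match +1, mismatch -1, gap -1) is an illustrative assumption, and practical tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "ACT"))
```

Aligning ACGT with ACT gives a score of 2 (three matches and one gap), with the gap placed opposite G: ACGT over AC-T.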
References:
www.yourgenome.org/hgp/hgp2/hgp_5.shtml
www.ncbi.nlm.nih.gov/projects/genome/probe/doc/TechSTS.shtml
www.fass.org/fass01/pdfs/kemppainen.pdf
http://genomebiology.com/2005/6/3/R22
www.wikipedia.org