Integrated Computational Approach for Translational Biomedical Research
Seungchan Kim, Ph.D.
CSE, Arizona State University and MDTV/GenSIP, Translational Genomics Research Institute
AI @ ASU Lunch Bunch, Oct. 25, 2005, BY 510
Biomedical Problems
• Can we recognize disease subtypes?
• Can we identify molecular markers for a certain type of disease?
• Can we learn the regulatory mechanisms governing cellular phenotype, such as disease?
• Can we find a new therapeutic target for the treatment of disease?
• Etc.
AI@ASU, BY510, Oct. 25, 2005
Cells: Basic Features
• All living things are made of cells.
• All cells share the same machinery for their most
basic functions.
• All cells store their hereditary information in the same linear chemical code, stored in a double-stranded molecule, deoxyribonucleic acid (DNA).
• All cells replicate their hereditary information by
templated polymerization.
Cells: Basic Features
• All cells transcribe portions of their hereditary
information into single stranded molecules known as
ribonucleic acids (RNA).
• All cells translate RNA into protein (long polymer
chains) in the same way.
• All cells use proteins to catalyze most chemical
reactions.
• All cells function as biochemical factories dealing with
the same basic molecular building blocks.
Prokaryotic v. Eukaryotic
• Living organisms can be classified on the basis
of cell structure into two groups:
– Eukaryotes (plants, fungi, and animals)
– Prokaryotes (bacteria)
• Eukaryotes keep their DNA in a distinct
membrane-bounded intracellular compartment
called the nucleus.
• Prokaryotes have no distinct nuclear
compartment to house their DNA.
A Typical Prokaryotic Cell
© Garland Science, Molecular Biology of The Cell, 4th Edition
A Typical Eukaryotic Cell
© Garland Science, Molecular Biology of The Cell, 4th Edition
A “Simplified” Cell
• The membrane is the lipid bilayer and associated proteins that enclose all cells.
• The nucleus is a prominent membrane-bounded organelle in a eukaryotic cell, containing DNA organized into chromosomes.
• The nuclear envelope is a double membrane surrounding the nucleus. It consists of an outer and an inner membrane and is perforated by nuclear pores.
• The chromatin is the complex of DNA and various proteins found in the nucleus of a eukaryotic cell. It is the material that chromosomes are made of.
• The cytoplasm is the contents of the cell that are contained within its plasma membrane but, in the case of eukaryotic cells, outside the nucleus.
• The ribosomes are particles composed of ribosomal RNAs and ribosomal proteins that associate with messenger RNAs and catalyze the synthesis of protein.
[Figure labels: membrane, nucleus, chromatin, nuclear envelope, ribosomes]
DNA and its Building Blocks
• DNA is made from simple subunits, called nucleotides, each consisting of a sugar-phosphate molecule with a nitrogen-containing side group, or base, attached to it.
• The bases are of four types:
– Adenine (A)
– Guanine (G)
– Cytosine (C)
– Thymine (T)
© Garland Science, Molecular Biology of The Cell, 4th Edition
DNA and its Building Blocks
© Garland Science, Molecular Biology of The Cell, 4th Edition
• A single strand of DNA consists of nucleotides joined together
by sugar-phosphate linkages.
• The individual sugar-phosphate units are asymmetric, giving the
backbone of the strand a definite directionality or polarity.
• This directionality guides the molecular processes by which the
information in DNA is interpreted and copied in cells.
DNA and its Building Blocks
• Through templated polymerization, the
sequence of nucleotides in an
existing DNA strand controls the
sequence in which nucleotides are
joined together in a new DNA
strand.
• Rules: {A ↔ T} | {C ↔ G}
• The new strand has a nucleotide
sequence that is complementary to
that of the old strand, and a
backbone with opposite
directionality.
© Garland Science, Molecular
Biology of The Cell, 4th
Edition
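The pairing rules and the opposite directionality of the new strand can be illustrated with a minimal Python sketch. The function names and the example sequence are assumptions made for illustration; they do not come from the slides.

```python
# Pairing rules from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5'-to-3' direction.

    Reversing the complement accounts for the opposite directionality
    of the new strand's backbone.
    """
    return complement(strand)[::-1]


template = "ATGCGT"  # made-up example sequence
print(complement(template))          # TACGCA
print(reverse_complement(template))  # ACGCAT
```

Applying `reverse_complement` twice returns the original sequence, which mirrors the fact that each strand can serve as the template for the other.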
DNA and its Building Blocks
© Garland Science, Molecular Biology of The Cell, 4th Edition
• A normal DNA molecule consists of two complementary
strands.
• The nucleotides within each strand are linked by strong (covalent)
chemical bonds.
• The complementary nucleotides on opposing strands are held
together more weakly, by hydrogen bonds.
DNA and its Building Blocks
• The two strands twist
around each other to
form a double helix.
• This is a robust structure
that can accommodate
any sequence of
nucleotides without
altering its basic
structure.
© Garland Science, Molecular
Biology of The Cell, 4th Edition
DNA Replication
© Garland Science, Molecular Biology of The Cell, 4th Edition
• During DNA replication, the two strands of the DNA double helix are pulled apart.
• Each strand serves as a template for synthesis of a new
complementary strand by means of templated
polymerization.
DNA Transcription
© Garland Science, Molecular
Biology of The Cell, 4th Edition
• Each cell contains a fixed set of DNA molecules.
• A given segment of DNA serves to guide the synthesis of many
identical RNA transcripts.
• These transcripts serve as working copies of the information stored
in the DNA archive.
• Many different sets of RNA molecules can be made by
transcribing selected parts of a long DNA sequence, allowing
each cell to use its stored information differently.
DNA Transcription
• All RNA in a cell is made by the process of DNA transcription.
• DNA transcription is similar to DNA replication.
• It produces a single-stranded RNA molecule that is
complementary to one strand of DNA.
© Garland Science, Molecular Biology of The Cell, 4th Edition
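As a rough sketch of the idea that the RNA is complementary to one DNA strand, the following Python function builds the transcript of a template strand. It assumes both sequences are written 5'-to-3' and relies on the fact that RNA uses uracil (U) where DNA uses thymine (T); the function name and the example sequence are illustrative only.

```python
# Base added to the RNA opposite each base of the DNA template.
# RNA contains uracil (U) in place of thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    Assumption: the template is written 5'-to-3', and the RNA is
    returned 5'-to-3' as well, hence the final reversal.
    """
    paired = "".join(DNA_TO_RNA[base] for base in template_strand.upper())
    return paired[::-1]


template = "ACGCAT"  # made-up template strand
print(transcribe(template))  # AUGCGU
```

The result matches the other (non-template) DNA strand, ATGCGT, with every T replaced by U.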
Translation
• During translation, the RNA molecules produced by transcription are used to guide the synthesis of protein molecules.
• Proteins are long polymer
chains formed by stringing
together monomeric building
blocks (amino acids) drawn
from a standard repertoire
that is the same for all living
cells.
© Garland Science, Molecular
Biology of The Cell, 4th Edition
Translation
• There are only four different nucleotides in mRNA and
twenty different types of amino acids in a protein.
• Therefore, translation cannot be accounted for by a
direct one-to-one correspondence between a nucleotide
in RNA and an amino acid in protein.
• The nucleotide sequence in mRNA is read in sets of 3
nucleotides, called codons.
• Each codon corresponds to one amino acid.
• This mapping is determined by rules known as the
genetic code.
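The counting argument behind the triplet code can be checked directly: words of one or two nucleotides give too few combinations for twenty amino acids, while words of three give enough. The short Python sketch below counts the combinations and splits a made-up mRNA into codons; the helper names are assumptions for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
AMINO_ACIDS = 20


def count_words(length: int) -> int:
    """Number of distinct nucleotide words of the given length."""
    return sum(1 for _ in product(NUCLEOTIDES, repeat=length))


def codons(mrna: str) -> list:
    """Split an mRNA into consecutive, non-overlapping triplets."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing triplet
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


for length in (1, 2, 3):
    n = count_words(length)
    print(length, n, "enough" if n >= AMINO_ACIDS else "too few")
# 1 4 too few
# 2 16 too few
# 3 64 enough

print(codons("AUGGCCUAA"))  # ['AUG', 'GCC', 'UAA']
```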
Genetic Codes
Name            3L    1L   Codons
Alanine         Ala   A    GCA GCC GCG GCU
Arginine        Arg   R    AGA AGG CGA CGC CGG CGU
Aspartic acid   Asp   D    GAC GAU
Asparagine      Asn   N    AAC AAU
Cysteine        Cys   C    UGC UGU
Glutamic acid   Glu   E    GAA GAG
Glutamine       Gln   Q    CAA CAG
Glycine         Gly   G    GGA GGC GGG GGU
Histidine       His   H    CAC CAU
Isoleucine      Ile   I    AUA AUC AUU
Leucine         Leu   L    UUA UUG CUA CUC CUG CUU
Lysine          Lys   K    AAA AAG
Methionine      Met   M    AUG
Phenylalanine   Phe   F    UUC UUU
Proline         Pro   P    CCA CCC CCG CCU
Serine          Ser   S    AGC AGU UCA UCC UCG UCU
Threonine       Thr   T    ACA ACC ACG ACU
Tryptophan      Trp   W    UGG
Tyrosine        Tyr   Y    UAC UAU
Valine          Val   V    GUA GUC GUG GUU
STOP                       UAA UAG UGA
• AUG acts as both the initiation codon and the codon for methionine.
• The 64 codons specify only 20 different amino acids plus STOP signals.
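The codon table can be turned into a lookup for a toy translation routine. The sketch below uses the standard genetic code (glycine's codons being GGA, GGC, GGG, and GGU). Starting at the first AUG and reading until a STOP codon is a simplification, and the `*` symbol for STOP, the function name, and the example mRNA are assumptions for illustration.

```python
# Standard genetic code: one-letter amino acid code -> its codons.
CODONS_BY_AMINO_ACID = {
    "A": "GCA GCC GCG GCU", "R": "AGA AGG CGA CGC CGG CGU",
    "D": "GAC GAU", "N": "AAC AAU", "C": "UGC UGU",
    "E": "GAA GAG", "Q": "CAA CAG", "G": "GGA GGC GGG GGU",
    "H": "CAC CAU", "I": "AUA AUC AUU", "L": "UUA UUG CUA CUC CUG CUU",
    "K": "AAA AAG", "M": "AUG", "F": "UUC UUU",
    "P": "CCA CCC CCG CCU", "S": "AGC AGU UCA UCC UCG UCU",
    "T": "ACA ACC ACG ACU", "W": "UGG", "Y": "UAC UAU",
    "V": "GUA GUC GUG GUU",
    "*": "UAA UAG UGA",  # "*" marks STOP (a convention assumed here)
}

# Invert the table: codon -> amino acid.
GENETIC_CODE = {
    codon: amino_acid
    for amino_acid, codon_list in CODONS_BY_AMINO_ACID.items()
    for codon in codon_list.split()
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first STOP."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(GENETIC_CODE))                 # 64
print(translate("GGAUGGCCAAAUGGUAAGC"))  # MAKW
```

In the example, reading starts at the AUG in position 3, passes through GCC, AAA, and UGG, and stops at UAA.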
Mechanisms of Translation: Initiation
© Jones and Bartlett Publishers,
Essential Genetics: A Genomics
Perspective, 3rd Edition
Mechanisms of Translation: Elongation
© Jones and Bartlett Publishers,
Essential Genetics: A Genomics
Perspective, 3rd Edition
Mechanisms of Translation: Termination
© Jones and Bartlett Publishers,
Essential Genetics: A Genomics
Perspective, 3rd Edition
From Gene to Protein
© Garland Science, Molecular Biology of The Cell, 4th Edition
Genes and Genome
• The fragment of DNA that corresponds to one protein (by
means of transcription and translation) is known as a gene.
• DNA molecules are usually very large, containing thousands of
genes, and thus specify thousands of proteins.
• In all cells, the expression of individual genes is regulated:
instead of manufacturing a full repertoire of all possible proteins
at full tilt all the time, the cell adjusts the rate of transcription
and translation of different genes independently, according to
need.
• The entire genetic information encoded in an organism is called
the genome.
Genotypes and Phenotypes
• The genome of one organism differs from that of another, although many similarities may exist.
• The genetic constitution (i.e., the genome) of an organism is
called the genotype of that organism.
• The different cell types in a multi-cellular organism differ
dramatically in both structure and function.
• This is because different cell types synthesize and accumulate
different sets of RNA and protein molecules, without altering
their genotype.
• The observable character of a cell or an organism is called the phenotype of that cell or organism.
Systems’ View
• Biology is an informational science
– Systematically perturbing and monitoring biological
systems utilizing powerful new high-throughput
tools
– Creation of new computational methods for
modeling and analysis.
– The integration of discovery science (data mining)
and hypothesis-driven science (modeling &
simulation)
Molecular Circuitry of Cancer
Hahn et al., Nature Reviews Cancer 2 (2002)
Wnt5a Signaling Pathway
A. T. Weeraratna et al., Cancer Cell 1 (2002)
Genome Dynamics
[Figure: information flow DNA → (transcription) → RNA → (translation) → protein, annotated with perturbations, interactions, and measurements]
• Perturbation
– Ectopic expression (increased expression)
– RNA interference (decreased expression)
• DNA
– Reference DNA sequence
– Sequence variants
– Gene copy number
– CpG methylation
• Interactions
– Protein/DNA interactions
– Protein/RNA interactions
• Measurements
– RNA abundance
– RNA half-life
– Protein interactions
– Protein modification
– Protein half-life
Biological Data
• Genomic data
– Sequences
– SNPs
– Gene expression microarrays
– CGH arrays
• Proteomic data
– MALDI (spectral data)
– Protein arrays
• Clinical data
– Patients
– Drug treatment
• Physiological data
– Diet
– Exercise
Gene Expression Microarrays
• A gene expression microarray measures the transcriptional activity of tens of thousands of genes simultaneously, producing a snapshot of a cell's transcriptional state at a given time.
• While it reflects one of the central dynamic processes of a biological system, it does not give an accurate picture of other important dynamic aspects, such as current protein abundance or the activation or modification state of existing proteins.
• To compensate, other measurement technologies, such as protein abundance and interaction arrays, can be combined with expression data to obtain a comprehensive transcription, translation, and modification profile.
Single Nucleotide Polymorphisms (SNPs)
• Genome Projects: Multiple genomic sequences provide a
reference estimate of normality
• Single nucleotide polymorphisms (SNPs), small genetic changes
or variations that can occur within a person's DNA sequence,
serve as possible markers of aberration from this reference that
might indicate a disease cause or a susceptibility to disease
• Long runs of SNPs also serve to mark haplotypes, groups of
closely linked alleles that tend to be inherited together, which can
be useful for following specific chromosomal areas inherited by
affected individuals in familial genetic studies
• Several commercial platforms are currently available that survey genomes for SNPs at intervals approaching 20 kb and smaller.
Comparative Genomic Hybridization (CGH)
• Comparative genomic hybridization was first introduced by Kallioniemi et al. (Science, 1992). Its array-based form (aCGH) has proven to be a high-throughput, sensitive genomic screening tool that detects DNA gains and losses with a resolution of 1.0 to 1.5 Mb using BAC arrays.
• CGH data are read as the number of copies of a chromosomal region. Array CGH provides a list of genes and genomic elements that are overrepresented (gain) in the cell when an amplification event occurs, or underrepresented (loss) when a deletion occurs.
• Currently, chip-based technology with highly annotated DNA targets of 20-mer or 60-mer length permits whole-genome surveys of clinical specimens.
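One common way to turn such copy-number readings into gain and loss calls is to threshold the log2 ratio of test signal to reference signal. The Python sketch below illustrates this under stated assumptions: the cutoff of 0.3, the region names, and the signal values are all invented, and nothing here is taken from the talk.

```python
import math

# Hypothetical cutoff on the log2 ratio; real analyses tune this per platform.
THRESHOLD = 0.3


def call_region(test_signal: float, reference_signal: float,
                threshold: float = THRESHOLD) -> str:
    """Label a region as gain, loss, or normal from its log2 signal ratio."""
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "normal"


# Made-up signals per region: (test sample, reference sample).
regions = {
    "region_1": (2.1, 2.0),
    "region_2": (4.2, 2.0),
    "region_3": (0.9, 2.0),
}
for name, (test, reference) in regions.items():
    print(name, call_region(test, reference))
# region_1 normal
# region_2 gain
# region_3 loss
```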
Computational Systems Biology
[Figure: overview diagram with three linked components]
• Data Mining & Pattern Recognition (automated and systematic; algorithmic and computational)
– Inputs: biological data (DNA, mRNA/cDNA, CGH, SNP); clinical and pathological information (treatment history, age, gender, race, survival, and so on); biological context as prior knowledge (biological process, subtype of disease)
– Steps: measurements, clustering, association studies, integration, pathways discovery
– Outputs: candidate biological components (genes, proteins); derived biological context (biological process, subtype of disease)
– Goals: better diagnostic markers, better drug development, more efficient drug treatment
• Knowledge Representation & Mining
– Sources: literature (PubMed) via text-mining; clinical charts and reports; chemistry (cooperative binding); databasing (gene ontology, chemical database, genomic database, proteomic database)
– Output: computable knowledge (gene-to-gene relationships)
• Network Modeling & Systems Biology
– Mathematical and computational biological process models (discrete vs. continuous, deterministic vs. stochastic)
– Perturbation and biological operations on the biological process give a phenotype observation; in-silico biological operations on the in-silico biological process give a prediction (hypothetical observation); the two feed model refinement
– Goals: better treatment strategy, new drug targets
Data Mining & Pattern Recognition
• Unsupervised analysis: exploratory
– Subtype recognition
– Clustering analysis
– Multi-dimensional scaling (MDS) plot
– Contextual pattern recognition
• Supervised analysis: discriminatory
– Classification of diseases
– Rank genes according to their impact on minimizing cluster volume and maximizing center-to-center inter-cluster distance
– t-test, SAM, TNoM, SVM, Gene@Work, Strong-feature
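Of the supervised methods listed, the t-test is the simplest to sketch: genes whose expression separates two classes well get a large absolute t-statistic. The Python example below ranks three genes using Welch's variant of the statistic. The choice of Welch's variant, the gene names, and the expression values are assumptions for illustration, not the speaker's procedure.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Made-up expression values: gene -> (samples in class 1, samples in class 2).
expression = {
    "gene_x": ([5.1, 4.9, 5.3, 5.0], [7.9, 8.2, 8.0, 8.1]),
    "gene_y": ([6.0, 6.4, 5.8, 6.1], [6.2, 5.9, 6.3, 6.0]),
    "gene_z": ([9.0, 8.7, 9.2, 8.9], [7.5, 7.8, 7.4, 7.7]),
}

# Rank genes by how strongly they discriminate between the two classes.
ranked = sorted(expression,
                key=lambda gene: abs(welch_t(*expression[gene])),
                reverse=True)
print(ranked)  # ['gene_x', 'gene_z', 'gene_y']
```

A large statistic reflects tight groups (small within-class variance) that sit far apart (large difference of means), which is in the spirit of the cluster-volume and inter-cluster-distance criterion on the slide.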
Clustering & MDS: melanoma
RNA interference
RNAi triggered by synthetic siRNA: a powerful new tool for gene knockdowns in mammalian cells
D. Azorsa
RNAi Synthetic Lethal Phenotype Profiling of >10,000 siRNA
Context: BxPC3 pancreatic cancer isogenic cell lines, DPC4 negative vs. DPC4 positive
[Figure: survival scatter plot, axes running from low to high]
Highlighted circles: gene targeting events that preferentially affect the survival of the BxPC3 DPC4/SMAD4 minus cell line
Network Modeling and Systems Biology
• Boolean networks
– S. A. Kauffman, 1969
– On/off representation of the state of genes
– Boolean networks qualitatively capture typical genetic behavior
• Probabilistic Boolean networks
– Shmulevich et al., 2002
– Stochastic extension of Boolean networks
• Others
– Differential equations, linear models, Bayesian networks, …
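The on/off view of gene states can be made concrete with a tiny deterministic Boolean network. In the Python sketch below, each gene's next state is a Boolean function of the current states, all genes update synchronously, and the simulation runs until a state repeats, exposing the attractor. The three genes and their rules are invented for illustration, and the stochastic extension is not attempted here.

```python
from itertools import product

# Hypothetical three-gene network; the update rules are invented.
RULES = {
    "a": lambda s: s["c"],                 # a is switched on by c
    "b": lambda s: s["a"] and not s["c"],  # b needs a and is repressed by c
    "c": lambda s: not s["b"],             # c is repressed by b
}
GENES = sorted(RULES)


def step(state):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}


def attractor(state):
    """Iterate until a state repeats and return the cycle that was reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


def as_bits(state):
    """Render a state as a string of 0s and 1s in gene order a, b, c."""
    return "".join(str(int(state[gene])) for gene in GENES)


for values in product([False, True], repeat=len(GENES)):
    start = dict(zip(GENES, values))
    cycle = [as_bits(s) for s in attractor(start)]
    print(as_bits(start), "->", cycle)
```

With these rules, every starting state ends either in the fixed point 101 or in the two-state cycle between 100 and 011, a small-scale picture of the kind of qualitative behavior a Boolean network can capture.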
Knowledge Repository: GO, GenMAPP, KEGG
PubMed
[Figure labels: WNT5a, S100P, RET1, pirin]
Knowledge Integration
• Biological database
– Genomic sequence
– Protein
– Biochemical database
• BioLog
– PubMed literature access logger, archiver, and analyzer
• Text- and context-mining
• Knowledgebase
– Pathways
– Ontology
– Protein-protein interaction
– Gene-gene interaction
• Knowledge mining
– Literature
– Clinical records
Knowledge Mining: Extracting Biological Information from Global RNAi Phenotype Data
[Figure: workflow with the steps below]
• Statistically processed gene list
• Acquire current gene identifiers and information
• Canonical pathway analysis
• Network analysis
• Gene ontology analysis
PathwayAssist™
Knowledge Mining: Building Regulatory Networks from Global RNAi Phenotypes
Doxorubicin Drug Resistance Pathway
Figure 2. Doxorubicin and Drug Resistance Molecular Interaction Network.
To be continued …