Integrated Computational Approach for Translational Biomedical Research
Seungchan Kim, Ph.D.
CSE, Arizona State University and MDTV/GenSIP, Translational Genomics Research Institute
AI @ ASU Lunch Bunch, Oct. 25, 2005, BY 510

Biomedical Problems
• Can we recognize disease subtypes?
• Can we identify molecular markers for certain types of disease?
• Can we learn the regulatory mechanisms governing cellular phenotype, e.g., disease?
• Can we find new therapeutic targets for the treatment of disease?
• Etc.
AI@ASU, BY510, Oct. 25, 2005

Cells: Basic Features
• All living things are made of cells.
• All cells share the same machinery for their most basic functions.
• All cells store their hereditary information in the same linear chemical code, held in a double-stranded molecule, deoxyribonucleic acid (DNA).
• All cells replicate their hereditary information by templated polymerization.

Cells: Basic Features
• All cells transcribe portions of their hereditary information into single-stranded molecules known as ribonucleic acids (RNA).
• All cells translate RNA into protein (long polymer chains) in the same way.
• All cells use proteins to catalyze most chemical reactions.
• All cells function as biochemical factories dealing with the same basic molecular building blocks.

Prokaryotic vs. Eukaryotic
• Living organisms can be classified on the basis of cell structure into two groups:
– Eukaryotes (plants, fungi, and animals)
– Prokaryotes (bacteria)
• Eukaryotes keep their DNA in a distinct membrane-bounded intracellular compartment called the nucleus.
• Prokaryotes have no distinct nuclear compartment to house their DNA.

A Typical Prokaryotic Cell
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

A Typical Eukaryotic Cell
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

A “Simplified” Cell
[Figure labels: membrane, nucleus, chromatin, nuclear envelope, ribosomes]
• The membrane is the lipid bilayer, with its associated proteins, that encloses every cell.
• The nucleus is a prominent membrane-bounded organelle in a eukaryotic cell, containing DNA organized into chromosomes.
• The nuclear envelope is a double membrane surrounding the nucleus. It consists of an outer and an inner membrane and is perforated by nuclear pores.
• Chromatin is the complex of DNA and various proteins found in the nucleus of a eukaryotic cell; it is the material that chromosomes are made of.
• The cytoplasm is the contents of the cell enclosed by its plasma membrane but, in the case of eukaryotic cells, outside the nucleus.
• Ribosomes are particles composed of ribosomal RNAs and ribosomal proteins that associate with messenger RNAs and catalyze the synthesis of protein.

DNA and its Building Blocks
• DNA is made from simple subunits, called nucleotides, each consisting of a sugar-phosphate molecule with a nitrogen-containing side group, or base, attached to it.
• The bases are of four types:
– Adenine (A)
– Guanine (G)
– Cytosine (C)
– Thymine (T)
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

DNA and its Building Blocks
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]
• A single strand of DNA consists of nucleotides joined together by sugar-phosphate linkages.
• The individual sugar-phosphate units are asymmetric, giving the backbone of the strand a definite directionality, or polarity.
• This directionality guides the molecular processes by which the information in DNA is interpreted and copied in cells.

DNA and its Building Blocks
• Through templated polymerization, the sequence of nucleotides in an existing DNA strand controls the sequence in which nucleotides are joined together in a new DNA strand.
• Pairing rules: A pairs with T, and C pairs with G.
• The new strand has a nucleotide sequence that is complementary to that of the old strand, and a backbone with opposite directionality.
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

DNA and its Building Blocks
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]
• A normal DNA molecule consists of two complementary strands.
• The nucleotides within each strand are linked by strong (covalent) chemical bonds.
• The complementary nucleotides on opposing strands are held together more weakly, by hydrogen bonds.

DNA and its Building Blocks
• The two strands twist around each other to form a double helix.
• This robust structure can accommodate any sequence of nucleotides without altering its basic shape.
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

DNA Replication
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]
• During DNA replication, the two strands of the DNA double helix are pulled apart.
• Each strand serves as a template for the synthesis of a new complementary strand by means of templated polymerization.

DNA Transcription
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]
• Each cell contains a fixed set of DNA molecules.
• A given segment of DNA serves to guide the synthesis of many identical RNA transcripts.
• These transcripts serve as working copies of the information stored in the DNA archive.
• Many different sets of RNA molecules can be made by transcribing selected parts of a long DNA sequence, allowing each cell to use its stored information differently.

DNA Transcription
• All RNA in a cell is made by the process of DNA transcription.
• DNA transcription resembles DNA replication: one DNA strand serves as a template and nucleotides are added according to the same base-pairing rules.
• It produces a single-stranded RNA molecule that is complementary to one strand of DNA (the template strand).
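The pairing and directionality rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: it assumes each strand is a plain string written 5' to 3' containing only A, C, G and T, and the function names are hypothetical. Because the new strand runs in the opposite direction, it is built by reading the template backwards; transcription follows the same pairing rules, except that RNA uses uracil (U) in place of thymine (T).

```python
# Base-pairing rules from the slides: A pairs with T, C pairs with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(template):
    """Return the strand synthesized on `template`, written 5' to 3'.

    Each base is replaced by its partner, and the result is reversed because
    the new strand's backbone runs in the opposite direction to the template.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(template.upper()))


def transcribe(template):
    """Return the RNA transcript (5' to 3') of a DNA template strand.

    Same pairing as replication, except that RNA carries U where DNA carries T.
    """
    return complement_strand(template).replace("T", "U")


if __name__ == "__main__":
    template = "TACGGATCC"  # hypothetical template strand, 5' to 3'
    print(complement_strand(template))  # GGATCCGTA
    print(transcribe(template))         # GGAUCCGUA
```

Applying `complement_strand` twice returns the original sequence, mirroring the fact that each strand of the double helix fully specifies the other.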
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

Translation
• During translation, the RNA molecules produced by transcription are used to guide the synthesis of protein molecules.
• Proteins are long polymer chains formed by stringing together monomeric building blocks (amino acids) drawn from a standard repertoire that is the same for all living cells.
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

Translation
• There are only four different nucleotides in mRNA but twenty different types of amino acids in proteins.
• Therefore, translation cannot be accounted for by a direct one-to-one correspondence between a nucleotide in RNA and an amino acid in protein.
• The nucleotide sequence in mRNA is read in sets of three nucleotides, called codons.
– Pairs of nucleotides would give only 4² = 16 combinations, too few for 20 amino acids; triplets give 4³ = 64, so most amino acids are encoded by more than one codon.
• Each codon corresponds to one amino acid or to a STOP signal.
• This mapping is determined by rules known as the genetic code.

Genetic Code
Name (3-letter, 1-letter): codons
Alanine (Ala, A): GCA GCC GCG GCU
Arginine (Arg, R): AGA AGG CGA CGC CGG CGU
Asparagine (Asn, N): AAC AAU
Aspartic acid (Asp, D): GAC GAU
Cysteine (Cys, C): UGC UGU
Glutamic acid (Glu, E): GAA GAG
Glutamine (Gln, Q): CAA CAG
Glycine (Gly, G): GGA GGC GGG GGU
Histidine (His, H): CAC CAU
Isoleucine (Ile, I): AUA AUC AUU
Leucine (Leu, L): UUA UUG CUA CUC CUG CUU
Lysine (Lys, K): AAA AAG
Methionine (Met, M): AUG
Phenylalanine (Phe, F): UUC UUU
Proline (Pro, P): CCA CCC CCG CCU
Serine (Ser, S): AGC AGU UCA UCC UCG UCU
Threonine (Thr, T): ACA ACC ACG ACU
Tryptophan (Trp, W): UGG
Tyrosine (Tyr, Y): UAC UAU
Valine (Val, V): GUA GUC GUG GUU
STOP: UAA UAG UGA
• AUG acts as both the initiation codon and the codon for methionine.
• Only 20 different amino acids, plus STOP, are encoded.

Mechanisms of Translation: Initiation
[Figure: © Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition]

Mechanisms of Translation: Elongation
[Figure: © Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition]

Mechanisms of Translation: Termination
[Figure: © Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition]

From Gene to Protein
[Figure: © Garland Science, Molecular Biology of The Cell, 4th Edition]

Genes and Genome
• The fragment of DNA that corresponds to one protein (by means of transcription and translation) is known as a gene.
• DNA molecules are usually very large, containing thousands of genes, and thus specify thousands of proteins.
• In all cells, the expression of individual genes is regulated: instead of manufacturing a full repertoire of all possible proteins at full tilt all the time, the cell adjusts the rate of transcription and translation of different genes independently, according to need.
• The entire genetic information encoded in an organism is called its genome.

Genotypes and Phenotypes
• The genome of one organism differs from that of another, although many similarities may exist.
• The genetic constitution (i.e., the genome) of an organism is called its genotype.
• The different cell types in a multicellular organism differ dramatically in both structure and function.
• This is because different cell types synthesize and accumulate different sets of RNA and protein molecules, without altering their genotype.
• The observable character of a cell or an organism is called its phenotype.

Systems’ View
• Biology is an informational science
– Systematically perturbing and monitoring biological systems using powerful new high-throughput tools
– Creation of new computational methods for modeling and analysis
– The integration of discovery science (data mining) and hypothesis-driven science (modeling and simulation)

Molecular Circuitry of Cancer
[Figure: Hahn et al., Nature Reviews Cancer 2 (2002)]

Wnt5a Signaling Pathway
[Figure: A. T. Weeraratna et al., Cancer Cell 1 (2002)]

Genome Dynamics
[Diagram: perturbation by ectopic expression (increased expression) or RNA interference (decreased expression); DNA level: reference DNA sequence, sequence variants, gene copy number, CpG methylation; transcription to RNA: RNA abundance, RNA half-life; translation to protein: protein measurements, protein interactions, protein modification, protein half-life; protein/DNA and protein/RNA interactions]

Biological Data
• Genomic data
– Sequences
– SNPs
– Gene expression microarrays
– CGH arrays
• Proteomic data
– MALDI (spectral data)
– Protein arrays
• Clinical data
– Patients
– Drug treatment
• Physiological data
– Diet
– Exercise

Gene Expression Microarrays
• A gene expression microarray measures the transcriptional activity of tens of thousands of genes simultaneously, producing a snapshot of a cell’s transcriptional state at the time of sampling.
• While this reflects one of the central dynamic processes of a biological system, it does not give an accurate picture of other important dynamic aspects, such as current levels of protein abundance or the activation or modification state of existing proteins.
• To compensate, other measurement technologies, e.g., protein abundance and interaction arrays, can be combined with expression data to obtain a comprehensive transcription, translation, and modification profile.

Single Nucleotide Polymorphisms (SNPs)
• Genome projects: multiple genomic sequences provide a reference estimate of normality.
• Single nucleotide polymorphisms (SNPs), small genetic variations that can occur within a person's DNA sequence, serve as possible markers of deviation from this reference that might indicate a cause of, or susceptibility to, disease.
• Long runs of SNPs also mark haplotypes, groups of closely linked alleles that tend to be inherited together, which are useful for following specific chromosomal regions inherited by affected individuals in familial genetic studies.
• Several commercial platforms are currently available that survey genomes for SNPs at intervals approaching 20 kb or less.

Comparative Genomic Hybridization (CGH)
• CGH was first introduced by Kallioniemi et al. (Science, 1992); its array-based form (aCGH) has proven to be a high-throughput and sensitive genomic screening tool that detects DNA gains and losses at a resolution of 1.0 to 1.5 Mb using BAC arrays.
• CGH data are read as the number of copies of a chromosomal region: array CGH yields a list of genes and genomic elements that are overrepresented (gain) in the cell when an amplification event occurs, or underrepresented (loss) when deletions occur.
• Currently, chip-based technology with highly annotated DNA targets of 20-mer or 60-mer oligonucleotide length permits whole-genome surveys in clinical specimens.

Computational Systems Biology
[Diagram: an integrated framework in which biological data (DNA, mRNA/cDNA, CGH, SNP) and clinical and pathological information (treatment history, age, gender, race, survival, and so on) feed three connected components: data mining and pattern recognition (automated, systematic, algorithmic; clustering, association studies, pathway discovery, candidate genes and proteins); network modeling and systems biology (discrete vs. continuous and deterministic vs. stochastic models of biological processes, in-silico perturbation, prediction, model refinement); and knowledge representation and mining (gene ontology, chemical, genomic and proteomic databases, clinical charts and reports, text mining of PubMed literature). Goals: better diagnostic markers, better drug development, more efficient drug treatment, better treatment strategies, and new drug targets.]

Data Mining & Pattern Recognition
[Diagram: data mining and pattern recognition component of the computational systems biology framework]
• Unsupervised analysis: exploratory
– Subtype recognition
– Clustering analysis
– Multidimensional scaling (MDS) plot
– Contextual pattern recognition
• Supervised analysis: discriminatory
– Classification of diseases
– Ranking genes according to their impact on minimizing cluster volume and maximizing center-to-center intercluster distance
– t-test, SAM, TNoM, SVM, Gene@Work, Strong-feature

Clustering & MDS: melanoma
[Figure]

RNA Interference
RNAi triggered by synthetic siRNA: a powerful new tool for gene knockdowns in mammalian cells (D. Azorsa)

RNAi Synthetic Lethal Phenotype Profiling of >10,000 siRNAs
• Context: BxPC3 pancreatic cancer isogenic cell lines, DPC4 negative vs. DPC4 positive
[Figure: survival scatter plot, low to high]
• Highlighted circles: gene-targeting events that preferentially affect the survival of the BxPC3 DPC4/SMAD4-minus cell line

Network Modeling and Systems Biology
[Diagram: network modeling and systems biology component of the computational systems biology framework]
• Boolean networks
– S. A. Kauffman, 1969
– On/off representation of the state of genes
– Boolean networks qualitatively capture typical genetic behavior
• Probabilistic Boolean networks
– Shmulevich et al., 2002
– Stochastic extension of Boolean networks
• Others
– Differential equations, linear models, Bayesian networks …

Knowledge Repository: GO, GenMAPP, KEGG
[Figure labels: PubMed, WNT5a, S100P, RET1, pirin]

Knowledge Integration
[Diagram: knowledge representation and mining component of the computational systems biology framework]
• Biological databases
– Genomic sequence
– Protein
– Biochemical databases
• BioLog
– PubMed literature access logger, archiver, and analyzer
• Text and context mining
• Knowledgebase
– Pathways
– Ontology
– Protein-protein interaction
– Gene-gene interaction
• Knowledge mining
– Literature
– Clinical records

Knowledge Mining: Extracting Biological Information from Global RNAi Phenotype Data
[Workflow figure: statistically processed gene list; acquire current gene identifiers and information; canonical pathway analysis; network analysis; gene ontology analysis; PathwayAssist™]

Knowledge Mining: Building Regulatory Networks from Global RNAi Phenotypes
Doxorubicin Drug Resistance Pathway
Figure 2. Doxorubicin and Drug Resistance Molecular Interaction Network.

To be continued …
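The Boolean network model listed under Network Modeling and Systems Biology (Kauffman, 1969) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the three genes A, B and C and their regulatory rules are hypothetical, not taken from the talk. Each gene is either on (1) or off (0), all genes are updated synchronously from the same current state, and because there are only 2³ = 8 possible states, every trajectory must eventually revisit a state and settle into a repeating cycle (an attractor).

```python
from itertools import product


def next_state(state):
    """Synchronously update all three genes from the same current state.

    `state` is a tuple (a, b, c) of 0/1 values for hypothetical genes A, B, C.
    """
    a, b, c = state
    return (
        int(not c),    # A is on unless C is on (C represses A)
        a,             # B follows A (A activates B)
        int(a and b),  # C is on only when both A and B are on
    )


def find_attractor(state):
    """Follow the trajectory from `state` until a state repeats; return the cycle."""
    visited = []
    while state not in visited:  # at most 2**3 = 8 distinct states, so this terminates
        visited.append(state)
        state = next_state(state)
    # The first repeated state marks the start of the attractor cycle.
    return visited[visited.index(state):]


if __name__ == "__main__":
    # Collect the distinct attractors reached from all 8 possible starting states.
    attractors = {frozenset(find_attractor(s)) for s in product((0, 1), repeat=3)}
    for cycle in attractors:
        print(sorted(cycle))
```

In this toy network every starting state ends up in the same five-state cycle. A probabilistic Boolean network (Shmulevich et al., 2002) would instead attach several candidate rules to each gene and choose among them at random at each step.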