Bioinformatics: A New Frontier for Computer Scientists
Ruth G. Alscher
Lenwood S. Heath

The Language of the New Biology
A new language has been created. Words in the language that are useful for today’s talk:
• Genomics
• Functional Genomics
• Proteomics
• cDNA microarrays
• Global Gene Expression Patterns

Human Genome Project
• How many individuals?
• Which races?
• Statistics about sequencing
• Etc. (Ruth)

New Computational Tools Needed for Biology
• Sequencing
• Analyzing experimental data
• Representing vast quantities of information
• Searching
• Pattern matching
• Data mining
• Gene discovery
• Function discovery

Molecular Biology
• Cell function
• Nucleic acids, DNA, RNA, chromosomes, genes
• Amino acids, proteins

DNA Strand
A = adenine complements T = thymine
C = cytosine complements G = guanine

Complementary DNA Strands

Double-Stranded DNA

RNA Strand
U = uracil replaces T = thymine

Proteins
• Unlike DNA, which is read as a linear sequence, a protein depends on its three-dimensional structure.
• A protein folds into a three-dimensional shape that minimizes energy.

Amino Acids
• A protein is a large molecule that is a chain of amino acids (100 to 5000).
• There are 20 common amino acids (Alanine, Cysteine, …, Tyrosine).
• Three bases (a codon) suffice to encode an amino acid.
• There are also START and STOP codons.

Chromosomes
• Long molecules of DNA: 10^4 to 10^8 base pairs
• 23 matched pairs in humans
• A gene is a subsequence of a chromosome that encodes a protein.
• Proteins associated with regulation.
• Only a fraction of the genes are in use at any time.
• Every gene is present in every cell.
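For computer scientists, the base-pairing and codon rules on the slides above map naturally onto string operations. The following is a minimal sketch, not part of the original talk: the function names are hypothetical, the codon table is a small subset of the standard genetic code chosen to cover the amino acids named on the Amino Acids slide, and `transcribe` assumes its input is the coding strand.

```python
from typing import List

# Base pairing from the "DNA Strand" slide: A <-> T, C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Assumption: a small illustrative subset of the standard genetic code,
# not the full 64-codon table.
CODON_TABLE = {
    "AUG": "Met",                      # also serves as the START codon
    "GCU": "Ala", "GCC": "Ala",
    "UGU": "Cys", "UGC": "Cys",
    "UAU": "Tyr", "UAC": "Tyr",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def complement(dna: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in dna)


def transcribe(dna: str) -> str:
    """Write a DNA coding strand as RNA: U replaces T."""
    return dna.replace("T", "U")


def translate(rna: str) -> List[str]:
    """Read codons from the first START codon until a STOP codon."""
    start = rna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino = CODON_TABLE.get(rna[i:i + 3], "???")  # "???": codon not in the subset
        if amino == "STOP":
            break
        protein.append(amino)
    return protein


dna = "ATGGCTTGTTACTAA"
print(complement(dna))             # TACCGAACAATGATT
print(translate(transcribe(dna)))  # ['Met', 'Ala', 'Cys', 'Tyr']
```

A real pipeline would use the full codon table and handle both strands and all reading frames; the sketch only shows that three bases at a time are enough to look up an amino acid.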
Cell’s Fetch-Execute Cycle
• Stored Program: DNA, chromosomes, genes
• Fetch/Decode: RNA, ribosomes
• Execute Functions: Proteins (oxygen transport, cell structures, enzymes)
• Inputs: Nutrients, environmental signals, external proteins
• Outputs: Waste, response proteins, enzymes

Evolution
• Genotype: Genetic makeup of individuals or species
• Mutations are the basis for evolution of species
• Phenotype: Perceived traits of an organism (eye color, number of limbs, etc.); controlled by the interaction of many genes

Genetics
• An individual organism has some set of genes, stored in the DNA of each cell.
• The gene set determines biological functions and individual characteristics.
• The genetic makeup of a particular species defines that species.

Protein-Coding Genes

Genomics
Genomics: Discovery of genetic sequences and the ordering of those sequences into individual genes, into gene families, and into chromosomes. Identification of sequences that code for gene products/proteins and sequences that act as regulatory elements.

Functional Genomics
Functional Genomics: The biological role of individual genes, mechanisms underlying the regulation of their expression, and regulatory interactions among them.

Biologists Need Computer Scientists
• Assembling DNA fragments
• Physical mapping
• Identifying genes and gene families
• Protein folding
• Determining protein function
• Data analysis (microarrays)
• Data visualization
• Searching
• Sequence alignment
• Data mining

Microarray Data Analysis
How to use microarrays to learn more about the influence of drought stress on gene expression? Where the biologists need the computer scientists:
A. Confounding factors in the raw data
   1. Limitations in accuracy (technique)
   2. Biological variation (individuals)
B. How to apply corrections for these confounding factors to maximize the predictive power of the data.
C. Modeling regulatory networks.

Effects of drought stress on loblolly pine: a pilot experiment.
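One way to picture items A and B of the Microarray Data Analysis slide is the sketch below. It is an illustration under stated assumptions, not the method used in the pilot experiment: it applies global median centering of log2 ratios, a commonly used correction for slide-wide technical bias, and summarizes replicate slides per spot to expose biological variation. All function names and the intensity values are hypothetical.

```python
import math
from statistics import mean, median, stdev
from typing import List, Sequence, Tuple


def log_ratios(treatment: Sequence[float], control: Sequence[float]) -> List[float]:
    """Per-spot log2(treatment / control); 0 means no change in expression."""
    return [math.log2(t / c) for t, c in zip(treatment, control)]


def median_center(ratios: Sequence[float]) -> List[float]:
    """Subtract the slide-wide median so a typical spot sits at 0.

    Assumption: most genes do not respond to the treatment, so a shift of the
    median reflects technique (item A.1) rather than biology.
    """
    m = median(ratios)
    return [r - m for r in ratios]


def summarize_replicates(replicates: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-spot (mean, standard deviation) across replicate slides or trees.

    A large standard deviation flags spots dominated by biological
    variation between individuals (item A.2).
    """
    return [(mean(values), stdev(values)) for values in zip(*replicates)]


# Hypothetical intensities for three spots on one slide.
treatment = [200.0, 400.0, 800.0]
control = [100.0, 100.0, 100.0]
print(median_center(log_ratios(treatment, control)))  # [-1.0, 0.0, 1.0]
```

Median centering is only one of several possible corrections; intensity-dependent methods are also in common use, and the choice depends on the experimental design.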
1999: Effects of Drought Stress
Virginia Tech:
• Plant Biologists: Ruth Alscher, Boris Chevone
• CS: Lenny Heath and colleagues
• Statistics: Ina Hoeschele, Shun-Hwa Li
NC State (Forest Biotechnology): Ying-Hsuan Sun, Ron Sederoff, Ross Whetten

[Figure: microarray experiment. Treatment and control samples are mixed and hybridized to spots 1, 2, and 3 (sequences affixed to slide); detection gives the relative abundance at each spot.]

Biological Variation as Reflected in a Comparison of Expression in Two Trees of the Same Clone
[Figure: a subquadrant, showing biological variation]

Iterative strategy for detection of genetic interactions using microarrays
[Figure: cycle around Genetic Regulatory Networks with four numbered steps]
1. Detection of gene expression effects on microarrays
2. Characterize gene function
3. Identify mutants
4. Test mutant phenotypes

Glycolysis, Citric Acid Cycle, and Related Metabolic Processes

Gene Expression: Control Points

Responses to Environmental Signals

Intracellular Decision Making

Drosophila Genome

Expressed Sequence Tags
A publicly accessible collection of cDNAs representing mRNAs present in specific tissues. The cDNAs have been partially sequenced and identified, where possible, as homologs to publicly accessible genes of known function.

Microarray Quotes
• “A fresh, comprehensive and open-minded look at every problem in biology” Brown and Botstein, page 33. WOW!
• “… the construction of a Biological Periodic Table…” Lander, page 3.
• “… as model-independent as possible…” Brown and Botstein, page 33.
From The Chipping Forecast
ROS arise throughout the cell
[Figure: cell diagram. Stresses: wounding, chilling, pathogens, ozone; drought, salinity (ROS subcellular sites unclear); paraquat, high light + chilling, sulfur dioxide. Compartments: cell wall, mitochondrion, cytosol, nucleus, chloroplast. Other labels: antioxidant genes, gene expression, post-transcriptional effects.]

Free Radicals

Bioinformatics Institute
• Research institute based at Virginia Tech
• Begins July 1 with $3 million
• Will occupy 2 buildings and have 100+ employees in 4 years

Getting Into Bioinformatics
• Get a minor in biology
• Get involved with bioinformatics research
  – Dr. Alscher
  – Dr. Heath
  – Dr. Keller
  – Dr. Ramakrishnan
  – Dr. Watson