Bioinformatics Needs for the post-genomic era
Dr. Erik Bongcam-Rudloff
The Linnaeus Centre for Bioinformatics

From Egg to Adult in 3×10^9 Bases
• A single cell, the fertilized egg, eventually differentiates into the ~300 different types of cells that make up an adult body.
• With a few exceptions, all of these cells contain the complete human genome but express only a subset of the genes.
• Gene expression patterns are determined largely by cell type, and vice versa.

The “body” has:
• The genome
• A comprehensive list of genes
• Gene expression data
• Protein localization in the cells
• Information about protein/protein and protein/DNA interactions
• Ways to store, display and query masses of data so activity can focus on relevant bits

Primary Flows of Information and Substance in the Cell
[Diagram: DNA, mRNA, transcription factors, splicing factors, receptors, enzymes, structural proteins, signaling molecules, structural sugars, structural lipids, and the environment & other cells, linked by "creation" and "regulation" flows]

Why a Grid?
• Growth of molecular biological problems is getting out of sync with Moore's Law
• Growing interest in bioinformatics from other disciplines
• New experimental approaches (genomics, proteomics, etc.) require new and more demanding solutions

Comparative Genomics
• Comparative genomics: comparison of whole genomes (e.g. human and mouse) and new techniques for phylogenetic footprinting.

RNomics
• RNomics: tertiary structure prediction and novel RNA gene location in whole genomes
• We are conducting genome-wide scans for RNA regulatory elements and RNA genes using state-of-the-art comparative genomics tools.
• The analysis involves comparison of the human and mouse genomes using tools such as stochastic context-free grammars

Molecular Interactions
• Large-scale in silico maps of the molecular interactions over entire proteomes and genomes. These maps provide quantitative functional models that bridge the biological with the chemical.
• We are developing models of gene participation in biological processes. Such models are developed from microarray-based gene expression data and background knowledge, e.g. as provided by the so-called Gene Ontology. The GRID Test Bed will be an excellent computational environment for finding molecular classifiers associated with major diseases such as cancer, atherosclerosis and other diseases that kill many people in Europe.

What is needed?
• Standard, stable interfaces to conceptual problem solvers / data / objects
• A distributed way to store and analyse information
• Security for user data
• Avoiding duplication of implementation and computation

Protein structure prediction: an example
• There are over 1.3 million sequences in the non-redundant protein database managed by the NCBI and over 19 thousand structures in the Protein Data Bank (PDB)
• Using this data we have built a library of common protein substructures linking structure and sequence on a local level
• Our library consists of over 4000 unique substructures, each associated with between seven and two thousand example sequence fragments
• In order to extract properties that recognize proteins containing particular substructures, we iteratively test different (combinations of) properties on proteins containing and proteins not containing the substructure of interest.
• Calculating properties for all groups takes one week on ten Athlon XP 1700+ (1.46 GHz, 1 GB RAM) processors
• In a more realistic search space, without the drastic search space reductions, we estimate that we need approximately 700 processor-days with 2 GB RAM. Depending on the available resources, we would like to run several such trials in order to test different parameter settings. Thus our upper estimates may be multiplied by a factor of 5-10.
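
The resource figures in the protein structure example can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: the input numbers come from the slide, but the assumption of perfect parallel scaling and the 100-processor grid size are hypothetical and not from the talk.

```python
# Figures quoted on the slide.
PILOT_PROCESSORS = 10              # ten Athlon XP 1700+ machines
PILOT_DAYS = 7                     # "one week"
FULL_SEARCH_PROCESSOR_DAYS = 700   # estimate for the realistic search space
TRIAL_FACTORS = (5, 10)            # "multiplied by a factor of 5-10"

# Hypothetical grid size, chosen only for illustration (not from the talk).
ASSUMED_GRID_PROCESSORS = 100


def processor_days(processors: int, days: float) -> float:
    """Total work in processor-days for a run on `processors` machines."""
    return processors * days


def wall_clock_days(total_processor_days: float, processors: int) -> float:
    """Elapsed days, assuming the work splits perfectly across processors."""
    return total_processor_days / processors


pilot_work = processor_days(PILOT_PROCESSORS, PILOT_DAYS)
low_work, high_work = (FULL_SEARCH_PROCESSOR_DAYS * f for f in TRIAL_FACTORS)

print("Reduced search (pilot):", pilot_work, "processor-days")
print("Full search, one trial:", FULL_SEARCH_PROCESSOR_DAYS, "processor-days")
print("Full search, 5-10 trials:", low_work, "to", high_work, "processor-days")
print(
    "Elapsed time on the assumed grid:",
    wall_clock_days(low_work, ASSUMED_GRID_PROCESSORS),
    "to",
    wall_clock_days(high_work, ASSUMED_GRID_PROCESSORS),
    "days",
)
```

Under these assumptions the reduced search costs 70 processor-days, a single realistic run is ten times that, and the full set of trials comes to 3500-7000 processor-days. On the original ten machines that would be roughly one to two years of elapsed time, which is the practical argument for a grid.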