Drug Target Discovery by Genome Analysis
AREXIS

[Figure: plot over time (1990, 1995, 2000; y-axis ×10^6) with labels "Model genome", "Drug", "97% total", "67% finished" and "gap". Caption: Link genes to biological functions]

Species          # of genes       % known function
E. coli          4,289            62
S. cerevisiae    6,217            65
C. elegans       19,000           ?
M. musculus      30,000-50,000    ≈10
H. sapiens       30,000-50,000    ≈15

Bioinformatics?
• Bioinformatics: the research field that handles and analyses "bioinformation"
• Bioinformation: the information stored in:
– genome data (genes, gene expression, gene function, etc., in relation to the organism that harbours the genome in question)
– biological sequences
– relationships between biological sequences, with respect to the function of biological organisms (metabolism, health, etc.)
• Bioinformatics should generate ideas and proposals for new wet-lab experiments
• Researchers with bioinformatics as an experimental tool (in silico biology)

Animal models
Why animal models?
• Genetically homogeneous
• Controlled environmental influence
• Large family sizes give optimal statistical power
• Tools to define and characterise disease-causative genes and mechanisms
• In vivo validation and in vivo pharmacology
• Increase productivity
• Higher resolution

Research and development strategy
[Diagram: Disease models; Genetic analysis; Target discovery; Target validation; Drug discovery; Clinical development; Marketing of new products. Participants: Academic partners; Arexis; Industrial partners]

Arexis: Integrated biology-driven discovery
• Comparative biology
• Human patient materials
• Medicinal chemistry
• Bioinformatics
• Biotechnology expertise
• Clinical science
• Functional genomics
• In vivo pharmacology

R&D project overview
[Table flattened in the source; entries in source order]
Metabolic diseases: Type 2 diabetes X; Obesity X; AMPK
Inflammatory diseases: Rheumatoid arthritis X; Multiple sclerosis X; Skin inflammation; Immunotherapy
Prioritised projects: SCCE X; Muc. A

Business model
Input to the Arexis pipeline and portfolio
• Research collaborations
• Sub-contracts
• Partnerships
• Targeted in-licensing
Commercialisation process
• Target and drug discovery
• Drug development / commercialisation
• Spin-off opportunities
Revenue sources
• Early: access fees, research funding
• Mid: milestone payments
• Late: royalties

Organisation build-up plan
                                2001  2002  2003  2004  2005  2006
Management & Administration
  Management                             3     4     5     5     5
  Administration                         2     3     4     5     5
  Accumulated                      3     5     7     9    10    10
R&D
  Bioinformatics                         3     5     8    10    11
  Biology                               10    21    32    45    57
  Chemistry                              2     4     6     8    13
  Clinical development                   1     2     3     4     6
  Accumulated                      5    16    32    49    67    87
Total                              8    21    39    58    77    97
[The source also gives three 2001 per-row values (3, 2, 3) that cannot be matched to rows.]

Board of Directors
• Anders Vedin, Chairman of the Board: Professor, Senior Advisor, InnovationsKapital AB
• Henry Geraedts, Deputy Chairman of the Board: PhD, Independent director, 3i
• Carl Christensson: CEO, SEB Företagsinvest
• Rikard Holmdahl: Professor of Medical Inflammation, founder
• Lennart Hansson: PhD, Chief Executive Officer
• Leif Andersson: Professor of Animal Genetics, founder
• Curt Lönnström: Chief Executive Officer of Ryda Bruk

Research System Architecture
[Diagram labels, grouped as far as the source allows]
• Expression profiling: Affymetrix experiment and experimental data; database with annotated experiments
• Genetic approaches: aGDB, genetic/linkage data
• In silico approaches: Ensembl, auto-annotated genomes
• Pointers to disease loci; pointers to phenotype-related genes
• Integration: relevant genes; phenotype-related pathways
• Target database: curated gene structures

IT System Architecture
[Diagram labels: aGDB; Academic partners; Commercial partners; DAS; Arexis users; tools for sequence analysis; tools for expression data analysis; LDAP; VPN; GIM; business dev; mail; economy; documents; Arexis intranet]

[Figure: comparative genomics through common ancestors. Labels: AMPK; project B; project C; species pig, mouse, rat, homo; two positions marked "?"]
AMPK

Tissue section of skeletal muscle fiber from Hampshire pigs
[Figure panels: Normal (rn+/rn+); Mutant (RN-/rn+ or RN-/RN-)]

A skeletal muscle-specific variant of AMPK
Tissue distribution of AMPK γ-chains
[Figure: γ1, γ2 and γ3 across tissues: Colon, Peripheral blood, Small intestine, Ovary, Testis, Prostate, Thyroid gland, Spleen, Pancreas, Kidney, Muscle, Liver, Lung, Brain, Placenta, Heart]

AMP-activated kinase (AMPK): a heterotrimeric enzyme
[Diagram: subunits α (α1, α2), β (β1, β2) and γ (γ1, γ2, γ3)]

Pathways regulating glucose transport in muscle cells
[Figure: pathway diagram including AMPK. Modified from Shepherd et al. NEJM 1999]

[Diagram labels: AMPK; genetic mapping (chr. 5 mouse, chr. 7 human); experimental validation]
Link to pathophysiology? Pathway analysis!

[Pathway diagram labels: AMP; AMPK (α, β, γ subunits); AMPKK; Protein Phosphatase 2C; Protein Phosphatase 2A; Acetyl-CoA Carboxylase; Acetyl-CoA Carboxylase, phosphorylated (inactive); Malonyl-CoA Decarboxylase; Malonyl-CoA Decarboxylase, phosphorylated (active); Acetyl CoA; Malonyl CoA; Fatty acid. Effects shown: increased glucose uptake; increased amount of GLUT4; decreased glycogen degradation]

Pristane induced arthritis in the rat
[Figure panels: Susceptible DA rat; Resistant E3 rat]

[Figure: comparison of a mouse region (1 Mbp) and a human region (2.4 Mbp); labels: position of mouse gene; duplicated genomic segments]

[Diagram: Genomics data; Expression data; integrate / analyse / visualise; Reconstruction of Pathway; NOVEL Drug Target]

Database resources
• National Center for Biotechnology Information (NCBI)
• European Bioinformatics Institute (EBI)
• DNA Data Bank of Japan (DDBJ)
[Diagram: databases Blocks, MGI, PubMed, DDBJ, EMBL, PIR, GenBank, GDB, Pfam, ProSite, SWISSPROT, MIPS, AceDB; content types: DNA, protein, motifs, families, genomes, bibliography]

Where do sequences come from?
DNA: genomic sequence
• Directed / small-scale
• Large-scale: BACs, YACs
mRNA: cDNA sequence
• Directed / small-scale
• Random / large-scale
– Expressed Sequence Tag [EST]
Protein: protein sequence
• Directed, very little

5/23/2017

Sequence databases: Nucleotide databases
International Nucleotide Sequence Database Collaboration: GenBank, EMBL, DDBJ

Sequence databases: Primary vs secondary databases
• Primary database = sequence database
[Figure: Seq 1 ACGTTT; Seq 2 CTAGAC; Seq 3 TTCTGA]
– e.g. EMBL, GenBank, SWISSPROT
– Each record describes an individual sequence
– Can contain either nucleotide or protein sequences

Sequence databases: Primary vs secondary databases
• Secondary database = pattern database
[Figure: Pattern 1: accagtgt, acgactct; Pattern 2: tacgtagc, tacctacc, taggtagc; Pattern 3: ttcgatgtca, ttcgatcgca, tccgatcgtc]
– e.g. PROSITE, PRINTS, BLOCKS, Pfam
– Each record describes a set of sequences
– The set can be expressed as a motif, a multiple sequence alignment or a probabilistic model

Sequence databases: Nucleotide databases
• How do the databases compare?
– The three databases are 99.99% identical
– Annotations can be slightly different
• How often are they updated?
– New release of the databases every 3 months
– Interim releases, e.g. EMBL-new
• Can the annotations be trusted?
– Not always: some estimates suggest 25% are incorrect

Sequence databases: Nucleotide databases
• EMBL is subdivided into EST and non-EST sequences
[Figure: EST and Non-EST sections, each split into divisions such as vrt, mam, hum, rod]

Sequence databases: Protein databases
[Diagram: GenBank and GenPept; EMBL and TrEMBL; PIR; SWISSPROT]

Sequence databases: Protein databases
EMBL
• 13,700,000 entries
TrEMBL
• Coding sequences automatically translated
• 558,150 entries
• TrEMBL is split into:
– SP-TrEMBL: sequences destined for SWISSPROT
– REM-TrEMBL: remaining sequences
SWISSPROT
• Sequences manually moved to SWISSPROT
• 106,602 entries
• Because it is manually curated, annotations are reliable!
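The TrEMBL slide says that coding sequences are translated automatically. The sketch below shows what such a translation step could look like. It assumes the standard genetic code and a sequence that is already in frame; the name `translate_cds` and the choice to stop at the first stop codon are illustrative assumptions, not a description of the actual TrEMBL pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence into a protein string.

    Hypothetical helper: stops at the first stop codon, writes 'X' for
    codons containing ambiguous bases, ignores a trailing partial codon.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCAAGTAA"))  # prints MAK
```

A translation produced this way carries no manual review, which is the contrast the slide draws between TrEMBL and the curated SWISSPROT entries.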
Sequence databases: Summary
• EMBL is the main nucleotide sequence database (Europe)
• TrEMBL is an automated translation of EMBL
• SWISSPROT is the main curated protein database
• Between main releases, interim releases are made
– e.g. EMBL-new, TrEMBL-new, SWISSPROT-new
• EMBL is subdivided into EST / non-EST, then by species
• Annotations can be trusted in SWISSPROT, not in EMBL
• Accession numbers uniquely identify a sequence and remain constant when entries are updated

Basics of sequence searching: Methods
Method          Accuracy   Duration   Example
Rigorous        +++++      +++++      Smith-Waterman
Heuristic       ++         +          BLAST, FASTA
Probabilistic   ++++       +++        HMM
• Probabilistic methods are best, but can be slow and difficult to use
• Rigorous methods are good when used on a small subset of sequences, but are too slow to search a large sequence database
• Heuristic methods are the best place to start

Basics of sequence searching: Terminology
• Sensitivity vs selectivity
– A sensitive search will find weaker hits
– A selective search is less likely to find unrelated hits
– Increased sensitivity means more true positives
– Increased selectivity means fewer false positives

Searching with BLAST: How it works
• Find identical stretches of nucleotides in the two sequences (the query sequence and a sequence in the database)
• Extend regions of similarity as far as possible; each extended region is an HSP (high-scoring segment pair)
• Identify all regions of similarity (HSP 1, HSP 2, ...)

Local vs global comparisons: The nature of proteins
• Proteins consist of functional and structural units: domains

Local vs global comparisons: What is a local and global comparison?
• Global comparison attempts to match all of one sequence against another
• Local comparison attempts to match short stretches of one sequence with another

Local vs global comparisons: When should each technique be used?
• Global comparisons
– Closely related sequences
– Same general structure of sequence
– Roughly equal lengths
• Local comparisons
– Sequences not closely related
– Sequence fragments
– Interested in identifying common domains

Local vs global comparisons: When should each technique be used?
[Figure labels: Common domain; Non-matching domains; Domain unique to one sequence]
• Global comparison will attempt to match all of one sequence against another, even when the sequences share only one common domain
• Global comparison should only be used if the sequences being compared have a common domain structure

Local vs global comparisons: Summary
• Proteins are organised into domains
• Local comparisons find short stretches of similarity
• Global comparisons match the whole length of one sequence against another
• Local comparisons should be used unless sequences are closely related and have identical domain structures

Searching with BLAST: Search with DNA or protein?
• Use DNA if
– There are frameshifts, which are common in ESTs
– You are interested in evolution (the 3rd base in a codon is hidden in translation)
• Otherwise, use the protein sequence. Why?
– Two DNA sequences can be aligned in six ways (one per reading frame)
– Each alignment can give scores, so there are more partial matches
– There is therefore more noise associated with the comparison
– The statistical significance of good hits is thus reduced

Searching with FASTA: BLAST vs FASTA
• Advantages of BLAST
– Faster than FASTA
– Reports all high-scoring local alignments
• Advantages of FASTA
– More sensitive: approaches the sensitivity of rigorous methods
– Faster than rigorous methods
– E-values are more accurate
– Better handling of frameshifts, which is important for ESTs
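The contrast between local and global comparison drawn in the slides above can be made concrete with a small scoring sketch. It assumes a toy scoring scheme (match +1, mismatch -1, linear gap -1), which is not taken from the slides. The global score uses Needleman-Wunsch-style dynamic programming (an algorithm the slides do not name), the local score uses the Smith-Waterman recurrence, and both function names are illustrative.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning all of a against all of b (Needleman-Wunsch style)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Two made-up sequences sharing one common stretch (GATTACA)
    # surrounded by unrelated flanks.
    seq_a = "AAAAGATTACATTTT"
    seq_b = "CCGATTACACC"
    print(local_score(seq_a, seq_b))   # prints 7
    print(global_score(seq_a, seq_b))  # prints -1
```

With these example sequences the local score reflects only the shared stretch, while the global score is dragged down by the unrelated flanks. That mirrors the advice to prefer local comparison unless the sequences share a common domain structure.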
Basics of sequence searching: Summary
• Sequence searching is complicated because we want to find partial matches
• A search method should be both sensitive and selective
• Rigorous methods are much more sensitive than heuristic methods, but are too slow

Secondary databases: Databases available - Prosite
• 1492 regular expressions
• Each entry consists of two files
– Text file with information on the family
– A regular expression and matching sequences

    ID   PROTEIN_KINASE_TYR; PATTERN.
    AC   PS00109;
    DT   APR-1990 (CREATED); DEC-1992 (DATA UPDATE); JUL-1998 (INF UPDATE).
    DE   Tyrosine protein kinases specific active-site signature.
    PA   [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

Secondary databases: Databases available - Pfam
• Split into two sections
– Pfam-A: 3,071 HMMs (curated)
– Pfam-B: 36,700 HMMs (not curated)
• Each entry consists of a description and an alignment

    ID   IL7
    AC   PF01415
    DE   Interleukin 7/9 family
    AU   Ponting CP, Schultz J, Bork P
    AL   Clustalw
    BM   hmmbuild HMM SEED
    BM   hmmcalibrate --seed 0 HMM
    DR   PROSITE; PDOC00228;
    CC   IL-7 is a cytokine that acts as a growth factor for early
    CC   lymphoid cells of both B- and T-cell lineages. IL-9 is a
    CC   multi-functional cytokine.
    IL7_BOVIN/28-172   DISGKDGGAYQNVLMVNIDD-LDNMINFDSNCLNNEPNFFKKHSCDDNKEASFLNRASRK
    IL7_HUMAN/28-173   DIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNEFNFFKRHICDANKEGMFLFRAARK
    IL7_MOUSE/28-152   HIKDKEGKAYESVLMISIDE-LDKMTGTDSNCPNNEPNFFRKHVCDDTKEAAFLNRAARK

Secondary databases: Databases available - InterPro

[Image: Biotechhuset, model]
[Image: Biotechhuset, view towards the south-west]
[Image: Biotechhuset, Annedal]

http://www.arexis.com
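The Prosite signature PS00109 quoted in the Prosite slide is written in PROSITE pattern syntax, which maps closely onto ordinary regular expressions. The sketch below converts such a pattern into a Python regular expression and scans a peptide with it. The function name `prosite_to_regex` and both test peptides are made up for illustration, and only the common syntax elements are handled (x, residue sets in [], exclusions in {}, repeat counts in (), and the < and > anchors).

```python
import re

# One PROSITE pattern element: optional '<' anchor, a residue specification,
# an optional repeat count such as (2) or (2,4), and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = m.groups()
        if core == "x":
            core = "."                          # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # {AB} means anything but A or B
        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


if __name__ == "__main__":
    signature = "[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3)."
    regex = prosite_to_regex(signature)
    print(regex)  # prints [LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}

    # Hypothetical peptides, not real database entries.
    print(bool(re.search(regex, "LAHRDLAARNILV")))  # prints True
    print(bool(re.search(regex, "LAHRDLKPENILV")))  # prints False
```

A pattern like this gives a yes/no answer for each sequence, in contrast to the HMM-based Pfam entries, which score a sequence against a probabilistic model.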