Species and Classification in Biology
Barry Smith
http://ifomis.org

DNA at 10⁻⁹ m (figure)

Levels of granularity (figure): organism; organ (~10⁻¹ m); tissue; cell (~10⁻⁵ m); organelle; protein; DNA (~10⁻⁹ m)

New golden age of classification*
~30 million species
30,000 genes in human
200,000 proteins
100s of cell types
100,000s of disease types
1,000,000s of biochemical pathways (including disease pathways)
*… legacy of the Human Genome Project

FUNCTIONAL GENOMICS: proteomics, reactomics, metabonomics, phenomics, behaviouromics, toxicopharmacogenomics …

The incompatibilities between different scientific cultures and terminologies (immunology, genetics, cell biology) have resurrected the problem of the unity of science in a new guise. The logical positivist solution to this problem addressed a world in which sciences are associated with printed texts. What happens when sciences are associated with databases?

… when each (chemical, pathological, immunological, toxicological) information system uses its own classifications, how can we overcome the incompatibilities that become apparent when data from distinct sources are combined?
Answer: "Ontology"
= building software artefacts (standardized classification systems, controlled vocabularies) so that data from one source are expressed in a language that makes them compatible with data from every other source

Google hits (in millions), 25.4.06
Query                             Hits
ontology                          52.4
ontology + philosophy              2.7
ontology + information science     6.0
ontology + database                7.8

A Linnaean Species Hierarchy (figure)

(Small) Disease Hierarchy (figure)

Combining hierarchies via dependence relations (figure: organisms, diseases)

A Window on Reality (figure: diseases, organisms)

How to understand species (aka types, universals, kinds)
Species are something like invariants in reality which can be studied by science.
Species have instances: this mouse, this cell, this cell membrane ...

Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software

Domain =def a portion of reality that forms the subject matter of a single science or technology or mode of study; e.g. proteomics, radiology, viral infections in mouse

Representation =def an image, idea, map, picture, name or description ... of some entity or entities

Analogue representations (figure)

Representational units =def terms, icons, photographs, identifiers ...
which refer, or are intended to refer, to entities

Composite representation =def representation (1) built out of representational units which (2) form a structure that mirrors, or is intended to mirror, the entities in some domain

The Periodic Table (figure)

Ontologies are here (figure)

Ontologies are representational artifacts

What do ontologies represent?
(figure: parts A, B, C; catalog entries 515287, 521683, 521682: DC3300 Dust Collector Fan, Gilmer Belt, Motor Drive Belt; annotated with "instances" and "types")

Two kinds of composite representational artifacts:
Databases, inventories: represent what is particular in reality = instances
Ontologies, terminologies, catalogs: represent what is general in reality = types

Ontologies do not represent concepts in people's heads.

Ontology is a tool of science. Scientists do not describe the concepts in scientists' heads; they describe the types in reality, as a step towards finding ways to reason about (and treat) instances of these types.

The biologist has a cognitive representation which involves theoretical knowledge derived from textbooks.

An ontology is like a scientific text; it is a representation of types in reality.

Instances stand in similarity relations: Frank and Bill are similar as humans, mammals, animals, etc.
Human, mammal and animal are types at different levels of granularity.

Types and instances (figure): a hierarchy of types (substance, organism, animal, mammal, cat, siamese, frog) above their instances

Science needs to find uniform ways of representing types.
Ontology =def a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. types in reality
2. those relations between these types which obtain universally (= for all instances)
e.g. lung is_a anatomical structure; lobe of lung part_of lung

is_a
A is_a B =def for all x, if x instance_of A then x instance_of B
e.g. cell division is_a biological process

Entities (figure)

Entities: universals (species, types, taxa, …) and particulars (individuals, tokens, instances)

Canonical instances: within the realm of individuals, those individuals which
1. instantiate universals (entering into biological laws)
2. are prototypical
Canonical anatomy: no Siamese twins, no six-fingered giants, no amputation stumps, …

Entities (figure: universals, instances, junk). Example of junk particulars: desk-mountain

Entities (figure): Jane inst human

Ontologies are More than Just Taxonomies

The Gene Ontology (7 million Google hits): a cross-species controlled vocabulary for annotations of genes and gene products; deeper than Darwinianism

When a gene is identified, three important types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
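The is_a definition above quantifies over all instances. A minimal Python sketch, assuming a small hypothetical sample of individuals (Tibbles, Felix and Kermit are invented names; only the type names come from the types-and-instances figure), shows what a purely extensional reading of that definition computes, and why reading it off a finite data sample can yield false positives:

```python
"""Extensional reading of: A is_a B =def for all x, if x instance_of A then x instance_of B.

Only the type names come from the slides; the individuals are hypothetical.
"""

# Hypothetical individuals, each mapped to every type it instantiates.
INSTANCE_OF: dict[str, set[str]] = {
    "Tibbles": {"siamese", "cat", "mammal", "animal", "organism", "substance"},
    "Felix": {"cat", "mammal", "animal", "organism", "substance"},
    "Kermit": {"frog", "animal", "organism", "substance"},
}


def instances(t: str) -> set[str]:
    """All individuals in the sample that instantiate type t."""
    return {x for x, types in INSTANCE_OF.items() if t in types}


def is_a(a: str, b: str) -> bool:
    """True if every sampled instance of a is also an instance of b."""
    return instances(a) <= instances(b)


if __name__ == "__main__":
    print(is_a("siamese", "mammal"))  # True: Tibbles is both
    print(is_a("cat", "siamese"))     # False: Felix is a cat but not a siamese
    print(is_a("frog", "mammal"))     # False: Kermit is not a mammal
    # Spurious: every mammal in this sample happens to be a cat.
    print(is_a("mammal", "cat"))      # True
```

Because every mammal in this sample happens to be a cat, the check reports mammal is_a cat. The definition is only trustworthy when the inclusion holds for all instances, not merely for those recorded in some database.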
GO has three ontologies: biological processes, molecular functions, cellular components.

GO is astonishingly influential:
used by all major species genome projects
used by all major pharmacological research groups
used by all major bioinformatics research groups

GO is part of the Open Biological Ontologies consortium, which also includes: Fungal Ontology, Plant Ontology, Yeast Ontology, Disease Ontology, Mouse Anatomy Ontology, Cell Ontology, Sequence Ontology, Relations Ontology.

Each of GO's ontologies is organized in a graph-theoretical structure involving two sorts of links or edges:
is-a (= is a subtype of), e.g. copulation is-a biological process
part-of, e.g. cell wall part-of cell

The Gene Ontology: a "controlled vocabulary" designed to standardize annotation of genes and gene products; used by over 20 genome databases and many other groups in academia and industry; its methodology is much imitated.

The Methodology of Annotations
Scientific curators use experimental observations reported in the biomedical literature to link gene products with GO terms in annotations. The gene annotations taken together yield a slowly growing computer-interpretable map of biological reality. The process of annotating literature also leads to improvements and extensions of the ontology, which institutes a virtuous cycle of improvement in the quality and reach of future annotations and of the ontology itself.
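The two edge types described above can be sketched as a small directed graph. The Python sketch below is a hypothetical fragment: the copulation, cell division and cell wall edges echo the slides' examples, while the remaining edges and the gene product name geneX are invented placeholders, not actual GO content. It shows how a curator's annotation to one term carries over to every term reachable along is_a and part_of links:

```python
"""Toy GO-style graph with is_a and part_of edges, plus upward propagation of annotations.

The edges are an illustrative fragment, not the actual GO structure;
'geneX' is a hypothetical gene product.
"""
from collections import defaultdict

# (child, relation, parent) triples. The first three echo the slides' examples;
# the remaining edges are simplified placeholders.
EDGES = [
    ("copulation", "is_a", "biological process"),
    ("cell division", "is_a", "biological process"),
    ("cell wall", "part_of", "cell"),
    ("cell", "is_a", "cellular component"),
    ("membrane", "is_a", "cellular component"),
    ("nucleus", "is_a", "cellular component"),
]

PARENTS: dict[str, set[tuple[str, str]]] = defaultdict(set)
for child, rel, parent in EDGES:
    PARENTS[child].add((rel, parent))


def ancestors(term: str) -> set[str]:
    """All terms reachable from `term` by following is_a or part_of edges upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for _rel, parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Extend each gene product's direct annotations with all ancestor terms."""
    return {
        gene: terms | {a for t in terms for a in ancestors(t)}
        for gene, terms in annotations.items()
    }


if __name__ == "__main__":
    direct = {"geneX": {"cell wall"}}  # hypothetical annotation
    print(sorted(propagate(direct)["geneX"]))
    # ['cell', 'cell wall', 'cellular component']
```

Following both edge types upward is a design choice: it is sound only if part_of, like is_a, holds for all instances of the child term, which is what the ontology definition requires of relations between types.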
The Gene Ontology as Cartoon (figure)

GO term counts:
cellular components: 1372 component terms
molecular functions: 7271 function terms
biological processes: 8069 process terms

The Cellular Component Ontology (counterpart of anatomy), e.g. membrane, nucleus

The Molecular Function Ontology, e.g. protein stabilization
The Molecular Function Ontology is (roughly) an ontology of actions on the molecular level of granularity.

Biological Process Ontology, e.g. death
An ontology of occurrents on the level of granularity of cells, organs and whole organisms

GO is here an example
a. of the sorts of problems confronting life science data integration
b. of the degree to which formal methods are relevant to the solution of these problems

Linnaeus (figure)

Entities (figure)

Entities: universals (kinds, types, taxa, …) and particulars (individuals, tokens, instances, …)
Axiom: Nothing is both a universal and a particular.

Entities: universals* (*natural, biological kinds)

Entities: universals, instances (figure)

Universals are natural kinds.
Instances are natural exemplars of natural kinds (problem of non-standard instances).
Not all individuals are instances of universals.

Entities: universals, instances, and a penumbra of borderline cases (figure)

Entities: universals, instances, junk (figure). Example of junk: beachball-desk

Primitive relations: inst and part
inst(Jane, human being)
part(Jane's heart, Jane's body)
A universal is anything that is instantiated. An instance is anything (any individual) that instantiates some
universal.

A is_a B (figure: genus(B), species(A), instances)

is_a (D3*)
$e \mathrel{\mathrm{is\_a}} f =_{\mathrm{def}} \mathit{universal}(e) \land \mathit{universal}(f) \land \forall x\,(\mathit{inst}(x, e) \rightarrow \mathit{inst}(x, f))$
$\mathit{genus}(A) =_{\mathrm{def}} \mathit{universal}(A) \land \exists B\,(B \mathrel{\mathrm{is\_a}} A \land B \neq A)$
$\mathit{species}(A) =_{\mathrm{def}} \mathit{universal}(A) \land \exists B\,(A \mathrel{\mathrm{is\_a}} B \land B \neq A)$

To solve the problem of false positives, insist that A is_a B holds always, as a matter of scientific law.

Nearest species
$\mathit{nearestspecies}(A, B) =_{\mathrm{def}} A \mathrel{\mathrm{is\_a}} B \land \forall C\,((A \mathrel{\mathrm{is\_a}} C \land C \mathrel{\mathrm{is\_a}} B) \rightarrow (C = A \lor C = B)) \land B \neq A$

Definitions (figure: highest genus, lowest species, instances)

Lowest Species and Highest Genus
$\mathit{lowestspecies}(A) =_{\mathrm{def}} \mathit{species}(A) \land \lnot\mathit{genus}(A)$
$\mathit{highestgenus}(A) =_{\mathrm{def}} \mathit{genus}(A) \land \lnot\mathit{species}(A)$
Theorem: $\mathit{universal}(A) \rightarrow (\mathit{genus}(A) \lor \mathit{lowestspecies}(A))$

Axioms
Every universal has at least one instance.
Distinct lowest species never share instances.
SINGLE INHERITANCE: Every species is the nearest species to exactly one genus.

Axioms governing inst
$(\mathit{genus}(A) \land \mathit{inst}(x, A)) \rightarrow \exists B\,(\mathit{nearestspecies}(B, A) \land \mathit{inst}(x, B))$
EVERY GENUS HAS AN INSTANTIATED SPECIES
$\mathit{nearestspecies}(A, B) \rightarrow$ A's instances are properly included in B's instances
EACH SPECIES HAS A SMALLER CLASS OF INSTANCES THAN ITS GENUS

Axioms
$\mathit{nearestspecies}(B, A) \rightarrow \exists C\,(\mathit{nearestspecies}(C, A) \land B \neq C)$
EVERY GENUS HAS AT LEAST TWO CHILDREN
$(\mathit{nearestspecies}(B, A) \land \mathit{nearestspecies}(C, A) \land B \neq C) \rightarrow \lnot\exists x\,(\mathit{inst}(x, B) \land \mathit{inst}(x, C))$
SPECIES OF A COMMON GENUS NEVER SHARE INSTANCES

Theorems
$(\mathit{genus}(A) \land \mathit{inst}(x, A)) \rightarrow \exists B\,(\mathit{lowestspecies}(B) \land B \mathrel{\mathrm{is\_a}} A \land \mathit{inst}(x, B))$
EVERY INSTANCE IS ALSO AN INSTANCE OF SOME LOWEST SPECIES
$(\mathit{genus}(A) \land \mathit{lowestspecies}(B) \land \exists x\,(\mathit{inst}(x, A) \land \mathit{inst}(x, B))) \rightarrow B \mathrel{\mathrm{is\_a}} A$
IF AN INSTANCE OF A LOWEST SPECIES IS AN INSTANCE OF A GENUS THEN THE LOWEST SPECIES IS A CHILD OF THE GENUS

Theorems
$(\mathit{universal}(A) \land \mathit{universal}(B)) \rightarrow (A = B \lor A \mathrel{\mathrm{is\_a}} B \lor B \mathrel{\mathrm{is\_a}} A \lor \lnot\exists x\,(\mathit{inst}(x, A) \land \mathit{inst}(x, B)))$
DISTINCT UNIVERSALS EITHER STAND IN A PARENT-CHILD RELATIONSHIP OR THEY HAVE NO INSTANCES IN COMMON

Theorems
$(A \mathrel{\mathrm{is\_a}} B \land A \mathrel{\mathrm{is\_a}} C) \rightarrow (B = C \lor B \mathrel{\mathrm{is\_a}} C \lor C \mathrel{\mathrm{is\_a}} B)$
UNIVERSALS WHICH SHARE A CHILD IN COMMON ARE EITHER IDENTICAL OR ONE IS SUBORDINATED TO THE OTHER

Theorems
$(\mathit{genus}(A) \land \mathit{genus}(B) \land \exists x\,(\mathit{inst}(x, A) \land \mathit{inst}(x, B))) \rightarrow \exists C\,(C \mathrel{\mathrm{is\_a}} A \land C \mathrel{\mathrm{is\_a}} B)$
IF TWO GENERA HAVE A COMMON INSTANCE THEN THEY HAVE A COMMON CHILD

Expanding the theory
Sexually reproducing organisms
Organisms in general
To take account of development (child, adult; larva, butterfly)
Biological processes
Biological functions
(at different levels of granularity)

Universals, Classes, Sets
A class is the extension of a universal.

Class =def a maximal collection of particulars determined by a general term ('cell', 'mouse', 'Saarländer'): the class A = the collection of all particulars x for which 'x is A' is true

Universals and Classes vs. Sums
The former are marked by granularity: they divide up the domain into whole units, whose interior parts are traced over. The universal human being is instantiated only by human beings as single, whole units.
A mereological sum is not granular in this sense (molecules are parts of the mereological sum of human beings).

A bad solution
Identify both universals and classes with sets in the mathematical sense.
Problem of false positives (figure; terms shown: adult, child, lion in Leipzig, lion, animal owned by the Emperor, mammal, mammal weighing less than 200 kg, animal)

Sets in the mathematical sense are marked by granularity
Granularity = each class or set is laid across reality like a grid consisting (1) of a number of slots or pigeonholes, each (2) occupied by some member.
Each set is (1) associated with a specific number of slots, each of which (2) must be occupied by some specific member.
A class survives the turnover in its instances: both (1) the number of slots and (2) the individuals occupying these slots may vary with time.

But sets are timeless
A set is an abstract structure, existing outside time and space. The set of human beings existing at t is (timelessly) a different entity from the set of human beings existing at t′, because of births and deaths.
Biological classes exist in time. Darwin: because the universals of which they are extensions exist in time.
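The definitions and axioms for genus, species, nearest species and lowest species given earlier can be checked mechanically on a finite model. The Python sketch below assumes a hypothetical hierarchy (animal, mammal, frog, cat, dog) and three invented individuals. It computes is_a from asserted parent links rather than from instance data, in line with the requirement that is_a hold as a matter of law, and then evaluates the axioms and one theorem over that model:

```python
"""Toy checker for the genus/species definitions and axioms on a finite hierarchy.

All universals, links and individuals are hypothetical; is_a is the
reflexive-transitive closure of asserted parent links, not read off data.
"""
from itertools import product

# Hypothetical hierarchy: each species -> its single genus (single inheritance).
PARENT = {"mammal": "animal", "frog": "animal", "cat": "mammal", "dog": "mammal"}
UNIVERSALS = {"animal", *PARENT}
# Hypothetical individuals, each assigned to the lowest species it instantiates.
LOWEST_OF = {"Tibbles": "cat", "Rex": "dog", "Kermit": "frog"}


def is_a(a: str, b: str) -> bool:
    """True if b is a itself or one of a's ancestors."""
    while a != b:
        if a not in PARENT:
            return False
        a = PARENT[a]
    return True


def inst(x: str, a: str) -> bool:
    """x instantiates a if x's lowest species is_a a."""
    return is_a(LOWEST_OF[x], a)


def instances(a: str) -> set[str]:
    return {x for x in LOWEST_OF if inst(x, a)}


def genus(a: str) -> bool:
    return any(b != a and is_a(b, a) for b in UNIVERSALS)


def species(a: str) -> bool:
    return any(b != a and is_a(a, b) for b in UNIVERSALS)


def lowest_species(a: str) -> bool:
    return species(a) and not genus(a)


def highest_genus(a: str) -> bool:
    return genus(a) and not species(a)


def nearest_species(a: str, b: str) -> bool:
    """a is_a b, a != b, and no universal lies strictly between them."""
    return a != b and is_a(a, b) and all(
        c in (a, b) for c in UNIVERSALS if is_a(a, c) and is_a(c, b)
    )


def check_axioms() -> dict[str, bool]:
    pairs = list(product(UNIVERSALS, repeat=2))
    # (species, genus) pairs standing in the nearest-species relation
    ns = [(s, g) for s, g in pairs if nearest_species(s, g)]
    return {
        "every universal has an instance": all(instances(a) for a in UNIVERSALS),
        "distinct lowest species share no instances": all(
            not (instances(a) & instances(b))
            for a, b in pairs
            if a != b and lowest_species(a) and lowest_species(b)
        ),
        "single inheritance": all(
            sum(nearest_species(s, g) for g in UNIVERSALS) == 1
            for s in UNIVERSALS
            if species(s)
        ),
        "every genus instance is in a nearest species": all(
            any(nearest_species(s, g) and inst(x, s) for s in UNIVERSALS)
            for g in UNIVERSALS
            if genus(g)
            for x in instances(g)
        ),
        "species has fewer instances than genus": all(
            instances(s) < instances(g) for s, g in ns
        ),
        "every genus has at least two children": all(
            any(c != s and nearest_species(c, g) for c in UNIVERSALS) for s, g in ns
        ),
        "siblings share no instances": all(
            not (instances(s1) & instances(s2))
            for (s1, g1), (s2, g2) in product(ns, repeat=2)
            if g1 == g2 and s1 != s2
        ),
        "theorem: every universal is a genus or a lowest species": all(
            genus(a) or lowest_species(a) for a in UNIVERSALS
        ),
    }


if __name__ == "__main__":
    for name, holds in check_axioms().items():
        print(f"{name}: {holds}")
```

Every check holds in this model. Deleting the dog entries from PARENT and LOWEST_OF leaves mammal with a single child, so the two-children check and the proper-inclusion check both fail.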