DWPI Chemistry Resource (DCR)
Brian Larner – 24th September 2015

AGENDA
• THE PROBLEM; WHY IS IT DIFFICULT TO SEARCH CHEMICAL STRUCTURES IN PATENTS?
• SOLUTION; DCR INDEXING
• DCR COVERAGE
• DCR STRUCTURE CONVENTIONS
• SEARCHING IN THE DCR DATABASE

PROBLEMS IN SEARCHING FOR CHEMICAL INFORMATION
• There is no consistent way of representing chemical information in patents
• A chemical compound could be referred to using any of the following
  – A systematic chemical name
  – A semi-systematic or trivial chemical name
  – A trade or proprietary name or, for drugs, an approved name or trial prep code
  – A drawn-out chemical structure
  – As one possibility within a generic chemical structure
  – As one possibility when only a generic class of compounds is referred to

ALL THE NAMES DICLOFENAC IS KNOWN BY
• ABITREN; ADEFURONIC; AFLAMIN; ALLVORAN; ALMIRAL; AM-DICLOFENAC; ANFENAX; ANTHRAXITON; ARTHRIFEN; ARTHRODERM; ARTHROPEN; ARTHROTEC; ARTREN; ARTRILAT; ARTRITAREN; ASPZONE; ASSAREN; ATHROFEN; BA-47210; BATAFIL; BENFOFEN; BIOFNAC; BLESIN; BOLABOMIN; CATAFLAM; CHLORGY; CIBA-47210; COLIRI; CONTRALG; CORDRALAN; CT-DICLO; CURINFLAM; CURINFLAM-A.P.; DAISPAS; DEFLAMAT; DELIMON; DELPHIMIX; DELPHIMIX-1; DELPHINAC; DENBAL; DESINFLAM; DFNA; DFP; DFP-60; DICHRONIC; DICLAC; DICLO; DICLO-ATTRITIN; DICLOBASAN; DICLO-OPT; DICLO-PHLOGONT; DICLO-PUREN; DICLO-REKTAL; DICLO-SPONDRYL; DICLOSPONDYRIL; DICLO-TABLINEN; DICLOBENIN; DICLOD; DICLOFEN; DICLOFENAC; DICLOFENAC SODIUM SALT; DICLOFENAC SODIUM; DICLOFENAC-OPT; DICLOFENAC-SODIUM; DICLOFENACO; DICLOFLEX; DICLOMAX; DICLOMELAN; DICLOPHLOGONT; DICLOREUM; DICLOSIAN; DICLOWAL; DICSANAL; DIFENAC; DIGNOFENAC; DIRALON; DOCELL; DOGNOFENAC; DOLOBASAN; DOLOTREN; DOLOVISANO-DICLO; DONJUST-A; DORAGON; DURAVOLTEN; ECOFENAC; EFFEKTON; EVINOPON; FELORAN; FENACIDON; FENAMED; FENOFLAM; FENYTAREN; FLAMERIL; FLEFARMINA; FLEFARMINE; FLOGOFENAC; FORGENAC; GAUTELIN; GP-45840; GROFENAC; GROSALGEN; HIZEMIN; IMBUN; INFLAMAC; INFLANAC; INFLAREN; IRINATOLON; JAVIPREN; KLAST; KRIPLEX; LINOBOL; MAGLUFEN; MAGLUPHEN; MILNAC; MIYADREN; MONOFLAM; MP-DICLOFENAC; MYOGIT; N-RHEUMAVINCIN; NABOAL; NABOAL-SR; NACLOF; NAKLOFEN; NERIODIN; NIFLERIEL; NOVAPIRINA; OLFEN; OLFEN-GEL; OLPHEN; ORTOPHEN; OXA; PANAMOR; PARSAL; PENNSAID; PENTIATE; POLTAJEN; PRIMOFENAC; PROPHENATIN; REMETHAN; REOXEL; REWODINA; RHEUFENAC; RHEUMAREN; RHEUMASAN-D; RHEUMAVEK; RHEUMAVINCIN; RHEUMAVINCIN-N; RHUMALGAN; RUVOMINOX; SAFFRAC; SANNAX; SAVISMIN; SEECOREN; SGESTONE; SHIGNOL; SILINO; SODIUM DICLOFENAC; SODIUM-DICLOFENAC; SOFARIN; SOLARAZE; SORELMON; SR-318; SR-318A; SR-318B; TAKS; THICATAREN; TORYXIL; TP-318; TRABONA; TRATUL; TSUDOHMIN; URIGON; VALETAN; VERAL; VERICE; VIAVOX; VILONIT; VOLDAL; VOLFENAC; VOLMAGEN; VOLRAMAN; VOLTAREN; VOLTAREN-QS; VOLTAREN-SR; VOLTARENE; VOLTARN-EMULGEL; VOLTAROL; VOLTAROL-EMULGEL; VOLTAROL-OPHTHA; VONAFEC; VOREN; VOTAXIL; VURDON; XENID; YOUFENAC

AND OF COURSE
• 2-(2-((2,6-dichlorophenyl)-amino)-phenyl)-acetic acid
• 2-(2-((2,6-dichlorophenyl)-amino)-phenyl)-ethanoic acid
[Structure diagram: diclofenac]

SOLUTION: DWPI CHEMISTRY RESOURCE (DCR)
• This is a database of specific chemical substances mentioned in patents
• They are also organised into families of closely related compounds as follows
  – basic compound
  – salts, isotopes, mixtures, isomers
• Substance records include structure diagrams and substance data, e.g.
  – IUPAC name, synonyms
  – molecular formula, molecular weight
• The DCR numbers are associated with the relevant fragmentation codes for the substance, so they can be searched in conjunction with non-structural fragmentation codes if desired
• They also have roles associated with them (e.g. produced, detected), so that you can limit your answers by the role of the compound

BENEFITS OF DWPI INDEXING - REAL EXAMPLE
• Search on Diclofenac or its most common synonyms (Voltarol or Voltaren) using keywords in the DWPI title & abstract
  – Finds 2748 documents
• Search on Diclofenac via its DCR record
  – Finds 2473 records
  – 414 of these were not found by the keyword search

SOME INVENTIONS FOUND ONLY BY THE KEYWORD SEARCH ARE LESS RELEVANT

BUT THE ONES FOUND BY DCR ARE HIGHLY RELEVANT
• Multicomponent crystals useful e.g. for treating and preventing acute and chronic pain comprise (2-amino-6-(4-fluoro-benzylamino)-pyridin-3-yl)-carbamic acid ethyl ester and 2-(2-((2,6-dichlorophenyl)-amino)-phenyl)-acetic acid
  – NOVELTY - Multi-component crystals comprise (2-amino-6-(4-fluoro-benzylamino)-pyridin-3-yl)-carbamic acid ethyl ester and 2-(2-((2,6-dichlorophenyl)-amino)-phenyl)-acetic acid.
  – ADVANTAGE - The multicomponent crystals: are stable and easy to formulate; exhibit physicochemical properties which influence e.g. solubility, stability, hygroscopicity, handling and tabletting; and do not exhibit typical problems of physical mixtures, i.e. different bioavailability or decomposition during production.
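The three counts quoted in the diclofenac example are enough to work out how the two result sets overlap. The sketch below does that arithmetic. It assumes both searches count the same unit (DWPI records) and that "414 not found by the keyword search" is the DCR-only part of the result; the function and key names are illustrative and not part of any DWPI or STN tool.

```python
def overlap_summary(keyword_hits, dcr_hits, dcr_only):
    """Derive the overlap between a keyword search and a DCR search.

    keyword_hits: records found by the keyword search
    dcr_hits:     records found via the DCR record
    dcr_only:     DCR records that the keyword search missed
    """
    if dcr_only > dcr_hits:
        raise ValueError("dcr_only cannot exceed dcr_hits")
    both = dcr_hits - dcr_only          # found by both searches
    keyword_only = keyword_hits - both  # found only by keywords
    union = keyword_hits + dcr_only     # found by at least one search
    return {
        "both": both,
        "keyword_only": keyword_only,
        "dcr_only": dcr_only,
        "union": union,
        # extra records gained by adding the DCR search, relative to keywords alone
        "gain_over_keyword_pct": round(100 * dcr_only / keyword_hits, 1),
    }


# Figures quoted on the benefits slide.
summary = overlap_summary(keyword_hits=2748, dcr_hits=2473, dcr_only=414)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under those assumptions, 2059 records are found by both searches, 689 by keywords only, and the combined set holds 3162 records, so the DCR search adds about 15% to what keywords alone retrieve.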
DCR COVERAGE
• DCR records are only created for patents that are classified in at least one of the following CPI sections
  – B (Pharmaceuticals)
  – C (Agrochemicals)
  – E (General Chemistry)
• In addition, existing DCR records are cited when the substances they relate to are mentioned in the DWPI abstracts for patents classified in Sections D, F, G, J and K
• DCR numbers are auto-generated from the specific compound codes in polymer indexing and added to the indexing

DCR COVERAGE BY COMPOUND TYPE
• Ordinary organic compounds (e.g. ethanol, ibuprofen)
• Inorganic compounds (e.g. sodium chloride, ammonia)
• Complexes and organometallics (e.g. ferrocene, copper phthalocyanine, diethyl magnesium bromide)
• Peptides with 10 or fewer repeat units
• Proteins and other natural polymers with well-defined names*
• Synthetic polymers from a standard list of around 340 commonly occurring ones
• Plant, animal & microbial extracts*
* These records do not contain structures

WHAT IS NOT COVERED
• Generic classes of compounds
  – These are covered by other forms of chemical structure indexing in DWPI, e.g. fragmentation coding
• Synthetic polymers other than the ones in the predefined list of around 340
  – These are covered by polymer indexing
• Any compound of ambiguous structure
  – This could be those with ill-defined ratios of ions or components
  – Or ones with ambiguous names where we cannot be sure of the correct structure

DCR COVERAGE BY ROLE IN THE INVENTION
• Compounds are indexed in DCR when they meet any of the following criteria
  – All compounds stated to be new, including new intermediates
  – Compounds produced by the inventive process
  – Compounds purified, removed or detected by the claimed process
  – Catalysts
  – Detecting agents & purifying agents
  – Starting materials and reagents

DCR COVERAGE RULES
• The following are selected for DCR indexing
  – All claimed compounds up to a maximum of ~99. This number is reduced if a Markush is also needed. (The maximum number of DCR + Markush records is 99 because the antiquated "subscriber" feed to hosts only allows a 2-digit record number)
  – At least 1 example, which should be the best example illustrating the invention (usually the one in the abstract). In many cases this one is also claimed
  – Further examples are input at the analyst's discretion, but more should be selected if only a few compounds are claimed and there are examples which are structurally dissimilar from those claimed but still representative of the Markush
  – Selected examples should be "real", not prophetic; i.e. they should have supporting data such as preparative data, activity data etc.
  – Compounds from the disclosure can be indexed at the analyst's discretion. Usually this is done if there are no (or few) claimed or exemplified compounds, or if there are novel disclosed compounds that are not claimed (these must have supporting data if they are novel)

WHEN MORE THAN 99 COMPOUNDS ARE CLAIMED
• The ones selected must include the best example
• Others are chosen so as to reflect the full structural diversity of the complete set
  – So at least one example with each different type of ring system present
  – At least one example of each type of functional group present
  – Where possible, different substitution patterns on a ring system are also covered

DCR COVERAGE
• Pharmaceutical (B), agrochemical (C) and general chemical (E) patent records
• Comprehensive coverage from 4/1999*
• Selective coverage for approximately
  – 20,000 substances from 1/1987 to date
  – 2,100 substances from 7/1981 to date
* Except Japanese patents, which are covered from 9/2000 onwards.
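The rule for patents that claim more than 99 compounds (keep the best example, then cover every ring system and functional group at least once) can be read as a small coverage problem. The sketch below is only an illustration of that reading. In practice the choice is made by analysts using judgment, and the greedy strategy, the feature labels and the compound names here are all assumptions, not the DWPI procedure.

```python
MAX_RECORDS = 99  # DCR + Markush records share a two-digit record number


def select_compounds(compounds, best_example, markush_records=0, limit=MAX_RECORDS):
    """Pick compounds for indexing under the record limit.

    compounds: dict mapping a compound name to a set of feature labels
               (hypothetical labels for ring systems and functional groups).
    best_example: name of the compound that must always be included.
    markush_records: records already reserved for Markush structures.
    """
    budget = limit - markush_records
    if budget < 1:
        raise ValueError("no room left for DCR records")

    selected = [best_example]
    covered = set(compounds[best_example])
    all_features = set().union(*compounds.values())
    remaining = {n: f for n, f in compounds.items() if n != best_example}

    # Greedy step: repeatedly add the compound that contributes the most
    # features not yet represented (ties broken alphabetically).
    while len(selected) < budget and covered != all_features and remaining:
        name = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        gain = remaining.pop(name) - covered
        if not gain:
            break
        selected.append(name)
        covered |= gain

    # Once every feature is represented, fill any spare slots in input order.
    for name in compounds:
        if len(selected) >= budget:
            break
        if name not in selected:
            selected.append(name)
    return selected


# Hypothetical claimed set, with the limit lowered to 3 to keep the example small.
claimed = {
    "A": {"pyridine", "amide"},
    "B": {"pyridine", "amide"},
    "C": {"indole", "ester"},
    "D": {"pyridine", "ester"},
    "E": {"indole", "amide"},
}
print(select_compounds(claimed, best_example="A", limit=3))
```

With one record reserved for a Markush structure (`markush_records=1`), the same call has room for only two compounds: the best example and the one that adds the most new features.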
DCR STRUCTURE DRAWING CONVENTIONS
• Simple organic structures are drawn using the CAS rules
• Structures are represented as drawn in the document unless they contain a clear error (e.g. a 5-valent carbon atom)
• When drawing from a name, however, the following rules are applied
  – Keto-enol tautomers are drawn in the keto form
  – Sugars are drawn in the ring form
  – Cyclic imines are converted into the ene-amine form whenever tautomerism is possible
[Structure diagrams labelled "Leave as drawn" and "Preferred"]

SALTS
• Only one of each ion present is drawn, irrespective of how many are present
• Inorganic cations are always shown charged
• Organic cations which are produced by protonation of a base are shown as the uncharged base
• True onium cations (e.g. tetramethylammonium) are shown with the charge on the central atom
• Organic anions produced by deprotonation of an organic acid are always represented as the free acid
• Simple inorganic anions (e.g. nitrate, sulphate, chloride) are shown charged as long as the cation is shown charged
• Tetraorganyl borane type anions are always shown with the charge on the central atom

EXAMPLES OF SALTS
[Structure diagrams: ferric acetate; sodium 3-nitrobenzenesulphonate; trimethylammonium chloride; N-methylpyridinium chloride; lithium tetraphenyl borane; tetramethylammonium benzoate]

ORGANOMETALLIC STRUCTURES
• Metal-carbon σ-bonds are shown as single covalent bonds
• All other bonds between metals and organic moieties are shown disconnected
[Structure diagrams: n-propyl lithium; phenylmagnesium bromide; manganese acetylacetonate]

COMPLEX INORGANIC ANIONS
• Oxoanions of metals are shown with the metal and O atoms disconnected
  – Charges are shown if they can be worked out
  – If not, elements are shown zero-valent
• Silicates, borosilicates, heteropolyacid anions etc. are shown with each element listed and no bonding (all elements zero-valent)
[Structure diagrams: sodium aluminate; magnesium borosilicate]

THE DCR RECORD

MEANS OF SEARCHING THE DATABASE
• The DCR database can be searched using the following options
  – Chemical structure
  – Chemical name
  – Molecular formula
  – Elements present
  – Substance descriptor (only for those classes of structures that are hard to define in terms of a structural query, such as terpenes or alkaloids)

CHEMICAL NAME INFORMATION
• Each DCR record has a preferred chemical name
  – This would be the approved name for a drug
  – For other substances we use the name it is most commonly known by
  – We do not normally use the systematic name unless it is short (e.g. Phenol or Benzoic acid would be fine)
• The systematic name appears in a separate field
• Finally we include all the known synonyms we come across in patents for this substance
  – Includes brand names and trial prep codes for drugs
  – This means you can search by any known name and still find the record

MOLECULAR FORMULA INFORMATION
• Molecular formula in DCR is always presented as follows
  – Carbon atoms first, followed by hydrogen atoms, followed by any other elements listed in alphabetical order
  – In multi-component structures the formula for each component is listed separately

SEARCHING BY MOLECULAR FORMULA
• In the molecular formula field (/MF) you can search the exact formula
  – E.g. C6H6/MF
• In the element symbol field (/ELS) you can search for the presence of particular elements in the molecular formula
  – E.g. Cl/ELS AND Br/ELS AND O/ELS to find compounds containing Cl, Br and O

SUBSTANCE DESCRIPTORS
• Certain classes of substance have special keywords applied to them called substance descriptors
• These are applied to every substance that is of that type. In addition, a blank record called a substance descriptor record is indexed for any patent which refers to this class of substance without giving a specific example
• To search for these use the /SD field
  – E.g. metallocenes/SD
  – This will retrieve all DCR records to which the metallocenes substance descriptor has been applied, including the substance descriptor record

COMPLETE LIST OF SUBSTANCE DESCRIPTORS
• ALKALOIDS
• ALLOYS
• ANTHRACYCLINES
• ANTIBODIES
• BARBITURATES
• BENZODIAZEPINES
• BETA LACTAMS
• BORANES
• CARBOHYDRATES
  – glycoproteins
  – polysaccharides
  – cyclodextrins
• CARBORANES
• CROWN ETHERS
• CYCLIC PEPTIDES see PEPTIDES
• CYCLODEXTRINS see CARBOHYDRATES
• DENDRIMERS
• ENZYME see PROTEINS
• FATTY ACID see also UNSATURATED FATTY ACIDS
• FLAVONOIDS
• FULLERENES
• GLYCOPROTEINS see CARBOHYDRATES and PROTEINS
• HALOCARBONS
• HETEROFULLERENES
• HETEROPOLY ACIDS
• LIPOPROTEINS
• METALLOCENES
• NOBLE GASES
• NUCLEOSIDES
• NUCLEOTIDES
• OLIGONUCLEOTIDE
• OTHER NATURAL PRODUCTS
• PEPTIDES
  – cyclic peptides
• PHOSPHOLIPIDS
• POLYMERS
• POLYSACCHARIDES see CARBOHYDRATES
• PROSTAGLANDINS
• PROTEINS
  – enzymes
  – glycoproteins
• RETINOIDS
• SAPONINS
• SILICONES
• STEROIDS see SAPONINS
• TAXANES
• TERPENES
• TETRACYCLINES
• UNSATURATED FATTY ACIDS see also FATTY ACIDS
• ZEOLITES

TAKING THE SEARCH FURTHER
• Having searched to find all the substance descriptor records, you can refine your search in a number of ways
• Refine by elements present, e.g. find metallocenes containing zirconium as follows
  – Metallocenes/sd AND zr/els
• Refine the patent results you find by technology area
  – For example, look up all records related to terpenes, find the corresponding DWPI records and then search by antiinflammatory in the activity field to find terpenes used in antiinflammatory compositions

THANK YOU FOR LISTENING
• Any questions please contact
• [email protected]
• Tel +44 0207 433 4656
• www.thomsonscientific.com

DCR on new STN

Agenda
• Search fields in DCR
  – Explore database index fields
    • STN Classic users – analogous to EXPAND, EXPAND, EXPAND!
• Structure search
• Find DWPI records with DCR index terms
• Rename project

New STN search screen
• Query screen
• Results screen
• History screen

Select database of interest
• Query screen

Access search indices

Explore Search Indices or Thesaurus for Terms

Chemical Name vs. Chemical Name Segment indices
• Chemical Name (CN) = complete name
  – Phrase parsed
  – Alphabetical listing in index
• Chemical Name Segment (CNS)
  – Word parsed
• Searching by Chemical Name Segment may capture additional relevant records

Explore Search Indices or Thesaurus for Terms
• Note: All selected chemical names are added to the search query and are ORed together. This list is specific to this project only.

Submit search query; results are automatically displayed

Search by chemical name segment

Additional compounds found by using Chemical Name Segment

Search by molecular formula

Naming the Q term list

Submit Q term list search; results automatically displayed
• These first compounds have the correct molecular formula, but are not pantoprazole derivatives.
• Note: Unlike the L1 list, which is only available for this project, Q31 is available for all projects associated with this STN ID.

Structure search
• Valency is checked automatically

Submit sub-structure search
• Notice the highlighting illustrating the match with the drawn structure.

Get complete DCR record by clicking on DCR number

Complete DCR record
• Click on X to close the detailed DCR record and return to the complete set.
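The molecular formula convention and the /MF and /ELS field syntax described earlier can be combined in a short sketch that formats a formula and builds query strings of the kind shown on the slides. The function names are hypothetical, the strings are only assembled and never sent to STN, and the ordering used when a substance contains no carbon is an assumption because the slides do not cover that case.

```python
def dcr_formula(counts):
    """Format element counts as carbon, then hydrogen, then the rest A to Z.

    counts: dict mapping an element symbol to its atom count.
    Assumption: if carbon or hydrogen is absent it is simply skipped.
    """
    order = [el for el in ("C", "H") if el in counts]
    order += sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] != 1 else "") for el in order)


def component_formulas(components):
    """Multi-component substances: one formula per component, listed separately."""
    return [dcr_formula(c) for c in components]


def mf_query(formula):
    """Exact formula search in the molecular formula field."""
    return f"{formula}/MF"


def els_query(elements):
    """Search for records whose formula contains all of the given elements."""
    return " AND ".join(f"{el}/ELS" for el in elements)


# Diclofenac, given in arbitrary element order.
diclofenac = {"O": 2, "N": 1, "Cl": 2, "H": 11, "C": 14}
print(dcr_formula(diclofenac))                   # C14H11Cl2NO2
print(mf_query(dcr_formula({"C": 6, "H": 6})))   # C6H6/MF
print(els_query(["Cl", "Br", "O"]))              # Cl/ELS AND Br/ELS AND O/ELS
```

As the pantoprazole example notes, an exact formula match also retrieves isomers and unrelated compounds with the same formula, so a formula query is usually a starting point to be narrowed by name or structure.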
Get DWPI records with DCR compounds indexed

DWPI records that have DCR compounds indexed
• Clicking on the Get References button automatically opens DWPI and executes a refx search.
• Click on the DWPI record title to display the complete DWPI record.

Complete DWPI record
• The complete DWPI record includes DCR hit structures in a separate field (DCR Hits). Notice the highlighting in the structure corresponding to the original structure search.
• All DCRs indexed for this record are in the IT field (not shown).

Search for DWPI records with selected DCRs indexed
• If only certain DCRs are of interest, check the boxes for those DCR records and then click on Get References.
• Complete DWPI record for selected DCRs. Click on the DWPI title to display the complete record.

Rename Project
• Click on the upside-down triangle to rename the project.

Summary
• Search fields in DCR
  – Explore database index fields
    • STN Classic users – analogous to EXPAND, EXPAND, EXPAND!
  – Search examples using database index fields
• Structure search
• Find DWPI records with DCR index terms
• Rename project

For more information …
• CAS: [email protected] Support and Training: www.cas.org
• FIZ Karlsruhe: [email protected] Support and Training: www.stn-international.de
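The difference between the phrase-parsed Chemical Name (CN) index and the word-parsed Chemical Name Segment (CNS) index, covered under the search fields, can be illustrated with two small in-memory indexes. This is a sketch of the idea only. The record identifiers and names are hypothetical, and splitting segments on non-alphanumeric characters is an assumption about how a word-parsed index behaves, not a description of STN's actual parsing rules.

```python
import re
from collections import defaultdict


def build_indices(records):
    """Build a phrase index (whole names) and a segment index (single words).

    records: dict mapping a record identifier to a list of chemical names.
    """
    cn_index = defaultdict(set)   # complete name -> record identifiers
    cns_index = defaultdict(set)  # name segment  -> record identifiers
    for rec_id, names in records.items():
        for name in names:
            key = name.upper()
            cn_index[key].add(rec_id)
            for segment in re.findall(r"[A-Z0-9]+", key):
                cns_index[segment].add(rec_id)
    return cn_index, cns_index


# Hypothetical records; real DCR numbers and names will differ.
records = {
    "REC-1": ["Pantoprazole"],
    "REC-2": ["Pantoprazole sodium"],
    "REC-3": ["S-Pantoprazole magnesium"],
}
cn_index, cns_index = build_indices(records)

# Browsing the phrase index alphabetically, in the spirit of EXPAND.
for name in sorted(cn_index):
    print(name)

print(sorted(cn_index["PANTOPRAZOLE"]))   # exact complete-name match only
print(sorted(cns_index["PANTOPRAZOLE"]))  # every name containing the segment
```

In this toy data the complete-name lookup returns one record while the segment lookup returns all three, which is the sense in which a Chemical Name Segment search may capture additional relevant records.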