Dr. Louisa Bellis (June 2011)
Small Molecules in Bioinformatics – Exercises
You will need access to ChEMBL and ChEBI online at
https://www.ebi.ac.uk/chembldb and http://www.ebi.ac.uk/chebi/ to complete these
exercises.
ChEMBL:
1. Draw in the above structure using the drawing tool found at
https://www.ebi.ac.uk/chembldb/index.php/compound and run a substructure
search.
• Q. How many bioactivity records do you get back for this structure?
• Q. How many would you have had returned if you filtered on ‘Inhibition (%)’?
• Q. How many Log KI values are there?
• Q. How many bioactivity data points are there for the target P11229?
• Q. What are the lowest and highest inhibition values for P11229?
• Q. How many compounds are brought back if you do a keyword search for
‘dopamine’?
• Q. What is the highest bioactivity value for this results set (hint: sort the columns)
and what was the bioactivity type?
• Q. What compound is it for?
2.
3.
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License. To view a copy
of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543
Howard Street, 5th Floor, San Francisco, California, 94105, USA.
4. Using the ‘Browse Targets’ pie charts, look for tyrosine kinases (Tyr) and click on
the hyperlink to access the bioactivity data (hint: they are protein kinases).
Remember to click on the [icon] to navigate the tree.
• Q. What protein kinase and organism combinations only have 1 compound data
point associated with them?
• Q. For the top result, i.e. the target with the most compounds, what type of
bioactivity assay has the most data points and what is the highest result for this
assay (hint: click on the hyperlink for the target)?
• Q. Does this compound, with the highest result, have any Rule of 5 violations?
• Q. Use its structure to run a substructure search and check if it still has the
highest bioactivity assay result in the list.
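To make the Rule of 5 question concrete, here is a minimal sketch of how violations are usually counted. It assumes the common form of the rule (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The class name, function name and property values are hypothetical and are not taken from ChEMBL, which may calculate its own value differently.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Calculated properties for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four Rule of 5 limits the compound exceeds."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Made-up values for illustration only, not a real ChEMBL compound.
example = Properties(mol_weight=612.4, logp=5.8, h_bond_donors=2, h_bond_acceptors=9)
print(rule_of_five_violations(example))  # 2 (weight and logP exceed their limits)
```

The count can then be compared with the value shown on the compound's page in ChEMBL.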
5. Click on the ‘Browse Targets’ radio button labelled Taxonomy Tree. Look for
Staphylococcus (hint: it is a Gram-positive bacterium).
• Q. What are the UniProt accession codes for the compounds with only 1
bioactivity data point?
• Q. For the target with the most compounds, how many MIC data points are
there?
• Q. What are the highest and lowest molecular weights for the compounds
associated with this target?
6. (Optional) Draw in the following structure
• Q. How many compounds in ChEMBL have this substructure?
• Q. Use the synonyms to order the results grid and find the ChEMBL ID for
amoxicillin.
• Q. Using amoxicillin as a substructure search, are there any mixtures with
amoxicillin?
7.
• Q. What target gives the highest IC50 value for doxorubicin?
• Q. How many ADMET data points are there for this target?
• Q. Which of these ADMET assays has the most data points?
• Q. How many compounds are linked to the keyword of aspirin?
• Q. Do they all contain aspirin?
• Q. What is the ChEMBL ID of aspirin?
• Q. How many compounds have a 70% or more similarity to aspirin?
• Q. How many compounds have a substructure of aspirin?
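The "70% or more similarity" cut-off can be illustrated with a small sketch. The exercise does not say which similarity measure ChEMBL uses, so this assumes the Tanimoto coefficient over fingerprint bits, a common choice for structure similarity. The bit sets below are invented and do not correspond to aspirin or to any real fingerprint.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints, written as sets of "on" bit positions.
query_bits = {1, 4, 7, 9, 12, 15, 20}
close_candidate = {1, 4, 7, 9, 12, 15, 33}   # shares 6 of 8 distinct bits
distant_candidate = {1, 4, 7, 40, 41, 42}    # shares 3 of 10 distinct bits

THRESHOLD = 0.70  # the 70% cut-off used in the exercise

for label, bits in [("close", close_candidate), ("distant", distant_candidate)]:
    score = tanimoto(query_bits, bits)
    print(label, round(score, 2), score >= THRESHOLD)
```

Under these assumptions the first candidate scores 0.75 and would be returned, while the second scores 0.3 and would not.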
8.
9.
(a)
(b)
• Q. In a keyword search, how many compounds contain the word ‘glucose’?
• Q. How many bioactivities does this relate to?
• Q. How many of these have RO5 violations?
• (Optional) How many compounds have the substructure of (a)?
• (Optional) How many compounds have the substructure of (b)?
• (Optional) How many compounds have a 70% or more similarity to (a)?
• Find all the compounds in ChEMBLdb that are active against the ‘Stem cell
growth factor receptor’ and download them as an SDFile. You will need to search
for the target first, and use the hyperlink to access the ‘Target Report Card’.
• For the same target, choose all the data for ‘inhibition’. How many data points are
there? What are the highest and lowest % values?
10.
11.
• Browse the target classification in ‘Browse Targets’. What are the different types
of peptide membrane receptor targets? Can any be sub-classified?
• Try out the protein target FASTA sequence search with your favourite protein
sequence.
ChEBI:
1. Go to ChEBI entry CHEBI:3647.

What is it?

What are some of the alternative names you might find referring to this entity
within text?

Is this chemical used as a drug?

If so, what is it used for, and what brand names might it be found under?
2. Go to ChEBI entry CHEBI:45783.

What is it?

What can it be used for?

View the additional structure for this entity (CHEBI:45783). What does it show
that you cannot see in the default structure?
3. Open the entry for GTP (CHEBI:15996). Click on the “Automatic Xrefs” tab just below
the header. Scroll down to the sub section “Reactions and Pathways”. Click on the
Reactome identifier, REACT_1719. This will take you to the Reactome entry for this
reaction.

Can you name the enzyme which catalyses this reaction?
4. Find all the entities that contain a reference to the term “phenol”. How many entities
are retrieved?
5. Find all the entities with molecular formula C10H18O.
a. Export the results as an SDfile.
6. Find all the entities that have formula C6H6O4 which are not acids.
7. Find all the entities where the IUPAC name contains the string “pyrimidin”. (Hint: use
wildcards.)
a. Export the results as a text file.
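As an illustration of the wildcard hint, here is a small sketch of "contains" matching with a leading and trailing `*`. It uses Python's standard `fnmatch` module, which is only an analogy: the exact wildcard rules and case handling of the ChEBI search form are not described in this exercise. The helper name and the sample name strings are assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase


def matches(name: str, pattern: str) -> bool:
    """Case-insensitive wildcard match, where * stands for any run of characters."""
    return fnmatchcase(name.lower(), pattern.lower())


# Sample name strings for illustration only.
sample_names = [
    "pyrimidine-2,4(1H,3H)-dione",
    "4-aminopyrimidin-2(1H)-one",
    "7H-purin-6-amine",
]

# "*pyrimidin*" matches the string anywhere in the name;
# "pyrimidin*" would only match names that start with it.
anywhere = [n for n in sample_names if matches(n, "*pyrimidin*")]
prefix_only = [n for n in sample_names if matches(n, "pyrimidin*")]

print(anywhere)
print(prefix_only)
```

In this sketch the pattern with wildcards on both sides finds two of the sample names, while the prefix-only pattern finds one.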
8. Find all the entities that contain phenol as a substructure:
Which entity is the first result?
9. Find all the entities that have a similar structure to paracetamol:

Excluding paracetamol itself from the result set, can you find a highly similar
result that is also used as a drug?

What is the INN for this drug?
10. Find all the entities that have molecular formula C6H6O4 and contain the substructure
.

How many entities are retrieved?
11. Dichlorvos (CHEBI:34690) is a well-known insecticide. Open the ChEBI entry for
dichlorvos (CHEBI:34690). Scroll down the entry to the “ChEBI ontology”.

Can you determine whether it can also be used as a fungicide?
12. On the same entry as above (CHEBI:34690) click on the “Tree View” to display the
entire ontology tree. Scroll down the tree to the “application” sub-ontology
(CHEBI:33232). Follow the tree path from dichlorvos to its parent organophosphate
insecticide (CHEBI:25708). Click on this parent organophosphate insecticide
(CHEBI:25708). This brings you to the ontology view of the parent.

From looking at the children of this entry, can you write down any other
insecticides?
13. Download the ChEBI database in flat file format. Open the names.tsv file. Search for
the name ‘Progesterone’ (match case). Can you find the primary ChEBI accession
that this name is linked to? (Hint: You will need to download compounds.tsv also).
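The two-file lookup in this question can be sketched in a few lines of Python. The column names used here (`NAME` and `COMPOUND_ID` in names.tsv; `ID`, `PARENT_ID` and `CHEBI_ACCESSION` in compounds.tsv) and the use of `null` for an empty parent are assumptions about the flat-file layout, so check them against the header lines of the files you download. The sample rows and IDs are made up so that the answer to the exercise is not given away.

```python
import csv
import io


def load_tsv(handle):
    """Read a tab-separated file with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def primary_accessions(names_rows, compound_rows, name):
    """Return primary accessions for an exact, case-sensitive name match."""
    by_id = {row["ID"]: row for row in compound_rows}
    found = set()
    for row in names_rows:
        if row["NAME"] != name:          # "match case": exact comparison
            continue
        compound = by_id.get(row["COMPOUND_ID"])
        seen = set()
        # Assumption: a secondary entry points at its primary via PARENT_ID.
        while compound is not None and compound["PARENT_ID"] not in ("", "null"):
            if compound["ID"] in seen:   # guard against a malformed loop
                break
            seen.add(compound["ID"])
            compound = by_id.get(compound["PARENT_ID"])
        if compound is not None:
            found.add(compound["CHEBI_ACCESSION"])
    return sorted(found)


# Made-up sample data standing in for names.tsv and compounds.tsv.
NAMES_TSV = (
    "ID\tCOMPOUND_ID\tTYPE\tSOURCE\tNAME\n"
    "1\t900002\tSYNONYM\tExample\tExamplone\n"
    "2\t900003\tSYNONYM\tExample\texamplone\n"
)
COMPOUNDS_TSV = (
    "ID\tSTATUS\tCHEBI_ACCESSION\tSOURCE\tPARENT_ID\tNAME\n"
    "900001\tC\tCHEBI:900001\tExample\tnull\texamplone\n"
    "900002\tC\tCHEBI:900002\tExample\t900001\tnull\n"
    "900003\tC\tCHEBI:900003\tExample\tnull\tother\n"
)

names_rows = load_tsv(io.StringIO(NAMES_TSV))
compound_rows = load_tsv(io.StringIO(COMPOUNDS_TSV))
result = primary_accessions(names_rows, compound_rows, "Examplone")
print(result)
```

With the real files, replace the `io.StringIO` objects with `open("names.tsv", newline="")` and `open("compounds.tsv", newline="")`, adding an `encoding` argument if the default does not suit the files.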
14. Go to the web service test facility available at
http://www.ebi.ac.uk/chebi/webServices.do. Invoke getLiteEntity to search in
category ‘CHEBI NAME’ with the search string ‘*progesterone’. Name some of the
entities that are returned.
15. One of the entities returned in Question 14 is 17alpha-hydroxyprogesterone. Pass
the ChEBI ID for this entity as a parameter to the getCompleteEntity method available
on the web service test page. By looking at the result, can you name some of the
synonyms of this entity?
16. The complete entity contains the ontology relationships. What ontology relationships
are linked to the entity examined in question 15?
17. By repeated execution of the getOntologyParents method in the web service test
facility, trace a path from the entity examined in question 14
(17alpha-hydroxyprogesterone) and determine whether it is a steroid. The path traced
should follow the "is a" relationship. To do this, pass the ChEBI ID to the
getOntologyParents method, examine the result, and select one of the ChEBI IDs
from the resulting relationships to pass to getOntologyParents again.
Ignore cyclic relationships. List the names of the entities and the names of their
relationships along the path you followed.
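The manual procedure in question 17 is a graph search, and a minimal sketch of it follows. The `get_parents` argument is a hypothetical stand-in for whatever you use to call getOntologyParents: it is assumed to return (relationship, parent ID) pairs, which is not necessarily the shape of the real web service response. The toy ontology and its IDs are invented.

```python
from collections import deque


def trace_is_a_path(start, target, get_parents):
    """Breadth-first search along "is a" links; returns the path or None."""
    queue = deque([[start]])
    visited = {start}                      # visited IDs, so cycles are ignored
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for relation, parent in get_parents(path[-1]):
            if relation == "is a" and parent not in visited:
                visited.add(parent)
                queue.append(path + [parent])
    return None


# Invented toy ontology: entity ID -> list of (relationship, parent ID).
TOY_ONTOLOGY = {
    "X:1": [("is a", "X:2"), ("has role", "X:9")],
    "X:2": [("is a", "X:3"), ("is a", "X:1")],   # second link is a cycle
    "X:3": [("is a", "X:4")],
    "X:4": [],
    "X:9": [("is a", "X:4")],
}


def toy_parents(entity_id):
    """Hypothetical replacement for a getOntologyParents call."""
    return TOY_ONTOLOGY.get(entity_id, [])


print(trace_is_a_path("X:1", "X:4", toy_parents))  # ['X:1', 'X:2', 'X:3', 'X:4']
print(trace_is_a_path("X:1", "X:9", toy_parents))  # None: only a "has role" link
```

The visited set is what implements "ignore cyclic relationships": an ID already seen is never queued again.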