CTEGD Symposium, UGA, Athens, May 2011
EuPathDB: an integrated resource and tool for eukaryotic pathogen bioinformatics
Aurrecoechea C., Heiges M., Warrenfeltz S. for the EuPathDB team
CTEGD, University of Georgia, Athens, GA USA

ABSTRACT: EuPathDB (http://eupathdb.org) is an integrated bioinformatics database covering several eukaryotic pathogens. Genera represented are Cryptosporidium, Encephalitozoon, Entamoeba, Enterocytozoon, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma, and the newly added Theileria and Babesia. Each of these groups is supported by a taxon-specific database and web interface which can be accessed independently of EuPathDB. EuPathDB provides a portal to all these databases, and the opportunity to leverage orthology for searches across genera. The databases are updated and expanded about every 2 months, providing online access to the latest genomic-scale datasets including complete genome sequences, annotations, and functional genomics data such as proteomics, microarray, RNA-Seq, ChIP-chip, SAGE and EST data. The specific advantage of the EuPathDB databases lies in the graphical search interface that allows users to combine datasets while building a search strategy. Multistep search strategies are built one step at a time, choosing from more than 100 searches. The latest EuPathDB release debuts a search for DNA motifs and a method of combining searches based on relative genomic location. This new operation allows the results of successive steps to be combined based on each feature’s location relative to other features. Parameters defining upstream/downstream distances and gene overlap restrict the search results in a way that highlights biologically relevant relationships such as antisense transcription and promoter sharing. The merger of EuPathDB’s user-friendly search strategy system with full and up-to-date databases offers researchers a powerful tool for data mining during computational experiments.
● Quick access to ID and text search options, login, contact, twitter, etc.
● Main Header Tab Bar: mouse-over ‘New Search’ to initiate searches; click ‘My Strategies’ to enter your workspace.
● Portal to EuPathDB databases by clicking on icons.
● Initiate searches from center panels. Over 100 search types available.
● Identify Genes by: look for genes based on a variety of datasets, including whole genome sequence, coding vs non-coding genes, transcript evidence (microarray, EST), exon count, etc.
● Identify Other Data Types: look for ESTs, SNPs or DNA motifs.
● Tools: access tools like Blast and PubMed from any EuPathDB home page.

Taxon-specific databases provide access to the latest available genome-scale datasets. Because they are built with the same web architecture, search types and functions are the same across all databases. Species covered:
E. dispar, E. histolytica, E. invadens
C. hominis, C. muris, C. parvum
G. lamblia, G. assemblage_B, G. assemblage_E
E. cuniculi, E. intestinalis, E. bieneusi, N. parisii, O. bayeri
B. bovis, T. annulata, T. parva
P. berghei, P. chabaudi, P. falciparum, P. gallinaceum, P. knowlesi, P. vivax, P. yoelii
N. caninum, T. gondii
T. vaginalis
C. fasciculata, L. braziliensis, L. infantum, L. major, L. mexicana, L. tarentolae, T. brucei, T. congolense, T. cruzi, T. vivax

Searches return a list of IDs (genes, ESTs, SNPs, proteins) that satisfy the conditions of your query parameters. This gene search for protein coding genes in P. falciparum returned 5418 gene IDs.
● Graphical representation of your search strategy. Each step can be revised by clicking on the step name.
● Filter table showing the distribution of gene IDs across all species in the database.
● Results table with ID as the first column. Columns can be added, changed, deleted or sorted. The entire table can be downloaded as Excel or other formats.
● Click on the ID name to access details in that ID’s record page.

Building search strategies:
The graphical search interface motivates users to prioritize search results based on a variety of data types. The search strategy system provides the opportunity to explore and identify biologically meaningful relationships.
1. Run a query, choosing from more than 100 searches. Build strategies for several data types: genes, ESTs, SNPs, ORFs, etc.
2. Add a step: run a second query, combining its results with previous searches. Query the results of Step 1 based on functional genomics. Nest strategies to build complexity.
3. Add more steps…

New Search Type: DNA Motif Pattern
1 Search for DNA motifs such as restriction enzyme sites or transcription factor binding sites. Choose Genomic segments, DNA Motif Pattern.
2 Initiate the search. It will find all occurrences of GAATTC in the genome.
3 The search generates a step, and the results below it list the genomic segment IDs corresponding to the locations of the EcoRI site: one segment ID for each occurrence of GAATTC in the genome.

New way of combining searches based on relative genomic location:
1 Run a search. This search for all protein coding genes in P. falciparum returned 5418 genes.
2 Add a step. The second search here, based on DNA motif, searches for the EcoRI restriction enzyme site.
3 Combine search results using the co-location function.
4 Carefully consider the 5 user-defined parameters in the logic statement of the co-location function:
i. Return IDs from either step 1 or 2.
ii. Define relative location (“Region”) of the returned data type. Search the exact region, upstream, or downstream of the returned data type.
iii. Define relationship between step 1 and 2 results’ regions: contains, overlaps, or is contained in.
iv. Define relative location (“Region”) of the other (non-returned) step result.
v. Define strand to be considered in the operation: either, same or both.
5 View results. The results table lists 214 IDs of genes whose upstream 500bp region contains the EcoRI site. The column ‘Matched Regions’ defines the genomic location of the EcoRI site within the gene.
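
The motif search and the co-location step described above can be illustrated outside the website with a minimal Python sketch. It is not EuPathDB's implementation: the names `Gene`, `find_motif`, `upstream_region`, `relate` and `colocate`, the toy sequence and the toy gene coordinates are all hypothetical. The sketch treats the motif as a literal string, assumes 1-based inclusive coordinates, and ignores the strand parameter for the motif itself (GAATTC is its own reverse complement, so every site is present on both strands).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def find_motif(sequence, motif):
    """Return 1-based (start, end) spans of every occurrence of a literal motif."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [(m.start() + 1, m.start() + len(motif))
            for m in pattern.finditer(sequence.upper())]


def upstream_region(gene, distance, seq_len):
    """Return the span `distance` bp upstream of a gene, or None if it is empty."""
    if gene.strand == "+":
        span = (max(1, gene.start - distance), gene.start - 1)
    else:
        # Upstream of a minus-strand gene lies at higher coordinates.
        span = (gene.end + 1, min(seq_len, gene.end + distance))
    return span if span[0] <= span[1] else None


def relate(a, b, relation):
    """Test whether span a 'contains', 'overlaps' or 'is contained in' span b."""
    if relation == "contains":
        return a[0] <= b[0] and b[1] <= a[1]
    if relation == "is contained in":
        return b[0] <= a[0] and a[1] <= b[1]
    if relation == "overlaps":
        return a[0] <= b[1] and b[0] <= a[1]
    raise ValueError("unknown relation: " + relation)


def colocate(genes, sites, distance, seq_len, relation="contains"):
    """Return IDs of genes whose upstream region relates to at least one site."""
    hits = []
    for gene in genes:
        region = upstream_region(gene, distance, seq_len)
        if region and any(relate(region, site, relation) for site in sites):
            hits.append(gene.gene_id)
    return hits


# Toy data (hypothetical): a 132 bp sequence with two EcoRI sites.
GENOME = "A" * 20 + "GAATTC" + "A" * 30 + "C" * 40 + "A" * 10 + "GAATTC" + "A" * 20
GENES = [
    Gene("geneA", 40, 90, "+"),
    Gene("geneB", 60, 100, "-"),
    Gene("geneC", 60, 100, "+"),
]

SITES = find_motif(GENOME, "GAATTC")
HITS = colocate(GENES, SITES, distance=30, seq_len=len(GENOME))
print(SITES)
print(HITS)
```

In this sketch, `distance` plays the role of the upstream distance (500 bp in the poster example) and `relation` corresponds to the "contains, overlaps, or is contained in" choice; returning gene IDs rather than motif IDs corresponds to choosing which step's IDs are returned.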