Navigating data at the
Saccharomyces Genome Database
SGD:
[email protected]
Rob Nash, Senior Biocuration Scientist
[email protected]
July 2015 CSHL
Outline
• History and background
• How to stay current
• Basic org. (homepage, search, LSP)
• Tabs, access to detailed info (sequence, gene ontology, phenotype, interaction, expression and regulation)
• Data analysis: GO tools, YeastMine basics and
use-cases
About SGD
• Totally public, open, non-profit academic group
• Funded by the NIH (NHGRI)
• Mike Cherry at Stanford is the P.I. (since 1992). Most
of SGD is housed at Stanford, with a few remote
curators who work from home
Key early decisions
• People who understand the biology (Ph.D. biologists)
are required to design the database, summarize the
literature, etc.
• Full-time staff positions are needed for project stability.
• Our top priority is to serve the needs of the research
community (yeast and other), so communication with
users is critically important.
SGD Today
• Over 1.7 million visits from unique IP addresses over the past year; 175,000 page views per week; worldwide usage
• About 15 full-time staff (curators, programmers, system and db admins)
“Other” represents 30 countries with more than 100 visits, and 49 additional countries with 10-100 visits.
SGD Staff, Cherry lab
Basic organization of information on the home page
• Search
• YeastMine
• YouTube tutorials
• New data and updates
• Research spotlight
• Upcoming meetings
• Analysis and seq. tools
• Functional information
• Literature
• Community
• Colleague Info.
• Gene registry
• Wiki
• Newsletter
Social Media:
• Facebook
• Twitter
• LinkedIn
Elastic search with autocomplete
Gene names (ACT1) => Locus Summary page
Other terms (actin; “act1 *”) => Instant Search page
Some IDs direct: 5634, 25721128
Single quote (OR) vs double quotes (AND)
Modify your search
Autocomplete
(suggestions)
Instant search
(predictive results)
Next iteration to include facets!
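The routing rules on this slide can be sketched as a small function. This is only an illustration, not SGD's implementation: the function name, the page labels, the tiny gene-name set, the case-insensitive match, and the "all-digit query means direct ID" rule are assumptions (the slide says only that some IDs go direct).

```python
# Hypothetical sketch of the routing described on the slide; not SGD's code.
# KNOWN_GENE_NAMES is a tiny illustrative subset, not a real gene list.
KNOWN_GENE_NAMES = {"ACT1", "CDC28", "YRR1"}


def route_query(query: str) -> str:
    """Return a label for the page a query would land on."""
    text = query.strip()
    # Assumption: gene names are matched case-insensitively.
    if text.upper() in KNOWN_GENE_NAMES:
        return "locus_summary"
    # Assumption: any all-digit query is treated as a direct ID lookup.
    if text.isdigit():
        return "direct_id"
    # Everything else falls through to the instant search page.
    return "instant_search"


if __name__ == "__main__":
    for q in ["ACT1", "actin", "act1 *", "5634"]:
        print(q, "->", route_query(q))
```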
Website redesign: staying
current and modern
• To store new data and leverage new web development
tools, SGD was completely overhauled.
• Restructured pages, data transfer methods, and
underlying database schema, all done while keeping the
site live and actively curated. The goal was to make the
website faster and easier to maintain.
• New visualization methods, and a responsive layout.
Locus Summary Page
Responsive layout: better for all devices
Organization:
• moved seq. info up + improved graphics
• some basic protein info.
• regulation summary
• Improved expression histogram
Navigation has changed:
• Sectional nav. bar with back to top
• tabs and details link
• New tabs for seq. and locus history
What’s behind the tabs?
Sequence details
• S288C overview
– map
– subfeatures, with coordinates
– sequence (genomic, coding and protein)
• Alternative reference strains
– map
– subfeatures, with coordinates
– sequence (genomic, coding and protein)
• Other strains
Alternative ref strains
Other ref strains
Sequence tools
• BLASTN, BLASTP
• BLASTN vs fungi, BLASTP vs fungi
• Strain alignment (YRR1)
• Variant viewer (new)
Variant viewer
Access from:
1) Sequence (home page navigation bar) -> Strain and species
2) Analyze sequence section of LSP, and
3) resources section of sequence tab
Protein details
• Overview
• Domains table, and location graphic
• Shared domains diagram
• Post-translational modifications
• Physico-chemical properties
• External IDs
• Resources
The Gene Ontology (GO) Project
A collaboration among model organism databases, initiated
in 1998 by a consortium of researchers from FlyBase, SGD,
and MGD, to improve queries within and across databases.
The problem across databases: “Biologists would rather share
their toothbrush than share a gene name. Gene nomenclature
is beyond redemption” - Michael Ashburner
Neither genetic names nor common names are consistently used
CDC25 (S. cerevisiae) = Son of Sevenless (D. melanogaster) = SOS1 (H. sapiens)
fructose-bisphosphate aldolase = 1,6-diphosphofructose aldolase = D-fructose-1,6-bisphosphate D-glyceraldehyde-3-phosphate-lyase = diphosphofructose aldolase = fructoaldolase = fructose 1,6-diphosphate aldolase = fructose 1-monophosphate aldolase = fructose 1-phosphate aldolase = fructose diphosphate aldolase = fructose-1,6-bisphosphate triosephosphate-lyase = ketose 1-phosphate aldolase = phosphofructoaldolase = zymohexase
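To show why one agreed term helps, here is a minimal sketch that maps a few of the synonyms above onto a single name. Treating "fructose-bisphosphate aldolase" as the canonical form is an assumption made for illustration, and the helper names are hypothetical.

```python
# Illustrative only: the canonical choice and helper names are assumptions.
CANONICAL = "fructose-bisphosphate aldolase"
SYNONYMS = {
    "zymohexase",
    "fructoaldolase",
    "phosphofructoaldolase",
    "diphosphofructose aldolase",
}


def build_lookup(canonical, synonyms):
    """Map every known spelling (lower-cased) to the canonical name."""
    lookup = {canonical.lower(): canonical}
    for name in synonyms:
        lookup[name.lower()] = canonical
    return lookup


def normalize(name, lookup):
    """Return the canonical name, or None if the name is not recognized."""
    return lookup.get(name.strip().lower())


LOOKUP = build_lookup(CANONICAL, SYNONYMS)

if __name__ == "__main__":
    for query in ["Zymohexase", "fructoaldolase", "unknown enzyme"]:
        print(query, "->", normalize(query, LOOKUP))
```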
The solution: GO, a set of three independent
structured, controlled vocabularies for describing
the molecular function, biological process, and
cellular component of gene products
Molecular function: the tasks performed by individual gene
products, for example, fructose-bisphosphate aldolase activity
or protein serine/threonine kinase activity.
Biological process: the broad biological goals, such as mitosis or
DNA replication, that are accomplished by ordered assemblies
of molecular functions.
Cellular component: subcellular structures, locations, and
macromolecular complexes, such as nucleus, cellular bud tip, and
origin recognition complex.
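The three vocabularies can be pictured as three independent aspects under which a gene product is annotated. The sketch below is a simplified model, not the GO data format: the class and field names are assumptions, and "GENE_X" is a placeholder whose term combination is illustrative rather than a real annotation.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    MOLECULAR_FUNCTION = "molecular function"
    BIOLOGICAL_PROCESS = "biological process"
    CELLULAR_COMPONENT = "cellular component"


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # placeholder identifier, not a real gene
    aspect: Aspect
    term: str


def group_by_aspect(annotations):
    """Collect term names under each aspect, preserving input order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.term)
    return dict(grouped)


# Terms are taken from the slide; their pairing with GENE_X is made up.
EXAMPLE = [
    Annotation("GENE_X", Aspect.MOLECULAR_FUNCTION,
               "protein serine/threonine kinase activity"),
    Annotation("GENE_X", Aspect.BIOLOGICAL_PROCESS, "DNA replication"),
    Annotation("GENE_X", Aspect.CELLULAR_COMPONENT, "nucleus"),
]

if __name__ == "__main__":
    for aspect, terms in group_by_aspect(EXAMPLE).items():
        print(aspect.value, "->", terms)
```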
GO Annotation Details
GO Summary
Biological Process
Molecular Function
Cellular Component
Phenotype details
Browsable list of all phenotypes
Use SGD search to locate observables and ALL text
Interaction details
Operations
• sort
• filter
• analyze
Expression details
SPELL expression tool
See expression of an individual
gene in selected dataset(s)
Enter a set of genes and find
genes with similar expression
profiles (optional filtering by tags)
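The idea of finding genes with similar expression profiles can be sketched with a plain Pearson correlation against the average profile of the query set. This is a simplification chosen for illustration and is not claimed to be the method SPELL uses; the gene names and values are made-up toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_similar(profiles, query_genes):
    """Rank non-query genes by correlation with the mean query profile."""
    n_conditions = len(next(iter(profiles.values())))
    centroid = [
        sum(profiles[g][i] for g in query_genes) / len(query_genes)
        for i in range(n_conditions)
    ]
    scores = [
        (gene, pearson(values, centroid))
        for gene, values in profiles.items()
        if gene not in query_genes
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for this sketch: one value per condition.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [2.0, 2.0, 2.0, 2.0],  # flat profile
}

if __name__ == "__main__":
    for gene, score in rank_similar(PROFILES, ["geneA"]):
        print(f"{gene}: {score:.3f}")
```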
Regulation details
Overview
Domains/classifications
Targets
Shared GO for targets
Regulators
Biochemical Pathways
GBrowse
Navigation:
* landmark
* scrolling
* zooming
Selecting:
* tracks
* subtracks
Navigation:
• Region (chrVI:48,978..58,977), gene name (CDC28), keyword
(invasive growth)
• Highlighted rectangle in overview is region of genome displayed in
detail panel
• Region panel displays a portion of the genome surrounding the region
of interest
• Detail panel displays zoomed in view that corresponds to the overview
selection rectangle
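A region string such as the one above can be split into a chromosome name and numeric coordinates. The sketch below assumes the `name:start..end` shape with optional thousands separators shown on the slide; the function and pattern names are hypothetical, and other forms a genome browser may accept are not handled.

```python
import re

# Assumed shape: <chromosome>:<start>..<end>, commas allowed in the numbers.
REGION_PATTERN = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)\.\.(?P<end>[\d,]+)$"
)


def parse_region(region):
    """Return (chromosome, start, end) with coordinates as integers."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in region: {region!r}")
    return match["chrom"], start, end


if __name__ == "__main__":
    chrom, start, end = parse_region("chrVI:48,978..58,977")
    # Both endpoints are counted, so the span is end - start + 1.
    print(chrom, start, end, end - start + 1)
```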
Select tracks:
• SGD Annotations: sequence features
• Chromatin structure: histone modifications, nucleosome org.
• Gene Structure: transcription start sites, 5’ and 3’ UTRs
• RNA expression: mRNA, ncRNA, cell cycle
• Replication and Recomb’n: meiotic recomb’n, origins of replication
• Transcription Regulation: txn factors, RNAPII, preinitiation factors
• Analysis: restriction sites
Data files for download
Search full text with Textpresso
Genome Snapshot: global questions about
the genome and its annotation status