Future Plans for the HGNC
Elspeth Bruford
Funding Sources
• Applied to NHGRI for renewal of current U41 funding
Submitted in cycle III (25.09.16)
– expect score Feb/March, advisory council May
– current funding ends 30.06.17; will apply for a no-cost extension
• Will be applying to the Wellcome Trust Biomedical Resources fund (current funding ends 31.08.17)
– preliminary application due 13.01.17
– full application due 03.04.17
• Should we consider applying anywhere else?
Future Funded Aims (2017-2022)
1. continue naming of human protein-coding genes, pseudogenes & RNA genes – largely maintenance for protein-coding genes, more focus on RNAs
2. continue reassignment of uninformative symbols based on functional data – bearing in mind the clinical aspect
3. coordinate gene naming across vertebrates – increase automation and the number of species
4. assign gene names within complex families across vertebrate species (olfactory receptors, cytochrome P450s) – including new families: GSTs, UGTs, and possibly others (zinc fingers? histones? immunoglobulins?)
Resource Project
• Aim 1: Naming novel protein-coding loci
Focus on novel protein-coding genes reported in the literature, annotated by GENCODE, and novel genes annotated on new alternative haplotypes.
• Aim 2: Naming pseudogenes
Focus on transcribed and unprocessed pseudogenes, as well as segregating/polymorphic pseudogenes and unitary pseudogenes.
• Aim 3: Naming long non-coding RNA genes
Name long non-coding RNA genes based on genomic location or on published (or prepublication) functional data. Prioritize published loci and those annotated by GENCODE and RefSeq.
• Aim 4: Naming small non-coding RNA genes
Name microRNAs, transfer RNAs, small nucleolar RNAs and ribosomal RNAs; investigate naming piRNA genes; create a “miscellaneous non-coding RNA” category for non-specific bioinformatically predicted genomic loci.
• Aim 5: Reassigning placeholder symbols based on novel data
Seek new functional data to enable updates to placeholder symbols by collaborating with EuropePMC, using bioinformatics tools and identifying new GO annotations.
• Aim 6: Improving human gene names for transfer to other species
Following community consultation, update human gene names to remove superfluous information and punctuation, aim to unify gene and protein names, and avoid using human phenotypes where possible.
• Aim 7: Naming genes in other vertebrate species
Further automate the naming of orthologs using a subset of HCOP data and the conversion rules formulated for chimp, initially for dog, cow and rhesus macaque, and improve tools for manual curation.
• Aim 8: Examining complex homology in chimp
Manually curate chimp gene naming for cases where two or fewer of the orthology resources agree.
• Aim 9: Naming CYP genes across vertebrates
Continue to name CYP genes in multiple vertebrate species and investigate novel mammalian CYP subfamilies.
• Aim 10: Naming OR genes across vertebrates
Expand naming to non-mammalian vertebrate OR repertoires, initially looking at Xenopus, Anolis, zebrafish, chicken and zebra finch.
• Aim 11: Increasing gene family resources
Curate more human genes into family sets based on shared characteristics, in consultation with specialist advisors when appropriate; continue to collaborate with FlyBase on their ‘Gene Groups’.
• Aim 12: Naming in other complex gene families
Manually curate gene families with complicated orthology relationships across vertebrate species and develop new synteny and BLAST filtering tools, beginning with the UGT and GST families.
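Aims 7 and 8 above imply a simple triage rule: name an ortholog automatically when enough orthology resources agree, and send it to a curator when two or fewer agree. The sketch below illustrates that rule only. The threshold of three agreeing resources is inferred from the "two or fewer" wording, the resource names and gene IDs are hypothetical, and nothing here reproduces HCOP's actual data model or the chimp conversion rules.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def classify_ortholog(predictions: Dict[str, Optional[str]],
                      min_agreement: int = 3) -> Tuple[str, Optional[str]]:
    """Triage one human gene's predicted ortholog in another species.

    `predictions` maps a (hypothetical) orthology resource name to the
    ortholog gene ID it predicts, or None if it makes no prediction.
    Returns (decision, gene_id) where decision is "auto", "manual"
    or "no_prediction".
    """
    # Count how many resources support each candidate ortholog
    votes = Counter(gene for gene in predictions.values() if gene is not None)
    if not votes:
        return ("no_prediction", None)
    best_gene, support = votes.most_common(1)[0]
    if support >= min_agreement:
        # Enough resources agree: candidate for automated naming
        return ("auto", best_gene)
    # Two or fewer resources agree: flag for manual curation
    return ("manual", best_gene)
```

For example, four resources of which three predict `g1` yield `("auto", "g1")`, whereas only two resources agreeing yields `("manual", "g1")`. Keeping the threshold as a parameter makes it easy to tighten or relax the rule per species.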
Resource Informatics
• Aim 1: Updating internal HGNC curation tools
Reimplement internal tools as AngularJS web applications and migrate to a virtual machine.
• Aim 2: Updating internal HGNC QC tools
Expand tools, including an “end of day” sanity check; rewrite the internal sequence search and alignment tool using EMBL-EBI RESTful web services.
• Aim 3: Collaborating with EuropePMC
EuropePMC will notify us of publications relating to placeholder symbols, and of journals to target.
• Aim 4: Maintaining and updating HCOP
Expand with the addition of new species, initially sheep, gorilla and S. pombe; investigate further orthology sources.
• Aim 5: Maintaining and updating the VGNC database and pipeline
Expand to include data from other species, beginning with cow, dog & macaque; increase utility by incorporating more external cross-references; expand the set of tools and views available on the website.
• Aim 6: Updating internal VGNC curation and QC tools
Create AngularJS web applications for curating individual gene symbols & gene families, and a synteny tool for curating orthologs in multiple vertebrate species in a single process.
• Aim 7: Updating the HGNC database and release pipeline
Move from the PostgreSQL schema to a fully normalised MySQL schema; reimplement the update pipeline to streamline the processes and utilise the extensive compute farm at EMBL-EBI.
• Aim 8: Soliciting user input
Encourage feedback via our websites; utilise data from the annual survey, the “contact us” form, web statistics and a panel of users; continue to attend and participate in a range of conferences and workshops.
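Aim 2 above mentions an “end of day” sanity check without describing its contents. As a minimal sketch, assuming the check runs over (ID, symbol) pairs, two plausible tests are that every symbol is unique and that it contains only expected characters. The character pattern below (letters and digits, optionally joined by hyphens) is a simplifying assumption for illustration, not HGNC's official symbol rule, and the record IDs are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed pattern for this sketch only: letters/digits, optionally joined by hyphens.
SYMBOL_PATTERN = re.compile(r"^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$")


def sanity_check(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems found in (record_id, symbol) pairs."""
    records = list(records)
    problems = []
    counts = Counter(symbol for _, symbol in records)
    for record_id, symbol in records:
        # Catch stray whitespace, underscores and other unexpected characters
        if not SYMBOL_PATTERN.match(symbol):
            problems.append(f"{record_id}: symbol {symbol!r} has unexpected characters")
        # Every approved symbol should map to exactly one record
        if counts[symbol] > 1:
            problems.append(f"{record_id}: symbol {symbol!r} is not unique")
    return problems
```

Run against `[("id1", "BRCA1"), ("id2", "TP53 "), ("id3", "BRCA1")]`, it reports the duplicated `BRCA1` twice and the trailing space in `"TP53 "` once. Returning messages rather than raising lets a nightly job collect every problem in one report.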
Management, Dissemination & Training
• Aim 1: Organizational structure and staff responsibilities
Elspeth will continue to manage 4 FTE curators at EMBL-EBI and the University of Cambridge, supported by 2 informatics staff at EMBL-EBI and augmented remotely by 4 complex gene family experts and a programmer.
• Aim 2: Scientific Advisory Board
Continue to receive key advice from our SAB, with a yearly face-to-face meeting.
• Aim 3: HGNC website backend and frontend redesign
Replace the HGNC website backend with a single server; rewrite the frontend using AngularJS, Jekyll and HTML5.
• Aim 4: Maintaining and updating searches & download facilities
Continue to support existing facilities; expand BioMart to include gene family data, and extend both BioMart and REST to include VGNC data.
• Aim 5: Maintaining and updating the VGNC website
Initial efforts will focus on methods and tools for downloading VGNC data, along with gene family data displays.
• Aim 6: Training
Continue attending major genomics conferences; plan to produce more online tutorials and start an HGNC blog.
1. Transposable elements
2. Pseudogenes
• What classes:
transcribed?
unprocessed?
unitary?
published?
by parent gene?
3. Symbols converted to dates
DEC1, deleted in esophageal cancer 1 > ??
MARC1-2, mitochondrial amidoxime reducing component 1-2 > MTARC1-2?
MARCH1-10, membrane associated ring-CH-type finger 1-10 > MARF1-10?
SEPT1-14, septin 1-14 > SEPTIN1-14?
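The symbols listed above share a shape: letters that begin an English month name, followed by a number that could be a day. A rough screening heuristic for that shape is sketched below. It is an assumption for illustration only and does not reproduce any spreadsheet program's actual date-parsing rules, so it can both miss and over-flag cases.

```python
import re

# Full English month names; a symbol's letter prefix is compared against these.
MONTHS = ("JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER")

# A run of letters followed by a trailing number, e.g. "SEPT14" -> "SEPT", "14".
LETTERS_THEN_NUMBER = re.compile(r"^([A-Z]+)(\d+)$")


def may_be_read_as_date(symbol: str) -> bool:
    """Heuristic: 3+ letters that start a month name, followed by a number 1-31."""
    match = LETTERS_THEN_NUMBER.match(symbol.upper())
    if match is None:
        return False
    letters, number = match.group(1), int(match.group(2))
    if len(letters) < 3 or not 1 <= number <= 31:
        return False
    return any(month.startswith(letters) for month in MONTHS)
```

With this heuristic, `DEC1`, `MARC1`, `MARCH10` and `SEPT14` are all flagged, while the proposed replacements `MTARC1`, `MARF1` and `SEPTIN14` are not.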
4. Create a tool for simplified queries on recent updates to the dataset
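A simplified "what changed recently?" query could amount to filtering records by a date field and sorting newest first. The sketch below assumes each record is a dictionary with date fields; the field names `date_modified` and `date_symbol_changed`, and the sample symbols, are hypothetical placeholders rather than a confirmed HGNC schema.

```python
from datetime import date
from typing import Dict, List


def updated_since(records: List[Dict], since: date,
                  field: str = "date_modified") -> List[Dict]:
    """Return records whose `field` date is on or after `since`, newest first.

    Records that lack the field (or have it set to None) are skipped, so the
    same function can answer e.g. "which symbols changed?" by passing a
    different (hypothetical) field name.
    """
    recent = [r for r in records if r.get(field) is not None and r[field] >= since]
    return sorted(recent, key=lambda r: r[field], reverse=True)


records = [  # made-up sample data
    {"symbol": "GENE1", "date_modified": date(2016, 11, 1)},
    {"symbol": "GENE2", "date_modified": date(2017, 1, 5)},
    {"symbol": "GENE3", "date_modified": date(2016, 12, 20),
     "date_symbol_changed": date(2016, 12, 20)},
]
```

Here `updated_since(records, date(2016, 12, 1))` returns the `GENE2` and `GENE3` records in that order, and passing `field="date_symbol_changed"` narrows the result to `GENE3`.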
5. Alliance of Genome Resources
6. HGNCmine
7. HCOP - other species
8. HCOP - more export options
9. Classifying Gene Families
Different ways to classify:
Homology
Domain/motif
Protein complex
Shared function/pathway/phenotype
Combinations of these…
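The list above suggests that one gene family can be classified on several bases at once. One way to model that, sketched here as a hypothetical data structure rather than the HGNC gene group schema, is to give each family a set of classification bases, so that "combinations of these" become set membership queries. The family names are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List, Set


class Basis(Enum):
    HOMOLOGY = "homology"
    DOMAIN_MOTIF = "domain/motif"
    COMPLEX = "complex"
    SHARED_FUNCTION = "shared function/pathway/phenotype"


@dataclass
class GeneFamily:
    name: str
    bases: FrozenSet[Basis]  # one or more bases; several values = a combination
    members: Set[str] = field(default_factory=set)


def classified_by(families: List[GeneFamily], *required: Basis) -> List[str]:
    """Names of families whose classification includes every required basis."""
    needed = set(required)
    return [fam.name for fam in families if needed <= fam.bases]


families = [  # hypothetical examples
    GeneFamily("family_a", frozenset({Basis.HOMOLOGY, Basis.DOMAIN_MOTIF})),
    GeneFamily("family_b", frozenset({Basis.COMPLEX})),
]
```

`classified_by(families, Basis.HOMOLOGY)` returns `["family_a"]`, while asking for both `HOMOLOGY` and `COMPLEX` returns an empty list. Using a set of bases, instead of a single category field, avoids forcing each family into exactly one classification.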
10. Blogging and other social media
Computing
Complex Gene Families
• Olfactory receptors – Doron
• Cytochrome P450s – Jed & David