Appendix 1 - HUGO Gene Nomenclature Committee

Appendix 1: symbol assignment flowchart
DATA INPUT
Gene annotated by CCDS
Gene symbol submission by researcher
Gene identified by HGNC from publication
Gene identified by HGNC from database
Gene symbol suggested by other nomenclature group
LOCUS TYPE DETERMINATION
Identify locus type according to external annotation e.g. RefSeq, HAVANA, Pseudogene.org, SwissProt
If protein coding, follow table 1a
For pseudogenes, follow table 1b
For ncRNA genes, follow table 1c
For other locus types, e.g. ERVs, immunoglobulin genes, follow specific guidelines
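As a rough illustration, the locus-type routing above amounts to a lookup. This is a minimal Python sketch, not HGNC software: the `LocusType` labels, the `ROUTES` table and the `naming_procedure` function are hypothetical, and in practice the locus type is taken from external annotation such as RefSeq or HAVANA.

```python
from enum import Enum


class LocusType(Enum):
    """Illustrative locus-type labels (hypothetical, not an HGNC data model)."""
    PROTEIN_CODING = "protein coding"
    PSEUDOGENE = "pseudogene"
    NCRNA = "ncRNA"
    OTHER = "other"  # e.g. ERVs, immunoglobulin genes


# Which part of Appendix 1 governs naming for each locus type.
ROUTES = {
    LocusType.PROTEIN_CODING: "table 1a",
    LocusType.PSEUDOGENE: "table 1b",
    LocusType.NCRNA: "table 1c",
}


def naming_procedure(locus_type: LocusType) -> str:
    """Return the table to follow; any other locus type uses specific guidelines."""
    return ROUTES.get(locus_type, "specific guidelines")


print(naming_procedure(LocusType.NCRNA))  # table 1c
```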
Table 1a: flowchart for naming protein-coding genes
SEQUENCE ANALYSIS
Compare with HGNC database via in-house BLAST and external BLAST
Identify genomic location via BLAT and/or in-house “map by coordinates” tool
Identify orthologs via HCOP
Analyse protein sequence for domains, motifs, TM regions using Pfam, TMHMM etc
SYMBOL DESIGNATION
Determine if there is a known function via literature and database searches, and correspondence with researchers. If
yes, assign a unique symbol and name based on function e.g. ACAT1 (acetyl-CoA acetyltransferase 1)
If gene is a member of an established family, name with next available symbol in the family series (in coordination
with specialist advisor). If the family has no established nomenclature, consider creating a new naming scheme in
consultation with the research community. If the family has no known function, name as a FAM#.
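Choosing the next available symbol in a family series can be sketched as below. The helper `next_family_symbol` and the stem "XYZ" are hypothetical; the sketch assumes members are written as the stem followed by a number, optionally with the trailing “P” that table 1b uses for pseudogenes within a family series, and it assumes "next available" means one more than the highest number in use rather than filling gaps. The real choice is made in coordination with a specialist advisor.

```python
import re


def next_family_symbol(stem: str, existing: list[str]) -> str:
    """Hypothetical helper: next unused number in a series such as XYZ1, XYZ2, ...

    Pseudogene members written stem + number + "P" (e.g. XYZ4P) are assumed to
    occupy their number in the series too.
    """
    pattern = re.compile(rf"^{re.escape(stem)}(\d+)P?$")
    numbers = [int(m.group(1)) for s in existing if (m := pattern.match(s))]
    # Assumption: take the number after the highest one used, not the first gap.
    return f"{stem}{max(numbers, default=0) + 1}"


print(next_family_symbol("XYZ", ["XYZ1", "XYZ2", "XYZ4P", "XYZ-AS1"]))  # XYZ5
```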
If gene has no known function but is a paralog of a known gene, assign an appropriate symbol based on the gene
nomenclature of the known gene, e.g. ADAL (adenosine deaminase like).
If gene is an ortholog of a gene with known function in another species, assign an appropriate symbol with “homolog”
included in the gene name, e.g. CDC6 (cell division cycle 6 homolog).
If gene product contains known protein domains/motifs/TM regions, name based on these features, e.g. ABHD1
(abhydrolase domain containing 1).
Try to find other information from publications, databases or directly from researchers, e.g. cellular location, tissue
specificity, chromosomal location, and name on this basis.
If the gene cannot be named via any of the above steps, assign a C$orf# (chromosome $ open reading frame)
symbol.
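The table 1a checks can be read as a first-match cascade. The sketch below assumes each rule is tried in the order listed above and that the first one that applies decides the basis of the name; `Evidence`, `naming_basis` and `orf_symbol` are hypothetical names, and in practice each check is a curator judgement rather than a boolean flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of what sequence analysis and literature searches found."""
    known_function: bool = False
    family_stem: Optional[str] = None   # established family, e.g. "ZNF"
    paralog_of: Optional[str] = None    # known human paralog
    ortholog_of: Optional[str] = None   # ortholog with known function in another species
    domains: tuple = ()                 # domains, motifs or TM regions (Pfam, TMHMM etc.)
    other_info: bool = False            # e.g. cellular location, tissue specificity


def naming_basis(ev: Evidence) -> str:
    """Return the first table 1a rule that applies, checked in the listed order."""
    if ev.known_function:
        return "function"
    if ev.family_stem:
        return "family series"
    if ev.paralog_of:
        return "paralog"
    if ev.ortholog_of:
        return "ortholog (homolog)"
    if ev.domains:
        return "domains/motifs/TM regions"
    if ev.other_info:
        return "other information"
    return "open reading frame"  # fallback: C$orf# symbol


def orf_symbol(chromosome: str, n: int) -> str:
    """Fallback C$orf# symbol, with $ replaced by the chromosome name."""
    return f"C{chromosome}orf{n}"


print(naming_basis(Evidence(domains=("abhydrolase",))))  # domains/motifs/TM regions
print(orf_symbol("1", 7))  # C1orf7 (hypothetical number)
```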
GENE SYMBOL DISSEMINATION
Contact researchers about release of symbol, and to confirm symbol will be used in subsequent publications
Release symbol in public database for dissemination to NCBI Gene, Ensembl, UniProt, GeneCards, Vega, UCSC, locus
specific databases etc
Coordinate symbol update with other nomenclature committees, especially mouse
Table 1b: flowchart for naming pseudogenes
SEQUENCE ANALYSIS
Compare with HGNC database via in-house BLAST to check that the pseudogene is not already named
Identify genomic location via BLAT and/or in-house “map by coordinates” tool
Identify parent human gene, relevant human gene family, or functional ortholog in other species via BLAST and
comparison of annotations in external databases
SYMBOL DESIGNATION
Where possible, name the pseudogene after its parent protein-coding gene. Use the symbol format “parent gene symbol P#”,
e.g. CCNJP1, and the gene name format “parent gene name pseudogene #”, e.g. cyclin J pseudogene 1
If the gene has no specific identifiable parent gene (unprocessed pseudogenes can present in clusters with protein-coding
genes of the same family), name the pseudogene within the gene family series but denote its pseudogene status using a “P”
at the end of the symbol. Use the gene symbol format “family stem symbol #P”, e.g. ZNF890P, and the gene name format
“family stem name #, pseudogene”, e.g. zinc finger protein 890, pseudogene
If the gene has a functional ortholog in a different species, name it after this ortholog. Use the gene symbol format
“ortholog symbol P”, e.g. GULOP, and the name format “ortholog name, pseudogene”, e.g. gulonolactone (L-) oxidase,
pseudogene
Exceptions to the above rules include: symbols that do not follow our rules but that are entrenched in the literature,
and pseudogenes that are part of established nomenclature systems that follow a different naming convention e.g. T
cell receptor pseudogenes. If unable to include a P in the symbol, then add the word (pseudogene) at the end of the
gene name e.g. symbol: TRAJ51, name: T cell receptor alpha joining 51 (pseudogene)
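The pseudogene formats and the exception rule above reduce to simple string templates. This is a minimal sketch: the function names are hypothetical, inputs are assumed to be already-approved symbols and names written in human-style capitals, and deciding which rule applies (parent, family, ortholog or exception) is left to the curator.

```python
def pseudogene_from_parent(parent_symbol: str, parent_name: str, n: int) -> tuple[str, str]:
    """Parent gene symbol + P#; name "parent gene name pseudogene #"."""
    return f"{parent_symbol}P{n}", f"{parent_name} pseudogene {n}"


def pseudogene_in_family(stem_symbol: str, stem_name: str, n: int) -> tuple[str, str]:
    """Family stem symbol + #P; name "family stem name #, pseudogene"."""
    return f"{stem_symbol}{n}P", f"{stem_name} {n}, pseudogene"


def pseudogene_from_ortholog(ortholog_symbol: str, ortholog_name: str) -> tuple[str, str]:
    """Ortholog symbol + P; name "ortholog name, pseudogene"."""
    return f"{ortholog_symbol}P", f"{ortholog_name}, pseudogene"


def pseudogene_exception(symbol: str, name: str) -> tuple[str, str]:
    """Symbol cannot carry a P (e.g. T cell receptor genes): mark the name instead."""
    return symbol, f"{name} (pseudogene)"


print(pseudogene_from_parent("CCNJ", "cyclin J", 1))  # ('CCNJP1', 'cyclin J pseudogene 1')
print(pseudogene_in_family("ZNF", "zinc finger protein", 890))  # ('ZNF890P', 'zinc finger protein 890, pseudogene')
```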
GENE SYMBOL DISSEMINATION
Contact researchers about release of symbol, and to confirm symbol will be used in subsequent publications
Release symbol in public database for dissemination to NCBI Gene, Ensembl, UniProt, GeneCards, Vega, UCSC, locus
specific databases etc
Coordinate symbol update with other nomenclature committees if unitary pseudogene
Table 1c: flowchart for naming non-coding RNA genes
SEQUENCE ANALYSIS
Compare with HGNC database via in-house BLAST and external BLAST to look for homologous ncRNAs
Identify genomic location via BLAT and/or in-house “map by coordinates” tool
Perform secondary structure analysis
SYMBOL DESIGNATION
If the gene is a member of an established small ncRNA class (as established by homology), name with next
available symbol in the family series (in coordination with specialist advisor) e.g. MIR100. If the class has no
established nomenclature, then create a new naming scheme in consultation with the research community.
If the transcript product of a small ncRNA is predicted not to have the secondary structure required to function as
a member of that class, then it is named as a pseudogene: it receives the next available number in the family series,
with a “P” (for “pseudogene”) appended, e.g. RNU7-2P.
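Numbering within a small ncRNA family series, including the “P” suffix for copies predicted to lack the required secondary structure, might look like the sketch below. The helpers `next_number` and `small_ncrna_symbol` are hypothetical; the structure prediction is assumed to happen elsewhere and is passed in as a boolean, and "next available" is assumed to mean one more than the highest number already used with that stem.

```python
import re


def next_number(stem: str, existing: list[str]) -> int:
    """Next number after the highest already used with this stem (P-suffixed copies count)."""
    pattern = re.compile(rf"^{re.escape(stem)}(\d+)P?$")
    used = [int(m.group(1)) for s in existing if (m := pattern.match(s))]
    return max(used, default=0) + 1


def small_ncrna_symbol(stem: str, existing: list[str], has_required_structure: bool) -> str:
    """Functional members get stem + number; predicted non-functional copies also get a P."""
    symbol = f"{stem}{next_number(stem, existing)}"
    return symbol if has_required_structure else symbol + "P"


print(small_ncrna_symbol("RNU7-", ["RNU7-1"], has_required_structure=False))  # RNU7-2P
```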
If the gene encodes a long non-coding RNA (lncRNA) (>200bp), then first determine if there is a known function via
literature and database searches, and correspondence with researchers. If yes, assign a unique symbol and name
based on function e.g. XIST.
If the lncRNA has no known function, then it should be named based on its genomic location with reference to the
closest protein-coding gene. Antisense lncRNA gene symbols have the ‘-AS’ suffix appended to the protein-coding
symbol (e.g. BOK-AS1). Likewise, intronic lncRNA gene symbols have the ‘-IT’ suffix (e.g. SPRY4-IT1) and
overlapping gene symbols have the ‘-OT’ suffix (e.g. HMBOX1-OT1). Intergenic lncRNA genes are named with
the next consecutive LINC# number, e.g. LINC00028.
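The lncRNA rules for genes without a known function can be sketched as follows. The context labels and `lncrna_symbol` are hypothetical; the caller is assumed to supply the closest protein-coding gene and the next consecutive number, and zero-padding LINC numbers to five digits is an assumption based on the form of existing LINC symbols.

```python
from typing import Optional

# Suffixes for lncRNAs named relative to the closest protein-coding gene.
SUFFIXES = {"antisense": "-AS", "intronic": "-IT", "overlapping": "-OT"}


def lncrna_symbol(context: str, n: int, nearest_gene: Optional[str] = None) -> str:
    """Build an lncRNA symbol from its genomic context (hypothetical helper).

    context: "antisense", "intronic", "overlapping" or "intergenic".
    n: the next consecutive number for that gene, or for the LINC series.
    """
    if context == "intergenic":
        # Assumption: LINC numbers are zero-padded to five digits.
        return f"LINC{n:05d}"
    if context not in SUFFIXES or nearest_gene is None:
        raise ValueError("need a known context and the closest protein-coding gene")
    return f"{nearest_gene}{SUFFIXES[context]}{n}"


for args in [("antisense", 1, "BOK"), ("intronic", 1, "SPRY4"), ("overlapping", 1, "HMBOX1")]:
    print(lncrna_symbol(*args))  # BOK-AS1, then SPRY4-IT1, then HMBOX1-OT1
print(lncrna_symbol("intergenic", 28))  # LINC00028
```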
GENE SYMBOL DISSEMINATION
Contact researchers about release of symbol, and to confirm symbol will be used in subsequent publications
Release symbol in public database for dissemination to NCBI Gene, Ensembl, UniProt, GeneCards, Vega, UCSC, locus
specific databases etc
Coordinate symbol update with other nomenclature committees and/or specialist ncRNA resources