
Animal Science Publications
Animal Science
2014
Standard Genetic Nomenclature
Zhiliang Hu
Iowa State University, [email protected]
James M. Reecy
Iowa State University, [email protected]
Fiona M. McCarthy
University of Arizona
Carissa A. Park
Iowa State University, [email protected]
Follow this and additional works at: http://lib.dr.iastate.edu/ans_pubs
Part of the Agriculture Commons, Animal Sciences Commons, and the Genetics Commons
The complete bibliographic information for this item can be found at http://lib.dr.iastate.edu/ans_pubs/171.
For information on how to cite this item, please visit http://lib.dr.iastate.edu/howtocite.html.
This Book Chapter is brought to you for free and open access by the Animal Science at Digital Repository @ Iowa State University. It has been accepted
for inclusion in Animal Science Publications by an authorized administrator of Digital Repository @ Iowa State University. For more information,
please contact [email protected].
24
Standard Genetic Nomenclature
Zhi-Liang Hu,1 James M. Reecy,1 Fiona McCarthy2 and
Carissa A. Park1
1 Iowa State University, Ames, Iowa, USA;
2 University of Arizona, Tucson, Arizona, USA
Introduction
Locus and Gene Names and Symbols
Locus name and symbol
Allele name and symbol
Genotype terminology
Gene annotations and the gene ontology (GO)
Trait and Phenotype Terminology
Traits
Super-traits
Trait hierarchy and ontology
Current status of research
Trait and phenotype nomenclature
Future Prospects
Acknowledgements
References
Introduction
Genetics includes the study of genotypes
and phenotypes, the mechanisms of genetic
control between them, and information
transfer between generations. Genetic terms
describe processes, genes and traits with
which genetic phenomena are examined and
described. Although genetic terminology is
discussed extensively in this book and
elsewhere, the standardization of names
remains an ongoing process. This chapter
therefore concentrates on the issues involved in standardizing gene and trait terminology.
Readers may wish to refer to online resources
(see Table 24.1 for URLs) for lists of the
glossaries currently in use.
A standardized genetic nomenclature is
vital for unambiguous concept description, efficient genetic data management and effective
communications not only among scientists, but
also among those who are involved in cattle
production and genetic improvement. This
issue has become even more critical in the
post-genomics era due to rapid accumulation
of large quantities of genetic and phenotypic
data, and the requirement for data management and computational analysis, which
increases the need for precise definition and
interpretation of gene and trait terms.
For example, the Myostatin (MSTN) gene
is known as Growth and Differentiation Factor
8 (GDF8 or GDF-8) in some literature and
is also referred to as the 'muscle hypertrophy'
or 'double-muscling' locus in cattle. While the
interchangeable use of all these names in the
literature can cause confusion, it gets more
complicated when one considers paralogous gene
duplications across species, which led Rodgers
et al. (2007) to propose MSTN-1 and MSTN-2. Unfortunately, this naming scheme does not
follow the Human Genome Organization
(HUGO) Gene Nomenclature Committee (HGNC)
guidelines, which indicate that these paralogues should be named MSTN1 and MSTN2,
respectively.
©CAB International 2015. The Genetics of Cattle,
2nd Edn (eds D.J. Garrick and A. Ruvinsky)
In terms of traits, an example that would
benefit from consistent nomenclature is the
longissimus dorsi muscle area, which is also
referred to as the loin eye area (LEA), loin muscle area (LMA), meat area (MLD), ribeye area
(REA), etc. Each of these is known to certain
researchers as their default name for the trait.
Complexity is further increased by variation in
anatomic locations, physiological stages and
methods used to measure a given trait. This
may seem manageable at first, but once one
starts to compare data across different laboratories, publications or species, it quickly becomes
very confusing.
The 'standard genetic nomenclature' recommendations made by the Committee on
Genetic Nomenclature of Sheep and Goats
(COGNOSAG) in the 1980s and 1990s initially
covered sheep and goats and were later extended
to cattle (Broad et al., 1999). Dolling (1999)
summarized these efforts and abstracted guidelines for practical use. In 2009, an international
meeting to discuss coordination of gene names
across vertebrate species was held in Cambridge,
UK (Bruford, 2010). While we may hesitate to
dictate how genetic terms are defined, adopting
a standardized genetic nomenclature system enables researchers to more easily manage and compare their data, both within and across species.
The emergence of the use of ontologies in biological research has contributed a new way to
effectively organize biological data and facilitate
analysis of large datasets. Adopting standardized
nomenclature will further enable researchers to
unambiguously organize and manage their data.
When genomic information must be transferred
across species to perpetuate genetic discoveries,
the role of a standardized genetic nomenclature
becomes even more important.
The goal of this chapter is to clearly state
guidelines for nomenclature, with the hope
that they will facilitate comparison of results
between experiments and, most importantly,
prevent confusion.
Locus and Gene Names and Symbols
Locus name and symbol
The following guidelines for cattle gene nomenclature are adapted and abbreviated from the
HUGO Gene Nomenclature Committee (HGNC;
see Table 24.1 for URL).
A gene is defined as 'a DNA segment
that contributes to phenotype/function. In the
absence of demonstrated function a gene may
be characterized by sequence, transcription or
homology.' A locus is not synonymous with a
gene. It is defined as 'a point in the genome,
identified by a marker, which can be mapped
by some means. A locus could be an anonymous non-coding DNA segment or a cytogenetic
feature.' A single gene may have numerous
loci within it (each may be defined by different
markers).
A gene name should be short and specific,
and convey the character or function of the
gene. Gene names should be written using
American spelling and contain only Latin letters or a combination of Latin letters and
Arabic numerals.
A gene symbol should start with the same
letter as the gene name. The gene symbol
should consist of upper-case Latin letters and
possibly Arabic numerals. Gene symbols must
be unique.
A locus name should be in capitalized
Latin letters or a combination of Latin letters
and Arabic numerals.
A locus symbol should consist of as few
Latin letters as possible or a combination of
Latin letters and Arabic numerals. The characters of a symbol should always be capital Latin
characters and should begin with the initial letter of the name of the locus. If the locus name
is two or more words, then the initial letters of
each word should be used in the locus symbol.
Gene and locus names and symbols should
be printed in italics whenever possible; otherwise they should be underlined.
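The symbol rules above lend themselves to a simple automated check. The following is a minimal sketch, not an official HGNC validator: the function names are hypothetical, and the pattern assumes that a symbol is a run of upper-case Latin letters and Arabic numerals beginning with a letter. Uniqueness of symbols would have to be checked separately against a database.

```python
import re

# Assumption: a symbol consists of upper-case Latin letters, optionally
# mixed with Arabic numerals, and begins with a letter.
SYMBOL_PATTERN = re.compile(r"[A-Z][A-Z0-9]*")


def is_valid_gene_symbol(symbol: str, gene_name: str) -> bool:
    """Check a gene symbol against the character and first-letter rules."""
    name = gene_name.strip()
    if not name or not SYMBOL_PATTERN.fullmatch(symbol):
        return False
    # The symbol should start with the same letter as the gene name.
    return symbol[0] == name[0].upper()


def locus_symbol_from_name(locus_name: str) -> str:
    """Build a candidate locus symbol from the initial letter of each word."""
    words = re.split(r"[\s-]+", locus_name.strip())
    return "".join(word[0].upper() for word in words if word)


print(is_valid_gene_symbol("MSTN", "myostatin"))     # True
print(is_valid_gene_symbol("Mstn", "myostatin"))     # False (lower-case letters)
print(locus_symbol_from_name("muscle hypertrophy"))  # MH
```

The candidate produced by `locus_symbol_from_name` is only a starting point; a curator would still need to confirm that it is as short as possible and not already in use.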
When assigning cattle gene nomenclature,
the gene name and symbol should be assigned
based on existing HGNC nomenclature when
1:1 human:bovine orthology is well established.
Recognized members of gene families should be
named following existing naming schemes.
Initial efforts to provide information about genes
predicted during the cattle genome sequencing
project resulted in the assignment of standardized names for 5757 cattle genes based on
human gene nomenclature (Bovine Genome
Sequencing and Analysis Consortium, 2009).
There are two categories of novel cattle
genes: (i) novel genes predicted by bioinformatic
gene prediction programs; and (ii) novel genes
that have been studied prior to the completion
of the cattle genome. In addition, it is anticipated
that, in the future, additional novel genes will
be identified by RNA-sequencing experiments.
In cases where no strict 1:1 human orthologue
exists that has been assigned nomenclature, the
NCBI LOC# or Ensembl ID should be used as a
temporary gene symbol for predicted genes with
no known function. In order to assign a symbol/
name to novel genes, they will need to be manually curated and assigned a unique symbol/name
following these guidelines.
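The assignment logic described above can be summarized as a small decision function. This is an illustrative sketch only: the function name, its parameters and the placeholder identifiers are hypothetical, and the preference for an NCBI LOC# over an Ensembl ID when both are available is an assumption, since the guideline accepts either.

```python
from typing import Optional


def provisional_symbol(hgnc_symbol: Optional[str],
                       has_one_to_one_orthology: bool,
                       ncbi_loc_id: Optional[str] = None,
                       ensembl_id: Optional[str] = None) -> str:
    """Pick the symbol to use for a cattle gene.

    Use the HGNC symbol when 1:1 human:bovine orthology is well
    established; otherwise fall back to a temporary database identifier
    until the gene has been manually curated.
    """
    if has_one_to_one_orthology and hgnc_symbol:
        return hgnc_symbol
    # Assumption: prefer the NCBI LOC# when both identifiers exist.
    temporary = ncbi_loc_id or ensembl_id
    if temporary is None:
        raise ValueError("no HGNC symbol and no temporary identifier available")
    return temporary


print(provisional_symbol("MSTN", True))
# Placeholder identifier, not a real database record.
print(provisional_symbol(None, False, ensembl_id="ENSBTAG00000000000"))
```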
Allele name and symbol
These guidelines for allele nomenclature are
adapted from Dolling (1999) and mouse genome
nomenclature guidelines (see Table 24.1 for
URL), consistent with HGNC guidelines.
Alleles do not have to be named, but
should be assigned symbols. An allele symbol
should always be written following the locus
symbol. It can consist of Latin letters or a combination of Latin letters and Arabic numerals.
An allele name should be as brief as possible,
and should convey the variation associated
with the allele. If a new allele is similar to one
that has already been named, it should be
named according to the breed, geographic
location or population of origin. If new alleles
are to be named for a recognized locus, they
should conform to nomenclature established
for that locus. The first letter of the allele name
should be lower case.
The allele name and symbol may be
identical for a locus detected by biochemical,
serological or nucleotide methods. The HGNC
guideline recommends that 'allele designation
should be written on the same line as gene
symbol separated by an asterisk e.g. PGM1*1,
the allele is printed as *1'. The wild-type allele
can be denoted with a + (e.g. MSTN+). Neither +
nor - symbols should be used in alleles detected
by biochemical, serological or nucleotide methods. Null alleles should be designated by the
number zero. A single nucleotide polymorphism (SNP) allele should be designated based
on its dbSNP_id, followed by a hyphen and the
specific nucleotide (e.g. MSTNrs1234567-T).
If the SNP occurs outside of an identified gene,
the SNP locus can be designated using the
dbSNP_id as the locus symbol, followed by a
hyphen and the nucleotide allelic variants as in
rs1234567-T.
The allele name and symbol should be
printed in italics whenever possible; otherwise
they should be underlined.
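As a rough illustration of the allele conventions above, the helpers below assemble allele designations as plain strings. The function names are hypothetical, and italics or underlining are left to the typesetting step.

```python
def allele_designation(gene_symbol: str, allele: str) -> str:
    """HGNC-style designation: gene symbol, asterisk, allele (e.g. PGM1*1)."""
    return f"{gene_symbol}*{allele}"


def wild_type(gene_symbol: str) -> str:
    """Wild-type allele denoted with a plus sign (e.g. MSTN+)."""
    return f"{gene_symbol}+"


def snp_allele(dbsnp_id: str, nucleotide: str, gene_symbol: str = "") -> str:
    """SNP allele: dbSNP id, hyphen, nucleotide, prefixed by the gene symbol
    when the SNP lies within an identified gene."""
    if nucleotide not in ("A", "C", "G", "T"):
        raise ValueError(f"unexpected nucleotide: {nucleotide!r}")
    return f"{gene_symbol}{dbsnp_id}-{nucleotide}"


print(allele_designation("PGM1", "1"))         # PGM1*1
print(wild_type("MSTN"))                       # MSTN+
print(snp_allele("rs1234567", "T", "MSTN"))    # MSTNrs1234567-T
print(snp_allele("rs1234567", "T"))            # rs1234567-T
```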
Genotype terminology
The genotype of an individual should be
shown by printing the relevant locus and allele
symbols for the two homologous chromosomes concerned, separated by a slash, e.g.
MSTNrs1234567-T/rs1234567-C. Unlinked
loci should be separated by a semicolon, e.g.
CD11RsaI-2400/2200; ESRPvuII-5700/4200.
Linked loci should be separated by a space
or dash and listed in linkage order (e.g.
POU1F1A/G-STCHC/G-PRSS7A/T), or in
alphabetical order if the linkage order is not
known. For X-linked loci, the hemizygous case
should have a /Y following the locus and allele
symbol, e.g. AR-Eco57I-1094/Y. Likewise,
Y-linked loci should be designated by /X following the locus and allele symbol.
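The genotype conventions above can be sketched as string-building helpers. The function names are hypothetical; the dash is used here as the separator for linked loci, although the text allows a space as well.

```python
from typing import Iterable


def locus_genotype(locus: str, allele_1: str, allele_2: str) -> str:
    """Two alleles at one locus, separated by a slash."""
    return f"{locus}{allele_1}/{allele_2}"


def hemizygous(locus_and_allele: str, other_chromosome: str = "Y") -> str:
    """Hemizygous case: /Y for X-linked loci, /X for Y-linked loci."""
    return f"{locus_and_allele}/{other_chromosome}"


def join_unlinked(genotypes: Iterable[str]) -> str:
    """Unlinked loci are separated by a semicolon."""
    return "; ".join(genotypes)


def join_linked(genotypes: Iterable[str], linkage_order_known: bool = True) -> str:
    """Linked loci in linkage order, or alphabetical order if it is unknown."""
    ordered = list(genotypes) if linkage_order_known else sorted(genotypes)
    return "-".join(ordered)


print(locus_genotype("MSTN", "rs1234567-T", "rs1234567-C"))
# MSTNrs1234567-T/rs1234567-C
print(join_linked(["POU1F1A/G", "STCHC/G", "PRSS7A/T"]))
# POU1F1A/G-STCHC/G-PRSS7A/T
print(hemizygous("AR-Eco57I-1094"))
# AR-Eco57I-1094/Y
```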
Gene annotations and the gene ontology (GO)
Advances in genomic technologies require that
researchers be able to functionally analyse
large, high-throughput datasets to gain insight
into the complex systems they are studying. By
using the same nomenclature and procedures
to describe gene function, gene components
can be consistently linked to function in a way
that facilitates effective computational analysis
and promotes comparative genomics. In 1998,
the GO Consortium was formed to standardize
functional annotation in the form of gene
ontologies that can be used across all eukaryotes (Gene Ontology Consortium, 2000). This
effort not only provided a standard method for
functional annotation but also promoted data
sharing and enabled modelling of functional
genomics datasets. The GO consists of three
separate ontologies: Biological Process, Cellular
Component, and Molecular Function. Genes
or gene products are associated with GO terms
that represent gene attributes.
A GO term is defined with a term name, a
unique identifier and a definition (preferably
indicating which of the three sub-ontologies it
belongs to, information about its relationships
to other GO terms and cited sources). GO
terms may also have synonyms, database
cross-references and comments to provide
more detailed information. A unique GO identifier consists of the prefix 'GO' followed by
a colon and seven numerical digits,
e.g. GO:0000016. It serves as a key to reference GO terms in a GO database. An example
of a GO term is shown in Fig. 24.1.
id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
Fig. 24.1. An example of a GO term. (For further information, see Table 24.1 for GO website URL.)
Standard GO annotations are maintained
by the GO Consortium (see Table 24.1 for
URL), which provides updates of quality-checked data for public access. The GO
annotations are used by secondary source
databases like Entrez Gene (see Table 24.1 for
URL; Sayers et al., 2012) and UniProt (UniProt
Consortium, 2010), genome browsers like
Ensembl (see Table 24.1 for URL; Flicek,
2013), and analysis tools like DAVID (see
Table 24.1 for URL; Huang, 2009), among
other publicly accessible resources and tools.
A growing number of model organism and
livestock animal species (including bovine)
databases and working groups contribute
annotation sets to the GO repository
(McCarthy, 2007; Reese, 2010).
GO annotations are created by capturing the gene product information (database,
database accession, name and symbol, type
of gene product and species taxon), its associated GO term, GO sub-ontology and evidence for the assertion with references. The
current practice for bovine GO annotation
is to provide names and symbols based
upon a combination of NCBI Entrez Gene
and UniProtKB names. In instances where
there is no suitable gene symbol, database
accessions are used. Continued efforts are
made to improve the accuracy of the bovine
GO annotations by transferring GO annotations from better annotated proteins in
human and mouse based on Ensembl orthology. As of September 2012, GO annotation for bovine (McCarthy, 2007) comprises
306,746 annotation entries for 41,637
gene products; 86.7% of these annotations
are computationally derived (AgBase: see
Table 24.1 for URL).
To contribute annotations to the GO, or for
a complete list of bovine GO data, users are
encouraged to contact either the GO Consortium
or AgBase at their respective websites.
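To make the structure of a GO term concrete, the sketch below reads a tag-value stanza like the one in Fig. 24.1 into a dictionary and checks the identifier format. It is a simplified illustration, not a full OBO parser: the function name is hypothetical, the stanza is abridged from the figure, and the pattern assumes the seven-digit identifier form seen in GO:0000016.

```python
import re

# Assumption: 'GO', a colon, then seven digits, as in GO:0000016.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# Abridged from Fig. 24.1.
STANZA = """\
id: GO:0000016
name: lactase activity
namespace: molecular_function
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
"""


def parse_stanza(text: str) -> dict:
    """Collect 'tag: value' lines; tags such as xref may repeat."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip())
    return term


term = parse_stanza(STANZA)
go_id = term["id"][0]
parent_id = term["is_a"][0].split(" ! ")[0]  # text after '!' is a comment

print(go_id, bool(GO_ID_PATTERN.fullmatch(go_id)))  # GO:0000016 True
print(term["namespace"][0])                         # molecular_function
print(parent_id)                                    # GO:0004553
```

Storing every tag as a list keeps repeated tags (synonym, xref, is_a) without special cases, at the cost of indexing `[0]` for single-valued tags.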
Trait and Phenotype Terminology
Cattle traits are conventionally named based
on performance (e.g. body weight), physiological parameters (e.g. blood cholesterol level),
anatomic locations/dissections (e.g. loin muscle
area), physical-chemical properties (e.g. milk
protein content), livelihood soundness (e.g.
immune capacity) and exterior appearance
(e.g. coat colour), etc. As such, there is a good
chance a trait will be named differently by different people, even within a species community. Furthermore, traits have been studied
across many species, which adds additional
complexity to their naming. The study of traits
may also involve the study of underlying genes
and markers, environments and management
protocols that contribute to the manifestation
of a trait. Therefore, it is obvious that factors
that contribute to the naming of a trait are
multi-dimensional. As the amount of trait information associated with a gene or chromosomal
region is growing exponentially, we cannot
overemphasize the need for a standard nomenclature to be used by researchers to communicate as consistently and unambiguously as
possible, with the aid of bioinformatics tools.
Traits
Cattle trait terms can be found ubiquitously
throughout journal articles, farm reports and
daily communications among scientists and
cattle industry personnel. A trait term can be
created by anyone, and each person may have
a slightly different definition for any given
term. As such, hundreds of thousands of terms
can be found in the literature with various naming conventions used. Previously, there was no
central repository where the uniqueness of a
trait term could be maintained and checked,
until two relatively recent database development
efforts emerged: the Online Mendelian
Inheritance in Animals (OMIA) database and
the Animal QTL database (QTLdb).
OMIA (see Table 24.1 for URL) was initiated in 1978. To date, it contains >400 cattle
trait variations and/or abnormalities from cattle genetic research publications (Nicholas,
Chapter 5). The Animal QTLdb (see Table 24.1
for URL) has a collection of 470 cattle traits,
including measurement method variations
(Hu et al., 2013), of which 407 traits have at
least one QTL. Curators at both OMIA and
Animal QTLdb made efforts to make each
database entry unique in terms of the names
and their representations. Expanded from
the QTLdb development, an Animal Trait
Ontology (ATO) project at Iowa State
University (see Table 24.1 for URL) has been
launched to standardize traits for livestock species including cattle. Its initial purpose was to
help with organization and management of
trait information through the use of a controlled vocabulary to facilitate comparison of
QTL results and standardize trait data annotation and retrieval (Hu et al., 2005, 2007).
It was soon introduced to the community
(Hughes et al., 2008).
Super-traits
Compared to standard gene nomenclature,
trait name standardization is far more complex, not only because the same trait can be
named differently (e.g. 'loin eye area' versus
'ribeye area'), but also because many factors
contribute to how a trait is defined under various circumstances. For example, Fig. 24.2
shows a list of 10 'backfat thickness' variations, each defined by its specific
measurement method, measuring time or
anatomic location, which may contribute to trait comparison difficulties and
increase the potential for confusion.
By methods:
  Backfat thickness (average backfat) by ultrasound
  Backfat thickness (average backfat) by ruler
By locations:
  Backfat thickness at the 7th rib
  Backfat thickness at the 12th rib
  Backfat thickness at the 12th-13th rib
  Backfat thickness at the 13th rib
By time:
  Backfat thickness measured at 1-3 days postpartum
  Backfat thickness measured at 40-42 days postpartum
  Backfat thickness measured at 90-92 days postpartum
  Backfat thickness measured at 130-150 days postpartum
Fig. 24.2. An example of the trait name variations by different modifiers such as measurement methods,
time and sampling locations. This variation can easily add difficulties for accurate and unambiguous
trait comparisons.
One attempt to simplify the comparisons
was the introduction of the concept of 'trait
types' or 'super-traits'. Hu et al. (2005)
described trait type as a general physical or
chemical property of, or the processes that
lead to, or types of measurements that result
in, an observation (phenotype). The 'trait types'
or 'super-traits' were initially used to serve as a
general concept for a trait, regardless of possible variations in trait names based on measurement times, locations or methods. As the
ATO project progressed, the factors in the
methods of trait measurements, such as point
in time or time span, anatomic locations,
instruments, etc., were classified as 'trait modifiers', because they do not constitute a component of a trait, but only affect the way a trait is
described. Therefore, the 'super-trait' may only
be employed to categorize variations in how a
trait is defined or named. For example, 'rib eye
area', 'rib-eye area', 'rib muscle area', 'longissimus dorsi muscle area', 'longissimus muscle
area', 'loin eye area', 'loin muscle area', etc.
can be unified as 'longissimus dorsi muscle
area (LMA)'. 'Backfat', 'backfat depth', 'backfat thickness', 'backfat above muscle dorsi',
'backfat intercept', 'backfat linear', etc. may all
simply be referred to as 'subcutaneous fat
thickness'.
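The distinction between a trait and its modifiers can be illustrated with a small lookup. This is a sketch under stated assumptions, not the ATO implementation: the synonym lists are taken from the examples above, while the `TraitRecord` class, the `unify` function and the modifier keys ('location', 'method') are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Synonym groups from the examples in the text, keyed by the unified name.
SUPER_TRAIT_SYNONYMS = {
    "longissimus dorsi muscle area (LMA)": [
        "rib eye area", "rib-eye area", "rib muscle area",
        "longissimus dorsi muscle area", "longissimus muscle area",
        "loin eye area", "loin muscle area",
    ],
    "subcutaneous fat thickness": [
        "backfat", "backfat depth", "backfat thickness",
        "backfat above muscle dorsi", "backfat intercept", "backfat linear",
    ],
}

# Reverse index: reported name -> unified name.
LOOKUP = {
    synonym: unified
    for unified, synonyms in SUPER_TRAIT_SYNONYMS.items()
    for synonym in synonyms
}


def unify(reported_name: str) -> Optional[str]:
    """Map a reported trait name to its unified name, ignoring case and
    extra whitespace. Returns None for names not in the table."""
    key = " ".join(reported_name.lower().split())
    return LOOKUP.get(key)


@dataclass
class TraitRecord:
    """A reported trait name plus modifiers that describe how it was measured."""
    reported_name: str
    modifiers: dict = field(default_factory=dict)

    @property
    def super_trait(self) -> Optional[str]:
        return unify(self.reported_name)


a = TraitRecord("backfat thickness", {"location": "12th rib"})
b = TraitRecord("Backfat depth", {"method": "ultrasound"})
print(a.super_trait == b.super_trait)  # True
print(unify("Loin Eye Area"))          # longissimus dorsi muscle area (LMA)
```

Keeping modifiers in a separate field means two records can be recognized as the same trait while the differences in measurement remain available for filtering.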
Trait hierarchy and ontology
In order to compare QTL across experiments,
the Cattle QTLdb uses a trait hierarchy (Fig.
24.3) to provide a framework for organizing
the traits and easily locating them (Hu et al.,
2013). This approach simplifies the procedures by which traits are defined, linked and
compared. Subsequently, a computer program
could be implemented to automatically process
the database searches, so that when a user
queries for a trait by keywords, the database
can gather and retrieve related trait names and
their associated QTL, put them together and
present them to the user in real time.
However, people of different disciplines
may see the need for a different trait hierarchy,
which may better capture the subtleties required
in their field. For example, for body weight
gained over a period of time (e.g. average daily
gain, ADG), a farmer considers it a production
trait, a nutritionist may see it as an indicator for
feed conversion efficiency and a veterinarian
may find it a health status parameter. Similarly,
blood cholesterol levels may be used to predict
meat quality by beef producers, and may also
be used as a parameter to predict coronary
heart disease by those who use cattle as an animal model for human heart disease research.
Therefore, a simple hierarchy may help to
reduce the complexity in some cases, although
it may not be adequate in all cases. In addition,
due to the existence of multiple overlapping
hierarchies for cattle traits, the management of
such data may introduce one more dimension
of complexity to the ontology structure.
Cattle traits
  Disease susceptibility
  General health parameters
  Mastitis
  Organ disorder
  Parasite load
  Parasite resistance
  Carcass characteristics
  Meat quality
  Milk composition - fat
  Milk composition - other
  Milk composition - protein
  Milk processing trait
  Milk yield
  Production traits
  Energy efficiency
  Feed conversion
  Feed intake
  Growth
  Life history traits
  Lifetime production
  Reproduction traits
  Fertility
  General
  Semen quality
  Behavioural
  Conformation
  Pigmentation
Fig. 24.3. A simple cattle trait class hierarchy used
in the Animal QTLdb for users to browse for traits
of interest. (See Table 24.1 for URL.)
Ontologies are controlled vocabularies
used to describe objects and relationships
between them in a formal manner. In an ontology, the Directed Acyclic Graph (DAG), a
mathematical graphic modelling method, is
used to solve data management problems with
complex hierarchical structures. For example,
the trait 'marbling' may belong to the 'meat
quality', 'adipose trait' or 'muscular system
physiology' hierarchies. Computer tools have
been developed and are freely available to
manage such ontology data with DAG structures. The two most popular tools that are
likely to be useful to the cattle genetics community
are AmiGO and OBO-Edit (Gene Ontology
Tools, see Table 24.1 for URL). AmiGO is an
ontology browser adapted to the ATO database, which allows users to share and view
trait data stored in ATO with any web browser
on the internet. OBO-Edit is a Java-based
ontology data editor that can be used by anyone to edit ontology term definitions and relationships, and to export data in Open
Biological/Biomedical Ontologies (OBO) format
to share data.
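The advantage of a DAG over a simple tree is that a term may have several parents. The sketch below stores the 'marbling' example as a child-to-parents mapping and collects all of its ancestors. The three parents of 'marbling' come from the text; the shared root 'cattle traits' and the function name are assumptions made for illustration.

```python
# Child -> list of parents. A term with several parents is what
# distinguishes a DAG from a strict tree.
PARENTS = {
    "marbling": ["meat quality", "adipose trait", "muscular system physiology"],
    # Hypothetical upper level: a single shared root.
    "meat quality": ["cattle traits"],
    "adipose trait": ["cattle traits"],
    "muscular system physiology": ["cattle traits"],
    "cattle traits": [],
}


def ancestors(term: str, parents: dict) -> set:
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


print(sorted(ancestors("marbling", PARENTS)))
# ['adipose trait', 'cattle traits', 'meat quality', 'muscular system physiology']
```

A query for any of the three hierarchies can then retrieve 'marbling' by testing whether the queried term is among its ancestors, which is how a single trait can appear under several overlapping classifications.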
Current status of research
The ATO has been a successful project since
its development from the QTLdb several
years ago. Recently, the developers of ATO
have begun working with Mouse Genome
Informatics, the Rat Genome Database,
European Animal Disease Genomics Network
of Excellence (EADGENE) and the French
National Institute for Agricultural Research
(INRA) to incorporate the Mammalian Phenotype Ontology (MPO) and the ATO into a
unified Vertebrate Trait (VT) Ontology (Park
et al., 2013; see Table 24.1 for URL). To
reach a proper granularity level of the trait
ontology, Product Trait (PT) Ontology (see
Table 24.1 for URL) and Clinical Measurement
Ontology (CMO; Shimoyama et al., 2012;
see Table 24.1 for URL) were introduced. By
reuse of existing ontologies and integration of
production-specific livestock traits, researchers at INRA have also launched an Animal
Trait Ontology for Livestock (ATOL) site, containing over 1000 traits including those of cattle (Golik et al., 2012).
Current efforts have been aimed at
enhancing the ability to standardize trait
nomenclature within and across species. For
example, a disease such as mastitis in dairy cattle may have been considered a 'trait' in classical animal genetic studies. Conceptually, however,
it is not a characteristic
cattle trait observable in the general population, but rather an abnormal manifestation in
some cattle (resistance to mastitis, by contrast, is a
trait). In addition, a trait name may have variations because it is 'modified' by measurement
time or method (Fig. 24.2), but the names
actually represent the same trait. The separation of diseases from traits reflects the efforts
toward a well-defined and standardized trait
nomenclature. Standardization of the trait
nomenclature will undoubtedly help the cattle
genomics community make meaningful trait
comparisons, as well as facilitate the transfer of
genomics information from some well-studied
species. The challenge of using ontologies to
standardize and manage trait nomenclature is
not only a technical issue, but a community
issue, in the sense that it has to be commonly
recognized, mutually agreed upon, and widely
shared.
Trait and phenotype nomenclature
Until an international committee issues rules
for trait and phenotype nomenclature, a good
practice with wide acceptability is to follow the
'norm' in published materials. Listed in Table 24.1
are some of the best trait reference resources
available to date (see table footnote for details).
Since this has been an active research area in
recent years, it is highly recommended that
users check multiple databases for the best and
most up-to-date information.
Phenotype is the actual manifestation of
observable traits. A phenotype is a trait
observed in an individual. It usually consists of
a trait with characteristic features (e.g. twinning), variations that can be described (e.g.
black spots on the body) or qualities that can be
measured (e.g. birth weight of 30 kg). Since
there are so many variations as to how a phenotype can be 'observed' (often such observation
is made indirectly with instruments or
through tests) and obtained, a technical guide
for recording each trait might be ideal. Often
a descriptive comment attached to a phenotype
record is necessary to understand and use the data correctly. For example, when
blood samples are taken, the number of hours
the animal is fasted might be an important
co-factor for the measurement of blood cholesterol concentration.
When a phenotype is a reflection of a certain genotype, the phenotype symbol should
be the same as the genotype symbol. The difference is that the characters should not be
underlined or in italics, and they should be written with a space between locus characters and
allele characters, instead of an asterisk. Square
brackets [ ] may also be used.
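The relationship between the two notations can be shown with a one-step conversion. This is a minimal sketch with a hypothetical function name; it covers only the asterisk-separated form and leaves aside the optional square brackets, whose exact placement is not specified here. Removal of italics or underlining is a typesetting matter outside the string itself.

```python
def phenotype_symbol(genotype_symbol: str) -> str:
    """Convert an asterisk-separated genotype symbol to phenotype notation:
    same characters, with a space instead of the asterisk."""
    locus, separator, allele = genotype_symbol.partition("*")
    if not separator or not locus or not allele:
        raise ValueError(f"expected LOCUS*ALLELE, got {genotype_symbol!r}")
    return f"{locus} {allele}"


print(phenotype_symbol("PGM1*1"))  # PGM1 1
```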
In classical genetics, phenotypes were
sometimes used to denote Mendelian genotypes. This was done using an abbreviation of
the trait, post-fixed with a plus (+) or minus (-)
sign to represent 'presence' or 'absence' of
certain trait features. For example, halothane-negative was denoted as 'Hal-', and halothane-positive as 'Hal+'. A phenotype denotation can
also be used to represent genetic haplotypes,
such that 'K88ab+, ac+, ad-' are written
together as an entire denotation. Likewise,
numbers or letters may be used to denote
alleles when polymorphisms are observed, for
example, ApoB1/2, ApoB2/3, etc. (Note the
difference from recording genotypes, where
italics or asterisks are required.)
Future Prospects
The Gene Ontology and Mammalian
Phenotype Ontology are already playing a role
in robust annotation of mammalian genes and
phenotypes in the context of mutations, quantitative trait loci, etc. (Smith et a/., 2005).
Undoubtedly, a standardized cattle genetic
nomenclature will more effectively facilitate
efficient cattle genome annotation and transfer
of knowledge from information-rich species
such as human and mouse, and make it possible for new bioinformatics tools to
streamline data management and genetic analysis. Meanwhile, it is worth noting
that the term 'phene' for 'trait' is being used
more frequently in the scientific literature in
recent years. It is interesting that in terms of
etymology lineage, 'phene' is to 'phenotype'
and 'phenome' as 'gene' is to 'genotype' and
'genome' (Wikipedia, 2012), where 'phene' is
an equivalent term for 'trait'. However,
Dr Frank Nicholas from the University of Sydney
has used the term 'phene' in OMIA in a slightly
different but more concise context, namely
'phene is to gene as phenotype is to genotype', where 'phene' refers to a set of phenotypes that correspond to a set of genotypes
determined by a gene. This is practically very
useful in light of the future structured genetic
terminology standardization in the genomics era.
Several genome databases, such as
ArkDB, Animal QTLdb, Bovine Genome
Database, Ensembl and NCBI GeneDB, have
played a role in the usage of commonly accepted
gene/trait notations. Undoubtedly, existing
and new genome databases and tools will further develop and evolve. As such, a standardized genetic nomenclature in cattle will
definitely become crucial for information sharing and comparisons between different research
groups, across experiments and even across
species. Recently the Animal Genetics journal
has updated its Author Guidelines insisting that
proper gene nomenclature be followed: 'All
gene names and symbols should be italicized
throughout the text, table and figures'; 'Locus
AgBASE: http://www.agbase.msstate.edu/cgi-bin/information/Cow.pl
Animal QTLdb: http://www.animalgenome.org/QTLdb
Animal Trait Ontology project: http://www.animalgenome.org/bioinfo/projects/ATO/
ATOL: http://www.atol-ontology.com
Cattle trait hierarchy: http://www.animalgenome.org/QTLdb/export/cattle_traits
CMO project: http://bioportal.bioontology.org/ontologies/CMO (BioPortal)
  http://www.animalgenome.org/bioinfo/projects/cmo
http://david.abcc.ncifcrf.gov
http://www.ensembl.org
http://www.ncbi.nlm.nih.gov/gene
http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools
http://www.animalgenome.org/genetics_glossaries
http://www.geneontology.org
http://www.geneontology.org/GO.ontology.structure.shtml
http://www.genenames.org/guidelines.html
http://www.informatics.jax.org/mgihome/nomen/gene.shtml
http://omia.angis.org.au/
http://www.animalgenome.org/bioinfo/projects/pt
http://www.uniprot.org
http://bioportal.bioontology.org/ontologies/VT (BioPortal)
http://www.animalgenome.org/bioinfo/projects/vt
VT, Vertebrate Trait Ontology is a controlled vocabulary for the description of traits (measurable or observable characteristics) pertaining to the morphology, physiology or development
of vertebrate organisms. CMO, Clinical Measurement Ontology is designed to be used to standardize morphological and physiological measurement records generated from clinical
and model organism research and health programmes. PT, Product Trait Ontology is a controlled vocabulary for the description of traits (measurable or observable characteristics)
pertaining to products produced by or obtained from the body of an agricultural animal or bird maintained for use and profit. QTLdb, Animal QTLdb is a database to house all QTL data
for all livestock species. OMIA, Online Mendelian Inheritance in Animals is a comprehensive collection of phenotypic information on heritable animal traits and genes in a comparative
context, relating traits to genes where possible. ATOL, Animal Trait Ontology for Livestock is aimed at defining livestock traits, with a focus on the main types of animal production in line
with societal priorities.
DAVID
Ensembl
Entrez Gene
Gene Ontology Tools
Genetic glossaries
GO Consortium
GO structure
HGNC guidelines
Mouse genome nomenclature guidelines
OMIA
PT project
UniProt
VT project
URL
Data source
Table 24.1. Internet URL addresses for the web resources used in this chapter and cattle trait glossary information.
(stand&'4 ~~No!J'I~~clatut$
607
symbols used in Animal Genetics publications
must be confirmed with HGNC' and 'nonhuman gene names should be checked against
NCBI's Entrez Gene database'. This is a good
move towards educating the community on the
proper use of standardized genetic nomenclatures. Active development and use of a standardized genetic nomenclature will surely help
to improve data quality and reusability, and
facilitate data comparisons between experiments, laboratories, even species.
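A check of the kind asked for in the Animal Genetics Author Guidelines quoted above could be scripted along the following lines. This is a minimal sketch that assumes the approved symbols have already been obtained from HGNC or Entrez Gene and stored locally; the function name `check_symbols`, the tiny approved set and the manuscript symbols are illustrative assumptions, and no HGNC or NCBI interface is called.

```python
def check_symbols(manuscript_symbols, approved_symbols):
    """Report, for each manuscript symbol, whether it matches an approved one.

    Returns a dict mapping each symbol to "ok", "use <approved symbol>"
    (when only the letter case differs) or "not found".
    """
    # Index approved symbols by upper-cased form to catch case-only mismatches.
    approved_by_upper = {symbol.upper(): symbol for symbol in approved_symbols}
    report = {}
    for symbol in manuscript_symbols:
        if symbol in approved_symbols:
            report[symbol] = "ok"
        elif symbol.upper() in approved_by_upper:
            report[symbol] = "use " + approved_by_upper[symbol.upper()]
        else:
            report[symbol] = "not found"
    return report


# Hand-typed stand-in for a locally stored list of approved symbols.
approved = {"MSTN", "DGAT1"}
# Symbols as they might appear in a draft; "XYZ9" is a made-up placeholder.
report = check_symbols(["MSTN", "dgat1", "XYZ9"], approved)
for symbol, status in report.items():
    print(symbol, "->", status)
```

Symbols reported as "not found" would still need to be looked up by hand, since a local list can be out of date or incomplete.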
Acknowledgements
The authors wish to thank Dr Frank Nicholas
from the University of Sydney for useful discussions, input and a kind review of the draft.
References
Bovine Genome Sequencing and Analysis Consortium et al. (2009) The genome sequence of taurine cattle:
a window to ruminant biology and evolution. Science 324, 522-528.
Broad, T.E., Dolling, C.H.S., Lauvergne, J.J. and Millar, P. (1999) Revised COGNOSAG guidelines for gene
nomenclature in ruminants 1998. Genetics, Selection, Evolution 31, 263-268.
Bruford, E.A. (2010) Highlights of the 'Gene Nomenclature Across Species' meeting. Human Genomics 4,
213-217.
Dolling, C.H.S. (1999) Standardized genetic nomenclature for cattle. In: Fries, R. and Ruvinsky, A. (eds)
The Genetics of Cattle. CAB International, Wallingford, UK, pp. 657-666.
Flicek, P., Ahmed, I., Amode, M.R., Barrell, D., Beal, K., Brent, S., Carvalho-Silva, D., Clapham, P.,
Coates, G., Fairley, S. et al. (2013) Ensembl 2013. Nucleic Acids Research 41 (Database issue),
D48-D55.
Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25,
25-29.
Golik, W., Dameron, O., Bugeon, J., Fatet, A., Hue, I., Hurtaud, C., Reichstadt, M., Meunier-Salaun, M.C.,
Vernet, J., Joret, L. et al. (2012) ATOL: the multi-species livestock trait ontology. 6th International
Conference on Metadata and Semantic Research (MTSR'12), Cadiz, Spain, 28-30 November.
Hu, Z.-L., Dracheva, S., Jang, W.-H., Maglott, D., Bastiaansen, J., Rothschild, M.F. and Reecy, J.M.
(2005) A QTL resource and comparison tool for pigs: PigQTLDB. Mammalian Genome 16,
792-800.
Hu, Z.-L., Fritz, E.R. and Reecy, J.M. (2007) AnimalQTLdb: a livestock QTL database tool set for positional
QTL information mining and beyond. Nucleic Acids Research 35 (Database issue), D604-D609.
Hu, Z.-L., Park, C.A., Wu, X.-L. and Reecy, J.M. (2013) Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Research
41, D871-D879.
Huang, D.W., Sherman, B.T. and Lempicki, R.A. (2009) Systematic and integrative analysis of large gene
lists using DAVID bioinformatics resources. Nature Protocols 4, 44-57.
Hughes, L.M., Bao, J., Hu, Z.-L., Honavar, V.G. and Reecy, J.M. (2008) Animal Trait Ontology (ATO): The
importance and usefulness of a unified trait vocabulary for animal species. Journal of Animal Science
86, 1485-1491.
McCarthy, F.M., Bridges, S.M., Wang, N., Magee, G.B., Williams, W.P., Luthe, D.S. and Burgess, S.C.
(2007) AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Research 35
(Database issue), D599-D603.
Park, C.A., Bello, S.M., Smith, C.L., Hu, Z.-L., Munzenmaier, D.H., Nigam, R., Smith, J.R., Shimoyama, M.,
Eppig, J.T. and Reecy, J.M. (2013) The Vertebrate Trait Ontology: a controlled vocabulary for the annotation of trait data across species. Journal of Biomedical Semantics 4, 13.
Reese, J.T., Childers, C.P., Sundaram, J.P., Dickens, C.M., Childs, K.L., Vile, D.C. and Elsik, C.G. (2010)
Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome.
BMC Genomics 11, 645.
Rodgers, B.D., Roalson, E.H., Weber, G.M., Roberts, S.B. and Goetz, F.W. (2007) A proposed nomenclature consensus for the myostatin gene family. American Journal of Physiology - Endocrinology and
Metabolism 292, E371-E372.
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M.,
DiCuccio, M., Federhen, S. et al. (2012) Database resources of the National Center for Biotechnology
Information. Nucleic Acids Research 40 (Database issue), D13-D25.
Shimoyama, M., Nigam, R., McIntosh, L.S., Nagarajan, R., Rice, T., Rao, D.C. and Dwinell, M.R. (2012)
Three ontologies to define phenotype measurement data. Frontiers in Genetics 3, 87.
Smith, C.L., Goldsmith, C.A. and Eppig, J.T. (2005) The Mammalian Phenotype Ontology as a tool for
annotating, analyzing and comparing phenotypic information. Genome Biology 6, R7.
UniProt Consortium (2010) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research 38
(Database issue), D142-D148.
Wikipedia (2012) Phene. Available at: http://en.wikipedia.org/wiki/Phene (accessed 30 March 2013).