Gene Expression

TRANSCRIPTOMICS
UNIT 5
GENE EXPRESSION – A MISNOMER?
• In reality, gene expression can only be quantified
by looking at protein products in the cell (via
proteomic approaches).
• The term has been co-opted to describe
differences in transcript (mRNA) levels.
• Transcripts may or may not be translated into
protein and thus don’t necessarily reflect gene
expression.
DIFFERENTIAL GENE EXPRESSION
• Responsible for differences between cell types of the same
organism (e.g., kidney vs. brain cells)
• Means by which development is controlled
• Involves gene feedback loops and induction/repression
initiated by external (environmental stimuli) and/or internal
(transcription factors) forces
A TYPICAL EUKARYOTIC PROTEIN CODING GENE
TRANSCRIPTOMICS
• The study of the complete set of RNAs
(transcriptome) encoded by the genome
• 1 - of a specific cell or organism
• 2 - at a specific time or
• 3 - under a specific set of conditions
• Dependent on:
• The organism
• The cell, cell line, or tissue
• The developmental stage
• The condition/treatment
• Usually, we tend to ignore the rRNAs and tRNAs
TRANSCRIPTOME COMPLEXITY
HISTORY OF GENE EXPRESSION ANALYSIS
• Northern blotting
• EST sequencing
• Microarrays
• RNA-Seq
NORTHERN BLOTTING
• What is it?
• Detection of RNA on a substrate via hybridization with a
probe
• Pros
• No amplification involved
• Can study expression of multiple genes (e.g., 5-10) on the same
gel as long as they are of different molecular weights
• Allows detection of some alternative splicing
• Cons
• Must blot gels (messy and time-consuming)
• Requires LOTS of starting mRNA
• RNA highly vulnerable to degradation
• Not high-throughput
NORTHERN BLOTTING
• Isolate mRNAs from multiple samples that differ with regard to
tissue type, developmental stage, disease resistance, exposure
to stimulus, etc.
• Place each mRNA population in its own well of a denaturing
agarose gel (formaldehyde added to gel to keep inter- and
intramolecular base pairing from occurring)
• Separate mRNAs by electrophoresis
• Blot mRNA onto membrane. Fix RNA to membrane.
• Hybridize labeled DNA probe(s) to membrane
• Quantify differences in transcript levels between samples
HOW NORTHERN BLOTTING IS USED
• β-actin expression in the brain of a mouse
• Can also look at changes in expression in
multiple tissues as a function of time
EST SEQUENCING (SANGER)
• Expressed Sequence Tags are a set of single sequence reads
from a library of cDNAs of a given sample
• Workflow (figure): mRNA pool → cDNA pool → cloned library → clone sequences
EST SEQUENCING (SANGER)
• Sequence ESTs isolated from
different tissues or different
experimental trials
• Compare similarities and differences
in EST expression patterns
• Dominance of certain transcripts can
make EST sequencing an inefficient
means of measuring changes in gene
expression
• For example, in estrogen-treated
chicken oviduct, > 50% of transcripts
in the cell are the product of one gene
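A rough way to see this inefficiency is to treat each sequenced clone as an independent draw from the cDNA pool. The sketch below takes the 50% dominant fraction from the oviduct example and assumes a hypothetical rare transcript at 0.01% of the pool. The function names and the rare-transcript fraction are illustrative assumptions, and RT and cloning biases are ignored.

```python
import math

def prob_seen_at_least_once(fraction, n_clones):
    """Probability that a transcript making up `fraction` of the cDNA pool
    appears at least once among `n_clones` independently sampled clones."""
    return 1.0 - (1.0 - fraction) ** n_clones

def clones_needed(fraction, target_prob=0.95):
    """Smallest number of clones giving at least `target_prob` chance of
    sampling the transcript once."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - fraction))

dominant_fraction = 0.5   # lower bound from the oviduct example
rare_fraction = 1e-4      # hypothetical rare transcript (assumption)

n = clones_needed(rare_fraction)
print("clones needed for a 95% chance of seeing the rare transcript:", n)
print("expected clones spent on the dominant gene:", round(n * dominant_fraction))
```

Under these assumptions, about half of all sequencing effort goes to re-reading a single gene before the rare transcript is likely to be seen even once.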
EST SEQUENCING (SANGER)
• Pros
• Lots of sequence information that can be used for lots
of different purposes
• Can study expression variation of whole transcriptome,
not just a handful of genes
• Allows detection of alternative splicing
• Cons
• Expensive/inefficient because dominant cDNAs will be
sequenced over and over again
• Not likely to be truly quantitative due to RT and cloning
biases
MICROARRAYS
• DNAs are spotted onto a glass microscope slide or similar
substrate
• RNAs from a tissue/cell culture are hybridized to the DNAs
• Fluorescence techniques rather than radioisotopic
techniques are used in visualization
• Spots are about the size of a typed period using 10 pt font.
• Each spot contains roughly the same amount of DNA
• Use the fluorescence data to determine exactly which genes
are expressed differently between two tissue types
• Quantify differences in expression for individual genes
• Actually know which genes correspond to which spots
• Find genes that may be activated together (gene expression
pathways)
MICROARRAY
MICROARRAY QUANTITATION
MICROARRAY VISUALIZATION
MICROARRAYS
• Pros
• Can study expression variation of whole transcriptome, not just a
handful of genes
• Definitely high-throughput
• Rapid screening possible
• Slides can be stored at room temperature
• Many PCR products can be spotted on a single array (up to 390,000
spots)
• For species with relatively few genes (e.g., yeast), it is possible to
spot all the genes in an ordered manner onto a single array
• Cons
• Very few once good slides/chips are made
• Not practical with poorly characterized genomes (expense in designing
chip requires a commitment from a relatively large scientific
community)
• Only as good as the genes you spot on the slide.
RNA-SEQ: WHOLE TRANSCRIPTOME SHOTGUN
SEQUENCING
• The current state-of-the-art
• Process
• Isolate mRNA from a tissue or tissues (replicates?)
• Build a sequencing library
• Sequence (e.g., Illumina)
• Transcript identification and/or quantification
RNA-SEQ: RNA ISOLATION AND QUALITY
• RNA degrades quickly
• RIN – RNA integrity number
• Calculated by identifying a combination of
characteristics
• Total RNA ratio – compares the ratio of rRNA to other
RNAs; more intact rRNA indicates little degradation
(higher = better)
• Height of 28S rRNA peak – 28S rRNA is typically
degraded more quickly than 18S, so a more intact 28S
peak indicates little degradation (higher = better)
• Fast area ratio – indicates how much degradation
has occurred (lower = better)
• Marker height – a low value indicates that only small
amounts of RNA have been degraded (lower = better)
RNA-SEQ: RNA LIBRARY PREP
• Standard cDNA library example
• Steps (figure): first-strand synthesis with random hexamer primers,
second-strand synthesis, A addition, adapter ligation
• Major problem – you get ALL RNAs, including rRNA
RNA-SEQ: RNA LIBRARY PREP
• Library construction challenges
• How to avoid rRNA?
• Use oligo-dT enrichment
• Bias toward 3’ end
• Protocols to remove rRNA
followed by random
fragmentation
• More even coverage but bias
against the ends
• “Not-so-random” (NSR) priming
• Subtract the random
hexamers and heptamers that
are likely complementary to
rRNAs before first round cDNA
synthesis
Examples from yeast
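The NSR idea can be sketched as a set subtraction: enumerate every possible hexamer primer, then discard those that would anneal perfectly to an rRNA. This is a minimal sketch, not the published protocol. The toy rRNA sequence is hypothetical, sequences are written in the DNA alphabet (T in place of U), only hexamers are handled, and `not_so_random_primers` is an illustrative name.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def not_so_random_primers(rrna_seqs, k=6):
    """Return all k-mer primers except those that pair perfectly with
    some k-mer site in the given rRNA sequences."""
    rrna_sites = set()
    for seq in rrna_seqs:
        for i in range(len(seq) - k + 1):
            rrna_sites.add(seq[i:i + k])
    all_primers = ("".join(p) for p in product("ACGT", repeat=k))
    # A primer anneals to a site when it is the site's reverse complement.
    return [p for p in all_primers if reverse_complement(p) not in rrna_sites]

toy_rrna = ["ACGTTGCAAGGCTTAC"]  # hypothetical sequence, for illustration only
primers = not_so_random_primers(toy_rrna)
print(len(primers), "of", 4 ** 6, "hexamers retained")
```

With a real rRNA set the subtracted fraction would be much larger than in this toy case.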
RNA-SEQ: RNA LIBRARY PREP
• Strand-specific library or not?
• Transcription can occur in both directions
• Genes can be located on either DNA strand and sometimes
overlap.
• An RNA complementary to a given mRNA can also be
transcribed (antisense transcription); such transcripts are
involved in regulatory mechanisms.
• Knowing the strand of origin can resolve questions
about the gene of origin, the function of the RNA, and its
expression level
RNA-SEQ: RNA LIBRARY PREP
• Major methods for
strand-specific libraries
• 1. Differential adapter
ligation to RNA
• 2. ‘Strand marking’ of
the RNA or second-strand cDNA (dUTP)
• 3. Differential adapter
priming (RT method)
RNA-SEQ: SEQUENCING DEPTH
• How much to sequence depends heavily on the goals and targeted starting
material
• Detailed analysis of differential expression will require at least 10s of millions
of reads
• Simple discovery of what is being transcribed requires as few as 50,000-100,000 reads
RNA-SEQ: ANALYSIS
• Heavily dependent on the goals and resources of the researcher
• Transcript Identification
• Map to reference genome
• Map to available transcriptome
• Assemble transcriptome de novo
• Transcript Quantification
• Differential Expression Analysis
• Alternative Splicing Analysis
• Small RNAs
RNA-SEQ: TRANSCRIPT IDENTIFICATION
• Alignment
• Map to available reference genome
• Must deal with splice junctions
• Figure: a gene, its spliced mature mRNA with poly(A) tail, and spliced
vs. unspliced reads
• Reads may map uniquely or be multi-mapped (pseudogenes,
paralogs, repetitive sequences, etc)
• Use a gapped mapper (Tophat, STAR)
RNA-SEQ: TRANSCRIPT IDENTIFICATION
• Alignment
• Map to available transcriptome
• May need to deal with alternative splicing
• Figure: spliced and alternatively spliced reads mapped to
polyadenylated transcripts
• Again, reads may map uniquely or be multi-mapped
(pseudogenes, paralogs, repetitive sequences, etc)
• Use an ungapped mapper (Bowtie)
• Reduced ability to identify new transcripts
• Figure labels: Transcript 1; Transcript 2 (with retained intron)
RNA-SEQ: TRANSCRIPT IDENTIFICATION
• Alignment – no reference
• Assemble the transcriptome de novo
• Trinity package uses a de Bruijn graph approach
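To make the de Bruijn graph idea concrete, here is a toy sketch that builds such a graph from overlapping reads and walks an unbranched path to recover a contig. It is not Trinity's algorithm, which handles errors, coverage, and isoform branching. The reads, the choice k = 4, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer found in the reads adds
    one edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def walk_unbranched(graph, start):
    """Extend a contig from `start` while the current node has exactly
    one outgoing edge; stop at a branch, a dead end, or a cycle."""
    contig, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:
            break
        visited.add(node)
        contig += node[-1]
    return contig

# Hypothetical overlapping reads from one short transcript
reads = ["ATGGCGTG", "GCGTGCAA", "GTGCAAT"]
graph = de_bruijn_graph(reads, k=4)
print(walk_unbranched(graph, "ATG"))  # ATGGCGTGCAAT
```

In real data, alternative isoforms and paralogs create branches in the graph, which is where a simple walk like this one stops.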
RNA-SEQ: TRANSCRIPT IDENTIFICATION
• Alignment – no reference
• Assemble the transcriptome de novo
• Other packages include Oases, SOAPdenovo-Trans, Trans-ABySS
• Map reads to assembled transcriptome
• Again, reads may map uniquely or be multi-mapped
(pseudogenes, paralogs, repetitive sequences, etc)
• Use an ungapped mapper (Bowtie)
• Increased ability to identify novel transcripts and isoforms
RNA-SEQ: TRANSCRIPT QUANTIFICATION
• The most common RNA-Seq task
• Basically, you count the number of reads that map to a particular
locus
• Assumes that library was constructed in such a way that reads are
proportional to transcript abundance
• Simple counts won’t work because of differential gene and
transcript lengths
• Longer and more highly expressed transcripts are more likely
to be represented among RNA-seq reads
• Several measures normalize by transcript length and the total
number of reads captured and mapped in the experiment
• Standard measures
• RPKM = reads per kilobase per million
= [# of mapped reads] / ([length of transcript in kilobases] × [millions of mapped reads])
• FPKM = fragments per kilobase per million
= [# of fragments] / ([length of transcript in kilobases] × [millions of mapped reads])
• FPKM is more appropriate for paired-end (PE) RNA-Seq experiments
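The RPKM formula above translates directly into a few lines of code. This is a minimal sketch with made-up counts. The function name and the example numbers are assumptions, and FPKM would use the same arithmetic with fragment counts in place of read counts.

```python
def rpkm(reads_on_transcript, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads_on_transcript / kilobases / millions

# Hypothetical example: two transcripts in a library of 10 million mapped reads
short_tx = rpkm(500, 2_000, 10_000_000)    # 500 reads on a 2 kb transcript
long_tx = rpkm(1_000, 4_000, 10_000_000)   # 1000 reads on a 4 kb transcript
print(short_tx, long_tx)  # 25.0 25.0
```

The longer transcript collects twice as many reads, but both have the same RPKM, which is the point of the length correction.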
RNA-SEQ: DIFFERENTIAL EXPRESSION
• Comparison of transcription levels among samples
• Objective: In samples that have been exposed to different
treatments or in distinct tissues, identify what genes are being
transcribed at higher or lower rates than others.
• Accomplished by mapping the reads to the genome or assembled
transcriptome then performing a statistical transformation of the
data
• But there are problems
RNA-SEQ: DIFFERENTIAL EXPRESSION
• A simple scenario
• Two samples A and B are sequenced
• Every gene (n = 1000) that is expressed in A is expressed in B at
the same level (same total number of transcripts)
• However, in A there are 1000 additional genes that are also
expressed but that are not expressed in B
• Sample A has twice the number of transcripts, half of which are
unique to A
• If we sample each to the same depth (say 5,000,000 reads), the
shared genes in sample A will have half as many reads as in
sample B
• You should adjust (normalize) by a factor of 2
• Other factors can impact this as well - technical variation, random
noise in the data, sequencing differences, etc.
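The scenario above can be checked with simple arithmetic. The sketch below assumes every expressed gene contributes 100 transcripts (a hypothetical value; the source only says the shared genes are expressed at the same level) and computes the expected read count per gene at a fixed depth.

```python
def expected_reads(transcripts_per_gene, depth):
    """Expected reads per gene if reads are drawn in proportion to
    each gene's share of all transcripts in the sample."""
    total = sum(transcripts_per_gene.values())
    return {gene: depth * n / total for gene, n in transcripts_per_gene.items()}

# 1000 genes shared by A and B, plus 1000 genes expressed only in A
shared = {f"shared_{i}": 100 for i in range(1000)}     # 100 is an assumed level
unique_to_a = {f"onlyA_{i}": 100 for i in range(1000)}

sample_a = {**shared, **unique_to_a}
sample_b = dict(shared)

depth = 5_000_000
reads_a = expected_reads(sample_a, depth)
reads_b = expected_reads(sample_b, depth)

print(reads_a["shared_0"], reads_b["shared_0"])   # 2500.0 5000.0
print(reads_b["shared_0"] / reads_a["shared_0"])  # 2.0
```

The shared genes look two-fold lower in A even though their true expression is identical, so a correction factor of 2 is needed before comparing samples.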
RNA-SEQ: DIFFERENTIAL EXPRESSION
• How do we tell the noise from the ‘real’ differences?
• Most common method = TMM (trimmed mean of M-values)
• A method to determine ‘global fold change’
• Equates the overall expression levels of genes between samples
under the assumption that the majority of them are not
differentially expressed
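A stripped-down version of the trimmed-mean idea is sketched below. It computes a log2 fold change (M-value) per gene after scaling by library size, discards the most extreme values, and averages the rest. This is a simplification and an assumption on my part: the full TMM procedure, as implemented in tools such as edgeR, also trims on absolute expression and weights genes by precision, both omitted here. The counts are invented.

```python
import math

def simplified_tmm_factor(counts_a, counts_b, trim=0.3):
    """Scaling factor for sample A relative to sample B from an
    unweighted trimmed mean of per-gene log2 ratios (M-values)."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    m_values = []
    for a, b in zip(counts_a, counts_b):
        if a > 0 and b > 0:  # the log ratio is undefined for zero counts
            m_values.append(math.log2((a / n_a) / (b / n_b)))
    m_values.sort()
    cut = int(len(m_values) * trim)  # number trimmed from each tail
    kept = m_values[cut:len(m_values) - cut]
    return 2 ** (sum(kept) / len(kept))

# Hypothetical counts: 8 unchanged genes, 2 genes strongly up in A
counts_a = [100] * 8 + [500, 500]
counts_b = [100] * 10

factor = simplified_tmm_factor(counts_a, counts_b)
print(round(factor, 3))
```

Because the two up-regulated genes fall in the trimmed tail, the factor reflects only the unchanged majority. One way to use it is to multiply sample A's library size by the factor before computing per-gene ratios, which brings the unchanged genes back to a ratio of 1.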
RNA-SEQ: ALTERNATIVE SPLICING
• Often performed as part of the transcriptome assembly
• Can also use genome mapping to identify differential mapping to
exons or mapping to introns
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO)
• Suppose you find some differentially expressed genes. How do
you find out what they do?
• GO terms are associated with specific references that describe the
work or analysis upon which the association between the term and
gene product is based.
• Each annotation includes an evidence code to indicate how the
annotation to a particular term is supported.
• Evidence categories: experimental evidence, computational evidence, author
statements, curatorial statements, and automated (electronic) annotation
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO) Terms fall into three categories
• Cellular component
• A component of a cell, but with the proviso that it is part of some
larger object; this may be an anatomical structure (e.g. rough
endoplasmic reticulum or nucleus) or a gene product group (e.g.
ribosome, proteasome or a protein dimer).
• Cytochrome c is a gene with the GO cellular component terms
mitochondrial matrix and inner mitochondrial membrane
associated with it
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO) Terms fall into three categories
• Molecular function
• activities, such as catalytic or binding activities, that occur at the
molecular level. Does not specify where or when, or in what context,
the action takes place. Molecular functions generally correspond to
activities that can be performed by individual gene products, but
some activities are performed by assembled complexes of gene
products.
• Examples of functional terms are catalytic activity, transporter
activity or binding, adenylate cyclase activity or Toll receptor
binding.
• Cytochrome c molecular function GO term is oxidoreductase activity
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO) Terms fall into three categories
• Biological process
• A series of events accomplished by one or more ordered assemblies
of molecular functions.
• Examples of biological process terms are cellular physiological
process or signal transduction, pyrimidine metabolism or
alpha-glucoside transport.
• Cytochrome c biological process GO terms are oxidative
phosphorylation and induction of cell death
RNA-SEQ: FUNCTION PROFILING
• Conesa et al. Genome Biology (2016) 17:13
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• ENCODE project (Nature, 2007)
• Examined 1% of the genome (~30Mb)
• “The human genome is pervasively transcribed, such that the majority of its bases
are associated with at least one primary transcript and many transcripts link distal
regions to established protein coding loci.”
• “Many novel non-protein-coding transcripts have been identified, with many of
these overlapping protein-coding loci and others located in regions of the genome
previously thought to be transcriptionally silent.”
• “Numerous previously unrecognized transcription start sites have been identified,
many of which show chromatin structure and sequence-specific protein binding
properties similar to well-understood promoters.”
• 74% of bases are represented in a primary transcript with evidence coming from 2
or more experimental technologies
• This project was published prior to more advanced techniques being developed
and its conclusions have been contested.
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• ENCODE project (Nature, 2012)
• “…assign biochemical functions for 80% of the genome, in particular outside of the
well-studied protein-coding regions.”
• “We define a functional element as a discrete genome segment that
• 1 - encodes a defined product (e.g. protein or non-coding RNA)
• 2 - or displays a reproducible biochemical signature (e.g. protein-binding or a specific
chromatin structure).”
• Even more criticism of this work (and I think deservedly so).
• The definition above is much too loose and allows for just about anything to be
considered ‘functional’.
• Essentially, anything that produces a transcript or is bound by a protein is ‘functional’.
• Criticized most soundly, in my opinion, by Dan Graur in: “On the immortality of
television sets: ‘Function’ in the human genome according to the evolution-free
gospel of ENCODE”
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• On the immortality of television sets: ‘Function’ in the human genome
according to the evolution-free gospel of ENCODE (GBE 2013)
• Main points – “This absurd conclusion was reached through…”
• “employing the seldom used ‘causal role’ definition of biological function and then
applying it inconsistently to different biochemical properties”
• “committing a logical fallacy known as ‘affirming the consequent’”
• “failing to appreciate the crucial difference between ‘junk DNA’ and ‘garbage
DNA’”
• “using analytical methods that yield biased errors and inflate estimates of
functionality”
• “favoring statistical sensitivity over specificity”
• “emphasizing statistical significance rather than the magnitude of the effect.”
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• Main points – “This absurd conclusion was reached through…”
• “Employing the seldom used ‘causal role’ definition of biological function….”
• What is the meaning of ‘function’?
• Selected effect definition is historical and evolutionary
• For a trait, T, to have a proper biological function, F, it is necessary and sufficient that
two conditions hold
• 1 – T originated as a reproduction of some prior trait that performed F (or something
similar) in the past
• 2 – T exists because of F
• The ‘selected effect’ function of a trait is the effect for which it was selected or is
maintained
• Causal role definition
• For a trait, Q, to have a causal role, function G, it is necessary and sufficient that Q
performs G.
• The heart
• Selected effect – to pump blood, Causal role – to add mass to the body
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• Main points – “This absurd conclusion was reached through…”
• “Employing the seldom used ‘causal role’ definition of biological function….”
• Two identical sequences (TATAAA) in the genome at distinct loci
• Instance 1 has been selected for and maintained by natural selection with the
effect of binding a transcription factor to initiate gene expression
• Instance 2 has arisen by chance, but because of its sequence, can also bind a
transcription factor but probably has no impact on function
• Instance 1 – selected effect, Instance 2 – causal role
• “ENCODE adopted a strong version of the causal role definition of function,
according to which a functional element is a discrete genome segment that
produces a protein or an RNA or displays a reproducible biochemical signature
(e.g., protein binding). Oddly, ENCODE not only uses the wrong concept of
functionality, it uses it wrongly and inconsistently.”
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• Committing a logical fallacy known as ‘affirming the consequent’
• If P, then Q. Q. Therefore, P.
• According to ENCODE, DNA segments that ‘function’ in a process (e.g. gene
regulation) tend to display a certain property (e.g. transcription factor binding)
• Another DNA segment displays said property (e.g. it binds a transcription factor)
• Therefore, the DNA segment is functional (e.g. is involved in gene regulation)
• All ‘nopes’ apply.
• One of my favorite passages, “the ENCODE authors singled out transcription as a
function, as if the passage of RNA polymerase through a DNA sequence is in some
way more meaningful than other functions. But, what about DNA polymerase and
DNA replication? Why make a big fuss about 74.7% of the genome that is
transcribed, and yet ignore the fact that 100% of the genome takes part in a
strikingly “reproducible biochemical signature”—it replicates!”
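That the argument form "If P, then Q. Q. Therefore, P." is invalid can be checked mechanically by enumerating truth values. In the sketch below, P stands for "the segment is functional in gene regulation" and Q for "the segment binds a transcription factor". The function names are illustrative.

```python
from itertools import product

def implies(p, q):
    """Material implication: 'if p then q'."""
    return (not p) or q

def counterexamples():
    """Truth assignments (P, Q) where both premises hold
    ('if P then Q' and 'Q') but the conclusion P is false."""
    return [(p, q) for p, q in product([True, False], repeat=2)
            if implies(p, q) and q and not p]

print(counterexamples())  # [(False, True)]
```

The single counterexample is a segment that binds a transcription factor (Q true) without being functional (P false), which is exactly the case the criticism points to.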
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• Committing a logical fallacy known as ‘affirming the consequent’
• “Transcription ≠ function”
• “Histone modification ≠ function”
• “Open chromatin ≠ function”
• “Transcription factor binding ≠ function”
• “DNA methylation ≠ function”
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• “Failing to appreciate the crucial difference between ‘junk DNA’ and ‘garbage
DNA’”
• Misconceptions about ‘junk DNA’
• 1 – lack of knowledge of original definition
• 2 – belief that evolution can always get rid of nonfunctional DNA
• 3 – belief that ‘future potential’ constitutes ‘function’
• Original definition of junk DNA – a genomic segment on which selection does not
operate.
• “This sense of the term “junk DNA” was used by Jacob (1977) in his famous paper
“Evolution and Tinkering”: “[N]atural selection does not work as an engineer … It
works like a tinkerer—a tinkerer who does not know exactly what he is going to
produce but uses whatever he finds around him whether it be pieces of string,
fragments of wood, or old cardboards … The tinkerer … manages with odds and
ends. What he ultimately produces is generally related to no special project, and it
results from a series of contingent events, of all the opportunities he had to enrich
his stock with leftovers.”
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• “Failing to appreciate the crucial difference between ‘junk DNA’ and ‘garbage
DNA’”
• Misconceptions about ‘junk DNA’
• 1 – lack of knowledge of original definition
• 2 – belief that evolution can always get rid of nonfunctional DNA
• 3 – belief that ‘future potential’ constitutes ‘function’
• “Evolution can only produce a genome devoid of “junk” if and only if the effective
population size is huge and the deleterious effects of increasing genome size are
considerable.”
• In bacteria, this generally applies. Generation time is correlated with genome size
and effective population sizes are enormous.
• In eukaryotes, not so much.
HOW MUCH OF THE HUMAN GENOME IS
FUNCTIONAL?
• “Failing to appreciate the crucial difference between ‘junk DNA’ and ‘garbage
DNA’”
• Misconceptions about ‘junk DNA’
• 1 – lack of knowledge of original definition
• 2 – belief that evolution can always get rid of nonfunctional DNA
• 3 – belief that ‘future potential’ constitutes ‘function’
• Teleology – the philosophy that nature has goals
• “Junk DNA may, in fact, exhibit a very similar behavior to the regular junk in one's
garage, which is kept for years and years, and then thrown out a day before it may
become useful.”
• “Some years ago I noticed that there are two kinds of rubbish in the world and that
most languages have different words to distinguish them. There is the rubbish we
keep, which is junk, and the rubbish we throw away, which is garbage. The excess
DNA in our genomes is junk, and it is there because it is harmless, as well as being
useless, and because the molecular processes generating extra DNA outpace those
getting rid of it. Were the extra DNA to become disadvantageous, it would become
subject to selection, just as junk that takes up too much space, or is beginning to
smell, is instantly converted to garbage … ”. Brenner 1998
AN EVOLUTIONARY CLASSIFICATION OF GENOMIC
FUNCTION
D. GRAUR, Y ZHENG, RBR AZEVEDO
• The pronouncements of the ENCODE Project Consortium regarding “junk DNA” exposed the
need for an evolutionary classification of genomic elements according to their selected-effect
function. In the classification scheme presented here, we divide the genome into “functional
DNA,” i.e., DNA sequences that have a selected-effect function, and “rubbish DNA,” i.e.,
sequences that do not. Functional DNA is further subdivided into “literal DNA” and
“indifferent DNA.” In literal DNA, the order of nucleotides is under selection; in indifferent
DNA, only the presence or absence of the sequence is under selection. Rubbish DNA is further
subdivided into “junk DNA” and “garbage DNA.” Junk DNA neither contributes nor detracts
from the fitness of the organism and, hence, evolves under selective neutrality. Garbage DNA,
on the other hand, decreases the fitness of its carriers. Garbage DNA exists in the genome
only because natural selection is neither omnipotent nor instantaneous. Each of these four
functional categories can be (1) transcribed and translated, (2) transcribed but not translated,
or (3) not transcribed. The affiliation of a DNA segment to a particular functional category may
change during evolution: functional DNA may become junk DNA, junk DNA may become
garbage DNA, rubbish DNA may become functional DNA, and so on, however, determining the
functionality or nonfunctionality of a genomic sequence must be based on its present status
rather than on its potential to change (or not to change) in the future. Changes in functional
affiliation are divided into pseudogenes, Lazarus DNA, zombie DNA, and Hyde DNA.
• Figure (decision tree): Selected-effect function?
• Yes → Does sequence matter? (Yes = literal DNA, No = indifferent DNA)
• No → Selectively neutral? (Yes = junk DNA, No = garbage DNA)
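The classification scheme described in the abstract amounts to a two-level decision tree, which can be written as a small function. This is a sketch based only on the wording of the abstract. The function and parameter names are assumptions, and deciding the answers for a real genomic segment is the hard part that no code here addresses.

```python
def classify_dna(selected_effect_function, sequence_matters=None,
                 selectively_neutral=None):
    """Assign a segment to one of the four categories from the abstract.

    selected_effect_function: does the segment have a selected-effect function?
    sequence_matters: for functional DNA, is the nucleotide order under selection?
    selectively_neutral: for rubbish DNA, does it leave fitness unchanged?
    """
    if selected_effect_function:
        if sequence_matters is None:
            raise ValueError("sequence_matters is required for functional DNA")
        return "literal DNA" if sequence_matters else "indifferent DNA"
    if selectively_neutral is None:
        raise ValueError("selectively_neutral is required for rubbish DNA")
    return "junk DNA" if selectively_neutral else "garbage DNA"

print(classify_dna(True, sequence_matters=True))       # literal DNA
print(classify_dna(True, sequence_matters=False))      # indifferent DNA
print(classify_dna(False, selectively_neutral=True))   # junk DNA
print(classify_dna(False, selectively_neutral=False))  # garbage DNA
```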
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO)
• Evidence terms – Experimental evidence
• Experimental (EXP)
• Direct Assay (IDA)
• Physical interaction (IPI)
• Mutant Phenotype (IMP)
• Genetic Interaction (IGI)
• Expression Pattern (IEP)
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO)
• Evidence terms – Computational evidence
• Sequence or Structural Similarity (ISS)
• Sequence Orthology (ISO)
• Sequence Alignment (ISA)
• Sequence Model (ISM)
• Genomic Context (IGC)
• Biological Aspect of Ancestor (IBA)
• Biological Aspect of Descendant (IBD)
• Key Residues (IKR)
• Rapid Divergence (IRD)
• Reviewed Computational Analysis (RCA)
RNA-SEQ: FUNCTION PROFILING
• Gene Ontology (GO)
• Author statement terms
• Traceable Author Statement (TAS)
• Non-traceable Author Statement (NAS)
• Curatorial statement terms
• Inferred by Curator (IC)
• No Biological Data (ND)
• Automatically assigned evidence term
• Inferred from Electronic (automated) Annotation (IEA)
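The evidence codes listed on these slides can be kept in a simple lookup table, for example to restrict an annotation set to experimentally supported terms. The sketch below uses only the codes named above. The annotation tuples, gene names, and function names are hypothetical and do not follow any particular file format.

```python
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC",
                      "IBA", "IBD", "IKR", "IRD", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curatorial statement": {"IC", "ND"},
    "automatic": {"IEA"},
}

def evidence_group(code):
    """Return the evidence category for a GO evidence code."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code}")

def filter_annotations(annotations, allowed_groups):
    """Keep (gene, term, code) tuples whose code is in an allowed category."""
    return [a for a in annotations if evidence_group(a[2]) in allowed_groups]

# Hypothetical annotations: (gene, GO term, evidence code)
annotations = [
    ("geneA", "oxidoreductase activity", "IDA"),
    ("geneA", "oxidative phosphorylation", "IEA"),
    ("geneB", "signal transduction", "ISS"),
]
print(filter_annotations(annotations, {"experimental"}))
```

Filtering like this is a judgment call: dropping electronically inferred annotations raises confidence in each term but can remove most annotations for a poorly studied species.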