Paper PR03-2009
Translation of Drug Metabolic Enzyme and Transporter (DMET) Genetic Variants
into Star Allele Notation using SAS
Mark Farmen, Eli Lilly and Company, Indianapolis, IN
William Koh, Lilly Singapore CDD, Republic of Singapore
Sandra L Close, Eli Lilly and Company, Indianapolis, IN
ABSTRACT
Conversion of genotyping results from metabolic enzyme and transporter (DMET) genetic assays into the consensus
star allele nomenclature is necessary for clinical utilization of DMET genetics. The problem involves translation of
variant genotypes, such as single nucleotide polymorphisms (SNPs) or small insertion-deletion events, into gene level
star allele nomenclature. Genetic variants in the DMET genes can occur in combinations. In some instances a single
variant defines a star allele; more often a combination of variants does. When and how these variants combine is often poorly
understood. However, given adequate definitions, a simple algorithm based on vector addition and comparison can
be easily implemented in SAS Proc IML to perform the conversion. The simplicity and transparency of this algorithm
can help de-mystify the translation process and allow statisticians and data management experts to handle this new
type of patient information.
INTRODUCTION
The cytochrome P450 superfamily of enzymes (CYPs), together with other enzyme classes and transport proteins,
have important roles in the uptake, distribution, metabolism and excretion of a host of therapeutic drugs and other
xenobiotic molecules (Lewis (2005) and Cascorbi (2006)). Extensive literature evidence exists that significant
variability in individual drug disposition and response is commonplace. Much of this observed heterogeneity is
believed to be due to the underlying genetic variation found in metabolism and transport genes. The functional
consequences or predicted phenotypes resulting from genetic variation, such as single nucleotide polymorphisms,
deletions, insertions, and gene duplications, in these enzymes and transporters can markedly influence drug
pharmacokinetics or alter efficacy and/or toxicity profiles. An early example includes the metabolism of the
hypertension drug, debrisoquine, which was shown to be influenced by common genetic variants in the CYP2D6 gene
(Gonzalez et al. (1988)). More recently, Mega et al. (2009) demonstrated that genetic variation in CYP2C19 affects
clopidogrel metabolism with subsequent effects on pharmacodynamic response and cardiovascular event rates.
DNA can be thought of as a sequence of nucleotides denoted by A, C, G and T. Locations along the DNA sequence
that differ from person to person in the nucleotide(s) present at that location are called genetic variants. Currently,
relatively inexpensive genotyping platforms can readily determine point variations, by determining which nucleotide
(A, C, G or T) pair exists at a distinct location along each of the chromosomes for a patient (see Figure 1). This
common type of genetic variant is called a single nucleotide polymorphism (SNP). The variant is defined by its
location on the chromosome and the composition of nucleotides seen at this locus. Since any given gene is
represented by two copies (one on the paternal chromosome and one on the maternal chromosome), data collected
from these platforms is a genotype consisting of a pair of nucleotides (e.g. CC, TC or TT - for a C/T variant) at the
locus. However, in many cases information about which of the two chromosomes contributed which alleles is not
known due to limitations of the genotyping technology. In addition to SNPs, another equally important type of
biallelic genetic variation in metabolism and transport genes is the insertion/deletion (IN/DEL). At a locus
within a gene, nucleotides are either present (insertion) or missing (deletion) (see Figure 2).
[Figure 1 image: CYP4B1 gene from one of two chromosomes illustrating variant 517C>T. Sequence along the DNA location axis: …AGTTATCCAG … AGAGAAAGCT[C (>T)]GGGAGGGTAA …, with the variant locus at 517 bp. C is the major allele (wild type); T is the minor allele (mutant).]
Figure 1. Example genetic variant 517C>T from http://www.cypalleles.ki.se/cyp4b1.htm. The units along the axis are
base pairs, labeled with a nucleotide position number. At the relative location, 517, the usual C nucleotide (major
allele) can be a variant T nucleotide (minor allele). The presence of T changes how the resulting liver enzyme
functions.
[Figure 2 image: CYP4B1 gene from one of two chromosomes illustrating variant 881_882delAT. Sequence along the DNA location axis: …AGTTATCCAG … CCCTAACCCAGG [AT>(-)] GAAGATGACATC…, with the variant locus at 881 bp. The "-" indicates that AT are missing from the location (a DEL).]
Figure 2. Example genetic variant 881_882delAT from http://www.cypalleles.ki.se/cyp4b1.htm. At the relative
location 881-882, the major AT nucleotides (INS major allele) can be a “variant” deletion in which these two
nucleotides are missing from the sequence at this position (DEL minor allele indicates the missing “AT” at 881).
Again, the absence of the AT changes how a protein is made from the genetic template, resulting in liver enzyme
dysfunction for this allele.
The resulting data for the SNP at position 517 and the IN/DEL at 881 in allele pairs for four hypothetical patients is
shown in Table 1. For each of the loci, major (represented in greater than 50% of the population) and minor
(represented in less than 50% of population) alleles are customarily defined. Worth noting is patient 2 with one minor
T allele and one minor DEL allele. A limitation of almost all current genotyping platforms is the inability to determine whether
the 517C>T “T” is on the same chromosome as the 881_882delAT “DEL.” This lack of chromosomal information can
be problematic, as most genes often have multiple loci with defined variation on a particular chromosome (e.g.
maternal). Although much of this variation is rare, and observing them in combination is infrequent, some of the
known functional variation is quite common. The inability to determine if the person has one copy of the gene with two
minor variants or two copies of the gene with one minor variant represented in each copy may lead to difficulty in
interpreting the functional consequences. For some of the metabolism and transporter genes, there has been
extensive research on how these variants are inherited, which generally aids in inferring which chromosomes have
which of the observed variants (minor alleles).
patient   517C>T   881_882delAT
1         CC       INS/INS
2         CT       INS/DEL
3         TT       INS/DEL
4         CC       DEL/DEL
Table 1. Example genetic data on two variants for 4 patients.
For many genes in drug metabolism and transport, a “common consensus nomenclature” has been developed, based
upon the presence or absence of defined genetic variants. Translation of multiple locus level genetic variants within a
gene into this standardized nomenclature, known as star alleles, has become commonplace. Nebert and Ingelman-Sundberg
have maintained star allele nomenclature and annotations for many of the Cytochrome P450 genes
(http://www.cypalleles.ki.se/ and Sim and Ingelman-Sundberg (2006)). For some non-CYP drug metabolism enzymes
or transporters, there are similar sites that maintain star allele nomenclature, such as UDP-glucuronosyltransferase (UGT)
(http://galien.pha.ulaval.ca/alleles/alleles.html) and N-acetyltransferase (NAT)
(http://louisville.edu/medschool/pharmacology/NAT.html), as pointed out in Robarge et al. (2007). For clinical utility, a
transparent and flexible means of translating locus level variants to star allele genotypes is needed.
Star alleles are typically a multi-locus variant known as a haplotype (haploid meaning from one chromosome vs.
diploid from a pair of chromosomes). There are very powerful methods for determining haplotype pairs that are
consistent with a given set of locus level genotypes (e.g. Schaid et al. (2002)). These methods assume that the
haplotypes are unknown and do not make use of information outside the sample genotypes being analyzed. For
DMET genes, there is considerable literature on known gene haplotypes/star alleles. Given a set of variants at
multiple loci, publicly available information can be used by a clinical expert to create star allele definitions. These will
be referred to as translation tables. The translation involves finding star allele pairs from the translation table that are
consistent with a set of locus level genotypes.
In addition to the need to translate genetic information appropriately, when genotyping is performed on a single or few
variants rather than comprehensively for the gene, missing information regarding these multilocus variants may lead
to misinterpretation of the patient’s function. In the quest to utilize pharmacogenetic information for tailoring drug
prescribing, a multiplex platform has been developed by Affymetrix® to genotype variants at roughly 1000 loci in Drug
Metabolic Enzyme and Transporter (DMET™) genes (Dumaul et al. (2007) and Daly et al. (2007)). The data
discussed here was generated by an early version of the Affymetrix DMET™ Plus Premier Pack
http://www.affymetrix.com/products_services/arrays/specific/dmet.affx. This report defines efforts to create an
automated translating algorithm which transforms individual genetic variant data for 22 genes and 165 variants into
the common consensus * allele nomenclature.
TRANSLATION TABLE
To demonstrate the method, the cytochrome P450 enzyme CYP4B1 was selected. The Karolinska website lists
variants at 7 loci that are used to define the star alleles (see Table 2). The “nucleotide changes” field lists the locus
level variants within the gene that must be present on a chromosome for that chromosome to have the indicated star
allele. For example, 517C>T indicates that the major allele, C, is replaced by variant or minor allele, T, at the 517
nucleotide position in the CYP4B1 gene. If this variant occurs on a chromosome without any other listed variants
then the chromosome has the CYP4B1*3 allele. The star allele provides a description of one copy of the patient’s
gene.
It is useful to think of the star alleles as vectors and the variants as vector components (i.e. dimensions or fields).
Table 3 lists variants as fields and star alleles are defined by an “x” indicating that this variant must be present to
define the allele. No “x” means the minor allele of that variant must not be present. Table 3 is a translation table. Comparing Tables 2
and 3, the CYP4B1*6 allele and the 1033G>A variant are missing from Table 3. In a multiplex assay, one
of the several variants used to define an allele is occasionally missing. When markers are missing, decisions have to be
made about how to modify the translation table in terms of its star allele definitions. This requires intimate knowledge
of the gene and its variants. Missing locus level variants (i.e. not in the multiplex assay) cause some star alleles to be
indistinguishable from one another. The last section shows manipulations of Table 2 as a SAS dataset that identify these
“aliased” star alleles. The logic is very similar to translation, so many of the details behind the
code are left to the reader. For CYP4B1, only the 1033G>A variant is missing, and CYP4B1*6 cannot be
distinguished from CYP4B1*3. For demonstration purposes, the variant pattern with an x for 517C>T only, which is
indistinguishable between CYP4B1*6 and CYP4B1*3, will be assigned the star allele CYP4B1*3.
The translation table, shown in Table 3, will be used to translate locus level variant data into gene level star allele
pairs (i.e. star allele genotypes). As mentioned above, the “x” indicates the variant(s) that defines the star allele. For
a copy of the gene (1 chromosome), the minor allele must be present for the indicated variants and the major allele
for the other variants in order for the gene copy to be the indicated star allele. Using 0 to indicate the major allele and
1 to indicate the minor allele, the star allele, CYP4B1*2, is present on a specific chromosome if the pattern of locus
level variants is 0-1-0-1-1-1.
Allele     Protein    Nucleotide changes, cDNA                 Effect                                                In_vivo activity  In_vitro activity  References
CYP4B1*1   CYP4B1.1   None                                     None                                                  Normal            Normal
CYP4B1*2              881_882delAT; 993G>A; 1018C>T; 1123C>T   294Frameshift (premature stop); M331I; R340C; R375C                                        LoGuidice et al, 2002
CYP4B1*3   CYP4B1.3   517C>T                                   R173W                                                                                      LoGuidice et al, 2002
CYP4B1*4   CYP4B1.4   964A>G                                   S322G                                                                                      LoGuidice et al, 2002
CYP4B1*5   CYP4B1.5   993G>A                                   M331I                                                                                      LoGuidice et al, 2002
CYP4B1*6   CYP4B1.6   517C>T; 1033G>A                          R173W; V345I                                                                               Hiratsuka et al., 2004
CYP4B1*7   CYP4B1.7   881_882delAT; 993G>A; 1018C>T            294Frameshift (premature stop); M331I; R340C                                               Hiratsuka et al., 2004
Additional SNPs, where the haplotype has not yet been determined
                      1061T>G                                  F354C                                                                                      NCBI dbSNP
Table 2. Allele nomenclature table taken from http://www.cypalleles.ki.se/cyp4b1.htm on 02/25/2009. The Nucleotide
Changes field lists the variants that define each star allele. A copy of the gene must have the listed variants without
any of the other variants in order for it to be the indicated star allele.
However, the locus level data consists of genotypes from two chromosomes, not just one, necessitating that the
algorithm handle genetic data from the chromosome pairs simultaneously. For the following example, reference
Table 3 and the minor alleles. If the locus level variants, ordered as in Table 3, have patient genotypes CC, DEL/DEL,
AA, AA, TT, TT, then the translation would be star allele genotype CYP4B1*2/CYP4B1*2 or *2/*2 (a *2 homozygote
for CYP4B1). Two of the *2 haplotypes, C-DEL-A-A-T-T, are the only star allele pair consistent with this set of variant
level genotypes. At first glance, the translation appears complex. However, the simple vector algorithm described
below makes finding consistent star allele pairs relatively simple.
TRANSLATION ALGORITHM
Given a set of biallelic variants, the star alleles or gene haplotypes can be converted to a vector of zeros and ones.
The variants can be ordered by their relative location on the gene or by any other consistent ordering. The value of 1 is
assigned if the minor allele of the variant is in the star allele and zero otherwise. For the star alleles in Table 3
ordered by the locus of the individual variants, *1=(0,0,0,0,0,0), *2=(0,1,0,1,1,1), *3=(1,0,0,0,0,0), *4=(0,0,1,0,0,0),
*5=(0,0,0,1,0,0), and *7=(0,1,0,1,1,0).
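As an illustration of this encoding, the following short Python sketch (not part of the paper's SAS implementation; the names VARIANTS, DEFINITIONS and to_vector are hypothetical) builds the 0/1 vectors from the “x” marks of Table 3. It assumes the variants are ordered by locus, as in the text.

```python
# Variant order follows Table 3, left to right (ordered by locus).
VARIANTS = ["517C>T", "881_882delAT", "964A>G", "993G>A", "1018C>T", "1123C>T"]

# Each star allele is defined by the variants marked with an "x" in Table 3.
DEFINITIONS = {
    "*1": [],
    "*2": ["881_882delAT", "993G>A", "1018C>T", "1123C>T"],
    "*3": ["517C>T"],
    "*4": ["964A>G"],
    "*5": ["993G>A"],
    "*7": ["881_882delAT", "993G>A", "1018C>T"],
}


def to_vector(defining_variants, variants=VARIANTS):
    """Return a tuple with 1 where the minor allele defines the star allele, else 0."""
    return tuple(1 if v in defining_variants else 0 for v in variants)


STAR_VECTORS = {name: to_vector(d) for name, d in DEFINITIONS.items()}

for name, vec in STAR_VECTORS.items():
    print(name, vec)
```

Keeping the variant order in one list is the important design point: every star allele vector and every patient vector must use the same ordering for the later comparisons to be meaningful.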
CYP4B1
refSNP ID            rs4646487               NA                              rs45467195   rs2297810               rs4646491         rs2297809
Genomic variant      517C>T                  881_882delAT                    964A>G       993G>A                  1018C>T           1123C>T
Amino acid changes   R173W                   294Frameshift (premature stop)  S322G        M331I                   R340C             R375C
Affy DMET chip (v1)  CYP4B1star3(RS4646487)  CYP4B1star2_881delAT            CYP4B1star4  CYP4B1star5(RS2297810)  CYP4B1star2_R340  CYP4B1star2_R375
Validated            N                       N                               N            N                       N                 N
Haplotype            Y                       Y                               Y            Y                       Y                 Y
Major allele         C                       INS                             A            G                       C                 C
Minor allele         T                       DEL                             G            A                       T                 T
*1
*2                                           x                                            x                       x                 x
*3                   x
*4                                                                           x
*5                                                                                        x
*7                                           x                                            x                       x
Table 3. Translation table for the CYP4B1 star alleles. In a variant name, the nucleotide after “>”, or the “del”/“ins” label, indicates the variant or minor
allele. The major and minor alleles are listed again in the table for clarity. The variant name used by Affymetrix® is
listed to identify the data (variant order must be controlled). The genetic variants are ordered from a starting location
in the gene. For this particular gene, some star alleles are defined not by a single variant but by a combination, or
haplotype. The variants that define the star allele for the gene on a specific chromosome are indicated with an “x” for
each of the star alleles *1-*5 and *7. *1 has no variants and it is C-INS-A-G-C-C or 0-0-0-0-0-0 in numbers of variants
at each locus. The *7 allele has 3 variants and it is C-DEL-A-A-T-C or 0-1-0-1-1-0.
The patient diploid genotypes (from the chromosome pair) can also be converted to a numeric vector. The elements
of the vector are the number of minor alleles at each locus (e.g. SNP). The values at any given locus are 0, 1 or 2:
0=the variant (minor allele) is not present on either chromosome at the locus,
1=the variant (minor allele) is present on one chromosome at the locus but not the other (heterozygous),
2=the variant (minor allele) is present on both chromosomes at the locus.
Converting the genotypes into numbers is actually a better representation of the data. A heterozygote such as CT is
now 1 with no possible implication of which nucleotide is on which chromosome.
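The coding step can be sketched in a few lines of Python (an illustration only, not the paper's SAS code; the function name minor_allele_count and the genotype string formats are assumptions based on Table 1, where SNP genotypes look like "CT" and IN/DEL genotypes look like "INS/DEL").

```python
def minor_allele_count(genotype, minor):
    """Count copies of the minor allele (0, 1 or 2) in a two-allele genotype.

    Accepts SNP genotypes such as "CT" and slash-separated genotypes
    such as "INS/DEL". The order of the two alleles does not matter.
    """
    alleles = genotype.split("/") if "/" in genotype else list(genotype)
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    return sum(1 for allele in alleles if allele == minor)


# Patient 2 of Table 1: CT at 517C>T and INS/DEL at 881_882delAT.
patient2 = [minor_allele_count("CT", "T"), minor_allele_count("INS/DEL", "DEL")]
print(patient2)
```

Because "CT" and "TC" both map to 1, the numeric code carries no implied phase, which is the point made in the paragraph above.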
The algorithm simply loops through all possible star allele pair vectors and compares their vector sum to the numeric
patient vector that is defined by the genotypes. This is a double loop with the first index going from first to last star
allele and the second index going from the star allele of the first index to the last star allele. A star allele pair that has
a vector sum equal to the patient numeric genotypes vector is a star allele genotype that is consistent with the
patient’s variant level genotypes. For the six star alleles of our CYP4B1 translation (Table 3), the sum of all possible
pairs of star alleles can be listed (as the loop would run). Figure 3 shows the vector sum of each CYP4B1 star allele
pair as a column vector. Comparing the patient genotype vector to the vector sums is the basis of the algorithm used
to find the star allele genotype(s) that is consistent with the patient’s variant level data.
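The double loop and the comparison can be sketched as follows. This is a Python illustration under the assumption of complete patient data (no missing genotypes); the names STAR, pair_sums and consistent_pairs are hypothetical and do not appear in the paper's SAS code.

```python
# Star allele vectors from Table 3, variants ordered by locus.
STAR = {
    "*1": (0, 0, 0, 0, 0, 0),
    "*2": (0, 1, 0, 1, 1, 1),
    "*3": (1, 0, 0, 0, 0, 0),
    "*4": (0, 0, 1, 0, 0, 0),
    "*5": (0, 0, 0, 1, 0, 0),
    "*7": (0, 1, 0, 1, 1, 0),
}


def pair_sums(star):
    """Yield ((a, b), vector sum) for every unordered star allele pair, a before or equal to b."""
    names = list(star)
    for j, a in enumerate(names):
        for b in names[j:]:  # second index starts at the first index
            yield (a, b), tuple(x + y for x, y in zip(star[a], star[b]))


def consistent_pairs(patient, star=STAR):
    """Return the star allele genotypes whose vector sum equals the patient vector."""
    return [f"{a}/{b}" for (a, b), total in pair_sums(star) if total == tuple(patient)]


print(len(list(pair_sums(STAR))))            # number of candidate genotypes
print(consistent_pairs((0, 2, 0, 2, 2, 2)))  # the *2 homozygote example
```

With six star alleles there are 6*7/2 = 21 candidate pairs, so an exhaustive search is cheap for this gene.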
*1/*1 = (0,0,0,0,0,0)   *1/*2 = (0,1,0,1,1,1)   *1/*3 = (1,0,0,0,0,0)
*1/*4 = (0,0,1,0,0,0)   *1/*5 = (0,0,0,1,0,0)   *1/*7 = (0,1,0,1,1,0)
*2/*2 = (0,2,0,2,2,2)   *2/*3 = (1,1,0,1,1,1)   *2/*4 = (0,1,1,1,1,1)
*2/*5 = (0,1,0,2,1,1)   *2/*7 = (0,2,0,2,2,1)   *3/*3 = (2,0,0,0,0,0)
*3/*4 = (1,0,1,0,0,0)   *3/*5 = (1,0,0,1,0,0)   *3/*7 = (1,1,0,1,1,0)
*4/*4 = (0,0,2,0,0,0)   *4/*5 = (0,0,1,1,0,0)   *4/*7 = (0,1,1,1,1,0)
*5/*5 = (0,0,0,2,0,0)   *5/*7 = (0,1,0,2,1,0)   *7/*7 = (0,2,0,2,2,0)
Figure 3. CYP4B1 vector sums for each pair of star alleles that could be a possible star allele genotype. The
algorithm is comparing the numeric patient data vectors to these vector sums to find consistent star allele genotypes.
For missing patient locus variants, a weight matrix must be added to the comparison of star allele haploid pairs with
the patient’s numeric genotype vector. Setting the weight to 0 for a variant with a missing patient genotype means that
variant is ignored when the star allele vector sums are compared. The result is generally more pairs of star
alleles that are consistent with the patient’s variant level genotype data. However, there are many cases where the
missing variant does not affect which star allele pairs are consistent with the non-missing patient genotypes.
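A minimal Python sketch of the weighting idea follows. It is an illustration rather than the paper's implementation: missing genotypes are represented here by None (an assumption made for readability), and the names STAR and consistent_pairs_weighted are hypothetical.

```python
STAR = {
    "*1": (0, 0, 0, 0, 0, 0),
    "*2": (0, 1, 0, 1, 1, 1),
    "*3": (1, 0, 0, 0, 0, 0),
    "*4": (0, 0, 1, 0, 0, 0),
    "*5": (0, 0, 0, 1, 0, 0),
    "*7": (0, 1, 0, 1, 1, 0),
}


def consistent_pairs_weighted(patient, star=STAR):
    """Find star allele pairs that agree with the patient at every non-missing locus.

    A missing genotype (None) gets weight 0, so the difference at that
    locus is multiplied away and cannot cause a mismatch.
    """
    weights = [0 if g is None else 1 for g in patient]
    observed = [0 if g is None else g for g in patient]
    names = list(star)
    matches = []
    for j, a in enumerate(names):
        for b in names[j:]:
            diffs = [w * (o - x - y)
                     for w, o, x, y in zip(weights, observed, star[a], star[b])]
            if all(d == 0 for d in diffs):
                matches.append(f"{a}/{b}")
    return matches


print(consistent_pairs_weighted((None, 0, 0, 0, 0, 0)))  # missing 517C>T: several pairs fit
print(consistent_pairs_weighted((0, None, 0, 2, 0, 0)))  # missing 881_882delAT: still one pair
```

The two calls show both situations described above: a missing variant that widens the set of consistent pairs, and one that leaves the call unchanged.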
SAS Proc IML (Interactive Matrix Language) is ideal for implementing the algorithm and performing the vector
comparisons. It is also easy to track and store the text star allele genotypes, which have vector sums that match the
patient vector. The algorithm is not efficient but it is transparent. Since clinical translations are not always well
studied, transparency is crucial for working with a wide range of subject matter experts. Sometimes multiple versions
of the translation tables must be tested to find definitions that are consistent with observed genotypes. Literature on
some rare star alleles is quite sparse.
The translation algorithm has the following three limitations. First, the translation table must include all the important
haplotypes. Due to the random nature of cross-over events during meiosis, it is possible that some important
haplotypes have not been observed. Second, the variants are assumed to be of a biallelic type (normal vs. mutant, i.e. two possible versions at each locus). In the case of a variant that is not biallelic, the second limitation can often be
overcome, since one locus with several (say n) variants can be represented by at most an n(n-1)/2 set of vector
components. Third, copy number variants, or cases where patients do not have two gene copies, cannot easily be
included in the translation. Inclusion of copy number variants is of high importance due to their functional role in
some genes such as CYP2D6.
ALGORITHM IN SAS
First, input SAS datasets are needed. The imported translation table dataset is shown in Table 4. Patient variant
level data is shown in Table 5. It is not uncommon for an assay measuring a locus variant to fail. A variant assay
failure is noted numerically with a -1. The early Affymetrix® platform also reported possible rare alleles. In some
cases, “possible rare allele” implied that at least one chromosome carried the minor allele. If it is known that the minor
allele is present on 1 or more of the chromosomes, then a -2 is entered instead of -1.
CYP4B1                       F2                      F3                              F4           F5                      F6                F7                Triallelic
refSNP ID                    rs4646487               NA                              rs45467195   rs2297810               rs4646491         rs2297809
Genetic nucleotide position  517C>T                  881_882delAT                    964A>G       993G>A                  1018C>T           1123C>T
Amino acid changes           R173W                   294Frameshift (premature stop)  S322G        M331I                   R340C             R375C
Affy DMET chip (v1)          CYP4B1star3(RS4646487)  CYP4B1star2_881delAT            CYP4B1star4  CYP4B1star5(RS2297810)  CYP4B1star2_R340  CYP4B1star2_R375
Validated                    N                       N                               N            N                       N                 N
Haplotype                    Y                       Y                               Y            Y                       Y                 Y
Major allele                 C                       INS                             A            G                       C                 C
Minor allele                 T                       DEL                             G            A                       T                 T
*1
*2                                                   x                                            x                       x                 x
*3                           x
*4                                                                                   x
*5                                                                                                x
*7                                                   x                                            x                       x
Table 4. SAS dataset pl.transtable_cyp4b1 derived from a standard import of the Excel translation table shown in
Table 3. The structure of the table (row/column position of information) is assumed by the macro.
Study_Protocol  Patient_Identifier  _02  _03  _04  _05  _06  _07
SUGI            eg05                  0   -1    0    1    1    1
SUGI            eg06                  1   -1    0    1    1    1
SUGI            eg01                  0    0    0    1    0    0
SUGI            eg07                  1   -1    0    1    1    0
SUGI            eg00                  0    2    0    2    2    2
SUGI            eg08                  1    1    0    1    1   -1
SUGI            eg03                  2    0    0    0    0    0
SUGI            eg02                  0    1    0    2    1    1
SUGI            eg04                 -1    0    0    0    0    0
SUGI            eg10                  0   -1    0    2    0    0
SUGI            eg09                  1    0    1    0    0    0
Table 5. SAS table: num_dat.transpose_CYP4B1 of patient locus level genotype data. Variables _02-_07 are the
variants listed in the order of the translation table. The genotypes have been converted to number of minor alleles.
The macro assumes numeric data is the locus level genotypes ordered as they are listed in the translation table from
left to right. The -1 is used to indicate missing data for a variant. Though the patient identifiers are artificial, the
variant level genotypes were observed on patient samples.
The algorithm is easiest to follow in the Proc IML code. SAS datasets are essential in manipulating the input data into
a vector/matrix form that can be fed into the algorithm. There is complexity in the macro because the code is
intended to work on an arbitrary gene’s translation table. The patient locus level genotypes must have been
converted to numeric form for the given gene.
*** start of translation code for gene defined by macro variable current_gene ***;
libname pl 'A directory';
libname num_dat 'A directory with patient data';
%let ncol=1;
%let current_gene=CYP4B1;
/***
Convert translation table from excel to 0/1 vector that defines star allele by the presence vs absence of the variant in
the star allele definition vectors.
***/
%macro get_gene(gene);
  /* Row 7 of the imported table: collect the names of character columns that are blank, so they can be dropped */
  data tr_&gene.;
    set pl.transtable_&gene.;
    array allvar{*} _character_;
    length name $15. vlist $300.;
    if (_N_=7) then do i=1 to dim(allvar);
      if allvar{i}=" " then do;
        call vname(allvar{i}, name);
        vlist=catx(' ',vlist,trim(name));
      end;
    end;
    if (_N_=7) then call symput('to_drop', left(trim(vlist)));
    drop name vlist i;
  run;
  /* Drop the blank columns, if any (the original listing read from libref cyp, which is never assigned; pl is used here) */
  %if not("&to_drop." = " ") %then %do;
    data tr_&gene.;
      set pl.transtable_&gene.;
      drop &to_drop.;
    run;
  %end;
  /* Number of variant columns = number of character columns minus the label column */
  data tr_&gene.;
    set tr_&gene.;
    array allvar{*} _character_;
    call symput('ncol', left(trim(put(dim(allvar)-1,best.))));
  run;
  /* From row 12 on, each row defines one star allele: an "x" becomes 1, a blank becomes 0 */
  data tr_&gene.;
    set tr_&gene.;
    array allvar{*} _character_;
    array variants{&ncol.} v1-v&ncol.;
    length star_al $5.;
    if _N_>=12 then do;
      star_al=allvar{1};
      do i=1 to &ncol.;
        if allvar{i+1}=" " then variants{i}=0;
        else variants{i}=1;
      end;
      if not(star_al=" ") then output;
    end;
    keep star_al v1-v&ncol.;
  run;
%mend;
%get_gene(&current_gene.);
/**********************************************************************************
star_al  v1  v2  v3  v4  v5  v6
*1        0   0   0   0   0   0
*2        0   1   0   1   1   1
*3        1   0   0   0   0   0
*4        0   0   1   0   0   0
*5        0   0   0   1   0   0
*7        0   1   0   1   1   0
tr_&gene as row vectors defining the star alleles (star_al) as shown above for CYP4B1
***********************************************************************************/
/***
macro performs the translation of the locus level genotypes that have been converted from character genotypes to
numeric number of minor alleles in the genotype (0, 1, 2). The code -1 is used for missing data and -2 is used for the
cases of “either 1 or 2” i.e. at least one minor allele is present in the genotype.
****/
%macro pros_gene(gene);
proc iml;
  /* star allele names and their 0/1 definition vectors (one row per star allele) */
  use tr_&gene.;
  read all var {star_al};
  read all var _num_ into vv;
  /* patient identifiers and numeric locus level genotypes (one row per patient) */
  use num_dat.transpose_&gene.;
  read all var {Study_Protocol Patient_Identifier};
  read all var _num_ into ww;
  wwt=t(ww);
  vvt=t(vv);
  nprobe = nrow(vvt);
  nallele = ncol(vvt);
  npat=nrow(ww);
  do i=1 to npat;
    /*** begin process patient i variant level data ***/
    wpat=wwt[,i];
    icall=loc(wpat>-1);
    i_nc=loc(wpat=-1);    /* loci with a missing genotype (no call) */
    i_pra=loc(wpat=-2);   /* loci with a possible rare allele */
    if nrow(i_nc)>0 then wpat[i_nc]=0;
    if nrow(i_pra)>0 then wpat[i_pra]=1;
    pgt={" "};
    sgt={" "};
    do j=1 to nallele;
      al1=vvt[,j];        /*** get star allele vector j ***/
      do k=j to nallele;
        al2=vvt[,k];      /*** get star allele vector k>= j ***/
        cgt=al1+al2;      /*** compute the sum of the j,k star allele pair ***/
        diffp=wpat-cgt;   /*** compare with patient i numeric locus genotype vector ***/
        pmatch=max(abs(diffp));
        if pmatch=0 then do;
          /** when match is found then store the star allele genotype **/
          pgt=concat(pgt,star_al[j],{"/"},star_al[k],{","});
        end;
        /****
        Code from this point on deals with missing data and accumulates the patient star allele genotype calls.
        ****/
        else do;
          if nrow(i_pra)>0 then smat1=min(cgt[i_pra]);
          else smat1=1;
          diffs=diffp;
          if nrow(i_nc)>0 then diffs[i_nc]=0;
          if nrow(i_pra)>0 then diffs[i_pra]=0;
          smatch=max(abs(diffs));
          if smat1>0 & smatch=0 then do;
            sgt=concat(sgt,star_al[j],{"/"},star_al[k],{","});
          end;
        end;
      end;
    end;
    if i=1 then do;
      pat_gt=Study_Protocol[1] || Patient_Identifier[1] || pgt || sgt;
    end;
    else do;
      pat_gt=pat_gt // (Study_Protocol[i] || Patient_Identifier[i] || pgt || sgt);
    end;
  end;
  print pat_gt;
  create pl.&gene.lcalls from pat_gt[colname={'Study_Protocol' 'Patient_Identifier' 'pgt' 'sgt'}];
  append from pat_gt;
quit;
%mend;
/****
Run macro to translate the data. Merge translation with raw data for review.
****/
%pros_gene(&current_gene.);
proc sort data=pl.&current_gene.lcalls;
by Study_Protocol Patient_Identifier;
run;
proc sort data=num_dat.transpose_&current_gene.;
by Study_Protocol Patient_Identifier;
run;
data pl.&current_gene.lcalls;
merge num_dat.transpose_&current_gene. pl.&current_gene.lcalls;
by Study_Protocol Patient_Identifier;
length gene_name $12.;
pgt=compress(pgt);
sgt=compress(sgt);
gene_name=upcase("&current_gene.");
run;
The translated patient data is stored in the table cyp4b1lcalls with the transposed locus level genotypes coded
numerically. Table 6 shows the translated results. The variable PGT is the primary genotype. The primary
genotype(s) is based on complete locus level data or in the case of missing variant data, the variant is assumed to be
0 (no minor alleles). The secondary genotypes (SGT) are possible star allele genotypes, which result from “non-wild
type” possible variants due to missing data. Table 5 can be compared to Figure 3 to see directly how the star allele
genotypes are matched to the patient locus level variant genotypes in vector representation.
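To make the PGT/SGT distinction concrete, the following Python sketch mirrors the logic of the Proc IML loop for the CYP4B1 example. It is an illustrative re-expression, not the authors' code; the names STAR and translate are hypothetical. The codes -1 (missing) and -2 (at least one minor allele) follow the convention described earlier.

```python
STAR = {
    "*1": (0, 0, 0, 0, 0, 0),
    "*2": (0, 1, 0, 1, 1, 1),
    "*3": (1, 0, 0, 0, 0, 0),
    "*4": (0, 0, 1, 0, 0, 0),
    "*5": (0, 0, 0, 1, 0, 0),
    "*7": (0, 1, 0, 1, 1, 0),
}


def translate(patient, star=STAR):
    """Return (primary, secondary) star allele genotype lists for one patient.

    Primary: pairs matching the patient vector after replacing -1 by 0
    and -2 by 1. Secondary: other pairs that match at all called loci and
    carry at least one minor allele at every locus coded -2.
    """
    no_call = {i for i, g in enumerate(patient) if g == -1}
    rare = {i for i, g in enumerate(patient) if g == -2}
    filled = [0 if g == -1 else 1 if g == -2 else g for g in patient]
    names = list(star)
    pgt, sgt = [], []
    for j, a in enumerate(names):
        for b in names[j:]:
            total = [x + y for x, y in zip(star[a], star[b])]
            if total == filled:
                pgt.append(f"{a}/{b}")
                continue
            rare_ok = all(total[i] > 0 for i in rare)
            called_ok = all(total[i] == filled[i]
                            for i in range(len(total))
                            if i not in no_call and i not in rare)
            if rare_ok and called_ok:
                sgt.append(f"{a}/{b}")
    return pgt, sgt


print(translate((-1, 0, 0, 0, 0, 0)))  # patient eg04 of Table 5
print(translate((1, 1, 0, 1, 1, -1)))  # patient eg08 of Table 5
```

Running the sketch on the rows of Table 5 gives the same primary and secondary genotypes as Table 6, which is a useful cross-check when a translation table is modified.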
Study_Protocol  Patient_Identifier  _02  _03  _04  _05  _06  _07  PGT     SGT
SUGI            eg00                  0    2    0    2    2    2  *2/*2,
SUGI            eg01                  0    0    0    1    0    0  *1/*5,
SUGI            eg02                  0    1    0    2    1    1  *2/*5,
SUGI            eg03                  2    0    0    0    0    0  *3/*3,
SUGI            eg04                 -1    0    0    0    0    0  *1/*1,  *1/*3,*3/*3,
SUGI            eg05                  0   -1    0    1    1    1          *1/*2,
SUGI            eg06                  1   -1    0    1    1    1          *2/*3,
SUGI            eg07                  1   -1    0    1    1    0          *3/*7,
SUGI            eg08                  1    1    0    1    1   -1  *3/*7,  *2/*3,
SUGI            eg09                  1    0    1    0    0    0  *3/*4,
SUGI            eg10                  0   -1    0    2    0    0  *5/*5,
Table 6. SAS dataset pl.cyp4b1lcalls. The star allele genotype(s) are in variables PGT (Primary Genotype) and
SGT (Secondary Genotype).
COLLAPSING HAPLOTYPES IN CONSTRUCTION OF TRANSLATION TABLES
Typically a genotyping platform will not have all the variants needed to identify all known star alleles. This results in
components of the star allele defining vectors being removed and previously distinct star allele patterns becoming
identical to other star alleles. Given a current list of star alleles, which has all necessary variants, one must determine
the star alleles that are aliased with one another when some necessary variants are removed. For CYP4B1, one can
visually compare CYP4B1*3 and CYP4B1*6 with the other star alleles. It is not hard to see that the loss of the
1033G>A variant will only make CYP4B1*3 indistinguishable from CYP4B1*6. No other star alleles are affected by
the loss of 1033G>A. However, there are much more complex CYP genes (see CYP2D6 in Appendix 1).
Fortunately, it is possible to use the vector idea for defining star alleles to determine which star alleles are aliased
with one another using SAS code.
Table 7 displays a SAS table containing the first few columns of the Karolinska CYP4B1 star allele nomenclature,
http://www.cypalleles.ki.se/cyp4b1.htm. The information is identical to Table 2 but care has been taken to put the
variant list (Nucleotide Changes) into one variable record with semicolons delimiting the variants. The SAS code in
Appendix 1 is used to systematically uncover which star alleles cannot be distinguished from one another. The SAS
code results, based on Table 7 as the cyp_raw input, are shown in Table 8. Note, the ordering of the variants is
changed due to the SAS transpose statement. However, the loss of the 1033G>A variant shows that *3 and *6 are
not distinguishable with the remaining locus variants because they now have the same pattern.
Allele     Protein    Variants cDNA
CYP4B1*1   CYP4B1.1   None
CYP4B1*2   CYP4B1.2   881_882delAT; 993G>A; 1018C>T; 1123C>T
CYP4B1*3   CYP4B1.3   517C>T
CYP4B1*4   CYP4B1.4   964A>G
CYP4B1*5   CYP4B1.5   993G>A
CYP4B1*6   CYP4B1.6   517C>T; 1033G>A
CYP4B1*7   CYP4B1.7   881_882delAT; 993G>A; 1018C>T
Table 7. CYP4B1 information from http://www.cypalleles.ki.se/cyp4b1.htm imported into a SAS table, cyp_raw, for
processing. This is the input information for the SAS dataset code in Appendix 1.
The code works on general nomenclature tables of the form listed on http://www.cypalleles.ki.se/ with data imported to
cyp_raw in the format of Table 7. The code is based on simple vector ideas presented for translation. First, one
constructs a matrix table with rows that are the vectors defining all star alleles. This is the current complete
translation table based on a reference (in this case Karolinska). Again, the fields are the locus variants that define the
star alleles. In order to construct this complete translation table, the information from the Karolinska website must be
imported into a SAS table and then transposed into the matrix of star allele row vectors.
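As an illustration of the import step, the Table 7 values could be typed directly into a DATA step rather than read from the website. This is only a minimal sketch: the pipe delimiter, the variable lengths and the use of DATALINES4 (chosen because the variant lists contain semicolons) are assumptions made for this example, not the import method used for the paper.

```sas
/* Hypothetical sketch: hand-enter Table 7 as the cyp_raw input table.   */
/* The variable names Allele and variants match those used in the        */
/* Appendix 1 code; the '|' delimiter and the lengths are assumptions.   */
data cyp_raw;
  infile datalines4 dlm='|' truncover;
  length Allele $20 Protein $20 variants $200;
  input Allele $ Protein $ variants $;
datalines4;
CYP4B1*1|CYP4B1.1|None
CYP4B1*2|CYP4B1.2|881_882delAT; 993G>A; 1018C>T; 1123C>T
CYP4B1*3|CYP4B1.3|517C>T
CYP4B1*4|CYP4B1.4|964A>G
CYP4B1*5|CYP4B1.5|993G>A
CYP4B1*6|CYP4B1.6|517C>T; 1033G>A
CYP4B1*7|CYP4B1.7|881_882delAT; 993G>A; 1018C>T
;;;;
run;
```

With cyp_raw defined this way, the Appendix 1 code should run as listed, since it only reads the Allele and variants variables.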
The next step is to keep only the variants in a particular assay panel. This is done with a keep statement. The star
allele row vectors are now converted to (single field) character variant patterns. With variants removed, these star
allele vectors and equivalent patterns are no longer unique. Star alleles that share a pattern can no longer be
distinguished from one another. These groups are listed in the variable allele_grp in the final table.
_1123C_T  _881_882delAT  _993G_A  _1018C_T  _517C_T  _964A_G  pattern   allele_grp
0         0              0        0         0        0        '000000   CYP4B1*1
0         0              0        0         0        1        '000001   CYP4B1*4
0         0              0        0         1        0        '000010   CYP4B1*3, CYP4B1*6
0         0              1        0         0        0        '001000   CYP4B1*5
0         1              1        1         0        0        '011100   CYP4B1*7
1         1              1        1         0        0        '111100   CYP4B1*2
Table 8. Output table cyp_condensed from the appendix SAS code applied to CYP4B1. The variable allele_grp lists the
star alleles that are not distinguishable from one another. The keep statement that creates the SAS table cyptt2 lists
all the variants except 1033G>A. As a result, the variant patterns are the same for CYP4B1*3 and CYP4B1*6; they
cannot be distinguished with the variants that we had in our early DMET platform.
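The groups that actually contain aliased star alleles can be pulled out of cyp_condensed with a short DATA step. The following is a sketch rather than part of the original program: the table name aliased and the variable n_alleles are hypothetical, and the first step is only a stand-in that retypes the Table 8 output (patterns without the leading quote) so that the example runs on its own.

```sas
/* Stand-in for the Appendix 1 output, typed from Table 8. */
data cyp_condensed;
  infile datalines dlm='|' truncover;
  length pattern $25 allele_grp $350;
  input pattern $ allele_grp $;
  datalines;
000000|CYP4B1*1
000001|CYP4B1*4
000010|CYP4B1*3, CYP4B1*6
001000|CYP4B1*5
011100|CYP4B1*7
111100|CYP4B1*2
;
run;

/* Hypothetical sketch: keep only patterns shared by two or more star alleles. */
data aliased;
  set cyp_condensed;
  /* allele_grp is comma-delimited, so the allele count is commas + 1 */
  n_alleles = count(allele_grp, ',') + 1;
  if n_alleles > 1;
  keep pattern allele_grp n_alleles;
run;

proc print data=aliased noobs;
run;
```

For the CYP4B1 example this leaves the single row for CYP4B1*3 and CYP4B1*6.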
The methods described here represent a simple, systematic and transparent process for translating genetic variants
into star alleles (e.g. Table 2). However, the “effect” information in column 4 of the Karolinska tables, where
contextually important hypertext is listed and annotated, is also of crucial importance. Currently this information
must be processed manually. It is essential when dealing with groups of star alleles that are aliased with one
another (see CYP2D6 in Appendix 1).
CONCLUSION
The amount of genetic information with clinical application is increasing (see, for example,
www.fda.gov/Cder/guidance/6400fnl.pdf, page 4). Improvements in data processing and analysis are necessary in
order to maximize the usefulness of this information for clinicians. Efficient algorithms and implementations can lead
to large efficiency gains in both developing and using genetic assays. This manuscript describes how SAS may be
used to simplify the translation of genetic variation into the common drug metabolizing enzyme and transporter
(DMET) nomenclature. Algorithms that translate the often unfamiliar or uninterpretable genetic variants into the star
allele nomenclature recognized by clinicians will increase the interpretability, and thus the utility, of the results in a
clinical setting. In a SAS environment, the translations can be made with current information from the literature and
from websites such as Karolinska. However, intimate knowledge of both the biological relevance and the
mathematical structure is required for any one individual to recognize this potential. Close partnership and
collaboration among statisticians, programmers, data management experts and the clinician or biologist, often
requiring persistence, is essential to bring the potential to fruition.
REFERENCES
Cascorbi I, (2006). Genetic basis of toxic reactions to drugs and chemicals. Toxicol. Lett. 162(1), 16-28.
Daly TM, Dumaual CM, Miao X. (2007), Multiplex assay for comprehensive genotyping of genes involved in drug
metabolism, excretion, and transport. Clin Chem. 53(7):1222-1230.
Dumaual C, Miao X, Daly TM, et al. (2007), Comprehensive assessment of metabolic enzyme and transporter genes
using the Affymetrix Targeted Genotyping System. Pharmacogenomics. 8(3):293-305.
Gonzalez, F.J., Skoda, R.C., Kimura, S., Umeno, M., Zanger, U.M., Nebert, D.W. (1988), Characterization of the
common genetic defect in humans deficient in debrisoquine metabolism. Nature 331: 442–446.
Lewis DF, (2005). Human P450s in the metabolism of drugs: molecular modeling of enzyme-substrate interactions.
Expert Opin. Drug Metab. Toxicol, 1(1), 5-8.
Mega JL, Close S, Wiviott SD, Shen, L, Hockett RD, Brandt JT, Walker JR., Antman EM., Macias W, Braunwald E,
and Sabatine MS (2009), Cytochrome P-450 Polymorphisms and Response to Clopidogrel. NEJM 360:354-362
Robarge, J. D., L. Li, Z. Desta, A. Nguyen, and D. A. Flockhart. (2007), The star-allele nomenclature: retooling for
translational genomics. Clin Pharmacol Ther 82:244-248.
Schaid, D.J., Rowland, C.M., Tines, D.E., Jacobson, R.M., and Poland, G.A. (2002), "Score Tests for Association
between Traits and Haplotypes when Linkage Phase is Ambiguous," American Journal of Human Genetics, 70:425-434.
Sim, S.C. & Ingelman-Sundberg, M. (2006), The human cytochrome P450 Allele Nomenclature Committee Web site:
submission criteria, procedures, and objectives. Methods Mol. Biol. 320:183–191.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Mark Farmen
Lilly Research Laboratories
Lilly Corporate Center
Indianapolis, IN 46285
Work Phone: (317) 433-4262
E-mail: [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
APPENDIX 1
SAS CODE FOR COLLAPSING HAPLOTYPES AND BUILDING TRANSLATION TABLES
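/* Split the semicolon-delimited variant list into one record per variant (gv). */
data cypv1;
  set cyp_raw;
  retain star_allele;
  length gv $70;
  /* Treat the wild-type entry "None" as an empty variant list. */
  if upcase(variants)="NONE" then variants=" ";
  /* Update star_allele only when Allele holds an allele name; otherwise the retained value carries forward. */
  if length(Allele)>5 then do;
    if upcase(substr(compress(Allele),1,3))="CYP" then star_allele=Allele;
  end;
  do ii=1 to 1000;
    gv = compress(scan(variants, ii, ";"));
    /* If there are no more variants and this is not the first entry, exit the loop. */
    if gv='' & ii>1 then leave;
    output;
  end;
run;
/* Add the indicator ind: 1 for a variant that is present, 0 for the wild-type placeholder NONE. */
data cypt;
  set cypv1;
  ind=1;
  if (gv=" ") then do;
    if (star_allele^=" ") then do;
      gv="NONE";
      ind=0;
    end;
    else delete;
  end;
  if (star_allele=" ") then delete;
  keep star_allele gv ind;
run;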
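/* Transpose to one row per star allele with one indicator column per variant. */
/* The id statement turns each variant into a SAS name, e.g. 1123C>T becomes _1123C_T. */
proc sort data=cypt;
  by star_allele gv;
run;
proc transpose data=cypt out=cyptt;
  by star_allele;
  id gv;
  var ind;
run;
/* Variants absent from a star allele are missing after the transpose; set them to 0. */
data cyptt;
  set cyptt;
  array allnum{*} _numeric_;
  do i=1 to dim(allnum);
    if allnum{i}=. then allnum{i}=0;
  end;
  drop i NONE _NAME_;
run;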
/****
Save cyptt in order to select the variants that are genotyped by the platform. These go in the keep statement below.
****/
data cyptt2;
  set cyptt;
  keep star_allele _1123C_T _881_882delAT _993G_A _1018C_T _517C_T _964A_G;
run;
/* Concatenate the 0/1 indicators into a single character pattern. */
/* The length of pattern ($25) must be at least the number of variants kept. */
data cyptt2;
  set cyptt2;
  length pattern $25 bit $1;
  array allnum{*} _numeric_;
  do i=1 to dim(allnum);
    bit=put(allnum{i},1.);
    pattern=compress(pattern)||bit;
  end;
  drop bit i;
run;
proc sort data=cyptt2;
  by pattern star_allele;
run;
/* Collect the star alleles that share a pattern into allele_grp, one output row per pattern. */
data cyp_condensed;
  set cyptt2;
  by pattern star_allele;
  retain allele_grp;
  length allele_grp $350;
  if first.pattern then do;
    allele_grp=star_allele;
  end;
  if not first.pattern then do;
    allele_grp=trim(allele_grp)||", "||compress(star_allele);
  end;
  if last.pattern then output;
run;
The above code can be applied to much more complex genes. The CYP2D6 star alleles were imported into SAS
from http://www.cypalleles.ki.se/cyp2d6.htm in the format shown in Table 7. The first 3 columns and a small selection
of star alleles are shown in Table A1. Though importing the webpage table into SAS is difficult, the SAS code can be
run essentially unchanged.
Allele       Protein     variants
CYP2D6*1A    CYP2D6.1
CYP2D6*2A    CYP2D6.2    -1584C>G; -1235A>G; -740C>T; -678G>A;
                         CYP2D7 gene conversion in intron 1;
                         1661G>C; 2850C>T; 4180G>C
CYP2D6*3A                2549delA
CYP2D6*3B                1749A>G; 2549delA
CYP2D6*4A                100C>T; 974C>A; 984A>G; 997C>G;
                         1661G>C; 1846G>A; 4180G>C
CYP2D6*4B                100C>T; 974C>A; 984A>G; 997C>G;
                         1846G>A; 4180G>C
CYP2D6*4C                100C>T; 1661G>C; 1846G>A; 3887T>C;
                         4180G>C
CYP2D6*4D                100C>T; 1039C>T; 1661G>C; 1846G>A;
                         4180G>C
CYP2D6*4E                100C>T; 1661G>C; 1846G>A; 4180G>C
CYP2D6*4F                100C>T; 974C>A; 984A>G; 997C>G;
                         1661G>C; 1846G>A; 1858C>T; 4180G>C
CYP2D6*4G                100C>T; 974C>A; 984A>G; 997C>G;
                         1661G>C; 1846G>A; 2938C>T; 4180G>C
CYP2D6*4H                100C>T; 974C>A; 984A>G; 997C>G;
                         1661G>C; 1846G>A; 3877G>C; 4180G>C
CYP2D6*4J                100C>T; 974C>A; 984A>G; 997C>G;
                         1661G>C; 1846G>A
CYP2D6*4K                100C>T; 1661G>C; 1846G>A; 2850C>T;
                         4180G>C
CYP2D6*4L                100C>T; 997C>G; 1661G>C; 1846G>A;
                         4180G>C
CYP2D6*4M                -1235A>G; 746C>G; 843T>G; 974C>A;
                         984A>G; 997C>G; 1661G>C; 1846G>A;
                         2097A>G; 3384A>C; 3582A>G; 4401C>T
Table A1. Vital variables from the SAS table cyp_raw containing CYP2D6 star allele definitions based on the
Karolinska website 12/12/2008. Due to size issues, only some of the rows are listed. Refer to
http://www.cypalleles.ki.se/cyp2d6.htm for the complete table.
The validated CYP2D6 variants on the early DMET assays were 100C>T; 4180G>C; 2850C>T; 883G>C; 124G>A;
1758G>A; 1758G>T; 137_138insT; 1023C>T; 2539_2542delAACT; 2573_2574insC; 2988G>A; 2587_2590delGACT;
2549delA; 2950G>C; 1846G>A; CYP2D6deleted; 1707delT; 2935A>C; 2615_2617delAAG. With the CYP2D6 data in
cyp_raw (see Table A1) as the input table, and the following data step (see keep statement) used to reduce the
variants in table cyptt2, the star alleles that cannot be distinguished (because the variants are not measured) are
obtained from cyp_condensed. The variant patterns and allele_grp are shown in Table A2.
data cyptt2;
  set cyptt;
  keep star_allele _100C_T _4180G_C _2850C_T _883G_C _124G_A _1758G_A _1758G_T _137_138insT
       _1023C_T _2539_2542delAACT _2573_2574insC _2988G_A _2587_2590delGACT _2549delA _2950G_C
       _1846G_A CYP2D6deleted _1707delT _2935A_C _2615_2617delAAG;
run;
Pattern                 allele_grp                                            Star Allele choice
'00000000000000000000   CYP2D6*13, CYP2D6*16, CYP2D6*17XN, CYP2D6*18,
                        CYP2D6*1A, CYP2D6*1B, CYP2D6*1C, CYP2D6*1D,
                        CYP2D6*1E, CYP2D6*1XN, CYP2D6*22, CYP2D6*23,
                        CYP2D6*24, CYP2D6*25, CYP2D6*26, CYP2D6*27,
                        CYP2D6*33, CYP2D6*43, CYP2D6*48, CYP2D6*50,
                        CYP2D6*53, CYP2D6*60, CYP2D6*61, CYP2D6*62,
                        CYP2D6*66, CYP2D6*67, CYP2D6*68, CYP2D6*71
'00000000000000000001   CYP2D6*9
'00000000000000000100   CYP2D6*7
'00000000000000001000   CYP2D6*6A, CYP2D6*6B, CYP2D6*6D
'00000000000000010000   CYP2D6*5
'00000000000000100000   CYP2D6*4M
'00000000000001000000   CYP2D6*44
'00000000000010000000   CYP2D6*3A, CYP2D6*3B
'00000000000100000000   CYP2D6*38
'00000010000000000000   CYP2D6*15
'00100000000000000000   CYP2D6*34, CYP2D6*63
'01000000000000000000   CYP2D6*39, CYP2D6*70
'01000000000000001000   CYP2D6*6C
'01100000000000000000   CYP2D6*20, CYP2D6*28, CYP2D6*29, CYP2D6*2A,
                        CYP2D6*2B, CYP2D6*2C, CYP2D6*2D, CYP2D6*2E,
                        CYP2D6*2F, CYP2D6*2G, CYP2D6*2H, CYP2D6*2J,
                        CYP2D6*2K, CYP2D6*2L, CYP2D6*30, CYP2D6*31,
                        CYP2D6*32, CYP2D6*35, CYP2D6*35X2, CYP2D6*42,
                        CYP2D6*45A, CYP2D6*45B, CYP2D6*46_1, CYP2D6*46_2,
                        CYP2D6*51, CYP2D6*55, CYP2D6*56A, CYP2D6*59
'01100000000000000010   CYP2D6*8
'01100000001000000000   CYP2D6*2M, CYP2D6*41
'01100000010000000000   CYP2D6*21A, CYP2D6*21B
'01100000100000000000   CYP2D6*19
'01100001000000000000   CYP2D6*17, CYP2D6*40, CYP2D6*58
'01100100000000000000   CYP2D6*14B
'01101000000000000000   CYP2D6*12
'01110000000000000000   CYP2D6*11
'10000000000000100000   CYP2D6*4G, CYP2D6*4J
'11000000000000000000   CYP2D6*10A, CYP2D6*10B, CYP2D6*10D,
                        CYP2D6*36Dupl., CYP2D6*36single, CYP2D6*37,
                        CYP2D6*47, CYP2D6*49, CYP2D6*52, CYP2D6*54,
                        CYP2D6*56B, CYP2D6*57
'11000000000000100000   CYP2D6*4A, CYP2D6*4B, CYP2D6*4C, CYP2D6*4D,
                        CYP2D6*4E, CYP2D6*4F, CYP2D6*4H, CYP2D6*4L,
                        CYP2D6*4N
'11000001000000000000   CYP2D6*64
'11100000000000000000   CYP2D6*65
'11100000000000100000   CYP2D6*4K
'11100000001000000000   CYP2D6*69
'11100100000000000000   CYP2D6*14A
Table A2. Output SAS table cyp_condensed for CYP2D6, which shows the star alleles that cannot be distinguished
based on the variants that can be genotyped. The star allele choice must be made based on knowledge of the key
variants and star allele frequency in order to create the translation table.
Star alleles that cannot be determined because their variants are not measured with the DMET platform are
classified as wild type alleles (*1 in most cases). When no data suggest that subtypes of a star allele behave
differently in vivo, these subtypes are grouped together (e.g. *6A, *6B, and *6D into *6). The *10 allele poses some
challenges. Its classifying variants are 100C>T and 4180G>C. These two variants are also found in several other
star alleles (for example *4, *14, *36, and *37). When both are present and there is no 1846G>A (which calls a *4)
or 1758G>T (which calls a *14), the allele is classified as *10. No attempt is made to distinguish *10 from *36 or
*37, as there are no data to suggest that the additional variants present in the *36 or *37 haplotypes cause
additional enzyme activity changes compared to the *10 allele.
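The grouping of subtypes described above could be sketched in SAS by stripping a single trailing subtype letter from the allele name. This is an illustrative assumption, not the authors' procedure: the table subtype_demo, the variable base_allele and the regular expression are all hypothetical, and the rule is purely mechanical, so whether a given set of subtypes should be merged remains a scientific judgment.

```sas
/* Hypothetical sketch: collapse single-letter subtypes (e.g. *6A) to the base allele (*6). */
data subtype_demo;
  length star_allele base_allele $20;
  input star_allele $;
  /* Remove one capital letter that directly follows the allele number; */
  /* names without such a suffix are returned unchanged.                */
  base_allele = prxchange('s/^(CYP2D6\*\d+)[A-Z]\s*$/$1/', 1, star_allele);
  datalines;
CYP2D6*6A
CYP2D6*6B
CYP2D6*6D
CYP2D6*5
;
run;
```

Names with longer suffixes, such as CYP2D6*17XN or CYP2D6*36single, do not match the expression and are left as they are.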