1997 Oxford University Press
Human Molecular Genetics, 1997, Vol. 6, No. 7, 1099–1107

Polymorphisms in the apolipoprotein(a) gene and their relationship to allele size and plasma lipoprotein(a) concentration

Loretto H. Puckey1, Richard M. Lawn2 and Brian L. Knight1,*

1MRC Lipoprotein Team, Clinical Sciences Centre, Royal Postgraduate Medical School, Hammersmith Hospital, London W12 0NN, UK and 2Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA

Received February 11, 1997; Revised and Accepted April 14, 1997

Genotypes at five previously described polymorphic sites at the apolipoprotein(a) gene locus have been determined for the members of 27 families as well as for unrelated white Caucasian and Asian-Indian subjects, and their relationship with isoform size and plasma lipoprotein(a) concentrations investigated. There was strong linkage disequilibrium between sites at the 5′-region of the gene and also between this region and a site in the coding sequence for Kringle 4-37 on the other side of the polymorphic Kringle 4 repeat region. There was no evidence that changes at any of the sites had any direct effect upon lipoprotein(a) concentration. However, certain haplotypes were present almost exclusively on apolipoprotein(a) alleles within a restricted range of sizes and associated lipoprotein(a) concentrations. After correcting for the effect of allele size, there were clear differences between the lipoprotein(a) concentrations associated with alleles of different haplotypes, suggesting that there may be genetically distinct groups of apolipoprotein alleles of different size and different levels of expression. Factors that regulate expression apparently exchange at a rate similar to the rate of change of Kringle 4 repeat number.

INTRODUCTION

Lipoprotein(a) [Lp(a)] is a plasma particle similar in size and composition to low-density lipoprotein.
In addition to apolipoprotein (apo)B100, Lp(a) contains a second protein, apo(a), that bears a strong resemblance to plasminogen (1,2). Apo(a) occurs in >30 differently sized isoforms resulting from different numbers of a repeated sequence homologous to the fourth of the triple-looped ‘Kringle’ structures found in plasminogen (3). There is considerable variation in plasma Lp(a) concentrations between individuals, >90% of which is determined by the apo(a) gene locus (4–6). In Caucasian populations, there is an inverse trend between Lp(a) concentration and the size of its apo(a) component (7). However, there is a large variation within this trend, with up to 200-fold differences in the concentrations associated with the same sized apo(a) isoforms produced from different independent alleles (8). In the livers of both monkeys (9) and man (10), there are also differences in the abundance of apo(a) mRNA transcripts of the same size, suggesting that at least part of the size-independent variation in expression is likely to involve regions that regulate the rate of transcription of the gene.

The immediate 5′-flanking region of the apo(a) gene has been cloned (11,12), and the first 1.5 kb has been shown to promote transcription of luciferase reporter gene constructs transfected into HepG2 cells (11). Three single base substitutions and a variable region containing between seven and 11 copies of a TTTTA repeat have been identified in this sequence (11,13,14). One substitution, a C→T change at position +93 in the 5′-untranslated region of the first exon, is associated with a 58% decrease in expression resulting from a reduction in translation due to the creation of a novel upstream ATG start site (14). Also, an association of 10 or 11 TTTTA repeats with small apo(a) isoforms (15) or low plasma Lp(a) concentrations (15,16) has been observed. However, the other sites have not been examined in detail.
To assess the physiological significance of these changes, we have studied the polymorphisms in white Caucasian and Asian-Indian subjects and have investigated their relationship to allele size and Lp(a) concentration. We have also examined the effects of a further polymorphism, in the coding sequence at the other end of the gene, which changes a Met residue to a Thr close to the putative lysine-binding site in Kringle 4-37 (17) and has been reported to be in linkage disequilibrium with the apo(a) size polymorphism (18). Finally, since there is evidence for the presence of subgroups of apo(a) alleles (8,13,15), we have constructed haplotypes from the five polymorphisms to discover if particular groups, defined in this way, are associated with a limited range of sizes or concentrations.

*To whom correspondence should be addressed. Tel: +44 181 383 3262; Fax: +44 181 383 2077; Email: [email protected]

RESULTS

Population studies

Five polymorphisms in the human apo(a) gene have been examined in this study (Fig. 1). One (N–/N+) relates to an NcoI restriction enzyme site at position 12 605 in the coding sequence. The others are in the 1400 bp region immediately 5′ to the translation start site. Three of these are single base differences, at positions –772 (G/A), +93 (C/T) and +121 (G/A), while the fourth is a variable region beginning at position –1231, containing between seven and 11 copies of a TTTTA repeat.

Table 1 shows the allele frequencies and heterozygosity index for each of these polymorphisms in a population of unrelated white Caucasians and a population of unrelated Asian-Indians. The only major difference between the populations was at position –772, where the G variant was more frequent in the Caucasians and the A variant was more frequent in the Indians. At the other positions, the rarer variant was slightly more frequent in the Indian population.
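The heterozygosity index reported in Table 1 can be illustrated with a short calculation. The sketch below assumes the index is the expected heterozygosity, 1 − Σp², computed from the allele frequencies; the text does not state the formula, but this definition agrees with the tabulated values to within rounding. The function and variable names are illustrative only.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for a list of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def diallelic(p):
    """Heterozygosity of a two-allele site whose listed allele has frequency p."""
    return heterozygosity([p, 1.0 - p])


# Frequencies of the listed allele, copied from Table 1:
# (white Caucasians, Asian-Indians).
SITES = {
    "G/A (-772), G": (0.586, 0.429),
    "C/T (+93), C": (0.867, 0.765),
    "G/A (+121), G": (0.844, 0.765),
    "NcoI (Kr37), N-": (0.697, 0.582),
}

# TTTTA repeat frequencies for n = 7, 8, 9, 10, 11 (Table 1).
TTTTA_CAUCASIAN = [0.005, 0.687, 0.144, 0.149, 0.016]
TTTTA_INDIAN = [0.006, 0.612, 0.206, 0.176, 0.0]

if __name__ == "__main__":
    for name, (p_cauc, p_ind) in SITES.items():
        print(f"{name}: {diallelic(p_cauc):.3f}  {diallelic(p_ind):.3f}")
    print(f"(TTTTA)n: {heterozygosity(TTTTA_CAUCASIAN):.3f}  "
          f"{heterozygosity(TTTTA_INDIAN):.3f}")
```

Because the published frequencies are themselves rounded to three decimal places, the recomputed values can differ from the table in the last digit.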
Eight TTTTA repeats were most common in both populations, with nine and 10 repeats at a lower frequency, similar to each other. The frequencies in the Caucasian sample were similar to those described previously for the TTTTA repeats (15,16), the G/A (+121) polymorphism (13) and the N–/N+ polymorphism (18). In both populations reported in Table 1, the frequency distributions of genotypes were in Hardy–Weinberg equilibrium.

Linkage disequilibrium was examined for each pair of polymorphisms by calculating the disequilibrium statistic (∆) and the relative linkage disequilibrium (rel. D), values of which are given in Figure 2. As would be expected, strong linkage disequilibrium was detected between the four polymorphic loci in the immediate 5′-flanking region of the apo(a) gene. There was only one instance where the ∆ value was not significant, between the sites at +93 and +121. Linkage disequilibrium was also detected between the 5′ sites and the NcoI polymorphism in Kringle 4-37. The only exceptions were the almost complete equilibrium observed between the NcoI polymorphism and the G/A site at position –772 in the Indian population and between the NcoI polymorphism and the G/A site at position +121 in the Caucasian population.

The apo(a) phenotype was determined for 170 unrelated white Caucasians. Two bands were detected on phenotyping gels for all but 17 subjects. The others showed single band phenotypes. The distribution of the different sized isoforms in the two populations is shown in Figure 3, and provides evidence for at least two and probably three major subgroups.

Family studies

To examine the relationship between the different polymorphic variants in greater detail, genotypes were determined at all five sites for 269 members of 27 unrelated white Caucasian families. Haplotypes were constructed by analysis of the segregation of the variants, with the assumption that there had been no recombination within the apo(a) locus.
For each pedigree, only the independent haplotypes were included in the subsequent analysis. Unambiguous haplotypes were obtained for 197 independent alleles. A total of 26 different haplotypes from a possible 40 (65%) were observed, the frequencies of which are given in Table 2. The polymorphic sites were clearly not in equilibrium. The T variant at position +93 was present exclusively on alleles that contained the A variant at position –772. Almost all of the A (+121) variants were also on alleles with the A variant at –772. In contrast, the N+ variant was more frequent on alleles containing the G variant at position –772. Most of the alleles containing G (–772) carried eight TTTTA repeats, while those containing A (–772) had a far higher proportion carrying nine or 10 repeats. For convenience, alleles will be identified by the pattern of variants, reading from the 5′ end.

The apo(a) phenotype was determined for the members of 17 of the families, comprising 186 individuals. By tracking the isoforms through the families, it was possible to establish the number of Kringle 4 repeats contained by 156 of the alleles for which variants at the polymorphic sites were known. In assessing whether different variants were associated with alleles of different size, these alleles were augmented with those from the white Caucasian subjects used previously who were homozygous at the site of interest. The frequency of alleles containing different numbers of Kringle 4 repeats is given for individual variants in Figure 4. The frequency profiles for alleles with the G or A variants at position –772 were similar, as were those for the N– or N+ variants in Kringle 4-37. In contrast, there was a greater proportion of large and a smaller proportion of small alleles among those carrying the A variant at position +121. The T (+93) variant and nine TTTTA repeats were mainly on middle sized alleles, whereas a high proportion of the 10 TTTTA repeats were on small alleles.

Table 1.
Allele frequencies at five variable sites in the apo(a) gene and its 5′ region

Polymorphism  Position  Allele   White Caucasians                   Asian-Indians
                                 Frequency ± SE   Heterozygosity    Frequency ± SE    Heterozygosity
G/A           –772      G        0.586 ± 0.020    0.485             0.429 ± 0.038**   0.489
C/T           +93       C        0.867 ± 0.013    0.231             0.765 ± 0.033**   0.359
G/A           +121      G        0.844 ± 0.014    0.263             0.765 ± 0.033*    0.359
NcoI          ‘Kr37’    N–       0.697 ± 0.018    0.422             0.582 ± 0.038**   0.486
(TTTTA)n      –1231     n = 7    0.005 ± 0.003    0.485             0.006 ± 0.005     0.552
                        n = 8    0.687 ± 0.018                      0.612 ± 0.037
                        n = 9    0.144 ± 0.014                      0.206 ± 0.031*
                        n = 10   0.149 ± 0.014                      0.176 ± 0.029
                        n = 11   0.016 ± 0.005                      0

Nucleotide positions are expressed relative to the transcription start site. The NcoI site represents a C/T polymorphism at position 12 605 of the coding sequence, in Kringle 4-37, as described by McLean et al. (3). Values are shown for a population of 319 white Caucasians and a population of 85 Asian-Indians. Significant difference between populations (χ² test); *P <0.05, **P <0.001.

Table 2. Frequency of alleles containing different haplotypes at the apo(a) gene locus in white Caucasian families

Variant at position            No. of alleles, by no. of TTTTA repeats
–772  +93  +121  ‘Kr37’        7     8     9     10    11    Row total (%)
G     C    G     N–            0     59    4     4     2     69 (35%)
G     C    G     N+            0     44    1     1     0     46 (23%)
G     C    A     N–            0     1     1     0     0     2 (1%)
G     C    A     N+            0     3     0     0     0     3 (2%)
G     T    G/A   N–/N+         0     0     0     0     0     0 (0%)
A     C    G     N–            0     5     2     13    0     20 (10%)
A     C    G     N+            0     2     1     1     0     4 (2%)
A     C    A     N–            1     12    0     3     0     16 (8%)
A     C    A     N+            0     8     0     1     0     9 (5%)
A     T    G     N–            0     2     14    7     0     23 (12%)
A     T    G     N+            0     0     0     1     0     1 (1%)
A     T    A     N–            0     0     2     2     0     4 (2%)
A     T    A     N+            0     0     0     0     0     0 (0%)
Column total                   1     136   25    33    2     197

Haplotypes were constructed from the segregation of variants at five sites through 27 Caucasian families that included 269 individuals. The full pedigrees contained 134 unrelated individuals, of whom 91 were available for study.
A total of 231 independent alleles could be identified, of which full haplotypes could be constructed for 197.

Figure 1. Diagram of the apo(a) gene showing the relative positions of the polymorphisms studied. Exons are shown as boxes with the translated regions shaded. The 5′-flanking region (homology to plasminogen hatched) and exons are approximately to scale. The introns (lines) are not to scale and are greatly shortened. The positions of the polymorphisms (arrows) and of the various types of Kringle 4 repeat are illustrated.

Relationship to plasma Lp(a) concentrations

Our sample of normal, unrelated white Caucasian subjects showed the typically skewed distribution of plasma Lp(a) concentrations. The mean, after log transformation, was 10.1 ± 4.7 (SD) mg/dl, with a range of 0.2–129 mg/dl and a median of 13.7 mg/dl. To establish the relationship between Lp(a) concentration and apo(a) isoform size, the total Lp(a) concentration for each subject was apportioned between the two Lp(a) species as described in Materials and Methods. This gives an estimate of the Lp(a) concentration associated with each apo(a) isoform, with an error of ∼12% (SD) (8). As observed before for hyperlipidaemic subjects (8), there was a general inverse non-linear relationship between Lp(a) concentration and the number of Kringle 4 repeats in the apo(a) protein, with a large, up to 500-fold, range of values for each of the different-sized isoforms (Fig. 5).

The plasma Lp(a) concentration was apportioned between the Lp(a) species for each member of the 17 families for whom the apo(a) phenotype had been determined. There were a few instances, such as those shown in Figure 6, where families contained alleles of the same or similar size with the same, most common, haplotype. In the RW family, the alleles with 25 Kringle 4 repeats inherited from different mothers had the same haplotype but were associated with markedly different Lp(a) concentrations.
In the NL family, the three middle siblings expressed alleles containing 28, 29 or 30 Kringle 4 repeats with the same haplotype, which were associated with very different Lp(a) concentrations, even when two of them were expressed in the same individual. Thus it is clear that major differences in Lp(a) concentrations occur without any changes in the polymorphisms that we are studying. However, this does not necessarily imply that the changes themselves have no effect on the rate of transcription. To check this, the Lp(a) concentration associated with each independent allele in the families was estimated. Each value was then corrected for the effect of allele size using the exponential line of best fit for the white Caucasian population shown in Figure 5 (see Materials and Methods), allowing a direct comparison between values for all alleles (Fig. 7). After correcting for size, there was still a vast range of Lp(a) concentrations, although there was a suggestion from the spread of the points, seen most clearly in the column for the G (+121) variant, that the values fell into distinct groups. Overall there was no significant association between Lp(a) concentration and the variants at the G/A (–772), C/T (+93) or N–/N+ (Kr 4-37) sites. The A variant at position +121 was associated with a significantly higher mean Lp(a) concentration than the G variant, due mainly to an absence of very low values. Alleles containing 10 TTTTA repeats were associated with a significantly lower mean concentration than those carrying nine repeats.

Haplotype relationships

There were 136 alleles for which the whole haplotype and the associated Lp(a) concentration were known. Over 90% of these had one of six haplotypes. Figure 8 shows the size and Lp(a) concentrations, corrected for size, for the alleles with each of these haplotypes. Alleles with 8GCG haplotypes were the most abundant and were associated with a wide range of sizes and concentrations. A quarter of 8GCG alleles with the N– variant, but none of those with the N+ variant, contained <20 Kringle 4 repeats and were associated mostly with high concentrations (Fig. 8a). There were fewer alleles with haplotypes containing the A variant at position –772, but these fell into more clearly defined groups (Fig. 8b). All but one of the 9ATGN– alleles contained 27–29 Kringle 4 repeats (26.9 ± 1.2 Kringles, mean ± SE) and all were associated with corrected Lp(a) concentrations within the 100–300% range (189.5 ± 21.4%). Similarly, all the 10ACGN– haplotypes were confined to alleles containing 21–23 Kringle 4 repeats (22.4 ± 0.2 Kringles) and all but one were associated with corrected Lp(a) concentrations of between 15 and 50% (48.5 ± 23.0%). Points for the 8ACAN– haplotypes were more widespread, but were confined to alleles with 27 or more Kringle 4 repeats (32.8 ± 1.1 Kringles) associated with corrected concentrations of ∼200% or less (115.1 ± 19.8%). In contrast, 8ACAN+ haplotypes were on alleles with a greater spread of sizes (25.8 ± 3.2 Kringles), the majority of which were associated with particularly high Lp(a) concentrations (332.5 ± 74%). Differences in both Kringle number and corrected concentration between the 9ATGN–, 10ACGN– and 8ACAN– alleles and the 8ACAN– and 8ACAN+ alleles were all statistically significant (P <0.05, Student’s t-test).

Figure 2. Values for pairwise disequilibrium (∆) and relative linkage disequilibrium (Rel. D) between polymorphic sites in the apo(a) gene. The variable TTTTA site was recoded as a diallelic system of eight or ‘not eight’ repeats. Results were obtained for 319 unrelated normal white Caucasians and 85 Asian-Indians living in and around London. *Significant value of ∆ taken at the level of P < 0.005, giving an overall significance level for the 10 tests of 0.05.

Figure 3. Frequency of apo(a) Kringle 4 repeats. Apo(a) phenotype was determined for 170 unrelated normal white Caucasians. Isoform sizes were converted to Kringle 4 content using the relationship derived previously (8).

DISCUSSION

The C/T (+93) and G/A (+121) polymorphisms studied here are in the first exon of the apo(a) gene. The G/A (–772) and variable TTTTA sites are further upstream, beyond the region that has homology with the plasminogen gene (Fig. 1). The disequilibrium statistic was statistically significant between all these sites except C/T (+93) and G/A (+121). However, the disequilibrium between these two sites was of the negative type and both polymorphic frequencies were low, so the probability of detecting disequilibrium with this sample size was small (19). Since the rel. D values were high, the results indicate strong linkage disequilibrium throughout the region.

Linkage disequilibrium was also observed between the 5′-flanking region and the NcoI polymorphism in Kringle 4-37. If the expansion of the variable Kringle 4 region had involved a crossing-over mechanism, these sites on either side would have reached equilibrium rapidly. Lackner et al. (20) have reported a case where the number of repeats had altered from one generation to the next with no exchange of flanking polymorphic markers, suggesting that the new apo(a) allele of different length had arisen without any crossing-over of homologous chromosomes. The disequilibrium described here, together with that observed previously for a DraIII polymorphic pattern exhibited by a subset of the variable Kringle 4 units (21), is consistent with their proposal (20) that much of the variation in Kringle 4 number results from sister chromatid exchange or complex gene conversion events.

The disequilibrium between the G/A (–772) and NcoI sites in the Caucasians arises because a high proportion (82%) of the A (–772) variants are on N– alleles.
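The 82% figure can be recovered directly from the haplotype counts in Table 2, and the same counts give a feel for how a pairwise disequilibrium value is formed. The sketch below is illustrative only: it uses the family-derived alleles of Table 2 rather than the population genotypes behind Figure 2, and it assumes the common definitions D = p(AB) − p(A)p(B) and D′ = D/Dmax, which may differ from the ∆ and rel. D statistics of reference (19). The helper names are hypothetical.

```python
from math import comb

# Allele counts from Table 2, keyed by the variants at (-772, +93, +121, NcoI).
# Each list holds the counts for 7, 8, 9, 10 and 11 TTTTA repeats.
# The all-zero G-T rows are omitted.
TABLE2 = {
    ("G", "C", "G", "N-"): [0, 59, 4, 4, 2],
    ("G", "C", "G", "N+"): [0, 44, 1, 1, 0],
    ("G", "C", "A", "N-"): [0, 1, 1, 0, 0],
    ("G", "C", "A", "N+"): [0, 3, 0, 0, 0],
    ("A", "C", "G", "N-"): [0, 5, 2, 13, 0],
    ("A", "C", "G", "N+"): [0, 2, 1, 1, 0],
    ("A", "C", "A", "N-"): [1, 12, 0, 3, 0],
    ("A", "C", "A", "N+"): [0, 8, 0, 1, 0],
    ("A", "T", "G", "N-"): [0, 2, 14, 7, 0],
    ("A", "T", "G", "N+"): [0, 0, 0, 1, 0],
    ("A", "T", "A", "N-"): [0, 0, 2, 2, 0],
    ("A", "T", "A", "N+"): [0, 0, 0, 0, 0],
}


def count(pred=lambda hap: True):
    """Number of alleles whose haplotype satisfies the predicate."""
    return sum(sum(row) for hap, row in TABLE2.items() if pred(hap))


def disequilibrium(n_ab, n_a, n_b, n):
    """Return (D, D') for two diallelic sites from allele counts.

    D' is signed here: D divided by its maximum possible magnitude.
    """
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, (d / d_max if d_max else 0.0)


if __name__ == "__main__":
    n = count()
    n_a = count(lambda h: h[0] == "A")
    n_nminus = count(lambda h: h[3] == "N-")
    n_a_nminus = count(lambda h: h[0] == "A" and h[3] == "N-")
    print(f"A (-772) alleles on N-: {100 * n_a_nminus / n_a:.0f}%")
    d, d_prime = disequilibrium(n_a_nminus, n_a, n_nminus, n)
    print(f"D = {d:.4f}, D' = {d_prime:.3f}")
    # Five sites give comb(5, 2) pairwise tests; dividing 0.05 by that
    # number reproduces the per-test threshold quoted for Figure 2.
    print(f"pairs = {comb(5, 2)}, per-test level = {0.05 / comb(5, 2):.3f}")
```

The per-test level printed at the end corresponds to the P < 0.005 threshold given in the legend to Figure 2 for an overall level of 0.05 across 10 tests.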
Since there is no disequilibrium and a far higher frequency of A (–772) variants in the Indian population, it seems that there has been an accumulation of G (–772)N+ alleles in the Caucasians. Similarly, although we did not have enough families to confirm this, the data suggest that there has been an accumulation of A (+121)N+ alleles in the Indian group. In both populations, the disequilibrium between the NcoI site and the C/T site or the variable TTTTA region was probably a reflection of the relatively recent occurrence of the C→T mutation and of the latest addition to the repeats.

Figure 4. Size distribution of apo(a) isoforms coded by alleles containing different polymorphic variants. Isoform size and variants at polymorphic sites in the apo(a) gene were determined for 156 independent alleles from white Caucasian subjects by following their segregation through families. These were augmented by alleles from individuals in the Caucasian ‘population’ who were homozygous at the site of interest. Sizes are shown for 116 alleles with the G variant and 80 alleles with the A variant at position –772, for 22 alleles with nine TTTTA repeats and 26 with 10 TTTTA repeats, for 20 alleles with the T variant at position +93, 38 alleles with the A variant at position +121 (A*), 151 alleles with the N– variant and 69 alleles with the N+ variant in Kringle 4-37. Distributions for alleles with eight TTTTA repeats, C (+93) and G (+121) were similar to that shown in Figure 3 and are not shown.

Figure 5. Lp(a) apoB concentrations associated with different-sized apo(a) isoforms. Total plasma Lp(a) concentration was apportioned between the two Lp(a) species (8) for the unrelated subjects described in Figure 3. Values are shown for 323 white Caucasian isoforms with the exponential line of best fit.
Most (89%) of the variants containing nine or 10 TTTTA repeats and all but one of the T (+93) variants were present on an N– allele. Indeed, the T variant and the expansion to nine or 10 repeats probably all arose on A (–772)G (+121)N– alleles, which, with C (+93) and eight TTTTA repeats, is the most likely original haplotype. Two apo(a) isoforms were identified in 91% of the individuals studied. This is close to the number expected from studies of the gene (22) and justifies the use of polyacrylamide–agarose gels, which are difficult to perfect but are sensitive and give clear, sharp bands. By determining the sizes of the isoforms coded by alleles containing the different variants in the families, it was clear that alleles with the T variant were associated mainly with middle sized isoforms and those with the A (+121) variant with middle and large isoforms. This approach also revealed that alleles with nine TTTTA repeats were mostly confined to the middle sizes and that those with 10 TTTTA repeats apparently fell into two groups, one of small sizes and one of middle sizes. A similar association of TTTTA repeat number with allele size, although with a greater spread of sizes than observed here, has been reported previously for another Caucasian population (15). The variants at position –772 were not associated with alleles of any particular size and we did not detect in our Caucasian subjects the high frequency of alleles with 28 Kringle 4 repeats among N+ alleles that was observed in a Tyrolean population (18). As the apo(a) alleles were further subdivided through the construction of detailed haplotypes, association with allele size became more clearly defined. Nearly all of the nine TTTTA variants and most of those with T at +93 were on the same alleles (9ATGN–) with a very restricted range of sizes (27–29 Kringle 4 units). Similarly, 10 TTTTA repeats with ACGN– haplotypes were restricted to alleles with 21–23 Kringle 4 units. 
Although the range of sizes was greater, other relationships could also be demonstrated. For instance, all of the 8ACAN– alleles were large and most of the very small alleles of 20 Kringle 4 units or less were 8GCGN–. Thus there appear to be subgroups of apo(a) alleles of the same haplotype associated with a limited range of sizes, the spread of which presumably reflects the time that has elapsed since the last mutation.

Having used segregation analysis to define apo(a) alleles by haplotype, it was clearly of interest to discover if they were associated with different Lp(a) concentrations. By apportioning the total according to the proportions shown on the apo(a) phenotyping blots, it was possible to calculate the concentration of the separate species in individual subjects and so to estimate the Lp(a) concentration associated with each of the defined alleles in the families. There were clearly major differences in expression of alleles of similar size within families and, in numerous instances such as that in Figure 6, within the same individual. Thus large differences in the expression of apo(a) alleles cannot be explained by the effects of environment or other genes, and it is justifiable when examining these differences to treat each independent allele as being phenotypically distinct (15,21,23). When this assumption was made, it became apparent that some subgroups of apo(a) alleles defined by haplotype were associated with narrow ranges of Lp(a) concentrations. For instance, 9ATGN– alleles were all associated with moderately high concentrations, while most 10ACGN– alleles were associated with fairly low concentrations. These haplotypes were the two that were restricted to alleles with a size spread of only three Kringle 4 units. Generally, as the spread of allele sizes in the subgroup increased, the range of associated concentrations also increased. Thus it appears that after the defining mutation, features that regulate the gene gradually exchange and equilibrate at a rate similar to the changes in Kringle 4 number.

Figure 6. Pedigree diagrams illustrating the segregation of variants and their relationship to Lp(a) concentration. The apo(a) isoforms for each subject, given as the number of repeated Kringle 4 units, are shown in italics immediately below the symbols. The apportioned Lp(a) apoB concentration (µg/ml) is shown in bold under the appropriate isoform, with the variants carried by the allele beneath (reading 5′ to 3′, top to bottom). Alleles of interest are shown in bold type.

Figure 7. Lp(a) concentrations associated with alleles containing different polymorphic variants. The Lp(a) apoB concentration associated with each of the white Caucasian alleles from Figure 4 was estimated and corrected for isoform size as described in Materials and Methods. Each value is given as a percentage of the average for an isoform of the same size. Results are shown for 218 alleles with either eight, nine or 10 TTTTA repeats at position –1231, 196 alleles with either the G or A variant at –772, 241 alleles with either C or T at +93, 259 alleles with G or A at +121 and 228 alleles with the N– or N+ variant in Kringle 4-37. Mean values are shown by the solid bar. There were significant differences (P <0.05, Student’s t-test) between the mean values for the G and A variants at +121 and between the mean values for alleles with nine and 10 TTTTA repeats.

At various times, it has been suggested that the polymorphisms observed in the 5′ region could directly influence the expression of the apo(a) gene (11,14,16). It is also possible that the NcoI site in Kringle 4-37 is linked to changes in lysine binding ability that could affect Lp(a) assembly.
Since the overall variation in expression of a given allele within families has been estimated to be 21% (SD), the present analysis would not be able to confirm relatively small effects of less than ∼2-fold (8). However, within this limitation, there was no evidence that variation at any of the polymorphic sites studied had any direct effect on plasma Lp(a) concentration. Our studies confirm the association of low concentrations with alleles containing 10 TTTTA repeats observed by Trommsdorff et al. (16), but indicate that this is because a large proportion are in a subgroup (10ACGN–) linked to a relatively inactive regulatory element rather than through a direct effect on transcription. Similarly the higher concentrations associated with the A (+121) variant can be attributed to the fact that the majority were in the 8ACAN– or 8ACAN+ subgroups, which were not linked to very low activity elements. These results provide physiological support for in vitro assays that have shown no difference in the transcriptional activity of constructs with different variants at the G/A (–772) and variable TTTTA sites (15,24). Unfortunately, the T (+93) variant was present mainly in only two subgroups, so it was not possible to evaluate the effect of the reduced transcriptional activity observed in vitro (14). The Lp(a) concentration associated with different apo(a) alleles of the same size can differ >100-fold. The data presented here indicate that it is unlikely that common sequence changes in the immediate 5′-flanking region of the gene could be responsible for such large differences in expression. The results also suggest that apo(a) alleles can be divided into subgroups which accumulate to a greater or lesser extent in different populations and that these subgroups can be associated with a limited range of Lp(a) concentrations, presumably reflecting those of their precursor alleles. 
They would predict that there are regulatory elements of different activities in the population, situated far enough away from the gene to allow exchange between alleles. The three liver-specific DNase-hypersensitive sites in the 40 kb region between the apo(a) and plasminogen genes (25) are obvious candidates for the location of these elements, and experiments to characterize them are currently being pursued.

Figure 8. Size and associated Lp(a) concentrations of apo(a) alleles carrying different haplotypes. Haplotypes at five polymorphic sites were constructed from the segregation of variants through 17 white Caucasian families. Whole haplotypes were obtained for 136 independent alleles for which the allele size and associated Lp(a) concentration had been determined (see Materials and Methods and text). These alleles were augmented by those from 11 independent white subjects who were homozygous at each polymorphic site. Each value is given as a percentage of the average for an isoform of the same size (see Materials and Methods). (a) Values for 8GCGN– alleles and 8GCGN+ alleles, (b) values for 8ACAN– alleles, 8ACAN+ alleles (∆), 9ATGN– alleles and 10ACGN– alleles.

MATERIALS AND METHODS

Materials and general methods

Restriction endonucleases were obtained from Boehringer Mannheim, Lewes, UK and Biotaq polymerase from Bioline UK Ltd, London, UK. Agarose (ultra pure) and DNA size markers were supplied by Gibco-BRL, Paisley, UK and MetaPhor agarose by FMC Bio Products, Rockland, ME. Hybond N nylon membrane (0.45 µm) was obtained from Amersham International plc, Little Chalfont, UK. General methods and the definition and composition of solutions used are given in Sambrook et al. (26).
Subjects

The subjects in this study comprised 269 members of 27 Caucasian families and 356 unrelated white Caucasian and 85 unrelated Asian-Indian individuals living in and around London. The unrelated subjects were apparently normal individuals from screening projects, together with laboratory volunteers. Both sampled populations were heterogeneous within their basic racial groups. Venous blood was collected into tubes containing EDTA and either immediately frozen or separated by centrifugation within 2 h. Whole blood and separated plasma and blood cells were stored at –70°C.

Lp(a) and apo(a) analysis

Plasma Lp(a) concentration was assayed using a TintElize immunoassay kit (Biopool AB, Umea, Sweden) and the apo(a) phenotype determined by immunoblotting (8). To estimate the Lp(a) concentration associated with each of the apo(a) isoforms in a sample, the phenotyping blot was scanned and the total Lp(a) concentration was apportioned between the two constituent Lp(a) species containing the different apo(a) isoforms, as described previously (8). Values from individual family members were used to estimate the Lp(a) concentration associated with each of the independent alleles. Seven of the 17 families used for these studies contained members with familial hypercholesterolaemia, and values from these subjects were excluded from the estimations. Where there was more than one normolipidaemic subject with the same apo(a) allele, which occurred in ∼50% of cases, the values were averaged. Individual values never varied by >40% of the average, which was small compared with the variation in values between different alleles.

Apportioned values of Lp(a) concentration were corrected for the effect of apo(a) allele size using the plot of isoform size against associated Lp(a) concentration for unrelated white Caucasian individuals shown in Figure 5.
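The apportioning and size-correction steps can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the exponential line is obtained by ordinary least squares on the logarithm of concentration, which the text does not specify, and the function names and any numbers passed to them are hypothetical.

```python
import math


def apportion(total_conc, band_fractions):
    """Split a total Lp(a) concentration between isoforms.

    band_fractions are the relative band intensities from the scanned
    phenotyping blot and are expected to sum to 1.
    """
    return [total_conc * fraction for fraction in band_fractions]


def fit_exponential(sizes, concs):
    """Fit conc = exp(a + b * size) by least squares on ln(conc).

    Returns (a, b). Log-linear regression is an assumption made for
    this sketch.
    """
    n = len(sizes)
    logs = [math.log(c) for c in concs]
    mean_x = sum(sizes) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def expected_conc(size, a, b):
    """Average concentration predicted for an allele of the given size."""
    return math.exp(a + b * size)


def corrected_percent(conc, size, a, b):
    """Concentration as a percentage of the average for that allele size."""
    return 100.0 * conc / expected_conc(size, a, b)
```

With this scaling, an allele expressed at exactly the population average for its size scores 100%, so alleles of different Kringle 4 number can be compared on one axis, as in Figures 7 and 8.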
The points were best described by an exponential line, which was used to give an estimate of the average concentration associated with each size of apo(a) allele. Each individual Lp(a) concentration was then expressed as a percentage of the estimated average concentration associated with an allele of the same size.

DNA analysis

DNA was isolated from frozen whole blood or packed blood cells as described previously (27) or by a rapid small-scale method (28). A 1458 bp fragment, encompassing the region +170 to –1288 relative to the transcription start site of the apo(a) gene, was amplified by the polymerase chain reaction (PCR), carried out in a final volume of 50 µl containing ∼150 ng of genomic DNA, 1.25 mM MgCl2, 250 ng of each primer, 0.125 mM of each deoxynucleotide triphosphate and 1.25 U of Taq polymerase in the ‘ammonium’ buffer provided by the manufacturer. Samples were overlaid with mineral oil and heated in a thermal cycler for 1 min at 95°C, followed by three cycles of 30 s at 97°C, 30 s at 50°C and 2.5 min at 65°C, and then by 30 cycles of 30 s at 94°C, 30 s at 50°C and 2.5 min at 65°C, ending with 7 min at 72°C. A smaller, ∼100 bp fragment encompassing the TTTTA repeats 5′ to position –1230 was amplified under similar conditions, except that 100 ng of each primer and an annealing temperature of 48°C were used. The primers used were as follows. 1458 bp fragment: 5′-primer, 5′-GAA AGA TTG ATA CTA TGC-3′; 3′-primer, 5′-AGT AGA AGA ACC ACT TC-3′. 100 bp fragment: 5′-primer, 5′-GCG GAA AGA TTG ATA CT-3′; 3′-primer, 5′-ACG TCA GTG CAC TTC AA-3′. The authenticity of the 1458 bp fragment was verified by restriction enzyme digestion with HindIII, which cuts at one site to give fragments of 986 and 472 bp. G and A variants at position –772 were detected by digestion with TaqI followed by electrophoresis on 1.5% agarose.
The G variant contains a cutting site for TaqI and produces a fragment of 942 bp, whereas the non-cutting A variant produces an equivalent fragment of 1139 bp.

The C and T variants at position +93 and the G and A variants at position +121 were detected by annealing with allele-specific oligonucleotides. Samples (10 µl) of PCR reaction mixture were diluted with 200 µl of 15× SSC and heated at 100°C for 10 min, and 100 µl were added to wells of a slot-blot apparatus containing a nylon membrane pre-soaked in 15× SSC. After washing with 15× SSC, the nucleic acids were fixed to the membrane with UV light. The membrane was pre-hybridized for 10 min at 40°C in 5× SSPE (0.75 M NaCl), 5× Denhardt’s solution and 0.5% SDS [see (26) for solution composition] and hybridized for 3 h at 40°C in the same solution containing ∼1 × 10⁶ c.p.m./ml of the appropriate oligonucleotide, end-labelled with [γ-³²P]ATP using standard methods (26). Oligonucleotides employed were 5′-CAA CAA CGT CCT GG-3′ (coding strand) for the C variant, 5′-CCA GGA CAT TGT TGA-3′ (non-coding strand) for the T variant, 5′-TTC TGG GCA CTG CT-3′ (coding strand) for the G (+121) variant and 5′-AGC AGT GTC CAG AAA-3′ (non-coding strand) for the A (+121) variant. The membrane was rinsed once and washed for 30 min at 48°C with 5× SSPE, 0.1% SDS before autoradiography.

The number of TTTTA repeats was determined by size fractionation of the smaller PCR product on non-denaturing 8% polyacrylamide gels (20 cm × 20 cm × 1 mm, 250 V for 5 h) visualized by silver staining, or on 4% MetaPhor agarose visualized with ethidium bromide. Sizes were estimated by comparison with a 10 bp DNA size ladder. A fragment of 96 bp contained eight TTTTA repeats.

The DNA encoding the lysine-binding region of Kringle 4-37 was amplified by PCR after BamHI digestion as described by Pfaffinger et al. (29). The product was incubated with the restriction enzyme NcoI and the fragments separated on 8% polyacrylamide gels as described above.
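The size-based calls described above reduce to simple arithmetic, which the following sketch illustrates. It assumes that the small PCR product differs between alleles only by whole 5 bp TTTTA units, anchored on the stated 96 bp fragment with eight repeats, and that a band within an arbitrary tolerance of 942 or 1139 bp can be assigned to the G or A variant at –772. The function names and the tolerance are hypothetical and are not part of the original method.

```python
REFERENCE_SIZE_BP = 96    # stated in the text: a 96 bp fragment has eight repeats
REFERENCE_REPEATS = 8
REPEAT_UNIT_BP = 5        # length of one TTTTA unit

def tttta_repeat_number(fragment_bp):
    """Repeat number implied by the size of the small PCR product."""
    extra, remainder = divmod(fragment_bp - REFERENCE_SIZE_BP, REPEAT_UNIT_BP)
    if remainder:
        raise ValueError("size is not a whole number of repeats away from 96 bp")
    return REFERENCE_REPEATS + extra

def taqi_genotype(bands_bp, tolerance_bp=20):
    """Call the -772 genotype from TaqI band sizes (942 bp = G, 1139 bp = A).

    tolerance_bp is an assumed allowance for gel sizing error.
    """
    has_g = any(abs(band - 942) <= tolerance_bp for band in bands_bp)
    has_a = any(abs(band - 1139) <= tolerance_bp for band in bands_bp)
    if has_g and has_a:
        return "G/A"
    if has_g:
        return "G/G"
    if has_a:
        return "A/A"
    raise ValueError("no diagnostic band found")

# Hypothetical readings from a gel
print(tttta_repeat_number(101))
print(taqi_genotype([942, 1139]))
```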
In the presence of the cutting site, the enzyme cut the 182 bp PCR product into fragments of 122 and 60 bp.

Statistical methods

Departures of genotype distributions from Hardy–Weinberg equilibrium were tested for significance by the χ² test (1 d.f., P < 0.05). Allele frequencies for the polymorphic sites were determined by allele counting of the independent chromosomes, and standard errors and heterozygosity indices were calculated by standard procedures (30). Non-random association between a pair of sites was measured by the standardized disequilibrium statistic (∆), calculated according to Chakravarti et al. (30) after haplotype frequencies had been estimated using the maximum-likelihood procedure outlined by Hill (31), with the assumption for each pair of Hardy–Weinberg equilibrium, selective neutrality and co-dominance. Statistical significance was tested using N∆², where N is the number of genotypes observed, which is distributed as a χ² random variable with 1 d.f. Since it cannot be assumed that the pairwise comparisons are independent, statistical significance was assessed at a level for each ∆ value that gave an overall significance level of P < 0.05. Relative linkage disequilibrium was estimated as described by Thompson et al. (19).

ACKNOWLEDGEMENTS

We thank Dr G. Lindahl, Dr B. Zysow and Mr Y.F.N. Perombelon, who established the conditions for amplifying the 1.5 kb fragment and determining the C/T (+93) variants. We are indebted to Dr G. Thompson (Hammersmith), Dr M. Seed (Charing Cross) and Dr A. Hale (BUPA) for access to clinical material. This work was supported in part by a grant from the British Heart Foundation (PG 94142).

REFERENCES

1. Scanu, A.M. and Fless, G.M. (1990) Lipoprotein(a). Heterogeneity and biological relevance. J. Clin. Invest., 85, 1709–1715.
2. Eaton, D.L., Fless, G.M., Kohr, W.J., McLean, J.W., Xu, Q.T., Miller, C.G., Lawn, R.M. and Scanu, A.M. (1987) Partial amino acid sequence of apolipoprotein(a) shows that it is homologous to plasminogen. Proc. Natl Acad. Sci. USA, 84, 3224–3228.
3. McLean, J.W., Tomlinson, J.E., Kuang, W.J., Eaton, D.L., Chen, E.Y., Fless, G.M., Scanu, A.M. and Lawn, R.M. (1987) cDNA sequence of human apolipoprotein(a) is homologous to plasminogen. Nature, 330, 132–137.
4. Boerwinkle, E., Leffert, C.C., Lin, J., Lackner, C., Chiesa, G. and Hobbs, H.H. (1992) Apolipoprotein(a) gene accounts for greater than 90% of the variation in plasma lipoprotein(a) concentrations. J. Clin. Invest., 90, 52–60.
5. Kraft, H.G., Kochl, S., Menzel, H.J., Sandholzer, C. and Utermann, G. (1992) The apolipoprotein(a) gene: a transcribed hypervariable locus controlling plasma lipoprotein(a) concentration. Hum. Genet., 90, 220–230.
6. DeMeester, C.A., Bu, X., Gray, R.J., Lusis, A.J. and Rotter, J.I. (1995) Genetic variation in lipoprotein(a) levels in families enriched for coronary artery disease is determined almost entirely by the apolipoprotein(a) gene locus. Am. J. Hum. Genet., 56, 287–293.
7. Utermann, G., Menzel, H.J., Kraft, H.G., Duba, H.C., Kemmler, H.G. and Seitz, C. (1987) Lp(a) glycoprotein phenotypes. Inheritance and relation to Lp(a)-lipoprotein concentrations in plasma. J. Clin. Invest., 80, 458–465.
8. Perombelon, Y.F., Soutar, A.K. and Knight, B.L. (1994) Variation in lipoprotein(a) concentration associated with different apolipoprotein(a) alleles. J. Clin. Invest., 93, 1481–1492.
9. Azrolan, N., Gavish, D. and Breslow, J.L. (1991) Plasma lipoprotein(a) concentration is controlled by apolipoprotein(a) (apo(a)) protein size and the abundance of hepatic apo(a) mRNA in a cynomolgus monkey model. J. Biol. Chem., 266, 13866–13872.
10. Wade, D.P., Knight, B.L., Harders-Spengel, K. and Soutar, A.K. (1991) Detection and quantitation of apolipoprotein(a) mRNA in human liver and its relationship with plasma lipoprotein(a) concentration. Atherosclerosis, 91, 63–72.
11. Wade, D.P., Clarke, J.G., Lindahl, G.E., Liu, A.C., Zysow, B.R., Meer, K., Schwartz, K. and Lawn, R.M. (1993) 5′ control regions of the apolipoprotein(a) gene and members of the related plasminogen gene family. Proc. Natl Acad. Sci. USA, 90, 1369–1373.
12. Malgaretti, N., Acquati, F., Magnaghi, P., Bruno, L., Pontoglio, M., Rocchi, M., Saccone, S., Della Valle, G., D’Urso, M., LePaslier, D. et al. (1992) Characterization by yeast artificial chromosome cloning of the linked apolipoprotein(a) and plasminogen genes and identification of the apolipoprotein(a) 5′ flanking region (published erratum appears in Proc. Natl Acad. Sci. USA, 1993, 90, 1634). Proc. Natl Acad. Sci. USA, 89, 11584–11588.
13. Cohen, J.C., Chiesa, G. and Hobbs, H.H. (1993) Sequence polymorphisms in the apolipoprotein(a) gene. J. Clin. Invest., 91, 1630–1636.
14. Zysow, B.R., Lindahl, G.E., Wade, D.P., Knight, B.L. and Lawn, R.M. (1995) C/T polymorphism in the 5′ untranslated region of the apolipoprotein(a) gene introduces an upstream ATG and reduces in vitro translation. Arterioscler. Thromb. Vasc. Biol., 15, 58–64.
15. Mooser, V., Mancini, F.P., Bopp, S., Petho Schramm, A., Guerra, R., Boerwinkle, E., Muller, H.J. and Hobbs, H.H. (1995) Sequence polymorphisms in the apo(a) gene associated with specific levels of Lp(a) in plasma. Hum. Mol. Genet., 4, 173–181.
16. Trommsdorff, M., Kochl, S., Lingenhel, A., Kronenberg, F., Delport, R., Vermaak, H., Lemming, L., Klausen, I.C., Faergeman, O., Utermann, G. et al. (1995) A pentanucleotide repeat polymorphism in the 5′ control region of the apolipoprotein(a) gene is associated with lipoprotein(a) plasma concentrations in Caucasians. J. Clin. Invest., 96, 150–157.
17. van der Hoek, Y.Y., Wittekoek, M.E., Beisiegel, U., Kastelein, J.J. and Koschinsky, M.L. (1993) The apolipoprotein(a) kringle IV repeats which differ from the major repeat kringle are present in variably-sized isoforms. Hum. Mol. Genet., 2, 361–366.
18. Kraft, H.G., Haibach, C., Lingenhel, A., Brunner, C., Trommsdorff, M., Kronenberg, F., Muller, H.J. and Utermann, G. (1995) Sequence polymorphism in kringle IV 37 in linkage disequilibrium with the apolipoprotein(a) size polymorphism. Hum. Genet., 95, 275–282.
19. Thompson, E.A., Deeb, S., Walker, D. and Motulsky, A.G. (1988) The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am. J. Hum. Genet., 42, 113–124.
20. Lackner, C., Cohen, J.C. and Hobbs, H.H. (1993) Molecular definition of the extreme size polymorphism in apolipoprotein(a). Hum. Mol. Genet., 2, 933–940.
21. Mancini, F.P., Mooser, V., Guerra, R. and Hobbs, H.H. (1995) Sequence microheterogeneity in apolipoprotein(a) gene repeats and the relationship to plasma Lp(a) levels. Hum. Mol. Genet., 4, 1535–1542.
22. Gaw, A., Boerwinkle, E., Cohen, J.C. and Hobbs, H.H. (1994) Comparative analysis of the apo(a) gene, apo(a) glycoprotein, and plasma concentrations of Lp(a) in three ethnic groups. Evidence for no common ‘null’ allele at the apo(a) locus. J. Clin. Invest., 93, 2526–2534.
23. Rainwater, D.L. (1995) Genetic basis for multimodal relationship between apolipoprotein (a) size and lipoprotein (a) concentration in Mexican-Americans. Atherosclerosis, 115, 165–171.
24. Bopp, S., Kochl, S., Acquati, F., Magnaghi, P., Petho Schramm, A., Kraft, H.G., Utermann, G., Muller, H.J. and Taramelli, R. (1995) Ten allelic apolipoprotein(a) 5′ flanking fragments exhibit comparable promoter activities in HepG2 cells. J. Lipid Res., 36, 1721–1728.
25. Magnaghi, P., Mihalich, A. and Taramelli, R. (1994) Several liver specific DNAse hypersensitive sites are present in the intergenic region separating human plasminogen and apoprotein(A) genes. Biochem. Biophys. Res. Commun., 205, 930–935.
26. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
27. Soutar, A.K., McCarthy, S.N., Seed, M. and Knight, B.L. (1991) Relationship between apolipoprotein(a) phenotype, lipoprotein(a) concentration in plasma, and low density lipoprotein receptor function in a large kindred with familial hypercholesterolemia due to the pro664→leu mutation in the LDL receptor gene. J. Clin. Invest., 88, 483–492.
28. Talmud, P., Tybjaerg Hansen, A., Bhatnagar, D., Mbewu, A., Miller, J.P., Durrington, P. and Humphries, S. (1991) Rapid screening for specific mutations in patients with a clinical diagnosis of familial hypercholesterolaemia. Atherosclerosis, 89, 137–141.
29. Pfaffinger, D., McLean, J. and Scanu, A.M. (1993) Amplification of human APO(a) kringle 4-37 from blood lymphocyte DNA (published erratum appears in Biochim. Biophys. Acta, 1995; 1270: 101). Biochim. Biophys. Acta, 1225, 107–109.
30. Chakravarti, A., Buetow, K.H., Antonarakis, S.E., Waber, P.G., Boehm, C.D. and Kazazian, H.H. (1984) Nonuniform recombination within the human beta-globin gene cluster. Am. J. Hum. Genet., 36, 1239–1258.
31. Hill, W.G. (1974) Estimation of linkage disequilibrium in randomly mating populations. Heredity Edinburgh, 33, 229–239.
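
Note on the statistical methods. The tests described under Statistical methods can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a biallelic site, takes ∆ to be the usual standardized form D/√(pA qA pB qB), and treats the haplotype frequency as already estimated, so the maximum-likelihood step of Hill (31) is not reproduced. Function names and example counts are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Allele frequency by counting, plus the chi-squared test for HWE.

    The site must be polymorphic (0 < p < 1), otherwise an expected
    count is zero and the statistic is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2, chi2_sf_1df(chi2)

def standardized_disequilibrium(p_ab, p_a, p_b, n):
    """Delta for two sites, with N * Delta**2 tested as chi-squared (1 d.f.).

    p_ab is the (already estimated) frequency of the A-B haplotype and
    n is the number of genotypes observed.
    """
    d = p_ab - p_a * p_b
    delta = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n * delta ** 2
    return delta, chi2, chi2_sf_1df(chi2)

# Hypothetical genotype counts and haplotype frequency
print(hardy_weinberg_test(30, 50, 20))
print(standardized_disequilibrium(0.30, 0.55, 0.40, 100))
```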