Digital Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Medicine 287
Analysis of Complex Genetic
Traits in Population Cohorts using
High-throughput Genotyping
Technology
ANDREAS DAHLGREN
ACTA
UNIVERSITATIS
UPSALIENSIS
UPPSALA
2007
ISSN 1651-6206
ISBN 978-91-554-7007-4
urn:nbn:se:uu:diva-8291
To my good friends
List of publications
This thesis is based on the following publications, which will be referred to in the text by their Roman numerals:
I Dahlgren A, Zethelius B, Jensevik K, Syvänen A-C, Berne C. Variants of the TCF7L2 gene are associated with beta cell dysfunction and confer an increased risk of type 2 diabetes mellitus in the ULSAM cohort of Swedish elderly men. Diabetologia 50:1852-1857 (2007)
II Dahlgren A, Zethelius B, Eriksson N, Lundmark P, Axelsson T, Syvänen A-C, Berne C. Variants in the HHEX gene are associated with biochemical markers for beta-cell function in the ULSAM cohort. Submitted manuscript
III Dahlgren A, Lundmark P, Axelsson T, Lind L, Syvänen A-C. Association of the estrogen receptor 1 (ESR1) gene with body height in adult males from two Swedish population cohorts. Submitted manuscript
IV Dahlgren A, Perola M, Liljedahl U, Kaprio J, Spector T, Martin N, Peltonen L, Syvänen A-C. Finemapping of a QTL for body height on the human X chromosome in a Finnish twin cohort. Manuscript
V Kettunen J, Sammalisto S, Costiander E, Gudbjartsson D, Dahlgren A, Heikkalinna T, Kaprio J, Heliövaara M, Peltonen L, Perola M. The COL11A1 gene is associated with human stature in two population cohorts. Manuscript
Published material was reprinted with permission from Springer Science and Business Media.
Supervisor:
Ann-Christine Syvänen, Professor
Molecular Medicine
Department of Medical Sciences, Uppsala University, Sweden
Co-supervisors:
Håkan Melhus, Professor
Clinical Pharmacology
Department of Medical Sciences, Uppsala University, Sweden
Markus Perola, Ph.D, M.D.
Molecular Medicine
National Public Health Institute, Helsinki, Finland
Faculty opponent:
Doctor Struan Grant
Center for Applied Genomics,
The Children’s Hospital of Philadelphia, USA
Review board:
Professor Anders Karlsson
Endocrinology, Diabetes and Metabolism,
Department of Medical Sciences, Uppsala University, Sweden
Docent Fredrik Nyström
Department of Endocrinology and Metabolism
Faculty of Health Science,
Linköping University Hospital, Sweden
Docent Ingrid Dahlman
Endocrinology unit, Department of Medicine
Karolinska Institutet, Stockholm, Sweden
Docent Marju Orho-Melander
Diabetes and Endocrinology Research unit,
Department of Clinical Sciences,
Malmö University Hospital, Lund University, Sweden
Professor Åke Sjöholm
Experimental Endocrinology
Department of Clinical Research and Education,
Karolinska Institutet, Stockholm, Sweden
Contents
Introduction
  The human genome
  Our genes
  Sequence variations
    Single nucleotide polymorphisms
    Copy-number variants
Technology
  Polymerase chain reaction
  DNA Sequencing
  SNP genotyping
    Hybridization-based techniques
    Enzyme-assisted techniques
Genetics
  Genetic complexity
    Monogenic
    Polygenic
  Genetic analysis
    Linkage
    Association
Present study
  Overall aim
  Specific aims
  Trait and disease
    Type 2 diabetes mellitus
    Human body height
  Genetics
    Type 2 diabetes mellitus
    Human body height
  Material and methods
    Study I-II
    Study III-V
  Results and discussion
  Concluding remarks
Final thoughts
Acknowledgements
References
Abbreviations
ASO      Allele specific oligonucleotide
CNV      Copy-number variant
ddNTP    Dideoxynucleotide triphosphate
DNA      Deoxyribonucleic acid
GWA      Genome-wide association
HapMap   International Haplotype Mapping project
HMGA2    High mobility group A2 protein
Indel    Insertion/Deletion polymorphism
IRI      Immuno reactive insulin
kb       kilo base pairs
LD       Linkage disequilibrium
LSO      Locus specific oligonucleotide
MAF      Minor allele frequency
OGTT     Oral glucose tolerance test
OMIM     Online Mendelian inheritance in man
PCR      Polymerase chain reaction
PIVUS    Prospective Investigation of the Vasculature in Uppsala Seniors
RR       Relative risk
SNP      Single nucleotide polymorphism
STR      Short tandem repeat
T2DM     Type 2 diabetes mellitus
TCF7L2   Transcription factor 7-like 2
ULSAM    Uppsala Longitudinal Study of Adult Men
Introduction
“Equipped with his five senses, man explores the universe around him and
calls the adventure Science” (Edwin Powell Hubble, The Nature of Science, 1954).
The science of genetics can be described as the study of inherited variation in living organisms. That physical traits can be passed on from generation to generation has been known and utilized since humans began growing crops and domesticating animals to improve agricultural production and the breeding of livestock. The foundation of the modern scientific field of genetics is attributed to the work of Gregor Mendel. In the mid-19th century he presented and published a study of variation in hybrids of pea plants [1], introducing the concept of dominant and recessive properties of heritable traits. The significance of his findings was not realized by the scientific community until the early 20th century, but today Mendel’s laws of inheritance are taught as the first introduction to genetics in schools across the world. Since Mendel, the science of genetics has continued to develop, and many important strides forward have been taken.
The discovery of DNA as the carrier of genetic information [2] and the subsequent characterization of its now famous double helix structure revealed how this molecule both transmits the genetic information during cell division and serves as a blueprint for all the molecules needed to create and sustain life [3, 4]. Key technical developments such as the polymerase chain reaction (PCR) and Sanger sequencing have allowed us to take genetics into the molecular era, in which the first draft sequence of the entire human genome was published in 2001.
With an ever increasing amount of data on the makeup of our genome and its variations, the science of genetics is attempting to decipher this information in order to understand how complex patterns of genetic variation affect biological functions that, combined with environmental factors, determine human traits and influence common diseases.
The work presented in this thesis touches on both of these areas. It contains studies aimed at identifying genes underlying one of the most basic biological traits, namely human body height, as well as investigations of how genes influence type 2 diabetes mellitus, which is one of the most rapidly growing common diseases in developed countries today.
The human genome
The genetic information in each living human cell is encoded by approximately 3.1 billion paired nucleotides (adenine paired with thymine and cytosine with guanine), which form DNA molecules that in humans are organized into 22 pairs of autosomes and two sex chromosomes.
The first draft sequence of the whole human genome was published in 2001 in parallel by the Human Genome Project (HGP, International Human Genome Sequencing Consortium) [5] and the company Celera Genomics [6]. The HGP declared the sequence completed in 2004 [7], when over 99% of the genome sequence had been successfully elucidated. The remaining gaps will most likely be filled in as new sequencing techniques are designed and used.
Our genes
Before the first draft sequence was published, there was much speculation about how many protein coding genes the human genome could contain, and early guesses ranged all the way from 35,000 to over 100,000 genes [8]. With the genome sequence complete, one of the most current estimates suggests that the number will end up in the range of 20,000-25,000 protein coding genes [7]. Compared to the ~20,000 genes found in the genome of the famous model organism Caenorhabditis elegans (C. elegans) [9], the number of genes alone cannot explain the obvious difference between this nematode (roundworm) with a 1 mm body length and a fully grown Homo sapiens. Two mechanisms have been suggested to partially explain how the difference in complexity can be generated with such a similar number of genes. The first is alternative RNA splicing, in which the RNA is modified in different ways after being transcribed using the genomic DNA sequence as template [10]. The different RNA splice variants can then be translated into multiple proteins with different functions. The second mechanism is the regulation of gene expression, which creates the unique expression patterns required in different cell types to drive the development of tissues and organs at specific phases of an organism’s development [11].
Sequence variations
Single nucleotide polymorphisms
Among the known types of sequence variation in the human genome, the single nucleotide polymorphism (SNP) is the most frequent. A SNP is most commonly defined as a position in the genome where a single nucleotide has been substituted for another and where this change is seen in at least 1% of a chosen population. The two different nucleotides at the SNP position are referred to as its two alleles. SNPs can be found throughout the entire genome and to date 11.8 million SNPs have been registered in the dbSNP database (http://ncbi.nih.gov/SNP/, Build 127, September 18, 2007), which is the largest public database for SNPs. Almost half of the SNPs in dbSNP are validated thanks to the efforts of researchers around the world and projects like the International Haplotype Mapping project (www.hapmap.org). Because of their abundance and presence throughout the genome, SNPs are well suited for use as biallelic markers in genetic studies. SNPs have been applied in genetic studies ranging in scope from analyzing variants of a single gene to performing genome-wide analyses to investigate the genetics of complex traits and diseases.
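As a minimal sketch of the 1% criterion described above, the following Python example estimates allele frequencies from a list of diploid genotypes and checks whether the less common allele reaches the threshold. The function names and the sample data are hypothetical and chosen only for illustration; they are not taken from dbSNP or from the studies in this thesis.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as "CT"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele of a biallelic marker.

    Returns 0.0 when only one allele is observed in the sample.
    """
    freqs = allele_frequencies(genotypes)
    if len(freqs) > 2:
        raise ValueError("more than two alleles observed; not a biallelic marker")
    return min(freqs.values()) if len(freqs) == 2 else 0.0


def is_snp(genotypes, threshold=0.01):
    """Apply the common definition: minor allele seen in at least 1% of chromosomes."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical sample of 100 individuals (200 chromosomes): 90 CC, 9 CT, 1 TT.
common_variant = ["CC"] * 90 + ["CT"] * 9 + ["TT"] * 1
# Hypothetical rare variant: one G allele among 398 chromosomes.
rare_variant = ["AA"] * 198 + ["AG"] * 1

print(minor_allele_frequency(common_variant), is_snp(common_variant))
print(minor_allele_frequency(rare_variant), is_snp(rare_variant))
```

In the first sample the T allele is carried by 11 of 200 chromosomes, so the position qualifies as a SNP under this definition, whereas the second variant falls below the 1% threshold in this particular sample.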
Copy-number variants
Another common, but less well known, type of variation in the human genome is made up of copy-number variants (CNVs), which can be subdivided into several groups based on their size and the number of different alleles they present. Information about the distribution and frequency of CNVs in the human genome is steadily increasing as new large scale sequence data, generated by new re-sequencing technologies, become available for comparison against the human reference sequence [12]. Current research comparing the human and chimpanzee genomes suggests that segmental duplications could have had greater effects on genomic change than SNPs, making CNVs important to study from an evolutionary perspective [13].
Short tandem repeats
Short tandem repeats (STRs), often referred to as “microsatellites”, are short sequences of 2-4 nucleotides that are repeated in tandem a variable number of times, ranging from fewer than ten repeats to over a hundred. These markers are highly polymorphic and are found in all populations, making them good markers for extracting an ample amount of information using a relatively low number of markers [14]. These properties have resulted in STRs being extensively used as markers for large genetic studies, such as whole genome linkage scans, and STRs have also become the most common type of genetic marker used in the field of forensic genetics. One example is the Combined DNA Index System (CODIS, www.fbi.gov/hq/lab/codis) developed by the Federal Bureau of Investigation in the US, which uses 13 STR markers to create forensic DNA profiles for identification of individuals.
Insertions and deletions
Insertion/deletion polymorphisms (indels) is the term most often used for copy-number changes smaller than 1 kb. Indels are often made up of repetitive elements like the STRs described above. Indels can have multiple alleles when in the form of STRs, but can also, like SNPs, be biallelic. Indels have been estimated to make up about 20% of all human DNA polymorphisms [15]. The interest in and possibility of studying indels are growing as more human sequences become available for comparison with the finished human reference sequence. In the recently published sequence of the genome of Craig Venter (founder of Celera Genomics), over 700,000 indels were identified when comparing the sequences of both copies of his chromosomes to the human reference assembly [16]. Indels have been shown to have allele frequency distributions similar to those of SNPs in population samples and could thus be used as markers for association studies [17].
Large scale copy-number variants
Above the size of indels (>1 kb) there exist larger variations involving segmental duplications. One recent review article estimated, based on current literature, that around 100 CNVs with a size above 50 kb, along with a substantial number of smaller CNVs like those described above, can be expected in any individual compared to the human reference sequence. This suggests that the 99.9% sequence homology proposed between individuals might be an overestimate [12]. CNVs are important to consider when performing SNP genotyping, because if a presumed base substitution is located in a segmental duplication it can cause a skew in the distribution of the alleles, or even give rise to “false” SNPs when a base varies only between duplicated segments and not at one unique position [18].
Technology
Polymerase chain reaction
To give an overview of the different techniques used to investigate the DNA molecules in our genome, one must start with the polymerase chain reaction (PCR), which has become one of the most significant technological developments for genetic research in our time since it was first introduced in the mid-1980s [19, 20]. The principle of PCR is beautiful in its simplicity and allows exponential amplification of a selected sequence of DNA to create millions of DNA copies to be used for further analysis. To generate a PCR amplified fragment, two specific primers are constructed to be complementary to the ends of the DNA fragment of interest, located so that the 3’ ends of the primers face each other. The DNA is denatured using high temperature, which breaks the hydrogen bonds holding the two strands of the DNA molecule together and renders it single stranded. Lowering the temperature allows the PCR primers to hybridize to their complementary sequences, and in the presence of a DNA polymerase and deoxynucleotides the annealed primers are extended starting at their 3’ ends. When enough time has passed for the polymerase to extend the selected DNA sequence, the temperature is raised again to denature all DNA to single stranded form. This process is then repeated for several cycles, generating an exponential amplification as long as the polymerase is viable and deoxynucleotides are available in the reaction mixture. To be able to cycle the reaction without adding new enzyme for each cycle, a heat-stable DNA polymerase originally isolated from the thermophilic bacterium Thermus aquaticus [21] is used. This feature has made PCR a cornerstone technology for molecular genetics during the last 20 years. Today PCR is still widely used, but for the highly multiplexed analysis required for studies on a genome-wide scale it has become a limiting factor, due to the problems that originate from primer-primer interactions when many primer pairs are used in one reaction [22]. Modifications of the traditional PCR design and new PCR-free technologies have been developed to accommodate whole genome analysis [23].
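The exponential amplification described above can be illustrated with a small calculation. The sketch below uses the idealized textbook model in which the number of copies grows by a factor of (1 + E) per cycle, where E is the per-cycle efficiency (E = 1 means perfect doubling). The model, the function names and the example numbers are assumptions made for illustration; real reactions reach a plateau as reagents are consumed.

```python
import math


def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied per cycle (0 < E <= 1).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles needed to reach target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    if target_copies <= initial_copies:
        return 0
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


# With perfect doubling, a single template molecule yields 2**30 copies
# after 30 cycles, and one million copies are reached after 20 cycles.
print(pcr_copies(1, 30))
print(cycles_to_reach(1, 1_000_000))
# A lower, hypothetical efficiency requires more cycles for the same yield.
print(cycles_to_reach(1, 1_000_000, efficiency=0.8))
```

The comparison between the last two calls shows why even a modest loss of efficiency per cycle has a large effect on the final yield.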
DNA Sequencing
The premier technique for investigating the genome is DNA sequencing, in which the complete sequence information is determined for the area of interest. The gold standard method for sequencing is Sanger’s dideoxy sequencing method [24]. Like PCR, this method utilizes a DNA polymerase to extend a primer annealed to a DNA template. Dideoxynucleotides (ddNTPs) are included in the reaction mixture, and when one is incorporated it terminates the extension process, yielding fragments of different lengths. Together these fragments represent the whole length of the targeted sequence. By separating the fragments according to size and labeling each of the four ddNTP types with a different fluorophore, the sequence can be deduced and analyzed. Sanger sequencing has been automated [25] and was used by the HGP to determine the human genome sequence. With the human reference sequence available, the focus has now shifted to re-sequencing, in order to examine the complete sequence of a selected area or of the whole genome in multiple human samples. Very recently the first diploid genome sequence from an individual was published. The genome, sequenced using Sanger sequencing, belonged to the former president and founder of the sequencing company Celera Genomics [16].
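The way a sequence is deduced from size-separated, ddNTP-terminated fragments can be sketched as follows. This is a deliberately simplified model, assuming that termination occurs once at every position downstream of the primer and that each fragment is observed as a pair of its length and its terminal, labeled base. The function names and the example template are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_fragments(template_3to5, primer_length):
    """Return (length, terminal_base) for every possible terminated fragment.

    template_3to5 is the template written in the direction the polymerase
    reads it (3'->5'), so the synthesized strand is its base-by-base complement.
    """
    synthesized = "".join(COMPLEMENT[base] for base in template_3to5)
    return [
        (length, synthesized[length - 1])
        for length in range(primer_length + 1, len(synthesized) + 1)
    ]


def read_sequence(fragments):
    """Order fragments by size and read off the labeled terminal bases."""
    return "".join(base for _, base in sorted(fragments))


# Hypothetical 10-base template with a 4-base primer already annealed.
fragments = terminated_fragments("TACGGATCCA", primer_length=4)
random.shuffle(fragments)        # the reaction mixture itself is unordered
print(read_sequence(fragments))  # sequence downstream of the primer
```

Sorting by fragment length plays the role of the size separation, and the terminal base of each fragment plays the role of the fluorophore colour.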
The need to lower the cost of sequencing has driven the development of the next generation of sequencing technology. Sequencing a whole genome at a cost of 1000 US dollars per individual is now one of the goals for the development of new sequencing technologies [26]. Today there are a few commercially available systems that could possibly approach this kind of performance in the foreseeable future.
One such system has been developed by 454 Life Sciences™ [27] using the Pyrosequencing technique of sequencing-by-synthesis. In this system the four nucleotides are added one at a time to the reaction mixture. When a nucleotide is incorporated by the polymerase, pyrophosphate is released and an enzymatic cascade converts it into luminescence, and the type of nucleotide is recorded in the sequence read [28]. This technology was used this year (2007) to re-sequence the genome of James Watson, who is credited as co-discoverer of the structure of DNA. The cost of this effort was said to be two million US dollars and it took two months to complete. For comparison, the sequencing cost for the human genome reference sequence has been estimated at 3 billion dollars. Executives at 454 Life Sciences™ have said that they hope to bring the cost of re-sequencing one genome down to 100,000 US dollars during 2008 [29].
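The principle of adding one nucleotide type at a time can be sketched as a simple simulation. In this idealized model each flow of a nucleotide produces a signal proportional to the number of identical bases incorporated in a row, and a flow that does not match the next base gives no signal. The flow order "TACG", the function names and the example read are assumptions for illustration and do not describe the actual instrument software.

```python
def flowgram(read, flow_order="TACG"):
    """Simulate the signals recorded while synthesizing `read`.

    Returns a list of (nucleotide, number_incorporated) pairs, one per flow.
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    signals = []
    position = 0
    flow = 0
    while position < len(read):
        nucleotide = flow_order[flow % len(flow_order)]
        run_length = 0
        # A homopolymer run is incorporated within a single flow.
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        signals.append((nucleotide, run_length))
        flow += 1
    return signals


def decode(signals):
    """Reconstruct the read from the recorded flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("TTAGGC")
print(signals)
print(decode(signals))
```

The example also shows why runs of identical bases are a weak point of this approach: the run length has to be inferred from the signal intensity rather than from separate incorporation events.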
Another commercial system that is working towards the goal of a 1000 dollar genome is the Solexa sequencing system (Illumina Inc.). This system uses a slightly different application of the principle of sequencing-by-synthesis than the 454 system. By using fluorescently labeled nucleotides with reversible termination and labeling moieties, each cycle of the reaction can use all four types of nucleotides. Only one base is incorporated in each cycle due to the termination properties. The unincorporated nucleotides are washed away, and the fluorescence is then detected and identifies one position in the sequence template. The terminating group and the fluorophore can both be removed by enzymatic cleavage, after which another cycle can start to determine the next position in the sequence template [30]. Illumina Inc. is reported to be planning to use this system to sequence the entire genome of one of the individuals used in the HapMap project [29].
SNP genotyping
As described previously, SNPs are the most abundant type of variant in the genome. As a biallelic, genetically stable variation, a SNP is less informative than an STR, which is the second most common type of genetic marker used. However, the abundance of SNPs allows better coverage of the genome. This is an advantage for whole genome association studies, which are now practically and economically possible using SNPs and new genotyping technologies in combination with the increased number of validated SNPs available for assay design.
The methods used for genotyping SNPs can be divided into two main groups based on the two main principles used to discriminate between the alleles of each SNP.
Hybridization-based techniques
Discrimination of SNP alleles based on hybridization is the first of the two main groups. It utilizes the fact that the strength of the binding between two short complementary strands of DNA is changed by a single mismatched nucleotide. This change in the stability of the DNA duplex allows matched and mismatched hybrids to be separated using different denaturing conditions, for example high temperature or low salt concentration in a washing solution.
The first method using the hybridization principle for genotyping SNPs was published in 1979, and used allele specific oligonucleotides (ASOs) for genotyping of DNA from a bacteriophage [31]. This study showed that a single nucleotide mismatch changed the denaturing temperature of the hybridized strands by 10 degrees. Using this principle, several types of genotyping techniques have been developed, such as real-time PCR using TaqMan probes [32] or Molecular Beacons [33]. Because the hybridization and denaturing conditions are sequence dependent, it has proven difficult to achieve high multiplexing levels using hybridization for discriminating SNP alleles [34]. Using specially selected SNPs and high density oligonucleotide arrays it is, however, possible today to determine the alleles of over 900,000 SNPs in one experiment using the Genome-Wide Human SNP Array 6.0 from Affymetrix (www.affymetrix.com). Affymetrix uses photolithographic synthesis of ASO probes directly onto an array surface, which allows for the construction of very high density arrays [35]. This type of array can have up to 40 different ASO probes with slightly different sequences for each SNP to be genotyped. The SNP site is amplified using PCR and the PCR products are labeled with biotin, which can be detected using fluorescently marked streptavidin. Using the combined signal from all probes provides redundancy for the genotype calling, to compensate for the sequence dependent issues that arise in highly multiplexed genotyping by hybridization.
Enzyme-assisted techniques
Utilizing enzymes to discriminate SNP alleles is the second and largest group of genotyping techniques used today. Genotyping assays using enzymes for discrimination are more specific than hybridization assays [36]. Enzymes like DNA ligases and DNA polymerases have been used extensively. In vivo these enzymes need to be both very specific and to have very low error rates to perform their natural biological functions. DNA polymerases and DNA ligases are involved in DNA replication and repair and are highly sensitive to matched and mismatched nucleotides. Utilizing the natural functions of these enzymes has resulted in several different genotyping assays for highly specific multiplexed genotyping of SNPs.
Ligation assisted assays
The function of DNA ligase is to repair breaks in DNA molecules by creating a phosphodiester bond that covalently joins the two ends of the broken strand. It will only do so if the ends of the DNA strands are aligned properly; if there is a mismatched nucleotide pair at the site, ligation will not occur. This was utilized in the Oligonucleotide Ligation Assay (OLA), which uses two probes that hybridize to the target DNA so that a free 3’ end and a free 5’ end are positioned next to each other. If either end is not correctly hybridized due to the allele of the targeted SNP, ligation will not occur. In the first publication describing the use of OLA, ligation was detected by adding a biotin to one of the probes and a radioactive label to the other. This made it possible to separate the biotinylated probe from the reaction mixture using streptavidin. If a radioactive signal could be detected, a perfect probe match was present in the DNA and ligation had occurred, joining the probes. Knowing the probe sequences, the genotype of the SNP can then be determined [37]. Since its invention OLA has been refined by the development of new types of probes and detection schemes. The use of padlock probes is one good example. A padlock probe is a single linear probe with specific recognition sequences for the selected target DNA sequence at its ends. When these ends are successfully ligated, the probe is circularized [38]. A commercialized version of padlock probes is the molecular inversion probe. In molecular inversion probes, universal primer sites are added to the probe sequence, allowing for highly multiplexed PCR amplification. A tag sequence is also added that allows the probes to be sorted by hybridization on microarrays for analysis. The circularization of a probe that finds a perfect match is exploited by adding exonuclease to the reaction after ligation, which destroys any linear probe in the reaction and leaves only the circular probes for amplification and sorting [39]. Molecular inversion probes have been used successfully to type 12,000 SNPs simultaneously in one multiplexed reaction [40], and the limit of possible multiplexing has yet to be determined.
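The logic of allele discrimination by ligation can be sketched in a few lines of Python. The model is a simplification: two probes are considered ligated only if they hybridize side by side to the target without any mismatch, whereas the real assay is most sensitive to mismatches at the ligation junction. All sequences and function names are hypothetical, and all sequences are written 5'->3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligates(target, upstream_probe, downstream_probe):
    """True if both probes hybridize adjacently to `target` with no mismatch."""
    return reverse_complement(upstream_probe + downstream_probe) in target


def detected_alleles(sample_strands, allele_probes, common_probe):
    """Alleles whose allele specific probe is ligated to the common probe."""
    found = set()
    for allele, probe in allele_probes.items():
        if any(ligates(strand, probe, common_probe) for strand in sample_strands):
            found.add(allele)
    return sorted(found)


# Hypothetical target: LEFT flank, SNP position (A or G), RIGHT flank.
LEFT, RIGHT = "ACGTTGCA", "GTCAAGCT"
target_a = LEFT + "A" + RIGHT
target_g = LEFT + "G" + RIGHT

# The allele specific probes carry the discriminating base at their 3' end;
# the common probe hybridizes immediately next to it.
allele_probes = {allele: reverse_complement(allele + RIGHT) for allele in "AG"}
common_probe = reverse_complement(LEFT)

print(detected_alleles([target_a, target_g], allele_probes, common_probe))  # heterozygote
print(detected_alleles([target_g, target_g], allele_probes, common_probe))  # homozygote
```

A sample in which both allele specific probes give a ligation product is called heterozygous, while a product from only one probe indicates a homozygous sample.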
Polymerase assisted assays
DNA polymerase is the key enzyme for both PCR and sequencing, and its ability to assemble a double stranded DNA molecule using a single strand as template can also be utilized for SNP genotyping.
Single nucleotide primer extension
In a single nucleotide primer extension reaction an oligonucleotide primer is designed so that its 3’ end hybridizes to the nucleotide adjacent to the SNP site. The DNA polymerase will then extend the primer over the SNP site, enabling determination of the genotype of the sample. This method of genotyping, originally called minisequencing [41], has since the original publication become known by many different names, such as single base extension [42] and single nucleotide primer extension [43]. As with the names, there are many different single nucleotide primer extension assay formats, for genotyping of individual SNPs and in multiplexed formats, with primers in solution, immobilized or sorted on microarrays, and with detection by several labeling strategies (Table 1).
Assay format | Detection method
Singleplex genotyping by template directed incorporation in microtiter plates | Fluorescence polarization [44]
Multiplexed primer extension | MALDI-TOF detection [45]
Multiplexed genotyping by immobilized primers on microarrays | Radioactively labeled ddNTPs [36]; fluorescently labeled ddNTPs [46]
Multiplexed tag-array minisequencing | Fluorescently labeled ddNTPs [47]
Table 1: Different variants of single nucleotide primer extension
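The principle shared by the assay formats in Table 1 can be sketched as follows: the primer anneals with its 3’ end next to the SNP site, and the single labeled ddNTP that is incorporated is the complement of the template base at the SNP position. The sketch assumes perfect primer annealing and a polymerase without errors; the sequences and function names are hypothetical, and all sequences are written 5'->3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def incorporated_ddntp(template, primer):
    """Base of the single ddNTP added to the 3' end of the annealed primer.

    Returns None if the primer does not anneal or no template base is left.
    """
    site = template.find(reverse_complement(primer))
    if site <= 0:
        return None
    # The polymerase reads the template base just 5' of the annealing site.
    return template[site - 1].translate(COMPLEMENT)


def template_alleles(templates, primer):
    """Alleles on the template strand inferred from the incorporated ddNTPs."""
    called = set()
    for template in templates:
        ddntp = incorporated_ddntp(template, primer)
        if ddntp is not None:
            called.add(ddntp.translate(COMPLEMENT))
    return sorted(called)


# Hypothetical locus: LEFT flank, SNP position (C or T), RIGHT flank.
LEFT, RIGHT = "TTGACGGA", "ATGGCTAAGC"
primer = reverse_complement(RIGHT)  # 3' end lies adjacent to the SNP site

print(incorporated_ddntp(LEFT + "C" + RIGHT, primer))
print(template_alleles([LEFT + "C" + RIGHT, LEFT + "T" + RIGHT], primer))
```

With a different label on each of the four ddNTPs, a single reaction per SNP is enough to distinguish both homozygotes and the heterozygote.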
Single nucleotide primer extension is available in several commercial applications for SNP genotyping. The GenomeLab™ SNPstream® system from Beckman Coulter is designed for flexible medium to high throughput genotyping [48]. This system analyzes 12 or 48 SNPs per experiment using tag-array minisequencing with two color fluorescent detection on microarrays in a 384-well microtiter plate format. For whole genome sized SNP genotyping, the Infinium II assay developed by Illumina® uses single base extension [49]. Using this assay and the HumanHap650Y genotyping BeadChip, over 650,000 SNPs can be typed in a single experiment [50]. In July of 2007 the Human1M BeadChip, also using the Infinium II assay, was released as the first commercially available application for genotyping more than one million SNPs in one experiment.
Allele specific primer extension
Allele specific primer extension is another way to utilize the function of DNA polymerase for SNP genotyping. In this method allele specific oligonucleotide (ASO) primers are designed so that the nucleotide at the 3’ end hybridizes to the SNP position in the target sequence. Two ASOs are needed to determine the genotype of a SNP. If an ASO primer sequence has a complete match to the target sequence, the DNA polymerase will be able to extend the ASO primer. By detecting the extended ASO the genotype can be determined. One way to detect extension, used in early applications of ASOs, was to run a PCR reaction using an ASO primer as one of the two PCR primers. If the ASO primer fully matched, the PCR reaction would amplify the target sequence and the PCR product could easily be detected on a standard agarose gel [51]. Several alternative detection methods have been used with ASO primers, and one of the latest adaptations of this reaction principle is the GoldenGate genotyping assay from Illumina® [52]. This assay makes use of both DNA polymerase and ligase to determine SNP genotypes. If the ASO primer is a match it will be extended by the polymerase until it reaches a locus specific oligonucleotide (LSO) that stops the extension. In the next step the extended ASO primer is ligated to the LSO. Using universal PCR primer sequences contained in both the ASO and the LSO, the ligated product can be amplified and later sorted onto a bead array [53] using a tag sequence in the LSO. The GoldenGate assay can be used for flexible genotyping of 1536 SNPs per sample in one experiment.
Genetics
Genetic complexity
All human traits and diseases that have a heritable component can roughly
be divided into two major groups according to the genetic complexity underlying the trait or disease in question.
Monogenic
Monogenic traits and diseases follow the Mendelian patterns of dominant
or recessive inheritance. Diseases with Mendelian inheritance in humans
have traditionally been identified and studied by finding families with multiple affected members. By examining how the trait or disease is passed on
through the generations in families, the mode of inheritance can be determined. For a trait or disease to be defined as having Mendelian inheritance
it must show either a dominant or a recessive pattern in families. Current scientific information about diseases showing Mendelian inheritance is catalogued in the “Online Mendelian Inheritance in Man” (OMIM) database
(www.ncbi.nlm.nih.gov/omim/). OMIM also catalogues genes that have
been indicated to affect phenotypes and diseases with Mendelian inheritance.
One of the first monogenic diseases to have its causative gene identified was cystic
fibrosis. The CFTR gene was identified using linkage-based analysis followed by positional cloning, and the most common mutation causing cystic
fibrosis (ΔF508) was identified [54]. Today more than 1000 mutations have
been described in the CFTR gene, with ΔF508 being the most common
in cystic fibrosis patients. It is worth noting that even if a disease shows
Mendelian inheritance and is referred to as monogenic, there will be interactions with other genes that result in differences in disease severity for
patients with the same mutation in the causative gene [55]. Relatively few
traits and diseases have been defined as monogenic, and consequently the
vast majority of human traits and diseases with heritable components are
polygenic.
Polygenic
Polygenic traits are most often referred to as complex genetic traits in current literature. The name indicates that, in contrast to monogenic traits, they
are influenced by more than one gene and do not display an obvious pattern
of Mendelian inheritance in families. The majority of human traits and
common diseases that have a heritable component have a complex genetic
makeup [56]. The most frequently used method for determining the heritable
component of a trait or disease that does not show Mendelian inheritance is
to study twins. Comparing the extent to which monozygotic twins share a
phenotype with the extent to which dizygotic twins share it provides a good first estimate of
the heritable component affecting the trait or disease being studied [57].
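As an illustration of the twin comparison, a commonly used first approximation is Falconer's formula, which doubles the difference between the monozygotic and dizygotic twin correlations. The sketch below assumes this formula; the thesis does not state which estimator reference [57] uses, and the correlation values are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz, r_dz: phenotypic correlations within monozygotic and
    dizygotic twin pairs, respectively.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, chosen only for illustration.
h2 = falconer_heritability(r_mz=0.90, r_dz=0.50)
print(f"estimated heritability: {h2:.2f}")
```

The estimate is only as good as its assumptions, for example that both twin types share their environment to the same degree.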
Genetic analysis
To find the genetic components of any trait or disease, two main methods
of genetic analysis have been applied: linkage analysis and association analysis. Both utilize known genetic markers
such as microsatellites or SNPs, and both aim to find one or
more markers that are correlated with the genetic loci that make up the heritable component of the trait of interest.
Linkage
Recombination is the process in which DNA segments are exchanged between paired chromosomes during meiosis. The average number of recombination events per meiosis is around 38 for females and 24 for males
[58]. Recombination is a key mechanism for generating genetic diversity
and gives rise to genetic linkage that can be used to map loci linked to a trait
or disease. The frequency of recombination between loci is related to the
distance between them. Loci that are close to each other tend to be inherited together and are said to be in linkage with each other.
Linkage analysis measures how a known genetic marker is
co-inherited with the trait or disease of interest in families. Traditionally,
microsatellites have been used as genetic markers in linkage analysis. Linkage analysis tracks recombination in family materials to locate causative
loci. The analysis is therefore limited by the number of meioses available in the
family material, which depends on the size of the family material and the
number of generations represented. This limitation can result in poor genetic
resolution, with linkage being detected between a
marker and the locus of interest even if they are several megabases (Mb)
apart. Genetic linkage studies in family materials have been very successful
in identifying genetic loci linked to traits and diseases with Mendelian
inheritance [59]. Using linkage analysis to identify genetic components of
common diseases has not been nearly as successful [60].
Association
A population cohort sample can be described as a very large pedigree
where the family information is unknown, but where it can be assumed that
all individuals share common ancestry going back far enough in time. This means that
in a pedigree of unknown structure there have been thousands of recombination events since the beginning of the common ancestry. For this assumption to be valid it is important to ensure that the
ethnicity and geographic origin of the samples are matched as far as possible. If not, the problem of population stratification arises, where population
subgroups present in the sample can cause differences in marker allele
frequencies that in turn can result in false positive findings of association
[61]. Association analysis examines the end result of all the recombination
events in the population, which provides higher genetic resolution than
traditional linkage analysis, which is limited by the number of generations
available in a family material with a known pedigree.
The classical setup for studying association is a case-control study using
SNP markers. Allele frequencies in a group of “cases” that
have the disease of interest, for example type 2 diabetes as examined in this thesis, are compared with those in a group of control samples from the
same population that do not have the disease. Recent technological advances and efforts such as the HapMap project have now made genome-wide association studies a reality [60]. This year (2007) several such
studies have been published with findings of previously unknown genes for
body height and type 2 diabetes (see Present study).
Linkage disequilibrium
The connection between genetic markers and genetic loci utilized in association analysis is called linkage disequilibrium (LD). LD is described using
two statistical measures called D´ and r². The measure
D´ reflects the extent of historical recombination between two loci and ranges from zero to one. If D´ equals one
for two loci, they are considered to be in complete genetic linkage, meaning
that they are inherited together and that no recombination has occurred between them in the population analyzed. However, the marker alleles can still
have different frequencies in the population. For SNPs this is caused by
original mutations that have occurred at different time points in the population's genetic history. The value of r² is therefore used to describe the correlation between two loci, where r² equal to one means that the marker alleles
also have the same frequencies. This correlation can be used to select the most
informative SNPs, referred to as tagSNPs, in order to maximize the cost-benefit
of an association study [62].
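The difference between the two measures can be made concrete with the standard textbook definitions of D, D´ and r² for two biallelic loci. The sketch below is a minimal illustration under those definitions; the function name and the haplotype frequencies are hypothetical and are not taken from the thesis data.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a, p_b: frequencies of alleles A and B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case: allele B (10%) is only ever found together with allele A (30%).
d_prime, r2 = ld_measures(p_ab=0.10, p_a=0.30, p_b=0.10)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

In this example no recombinant haplotype is present, so D´ is one, while r² stays well below one because the two alleles differ in frequency. This is why r² rather than D´ is the natural criterion when choosing tagSNPs.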
Haplotypes
The LD measurements can be used to construct haplotypes formed by
SNPs. A haplotype is defined as a set of distinct genetic loci that are linked
on the same chromosome and inherited together. After the completion
of the human reference sequence, a logical next step was to start mapping
the variation in the genomic sequence and to use these data to determine the haplotype structure of the entire genome. The HapMap project was initiated to
accomplish this task. The project was started in 2002 and set out
to genotype more than one million SNPs in populations selected to represent
all major population groupings in the world [63]. The results of this effort
were published in 2005 [64], and the project continued with a second phase of genotyping that was officially completed this year (2007) [65]. The HapMap project has in total genotyped around 6.8 million SNPs, and the data are publicly
available, making them an invaluable asset for all research involving SNP genotyping.
Present study
Overall aim
To analyze the complex genetic makeup of type 2 diabetes
mellitus and human body height.
Specific aims
• To replicate previous findings of association between the TCF7L2
gene and T2DM, originally identified by linkage analysis in an Icelandic population. (Study I)
• To test for association between TCF7L2 and T2DM-specific quantitative
biochemical markers in the ULSAM population cohort. (Study I)
• To replicate in the Swedish population the associations with T2DM originally identified in a genome-wide association study of patients from the French population for variants in the
LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes. (Study II)
• To test for association between SNPs in the LOC387761 locus and the
HHEX, SLC30A8 and EXT2 genes and T2DM-specific quantitative biochemical markers in the ULSAM population cohort.
(Study II)
• To identify candidate genes and analyze the association of SNPs in them
with body height in the ULSAM cohort. (Study III)
• To fine map a region with known linkage to body height on the X chromosome using SNPs. (Study IV)
• To analyze SNPs in the four functional candidate genes COL11A1, CSF1,
ALX3 and EPS8L3 for association with and linkage to body height.
(Study V)
Trait and disease
Type 2 diabetes mellitus
Type 2 diabetes mellitus (T2DM) is a metabolic disease characterized by
insulin resistance and/or abnormal insulin secretion resulting in hyperglycemia. The diagnostic criteria according to the World Health Organization are
a fasting plasma glucose ≥7.0 mmol/l or a plasma glucose ≥11.1 mmol/l measured two hours
after an oral glucose tolerance test (OGTT) [66]. Left untreated, T2DM will
result in severe complications due to the effects of chronic hyperglycemia. The
complications include an overall increased risk for cardiovascular disease,
retinopathy that can lead to blindness, and nephropathy that progresses until the
kidneys fail completely.
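The diagnostic rule above amounts to a simple either-or threshold check. The sketch below assumes that both thresholds are read as "at or above" (7.0 mmol/l fasting, 11.1 mmol/l at two hours); the function and parameter names are hypothetical, and the code illustrates the rule only, not a clinical tool.

```python
from typing import Optional


def meets_who_diabetes_criteria(
    fasting_glucose_mmol_l: Optional[float] = None,
    ogtt_2h_glucose_mmol_l: Optional[float] = None,
) -> bool:
    """Return True if either plasma glucose value reaches its threshold.

    A value of None means the measurement is not available.
    """
    fasting_hit = fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0
    ogtt_hit = ogtt_2h_glucose_mmol_l is not None and ogtt_2h_glucose_mmol_l >= 11.1
    return fasting_hit or ogtt_hit


# Hypothetical measurements.
print(meets_who_diabetes_criteria(fasting_glucose_mmol_l=6.2, ogtt_2h_glucose_mmol_l=11.4))
```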
During the last century there has been a dramatic increase in the incidence
of T2DM worldwide, to the point that T2DM is referred to as an epidemic
[67]. It is rapidly becoming one of the most prevalent common diseases in the
world. A recent projection predicts that by the year 2050 there could be as
many as 48 million individuals diagnosed with diabetes in the U.S. alone
[68]. The dramatic rise in T2DM is mainly attributed to changes in human
behavior and lifestyle leading to increased obesity [69]. The best way to
deal with this epidemic is prevention, and several studies have clearly shown
that lifestyle intervention can be highly successful in preventing the development of T2DM in subjects with impaired glucose tolerance (IGT), which is a
pre-stage to full T2DM [70, 71].
Human body height
Standing body height is one of the most basic human quantitative traits.
The heritability of body height has been extensively examined and high
heritability is well known. For adult body height the heritability estimates
range from 68% to 93% [72, 73]. Besides genetic influences, height is also affected by many environmental factors, of which nutrition and health care are
important. Understanding the genetic components of normal variation in
body height would not only provide important insight into basic human biology, but could also serve as a model for future investigations of other complex genetic traits.
Genetics
Type 2 diabetes mellitus
T2DM is a complex genetic disease influenced by several genes,
most of which are still unknown. Until last year (2006) only the PPARG
[74] and KCNJ11 [75] genes had been convincingly identified as having an
effect on the risk for T2DM. In January 2006 the association between a
variant located in the TCF7L2 gene and increased risk (RR=1.56) for T2DM
was published [76]. The TCF7L2 variant was discovered when
Grant and colleagues in the Icelandic deCODE group performed genetic
fine mapping of a region on chromosome 10 originally identified by linkage
analysis, using an additional high-density set of microsatellite markers. This
region had been reported to be linked to T2DM in Mexican Americans [77]
and had shown suggestive evidence of linkage in an Icelandic
population [63]. Grant and colleagues also replicated the association initially found in the Icelandic population in Danish and American
populations. They also genotyped a number of SNPs in the TCF7L2 gene
and found one (rs12255372) in almost complete LD (r²=0.95) with the original microsatellite marker (DG10S478). They suggested that the SNP
rs12255372 and one other SNP, rs7903146, should be included in any future
replication efforts by other research groups. These results prompted us to
initiate study I presented in this thesis. Since the original publication, numerous population cohorts from all around the world, including Europe,
America, Asia and West Africa, have been analyzed to replicate the association between TCF7L2 and T2DM [78-96]. Additional support for TCF7L2 as a risk
factor for T2DM has been provided by seven genome-wide association studies (GWAS) published during 2007 [91, 97-101]. The first of these studies
was performed by Sladek and colleagues, and in addition to replicating the
TCF7L2 association they also showed associations with T2DM for the
LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes. These findings prompted us to initiate study II.
Human body height
Body height is a classic complex genetic trait. The results from a large
number of genome-wide linkage studies indicate that there must be multiple
genes controlling body height, each with a relatively small effect. This in part
explains the absence of genes with a clear and reproducible link
to body height. Loci on all chromosomes except 10, 16,
19 and the Y chromosome have been suggested to be linked to body height
[102-120]. Up until this year only findings on chromosomes 3, 5, 6 and 7 had
been suggested in more than one study [112]. The best candidate gene so far
for body height is the HMGA2 gene, which was only recently found in a genome-wide association study. A common variant in this gene showed convincing evidence for association, and it was subsequently replicated in several population cohorts in the same study [121].
Material and methods
Study I-II
Uppsala longitudinal study of adult men (ULSAM)
The ULSAM population cohort was collected as part of an investigation
on diabetes and cardiovascular disease in adult men
(www.pubcare.uu.se/ULSAM). The study was initiated in 1970 when all
men born between 1920 and 1924 and residing in Uppsala county in Sweden were invited to a health survey, in which 2,322 men participated [122].
The participants have subsequently been invited for follow-ups every 10
years, with the last follow-up study completed in 2005. At the follow-up
study conducted when the participants had reached 70 years of age, blood
samples for extraction of the DNA samples analyzed in study I and II were
collected (n=1,142) [123]. The ULSAM cohort has been extensively characterized for studying T2DM using biochemical and clinical approaches. The
euglycaemic–hyperinsulinaemic clamp technique [124], considered to be the
gold standard for determining insulin sensitivity, was used to calculate the
insulin sensitivity index (M/I). Using the M/I value to adjust for insulin
sensitivity in the body allows the actual β-cell function to be examined very
precisely. Several other key biochemical markers related to T2DM have
been measured in the ULSAM cohort. They include immunoreactive insulin
(IRI) during an oral glucose tolerance test (OGTT), fasting intact and 32–33
split proinsulin and specific insulin [125].
Genotyping
In study I and II genotyping was performed using a homogeneous single
base extension assay with fluorescence polarization detection using in-house
reagents [44]. Fluorescence polarization was recorded in a fluorometer
(Analyst AD; Molecular Devices, Sunnyvale, CA, USA). Genotyping of two
SNPs in study I resulted in a sample success rate of 98% and 100% genotype reproducibility based on >100 duplicated genotypes from independent
experiments. For the SNP genotyped in study II the sample success rate was
99% and genotype reproducibility 100%.
In study II ten SNPs were genotyped using the GenomeLab™
SNPstream® system [48] (Beckman Coulter, Fullerton, CA, USA) in one
multiplexed reaction. The sample success rate was on average 99% and
genotype reproducibility was >99%. Primer design was performed using the
Autoprimer.com primer design tool (www.Autoprimer.com, Beckman Coulter) for both genotyping methods. The genotypes of all SNPs conformed to
Hardy-Weinberg equilibrium according to a chi-square test (p>0.05). During assay design all SNP alleles showed correct inheritance when typed
in a reference family material.
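A check for Hardy-Weinberg equilibrium of the kind described above can be sketched as a chi-square goodness-of-fit test with one degree of freedom. The implementation below is a minimal illustration, not the procedure actually used in the thesis (the software and options are not specified). The example counts are the non-diabetic rs7903146 genotype counts from Table 2, used here only as convenient input.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of observed genotype counts against
    Hardy-Weinberg expectations. Assumes both alleles are present."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# CC, CT and TT counts among non-diabetic subjects (Table 2).
chi2, p_value = hwe_chi_square(496, 327, 62)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 means the genotype distribution shows no significant deviation from the Hardy-Weinberg proportions p², 2pq and q².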
Statistical analysis
All statistical analyses in study I and II were performed using SAS version
9.1 (SAS Institute, Cary, NC, USA). Correction for multiple testing was
performed by calculating the number of tests, taking into account the LD
between the SNPs [126]. In study I this resulted in an overall critical
p-value of 0.033 for significance. In study II the overall critical p-value was
0.007.
In study I, logistic regression analysis was used to test for association between SNPs and T2DM. For the quantitative biochemical measures, linear
regression analysis was performed after excluding subjects with type 2 diabetes. Age was used as a covariate in all analyses, and for the biochemical measurements insulin sensitivity and BMI were used alternately as
covariates. Significant findings were further analyzed using a two-tailed t-test.
In study II power was estimated for detection of association with T2DM
assuming a dominant model of inheritance. It was found to be 80% for a
SNP with a minor allele frequency of 25% and an odds ratio of 1.4, using a cut-off p-value of 0.05.
Association analysis for T2DM was performed using a
chi-square test. Analysis of quantitative biochemical measures was performed using ANOVA, univariate linear regression and multiple linear regression after excluding subjects with type 2 diabetes mellitus, using either age and insulin sensitivity index (M/I) or age and
BMI as covariates.
Study III-V
Study design
Study III used a candidate gene design to analyze body height.
First, 17 genes were selected (see Table 1, Paper III), all with different connections to body height or growth according to the current literature. By genotyping a smaller number of SNPs distributed across the selected genes we
screened for association with body height. When a suggestive association
was found in the ESR1 gene, we genotyped the gene further to detect more
strongly associated polymorphisms. We also replicated our analysis of the
ESR1 gene by genotyping it in a second population cohort, where we found a
significant association.
In study IV we used SNPs to perform fine mapping of a linked locus for
body height on the X chromosome identified by a combined analysis of several genome-wide linkage scans [120].
Study V combines fine mapping and candidate gene design. Four functional candidate genes for body height were selected from a genomic region
on chromosome 1p21. This region had been shown to be linked to body
height in a previous genome-wide linkage study [112]. After initial analysis
only COL11A1 showed convincing evidence of linkage and association
to body height. Thus further genotyping and analysis was done only for
COL11A1, to investigate and replicate the initial findings using association in Finnish and Icelandic population cohorts.
SNP selection
Study III
The principle for SNP selection was to cover the candidate genes, including exons and introns, using SNPs with minor allele frequencies > 0.05 at an
even spacing of 1.5 kb to 12.5 kb, depending on the size of the genes. In addition, an Illumina design score of 0.5 was used as the lower limit for selecting a SNP for genotyping using the Illumina GoldenGate assay [52]. 174
SNPs found in the dbSNP database were selected (see Table 1, Paper IV).
An additional panel of 33 tag-SNPs was selected to investigate the genetic
variation of the ESR1 gene further. Selection was done using the Haploview software [127].
Study IV
We designed a genotyping panel for fine mapping the linked region on the X chromosome using the Illumina® GoldenGate™ assay, by which 1536 SNPs
can be genotyped in parallel. The genotyping panel
was centered on the peak microsatellite marker (DXS1047)
identified previously [120], with an average physical spacing of 5 kb
between SNPs.
The SNPs were selected manually, assisted by a computer script that
highlighted available SNPs fulfilling the basic selection criteria of validation
status and spacing at 5 kb distance flanking the DXS1047 marker. There are
three levels of validation status given in the SNP information file created by
Illumina for the user.
1. Top level validation is GoldenGate™ validation status, meaning that a
SNP has been genotyped before by Illumina in-house with their
GoldenGate™ assay.
2. Second level is called Two-hit validated meaning that the SNP has been
genotyped and reported by two different methods and in two populations.
3. Third level of validation and last choice for selecting SNPs for our fine
mapping panel was Non-validated meaning that a SNP has only been reported by one method and in one population in the databases.
When selecting between available SNPs with the same level of validation,
the Illumina® SNP design score was used as the decisive criterion. The SNP
design score uses a proprietary algorithm to estimate the probability of successfully designing a GoldenGate™ assay for any SNP. A GoldenGate validated SNP
has a SNP design score of 1.1. Selecting SNPs with a score above 0.6 was
recommended, and such SNPs were always used if possible regardless of validation
status. If validation status and SNP design score were equal, the minor allele
frequency (MAF) was considered, and SNPs with a MAF>0.05 in a European population were preferred.
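The selection priorities described above can be expressed as a sort key. The sketch below is one possible reading of the rules (design score above 0.6 first, then validation level, then design score, then MAF above 0.05); it is not the script used in the study, and the class, field names and SNP identifiers are hypothetical.

```python
from dataclasses import dataclass

# Lower rank means a better validation level, following the three levels listed above.
VALIDATION_RANK = {"GoldenGate": 0, "Two-hit": 1, "Non-validated": 2}


@dataclass
class CandidateSnp:
    snp_id: str
    validation: str      # one of the keys in VALIDATION_RANK
    design_score: float  # Illumina SNP design score
    maf: float           # minor allele frequency in a European population


def selection_key(snp: CandidateSnp):
    """Smaller tuples are preferred; False sorts before True."""
    return (
        snp.design_score <= 0.6,          # scores above 0.6 come first
        VALIDATION_RANK[snp.validation],  # then the best validation level
        -snp.design_score,                # then the highest design score
        snp.maf <= 0.05,                  # then MAF > 0.05
    )


def pick_best(candidates):
    return min(candidates, key=selection_key)


# Hypothetical candidates competing for the same position in the panel.
candidates = [
    CandidateSnp("snp_a", "Two-hit", 0.50, 0.30),
    CandidateSnp("snp_b", "Non-validated", 0.80, 0.20),
    CandidateSnp("snp_c", "Non-validated", 0.80, 0.02),
]
best = pick_best(candidates)
print(best.snp_id)
```

In this example the non-validated SNP with the higher design score and the higher MAF is chosen, reflecting the statement that a score above 0.6 took precedence over validation status.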
The original design containing 1536 manually selected SNPs had an average spacing of ~6 kb and covered approximately 9.3 Mb of the QTL region
(Figure 1, Paper IV).
Study V
Using SNPs from the HapMap database (www.hapmap.org), tagSNPs
were selected to capture the known variation in the functional candidate
genes. Forty-eight SNPs were selected and genotyped in total. HapMap
build #16 was used to select the 25 tag SNPs and one additional SNP that is
non-synonymous in COL11A1. HapMap build #18 was used to select the 22
tag SNPs in CSF1, EPS8L3, and ALX3.
Sample cohorts
Study III
The initial genotyping of the candidate genes was performed in the ULSAM population cohort (see earlier description). To further investigate our
findings in the ESR1 gene, we utilized genotype data from the Prospective
Investigation of the Vasculature in Uppsala Seniors (PIVUS) population
cohort. The PIVUS cohort consists of 1016 participants, with 507 males and
509 females of age 70, and was originally collected to study endothelial
functions [128]. Both population cohorts are from the Uppsala region in
Central Sweden.
Study IV
The sample cohort used for genotyping consisted of 780 Finnish twin samples from the Finnish twin cohort study [129]. This cohort was
part of the samples used for the combined analysis that identified the linked
region to be fine mapped on the X chromosome [120].
Study V
Two of the four original cohorts used to identify the initial linkage of
stature [112] on chromosome 1p21 were used for fine mapping in this study.
These cohorts were initially ascertained for familial combined hyperlipidemia (FCHL) and familial low HDL-cholesterol. The detailed characteristics
and the ascertainment protocol for these 54 families are described in their
respective original articles [114, 130, 131]. Some additional family members
of these families recruited since the original linkage studies were included,
as well as an independent set of 38 Finnish families for replication purposes.
In addition to the family samples used for gene identification, we also
verified the association in an unselected population sample from the Health 2000 cohort, a representative sample of the Finnish population collected in the year
2000 to study a wide range of issues related to public health.
Samples were also randomly selected from the ATBC Study, which was originally
collected to investigate whether α-tocopherol and β-carotene supplements
reduce the incidence of lung cancer in Finland. More detailed information
about the Health 2000 and ATBC studies can be found at
www.nationalbiobanks.fi. Finally, a set of Finnish dizygotic twins was
used as part of creating a sample cohort representative of the Finnish population.
Genotyping
Study III
The 174 SNPs selected for the candidate genes were genotyped in the
ULSAM cohort using the GoldenGate assay [52] and the Illumina BeadArray system (Illumina, San Diego, CA, USA). The assay success rate for the
original assays was 79%. Of the failed SNP assays, 25 were for SNPs
that were not polymorphic in the ULSAM cohort. Another 10 SNP assays were
excluded due to sample call rates below 90%, and 2 SNPs for which the
genotype distribution deviated from Hardy-Weinberg equilibrium were also
omitted. The sample success rate for the SNP assays that passed quality controls was 96.3% and the reproducibility of genotyping was 99.98%.
For the ESR1 gene, 25 SNPs were genotyped in the PIVUS cohort using
the GoldenGate™ assay with an average call rate of 99.5% and reproducibility of 99.8% based on duplication of 2% of the genotypes.
The additional panel of 33 selected tag-SNPs was genotyped in the ULSAM cohort using the SNPstream® genotyping system (Beckman Coulter,
Fullerton, CA, USA) [48]. Three SNPs from the original genotyping were
also included for quality checking between the two genotyping systems.
The sample success rate for working assays was 92%. Genotype reproducibility between the Illumina and GenomeLab SNPstream systems was
99.7%.
In total 47 SNPs in the ESR1 gene were successfully genotyped in the
ULSAM cohort; 25 of them were also genotyped in the PIVUS cohort.
Study IV
Genotyping was done using the GoldenGate™ assay and the Illumina BeadArray
system (Illumina, San Diego, CA, USA) [52]. After testing and quality
checks the final working genotyping panel consisted of 1377 SNPs, giving an
assay success rate of 90%. The sample success rate was 93% and reproducibility of the genotypes was 99.9%.
This panel of working assays had an average spacing of ~7 kb, and coverage remained around 9.3 Mb.
Study V
The SNPs in COL11A1 were genotyped using the homogeneous Mass Extension reaction of the MassARRAY System (Sequenom, San Diego, California, USA). Tag SNPs in CSF1, EPS8L3, and ALX3 were genotyped using
the iPLEX assay of the MassARRAY System (Sequenom, San Diego, California,
USA). The sample success rate for the 24 working SNP assays in
COL11A1 was between 90% and 99% in the initial genotyping. These samples
were genotyped at the Finnish Genome Center.
The Finnish twin samples were genotyped in Uppsala using 9 of the 24
SNPs previously genotyped in the COL11A1 gene. Genotyping was performed using the GenomeLab™ SNPstream® system (Beckman Coulter,
Fullerton, CA, USA). The sample success rate for the SNPs ranged from 97% to 99.6%. Genotype reproducibility ranged from 98.8% to 100% based on 13.45%
of the samples being genotyped in duplicate.
Statistical analysis
Study III
The statistical analysis was performed using the free statistical software
environment “R” [6]. Analysis of variance (ANOVA) was performed to test
for association between SNPs and body height. Replicated nominally
significant results from ANOVA (p<0.05) were tested using a Wilcoxon rank
sum test on body height according to genotype distribution. In the PIVUS
cohort, males and females were analyzed separately because of known differences in heritability of body height. The “Tagger” function of the Haploview software
was used for SNP selection and to estimate the amount of SNP variation
captured by the panel of SNPs.
Study IV
The Merlin statistical software package [132] was used to perform nonparametric linkage analysis for genotyped SNPs and body height. Analysis
was performed using both multi-point and single-point variance component
linkage analysis. Age and sex were used as covariates. Males and females
were also analyzed separately using age as covariate.
We performed association analysis using QTDT [133], the Mendel software
suite [134] and a prototype X-chromosome module for Merlin to test all SNPs
for association with body height. Analysis was performed using the same
groupings and covariates as for the linkage analysis.
All computational work was performed using the Linux cluster located at
the Genome Informatics Unit at Biomedicum in Helsinki (www.giu.fi).
Study V
Prior to genetic analyses the phenotype distributions were examined with
SPSS 14.0.1 (SPSS, Chicago, IL). Individuals less than 23 years old were
excluded since they may still be growing. Outliers (> 3 SD from the
sex-specific mean) were also removed prior to genetic analyses because they may
have an undue impact on the subsequent analyses. Variance components
linkage analyses were performed using MERLIN [132] and family-based
association analyses using MENDEL [135]. The significant association found
for one SNP in the population sample was examined with SPSS using analysis
of covariance (ANCOVA). In all population analyses the region of residence
was used as a covariate due to the population stratification seen in the Finnish population sample. Correction for multiple comparisons in the family-based analyses was performed by the method proposed by Li and Ji, which
takes linkage disequilibrium into account [136].
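The two phenotype exclusion rules described above (age below 23 years, height more than 3 SD from the sex-specific mean) can be sketched as follows. The thesis performed this step in SPSS; the code below is only an illustrative equivalent. It assumes the sex-specific mean and SD are computed after the age exclusion, and the record layout and example data are hypothetical.

```python
from statistics import mean, stdev


def filter_height_phenotypes(records, min_age=23, max_sd=3.0):
    """Drop subjects younger than min_age, then drop height outliers lying
    more than max_sd sample SDs from the sex-specific mean."""
    adults = [r for r in records if r["age"] >= min_age]
    kept = []
    for sex in sorted({r["sex"] for r in adults}):
        group = [r for r in adults if r["sex"] == sex]
        heights = [r["height_cm"] for r in group]
        if len(heights) < 2:
            kept.extend(group)  # SD is undefined for a single subject
            continue
        m, sd = mean(heights), stdev(heights)
        kept.extend(r for r in group if abs(r["height_cm"] - m) <= max_sd * sd)
    return kept


# Hypothetical records, not thesis data.
males = [{"sex": "M", "age": 40, "height_cm": h} for h in list(range(175, 185)) * 2]
records = males + [
    {"sex": "M", "age": 45, "height_cm": 240},  # extreme value
    {"sex": "M", "age": 20, "height_cm": 180},  # below the age limit
    {"sex": "F", "age": 50, "height_cm": 160},
    {"sex": "F", "age": 52, "height_cm": 165},
    {"sex": "F", "age": 55, "height_cm": 170},
]
kept = filter_height_phenotypes(records)
print(len(records), "records in,", len(kept), "kept")
```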
Results and discussion
Study I
We successfully replicated the association for both SNPs in the TCF7L2
gene with T2DM (Table 2).
Table 2: Association analysis of SNP rs12255372 and rs7903146 genotypes with type 2 diabetes at age 70

SNP            Non-diabetic    Diabetic      p-value   Odds ratio (95% CI)
rs7903146
  CC           496 (0.56)      67 (0.40)
  CT           327 (0.37)      83 (0.49)     0.0006    CT vs CC: 1.88 (1.32–2.67)
  TT           62 (0.07)       18 (0.11)               TT vs CC: 2.15 (1.20–3.85)
  T allele     451 (0.25)      119 (0.35)    0.0002
  C allele     1,319 (0.75)    217 (0.65)
rs12255372
  GG           498 (0.56)      73 (0.44)
  GT           327 (0.37)      81 (0.48)     0.011     GT vs GG: 1.69 (1.20–2.39)
  TT           63 (0.07)       14 (0.08)               TT vs GG: 1.52 (0.81–2.84)
  T allele     453 (0.26)      109 (0.32)    0.0085
  G allele     1,323 (0.74)    227 (0.68)
Logistic regression was performed to test for association with T2DM. Diabetic subjects were
diagnosed with type 2 diabetes mellitus in accordance with WHO criteria [66].
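As a plausibility check on Table 2, an odds ratio and its 95% confidence interval can be computed directly from the genotype counts using the cross-product ratio and Woolf's logit interval. This is a crude, unadjusted calculation and not the age-adjusted logistic regression used in the study. For the rs7903146 CT vs CC comparison it gives approximately 1.88 (1.32–2.67), in line with the tabulated value. The function name is hypothetical.

```python
import math


def odds_ratio_ci(exp_cases, ref_cases, exp_controls, ref_controls, z=1.96):
    """Crude odds ratio with a Woolf (logit) confidence interval.

    exp_*: counts carrying the genotype of interest
    ref_*: counts carrying the reference genotype
    """
    odds_ratio = (exp_cases * ref_controls) / (ref_cases * exp_controls)
    se_log = math.sqrt(1 / exp_cases + 1 / ref_cases + 1 / exp_controls + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# rs7903146, CT vs CC: diabetic 83 vs 67, non-diabetic 327 vs 496 (Table 2).
or_ct, lo_ct, hi_ct = odds_ratio_ci(83, 67, 327, 496)
print(f"CT vs CC: OR = {or_ct:.2f} ({lo_ct:.2f}-{hi_ct:.2f})")
```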
Analysis of the quantitative biochemical markers in ULSAM showed an
association with proinsulin that was novel at the time. We found that the risk alleles of
both SNPs were associated with elevated levels of proinsulin in plasma when using
M/I as covariate (rs7903146 p=0.005 and rs12255372 p=0.004). This finding supports the findings of Loos and colleagues, who reported elevated
proinsulin levels associated with the risk allele of rs7903146 in the TCF7L2
gene [137]. It should be noted that when we performed the analysis using
BMI to adjust for insulin sensitivity, as was done by Loos and colleagues, we
did not see an association (rs7903146 p=0.18 and rs12255372 p=0.12).
Our findings suggest that the increased risk for T2DM conferred by the
TCF7L2 variants is caused by dysfunction in the production of insulin in the
β-cell. By adjusting for insulin sensitivity measured using the euglycaemic–hyperinsulinaemic
clamp technique, we are able to compensate for the otherwise significant association between insulin resistance and elevated fasting
plasma proinsulin. This adjustment enabled us to distinguish the elevated
proinsulin levels associated with impending β-cell failure from elevated levels caused by insulin resistance in the body.
By using the longitudinal data available in the ULSAM cohort we also
analyzed early insulin response in a subset of the cohort with data from an
intravenous glucose tolerance test performed at the first investigation at age
50, comparing it with the OGTT performed at age 70. This analysis showed
a significantly lower acute insulin response in carriers of the high-risk allele
of SNP rs7903146. This finding emphasizes the association between
TCF7L2 genetic variants and first-phase insulin release [138-140]. The results highlight the importance of impaired insulin secretion, which occurs
decades before the clinical onset of T2DM in individuals at
increased genetic risk [141].
The mechanism behind the increased risk for T2DM associated with
TCF7L2 is currently being investigated by many research groups. It is known that
TCF7L2 is part of the so-called WNT pathway [142], which in turn affects the
regulation of glucose homeostasis through GLP-1 [143] as well as β-cell proliferation [144]. A recent study published by Lyssenko and colleagues
showed that the risk allele of TCF7L2 was associated with impaired insulin
secretion and that overexpression of TCF7L2 appears to reduce glucose-stimulated
insulin secretion [145]. These findings may be the first real steps towards
understanding how the TCF7L2 gene causes an increased risk for T2DM.
Study II
Owing to a lack of statistical power to detect small genetic effects in the ULSAM cohort we were not able to replicate the association of the
LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes with T2DM.
Although not significant, our results showed odds ratios similar to those published by Sladek and colleagues [97].
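To illustrate why a cohort of this size has limited power for small effects, the following sketch approximates the power of a two-sided allelic test using a normal approximation for the difference between two allele frequencies. It is not the power calculation used in the study. The allele totals are taken from Table 2 of Study I (1,770 non-diabetic and 336 diabetic alleles for rs7903146) and may differ slightly for the SNPs in Study II; the control allele frequency of 0.30 and the odds ratios are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(control_freq, odds_ratio, case_alleles, control_alleles,
                       alpha=0.05):
    """Approximate power of a two-sided test comparing allele frequencies."""
    # Allele frequency among cases implied by the allelic odds ratio.
    case_odds = control_freq / (1 - control_freq) * odds_ratio
    case_freq = case_odds / (1 + case_odds)

    # Standard error of the difference between the two frequencies.
    se = sqrt(case_freq * (1 - case_freq) / case_alleles
              + control_freq * (1 - control_freq) / control_alleles)

    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    z_effect = abs(case_freq - control_freq) / se
    # Probability that the test statistic falls in either rejection region.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Hypothetical scenarios: a small and a larger allelic odds ratio.
for hypothetical_or in (1.2, 1.5):
    power = allelic_test_power(0.30, hypothetical_or,
                               case_alleles=336, control_alleles=1770)
    print(f"OR {hypothetical_or}: power ~ {power:.2f}")
```

Under these assumptions the power for an odds ratio near 1.2 is well below the conventional 80%, which is consistent with the interpretation that non-replication here reflects limited power rather than absence of effect.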
Analysis of the biochemical markers resulted in significant associations for
several measures related to insulin secretion for all three SNPs genotyped in
the HHEX gene (see Table 1, Paper II). Immunoreactive insulin concentrations both in the fasting state and at 30 minutes after an oral glucose load
were significantly associated with the SNPs in the HHEX gene, which was
also the case for the measurement of acute insulin response. Our findings of
association with impaired first-phase insulin response provide biochemical
support for the association found between HHEX and T2DM by GWAS [97,
98, 101].
Of relevance to T2DM, the HHEX gene has been reported to
play a central role in both hepatic and pancreatic differentiation in mouse
models of development [146, 147]. The expression of HHEX is in part
regulated by the WNT signaling pathway [148], as was previously described
for the TCF7L2 gene (see Study I). In contrast to our findings for TCF7L2, we could not find any evidence for
association between the HHEX gene and elevated proinsulin levels. This suggests that the TCF7L2 and HHEX
genes, although both are part of the WNT signaling pathway, affect the risk
of T2DM via different mechanisms.
Study III
We found four SNPs in the ESR1 gene that showed nominally significant
association signals (p<0.05). Based on this finding we selected the ESR1
gene for further study. We performed the same analysis for 26 SNPs in the
ESR1 gene in the PIVUS cohort and found a male-specific association with body
height (p=0.0056). The associated SNP rs2179922 is located in intron 4 of
ESR1. The difference in height can be seen below.
Mean body height (1) according to rs2179922 genotype

Cohort    GG                 AA+AG              p-value (2)
ULSAM     175.1±6.0 (843)    174.2±6.8 (210)    0.03
PIVUS     176.2±6.3 (408)    173.9±6.8 (93)     0.002

(1) Mean standing body height ± SD in cm with number of observations in parentheses.
(2) Wilcoxon rank sum test
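The size of the genotype effect can be read off the table as a difference in means. The sketch below computes that difference together with a Welch-type standard error from the summary statistics alone. It is only an illustrative approximation: the p-values in the table come from a Wilcoxon rank sum test on the individual measurements, which cannot be reproduced from means and SDs, and the function name is introduced here.

```python
from math import sqrt


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means, its Welch standard error, and their ratio."""
    diff = mean_a - mean_b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff, se, diff / se


# (mean, SD, n) per genotype group, as printed in the table above.
cohorts = {
    "ULSAM": ((175.1, 6.0, 843), (174.2, 6.8, 210)),
    "PIVUS": ((176.2, 6.3, 408), (173.9, 6.8, 93)),
}

results = {}
for name, (gg, carriers) in cohorts.items():
    results[name] = mean_difference(*gg, *carriers)
    diff, se, ratio = results[name]
    print(f"{name}: GG minus AA+AG = {diff:.1f} cm (SE {se:.2f}, diff/SE {ratio:.2f})")
```

The GG group is taller in both cohorts, by 0.9 cm in ULSAM and 2.3 cm in PIVUS.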
Of the additional 21 tag-SNPs that were genotyped in the ULSAM cohort,
three showed nominal evidence for association (p<0.05). We calculated that the 47 SNPs genotyped in the ESR1 gene captured 73% of the
common SNP variation of ESR1 (MAF 0.05) found in the European sample from the HapMap project.
The ESR1 gene has a functional connection to body height. It has been
shown to have a direct effect on bone development and body height. In a
reported case of a male patient with estrogen resistance caused by mutations
in the estrogen receptor gene, the patient had incomplete epiphyseal closure
and a history of continued growth into adulthood. His final body height was
204 cm [149]. Other published studies have suggested that SNPs in the ESR1 gene are
associated with body height in women [150, 151] and adolescent boys [152]. The effect on body height of the G-allele of the SNP
rs2179922 that we observed in males from the ULSAM and PIVUS cohorts
is comparable to the effect of the SNP rs1042725 in the HMGA2 gene in
adult males in multiple cohorts from the UK and Sweden [121].
Our findings suggest that the ESR1 gene could be one of the genes involved in regulating normal variation of body height in males. The power of
this study, however, does not allow us to exclude small effects on height by
the other candidate genes analyzed.
Identifying the actual functional variants in the ESR1 gene that affect
height will require re-sequencing of the gene to identify possible rare variants, functional studies on the molecular level, and very large
population-based studies on the interactions between genes and with factors
from the environment.
Study IV
Utilizing the sibling relation of the twin pairs we performed single-point
linkage analysis. When analyzing the male samples separately we found 18
SNPs with linkage to body height, located together and defining a region covering ~65.5 kb. Analysis of the female samples separately did not result in any
significant linkage. Association analysis did not produce any significant results for any grouping of the samples.
Based on the genotyping results we evaluated the importance of the validation status for the final assay success rate. We concluded that the Illumina
design score was more important than the current validation status of the
SNPs (see Table 1, Paper IV).
Two functionally interesting candidate genes are located in the region defined by the SNPs linked in the male samples.
The first gene is Glypican-3 (GPC3), a gene in which variations have been
shown to cause Simpson-Golabi-Behmel syndrome, which among other
things results in abnormal body height (gigantism) along with skeletal
anomalies [153]. The GPC3 gene is also believed to play a role in suppressing and regulating growth in mesodermal tissues and organs, and it may also interact with insulin-like growth factor 2 (IGF2), which could
be a further functional link to growth regulation and influence on body height
[154].
The second possible candidate gene is the plant homeodomain finger gene
6 (PHF6). The PHF6 gene is associated with the Börjeson-Forssman-Lehmann syndrome, which is a form of X-linked mental retardation where
short stature is part of the clinical manifestations.
Further investigation of this region and the candidate genes in other
population cohorts is needed to replicate our findings and to determine whether the suggested candidate genes influence normal variation in
body height.
Study V
Linkage analysis after the first round of genotyping of all four
candidate genes gave non-significant linkage for all but the COL11A1
gene. This prompted us to focus further effort on the COL11A1 gene
alone. Extensive analysis of linkage and association using the
genotyped markers in the COL11A1 gene identified significant linkage and
association between a functional non-synonymous SNP and body height in males.
We could show that one allele of this SNP was associated with an increase in
body height for males in the Finnish population. Homozygous carriers of the
serine allele were approximately 0.9 cm taller than carriers of
the other allele.
We calculated that this SNP can explain 0.1% of the variance in the
male population and 0.01% in the whole Finnish population.
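How a mean difference of this size translates into a share of trait variance can be sketched with a simple two-group model: if a fraction q of individuals belongs to the taller genotype group and that group is shifted by d cm, the variance attributable to the genotype is q(1-q)d², which is then divided by the total phenotypic variance. This is not necessarily the calculation used in the study. Only the 0.9 cm shift comes from the text; the group frequency and the phenotypic SD below are hypothetical placeholders, since neither is stated here.

```python
def variance_explained(group_freq, mean_shift, phenotype_sd):
    """Share of phenotypic variance due to a mean shift in one genotype group."""
    if not 0.0 <= group_freq <= 1.0:
        raise ValueError("group_freq must lie between 0 and 1")
    if phenotype_sd <= 0:
        raise ValueError("phenotype_sd must be positive")
    # Variance of a two-point distribution with probability q and spacing d.
    genetic_variance = group_freq * (1.0 - group_freq) * mean_shift ** 2
    return genetic_variance / phenotype_sd ** 2


# mean_shift is the 0.9 cm difference reported above; the other two inputs are
# hypothetical values used only to show the order of magnitude.
share = variance_explained(group_freq=0.05, mean_shift=0.9, phenotype_sd=6.5)
print(f"Share of variance explained: {share:.2%}")
```

With inputs of this order the share comes out as a small fraction of one percent, which shows why a difference of almost a centimetre between genotype groups still accounts for very little of the population variance in height.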
The COL11A1 gene is a highly relevant candidate gene for human body
height. It is expressed in cartilage tissue, growth plate and in the nucleus
pulposus of the intervertebral discs. The encoded protein is one of the three
subunits that make up collagen XI which in turn is a part of collagen fibrils
that are a vital part of cartilage in the human body.
Known mutations in the COL11A1 gene have been shown to cause Marshall and Stickler syndromes. Phenotypes observed for these syndromes include short stature, osteoarthritis, midfacial hypoplasia and cleft palate, all
indications of skeletal defects. The involvement of COL11A1 in
skeletal development and morphogenesis is strongly supported by a knock-out mouse
model in which COL11A1 null mice were only half the normal length [155].
Our finding of a potentially functional variant in the COL11A1 gene, with
relevant biological function related to body height, represents one small but
important step towards discovering and understanding the complex makeup
of human body height.
Concluding remarks
Both studies I and II demonstrate the importance of having a broad range
of well-characterized phenotypes in sample cohorts used to further investigate loci identified using whole-genome SNP association studies. Type 2
diabetes has all the characteristics of a typical complex genetic disease, so
associated genes are likely to have a wide variety of effects on its pathology.
The three studies III-V presented here confirm that human body height is
a complex genetic trait that is most likely influenced by a large number of
variants throughout the genome. Two genes that are genetically and functionally linked with body height were identified. These are the ESR1 and
COL11A1 genes. Two other potential candidate genes, GPC3 and PHF6, were
found to be located in a linked region on the X-chromosome that could be
investigated more closely in the future.
The future of research on human body height will most likely be to analyze the results from genome-wide association studies, which have only recently become a usable tool. Because body height is so readily available in most sample cohorts, it is a good candidate for analysis in upcoming
studies regardless of their main focus.
Final thoughts
The work that began all those years ago has now come to completion in
the Ph.D. thesis that you are now reading. As I look back at the years of work that
have passed I am reminded of the mysterious quote
“May you live in interesting times”
This rings true in more ways than one for me. During my time as a
Ph.D. student the reference sequence of the human genome was realized,
alongside the genomes of many other species that we share a planet
with. I was able to follow the huge undertaking of the HapMap project from
start to finish. The number of human SNPs in the dbSNP database has increased nearly fourfold, from 3 million to almost 12 million, and the proportion of
them that are validated as true SNPs has risen from around 10% to almost
50%.
On the technology side of science, when I started, the cutting-edge commercial genotyping system in our lab could analyze 12 SNPs per sample in
one experiment. Today we have the possibility to analyze over 1 million
SNPs per sample in one experiment.
If you are just beginning your Ph.D. studies as you read this, do not
worry, you have many things to look forward to. Re-sequencing of genes
will become an everyday operation, advancing to whole chromosomes and
perhaps actually making the $1000 genome a reality. We have just started to
truly explore the human genome in close detail and the complexity of it is so
vast that it is sometimes frightening (I know).
Interesting times it has been, and more is sure to come, but my thesis is
now finished.
One road ends and another begins. Who knows what waits around the
next bend, not only for me but for all of us... I look forward to finding out.
Acknowledgements
The work presented in this thesis was performed in the group of Molecular Medicine at the Department of Medical Sciences, Uppsala
University.
I would like to express my gratitude to all that have contributed to my
work, helped and supported me during my time working on this thesis.
To my supervisor Ann-Christine Syvänen, thank you for giving me the opportunity to perform my thesis work and be a part of your research team.
To my co-supervisor Håkan Melhus, we did not get the chance to do much
work together but I very much appreciated the conversations we did have.
My second co-supervisor Markus Perola I would like to thank for all the help
and interesting collaboration we have had in connection to the GenomEUtwin project and my work on human body height.
To all the people in Markus Perola’s research group at the National Public Health Institute in Finland: thank you for making me feel welcome and
helping me out during my visits with you working on the statistical analysis.
To the Molecular Medicine research group.
I would like to express my deepest appreciation to all of you, past and present
members, that I have had the pleasure and privilege to work alongside
during these years.
I should really write a paragraph for each and every one of you but I’m
afraid I would leave someone out (and time is in short supply these days ;)
To all past and present Ph.D. students and project students:
I thank all of you that have helped me with my work during and
in the final stages of this thesis work. You have been both good friends and
colleagues to me over the years. I will not forget and will always be grateful
for everything you did for me.
To all the people that work at the genotyping core facility within the Molecular Medicine group:
To all of you I would also like to express my appreciation and gratitude,
for you have all been so nice to me and really helped me a lot. It has been an
honor and a privilege to have worked alongside you all.
Among my collaborators in my work with type 2 diabetes I would especially like to thank
Christian Berne and Björn Zethelius for providing me the opportunity to work with you on the two studies that became a vital part of this
thesis. I also thank Karin Jensevik and Niclas Ericsson for explaining the
statistics to me.
To all my friends at Uppsala Ju-Jutsuklubb, without you I could not have
done this. Thank you for providing me with a constructive way to handle
my frustration from time to time and for all the joy we have shared over the
years.
To all my friends outside the world of science and Ju-Jutsu. You have all
been a vital part of getting to this point and I’m truly privileged to call all of you
my friends.
Special thanks to Helena for helping me with the arrangements for the upcoming celebrations.
To Martin Hallonqvist
Your support and faith in me has been an invaluable asset during the final
stages of this thesis. I hope that I can be as good a friend to you as you are
to me. I owe you one.
To those that got lost along the way, I will never forget you and I will always
treasure the time we shared together.
Finally to any and all that I might have missed to mention here, I have not
forgotten you and I never will.
Andreas Dahlgren
Uppsala, 2007-10-23
This work was supported by the European Commission through the GenomEUtwin project (Contract QLG2-CT-2002-01254) and by the Swedish
Research Council for Science and Technology (VR-NT).
References
1. Mendel, G.J., Versuche über Pflanzen-Hybriden. Verhandlungen des Naturforschenden Vereins zu Brünn, 1866. 4: p. 3-47.
2. Avery, O.T., C.M. MacLeod, and M. McCarty, Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med., 1944. 79: p. 137-159.
3. Watson, J.D. and F.H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 1953. 171(4356): p. 737-8.
4. Watson, J.D. and F.H. Crick, A structure for deoxyribose nucleic acid. Nature, 1953. 171(4356): p. 964-7.
5. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860-921.
6. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
7. Finishing the euchromatic sequence of the human genome. Nature, 2004. 431(7011): p. 931-45.
8. Liang, F., et al., Gene index analysis of the human genome estimates approximately 120,000 genes. Nat Genet, 2000. 25(2): p. 239-40.
9. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 1998. 282(5396): p. 2012-8.
10. Graveley, B.R., Alternative splicing: increasing diversity in the proteomic world. Trends Genet, 2001. 17(2): p. 100-7.
11. Levine, M. and R. Tjian, Transcription regulation and animal diversity. Nature, 2003. 424(6945): p. 147-51.
12. Feuk, L., A.R. Carson, and S.W. Scherer, Structural variation in the human genome. Nat Rev Genet, 2006. 7(2): p. 85-97.
13. Cheng, Z., et al., A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature, 2005. 437(7055): p. 88-93.
14. Edwards, A., et al., DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am J Hum Genet, 1991. 49(4): p. 746-56.
15. Weber, J.L., et al., Human diallelic insertion/deletion polymorphisms. Am J Hum Genet, 2002. 71(4): p. 854-62.
16. Levy, S., et al., The Diploid Genome Sequence of an Individual Human. PLoS Biol, 2007. 5(10): p. e254.
17. Bhangale, T.R., et al., Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes. Hum Mol Genet, 2005. 14(1): p. 59-69.
18. Fredman, D., et al., Complex SNP-related sequence variation in segmental genome duplications. Nat Genet, 2004. 36(8): p. 861-6.
19. Saiki, R.K., et al., Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science, 1985. 230(4732): p. 1350-4.
20. Mullis, K., et al., Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol, 1986. 51 Pt 1: p. 263-73.
21. Saiki, R.K., et al., Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 1988. 239(4839): p. 487-91.
22. Landegren, U. and M. Nilsson, Locked on target: strategies for future gene diagnostics. Ann Med, 1997. 29(6): p. 585-90.
23. Syvanen, A.C., Toward genome-wide SNP genotyping. Nat Genet, 2005. 37 Suppl: p. S5-10.
24. Sanger, F., S. Nicklen, and A.R. Coulson, DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A, 1977. 74(12): p. 5463-7.
25. Drossman, H., et al., High-speed separations of DNA sequencing reactions by capillary electrophoresis. Anal Chem, 1990. 62(9): p. 900-3.
26. Mardis, E.R., Anticipating the 1,000 dollar genome. Genome Biol, 2006. 7(7): p. 112.
27. Margulies, M., et al., Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005. 437(7057): p. 376-80.
28. Ronaghi, M., et al., Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem, 1996. 242(1): p. 84-9.
29. Singer, E., The $2 Million Genome. TechnologyReview.com, 2007.
30. Bentley, D.R., Whole-genome re-sequencing. Current Opinion in Genetics & Development, 2006. 16(6): p. 545-552.
31. Wallace, R.B., et al., Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 DNA: the effect of single base pair mismatch. Nucleic Acids Res, 1979. 6(11): p. 3543-57.
32. Livak, K.J., et al., Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl, 1995. 4(6): p. 357-62.
33. Tyagi, S. and F.R. Kramer, Molecular beacons: probes that fluoresce upon hybridization. Nat Biotechnol, 1996. 14(3): p. 303-8.
34. Southern, E., K. Mir, and M. Shchepinov, Molecular interactions on microarrays. Nat Genet, 1999. 21(1 Suppl): p. 5-9.
35. McGall, G.H. and J.A. Fidanza, Photolithographic synthesis of high-density oligonucleotide arrays. Methods Mol Biol, 2001. 170: p. 71-101.
36. Pastinen, T., et al., Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res, 1997. 7(6): p. 606-14.
37. Landegren, U., et al., A ligase-mediated gene detection technique. Science, 1988. 241(4869): p. 1077-80.
38. Nilsson, M., et al., Padlock probes: circularizing oligonucleotides for localized DNA detection. Science, 1994. 265(5181): p. 2085-8.
39. Hardenbol, P., et al., Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat Biotechnol, 2003. 21(6): p. 673-8.
40. Hardenbol, P., et al., Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Genome Res, 2005. 15(2): p. 269-75.
41. Syvanen, A.C., et al., A primer-guided nucleotide incorporation assay in the genotyping of apolipoprotein E. Genomics, 1990. 8(4): p. 684-92.
42. Fan, J.B., et al., Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res, 2000. 10(6): p. 853-60.
43. Fortina, P., et al., Simple two-color array-based approach for mutation detection. Eur J Hum Genet, 2000. 8(11): p. 884-94.
44. Chen, X., L. Levine, and P.Y. Kwok, Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res, 1999. 9(5): p. 492-8.
45. Sauer, S., et al., A novel procedure for efficient genotyping of single nucleotide polymorphisms. Nucleic Acids Res, 2000. 28(5): p. E13.
46. Lindroos, K., et al., Minisequencing on oligonucleotide microarrays: comparison of immobilisation chemistries. Nucleic Acids Res, 2001. 29(13): p. E69-9.
47. Lindroos, K., et al., Multiplex SNP genotyping in pooled DNA samples by a four-colour microarray system. Nucleic Acids Res, 2002. 30(14): p. e70.
48. Bell, P.A., et al., SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery. Biotechniques, 2002. Suppl: p. 70-2, 74, 76-7.
49. Steemers, F.J., et al., Whole-genome genotyping with the single-base extension assay. Nat Methods, 2006. 3(1): p. 31-3.
50. Steemers, F.J. and K.L. Gunderson, Whole genome genotyping technologies on the BeadArray platform. Biotechnol J, 2007. 2(1): p. 41-9.
51. Wu, D.Y., et al., Allele-specific enzymatic amplification of beta-globin genomic DNA for diagnosis of sickle cell anemia. Proc Natl Acad Sci U S A, 1989. 86(8): p. 2757-60.
52. Fan, J.B., et al., Highly parallel SNP genotyping. Cold Spring Harb Symp Quant Biol, 2003. 68: p. 69-78.
53. Oliphant, A., et al., BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques, 2002. Suppl: p. 56-8, 60-1.
54. Kerem, B., et al., Identification of the cystic fibrosis gene: genetic analysis. Science, 1989. 245(4922): p. 1073-80.
55. Scriver, C.R. and P.J. Waters, Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet, 1999. 15(7): p. 267-72.
56. Wang, W.Y., et al., Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet, 2005. 6(2): p. 109-18.
57. Boomsma, D., A. Busjahn, and L. Peltonen, Classical twin studies and beyond. Nat Rev Genet, 2002. 3(11): p. 872-82.
58. Cheung, V.G., et al., Polymorphic variation in human meiotic recombination. Am J Hum Genet, 2007. 80(3): p. 526-30.
59. Jimenez-Sanchez, G., B. Childs, and D. Valle, Human disease genes. Nature, 2001. 409(6822): p. 853-5.
60. Hirschhorn, J.N. and M.J. Daly, Genome-wide association studies for common diseases and complex traits. Nat Rev Genet, 2005. 6(2): p. 95-108.
61. Cardon, L.R. and L.J. Palmer, Population stratification and spurious allelic association. Lancet, 2003. 361(9357): p. 598-604.
62. Carlson, C.S., et al., Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet, 2004. 74(1): p. 106-20.
63. Reynisdottir, I., et al., Localization of a susceptibility gene for type 2 diabetes to chromosome 5q34-q35.2. Am J Hum Genet, 2003. 73(2): p. 323-35.
64. A haplotype map of the human genome. Nature, 2005. 437(7063): p. 1299-320.
65. Frazer, K.A., et al., A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007. 449(7164): p. 851-61.
66. Alberti, K.G. and P.Z. Zimmet, Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus provisional report of a WHO consultation. Diabet Med, 1998. 15(7): p. 539-53.
67. Zimmet, P., K.G. Alberti, and J. Shaw, Global and societal implications of the diabetes epidemic. Nature, 2001. 414(6865): p. 782-7.
68. Narayan, K.M., et al., Impact of recent increase in incidence on future diabetes burden: U.S., 2005-2050. Diabetes Care, 2006. 29(9): p. 2114-6.
69. Hossain, P., B. Kawar, and M. El Nahas, Obesity and diabetes in the developing world--a growing challenge. N Engl J Med, 2007. 356(3): p. 213-5.
70. Pan, X.R., et al., Effects of diet and exercise in preventing NIDDM in people with impaired glucose tolerance. The Da Qing IGT and Diabetes Study. Diabetes Care, 1997. 20(4): p. 537-44.
71. Tuomilehto, J., et al., Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N Engl J Med, 2001. 344(18): p. 1343-50.
72. Carmichael, C.M. and M. McGue, A cross-sectional examination of height, weight, and body mass index in adult twins. J Gerontol A Biol Sci Med Sci, 1995. 50(4): p. B237-44.
73. Silventoinen, K., et al., Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res, 2003. 6(5): p. 399-408.
74. Altshuler, D., et al., The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet, 2000. 26(1): p. 76-80.
75. Gloyn, A.L., et al., Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes, 2003. 52(2): p. 568-72.
76. Grant, S.F., et al., Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet, 2006. 38(3): p. 320-3.
77. Duggirala, R., et al., Linkage of type 2 diabetes mellitus and of age at onset to a genetic location on chromosome 10q in Mexican Americans. Am J Hum Genet, 1999. 64(4): p. 1127-40.
78. Bodhini, D., et al., The rs12255372(G/T) and rs7903146(C/T) polymorphisms of the TCF7L2 gene are associated with type 2 diabetes mellitus in Asian Indians. Metabolism, 2007. 56(9): p. 1174-8.
79. Chandak, G.R., et al., Common variants in the TCF7L2 gene are strongly associated with type 2 diabetes mellitus in the Indian population. Diabetologia, 2007. 50(1): p. 63-7.
80. Chang, Y.C., et al., Association study of the genetic polymorphisms of the transcription factor 7-like 2 (TCF7L2) gene and type 2 diabetes in the Chinese population. Diabetes, 2007. 56(10): p. 2631-7.
81. Dahlgren, A., et al., Variants of the TCF7L2 gene are associated with beta cell dysfunction and confer an increased risk of type 2 diabetes mellitus in the ULSAM cohort of Swedish elderly men. Diabetologia, 2007. 50(9): p. 1852-7.
82. Damcott, C.M., et al., Polymorphisms in the transcription factor 7-like 2 (TCF7L2) gene are associated with type 2 diabetes in the Amish: replication and evidence for a role in both insulin secretion and insulin resistance. Diabetes, 2006. 55(9): p. 2654-9.
83. De Silva, N.M., et al., The transcription factor 7-like 2 (TCF7L2) gene is associated with Type 2 diabetes in UK community-based cases, but the risk allele frequency is reduced compared with UK cases selected for genetic studies. Diabet Med, 2007. 24(10): p. 1067-72.
84. Elbein, S.C., et al., Transcription factor 7-like 2 polymorphisms and type 2 diabetes, glucose homeostasis traits and gene expression in US participants of European and African descent. Diabetologia, 2007. 50(8): p. 1621-30.
85. Groves, C.J., et al., Association analysis of 6,736 U.K. subjects provides replication and confirms TCF7L2 as a type 2 diabetes susceptibility gene with a substantial effect on individual risk. Diabetes, 2006. 55(9): p. 2640-4.
86. Hayashi, T., et al., Replication study for the association of TCF7L2 with susceptibility to type 2 diabetes in a Japanese population. Diabetologia, 2007. 50(5): p. 980-4.
87. Horikoshi, M., et al., A genetic variation of the transcription factor 7-like 2 gene is associated with risk of type 2 diabetes in the Japanese population. Diabetologia, 2007. 50(4): p. 747-51.
88. Humphries, S.E., et al., Common variants in the TCF7L2 gene and predisposition to type 2 diabetes in UK European Whites, Indian Asians and Afro-Caribbean men and women. J Mol Med, 2006. 84(12): p. 1005-14.
89. Marzi, C., et al., Variants of the transcription factor 7-like 2 gene (TCF7L2) are strongly associated with type 2 diabetes but not with the metabolic syndrome in the MONICA/KORA surveys. Horm Metab Res, 2007. 39(1): p. 46-52.
90. Mayans, S., et al., TCF7L2 polymorphisms are associated with type 2 diabetes in northern Sweden. Eur J Hum Genet, 2007. 15(3): p. 342-6.
91. Meigs, J.B., et al., Genome-wide association with diabetes-related traits in the Framingham Heart Study. BMC Med Genet, 2007. 8 Suppl 1: p. S16.
92. Ng, M.C., et al., Replication and identification of novel variants at TCF7L2 associated with type 2 diabetes in Hong Kong Chinese. J Clin Endocrinol Metab, 2007. 92(9): p. 3733-7.
93. Parra, E.J., et al., Association of TCF7L2 polymorphisms with type 2 diabetes in Mexico City. Clin Genet, 2007. 71(4): p. 359-66.
94. Sale, M.M., et al., Variants of the transcription factor 7-like 2 (TCF7L2) gene are associated with type 2 diabetes in an African-American population enriched for nephropathy. Diabetes, 2007. 56(10): p. 2638-42.
95. Scott, L.J., et al., Association of transcription factor 7-like 2 (TCF7L2) variants with type 2 diabetes in a Finnish sample. Diabetes, 2006. 55(9): p. 2649-53.
96. Zhang, C., et al., Variant of transcription factor 7-like 2 (TCF7L2) gene and the risk of type 2 diabetes in large cohorts of U.S. women and men. Diabetes, 2006. 55(9): p. 2645-8.
97. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 2007. 445(7130): p. 881-5.
98. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-678.
99. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.
100. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.
101. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.
102. Beck, S.R., et al., Age-stratified QTL genome scan analyses for anthropometric measures. BMC Genet, 2003. 4 Suppl 1: p. S31.
103. Deng, H.W., et al., A whole-genome linkage scan suggests several genomic regions potentially containing QTLs underlying the variation of stature. Am J Med Genet, 2002. 113(1): p. 29-39.
104. Ellis, J.A., et al., Comprehensive multi-stage linkage analyses identify a locus for adult height on chromosome 3p in a healthy Caucasian population. Hum Genet, 2006.
105. Geller, F., A. Dempfle, and T. Gorg, Genome scan for body mass index and height in the Framingham Heart Study. BMC Genet, 2003. 4 Suppl 1: p. S91.
106. Hirschhorn, J.N., et al., Genomewide linkage analysis of stature in multiple populations reveals several regions with evidence of linkage to adult height. Am J Hum Genet, 2001. 69(1): p. 106-16.
107. Liu, Y.Z., et al., Genetic linkage of human height is confirmed to 9q22 and Xq24. Hum Genet, 2006. 119(3): p. 295-304.
108. Liu, Y.Z., et al., Genetic dissection of human stature in a large sample of multiplex pedigrees. Ann Hum Genet, 2004. 68(Pt 5): p. 472-88.
109. Mukhopadhyay, N., et al., A genome-wide scan for loci affecting normal adult height in the Framingham Heart Study. Hum Hered, 2003. 55(4): p. 191-201.
110. Mukhopadhyay, N. and D.E. Weeks, Linkage analysis of adult height with parent-of-origin effects in the Framingham Heart Study. BMC Genet, 2003. 4 Suppl 1: p. S76.
111. Sale, M.M., et al., Loci contributing to adult height and body mass index in African American families ascertained for type 2 diabetes. Ann Hum Genet, 2005. 69(Pt 5): p. 517-27.
112. Sammalisto, S., et al., A male-specific quantitative trait locus on 1p21 controlling human stature. J Med Genet, 2005. 42(12): p. 932-9.
113. Shmulewitz, D., et al., Linkage analysis of quantitative traits for obesity, diabetes, hypertension, and dyslipidemia on the island of Kosrae, Federated States of Micronesia. Proc Natl Acad Sci U S A, 2006. 103(10): p. 3502-9.
114. Soro, A., et al., Genome scans provide evidence for low-HDL-C loci on chromosomes 8q23, 16q24.1-24.2, and 20q13.11 in Finnish families. Am J Hum Genet, 2002. 70(5): p. 1333-40.
115. Willemsen, G., et al., QTLs for height: results of a full genome scan in Dutch sibling pairs. Eur J Hum Genet, 2004. 12(10): p. 820-8.
116. Wiltshire, S., et al., Evidence for linkage of stature to chromosome 3p26 in a large U.K. family data set ascertained for type 2 diabetes. Am J Hum Genet, 2002. 70(2): p. 543-6.
117. Wu, X., et al., Combined analysis of genomewide scans for adult height: results from the NHLBI Family Blood Pressure Program. Eur J Hum Genet, 2003. 11(3): p. 271-4.
118. Xu, J., et al., Major recessive gene(s) with considerable residual polygenic effect regulating adult height: confirmation of genomewide scan results for chromosomes 6, 9, and 12. Am J Hum Genet, 2002. 71(3): p. 646-50.
119. Weiss, L.A., et al., The sex-specific genetic architecture of quantitative traits in humans. Nat Genet, 2006. 38(2): p. 218-22.
120. Perola, M., et al., Combined Genome Scans for Body Stature in 6,602 European Twins: Evidence for Common Caucasian Loci. PLoS Genet, 2007. 3(6): p. e97.
121. Weedon, M.N., et al., A common variant of HMGA2 is associated with adult and childhood height in the general population. Nat Genet, 2007. 39(10): p. 1245-1250.
122. Hedstrand, H., A study of middle-aged men with particular reference to risk factors for cardiovascular disease. Ups J Med Sci Suppl, 1975. 19: p. 1-61.
123. Byberg, L., et al., Birth weight and the insulin resistance syndrome: association of low birth weight with truncal obesity and raised plasminogen activator inhibitor-1 but not with abdominal obesity or plasma lipid disturbances. Diabetologia, 2000. 43(1): p. 54-60.
124. DeFronzo, R.A., J.D. Tobin, and R. Andres, Glucose clamp technique: a method for quantifying insulin secretion and resistance. Am J Physiol, 1979. 237(3): p. E214-23.
125. Zethelius, B., et al., Insulin resistance, impaired early insulin response, and insulin propeptides as predictors of the development of type 2 diabetes: a population-based, 7-year follow-up study in 70-year-old men. Diabetes Care, 2004. 27(6): p. 1433-8.
126. Nyholt, D.R., A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet, 2004. 74(4): p. 765-9.
127. Barrett, J.C., et al., Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 2005. 21(2): p. 263-5.
128. Lind, L., et al., A comparison of three different methods to evaluate endothelium-dependent vasodilation in the elderly: the Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) study. Arterioscler Thromb Vasc Biol, 2005. 25(11): p. 2368-75.
129. Kaprio, J. and M. Koskenvuo, Genetic and environmental factors in complex diseases: the older Finnish Twin Cohort. Twin Res, 2002. 5(5): p. 358-65.
130. Lilja, H.E., et al., A candidate gene study in low HDL-cholesterol families provides evidence for the involvement of the APOA2 gene and the APOA1C3A4 gene cluster. Atherosclerosis, 2002. 164(1): p. 103-11.
131. Pajukanta, P., et al., Genomewide scan for familial combined hyperlipidemia genes in Finnish families, suggesting multiple susceptibility loci influencing triglyceride, cholesterol, and apolipoprotein B levels. Am J Hum Genet, 1999. 64(5): p. 1453-63.
132. Abecasis, G.R., et al., Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet, 2002. 30(1): p. 97-101.
133. Abecasis, G.R., L.R. Cardon, and W.O. Cookson, A general test of association for quantitative traits in nuclear families. Am J Hum Genet, 2000. 66(1): p. 279-92.
134. Lange, K., et al., Mendel version 4.0: A complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet, 2001. 69(Suppl): p. A1886.
135. Lange, K., J.S. Sinsheimer, and E. Sobel, Association testing with Mendel. Genet Epidemiol, 2005. 29(1): p. 36-50.
136. Li, J. and L. Ji, Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 2005. 95(3): p. 221-7.
137. Loos, R.J., et al., TCF7L2 polymorphisms modulate proinsulin levels and beta-cell function in a British Europid population. Diabetes, 2007. 56(7): p. 1943-7.
138. Saxena, R., et al., Common single nucleotide polymorphisms in TCF7L2 are reproducibly associated with type 2 diabetes and reduce the insulin response to glucose in nondiabetic individuals. Diabetes, 2006. 55(10): p. 2890-5.
139. Florez, J.C., et al., TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. N Engl J Med, 2006. 355(3): p. 241-50.
140. Munoz, J., et al., Polymorphism in the transcription factor 7-like 2 (TCF7L2) gene is associated with reduced insulin secretion in nondiabetic women. Diabetes, 2006. 55(12): p. 3630-4.
141. Zethelius, B., et al., Proinsulin and acute insulin response independently predict Type 2 diabetes mellitus in men--report from 27 years of follow-up study. Diabetologia, 2003. 46(1): p. 20-6.
142. Prunier, C., B.A. Hocevar, and P.H. Howe, Wnt signaling: physiology and pathology. Growth Factors, 2004. 22(3): p. 141-50.
143. Yi, F., P.L. Brubaker, and T. Jin, TCF-4 mediates cell type-specific regulation of proglucagon gene expression by beta-catenin and glycogen synthase kinase-3beta. J Biol Chem, 2005. 280(2): p. 1457-64.
144. Rulifson, I.C., et al., Wnt signaling regulates pancreatic beta cell proliferation. Proc Natl Acad Sci U S A, 2007. 104(15): p. 6247-52.
145. Lyssenko, V., et al., Mechanisms by which common variants in the TCF7L2 gene increase risk of type 2 diabetes. J Clin Invest, 2007. 117(8): p. 2155-63.
146. Bort, R., et al., Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Development, 2004. 131(4): p. 797-806.
147. Bort, R., et al., Hex homeobox gene controls the transition of the endoderm to a pseudostratified, cell emergent epithelium for liver bud development. Dev Biol, 2006. 290(1): p. 44-56.
148. Foley, A.C. and M. Mercola, Heart induction by Wnt antagonists depends on the homeodomain transcription factor Hex. Genes Dev, 2005. 19(3): p. 387-96.
149. Smith, E.P., et al., Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man. N Engl J Med, 1994. 331(16): p. 1056-61.
150. Lehrer, S., et al., Association of an estrogen receptor variant with increased height in women. Horm Metab Res, 1994. 26(10): p. 486-8.
151. Langdahl, B.L., et al., A TA repeat polymorphism in the estrogen receptor gene is associated with osteoporotic fractures but polymorphisms in the first exon and intron are not. J Bone Miner Res, 2000. 15(11): p. 2222-30.
152. Lorentzon, M., et al., Estrogen receptor gene polymorphism, but not estradiol levels, is related to bone density in healthy adolescent boys: a cross-sectional and longitudinal study. J Clin Endocrinol Metab, 1999. 84(12): p. 4597-601.
153. Pilia, G., et al., Mutations in GPC3, a glypican gene, cause the Simpson-Golabi-Behmel overgrowth syndrome. Nat Genet, 1996. 12(3): p. 241-7.
154. Weksberg, R. and J.A. Squire, Molecular biology of Beckwith-Wiedemann syndrome. Med Pediatr Oncol, 1996. 27(5): p. 462-9.
155. Li, Y., et al., A fibrillar collagen gene, Col11a1, is essential for skeletal morphogenesis. Cell, 1995. 80(3): p. 423-30.
Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Medicine 287
Editor: The Dean of the Faculty of Medicine
A doctoral dissertation from the Faculty of Medicine, Uppsala
University, is usually a summary of a number of papers. A few
copies of the complete dissertation are kept at major Swedish
research libraries, while the summary alone is distributed
internationally through the series Digital Comprehensive
Summaries of Uppsala Dissertations from the Faculty of
Medicine. (Prior to January, 2005, the series was published
under the title “Comprehensive Summaries of Uppsala
Dissertations from the Faculty of Medicine”.)
Distribution: publications.uu.se
urn:nbn:se:uu:diva-8291