Methods for ARIC Carotid MRI Genotyping Project
Gene Selection
Candidate genes related to atherosclerosis were identified by the ARIC investigators and
provided to the ARIC DNA laboratory for compilation and verification (n=281).
SNP Selection
TagSNPs within these genes were derived using the Haploview Program
(http://www.broad.mit.edu/mpg/haploview/) based on two sources of SNPs: the
Caucasian (CEU) and Yoruban (YRI) population from the International HapMap project
(http://www.hapmap.org/). The data were analyzed in a race-specific manner on a
gene-by-gene basis using the gene definitions provided by the HapMap database.
Nonsynonymous SNPs were selected by an automated search of public SNP databases.
dbSNP (http://www.ncbi.nlm.nih.gov/SNP/index.html) was used as the primary source
for the data on each gene. The UC Santa Cruz Genome Assembly
(http://genome.ucsc.edu/) was used to supplement dbSNP where no data was available
and to help resolve ambiguous data. The algorithm used for SNP selection was
Haploview’s implementation of the Broad Institute’s Tagger software. The r-squared
cutoff for Tagger was set to 0.8 and the LOD threshold to 2. In addition, Tagger was used in
aggressive multi-marker mode. SNPs with a minor allele frequency (MAF) of less than
0.05 were excluded from consideration before the tagSNPs were calculated. All
tagSNPs selected by Tagger for the CEU population were included in the SNP panel.
TagSNPs from the YRI population that were not in blocks, or that tagged only themselves,
were not included. Nonsynonymous SNPs with a MAF >0.05 and a limited number of
additional candidate SNPs were included if provided by an ARIC investigator. The final
SNP set for each gene was determined by taking the union of the four SNP sets
(nonsynonymous, tagSNPs from each population, and PI-requested SNPs). The overall
SNP set is time-dependent and is likely to change as the data at the
various SNP databases are refined or expanded. At the time this SNP panel (n=6,890)
was created, we used Haploview v. 3.32pr, HapMap Data Rel 22/phase Apr 07 and
NCBI build 36/dbSNP build 126.
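The per-gene union described above can be illustrated with a short Python sketch. It is not the laboratory's actual pipeline: the function name, the argument names, and the rs IDs in the usage line are hypothetical, and the tagSNP lists are assumed to have been produced by Tagger already (with the YRI list already filtered as described).

```python
MAF_CUTOFF = 0.05  # threshold stated in the text


def final_snp_set(nonsynonymous, ceu_tags, yri_tags, pi_requested, maf):
    """Return the union of the four per-gene SNP sets.

    nonsynonymous: rs IDs of nonsynonymous SNPs; kept only if MAF > MAF_CUTOFF.
    ceu_tags, yri_tags: tagSNPs already chosen by Tagger (assumed inputs).
    pi_requested: SNPs requested by an investigator.
    maf: dict mapping rs ID -> minor allele frequency; a SNP with no
         entry is treated as MAF 0 and dropped (an assumption).
    """
    ns_kept = {s for s in nonsynonymous if maf.get(s, 0.0) > MAF_CUTOFF}
    return ns_kept | set(ceu_tags) | set(yri_tags) | set(pi_requested)


# Hypothetical identifiers, for illustration only.
panel = final_snp_set(
    ["rsA", "rsB"], ["rsC"], ["rsC", "rsD"], ["rsE"],
    {"rsA": 0.10, "rsB": 0.01},
)
```

Using sets means that a SNP appearing in more than one source (here the hypothetical rsC) is counted once in the final panel.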
Sample Selection
The ARIC coordinating center provided the ARIC DNA lab with a pull list of 2,110 Carotid
MRI individuals. Eight individuals had withdrawn consent to use their DNA and were not
included in the sample set used for genotyping.
Genotyping
After SNP selection, assay design, and oligonucleotide manufacturing, a total of 6,104
SNPs in 281 genes were selected for genotyping. Illumina’s FastTrack Genotyping
Services (San Diego, CA) was used to complete the ARIC Carotid MRI
genotyping, and DNA samples (n=2,101) were shipped to Illumina in June 2007. A
custom-designed iSelect Infinium BeadChip with 7,600 bead types was used
(http://www.illumina.com/pages.ilmn?ID=158) to generate the genotype data.
Genotyping Quality Control
After genotyping was complete, the following exclusion criteria were applied which
resulted in a final data set of 5,266 SNPs that was provided to the ARIC coordinating
center on October 3, 2007.
• Failed production genotyping (i.e. assay did not work) (n=346)
• Missing data > 20% (n=20)
• Monomorphic in all samples (n=472)
• Hardy-Weinberg Equilibrium (HWE) was not utilized as an exclusion criterion for
  data submission, thus race-specific HWE should be calculated and a cut off
  value determined on an individual basis.
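Because HWE filtering is left to the analyst, a minimal Python sketch of one common approach may help: a one-degree-of-freedom chi-square goodness-of-fit test on the genotype counts of a single SNP within a single race group. The choice of test, the function name, and the example counts are assumptions for illustration; the document does not prescribe a method, and an exact test may be preferable when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one SNP.

    n_aa, n_ab, n_bb: observed genotype counts within one race group.
    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        # Monomorphic SNP: the test is not informative.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: one SNP in exact HWE, one with no heterozygotes.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```

Running the function separately on each race group, and then applying an analyst-chosen p-value cutoff, follows the race-specific recommendation above.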
Known replicate samples were included in the plate set for genotyping and concordance
of these duplicated samples for all SNPs was 99.998% (126,674 / 126,676 pairs
matched). The following table describes the two non-matching pairs.
SNP_NAME    ID       Genotype1   Genotype2
rs4649182   QC_E67   A/A         A/T
rs81663     QC_E67   T/T         T/A
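The replicate concordance reported above is a simple ratio of matching genotype pairs to all compared pairs. The following Python sketch shows the calculation under the assumption that allele order within a genotype does not matter (so A/T and T/A count as a match); the function names are hypothetical, and the pair list mixes the two discordant calls from the table with two invented concordant ones.

```python
def normalize(genotype):
    """Treat a genotype such as 'T/A' as an unordered pair of alleles."""
    return tuple(sorted(genotype.split("/")))


def concordance(pairs):
    """Return (matched, total, fraction) for (genotype1, genotype2) pairs."""
    matched = sum(1 for g1, g2 in pairs if normalize(g1) == normalize(g2))
    return matched, len(pairs), matched / len(pairs)


# The first two pairs are the discordant calls listed in the table;
# the last two are invented concordant pairs for illustration.
example = [("A/A", "A/T"), ("T/T", "T/A"), ("A/T", "T/A"), ("C/C", "C/C")]
matched, total, fraction = concordance(example)

# Reported figure: 126,674 matching pairs out of 126,676.
reported_percent = round(100 * 126674 / 126676, 3)
```

The last line reproduces the percentage quoted in the text from the two counts given there.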
Blind duplicate samples were also included and these quality control results will be
calculated and distributed by the ARIC coordinating center.
Thirty-one of the 2,101 samples failed to produce genotype results for any SNP.
These 31 samples remained in the final data set for completeness, but all of their genotypes
were set to missing (“XX”).
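For readers working with the delivered data, the missing-data and monomorphic criteria from the quality control section can be sketched in Python as follows. This is an illustration only: the two-character genotype encoding (for example "AG") is an assumption, as are the function name and the example calls. The source also does not say whether the 31 failed samples were counted when missing rates were computed, so the sketch simply uses whatever calls it is given.

```python
MISSING = "XX"      # missing-genotype code stated in the text
MAX_MISSING = 0.20  # SNPs with more than 20% missing data were excluded


def snp_passes_qc(genotypes):
    """Apply the missing-data and monomorphic criteria to one SNP.

    genotypes: list of calls across samples, e.g. ["AA", "AG", "XX"].
    The two-character encoding is an assumption for illustration.
    """
    if not genotypes:
        return False
    n_missing = genotypes.count(MISSING)
    if n_missing / len(genotypes) > MAX_MISSING:
        return False
    # Collect every allele seen among the called genotypes.
    alleles = {a for g in genotypes if g != MISSING for a in g}
    return len(alleles) > 1  # monomorphic SNPs carry a single allele


# Hypothetical calls for three SNPs.
keep = snp_passes_qc(["AA", "AG", "GG", "XX", "AA"])      # 20% missing, polymorphic
drop_mono = snp_passes_qc(["AA", "AA", "AA", "AA"])       # monomorphic
drop_missing = snp_passes_qc(["AA", "XX", "XX", "AG"])    # 50% missing
```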