Genotyping: principle, technology, application, and practice
Jer-Yuarn Wu (鄔哲源)
National Genotyping Center at Academia Sinica

"Genotype" vs. "Phenotype"
Genotype: the genetic makeup of an organism, or the set of DNA variants found at one or more loci in an individual, as distinguished from its physical appearance, or phenotype.
Phenotype: observable characteristics, which vary widely between individuals, e.g. skin color, eye shape and color, hair texture; drug efficacy/sensitivity; predisposition to complex diseases such as hypertension and diabetes.
Genotype determines phenotype.

Human Genome Sequence Variations
1. Restriction Fragment Length Polymorphism (RFLP)
2. Variable Number of Tandem Repeat (VNTR)
3. Short Tandem Repeat Polymorphism (STRP)
4. Single Nucleotide Polymorphism (SNP)

Restriction Fragment Length Polymorphism (RFLP)
A variant in a restriction enzyme cleavage site (4-8 bp), e.g. the EcoRI site GAATTC. When the site is changed (here GAATTC to GGATTC), EcoRI no longer cuts it, so digestion yields fragments of different lengths from the two alleles.
[Figure: EcoRI digestion of alleles P and M carrying GAATTC or GGATTC]

Variable Number of Tandem Repeat (VNTR)
Minisatellite (20-50 bp repeat unit)
[Figure: alleles P and M with different numbers of repeats]

Short Tandem Repeat Polymorphism (STRP)
Microsatellite (2-5 bp repeat unit)
[Figure: alleles P and M with different numbers of repeats]

What is STRP?
Weber JL and May PM, 1989. Abundant class of human DNA polymorphism which can be typed using the polymerase chain reaction. Am J Hum Genet 44, 388-396.
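The RFLP principle can be sketched in a few lines of Python. This is a minimal illustration, not a lab tool: it assumes complete digestion, looks at one strand only, and uses made-up sequences (the names `ecori_fragments`, `allele_p`, and `allele_m` are hypothetical). EcoRI cuts between G and A in GAATTC, so a GGATTC variant is left uncut and the fragment pattern changes.

```python
import re

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts G^AATTC, i.e. after the first base of the site


def ecori_fragments(seq: str) -> list[int]:
    """Fragment lengths after complete EcoRI digestion of a linear sequence.

    Simplification: only the top strand is considered, so the 4-base
    sticky-end overhangs are ignored.
    """
    seq = seq.upper()
    # The lookahead finds every site start, including overlapping matches.
    cuts = [m.start() + ECORI_CUT_OFFSET for m in re.finditer(f"(?={ECORI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: M carries GGATTC in place of the second GAATTC site.
allele_p = "TTTTT" + "GAATTC" + "TTTTTTTTTT" + "GAATTC" + "TTTT"
allele_m = "TTTTT" + "GAATTC" + "TTTTTTTTTT" + "GGATTC" + "TTTT"

print("P:", ecori_fragments(allele_p))  # [6, 16, 9]: both sites cut
print("M:", ecori_fragments(allele_m))  # [6, 25]: the variant site is not cut
```

On a gel, allele P would show three bands and allele M two, which is the length polymorphism the slide describes.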
Short Tandem Repeat Polymorphism (also known as microsatellite or simple sequence length polymorphism)

Characteristics of STRP
• Multiple copies of an identical DNA sequence arranged in direct succession in a particular region of a chromosome
• Highly polymorphic, abundant, multiallelic DNA polymorphism
• Example: the sequence "GCGCGCGCGCGC" represents six copies of the dinucleotide "GC"
• Repeat unit normally 2-5 base pairs

Application of STRP
• Forensic tests
• Paternity tests
• Genetic study of hereditary diseases
• Molecular evolution and phylogenetics

PCR Amplification
[Figure: PCR with a fluorescence-labeled (blue) primer. Product length = 40 bp flank + repeat region + 60 bp flank, so a marker allele with 9 AT repeats gives a 118 bp product and one with 7 AT repeats gives 114 bp.]
Instrument: ABI 384-well GeneAmp® PCR System 9700

Pooling PCR Products
Principle of pooling PCR products:
• Products labeled with different dyes in the same size range can be pooled into one tube
• Products labeled with the same dye but in different size ranges can be pooled into one tube
Benefits of pooling: cost saving, increased throughput

Instrument: electrophoresis system, ABI 3730 48-capillary DNA analyzer

Electrophoresis Analysis
[Figure: capillary electrophoresis with laser, cathode, anode, and sensor]
Data collection software
Genotyping software: ABI GeneMapper® v3.0 (size calling, allele calling)
[Figure: results of STRP genotyping]

Genotyping Quality Control
• Check genotypes of CEPH (Centre d'Etude du Polymorphisme Humain) controls
• Check genotypes of identical (duplicate) samples
• Missing rate of each marker
• Missing rate of each sample
• Missing rate of each batch
• Mendelian inheritance errors, if family information is available

Human Genome Unraveled in 2001

What is a SNP?
A variation in the genetic code at a specific point on the DNA.
Example: the order of bases in a section of DNA on a chromosome:
...C C A T T G A C...
...G G T A A C T G...
...C C G T T G A C...
...G G C A A C T G...
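The PCR-product arithmetic on the STRP slide (40 bp flank + repeats + 60 bp flank) can be written as a small helper, together with the inverse used when sizing an allele. This is a sketch under stated assumptions: the 40 and 60 bp flank lengths are read from the slide's example marker, and the function names are hypothetical. In practice, sizing and allele calling are done by the genotyping software, not by this idealized formula.

```python
def count_tandem_repeats(seq: str, unit: str) -> int:
    """Length (in copies) of the longest uninterrupted run of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for start in range(len(seq)):
        n = 0
        # Walk forward one repeat unit at a time while the copies keep matching.
        while seq[start + n * k : start + (n + 1) * k] == unit:
            n += 1
        best = max(best, n)
    return best


def str_product_length(n_repeats: int, unit_len: int = 2,
                       left_flank: int = 40, right_flank: int = 60) -> int:
    """Expected PCR product size (bp) for an allele with n_repeats copies."""
    return left_flank + n_repeats * unit_len + right_flank


def repeats_from_length(length_bp: int, unit_len: int = 2,
                        left_flank: int = 40, right_flank: int = 60) -> int:
    """Invert str_product_length: infer the repeat count from a product size."""
    insert = length_bp - left_flank - right_flank
    if insert < 0 or insert % unit_len:
        raise ValueError(f"{length_bp} bp is not a whole number of repeats")
    return insert // unit_len


print(count_tandem_repeats("GCGCGCGCGCGC", "GC"))   # 6
print(str_product_length(9), str_product_length(7))  # 118 114
```

Because each AT unit adds 2 bp, two alleles that differ by one repeat differ by only 2 bp, which is why single-base-resolution capillary electrophoresis is used for sizing.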
Some people have a different base at a given location.

Single Nucleotide Polymorphism (SNP)
[Figure: SNP classification by location: gSNP; intergenic vs. intragenic regions; coding (cSNP) vs. non-coding (rSNP, iSNP, nSNP); functional variants (5%)]

Features of Single Nucleotide Polymorphisms (SNPs)
• The variant allele has a frequency of at least 1% in the population.
• SNPs are frequent in the human genome: an estimated ~1 SNP per kb.
• SNPs are bi-allelic markers with two common nucleotide-substitution alleles (in TSC data, 0.1% of SNPs are tri-allelic).
• SNPs have a lower mutation rate than repeat sequences, but are less informative than microsatellite markers.
• SNP detection methods are potentially better suited to automated, large-scale genetic screening.
• Some SNPs, particularly cSNPs, are likely to be responsible for functional changes underlying disease.
• Allele frequencies of some SNPs tend to be population-specific.

Application of SNP Markers
• Whole-genome linkage study of monogenic diseases
• Disease locus fine-mapping
• Candidate gene association study
• Whole-genome association study of common diseases
• Adverse drug reaction
• Personalized medicine

SNP Genotyping
• Genome-wide scans for linkage analysis: 2,800-4,600 markers
• Candidate gene (LD mapping): 100 to 1,000 markers per cM
• Genome-wide scans for association studies: ~3,000,000 SNP markers; if a haplotype map is constructed, ~200,000 markers

Genome-wide Linkage Analysis
Microsatellite markers: genome-wide scan with microsatellite markers spaced 10 cM apart
• Advantage: low cost, high heterozygosity
• Disadvantage: slow (genotyping and allele calling are difficult to automate), widely spaced
SNP markers: genome-wide linkage with ~4,600 SNPs spaced ~1 cM apart
• Advantage: high throughput (genotyping and allele calling are easy to automate), densely spaced
• Disadvantage: high cost, low heterozygosity

Disease Gene Mapping
[Figure: microsatellites and SNPs along a chromosome around a disease gene]
Microsatellite markers are too widely spaced to localize the disease gene. SNPs are densely distributed (>1 SNP/kb).

What is a "Haplotype"?
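The heterozygosity trade-off between microsatellites and SNPs can be made concrete with the standard expected-heterozygosity formula, H = 1 - Σ p_i², where p_i are the allele frequencies. The sketch below uses hypothetical allele frequencies and a hypothetical function name; it shows why a bi-allelic SNP can never exceed H = 0.5 while a multiallelic microsatellite can, which is one reason more SNPs than microsatellites are needed for a linkage scan.

```python
import math


def expected_heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) under Hardy-Weinberg proportions."""
    if any(p < 0 for p in freqs) or not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical markers, for illustration only.
snp_balanced = [0.5, 0.5]       # best case for a bi-allelic SNP
snp_skewed = [0.9, 0.1]         # a SNP with a rare minor allele
microsatellite = [0.2] * 5      # five equally common repeat alleles

for name, f in [("SNP 50/50", snp_balanced), ("SNP 90/10", snp_skewed),
                ("5-allele STR", microsatellite)]:
    print(f"{name}: H = {expected_heterozygosity(f):.2f}")
```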
A set of closely linked alleles (genes or DNA polymorphisms) inherited as a unit.
• A contraction of the phrase "haploid genotype"; haploid refers to the single set of chromosomes present in the egg and sperm cells of animals.
• Different combinations of polymorphisms are known as haplotypes; collectively, the results from several loci can be referred to as a haplotype.
• "Haplo" comes from the Greek word for "single".

International Haplotype Mapping (HapMap) Project: The Hypothesis
• Linkage disequilibrium (LD) occurs in blocks (10-100 kb) of low haplotype diversity
• Few SNPs are required to test the common variation in each block for association with a phenotype
• LD mapping is more evident with haplotypes than with single markers
• Haplotype blocks are population-specific; block size: Chinese > Caucasian > African
Linkage disequilibrium: the occurrence of some alleles (SNPs) together more often than would be expected by chance.

International Haplotype Mapping (HapMap) Project: The Project
• $100 million investment over three years
• 1.5 million SNP assays to be developed
• Participating countries include the United States, United Kingdom, Canada, Japan, and China
• Study populations: Caucasian, African, Japanese, Chinese

Cost and throughput considerations for a genome-wide association study using SNP markers
If the average size of a linkage disequilibrium (LD) block is 30-50 kb:
• 20 kb per SNP marker
• 150,000 SNP markers analyzed per association study (about 3,000,000 kb of genome / 20 kb per marker)
• At least 1,000 patients and 1,000 controls in a cohort
• Cost per genotype: US$0.01
This gives a total of 300,000,000 SNP genotypes per study (2,000 subjects × 150,000 SNPs), to be associated with the available phenotypes.
Cost: 300,000,000 × 0.01 = US$3,000,000

Factors in Choosing SNP Technology Platforms
• Cost
• Throughput
• Accuracy
• Experiment design: multiplex? pooling?
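To make the definition of linkage disequilibrium concrete: for two bi-allelic SNPs with alleles A/a and B/b, LD is commonly quantified by D = p_AB - p_A·p_B, its normalized form D′, and r². The slides do not specify an LD measure; the sketch below uses these standard ones, assumes the haplotype frequencies are already known (e.g. estimated from phased data), and uses hypothetical function and variable names.

```python
import math


def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> dict[str, float]:
    """Return D, |D'| and r^2 for two bi-allelic loci from haplotype frequencies."""
    if not math.isclose(f_AB + f_Ab + f_aB + f_ab, 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab          # frequency of allele A at locus 1
    p_B = f_AB + f_aB          # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = f_AB - p_A * p_B       # excess of AB haplotypes over random association
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Hypothetical haplotype frequencies.
print(ld_measures(0.5, 0.0, 0.0, 0.5))      # only AB and ab haplotypes: complete LD
print(ld_measures(0.25, 0.25, 0.25, 0.25))  # alleles combine at random: no LD
```

When r² is close to 1, genotyping one SNP effectively captures the other, which is the basis of the hypothesis that a few SNPs per block suffice.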
A workflow chart for SNP genotyping
Target fragment amplification
• PCR (all methods except the Invader assay)
  - Design a specific amplicon, filtering out repeat sequences before designing PCR primers
  - PCR clean-up to remove unincorporated dNTPs (SAP) and/or primers (Exo I)
  - Uniplex and/or multiplex
• Strand displacement amplification

Biochemistry of Allelic Discrimination
Required biochemical features: high specificity and accuracy
Enzymes or processes used:
• DNA ligases: the most specific
• DNA polymerases: vary in specificity
• Hybridization: relatively lower discriminating power

Allele discrimination based on DNA polymerases
Utilizes features of DNA polymerase:
• Accurate and precise DNA synthesis activity
• 5' and 3' exonuclease activity
Methods:
• Single-base extension or mini-sequencing
• Allele-specific extension and allele-specific PCR
• Synchronized DNA synthesis: pyrosequencing
• Structure-specific cleavage: the Invader assay
• 5' exonuclease activity: the TaqMan assay

Single-base extension (SBE)
• A primer is extended by one base, a chain-terminating ddNTP, at the polymorphic site
• SBE is highly specific (error rate < 10^-4), giving superior discrimination
Methods for identifying allele-specific products:
• Mass spectrometry (MALDI-TOF)
• Microarray-based detection
• Fluorescence resonance energy transfer (FRET) detection
• Fluorescence polarization (FP)

Allele-specific extension (ASE) and allele-specific PCR (AS-PCR)
• ASE and AS-PCR exploit the difference in extension efficiency between matched and mismatched 3' bases
• ASE requires two allele-specific primers that anneal to the target with their 3' bases matching the two alleles of the SNP
• Less discriminating power than SBE

Synchronized DNA synthesis: pyrosequencing
When DNA polymerase incorporates a nucleotide onto the 3' end of the primer, it releases pyrophosphate.
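The allele logic of single-base extension can be sketched as follows. Assumptions (not from the slides): the primer's 3' end sits immediately upstream of the SNP, and alleles are reported on the '+' strand. A primer with the '+' strand sequence anneals to the '-' strand, so the incorporated ddNTP is the allele base itself; a primer designed on the '-' strand incorporates the complementary base. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbe_products(primer: str, alleles: list[str], primer_strand: str = "+") -> set[str]:
    """Extended primers expected after single-base extension with ddNTPs.

    The primer's 3' end is assumed to be immediately adjacent to the SNP, so
    exactly one chain-terminating base is added per template molecule.
    """
    if primer_strand not in ("+", "-"):
        raise ValueError("primer_strand must be '+' or '-'")
    added = {a if primer_strand == "+" else COMPLEMENT[a] for a in alleles}
    return {primer + base for base in added}


def call_genotype(terminators_seen: set[str], primer_strand: str = "+") -> str:
    """Translate the ddNTP(s) detected at the SNP back into a '+'-strand genotype."""
    alleles = sorted(b if primer_strand == "+" else COMPLEMENT[b] for b in terminators_seen)
    if len(alleles) == 1:
        return alleles[0] * 2        # one terminator detected: homozygote
    if len(alleles) == 2:
        return "".join(alleles)      # two terminators detected: heterozygote
    raise ValueError("expected one or two distinct terminators")


# Hypothetical C/T SNP typed with a '+'-strand primer.
print(sorted(sbe_products("ACGT", ["C", "T"])))
print(call_genotype({"T"}), call_genotype({"C", "T"}))
```

The detection step (mass, fluorescence, or array position) only has to tell which terminator was added; the genotype follows from this mapping.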
In pyrosequencing, the pyrophosphate triggers an enzymatic cascade that produces a detectable signal.
• High accuracy, and quantifiable (suitable for pooling)
[Figure: principle of the pyrosequencing system]

Structure-specific cleavage: the Invader assay
Archaeal DNA polymerases have an endonuclease activity that requires a specific substrate structure:
• A bifurcated duplex with a free 5' end
• At least one base of overlap between the strand with the free 5' end and the strand annealed to the template
• The enzyme cleaves the strand with the free 5' end when one or more overlapping bases are present
The Invader assay design uses three oligonucleotide probes: two allele-specific probes and one invasive probe.
• Highly specific
• Genotypes SNPs directly from genomic DNA without PCR amplification
• The quality of the Invader probe is critical
[Figure: the Invader assay]

The 5' nuclease activity: the TaqMan assay
• The TaqMan assay exploits the 5' nuclease activity of DNA polymerase
• Two dual-labeled fluorescent probes, one per allele, anneal to the SNP site
• The assay exploits the difference in stability between perfectly matched and single-base mismatched duplexes
• Requires optimization because the difference between perfectly matched and mismatched duplexes is subtle
• Probe design is empirical
• Detection by fluorescence resonance energy transfer (FRET)
[Figure: TaqMan genotyping assay]

Allelic discrimination based on DNA ligases
• DNA ligase joins the ends of two oligonucleotides annealed next to each other on a template
• Ligation occurs only if the oligonucleotides are perfectly matched
[Figure: oligonucleotide ligation genotyping]

Allelic discrimination based on hybridization
• SNP genotyping by hybridization exploits the subtle thermodynamic difference between perfectly matched and single-base mismatched duplexes
• Detection basis: steady-state or dynamic
[Figure: microarray genotyping]

Mechanisms for product detection and identification
Homogeneous detection mechanisms:
• Fluorescence resonance energy transfer (FRET)
• Fluorescence polarization (FP)
Solid-phase-mediated detection mechanisms:
• Identification of products by mass spectrometry
• Microarrays as supports for genotyping
• Fluorescent-coded microbeads as supports
• Separation of products by electrophoresis

Fluorescence resonance energy transfer (FRET)
• FRET occurs between two fluorescent groups when they are in physical proximity and the emission spectrum of one fluorophore (the donor) overlaps the excitation spectrum of the other (the acceptor).
• When FRET occurs and the donor is excited, donor emission is quenched and acceptor emission increases.
• FRET can therefore be monitored either by quenching of donor emission or by an increase in acceptor emission.
Strategies for FRET detection:
• Separate two closely spaced fluorescent groups
• Bring two distant fluorescent groups together to promote energy transfer between them
[Figure: FRET detection]

FRET-based SNP genotyping technologies
• Successfully used to detect allele-specific products from SBE, ASE, DNA hybridization, DNA ligation, and structure-specific cleavage
• SNP technologies that use FRET: the TaqMan 5' nuclease assay, Molecular Beacons, the Invader assay, the Scorpion AS-PCR assay

Identification of products by mass spectrometry
• Alleles are discriminated by mass difference
• A combination of ddNTPs and dNTPs is used for SBE extension
• Multiplexable
• Higher throughput and accuracy
• One of the best quantification methods for allele frequency estimation (pooling)

Microarrays as supports for genotyping
• High density (10,000-100,000 SNPs available)
• High throughput (half a million SNPs per day)
• Well-characterized, unique DNA sequences attached to the microarray
• Tiling the entire target sequence on the chip with multiple overlapping oligonucleotides: resequencing by hybridization
• Novel SNP discovery
• Cost
• Low accuracy?
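The distance dependence behind FRET-based assays (e.g. the TaqMan probe, where cleavage separates the two dyes) follows Förster's relation E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the Förster distance at which E = 0.5. This relation is standard photophysics rather than something stated on the slides; R0 depends on the dye pair, and the 5 nm used below is only an illustrative assumption. The function name is hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster energy-transfer efficiency for a donor-acceptor distance r_nm."""
    if r_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0_NM = 5.0  # illustrative Förster distance; the real value depends on the dye pair

for r in (2.5, 5.0, 7.5, 10.0):
    # Efficiency falls steeply with distance (sixth-power dependence), which is
    # why separating the two dyes restores donor emission.
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```

The steep falloff is what makes both strategies on the slide work: separating closely spaced dyes switches transfer off, and bringing distant dyes together switches it on.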
  (accuracy improved by multiple hybridizations at one SNP)
• Applications: linkage mapping, linkage disequilibrium, and association studies
• Suited to small numbers of subjects and large numbers of SNPs

Fluorescent-coded microbeads as supports
• Like arrays, well-characterized and unique DNA sequences are attached, here to microbeads that carry a fluorescent signature
• Capable of genotyping large numbers of SNPs in parallel
• More flexible than chips
• Competitive cost compared with chips

Separation of products by electrophoresis
• Tedious protocol
• High cost
• Examples: DHPLC, SNaPshot, SNuPe, etc.
• Direct sequencing (gold standard)
• Single-strand conformation polymorphism analysis
[Figure: direct heterozygote sequencing (Shi MM, Clin Chem, 2001)]

Cost structure of SNP genotyping
• Instrument and initial setup costs
• Fixed cost per SNP marker
• Consumable cost proportional to usage
• Technician time, depending on protocol and automation
• Maintenance cost

MassARRAY Genotyping Platform Utilizing a MALDI-TOF Mass Spectrometer
MALDI-TOF MS system:
1. The DNA fragment (extension product) is mixed with matrix.
2. A laser beam is fired at the sample.
3. The matrix absorbs the laser energy and transfers it to the DNA fragment.
4. The DNA fragment enters the gas phase and is ionized.
5. The ionized fragment is accelerated by an electric field and flies through a vacuum.
6. The detector records the arriving ions; flight time increases with the square root of mass/charge, so lighter ions arrive first.

MassARRAY process
1. DNA isolation
2. Target amplification
3. SAP treatment and primer extension reaction
4. Conditioning and nano-dispensing of products onto the SpectroCHIP
5. MALDI-TOF mass spectrometry
6. Automated data analysis and allele calling
[Figures: principle of SNP genotyping: SAP treatment to neutralize free dNTPs; primer extension with terminators]

Mass Spectrometry for Analysis and Scoring (Haff and Smirnov, Genome Res. 7 (1997): 378)
• Use mass spectrometry to score which base(s) are added
• Multiplex 4-10 SNPs using primers of known mass
• Pool 50 to 500 samples

MassEXTEND Reaction
[Figure: the same unlabeled 23-mer primer is extended on both alleles by the enzyme with ddATP plus dCTP/dGTP/dTTP; one allele yields a 24-mer extended primer and the other a 26-mer, and the two products are distinguished by mass.]
[Figure: multiplex analysis, 5-plex]
[Figures: MassARRAY pooled genotyping]

Cost Effectiveness of Pooled Genotyping
Cost and time for a case/control study: 1,000 individuals × 100,000 SNPs = 100 million genotypes

| | Conventional | Technology |
|---|---|---|
| Cost per genotype | $1.00 | $0.01 |
| Study cost | $100 million | $1 million |
| Throughput | 100,000 genotypes/day | 2 million genotypes/day |
| Study time | 4 years | 2 months |

Genotyping Technology Platforms
SNP genotyping platform: Sequenom MassARRAY 7K System, an open system (high flexibility) with relatively low throughput (7,000 genotypes/day)
Major instruments (SNP, Sequenom): SpectroPREP, SpectroPOINT, SpectroREADER

STRP (short tandem repeat polymorphism) genotyping platform
Major instruments (STRP): Applied Biosystems 3730 DNA Analyzer, Hamilton MPH-96
[Photos: ABI 3730, Hamilton MPH-96]

"Bead-Array" System Genotyping Platform

Affymetrix GeneChip SNP Genotyping Platform (Microarray-based)
[Photos: Affymetrix platform; hybridization oven]
SNP genotyping platform: Affymetrix GeneChip System, a closed system with ultra-high throughput (half a million genotypes/day)
Sequenom MassARRAY and Affymetrix are functionally complementary technology platforms.
Major instruments (SNP, Affymetrix): PCR machine, probe array, fluidics station, scanner, data analysis software

SNPlex Overview: OLA/PCR on CE for Genotyping
SNPlex OLA/PCR assay:
1. OLA: allele-specific oligos (ASO X, ASO Y) and a locus-specific oligo (LSO), carrying ZipCode and universal PCR priming sites, anneal to the gDNA target and a ligation product is formed
2. Clean-up
3. Multiplexed universal PCR (one universal PCR primer is biotinylated)
4. Capture (streptavidin)
5. Drag Chute hybridization
6. Wash and release Drag Chutes
7. Load on CE instrument for detection
Legend: GER = Genome Equivalent Region; ASO = Allele-Specific Oligo; LSO = Locus-Specific Oligo; each universal Drag Chute carries a fluorescent label, mobility modifiers, and a cZipCode (complementary ZipCode)
[Figure: SNPlex 30-plex assay (48-plex capable), peaks labeled by SNP number and allele]

Comparison between GeneChip and Bead-Array SNP Genotyping Systems

| | Bead-Array | GeneChip |
|---|---|---|
| System | Closed | Closed |
| Throughput | million SNPs/day (higher) | half a million SNPs/day |
| Equipment cost | US$1.5-2 million | US$300K (lower) |
| Genotyping cost | 0.05 US$ | 0.01 US$/SNP (lower) |
| Labor | Less labor-intensive | |
| Accuracy | Equivalent | Equivalent |

Could the high-density SNP panels be used in case/control association studies?
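The cost and time figures in the pooled-genotyping table, and the earlier genome-wide association estimate (2,000 subjects × 150,000 SNPs at US$0.01), follow from simple arithmetic. The slides do not say how days convert to years; assuming roughly 250 working days per year (an assumption, not from the source) reproduces the 4-year figure, and the 50 working days needed at 2 million genotypes/day come to about 2 to 2.5 months. The helper name `estimate_study` is hypothetical.

```python
from typing import NamedTuple

WORKING_DAYS_PER_YEAR = 250  # assumption used to convert working days to years


class StudyEstimate(NamedTuple):
    genotypes: int
    cost_usd: float
    working_days: float
    years: float


def estimate_study(n_subjects: int, n_snps: int, usd_per_genotype: float,
                   genotypes_per_day: float) -> StudyEstimate:
    """Total genotypes, cost and duration when every SNP is typed in every subject."""
    genotypes = n_subjects * n_snps
    days = genotypes / genotypes_per_day
    return StudyEstimate(genotypes, genotypes * usd_per_genotype,
                         days, days / WORKING_DAYS_PER_YEAR)


conventional = estimate_study(1_000, 100_000, 1.00, 100_000)
faster = estimate_study(1_000, 100_000, 0.01, 2_000_000)
print(conventional)  # 100 million genotypes, $100 million, 1,000 working days
print(faster)        # 100 million genotypes, $1 million, 50 working days

# The GWAS estimate; throughput only affects the time fields, which are not used here.
gwas = estimate_study(2_000, 150_000, 0.01, 2_000_000)
print(f"GWAS: {gwas.genotypes:,} genotypes, ${gwas.cost_usd:,.0f}")
```

Since every subject is typed at every SNP, cost and time scale with the product of subjects and markers, which is why pooling samples (fewer reactions per SNP) cuts both.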
Stevens-Johnson Syndrome (SJS)
• Severe adverse drug reaction
• Caused by medication
• Widespread erythematous cutaneous macules with blisters
• Severe mucosal erosions
• >30% skin detachment: toxic epidermal necrolysis (TEN)

Incidence and Epidemiology of Stevens-Johnson Syndrome
• Potentially life-threatening, with high morbidity and mortality (5-15%)
• Over 100 drugs can cause SJS
• Incidence (per million people per year): Taiwan 8; Western countries* 1-3
• Major culprit drugs: carbamazepine (CBZ), sulfonamides
*Ref: N Engl J Med. 1994, 1995

Aim: to identify the susceptibility gene for carbamazepine-induced Stevens-Johnson syndrome (CBZ-SJS)
• Candidate gene approach
A. Drug-metabolizing enzymes (bioactivation, detoxification)
  1. Phase I enzymes: cytochrome P450 superfamily: CYP2C9, 2B6, 2C19, 2D6, 2E1, 3A4, 2C8, and 1A2…
  2. Phase II enzymes: microsomal epoxide hydrolase, arylamine N-acetyltransferase, UDP-glucuronosyltransferase…
  3. Receptors
  4. Transporters: MDR1 P-glycoprotein…
B. Immune response-related genes
  1. Antigen recognition: HLA (human leukocyte antigen) HLA-A, B, C, DQ, DP, DR; T cell receptor: Vβ…
  2. Mediators: cytokines/chemokines (TNF-α), complement, apoptosis proteins (Fas), enzymes (perforin, granzyme, leukotriene synthase, …), extracellular matrix components

Comparison of allele frequencies in the MHC region (candidate gene approach)
• 56 CBZ-SJS patients vs. 101 tolerant controls
• 46 SNPs with p < 0.01: MHC class I, II, and III
• 6 SNPs with p < 10^-10: between HLA-C and DRA
• rs3130690 with p < 10^-30: near the HLA-B locus
[Figure: -log(p value), SJS patients vs. tolerant group, by SNP position on chromosome 6 (29.8-33.8 Mb), with HLA-A, HLA-C, HLA-B, and DRA marked]

Genome-wide scan approach with 100,000 SNP markers
• 40 ADR controls vs. 41 SJS patients
• Affymetrix GeneChips: 100K Xba/Hind 240

Call rate of the reference sample done by NGC

| | Hind | Xba | Overall |
|---|---|---|---|
| Call rate | 93.71% (53,641/57,244) | 94.30% (55,597/58,960) | 94.26% (109,238/116,204) |
| Concordant rate | 99.47% | 99.61% | 99.54% |

Call rates at NGC: mean ± 1 S.D. (min, max)

| | Hind* | Xba | Overall |
|---|---|---|---|
| ADR cases | 0.9845 ± 0.0101 (0.9501, 0.9948) | 0.9898 ± 0.0070 (0.9730, 0.9977) | 0.9871 ± 0.0090 (0.9501, 0.9977) |
| ADR controls | 0.9903 ± 0.0061 (0.9685, 0.9977) | 0.9945 ± 0.0017 (0.9889, 0.9983) | 0.9924 ± 0.0049 (0.9685, 0.9983) |
| Overall | 0.9874 ± 0.0088 (0.9501, 0.9977) | 0.9921 ± 0.0056 (0.9730, 0.9983) | 0.9898 ± 0.0077 (0.9501, 0.9983) |

* One sample with call rate = 0.9385 was excluded.
[Figures: call rate, Xba vs. Hind; call rate, cases vs. controls]

Minor allele frequency

| Minor allele frequency | Frequency | Percent | Cumulative frequency | Cumulative percent |
|---|---|---|---|---|
| 0.0 | 18977 | 16.33 | 18977 | 16.33 |
| 0.0-0.1 | 25983 | 22.36 | 44960 | 38.69 |
| 0.1-0.2 | 20723 | 17.83 | 65683 | 56.52 |
| 0.2-0.3 | 18167 | 15.63 | 83850 | 72.16 |
| 0.3-0.4 | 16394 | 14.11 | 100244 | 86.27 |
| 0.4-0.5 | 15960 | 13.73 | 116204 | 100.00 |

[Figures: allele 1 frequency distributions, Hind and Xba]

Allele 1 frequency by array. Cell entries: count / % of all 116,204 SNPs / % of that array's SNPs / % of that frequency bin.

| Allele 1 freq. | Hind | Xba | Total |
|---|---|---|---|
| 0.00 | 5559 / 4.78 / 9.71 / 58.53 | 3938 / 3.39 / 6.68 / 41.47 | 9497 / 8.17 |
| 0.0-0.1 | 6733 / 5.79 / 11.76 / 50.05 | 6720 / 5.78 / 11.40 / 49.95 | 13453 / 11.58 |
| 0.1-0.2 | 4685 / 4.03 / 8.18 / 46.43 | 5405 / 4.65 / 9.17 / 53.57 | 10090 / 8.68 |
| 0.2-0.3 | 4317 / 3.72 / 7.54 / 47.43 | 4784 / 4.12 / 8.11 / 52.57 | 9101 / 7.83 |
| 0.3-0.4 | 3679 / 3.17 / 6.43 / 45.78 | 4358 / 3.75 / 7.39 / 54.22 | 8037 / 6.92 |
| 0.4-0.5 | 3542 / 3.05 / 6.19 / 45.89 | 4176 / 3.59 / 7.08 / 54.11 | 7718 / 6.64 |
| 0.5-0.6 | 3670 / 3.16 / 6.41 / 44.38 | 4599 / 3.96 / 7.80 / 55.62 | 8269 / 7.12 |
| 0.6-0.7 | 3714 / 3.20 / 6.49 / 44.83 | 4571 / 3.93 / 7.75 / 55.17 | 8285 / 7.13 |
| 0.7-0.8 | 4199 / 3.61 / 7.34 / 45.82 | 4965 / 4.27 / 8.42 / 54.18 | 9164 / 7.89 |
| 0.8-0.9 | 5068 / 4.36 / 8.85 / 48.13 | 5462 / 4.70 / 9.26 / 51.87 | 10530 / 9.06 |
| 0.9-0.99 | 6435 / 5.54 / 11.24 / 51.15 | 6145 / 5.29 / 10.42 / 48.85 | 12580 / 10.83 |
| 1.0 | 5643 / 4.86 / 9.86 / 59.53 | 3837 / 3.30 / 6.51 / 40.47 | 9480 / 8.16 |
| Total | 57244 / 49.26 | 58960 / 50.74 | 116204 / 100.00 |

Total non-polymorphic markers (allele 1 frequency 0.00 or 1.0): 9497 + 9480 = 18977

[Figures: allele 1 frequency by ethnic group; non-SNP allele frequency by ethnic group]

Non-polymorphic ("Non SNP") vs. polymorphic ("SNP") markers, cross-tabulated between pairs of ethnic groups. Cells: count (percent of all 116,204 markers); in "X / Y", X is the status in the row group and Y the status in the column group. Each table totals 116204 (100.00).

| Row group × column group | Non SNP / Non SNP | Non SNP / SNP | SNP / Non SNP | SNP / SNP | Row group totals: Non SNP; SNP | Column group totals: Non SNP; SNP |
|---|---|---|---|---|---|---|
| Han × Asian | 14260 (12.27) | 4717 (4.06) | 1975 (1.70) | 95252 (81.97) | 18977 (16.33); 97227 (83.67) | 16235 (13.97); 99969 (86.03) |
| Han × Caucasian | 7131 (6.14) | 11846 (10.19) | 3764 (3.24) | 93463 (80.43) | 18977 (16.33); 97227 (83.67) | 10895 (9.38); 105309 (90.62) |
| Han × African | 1000 (0.86) | 17977 (15.47) | 2093 (1.80) | 95134 (81.87) | 18977 (16.33); 97227 (83.67) | 3093 (2.66); 113111 (97.34) |
| Asian × Caucasian | 6768 (5.82) | 9467 (8.15) | 4127 (3.55) | 95842 (82.48) | 16235 (13.97); 99969 (86.03) | 10895 (9.38); 105309 (90.62) |
| Asian × African | 955 (0.82) | 15280 (13.15) | 2138 (1.84) | 97831 (84.19) | 16235 (13.97); 99969 (86.03) | 3093 (2.66); 113111 (97.34) |
| Caucasian × African | 913 (0.79) | 9982 (8.59) | 2180 (1.88) | 103129 (88.75) | 10895 (9.38); 105309 (90.62) | 3093 (2.66); 113111 (97.34) |

[Figures: allele 1 frequency, Han vs. African American, Han vs. Caucasian, Han vs. Asian; heterozygosity; test of HWE; proportion test of allele type association; -log(p) across the chromosome 6 HLA region]

Affymetrix Platform: Summary
• The overall average call rate of the Affymetrix 100K GeneChips is 0.9898 ± 0.0077.
• Non-polymorphic SNPs account for 16.33% in the Han samples, significantly higher than in Caucasian (9.38%) and African American (2.66%) samples.
• Results from the previous candidate gene approach were replicated in the genome-wide scan approach with 100,000 SNPs.

Challenges of ultra-high-throughput SNP genotyping
• Enormous SNP data sets for storage, transfer, and extraction: billions of SNP genotypes per study
• Statistical methods for large numbers of SNPs
• Computational power
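The -log(p) plots compare allele counts between cases and controls. A common way to do this for one SNP is a 2×2 allelic chi-square test with 1 degree of freedom; a two-sample proportion z-test without continuity correction is equivalent (z² = χ²), matching the "proportion test of allele type association" above. The slides do not state the exact test used, so the sketch below is an illustrative choice with hypothetical allele counts and a hypothetical function name. It counts two alleles per person, which assumes Hardy-Weinberg proportions, and uses the identity that the 1-df chi-square upper-tail probability equals erfc(sqrt(χ²/2)).

```python
import math


def allelic_association(case_alleles: tuple[int, int],
                        control_alleles: tuple[int, int]) -> tuple[float, float, float]:
    """2x2 allelic chi-square test (no continuity correction).

    case_alleles / control_alleles are (allele 1 count, allele 2 count);
    each person contributes two alleles. Returns (chi2, p, -log10 p).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    p = math.erfc(math.sqrt(chi2 / 2.0))                  # upper tail, chi-square with 1 df
    neg_log10_p = -math.log10(p) if p > 0 else math.inf  # p can underflow to 0
    return chi2, p, neg_log10_p


# Hypothetical counts: 40 cases and 40 controls, i.e. 80 alleles per group.
chi2, p, score = allelic_association((60, 20), (20, 60))
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, -log10(p) = {score:.2f}")
```

With 100,000 SNPs tested, a genome-wide scan needs a far stricter threshold than p < 0.05 (for example a Bonferroni-style correction), which is why the plots report -log10(p) on a scale reaching 30 or more.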