NIEHS SNPs Workshop
January 11, 2008
Interactive Tutorial 2: Web Tools for SNP Selection
Goal: This tutorial introduces several websites and tools useful for determining linkage
disequilibrium for your gene or region of interest and for tagSNP selection. In this
section, you will cover the following topics.

Linkage disequilibrium (LD) and TagSNPs
o Genome Variation Server (GVS)
o NIEHS SNPs precomputed tagSNPs

NIEHS SNPs website tools
o Visual Haplotype (VH1)
o GeneSNPs – Haplotype view

Batch Genome Variation Server

Haploview
o HapMap Data in Haploview
o EGP Data in Haploview
Part 1. Linkage Disequilibrium using Genome Variation Server
1. Go to the NIEHS SNPs home page at http://egp.gs.washington.edu and select
‘Genome Variation Server’ from the left-hand menu under ‘Software’ (or go directly
to the Genome Variation Server at http://gvs.gs.washington.edu/GVS/).
2. Click the ‘Gene Name’ button. This will take you to a page on which you should
enter FGFR2 in the gene name box, and 2000 in both the upstream and downstream
boxes. Then, click the search button.
3. Under the select data set – deselect EGP-Asian-Panel and select EGP-CEPH-Panel.
In the set-up parameters for display and analysis change the allele frequency cutoff
from 0 to 5, and in Display Results click the green button labeled ‘display linkage
disequilibrium.’
4. A new page will appear with the title Select Display Type. This page contains links
to a table listing the r2 values between pairs of SNPs identified by rs numbers and a
graphical representation of the visual genotypes with a triangular LD plot underneath.
On the Select Display Type page, click on ‘open graphical display of linkage
disequilibrium,’ and a window will appear with a visual display of the genotypes.
The rs numbers of SNPs are listed across the top of the image. These numbers are
coded with keys at the bottom of this window: the ‘Variation Labeling Color’ details
the color-coding of the numbers according to function, and ‘Variation Labeling Style’
explains the font style, with SNPs in unique regions in bold type and SNPs in repeat
regions in normal type. The numbers on the left side of the image represent the
sample ID. Each square represents an individual sample’s genotype: homozygous
for the common allele (blue), heterozygous (red), homozygous for the rare allele
(yellow), undetermined (where no genotypes are available - grey), and conflicting
genotypes (which can occur when you merge multiple data sets – black).
5. Using the visual genotype figure and triangle plot evaluate the LD among SNPs in
this gene. First notice the highest LD is represented by red squares in the triangle plot
– see the LD Min/Max Shading Scale with the lowest r2 set at 0.1. The square colors
in the LD triangle vary from blue (cold for LD, r2 = 0.1) to red (hot for LD, r2 =
1.0), with white squares indicating no LD (r2 < 0.1). However, an r2 of 0.1 is not very
useful, and many of the sites in this gene are not in LD (more white and blue squares
than red or other colors). So close the graphical display and the select display page
and return to the Genome Variation Server Page and move to the setup parameters for
display and analysis section. Click on the show more parameters button. Now you
will see a section called Color-Coding for LD Plot. To simplify the view change the
Min value from 0.1 to 0.5. Let’s try to display LD again and when the ‘Select
Display Type’ window emerges, click the ‘open graphical display of linkage
disequilibrium.’ What we have done is to get rid of all the low LD matches, but it is
still not clear which sites are in LD with others in the gene. For example, if I asked
you to determine which sites are in high LD (r2 > 0.9, or red squares in the triangle
plot) with rs1219648, which is the fourth SNP from the left, you would not have an
easy task. Let’s try one more way to simplify the LD analysis, which will really help
you answer this question. So close the graphical display and the select display page
and return to the Genome Variation Server Page and move to the setup parameters for
display and analysis section. Go to the section called Clustering in Graphic Display
and select the ‘Cluster SNPs’ option. Now display LD again and when the ‘Select
Display Type’ window emerges, click the ‘open graphical display of linkage
disequilibrium.’ Now determine which SNPs are in high LD with rs1219648. The
first thing you will notice is that associated SNPs are now next to one another. It is
very easy to evaluate the small blocks of LD that are present in this gene.
6. To save the visual genotype/pairwise triangle plot image to your computer, right-click
on the image and choose ‘save as.’
7. You can also get the data in a table form. Return to the ‘Select Display Type’
window and click ‘open table display of linkage disequilibrium.’ The data will appear
as text in a table on a new page. You can also get a text version of the pairwise LD
table by returning to the Genome Variation Server page and, under ‘Data Output and
Display,’ go to ‘Display SNPs by’ and toggle to text. Click on the ‘display linkage
disequilibrium’ button and a savable text-based table will appear.
8. Close both the pairwise LD text table and visual genotype image of LD browser
windows.
9. Explore LD in other populations – this is the CEPH population. In the Genome
Variation Server Page, move up the page to the Select Data Set and deselect EGP
CEPH-Panel and select EGP Yorub-Panel. Leave everything else the same, move
down the page to Display Results, and click ‘display linkage disequilibrium.’ In
the Select Display Type Page, click on ‘open graphical display of linkage
disequilibrium.’ Is there more or less LD in the African population samples?
10. There is one more feature of LD analysis with GVS that could be useful. It is an
important way that you can explore LD in the genome with SNPs identified via
association studies either in candidate gene studies or genome-wide studies. For
example, the SNP you were analyzing above rs1219648 is a SNP in the first intron of
the FGFR2 gene. This SNP and several others in the first intron of this gene have
been identified as the top hits in several genome wide association studies of breast
cancer susceptibility (see Hunter et al 2007 Nature Genetics 39:870). Go back to the
opening GVS page where you can select your search. Rather than selecting a gene
name, click on dbSNP rs id, and it will take you to the rs ID search. Enter 1219648 and
20000 upstream and downstream and click search. Now when you get to the Select
Data Sets, unselect the EGP Asian-Panel and select EGP CEPH-Panel and the
HapMap CEU. In the Set-Up Parameter for Display and Analysis, you can leave the
merging option set on A common samples and combine variation because these have
similar samples. Then, on the Display SNP by:, toggle to the Custom-Text option.
Click on Display Linkage Disequilibrium. You will then get a choice of formats, but
select SNPs paired with rs1219648 and enter 0.70 in the r2 box. You can then select
the type of information you will get about the pairs, including function. Scroll down
the page and click submit. How many SNPs are in LD with rs1219648? What is their
functional status (where do they occur in the gene)? This query is providing more
information because you are exploring more than one data set to simplify your
analysis. Neither of these datasets is complete, but, together, they can give you more
information across the same samples. This type of query can also be done in a batch
mode, which is presented later in this tutorial.
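The r2 values that GVS reports in Part 1 measure how well the allele at one SNP predicts the allele at another. As a rough illustration only, the sketch below computes r2 for a pair of biallelic SNPs and then lists the SNPs at or above a chosen r2 with an index SNP, as in step 10. It assumes phased haplotypes coded 0/1 (GVS works from genotype data and may estimate r2 differently), and the SNP names and toy data are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Return r^2 between biallelic sites i and j.

    haplotypes: list of tuples of 0/1 alleles, one tuple per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at site j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    d = p_ab - p_a * p_b                             # the disequilibrium coefficient D
    return d * d / denom


def partners_in_ld(haplotypes, snp_ids, index_snp, min_r2=0.70):
    """Return {snp: r2} for every SNP with r2 >= min_r2 to index_snp."""
    k = snp_ids.index(index_snp)
    partners = {}
    for j, snp in enumerate(snp_ids):
        if j == k:
            continue
        r2 = r_squared(haplotypes, k, j)
        if r2 >= min_r2:
            partners[snp] = r2
    return partners


# Toy data: 8 chromosomes, 3 hypothetical SNPs.
snp_ids = ["snpA", "snpB", "snpC"]
haplotypes = [
    (1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1),
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1),
]
partners = partners_in_ld(haplotypes, snp_ids, "snpA", min_r2=0.70)
print(partners)
```

In this toy set snpA and snpB always travel together, so they are reported as partners, while snpC is independent of snpA and is filtered out.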
Part 2. TagSNPs determination using GVS
1. To pick tagSNPs, let’s use another gene that has been linked to breast cancer
susceptibility, CASP8. Go back to the opening GVS page. (Hint: Click on GVS:
Genome Variation Server in the left hand corner at the top of the page. This will take
you to the main GVS page). Once there, click on Gene Name, and in the search box
enter CASP8 and enter 20000 in the upstream and downstream boxes, and then click
the search button.
2. Under the select data set – deselect the EGP-Asian-Panel and select EGP-CEPH-Panel.
This is by far the largest data set available, so it is best to use for tagSNP selection
since it will give you the most complete set of tags. In the set-up parameters for
display and analysis - change the allele frequency cutoff from 0 to 5, and in Display
Results, click the button labeled ‘display tagSNPs.’ The default parameter for
selecting tagSNPs is r2 > 0.80. When the ‘Select Data Type’ window opens, click on
the graphical display first. How many SNPs meet the allele frequency cutoff
(remember to look under the gene name)? Notice that the associated SNPs are now
clustered together in this view. The bars over the SNP ids mark each of the bins
(clusters of associated SNPs) and the asterisk indicates the SNP or SNPs that can be a
tagSNP. SNPs with just an asterisk and no bar are known as singleton SNPs (such a
SNP needs to be typed and does not have another proxy in the data set).
3. Close the graphical window and return to the select data type window and click on
‘open table display of tagSNPs.’
4. The output lists each bin, its average minor allele frequency, the SNPs that can be
tagSNPs and, in the last column, the SNPs about which the tagSNPs capture information
during an association analysis. Only one tagSNP per bin is required to represent the
genetic associations for that bin. Note in Bin 1 all the SNPs can serve as tagSNPs and
they all capture information equally about each other. However, in Bin 3 only one
SNP can capture the associations for four other SNPs in this cluster.
5. How many tagSNPs should be typed to capture the genetic architecture of this gene
(How many bins are there)? Which is the largest bin of SNPs, and how many SNPs
are in this bin? Are there alternative tagSNPs for this bin? In this bin do any of the
tagSNPs have predicted function? What is its function?
6. How many bins contain only a single SNP (singleton bins) and how many contain
multiple SNPs?
7. How many TagSNPs (bins) are needed to capture LD across this gene with a 10%
allele frequency cut-off? Go back to the Genome Variation Server page and, in the
filtering SNPs section, change the allele frequency cut-off from 5 to 10. Display
tagSNPs with this cut-off and click on ‘open table display of tagSNPs’. How many
TagSNPs (bins) are needed to capture the SNP patterns in this gene now?
8. Of the singleton bins (a SNP that tags itself), which SNP has the largest minor allele
frequency? Remember this SNP would have to be typed to capture the genetic
architecture of this gene – most of these are not captured in the current genome-wide
formats, so those formats will miss underlying associations. The current formats
capture the largest bins.
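The bins and tagSNPs described in Part 2 can be pictured with a small greedy procedure in the style of LDSelect. This is a simplified sketch, not the GVS implementation: it assumes the pairwise r2 values are already available, uses the r2 > 0.80 default mentioned above, and works on hypothetical SNP names and made-up r2 values.

```python
def ld_select(snps, r2, threshold=0.8):
    """Greedy LDSelect-style binning.

    snps: list of SNP ids (already filtered by allele frequency).
    r2:   dict mapping frozenset({a, b}) -> pairwise r^2; missing pairs count as 0.
    Returns a list of (bin_members, tag_snps) tuples.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) > threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Pick the SNP that exceeds the threshold with the most other SNPs.
        best = max(
            remaining,
            key=lambda s: sum(linked(s, t) for t in remaining if t != s),
        )
        members = [best] + [t for t in remaining if t != best and linked(best, t)]
        # A tagSNP must exceed the threshold with every other SNP in its bin.
        tags = [s for s in members if all(linked(s, t) for t in members if t != s)]
        bins.append((members, tags))
        remaining = [s for s in remaining if s not in members]
    return bins


# Hypothetical SNPs and made-up r^2 values.
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.85,
    frozenset(("s2", "s3")): 0.60,
}
bins = ld_select(snps, r2)
for members, tags in bins:
    print(members, "tagSNPs:", tags)
```

Here s1 is the only SNP that can tag its three-SNP bin (compare Bin 3 in step 4), while s4 and s5 end up as singleton bins that must be typed directly.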
Part 3. Merging populations using GVS
1. Close the current tagSNP windows and return to the GVS page for CASP8, and, in
addition to EGP-CEPH-Panel, also select the EGP-AD-Panel (African American
Panel) and the EGP-YORUB-Panel.
2. Scroll down to ‘Merge Samples and Variations’ (under ‘Merging Data Sets’) and
toggle to option C, ‘combined samples with combined variations.’ Keep the 10%
allele frequency cutoff. Choose ‘display genotypes’ and open the graphical display.
How many SNPs are in the combined variation set for these samples? How many
samples altogether? (Remember the number of SNPs and samples are below the
image and Gene Name.)
3. Close the graphical display and select data type window and return to the GVS page
and click ‘display TagSNPs.’ Open the table display of TagSNPs. TagSNPs across
multiple populations are chosen by implementing a Multipop version of LD-Select, as
described in Howie et al. (Hum Genet. 120: 58-68, 2006 - you should be able to
download the .pdf file for free, so yell if you can’t!). The table you have opened
contains the information for the two combined populations, including the tagSNP,
Function, Unique/Repeat, European Bin Represented, and African Bin Represented.
For example, the first tagSNP is rs13402616. It is in the intron of CASP8. It is in a
unique sequence, represents African bin 11, and does not have representation in
Europeans (i.e., it is an African specific SNP). How many TagSNPs are needed to
capture the information in this gene for both populations? (Hint: look at the line right
above the table.) How many of these TagSNPs capture information from both
populations? How many population specific TagSNPs are there? Notice some of the
SNPs are in parentheses – these represent alternatives to capture the same information
– function and uniqueness of the sequence determine which tagSNP is listed first.
Some of the tagSNPs are in brackets. These represent tagSNPs with genotype
coverage below the threshold. These tagSNPs could be eliminated but for
completeness are included.
4. You can also get a text version of this information. Change ‘Display SNP by’ to
‘text’ and ‘display TagSNPs.’ There is also a custom format option that allows you to
obtain frequency, function, conservation, and flanking sequence to help you order
your SNP genotyping assays! Try the custom format. Also, you have the option of
sending in all the alternative tagSNPs for genotyping design. Once scored for ease of
genotyping by the company you can view which one to pick as the tagSNP if needed.
This is a very useful feature.
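To make the idea of ‘combined samples with combined variations’ more concrete, here is a minimal sketch of merging two genotype tables. It is only an illustration of the concept: the data layout, the sample and SNP names, and the handling of disagreements are assumptions, not a description of how GVS merges data internally. Disagreeing calls for the same sample and SNP are flagged as conflicts, and missing calls are left undetermined, mirroring the black and grey squares in the visual genotype display.

```python
CONFLICT = "conflict"      # same sample and SNP, different calls in two data sets
UNDETERMINED = None        # no genotype available


def merge_genotypes(*datasets):
    """Union of samples and union of SNPs across data sets.

    Each data set is {sample_id: {snp_id: genotype string or None}}.
    """
    snps = set()
    merged = {}
    for data in datasets:
        for sample, calls in data.items():
            row = merged.setdefault(sample, {})
            for snp, genotype in calls.items():
                snps.add(snp)
                if genotype is UNDETERMINED:
                    continue
                genotype = "".join(sorted(genotype))  # treat "TC" and "CT" as one call
                if snp not in row:
                    row[snp] = genotype
                elif row[snp] != genotype:
                    row[snp] = CONFLICT
    # Every sample gets an entry for every SNP in the combined variation set.
    for row in merged.values():
        for snp in snps:
            row.setdefault(snp, UNDETERMINED)
    return merged


# Hypothetical panels.
panel_a = {"S1": {"snp1": "CC", "snp2": "CT"}, "S2": {"snp1": "TC", "snp2": None}}
panel_b = {"S1": {"snp1": "CC", "snp2": "TT", "snp3": "AG"}, "S3": {"snp3": "GG"}}
merged = merge_genotypes(panel_a, panel_b)
print(len(merged), "samples;", len(next(iter(merged.values()))), "SNPs")
```

The combined set has three samples and three SNPs; S1 carries a conflict at snp2 because the two panels disagree there.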
Part 4. TagSNP Selection (LDSelect) in NIEHS SNPs
1. Starting at the NIEHS SNPs home page (http://egp.gs.washington.edu), under “Gene
Targets” (within left-hand navigation menu), click on ‘A-Z Finished Genes
Directory.’
2. Choose ‘A’ and then ‘ADH1C’ to access the gene page data for ADH1C.
3. To find the tagSNPs for ADH1C, scroll down the page to the ‘LD Linkage Data’
section.
4. Click on a population for tagSNPs specifically chosen for that population. TagSNPs
were chosen for each population from all SNPs regardless of minor allele frequency
using the algorithm LDSelect at the default r2 > 0.64.
5. How many tagSNPs (bins) are required to type ADH1C in the European-descent
populations? How many tagSNPs must be genotyped directly because they are not
contained within a bin with another SNP?
6. Which population sample requires more tagSNPs to represent ADH1C: African-descent
or Asian-descent?
Part 5. Using Visual Haplotype (VH1) for Haplotype Analysis in NIEHS
SNPs
1. Go to http://egp.gs.washington.edu
2. Click on “Visual Haplotypes” in the left-hand navigation menu. This software will
display haplotypes. Haplotypes represent the alleles of each SNP assigned to an
individual’s chromosomes. Each individual has two chromosomes representing the
maternal and paternal chromosomes inherited from his or her parents. The visual
haplotype will be twice as long as the visual genotype because now each individual is
represented by two rows of data (haplotypes) instead of just one row of data
(genotypes). NOTE: Be aware that a few of the genes re-sequenced by NIEHS SNPs
are X-linked (males have one X chromosome [haplotype] and females have two X
chromosomes).
3. Using the pull-down menu for ‘EGP Finished Gene Phasebase Input File,’ choose the
gene FEN1 re-sequenced by NIEHS SNPs.
4. To focus on common genetic variation, we suggest you filter by minor allele
frequency for common SNPs. Enter 5 in ‘Rare Allele Percentage (integer, 0 to 50).’
5. To identify the number of haplotypes in your population sample, sort by sample. At
‘Cluster By:’ choose ‘SAMPLE.’
6. Click on ‘Run VH1 on the Web!’
7. You should have an image of the haplotypes in a pop-up window. The numbers at
the top of the image represent the SNPs (numbered along a reference sequence used
in re-sequencing the gene). The SNPs here are sorted according to samples with the
same haplotype. The numbers on the side of the image represent the sample ID.
Each square represents an individual sample’s allele: common (blue) and rare
(yellow) allele. Each row represents the individual sample’s haplotype, and each
individual will have two rows representing the two chromosomes. You can identify
the number of common haplotypes manually using VH1.
8. How many haplotypes do you have? (Scroll down and also look at the names of the
samples; hint: there are 2 haplotypes for each sample.) How many haplotype tagSNPs would
you genotype to resolve all common haplotypes?
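The two VH1 settings used in Part 5, the rare allele percentage filter and the count of distinct haplotypes, can be sketched in a few lines. This is an illustration with invented data, not the VH1 code: it assumes biallelic sites, haplotypes given as strings with one character per SNP, and a filter that keeps sites whose rare allele is at or above the cutoff.

```python
from collections import Counter


def filter_common_sites(haplotypes, min_rare_pct=5):
    """Return indexes of sites whose rare allele percentage is >= min_rare_pct."""
    n = len(haplotypes)
    keep = []
    for site in range(len(haplotypes[0])):
        alleles = Counter(h[site] for h in haplotypes)
        if len(alleles) < 2:
            continue  # monomorphic in this sample
        rare_pct = 100 * min(alleles.values()) / n
        if rare_pct >= min_rare_pct:
            keep.append(site)
    return keep


def count_haplotypes(haplotypes, sites):
    """Count distinct haplotypes restricted to the chosen sites."""
    return Counter("".join(h[s] for s in sites) for h in haplotypes)


# Invented data: 20 samples, so 40 chromosomes (two rows per sample).
haplotypes = ["ACGA"] * 29 + ["ATGA"] * 6 + ["GCTA"] * 4 + ["ACGC"] * 1
common_sites = filter_common_sites(haplotypes, min_rare_pct=5)
counts = count_haplotypes(haplotypes, common_sites)
print(common_sites)
print(counts.most_common())
```

The fourth site is dropped because its rare allele appears on only 1 of 40 chromosomes (2.5%), leaving three distinct common haplotypes. The haplotype counts always sum to twice the number of samples for an autosomal gene.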
Part 6. Where to Find Haplotypes in NIEHS SNPs
1. In addition to VH1, we offer PHASEv2.0 output for each of the genes re-sequenced
on the NIEHS SNPs website. Under ‘Gene Targets’ in the left-hand navigation
menu, click on ‘A-Z Finished Genes Directory.’
2. Choose ‘F’ and then FEN1.
3. PHASE output is found in the ‘Haplotyping Data’ section of the gene’s web page.
4. Let’s look at the Phase Output file. There are three SNPs listed under ‘Begin List
Summary.’ How many haplotypes? What bases are in the haplotypes for each SNP
site? What is the least frequent haplotype?
Part 7. Haplotypes in GeneSNPs
1. The GeneSNPs resource at the University of Utah
(http://www.genome.utah.edu/genesnps/) is linked off of all NIEHS SNPs gene data
pages. Navigate to GeneSNPs and click on ‘Open Query’ at the top of the page, enter
ADH1C in the Symbol box, then click on the search button.
2. When the entry for the ‘ADH1C’ gene appears, select the link in the ‘Gene Models’
column – ‘UCSC:hg16:4’
3. From the rainbow-colored ‘NIEHS’ drop-down menu select ‘Haplotypes.’ A list of
pre-computed PHASE haplotypes with different minor allele frequency cutoffs and
exonic SNPs are shown for each population.
4. Scroll down the haplotype sections. The EGP95 has been split into 5 populations:
EGP95_AA (African-American), EGP95_AS (Asians), EGP95_AY (Yorubans),
PDR95_EU (European-Americans - CEPH), and PDR95_HI (Hispanics). Go to the
PDR95_EU and find the row with PHASE output for haplotypes with a 0.10 cutoff
and nonsynonymous SNPs not included (nonsyn=NO), then click on ‘view’ in that
row.
5. A new window will appear with a visual representation of the haplotypes in this gene
for this population and for each polymorphic site (across the top). The haplotypes are
color-coded, red for the major allele and yellow for the minor allele. How many
different haplotypes are shown? The number of chromosomes carrying these
haplotypes can be found to the far left. How many haplotypes are found three or
more times in this population? Twenty-two CEPH samples are sequenced in the
PDR95. So what should the haplotype count total to? Does it?
Part 8. Batch GVS
Using GVS there are also ways to batch large jobs. These are the jobs where you wouldn’t
want to enter each query independently. For example, you can provide a list of genes, e.g.
an entire pathway of genes, that you are interested in picking tagSNPs for.
1. Go to GVS Batch http://gvsbatch.gs.washington.edu/GVSBatch/ or go to the GVS
front page, where there is a link to GVS Batch on the left-hand side. Just click and you are
there.
2. Under the ‘Input List File’ – There is a link to the information page for GVS Batch.
GVS batch can do all of the queries that interactive GVS can do. Click on the link to
the information page and read the introduction. You will then move onto the
parameters that can be requested.
3. If you move down the page you will see examples that you can download. Go to
example #6 and download this onto your desktop. Then go back to the front page of
GVS Batch. Browse to find example6.txt and upload it. Enter your e-mail address in
the box and then click submit. The server will send you an e-mail when the job is
finished, and you can then download your file.
4. Open the file you downloaded and examine it. The first part of the file is just a repeat of
the query. After that is the tagSNP output for the ABO gene. How many tagSNPs
(bins) are required to type ABO? How many for VKORC1?
5. You can try lots of other simple modifications of this one file, like changing the
population from 596, which is our PGA-CEPH samples, to population 1409, the
HapMap CEU samples.
Here are some helpful population identifiers:
 population_id |    class    |     handle      | local_population_id
---------------+-------------+-----------------+---------------------
          1409 | CSHL-HAPMAP | HapMap-CEU      | EUROPE
          1410 | CSHL-HAPMAP | HapMap-HCB      | EAST ASIA
          1411 | CSHL-HAPMAP | HapMap-JPT      | EAST ASIA
          1412 | CSHL-HAPMAP | HapMap-YRI      | WEST AFRICA
          1471 | EGP_SNPS    | EGP_YORUB-PANEL |
          1472 | EGP_SNPS    | EGP_HISP-PANEL  |
          1473 | EGP_SNPS    | EGP_CEPH-PANEL  |
          1474 | EGP_SNPS    | EGP_AD-PANEL    |
          1475 | EGP_SNPS    | EGP_ASIAN-PANEL |
You can find the list of all populations at
http://gvsbatch.gs.washington.edu/GVSBatch/populations127.txt
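When editing batch files by hand it can help to keep the population identifiers in a small lookup table. The sketch below parses the pipe-delimited rows listed above into a dictionary keyed by population_id. The function name and dictionary layout are illustrative choices, and nothing here reads or writes the GVS Batch input format itself.

```python
POPULATION_ROWS = """\
1409 | CSHL-HAPMAP | HapMap-CEU | EUROPE
1410 | CSHL-HAPMAP | HapMap-HCB | EAST ASIA
1411 | CSHL-HAPMAP | HapMap-JPT | EAST ASIA
1412 | CSHL-HAPMAP | HapMap-YRI | WEST AFRICA
1471 | EGP_SNPS | EGP_YORUB-PANEL |
1472 | EGP_SNPS | EGP_HISP-PANEL |
1473 | EGP_SNPS | EGP_CEPH-PANEL |
1474 | EGP_SNPS | EGP_AD-PANEL |
1475 | EGP_SNPS | EGP_ASIAN-PANEL |
"""


def parse_populations(text):
    """Parse 'id | class | handle | local_population_id' rows into a dict."""
    table = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip headers, separators, and blank lines
        table[int(fields[0])] = {
            "class": fields[1],
            "handle": fields[2],
            # The EGP rows list no local_population_id; keep it empty.
            "local_population_id": fields[3] if len(fields) > 3 else "",
        }
    return table


populations = parse_populations(POPULATION_ROWS)
print(populations[1409]["handle"])
print(populations[1473]["handle"])
```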
Haploview
Part 9. Using HapMap Data in Haploview
1. Download and install Haploview from
http://www.broad.mit.edu/mpg/haploview/download.php
Install Haploview for your computer’s operating system as directed on
the download page. You must also have Java installed on your computer. There is a
link to a site to download Java as well. If Haploview does not open after installation,
you most likely do not have Java installed on your computer. Install Java and try
again. Once Haploview is installed, open it and you will be viewing two windows, one
labeled Haploview 4.0 and one that says Welcome to Haploview. The Welcome to
Haploview window is waiting for you to enter data to work with in Haploview. Click
the down arrow on the left-hand side and select HapMap Download. You need to be
connected to the internet. Leave the release at 21 (that is the current release for HapMap data),
select chromosome 17 and in the start enter 23105 and end 23154. Click “OK.”
2. The first view of the data is the “Check Markers” window. This provides a nice
summary of the marker data, including the name of the markers, genomic position of
the markers, observed heterozygosity, predicted heterozygosity, Hardy Weinberg, %
of samples successfully genotyped, the number of fully genotyped family trios for
each marker, the number of Mendelian inheritance errors, minor allele frequency, and
pass/fail quality control for each marker.
3. Haploview offers a graphical view of the LD statistic. Click on the “LD Plot” tab. To
view the entire image at once go to the Display at the top of the window and go to
LDZoom and choose unzoomed. You can change the haplotype block definitions by
going to “Analysis” and selecting the block definition. The default is the block
definition by Gabriel et al. in Science (2002). To change the block definition, click on
“Analysis” then define blocks and choose the “four-gamete rule.”
4. How many blocks are in NOS2A for the CEU population using the four gamete rule?
(You can also select Haplotypes to get the answer as well)
5. If you didn’t go to Haplotypes yet go there now. How many haplotypes were
identified in Block 2? How many haplotype tagging SNPs were identified? (Go to
display options and select ‘Show tags in blocks’. TagSNPs are indicated by triangles
above the genotype.)
6. For the minimal set of tagSNPs, go to the “Tagger” tab. You can choose the
algorithm used to define the tagSNPs. For this example, choose “pairwise tagging
only” (This option is just an implementation of LD-Select which is noted in the
Tagger documentation). Then click “Run Tagger.” The results are displayed so that
the tagSNPs are shown in one window, the SNPs being tagged in the other window.
Using “Tagger” and “pair-wise tagging only,” how many Haploview tagSNPs are in
NOS2A for the CEU HapMap data? Does this change if you use the aggressive
tagging approach (with combinations of 2 or 2 and 3 markers)?
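The four-gamete rule used in step 3 can be illustrated with a short sketch. For a pair of biallelic SNPs there are four possible two-SNP haplotypes (gametes); seeing all four is taken as evidence of historical recombination between the sites, so a block is extended only while no pair inside it shows all four. The code below is a simplified greedy version on made-up 0/1 haplotypes. The 1% minimum gamete frequency follows the usual description of Haploview's rule and should be treated as an assumption, and unlike Haploview this sketch also reports single-site groups.

```python
from collections import Counter


def four_gametes_observed(haplotypes, i, j, min_freq=0.01):
    """True if all four two-site gametes occur at frequency >= min_freq."""
    counts = Counter((h[i], h[j]) for h in haplotypes)
    n = len(haplotypes)
    common = [gamete for gamete, c in counts.items() if c / n >= min_freq]
    return len(common) == 4


def four_gamete_blocks(haplotypes, n_sites, min_freq=0.01):
    """Scan left to right, closing a block when a new site shows all four
    gametes with any site already in the block."""
    blocks = []
    start = 0
    for site in range(1, n_sites + 1):
        at_end = site == n_sites
        if at_end or any(
            four_gametes_observed(haplotypes, k, site, min_freq)
            for k in range(start, site)
        ):
            blocks.append(list(range(start, site)))
            start = site
    return blocks


# Made-up phased haplotypes over 4 sites.
haplotypes = [
    (0, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 1, 1, 1),
    (1, 1, 1, 1),
    (0, 0, 1, 1),
    (0, 1, 0, 0),
]
blocks = four_gamete_blocks(haplotypes, n_sites=4)
print(blocks)
```

Sites 0 and 1 show only three gametes and stay together; site 2 shows all four gametes with site 0, so a new block starts there.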
Part 10. Using EGP Data in Haploview
1. Download the EGP data for NOS2A for this exercise from GVS. Go to the main
GVS page and select gene name. Enter NOS2A; don’t bother with upstream or
downstream, since the gene region is enough as is. In the Select Data Set(s), deselect
EGP-YORUB-PANEL and select the EGP-CEPH-PANEL. Go to Set-up parameters and select
Custom-Text in the Display SNPs By and then click on display genotypes. You can
now select the format you want. Since you need two files for Haploview, the
Haploview genotypes and the Haploview marker info, the simplest thing to do is select
Download a tarball with all formats. Submit and download your data. You can also
go to http://egp.gs.washington.edu/workshop/download/ and download
GVS.haploviewGenotypes.NOS2A.txt and GVS.haploviewMarkers.NOS2A.txt
2. Open Haploview. If Haploview is already open from the previous exercise, under
“File,” choose “Open new data.” You want to stay in the Linkage Format this time
and browse to load the files in your downloaded folder from GVS. For Data File,
upload the Genotypes file, and for Locus Information File, upload the Markers file. Click
“OK”.
3. Repeat steps 3 and 4 from “Using HapMap Data in Haploview.” Note the difference
between a complete set of common variation data (EGP) and common set of sampled
variation data (HapMap). There are significant differences.
4. How many blocks are in NOS2A for the CEU population using the four gamete rule?
(You can also select Haplotypes to get the answer as well)
5. How many haplotypes were identified in Block 2? How many haplotype tagging
SNPs were identified? (Go to display options and select ‘Show tags in blocks’.
TagSNPs are indicated by triangles above the genotype.)
6. How many tagSNPs are identified using pair-wise tagging only in “Tagger” using the
EGP data? Does the 2-marker, or the 2- and 3-marker, aggressive tagging save you many
tagSNPs? Also try the 2-marker, or the 2- and 3-marker, option more than once: does it
give you the same answer as before? With incomplete data sets you can get slightly different
answers. Think about the influence of aggressive tagging on developing marker sets.
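If you ever need to build the two Haploview input files yourself rather than downloading them from GVS, the following sketch shows the general shape of the Linkage format: a pedigree (genotype) file with six leading columns followed by two allele columns per marker, and a marker info file with one marker name and position per line. The sample names, markers, and positions are hypothetical, unrelated individuals are assumed (each sample is its own family with unknown parents), and the files GVS produces may differ in detail.

```python
def marker_info_lines(markers):
    """One 'name<TAB>position' line per marker."""
    return [f"{name}\t{position}" for name, position in markers]


def pedigree_lines(samples, markers):
    """Build Linkage-format pedigree lines.

    samples: list of (sample_id, sex, calls) with sex 1 = male, 2 = female,
             and calls = {marker_name: two-letter genotype such as "CT"}.
    Missing genotypes are written as 0 0.
    """
    lines = []
    for sample_id, sex, calls in samples:
        # family ID, individual ID, father ID, mother ID, sex, affection status
        fields = [sample_id, sample_id, "0", "0", str(sex), "0"]
        for name, _position in markers:
            call = calls.get(name)
            fields.extend(call if call else "00")  # adds one column per allele
        lines.append("\t".join(fields))
    return lines


# Hypothetical markers and samples.
markers = [("snp1", 1000), ("snp2", 2500)]
samples = [
    ("S1", 1, {"snp1": "CC", "snp2": "CT"}),
    ("S2", 2, {"snp1": "CT"}),  # snp2 not genotyped
]
ped = pedigree_lines(samples, markers)
info = marker_info_lines(markers)
print("\n".join(ped))
print("\n".join(info))
```

Writing each list to a text file, one entry per line, gives a genotype file and a marker file of the kind loaded in step 2.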
Answers to Questions
Part 1.
5. SNPs in LD with rs1219648. Ans: rs2981575, rs3135718, rs11200014, rs11379664,
rs41302265
9. The Yoruban samples had more sites and a few more small LD blocks but not much
more LD over the gene.
10. 14 SNPs are in LD with rs1219648: rs11379664, rs10736303, rs11200014,
rs41302265, rs2912780, rs2981579, rs1078806, rs2981578, rs2981575, rs2912774,
rs2936870, rs2860197, rs2981582, rs3135718. These are all intronic.
Part 2.
2. 48 SNPs across 22 samples
5. 23 tagSNPs, Bin1, 6 SNPs
Yes, leads to an amino acid substitution in the coding region – a nonsynonymous
SNP – rs13006529
6. 11 bins are singletons, 12 bins have multiple SNPs
7. 10 tagSNPs
8. Bin 8 – rs1035140 has a 50% minor allele frequency, Bin 9 and 10 are common SNPs
as well and would also need to be typed.
Part 3.
2. 70 SNPs, 49 Samples
3. 39 tagSNPs, 12 tags SNPs capture genetic information in both populations, 27
tagSNPs capture genetic information in only one population.
Part 4.
5. 15 bins or tagSNPs: eight bins with >1 SNP; seven “bins” with only one SNP.
6. African-descent (23 tagSNPs). Asian-descent requires 7 tagSNPs. This gene does
follow expectations for the populations being examined: African – the most tags (least
LD), European – fewer tagSNPs compared to Africans (more LD), and Asian – fewer
tagSNPs than Europeans (and more LD). Also note that there is no frequency cut-off
here so tags for low frequency SNPs are also included. To filter use GVS!
Part 5.
8. How many haplotypes? 3
How many haplotype tagSNPs would you genotype to resolve all common
haplotypes? 2 – the SNP at 1175 captures one haplotype and either 995 or 5213
SNP captures the other haplotype.
Part 6.
4. 3, GCG, GTG, ACT, Haplotype 2 or GTG.
Part 7.
5. 10, 5, 44, Yes
Part 8.
4. 15 for ABO, 4 for VKORC1
Part 9.
4. 8 Blocks
5. 5 Haplotypes in Block 2; 4 tagSNPs for Block 2
6. Pairwise – 20 tagSNPs capture 42 SNPs, or aggressive tagging using 2-marker – 14 to
16 tagSNPs capture 42 SNPs, or 2- and 3-marker – 14 to 17 tagSNPs capture 42 SNPs.
Part 10.
4. 22 blocks
5. 3 haplotypes in Block 2, 2 tagSNPs for Block 2
6. Pairwise - 51 tagSNPs; other options give you 50 tagSNPs, not a significant decrease.