CSE280A Class Projects
1 Complex regions architecture
The KIR region has a very complex architecture, as the diversity of gene regions helps in the immediate
immune response.
1.1 Steps:
1. Visit the IPD-KIR database (http://www.ebi.ac.uk/ipd/kir/) and read the introduction of the attached
manuscript to get an overview of the region.
2. Use http://www.ebi.ac.uk/ipd/kir/sequenced_haplotypes.html to download the complete known sequences of KIR in individuals. Call this set S.
3. Download candidate amino-acid sequences for all of the genes that may be found in this region. Call
this set G.
4. Use BLAST or a similar tool to decide the presence or absence of each gene in G in each of the sequences
in S. This may be difficult because the genes are repetitive, so you must use appropriate algorithms to
make the correct decision.
5. Write a script to compute and display a dot-plot for any pair of sequences in S. See Figure 1(b) of
the attached manuscript for an example. Along with the dot-plot, you must display gene locations as
vertical and horizontal projections of intervals. Provide an optional masking feature to mask out repetitive
sequence.
6. Use a short-read fragment simulator to generate paired-end reads from the candidate sequence. Design
and implement an algorithm that uses the reads from some s ∈ S to decide presence and absence of
each gene g ∈ G. Note that this may be easy or not depending upon the repetitive nature of the genes.
Show results that compare your tool to the known structure of the genome.
7. Repeat the previous experiment for pairs of sequences from S.
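The dot-plot in step 5 can be sketched as follows. This is a minimal word-match dot plot, not necessarily the method used in the attached manuscript: a dot is placed at (x, y) whenever the k-mer starting at position x of one sequence equals the k-mer starting at position y of the other. The function names, the word size k, and the max_occurrences threshold (a crude stand-in for the optional repeat masking) are illustrative assumptions. Reverse-complement matches and the gene-interval projections are left out.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot_matches(seq_x, seq_y, k=4, max_occurrences=None):
    """Return sorted (x, y) pairs whose k-mers in seq_x and seq_y are identical.

    If max_occurrences is set, any k-mer that occurs more often than that in
    either sequence is skipped (hypothetical, very simple repeat mask).
    """
    index_x = kmer_index(seq_x, k)
    index_y = kmer_index(seq_y, k)
    matches = []
    for kmer, xs in index_x.items():
        ys = index_y.get(kmer)
        if not ys:
            continue
        if max_occurrences is not None and (
            len(xs) > max_occurrences or len(ys) > max_occurrences
        ):
            continue
        matches.extend((x, y) for x in xs for y in ys)
    return sorted(matches)


def render_ascii(matches, len_x, len_y, k):
    """Draw the dot plot as text: columns index seq_x, rows index seq_y."""
    rows = [["."] * (len_x - k + 1) for _ in range(len_y - k + 1)]
    for x, y in matches:
        rows[y][x] = "*"
    return "\n".join("".join(row) for row in rows)


if __name__ == "__main__":
    a, b = "ACGTACGTTT", "ACGTTT"
    print(render_ascii(dot_plot_matches(a, b, k=4), len(a), len(b), 4))
```

For sequences the size of a KIR haplotype, the match list would normally be passed to a plotting library rather than rendered as text, and k would be much larger than 4.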
2 Multi-allelic and polygenic signatures of selection
The goal of this project is to understand selection signatures in multi-allelic (soft-sweep) and polygenic
selection. Start by building a forward simulator that can simulate these kinds of selections.
1. Build a standard forward simulator for a haploid population as follows: assume a Wright-Fisher model
with N haplotypes from generation to generation. Haplotypes containing a beneficial allele are selected
with probability ∝ 1 + s, whereas other haplotypes are selected with probability ∝ 1. Each individual
is mutated at m sites relative to its parent, where m is drawn from a Poisson distribution with parameter µ.
Assume that there is no recombination.
2. In the beginning, start with all haplotypes being all 0, and run the simulator without selection for
about 2N generations so that we get a well-mixed haploid population. Now (under selective constraints)
choose a low-copy-number allele as beneficial with selective pressure. This should result in a hard sweep.
Show results on the following after running a very large number of simulations (100,000):
(a) Time to fixation of the haplotype containing the beneficial allele as a function of N and s. Try plotting
the time to fixation, in generations, as a function of (1/s) ln(Ns).
(b) Generate plots of scaled-site-frequency spectrum, and distribution of haplotype frequencies of
common haplotypes in the region as a function of time in generations.
3. Next, generate a soft sweep as follows. Use the same model as above, but introduce the selective constraint
when the beneficial allele already exists in multiple haplotypes. Now, each of these haplotypes will be
selected with probability ∝ 1 + s.
4. Compute scaled-site-frequency spectra (SSFS) as well as distribution of haplotype frequencies, and
contrast your results with the hard-sweep case.
5. To mimic polygenic selection, simulate two regions simultaneously. Assume that the regions are unlinked
(very far apart, or on separate chromosomes), so that an individual chooses a parent independently
in each region. Choose an allele in each of the two regions as being beneficial. Simulate as
before, except that under selection, individuals containing one or more beneficial alleles are as fit as
individuals containing two beneficial alleles. Thus, there is no guarantee that the beneficial allele is
driven to fixation. Compute the impact of this on SSFS, etc. in each region.
6. Design and implement new statistics for identifying signatures of polygenic and/or multi-allelic selection.
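Steps 1 and 2 above can be sketched as follows. This is a minimal, unoptimized illustration under several assumptions that the handout does not state: each mutation hits a brand-new site (infinite-sites), a haplotype is stored as the frozenset of its mutated site ids, the "low-copy-number" allele is taken to be the segregating site with the smallest count after burn-in, and the scaled SFS uses the convention i·ξ_i, where ξ_i is the number of sites carried by exactly i haplotypes. All function names and the max_generations cap are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def next_generation(population, rng, mu, s=0.0, beneficial=None, next_site=0):
    """One Wright-Fisher step: weighted parent sampling, then new mutations."""
    weights = [
        1.0 + s if beneficial is not None and beneficial in hap else 1.0
        for hap in population
    ]
    parents = rng.choices(population, weights=weights, k=len(population))
    children = []
    for parent in parents:
        m = poisson(rng, mu)  # number of new mutations in this child
        children.append(parent | frozenset(range(next_site, next_site + m)))
        next_site += m  # assumption: every mutation creates a new site
    return children, next_site


def allele_counts(population):
    """Number of haplotypes carrying each mutated site."""
    counts = {}
    for hap in population:
        for site in hap:
            counts[site] = counts.get(site, 0) + 1
    return counts


def scaled_sfs(population):
    """Scaled SFS, assuming the convention i * xi_i for i = 1..n-1."""
    n = len(population)
    spectrum = [0] * (n - 1)
    for c in allele_counts(population).values():
        if c < n:  # ignore fixed sites
            spectrum[c - 1] += 1
    return [(i + 1) * x for i, x in enumerate(spectrum)]


def simulate_sweep(n, mu, s, seed=0, max_generations=10_000):
    """Return generations until the beneficial allele fixes, or None if it is lost."""
    rng = random.Random(seed)
    population = [frozenset()] * n  # all haplotypes start as all 0
    next_site = 0
    for _ in range(2 * n):  # neutral burn-in of about 2N generations
        population, next_site = next_generation(population, rng, mu, next_site=next_site)
    counts = allele_counts(population)
    segregating = {site: c for site, c in counts.items() if c < n}
    if not segregating:
        return None
    beneficial = min(segregating, key=lambda site: (segregating[site], site))
    for generation in range(1, max_generations + 1):
        population, next_site = next_generation(
            population, rng, mu, s, beneficial, next_site
        )
        carriers = sum(beneficial in hap for hap in population)
        if carriers == 0:
            return None
        if carriers == n:
            return generation
    return None
```

Calling simulate_sweep over many seeds and keeping the non-None results gives an empirical fixation-time distribution that can be compared against (1/s) ln(Ns). For the 100,000 replicates requested above, an array-based implementation would be much faster than this set-based one.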