Algorithms in Computational Molecular Biology
Exercise 1
Due: November 15th.
Credit: This exercise contains 4 items, and constitutes about 6/75 of the exercise grades. Solve 3
items for full credit, or 4 for extra credit.
1. A restriction enzyme, which cleaves at every occurrence of the sequence GATC, is applied to a
double-stranded DNA molecule of length 2 kb for complete digestion (every occurrence will be
cut). Assume the nucleotides are random with uniform probability (0.25 for each nucleotide).
Consider the random variable X, which is the number of cleavage sites. Assuming that X is
approximately Poisson distributed, what is its expectation?
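One way to sanity-check the setup of item 1 is to count candidate positions and multiply by the per-position match probability. The sketch below assumes that 2 kb means exactly 2000 base pairs and that a site may start at any position where the full 4-letter pattern fits; the function names are illustrative and not part of the exercise. It also checks that GATC equals its own reverse complement, so reading the second strand adds no new sites.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def expected_sites(length: int, site: str = "GATC", p: float = 0.25) -> float:
    """Expected number of occurrences of `site` in a uniform random sequence.

    Assumption: every start position where the whole site fits is counted,
    and each nucleotide independently matches with probability p.
    """
    positions = length - len(site) + 1      # possible start positions
    return positions * p ** len(site)       # linearity of expectation


# GATC is palindromic, so both strands share the same cleavage sites.
is_palindromic = reverse_complement("GATC") == "GATC"

# Poisson parameter (the expectation of X) under the stated assumptions.
lam = expected_sites(2000)
```

Under these assumptions `lam` equals 1997/256, roughly 7.8; ignoring edge effects and using 2000/256 gives nearly the same value.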
2. In prokaryotes, one often finds an operon, i.e. an mRNA molecule that contains two or more
possibly overlapping genes. These genes may be in different reading frames (out of six).
Assume no two genes in the same reading frame overlap. Design an algorithm which, given an
input mRNA molecule, prints all possible genes in this sequence, subject to the following
constraints:
i. The input is a stream, and the algorithm must read each input nucleotide only
once.
ii. The output is a stream of (start,stop) pairs of indices giving the gene locations.
iii. You may use only a constant amount of memory.
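The constraints in item 2 can be illustrated with a one-pass scanner. What follows is a minimal sketch rather than a full solution: it covers only the three forward reading frames, treats a "gene" as the stretch from the first AUG in a frame to the next in-frame stop codon, uses 0-based indices with an inclusive stop, and counts the position counter as constant memory. The names `stream_orfs`, `START` and `STOPS` are assumptions made for the example. The three reverse frames could be handled symmetrically by matching reverse-complemented codons, which is not shown here.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def stream_orfs(nucleotides: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs for forward-frame open reading frames.

    State kept: the last three nucleotides and one pending start index per
    frame, so memory does not grow with the input length.
    """
    window = ""                                     # last (up to) 3 nucleotides
    pending: List[Optional[int]] = [None, None, None]
    for i, nt in enumerate(nucleotides):            # each nucleotide read once
        nt = nt.upper().replace("T", "U")           # accept DNA letters too
        window = (window + nt)[-3:]
        if len(window) < 3:
            continue
        codon_start = i - 2
        frame = codon_start % 3
        if window == START and pending[frame] is None:
            pending[frame] = codon_start            # first start in this frame
        elif window in STOPS and pending[frame] is not None:
            yield (pending[frame], i)               # i = last base of the stop
            pending[frame] = None                   # frame is free again


# Usage: two genes in frame 1 of a short made-up sequence.
genes = list(stream_orfs("CAUGCCCUGAAUGUAA"))
```

Keeping one pending start per frame relies on the stated assumption that genes in the same frame do not overlap.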
3. Assume the existence of a linear time algorithm that finds whether a sequence S is a
subsequence of T. Give a linear time algorithm to check whether a sequence S is a circular
shift of another sequence T. For example, ciseexer is a circular shift of exercise. What
happens if one wants to test whether S is a subsequence of a circular shift of T?
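A common approach to the first part of item 3 is the doubling trick: S is a circular shift of T exactly when the two have equal length and S occurs contiguously in T concatenated with itself. The sketch below assumes that "subsequence" in the exercise means a contiguous substring, and it uses Python's `in` operator only as a stand-in for the assumed linear-time routine; the function names are illustrative. The second function shows one possible reading of the follow-up question under the same assumption.

```python
def is_circular_shift(s: str, t: str) -> bool:
    """True if s is a rotation of t (checked via the doubling trick)."""
    # `in` stands in for the linear-time substring test assumed in the exercise.
    return len(s) == len(t) and s in t + t


def is_substring_of_some_shift(s: str, t: str) -> bool:
    """True if s occurs contiguously in at least one rotation of t."""
    # Any window of t + t no longer than t lies inside some rotation of t.
    return len(s) <= len(t) and s in t + t


# Usage with the example from the exercise text.
shifted = is_circular_shift("ciseexer", "exercise")
```

Both checks run on a string of length 2|T|, so they stay linear as long as the underlying substring test is linear.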
4. Browse http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/euk_g.html
and get to some human chromosome of your choice (which?). Click on an arbitrary record
(which?) under the “ev” column. The file contains some comments about the chosen gene.
What evidence do you find for the classification of humans as eukaryotes? (Note: you may
have to check more than one record).
Good Luck !!