Algorithms in Computational Molecular Biology
Exercise 1

Due: November 15th.

Credit: This exercise contains 4 items and constitutes about 6/75 of the exercise grades. Solve 3 items for full credit, or 4 for extra credit.

1. A restriction enzyme that cleaves at every occurrence of the sequence GATC is applied to a double-stranded DNA molecule of length 2 kb until digestion is complete (every occurrence is cut). Assume the nucleotides are random with uniform probability (0.25 for each nucleotide). Consider the random variable X, the number of cleavage sites. Assuming that X is approximately Poisson distributed, what is its expectation?

2. In prokaryotes, one often finds an operon, i.e. an mRNA molecule that contains two or more, possibly overlapping, genes. These genes may be in different reading frames (out of six). Assume that no two genes in the same reading frame overlap. Design an algorithm that, given an input mRNA molecule, prints all possible genes in the sequence, subject to the following constraints:
   i. The input is a stream, and the algorithm must read each input nucleotide only once.
   ii. The output is a stream of (start, stop) index pairs giving the gene locations.
   iii. You may use only constant-size memory.

3. Assume the existence of a linear-time algorithm that determines whether a sequence S is a subsequence of T. Give a linear-time algorithm to check whether a sequence S is a circular shift of another sequence T. For example, ciseexer is a circular shift of exercise. What happens if one wants to test whether S is a subsequence of a circular shift of T?

4. Browse http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/euk_g.html and navigate to a human chromosome of your choice (which?). Click on an arbitrary record (which?) under the “ev” column. The file contains some comments about the chosen gene. What evidence do you find for the classification of humans as eukaryotes? (Note: you may have to check more than one record.)

Good luck!
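As a way to check one's reasoning on items 1 and 3, the following is a minimal Python sketch, not the intended solution. It rests on several assumptions that are not in the exercise text. The function names `expected_sites` and `is_circular_shift` are hypothetical. Item 1 is modelled as a linear molecule whose cut sites are counted on one strand, which seems reasonable because GATC is its own reverse complement. "Subsequence" in item 3 is read as a contiguous substring. Python's `in` operator stands in for the linear-time matcher the exercise assumes.

```python
def expected_sites(length: int, site: str, p: float = 0.25) -> float:
    """Expected number of occurrences of `site` in a random linear sequence.

    By linearity of expectation: (number of start positions) * p^len(site).
    Assumes independent, uniformly distributed nucleotides.
    """
    k = len(site)
    positions = max(length - k + 1, 0)  # possible start positions of the site
    return positions * p ** k


def is_circular_shift(s: str, t: str) -> bool:
    """Check whether s is a circular shift (rotation) of t.

    Every rotation of t appears as a length-len(t) window of t + t,
    so one substring query on the doubled string suffices.
    The `in` operator is a stand-in for the assumed linear-time matcher.
    """
    return len(s) == len(t) and s in t + t


if __name__ == "__main__":
    print(expected_sites(2000, "GATC"))               # 1997 / 256
    print(is_circular_shift("ciseexer", "exercise"))  # True
```

Under these assumptions the expectation for a 2 kb molecule comes out to 1997/256, roughly 7.8 sites. The doubling of T in `is_circular_shift` is a design choice that reduces the rotation test to a single call of the assumed matcher on an input of length 2|T|, which keeps the overall cost linear.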