* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Genomewide Motif Recognition with a Dictionary Model
Survey
Document related concepts
Therapeutic gene modulation wikipedia , lookup
Pathogenomics wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Sequence alignment wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Metagenomics wikipedia , lookup
Non-coding DNA wikipedia , lookup
Genome evolution wikipedia , lookup
Genomic library wikipedia , lookup
Smith–Waterman algorithm wikipedia , lookup
Human genome wikipedia , lookup
Human Genome Project wikipedia , lookup
Transcript
The University of Chicago Department of Statistics Seminar Chiara Sabatti Department of Human Genetics, University of California at Los Angeles “Genomewide Motif Recognition with a Dictionary Model” ************************* Monday, April 8, 2002 at 4:00 PM 133 Eckhart Hall, 5734 S. University Avenue ABSTRACT Authors: AUTHORS: Chiara Sabatti and Kenneth Lange Bussemaker et al. (2000, PNAS) proposed the simple idea of modeling DNA non coding sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. Moving from the same premises, we consider words that can be spelled in a variety of forms (hence accounting for varying degrees of conservation of the same motif across genome locations). The overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an E-M gradient algorithm. Once these parameters are estimated, it is possible to evaluate the probability with which each motif occurs at a given location in the sequence. These conditional probabilities can be used to monitor properties of genome sequences, such as neighboring occurrences of given motifs. 4/1/02