* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DNA Analysis
Genetic code wikipedia , lookup
Comparative genomic hybridization wikipedia , lookup
Maurice Wilkins wikipedia , lookup
DNA sequencing wikipedia , lookup
DNA barcoding wikipedia , lookup
Genome evolution wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Gel electrophoresis of nucleic acids wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Molecular cloning wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
DNA supercoil wikipedia , lookup
Community fingerprinting wikipedia , lookup
Point mutation wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Molecular evolution wikipedia , lookup
Non-coding DNA wikipedia , lookup
DNA Analysis Amir Golnabi ENGS 112 Spring 2008 1 Outline: 1. Markov Chain 2. DNA and Modeling 3. Markovian Models for DNA Sequences 4. Hidden Markov Models (HMM) 5. HMM for DNA Sequences 6. Future Works 7. References 2 1.Markov Chain : • Alphabet: S s1 , s2 ,..., sJ • s j are called states, and S is the state space • Notation > S 1,2,..., J • Sequence of random variables: X 0 , X 1 ,..., X n ,... • A sequence of random variables X n n 0 is called a Markov Chain, (MC), if for all n>=1 and P X n jn X 0 j0 ,..., X n1 jn1 P X n jn X n1 jn1 • The conditional probability of a future event X n jn depends only upon the immediate past event X n1 jn1 3 1.Markov Chain (cont.) • Conditional Probability: Pi j P X n j X n1 i , n 1, i , j S P Pi j • Transition Matrix • Property: Pi j 0 , J P j 1 i j J ,J i 1 , j 1 1 • Higher-Order Markov Chains: – Second order MC: Pi ,k j P X n j X n1 i , X n 2 k , n 1, i , j , k S 4 2.DNA and Modeling: • Bases: {A,T,C,G} • Complementary strands > sequence of bases in a single strand • Sequences are always read from 5’ to 3’ end. • DNA mRNA proteins (transcription and translation) • Codons: Triples of bases which code for amino acids • 61 + 3 ‘stop’ codons • Specific sequence of codons gene Chromosomes genome • exons: coding portion of genes • introns: non-coding regions • Goal: To determine the nucleotide sequence of entire genomes 5 3.Markov Chains for DNA Sequences • Nucleotides are chained linearly one by one local dependence between the bases and their neighbors • Markov chains offer computationally effective ways of expressing the various frequencies and local dependencies • Alphabet of bases = {A,T,C,G} not uniformly distributed in any sequence and the composition vary within and between sequences • The probability of finding a particular base at one position can depend not only on the immediate adjacent bases, but also on several more distant bases upstream or downstream higher order Markov model, (heterogeneous) • Gene finding: Markov models of coding and non-coding regions to classify segments as either exons or introns. • Segmentation for decomposing DNA sequences into homogeneous regions Hidden Markov Models 6 4.Hidden Markov Models (HMM) • Stochastic process generated by two interrelated probabilistic mechanisms • Underlying Markov chain with a finite number of states and a set of random functions, each associated with its respective state • Changing the states: according to transition matrix • Only the output of the random functions can be seen • Advantage: HMM allow for local characteristics of molecular sequences to be modeled and predicted within a rigorous statistical framework, and also allow the knowledge from prior investigations to be incorporated into analysis. 7 5.HMM for DNA Sequences • Every nucleotide in a DNA belongs to either a “Normal” region (N), or a GC-rich region (R). • No random distribution: Larger regions of (N) sequence • Example of such a sequence: NNNNNNNNNRRRRRNNNNNNNNNNNNNNNNNRRRRRRRNNNN • States of HMM: {N,R} • Possible DNA sequence with this underlying collection: TTACTTGACGCCAGAAATCTATATTTGGTAACCCGACGCTAA • No typical random collection of nucleotides: GC in R regions: 83% vs. 23% in N regions • HMM: Identify these types of feature in sequences • Ability to capture both the patchiness of N and R and different compositional frequencies within the categories 8 6.Future work… • Better and deeper understanding of HMM • Different applications of HMM, such as, Segmentation of DNA Sequence and Gene Finding • Build an automata for a simple case 7.References • Koski, Timo. Hidden Markov Models for Bioinformatics. Sweden: Kluwer Academic , 2001. • Birney, E.. "Hidden Markov models in biological sequence analysis". July 2001: • Haussler, David. David Kulp, Martin Reese Frank Eeckman "A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA". • Boufounos, Petros, Sameh El-Difrawy, Dan Ehrlich. "HIDDEN MARKOV MODELS FOR DNA SEQUENCING". 9