Why CpG islands?
CSE, Marmara University — mimoza.marmara.edu.tr/~m.sakalli/cse546
Including some slides of Papoulis. These notes will be further modified. Dec/15/09.
Notes on probability are from A. Papoulis and S. U. Pillai.

RNA interference, and DNA methylation (R-COOH → R-COOCH3)
• Methylation is involved in the regulation of gene expression, protein functioning, and RNA metabolism.
• A cell is a combination of numerous proteins, each determining how the cell functions. Disproportionately expressed proteins will have devastating effects.
• Two possible vulnerabilities:
– One is at the transcriptional level: while DNA is converted to mRNA, a fraction of antisense oligonucleotide binds to the unprocessed gene in the DNA, creating a three-strand complex and as a result blocking the transcription process.
– The second vulnerability is at the level of translation. Translation is a ribosome-guided process for manufacturing a protein from mRNA. There, once antisense RNA hybridizes to the mRNA, protein generation is inhibited, since the editing enzymes splicing introns from RNAs are blocked. RNase H recognizes the double-helix complex of the antisense strand bound on mRNA, frees the antisense strand, and cleaves the mRNA.
• Antisense therapy: HIV, influenza, and cancer treatment, where replication and transcription are targeted.
• RNA interference (RNAi) is a system controlling (either increasing or decreasing) the activity of RNAs. MicroRNA (miRNA) and small interfering RNA (siRNA) are direct products of genes and can bind to other specific RNAs. They play roles in defending cells against parasitic genes – viruses and transposons – but also in gene expression in general. It is universal.
• The methylation process differs in prokaryotic and eukaryotic cells: in the former it occurs at the C5 carbon of the cytosine pyrimidine ring and at the N6 nitrogen of the adenine purine ring, while in the latter it occurs at the C5 carbon of cytosine sites.
• In mammals, methylation occurs at the C5 of the CpG dinucleotide. CpG is about 1% of the human genome, and most CpGs are methylated. Unmethylated CpG islands are present around regulatory genes, including promoter regions; methylation there impedes transcription and protein modeling (chromatin and histone).
• One abnormality caused by incomplete methylation, for example, is Rett syndrome – an epigenetic abnormality. Methylated histones hold DNA tightly and block transcription.
• The occurrence of CpG sequences is the least frequent in many genomes – rarer than would be expected from the independent probabilities of C and G. This is said to be (!!) because the C in CpG has a tendency to methylate and become methyl-C, and the methylation process is suppressed in areas around genes; hence these areas have a relatively higher concentration of CpG, in islands.
• Epigenetic importance: methyl-C has a high chance of mutating to T, and is therefore important in epigenetic inheritance, as well as in controlling gene expression and regulation.
• Questions: How close is a short sequence to being a CpG island? What is the likelihood of a long sequence containing one or more CpG islands? And, more importantly, what relation does it bear – coincidental, or for some functional reason?
• Therefore: Markov chains.

A Markov chain is a discrete stochastic process { Xn }, n ∈ { 0, 1, 2, . . . }, with the Markov property: the conditional probability distribution of future states depends only upon the current state and a fixed number of past states (m memories). A continuous-time MC has a continuous time index. The transition probabilities are
Pr{ Xm+1 = j | X0 = k0, . . . , Xm−1 = km−1, Xm = i } = Pr{ Xm+1 = j | Xm = i }
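The CpG-island question above is commonly attacked with two first-order Markov chains – one trained inside islands ("+"), one on background sequence ("−") – scoring a window by its log-likelihood ratio. A minimal sketch; the transition probabilities below are made up purely for illustration (real values would be estimated from annotated training sequences):

```python
import math

# Hypothetical transition probabilities, for illustration only.
# Note the suppressed C->G transition in the background model.
P_PLUS = {  # inside a CpG island
    'A': {'A': 0.18, 'C': 0.27, 'G': 0.43, 'T': 0.12},
    'C': {'A': 0.17, 'C': 0.37, 'G': 0.27, 'T': 0.19},
    'G': {'A': 0.16, 'C': 0.34, 'G': 0.38, 'T': 0.12},
    'T': {'A': 0.08, 'C': 0.36, 'G': 0.38, 'T': 0.18},
}
P_MINUS = {  # background sequence
    'A': {'A': 0.30, 'C': 0.21, 'G': 0.28, 'T': 0.21},
    'C': {'A': 0.32, 'C': 0.30, 'G': 0.08, 'T': 0.30},
    'G': {'A': 0.25, 'C': 0.24, 'G': 0.30, 'T': 0.21},
    'T': {'A': 0.18, 'C': 0.24, 'G': 0.29, 'T': 0.29},
}

def log_odds(seq):
    """Log-likelihood ratio of the '+' (island) vs '-' (background)
    Markov models; positive scores favour the island model."""
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        score += math.log2(P_PLUS[prev][cur] / P_MINUS[prev][cur])
    return score

print(log_odds("CGCGCGCG"))   # CpG-rich window: positive score
print(log_odds("ATATATAT"))   # CpG-poor window: negative score
```

Sliding this score across a long sequence gives a rough answer to "how close is this window to being a CpG island".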
for every i, j, k0, . . . , km−1 and for every m. (Related notions: finite state machine, i.i.d. sequence.)
Stationary: the transition matrix does not change over time, and the future state depends only on the current state i and not on the previous states; for all n,
Pr{ Xn+1 = j | Xn = i } = Pr{ X1 = j | X0 = i }.

[Figure: a transition diagram over the states 0, 1, 2, with edge probabilities (0.5), (0.5), (1), (1), (1).]
The one-step transition matrix for a Markov chain with states S = { 0, 1, 2 } is [………, …, …], where pij = Pr{ X1 = j | X0 = i } ≥ 0.
Accessibility: a Markov process is ergodic if it is possible to communicate between any two states i and j; the chain is irreducible if all states communicate. It is periodic if it returns to the same state only at every k steps (its periodicity), and aperiodic if there is no such repetitive k. A state that locks the system in is an absorbing state; a chain with no absorbing state can be irreducible.

One-step transition patterns for the two state diagrams (X = nonzero transition probability):

State | 0 1 2 3 4          State | 0 1 2 3
  0   | X X 0 0 X            0   | 0 X 0 X
  1   | X X 0 0 0            1   | X 0 0 0
  2   | 0 0 X X 0            2   | X 0 0 0
  3   | 0 0 0 X 0            3   | 0 0 X X
  4   | 0 0 0 0 0

Learn these:
• Conditional probability, joint probability.
• Independence of occurrences of events.
• Bayesian process.
• Expressing sequences statistically with their distributions. Discriminating states.
• MLE, EM.
• MCMC, for producing a desired posterior distribution: 1. Metropolis–Hastings, RWMC; 2. Gibbs sampling.
• Markov chains and the properties maintained: stationary, ergodic, irreducible, aperiodic.
• Hidden Markov Models (the goal is to detect the sequence of underlying states that is likely to give rise to an observed sequence). This is the Viterbi algorithm.

Independence: A and B are said to be independent events if
P(AB) = P(A) P(B).   (1-45)
Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness. Suppose A and B are independent; then
P(A | B) = P(AB) / P(B) = P(A) P(B) / P(B) = P(A).   (1-46)
Thus if A and B are independent, the event that B has occurred does not give any clue on the occurrence of the event A. It makes no difference to A whether B has occurred or not.

Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?
Let W1 = “first ball removed is white” and B2 = “second ball removed is black”. We need P(W1 ∩ B2). We have W1 ∩ B2 = W1B2 = B2W1. Using the conditional probability rule,
P(W1B2) = P(B2W1) = P(B2 | W1) P(W1).   (1-47)
But
P(W1) = 6 / (6 + 4) = 6/10 = 3/5,  P(B2 | W1) = 4 / (5 + 4) = 4/9,
and hence
P(W1B2) = (3/5)(4/9) = 4/15 ≈ 0.27.
Are the events W1 and B2 independent? Our common sense says no. To verify this we need to compute P(B2). Of course the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = “first ball is white” or B1 = “first ball is black”. Note that W1 ∩ B1 = ∅ and W1 ∪ B1 = Ω, hence W1 together with B1 form a partition. Thus (see (1-42)–(1-44))
P(B2) = P(B2 | W1) P(W1) + P(B2 | B1) P(B1) = (4/9)(3/5) + (3/9)(2/5) = 12/45 + 6/45 = 2/5,
and
P(B2) P(W1) = (2/5)(3/5) = 6/25 ≠ 4/15 = P(B2W1).
As expected, the events W1 and B2 are dependent.

From (1-35),
P(AB) = P(A | B) P(B).   (1-48)
Similarly, from (1-35),
P(B | A) = P(BA) / P(A) = P(AB) / P(A), or P(AB) = P(B | A) P(A).   (1-49)
Hence P(A | B) P(B) = P(B | A) P(A), or

Bayes’ theorem:  P(A | B) = P(B | A) P(A) / P(B).   (1-50)

Although simple enough, Bayes’ theorem has an interesting interpretation:
P(A | B): the a-posteriori probability of A given B.
P(B): (new information) the evidence that “B has occurred”.
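Example 1.2 can be checked by brute-force enumeration of the 10 × 9 = 90 equally likely ordered draws; exact fractions confirm both (1-47) and the dependence of W1 and B2. A small sketch:

```python
from itertools import permutations
from fractions import Fraction

# 6 white ('W') and 4 black ('B') balls, two drawn without replacement.
# permutations() treats the ball positions as distinct, so every ordered
# pair of draws is equally likely: 10 * 9 = 90 outcomes.
balls = ['W'] * 6 + ['B'] * 4
pairs = list(permutations(balls, 2))

p_w1b2 = Fraction(sum(1 for a, b in pairs if a == 'W' and b == 'B'), len(pairs))
p_w1   = Fraction(sum(1 for a, b in pairs if a == 'W'), len(pairs))
p_b2   = Fraction(sum(1 for a, b in pairs if b == 'B'), len(pairs))

print(p_w1b2)                  # 4/15
print(p_w1, p_b2)              # 3/5 2/5
print(p_w1b2 == p_w1 * p_b2)   # False: W1 and B2 are dependent
```

Note that P(B2) = 2/5 equals P(B1): without information about the first draw, the second draw is marginally just like the first.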
P(B | A): the likelihood of B given A.
P(A): the a-priori probability of the event A.
We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B, and it should be used to improve our knowledge/understanding of A. Bayes’ theorem gives the exact mechanism for incorporating such new information.

A more general version of Bayes’ theorem involves a partition of Ω. From (1-50),
P(Ai | B) = P(B | Ai) P(Ai) / P(B) = P(B | Ai) P(Ai) / Σi=1..n P(B | Ai) P(Ai),   (1-51)
where we have made use of (1-44). In (1-51), Ai, i = 1..n, represent a set of mutually exclusive events with associated a-priori probabilities P(Ai), i = 1..n. With the new information “B has occurred”, the information about Ai can be updated by the n conditional probabilities P(B | Ai), i = 1..n, using (1-47).

Example 1.3: Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs; similarly, box B2 has 195 good and 5 defective bulbs. Let D = “defective bulb is picked out”. Then
P(D | B1) = 15/100 = 0.15,   P(D | B2) = 5/200 = 0.025.
Since a box is selected at random, they are equally likely:
P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition as in (1-43), and using (1-44) we obtain
P(D) = P(D | B1) P(B1) + P(D | B2) P(B2) = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875.
Thus, there is about a 9% probability that a bulb picked at random is defective.
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, P(B1 | D)?
P(B1 | D) = P(D | B1) P(B1) / P(D) = (0.15 × 1/2) / 0.0875 ≈ 0.8571.   (1-52)
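The two-box computation can be verified numerically. The sketch below encodes the priors and defect rates of Example 1.3 and applies the total-probability rule (1-44) and the Bayes update (1-51), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# Example 1.3: equally likely boxes, defect rates 15/100 and 5/200.
priors = {'B1': Fraction(1, 2), 'B2': Fraction(1, 2)}
p_defective = {'B1': Fraction(15, 100), 'B2': Fraction(5, 200)}

# Total probability of drawing a defective bulb, Eq. (1-44)
p_d = sum(p_defective[b] * priors[b] for b in priors)

# Posterior probability of each box given a defective bulb, Eq. (1-51)
posterior = {b: p_defective[b] * priors[b] / p_d for b in priors}

print(p_d, float(p_d))                          # 7/80 (= 0.0875)
print(posterior['B1'], float(posterior['B1']))  # 6/7 (≈ 0.8571)
```

The posteriors sum to 1 automatically, since the denominator in (1-51) is exactly the total probability of the evidence.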
Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on the fact that we might have picked up box 1? From (1-52), P(B1 | D) ≈ 0.857 > 0.5, and indeed it is more likely at this point that we must have chosen box 1 in favor of box 2. (Recall box B1 has a six-times-higher defect rate than box B2.)

14. Stochastic Processes — Introduction
Let ξ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ξ) is assigned. The collection of such waveforms forms a stochastic process. The set { ξk } and the time index t can be continuous or discrete (countably infinite or finite) as well. [Fig. 14.1: a family of waveforms X(t, ξi) plotted against t.]
For fixed ξi ∈ S (the set of all experimental outcomes), X(t, ξ) is a specific time function. For fixed t = t1, X1 = X(t1, ξi) is a random variable. The ensemble of all such realizations X(t, ξ) over time represents the stochastic process X(t) (see Fig. 14.1). For example,
X(t) = a cos(ω0 t + φ),
where φ is a uniformly distributed random variable in (0, 2π), represents a stochastic process. Stochastic processes are everywhere: Brownian motion, stock market fluctuations, and various queuing systems all represent stochastic phenomena.
If X(t) is a stochastic process, then for fixed t, X(t) represents a random variable. Its distribution function is given by
FX(x, t) = P{ X(t) ≤ x }.   (14-1)
Notice that FX(x, t) depends on t, since for a different t we obtain a different random variable. Further,
fX(x, t) = dFX(x, t) / dx   (14-2)
represents the first-order probability density function of the process X(t).
For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively.
Their joint distribution is given by
FX(x1, x2, t1, t2) = P{ X(t1) ≤ x1, X(t2) ≤ x2 }   (14-3)
and
fX(x1, x2, t1, t2) = ∂²FX(x1, x2, t1, t2) / ∂x1 ∂x2   (14-4)
represents the second-order density function of the process X(t). Similarly fX(x1, x2, …, xn, t1, t2, …, tn) represents the nth-order density function of the process X(t). Complete specification of the stochastic process X(t) requires the knowledge of fX(x1, x2, …, xn, t1, t2, …, tn) for all ti, i = 1, 2, …, n, and for all n (an almost impossible task in reality).

Mean of a stochastic process:
μX(t) = E{ X(t) } = ∫ x fX(x, t) dx   (14-5)
represents the mean value of the process X(t). In general, the mean of a process can depend on the time index t.
The autocorrelation function of a process X(t) is defined as
RXX(t1, t2) = E{ X(t1) X*(t2) } = ∫∫ x1 x2* fX(x1, x2, t1, t2) dx1 dx2,   (14-6)
and it represents the interrelationship between the random variables X1 = X(t1) and X2 = X(t2) generated from the process X(t).
Properties:
1. RXX(t1, t2) = R*XX(t2, t1) = [ E{ X(t2) X*(t1) } ]*.   (14-7)
2. RXX(t, t) = E{ |X(t)|² } ≥ 0. (Average instantaneous power.)
3. RXX(t1, t2) represents a nonnegative definite function, i.e., for any set of constants { ai }, i = 1..n,
Σi=1..n Σj=1..n ai aj* RXX(ti, tj) ≥ 0.   (14-8)
Eq. (14-8) follows by noticing that E{ |Y|² } ≥ 0 for Y = Σi=1..n ai X(ti).
The function
CXX(t1, t2) = RXX(t1, t2) − μX(t1) μ*X(t2)   (14-9)
represents the autocovariance function of the process X(t).

Example 14.1: Let
z = ∫_{−T}^{T} X(t) dt.
Then
E{ |z|² } = ∫_{−T}^{T} ∫_{−T}^{T} E{ X(t1) X*(t2) } dt1 dt2 = ∫_{−T}^{T} ∫_{−T}^{T} RXX(t1, t2) dt1 dt2.   (14-10)

Example 14.2:
X(t) = a cos(ω0 t + φ),  φ ~ U(0, 2π).   (14-11)
This gives
μX(t) = E{ X(t) } = a E{ cos(ω0 t + φ) } = a cos ω0 t · E{ cos φ } − a sin ω0 t · E{ sin φ } = 0,   (14-12)
since E{ cos φ } = (1/2π) ∫_0^{2π} cos φ dφ = 0 = E{ sin φ }.
Similarly,
RXX(t1, t2) = a² E{ cos(ω0 t1 + φ) cos(ω0 t2 + φ) }
           = (a²/2) E{ cos ω0(t1 − t2) + cos(ω0(t1 + t2) + 2φ) }
           = (a²/2) cos ω0(t1 − t2).   (14-13)
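Example 14.2 lends itself to a quick Monte Carlo check: averaging over many random phases should reproduce μX(t) = 0 and RXX(t1, t2) = (a²/2) cos ω0(t1 − t2). The amplitude, frequency, and time points below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
a, w0 = 2.0, 1.5              # illustrative amplitude and frequency
n_trials = 200_000

# Ensemble of realizations of X(t) = a cos(w0 t + phi), phi ~ U(0, 2*pi)
phi = rng.uniform(0.0, 2.0 * np.pi, size=n_trials)

def X(t):
    return a * np.cos(w0 * t + phi)   # one sample per realization

t1, t2 = 0.7, 0.2
mean_est = X(t1).mean()                         # estimate of mu_X(t1), cf. (14-12)
R_est = (X(t1) * X(t2)).mean()                  # estimate of R_XX(t1, t2), cf. (14-6)
R_theory = (a**2 / 2) * np.cos(w0 * (t1 - t2))  # Eq. (14-13)

print(abs(mean_est))          # close to 0
print(abs(R_est - R_theory))  # small Monte Carlo error
```

The estimates deviate from the theory only by the usual O(1/√n) sampling error.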
Stationary Stochastic Processes
Stationary processes exhibit statistical properties that are invariant to a shift in the time index. Thus, for example, second-order stationarity implies that the statistical properties of the pairs { X(t1), X(t2) } and { X(t1+c), X(t2+c) } are the same for any c. Similarly, first-order stationarity implies that the statistical properties of X(ti) and X(ti+c) are the same for any c.
In strict terms, the statistical properties are governed by the joint probability density function. Hence a process is nth-order Strict-Sense Stationary (S.S.S.) if
fX(x1, x2, …, xn, t1, t2, …, tn) = fX(x1, x2, …, xn, t1+c, t2+c, …, tn+c)   (14-14)
for any c, where the left side represents the joint density function of the random variables X1 = X(t1), X2 = X(t2), …, Xn = X(tn) and the right side corresponds to the joint density function of the random variables X′1 = X(t1+c), X′2 = X(t2+c), …, X′n = X(tn+c). A process X(t) is said to be strict-sense stationary if (14-14) is true for all ti, i = 1, 2, …, n, all n = 1, 2, …, and any c.
For a first-order strict-sense stationary process, from (14-14) we have
fX(x, t) = fX(x, t+c)   (14-15)
for any c. In particular, c = −t gives
fX(x, t) = fX(x),   (14-16)
i.e., the first-order density of X(t) is independent of t. In that case
E[X(t)] = ∫ x fX(x) dx = μ, a constant.   (14-17)
Similarly, for a second-order strict-sense stationary process we have from (14-14)
fX(x1, x2, t1, t2) = fX(x1, x2, t1+c, t2+c)
for any c. For c = −t2 we get
fX(x1, x2, t1, t2) = fX(x1, x2, t1 − t2),   (14-18)
i.e., the second-order density function of a strict-sense stationary process depends only on the difference of the time indices t1 − t2.
In that case the autocorrelation function is given by
RXX(t1, t2) = E{ X(t1) X*(t2) } = ∫∫ x1 x2* fX(x1, x2, t1 − t2) dx1 dx2 = RXX(t1 − t2) = RXX(τ) = R*XX(−τ),   (14-19)
i.e., the autocorrelation function of a second-order strict-sense stationary process depends only on the difference of the time indices τ = t1 − t2.
Notice that (14-17) and (14-19) are consequences of the stochastic process being first- and second-order strict-sense stationary. On the other hand, the basic conditions for first- and second-order stationarity – Eqs. (14-16) and (14-18) – are usually difficult to verify. In that case, we often resort to a looser definition of stationarity, known as Wide-Sense Stationarity (W.S.S.), by making use of (14-17) and (14-19) as the necessary conditions. Thus, a process X(t) is said to be wide-sense stationary if
(i) E{ X(t) } = μ   (14-20)
and
(ii) E{ X(t1) X*(t2) } = RXX(t1 − t2),   (14-21)
i.e., for wide-sense stationary processes, the mean is a constant and the autocorrelation function depends only on the difference between the time indices. Notice that (14-20)–(14-21) do not say anything about the nature of the probability density functions, and instead deal with the average behavior of the process.
Since (14-20)–(14-21) follow from (14-16) and (14-18), strict-sense stationarity always implies wide-sense stationarity. However, the converse is not true in general, the only exception being the Gaussian process. This follows, since if X(t) is a Gaussian process, then by definition X1 = X(t1), X2 = X(t2), …, Xn = X(tn) are jointly Gaussian random variables for any t1, t2, …, tn, whose joint characteristic function is given by
φX(ω1, ω2, …, ωn) = exp( j Σk=1..n μ(tk) ωk − (1/2) Σl,k CXX(tl, tk) ωl ωk ),   (14-22)
where CXX(ti, tk) is as defined in (14-9).
If X(t) is wide-sense stationary, then using (14-20)–(14-21) in (14-22) we get
φX(ω1, ω2, …, ωn) = exp( jμ Σk=1..n ωk − (1/2) Σl,k CXX(tl − tk) ωl ωk ),   (14-23)
and hence if the set of time indices is shifted by a constant c to generate a new set of jointly Gaussian random variables X′1 = X(t1+c), X′2 = X(t2+c), …, X′n = X(tn+c), then their joint characteristic function is identical to (14-23). Thus the sets of random variables { Xi } and { X′i }, i = 1..n, have the same joint probability distribution for all n and all c, establishing the strict-sense stationarity of Gaussian processes from their wide-sense stationarity.
To summarize: if X(t) is a Gaussian process, then wide-sense stationarity (w.s.s.) ⇔ strict-sense stationarity (s.s.s.). Notice that this holds since the joint p.d.f. of Gaussian random variables depends only on their second-order statistics, which is also the basis of wide-sense stationarity.

The ergodic hypothesis: an isolated system in thermal equilibrium, evolving in time, will pass through all the accessible microstates at the same recurrence rate, i.e., all accessible microstates are equally probable. The average over long times will equal the average over the ensemble of all equienergetic microstates: if we take a snapshot of a system with N microstates, we will find the system in any of these microstates with the same probability.
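The time-average/ensemble-average equality can be illustrated with the random-phase cosine of Example 14.2: a long time average along one realization and an ensemble average across many realizations both reproduce μX = 0, i.e., the process is ergodic in the mean. A sketch with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
a, w0 = 1.0, 2.0 * np.pi      # illustrative amplitude and frequency

# X(t) = a cos(w0 t + phi), phi ~ U(0, 2*pi); ensemble mean is 0 (14-12).
t = np.linspace(0.0, 1000.0, 200_001)

# Time average along a single realization (one fixed phase)
phi_single = rng.uniform(0.0, 2.0 * np.pi)
time_avg = np.mean(a * np.cos(w0 * t + phi_single))

# Ensemble average across many realizations at one fixed time t0
t0 = 0.3
phis = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
ensemble_avg = np.mean(a * np.cos(w0 * t0 + phis))

print(abs(time_avg))       # ~0: long-time average along one path
print(abs(ensemble_avg))   # ~0: average over the ensemble
```

Both averages converge to the same constant, mirroring the statement that long-time averages equal ensemble averages for an ergodic system.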