Poisson approximation for occurrence times: A
new approach and an application to genetics
Miguel Abadi
Instituto de Matemática, Estatística e Ciência da Computação, Universidade
Estadual de Campinas [[email protected]]
A basic exercise in probability is the convergence of the binomial distribution B(n, p) to the Poisson distribution P(λ) as n grows and the product np converges to
λ. Namely, for independent processes, the number of successes in n coin tosses converges in distribution to the Poisson
distribution. To model realistic situations, generalizations of this fact were recently developed. In particular, to study the number
of occurrences of a certain observable, we are interested in:
• observables other than a success in a single coin toss
• processes other than independent ones
• giving not just a convergence theorem but rather an approximation theorem
that provides an explicit error term for this approximation.
The best-known tool for proving this convergence is probably the method due
to Chen and Stein. For a description of it we suggest Arratia, Goldstein and
Gordon (1990). One feature of this approach is that it provides bounds
for the total variation distance between the two distributions.
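As a hedged numerical illustration of the classical case, the sketch below computes the total variation distance between B(n, λ/n) and P(λ) and prints it next to λ²/n, the bound given by Le Cam's classical inequality for independent trials. The function names and the choice λ = 2 are assumptions made for this example; nothing here is taken from the method of Abadi and Vergne.

```python
import math

def binomial_pmf(n, p, k):
    """P(B(n, p) = k) for 0 < p < 1, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(lam, k):
    """P(P(lam) = k), computed in log space to avoid overflow in k!."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def total_variation(n, lam):
    """Total variation distance between B(n, lam/n) and P(lam)."""
    p = lam / n
    diff = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k))
               for k in range(n + 1))
    # The binomial puts no mass above n, so the Poisson tail counts in full.
    tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)))
    return 0.5 * (diff + tail)

if __name__ == "__main__":
    lam = 2.0  # assumed value, for illustration only
    for n in (10, 100, 1000):
        # Columns: n, computed distance, Le Cam bound lam^2 / n
        print(n, total_variation(n, lam), lam ** 2 / n)
```

The distance shrinks as n grows with np = λ held fixed, and it is a single number summarizing the error over all values of k at once.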
We present an alternative new method introduced by Abadi and Vergne (2005a)
which uses several previous results obtained by Galves and Schmitt (1997), Collet,
Galves and Schmitt (1999), Abadi (2001), and Abadi (2004).
Our results apply to “words” of any length and can therefore be easily adapted to any observable by writing it as a disjoint union of words.
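The decomposition of an observable into words can be sketched as follows. The observable used here (a window of length 3 whose first and last letters agree), the alphabet ACGT, and all function names are hypothetical choices made for this illustration. Because two distinct words of the same length cannot occur at the same position, counting the observable directly agrees with summing the counts of its words.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet, for illustration

def count_overlapping(sequence, word):
    """Count occurrences of `word` in `sequence`, overlaps included."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == word)

def words_of_observable(predicate, length):
    """List the words of the given length on which `predicate` holds."""
    candidates = ("".join(w) for w in product(ALPHABET, repeat=length))
    return [w for w in candidates if predicate(w)]

def count_observable(sequence, predicate, length):
    """Count the windows of `sequence` on which `predicate` holds."""
    return sum(1 for i in range(len(sequence) - length + 1)
               if predicate(sequence[i:i + length]))

def same_ends(window):
    """Hypothetical observable: first and last letters agree."""
    return window[0] == window[-1]

if __name__ == "__main__":
    seq = "ACAGGGTAT"
    words = words_of_observable(same_ends, 3)
    direct = count_observable(seq, same_ends, 3)
    by_words = sum(count_overlapping(seq, w) for w in words)
    print(len(words), direct, by_words)
```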
Our results are established in the setting of mixing processes, which covers
a wide class of ergodic Markov chains and Gibbs measures, although the technique
itself is quite general.
A crucial difference between our approach and the Chen-Stein method is that
we prove a pointwise error bound: this allows us to control the error on the tail
of the distribution.
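The distinction can be written schematically as follows. Here N denotes the number of occurrences, λ the Poisson parameter, and ε and ε(k) are placeholder error terms introduced only for this illustration; the exact form of the bounds is given in the cited papers and is not reproduced here.

```latex
% Total variation bound (Chen-Stein type): one number for all k at once
d_{TV}\bigl(\mathcal{L}(N), P(\lambda)\bigr)
  = \frac{1}{2} \sum_{k \ge 0}
    \Bigl| \mathbb{P}(N = k) - e^{-\lambda} \frac{\lambda^{k}}{k!} \Bigr|
  \le \varepsilon .

% Pointwise bound: a separate error term for each k
\Bigl| \mathbb{P}(N = k) - e^{-\lambda} \frac{\lambda^{k}}{k!} \Bigr|
  \le \varepsilon(k), \qquad k = 0, 1, 2, \dots
```

A total variation bound ε says little about P(N = k) once the Poisson probability itself falls below ε, which is what happens for large k. A pointwise bound remains informative there if ε(k) decays with k.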
We illustrate the power of our approach with the following application.
In genetic analysis, one goal is to identify “words” in the DNA sequence
that have some specific functionality. One way to do this is to test whether a specific
word appears in the DNA sequence with a frequency that differs (either higher or
lower) from the expected (randomly generated) one. Clearly, the expected frequency
depends on the model of the sequence.
We present some simulations in which this model is an ergodic Markov chain. In
such a case, our results apply and the aim is to test whether the number of occurrences of the specific
word is close to a Poisson random variable. If its occurrences are
close to a Poisson random variable, then we can assume that the word is randomly
generated. If they differ, we identify it as a word with some specific functionality.
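A minimal sketch of such a test is given below, under several assumptions that do not come from the source: the transition matrix, the word "GATTACA", the sequence length, and all function names are hypothetical, and the Poisson parameter is taken to be the plain expected count under the stationary chain. Words that overlap themselves tend to occur in clumps, and the treatment of that case in the cited papers is not reproduced here.

```python
import math
import random

ALPHABET = "ACGT"

# Hypothetical first-order Markov transition matrix (each row sums to 1).
TRANSITION = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def stationary_distribution(transition, iterations=200):
    """Approximate the stationary law by iterating pi <- pi * P."""
    pi = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    for _ in range(iterations):
        pi = {b: sum(pi[a] * transition[a][b] for a in ALPHABET)
              for b in ALPHABET}
    return pi

def simulate(transition, pi, n, rng):
    """Draw a length-n sequence from the chain started at stationarity."""
    seq = rng.choices(ALPHABET, weights=[pi[a] for a in ALPHABET])
    while len(seq) < n:
        row = transition[seq[-1]]
        seq += rng.choices(ALPHABET, weights=[row[a] for a in ALPHABET])
    return "".join(seq)

def count_overlapping(sequence, word):
    """Count occurrences of `word` in `sequence`, overlaps included."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == word)

def expected_count(word, transition, pi, n):
    """Expected number of occurrences of `word` in a stationary sequence."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= transition[a][b]
    return (n - len(word) + 1) * prob

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ P(lam); returns 0 when k < 0."""
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(k + 1))

def poisson_tail_pvalues(observed, lam):
    """Return (P(N <= observed), P(N >= observed)) for N ~ P(lam)."""
    lower = min(1.0, poisson_cdf(observed, lam))
    upper = max(0.0, 1.0 - poisson_cdf(observed - 1, lam))
    return lower, upper

if __name__ == "__main__":
    rng = random.Random(0)
    pi = stationary_distribution(TRANSITION)
    n, word = 20000, "GATTACA"
    seq = simulate(TRANSITION, pi, n, rng)
    observed = count_overlapping(seq, word)
    lam = expected_count(word, TRANSITION, pi, n)
    lower, upper = poisson_tail_pvalues(observed, lam)
    print(word, observed, round(lam, 3), lower, upper)
```

A very small lower value would flag the word as under-represented relative to the model, and a very small upper value as over-represented. On a real DNA sequence the transition matrix would first have to be estimated from the data.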
Our results hold for more general processes, which allows us to use the same approach even when the sequence is modeled by processes that seem
better suited to modeling DNA sequences.
A formal presentation of the theoretical results can be found in the paper
“Statistics and error terms of occurrence times in mixing processes”, which can be
downloaded from http://www.ime.usp.br/~miguel/statis.pdf. The applications are
presented in Abadi and Vergne (2005b).
1. Abadi, M. (2001). Exponential approximation for hitting times in mixing processes.
Math. Phys. Elec. Journal 7, 2.
2. Abadi, M. (2004). Sharp error terms and necessary conditions for exponential hitting
times in mixing processes. Annals of Probability 32, no. 1A, 243-264.
3. Abadi, M. and Vergne, N. (2005a). Statistics and error terms of occurrence times in
mixing processes. Submitted. Can be downloaded from: http://www.ime.usp.br/~abadi
4. Abadi, M. and Vergne, N. (2005b). Poisson approximation in biological context.
Unicamp and Univ. Evry.
5. Arratia, R., Goldstein, L. and Gordon, L. (1990). Poisson approximation and
the Chen-Stein method. With comments and a rejoinder by the authors. Stat. Sci. 5,
403-434.
6. Collet, P., Galves, A. and Schmitt, B. (1999). Repetition times for Gibbsian
sources. Nonlinearity 12, 1225-1237.
7. Galves, A. and Schmitt, B. (1997). Inequalities for hitting times in mixing dynamical
systems. Random Comput. Dyn. 5, 337-348.