









Concentration inequalities and conditional probabilities in the context of large-scale typicality and conditional typicality. The paper covers empirical estimates of the probabilities and conditional probabilities associated with a sequence of partitions, as well as the limit when the number of partitions grows to infinity. It builds on the work of Jérôme Dedecker and Csiszár. It is written by V. Maume-Deschamps.










V. MAUME-DESCHAMPS
Abstract. We prove concentration inequalities inspired by [DP] to obtain estimators of conditional probabilities for weakly dependent sequences. This generalizes results of Csiszár ([Cs]). For Gibbs measures and dynamical systems, these results lead to estimators of the potential function and to a test of the nullity of the asymptotic variance of the system.
This paper deals with the problems of typicality and conditional typicality of “empirical probabilities” for stochastic processes and with the estimation of potential functions for Gibbs measures and dynamical systems. The questions of typicality have been studied in [FKT] for independent sequences and in [BRY, R] for Markov chains. In order to prove the consistency of estimators of transition probabilities for Markov chains of unknown order, results on typicality and conditional typicality for some Ψ-mixing processes were obtained in [CsS, Cs]. Unfortunately, many natural mixing processes do not satisfy this Ψ-mixing condition (see [DP]). We consider a class of mixing processes inspired by [DP]. For this class, we prove strong typicality and strong conditional typicality. In the particular case of Gibbs measures (or chains with complete connections) and for certain dynamical systems, we derive from the typicality results an estimation of the potential as well as a procedure to test the nullity of the asymptotic variance of the process.
2000 Mathematics Subject Classification. 37A50, 60E15, 37D20.
Key words and phrases. Concentration inequalities, weakly dependent sequences, dynamical systems.
Many ideas of the paper have been discussed with Bernard Schmitt; it is a pleasure to thank him. I am also grateful to Jérôme Dedecker, who patiently answered my questions on weak-dependence coefficients.

More formally, we consider a stochastic process $X_0, \dots, X_n, \dots$ taking values in a complete set $\Sigma$ and a sequence of countable partitions $(\mathcal{P}_k)_{k\in\mathbb{N}}$ of $\Sigma$ such that if $P \in \mathcal{P}_k$ then there exists a unique $\tilde{P} \in \mathcal{P}_{k-1}$ such that, almost surely, $X_j \in P \Rightarrow X_{j-1} \in \tilde{P}$. Our aim is to obtain empirical estimates of the probabilities
$$\mathbb{P}(X_j \in P), \quad P \in \mathcal{P}_k,$$
and the conditional probabilities
$$\mathbb{P}(X_j \in P \mid X_{j-1} \in \tilde{P}), \quad P \in \mathcal{P}_k,$$
and of their limit as $k \to \infty$, when it makes sense. We shall define a notion of mixing with respect to a class of functions. Let $\mathcal{C}$ be a Banach space of real bounded functions endowed with a norm of the form
$$\|f\|_{\mathcal{C}} = C(f) + \|f\|,$$
where $C(f)$ is a semi-norm (i.e. for all $f \in \mathcal{C}$, $C(f) \ge 0$, $C(\lambda f) = |\lambda| C(f)$ for $\lambda \in \mathbb{R}$, $C(f+g) \le C(f) + C(g)$) and $\|\cdot\|$ is a norm on $\mathcal{C}$. We will denote by $\mathcal{C}_1$ the subset of functions in $\mathcal{C}$ such that $C(f) \le 1$. Particular choices of $\mathcal{C}$ may be the space $BV$ of functions of bounded variation on $\Sigma$, if it is totally ordered, or the space of Hölder (or piecewise Hölder) functions. Recall that a function $f$ on $\Sigma$ is of bounded variation if it is bounded and
$$\bigvee f := \sup \sum_{i=1}^{n-1} |f(x_i) - f(x_{i+1})| < \infty,$$
where the sup is taken over all finite sequences $x_1 < \cdots < x_n$ of elements of $\Sigma$. The space $BV$ endowed with the norm $\|f\|_{BV} = \bigvee f + \|f\|_\infty$ is a Banach space. Inspired by [DP], we define the $\Phi_{\mathcal{C}}$-mixing coefficients.
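Definition 1. For $i \in \mathbb{N}$, let $\mathcal{M}_i$ be the sigma-algebra generated by $X_1, \dots, X_i$. For $k \in \mathbb{N}$,
$$\Phi_{\mathcal{C}}(k) = \sup\big\{ |\mathbb{E}(Y f(X_{i+k})) - \mathbb{E}(Y)\,\mathbb{E}(f(X_{i+k}))| \,:\, i \in \mathbb{N},\ Y \text{ is } \mathcal{M}_i\text{-measurable with } \|Y\|_1 \le 1,\ f \in \mathcal{C}_1 \big\}. \tag{$*$}$$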
Our main assumption on the process is the following.
Assumption 1.
$$\sum_{k=0}^{n-1} (n-k)\,\Phi_{\mathcal{C}}(k) = O(n).$$
Remarks. Assumption 1 is equivalent to the summability of $(\Phi_{\mathcal{C}}(k))_{k\in\mathbb{N}}$. We prefer to formulate it in the above form because it appears more naturally in our context. Our definition is inspired by Csiszár's (which is Ψ-mixing for variables taking values in a finite alphabet) and by Dedecker-Prieur's; it extends Csiszár's and covers many natural systems (see Section 2 for an example with dynamical systems and [DP] for further examples).
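The equivalence stated in the remark can be explored numerically. The following is only a sketch: the two coefficient sequences `geometric` and `harmonic` are hypothetical examples chosen for illustration (one summable, one not) and do not come from the paper.

```python
def weighted_mixing_sum(phi, n):
    """Return sum_{k=0}^{n-1} (n - k) * phi(k), the left side of Assumption 1."""
    return sum((n - k) * phi(k) for k in range(n))


def geometric(k, rho=0.5):
    """Hypothetical summable coefficients phi(k) = rho**k."""
    return rho ** k


def harmonic(k):
    """Hypothetical non-summable coefficients phi(k) = 1 / (k + 1)."""
    return 1.0 / (k + 1)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Under Assumption 1 the ratio (sum / n) stays bounded as n grows.
        print(n,
              weighted_mixing_sum(geometric, n) / n,
              weighted_mixing_sum(harmonic, n) / n)
```

For the summable sequence the ratio stays below the total sum of the coefficients, while for the non-summable one it keeps growing (roughly like ln n).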
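We consider a sequence $(\mathcal{P}_k)_{k\in\mathbb{N}}$ of countable partitions of $\Sigma$ such that, almost surely, for all $j, k \in \mathbb{N}$, we have
$$\forall P \in \mathcal{P}_k \ \ \exists!\, \tilde{P} \in \mathcal{P}_{k-1}, \quad X_j \in P \Rightarrow X_{j-1} \in \tilde{P}. \tag{$**$}$$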
For $i, \ell \in \mathbb{N}$ and $P \in \mathcal{P}_k$, consider the random variable
$$N_i^{\ell}(P) = \sum_{j=i}^{\ell+i-1} \mathbf{1}_P(X_j).$$
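As an illustration of this counting variable, here is a minimal sketch for a finite observed trajectory. The function name `empirical_count` and the representation of the set P by a membership predicate `in_P` are assumptions made for the example.

```python
def empirical_count(xs, in_P, i, length):
    """N_i^l(P): number of indices j in {i, ..., i + length - 1} with xs[j] in P.

    xs      -- observed trajectory (indexable sequence)
    in_P    -- predicate returning True when a point belongs to P (hypothetical helper)
    i       -- starting index
    length  -- number of consecutive observations counted (the letter l in the text)
    """
    return sum(1 for j in range(i, i + length) if in_P(xs[j]))


if __name__ == "__main__":
    trajectory = [0.1, 0.7, 0.3, 0.9, 0.2]
    in_left_half = lambda x: 0.0 <= x < 0.5
    print(empirical_count(trajectory, in_left_half, 1, 3))
    print(empirical_count(trajectory, in_left_half, 0, 5) / 5)  # empirical frequency
```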
Our aim is to obtain quantitative information on how close the empirical probabilities $\frac{N_i^{n+i}(P)}{n}$ are to their expected value $Q_i^{n+i}(P) := \mathbb{E}\left(\frac{N_i^{n+i}(P)}{n}\right)$. We are especially interested in “large scale typicality”, that is, we wish to
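and
$$b_{i,n} = \Big(\sum_{k=0}^{n-i} \Phi_{\mathcal{C}}(k)\Big)\, \|\varphi(X_i) - \mathbb{E}(\varphi(X_i))\|_{p/2}\; C(\varphi).$$
For any $p \ge 2$, we have the inequality
$$\|S_n(\varphi) - \mathbb{E}(S_n(\varphi))\|_p \le \Big(2p \sum_{i=1}^{n} b_{i,n}\Big)^{1/2} \le C(\varphi)\Big(2p \sum_{k=0}^{n-1} (n-k)\,\Phi_{\mathcal{C}}(k)\Big)^{1/2}. \tag{1.1}$$
As a consequence, we obtain
$$\mathbb{P}\big(|S_n(\varphi) - \mathbb{E}(S_n(\varphi))| > t\big) \le e^{1/e} \exp\left(\frac{-t^2}{2e\,(C(\varphi))^2 \sum_{k=0}^{n-1} (n-k)\,\Phi_{\mathcal{C}}(k)}\right). \tag{1.2}$$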
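To get a feeling for the size of the right-hand side of the exponential inequality, one may evaluate it numerically. This is a sketch only: the function name `deviation_bound`, the geometric coefficients and the value 2 for the semi-norm are illustrative assumptions, not data from the paper.

```python
import math


def deviation_bound(t, n, phi, c_phi):
    """Evaluate e^(1/e) * exp(-t^2 / (2e * c_phi^2 * sum_{k<n} (n-k) phi(k))).

    t      -- deviation threshold for the (unnormalized) sum S_n
    n      -- number of observations
    phi    -- callable k -> mixing coefficient (hypothetical input)
    c_phi  -- value of the semi-norm C(phi) of the observable
    """
    weighted = sum((n - k) * phi(k) for k in range(n))
    return math.exp(1.0 / math.e) * math.exp(-t * t / (2.0 * math.e * c_phi ** 2 * weighted))


if __name__ == "__main__":
    phi = lambda k: 0.5 ** k  # assumed geometric decay
    for t in (50, 100, 200, 400):
        print(t, deviation_bound(t, n=1000, phi=phi, c_phi=2.0))
```

The bound is only informative once t is of order at least the square root of n, in line with the Gaussian shape of the exponent.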
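Sketch of proof. There are two ingredients to get (1.1). Firstly, we need a counterpart to Lemma 4 in [DP].
Lemma 1.2.
$$\Phi_{\mathcal{C}}(k) = \sup\big\{ \|\mathbb{E}(\varphi(X_{i+k}) \mid \mathcal{M}_i) - \mathbb{E}(\varphi(X_{i+k}))\|_\infty \,:\, \varphi \in \mathcal{C}_1 \big\}.$$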
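We postpone the proof of Lemma 1.2 to the end of the proof of the proposition. Secondly, we apply Proposition 4 in [DD] with $Y_i = \varphi(X_i) - \mathbb{E}(\varphi(X_i))$ to get
$$\begin{aligned}
\|S_n(\varphi) - \mathbb{E}(S_n(\varphi))\|_p &\le \Big(2p \sum_{i=1}^{n} \max_{i \le \ell \le n} \Big\| Y_i \sum_{k=i}^{\ell} \mathbb{E}(Y_k \mid \mathcal{M}_i) \Big\|_{p/2}\Big)^{1/2} \\
&\le \Big(2p \sum_{i=1}^{n} \|Y_i\|_{p/2} \sum_{k=i}^{n} \|\mathbb{E}(Y_k \mid \mathcal{M}_i)\|_\infty\Big)^{1/2} \\
&\le \Big(2p \sum_{i=1}^{n} b_{i,n}\Big)^{1/2}
\end{aligned}$$
(we have used that, by Lemma 1.2, $\|\mathbb{E}(Y_{k+i} \mid \mathcal{M}_i)\|_\infty \le C(\varphi)\,\Phi_{\mathcal{C}}(k)$). To obtain the second part of inequality (1.1), use $\|Y_i\|_{p/2} \le \|Y_i\|_\infty \le C(\varphi)\,\Phi_{\mathcal{C}}(0)$.
The second inequality (1.2) follows from (1.1) as in [DP].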
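Proof of Lemma 1.2. For $Y$ $\mathcal{M}_i$-measurable with $\|Y\|_1 \le 1$, we write
$$\mathbb{E}(Y f(X_{i+k})) - \mathbb{E}(Y)\,\mathbb{E}(f(X_{i+k})) = \mathbb{E}\big(Y[\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i) - \mathbb{E}(f(X_{i+k}))]\big) \le \|\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i) - \mathbb{E}(f(X_{i+k}))\|_\infty.$$
To prove the converse inequality, for $\varepsilon > 0$, consider an event $A_\varepsilon$ such that for $\omega \in A_\varepsilon$,
$$|\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i)(\omega) - \mathbb{E}(f(X_{i+k}))| \ge \|\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i) - \mathbb{E}(f(X_{i+k}))\|_\infty - \varepsilon,$$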
CONCENTRATION INEQUALITIES AND CONDITIONAL PROBABILITIES 5
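and consider the random variable
$$Y_\varepsilon = \frac{\mathbf{1}_{A_\varepsilon}}{\mathbb{P}(A_\varepsilon)}\, \mathrm{sign}\big(\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i)(\omega) - \mathbb{E}(f(X_{i+k}))\big).$$
$Y_\varepsilon$ is $\mathcal{M}_i$-measurable, $\|Y_\varepsilon\|_1 \le 1$ and
$$\mathbb{E}(Y_\varepsilon f(X_{i+k})) - \mathbb{E}(Y_\varepsilon)\,\mathbb{E}(f(X_{i+k})) \ge \|\mathbb{E}(f(X_{i+k}) \mid \mathcal{M}_i) - \mathbb{E}(f(X_{i+k}))\|_\infty - \varepsilon.$$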
Thus, the lemma is proven.
We shall apply inequality (1.2) to the function $\varphi = \mathbf{1}_P$, $P \in \mathcal{P}_k$.
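Corollary 1.3. If the process $(X_1, \dots, X_n, \dots)$ satisfies Assumption 1, if the sequence of partitions $(\mathcal{P}_k)_{k\in\mathbb{N}}$ satisfies (**) and if, for all $P \in \mathcal{P}_k$, $\mathbf{1}_P \in \mathcal{C}$, then there exists a constant $C > 0$ such that for all $k \in \mathbb{N}$, for all $P \in \mathcal{P}_k$, for any $t \in \mathbb{R}$, for all $i, n \in \mathbb{N}$,
$$\mathbb{P}\left(\left|\frac{N_i^{n+i}(P)}{n} - Q_i^{n+i}(P)\right| > t\right) \le e^{1/e} \exp\left(-\frac{C t^2 n}{C(\mathbf{1}_P)^2}\right).$$
Proof. It follows directly from (1.2) applied to $\varphi = \mathbf{1}_P$ and Assumption 1.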
Let us denote $\hat{P}_i^{n+i}(P) = \frac{N_i^{n+i}(P)}{n}$. The following corollary is a counterpart to Csiszár's result (Theorem 1 in [Cs]) in our context.
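Corollary 1.4. There exists $C > 0$ such that for all $P \in \mathcal{P}_k$ for which
$$\left(\frac{Q_i^{n+i}(P)}{C(\mathbf{1}_P)}\right)^2 n \ge \ln^2 n,$$
we have
$$\mathbb{P}\left(\left|\frac{\hat{P}_i^{n+i}(P)}{Q_i^{n+i}(P)} - 1\right| > t\right) \le e^{1/e}\, e^{-C t^2 \ln^2 n}.$$
Proof. We apply Corollary 1.3 with $t \cdot Q_i^{n+i}(P)$ instead of $t$. We get
$$\mathbb{P}\left(\left|\frac{\hat{P}_i^{n+i}(P)}{Q_i^{n+i}(P)} - 1\right| > t\right) \le e^{1/e} \exp\left(-\frac{C t^2\, (Q_i^{n+i}(P))^2\, n}{(C(\mathbf{1}_P))^2}\right).$$
The result follows.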
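For the reader's convenience, the last step can be spelled out as follows. This is a sketch which assumes that the hypothesis of Corollary 1.4 reads $(Q_i^{n+i}(P)/C(\mathbf{1}_P))^2\, n \ge \ln^2 n$; the exact form of that hypothesis is partly garbled in the available copy, so this reading is an assumption.

```latex
\begin{align*}
\mathbb{P}\left(\left|\frac{\hat{P}_i^{n+i}(P)}{Q_i^{n+i}(P)} - 1\right| > t\right)
  &= \mathbb{P}\left(\left|\hat{P}_i^{n+i}(P) - Q_i^{n+i}(P)\right| > t\, Q_i^{n+i}(P)\right) \\
  &\le e^{1/e} \exp\left(-C t^2 \left(\frac{Q_i^{n+i}(P)}{C(\mathbf{1}_P)}\right)^{2} n\right)
   \le e^{1/e} \exp\left(-C t^2 \ln^2 n\right).
\end{align*}
```

The first line only rewrites the event, the first inequality is Corollary 1.3 with $t\,Q_i^{n+i}(P)$ in place of $t$, and the last inequality uses the assumed hypothesis together with the monotonicity of the exponential.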
Remark. Let us consider the case where $\mathcal{C} = BV$. If the partition $\mathcal{P}_k$ is a partition into intervals, then for all $P \in \mathcal{P}_k$, $C(\mathbf{1}_P) = 2$.
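The value 2 can be checked on a grid. The sketch below computes the variation of a function along one finite increasing sequence of points, which is a lower bound for the supremum in the definition of bounded variation; for the indicator of an interval lying strictly inside [0, 1] the bound is attained. The helper names `total_variation` and `indicator` are hypothetical.

```python
def total_variation(values):
    """Sum of |f(x_i) - f(x_{i+1})| along one finite increasing sequence of points."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def indicator(a, b):
    """Indicator function of the half-open interval [a, b)."""
    return lambda x: 1.0 if a <= x < b else 0.0


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]  # points 0, 0.01, ..., 1
    f = indicator(0.25, 0.5)
    # One jump up when entering the interval and one jump down when leaving it.
    print(total_variation([f(x) for x in grid]))
```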
We are now in a position to prove our theorem on conditional typicality. Recall that
$$\hat{g}_n(P) = \frac{n-1}{n}\, \frac{N_1^{n+1}(P)}{N_0^{n-1}(\tilde{P})}.$$
Theorem 1.5. Let the process $(X_p)_{p\in\mathbb{N}}$ satisfy Assumption 1, let the sequence of partitions $(\mathcal{P}_k)_{k\in\mathbb{N}}$ satisfy (**) and, for all $P \in \mathcal{P}_k$, let $\mathbf{1}_P \in \mathcal{C}$. There exists $K > 0$ such that for all $\varepsilon < 1$ and for all $P \in \mathcal{P}_k$ for which
$$\frac{Q_0^{n-1}(\tilde{P})}{C(\mathbf{1}_P)} \quad \text{and} \quad \frac{Q_0^{n-1}(\tilde{P})}{C(\mathbf{1}_{\tilde{P}})} \ \ge\ n^{-\varepsilon/2},$$
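We now turn to our main motivation: dynamical systems. Consider a dynamical system $(\Sigma, T, \mu)$, where $\Sigma$ is a complete space, $T : \Sigma \to \Sigma$ is a measurable map and $\mu$ is a $T$-invariant probability measure on $\Sigma$. Let $\mathcal{C}$ be a Banach space of functions on $\Sigma$ (typically, $\mathcal{C}$ will be the space of functions of bounded variation or a space of piecewise Hölder functions, see examples in Section 2.1). Assume that the norm $\|\cdot\|$ on $\mathcal{C}$ is such that for any $\varphi \in \mathcal{C}$, there exists a real number $R(\varphi)$ such that $\|\varphi + R(\varphi)\| \le C(\varphi)$ (for example, this is the case if $\|\cdot\| = \|\cdot\|_\infty$ and $C(\varphi) = \bigvee(\varphi)$, or if $\|\cdot\| = \|\cdot\|_\infty$ and $C(\varphi)$ is the Hölder constant). We assume that the dynamical system satisfies the following mixing property: for all $\varphi \in L^1(\mu)$, $\psi \in \mathcal{C}$,
$$\left| \int_\Sigma \psi \cdot \varphi \circ T^n \, d\mu - \int_\Sigma \psi \, d\mu \int_\Sigma \varphi \, d\mu \right| \le \Phi(n)\, \|\varphi\|_1\, \|\psi\|_{\mathcal{C}}, \tag{2.1}$$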
with $\Phi(n)$ summable. Consider a countable partition $A_1, \dots, A_p, \dots$ of $\Sigma$. Denote by $\mathcal{P}_k$ the countable partition of $\Sigma$ whose atoms are defined as follows: for $i_0, \dots, i_{k-1}$, let
$$A_{i_0}^{i_{k-1}} = \{x \in \Sigma \,:\, T^j(x) \in A_{i_j} \text{ for } j = 0, \dots, k-1\}.$$
We assume that for all $i_0, \dots, i_{k-1}$, $f = \mathbf{1}_{A_{i_0}^{i_{k-1}}} \in \mathcal{C}$, and we write $C(i_0, \dots, i_{k-1}) = C(f)$. Consider the process taking values in $\Sigma$: $X_j = T^j$, $j \in \mathbb{N}$. Clearly, if $X_j \in A_{i_0}^{i_{k-1}}$ then $X_{j+1} \in A_{i_1}^{i_{k-1}}$. That is, for any $P \in \mathcal{P}_k$, there exists a unique $\tilde{P} \in \mathcal{P}_{k-1}$ such that $X_j \in P \Rightarrow X_{j+1} \in \tilde{P}$. Condition (2.1) may be rewritten as: for all $\varphi \in L^1(\mu)$, $\psi \in \mathcal{C}$,
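$$|\mathrm{Cov}(\psi(X_0), \varphi(X_n))| \le \Phi(n)\, \|\varphi\|_1\, \|\psi\|_{\mathcal{C}}.$$
Moreover, since we assume that for $\psi \in \mathcal{C}$ there exists a real number $R(\psi)$ such that $\|\psi + R(\psi)\| \le C(\psi)$, we have
$$|\mathrm{Cov}(\psi(X_0), \varphi(X_n))| = |\mathrm{Cov}([\psi(X_0) + R(\psi)], \varphi(X_n))| \le \Phi(n)\, \|\varphi\|_1\, \|\psi + R(\psi)\|_{\mathcal{C}} \le 2\Phi(n)\, \|\varphi\|_1\, C(\psi). \tag{2.2}$$
Using the stationarity of the sequence $(X_j)$, we have for all $i \in \mathbb{N}$, for $\psi \in \mathcal{C}_1$, $\varphi \in L^1$ with $\|\varphi\|_1 \le 1$,
$$|\mathrm{Cov}(\psi(X_i), \varphi(X_{n+i}))| \le 2\Phi(n). \tag{2.3}$$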
So, our Assumption 1 and (**) are satisfied for a “time reversed” process; that is, consider a process $(Y_n)_{n\in\mathbb{N}}$ such that $(Y_n, \dots, Y_0)$ has the same law as $(X_0, \dots, X_n)$. Then $\mathrm{Cov}(\psi(X_i), \varphi(X_{n+i})) = \mathrm{Cov}(\psi(Y_{i+n}), \varphi(Y_i))$ and the process $(Y_n)_{n\in\mathbb{N}}$ satisfies our Assumption 1. Using the stationarity, it also satisfies (**); see [BGR] and [DP] for more developments on this “trick”. Applying Theorem 1.5 to the process $(Y_n)_{n\in\mathbb{N}}$ and using that
$$\sum_{j=1}^{n} \mathbf{1}_P(Y_j) \overset{\text{Law}}{=} \sum_{j=0}^{n-1} \mathbf{1}_P(X_j) \quad \text{and} \quad \sum_{j=0}^{n-2} \mathbf{1}_{\tilde{P}}(Y_j) \overset{\text{Law}}{=} \sum_{j=0}^{n-2} \mathbf{1}_{\tilde{P}}(X_j),$$
we obtain the following result.
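Theorem 2.1. There exists a constant $C > 0$ such that for all $k, n \in \mathbb{N}$, for any sequence $i_0, \dots, i_{k-1}$, for all $t \in \mathbb{R}$,
$$\mathbb{P}\left(\left|\frac{N_0^{n}(A_{i_0}^{i_{k-1}})}{n} - \mu(A_{i_0}^{i_{k-1}})\right| > t\right) \le e^{1/e} \exp\left(-\frac{C t^2 n}{C(i_0, \dots, i_{k-1})^2}\right).$$
Let
$$\hat{g}_n(A_{i_0}^{i_{k-1}}) = \frac{N_0^{n}(A_{i_0}^{i_{k-1}})}{N_0^{n-1}(A_{i_1}^{i_{k-1}})} \cdot \frac{n-1}{n}.$$
There exists $K > 0$ such that for all $\varepsilon < 1$, if
$$\frac{\mu(A_{i_1}^{i_{k-1}})}{C(i_0, \dots, i_{k-1})} \quad \text{and} \quad \frac{\mu(A_{i_1}^{i_{k-1}})}{C(i_1, \dots, i_{k-1})} \ \ge\ n^{-\varepsilon/2},$$
we have:
$$\mathbb{P}\left(\left|\hat{g}_n(A_{i_0}^{i_{k-1}}) - \mathbb{P}(X_0 \in A_{i_0} \mid X_1 \in A_{i_1}, \dots, X_{k-1} \in A_{i_{k-1}})\right| > t\right)$$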
≤ 4 e−Kt
(^2) n 1 −ε
1 −ε .
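The estimator of Theorem 2.1 can be sketched for a symbolic trajectory, that is, for a sequence recording which atom $A_i$ each iterate visits. The code is an illustration under assumptions: the names `count_word` and `g_hat` are hypothetical, the trajectory is assumed long enough (at least n + k - 1 symbols), and returning `None` when the denominator vanishes is a convention chosen here, not taken from the paper.

```python
def count_word(seq, word, n):
    """Number of j in {0, ..., n-1} such that seq[j], ..., seq[j+len(word)-1] spells word."""
    k = len(word)
    target = tuple(word)
    return sum(1 for j in range(n) if tuple(seq[j:j + k]) == target)


def g_hat(seq, word, n):
    """Ratio N_0^n(word) / N_0^(n-1)(word without its first symbol), times (n-1)/n."""
    numerator = count_word(seq, word, n)
    denominator = count_word(seq, word[1:], n - 1)
    if denominator == 0:
        return None  # estimator undefined on this sample (convention of this sketch)
    return (n - 1) / n * numerator / denominator


if __name__ == "__main__":
    # Hypothetical periodic symbolic trajectory 0, 0, 1, 0, 0, 1, ...
    seq = [0, 0, 1] * 101
    # A 0 is followed by another 0 half of the time in this trajectory.
    print(g_hat(seq, (0, 0), 300))
```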
Let us end this section with a lemma stating that the elements $P \in \mathcal{P}_k$ are exponentially small. It indicates that we cannot expect to take $k$ of order greater than $\ln n$ in the above theorem.
Lemma 2.2. Assume that $C_{\max} = \max_{j=1,\dots} C(\mathbf{1}_{A_j}) < \infty$. There exists $0 < \gamma < 1$ such that for all $P \in \mathcal{P}_k$, we have
$$\mu(P) \le \gamma^k.$$
Proof. The proof of Lemma 2.2 follows from the mixing property. It is inspired by [Pa]. Let $n_0 \in \mathbb{N}$ be fixed later. Let $P \in \mathcal{P}_k$; for some indices $i_0, \dots, i_{k-1}$, we have
$$P = \{x \in A_{i_0}, \dots, T^{k-1}x \in A_{i_{k-1}}\}.$$
Then, with $\ell = [k/n_0]$,
$$\mu(P) = \mathbb{P}(X_0 \in A_{i_0}, \dots, X_{k-1} \in A_{i_{k-1}}) \le \mathbb{P}(X_0 \in A_{i_0}, X_{n_0} \in A_{i_{n_0}}, \dots, X_{\ell n_0} \in A_{i_{\ell n_0}}).$$
Now, the random variable
$$\frac{\mathbf{1}_{A_{i_{n_0}}}(X_{n_0}) \cdots \mathbf{1}_{A_{i_{\ell n_0}}}(X_{\ell n_0})}{\mathbb{P}(X_{n_0} \in A_{i_{n_0}}, \dots, X_{\ell n_0} \in A_{i_{\ell n_0}})}$$
is $\mathcal{M}_{\ell n_0}$-measurable with $L^1$ norm less than 1, and $\frac{\mathbf{1}_{A_{i_0}}}{C_{\max}}$ is in $\mathcal{C}_1$. From the mixing property (2.3), we get (with $s = \sup_{j=1,\dots} \mu(A_j) < 1$)
$$\mathbb{P}(X_0 \in A_{i_0}, X_{n_0} \in A_{i_{n_0}}, \dots, X_{\ell n_0} \in A_{i_{\ell n_0}}) \le \mathbb{P}(X_{n_0} \in A_{i_{n_0}}, \dots, X_{\ell n_0} \in A_{i_{\ell n_0}}) \cdot (\Phi_{\mathcal{C}}(n_0)\, C_{\max} + s).$$
indicator functions $\mathbf{1}_P$ belong to $BV$ and $C(\mathbf{1}_P) = \bigvee(\mathbf{1}_P) = 2$. So, we shall apply Theorem 2.1; this will lead to the announced estimation of the potential $g$. Let us also introduce a very useful tool in dynamical systems: the transfer operator. For $f \in BV$, let
$$\mathcal{L}(f)(x) = \sum_{y \,:\, T(y) = x} g(y)\, f(y).$$
We have $\mathcal{L}(\mathbf{1}) = \mathbf{1}$ and, for all $f_1 \in BV$, $f_2 \in L^1(\mu)$,
$$\int_I \mathcal{L}(f_1) \cdot f_2 \, d\mu = \int_I f_1 \cdot f_2 \circ T \, d\mu.$$
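Both properties of the transfer operator can be checked numerically on a simple example. The sketch below uses the doubling map T(x) = 2x mod 1 with Lebesgue measure and constant potential g = 1/2; this concrete system, the test functions and the midpoint quadrature are assumptions chosen for illustration and are not taken from the paper.

```python
def T(x):
    """Doubling map on [0, 1) (assumed example system)."""
    return (2.0 * x) % 1.0


def transfer(f):
    """Transfer operator for the doubling map with potential g = 1/2.

    The two preimages of x are x/2 and (x+1)/2, each weighted by g = 1/2.
    """
    return lambda x: 0.5 * f(x / 2.0) + 0.5 * f((x + 1.0) / 2.0)


def integrate(f, m=20000):
    """Midpoint rule on [0, 1] with m cells (Lebesgue measure)."""
    return sum(f((i + 0.5) / m) for i in range(m)) / m


if __name__ == "__main__":
    f1 = lambda x: x * x
    f2 = lambda x: x
    Lf1 = transfer(f1)
    lhs = integrate(lambda x: Lf1(x) * f2(x))      # integral of L(f1) * f2
    rhs = integrate(lambda x: f1(x) * f2(T(x)))    # integral of f1 * (f2 o T)
    print(lhs, rhs)
    print(transfer(lambda x: 1.0)(0.3))            # L(1) evaluated at a point
```

For these test functions both integrals equal 5/24, which can also be verified by hand.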
The process (Yn)n∈N introduced after Lemma 2.2 is a Markov process with kernel L (see [BGR]). The following three lemmas are the last needed bricks between Theorem 2.1 and the estimation of the potential g.
Lemma 2.4. Assume that $T$ satisfies Assumption 2 and is a Markov map. There exists $K > 0$ such that for all $k \in \mathbb{N}$, for all $x \in I$,
$$(1 - K\theta^k)\, g(x) \le \frac{\mu(C_k(x))}{\mu(C_{k-1}(Tx))} \le (1 + K\theta^k)\, g(x). \tag{2.4}$$
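Proof. Because the map is Markov, for all $x \in I$, $T(C_k(x)) = C_{k-1}(Tx)$. We have:
$$\mu(T(C_k(x))) = \int \frac{1}{g}\, \mathbf{1}_{C_k(x)}\, d\mu,$$
$$\min_{y \in C_k(x)} \frac{1}{g(y)} \int \mathbf{1}_{C_k(x)}\, d\mu \le \int \frac{1}{g}\, \mathbf{1}_{C_k(x)}\, d\mu \le \max_{y \in C_k(x)} \frac{1}{g(y)} \int \mathbf{1}_{C_k(x)}\, d\mu.$$
Since the map is Markov, $h$ and $h \circ T$ are $C^1$ on each $C_k(x)$, so $g$ is $C^1$ on $C_k(x)$ and, since $T$ is expanding, we conclude that
$$\max_{y \in C_k(x)} \frac{1}{g(y)} \le \frac{1 + K\theta^k}{g(x)} \quad \text{and} \quad \min_{y \in C_k(x)} \frac{1}{g(y)} \ge \frac{1 - K\theta^k}{g(x)}.$$
The result follows.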
If the map $T$ is not Markov, we shall prove a weaker result (but sufficient for our purpose). To deal with non-Markov maps, we have to modify the above proof at two points: firstly, we do not have $T(C_k(x)) = C_{k-1}(Tx)$ for all $x$ (but only for many of them); secondly, $g = \frac{h}{|T'|\, h \circ T}$ is not smooth (due to $h$). The following lemma shows that we control the irregularity of $h$.
Lemma 2.5. Let $a = \bigvee h$ and, for any interval $P$, let $\bigvee_P h$ be the variation of $h$ on $P$. For all $k \ge 1$, for all $u_k > 0$,
$$\mu\Big\{x \in [0,1] \,:\, \bigvee_{C_k(x)} h \ge u_k\Big\} \le \frac{\gamma^k}{u_k}\, a.$$
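Proof. We have:
$$\mu\Big\{x \in [0,1] \,:\, \bigvee_{C_k(x)} h \ge u_k\Big\} = \sum_{\substack{P \in \mathcal{P}_k \\ \bigvee_P h \ge u_k}} \mu(P).$$
Moreover,
$$a = \bigvee h \ge \sum_{P \in \mathcal{P}_k} \bigvee_P h \ge \#\Big\{P \in \mathcal{P}_k \,:\, \bigvee_P h \ge u_k\Big\}\, u_k.$$
In other words, $\#\{P \in \mathcal{P}_k : \bigvee_P h \ge u_k\} \le \frac{a}{u_k}$. Using Lemma 2.2, we get:
$$\mu\Big\{x \in [0,1] \,:\, \bigvee_{C_k(x)} h \ge u_k\Big\} \le \#\Big\{P \in \mathcal{P}_k \,:\, \bigvee_P h \ge u_k\Big\}\, \gamma^k \le \frac{\gamma^k}{u_k}\, a.$$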
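The counting step of this proof is a Markov-type inequality on the variations carried by the atoms. A minimal sketch, with hypothetical per-atom variations chosen only for illustration:

```python
def atoms_with_large_variation(variations, u):
    """Number of atoms whose variation is at least u."""
    return sum(1 for v in variations if v >= u)


if __name__ == "__main__":
    # Hypothetical variations of h on the atoms of a partition; their sum plays
    # the role of (a lower bound for) the total variation a.
    variations = [0.5, 0.01, 0.3, 0.0, 0.19]
    a = sum(variations)
    u = 0.2
    count = atoms_with_large_variation(variations, u)
    print(count, a / u)  # the count never exceeds a / u
```

Multiplying the count by the bound on the measure of one atom (Lemma 2.2) then gives the measure estimate of Lemma 2.5.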
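Corollary 2.6. For all $\kappa > \gamma$ with $\kappa \ge \theta$, there exist a constant $K > 0$ and, for all $k \in \mathbb{N}^*$, a set $B_k$ such that $\mu(B_k) \le \frac{\gamma^k}{\kappa^k}\, a$ and, if $x \notin B_k$ and $y \in C_k(x)$,
$$(1 - K\kappa^k) \le \frac{g(x)}{g(y)} \le (1 + K\kappa^k). \tag{2.5}$$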
Proof. Recall that $g = \frac{h}{|T'|\, h \circ T}$. Because $T$ is piecewise $C^2$ and expanding, $\frac{1}{|T'|}$ satisfies an inequality of the type (2.5) for all $x \in [0,1]$, with $\kappa = \theta$. We just have to prove that $h$ satisfies such an inequality. Let
$$B_k = \Big\{x \in [0,1] \,:\, \bigvee_{C_k(x)} h \ge \kappa^k\Big\}.$$
Let $x \notin B_k$ and $y \in C_k(x)$. Then
$$|h(x) - h(y)| \le \bigvee_{C_k(x)} h \le \kappa^k.$$
Now, $\frac{h(x)}{h(y)} = 1 + \frac{h(x) - h(y)}{h(y)}$, thus
$$1 - \frac{\kappa^k}{\sup h} \le \frac{h(x)}{h(y)} \le 1 + \frac{\kappa^k}{\inf h}.$$
Of course, the same inequality holds for $h \circ T$ with $k$ replaced by $k-1$; combining these three inequalities gives the result.
Lemma 2.7. Assume that $T$ satisfies Assumption 2 and is not necessarily a Markov map. There exists $K > 0$ such that for all $k \in \mathbb{N}$, for all $\kappa > \gamma$ with $\kappa \ge \theta$,
$$\mu\left\{x \in I \,:\, (1 - K\kappa^k)\, g(x) \le \frac{\mu(C_k(x))}{\mu(C_{k-1}(Tx))} \le (1 + K\kappa^k)\, g(x)\right\} \ge 1 - \Big(2\ell\gamma^k + a \Big(\frac{\gamma}{\kappa}\Big)^k\Big).$$
We shall use Theorem 2.1.
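$$\begin{aligned}
\mathbb{P}(|\hat{g}_{n,k}(x) - g(x)| > t) &\le \mathbb{P}\left(\left|\hat{g}_{n,k}(x) - \frac{\mu(C_k(x))}{\mu(C_{k-1}(Tx))}\right| > t - \left|\frac{\mu(C_k(x))}{\mu(C_{k-1}(Tx))} - g(x)\right|\right) \\
&\le \mathbb{P}\left(\left|\hat{g}_{n,k}(x) - \frac{\mu(C_k(x))}{\mu(C_{k-1}(Tx))}\right| > t - K\kappa^k\right) \quad \text{because } x \notin D_k
\end{aligned}$$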
≤ 4 e−L(t−Kκ
k) (^2) n 1 −ε
1 −ε we have used Theorem 2.1.
If $\frac{\ln(2/t)}{\ln(1/\kappa)} \le k$, we conclude
P(|ˆgn,k(x) − g(x)| > t) ≤ 4 e−Lt
(^2) n 1 −ε
1 −ε .
We derive the following corollary. Fix $\kappa > \gamma$ with $\kappa \ge \theta$, and let $\alpha < \frac{\varepsilon}{2}\, \frac{\ln(1/\kappa)}{\ln(\ell/\gamma)}$.
Corollary 2.9. Let k(n) be an increasing sequence such that
α
ln n ln (^1) κ
≤ k(n) ≤
ε 2
ln 2n ln( (^) γ` )
let ˆgn = ˆgn,k(n), then for all p ≥ 1 , ˆgn goes to g in Lp(μ).
Proof. Recall that $\hat{g}_{n,k}(x)$ and $g(x)$ are less than 1. Let $t = \Theta(n^{-\alpha})$; we have:
$$\int |\hat{g}_n(x) - g(x)|^p \, d\mu \le 2^p \mu(D_{k(n)} \cup E_{k(n)}) + 2^p \mu(|\hat{g}_n(x) - g(x)| > t) + t^p$$
≤ Ct2p^
( (^) γ
κ
)k(n)
(^2) n 1 −ε
1 −ε ) + tp
≤ Ct2p^
( (^) γ
κ
)k(n)
1 −ε− 2 α
1 −ε ) + Ctn−pα
Remark. In [CMS], an exponential inequality is proven for Lipschitz functions of several variables for expanding dynamical systems of the interval. We cannot use such a result here because characteristic functions of intervals are not Lipschitz; the result could perhaps be improved to take piecewise Lipschitz functions into consideration. However, the Lipschitz constant enters the bound of the exponential inequality, and any kind of piecewise Lipschitz constant would be exponentially large for $\mathbf{1}_P$, $P \in \mathcal{P}_k$. Nevertheless, such a result for functions of several variables could be interesting for estimating the conditional probabilities and the potential $g$: we could construct an estimator by replacing $N_j^{\ell}(A_{i_0}^{i_{k-1}})$ with
$$\tilde{N}_j^{n}(A_{i_0}^{i_{k-1}}) = \#\big\{p \in \{j, \dots, n+j-k\} \,:\, X_p \in A_{i_0}, \dots, X_{p+k-1} \in A_{i_{k-1}}\big\}.$$
In this section, we state our results in the particular setting of Gibbs measures or chains with complete connections. Gibbs measures and chains with complete connections are two different points of view on the same thing
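$$\lim_{k\to\infty} \mathbb{P}(X_0 = a_0 \mid X_1 = a_1, \dots, X_{k-1} = a_{k-1}) = \mathbb{P}(X_0 = a_0 \mid X_i = a_i,\ i \ge 1)$$
exists. Moreover, there exists a summable sequence $\gamma_k > 0$ such that if $a_0 = b_0, \dots, a_k = b_k$,
$$\left|\frac{\mathbb{P}(X_0 = a_0 \mid X_i = a_i,\ i \ge 1)}{\mathbb{P}(X_0 = b_0 \mid X_i = b_i,\ i \ge 1)} - 1\right| \le \gamma_k.$$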
Let $\Sigma \subset A^{\mathbb{N}}$ be the set of admissible sequences:
$$\Sigma = \{x = (x_0, \dots, x_k, \dots) \in A^{\mathbb{N}} \,:\, \text{for all } k \ge 0,\ \mathbb{P}(X_0 = x_0, \dots, X_k = x_k) \ne 0\}.$$
$\Sigma$ is compact for the product topology and is invariant under the shift map $\sigma$: $\sigma(x_0, x_1, \dots) = (x_1, \dots)$. We denote by $\mu$ the image measure of the $X_i$'s. We assume that the process is mixing: there exists $N > 0$ such that for all $i, j \in A$ and all $n > N$,
$$\mathbb{P}(X_0 = i \text{ and } X_n = j) \ne 0.$$
We shall denote by
$$A_j = \{x \in \Sigma \,:\, x_0 = j\} \quad \text{and} \quad A_{i_0}^{i_{k-1}} = \{x \in \Sigma \,:\, x_j = i_j,\ j = 0, \dots, k-1\}.$$
As before, $\mathcal{P}_k$ is the partition of $\Sigma$ whose atoms are the $A_{i_0}^{i_{k-1}}$'s and $C_k(x)$ is the atom of $\mathcal{P}_k$ containing $x$. We also assume that the process has a Markov structure: for $x = (x_0, \dots) \in \Sigma$, $ax = (a, x_0, \dots) \in \Sigma$ if and only if $ay \in \Sigma$ for all $y \in A_{x_0}$. For $x \in \Sigma$, let $g(x) = \mathbb{P}(X_0 = x_0 \mid X_i = x_i,\ i \ge 1)$. We shall prove that $\hat{g}_{n,k}$ is a consistent estimator of $g$. It is known (see [KMS], [Ma2], [BGF], [Po]) that such a process is mixing for suitable functions. Let $\gamma_n^\star = \sum_{k \ge n} \gamma_k$ and define a distance on $\Sigma$ by $d(x, y) = \gamma_n^\star$ if and only if $x_j = y_j$ for $j = 0, \dots, n-1$ and $x_n \ne y_n$. Let $L$ be the space of Lipschitz functions for this distance, endowed with the norm $\|\psi\| = \sup|\psi| + L(\psi)$, where $L(\psi)$ is the Lipschitz constant of $\psi$.
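The distance just defined is easy to compute on finite prefixes. The following sketch makes two assumptions that are not in the paper: the sequence of coefficients is truncated to finitely many terms, and two prefixes that agree on every compared coordinate are given distance 0. The names `tail_sums` and `distance` are hypothetical.

```python
def tail_sums(gammas):
    """Return the list gamma_star with gamma_star[n] = sum of gammas[k] for k >= n.

    Only a finite truncation of the summable sequence is used (assumption).
    """
    gamma_star = [0.0] * (len(gammas) + 1)
    for n in range(len(gammas) - 1, -1, -1):
        gamma_star[n] = gamma_star[n + 1] + gammas[n]
    return gamma_star


def distance(x, y, gamma_star):
    """d(x, y) = gamma_star[n] where n is the first coordinate at which x and y differ."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return gamma_star[n]
    return 0.0  # no difference found on the compared prefix (convention of this sketch)


if __name__ == "__main__":
    gammas = [2.0 ** -(k + 1) for k in range(30)]  # assumed summable coefficients
    gamma_star = tail_sums(gammas)
    print(distance((0, 1, 1, 0), (0, 1, 0, 0), gamma_star))  # first difference at n = 2
```

Sequences that share a longer common prefix are closer, since the tail sums decrease with n.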
Theorem 3.1. ([KMS], [Ma2], [BGF], [Po]) A process satisfying (3.1), being mixing and having a Markov structure is mixing for functions in $L$ in the sense that equation (2.1) is verified for $\varphi \in L^1(\mu)$ and $\psi \in L$ with $\Phi(n) \to 0$ as $n \to \infty$. If $\gamma_n^\star$ is summable, so is $\Phi(n)$.
In what follows, we assume that $\gamma_n^\star$ is summable. For any $\psi \in L$, let $R = -\inf \psi$; then $\sup|\psi + R| \le L(\psi)$, so we have (2.3) for the process
if we are in the context of Section 3. Our arguments can probably be generalized to non-complete situations. In what follows, we shall write $T$ for $T : I \to I$ as well as for $\sigma : \Sigma \to \Sigma$.
Definition 2. ([Br]) Let
$$S_n = \sum_{j=0}^{n-1} (X_j - \mathbb{E}(X_0)) \quad \text{and} \quad M_n = \int \Big(\frac{S_n}{\sqrt{n}}\Big)^2 d\mathbb{P}.$$
The sequence $M_n$ converges to $V$, which we shall call the asymptotic variance.
Proposition 4.1. ([Br], [CM]) The asymptotic variance $V$ is zero if and only if the potential $\log g$ is cohomologous to a constant: $\log g = \log a + u - u \circ T$, with $a > 0$ and $u \in BV$ or $u \in L$.
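The mechanism behind a vanishing asymptotic variance is telescoping: if an observable has the form $u - u \circ T$, its partial sums stay bounded, so their square divided by $n$ tends to 0. The sketch below illustrates only this mechanism on an arbitrary bounded sequence `u`, which is a hypothetical stand-in for the values of $u$ along an orbit and not a computation from the paper.

```python
import random


def scaled_square_of_sum(xs):
    """Return (sum of xs)^2 / len(xs), a pathwise analogue of S_n^2 / n."""
    n = len(xs)
    s = sum(xs)
    return s * s / n


if __name__ == "__main__":
    rng = random.Random(1)
    n = 10000
    u = [rng.random() for _ in range(n + 1)]          # bounded values in [0, 1)
    coboundary = [u[j] - u[j + 1] for j in range(n)]  # telescoping increments
    # The sum collapses to u[0] - u[n], so the scaled square is at most 1 / n.
    print(scaled_square_of_sum(coboundary))
```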
Because we are in a stationary setting, the asymptotic variance is zero if and only if $g$ is in fact constant (the fact that the system is complete is very important here). We deduce a way of testing whether the asymptotic variance is zero. Using Theorem 2.8 or Theorem 3.3, we have that if $g$ is constant,
P(| sup ˆgn,k − inf ˆgn,k| > t) ≤ 2 · (4e−Lt
(^2) n 1 −ε
1 −ε ) + γk.
To use such a result, we have to compute $\sup \hat{g}_{n,k}$ and $\inf \hat{g}_{n,k}$, so we have $\ell^k$ computations to make with $k = \Omega(\ln n)$. A priori, all the constants in the above inequality may be specified. In theory, for $t > 0$, we may find $k$, $n$ satisfying the hypotheses of Theorem 2.8 or Theorem 3.3 so that $\mathbb{P}(|\sup \hat{g}_{n,k} - \inf \hat{g}_{n,k}| > t)$ is smaller than a specified value. If the computed values of $\sup \hat{g}_{n,k}$ and $\inf \hat{g}_{n,k}$ agree with this estimate, this indicates that $g$ is probably constant, so that the asymptotic variance is probably 0.
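The test statistic can be sketched for a symbolic trajectory over a finite alphabet. Everything below is an illustration under assumptions: the function names are hypothetical, words whose denominator count vanishes are skipped, and the two sample trajectories (pseudo-random fair bits and a periodic pattern) are chosen here and do not come from the paper.

```python
import itertools
import random


def count_word(seq, word, n):
    """Number of j in {0, ..., n-1} such that seq[j:j+len(word)] spells word."""
    k = len(word)
    target = tuple(word)
    return sum(1 for j in range(n) if tuple(seq[j:j + k]) == target)


def g_hat(seq, word, n):
    """Empirical conditional frequency, with the (n-1)/n normalisation of the text."""
    denominator = count_word(seq, word[1:], n - 1)
    if denominator == 0:
        return None  # undefined on this sample (convention of this sketch)
    return (n - 1) / n * count_word(seq, word, n) / denominator


def spread_of_estimates(seq, alphabet, k, n):
    """sup minus inf of g_hat over all words of length k for which it is defined."""
    values = []
    for word in itertools.product(alphabet, repeat=k):  # len(alphabet)**k words
        value = g_hat(seq, word, n)
        if value is not None:
            values.append(value)
    return max(values) - min(values)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 20000, 3
    fair_bits = [1 if rng.random() < 0.5 else 0 for _ in range(n + k)]
    print(spread_of_estimates(fair_bits, (0, 1), k, n))        # small: constant potential
    print(spread_of_estimates([0, 0, 1] * 101, (0, 1), 2, 300))  # large: non-constant
```

The loop over `itertools.product` makes the cost in the number of words explicit: it grows exponentially in k, which is why k can only be of logarithmic order in n.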
References
[BGR] A.D. Barbour, R. Gerrard, G. Reinert, Iterates of expanding maps. Probab. Theory Related Fields 116 (2000), no. 2, 151–180.
[BRY] A. Barron, J. Rissanen, B. Yu, The minimum description length principle in coding and modeling. Information theory: 1948–1998. IEEE Trans. Inform. Theory 44 (1998), no. 6, 2743–2760.
[BGF] X. Bressaud, R. Fernandez, A. Galves, Decay of correlations for non-Hölderian dynamics. A coupling approach. Electron. J. Probab. 4 (1999), no. 3, 19 pp.
[Br] A. Broise, Transformations dilatantes de l'intervalle et théorèmes limites. Études spectrales d'opérateurs de transfert et applications. Astérisque 1996, no. 238, 1–109.
[CM] F. Chazal, V. Maume-Deschamps, Statistical properties of Markov dynamical sources: applications to information theory. Discrete Math. Theor. Comput. Sci. 6 (2004), no. 2, 283–
[Co] P. Collet, Some ergodic properties of maps of the interval. Dynamical systems (Temuco, 1991/1992), 55–91, Travaux en Cours, 52, Hermann, Paris, 1996.
[CMS] P. Collet, S. Martinez, B. Schmitt, Exponential inequalities for dynamical measures of expanding maps of the interval. Probab. Theory Related Fields 123 (2002), no. 3, 301–322.
[Cs] I. Csiszár, Large-scale typicality of Markov sample paths and consistency of MDL order estimators. Special issue on Shannon theory: perspective, trends, and applications. IEEE Trans. Inform. Theory 48 (2002), no. 6, 1616–1628.
[CsS] I. Csiszár, P. C. Shields, The consistency of the BIC Markov order estimator. Ann. Statist. 28 (2000), no. 6, 1601–1619.
[DD] J. Dedecker, P. Doukhan, A new covariance inequality and applications. Stochastic Process. Appl. 106 (2003), no. 1, 63–80.
[DP] J. Dedecker, C. Prieur, New dependence coefficients. Examples and applications to statistics. To appear in Probab. Theory and Relat. Fields.
[FKT] P. Flajolet, P. Kirschenhofer, R.F. Tichy, Deviations from uniformity in random strings. Probab. Theory Related Fields 80 (1988), no. 1, 139–150.
[KMS] A. Kondah, V. Maume, B. Schmitt, Vitesse de convergence vers l'état d'équilibre pour des dynamiques markoviennes non höldériennes. Ann. Inst. H. Poincaré Probab. Statist. 33 (1997), no. 6, 675–695.
[Li] C. Liverani, Decay of correlations for piecewise expanding maps. J. Statist. Phys. 78 (1995), no. 3-4, 1111–1129.
[L,S,V] C. Liverani, B. Saussol, S. Vaienti, Conformal measure and decay of correlation for covering weighted systems. Ergodic Theory Dynam. Systems 18 (1998), no. 6, 1399–1420.
[Ma1] V. Maume-Deschamps, Correlation decay for Markov maps on a countable state space. Ergodic Theory Dynam. Systems 21 (2001), no. 1, 165–196.
[Ma2] V. Maume-Deschamps, Propriétés de mélange pour des systèmes dynamiques markoviens. Thèse de l'Université de Bourgogne, available at http://math.u-bourgogne.fr/IMB/maume.
[Pa] F. Paccaut, Propriétés statistiques de systèmes dynamiques non markoviens. Ph.D. Thesis, Université de Bourgogne (2000). Available at http://www.lamfa.u-picardie.fr/paccaut/publi.html
[Po] M. Pollicott, Rates of mixing for potentials of summable variation. Trans. Amer. Math. Soc. 352 (2000), no. 2, 843–853.
[R] J. Rissanen, non k-Markov points. Stochastic complexity in statistical inquiry. World Scientific Series in Computer Science, 15. World Scientific Publishing Co., Inc., Teaneck, NJ, 1989. vi+178 pp.
[Sc] B. Schmitt, Ergodic theory and thermodynamic of one-dimensional Markov expanding endomorphisms. Dynamical systems (Temuco, 1991/1992), 93–123, Travaux en Cours, 52, Hermann, Paris, 1996.
[V] B. Vallée, Dynamical sources in information theory: fundamental intervals and word prefixes. Algorithmica 29 (2001), 262–306.
Université de Bourgogne, B.P. 47870, 21078 Dijon Cedex, FRANCE, vmaume@u-bourgogne.fr