Lecture Notes - Stochastic Processes | Statistics Lecture Notes

0 Notation and Terminology.

This course will be concerned with the applications of information theory concepts in statistics. Much of the course will be based on lectures given by Imre Csiszár at Maryland in 1989. Some recent results about dependent processes will also be given. It is assumed that the reader is familiar with basic information theory ideas as presented, for example, in the initial chapters of the Csiszár-Körner book, and with basic statistical concepts as presented, for example, in the book by Cox and Hinkley. Notation and terminology that will be used in these lectures will be introduced in this section.

The symbol $A = \{a_1, a_2, \ldots, a_{|A|}\}$ will denote a finite set of cardinality $|A|$, and $x_m^n$ will denote the sequence $x_m, x_{m+1}, \ldots, x_n$, where each $x_i \in A$. The set of all $n$-length sequences $x_1^n$ will be denoted by $A^n$, the set of all infinite sequences $x = x_1^\infty$, with $x_i \in A$, $i \geq 1$, will be denoted by $A^\infty$, and the set of all finite sequences drawn from $A$ will be denoted by $A^*$. If $u$ and $v$ are finite-length sequences then their concatenation is denoted by $uv$, and $u^k = u^{k-1}u$, $k > 1$.

The entropy $H(P)$ of a probability distribution $P = (P(a))$ on $A$ is defined by the formula

$$H(P) = -\sum_{a \in A} P(a) \log P(a),$$

where here, as elsewhere in these lectures, base two logarithms are used. Random variable notation is often used in this context, that is, $H(X)$ denotes the entropy of the distribution $P$ of the random variable $X$. If $P$ and $Q$ are two distributions on $A$ then their divergence or cross-entropy is defined by

$$D(P\|Q) = \sum_{a \in A} P(a) \log \frac{P(a)}{Q(a)}.$$
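The two definitions above can be checked numerically. The following is a minimal sketch, not part of the original lectures: the function names `entropy` and `divergence` and the example distributions are hypothetical, and the code assumes the usual conventions that terms with $P(a) = 0$ contribute nothing and that $D(P\|Q) = \infty$ when $P(a) > 0 = Q(a)$ for some $a$.

```python
import math

def entropy(p):
    """Entropy H(P) in bits; p maps each symbol a to P(a)."""
    # Assumed convention: terms with P(a) = 0 contribute 0.
    return -sum(pa * math.log2(pa) for pa in p.values() if pa > 0)

def divergence(p, q):
    """Divergence D(P||Q) in bits; p and q map symbols to probabilities."""
    total = 0.0
    for a, pa in p.items():
        if pa == 0:
            continue  # assumed convention: 0 log(0/q) = 0
        qa = q.get(a, 0.0)
        if qa == 0:
            return math.inf  # assumed convention: P(a) > 0 = Q(a)
        total += pa * math.log2(pa / qa)
    return total

# Hypothetical example distributions on A = {a, b, c}.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}

print(entropy(P))        # 1.5
print(divergence(P, Q))  # 0.25
print(divergence(P, P))  # 0.0
```

Note that `divergence(P, Q)` and `divergence(Q, P)` need not agree in general, since the definition is not symmetric in $P$ and $Q$.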

If $P$ is the joint distribution of two random variables $(X, Y)$ then their joint entropy is defined by

$$H(X, Y) = -\sum_{(a,b)} P(a, b) \log P(a, b),$$

while the conditional entropy $H(X|Y)$ and mutual information $I(X \wedge Y)$ are defined, respectively, by

$$H(X|Y) = H(X, Y) - H(Y),$$

$$I(X \wedge Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).$$
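As an illustration of these identities, the sketch below computes the joint entropy, conditional entropy, and mutual information from a small joint distribution. The joint distribution and the helper names `H` and `marginals` are hypothetical choices made for this example only.

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint {(a, b): P(a, b)}."""
    px, py = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        px[a] += p
        py[b] += p
    return dict(px), dict(py)

# Hypothetical joint distribution: X is a fair bit and Y agrees with X
# with probability 3/4.
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
px, py = marginals(joint)

h_xy = H(joint)                      # H(X, Y)
h_x_given_y = h_xy - H(py)           # H(X|Y) = H(X, Y) - H(Y)
h_y_given_x = h_xy - H(px)           # H(Y|X) = H(X, Y) - H(X)
mutual_info = H(px) + H(py) - h_xy   # I(X ^ Y)

# The three expressions for the mutual information agree up to rounding.
print(round(mutual_info, 3), round(H(px) - h_x_given_y, 3),
      round(H(py) - h_y_given_x, 3))
```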

Two types of codes will be of interest. A block code is a mapping $C: A^n \mapsto B^m$, while a variable-length code is a mapping $C: A^n \mapsto B^*$. The length function $L: A^n \mapsto \{1, 2, \ldots\}$ for a variable-length code is defined by the formula

$$C(x_1^n) = b_1^{L(x_1^n)},$$

so that $L(x_1^n)$ is the number of symbols in the codeword $C(x_1^n)$. Thus, in particular, a block code is just a variable-length code whose length function is constant.
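A small concrete instance may help fix the definitions. In the sketch below a code is represented as a Python dictionary from source blocks to codewords; the alphabet, the two example codes, and the helper names `length_function` and `is_block_code` are all hypothetical and chosen only for illustration.

```python
from itertools import product

A = ("a", "b")   # hypothetical source alphabet
n = 2            # block length

# All n-length source sequences, i.e. the elements of A^n.
blocks = ["".join(t) for t in product(A, repeat=n)]

# Two hypothetical codes with binary image alphabet B = {0, 1}.
block_code = {"aa": "00", "ab": "01", "ba": "10", "bb": "11"}
variable_code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}

def length_function(code):
    """Return L as a dictionary: L(x) is the length of the codeword C(x)."""
    return {x: len(word) for x, word in code.items()}

def is_block_code(code):
    """A code is a block code exactly when its length function is constant."""
    return len(set(length_function(code).values())) == 1

print(blocks)                          # ['aa', 'ab', 'ba', 'bb']
print(length_function(variable_code))  # {'aa': 1, 'ab': 2, 'ba': 3, 'bb': 3}
print(is_block_code(block_code), is_block_code(variable_code))  # True False
```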

A block code $C$ is invertible (or faithful) if it is one-to-one. A variable-length code is uniquely decodable if for any two distinct sequences, $u(1), u(2), \ldots, u(m)$ and $v(1), v(2), \ldots, v(k)$, where $u(i), v(j) \in A^n$ for all $i, j$, the concatenations of the images, $C(u(1))C(u(2)) \cdots C(u(m))$ and $C(v(1))C(v(2)) \cdots C(v(k))$, are not equal. A condition that guarantees unique decodability is the prefix condition. A variable-length code $C$ satisfies the prefix condition if

$$C(v) = C(u)w, \quad u, v \in A^n, \; w \in B^* \;\Rightarrow\; w = \Lambda, \; u = v,$$

where $\Lambda$ denotes the empty string.
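The prefix condition is easy to test mechanically: it fails exactly when, for some $u \neq v$, the codeword $C(u)$ is an initial segment of $C(v)$ (the case $C(u) = C(v)$ included). The following sketch assumes a code is stored as a dictionary from source blocks to codeword strings; the function name and the two example codes are hypothetical.

```python
def satisfies_prefix_condition(code):
    """Check the prefix condition for a code given as {block: codeword}."""
    items = list(code.items())
    for u, cu in items:
        for v, cv in items:
            # A violation: C(v) = C(u)w with u != v (w may be empty or not).
            if u != v and cv.startswith(cu):
                return False
    return True

# Hypothetical examples over the binary image alphabet.
prefix_code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}
suffix_code = {"aa": "0", "ab": "01", "ba": "011", "bb": "111"}

print(satisfies_prefix_condition(prefix_code))  # True
print(satisfies_prefix_condition(suffix_code))  # False
```

The second example is still uniquely decodable, since its codewords read backwards form a prefix code; this shows that the prefix condition is sufficient but not necessary for unique decodability.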

In most cases of interest to us, the image alphabet will be binary, that is, $B = \{0, 1\}$. It is easy to see that the length function for a binary prefix code must satisfy the so-called Kraft inequality,

$$\sum_{x_1^n \in A^n} 2^{-L(x_1^n)} \leq 1.$$
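The Kraft sum of a given list of codeword lengths can be computed exactly with rational arithmetic. This is a minimal sketch; the function name `kraft_sum` and the example length lists are hypothetical.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of the sum of 2^(-L) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

# Lengths of the binary prefix code 0, 10, 110, 111.
print(kraft_sum([1, 2, 3, 3]))        # 1

# No binary prefix code can have these lengths: the sum exceeds 1.
print(kraft_sum([1, 1, 2]))           # 5/4
print(kraft_sum([1, 1, 2]) <= 1)      # False
```

Using `Fraction` rather than floating point avoids rounding questions when the sum is exactly 1.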

It can in fact be shown that a uniquely decodable binary code also satisfies the Kraft inequality, and that if $L$ is a positive integer-valued function on $A^n$ for which the Kraft inequality holds then there is a binary prefix code $C$ whose length function is $L$. (Thus, in particular, for any uniquely decodable code $C$ with length function $L$ there is a prefix code $\tilde{C}$ whose length function is also $L$.) The reason for the connection between the Kraft inequality and prefix codes is the connection between the Kraft inequality and binary trees, a connection that we now sketch.
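As an aside, the converse claim in the paragraph above (a length function satisfying the Kraft inequality is the length function of some binary prefix code) can be illustrated constructively. The sketch below uses the standard construction that assigns codewords in order of increasing length; it is one possible construction, not necessarily the tree argument of these notes, and the function name and example lengths are hypothetical.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code {block: codeword} from {block: length}.

    Assumes every length is a positive integer; raises ValueError if the
    Kraft inequality fails.
    """
    if sum(Fraction(1, 2 ** l) for l in lengths.values()) > 1:
        raise ValueError("Kraft inequality violated")
    code = {}
    value = 0      # next codeword, read as a binary integer
    prev_len = 0   # length of the previously assigned codeword
    for x, l in sorted(lengths.items(), key=lambda kv: kv[1]):
        value <<= (l - prev_len)                   # pad with zeros to length l
        code[x] = format(value, "0{}b".format(l))  # l-bit binary string
        value += 1                                 # skip past this codeword
        prev_len = l
    return code

print(prefix_code_from_lengths({"aa": 1, "ab": 2, "ba": 3, "bb": 3}))
# {'aa': '0', 'ab': '10', 'ba': '110', 'bb': '111'}
```

Processing the lengths in increasing order ensures that each new codeword is numerically larger than every earlier one after padding, so no earlier codeword can be a prefix of a later one; the Kraft inequality guarantees that the counter never runs out of $l$-bit strings.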

A (binary) tree is a directed graph $(V, E)$, along with a distinguished vertex $r \in V$, called the root, such that the following properties hold.

1. The outdegree of each vertex is at most 2.

2. The indegree of the root is 0. The indegree of all other vertices is exactly 1.

3. Given any $v \in V - r$ there is a directed path from $r$ to $v$.
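The three properties can be verified directly for a finite directed graph. The sketch below is one way to do so, assuming the graph is given as a collection of vertices and a list of directed edges $(u, v)$; the function name `is_binary_tree` and the example graphs are hypothetical.

```python
from collections import deque

def is_binary_tree(vertices, edges, root):
    """Check properties 1-3 for the directed graph (vertices, edges)."""
    children = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Property 1: the outdegree of each vertex is at most 2.
    if any(len(ch) > 2 for ch in children.values()):
        return False
    # Property 2: indegree 0 at the root, exactly 1 everywhere else.
    if indegree[root] != 0:
        return False
    if any(indegree[v] != 1 for v in vertices if v != root):
        return False
    # Property 3: every vertex is reachable from the root (breadth-first).
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == set(vertices)

# Vertices labelled by binary strings; the root is the empty string.
V = ["", "0", "1", "10", "11"]
E = [("", "0"), ("", "1"), ("1", "10"), ("1", "11")]
print(is_binary_tree(V, E, ""))  # True

# Adding a separate 2-cycle keeps properties 1 and 2 but breaks property 3.
print(is_binary_tree(V + ["x", "y"], E + [("x", "y"), ("y", "x")], ""))  # False
```

The second example shows why property 3 is not implied by the degree conditions alone.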

It is easy to see from the above that there is only one path from $r$ to any $v \neq r$; the length of this path is called


2 I-projections.

6 Redundancy bounds.

8 Additions.

8.1 The scaling formula.

8.2 Pearson's $\chi^2$.
