



Material Type: Notes; Professor: Barg; Subject: Electrical & Computer Engineering; University: University of Maryland; Term: Unknown 1989;
Typology: Study notes




ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 3 (draft; 9/1/03). Decoding of codes.
Our task of code construction is not complete if we cannot decode the codes we are studying. Decoding is another major task in coding theory, studied in a variety of scenarios.
Definition 1. Given a code $C \subset H_q^n$ and a point $y \in H_q^n$, maximum likelihood, or complete, decoding takes $y$ to (one of) the code vectors closest to it in the Hamming distance.
Strictly speaking, this procedure should be called minimum distance decoding; the name “maximum likelihood” is justified below. Intuitively it seems natural to assume that this decoding is the best we can hope for in terms of minimizing the error rate. Note that if a vector $x \in C$ (say, $C$ is a binary code) is sent over a binary symmetric channel $W$ with crossover probability $p$, the probability that a vector $y$ is received at its output is $P(y|x) = W^n(y|x) = p^w(1-p)^{n-w}$, where $w = d(x, y)$. If $c$ is the decoding result, the probability of a decoding error equals $1 - P(c|y)$. Therefore, the mapping $\psi$ that minimizes the probability of error is given by
$$\psi(y) = c_i \iff P(c_i|y) > P(c_j|y) \quad \forall j \ne i.$$
Letting $P(y) = \sum_{c \in C} P(c)\, W^n(y|c)$, we have
$$W^n(y|c) = \frac{P(c|y)\, P(y)}{P(c)}.$$
If we assume that $P(c) = 1/M$ for every $c \in C$, where $M = |C|$, then maximum likelihood decoding is defined equivalently as
$$\psi(y) = c_i \iff W^n(y|c_i) > W^n(y|c_j) \quad \forall j \ne i.$$
Finally, note that for $p < 1/2$ maximizing $W^n(y|c)$ is equivalent to minimizing the distance $d(y, c)$, so minimum distance decoding is indeed max-likelihood, or ML, decoding.
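The equivalence of the two rules can be checked by brute force on a toy example. The following is only a sketch: the four-word binary code of length 5 and the helper names (`hamming_distance`, `bsc_likelihood`, `ml_decode`, `min_distance_decode`) are assumptions made for illustration and do not come from the lecture.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def bsc_likelihood(y, c, p):
    """W^n(y|c) = p^w (1-p)^(n-w) with w = d(y, c), for a BSC(p)."""
    w = hamming_distance(y, c)
    return p**w * (1 - p)**(len(y) - w)

def ml_decode(y, code, p):
    """Codeword maximizing the likelihood (first one wins on ties)."""
    return max(code, key=lambda c: bsc_likelihood(y, c, p))

def min_distance_decode(y, code):
    """Codeword closest to y in Hamming distance (first one wins on ties)."""
    return min(code, key=lambda c: hamming_distance(y, c))

# Hypothetical toy code, chosen only for this illustration.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
p = 0.1  # any crossover probability below 1/2

# For every received vector the two decoders land at the same distance.
agree = all(
    hamming_distance(y, ml_decode(y, code, p))
    == hamming_distance(y, min_distance_decode(y, code))
    for y in product((0, 1), repeat=5)
)
print(agree)
```

The comparison is made on distances rather than on the decoded words themselves, so that the outcome does not depend on how ties are broken.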
We know from ENEE722 that max-likelihood decoding of a linear binary $[n, k, d]$ code $C$ can be implemented by the standard array (or the syndrome trellis, or other equivalent means). The problem is that for codes of large length and rate not very close to 0 or 1 all these methods are computationally very involved. Therefore coding theory is also concerned with restricted decoding procedures. One of the most commonly used is bounded distance decoding. This name is used loosely for all procedures that find the codeword closest to the received vector in a sphere of some fixed radius $t$ around it if this sphere contains codewords, and return “erasure” if it does not. For $t = \lfloor (d-1)/2 \rfloor$ such a codeword (if it exists) is always unique; for greater $t$ the sphere will sometimes contain more than one codeword. When the sphere contains several codewords it is sometimes desirable to output them all as the decoding result: this procedure is called list decoding, and its result is called a list.
Clearly, minimum distance decoding is the same as bounded distance decoding with $t = r(C)$, the covering radius of the code. In general, the intuition about bounded distance decoding is as follows: shrinking the radius increases the probability of erasure and decreases the probability of undetected error. The other extreme (compared to minimum distance decoding), called (pure) error detection, is considered toward the end of this lecture.
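Bounded distance and list decoding as described above can be sketched by exhaustive search over the codewords. This is an illustrative sketch, not an efficient decoder: the toy code, the function names and the use of `None` to represent an erasure are all assumptions of this example.

```python
def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def list_decode(y, code, t):
    """All codewords inside the sphere of radius t around y."""
    return [c for c in code if hamming_distance(y, c) <= t]

def bounded_distance_decode(y, code, t):
    """Closest codeword within distance t of y, or None for an erasure."""
    candidates = list_decode(y, code, t)
    if not candidates:
        return None  # erasure: the sphere contains no codewords
    return min(candidates, key=lambda c: hamming_distance(y, c))

# Hypothetical toy code with minimum distance d = 3, so floor((d-1)/2) = 1.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
y = (1, 0, 0, 1, 0)

print(bounded_distance_decode(y, code, 1))  # radius 1: erasure
print(list_decode(y, code, 2))              # radius 2: two codewords in the sphere
```

With $t = 1$ the received vector `y` is erased, while with $t = 2$ the sphere around it already contains two codewords, which is exactly the situation in which list decoding becomes relevant.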
To describe ML decoding in geometric terms, define the Voronoi region $D(c, C)$ of a code point $c \in C$ as follows:
$$D(c, C) = \{y \in H_q^n : d(y, c) < d(y, c') \ \ \forall c' \in C \setminus \{c\}\}.$$
This defines a partition of $H_q^n$ into decoding (Voronoi) regions of the code points (with ties broken arbitrarily). We have
$$P_e(c) = \Pr[H_q^n \setminus D(c, C)],$$
where Pr is computed with respect to the binomial probability distribution defined by the channel and with “center” at c.
What to remember: ML decoding is optimal but generally infeasible (except for short codes, or codes with very few codewords, or codes of rate close to one).
Define the average and maximum error probability for the code $C$ as
$$P_e(C) = \frac{1}{M} \sum_{c \in C} P_e(c), \qquad P_{e,m}(C) = \max_{c \in C} P_e(c).$$
The quantity Pe(C) is one of the main parameters of the code, with lots of attention devoted to it in the literature. For communication applications the error probability is more important than the minimum distance which does not provide much information about the decoding performance except for low noise.
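For very short binary codes both quantities can be computed exactly by enumerating the Voronoi regions. The sketch below assumes a BSC with crossover probability `p` and counts every tie as an error, in line with the strict inequality in the definition of $D(c, C)$; the function names and the repetition code used as input are illustrative assumptions.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def voronoi_region(c, code):
    """All binary vectors strictly closer to c than to any other codeword."""
    n = len(c)
    return [
        y for y in product((0, 1), repeat=n)
        if all(hamming_distance(y, c) < hamming_distance(y, c2)
               for c2 in code if c2 != c)
    ]

def error_probability(c, code, p):
    """P_e(c) = 1 - Pr[D(c, C) | c sent] over a BSC(p)."""
    n = len(c)
    p_correct = sum(
        p**hamming_distance(y, c) * (1 - p)**(n - hamming_distance(y, c))
        for y in voronoi_region(c, code)
    )
    return 1 - p_correct

def average_and_max_error(code, p):
    """Return (P_e(C), P_{e,m}(C)) for equiprobable codewords."""
    per_word = [error_probability(c, code, p) for c in code]
    return sum(per_word) / len(per_word), max(per_word)

# Illustrative input: the binary repetition code of length 3.
repetition = [(0, 0, 0), (1, 1, 1)]
avg, worst = average_and_max_error(repetition, 0.1)
print(avg, worst)
```

For the repetition code of length 3 an error occurs exactly when two or three bits are flipped, so both values equal $3p^2(1-p) + p^3$. The enumeration visits all $2^n$ vectors for each codeword and is therefore only usable for small $n$.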
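For bounded distance decoding the probabilities of undetected error (miscorrection) and erasure can be defined analogously:
$$P_{e,t}(C) = \frac{1}{M} \sum_{c \in C} \Pr\Big[\bigcup_{c' \in C \setminus \{c\}} B_t(c') \,\Big|\, c\Big] \qquad \text{(undetected error)},$$
$$P_{x,t}(C) = \frac{1}{M} \sum_{c \in C} \Pr\Big[H_q^n \setminus \bigcup_{c' \in C} B_t(c') \,\Big|\, c\Big] \qquad \text{(erasure)},$$
where $B_t(c')$ is the ball of radius $t$ centered at $c'$.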
Remark. For a linear code C Voronoi regions of all code points are congruent (the set of correctable errors for any code point is the same), and so Pe(C) = Pe(c), Pe,t(C) = Pe,t(c), ... for any code point c.
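The undetected-error and erasure probabilities of bounded distance decoding can likewise be evaluated by enumeration for a toy code. The sketch below assumes a BSC(`p`) and follows the two definitions literally: a received vector counts as an undetected error if it lies in a ball of radius `t` around some codeword other than the transmitted one, and as an erasure if it lies in no ball at all. The function name and the repetition code of length 4 are assumptions of this example.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def bounded_distance_probabilities(code, t, p):
    """Return (P_{e,t}(C), P_{x,t}(C)) for a binary code over a BSC(p)."""
    n, M = len(code[0]), len(code)
    undetected = erasure = 0.0
    for c in code:
        for y in product((0, 1), repeat=n):
            w = hamming_distance(y, c)
            prob = p**w * (1 - p)**(n - w)
            in_other_ball = any(
                hamming_distance(y, c2) <= t for c2 in code if c2 != c
            )
            if in_other_ball:
                undetected += prob      # y falls into the ball of a wrong codeword
            elif w > t:
                erasure += prob         # y falls into no ball at all
    return undetected / M, erasure / M

# Illustrative input: repetition code of length 4, decoded up to t = 1.
repetition4 = [(0, 0, 0, 0), (1, 1, 1, 1)]
p_undetected, p_erasure = bounded_distance_probabilities(repetition4, 1, 0.1)
print(p_undetected, p_erasure)
```

For this code the vectors of weight 2 are erased and those of weight 3 or 4 are miscorrected, so the two values are $4p^3(1-p) + p^4$ and $6p^2(1-p)^2$ respectively.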
Computing $P_e(C)$ is generally a difficult task. For instance, for a linear code $C$
$$P_e(C) = \sum_{x \text{ not a coset leader}} \Big(\frac{p}{q-1}\Big)^{\mathrm{wt}(x)} (1-p)^{n-\mathrm{wt}(x)}.$$
However for most codes no reasonable bounds for the weight distribution of coset leaders are known (a few exceptions, apart from the trivial cases like the Hamming code, are the duals of the 2-error-correcting BCH codes and a few related cases [1, 2]). Hence we are faced with a search for bounds on Pe(C) from both sides. This is one of the main topics of coding theory with hundreds of papers devoted to it; this is also the main topic of such books as [4, 9, 3].
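For a short binary linear code the coset leaders can be found by brute force, which makes the formula for $P_e(C)$ directly computable. The sketch below uses the $[7,4]$ Hamming code, one of the trivial cases mentioned above; the parity-check matrix `H` and the function names are assumptions of this example, and one leader is chosen per coset (ties, which do not occur for this code, would be broken arbitrarily).

```python
from itertools import product

# Parity-check matrix of the [7,4] Hamming code: columns are 1..7 in binary.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(x, H):
    """Syndrome H x^T over GF(2), returned as a tuple."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

def coset_leaders(H):
    """Map each syndrome to a minimum-weight vector of its coset."""
    n = len(H[0])
    leaders = {}
    # Visiting vectors in order of increasing weight makes the first
    # vector seen for each syndrome a coset leader.
    for x in sorted(product((0, 1), repeat=n), key=sum):
        leaders.setdefault(syndrome(x, H), x)
    return leaders

def error_probability_linear(H, p):
    """Sum of p^wt(x) (1-p)^(n-wt(x)) over all x that are not coset leaders."""
    n = len(H[0])
    leaders = set(coset_leaders(H).values())
    return sum(
        p**sum(x) * (1 - p)**(n - sum(x))
        for x in product((0, 1), repeat=n)
        if x not in leaders
    )

print(error_probability_linear(H, 0.05))
```

Since the Hamming code is perfect, its coset leaders are the zero vector and the seven vectors of weight 1, so the result agrees with the closed form $1 - (1-p)^7 - 7p(1-p)^6$.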
Bounds on the error probability Pe(C) of complete decoding
Bounding Pe(C) is a difficult task. Essentially the only technique available relies upon the distance distribution of the code C; it is likely that this is the best way to estimate Pe in the general case. Traditionally, upper bounds on Pe(C) receive more attention than lower bounds. This is justifiable because upper bounds give some level of confidence in estimating performance of communication systems.
It is important to realize right away that, with all the literature devoted to these bounds, the mainstream technique relies on a combination of several trivial observations presented in this section. One simple idea, the union bound principle, is to upper bound the error probability as follows:
$$P_e(C) \le \frac{1}{M} \sum_{c \in C} \sum_{c' \in C \setminus \{c\}} P(c \to c'), \tag{1}$$
where
$$P(c \to c') := \Pr\{x \in H_q^n : d(x, c') \le d(x, c)\}$$
is the probability of the half-space cut out by the median hyperplane between $c$ and $c'$. It is clear that this approach is generally not optimal because the probability of large parts of $H_q^n$ is counted many times.
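The overcounting in the union bound can be seen numerically on toy codes. The sketch below assumes a BSC(`p`), computes each pairwise term $P(c \to c')$ by enumeration, and compares the bound (1) with the exact error probability (ties counted as errors). All names and both example codes are illustrative assumptions.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def prob_received(y, c, p):
    """Probability of receiving y when c is sent over a BSC(p)."""
    w = hamming_distance(y, c)
    return p**w * (1 - p)**(len(c) - w)

def pairwise_error(c, c2, p):
    """P(c -> c2): probability of the half-space of points at least as close to c2."""
    return sum(
        prob_received(y, c, p)
        for y in product((0, 1), repeat=len(c))
        if hamming_distance(y, c2) <= hamming_distance(y, c)
    )

def union_bound(code, p):
    """Right-hand side of (1), averaged over equiprobable codewords."""
    total = sum(pairwise_error(c, c2, p) for c in code for c2 in code if c2 != c)
    return total / len(code)

def exact_error(code, p):
    """Exact P_e(C): y is an error if some other codeword is at least as close."""
    total = 0.0
    for c in code:
        for y in product((0, 1), repeat=len(c)):
            d = hamming_distance(y, c)
            if any(hamming_distance(y, c2) <= d for c2 in code if c2 != c):
                total += prob_received(y, c, p)
    return total / len(code)

repetition = [(0, 0, 0), (1, 1, 1)]
toy = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
print(union_bound(repetition, 0.1), exact_error(repetition, 0.1))
print(union_bound(toy, 0.1), exact_error(toy, 0.1))
```

With only two codewords there is a single half-space and the bound is exact; with more codewords the half-spaces overlap and the bound can only be larger than the true value.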
Theorem 1. Let $C \subset H_q^n$ be a code with distance $d$ used over the qSC with crossover probability $p/(q-1)$. Then the error probability of complete decoding satisfies
$$P_e(C) \le \sum_{w=d}^{n} B_w \sum_{e=\lceil w/2 \rceil}^{n} \pi(e) \sum_{s=0}^{e} p^w_{e,s}, \tag{2}$$
where $p^w_{e,s}$ is the intersection number of $H_q^n$ and $\pi(e) = \big(\frac{p}{q-1}\big)^e (1-p)^{n-e}$.
Let $\rho = r/n \le 1/2$. The maximum over $i$ of the product $\binom{w}{i}\binom{n-w}{r-i}$ is attained when $i \approx \rho w \le w/2$; hence,
$$P_1 \cong \sum_{r=\lceil w/2 \rceil}^{\lfloor n/2 \rfloor} \binom{w}{w/2} \binom{n-w}{r-w/2}\, p^r (1-p)^{n-r}.$$
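Taking logarithms, we obtain
$$P_1 \cong \max_{\omega/2 \le \rho \le \omega} \exp\Big[n\Big(\omega + (1-\omega)\, h_2\Big(\frac{\rho - \omega/2}{1-\omega}\Big) + \rho \log_2 p + (1-\rho) \log_2 (1-p)\Big)\Big],$$
where $\omega = w/n$ and $\exp$ is taken to base 2. The exponent is maximized for $\rho = \omega/2 + (1-\omega)p$. Substituting, we obtain
$$P_1 \cong \exp\big[n \omega \log_2 2\sqrt{p(1-p)}\big].$$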
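The substitution step can be checked numerically. The sketch below assumes that the exponent being maximized is the normalized base-2 logarithm of the summand $\binom{w}{w/2}\binom{n-w}{r-w/2} p^r (1-p)^{n-r}$, namely $\omega + (1-\omega) h_2\big(\frac{\rho-\omega/2}{1-\omega}\big) + \rho \log_2 p + (1-\rho)\log_2(1-p)$; the function names and the sample values of $\omega$ and $p$ are assumptions of this example.

```python
from math import log2, sqrt

def h2(x):
    """Binary entropy function, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def exponent(rho, omega, p):
    """Assumed exponent: normalized log2 of the dominating summand."""
    x = (rho - omega / 2) / (1 - omega)
    x = min(max(x, 0.0), 1.0)  # guard against rounding at the endpoints
    return omega + (1 - omega) * h2(x) + rho * log2(p) + (1 - rho) * log2(1 - p)

omega, p = 0.3, 0.1                      # sample values, chosen arbitrarily
rho_star = omega / 2 + (1 - omega) * p   # claimed maximizer
closed_form = omega * log2(2 * sqrt(p * (1 - p)))

# Grid search over all rho for which the entropy argument lies in [0, 1].
grid = [omega / 2 + (1 - omega) * k / 1000 for k in range(1001)]
grid_max = max(exponent(r, omega, p) for r in grid)

print(exponent(rho_star, omega, p), closed_form, grid_max)
```

At $\rho = \omega/2 + (1-\omega)p$ the entropy argument equals $p$, the terms containing $h_2(p)$ cancel, and what remains is $\omega \log_2 2\sqrt{p(1-p)}$, which the three printed values reflect.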
Of course, $-\omega \log_2 2\sqrt{p(1-p)} < -\log_2 2\sqrt{p(1-p)}$, so (4) is proved. Let us show that under the condition of the lemma,
$$F(\omega) := -\omega \log_2 2\sqrt{p(1-p)} - D(\omega\|p) < 0.$$
First we check that $F\big[2\sqrt{p(1-p)}/(1 + 2\sqrt{p(1-p)})\big] < 0$ and $F(1/2) < 0$. Next, $F'(\omega) < 0$ for
$$\frac{\sqrt{p}}{\sqrt{p} + 2\sqrt{1-p} - 2p\sqrt{1-p}} < \omega < 1/2.$$
Finally,
$$\frac{\sqrt{p}}{\sqrt{p} + 2\sqrt{1-p} - 2p\sqrt{1-p}} < \frac{2\sqrt{p(1-p)}}{1 + 2\sqrt{p(1-p)}}$$
for all $0 < p < 1/2$. This shows that $F(\omega) < 0$, so our claim is proved.
Theorem 4. (Bhattacharyya bound) Let $C$ be a code with distance enumerator $B(x, y)$ used for transmission over a BSC($p$). Then
$$P_e(C) \le B\big(1, 2\sqrt{p(1-p)}\big) - 1.$$
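Proof: An asymptotic equivalent of the claimed inequality follows directly from the union bound (1) and the previous lemma (part (i)). Let us prove that the stronger version claimed here is also true. Let $c \in C$ be the transmitted vector, which can be assumed all-zero. Then
$$\begin{aligned}
P_e(c) &\le \sum_{c' \in C \setminus \{c\}} P(c \to c') \le \sum_{c'} \sum_{y:\, d(c',y) \le d(c,y)} P(y|c) \\
&\le \sum_{c'} \sum_{y:\, d(c',y) \le d(c,y)} \sqrt{P(y|c)\, P(y|c')} \\
&\le \sum_{c'} \sum_{y \in H_2^n} \sqrt{P(y|c)\, P(y|c')} = \sum_{c'} \prod_{i=1}^{n} \sum_{y=0}^{1} \sqrt{P(y|c_i)\, P(y|c'_i)} \\
&= \sum_{c'} \prod_{i:\, c'_i = 1} \sum_{y=0}^{1} \sqrt{P(y|c_i)\, P(y|c'_i)} = \sum_{c'} \prod_{i:\, c'_i = 1} 2\sqrt{p(1-p)} \\
&= \sum_{w=1}^{n} B_w(c) \big(2\sqrt{p(1-p)}\big)^w,
\end{aligned}$$
where $B_w(c)$ is the number of neighbors of $c$ in $C$ at distance $w$. The passage to $\sqrt{P(y|c)\,P(y|c')}$ is valid because $P(y|c') \ge P(y|c)$ whenever $d(c', y) \le d(c, y)$, and each coordinate with $c'_i = c_i$ contributes the factor $\sum_y P(y|c_i) = 1$ to the product. The proof is completed by averaging over $c$.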
The Bhattacharyya bound is good for small values of p relative to the code distance and deteriorates for stronger noise. For (much) more information on this bound see [9].
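As a small numerical illustration, the bound can be evaluated for the $[7,4]$ Hamming code and compared with its exact error probability $1 - (1-p)^7 - 7p(1-p)^6$. The sketch assumes a particular systematic generator matrix `G` for this code and uses the fact that for a linear code the distance enumerator coincides with the weight enumerator; the function names are assumptions of this example.

```python
from itertools import product
from math import sqrt

# An assumed systematic generator matrix [I | P] of the [7,4] Hamming code.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def weight_distribution(G):
    """counts[w] = number of codewords of weight w, by enumerating all messages."""
    n = len(G[0])
    counts = [0] * (n + 1)
    for msg in product((0, 1), repeat=len(G)):
        word = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        counts[sum(word)] += 1
    return counts

def bhattacharyya_bound(counts, p):
    """B(1, gamma) - 1 with gamma = 2 sqrt(p (1-p))."""
    gamma = 2 * sqrt(p * (1 - p))
    return sum(b * gamma**w for w, b in enumerate(counts) if w > 0)

def hamming74_exact(p):
    """Exact P_e of the perfect [7,4] code: more than one bit error."""
    return 1 - (1 - p)**7 - 7 * p * (1 - p)**6

counts = weight_distribution(G)
for p in (0.001, 0.01, 0.1):
    print(p, bhattacharyya_bound(counts, p), hamming74_exact(p))
```

The printed pairs make it possible to see how the gap between the bound and the exact value behaves as $p$ grows.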
The main result about the error probability of decoding for the qSC is given in the following theorem.
Theorem 5. Let $\mathcal{C} = \log_2 q - h_q(p)$. For any rate $0 \le R < \mathcal{C}$ there exists a sequence of codes $C_n \subset H_q^n$, $n = 1, 2, \dots$, of length $n$ such that $R(C_n) \to R$ as $n \to \infty$ and
$$P_e(C_n) < \exp(-n(E(R, p) - o(1))), \tag{6}$$
where $E(R, p) > 0$ is a convex, monotone decreasing function of $R$. Conversely, for $R > \mathcal{C}$ the error probability $P_e(C_n)$ of any sequence of codes approaches one.
This theorem is a simple corollary of Lemma 2 and the Bhattacharyya bound if one substitutes in it the weight profile $\alpha_0$ of a random linear code. However, I prefer another proof, which develops geometric intuition about the decoding process (one of the next lectures).
What to remember: There are code sequences which under complete (ML) decoding have error probability that falls exponentially with the code length for any code rate below capacity. For instance, random linear codes have this property (and achieve the best known error exponent), and generally a good code sequence is expected to have this property. Analogous results hold for a large class of information transmission channels. The quantity $\mathcal{C} = 1 - h_2(p)$, which depends only on $p$, is called the capacity of the channel (here, the BSC). For the Gaussian channel with signal-to-noise ratio $A$,
$$\mathcal{C} = \tfrac{1}{2} \ln(1 + A). \tag{7}$$
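The capacity expressions are easy to evaluate numerically. The sketch below assumes the usual $q$-ary entropy function $h_q(p) = -p \log_2 \frac{p}{q-1} - (1-p)\log_2(1-p)$ and returns the Gaussian capacity in nats, as in (7); the function names are assumptions of this example.

```python
from math import log, log2

def hq(p, q=2):
    """q-ary entropy function in bits; hq(0) = 0 by continuity."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return log2(q - 1)
    return -p * log2(p / (q - 1)) - (1 - p) * log2(1 - p)

def qsc_capacity(p, q=2):
    """Capacity log2(q) - hq(p) of the q-ary symmetric channel, in bits."""
    return log2(q) - hq(p, q)

def gaussian_capacity(snr):
    """Capacity (1/2) ln(1 + A) of the Gaussian channel, in nats."""
    return 0.5 * log(1 + snr)

print(qsc_capacity(0.11))        # BSC with p = 0.11
print(qsc_capacity(0.75, q=4))   # 4-ary channel at p = 1 - 1/q
print(gaussian_capacity(3.0))
```

At $p = 1 - q^{-1}$ the channel output is uniform regardless of the input, and the computed capacity is zero.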
Though Theorem 5 says that there are codes of sufficiently large length for which $P_e$ falls as long as $R(C) < \mathcal{C}$, and that for any code $P_e \to 1$ if $R(C) > \mathcal{C}$, it does not imply any conclusions for a particular code $C$. However, a similar result is still possible. We begin with an auxiliary statement.
Theorem 6. Suppose a code $C \subset H_q^n$ is used for transmission over a qSC with transition probability $p$, $0 < p < 1 - q^{-1}$. Then the error probability of complete decoding $P_e(C) = P_e(C, p)$ is a continuous monotone increasing function of $p$, and $P_e(C, 0) = 0$, $P_e(C, 1 - q^{-1}) > 1/2$.
Proof: Only the inequality $P_e(C, 1 - q^{-1}) > 1/2$ is nonobvious. Let $C = (c_1, c_2, \dots, c_M)$ be a code and let $D(c_i, C)$ be the Voronoi region of the codeword $c_i$. We will prove that $|D(c_i, C)| < q^n/2$ for any $1 \le i \le M$ if $M \ge 3$. First let $M = 3$. The claim is true by inspection for $n = 2$. Suppose it is true for any code of length $n - 1$. Consider a code of length $n$. Puncturing it on any fixed coordinate, we obtain a code $C'$ for which $|D(c'_i, C')| < q^{n-1}/2$ for all $i = 1, 2, 3$. Consider the way the Voronoi regions change in transition from $C'$ to $C$. Every vector $y' \in D(c'_i, C')$ can be augmented in $q$ ways to a vector $y$ of length $n$; even if all these vectors are in $D(c_i, C)$, then
$$|D(c_i, C)| \le q\,|D(c'_i, C')| < \frac{q^{n-1}}{2}\, q = q^n/2.$$
Thus the fact $|D(c_i, C)| < q^n/2$ is justified by induction on $n$ for $M = 3$. Now perform induction on $M$. For the code $C \setminus \{c_M\}$ the claim is true by the induction hypothesis; adding a codeword can only decrease the Voronoi regions of $c_1, \dots, c_{M-1}$. Thus the full claim is justified by symmetry.
Now observe that the qSC with transition probability $1 - q^{-1}$ induces on $H_q^n$ a uniform distribution. Together with our assumption that decoding $\psi(x)$ always ends in error if there are two or more codewords at an equal distance from $x$, we obtain
$$P_e(C) = \frac{1}{M} \sum_{c \in C} (1 - P_c(c)) = \frac{1}{M} \sum_{c \in C} q^{-n}\big(q^n - |D(c, C)|\big) > q^{-n}\, \frac{q^n}{2} = \frac{1}{2},$$
where $P_c(c)$ is the probability of correct decoding conditioned on the fact that $c$ is transmitted.
This theorem enables us to give the following definition.
Definition 2. Suppose a code $C$ is used over a qSC with complete decoding. The transition probability $\theta$ is called the threshold probability of $C$ if $P_e(C, \theta) = 1/2$.
The term “threshold” suggests that θ separates in some way the values of Pe(C, p). This is indeed the case as shown by the following theorem [8], stated for binary codes.
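Because $P_e(C, p)$ is continuous and monotone increasing in $p$, the threshold probability can be located by bisection once $P_e(C, p)$ can be evaluated. The sketch below does this for the binary $[7,4]$ Hamming code, whose error probability under complete decoding is $1 - (1-p)^7 - 7p(1-p)^6$ because the code is perfect; the function names and the choice of code are assumptions of this example.

```python
def hamming74_error_probability(p):
    """P_e(C, p) for the [7,4] Hamming code: two or more bit errors."""
    return 1 - (1 - p)**7 - 7 * p * (1 - p)**6

def threshold_probability(pe, lo=0.0, hi=0.5, tol=1e-12):
    """Bisection for the p in [lo, hi] with pe(p) = 1/2.

    Relies on pe being continuous and increasing, with pe(lo) < 1/2 < pe(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pe(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

theta = threshold_probability(hamming74_error_probability)
print(theta, hamming74_error_probability(theta))
```

The default interval ends at $1/2 = 1 - q^{-1}$ for $q = 2$, where Theorem 6 guarantees that the error probability exceeds $1/2$, so the bisection is well defined.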
Let us first give a formal definition of the exponent:
$$E_u(n, R, p) = \max_{C \text{ an } (n, nR) \text{ code}} \Big(-\frac{1}{n} \log_2 P_u(C, p)\Big),$$
$$E_u(R, p) = \lim_{n \to \infty} E_u(n, R, p).$$
The existence of this limit again is an unknown fact, so this definition should be handled similarly to the definition of R(δ).
Theorem 8. [6] Let $T(\delta_{GV}(R), p) = h_2(\delta_{GV}(R)) + D(\delta_{GV}(R)\|p)$. We have
$$E_u(R, p) \ge T(\delta_{GV}(R), p), \qquad 0 \le R \le 1 - h_2(p),$$
$$E_u(R, p) \ge 1 - R, \qquad 1 - h_2(p) < R \le 1.$$
Proof: Take a random $[n, k = Rn, d]$ linear code $C$ with weight distribution given by
$$A_0 = 1, \qquad A_w \le n^2 \binom{n}{w} 2^{k-n}, \quad w \ge d.$$
For large n the asymptotic behavior of the weight profile is given by Corollary 5 of Lecture 1. Substituting this into the expression for Pu, we obtain
$$P_u(C) \le \sum_{w=d}^{n} n^2 \binom{n}{w} 2^{k-n} (1-p)^{n-w} p^w.$$
Switching to exponents, we obtain
$$E_u(R, p) \ge \min_{\omega \ge \delta_{GV}(R)} \big(1 - R + D(\omega\|p)\big).$$
Since $D'_\omega(\omega\|p) = \log \frac{\omega(1-p)}{(1-\omega)p}$, the unrestricted minimum of the exponent is attained for $\omega = p$. Thus if $p \ge \delta_{GV}(R)$, or equivalently $R \ge 1 - h_2(p)$, we can substitute $\omega = p$ and obtain $1 - R$ in the exponent (since $D(p\|p) = 0$). If $p \le \delta_{GV}(R)$, the dominating term in the sum for $P_u$ is the first one, i.e., $w = d$ with $d/n \to \delta_{GV}(R)$. The number of codewords of minimum weight in the code is nonexponential in $n$, and the exponent of $P_u$ is equal to the exponent of $p^d(1-p)^{n-d}$, which is $T(\delta_{GV}(R), p)$.
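The two branches of the bound in Theorem 8 can be evaluated numerically. The sketch below assumes that $\delta_{GV}(R)$ is the root of $h_2(\delta) = 1 - R$ in $[0, 1/2]$ and finds it by bisection; the function names are assumptions of this example.

```python
from math import log2

def h2(x):
    """Binary entropy function, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def divergence(a, p):
    """Binary information divergence D(a || p) in bits, for 0 < p < 1."""
    total = 0.0
    if a > 0.0:
        total += a * log2(a / p)
    if a < 1.0:
        total += (1 - a) * log2((1 - a) / (1 - p))
    return total

def delta_gv(R, tol=1e-13):
    """Root of h2(delta) = 1 - R on [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < 1 - R:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def undetected_error_exponent_bound(R, p):
    """Lower bound on E_u(R, p) from Theorem 8."""
    if R <= 1 - h2(p):
        d = delta_gv(R)
        return h2(d) + divergence(d, p)   # T(delta_GV(R), p)
    return 1 - R

p = 0.1
R0 = 1 - h2(p)   # rate at which the two branches meet
for R in (0.2, R0, 0.8):
    print(R, undetected_error_exponent_bound(R, p))
```

At $R = 1 - h_2(p)$ we have $\delta_{GV}(R) = p$ and $T = h_2(p) = 1 - R$, so the two branches join continuously.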
References