





Material Type: Notes; Professor: Barg; Subject: Electrical & Computer Engineering; University: University of Maryland; Term: Unknown 1989;






ENEE 739C: Advanced Topics in Signal Processing: Coding Theory. Instructor: Alexander Barg. Lecture 7 (draft; 10/14/03). Random matrices over finite fields. Erasure channel and its error exponent. Complexity issues in coding theory. http://www.enee.umd.edu/~abarg/ENEE739C/course.html
In Lectures 3-6 we looked at decoding of codes from a probabilistic perspective, ignoring the constructive aspect of our systems. Here we change the point of view and study issues related to the implementation complexity of decoding of linear codes. We start with a technical topic of independent interest: properties of random matrices over $F_q$. The main use of these results will be in the analysis of some decoding algorithms; however, we also point out another, unrelated direction which we can treat almost “for free”: the reliability function of the erasure channel.
Let $q$ be a prime power and let $C$ be a linear $q$-ary $[n, k]$ code. A $k$-subset of coordinates $E = \{i_1, \dots, i_k\}$ is called an information set (a message set) if all the $q^k$ codewords differ in these coordinates. In the notation of the previous lecture, $\dim(C_E) = k$.
The question about the number $U_k^k$ of information sets for a given code is difficult. Some bounds are given in [5]; however, there are no general bounds which are very different from $\binom{n}{k}$. Our first goal will be to explain the reason for this. In doing so, we will analyze the rank of random matrices over $F_q$.
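What is the probability that a large square binary $k \times k$ matrix is nonsingular (over $F_2$)? The answer is given by
\[ \lim_{k\to\infty} \frac{[k]_k}{2^{k^2}} = \prod_{i=1}^{\infty} (1 - 2^{-i}) \approx 0.2888, \]
where $[k]_k = \prod_{j=0}^{k-1}(2^k - 2^j)$ is the number of nonsingular binary $k \times k$ matrices.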
Remark. Generally, the probability that a large $k \times k$ matrix over $F_q$ has rank $k - c$ (corank $c$) falls very rapidly as $c$ grows. The following table shows this probability for $0 \le c \le 5$:
q c 0 1 2 3 4 5
What is the expected corank (i.e., $k - \mathrm{rank}$) of a $k \times k$ binary matrix? It is given by the sum $\sum_c c\,\pi(c)$, where $\pi(c)$ are given by the first row of this table. The answer is (very close to) 0.8502.
We will need a generalization of this argument, proved by a more detailed analysis. We begin with a technical lemma.
Lemma 1. The number of $m \times \ell$ matrices of rank $r$ over $F_q$ equals
\[ \left[{m \atop r}\right] \left[{\ell \atop r}\right] [r]_r. \]
Proof: Let $M = F_q^m$ be an $m$-dimensional space and $L \subset M$ be its $\ell$-dimensional subspace. We will count the number of linear maps $f : M \to L$ of rank $r$. We have $\dim \ker f = m - r$, so the number of possible kernels is $\left[{m \atop r}\right]$. Further, $\dim \mathrm{Im} f = m - \dim \ker f = r$, so the number of image subspaces in $L$ is $\left[{\ell \atop r}\right]$. Suppose the image and the kernel are fixed. Then to every choice of the basis in $\mathrm{Im} f$ there corresponds exactly one matrix $A$ with the needed properties, and there are $[r]_r$ such choices.
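The count in Lemma 1 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, $[r]_r$ is taken to be $\prod_{j=0}^{r-1}(q^r - q^j)$ (the number of ordered bases of $F_q^r$, which matches its use in the limit above), and the brute-force comparison is limited to tiny binary matrices. Under these assumptions it also reproduces the two constants quoted earlier, 0.2888 and 0.8502, from the exact rank distribution of a $30 \times 30$ binary matrix.

```python
from fractions import Fraction
from itertools import product

def gauss_binom(m, r, q):
    """Gaussian binomial [m, r]_q: number of r-dimensional subspaces of F_q^m."""
    if r < 0 or r > m:
        return 0
    num = den = 1
    for i in range(r):
        num *= q ** (m - i) - 1
        den *= q ** (i + 1) - 1
    return num // den  # the quotient is always an integer

def ordered_bases(r, q):
    """[r]_r, assumed here to be the number of ordered bases of F_q^r."""
    out = 1
    for j in range(r):
        out *= q ** r - q ** j
    return out

def count_rank(m, l, r, q):
    """Number of m x l matrices over F_q of rank r, as stated in Lemma 1."""
    return gauss_binom(m, r, q) * gauss_binom(l, r, q) * ordered_bases(r, q)

def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            rows = [x ^ pivot if x & low else x for x in rows]
    return rank

def brute_force_counts(m, l):
    """Tally the ranks of all binary m x l matrices by exhaustive enumeration."""
    counts = [0] * (min(m, l) + 1)
    for rows in product(range(2 ** l), repeat=m):
        counts[rank_gf2(rows)] += 1
    return counts

# Exact corank distribution pi(c) of a random k x k binary matrix.
k, q = 30, 2
pi = [Fraction(count_rank(k, k, k - c, q), q ** (k * k)) for c in range(k + 1)]
p_nonsingular = float(pi[0])
expected_corank = float(sum(c * p for c, p in enumerate(pi)))
print(p_nonsingular, expected_corank)
```

Exact rational arithmetic is used so that the probabilities sum to exactly 1, which is a convenient sanity check on the formula.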
Theorem 2. Let $G$ be a $k \times n$ matrix over $F_q$ with independent equiprobable entries. The probability $\tau_{n,k}(\ell)$ that it contains a $k \times k$ submatrix of rank $\le \ell - 1$ satisfies
\[ \tau_{n,k}(\ell) < \binom{n}{k} q^{-(k-\ell)^2}. \]
Proof: By the previous lemma, the probability that a given $k \times k$ submatrix of $G$ has rank $u$ equals
\[ \pi(k, u) = q^{-k^2} \left[{k \atop u}\right]^2 [u]_u = q^{-k^2} \prod_{j=0}^{u-1} \frac{(q^k - q^j)^2}{q^u - q^j} < q^{-k^2 + ku} \prod_{j=0}^{u-1} \frac{q^{k-j} - 1}{q^{u-j} - 1}. \]
The last product can be bounded above as
\[ \left[{k \atop u}\right] < \prod_{j=0}^{u-1} \frac{q^{k-j}}{q^{u-j} - 1} = q^{u(k-u)} \prod_{j=0}^{u-1} \frac{q^{u-j}}{q^{u-j} - 1} = q^{u(k-u)} \prod_{i=1}^{u} \frac{q^i}{q^i - 1}. \]
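Denote the last product by $\Pi$. We have
\[ \Pi = \frac{q}{q-1} \cdot \frac{q^2}{q^2-1} \cdots \frac{q^u}{q^u-1}. \]
Every factor is largest for $q = 2$, and for $q = 2$ even the infinite product equals $1/\prod_{i \ge 1}(1 - 2^{-i}) \approx 1/0.2888 < 3.5$, so $\Pi < 5$ for all $q \ge 2$. Then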
\[ \pi(k, u) < 5\, q^{u(k-u) - k^2 + ku} = 5\, q^{-(k-u)^2}. \]
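Now let us use the union bound. For a given matrix $G$ there are $\binom{n}{k}$ choices of the $k \times k$ submatrix; thus, the probability that there exists a submatrix $A$ of rank $\mathrm{rk}(A) = u < \ell$ does not exceed
\[ \tau_{n,k}(\ell) \le \sum_{u=0}^{\ell-1} 5\, q^{-(k-u)^2} \binom{n}{k} = 5 \binom{n}{k} q^{-(k-\ell)^2} \sum_{u=0}^{\ell-1} q^{-(k-u)^2 + (k-\ell)^2} \le 5 \binom{n}{k} q^{-(k-\ell)^2} \sum_{i=1}^{\ell} q^{-(i+1)^2 + 1}, \]
where the last step substitutes $u = \ell - i$ and uses $k - \ell \ge 1$.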
The last sum is maximum for $q = 2$. It can be checked not to exceed 0.2, so its product with the factor 5 is less than 1. This proves the claimed inequality.
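The two numerical constants used in this proof (the bound 5 on the product and the bound 0.2 on the final sum) are easy to confirm by direct evaluation. The following is a small sketch with hypothetical helper names; the truncation points 50 and 20 are arbitrary choices, large enough that the remaining terms are negligible in floating point.

```python
from math import prod

def pi_product(q, u):
    """Product of q^i / (q^i - 1) for i = 1..u (the quantity bounded by 5)."""
    return prod(q ** i / (q ** i - 1) for i in range(1, u + 1))

def tail_sum(q, ell):
    """Sum of q^(-(i+1)^2 + 1) for i = 1..ell (the quantity bounded by 0.2)."""
    return sum(float(q) ** (1 - (i + 1) ** 2) for i in range(1, ell + 1))

for q in (2, 3, 4):
    # Both quantities decrease as q grows, so q = 2 is the worst case.
    print(q, pi_product(q, 50), tail_sum(q, 20))
```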
Corollary 3. Let $n \to \infty$, $k/n \to R > 0$, $c > h_q(R)/R$. For almost all choices of $G$ the rank of every $k \times k$ submatrix is at least $k - \sqrt{ck}$.
Proof: Let $E \subset [n]$ be a $k$-subset. The probability that $G$ contains a submatrix $G(E)$ of rank $\le \ell - 1$ is at most $\tau_{n,k}(\ell)$. Let $k - \ell = \sqrt{ck}$. Compute
\[ \log_q \tau_{n,k}(\ell) < -ck + \log_q \binom{n}{k} < -k \Big( c - \frac{h_q(R)}{R} \Big). \]
Thus for $c > h_q(R)/R$ the probability $\tau_{n,k}(\ell) \to 0$ exponentially fast. In other words, for almost all matrices $G$ the rank of every $k \times k$ submatrix is $\ge k - \sqrt{ck}$.
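To see the threshold $c > h_q(R)/R$ at work for finite parameters, one can evaluate $\log_q$ of the bound $\binom{n}{k} q^{-(k-\ell)^2}$ directly. The sketch below assumes the standard $q$-ary entropy function with logarithms to base $q$ for $h_q$ (the notes do not restate its definition here), rounds $\sqrt{ck}$ up to an integer, and uses hypothetical function names.

```python
from math import ceil, comb, log, sqrt

def hq(x, q):
    """q-ary entropy with logarithms to base q (assumed definition of h_q)."""
    if x == 0:
        return 0.0
    if x == 1:
        return log(q - 1, q)
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def log_q_bound(n, k, c, q=2):
    """log_q of binom(n, k) * q^(-(k - l)^2) with k - l = ceil(sqrt(c * k))."""
    gap = ceil(sqrt(c * k))
    return log(comb(n, k), q) - gap ** 2

n, k, q = 1000, 500, 2
threshold = hq(k / n, q) / (k / n)  # equals 2 for R = 1/2, q = 2
for c in (1.5, 2.5):
    # A negative value means the union bound is already below 1.
    print(c, threshold, log_q_bound(n, k, c, q))
```

For these sample parameters the bound is vacuous below the threshold and already very small above it.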
By a variation on the above argument we can prove
Lemma 4. Let $G$ be a $k \times n$ random matrix over $F_q$, $n \to \infty$, $k/n$ fixed. Then with probability $\to 1$ over the choice of $G$ every submatrix formed by $k + O(\sqrt{k})$ columns of $G$ has rank $k$.
Lemma 5. Let $G$ be as above. Then with probability $\to 1$ every submatrix formed by $k + 2\log_q n$ cyclically consecutive columns is nonsingular (has rank $k$).
Erasure channel. Consider a memoryless binary-input channel $W$ with $X = \{0, 1\}$, $Y = \{0, 1, *\}$, where $*$ represents erasure. The transition probabilities are given by $P(a|a) = 1 - p$, $P(*|a) = p$ for $a \in \{0, 1\}$.
Let $C$ be a binary $[n, k]$ code and let $x \in C$ be a codevector transmitted over $W$. Let $E \subset \{1, \dots, n\}$ be the subset of nonerased positions. The received vector $y$ equals $x$ on $E$. It is clear that if $E$ contains an information set of $C$, it is possible to decode correctly. Otherwise, if $\mathrm{rk}(G(E)) = k - i$, then $\dim(C^{\bar E}) = i$, so exactly $2^i$ codewords agree with $y$ on $E$.
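Erasure decoding as described here amounts to solving a linear system over $F_2$ for the message. The sketch below is one possible illustration, not the procedure from the lecture: the solver, the function names, and the choice of a systematic generator matrix of the [7, 4] Hamming code as the example are all assumptions. It returns one codeword consistent with the nonerased symbols together with $i = k - \mathrm{rk}(G(E))$.

```python
def solve_gf2(equations, k):
    """Solve a consistent linear system over F_2 in k unknowns.

    Each equation is an int: bits 0..k-1 are coefficients, bit k is the
    right-hand side. Returns (solution with free variables set to 0, rank).
    """
    pivots = {}  # pivot bit -> fully reduced row
    for row in equations:
        for bit, prow in pivots.items():
            if (row >> bit) & 1:
                row ^= prow
        coeffs = row & ((1 << k) - 1)
        if coeffs == 0:
            continue  # redundant equation
        bit = (coeffs & -coeffs).bit_length() - 1
        for b in list(pivots):
            if (pivots[b] >> bit) & 1:
                pivots[b] ^= row
        pivots[bit] = row
    m = [(pivots[b] >> k) & 1 if b in pivots else 0 for b in range(k)]
    return m, len(pivots)

def decode_erasures(G, y):
    """y has entries 0, 1 or None (erasure). Returns (codeword, ambiguity i)."""
    k, n = len(G), len(G[0])
    E = [j for j in range(n) if y[j] is not None]  # nonerased positions
    eqs = [sum(G[i][j] << i for i in range(k)) | (y[j] << k) for j in E]
    m, rank = solve_gf2(eqs, k)
    x = [sum(m[i] & G[i][j] for i in range(k)) % 2 for j in range(n)]
    return x, k - rank

# Example: systematic generator matrix of the [7, 4] Hamming code.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
x = [1, 0, 1, 1, 0, 1, 0]             # codeword of the message 1011
y = [None, 0, 1, 1, 0, None, 0]       # two erasures: uniquely decodable
print(decode_erasures(G, y))
```

When the returned ambiguity is $i > 0$, the codeword is only one of $2^i$ candidates that agree with $y$ on $E$.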
Since $0 = k - \mathrm{rk}(G(W)) = n - k - \mathrm{rk}(H(\bar W))$, the rank $\mathrm{rk}(H(\bar W)) = n - k$, so the system has a unique solution. This motivates the following decoding algorithm.
Information set decoding (C, y)
Theorem 7. The information set decoding algorithm performs ML decoding.
Proof : (see Homework 2)
It is possible to use information set decoding to perform bounded distance decoding (that is, to correct errors up to a certain multiplicity). Choosing a small collection of information sets which solves this problem for a given code turns into a separate task. (Discuss the example of the [24, 12, 8] Golay code.)
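As an illustration of information set decoding on a BSC, the sketch below tries every $k$-subset $W$ of coordinates, keeps those for which $G(W)$ has rank $k$, re-encodes the symbols of $y$ on $W$, and returns the candidate nearest to $y$. This exhaustive variant is an assumption made for the example (the function names and the [7, 4] Hamming test code are hypothetical choices), and it is only practical for very short codes.

```python
from itertools import combinations

def solve_gf2(equations, k):
    """Gauss-Jordan over F_2; each equation is an int with the rhs in bit k."""
    pivots = {}
    for row in equations:
        for bit, prow in pivots.items():
            if (row >> bit) & 1:
                row ^= prow
        coeffs = row & ((1 << k) - 1)
        if coeffs == 0:
            continue
        bit = (coeffs & -coeffs).bit_length() - 1
        for b in list(pivots):
            if (pivots[b] >> bit) & 1:
                pivots[b] ^= row
        pivots[bit] = row
    m = [(pivots[b] >> k) & 1 if b in pivots else 0 for b in range(k)]
    return m, len(pivots)

def encode(m, G):
    """Codeword m * G over F_2."""
    return [sum(mi & g for mi, g in zip(m, col)) % 2 for col in zip(*G)]

def information_set_decode(G, y):
    """Return a codeword nearest to y among re-encodings on all information sets."""
    k, n = len(G), len(G[0])
    best = None
    for W in combinations(range(n), k):
        eqs = [sum(G[i][j] << i for i in range(k)) | (y[j] << k) for j in W]
        m, rank = solve_gf2(eqs, k)
        if rank < k:
            continue  # W is not an information set
        c = encode(m, G)
        dist = sum(a != b for a, b in zip(c, y))
        if best is None or dist < best[0]:
            best = (dist, c)
    return best[1]

G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(information_set_decode(G, [1, 0, 1, 1, 0, 1, 1]))  # one bit flipped
```

The search succeeds because the complement of the support of a minimum-weight error always contains an information set; otherwise a nonzero codeword would fit inside the error support and yield a lighter error in the same coset.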
The following lemma shows that by giving up a little on the code performance we can indeed decode with an exponentially smaller complexity. Given $n, k$, define (again) the GV distance
\[ d_0 = \max \Big\{ d : \sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i \le q^{n-k} \Big\}. \]
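For concrete parameters the GV distance can be computed by accumulating sphere volumes until they exceed $q^{n-k}$. The sketch below assumes the sum starts at $i = 0$ (the lower limit is not legible in these notes) and uses a hypothetical function name.

```python
from math import comb

def gv_distance(n, k, q=2):
    """Largest d with sum_{i=0}^{d-1} C(n, i) (q-1)^i <= q^(n-k)."""
    volume, d = 0, 0
    while d <= n:
        volume += comb(n, d) * (q - 1) ** d  # adds the term i = d
        if volume > q ** (n - k):
            break  # the sum up to i = d is too large, so d is the answer
        d += 1
    return d

for n, k in ((7, 4), (23, 12), (24, 12)):
    print(n, k, gv_distance(n, k))
```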
Lemma 8. [4] Let $C$ be an $[n, k]$ linear code and let $d_0$ be the GV distance. Let $p_c = P_e(C, p)$ be the error probability of ML decoding of $C$ on a given BSC($p$). Let $p_b$ be the error probability of bounded distance decoding of $C$ in the sphere of radius $d_0$. Then $p_b \le 2 p_c$.
Proof :
[Figure: the sphere $B$ of radius $d_0$ and the set $L$ of coset leaders.]
Let $L$ be the set of $q^{n-k}$ coset leaders. Every error pattern $e$ outside $L$ contributes to the error probability $p_c$:
\[ p_c = \Pr\{e \in H_q^n \setminus L\}. \qquad (1) \]
For the error probability of bounded distance decoding we have
\[ p_b = \Pr\{e \in H_q^n \setminus (B \cap L)\} = \Pr\{e \in H_q^n \setminus L\} + \Pr\{e \in L \setminus (B \cap L)\} \le p_c + \Pr\{e \in B \setminus (B \cap L)\}. \]
The last inequality follows because $|B| = |L|$ and $B$ is formed by the most probable vectors. Finally, observe that the last term describes a part of the event in (1), so its probability does not exceed $p_c$.
Fact to remember: Let C be an [n, k] linear code used on a BSC. Performing bounded distance decoding restricted to the sphere of radius equal to the GV distance for the parameters n, k at most doubles the error probability of complete ML decoding.
Therefore, consider the following decoding algorithm. Let $\kappa = k + 2 \log_q n$.
Sliding window decoding (C, y)
It is clear that if the weight of error $\mathrm{wt}(e) \le d_0$, one of the projections $e(W)$ will be of weight at most $d_0 \kappa / n$. Furthermore, for almost all codes every submatrix $G(W)$ will have rank $k$. Moreover, for each $W$ we have to search over
\[ \sum_{i=0}^{d_0 \kappa / n} \binom{\kappa}{i} (q-1)^i \le 2^{\kappa h_q(\delta_{GV})} = 2^{\kappa(\log_2 q - R)} \]
error patterns.
Hence we have
Theorem 9. [4] Almost all linear codes can be decoded with error probability $p \le 2 p_c$ and complexity of order $O(n^4 2^{nR(1 - R \log_q 2)})$.
The complexity estimate in this theorem can be improved. We know that every $k \times k$ submatrix $G(E)$ of a random generator matrix $G$ is of rank $\ge k - O(\sqrt{k})$. In other words, for $E \subset [n]$, $|E| = k$ we have $\dim(C_E) \ge k - O(\sqrt{k})$. Thus the number of codewords that project to the all-zero vector (or in general project identically) on $E$ does not exceed $2^{O(\sqrt{k})}$ (recall Thm. 6.1(ii): $C_E \cong C / C^{\bar E}$).
Let
\[ T_n(k) = (n \log n) \binom{n}{d_0} \Big/ \binom{n-k}{d_0} \]
and consider the following (probabilistic) algorithm:
Covering set decoding (C, y)
The properties of this algorithm are summarized as follows.
Theorem 10. [2, 7] For almost all linear codes $C$, covering set decoding has error probability $\le 2 p_c(C)(1 + o(1))$. Its implementation complexity has exponential order
\[ \exp_2 \Big[ n \Big( h_2(\delta_{GV}) - (1 - R)\, h_2\Big( \frac{\delta_{GV}}{1 - R} \Big) \Big) (1 + o(1)) \Big]. \]
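To compare the exponent in Theorem 10 with the binary sliding window exponent $R(1 - R)$ from Theorem 9, both can be evaluated numerically. In this sketch $\delta_{GV}$ is taken to be the root of $h_2(\delta) = 1 - R$ on $[0, 1/2]$, found by bisection; the function names and the grid of rates are illustrative assumptions.

```python
from math import log2

def h2(x):
    """Binary entropy function."""
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)

def delta_gv(R, tol=1e-12):
    """Relative GV distance: solve h2(delta) = 1 - R on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < 1 - R:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sliding_window_exponent(R):
    """Exponent of Theorem 9 for q = 2."""
    return R * (1 - R)

def covering_set_exponent(R):
    """Exponent of Theorem 10."""
    d = delta_gv(R)
    return h2(d) - (1 - R) * h2(d / (1 - R))

for R in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(R, sliding_window_exponent(R), covering_set_exponent(R))
```

On this grid the covering set exponent stays well below $R(1 - R)$, which is the improvement the theorem refers to.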
Proof: A decoding error can occur if the transmitted codeword is not the nearest one in the code to the received word $y$, which happens with probability $p_c$, or if the repeated choice fails to find an error-free $k$-set. The probability that a randomly chosen $k$-set $W$ is not error-free equals
\[ 1 - \binom{n-k}{d_0} \Big/ \binom{n}{d_0}. \]
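The role of the number of trials $T_n(k)$ can be illustrated numerically: after $T_n(k)$ independent random choices of $W$, the chance that none of them avoids a fixed error pattern of weight $d_0$ is at most $e^{-n \log n}$. The sketch below assumes the natural logarithm in $n \log n$, rounds $T_n(k)$ up to an integer, and takes $n = 24$, $k = 12$, $d_0 = 4$ as sample values; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, exp, log

def hit_probability(n, k, d0):
    """Probability that a random k-set avoids a fixed set of d0 error positions."""
    return comb(n - k, d0) / comb(n, d0)

def number_of_trials(n, k, d0):
    """T_n(k) = n log n * C(n, d0) / C(n-k, d0), rounded up (natural log assumed)."""
    return ceil(n * log(n) / hit_probability(n, k, d0))

def miss_probability(n, k, d0, trials):
    """Probability that none of the independent random k-sets is error-free."""
    return (1 - hit_probability(n, k, d0)) ** trials

n, k, d0 = 24, 12, 4
T = number_of_trials(n, k, d0)
print(T, miss_probability(n, k, d0, T), exp(-n * log(n)))
```

The hit probability can equivalently be written as $\binom{n-d_0}{k} / \binom{n}{k}$, counting $k$-sets that avoid the error positions.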
... as follows:
\[ P_e(C, p) \le \sum_{w \ge d} A^M_w \sum_{e = \lceil w/2 \rceil}^{n} \pi(e) \sum_{s \le e} p^{w}_{e,s}, \]
where $A^M_w$ is the number of minimal vectors of weight $w$. As we said in the previous lecture, for a random code and large $n$ the number $A^M_w \cong A_w$, so there is no hope of improving our exponential estimates of the error probability (see footnote 1). In examples the number of minimal vectors is also large. For instance, let $C$ be a [2m, $\binom{m}{2}$ ...
\[ N = \{ c : \exists\, y \in B,\ d(c, y) \le \mathrm{wt}(y) \}. \]
(Intuition: the Voronoi regions of the codewords in $N$ share a common boundary with $D(0, C)$.) Codewords in this set are called zero neighbors. It is possible to prove
Theorem 13. Zero neighbors in a linear code form a test set.
This result is due to [8]. In communication practice a more important problem related to maximum likelihood decoding is that of decoding of codes on the Gaussian channel. The above results still apply, although they are substantially more complicated (see footnote 2).
Let $C$ be a code used on a memoryless channel $W : X \to Y$, where we assume that $Y$ is an additive group (possibly infinite) and that $X \subset Y$ is its subgroup. Our goal is, as usual, having received $y$ from the channel, to find the max-likelihood decision $\hat{x} \in C$:
\[ \hat{x} = \arg\max_{x \in C} W(y|x). \]
First we note that minimal-vectors decoding of binary linear codes can be used to solve this problem. We proceed as follows. Let
\[ v_y(c) = -\sum_{i=1}^{n} \log W(y_i | c_i). \]
Minimal-vectors ML decoding (W , C, y)
Proposition 14. For any binary linear code $C$ this algorithm performs complete maximum likelihood decoding.
Proof: Let $c$ be the current approximation to the decoding result. Suppose there is a $c' \in C$ such that $v_y(c + c') < v_y(c)$. By Lemma 6a.11 it is possible to write $c'$ as a sum of minimal vectors. Therefore, let
\[ c' = \sum_u m_u, \quad m_u \in M. \]
Note that since the code is binary, the supports of different vectors in this expansion are disjoint. Since $c'$ improves the current decision, so does at least one of the minimal vectors in its expansion. To prove that this process eventually converges, note that if $\varepsilon = \min_{c \ne c'} |v_y(c) - v_y(c')|$, then every decoding iteration reduces the weight by at least $\varepsilon$.
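The descent argument in this proof can be tried out on a small code. The sketch below is an assumption-laden illustration rather than the algorithm as stated in the lecture: it finds the minimal vectors by brute force, uses Hamming distance as the cost (which for a BSC with $p < 1/2$ is an increasing affine function of $v_y$), starts from the zero codeword, and repeatedly adds any minimal vector that lowers the cost. All names and the [7, 4] Hamming example are hypothetical choices.

```python
from itertools import product

def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    k, n = len(G), len(G[0])
    return [tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
            for m in product((0, 1), repeat=k)]

def minimal_vectors(C):
    """Nonzero codewords whose support properly contains no other nonzero support."""
    supp = {c: frozenset(j for j, b in enumerate(c) if b) for c in C if any(c)}
    return [c for c, s in supp.items() if not any(t < s for t in supp.values())]

def hamming(y, c):
    """Hamming distance, used here as the cost function."""
    return sum(a != b for a, b in zip(y, c))

def minimal_vectors_decode(M, y, cost=hamming):
    """Descend from the zero codeword by adding cost-reducing minimal vectors."""
    c = (0,) * len(y)
    improved = True
    while improved:
        improved = False
        for m in M:
            cand = tuple(a ^ b for a, b in zip(c, m))
            if cost(y, cand) < cost(y, c):
                c, improved = cand, True
    return c

G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
C = codewords(G)
M = minimal_vectors(C)
print(len(M), minimal_vectors_decode(M, (1, 0, 1, 1, 0, 1, 1)))
```

An integer cost is used deliberately so that the strict comparison in the loop is not affected by floating-point rounding.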
Footnote 1. An even stronger statement holds true: let $w = (n - k + 1) - \ell$. Then if $\ell \to \infty$, no matter how slowly, $A^M_w \sim A_w$.
Footnote 2. In general, passing from hard decision decoding to soft decision decoding, or from error multiplicities to probabilities (reliabilities) of symbols, usually poses serious problems.
Almost ML decoding on a general channel. Let us call a memoryless channel $W : X \to Y$ symmetric if $Y$ can be written as a disjoint union of finite sets $Y = \cup_\alpha Y_\alpha$ such that every matrix $W_\alpha = \|W(y|x)\|$, $y \in Y_\alpha$, satisfies the usual definition of a symmetric channel (every row (column) is a permutation of a fixed set of numbers).
For instance, the AWGN channel is symmetric: write $Y = \mathbb{R}$ as $\cup_x \{\pm x\}$. Let $y$ be the received vector. Recall from Lecture 3 that if the messages are equiprobable, ML decoding finds a codevector $c$ that satisfies $\Pr(c|y) > \Pr(c'|y)$ for all $c' \ne c$.
Let us establish a total order $x_1, x_2, \dots$ on $X^n$ induced by the order of a posteriori probabilities:
\[ \Pr(x_1|y) \ge \Pr(x_2|y) \ge \dots \]
where vectors with equal a posteriori probabilities are ordered lexicographically. Given a vector $y$, let us denote the rank of a vector $x \in X^n$ (its number in this ordering) by $\rho_y(x)$.
Let $N \le |C|$ be some number. Let $c_0 = \psi(y)$ be the ML decision for a given $y \in Y^n$. Our goal is to parallelize “bounded distance decoding”: we will perform search over a subset of $N$ most probable vectors. Namely, for a given $y$ let $\psi_N$ be a decoding mapping defined by
\[ \psi_N(y) = \begin{cases} c_0 & \rho_y(c_0) \le N \\ 0 & \text{otherwise} \end{cases} \]
(to maintain the intuition, recall the framed remark after Lemma 8).
Let us state a far-reaching generalization of Lemma 8 due to [3]
Lemma 15. [3] Let pN be the error probability of decoding ψN. Then
pN ≤ Pe(C, W )(1 +
where Pe is the error probability of ML decoding of C on the channel W.
The proof is not easy, so we refer to the source (or to an exposition in [1]). The goal of this lemma is to formulate a decoding algorithm which will have implementation complexity of the same order as Sliding Window Decoding (i.e., exp n(R(1 − R))). The result is due to [3]. We will stop here.
What if we are not satisfied with almost max-likelihood decoding and prefer max-likelihood instead? In that case we may rely on trellis decoding, which you have seen in ENEE722.
(Define the syndrome trellis, give examples.) More results in this direction are found in [9]; see also a recent paper [6]. This concludes our discussion of exponential-complexity decoding algorithms. In the following lectures we will concentrate on code families which afford polynomial-time decoding algorithms. It will be seen that under these complexity restrictions it is still possible to construct codes with very good performance.
References
1. A. Barg, Complexity issues in coding theory, Handbook of Coding Theory (V. Pless and W. C. Huffman, eds.), vol. 1, Elsevier Science, Amsterdam, 1998, pp. 649–754.