

















Suppose we wish to transmit a vector $f \in \mathbb{R}^n$ reliably. A frequently discussed approach consists in encoding $f$ with an $m$ by $n$ coding matrix $A$. Assume now that a fraction of the entries of $Af$ are corrupted in a completely arbitrary fashion by an error $e$. We do not know which entries are affected, nor do we know how they are affected. Is it possible to recover $f$ exactly from the corrupted $m$-dimensional vector $y' = Af + e$? This paper proves that under suitable conditions on the coding matrix $A$, the input $f$ is the unique solution to the $\ell_1$-minimization problem ($\|x\|_{\ell_1} := \sum_i |x_i|$)

$$\min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_1}$$

provided that the fraction of corrupted entries is not too large, i.e. does not exceed some strictly positive constant $\rho^*$ (numerical values for $\rho^*$ are actually given). In other words, $f$ can be recovered exactly by solving a simple convex optimization problem; in fact, a linear program. We report on numerical experiments suggesting that $\ell_1$-minimization is amazingly effective; $f$ is recovered exactly even in situations where a very significant fraction of the output is corrupted. In the case when the measurement matrix $A$ is Gaussian, the problem is equivalent to that of counting low-dimensional facets of a convex polytope, and in particular of a random section of the unit cube. In this case we can strengthen the results somewhat by using a geometric functional analysis approach.

Keywords. Linear codes, decoding of (random) linear codes, sparse solutions to underdetermined systems, $\ell_1$-minimization, linear programming, restricted orthonormality, Gaussian random matrices.
This paper considers the model problem of recovering an input vector $f \in \mathbb{R}^n$ from corrupted measurements $y' = Af + e$. Here, $A$ is an $m$ by $n$ matrix (we will assume throughout the paper that $m > n$), and $e \in \mathbb{R}^m$ is an unknown vector of errors. We will assume that at most $r$ entries are corrupted, thus at most $r$ entries of $e$ are non-zero, but apart from this restriction $e$ will be arbitrary. The problem we consider is whether it is possible to recover $f$ exactly from the data $y'$. And if so, how?

In its abstract form, our problem is of course equivalent to the classical error correcting problem which arises in coding theory, as we may think of $A$ as a linear code; a linear code is a given collection of codewords which are vectors $a_1, \ldots, a_n \in \mathbb{R}^m$, namely the columns of the matrix $A$. Given a vector $f \in \mathbb{R}^n$ (the "plaintext") we can then generate a vector $Af$ in $\mathbb{R}^m$ (the "ciphertext"); if $A$ has full rank, then one can clearly recover the plaintext $f$ from the ciphertext $Af$. But now we suppose that the ciphertext $Af$ is corrupted by an arbitrary vector $e \in \mathbb{R}^m$ with at most $r$ non-zero entries, so that the corrupted ciphertext is of the form $Af + e$. The question is then: given the coding matrix $A$ and $Af + e$, can one recover $f$ exactly?

Let us say that the linear code $A$ is an $(m, n, r)$-error correcting code if one can recover $f$ from $Af + e$ whenever $e$ has at most $r$ non-zero coefficients. As is well known, if the number $r$ of corrupted entries is too large, then of course we have no hope of having an $(m, n, r)$-error correcting code. For instance, if $n + 2r > m$ then elementary linear algebra shows that there exist plaintexts $f, f' \in \mathbb{R}^n$ and errors $e, e' \in \mathbb{R}^m$ with at most $r$ non-zero coefficients each such that $Af + e = Af' + e'$, and so one cannot distinguish $f$ from $f'$ in this case. In particular, if the fraction $\rho := r/m$ of corrupted entries exceeds $1/2$ then an $(m, n, r)$-error correcting code is impossible regardless of how large one makes $m$ with respect to $n$.

This situation raises an important question: for which fraction $\rho$ of the corrupted entries is accurate decoding possible with practical algorithms? That is, with algorithms whose complexity is at most polynomial in the length $m$ of the codewords?

Setting $y := Af$, and letting $Y \subset \mathbb{R}^m$ be the image of $A$, we can rephrase the problem more geometrically as follows: how can one reconstruct a vector $y$ in an $n$-dimensional subspace $Y$ of $\mathbb{R}^m$ from a vector $y' \in \mathbb{R}^m$ that differs from $y$ in at most $r$ coordinates?

If the matrix $A$ is chosen randomly, for instance from the Gaussian ensemble, then it is easy to show that with probability one all the plaintexts can be distinguished in an information-theoretic sense as soon as $n + 2r \le m$. However, this result provides no algorithm for recovering the plaintext $f$ from the corrupted ciphertext $Af + e$, other than brute force search, which has exponential complexity in $m$. By analogy with discrete (e.g. finite field) versions of this problem, to obtain a polynomial-time recovery algorithm it is more reasonable to expect as a necessary condition the Gilbert-Varshamov bound
$$n/m \ge 1 - H(Cr/n) \qquad (1.1)$$

which is fundamental in coding theory (see [34]); here $H(x)$ is the entropy function, and $C$, $c$, $c_1$, etc. will be used to denote various positive absolute constants. This heuristic can be made rigorous if one requires a certain stability property for the recovery algorithm; see Section 6. One can instead consider a mean square approach, based on the minimization problem

$$(P_2) \qquad \min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_2}$$

or equivalently

$$(P_2') \qquad \min_{\tilde y \in Y} \|y' - \tilde y\|_{\ell_2}$$

but the minimizer $f^\star$ (resp. $y^\star$) may be arbitrarily far away from the plaintext $f$ (resp. $y$) since we have no size control on the error $e$.
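The sensitivity of the mean square approach to a single large error can be seen numerically. The following is a minimal sketch, not taken from the paper: the sizes `n` and `m`, the random seed, and the helper name `least_squares_error` are illustrative assumptions. It corrupts one entry of $Af$ by an increasingly large amount and measures how far the least-squares solution of $(P_2)$ moves from the plaintext.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 32                      # illustrative sizes (assumed), with m > n
A = rng.standard_normal((m, n))   # Gaussian coding matrix
f = rng.standard_normal(n)        # plaintext


def least_squares_error(magnitude):
    """Corrupt one entry of A f by `magnitude`; return ||f_ls - f||_2 for (P_2)."""
    e = np.zeros(m)
    e[0] = magnitude              # a single corrupted coordinate
    f_ls, *_ = np.linalg.lstsq(A, A @ f + e, rcond=None)
    return np.linalg.norm(f_ls - f)


# The reconstruction error grows with the size of the one corrupted entry.
errors = [least_squares_error(s) for s in (1.0, 1e3, 1e6)]
print(errors)
```

Because the least-squares solution depends linearly on $y'$, the error scales with the magnitude of the corruption, which is the lack of "size control" referred to above.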
To recover $f$ accurately from corrupted data $y' = Af + e$, we consider solving the following $\ell_1$-minimization (or Basis Pursuit) problem

$$(P_1) \qquad \min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_1} \qquad (1.2)$$

or equivalently

$$(P_1') \qquad \min_{\tilde y \in Y} \|y' - \tilde y\|_{\ell_1}.$$

Thus the minimizer $y^\star$ to $(P_1')$ is the metric projection of $y'$ onto the vector space $Y$ with respect to the $\ell_1$ norm. This is a convex program which can be classically reformulated as a linear program. Indeed, the $\ell_1$-minimization problem $(P_1)$ is equivalent to

$$\min \sum_{i=1}^{m} t_i, \qquad -t \le y' - A\tilde f \le t,$$

where the optimization variables are $t \in \mathbb{R}^m$ and $\tilde f \in \mathbb{R}^n$ (as is standard, the generalized vector inequality $x \le y$ means that $x_i \le y_i$ for every coordinate $i$). Hence, $(P_1)$ is an LP with inequality constraints and can be solved efficiently using standard or specialized optimization algorithms, see [4], [5]. The main claim of this paper is that for suitable coding matrices $A$, the solution $f^\star$ to our linear program is actually exact: $f^\star = f$. (Equivalently, $y^\star = y$.)
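The linear-programming reformulation of $(P_1)$ above can be sketched with an off-the-shelf LP solver. This is an illustrative sketch only: the function name `l1_decode`, the problem sizes, the seed, and the choice of SciPy's `linprog` with the HiGHS backend are assumptions of this example, not the solver used by the authors.

```python
import numpy as np
from scipy.optimize import linprog


def l1_decode(A, y_corrupt):
    """Solve min_f ||y' - A f||_1 as the LP: min sum(t) s.t. -t <= y' - A f <= t."""
    m, n = A.shape
    # Decision variables are stacked as x = [f (n entries), t (m entries)].
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    #   y' - A f  <= t   <=>   -A f - t <= -y'
    # -(y' - A f) <= t   <=>    A f - t <=  y'
    A_ub = np.block([[-A, -I], [A, -I]])
    b_ub = np.concatenate([-y_corrupt, y_corrupt])
    bounds = [(None, None)] * n + [(0, None)] * m   # f is free, t is nonnegative
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


# Small illustrative instance (sizes assumed): m = 4n, about 9% of entries corrupted.
rng = np.random.default_rng(1)
n, m, r = 16, 64, 6
A = rng.standard_normal((m, n))
f = rng.standard_normal(n)
e = np.zeros(m)
e[rng.choice(m, size=r, replace=False)] = 10.0 * rng.standard_normal(r)

f_hat = l1_decode(A, A @ f + e)
recovery_error = np.linalg.norm(f_hat - f)
print(recovery_error)
```

Stacking $\tilde f$ and $t$ into one variable vector keeps the program in the standard inequality form that generic LP solvers accept; specialized solvers could exploit the structure further.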
[Figure: two panels, labeled (MLS) and (BP), each showing the point $y'$, the subspace $Y$, the vector $y$, and the minimizer $u$; in the (BP) panel $y = u$.]

The potential of Basis Pursuit for exact reconstruction is illustrated by the following heuristics, essentially due to [18]. The minimizer $u$ to $(P_2')$ is the contact point where the smallest Euclidean ball centered at $y'$ meets the subspace $Y$. That contact point is in general different from $y$. The situation is much better in $(P_1')$: typically the solution coincides with $y$. The minimizer $u$ to $(P_1')$ is the contact point where the smallest octahedron centered at $y'$ (the ball with respect to the $\ell_1$-norm) meets $Y$. Because the vector $y - y'$ lies in a low-dimensional coordinate subspace, the octahedron has a wedge at $y$. Thus, many subspaces $Y$ through $y$ will
the annihilator $B$ of a random Gaussian matrix can be chosen to be another random Gaussian matrix. Similar statements with different constants hold for other types of ensembles, e.g. for binary matrices with i.i.d. entries taking values $\pm 1/\sqrt{p}$ with probability 1/2. It is interesting that our methods actually give numerical values, instead of the traditional "for some positive constant $\rho$." However, the numerical bounds we derived in this paper are overly pessimistic. We are confident that finer arguments and perhaps new ideas will allow one to derive versions of Theorem 1.2 with better bounds. Numerical experiments actually suggest that the threshold is indeed much higher; see Section 5. Returning to the Gaussian case, it turns out that when $r$ is somewhat small we can come close to the theoretical limit $n + 2r \le m$. More precisely, in Section 3 we will prove
Theorem 1.3. Let $m$, $n$ and $r < cm$ be positive integers such that

$$m = n + R, \quad \text{where } R \ge Cr\log(m/r). \qquad (1.7)$$

Let $G$ be an $m \times n$ matrix whose entries are independent $N(0, 1)$ normal random variables. Then, with probability at least $1 - e^{-cR}$, the matrix $G$ is an $(m, n, r)$-error correcting code with exact recovery algorithm $(P_1')$.
The assumption (1.7) meets, up to a constant, the Gilbert-Varshamov bound (1.1), and can be rephrased in terms of the corruption rate $\rho = r/m$ as $m \ge (1 + C\rho\log\frac{1}{\rho})n$. Theorem 1.3 then asserts that $m \times n$ Gaussian matrices will be $(m, n, r)$-error correcting codes with high probability once this condition is attained.

In signal processing, linear codes are known as transform codes. The general paradigm about transform codes is that the redundancies in the coefficients of $y$ that come from the excess of the dimension $m > n$ should guarantee a stability of the signal with respect to noise, quantization, erasures, etc. This is confirmed by extensive experimental and some theoretical work, see e.g. [3, 6, 13, 29–32, 36] and the bibliography contained therein. Theorem 1.3 thus states that most orthogonal transform codes are good error-correcting codes.
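Returning to the rephrasing of (1.7) in terms of $\rho = r/m$: the following short derivation is a sketch of why the two forms agree up to the value of the constant. The abbreviation $x$ and the restriction $x \le 1/2$ are assumptions introduced here for illustration; constants are not tracked.

```latex
\begin{align*}
R = m - n &\ge C r \log(m/r) = C \rho\, m \log(1/\rho)
  && \text{(using } r = \rho m \text{)} \\
\Longleftrightarrow \quad n &\le (1 - x)\, m,
  \qquad x := C \rho \log(1/\rho) \\
\Longleftrightarrow \quad m &\ge \frac{n}{1 - x}
  && \text{(assuming } x < 1 \text{)}.
\end{align*}
% For 0 <= x <= 1/2 one has 1 + x <= 1/(1 - x) <= 1 + 2x, so the last line
% matches m >= (1 + C' rho log(1/rho)) n, with C' between C and 2C.
```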
E. C. is partially supported by a National Science Foundation grant DMS 01-40698 (FRG) and by an Alfred P. Sloan Fellowship. M. R. is partially supported by the NSF grant DMS 0245380. T. T. is supported by a grant from the Packard Foundation. R. V. is an Alfred P. Sloan Research Fellow. He was also partially supported by the NSF grant DMS 0401032 and by the Miller Scholarship from the University of Missouri-Columbia. R. V. is grateful to the University of Missouri for its hospitality during this period, when part of this research was started. E. C. and T. T. would like to thank Rafail Ostrovsky for pointing out possible connections between their earlier work and the error correction problem. Parts of this paper are an abridged version of [11] and [43].
2 Proof of Theorem 1.
The proof of the theorem makes use of two special geometrical facts about the solution $d^\star$ to $(P_1')$. First, $Bd^\star = By'$, which geometrically says that $d^\star$ belongs to a known plane of co-dimension $p$. Second, because $e$ is feasible, we must have $\|d^\star\|_{\ell_1} \le \|e\|_{\ell_1}$. Decompose $d^\star$ as $d^\star = e + h$; thus $Bh = 0$. As observed in [19],

$$\|e\|_{\ell_1} - \|h_{T_0}\|_{\ell_1} + \|h_{T_0^c}\|_{\ell_1} \le \|e + h\|_{\ell_1} \le \|e\|_{\ell_1},$$

where $T_0$ is the support of $e$, and $h_{T_0}(t) = h(t)$ for $t \in T_0$ and zero elsewhere (similarly for $h_{T_0^c}$). Hence, $h$ obeys the cone constraint

$$\|h_{T_0^c}\|_{\ell_1} \le \|h_{T_0}\|_{\ell_1} \qquad (2.1)$$
which expresses the geometric idea that $h$ must lie in the cone of descent of the $\ell_1$-norm at $e$. Exact recovery occurs provided that the null vector is the only point in the intersection between $\{h : Bh = 0\}$ and the set of $h$ obeying (2.1).

We begin by dividing $T_0^c$ into subsets of size $M$ (we will choose $M$ later) and enumerate $T_0^c$ as

$$n_1, n_2, \ldots, n_{m - |T_0|}$$

in decreasing order of magnitude of $h_{T_0^c}$. Set $T_j = \{n_\ell,\ (j-1)M + 1 \le \ell \le jM\}$. That is, $T_1$ contains the indices of the $M$ largest coefficients of $h_{T_0^c}$, $T_2$ contains the indices of the next $M$ largest coefficients, and so on.

With this decomposition, the $\ell_2$-norm of $h$ is concentrated on $T_{01} = T_0 \cup T_1$. Indeed, the $k$th largest value of $h_{T_0^c}$ obeys $|h_{T_0^c}|_{(k)} \le \|h_{T_0^c}\|_{\ell_1}/k$ and, therefore,

$$\|h_{T_{01}^c}\|_{\ell_2}^2 \le \|h_{T_0^c}\|_{\ell_1}^2 \sum_{k=M+1}^{m} 1/k^2 \le \|h_{T_0^c}\|_{\ell_1}^2 / M.$$

Further, the $\ell_1$-cone constraint gives

$$\|h_{T_{01}^c}\|_{\ell_2}^2 \le \|h_{T_0}\|_{\ell_1}^2 / M \le \|h_{T_0}\|_{\ell_2}^2 \cdot |T_0|/M$$

and thus

$$\|h\|_{\ell_2}^2 = \|h_{T_{01}}\|_{\ell_2}^2 + \|h_{T_{01}^c}\|_{\ell_2}^2 \le (1 + |T_0|/M) \cdot \|h_{T_{01}}\|_{\ell_2}^2. \qquad (2.2)$$
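Observe now that

$$\begin{aligned}
\|Bh\|_{\ell_2} &= \Big\|B_{T_{01}} h_{T_{01}} + \sum_{j \ge 2} B_{T_j} h_{T_j}\Big\|_{\ell_2} \\
&\ge \|B_{T_{01}} h_{T_{01}}\|_{\ell_2} - \Big\|\sum_{j \ge 2} B_{T_j} h_{T_j}\Big\|_{\ell_2} \\
&\ge \|B_{T_{01}} h_{T_{01}}\|_{\ell_2} - \sum_{j \ge 2} \|B_{T_j} h_{T_j}\|_{\ell_2} \\
&\ge \sqrt{1 - \delta_{M + |T_0|}}\; \|h_{T_{01}}\|_{\ell_2} - \sqrt{1 + \delta_M} \sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}.
\end{aligned}$$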
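Set $\rho_M = |T_0|/M$. As we shall see later,

$$\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2} \le \sqrt{\rho_M} \cdot \|h_{T_0}\|_{\ell_2}, \qquad (2.3)$$

and since $Bh = 0$, this gives

$$\Big[\sqrt{1 - \delta_{M + |T_0|}} - \sqrt{\rho_M}\,\sqrt{1 + \delta_M}\Big] \cdot \|h_{T_{01}}\|_{\ell_2} \le 0. \qquad (2.4)$$

It then follows from (2.2) that $h = 0$ provided that the quantity $\sqrt{1 - \delta_{M + |T_0|}} - \sqrt{\rho_M}\,\sqrt{1 + \delta_M}$ is positive. Take $M = 3|T_0|$ for example, so that $\rho_M = 1/3$. Then this quantity is positive if $3(1 - \delta_{4|T_0|}) > 1 + \delta_{3|T_0|}$, that is, if $\delta_{3|T_0|} + 3\delta_{4|T_0|} < 2$. Since $|T_0| \le r$, this follows from (1.6).

It remains to argue about (2.3). Observe that by construction, the magnitude of each coefficient in $T_{j+1}$ is less than the average of the magnitudes in $T_j$: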
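$$|h_{T_{j+1}}(t)| \le \|h_{T_j}\|_{\ell_1}/M.$$

Then

$$\|h_{T_{j+1}}\|_{\ell_2}^2 \le \|h_{T_j}\|_{\ell_1}^2 / M$$

and (2.3) follows from

$$\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2} \le \sum_{j \ge 1} \|h_{T_j}\|_{\ell_1}/\sqrt{M} \le \|h_{T_0}\|_{\ell_1}/\sqrt{M} \le \sqrt{|T_0|/M} \cdot \|h_{T_0}\|_{\ell_2}.$$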
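The block decomposition used in this proof can be checked numerically. The sketch below is illustrative only: the vector `h`, the sizes, and the seed are assumptions. It verifies the two bounds that hold for every vector, namely the tail estimate $\|h_{T_{01}^c}\|_{\ell_2}^2 \le \|h_{T_0^c}\|_{\ell_1}^2/M$ and the block estimate $\sum_{j \ge 2}\|h_{T_j}\|_{\ell_2} \le \|h_{T_0^c}\|_{\ell_1}/\sqrt{M}$; the cone constraint (2.1) is what then lets the proof replace $\|h_{T_0^c}\|_{\ell_1}$ by $\|h_{T_0}\|_{\ell_1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, t0, M = 200, 5, 15             # |T0| = 5 and M = 3|T0| (illustrative choice)
h = rng.standard_normal(m)        # an arbitrary test vector (assumed)
T0 = np.arange(t0)                # take T0 to be the first t0 indices

# Sort the complement of T0 by decreasing magnitude and cut it into blocks T1, T2, ...
comp = np.setdiff1d(np.arange(m), T0)
order = comp[np.argsort(-np.abs(h[comp]))]
blocks = [order[i:i + M] for i in range(0, len(order), M)]

l1_comp = np.abs(h[comp]).sum()                    # ||h_{T0^c}||_1
tail = np.concatenate(blocks[1:])                  # complement of T01 = T0 u T1
tail_l2_sq = np.sum(h[tail] ** 2)                  # ||h_{T01^c}||_2^2
sum_block_l2 = sum(np.linalg.norm(h[b]) for b in blocks[1:])

print(tail_l2_sq, "<=", l1_comp ** 2 / M)
print(sum_block_l2, "<=", l1_comp / np.sqrt(M))
```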
3 Proof of Theorem 1.
Theorem 1.3 turns out to be equivalent to a problem of counting lower-dimensional facets of polytopes. Let $B_1^m$ denote the unit ball with respect to the $\ell_1$-norm; it is sometimes called the unit octahedron. The polar body is the unit cube $B_\infty^m := [-1, 1]^m$. Note that the range of the matrix $G$ is an $n$-dimensional subspace uniformly distributed over the Grassmannian. Thus the conclusion of Theorem 1.3 can be reformulated as follows. Let $y \in Y$ be an unknown vector, and suppose we are given a vector $y'$ in $\mathbb{R}^m$ that differs from $y$ on at most $r$ coordinates. Then $y$ can be exactly reconstructed from $y'$ as the solution to the minimization problem $(P_1')$. This means that the affine subspace $z + Y$ is tangent to the unit octahedron at the point $z$, where $z = y' - y$. This should happen for all $z$ from the coordinate subspaces $\mathbb{R}^I$ with $|I| = r$. By duality, this means that the subspace $Y^\perp$ intersects all $(m - r)$-dimensional facets of the unit cube. The section of the cube by the subspace $Y^\perp$ forms an origin-symmetric polytope of dimension $R$ and with $2m$ facets.

Our problem can thus be stated as a problem of counting lower-dimensional facets of polytopes. Consider an $R$-dimensional origin-symmetric polytope with $2m$ facets. How many $(R - r)$-dimensional facets can it have? Clearly$^1$, no more than $2^r \binom{m}{r}$. Does there exist a polytope with that many facets? Our ability to construct such a polytope is equivalent to the existence of an efficient error correcting code. Indeed, looking at the canonical realization of such a polytope as a section of the unit cube by a subspace $Y^\perp$, we see that $Y^\perp$ intersects all the $(m - r)$-dimensional facets of the cube. Thus $Y$ satisfies the conclusion of Theorem 1.3. We can thus state Theorem 1.3 in the following form:

Theorem 3.1. There exists an $R$-dimensional symmetric polytope with $m$ facets and with the maximal number of $(R - r)$-dimensional facets (which is $2^r \binom{m}{r}$), provided $R \ge Cr\log(m/r)$. A random section of the cube forms such a polytope with probability $1 - e^{-cR}$.
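The count $2^r \binom{m}{r}$ is the number of $(m - r)$-dimensional faces of the cube $[-1, 1]^m$: each such face is obtained by fixing $r$ coordinates at $+1$ or $-1$. A small enumeration, given here only as an illustration (the function name `count_cube_faces` and the values of `m` and `r` are assumptions), confirms the formula for a low-dimensional case.

```python
from itertools import combinations, product
from math import comb


def count_cube_faces(m, r):
    """Count faces of [-1, 1]^m of dimension m - r: fix r coordinates at +1 or -1."""
    return sum(
        1
        for fixed_coords in combinations(range(m), r)   # which coordinates are fixed
        for signs in product((-1, 1), repeat=r)         # the sign each one is fixed at
    )


m, r = 6, 2
faces = count_cube_faces(m, r)
print(faces, 2 ** r * comb(m, r))  # prints: 60 60
```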
The $p$-norm ($1 \le p < \infty$) on $\mathbb{R}^m$ is defined by $\|x\|_p^p = \sum_i |x_i|^p$, and for $p = \infty$ it is $\|x\|_\infty = \max_i |x_i|$. The unit ball with respect to the $p$-norm on $\mathbb{R}^m$ is denoted by $B_p^m$. When the $p$-norm is considered on a coordinate subspace $\mathbb{R}^I$, $I \subset \{1, \ldots, m\}$, the corresponding unit ball is denoted by $B_p^I$. The unit Euclidean sphere in a subspace $E$ is denoted by $S(E)$. The normalized rotationally invariant Lebesgue measure on $S(E)$ is denoted by $\sigma_E$. The orthogonal projection onto a subspace $E$ is denoted by $P_E$. The standard Gaussian measure on $E$ (with the identity covariance matrix) is denoted by $\gamma_E$. When $E = \mathbb{R}^d$, we write $\sigma_{d-1}$ for $\sigma_E$ and $\gamma_d$ for $\gamma_E$.

The proof of Theorem 1.3 begins with a typical duality argument, leading to the same reformulation of the prob-

$^1$ Any such facet is the intersection of some $r$ facets of the polytope of full dimension $R - 1$; there are $m$ facets to choose from, each coming with its opposite by the symmetry.
Proof. Let $f$ be the multiple of the vector $P_{E \cap \mathrm{lin}(F)}\theta$ such that $f - \theta$ is orthogonal to $\theta$. Such a multiple exists and is unique, as this is a two-dimensional problem.

[Figure: the subspaces $E$ and $\mathrm{lin}(F)$.]

Then $f \in E \cap \mathrm{aff}(F)$. Notice that $D = \|f - \theta\|_2$. By the similarity of the triangles with the vertices $(0, \theta, P_{E \cap \mathrm{lin}(F)}\theta)$ and $(0, f, \theta)$, we conclude that

$$\|P_{E \cap \mathrm{lin}(F)}\theta\|_2 = \frac{r}{\sqrt{r + D^2}} = \sqrt{\frac{r}{r + D^2}}\; \|\theta\|_2$$

because $\|\theta\|_2 = \sqrt{r}$. This completes the proof.

The length of the projection of a fixed vector onto a random subspace in Lemma 3.2 is well known. The asymptotically sharp estimate was computed by S. Artstein [1], but we will be satisfied with a much weaker elementary estimate, see e.g. [40, Theorem 15.2.2].
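Lemma 3.3. Let $\theta \in \mathbb{R}^{d-1}$ and let $G$ be a random subspace in $G_{d,k}$. Then

$$\mathbb{P}\Big\{ c\sqrt{\tfrac{k}{d}}\; \|\theta\|_2 \le \|P_G\theta\|_2 \le C\sqrt{\tfrac{k}{d}}\; \|\theta\|_2 \Big\} \ge 1 - 2e^{-ck}.$$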
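The $\sqrt{k/d}$ scaling in Lemma 3.3 is easy to observe by simulation. The following Monte Carlo sketch is an illustration under stated assumptions: the vector `theta` is taken in $\mathbb{R}^d$, the dimensions, number of trials and seed are arbitrary, and the random subspace is generated as the column span of a Gaussian matrix (orthonormalized by QR), which is rotation invariant.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, trials = 400, 100, 200      # illustrative sizes (assumed)
theta = rng.standard_normal(d)    # a fixed vector in R^d

ratios = []
for _ in range(trials):
    # Orthonormal basis of a random k-dimensional subspace G of R^d.
    Q, _upper = np.linalg.qr(rng.standard_normal((d, k)))
    proj_norm = np.linalg.norm(Q.T @ theta)          # equals ||P_G theta||_2
    ratios.append(proj_norm / (np.sqrt(k / d) * np.linalg.norm(theta)))

ratios = np.array(ratios)
# All ratios cluster around 1, i.e. ||P_G theta||_2 is close to sqrt(k/d) ||theta||_2.
print(ratios.min(), ratios.mean(), ratios.max())
```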
We apply this lemma for $G = E \cap \mathrm{lin}(F)$, which is a random subspace in the Grassmannian of $(l+1)$-dimensional subspaces of $\mathrm{lin}(F)$. Since $\dim \mathrm{lin}(F) = m - r + 1$, we have

$$\mathbb{P}\Big\{ \|P_{E \cap \mathrm{lin}(F)}\theta\|_2 \ge c\sqrt{\tfrac{l + 1}{m - r + 1}}\; \|\theta\|_2 \Big\} \ge 1 - 2e^{-cl}.$$

Together with Lemma 3.2 this gives

$$\mathbb{P}\Big\{ D \le c\sqrt{m - r}\,\sqrt{\tfrac{r}{l}} \Big\} \ge 1 - 2e^{-cl}. \qquad (3.2)$$

Note that $\sqrt{m - r}$ is the radius of the Euclidean ball circumscribed on the facet $F$. The statement $D \le \sqrt{m - r}$ would only tell us that the random subspace $E$ intersects the circumscribed ball, not yet the facet itself. The ratio $r/l$ in (3.2) will be chosen logarithmically small, which will force $E$ to intersect the facet $F$ as well.

By (3.1) and (3.2),

$$P \ge \int_{G_{m-r,\,m-R}} \sigma_H\Big( \frac{c}{\sqrt{m - r}} \sqrt{\frac{l}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cl}.$$
We can replace the spherical measure $\sigma_H$ by the Gaussian measure $\gamma_H$ via a simple lemma:

Lemma 3.4. Let $K$ be a star-shaped set in $\mathbb{R}^d$. Then

$$\gamma_d(c\sqrt{d} \cdot K) - e^{-d} \le \sigma_{d-1}(K) \le \gamma_d(C\sqrt{d} \cdot K) \cdot (1 + e^{-d}).$$
Proof. Passing to polar coordinates, by the rotational invariance of the Gaussian measure we see that there exists a probability measure $\mu$ on $\mathbb{R}_+$ so that the Gaussian measure of every set $A$ can be computed as $\gamma_d(A) = \int_{\mathbb{R}_+} \sigma^t(A)\, d\mu(t)$, where $\sigma^t$ denotes the normalized Lebesgue measure on the Euclidean sphere of radius $t$ in $\mathbb{R}^d$. Since $K$ is star-shaped, $\sigma^t(K)$ is a non-increasing function of $t$. Hence

$$\gamma_d(K) \ge \int_0^{C\sqrt{d}} \sigma^t(K)\, d\mu(t) \ge \sigma^{C\sqrt{d}}(K) \cdot \gamma_d(C\sqrt{d}\, B_2^d)$$

and

$$\gamma_d(K) \le \int_0^{c\sqrt{d}} d\mu(t) + \sigma^{c\sqrt{d}}(K) \int_{c\sqrt{d}}^{\infty} d\mu(t) \le \gamma_d(c\sqrt{d} \cdot B_2^d) + \sigma^{c\sqrt{d}}(K).$$

The classical large deviation inequalities imply $\gamma_d(c\sqrt{d} \cdot B_2^d) \le e^{-d}$ and $\gamma_d(C\sqrt{d}\, B_2^d) \ge 1 - e^{-d/2}$. Using the above argument for $c\sqrt{d} \cdot K$, we conclude that $\gamma_d(c\sqrt{d} \cdot K) \le e^{-d} + \sigma_{d-1}(K)$ and $\gamma_d(C\sqrt{d} \cdot K) \ge \sigma_{d-1}(K) \cdot (1 - e^{-d/2})$.
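Using Lemma 3.4 in the space $H$ of dimension $d = m - R$, we obtain

$$P \ge \int_{G_{m-r,\,m-R}} \gamma_H\Big( c\sqrt{\frac{m - R}{m - r}} \sqrt{\frac{l}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cl} - e^{-(m-R)}.$$

By choosing the absolute constant $c$ in the assumption $r < cm$ appropriately small, we can assume that $2r < R < m/2$. Thus

$$P \ge \int_{G_{m-r,\,m-R}} \gamma_H\Big( c\sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cR}. \qquad (3.3)$$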
We now compute the Gaussian measure of random projections of the cube.

Proposition 3.5. Let $H$ be a random subspace in $G_{n,n-k}$, $k < n/2$. Then the inequality

$$\gamma_H\Big( C\sqrt{\log\frac{n}{k}}\; P_H B_\infty^n \Big) \ge 1 - e^{-ck}$$

holds with probability at least $1 - e^{-ck}$ in the Grassmannian.
The proof of this estimate will follow from the concentration of Gaussian measure, combined with the existence of a big Euclidean ball inside a random projection of the cube.

Lemma 3.6 (Concentration of Gaussian measure). Let $\varepsilon > 0$ and let $A \subset \mathbb{R}^n$ be a measurable set such that $\gamma_n(A) \ge e^{-\varepsilon^2 n}$. Then

$$\gamma_n(A + C\varepsilon\sqrt{n}\, B_2^n) \ge 1 - e^{-\varepsilon^2 n}.$$
With the stronger assumption γ(A) ≥ 1 / 2 , this lemma is the classical concentration inequality, see [37] 1.1. The fact that the concentration holds also for exponentially small sets follows formally by a simple extension argument that was first noticed by D. Amir and V. Milman in [2], see [37] Lemma 1.1. The optimal result on random projections of the cube is due to Garnaev and Gluskin [28].
Theorem 3.7 (Euclidean projections of the cube [28]). Let $H$ be a random subspace in $G_{n,n-k}$, where $k = \alpha n < n/2$. Then with probability at least $1 - e^{-ck}$ in the Grassmannian, we have

$$c(\alpha)\, P_H(\sqrt{n}\, B_2^n) \subseteq P_H(B_\infty^n) \subseteq P_H(\sqrt{n}\, B_2^n)$$

where

$$c(\alpha) = c\sqrt{\frac{\alpha}{\log(1/\alpha)}}.$$
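Proof of Proposition 3.5. Let $g_1, g_2, \ldots$ be independent standard Gaussian random variables. Then for a suitable positive absolute constant $C$ and for every $0 < \varepsilon < 1/2$,

$$\gamma_n\Big( C\sqrt{\log\tfrac{1}{\varepsilon}}\; B_\infty^n \Big) = \mathbb{P}\Big\{ \max_{1 \le j \le n} |g_j| \le C\sqrt{\log\tfrac{1}{\varepsilon}} \Big\} \ge (1 - \varepsilon^2/10)^n \ge e^{-\varepsilon^2 n}.$$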
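The chain of inequalities in the last display can be unpacked as follows. This is a sketch of one standard way to obtain it, not necessarily the authors' argument; the specific tail bound and the remark that $C = 4$ suffices are assumptions of this illustration.

```latex
\begin{align*}
\mathbb{P}\{|g_j| > t\} &\le 2e^{-t^2/2}
  && \text{(Gaussian tail bound)} \\
t = C\sqrt{\log(1/\varepsilon)} \;\Longrightarrow\;
  2e^{-t^2/2} &= 2\varepsilon^{C^2/2} \le \varepsilon^2/10
  && \text{(e.g. } C = 4,\ 0 < \varepsilon < 1/2 \text{)} \\
\mathbb{P}\Bigl\{\max_{1 \le j \le n} |g_j| \le t\Bigr\}
  &= \prod_{j=1}^{n} \mathbb{P}\{|g_j| \le t\} \ge (1 - \varepsilon^2/10)^n
  && \text{(independence)} \\
(1 - x)^n &\ge e^{-10xn} = e^{-\varepsilon^2 n}
  && \text{(} x = \varepsilon^2/10 \le 1/40,\ \log(1 - x) \ge -2x \ge -10x \text{)}
\end{align*}
```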
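Since for every measurable set $A$ and every subspace $H$ one has $\gamma_H(P_H A) \ge \gamma(A)$, we conclude that

$$\gamma_H\Big( C\sqrt{\log\tfrac{1}{\varepsilon}}\; P_H B_\infty^n \Big) \ge e^{-\varepsilon^2 n} \quad \text{for } 0 < \varepsilon < 1/2.$$

Then by Lemma 3.6,

$$\gamma_H\Big( C\sqrt{\log\tfrac{1}{\varepsilon}}\; P_H B_\infty^n + C\varepsilon\sqrt{n}\; P_H B_2^n \Big) \ge 1 - e^{-\varepsilon^2 n} \qquad (3.4)$$

for $0 < \varepsilon < 1/2$. Theorem 3.7 tells us that for a random subspace $H$, if $\varepsilon = c\sqrt{\alpha} = c\sqrt{k/n}$, then the Euclidean ball is absorbed by the projection of the cube in (3.4):

$$\varepsilon\sqrt{n}\; P_H B_2^n \subset C\sqrt{\log\tfrac{1}{\varepsilon}}\; P_H B_\infty^n.$$

Hence for a random subspace $H$ and for $\varepsilon$ as above we have

$$\gamma_H\Big( C\sqrt{\log\tfrac{1}{\varepsilon}}\; P_H B_\infty^n \Big) \ge 1 - e^{-\varepsilon^2 n},$$

which completes the proof.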
Coming back to (3.3), we shall use Proposition 3.5 for a random subspace $H$ in the Grassmannian $G_{m-r,\,m-R}$. We conclude that if

$$c\sqrt{\frac{R}{r}} \ge \sqrt{\log\frac{m - r}{R - r}} \qquad (3.5)$$

then with probability at least $1 - e^{-cR}$ in the Grassmannian,

$$\gamma_H\Big( c\sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big) \ge 1 - e^{-cR}.$$

Since $\frac{m - r}{R - r} \le \frac{m}{r}$, the choice of $R$ in (1.7) satisfies condition (3.5). Thus (3.3) implies

$$P \ge 1 - 3e^{-cR}.$$

This completes the proof.
The logarithmic term in Theorems 1.3 and 4.1 is necessary, at least in the case of small $r$. Indeed, combining formula (3.1) and Lemmas 3.2, 3.3, 3.4, we obtain

$$P \le \int_{G_{m-r,\,m-R}} \gamma_H\Big( c\sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) + 2e^{-cR}.$$

To estimate the Gaussian measure we need the following

Lemma 3.8. Let $x_1, \ldots, x_s$ be vectors in $\mathbb{R}^s$. Then

$$\gamma_s\Big( \sum_{j=1}^{s} [-x_j, x_j] \Big) \le \gamma_s(M \cdot B_\infty^s),$$

where $M = \max_{j=1,\ldots,s} \|x_j\|_2$.
$R = Cr\log m$ Gaussian measurements. However, the polynomial probability is clearly not sufficient to deduce that there is one set of vectors $X_k$ that can be used to reconstruct all functions $f$ of small support. The following equivalent form of Theorem 1.3 does yield a uniform exact reconstruction. It provides us with one set of linear measurements from which we can effectively reconstruct every signal of small support.
Theorem 4.1 (Uniform Exact Reconstruction). Let $m$, $r < cm$ and $R$ be positive integers satisfying $R \ge Cr\log(m/r)$. The independent standard Gaussian vectors $X_k$ in $\mathbb{R}^m$ satisfy the following with probability at least $1 - e^{-cR}$. Let $f \in \mathbb{R}^m$ be an unknown function of small support, $|\mathrm{supp} f| \le r$, and suppose we are given $R$ measurements $\langle f, X_k \rangle$. Then $f$ can be exactly reconstructed from these measurements as a solution to the Basis Pursuit problem (BP).
This theorem gives uniformity in [10], improves the polynomial probability to an exponential probability, and improves upon the number R of measurements (which was R ≥ Cr log m in [10]). Donoho [16] proved a weaker form of Theorem 4.1 with R/r bounded below by some function of m/r.
Proof. Write $g = f - u$ for some $u \in \mathbb{R}^m$. Then (BP′) reads as

$$\min \|u - f\|_1 \quad \text{subject to } \langle u, X_k \rangle = 0,\ \forall k.$$

The constraints here define a random $(n = m - R)$-dimensional subspace $Y$ of $\mathbb{R}^m$. Now apply Theorem 1. with $y = 0$ and $y' = f$. It states that the unique solution to the minimization problem above is $u = 0$. Therefore, the unique solution to (BP′) is $f$.
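Reconstruction from Gaussian measurements can be tried out on a small instance. The sketch below assumes that (BP) has the usual form $\min \|g\|_1$ subject to $\langle g, X_k \rangle = \langle f, X_k \rangle$ for all $k$, which is consistent with the change of variables $g = f - u$ in the proof above; the function name `basis_pursuit`, the sizes, the seed, and the use of SciPy's `linprog` are assumptions of this illustration.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(X, b):
    """Solve min ||g||_1 subject to X g = b by splitting g = p - q with p, q >= 0."""
    num_meas, m = X.shape
    cost = np.ones(2 * m)                   # sum(p) + sum(q) equals ||g||_1 at an optimum
    A_eq = np.hstack([X, -X])               # X (p - q) = b
    res = linprog(cost, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * m), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m] - res.x[m:]


# Illustrative instance (sizes assumed): an r-sparse f in R^m and R Gaussian measurements.
rng = np.random.default_rng(4)
m, r, R = 128, 5, 60
f = np.zeros(m)
f[rng.choice(m, size=r, replace=False)] = rng.standard_normal(r)
X = rng.standard_normal((R, m))             # rows are the measurement vectors X_k

g = basis_pursuit(X, X @ f)
bp_error = np.linalg.norm(g - f)
print(bp_error)
```

The split $g = p - q$ is one common way to turn the $\ell_1$ objective into a linear one; it doubles the number of variables but needs no auxiliary inequality constraints.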
In a larger class of compressible functions [16], we can only hope for an approximate reconstruction. This is a class of functions $f$ that are well compressible by a known orthogonal transform, such as Fourier or wavelet. This means that the coefficients of $f$ with respect to a certain known orthogonal basis have a power decay:

$$f^*(s) \le s^{-1/p}, \quad s = 1, \ldots, m \qquad (4.1)$$

where $f^*$ denotes a nonincreasing rearrangement of $f$. Many natural signals are compressible for some $0 < p < 1$, such as smooth signals and signals with bounded variations (see [10]). Theorem 4.1 implies, by the argument of [10], that functions compressible in some basis can be approximately reconstructed from few fixed linear measurements. This is an improvement of a result of Donoho [16].
Corollary 4.2 (Uniform Approximate Reconstruction). Let $m$ and $r$ be positive integers. The independent standard Gaussian vectors $X_k$ in $\mathbb{R}^m$ satisfy the following with probability at least $1 - e^{-cR}$. Assume that an unknown function $f \in \mathbb{R}^m$ satisfies either (4.1) for some $0 < p < 1$ or $\|f\|_1 \le 1$ for $p = 1$. Suppose that we are given $R$ measurements $\langle f, X_k \rangle$. Then $f$ can be approximately reconstructed from these measurements: a unique solution $g$ to the Basis Pursuit problem (BP) satisfies

$$\|f - g\|_2 \le C_p \Big( \frac{\log(m/R)}{R} \Big)^{\frac{1}{p} - \frac{1}{2}}$$

where $C_p$ depends on $p$ only.
Corollary 4.2 was proved by Donoho [16] under an additional assumption that $m \sim CR^\alpha$ for some $\alpha > 1$. Notice that in this case $\log(m/R) \sim \log m$. Now this assumption is removed. In [10] Corollary 4.2 was proven without the uniformity in $f$ due to a weaker (polynomial) probability. Finally, Corollary 4.2 also improves upon the approximation error (there is now the ratio $m/r$ instead of $m$ in the logarithm).
5 Numerical Experiments
In this section, we empirically investigate the performance of our decoding strategy. Of special interest is the location of the breakpoint beyond which $\ell_1$ fails to decode accurately. To study this issue, we performed a first series of experiments as follows:

The results are presented in Figure 1. In these experiments, we choose $n = 128$, and set $m = 2n$ (Figure 1(a)) or $m = 4n$ (Figure 1(b)). Our experiments show that the linear program recovers the input vector all the time as long as the fraction of the corrupted entries is less than or equal to 15% in the case where $m = 2n$ and less than or equal to 35% in the case where $m = 4n$. We repeated these experiments for different values of $n$, e.g. $n = 256$, and obtained very similar recovery curves.
[Plots: empirical frequency of exact decoding (vertical axis) versus fraction of corrupted entries (horizontal axis), $n = 128$; one panel for $m = 2n$ and one for $m = 4n$.]

Figure 1. $\ell_1$-recovery of an input signal from $y' = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In these experiments, we set $n = 128$. (Top) Success rate of $(P_1)$ for $m = 2n$. (Bottom) Success rate of $(P_1)$ for $m = 4n$. On top, exact recovery occurs as long as the corruption rate does not exceed 15%. The bottom breakdown is near 35%.
It is clear that versions of Theorem 1.2 exist for other types of random matrices, e.g. binary matrices. In the next experiment, we take the plaintext $f$ as a binary sequence of zeros and ones (which is generated at random), and sample $A$ with i.i.d. entries taking on values in $\{\pm 1\}$, each with probability 1/2. To recover $f$, we solve the linear program

$$\min_{g \in \mathbb{R}^n} \|y' - Ag\|_{\ell_1} \quad \text{subject to } 0 \le g \le 1, \qquad (5.1)$$

and round the coordinates of the solution to the nearest integer. We follow the same procedure as before except that now we select $S$ locations of $Af$ at random (the corruption rate is again $S/m$) and flip the sign of the selected coordinates. We are again interested in the location of the breakpoint.
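A single trial of the binary experiment just described can be sketched as follows. This is not the authors' code: the function name `decode_binary`, the reduced sizes (chosen so the example runs quickly), the seed, and the use of SciPy's `linprog` are assumptions, and the full experiment would repeat such trials over many values of $S$ to estimate the success frequency.

```python
import numpy as np
from scipy.optimize import linprog


def decode_binary(A, y_corrupt):
    """Solve (5.1): min ||y' - A g||_1 subject to 0 <= g <= 1, then round to {0, 1}."""
    m, n = A.shape
    cost = np.concatenate([np.zeros(n), np.ones(m)])   # variables: [g, t]
    I = np.eye(m)
    A_ub = np.block([[-A, -I], [A, -I]])               # encodes -t <= y' - A g <= t
    b_ub = np.concatenate([-y_corrupt, y_corrupt])
    bounds = [(0, 1)] * n + [(0, None)] * m            # box constraint on g
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return np.rint(res.x[:n]).astype(int)


rng = np.random.default_rng(5)
n, m, S = 32, 128, 12                                  # smaller than the paper's n = 128
f = rng.integers(0, 2, size=n)                         # random binary plaintext
A = rng.choice([-1.0, 1.0], size=(m, n))               # i.i.d. +/-1 entries
y_corrupt = A @ f
flipped = rng.choice(m, size=S, replace=False)         # S corrupted locations
y_corrupt[flipped] *= -1                               # flip the sign of those entries

exact = np.array_equal(decode_binary(A, y_corrupt), f)
print(exact)
```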
[Plots: empirical frequency of exact decoding (vertical axis) versus fraction of corrupted entries (horizontal axis), $n = 128$; one panel for $m = 2n$ and one for $m = 4n$.]

Figure 2. $\ell_1$-recovery of a binary sequence from corrupted data $y'$; $A$ is an $m$ by $n$ matrix with independent binary entries and the vector of errors is obtained by randomly selecting coordinates of $Af$ and flipping their sign. In these experiments, we set $n = 128$. (Top) Success rate for $m = 2n$. (Bottom) Success rate for $m = 4n$. On top, exact recovery occurs as long as the corruption rate does not exceed 22.5%. The bottom breakdown is near 35%.
The results are presented in Figure 2. In these experiments, we choose $n = 128$ as before, and set $m = 2n$ (Figure 2(a)) or $m = 4n$ (Figure 2(b)). Our experiments show that the linear program recovers the input vector all the time as long as the fraction of the corrupted entries is less than or equal to 22.5% in the case where $m = 2n$ and less than about 35% in the case where $m = 4n$. We repeated these experiments for different values of $n$, e.g. $n = 256$

ing on $n$ and $m$) such that accurate decoding occurs for all plaintexts and corrupted patterns (in the sense of Theorem 1.1) as long as the fraction of corrupted entries does not exceed $\rho_c$. It would be of theoretical interest to identify this critical threshold, at least in the limit of large $m$ and $n$, with perhaps $n/m$ converging to a fixed ratio. From a different viewpoint, this is asking how far the equivalence between a combinatorial and a related convex problem holds. We pose this as an interesting challenge.
References
[1] S. Artstein. Proportional concentration phenomena on the sphere. Israel J. Math. 132: 337–358, 2002.
[2] D. Amir, and V. D. Milman. Unconditional and symmetric sets in n-dimensional normed spaces. Israel J. Math. 37: 3–20, 1980.
[3] B. Beferull-Lozano, and A. Ortega. Efficient quantization for overcomplete expansions in $\mathbb{R}^n$. IEEE Trans. Inform. Theory 49: 129–150, 2003.
[4] S. Boyd, and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] P. G. Casazza, and J. Kovačević. Equal-norm tight frames with erasures. Adv. Comput. Math. 18: 387–430, 2003.
[6] E. J. Candès, and J. Romberg. Quantitative robust uncertainty principles and optimally sparse decompositions. To appear Foundations of Computational Mathematics, November 2004.
[7] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. To appear IEEE Transactions on Information Theory, June 2004. Available on the ArXiV preprint server: math.NA/0409186.
[8] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. To appear Comm. Pure Appl. Math. Available on the ArXiV preprint server: math.NA/0409186.
[9] E. J. Candès, and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? Submitted to IEEE Transactions on Information Theory, October 2004. Available on the ArXiV preprint server: math.CA/0410542.
[10] E. J. Candès, and T. Tao. Decoding by linear programming. Submitted, December 2004. Available on the ArXiV preprint server: math.MG/0502327.
[11] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Scientific Computing 20: 33–61, 1998.
[12] I. Daubechies. Ten lectures on wavelets. SIAM, Philadelphia,
[13] D. L. Donoho. For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution. Manuscript, September 2004.
[14] D. L. Donoho. For most large undetermined systems of linear equations the minimal $\ell_1$-norm near-solution is also the sparsest near-solution. Manuscript, September 2004.
[15] D. Donoho. Compressed sensing. Manuscript, September
[16] D. L. Donoho, and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. Proc. Natl. Acad. Sci. USA 100: 2197–2202, 2003.
[17] D. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. Manuscript, 2004.
[18] D. L. Donoho, and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory 47: 2845–2862, 2001.
[19] D. L. Donoho, and Y. Tsaig. Extensions of compressed sensing. Preprint, 2004.
[20] D. Donoho, and Y. Tsaig. Breakdown of equivalence between the minimal $\ell_1$-norm solution and the sparsest solution. Preprint, 2004.
[21] N. El Karoui. New Results about Random Covariance Matrices and Statistical Applications. Stanford Ph.D. Thesis, August 2004.
[22] M. Elad, and A. Bruckstein. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inform. Theory 48: 2558–2567, 2002.
[23] J. Feldman. Decoding Error-Correcting Codes via Linear Programming. Ph.D. Thesis 2003, Massachusetts Institute of Technology.
[24] J. Feldman. LP decoding achieves capacity. 2005 ACM-SIAM Symposium on Discrete Algorithms (SODA), preprint (2005).
[25] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright. LP decoding corrects a constant fraction of errors. Proc. IEEE International Symposium on Information Theory (ISIT), June 2004.
[26] A. Feuer, and A. Nemirovski. On sparse representation in pairs of bases. IEEE Trans. Inform. Theory 49: 1579–1581,
[27] A. Yu. Garnaev, E. D. Gluskin. The widths of a Euclidean ball (Russian). Dokl. Akad. Nauk SSSR 277: 1048–1052,
[28] V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine 18(5): 9–21, 2001.
[29] V. K. Goyal. Multiple description coding: compression meets the network. IEEE Signal Processing Magazine 18(5): 74–93, 2001.
[30] V. K. Goyal, J. Kovacevic, and J. A. Kelner. Quantized frame expansions with erasures. Applied and Computational Harmonic Analysis 10: 203–233, 2001.
[31] V. K. Goyal, M. Vetterli, and N. T. Thao. Quantized overcomplete expansions in $\mathbb{R}^N$: analysis, synthesis and algorithms. IEEE Trans. on Information Theory 44: 16–31,
[32] R. Gribonval, and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49: 3320–3325, 2003.
[33] Handbook of coding theory. Vol. I, II. Edited by V. S. Pless, W. C. Huffman and R. A. Brualdi. North-Holland, Amsterdam, 1998.
[34] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29: 295–327,
[35] J. Kovacevic, P. Dragotti, and V. Goyal. Filter bank frame expansions with erasures. IEEE Trans. on Information Theory 48: 1439–1450, 2002.
[36] M. Ledoux. The concentration of measure phenomenon. Mathematical Surveys and Monographs 89, American Mathematical Society, Providence, RI, 2001.
[37] M. A. Lifshits. Gaussian random functions. Mathematics and its Applications, 322. Kluwer Academic Publishers, Dordrecht, 1995.
[38] V. A. Marchenko, and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72: 407–535, 1967 (in Russian).
[39] J. Matousek. Lectures on discrete geometry. Graduate Texts in Mathematics, 212. Springer-Verlag, New York, 2002.
[40] S. Mendelson. Geometric parameters in learning theory. Geometric aspects of functional analysis. Lecture Notes in Mathematics 1850: 193–235, Springer, Berlin, 2004.
[41] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Comput. 24: 227–234, 1995.
[42] M. Rudelson, and R. Vershynin. Geometric approach to error correcting codes and reconstruction of signals. Submitted, 2005. Available on the ArXiV preprint server: math.FA/0502299.
[43] S. J. Szarek. Condition numbers of random matrices. J. Complexity 7: 131–149, 1991.
[44] J. Tropp. Recovery of short, complex linear combinations via $\ell_1$ minimization. To appear IEEE Trans. Inform. Theory.
[45] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50(10): 2231–2242, October 2004.
[46] J. Tropp. Just relax: Convex programming methods for subset selection and sparse approximation. ICES Report 04-04, UT-Austin, 2004.