Error Correction with Gaussian Matrices: Reconstructing Vectors from Corrupted Data, Papers of Cryptography and System Security

This paper studies the problem of reconstructing a vector from a corrupted version that differs from it in at most r coordinates, using a random Gaussian matrix as a tool for error correction. It gives conditions under which a Gaussian matrix can be used as an (m, n, r)-error correcting code, and provides a theorem stating that an m by n Gaussian matrix is an (m, n, r)-error correcting code with high probability, as long as the fraction of corrupted entries is less than a constant ρ∗ depending on n/m. The document also discusses the Gaussian ensemble and the design of matrices with good restricted isometry constants.

Typology: Papers

Pre 2010

Uploaded on 08/30/2009

Error Correction via Linear Programming

Emmanuel Candes
Applied and Computational Mathematics, Caltech,
Pasadena, CA 91125, USA

Mark Rudelson
Department of Mathematics, University of Missouri,
Columbia, MO 65203, USA

Terence Tao
Department of Mathematics, University of California,
Los Angeles, CA 90095, USA

Roman Vershynin
Department of Mathematics, University of California,
Davis, CA 95616, USA
Abstract

Suppose we wish to transmit a vector $f \in \mathbb{R}^n$ reliably. A frequently discussed approach consists in encoding $f$ with an $m$ by $n$ coding matrix $A$. Assume now that a fraction of the entries of $Af$ are corrupted in a completely arbitrary fashion by an error $e$. We do not know which entries are affected nor do we know how they are affected. Is it possible to recover $f$ exactly from the corrupted $m$-dimensional vector $y' = Af + e$?

This paper proves that under suitable conditions on the coding matrix $A$, the input $f$ is the unique solution to the $\ell_1$-minimization problem ($\|x\|_{\ell_1} := \sum_i |x_i|$)

$$\min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_1}$$

provided that the fraction of corrupted entries is not too large, i.e. does not exceed some strictly positive constant $\rho^*$ (numerical values for $\rho^*$ are actually given). In other words, $f$ can be recovered exactly by solving a simple convex optimization problem; in fact, a linear program. We report on numerical experiments suggesting that $\ell_1$-minimization is amazingly effective; $f$ is recovered exactly even in situations where a very significant fraction of the output is corrupted.

In the case when the measurement matrix $A$ is Gaussian, the problem is equivalent to that of counting low-dimensional facets of a convex polytope, and in particular of a random section of the unit cube. In this case we can strengthen the results somewhat by using a geometric functional analysis approach.

Keywords. Linear codes, decoding of (random) linear codes, sparse solutions to underdetermined systems, $\ell_1$-minimization, linear programming, restricted orthonormality, Gaussian random matrices.

1 Introduction

1.1 The error correction problem

This paper considers the model problem of recovering an input vector $f \in \mathbb{R}^n$ from corrupted measurements $y' = Af + e$. Here, $A$ is an $m$ by $n$ matrix (we will assume throughout the paper that $m > n$), and $e \in \mathbb{R}^m$ is an unknown vector of errors. We will assume that at most $r$ entries are corrupted, thus at most $r$ entries of $e$ are non-zero, but apart from this restriction $e$ will be arbitrary. The problem we consider is whether it is possible to recover $f$ exactly from the data $y'$. And if so, how?

In its abstract form, our problem is of course equivalent to the classical error correcting problem which arises in coding theory, as we may think of $A$ as a linear code; a linear code is a given collection of codewords which are vectors $a_1, \dots, a_n \in \mathbb{R}^m$, the columns of the matrix $A$. Given a vector $f \in \mathbb{R}^n$ (the "plaintext") we can then generate a vector $Af$ in $\mathbb{R}^m$ (the "ciphertext"); if $A$ has full rank, then one can clearly recover the plaintext $f$ from the ciphertext $Af$. But now we suppose that the ciphertext $Af$ is corrupted by an arbitrary vector $e \in \mathbb{R}^m$ with at most $r$ non-zero entries so that the corrupted ciphertext is of the form $Af + e$. The question is then: given the coding matrix $A$ and $Af + e$, can one recover $f$ exactly?

Let us say that the linear code $A$ is an $(m, n, r)$-error correcting code if one can recover $f$ from $Af + e$ whenever $e$ has at most $r$ non-zero coefficients. As is well known, if the number $r$ of corrupted entries is too large, then of course we have no hope of having an $(m, n, r)$-error correcting code. For instance, if $n + 2r > m$ then elementary linear algebra shows that there exist plaintexts $f, f' \in \mathbb{R}^n$ and errors $e, e' \in \mathbb{R}^m$ with at most $r$ non-zero coefficients each such that $Af + e = Af' + e'$, and so one cannot distinguish $f$ from $f'$ in this case. In particular, if the fraction $\rho := r/m$ of corrupted entries exceeds $1/2$ then an $(m, n, r)$-error correcting code is impossible regardless of how large one makes $m$ with respect to $n$.

This situation raises an important question: for which fraction $\rho$ of the corrupted entries is accurate decoding possible with practical algorithms? That is, with algorithms whose complexity is at most polynomial in the length $m$ of the codewords?

Setting $y := Af$, and letting $Y \subset \mathbb{R}^m$ be the image of $A$, we can rephrase the problem more geometrically as follows: how to reconstruct a vector $y$ in an $n$-dimensional subspace $Y$ of $\mathbb{R}^m$ from a vector $y' \in \mathbb{R}^m$ that differs from $y$ in at most $r$ coordinates?

If the matrix $A$ is chosen randomly, for instance from the Gaussian ensemble, then it is easy to show that, with probability one, all the plaintexts can be distinguished in an information-theoretic sense as soon as $n + 2r \le m$. However, this result provides no algorithm for recovering the plaintext $f$ from the corrupted ciphertext $Af + e$, other than brute force search, which has exponential complexity in $m$. By analogy with discrete (e.g. finite field) versions of this problem, to obtain a polynomial-time recovery algorithm it is more reasonable to expect as a necessary condition the Gilbert-Varshamov bound

n/m ≥ 1 − H(Cr/n) (1.1)

which is fundamental in coding theory (see [34]); here $H(x)$ is the entropy function, and $C$, $c$, $c_1$, etc. will be used to denote various positive absolute constants. This heuristic can be made rigorous if one requires a certain stability property for the recovery algorithm; see Section 6.

One can instead consider a mean square approach, based on the minimization problem

$$(P_2) \qquad \min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_2}$$

or equivalently

$$(P_2') \qquad \min_{\tilde y \in Y} \|y' - \tilde y\|_{\ell_2}$$

but the minimizer $f^\star$ (resp. $y^\star$) may be arbitrarily far away from the plaintext $f$ (resp. $y$) since we have no size control on the error $e$.
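The lack of size control in the mean square approach can be seen on a toy instance. The sketch below is only an illustration: the 6 by 2 matrix `A`, the plaintext `f` and the helper `least_squares_error` are hypothetical choices, not taken from the paper. It corrupts a single coordinate of $Af$ and measures how far the $(P_2)$ minimizer lands from $f$.

```python
import numpy as np

# Hypothetical toy code (not from the paper): m = 6 measurements, n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [1.0, 2.0],
              [2.0, 1.0]])
f = np.array([1.0, 2.0])  # plaintext


def least_squares_error(magnitude):
    """Distance between the (P2) minimizer and f when one entry of Af is corrupted."""
    e = np.zeros(A.shape[0])
    e[2] = magnitude  # a single corrupted coordinate, so r = 1
    f_ls, *_ = np.linalg.lstsq(A, A @ f + e, rcond=None)
    return float(np.linalg.norm(f_ls - f))


# The decoding error grows with the size of the corruption.
for magnitude in (1.0, 10.0, 1000.0):
    print(magnitude, least_squares_error(magnitude))
```

For this particular matrix the least-squares error is proportional to the size of the corrupted entry, so a single bad coordinate can move the estimate as far from $f$ as one likes.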

1.2 Solution via $\ell_1$-minimization

To recover $f$ accurately from corrupted data $y' = Af + e$, we consider solving the following $\ell_1$-minimization (or Basis Pursuit) problem

$$(P_1) \qquad \min_{\tilde f \in \mathbb{R}^n} \|y' - A\tilde f\|_{\ell_1} \tag{1.2}$$

or equivalently

$$(P_1') \qquad \min_{\tilde y \in Y} \|y' - \tilde y\|_{\ell_1}.$$

Thus the minimizer $y^\star$ to $(P_1')$ is the metric projection of $y'$ onto the vector space $Y$ with respect to the $\ell_1$ norm. This is a convex program which can be classically reformulated as a linear program. Indeed, the $\ell_1$-minimization problem $(P_1)$ is equivalent to

$$\min \sum_{i=1}^{m} t_i, \qquad -t \le y' - A\tilde f \le t,$$

where the optimization variables are $t \in \mathbb{R}^m$ and $\tilde f \in \mathbb{R}^n$ (as is standard, the generalized vector inequality $x \le y$ means that $x_i \le y_i$ for every coordinate $i$). Hence, $(P_1)$ is an LP with inequality constraints and can be solved efficiently using standard or specialized optimization algorithms, see [4], [5].

The main claim of this paper is that for suitable coding matrices $A$, the solution $f^\star$ to our linear program is actually exact; $f^\star = f$! (Equivalently, $y^\star = y$.)
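The linear programming reformulation of $(P_1)$ can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes NumPy and SciPy are available and uses `scipy.optimize.linprog` with the HiGHS backend; the function name `l1_decode` and the demo sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def l1_decode(A, y_corrupt):
    """Solve (P1): minimize ||y' - A f||_1 over f, written as a linear program."""
    m, n = A.shape
    # Variables x = (f, t): f in R^n is free, t in R^m bounds the residual.
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    # -t <= y' - A f <= t  becomes  A f - t <= y'  and  -A f - t <= -y'.
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([y_corrupt, -y_corrupt])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


if __name__ == "__main__":
    # Illustrative sizes only: Gaussian code, a few gross errors.
    rng = np.random.default_rng(0)
    n, m, r = 16, 64, 6
    A = rng.standard_normal((m, n))
    f = rng.standard_normal(n)
    e = np.zeros(m)
    e[rng.choice(m, size=r, replace=False)] = 10.0 * rng.standard_normal(r)
    f_hat = l1_decode(A, A @ f + e)
    print("max abs decoding error:", np.max(np.abs(f_hat - f)))
```

The first $n$ variables are left unbounded on purpose, since `linprog` constrains variables to be non-negative by default.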

[Figure: two panels showing the subspace $Y$ and the corrupted vector $y'$. In the panel labelled (MLS) the minimizer $u$ differs from $y$; in the panel labelled (BP), $y = u$.]

The potential of Basis Pursuit for exact reconstruction is illustrated by the following heuristics, essentially due to [18]. The minimizer $u$ to $(P_2')$ is the contact point where the smallest Euclidean ball centered at $y'$ meets the subspace $Y$. That contact point is in general different from $y$. The situation is much better in $(P_1')$: typically the solution coincides with $y$. The minimizer $u$ to $(P_1')$ is the contact point where the smallest octahedron centered at $y'$ (the ball with respect to the $\ell_1$-norm) meets $Y$. Because the vector $y - y'$ lies in a low-dimensional coordinate subspace, the octahedron has a wedge at $y$. Thus, many subspaces $Y$ through $y$ will

the annihilator $B$ of a random Gaussian matrix can be chosen to be another random Gaussian matrix.

Similar statements with different constants hold for other types of ensembles, e.g. for binary matrices with i.i.d. entries taking values $\pm 1/\sqrt{p}$ with probability 1/2. It is interesting that our methods actually give numerical values, instead of the traditional "for some positive constant $\rho$." However, the numerical bounds we derived in this paper are overly pessimistic. We are confident that finer arguments and perhaps new ideas will allow us to derive versions of Theorem 1.2 with better bounds. Numerical experiments actually suggest that the threshold is indeed much higher, see Section 5.

Returning to the Gaussian case, it turns out that when $r$ is somewhat small then we can come close to the theoretical limit $n + 2r \le m$. More precisely, in Section 3 we will prove

Theorem 1.3. Let $m$, $n$ and $r < cm$ be positive integers such that

$$m = n + R, \quad \text{where } R \ge C r \log(m/r). \tag{1.7}$$

Let $G$ be an $m \times n$ matrix whose entries are independent $N(0, 1)$ normal random variables. Then, with probability at least $1 - e^{-cR}$, the matrix $G$ is an $(m, n, r)$-error correcting code with exact recovery algorithm $(P_1')$.

The assumption (1.7) meets, up to a constant, the Gilbert-Varshamov bound (1.1), and can be rephrased in terms of the corruption rate $\rho = r/m$ as $m \ge (1 + C\rho \log\frac{1}{\rho})\,n$. Theorem 1.3 then asserts that $m \times n$ Gaussian matrices will be $(m, n, r)$-error correcting codes with high probability once this condition is attained.

In signal processing, linear codes are known as transform codes. The general paradigm about transform codes is that the redundancies in the coefficients of $y$ that come from the excess of the dimension $m > n$ should guarantee a stability of the signal with respect to noise, quantization, erasures, etc. This is confirmed by extensive experimental and some theoretical work, see e.g. [3, 6, 13, 29–32, 36] and the bibliography contained therein. Theorem 1.3 thus states that most orthogonal transform codes are good error-correcting codes.
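For the reader's convenience, here is one way to see the rephrasing of (1.7) in terms of $\rho$ mentioned above. This is a sketch under the extra assumption that $x := C\rho\log(1/\rho) \le 1/2$ (which holds for small corruption rates); the symbol $x$ is introduced here only for the computation, and the absolute constant $C$ is allowed to change by a factor of 2.

```latex
\begin{align*}
R = m - n \;\ge\; C r \log(m/r) \;=\; C \rho\, m \log(1/\rho)
  \quad &\Longleftrightarrow \quad n \;\le\; (1 - x)\, m,
  \qquad x := C\rho \log(1/\rho), \\
  &\Longleftrightarrow \quad m \;\ge\; \frac{n}{1 - x}.
\end{align*}
% For 0 <= x <= 1/2 one has 1 + x <= 1/(1 - x) <= 1 + 2x, hence
\[
  (1 + x)\, n \;\le\; \frac{n}{1 - x} \;\le\; (1 + 2x)\, n ,
\]
% so (1.7) implies m >= (1 + x) n, and m >= (1 + 2x) n implies (1.7).
```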

Acknowledgements.

E. C. is supported in part by a National Science Foundation grant DMS 01-40698 (FRG) and by an Alfred P. Sloan Fellowship. M. R. is partially supported by the NSF grant DMS 0245380. T. T. is supported by a grant from the Packard Foundation. R. V. is an Alfred P. Sloan Research Fellow. He was also partially supported by the NSF grant DMS 0401032 and by the Miller Scholarship from the University of Missouri-Columbia. R. V. is grateful to the University of Missouri for their hospitality during this period, when part of this research was started. E. C. and T. T. would like to thank Rafail Ostrovsky for pointing out possible connections between their earlier work and the error correction problem. Parts of this paper are an abridged version of [11] and [43].

2 Proof of Theorem 1.2

The proof of the theorem makes use of two special geometric facts about the solution $d^\star$ to $(P_1')$. First, $Bd^\star = By'$, which geometrically says that $d^\star$ belongs to a known plane of co-dimension $p$. Second, because $e$ is feasible, we must have $\|d^\star\|_{\ell_1} \le \|e\|_{\ell_1}$. Decompose $d^\star$ as $d^\star = e + h$, thus $Bh = 0$. As observed in [19],

$$\|e\|_{\ell_1} - \|h_{T_0}\|_{\ell_1} + \|h_{T_0^c}\|_{\ell_1} \le \|e + h\|_{\ell_1} \le \|e\|_{\ell_1},$$

where $T_0$ is the support of $e$, and $h_{T_0}(t) = h(t)$ for $t \in T_0$ and zero elsewhere (similarly for $h_{T_0^c}$). Hence, $h$ obeys the cone constraint

$$\|h_{T_0^c}\|_{\ell_1} \le \|h_{T_0}\|_{\ell_1} \tag{2.1}$$

which expresses the geometric idea that $h$ must lie in the cone of descent of the $\ell_1$-norm at $e$. Exact recovery occurs provided that the null vector is the only point in the intersection between $\{h : Bh = 0\}$ and the set of $h$ obeying (2.1).

We begin by dividing $T_0^c$ into subsets of size $M$ (we will choose $M$ later) and enumerate $T_0^c$ as

$$n_1, n_2, \dots, n_{m - |T_0|}$$

in decreasing order of magnitude of $h_{T_0^c}$. Set $T_j = \{n_\ell,\ (j-1)M + 1 \le \ell \le jM\}$. That is, $T_1$ contains the indices of the $M$ largest coefficients of $h_{T_0^c}$, $T_2$ contains the indices of the next $M$ largest coefficients, and so on.

With this decomposition, the $\ell_2$-norm of $h$ is concentrated on $T_{01} = T_0 \cup T_1$. Indeed, the $k$th largest value of $h_{T_0^c}$ obeys $|h_{T_0^c}|_{(k)} \le \|h_{T_0^c}\|_{\ell_1}/k$ and, therefore,

$$\|h_{T_{01}^c}\|_{\ell_2}^2 \le \|h_{T_0^c}\|_{\ell_1}^2 \sum_{k=M+1}^{m} 1/k^2 \le \|h_{T_0^c}\|_{\ell_1}^2 / M.$$

Further, the $\ell_1$-cone constraint gives

$$\|h_{T_{01}^c}\|_{\ell_2}^2 \le \|h_{T_0}\|_{\ell_1}^2 / M \le \|h_{T_0}\|_{\ell_2}^2 \cdot |T_0|/M$$

and thus

$$\|h\|_{\ell_2}^2 = \|h_{T_{01}}\|_{\ell_2}^2 + \|h_{T_{01}^c}\|_{\ell_2}^2 \le (1 + |T_0|/M) \cdot \|h_{T_{01}}\|_{\ell_2}^2. \tag{2.2}$$

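Observe now that

$$\begin{aligned}
\|Bh\|_{\ell_2} &= \Big\| B_{T_{01}} h_{T_{01}} + \sum_{j \ge 2} B_{T_j} h_{T_j} \Big\|_{\ell_2} \\
&\ge \|B_{T_{01}} h_{T_{01}}\|_{\ell_2} - \Big\| \sum_{j \ge 2} B_{T_j} h_{T_j} \Big\|_{\ell_2} \\
&\ge \|B_{T_{01}} h_{T_{01}}\|_{\ell_2} - \sum_{j \ge 2} \|B_{T_j} h_{T_j}\|_{\ell_2} \\
&\ge \sqrt{1 - \delta_{M + |T_0|}}\, \|h_{T_{01}}\|_{\ell_2} - \sqrt{1 + \delta_M} \sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}.
\end{aligned}$$

Set $\rho_M = |T_0|/M$. As we shall see later,

$$\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2} \le \sqrt{\rho_M} \cdot \|h_{T_0}\|_{\ell_2}, \tag{2.3}$$

and since $Bh = 0$, this gives

$$\Big[ \sqrt{1 - \delta_{M + |T_0|}} - \sqrt{\rho_M}\, \sqrt{1 + \delta_M} \Big] \cdot \|h_{T_{01}}\|_{\ell_2} \le 0. \tag{2.4}$$

It then follows from (2.2) that $h = 0$ provided that the quantity $\sqrt{1 - \delta_{M + |T_0|}} - \sqrt{\rho_M}\, \sqrt{1 + \delta_M}$ is positive. Take $M = 3|T_0|$ for example. Then $\rho_M = 1/3$, and squaring shows that this quantity is positive if and only if $3(1 - \delta_{4|T_0|}) > 1 + \delta_{3|T_0|}$, that is, if $\delta_{3|T_0|} + 3\delta_{4|T_0|} < 2$. Since $|T_0| \le r$, this follows from (1.6).

It remains to argue about (2.3). Observe that by construction, the magnitude of each coefficient in $T_{j+1}$ is less than the average of the magnitudes in $T_j$: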

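$$|h_{T_{j+1}}(t)| \le \|h_{T_j}\|_{\ell_1} / M.$$

Then

$$\|h_{T_{j+1}}\|_{\ell_2}^2 \le \|h_{T_j}\|_{\ell_1}^2 / M$$

and (2.3) follows from

$$\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2} \le \sum_{j \ge 1} \|h_{T_j}\|_{\ell_1} / \sqrt{M} \le \|h_{T_0}\|_{\ell_1} / \sqrt{M} \le \sqrt{|T_0|/M} \cdot \|h_{T_0}\|_{\ell_2}.$$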
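The blockwise estimate used to derive (2.3) can be sanity-checked numerically. The sketch below is an illustration only, with a hypothetical helper `block_tail_bound`. It takes a vector standing in for $h_{T_0^c}$, sorts its entries by decreasing magnitude, cuts them into blocks $T_1, T_2, \dots$ of size $M$, and compares $\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}$ with $\|h_{T_0^c}\|_{\ell_1}/\sqrt{M}$.

```python
import math
import random


def block_tail_bound(h, M):
    """Return (lhs, rhs) of  sum_{j>=2} ||h_{T_j}||_2 <= ||h||_1 / sqrt(M).

    h plays the role of h restricted to the complement of T_0.
    """
    mags = sorted((abs(v) for v in h), reverse=True)
    # Blocks T_1, T_2, ... of M consecutive magnitudes (the last may be shorter).
    blocks = [mags[i:i + M] for i in range(0, len(mags), M)]
    lhs = sum(math.sqrt(sum(v * v for v in blk)) for blk in blocks[1:])
    rhs = sum(mags) / math.sqrt(M)
    return lhs, rhs


if __name__ == "__main__":
    random.seed(0)
    h = [random.gauss(0.0, 1.0) for _ in range(50)]
    print(block_tail_bound(h, M=5))
```

Combining this with the cone constraint (2.1) and the Cauchy-Schwarz inequality on $T_0$ gives the right-hand side of (2.3).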

3 Proof of Theorem 1.3

3.1 Low-dimensional facets of polytopes.

Theorem 1.3 turns out to be equivalent to a problem of counting lower-dimensional facets of polytopes. Let $B_1^m$ denote the unit ball with respect to the $\ell_1$-norm; it is sometimes called the unit octahedron. The polar body is the unit cube $B_\infty^m := [-1, 1]^m$. Note that the range of the matrix $G$ is an $n$-dimensional subspace uniformly distributed over the Grassmannian. Thus the conclusion of Theorem 1.3 can be reformulated as follows. Let $y \in Y$ be an unknown vector, and suppose we are given a vector $y'$ in $\mathbb{R}^m$ that differs from $y$ on at most $r$ coordinates. Then $y$ can be exactly reconstructed from $y'$ as the solution to the minimization problem $(P_1')$.

This means that the affine subspace $z + Y$ is tangent to the unit octahedron at the point $z$, where $z = y' - y$. This should happen for all $z$ from the coordinate subspaces $\mathbb{R}^I$ with $|I| = r$. By duality, this means that the subspace $Y^\perp$ intersects all $(m - r)$-dimensional facets of the unit cube. The section of the cube by the subspace $Y^\perp$ forms an origin-symmetric polytope of dimension $R$ and with $2m$ facets. Our problem can thus be stated as a problem of counting lower-dimensional facets of polytopes.

Consider an $R$-dimensional origin-symmetric polytope with $2m$ facets. How many $(R - r)$-dimensional facets can it have? Clearly$^1$, no more than $2^r \binom{m}{r}$. Does there exist a polytope with that many facets? Our ability to construct such a polytope is equivalent to the existence of an efficient error correcting code. Indeed, looking at the canonical realization of such a polytope as a section of the unit cube by a subspace $Y^\perp$, we see that $Y^\perp$ intersects all the $(m - r)$-dimensional facets of the cube. Thus $Y$ satisfies the conclusion of Theorem 1.3. We can thus state Theorem 1.3 in the following form:

Theorem 3.1. There exists an $R$-dimensional symmetric polytope with $2m$ facets and with the maximal number of $(R - r)$-dimensional facets (which is $2^r \binom{m}{r}$), provided $R \ge C r \log(m/r)$. A random section of the cube forms such a polytope with probability $1 - e^{-cR}$.
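The count $2^r \binom{m}{r}$ is easy to evaluate, and the full cube gives a quick consistency check of the formula. The helper name `max_low_dim_facets` below is hypothetical. The cube $[-1, 1]^3$ is a 3-dimensional symmetric polytope with $2m = 6$ facets, and it has 6 two-dimensional faces, 12 edges and 8 vertices.

```python
from math import comb


def max_low_dim_facets(m, r):
    """Upper bound 2^r * C(m, r) on the number of (R - r)-dimensional facets
    of an R-dimensional origin-symmetric polytope with 2m facets."""
    return 2 ** r * comb(m, r)


if __name__ == "__main__":
    # The cube [-1, 1]^3 (m = 3, R = 3): faces, edges, vertices.
    print([max_low_dim_facets(3, r) for r in (1, 2, 3)])
```

The cube attains the bound only in the trivial case $R = m$; the content of Theorem 3.1 is that the bound is still attained by sections of much lower dimension $R$.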

3.2 Notation.

The $p$-norm ($1 \le p < \infty$) on $\mathbb{R}^m$ is defined by $\|x\|_p^p = \sum_i |x_i|^p$, and for $p = \infty$ it is $\|x\|_\infty = \max_i |x_i|$. The unit ball with respect to the $p$-norm on $\mathbb{R}^m$ is denoted by $B_p^m$. When the $p$-norm is considered on a coordinate subspace $\mathbb{R}^I$, $I \subset \{1, \dots, m\}$, the corresponding unit ball is denoted by $B_p^I$. The unit Euclidean sphere in a subspace $E$ is denoted by $S(E)$. The normalized rotationally invariant Lebesgue measure on $S(E)$ is denoted by $\sigma_E$. The orthogonal projection onto a subspace $E$ is denoted by $P_E$. The standard Gaussian measure on $E$ (with the identity covariance matrix) is denoted by $\gamma_E$. When $E = \mathbb{R}^d$, we write $\sigma_{d-1}$ for $\sigma_E$ and $\gamma_d$ for $\gamma_E$.

3.3 Duality.

The proof of Theorem 1.3 begins with a typical duality argument, leading to the same reformulation of the prob-

$^1$ Any such facet is the intersection of some $r$ facets of the polytope of full dimension $R - 1$; there are $m$ facets to choose from, each coming with its opposite by the symmetry.

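Proof. Let $f$ be the multiple of the vector $P_{E \cap \mathrm{lin}(F)}\theta$ such that $f - \theta$ is orthogonal to $\theta$. Such a multiple exists and is unique, as this is a two-dimensional problem.

[Figure: sketch of the projection $P_{E \cap \mathrm{lin}(F)}\theta$ and the point $f$.]

Then $f \in E \cap \mathrm{aff}(F)$. Notice that $D = \|f - \theta\|_2$. By the similarity of the triangles with the vertices $(0, \theta, P_{E \cap \mathrm{lin}(F)}\theta)$ and $(0, f, \theta)$, we conclude that

$$\|P_{E \cap \mathrm{lin}(F)}\theta\|_2 = \frac{r}{\sqrt{r + D^2}} = \sqrt{\frac{r}{r + D^2}}\; \|\theta\|_2$$

because $\|\theta\|_2 = \sqrt{r}$. This completes the proof.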

The length of the projection of a fixed vector onto a random subspace in Lemma 3.2 is well known. The asymptotically sharp estimate was computed by S. Artstein [1], but we will be satisfied with a much weaker elementary estimate, see e.g. [40, Theorem 15.2.2].

Lemma 3.3. Let $\theta \in \mathbb{R}^{d}$ and let $G$ be a random subspace in $G_{d,k}$. Then

$$\mathbb{P}\Big\{ c \sqrt{\frac{k}{d}}\, \|\theta\|_2 \le \|P_G \theta\|_2 \le C \sqrt{\frac{k}{d}}\, \|\theta\|_2 \Big\} \ge 1 - 2e^{-ck}.$$
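The scaling $\sqrt{k/d}$ in Lemma 3.3 can be observed empirically. The following sketch is an illustration under stated assumptions, not part of the proof: it draws a random $k$-dimensional subspace as the column span of a $d \times k$ Gaussian matrix (orthonormalized by QR) and returns the ratio $\|P_G\theta\|_2 / (\sqrt{k/d}\,\|\theta\|_2)$ for a fixed unit vector. The function name `projection_ratio` and the sizes are hypothetical.

```python
import numpy as np


def projection_ratio(d, k, rng):
    """Return ||P_G theta||_2 / (sqrt(k/d) ||theta||_2) for a random k-dim subspace G."""
    # Orthonormal basis of a random k-dimensional subspace of R^d.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    theta = np.zeros(d)
    theta[0] = 1.0  # fixed unit vector; by rotation invariance the choice is immaterial
    proj_norm = np.linalg.norm(Q.T @ theta)  # equals ||P_G theta||_2 since Q is orthonormal
    return float(proj_norm / np.sqrt(k / d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print([round(projection_ratio(400, 100, rng), 3) for _ in range(5)])
```

In such runs the ratio stays close to 1, which is the two-sided bound of the lemma with constants $c$ and $C$ near 1 for these sizes.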

We apply this lemma for $G = E \cap \mathrm{lin}(F)$, which is a random subspace in the Grassmannian of $(l + 1)$-dimensional subspaces of $\mathrm{lin}(F)$. Since $\dim \mathrm{lin}(F) = m - r + 1$, we have

$$\mathbb{P}\Big\{ \|P_{E \cap \mathrm{lin}(F)}\theta\|_2 \ge c \sqrt{\frac{l + 1}{m - r + 1}}\; \|\theta\|_2 \Big\} \ge 1 - 2e^{-cl}.$$

Together with Lemma 3.2 this gives

$$\mathbb{P}\Big\{ D \le c \sqrt{m - r}\, \sqrt{\frac{r}{l}} \Big\} \ge 1 - 2e^{-cl}. \tag{3.2}$$

Note that $\sqrt{m - r}$ is the radius of the Euclidean ball circumscribed on the facet $F$. The statement $D \le \sqrt{m - r}$ would only tell us that the random subspace $E$ intersects the circumscribed ball, not yet the facet itself. The ratio $r/l$ in (3.2) will be chosen logarithmically small, which will force $E$ to intersect the facet $F$ as well.

3.6 Gaussian measure of random projections of the cube

By (3.1) and (3.2),

$$P \ge \int_{G_{m-r,\,m-R}} \sigma_H\Big( \frac{c}{\sqrt{m - r}} \sqrt{\frac{l}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cl}.$$

We can replace the spherical measure $\sigma_H$ by the Gaussian measure $\gamma_H$ via a simple lemma:

Lemma 3.4. Let $K$ be a star-shaped set in $\mathbb{R}^d$. Then

$$\gamma_d(c\sqrt{d} \cdot K) - e^{-d} \le \sigma_{d-1}(K) \le \gamma_d(C\sqrt{d} \cdot K) \cdot (1 + e^{-d}).$$

Proof. Passing to polar coordinates, by the rotational invariance of the Gaussian measure we see that there exists a probability measure $\mu$ on $\mathbb{R}_+$ so that the Gaussian measure of every set $A$ can be computed as $\gamma_d(A) = \int_{\mathbb{R}_+} \sigma^t(A)\, d\mu(t)$, where $\sigma^t$ denotes the normalized Lebesgue measure on the Euclidean sphere of radius $t$ in $\mathbb{R}^d$. Since $K$ is star-shaped, $\sigma^t(K)$ is a non-increasing function of $t$. Hence

$$\gamma_d(K) \ge \int_0^{C\sqrt{d}} \sigma^t(K)\, d\mu(t) \ge \sigma^{C\sqrt{d}}(K) \cdot \gamma_d(C\sqrt{d}\, B_2^d)$$

and

$$\gamma_d(K) \le \int_0^{c\sqrt{d}} d\mu(t) + \sigma^{c\sqrt{d}}(K) \int_{c\sqrt{d}}^{\infty} d\mu(t) \le \gamma_d(c\sqrt{d} \cdot B_2^d) + \sigma^{c\sqrt{d}}(K).$$

The classical large deviation inequalities imply $\gamma_d(c\sqrt{d} \cdot B_2^d) \le e^{-d}$ and $\gamma_d(C\sqrt{d}\, B_2^d) \ge 1 - e^{-d/2}$. Using the above argument for $c\sqrt{d} \cdot K$, we conclude that $\gamma_d(c\sqrt{d} \cdot K) \le e^{-d} + \sigma_{d-1}(K)$ and $\gamma_d(C\sqrt{d} \cdot K) \ge \sigma_{d-1}(K) \cdot (1 - e^{-d/2})$.

Using Lemma 3.4 in the space $H$ of dimension $d = m - R$, we obtain

$$P \ge \int_{G_{m-r,\,m-R}} \gamma_H\Big( c \sqrt{\frac{m - R}{m - r}} \sqrt{\frac{l}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cl} - e^{-(m-R)}.$$

By choosing the absolute constant $c$ in the assumption $r < cm$ appropriately small, we can assume that $2r < R < m/2$. Thus

$$P \ge \int_{G_{m-r,\,m-R}} \gamma_H\Big( c \sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) - 2e^{-cR}. \tag{3.3}$$

We now compute the Gaussian measure of random projections of the cube.

Proposition 3.5. Let $H$ be a random subspace in $G_{n,n-k}$, $k < n/2$. Then the inequality

$$\gamma_H\Big( C \sqrt{\log\frac{n}{k}}\; P_H B_\infty^n \Big) \ge 1 - e^{-ck}$$

holds with probability at least $1 - e^{-ck}$ in the Grassmannian.

The proof of this estimate will follow from the concentration of Gaussian measure, combined with the existence of a big Euclidean ball inside a random projection of the cube.

Lemma 3.6 (Concentration of Gaussian measure). Let $\varepsilon > 0$ and let $A \subset \mathbb{R}^n$ be a measurable set such that $\gamma_n(A) \ge e^{-\varepsilon^2 n}$. Then

$$\gamma_n(A + C\varepsilon\sqrt{n}\, B_2^n) \ge 1 - e^{-\varepsilon^2 n}.$$

With the stronger assumption $\gamma(A) \ge 1/2$, this lemma is the classical concentration inequality, see [37] 1.1. The fact that the concentration holds also for exponentially small sets follows formally by a simple extension argument that was first noticed by D. Amir and V. Milman in [2], see [37] Lemma 1.1. The optimal result on random projections of the cube is due to Garnaev and Gluskin [28].

Theorem 3.7 (Euclidean projections of the cube [28]). Let $H$ be a random subspace in $G_{n,n-k}$, where $k = \alpha n < n/2$. Then with probability at least $1 - e^{-ck}$ in the Grassmannian, we have

$$c(\alpha)\, P_H(\sqrt{n}\, B_2^n) \subseteq P_H(B_\infty^n) \subseteq P_H(\sqrt{n}\, B_2^n)$$

where

$$c(\alpha) = c \sqrt{\frac{\alpha}{\log(1/\alpha)}}.$$

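Proof of Proposition 3.5. Let $g_1, g_2, \dots$ be independent standard Gaussian random variables. Then for a suitable positive absolute constant $C$ and for every $0 < \varepsilon < 1/2$,

$$\gamma_n\Big( C \sqrt{\log\frac{1}{\varepsilon}}\; B_\infty^n \Big) = \mathbb{P}\Big\{ \max_{1 \le j \le n} |g_j| \le C \sqrt{\log\frac{1}{\varepsilon}} \Big\} \ge (1 - \varepsilon^2/10)^n \ge e^{-\varepsilon^2 n}.$$

Since for every measurable set $A$ and every subspace $H$ one has $\gamma_H(P_H A) \ge \gamma(A)$, we conclude that

$$\gamma_H\Big( C \sqrt{\log\frac{1}{\varepsilon}}\; P_H B_\infty^n \Big) \ge e^{-\varepsilon^2 n} \quad \text{for } 0 < \varepsilon < 1/2.$$

Then by Lemma 3.6,

$$\gamma_H\Big( C \sqrt{\log\frac{1}{\varepsilon}}\; P_H B_\infty^n + C\varepsilon\sqrt{n}\; P_H B_2^n \Big) \ge 1 - e^{-\varepsilon^2 n} \tag{3.4}$$

for $0 < \varepsilon < 1/2$. Theorem 3.7 tells us that for a random subspace $H$, if $\varepsilon = c\sqrt{\alpha} = c\sqrt{k/n}$, then the Euclidean ball is absorbed by the projection of the cube in (3.4):

$$\varepsilon\sqrt{n}\; P_H B_2^n \subset C \sqrt{\log\frac{1}{\varepsilon}}\; P_H B_\infty^n.$$

Hence for a random subspace $H$ and for $\varepsilon$ as above we have

$$\gamma_H\Big( C \sqrt{\log\frac{1}{\varepsilon}}\; P_H B_\infty^n \Big) \ge 1 - e^{-\varepsilon^2 n},$$

which completes the proof.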

Coming back to (3.3), we shall use Proposition 3.5 for a random subspace $H$ in the Grassmannian $G_{m-r,\,m-R}$. We conclude that if

$$c \sqrt{\frac{R}{r}} \ge C \sqrt{\log\frac{m - r}{R - r}} \tag{3.5}$$

then with probability at least $1 - e^{-cR}$ in the Grassmannian,

$$\gamma_H\Big( c \sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big) \ge 1 - e^{-cR}.$$

Since $\frac{m - r}{R - r} \le \frac{m}{r}$, the choice of $R$ in (1.7) satisfies condition (3.5). Thus (3.3) implies

$$P \ge 1 - 3e^{-cR}.$$

This completes the proof.

3.7 Optimality

The logarithmic term in Theorems 1.3 and 4.1 is necessary, at least in the case of small $r$. Indeed, combining formula (3.1) and Lemmas 3.2, 3.3, 3.4, we obtain

$$P \le \int_{G_{m-r,\,m-R}} \gamma_H\Big( c \sqrt{\frac{R}{r}}\; P_H B_\infty^{m-r} \Big)\, d\nu(H) + 2e^{-cR}.$$

To estimate the Gaussian measure we need the following

Lemma 3.8. Let $x_1, \dots, x_s$ be vectors in $\mathbb{R}^s$. Then

$$\gamma_s\Big( \sum_{j=1}^{s} [-x_j, x_j] \Big) \le \gamma_s(M \cdot B_\infty^s),$$

where $M = \max_{j=1,\dots,s} \|x_j\|_2$.

$R = Cr \log m$ Gaussian measurements. However, the polynomial probability is clearly not sufficient to deduce that there is one set of vectors $X_k$ that can be used to reconstruct all functions $f$ of small support. The following equivalent form of Theorem 1.3 does yield a uniform exact reconstruction. It provides us with one set of linear measurements from which we can effectively reconstruct every signal of small support.

Theorem 4.1 (Uniform Exact Reconstruction). Let $m$, $r < cm$ and $R$ be positive integers satisfying $R \ge Cr \log(m/r)$. The independent standard Gaussian vectors $X_k$ in $\mathbb{R}^m$ satisfy the following with probability at least $1 - e^{-cR}$. Let $f \in \mathbb{R}^m$ be an unknown function of small support, $|\mathrm{supp} f| \le r$, and suppose we are given $R$ measurements $\langle f, X_k \rangle$. Then $f$ can be exactly reconstructed from these measurements as a solution to the Basis Pursuit problem (BP).

This theorem gives uniformity in [10], improves the polynomial probability to an exponential probability, and improves upon the number R of measurements (which was R ≥ Cr log m in [10]). Donoho [16] proved a weaker form of Theorem 4.1 with R/r bounded below by some function of m/r.

Proof. Write $g = f - u$ for some $u \in \mathbb{R}^m$. Then (BP′) reads as

$$\min \|u - f\|_1 \quad \text{subject to} \quad \langle u, X_k \rangle = 0,\ \forall k.$$

The constraints here define a random $(n = m - R)$-dimensional subspace $Y$ of $\mathbb{R}^m$. Now apply Theorem 1.3 with $y = 0$ and $y' = f$. It states that the unique solution to the minimization problem above is $u = 0$. Therefore, the unique solution to (BP′) is $f$.
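The reconstruction in Theorem 4.1 can be tried out numerically. The statement of (BP) is not part of this excerpt, so the sketch below assumes the standard form: minimize $\|g\|_1$ subject to $\langle g, X_k \rangle = \langle f, X_k \rangle$ for all $k$. It solves this as a linear program with `scipy.optimize.linprog` by splitting $g = u - v$ with $u, v \ge 0$; the name `basis_pursuit` and the demo sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(X, b):
    """min ||g||_1 subject to X g = b, via the split g = u - v with u, v >= 0."""
    R, m = X.shape
    cost = np.ones(2 * m)                    # sum(u) + sum(v) equals ||g||_1 at the optimum
    A_eq = np.hstack([X, -X])                # X (u - v) = b
    res = linprog(cost, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * m), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m] - res.x[m:]


if __name__ == "__main__":
    # Illustrative sizes only: R Gaussian measurements of an r-sparse f in R^m.
    rng = np.random.default_rng(0)
    m, R, r = 60, 30, 3
    X = rng.standard_normal((R, m))          # rows are the vectors X_k
    f = np.zeros(m)
    f[rng.choice(m, size=r, replace=False)] = rng.standard_normal(r)
    g = basis_pursuit(X, X @ f)
    print("max abs reconstruction error:", np.max(np.abs(g - f)))
```

Whatever the outcome of a particular draw, the returned $g$ matches the measurements and has $\ell_1$-norm at most $\|f\|_1$, since $f$ itself is feasible.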

4.2 Compressible functions

In a larger class of compressible functions [16], we can only hope for an approximate reconstruction. This is a class of functions $f$ that are well compressible by a known orthogonal transform, such as Fourier or wavelet. This means that the coefficients of $f$ with respect to a certain known orthogonal basis have a power decay:

$$f^*(s) \le s^{-1/p}, \quad s = 1, \dots, m \tag{4.1}$$

where $f^*$ denotes a nonincreasing rearrangement of $f$. Many natural signals are compressible for some $0 < p < 1$, such as smooth signals and signals with bounded variations (see [10]). Theorem 4.1 implies, by the argument of [10], that functions compressible in some basis can be approximately reconstructed from few fixed linear measurements. This is an improvement of a result of Donoho [16].

Corollary 4.2 (Uniform Approximate Reconstruction). Let $m$ and $r$ be positive integers. The independent standard Gaussian vectors $X_k$ in $\mathbb{R}^m$ satisfy the following with probability at least $1 - e^{-cR}$. Assume that an unknown function $f \in \mathbb{R}^m$ satisfies either (4.1) for some $0 < p < 1$ or $\|f\|_1 \le 1$ for $p = 1$. Suppose that we are given $R$ measurements $\langle f, X_k \rangle$. Then $f$ can be approximately reconstructed from these measurements: a unique solution $g$ to the Basis Pursuit problem (BP) satisfies

$$\|f - g\|_2 \le C_p \Big( \frac{\log(m/R)}{R} \Big)^{\frac{1}{p} - \frac{1}{2}}$$

where $C_p$ depends on $p$ only.

Corollary 4.2 was proved by Donoho [16] under an additional assumption that $m \sim CR^\alpha$ for some $\alpha > 1$. Notice that in this case $\log(m/R) \sim \log m$. Now this assumption is removed. In [10] Corollary 4.2 was proven without the uniformity in $f$ due to a weaker (polynomial) probability. Finally, Corollary 4.2 also improves upon the approximation error (there is now the ratio $m/r$ instead of $m$ in the logarithm).

5 Numerical Experiments

In this section, we empirically investigate the performance of our decoding strategy. Of special interest is the location of the breakpoint beyond which $\ell_1$ fails to decode accurately. To study this issue, we performed a first series of experiments as follows:

  1. select $n$ (the size of the input signal) and $m$ so that, with the same notations as before, $A$ is an $m$ by $n$ matrix; sample $A$ with independent Gaussian entries and select the plaintext $f$ at random;
  2. select $S$ as a percentage of $m$;
  3. select a support set $T$ of size $|T| = S$ uniformly at random, and sample a vector $e$ on $T$ with independent and identically distributed Gaussian entries, and with standard deviation about that of the coordinates of the output ($Af$) (the errors are then quite large compared to the "clean" coordinates of $Af$)$^2$;
  4. make $\tilde y = Af + e$, solve $(P_1)$ and obtain $f^\star$; compare $f$ to $f^\star$;
  5. repeat 100 times for each $S$, and for various sizes of $n$ and $m$.

$^2$ The results presented here do not seem to depend on the actual distribution used to sample the errors.
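The five steps above can be sketched as a short simulation. This is not the authors' code: it assumes NumPy and SciPy, solves $(P_1)$ through `scipy.optimize.linprog`, and declares a trial successful when the decoded vector agrees with $f$ up to a small relative tolerance `tol`. The tolerance, the function names and the demo sizes are all illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def l1_decode(A, y_corrupt):
    """Solve (P1) as the LP  min sum(t)  s.t.  -t <= y' - A f <= t."""
    m, n = A.shape
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([y_corrupt, -y_corrupt])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


def success_rate(n, m, fraction, trials, rng, tol=1e-6):
    """Empirical frequency of exact decoding when S = fraction * m entries are corrupted."""
    S = int(round(fraction * m))                          # step 2
    successes = 0
    for _ in range(trials):                               # step 5
        A = rng.standard_normal((m, n))                   # step 1: Gaussian coding matrix
        f = rng.standard_normal(n)                        #         random plaintext
        clean = A @ f
        support = rng.choice(m, size=S, replace=False)    # step 3: random support T
        e = np.zeros(m)
        e[support] = np.std(clean) * rng.standard_normal(S)
        f_hat = l1_decode(A, clean + e)                   # step 4: decode and compare
        successes += int(np.linalg.norm(f_hat - f) <= tol * max(1.0, np.linalg.norm(f)))
    return successes / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for fraction in (0.05, 0.15, 0.30):
        print(fraction, success_rate(n=32, m=64, fraction=fraction, trials=10, rng=rng))
```

The demo uses smaller sizes and fewer repetitions than the experiments reported below, so its frequencies are only indicative.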

The results are presented in Figure 1. In these experiments, we choose $n = 128$, and set $m = 2n$ (Figure 1(a)) or $m = 4n$ (Figure 1(b)). Our experiments show that the linear program recovers the input vector all the time as long as the fraction of the corrupted entries is less than or equal to 15% in the case where $m = 2n$ and less than or equal to 35% in the case where $m = 4n$. We repeated these experiments for different values of $n$, e.g. $n = 256$, and obtained very similar recovery curves.

[Figure 1 plots: empirical frequency of exact decoding (vertical axis: frequency) against the fraction of corrupted entries (horizontal axis), for $n = 128$ with $m = 2n$ (top) and $m = 4n$ (bottom).]

Figure 1. $\ell_1$-recovery of an input signal from $y' = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In these experiments, we set $n = 128$. (Top) Success rate of $(P_1)$ for $m = 2n$. (Bottom) Success rate of $(P_1)$ for $m = 4n$. On top, exact recovery occurs as long as the corruption rate does not exceed 15%. The bottom breakdown is near 35%.

It is clear that versions of Theorem 1.2 exist for other types of random matrices, e.g. binary matrices. In the next experiment, we take the plaintext f as a binary sequence of zeros and ones (generated at random), and sample A with i.i.d. entries taking values in {±1}, each with probability 1/2. To recover f, we solve the linear program

$$\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1} \quad \text{subject to} \quad 0 \le g \le 1, \qquad (5.1)$$

and round the coordinates of the solution to the nearest integer. We follow the same procedure as before, except that now we select S locations of Af at random (the corruption rate is again S/m) and flip the sign of the selected coordinates. We are again interested in the location of the breakpoint.
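The binary experiment can be sketched in the same way. The code below is an illustrative reading of the description above and not the authors' implementation: it assumes NumPy and SciPy, encodes the box constraint 0 ≤ g ≤ 1 of (5.1) through the `bounds` argument of `scipy.optimize.linprog`, and rounds with `np.rint`. The names `box_l1_decode` and `binary_trial` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def box_l1_decode(A, y):
    """Solve (5.1): min_g ||y - A g||_1 subject to 0 <= g <= 1, then round."""
    m, n = A.shape
    # Variables (g, t) with t >= |y - A g|; minimize sum(t).
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])
    b_ub = np.concatenate([y, -y])
    bounds = [(0.0, 1.0)] * n + [(0.0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Round each coordinate to the nearest integer (0 or 1).
    return np.rint(res.x[:n]).astype(int)


def binary_trial(n, m, frac, rng):
    """One trial: +/-1 coding matrix, 0/1 plaintext, sign-flip corruption."""
    A = rng.choice([-1.0, 1.0], size=(m, n))
    f = rng.integers(0, 2, size=n)
    y = A @ f
    S = int(round(frac * m))
    T = rng.choice(m, size=S, replace=False)
    # Flip the sign of the selected coordinates of Af.
    # A coordinate that happens to equal 0 is left unchanged by the flip.
    y[T] = -y[T]
    return np.array_equal(box_l1_decode(A, y), f)
```

Averaging `binary_trial(128, 256, frac, rng)` over repeated draws for each value of `frac` gives an empirical success frequency comparable to the one reported for this experiment.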

[Figure 2, two panels: "Empirical frequency of exact decoding, n = 128, m = 2n" and "Empirical frequency of exact decoding, n = 128, m = 4n". Horizontal axis: fraction of corrupted entries. Vertical axis: frequency.]

Figure 2. ℓ1-recovery of a binary sequence from corrupted data y′; A is an m by n matrix with independent binary entries, and the vector of errors is obtained by randomly selecting coordinates of Af and flipping their sign. In these experiments, we set n = 128. (Top) Success rate for m = 2n. (Bottom) Success rate for m = 4n. On top, exact recovery occurs as long as the corruption rate does not exceed 22.5%. The bottom breakdown is near 35%.

The results are presented in Figure 2. In these experiments, we choose n = 128 as before, and set m = 2n (Figure 2(a)) or m = 4n (Figure 2(b)). Our experiments show that the linear program recovers the input vector every time as long as the fraction of corrupted entries is less than or equal to 22.5% in the case where m = 2n, and less than about 35% in the case where m = 4n. We repeated these experiments for different values of n, e.g. n = 256

ing on n and m) such that accurate decoding occurs for all plaintexts and corruption patterns (in the sense of Theorem 1.1) as long as the fraction of corrupted entries does not exceed ρc. It would be of theoretical interest to identify this critical threshold, at least in the limit of large m and n, with perhaps n/m converging to a fixed ratio. From a different viewpoint, this asks how far the equivalence between a combinatorial problem and a related convex problem holds. We pose this as an interesting challenge.

References

[1] S. Artstein. Proportional concentration phenomena on the sphere. Israel J. Math. 132: 337–358, 2002.

[2] D. Amir, and V. D. Milman. Unconditional and symmetric sets in n-dimensional normed spaces. Israel J. Math. 37: 3– 20, 1980.

[3] B. Beferull-Lozano, and A. Ortega. Efficient quantization for overcomplete expansions in Rn. IEEE Trans. Inform. Theory 49: 129–150, 2003.

[4] S. Boyd, and L. Vandenberghe. Convex Optimization. Cam- bridge University Press, 2004.

[5] P. G. Casazza, and J. Kovacevi´c. Equal-norm tight frames with erasures. Adv. Comput. Math. 18: 387–430, 2003.

[6] E. J. Cand`es, and J. Romberg, Quantitative robust un- certainty principles and optimally sparse decompositions. To appear Foundations of Computational Mathematics, November 2004.

[7] E. J. Cand`es, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incom- plete frequency information. To appear IEEE Transactions on Information Theory, June 2004. Available on the ArXiV preprint server: math.NA/0409186.

[8] E. J. Cand`es, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. To appear Comm. Pure Appl. Math. Available on the ArXiV preprint server: math.NA/0409186.

[9] E. J. Cand`es, and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? Submitted to IEEE Transactions on Information Theory, October 2004. Available on the ArXiV preprint server: math.CA/0410542.

[10] E. J. Cand`es, and T. Tao. Decoding by linear program- ming. Submitted, December 2004. Available on the ArXiV preprint server: math.MG/0502327.

[11] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic de- composition by basis pursuit. SIAM J. Scientific Computing 20: 33–61, 1998.

[12] I.Daubechies. Ten lectures on wavelets. SIAM, Philadelphia,

[13] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ` 1 -norm solution is also the sparsest solution. Manuscript, September 2004.

[14] D. L. Donoho. For most large undetermined systems of lin- ear equations the minimal ` 1 -norm near-solution is also the sparsest near-solution. Manuscript September 2004.

[15] D. Donoho. Compressed sensing. Manuscript, September

[16] D. L. Donoho, and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ` 1 minimization. Proc. Natl. Acad. Sci. USA 100: 2197–2202 (2003).

[17] D. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. Manuscript, 2004.

[18] D. L. Donoho, and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47:2845–2862, 2001.

[19] D. L. Donoho, and Y. Tsaig. Extensions of compresed sens- ing. Preprint, 2004.

[20] D. Donoho, and Y. Tsaig. Breakdown of equivalence be- tween the minimal ` 1 -norm solution and the sparsest solu- tion. Preprint, 2004.

[21] N. El Karoui. New Results about Random Covariance Ma- trices and Statistical Applications. Stanford Ph. .D. Thesis, August 2004.

[22] M. Elad, and A. Bruckstein. A generalized uncertainty prin- ciple and sparse representation in pairs of bases. IEEE Trans. Inform. Theory 48: 2558–2567, 2002.

[23] J. Feldman. Decoding Error-Correcting Codes via Linear Programming. Ph.D. Thesis 2003, Massachussets Institute of Technology.

[24] J. Feldman, LP decoding achieves capacity, 2005 ACM- SIAM Symposium on Discrete Algorithms (SODA), preprint (2005).

[25] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright, LP decoding corrects a constant fraction of errors. Proc. IEEE International Symposium on Information Theory (ISIT), June 2004.

[26] A. Feuer, and A. Nemirovski. On sparse representation in pairs of bases. IEEE Trans. Inform. Theory 49: 1579–1581,

[27] A. Yu. Garnaev, E. D. Gluskin, The widths of a Euclidean ball (Russian), Dokl. Akad. Nauk SSSR 277: 1048–1052,

  1. English translation: Soviet Math. Dokl. 30: 200–204,

[28] V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine 18(5): 9–21, 2001.

[29] V. K. Goyal. Multiple description coding: compression meets the network. IEEE Signal Processing Magazine 18(5): 74–93, 2001.

[30] V. K. Goyal, J. Kovacevic, and J. A. Kelner. Quantized frame expansions with erasures. Applied and Computational Har- monic Analysis 10: 203–233, 2001.

[31] V. K. Goyal, M. Vetterli, and N. T. Thao. Quantized over- complete expansions in RN^ : analysis, synthesis and al- gorithms, IEEE Trans. on Information Theory 44: 16–31,

[32] R. Gribonval, and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49: 3320– 3325, 2003.

[33] Handbook of coding theory. Vol. I, II. Edited by V. S. Pless, W. C. Huffman and R. A. Brualdi. North-Holland, Amster- dam, 1998.

[34] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29: 295–327,

[35] J. Kovacevic, P. Dragotti, and V. Goyal. Filter bank frame expansions with erasures. IEEE Trans. on Information The- ory, 48: 1439–1450, 2002.

[36] M. Ledoux. The concentration of measure phenomenon. Mathematical Surveys and Monographs 89, American Mathematical Society, Providence, RI, 2001.

[37] M. A. Lifshits, Gaussian random functions. Mathematics and its Applications, 322. Kluwer Academic Publishers, Dordrecht, 1995.

[38] V. A. Marchenko, and L. A. Pastur. Distribution of eigenval- ues in certain sets of random matrices. Mat. Sb. (N.S.) 72: 407–535, 1967 (in Russian).

[39] J. Matousek. Lectures on discrete geometry. Graduate Texts in Mathematics, 212. Springer-Verlag, New York, 2002.

[40] S. Mendelson. Geometric parameters in learning theory. Ge- ometric aspects of functional analysis. Lecture Notes in Mathematics 1850: 193–235, Springer, Berlin, 2004.

[41] B. K. Natarajan. Sparse approximate solutions to linear sys- tems. SIAM J. Comput. 24: 227–234, 1995.

[42] M. Rudelson, and R. Vershynin. Geometric approach to error correcting codes and reconstruction of signals. Sub- mitted, 2005. Available on the ArXiV preprint server: math.FA/0502299.

[43] S. J. Szarek. Condition numbers of random matrices. J. Complexity 7:131–149, 1991.

[44] J. Tropp. Recovery of short, complex linear combinations via ` 1 minimization. To appear IEEE Trans. Inform. Theory.

[45] J. Tropp, Greed is good: Algorithmic results for sparse ap- proximation, IEEE Trans. Inform. Theory, 50(10): 2231- 2242, October 2004.

[46] J. Tropp. Just relax: Convex programming methods for sub- set selection and sparse approximation. ICES Report 04-04, UT-Austin, 2004.