Bayesian inference with error variable splitting and sparsity enforcing priors for linear inverse problems

Ali Mohammad-Djafari, Mircea Dumitru
Laboratoire des Signaux et Systèmes (L2S)
UMR 8506 CNRS - CentraleSupélec - Université Paris-Sud, 91192 Gif-sur-Yvette, France

http://lss.centralesupelec.fr    Email: djafari@lss.supelec.fr

http://djafari.free.fr http://publicationslist.org/djafari

IMA Conference on Inverse Problems, Centre for Mathematical Sciences, Cambridge


Contents

1. Classical linear inverse problems
2. Variable splitting or how to account for all uncertainties
3. State-of-the-art regularization methods for the simple case
4. State-of-the-art Bayesian methods for the simple case
5. Hierarchical models for more robustness
6. Proposed model with direct sparsity
7. Proposed model with sparsity in the transformed domain
8. Applications
9. Conclusions


Linear models

- Signal processing (deconvolution):
  g(t) = ∫ f(τ) h(t − τ) dτ
- Image processing (image restoration):
  g(x, y) = ∫∫ f(x′, y′) h(x − x′, y − y′) dx′ dy′
- Computed imaging (X-ray tomography):
  g(r, φ) = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy
- Machine learning: regression, dictionary-based modeling, ...

Discretizing any of these models leads to the linear forward model
  g = Hf + e
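To make the forward model concrete, here is a minimal NumPy sketch (not part of the slides) that discretizes the 1D deconvolution case into g = Hf + e; the kernel, sizes, and noise level are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (not from the slides): discretize the 1D deconvolution case
# g(t) = ∫ f(τ) h(t − τ) dτ  into the linear model  g = Hf + e.
rng = np.random.default_rng(0)

n = 128                                              # size of the unknown f (illustrative)
h = np.exp(-0.5 * (np.arange(-10, 11) / 3.0) ** 2)   # assumed Gaussian blur kernel
h /= h.sum()

# Column j of H is the kernel centred on sample j, so H @ f == np.convolve(f, h, mode="same").
H = np.column_stack([np.convolve(np.eye(n)[:, j], h, mode="same") for j in range(n)])

f_true = np.zeros(n)
f_true[[20, 50, 90]] = [1.0, -0.7, 0.5]              # a sparse "spike train" to recover

ve = 1e-3                                            # noise variance (illustrative)
g = H @ f_true + np.sqrt(ve) * rng.standard_normal(n)
```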


Linear Inverse Problems: Regularization methods

g = Hf + e

- Regularization (L2, L1, TV, ...):
  J(f) = ½‖g − Hf‖_2^2 + λ R(f)
  R(f) ∈ { ‖f‖_2^2, ‖f‖_β^β, ‖Df‖_2^2, ‖Df‖_β^β, Σ_j φ([Df]_j) },  1 ≤ β ≤ 2
- Optimization algorithms:
  - R(f) quadratic: gradient based, Conjugate Gradient (CG)
  - R(f) non-quadratic but convex and differentiable: gradient based, Conjugate Gradient (CG)
  - R(f) convex but non-differentiable: sub-gradient, Chambolle-Pock, ...
  - R(f) convex: all the convex optimization algorithms


Linear Inverse Problems: Regularization methods

g = Hf + e

- L2 (quadratic):
  J(f) = ½‖g − Hf‖_2^2 + λ‖Df‖_2^2
  Analytic solution: f̂ = (H′H + λD′D)^{-1} H′g
  Gradient based: ∇J(f) = −H′(g − Hf) + 2λ D′Df
- L1: convex but not differentiable at zero:
  J(f) = ½‖g − Hf‖_2^2 + λ‖Df‖_1
  Fenchel conjugate, dual problem, subgradient, proximal operator, ...
- Variable splitting and Augmented Lagrangian:
  (f̂, ẑ) = arg min_{f,z} { ½‖g − Hf‖_2^2 + λ‖z‖_1 + q‖z‖_2^2 }  s.t.  z = Df
- A great number of optimization algorithms have been proposed: Chambolle-Pock, ADMM, ISTA, FISTA, ... (a minimal numerical sketch of the L2 and L1 cases follows below)
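The following sketch is illustrative only; it reuses H, g, n, and ve from the earlier snippet, takes D = I, and uses arbitrary choices for λ, the step size, and the iteration count.

```python
# L2 (Tikhonov) regularization with D = I: closed-form solution f_hat = (H'H + lam I)^{-1} H'g.
lam = 1e-2
f_l2 = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ g)

# L1 regularization with D = I, minimized by ISTA (proximal gradient):
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

step = 1.0 / np.linalg.norm(H, 2) ** 2   # step <= 1 / ||H||^2 guarantees descent
f_l1 = np.zeros(n)
for _ in range(200):
    f_l1 = soft_threshold(f_l1 + step * H.T @ (g - H @ f_l1), step * lam)
```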


Linear Inverse Problems: Bayesian approach

g = Hf + e

- Bayes rule:
  p(f|g) = p(g|f) p(f) / p(g),   with p(g) = ∫ p(g|f) p(f) df
- Gaussian noise:
  p(g|f) = N(g|Hf, v_e I) ∝ exp[ −(1/(2v_e))‖g − Hf‖_2^2 ]
- Prior model:
  p(f) ∝ exp[ −(1/(2v_f))‖f‖_2^2 ],  exp[ −(1/(2v_f))‖Df‖_2^2 ],  exp[ −(1/(2v_f))‖f‖_1 ],  ...
- MAP estimate: with J(f) = ‖g − Hf‖_2^2 + λ R(f),
  p(f|g) ∝ exp[ −(1/(2v_e)) J(f) ]  →  f̂_MAP = arg max_f {p(f|g)} = arg min_f {J(f)}


Variable splitting or How to account for all uncertainties

Go beyond the classical forward model:

g = Hf + e  →  g = Hf + ξ + e  →  g = Hf + ξ + u + e

Variable splitting: Different interpretations

Case 1:  g = g0 + e,       g0 = Hf + ξ
Case 2:  g = g0 + e,       g0 = Hf + u + ξ
Case 3:  g = g0 + e,       g0 = (H + δH)f + ξ
Case 4:  g = g0 + e,       g0 = H(f + f0) + ξ
Case 5:  g = g0 + u + e,   g0 = Hf + ξ
Case 6:  g = g0 + e,       g0 = Hf + u + ξ,   u = δHf + ζ


State-of-the-art regularization methods for the simple case

g = Hf + e

Regularization (or MAP):
  J(f) = ½‖g − Hf‖_2^2 + λ R(f),   R(f) ∈ { ‖f‖_β^β, ‖Df‖_β^β, Σ_j φ([Df]_j) }

Optimization algorithms:
- Gradient based (steepest descent, CG, ...):
  f^(k+1) = f^(k) + α^(k) [ H′(g − Hf^(k)) − λ ∇R(f^(k)) ]
- Augmented Lagrangian (ADMM):
  minimize J(f, g0) = ½‖g − g0‖_2^2 + λ R(f)  subject to  g0 = Hf
- Bregman convex optimization (ISTA, FISTA, ...):
  minimize J(f) = ½‖g − Hf‖_2^2 + λ R(f)  subject to  R(f) ≤ q‖f‖_2


State-of-the-art Bayesian methods for the simple case g = Hf + e

Supervised Gaussian case:
  p(g|f) = N(g|Hf, v_e I),   p(f) = N(f|0, v_f I)
  →  p(f|g) = N(f|f̂, Σ̂),   f̂ = [H′H + λI]^{-1} H′g,   Σ̂ = v_e [H′H + λI]^{-1},   λ = v_e/v_f

Unsupervised Gaussian case:
  p(g|f, v_e) = N(g|Hf, v_e I),   p(f|v_f) = N(f|0, v_f I)
  p(v_e) = IG(v_e|α_e0, β_e0),   p(v_f) = IG(v_f|α_f0, β_f0)
  →  p(f|g, v_e, v_f) = N(f|f̂, Σ̂),   f̂ = [H′H + λ̂I]^{-1} H′g,   Σ̂ = v̂_e [H′H + λ̂I]^{-1},   λ̂ = v̂_e/v̂_f
     p(v_e|g, f) = IG(v_e|α̃_e, β̃_e),   p(v_f|g, f) = IG(v_f|α̃_f, β̃_f)

Different approximate computation tools: JMAP, VBA, Gibbs sampling (MCMC).
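As a numerical counterpart of the supervised Gaussian formulas, here is a short sketch (reusing H, g, n, and ve from the earlier snippets; the prior variance vf is an arbitrary choice):

```python
# Supervised Gaussian case: p(f|g) = N(f_hat, Sigma_hat) with lam = ve / vf.
vf = 1.0
lam = ve / vf
A = H.T @ H + lam * np.eye(n)
f_hat = np.linalg.solve(A, H.T @ g)
Sigma_hat = ve * np.linalg.inv(A)
post_std = np.sqrt(np.diag(Sigma_hat))   # pointwise posterior standard deviations
```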

State-of-the-art Bayesian methods for the simple case (continued)

  p(g|f, v_e) = N(g|Hf, v_e I),   p(f|v_f) = N(f|0, v_f I)
  p(v_e) = IG(v_e|α_e0, β_e0),   p(v_f) = IG(v_f|α_f0, β_f0)
  →  p(f|g, v_e, v_f) = N(f|f̂, Σ̂),   f̂ = [H′H + λ̂I]^{-1} H′g,   Σ̂ = v̂_e [H′H + λ̂I]^{-1},   λ̂ = v̂_e/v̂_f
     p(v_e|g, f) = IG(v_e|α̃_e, β̃_e),   p(v_f|g, f) = IG(v_f|α̃_f, β̃_f)

The joint posterior can be written p(f, v_e, v_f|g) ∝ exp[ −J(f, v_e, v_f) ].

- JMAP: alternate optimization with respect to f, v_e, v_f of
  J(f, v_e, v_f) = (1/(2v_e))‖g − Hf‖_2^2 + (1/(2v_f))‖f‖_2^2
                 + (α_e0 + 1) ln v_e + β_e0/v_e + (α_f0 + 1) ln v_f + β_f0/v_f
- Gibbs sampling (MCMC):
  f ∼ p(f|g, v_e, v_f)  →  v_e ∼ p(v_e|g, f)  →  v_f ∼ p(v_f|g, f)
- Variational Bayesian Approximation (VBA): approximate p(f, v_e, v_f|g) by a separable
  q(f, v_e, v_f) = q1(f) q2(v_e) q3(v_f) minimizing KL(q‖p).
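A possible JMAP loop for the unsupervised Gaussian case, alternating the updates that minimize the criterion J(f, v_e, v_f) above. Hyper-hyperparameter values, initializations, and the iteration count are illustrative; the slide's criterion omits the Gaussian normalization constants, which would only shift the "+1" in the variance updates.

```python
# JMAP: alternate minimization of J(f, ve, vf) with inverse-Gamma hyperpriors.
alpha_e0, beta_e0 = 2.0, 1e-3
alpha_f0, beta_f0 = 2.0, 1e-2

f_hat, ve_hat, vf_hat = np.zeros(n), 1.0, 1.0
for _ in range(30):
    lam = ve_hat / vf_hat
    f_hat = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ g)                # f-step (quadratic)
    ve_hat = (beta_e0 + 0.5 * np.sum((g - H @ f_hat) ** 2)) / (alpha_e0 + 1)   # ve-step
    vf_hat = (beta_f0 + 0.5 * np.sum(f_hat ** 2)) / (alpha_f0 + 1)             # vf-step
```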


Hierarchical models for more robustness

  g = Hf + e,   f = Dz + ζ,   z sparse, modeled by a Double Exponential (DE) prior

Supervised case:
  p(g|f) = N(g|Hf, v_e I),   p(f|z) = N(f|Dz, v_ξ I),   p(z) = DE(z|γ) ∝ exp[−γ‖z‖_1]
  →  p(f, z|g) ∝ exp[−J(f, z)],
     J(f, z) = (1/(2v_e))‖g − Hf‖_2^2 + (1/(2v_ξ))‖f − Dz‖_2^2 + γ‖z‖_1

Unsupervised case:
  p(g|f) = N(g|Hf, v_e I),   p(f|z) = N(f|Dz, v_ξ I),   p(z|γ) ∝ exp[−γ‖z‖_1]
  p(γ) = IG(γ|α_γ0, β_γ0),   p(v_e) = IG(v_e|α_e0, β_e0),   p(v_ξ) = IG(v_ξ|α_ξ0, β_ξ0)
  →  p(f, z, γ, v_e, v_ξ|g) ∝ exp[−J(f, z, γ, v_e, v_ξ)],
     J(f, z, γ, v_e, v_ξ) = (1/(2v_e))‖g − Hf‖_2^2 + (1/(2v_ξ))‖f − Dz‖_2^2 + γ‖z‖_1
                          + (α_γ0 + n/2) ln γ + β_γ0/γ + (α_e0 + m/2) ln v_e + β_e0/v_e + (α_ξ0 + n/2) ln v_ξ + β_ξ0/v_ξ

Alternate optimization of this criterion gives ADMM-like algorithms (a minimal sketch follows below).
Main advantage: direct updates of the hyperparameters.
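One concrete instance of such an ADMM-like alternation, written for the simplest dictionary D = I and with v_e, v_ξ, γ held fixed. All values are illustrative; it reuses H, g, n, ve, and soft_threshold from the earlier sketches, and with a general D the z-step would become a small Lasso problem instead of a pointwise threshold.

```python
# Alternate minimization of J(f, z) = ||g - Hf||^2/(2 ve) + ||f - z||^2/(2 v_xi) + gamma*||z||_1  (D = I).
v_xi, gamma = 0.1, 5.0
f_hat, z_hat = np.zeros(n), np.zeros(n)
for _ in range(50):
    # f-step: quadratic, closed form
    A = H.T @ H / ve + np.eye(n) / v_xi
    f_hat = np.linalg.solve(A, H.T @ g / ve + z_hat / v_xi)
    # z-step: separable soft-thresholding because D = I
    z_hat = soft_threshold(f_hat, gamma * v_xi)
```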


Hierarchical models for more robustness (Student-t prior)

  g = Hf + e,   f = Dz + ζ,   z sparse, Student-t modeled as Normal-IG:
  p(z_j|v_zj) = N(z_j|0, v_zj),   p(v_zj) = IG(v_zj|α_z0, β_z0)

  p(g|f) = N(g|Hf, v_e I),   p(f|z) = N(f|Dz, v_ξ I)
  p(z|v_z) = N(z|0, V_z),  V_z = diag{v_z},   p(v_z) = Π_j IG(v_zj|α_z0, β_z0)
  p(v_e) = IG(v_e|α_e0, β_e0),   p(v_ξ) = IG(v_ξ|α_ξ0, β_ξ0)

  →  p(f, z, v_z, v_e, v_ξ|g) ∝ exp[−J(f, z, v_z, v_e, v_ξ)],
     J(f, z, v_z, v_e, v_ξ) = (1/(2v_e))‖g − Hf‖_2^2 + (1/(2v_ξ))‖f − Dz‖_2^2 + ½‖V_z^{-1/2} z‖_2^2
                            + Σ_j [(α_z0 + 1) ln v_zj + β_z0/v_zj] + (α_e0 + m/2) ln v_e + β_e0/v_e + (α_ξ0 + n/2) ln v_ξ + β_ξ0/v_ξ

Main advantages:
- Quadratic optimization with respect to f and z
- Direct updates of the hyperparameters v_e and v_ξ
- Possibility to compute the posterior mean and quantify uncertainties analytically via VBA


Non-stationary noise and a sparsity enforcing prior in the same framework

  g = Hf + e,   e non-stationary:  p(e_i|v_ei) = N(e_i|0, v_ei),  p(v_ei) = IG(v_ei|α_e0, β_e0)
  f = Dz + ζ,   z sparse, Student-t:  p(z_j|v_zj) = N(z_j|0, v_zj),  p(v_zj) = IG(v_zj|α_z0, β_z0)

  p(g|f) = N(g|Hf, V_e),  V_e = diag{v_e},   p(v_e) = Π_i IG(v_ei|α_e0, β_e0)
  p(f|z) = N(f|Dz, v_ξ I),   p(v_ξ) = IG(v_ξ|α_ξ0, β_ξ0)
  p(z|v_z) = N(z|0, V_z),   p(v_z) = Π_j IG(v_zj|α_z0, β_z0)

  →  p(f, z, v_z, v_e, v_ξ|g) ∝ exp[−J(f, z, v_z, v_e, v_ξ)],
     J(f, z, v_z, v_e, v_ξ) = ½‖V_e^{-1/2}(g − Hf)‖_2^2 + (1/(2v_ξ))‖f − Dz‖_2^2 + ½‖V_z^{-1/2} z‖_2^2
                            + Σ_j [(α_z0 + 1) ln v_zj + β_z0/v_zj] + Σ_i [(α_e0 + 1) ln v_ei + β_e0/v_ei] + (α_ξ0 + n/2) ln v_ξ + β_ξ0/v_ξ

Main advantages:
- Quadratic optimization with respect to f and z
- Direct updates of the hyperparameters
- Possibility to compute the posterior mean and quantify uncertainties analytically via VBA

Variable splitting or How to account for all uncertainties

Standard case:
  g = Hf + e  →  g = g0 + e,   g0 = Hf

Error splitting:
  g = Hf + ξ + e  →  g = g0 + e   (measurement error),   g0 = Hf + ξ   (modeling error)

or even:
  g = Hf + u + ξ + e  →  g = g0 + u + e   (u: background),   g0 = Hf + ξ


New Proposed Method 1:

- Error term variable splitting (Gaussian + Student-t)
- Direct sparsity enforcing prior: (f, v_f) Normal-IG, f Student-t

[Graphical model: hyper-hyperparameters (α_f0, β_f0), (α_ξ0, β_ξ0), (α_e0, β_e0) → variances v_f, v_ξ, v_e → f → g0 → g, with operator H]

  g = g0 + e
  g0 = Hf + ξ


New Proposed Method 1: error term variable splitting (Gaussian + Student-t) and a direct sparsity enforcing prior ((f, v_f) Normal-IG, f Student-t)

- g = g0 + e,  e Gaussian:
  p(g|g0, v_e) = N(g|g0, v_e I),   p(v_e) = IG(v_e|α_e0, β_e0)
- g0 = Hf + ξ,  ξ Student-t:
  p(g0|f, v_ξ) = N(g0|Hf, V_ξ),  V_ξ = diag{v_ξ},   p(v_ξ) = Π_{i=1}^M IG(v_ξi|α_ξ0, β_ξ0)
- f sparse:
  p(f|v_f) = N(f|0, V_f),  V_f = diag{v_f},   p(v_f) = Π_{j=1}^N IG(v_fj|α_f0, β_f0)

which results in p(f, g0, v_e, v_ξ, v_f|g) ∝ exp[−J(f, g0, v_e, v_ξ, v_f)] with


J(f, g0, v_e, v_ξ, v_f) = (1/(2v_e))‖g − g0‖_2^2 + (α_e0 + 1) ln v_e + β_e0/v_e
                        + ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + Σ_{i=1}^M [(α_ξ0 + 1) ln v_ξi + β_ξ0/v_ξi]
                        + ½‖V_f^{-1/2} f‖_2^2 + Σ_{j=1}^N [(α_f0 + 1) ln v_fj + β_f0/v_fj]

Alternate optimization:
- with respect to f:    J(f) = ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + ½‖V_f^{-1/2} f‖_2^2
- with respect to g0:   J(g0) = (1/(2v_e))‖g − g0‖_2^2 + ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2
- with respect to v_e:  J(v_e) = (1/(2v_e))‖g − g0‖_2^2 + (α_e0 + 1) ln v_e + β_e0/v_e
- with respect to v_ξi: J(v_ξi) = ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + Σ_{i=1}^M [(α_ξ0 + 1) ln v_ξi + β_ξ0/v_ξi]
- with respect to v_fj: J(v_fj) = ½‖V_f^{-1/2} f‖_2^2 + Σ_{j=1}^N [(α_f0 + 1) ln v_fj + β_f0/v_fj]
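A possible NumPy translation of this alternating scheme, reusing H, g, and n from the earlier sketches; the hyper-hyperparameter values, initializations, and number of iterations are arbitrary choices for illustration, not those used by the authors.

```python
# Alternate minimization for Proposed Method 1 (error splitting + direct sparsity).
m = H.shape[0]
alpha_e0, beta_e0 = 2.0, 1e-3
alpha_xi0, beta_xi0 = 2.0, 1e-3
alpha_f0, beta_f0 = 2.0, 1e-2

f_hat, g0 = np.zeros(n), g.copy()
ve_hat, v_xi, v_f = 1.0, np.ones(m), np.ones(n)

for _ in range(30):
    # f-step: minimize 0.5||V_xi^{-1/2}(g0 - Hf)||^2 + 0.5||V_f^{-1/2} f||^2
    A = H.T @ (H / v_xi[:, None]) + np.diag(1.0 / v_f)
    f_hat = np.linalg.solve(A, H.T @ (g0 / v_xi))
    Hf = H @ f_hat
    # g0-step: elementwise weighted combination of the data g and the prediction Hf
    g0 = (g / ve_hat + Hf / v_xi) / (1.0 / ve_hat + 1.0 / v_xi)
    # variance updates: minimizers of the one-dimensional criteria above
    ve_hat = (beta_e0 + 0.5 * np.sum((g - g0) ** 2)) / (alpha_e0 + 1)
    v_xi = (beta_xi0 + 0.5 * (g0 - Hf) ** 2) / (alpha_xi0 + 1)
    v_f = (beta_f0 + 0.5 * f_hat ** 2) / (alpha_f0 + 1)
```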

New Proposed Method 2:

- Error term variable splitting (Gaussian + Student-t)
- Sparsity enforcing in the transformed domain: f = Dz + ζ, z sparse, (z, v_z) Normal-IG, z Student-t

[Graphical model: hyper-hyperparameters (α_z0, β_z0), (α_ζ0, β_ζ0), (α_ξ0, β_ξ0), (α_e0, β_e0) → variances v_z, v_ζ, v_ξ, v_e → z → f → g0 → g, with operators D and H]

  g = g0 + e
  g0 = Hf + ξ
  f = Dz + ζ


New Proposed Method 2: error term variable splitting (Gaussian + Student-t) plus sparsity enforcing in the transformed domain (f = Dz + ζ, z sparse, (z, v_z) Normal-IG, z Student-t)

- g = g0 + e,  e Gaussian:
  p(g|g0, v_e) = N(g|g0, v_e I),   p(v_e) = IG(v_e|α_e0, β_e0)
- g0 = Hf + ξ,  ξ Student-t:
  p(g0|f, v_ξ) = N(g0|Hf, V_ξ),  V_ξ = diag{v_ξ},   p(v_ξ) = Π_{i=1}^M IG(v_ξi|α_ξ0, β_ξ0)
- f = Dz + ζ,  ζ Gaussian:
  p(f|z, v_ζ) = N(f|Dz, v_ζ I),   p(v_ζ) = IG(v_ζ|α_ζ0, β_ζ0)
- z sparse:
  p(z|v_z) = N(z|0, V_z),  V_z = diag{v_z},   p(v_z) = Π_{j=1}^N IG(v_zj|α_z0, β_z0)

which results in p(f, g0, z, v_e, v_ξ, v_ζ, v_z|g) ∝ exp[−J(f, g0, z, v_e, v_ξ, v_ζ, v_z)] with


J(f, g0, z, v_e, v_ξ, v_ζ, v_z) = (1/(2v_e))‖g − g0‖_2^2 + (α_e0 + 1) ln v_e + β_e0/v_e
                                + ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + Σ_{i=1}^M [(α_ξ0 + 1) ln v_ξi + β_ξ0/v_ξi]
                                + (1/(2v_ζ))‖f − Dz‖_2^2 + (α_ζ0 + 1) ln v_ζ + β_ζ0/v_ζ
                                + ½‖V_z^{-1/2} z‖_2^2 + Σ_{j=1}^N [(α_z0 + 1) ln v_zj + β_z0/v_zj]

Alternate optimization:
- with respect to f:    J(f) = ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + (1/(2v_ζ))‖f − Dz‖_2^2
- with respect to g0:   J(g0) = (1/(2v_e))‖g − g0‖_2^2 + ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2
- with respect to z:    J(z) = (1/(2v_ζ))‖f − Dz‖_2^2 + ½‖V_z^{-1/2} z‖_2^2
- with respect to v_e:  J(v_e) = (1/(2v_e))‖g − g0‖_2^2 + (α_e0 + 1) ln v_e + β_e0/v_e
- with respect to v_ξi: J(v_ξi) = ½‖V_ξ^{-1/2}(g0 − Hf)‖_2^2 + Σ_{i=1}^M [(α_ξ0 + 1) ln v_ξi + β_ξ0/v_ξi]
- with respect to v_ζ:  J(v_ζ) = (1/(2v_ζ))‖f − Dz‖_2^2 + (α_ζ0 + 1) ln v_ζ + β_ζ0/v_ζ
- with respect to v_zj: J(v_zj) = ½‖V_z^{-1/2} z‖_2^2 + Σ_{j=1}^N [(α_z0 + 1) ln v_zj + β_z0/v_zj]
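Relative to the Method 1 sketch, Method 2 changes the f-step and adds z-, v_ζ-, and v_zj-updates. The following minimal sketch shows those extra steps only, meant to run inside the same alternating loop and reusing H, g0, v_xi, and n from the Method 1 snippet; D is a placeholder dictionary here (in practice it would be, e.g., a wavelet synthesis operator), and all numerical values are illustrative.

```python
# Extra steps for Proposed Method 2 (sparsity in the transformed domain f = Dz + zeta).
p = n
D = np.eye(n, p)                          # placeholder dictionary (any n x p matrix works)
alpha_zeta0, beta_zeta0 = 2.0, 1e-3
alpha_z0, beta_z0 = 2.0, 1e-2
z_hat, v_z, v_zeta = np.zeros(p), np.ones(p), 1.0

# f-step now uses the transform-domain prior f ~ N(Dz, v_zeta I):
A = H.T @ (H / v_xi[:, None]) + np.eye(n) / v_zeta
f_hat = np.linalg.solve(A, H.T @ (g0 / v_xi) + D @ z_hat / v_zeta)
# z-step: quadratic because of the Normal-IG (Student-t) prior on z
Az = D.T @ D / v_zeta + np.diag(1.0 / v_z)
z_hat = np.linalg.solve(Az, D.T @ f_hat / v_zeta)
# new hyperparameter updates
v_zeta = (beta_zeta0 + 0.5 * np.sum((f_hat - D @ z_hat) ** 2)) / (alpha_zeta0 + 1)
v_z = (beta_z0 + 0.5 * z_hat ** 2) / (alpha_z0 + 1)
```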

Main advantages and limitations

Advantages:

- All the optimizations are either quadratic or explicit
- Quadratic optimizations can be done efficiently
- For large-dimensional problems, the needed operators H, H′, D, and D′ can be implemented on GPU
- For Computed Tomography, efficient GPU implementations of these operators have been developed in our group for 2D and 3D CT

Limitations:

- A huge amount of memory is needed for f, g0, v_ξ, and v_z
- There is no easy way to study the global convergence of the algorithm
- The number of hyper-hyperparameters (α_e0, β_e0), (α_ξ0, β_ξ0), (α_ζ0, β_ζ0), (α_z0, β_z0) to be fixed is large; however, the results are not very sensitive to these parameters (see also the presentation of Mircea Dumitru)


Applications:

- Biological time series analysis: estimating periodic components in very short biological time series [MD, AMD]
- Signal deconvolution in mass spectrometry [AMD]
- Image restoration in Raman spectroscopy or in radio astronomy [AMD]
- 2D and 3D industrial Computed Tomography for Non-Destructive Testing applications [CC, AMD]
- Low-dose and limited-angle CT for biological or medical applications [LW, CC, MD, AMD]


- Thanks for the invitation.
- Thanks for your attention.
- Questions, comments?

