1. Learning incoherent dictionaries for sparse
approximation using iterative projections and
rotations
Daniele Barchiesi and Mark D. Plumbley
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London
daniele.barchiesi@eecs.qmul.ac.uk
mark.plumbley@eecs.qmul.ac.uk
30th June 2012
2. Overview
Background
Dictionary learning model and algorithms
Learning incoherent dictionaries
Previous work
Learning incoherent dictionaries using iterative projections and
rotations
Constructing Grassmannian frames using iterative projections
The rotation step
Iterative projections and rotation algorithm
Numerical experiments
Incoherence results, comparison with existing methods
Sparse approximation results
Conclusions and future research
Proposed applications
Summary
4. Background: Dictionary Learning
Problem Definition
Let $\{\mathbf{y}_m \in \mathbb{R}^N\}_{m=1}^{M}$ be a set of $M$ observed signals of dimension $N$. The
goal of dictionary learning is to express:
$$\mathbf{Y} \approx \Phi \mathbf{X}$$
where $\mathbf{Y}$ contains the signals along its columns, $\Phi$ is a dictionary
containing unit-norm atoms and every column of $\mathbf{X}$ contains at most $S$
non-zero coefficients.
Optimisation
$$(\hat{\Phi}, \hat{\mathbf{X}}) = \arg\min_{\Phi, \mathbf{X}} \|\mathbf{Y} - \Phi\mathbf{X}\|_2^2 \quad \text{such that} \quad \|\mathbf{x}_m\|_0 \leq S \;\; \forall m$$
The problem is not convex, even if the $\ell_0$ pseudo-norm is relaxed to the $\ell_1$ norm.
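For concreteness, a minimal numpy sketch (ours, not the authors' code) that evaluates this objective and checks the per-column sparsity constraint:

```python
import numpy as np

def dl_objective(Y, Phi, X, S):
    """Evaluate the DL objective and check the l0 constraint.

    Y: (N, M) signals, Phi: (N, K) dictionary, X: (K, M) coefficients.
    """
    err = np.linalg.norm(Y - Phi @ X) ** 2                     # ||Y - Phi X||^2 (Frobenius)
    feasible = bool(np.all(np.count_nonzero(X, axis=0) <= S))  # ||x_m||_0 <= S for all m
    return err, feasible
```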
6. Background: Dictionary Learning Algorithms
Optimisation Strategy
Start from an initial dictionary Φ(0)
Repeat for t = {1, . . . , T } iterations:
Sparse coding : given a fixed dictionary Φ(t) , find a sparse
approximation X(t) with any suitable algorithm.
Dictionary update : given X(t) , update the dictionary Φ(t+1) to
minimise the DL objective (possibly subject to
additional constraints).
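As a sketch of this alternation (ours, under stated assumptions): `sparse_code` below stands for any off-the-shelf S-sparse solver, and the dictionary update shown is the least-squares (mod-style) step, one of several possible choices:

```python
import numpy as np

def learn_dictionary(Y, Phi0, S, T, sparse_code):
    """Generic alternating DL loop: sparse coding, then dictionary update.

    sparse_code(Y, Phi, S) is a placeholder for any solver returning
    column-S-sparse X (e.g. OMP); the update is a mod-style least squares.
    """
    Phi = Phi0.copy()
    for t in range(T):
        X = sparse_code(Y, Phi, S)              # sparse coding with Phi fixed
        Phi = Y @ np.linalg.pinv(X)             # argmin_Phi ||Y - Phi X||_F
        # renormalise atoms (assumes no atom collapses to zero)
        Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)
    return Phi, X
```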
Previous Work
Methods for dictionary learning include:
Probabilistic models [Lewicki and Sejnowski]
Method of optimal directions (mod) [Engan et al.]
k-svd [Aharon et al.]
Online learning [Mairal et al.]
8. Learning Incoherent Dictionaries
Mutual Coherence
The coherence of a dictionary expresses the similarity between atoms or
groups of atoms in the dictionary. The mutual coherence is defined as:
$$\mu(\Phi) \stackrel{\mathrm{def}}{=} \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$$
Results on sparse recovery link the performance of sparse approximation
algorithms to the coherence of the dictionary. For over-complete
approximations, low µ leads to recovery guarantees.
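In code, this definition is a short computation on the Gram matrix (a small sketch of ours):

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute inner product between distinct atoms of Phi."""
    Phi = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)  # ensure unit-norm atoms
    G = np.abs(Phi.T @ Phi)                                 # |Gram matrix|
    np.fill_diagonal(G, 0.0)                                # exclude i = j
    return float(G.max())
```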
Goal
The objective is to learn dictionaries that are both:
Well adapted to a set of training data Y
Mutually incoherent
10. Learning Incoherent Dictionaries
Advantages
Advantages of incoherent dictionaries include:
Sub-dictionaries have low condition numbers, so the
(pseudo)inverses computed by many sparse approximation algorithms
are well-posed.
Convergence of greedy algorithms is faster for incoherent dictionaries
(experimental results).
Application-oriented intuitions (future work).
Previous Work
Method of coherence-constrained directions (mocod) [Sapiro et al.]
Incoherent k-svd (ink-svd) [Mailhé et al.]
Parametric dictionary design for sparse coding [Yaghoobi et al.]
12. Incoherent Dictionary Learning: Previous Work
mocod
Unconstrained, penalised optimisation:
$$(\hat{\Phi}, \hat{\mathbf{X}}) = \arg\min_{\Phi, \mathbf{X}} \|\mathbf{Y} - \Phi\mathbf{X}\|_F^2 + \tau \sum_{k,m} \log(|x_{km}| + \beta) + \zeta \|\mathbf{G} - \mathbf{I}\|_F^2 + \eta \sum_{k=1}^{K} \left( \|\phi_k\|_2^2 - 1 \right)^2$$
where the term weighted by $\tau$ promotes sparsity and the terms weighted by
$\zeta$ and $\eta$ promote incoherence (through the Gram matrix $\mathbf{G} = \Phi^\top\Phi$) and unit-norm atoms.
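A sketch (ours) that evaluates this penalised objective, with τ, β, ζ, η as above:

```python
import numpy as np

def mocod_objective(Y, Phi, X, tau, beta, zeta, eta):
    """Penalised mocod-style objective: data fit + log-sparsity penalty
    + Gram-incoherence penalty + unit-norm penalty."""
    K = Phi.shape[1]
    G = Phi.T @ Phi                                          # Gram matrix
    fit = np.linalg.norm(Y - Phi @ X, 'fro') ** 2
    sparsity = tau * np.sum(np.log(np.abs(X) + beta))
    incoherence = zeta * np.linalg.norm(G - np.eye(K), 'fro') ** 2
    unit_norm = eta * np.sum((np.sum(Phi ** 2, axis=0) - 1.0) ** 2)
    return fit + sparsity + incoherence + unit_norm
```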
ink-svd
Greedy algorithm that includes a dictionary de-correlation step after a
k-svd dictionary update:
Find pairs of coherent atoms
De-correlate atoms two-by-two
Repeat until a target mutual coherence is reached
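One plausible realisation of the two-by-two de-correlation (a geometric sketch of ours; the exact ink-svd update may differ): rotate a coherent pair of atoms symmetrically apart within their span until the target coherence is met.

```python
import numpy as np

def decorrelate_pair(a, b, mu0):
    """Rotate unit-norm atoms a, b symmetrically apart in span{a, b}
    so that <a, b> drops to mu0 (assumes 0 < mu0 < <a, b> and a != b)."""
    u = (a + b) / np.linalg.norm(a + b)   # bisector of the pair
    v = (a - b) / np.linalg.norm(a - b)   # orthogonal direction in the span
    half = 0.5 * np.arccos(mu0)           # half of the target angle: cos(2*half) = mu0
    a_new = np.cos(half) * u + np.sin(half) * v
    b_new = np.cos(half) * u - np.sin(half) * v
    return a_new, b_new                   # now <a_new, b_new> = mu0
```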
14. ipr Algorithm: constructing Grassmannian frames
A Grassmannian frame is a dictionary with minimal mutual coherence.
For an $N \times K$ dictionary, $\mu \geq \sqrt{\frac{K - N}{N(K - 1)}}$, and this bound can be reached
only for some $(N, K)$ pairs.
Iterative Projections Algorithm
Start from an initial dictionary $\Phi^{(0)}$
Calculate its Gram matrix $\mathbf{G}^{(0)} \stackrel{\mathrm{def}}{=} \Phi^{(0)\top} \Phi^{(0)}$
Repeat for $t = \{0, \ldots, T-1\}$ iterations:
Project the Gram matrix onto the structural constraint set
$$\mathcal{K}_{\mu_0} \stackrel{\mathrm{def}}{=} \{\mathbf{K} : \mathbf{K} = \mathbf{K}^\top,\ \mathrm{diag}(\mathbf{K}) = \mathbf{1},\ \max_{i>j} |k_{i,j}| \leq \mu_0\}$$
Project the Gram matrix onto the spectral constraint set
$$\mathcal{F} \stackrel{\mathrm{def}}{=} \{\mathbf{F} : \mathbf{F} = \mathbf{F}^\top,\ \mathrm{eig}(\mathbf{F}) \geq 0,\ \mathrm{rank}(\mathbf{F}) \leq N\}$$
Factorise the Gram matrix as $\Phi^{(T-1)\top} \Phi^{(T-1)} = \mathbf{G}^{(T-1)}$
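A numpy sketch (ours) of the two projections and the final factorisation, following the set definitions above:

```python
import numpy as np

def project_structural(G, mu0):
    """Project the Gram matrix onto K_mu0: symmetric, unit diagonal,
    off-diagonal magnitudes at most mu0."""
    K = np.clip(G, -mu0, mu0)        # limit all entries to [-mu0, mu0]
    np.fill_diagonal(K, 1.0)         # restore the unit diagonal
    return (K + K.T) / 2.0           # enforce symmetry

def project_spectral(G, N):
    """Project onto F (PSD, rank <= N) and factorise as Phi^T Phi."""
    lam, Q = np.linalg.eigh(G)                        # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)                     # keep the non-negative spectrum
    lam[:-N] = 0.0                                    # keep only the N largest eigenvalues
    G_proj = (Q * lam) @ Q.T                          # rank-N PSD projection
    Phi = np.diag(np.sqrt(lam[-N:])) @ Q[:, -N:].T    # N x K factor: Phi^T Phi = G_proj
    return G_proj, Phi
```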
16. ipr Algorithm: the rotation step
Idea!
The factorisation at the end of the iterative projection algorithm is not
unique, since for any orthonormal matrix W
$$(\mathbf{W}\Phi)^\top (\mathbf{W}\Phi) = \Phi^\top \mathbf{W}^\top \mathbf{W} \Phi = \Phi^\top \Phi$$
Therefore, we can optimise an orthonormal matrix for the DL objective!
This is an (improper) rotation of the dictionary Φ.
Dictionary Rotation
$$\hat{\mathbf{W}} = \arg\min_{\mathbf{W} : \mathbf{W}^\top\mathbf{W} = \mathbf{I}} \|\mathbf{Y} - \mathbf{W}\Phi\mathbf{X}\|_F$$
A closed-form solution to this problem can be found by computing the
svd of the covariance matrix $\mathbf{C} \stackrel{\mathrm{def}}{=} \Phi\mathbf{X}\mathbf{Y}^\top = \mathbf{U}\Sigma\mathbf{V}^\top$
and setting:
$$\hat{\mathbf{W}} = \mathbf{V}\mathbf{U}^\top$$
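This is the orthogonal Procrustes problem; a minimal numpy sketch (ours), using the convention $\mathbf{C} = \Phi\mathbf{X}\mathbf{Y}^\top$ from above:

```python
import numpy as np

def rotate_dictionary(Y, Phi, X):
    """Find the orthonormal W minimising ||Y - W Phi X||_F and apply it."""
    C = (Phi @ X) @ Y.T            # covariance between approximation and data
    U, _, Vt = np.linalg.svd(C)    # C = U Sigma V^T
    W = Vt.T @ U.T                 # Procrustes optimum: W = V U^T
    return W @ Phi                 # rotated dictionary, same Gram matrix
```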
D. Barchiesi and M. D. Plumbley Learning incoherent dictionaries
17. Iterative Projections and Rotations algorithm
Start from a dictionary Φ(0) returned by the dictionary update step of any
DL algorithm.
Repeat for t = {0, . . . , T − 1} iterations:
Calculate the Gram matrix: G(t) ← Φ(t)ᵀ Φ(t)
Project the Gram matrix onto the structural constraint set:
    diag(G) ← 1
    G ← Limit(G, µ0)
Factorise the Gram matrix and project it onto the spectral constraint set:
    [Q, Λ] ← evd(G)
    Λ ← Thresh(Λ, N)
    Φ ← Λ^(1/2) Qᵀ
Rotate the dictionary:
    C ← (ΦX) Yᵀ
    [U, Σ, V] ← svd(C)
    W ← V Uᵀ
    Φ ← W Φ
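Putting the pieces together, one pass of this loop could look as follows, reusing the `project_structural`, `project_spectral` and `rotate_dictionary` sketches from the previous slides (the explicit atom renormalisation is our addition; the slide leaves it implicit):

```python
import numpy as np

def ipr(Y, Phi, X, mu0, T):
    """IPR-style de-correlation: alternate structural and spectral projections
    of the Gram matrix, then rotate the factorised dictionary towards the data.
    Relies on project_structural, project_spectral, rotate_dictionary above."""
    N = Phi.shape[0]
    for t in range(T):
        G = Phi.T @ Phi                          # current Gram matrix
        G = project_structural(G, mu0)           # limit coherence, unit diagonal
        _, Phi = project_spectral(G, N)          # PSD rank-N projection + factorisation
        Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)  # renormalise atoms (our addition)
        Phi = rotate_dictionary(Y, Phi, X)       # Procrustes rotation towards Y
    return Phi
```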
18. Numerical Experiments: The SMALLBox framework
SMALLBox is a Matlab framework for developing and benchmarking
dictionary learning algorithms, created by a team at Queen Mary
University of London.
The latest version can be downloaded from
http://code.soundsoftware.ac.uk/
SMALLBox integrates many third-party toolboxes such as Sparco,
SparseLab, CVX, SPAMS, etc.
SMALLBox provides a unified interface to different DL algorithms
that can be used for benchmarking.
The new distribution of SMALLBox supports add-ons that extend the
functionality of the framework without interfering with the core code.
IncoherentDL is a SMALLBox add-on that can be used to reproduce
some of the results presented in this talk.
19. Numerical Experiments: Mutual coherence vs residual norm
Test Conditions
Tests on a 16 kHz guitar audio signal divided into overlapping
blocks of length N = 256.
The number of active atoms was fixed at S = 12 (around 5% of
the dimension N).
A twice-overcomplete dictionary was initialised with either:
Randomly selected samples from the training set.
An over-complete Gabor frame.
DL algorithms were run for 50 iterations.
Test Objective
The mutual coherence achieved by every learned dictionary is paired with
the approximation error defined as:
$$\mathrm{snr}(\Phi, \mathbf{X}) = 20 \log_{10} \frac{\|\mathbf{Y}\|_F}{\|\mathbf{Y} - \Phi\mathbf{X}\|_F}$$
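In code (a one-line sketch of ours):

```python
import numpy as np

def snr_db(Y, Phi, X):
    """Approximation SNR in dB: 20*log10(||Y||_F / ||Y - Phi X||_F)."""
    return 20.0 * np.log10(np.linalg.norm(Y) / np.linalg.norm(Y - Phi @ X))
```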
23. Numerical Experiments: Sparse Approximation
Test Conditions
The matching pursuit algorithm (mp) was run for 1000 iterations on the
following signals:
The training set.
A different guitar recording taken from the rwc database.
A piano recording taken from the rwc database.
Dictionaries with different mutual coherences were selected as
returned by the ipr algorithm with data initialisation.
Test Objective
The norm of the residual in decibels, defined as
$$20 \log_{10} \|\mathbf{y} - \Phi\mathbf{x}\|_2,$$
is computed and averaged over the M signals and over 10 dictionaries
resulting from independent trials of the learning algorithm.
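The per-signal metric in code (our sketch; averaging over the 10 independent dictionaries happens outside):

```python
import numpy as np

def mean_residual_db(Y, Phi, X):
    """Average over signals of 20*log10 ||y_m - Phi x_m||_2."""
    res = np.linalg.norm(Y - Phi @ X, axis=0)   # per-column residual norms
    return float(np.mean(20.0 * np.log10(res)))
```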
24. Numerical Experiments: Training set approximation
[Figure: guitar, training signal. Average residual norm (dB) vs. number of mp iterations (0-1000) for dictionaries with µ = 0.72, 0.37, 0.19, 0.1, 0.06.]
25. Numerical Experiments: Guitar approximation
[Figure: guitar, test signal. Average residual norm (dB) vs. number of mp iterations (0-1000) for dictionaries with µ = 0.72, 0.37, 0.19, 0.1, 0.06.]
26. Numerical Experiments: Piano approximation
[Figure: piano. Average residual norm (dB) vs. number of mp iterations (0-1000) for dictionaries with µ = 0.72, 0.37, 0.19, 0.1, 0.06.]
27. Conclusions: Possible Applications
Morphological Component Analysis
Morphological component analysis is a dictionary learning approach to
classification.
Different dictionaries are learned on morphologically dissimilar
training sets (e.g., edges and textures, or percussive and
steady-state sounds).
A test signal is classified according to the support or magnitude of
the coefficients of its sparse approximation (i.e., which dictionary
represents it best?).
ipr could be used to enforce incoherence between the atoms belonging
to different morphological components and enhance classification and
separation performance.
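A hypothetical sketch of such a classifier: approximate the test signal over the concatenation of per-class dictionaries and pick the class whose atoms carry the most coefficient energy (`sparse_code` is again a placeholder solver):

```python
import numpy as np

def classify(y, dictionaries, S, sparse_code):
    """Assign y to the morphological class whose dictionary best represents it.

    dictionaries: list of (N, K_c) per-class dictionaries;
    sparse_code(Y, Phi, S): placeholder S-sparse solver."""
    Phi = np.hstack(dictionaries)                 # concatenated dictionary
    x = sparse_code(y[:, None], Phi, S)[:, 0]     # sparse coefficients of y
    bounds = np.cumsum([0] + [D.shape[1] for D in dictionaries])
    energies = [np.sum(x[bounds[c]:bounds[c + 1]] ** 2)   # energy per class block
                for c in range(len(dictionaries))]
    return int(np.argmax(energies))               # best-represented class
```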
28. Conclusions: Possible Applications
Blind Compressed Sensing
Blind compressed sensing generalises compressed sensing to the case of
an unknown dictionary that generates the signals to be recovered.
A set of observations Z is acquired through a known measurement
matrix M: $\mathbf{Z} = \mathbf{M}\mathbf{Y} = \mathbf{M}\Phi\mathbf{X}$.
Dictionary learning is used to optimise Ψ and factorise the observed
data as $\mathbf{Z} \approx \Psi\mathbf{X}$.
The learned dictionary is factorised as the product $\Psi \approx \mathbf{M}\hat{\Phi}$ and
the signals are reconstructed as $\hat{\mathbf{Y}} = \hat{\Phi}\hat{\mathbf{X}}$.
The two factorisations are not unique and strong constraints on Φ are
assumed to correctly reconstruct the signals. ipr might be used to
constrain the factorisations and lead to a less ambiguous solution.
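A schematic of this pipeline (our sketch, reusing the `learn_dictionary` sketch from earlier; taking $\hat{\Phi} = \mathbf{M}^{+}\Psi$ picks just one of the many consistent factorisations, which is exactly the ambiguity noted above):

```python
import numpy as np

def blind_cs_reconstruct(Z, M, Psi0, S, T, sparse_code):
    """Blind compressed sensing sketch: learn Psi with Z ~ Psi X in the
    measurement domain, pick one factorisation Psi ~ M Phi_hat, and
    reconstruct Y_hat = Phi_hat X. Uses learn_dictionary from earlier."""
    Psi, X = learn_dictionary(Z, Psi0, S, T, sparse_code)  # DL on the observations
    Phi_hat = np.linalg.pinv(M) @ Psi        # minimum-norm candidate among M Phi = Psi
    return Phi_hat @ X                       # reconstructed signals Y_hat
```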
30. Conclusions: Summary
The ipr algorithm can be used to learn dictionaries that are both
adapted to a training set and mutually incoherent.
The ipr algorithm can be used as a de-correlation step in any
dictionary learning algorithm.
Experimental data show that ipr performed generally better than
benchmark techniques on audio signals.
Incoherent dictionaries are useful for sparse approximation and could
be used in a number of potential applications.
Thank you for your attention
and for any questions!