1. Introduction. Sparse representation is a popular mathematical tool for many applications in areas such as geophysics [33], statistics [34], signal processing [8] and computer vision [23]. Given a matrix called the dictionary and a vector
, the problem of sparse recovery is to find a vector
with the fewest nonzero entries that solves the under-determined linear system b = Ac. In the field of compressive sensing, tools from signal processing, numerical optimization, statistics and applied mathematics have significantly advanced the theoretical understanding of the sparse recovery problem [7, 3, 19]. The key insight is that it is often possible to recover an N-dimensional sparse signal c from
) linear measurements b = Ac (i.e., the row vectors of A are viewed as the means to obtain linear measurements). It is now well-known that such a c is the unique sparsest solution to b = Ac if A satisfies the spark condition [12]. Moreover, the reconstruction of such a sparse vector c is not only feasible in theory but also computationally efficient via numerical algorithms such as Basis Pursuit (BP) [8] and Orthogonal Matching Pursuit (OMP) [29]. Specifically, such algorithms are guaranteed to recover c if the dictionary A satisfies the incoherence [35, 13] or restricted isometry [6, 5, 10, 24, 25, 4] properties.
The mathematical theory for sparse recovery triggered the design of new techniques for broader applications [30]. One of the most successful instances is the task of learning parsimonious representations of high-dimensional data such as images, videos, audios and genetics [40, 23]. For such data, the intrinsic dimension is often much smaller than the dimension of the ambient space. For example, while face images taken by a mega-pixel camera are million-dimensional, images of a particular face exhibit low-dimensional structure since their variations can be described by a few factors such as the pose of the head, the expression of the face and the lighting conditions. To see how low-dimensionality is connected to sparse recovery, let A be a data matrix with each column corresponding to a data point, and assume that a subset of the columns is drawn from a low-dimensional linear subspace. Then, for any data point b drawn from this subspace, there exist sparse solutions to the linear system b = Ac whose nonzero entries correspond to columns of A that lie in the same subspace as b; we say that such a sparse solution c is subspace-preserving. Finding subspace-preserving solutions plays a central role in a range of learning tasks such as classification [41, 17], clustering [16, 14, 45, 42, 37], outlier detection [31, 46], and subset selection [15, 43].
As a concrete example, consider the task of face recognition where the goal is to classify a given face image to its correct group by using a collection of face images as a training set. For each face, training images are collected under various illumination conditions. Each image for each face is represented as a column vector that contains its pixel intensity values, and this collection of vectors becomes the columns of the dictionary A. Since the images of a given face under various illumination conditions lie approximately in a low-dimensional linear subspace [1], the columns of A approximately lie in a union of subspaces with each subspace corresponding to one of the faces. In the test phase, a test image b is expressed as a sparse linear combination of the training images, i.e. b = Ac. Owing to the subspace structure, the representation c is expected to be subspace-preserving, whereby the label of b is given by the label of the training images that correspond to nonzero entries in c. This method for face recognition is known as the sparse representation-based classification (SRC) [41].
The subspace-preserving recovery problem (the focus of this paper) needs to be clearly distinguished from the sparse signal recovery problem extensively studied in the literature [7, 3, 19]. Although both problems concern solving a sparse recovery problem, their objectives and standard for correctness are different. In sparse signal recovery, the goal is to recover a sparse signal c from the measurements b = Ac, and the computation is carried out under the presumption that c is the unique sparsest solution. Consequently, the uniqueness of the sparsest solution to b = Ac is a fundamental property in the theoretical study of sparse signal recovery. This is in sharp contrast to subspace-preserving recovery for which the uniqueness generally does not hold. Instead of aiming to recover a unique sparsest solution, subspace-preserving recovery is regarded as successful when the nonzero entries in the recovered solution correspond to columns of A that lie in the same subspace as b. This sets the stage for the development of radically new theory for subspace-preserving recovery.
1.1. Motivation for the proposed research. It is surprising that sparse recovery methods such as SRC work well for subspace structured data since the spark, coherence, and restricted isometry conditions are often violated. For example, if the training set contains two images of the same face taken under similar illumination conditions, then the coherence of the dictionary is close to one. The gap between the existing theory and the empirical success of sparse recovery for subspace structured
data calls for the study of new theoretical guarantees for subspace-preserving recovery.
1.2. Problems studied. Let A := be a finite dictionary com- posed of a subset of points
that span a
-dimensional subspace
, with the remaining points
:=
being outside the subspace
. We use A to denote the matrix that contains all points from A as its columns, and likewise for
and
. We assume throughout that all vectors in A have unit
norm. For any
, we formally define a subspace-preserving representation as follows.
Definition 1.1 (subspace-preserving representation). Given a dictionary A, matrix A, and vector b as above, a vector c satisfying b = Ac is called a subspace-preserving representation if = 0 implies that
for all
.
In words, a c satisfying b = Ac is subspace-preserving if the points in A that correspond to the nonzero entries of c lie in . For the face recognition application described earlier, we can think of
as the set of training images for a particular face that spans the subspace
, and
as the set of training data from all other faces. Then, roughly speaking, determining whether the vector b belongs to the subspace
is equivalent to deciding whether a subspace-preserving representation of b exists.
An important observation for recovering a subspace-preserving representation is that there always exist such representations with at most nonzero entries. Indeed, one can always find a set of
linearly independent data points from
that spans
, and can thus linearly represent all points
. When
is small relative to N, such representations are sparse. This suggests that, among all representation vectors c that satisfy b = Ac, the sparsest ones should be subspace-preserving.
In this paper, we provide a theoretical justification for such ideas. Since the problem of finding the sparsest solution is NP-hard in general [26], we focus our theoretical analysis on sparse recovery using algorithms BP and OMP (see subsection 2.2 for an overview of these methods). We consider the following theoretical question: What conditions on A and b ensure that BP and OMP produce subspace-preserving solutions? In answering this question, we distinguish two different but related problems.
Problem 1.2 (instance recovery problem). Given A and as above, under what conditions is subspace-preserving recovery guaranteed by BP and OMP?
Problem 1.3 (universal recovery problem). Given A as above, under what conditions is subspace-preserving recovery guaranteed by BP and OMP for all ?
We derive instance and universal recovery conditions in section 3 and section 4.
1.3. Paper contributions. This paper makes the following contributions towards understanding the conditions for subspace-preserving recovery.
• Instance recovery conditions. We present conditions for ensuring instance recovery by both BP and OMP. A summary of the conditions is given in Figure 1. – We derive a Tight Instance Recovery Condition (T-IDC) for BP, which requires the existence of a point in BP), the set of dual solutions to the BP problem with dictionary
, that is well separated from points in
. – We derive the Instance Dual Condition (IDC) and the Instance Residual Condition (IRC) for BP and OMP, respectively. These sufficient conditions require the set of dual points
) for BP or the set of residual vectors
) for OMP with dictionary
to be well separated from points in
.
Fig. 1: Instance recovery conditions for BP and OMP. The first row states instance subspace-preserving recovery by BP and OMP. The second row has a tight condition for ensuring subspace-preserving recovery for BP characterized by the set of dual solutions BP) (see Theorem 3.1). The third row gives the dual and residual conditions characterized by the dual points
) (see Theorem 3.4) and residual points
) (see Theorem 3.7), respectively. The fourth row gives geometric conditions characterized by the covering radius
and the angular distance between
and either (i) an arbitrary point from the set of dual points
) for BP (see Theorem 3.5), or (ii) all vectors from the set of residual points
) for OMP (see Theorem 3.8).
– We derive the Geometric Instance Dual Condition (G-IDC) and the Geometric Instance Residual Condition (G-IRC) for BP and OMP, respectively. These conditions are characterized by the covering radius of points associated with
, and the angular distance between
and either (i) the set of dual points
) for BP, or (ii) the set of residual points
) for OMP.
• Universal recovery conditions. We derive tight, sufficient, and geometric conditions for universal recovery by both BP and OMP (see Figure 2) that parallel our conditions for instance recovery. The primary difference is that sufficient conditions for universal recovery by BP and OMP are equivalent, and are implied by a shared Geometric Universal Dual Condition (G-UDC) that depends on the covering radius of points associated with and the angular distance between the set of dual points
) and the data outside of the subspace
.
• Connections to sparse signal recovery. We show that sparse signal recovery is a special case of subspace-preserving recovery. By applying our instance and universal conditions to this specialized setting we obtain new conditions for sparse signal recovery. We clarify the connections between these conditions and well-known sparse signal recovery conditions such as the exact recovery condition [35], the null space property [9] and the mutual coherence condition [11]. The problems of instance recovery by OMP, instance recovery by BP and universal recovery have been studied in the conference proceedings article [45], the related work [31], and another conference proceedings article [47], respectively. This paper integrates part of the existing results developed under different settings and characterized
Fig. 2: Universal recovery conditions for BP and OMP. The first row states universal subspace-preserving recovery by BP and OMP. The second row has a tight condition for ensuring universal subspace-preserving recovery for BP characterized by the set of dual points ) (see Theorem 4.4). The third row gives the dual and residual conditions characterized by the dual points
) (see Theorem 4.6) and residual points (see Theorem 4.11). Geometric conditions characterized by the inradius
and the coherence between
and (i) the set of dual points
) (see Theorem 4.7) and (ii) the subspace
(see Theorem 4.8), are given in rows four and five, respectively.
by different mathematical concepts, and provides a unified study in which conditions for both BP and OMP, as well as for both instance and universal recovery, are derived and formulated in comparable forms. This makes it easy to understand their relationship and to interpret their differences. In addition, we fill in missing components in existing works by introducing new conditions (e.g., the IDC and the URC), proving tightness of existing conditions (i.e., the T-IDC), and establishing equivalence of our new conditions (e.g., the T-UDC) with those in prior works. This also allows us to identify the close relationship between subspace-preserving and sparse signal recovery conditions.
1.4. Major results. We highlight our instance and universal recovery conditions that are characterized by the covering radius and the angular distance (i.e., the bottom rows of Figures 1 and 2), and demonstrate their geometric interpretations. The , formally defined in Definition 2.2, is the smallest angle such that closed spherical balls of that radius centered at the points of
cover the unit sphere of
. This covering radius measures how well distributed the points from
are in the subspace
, and should be relatively small if the points are equally distributed in all directions within the subspace and not skewed in a certain direction. The angular distance between two sets
and
is given by
Fig. 3: Geometric interpretations of subspace-preserving conditions in Theorem 1.4. The atoms :=
(drawn in blue) and the signal b (marked as
, for (a) and (b) only) lie on the unit circle of a two-dimensional subspace
. The atoms
(not drawn) lie on the unit sphere in the ambient space
. In (a) and (b), points in
) and
) are illustrated as green dots (see also Figures 5 and 6). In (c), points in
) are illustrated as red dots (see also Figure 4). The G-IDC (resp., G-IRC, G-UDC and G-USC) holds if no point from
lies in the yellow region in (a) (resp, (b), (c) and (d)), which corresponds to the set
:
with v in the set
) (resp.,
) and
).
the minimum angular distance between points from these two sets, i.e.,
Our major results for subspace-preserving recovery are given in the following theorem, which encapsulates Theorems 3.5, 3.8, 4.7, and 4.8 and Corollary 4.15.
Theorem 1.4. The following geometric recovery conditions hold: (i) (Instance recovery conditions) For a fixed , BP and OMP recover subspace-preserving solutions if G-IDC and G-IRC hold, respectively, where
where and
are the sets of dual points (see Definition 3.2) and residual points (see Definition 3.6), respectively.
(ii) (Universal recovery conditions) BP and OMP recover subspace-preserving solutions for all if either G-UDC holds or G-USC holds, where
where is the set of dual points associated to
(see Defini-tion 4.2).
Conditions G-IDC, G-IRC, G-UDC and G-USC in Theorem 1.4 all require the covering radius to be smaller than the angular distance between and a pair of points
, i.e., they require
(1.1)
for v that is chosen from a particular subset of (that varies for the four conditions). Such conditions conform with our intuition that for subspace-preserving recovery to succeed, points in
should be well distributed in the subspace
and points in
should be well separated from a subset of
. The only difference among the four conditions is that (1.1) needs to be satisfied for different v’s. In Figure 3 we illustrate the G-IDC, G-IRC, G-UDC and G-USC using a two dimensional subspace
in
. The atoms in
are illustrated as blue dots on the unit sphere of
. The atoms in
(not drawn) are points that lie on the unit sphere of
but outside of
. The instance recovery conditions G-IDC and G-IRC depend on a specific vector
, which is illustrated by the symbol
in Figures 3a and 3b.
Therefore, roughly speaking, subspace-preserving recovery requires the points to be separated from a specific region of the subspace
that is determined by both
and b for instance recovery, and by only
for universal recovery.
1.5. Paper outline. The remainder of the paper is organized as follows. In
section 2 we introduce inradius, covering radius and circumradius for characterizing the distribution of points in , as well as coherence for measuring the separation between points from
and
. These notions are used to formulate the instance and universal recovery conditions, which are derived in section 3 and section 4, respectively. In section 5, we discuss the relationship between our subspace-preserving recovery conditions with a dual certificate, mutual incoherence, exact recovery and null space
condition that are well known in the sparse signal recovery literature. Conclusions and future research are given in section 6.
2. Background Material. In this section we present the background material on the geometric characterization of the dictionary A required for our analysis in subsection 2.1, and an overview of the BP and OMP methods in subsection 2.2.
2.1. Geometric characterization of the dictionary A. Our geometric conditions for subspace-preserving recovery rely on geometric properties that characterize the distribution of points in the subspace
and the separation between
and the subspace
. In this section, we introduce the relevant concepts and notations.
The convex hull of the symmetrized points in is defined as
where conv() denotes the convex hull, and its (relative) polar set is defined as
where the second equality follows from Lemma A.1. Both and
are symmetric convex bodies
, as the polar of a convex body is also a convex body [2].
Next, we introduce three concepts for characterizing the distribution of the data points in . The first concept is the inradius of a convex body.
Definition 2.1 (inradius). The (relative) inradius of a convex body P, denoted by r(P), is the radius of the largest Euclidean ball in span(P) inscribed in P.
In our analysis, we use the inradius
which characterizes the distribution of the data points in
. If the elements of
are well distributed in
, the inradius is near one; otherwise, the inradius is small.
The second concept for quantifying the distribution of points is given by the covering radius. Let :=
:
= 1} be the unit sphere of
. The (relative) covering radius of a set is then defined as follows.
Our analysis uses the covering radius of the set , namely
which by Definition 2.2 is computed by finding that is furthest away from all points in
(see Figure 4). It can also be interpreted as the smallest radius such that closed spherical balls of that radius centered at the points of
cover
. The covering radius characterizes how well the points in
are distributed.
The third characterization of the distribution of is in terms of the circumradius.
Definition 2.3 (circumradius). The circumradius R(P) of a convex body P is defined as the radius of the smallest Euclidean ball containing P.
Fig. 4: Illustration of the geometric characterizations of the dictionary A. In this example, the data set :=
lies on the unit circle of a two-dimensional subspace
. The sets
and
are illustrated as the blue and red polygons, respectively. (a) The inradius
is the radius of the inscribing ball of
shown as the orange dashed circle. The covering radius
is the angle
) where
is the maximizer of the optimization problem in Definition 2.2. (b) The circumradius
is the radius of the smallest ball containing
shown as the orange dashed circle.
Our analysis uses the circumradius of the polar set , namely
The next result shows the relationship between the circumradius , the inradius
, and the covering radius
(see Figure 4), and is proved in the appendix.
Theorem 2.4. = cos(
) = 1
Finally, we measure the separation between and an arbitrary subset of
using their coherence, which is the maximum inner product (in absolute value) between points from the two sets. That is, the coherence between
and
is
(2.1) ) := max
max
2.2. Overview of BP and OMP. BP and OMP are two of the most popular methods for finding sparse solutions to an under-determined system of linear equations. Given the dictionary matrix A and a signal b, BP and OMP aim to solve
(2.2) min
It is well-known that finding the solution to (2.2) is an NP-hard problem in general. The BP method is a convex relaxation approach that replaces the -regularization in (2.2) by an
-norm, whose solution set we denote by BP(A, b), i.e.,
The convex optimization problem (2.3) can be efficiently solved using existing solvers. Under the problem setup in subsection 1.2, there always exists a subspace-preserving representation c satisfying Ac = b so that the set BP(A, b) is non-empty. We use a capitalized “Argmin” in (2.3) to indicate that BP(A, b) is a set of solutions.
The OMP method is a greedy strategy that sequentially chooses dictionary elements in a locally optimal manner. Specifically, OMP maintains a working set (i.e., a subset of the dictionary) that for the kth iteration is denoted by in Algorithm 2.1. Using the kth working set, a representation vector
with support corresponding to
is computed in (2.4). The next subset
is computed by adding to
a dictionary element with maximum absolute inner product with the kth constraint residual computed in Step 5. For convenience, let us define
OMP(A, b) := the vector returned in Step 6 of the OMP Algorithm 2.1.
Note that the constraint residual computed in Step 5 is the projection of b onto the orthogonal complement of the space spanned by the columns of A indexed by
. It follows that the vector added to
in Step 2.5 is linearly independent from the columns of A indexed by
. Thus, the solution
computed in (2.4) is the unique solution to the optimization problem. Finally, we comment that some fixed strategy should be used when the optimization problem in (2.5) does not have a unique minimizer, such as adding the minimizer
with smallest index j.
In a practical implementation of Algorithm 2.1, termination would occur when either for some tolerance
[0
) or when the iteration k reaches a maximum allowed value, say
. For the analysis and for simplicity of presentation in Algorithm 2.1, we assume
= 0 and
, and note that termination will always occur (assuming exact arithmetic) for the problem setup described in subsection 1.2.
3. Instance Recovery Conditions. We present conditions under which BP and OMP produce subspace-preserving solutions for a . A summary of the instance recovery conditions is given in Figure 1. As shown in subsection 3.1 and subsection 3.2, our conditions are characterized by the distribution of points
in the subspace
and the separation of
from a set of dual points (for BP) and a set of residual points (for OMP). In subsection 3.3 we discuss how these conditions are related.
Fig. 5: Illustration of the set of dual points ) (see Definition 3.2) and the set of dual solutions BP
). The atoms
:=
(drawn in blue) and b (marked as
) lie on the unit circle of a two-dimensional subspace
) contains points from
that maximize the inner product with b. BP
) contains points whose orthogonal projection onto
lies in
) (see Lemma 3.3).
3.1. Instance recovery conditions for BP. In this section we consider the instance recovery problem for BP. In subsection 3.1.1 we give a tight condition for recovery (i.e., a condition that is equivalent to the statement that BP always returns subspace-preserving solutions), in subsection 3.1.2 we give a sufficient condition for ensuring recovery, and in subsection 3.1.3 we give a weaker sufficient condition for ensuring recovery that has a clear geometric interpretation.
3.1.1. A tight condition. The BP approach seeks a subspace-preserving representation by solving the optimization problem in (2.3). From the optimality conditions, if BP(A, b) is optimal for (2.3), then there exists a
so that
(3.1)
where 1 and
= sgn(
) for
= 0} is the subdifferential of
at
. In particular,
BP
), where BP
) is the set of solutions to the dual optimization problem for (2.3) given by
From the optimality condition (3.1), one sees that if 1 for all
then the vector
is subspace-preserving. This suggests that the condition for BP(A, b) to be subspace-preserving depends on the dot product between the data points and the solution to the dual optimization problem. This motivates the following result, which gives a tight condition for solutions in BP(A, b) to be subspace-preserving.
Theorem 3.1 (Tight Instance Dual Condition (T-IDC)). Let . All elements in BP(A, b) are subspace-preserving if and only if
(3.3) BP
The condition in (3.3) requires that there exists a point from the set of dual solutions BP) such that the absolute value of the dot product between the
dual solution and all points in is less than one. The interpretation of this condition will become clearer once we understand the geometric structure of the set BP
). To that end, observe from the definition of BP
) and the fact that both b and all columns of
lie in the subspace
, that if
BP
) then
BP
), where
is the space orthogonal to
. Consequently, BP
) is composed of a collection of affine subspaces, each of dimension
, that are perpendicular to
. Among all points in BP
), of particular interests are those that lie in the intersection of BP
) and
, which we define as the set of dual points as follows.
Definition 3.2 (dual points associated with and b). Given
and
, we define the set of dual points, denoted by
), as
The relationship between BP) and
) is now made precise.
(3.5) BP) =
) +
Proof. First, we show that the left-side of (3.5) is contained in the right-hand side. To that end, let ¯BP
). Then, define ¯p as the projection of ¯v onto
, and then define ¯q = ¯
so that ¯v = ¯p + ¯q. Thus, it remains to prove that ¯
). Using
, ¯
, and ¯
BP
) it follows that
Combining this with ¯and
¯
¯
1 shows that ¯
).
Next, we show that the right-side of (3.5) is contained in the left-hand side. Let ¯) and ¯
. By defining ¯v = ¯p + ¯q, it remains to prove that ¯
BP
). For a proof by contradiction, suppose it was not true. Since
¯
=
¯
1 (i.e., ¯v is feasible for BP
)), we know there exists a vector ˆv satisfying ˆ
and
ˆ
1. Defining ˆp as the projection of ˆv onto
and ˆq = ˆ
, it follows using
, ˆ
, and ¯
that
which, together with ˆ, contradicts ¯
). This completes the proof.
Since the optimization problem in (3.4) is a linear program with constraint set , the solution set
) is a face of the convex polyhedron
(see [49]). If b is drawn from a distribution that is independent of
, then the optimal face is 0-dimensional (i.e., a vertex of
) with probability one, in which case
) contains a single point. Correspondingly, the set of dual solutions BP
) is a
dimensional affine subspace that passes through the unique point in
) and is perpendicular to
(see Figure 5). The optimal face may also be higher-dimensional when b is perpendicular to such a face, in which case
) contains infinitely many points and BP
) contains a union of infinitely many affine subspaces.
We end the discussion of the tight condition for instance recovery by BP with a proof of Theorem 3.1.
Proof of Theorem 3.1. Without loss of generality, let A = []. It will be convenient to define
= (
), where
BP(
) which exists since
= span(
).
For the “if” direction, let BP
) be any point satisfying (3.3)
and ¯c = (¯) any point in BP(A, b). Since the linear problems BP(
) and BP
) are dual, strong duality,
1, and
1 yield
so that all these inequalities are equalities. Combining the last of these inequalities (now as an equality) with 1 and
1 shows that
implying that ¯; thus, ¯c is subspace-preserving, as claimed.
Next, to prove the “only if” direction, we suppose that all vectors in BP(A, b) are subspace-preserving and construct a BP
) such that
1.
The support of is
= 0} and define
so that
. Following the proof in [48], let us define
where and
denote submatrices of A containing the columns indexed by T and R, respectively, and
denotes the subvector of
that contains entries indexed by T . The optimality conditions satisfied by
BP(
) show that problem (3.6) is feasible. Moreover, using
BP(
), the definitions of R and T , and
= 0 it follows that any feasible point v for problem (3.6) satisfies
It follows from this equality, the fact that is the optimal value for the solutions in BP(
), and strong duality between BP(
) and BP
) that any feasible point for problem (3.6) belongs to BP
). Thus,
BP
). It remains to prove that the objective value of (3.6) at
is strictly less than one.
Defining , the optimization problem (3.6) is equivalent to the problem
Note that range(
) if and only if
, where Q denotes a basis of the null space of A. By writing u = w + sgn(
), we can equivalently write (3.7) as
The dual of this problem may be derived to be
Since (3.8) is a feasible linear program (it is equivalent to (3.6), which is feasible), the optimal objective values of problems (3.8) and (3.9) are finite and equal. Combining
this with the fact that the objective values in (3.6) and (3.8) are equal under the relation =
+ sgn(
), we have that any solution
to (3.9) satisfies
Thus, it remains to prove that the right-hand side of (3.10) is less than one.
First, consider the case when = 0. We claim that the objective value of (3.9) at
(i.e., the right-hand side of (3.10)) must be zero. To see this, note that if the objective value of (3.9) is not zero, then it must be strictly positive since the objective of (3.8) is nonnegative and strong duality holds. Thus, 2
is a feasible solution to (3.9) that gives a larger objective value, which contradicts the optimality of
.
Second, consider the case when . Choose any t < 0 small enough that sgn(
) = sgn(
+
). Since all elements in BP(A, b) are subspace-preserving by assumption, it follows that
BP(A, b). Moreover, since
+
is feasible for (2.3) (recall that
null(A) according to (3.9)) but is not subspace-preserving because
= 0, we know that
is not optimal for (2.3). Therefore, we have
which may be combined with t < 0 and the constraint of (3.9) to conclude that
which establishes that the right-hand side of (3.10) is less than one, as claimed.
3.1.2. A sufficient condition. Theorem 3.1 shows that BP produces subspace-preserving solutions if and only if there exists a dual solution in BP) such that the absolute value of the dot product between the dual solution and all points in
is less than one. Let us note that condition (3.3) in Theorem 3.1 is equivalent to
(3.12) BP
) satisfying max
Here, the left-hand side of the inequality in (3.12) is the maximum absolute value of the dot product between the normalized vector v and any point from , which is small when the points from these two sets are sufficiently separated. On the other hand, a finite upper bound on
(i.e., a positive lower bound on the right-hand side of (3.12)) does not exist in general since the set BP
) is unbounded when
(see Lemma 3.3). Nonetheless, the smallest
-norm solutions in BP
) are those that lie in
, which are exactly the dual points in
). This motivates us to introduce the following weaker subspace-preserving recovery condition.
Theorem 3.4 (Instance Dual Condition (IDC)). Let . All elements in BP(A, b) are subspace-preserving if
Proof. Let v be the vector satisfying condition (3.13). It follows from (3.13) and Lemma 3.3 that BP
) and
1. Combining this result with Theorem 3.1 shows that all elements in BP(A, b) are subspace-preserving.
The condition in (3.13) requires that there exists a dual point v that has small inner product with all points in . In the next section, we present a weaker sufficient condition for subspace-preserving recovery by using an upper bound for
).
3.1.3. A geometrically intuitive sufficient condition. We derive a geometrically intuitive sufficient condition that is characterized by the distribution of points in and the angular distance between the dual set
) and
. Note that if
, it follows from Definition 2.3 and Theorem 2.4 that
(3.14) = 1
Combining this result with (3.12) gives the following theorem.
Theorem 3.5 (Geometric Instance Dual Condition (G-IDC)). All elements in the set BP(A, b) are subspace-preserving if
Proof. Let v satisfy (3.15). It follows from (3.15) and (3.14) that
which is equivalently to 1. From (3.5) we have
BP
). Combining this fact with
1 and Theorem 3.1 shows that all elements in BP(A, b) are subspace-preserving, as claimed. The fact that (3.15) and (3.16) are equivalent follows from Theorem 2.4.
For condition (3.15) to hold, the points in must be well distributed within
as measured by the inradius
, and there must exist a dual point
) that is well separated from
as measured by the inner products. The geometric interpretation of (3.15) is clearer when written in its equivalent angular form (3.16). The inequality (3.16) states that all points from
need to be outside of a pair of antipodal spherical caps centered at
with radius equal to the covering radius
(see Figure 3a).
Prior results in [31] are related to Theorem 3.5. In [31], a dual point is defined as the minimum -norm solution of (3.4), and their instance recovery condition for BP is that (3.15) is satisfied for this dual point. When
) contains a single point, the recovery condition from [31] is equivalent to the condition in (3.15). Otherwise, the recovery condition from [31] may be stronger than our condition in (3.15).
3.2. Instance recovery conditions for OMP. In this section we consider the instance recovery problem for OMP. In subsection 3.2.1 we give a sufficient condition for ensuring subspace-preserving recovery, and in subsection 3.2.2 we give a weaker sufficient condition for ensuring recovery that has a clear geometric interpretation.
3.2.1. A sufficient condition. The OMP method computes a sparse representation using Algorithm 2.1. One way to derive conditions that guarantee instance recovery is to consider the performance of OMP on a related problem, namely with data and b. A feature of this related problem is that the solution returned by OMP will be subspace-preserving since only
is used. When Algorithm 2.1 is called with input data
and b, we denote the kth computed working set, representation vector, and residual vector, respectively, by
,
, and
. Motivated by (2.5) in Algorithm 2.1, conditions guaranteeing subspace preservation when the entire data matrix A is used can then be written in terms of the residual vector
and its angle with the columns of
and
. This observation motivates the following definition.
Fig. 6: Illustration of the set of residual points ) (see Definition 3.6). The atoms
:=
(drawn in blue) and b (marked as
) lie on the unit circle of a two-dimensional subspace
. We show the residual vectors
(drawn in green) computed in iterations
of OMP(
). (a) k = 0:
and
= b. (b) k = 1:
and
is the component of b that is perpendicular to
. (c) k = 2:
and
. By definition,
) =
.
Definition 3.6 (residual points). Given any and any
, we define the set of residual points, denoted by
), as the set of nonzero residual vectors computed by Algorithm 2.1 with input data
and b.
An illustration of the residual points ) is given in Figure 6. The size of
) is equal to the number of iterations computed by OMP(
), which is at most
. In addition,
) is a subset of
since each residual vector is computed by subtracting a linear combination of vectors in
from the vector b, where both b and all points in
lie in the subspace
.
Definition 3.6 may now be used to provide a sufficient condition for OMP to produce a subspace-preserving representation, as we now state.
Theorem 3.7 (Instance Residual Condition (IRC)). solution OMP(A, b) is subspace-preserving if
Proof. We prove that the sequence of working sets computed by OMP with input A and b, and the sequence of working sets
computed by OMP with input
and b are the same. Once this is established, and observing that
=
, it immediately follows that OMP(A, b) is subspace-preserving.
To prove the working sets are equal using a proof by contradiction, let k be the first iteration so that =
and
. Since k is the smallest iteration when the working sets change, it follows from (2.4) that
=
, which in turn implies that
=
. Combining this with (2.5),
=
, and
shows that
, which contradicts (3.18).
Condition (3.18) is actually both necessary and sufficient for OMP to select a point from during every iteration (assuming the optimization problem in (2.5) has a unique optimizer). On the other hand, condition (3.18) is only sufficient for subspace-preserving recovery. Cases exist when OMP with input data A and b selects points
from during iterations prior to termination, but the representation computed by the final iteration is subspace-preserving. For example, let
with
(3.19) (cos 3
sin 3
0), (cos2
sin 2
0)} and
(cos1
sin 1
Note that where
is the x-y plane. Let b = (1, 0, 0)
. Condition (3.18) does not hold because the first iteration of OMP selects the point in
. Nonetheless, the solution OMP(A, b) is subspace-preserving because it terminates after three iterations with working set
=
and subspace-preserving solution
.
3.2.2. A geometrically intuitive sufficient condition. The aim is now to derive a geometrically intuitive sufficient condition characterized by the distribution of points in and the angular distance between the residual set
) and
.
Theorem 3.8 (Geometric Instance Residual Condition (G-IRC)). Given =
, the solution OMP(A, b) is a subspace-preserving representation if
Proof. We prove that condition (3.21) holding implies that condition (3.18) holds. To that end, let ). From the definition of the covering radius it holds that
= max
) :
, and since
it follows that
) =
). Combining this with (3.21), we get
(3.22)
By taking cosine of both sides we get (3.18), which with Theorem 3.7 shows that OMP(A, b) is subspace-preserving, as claimed. The fact that (3.20) and (3.21) are equivalent follows from Theorem 2.4.
Condition (3.20) requires that the points are well enough distributed as measured by the inradius of
, and the residual points to be separated from points in
as measured by their coherence. On the other hand, the equivalent condition (3.21) states that the points
do not lie in the spherical caps of radius
centered at the residual points in
) and their antipodal points
) (see Figure 3b).
3.3. Comparison of BP and OMP for instance recovery. Although BP and OMP are different strategies for solving the sparse recovery problem, our analysis provides a means to compare their ability for finding subspace-preserving representations. Specifically, both the T-IDC for BP (see (3.3)) and the IRC for OMP (see (3.18)) require the condition to be satisfied for certain
. In particular, BP requires the condition to be satisfied for all
BP
), while OMP requires the condition to be satisfied for all
).
The G-IDC for BP (see (3.15) and (3.16)) and the G-IRC for OMP (see (3.20) and (3.21)) allow for a geometrically interpretable comparison of instance recovery. Specifically, the only difference between (3.16) and (3.21) is that BP requires the condition to be satisfied by at least one ) while OMP requires the condition
to be satisfied by all ). Geometrically, the condition for BP gives a pair of antipodal spherical caps centered at
for
) (assuming
) contains a single point), while the condition for OMP gives a collection of
pairs of antipodal spherical caps centered at
for each
) (assuming
) contains
points). Then, instance recovery by BP and OMP are guaranteed if no point from
lies in any of their respective spherical caps. If we assume that points in
are distributed independently of points in
and the vector b, then the G-IDC is satisfied with higher probability than the G-IRC.
4. Universal Recovery Conditions. In many applications one is concerned whether subspace-preserving recovery is possible not only for one point in a subspace but for all points in a subspace, i.e., to achieve universal recovery. In sparse representation-based face recognition, for example, a universal recovery guarantee means that the classifier is able to correctly classify all possible face test images to the classes that they belong to. Unlike the instance recovery conditions developed in section 3 that depend on both A and b, the recovery conditions in this section ensure subspace-preserving representations for all , and therefore rely only on A. An overview of the recovery conditions developed in this section is found in Figure 2.
4.1. Universal recovery conditions for BP. One way to derive universal recovery conditions is to require the instance recovery conditions derived in section 3 to hold for all . In subsection 4.1.1, we adopt such an approach to derive a tight condition for universal recovery by BP from the T-IDC in Theorem 3.1. We then give a sufficient condition for recovery in subsection 4.1.2 and two weaker sufficient conditions for recovery in subsection 4.1.3 that have clear geometric interpretations.
4.1.1. A tight condition. Recall that the T-IDC in Theorem 3.1 is necessary and sufficient for instance recovery by BP. We may require that the T-IDC is satisfied for all , which gives a tight condition for universal recovery by BP.
Lemma 4.1. All elements in BP(A, b) are subspace-preserving representations for all if and only if
Condition (4.1) does not provide insight since it is not directly related to the properties of A. In the following, we derive a condition equivalent to (4.1) that provides insight. We need the following notion of dual points associated with .
Definition 4.2 (dual points associated with ). The set of dual points associated with
, denoted as
), is defined as the extreme points of
.
Definition 4.2 defines dual points ) associated with
, which needs to be distinguished from Definition 3.2 that defines dual points
) associated with
and b. Geometrically, the set of dual points
) corresponds to a face of the polyhedron
that is determined by b, while the set of dual points
) is the set of all vertices (i.e., all faces of dimension 0) of the polyhedron
(see Figure 4).
The following result provides an upper bound on the size of ).
(4.2) card(
Proof. Consider a linear program with variable v, constraint , and an arbitrary linear objective function. Since the set of dual points
) is by definition the set of extreme points of
, they are the same as the basic feasible solutions of the linear program [27]. Since each basic feasible solution is determined by
linearly independent constraints from among the 2
constraints of
1, there are at most 2
ways to choose such a set of constraints (we use that at most one of the two constraints
1 can be chosen for each
).
The bound in Lemma 4.3 is not tight in general since not every set of constraints in the 2combinations produces a basic feasible solution. For example, in Figure 4 the combination of constraints
1 and
1 produces a basic feasible solution, while the combination of constraints
1 and
1 does not. Nonetheless, this bound is sufficient for the purpose of this paper
.
We now state a result with a tight condition for universal recovery by BP.
Theorem 4.4 (Tight Universal Dual Condition (T-UDC)). All elements in BP(A, b) are subspace-preserving for all if and only if
(4.3) )
Proof. In view of Lemma 4.1, we only need to show that (4.1) and (4.3) are equivalent. Note from (3.5) that (4.1) is equivalent to
(4.4) ) and
) satisfying
Therefore, we only need to show that (4.3) and (4.4) are equivalent.
To show that (4.4) implies (4.3), take any ). We will find a
) such that
1. For that purpose, let us define
. We have
) = Arg max
, where the first equality follows from Definition 3.2 and the second equality follows from the fact that
), i.e., b is an extreme point of
. Therefore, w is the only element in
). By (4.4), there exists a
) such that
1. This shows that (4.3) holds.
We now prove that (4.3) implies (4.4). Take any . Note that
) is a face of
, and therefore it must contain at least a vertex of
. That is, there exists a
) such that
). From (4.3), there exists a
) such that
1. This shows that (4.4) holds.
Theorem 4.4 states that BP produces subspace-preserving solutions for all as long as for each dual point
) there exists a point on the affine subspace
that has inner product (in absolute value) less than one for all points in
. In order to establish connections between Theorem 4.4 and existing results, we present two equivalent conditions to the T-UDC in (4.3) that are characterized by the objective value of the BP optimization problem and the null space of A.
Theorem 4.5 (Equivalent forms of T-UDC). The T-UDC holds if and only if either of the following two conditions holds: (i) , it holds that
The condition in (4.5) states that the objective value of BP with signal b and dictionary is strictly smaller than that with signal b and dictionary
. In the context of subspace clustering, it is known that (4.5) is an equivalent condition for universal recovery condition by BP [18]. Combining this with Theorem 4.4 we know that (4.5) is equivalent to the T-UDC in (4.3). On the other hand, the condition in (4.6), which is characterized by properties associated with the null space of A, has not appeared before to the best of our knowledge. As we will see in section 5, (4.6) is closely related to the null space property in the study of sparse signal recovery.
Proof of Theorem 4.5. The previous paragraph shows that (4.5) is equivalent to the T-UDC. To establish the theorem, we only need to show that the conditions (4.5) and (4.6) are equivalent.
We first show that (4.5) implies (4.6). To that end, let v = () be any vector in the null space of A that satisfies
, so that in particular it holds that
= 0. If
, then the left-hand side of (4.6) becomes zero. Combining this with
0 shows that (4.6) holds. Otherwise, we may let
and apply (4.5), which gives
which establishes that (4.6) holds.
To show that (4.6) implies (4.5), let . Note that the optimization problem on the left-hand side of (4.5) is feasible. Therefore, (4.5) trivially holds if the optimization problem on its right-hand side is not feasible. Otherwise, we can define
and let be any vector satisfying
(this system is feasible as stated above). We can now observe that
(i.e., (
) is in the null space of A) and
since
, which means that we may apply (4.6) to obtain
where the last equality follows from (4.7). This completes the proof.
4.1.2. A sufficient condition. Note that condition (4.3) is equivalent to
(4.8) ) satisfying max
As with our discussion for instance recovery by BP, the left-hand side of the inequality in (4.8) captures the similarity between v and , but the right-hand side can be arbitrarily small. Therefore, we restrict our attention to those v with smallest
-norm. It is easy to see that such v is given by v = w for each
). This leads to the following result that gives a sufficient condition for universal recovery by BP.
Theorem 4.6 (Universal Dual Condition (UDC)). All elements in BP(A, b) are subspace-preserving for all if
Proof. We prove that condition (4.9) implies condition (4.3), which is sufficient to establish the desired result because of Theorem 4.4. To that end, assume that (4.9) holds and let ). Then, choose v = w so that
trivially holds, and from (4.9) we know that
1. Thus, condition (4.3) holds.
Using Theorem 4.6, we now derive two geometrically interpretable conditions.
4.1.3. Two geometrically intuitive sufficient conditions. For any , it follows from Definition 2.3 and Theorem 2.4 that
(4.10) = 1
Combining (4.10) with (4.9) we have the following theorem.
Theorem 4.7 (Geometric Universal Dual Condition (G-UDC)in BP(
Proof. We prove that condition (4.11) holding implies that condition (4.9) holds, which is sufficient due to Theorem 4.6. To this end, assume that (4.11) holds and let ). It then follows from (4.10), (4.11) and (2.1) that
(4.13) 1) =
which means that condition (4.9) is satisfied, as claimed. The fact that conditions (4.11) and (4.12) are equivalent follows from Theorem 2.4.
Condition (4.11) requires the points in to be sufficiently well distributed as measured by the inradius of
, and the dual points
) to be sufficiently separated from points in
as measured by their coherence. The equivalent condition given by (4.12) states that all points from
do not lie in the union of spherical caps of radius
centered at each of the points from
) (see Figure 3c). By noting that
) is a subset of
, we immediately have the following result.
Theorem 4.8 (Geometric Universal Subspace Condition (G-USC)elements in BP(
Geometrically, condition (4.15) states that all points from do not lie within an angular distance of
to the subspace
(see Figure 3d).
4.2. Universal recovery conditions for OMP. Similar to the case of BP, we may derive universal recovery conditions for OMP by requiring that the instance recovery conditions for OMP presented in subsection 3.2 are satisfied for all . Following this idea, we present a universal residual condition (URC) in subsection 4.2.1, which is a sufficient condition for universal recovery by OMP. We further establish, perhaps surprisingly, that the URC is equivalent to the UDC in (4.9), showing that the same condition guarantees universal recovery of both BP and OMP. By utilizing such an equivalency, we further show in subsection 4.2.2 that the G-UDC (4.11) and G-USC (4.14) also guarantee universal recovery by OMP.
4.2.1. A sufficient condition. By requiring that the IRC in Theorem 3.7 be satisfied for all , we get a sufficient condition for universal recovery by OMP as stated in the following lemma.
The following result shows that the set ) appearing in (4.16) is simply a disguised mathematical way of describing the subspace
.
Proof. We can use the fact that ) is a subset of
for all
(see the discussion in subsection 3.2.1) to conclude that
Note from Algorithm 2.1 that the residual vector computed during the first iteration of OMP with data
and b is equal to b, i.e.,
= b. Therefore, by Definition 3.6 we know that
) for all
. It follows, by considering all possible
, that
The desired result follows by combining (4.17) and (4.18).
Lemma 4.10 is now used to give the following universal recovery result for OMP.
Proof. The theorem follows by combining Lemma 4.9 with Lemma 4.10.
The URC (4.19) for universal recovery by OMP may be compared with the UDC (4.9) for universal recovery by BP. They both require that is satisfied for certain
. However, the condition for BP must be satisfied for all
), while the condition for OMP must be satisfied for all
. Since
) is a subset of
, it is clear that the UDC is implied by the URC. It is surprising, however, that the conditions are actually equivalent, as we now set out to prove.
We need to use the result that the polar set is a symmetric convex body that induces a norm on the space
, by means of the Minkowski functional. The relevant definition and accompanying result are stated next.
Definition 4.12 (Minkowski functional). The Minkowski functional of a given set K is defined over span(K) as := inf{t > 0 :
.
Proof. The desired result follows from [38] using the fact that is a symmetric convex body and span(
) = span(
) = span(
) =
.
We may now establish that the UDC and URC are equivalent.
Proof. To show that the UDC implies the URC, we need to take any and show that
. Equivalently, by defining ¯v :=
, we may show that
¯
, which will be our goal.
Definition 4.2 states that ) is the set of extreme points of
. Since the convex hull of the set of the extreme points of a convex body is this convex body itself [2], we know that
is the convex hull of
). Thus, since ¯
(see Lemma 4.13), we may express ¯v as a finite convex combination of points from
), i.e., ¯v =
with
) and
[0, 1] for all i and
= 1. This gives
where the strict inequality follows from the UDC and the final equality follows from the fact that = 1. This shows that the URC holds, as intended.
To show that the URC implies the UDC, let . From the URC we get
. Also, by noting that
we get
1. Combining these two results gives
1, which is the UDC.
4.2.2. Two geometrically interpretable conditions. We can deduce that our geometrically interpretable conditions for universal subspace recovery for BP are also applicable to OMP by making the following observations: (i) the G-USC (4.14) and G-UDC (4.11) are sufficient conditions for ensuring the UDC (4.9) holds (see Figure 2, Theorems 4.7 and 4.8); (ii) the UDC is equivalent to the URC (see Figure 2 and Theorem 4.14); and (iii) the URC is a sufficient condition for universal recovery for OMP (see Figure 2 and Theorem 4.11). We formally state this observation next.
Corollary 4.15. The solution OMP(A, b) is subspace-preserving for all if either the G-UDC in (4.11) holds or the G-USC in (4.14) holds.
5. Sparse Signal Recovery. In this section, we show that the instance and universal recovery conditions for subspace-preserving recovery can also provide correctness guarantees for sparse signal recovery. To formally state such results, we identify three fundamental problems that are studied in compressive sensing.
Problem 5.1 (instance non-uniform recovery problem). What conditions on A ensure that a particular -sparse vector c that has nonzero entries corresponding to
can be recovered from the linear measurements b = Ac?
Problem 5.2 (universal non-uniform recovery problem). What conditions on A ensure that any -sparse vector c that has nonzero entries corresponding to
can be recovered from the linear measurements b = Ac?
Problem 5.3 (uniform recovery problem). What conditions on A ensure that any -sparse vector c can be recovered from the linear measurements b = Ac?
In the literature, the first two problems are sometimes referred to as the non-uniform recovery problem and the third problem as the uniform recovery problem [19]. To further distinguish the first two problems, we will refer to them as the instance and universal non-uniform recovery problems, respectively. The dual certificate condition [20, 48], exact recovery condition [35] and mutual coherence condition [12, 35] are, respectively, well-known conditions that answer these three questions.
Observe that if contains linearly independent data points, then the instance and universal subspace-preserving recovery problems are equivalent to the instance and universal non-uniform sparse signal recovery problems, respectively. More precisely, BP (resp., OMP) recovers a
-sparse vector c that has nonzero entries corresponding to
from b = Ac if and only if all solutions in BP(
) (resp., OMP(
)) are subspace-preserving. Likewise, BP (resp., OMP) recovers any
-sparse vector c with nonzero entries corresponding to
from its measurement b = Ac if and only if all solutions to BP(
) (resp., OMP(
)) are subspace-preserving for any
.
As a consequence of this equivalency, our instance and universal recovery conditions also provide guarantees for sparse signal recovery. In this section, we establish that some of our conditions are equivalent to existing conditions in the sparse signal recovery literature. In such cases, our conditions are generalizations of the corresponding conditions for sparse signal recovery. In the other cases, our conditions provide new sparse signal recovery conditions that bring new insight to the area.
5.1. Conditions for instance non-uniform recovery. By applying Theorems 3.1, 3.4, and 3.7, we obtain instance non-uniform sparse signal recovery results for BP and OMP. Similarly, we obtain geometrically interpretable conditions by using Theorems 3.5 and 3.8. These results are now summarized.
(i) c is the unique solution in BP(A, b) if and only if the T-IDC in (3.3) holds. (ii) c is the unique solution in BP(A, b) if either the IDC in (3.13) holds or the
G-IDC in (3.15) holds. (iii) c is given by OMP(A, b) if either the IRC in (3.18) holds or the G-IRC in
We note that when the T-IDC is considered under the assumption that the columns of A are linearly independent, it is equivalent to a well-known sparse signal recovery condition [20, Theorem 1] called the dual certificate condition in [48]. Therefore, although the result Theorem 5.4 (i) does not provide a new result for sparse signal recovery, it does show that the T-IDC may be regarded as a generalization of the dual certificate condition to the instance subspace-preserving recovery problem. Theorem 5.4 (ii) shows that the IDC is a sufficient condition for sparse signal recovery by BP that follows trivially by relaxing the condition in Theorem 5.4 (i). This condition seldomly, if ever, appears in the sparse signal recovery literature.
Theorem 5.4 (iii) shows that the IRC is a sufficient condition for sparse signal recovery by OMP. This condition has been implicitly used in the derivation of sparse recovery conditions for OMP (see, e.g. [35]) but is not identified as an independent result.
Theorem 5.4 (ii)-(iii) also provide new geometrically interpretable conditions (i.e., the G-IDC and the G-IRC) for sparse signal recovery. They state that instance non-uniform recovery by BP and OMP is guaranteed to be successful if the points in are well distributed as measured by the inradius
and the points in
are sufficiently
separated from (i) a certain point in ) for BP, or (ii) all points in
) for OMP. To the best of our knowledge, such conditions are new to the sparse signal recovery literature.
Unlike the general subspace-preserving recovery problem for which the inradius is difficult to compute in general, there exists an easy way of computing
for sparse signal recovery when the columns of
are linearly independent. Since
can be computed, one may check whether the G-IDC and G-IRC are satisfied for a given A and c. To show how
may be computed, we first establish the following result.
Lemma 5.5. Let the columns of be linearly independent. Then, the set of dual points contains exactly 2
points and is given by
) =
, where
:=
) :
1 for i = 1
.
Proof. From Lemma 4.3, there are at most 2dual points in the case that
has full column rank. So in order to prove the result, it is enough to show that the set
contains 2
points, and each of them is a dual point.
Note that has 2
points. Also, if
with
, then
because
has full column rank as a consequence of
having full rank; thus,
has 2
points.
Next, we show that :=
is a dual point for any
. By definition, we need to show that
is an extreme point of the set
=
:
. First, note that
because
range(
) and
=
= 1. Second, suppose that there are two points
such that
= (1
for some
(0, 1). Since span(
) =
and
, there exists
and
satisfying
for
. Combining this with the fact that
has full column rank reveals that
Consider the ith entry of , namely [
= (1
. Since [
1 and the right-hand side of (5.1) is a convex combination with
(0, 1) of two points in [
1] (this can be seen from
= 1 for
because
), it follows that [
= [
. Since the index i was arbitrary, it follows that
, which implies
, thus showing that
is an extreme point.
We may now use Theorem 2.4 and Lemma 5.5 to compute as
= 1
= 1/ max
= 1/ max
= 1/ max
(5.2)
which requires computing over all
. However, no polyno- mial algorithm in
for doing so exists since calculating max
:
is known to be NP-hard [36].
5.2. Conditions for universal non-uniform recovery. By applying Theorems 4.4, 4.6, and 4.11, and noting from Theorem 4.14 that the UDC and the URC are equivalent, we obtain universal non-uniform sparse signal recovery results for BP and OMP. Moreover, we can apply Theorems 4.7 and 4.8 and Corollary 4.15 to obtain geometrically interpretable conditions. These results are summarized as follows.
Theorem 5.6. Let the columns of be linearly independent. The following hold: (i) Any c supported on
is the unique solution in BP(A, b) where b = Ac if and only if the T-UDC in (4.3) holds.
(ii) Any c supported on is the unique solution in BP(A, b) and OMP(A, b) where b = Ac if either the UDC in (4.9) (equivalently, the URC in (4.19)) holds, or the G-UDC in (4.11) holds, or the G-USC in (4.14) holds.
Theorem 5.6 (i) provides a tight condition for universal non-uniform recovery by BP. When the columns of are linearly independent, the T-UDC is equivalent to a sparse signal recovery condition called the null space property ([9], see also [19, Definition 4.1.]). In particular, a matrix A = [
] is said to satisfy the null space property if for any
= (
) in the null space of A, it holds that
(5.3)
It is straightforward to verify that (5.3) is equivalent to (4.6) by using the assumption that has linearly independent columns. Therefore, it follows from Theorem 4.5 that (5.3) is equivalent to the T-UDC.
Theorem 5.6 (ii) shows that the UDC (equivalently, the URC) is a sufficient condition for universal non-uniform recovery by BP and OMP. When the columns of are linearly independent, we claim that the UDC is equivalent to the exact recovery condition in sparse signal recovery condition [35, Theorem A] given by
(5.4) it holds that
where denotes the pseudo-inverse of
. To see that it is equivalent to the UDC, we use Lemma 5.5 and the relation
= (
to conclude that
(5.5)
which is (5.4). Consequently, Theorem 5.6 (ii) is not new for sparse signal recovery. However, it does show that the UDC is a generalization of the exact recovery condition.
Theorem 5.6 (ii) also provides new geometrically interpretable conditions for sparse signal recovery. It states that universal non-uniform recovery is guaranteed if the points in are sufficiently well distributed as measured by the inradius
and the points in
are sufficiently well separated from (i) all points in
) for GUDC, or (ii) all points in span(
) for G-USC. As for the instance non-uniform sparse signal recovery setting, one may check whether the G-UDC and G-USC are satisfied for a given A since all of the quantities in these conditions are easy to compute.
5.3. Conditions for uniform recovery. To derive conditions that guarantee any -sparse vector c can be recovered by BP and OMP regardless of the support of c, we may require that the universal non-uniform recovery conditions in Theorem 5.6 are satisfied for all possible partitions of the data A into
, where
contains
atoms. This leads to the following theorem.
Theorem 5.7. Given a dictionary A, any -sparse vector c is the unique solution in BP(A, b) and OMP(A, b) where b = Ac if for any partition
where
contains
atoms, the columns of
are linearly independent and either the UDC in (4.9) (equivalently, the URC in (4.19)) holds, or the G-UDC in (4.11) holds, or the G-USC in (4.14) holds.
Since we proved the UDC is equivalent to the exact recovery condition (5.4) when the columns of are linearly independent, the requirement in Theorem 5.7 that the UDC holds is equivalent to requiring that the exact recovery condition holds.
Let us now compare the results in Theorem 5.7 with existing results in sparse signal recovery. To this end, the mutual coherence of A is defined as
(5.6) ) := max
which should not be confused with the coherence in (2.1). It is known [12, 35] that if
(5.7)
then both BP and OMP correctly recover any -sparse vector. The following result shows how the mutual coherence condition (5.7) is related to Theorem 5.7.
Theorem 5.8. If the mutual coherence condition (5.7) holds, then the columns of are linearly independent and the G-USC (4.14) holds, which implies using Theorem 4.8 that the G-UDC holds, which implies using Theorem 4.7 that the UDC holds.
Proof. The fact that the columns of are linearly independent when the mutual coherence condition holds is well known in sparse recovery [12]. Therefore, we only need to show that the G-USC holds. We first derive a bound on
. From Lemma 5.5, any nonzero
can be written as
for some 0
1. Letting
) denote the minimum eigenvalue of
, it follows that
Since the diagonal entries of are 1 and the magnitude of the off-diagonal entries is bounded by
), it follows from Gershgorin’s disc theorem that
1
1)
), which then combined with (5.8) gives
for all
. Consequently, Theorem 2.4 implies that
(5.9) = 1
1)
We now give an upper bound on the right-hand side of the G-USC. By definition,
(5.10) (span(
max
:
span(
) and
= 1}.
and its dual problem
Since strong duality holds for linear problems and the primal problem is feasible, we known that . Moreover, we can decompose
into two orthogonal components given by
=
+
, where
span(
). Since v and all columns of
lie in span(
), we have
and
=
1. This establishes that
, and therefore
= 1
. It follows that
which may then be used to show that
Combining this inequality with (5.10), (5.9), and (5.7) yields
)1
1)
) =
)(2
1)
11
1)
)
(5.14)
This proves that the G-USC holds, and completes the proof.
6. Conclusions and Future Work. In this work, we studied the BP and OMP algorithms as a tool for the task of subspace-preserving recovery. Our key results are sufficient conditions for both instance and universal subspace-preserving recovery characterized by the inradius and coherence properties of the data, which have clear geometric interpretations. We further showed that our results apply to the traditional sparse signal recovery problem and that some of our conditions may be regarded as generalization of well-known conditions in sparse recovery. We believe that these results provide new perspectives into the traditional sparse recovery problem.
The analysis in this paper assumes the data are not corrupted by noise. As a follow-up task, it is important to understand whether BP and OMP are able to recover subspace-preserving solutions for noisy data. Prior work on subspace clustering has provided partial results for noisy data [32, 39, 37], indicating that our conditions for subspace-preserving recovery may be generalizable to the case of noisy data as well.
Proof. Since , it follows that
. To prove the opposite inclusion, we take any ¯
that satisfies
1 for all
, and any ¯
, and show that
1.
Since = conv(
), there exist
and
such that
+
) = 1 and ¯
). It follows that
which completes the proof.
Proof. From [31] we know that = 1
. Thus, it remains to show that
= 1/ cos(
). From the definition of the covering radius
we have
By taking the cosine on both sides of (A.2), we get
On the other hand, by the definition of circumradius, we have
(A.4) = max
1 and
To complete the proof we show that the RHS of (A.4) is the reciprocal of (A.3), i.e.,
(A.5) max1 and
= 1/ min
To see that (A.5) holds, let and
be, respectively, an arbitrary solution to the optimization problems on the LHS and RHS of (A.5). Defining ¯
, we get 1
, where the inequality follows because ¯v satisfies the constraints of the optimization on the LHS of (A.5), i.e.,
¯
1 and ¯
. On the other hand, we may define ¯
and get 1
¯
, where the final inequality follows from the fact that ¯w satisfies the constraint of the optimization on the RHS of (A.5). Combining these two gives 1
=
so that (A.5) holds.
Acknowledgments. We would like to acknowledge that the equivalent form of the T-UDC that is given in (4.6) is established by Dr. Mustafa Devrim Kaba at Johns Hopkins University.
[1] R. Basri and D. Jacobs, Lambertian reflection and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), pp. 218–233.
[2] S. Brazitikos, A. Giannopoulos, P. Valettas, and B. Vritsiou, Geometry of Isotropic Convex Bodies:, Mathematical Surveys and Monographs, American Mathematical Society, 2014.
[3] A. Bruckstein, D. Donoho, and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, 51 (2009), pp. 34–81.
[4] T. T. Cai and A. Zhang, Sparse representation of a polytope and recovery of sparse signals and low-rank matrices, IEEE Transactions on Information Theory, 60 (2014), pp. 122–132.
[5] E. Cand`es, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, 346 (2008), pp. 589–592.
[6] E. Cand`es and T. Tao, Decoding by linear programming, IEEE Trans. on Information Theory, 51 (2005), pp. 4203–4215.
[7] E. Cand`es and M. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine, 25 (2008), pp. 21–30.
[8] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20 (1998), pp. 33–61.
[9] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, J. Amer. Math. Soc., 22 (2009), pp. 211–231.
[10] M. A. Davenport and M. B. Wakin, Analysis of orthogonal matching pursuit using the restricted isometry property, IEEE Transactions on Information Theory, 56 (2010), pp. 4395– 4401.
[11] D. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Information Theory, 47 (2001), pp. 2845–2862.
[12] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) , Proceedings of National Academy of Sciences, 100 (2003), pp. 2197–2202.
[13] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. on Information Theory, 52 (2006), pp. 6–18.
[14] E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk, Greedy feature selection for subspace clustering, Journal of Machine Learning Research, 14 (2013), pp. 2487–2517.
[15] E. Elhamifar, G. Sapiro, and R. Vidal, See all by looking at a few: Sparse modeling for finding representative objects, in IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[16] E. Elhamifar and R. Vidal, Sparse subspace clustering, in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2790–2797.
[17] E. Elhamifar and R. Vidal, Block-sparse recovery via convex optimization, IEEE Transactions on Signal Processing, 60 (2012), pp. 4094–4107.
[18] E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013), pp. 2765–2781.
[19] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Applied and Numerical Harmonic Analysis, Springer New York, 2013.
[20] J.-J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE transactions on Information theory, 50 (2004), pp. 1341–1344.
[21] K. Fukuda et al., Frequently asked questions in polyhedral computation, Swiss Federal Institute of Technology, Lausanne and Zurich, Switzerland., (2004).
[22] C.-G. Li, C. You, and R. Vidal, On geometric analysis of affine sparse subspace clustering, IEEE Journal on Selected Topics in Signal Processing, 12 (2018), pp. 1520–1533.
[23] J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends in Computer Graphics and Vision, 8 (2014), pp. 85–283, https:// doi.org/10.1561/0600000058.
[24] , Applied and Computational Harmonic Analysis, 31 (2011), pp. 460–468, https://doi.org/10.1016/j.acha.2011. 04.005.
[25] Q. Mo and Y. Shen, A remark on the restricted isometry property in orthogonal matching pursuit, IEEE Transactions on Information Theory, 58 (2012), pp. 3654–3656, https://doi. org/10.1109/TIT.2012.2185923, http://dx.doi.org/10.1109/TIT.2012.2185923.
[26] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM journal on computing, 24 (1995), pp. 227–234.
[27] J. Nocedal and S. J. Wright, Numerical Optimization, second edition, World Scientific, 2006.
[28] I. Novik, A tale of centrally symmetric polytopes and spheres, in Recent Trends in Algebraic Combinatorics, Springer, 2019, pp. 305–331.
[29] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with application to wavelet decomposition, in Asilomar Conference on Signals, Systems and Computation, 1993.
[30] S. Qaisar, R. M. Bilal, W. Iqbal, M. Naureen, and S. Lee, Compressive sensing: From theory to applications, a survey, Journal of Communications and networks, 15 (2013), pp. 443–456.
[31] M. Soltanolkotabi and E. J. Cand`es, A geometric analysis of subspace clustering with outliers, Annals of Statistics, 40 (2012), pp. 2195–2238.
[32] M. Soltanolkotabi, E. Elhamifar, and E. J. Cand`es, Robust subspace clustering, Annals of Statistics, 42 (2014), pp. 669–699.
[33] H. L. Taylor, S. C. Banks, and J. F. McCoy44 (1979), pp. 39–52.
[34] R. Tibshirani, Regression shrinkage and selection via the LASSO, Journal of the Royal Statistical Society B, 58 (1996), pp. 267–288.
[35] J. Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Transactions on Information Theory, 50 (2004), pp. 2231–2242.
[36] J. A. Tropp, Topics in sparse approximation, PhD thesis, 2004.
[37] Transactions on Information Theory, 64 (2018), pp. 4081–4104.
[38] R. Vershynin, Lectures in geometric functional analysis, (2009).
[39] Y.-X. Wang and H. Xu, Noisy sparse subspace clustering, Journal of Machine Learning Research, 17 (2016), pp. 1–41.
[40] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, Sparse representation for computer vision and pattern recognitiong, Proceedings of the IEEE, 98 (2010), pp. 1031– 1044.
[41] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 (2009), pp. 210–227.
[42] European Conference on Computer Vision, 2016, pp. 731–747.
[43] C. You, C. Li, D. P. Robinson, and R. Vidal, A scalable exemplar-based subspace clustering algorithm for class-imbalanced data, in European Conference on Computer Vision, 2018.
[44] C. You, C.-G. Li, D. P. Robinson, and R. Vidal, Is an affine constraint needed for affine subspace clustering?, in IEEE International Conference on Computer Vision, 2019.
[45] C. You, D. P. Robinson, and R. Vidal, Scalable sparse subspace clustering by orthogonal matching pursuit, in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3918–3927.
[46] C. You, D. P. Robinson, and R. Vidal, Provable self-representation based outlier detection in a union of subspaces, in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4323–4332.
[47] C. You and R. Vidal, Geometric conditions for subspace-sparse recovery, in International Conference on Machine Learning, 2015, pp. 1585–1593.
[48] H. Zhang, M. Yan, and W. Yin, One condition for solution uniqueness and robustness of both l1-synthesis and l1-analysis minimizations, Advances in Computational Mathematics, 42 (2016), pp. 1381–1399.
[49] L.-H. Zhang, W. H. Yang, and L.-Z. Liao, On an efficient implementation of the face algorithm for linear programming, Journal of Computational Mathematics, 31 (2013), pp. 335– 354.