Low-rank matrix completion (LRMC) has attracted a lot of attention in recent years because of its broad range of applications, e.g., recommender systems and collaborative filtering [1] and image processing [2].
The problem entails exactly recovering all the entries in a rank-r matrix, given only a subset of its entries. LRMC is usually studied under a missing-at-random and bounded-coherence model. Under this model, necessary and suffi-cient conditions for perfect recovery are known [3–8]. Other approaches require additional coherence and spectral gap conditions [9], use rigidity theory [10], algebraic geometry and matroid theory [11] to derive necessary and sufficient conditions for completion of deterministic samplings, but a characterization of completable sampling patterns remains an important open question.
We say an incomplete matrix is finitely rank-r completable if there exist at most finitely many rank-r matrices that agree with all its observed entries. There exist sampling/observation patterns that guarantee finite completablility, but if just a single one of the observed entries is instead missing, then there are infinitely many completions. Conversely, adding a few observations to such a pattern guarantees unique completability. Thus, finite completablility is the tipping point in LRMC.
Whether a matrix is finitely completable depends on which entries are observed. Yet no characterization of the sets of observed entries that allow or prevent finite completablility is known.
The main result of this paper is a sampling condition for finite completablility, that is, a condition on the observed entries of a matrix to guarantee that it can be completed in at most finitely many ways. In addition, we provide deterministic sampling conditions for unique completability that can be efficiently verified. Finally, we show that uniform random samplings with O(max{r,log d}) entries per column satisfy these conditions with high probability.
Our results have implications on LRMC regarding lower bounds, sample and computational complexity, the role of coherence, adaptive settings and validation conditions to verify the output of any completion algorithm. We complement our theoretical results with experiments that support our findings and motivate future analysis of uncharted sampling regimes.
Organization of the Paper
In Section 2 we formally state the problem and our main results. In Section 3 we discuss their implications in the context of previous work, and present our experiments. We present the proof of our main theorem in Section 4, and we leave the proofs of our other statements to Sections 5 and 6.
Let denote the incomplete version of a
, rank-r data matrix X, observed only in the nonzero locations of
, a
matrix with binary entries. The goal of LRMC is to recover X from
.
This problem is tantamount to identifying the r-dimensional subspace spanned by the columns in X, and this is how we will approach it. First observe that since X is rank-r, a column with fewer than r samples cannot be completed. A column with exactly r observations can be uniquely completed once
is known, but it provides no information to identify
. We will thus assume that:
The key insight of the paper is that observing r + 1 entries in a column of X places one constraint on what may be. For example, if we observe r + 1 entries of a particular column, then not all r-dimensional subspaces will be consistent with the entries. If we observe more columns with r + 1 entries, then even fewer subspaces will be consistent with them. In effect, each column with r + 1 observations places one constraint that an r-dimensional subspace must satisfy in order to be consistent with the observations. The observed entries in different columns may or may not produce redundant constraints. As we will see, the pattern of observed entries determines whether or not the constraints are redundant, thus indicating the number of subspaces that satisfy them. The main result of this paper is a simple condition on the pattern of observed entries that guarantees that only a finite number of subspaces satisfies all the constraints. This in turn provides a simple condition for exact matrix completion.
Remark 1. We point out that any observation, in addition to the r + 1 per column that we assume, cannot increase the number of rank-r matrices that agree with the observations. So in general, if some columns of X are observed on more than r + 1 entries, all we need is that the observed entries include a pattern with exactly r + 1 observations per column satisfying our sampling conditions.
Also notice that completing X is the same as completing , so a row with fewer than r observations cannot be completed. While we do not assume that each row is observed on at least r entries, our sampling conditions guarantee that this is the case.
Let Grdenote the Grassmannian manifold of r-dimensional subspaces in
. Observe that each
rank-r matrix X can be uniquely represented in terms of a subspace
Gr
(spanning the columns of X) and an
coefficient matrix
. See Figure 1 to build some intuition. Let
denote the uniform measure on Gr
, and
the Lebesgue measure on
. Our statements hold for almost every (a.e.) X with respect to the product measure
.
The paper’s main result is the following theorem, which gives a deterministic sampling condition to guarantee that at most a finite number of r-dimensional subspaces are consistent with .
Given a matrix, let denote its number of columns and
the number of its nonzero rows.
Figure 1: Each column in a rank-r matrix X corresponds to a point in an r-dimensional subspace . In these figures,
is a 2-dimensional subspace (plane) in general position. In the left, the columns of X are in general position inside
, that is, drawn independently according to an absolutely continuous distribution with respect to the Lebesgue measure on
, for example, according to a gaussian distribution on
. In this case, the probability of observing a sample as in the right, where all columns lie in a line inside
, is zero. Our results hold for every rank-r matrix, except for a set of measure zero of pathological cases as in the right.
The proof of Theorem 1 is given in Section 4. In words, condition (i) asks that every subset of n columns of has at least n/r + r nonzero rows.
such that has exactly r+1 nonzero entries per column. This way, each column of
encodes exactly one constraint that candidate subspaces must satisfy in order to be consistent with the observed data. In this case we can simply take
to be the matrix formed with the first
columns of
. One may verify that
satisfies (i). Hence
satisfies the conditions of Theorem 1.
Unique Completability
Theorem 1 is easily extended to a condition on that is sufficient to guarantee that one and only one subspace is consistent with
, which in turn suffices for exact matrix completion.
The proof of Theorem 2 is given in Section 5. In words, condition (ii) asks that every subset of n columns of has at least n+r nonzero rows. Notice that (1) is a weaker condition than (2), but (1) is required to hold for all the subsets of
columns, while (2) is required to hold only for all the subsets of
columns.
Example 2. Consider as in Example 1. Take
to be the matrix formed with the first
columns of
and
to be the matrix formed with the last
columns of
. One may verify that
satisfies (i) and that
satisfies (ii). Hence
satisfies the conditions of Theorem 2.
Theorem 1 implies that columns with r + 1 entries are necessary for finite completablility (hence also for unique completability). There are cases when
columns are also sufficient for unique completability, e.g., if r = 1, where finite completablility is equivalent to unique completability (see Proposition 1).
In general, though, unique completability requires more columns than finite completablility (see Example 5). Theorem 2 gives deterministic sufficient sampling conditions for unique completability that only require columns. This shows that with just a few more observations, unique completability follows from finite completablility.
We point out that when the conditions of Theorem 2 are met, can be uniquely identified as
where V is the unique solution to the polynomial system F(V) = 0, with F as defined in Section 4.
Once is known, X can be perfectly recovered observing only r entries per column. To see this, let
be a basis of
, and let
be a subset of {1,... ,d} with exactly r elements. We will use the subscript
to denote restriction to the rows in
. Since the coefficients of column x in the basis
are given by
, we can recover the entire column as
.
In general, verifying condition (i) in Theorems 1 and 2 may be computationally prohibitive, especially for large d. On the other hand, one can easily and effi-ciently verify whether (ii) is satisfied by checking the dimension of the null-space of a sparse matrix (Algorithm 1). Fortunately, there is a tight relation between conditions (i) and (ii), summarized in the following lemma. The proofs of the statements in this section are given in Section 6.
Lemma 1. Let be a
matrix formed with a subset of the columns in
. Suppose
can be partitioned into r matrices
, each of size
, such that (ii) holds for every
. Then
satisfies (i).
As consequence of Lemma 1 we obtain an additional sufficient condition for completability that only involves (ii).
Corollary 1. Let be given, and suppose A1 holds. Then almost every X can be uniquely recovered from
if
contains r + 1 disjoint matrices
, each of size
, such that (ii) holds for every
.
Example 3. Consider as in Example 1. We can partition
into
, as depicted in Example 1. One may verify that
satisfies (ii) for every
1. Hence
satisfies the conditions of Corollary 1.
With Corollary 1 we show that completable patterns appear with high probability under uniform random sampling schemes with as little as O(max{r,log d}) samples per column.
Theorem 3. Let 0 1 be given. Suppose
and that each column of X is observed in at least
entries, distributed uniformly at random and independently across columns, with
Then with probability at least 1 will be finitely rank-r completable (if
) and uniquely completable (if
).
In many situations, though, sampling is not uniform. For instance, in vision, occlusion of objects can produce missing data in very non-uniform random patterns. In cases like this, we can partition (e.g., randomly) into matrices
, each with
columns. We can use Algorithm 1 below to determine whether each
satisfies (ii). If this is the case,
is completable by Corollary 1. More about this is discussed in Section 3.
To present the algorithm, let us introduce the matrix A that will allow us to determine efficiently whether a sampling satisfies (ii). Let
be a matrix formed with ˘
columns of
, and let
index the nonzero entries in the
column of
. Let U be a
matrix drawn according to
, an absolutely continuous distribution with respect to the Lebesgue measure on
, and let
denote the restriction of U to the nonzero rows in
. Let
be a nonzero vector in ker
, and
be the vector in
with the entries of
in the nonzero locations of
and zeros elsewhere. Finally, let A denote the
˘N matrix with
as columns.
Algorithm 1 will verify whether dimker , and this will determine whether
contains a
matrix
satisfying (ii). The key insight behind Algorithm 1 is that A encodes the information of the projections of S = span{U} onto the canonical coordinates indicated by
. Theorem 1 in [21] shows that these projections will uniquely determine S if and only if dimker
, which will be the case if and only if
contains a
matrix
satisfying (ii). We thus have the following corollary, which states that with probability 1, Algorithm 1 will determine whether
contains a matrix
satisfying (ii).
Corollary 2. Let be a matrix formed with ˘
columns of
. Construct A as in Algorithm 1. Then
-almost surely,
contains a
matrix
satisfying (ii) if and only if dim ker
.
Algorithm 1 can also be used to design completable samplings. As will be discussed in Section 3, this can be particularly useful for adaptive settings, where one may choose which entries to observe, yet it is undesirable or impossible to observe full columns or full rows.
Example 4. One may use Algorithm 1 to verify that each of the r + 1 blocks in the sampling matrix below satisfies (ii). This implies that
satisfies the conditions of Corollary 1, and can thus be uniquely completed. In contrast with Example 1, this pattern does not sample full columns nor full rows.
In this section we discuss implications of the results stated above and explore how well they predict performance in a series of simulation experiments. In all our experiments we use the so-called iterative hard-thresholded SVD (IHTSVD) algorithm [12]. This algorithm iterates between truncating the SVD of the current estimate to a user-specified rank r, and then replacing the values in the observed entries with their original (observed) values. This algorithm is also quite similar to the Singular Value Thresholding algorithm [13], OptSpace [14] and FPCA [15]. In the very low sampling regimes of interest in our studies, we found the IHTSVD algorithm typically performed as well or better than several other completion algorithms (e.g., SVT [13], GROUSE [16], alternating minimization [17] and EM [18]).
Lower bound
It is easy to see that max{r,log d}) uniformly randomly sampled entries per column are necessary to complete an
matrix. This is because a column with fewer than r observed entries cannot be completed, and if fewer than O(log d) uniformly random samples per column are observed, then a row may be completely unobserved with large probability, making it impossible to complete a matrix. Thus,
max{r,log d}) is a lower bound for LRMC.
It was further shown [4] that there exist matrices that cannot be completed unless log d) uniformly randomly sampled entries per column are observed, where
is the standard coherence parameter defined as
where denotes the projection operator onto
, and
the
canonical vector in
. Our results imply that this is only the case for a set of matrices with measure zero, and that a.e. matrix can be uniquely completed with as little as
max{r,log d}) uniformly randomly sampled samples per column, regardless of
.
To better understand this, and see that our results do not contradict previous theory, let us revisit the proof of Theorem 1.7 in [4]. The proof is based on the construction of block-diagonal matrices with blocks of size and coherence
that cannot be recovered with fewer than
log d) uniformly random samples per column, e.g.,
This is so because zero valued entries provide no information for the reconstruction process. It follows that the larger , the smaller the blocks will be, and more intensive random sampling would be required to guarantee that entries in
Figure 2: Theoretical sampling regimes of LRMC. In the white region, where the dashed line is given by , it is easy to see that LRMC is impossible by a simple count of the degrees of freedom in a subspace (see Section 4). In the light-gray region, LRMC is possible provided the entries are observed in the right places, e.g., satisfying the conditions of Theorem 2. By Theorem 3, uniform random samplings will satisfy these conditions with high probability as long as
, hence with high probability LRMC is possible in the dark-grey region. Previous analyses showed that LRMC is possible from uniform random sampling in the striped region [3], but the rest remained unclear until now.
the diagonal blocks are observed. This is why more samples (log d) per column) are required to reconstruct more coherent matrices like this one, and hence the dependency on
in the bound of Theorem 1.7 in [4].
However, matrices with this block structure have measure zero (with respect to the measure defined above). Our results show that for a.e. matrix, an incomplete column contains the same exploitable information regardless of the coherence parameter, and O(max{r,log d}) uniform random entries per column are sufficient for completion. This means that while there are some matrices that require log d) uniform random samples per column for reconstruction, a.e. matrix only requires O(max{r,log d}), regardless of
.
Sample Complexity
Coherence aside, it is also known that N = d columns, and log d) uniform random samples per column are sufficient for completion [3]. Theorem 3 extends this result, showing that
columns and
max{r,log d}) uniform random samples per column are sufficient to uniquely complete a.e. matrix. This exposes an interesting tradeoff between the required number of columns and observed entries per column for completion, defining new unstudied sampling regimes where completion is now known to be possible (Figure 2).
The purpose of our first experiment is to support that max{r,log d}) random samples per column are truly sufficient for LRMC, as opposed to O(r log d). To this end, we will study the behavior of the IHTSVD algorithm as a function of the ambient dimension d and the rank r (see the beginning of Section 3 for a discussion of this algorithmic choice).
To obtain low-rank matrices, we first generated a random matrix
with N(0,1) i.i.d. entries to use as basis of
. We then generated an
1
random matrix
, also with N(0,1) i.i.d. entries, to use as coefficient vectors, to construct
. Matrices generated this way are known to have low coherence.
Figure 3: Results of the IHTSVD algorithm as a function of the ambient dimension d and the number of uniform random samples per column 7. In each of the 2, 000 trials we declared a success if the normalized completion error was below 10
(using normalized Frobenius norm). The black line represents the linear discriminant between success and failure trials.
Next, for different values of the rank r, we tested whether a matrix could be completed as a function of its ambient dimension d and the number of uniform random samples per column . For example, the results of this experiment for r = 7 can be seen in Figure 3.
We then computed the linear discriminant between successful and failure trials for each value of r. If log d) samples were necessary, we would expect the slope between these lines to grow proportionally to r. However, the results, depicted in Figure 4, show that the slope of these lines remain fairly constant, and the offset grows with r, supporting that
max{r,log d}) samples are sufficient.
Computational Complexity
Our results show that completion is theoretically possible with as little as with max{r,log d}) uniform random samples per column, or even with as little as
1 (provided they are located in the right places). Nevertheless, this may involve solving the system of polynomial equations F = 0 (see Section 4), which is computationally impractical. It is thus currently unknown whether there exist practical completion algorithms for these uncharted sampling regimes.
We now present a series of experiments that suggest three things: first, that even in cases where LRMC is theoretically possible, missingness seems to come at a price: the more missing data the more computationally expensive completion seems to be. This further suggests that there is a minimal sampling regime where, though theoretically possible, LRMC might be computationally prohibitive in practice. Second, that even though theoretically, whether a.e. matrix can be completed does not depend on its coherence, in practice, extremely coherent matrices may be computationally more expensive to complete. Sim-
Figure 4: Linear discriminants for different values of the rank r, between successful (above line) and unsuccessful (below line) completions for the experiment in Figure 3. That is, for a given above the linear discriminant typically succeeds at completion, and below the linear discriminant typically fails. Theorem 3 shows that
uniform random observations per column are sufficient for completion. The slope of these lines remain fairly constant, and the offset grows with r, supporting this result.
ilarly, this suggests that there is a maximal coherence regime where, though theoretically possible, LRMC might be computationally prohibitive in practice. And third, there seems to be an additional uncharted sampling regime with log d) samples per column where completion is computationally feasible.
To summarize, we have the following sampling regimes, where LRMC is:
We first study the computational cost of missing data. To this end we computed the minimum number of iterations required to complete a matrix, as a function of the number of uniform random samples per column . The results are summarized in Figure 5. Unsurprisingly, the more missing data, the more iterations are required to complete the matrix.
In addition, we constructed samplings with only
1 samples per column selected uniformly at random, and kept only those samplings satisfying the conditions of Corollary 1, to guarantee that
were uniquely completable (we used Algorithm 1 to determine whether each sampling satisfied these conditions). Unfortunately, even though
was uniquely completable, the matrix was incorrectly completed in every single trial. This suggests that completion in this regime, now known to be theoretically possible (through the solution of the polynomial system F = 0; see Section 4), might be computationally prohibitive in practice.
We thus tested how much missing data can practical algorithms handle while remaining computationally efficient. To this end, we sampled log d entries per column, drawn uniformly at random, and ran the IHTSVD algorithm for
Figure 5: Average number of iterations (over 500 trials) required by IHTSVD to complete a matrix with low coherence (3) with an accuracy of 10
(using normalized Frobenius norm), as a function of
, the proportion of uniform random samples per column, with ambient dimension d = 500 and rank r = 10.
Figure 6: Average completion error of IHTSVD (over 500 trials) for different levels of additive i.i.d. zero-mean Gaussian noise (noise variance as indicated in the legend), after at most 250 iterations, as a function of , the proportion of uniform random samples per column, with ambient dimension d = 500 and rank r = 10. Previous guarantees would require all entries to be observed, and so in practice, existing theory would not allow one to confirm the correctness of a completion. Our results do. The dashed line represents
with
, the sufficient condition of Theorem 3, which implies that with probability at least 1
above this threshold, a rank-r completion is guaranteed to be correct, regardless of the completion method.
at most T = d iterations (see the beginning of Section 3 for a discussion of this algorithmic choice).
To truly test this regime, we considered a setup where previous theory would require all entries to be observed to guarantee a correct completion with probability at least 1 . There are plenty of such scenarios. We arbitrarily selected d = 500 and r = 10, and
.
Our simulations, summarized in Figure 6, show that practical algorithms tend to work consistently well with log
. This suggests that there is a regime with
log
samples per column where completion is computa- tionally feasible, even in the presence of noise.
Figure 7: Results of the IHTSVD algorithm as a function of the coherence parameter and the proportion of uniform random samples per column
, with ambient dimension d = 100 and rank r = 5. Similar results were observed for other algorithms, including alternating minimization [17] and EM [18]. In each of the 5, 000 trials we declared a success if the normalized completion error was below 10
(using normalized Frobenius norm). Our theoretical results show that whether a.e. matrix can be uniquely completed does not depend on its coherence. This experiment suggests that in practice, this is also the case for most of the range of
. For instance, given p, the success rate of this algorithm is about the same for most of the range of
17). Nevertheless, the success rate quickly decays if the coherence is extremely high (
close to the maximum possible,
Dependence on Coherence Parameter
In our next experiment, we study the practical role of coherence in LRMC. More precisely, we tested whether a matrix could be computationally efficiently completed as a function of its coherence parameter , and the number of uniform random samples per column
(to generate matrices with a specific coherence parameter, we simply increased the magnitude of a few entries in
, until it had the desired coherence). The results, summarized in Figure 7, suggest that for most of the coherence range, whether this algorithm can correctly complete the matrix mainly depends on the number of samples rather than on the coherence parameter. For instance, see in Figure 7 that given the number of samples, the success rate of this algorithm is about the same for most of the range of
. Nonetheless, there are some cases with extremely large coherence (
close to the maximum possible,
, corresponding to subspaces almost perfectly aligned with the canonical axes), where this algorithm tends to fail more often at reconstructing the matrix (these are cases where most of the information is concentrated in only a few entries, which brings computational and numerical accuracy problems).
To further study the role of coherence in practice, we recorded the number of iterations that were required to complete each matrix (in the success cases of the previous experiment). The results, summarized in Figure 8, suggest
Figure 8: Average number of iterations (of the success trials from Figure 7) required by IHTSVD to complete a matrix with an accuracy of 10(using normalized Frobenius norm), as a function of its coherence parameter
. This suggests the existence of a maximal coherence regime (e.g., after the dashed line) where, though theoretically possible, completion may become computationally impractical.
that while coherent matrices may be theoretically as completable as incoherent ones, in practice, the more coherent a matrix is, the more computationally expensive it may be to complete it. Furthermore, the number of iterations seems to increase steadily for most of the coherence range, but after a transition point it suddenly seems to grow exponentially, suggesting the existence of a maximal coherence regime where, though theoretically possible, completion may be computationally impractical (similar to the minimal sampling regime from our previous experiment).
New Guarantees
It is known that log d) uniform random samples per column (with constants greater than 1) are sufficient for completion [3]. There are non-pathological regimes (e.g., d = 500 and r = 10, or d = 100 and r = 5, and ideal coherence, as in our experiments) where these conditions end up requiring that all entries are observed. Experiments show that the IHTSVD algorithm can exactly complete such matrices when even fewer than half of the entries are observed, but prior theory gives no guarantees in these regimes, and so in practice, one would be unable to confirm the correctness of a completion.
Furthermore, typical conditions for LRMC usually apply to matrices with bounded coherence, and require uniform random sampling with rates that depend on the coherence parameter . In many practical applications, sampling is hardly uniform (e.g., vision, where occlusion of objects produce missing data in very non-uniform random patterns), and
is typically unknown, so the existing theory does not allow one to confirm the correctness of a completion.
Our results shed new light on these issues. Theorem 2 states that regardless of coherence and the sampling model, if the observation pattern satisfies the conditions of the theorem, a rank-r completion, obtained by any method whatsoever, is guaranteed to be the correct completion. In particular, Theorem 3 states that this will be the case with high probability under uniform random
Figure 9: We generated 1 samples per column, selected uniformly at random, with d = 100, r = 5. This figure shows the proportion of times (over 500 trials) that
contains a
satisfying (ii), as a function of ˘N. We used Algorithm 1 to determine whether this was the case. Notice that as ˘N grows, the probability that each
contains an
satisfying (ii) quickly approaches 1.
sampling models.
In some cases one can use Corollary 1 together with Algorithm 1 to verify efficiently and deterministically whether these conditions are satisfied. Recall that Corollary 1 states that unique completability is possible if contains r +1 disjoint matrices
, each of size
satisfying (ii). Given a matrix
, Algorithm 1 allows to verify whether it satisfies (ii). However, it provides no means to select the
’s.
In general, one can construct samplings for which finding the right ’s would require exponential time. However, if the samples are well spread across the rows, then one may validate a completion deterministically by selecting the
’s randomly.
To see this, suppose has (r+1) ˘N columns, with ˘
and exactly r+1 observations per column (see Remark 1). We can randomly partition
into r+1 disjoint submatrices
, each of size ˘N. One can then use Algorithm 1 to verify whether each
contains an
submatrix
satisfying (ii). If this is the case, then we know deterministically that the completion is correct.
Figure 9 shows that as ˘N grows, the probability that each contains an
satisfying (ii) quickly approaches 1. For example, with ˘N as small as 2
, i.e., with only twice as many columns as strictly necessary, each
will contain an
satisfying (ii) with probability larger than .999. This suggests that if we find a low-rank completion of a matrix, we can expect that a random partition will certify it through Corollary 1 and Algorithm 1.
This way, our results can be used to certify the correctness of a completion, thus bringing guarantees applicable to any algorithm, under any sampling model, in lieu of coherence assumptions.
Adaptive Sampling
If one could select which entries of X to observe, perhaps the easiest way to recover X is to sample r linearly independent columns to obtain a basis of the subspace, and then r rows to obtain the coefficients of each column in this basis. However, in many LRMC applications, the entries one may observe can be limited. Take for example recommender systems, where obtaining a complete column equates to asking a single user (column) to evaluate every item (row). In these problems the number or rows can be very large, hence this can be an unreasonable thing to ask. Moreover, the combinations of rows that one may sample could be restricted. An other example arises in distributed settings, where at each location one may only sample certain subsets of all the information.
Our results tell us exactly which entries to look for. Furthermore, it is fairly simple to construct sampling patterns that satisfy the conditions of Theorems 1 and 2 and Corollary 1 that do not require to sample full columns or rows. For instance, we can generate random samplings, use Algorithm 1 to verify whether they satisfy condition (ii) (most of them will; see Figure 9), and keep them or discard them depending on this.
Deterministic constructions are also possible. For instance, it is easy to verify that each of the blocks in Example 4 satisfies (ii), which implies satisfies the conditions of Corollary 1, and can thus be uniquely completed. This example corresponds to asking the
user of each block to rate items i through i+r. We conclude that if the entries one may choose to observe are limited (as is the case in many LRMC applications), one can directly apply our results to adaptively design observation patterns that guarantee completability.
For any subspace, matrix or vector that is compatible with a set of indices , we will use the subscript
to denote its restriction to the coordinates/rows in
. For example, letting
denote the indices of the nonzero rows of the
column of
, then
and
denote the restrictions of the
column in X and
, to the indices in
. We say that an r-dimensional subspace
if
.
The Variety S
Let us start by studying the variety of all r-dimensional subspaces that fit . First observe that in general, the restriction of an r-dimensional subspace to
coordinates is
. We formalize this in the following definition, which essentially states that a subspace is non-degenerate if its restrictions to
coordinates are
.
Definition 1 (Degenerate subspace). We say Gr
is degenerate if and only if there exists a set
with
, such that dim
.
Let denote the uniform measure on Gr
. A subspace is degenerate if and only if an
submatrix of one of its bases is rank-deficient. This is equates to having a zero determinant. Since the determinant is a polynomial in the entries of a matrix, this is a condition of
-measure zero.
Since -almost every subspace is non-degenerate, let us consider only the subspaces in Gr
Gr
, the set of all non-degenerate r-dimensional subspaces of
.
Define Gr
such that every
fits
, i.e.,
Let be a basis of
. The condition
is equivalent to saying that there exists a vector
such that
We can see that if has fewer than r observations, (5) will be an underdetermined system with infinitely many solutions, and hence
can be completed in infinitely many ways.
If has exactly r observations, (5) becomes a system with r equations and r unknowns (the elements of
). This will be the case for every
Gr
. Hence a column with exactly r observations can be uniquely completed once
is known, but it provides no information to identify
.
On the other hand, if has exactly r + 1 observations, then (5) becomes an overdetermined system with r + 1 equations and r unknowns. This imposes one constraint on the elements of
, thus restricting the set of subspaces that fit
.
In general, each column with r + 1 observations will impose one constraint that may reduce one of the degrees of freedom in Gr
. Therefore, one necessary condition for completion is that
imposes at least
constraints.
We will now study these constraints and characterize when exactly will they reduce all the degrees of freedom in Gr
, thus restricting
to a set with at most finitely many elements.
Let be a partition of the r + 1 elements of
, such that
has exactly r elements, and
has only one element. We can then expand (5) as
Since S is non-degenerate, is full-rank, so we may solve for
using the top block to obtain
. Plugging this on the last row, we have that (5) is equivalent to:
On the other hand, lies in
by assumption. This implies that there exists a unique
such that
where is a basis of
. Substituting (7) in (6) we obtain
Recall that , where
and
denote the adjugate and the determinant of
. Therefore, we may rewrite (8) as the following polynomial equation:
We conclude that a subspace S with basis U fits if and only if U satisfies (9) for every i = 1,... ,N.
Since every nontrivial subspace has infinitely many bases, even if there is only one r-dimensional subspace in , the variety
has infinitely many solutions. Therefore, we will associate a unique U with each subspace as follows. Observe that for every Gr
, we can write S = span{U} for a unique U in the following column echelon form:
On the other hand, every defines a unique r-dimensional subspace of
, via span{U}. Moreover, span{U} will be non-degenerate for almost every V, with respect to
: the Lebesgue measure on
. Let
denote the set of all
matrices V whose span{U} is non-degenerate, or equivalently, whose
submatrices of U are full-rank. Then we have a bijection between Gr
and
via
span{U}. It follows that a statement holds for
-almost every pair
if and only if it holds for
-almost every pair
. We will use these measures interchangeably.
The Set F
Continuing with our analysis, recall that a subspace S with basis U will fit if and only if U satisfies (9) for every i. With this in mind, define
with U and in the column echelon form in (10). We will use
as shorthand, with the understanding that
is a polynomial in the elements of V, and that the elements of
and
play the role of coefficients.
and use F(V), or simply F as shorthand, with the understanding that F is a set of polynomials in the elements of V, and that the elements of and
play the role of coefficients. We will also use F = 0 as shorthand for
. This way, we may rewrite:
In general, the affine variety
could contain an infinite number of elements. We are interested in conditions that guarantee there is only one or (slightly less demanding) only a finite number. The following lemma states that this will be the case if and only if polynomials in F are algebraically independent.
Lemma 2. Let A1 hold. For a.e. contains at most finitely many subspaces if and only if
polynomials in F are algebraically independent.
Proof. By our previous discussion, for a.e. X there are at most finitely many subspaces in if and only if there are at most finitely many points in V(F). We know from algebraic geometry that this will be the case if and only if dim V(F) = 0 (see, e.g., Proposition 6 in Chapter 9, Section 4 of [19]).
Since , we know that if dimV(F) = 0, then F must contain
algebraically independent polynomials (see, e.g., Exercise 16 in Chapter 9, Section 6 of [19]).
On the other hand, we know that dim V(F) = 0 if polynomials in F are a regular sequence (see, e.g., Exercise 8 in Chapter 9, Section 4 of [19]).
Finally, since being a regular sequence is an open condition, it follows that for -almost every
, polynomials in F are algebraically independent if and only if they are a regular sequence (see, e.g., Remark 3.4 in [20]).
Remark 2. The next part of our analysis studies conditions to guarantee that the polynomials in F are algebraically independent. Following up on Remark 1, any observation, in addition to the r + 1 per column that we assume, cannot increase the number of subspaces that agree with the observations. In effect, each observed entry, in addition to the first r + 1 observations, places one additional polynomial constraint analogous to . However, the polynomials produced by the same column share the same coefficient
. Intuitively, this means that the polynomials are no longer generic. While these polynomials might or might not be algebraically dependent, in general it is difficult to determine which is the case.
For this reason we assume A1: that each column is observed on exactly r +1 entries. This way the r + 1 entries in each column produce only one polynomial constraint. This guarantees that we only use one polynomial per column, so that all the coefficients of the polynomials in F are generic, and easier to study. In general, if some columns are observed on more than r + 1 entries, all we need is that the observed entries contain a pattern with exactly r + 1 observations per column satisfying our sampling conditions.
Algebraic Independence
By the previous discussion, there are at most finitely many r-dimensional subspaces that fit if and only if there is a subset ˜F of
polynomials in F that is algebraically independent.
Whether this is the case depends on the supports of the polynomials in ˜F, i.e., on : the subset of columns in
corresponding to such polynomials. Lemma 3 shows that the polynomials in ˜F will be algebraically independent if and only if
satisfies the conditions in Theorem 1.
Lemma 3. Let A1 hold. For a.e. X, the polynomials in ˜F are algebraically dependent if and only if for some matrix
formed with a subset of the columns in
.
To show this statement we will use Lemmas 4 and 5 below.
Let be a subset of the columns in
, and let
be the subset of the
polynomials in ˜F corresponding to such columns. Notice that
only involves the variables in U corresponding to the
nonzero rows of
.
Let be the largest number of algebraically independent polynomials in
.
Proof. Observe that the column echelon form in (10) was chosen arbitrarily. As a matter of fact, for every permutation of rows and every
Gr
, we may write S = span{U}, for a unique U in the following permuted column echelon form:
For example, we could take to swap the top and bottom blocks in (10), and take U in the following form:
Observe that in general, U, V and F will be different for each choice of . Nevertheless, the condition
is invariant to the choice of basis of S. This implies that while different choices of
produce different F’s, the variety
is the same for every . This implies that the number of algebraically independent polynomials in
is invariant to the choice of . Therefore, showing that Lemma 4 holds for one particular
suffices to show that it holds for every
.
With this in mind, take such that U is written with the identity block in the position of r nonzero rows of
.
Since the polynomials in only involve the elements of the
rows of U corresponding to the nonzero rows of
, and U has the identity block in the position of r nonzero rows of
, it follows that the polynomials in
only involve the
variables in the
corresponding rows of V. Furthermore,
has at least one solution. This implies
, as desired.
We say is minimally algebraically dependent if the polynomials in
are algebraically dependent, but every proper subset of the polynomials in
is algebraically independent.
Lemma 5. For a.e. X, if is minimally algebraically dependent, then
1.
In order to prove Lemma 5 we will need the next two lemmas.
Lemma 6. Take such that
. For a.e. X, if
is minimally algebraically dependent, then all solutions to
satisfy
.
The intuition behind this lemma is as follows: suppose for contrapositive that there are infinitely many solutions to
with
. Each of these solutions defines a different subspace. Since
is minimally algebraically dependent, a.e. solution to
must fit
. This will only happen if
lies in the intersection of infinitely many r-dimensional subspaces, which is at most
-dimensional. But since
is drawn from
(an r-dimensional subspace), we know that almost surely
will not lie in such
-dimensional subspace.
Proof. Suppose that is minimally algebraically dependent, and let
denote the row of V corresponding to
, such that
simplifies into
Since involves
must contain at least one polynomial in
(otherwise
cannot be minimally algebraically dependent). This means that
contains at least one polynomial
involving
:
We want to show that if is minimally algebraically dependent, then
is the only solution to
. So define
, and assume for contradiction that there exists a solution to
with
and
, that is also a solution to
.
Next consider the univariate polynomials in vevaluated at this solution:
and observe that since are a solution to
, then
and
must have a common root.
We know from elimination theory that two distinct polynomials have a common root if and only if their resultant
is zero (see, for example, Proposition 8 in Chapter 3, Section 5 of [19]).
But is a polynomial in the coefficients of
and
. In other words,
for some nonzero polynomial h in
and
. Therefore,
0 for
-almost every
(since the variety defined by h = 0 has measure zero). Equivalently,
0 for a.e. X. Since
0, it follows that
and
do not have a common root v
, which is the desired contradiction.
This will be true for either almost every in an infinite collection, or for every
in a finite collection. In the first case, we would conclude that
has infinitely fewer solutions than
, in contradiction to the minimally algebraically dependent assumption. In the second case, we conclude that
is the only solution to
.
Since vwas an arbitrary entry of
, we conclude that for a.e. X, if
is minimally algebraically dependent, then
is the only solution to
, as desired.
Define as the partition of the variables involved in the polynomials in
, such that all the variables in
are uniquely determined by
.
Lemma 7. Suppose and that every
is a polynomial in at least one of the variables in
. Then for a.e. X, all the variables involved in
are uniquely determined by
.
Proof. Let vbe one of the variables in
and let
be a polynomial in
involving v
. By assumption on
also involves at least one of the variables in
, say v.
v. This way,
is shorthand for
.
mined by .
ate polynomial
Now assume for contradiction that there exists another solution to with
. Let
be an other solution to
, and define
We will first show that . To see this, recall the definition of
, and observe that it depends on the choice of
. Nevertheless, it is easy to see that
0 describes the same variety regardless of the choice of
. Intuitively, this means that even though
might look different for each choice of
, it really is the same.
Therefore, we may select to be the element of
corresponding to the position of a variable of w that takes different values in
and
. This way, a variable with multiple solutions is located in the location of
. Since
is linear in
, it follows that
for
-almost every
.
Now observe that since v is uniquely determined by and
have a common root, which immediately implies that there are at most finitely many distinct
. Otherwise, v would be a common root to infinitely many distinct polynomials, which
-almost surely cannot be the case.
We know from elimination theory that two distinct polynomials have a common root if and only if their resultant
is zero (see, for example, Proposition 8 in Chapter 3, Section 5 of [19]).
But is a polynomial in the coefficients of g and
. In other words,
for some nonzero polynomial h in
and
. Therefore,
0 for
-almost every
(since the variety defined by h = 0 has measure zero). Equivalently,
0 for a.e. X.
Since 0, it follows that g and
do not have a common root v, which is the desired contradiction. This is true for all of the finitely many
. This shows that for a.e. X, all the variables in w (including v
) are uniquely determined by
.
Since vwas an arbitrary element in
, we conclude that all the variables in
are uniquely determined by
.
With this, we are now ready to present the proofs of Lemma 5, Lemma 3 and Theorem 1.
Proof. (Lemma 5) By the same arguments as in Lemma 4, whether is minimally algebraically dependent is invariant to any permutation
of the rows of the column echelon form in (10). Therefore, showing that Lemma 5 holds for one particular choice of
suffices to show it holds for every
.
With this in mind, suppose is minimally algebraically dependent. Take
such that U and
are written in the column echelon form in (10) with the identity block in the rows indexed by
, and let
denote the row of V corresponding to
, such that
We know by Lemma 6 that is uniquely determined by
. We will now iteratively use Lemma 7 to show that all the variables in
(which are the same as the variables in
) are also uniquely determined by
. This will imply that all the variables in
are finitely determined by
, and that
contains the same number of polynomials,
, as variables,
, which is the desired conclusion.
First observe that since is finitely determined by
must contain at least r polynomials in
. Denote these polynomials by
.
We will proceed inductively, indexed by 1. First, set t = 1 and define
. We showed above that the variables in
are uniquely determined by
. Suppose that
involves some variables other than those in
. Note that every polynomial in
involves at least one of the variables in
. Let
be the set of all variables involved in
. By Lemma 7, all the variables in
are uniquely determined by
.
We will now proceed inductively. For any 2, let
be a subset of
variables in V. Assume that all the variables in
are uniquely determined by
. Since dim
dim
, it follows that all the variables in
are finitely determined by
. It follows that
must contain at least
algebraically independent polynomials, each involving at least one of the variables in
. Let
be this set of polynomials. Suppose
involves some variables other than
. Define
to be the set of all variables involved in
. By Lemma 7, all the variables in
are uniquely determined by
.
Since this is true for every t, and there are finitely many variables, this process must terminate at some finite step T, at which point is a set of
algebraically independent polynomials in
variables.
This means that all the variables in are finitely determined by
, and since
only involves a subset of the variables in
, it follows that the polynomials in
are algebraically dependent. Furthermore, since
is minimally algebraically dependent by assumption, we have that
.
Finally, observe that contains
polynomials in
variables. Since
, and
has
polynomials in
variables, it follows that
, as desired.
Proof. (Lemma 3)
() Suppose
is minimally algebraically dependent. By Lemma 5,
, and we have the first implication.
() Suppose there exists an
with
. By Lemma 4,
, which implies the polynomials in
, and hence ˜F, are algebraically dependent.
Proof. (Theorem 1)
() Suppose for contrapositive that for every
formed with
columns of
, there exists an
formed with a subset of its columns such that
. Lemma 3 implies that the polynomials in
, and hence ˜F, are algebraically dependent. It follows by Lemma 2 that there are infinitely many subspaces in
.
() Suppose that for some
formed with
columns of
, every
formed with a subset of the columns in
satisfies
, including
. By Lemma 3, the
polynomials in ˜F are algebraically independent. It follows by Lemma 2 that there are at most finitely many subspaces in
, hence at most finitely many rank-r completions of
.
In this section we give the proof of Theorem 2. We will use and
to denote the
and
submatrices of
corresponding to
and
. In addition, let
and
denote the
columns of
and
.
In order to prove Theorem 2, we will require Theorem 1 in [21], which we state here as the following lemma, with some minor adaptations to our context.
Lemma 8. Suppose is a
matrix with binary entries for which (ii) holds and let
Gr
. Then for
-almost every
if and only if
.
With this, we are ready to give the proof of Theorem 2.
Proof. (Theorem 2) Suppose contains two disjoint matrices
and
satisfying the conditions of Theorem 2.
Since satisfies (i), by Theorem 1 there are at most finitely many r-dimensional subspaces that fit
. Equivalently, the set ˜F, containing the
polynomials defined by the columns in
, is algebraically independent. Let ˆ
be the polynomial defined by
. It follows that the set
ˆ
is algebraically dependent. Let
be a subset of the polynomials in ˜F, such that
ˆ
is minimally algebraically dependent. Then any subspace S with basis U that fits
must satisfy
, implying by Lemma 6 that
.
Therefore, every S that fits both and
must satisfy
. Since
satisfies (ii), it follows by Lemma 8 that
.
In Section 2 we mentioned that there are cases where columns with only r+1 samples are sufficient for unique completability. The next result states that this is indeed the case if r = 1.
Proposition 1. If r = 1, finite completablility is equivalent to unique completability.
Proof. Assume r = 1. Then and
are scalars, so
simplifies into:
This implies that F = 0 is a system of linear equations, hence if it has finitely many solutions, it has only one.
In Section 2 we also mentioned that in general, strictly more than columns with only r + 1 samples are necessary for unique completability. We would like to close this section with an example where this is the case.
It is easy to see that that satisfies the conditions of Theorem 1. One may also verify (for example, solving explicitly F(V) = 0) that for a.e. X there exist two subspaces that fit
.
As a matter of fact, this will also be the case for any permutation of the rows and columns of this matrix. One may construct similar samplings with the same property for larger d and r. All this to say that this is not a singular pathological example; there are many samplings that cannot be uniquely recovered with only columns with r + 1 samples.
In this section we present the proofs of Lemma 1 and Theorem 3. The proof of Corollary 1 follows directly from Lemma 1 and Theorem 2, and the proof of Corollary 2 follows directly from Theorem 1 in [21].
Proof. (Lemma 1) Suppose contains disjoint matrices
satisfying the conditions of Lemma 1. Let
be a matrix formed with a subset of the columns in
. Then
for some matrices
formed with subsets of the columns in
.
It follows that
Assume without loss of generality that this maximum is achieved when 1. Then
where the last two inequalities follow because (2) holds for every by assump- tion, and because
for every
.
Since was arbitrary, we conclude that (1) holds for every matrix
formed with a subset of the columns in
.
The following lemma shows that (ii) is satisfied with high probability under uniform random sampling schemes with only O(max{r,log d}) samples per column.
Lemma 9. Let the assumptions of Theorem 3 hold, and let be a matrix formed with
columns of
. With probability at least 1
will satisfy (ii).
Proof. Let E be the event that for some matrix
formed with a subset of the columns in
. It is easy to see that this will only occur if there is a matrix
formed with n columns of
that has all its nonzero entries in the same
1 rows. Let
denote the event that the matrix formed with the first n columns from
has all its nonzero entries in the first
1 rows. Then
If each column of contains at least
nonzero entries, distributed uniformly and independently at random with
as in (3), it is easy to see that
0 for
, and for
,
For the terms in (12), write
Since ,
and since ,
In this case, since 1 and
, we have
which we may rewrite as
where the last step follows because 3 log
6 log d + 6.Substituting (16) and (18) in (12) and (13), we have that
.
We are now ready to give the proof of Theorem 3.
Proof. (Theorem 3) If , randomly select disjoint matrices
, each formed with
columns of
.
Union bounding over , we may upper bound the probability that
fails to satisfy the conditions of Lemma 1 by
The first part of the statement follows because the conditions in Lemma 1 imply the conditions in Theorem 1.
If , randomly select disjoint matrices
, each formed with
columns of
. By the same arguments, the probability that
fails to satisfy the conditions of Theorem 2 is upper bounded by:
In this paper we give sampling conditions for finite rank-r completability, that is, conditions on the set of observed entries to guarantee that a matrix can be completed in at most finitely many ways. We also provide deterministic sampling conditions for unique completability that can be efficiently verified. In addition, we show that uniform random samplings with only O(max{r,log d}) observed entries per column satisfy these conditions with high probability. These findings have several implications on LRMC regarding lower bounds, sample and computational complexity, the role of coherence, adaptive settings and the validation of any completion algorithm.
We would like to thank Louis Theran for pointing out a mistake in a previous version of the paper. In that earlier version we erroneously assumed that columns with more than r+1 observed entries would yield multiple independent constraints. However, as Theran pointed out through the following example in [11], these constraints may be algebraically dependent. For this reason, in our current analysis we use only one constraint per column.
Example 6. Suppose X is a rank-2 matrix observed on the entries indicated by 1’s
Here is observed on r + 2 entries. Using Definition 1 in our earlier version of this paper,
yields the two central columns of ˘
The two central columns in ˘correspond to
, and encode the two polynomial constraints obtained from the observed entries in
.
While the sampling ˘satisfies the completability conditions of Theorem 1 in our earlier version of this paper, X cannot be completed. This is because the two polynomials defined by
share the same coefficients, which makes them algebraically dependent. If instead of
we observed two columns on the entries indicated by
, then we would also obtain two polynomial constraints. Only these polynomials would have generic coefficients, and would be algebraically independent with probability 1. In fact, a matrix observed on the entries indicated in ˘
(or more) can indeed be completed.
[1] J. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, International Conference on Machine Learning, pp. 713-719, 2005.
[2] K. Weinberger and L. Saul, Unsupervised learning of image manifolds by semidefinite programming, International Journal of Computer Vision, 70(1), pp. 77-90, 2006.
[3] E. Cand`es and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9(6), pp. 717-772, 2009.
[4] E. Cand`es and T. Tao, The power of convex relaxation: near-optimal matrix completion, IEEE Transactions on Information Theory, 56(5), pp. 2053-2080, 2010.
[5] B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, 12, pp. 3413-3430, 2011.
[6] D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Transactions on Information Theory, 57(3), pp. 1548-1566, 2011.
[7] Y. Chen, Incoherence-optimal matrix completion, IEEE Transactions on Information Theory, 61(5), pp. 2909-2923, 2013.
[8] Y. Chen, S. Bhojanapalli, S. Sanghavi and R. Ward, Coherent matrix completion, International Conference on Machine Learning, pp. 674-682, 2014.
[9] S. Bhojanapalli and P. Jain, Universal matrix completion, International Conference on Machine Learning, pp. 1881-1889, 2014.
[10] A. Singer and M. Cucuringu, Uniqueness of low-rank matrix completion by rigidity theory, SIAM Journal on Matrix Analysis and Applications, 31(4), pp. 1621-1641, 2010.
[11] F. Kir´aly, L. Theran and R. Tomioka, The algebraic combinatorial approach for low-rank matrix completion, Journal of Machine Learning Research, pp. 1391-1436, 2015.
[12] E. Chunikhina, R. Raich and T. Nguyen, Performance analysis for matrix completion via iterative hard-thresholded SVD, IEEE Statistical Signal Processing Workshop, pp. 392-395, 2014.
[13] J. Cai, E. Cand`es and and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20(4), pp. 1956-1982, 2010.
[14] R. Keshavan, A. Montanari and S. Oh, Matrix completion from a few entries, IEEE Transactions on Information Theory, pp. 324-328, 2010.
[15] S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Mathematical Programming, 128(1-2), pp. 321-353, 2011.
[16] L. Balzano, R. Nowak, and B. Recht, Online identification and tracking of subspaces from highly incomplete information, Allerton Conference on Communication, Control and Computing, pp. 704-711, 2010.
[17] P. Jain, P. Netrapalli and S. Sanghavi, Low-rank matrix completion using alternating minimization, ACM Symposium on Theory Of Computing, pp. 665-674, 2013.
[18] D. Pimentel-Alarc´on, L. Balzano and R. Nowak, On the sample complexity of subspace clustering with missing data, IEEE Statistical Signal Processing Workshop, pp. 280-283, 2014.
[19] D. Cox, J. Little and D. O’shea, Ideals, varieties, and algorithms: An introduction to computational algebraic geometry and commutative algebra, Third Edition, Springer, 2007.
[20] A. Aramova, L. Avramov and J. Herzog, Resolutions of monomial ideals and cohomology over exterior algebras, Transactions of the American Mathematical Society, 352(2), pp. 579-594, 2000.
[21] D. Pimentel-Alarc´on, N. Boston and R. Nowak, Deterministic conditions for subspace identifiability from incomplete sampling, IEEE International Symposium on Information Theory, pp. 2191-2195, 2015.