Fast Methods for Recovering Sparse Parameters in Linear Low Rank Models

2016Β·Arxiv

ABSTRACT

ABSTRACT

In this paper, we investigate the recovery of a sparse weight vector (parameters vector) from a set of noisy linear combinations. However, only partial information about the matrix representing the linear combinations is available. Assuming a low-rank structure for the matrix, one natural solution would be to first apply a matrix completion to the data, and then to solve the resulting compressed sensing problem. In big data applications such as massive MIMO and medical data, the matrix completion step imposes a huge computational burden. Here, we propose to reduce the computational cost of the completion task by ignoring the columns corresponding to zero elements in the sparse vector. To this end, we employ a technique to initially approximate the support of the sparse vector. We further propose to unify the partial matrix completion and sparse vector recovery into an augmented four-step problem. Simulation results reveal that the augmented approach achieves the best performance, while both proposed methods outperform the natural two-step technique with substantially less computational requirements.

completion; missing data

1. INTRODUCTION

The most common approach in dealing with low-rank models containing missing information is to apply matrix completion methods. Matrix completion has been applied to recommendation problems. There are efficient matrix completion methods for low rank models in the literature such as Singular Value Thresholding (SVT) introduced by Candes et al in [1], and Optspace method by Keshavan et al in [2]. In our paper, we focus on Soft-Impute (SI) completion method which was first brought up by Hastie et al in [3]; we are assuming the low-rank model for data and sparsity for the parameters. In [4], Goldberg et al introduced the concept of direct method in recovering the parameters assuming the low-rank structure for the data matrix which consists of missing entries. The parameters vector is not assumed to be sparse in the problem model in [4]. However, in our paper we assume that it is sparse and linked to the compressed sensing problem. Therefore, the problem includes both the compressed sensing and matrix completion problems. The straightforward approach in dealing with missing data in the literature is to apply matrix completion methods to the data at first to learn the missing information. Afterwards, sparse recovery methods could be applied on the learned data to detect the sparse parameters vector. In section 2, we will elaborate upon the new method we propose in dealing with the aforementioned problem. The main concern in working with big data is that we want to avoid time-consuming algorithms. In big data scenarios, completion methods in the literature are time-consuming since they work based on performing Singular Value Decomposition (SVD) which is considered to be of high time-complexity. Therefore, timeefficient methods for this problem are of great importance due to the numerous applications big data is accompanied with.

2. PROBLEM MODEL

We consider the problem of finding the sparse signal following true linear model:

Therefore . We also suppose that missing entries. For example, we can assume is generated from an initial as follows:

where denotes the Hadamard product, and denotes Bernoulli distribution with parameter

problem model, we know the matrix and the labels idea is to use an initial non-complex completion method, then we argue that by applying Iterative Method of Adaptive Thresholding for Compressive Sensing (IMATCS) as in [5], we can find a good initial approximation of the support of the signal . Our previous works have shown that IMATCS has good performance in sparse recovery without applying any initial completion on the raw data or with a simple precompletion step; specifically accessing the raw data with missing samples we can have a better estimation of the support by IMATCS rather than LASSO [8]. Afterwards, we proceed considering the raw data on the support columns of the data with missing entries. Then, after dimension reduction, we have less time complexity in recovering the data on support columns. Even if we use an accurate method on the initial data, we have saved large amount of time due to dimension reduction. In other words, we focus on the support of our parameters, and try to be accurate on recovery on the coulmns relating to the support elements. This way, we avoid the time-consuming matrix completion on the entire data. We will include comparisons between the root mean square error (RMSE) values achieved by applying a matrix completion on the entire data followed by sparse recovery and the RMSE values achieved by the introduced approach. To illustrate our approach, we provide the following flowchart for more clarification (coined as the four-step method):

where denotes the simple matrix completion, the data after initial matrix completion. is the data on the support of sparse recovered signal. is the reduced data on which accurate matrix completion is applied. is the final recovered signal.

Below is the flowchart for the two-step method:

Where is the entire data completed with accurate completion method, and is the recovered sparse signal in two-step method. It is worth noting that according to [8], IMATCS outperforms LASSO in recovering the sparse signal parameters and therefore, we have included the results of the IMATCS in the final step of the four-step method and provide comparison with LASSO in sparse recovery at the final step to show how IMATCS yields better RMSE.

The SI method solves the following problem iteratively

assuming the data is low-rank.

3. AUGMENTED FOUR-STEP METHOD

In order to increase the prediction accuracy as well as saving time, we propose a second algorithm by concatenating the data matrix with the column . This again, maintains low rank structure specifically for multi label problems. Next, the resulting matrix is taken into account for imputation and the vector is imputed inside the structure of this matrix. The intuition behind this approach is that the labels help recover the structure of data matrix since it contains useful information about the rank of . One can formulate the problem of finding this vector as follows:

This problem is generally a non-convex problem because if we denote , we have the following relationship for the last column of

which makes non-convex. One approach for solving this minimization problem is to apply coordinate descent method which reduces to the two final stages of four-step method method but the structure of the matrix used is different from considering alone. It is worth noting that that this method is only different from the four-step method in the third and fourth stage. We will minimize a different objective, and finally we will show in the results section that applying the aforementioned method we enhance the performance.

4. ANALYSIS OF TWO-PHASE AND FOUR-PHASE METHODS

In [7], authors have established proofs for bounding the difference between the recovered signal and the original signal in noisy scenario. Here, we came up with the idea of applying theorems in [7] to the matrices completed by SI method, and consider the difference between completed matrices and main data rather than the difference between the noisy matrix and the main data in the final steps of two-phase and four-phase methods.

Let denote the solution to the -th iteration of the SI method. If we apply LASSO on , we can recover the support of with good approximation. Then we see that by using fewer iterations we can find the support of way, we can find the important features in recovering the parameters. We assume after the -th iteration in Soft-Impute method, we have the following definitions:

Let

In [7], the authors introduce a condition (lower restricted eigenvalue (RE) condition) as follows:

satisfies lower RE with curvature and tolerance

In our case, are surrogates for (the final output of Soft-Impute method) and the original , respectively. the two following bounds are proved in [7]:

where denotes the operator norm of the matrix is a decreasing function of the singular value. From now on, we denote

Here, we provide an extension to the main result in [7]. If the Lower RE and the two deviation conditions are met, and if there exist (

there is a universal positive constant such that for any satisfies the following bounds:

Now we show how the deviation bounds could be met.

where the last inequality follows from Cauchy Schwartz

inequality. To achieve this bound for this theorem we can set

In order for the bounds to be valid, we can let

We can set to be the point-wise maximum of

(

holds. So now, what is important for bounding according to the theorem is to make smaller and smaller, to make smaller. What we are looking for is the decrease rate in . This is controlled in SI algorithm, and that is the reason we pick this up as a completion method. Since the is decreasing in each iteration of SI method, the bound gets tighter and the recovered is guaranteed to converge to the original both two-step and one four-step methods.

5. SIMULATION RESULTS

In this part, we illustrate the results of the simulations. Table 2 shows the runtime required for three algorithms on diverse data sizes. The data is generated by forming the orthonormal matrices and the diagonal matrix containing the singular values as Gaussian random matrices. We observe that the runtime for four-phase and augmented four-phase methods are far less than the time required for the two-phase approach. Table 3 provides the RMSEs for the three methods. As a general rule, the RMSE for four-step method is less than the two step method or in some cases slightly less than that (approximately close). The RMSE for augmented four-phase is generally less than four phase; however, the more runtime is required due to enhanced size of the problem. It is worth mentioning that the runtimes are obtained on a 2.7 GHz Intel Core i7 processor. Fig. 1 shows the cross-validated RMSE over parameters and the comparison between the RMSEs are provided. As we observe the RMSE for the augmented four-step method is less than that of four-step method and the two-step method, respectively. The data is generated by forming the svd and the orthonormal matrices and singular values are assumed to be generated as Gaussian random matrices. The parameters is 15-sparse. As we observe the Augmented four-step method and four-step method outperform the ordinary method in addition to saving more time. In the final stage in the four-step and augmented four-step paper, we also applied the IMAT method in sparse recovery. The RMSEs were quite similar to the case where we applied Lasso in the last step verifying the result in [8]. Thus, we do not provide further tables for the results of IMAT in the last step. We have applied completion and learning algorithm on four-fifth of the data (training data), and found the RMSEs on the test data. The general observation is that the four-phase method outperforms the two-phase and its accuracy could be enhanced by the concept of augmentation as explained.

6. CONCLUSION

In conclusion, we notice that large runtime is saved if we restrict the completion on the support of the initial approximation of the parameters vector without losing the performance in the prediction. In order to have an initial approximation of the parameters, we have seen that the IMAT method functions well in sparse recovery. We have found that the four-step method of initial completion followed by applying IMAT (initial sparse recovery), accurate matrix

Fig. 1 RMSE values after cross-validation for the three methods on then data with size 2000500 and 50% missing data and the rank is 100.

7. REFERENCES

[1] Cai, Jian-Feng, Emmanuel J. CandΓ¨s, and Zuowei Shen. "A singular value thresholding algorithm for matrix completion." SIAM Journal on Optimization pp. 1956-1982, 20 Apr (2010).

[2] Keshavan, Raghunandan H., Sewoong Oh, and Andrea Montanari. "Matrix completion from a few entries." In 2009 IEEE International Symposium on Information Theory, pp. 324-328. IEEE, 2009.

[3] Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of machine learning research pp. 2287-2322., 11 Aug (2010).

[4] Goldberg, Andrew, Ben Recht, Junming Xu, Robert Nowak, and Xiaojin Zhu. "Transduction with matrix completion: Three birds with one stone." InAdvances in neural information processing systems, pp. 757-765., 2010.

[5] Azghani, Masoumeh, and Farokh Marvasti. "Sparse Signal Processing." New Perspectives on Approximation and Sampling Theory. Springer International Publishing, 2014. 189-213.

[6] Marvasti, Farokh, Arash Amini, Farzan Haddadi, Mahdi Soltanolkotabi, Babak Hossein Khalaj, Akram Aldroubi, Saeid Sanei, and Janathon Chambers. "A unified approach to sparse signal processing." EURASIP journal on advances in signal processing 2012 Feb 22, no. 1 (2012): 1.

[7] Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu. "Minimax rates of estimation for high-dimensional linear regression overballs." Information Theory, IEEE Transactions on 57. pp. 6976-6994, Oct 2011.

[8] A.Esmaeili, F.Marvasti, "Comparison of Several Sparse Recovery Methods for Low Rank Matrices with Random Samples.", arXiv:1606.03672.

designed for accessibility and to further open science