b

DiscoverSearch
About
My stuff
Fast Methods for Recovering Sparse Parameters in Linear Low Rank Models
2016Β·arXiv
ABSTRACT
ABSTRACT

In this paper, we investigate the recovery of a sparse weight vector (parameters vector) from a set of noisy linear combinations. However, only partial information about the matrix representing the linear combinations is available. Assuming a low-rank structure for the matrix, one natural solution would be to first apply a matrix completion to the data, and then to solve the resulting compressed sensing problem. In big data applications such as massive MIMO and medical data, the matrix completion step imposes a huge computational burden. Here, we propose to reduce the computational cost of the completion task by ignoring the columns corresponding to zero elements in the sparse vector. To this end, we employ a technique to initially approximate the support of the sparse vector. We further propose to unify the partial matrix completion and sparse vector recovery into an augmented four-step problem. Simulation results reveal that the augmented approach achieves the best performance, while both proposed methods outperform the natural two-step technique with substantially less computational requirements.

image

completion; missing data

The most common approach in dealing with low-rank models containing missing information is to apply matrix completion methods. Matrix completion has been applied to recommendation problems. There are efficient matrix completion methods for low rank models in the literature such as Singular Value Thresholding (SVT) introduced by Candes et al in [1], and Optspace method by Keshavan et al in [2]. In our paper, we focus on Soft-Impute (SI) completion method which was first brought up by Hastie et al in [3]; we are assuming the low-rank model for data and sparsity for the parameters. In [4], Goldberg et al introduced the concept of direct method in recovering the parameters assuming the low-rank structure for the data matrix which consists of missing entries. The parameters vector is not assumed to be sparse in the problem model in [4]. However, in our paper we assume that it is sparse and linked to the compressed sensing problem. Therefore, the problem includes both the compressed sensing and matrix completion problems. The straightforward approach in dealing with missing data in the literature is to apply matrix completion methods to the data at first to learn the missing information. Afterwards, sparse recovery methods could be applied on the learned data to detect the sparse parameters vector. In section 2, we will elaborate upon the new method we propose in dealing with the aforementioned problem. The main concern in working with big data is that we want to avoid time-consuming algorithms. In big data scenarios, completion methods in the literature are time-consuming since they work based on performing Singular Value Decomposition (SVD) which is considered to be of high time-complexity. Therefore, timeefficient methods for this problem are of great importance due to the numerous applications big data is accompanied with.

We consider the problem of finding the sparse signal 𝛽 in thefollowing true linear model:

image

Therefore |𝑆𝑒𝑝𝑝(𝛽)| = 𝑠. We also suppose that 𝑿  hasmissing entries. For example, we can assume 𝑿 is generated from an initial  𝑿̃ as follows:

image

where βŠ™ denotes the Hadamard product, and π΅π‘’π‘Ÿ(𝛼)denotes Bernoulli distribution with parameter 𝛼 . In the

problem model, we know the matrix  𝑿and the labels π‘Œ. Theidea is to use an initial non-complex completion method, then we argue that by applying Iterative Method of Adaptive Thresholding for Compressive Sensing (IMATCS) as in [5], we can find a good initial approximation of the support of the signal 𝛽. Our previous works have shown that IMATCS has good performance in sparse recovery without applying any initial completion on the raw data or with a simple precompletion step; specifically accessing the raw data with missing samples we can have a better estimation of the support by IMATCS rather than LASSO [8]. Afterwards, we proceed considering the raw data on the support columns of the data with missing entries. Then, after dimension reduction, we have less time complexity in recovering the data on support columns. Even if we use an accurate method on the initial data, we have saved large amount of time due to dimension reduction. In other words, we focus on the support of our parameters, and try to be accurate on recovery on the coulmns relating to the support elements. This way, we avoid the time-consuming matrix completion on the entire data. We will include comparisons between the root mean square error (RMSE) values achieved by applying a matrix completion on the entire data followed by sparse recovery and the RMSE values achieved by the introduced approach. To illustrate our approach, we provide the following flowchart for more clarification (coined as the four-step method):

image

where 𝑀𝐢1denotes the simple matrix completion, 𝑿𝑀𝐢1 isthe data after initial matrix completion. 𝑿𝑆is the data on the support of sparse recovered signal. 𝑿𝑀𝐢2𝑆is the reduced data on which accurate matrix completion is applied. 𝛽̂is the final recovered signal.

Below is the flowchart for the two-step method:

image

Where 𝑿𝑀𝐢2 is the entire data completed with accurate completion method, and 𝛽̂(πœ†1)is the recovered sparse signal in two-step method. It is worth noting that according to [8], IMATCS outperforms LASSO in recovering the sparse signal parameters and therefore, we have included the results of the IMATCS in the final step of the four-step method and provide comparison with LASSO in sparse recovery at the final step to show how IMATCS yields better RMSE.

The SI method solves the following problem iteratively

assuming the data is low-rank.

image

In order to increase the prediction accuracy as well as saving time, we propose a second algorithm by concatenating the data matrix 𝑿with the column 𝑿𝛽. This again, maintains low rank structure specifically for multi label problems. Next, the resulting matrix is taken into account for imputation and the vector 𝛽is imputed inside the structure of this matrix. The intuition behind this approach is that the labels help recover the structure of data matrix since it contains useful information about the rank of 𝑿. One can formulate the problem of finding this vector as follows:

image

This problem is generally a non-convex problem because if we denote [𝑿 𝑿𝛽]  with 𝒁, we have the following relationship for the last column of 𝒁:

image

which makes non-convex. One approach for solving this minimization problem is to apply coordinate descent method which reduces to the two final stages of four-step method method but the structure of the matrix used is different from considering 𝑋 alone. It is worth noting that that this method is only different from the four-step method in the third and fourth stage. We will minimize a different objective, and finally we will show in the results section that applying the aforementioned method we enhance the performance.

In [7], authors have established proofs for bounding the difference between the recovered signal and the original signal in noisy scenario. Here, we came up with the idea of applying theorems in [7] to the matrices completed by SI method, and consider the difference between completed matrices and main data rather than the difference between the noisy matrix and the main data in the final steps of two-phase and four-phase methods.

Let π‘Ώπ‘˜ and π›½π‘˜denote the solution to the π‘˜-th iteration of the SI method. If we apply LASSO on π‘Ώπ‘˜, we can recover the support of π›½π‘˜with good approximation. Then we see that by using fewer iterations we can find the support of π›½π‘˜. Thisway, we can find the important features in recovering the parameters. We assume after the π‘˜-th iteration in Soft-Impute method, we have the following definitions:

image

Let 𝛀̂ denote π‘Ώπ‘˜π‘‡π‘Ώπ‘˜ and 𝛾̂ = π‘Ώπ‘˜

In [7], the authors introduce a condition (lower restricted eigenvalue (RE) condition) as follows:

Lower RE condition: 𝛀̂satisfies lower RE with curvature 𝛼1 β‰₯ 0 and tolerance 𝜏(π‘š, 𝑛) β‰₯ 0 𝑖𝑓 βˆ€ πœƒ: πœƒπ‘‡π›€Μ‚πœƒ β‰₯ 𝛼1||πœƒ||22 βˆ’

image

In our case, 𝛀̂ and 𝛾̂are surrogates for π‘Ώβˆž(the final output of Soft-Impute method) and the original 𝛽, respectively. the two following bounds are proved in [7]:

image

where 𝜎(π‘Ώπ‘˜ βˆ’π‘Ώβˆž )denotes the operator norm of the matrix π‘Ώπ‘˜ βˆ’π‘Ώβˆž , and πœ™ is a decreasing function of the singular value. From now on, we denote 𝜎(π‘Ώπ‘˜ βˆ’π‘Ώβˆž ) with 𝜎𝐷(π‘˜).

Here, we provide an extension to the main result in [7]. If the Lower RE and the two deviation conditions are met, and if there exist ( 𝛼1, 𝜏 such that βˆšπ‘ πœ(π‘š, 𝑛) ≀

image

𝑠there is a universal positive constant 𝑐0such that for any ||𝛽||2 ≀ 𝑏0, π›½π‘˜satisfies the following bounds:

image

Now we show how the deviation bounds could be met.

image

where the last inequality follows from Cauchy Schwartz

inequality. To achieve this bound for this theorem we can set

image

≀ 3𝜎𝐷(2𝜎∞ + 𝜎𝐷)||𝛽||2              (20)In order for the bounds to be valid, we can let πœ™(𝜎𝐷(π‘˜)) =3𝜎𝐷(2πœŽπ‘˜βˆ’πœŽπ·)𝑏0

image

We can set πœ™(𝜎𝐷(π‘˜)) to be the point-wise maximum of

(

image

holds. So now, what is important for bounding ||π›½π‘˜ βˆ’ 𝛽||2according to the theorem is to make 𝜎𝐷(π‘˜) smaller and smaller, to make πœ™(𝜎𝐷(π‘˜))smaller. What we are looking for is the decrease rate in 𝜎𝐷(π‘˜). This is controlled in SI algorithm, and that is the reason we pick this up as a completion method. Since the 𝜎𝐷(π‘˜)is decreasing in each iteration of SI method, the bound gets tighter and the recovered 𝛽is guaranteed to converge to the original 𝛽 inboth two-step and one four-step methods.

image

In this part, we illustrate the results of the simulations. Table 2 shows the runtime required for three algorithms on diverse data sizes. The data is generated by forming the orthonormal matrices and the diagonal matrix containing the singular values as Gaussian random matrices. We observe that the runtime for four-phase and augmented four-phase methods are far less than the time required for the two-phase approach. Table 3 provides the RMSEs for the three methods. As a general rule, the RMSE for four-step method is less than the two step method or in some cases slightly less than that (approximately close). The RMSE for augmented four-phase is generally less than four phase; however, the more runtime is required due to enhanced size of the problem. It is worth mentioning that the runtimes are obtained on a 2.7 GHz Intel Core i7 processor. Fig. 1 shows the cross-validated RMSE over parameters and the comparison between the RMSEs are provided. As we observe the RMSE for the augmented four-step method is less than that of four-step method and the two-step method, respectively. The data is generated by forming the svd and the orthonormal matrices and singular values are assumed to be generated as Gaussian random matrices. The parameters is 15-sparse. As we observe the Augmented four-step method and four-step method outperform the ordinary method in addition to saving more time. In the final stage in the four-step and augmented four-step paper, we also applied the IMAT method in sparse recovery. The RMSEs were quite similar to the case where we applied Lasso in the last step verifying the result in [8]. Thus, we do not provide further tables for the results of IMAT in the last step. We have applied completion and learning algorithm on four-fifth of the data (training data), and found the RMSEs on the test data. The general observation is that the four-phase method outperforms the two-phase and its accuracy could be enhanced by the concept of augmentation as explained.

image

In conclusion, we notice that large runtime is saved if we restrict the completion on the support of the initial approximation of the parameters vector without losing the performance in the prediction. In order to have an initial approximation of the parameters, we have seen that the IMAT method functions well in sparse recovery. We have found that the four-step method of initial completion followed by applying IMAT (initial sparse recovery), accurate matrix

Fig. 1 RMSE values after cross-validation for the three methods on then data with size 2000Γ—500 and 50% missing data and the rank is 100.

[1] Cai, Jian-Feng, Emmanuel J. Candès, and Zuowei Shen. "A singular value thresholding algorithm for matrix completion." SIAM Journal on Optimization pp. 1956-1982, 20 Apr (2010).

[2] Keshavan, Raghunandan H., Sewoong Oh, and Andrea Montanari. "Matrix completion from a few entries." In 2009 IEEE International Symposium on Information Theory, pp. 324-328. IEEE, 2009.

[3] Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of machine learning research pp. 2287-2322., 11 Aug (2010).

[4] Goldberg, Andrew, Ben Recht, Junming Xu, Robert Nowak, and Xiaojin Zhu. "Transduction with matrix completion: Three birds with one stone." InAdvances in neural information processing systems, pp. 757-765., 2010.

[5] Azghani, Masoumeh, and Farokh Marvasti. "Sparse Signal Processing." New Perspectives on Approximation and Sampling Theory. Springer International Publishing, 2014. 189-213.

[6] Marvasti, Farokh, Arash Amini, Farzan Haddadi, Mahdi Soltanolkotabi, Babak Hossein Khalaj, Akram Aldroubi, Saeid Sanei, and Janathon Chambers. "A unified approach to sparse signal processing." EURASIP journal on advances in signal processing 2012 Feb 22, no. 1 (2012): 1.

[7] Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu. "Minimax rates of estimation for high-dimensional linear regression overballs." Information Theory, IEEE Transactions on 57. pp. 6976-6994, Oct 2011.

[8] A.Esmaeili, F.Marvasti, "Comparison of Several Sparse Recovery Methods for Low Rank Matrices with Random Samples.", arXiv:1606.03672.


Designed for Accessibility and to further Open Science