Gaussian Process Latent Variable Model Factorization for Context-aware Recommender Systems

2019·Arxiv

Abstract

Context-aware recommender systems (CARS) have gained increasing attention due to their ability to utilize contextual information. Compared to traditional recommender systems, CARS are, in general, able to generate more accurate recommendations. Latent factor approaches account for a large proportion of CARS. Recently, a non-linear Gaussian process (GP) based factorization method was shown to outperform the state-of-the-art methods in CARS. Despite their effectiveness, GP model-based methods can suffer from over-fitting and may not be able to determine the impact of each context automatically. To address these shortcomings, we propose a Gaussian Process Latent Variable Model Factorization (GPLVMF) method, in which we apply an appropriate prior to the original GP model. Our work is primarily inspired by the Gaussian Process Latent Variable Model (GPLVM), a non-linear dimensionality reduction method. As a result, we significantly improve performance on real datasets while also capturing the importance of each context. In addition to these general advantages, our method provides two main contributions for recommender system settings: (1) addressing the influence of bias by setting a non-zero mean function, and (2) utilizing real-valued contexts by fixing the latent space with real values.

1 Introduction

With the advent of the era of big data, users are suffering from the information overload problem. Recommender systems are designed to help users find out items of interest, while context-aware recommender systems (CARS) refine recommendations by exploiting additional contextual information that can have an impact on users’ behavior. For example, a user may have preferences on (1) the time of weekday (or weekend) when he/she watches a movie, and (2) the location of office (or home) where he/she uses a mobile phone app. Thus, researchers have introduced CARS that extend the user–item modeling to a more complex user–item–context modeling.

Among existing paradigms for recommender systems, collaborative filtering approaches, which predict the interests of a user by collecting preferences from other users, are the most popular due to their good accuracy and scalability [12]. For this reason, they are also the focus of this paper. We concentrate on the class of latent factor methods based on collaborative filtering, which typically involves matrix factorization methods [13] for traditional recommender systems and tensor factorization methods [11] for CARS.

In the realm of latent factor methods for CARS, there are two popular classes of approaches. The first class is based on tensor factorization methods [11, 33], which extend the classical two-dimensional matrix factorization to an n-dimensional tensor factorization. The second class is based on factorization machines [27, 4] which originated from a general predictor [26] that can use second-order feature combinations efficiently. However, both classes adopt the linear combination of second-order or higher-order latent factors to represent user–item–context interactions. In this work, we seek a natural way to capture the inherent non-linear structure of real-world data by the Gaussian process (GP).

The GP is a widely used stochastic process in machine learning [25]. Standard GP models can only deal with supervised machine learning tasks, while the Gaussian process latent variable model (GPLVM) [15] is designed to address unsupervised problems. Supervised GP learning of user preferences and unsupervised dimensionality reduction from ratings to latent factors constitute the foundation of GP-based collaborative factorization methods. Two works have been proposed that investigate non-linear matrix factorization for conventional recommender systems [16] and non-linear factorization for CARS [21]. Both use the Gaussian process to realize non-linear factorization and achieve state-of-the-art performance compared to linear methods.

Despite the successful application of the Gaussian process latent variable model to context-aware recommender systems, the model itself cannot infer the impact of each context and is prone to over-fitting because it does not marginalize out the latent variables [5]. To address these two drawbacks of GP-based factorization for CARS, we introduce a prior on the latent variables. Inspired by the Gaussian process latent variable model of [30], where a variational inference framework is applied for training, we further implement a generalized variational inference that includes a non-zero mean function in the GP to solve the model. In addition, our model is flexible enough to integrate both categorical and real-valued contexts: for real-valued contexts the latent variable is fixed to the corresponding real values, while for categorical contexts a prior is placed on the latent variable.

As a result, we have developed a powerful non-linear collaborative filtering method, which we name Gaussian Process Latent Variable Model Factorization (GPLVMF), to further improve the performance of GP-based factorization methods for CARS. To summarize, the main contributions of this paper are:

• We propose a novel algorithm named GPLVMF to achieve a Bayesian non-linear factorization for CARS. Unlike existing GP-based methods, GPLVMF addresses the over-fitting problem by placing a prior distribution on the latent variables.

• We model a non-zero mean function in the Gaussian processes to capture the bias and provide a generalized variational inference solution. Our method can also flexibly deal with both real-valued and categorical contexts.

• Both scaled conjugate gradient (SCG) and stochastic gradient descent (SGD) optimization methods are applied to solve the model. The experimental results show that GPLVMF not only improves accuracy on the real datasets but can also automatically infer the influence of each context.

2 Related Work

There are three main approaches to integrating context information into recommender systems [2], namely: (1) contextual pre-filtering, where contextual information is used to filter the data before a traditional recommender algorithm is applied [1, 23], (2) contextual post-filtering, where contextual information is initially ignored and the output of a traditional recommender algorithm is then filtered using the context [24], and (3) contextual modeling, where contextual information is integrated directly into the model. The last approach is the focus of this paper, since it does not require supervision and fine-tuning in all steps [27].

In contextual modeling, we introduce three classes of approaches. The first class is based on matrix factorization [3]. Later, [18] used biased matrix factorization as the base model, while [32] adopted matrix completion modeling to address CARS. The second class is based on tensor factorization [11]. Subsequently, [8, 29] proposed tensor factorization-based methods for implicit feedback data. The final class is based on factorization machines [26]. A factorization machines-based method to solve the cross-domain problem was proposed by [19], while a higher-order factorization machine was proposed by [4]. However, these methods treat user–item–context interactions as linear combinations of latent factors and are insufficient to capture the complex non-linear inter-relationships between the three entities in CARS.

GPs have been applied to recommendation problems. A non-linear method for matrix factorization based on GPs [16] for traditional recommender systems was shown to outperform standard matrix factorization methods. Collaborative Gaussian processes for preference learning [9] were proposed to learn pairwise preferences expressed by multiple users. The GP has also been used to address the challenge of ranking recommendations in click-feedback recommender systems [31]. Recently, a GP-based factorization machine for context-aware recommender systems was proposed [21], which can outperform factorization machines and tensor factorization methods. However, none of these methods has studied the application of the Gaussian process latent variable model to CARS.

3 Gaussian Process Latent Variable Model Factorization

In this section, we elaborate on our proposed Bayesian Gaussian process factorization method. First, we explain how to use latent factors to represent the user–item–context interactions for CARS. Then we describe the Gaussian process latent variable model factorization via introducing a prior to latent factors. Finally, we derive a variational inference for solving the model.

3.1 Latent Representation

A conventional recommender system is an information filtering system that seeks to predict the rating a user would give to an item. We denote the sets of users and items by $\mathcal{U}$ and $\mathcal{V}$, respectively, where $N = |\mathcal{U}|$ and $L = |\mathcal{V}|$. In the context-aware setting, multiple contexts describe the contextual information (location, time, etc.) associated with each rating.

The latent factor approach defines a transformation from an observation to its latent representation. Let each element of the entities, including the user, the item, and each of the multiple contexts, be represented by a real-valued vector in the corresponding latent space, such as $\mathbf{U}$ for users and $\mathbf{V}$ for items. The latent spaces for users, items, and contexts may each have their own dimensionality, so that $\mathbf{U}$ and $\mathbf{V}$ are real-valued matrices with one row per user and per item, respectively. Here, we use bold capital letters to denote matrices, bold lowercase letters for vectors, and regular letters for scalars.

3.2 Gaussian Process

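A Gaussian process prior defines a distribution over a real-valued function $f(\mathbf{x})$. Formally, the function values $\mathbf{f}$ on any finite collection of inputs $\mathbf{X}$ follow a multivariate Gaussian distribution. A GP is completely specified by its mean function $m(\mathbf{x})$ and covariance function $k(\mathbf{x}, \mathbf{x}')$, denoted $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, and the distribution can be written as:

$$p(\mathbf{f} \mid \mathbf{X}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}, \mathbf{K}), \tag{1}$$

where $\mathbf{m}$ is the mean vector and $\mathbf{K}$ is the covariance matrix whose elements are the covariance function evaluated at all pairs of inputs. In common usage, the mean function is assumed to be zero, $m(\mathbf{x}) = 0$, which is not the case in our work. Different covariance functions suit different applications; a popular example is the RBF kernel,

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right), \tag{2}$$

where $\sigma_f^2$ and $\ell$ are known as the signal variance and length-scale. A Gaussian likelihood function between the observations $\mathbf{y}$ and the function values $\mathbf{f}$ accounts for the noise,

$$p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \beta^{-1}\mathbf{I}), \tag{3}$$

where $\beta$ is the noise precision and $\mathbf{I}$ is an identity matrix of size $T$. Integrating out the function values $\mathbf{f}$, we obtain the marginal likelihood function,

$$p(\mathbf{y} \mid \mathbf{X}) = \mathcal{N}(\mathbf{y} \mid \mathbf{m}, \mathbf{K} + \beta^{-1}\mathbf{I}). \tag{4}$$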
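As an illustration of the marginal likelihood obtained by integrating out the function values, the following is a minimal NumPy sketch of its logarithm for a GP with a non-zero mean vector and an RBF kernel. The function names and the arguments `signal_var`, `length_scale`, and `beta` (noise precision) are illustrative assumptions, not notation or code from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, signal_var=1.0, length_scale=1.0):
    """RBF kernel: signal_var * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_log_marginal_likelihood(X, y, mean, beta, signal_var=1.0, length_scale=1.0):
    """log N(y | mean, K + I / beta), computed via a Cholesky factorization."""
    T = X.shape[0]
    C = rbf_kernel(X, X, signal_var, length_scale) + np.eye(T) / beta
    L = np.linalg.cholesky(C)
    r = y - mean                                   # residual after removing the mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    # -0.5 * log|C| equals -sum(log(diag(L))) for the Cholesky factor L.
    return -0.5 * r @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * T * np.log(2 * np.pi)
```

Setting `mean` to a vector of zeros recovers the usual zero-mean case; a non-zero `mean` only shifts the residual that enters the quadratic term.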

3.3 Mean and Kernel for CARS

For standard GP regression and classification tasks, the mean function is usually assumed to be zero, since data bias can be eliminated by pre-processing. In recommender systems, however, bias is a well-known phenomenon. For example, some users may prefer certain items and consistently give them high ratings. To account for this, we allocate additional variables to users, items, and contexts. Let the matrices $\mathbf{B}$ represent the latent spaces of the corresponding entities for the mean, each with its own dimensionality.

For each rating given by user $n$ to item $l$ under the associated contexts, we define a mean function:

Figure 1. Overview of our proposed GPLVMF model. First, a recommendation dataset consisting of two users is shown on the left side. Grey is the user, green is the item, yellow is the first context, blue is the second context, and orange is the rating. The latent spaces for the mean and the kernel are shown in the middle; the width of each rectangle represents the dimensionality of each latent vector. We use the same colors and subscripts as the items, first contexts, and second contexts in the dataset to indicate the corresponding latent vectors. Finally, the GPs from the collection of latent vectors to the ratings are illustrated on the right side. The latent space is factorized into two groups for both the mean and the kernel, one per user.

Here, the mean function involves a parameter capturing the bias of user $n$ and a latent vector composed of the corresponding entries $\mathbf{b}$ of the latent space for the mean.

Besides, we focus on the automatic relevance determination (ARD) squared exponential kernel between latent vectors associated with two ratings for user n:

where the weights are inverse length-scales and $\mathbf{x}$ represents a latent vector composed of the corresponding entries $\mathbf{v}$. The ARD squared exponential kernel helps the system automatically select the dimensionality of its latent features [30]. These weights therefore give insight into the impact of each context.
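The role of the inverse length-scales can be sketched with the standard ARD squared exponential form, $k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\tfrac{1}{2}\sum_q w_q (x_q - x'_q)^2\right)$. The symbols $w_q$ and $\sigma_f^2$, and the function name below, are assumptions chosen for illustration; the paper's own notation may differ.

```python
import numpy as np

def ard_se_kernel(X1, X2, w, signal_var=1.0):
    """ARD squared exponential kernel with one inverse length-scale w[q] per dimension.

    k(x, x') = signal_var * exp(-0.5 * sum_q w[q] * (x[q] - x'[q])**2)
    """
    diff = X1[:, None, :] - X2[None, :, :]          # shape (n1, n2, Q)
    return signal_var * np.exp(-0.5 * np.sum(w * diff**2, axis=-1))

# Two latent vectors that differ in both dimensions.
X = np.array([[0.0, 5.0], [1.0, -3.0]])

# With w[1] = 0 the second dimension is switched off entirely,
# so only the difference in the first dimension affects the covariance.
K = ard_se_kernel(X, X, w=np.array([1.0, 0.0]))
```

A dimension whose weight is driven towards zero during training no longer influences the covariance, which is why the learned weights can be read as a relevance score for the latent dimensions of each context.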

3.4 Probabilistic Factorization

Having introduced the Gaussian process, we seek a natural probabilistic interpretation of factorization based on GPLVM. An overview of GPLVMF is shown in Figure 1. For each user $n$, we collect the latent variables of the entities rated by that user, separately for the kernel and for the mean, together with the vector of the corresponding ratings by user $n$, whose length is the number of ratings by that user. Our user-centric factorization method for context-aware recommender systems takes the probabilistic form:

where $\mathbf{X}$ is the collection of all the latent vectors and $\mathbf{Y}$ is the collection of all the ratings. The GPs are taken to be independent across users, and the likelihood function for user $n$ is written as:

We note that the latent space of users is marginalized out in this likelihood function; alternatively, one could integrate out the latent space of items to obtain another likelihood function [16].

The optimization method adopted by GP factorization for both conventional [16] and context-aware [21] recommender systems is to find the maximum likelihood estimate of the latent variables $\mathbf{X}$ while jointly maximizing with respect to the hyper-parameters. However, this approach is prone to over-fitting and cannot automatically determine the dimensionality of the latent space, as noted for GPLVM [30].

To adopt a fully Bayesian treatment of the latent space, we assign a prior density to $\mathbf{X}$. In this work, we use the standard normal density for the latent variables, while we use a delta function prior to fix the latent variables associated with real-valued contexts. The normal distribution for $\mathbf{X}$ can be written as:

We note the distribution of the latent variables for the mean function has the same form as the kernel function and is dropped from the expression for simplification. The joint probability for the model is:

However, this is not analytically tractable. While a sampling method was proposed by [28] for the Bayesian probabilistic matrix factorization problem, a variational inference approach to marginalizing the latent variables was proposed by [30]. Inspired by the latter work, we derive a generalized variational inference for Gaussian process latent variable model factorization that includes the non-zero mean of Equation (5).

3.5 Variational Inference

We aim to compute the marginal data likelihood:

However, the integral of Equation (11) is intractable due to the nonlinearity inside the inverse of the covariance matrix. Following the VI framework, we introduce a variational distribution q(X) to approximate the true posterior p(X|Y),

where $q(\mathbf{X})$ has its own variational parameters for items and contexts. Again, we drop the variables for the mean from the expressions for simplicity. Applying Jensen’s inequality, we obtain a lower bound on the log marginal likelihood; the lower bound $F(q)$ takes the form:

where KL denotes the Kullback–Leibler divergence, which can be computed analytically because $q(\mathbf{X})$ and $p(\mathbf{X})$ are both Gaussian. The first term can be further decomposed into a separate term for each user,

Note that for CARS, the term for user $n$ involves only the latent variables of the entities rated by that user, so that the complementary latent variables can be integrated out.
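The analytic KL term can be illustrated with a short sketch. It assumes that the variational distribution factorizes into independent Gaussians with mean `mu` and variance `s` for every latent coordinate, which is a common choice but an assumption here, and that the prior is the standard normal described above. Under these assumptions, $\mathrm{KL}(q \,\|\, p) = \tfrac{1}{2}\sum (s + \mu^2 - 1 - \log s)$.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, s):
    """KL( N(mu, diag(s)) || N(0, I) ), summed over all latent coordinates.

    mu and s are arrays of the same shape; s holds variances and must be positive.
    """
    mu = np.asarray(mu, dtype=float)
    s = np.asarray(s, dtype=float)
    return 0.5 * float(np.sum(s + mu**2 - 1.0 - np.log(s)))

# The divergence vanishes when q equals the prior and grows as q moves away from it.
kl_same = kl_diag_gaussian_to_standard_normal(np.zeros((3, 2)), np.ones((3, 2)))
kl_shifted = kl_diag_gaussian_to_standard_normal(np.ones((3, 2)), np.ones((3, 2)))
```

Latent variables fixed to real-valued contexts would not contribute to this term, since they are not given a Gaussian prior.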

To further solve Equation (14), we introduce a variational sparse Gaussian process framework [30] to modify the Gaussian process prior. For each vector of latent function values $\mathbf{f}$, we introduce a separate set of $M$ inducing variables $\mathbf{u}$ evaluated at a set of inducing input locations given by $\mathbf{Z}$. The likelihood associated with the GP latent function is augmented by the inducing variables:

Both $\mathbf{f}$ and $\mathbf{u}$ are drawn from the same GP prior, thus

Note that we include the bias term $\mathbf{m}$, which is the key to the generalized form of VI. The next step is to adopt a variational distribution:

where $q(\mathbf{u})$ is a variational distribution over the inducing variables $\mathbf{u}$. Here we simplify our notation by dropping $\mathbf{Z}$ from our expressions. By Jensen’s inequality, we obtain a lower bound for the log-likelihood:

Substituting Equation (18) back to Equation (14) and swapping the integration order, we have:

where the expectation is taken under the distribution $q(\mathbf{X})$.

The last step is to choose the variational distribution $q(\mathbf{u})$ that maximizes the above lower bound [30]. The optimal setting of this distribution is

which is a Gaussian distribution. After some calculation, we obtain its mean and covariance:

In the following computation, we use two further statistics:

Our contribution to the generalized variational inference is the introduction of statistics for the mean, while the original statistics account for the kernel. Since all terms are now tractable, we finally obtain the closed form of the lower bound:

The bound can be jointly maximized over the variational parameters and the model parameters by applying gradient-based optimization techniques.
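To make the statistics that account for the kernel more concrete, the sketch below computes one such quantity, the expected cross-covariance between latent points and inducing inputs, in closed form for an ARD squared exponential kernel and checks it against a Monte Carlo estimate. It assumes a factorized Gaussian $q(\mathbf{x}_n) = \mathcal{N}(\boldsymbol{\mu}_n, \mathrm{diag}(\mathbf{S}_n))$; the name `psi1` and all argument names are illustrative assumptions, and the paper's additional statistics for the mean are not covered.

```python
import numpy as np

def psi1_closed_form(mu, S, Z, w, signal_var=1.0):
    """E_q[k(x_n, z_m)] for q(x_n) = N(mu_n, diag(S_n)) and the ARD SE kernel.

    mu, S: (N, Q) variational means and variances; Z: (M, Q) inducing inputs;
    w: (Q,) inverse length-scales. Returns an (N, M) matrix.
    """
    denom = w * S[:, None, :] + 1.0                    # (N, 1, Q)
    diff2 = (mu[:, None, :] - Z[None, :, :]) ** 2      # (N, M, Q)
    log_terms = -0.5 * w * diff2 / denom - 0.5 * np.log(denom)
    return signal_var * np.exp(np.sum(log_terms, axis=-1))

def psi1_monte_carlo(mu, S, Z, w, signal_var=1.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same expectation, for checking only."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples,) + mu.shape)
    X = mu[None] + np.sqrt(S)[None] * eps               # (n_samples, N, Q)
    diff2 = (X[:, :, None, :] - Z[None, None, :, :]) ** 2
    return signal_var * np.exp(-0.5 * np.sum(w * diff2, axis=-1)).mean(axis=0)

mu = np.array([[0.0, 1.0], [0.5, -0.5]])
S = np.array([[0.3, 0.1], [0.2, 0.4]])
Z = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
w = np.array([1.0, 0.5])
exact = psi1_closed_form(mu, S, Z, w)
approx = psi1_monte_carlo(mu, S, Z, w)
```

When the variances `S` shrink to zero, the closed form reduces to the kernel evaluated at the means, which is the point-estimate case without a distribution over the latent variables.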

3.6 Optimization and complexity analysis

In this work, we apply stochastic gradient descent (SGD) to solve the optimization problem. Note that, at each step, the gradient of the KL divergence is averaged over the number of users, since the KL term does not decompose into independent per-user terms. In addition, we adopt scaled conjugate gradient (SCG) optimization [20] for comparison with SGD. As the counterpart of one SGD epoch, one SCG iteration processes the whole dataset once.

The Bayesian Gaussian process latent variable model cannot handle big data problems in general applications, since it has a computational complexity of $O(NM^2)$ and storage demands of $O(NM)$, where $N$ is the size of the data and $M$ is the number of inducing points [7]. However, GPLVMF does not encounter this challenge thanks to the structure of the dataset in the factorization process, as shown in Figure 1. The maximum kernel size is determined by the user with the most ratings, which is usually no more than millions or billions in real-world datasets. The computational complexity and storage demands of GPLVMF per iteration are therefore governed by these per-user kernel sizes. SGD would be a better choice when the number of users reaches millions or billions. A complete comparison between SCG and SGD is presented in the next section.
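The effect of the per-user factorization on kernel size can be illustrated with a small counting sketch. The rating triples and function names below are hypothetical and only show how the per-user covariance matrices compare with a single covariance matrix over all ratings.

```python
from collections import Counter

def per_user_kernel_sizes(ratings):
    """Count ratings per user; user n's covariance matrix is counts[n] x counts[n]."""
    return Counter(user for user, _item, _rating in ratings)

def kernel_entries(counts):
    """Return (entries over all per-user kernels, entries of one kernel over all ratings)."""
    factorized = sum(t * t for t in counts.values())
    total = sum(counts.values())
    return factorized, total * total

# Hypothetical toy data: (user, item, rating) triples.
toy_ratings = [("a", 1, 5), ("a", 2, 3), ("b", 1, 4)]
counts = per_user_kernel_sizes(toy_ratings)
factorized, full = kernel_entries(counts)
largest_kernel = max(counts.values())
```

In this toy example the largest kernel is 2 x 2, set by the user with the most ratings, while a single kernel over all ratings would be 3 x 3.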

3.7 Prediction

Once all the variational parameters and model parameters have been learned, they can be used to predict preferences for users. Given the collection of latent variables of user $n$ used for prediction, the predictive distribution of the corresponding function values is

which is a Gaussian distribution with its mean and covariance:

where
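For intuition about how a non-zero mean enters prediction, the following sketch implements the textbook predictive distribution of an exact GP with a mean function, $\boldsymbol{\mu}_* = \mathbf{m}_* + \mathbf{K}_{*f}(\mathbf{K} + \beta^{-1}\mathbf{I})^{-1}(\mathbf{y} - \mathbf{m})$. This is an exact-GP analogue under assumed notation, not the sparse variational predictive distribution derived in this paper, and all names are illustrative.

```python
import numpy as np

def ard_se_kernel(A, B, w, signal_var=1.0):
    """ARD squared exponential kernel with inverse length-scales w."""
    diff = A[:, None, :] - B[None, :, :]
    return signal_var * np.exp(-0.5 * np.sum(w * diff**2, axis=-1))

def gp_predict(X, y, mean_train, X_star, mean_star, w, beta, signal_var=1.0):
    """Predictive mean and covariance of the function values at X_star."""
    K = ard_se_kernel(X, X, w, signal_var) + np.eye(len(X)) / beta
    K_s = ard_se_kernel(X_star, X, w, signal_var)
    K_ss = ard_se_kernel(X_star, X_star, w, signal_var)
    L = np.linalg.cholesky(K)
    # The GP models only the residual left after subtracting the mean (bias).
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean_train))
    V = np.linalg.solve(L, K_s.T)
    return mean_star + K_s @ alpha, K_ss - V.T @ V

X = np.array([[0.0], [1.0]])
y = np.array([4.0, 2.0])
w = np.array([1.0])
# Far from the training inputs the prediction falls back to the mean (bias) term.
mu_far, cov_far = gp_predict(X, y, np.full(2, 3.0), np.array([[100.0]]), np.array([3.0]), w, beta=10.0)
```

The fallback to the mean away from the data is the reason a well-chosen mean function matters for users or items with few ratings.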

4 Experiment

In this section, we empirically investigate the performance of GPLVMF. First we describe the datasets and settings in our experiments, then report and analyze the experiment results.

4.1 Dataset

In this work, we use 4 real datasets. The statistics of all datasets are presented in Table 1.

• Comoda [14] contains 2296 ratings of 1232 movies by 121 users. We use 12 provided contexts: time of the day, day type, season, location, weather, social, ending emotions and dominant emotions, mood, physical conditions, decision, and interaction.

• Food [22] contains 5554 ratings by 212 users on 20 food menus. We use 2 contexts: three different levels of hunger, and whether the situation is real or supposed. Following [21], we eliminated some conflicting ratings.

• Sushi [10] contains 50,000 ratings of 100 types of sushi by 5000 Japanese users. We use 7 contexts: style, major group, minor group, heaviness/oiliness in taste, popularity, price, and availability in shops. All the contextual information consists of attributes of the item, and the last four contexts are real-valued.

• Movielens-1M [6] contains 1,000,209 ratings of 3706 movies by 6040 users. In this work, we adopt the hour and day as 2 contexts.

Table 1. Statistics of Real Datasets

4.2 Evaluation

We compare our results with state-of-the-art methods, namely,

• Const, a naive predictor that predicts for every user the mean of his/her ratings.

• Multiverse [11], a state-of-the-art tensor factorization method.

• FM [27], one of the most popular factorization methods, which is known for its fast training speed. We use LibFM to implement the method.

• GPFM [21], a Gaussian process based factorization method, which has been shown to outperform other context-aware recommendation models on the Comoda, Food, and Sushi datasets.

• GPLVM–MF, a Gaussian process latent variable model based matrix factorization method. We apply our model to a setting where no contextual information is available. Thus GPLVM–MF can be seen as a context-agnostic variant of GPLVMF.

For each dataset, we split the data into 5 folds and repeat the experiment 5 times, each time using 1 fold as the test set and the remaining 4 folds as the training set. We tune the parameters using 1 of the 5 folds as the validation set and fix the tuned parameters for the other 4 folds. To evaluate rating prediction, we use two evaluation metrics: mean absolute error (MAE) and root-mean-square error (RMSE). For all datasets, the performance is averaged over the 5 folds. The results are statistically significant; the variances are small and are therefore not reported.
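The evaluation protocol can be sketched as follows: a 5-fold split over rating indices, the two error metrics, and the "Const" baseline that predicts each user's mean training rating. The function names, the random shuffling, and the fallback to the global mean for users without training ratings are assumptions made for this sketch.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def k_fold_indices(n_ratings, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every rating is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_ratings), n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, folds[k]

def const_predict(users_train, y_train, users_test):
    """Predict each user's mean training rating (global mean if the user is unseen)."""
    global_mean = float(np.mean(y_train))
    sums, counts = {}, {}
    for u, r in zip(users_train, y_train):
        sums[u] = sums.get(u, 0.0) + float(r)
        counts[u] = counts.get(u, 0) + 1
    return np.array([sums[u] / counts[u] if u in counts else global_mean
                     for u in users_test])
```

Averaging `mae` and `rmse` over the test folds produced by `k_fold_indices` gives the per-dataset scores described above.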

4.3 Performance comparison

The performance comparison of all methods is shown in Table 2 in terms of MAE and RMSE. The table shows that our approach achieves the best results on all datasets, which demonstrates the effectiveness of using Gaussian process latent variable model factorization to model context-aware recommendation.

First, we show the effect of contextual information by comparing several context-aware methods with a context-agnostic method, GPLVM–MF. We originally intended to list other context-agnostic methods, such as the standard matrix factorization adopted by [21]. However, our Bayesian non-linear matrix factorization method (GPLVM–MF) outperforms the “Const” method on all datasets, while the same superiority is not guaranteed by the standard matrix factorization method reported in [21]. Thus we adopt GPLVM–MF as an appropriate context-agnostic method.

Table 2. Performance comparison on the 4 real datasets in terms of MAE and RMSE

All the context-aware methods achieve better performance than GPLVM–MF, with two exceptions: the “Multiverse” method and the Movielens-1M dataset. Since “Multiverse” suffers from the high dimensionality of contexts on the Comoda and Sushi datasets, we use 4 out of 12 contexts for Comoda and 3 out of 7 contexts for Sushi. Because Movielens-1M has no explicit contextual information, the choice of model can be the most decisive factor in achieving a robust result, even more so than incorporating contextual information. It is also worth noting that, by using contexts, our GPLVMF further improves performance.

Next, we compare GPLVMF with other state-of-the-art context-aware methods. Table 2 shows that GPLVMF significantly outperforms them. Compared with the best performance of the other models, GPLVMF improves the MAE by 4.0%, 13.4%, 1.2%, and 4.7% on the Comoda, Food, Sushi, and Movielens-1M datasets, and improves the RMSE by 3.7%, 12.8%, and 4.5% on the Comoda, Food, and Movielens-1M datasets, respectively.

Finally, we elaborate on the Sushi dataset. It has four real-valued contexts, which cannot be modeled directly in the GPFM model; instead, [21] quantizes the real values into several categories. Valuable information may be filtered out in the quantization process. In contrast, we use the raw contextual values by fixing them as the corresponding latent variables, so as to utilize the given data as much as possible. To confirm that the performance difference between GPLVMF and GPFM is due to directly utilizing real-valued contexts, we conduct an experiment in which GPLVMF uses categorical contexts; the results are nearly the same as those of GPFM, which confirms the benefit of integrating real-valued contexts in GPLVMF.

4.4 Optimization comparison

We compare stochastic gradient descent (SGD) and scaled conjugate gradient (SCG) from two perspectives. First, we measure the running time of one epoch (SGD) and one iteration (SCG) against the amount of Sushi training data and the dimensionality of the latent space, with identical hyperparameters for both methods. Figure 2 shows the running time per epoch (SGD) and per iteration (SCG), from which we observe that training time grows linearly with the amount of data. In addition, the number of optimization variables increases linearly with d, so the training time also scales with d. Overall, both SGD and SCG have linear computational complexity for GPLVMF, while SGD is more efficient in terms of training time.

Figure 2. Training time per iteration for the Sushi dataset as a function of training size and the dimensionality of latent space.

Second, the convergence curves of SGD and SCG on the four datasets are illustrated in Figure 3. The MAE of SGD converges after about 100 epochs on all datasets, whereas SCG must be stopped at a specific iteration to obtain the best performance on the small datasets, Comoda and Food. On the other hand, the best performance achieved by SGD and SCG does not differ. To conclude, SGD has a satisfactory convergence rate and can be trained rapidly and effectively in real-world applications.

4.5 Analysis of Contexts

The ability to discover the importance of each context is essential in many regards. In this work, we extract this information by performing a qualitative analysis of the inverse length-scale. In the Gaussian process latent variable model, the inverse length-scale originally determines the strength of each latent dimension. Applied to CARS, it can be used to automatically determine the strength of each context, as shown in Figure 4.

For the Comoda dataset, two inverse length-scales (7 and 8) out of the 12 contexts are significant. These two contexts represent “ending emotions” and “dominant emotions”, respectively. The same conclusion was obtained by [21], who ran experiments using one context at a time, which is computationally expensive compared to our method. On the Food dataset, the inverse length-scale of “situation” is smaller than that of “hunger degree”, which indicates that the context “hunger degree” is more important. The same conclusion was reached by [17]. The most important context for the Sushi dataset is “minor group”. This is not surprising, since users typically choose sushi according to its type. Finally, “day” is less significant than “hour” on the Movielens-1M dataset.
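Reading context importance from the learned weights can be sketched as follows. The weight values, the context names, and the choice to sum the weights over each context's latent dimensions are hypothetical and serve only to illustrate the ranking step; they are not results or procedures taken from the paper.

```python
import numpy as np

def rank_contexts(w, slices):
    """Rank entities by the summed inverse length-scales of their latent dimensions.

    w: 1-D array of learned inverse length-scales, one per latent dimension.
    slices: dict mapping an entity name to the slice of w that belongs to it.
    """
    scores = {name: float(np.sum(w[sl])) for name, sl in slices.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical weights for two contexts with two latent dimensions each.
w_learned = np.array([1.5, 0.5, 0.0, 0.05])
context_slices = {"hunger degree": slice(0, 2), "situation": slice(2, 4)}
ranking = rank_contexts(w_learned, context_slices)
```

Because the weights come out of a single training run, this ranking avoids retraining the model once per context.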

Figure 3. Comparison of the convergence curves of MAE of SGD and SCG on the four datasets.

Figure 4. Inverse length-scale of the contexts on the four real datasets

Finally, the impact of parameters, i.e. different number of inducing points and dimensionalities of the latent space of the item and each context, can be found in the supplementary material.

4.6 Impact of parameters

Figure 5. RMSE of GPLVMF on the Food dataset with different d and M.

Table 3. Empirical optimal parameters (“use mean”, “dimensionality”) for each dataset.

Finally, we study how performance depends on whether the mean function (bias) is used, on the number of inducing points M, and on the dimensionality d of the latent space of the item and each context. We first compare the performance of GPLVMF models with and without the mean function and with different dimensionalities, and list the settings that achieve the best performance in Table 3. We find that datasets with many contexts (Comoda and Sushi) favor a model with the mean function and a low latent dimension, while datasets with fewer contexts (Food and Movielens-1M) favor a model without the mean function and a high latent dimension. We then use the Food dataset as an example to plot the RMSE against M and d, as shown in Figure 5. With increasing d and M, the RMSE first decreases and then stays nearly stable after d = 5 and M = 15. For all the datasets, the parameter M can be selected from a broad range, which means that the performance of GPLVMF does not depend strongly on the number of inducing points.

5 Conclusion

In this paper, we proposed a novel Gaussian process latent variable model factorization algorithm for context-aware recommendation, called GPLVMF, to effectively utilize contextual information. Both the mean and the kernel of the Gaussian process are carefully re-modeled for richer modeling. The experimental results on four real datasets show that GPLVMF outperforms the state-of-the-art models and can analyze the influence of each context in various datasets.

REFERENCES

[1] Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, and Alexander Tuzhilin, ‘Incorporating contextual information in recommender systems using a multidimensional approach’, ACM Transactions on Information Systems (TOIS), 23(1), 103–145, (2005).

[2] Gediminas Adomavicius and Alexander Tuzhilin, ‘Context-aware recommender systems’, in Recommender systems handbook, 217–253, Springer, (2011).

[3] Linas Baltrunas, Bernd Ludwig, and Francesco Ricci, ‘Matrix factorization techniques for context aware recommendation’, in Proceedings of the fifth ACM conference on Recommender systems, pp. 301–304. ACM, (2011).

[4] Mathieu Blondel, Akinori Fujino, Naonori Ueda, and Masakazu Ishihata, ‘Higher-order factorization machines’, in Advances in Neural Information Processing Systems, pp. 3351–3359, (2016).

[5] Andreas Damianou, Deep Gaussian processes and variational propagation of uncertainty, Ph.D. dissertation, University of Sheffield, 2015.

[6] F Maxwell Harper and Joseph A Konstan, ‘The movielens datasets: History and context’, Acm transactions on interactive intelligent systems (tiis), 5(4), 19, (2016).

[7] James Hensman, Nicolo Fusi, and Neil D Lawrence, ‘Gaussian processes for big data’, arXiv preprint arXiv:1309.6835, (2013).

[8] Balázs Hidasi and Domonkos Tikk, ‘Fast ALS-based tensor factorization for context-aware recommendation from implicit feedback’, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 67–82. Springer, (2012).

[9] Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, and Jose M Hernández-Lobato, ‘Collaborative Gaussian processes for preference learning’, in Advances in Neural Information Processing Systems, pp. 2096–2104, (2012).

[10] Toshihiro Kamishima, ‘Nantonac collaborative filtering: recommendation based on order responses’, in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 583–588. ACM, (2003).

[11] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver, ‘Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering’, in Proceedings of the fourth ACM conference on Recommender systems, pp. 79–86. ACM, (2010).

[12] Yehuda Koren and Robert Bell, ‘Advances in collaborative filtering’, in Recommender systems handbook, 77–118, Springer, (2015).

[13] Yehuda Koren, Robert Bell, and Chris Volinsky, ‘Matrix factorization techniques for recommender systems’, Computer, (8), 30–37, (2009).

[14] Andrej Košir, Ante Odic, Matevz Kunaver, Marko Tkalcic, and Jurij F Tasic, ‘Database for contextual personalization’, Elektrotehniški vestnik, 78(5), 270–274, (2011).

[15] Neil D Lawrence, ‘Gaussian process latent variable models for visualisation of high dimensional data’, in Advances in neural information processing systems, pp. 329–336, (2004).

[16] Neil D Lawrence and Raquel Urtasun, ‘Non-linear matrix factorization with Gaussian processes’, in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 601–608. ACM, (2009).

[17] Qiang Liu, Shu Wu, and Liang Wang, ‘COT: Contextual operating tensor for context-aware recommender systems’, in AAAI, pp. 203–209, (2015).

[18] Xin Liu and Wei Wu, ‘Learning context-aware latent representations for context-aware collaborative filtering’, in Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, pp. 887–890. ACM, (2015).

[19] Babak Loni, Yue Shi, Martha Larson, and Alan Hanjalic, ‘Cross-domain collaborative filtering with factorization machines’, in European conference on information retrieval, pp. 656–661. Springer, (2014).

[20] Martin Fodslette Møller, ‘A scaled conjugate gradient algorithm for fast supervised learning’, Neural networks, 6(4), 525–533, (1993).

[21] Trung V Nguyen, Alexandros Karatzoglou, and Linas Baltrunas, ‘Gaussian process factorization machines for context-aware recommendations’, in Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 63–72. ACM, (2014).

[22] Chihiro Ono, Yasuhiro Takishima, Yoichi Motomura, and Hideki Asoh, ‘Context-aware preference model based on a study of difference between real and supposed situation data’, in International Conference on User Modeling, Adaptation, and Personalization, pp. 102–113. Springer, (2009).

[23] Abayomi Moradeyo Otebolaku and Maria Teresa Andrade, ‘Context-aware media recommendations for smart devices’, Journal of Ambient Intelligence and Humanized Computing, 6(1), 13–36, (2015).

[24] Xochilt Ramirez-Garcia and Mario García-Valdez, ‘Post-filtering for a restaurant context-aware recommender system’, in Recent Advances on Hybrid Approaches for Designing Intelligent Systems, 695–707, Springer, (2014).

[25] Carl Edward Rasmussen and Christopher KI Williams, Gaussian processes for machine learning, MIT press, 2006.

[26] Steffen Rendle, ‘Factorization machines’, in Data Mining (ICDM), 2010 IEEE 10th International Conference on, pp. 995–1000. IEEE, (2010).

[27] Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme, ‘Fast context-aware recommendations with factorization machines’, in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pp. 635–644. ACM, (2011).

[28] Ruslan Salakhutdinov and Andriy Mnih, ‘Bayesian probabilistic matrix factorization using MCMC’, ICML08, (2008).

[29] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver, ‘TFMAP: Optimizing MAP for top-n context-aware recommendation’, in Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pp. 155–164. ACM, (2012).

[30] Michalis Titsias and Neil D Lawrence, ‘Bayesian Gaussian process latent variable model’, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 844–851, (2010).

[31] Hastagiri P Vanchinathan, Isidor Nikolic, Fabio De Bona, and Andreas Krause, ‘Explore-exploit in top-n recommender systems via Gaussian processes’, in Proceedings of the 8th ACM Conference on Recommender systems, pp. 225–232. ACM, (2014).

[32] Chia-An Yu, Tak-Shing Chan, and Yi-Hsuan Yang, ‘Low-rank matrix completion over finite Abelian group algebras for context-aware recommendation’, in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2415–2418. ACM, (2017).

[33] Cong Zheng, E Haihong, Meina Song, and Junde Song, ‘CMPTF: Contextual modeling probabilistic tensor factorization for recommender systems’, Neurocomputing, 205, 141–151, (2016).