Network embedding, which aims at learning low-dimensional vector representations of a network, has attracted increasing interest in recent years. It has been shown highly effective in many important tasks in network analysis involving predictions over nodes and edges, such as node classification (Tsoumakas and Katakis 2006; Sen et al. 2008), recommendation (Tu, Liu, and Sun 2014; Yu et al. 2014) and link prediction (Liben-Nowell and Kleinberg 2007).
Various approaches have been proposed toward this goal, notably DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), LINE (Tang et al. 2015), GraRep (Cao, Lu, and Xu 2015), and node2vec (Grover and Leskovec 2016). These models have proven effective on several real-world networks. Most previous approaches use only the network structure, i.e., the linkage relationships between nodes, and pay scant attention to the content associated with each node, even though such content is common in real-world networks. In a typical social network with users as vertices, user-generated contents (e.g., texts, images) serve as rich extra information that should be important for node representation and beneficial to downstream applications.
Figure 1 shows an example network from Quora, a community question answering website. Users in Quora can follow each other, creating directed connections in the network.
Figure 1: A toy network of Quora users, where the content of each user consists of the titles of the questions that the user has followed.
More importantly, users are expected to ask or answer questions, which can be treated as the users’ contents. These contents are critical for identifying the characteristics of users, and thus will significantly benefit tasks like node classification (e.g., of gender, location and profession). For example, we can infer from the contents of user A and user C (Figure 1) that they are likely to be female users (gender). Besides, user B is likely to be a programmer (profession) and user D probably lives in New York (location).
To cope with this challenge, Yang et al. (2015) presented text-associated DeepWalk (TADW), which incorporates textual features into network embeddings through matrix factorization. This approach typically suffers from high computational cost and does not scale to large networks. Besides, contents in TADW are simply incorporated as unordered text features instead of being explicitly modeled. Therefore, deeper semantics contained in the contents cannot be well captured.
Present work. We present a general framework for learning Content-Enhanced Network Embedding (CENE) that is capable of jointly leveraging the network structure and the contents. We consider textual contents in this study; however, our approach can be flexibly extended to other modalities of content. Specifically, each piece of content information (e.g., a tweet one posts on Twitter, a question one follows on Quora) is formalized as a document, and we integrate each document into the network by creating a special kind of node, whose representation is computed compositionally from words. The resulting augmented network consists of two kinds of links: the node-node link and the node-content link. By optimizing the joint objective, the knowledge contained in the contents is effectively distilled into the node embeddings.
To summarize, we make the following contributions:
• We propose a novel network embedding model that captures both textual contents and network structure. Experiments on the tasks of node classification using two real world datasets demonstrate its superiority over various baseline methods.
• We collect a network dataset which contains node attributes and rich textual contents. It will be made publicly available for research purposes.
Text Embedding
In order to obtain text embeddings (e.g., of a sentence or paragraph), a simple and intuitive approach is to average the embeddings of the words in the text (Mitchell and Lapata 2010; Ferrone and Zanzotto 2013; Iyyer et al. 2015). More sophisticated models have been designed to utilize the internal structure of sentences or documents to assist the composition. For example, Socher et al. (2013) and Socher et al. (2014) use recursive neural networks over parse trees to obtain sentence representations. To alleviate the dependency on syntactic parsing, convolutional neural networks (CNN) (Blunsom, Grefenstette, and Kalchbrenner 2014; Johnson and Zhang 2015) are employed, which use simple bottom-up hierarchical structures for composition. Another alternative model is the LSTM-based recurrent neural network (RNN) (Kiros et al. 2015), which is a variant of RNN that uses long short-term memory cells for capturing long-term dependencies.
Network Embedding
Hoff, Raftery, and Handcock (2002) first propose to learn latent space representations of the vertices in a network. Some earlier works build on the feature vectors of the nodes and take the leading eigenvectors as the network representations, e.g., MDS (Borg and Groenen 2005), IsoMap (Tenenbaum, De Silva, and Langford 2000), LLE (Roweis and Saul 2000), and Laplacian Eigenmaps (Belkin and Niyogi 2001).
Recent advancements include DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), which learns vertex embeddings using the skip-gram model (Mikolov et al. 2013b) on vertex sequences generated by random walks on the network. Inspired by DeepWalk, Walklets (Perozzi, Kulkarni, and Skiena 2016) focuses on multiscale representation learning, node2vec (Grover and Leskovec 2016) explores different random walk strategies, and Ou et al. (2016) emphasize the asymmetric transitivity of a network. Some others focus on depicting the distance between vertices. LINE (Tang et al. 2015) exploits both first-order and second-order proximity in a network, while Cao, Lu, and Xu (2015) expand the proximity to k-order (or k-step) and integrate global structural information of the network into the learning process. These methods could also be applied to prediction tasks in heterogeneous text networks (Tang, Qu, and Mei 2015). Another attempt is based on the factorization of a relationship matrix (Yang and Liu 2015). Most recently, Wang, Cui, and Zhu (2016) adopt a deep model to capture the nonlinear network structure.
Yang et al. (2015) present the first work that combines structure and content information for learning network embeddings. They show that DeepWalk is equivalent to matrix factorization (MF) and that text features of vertices can be incorporated by factorizing a text-associated matrix. This method, however, suffers from the high computational cost of MF and has difficulty scaling to large networks. Pan et al. (2016) instead combine DeepWalk with Doc2Vec (Le and Mikolov 2014), along with partial labels of nodes, which together constitute a semi-supervised model. However, Doc2Vec has limited capacity to express the contents. Besides, it cannot generalize to other modalities of content such as images.
Definition 1. (Network) Let $G = (V, E, C)$ denote a network, where $V$ is the set of vertices, representing the nodes of the network; $E \subseteq V \times V$ is the set of edges, representing the relations between the nodes; and $C$ denotes the contents of nodes. $C = \{(v, doc) \mid v \in V\}$ with $doc = \{S_i\}$, where $S_i$ denotes the $i$-th sentence of $doc$ and is composed of the word sequence $\langle w_1, w_2, \ldots, w_{|S_i|} \rangle$. Without loss of generality, we assume the structure of the network to be a directed graph.
Definition 2. (Network Embedding) Given a network denoted as $G = (V, E, C)$, the aim of network embedding is to allocate a low-dimensional real-valued vector representation $\mathbf{e}_v \in \mathbb{R}^d$ for each vertex $v \in V$, where $d \ll |V|$. Let $\mathbf{e} = \{\mathbf{e}_v \mid v \in V\}$ denote the embedded vectors in the latent space. $\mathbf{e}$ is supposed to maintain as much topological information of the original network as possible.
As $\mathbf{e}_v$ can be regarded as a feature vector of vertex $v$, it is straightforward to use it as the input of subsequent tasks like node classification. Another notable trait is that this kind of embedding is not task-specific, so it can be applied to different kinds of tasks without retraining.
Figure 2: Illustration of our framework.
General Framework
To maintain the structural information of a network, we describe a general framework that minimizes the following objective:
$$L = -\sum_{(u,v) \in S_P} \log p(v, u \mid \theta) - \sum_{(u,v) \in S_N} \log\bigl(1 - p(v, u \mid \theta)\bigr) \tag{1}$$
where $S_P$ is the set of positive vertex pairs, $S_N$ is the set of negative pairs, and $\theta$ denotes the model parameters. For example, in random walk-based algorithms (DeepWalk, Walklets, node2vec), $S_P$ is the set of adjacent vertex pairs in the routes generated through random walking, and $S_N$ is the union of all negative sampling sets. $p(v, u \mid \theta)$ is the joint probability between vertices $u$ and $v$, i.e., the probability that the pair $(u, v)$ exists in $S_P$; correspondingly, $1 - p(v, u \mid \theta)$ is the probability that $(u, v)$ does not exist.
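As a rough illustration of this kind of objective, the sketch below sums $-\log p$ over positive pairs and $-\log(1 - p)$ over negative pairs. The helper names (`sigmoid`, `dot`, `pair_loss`) and the choice of a dot-product logistic score for the pair probability are assumptions made for the example, not the authors' implementation.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_loss(emb, positives, negatives):
    """Negative log-likelihood over positive and negative vertex pairs.

    Assumption: the pair probability is sigmoid(e_u . e_v).
    """
    loss = 0.0
    for u, v in positives:
        loss -= math.log(sigmoid(dot(emb[u], emb[v])))
    for u, v in negatives:
        # 1 - sigmoid(x) == sigmoid(-x); the latter is more stable.
        loss -= math.log(sigmoid(-dot(emb[u], emb[v])))
    return loss

# Toy embeddings: "a" and "b" point the same way, "c" the opposite way.
emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [-1.0, 0.0]}
good = pair_loss(emb, positives=[("a", "b")], negatives=[("a", "c")])
bad = pair_loss(emb, positives=[("a", "c")], negatives=[("a", "b")])
```

Embeddings that agree with the positive and negative sets give a lower loss (`good`) than embeddings that contradict them (`bad`).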
To further utilize the content information, a simple way is to concatenate the content embedding with the node embedding, both of which are trained independently. Formally, the representation of node $u$ is the concatenation of its node embedding with the embedding of $C_u$, where $C_u$ is the set of all contents of node $u$. This method, however, requires each node in the network to be associated with some contents, which is too rigid for real-world networks.
In this paper, we introduce contents (documents) as a special kind of node, and the augmented network can then be represented as $G_{aug} = (V_n, V_c, E_{nn}, E_{nc})$, where $V_n$ is the vertex set; $V_c$ is the content set; $E_{nn}$ is the set of edges between vertices; and $E_{nc}$ is the set of edges between vertices and contents. In this way, different nodes can also interact through connections with the same content node (e.g., two Twitter users retweet the same post), which significantly alleviates the structural sparsity in $G$. The resulting framework structure is illustrated in Figure 2.
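The augmentation step can be sketched as follows, assuming the input is a list of directed follow edges plus a mapping from each node to its documents. The function names and the rule that identical document strings share one content node are illustrative assumptions.

```python
from collections import defaultdict

def build_augmented_network(follow_edges, node_docs):
    """Return (node-node edges, node-content edges, content id table)."""
    content_ids = {}                 # document text -> content node id
    e_nn = list(follow_edges)        # edges between ordinary nodes
    e_nc = []                        # edges between nodes and content nodes
    for node, docs in node_docs.items():
        for doc in docs:
            cid = content_ids.setdefault(doc, "c%d" % len(content_ids))
            e_nc.append((node, cid))
    return e_nn, e_nc, content_ids

def nodes_sharing_content(e_nc):
    """Content nodes that link two or more ordinary nodes."""
    by_content = defaultdict(set)
    for node, cid in e_nc:
        by_content[cid].add(node)
    return {cid: users for cid, users in by_content.items() if len(users) > 1}

follow_edges = [("A", "B")]
node_docs = {"A": ["q1 text"], "C": ["q1 text", "q2 text"], "D": []}
e_nn, e_nc, content_ids = build_augmented_network(follow_edges, node_docs)
shared = nodes_sharing_content(e_nc)
```

In this toy input, A and C have no follow edge between them, yet they become two hops apart through the shared content node; D has no contents and is simply left without node-content edges.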
Next, we will describe the loss functions involving node-node links and node-content links respectively, following the notation in Eq.1.
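Figure 3: Sentence modeling approaches.
Node-Node Link
For node-node links, we specify $S_P$ as $E_{nn}$, the set of edges between vertices. Inspired by the idea of negative sampling (Mikolov et al. 2013b), we sample a set $S_N^{u,v}$ of negative pairs for each edge $(u, v)$. Then $S_N = \bigcup_{(u,v) \in E_{nn}} S_N^{u,v}$, and the node-node loss is
$$L_{nn} = -\sum_{(u,v) \in E_{nn}} \Bigl[ \log p(v, u) + \sum_{(u',v') \in S_N^{u,v}} \log\bigl(1 - p(v', u')\bigr) \Bigr] \tag{2}$$
Here $p(v, u)$ (we omit the parameters $\theta$ for simplicity) is computed using a logistic function of the vertex embeddings $\mathbf{e}_u$ and $\mathbf{e}_v$:
$$p(v, u) = \sigma(\mathbf{e}_u^\top \mathbf{e}_v) = \frac{1}{1 + \exp(-\mathbf{e}_u^\top \mathbf{e}_v)} \tag{3}$$
However, Eq. 3 is a symmetrical operation, which means $p(v, u) = p(u, v)$, and this is not suitable for directed networks. So we split $\mathbf{e}_u$ into $[\mathbf{in}_u; \mathbf{out}_u]$, where $\mathbf{in}_u \in \mathbb{R}^{d/2}$ and $\mathbf{out}_u \in \mathbb{R}^{d/2}$. Then $p(v, u)$ can be computed as:
$$p(v, u) = \sigma(\mathbf{out}_u^\top \mathbf{in}_v) \tag{4}$$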
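A minimal sketch of why splitting the embedding makes the score direction-aware. It assumes that each embedding is cut into two equal halves, that the first half plays the "in" role and the second the "out" role, and that an edge from u to v is scored with the "out" half of u against the "in" half of v. These conventions and all function names are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_embedding(e):
    """Split an embedding into (in part, out part); equal halves assumed."""
    half = len(e) // 2
    return e[:half], e[half:]

def p_symmetric(e_u, e_v):
    """Logistic score on full embeddings; identical for (u, v) and (v, u)."""
    return sigmoid(dot(e_u, e_v))

def p_directed(e_u, e_v):
    """Score of the directed edge u -> v: out part of u, in part of v."""
    _, out_u = split_embedding(e_u)
    in_v, _ = split_embedding(e_v)
    return sigmoid(dot(out_u, in_v))

# u has a strong "out" part, v has a matching "in" part.
e_u = [0.0, 0.0, 2.0, 0.0]
e_v = [1.0, 0.0, 0.0, 0.0]
forward = p_directed(e_u, e_v)    # edge u -> v
backward = p_directed(e_v, e_u)   # edge v -> u
```

With these toy vectors the symmetric score cannot distinguish the two directions, while the directed score is high for u -> v and neutral (0.5) for v -> u.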
Node-Content Link
The node-content loss is similar to Eq. 2. Let $S_N^{u,c}$ denote the negative sampling set for edge $(u, c)$; then the loss can be written as:
$$L_{nc} = -\sum_{(u,c) \in E_{nc}} \Bigl[ \log p(c, u) + \sum_{(u',c') \in S_N^{u,c}} \log\bigl(1 - p(c', u')\bigr) \Bigr] \tag{5}$$
where
$$p(c, u) = \sigma\bigl(\mathbf{e}_u^\top f(c)\bigr)$$
Instead of allocating an arbitrary embedding for each document $c$, we use a composition function $f(\cdot)$ to compute the content representation in order to fully capture the semantics of the texts. In this paper, we further decompose each document into sentences, and model the node-sentence links separately (Figure 2). We investigate three typical composition models for learning sentence representations (Figure 3).
Word Embedding Average (WAvg). This approach simply takes the average of word vectors as the sentence embedding. Despite its obliviousness to word order, it has proved surprisingly effective in text categorization tasks (Joulin et al. 2016).
Recurrent Neural Network (RNN). Here we use the gated recurrent unit (GRU) proposed by Cho et al. (2014). GRU is a simplified version of the LSTM unit proposed earlier (Hochreiter and Schmidhuber 1997), with fewer parameters while still preserving the ability to capture long-term dependencies. Instead of simply using the hidden representation at the final state as the sentence representation, we apply mean pooling over all history hidden states:
$$f(S) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{h}_t$$
where $\mathbf{h}_t$ is the hidden state at position $t$ and $T$ is the number of words in sentence $S$.
Bidirectional Recurrent Neural Network (BiRNN). In practice, even with GRU, an RNN still cannot capture very long-term dependencies well. Hence, we further adopt a bidirectional variant (Schuster and Paliwal 1997) that processes a sentence in both directions with two separate hidden layers. The hidden state vectors from the GRU units of the two directions at each position are then concatenated, and finally passed through a mean pooling layer.
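The three composition models can be sketched in plain Python as below. This is a toy forward pass only: the GRU omits bias terms, the weights are random rather than trained, and the gate convention (update gate weighting the candidate state) is one of the two common ones. All class and function names are assumptions for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def mean_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def word_average(words, table):
    """WAvg: the sentence embedding is the mean of its word vectors."""
    return mean_pool([table[w] for w in words])

class GRUCell:
    """Bias-free GRU with random weights (illustration only)."""
    def __init__(self, in_dim, hid_dim, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # Input (W) and recurrent (U) weights for gates z, r and candidate h.
        self.W = {g: mat(hid_dim, in_dim) for g in "zrh"}
        self.U = {g: mat(hid_dim, hid_dim) for g in "zrh"}
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, inputs):
        """Return the hidden state at every position."""
        h = [0.0] * self.hid_dim
        states = []
        for x in inputs:
            h = self.step(x, h)
            states.append(h)
        return states

def rnn_sentence(words, table, cell):
    """RNN: mean pooling over all hidden states."""
    return mean_pool(cell.run([table[w] for w in words]))

def birnn_sentence(words, table, fwd, bwd):
    """BiRNN: concatenate both directions per position, then mean-pool."""
    xs = [table[w] for w in words]
    f_states = fwd.run(xs)
    b_states = bwd.run(xs[::-1])[::-1]  # realign backward states
    return mean_pool([f + b for f, b in zip(f_states, b_states)])

rng = random.Random(0)
table = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
fwd, bwd = GRUCell(3, 4, rng), GRUCell(3, 4, rng)
wavg = word_average(["a", "b"], table)
rnn = rnn_sentence(["a", "b", "c"], table, fwd)
birnn = birnn_sentence(["a", "b", "c"], table, fwd, bwd)
```

The word average ignores word order, whereas the recurrent variants produce different vectors when the same words are reordered; the bidirectional output has twice the hidden dimension because of the concatenation.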
Joint Learning
Finally, we optimize the following joint objective function, which is a weighted combination of the node-node loss (Eq. 2) and the node-content loss (Eq. 5):
$$L = \alpha L_{nn} + (1 - \alpha) L_{nc}$$
where $\alpha \in [0, 1]$ is a parameter that balances the importance of the two objectives. As $\alpha$ increases, more structure information (node-node links) is taken into consideration. All parameters, including the node embeddings $\mathbf{e}$ and the parameters in the composition function $f(\cdot)$, are jointly optimized.
We use stochastic gradient descent (SGD) with learning rate decay for optimization. The gradients are computed with back-propagation. In our implementation, we approximate the effect of $\alpha$ through instance sampling (node-node and node-content) in each training epoch. More details are shown in Algorithm 1.
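One possible reading of "approximating the balance weight through instance sampling" is sketched below: each training instance is drawn from the node-node edges with probability equal to the balance weight, and from the node-content edges otherwise. This is an assumption, not a reproduction of Algorithm 1; the linear decay schedule and all names (`sample_instances`, `decayed_lr`, `floor`) are likewise hypothetical.

```python
import random

def sample_instances(e_nn, e_nc, alpha, num_samples, rng):
    """Draw training edges; each draw is node-node with probability alpha."""
    batch = []
    for _ in range(num_samples):
        if rng.random() < alpha:
            batch.append(("nn", rng.choice(e_nn)))
        else:
            batch.append(("nc", rng.choice(e_nc)))
    return batch

def decayed_lr(lr0, step, total_steps, floor=1e-4):
    """Linear learning-rate decay with a small lower bound (assumed schedule)."""
    return max(lr0 * (1.0 - step / total_steps), lr0 * floor)

rng = random.Random(0)
e_nn = [("u1", "u2"), ("u2", "u3")]
e_nc = [("u1", "c0"), ("u3", "c0"), ("u3", "c1")]
batch = sample_instances(e_nn, e_nc, alpha=0.7, num_samples=10000, rng=rng)
nn_fraction = sum(kind == "nn" for kind, _ in batch) / len(batch)
```

Under this reading, the expected share of node-node updates equals the balance weight, so the two losses are weighted implicitly by how often each link type is visited.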
Dataset
We conduct experiments on two real-world datasets: DBLP (Tang et al. 2008) and Zhihu. An overview of these networks is given in Table 1.
Table 1: Dataset overview.
DBLP. We use the DBLP dataset to construct the citation network. Two popular conferences, SIGIR and KDD, are chosen as the two categories for node classification. Here each paper is regarded as a node, and every directed edge between two nodes indicates a citation. We use the abstract of each paper as its content. Note that only 16.7% of the nodes in DBLP have contents, and we keep all nodes of DBLP for the experiments.
Zhihu. Zhihu is a Chinese community Q&A site built on a social network, which aims at building a knowledge repository of questions and answers created and organized by users. We first collected the users’ following lists, their lists of followed questions and their profiles. Then, we constructed the Zhihu network with users as vertices and edges indicating the following relationships. The titles of the questions that each user follows are used as the user's associated contents.
We select the three most frequent attributes for our experiments: gender, location and profession. Three cities in China (Beijing, Shanghai and Guangzhou) are chosen as location categories, and the four most popular professions (financial industry, legal profession, architect and clinical treatment) are chosen as profession categories.
Baseline
We consider the following network embedding methods for experimental comparison:
Structure-Based Method
• DeepWalk (DW) (Perozzi, Al-Rfou, and Skiena 2014). DeepWalk learns vertex embeddings by using the skip-gram model over vertex sequences generated through random walking on the network.
• LINE (Tang et al. 2015). LINE takes both first-order and second-order proximity into account, and the concatenation of these two representations is used as the final embedding.
• Word2vec (W2V). We include an additional baseline that uses Word2vec (Mikolov et al. 2013a) to directly learn vertex embeddings from node-node links. Specifically, we treat each vertex u as the word and all its neighbors as its context. Here we use the word2vecf toolkit.
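For the W2V baseline, the vertex-as-word view can be sketched as turning every edge into a (word, context) pair. The one-pair-per-line text format and the choice to emit one pair per directed edge (rather than also the reverse pair) are assumptions; the exact input expected by the toolkit should be checked against its documentation.

```python
def edges_to_pairs(edges):
    """One 'word context' line per directed edge (u, v):
    vertex u is the word, its neighbor v is the context."""
    return ["%s %s" % (u, v) for u, v in edges]

edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3")]
pairs = edges_to_pairs(edges)
```

Each vertex then appears as a word once per outgoing edge, so its context set is exactly its out-neighborhood in this sketch.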
Table 2: Performance on DBLP. (The input matrix of DBLP for TADW is too large to be loaded into the memory of our machine.)
Content-Based Method
• Doc2vec (D2V) (Le and Mikolov 2014). Doc2vec is an extension of word2vec that learns document representations by predicting the surrounding words in contexts sampled from the document. Here we use the Gensim implementation.
• Word Average (WAvg). Similar to the WAvg setting in our model (CENE), we are also interested to see how well word average performs when trained separately.
Combined Method
• Naive Combination (NC). We concatenate the two best-performing network embeddings learned using structure-based methods and content-based methods respectively.
• TADW (Yang et al. 2015). TADW integrates content information into network embeddings by factorizing a text-associated matrix.
Evaluation
We evaluate our network embeddings on the node classification task. Following the protocol used in previous studies (Perozzi, Al-Rfou, and Skiena 2014; Tang et al. 2015), we randomly sample a portion of the labeled vertices (from 10% to 90%) as training data and use the rest of the vertices for testing. We use scikit-learn (Pedregosa et al. 2011) to train logistic regression classifiers. For each training ratio, the experiments are run independently 40 times and we report the averaged Micro-F1 measures.
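The evaluation loop can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the logistic regression used in the paper; the function names, the seed handling and the toy data are assumptions.

```python
import random

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in labels:
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2.0 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier (assumption): predict the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] = counts.get(y, 0) + 1
    cents = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(cents, key=lambda y: dist(cents[y], x)) for x in test_x]

def evaluate(features, labels, ratio, runs, fit_predict, seed=0):
    """Average Micro-F1 over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    scores = []
    for _ in range(runs):
        rng.shuffle(idx)
        cut = max(1, int(ratio * len(idx)))
        train, test = idx[:cut], idx[cut:]
        pred = fit_predict([features[i] for i in train],
                           [labels[i] for i in train],
                           [features[i] for i in test])
        scores.append(micro_f1([labels[i] for i in test], pred))
    return sum(scores) / len(scores)

# Two well-separated toy classes standing in for learned embeddings.
features = ([[0.01 * i, 0.0] for i in range(10)]
            + [[10.0 + 0.01 * i, 10.0] for i in range(10)])
labels = [0] * 10 + [1] * 10
score = evaluate(features, labels, ratio=0.5, runs=40,
                 fit_predict=nearest_centroid)
```

For single-label classification, Micro-F1 coincides with accuracy, which is why pooling the counts over classes is enough here.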
Training Protocols
The initial learning rate is set to for CENEWAvg and
for CENERNN and CENEBiRNN. The dimension of the embeddings for both nodes and contents is set to 200. Word embeddings are pretrained using the whole set of contents associated with the network, with dimension of 200. In addition, the negative sampling size
is 15 for all methods, and
is 25 for CENE; the total number of samples T is 10 billion for LINE (1st) and LINE (2nd) as shown in Tang et al. (2015); window size win = 5, walk length t = 40 and number of walks per vertex
for DeepWalk.
Classification tasks
The classification results are shown in Table 2 (DBLP), Table 3 (Zhihu-Gender), Table 4 (Zhihu-Location) and Table 5 (Zhihu-Profession). The proposed CENE consistently and significantly outperforms both structure-based and content-based methods on all datasets and most training ratios, demonstrating the efficacy of our approach.
Besides, we have the following interesting observations:
1. For most tasks, simple concatenation of structure-based methods and content-based methods yields improvements, showing the importance of both network structure and contents.
2. Despite the simplicity, CENEWAvg obtains promising results in general, outperforming most of the baseline methods by a significant margin. Furthermore, CENERNN and CENEBiRNN perform better than WAvg in most cases.
3. BiRNN works better than RNN on DBLP, while RNN is better on Zhihu. The main factor here is the average sentence length, which is 25 on DBLP and 11 on Zhihu. As discussed earlier (in the description of the BiRNN model), BiRNN is more powerful for longer sentences.
4. Content-based methods work generally better than structure-based methods on Zhihu, but worse on DBLP. This observation implies that structural relationships are more indicative than contents in DBLP, that is, papers tend to cite papers within the same area. Zhihu, however, is an interest-driven network, and thus contents are more important for node representation.
5. TADW performs poorly on Zhihu. This is mainly because TADW is originally designed for networks where each node has only one document. However, nodes on Zhihu networks may follow multiple questions and the contents are relatively independent.
We further conduct experiments on another DBLP dataset, the one used for TriDNR (Pan et al. 2016), to compare with it directly. We examine both the original semi-supervised version of TriDNR and an unsupervised version, in which the label-node relationship is discarded. Table 6 shows that CENERNN and CENEBiRNN outperform even the semi-supervised TriDNR, which is a promising result.
Table 3: Performance on Zhihu-Gender.
Table 4: Performance on Zhihu-Location.
Table 5: Performance on Zhihu-Profession.
Table 6: Performance compared with TriDNR.
Conventional structure-based methods perform poorly on small-degree nodes (e.g., a Zhihu user may neither follow nor be followed). However, the introduction of content nodes would greatly alleviate the structural sparsity. Figure 4a shows the classification performance of CENE over nodes with different degrees on Zhihu-Gender, compared with DeepWalk. Figure 4b shows the curve of the absolute differences. We can see CENE has a significantly larger impact on small-degree nodes, which verifies our hypothesis.
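A per-degree breakdown of this kind can be computed as sketched below. The degree bins are hypothetical (the grouping used for Figure 4 is not stated in the text), and degree here counts both incoming and outgoing edges.

```python
from collections import defaultdict

def degree_of(edges):
    """Total degree (in + out) of every node that appears in an edge."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def accuracy_by_degree(edges, y_true, y_pred, bins):
    """Accuracy per inclusive degree bin (lo, hi); isolated nodes have degree 0."""
    deg = degree_of(edges)
    hits, totals = defaultdict(int), defaultdict(int)
    for node, label in y_true.items():
        d = deg.get(node, 0)
        for lo, hi in bins:
            if lo <= d <= hi:
                totals[(lo, hi)] += 1
                hits[(lo, hi)] += int(y_pred[node] == label)
                break
    return {b: hits[b] / totals[b] for b in totals}

edges = [("a", "b"), ("a", "c")]
y_true = {"a": 1, "b": 0, "c": 0, "d": 1}
y_pred = {"a": 1, "b": 0, "c": 1, "d": 0}
result = accuracy_by_degree(edges, y_true, y_pred, bins=[(0, 0), (1, 1), (2, 10)])
```

Running the same breakdown for two methods and subtracting the per-bin scores gives the kind of difference curve described for Figure 4b.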
Figure 4: Performance of Zhihu-Gender over user groups with different degrees.
Parameter Sensitivity
CENE has two hyperparameters: the iteration number $k$ and the balance weight $\alpha$. We fix the training portion to 50% and test the classification F1 score with different $k$ and $\alpha$.
Figure 5 shows F1 scores with the training ratio set to 10%, 50% and 90%, on four different tasks. For all tasks, all three curves converge stably when $k$ approaches 100.
Figure 5: Performance over iteration number.
Figure 6: Performance over $\alpha$.
Figure 6 shows the effect of $\alpha$. Note that if $\alpha = 0$, only content information will be used, and when $\alpha = 1$, our model degenerates into a structure-based one (W2V). As $\alpha$ increases, the performance of CENE increases at first but decreases when $\alpha$ is large enough. There is an abrupt decrease when $\alpha$ grows from 0.9 to 1.0, indicating the importance of content information. Another notable phenomenon is that for the location attribute on Zhihu, performance keeps dropping as $\alpha$ increases. This observation makes sense, since one of the critical advantages of social networks is that they break regional limitations, so the network structure provides little hint, or even noise, for identifying users’ real locations.
In this paper, we present CENE, a novel network embedding method which leverages both structure and textual content information in a network by regarding contents as a special kind of node. Experiments on the task of node classification with two real-world datasets demonstrate the effectiveness of our model. Three content embedding methods are investigated, and we show that deeper models (RNN and BiRNN) are more competent for text modeling. For future work, we will extend our methods to networks with more diverse contents such as images.
This work was supported by the National Basic Research Program (973 Program) of China via Grant 2014CB340503, the National Natural Science Foundation of China (NSFC) via Grant 61472107 and 61133012.
[Belkin and Niyogi 2001] Belkin, M., and Niyogi, P. 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, 585–591.
[Blunsom, Grefenstette, and Kalchbrenner 2014] Blunsom, P.; Grefenstette, E.; and Kalchbrenner, N. 2014. A convolutional neural network for modelling sentences. In Proc. of ACL.
[Borg and Groenen 2005] Borg, I., and Groenen, P. J. 2005. Modern multidimensional scaling: Theory and applications.
[Cao, Lu, and Xu 2015] Cao, S.; Lu, W.; and Xu, Q. 2015. Grarep: Learning graph representations with global structural information. In Proc. of CIKM, 891–900.
[Cho et al. 2014] Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[Ferrone and Zanzotto 2013] Ferrone, L., and Zanzotto, F. M. 2013. Linear compositional distributional semantics and structural kernels. In Joint Symposium on Semantic Processing, 85.
[Grover and Leskovec 2016] Grover, A., and Leskovec, J. 2016. Node2vec: Scalable feature learning for networks. In ACM SIGKDD, 855–864.
[Hochreiter and Schmidhuber 1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
[Hoff, Raftery, and Handcock 2002] Hoff, P. D.; Raftery, A. E.; and Handcock, M. S. 2002. Latent space approaches to social network analysis. Journal of the American Statistical Association 1090–1098.
[Iyyer et al. 2015] Iyyer, M.; Manjunatha, V.; Boyd-Graber, J.; and Daumé III, H. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proc. of ACL.
[Johnson and Zhang 2015] Johnson, R., and Zhang, T. 2015. Effective use of word order for text categorization with convolutional neural networks. In Proc. of NAACL, 103–112.
[Joulin et al. 2016] Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
[Kiros et al. 2015] Kiros, R.; Zhu, Y.; Salakhutdinov, R. R.; Zemel, R.; Urtasun, R.; Torralba, A.; and Fidler, S. 2015. Skip-thought vectors. In NIPS, 3294–3302.
[Le and Mikolov 2014] Le, Q., and Mikolov, T. 2014. Distributed representations of sentences and documents. In Proc. of ICML, 1188–1196.
[Liben-Nowell and Kleinberg 2007] Liben-Nowell, D., and Kleinberg, J. 2007. The link-prediction problem for social networks. JASIST 1019–1031.
[Mikolov et al. 2013a] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[Mikolov et al. 2013b] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS, 3111–3119.
[Mitchell and Lapata 2010] Mitchell, J., and Lapata, M. 2010. Composition in distributional models of semantics. Cognitive science 34:1388–1429.
[Ou et al. 2016] Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; and Zhu, W. 2016. Asymmetric transitivity preserving graph embedding. In Proc. of ACM SIGKDD, 1105–1114.
[Pan et al. 2016] Pan, S.; Wu, J.; Zhu, X.; Zhang, C.; and Wang, Y. 2016. Tri-party deep network representation. In IJCAI.
[Pedregosa et al. 2011] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830.
[Perozzi, Al-Rfou, and Skiena 2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In Proc. of ACM SIGKDD, 701–710.
[Perozzi, Kulkarni, and Skiena 2016] Perozzi, B.; Kulkarni, V.; and Skiena, S. 2016. Walklets: Multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115.
[Roweis and Saul 2000] Roweis, S. T., and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326.
[Schuster and Paliwal 1997] Schuster, M., and Paliwal, K. K. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673–2681.
[Sen et al. 2008] Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; and Eliassi-Rad, T. 2008. Collective classification in network data. AI Magazine 29(3):93.
[Socher et al. 2013] Socher, R.; Perelygin, A.; Wu, J. Y.; Chuang, J.; Manning, C. D.; Ng, A. Y.; and Potts, C. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. of EMNLP, 1631–1642.
[Socher et al. 2014] Socher, R.; Karpathy, A.; Le, Q. V.; Manning, C. D.; and Ng, A. Y. 2014. Grounded compositional semantics for finding and describing images with sentences. TACL 2:207–218.
[Tang et al. 2008] Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. Arnetminer: Extraction and mining of academic social networks. In KDD’08, 990–998.
[Tang et al. 2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. Line: Large-scale information network embedding. In Proc. of WWW, 1067–1077.
[Tang, Qu, and Mei 2015] Tang, J.; Qu, M.; and Mei, Q. 2015. Pte: Predictive text embedding through large-scale heterogeneous text networks. In Proc. of ACM SIGKDD, 1165–1174.
[Tenenbaum, De Silva, and Langford 2000] Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323.
[Tsoumakas and Katakis 2006] Tsoumakas, G., and Katakis, I. 2006. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece.
[Tu, Liu, and Sun 2014] Tu, C.; Liu, Z.; and Sun, M. 2014. Inferring correspondences from multiple sources for microblog user tags. In Chinese National Conference on Social Media Processing, 1–12.
[Wang, Cui, and Zhu 2016] Wang, D.; Cui, P.; and Zhu, W. 2016. Structural deep network embedding. In Proc. of ACM SIGKDD, 1225–1234.
[Yang and Liu 2015] Yang, C., and Liu, Z. 2015. Comprehend deepwalk as matrix factorization. arXiv preprint arXiv:1501.00358.
[Yang et al. 2015] Yang, C.; Liu, Z.; Zhao, D.; Sun, M.; and Chang, E. Y. 2015. Network representation learning with rich text information. In Proc. of IJCAI, 2111–2117.
[Yu et al. 2014] Yu, X.; Ren, X.; Sun, Y.; Gu, Q.; Sturt, B.; Khandelwal, U.; Norick, B.; and Han, J. 2014. Personalized entity recommendation: A heterogeneous information network approach. In Proc. of the WSDM, 283–292.