Named Entities (NEs) concisely represent the knowledge of people, things, and events, among others. Having a hierarchy of such Named Entities, and being able to classify textual data into it, is crucial in order to structure knowledge, and as such make it reusable as a data source for applications.
Classifying Wikipedia into a set of defined classes has been addressed in several works, see for instance [Torisawa, 2007; Dakka and Cucerzan, 2008; Nothman et al., 2013]. However, these studies typically rely on a coarse hierarchy of named entities, not exceeding a few tens of types. Coarse-grained clas-sification is commonly used because it allows the researchers to have enough data points for each class, thus countering the data sparseness issue arising when applying supervised machine learning methods. For example, articles such as “Japan”, “Tokyo”, and “Tokyo Museum of Modern Art” may be classified in different fine-grained types, but all fall into a larger Location type. Another challenge is that that fine-grained types are often overlapping. For example, “Tokyo Museum of Modern Art” can be classified as Location as well as Culture. However, using a fine-grained hierarchy yields the benefit of precision, that is of crucial importance in many machine learning applications. The large amounts of data present in Wikipedia allows such precision, and as such fine-grained entity hierarchies are gaining more and more attention. In this paper, we rely on the Extended Named Entity hierarchy (ENE) as defined by [Sekine, 2008] to classify articles of the online encyclopaedia Wikipedia.
Moreover, in many contexts, including Wikipedia, there is inherent structure to the data: thematically similar pages are likely to refer to each other, data is organized into categories [Wu and Weld, 2008], etc. This structure can be naturally modelled in the form of a graph, i.e. a set of nodes and a set of links between those nodes.
The Wikipedia graph has been studied in many contexts, and it has been shown to exhibit traits of a real-world graph structure [Tsourakakis et al., 2009]. Typical applications of the graph analysis of Wikipedia include trust assessment [Lu- cassen and Schraagen, 2010], key page identification [Hin- nosaar et al., 2019], authority figure identification [Kittur et al., 2007], or knowledge base construction [Mahdisoltani et al., 2013; Lehmann et al., 2015]. Exploiting the graph structure can help classification, by identifying Wikipedia pages that describe the same roles, even though their contents are widely different: this might be the case, for example, of two head of states in different countries.
We argue in this paper that the underlying graphs of Wikipedia can aid classification, and in particular fight class imbalance; the gist behind this idea is that even for an ENE class with few annotated examples, we can rely on the rich underlying structure of the graph. To address this, in this paper, we propose a hybrid classifier that takes into account the structure of Wikipedia as well as the natural language features devised in [Suzuki et al., 2016].
We are interested in predicting the named entity corresponding to a Wikipedia page; in other words, we are interesting in performing a classification task, mapping the Wikipedia pages into one of the categories of the ENE hierarchy. More specifically, in order to capture the structure of exchanges, we use the GraphSAGE learning algorithm [Hamil- ton et al., 2017]. Graph neural network models have known a recent development, and have shown to outperform the state-of-the-art methods in many contexts [Kipf and Welling, 2017; Hamilton et al., 2017; Schlichtkrull et al., 2017].
After assessing the relevance of the graph structure for the classification of Wikipedia in a fine-grained hierarchy, we explore different ways by which the graph structure can be integrated. One of them is the integration of graph-theoretical features, based on notions commonly used in real-world complex network analysis, whereas the other focuses on extending graph neural networks models, in particular by introducing a new aggregator for GraphSAGE, WMEAN, that relies on a weighted mean of the vector representations of neighbours of a page.
Our work shows, in particular, that:
• The underlying graph structure of Wikipedia, is helpful for natural language processing classification tasks, in particular in fighting class imbalance;
• That the ENE structure is hierarchical in nature, and that encoding this structure in the model is beneficial;
• That in a context where deep learning models typically rely on large swaths of annotated data, graph features and models offer an alternative requiring less data.
In Section 2, we discuss the related works, before exploring in Section 3 the relevance of graph information for clas-sification and defining the two directions mentioned above. Section 4 details our problem setting, and we present at-scale experiments in Section 5. We end our paper with perspectives for future work, in Section 6.
In many contexts, Wikipedia can be considered as a knowledge base. This has many advantages: the huge amount of data allows for cross-referencing information, data is typically available in multiple languages, and the data is semi-structured, since every page is categorised. However, as [Zesch and Gurevych, 2007] shows, the Wikipedia Category Graph alone is typically user-contributed, and as such is not necessarily consistent from one page to another, hindering any classification task.
Named Entity Classification. On the other hand, expertcrafted ontologies such as CyC are curated and provide a consistent basis for classification. However, to keep up-to-date with new concepts and retain their genericity, they typically comprise of few classes [Fensel, 2001], or are quickly outdated.
Fine-grained ontologies such as the Extended Named Entities ontology introduced by Sekine [2008] come as a solution to both problems: being fine-grained, they capture the subtleties in the data, and being in part built from real-world datasets (such as a corpus of newspaper articles), they are up-to-date with current knowledge. Yogatama et al. [2015] give another example of such fine-grained ontology. In this paper, we focus on the ENE ontology introduced by Sekine [2008], as it is adapted to the Japanese language.
The work of [Suzuki et al., 2016] uses a Skip-gram model on the hypertext links to learn feature vectors, and as such is a first attempt at taking into account the structure induced by hyperlinks. In their paper, [Shavarani and Sekine, 2019] provide F1-score results for many baselines, on a different dataset extracted from the japanese Wikipedia, using the same ENE structure. It is, after the work of [Suzuki et al., 2016], the closest work from this paper.
Graph neural networks. Learning on graph-based data has been a recent development in machine learning, in order to take into account the rich structure of real-world datasets. Since first work defining a graph neural network model [Scarselli et al., 2008], a large number of models have emerged.
Some of the recent advances in graph neural networks have been loosely inspired by existing natural language processing methods, as for example with NODE2VEC [Grover and Leskovec, 2016], based on WORD2VEC; in turn, [Kipf and Welling, 2017] introduced GCN, which forms the basis for many current graph neural networks, such as GRAPH-SAGE [Hamilton et al., 2017]. These approaches heavily rely on random walks on the graph (or on subgraphs) to gain insight on nodes that are both close and far in the graph, without being intractable in terms of computation [Perozzi et al., 2014].
Wikipedia as graph. Conversely, the many graphs structures in Wikipedia have been thoroughly studied [Suh et al., 2007; Wu et al., 2011], in applications related to natural language processing [Aouicha et al., 2016; Zesch and Gurevych, 2007], or in large-scale graph analysis [Brandes et al., 2009]. In all cases, the most common graph studied is the document-to-document graph [Brandes et al., 2009], as well as the Wikipedia Category Graph [Wu et al., 2011]. Finally, some authors also consider the author-document graph, however since this graph is bipartite there is a lack of metrics to analyse it, and as such it has collected less attention. In this paper, we will only consider the document-to-document graph.
3.1 Data
First of all, let us briefly present the dataset that we use in this paper. We focus on a dataset extracted from the online encyclopedia Wikipedia in the Japanese language.
In the following, we model Wikipedia as a graph G = (V, E), where V is the set of nodes, and is the set of edges without order or duplicates, i.e. for all
,
and (u, v) = (v, u). Elements of V are pages from Wikipedia, and a link (u, v) between page u and v means there is a citation between pages u and v.
This subset encompasses 22, 639 pages that were manually annotated into the 195 classes of the ENE structure. Among those nodes, there are about links, each of those links representing one (or more) hyperlinks between the two linked pages. We use this dataset primarily as a means of comparison to recent works, and in particular [Suzuki et al., 2016].
3.2 Graph modelling for ENE prediction
Let us explain the expected advantage of graph modelling in order to classify Wikipedia pages into the ENE hierarchy.
Figure 1 displays a heat map with each row i (column j) being ENEs, and where each element of the matrix is coloured according to the proportion of ENE j in the neighbourhoods of nodes having ENE i.
From this figure, we can derive different informations; first of all, there are ”star” ENEs, that are highly represented in any node’s neighbourhood. Those typically correspond to Person, Country, Date, i.e. ENEs that are so prevalent that they are likely to be present in any page. However, besides these ”star” ENEs, we notice that (i) the diagonal is highlighted, meaning that if a node u has ENE i, there likely exists (at least) a node v linked to u with the same ENE i, and (ii) we observe the formation of clusters (rectangles in the matrix). These clusters, given the order imposed on the rows and columns of the matrix, validate the fact that any node u with ENE i is more likely to have similar ENEs in its neighborhood.
Moreover, we display in Figure 2 two plots related to the distribution of ENEs in our dataset. Notice that the global distribution is heavily skewed: the most present ENE (Person), represents 18% of the Wikipedia pages in our dataset, whereas at the end of the tail, there are about 10 pages in the dataset. In other words, we are dealing with extremely imbalanced data, due in part to the nature of the dataset as well as the fine grain of the ENE structure itself.
However, Figure 2b shows instead the number of distinct ENEs in the neighbourhood of each page. In that case, the distribution is much less skewed, as only a limited number of ENEs is present in the neighbourhood of each node.
This analysis serves as a basis to show the relevance of the Wikipedia structure for the classification task. In order to take into account the structure induced by Wikipedia articles, we present two developments in the following: integrating the graph information with a combination of hand-crafted graph features and natural language features, and through the introduction of a new aggregator for graph neural networks.
Specifically, we experiment with the GraphSAGE model [Hamilton et al., 2017], a recent model of graph neural network that has outperformed state-of-the-art models. GraphSAGE takes into account the k-hop neighbourhood of the graph (typically with k = 2) to create representations of the nodes that can then be used as features.
3.3 Integrating graph structure as features
Let us now define the graph-theoretical features we use. We use these features for their computability (in polynomial time with respect to the number of nodes), rendering their computation feasible even on large graphs, and because they are classically used as descriptors of real-world graphs. Our goal is to describe the structure around each node, in a way that pages with similar roles in the graph will exhibit similar structure.
The degree of a node u is the number of neighbours of u: d(u) = |N(u)|.
The assortativity is classically defined for an edge (u, v), and quantifies the imbalance between the degrees of nodes u and v. Let us suppose, without loss of generality, that d(u). Then, the mean assortativity for each node is:
Communities are groups of nodes exhibiting similar properties. The definition is highly context-dependent, however a common one is that a community is a group of nodes densely connected inside, and with loose connections outside. Formally, given a graph G = (V, E), a community is group of nodes such that
is high, and
is low, where
is the graph density.
For community detection in our setting, we rely on the Louvain algorithm [Blondel et al., 2008], which produces a partition of the nodes optimizing modularity, and assign the one-hot encoded version community label to each node.
Betweenness centrality, captures the extent to which a node is central, or important [Brandes, 2001]. In this context, the centrality of a node v is defined as the fraction of shortest paths going through v:
In other words, a node v has high betweenness centrality in a graph if it connects (through shortest paths) many different pairs of nodes.
3.4 Taking into account the structure of the ENE hierarchy
Moreover, even though the classes can be seen as independent of each other in first approximation, they stem from a hierarchical definition. Indeed, some ENEs are closer to each other than others, for instance “Country” and “Region” are similar, whereas “Country” and “Food” are not.
This means that even though we are aiming at perfect clas-sification, it is also acceptable to misclassify into a ”neighbouring” ENE.
In order to take into account this hierarchy, we introduce a new aggregator for GraphSAGE, WMEAN:
where is the vector representation of v at iteration k, W is a matrix of weights, N(v) is the neighborhood set of v,
is non-linearity,
is the statistical mean, and P is a vector of size N(v) weighting differently each neighbour depending on its ENE; we define two scenarios for the vector P.
for the sake of simplicity, given a node v and one of its neighbours u, P can take the following values: 0.25 if if
and 1 if ENE(u) = ENE(v). We say that
if the two ENEs have a common parent, i.e. they belong to the same group. The goal of this aggregator is to confer more weight to neighbours of u having the same (or a similar) ENE as u. We refer to this scenario as WMEAN-1.
Figure 1: Heatmap representing the proportion of ENE j in the neighbourhood of nodes having ENE i. ENEs in each row and column are sorted accroding to their order in the ENE hierarchy, which means that closely related ENEs are close to each other. We normalize such that for all row is constant. Notice that this makes the matrix non-symmetric.
(a) Distribution of the number of nodes (pages) per ENE. The most prevalent one, Person, encompasses around 4000 nodes.
(b) Distribution of the number of ENEs in the neighbourhood of each node.
Figure 2: Distributions related to the number of ENEs, demonstrating how taking into account the graph information can fight class imbalance.
Figure 3: Our experimental setup
We also formulate a more flexible function, that, given a hierarchy of classes, ponders the representation inversely proportionally to the distance to the closest common parent between nodes v and u, for all . We refer to this scenario as WMEAN-2.
We focus on the problem of accurately classifying unseen Wikipedia pages in different scenarios, in a semi-supervised setting. In other words, for a given Wikipedia page u, we are interest in accurately predicting its label in the ENE hierarchy.
In particular, we are interested in assessing the potential for contribution of the graph structure for this task.
Figure 3 summarizes our experimental setting. We experiment on different features sets, using only NLP features, using graph-based features, and using node embeddings. Since we are primarily interested in assessing the contribution of graphs, the NLP features are entirely derived from the work described in [Suzuki et al., 2016]. We evaluate the prediction accuracy by reporting the micro F1-score.
Before describing the data and our results, let us finally explain how we perform the train/validation/test split. Indeed, since we are interested in relational data, we believe it is important not to simply select instances at random. This selection method could lead to disconnected graphs in the test setting, lowering the impact that graph models can have. To the extreme, the test graph could be completely comprised of isolated nodes. In order to prevent this, we select training, validation and test in a way that ensures that these graphs are connected (that there is at least a path between all pairs of nodes).
Formally, let us consider a graph G = (V, E), with V a set of nodes and a set of edges. We consider G to be simple, i.e. undirected ((u, v) = (v, u)), unweighted, and without self-loops (
).
Let us partition the set of nodes V into 3 sets, and
, respectively the training, validation and testing sets. These sets are built uniformly at random, such that
The validation is done on the graph , where
. This is the graph induced by the training and validation nodes, where all edges between the
Figure 4: The training, validation and testing graph used in our setting.
validation nodes have been removed.
Finally, the testing is done on the graph . This is the graph where all edges between the test nodes have been removed.
For an illustration of these three graphs and how they are built, see Figure 4.
The algorithm only predicts labels for nodes in the validation phase, and in
in the testing phase.
5.1 Results
Before presenting the results, we tuned the hyperparameters using a random search (1, 000 trials) on the hyperparameter space. The hyperparameter space contains, in addition to the GraphSAGE hyperparameters, the feature set to choose (graph handcrafted features, a subset of the natural language features, node embeddings and their combinations) as well as the aggregator (picked among the four GraphSAGE available ones, or WMEAN-1 and WMEAN-2).
We found that we obtain the best results with two hidden layers, each subsampling respectively 10, then 25 times into each node’s neighbourhood, and when using a combination of node embeddings (of dimension 128).
We present in Table 1 results for the different strategies we elaborated on in Section 3.2 on the dataset presented in Section 3.1. Each reported F1-score is the average of a 10-fold cross-validation.
The first line (in italics) is the result reproduced on this dataset from [Suzuki et al., 2016], and is only provided here for reference. Notice that it yields slightly better results than presented in [Suzuki et al., 2016]; this is only due to the use of a more recent version of MeCab, the tokenizer we use to separate words in Japanese, and no other modifications to the methods presented in [Suzuki et al., 2016] have been made.
From Table 1, we can see that, first of all, graphs alone are the worst performers (line 2). This is expectable, since the task at hand is language-dependent, and the graph structure as we define it has no language related features. However, the hybrid model, combining natural language features and graph embeddings (line 4), performs comparably to the baseline, despite using 20% of the natural language features.
One possible explanation for this stalling may lay into the number of handcrafted graph features, that is very small compared to the number of natural language features, which may lead to their relative lack of impact.
Finally, our two scenarios WMEAN-1 and WMEAN-2 perform significantly better than our baseline, as reported on
Table 1: Micro F1-score for the ENE classification task. Test results that are better than our baseline are highlighted.
lines 5 and 6.
In this paper, we propose a hybrid classifier to classify articles of the online encyclopedia Wikipedia into a given hierarchy of named entities. We focus on state-of-the-art models, and assess the interest of integrating the graph structure in existing classifiers. We show that indeed, the underlying graph structure is of interest for the classification task.
We then design non-uniform aggregator functions for graph neural networks, and demonstrate that we obtain comparable to slightly better F1-scores using significantly less features (roughly 2,000 instead of 10,000); the computation time is also significantly reduced, from a few hours to less than 10 minutes.
We hope that this work provides a proof of concept of the interest of graph-based features for natural language processing tasks.
This work opens many possibilities for future work, that we briefly detail now. First of all, and most importantly, we used an older dataset in order to assess the interest of graph models in a setting where comparison to previous work on the same Extended Named Entity ontology was available. Now that this interest is validated, this opens perspectives to use more recent datasets, that are typically more complete and multilingual. The Shinra-ML project [Sekine et al., 2018] released in October 2019 Wikipedia data in 30 languages, and is as such a dataset of choice for future works on this topic. In this setting, assessing the relevance of graph-based methods in a multilingual setting (where NLP features are harder to design) is a privileged avenue of work.
Now that comparison with the direct state-of-the-art has been done, another track for future work involves redesigning natural language features. Indeed, we suspect that the fact that our performance is hindered by the sheer number of features introduced by [Suzuki et al., 2016], as it minimizes the potential for other-sourced features. In particular, BERT features [Devlin et al., 2018] is an exciting direction that we intend to follow.
From the graph perspective, we only scratched the surface in terms of what is possible and available. This includes, of course, new features, and new models. For example, GraphSAGE, that we base our work upon, heavily relies on neighbourhood sampling to remain computationally efficient even with massive data. The way this sampling is currently done (uniformly at random) is destructive of outliers and events, which could be important in our task. Moreover, previous research shows that graphs have a strong potential to make models more explainable [Viard and Fournier- S’niehotta, 2019]; in our case, for example, this could mean identifying and visualizing subgraphs associated to classifi-cation errors. We hope that this paper serves as a proof of interest, and can be used as a basis for future works.
[Aouicha et al., 2016] Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb, and Malek Ezzeddine. Derivation of “is a” taxonomy from wikipedia category graph. Engineering Applications of Artificial Intelligence, 50:265–286, 2016.
[Blondel et al., 2008] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
[Brandes et al., 2009] Ulrik Brandes, Patrick Kenis, J¨urgen Lerner, and Denise Van Raaij. Network analysis of collaboration structure in wikipedia. In Proceedings of the 18th international conference on World wide web, pages 731–740. ACM, 2009.
[Brandes, 2001] Ulrik Brandes. A faster algorithm for be- tweenness centrality. Journal of mathematical sociology, 25(2):163–177, 2001.
[Dakka and Cucerzan, 2008] Wisam Dakka and Silviu Cucerzan. Augmenting wikipedia with named entity tags. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I, 2008.
[Devlin et al., 2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[Fensel, 2001] Dieter Fensel. Ontologies. In Ontologies, pages 11–18. Springer, 2001.
[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.
[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
[Hinnosaar et al., 2019] Marit Hinnosaar, Toomas Hinnosaar, Michael E Kummer, and Olga Slivko. Wikipedia matters. Available at SSRN 3046400, 2019.
[Kipf and Welling, 2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, 2017.
[Kittur et al., 2007] Aniket Kittur, Bongwon Suh, Bryan A Pendleton, and Ed H Chi. He says, she says: conflict and coordination in wikipedia. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 453–462. ACM, 2007.
[Lehmann et al., 2015] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, S¨oren Auer, et al. Dbpedia–a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6(2):167–195, 2015.
[Lucassen and Schraagen, 2010] Teun Lucassen and Jan Maarten Schraagen. Trust in wikipedia: how users trust information from an unknown source. In Proceedings of the 4th workshop on Information credibility, pages 19–26. ACM, 2010.
[Mahdisoltani et al., 2013] Farzaneh Mahdisoltani, Joanna Biega, and Fabian M Suchanek. Yago3: A knowledge base from multilingual wikipedias. 2013.
[Nothman et al., 2013] Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R Curran. Learning multilingual named entity recognition from wikipedia. Ar-tificial Intelligence, 194:151–175, 2013.
[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
[Scarselli et al., 2008] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.
[Schlichtkrull et al., 2017] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. arXiv preprint arXiv:1703.06103, 2017.
[Sekine et al., 2018] Satoshi Sekine, Akio Kobayashi, and Kouta Nakayama. Shinra: Structuring wikipedia by collaborative contribution. 2018.
[Sekine, 2008] Satoshi Sekine. Extended named entity on- tology with attribute information. In LREC, pages 52–57, 2008.
[Shavarani and Sekine, 2019] Hassan S Shavarani and Satoshi Sekine. Multi-class multilingual classification of
wikipedia articles using extended named entity tag set. arXiv preprint arXiv:1909.06502, 2019.
[Suh et al., 2007] Bongwon Suh, Ed H Chi, Bryan A Pendleton, and Aniket Kittur. Us vs. them: Understanding social dynamics in wikipedia with revert graph visualizations. In 2007 IEEE Symposium on Visual Analytics Science and Technology, pages 163–170. IEEE, 2007.
[Suzuki et al., 2016] Masatoshi Suzuki, Koji Matsuda, Satoshi Sekine, Naoaki Okazaki, and Kentaro Inui. Fine-grained named entity classification with wikipedia article vectors. In 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 483–486. IEEE, 2016.
[Torisawa, 2007] Kentaro Torisawa. Exploiting wikipedia as external knowledge for named entity recognition. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pages 698– 707, 2007.
[Tsourakakis et al., 2009] Charalampos E Tsourakakis, U Kang, Gary L Miller, and Christos Faloutsos. Doulion: counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 837–846. ACM, 2009.
[Viard and Fournier-S’niehotta, 2019] Tiphaine Viard and Rapha¨el Fournier-S’niehotta. Augmenting content-based rating prediction with link stream features. Computer Networks, 150:127–133, 2019.
[Wu and Weld, 2008] Fei Wu and Daniel S Weld. Automati- cally refining the wikipedia infobox ontology. In Proceedings of the 17th international conference on World Wide Web, pages 635–644. ACM, 2008.
[Wu et al., 2011] Guangyu Wu, Martin Harrigan, and P´adraig Cunningham. Characterizing wikipedia pages using edit network motif profiles. In Proceedings of the 3rd international workshop on Search and mining usergenerated contents, pages 45–52. ACM, 2011.
[Yogatama et al., 2015] Dani Yogatama, Daniel Gillick, and Nevena Lazic. Embedding methods for fine grained entity type classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 291–296, 2015.
[Zesch and Gurevych, 2007] Torsten Zesch and Iryna Gurevych. Analysis of the wikipedia category graph for nlp applications. In Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing, pages 1–8, 2007.