A Survey on Knowledge Graphs: Representation, Acquisition and Applications

2020·arXiv

Abstract

I. INTRODUCTION

INCORPORATING human knowledge is one of the re-search directions of artificial intelligence (AI). Knowledge representation and reasoning, inspired by human problem solving, is to represent knowledge for intelligent systems to gain the ability to solve complex tasks [1], [2]. Recently, knowledge graphs as a form of structured human knowledge have drawn great research attention from both the academia and the industry [3]–[6]. A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts, relationships represent the relation

Manuscript received August 09, 2020; revised November xx, 2020; accepted March 30, 2021. This work is supported in part by NSF under grants III-1763325, III-1909323, SaTC-1930941, in part by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046), and in part by the Academy of Finland (grants 336033, 315896), BusinessFinland (grant 884/31/2018), and EU H2020 (grant 101016775). (Corresponding author: Shirui Pan.)

S. Ji and P. Marttinen are with Aalto University, Finland. E-mail: {shaoxiong.ji; pekka.marttinen}@aalto.fi

S. Pan is with the Department of Data Science and AI, Faculty of IT, Monash University, Australia. E-mail: shirui.pan@monash.edu

E. Cambria is with Nanyang Technological University, Singapore. Email: cambria@ntu.edu.sg

between entities, and semantic descriptions of entities, and their relationships contain types and properties with a well-defined meaning. Property graphs or attributed graphs are widely used, in which nodes and relations have properties or attributes.

The term of knowledge graph is synonymous with knowledge base with a minor difference. A knowledge graph can be viewed as a graph when considering its graph structure [7]. When it involves formal semantics, it can be taken as a knowledge base for interpretation and inference over facts [8]. Examples of knowledge base and knowledge graph are illustrated in Fig. 1. Knowledge can be expressed in a factual triple in the form of (head, relation, tail) or (subject, predicate, object) under the resource description framework (RDF), for example, (Albert Einstein, WinnerOf, Nobel Prize). It can also be represented as a directed graph with nodes as entities and edges as relations. For simplicity and following the trend of the research community, this paper uses the terms knowledge graph and knowledge base interchangeably.

Fig. 1: An example of knowledge base and knowledge graph.

Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE) by mapping entities and relations into low-dimensional vectors while capturing their semantic meanings [5], [9]. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction. Knowledge-aware models benefit from the integration of heterogeneous information, rich ontologies and semantics for knowledge representation, and multi-lingual knowledge. Thus, many real-world applications such as recommendation systems and question answering have been brought about prosperity with the ability of commonsense understanding and reasoning. Some real-world products, for example, Microsoft’s Satori and Google’s Knowledge Graph [3], have shown a strong capacity to provide more efficient services.

This paper conducts a comprehensive survey of current literature on knowledge graphs, which enriches graphs with more context, intelligence, and semantics for knowledge acquisition and knowledge-aware applications. Our main contributions are summarized as follows.

• Comprehensive review. We conduct a comprehensive review of the origin of knowledge graph and modern techniques for relational learning on knowledge graphs. Major neural architectures of knowledge graph representation learning and reasoning are introduced and compared. Moreover, we provide a complete overview of many applications in different domains.

• Full-view categorization and new taxonomies. A full-view categorization of research on knowledge graph, together with fine-grained new taxonomies are presented. Specifically, in the high-level, we review the research on knowledge graphs in four aspects: KRL, knowledge acquisition, temporal knowledge graphs, and knowledge-aware applications. For KRL, we further propose fine-grained taxonomies into four views, including representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, KGC is reviewed under embedding-based ranking, relational path reasoning, logical rule reasoning, and meta relational learning; entity acquisition tasks are divided into entity recognition, typing, disambiguation, and alignment; and relation extraction is discussed according to the neural paradigms.

• Wide coverage on emerging advances. We provide wide coverage on emerging topics, including transformer-based knowledge encoding, graph neural network (GNN) based knowledge propagation, reinforcement learning-based path reasoning, and meta relational learning.

• Summary and outlook on future directions. This survey provides a summary of each category and highlights promising future research directions.

The remainder of this survey is organized as follows: first, an overview of knowledge graphs including history, notations, definitions, and categorization is given in Section II; then, we discuss KRL in Section III from four scopes; next, our review goes to tasks of knowledge acquisition and temporal knowledge graphs in Section IV and Section V; downstream applications are introduced in Section VI; finally, we discuss future research directions, together with a conclusion in the end. Other information, including KRL model training and a collection of knowledge graph datasets and open-source implementations, can be found in the appendices.

II. OVERVIEW

A. A Brief History of Knowledge Bases

Knowledge representation has experienced a long-period history of development in the fields of logic and AI. The idea of graphical knowledge representation firstly dated back to 1956 as the concept of semantic net proposed by Richens [10], while the symbolic logic knowledge can go back to the General Problem Solver [1] in 1959. The knowledge base is firstly used with knowledge-based systems for reasoning and problemsolving. MYCIN [2] is one of the most famous rule-based expert systems for medical diagnosis with a knowledge base of about 600 rules. Later, the community of human knowledge representation saw the development of frame-based language, rule-based, and hybrid representations. Approximately at the end of this period, the Cyc project1 began, aiming at assembling human knowledge. Resource description framework (RDF)2 and Web Ontology Language (OWL)3 were released in turn, and became important standards of the Semantic Web4. Then, many open knowledge bases or ontologies were published, such as WordNet, DBpedia, YAGO, and Freebase. Stokman and Vries [7] proposed a modern idea of structure knowledge in a graph in 1988. However, it was in 2012 that the concept of knowledge graph gained great popularity since its first launch by Google’s search engine5, where the knowledge fusion framework called Knowledge Vault [3] was proposed to build large-scale knowledge graphs. A brief road map of knowledge base history is illustrated in Fig. 10 in Appendix A. Many general knowledge graph databases and domain-specific knowledge bases have been released to facilitate research. We introduce more general and domain-specific knowledge bases in Appendices F-A1 and F-A2.

B. Definitions and Notations

Most efforts have been made to give a definition by describing general semantic representation or essential characteristics. However, there is no such wide-accepted formal definition. Paulheim [11] defined four criteria for knowledge graphs. Ehrlinger and W¨oß [12] analyzed several existing definitions and proposed Definition 1, which emphasizes the reasoning engine of knowledge graphs. Wang et al. [5] proposed a definition as a multi-relational graph in Definition 2. Following previous literature, we define a knowledge graph as G = {E, R, F}, where E, R and F are sets of entities, relations and facts, respectively. A fact is denoted as a triple

Definition 1 (Ehrlinger and W¨oß [12]). A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.

Definition 2 (Wang et al. [5]). A knowledge graph is a multi-relational graph composed of entities and relations which are regarded as nodes and different types of edges, respectively.

Specific notations and their descriptions are listed in Table I. Details of several mathematical operations are explained in Appendix B.

C. Categorization of Research on Knowledge Graph

This survey provides a comprehensive literature review on the research of knowledge graphs, namely KRL, knowledge acquisition, and a wide range of downstream knowledge-aware applications, where many recent advanced deep learning techniques are integrated. The overall categorization of the research is illustrated in Fig. 2.

TABLE I: Notations and descriptions.

Knowledge Representation Learning is a critical research issue of knowledge graph which paves the way for many knowledge acquisition tasks and downstream applications. We categorize KRL into four aspects of representation space, scoring function, encoding models and auxiliary information, providing a clear workflow for developing a KRL model. Specific ingredients include:

1) representation space in which the relations and entities are represented;

2) scoring function for measuring the plausibility of factual triples;

3) encoding models for representing and learning relational interactions;

4) auxiliary information to be incorporated into the embedding methods.

Representation learning includes point-wise space, manifold, complex vector space, Gaussian distribution, and discrete space. Scoring metrics are generally divided into distance-based and similarity matching based scoring functions. Current research focuses on encoding models, including linear/bilinear models, factorization, and neural networks. Auxiliary information considers textual, visual, and type information.

Knowledge Acquisition tasks are divided into three categories, i.e., KGC, relation extraction, and entity discovery. The first one is for expanding existing knowledge graphs, while the other two discover new knowledge (aka relations and entities) from the text. KGC falls into the following categories: embedding-based ranking, relation path reasoning, rule-based reasoning, and meta relational learning. Entity discovery includes recognition, disambiguation, typing, and alignment. Relation extraction models utilize attention mechanism, graph convolutional networks (GCNs), adversarial training, reinforcement learning, deep residual learning, and transfer learning.

Temporal Knowledge Graphs incorporate temporal information for representation learning. This survey categorizes four research fields, including temporal embedding, entity dynamics, temporal relational dependency, and temporal logical reasoning.

Knowledge-aware Applications include natural language understanding (NLU), question answering, recommendation systems, and miscellaneous real-world tasks, which inject knowledge to improve representation learning.

Fig. 2: Categorization of research on knowledge graphs.

D. Related Surveys

Previous survey papers on knowledge graphs mainly focus on statistical relational learning [4], knowledge graph refinement [11], Chinese knowledge graph construction [13], knowledge reasoning [14], KGE [5] or KRL [9]. The latter two surveys are more related to our work. Lin et al. [9] presented KRL in a linear manner, with a concentration on quantitative analysis. Wang et al. [5] categorized KRL according to scoring functions and specifically focused on the type of information utilized in KRL. It provides a general view of current research only from the perspective of scoring metrics. Our survey goes deeper to the flow of KRL and provides a full-scaled view from four-folds, including representation space, scoring function, encoding models, and auxiliary information. Besides, our paper provides a comprehensive review of knowledge acquisition and knowledge-aware applications with several emerging topics such as knowledge-graph-based reasoning and few-shot learning discussed.

III. KNOWLEDGE REPRESENTATION LEARNING

KRL is also known as KGE, multi-relation learning, and statistical relational learning in the literature. This section reviews recent advances on distributed representation learning with rich semantic information of entities and relations form four scopes including representation space (representing entities and relations, Sec. III-A), scoring function (measuring the plausibility of facts, Sec. III-B), encoding models (modeling the semantic interaction of facts, Sec. III-C), and auxiliary information (utilizing external information, Sec. III-D). We further provide a summary in Sec. III-E. The training strategies for KRL models are reviewed in Appendix D.

A. Representation Space

The key issue of representation learning is to learn low-dimensional distributed embedding of entities and relations. Current literature mainly uses real-valued point-wise space (Fig. 3a) including vector, matrix and tensor space, while other kinds of space such as complex vector space (Fig. 3b), Gaussian space (Fig. 3c), and manifold (Fig. 3d) are utilized as well. The embedding space should follow three conditions, i.e., differentiability, calculation possibility, and definability of a scoring function [15].

1) Point-Wise Space: Point-wise Euclidean space is widely applied for representing entities and relations, projecting relation embedding in vector or matrix space, or capturing relational interactions. TransE [16] represents entities and relations in d-dimension vector space, i.e., , and makes embeddings follow the translational principle To tackle this problem of insufficiency of a single space for both entities and relations, TransR [17] then further introduces separated spaces for entities and relations. The authors projected entities () into relation () space by a projection matrix . NTN [18] models entities across multiple dimensions by a bilinear tensor neural layer. The relational interaction between head and tail is captured as a tensor denoted as . Instead of using the Cartesian coordinate system, HAKE [19] captures semantic hierarchies by mapping entities into the polar coordinate system, i.e., entity embeddings and in the modulus and phase part, respectively.

Many other translational models such as TransH [20] also use similar representation space, while semantic matching models use plain vector space (e.g., HolE [21]) and relational projection matrix (e.g., ANALOGY [22]). Principles of these translational and semantic matching models are introduced in Section III-B1 and III-B2, respectively.

2) Complex Vector Space: Instead of using a real-valued space, entities and relations are represented in a complex space, where . Take head entity as an example, h has a real part Re(h) and an imaginary part Im(h), i.e., h = Re(h)+i Im(h). ComplEx [23] firstly introduces complex vector space shown in Fig. 3b which can capture both symmetric and antisymmetric relations. Hermitian dot product is used to do composition for relation, head and the conjugate of tail. Inspired by Euler’s identity , RotatE [24] proposes a rotational model taking relation as a rotation from head entity to tail entity in complex space as denotes the element-wise Hadmard product. QuatE [25] extends the complex-valued space into hypercomplex by a quaternion Q = a + bi + cj + dk with three imaginary components, where the quaternion inner product, i.e., the Hamilton product , is used as compositional operator for head entity and relation. With the introduction of the rotational Hadmard product in complex space, RotatE [24] can also capture inversion and composition patterns as well as symmetry and antisymmetry. QuatE [25] uses Hamilton product to capture latent inter-dependencies within the four-dimensional space of entities and relations and gains a more expressive rotational capability than RotatE.

3) Gaussian Distribution: Inspired by Gaussian word embedding, the density-based embedding model KG2E [26] introduces Gaussian distribution to deal with the (un)certainties of entities and relations. The authors embedded entities and relations into multi-dimensional Gaussian distribution . The mean vector u indicates entities and relations’ position, and the covariance matrix models their (un)certainties. Following the translational principle, the probability distribution of entity transformation is denoted as . Similarly, TransG [27] represents entities with Gaussian distributions, while it draws a mixture of Gaussian distribution for relation embedding, where the m-th component translation vector of relation r is denoted as

4) Manifold and Group: This section reviews knowledge representation in manifold space, Lie group, and dihedral group. A manifold is a topological space, which could be defined as a set of points with neighborhoods by the set theory. The group is algebraic structures defined in abstract algebra. Previous point-wise modeling is an ill-posed algebraic system where the number of scoring equations is far more than the number of entities and relations. Moreover, embeddings are restricted in an overstrict geometric form even in some methods with subspace projection. To tackle these issues, ManifoldE [28] extends point-wise embedding into manifold-based embedding. The authors introduced two settings of manifold-based embedding, i.e., Sphere and Hyperplane. An example of a sphere is shown in Fig. 3d. For the sphere setting, Reproducing Kernel Hilbert Space is used to represent the manifold function. Another “hyperplane” setting is introduced to enhance the model with intersected embeddings. ManifoldE [28] relaxes the real-valued point-wise space into manifold space with a more expressive representation from the geometric perspective. When the manifold function and relation-specific manifold parameter are set to zero, the manifold collapses into a point.

Hyperbolic space, a multidimensional Riemannian manifold with a constant negative curvature , is drawing attention for its capacity of capturing hierarchical information. MuRP [29] represents the multi-relational knowledge graph in Poincar´e ball of hyperbolic space . While it fails to capture logical patterns and suffers from constant curvature. Chami et al. [30] leverages expressive hyperbolic isometries and learns a relation-specific absolute curvature in the hyperbolic space.

TorusE [15] solves the regularization problem of TransE via embedding in an n-dimensional torus space which is a compact Lie group. With the projection from vector space into torus space defined as , entities and relations are denoted as . Similar to TransE, it also learns embeddings following the relational translation in torus space, i.e., . Recently, DihEdral [31] proposes a dihedral symmetry group preserving a 2-dimensional polygon. It utilizes a finite non-Abelian group to preserve the relational properties of symmetry/skew-symmetry, inversion, and composition effectively with the rotation and reflection properties in the dihedral group.

Fig. 3: An illustration of knowledge representation in different spaces.

B. Scoring Function

The scoring function is used to measure the plausibility of facts, also referred to as the energy function in the energy-based learning framework. Energy-based learning aims to learn the energy function (parameterized by taking x as input) and to make sure positive samples have higher scores than negative samples. In this paper, the term of the scoring function is adopted for unification. There are two typical types of scoring functions, i.e., distance-based (Fig. 4a) and similarity-based (Fig. 4b) functions, to measure the plausibility of a fact. Distance-based scoring function measures the plausibility of facts by calculating the distance between entities, where addictive translation with relations as is widely used. Semantic similarity based scoring measures the plausibility of facts by semantic matching. It usually adopts a multiplicative formulation, i.e., , to transform head entity near the tail in the representation space.

Fig. 4: Illustrations of distance-based and similarity matching based scoring functions taking TransE [16] and DistMult [32] as examples.

1) Distance-based Scoring Function: An intuitive distance-based approach is to calculate the Euclidean distance between the relational projection of entities. Structural Embedding (SE) [8] uses two projection matrices and distance to learn structural embedding as

A more intensively used principle is the translation-based scoring function that aims to learn embeddings by representing relations as translations from head to tail entities. Bordes et al. [16] proposed TransE by assuming that the added embedding of h+r should be close to the embedding of t with the scoring function is defined under constraints as

Since that, many variants and extensions of TransE have been proposed. For example, TransH [20] projects entities and relations into a hyperplane, TransR [17] introduces separate projection spaces for entities and relations, and TransD [33] constructs dynamic mapping matrices and by the projection vectors replacing Euclidean distance, TransA [34] uses Mahalanobis distance to enable more adaptive metric learning. Previous methods used additive score functions, TransF [35] relaxes the strict translation and uses dot product as To balance the constraints on head and tail, a flexible translation scoring function is further proposed.

Recently, ITransF [36] enables hidden concepts discovery and statistical strength transferring by learning associations between relations and concepts via sparse attention vectors, with scoring function defined as

where is stacked concept projection matrices of entities and relations and are attention vectors calculated by sparse softmax, TransAt [37] integrates relation attention mechanism with translational embedding, and TransMS [38] transmits multi-directional semantics with nonlinear functions and linear bias vectors, with the scoring function as

KG2E [26] in Gaussian space and ManifoldE [28] with manifold also use the translational distance-based scoring function. KG2E uses two scoring methods, i.e, asymmetric KL-divergence and symmetric expected likelihood. While the scoring function of ManifoldE is defined as

where M is the manifold function, and is a relation-specific manifold parameter.

2) Semantic Matching: Another direction is to calculate the semantic similarity. SME [39] proposes to semantically match separate combinations of entity-relation pairs of (h, r) and (r, t). Its scoring function is defined with two versions of matching blocks - linear and bilinear block, i.e.,

The linear matching block is defined as , and the bilinear form is . By restricting relation matrix to be diagonal for multi-relational representation learning, DistMult [32] proposes a simplified bilinear formulation defined as

To capture productive interactions in relational data and compute efficiently, HolE [21] introduces a circular correlation of embedding, which can be interpreted as a compressed tensor product, to learn compositional representations. By defining a perturbed holographic compositional operator as is a fixed vector, the expanded holographic embedding model HolEx [40] interpolates the HolE and full tensor product method. It can be viewed as linear concatenation of perturbed HolE. Focusing on multi-relational inference, ANALOGY [22] models analogical structures of relational data. It’s scoring function is defined as

with relation matrix constrained to be normal matrices in linear mapping, i.e., for analogical inference. HolE with Fourier transformed in the frequency domain can be viewed as a special case of ComplEx [41], which connects holographic and complex embeddings. The analogical embedding framework [22] can recover or equivalently obtain several models such as DistMult, ComplEx and HolE by restricting the embedding dimension and scoring function. Crossover interactions are introduced by CrossE [42] with an interaction matrix to simulate the bi-directional interaction between entity and relation. The relation specific interaction is obtained by looking up interaction matrix as . By combining the interactive representations and matching with tail embedding, the scoring function is defined as

The semantic matching principle can be encoded by neural networks further discussed in Sec. III-C.

The two methods mentioned above in Sec. III-A4 with group representation also follow the semantic matching principle. The scoring function of TorusE [15] is defined as:

By modeling 2L relations as group elements, the scoring function of DihEdral [31] is defined as the summation of components:

where the relation matrix R is defined in block diagonal form for , and entities are embedded in real-valued space for

C. Encoding Models

This section introduces models that encode the interactions of entities and relations through specific model architectures, including linear/bilinear models, factorization models, and neural networks. Linear models formulate relations as a linear/bilinear mapping by projecting head entities into a representation space close to tail entities. Factorization aims to decompose relational data into low-rank matrices for representation learning. Neural networks encode relational data with non-linear neural activation and more complex network structures by matching semantic similarity of entities and relations. Several neural models are illustrated in Fig. 5.

1) Linear/Bilinear Models: Linear/bilinear models encode interactions of entities and relations by applying linear operation as:

or bilinear transformation operations as Eq. 8. Canonical methods with linear/bilinear encoding include SE [8], SME [39], DistMult [32], ComplEx [23], and ANALOGY [22]. For TransE [16] with L2 regularization, the scoring function can be expanded to the form with only linear transformation with one-dimensional vectors, i.e.,

Wang et al. [47] studied various bilinear models and evaluated their expressiveness and connections by introducing the concepts of universality and consistency. The authors further showed that the ensembles of multiple linear models can improve the prediction performance through experiments. Recently, to solve the independence embedding issue of entity vectors in canonical Polyadia decomposition, SimplE [48] introduces the inverse of relations and calculates the average canonical Polyadia score of

where is the embedding of inversion relation. Embedding models in the bilinear family such as RESCAL, DistMult, HolE and ComplEx can be transformed from one into another with certain constraints [47]. More bilinear models are proposed from a factorization perspective discussed in the next section.

2) Factorization Models: Factorization methods formulated KRL models as three-way tensor X decomposition. A general principle of tensor factorization can be denoted as , with the composition function following the semantic matching pattern. Nickel et al. [49] proposed the three-way rank-r factorization RESCAL over each relational slice of knowledge graph tensor. For k-th relation of m relations, the k-th slice of X is factorized as

The authors further extended it to handle attributes of entities efficiently [50]. Jenatton et al. [51] then proposed a bilinear