A Survey on Knowledge Graphs: Representation, Acquisition and Applications

2020·Arxiv

Abstract

Abstract

Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning, are reviewed. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.

Index Terms—Knowledge graph, representation learning, knowledge graph completion, relation extraction, reasoning, deep learning.

I. INTRODUCTION

INCORPORATING human knowledge is one of the re-search directions of artificial intelligence (AI). Knowledge representation and reasoning, inspired by human problem solving, is to represent knowledge for intelligent systems to gain the ability to solve complex tasks [1], [2]. Recently, knowledge graphs as a form of structured human knowledge have drawn great research attention from both the academia and the industry [3]–[6]. A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts, relationships represent the relation

Manuscript received August 09, 2020; revised November xx, 2020; accepted March 30, 2021. This work is supported in part by NSF under grants III-1763325, III-1909323, SaTC-1930941, in part by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046), and in part by the Academy of Finland (grants 336033, 315896), BusinessFinland (grant 884/31/2018), and EU H2020 (grant 101016775). (Corresponding author: Shirui Pan.)

S. Ji and P. Marttinen are with Aalto University, Finland. E-mail: {shaoxiong.ji; pekka.marttinen}@aalto.fi

S. Pan is with the Department of Data Science and AI, Faculty of IT, Monash University, Australia. E-mail: shirui.pan@monash.edu

E. Cambria is with Nanyang Technological University, Singapore. Email: cambria@ntu.edu.sg

between entities, and semantic descriptions of entities, and their relationships contain types and properties with a well-defined meaning. Property graphs or attributed graphs are widely used, in which nodes and relations have properties or attributes.

The term of knowledge graph is synonymous with knowledge base with a minor difference. A knowledge graph can be viewed as a graph when considering its graph structure [7]. When it involves formal semantics, it can be taken as a knowledge base for interpretation and inference over facts [8]. Examples of knowledge base and knowledge graph are illustrated in Fig. 1. Knowledge can be expressed in a factual triple in the form of (head, relation, tail) or (subject, predicate, object) under the resource description framework (RDF), for example, (Albert Einstein, WinnerOf, Nobel Prize). It can also be represented as a directed graph with nodes as entities and edges as relations. For simplicity and following the trend of the research community, this paper uses the terms knowledge graph and knowledge base interchangeably.

Fig. 1: An example of knowledge base and knowledge graph.

Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE) by mapping entities and relations into low-dimensional vectors while capturing their semantic meanings [5], [9]. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction. Knowledge-aware models benefit from the integration of heterogeneous information, rich ontologies and semantics for knowledge representation, and multi-lingual knowledge. Thus, many real-world applications such as recommendation systems and question answering have been brought about prosperity with the ability of commonsense understanding and reasoning. Some real-world products, for example, Microsoft’s Satori and Google’s Knowledge Graph [3], have shown a strong capacity to provide more efficient services.

This paper conducts a comprehensive survey of current literature on knowledge graphs, which enriches graphs with more context, intelligence, and semantics for knowledge acquisition and knowledge-aware applications. Our main contributions are summarized as follows.

• Comprehensive review. We conduct a comprehensive review of the origin of knowledge graph and modern techniques for relational learning on knowledge graphs. Major neural architectures of knowledge graph representation learning and reasoning are introduced and compared. Moreover, we provide a complete overview of many applications in different domains.

• Full-view categorization and new taxonomies. A full-view categorization of research on knowledge graph, together with fine-grained new taxonomies are presented. Specifically, in the high-level, we review the research on knowledge graphs in four aspects: KRL, knowledge acquisition, temporal knowledge graphs, and knowledge-aware applications. For KRL, we further propose fine-grained taxonomies into four views, including representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, KGC is reviewed under embedding-based ranking, relational path reasoning, logical rule reasoning, and meta relational learning; entity acquisition tasks are divided into entity recognition, typing, disambiguation, and alignment; and relation extraction is discussed according to the neural paradigms.

• Wide coverage on emerging advances. We provide wide coverage on emerging topics, including transformer-based knowledge encoding, graph neural network (GNN) based knowledge propagation, reinforcement learning-based path reasoning, and meta relational learning.

• Summary and outlook on future directions. This survey provides a summary of each category and highlights promising future research directions.

The remainder of this survey is organized as follows: first, an overview of knowledge graphs including history, notations, definitions, and categorization is given in Section II; then, we discuss KRL in Section III from four scopes; next, our review goes to tasks of knowledge acquisition and temporal knowledge graphs in Section IV and Section V; downstream applications are introduced in Section VI; finally, we discuss future research directions, together with a conclusion in the end. Other information, including KRL model training and a collection of knowledge graph datasets and open-source implementations, can be found in the appendices.

II. OVERVIEW

A. A Brief History of Knowledge Bases

Knowledge representation has experienced a long-period history of development in the fields of logic and AI. The idea of graphical knowledge representation firstly dated back to 1956 as the concept of semantic net proposed by Richens [10], while the symbolic logic knowledge can go back to the General Problem Solver [1] in 1959. The knowledge base is firstly used with knowledge-based systems for reasoning and problemsolving. MYCIN [2] is one of the most famous rule-based expert systems for medical diagnosis with a knowledge base of about 600 rules. Later, the community of human knowledge representation saw the development of frame-based language, rule-based, and hybrid representations. Approximately at the end of this period, the Cyc project1 began, aiming at assembling human knowledge. Resource description framework (RDF)2 and Web Ontology Language (OWL)3 were released in turn, and became important standards of the Semantic Web4. Then, many open knowledge bases or ontologies were published, such as WordNet, DBpedia, YAGO, and Freebase. Stokman and Vries [7] proposed a modern idea of structure knowledge in a graph in 1988. However, it was in 2012 that the concept of knowledge graph gained great popularity since its first launch by Google’s search engine5, where the knowledge fusion framework called Knowledge Vault [3] was proposed to build large-scale knowledge graphs. A brief road map of knowledge base history is illustrated in Fig. 10 in Appendix A. Many general knowledge graph databases and domain-specific knowledge bases have been released to facilitate research. We introduce more general and domain-specific knowledge bases in Appendices F-A1 and F-A2.

B. Definitions and Notations

Most efforts have been made to give a definition by describing general semantic representation or essential characteristics. However, there is no such wide-accepted formal definition. Paulheim [11] defined four criteria for knowledge graphs. Ehrlinger and W¨oß [12] analyzed several existing definitions and proposed Definition 1, which emphasizes the reasoning engine of knowledge graphs. Wang et al. [5] proposed a definition as a multi-relational graph in Definition 2. Following previous literature, we define a knowledge graph as G = {E, R, F}, where E, R and F are sets of entities, relations and facts, respectively. A fact is denoted as a triple

Definition 1 (Ehrlinger and W¨oß [12]). A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.

Definition 2 (Wang et al. [5]). A knowledge graph is a multi-relational graph composed of entities and relations which are regarded as nodes and different types of edges, respectively.

Specific notations and their descriptions are listed in Table I. Details of several mathematical operations are explained in Appendix B.

C. Categorization of Research on Knowledge Graph

This survey provides a comprehensive literature review on the research of knowledge graphs, namely KRL, knowledge acquisition, and a wide range of downstream knowledge-aware applications, where many recent advanced deep learning techniques are integrated. The overall categorization of the research is illustrated in Fig. 2.

TABLE I: Notations and descriptions.

Knowledge Representation Learning is a critical research issue of knowledge graph which paves the way for many knowledge acquisition tasks and downstream applications. We categorize KRL into four aspects of representation space, scoring function, encoding models and auxiliary information, providing a clear workflow for developing a KRL model. Specific ingredients include:

1) representation space in which the relations and entities are represented;

2) scoring function for measuring the plausibility of factual triples;

3) encoding models for representing and learning relational interactions;

4) auxiliary information to be incorporated into the embedding methods.

Representation learning includes point-wise space, manifold, complex vector space, Gaussian distribution, and discrete space. Scoring metrics are generally divided into distance-based and similarity matching based scoring functions. Current research focuses on encoding models, including linear/bilinear models, factorization, and neural networks. Auxiliary information considers textual, visual, and type information.

Knowledge Acquisition tasks are divided into three categories, i.e., KGC, relation extraction, and entity discovery. The first one is for expanding existing knowledge graphs, while the other two discover new knowledge (aka relations and entities) from the text. KGC falls into the following categories: embedding-based ranking, relation path reasoning, rule-based reasoning, and meta relational learning. Entity discovery includes recognition, disambiguation, typing, and alignment. Relation extraction models utilize attention mechanism, graph convolutional networks (GCNs), adversarial training, reinforcement learning, deep residual learning, and transfer learning.

Temporal Knowledge Graphs incorporate temporal information for representation learning. This survey categorizes four research fields, including temporal embedding, entity dynamics, temporal relational dependency, and temporal logical reasoning.

Knowledge-aware Applications include natural language understanding (NLU), question answering, recommendation systems, and miscellaneous real-world tasks, which inject knowledge to improve representation learning.

Fig. 2: Categorization of research on knowledge graphs.

D. Related Surveys

Previous survey papers on knowledge graphs mainly focus on statistical relational learning [4], knowledge graph refinement [11], Chinese knowledge graph construction [13], knowledge reasoning [14], KGE [5] or KRL [9]. The latter two surveys are more related to our work. Lin et al. [9] presented KRL in a linear manner, with a concentration on quantitative analysis. Wang et al. [5] categorized KRL according to scoring functions and specifically focused on the type of information utilized in KRL. It provides a general view of current research only from the perspective of scoring metrics. Our survey goes deeper to the flow of KRL and provides a full-scaled view from four-folds, including representation space, scoring function, encoding models, and auxiliary information. Besides, our paper provides a comprehensive review of knowledge acquisition and knowledge-aware applications with several emerging topics such as knowledge-graph-based reasoning and few-shot learning discussed.

III. KNOWLEDGE REPRESENTATION LEARNING

KRL is also known as KGE, multi-relation learning, and statistical relational learning in the literature. This section reviews recent advances on distributed representation learning with rich semantic information of entities and relations form four scopes including representation space (representing entities and relations, Sec. III-A), scoring function (measuring the plausibility of facts, Sec. III-B), encoding models (modeling the semantic interaction of facts, Sec. III-C), and auxiliary information (utilizing external information, Sec. III-D). We further provide a summary in Sec. III-E. The training strategies for KRL models are reviewed in Appendix D.

A. Representation Space

The key issue of representation learning is to learn low-dimensional distributed embedding of entities and relations. Current literature mainly uses real-valued point-wise space (Fig. 3a) including vector, matrix and tensor space, while other kinds of space such as complex vector space (Fig. 3b), Gaussian space (Fig. 3c), and manifold (Fig. 3d) are utilized as well. The embedding space should follow three conditions, i.e., differentiability, calculation possibility, and definability of a scoring function [15].

1) Point-Wise Space: Point-wise Euclidean space is widely applied for representing entities and relations, projecting relation embedding in vector or matrix space, or capturing relational interactions. TransE [16] represents entities and relations in d-dimension vector space, i.e., , and makes embeddings follow the translational principle To tackle this problem of insufficiency of a single space for both entities and relations, TransR [17] then further introduces separated spaces for entities and relations. The authors projected entities () into relation () space by a projection matrix . NTN [18] models entities across multiple dimensions by a bilinear tensor neural layer. The relational interaction between head and tail is captured as a tensor denoted as . Instead of using the Cartesian coordinate system, HAKE [19] captures semantic hierarchies by mapping entities into the polar coordinate system, i.e., entity embeddings and in the modulus and phase part, respectively.

Many other translational models such as TransH [20] also use similar representation space, while semantic matching models use plain vector space (e.g., HolE [21]) and relational projection matrix (e.g., ANALOGY [22]). Principles of these translational and semantic matching models are introduced in Section III-B1 and III-B2, respectively.

2) Complex Vector Space: Instead of using a real-valued space, entities and relations are represented in a complex space, where . Take head entity as an example, h has a real part Re(h) and an imaginary part Im(h), i.e., h = Re(h)+i Im(h). ComplEx [23] firstly introduces complex vector space shown in Fig. 3b which can capture both symmetric and antisymmetric relations. Hermitian dot product is used to do composition for relation, head and the conjugate of tail. Inspired by Euler’s identity , RotatE [24] proposes a rotational model taking relation as a rotation from head entity to tail entity in complex space as denotes the element-wise Hadmard product. QuatE [25] extends the complex-valued space into hypercomplex by a quaternion Q = a + bi + cj + dk with three imaginary components, where the quaternion inner product, i.e., the Hamilton product , is used as compositional operator for head entity and relation. With the introduction of the rotational Hadmard product in complex space, RotatE [24] can also capture inversion and composition patterns as well as symmetry and antisymmetry. QuatE [25] uses Hamilton product to capture latent inter-dependencies within the four-dimensional space of entities and relations and gains a more expressive rotational capability than RotatE.

3) Gaussian Distribution: Inspired by Gaussian word embedding, the density-based embedding model KG2E [26] introduces Gaussian distribution to deal with the (un)certainties of entities and relations. The authors embedded entities and relations into multi-dimensional Gaussian distribution . The mean vector u indicates entities and relations’ position, and the covariance matrix models their (un)certainties. Following the translational principle, the probability distribution of entity transformation is denoted as . Similarly, TransG [27] represents entities with Gaussian distributions, while it draws a mixture of Gaussian distribution for relation embedding, where the m-th component translation vector of relation r is denoted as

4) Manifold and Group: This section reviews knowledge representation in manifold space, Lie group, and dihedral group. A manifold is a topological space, which could be defined as a set of points with neighborhoods by the set theory. The group is algebraic structures defined in abstract algebra. Previous point-wise modeling is an ill-posed algebraic system where the number of scoring equations is far more than the number of entities and relations. Moreover, embeddings are restricted in an overstrict geometric form even in some methods with subspace projection. To tackle these issues, ManifoldE [28] extends point-wise embedding into manifold-based embedding. The authors introduced two settings of manifold-based embedding, i.e., Sphere and Hyperplane. An example of a sphere is shown in Fig. 3d. For the sphere setting, Reproducing Kernel Hilbert Space is used to represent the manifold function. Another “hyperplane” setting is introduced to enhance the model with intersected embeddings. ManifoldE [28] relaxes the real-valued point-wise space into manifold space with a more expressive representation from the geometric perspective. When the manifold function and relation-specific manifold parameter are set to zero, the manifold collapses into a point.

Hyperbolic space, a multidimensional Riemannian manifold with a constant negative curvature , is drawing attention for its capacity of capturing hierarchical information. MuRP [29] represents the multi-relational knowledge graph in Poincar´e ball of hyperbolic space . While it fails to capture logical patterns and suffers from constant curvature. Chami et al. [30] leverages expressive hyperbolic isometries and learns a relation-specific absolute curvature in the hyperbolic space.

TorusE [15] solves the regularization problem of TransE via embedding in an n-dimensional torus space which is a compact Lie group. With the projection from vector space into torus space defined as , entities and relations are denoted as . Similar to TransE, it also learns embeddings following the relational translation in torus space, i.e., . Recently, DihEdral [31] proposes a dihedral symmetry group preserving a 2-dimensional polygon. It utilizes a finite non-Abelian group to preserve the relational properties of symmetry/skew-symmetry, inversion, and composition effectively with the rotation and reflection properties in the dihedral group.

Fig. 3: An illustration of knowledge representation in different spaces.

B. Scoring Function

The scoring function is used to measure the plausibility of facts, also referred to as the energy function in the energy-based learning framework. Energy-based learning aims to learn the energy function (parameterized by taking x as input) and to make sure positive samples have higher scores than negative samples. In this paper, the term of the scoring function is adopted for unification. There are two typical types of scoring functions, i.e., distance-based (Fig. 4a) and similarity-based (Fig. 4b) functions, to measure the plausibility of a fact. Distance-based scoring function measures the plausibility of facts by calculating the distance between entities, where addictive translation with relations as is widely used. Semantic similarity based scoring measures the plausibility of facts by semantic matching. It usually adopts a multiplicative formulation, i.e., , to transform head entity near the tail in the representation space.

Fig. 4: Illustrations of distance-based and similarity matching based scoring functions taking TransE [16] and DistMult [32] as examples.

1) Distance-based Scoring Function: An intuitive distance-based approach is to calculate the Euclidean distance between the relational projection of entities. Structural Embedding (SE) [8] uses two projection matrices and distance to learn structural embedding as

A more intensively used principle is the translation-based scoring function that aims to learn embeddings by representing relations as translations from head to tail entities. Bordes et al. [16] proposed TransE by assuming that the added embedding of h+r should be close to the embedding of t with the scoring function is defined under constraints as

Since that, many variants and extensions of TransE have been proposed. For example, TransH [20] projects entities and relations into a hyperplane, TransR [17] introduces separate projection spaces for entities and relations, and TransD [33] constructs dynamic mapping matrices and by the projection vectors replacing Euclidean distance, TransA [34] uses Mahalanobis distance to enable more adaptive metric learning. Previous methods used additive score functions, TransF [35] relaxes the strict translation and uses dot product as To balance the constraints on head and tail, a flexible translation scoring function is further proposed.

Recently, ITransF [36] enables hidden concepts discovery and statistical strength transferring by learning associations between relations and concepts via sparse attention vectors, with scoring function defined as

where is stacked concept projection matrices of entities and relations and are attention vectors calculated by sparse softmax, TransAt [37] integrates relation attention mechanism with translational embedding, and TransMS [38] transmits multi-directional semantics with nonlinear functions and linear bias vectors, with the scoring function as

KG2E [26] in Gaussian space and ManifoldE [28] with manifold also use the translational distance-based scoring function. KG2E uses two scoring methods, i.e, asymmetric KL-divergence and symmetric expected likelihood. While the scoring function of ManifoldE is defined as

where M is the manifold function, and is a relation-specific manifold parameter.

2) Semantic Matching: Another direction is to calculate the semantic similarity. SME [39] proposes to semantically match separate combinations of entity-relation pairs of (h, r) and (r, t). Its scoring function is defined with two versions of matching blocks - linear and bilinear block, i.e.,

The linear matching block is defined as , and the bilinear form is . By restricting relation matrix to be diagonal for multi-relational representation learning, DistMult [32] proposes a simplified bilinear formulation defined as

To capture productive interactions in relational data and compute efficiently, HolE [21] introduces a circular correlation of embedding, which can be interpreted as a compressed tensor product, to learn compositional representations. By defining a perturbed holographic compositional operator as is a fixed vector, the expanded holographic embedding model HolEx [40] interpolates the HolE and full tensor product method. It can be viewed as linear concatenation of perturbed HolE. Focusing on multi-relational inference, ANALOGY [22] models analogical structures of relational data. It’s scoring function is defined as

with relation matrix constrained to be normal matrices in linear mapping, i.e., for analogical inference. HolE with Fourier transformed in the frequency domain can be viewed as a special case of ComplEx [41], which connects holographic and complex embeddings. The analogical embedding framework [22] can recover or equivalently obtain several models such as DistMult, ComplEx and HolE by restricting the embedding dimension and scoring function. Crossover interactions are introduced by CrossE [42] with an interaction matrix to simulate the bi-directional interaction between entity and relation. The relation specific interaction is obtained by looking up interaction matrix as . By combining the interactive representations and matching with tail embedding, the scoring function is defined as

The semantic matching principle can be encoded by neural networks further discussed in Sec. III-C.

The two methods mentioned above in Sec. III-A4 with group representation also follow the semantic matching principle. The scoring function of TorusE [15] is defined as:

By modeling 2L relations as group elements, the scoring function of DihEdral [31] is defined as the summation of components:

where the relation matrix R is defined in block diagonal form for , and entities are embedded in real-valued space for

C. Encoding Models

This section introduces models that encode the interactions of entities and relations through specific model architectures, including linear/bilinear models, factorization models, and neural networks. Linear models formulate relations as a linear/bilinear mapping by projecting head entities into a representation space close to tail entities. Factorization aims to decompose relational data into low-rank matrices for representation learning. Neural networks encode relational data with non-linear neural activation and more complex network structures by matching semantic similarity of entities and relations. Several neural models are illustrated in Fig. 5.

1) Linear/Bilinear Models: Linear/bilinear models encode interactions of entities and relations by applying linear operation as:

or bilinear transformation operations as Eq. 8. Canonical methods with linear/bilinear encoding include SE [8], SME [39], DistMult [32], ComplEx [23], and ANALOGY [22]. For TransE [16] with L2 regularization, the scoring function can be expanded to the form with only linear transformation with one-dimensional vectors, i.e.,

Wang et al. [47] studied various bilinear models and evaluated their expressiveness and connections by introducing the concepts of universality and consistency. The authors further showed that the ensembles of multiple linear models can improve the prediction performance through experiments. Recently, to solve the independence embedding issue of entity vectors in canonical Polyadia decomposition, SimplE [48] introduces the inverse of relations and calculates the average canonical Polyadia score of

where is the embedding of inversion relation. Embedding models in the bilinear family such as RESCAL, DistMult, HolE and ComplEx can be transformed from one into another with certain constraints [47]. More bilinear models are proposed from a factorization perspective discussed in the next section.

2) Factorization Models: Factorization methods formulated KRL models as three-way tensor X decomposition. A general principle of tensor factorization can be denoted as , with the composition function following the semantic matching pattern. Nickel et al. [49] proposed the three-way rank-r factorization RESCAL over each relational slice of knowledge graph tensor. For k-th relation of m relations, the k-th slice of X is factorized as

The authors further extended it to handle attributes of entities efficiently [50]. Jenatton et al. [51] then proposed a bilinear

Fig. 5: Illustrations of neural encoding models. (a) CNN [43] input triples into dense layer and convolution operation to learn semantic representation, (b) GCN [44] acts as encoder of knowledge graphs to produce entity and relation embeddings. (c) RSN [45] encodes entity-relation sequences and skips relations discriminatively. (d) Transformer-based CoKE [46] encodes triples as sequences with an entity replaced by [MASK].

structured latent factor model (LFM), which extends RESCAL by decomposing . By introducing three- way Tucker tensor decomposition, TuckER [52] learns to embed by outputting a core tensor and embedding vectors of entities and relations. LowFER [53] proposes a multi-modal factorized bilinear pooling mechanism to better fuse entities and relations. It generalizes the TuckER model and is computationally efficient with low-rank approximation.

3) Neural Networks: Neural networks for encoding semantic matching have yielded remarkable predictive performance in recent studies. Encoding models with linear/bilinear blocks can also be modeled using neural networks, for example, SME [39]. Representative neural models include multi-layer perceptron (MLP) [3], neural tensor network (NTN) [18], and neural association model (NAM) [54]. They generally feed entities or relations or both into deep neural networks and compute a semantic matching score. MLP [3] encodes entities and relations together into a fully-connected layer, and uses a second layer with sigmoid activation for scoring a triple as

where is the weight matrix and [h, r, t] is a concatenation of three vectors. NTN [18] takes entity embeddings as input associated with a relational tensor and outputs predictive score in as

where is bias for relation and are relation-specific weight matrices. It can be regarded as a combination of MLPs and bilinear models. NAM [54] associates the hidden encoding with the embedding of tail entity, and proposes the relational-modulated neural network (RMNN).

4) Convolutional Neural Networks: CNNs are utilized for learning deep expressive features. ConvE [55] uses 2D convolution over embeddings and multiple layers of nonlinear features to model the interactions between entities and relations by reshaping head entity and relation into 2D matrix, i.e., and for . Its scoring function is defined as

where is the convolutional filters and vec is the vectorization operation reshaping a tensor into a vector. ConvE can express semantic information by non-linear feature learning through multiple layers. ConvKB [43] adopts CNNs for encoding the concatenation of entities and relations without reshaping (Fig. 5a). Its scoring function is defined as

The concatenation of a set for feature maps generated by convolution increases the learning ability of latent features. Compared with ConvE, which captures the local relationships, ConvKB keeps the transitional characteristic and shows better experimental performance. HypER [56] utilizes hypernetwork H for 1D relation-specific convolutional filter generation to achieve multi-task knowledge sharing, and meanwhile simplifies 2D ConvE. It can also be interpreted as a tensor factorization model when taking hypernetwork and weight matrix as tensors.

5) Recurrent Neural Networks: The MLP- and CNN-based models, as mentioned above, learn triplet-level representations. In comparison, the recurrent networks can capture long-term relational dependencies in knowledge graphs. Gardner et al. [57] and Neelakantan et al. [58] propose RNN-based model over the relation path to learn vector representation without and with entity information, respectively. RSN [45] (Fig. 5c) designs a recurrent skip mechanism to enhance semantic representation learning by distinguishing relations and entities. The relational path as with entities and relations in an alternating order is generated by random walk, and it is further used to calculate recurrent hidden state . The skipping operation is conducted as

where are weight matrices.

6) Transformers: Transformer-based models have boosted contextualized text representation learning. To utilize contextual information in knowledge graphs, CoKE [46] employs transformers to encode edges and path sequences. Similarly, KG-BERT [59] borrows the idea form language model pre-training and takes Bidirectional Encoder Representations from Transformer (BERT) model as an encoder for entities and relations.

7) Graph Neural Networks (GNNs): GNNs are introduced for learning connectivity structure under an encoder-decoder framework. R-GCN [60] proposes relation-specific transformation to model the directed nature of knowledge graphs. Its forward propagation is defined as

where is the hidden state of the i-th entity in l-th layer, is a neighbor set of i-th entity within relation are the learnable parameter matrices, and is normalization such as . Here, the GCN [61] acts as a graph encoder. To enable specific tasks, an encoder model still needs to be developed and integrated into the RGCN framework. R-GCN takes the neighborhood of each entity equally. SACN [44] introduces weighted GCN (Fig. 5b), which defines the strength of two adjacent nodes with the same relation type, to capture the structural information in knowledge graphs by utilizing node structure, node attributes, and relation types. The decoder module called Conv-TransE adopts ConvE model as semantic matching metric and preserves the translational property. By aligning the convolutional outputs of entity and relation embeddings with C kernels to be , its scoring function is defined as

Nathani et al. [62] introduced graph attention networks with multi-head attention as encoder to capture multi-hop neighborhood features by inputing the concatenation of entity and relation embeddings. CompGCN [63] proposes entity-relation composition operations over each edge in the neighborhood of a central node and generalizes previous GCN-based models.

D. Embedding with Auxiliary Information

Multi-modal embedding incorporates external information such as text descriptions, type constraints, relational paths, and visual information, with a knowledge graph itself to facilitate more effective knowledge representation.

1) Textual Description: Entities in knowledge graphs have textual descriptions denoted as , providing supplementary semantic information. The challenge of KRL with textual description is to embed both structured knowledge and unstructured textual information in the same space. Wang et al. [64] proposed two alignment models for aligning entity space and word space by introducing entity names and Wikipedia anchors. DKRL [65] extends TransE [16] to learn representation directly from entity descriptions by a convolutional encoder. SSP [66] captures the strong correlations between triples and textual descriptions by projecting them in a semantic subspace. The joint loss function is widely applied when incorporating KGE with textual description. Wang et al. [64] used a three-component loss of knowledge model , text model and alignment model . SSP [66] uses a two-component objective function of embedding-specific loss topic-specific loss within textual description, traded off by a parameter

2) Type Information: Entities are represented with hierarchical classes or types, and consequently, relations with semantic types. SSE [67] incorporates semantic categories of entities to embed entities belonging to the same category smoothly in semantic space. TKRL [68] proposes type encoder model for projection matrix of entities to capture type hierarchy. Noticing that some relations indicate attributes of entities, KREAR [69] categorizes relation types into attributes and relations and modeled the correlations between entity descriptions. Zhang et al. [70] extended existing embedding methods with hierarchical relation structure of relation clusters, relations, and sub-relations.

3) Visual Information: Visual information (e.g., entity images) can be utilized to enrich KRL. Image-embodied IKRL [71], containing cross-modal structure-based and image-based representation, encodes images to entity space and follows the translation principle. The cross-modal representations make sure that structure-based and image-based representations are in the same representation space.

There remain many kinds of auxiliary information for KRL, such as attributes, relation paths, and logical rules. Wang et al. [5] gave a detailed review of using additional information. This paper discusses relation path and logical rules under the umbrella of KGC in Sec. IV-A2 and IV-A4, respectively.

4) Uncertain Information: Knowledge graphs such as ProBase [72], NELL [73], and ConceptNet [74] contain uncertain information with a confidence score assigned to every relational fact. In contrast to classic deterministic knowledge graph embedding, uncertain embedding models aim to capture uncertainty representing the likelihood of relational facts. Chen et al. [75] proposed an uncertain knowledge graph embedding model to simultaneously preserve structural and uncertainty information, where probabilistic soft logic is applied to infer the confidence score. Probability calibration takes a post-processing process to adjust probability scores, making predictions probabilistic sense. Tabacof and Costabello [76] firstly studied probability calibration for knowledge graph embedding under the closed-world assumption, revealing that well-calibrated models can lead to improved accuracy. Safavi et al. [77] further explored probability calibration under the more challenging open-world assumption.

E. Summary

Knowledge representation learning is vital in the research community of knowledge graph. This section reviews four folds of KRL with several modern methods summarized in Table II and more in Appendix C. Overall, developing a novel KRL model is to answer the following four questions: 1) which representation space to choose; 2) how to measure the plausibility of triplets in a specific space; 3) which encoding model to use for modeling relational interactions; 4) whether to utilize auxiliary information. The most popularly used representation space is Euclidean point-based space by embedding entities in vector space and modeling interactions via vector, matrix, or tensor. Other representation spaces, including complex vector space, Gaussian distribution, and manifold space and group, are also studied. Manifold space has an advantage over point-wise Euclidean space by relaxing the point-wise embedding. Gaussian embeddings can express the uncertainties of entities and relations, and multiple relation semantics. Embedding in complex vector space can effectively model different relational connectivity patterns, especially the symmetry/antisymmetry pattern. The representation space plays an essential role in encoding the semantic information of entities and capturing the relational properties. When developing a representation learning model, appropriate representation space should be selected and designed carefully to match the nature of encoding methods and balance the expressiveness and computational complexity. The scoring function with a distance-based metric utilizes the translation principle, while the semantic matching scoring function employs compositional operators. Encoding models, especially neural networks, play a critical role in modeling interactions of entities and relations. The bilinear models also have drawn much attention, and some tensor factorization can also be regarded as this family. Other methods incorporate auxiliary information of textual description, relation/entity types, entity images, and confidence scores.

TABLE II: A summary of recent KRL models. See more in Appendix C.

IV. KNOWLEDGE ACQUISITION

Knowledge acquisition aims to construct knowledge graphs from unstructured text and other structured or semi-structured sources, complete an existing knowledge graph, and discover and recognize entities and relations. Well-constructed and large-scale knowledge graphs can be useful for many downstream applications and empower knowledge-aware models with commonsense reasoning, thereby paving the way for AI. The main tasks of knowledge acquisition include relation extraction, KGC, and other entity-oriented acquisition tasks such as entity recognition and entity alignment. Most methods formulate KGC and relation extraction separately. These two tasks, however, can also be integrated into a unified framework. Han et al. [78] proposed a joint learning framework with mutual attention for data fusion between knowledge graphs and text, which solves KGC and relation extraction from text. There are also other tasks related to knowledge acquisition, such as triple classification [79], relation classification [80], and open knowledge enrichment [81]. In this section, three categories of knowledge acquisition techniques, namely KGC, entity discovery, and relation extraction are reviewed thoroughly.

A. Knowledge Graph Completion

Because of the nature of incompleteness of knowledge graphs, KGC is developed to add new triples to a knowledge graph. Typical subtasks include link prediction, entity prediction, and relation prediction.

Preliminary research on KGC focused on learning low-dimensional embedding for triple prediction. In this survey, we term those methods as embedding-based methods. Most of them, however, failed to capture multi-step relationships. Thus, recent work turns to explore multi-step relation paths and incorporate logical rules, termed as relation path inference and rule-based reasoning, respectively. Triple classification as an associated task of KGC, which evaluates the correctness of a factual triple, is additionally reviewed in this section.

1) Embedding-based Models: Taking entity prediction as an example, embedding-based ranking methods, as shown in Fig. 6a, firstly learn embedding vectors based on existing triples. By replacing the tail entity or head entity with each entity those methods calculate scores of all the candidate entities and rank the top k entities. Aforementioned KRL methods (e.g., TransE [16], TransH [20], TransR [17], HolE [21], and RGCN [60]) and joint learning methods like DKRL [65] with textual information can been used for KGC.

Unlike representing inputs and candidates in the unified embedding space, ProjE [82] proposes a combined embedding by space projection of the known parts of input triples, i.e., (h, r, ?) or (?, r, t), and the candidate entities with the candidate-entity matrix , where s is the number of candidate entities. The embedding projection function including a neural combination layer and a output projection layer is defined as , where is the combination operator of input entity-relation pair. Previous embedding methods do not differentiate entities and relation prediction, and ProjE does not support relation prediction. Based on these observations, SENN [83] distinguishes three KGC subtasks explicitly by introducing a unified neural shared embedding with adaptively weighted general loss function to learn different latent features. Existing methods rely heavily on existing connections in knowledge graphs and fail to capture the evolution of factual knowledge or entities with a few connections. ConMask [84] proposes relationship-dependent content masking over the entity description to select relevant snippets of given relations, and CNN-based target fusion to complete the knowledge graph with unseen entities. It can only make a prediction when query relations and entities are explicitly expressed in the text description. Previous methods are discriminative models that rely on preprepared entity pairs or text corpus. Focusing on the medical domain, REMEDY [85] proposes a generative model, called conditional relationship variational autoencoder, for entity pair discovery from latent space.

Fig. 6: Illustrations of embedding-based ranking and relation path reasoning.

2) Relation Path Reasoning: Embedding learning of entities and relations has gained remarkable performance in some benchmarks, but it fails to model complex relation paths. Relation path reasoning turns to leverage path information over the graph structure. Random walk inference has been widely investigated; for example, the Path-Ranking Algorithm (PRA) [86] chooses a relational path under a combination of path constraints and conducts maximum-likelihood classifica-tion. To improve path search, Gardner et al. [57] introduced vector space similarity heuristics in random walk by incorporating textual content, which also relieves the feature sparsity issue in PRA. Neural multi-hop relational path modeling is also studied. Neelakantan et al. [58] developed an RNN model to compose the implications of relational paths by applying compositionality recursively (in Fig. 6b). Chain-of-Reasoning