Finding Salient Context based on Semantic Matching for Relevance Ranking
2019·arXiv
Abstract
In this paper, we propose a salient-context based semantic matching method to improve relevance ranking in information retrieval. We first propose a new notion of salient context and then define how to measure it. Then we show how the most salient context can be located with a sliding window technique. Finally, we use the semantic similarity between a query term and the most salient context terms in a corpus of documents to rank those documents. Experiments on various collections from TREC show the effectiveness of our model compared to the state-of-the-art methods.

Index Terms—keywords matching, contextual salience, semantic matching.

As the core of multimedia understanding, semantic matching acts as a bridge connecting different forms of content, such as text, image, video, and audio. Before semantic matching emerged, conventional keyword matching methods had long been dominant, for example in Information Retrieval (IR) [1]. They fail, however, to capture the fine-grained contextual information of a query term. The missing contextual information results in the term-mismatching problem caused by word ambiguity. To deal with this problem, a variety of neural IR models, often called semantic matching models, have been proposed to incorporate context information through embedded representations [2]. Some methods consider the whole document as a global context and embed it into one vector. The query term is embedded into a similar vector, and these vectors are used to calculate the relevance between term and document [3]. Other methods consider a certain scope around the keyword as the local context. Only this local context is encoded into embedding vectors and used to compute the relevance [4]. Both groups have made important advances in semantic matching, but we believe that the retrieved documents can fit the query terms even better. The global context methods fail to capture the individual interactions between the query and the document terms, since the whole document is encoded into one vector. The local context methods do not have this problem, but they still leave the mismatching problem unsolved. To remedy the shortcomings of the previous methods, in this paper we propose a salient-context-based semantic matching model. With this model, we improve the relevance ranking in IR. Fig. 1 explains the concept of salient context with an example. We have the query terms “robot technology” and a corpus of three documents. The three boxes in the figure correspond to those three documents. The vertical lines indicate positions in the documents which are salient with respect to the two query terms, and thus give the locations of the salient context.

Fig. 1: Term relevance distribution. The vertical axis denotes the query term, and the horizontal axis denotes the term position index. In each box, the upper part shows the terms related to the query term “robotic”, and the lower part shows those related to “technology”. The thickness of a line indicates the relevance score of the term: the thicker the line, the higher the score. The first two documents are rated relevant by human judges, whereas the third one is irrelevant.

We can observe that the highly relevant terms are clustered in the first two boxes, while they are scattered in the third. As the two corresponding documents are labeled related to “robot technology” by a human, the clustering indicates that the closer together query-related terms are located, the more relevant the document is to the query. This behavior leads us to define the locations of these clusters as the salient context. Our goal is to find the most salient context and embed it into vectors that represent the document. In this way, we eliminate the risk of single-keyword mismatching, thus addressing the shortcomings of the models mentioned earlier.

To locate the most salient context, we define a measurement of contextual salience. It is based on the semantic similarity between the query and the salient context, and it is designed so that it is neither influenced by terms weakly related to the query nor dominated by a single term. In addition, we use the BM25 relevance score as a representation of the global context in the final relevance function.

This paper makes three contributions. Firstly, we analyze and demonstrate the aggregation phenomenon of highly query-related terms in relevant documents, and we define our new concept of salient context. Secondly, we propose a way to measure contextual salience in order to locate the most salient context dynamically. Thirdly, rather than using the context surrounding a keyword, we propose to use the most salient context as a representation of a document, thereby eliminating the mismatching problem.

A. Term-level Semantic Matching

Fig. 2: Analysis of term importance for estimating the relevance of a document to the query “robot technology” by semantic relevance matching.

Generally, it is important that each keyword is exactly matched, particularly when the keywords are new or rare. However, traditional keyword matching may fail to capture fine-grained contextual information and semantically related terms. As illustrated by the example in Fig. 2, semantic relevance matching is able to highlight the terms with a high semantic relevance to the query “robotic technology”, with dark green being most relevant. We can see that semantic matching emphasizes semantically related terms such as “robot”, “industrial” and “application”.

Distributed representations of text, i.e. word embeddings, encapsulate useful contextual information and effectively represent the semantic information of a word. Models that use pre-trained word embeddings [5][7] have shown better performance than those which use term co-occurrence counting between query and documents. Inspired by this, we use pre-trained word embeddings as the basis for our semantic representation to model the query-document matching interaction. From the embedded vectors, we compute the cosine similarity to capture word-level semantic matching, as given by:

$$ s_{ij} = \cos(w_i, w_j) = \frac{w_i^{T} w_j}{\lVert w_i \rVert \, \lVert w_j \rVert} $$

where $w_i$ and $w_j$ represent the vectors for the i-th query term and the j-th document term, respectively.
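As a minimal sketch of the word-level matching described above, the cosine similarity between every query term vector and every document term vector can be computed as follows. The function names and the toy two-dimensional vectors are illustrative assumptions; in the paper the vectors come from pre-trained embeddings. Returning 0 for a zero vector is also an assumption of this sketch, not something the paper specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector (e.g. an unknown word) matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query_vecs, doc_vecs):
    """Row i, column j holds the similarity of query term i and document term j."""
    return [[cosine_similarity(q, d) for d in doc_vecs] for q in query_vecs]


# Toy vectors standing in for word embeddings (hypothetical values).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]]
S = similarity_matrix(query_vecs, doc_vecs)
```

Each row of `S` can then be scanned position by position, which is the input the window-based salience measure operates on.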

B. Contextual Salience

According to the query-centric assumption proposed in [8], the local context surrounding the location of a found query term in a document is relevant when deciding whether the document is a match to the query. In Fig. 2, relevant terms cluster around the first two sentences, and in Fig. 1 we can see that these clusters can occur at the beginning, middle, or end of a document. Thus, the position of the salient context changes from document to document, and our salience measure must be able to handle that shift. We use a sliding window which moves over the document from start to end. For a given position of the window, the terms which are highly related to the query are found, and thus that part of the document stands out. The window context for the i-th query term is described as:

image

where $s_{ij}$ is the cosine similarity between the i-th query term and the j-th document term in the window, Q is the set of query terms, T is the set of document terms in the window, and $S_i$ represents the cosine relevance between the i-th query term and the document terms that fall inside the window.
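The sliding window described above can be sketched as follows, assuming the similarities between one query term and all document terms are already available as a list. The function name, the `step` parameter, and the handling of documents shorter than the window are assumptions of this sketch.

```python
def window_contexts(sim_row, width, step=1):
    """Yield (start, S_i) pairs, where S_i holds the similarities between
    one query term and the document terms inside the window at `start`."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    n = len(sim_row)
    if n <= width:
        # Assumption: a short document is treated as a single window.
        yield 0, list(sim_row)
        return
    for start in range(0, n - width + 1, step):
        yield start, sim_row[start:start + width]


# Similarities of one query term to six document positions (toy values).
sim_row = [0.1, 0.9, 0.8, 0.2, 0.0, 0.3]
windows = list(window_contexts(sim_row, width=3))
```

With a width of 3 and a step of 1, the six-term toy document yields four overlapping windows, each carrying its own set of similarities.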

This approach differs from the deep learning models. As stated above, those models combine all terms in a document into one single document representation. Our representation takes only the relevant parts of the document and embeds those into a document representation. Often, only a few terms with a high window relevance score contribute to the final document relevance. In order to filter out text noise and counteract semantic drift, we take into account only the window contextual salience of the top n semantic relevance matches. The n largest members of the set $S_i$ are obtained as follows:

image

The set of the n largest members of $S_i$ is then

image

where K = log(L) + 1 is determined by the window width L, and α is the influence factor that balances the weighting of the semantic interactions in the window context.
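The selection of the strongest matches in a window can be sketched as follows. Several points are assumptions of this sketch rather than details confirmed by the text: the logarithm is taken as the natural log and rounded down, the kept values are aggregated by their mean as a stand-in for the paper's salience formula, and the influence factor α is left out because its exact role is not recoverable here. All function names are hypothetical.

```python
import heapq
import math


def top_k_count(width):
    """K = log(L) + 1 for window width L (assumed: natural log, rounded down)."""
    if width < 1:
        raise ValueError("window width must be at least 1")
    return int(math.log(width)) + 1


def top_k(values, k):
    """The k largest values, in descending order."""
    return heapq.nlargest(k, values)


def most_salient_window(sim_row, width):
    """Return (start, score) of the window whose top-K similarities have the
    highest mean. The mean is a stand-in aggregate, not the paper's formula."""
    k = top_k_count(width)
    best_start, best_score = 0, float("-inf")
    last_start = max(len(sim_row) - width, 0)
    for start in range(last_start + 1):
        window = sim_row[start:start + width]
        score = sum(top_k(window, k)) / k
        if score > best_score:  # strict comparison keeps the earliest window on ties
            best_start, best_score = start, score
    return best_start, best_score


sim_row = [0.1, 0.2, 0.9, 0.8, 0.1, 0.0, 0.3, 0.1]
start, score = most_salient_window(sim_row, width=4)
```

Keeping only the K largest values means that the many weakly related terms in a window cannot dilute its score, which is the filtering effect the text describes.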

Queries used in IR are short and without complex grammatical structures. Consequently, we need to take term importance into account. The compositional relation between the query terms is usually the simple “and” relation when searching. Take the query “arrested development” as an example: a relevant document should refer to both “arrested” and “development”, where the term “arrested” is more important than “development”. Many previous studies on retrieval models have shown the importance of term discrimination [9]. In the proposed model, we introduce an aggregation weight for each query term, which controls how much the relevance score on that query term contributes to the final relevance score:

$$ g_i = \frac{\exp(v_i^{T} w_i)}{\sum_{j=1}^{ql} \exp(v_j^{T} w_j)} \tag{7} $$

where $v_i$ denotes the weight vector of the i-th query term vector $w_i$, and ql is the query length. In our model, we set the weight vectors equal to their respective query term vectors, i.e. $v_i = w_i$. Putting this into Equ. 7, we get:

$$ g_i = \frac{\exp(w_i^{T} w_i)}{\sum_{j=1}^{ql} \exp(w_j^{T} w_j)} \tag{8} $$

Here, $w_i^{T} w_i$ squares each element of $w_i$ before summing them together. As $w_i \in [-1, 1]^d$, with d being the dimension of the weight vector, the resulting scalar will be positive and equal to the square of the magnitude of $w_i$. Equ. 8 is the normalized exponential, or softmax, function, with $g_i \in [0, 1]$. It returns a scalar which is proportional to the normalized magnitude of the term vector, but with an emphasis on the vectors with the largest magnitudes. Thus, it regularizes the relevance score.
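The term weighting described above, a softmax over the squared magnitudes of the query term vectors, can be sketched as follows. The function name and the toy vectors are assumptions. Subtracting the maximum before exponentiating is a standard numerical safeguard that leaves the result unchanged.

```python
import math


def term_gates(query_vecs):
    """Softmax over w_i^T w_i, giving one aggregation weight per query term."""
    if not query_vecs:
        raise ValueError("at least one query term vector is required")
    # w_i^T w_i: the squared magnitude of each query term vector.
    logits = [sum(x * x for x in w) for w in query_vecs]
    # Shift by the maximum for numerical stability; the softmax is unchanged.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two toy query term vectors with magnitudes 1.0 and 0.5 (hypothetical values).
gates = term_gates([[0.6, 0.8], [0.3, 0.4]])
```

The weights sum to 1, and the vector with the larger magnitude receives the larger share, which matches the emphasis on large-magnitude vectors noted in the text.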

C. Relevance Aggregation

Unlike semantic matching based on distributional word embeddings, exact keyword matching avoids the risk posed by rare or new words in the query. Hence, we linearly combine exact keyword matching with semantic matching and use it as a complement. BM25 [10] is a classical weighting function employed by the Okapi system. As shown by previous TREC experiments, BM25 usually provides very effective retrieval performance on the TREC collections. In BM25, the relevance score is based on the within-document term frequency and the query term frequency. We can therefore use BM25 to model document-level relevance matching with the query terms. In this paper, we apply BM25 to extend the model to document-level matching, and we aggregate the exact keyword matching interactions by integrating BM25 linearly via a parameter β. We also take the co-occurrence of query terms within a document into account in a weighting function for the contextual salience of the document. The two formulas are defined as below:

image

image

+ β · BM25, C ∈ ℝ, (10)

where β is the influence factor that determines the effect of BM25 on the relevance score. When β is 0, only contextual salience contributes to the relevance score; when β ∈ (0, 1), contextual salience and BM25 contribute to it together. co is the co-occurrence of query terms within the document, and C is a constant that balances the parameter co.
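The linear combination with BM25 can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact scoring function: the BM25 variant uses the common defaults k1 = 1.2 and b = 0.75 with a non-negative idf, it omits the query term frequency component, and the co-occurrence weighting with the constant C is left out. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi-style BM25 score of each tokenized document for a tokenized query."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))  # count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Assumed idf variant: the +1 inside the log keeps it non-negative.
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def combined_relevance(salience, bm25, beta):
    """Contextual salience plus beta times BM25 (beta = 0 disables BM25)."""
    return salience + beta * bm25


docs = [
    ["robot", "technology", "robot"],
    ["cooking", "recipes"],
    ["technology", "news"],
]
bm25 = bm25_scores(["robot", "technology"], docs)
```

Setting `beta` to 0 reduces the score to the contextual salience alone, while larger values let exact keyword matches compensate for rare or new query words that the embeddings represent poorly.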

We evaluate the proposed approach on five standard TREC collections, which differ in their sizes, contents, and topics. The TREC tasks and topic numbers associated with each collection are summarized in Table I. For all the test collections used in our experiments, we use pre-trained GloVe word vectors, which are trained on a 6-billion-token collection (Wikipedia 2014 plus Gigaword 5), since reliable term representations are better acquired from large-scale unlabeled text collections than from the limited ground-truth data available for the IR task. We use the TREC retrieval evaluation script, focusing on MAP, RP (recall precision), P@5, P@20, NDCG@5, and NDCG@20 in our experiments. We provide the source code for the model as well as the trained word vectors.

TABLE I: Overview of the TREC collections used

image
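For readers unfamiliar with the ranking measures named in the experimental setup, the following sketch shows how precision at k, average precision, and NDCG at k can be computed for one ranked list. It is a simplified illustration, not the TREC evaluation script: the function names are hypothetical, NDCG uses the raw relevance grade as gain with a log2 discount, and the ideal ranking is built from the retrieved list only rather than from all judged documents.

```python
import math


def precision_at_k(ranked_rels, k):
    """Fraction of the top k results that are relevant (grade > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in ranked_rels[:k] if r > 0) / k


def average_precision(ranked_rels, num_relevant):
    """Mean of the precision values at the rank of each relevant result.
    MAP is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for rank, r in enumerate(ranked_rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def ndcg_at_k(ranked_rels, k):
    """DCG of the top k divided by the DCG of the ideal ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

    # Simplification: the ideal ranking reorders only the retrieved list.
    ideal = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal if ideal > 0 else 0.0


# Relevance grades of five ranked results for one query (toy values).
ranked = [1, 0, 1, 0, 0]
p5 = precision_at_k(ranked, 5)
ap = average_precision(ranked, num_relevant=2)
```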

Table II shows the performance comparison between the baseline model BM25 and our new model CSSM on five collections in terms of MAP, RP, P@5, P@20, NDCG@5, and NDCG@20. The percentage by which our model outperforms BM25 is also listed. With regard to MAP and RP, the results indicate that our model generally performs better than the baseline BM25 on all five collections, especially on the WT2G, Robust04, and Blog06 collections. This demonstrates the importance of semantic relevance matching and shows that contextual salience helps to locate the most relevant local context through highly semantic relevance matching. Comparing the results of CSSMlf (linear function) and CSSMcw (co weighting function), three datasets show improvements, which suggests that the co-occurrence information of query terms in a document is positively connected with contextual salience in the model. The experimental results indicate that our model can encode the critical contextual semantic information in our relevance ranking function for IR.

TABLE II: Comparisons of CSSM and BM25, with MAP, RP, P@5, P@20, NDCG@5, and NDCG@20 over five TREC collections

image

TABLE III: Comparisons of Deep Learning methods on Robust04 collection

image

Table III compares our performance on the Robust04 collection with that of the deep-learning-based methods recently proposed in [5][7]. Our model performs better than DRMM, PACRR, and DRMM-PACRR, and slightly better than ABEL-DRMM and ABEL-DRMM+MV, while requiring less extra model training data. Compared with POSIT-DRMM and POSIT-DRMM+MV, which encode multiple views (MV) of terms (context-sensitive term encodings, pre-trained term embeddings, and one-hot term encodings), our model uses pre-trained term embeddings alone, for two main reasons. First, with our scoring function, it is hard to balance the differences in input dimensions when multiple views of terms are applied directly, because a one-hot vector is a high-dimensional and sparse term representation. Second, explicitly tuning context-sensitive term encodings in the model requires training data and sacrifices efficiency. In addition, because it needs no model parameter tuning, our model's retrieval time is lower than that of all the supervised deep-learning-based models in the table, and it works as efficiently as BM25.

In this paper, we propose a semantic-matching-based method to locate the most salient context for understanding a piece of multimedia content. We propose to prioritize locating the semantically salient context in the relevance calculation. On the basis of this prioritization, we define a measurement of contextual salience to quantify the relevance of a document to a query. Furthermore, we apply the proposed method in IR, where it shows promising improvements over the strong BM25 baseline and several neural relevance matching models. Finally, extensive comparisons between several neural relevance matching models and our approach suggest that explicitly modelling the salient query-related context in a document helps to improve the effectiveness of relevance ranking for IR. Our idea of understanding content by locating the most salient context provides a new perspective on multimedia content analysis. The proposed method provides an efficient and explainable relevance ranking solution that can be generalized to other forms of multimedia content as well.

This work was supported by the Beijing Natural Science Foundation (4174098), the National Natural Science Foundation of China (61702047 and 61703234), and the Fundamental Research Funds for the Central Universities (2017RC02).

[1] M. Christopher, R. Prabhakar and S. Hinrich, “Introduction to information retrieval,” Natural Language Engineering, vol. 16, no. 1, pp. 100–103, 2010.

[2] O. K. Dilek and Zhang et al., “Neural Information Retrieval: At the End of the Early Years,” Information Retrieval Journal, Norwell, vol. 21, pp. 111–182, 2018.

[3] Y. Shen, X. He, J. Gao, D. Li and M. Grégoire, “Learning semantic representations using convolutional neural networks for web search,” In International Conference on World Wide Web, Seoul, pp. 373–374, 2014.

[4] H. Kai, Y. Andrew, B. Klaus and de Melo, Gerard, “Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,” In Proceedings of Web Search and Data Mining, Los Angeles, 2018.

[5] J. Guo, Y. Fan, Q. Ai and C. W Bruce, “A deep relevance matching model for ad-hoc retrieval,” In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indiana, pp. 55–64, 2016.

[6] H. Kai, Y. Andrew, B. Klaus and de Melo, Gerard, “Position-Aware Representations for Relevance Matching in Neural Information Retrieval,” In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, pp. 799–800, 2017.

[7] M. Ryan, B. Georgios-Ioannis and A. Ion, “Deep relevance ranking using enhanced document-query interactions,” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, pp. 1849–1860, 2018.

[8] H. Wu, L., Robert W P, K.F. Wong and K. K L, “A retrospective study of a hybrid document-context based retrieval model,” Information processing & management, vol. 43, no. 5, pp. 1308–1331, 2007.

[9] H. Fang, T. Tao and C. Zhai, “Diagnostic evaluation of information retrieval models,” ACM Transactions on Information Systems, vol. 29, no. 2, pp. 7:1-7:42, 2011.

[10] S. Robertson and H. Zaragoza, “The probabilistic relevance framework: BM25 and beyond,” In Foundations and Trends® in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.

