Abstract

The categorical compositional distributional model of meaning gives the composition of words into phrases and sentences pride of place. However, it has so far lacked a model of logical negation. This paper gives some steps towards providing this operator, modelling it as a version of projection onto the subspace orthogonal to a word. We give a small demonstration of the operator's performance in a sentence entailment task.

1 Introduction

Compositional models of meaning aim to represent the meaning of phrases and sentences by combining representations of the words in the sentence according to some rule. Compositional distributional models, such as those described in Baroni and Zamparelli [2010], Coecke et al. [2010], and Paperno et al. [2014], combine the compositional approach with vector-based models of word meaning. In these models, nouns are represented as vectors, and function words, such as verbs and adjectives, are modelled as linear maps. In this paper, we use the categorical compositional distributional (DisCoCat) model introduced in Coecke et al. [2010]. This model formalises the compositional approach to language using category theory, setting up a functorial mapping between the grammar of the language on the one hand, and the structures used to represent lexical meaning on the other. In modelling the meaning of words and sentences, a distinction can be made between words with lexical content, and words that can arguably be modelled as an operation on the structure of the sentence. For example, in Sadrzadeh et al. [2013], relative pronouns are modelled as routing information around a sentence using the structure of a Frobenius algebra. In Kartsaklis [2016], conjunctions are modelled using Frobenius algebras.

In the current paper we model negation as an operation on words. In Coecke et al. [2010], negation is modelled as a linear map on a two-dimensional sentence space which sends each basis vector to the subspace orthogonal to it. This idea of modelling negation as projection to the orthogonal subspace was used in Widdows and Peters [2003], but at the vector level is somewhat unsatisfactory since a word and its negation are then of two different kinds. Furthermore within DisCoCat, words should be modelled as linear maps, which projection onto an orthogonal subspace doesn’t satisfy.

Within the categorical compositional framework, we can be flexible about how to represent word meanings. In Coecke et al. [2010] the category FVect of vector spaces and linear maps was used, meaning that nouns and sentences are modelled as vectors, and functional words such as verbs and adjectives are modelled as linear maps. In Bolt et al. [2019] the category ConvexRel was used, enabling the representation of nouns and sentences as convex sets and function words as convex relations. In this paper we will use the category CPM(FVect) which models nouns and sentences as positive operators, and function words as completely positive maps. This approach to meaning was developed in Balkır et al. [2016], Bankova et al. [2019] and implemented in Lewis [2019a,b]. We model negation as an operation related to projection onto the orthogonal subspace. We discuss how negation interacts with composition, and we provide a small corpus-based implementation to illustrate the ideas.

1.1 Related work

As mentioned, the idea of negation as projection onto the orthogonal subspace has been implemented in Widdows and Peters [2003] and discussed in Coecke et al. [2010]. However, it has also been argued that negation should not be viewed in this way: rather, that the negation of a word should be fairly similar to the original. For example, Hermann et al. [2013] argue that ‘not red’ is still a colour, and provide a model where the vector is divided into domains and only part of the vector is inverted. Similarly, Rimell et al. [2017] view negation as antonymy, and provide a model of negation in which an encoder is trained to produce the antonym of a given adjective. Continuing the discussion of the distinction between conversational negation and logical negation, Kruszewski et al. [2016] provide an in-depth analysis of the ways in which people use negation in conversation.

The kind of negation that we discuss in this paper is more akin to logical negation, and this will be exemplified by its interaction with entailment between sentences.

2 Background

2.1 Categorical compositional approaches to meaning

The categorical compositional model of meaning uses the framework of category theory to set up a mapping between the grammar of a language and the structures used to represent the meanings of individual words. A formalization of grammar is chosen, and represented as a category, called the grammar category. A choice is made about the type of meaning representation, which again is formalized as a category, called the meaning category. The meaning category and the grammar category are chosen to have the same abstract structure. Type reductions in the grammar category are then functorially mapped to operations in the semantics category. In this paper, the grammar category and the meaning category are both compact closed. For details of what this means within the context of linguistics, see Coecke et al. [2010] or Preller and Sadrzadeh [2011]. A gentle presentation is also given in Bolt et al. [2019].

Pregroup grammar In this paper, we will use pregroup grammar, although the formalism is flexible about what can be used, and other choices are given in, for example, Coecke et al. [2013], Maillard et al. [2014], Muskens and Sadrzadeh [2016]. A pregroup is a partially ordered monoid $(P, \leq, \cdot, 1)$ where each element $p \in P$ has a left adjoint $p^l$ and a right adjoint $p^r$, such that $p^l \cdot p \leq 1 \leq p \cdot p^l$ and $p \cdot p^r \leq 1 \leq p^r \cdot p$.

A pregroup grammar is the pregroup freely generated over a set of chosen types. We consider the set containing n for noun and s for sentence. Complex types are built up by concatenation of types, and we often leave out the dot, writing $xy$ for $x \cdot y$. If $x \leq y$, we say that x reduces to y.

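A string of types is grammatical if it reduces, via the morphisms above, to the sentence type s. For example, typing clowns as n, tell as $n^r s n^l$, and the truth as n, the sentence Clowns tell the truth has type $n\,(n^r s n^l)\,n$ and is shown to be grammatical as follows:

$n\,(n^r s n^l)\,n = (n\, n^r)\, s\, (n^l n) \leq 1 \cdot s \cdot 1 = s$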

The above reduction can be represented graphically as follows:

Meaning categories As a first example we describe how pregroup grammar is mapped to FVect, the category of vector spaces and linear transformations. The noun type n is mapped to a vector space N and the sentence type s to S. The concatenation operation in the grammar is mapped to $\otimes$, i.e., the tensor product of vector spaces. Then the reduction morphisms ($p^l p \leq 1$ and $p\, p^r \leq 1$) map to tensor contraction, and the expansion morphisms ($1 \leq p\, p^l$ and $1 \leq p^r p$) map to identity matrices. Function words like verbs and adjectives are modelled as (multi)linear maps. Intransitive verbs are represented as maps from N to S, or matrices in $N \otimes S$; transitive verbs are represented as maps from two copies of N to S, or tensors in $N \otimes S \otimes N$. So, in the example above, Clowns is mapped to a vector in N, as is the truth, and tell is mapped to a tensor in $N \otimes S \otimes N$. The vectors and tensors are concatenated using the tensor product, and tensor contraction is applied to map the sentence down into one sentence vector. Compact closed categories have a nice diagrammatic calculus, described in Selinger [2010], or for a linguistically couched explanation see Coecke et al. [2010]. In this calculus, the composition of the words Clowns, tell, and the truth into the sentence Clowns tell the truth is expressed as follows:

We will use this notation later to describe how to build particular representations of verbs and other function words.

2.2 Modelling words as positive operators

In Piedeleu et al. [2015], Bankova et al. [2019], and Balkır et al. [2016] the DisCoCat model is instantiated with the meaning category CPM(FVect). This has the same objects as FVect, but the morphisms are now completely positive maps. The CPM construction is introduced in Selinger [2007]. Words are now represented as positive operators rather than as vectors, and maps between them are completely positive maps. A positive operator is defined as follows, using bra-ket notation from physics. For a unit vector $|v\rangle$, the projection operator $|v\rangle\langle v|$ onto the subspace spanned by $|v\rangle$ is called a pure state. A positive operator is given by a non-negatively weighted sum of pure states. It is an operator A such that:

1. $\forall\, |v\rangle \in V,\ \langle v | A | v \rangle \geq 0$

2. A is self-adjoint

If, in addition, A has trace 1, then A encodes a probabilistic mixture of pure states, and is called a density matrix. Relaxing this condition gives us different choices for normalization.

Completely positive maps are linear maps that preserve positivity of operators and do so for any trivial extension.

We give an informal description of how pregroup grammar maps into the category CPM(FVect). For more details see Piedeleu et al. [2015], Bankova et al. [2019], or Balkır et al. [2016]. Within CPM(FVect), the objects are vector spaces, and morphisms are completely positive maps. The underlying spaces that we represent nouns, sentences, and other words in are now doubled up, meaning that a noun is a positive operator , or a positive semidefinite matrix in . Morphisms are completely positive maps. These are defined in Selinger [2007] as a morphism such that there exists an object C in the underlying category, in our case FVect, and a morphism such that:

Importantly, CPM(FVect) is also compact closed, so that the same sort of functorial mapping can be made from the grammar category to the semantics category. Furthermore, the diagrammatic calculus can also be used in this context.

Positive operators were proposed in Balkır et al. [2016], Bankova et al. [2019] as a means of representing word meanings since they have a natural ordering called the Löwner ordering. This ordering states that for two positive operators A and B, $A \sqsubseteq B$ if and only if $B - A$ is positive.

This ordering can be used to represent hyponymy and lexical entailment. In Balkır et al. [2016], Lewis [2019b] concrete proposals for building positive operators representing words are given.

The space of positive operators and the properties of the Löwner ordering on this space have been examined in D’Hondt and Panangaden [2006], van de Wetering [2016]. When the set of positive operators is restricted to those with maximum eigenvalue less than or equal to 1, the ordering has nice properties. We restrict to this set, and use the notation CP1(V). When the set of positive operators is further restricted to those whose nonzero eigenvalues are exactly 1, we have the projectors, and the Löwner ordering corresponds to subspace inclusion on projection operators.
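The correspondence between the Löwner ordering and subspace inclusion can be checked numerically on small projectors. The following is a minimal sketch, assuming the ordering is tested by checking that the difference of the two operators has no negative eigenvalue; the helper names (`is_positive`, `loewner_leq`, `projector`) and the numerical tolerance are illustrative choices, not taken from the paper.

```python
import numpy as np

def is_positive(m, tol=1e-9):
    """True if the symmetric matrix m has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(m).min() >= -tol)

def loewner_leq(a, b, tol=1e-9):
    """Crisp Loewner ordering: a is below b iff b - a is positive."""
    return is_positive(b - a, tol)

def projector(basis_vectors):
    """Projector onto the span of linearly independent vectors, via QR."""
    q, _ = np.linalg.qr(np.asarray(basis_vectors, dtype=float).T)
    return q @ q.T

e1, e2, e3 = np.eye(3)
p_line = projector([e1])        # projects onto span{e1}
p_plane = projector([e1, e2])   # projects onto span{e1, e2}
p_other = projector([e3])       # projects onto span{e3}

print(loewner_leq(p_line, p_plane))   # the line lies inside the plane
print(loewner_leq(p_plane, p_line))   # the plane is not inside the line
print(loewner_leq(p_other, p_plane))  # span{e3} is not inside the plane
```

The tolerance matters in practice: operators built from real embeddings will rarely satisfy the crisp ordering exactly, which is one motivation for the graded measures discussed next.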

The Löwner ordering is crisp: either the relation obtains or it doesn’t. However, when considering natural language, we are also interested in graded notions of hyponymy and entailment. For example, although we may consider dog to be highly indicative of pet, not every dog is a pet, and so we want some kind of graded ordering. On the other hand, we would expect dog to be a full hyponym of mammal.

Balkır et al. [2016] introduce a graded notion of hyponymy based on the relative entropy of two operators. Bankova et al. [2019] use a graded notion of hyponymy that is based on expanding the hypernym (the broader term) to include the hyponym. Lewis [2019b] extends this idea to include a wider range of gradings.

Specifically, suppose we are comparing two positive operators A and B. If $A \sqsubseteq B$ crisply, then B = A + D for some positive operator D. However, if this is not the case, then we can consider an error term E so that now $A + D = B + E$.

Then we have that $B - A = D - E$, i.e. that there is a wholly positive and a wholly negative component to the difference $B - A$. In Bankova et al. [2019] the authors render the error term E as being of the form $(1 - k)A$. Then the value k is the strength of the hyponymy relation between A and B. The drawback of this approach is that the span of A must be included within the span of B. Lewis [2019b] proposes two alternative gradings based on the error term that do not suffer from this drawback:

In equation (4), in the worst case the positive difference term D is 0, and then 1. In the best case E = 0 and then = 1. In equation (5), in the worst case = 0. In the best case E = 0 and then
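The split of the difference between two operators into a wholly positive part D and a wholly negative part E can be computed from an eigendecomposition. The sketch below does this, and then computes two gradings whose exact formulas are assumptions on our part, since equations (4) and (5) are not legible in this copy: `k_ba` is taken to be the sum of the eigenvalues of B - A divided by the sum of their absolute values, and `k_e` is taken to be 1 minus the ratio of the Frobenius norms of E and A. Both choices reproduce the best and worst cases described above, but they should be checked against Lewis [2019b] before being relied on.

```python
import numpy as np

def split_difference(a, b):
    """Split b - a into positive operators D and E with b - a = D - E."""
    vals, vecs = np.linalg.eigh(b - a)
    d = (vecs * np.clip(vals, 0, None)) @ vecs.T    # positive eigen-part
    e = (vecs * np.clip(-vals, 0, None)) @ vecs.T   # negated negative eigen-part
    return d, e, vals

def k_ba(a, b):
    """ASSUMED grading: sum of eigenvalues of b - a over sum of their magnitudes.
    Undefined (division by zero) when a == b."""
    _, _, vals = split_difference(a, b)
    return vals.sum() / np.abs(vals).sum()

def k_e(a, b):
    """ASSUMED grading: 1 - ||E|| / ||A||, using the Frobenius norm."""
    _, e, _ = split_difference(a, b)
    return 1 - np.linalg.norm(e) / np.linalg.norm(a)

a = np.diag([1.0, 0.5, 0.0])
b = np.diag([1.0, 1.0, 0.0])   # a is crisply below b, so E = 0
c = np.diag([0.0, 0.0, 1.0])   # c shares nothing with a, so E = a

d, e, _ = split_difference(a, b)
print(np.allclose(d - e, b - a))   # the split reproduces the difference
print(k_ba(a, b), k_ba(b, a))      # best case, then the case D = 0
print(k_e(a, b), k_e(a, c))        # best case, then the case E = A
```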

2.3 Building positive operators for words

In Bankova et al. [2019], a broader term such as mammal is viewed as a weighted sum over projectors describing instances of mammals. For example:

Lewis [2019b] proposes a means of building positive operators for words using distributional word vectors and information about hyponymy relations from resources such as WordNet Miller [1995], as follows. In general, the meaning of a word w is considered to be given by a collection of unit vectors $\{|w_i\rangle\}_i$, where each $|w_i\rangle$ represents an instance of the concept expressed by the word. Then the operator:

$[\![w]\!] = \sum_i p_i\, |w_i\rangle\langle w_i| \qquad (6)$

represents the word w. The $p_i$ are weightings derived from the text, and there are various choices about what these should be.

We build representations of words as positive operators in the following manner. Suppose we have a dictionary of word vectors from a corpus using standard distributional or embedding techniques, for example GloVe, Pennington et al. [2014], FastText Bojanowski et al. [2017], or weighted co-occurrence vectors. To build a representation of a word, we obtain a set of hyponyms that are instances of that word. In this paper, we use WordNet Miller [1995], a human-curated database of word relationships including hyponym-hypernym pairs. The WordNet hyponymy relationship is naturally arranged as a directed graph with a root (it is not quite a tree). For the noun subset of the database, the root is the most general noun entity, and the leaves are specific nouns. For example, under the word rocket there are (inter alia): test_instrument_vehicle, Stinger, takeoff_booster, arugula. Notice that here we have different meanings of the word rocket, one as a projectile and one as a vegetable. There are also less supervised ways of obtaining these relationships using patterns derived from text, see Hearst [1992], Roller et al. [2018] for examples.

To build a positive operator for a word w, we go through the WordNet hierarchy and collect all hyponyms at all levels. We then form the operator for w as in equation (6), with all weights set to 1.
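The construction can be sketched in a few lines: sum the projectors onto the normalised hyponym vectors, all with weight 1, then rescale by the largest eigenvalue. This is a minimal illustration under stated assumptions: the three-dimensional vectors below are made-up stand-ins for GloVe vectors, the hyponym list is hypothetical rather than read from WordNet, and the function names are ours.

```python
import numpy as np

def build_operator(vectors):
    """Sum of projectors onto the normalised instance vectors, all weights 1."""
    vs = np.asarray(vectors, dtype=float)
    vs = vs / np.linalg.norm(vs, axis=1, keepdims=True)   # make each row a unit vector
    return vs.T @ vs                                        # equals sum_i |v_i><v_i|

def normalise_max_eig(op):
    """Rescale so that the largest eigenvalue is exactly 1."""
    return op / np.linalg.eigvalsh(op).max()

# Hypothetical toy embeddings for hyponyms of "mammal" (values are invented).
toy_hyponyms = {
    "dog": [1.0, 0.2, 0.0],
    "cat": [0.9, 0.0, 0.3],
    "whale": [0.1, 1.0, 0.4],
}
mammal = normalise_max_eig(build_operator(list(toy_hyponyms.values())))

eigenvalues = np.linalg.eigvalsh(mammal)
print(eigenvalues.min() >= -1e-9, np.isclose(eigenvalues.max(), 1.0))
```

Hyponyms missing from the embedding vocabulary would simply be skipped before calling `build_operator`.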

When we build these operators, between 1/3 and 1/2 of the hyponyms listed in WordNet are available in GloVe, and we therefore miss a large proportion of the information included in WordNet.

2.4 Normalization

An important parameter choice is the type of normalization to use. In Bankova et al. [2019] two choices are discussed: normalizing operators to trace 1, or normalizing operators to have maximum eigenvalue less than or equal to 1. The properties of these two normalization strategies are thoroughly analyzed in van de Wetering [2017]. If operators are normalized to trace 1, then the crisp Löwner ordering becomes trivial: no two distinct operators stand in the relation $A \sqsubseteq B$. If operators are normalized to have maximum eigenvalue 1, then the Löwner ordering has particularly nice properties. In the current paper, we will need to normalize operators so that their maximum eigenvalue is less than or equal to 1, as this will allow us to apply our proposed negation operator.
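The triviality of the crisp ordering under trace normalization follows in a few steps. The derivation below is a short sketch that assumes only the definition of the Löwner ordering (one operator is below another when their difference is positive); the symbol $\sqsubseteq$ is used here for that ordering.

```latex
\begin{align*}
A \sqsubseteq B,\ \operatorname{tr} A = \operatorname{tr} B = 1
  &\implies B - A \text{ is positive and } \operatorname{tr}(B - A) = 0 \\
  &\implies \text{the eigenvalues of } B - A \text{ are non-negative and sum to } 0 \\
  &\implies B - A = 0, \text{ i.e. } A = B .
\end{align*}
```

So under trace normalization an operator is only ever comparable with itself, which is why the maximum-eigenvalue normalization is used instead.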

2.5 Composing positive operators

Building positive operators as proposed gives us representations for individual words. However, the representations are all states in one object of CPM(FVect), whereas for verbs, adjectives, and so on, we need morphisms in CPM(FVect). In order to obtain these, we use an approach outlined in Kartsaklis et al. [2012]. Firstly, we consider the spaces for noun and sentence to be the same, so now our pregroup types n and s both map to the same space W. To represent adjectives and verbs, representations of type are needed. In order to encode our representations in , we need to use the word representations we have built to define suitable morphisms in CPM(FVect). Kartsaklis et al. [2012] use the notion of a Frobenius algebra. Working in FVect, a Frobenius algebra over a finite-dimensional vector space with bases is given by

In the graphical calculus, these are given by:

A vector $|v\rangle \in W$ can be lifted to a higher-order representation in $W \otimes W$ by applying the map ∆. In FVect, this higher-order representation takes the vector and embeds it along the diagonal of a matrix in $W \otimes W$. So, for example, given a vector representation $|verb\rangle \in W$ of an intransitive verb, we can lift that representation to a matrix in $W \otimes W$ by embedding it into the diagonal of a matrix. The Frobenius algebra interacts with the type reduction morphism in such a way that the result of lifting a verb and then composing with a noun is to apply the multiplication $\mu$ to the tensor product of the noun and the verb vectors, i.e. $\mu(|noun\rangle \otimes |verb\rangle)$.

Diagrammatically,

In FVect the multiplication implements pointwise multiplication of the two vectors. In CPM(FVect) we have access to the same algebra, and the multiplication operates similarly: given two positive operators, it implements pointwise multiplication of the two operators. We call this operator Mult. Whilst simple and theoretically motivated, this operation is not desirable for linguistic purposes as it is commutative, so that ‘dog bites man’ gets the same representation as ‘man bites dog’.
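Both points can be seen directly in a small numerical example: lifting a verb vector onto the diagonal of a matrix and contracting with a noun vector gives the pointwise product, and the pointwise product of two operators does not depend on the order of its arguments. The vectors below are arbitrary illustrative values, and `mult` is our name for the entrywise product.

```python
import numpy as np

noun = np.array([0.5, 0.1, 0.8])
verb = np.array([0.2, 0.9, 0.4])

# The Frobenius copying map embeds the verb vector on the diagonal of a matrix.
lifted_verb = np.diag(verb)

# Contracting the lifted verb with the noun is a matrix-vector product.
composed = lifted_verb @ noun
assert np.allclose(composed, noun * verb)   # same as pointwise multiplication

def mult(a, b):
    """Pointwise (entrywise) product of two operators."""
    return a * b

# Rank-one operators built from the toy vectors above.
dog = np.outer(noun, noun)
bites = np.outer(verb, verb)

# Commutativity: the order of the arguments is lost.
assert np.allclose(mult(dog, bites), mult(bites, dog))
```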

In Coecke [2019], Lewis [2019a], two other multiplications are proposed for combining positive operators. One, which we call BMult or , was originally proposed in Leifer and Poulin [2008], Leifer and Spekkens [2013] as a quantum Bayesian operation. This takes two operators A and B and returns the non-commutative and non-associative product . In Coecke and Meichanetzidis [2020], the authors show that this operation is also related to a Frobenius algebra, with the caveat that the algebra corresponds to a basis for W that diagonalises B.

The second, which we call KMult, or , is to form a completely positive map from a positive matrix B by decomposing B into a weighted sum of orthogonal projectors , and then forming the map

If we again consider a basis that diagonalises B, this operation then corresponds to the Frobenius multiplication ) in that basis. To see this, consider

Then

We therefore have three ways of combining positive operators. Moreover, each of these combination methods preserves the property that the eigenvalues must be less than or equal to 1. For the operations Mult and KMult, the spectral radius is submultiplicative with respect to the Hadamard (pointwise) product of two positive semidefinite matrices Horn and Johnson [1985], implying that the maximum eigenvalue of is bounded by 1. For the case of BMult, note that the product is similar to AB and hence has the same eigenvalues. Then the maximum eigenvalue of the product AB is bounded by the product of the maximum eigenvalues of A and of B Bhatia [2013], again implying that the maximum eigenvalue of AB is bounded by 1.
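The eigenvalue bound can be checked empirically on random operators. The sketch below makes explicit assumptions about the forms of the two non-commutative products, since their formulas are not legible in this copy: `bmult` is taken to be the sandwich $B^{1/2} A B^{1/2}$, and `kmult` is taken to be $\sum_i p_i P_i A P_i$ with rank-one projectors $P_i$ obtained from an eigendecomposition of B. When B has repeated eigenvalues that rank-one decomposition is not unique, so this is only one possible reading.

```python
import numpy as np

def psd_sqrt(b):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(b)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def mult(a, b):
    """Pointwise (Hadamard) product."""
    return a * b

def bmult(a, b):
    """ASSUMED form: B^(1/2) A B^(1/2)."""
    root = psd_sqrt(b)
    return root @ a @ root

def kmult(a, b):
    """ASSUMED form: sum_i p_i P_i A P_i, where B = sum_i p_i P_i."""
    vals, vecs = np.linalg.eigh(b)
    out = np.zeros_like(a)
    for p, v in zip(vals, vecs.T):
        proj = np.outer(v, v)          # rank-one projector onto an eigenvector of B
        out += p * (proj @ a @ proj)
    return out

def random_operator(rng, dim):
    """Random positive operator rescaled to maximum eigenvalue 1."""
    m = rng.normal(size=(dim, dim))
    op = m @ m.T
    return op / np.linalg.eigvalsh(op).max()

rng = np.random.default_rng(0)
a, b = random_operator(rng, 4), random_operator(rng, 4)

for name, compose in [("Mult", mult), ("BMult", bmult), ("KMult", kmult)]:
    eigs = np.linalg.eigvalsh(compose(a, b))
    print(name, eigs.min() >= -1e-9, eigs.max() <= 1 + 1e-9)
```

A numerical check of this kind does not replace the cited matrix-analysis results, but it is a cheap guard against normalization bugs in an implementation.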

To apply these multiplications linguistically, choices must be made about the order in which they are applied, since neither BMult nor KMult is associative. In particular for transitive verbs there are a number of different choices, and some of these are discussed in Lewis [2019b]. For now, we limit ourselves to simple intransitive sentences, of the form noun verb.

The operators we outlined above are summarised below.

where in KMult

3 Modelling negation in CP1(V)

So far, we have shown how to build positive operators from a corpus of text, together with information about hyponymy relations. We have also shown how to lift the simple operators thus described to the maps required for functional words such as verbs and adjectives. We now describe how to model negation.

As discussed, one approach to modelling negation is to map a vector to the subspace orthogonal to it. We can incorporate this in our model very easily, since in the case of projectors, this is equivalent to subtracting the associated matrix from the identity matrix. Consider a vector $|v\rangle$, normalised to unit length, that we have learnt in a distributional manner from a corpus. We can lift this representation to a positive operator by forming the projector $|v\rangle\langle v|$, which projects onto a one-dimensional subspace of the vector space W. We can then form an operator

$\mathrm{not}(|v\rangle\langle v|) = \mathbb{1} - |v\rangle\langle v|$

which projects onto the $(n-1)$-dimensional subspace orthogonal to $|v\rangle$, where $n = \dim W$. In the general case, we define

$\mathrm{not}(A) = \mathbb{1} - A$

When we restrict to the subset CP1(W) over a vector space W, this operation preserves positivity of the operator and also maps operators into the set CP1(W).
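A minimal numerical sketch of the operator, taking negation to be subtraction from the identity matrix as described above. The function name `negate` and the example values are ours. The checks show that negating a rank-one projector gives the projector onto its orthogonal complement, and that negating an operator with eigenvalues in [0, 1] gives another operator with eigenvalues in [0, 1].

```python
import numpy as np

def negate(op):
    """Negation as subtraction from the identity: not(A) = 1 - A."""
    return np.eye(op.shape[0]) - op

# Projector case: a unit vector in a 3-dimensional space.
v = np.array([3.0, 4.0, 0.0]) / 5.0
p = np.outer(v, v)
not_p = negate(p)

assert np.allclose(not_p @ not_p, not_p)   # not_p is again a projector
assert np.allclose(not_p @ v, 0)           # it annihilates the original vector
assert np.isclose(np.trace(not_p), 2)      # it projects onto a 2-dimensional subspace

# General case: eigenvalues lambda are sent to 1 - lambda.
a = np.diag([1.0, 0.3, 0.0])
eigs = np.linalg.eigvalsh(negate(a))
assert eigs.min() >= 0 and eigs.max() <= 1

# Applying negation twice returns the original operator.
assert np.allclose(negate(negate(a)), a)
```

The restriction to maximum eigenvalue at most 1 is what keeps the result positive: an eigenvalue above 1 would be sent to a negative value.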

Importantly, this operation is not a morphism of CPM(FVect), and therefore a suitable home needs to be found for it. We do not provide an answer to that in this paper, leaving it for ongoing work. Rather, we look at how this operation interacts with composition, the Löwner ordering, and how it works in implementation.

3.1 How not interacts with the (graded) Löwner ordering

Consider operators $A, B \in$ CP1(W). Under the crisp Löwner ordering, we have $A \sqsubseteq B \iff \mathrm{not}(B) \sqsubseteq \mathrm{not}(A)$.
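The order reversal under negation can be verified in one line. The derivation below is a sketch assuming that negation is subtraction from the identity, $\mathrm{not}(A) = \mathbb{1} - A$, and that $\sqsubseteq$ denotes the Löwner ordering.

```latex
\begin{align*}
\mathrm{not}(A) - \mathrm{not}(B) &= (\mathbb{1} - A) - (\mathbb{1} - B) = B - A, \\
\text{so}\quad \mathrm{not}(B) \sqsubseteq \mathrm{not}(A)
  &\iff B - A \text{ is positive}
  \iff A \sqsubseteq B .
\end{align*}
```

This mirrors the contrapositive in classical logic: if every dog is a mammal, then every non-mammal is a non-dog.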

Considering an error term E, we use the notation . With such an error term,

Depending on the grading we use, the strength of the hyponymy relation will be affected. Using the grading (equation (4)) we have that not B is a hyponym of not A with strength

Using (equation (5)), we have:

3.2 How not interacts with composition

We focus here just on the case of intransitive sentences composed of a subject and a verb. When we negate the noun we obtain the following expressions:

Particularly in the case of , these feel like fairly natural interpretations of a sentence with a negated noun. We take the meaning of the verb as a whole, and then subtract out the part of the verb that is applied to the noun.
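The "subtract out" reading follows from linearity in the noun argument, and can be checked numerically. The sketch below assumes the noun is the first argument of each composition, that BMult has the sandwich form $V^{1/2} N V^{1/2}$ for noun N and verb V, and that negation is subtraction from the identity; these forms and all function names are our assumptions rather than the paper's definitions.

```python
import numpy as np

def negate(op):
    """Negation as subtraction from the identity."""
    return np.eye(op.shape[0]) - op

def psd_sqrt(b):
    """Square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(b)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def mult(noun, verb):
    """Pointwise product."""
    return noun * verb

def bmult(noun, verb):
    """ASSUMED form: verb^(1/2) noun verb^(1/2)."""
    root = psd_sqrt(verb)
    return root @ noun @ root

def random_operator(rng, dim):
    """Random positive operator with maximum eigenvalue 1."""
    m = rng.normal(size=(dim, dim))
    op = m @ m.T
    return op / np.linalg.eigvalsh(op).max()

rng = np.random.default_rng(1)
noun, verb = random_operator(rng, 3), random_operator(rng, 3)

# BMult: the whole verb minus the part of the verb applied to the noun.
assert np.allclose(bmult(negate(noun), verb), verb - bmult(noun, verb))

# Mult: only the diagonal of the verb survives composition with the identity.
verb_diagonal = np.diag(np.diag(verb))
assert np.allclose(mult(negate(noun), verb), verb_diagonal - mult(noun, verb))
```

The contrast between the two assertions suggests why the pointwise product gives a less natural reading here: composing the identity with the verb discards the verb's off-diagonal entries.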

When we negate the verb we obtain the following expressions:

and, assuming that we use a basis in which is diagonal:

The operation does not have a particularly illuminating representation when the verb is negated, but in the case of , these are again fairly natural interpretations of a sentence with a negated verb.

4 Demonstrations

We give a demonstration on a small dataset that this rendering of negation works well together with the composition operators proposed. In particular, we will see that our combination operators can beat baselines that examine just the noun or the verb in the sentence. This is an important baseline since the construction of the dataset is such that entailment does follow from comparing either the nouns or the verbs. Our combination operators do not in general beat an average of two operators; however, they do in some cases.

4.1 Datasets

We build a set of datasets based on the intransitive sentence dataset introduced in Sadrzadeh et al. [2018]. The dataset consists of paired sentences consisting of a subject and a verb. In half the cases the first sentence entails the second, and in the other half of cases, the order of the sentences is reversed. For example, we have:

The first sentence is marked as entailing, whereas the second is marked as not entailing. The dataset is created by selecting nouns and verbs from WordNet. In the case of the sentence marked T, the first noun is selected as a hyponym of the second noun, and the first verb is selected as a hyponym of the second verb.

For these sentences to be thought of as entailing, we must view them as being implicitly existentially quantified. For example, if we took the pair of sentences

we can clearly see that the first sentence does not entail the second if we assume a universal quantification - there could easily be, and there are, non-gazelle mammals that don’t run. However, if we take an existential quantification, then the fact that there is some gazelle that sprints means that there must be some mammal (the gazelle) who runs (as sprinting is a kind of running).

Bearing in mind that the sentences are existentially quantified, we create three further datasets that include negation. We apply negation only at the word level and not at the sentence level, as this retains the existentially quantified nature of the sentences. Consider an entailing sentence pair such as:

We include negation in two places: either the noun can be negated, giving us non-dogs and non-mammals, or else the verbs can be negated, giving us do not run and do not move.

From dogs run |= mammals move we then get three more pairs of entailing sentences:

To model these, we render the negation of the verb as directly acting on the verb. Another choice would be for the negation to act on the whole sentence, rendering dogs don’t move as not(dogs move), but this would mean that we now consider the sentence universally quantified. Working out how to include a full account of quantification is an area of further work.

To model these sentences, we therefore calculate, respectively:

where is one of the graded hyponymy measures and is one of the compositional operators.

4.2 Construction and composition of positive operators

We follow the construction methods outlined in Lewis [2019b] and summarised in this paper in section 2.3. In order to construct the basic positive operators, we use hyponyms from WordNet Miller [1995], and 50 or 300 dimensional GloVe vectors. The operators produced are normalised to have maximum eigenvalue equal to 1.

To compose positive operators, we use the three composition functions Mult, BMult, KMult discussed in section 2.5. We compare these with three baselines: the average of two operators, a noun-only baseline, and a verb-only baseline. Due to the construction of the datasets, we see that in fact the verb-only and noun-only baselines are fairly strong, since as long as the construction of the individual words models the hyponymy relations well then a verb-only or noun-only model will be able to perform well on these datasets. Note that taking the average of the two operators preserves the criterion of the maximum eigenvalue being less than or equal to 1 by Weyl’s inequalities Weyl [1912].

Table 1: Area under ROC curve on the negation datasets, using hyponyms, and 300 dimensional GloVe vectors. Figures reported are the average of the 100 values of the test statistic. indicates significantly better than the Average baseline. indicates significantly better than the noun-only baseline.

Metrics and significance measures Since the entailment measures we use give back a grading, whereas we require a binary response, we calculate area under ROC curve (AUC). The AUC calculates the true positive rate vs. the false positive rate for different cutoff levels of the graded measure. The maximum that can be attained is 1.
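As a concrete illustration of the metric, the AUC can be computed without tracing the ROC curve explicitly, using the standard equivalence with the probability that a randomly chosen entailing pair receives a higher score than a randomly chosen non-entailing pair (ties counting one half). The sketch below uses only the standard library; the scores and labels are invented toy values, not results from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the fraction of (positive, negative) pairs in which the positive
    item has the higher score, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical graded entailment scores for six sentence pairs.
scores = [0.9, 0.4, 0.7, -0.2, 0.1, 0.4]
labels = [1, 1, 1, 0, 0, 0]   # 1 = entailing, 0 = not entailing

print(auc(scores, labels))    # 8.5 winning pairs out of 9
```

Because only the ranking of scores matters, the AUC is unaffected by the different ranges of the graded measures.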

To measure the significance of our results, we use bootstrapping Efron [1992] to calculate 100 values of the test statistic (AUC) drawn from the distribution implied by the data. We compare between models using a paired t-test and apply the Bonferroni correction to compensate for multiple model comparisons.
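The procedure can be sketched as follows. This is an illustration under stated assumptions, not the exact experimental code: we assume that sentence pairs are resampled with replacement, that the same resamples are used for every model so that the t-test is paired, and that the Bonferroni correction divides the significance level by the number of model comparisons. The toy scores, the level 0.05, and the number of comparisons are hypothetical.

```python
import math
import random
import statistics

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_aucs(scores_by_model, labels, n_boot=100, seed=0):
    """Resample items with replacement, reusing each resample for all models."""
    rng = random.Random(seed)
    n = len(labels)
    out = {name: [] for name in scores_by_model}
    done = 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:        # AUC is undefined without both classes
            continue
        for name, scores in scores_by_model.items():
            out[name].append(auc([scores[i] for i in idx], ys))
        done += 1
    return out

def paired_t_statistic(xs, ys):
    """t statistic of the paired differences (degrees of freedom: len(xs) - 1)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical scores from two models on 10 entailing and 10 non-entailing pairs.
labels = [1] * 10 + [0] * 10
toy_scores = {
    "separating": [1.0 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)],
    "overlapping": [0.1 * (i + 1) for i in range(10)]
                   + [0.1 * (i + 1) - 0.05 for i in range(10)],
}

aucs = bootstrap_aucs(toy_scores, labels)
t_stat = paired_t_statistic(aucs["separating"], aucs["overlapping"])

n_comparisons = 3                        # hypothetical number of model comparisons
alpha_per_test = 0.05 / n_comparisons    # Bonferroni-corrected significance level
print(statistics.mean(aucs["separating"]), statistics.mean(aucs["overlapping"]), t_stat)
```

The resulting t statistic would then be compared against the critical value of the t distribution at the corrected level, for example via a statistics package.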

5 Results

We can see that across the board (tables 1, 2, 3, 4), the measure performs more strongly than the measure. The difference in performance is likely to be because the measure is very symmetric, and the dataset is also, meaning that not only are there equal numbers of entailing and non-entailing sentences in the dataset, but the non-entailing datasets are the opposite of the entailing datasets. Enhancing the datasets with some random pairings would likely degrade the performance of the measure. Investigating the differences in performance in a less balanced dataset is an area of further work.

Table 2: Area under ROC curve on the negation datasets, using hyponyms, and 300 dimensional GloVe vectors. Figures reported are the average of the 100 values of the test statistic. indicates significantly better than the Average baseline. indicates significantly better than the noun-only baseline.

Table 3: Area under ROC curve on the negation datasets, using hyponyms, and 50 dimensional GloVe vectors. Figures reported are the average of the 100 values of the test statistic. indicates significantly better than the Average baseline. indicates significantly better than the noun-only baseline.

In the case of the measure, increasing the dimensionality of the underlying vector space improved performance across all sentence types. This was not the case for the measure, where for sentences of the type noun - not verb and not noun - verb performance using the measure improved with lower dimensionality (tables 2 and 4)

The best results were obtained using the measure and 300-dimensional GloVe vectors. In this set of results (table 1) the Average baseline proves hard to beat; however, Mult also performs strongly for sentences with either no word negated or both words negated. For these two classes of sentences, it is also notable that all composition functions enable better performance than the strong non-compositional noun-only baseline. A similar pattern is seen when using 50-dimensional vectors with the measure (table 3), where the benefit of using a compositional operator is also seen for the sentence type noun - not verb.

Table 4: Area under ROC curve on the negation datasets, using hyponyms, and 50 dimensional GloVe vectors. Figures reported are the average of the 100 values of the test statistic. indicates significantly better than the Average baseline. indicates significantly better than the noun-only baseline.

The benefit of using compositional operators is also seen for the (tables 2 and 4), where using a compositional operator helps in almost all cases over the (admittedly much worse) non-compositional noun-only baseline.

Across both measures and dimensionalities performance is poor on the sentence type not noun - verb. More research is needed to investigate why this is.

6 Discussion and Conclusions

We have introduced a negation operator for use in the CPM(FVect) flavour of DisCoCat. The operator is based on the notion of projection onto the orthogonal subspace, used previously by Widdows and Peters [2003]. The operator works well together with the composition operators Mult, BMult, and KMult discussed in Lewis [2019a,b], Coecke and Meichanetzidis [2020], and in many cases performs well on a toy dataset of sentence entailments.

More investigation into the properties of the BMult and KMult operators is needed. Coecke and Meichanetzidis [2020] have shown that the two operators can be combined together in a double density matrix setting, meaning that the operators can be given a natural home.

Work is also ongoing to build operators from corpora in a less supervised way. Recent work on learning Gaussian embeddings Vilnis and McCallum [2014] may be leveraged to build the representations needed.

Further, testing on larger scale datasets is also needed. Ideally, the kinds of entailment relations we are looking at should be useful for textual entailment and reasoning systems. Expanding the models we currently have to test on realistic datasets is desirable.

Another major unanswered question is where the negation operator should sit theoretically. It cannot be viewed as a morphism in CPM(FVect). Work in progress looks at the set CP1(V) as an object of the category ConvexRel, introduced in Bolt et al. [2019]. Then, the negation operator can be viewed as a morphism. This is an area of further work.

References

Esma Balkır, Mehrnoosh Sadrzadeh, and Bob Coecke. Distributional sentence entailment using density matrices. In Topics in Theoretical Computer Science, pages 1–22. Springer, 2016.

Dea Bankova, Bob Coecke, Martha Lewis, and Dan Marsden. Graded hyponymy for compositional distributional semantics. Journal of Language Modelling, 6(2): 225–260, 2019.

Marco Baroni and Roberto Zamparelli. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1183–1193. Association for Computational Linguistics, 2010.

Rajendra Bhatia. Matrix analysis, volume 169. Springer Science & Business Media, 2013.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.

Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden, and Robin Piedeleu. Interacting conceptual spaces I: Grammatical composition of concepts. In Conceptual Spaces: Elaborations and Applications, pages 151–181. Springer, 2019.

Bob Coecke. The mathematics of text structure. arXiv preprint arXiv:1904.03478, 2019.

Bob Coecke and Konstantinos Meichanetzidis. Meaning updating of density matrices. arXiv preprint arXiv:2001.00862, 2020.

Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. Mathematical foundations for a compositional distributional model of meaning. arXiv preprint arXiv:1003.4394, 2010.

Bob Coecke, Edward Grefenstette, and Mehrnoosh Sadrzadeh. Lambek vs. Lambek: Functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic, 164(11):1079–1100, 2013.

Ellie D’Hondt and Prakash Panangaden. Quantum weakest preconditions. Mathematical Structures in Computer Science, 16(03):429–451, 2006.

Bradley Efron. Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics, pages 569–593. Springer, 1992.

Marti A Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on Computational linguistics-Volume 2, pages 539–545. Association for Computational Linguistics, 1992.

Karl Moritz Hermann, Edward Grefenstette, and Phil Blunsom. “not not bad” is not “bad”: A distributional account of negation. arXiv preprint arXiv:1306.2158, 2013.

Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 1985.

Dimitri Kartsaklis. Coordination in categorical compositional distributional semantics. arXiv preprint arXiv:1606.01515, 2016.

Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Stephen Pulman. A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of COLING: Posters, pages 549–558, 2012.

Germán Kruszewski, Denis Paperno, Raffaella Bernardi, and Marco Baroni. There is no logical negation here, but there are alternatives: Modeling conversational negation with distributional semantics. Computational Linguistics, 42(4):637–660, December 2016. doi: 10.1162/COLI_a_00262. URL https://www.aclweb.org/anthology/J16-4003.

Matthew S Leifer and David Poulin. Quantum graphical models and belief propagation. Annals of Physics, 323(8):1899–1946, 2008.

Matthew S Leifer and Robert W Spekkens. Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Physical Review A, 88(5):052130, 2013.

Martha Lewis. Hyponymy in DisCoCat, 2019a. URL http://www.cs.ox.ac.uk/ACT2019/preproceedings/Martha%20Lewis.pdf. Under review.

Martha Lewis. Compositional hyponymy with positive operators. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 638–647, Varna, Bulgaria, September 2019b. INCOMA Ltd. doi: 10.26615/978-954-452-056-4_075. URL https://www.aclweb.org/anthology/R19-1075.

Jean Maillard, Stephen Clark, and Edward Grefenstette. A type-driven tensor-based semantics for CCG. In Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 46–54, 2014.

George A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, November 1995. ISSN 0001-0782. doi: 10.1145/219717.219748. URL http://doi.acm.org/10.1145/219717.219748.

Reinhard Muskens and Mehrnoosh Sadrzadeh. Context update for lambdas and vectors. In International Conference on Logical Aspects of Computational Linguistics, pages 247–254. Springer, 2016.

Denis Paperno, Marco Baroni, et al. A practical and linguistically-motivated approach to compositional distributional semantics. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 90–99, 2014.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. URL http://www.aclweb.org/anthology/D14-1162.

Robin Piedeleu, Dimitri Kartsaklis, Bob Coecke, and Mehrnoosh Sadrzadeh. Open system categorical quantum semantics in natural language processing. arXiv preprint arXiv:1502.00831, 2015.

Anne Preller and Mehrnoosh Sadrzadeh. Bell states and negative sentences in the distributed model of meaning. Electronic Notes in Theoretical Computer Science, 270(2):141–153, 2011.

Laura Rimell, Amandla Mabona, Luana Bulat, and Douwe Kiela. Learning to negate adjectives with bilinear models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 71–78, 2017.

Stephen Roller, Douwe Kiela, and Maximilian Nickel. Hearst patterns revisited: Automatic hypernym detection from large text corpora. arXiv preprint arXiv:1806.03191, 2018.

Mehrnoosh Sadrzadeh, Stephen Clark, and Bob Coecke. The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, page ext044, 2013.

Mehrnoosh Sadrzadeh, Dimitri Kartsaklis, and Esma Balkir. Sentence entailment in compositional distributional semantics. Ann. Math. Artif. Intell., 82(4):189–218, 2018. doi: 10.1007/s10472-017-9570-x. URL https://doi.org/10.1007/s10472-017-9570-x.

Peter Selinger. Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science, 170:139–163, 2007.

Peter Selinger. A survey of graphical languages for monoidal categories. In New structures for physics, pages 289–355. Springer, 2010.

John van de Wetering. Entailment relations on distributions. arXiv preprint arXiv:1608.01405, 2016.

John van de Wetering. Ordering information on distributions. arXiv preprint arXiv:1701.06924, 2017.

Luke Vilnis and Andrew McCallum. Word representations via Gaussian embedding. arXiv preprint arXiv:1412.6623, 2014.

Hermann Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Mathematische Annalen, 71(4):441–479, 1912.

Dominic Widdows and Stanley Peters. Word vectors and quantum logic: Experiments with negation and disjunction. Mathematics of Language, 8:141–154, 2003.


Towards logical negation for compositional distributional semantics

2020·Arxiv