Saturday, April 1, 2017

Quantifier Scoping

I never really got it. Quantifier scoping. In the CLE book [1] it is introduced with sentences like
John visited every house on a street.
The sentence is ambiguous. I can mean "John visited every house on street S" and "John visited every house that is located in a street".

The Blackburn and Bos book [2] features sentences like
Every boxer loves a woman.
These sentences have in common that they are not typical sentences you would use when interacting with a database. I never paid much attention to them. My attitude was one of I'll cross that bridge when I get to it.

Quantification in Predicate Calculus

What I should have taken from it was that these are just (clumsy) examples of very fundamental logical property called quantification. To refresh your memory I'll just quote from that Wikipedia article:
The two most common quantifiers are the universal quantifier and the existential quantifier. The traditional symbol for the universal quantifier is "∀", a rotated letter "A", which stands for "for all" or "all". The corresponding symbol for the existential quantifier is "∃", a rotated letter "E", which stands for "there exists" or "exists".
Quantifiers are about quantity. Therefore the natural numbers (1, 5, 13) are just as important quantifiers.

Example sentence

In my NLI library I have set of integration tests that involve sentences in the field of human relationships. A month ago the following question came to mind:
Does every parent have 2 children?
and I knew my library could not handle this sentence. Why not? Suppose there are three parents in the database and each has 2 children. When I collect all these parents and children, I find that there are 6 of them. And 6 is not 2. So the library will say no while the answer should be yes.

The relational representation of the sentence at that time was

have(P) 
subject(P, S) parent(S) determiner(S, D1) isa(D, all)
object (P, O) child(O) determiner(O, D2) isa(D2, 2)

It must be rewritten to get rid of the "vague" (this is the official term) verb "have", because "have" does not have, and cannot have, a representation in the database.

have_child(S, O) 
parent(S) determiner(S, D1) isa(D, all)
child(O) determiner(O, D2) isa(D2, 2)

NLI Literature

I read about the subject in all the books and articles I currently have at my disposal and I learned that the LUNAR system (Woods) was the first milestone in handling quantification. CLE was the system that handled quantification most extensively.

Quantification is expressed in a sentence through determiners, quantifiers, more specifically, and there's such a thing as a quantifier phrase QP. Examples of quantifier phrases are

every
all
some
much
21
more than 2
between 2 and 5
one or two
at least three-fourths of

Quantifications are nested. The problem with this nesting is that it changes a relational representation from a simple unordered set to a tree structure of unordered sets. That's another reason I postponed it as long as I could. The other problem is that the order of nesting is not syntactically determined. In some cases quantifier A has wider scope than quantifier B and in other cases it is the other way around. It depends on the type of quantifiers. There are hard rules that govern this ordering, and preferences.

I finally stranded at the Core Language Engine as the most useful resource. But I found the chapter on quantifier scoping hard to digest. Thankfully, it was based on a separate article by Douglas B. Moran, a member of the CLE team, which was much better readable. I get happy whenever I think about it:

Quantifier Scoping in the SRI Core Language Engine - Douglas B. Moran

The Scoping Algorithm

I will now describe how an unscoped representation is transformed into a scoped representation. It is loosely based on the algorithm of CLE, but some details were not given in the papers and I had to fill in myself.

But before I do that, you must know about the concepts of this domain.
  1. The Range is the set of entities that form the domain of the quantifier. In the example "every parent", the range is formed by all parents.
  2. The Quantifier is the set of relations that were originally in the quantifier phrase. It is the "every" and the "2" in the example sentence.
  3. The Scope of a quantifier is the set of relations where the quantifier applies.
  4. A Quantification is a relation that binds the lot together.
Let's start with the earlier named example

have_child(S, O) 
parent(S) determiner(S, D1) isa(D1, all)
child(O) determiner(O, D2) isa(D2, 2)

I changed the determiner predicate into quantification (quantification(range-variable, range, quantifier variable, quantifier)), because it better reflects the meaning

have_child(S, O) 
parent(S) quantification(S, [], D1, []) isa(D1, all)
child(O) quantification(O, [], D2, []) isa(D2, 2)

In a separate step I place the range and the quantifier into the quantification. In this exceptional step that only applies to quantifications, the relations below the dp() phrase are placed in the fourth argument, and the relations below the nbar() phrase are placed in the second argument.
rule: np(E1) -> dp(D1) nbar(E1), sense: quantification(E1, [], D1, []);
the result

have_child(S, O) 
quantification(S, [ parent(S) ], D1, [ isa(D1, all) ]) 
quantification(O, [ child(O) ], D2, [ isa(D2, 2) ]) 

This is called the unscoped representation. In CLE it is called the quasi logical form and  quantification is named qterm in CLE.

The next step is essential: it orders the quantifications based on hard rules and preferences. I currently have a very elementary implementation that sorts the quantifications using a sort function that takes two elements and determines their order
  1. if both quantifiers are "all" leave the order as is
  2. if the first quantifier is "all", it must go first
  3. if the second quantifier is "all", swap it to go first

LUNAR and CLE have much more rules at this point. I have not needed them yet.

Next, the quantifications are nested in sorting order. At this point the quantifications are turned into quants, the term used in CLE. The arguments are like that of quantification, except that one is added: scope. The lower ordered quant is placed inside the scope of the higher ordered quant:

quant(S, [ parent(S) ], D1, [ isa(D1, all) ], [
     quant(O, [ child(O) ], D2, [ isa(D2, 2) ], [])
])

Finally, I place the unscoped relations inside the scope of the quants. I do this by going through these relations one by one. Its default scope is outside of the quants. Then I step through all nested scopes, from outside to inside. Whenever the range variable of the quant matches one of variables of the arguments of the unscoped relation, the scope changes to that new inner scope. The unscoped relation is added to the last chosen scope.

quant(S, [ parent(S) ], D1, [ isa(D1, all) ], [
     quant(O, [ child(O) ], D2, [ isa(D2, 2) ], [
        have_child(S, O)
    ])
])

Execution

After the quant relations are produced, they are executed differently from normal relations while processing a sentence.

To evaluate a quant, these steps are taken
  1. Bind the range. The relations from the range (i.e. parent(S)) are bound to values in the database. This results in a set of bindings for the variable (S).
  2. Bind the scope. The relations from the scope (i.e. the inner quant, and have_child()) are bound to values in the database. This results in a second set of bindings featuring among others the range variable (S).
  3. Validation. The range bindings and the scope bindings are now evaluated through the use of the quantifier. The quant with quantifier "all" is only true if the number of range bindings equals that of the scope bindings. The quant with quantifier "2" is only true is the number of scope bindings equals 2.

Conclusion

Quantifier scoping handles the fact that the semantics of determiners cannot be created with rewrite rules alone. It makes the process of interpreting a sentence a lot more complex. But it is a necessary step in that it resolves some ambiguities. And above all, it is necessary for many aggregation functions that apply to quantities of temporary results. Scoping these quantifications is part of understanding the sentence.


[1] The Core Language Engine
[2] Representation and Inference for Natural Language

No comments:

Post a Comment

On SQLAlchemy

I've been using SQLAlchemy and reading about it for a few months now, and I don't get it. I don't mean I don't get SQLAlchem...