Saturday, January 15, 2022

Processing SHRDLU demo sentence "Why?" with NLI-GO

For several years I have been working on a natural language understanding program called NLI-GO.  It allows the user to interact with a database through natural language. Since I needed example sentences to extend its functionality, and SHRDLU is a great inspiration, I thought it would be a good idea to recreate SHRDLU's famous sample dialog in NLI-GO. And it is a good idea. Each sentence creates new challenges and puzzles that need to be solved, and this allows NLI-GO to grow.

I am still working on it. The dialog has 44 interactions, and for the last few months I have been working on interaction 25. It goes like this:

Person: Why?
Computer: TO GET RID OF IT

As usual, this interaction proved more complicated than I had envisioned at first. Now that I have finished it, I would like to explain some of the techniques I used.

Some dialog context

The question "why?" follows a series of interactions. Of these interactions the ones that matter are:

Person: Will you please stack up both of the red blocks 
  and either a green cube or a pyramid?
Computer: OK.

This is followed by the sequence

Person: Had you touched any pyramid before you put the 
  green one on the little cube?
Computer: YES, THE GREEN ONE

Person: When did you pick it up?
Computer: WHILE I WAS STACKING UP THE RED CUBE, A LARGE RED BLOCK AND A LARGE GREEN CUBE

Person: Why?
Computer: TO GET RID OF IT

So the person asks the computer when it picked up the green pyramid, and then why it picked up that pyramid. SHRDLU remembers that it was stacking up some blocks and needed to get rid of this pyramid in order to move the block that was below it.

SHRDLU's actions

This is a lot to take in. These are not your average QA questions one would ask a database in an NLI system. So this is a good time to go into the action structure of SHRDLU.

When the user tells SHRDLU to perform some command ("build a stack", "pick up a block"), the actions needed to perform this command are executed. While they are executing, SHRDLU also stores the fact that such and such an action was performed in its database.

This is what the start of an action "pick up" looks like in NLI-GO:

pick_up(EventId, Subject, Object)
start_time(EventId, Start);
end_time(EventId, End);
parent_event(EventId, ParentEventId)

Note that there's no need for an event-based database, if such a database even exists. A normal relational database can keep track of these events, but they need to be inserted explicitly by the application.

I am currently using an internal database (a data structure) to store these actions, but they could be stored in a relational database as well. `pick_up`, `start_time`, `end_time`, and `parent_event` would then be the names of tables.
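To make the relational option concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not how NLI-GO stores events; the column names (`event_id`, `tick`, and so on), the helper functions, and the sample values are all assumptions made for illustration. Only the four table names come from the relations above.

```python
import sqlite3

# Hypothetical schema: one table per relation, named after the relations above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pick_up      (event_id INTEGER, subject TEXT, object TEXT);
    CREATE TABLE start_time   (event_id INTEGER, tick INTEGER);
    CREATE TABLE end_time     (event_id INTEGER, tick INTEGER);
    CREATE TABLE parent_event (event_id INTEGER, parent_event_id INTEGER);
""")


def record_pick_up(event_id, subject, obj, start, end, parent_id):
    """The application inserts the event rows explicitly, one row per relation."""
    conn.execute("INSERT INTO pick_up VALUES (?, ?, ?)", (event_id, subject, obj))
    conn.execute("INSERT INTO start_time VALUES (?, ?)", (event_id, start))
    conn.execute("INSERT INTO end_time VALUES (?, ?)", (event_id, end))
    conn.execute("INSERT INTO parent_event VALUES (?, ?)", (event_id, parent_id))


def when_picked_up(obj):
    """Return (start, end) of the pick_up event for obj, or None if there is none."""
    return conn.execute(
        """SELECT s.tick, e.tick
           FROM pick_up p
           JOIN start_time s ON s.event_id = p.event_id
           JOIN end_time e ON e.event_id = p.event_id
           WHERE p.object = ?""",
        (obj,),
    ).fetchone()


# Made-up sample data: event 7 was started by event 6.
record_pick_up(7, "shrdlu", "green_pyramid", start=12, end=13, parent_id=6)
```

A "when" question then becomes a join over `pick_up`, `start_time`, and `end_time`, and nothing about the storage is specific to events.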

Most commands are not simple. They consist of a hierarchy of actions. If an action "pick up" needs to move aside some object, it starts a "get rid of" action. This action receives a pointer to the action that started it, which is stored as the `parent_event`.

Because these actions are stored, SHRDLU has a memory of them. Because each action has a reference to the action that started it, SHRDLU can tell why it performed an action, simply by following the `parent_event` link.

SHRDLU has a simple discrete sense of time. Each time a basic action is performed, the internal time is updated by 1.
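The combination of stored actions, parent links, and a discrete clock can be sketched in a few lines of Python. This is an illustration under my own assumptions, not NLI-GO's actual data structure: the `EventLog` class, its method names, and the particular nesting of actions in the example are hypothetical.

```python
class EventLog:
    """A toy in-memory event store with a discrete clock."""

    def __init__(self):
        self.clock = 0      # internal time, advanced once per basic action
        self.events = {}    # event id -> event record
        self._next_id = 1

    def start(self, action, obj, parent=None):
        """Record that an action started now, remembering which action started it."""
        event_id = self._next_id
        self._next_id += 1
        self.events[event_id] = {
            "action": action, "object": obj,
            "start": self.clock, "end": None, "parent": parent,
        }
        return event_id

    def tick(self):
        """A basic action was performed: time moves forward by 1."""
        self.clock += 1

    def end(self, event_id):
        self.events[event_id]["end"] = self.clock

    def why(self, event_id):
        """Answer 'why?' by following the parent link; None for a top-level command."""
        parent = self.events[event_id]["parent"]
        return None if parent is None else self.events[parent]["action"]


log = EventLog()
stack = log.start("stack up", "red blocks and green cube")
rid = log.start("get rid of", "green pyramid", parent=stack)
pick = log.start("pick up", "green pyramid", parent=rid)
log.tick()       # the basic action itself
log.end(pick)
log.end(rid)
log.tick()       # some further basic action of the stacking command
log.end(stack)
```

Here `log.why(pick)` yields "get rid of", while `log.why(stack)` yields `None` because the user's command has no parent action.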

Ellipsis

To understand the question "Why?", an understanding system needs to find out what exactly is meant, as if asking: "Why what?" Clearly, part of the question has been left out. This is called ellipsis.

NLI-GO treats this problem by declaring the ellipsis explicitly in the grammar. Here is the grammar rule that deals with it:

{ rule: interrogative_clause(P1) -> 'why',
  ellipsis: [prev_sentence]//mem_vp(P1),
  sense: go:intent(why, P1) $mem_vp }

The rule "interrogative_clause(P1) -> 'why'" rewrites the clause to the single word "why". The value of `ellipsis`, [prev_sentence]//mem_vp(P1), is a path that leads to the missing part of the sentence.

In trying to match this rule, NLI-GO follows the path, starting at the current `interrogative_clause` node of the active syntax tree:

  • [prev_sentence] tells it to visit the previous sentence in the dialog; this is "When did you pick it up?". NLI-GO is now at the root of this sentence
  • //mem_vp tells it to visit all mem_vp nodes anywhere below the current node

The syntax is somewhat similar to that of XPath (used to navigate XML).

If such a `mem_vp` node is found, it is copied into the active sentence. The syntax tree of "Why?" is now extended, and the complete sentence looks like this:

Why did you pick it up?
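The path-following step described above can be sketched roughly as follows, in Python. The `Node` class, the function names, and the shape of the example syntax tree (what sits below `mem_vp`) are assumptions of mine for illustration; NLI-GO's real trees and matching are more involved.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str
    word: str = ""
    children: list = field(default_factory=list)


def descendants(node, category):
    """Yield every node of the given category anywhere below node (like XPath's //)."""
    for child in node.children:
        if child.category == category:
            yield child
        yield from descendants(child, category)


def resolve_ellipsis(clause, dialog, category):
    """Follow [prev_sentence]//category and copy the first match into clause."""
    if not dialog:
        return False
    prev_root = dialog[-1]                       # [prev_sentence]
    match = next(descendants(prev_root, category), None)
    if match is None:
        return False
    clause.children.append(copy.deepcopy(match))  # copy, do not share the node
    return True


def words(node):
    """Collect the words at the leaves, left to right."""
    found = [node.word] if node.word else []
    for child in node.children:
        found.extend(words(child))
    return found


# Hypothetical tree for "When did you pick it up?"
when = Node("interrogative_clause", children=[
    Node("adverb", "when"),
    Node("mem_vp", children=[
        Node("aux", "did"), Node("np", "you"), Node("verb", "pick"),
        Node("np", "it"), Node("particle", "up"),
    ]),
])
why = Node("interrogative_clause", children=[Node("adverb", "why")])
resolve_ellipsis(why, [when], "mem_vp")
```

After the call, `words(why)` gives the words of "why did you pick it up", and the tree of the "When" question is left unchanged because the node was copied rather than moved.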

Resolving "it"

Something that may seem self-evident, but really isn't, is that "it" in the new sentence must refer to the same green pyramid that "it" in the previous sentence referred to.

NLI-GO should not try to resolve this "it" anew; it should simply inherit the value of "it" that had been resolved in the "When" question. If it didn't, "it" might resolve to another object, and that would be counterintuitive.

(And yes, this is of course exactly what happened at an earlier stage of the software; "it" first referred to SHRDLU, which is not even an "it", but NLI-GO had no idea.)
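One way to picture the inheritance is that the copied node carries its resolved referent along, and the resolver leaves already-resolved pronouns alone. The Python sketch below is only that picture: the `Pronoun` class, the `resolve` function, and its naive "most recent candidate" rule are hypothetical and are not NLI-GO's anaphora resolution.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pronoun:
    word: str
    referent: Optional[str] = None   # entity id, once resolved


def resolve(pronoun, candidates):
    """Resolve only pronouns without a referent; naive rule: take the last candidate."""
    if pronoun.referent is None:
        pronoun.referent = candidates[-1]
    return pronoun.referent


# "it" in the "When" question gets resolved in its own context.
it_in_when = Pronoun("it")
resolve(it_in_when, ["shrdlu", "green_pyramid"])

# The copy made for "Why?" keeps that referent, even in a different context.
it_in_why = copy.copy(it_in_when)
resolve(it_in_why, ["green_pyramid", "shrdlu"])

# A pronoun resolved anew in that second context ends up somewhere else.
fresh_it = Pronoun("it")
resolve(fresh_it, ["green_pyramid", "shrdlu"])
```

In this toy setup `it_in_why.referent` stays "green_pyramid", whereas `fresh_it.referent` becomes "shrdlu", which is the kind of mistake described above.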

The response: "it" again

The response to the question should be

To get rid of it

However, the response that NLI-GO initially gave was this:

To get rid of the green pyramid

It proved not so simple to get NLI-GO to just use a pronoun.

Centering theory

Interaction 25 is the first interaction that uses a pronoun in its response. Why would SHRDLU suddenly use a pronoun in its response? To the reader this is obvious. The object in the sentence had been referred to as "it" before, so it is natural to keep referring to it as that. The idea that an object may be in the spotlight of a dialog has been worked out in centering theory. If an object was the preferred center of the previous sentence, it is likely to be the preferred center of the current sentence as well. Also: a sentence's subject is more likely to become a new center than an object, and an object in turn is more likely than any other entity.
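A heavily simplified version of this idea can be sketched in Python. It is a toy rule, not centering theory proper and not NLI-GO's implementation: the role ranking follows the paragraph above, but the function names, the entity ids, and the decision to exclude the dialog participants ("you", "I") from the candidates are my assumptions.

```python
# Lower rank = more likely to be the center: subject, then object, then the rest.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


def preferred_center(mentions, participants=("speaker", "hearer")):
    """Pick the highest-ranked entity of a sentence as its center.

    mentions is a list of (entity_id, role) pairs. The dialog participants
    are skipped here, which is an assumption of this sketch.
    """
    candidates = [m for m in mentions if m[0] not in participants]
    if not candidates:
        return None
    return min(candidates, key=lambda m: ROLE_RANK[m[1]])[0]


def refer(entity, description, previous_center):
    """Use a pronoun for the entity that was the center of the previous sentence."""
    return "it" if entity == previous_center else description


# "When did you pick it up?": "you" is the hearer, "it" is the green pyramid.
previous = [("hearer", "subject"), ("green_pyramid", "object")]
center = preferred_center(previous)
answer = "To get rid of " + refer("green_pyramid", "the green pyramid", center)
```

With the pyramid as the previous center, `answer` is "To get rid of it"; without a previous center, the same call falls back to the full description.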

Concluding remarks

This seemingly innocuous sentence has kept me busy for four months. I had not seen it coming. It has been both a source of joy, through learning new concepts, and a continuous source of frustration. I needed to rewrite existing structures to enable these new features, and this broke several existing interactions. The new framework is still not at all robust, and it needs a lot of work to make it simple to use. At the same time, I still feel this is a great field to work in, and it will eventually provide a level of determinism that machine learning will never reach. So it's worth it.


