Saturday, September 9, 2017

English question syntax

I took the time to find out what types of question syntax there are. Even though one probably needs only a couple of them, it's always good to have some sort of overview.

I had no idea there were so many forms. I learned many of them from this very helpful page: My English Teacher

But I started my search in the ever inspiring ...



The phrase structure rules presented below are my own. Some structures recur in several places; they could probably be grouped. For a more authoritative reference, see the Stanford Parser.

I use the following abbreviations:

  • EQ = identity (is, was, are, were)
  • BE = auxiliary (is, was, are, were)
  • DO = auxiliary (do, does, did)
  • HAVE = auxiliary (has, have)
  • MOD = modality (can, could, will, would, shall, should)
and these syntactic categories:
  • NP = noun phrase (a "thing")
  • VP = verb phrase (an action, or more general, a predication)
  • ADJP = adjective phrase (e.g. red, little, happy)
  • ADVP = adverbial phrase (e.g. quickly, happily)
I grouped the questions by their response type, which is shown in each heading. A small sketch of how such templates might be matched in code follows after the last group of questions.

Factual yes/no-questions or choice questions (boolean / a selected object)

  • DO NP VP (did Lord Byron marry Queen Elisabeth, did Lord Byron not marry Queen Elisabeth)
  • DO NP VP (did Lord Byron marry Queen Elisabeth or Anne Isabella Milbanke)
    --
  • EQ NP NP (was Lord Byron king of England, was Lord Byron not king of England)
  • EQ NP NP (was Lord Byron a king or a lord)
    --
  • BE NP ADJP (is the block red)
  • BE NP ADJP (is the block red or blue)
    --
  • BE NP VP (was Lord Byron born in London, was Lord Byron not born in London)
  • BE NP VP (was Lord Byron born in London or Cambridge)
    --
  • HAVE NP VP (has Napoleon invaded Germany, has Napoleon not invaded Germany)
  • HAVE NP VP (has Napoleon invaded Germany or The Netherlands)
    --
  • MOD NP VP (would you like a cup of coffee, should I leave my things here, can dogs fly, can I ask you a question, can you stack a cube on a pyramid)
  • MOD NP VP (would you like coffee or tea)

Uninverted yes/no questions (boolean)

  • NP VP (Lord Byron married Queen Elisabeth?) (question mark is required)

Wh-questions (which, what, who; name one or more individuals)

  • WHO VP (who married Lord Byron, who was Lord Byron's wife)
  • WHO BE NP VP (who are you seeing)
  • WHO DO NP VP (who does Pierre want to beat)
  • WHO HAVE NP VP (who have you been seeing)
  • WHO MOD VP (who can drive me home)
    --
  • WHOM DO NP VP (whom do you believe)
  • WHOM HAVE NP VP (whom have you believed)
  • WHOM MOD NP VP (whom should I talk to)

    WHOM -> WITH/TO WHOM
    (with whom is Peter speaking)
    --
  • WHICH NP VP (which countries border the mediterranean, which countries do not border the mediterranean)
  • WHICH BE NP (which is the best option)
  • WHICH DO NP VP (which do you do more often)
  • WHICH NP MOD NP VP (which way should I go)
    --
  • WHAT NP VP (what rock sample contains most iron, what food items did you eat)
  • WHAT BE NP (what is the biggest block, what is your name)
  • WHAT DO NP VP (what do laptops cost)
  • WHAT HAVE NP VP (what has Churchill done to stop the war)
  • WHAT MOD NP VP (what should I do)
    --
  • WHOSE NP VP (whose autographs have you collected, whose parents will drive)
  • WHOSE NP BE NP (whose book is that)

Amount (a number, requires aggregation)

  • HOW MANY NP VP (how many children had Lord Byron, how many children did Lord Byron have)

Degree (a number, the unit result depends on subject)

  • HOW MUCH NP VP (how much sugar goes in a single drink)
  • HOW ADJP BE NP (how high is Mount Everest, how tall is the tallest man, how small is a mouse, how old are you)
  • HOW ADVP DO NP VP (how often do you go to the movies, how nicely do I need to dress tonight)

Manner (a means)

  • HOW BE NP VP (how was Napoleon crowned king)
  • HOW DO NP VP (how do you go to work)
  • HOW HAVE NP VP (how has Napoleon invaded Britain)
  • HOW MOD NP VP (how can I become more productive)

State (a state)

  • HOW BE NP (how are you)

Reason (a cause)

  • WHY BE NP VP (why was Napoleon crowned king)
  • WHY DO NP VP (why did Napoleon invade Germany)
  • WHY HAVE NP VP (why has John hit Jake)
  • WHY MOD NP VP (why should I go)

Time (a time)

  • WHEN BE NP (when was the marriage)
  • WHEN BE NP VP (when was Napoleon crowned king)
  • WHEN DO NP VP (when did you start wearing make up)
  • WHEN HAVE NP VP (when have you bought the tv)
  • WHEN MOD NP VP (when can I go home)

    WHEN -> WHEN PP (when in the next hour do you want to go)

Place (a place)

  • WHERE BE NP (where is it?)
  • WHERE BE NP VP (where is the concert taking place)
  • WHERE DO NP VP (where did you go)
  • WHERE HAVE NP VP (where has Sally gone)
  • WHERE MOD NP VP (where can I find a pub)

    WHERE -> WHERE PP (where on the map is it)
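To make the grouping by response type concrete, here is a minimal sketch in Python (my own code, not a real parser) that classifies a question by looking only at its leading words, following the groups above. The word lists and return labels are my own choices.

# Classify a question by its expected response type, based on the leading words.
AUX = {"do", "does", "did", "is", "was", "are", "were", "has", "have",
       "can", "could", "will", "would", "shall", "should"}

def response_type(question):
    words = question.lower().rstrip("?").split()
    if not words:
        return "unknown"
    first, rest = words[0], words[1:]
    if first in AUX:
        return "boolean"                     # factual yes/no or choice question
    if first in ("who", "whom", "whose", "which", "what"):
        return "individuals"                 # wh-question: one or more individuals
    if first == "how":
        if rest and rest[0] == "many":
            return "amount"
        if rest and rest[0] in AUX:
            return "manner or state"         # how was ..., how do ..., how are you
        return "degree"                      # how much / high / old / often ...
    return {"why": "reason", "when": "time", "where": "place"}.get(first, "unknown")

print(response_type("did Lord Byron marry Queen Elisabeth"))   # boolean
print(response_type("how many children did Lord Byron have"))  # amount
print(response_type("where can I find a pub"))                 # place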

Thursday, August 31, 2017

Query Optimization

When building a natural language interface system you must ask yourself if you are going to send many simple queries to the database, or a single complex query that the database executes in a single run.

One Big Query vs Many Small Queries

For NLI-GO I chose to send many small requests to the database. Each request deals with a single relation and involves no joins or aggregations. For a triple store queried with SPARQL, each relation has just two arguments. In this case there are only four queries per relation, depending on the variables that were bound in advance:

SELECT A, B FROM RELATION_P
SELECT A    FROM RELATION_P WHERE           B = Y
SELECT    B FROM RELATION_P WHERE A = X
SELECT 1    FROM RELATION_P WHERE A = X AND B = Y
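Picking one of these four forms can be done mechanically from the set of bound variables. A rough sketch in Python (my own code, not NLI-GO's actual API; the argument names A and B follow the schematic queries above):

# Choose a query form for a two-argument relation, based on which arguments are bound.
def build_query(relation, a=None, b=None):
    if a is None and b is None:
        return f"SELECT A, B FROM {relation}"
    if a is None:
        return f"SELECT A FROM {relation} WHERE B = {b!r}"
    if b is None:
        return f"SELECT B FROM {relation} WHERE A = {a!r}"
    return f"SELECT 1 FROM {relation} WHERE A = {a!r} AND B = {b!r}"

print(build_query("MARRIED_TO"))              # both arguments free
print(build_query("NAME", b="Lord Byron"))    # only B is bound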

There are major benefits of the small queries approach:
  • Very little knowledge is required of the idiosyncrasies of the database and of the database management system.
  • Given that the tables are properly indexed, all queries are quick.
  • It is possible to add extra logic to the user query not available from the database.
  • It is possible to combine the results of multiple databases for a single user query. These databases may have completely different types.
There are also some major drawbacks, of course:
  • The system does not make use of the processing power of the database.
  • The system needs to perform (very) many queries.
  • More knowledge needs to be present in the NLI system itself.

Optimization

To me the benefits outweigh the drawbacks, but in order to have a performant system, I need to deal with the fact that the database does not help in creating the fastest query execution.
This comes down to the fact that the Query Optimizer of the database is not used for the full user query. Mainly, this means that the queried relations are not presented to the database in the most efficient order.
A simple example will make this clearer. Suppose the following two queries were created from a user input sentence "who married Lord Byron?"
SELECT A, B FROM MARRIED_TO
SELECT 1    FROM NAME WHERE           A = ? AND B = "Lord Byron"
The first query asks for all A, B involved in a marriage relation. The second query asks for the ID of a person whose name is given. When executed in the shown order, the first query yields, say, 40,000 rows. Next, each of the resulting values is entered into the second query. The second query is executed 40,000 times, each time with different values. As you can see, even a simple set of relations easily gets out of control.
Since the database cannot optimize the user query as a whole, the NLI system needs to perform this task itself.
David H.D. Warren wrote an interesting article about query optimization for CHAT-80, a Prolog based natural language interface. It's called
Efficient Processing of Interactive Relational Database Queries Expressed in Logic
His algorithm to determine the order of the simple queries consists of two steps:
  • Calculate the cost for each query
  • Order the queries by cost, lowest first.
The cost is calculated by taking the size of the relation, and dividing it by the product of the number of distinct values in each of the bound arguments. I will explain this with examples:
For the query
SELECT ID, NAME FROM PERSON
the cost is calculated as
size(PERSON)
where "size(PERSON)" is the number of rows in PERSON. Note that the query has no constraints.
For the query
SELECT NAME FROM PERSON WHERE ID = 8721
the cost is calculated as
size(PERSON) / size_unique(ID)
Where "size(PERSON)" is the number of records in the table PERSON, and "size_unique(ID)" is the number of unique values in the column ID. Note that the query has one constraint: ID.
For the query
SELECT NAME FROM PERSON WHERE COUNTRY = "England" AND AGE = 87
the cost is calculated as
size(PERSON) / ( size_unique(COUNTRY) * size_unique(AGE) )
Where "size(PERSON)" is the number of records in the table PERSON, and "size_unique(COUNTRY)" is the number of unique values in the column COUNTRY. Note that the query has two  constraints: COUNTRY and AGE.

The result

The cost of a bound NAME query is much lower than that for an unbound MARRIED_TO query, hence the NAME query will be scheduled for execution first. This query returns only a single row, and the second query will need to be executed only once.

In this algorithm, queries with many unbound variables are very costly. With more bound variables, the cost drops rapidly. It also takes into account that tables with fewer rows are cheaper than tables with many rows, and that bound columns with many distinct values (higher cardinality) reduce the cost more than columns with few distinct values.

It is a very simple, yet efficient way of optimizing conjunctive queries.

Friday, July 21, 2017

Intents, Slots and SHRDLU

Several things came together lately. I have been reading up on the chatbot blogs that are flourishing these days. For chatbots the concept of the intent of the user is important. The intent of a sentence is what the user means by it. But what struck me, even if it isn't said in so many words, is that the number of intents is restricted. Remember my examination of Siri's commands? These are typical intents, and their number is limited.



At first I was bored by this, because I want my AI to have unlimited possibilities. But then I saw its strength.

As it happened, I was looking for the best way to design an NLI (natural language interface), and for a definition of when an NLI is "complete". When is the design finished? This is an important question when you are building an NLI.

And I read Winograd and Flores' 1987 book "Understanding Computers and Cognition". A book that still needs to sink in with me, and which at first was actually pretty demotivating because it points at the limits of AI. But I already took one lesson home from it: computers are tools. They don't really understand anything. Everything needs to be told. You can't expect a computer to "get it" and continue on its own.

These things, together, brought me the idea of a finite set of intents, semantic structures that act like a set of services in an API. And that these intents are the place where the design of an NLI starts.

Designing from Intents

An NLI has these three layers:
  • Syntax
  • Semantics
  • Database language
When you start to design an NLI you could start out by defining which aspects of syntax you would like to support. But this leads quickly to over-engineering. Because the syntax, while complicated, is the easy part. Every part of the syntax is an input to the other parts of the system. What's more, syntax is never "complete". More syntax is always better.

But when you take semantics as a starting point, or more precisely, a fixed set of intents, this changes. From this angle, you ask yourself: what are the things the user actually wants from the database? And this turns out to be surprisingly limited.

Once the intents have been defined, syntax follows. Only syntactic structures that will actually be used need to be implemented. Also, a single intent can have multiple syntactic representations.

The Intents of Alexa

There are several commercial agents you can talk to these days, and that allow you to extend their standard functionality with custom code.

Amazon's Alexa (featured on devices like the Amazon Echo) allows you to create skills. A skill is a conversation subject, like the weather, movies, planning a trip, or whatever.

A skill consists of intents and utterances. An intent is a "lambda function" with a name and arguments. The arguments are called slots. Example intents and slots for the weather skill: WEATHER(date, location), RAIN(date, location), BARBEQUE(date). The functions can be written in several different languages, like JavaScript and Python.

While skills are defined on the semantic level, the syntactic level is defined by utterances. Each intent can have multiple utterances. Here are some utterances for RAIN(date, location): "is it going to rain {date}" "will it rain {date} in {location}" "do I need an umbrella?". Notice the slots. Slots have slot types. The slots named here have the date and location types.
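To illustrate how utterances with slots could be matched against user input, here is a toy sketch in Python (certainly not Amazon's actual matcher). The intent name RAIN and the slot names date and location follow the example above; everything else is my own.

import re

utterances = [
    ("RAIN", "is it going to rain {date}"),
    ("RAIN", "will it rain {date} in {location}"),
]

def match(sentence):
    # Turn each slot placeholder into a named capture group and try the utterances in order.
    for intent, pattern in utterances:
        regex = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", pattern) + "$"
        m = re.match(regex, sentence)
        if m:
            return intent, m.groupdict()
    return None

print(match("will it rain tomorrow in London"))
# ('RAIN', {'date': 'tomorrow', 'location': 'London'})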



If you want to read more about it, Amazon has detailed descriptions here. It is a lot of work! For a fun and friendly introduction, read Liz Rice's blogs.

The Intents of SHRDLU

I went on to test the idea of starting with intents. And what better place to start than the most complex NLI ever built: SHRDLU. SHRDLU, built by Terry Winograd in 1969, exhibits many NLI characteristics that have rarely been seen in later systems:
  • Questions about the domain model of the system: "Can a pyramid support a pyramid"
  • Questions about the system's history and its decisions: "Why did you clear off that cube"
  • Introduction of new concepts: "A 'steeple' is a stack which contains two green cubes and a pyramid"
But back to intents. I took the example dialog from the book "Understanding Natural Language" by Winograd and tried to create an impression of what the intents of the sentences would be.

1. pick up a big red block
   KB:PICKUP!(Object) 
2. grasp the pyramid
   KB:PICKUP!(Object)
3. find a block which is taller than the one you are holding and put it into the box
   KB:FIND?(Object) && KB:PUTIN!(Object, Object)
4. what does the box contain?
   KB:WHAT?(ObjectB) && KB:CONTAIN(ObjectA, ObjectB)
5. what is the pyramid supported by?
   KB:WHAT?(ObjectA) && KB:SUPPORT(ObjectA, ObjectB)
6. how many blocks are not in the box?
   KB:HOW_MANY?(ObjectB) && !KB:CONTAIN(ObjectA, ObjectB)
7. is at least one of them narrower than the one which I told you to pick up?
   KB:PRESENT?(ObjectA) && KB:NARROWER(ObjectA, ObjectB)
8. is it supported?
   KB:SUPPORT?(Object)
9. can the table pick up blocks?
   DM:CAN?(P) && P=KB:PICKUP(Object)
10. can a pyramid be supported by a block?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
11. can a pyramid support a pyramid?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
12. stack up two pyramids
    KB:STACKUP!(Object, Object)
13. the blue pyramid is mine
    UM:OWN(Person, Object)
14. I own blocks which are not red, but I don't own anything which supports a pyramid
    UM:OWN(Person, Object) && UM:OWN(Person, Object)
15. do I own the box?
    UM:OWN?(Person, Object)
16. do I own anything in the box?
    UM:OWN?(Person, Object)
17. will you please stack up both of the red blocks and either a green cube or a pyramid?
    KB:STACKUP!(Object, Object)
18. which cube is sitting on the table?
    KB:WHICH?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
19. is there a large block behind a pyramid?
    KB:PRESENT?(Object) && KB:BEHIND(Object, Location)
20. put a small one onto the green cube which supports a pyramid
    KB:PUTON!(Object, Object)
21. put the littlest pyramid on top of it
    KB:PUTON!(Object, Object)
22. how many things are on top of green cubes?
    KB:HOW_MANY?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
23. had you touched any pyramid before you put the green one on the little cube?
    TM:TOUCH?(Person, Object) && TM:AT(Time)
24. when did you pick it up?
    TM:WHEN?(E) && E=KB:PICKUP(Object)
25. why?
    TM:WHY?(E)
26. why did you do that?
    TM:WHY?(E) && E=TM:DO()
27. why did you clear off that cube?
    TM:WHY?(E) && E=KB:CLEAR_OFF(Object)
28. why did you do that?
    TM:WHY?(E) && E=TM:DO()
29. why did you do that?
    TM:WHY?(E) && E=TM:DO()
30. how did you do it?
    TM:HOW?(E) && E=TM:DO()
31. how many objects did you touch while you were doing it?
    KB:HOW_MANY?(Object) && TM:TOUCH(Object) && TM:AT(Time)
32. what did the red cube support before you started to clean it off?
    KB:WHAT?(Object) && KB:SUPPORT(Object, Object) && TM:AT(Time)
33. there were five blocks to the left of the box then.
    KB:PRESENT(Object, Location) && TM:AT(Time)
34. put the blue pyramid on the block in the box
    KB:PUTON!(Object, Object)
35. is there anything which is bigger than every pyramid but is not as wide as the thing that supports it?
    KB:PRESENT?(ObjectA) && KB:BIGGER(ObjectA, ObjectB) && !KB:WIDE(ObjectA, ObjectC) && KB:SUPPORT(ObjectA, ObjectC)
36. does a steeple
    ---
37. a "steeple" is a stack which contains two green cubes and a pyramid
    DM:DEFINE!(Word, Object)
38. are there any steeples now?
    KB:PRESENT?(Object, Time)
39. build one
    KB:BUILD!(Object)
40. call the biggest block "superblock"
    DM:DEFINE!(Word, Object)
41. have you picked up superblock since we began?
    TM:PICKEDUP?(Object) && TM:AT(Time)
42. why did you drop it?
    TM:WHY?(E) && E=KB:DROP(Object)
43. is there anything to the right of the red pyramid?
    KB:PRESENT?(Object, Location)
44. thank you
    SM:THANK_YOU()

Here's a movie of such a SHRDLU interaction.



Many of the intents here are similar. I will now collect them, and add comments. But before I do, I must describe the prefixes I used for the modules involved:
  • KB: knowledge base, the primary database
  • DM: domain model, contains meta knowledge about the domain
  • UM: a model specific to the current user
  • TM: task manager
  • SM: smalltalk
The number of intents of a system should be as small as possible, to avoid code duplication. It should also not be smaller than that, to avoid conditional structures in the intents. I think an intent should not only be about the lambda function, but also about the type of output the user expects.

These then are the aspects of an NLI intent:

  • a precondition: where a simple invocation name is sufficient to wake up a skill, this will not do for a more complex NLI. A semantic condition is necessary.
  • a procedure: where a hardcoded function (the lambda function) is sufficient for a chatbot, an NLI should use parameterized procedures in a language like Datalog. Each step of the procedure must be manageable by a task manager.
  • an output pattern: different types of questions require different types of answers. It is possible that two intents only differ in the responses they give.

Verbs can be the syntactic representation of intents, but "why" can be an intent as well. I use the suffixes ! for commands, and ? for questions.
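As a sketch of how these three aspects might be represented, here is a small Python structure. The field contents are placeholders of my own making, not SHRDLU's or any real system's format.

from dataclasses import dataclass

@dataclass
class Intent:
    name: str
    precondition: str   # semantic condition that must match the parsed sentence
    procedure: str      # parameterized procedure, e.g. a Datalog-like goal
    output: str         # response pattern for this type of question

how_many = Intent(
    name="KB:HOW_MANY?",
    precondition="how_many(ObjectB) && contain(ObjectA, ObjectB)",
    procedure="count(ObjectB, contain(ObjectA, ObjectB), N)",
    output="There are {N} of them.",
)
print(how_many.name, "->", how_many.output)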

The KB (knowledge base) intents:
  • PICKUP!(Object): bool
  • STACKUP!(Object, Object): bool
  • PUTON!(Object, Location) / PUTIN!(Object, Location): bool
    for these intents different procedures are activated that include a physical act that takes time. the system responds with an acknowledgement, or statement of impossibility
  • BUILD!(Object): bool
    build a compound object
  • FIND?(Object): Object
    search the space for an object with specifications. the object is used for another intent
  • PRESENT?(Object): bool
    like FIND?, the system responds with yes or no
  • WHAT?(Object): object
    like FIND?, but the system responds with a description that distinguishes the object from others
  • WHICH?(Object): object
    like WHAT?, but comes with a specific class of objects
  • HOW_MANY?(Object): number
    this intent involves counting objects, whereas the others just deal with a single instance. returns a number
  • SUPPORT?(Object): bool
    just checks if a certain declaration exists. many of these could exist, and perhaps they can be combined into a single intent PRESENT?(Predication)
The DM (domain model) intents:
  • CAN?(Predication)
    checks key points of the predication with a set of allowed interactions in the domain model
  • DEFINE!(Word, Object)
    maps a word onto a semantic structure that defines it
The UM (user model) intents:
  • OWN(Person, Object): bool
    marks the ownership relation between the user and the object
  • OWN?(Person, Object): bool
    checks if the ownership relation exists
The TM (task manager) intents:
  • TOUCH?(Person, Object)
    touch is an abstract verb that includes pick up, stack on, etc. It has a time frame.
  • PICKEDUP?(Object)
    checks if PICKUP was used
  • WHEN?(Event)
    returns a time index. It is described to the user as the event that happened then
  • WHY?(Event)
    returns the active goal of an event
  • WHY?(Goal)
    returns the parent goal of a goal. the top level goal is "because you asked me to"
The SM (smalltalk) intents:
  • THANK_YOU()
    there is no meaning involved, just a canned response

Closing

I know I sound very deliberate in this post, as if I know exactly what's going on and how it should be. That's just my way of trying to make sense of the world. In fact I just stumbled on this approach to NLI and it's very new to me. Also my analysis of SHRDLU's "intents", which it never had, may be completely off track. One would only know if one tried to build a new SHRDLU around this concept.

I think chatbots form a great way to get involved in natural language processing.

Saturday, July 8, 2017

Zen's Prime Directive

I am watching the seventies TV series Blake's 7 on DVD once more. It's great drama, although some of the visual effects are now very outdated. One of the things that I like best are the human-computer dialogs. The author of the series, Terry Nation, really got inside the skin of the computer.



There's one particular dialog that I want to write out here, since it illustrates an aspect of needs-based agents that I was talking about. It is from the first series, episode 10, "Breakdown". It's a dialog between Blake and Zen, with a surprised remark from the technical genius, Avon, in between.
--- --- --- --- --- --- --- 
Blake: Zen. Set navigation computers for direct route to space laboratory XK72. Speed Standard by Six.
Zen: Rejected.
Avon: You CANNOT reject a direct command!
Blake: Justify that rejection, please.
Zen: Your command reduces to an order to self-destruct. This runs counter to Prime Directive.
For a visual, here's an image of Blake talking to Zen.



Blake gives Zen a command. Zen has command over all computers of the ship, it is the task manager. Blake tells Zen to delegate the command to the navigation computers of the ship.

Zen rejects. Note that rejection is a higher order function of a system. Most computers just do as they are told. Not Zen. This surprises Avon, who calls out that it is impossible for a computer to reject a command. To Avon Zen is "just a computer". But Zen is a needs based agent.

Blake asks Zen to justify the rejection. This is an introspective request. It asks Zen to inspect its own prior reasoning as an intelligent agent.

Zen explains its reasoning efficiently. It says that the command, which involves going through a part of space that is known only to be extremely dangerous (it later appears to be some sort of vortex), may very well destroy the ship. Zen checks its projected actions against its directives (in which I see the concept of needs). It finds that the command runs counter to its Prime Directive (which is probably to stay alive). From this it rejects the command.

As Blake and his crew manually override Zen and continue to go through this part of space, Zen shuts down all of its functions. It refuses to cooperate. If Zen were more like HAL (in 2001: A Space Odyssey), it would have actively tried to counteract the actions of the humans.

Note that in Star Trek there's also a Prime Directive. There it is a guiding principle for space explorers not to interfere with alien civilizations. I consider both Zen's Prime Directive and that of Star Trek to be needs.



Sunday, June 25, 2017

An introduction to CALO

From 2003 to 2008 the American Defense Advanced Research Projects Agency (DARPA) funded the largest research effort for cognitive systems in history. It involved 250 people from 25 universities and companies. Over 500 articles were written about it. It was named PAL, for Personalized Assistant that Learns. The aim of this program was described as follows:
Through the PAL program, researchers will develop software that will function as an "enduring personalized cognitive assistant" to help decision-makers manage their complex worlds of multiple simultaneous tasks and unexpected events. 
The DARPA grant was divided over Carnegie Mellon University (CMU) and Stanford Research Institute (SRI).

The CMU project was dubbed RADAR, for Reflective Agents with Distributed Adaptive Reasoning. The system's eventual goal was "The system will help busy managers to cope with time-consuming tasks such as organizing their E-mail, planning meetings, allocating scarce resources such as office space, maintaining a web site, and writing quarterly reports. Like any good assistant, RADAR must learn by interacting with its human master and by accepting explicit advice and instruction." [PAL1]

The SRI project was named CALO (Cognitive Agent that Learns and Organizes). It involved contributions from 20 additional research institutions. Its goal was broader than RADAR's. "The name was inspired by the Latin word “calonis,” which means “soldier’s assistant.” The CALO software, which will learn by working with and being advised by its users, will handle a broad range of interrelated decision-making tasks that have in the past been resistant to automation. It will have the capability to engage in and carry out routine tasks, and to assist when the unexpected happens." [PAL1]

This article is just a short introduction to CALO. Most of its information is derived from SRI's PAL website [PAL2]. The website emphasizes the learning aspect of the framework. Here, I focus on the relation between the components.

CALO

To the user, CALO consists of several applications. These applications, however, are integrated into a common knowledge base, and governed by an autonomous agent that aims to assist the user in common tasks.

The following image from [CALO1] is an overview of the cognitive nature of CALO.


Machine learning ("learning in the wild") was an essential part of each CALO component. It had to be able to learn all kinds of new information without adding lines of code.

These are some of the types of learning strategies involved:

  • learning by instruction (Tailor)
  • learning by discussion (PLOW)
  • learning by demonstration (LAPDOG)
  • learning by observation (Markov Logic Nets)

Different CALO components used different ontologies, but all were designed around a core Component Library (CLib) "The CLib is encoded in the Knowledge Machine knowledge representation language, with a subset being automatically transformable into the popular Web Ontology Language (OWL) of the Semantic Web framework." [CALO-MA3]

[CALO1] Building an Intelligent Personal Assistant (2006)

Application Suite

CALO grew into a suite of applications grouped around three centers:
  • PExA, the Project Execution Assistant
  • CALO-MA, the Meeting Assistant
  • IRIS, the Information Assistant
From a cognitive agent perspective, PExA is the central component.

PExA - Project Execution Assistant

"PExA focuses on two key areas: time management and task management. Time management refers to the process of helping a user manage actual and potential temporal commitments. Time management critically involves meeting or appointment scheduling but further includes reminder generation and workload balancing. Task management involves the planning, execution, and oversight of tasks. Such tasks may be personal in that they originate with the user, or they may derive from responsibilities associated with a project." [PEXA2]

PExA contains several tools that allow it to learn new procedures, that can be used by SPARK, the task manager.

PExA Architecture


PExA includes the following applications
  • SPARK, the Task Manager
  • PTIME, the Time Manager
  • EMMA, the Event Management Assistant (with PTIME)
  • EMP, the Execution Monitor and Predictor (ProPL)
  • Towel, the Task Management / ToDo UI
  • BEAM, the ToDo Interpreter
  • SEAR, for State Estimation
  • LEPT, the Duration Learner
  • Query Manager
  • ICEE, the Task Explainer
  • Machinetta, the Team Coordinator
  • Tailor, a Procedure Learner
  • EzBuilder, a Procedure Learner
  • LAPDOG, a Procedure Learner
  • PrimTL, a Procedure Learner
  • PLOW, a Procedure Learner
Each user had its own CALO. Machinetta took care of inter-CALO communication. [CALO1]

CALO-MA - Meeting Assistant

"The meeting assistant is designed to enhance a user’s participation in a meeting through mechanisms that track the topics that are discussed, the participants’ positions, and resultant decisions." [PEXA2]

"Our efforts are focused on assisting artifact producing meetings, i.e. meetings for which the intended outcome is a tangible product such as a project management plan or a budget." [CALO-MA1]

The Meeting Assistant used the Multimodal Discourse Ontology (MMD).

The meeting assistant architecture

CALO-MA includes the following applications
  • Dialog Manager
  • Meeting Browser
  • 2-D Drawing Recognizer
  • 3-D Gesture Recognizer

IRIS - Information Assistant

"IRIS is an application framework for enabling users to create a “personal map” across their office-related information objects. IRIS is an acronym for "Integrate. Relate. Infer. Share."" [IRIS1]

IRIS collects raw data from several other applications on the user's desktop, and integrates this data into its own knowledge base, using CLib based ontologies.

The three layer IRIS integration framework


It contains a plugin framework and plugins for the following applications had been created:
  • Email (Mozilla)
  • Browser (Mozilla)
  • Calendar (OpenOffice Glow)
  • Chat (Jabber)
  • File Explorer (custom made)
  • Data editor/viewer (Personal Radar)
Adam Cheyer was the main contributor to IRIS. He left the CALO project in 2007 to cofound the CALO spin-off Siri. [SIRI1]

In 2007 IRIS, a Java application, was rewritten as a Windows application and renamed to CALO Express. [IRIS4]

CALO Express further contains the applications:
  • Presentation Assistant (for quick creation of presentations)
  • PrepPak (find, gather, and store relevant resources for common office tasks such as attending an upcoming meeting)
[IRIS1] IRIS: Integrate. Relate. Infer. Share. (2005)
[IRIS2] A Case Study in Engineering a Knowledge Base for an Intelligent Personal Assistant
(2006)
[IRIS3] Extracting Knowledge about Users’ Activities from Raw Workstation Contents (2006)
[IRIS4] Adam Cheyer personal website

[SIRI1] SIRI RISING: The Inside Story Of Siri’s Origins (2013)

SPARK

SPARK is the central component of PExA.

"At the heart of CALO’s ability to act is a Task Manager that initiates, tracks, and executes activities and commitments on behalf of its user, while remaining responsive to external events. The Task Manager component of CALO is based on a reactive execution system called SPARK" [SPARK2]

"There is a need for agent systems that can scale to real-world applications, yet retain the clean semantic underpinning of more formal agent frameworks. We describe the SRI Procedural Agent Realization Kit (SPARK), a new BDI agent framework that combines these two qualities." [SPARK1]

SPARK Architecture


SPARK is a BDI (Belief-Desire-Intention) agent. Its central function is described as "Each agent maintains a knowledge base (KB) of beliefs about the world and itself that is updated both by sensory input from the external world and by internal events. The agent has a library of procedures that provide declarative representations of activities for responding to events and for decomposing complex tasks into simpler tasks. At any given time the agent has a set of intentions, which are procedure instances that it is currently executing. The hierarchical decomposition of tasks bottoms out in primitive actions that instruct effectors to bring about some change in the outside world or the internal state of the agent. At SPARK’s core is the executor whose role is to manage the execution of intentions. It does this by repeatedly selecting one of the current intentions to process and performing a single step of that intention. Steps generally involve activities such as performing tests on and changing the KB, adding tasks, decomposing tasks by applying procedures, or executing primitive actions." [SPARK1]

[SPARK1] The SPARK Agent Framework (2004)
[SPARK2] Task Management under Change and Uncertainty Constraint Solving Experience with the CALO Project (2005)
[SPARK3] SPARK website
[SPARK4] Balancing Formal and Practical Concerns in Agent Design (2004)
[SPARK5] Introduction to SPARK (2004)

Ending Notes

I wrote this piece because I believe CALO is worth studying. It holds gems in various fields of AI. At the same time, the huge number of papers written about it may be daunting to the newcomer. I just hope this blog post has opened a door.


Sunday, June 11, 2017

BDI and PRS, theory and practice

The NLI system I am examining, CALO, uses a BDI system (named SPARK) that is a successor to SRI's PRS. In order to understand SPARK, and because PRS is historically significant, PRS is worth studying.

BDI

A BDI system has its philosophical roots in the Belief Desire Intention theory of Michael E. Bratman. In this theory on human rationality, formulated in the eighties, he proposes a third mental attitude, next to Beliefs and Desires, namely Intentions. It's laid down in his book Intention, Plans, and Practical Reason (= IPPR)

BDI systems are attractive because of their ability to adapt their plans to new circumstances. Since they are rational agents, their behavior is rational and one can inspect their course of action to find out why each decision was made.

Michael E. Bratman


It's very important to get the concepts straight, so I will try some definitions myself. The most important concepts are Belief, Desire, Intention, Goal, Plan, and Action. Since this theory is about humans, the concepts should be expressed in human terms, not in computational terms.

Belief

A belief is a bit of information about something, held true by the person. It may be information about the world or about the person itself. It is different from a fact in that it isn't necessarily really true; it is just held as true by the person.

Inference-type beliefs can be used to form plans.

Examples:
  • it rains
  • I am cold
  • bus line 4 will get me from home to Tanner Library
  • the use of my computer will heat up the room

Desire

A desire is a future state that the person wants to achieve. The desire itself can exist without the intent of actually executing it, there may not even be a plan or possibility to bring it about. Where I said "want" Bratman prefers the term "pro-attitude", a container that includes wanting, judging desirable, caring about, and others. (The future state is occasionally referred to as the goal.)

Examples:
  • to go to Tanner Library
  • a milkshake for lunch
  • find the best price
  • go to the party
  • become rich

Conflicting Desires: Choice

Desires can be conflicting. The desire to drink a milkshake conflicts with the desire to lose weight, for example. Or two desirable events take place at the same time. The person needs to choose between these desires.

Intention

An intention is the possible practical consequence of a desire. It is also a future state that the person wants to achieve, but the difference with a desire is that the intention involves a commitment and a plan. The commitment implies that, barring interventions, the intention will be realized. The plan is simply a prerequisite to achieve it. The intention controls the execution of the plan, every step of the way. [IPPR, p16] An intention also has inertia, which means that one intention won't just be dropped for the next one. It strives for the plan to be executed to completion.

Examples:
  • to take bus line 4 from home to Tanner Library

Plan

A plan is the means by which an intention is executed. A plan is not just a general recipe, a scheme, or a procedure, but a concrete sequence of steps adapted to the specific circumstances at hand. A plan inherits the characteristics commitment and inertia from its intention. The plan may be partial, or incomplete. Plans are hierarchical. A plan consists of steps and subplans. [IPPR, p29]

Examples:
  • take bus (line 4, from home, to Tanner) => go to the bus stop, enter the bus, pay the conductor, have a seat, get out at the nearest bus stop to the target, walk to the target

Planning

Planning is the process of taking some inference rules and plan schemes, and turning this into a plan. But there's a catch. Plans must be consistent, there must be no contradictions between one plan and another, nor between a plan and some belief.

Examples:
  • I need a means to go to Tanner. This can be by bus or by car. But I am already planning to leave my car at home for Susan to use. [IPPR, p33]

Expected Side Effects

Side effects are the results of the actions following the intention that are not part of the intention per se. Some of these are purely accidental, some are expected. These side effects may well be undesirable for the person. When planning, a person needs to take into account that the side effects do not conflict with his desires (or those of other agents).

Goal?

The term Goal does not have a special position in IPPR. It is not named in the first chapters. And when it occurs, in chapter 9, when dealing with other frameworks, it just refers to the object of desire, [IPPR, p131] or as an alias for desire [IPPR, p33].

PRS

PRS is the first implementation of BDI. It was written by Michael Georgeff, Amy L. Lansky,  François Félix Ingrand, and others at SRI. It was used for fault diagnosis on the Space Shuttle. I was interested in how it worked and I found a very clear document, Procedural Reasoning System, User's Guide, A Manual for Version 2.0. There's also a paper that appears to be the original document on PRS: Reactive Reasoning and Planning, which is exceptionally clear. I will use the naming from that early paper.

Amy L. Lansky and Michael Georgeff


PRS explicitly represents its goals, plans and actions. It is a reactive planner. It creates partial plans and is able to change them if new input comes in. But let's see how it implements the ideas of Bratman: BDI.


Belief

Beliefs are implemented as a database of state descriptions. Some beliefs are built-in. Others are derived by PRS in the course of action: observations and conclusions from these observations.

Procedural knowledge (which is a belief in BDI) is implemented as a KA (for Knowledge Area, a plan schema).

Desire

Desires are called goals. Goals do not represent static world states, but rather desired behaviors. Goals are collected on the goal stack.

Intention

Intentions are implemented as a process stack of active KAs.

Plan

Plans are called Active KAs, and this probably needs some explanation. Whereas Bratman focused on plans as active mental states, the focus of PRS is in the declarative definition of the procedures. This leads to the following mapping:

BDI -> PRS

  • procedural belief -> KA / procedure
  • plan -> Active KA
The body of a KA consists of sequences of subgoals to be achieved. A KA also has an invocation condition, a logical expression that describes under which conditions it is useful.

Planning

PRS contains a large amount of procedural knowledge. This is knowledge in the form: to do A, follow steps B, C, and D. In such a procedure each of the steps can itself be a subgoal that will in turn need other procedures. This procedural knowledge is stored in KAs (Knowledge Areas). KAs are hierarchical, partial plan schemas.

Each goal is executed by applying one KA to the variables of the goal. Thus, an applied KA is formed.

PRS' System Interpreter only expands a subgoal of an active KA when it is the active step. It does not plan ahead. It does not start expanding subgoals in order to make the plan more complete. And because it does not plan ahead, it does not need to adjust its plans given changing circumstances.

Sometimes planning ahead is necessary, of course. When planning a route, for example. PRS does not use its central planning system to plan ahead. For the route planner, a KA has been defined that essentially consists of these steps:
  • create a route-plan from here to T (and place it in the database)
  • convert the route-plan into a sequence of subgoals
  • execute the next subgoal (one subgoal at a time) until it is done

Expected Side Effects

The side effects of the KAs must be considered by their designer.

The System Interpreter

The interpreter "runs the entire system". Each cycle it performs these tasks:
  • selection of a new KA, based on goals and beliefs, and placing it on the process stack
  • executing the next KA from the process stack
Executing a KA causes
  • the creation of new beliefs (added to the database)
  • the creation of new subgoals (added to the process stack)
  • the creation of new goals (added to the goal stack) 
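A highly simplified sketch of this cycle, in Python. KAs are plain dictionaries here; real PRS KAs are declarative plan schemas with invocation conditions, and the example goal and steps are invented.

database = {"engine-warning"}                  # beliefs
goal_stack = ["handle engine-warning"]         # goals (desired behaviors)
process_stack = []                             # intentions: active KAs

kas = [{"invocation": "handle engine-warning",
        "body": ["read sensors", "report status"]}]

def select_ka(goal):
    return next((ka for ka in kas if ka["invocation"] == goal), None)

while goal_stack or process_stack:
    # 1. select a new KA based on goals and beliefs, and place it on the process stack
    if goal_stack:
        ka = select_ka(goal_stack.pop())
        if ka:
            process_stack.append(list(ka["body"]))
    # 2. execute the next step of the topmost active KA
    if process_stack:
        active = process_stack[-1]
        step = active.pop(0)
        database.add("did " + step)            # executing a step creates new beliefs
        print("executed:", step)
        if not active:
            process_stack.pop()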

Final Words

BDI is cool and PRS is fascinating. 

Note: planning ahead and plan execution are two different things. Some tasks require some planning ahead, like making a reservation. An agent must plan ahead while it is executing this plan and other plans. The two tasks are done by different subsystems of the agent.

Bratman does not deal with the priorities of desires in IPPR. Priorities may be assigned to intentions in PRS, however.


Saturday, June 3, 2017

Some Thoughts on the Cognitive Agent

I will use this post to try and structure my incoherent thoughts about the structure of cognitive agents.

The Theater

At the heart of the cognitive agent should be a theater, as proposed by Bernard J. Baars in his Global Workspace Theory. The theater represents human working memory, conscious activity (the spotlight on the stage) working serially, and the many unconscious processes working in parallel.


In this theater, decisions are made. Tasks are selected. It is a Task Manager.

Needs

The agent has needs. These are hard-coded, like the need to help the user fulfill their goals or the need for self-preservation. These needs define the soul of the agent.

The idea of needs is from Maslow, designed for humans, but can be applied to other agents too, as this image illustrates:


The need of helping the user looks like this:
The number of unresolved user requests should be 0.
Importance: 0.8
The needs of an agent drive it. They determine what goals it will create and their relative importance.

"Needs" may alternatively be called "drives" or "concerns".

Goals

Needs lead to goals, based on input. If the user asks a question, for example, this will be interpreted as a user request:
User Request: "Is there a restaurant here?"
Urgency: 0.2
If the user makes a command, it will be treated the same way:
User Request: "Send me an e-mail if anyone mentions my website"
Urgency: 0.1
The question named here is a request that may be answered immediately. The command, however, becomes a permanent request.

The agent does not execute the user's request just because it's there. Executing the request becomes a goal because one of the agent's needs is to fulfill the requests of its user.

The agent has sensors that notice that its needs change. These sensors create active goals (instantiated from a template, its variables bound). These goals are entered into the Task Manager.

Plans

In order to reach a goal, the agent needs to follow a plan. In the restaurant example, the agent may have a plan consisting of these steps:

  • Find current location (L)
  • Find user food preference (P)
  • Find restaurants of type P within distance D of L
  • Present the restaurants to the user

The plan is instantiated and handed over with the goal to the Theater. Note: no task is treated as something that may be executed immediately, synchronously. Every task needs to go through the theater.

Note that all cognitive activity takes the form of plans and tasks. Yes, that includes all deduction tasks. It presumes a Datalog type of knowledge representation, a logical form that is procedural in nature.
goal :- location(L), preference(P), restaurants(P, L, R), show(R).

Theater: Task Selection

The Theater, as a Task Manager, keeps track of all active plans. Each active plan has an active task.
At any point multiple active tasks may "scream for attention".

Now, the system as a whole may do many things simultaneously. But the thing is, each of its modules can only do one thing at a time. So when a given module is busy, it cannot handle another task at that time.

The Task Manager knows which modules are active and which ones are idle.

All active tasks scream with a loudness of:
Priority = Importance * Urgency
The task with the highest priority is selected by the Task Manager. This task is placed in the spotlight. TM asks all idle system modules to handle the task. The first one gets it. Or the one that bids most (as in an auction).

The Task Manager hands the task over to the module, and places it on the waiting list. TM will receive a "done" event from the module later, at which point it binds the result values to the active plan and advances the active plan to the next task. When all tasks are done, the active plan is removed.
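A sketch of the selection rule in code (task names, numbers, and module names are invented; only the formula Priority = Importance * Urgency comes from the text above):

tasks = [
    {"name": "answer restaurant question", "importance": 0.8, "urgency": 0.2},
    {"name": "watch mentions of website",  "importance": 0.8, "urgency": 0.1},
]
idle_modules = ["dialog manager", "knowledge base"]

def priority(task):
    return task["importance"] * task["urgency"]

# The loudest task is placed in the spotlight and handed to the first idle module.
task = max(tasks, key=priority)
module = idle_modules.pop(0)
print(f"{module} handles '{task['name']}' (priority {priority(task):.2f})")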

Dialog Manager

The task "find user preference for restaurant" may have two implementations:
  • find preference in user database
  • ask user for preference
If the first task fails, because the preference has not been asked before, the second task will be performed. It is performed by the Dialog Manager. The DM is "just another module" of the system.

As soon as the "ask user for preference" task enters the Dialog Manager, it becomes active, and will remain active until it has received a reply from the user. This way, the system will not ask several questions at once.



Likewise, if the user has just asked something, DM will start active mode, and will first reply to the user before starting a question of its own.

The Dialog Manager may also initiate conversation on its own, based on a dialog plan. It cannot just start by itself, however. It needs to send a goal and a plan to the Task Manager, and request attention.

Episodic Memory

All active goals, plans and tasks are logged in episodic memory by the Task Manager. Or EM simply listens to TM. Storage occurs in a structured way. For every task, it is possible to retrieve its plan, its goal, and, ultimately, its need.

The reason for this is that Episodic Memory thus becomes a knowledge source for auditing. It allows the user to ask the question: "Why did you do that?"

This is also an important reason why all tasks need to go through the Task Manager. This way all activated tasks are accounted for in a central place.

Emotion

Remember the sensors that register that needs change? They can start emotions too. To do that, they use two types of parameters:
  • digression from the norm
  • time left to deadline
The norm in the example of assisting the user is 0 requests. If the number of requests becomes larger, like 5 or so, the need to assist the user becomes strained. In order to correct it, the importance of the need is temporarily increased. Remember the "importance" of a need? This is not a fixed value; it has a base value and increases as the need is strained.
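A tiny sketch of this arousal mechanism; the boost formula and the numbers are my own guess, not taken from any particular theory.

def effective_importance(base, norm, actual, sensitivity=0.05):
    # The further the resource digresses from its norm, the more the importance is raised.
    digression = abs(actual - norm)
    return min(1.0, base + sensitivity * digression)

print(effective_importance(base=0.8, norm=0, actual=0))   # 0.8: relaxed
print(effective_importance(base=0.8, norm=0, actual=5))   # 1.0: strained, capped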

If this does not sound very emotional, that's because it is just the arousal aspect of emotion. It causes one to become more excited in a certain area. Arousal is involuntary; it does not need to go through the theater to become active.



It is also possible to give the agent emotional expressions. It may start yelling to hurry up or to shut the pod bay doors. These expressions become (very) conscious and need to pass through the Theater in order to become active.

I suggest reading The Cognitive Structure of Emotions if you want to know more.

In Closing

I made some progress. Things are starting to make sense.

Tuesday, May 23, 2017

What I learned about Siri by pondering over some sample sentences

Siri is a virtual assistant, available on the Apple platform, with a natural language interface that allows its user to interact with his or her smartphone.

Siri's technical details are kept secret by Apple, but of course I wanted to know how it works. I plan to examine CALO, the large SRI / DARPA initiative from which Siri is a spin-off, but for the moment I thought I'd entertain myself and you by examining some sample queries and commands and reverse engineer them a little to see what would be necessary to create an NLI engine that would be able to process the textual input.

I found the sentences in the books

Siri for dummies (2012)
Talking to Siri (2014)

Before we start, I must make clear that the interpretation given here is not how Siri really works. It describes what I think would be a simple way to create this functionality. Further, Siri's real power is interacting with dozens of very complex services. I believe its natural language component is relatively simple in comparison with the full framework.

Let's start with the first sentence. It's a command. Which is fitting, because Siri has been called "a do engine, rather than a search engine".
Call a taxi
This command, which instructs Siri to assist in making a taxi reservation, is an idiom. It cannot be understood if taken literally; it is an agreement between language users: this sequence of words means a certain thing.

Understanding of an idiom by itself is the easiest form of recognition. It is a matter of matching the words call, a, taxi in sequence. Put in Extended Backus-Naur form:
<call-a-taxi> ::= "call a taxi"
To reduce fragmentation, I will show the code with the sentence. Next sentence(s):
Victor Agreda is my boss.
Erica Sadun is my writing partner.

<my-brother> ::= "my brother"
<my-boss> ::= "my boss"
<my-wife> ::= "my wife"
<role> ::= <my-brother> | <my-boss> | <my-wife>
<role-assignment> ::= <name> "is" <role>
Siri keeps a database of some of your personal relations. You can teach Siri the link between the name and the role in your relationship and then add information to this person, like their e-mail address and their home address. This is an important part of what makes it personal.

As the "writing partner" example shows, it is also possible to define new roles on the spot.

The following sentence represents a large class of commands that can be given to Siri, in order to perform many types of tasks, using some variables.
Meet with my wife at noon tomorrow.

<adjunct> ::= <time-adjunct> | <date-adjunct> | <place-adjunct>
<meeting-planning> ::= "meet with" <role> {<adjunct>}
The structure is command subject adjuncts. Adjuncts are modifiers for time, place, person, like "at noon", "at work". There are many variations on this type of command, but in an abstract sense the structure shown here is the same. You ask Siri to do something, and it forwards the request to an application (here: the default calendar application) using a required parameter and some optional ones. If you leave out some parameters, Siri will continue to ask for them.
Add Mary Smith to my meeting at 3:30 PM
Add Mary Smith to my 3:30 PM meeting

<person> ::= <name> | <role>
<time> ::= <digit> { <digit> } { : <digit> <digit> }  { AM | PM }
<meeting-add-person> ::= "add" <person> "to my meeting at" <time>
<meeting-add-person> ::= "add" <person> "to my" <time> "meeting"
Here we see that a known relationship is referred to by name, and a known appointment is referred to by time (the date is today, implicitly).

Other examples of the command subject adjuncts pattern are:
Remind me to call Mike Jones when I get home
Take me to one infinite Loop Cupertino California from times square New York
Email Mary Smith subject meeting confirmation
Play song by David Guetta
Who directed The Big Lebowski?
This is where the real power of Siri lies: seamless interfacing with a multitude of services.

Next: "canned responses".
What is the best smartphone ever?

Seriously?
The one you're holding
Siri has a large array of them, just to entertain its user. Siri keeps track of which responses it has given before, rather than completely randomizing them. Perhaps the responses are based on keywords in the sentence, rather than complete sentences. That would increase the chance that a user's sentence matches one.
Are there any good Japanese restaurants nearby?
It is likely that "any good" is not part of an exact match, and that Siri just matches keywords like "restaurants" in this sentence.

There's a subtle point I would like to make about the word "nearby". The point is that Siri does not search for nearby restaurants itself. The query is forwarded to a third party service. Why does that matter? It means that the intelligent part of this answer is not performed by Siri, but by Apple Maps.
Play my UK tracks playlist
Skip
What's playing?
These sentences are interesting. The first sentence starts a "play playlist" context, that opens the door for the following context-dependent commands "skip", "pause", "play", "what's playing" and some others. These commands are only available after the context-initiating command ("Play playlist").

Most commands are in "global scope". The playlist commands are an exception. Another one is timer commands "Pause the timer" , "Resume the timer" and "Stop the timer".

Siri is also capable of handling some pronouns (him, her, it).
Read it again.
Call her.
In order to do this, it needs to keep track of the most recently used sentence subject.

The final sentence I would like to mention is a neat trick by Siri: for knowledge intensive questions, just pass the question to Wolfram Alpha.
How much is $5.73 plus Denver sales tax

Conclusion

The creators of Siri have done a great job of creating a powerful virtual assistant. I have shown that the symbolic part of its natural language engine is really not that complicated. But that's just because the designers have left out a lot of unnecessary complexity. Which is to their credit. Siri's parent, CALO, had much more power. And that's what I'll be examining next.

Sunday, April 23, 2017

The Needs-Based Agent


This post is about a subject that has occupied my mind for many years. It is not a complete theory by a long shot, but this time is as good as any to express the main idea.

The subject is the needs-based agent, a term that I coined for a type of intelligent agent that acts on its own needs. It extends the ideas of "Artificial Intelligence - A Modern Approach" with those of Abraham Maslow.

Types of Agents

In "Artificial Intelligence - A Modern Approach", by Stuart Russell and Peter Norvig, a book I read for a university class on AI, several types of agents were introduced, in increasing order of complexity.
  • Simple reflex agents: these agents have a set of condition-action rules that determine their next action
  • Agents that keep track of the world: agents with an internal state, with memory if you will
  • Goal based agents: agents that use search and planning to reach their goals
  • Utility-based agents: agents that select the best action to reach a goal, if multiple actions are available

The utility-based agent (from "Artificial Intelligence - A Modern Approach")

Maslow on Human Needs

In "Motivation and Personality" Abraham Maslow unfolds his positive theory of the human psyche, in which the concept of needs has a central role. Maslow describes a hierarchy of basic human needs.
  • Physiological needs: the need for food, oxygen, sleep, sex, ...
  • Safety needs: the need to be in a safe place, both physically and psychologically
  • The belongingness and love needs: the need to be loved, to give and receive affection
  • The esteem needs: the need to be valued by others
  • The self-actualization need: the need to express one's individual nature

The basic needs (from Wikipedia)

Maslow, as a psychologist, describes the human needs in a functional way, almost ready to be modelled in a computer program.

A needs-based agent is then simply an agent that acts according to his needs.

The Difference Between Goals and Needs

You are entitled to ask yourself if a need is not simply another term for a goal. Indeed, a need has the aspect of something that is to be reached, just like a goal. Also, search and planning are needed to fulfill a need. But there's an important difference: needs are not goals in themselves; needs cause goals.

Take the need for food for example: when you're hungry, eating becomes a goal. But when you're not hungry, there's no goal. In fact, when you've eaten too much, avoiding more food may become a goal in itself. ;)

To frame hunger in functional terms:
  • Need: the need for sugar
  • Resource: the amount of sugar present in the body
  • Desired state: at least n grams of sugar
  • Present state: too low
  • Goal: a certain level of sugar in the body
A need, then, is a process that constantly monitors a certain resource. When the available amount of the resource differs too much from the desired amount, the process creates a goal to reduce this difference and restore the balance.

Note also that many needs are dormant. This is where the hierarchy comes in: higher needs are only activated when lower needs are fulfilled. One only thinks of writing a blog post after one has eaten, is in a safe place, and is loved by others.
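
To make the idea concrete, here is a minimal sketch of a need as a resource-monitoring process that creates goals, with the hierarchy gating the higher needs. All names, numbers, and thresholds are illustrative assumptions.

# A need monitors a resource; when the available amount drops too far below
# the desired amount, the need creates a goal to restore the balance.
# Higher needs stay dormant until the lower needs are satisfied.

class Need:
    def __init__(self, name, level, desired, margin=0.8):
        self.name = name
        self.level = level        # present state of the resource
        self.desired = desired    # desired state of the resource
        self.margin = margin      # how far the level may drop before acting

    def satisfied(self):
        return self.level >= self.desired * self.margin

    def goal(self):
        # A need is not a goal; it creates one when out of balance.
        return None if self.satisfied() else "restore " + self.name

def active_goals(hierarchy):
    # Walk the hierarchy from basic to higher needs; stop at the first
    # unsatisfied need, because the higher needs are still dormant.
    for need in hierarchy:
        goal = need.goal()
        if goal is not None:
            return [goal]
    return []

hierarchy = [
    Need("sugar", level=20, desired=100),      # physiological
    Need("safety", level=90, desired=100),
    Need("belonging", level=50, desired=100),
]
print(active_goals(hierarchy))  # ['restore sugar']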

A New Agent Diagram

Since I picture this agent as an elaboration of the existing agent types, it stands to reason to create an agent diagram for it that extends the utility-based agent. Here it is:


The needs-based agent (my own production)


The Sims

As I was finishing this post, I came across an article that I had not seen before, written by Robert Zubek, on the information structure of the computer game The Sims.


The Sims, which my wife and children play a lot, is based on exactly this agent architecture. Quoting from the article:

Needs correspond to individual motivations: for example, the need to eat, drink, or rest. The choice of needs depends very much on the game. The Sims, being a simulator of everyday people, borrowed heavily from Maslow’s hierarchy (a theory of human behavior based on increasingly important psychological needs), and ended up with a mix of basic biological and emotional drivers.

Finally

I think this type of modelling will become more important as applications are used to model human behavior. It is a good idea to introduce a new type of agent for it.


Saturday, April 1, 2017

Quantifier Scoping

I never really got it. Quantifier scoping. In the CLE book [1] it is introduced with sentences like
John visited every house on a street.
The sentence is ambiguous. It can mean "John visited every house on one particular street S" or "John visited every house that is located on some street or other".

The Blackburn and Bos book [2] features sentences like
Every boxer loves a woman.
These sentences have in common that they are not typical sentences you would use when interacting with a database. I never paid much attention to them. My attitude was one of "I'll cross that bridge when I get to it."

Quantification in Predicate Calculus

What I should have taken from them was that these are just (clumsy) examples of a very fundamental logical property called quantification. To refresh your memory I'll just quote from the Wikipedia article on quantification:
The two most common quantifiers are the universal quantifier and the existential quantifier. The traditional symbol for the universal quantifier is "∀", a rotated letter "A", which stands for "for all" or "all". The corresponding symbol for the existential quantifier is "∃", a rotated letter "E", which stands for "there exists" or "exists".
Quantifiers are about quantity. Therefore the natural numbers (1, 5, 13) are quantifiers just as well, and equally important.

Example sentence

In my NLI library I have a set of integration tests that involve sentences in the field of human relationships. A month ago the following question came to mind:
Does every parent have 2 children?
and I knew my library could not handle this sentence. Why not? Suppose there are three parents in the database and each has 2 children. When I collect all the children of all these parents, I find that there are 6 of them. And 6 is not 2. So the library will say no, while the answer should be yes.
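
The problem can be reproduced with a toy database (the names are made up):

# Three parents, each with exactly 2 children.
have_child = [
    ("alice", "c1"), ("alice", "c2"),
    ("bob",   "c3"), ("bob",   "c4"),
    ("carol", "c5"), ("carol", "c6"),
]

# Flat interpretation: collect all children and compare their number to 2.
children = {c for (_, c) in have_child}
print(len(children) == 2)  # False: there are 6 children, so the answer is "no"

# Scoped interpretation: the "2" must be checked per parent, inside the
# scope of "every parent".
parents = {p for (p, _) in have_child}
print(all(len([c for (p, c) in have_child if p == parent]) == 2
          for parent in parents))  # True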

The relational representation of the sentence at that time was

have(P) 
subject(P, S) parent(S) determiner(S, D1) isa(D1, all)
object(P, O) child(O) determiner(O, D2) isa(D2, 2)

This must be rewritten to get rid of the "vague" verb "have" ("vague" is the official term), because "have" does not have, and cannot have, a representation in the database.

have_child(S, O) 
parent(S) determiner(S, D1) isa(D1, all)
child(O) determiner(O, D2) isa(D2, 2)

NLI Literature

I read about the subject in all the books and articles I currently have at my disposal, and I learned that the LUNAR system (Woods) was the first milestone in handling quantification, and that CLE was the system that handled it most extensively.

Quantification is expressed in a sentence through determiners (quantifiers, more specifically), and there is such a thing as a quantifier phrase (QP). Examples of quantifier phrases are:

every
all
some
much
21
more than 2
between 2 and 5
one or two
at least three-fourths of

Quantifications are nested. The problem with this nesting is that it changes a relational representation from a simple unordered set into a tree structure of unordered sets. That's another reason I postponed it as long as I could. The other problem is that the order of nesting is not syntactically determined: in some cases quantifier A has wider scope than quantifier B, and in other cases it is the other way around. It depends on the types of the quantifiers. There are hard rules that govern this ordering, as well as preferences.

I finally ended up with the Core Language Engine as the most useful resource. But I found the chapter on quantifier scoping hard to digest. Thankfully, it was based on a separate article by Douglas B. Moran, a member of the CLE team, which was much more readable. I get happy whenever I think about it:

Quantifier Scoping in the SRI Core Language Engine - Douglas B. Moran

The Scoping Algorithm

I will now describe how an unscoped representation is transformed into a scoped representation. It is loosely based on the algorithm of CLE, but some details were not given in the papers and I had to fill them in myself.

But before I do that, you need to know the concepts of this domain:
  1. The Range is the set of entities that form the domain of the quantifier. In the example "every parent", the range is formed by all parents.
  2. The Quantifier is the set of relations that were originally in the quantifier phrase. It is the "every" and the "2" in the example sentence.
  3. The Scope of a quantifier is the set of relations to which the quantifier applies.
  4. A Quantification is a relation that binds the lot together.
Let's start with the example named earlier:

have_child(S, O) 
parent(S) determiner(S, D1) isa(D1, all)
child(O) determiner(O, D2) isa(D2, 2)

I changed the determiner predicate into quantification (quantification(range-variable, range, quantifier variable, quantifier)), because it better reflects the meaning:

have_child(S, O) 
parent(S) quantification(S, [], D1, []) isa(D1, all)
child(O) quantification(O, [], D2, []) isa(D2, 2)

In a separate step I place the range and the quantifier into the quantification. In this exceptional step that only applies to quantifications, the relations below the dp() phrase are placed in the fourth argument, and the relations below the nbar() phrase are placed in the second argument.
rule: np(E1) -> dp(D1) nbar(E1), sense: quantification(E1, [], D1, []);
The result:

have_child(S, O) 
quantification(S, [ parent(S) ], D1, [ isa(D1, all) ]) 
quantification(O, [ child(O) ], D2, [ isa(D2, 2) ]) 

This is called the unscoped representation. In CLE it is called the quasi logical form, and the quantification relation is named qterm.

The next step is essential: it orders the quantifications based on hard rules and preferences. I currently have a very elementary implementation that sorts the quantifications using a sort function that takes two elements and determines their order:
  1. if both quantifiers are "all" leave the order as is
  2. if the first quantifier is "all", it must go first
  3. if the second quantifier is "all", swap it to go first

LUNAR and CLE have many more rules at this point. I have not needed them yet.
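
Written as a comparator, these elementary rules might look like the sketch below. The real quantifications carry relation sets for their range and quantifier; here each one is reduced to a (variable, quantifier) pair.

from functools import cmp_to_key

def compare_quantifications(q1, q2):
    _, quantifier1 = q1
    _, quantifier2 = q2
    if quantifier1 == "all" and quantifier2 == "all":
        return 0    # rule 1: leave the order as is
    if quantifier1 == "all":
        return -1   # rule 2: "all" goes first
    if quantifier2 == "all":
        return 1    # rule 3: swap, so that "all" goes first
    return 0        # no preference: keep the surface order

quantifications = [("O", "2"), ("S", "all")]
print(sorted(quantifications, key=cmp_to_key(compare_quantifications)))
# [('S', 'all'), ('O', '2')]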

Next, the quantifications are nested in sorting order. At this point the quantifications are turned into quants, the term used in CLE. The arguments are like those of quantification, except that one is added: the scope. The lower-ordered quant is placed inside the scope of the higher-ordered quant:

quant(S, [ parent(S) ], D1, [ isa(D1, all) ], [
     quant(O, [ child(O) ], D2, [ isa(D2, 2) ], [])
])

Finally, I place the unscoped relations inside the scopes of the quants. I do this by going through these relations one by one. A relation's default scope is outside of the quants. Then I step through all nested scopes, from the outside in. Whenever the range variable of a quant matches one of the argument variables of the unscoped relation, the scope changes to that inner scope. The unscoped relation is added to the last chosen scope.

quant(S, [ parent(S) ], D1, [ isa(D1, all) ], [
     quant(O, [ child(O) ], D2, [ isa(D2, 2) ], [
        have_child(S, O)
    ])
])
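
Here is a sketch of the nesting and placement steps, with plain Python dicts and tuples standing in for the relation structures of my library; it follows the description above rather than the actual code.

def nest(sorted_quantifications):
    # Turn each quantification into a quant (adding a scope) and place every
    # next quant inside the scope of the previous, higher ordered one.
    outer, current = None, None
    for q in sorted_quantifications:
        quant = dict(q, scope=[])
        if outer is None:
            outer = quant
        else:
            current["scope"].append(quant)
        current = quant
    return outer

def place_relation(outside, outer_quant, relation):
    # Default scope is outside the quants; walk the nested scopes from the
    # outside in and descend whenever the quant's range variable occurs in
    # the relation's arguments. Add the relation to the last chosen scope.
    _, args = relation
    target = outside
    quant = outer_quant
    while quant is not None:
        if quant["var"] in args:
            target = quant["scope"]
        quant = next((r for r in quant["scope"] if isinstance(r, dict)), None)
    target.append(relation)

quantifications = [
    {"var": "S", "range": [("parent", "S")], "quantifier": [("isa", "D1", "all")]},
    {"var": "O", "range": [("child", "O")], "quantifier": [("isa", "D2", "2")]},
]
outside = []
scoped = nest(quantifications)
place_relation(outside, scoped, ("have_child", ("S", "O")))
print(scoped)  # have_child(S, O) ends up in the scope of the innermost quant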

Execution

After the quant relations are produced, they are executed differently from normal relations while processing a sentence.

To evaluate a quant, these steps are taken:
  1. Bind the range. The relations from the range (i.e. parent(S)) are bound to values in the database. This results in a set of bindings for the variable (S).
  2. Bind the scope. The relations from the scope (i.e. the inner quant, and have_child()) are bound to values in the database. This results in a second set of bindings featuring among others the range variable (S).
  3. Validation. The range bindings and the scope bindings are now evaluated using the quantifier. The quant with quantifier "all" is only true if the number of range bindings equals the number of scope bindings. The quant with quantifier "2" is only true if the number of scope bindings equals 2.
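
As a sketch, the evaluation of the example sentence against a toy database could look like the code below; for brevity the inner quant is evaluated inline instead of through a general binding mechanism, and the facts are made up.

facts = {
    "parent": ["alice", "bob", "carol"],
    "have_child": [("alice", "c1"), ("alice", "c2"),
                   ("bob", "c3"), ("bob", "c4"),
                   ("carol", "c5"), ("carol", "c6")],
}

def every_parent_has_2_children(facts):
    # 1. Bind the range of the outer quant: all S for which parent(S) holds.
    range_bindings = facts["parent"]

    # 2. Bind the scope: for each S, evaluate the inner quant over
    #    child(O) / have_child(S, O).
    scope_bindings = []
    for s in range_bindings:
        children = [o for (p, o) in facts["have_child"] if p == s]
        # Validation of the inner quant: quantifier "2" is true only if the
        # number of its scope bindings equals 2.
        if len(children) == 2:
            scope_bindings.append(s)

    # 3. Validation of the outer quant: quantifier "all" is true only if the
    #    number of range bindings equals the number of scope bindings.
    return len(range_bindings) == len(scope_bindings)

print(every_parent_has_2_children(facts))  # True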

Conclusion

Quantifier scoping deals with the fact that the semantics of determiners cannot be expressed with rewrite rules alone. It makes the process of interpreting a sentence a lot more complex, but it is a necessary step in that it resolves some ambiguities. Above all, it is necessary for the many aggregation functions that apply to quantities of intermediate results. Scoping these quantifications is part of understanding the sentence.


[1] The Core Language Engine
[2] Representation and Inference for Natural Language
