At first I was bored by this, because I want my AI to have unlimited possibilities. But then I saw its strength.
As it happened, I was looking for the best way to design an NLI (natural language interface), and for a definition of when an NLI is "complete": when is the design finished? That is an important question when you are building one.
And I read Winograd and Flores' 1987 book "Understanding Computers and Cognition", a book that still needs to sink in with me, and which at first was actually pretty demotivating because it points at the limits of AI. But I already took one lesson home from it: computers are tools. They don't really understand anything. Everything needs to be told. You can't expect a computer to "get it" and continue on its own.
These things, together, led me to the idea of a finite set of intents: semantic structures that act like a set of services in an API. And these intents are the place where the design of an NLI starts.
Designing from Intents
An NLI has these three layers:

- Syntax
- Semantics
- Database language
When you start to design an NLI you could begin by defining which aspects of syntax you would like to support. But this quickly leads to over-engineering, because the syntax, while complicated, is the easy part. Every part of the syntax is just an input to the other parts of the system. What's more, syntax is never "complete": more syntax is always better.
But when you take semantics as a starting point, or more precisely, a fixed set of intents, this changes. From this angle you ask yourself: what are the things the user actually wants from the database? And that turns out to be surprisingly limited.
Once the intents have been defined, syntax follows. Only syntactic structures that will actually be used need to be implemented. Also, a single intent can have multiple syntactic representations.
The Intents of Alexa
There are several commercial agents you can talk to these days that allow you to extend their standard functionality with custom code.
Amazon's Alexa (featured on devices like the Amazon Echo) allows you to create skills. A skill is a conversation subject, like the weather, movies, planning a trip, or whatever.
A skill consists of intents and utterances. An intent is a "lambda function" with a name and arguments. The arguments are called slots. Example intents and slots for the weather skill: WEATHER(date, location), RAIN(date, location), BARBEQUE(date). The functions can be written in several different languages, like JavaScript and Python.
While intents are defined on the semantic level, the syntactic level is covered by utterances. Each intent can have multiple utterances. Here are some utterances for RAIN(date, location): "is it going to rain {date}", "will it rain {date} in {location}", "do I need an umbrella?". Notice the slots; slots have slot types. The slots named here have the date and location types.
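The mapping from utterances to an intent with filled slots can be sketched in a few lines of Python. This is a toy illustration of the idea, not the real Alexa API; the pattern strings are taken from the examples above, and the function names are my own.

```python
import re

# Each intent owns a few utterance patterns; {slot} markers capture slot values.
UTTERANCES = {
    "RAIN": [
        "is it going to rain {date}",
        "will it rain {date} in {location}",
        "do I need an umbrella",
    ],
}

def match_intent(text):
    """Return (intent, slots) for the first pattern that matches, else None."""
    for intent, patterns in UTTERANCES.items():
        for pattern in patterns:
            # turn "{date}" into a named capture group
            regex = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", pattern)
            m = re.fullmatch(regex, text)
            if m:
                return intent, m.groupdict()
    return None
```

For example, `match_intent("will it rain tomorrow in Amsterdam")` yields the RAIN intent with the date and location slots filled. The real platform uses trained models rather than regular expressions, but the intent/utterance/slot structure is the same.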
If you want to read more about it, Amazon has detailed descriptions here. It is a lot of work! For a fun and friendly introduction, read Liz Rice's blog posts.
The Intents of SHRDLU
I went on to test the idea of starting with intents. And what better place to start than the most complex NLI ever built: SHRDLU. SHRDLU, built by Terry Winograd in 1969, expresses many NLI characteristics that have rarely been seen in later systems:

- Questions about the domain model of the system: "Can a pyramid support a pyramid"
- Question about the system's history and its decisions: "Why did you clear off that cube"
- Introduction of new concepts: "A 'steeple' is a stack which contains two green cubes and a pyramid"

Below are the sentences of Winograd's famous demo dialog, each followed by the intent I derived for it:
1. pick up a big red block KB:PICKUP!(Object)
2. grasp the pyramid KB:PICKUP!(Object)
3. find a block which is taller than the one you are holding and put it into the box KB:FIND?(Object) && KB:PUTIN!(Object, Object)
4. what does the box contain? KB:WHAT?(ObjectB) && KB:CONTAIN(ObjectA, ObjectB)
5. what is the pyramid supported by? KB:WHAT?(ObjectA) && KB:SUPPORT(ObjectA, ObjectB)
6. how many blocks are not in the box? KB:HOW_MANY?(ObjectB) && !KB:CONTAIN(ObjectA, ObjectB)
7. is at least one of them narrower than the one which I told you to pick up? KB:PRESENT?(ObjectA) && KB:NARROWER(ObjectA, ObjectB)
8. is it supported? KB:SUPPORT?(Object)
9. can the table pick up blocks? DM:CAN?(P) && P=KB:PICKUP(Object)
10. can a pyramid be supported by a block? DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
11. can a pyramid support a pyramid? DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
12. stack up two pyramids KB:STACKUP!(Object, Object)
13. the blue pyramid is mine UM:OWN(Person, Object)
14. I own blocks which are not red, but I don't own anything which supports a pyramid UM:OWN(Person, Object) && UM:OWN(Person, Object)
15. do I own the box? UM:OWN?(Person, Object)
16. do I own anything in the box? UM:OWN?(Person, Object)
17. will you please stack up both of the red blocks and either a green cube or a pyramid? KB:STACKUP!(Object, Object)
18. which cube is sitting on the table? KB:WHICH?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
19. is there a large block behind a pyramid? KB:PRESENT?(Object) && KB:BEHIND(Object, Location)
20. put a small one onto the green cube which supports a pyramid KB:PUTON!(Object, Object)
21. put the littlest pyramid on top of it KB:PUTON!(Object, Object)
22. how many things are on top of green cubes? KB:HOW_MANY?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
23. had you touched any pyramid before you put the green one on the little cube? TM:TOUCH?(Person, Object) && TM:AT(Time)
24. when did you pick it up? TM:WHEN?(E) && E=KB:PICKUP(Object)
25. why? TM:WHY?(E)
26. why did you do that? TM:WHY?(E) && E=TM:DO()
27. why did you clear off that cube? TM:WHY?(E) && E=KB:CLEAR_OFF(Object)
28. why did you do that? TM:WHY?(E) && E=TM:DO()
29. why did you do that? TM:WHY?(E) && E=TM:DO()
30. how did you do it? TM:HOW?(E) && E=TM:DO()
31. how many objects did you touch while you were doing it? KB:HOW_MANY?(Object) && TM:TOUCH(Object) && TM:AT(Time)
32. what did the red cube support before you started to clean it off? KB:WHAT?(Object) && KB:SUPPORT(Object, Object) && TM:AT(Time)
33. there were five blocks to the left of the box then. KB:PRESENT(Object, Location) && TM:AT(Time)
34. put the blue pyramid on the block in the box KB:PUTON!(Object, Object)
35. is there anything which is bigger than every pyramid but is not as wide as the thing that supports it? KB:PRESENT?(ObjectA) && KB:BIGGER(ObjectA, ObjectB) && !KB:WIDE(ObjectA, ObjectC) && KB:SUPPORT(ObjectA, ObjectC)
36. does a steeple ---
37. a "steeple" is a stack which contains two green cubes and a pyramid DM:DEFINE!(Word, Object)
38. are there any steeples now? KB:PRESENT?(Object, Time)
39. build one KB:BUILD!(Object)
40. call the biggest block "superblock" DM:DEFINE!(Word, Object)
41. have you picked up superblock since we began? TM:PICKEDUP?(Object) && TM:AT(Time)
42. why did you drop it? TM:WHY?(E) && E=KB:DROP(Object)
43. is there anything to the right of the red pyramid? KB:PRESENT?(Object, Location)
44. thank you SM:THANK_YOU()
Here's a movie of such a SHRDLU interaction.
Many of the intents here are similar. I will now collect them, and add comments. But before I do, I must describe the prefixes I used for the modules involved:
- KB: knowledge base, the primary database
- DM: domain model, contains meta knowledge about the domain
- UM: a model specific to the current user
- TM: task manager
- SM: smalltalk
These then are the aspects of an NLI intent:
- a precondition: where a simple invocation name is sufficient to wake up a skill, this will not do for a more complex NLI. A semantic condition is necessary.
- a procedure: where a hardcoded function (the lambda function) is sufficient for a chatbot, an NLI should use parameterized procedures in a language like Datalog. Each step of the procedure must be manageable by a task manager.
- an output pattern: different types of questions require different types of answers. It is possible that two intents only differ in the responses they give.
Verbs can be the syntactic representation of intents, but "why" can be an intent as well. I use the suffixes ! for commands, and ? for questions.
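The three aspects above can be captured in a small data structure. This is only a sketch of how such an intent record might look; the field names, the lambda, and the example WHY? intent are my own invention, not SHRDLU's actual representation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Intent:
    name: str
    # precondition: a semantic condition that activates the intent
    precondition: Callable[[dict], bool]
    # procedure: parameterized steps, e.g. Datalog-like goals for a task manager
    procedure: list
    # output pattern: how the answer to this intent is phrased
    output_pattern: str

# Hypothetical example: the WHY? intent
WHY = Intent(
    name="WHY?",
    precondition=lambda sem: sem.get("question") == "why",
    procedure=["goal(E, G)"],  # look up the active goal G of event E
    output_pattern="Because I wanted to {goal}.",
)
```

Two intents that share a precondition and procedure but differ in `output_pattern` would still be distinct intents, which matches the observation above.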
The KB (knowledge base) intents:

- PICKUP!(Object): bool
- STACKUP!(Object, Object): bool
- PUTON!(Object, Location) / PUTIN!(Object, Location): bool

  For these intents different procedures are activated that include a physical act that takes time. The system responds with an acknowledgement or a statement of impossibility.
- BUILD!(Object): bool

  Build a compound object.
- FIND?(Object): Object

  Search the space for an object with specifications. The object is used for another intent.
- PRESENT?(Object): bool

  Like FIND?, but the system responds with yes or no.
- WHAT?(Object): Object

  Like FIND?, but the system responds with a description that distinguishes the object from others.
- WHICH?(Object): Object

  Like WHAT?, but comes with a specific class of objects.
- HOW_MANY?(Object): number

  This intent involves counting objects, whereas the others just deal with a single instance. Returns a number.
- SUPPORT?(Object): bool

  Just checks whether a certain declaration exists. Many intents like this could exist, and perhaps they can be combined into a single intent PRESENT?(Predication).
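That last suggestion, folding all the relation checks into one PRESENT?(Predication) intent, can be sketched as follows. The knowledge base contents and relation names here are invented for illustration.

```python
# A toy knowledge base: declarations stored as (relation, arg, arg) tuples.
KB = {
    ("SUPPORT", "table", "red-block"),
    ("CONTAIN", "box", "blue-pyramid"),
}

def present(relation, *args):
    """PRESENT?(Predication): does this declaration exist in the KB?

    One intent replaces SUPPORT?, CONTAIN?, and any other
    per-relation boolean check."""
    return (relation, *args) in KB
```

The trade-off is that the predication itself becomes an argument, so the precondition of the intent must accept any relation the grammar can produce.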
The DM (domain model) intents:

- CAN?(Predication)

  Checks key points of the predication against a set of allowed interactions in the domain model.
- DEFINE!(Word, Object)

  Maps a word onto a semantic structure that defines it.
The UM (user model) intents:

- OWN(Person, Object): bool

  Marks the ownership relation between the user and the object.
- OWN?(Person, Object): bool

  Checks whether the ownership relation exists.
The TM (task manager) intents:

- TOUCH?(Person, Object)

  Touch is an abstract verb that includes pick up, stack on, etc. It has a time frame.
- PICKEDUP?(Object)

  Checks whether PICKUP was used.
- WHEN?(Event)

  Returns a time index. It is described to the user as the event that happened then.
- WHY?(Event)

  Returns the active goal of an event.
- WHY?(Goal)

  Returns the parent goal of a goal. The top-level goal is "because you asked me to".
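The WHY?(Goal) chain is easy to picture as a walk up a goal tree: each goal points at its parent, and the top of the chain is the user's own command. A minimal sketch, with an invented goal tree:

```python
# Each goal maps to its parent goal; None marks the top-level goal,
# which came directly from the user's command.
PARENT_GOAL = {
    "pick up the red block": "clear off the cube",
    "clear off the cube": "stack up two blocks",
    "stack up two blocks": None,
}

def why(goal):
    """WHY?(Goal): answer with the parent goal, or the canned top-level answer."""
    parent = PARENT_GOAL.get(goal)
    if parent is None:
        return "because you asked me to"
    return "to " + parent
```

Repeating the question ("why did you do that?") just moves one step up the chain, which is exactly what happens in sentences 26 through 29 of the SHRDLU dialog.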
The SM (smalltalk) intents:

- THANK_YOU()

  There is no meaning involved, just a canned response.
Closing
I know I sound very deliberate in this post, as if I know exactly what's going on and how it should be. That's just my way of trying to make sense of the world. In fact I just stumbled on this approach to NLI and it's very new to me. Also, my analysis of SHRDLU's "intents", a concept it never actually had, may be completely off track. One would only know by trying to build a new SHRDLU around this concept.
I think chatbots are a great way to get involved in natural language processing.