Wednesday, 15 May 2019

Natural language interaction in flowcharts


In this post I will sketch some of my ideas about a generic natural language interaction program in the form of a flowchart.

Natural language interaction is any form of computer program that allows you to interact with it in a complex form of natural language (e.g. English or Chinese). It is a subject I have been working on for several years.

A flowchart visualizes the flow of control of a program. It consists mainly of interconnected tasks, but I added intermediate data structures (in blue) because of their special relevance.

What you're about to see is an attempt to unify aspects of many historical NLI systems in a single system. It's still very sketchy, but I needed to have these thoughts visualized, because it gives me a clearer goal.

Check this Wikipedia article if you're not sure about some of the flowchart symbols.

The main loop

The main loop of the program consists (as can be expected) of a sequence of asking for input, processing it, and displaying the output.

I split the processing part into analyzing and understanding. By analyzing I mean processing the textual string into a logical, context-insensitive data structure (think predicate calculus). By understanding I mean processing this logical structure into a domain-specific representation, a speech act.

Both the analysis and understanding part may fail. Analysis is language dependent. If the program supports multiple languages and analysis in one language fails, the other languages may be tried. The language that succeeds will become the default language for the next input sentence. Understanding is domain specific. If the system supports multiple domains, and understanding fails for one domain, other domains may be tried.
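This fallback strategy can be sketched as a small loop. A minimal sketch in Python; the stand-in analyser functions and language codes are invented for illustration:

```python
def analyse_with_fallback(sentence, analysers, default_lang):
    # Try the default language first, then the others; the language
    # that succeeds becomes the default for the next sentence.
    order = [default_lang] + [l for l in analysers if l != default_lang]
    for lang in order:
        result = analysers[lang](sentence)
        if result is not None:
            return result, lang
    return None, default_lang

# Invented stand-in analysers: each returns a parse, or None on failure.
analysers = {
    "en": lambda s: ("parsed-en", s) if "the" in s.split() else None,
    "nl": lambda s: ("parsed-nl", s) if "de" in s.split() else None,
}
print(analyse_with_fallback("waar is de hond", analysers, "en"))
# → (('parsed-nl', 'waar is de hond'), 'nl')
```

The same loop applies to trying multiple domains in the understanding phase.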

The parts that look like this are called predefined processes. They are placeholders for other flowcharts, shown below.

Analyse input

The analysis of input follows the standard pipeline of natural language processing.

Tokenization splits the string into words. Parsing creates a parse tree. Semantic analysis maps the forms to meaning. Quantifier scoping creates nested structures for quantifiers like "all". Pragmatic analysis fills in context-dependent variables like "this" and "her".

The analysis phase needs information in the form of a lexicon (word forms), a grammar (rewrite rules) and semantic attachments (from word form and phrase to meaning structure).
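The stages above might be sketched like this. A toy illustration in Python: the stages are stubs, and the lexicon entries are invented; a real system would use a full lexicon, grammar, and semantic attachments:

```python
def analyse(text, lexicon):
    # Tokenization: split the raw string into words.
    tokens = text.lower().rstrip("?.!").split()
    # Lexical lookup: find the part of speech of each word.
    tags = [lexicon.get(t, "noun") for t in tokens]
    # "Parsing": a real parser builds a tree; this flat list stands in for it.
    tree = list(zip(tags, tokens))
    # Semantic analysis: map the forms to meaning (here: just drop determiners).
    relations = [(tag, word) for tag, word in tree if tag != "det"]
    return relations

lexicon = {"the": "det", "married": "verb", "who": "pronoun"}
print(analyse("who married the king?", lexicon))
# → [('pronoun', 'who'), ('verb', 'married'), ('noun', 'king')]
```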

Ask user

Any nontrivial system sometimes needs to ask the user to disambiguate between several possible interpretations. To do this, it is sensible to present the user with a multiple choice question (i.e. not an open question). This way, any answer the user gives is OK and will not cause a new problem in itself.

The added idea here is that a user should not be asked the same question more than once in a session. So any answer he/she gives is stored for possible later use. The store, "dialog context", may be used for pragmatic variables as well. The active subject of a conversation, for example, which may be referred to as "he", "she" or "it", might well be stored here.

Speech act processing

This is the central component of an NLI system. The input is a logical structure that has been formatted in a way that makes sense in a certain domain. The term "speech act" fits this description well. The system then looks up a "solution", a way of handling this speech act. This solution consists of a goal (usually just the contents of the input, in procedural form) and a way of presenting the results (which depends on the type of speech act, and may even differ per type of question). The latter thus forms a goal in itself.

The red square is used for internal data structures. The one shown here lists a number of available solutions.
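Such a solution table could be sketched as follows (a hypothetical illustration in Python; the fact base, speech act names, and lookup function are all invented). Each speech act type maps to a goal and a way of presenting the results:

```python
# A toy knowledge base and a lookup function (names invented).
facts = [("married_to", "anne_isabella", "lord_byron")]

def find(relation, obj):
    return [s for (r, s, o) in facts if r == relation and o == obj]

# The solution table: per speech act type, a goal and a response format.
solutions = {
    "question:who": {
        "goal": lambda payload: find(*payload),
        "respond": lambda results: results[0] if results else "I don't know",
    },
}

def handle(speech_act, payload):
    solution = solutions[speech_act]
    results = solution["goal"](payload)
    return solution["respond"](results)

print(handle("question:who", ("married_to", "lord_byron")))  # → anne_isabella
```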

Process procedure

The assumption in this blog is that the internal processing of a speech act is done in a procedural way (think Prolog), where a goal is achieved by fulfilling its dependent subgoals.

The symbol shown here stands for a choice: one of these ways may be used to process a procedure.

The goal may simply be to look up a fact in a domain database. Or it may involve reasoning or planning. Finally, it may require introspection into the system's own former goals: an advanced topic, only handled by SHRDLU. I apologize for the extreme sketchiness of this part; I have never programmed "planning", nor the introspective parts.

The takeaway of this diagram is that any goal may be achieved in many ways, all of which require intelligence. A goal may have subgoals, and these subgoals can have subgoals of their own. The leaf nodes of such a structure consist of a simple database lookup or a simple action. An action may update a database, or perform an actual action in the world (again, SHRDLU being a perfect example).
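The goal/subgoal idea can be illustrated with a minimal, Prolog-like solver. This is a sketch in Python; the facts and rules are invented, and a real system would of course use variables and unification:

```python
# Invented facts and rules; a rule's body lists its subgoals.
facts = {"have_flour", "have_eggs"}
rules = {
    "bake_cake": ["have_batter", "have_oven"],
    "have_batter": ["have_flour", "have_eggs"],
    "have_oven": [],  # trivially achievable
}

def achieve(goal):
    # A leaf goal is a simple database (fact) lookup...
    if goal in facts:
        return True
    # ...otherwise the goal is achieved by achieving all its subgoals, depth-first.
    if goal in rules:
        return all(achieve(sub) for sub in rules[goal])
    return False

print(achieve("bake_cake"))  # → True
```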

Database query and update

Transforming a logical structure into a database query is an art in itself, but in a simple flowchart it just looks like this:


My approach to accessing a database is not to create one grand query that fetches all information at once. Instead, I think it works better to have many elementary queries, each selecting a single field. No aggregations.
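To illustrate chaining elementary queries (with invented in-memory "tables" and column names, in Python): the result of one single-field query is fed into the next:

```python
# Invented in-memory "tables"; each query touches one relation and
# selects a single field.
db = {
    "name":       [{"id": 1, "name": "Lord Byron"},
                   {"id": 2, "name": "Anne Isabella Milbanke"}],
    "married_to": [{"a": 1, "b": 2}],
}

def query_id_by_name(name):
    return [row["id"] for row in db["name"] if row["name"] == name]

def query_spouse(person_id):
    return [row["b"] for row in db["married_to"] if row["a"] == person_id]

def query_name_by_id(person_id):
    return [row["name"] for row in db["name"] if row["id"] == person_id]

# "who married Lord Byron?": chain three elementary queries.
ids = query_id_by_name("Lord Byron")
spouse_ids = [s for i in ids for s in query_spouse(i)]
names = [n for s in spouse_ids for n in query_name_by_id(s)]
print(names)  # → ['Anne Isabella Milbanke']
```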

Sunday, 2 December 2018

Matching brackets

For my work as a developer, I sometimes get the chance to interview new developers for the company. I check their resumes and I review some of the code they send me. We thought it would also be a good idea to give them a little programming task to do in a limited amount of time. I had no idea what kind of task to choose, so I checked Google. The following problem showed up several times and I liked it because it requires some thinking to do it right, while at the same time it doesn't require a lot of typing.

Write a function that checks if the parentheses of an expression match.

The task involves several types of brackets: [], () and {}. For example, the brackets in this expression match: {[aap](({banaan}))}
While in this expression they do not: {[aap](({banaa))}
I wrote a test function (in PHP):
function test(string $exp, bool $expected)
{
    if (matchBrackets($exp) !== $expected) {
        echo "Error matching " . $exp . "\n";
    }
}
and some tests
test("()", true);
test("()()", true);
test("noot()noot", true);
test("(aap)", true);
test("aap", true);
test("(", false);
test(")", false);
test("{[]}", true);
test("{[][]}", true);
test("{[]()}", true);
test("", true);
test("{[aap](({banaan}))}", true);
test("{[{[](({}){a[](({}))}){b[](({})){[](({c}))}}{[](({}))}}]({[]d(({}){[](({123}))})}({{[](({}))}}))}", true);
test("{[aap](({banaa))}", false);
test("{[aap]((anaan}))}", false);

Solving the problem in an elegant way requires you to use the recursive nature of the domain.

While the developer was writing the program, I tried to figure it out for myself. And it occurred to me that next to some algorithmic options, there would also be a possibility to use a regular expression. But this could only be done by applying the advanced recursive patterns feature. I had heard of this feature, but never used it.

I would certainly not demand of the developer to give such an answer, but it intrigued me too much to let it go.

It is indeed possible, and so, without further ado, I present the answer here:
function matchBrackets(string $exp): bool
{
    return preg_match('/^
        (                        # start of group 1
            (?:
                [^\(\)\[\]\{\}]+ # a sequence of non-brackets
                |
                \( (?1) \)       # round brackets around group 1
                |
                \[ (?1) \]       # square brackets around group 1
                |
                \{ (?1) \}       # curly braces around group 1
            )*
        )
    $/x', $exp);
}

I will try to explain by building the expression from the ground up.
The "x" modifier strips all unescaped whitespace from the pattern. This allows you to make it more readable.
"^" and "$" match the beginning and end of $exp.
"(" and ")" create the group that will be used for recursion.
Each item in this group can be repeated any number of times ("*"). This also allows for a completely empty $exp.
Any nested group consists of a sequence of non-brackets, or a set of round brackets (), or a set of square brackets [], or a set of curly braces {}. Finally, a "bracket" group looks like this:
\{ (?1) \}
It starts with a bracket (here "{"), followed by a recursive group, and ends with the matching bracket "}".

A recursive group can repeat the entire expression; for that you would use (?R) instead of (?1). Here it just matches a single group (group 1) of the expression: the group that starts with the first "(" in the expression. I do not use (?R) here because that would place the ^ and $ in the middle of the expression, and that would never give a match.

End Note

It took me hours to get this expression just right, and as simple as possible. I expected the recursive expression to be complicated, but it turned out that it could be made pretty simple. And if you're interested: yes, it is fast. It was about 8 times as fast as the algorithmic variant I came up with. For a non-compiled language like PHP, regular expressions open the door to the speed of a compiled language.
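For reference, an algorithmic variant might look like this stack-based version (a sketch in Python rather than the PHP of the post, and not the exact code I benchmarked):

```python
def match_brackets(exp: str) -> bool:
    # Push every opening bracket; each closing bracket must match
    # the most recently opened one.
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in exp:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Every opened bracket must also have been closed.
    return not stack

print(match_brackets("{[aap](({banaan}))}"))  # → True
print(match_brackets("{[aap](({banaa))}"))    # → False
```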

Sunday, 18 February 2018

DAYDREAMER - A Landmark Cognitive Architecture

I posted this review before at Amazon
Source code of DAYDREAMER can be found on Github

This book is a revised version of Erik T. Mueller's 1987 dissertation. It was written in 1990 and republished in 2012, by the author himself.

It is a very important book in the discipline of Cognitive Science, as I will try to explain.

Daydreaming, the subject of the book, is the process of "the spontaneous activity - carried out in a stream of thought - of recalling past experiences, imagining alternative courses that a past experience might have taken, and imagining possible future experiences."

Mueller has gone way beyond that single subject, and worked out an impressive cognitive architecture that includes many mental faculties. He supports it with a massive base of research literature. Based on this architecture is the DAYDREAMER Lisp program.


At the core of the architecture are personal goals and needs. These include achievement goals: to maintain self esteem, to have friends, and so on; and cyclic goals: food, money, entertainment.

In order to achieve these personal goals, DAYDREAMER has knowledge about "persons, physical objects, locations, possession of physical objects, ... mental states, interpersonal relationships, actions, and activities" (the "interpersonal domain"). This knowledge is implemented as planning rules and inference rules.

The program also applies these rules to entities other than itself, in order to represent and make inferences about the mental states of others. This is called "other planning".

"Most AI programs take a request from a human user, handle the request, and terminate...In contrast, DAYDREAMER has a collection of goals which are instantiated and processed as the program sees fit."

A task or activity on behalf of an active top-level goal is called a "concern". Several concerns may be active at any time. "Each subsequence of contiguous representations produced on behalf of a given concern may be called a 'daydream'".

The program uses emotions to determine which concern to choose at any time ("emotion-driven planning"). Several emotions are created in different phases of a concern. For example, when a concern terminates a new positive or negative emotion is created. These emotions may trigger new concerns.

The architecture allows for two types of creative mechanisms: Serendipity and Action Mutation. Serendipity-based planning uses new input, as a fortunate coincidence, to help progress concerns that are not currently active. Action Mutation modifies events and goals in arbitrary ways (mutation) in order to find novel solutions.

Events are stored in episodic memory.

Emotions, Daydreaming goals, learning, and creativity

  • Chapter 3 shows how to represent emotions in a clear way. It continues with a description of the personal goals that are specific to daydreaming: rationalization, roving, revenge, reversal, recovery, rehearsal, and repercussions. These goals are activated when a person is idle. "Daydreaming goals are heuristics for how to exploit surplus processing time in a useful way." They are activated by emotions and directed, among other things, towards the reduction of negative emotional states and the exploration of consequences of possible actions.
  • Chapter 4 shows how a person learns from daydreaming, and the effect this has on his or her future behavior. 
  • Chapter 5 dives into the creative processes serendipity and action mutation.


  • Chapter 6 describes the implementation of mental states and attitudes using Conceptual Dependency. 
  • Chapter 7 describes the implementation of DAYDREAMER rules and procedures in the GATE language, an AI program for the T language, a version of Scheme (Lisp). 
  • Chapter 8 describes possible episodic memory schemes, and Intersection Indexing, the technique used in DAYDREAMER.

Literature review, philosophical underpinnings, and conclusions

  • Chapter 9 is a review of the literature on daydreaming. 
  • Chapter 10 deals with philosophical questions. 
  • Chapter 11 describes possible applications of the architecture.


Mueller seems to have read every single book and paper ever written on the subject, and in all adjoining fields as well. Whenever he missed a book reference, and I checked, it turned out that that book hadn't been written at the time. Note that Mueller's dissertation was finished as early as 1987!

And even though it is thirty years later now, as far as I know (and I am happy to be proven wrong), this book can still be seen as the most complete description of a cognitive architecture to date.

Saturday, 13 January 2018

My little NLI-GO DBPedia demo

With some pride (but not very much ;) I present to you my natural language interaction demo:


This is the first presentable result of many years' work of trying to create a library that allows a user (you) to interact with a database in natural language.

The database in this example is DBPedia, a database based on Wikipedia. It is a triple store that contains, among other things, information about the people in Wikipedia.

My library tokenizes and parses the sentence, and converts the parse tree into a relational representation. A solution is looked up and executed. This involves a series of simple database requests (in this case SPARQL queries). The results are integrated into more relations, and from these relations a response in natural language is generated.

The point of my library is not to create a little interface to DBPedia. I am looking to make the interface to databases in general as simple and powerful as possible. This means that more databases, types of databases, domains, natural languages, etc., should be addable with just a few configuration files. Eventually, no coding should be necessary. There is a long way to go.

Nevertheless, I invite you to try these sentences with the demo, and check out the productions that will be shown. These are some of the steps the library takes to find the answer.

  • How many children had Lord Byron?
  • How many children had Michael Jackson?
  • Who married Elvis Presley?
  • Who married Donald Trump?
I hope you find this interesting.

Saturday, 9 September 2017

English question syntax

I took the time to find out what types of question syntax there are. Even though one probably needs only a couple of them, it's always good to have some sort of overview.

I had no idea there were so many forms. I learned many of them from this very helpful page: My English Teacher.

But I started my search in the ever inspiring ...

The phrase structure rules presented below are my own. Some structures recur in several places; they could probably be grouped. For a more authoritative reference, see the Stanford Parser.

I use the following abbreviations:

  • EQ = identity (is, was, are, were)
  • BE = auxiliary (is, was, are, were)
  • DO = auxiliary (do, does, did)
  • HAVE = auxiliary (has, have)
  • MOD = modality (can, could, will, would, shall, should)
and these syntactic categories
  • NP = noun phrase (a "thing")
  • VP = verb phrase (an action, or more general, a predication)
  • ADJP = adjective phrase (e.g. red, little, happy)
  • ADVP = adverbial phrase (e.g. quickly, happily)
I grouped the questions by their response type, which I showed in each of the headings.

Factual yes/no-questions or choice questions (boolean / a selected object)

  • DO NP VP (did Lord Byron marry Queen Elisabeth, did Lord Byron not marry Queen Elisabeth)
  • DO NP VP (did Lord Byron marry Queen Elisabeth or Anne Isabella Milbanke)
  • EQ NP NP (was Lord Byron king of England, was Lord Byron not king of England)
  • EQ NP NP (was Lord Byron a king or a lord)
  • BE NP ADJP (is the block red)
  • BE NP ADJP (is the block red or blue)
  • BE NP VP (was Lord Byron born in London, was Lord Byron not born in London)
  • BE NP VP (was Lord Byron born in London or Cambridge)
  • HAVE NP VP (has Napoleon invaded Germany, has Napoleon not invaded Germany)
  • HAVE NP VP (has Napoleon invaded Germany or The Netherlands)
  • MOD NP VP (would you like a cup of coffee, should I leave my things here, can dogs fly, can i ask you a question, can you stack a cube on a pyramid)
  • MOD NP VP (would you like coffee or tea)

Uninverted yes/no questions (boolean)

  • NP VP (Lord Byron married Queen Elisabeth?) (question mark is required)

Wh-questions (which, what, who; name one or more individuals)

  • WHO VP (who married Lord Byron, who was Lord Byron's wife)
  • WHO BE NP VP (who are you seeing)
  • WHO DO NP VP (who does Pierre want to beat)
  • WHO HAVE NP VP (who have you been seeing)
  • WHO MOD VP (who can drive me home)
  • WHOM DO NP VP (whom do you believe)
  • WHOM HAVE NP VP (whom have you believed)
  • WHOM MOD NP VP (whom should I talk to)

    (with whom is Peter speaking)
  • --
  • WHICH NP VP (which countries border the mediterranean, which countries do not border the mediterranean)
  • WHICH BE NP (which is the best option)
  • WHICH DO NP VP (which do you do more often)
  • WHICH NP MOD NP VP (which way should I go)
  • WHAT NP VP (what rock sample contains most iron, what food items did you eat)
  • WHAT BE NP (what is the biggest block, what is your name)
  • WHAT DO NP VP (what do laptops cost)
  • WHAT HAVE NP VP (what has Churchill done to stop the war)
  • WHAT MOD NP VP (what should I do)
  • WHOSE NP VP (whose autographs have you collected, whose parents will drive)
  • WHOSE NP BE NP (whose book is that)

Amount (a number, requires aggregation)

  • HOW MANY NP VP (how many children had Lord Byron, how many children did Lord Byron have)

Degree (a number, the unit result depends on subject)

  • HOW MUCH NP VP (how much sugar goes in a single drink)
  • HOW ADJP BE NP (how high is Mount Everest, how tall is the tallest man, how small is a mouse, how old are you)
  • HOW ADVP DO NP VP (how often do you go to the movies, how nicely do I need to dress tonight)

Manner (a means)

  • HOW BE NP VP (how was Napoleon crowned king)
  • HOW DO NP VP (how do you go to work)
  • HOW HAVE NP VP (how has Napoleon invaded Britain)
  • HOW MOD NP VP (how can I become more productive)

State (a state)

  • HOW BE NP (how are you)

Reason (a cause)

  • WHY BE NP VP (why was Napoleon crowned king)
  • WHY DO NP VP (why did Napoleon invade Germany)
  • WHY HAVE NP VP (why has John hit Jake)
  • WHY MOD NP VP (why should I go)

Time (a time)

  • WHEN BE NP (when was the marriage)
  • WHEN BE NP VP (when was Napoleon crowned king)
  • WHEN DO NP VP (when did you start wearing make up)
  • WHEN HAVE NP VP (when have you bought the tv)
  • WHEN MOD NP VP (when can I go home)

    WHEN -> WHEN PP (when in the next hour do you want to go)

Place (a place)

  • WHERE BE NP (where is it?)
  • WHERE BE NP VP (where is the concert taking place)
  • WHERE DO NP VP (where did you go)
  • WHERE HAVE NP VP (where has Sally gone)
  • WHERE MOD NP VP (where can I find a pub)

    WHERE -> WHERE PP (where on the map is it)

Thursday, 31 August 2017

Query Optimization

When building a natural language interface system, you must ask yourself whether you are going to send many simple queries to the database, or a single complex query that the database executes in a single run.

One Big Query vs Many Small Queries

For Nli-Go I chose to send many little requests to the database. Each request deals with a single relation and has no joins and no aggregations. For a triple store, such as one queried with SPARQL, each relation has just two arguments. In this case there are only four query forms per relation, depending on which of the variables were bound in advance:
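The four forms can be enumerated mechanically: each of the two arguments is either bound or free. A sketch in Python (the relation name follows the MARRIED_TO example; the SQL-like syntax is illustrative):

```python
def query_form(relation, a_bound, b_bound):
    # Build the query for a two-place relation, given which of its
    # arguments are already bound to values.
    conditions = []
    if a_bound:
        conditions.append("A = ?")
    if b_bound:
        conditions.append("B = ?")
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return "SELECT A, B FROM " + relation + where

# The four possible binding combinations of a two-place relation.
for a in (False, True):
    for b in (False, True):
        print(query_form("MARRIED_TO", a, b))
```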


There are major benefits of the small queries approach:
  • Very little knowledge is required of the idiosyncrasies of the database and of the database management system.
  • Given that the tables are properly indexed, all queries are quick.
  • It is possible to add extra logic to the user query not available from the database.
  • It is possible to combine the results of multiple databases for a single user query. These databases may be of completely different types.
There are also some major drawbacks, of course:
  • The system does not make use of the processing power of the database.
  • The system needs to perform (very) many queries.
  • More knowledge of the NLI system itself is required.


To me the benefits outweigh the drawbacks, but in order to have a well-performing system, I need to deal with the fact that the database does not help in creating the fastest query execution.
This comes down to the fact that the query optimizer of the database is not used for the full user query. Mainly, this means that the queried relations are not presented to the database in the most efficient order. 
A simple example will make this clearer. Suppose the following two queries were created from a user input sentence "who married Lord Byron?"
SELECT A, B FROM MARRIED_TO
SELECT 1    FROM NAME WHERE A = ? AND B = "Lord Byron"
The first query asks for all A, B involved in a marriage relation. The second query asks for the ID of a person whose name is given. When executed in the shown order, the first query yields, say, 40,000 rows. Next, each of the resulting values is entered into the second query, which is thus executed 40,000 times, each time with different values. As you can see, even a simple set of relations gets out of control easily.
Since the database does not do query optimization, the nli system needs to perform this task.
David H.D. Warren wrote an interesting article about query optimization for CHAT-80, a Prolog based natural language interface. It's called
Efficient Processing of Interactive Relational Database Queries Expressed in Logic
His algorithm to determine the order of the simple queries consists of two steps:
  • Calculate the cost for each query
  • Order the queries by cost, lowest first.
The cost is calculated by taking the size of the relation, and dividing it by the product of the number of distinct values in each of the bound arguments. I will explain this with examples:
For the query
SELECT * FROM PERSON
the cost is calculated as
size(PERSON)
where "size(PERSON)" is the number of rows in PERSON. Note that the query has no constraints.
For the query
SELECT * FROM PERSON WHERE ID = ?
the cost is calculated as
size(PERSON) / size_unique(ID)
where "size(PERSON)" is the number of records in the table PERSON, and "size_unique(ID)" is the number of unique values in the column ID. Note that the query has one constraint: ID.
For the query
SELECT * FROM PERSON WHERE COUNTRY = ? AND AGE = ?
the cost is calculated as
size(PERSON) / ( size_unique(COUNTRY) * size_unique(AGE) )
where "size(PERSON)" is the number of records in the table PERSON, and "size_unique(COUNTRY)" and "size_unique(AGE)" are the numbers of unique values in the columns COUNTRY and AGE. Note that the query has two constraints: COUNTRY and AGE.

The result

The cost of a bound NAME query is much lower than that for an unbound MARRIED_TO query, hence the NAME query will be scheduled for execution first. This query returns only a single row, and the second query will need to be executed only once.

In this algorithm, queries with many unbound variables are very costly. With more bound variables, the cost drops rapidly. It also takes into account that tables with fewer rows are cheaper than tables with many rows, and that columns with many distinct values (higher cardinality) are cheaper than those with few distinct values.

It is a very simple, yet efficient way of optimizing conjunctive queries.
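The heuristic can be written down in a few lines. A sketch in Python; the relation sizes and statistics are invented for illustration:

```python
def cost(relation_size, distinct_counts_of_bound_args):
    # The size of the relation, divided by the product of the numbers
    # of distinct values of the bound arguments.
    c = float(relation_size)
    for d in distinct_counts_of_bound_args:
        c /= d
    return c

# Invented statistics for the two queries of the example above.
queries = [
    ("MARRIED_TO(A, B)",      cost(40000, [])),       # nothing bound
    ('NAME(A, "Lord Byron")', cost(40000, [40000])),  # the name is bound
]

# Order the queries by cost, lowest first.
ordered = sorted(queries, key=lambda q: q[1])
print([name for name, c in ordered])
# → ['NAME(A, "Lord Byron")', 'MARRIED_TO(A, B)']
```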

Friday, 21 July 2017

Intents, Slots and SHRDLU

Several things came together, lately. I read up on the chatbot blogs that flourish these days. For chatbots the concept of the intent of the user is important: the intent of a sentence is what the user means by it. But what struck me, even if it wasn't said in so many words, is that the number of intents is restricted. Remember my examination of Siri's commands? These are typical intents, and their number is limited.

At first I was bored by this, because I want my AI to have unlimited possibilities. But then I saw its strength.

As it happened, I was looking for the best way to design an NLI (natural language interface), and for a definition of when an NLI is "complete". When is the design finished? This is an important question when you are building an NLI.

And I read Winograd and Flores' 1987 book "Understanding Computers and Cognition". A book that still needs to sink in with me, and which at first was actually pretty demotivating because it points at the limits of AI. But I already took one lesson home from it: computers are tools. They don't really understand anything. Everything needs to be told. You can't expect a computer to "get it" and continue on its own.

These things, together, brought me the idea of a finite set of intents, semantic structures that act like a set of services in an API. And that these intents are the place where the design of an NLI starts.

Designing from Intents

An NLI has these three layers:
  • Syntax
  • Semantics
  • Database language
When you start to design an NLI you could start out by defining which aspects of syntax you would like to support. But this leads quickly to over-engineering. Because the syntax, while complicated, is the easy part. Every part of the syntax is an input to the other parts of the system. What's more, syntax is never "complete". More syntax is always better.

But when you take semantics as a starting point, or more precisely, a fixed set of intents, this changes. Taking this entrance, you ask yourself: what are the things the user actually wants from the database? And this turns out to be surprisingly limited.

Once the intents have been defined, syntax follows. Only syntactic structures that will actually be used need to be implemented. Also, a single intent can have multiple syntactic representations.

The Intents of Alexa

There are several commercial agents you can talk to these days, and that allow you to extend their standard functionality with custom code.

Amazon's Alexa (featured on devices like the Amazon Echo) allows you to create skills. A skill is a conversation subject, like the weather, movies, planning a trip, or whatever.

A skill consists of intents and utterances. An intent is a "lambda function" with a name and arguments. The arguments are called slots. Example intents and slots for the weather skill: WEATHER(date, location), RAIN(date, location), BARBEQUE(date). The functions can be written in several different languages, like JavaScript and Python.

While intents are defined on the semantic level, the syntactic level is defined by utterances. Each intent can have multiple utterances. Here are some utterances for RAIN(date, location): "is it going to rain {date}", "will it rain {date} in {location}", "do I need an umbrella?". Notice the slots. Slots have slot types; the slots named here have the date and location types.
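Matching utterance templates with slots against user input takes surprisingly little code. A sketch in Python; this is not Amazon's actual mechanism (which does much more, including slot-type validation), just the basic idea:

```python
import re

def compile_utterance(template):
    # Turn "will it rain {date}" into a regex with a named group per slot.
    pattern = ""
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{"):
            pattern += "(?P<%s>.+)" % part[1:-1]
        else:
            pattern += re.escape(part)
    return re.compile("^" + pattern + "$", re.IGNORECASE)

def match_intent(utterances, text):
    # utterances: list of (intent name, template); the first match wins.
    for intent, template in utterances:
        m = compile_utterance(template).match(text)
        if m:
            return intent, m.groupdict()
    return None, {}

utterances = [
    ("RAIN", "will it rain {date} in {location}"),
    ("RAIN", "is it going to rain {date}"),
]
print(match_intent(utterances, "will it rain tomorrow in London"))
# → ('RAIN', {'date': 'tomorrow', 'location': 'London'})
```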

If you want to read more about it, Amazon has detailed descriptions here. It is a lot of work! For a fun and friendly introduction, read Liz Rice's blogs.

The Intents of SHRDLU

I went on to test the idea of starting with intents. And what better place to start than the most complex NLI ever built: SHRDLU. SHRDLU, built by Terry Winograd in 1969, exhibits many NLI characteristics that have rarely been seen in later systems:
  • Questions about the domain model of the system: "Can a pyramid support a pyramid"
  • Question about the system's history and its decisions: "Why did you clear off that cube"
  • Introduction of new concepts: "A 'steeple' is a stack which contains two green cubes and a pyramid"
But back to intents. I took the example dialog from the book "Understanding Natural Language" by Winograd and tried to create an impression of what the intents of the sentences would be.

1. pick up a big red block
2. grasp the pyramid
3. find a block which is taller than the one you are holding and put it into the box
   KB:FIND?(Object) && KB:PUTIN!(Object, Object)
4. what does the box contain?
   KB:WHAT?(ObjectB) && KB:CONTAIN(ObjectA, ObjectB)
5. what is the pyramid supported by?
   KB:WHAT?(ObjectA) && KB:SUPPORT(ObjectA, ObjectB)
6. how many blocks are not in the box?
   KB:HOW_MANY?(ObjectB) && !KB:CONTAIN(ObjectA, ObjectB)
7. is at least one of them narrower than the one which I told you to pick up?
   KB:PRESENT?(ObjectA) && KB:NARROWER(ObjectA, ObjectB)
8. is it supported?
9. can the table pick up blocks?
   DM:CAN?(P) && P=KB:PICKUP(Object)
10. can a pyramid be supported by a block?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
11. can a pyramid support a pyramid?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
12. stack up two pyramids
    KB:STACKUP!(Object, Object)
13. the blue pyramid is mine
    UM:OWN(Person, Object)
14. I own blocks which are not red, but I don't own anything which supports a pyramid
    UM:OWN(Person, Object) && UM:OWN(Person, Object)
15. do I own the box?
    UM:OWN?(Person, Object)
16. do I own anything in the box?
    UM:OWN?(Person, Object)
17. will you please stack up both of the red blocks and either a green cube or a pyramid?
    KB:STACKUP!(Object, Object)
18. which cube is sitting on the table?
    KB:WHICH?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
19. is there a large block behind a pyramid?
    KB:PRESENT?(Object) && KB:BEHIND(Object, Location)
20. put a small one onto the green cube which supports a pyramid
    KB:PUTON!(Object, Object)
21. put the littlest pyramid on top of it
    KB:PUTON!(Object, Object)
22. how many things are on top of green cubes?
    KB:HOW_MANY?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
23. had you touched any pyramid before you put the green one on the little cube?
    TM:TOUCH?(Person, Object) && TM:AT(Time)
24. when did you pick it up?
    TM:WHEN?(E) && E=KB:PICKUP(Object)
25. why?
26. why did you do that?
    TM:WHY?(E) && E=TM:DO()
27. why did you clear off that cube?
    TM:WHY?(E) && E=KB:CLEAR_OFF(Object)
28. why did you do that?
    TM:WHY?(E) && E=TM:DO()
29. why did you do that?
    TM:WHY?(E) && E=TM:DO()
30. how did you do it?
    TM:HOW?(E) && E=TM:DO()
31. how many objects did you touch while you where doing it?
    KB:HOW_MANY?(Object) && TM:TOUCH(Object) && TM:AT(Time)
32. what did the red cube support before you started to clean it off?
    KB:WHAT?(Object) && KB:SUPPORT(Object, Object) && TM:AT(Time)
33. there were five blocks to the left of the box then.
    KB:PRESENT(Object, Location) && TM:AT(Time)
34. put the blue pyramid on the block in the box
    KB:PUTON!(Object, Object)
35. is there anything which is bigger than every pyramid but is not as wide as the thing that supports it?
    KB:PRESENT?(ObjectA) && KB:BIGGER(ObjectA, ObjectB) && !KB:WIDE(ObjectA, ObjectC) && KB:SUPPORT(ObjectA, ObjectC)
36. does a steeple
37. a "steeple" is a stack which contains two green cubes and a pyramid
    DM:DEFINE!(Word, Object)
38. are there any steeples now?
    KB:PRESENT?(Object, Time)
39. build one
40. call the biggest block "superblock"
    DM:DEFINE!(Word, Object)
41. have you picked up superblock since we began?
    TM:PICKEDUP?(Object) && TM:AT(Time)
42. why did you drop it?
    TM:WHY?(E) && E=KB:DROP(Object)
43. is there anything to the right of the red pyramid?
    KB:PRESENT?(Object, Location)
44. thank you

Here's a movie of such a SHRDLU interaction.

Many of the intents here are similar. I will now collect them, and add comments. But before I do, I must describe the prefixes I used for the modules involved:
  • KB: knowledge base, the primary database
  • DM: domain model, contains meta knowledge about the domain
  • UM: a model specific to the current user
  • TM: task manager
  • SM: smalltalk
The number of intents of a system should be as small as possible, to avoid code duplication, but no smaller than that, to avoid conditional structures inside the intents. I think an intent should not only be about the lambda function, but also about the type of output the user expects.

These then are the aspects of an NLI intent:

  • a precondition: where a simple invocation name is sufficient to wake up a skill, this will not do for a more complex NLI. A semantic condition is necessary.
  • a procedure: where a hardcoded function (the lambda function) is sufficient for a chatbot, an NLI should use parameterized procedures in a language like Datalog. Each step of the procedure must be manageable by a task manager.
  • an output pattern: different types of questions require different types of answers. It is possible that two intents only differ in the responses they give.

Verbs can be the syntactic representation of intents, but "why" can be an intent as well. I use the suffix ! for commands and ? for questions.
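These three aspects of an intent could be sketched as a data structure. This is a minimal sketch in Python; the names and fields are my own illustration, not taken from any existing system:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Intent:
    """An NLI intent: a semantic precondition, a parameterized
    procedure, and an output pattern for phrasing the response."""
    name: str                             # '!' marks a command, '?' a question
    precondition: Callable[[dict], bool]  # semantic condition on the parsed input
    procedure: List[str]                  # parameterized steps, run by a task manager
    output_pattern: str                   # template for the answer

# A hypothetical command intent: "put X on Y"
puton = Intent(
    name="KB:PUTON!",
    precondition=lambda sem: sem.get("verb") == "put" and "on" in sem,
    procedure=["find(X)", "find(Y)", "grasp(X)", "move(X, Y)", "release(X)"],
    output_pattern="OK.",
)

print(puton.precondition({"verb": "put", "on": "the green cube"}))  # True
```

Two intents could then share a precondition and a procedure and still differ, because they carry different output patterns.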

The KB (knowledge base) intents:
  • PICKUP!(Object): bool
  • STACKUP!(Object, Object): bool
  • PUTON!(Object, Location) / PUTIN!(Object, Location): bool
    For these intents different procedures are activated that include a physical act that takes time. The system responds with an acknowledgement or a statement of impossibility.
  • BUILD!(Object): bool
    Builds a compound object.
  • FIND?(Object): Object
    Searches the space for an object matching the given specifications. The object is then used by another intent.
  • PRESENT?(Object): bool
    Like FIND?, but the system responds with yes or no.
  • WHAT?(Object): Object
    Like FIND?, but the system responds with a description that distinguishes the object from others.
  • WHICH?(Object): Object
    Like WHAT?, but restricted to a specific class of objects.
  • HOW_MANY?(Object): number
    This intent involves counting objects, whereas the others deal with a single instance. Returns a number.
  • SUPPORT?(Object): bool
    Just checks whether a certain declaration exists. Many intents like this could exist, and perhaps they can be combined into a single intent PRESENT?(Predication).
The DM (domain model) intents:
  • CAN?(Predication)
    Checks key points of the predication against the set of allowed interactions in the domain model.
  • DEFINE!(Word, Object)
    Maps a word onto a semantic structure that defines it.
The UM (user model) intents:
  • OWN(Person, Object): bool
    Marks the ownership relation between the user and the object.
  • OWN?(Person, Object): bool
    Checks whether the ownership relation exists.
The TM (task manager) intents:
  • TOUCH?(Person, Object)
    Touch is an abstract verb that includes pick up, stack on, etc. It has a time frame.
  • PICKEDUP?(Object)
    Checks whether PICKUP was used.
  • WHEN?(Event)
    Returns a time index. It is described to the user as the event that happened at that time.
  • WHY?(Event)
    Returns the active goal of an event.
  • WHY?(Goal)
    Returns the parent goal of a goal. The top-level goal is "because you asked me to".
The SM (smalltalk) intents:
    There is no meaning involved; just a canned response.
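The WHY? intents of the task manager could be sketched as a walk up a goal tree. This is my own illustration of the idea, with made-up goal names, not SHRDLU's actual mechanism:

```python
# Each executed event is linked to the goal that caused it, and each
# goal is linked to its parent goal (None marks a top-level goal).
goal_parent = {
    "grasp(B5)": "clear_off(C1)",
    "clear_off(C1)": "puton(B3, C1)",
    "puton(B3, C1)": None,  # came directly from the user
}

def why(goal: str) -> str:
    """WHY?(Goal): return the parent goal, or the canned top-level answer."""
    parent = goal_parent.get(goal)
    return parent if parent else "because you asked me to"

# Repeatedly asking "why?" climbs the goal tree:
print(why("grasp(B5)"))      # clear_off(C1)
print(why("clear_off(C1)"))  # puton(B3, C1)
print(why("puton(B3, C1)"))  # because you asked me to
```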


I know I sound very deliberate in this post, as if I know exactly what's going on and how it should be. That's just my way of trying to make sense of the world. In fact I just stumbled on this approach to NLI and it's very new to me. Also, my analysis of SHRDLU's "intents", which it never had, may be completely off track. One would only know if one tried to build a new SHRDLU around this concept.

I think chatbots form a great way to get involved in natural language processing.

Saturday, July 8, 2017

Zen's Prime Directive

I am watching the seventies TV series Blake's 7 on DVD once more. It's great drama, although some of the visual effects are now very outdated. One of the things I like best are the human-computer dialogs. The author of the series, Terry Nation, really got inside the skin of the computer.

There's one particular dialog that I want to write out here, since it illustrates an aspect of needs based agents that I was talking about. It is from the first series, episode 10, "Breakdown". It's a dialog between Blake and Zen, with a surprised remark from the technical genius, Avon, in between.
--- --- --- --- --- --- --- 
Blake: Zen. Set navigation computers for direct route to space laboratory XK72. Speed Standard by Six.
Zen: Rejected.
Avon: You CANNOT reject a direct command!
Blake: Justify that rejection, please.
Zen: Your command reduces to an order to self-destruct. This runs counter to Prime Directive.
For a visual, here's an image of Blake talking to Zen.

Blake gives Zen a command. Zen has command over all computers of the ship, it is the task manager. Blake tells Zen to delegate the command to the navigation computers of the ship.

Zen rejects. Note that rejection is a higher order function of a system. Most computers just do as they are told. Not Zen. This surprises Avon, who calls out that it is impossible for a computer to reject a command. To Avon Zen is "just a computer". But Zen is a needs based agent.

Blake asks Zen to justify the rejection. This is an introspective request. It asks Zen to inspect its own prior reasoning as an intelligent agent.

Zen explains its reasoning efficiently. It says that the command, which includes going through a part of space about which it is only known that it is extremely dangerous (it later appears to be some sort of vortex), would very likely destroy the ship. Zen checks its projected actions against its directives (in which I see the concept of needs). It finds that the command runs counter to its Prime Directive (which is probably to stay alive). Therefore it rejects the command.

As Blake and his crew manually override Zen and continue through this part of space, Zen shuts down all of its functions. It refuses to cooperate. If Zen were more like HAL (in 2001: A Space Odyssey), it would have actively tried to counteract the actions of the humans.

Note that in Star Trek there's also a Prime Directive. There it is a guiding principle for space explorers not to interfere with alien civilizations. I consider both Zen's Prime Directive and that of Star Trek to be needs.

Sunday, June 25, 2017

An introduction to CALO

From 2003 to 2008 the American Defense Advanced Research Projects Agency (DARPA) funded the largest research effort for cognitive systems in history. It involved 250 people from 25 universities and companies. Over 500 articles were written about it. It was named PAL, for Personalized Assistant that Learns. The aim of this program was described as follows:
Through the PAL program, researchers will develop software that will function as an "enduring personalized cognitive assistant" to help decision-makers manage their complex worlds of multiple simultaneous tasks and unexpected events. 
The DARPA grant was divided over Carnegie Mellon University (CMU) and Stanford Research Institute (SRI).

The CMU project was dubbed RADAR, for Reflective Agents with Distributed Adaptive Reasoning. The system's eventual goal was described as follows: "The system will help busy managers to cope with time-consuming tasks such as organizing their E-mail, planning meetings, allocating scarce resources such as office space, maintaining a web site, and writing quarterly reports. Like any good assistant, RADAR must learn by interacting with its human master and by accepting explicit advice and instruction." [PAL1]

The SRI project was named CALO (Cognitive Agent that Learns and Organizes). It involved contributions from 20 additional research institutions. Its goal was broader than RADAR's. "The name was inspired by the Latin word “calonis,” which means “soldier’s assistant.” The CALO software, which will learn by working with and being advised by its users, will handle a broad range of interrelated decision-making tasks that have in the past been resistant to automation. It will have the capability to engage in and carry out routine tasks, and to assist when the unexpected happens." [PAL1]

This article is just a short introduction to CALO. Most of its information is derived from SRI's PAL website [PAL2]. The website emphasizes the learning aspect of the framework. Here, I focus on the relation between the components.


To the user CALO consists of several applications. These applications however, are integrated into a common knowledge base, and governed by an autonomous agent that aims to assist the user in common tasks.

The following image from [CALO1] is an overview of the cognitive nature of CALO.

Machine learning ("learning in the wild") was an essential part of each CALO component. It had to be able to learn all kinds of new information without adding lines of code.

These are some of the types of learning strategies involved:

  • learning by instruction (Tailor)
  • learning by discussion (PLOW)
  • learning by demonstration (LAPDOG)
  • learning by observation (Markov Logic Nets)

Different CALO components used different ontologies, but all were designed around a core Component Library (CLib): "The CLib is encoded in the Knowledge Machine knowledge representation language, with a subset being automatically transformable into the popular Web Ontology Language (OWL) of the Semantic Web framework." [CALO-MA3]

[CALO1] Building an Intelligent Personal Assistant (2006)

Application Suite

CALO grew into a suite of applications grouped around three centers:
  • PExA, the Project Execution Assistant
  • CALO-MA, the Meeting Assistant
  • IRIS, the Information Assistant
From a cognitive agent perspective, PExA is the central component.

PExA - Project Execution Assistant

"PExA focuses on two key areas: time management and task management. Time management refers to the process of helping a user manage actual and potential temporal commitments. Time management critically involves meeting or appointment scheduling but further includes reminder generation and workload balancing. Task management involves the planning, execution, and oversight of tasks. Such tasks may be personal in that they originate with the user, or they may derive from responsibilities associated with a project." [PEXA2]

PExA contains several tools that allow it to learn new procedures, which can then be used by SPARK, the task manager.

PExA Architecture

PExA includes the following applications
  • SPARK, the Task Manager
  • PTIME, the Time Manager
  • EMMA, the Event Management Assistant (with PTIME)
  • EMP, the Execution Monitor and Predictor (ProPL)
  • Towel, the Task Management / ToDo UI
  • BEAM, the ToDo Interpreter
  • SEAR, for State Estimation
  • LEPT, the Duration Learner
  • Query Manager
  • ICEE, the Task Explainer
  • Machinetta, the Team Coordinator
  • Tailor, a Procedure Learner
  • EzBuilder, a Procedure Learner
  • LAPDOG, a Procedure Learner
  • PrimTL, a Procedure Learner
  • PLOW, a Procedure Learner
Each user had its own CALO. Machinetta took care of inter-CALO communication. [CALO1]

CALO-MA - Meeting Assistant

"The meeting assistant is designed to enhance a user’s participation in a meeting through mechanisms that track the topics that are discussed, the participants’ positions, and resultant decisions." [PEXA2]

"Our efforts are focused on assisting artifact producing meetings, i.e. meetings for which the intended outcome is a tangible product such as a project management plan or a budget." [CALO-MA1]

The Meeting Assistant used the Multimodal Discourse Ontology (MMD).

The meeting assistant architecture

CALO-MA includes the following applications
  • Dialog Manager
  • Meeting Browser
  • 2-D Drawing Recognizer
  • 3-D Gesture Recognizer

IRIS - Information Assistant

"IRIS is an application framework for enabling users to create a “personal map” across their office-related information objects. IRIS is an acronym for "Integrate. Relate. Infer. Share."" [IRIS1]

IRIS collects raw data from several other applications on the user's desktop, and integrates this data into its own knowledge base, using CLib based ontologies.

The three layer IRIS integration framework

It contains a plugin framework; plugins had been created for the following applications:
  • Email (Mozilla)
  • Browser (Mozilla)
  • Calendar (OpenOffice Glow)
  • Chat (Jabber)
  • File Explorer (custom made)
  • Data editor/viewer (Personal Radar)
Adam Cheyer was the main contributor to IRIS. He left the CALO project in 2007 to cofound the CALO spin-off Siri. [SIRI1]

In 2007 IRIS, a Java application, was rewritten as a Windows application and renamed to CALO Express. [IRIS4]

CALO Express further contains the applications:
  • Presentation Assistant (for quick creation of presentations)
  • PrepPak (find, gather, and store relevant resources for common office tasks such as attending an upcoming meeting)
[IRIS1] IRIS: Integrate. Relate. Infer. Share. (2005)
[IRIS2] A Case Study in Engineering a Knowledge Base for an Intelligent Personal Assistant
[IRIS3] Extracting Knowledge about Users’ Activities from Raw Workstation Contents (2006)
[IRIS4] Adam Cheyer personal website

[SIRI1] SIRI RISING: The Inside Story Of Siri’s Origins (2013)


SPARK is the central component of PExA.

"At the heart of CALO’s ability to act is a Task Manager that initiates, tracks, and executes activities and commitments on behalf of its user, while remaining responsive to external events. The Task Manager component of CALO is based on a reactive execution system called SPARK" [SPARK2]

"There is a need for agent systems that can scale to real-world applications, yet retain the clean semantic underpinning of more formal agent frameworks. We describe the SRI Procedural Agent Realization Kit (SPARK), a new BDI agent framework that combines these two qualities." [SPARK1]

SPARK Architecture

SPARK is a BDI (Belief-Desire-Intention) agent. Its central function is described as "Each agent maintains a knowledge base (KB) of beliefs about the world and itself that is updated both by sensory input from the external world and by internal events. The agent has a library of procedures that provide declarative representations of activities for responding to events and for decomposing complex tasks into simpler tasks. At any given time the agent has a set of intentions, which are procedure instances that it is currently executing. The hierarchical decomposition of tasks bottoms out in primitive actions that instruct effectors to bring about some change in the outside world or the internal state of the agent. At SPARK’s core is the executor whose role is to manage the execution of intentions. It does this by repeatedly selecting one of the current intentions to process and performing a single step of that intention. Steps generally involve activities such as performing tests on and changing the KB, adding tasks, decomposing tasks by applying procedures, or executing primitive actions." [SPARK1]
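The executor cycle quoted above can be sketched roughly in Python. This is an illustration of the idea only; all names are my own, and SPARK's actual procedure representation is much richer:

```python
import random

class Agent:
    """A rough sketch of the SPARK executor cycle described above."""

    def __init__(self, procedures):
        self.kb = set()                # beliefs about the world and itself
        self.procedures = procedures   # complex task -> list of simpler tasks
        self.intentions = []           # procedure instances being executed

    def add_task(self, task):
        self.intentions.append([task])

    def step(self):
        """Select one current intention and perform a single step of it."""
        if not self.intentions:
            return
        intention = random.choice(self.intentions)
        task = intention.pop(0)
        subtasks = self.procedures.get(task)
        if subtasks:                   # decompose a complex task into simpler ones
            intention[:0] = subtasks
        else:                          # primitive action: bring about some change
            self.kb.add(task)
        if not intention:              # intention completed
            self.intentions.remove(intention)

agent = Agent({"make_tea": ["boil_water", "pour_water"]})
agent.add_task("make_tea")
while agent.intentions:
    agent.step()
print(sorted(agent.kb))  # ['boil_water', 'pour_water']
```

Note how the hierarchical decomposition bottoms out in primitive actions, exactly as the quote describes, and how each call to step advances only one intention by one step.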

[SPARK1] The SPARK Agent Framework (2004)
[SPARK2] Task Management under Change and Uncertainty Constraint Solving Experience with the CALO Project (2005)
[SPARK3] SPARK website
[SPARK4] Balancing Formal and Practical Concerns in Agent Design (2004)
[SPARK5] Introduction to SPARK (2004)

Ending Notes

I wrote this piece because I believe CALO is worth studying. It holds gems in various fields of AI. At the same time, the huge number of papers written about it may be daunting to the newcomer. I just hope this blog post has opened a door.

Sunday, June 11, 2017

BDI and PRS, theory and practice

The NLI system CALO I am examining uses a BDI system (named SPARK) that is a successor to SRI's PRS. In order to understand SPARK, and because PRS is historically significant, PRS is worth studying.


A BDI system has its philosophical roots in the Belief Desire Intention theory of Michael E. Bratman. In this theory of human rationality, formulated in the eighties, he proposes a third mental attitude, next to Beliefs and Desires, namely Intentions. It is laid down in his book Intention, Plans, and Practical Reason (IPPR).

BDI systems are attractive because of their ability to adapt their plans to new circumstances. Since they are rational agents, one can inspect their course of action to find out why each decision was made.

Michael E. Bratman

It's very important to get the concepts straight, so I will try some definitions myself. The most important concepts are Belief, Desire, Intention, Goal, Plan, and Action. Since this theory is about humans, the concepts should be expressed in human terms, not in computational terms.


A belief is a bit of information about something, held to be true by the person. It may be information about the world or about the person itself. It differs from a fact in that it isn't necessarily actually true; it is just held as true by the person.

Inference-type beliefs can be used to form plans.

  • it rains
  • I am cold
  • bus line 4 will get me from home to Tanner Library
  • the use of my computer will heat up the room


A desire is a future state that the person wants to achieve. The desire itself can exist without the intent of actually executing it, there may not even be a plan or possibility to bring it about. Where I said "want" Bratman prefers the term "pro-attitude", a container that includes wanting, judging desirable, caring about, and others. (The future state is occasionally referred to as the goal.)

  • to go to Tanner Library
  • a milkshake for lunch
  • find the best price
  • go to the party
  • become rich

Conflicting Desires: Choice

Desires can conflict. The desire to drink a milkshake conflicts with the desire to lose weight, for example. Or two desirable events take place at the same time. The person needs to choose between such desires.


An intention is the possible practical consequence of a desire. It is also a future state that the person wants to achieve, but the difference with a desire is that an intention involves a commitment and a plan. The commitment implies that, barring interventions, the intention will be realized. The plan is simply a prerequisite to achieve it. The intention controls the execution of the plan, every step of the way. [IPPR, p16] An intention also has inertia, which means that one intention won't just be dropped for the next one. It strives for the plan to be executed to completion.

  • to take bus line 4 from home to Tanner Library


A plan is the means by which an intention is executed. A plan is not just a general recipe, a scheme, or a procedure, but a concrete sequence of steps adapted to the specific circumstances at hand. A plan inherits the characteristics commitment and inertia from its intention. The plan may be partial, or incomplete. Plans are hierarchical. A plan consists of steps and subplans. [IPPR, p29]

  • take bus (line 4, from home, to Tanner) => go to the bus stop, enter the bus, pay the conductor, have a seat, get out at the nearest bus stop to the target, walk to the target


Planning is the process of taking some inference rules and plan schemes and turning them into a plan. But there's a catch: plans must be consistent. There must be no contradictions between one plan and another, nor between a plan and some belief.

  • I need a means to go to Tanner. This can be by bus or by car. But I am already planning to leave my car at home for Susan to use. [IPPR, p33]

Expected Side Effects

Side effects are the results of the actions following the intention that are not part of the intention per se. Some of these are purely accidental, some are expected. These side effects may well be undesirable for the person. When planning, a person needs to take into account that the side effects do not conflict with his desires (or those of other agents).


The term Goal does not have a special position in IPPR. It is not named in the first chapters. When it does occur, in chapter 9, when dealing with other frameworks, it just refers to the object of a desire [IPPR, p131], or serves as an alias for desire [IPPR, p33].


PRS is the first implementation of BDI. It was written by Michael Georgeff, Amy L. Lansky,  François Félix Ingrand, and others at SRI. It was used for fault diagnosis on the Space Shuttle. I was interested in how it worked and I found a very clear document, Procedural Reasoning System, User's Guide, A Manual for Version 2.0. There's also a paper that appears to be the original document on PRS: Reactive Reasoning and Planning, which is exceptionally clear. I will use the naming from that early paper.

Amy L. Lansky and Michael Georgeff

PRS explicitly represents its goals, plans and actions. It is a reactive planner. It creates partial plans and is able to change them if new input comes in. But let's see how it implements the ideas of Bratman: BDI.


Beliefs are implemented as a database of state descriptions. Some beliefs are built-in. Others are derived by PRS in the course of action: observations and conclusions from these observations.

Procedural knowledge (which is a belief in BDI) is implemented as a KA (for Knowledge Area, a plan schema).


Desires are called goals. Goals do not represent static world states, but rather desired behaviors. Goals are collected on the goal stack.


Intentions are implemented as a process stack of active KAs.


Plans are called Active KAs, and this probably needs some explanation. Whereas Bratman focused on plans as active mental states, the focus of PRS is on the declarative definition of the procedures. This leads to the following mapping:


  • procedural belief -> KA / procedure
  • plan -> Active KA
The body of a KA consists of sequences of subgoals to be achieved. A KA also has an invocation condition, a logical expression that describes under which conditions it is useful.


PRS contains a large amount of procedural knowledge. This is knowledge in the form: to do A, follow steps B, C, and D. In such a procedure each of the steps can itself be a subgoal that will in turn need other procedures. This procedural knowledge is stored in KAs (Knowledge Areas). KAs are hierarchical, partial plan schemas.

Each goal is executed by applying one KA to the variables of the goal. Thus, an applied KA is formed.

PRS' System Interpreter only expands a subgoal of an active KA when it is the active step. It does not plan ahead. It does not start expanding subgoals in order to make the plan more complete. And because it does not plan ahead, it does not need to adjust its plans given changing circumstances.

Sometimes planning ahead is necessary, of course. When planning a route, for example. PRS does not use its central planning system to plan ahead. For the route planner, a KA has been defined that essentially consists of these steps:
  • create a route-plan from here to T (and place it in the database)
  • convert the route-plan in a sequence of subgoals
  • execute the next subgoal (one subgoal at a time) until it is done

Expected Side Effects

The side effects of the KAs must be considered by their designer.

The System Interpreter

The interpreter "runs the entire system". Each cycle it performs these tasks:
  • selection of a new KA, based on goals and beliefs, and placing it on the process stack
  • executing the next KA from the process stack
Executing a KA causes
  • the creation of new beliefs (added to the database)
  • the creation of new subgoals (added to the process stack)
  • the creation of new goals (added to the goal stack) 
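The interpreter cycle above could be sketched as follows. Each cycle selects a new KA whose invocation condition matches the current goals and beliefs, places its body on the process stack, and executes one step. The KA, its condition, and the goal names are purely illustrative; real KAs are graphical, partial plan schemas:

```python
# One hypothetical KA: "to warm up, close the window and turn on the heater"
kas = [
    {"goal": "warm_up",
     "condition": lambda beliefs: "cold" in beliefs,  # invocation condition
     "body": ["close_window", "turn_on_heater"]},
]

def cycle(goals, beliefs, stack):
    # 1. select a new KA, based on goals and beliefs, place it on the stack
    for ka in kas:
        if ka["goal"] in goals and ka["condition"](beliefs):
            goals.remove(ka["goal"])
            stack.extend(reversed(ka["body"]))
    # 2. execute the next step from the process stack (a primitive action
    #    here; in PRS a step may itself be a subgoal needing another KA)
    if stack:
        step = stack.pop()
        beliefs.add(f"done({step})")

goals, beliefs, stack = ["warm_up"], {"cold"}, []
for _ in range(3):
    cycle(goals, beliefs, stack)
print(sorted(b for b in beliefs if b.startswith("done")))
# ['done(close_window)', 'done(turn_on_heater)']
```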

Final Words

BDI is cool and PRS is fascinating. 

Note: planning ahead and plan execution are two different things. Some tasks require some planning ahead, like making a reservation. An agent must plan ahead while it is executing this plan and other plans. The two tasks are done by different subsystems of the agent.

Bratman does not deal with the priorities of desires in IPPR. Priorities may be assigned to intentions in PRS, however.

Saturday, June 3, 2017

Some Thoughts on the Cognitive Agent

I will use this post to try and structure my incoherent thoughts about the structure of cognitive agents.

The Theater

At the heart of the cognitive agent should be a theater, as proposed by Bernard J. Baars in his Global Workspace Theory. The theater represents human working memory, conscious activity (the spotlight on the stage) working serially, and the many unconscious processes working in parallel.

In this theater, decisions are made. Tasks are selected. It is a Task Manager.


The agent has needs. These are hard-coded, like the need to help the user fulfill his goals, or the need for self-preservation. These needs define the soul of the agent.

The idea of needs is from Maslow, designed for humans, but can be applied to other agents too, as this image illustrates:

The need of helping the user looks like this:
The number of unresolved user requests should be 0.
Importance: 0.8
The needs of an agent drive it. They determine what goals it will create and their relative importance.

"Needs" may alternatively be called "drives" or "concerns".


Needs lead to goals, based on input. If the user asks a question, for example, this will be interpreted as a user request:
User Request: "Is there a restaurant here?"
Urgency: 0.2
If the user makes a command, it will be treated the same way:
User Request: "Send me an e-mail if anyone mentions my website"
Urgency: 0.1
The question named here is a request that may be answered immediately. The command, however, becomes a permanent request.

The agent does not execute the user's request just because it's there. Executing the request becomes a goal only because one of the agent's needs is to fulfill the requests of its user.

The agent has sensors that notice that its needs change. These sensors create active goals (instantiated from a template, its variables bound). These goals are entered into the Task Manager.


In order to reach a goal, the agent needs to follow a plan. In the restaurant example, the agent may have a plan consisting of these steps:

  • Find current location (L)
  • Find user food preference (P)
  • Find restaurants of type P within distance D of L
  • Present the restaurants to the user

The plan is instantiated and handed over with the goal to the Theater. Note: no task is treated as something that may be executed immediately, synchronously. Every task needs to go through the theater.

Note that all cognitive activity takes the form of plans and tasks. Yes, that includes all deduction tasks. It presumes a Datalog type of knowledge representation, a logical form that is procedural in nature.
goal :- location(L), preference(P), restaurants(P, L, R), show(R).
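As an illustration, the Datalog-style goal above could be executed as sequential resolution with variable binding. The module functions here are hypothetical placeholders:

```python
# Hypothetical module functions; in a real agent each call would be a
# task that passes through the Theater.
def location():           return "Main Street"
def preference():         return "italian"
def restaurants(p, l):    return ["Luigi's", "Trattoria Roma"]
def show(r):              return "Restaurants nearby: " + ", ".join(r)

def run_goal():
    L = location()         # Find current location (L)
    P = preference()       # Find user food preference (P)
    R = restaurants(P, L)  # Find restaurants of type P near L
    return show(R)         # Present the restaurants to the user

print(run_goal())  # Restaurants nearby: Luigi's, Trattoria Roma
```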

Theater: Task Selection

The Theater, as a Task Manager, keeps track of all active plans. Each active plan has an active task.
At any point multiple active tasks may "scream for attention".

Now, the system as a whole may do many things simultaneously. But the thing is, each of its modules can only do one thing at a time. So when a given module is busy, it cannot handle another task at that time.

The Task Manager knows which modules are active and which ones are idle.

All active tasks scream with a loudness of:
Priority = Importance * Urgency
The task with the highest priority is selected by the Task Manager. This task is placed in the spotlight. TM asks all idle system modules to handle the task. The first one gets it. Or the one that bids most (as in an auction).

The Task Manager hands the task over to the module, and places it on the waiting list. TM will receive a "done" event from the module later, at which point it binds the result values to the active plan and advances the active plan to the next task. When all tasks are done, the active plan is removed.
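The selection rule above could be sketched like this, with hypothetical tasks and numbers:

```python
# Each active task screams with Priority = Importance * Urgency;
# the Task Manager places the loudest one in the spotlight.
tasks = [
    {"name": "answer_restaurant_question", "importance": 0.8, "urgency": 0.2},
    {"name": "watch_website_mentions",     "importance": 0.8, "urgency": 0.1},
    {"name": "self_diagnostics",           "importance": 0.3, "urgency": 0.9},
]

def select_task(tasks):
    return max(tasks, key=lambda t: t["importance"] * t["urgency"])

print(select_task(tasks)["name"])  # self_diagnostics
```

Note that a less important task can still win the spotlight when its urgency is high enough.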

Dialog Manager

The task "find user preference for restaurant" may have two implementations:
  • find preference in user database
  • ask user for preference
If the first task fails, because the preference has not been asked before, the second task will be performed. It is performed by the Dialog Manager. The DM is "just another module" of the system.

As soon as the "ask user for preference" task enters the Dialog Manager, it becomes active, and will remain active until it has received a reply from the user. This way, the system will not ask several questions at once.

Likewise, if the user has just asked something, DM will start active mode, and will first reply to the user before starting a question of its own.

The Dialog Manager may also initiate conversation on its own, based on a dialog plan. It cannot just start by itself, however. It needs to send a goal and a plan to the Task Manager, and request attention.

Episodic Memory

All active goals, plans and tasks are logged in episodic memory by the Task Manager. Or EM simply listens to TM. Storage occurs in a structured way. For every task, it is possible to retrieve its plan, its goal, and, ultimately, its need.

The reason for this is that Episodic Memory thus becomes a knowledge source for auditing. It allows the user to ask the question: "Why did you do that?"

This is also an important reason why all tasks need to go through the Task Manager. This way all activated tasks are accounted for in a central place.


Remember the sensors that register that needs change? They can start emotions too. To do that, they use two types of parameters:
  • deviation from the norm
  • time left to deadline
The norm in the example of assisting the user is 0 requests. If the number of requests becomes larger, like 5 or so, the need to assist the user becomes strained. In order to correct it, the importance of the need is temporarily increased. Remember the "importance" of a need? This is not a fixed value; it has a base value and increases as the need is strained.
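This could be sketched as follows. The formula and the numbers are my own guess at the idea, not taken from any source:

```python
def effective_importance(base: float, norm: float, actual: float,
                         sensitivity: float = 0.1) -> float:
    """Importance has a base value and increases as the need is strained,
    i.e. as the actual value deviates from the norm. Capped at 1.0."""
    deviation = abs(actual - norm)
    return min(1.0, base + sensitivity * deviation)

print(effective_importance(0.8, norm=0, actual=0))  # 0.8 (relaxed)
print(effective_importance(0.8, norm=0, actual=5))  # 1.0 (strained: 5 open requests)
```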

If this does not sound very emotional, that's because it is just the arousal aspect of emotion. It causes one to become more excited in a certain area. Arousal is involuntary; it does not need to go through the theater to become active.

It is also possible to give the agent emotional expressions. It may start yelling to hurry up or to shut the pod bay doors. These expressions become (very) conscious and need to pass through the Theater in order to become active.

I suggest reading The Cognitive Structure of Emotions if you want to know more.

In Closing

I made some progress. Things are starting to make sense.
