Saturday, 25 January 2020

Combining the data of multiple databases in a single natural language query

In our time we have at our disposal a growing number of open data sources. We can query these individually, and that's nice. But wouldn't it be awesome if we could make queries that combined information from diverse data sources at once?

Suppose that you have two or more open databases at your disposal. One contains information about the relations between famous persons, the other about books and authors. What if you were to ask:
Did Charles Darwin's grandfather write "The Loves of the Plants"?

Database 1 might hold

parent(`a11`, `a34`)
parent(`a34`, `a71`)
name(`a11`, "Erasmus Darwin")
name(`a71`, "Charles Darwin")

and database 2 might hold

author(`x21-r11`, `w909`)
person(`x21-r11`, "Darwin", "Erasmus")
work(`w909`, "The Loves of the Plants")

You can see that one part of the information necessary to answer the question is available in database 1, while the other part is present in database 2, but it is not clear how these parts could be combined. Is it possible to combine the two to answer the question?

Shared identities

I think the answer to this problem can be found in what I will call shared identities: database independent ids that can be used to reason about entities that are present in more than one database.

The shared identity consists of an id and a number of database ids. The shared identities that would be used in the example above could be

[type: person, shared-id: darwin-105, db-ids: {db1: a71}]
[type: person, shared-id: darwin-99, db-ids: {db1: a34}]
[type: person, shared-id: darwin-21, db-ids: {db1: a11, db2: x21-r11}]
The shared identity has an id ("darwin-21") that is present in neither database, yet it can be used to populate the query. At the same time the ids of the databases where this entity is represented are available ("db1", "db2").

How does that work?

When the question is processed, a system will first look up the names "Charles Darwin" and "The Loves of the Plants" in the databases. This results in "a71" from db1 and "w909" from db2.

Next, the system will look up the shared identities for these two entities. Now most entity types do not have shared identities. "Work", for example, only occurs in database 2, so there's no need for an extra super-database construct. The id from db2, "w909", can be used as the shared identity for the work "The Loves of the Plants". Persons, on the other hand, are represented in both databases, and so we look up the shared identity for "a71", which is "darwin-105".

Our query Did Charles Darwin's grandfather write "The Loves of the Plants"? will then look simply like this:

parent(E1, E2)
parent(E2, `darwin-105`)
author(E1, `w909`)

Or: did the parent of the parent of `darwin-105` author `w909`?

When this query is processed, the system will present the relations to the underlying databases.

parent(E2, `darwin-105`)
contains a shared identity. Its database id will be looked up and substituted before the relation is passed to the database.

parent(E2, `a71`)
The database will return `a34` and the system will look up its shared id: `darwin-99`. E2 will be bound to `darwin-99`.

The system will work with shared identities everywhere, except when it deals with an individual database directly.

parent(E1, E2)
will then be

parent(E1, `darwin-99`)

and E1 will be found to be `darwin-21`. In order to resolve author(`darwin-21`, `w909`), `darwin-21` will need to be transformed into the database id `x21-r11`.
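The translation back and forth can be sketched in a few lines (a Python sketch using the example's ids; the function names and data layout are my own, not part of any real system):

```python
# Mapping tables for the "person" type: shared id <-> database id, per database.
person_map = {
    "db1": {"darwin-105": "a71", "darwin-99": "a34", "darwin-21": "a11"},
    "db2": {"darwin-21": "x21-r11"},
}

def to_db_id(db, shared_id):
    # types that live in a single database just use the database id itself
    return person_map.get(db, {}).get(shared_id, shared_id)

def to_shared_id(db, db_id):
    for shared, local in person_map.get(db, {}).items():
        if local == db_id:
            return shared
    return db_id

# db1's parent facts: parent(X, Y) means X is a parent of Y
db1_parent = [("a11", "a34"), ("a34", "a71")]

def query_parent(child_shared_id):
    # translate the shared id to db1's own id before querying db1
    db_id = to_db_id("db1", child_shared_id)
    parents = [x for x, y in db1_parent if y == db_id]
    # translate the results back to shared ids for further reasoning
    return [to_shared_id("db1", x) for x in parents]

print(query_parent("darwin-105"))  # ['darwin-99']
print(query_parent("darwin-99"))   # ['darwin-21']
```

Resolving author(`darwin-21`, `w909`) would use the same translation in the other direction: to_db_id("db2", "darwin-21") yields `x21-r11`.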

Who creates the shared identities?

Shared identities are stored as mapping tables: database-id <==> shared id, one table for each entity type in a database.

Only entities that live in multiple databases need shared identities. For entities that live in a single database we can still use the term "shared identity" but only use the database id. No mapping tables are used.

The mapping tables can be constructed manually or be generated by rules or heuristics. It must be done off-line, not during the query, because it takes too much time.

The tables used in the example above can be constructed as follows: for each person, take the name from db1 and check whether the same name occurs in db2, by concatenating first name and last name. If so, add an entry to the person mapping tables of both db1 and db2, using each database's unique id. Make up a shared identity; its exact value doesn't matter.

db1: darwin-21 = a11
db2: darwin-21 = x21-r11

If a person occurs in only one database, just add the mapping for that single database.
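The construction can be sketched like this (Python; the matching heuristic of exact full-name equality is just the one from the example, and real heuristics would be fuzzier):

```python
# Offline construction of the person mapping tables by name matching.
db1_names = {"a11": "Erasmus Darwin", "a71": "Charles Darwin"}
db2_persons = {"x21-r11": ("Darwin", "Erasmus")}   # (last name, first name)

db1_map, db2_map = {}, {}
for n, (id1, full_name) in enumerate(db1_names.items(), start=21):
    shared = f"darwin-{n}"            # made up; the exact value doesn't matter
    db1_map[shared] = id1             # every person gets a db1 entry
    for id2, (last, first) in db2_persons.items():
        if f"{first} {last}" == full_name:
            db2_map[shared] = id2     # same shared id in db2's table

print(db1_map)  # {'darwin-21': 'a11', 'darwin-22': 'a71'}
print(db2_map)  # {'darwin-21': 'x21-r11'}
```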

Wednesday, 16 October 2019

Translation in PHP using class constants

Translation in PHP frameworks is done in several different ways, but in all of them (?) the source text is a string. The translations are stored in CSV files, .mo files or in PHP arrays.

I propose to store them in class constants. This makes loading translations faster and creates a better integration with the IDE.

Translation classes

In this proposal a translation source file looks like this:

class CatalogTranslator extends Translator
{
    const price = "Price";
    const name = "Name";
    const nTimes = "%s times";
}

Each module has its own source file. This one belongs to the module Catalog. Note that we now have a source (i.e. nTimes) and a default translation ("%s times" in English, in this application).

The translation file (for the main German locale) is like this:
class CatalogTranslator_de_DE extends CatalogTranslator
{
    const price = "Preis";
    const nTimes = "%s Mal";
}
Note that the translation file extends the source file. This translation has two entries, but lacks a translation for "name". That's OK; the value is inherited from its base class.

Translation calls

A function that needs translation functionality starts by creating an instance of a translation class for this module:
$t = CatalogTranslator::resolve();
And here is an example of an actual translation call:
echo $t::name;
There it is: $t::name. It is a constant from a class that was resolved by the CatalogTranslator::resolve() call. When the request's current locale is de_DE, the class will be CatalogTranslator_de_DE; when the locale is fr_FR, it will be CatalogTranslator_fr_FR. Which of the classes is selected is determined by the resolution function resolve.

Translator class resolution

Did you notice that CatalogTranslator extends the class Translator? This base class contains the static resolve function that determines the runtime translation class:
class Translator
{
    /**
     * @return $this
     */
    public static function resolve() {
        return $xyz->getTranslator(static::class);
    }
}
The @return $this annotation tells the IDE that an object of class CatalogTranslator is returned. A subclass of it, in fact.

$xyz stands for some object in your framework that defines the getTranslator function. Important is only that the runtime class (static::class) is passed to determine the name of the active translation class.

The function getTranslator looks like this:
class Xyz
{
    protected $translatorClasses = [];

    public function getTranslator(string $class): Translator
    {
        if (!array_key_exists($class, $this->translatorClasses)) {
            $locale = $this->getLocale();
            $translatorClass = $class . '_' . $locale;
            $this->translatorClasses[$class] = new $translatorClass();
        }
        return $this->translatorClasses[$class];
    }
}
As you can see, the resolver fetches the active locale (again, your framework will have a different approach) and simply appends it to the runtime class name of the translator:

CatalogTranslator + de_DE = CatalogTranslator_de_DE
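The same fallback mechanics can be sketched outside PHP as well; here is a Python analogue (class attributes stand in for class constants, and the resolver is simplified to a globals() lookup):

```python
# Translations as class attributes: a missing translation falls back to the
# base class, and the locale selects the runtime class by name.
class CatalogTranslator:
    price = "Price"
    name = "Name"
    nTimes = "%s times"

class CatalogTranslator_de_DE(CatalogTranslator):
    price = "Preis"
    nTimes = "%s Mal"        # "name" is inherited from the source class

def resolve(base, locale):
    # BaseName + "_" + locale, falling back to the source class itself
    return globals().get(f"{base.__name__}_{locale}", base)

t = resolve(CatalogTranslator, "de_DE")
print(t.price, t.name)       # Preis Name  (fallback for "name")
```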

Advantages and disadvantages

An advantage of this approach is that the translations do not need to be loaded into memory on every request; they already reside in the opcache. Another is that translations are clearly scoped to the module they are defined in: no naming conflicts with other modules. Further, this approach is IDE friendly. Just control-click on the constant and you are in the source file. From there you can find out where this source text is used (Find Usages) and which translations there are (Is Overridden) using standard IDE functionality.

Disadvantages: the use of PHP constants to store translations will not be familiar to developers and translation agencies. It is easier to just type a source text directly in source code than to have to define a constant. Also, module-scoped translations mean that you may have to translate the same text several times in different modules. But if this is your only concern, you can adapt the approach into a single set of system-wide translation classes.

Wednesday, 15 May 2019

Natural language interaction in flowcharts


In this post I will sketch some of my ideas about a generic natural language interaction program in the form of a flowchart.

Natural language interaction is any form of computer program that allows you to interact with it in a complex form of natural language (i.e. English, Chinese). It is a subject I have been working on for several years.

A flowchart visualizes the flow of control of a program. It consists mainly of interconnected tasks, but I added intermediate data structures (in blue) because of their special relevance.

What you're about to see is an attempt to unify the aspects of many historical NLI systems in a single system. It's still very sketchy, but I needed to have these thoughts visualized, because it allows me to have a clearer goal.

Check this wikipedia article if you're not sure about some of the flowchart symbols.

The main loop

The main loop of the program consists (as can be expected) of a sequence of asking for input, processing it, and displaying the output.

I split the processing part into analyzing and understanding. By analyzing I mean processing the textual string into a logical, context insensitive data structure (think predicate calculus). By understanding I mean processing this logical structure into a domain-specific representation, a speech act.

Both the analysis and understanding part may fail. Analysis is language dependent. If the program supports multiple languages and analysis in one language fails, the other languages may be tried. The language that succeeds will become the default language for the next input sentence. Understanding is domain specific. If the system supports multiple domains, and understanding fails for one domain, other domains may be tried.

The parts that look like this are called predefined processes. They are placeholders for other flowcharts, shown below.

Analyse input

The analysis of input follows the standard pipeline of natural language processing.

Tokenization splits a string into words. Parsing creates a parse tree. Semantic analysis maps the forms to meaning. Quantifier scoping creates nested structures for quantifiers like "all". Pragmatic analysis resolves references like "this" and "her".

The analysis phase needs information in the form of a lexicon (word forms), a grammar (rewrite rules) and semantic attachments (from word form and phrase to meaning structure).
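As a toy illustration of these stages (the one-entry lexicon, the fake parse tree and the meaning token below are invented placeholders, not real components):

```python
# A toy version of the analysis pipeline: each stage maps the previous
# stage's output to the next representation.
def tokenize(text):
    return text.rstrip("?").split()

def parse(tokens):
    # a real parser builds a tree from grammar rewrite rules
    return {"cat": "s", "words": tokens}

def semantic_analysis(tree):
    # semantic attachments: map word forms to meaning structures
    lexicon = {"married": "marry"}
    return [lexicon.get(w, w) for w in tree["words"]]

def analyse(text):
    return semantic_analysis(parse(tokenize(text)))

print(analyse("who married Lord Byron?"))  # ['who', 'marry', 'Lord', 'Byron']
```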

Ask user

Any nontrivial system sometimes needs to ask the user to disambiguate between several possible interpretations. To do this, it is sensible to present the user with a multiple choice question (i.e. not an open question). This way, any answer the user gives is OK and will not cause a new problem in itself.

The added idea here is that a user should not be asked the same question more than once in a session. So any answer he or she gives is stored for possible later use. The store, the "dialog context", may be used for pragmatic variables as well. The active subject of a conversation, for example, which may be referred to as "he", "she" or "it", might well be stored here.

Speech act processing

This is the central component of an NLI system. The input is a logical structure that has been formatted in a way that makes sense in a certain domain. The term "speech act" fits this description well. The system then looks up a "solution", a way of handling this speech act. This solution consists of a goal (which is usually just the contents of the input, in a procedural form) and a way of presenting the results (this depends on the type of speech act, and even on the type of question). The latter thus forms a goal in itself.

The red square is used for internal data structures. The one shown here lists a number of available solutions.

Process procedure

The assumption in this post is that the internal processing of a speech act is done in a procedural way (think Prolog), where a goal is achieved by fulfilling its dependent subgoals.

The symbol

stands for a choice. One of these ways may be used to process a procedure.

The goal may just be to look up a fact in a domain database. Or it may involve reasoning or planning. Finally, it may require introspection into the system's own former goals, an advanced topic handled only by SHRDLU. I apologize for the extreme sketchiness of this part; I have never programmed "planning", nor the introspective parts.

The takeaway of this diagram is that any goal may be achieved in many ways that all require intelligence. A goal may have subgoals, and these subgoals can have subgoals of their own. The leaf nodes of such a structure consist of a simple database lookup or a simple action. An action may update a database, or perform an actual action in the world (again, SHRDLU being a perfect example).
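A caricature of such a goal tree in Python (the facts, rules and goal names are invented; a real system would also bind variables, which is omitted here):

```python
# Procedural goal processing: a goal is either a fact (a leaf lookup in the
# database) or is achieved by fulfilling all subgoals of a rule.
facts = {("block", "b1"), ("on", "b1", "table")}

rules = {
    ("clear_table",): [("pick_up", "b1")],
    ("pick_up", "b1"): [("block", "b1"), ("on", "b1", "table")],
}

def achieve(goal):
    if goal in facts:                  # leaf: a simple database lookup
        return True
    if goal not in rules:              # no way to achieve this goal
        return False
    return all(achieve(sub) for sub in rules[goal])

print(achieve(("clear_table",)))  # True
print(achieve(("fly", "b1")))     # False
```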

Database query and update

Transforming a logical structure into a database query is an art in itself, but in a simple flowchart it just looks like this:


My approach to accessing a database is to not create a grand single query that fetches all information at once. Instead, I think it works better to have many elementary queries, each selecting a single field. No aggregations.

Sunday, 2 December 2018

Matching brackets

For my work as a developer, I sometimes get the chance to interview new developers for the company. I check their resumes and I review some code they send me. We thought it would also be a good idea to give them a little programming task to do in a limited amount of time. I had no idea what kind of task to choose, so I checked Google. The following problem showed up several times, and I liked it because it requires some thinking to do it right while, at the same time, it doesn't require a lot of typing.

Write a function that checks if the parentheses of an expression match.

The task involves several types of brackets: [], () and {}. For example, the parentheses in this expression match:
While in this expression they do not:
I wrote a test function (in PHP):
function test(string $exp, bool $expected)
{
    if (matchBrackets($exp) !== $expected) {
        echo "Error matching " . $exp . "\n";
    }
}
and some tests
test("()", true);
test("()()", true);
test("noot()noot", true);
test("(aap)", true);
test("aap", true);
test("(", false);
test(")", false);
test("{[]}", true);
test("{[][]}", true);
test("{[]()}", true);
test("", true);
test("{[aap](({banaan}))}", true);
test("{[{[](({}){a[](({}))}){b[](({})){[](({c}))}}{[](({}))}}]({[]d(({}){[](({123}))})}({{[](({}))}}))}", true);
test("{[aap](({banaa))}", false);
test("{[aap]((anaan}))}", false);

Solving the problem in an elegant way requires you to use the recursive nature of the domain.

While the developer was writing the program, I tried to figure it out for myself. And it occurred to me that next to some algorithmic options, there would also be a possibility to use a regular expression. But this could only be done by applying the advanced recursive patterns feature. I had heard of this feature, but never used it.

I would certainly not demand of the developer to give such an answer, but it intrigued me too much to let it go.

It is indeed possible, and so, without further ado, I present the answer here:
function matchBrackets(string $exp): bool
{
    return preg_match('/^
        # start of group 1
        (
            (?:
                [^()\[\]{}]+ |
                \( (?1) \) |
                \[ (?1) \] |
                \{ (?1) \}
            )*
        )
    $/x', $exp);
}

I will try to explain by building the expression from the ground up.
  • The "x" modifier strips all unescaped whitespace from the pattern. This allows you to make it more readable.
  • ^ and $ match the beginning and end of $exp.
  • The parentheses after the comment create the group that will be used for recursion.
  • Each item in this group can be repeated any number of times. This also allows for a completely empty $exp.
  • Any nested group consists of a sequence of non-brackets, or a set of round brackets (), or a set of square brackets [], or a set of curly braces {}. Finally, a "bracket" group looks like this:
\{ (?1) \}
It starts with an opening bracket (here "{"), followed by a recursive group, and ends with the matching closing bracket "}".

A recursive group can repeat the entire expression; for that you would have to use (?R) instead of (?1). Here it just matches a single group (group 1) from the expression: the group that starts with the first "(" in the pattern. I do not use (?R) here because that would place the ^ and $ in the middle of the expression, and that would never give a match.

End Note

It took me hours to get this expression just right, and as simple as possible. I expected the recursive expression to be complicated, but actually it turned out that it could be made to be pretty simple. And if you're interested: yes, it is fast. It was about 8 times as fast as the algorithmic variant I came up with. For a non-compiled language like PHP regular expressions open the door to the speed of a compiled language.
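For comparison, the algorithmic variant referred to above is usually written with a stack; here is a sketch in Python (the post's own code is PHP):

```python
# The conventional stack-based check: push every opening bracket, pop on
# each closing bracket, and demand that the popped opener matches.
PAIRS = {")": "(", "]": "[", "}": "{"}

def match_brackets(exp: str) -> bool:
    stack = []
    for ch in exp:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack                 # everything opened must be closed

print(match_brackets("{[aap](({banaan}))}"))  # True
print(match_brackets("{[aap](({banaa))}"))    # False
```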

Sunday, 18 February 2018

DAYDREAMER - A Landmark Cognitive Architecture

I posted this review before at Amazon
Source code of DAYDREAMER can be found on Github

This book is a revised version of Erik T. Mueller's 1987 dissertation. It was written in 1990 and republished in 2012, by the author himself.

It is a very important book in the discipline of Cognitive Science, as I will try to explain.

Daydreaming, the subject of the book, is the process of "the spontaneous activity - carried out in a stream of thought - of recalling past experiences, imagining alternative courses that a past experience might have taken, and imagining possible future experiences."

Mueller has gone way beyond that single subject, and worked out an impressive cognitive architecture that includes many mental faculties. He supports it with a massive base of research literature. Based on this architecture is the DAYDREAMER Lisp program.


At the core of the architecture are personal goals and needs. These include achievement goals: to maintain self esteem, to have friends, and so on; and cyclic goals: food, money, entertainment.

In order to achieve these personal goals, DAYDREAMER has knowledge about "persons, physical objects, locations, possession of physical objects, ... mental states, interpersonal relationships, actions, and activities" (the "interpersonal domain"). This knowledge is implemented as planning rules and inference rules.

The program also applies these rules to other entities than itself in order to represent and make inferences on the mental states of others. This is called "other planning".

"Most AI programs take a request from a human user, handle the request, and terminate...In contrast, DAYDREAMER has a collection of goals which are instantiated and processed as the program sees fit."

A task or activity on behalf of an active top-level goal is called a "concern". Several concerns may be active at any time. "Each subsequence of contiguous representations produced on behalf of a given concern may be called a 'daydream'".

The program uses emotions to determine which concern to choose at any time ("emotion-driven planning"). Several emotions are created in different phases of a concern. For example, when a concern terminates a new positive or negative emotion is created. These emotions may trigger new concerns.

The architecture allows for two types of creative mechanisms: Serendipity and Action Mutation. Serendipity-based planning uses new input, as a fortunate coincidence, to help progress concerns that are not currently active. Action Mutation modifies events and goals in arbitrary ways (mutation) in order to find novel solutions.

Events are stored in episodic memory.

Emotions, Daydreaming goals, learning, and creativity

  • Chapter 3 shows how to represent emotions in a clear way. It continues with the description of the personal goals that are specific to daydreaming: rationalization, roving, revenge, reversal, recovery, rehearsal, and repercussions. These goals are activated when a person is idle. "Daydreaming goals are heuristics for how to exploit surplus processing time in a useful way." They are activated by emotions and directed, among others, towards the reduction of negative emotional states, and the exploration of consequences of possible actions.
  • Chapter 4 shows how a person learns from daydreaming, and the effect on his or her future behavior.
  • Chapter 5 dives into the creative processes serendipity and action mutation.


  • Chapter 6 describes the implementation of mental states and attitudes using Conceptual Dependency. 
  • Chapter 7 describes the implementation of DAYDREAMER rules and procedures in the GATE language, an AI program for the T language, a version of Scheme (Lisp). 
  • Chapter 8 describes possible episodic memory schemes, and Intersection Indexing, the technique used in DAYDREAMER.

Literature review, philosophical underpinnings, and conclusions

  • Chapter 9 is a review of the literature on daydreaming.
  • Chapter 10 deals with philosophical questions. 
  • Chapter 11 describes possible applications of the architecture.


Mueller seems to have read every single book and paper ever written on the subject, and on all adjoining fields as well. Whenever he missed a book reference, and I checked, it turned out that that book hadn't been written at the time. Note that Mueller's dissertation was finished as early as 1987!

And even though it is thirty years later now, as far as I know (and I am happy to be proven wrong), this book can still be seen as the most complete description of a cognitive architecture to date.

Saturday, 13 January 2018

My little NLI-GO DBPedia demo

With some pride (but not very much ;) I present to you my natural language interaction demo:

This is the first presentable result of many years of work on a library that allows a user (you) to interact with a database in natural language.

The data source in this example is DBPedia, a database that is based on Wikipedia. It is a triple store that contains information about the people in Wikipedia, among other things.

My library tokenizes and parses the sentence and converts the parse tree into a relational representation. A solution is looked up and executed. This involves a series of simple database requests (in this case SPARQL queries). The results are combined into more relations, and from these relations a response in natural language is generated.

The point of my library is not to create a little interface to DBPedia. I am looking to make the interface to databases in general as simple and powerful as possible. This means that more databases, types of databases, domains, natural languages, etc. should be addable with a few configuration files. No coding should be necessary eventually. It's a long way to go.

Nevertheless, I invite you to try these sentences with the demo, and check out the productions that will be shown. These are some of the steps the library takes to find the answer.

  • How many children had Lord Byron?
  • How many children had Michael Jackson?
  • Who married Elvis Presley?
  • Who married Donald Trump?
I hope you find this interesting.

Saturday, 9 September 2017

English question syntax

I took the time to find out what types of question syntax there are. Even though one probably needs only a couple of them, it's always good to have some sort of overview.

I had no idea there were so many forms. I learned many of them from the very helpful page My English Teacher.

But I started my search in the ever inspiring ...

The phrase structure rules presented below are my own. Some structures recur in several places; they could probably be grouped. For a more authoritative reference, see the Stanford Parser.

I use the following abbreviations:

  • EQ = identity (is, was, are, were)
  • BE = auxiliary (is, was, are, were)
  • DO = auxiliary (do, does, did)
  • HAVE = auxiliary (has, have)
  • MOD = modality (can, could, will, would, shall, should)
and these syntactic categories
  • NP = noun phrase (a "thing")
  • VP = verb phrase (an action, or more general, a predication)
  • ADJP = adjective phrase (e.g. red, little, happy)
  • ADVP = adverbial phrase (e.g. quickly, happily)
I grouped the questions by their response type, which I showed in each of the headings.

Factual yes/no-questions or choice questions (boolean / a selected object)

  • DO NP VP (did Lord Byron marry Queen Elisabeth, did Lord Byron not marry Queen Elisabeth)
  • DO NP VP (did Lord Byron marry Queen Elisabeth or Anne Isabella Milbanke)
  • EQ NP NP (was Lord Byron king of England, was Lord Byron not king of England)
  • EQ NP NP (was Lord Byron a king or a lord)
  • BE NP ADJP (is the block red)
  • BE NP ADJP (is the block red or blue)
  • BE NP VP (was Lord Byron born in London, was Lord Byron not born in London)
  • BE NP VP (was Lord Byron born in London or Cambridge)
  • HAVE NP VP (has Napoleon invaded Germany, has Napoleon not invaded Germany)
  • HAVE NP VP (has Napoleon invaded Germany or The Netherlands)
  • MOD NP VP (would you like a cup of coffee, should I leave my things here, can dogs fly, can i ask you a question, can you stack a cube on a pyramid)
  • MOD NP VP (would you like coffee or tea)

Uninverted yes/no questions (boolean)

  • NP VP (Lord Byron married Queen Elisabeth?) (question mark is required)

Wh-questions (which, what, who; name one or more individuals)

  • WHO VP (who married Lord Byron, who was Lord Byron's wife)
  • WHO BE NP VP (who are you seeing)
  • WHO DO NP VP (who does Pierre want to beat)
  • WHO HAVE NP VP (who have you been seeing)
  • WHO MOD VP (who can drive me home)
  • WHOM DO NP VP (whom do you believe)
  • WHOM HAVE NP VP (whom have you believed)
  • WHOM MOD NP VP (whom should I talk to)

    (with whom is Peter speaking)
  • --
  • WHICH NP VP (which countries border the mediterranean, which countries do not border the mediterranean)
  • WHICH BE NP (which is the best option)
  • WHICH DO NP VP (which do you do more often)
  • WHICH NP MOD NP VP (which way should I go)
  • WHAT NP VP (what rock sample contains most iron, what food items did you eat)
  • WHAT BE NP (what is the biggest block, what is your name)
  • WHAT DO NP VP (what do laptops cost)
  • WHAT HAVE NP VP (what has Churchill done to stop the war)
  • WHAT MOD NP VP (what should I do)
  • WHOSE NP VP (whose autographs have you collected, whose parents will drive)
  • WHOSE NP BE NP (whose book is that)

Amount (a number, requires aggregation)

  • HOW MANY NP VP (how many children had Lord Byron, how many children did Lord Byron have)

Degree (a number, the unit result depends on subject)

  • HOW MUCH NP VP (how much sugar goes in a single drink)
  • HOW ADJP BE NP (how high is the Mount Everest, how tall is the tallest man, how small is a mouse, how old are you)
  • HOW ADVP DO NP VP (how often do you go to the movies, how nicely do I need to dress tonight)

Manner (a means)

  • HOW BE NP VP (how was Napoleon crowned king)
  • HOW DO NP VP (how do you go to work)
  • HOW HAVE NP VP (how has Napoleon invaded Britain)
  • HOW MOD NP VP (how can I become more productive)

State (a state)

  • HOW BE NP (how are you)

Reason (a cause)

  • WHY BE NP VP (why was Napoleon crowned king)
  • WHY DO NP VP (why did Napoleon invade Germany)
  • WHY HAVE NP VP (why has John hit Jake)
  • WHY MOD NP VP (why should I go)

Time (a time)

  • WHEN BE NP (when was the marriage)
  • WHEN BE NP VP (when was Napoleon crowned king)
  • WHEN DO NP VP (when did you start wearing make up)
  • WHEN HAVE NP VP (when have you bought the tv)
  • WHEN MOD NP VP (when can I go home)

    WHEN -> WHEN PP (when in the next hour do you want to go)

Place (a place)

  • WHERE BE NP (where is it?)
  • WHERE BE NP VP (where is the concert taking place)
  • WHERE DO NP VP (where did you go)
  • WHERE HAVE NP VP (where has Sally gone)
  • WHERE MOD NP VP (where can I find a pub)

    WHERE -> WHERE PP (where on the map is it)
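A few of these patterns can be captured as data and matched naively; a toy sketch in Python (the rule set and the prefix matching are my own simplification, nothing like a real parser):

```python
# A toy classifier: map the leading words of a question to the response type
# of the matching pattern group. Only a few of the groups are included.
RULES = [
    (("how", "many"), "amount"),
    (("how", "much"), "degree"),
    (("why",),        "reason"),
    (("when",),       "time"),
    (("where",),      "place"),
    (("who",),        "individuals"),
    (("did",),        "boolean"),
]

def response_type(question):
    words = tuple(question.lower().rstrip("?").split())
    for prefix, rtype in RULES:
        if words[:len(prefix)] == prefix:
            return rtype
    return "unknown"

print(response_type("How many children had Lord Byron?"))      # amount
print(response_type("Did Lord Byron marry Queen Elisabeth?"))  # boolean
```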

Thursday, 31 August 2017

Query Optimization

When building a natural language interface system you must ask yourself if you are going to send many simple queries to the database, or a single complex query and have the database execute it in a single run.

One Big Query vs Many Small Queries

For NLI-GO I chose to send many little requests to the database. Each request deals with a single relation and has no joins or aggregations. For a triple store queried with SPARQL, each relation has just two arguments. In that case there are only four queries per relation, depending on which of the variables were bound in advance:


There are major benefits of the small queries approach:
  • Very little knowledge is required of the idiosyncrasies of the database and of the database management system.
  • Given that the tables are properly indexed, all queries are quick.
  • It is possible to add extra logic to the user query not available from the database.
  • It is possible to combine the results of multiple databases for a single user query. These databases may have completely different types.
There are also some major drawbacks, of course:
  • The system does not make use of the processing power of the database.
  • The system needs to perform (very) many queries.
  • More knowledge is required of the NLI system itself.


To me the benefits outweigh the drawbacks, but in order to have a performant system, I need to deal with the fact that the database does not help in creating the fastest query execution.
This comes down to the fact that the query optimizer of the database is not used for the full user query. Mainly, this means that the queried relations are not presented to the database in the most efficient order.
A simple example will make this clearer. Suppose the following two queries were created from the user input sentence "who married Lord Byron?"

SELECT A, B FROM MARRIED_TO
SELECT ID FROM NAME WHERE NAME = "Lord Byron"

The first query asks for all A, B involved in a marriage relation. The second query asks for the ID of a person whose name is given. When executed in the shown order, the first query yields, say, 40,000 rows. Next, each of the resulting values is entered into the second query. The second query is executed 40,000 times, each time with different values. As you can see, even a simple set of relations easily gets out of control.
Since the database does not do query optimization, the nli system needs to perform this task.
David H.D. Warren wrote an interesting article about query optimization for CHAT-80, a Prolog based natural language interface. It's called
Efficient Processing of Interactive Relational Database Queries Expressed in Logic
His algorithm to determine the order of the simple queries consists of two steps:
  • Calculate the cost for each query
  • Order the queries by cost, lowest first.
The cost is calculated by taking the size of the relation and dividing it by the product of the numbers of distinct values in each of the bound arguments. I will explain this with examples:
For a query on the table PERSON without constraints, the cost is calculated as

size(PERSON)

where "size(PERSON)" is the number of rows in PERSON. Note that the query has no constraints.
For a query on PERSON with the column ID bound, the cost is calculated as

size(PERSON) / size_unique(ID)

where "size(PERSON)" is the number of records in the table PERSON, and "size_unique(ID)" is the number of unique values in the column ID. Note that the query has one constraint: ID.
For a query on PERSON with both COUNTRY and AGE bound, the cost is calculated as

size(PERSON) / ( size_unique(COUNTRY) * size_unique(AGE) )

where "size_unique(COUNTRY)" is the number of unique values in the column COUNTRY. Note that the query has two constraints: COUNTRY and AGE.

The result

The cost of a bound NAME query is much lower than that of an unbound MARRIED_TO query, hence the NAME query will be scheduled for execution first. It returns only a single row, so the MARRIED_TO query will need to be executed only once.

In this algorithm, queries with many unbound variables are very costly, and with more bound variables the cost drops rapidly. It also takes into account that tables with fewer rows are cheaper than tables with many rows, and that columns with many distinct values (higher cardinality) are cheaper than those with few distinct values.

It is a very simple, yet efficient way of optimizing conjunctive queries.
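The two steps can be sketched in a few lines of Python. The table statistics and names below are invented for illustration; they are not from CHAT-80:

```python
from math import prod

# Invented statistics: row count and distinct values per column.
STATS = {
    "NAME":       {"size": 40000, "unique": {"A": 40000, "B": 38000}},
    "MARRIED_TO": {"size": 40000, "unique": {"A": 35000, "B": 35000}},
}

def cost(table, bound_columns):
    # size(table) divided by the product of the number of distinct
    # values in each bound column; with no bound columns (prod of an
    # empty sequence is 1) the cost is simply the size of the table.
    stats = STATS[table]
    divisor = prod(stats["unique"][column] for column in bound_columns)
    return stats["size"] / divisor

def order_queries(queries):
    # Order the simple queries by cost, lowest first.
    return sorted(queries, key=lambda query: cost(*query))

# "who married Lord Byron?": NAME is bound on B (the name),
# MARRIED_TO has no bound arguments yet.
plan = order_queries([("MARRIED_TO", []), ("NAME", ["B"])])
```

The bound NAME query ends up first in the plan: its cost is 40000 / 38000, roughly 1, against 40000 for the unbound MARRIED_TO query.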

Friday, 21 July 2017

Intents, Slots and SHRDLU

Several things came together lately. I read up on the chatbot blogs that are flourishing. For chatbots the concept of the intent of the user is important. The intent of a sentence is what the user means by it. But what struck me, even if it wasn't said in so many words, is that the number of intents is restricted. Remember my examination of Siri's commands? These are typical intents, and their number is limited.

At first I was bored by this, because I want my AI to have unlimited possibilities. But then I saw its strength.

As it happened, I was looking for the best way to design an NLI (natural language interface), and for a definition of when an NLI is "complete". When is the design finished? This is an important question when you are building an NLI.

And I read Winograd and Flores' 1987 book "Understanding Computers and Cognition", a book that still needs to sink in with me, and which at first was actually pretty demotivating because it points at the limits of AI. But I already took one lesson home from it: computers are tools. They don't really understand anything; everything needs to be told. You can't expect a computer to "get it" and continue on its own.

These things, together, brought me the idea of a finite set of intents, semantic structures that act like a set of services in an API. And that these intents are the place where the design of an NLI starts.

Designing from Intents

An NLI has these three layers:
  • Syntax
  • Semantics
  • Database language
When you start to design an NLI you could start out by defining which aspects of syntax you would like to support. But this quickly leads to over-engineering, because the syntax, while complicated, is the easy part. Every part of the syntax is an input to the other parts of the system. What's more, syntax is never "complete": more syntax is always better.

But when you take semantics as a starting point, or more precisely, a fixed set of intents, this changes. From this angle, you ask yourself: what are the things the user actually wants from the database? And this turns out to be surprisingly limited.

Once the intents have been defined, syntax follows. Only syntactic structures that will actually be used need to be implemented. Also, a single intent can have multiple syntactic representations.

The Intents of Alexa

There are several commercial agents you can talk to these days, and that allow you to extend their standard functionality with custom code.

Amazon's Alexa (featured on devices like the Amazon Echo) allows you to create skills. A skill is a conversation subject, like the weather, movies, planning a trip, or whatever.

A skill consists of intents and utterances. An intent is a "lambda function" with a name and arguments. The arguments are called slots. Example intents and slots for the weather skill: WEATHER(date, location), RAIN(date, location), BARBEQUE(date). The functions can be written in several different languages, like JavaScript and Python.

While skills are defined on the semantic level, the syntactic level is defined by utterances. Each intent can have multiple utterances. Here are some utterances for RAIN(date, location): "is it going to rain {date}" "will it rain {date} in {location}" "do I need an umbrella?". Notice the slots. Slots have slot types. The slots named here have the date and location types.
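The mechanism can be illustrated with a small sketch. This is not Amazon's actual interaction-model format, just a minimal illustration of matching utterances against templates and extracting slot values:

```python
# A toy sketch of intents, slots and utterances; the names and the
# matching strategy are illustrative, not Alexa's real machinery.
import re

# Each intent maps to several utterance templates; {slot} marks a slot.
UTTERANCES = {
    "RAIN": [
        "is it going to rain {date}",
        "will it rain {date} in {location}",
        "do I need an umbrella",
    ],
}

def parse(sentence):
    # Try every template of every intent; on a match, return the
    # intent name plus the captured slot values.
    for intent, templates in UTTERANCES.items():
        for template in templates:
            # Turn "{date}" into the named capture group "(?P<date>.+)".
            pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", template)
            match = re.fullmatch(pattern, sentence)
            if match:
                return intent, match.groupdict()
    return None, {}

intent, slots = parse("will it rain tomorrow in Amsterdam")
# intent == "RAIN", slots == {"date": "tomorrow", "location": "Amsterdam"}
```

The extracted slot values would then be passed as arguments to the intent's function.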

If you want to read more about it, Amazon has detailed descriptions here. It is a lot of work! For a fun and friendly introduction, read Liz Rice's blogs.

The Intents of SHRDLU

I went on to test the idea of starting with intents. And what better place to start than the most complex NLI ever built: SHRDLU. SHRDLU, built by Terry Winograd in 1969, exhibits many NLI characteristics that have rarely been seen in later systems:
  • Questions about the domain model of the system: "Can a pyramid support a pyramid"
  • Questions about the system's history and its decisions: "Why did you clear off that cube"
  • Introduction of new concepts: "A 'steeple' is a stack which contains two green cubes and a pyramid"
But back to intents. I took the example dialog from the book "Understanding Natural Language" by Winograd and tried to create an impression of what the intents of the sentences would be.

1. pick up a big red block
2. grasp the pyramid
3. find a block which is taller than the one you are holding and put it into the box
   KB:FIND?(Object) && KB:PUTIN!(Object, Object)
4. what does the box contain?
   KB:WHAT?(ObjectB) && KB:CONTAIN(ObjectA, ObjectB)
5. what is the pyramid supported by?
   KB:WHAT?(ObjectA) && KB:SUPPORT(ObjectA, ObjectB)
6. how many blocks are not in the box?
   KB:HOW_MANY?(ObjectB) && !KB:CONTAIN(ObjectA, ObjectB)
7. is at least one of them narrower than the one which I told you to pick up?
   KB:PRESENT?(ObjectA) && KB:NARROWER(ObjectA, ObjectB)
8. is it supported?
9. can the table pick up blocks?
   DM:CAN?(P) && P=KB:PICKUP(Object)
10. can a pyramid be supported by a block?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
11. can a pyramid support a pyramid?
    DM:CAN?(P) && P=KB:SUPPORT(Object, Object)
12. stack up two pyramids
    KB:STACKUP!(Object, Object)
13. the blue pyramid is mine
    UM:OWN(Person, Object)
14. I own blocks which are not red, but I don't own anything which supports a pyramid
    UM:OWN(Person, Object) && UM:OWN(Person, Object)
15. do I own the box?
    UM:OWN?(Person, Object)
16. do I own anything in the box?
    UM:OWN?(Person, Object)
17. will you please stack up both of the red blocks and either a green cube or a pyramid?
    KB:STACKUP!(Object, Object)
18. which cube is sitting on the table?
    KB:WHICH?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
19. is there a large block behind a pyramid?
    KB:PRESENT?(Object) && KB:BEHIND(Object, Location)
20. put a small one onto the green cube which supports a pyramid
    KB:PUTON!(Object, Object)
21. put the littlest pyramid on top of it
    KB:PUTON!(Object, Object)
22. how many things are on top of green cubes?
    KB:HOW_MANY?(ObjectB) && KB:SUPPORT(ObjectA, ObjectB)
23. had you touched any pyramid before you put the green one on the little cube?
    TM:TOUCH?(Person, Object) && TM:AT(Time)
24. when did you pick it up?
    TM:WHEN?(E) && E=KB:PICKUP(Object)
25. why?
26. why did you do that?
    TM:WHY?(E) && E=TM:DO()
27. why did you clear off that cube?
    TM:WHY?(E) && E=KB:CLEAR_OFF(Object)
28. why did you do that?
    TM:WHY?(E) && E=TM:DO()
29. why did you do that?
    TM:WHY?(E) && E=TM:DO()
30. how did you do it?
    TM:HOW?(E) && E=TM:DO()
31. how many objects did you touch while you were doing it?
    KB:HOW_MANY?(Object) && TM:TOUCH(Object) && TM:AT(Time)
32. what did the red cube support before you started to clean it off?
    KB:WHAT?(Object) && KB:SUPPORT(Object, Object) && TM:AT(Time)
33. there were five blocks to the left of the box then.
    KB:PRESENT(Object, Location) && TM:AT(Time)
34. put the blue pyramid on the block in the box
    KB:PUTON!(Object, Object)
35. is there anything which is bigger than every pyramid but is not as wide as the thing that supports it?
    KB:PRESENT?(ObjectA) && KB:BIGGER(ObjectA, ObjectB) && !KB:WIDE(ObjectA, ObjectC) && KB:SUPPORT(ObjectA, ObjectC)
36. does a steeple
37. a "steeple" is a stack which contains two green cubes and a pyramid
    DM:DEFINE!(Word, Object)
38. are there any steeples now?
    KB:PRESENT?(Object, Time)
39. build one
40. call the biggest block "superblock"
    DM:DEFINE!(Word, Object)
41. have you picked up superblock since we began?
    TM:PICKEDUP?(Object) && TM:AT(Time)
42. why did you drop it?
    TM:WHY?(E) && E=KB:DROP(Object)
43. is there anything to the right of the red pyramid?
    KB:PRESENT?(Object, Location)
44. thank you

Here's a movie of such a SHRDLU interaction.

Many of the intents here are similar. I will now collect them, and add comments. But before I do, I must describe the prefixes I used for the modules involved:
  • KB: knowledge base, the primary database
  • DM: domain model, contains meta knowledge about the domain
  • UM: a model specific to the current user
  • TM: task manager
  • SM: smalltalk
The number of intents of a system should be as small as possible, to avoid code duplication. But it should also not be smaller than that, to avoid conditional structures inside the intents. I think an intent should not only be about the lambda function, but also about the type of output the user expects.

These then are the aspects of an NLI intent:

  • a precondition: where a simple invocation name is sufficient to wake up a skill, this will not do for a more complex NLI. A semantic condition is necessary.
  • a procedure: where a hardcoded function (the lambda function) is sufficient for a chatbot, an NLI should use parameterized procedures in a language like Datalog. Each step of the procedure must be manageable by a task manager.
  • an output pattern: different types of questions require different types of answers. It is possible that two intents only differ in the responses they give.

Verbs can be the syntactic representation of intents, but "why" can be an intent as well. I use the suffixes ! for commands, and ? for questions.
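The three aspects could be captured in a data structure like the following. The sketch and all names in it are mine; they do not come from any existing system:

```python
# A sketch of an NLI intent with its three aspects: a precondition,
# a parameterized procedure, and an output pattern. Illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Intent:
    name: str                              # e.g. "KB:HOW_MANY?"
    precondition: Callable[[dict], bool]   # semantic condition on the parsed input
    procedure: Callable[[dict], object]    # parameterized procedure, run step by step
    output_pattern: str                    # how to phrase the answer

# HOW_MANY? counts matching objects and answers with a number.
how_many = Intent(
    name="KB:HOW_MANY?",
    precondition=lambda sem: sem.get("question") == "how many",
    procedure=lambda sem: len(sem["matches"]),
    output_pattern="there are {count} of them",
)

sem = {"question": "how many", "matches": ["b1", "b2", "b3"]}
if how_many.precondition(sem):
    answer = how_many.output_pattern.format(count=how_many.procedure(sem))
# answer == "there are 3 of them"
```

Note how two intents could share the same procedure and differ only in their output pattern, as suggested above.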

The KB (knowledge base) intents:
  • PICKUP!(Object): bool
  • STACKUP!(Object, Object): bool
  • PUTON!(Object, Location) / PUTIN!(Object, Location): bool
    for these intents different procedures are activated that include a physical act that takes time. the system responds with an acknowledgement, or statement of impossibility
  • BUILD!(Object): bool
    build a compound object
  • FIND?(Object): Object
    search the space for an object with specifications. the object is used for another intent
  • PRESENT?(Object): bool
    like FIND?, the system responds with yes or no
  • WHAT?(Object): object
    like FIND?, but the system responds with a description that distinguishes the object from others
  • WHICH?(Object): object
    like WHAT?, but comes with a specific class of objects
  • HOW_MANY?(Object): number
    this intent involves counting objects, whereas the others just deal with a single instance. returns a number
  • SUPPORT?(Object): bool
    just checks if a certain declaration exists. many of these could exist, and perhaps they can be combined to a single intent PRESENT?(Predication)
The DM (domain model) intents:
  • CAN?(Predication)
    checks key points of the predication with a set of allowed interactions in the domain model
  • DEFINE!(Word, Object)
    maps a word onto a semantic structure that defines it
The UM (user model) intents:
  • OWN(Person, Object): bool
    mark the ownership relation between the user and the object
  • OWN?(Person, Object): bool
    checks if the ownership relation exists
The TM (task manager) intents:
  • TOUCH?(Person, Object)
    touch is an abstract verb that includes pick up, stack on, etc. It has a time frame.
  • PICKEDUP?(Object)
    checks if PICKUP was used
  • WHEN?(Event)
    returns a time index. It is described to the user as the event that happened then
  • WHY?(Event)
    returns the active goal of an event
  • WHY?(Goal)
    returns the parent goal of a goal. the top level goal is "because you asked me to"
The SM (smalltalk) intents:
    there is no meaning involved, just a canned response


I know I sound very deliberate in this post, as if I know exactly what's going on and how it should be. That's just my way of trying to make sense of the world. In fact I just stumbled on this approach to NLI and it's very new to me. Also, my analysis of SHRDLU's "intents", which it never had, may be completely off track. One would only know if one tried to build a new SHRDLU around this concept.

I think chatbots form a great way to get involved in natural language processing.

Saturday, 8 July 2017

Zen's Prime Directive

I am watching the seventies TV series Blake's 7 from the DVD set once more. It's great drama, although some of the visual effects are now very outdated. One of the things I like best is the human-computer dialogs. The author of the series, Terry Nation, really got inside the skin of the computer.

There's one particular dialog that I want to write out here, since it illustrates an aspect of needs based agents that I was talking about. It is from the first series, episode 10, "Breakdown". It's a dialog between Blake and Zen, with a surprised remark from the technical genius, Avon, in between.
--- --- --- --- --- --- --- 
Blake: Zen. Set navigation computers for direct route to space laboratory XK72. Speed Standard by Six.
Zen: Rejected.
Avon: You CANNOT reject a direct command!
Blake: Justify that rejection, please.
Zen: Your command reduces to an order to self-destruct. This runs counter to Prime Directive.
For a visual, here's an image of Blake talking to Zen.

Blake gives Zen a command. Zen has command over all computers of the ship, it is the task manager. Blake tells Zen to delegate the command to the navigation computers of the ship.

Zen rejects. Note that rejection is a higher order function of a system. Most computers just do as they are told. Not Zen. This surprises Avon, who calls out that it is impossible for a computer to reject a command. To Avon Zen is "just a computer". But Zen is a needs based agent.

Blake asks Zen to justify the rejection. This is an introspective request. It asks Zen to inspect its own prior reasoning as an intelligent agent.

Zen explains its reasoning efficiently. It says that the command, which includes going through a part of space of which it is known only that it is extremely dangerous (it later appears to be some sort of vortex), may very well destroy the ship. Zen checks its projected actions against its directives (in which I see the concept of needs). It finds that the command runs counter to its Prime Directive (which is probably to stay alive). From this it rejects the command.

As Blake and his crew manually override Zen and continue to go through this part of space, Zen shuts down all of its functions. It refuses to cooperate. If Zen were more like HAL (in 2001: A Space Odyssey), it would have actively tried to counteract the actions of the humans.

Note that in Star Trek there's also a Prime Directive. There it is a guiding principle for space explorers not to interfere with alien civilizations. I consider both Zen's Prime Directive and that of Star Trek to be needs.

Sunday, 25 June 2017

An introduction to CALO

From 2003 to 2008 the American Defense Advanced Research Projects Agency (DARPA) funded the largest research effort for cognitive systems in history. It involved 250 people from 25 universities and companies, and over 500 articles were written about it. It was named PAL, for Personalized Assistant that Learns. The aim of this program was described as follows:
Through the PAL program, researchers will develop software that will function as an "enduring personalized cognitive assistant" to help decision-makers manage their complex worlds of multiple simultaneous tasks and unexpected events. 
The DARPA grant was divided over Carnegie Mellon University (CMU) and Stanford Research Institute (SRI).

The CMU project was dubbed RADAR, for Reflective Agents with Distributed Adaptive Reasoning. The system's eventual goal was "The system will help busy managers to cope with time-consuming tasks such as organizing their E-mail, planning meetings, allocating scarce resources such as office space, maintaining a web site, and writing quarterly reports. Like any good assistant, RADAR must learn by interacting with its human master and by accepting explicit advice and instruction." [PAL1]

The SRI project was named CALO (Cognitive Agent that Learns and Organizes). It involved contributions from 20 additional research institutions. Its goal was broader than RADAR's. "The name was inspired by the Latin word “calonis,” which means “soldier’s assistant.” The CALO software, which will learn by working with and being advised by its users, will handle a broad range of interrelated decision-making tasks that have in the past been resistant to automation. It will have the capability to engage in and carry out routine tasks, and to assist when the unexpected happens." [PAL1]

This article is just a short introduction to CALO. Most of its information is derived from SRI's PAL website [PAL2]. The website emphasizes the learning aspect of the framework. Here, I focus on the relation between the components.


To the user, CALO consists of several applications. These applications, however, are integrated through a common knowledge base, and governed by an autonomous agent that aims to assist the user in common tasks.

The following image from [CALO1] is an overview of the cognitive nature of CALO.

Machine learning ("learning in the wild") was an essential part of each CALO component. It had to be able to learn all kinds of new information without adding lines of code.

These are some of the types of learning strategies involved:

  • learning by instruction (Tailor)
  • learning by discussion (PLOW)
  • learning by demonstration (LAPDOG)
  • learning by observation (Markov Logic Nets)

Different CALO components used different ontologies, but all were designed around a core Component Library (CLib) "The CLib is encoded in the Knowledge Machine knowledge representation language, with a subset being automatically transformable into the popular Web Ontology Language (OWL) of the Semantic Web framework." [CALO-MA3]

[CALO1] Building an Intelligent Personal Assistant (2006)

Application Suite

CALO grew into a suite of applications grouped around three centers:
  • PExA, the Project Execution Assistant
  • CALO-MA, the Meeting Assistant
  • IRIS, the Information Assistant
From a cognitive agent perspective, PExA is the central component.

PExA - Project Execution Assistant

"PExA focuses on two key areas: time management and task management. Time management refers to the process of helping a user manage actual and potential temporal commitments. Time management critically involves meeting or appointment scheduling but further includes reminder generation and workload balancing. Task management involves the planning, execution, and oversight of tasks. Such tasks may be personal in that they originate with the user, or they may derive from responsibilities associated with a project." [PEXA2]

PExA contains several tools that allow it to learn new procedures, which can then be used by SPARK, the task manager.

PExA Architecture

PExA includes the following applications
  • SPARK, the Task Manager
  • PTIME, the Time Manager
  • EMMA, the Event Management Assistant (with PTIME)
  • EMP, the Execution Monitor and Predictor (ProPL)
  • Towel, the Task Management / ToDo UI
  • BEAM, the ToDo Interpreter
  • SEAR, for State Estimation
  • LEPT, the Duration Learner
  • Query Manager
  • ICEE, the Task Explainer
  • Machinetta, the Team Coordinator
  • Tailor, a Procedure Learner
  • EzBuilder, a Procedure Learner
  • LAPDOG, a Procedure Learner
  • PrimTL, a Procedure Learner
  • PLOW, a Procedure Learner
Each user had their own CALO. Machinetta took care of inter-CALO communication. [CALO1]

CALO-MA - Meeting Assistant

"The meeting assistant is designed to enhance a user’s participation in a meeting through mechanisms that track the topics that are discussed, the participants’ positions, and resultant decisions." [PEXA2]

"Our efforts are focused on assisting artifact producing meetings, i.e. meetings for which the intended outcome is a tangible product such as a project management plan or a budget." [CALO-MA1]

The Meeting Assistant used the Multimodal Discourse Ontology (MMD).

The meeting assistant architecture

CALO-MA includes the following applications
  • Dialog Manager
  • Meeting Browser
  • 2-D Drawing Recognizer
  • 3-D Gesture Recognizer

IRIS - Information Assistant

"IRIS is an application framework for enabling users to create a “personal map” across their office-related information objects. IRIS is an acronym for "Integrate. Relate. Infer. Share."" [IRIS1]

IRIS collects raw data from several other applications on the user's desktop and integrates this data into its own knowledge base, using CLib based ontologies.

The three layer IRIS integration framework

It contains a plugin framework, and plugins for the following applications have been created:
  • Email (Mozilla)
  • Browser (Mozilla)
  • Calendar (OpenOffice Glow)
  • Chat (Jabber)
  • File Explorer (custom made)
  • Data editor/viewer (Personal Radar)
Adam Cheyer was the main contributor to IRIS. He left the CALO project in 2007 to cofound the CALO spin-off Siri. [SIRI1]

In 2007 IRIS, a Java application, was rewritten as a Windows application and renamed to CALO Express. [IRIS4]

CALO Express further contains the applications:
  • Presentation Assistant (for quick creation of presentations)
  • PrepPak (find, gather, and store relevant resources for common office tasks such as attending an upcoming meeting)
[IRIS1] IRIS: Integrate. Relate. Infer. Share. (2005)
[IRIS2] A Case Study in Engineering a Knowledge Base for an Intelligent Personal Assistant
[IRIS3] Extracting Knowledge about Users’ Activities from Raw Workstation Contents (2006)
[IRIS4] Adam Cheyer personal website

[SIRI1] SIRI RISING: The Inside Story Of Siri’s Origins (2013)


SPARK - Task Manager

SPARK is the central component of PExA.

"At the heart of CALO’s ability to act is a Task Manager that initiates, tracks, and executes activities and commitments on behalf of its user, while remaining responsive to external events. The Task Manager component of CALO is based on a reactive execution system called SPARK" [SPARK2]

"There is a need for agent systems that can scale to real-world applications, yet retain the clean semantic underpinning of more formal agent frameworks. We describe the SRI Procedural Agent Realization Kit (SPARK), a new BDI agent framework that combines these two qualities." [SPARK1]

SPARK Architecture

SPARK is a BDI (Belief-Desire-Intention) agent. Its central function is described as "Each agent maintains a knowledge base (KB) of beliefs about the world and itself that is updated both by sensory input from the external world and by internal events. The agent has a library of procedures that provide declarative representations of activities for responding to events and for decomposing complex tasks into simpler tasks. At any given time the agent has a set of intentions, which are procedure instances that it is currently executing. The hierarchical decomposition of tasks bottoms out in primitive actions that instruct effectors to bring about some change in the outside world or the internal state of the agent. At SPARK’s core is the executor whose role is to manage the execution of intentions. It does this by repeatedly selecting one of the current intentions to process and performing a single step of that intention. Steps generally involve activities such as performing tests on and changing the KB, adding tasks, decomposing tasks by applying procedures, or executing primitive actions." [SPARK1]
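The executor described in this quote can be caricatured in a few lines. This is only a sketch of the idea of stepwise intention execution, not SPARK's actual implementation, and all names in it are mine:

```python
# A rough sketch of a BDI executor loop: a knowledge base of beliefs,
# intentions as procedure instances, and an executor that repeatedly
# selects an intention and performs a single step of it.

class Agent:
    def __init__(self):
        self.kb = set()          # beliefs about the world and itself
        self.intentions = []     # procedure instances currently executing

    def add_intention(self, steps):
        # An intention here is simply a list of steps, each a function
        # that may test or change the KB (a stand-in for primitive
        # actions and task decomposition).
        self.intentions.append(list(steps))

    def run(self):
        # Repeatedly select one current intention and perform a single
        # step of it, until no intentions remain.
        while self.intentions:
            intention = self.intentions[0]   # trivial selection strategy
            step = intention.pop(0)
            step(self.kb)
            if not intention:
                self.intentions.remove(intention)

agent = Agent()
agent.add_intention([
    lambda kb: kb.add("holding(block1)"),      # primitive action
    lambda kb: kb.add("on(block1, block2)"),   # primitive action
])
agent.run()
# agent.kb == {"holding(block1)", "on(block1, block2)"}
```

The single-step granularity is the point: because the executor interleaves steps of different intentions, the agent stays responsive to external events while working on long tasks.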

[SPARK1] The SPARK Agent Framework (2004)
[SPARK2] Task Management under Change and Uncertainty Constraint Solving Experience with the CALO Project (2005)
[SPARK3] SPARK website
[SPARK4] Balancing Formal and Practical Concerns in Agent Design (2004)
[SPARK5] Introduction to SPARK (2004)

Ending Notes

I wrote this piece because I believe CALO is worth studying. It holds gems in various fields of AI. At the same time, the huge number of papers written about it may be daunting to the newcomer. I just hope this blog post has opened a door.
