zondag 17 juli 2022

The necessity of consciousness

Why do living organisms experience consciousness? I hypothesize that living beings acquired consciousness because it was the only way they could learn about what is good and what is bad for their concerns.

Organisms have concerns like staying alive, staying well-nourished, staying healthy, reproducing, etc.

Consciousness as I use it here is the mental experience of information: seeing, smelling, touching, feeling. Not the physical aspect of neurons firing, but the mental aspect of experiencing.

Learning always requires feedback. This feedback can come from an external source, a teacher, or from yourself, as you set a goal to mimic someone or something else.


But how do you learn that sprouts taste awful? Or that being yelled at hurts? Or that a painting by Picasso is strange? Of course someone may tell you about it, and you may copy that opinion. But what if copying someone else's opinion is not good enough, and it's better to find out for yourself?

In these cases you learn by trying. You learn from experience.

Can this learning be done by mental processes alone? Is the firing of neurons, the strengthening of the axons, or a bit higher up: the creation of a neural representation, is that enough to learn about the taste of pepper, the smell of poop, or the bewilderment caused by an Escher painting?

It isn't. What's missing is the judgement. Of good and bad, of awful and sweet. Of interest and boredom. This judgement is not in the representation. And it is not induced from its representation. The judgement is in the experience of the representation.

Smell a rotten egg. Certain chemicals enter the nose. The nose fires certain neurons. Certain areas in the cortex are activated. The amygdala is involved. Neural patterns are formed. So what? Why would that instinctively cause you to retract? It doesn't. Not unless experience is involved. And no matter the kind of representation, it is no substitute for experience.

This experience is important. It connects your sensory input to your inner concerns. It teaches you what is good for you and what is bad for you.


But if experience is different from a neuronal representation, what is it? I have to admit that my idea of this is not far from the view that Descartes described 400 years ago: dualism, with a neuronal mind / body on the one hand, and an experiencing psyche on the other. Experience (consciousness) is a different substance than matter.

For this hypothesis to work, the neural mind must influence the conscious substance. The idea that comes to mind is the way electricity influences magnetism, and vice versa: two different substances that turn out to be part of the same continuum. In this case matter and consciousness would both be part of a matter-consciousness universe.

This diagram attempts to show how mind and consciousness interact. Mind creates a representation. Certain representations cause a change in consciousness. In order for this experience to be functional, it needs to be noticed by the mind again. There needs to be perception of the conscious experience.


But why is this necessary? Can't experience be emulated? It can be faked, for sure. But it can't be emulated, because experience is what it's all about. Experience drives existence. You would not buy a painting unless it was beautiful. You would not eat all those potato chips unless they tasted so damn good. Experience is the messenger of the deep concerns that make each of us a living being.



zaterdag 15 januari 2022

Processing SHRDLU demo sentence "Why?" with NLI-GO

For several years I have been working on a natural language understanding program called NLI-GO.  It allows the user to interact with a database through natural language. Since I needed example sentences to extend its functionality, and SHRDLU is a great inspiration, I thought it would be a good idea to recreate SHRDLU's famous sample dialog in NLI-GO. And it is a good idea. Each sentence creates new challenges and puzzles that need to be solved, and this allows NLI-GO to grow.

I am still working on it. The dialog has 44 interactions, and the last few months I have been working on interaction 25. It goes like this:

Person: Why?
Computer: TO GET RID OF IT

As usual, this interaction proved more complicated than I had envisioned at first. And now that I have finished it, I would like to explain some of the techniques I used.

Some dialog context

The question "why?" follows a series of interactions. Of these interactions the ones that matter are:

Person: Will you please stack up both of the red blocks 
  and either a green cube or a pyramid?
Computer: OK.

This is followed by the sequence

Person: Had you touched any pyramid before you put the 
  green one on the little cube?

Person: When did you pick it up?

Person: Why?
Computer: TO GET RID OF IT

So the person asks the computer when it picked up the green pyramid, and then why it picked up that pyramid. SHRDLU remembers that it was stacking up some blocks and needed to get rid of this pyramid in order to move the block that was below it.

SHRDLU's actions

This is a lot to take in. These are not your average QA questions one would ask a database in an NLI system. So this is a good time to go into the action structure of SHRDLU.

When the user tells SHRDLU to perform some command ("build a stack", "pick up a block"), the actions needed to perform this command are executed. While they are executing, SHRDLU also stores the fact that such and such an action was performed in its database.

This is what the start of an action "pick up" looks like in NLI-GO:

pick_up(EventId, Subject, Object)
start_time(EventId, Start)
end_time(EventId, End)
parent_event(EventId, ParentEventId)

Note that there's no need for a special event-based database, if such a database even exists. A normal relational database can keep track of these events, but they need to be inserted explicitly by the application.

I am currently using an internal database (a data structure) to store these actions, but they could be stored in a relational database as well. `pick_up`, `start_time`,`end_time`, and `parent_event` would then be the names of tables.

Most commands are not simple. They consist of a hierarchy of actions. If an action "pick up" needs to move aside some object, it starts a "get rid of" action. This action receives a pointer to its originating action. This is stored as the `parent_event`.

Because these actions are stored, SHRDLU has a memory of them. Because each action has a reference to the action that started it, SHRDLU can tell why it performed an action, simply by following the `parent_event` link.
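The `parent_event` link makes "Why?" straightforward to answer. Here is a minimal sketch in Python (the event ids, action names, and dict layout are made up for illustration; NLI-GO's actual representation differs):

```python
# Hypothetical event memory: each event stores its action and a
# parent_event link to the action that started it.
events = {
    1: {"action": "stack_up",   "parent": None},  # the user's command
    2: {"action": "get_rid_of", "parent": 1},     # subaction of stacking
    3: {"action": "pick_up",    "parent": 2},     # subaction of getting rid
}

def why(event_id):
    """Answer "why?" by naming the action of the parent event."""
    parent_id = events[event_id]["parent"]
    return events[parent_id]["action"] if parent_id else None
```

Asking why the pick-up happened, `why(3)`, follows the link to its parent and yields `"get_rid_of"`, which corresponds to SHRDLU's answer "TO GET RID OF IT".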

SHRDLU has a simple discrete sense of time. Each time a basic action is performed, the internal time is updated by 1.


To understand the question "Why?", an understanding system needs to find out what exactly is meant here, as if asking: "Why what?" Clearly part of the question is left out. This is called ellipsis.

NLI-GO treats this problem by declaring the ellipsis explicitly in the grammar. Here is the grammar rule that deals with it:

{ rule: interrogative_clause(P1) -> 'why',
  ellipsis: [prev_sentence]//mem_vp(P1),
  sense: go:intent(why, P1) $mem_vp }

The rule `interrogative_clause(P1) -> 'why'` rewrites the clause to the single word "why". The value of `ellipsis`, `[prev_sentence]//mem_vp(P1)`, is a path that leads to the missing part of the sentence.

In trying to match this rule, NLI-GO follows the path, starting from the current `interrogative_clause` node of the active syntax tree:

  • [prev_sentence] tells it to visit the previous sentence in the dialog; this is "When did you pick it up?". NLI-GO is now at the root of this sentence
  • //mem_vp tells it to visit all mem_vp nodes anywhere below the current node 
The syntax is somewhat similar to that of XPath (used to navigate XML).
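The `//mem_vp` step can be sketched as a depth-first search for all nodes of a given category anywhere below the current node. The node shape below is made up for illustration; NLI-GO's actual tree structure differs:

```python
def find_descendants(node, category):
    """Collect all descendant nodes with the given category (the // step)."""
    found = []
    for child in node.get("children", []):
        if child.get("cat") == category:
            found.append(child)
        # recurse: "anywhere below" means the search continues past matches
        found.extend(find_descendants(child, category))
    return found

# Simplified tree of the previous sentence "When did you pick it up?"
prev_sentence = {"cat": "s", "children": [
    {"cat": "wh_clause", "children": [
        {"cat": "mem_vp", "children": [], "text": "did you pick it up"},
    ]},
]}

matches = find_descendants(prev_sentence, "mem_vp")
```

The single match is the `mem_vp` node that gets copied into the syntax tree of "Why?".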

If such a `mem_vp` node is found, it is copied into the active sentence. The syntax tree of "Why?" is now extended, and the complete sentence now looks like this:

Why did you pick it up?

Resolving "it"

Something that may seem self-evident, but which isn't really, is that "it" in the new sentence must refer to the same green pyramid as "it" in the previous sentence referred to.

NLI-GO should not try to resolve this "it" anew; it should simply inherit the value of "it" that had been resolved in the "When" question. If it didn't, "it" might resolve to another object, and this is counter-intuitive. 

(And yes, this is of course exactly what happened at an earlier stage of the software: "it" first resolved to SHRDLU, which is not even an "it", but NLI-GO had no idea.)

The response: "it" again

The response to the question should be

To get rid of it

However, the response that NLI-GO initially gave was this:

To get rid of the green pyramid

It proved not so simple to get NLI-GO to just use a pronoun.

Centering theory

Interaction 25 is the first interaction that uses a pronoun in its response. Why would SHRDLU suddenly use a pronoun in its response? To the reader this is obvious. The object in the sentence had been referred to as "it" before, so it is natural to keep referring to it as that. The fact that an object may be in the spotlight of a dialog has been worked out in centering theory. If an object was the preferred center of the previous sentence, it is likely to be the preferred center in the current sentence again. Also: a sentence's subject is more likely to become a new center than its object, and this in turn is more likely than any other entity.
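These preferences can be sketched as a simple ranking function. The entity names and role labels below are hypothetical, chosen only to mirror the example from the dialog:

```python
# Lower rank = more likely to become the new center.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}

def preferred_center(candidates, previous_center):
    """candidates: list of (entity, grammatical_role) in the current sentence.
    Keep the previous preferred center if it recurs; otherwise prefer the
    subject over the object over any other mention."""
    for entity, _role in candidates:
        if entity == previous_center:
            return entity
    return min(candidates, key=lambda c: ROLE_RANK[c[1]])[0]

# "Why did you pick it up?": the green pyramid was the previous center,
# so it stays the center and may be pronominalized as "it" in the response.
center = preferred_center(
    [("shrdlu", "subject"), ("green_pyramid", "object")],
    previous_center="green_pyramid")
```

Because the chosen center matches the previous one, the generator can safely output "it" instead of "the green pyramid".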

Concluding remarks

This seemingly innocuous sentence has kept me busy for four months. I had not seen it coming. It has been both a source of joy, for learning new concepts, and a continuous source of frustration. I needed to rewrite existing structures to enable these new features, and this broke several existing interactions. Still, the new framework is not at all robust. It needs a lot of work to make it simple to use. At the same time, I still feel this is a great field to work in, and it will eventually provide a level of determinism that machine learning will never reach. So it's worth it.

donderdag 13 mei 2021

Designing Software with the Needs of Human Beings in Mind

Some developers seem to think that writing tests for a software application is the best and only thing needed to write good software. It isn't. There's a lot more to it than that.

When you design or architect a piece of software, you need to be aware of the needs of all people that are somehow involved in it.

By needs, I mean the human needs or concerns that form the purpose of the software's existence. To write good software you need to be aware of these.


A need is something that is essential to the well-being of a person. I use the term need in the sense of Maslow's hierarchy of needs. People have a variety of needs and they expect some of these needs to be fulfilled by the software application.

These are some of the human needs that are related to software:

  • Basic needs
  • Safety
  • Esteem / Respect
  • Belonging
  • Creativity

There are several types of people involved in your software: 

  • end-users: the people that actually use your software
  • developers: you, your fellow developers, others
  • business: managers, CEOs, business-oriented people

These people have many different needs, most of which you don't know about. But in some way they are reflected in different software aspects. 


Software has a number of needs-related characteristics. Let's call them aspects. These characteristics can be reduced to one or more basic human needs.  


Functionality

This aspect simply says that the software should do what it's supposed to do: have lots of features and no bugs. 

The functions of a software application help a person to fulfill all kinds of needs: from buying food to spiritual growth. This is part of why it's so interesting to be a developer. Your work can be used in almost all fields of life.

Lack of functionality brings disappointment to users. They will try competitive products, if possible. Good software helps people to reach their goals and fulfill their needs.

Needs: all types of needs
Elements: features, usability, documentation, short time-to-market, bug-free
People: end-users, business


Speed

Processing speed is a pervasive aspect of software. Faster is almost always better. Why? Because whatever you want from a piece of software, your need will be fulfilled earlier if the software is faster. 

Slow software leads to frustration, an emotion that expresses that needs are not fulfilled fast enough. Only fast software can bring people in a state of flow.

Speed is tricky, because at the start of a project, the software you create is fast enough. It just becomes slower and slower as you add more features, or add more data. So you need to think ahead. Create a design with the working software in mind. Picture your customers using it, and estimate the amount of data they will be processing. Plan for scalability. 

Needs: all types of needs
Elements: hardware, architecture, scalability, optimization
People: end-users, developers, business


Maintainability

Maintainability is about the time it takes for someone to change the software: the time it takes to learn it; the ease of making changes.

If your software is maintainable, you and other developers are able to be creative while developing it further. Maintaining software builds a community of developers and this leads to a sense of belonging. Working on well-maintainable software lifts your self-esteem and the esteem of the other developers.

Needs: esteem, belonging, creativity
Elements: elegance, simplicity, architecture, design patterns, documentation, tests, code conventions, logging

People: developers, business


Security

Security is both a software aspect and a basic human need. People need to be safe, and to feel that their software is safe to use.

Needs: safety
Elements: privacy, risk analysis, input checking
People: end-users, business

Low Cost

And then there is cost: software development costs money, of course. I call this "low cost", because the lower the cost, the better.

You can also think of cost in a broader sense: the negative effects of the software to society, and to the environment.

The cost of software is not simply reducible to a human need. 

Needs: any, indirectly
Elements: timely delivery, cost of development, cost of maintenance, environmental cost
People: end-users, business

About these aspects

These aspects are not cast in stone. I just made them up ;) I picked them to raise your awareness of these things. You may find another subdivision more useful.

The aspects are orthogonal: they don't overlap, and the one cannot be reduced to the other.

But even though the aspects are orthogonal, they are not independent. Every new feature may reduce the speed. All aspects need maintenance. Everything that must be made costs money.

Note the "people" parts of the aspects: if "developers" are not listed among them, you will need to place yourself in the position of other people to understand their needs. Security, for example, is not important to the developer. The developer will need to consider what the end-user and the business will consider important when it comes to security.

In architecture design these aspects are usually called "stakes" and the people that are involved, stakeholders. The difference this blog tries to make is to emphasize that these stakes are based on human needs, and that makes it a bit more personal.


You will find yourself thinking: yes, I understand. These are all worthy causes, but it takes way too much time to build software that fulfills all these aspects!

You are right. And this is exactly the point! First you need to be aware of all these aspects. Then you need to weigh all of them. Then you need to come to the right balance: which aspects are more important and which are less important, in the current project? You can just keep this in mind; or you might put it in writing if you intend to set a standard.

But you need to consider these aspects early on in the software trajectory. They are architectural decisions. Changing your stance on these issues may be very hard and costly.

Balancing aspects is the most important role of a software designer or architect.


You may have noticed that writing tests is just a small aspect of software design. It helps to write maintainable code, but it is not the whole picture. Some projects benefit more from focusing on other aspects. And in some projects automated tests are just a waste of time. It all depends.

maandag 1 februari 2021

How do I write my own parser? (for JSON)

 This blog post first appeared on http://techblog.procurios.nl/k/n618/news/view/14605/14863/how-do-i-write-my-own-parser-(for-json).html

05 December 2008

If no parser is available for the file format you need, writing one yourself may be easier than you think. What file structures are manageable? What would be the design of such a parser? How do you make sure it is complete? Here we describe the process of building a JSON parser in C#, and release the source code.

By Patrick van Bergen

[Download the JSON parser / generator for C#]

The software is subject to the MIT license: you are free to use it in any way you like, but it must keep its license.

For our synchronisation-module (which we use to synchronize data between diverse business applications) we chose JSON for data exchange. JSON is just a little better suited for a PHP web-environment than XML, because:

  • The PHP functions json_encode() and json_decode() allow you to convert data structures from and to JSON strings
  • JSON can be sent directly to the browser in an Ajax request
  • It takes up less space than XML, which is important in server > browser traffic.
  • A JSON string can be composed of only ASCII characters, while still being able to express all UNICODE characters, thus avoiding all possible conversion issues a transport may carry.
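That last point is easy to demonstrate. The example below uses Python's standard `json` module, purely for illustration (the article's own code is PHP and C#): with the default ASCII-only encoding, the output contains no bytes above 127, yet it still round-trips any Unicode string.

```python
import json

# ensure_ascii=True is the default: non-ASCII characters are escaped.
encoded = json.dumps({"name": "café"})
# encoded is now the pure-ASCII string {"name": "caf\u00e9"}

# Decoding restores the original Unicode value.
decoded = json.loads(encoded)
```

Because the wire format is plain ASCII, no character-set conversion along the transport path can corrupt the data.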

So JSON is very convenient for PHP. But of course we wanted to be able to synchronize with Windows applications as well, and because C# is better suited to this environment, this part of the module was written in this language. The .Net framework just didn't have its own JSON parser / encoder and the open-source software written for this task often contained a whole package of classes and constraints and sometimes the JSON implementation wasn't even complete.

We just wanted a single class that could be imported and that used the most basic building blocks of our application: the ArrayList and the Hashtable. Also, all aspects of JSON would have to be implemented, there should be a JSON generator, and of course it should be fast.

More reasons to write our own parser weren't necessary. Writing a parser happens to be a very satisfying thing to do. It is the best way to learn a new programming language thoroughly. Especially if you're using unit-testing to guarantee that the parser / generator matches the language specification exactly. JSON's specification is easy to find: the website http://www.json.org/ is as clear as one could wish for.

You start by writing the unit-tests. You should really write all tests before starting the implementation, but such patience is seldom found in a programmer. You can at least start by writing some obvious tests that help you to create a consistent API. This is an example of a simple object-test:

string json;
Hashtable o;
bool success = true;

json = "{\"name\":123,\"name2\":-456e8}";
o = (Hashtable)JSON.JsonDecode(json);
success = success && ((double)o["name"] == 123);
success = success && ((double)o["name2"] == -456e8);

Eventually you should write all tests needed to check all aspects of the language, because your users (other programmers) will assume that the parser just works.

OK. Parsers. Parsers are associated with specialized software: so-called compiler compilers (of which Yacc is the most well known). Using this software will make sure that the parser will be fast, but it does not do all the work for you. What's more, it can be even easier to write the entire parser yourself than to do all the preparatory work for the compiler compiler.

The compiler compiler is needed for languages with a high level of ambiguity. A language expression is parsed from left to right. If a language contains many structures that cannot be identified at the start of the parse, it is advisable to use a tool that is able to manage the emerging complexity.

Unambiguous languages are better suited to building the parser manually, using recursive functions to process the recursive nature of the language. The parser looks ahead one or more tokens to identify the next construct. For JSON it is even sufficient to look ahead a single token. This classifies it as an LL(1) language (see also http://en.wikipedia.org/wiki/LL_parser).

A parser takes as input a string of tokens. Tokens are the most elementary building blocks of a language, like "+", "{", "[", but also complete numbers like "-1.345e5" and strings like "'The Scottish highlander looked around.'". The parse phase is usually preceded by a tokenization phase. In our JSON parser this step is integrated in the parser because, to determine the next token, in almost all cases it is enough to just read the next character in the string. This saves the allocation of a token table in memory.
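The single-character look-ahead can be sketched as follows. This is a Python illustration, not the article's C# code, and the token names are made up:

```python
def look_ahead(s: str, index: int) -> str:
    """Classify the next JSON construct by peeking at one character (LL(1))."""
    # skip whitespace without consuming it
    while index < len(s) and s[index] in " \t\r\n":
        index += 1
    if index >= len(s):
        return "NONE"
    ch = s[index]
    if ch == "{":
        return "OBJECT"
    if ch == "[":
        return "ARRAY"
    if ch == '"':
        return "STRING"
    if ch == "-" or ch.isdigit():
        return "NUMBER"
    if s.startswith("true", index) or s.startswith("false", index):
        return "BOOLEAN"
    if s.startswith("null", index):
        return "NULL"
    return "ERROR"
```

One character is enough to decide which parse function to call next; this is exactly what makes JSON an LL(1) language.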

The parser takes a string as input and returns a C# data structure consisting of ArrayLists, Hashtables, a number of scalar value types, and null. The string is processed from left to right. An index (pointer) keeps track of the current position in the string at any moment. At each level of the parse process the parser performs these steps:

  • Look ahead 1 token to determine the type of the next construct
  • Choose the function to parse the construct
  • Call this function and integrate the returned value in the construct that is currently built.

A nice example is the recursive function "ParseObject" that parses an object:

protected Hashtable ParseObject(char[] json, ref int index)
{
    Hashtable table = new Hashtable();
    int token;

    // {
    NextToken(json, ref index);

    bool done = false;
    while (!done) {
        token = LookAhead(json, index);
        if (token == JSON.TOKEN_NONE) {
            return null;
        } else if (token == JSON.TOKEN_COMMA) {
            NextToken(json, ref index);
        } else if (token == JSON.TOKEN_CURLY_CLOSE) {
            NextToken(json, ref index);
            return table;
        } else {

            // name
            string name = ParseString(json, ref index);
            if (name == null) {
                return null;
            }

            // :
            token = NextToken(json, ref index);
            if (token != JSON.TOKEN_COLON) {
                return null;
            }

            // value
            bool success = true;
            object value = ParseValue(json, ref index, ref success);
            if (!success) {
                return null;
            }

            table[name] = value;
        }
    }

    return table;
}
The function is only called if a look-ahead has determined that a construct starts with an opening curly brace. So this token may be skipped. Next, the string is parsed just as long as the closing brace is not found, or the end of the string is found (a syntax error, but one that needs to be caught). Between the braces there are a number of "'name': value" pairs, separated by commas. This algorithm can be found almost literally in the function, which makes it very insightful and thus easy to debug. The function builds a Hashtable and returns it to the calling function. The parser mainly consists of these types of functions.

If you create your own parser, you will always need to take into account that the incoming string may be grammatically incorrect. Users expect the parser to be able to tell on which line the error occurred. Our parser only remembers the index, but it also contains an extra function that returns the immediate context of the position of the error, comparable to the error messages that MySQL generates.
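Such a context function can be sketched in a few lines. This is a Python illustration with made-up names, not the article's actual C# code:

```python
def error_context(s: str, index: int, width: int = 10) -> str:
    """Return the text immediately surrounding the error position,
    comparable to the context MySQL shows in its error messages."""
    start = max(0, index - width)
    end = min(len(s), index + width)
    return s[start:end]

# The parser failed at index 12, on the second of two adjacent commas.
context = error_context('{"name":123,,}', 12)
```

Showing a small window around the stored index is often enough for the user to spot the mistake, even without line/column tracking.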

If you want to know more about parsers, it is good to know that there is a standard work on this subject, which recently (2006) saw its second edition:

Compilers: Principles, Techniques, and Tools, Aho, A.V., Sethi, R., and Ullman, J.D. (1986)

Semantic web marvels in a relational database - part II: Comparing alternatives

This blog article first appeared on http://techblog.procurios.nl/k/n618/news/view/34441/14863/Semantic-web-marvels-in-a-relational-database---part-II-Comparing-alternatives.html

15 June 2009

In this article I will compare the basic technical details of current relational database alternatives.

By Patrick van Bergen

In the first article I explained the relational database mapping of our semantic web implementation. In this article I will place this work into perspective by exploring related techniques.

In recent years developers have been looking for ways to overcome certain shortcomings of relational database systems. RDBMSes are general-purpose data stores that are flexible enough to store any type of data. However, there are several cases in which the relational model proves inefficient:

  • An object has many attributes (100+), many of which are optional. It would be wasting space to store all these attributes in separate columns.
  • Many attributes with multiple values. Since each of these attributes needs a separate table, the object data will be distributed over many tables. This is inefficient in terms of development time, maintenance, as well as query time.
  • Class inheritance. Since most software is Object Oriented these days the objects in code will need to be mapped to the database structure. In the case of class inheritance, where attributes are inherited from superclasses, it is a big problem to store objects in, and query them from, an RDBMS efficiently.
  • Types and attributes are not objects. In an RDBMS the data of a model is separate from the metadata (attribute names, datatypes, foreign key constraints, etc.). Types and attributes are not like normal objects. This is inefficient in areas where types and attributes need to be added, changed and removed regularly, just like any other data. It is inefficient to write separate code to manipulate and query types and attributes. In short, first order predicate logic no longer suffices for many new applications. The second order is needed.
  • Scalability. This is an aspect often named as the reason to leave the RDBMS. However, since relational databases have been optimized for decades, they do scale. Nevertheless, in this age of global, real-time web applications, the techniques provided by RDBMS manufacturers may prove to be inadequate, or simply too expensive.

In the following I will provide a simple understanding of the basic principles of alternative database techniques, along with some pointers to more in-depth information. I hope you will forgive me my non-expert view on these subjects. For detailed information on any given subject, look elsewhere. This article is meant to be just a quick overview of the basic concepts, illustrated by examples.

RDBMS, or Row-Oriented database

In a relational database management system, pieces of data are grouped together in a record. In this article I will consider the case where the data stored is meant to represent the attributes of an object. Seen this way, a record is a group of attributes of an object. Here's an example of such a table of objects:

object id | color | width | height | name
3         | red   | 100   | 100    | my box
4         | green | 50    | 500    | old beam

Metadata is shown in gray. Keys / foreign keys are shown in bold typeface.

Need more attributes? Add more columns. Need an attribute with multiple values? Add a table and link it to the first. The RDBMS chooses speed over flexibility. Speed was a big deal 40 years ago, when this database type was designed. And it still is a big deal today. For large amounts of simple data, there is absolutely no need to leave this model.

Semantic net

Storing semantic information as triples is an old idea in the field of Knowledge Representation. As early as 1956, semantic nets were used for this purpose. In this technique the relations between objects are represented by plain labels. Each "record" stores only a single attribute, or one element of an array-attribute. Most notable are the absence of metadata and the fact that object data is distributed over many records.

object id | predicate | value
3         | name      | my box
4         | name      | old beam

Need more attributes? No need to change the table structure. Need an attribute with multiple values? Same thing. 


EAV

The Entity-Attribute-Value model of knowledge representation uses some form of triples, just like the semantic web. Its primary use is described by Wikipedia as: "Entity-Attribute-Value model (EAV), also known as object-attribute-value model and open schema, is a data model that is used in circumstances where the number of attributes (properties, parameters) that can be used to describe a thing (an "entity" or "object") is potentially very vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix."

Attribute metadata is stored in separate attribute tables, which are not triples. EAV is a sort of middle ground between semantic nets and the semantic web: attributes have explicit properties, but these are fixed in number.

EAV can be used to model classes and relationships as in EAV/CR.

EAV is used in Cloud computing databases like Amazon's SimpleDB and Google's App Engine.

object id | attribute id | value
3         | 4            | my box
4         | 4            | old beam


attribute id | name | datatype | unique

Need more attributes? Add them in the attribute table. Attributes with multiple values? No extra work. The schema of the attributes is stored in the database explicitly, but attributes are treated differently from the objects.
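The idea can be sketched in plain Python (the tables become in-memory structures; all identifiers are made up to match the running example):

```python
# Attribute metadata lives in its own table, separate from the data rows.
attributes = {4: {"name": "name", "datatype": "string", "unique": False}}

# Data rows are (object id, attribute id, value) triples.
rows = [(3, 4, "my box"), (4, 4, "old beam")]

def attributes_of(object_id):
    """Collect an object's attribute/value pairs from the EAV rows."""
    return {attributes[a]["name"]: v for (o, a, v) in rows if o == object_id}
```

Adding a new attribute means inserting a row in `attributes` and rows in `rows`; the schema of the tables themselves never changes.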

Column-Oriented databases

From wikipedia: "A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row."

object id | color
3         | red
4         | green

object id | width
3         | 100
4         | 50

object id | height
3         | 100
4         | 500

object id | name
3         | my box
4         | old beam


Google's BigTable is based, in part, on column-orientation. Their tables use reversed URIs as object and column identifiers, and have a "third dimension" in that older revisions of the data are stored in the same table.


Correlation databases

A correlation database is "value based": every constant value is stored only once. All these values are stored together, except that values are grouped by datatype. All values are indexed. "In addition to typical data values, the data value store contains a special type of data for storing relationships between tables...but with a CDBMS, the relationship is known by the dictionary and stored as a data value."

I have not found a clear example of what this datastructure looks like, but we can infer that the internal structure must look something like the following. Note: I may be completely wrong here!

The values-table (actually there is one table per major datatype; i.e. integers, strings, dates, etc.)

value id | value
6        | my box
7        | old beam
8        | <object 1>
9        | <object 2>
10       | <relationship color>
11       | <relationship width>
12       | <relationship height>
13       | <relationship name>


and then there is at least a table containing the relationships (or: "associations") between the values. The relationships are stored as values themselves:

value id 1 | association | value id 2


Hierarchical model, Network model, Navigational database

For the sake of completeness I have to name these models. The hierarchical model stores tree-like structures only, requiring each piece of data to have a single "parent". The network model allows a piece of data to have multiple parents. Both models were superseded by the relational model, but they are still used for special-purpose applications. A navigational database allows one to traverse such trees / DAGs by following paths.



Object-Oriented databases

In an object-oriented database all attributes of a class are stored together. From what I've read on the internet I conclude that the actual storage structure of an OODBMS is sort of an implementation detail. This means that the performance characteristics of the database will depend heavily on the type of implementation chosen. Development of this model was first in the hands of the ODMG, but control was transferred to the Java Community Process, which built the Java Data Objects specification. This specification names the conditions for such a database, but does not guide the implementation.

Some special properties:

  • Class inheritance is supported in the data model.
  • Object nesting: an object can contain (not just link to) other objects

When mapped to an RDBMS via a so-called ORM (Object Relational Mapping), objects are commonly stored in a standard relational way: one column per (single valued) attribute. To implement inheritance, the columns of all base classes of an object are joined. This can be done at design-time (create a big table containing the columns of all parent classes) or at query-time (join the parent class tables).

class id | object id | color | width | height | name
10 | 13 | red | 100 | 100 | my box
10 | 14 | green | 50 | 500 | old beam


class id | class name | parent class


Document based databases

A document based database is a different beast altogether. It lacks a database schema completely, and a complete object is stored in a single cell. In the case of CouchDB, this is done by encoding the object (or: document) in JSON. Real-time querying of the source table is thus impossible; one needs to create views on the data.

object id | document
3 | {"color":"red","width":100,"height":100,"name":"my box"}
4 | {"color":"green","width":50,"height":500,"name":"old beam"}
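The storage scheme is simple enough to sketch. The Python fragment below is illustrative only (CouchDB's actual machinery is far more involved), but it shows why real-time querying needs views: without one, every document must be decoded and filtered.

```python
import json

# Illustrative sketch of a document store cell: the whole object is
# encoded as JSON and stored in a single column, keyed by object id.
documents = {
    3: json.dumps({"color": "red", "width": 100, "height": 100, "name": "my box"}),
    4: json.dumps({"color": "green", "width": 50, "height": 500, "name": "old beam"}),
}

# Without a view, "querying" means decoding every document and filtering:
def find_by(attribute, value):
    return [oid for oid, doc in documents.items()
            if json.loads(doc).get(attribute) == value]

print(find_by("color", "green"))   # → [4]
```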



Triplestores

Some triplestores are publicly available. They commonly have an RDF interface. Their performance can be measured using the Lehigh University Benchmark (LUBM). The most advanced open source triplestores are Sesame and ARC.

object id | attribute id | value
3 | 104 | my box
4 | 104 | old beam


Very little has been made public about the way triplestores are implemented in a relational database. A laudable exception to this is the Jena2 database schema. Unfortunately, the schema appears to be very inefficient, since the URIs are not indexed but are used literally.

A charmingly simple implementation that seems resource intensive was made for expasy4j: triples are stored in a single table, but for query speed, a single column is reserved for each separate datatype.

Another, somewhat better implementation was made for OpenLink Virtuoso: it uses indexed URIs, but all constants are placed in a single field with datatype "ANY".


I hope this article has shown you a little of why developers are looking for alternatives to the familiar RDBMS and which forms these alternatives currently take. The field is quite diverse and developments are being made by many different parties. It will be interesting to see how this evolves and which alternative(s) will eventually become the successor of the relational database.

Semantic web marvels in a relational database - part I: Case Study

This blog article first appeared on http://techblog.procurios.nl/k/n618/news/view/34300/14863/Semantic-web-marvels-in-a-relational-database---part-I-Case-Study.html 

01 June 2009

You have heard about the semantic web. You know it is described as the future of the Web. But you are still wondering how this vision is going to make your applications better. Can it speed up application development? Can it help you to build complex datastructures? Can you use Object Oriented principles? This article shows how it can be done. And more.

By Patrick van Bergen

The semantic web is a framework developed by the W3C under supervision of Tim Berners-Lee. Its basic assumption is that data should be self-descriptive in a global way. That means that data does not just express numbers, dates and text; it also explicitly expresses the types of relationship these fields have to their objects. Using this uniform datastructure, it becomes easier to interchange data between different servers, and most of all, data can be made accessible to global search engines.

That is a big thing. But is that all? Can't you just provide an RDF import / export tool for your data and be done? Are there any intrinsic reasons why you would base your entire datastructure on the semantic web?

In a series of two articles I will try to explain how we at Procurios implemented semantic web concepts, what the theoretical background of our implementation is, and what benefits a semantic web has over a traditional relational database. In this first article I will explain how we implemented a semantic web in a relational database (we used MySQL), added an object oriented layer on top, and even created a data revision control system from it.


In a classic relational database, data is stored in records. Each record contains multiple fields. These fields contain data that may belong to some object. The relation between a field and the object it belongs to is not represented as data in the database. It is only available as metadata in the form of the column definition (name, datatype, collation, foreign keys). An object is not explicitly modelled, but rather represented via a series of linked tables.

A semantic web is a network of interrelated triples ("subject-predicate-object" triplets) whose predicates are part of the data themselves. Moreover, each object has an identifier that is not just an integer number that means something only inside the database. It is a URI that may have a distinct meaning worldwide.

A triple is a record containing three values: either (uri, uri, uri) or (uri, uri, value). In the first form the triple relates one object to another, as in the fact "Vox inc. is a supplier" ("Vox inc.", "is a", and "supplier" are all semantic subjects identified by a uri). In the second form the triple links a constant value to a subject, as in "Vox inc.'s phone number is 0842 020 9090". A naive implementation would look like this:

CREATE TABLE `triple` (
    `subject`               varchar(255) NOT NULL,
    `predicate`             varchar(255) NOT NULL,
    `object`                longtext
);

This table provides a complete implementation for the semantic web. However, it is too slow to be used in any serious application. Now, there are various ways in which this basic form can be optimized, but to my knowledge there is no best practice available. Several problems have to be met:

  • How to identify a triple uniquely (if this is necessary for your application; the combination of subject, predicate, and object is not itself unique)
  • How to search fast, given a subject and predicate? ("Give me the names of this set of people")
  • How to search fast, given a predicate and an object? ("Give me the persons whose name begins with `Moham`")

To solve these problems we came up with the following changes:

  • Create a single triple index table that only stores triple ids.
  • Create separate triple tables for each of the major datatypes needed (varchar(255), longtext, integer, double, datetime)
  • The triple tables reference the index table by foreign key.
  • Add two extra indexes for the two ways the tables are used: a subject-predicate combined key and a predicate-object combined key.

Here's the triple index table (we are using MySQL):

CREATE TABLE `triple` (
    `triple_id`             int(11) NOT NULL auto_increment,
    PRIMARY KEY (`triple_id`)
);

and here's the triple table for the datatype "datetime" (the other datatypes are handled similarly):

CREATE TABLE `triple_datetime` (
    `triple_id`             int(11) NOT NULL,
    `subject_id`            int(11) NOT NULL,
    `predicate_id`          int(11) NOT NULL,
    `object`                datetime NOT NULL,
    `active`                tinyint(1) NOT NULL DEFAULT '1',
    PRIMARY KEY (`triple_id`),
    KEY (`subject_id`, `predicate_id`),
    KEY (`predicate_id`, `object`),
    CONSTRAINT `triple_datetime_ibfk_1` FOREIGN KEY (`triple_id`) REFERENCES `triple` (`triple_id`) ON DELETE CASCADE
);

The table definition should speak for itself, except for the field "active". This field is not necessary at this point, but I will need it in the next section.

The predicate_id refers to a separate "uri" table where the full uris of these predicates are stored. However, this is not necessary and the uris may be stored in the triple_longtext table as well.

The two combined keys have an interesting side-effect: the application developer never again needs to be concerned about using the right keys. Effective keys have been added by default.

Building SQL queries by hand to query this triplestore may be a daunting task. A special query language is required to do it effectively. More about that below.

All data of a given object can be queried by selecting all triples with a given subject id (one query per datatype triple table). That seems inefficient, and it is: compared to the situation where an object can be stored in a single record, the triplestore is always slower. However, in a more complex situation a relational database requires you to join many tables to fetch all data. We use 5 separate queries (one per datatype table) to fetch all object data from the triplestore. This turned out to be faster than a single query that unions the five. We use the same 5 queries to fetch all data of any desired number of objects. Here are the queries needed to fetch the object data of three objects identified by the ids 12076, 12077, and 12078:

SELECT `object` FROM `triple_varchar` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_longtext` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_integer` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_double` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_datetime` WHERE `subject_id` IN (12076, 12077, 12078);
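As an illustration, the five queries above can be generated mechanically from the list of datatype tables. A small Python sketch (the function name is my own):

```python
# Sketch: build the five per-datatype queries for a set of subject ids,
# mirroring the triple_varchar / triple_longtext / ... tables above.
DATATYPES = ["varchar", "longtext", "integer", "double", "datetime"]

def object_data_queries(subject_ids):
    ids = ", ".join(str(i) for i in subject_ids)
    return [f"SELECT `object` FROM `triple_{t}` WHERE `subject_id` IN ({ids});"
            for t in DATATYPES]

queries = object_data_queries([12076, 12077, 12078])
print(queries[0])
# → SELECT `object` FROM `triple_varchar` WHERE `subject_id` IN (12076, 12077, 12078);
```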

You can see that the object-data is fetched from the database without having to provide explicit type or attribute information. The type of the object is stored in one of the triples. This is useful in case of inheritance where the exact type of an object can only be determined at runtime.

Arrays and multilinguality

Many object attributes have an array datatype (an unordered set). To model these in a relational database you would need a separate table for each of these attributes. Querying all attributes of a series of objects including these array attributes is far from easy. In the triple store you can model an unordered set as a series of triples having the same subject and predicate and a different object. When you query all object data, you will get the array values the same way as you get the scalar values.

Multilinguality is also a hassle in relational databases. For each of the attributes that need to be available in more than one language the table structure needs to be adjusted and it is hard to avoid data duplication. In a triplestore you can treat a multilingual attribute almost like an array element. The only thing is that the predicates are similar but not the same. We use the following uri's for the representation of different language variants of an attribute: http://our-business.com/supplier/description#nl, http://our-business.com/supplier/description#en, http://our-business.com/supplier/description#de (in the tables these predicates are replaced by their integer ids for faster querying).

Data revision control

Version control is pretty common for developers when it comes to storing previous versions of their code. It allows you to track changes to the code, revert to a previous version, and work on the same file together. Still, when it comes to data, version control is very uncommon. I think that is mainly because in a traditional relational database the overhead of creating such a system is huge.

One of the requirements for our framework was that there should be some form of data-change history available. And when you think about it, it is actually really simple to keep track of all the changes made to the data if you use triples. That's because, from a version-control point of view, all that changes in each revision is that some triples are added and others are removed.

So all that is needed is two more tables: one to keep track of the revision data (who made the change, when, and a short description for future reference), and another to track all the triples added and removed in the revision:

CREATE TABLE `revision` (
    `revision_id`           int(11) NOT NULL auto_increment,
    `user_id`               int(11),
    `revision_timestamp`    int(11) NOT NULL,
    `revision_description`  varchar(255),
    PRIMARY KEY (`revision_id`)
);

CREATE TABLE IF NOT EXISTS `mod_lime_revision_action` (
    `action_id`             int(11) NOT NULL AUTO_INCREMENT,
    `revision_id`           int(11) NOT NULL,
    `triple_id`             int(11) NOT NULL,
    `action`                enum ('ACTIVATE', 'DEACTIVATE') NOT NULL,
    `section_id`            int(11),
    PRIMARY KEY (`action_id`),
    CONSTRAINT `revision_triple_ibfk_1` FOREIGN KEY (`revision_id`) REFERENCES `revision` (`revision_id`) ON DELETE CASCADE,
    CONSTRAINT `revision_triple_ibfk_2` FOREIGN KEY (`triple_id`) REFERENCES `triple` (`triple_id`) ON DELETE CASCADE
);

Each time a user changes the data, a new revision is stored in the database, along with a list of all triples that were activated or deactivated, and a compact description of the change. A triple that was already available in an inactive state is made active; if no such triple was present, an active one is created. Triples are never really removed; they are only set to inactive.

If you query the triplestore (the set of all triples), you need to ensure that only the active triples are queried.

With this information, you can:

  • List all revisions made to the data, showing who made the change and when, along with a small description of the change.
  • Revert to a previous revision by performing the revisions backwards: activate the deactivated triples, deactivate the activated triples. It is also possible to undo a single revision that is not even the last one, but beware that revisions following it may depend on it.
  • Work together on an object by merging the changes made by two users, using the difference in data between the start and end revisions.
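The activate/deactivate mechanism is easy to sketch. The following Python fragment is a simplified model of my own (it assumes each recorded action actually flipped the state of its triple, so a revision can be undone by performing it backwards):

```python
# Sketch of the revision mechanism: triples are never deleted, only
# flipped between active and inactive, and each revision records its flips.
active = {}            # triple_id -> bool
revisions = []         # list of [(triple_id, action), ...]

def apply_revision(actions):
    for triple_id, action in actions:
        active[triple_id] = (action == "ACTIVATE")
    revisions.append(actions)

def revert_last_revision():
    """Perform the last revision backwards."""
    for triple_id, action in revisions.pop():
        active[triple_id] = (action != "ACTIVATE")

apply_revision([(1, "ACTIVATE"), (2, "ACTIVATE")])
apply_revision([(2, "DEACTIVATE"), (3, "ACTIVATE")])
assert {t for t, on in active.items() if on} == {1, 3}
revert_last_revision()
assert {t for t, on in active.items() if on} == {1, 2}
```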



Object database

Businesses are used to working with objects. A web of data needs to be structured before it can be used for common business purposes. To this end we decided to build an object oriented layer on top of the triplestore. Even though the Web Ontology Language (OWL) was designed for this purpose, we did not use it: we needed only a very small subset anyway, we wanted complete freedom in our modelling activities, and processing speed was very high on our priority list. I will not cover all the details here, since it is a very extensive project, but I want to mention the following features:

  • The database was set up as a repository: no direct database access is possible by the application developer. Object creation, modification, destruction, and querying are done via the repository API. This provided the OOP principles of information hiding and modularity.
  • Object types could be associated with PHP classes. This is not a requirement, but it proved really easy to generate object types from PHP classes. This provided us with the principle of polymorphism.
  • Not only simple objects are modelled as objects (a set of triples, having the same subject), but object types as well. Furthermore, the attributes of the types are modelled as objects as well. Objects and their types can be used in the same query.
  • Object types can be subtyped. The triplestore allows us to query objects of a given type and all its subtypes in a straightforward way.
  • The attributes of objects can be subtyped as well. This allows you to add datatype restrictions to the attributes of subtypes that do not apply higher up the type hierarchy.

These features are very powerful. It is possible to build a real Object database using only triples as a datastructure. Types and attributes are treated the same as normal objects. This means that the same code can be used to manipulate normal data as well as metadata. Also, implementing inheritance is relatively easy, since object data is no longer chunked into single rows.

Query language

After some time we felt that the simple queries we were performing on the object database were too constraining. We wanted the same power that SQL provides. On top of that, since we continue to use normal relational tables as well, the object queries needed to be able to combine the object database with the relational tables. For these reasons the semantic web query language of choice, SPARQL, was insufficient for our purposes. We now build SQL-like queries using method chaining on a query object. The object then creates the SQL query.

I mention this because you really need to build or use a new query language when starting to work with either a triplestore or an object database. The underlying triple store is too Spartan for an application developer. The lower level SQL queries consist of many self-joins connecting the subject of one triple to the object of another. Very hard to understand.
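For illustration, here is a minimal Python sketch of the method-chaining idea (the class and method names are made up; an actual query object would support far more of SQL):

```python
# Sketch of building SQL through method chaining on a query object.
class Query:
    def __init__(self, table):
        self.table = table
        self.conditions = []
        self.columns = ["*"]

    def select(self, *columns):
        self.columns = list(columns)
        return self            # returning self enables chaining

    def where(self, condition):
        self.conditions.append(condition)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

sql = Query("triple_integer").select("object").where("subject_id = 12076").to_sql()
print(sql)   # → SELECT object FROM triple_integer WHERE subject_id = 12076
```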


I wrote this article because I think this sort of detailed information about emerging datastructures is lacking on the internet. It is also great to work for a company (Procurios!) that agrees with me that knowledge should be given back to the community if possible. Gerbert Kaandorp from Backbase once asked me very seriously what I had done to promote the semantic web, like it was some kind of holy mission. I hope this article has made a small contribution and that it inspires some of you to build your own semantic web based object datastore. Let me know what you think!

zaterdag 28 november 2020

The right way of looking up names in a database while understanding a human sentence - this is it!

This is about natural language processing. Parsing a sentence. That holds names. Of persons, or cars, or whatever. When these names are linked to ID's in the database. 

 Let's just get to the point. These are the steps you need to take to parse a sentence, turn it into a semantic representation and resolve the names in the input to ID's in the database.

  • Parse
  • Relationize
  • Find Sorts
  • Find Names

The first thing to note is that these steps can be taken sequentially. It is not necessary to combine them.


Parsing the sentence is done with a parser using a grammar that contains rewrite rules. Each rewrite rule has pieces of semantics attached. I suggest you use Earley's algorithm. Some example syntax rules with semantics (= sense):
{ rule: s(S1) -> interrogative(S1),                                     
  sense: go:intent(question) }
{ rule: interrogative(P1) -> 'who' copula(_) np(E1),                    
  sense: go:quant_check($np, go:intent(who, E1) dom:person(E1)) }
{ rule: proper_noun_group(N1) -> proper_noun(N1) proper_noun(N1) proper_noun(N1) }
{ rule: proper_noun_group(N1) -> proper_noun(N1) proper_noun(N1) }
{ rule: proper_noun_group(N1) -> proper_noun(N1) }

When the parse is complete you have one or more syntax trees and each node in the tree may have semantic attachments.

In the rules above, take a special look at the `proper_noun_group` and `proper_noun` categories. The `proper_noun_group` matches a multi-word name (there are three rules, one for 1 word, one for 2 words, and one for 3 words). The `proper_noun` category matches a single word in the name.

Insight 1: the `proper_noun` category matches any word

That's right: this grammar has a category that matches any word. Why? Because names can be anything, from "Michael Jackson" to "IBM", "iPhone", "the miracles", "X11", and of course the legendary band the the (i mean The The). If these names are in a database, there is no need to guess which words are names, just look them up. This is how to do it.

You may be worried, nay, you should be worried, that this approach leads to many, many parse trees. And yes, this is true, a little, but not much. It leads to some more parse trees, but not a lot. And this is because of

Insight 2: syntax constrains the number of possible parse trees enormously

While any word can be a proper noun, and thus a noun, these proper nouns have to fit in with the rest of the parse tree. Once the parse is complete, only valid parse trees remain. Thus reducing the number of parse trees enormously.

Relationize and extract names

Combine the semantic attachments to create a single semantic representation of the sentence. No need for the syntax anymore.

When you walk through the syntax tree to combine the semantic representations into a single representation, you occasionally come across these `proper_noun_group` nodes that matched names. When you do, pick up the name-words from the child-nodes and join them into the full multi-word name. Create a mapping from variable to name, like this:

  • E1: "Michael Jackson"
  • E2: "Lisa Marie Presley"

The variables refer to the entities that form the arguments of the relations in the semantic representation. Here's a simple semantic representation that we can use as an example:

marry(E1, E2)

We must now find the database IDs for these variables, so that we may replace the variables with database IDs, like this:

marry(19932, 20021)

Once we have that, we can check this relation against the database. But how do we find the name in the database?

Find sorts

Usually there is some table that maps names to ids, of course. And in the simple case where you need only a single entity type, this is straightforward. But I want to go a bit beyond that and consider the case where you have tens or hundreds of entity types, each with its own tables. Must you look up the name in each of these tables? No. And what if the name is present in multiple tables? Ouch.

The proper way to address this is to find out the sorts of the entities involved. The sort is the type of the entity. 

Each relation has semantic selection restrictions that constrain the sorts of its arguments. For example:

marry(person, person)

Once you find this out, you can map the variables to sorts, like this:

  • E1: person
  • E2: person

Now you know that it's two persons that married. And you can look up the names in the person table, not the country table.

Find names

In the last step you take the variables, the names and the sorts, and you check which database supports which sorts and which table belongs to that particular sort. You may need database specific information about how to extract the ID from the table. But at least you know where to look for the name.

Database 1:
    sort = person, locate = relation(Id, _, Name, _, _)

At this point you know the exact table in which to look for the name, and you need only a single query to find the ID that belongs to it. When this is done, you have the variable-to-ID link, like this:

  • E1: 19932
  • E2: 20021

And you can replace the variables with constants:

marry(19932, 20021)
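The last three steps can be condensed into a small sketch. The Python below is purely illustrative (the signature table and name table are made up to match the example):

```python
# Illustrative sketch of the final steps: the signature gives the sorts,
# each sort names the table in which to look up the name, and the found
# ids replace the variables.
SIGNATURES = {"marry": ("person", "person")}    # relation -> argument sorts
NAME_TABLES = {"person": {"Michael Jackson": 19932,
                          "Lisa Marie Presley": 20021}}

def resolve(relation, names):
    """names maps variables to name strings, in argument order."""
    sorts = SIGNATURES[relation]                # find sorts
    ids = [NAME_TABLES[sort][name]              # find names: one lookup each
           for (variable, name), sort in zip(names.items(), sorts)]
    return (relation, tuple(ids))               # substitute ids for variables

result = resolve("marry", {"E1": "Michael Jackson", "E2": "Lisa Marie Presley"})
print(result)   # → ('marry', (19932, 20021))
```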

As you can see, this representation does not have naming information in it any more. It has direct links to the primary key of the tables. And you know what this means: speed! And also

Insight 3: resolve names before executing the semantic representation


Why did I use this slightly hysterical way of writing an article? Because it is hard to get attention for this subject. And because it took me five years to get to this point, and it was a real revelation when I found out. It finally clicked. It all came together. It's so simple, once you see it.

I hope you enjoyed this and it makes you want to use it. Good luck!

zaterdag 25 januari 2020

Combining the data of multiple databases in a single natural language query

In our time we have at our disposal a growing number of open data sources. We can query these individually, and that's nice. But wouldn't it be awesome if we could make queries that combined information from diverse data sources at once?

Suppose that you have two or more open data databases at your disposal. One contains information about the relations between famous persons, the other about books and authors. What if you were to ask:
Did Charles Darwin's grandfather write "The Loves of the Plants"?

Database 1 might hold

parent(`a11`, `a34`)
parent(`a34`, `a71`)
name(`a11`, "Erasmus Darwin")
name(`a71`, "Charles Darwin")

and database 2 might hold

author(`x21-r11`, `w909`)
person(`x21-r11`, "Darwin", "Erasmus")
work(`w909`, "The Loves of the Plants")

You can see that one part of the information necessary to answer the question is available in database 1, while the other part is present in database 2, but it is not clear how these parts could be combined. Is it possible to combine these two to answer the question?

Shared identities

I think the answer to this problem can be found in what I will call shared identities: database independent ids that can be used to reason about entities that are present in more than one database.

The shared identity consists of an id and a number of database ids. The shared identities that would be used in the example above could be

[type: person, shared-id: darwin-105, db-ids: {db1: a71}]
[type: person, shared-id: darwin-99, db-ids: {db1: a34}]
[type: person, shared-id: darwin-21, db-ids: {db1: a11, db2: x21-r11}]

The shared identity has an id ("darwin-21") that is present in neither database, yet it can be used to populate the query. At the same time the ids of the databases where this entity is represented are available ("db1", "db2").

How does that work?

When the question is processed, a system will first look up the names "Charles Darwin" and "The Loves of the Plants" in the databases. This results in "a71" from db1 and "w909" from db2.

Next, the system will look up the shared identities from these two entities. Now most entity types do not have shared identities. "Work" for example only occurs in database 2, so there's no need for an extra super-database construct. The id from db2, "w909", can be used as shared identity for the work "The loves of the Plants". Persons, on the other hand, are represented in both databases, and so we look up the shared identity for "a71", which is "darwin-105".

Our query Did Charles Darwin's grandfather write "The Loves of the Plants"? will then look simply like this:

parent(E1, E2)
parent(E2, `darwin-105`)
author(E1, `w909`)

Or: did the parent of the parent of `darwin-105` author `w909`?

When this query is processed, the system will pass the relations to the underlying databases.

parent(E2, `darwin-105`)

contains a shared identity. Its ids will be looked up and replaced before the relation is passed to the database:

parent(E2, `a71`)

The database will return a34 and the system will look up its shared id: `darwin-99`. E2 will be bound to `darwin-99`.

The system will work with shared identities everywhere, except when it deals with an individual database directly.

parent(E1, E2)

will then be

parent(E1, `darwin-99`)

and E1 will be found to be `darwin-21`. In order to resolve author(`darwin-21`, `w909`), `darwin-21` will need to be transformed into the database id `x21-r11`.

Who creates the shared identities?

Shared identities are stored as mapping tables: database-id <==> shared id, one table for each entity type in a database.

Only entities that live in multiple databases need shared identities. For entities that live in a single database we can still use the term "shared identity" but only use the database id. No mapping tables are used.

The mapping tables can be constructed manually or be generated by rules or heuristics. It must be done off-line, not during the query, because it takes too much time.

The tables used in the example above can be constructed as follows: for each person, take the name from db1 and check if the same name occurs in db2, by concatenating first name and last name. If so, add an entry to the person mapping tables of db1 and db2, using the unique database ids. Make up a shared identity; it doesn't matter what it is.

db1: darwin-21 = a11
db2: darwin-21 = x21-r11

If a person occurs in only one database, just add the mapping for that single database.
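A sketch of the mapping tables in Python (illustrative only; the ids are the ones from the example):

```python
# Sketch of shared identity mapping tables: one per database, built off-line.
to_shared = {
    "db1": {"a11": "darwin-21", "a34": "darwin-99", "a71": "darwin-105"},
    "db2": {"x21-r11": "darwin-21"},
}
# The reverse direction, shared id -> database id, is derived from the same tables:
to_db = {db: {s: d for d, s in table.items()} for db, table in to_shared.items()}

# Before passing a relation to a database, shared ids are translated back:
assert to_db["db1"]["darwin-105"] == "a71"
# and database results are translated into shared ids:
assert to_shared["db1"]["a34"] == "darwin-99"
# the same person resolves to different ids in different databases:
assert to_db["db2"]["darwin-21"] == "x21-r11"
```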

woensdag 16 oktober 2019

Translation in PHP using class constants

Translation in PHP frameworks is done in several different ways, but in all of them (?) the source text is a string. The translations are stored in CSV files, .mo files or in PHP arrays.

I propose to store them in class constants. This makes loading translations faster and creates a better integration with the IDE.

Translation classes

In this proposal a translation source file looks like this:

class CatalogTranslator extends Translator
{
    const price = "Price";
    const name = "Name";
    const nTimes = "%s times";
}

Each module has its own source file. This one belongs to the module Catalog. Note that we now have a source (i.e. nTimes) and a default translation ("%s times" in English, in this application).

The translation file (for the main German locale) is like this:
class CatalogTranslator_de_DE extends CatalogTranslator
{
    const price = "Preis";
    const nTimes = "%s Mal";
}

Note that the translation file extends the source file. This translation class has two translations, but lacks a translation for "name". That's ok; the value is inherited from its base class.

Translation calls

A function that needs translation functionality starts by creating an instance of a translation class for this module:

$t = CatalogTranslator::resolve();

And here is an example of an actual translation call:

echo $t::name;

There it is: $t::name. It is a constant from a class that was resolved by the CatalogTranslator::resolve() call. When the request's current locale is de_DE, the class will be CatalogTranslator_de_DE; when the locale is fr_FR it will be CatalogTranslator_fr_FR. Which class is selected is determined by the resolution function resolve.

Translator class resolution

Did you notice that CatalogTranslator extends the class Translator? This base class contains the static resolve function that determines the runtime translation class:

class Translator
{
    /**
     * @return $this
     */
    public static function resolve() {
        return $xyz->getTranslator(static::class);
    }
}
The @return $this annotation tells the IDE that an object of class CatalogTranslator is returned. A subclass of it, in fact.

$xyz stands for some object in your framework that defines the getTranslator function. All that matters is that the runtime class (static::class) is passed in to determine the name of the active translation class.

The function getTranslator looks like this:
class Xyz
{
    protected $translatorClasses = [];

    public function getTranslator(string $class): Translator
    {
        if (!array_key_exists($class, $this->translatorClasses)) {
            $locale = $this->getLocale();
            $translatorClass = $class . '_' . $locale;
            $this->translatorClasses[$class] = new $translatorClass();
        }
        return $this->translatorClasses[$class];
    }
}
As you can see, the resolver fetches the active locale from the request (again, your framework will have a different approach) and simply appends it to the runtime classname of the translation:

CatalogTranslator + de_DE = CatalogTranslator_de_DE
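Although the article uses PHP, the resolution step itself is easy to sketch in another language. The Python below is illustrative only; the lookup dictionary stands in for PHP's class loading, and falling back to the base class is my own simplification:

```python
# A Python analogue of the resolution step: the active locale is appended
# to the translator's class name to find the translation class.
class CatalogTranslator:
    price = "Price"
    name = "Name"

class CatalogTranslator_de_DE(CatalogTranslator):
    price = "Preis"                 # "name" is inherited from the base class

CLASSES = {cls.__name__: cls for cls in
           (CatalogTranslator, CatalogTranslator_de_DE)}

def resolve(base_class, locale):
    # Fall back to the base class (the default translations) if no
    # translation class exists for this locale.
    return CLASSES.get(f"{base_class.__name__}_{locale}", base_class)

t = resolve(CatalogTranslator, "de_DE")
print(t.price, t.name)   # → Preis Name
```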

Advantages and disadvantages

An advantage of this approach is that the translations do not need to be loaded into memory on each request; they already reside in the opcache. Another is that translations are clearly scoped to the module they are defined in: no naming conflicts with other modules. Further, this approach is IDE friendly. Just control-click on the constant and you are in the source file. In this file you can find out where this source text is used (Find Usages) and which translations there are (Is overridden) using standard IDE functionality.

Disadvantages: the use of PHP constants to store translations will not be familiar to developers and translation agencies. It is easier to just type a source text directly in source code than to have to define a constant. Also, module-scoped translations mean that you may have to translate the same text several times in different modules. But if this is your only concern, you can adapt the approach into a single set of system-wide translation classes.

Wednesday, 15 May 2019

Natural language interaction in flowcharts


In this post I will sketch some of my ideas about a generic natural language interaction program in the form of a flowchart.

Natural language interaction is any form of computer program that allows you to interact with it in a complex form of natural language (e.g. English, Chinese). It is a subject I have been working on for several years.

A flowchart visualizes the flow of control of a program. It consists mainly of interconnected tasks, but I added intermediate data structures (in blue) because of their special relevance.

What you're about to see is an attempt to unify the aspects of many historical NLI systems in a single system. It's still very sketchy, but I needed to have these thoughts visualized, because it gives me a clearer goal.

Check this Wikipedia article if you're not sure about some of the flowchart symbols.

The main loop

The main loop of the program consists (as can be expected) of a sequence of asking for input, processing it, and displaying the output.

I split the processing part into analyzing and understanding. By analyzing I mean processing the textual string into a logical, context-insensitive data structure (think predicate calculus). By understanding I mean processing this logical structure into a domain-specific representation, a speech act.

Both the analysis and understanding part may fail. Analysis is language dependent. If the program supports multiple languages and analysis in one language fails, the other languages may be tried. The language that succeeds will become the default language for the next input sentence. Understanding is domain specific. If the system supports multiple domains, and understanding fails for one domain, other domains may be tried.
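This fallback scheme can be sketched in a few lines. The following Python sketch is purely illustrative; the functions analyze and understand, and the language and domain lists, are my own stand-ins, not part of any existing system.

```python
# Illustrative sketch of the main loop's fallback logic:
# try each language for analysis, then each domain for understanding.
LANGUAGES = ["en", "zh"]
DOMAINS = ["blocks-world", "geography"]

def analyze(text, language):
    """Return a context-insensitive logical form, or None on failure (stub)."""
    return {"lang": language, "logic": text.lower()}

def understand(logic, domain):
    """Return a domain-specific speech act, or None on failure (stub)."""
    return {"domain": domain, "act": logic["logic"]}

def process(text, languages=LANGUAGES, domains=DOMAINS):
    # The language that succeeds would become the default
    # for the next input sentence.
    for lang in languages:
        logic = analyze(text, lang)
        if logic is None:
            continue
        for domain in domains:
            act = understand(logic, domain)
            if act is not None:
                return act
    return None  # both analysis and understanding failed everywhere
```

In a real system analyze and understand would of course be substantial components; here they only stand in for the control flow of the diagram.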

The parts that look like this are called predefined processes. They are placeholders for other flowcharts, shown below.

Analyse input

The analysis of input forms the standard pipeline of natural language processing.

Tokenization splits a string into words. Parsing creates a parse tree. Semantic analysis maps the forms to meaning. Quantifier scoping creates nested structures for quantifiers like "all". Pragmatic analysis resolves variables like "this" and "her".

The analysis phase needs information in the form of a lexicon (word forms), a grammar (rewrite rules) and semantic attachments (from word form and phrase to meaning structure).
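The pipeline above can be sketched as a chain of functions. This is a minimal Python sketch with stubbed stages; the function names mirror the stages in the text, and everything inside them is a placeholder of my own, not a real NLP implementation.

```python
# Each stage is a stub that only shows the shape of the pipeline.
def tokenize(text):
    return text.split()

def parse(tokens):
    return ("S", tokens)  # flat stand-in for a real parse tree

def semantic_analysis(tree):
    _, tokens = tree
    return [("word", t) for t in tokens]  # stand-in meaning structure

def quantifier_scoping(logic):
    return logic  # would create nested structures for "all", "some", ...

def pragmatic_analysis(logic, context):
    return logic  # would resolve "this", "her" using the dialog context

def analyse_input(text, context=None):
    tokens = tokenize(text)
    tree = parse(tokens)
    logic = semantic_analysis(tree)
    logic = quantifier_scoping(logic)
    return pragmatic_analysis(logic, context or {})
```

The lexicon, grammar, and semantic attachments mentioned above would be the data that parametrizes these stages.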

Ask user

Any nontrivial system sometimes needs to ask the user to disambiguate between several possible interpretations. To do this, it is sensible to present the user with a multiple choice question (i.e. not an open question). This way, any answer the user gives is OK and will not cause a new problem in itself.

The added idea here is that a user should not be asked the same question more than once in a session, so any answer he/she gives is stored for possible later use. The store, the "dialog context", may be used for pragmatic variables as well. The active subject of a conversation, for example, which may be referred to as "he", "she" or "it", might well be stored here.
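A dialog context along these lines could look like the following sketch. The class and its fields are my own invention for illustration; only the behavior (never ask the same question twice, keep referents around) comes from the text.

```python
# Sketch of a dialog context: caches answers so a question is asked
# at most once per session, and holds pragmatic referents ("he", "it").
class DialogContext:
    def __init__(self):
        self.answers = {}    # question -> answer given earlier
        self.referents = {}  # e.g. "it" -> active subject

    def ask(self, question, options, ask_user):
        # ask_user is the multiple-choice prompt; it is only
        # invoked if this question has no stored answer yet.
        if question not in self.answers:
            self.answers[question] = ask_user(question, options)
        return self.answers[question]
```

The multiple-choice constraint from the text lives in ask_user: whatever option the user picks is a valid answer, so no new problem is introduced.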

Speech act processing

This is the central component of an NLI system. The input is a logical structure that has been formatted in a way that makes sense in a certain domain. The term "speech act" fits this description well. The system then looks up a "solution", a way of handling, for this speech act. This solution consists of a goal (which is usually just the contents of the input, in a procedural form) and a way of presenting the results (this depends on the type of speech act, and even on the type of question). The latter thus forms a goal in itself.

The red square is used for internal data structures. The one shown here lists a number of available solutions.
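A minimal sketch of such a solution table follows. The speech-act types, the set-based "database", and all names are assumptions of mine; the point is only the pairing of a goal with a way of presenting its result.

```python
# Each solution pairs a goal (procedural form of the input)
# with a presentation of the results.
SOLUTIONS = {
    "yes-no-question": {
        "goal": lambda act, db: act["condition"] in db,
        "present": lambda result: "Yes." if result else "No.",
    },
    "command": {
        # db.add returns None, so "or True" marks the action as done
        "goal": lambda act, db: db.add(act["condition"]) or True,
        "present": lambda result: "OK.",
    },
}

def handle_speech_act(act, db):
    solution = SOLUTIONS[act["type"]]   # look up the way of handling
    result = solution["goal"](act, db)  # achieve the goal
    return solution["present"](result)  # present the result
```

Here the "database" is just a Python set of facts; the diagram's list of available solutions corresponds to the SOLUTIONS table.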

Process procedure

The assumption in this post is that the internal processing of a speech act is done in a procedural way (think Prolog), where a goal is achieved by fulfilling its subgoals.

The symbol

stands for a choice. One of these ways may be used to process a procedure.

The goal may just be to look up a fact in a domain database. Or it may involve reasoning or planning. Finally, it may require introspection into its own former goals: an advanced topic, only handled by SHRDLU. I apologize for the extreme sketchiness of this part. I have never programmed "planning", nor the introspective parts.

The takeaway of this diagram is that any goal may be achieved in many ways that all require intelligence. A goal may have subgoals, and these subgoals can have subgoals of their own. The leaf nodes of such a structure consist of a simple database lookup, or a simple action. An action may update a database, or perform an actual action in the world (again, SHRDLU being a perfect example).
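The recursive goal/subgoal idea can be sketched in a Prolog-like resolver. This is a toy sketch under my own assumptions: facts are a set of strings, and a rule maps a goal to a list of subgoals.

```python
# A goal succeeds if it is a known fact (leaf: database lookup),
# or if some rule for it succeeds on all of its subgoals.
def achieve(goal, rules, facts):
    if goal in facts:
        return True
    for head, subgoals in rules:
        if head == goal and all(achieve(g, rules, facts) for g in subgoals):
            return True
    return False
```

Subgoals may themselves be rule heads, so the recursion gives exactly the tree of goals and subgoals described above.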

Database query and update

Transforming a logical structure into a database query is an art in itself, but in a simple flowchart it just looks like this:


My approach to accessing a database is not to create a grand single query that fetches all information at once. Instead, I think it works better to have many elementary queries, each selecting a single field. No aggregations.
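An elementary query of this kind can be sketched as follows. The rows-as-dicts representation and the example table are made up for illustration; the point is one field per query, no aggregations.

```python
# One elementary query: the values of a single field,
# for the rows matching all given conditions.
def select_field(rows, field, where):
    return [row[field]
            for row in rows
            if all(row.get(k) == v for k, v in where.items())]
```

A complex question would then be answered by combining many of these small lookups, rather than by one large joined query.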
