donderdag 13 mei 2021

Designing Software with the Needs of Human Beings in Mind

Some developers seem to think that writing tests for a software application is the best and only thing needed to write good software. It isn't. There's a lot more to it than that.

When you design or architect a piece of software, you need to be aware of the needs of all people that are somehow involved in it.

By needs, I mean the human needs or concerns that form the purpose of the software's existence. To write good software you need to be aware of these.

Needs

A need is something that is essential to the well-being of a person. I use the term need in the sense of Maslow's hierarchy of needs. People have a variety of needs and they expect some of these needs to be fulfilled by the software application.

These are some of the human needs that are related to software:

  • Basic needs
  • Safety
  • Esteem / Respect
  • Belonging
  • Creativity

There are several types of people involved in your software:

  • end-users: the people that actually use your software
  • developers: you, your fellow developers, others
  • business: managers, CEOs, business-oriented people

These people have many different needs, most of which you don't know about. But in some way they are reflected in different aspects of the software.

Aspects

Software has a number of needs-related characteristics. Let's call them aspects. These characteristics can be reduced to one or more basic human needs.  

Functionality

This aspect simply says that the software should do what it's supposed to do. Have lots of features and no bugs. 

The functions of a software application help a person to fulfill all kinds of needs: from buying food to spiritual growth. This is part of why it's so interesting to be a developer. Your work can be used in almost all fields of life.

Lack of functionality brings disappointment to users. They will try competing products, if possible. Good software helps people to reach their goals and fulfill their needs.

Needs: all types of needs
Elements: features, usability, documentation, short time-to-market, bug-free
People: end-users, business

Speed

Processing speed is a pervasive aspect of software. Faster is almost always better. Why? Because whatever you want from a piece of software, your need will be fulfilled earlier if the software is faster. 

Slow software leads to frustration, an emotion that expresses that needs are not fulfilled fast enough. Only fast software can bring people in a state of flow.

Speed is tricky, because at the start of a project, the software you create is fast enough. It just becomes slower and slower as you add more features, or add more data. So you need to think ahead. Create a design with the working software in mind. Picture your customers using it, and estimate the amount of data they will be processing. Plan for scalability. 

Needs: all types of needs
Elements: hardware, architecture, scalability, optimization
People: end-users, developers, business

Maintainability

Maintainability is about the time it takes for someone to change the software. The time it takes to learn it; the ease of making changes.

If your software is maintainable, you and other developers are able to be creative while developing it further. Maintaining software builds a community of developers and this leads to a sense of belonging. Working on well-maintainable software lifts your self-esteem and the esteem of the other developers.

Needs: esteem, belonging, creativity
Elements: elegance, simplicity, architecture, design patterns, documentation, tests, code conventions, logging

People: developers, business

Security

Security is both a software aspect and a basic human need. People need to be safe, and feel that their software is safe to use.

Needs: safety
Elements: privacy, risk analysis, input checking
People: end-users, business

Low Cost

And then there is cost: software development costs money, of course. I call this "low cost", because the lower the cost, the better.

You can also think of cost in a broader sense: the negative effects of the software on society and on the environment.

The cost of software is not simply reducible to a human need. 

Needs: any, indirectly
Elements: timely delivery, cost of development, cost of maintenance, environmental cost
People: end-users, business

About these aspects

These aspects are not cast in stone. I just made them up ;) I picked them to raise your awareness of these things. You may find another subdivision more useful.

The aspects are orthogonal: they don't overlap, and one cannot be reduced to another.

But even though the aspects are orthogonal, they are not independent. Every new feature may reduce the speed. All aspects need maintenance. Everything that must be made costs money.

Note the "people" parts of the aspects: if "developers" are not listed among them, you will need to place yourself in the position of other people to understand their needs. Security, for example, is not important to the developer. The developer will need to consider what the end-user and the business will consider important when it comes to security.

In architecture design these aspects are usually called "stakes" and the people that are involved, stakeholders. What this blog tries to add is the emphasis that these stakes are based on human needs, which makes it a bit more personal.

Balance

You will find yourself thinking: yes, I understand. These are all worthy causes, but it takes way too much time to build software that fulfills all these aspects!

You are right. And this is exactly the point! First you need to be aware of all these aspects. Then you need to weigh all of them. Then you need to come to the right balance: which aspects are more important and which are less important, in the current project? You can just keep this in mind; or you might put it in writing if you intend to set a standard.

But you need to consider these aspects early on in the project. They are architectural decisions. Changing your stance on these issues later may be very hard and costly.

Balancing aspects is the most important role of a software designer or architect.

Finally

You may have noticed that writing tests is just a small aspect of software design. It helps to write maintainable code, but it is not the whole picture. Some projects benefit more from focusing on other aspects. And in some projects automated tests are just a waste of time. It all depends.



maandag 1 februari 2021

How do I write my own parser? (for JSON)

 This blog post first appeared on http://techblog.procurios.nl/k/n618/news/view/14605/14863/how-do-i-write-my-own-parser-(for-json).html

05 December 2008

If no parser is available for the file format you need, writing one yourself may be easier than you think. What file structures are manageable? What would be the design of such a parser? How do you make sure it is complete? Here we describe the process of building a JSON parser in C#, and release the source code.

By Patrick van Bergen

[Download the JSON parser / generator for C#]

The software is subject to the MIT license: you are free to use it in any way you like, but it must keep its license.

For our synchronisation module (which we use to synchronize data between diverse business applications) we chose JSON for data exchange. JSON is just a little better suited to a PHP web environment than XML, because:

  • The PHP functions json_encode() and json_decode() allow you to convert data structures from and to JSON strings
  • JSON can be sent directly to the browser in an Ajax request
  • It takes up less space than XML, which is important in server-to-browser traffic.
  • A JSON string can be composed of only ASCII characters, while still being able to express all Unicode characters, thus avoiding any conversion issues the transport may introduce.
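
A quick PHP illustration of that last point, as a minimal sketch (by default json_encode escapes every non-ASCII character into a \u sequence):

// By default json_encode produces pure ASCII: non-ASCII characters become \u escapes
echo json_encode(array("name" => "Conceição"));
// prints: {"name":"Concei\u00e7\u00e3o"}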

So JSON is very convenient for PHP. But of course we wanted to be able to synchronize with Windows applications as well, and because C# is better suited to this environment, this part of the module was written in this language. The .Net framework just didn't have its own JSON parser / encoder and the open-source software written for this task often contained a whole package of classes and constraints and sometimes the JSON implementation wasn't even complete.

We just wanted a single class that could be imported and that used the most basic building blocks of our application: the ArrayList and the Hashtable. Also, all aspects of JSON would have to be implemented, there should be a JSON generator, and of course it should be fast.

We didn't need more reasons to write our own parser. Writing a parser happens to be a very satisfying thing to do. It is the best way to learn a new programming language thoroughly. Especially if you're using unit-testing to guarantee that the parser / generator matches the language specification exactly. JSON's specification is easy to find. The website http://www.json.org/ is as clear as one could wish for.

You start by writing the unit-tests. You should really write all tests before starting the implementation, but such patience is seldom found in a programmer. You can at least start by writing some obvious tests that help you to create a consistent API. This is an example of a simple object-test:

string json;
Hashtable o;
bool success = true;

json = "{\"name\":123,\"name2\":-456e8}";
o = (Hashtable)JSON.JsonDecode(json);
success = success && ((double)o["name"] == 123);
success = success && ((double)o["name2"] == -456e8);

Eventually you should write all tests needed to check all aspects of the language, because your users (other programmers) will assume that the parser just works.

OK. Parsers. Parsers are associated with specialized software: so-called compiler compilers (of which Yacc is the best known). Using this software will make sure that the parser will be fast, but it does not do all the work for you. What's more, it can be even easier to write the entire parser yourself than to do all the preparatory work for the compiler compiler.

The compiler compiler is needed for languages with a high level of ambiguity. A language expression is parsed from left to right. If a language contains many structures that cannot be identified at the start of the parse, it is advisable to use a tool that is able to manage the emerging complexity.

Unambiguous languages are better suited to building the parser manually, using recursive functions to process the recursive nature of the language. The parser looks ahead one or more tokens to identify the next construct. For JSON it is even sufficient to look ahead a single token. This classifies it as an LL(1) language (see also http://en.wikipedia.org/wiki/LL_parser).

A parser takes as input a string of tokens. Tokens are the most elementary building blocks of a language, like "+", "{", "[", but also complete numbers like "-1.345e5" and strings like "'The Scottish highlander looked around.'". The parse phase is usually preceded by a tokenization phase. In our JSON parser this step is integrated in the parser, because to determine the next token, in almost all cases, it is enough to just read the next character in the string. This saves the allocation of a token table in memory.

The parser takes a string as input and returns a C# datastructure, consisting of ArrayLists, Hashtables, a number of scalar value types and null. The string is processed from left-to-right. An index (pointer) keeps track of the current position in the string at any moment. At each level of the parse process the parser performs these steps:

  • Look ahead 1 token to determine the type of the next construct
  • Choose the function to parse the construct
  • Call this function and integrate the returned value in the construct that is currently built.

A nice example is the recursive function "ParseObject" that parses an object:

protected Hashtable ParseObject(char[] json, ref int index)
{
    Hashtable table = new Hashtable();
    int token;

    // {
    NextToken(json, ref index);

    bool done = false;
    while (!done) {
        token = LookAhead(json, index);
        if (token == JSON.TOKEN_NONE) {
            return null;
        } else if (token == JSON.TOKEN_COMMA) {
            NextToken(json, ref index);
        } else if (token == JSON.TOKEN_CURLY_CLOSE) {
            NextToken(json, ref index);
            return table;
        } else {

            // name
            string name = ParseString(json, ref index);
            if (name == null) {
                return null;
            }

            // :
            token = NextToken(json, ref index);
            if (token != JSON.TOKEN_COLON) {
                return null;
            }

            // value
            bool success = true;
            object value = ParseValue(json, ref index, ref success);
            if (!success) {
                return null;
            }

            table[name] = value;
        }
    }

    return table;
}

The function is only called if a look ahead has determined that a construct starts with an opening curly brace, so this token may be skipped. Next, the string is parsed for as long as the closing brace is not found, or until the end of the string is reached (a syntax error, but one that needs to be caught). Between the braces there are a number of "'name': value" pairs, separated by commas. This algorithm can be found literally in the function, which makes it very insightful and thus easy to debug. The function builds a Hashtable and returns it to the calling function. The parser mainly consists of these types of functions.

If you create your own parser, you will always need to take into account that the incoming string may be grammatically incorrect. Users expect the parser to be able to tell on which line the error occurred. Our parser only remembers the index, but it also contains an extra function that returns the immediate context of the position of the error, comparable to the error messages that MySQL generates.

If you want to know more about parsers, it is good to know that there is a standard work on this subject, which recently (2006) saw its second edition:

Compilers: Principles, Techniques, and Tools, Aho, A.V., Sethi, R. and Ullman, J.D. (1986)

Semantic web marvels in a relational database - part II: Comparing alternatives

This blog article first appeared on http://techblog.procurios.nl/k/n618/news/view/34441/14863/Semantic-web-marvels-in-a-relational-database---part-II-Comparing-alternatives.html

15 June 2009

In this article I will compare the basic technical details of current relational database alternatives.

By Patrick van Bergen

In the first article I explained the relational database mapping of our semantic web implementation. In this article I will place this work into perspective by exploring related techniques.

In the last few years developers have been looking for ways to overcome certain shortcomings of relational database systems. RDBMSes are general-purpose data stores that are flexible enough to store any type of data. However, there are several cases in which the relational model proves inefficient:

  • An object has many attributes (100+), many of which are optional. It would be wasting space to store all these attributes in separate columns.
  • Many attributes with multiple values. Since each of these attributes needs a separate table, the object data will be distributed over many tables. This is inefficient in terms of development time, maintenance, as well as query time.
  • Class inheritance. Since most software is Object Oriented these days the objects in code will need to be mapped to the database structure. In the case of class inheritance, where attributes are inherited from superclasses, it is a big problem to store objects in, and query them from, an RDBMS efficiently.
  • Types and attributes are not objects. In an RDBMS the data of a model is separate from the metadata (attribute names, datatypes, foreign key constraints, etc.). Types and attributes are not like normal objects. This is inefficient in areas where types and attributes need to be added, changed and removed regularly, just like any other data. It is inefficient to write separate code to manipulate and query types and attributes. In short, first order predicate logic no longer suffices for many new applications. The second order is needed.
  • Scalability. This is often named as the reason to leave the RDBMS behind. However, since relational databases have been optimized for decades, they do scale. Nevertheless, in this age of global, real-time web applications, the techniques provided by RDBMS vendors may prove to be inadequate, or simply too expensive.

In the following I will provide a simple understanding of the basic principles of alternative database techniques, along with some pointers to more in-depth information. I hope you will forgive me my non-expert view on these subjects. For detailed information on any given subject, look elsewhere. This article is meant to be just a quick overview, aimed at giving you a feel for the concepts through the examples.

RDBMS, or Row-Oriented database

In a relational database management system, pieces of data are grouped together in a record. In this article I will consider the case where the data stored is meant to represent the attributes of an object. Seen this way, a record is a group of attributes of an object. Here's an example of such a table of objects:

object id | color | width | height | name
3         | red   | 100   | 100    | my box
4         | green | 50    | 500    | old beam


Metadata is shown in gray. Keys / foreign keys are shown in bold typeface.

Need more attributes? Add more columns. Need an attribute with multiple values? Add a table and link it to the first. The RDBMS chooses speed over flexibility. Speed was a big deal 40 years ago, when this database type was designed. And it still is a big deal today. For large amounts of simple data, there is absolutely no need to leave this model.

Semantic net

Storing semantic information as triples is an old idea in the field of Knowledge Representation. As early as 1956, semantic nets were used for this purpose. In this technique the relations between objects are represented by plain labels. Each "record" stores only a single attribute, or one element of an array-attribute. Most notable are the absence of metadata and the fact that object data is distributed over many records.

object id | predicate | value
3         | color     | red
3         | width     | 100
3         | height    | 100
3         | name      | my box
4         | color     | green
4         | width     | 50
4         | height    | 500
4         | name      | old beam


Need more attributes? No need to change the table structure. Need an attribute with multiple values? Same thing. 

Entity-Attribute-Value

The Entity-Attribute-Value model of knowledge representation uses some form of triples, just like the semantic web. Its primary use is described by Wikipedia as "Entity-Attribute-Value model (EAV), also known as object-attribute-value model and open schema is a data model that is used in circumstances where the number of attributes (properties, parameters) that can be used to describe a thing (an "entity" or "object") is potentially very vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix."

Attribute metadata is stored in separate attribute tables, which are not triples. EAV is a sort of middle ground between semantic nets and the semantic web: attributes have explicit properties, but their number is fixed.

EAV can be used to model classes and relationships as in EAV/CR.

EAV is used in Cloud computing databases like Amazon's SimpleDB and Google's App Engine.

object id | attribute id | value
3         | 1            | red
3         | 2            | 100
3         | 3            | 100
3         | 4            | my box
4         | 1            | green
4         | 2            | 50
4         | 3            | 500
4         | 4            | old beam

 

attribute id | name   | datatype | unique
1            | color  | char(6)  | true
2            | width  | double   | true
3            | height | double   | true
4            | name   | string   | true


Need more attributes? Add them in the attribute table. Attributes with multiple values? No extra work. The schema of the attributes is stored in the database explicitly, but attributes are treated differently from the objects.
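
To give a feel for what querying such a store looks like, here is a minimal PHP sketch (hypothetical table and column names modelled on the example above, and assuming a PDO connection in $pdo) that reassembles object 3 from its attribute rows:

// hypothetical tables: eav_value(object_id, attribute_id, value), attribute(attribute_id, name, datatype)
$statement = $pdo->prepare(
    "SELECT a.name, v.value
     FROM eav_value AS v
     INNER JOIN attribute AS a ON a.attribute_id = v.attribute_id
     WHERE v.object_id = ?"
);
$statement->execute(array(3));

$object = array();
foreach ($statement->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // pivot the attribute rows back into a single associative array
    $object[$row['name']] = $row['value'];
}
// $object now holds: color => red, width => 100, height => 100, name => my box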

Column-Oriented databases

From wikipedia: "A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row."

object id | color
3         | red
4         | green

 

object id | width
3         | 100
4         | 50

 

object id | height
3         | 100
4         | 500

 

object id | name
3         | my box
4         | old beam

 

Google's BigTable is based, in part, on column-orientation. Their tables use reversed URIs as object and column identifiers, and have a "third dimension" in that older revisions of the data are stored in the same table.


Correlation databases

A correlation database is "value based": every constant value is stored only once. All these values are stored together, except that values are grouped by datatype. All values are indexed. "In addition to typical data values, the data value store contains a special type of data for storing relationships between tables...but with a CDBMS, the relationship is known by the dictionary and stored as a data value."

I have not found a clear example of what this datastructure looks like, but we can infer that the internal structure must look something like the following. Note: I may be completely wrong here!

The values-table (actually there is one table per major datatype; i.e. integers, strings, dates, etc.)

value id | value
1        | red
2        | green
3        | 100
4        | 50
5        | 500
6        | my box
7        | old beam
8        | <object 1>
9        | <object 2>
10       | <relationship color>
11       | <relationship width>
12       | <relationship height>
13       | <relationship name>

 

and then there is at least a table containing the relationships (or: "associations") between the values. The relationships are stored as values themselves:

value id 1 | association | value id 2
8          | 10          | 1
8          | 11          | 3
8          | 12          | 3
8          | 13          | 6
9          | 10          | 2
9          | 11          | 4
9          | 12          | 5
9          | 13          | 7



Hierarchical model, Network model, Navigational database

For the sake of completeness I have to name these models. The hierarchical model stores tree-like structures only, requiring each piece of data to have a single "parent". The network model allows a piece of data to have multiple parents. Both models were superseded by the relational model, but they are still used for special-purpose applications. A navigational database allows you to traverse such trees / DAGs by following paths.

 

 

Object-Oriented databases

In an object-oriented database all attributes of a class are stored together. From what I've read on the internet I conclude that the actual storage structure of an OODBMS is sort of an implementation detail. This means that the performance characteristics of the database will depend heavily on the type of implementation chosen. Development of this model was first in the hands of the ODMG, but control was transferred to the Java Community Process that built the Java Data Objects specification. This specification names the conditions for such a database, but does not guide the implementation.

Some special properties:

  • Class inheritance is supported in the data model.
  • Object nesting: an object can contain (not just link to) other objects

Mapped to an RDBMS via a so-called ORM (Object Relational Mapping), objects are commonly stored in a standard relational way: one column per (single-valued) attribute. To implement inheritance, the columns of all base classes of an object are joined. This can be done at design-time (create a big table containing the columns of all parent classes) or at query-time (join parent class tables).

class id | object id | color | width | height | name
101      | 3         | red   | 100   | 100    | my box
101      | 4         | green | 50    | 500    | old beam

 

class id | class name | parent class
101      | Object     |
102      | Bar        | 101



Document based databases

A document-based database is a different beast altogether. It lacks a database schema completely, and a complete object is stored in a single cell. In the case of CouchDB, this is done by encoding the object (or: document) in JSON. Real-time querying of the source table is thus impossible; one needs to create views on the data.

object id | document
3         | {"color":"red","width":100,"height":100,"name":"my box"}
4         | {"color":"green","width":50,"height":500,"name":"old beam"}



Triplestores

Some triplestores are publicly available. Commonly they have an RDF interface. Their performance can be measured using the Lehigh University Benchmark (LUBM). The most advanced open source triplestores are Sesame and ARC.

object id | attribute id | value
3         | 101          | red
3         | 102          | 100
3         | 103          | 100
3         | 104          | my box
4         | 101          | green
4         | 102          | 50
4         | 103          | 500
4         | 104          | old beam
101       | 104          | color
102       | 104          | width
103       | 104          | height
104       | 104          | name

 

Very little has been made public about the way triplestores are implemented in a relational database. A laudable exception to this is the Jena2 database schema. Unfortunately, the schema appears to be very inefficient, since the URIs are not indexed but are used literally.

A charmingly simple implementation that seems resource intensive was made for expasy4j: triples are stored in a single table, but for query speed, a single column is reserved for each separate datatype.

Another, somewhat better implementation was made for OpenLink Virtuoso: it uses indexed URIs, but all constants are placed in a single field datatyped "ANY".

Conclusion

I hope this article has shown you a little of why developers are looking for alternatives to the familiar RDBMS and what forms these have currently taken. The field is quite diverse and developments are being made by many different parties. It will be interesting to see how this evolves and which alternative(s) will eventually become the successor of the relational database.


Semantic web marvels in a relational database - part I: Case Study

This blog article first appeared on http://techblog.procurios.nl/k/n618/news/view/34300/14863/Semantic-web-marvels-in-a-relational-database---part-I-Case-Study.html 

01 June 2009

You have heard about the semantic web. You know it is described as the future of the Web. But you are still wondering how this vision is going to make your applications better. Can it speed up application development? Can it help you to build complex datastructures? Can you use Object Oriented principles? This article shows how it can be done. And more.

By Patrick van Bergen

The semantic web is a framework developed by the W3C under the supervision of Tim Berners-Lee. Its basic assumption is that data should be self-descriptive in a global way. That means that data does not just express numbers, dates and text, it also explicitly expresses the types of relationship these fields have to their objects. Using this uniform datastructure, it becomes easier to interchange data between different servers, and most of all, data can be made accessible to global search engines.

That is a big thing. But is that all? Can't you just provide an RDF import / export tool for your data and be done? Are there any intrinsic reasons why you would base your entire datastructure on the semantic web?

In a series of two articles I will try to explain how we at Procurios implemented semantic web concepts, what the theoretical background of our implementation is, and what benefits a semantic web has over a traditional relational database. In this first article I will explain how we implemented a semantic web in a relational database (we used MySQL), added an object oriented layer on top, and even created a data revision control system from it.

Triples

In a classic relational database, data is stored in records. Each record contains multiple fields. These fields contain data that may belong to some object. The relation between the field and the object it belongs to is not represented as data in the database. It is only available as metadata in the form of the column (name, datatype, collation, foreign keys). An object is not explicitly modelled, but rather implied by a series of linked tables.

A semantic web is a network of interrelated triples ("subject-predicate-object" triplets) whose predicates are part of the data themselves. Moreover, each object has an identifier that is not just an integer number with meaning inside the database only. It is a URI that may have a distinct meaning worldwide.

A triple is a record containing three values: either (uri, uri, uri) or (uri, uri, value). In the first form the triple relates one object to another, as in the fact "Vox inc. is a supplier" ("Vox inc.", "is a", and "supplier" are each semantic subjects identified by a URI). In the second form the triple links a constant value to a subject, as in "Vox inc.'s phone number is 0842 020 9090". A naive implementation would look like this:

CREATE TABLE `triple` (
    `subject`      varchar(255) NOT NULL,
    `predicate`    varchar(255) NOT NULL,
    `object`       longtext
);

This table provides a complete implementation for the semantic web. However, it is too slow to be used in any serious application. Now, there are various ways in which this basic form can be optimized, but to my knowledge there is no best practice available. Several problems have to be met:

  • How to identify a triple uniquely (If this is necessary for your application. The combination of subject, predicate, object itself is not unique)
  • How to search fast, given a subject and predicate? ("Give me the names of this set of people"?)
  • How to search fast, given a predicate and an object? ("Give me the persons whose name begins with `Moham`"?)

To solve these problems we came up with the following changes:

  • Create a single triple index table that only stores triple ids.
  • Create separate triple tables for each of the major datatypes needed (varchar(255), longtext, integer, double, datetime)
  • The triple tables reference the index table by foreign key.
  • Add two extra indexes for the two ways the tables are used: a subject-predicate combined key and a predicate-object combined key.

Here's the triple index table (we are using MySQL):

CREATE TABLE `triple` (
    `triple_id`                int(11) NOT NULL auto_increment,
    PRIMARY KEY (`triple_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

and here's the triple table for the datatype "datetime" (the other datatypes are handled similarly)

CREATE TABLE `triple_datetime` (
    `triple_id`                int(11) NOT NULL,
    `subject_id`              int(11) NOT NULL,
    `predicate_id`          int(11) NOT NULL,
    `object`                   datetime NOT NULL,
    `active`                   tinyint(1) NOT NULL DEFAULT '1',
    PRIMARY KEY (`triple_id`),
    KEY (`subject_id`, `predicate_id`),
    KEY (`predicate_id`, `object`),
    CONSTRAINT `triple_datetime_ibfk_1` FOREIGN KEY (`triple_id`) REFERENCES `triple` (`triple_id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The table definition should speak for itself, except for the field "active". This field is not necessary at this point, but I will need it in the next section.

The predicate_id refers to a separate "uri" table where the full URIs of these predicates are stored. However, this is not necessary and the URIs may be stored in the triple_longtext table as well.

The two combined keys have an interesting side-effect: the application developer never needs to be concerned again about using the right keys. Effective keys have been added by default.

Building SQL queries by hand to query this triplestore may be a daunting task. A special query language is required to be effective. More about that below.

All data of a given object can be queried by selecting all triples with a given subject id (one query per datatype triple table). That seems inefficient and it is: compared to the situation where an object can be stored in a single record, the triplestore is always slower. However, in a more complex situation a relational database requires you to join many tables to fetch all data. We use 5 separate queries (one per datatype table) to fetch all object data from the triplestore. This turned out to be faster than a single union query over the five queries. We use the same 5 queries to fetch all data of any desired number of objects. Here are the queries needed to fetch the object data of three objects identified by the ids 12076, 12077, and 12078:

SELECT `object` FROM `triple_varchar` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_longtext` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_integer` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_double` WHERE `subject_id` IN (12076, 12077, 12078);
SELECT `object` FROM `triple_datetime` WHERE `subject_id` IN (12076, 12077, 12078);

You can see that the object-data is fetched from the database without having to provide explicit type or attribute information. The type of the object is stored in one of the triples. This is useful in case of inheritance where the exact type of an object can only be determined at runtime.

Arrays and multilinguality

Many object attributes have an array datatype (an unordered set). To model these in a relational database you would need a separate table for each of these attributes. Querying all attributes of a series of objects including these array attributes is far from easy. In the triple store you can model an unordered set as a series of triples having the same subject and predicate and a different object. When you query all object data, you will get the array values the same way as you get the scalar values.

Multilinguality is also a hassle in relational databases. For each of the attributes that need to be available in more than one language the table structure needs to be adjusted and it is hard to avoid data duplication. In a triplestore you can treat a multilingual attribute almost like an array element. The only thing is that the predicates are similar but not the same. We use the following URIs for the representation of different language variants of an attribute: http://our-business.com/supplier/description#nl, http://our-business.com/supplier/description#en, http://our-business.com/supplier/description#de (in the tables these predicates are replaced by their integer ids for faster querying).

Data revision control

Version control is pretty common for developers when it comes to storing previous versions of their code. It allows you to track the changes of the code, revert to a previous version, and to work on the same file together. Still, when it comes to data, version control is very uncommon. And I think that is mainly because the overhead to create such a system is huge in a traditional relational database.

One of the requirements for our framework was that there should be some form of data-change history available. And when you think of it, it is actually really simple to keep track of all the changes that are made to the data if you use triples. And that's because from a version-control point of view, all that changes in each revision is that some triples are added and others are removed.

So all that is needed is two more tables: one to keep track of the revision data (who made the change, when, and a short description for future reference), and another to track all the triples that were added or removed in that revision:

CREATE TABLE `revision` (
    `revision_id`           int(11) not null auto_increment,
    `user_id`                int(11),
    `revision_timestamp`    int(11) not null,
    `revision_description`    varchar(255),
    PRIMARY KEY  (`revision_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `mod_lime_revision_action` (
    `action_id`                   int(11) NOT NULL AUTO_INCREMENT,
    `revision_id`                 int(11) NOT NULL,
    `triple_id`                   int(11) NOT NULL,
    `action`                   enum ('ACTIVATE', 'DEACTIVATE') NOT NULL,
    `section_id`                  int(11),
    PRIMARY KEY  (`action_id`),
    CONSTRAINT `revision_triple_ibfk_1` FOREIGN KEY (`revision_id`) REFERENCES `revision` (`revision_id`) ON DELETE CASCADE,
    CONSTRAINT `revision_triple_ibfk_2` FOREIGN KEY (`triple_id`) REFERENCES `triple` (`triple_id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Each time a user changes the data, a new revision is stored in the database, along with a list of all triples that were activated or deactivated, and a compact description of the change. A triple that was already available in an inactive state is made active. If no such triple was present, an active one is created. Triples are never really removed, they are only set to be inactive.
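
As a minimal sketch of that flow (hypothetical PHP code, not the actual Procurios implementation; it assumes a PDO connection in $pdo, two existing triple ids and a user id, and the tables defined above), recording a single change could look like this:

$pdo->beginTransaction();

// 1. store the revision itself (user 42 and the description are made up)
$pdo->prepare(
    "INSERT INTO `revision` (`user_id`, `revision_timestamp`, `revision_description`)
     VALUES (?, ?, ?)"
)->execute(array(42, time(), 'Changed the phone number of Vox inc.'));
$revisionId = (int)$pdo->lastInsertId();

// 2. record which triples were activated and deactivated in this revision
$action = $pdo->prepare(
    "INSERT INTO `mod_lime_revision_action` (`revision_id`, `triple_id`, `action`)
     VALUES (?, ?, ?)"
);
$action->execute(array($revisionId, 90017, 'ACTIVATE'));     // hypothetical triple ids
$action->execute(array($revisionId, 88012, 'DEACTIVATE'));

// 3. flip the `active` flags in the datatype table(s) the triples live in
$pdo->prepare("UPDATE `triple_varchar` SET `active` = 1 WHERE `triple_id` = ?")->execute(array(90017));
$pdo->prepare("UPDATE `triple_varchar` SET `active` = 0 WHERE `triple_id` = ?")->execute(array(88012));

$pdo->commit();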

If you query the triplestore (the set of all triples), you need to ensure that only the active triples are queried.

With this information, you can:

  • List all revisions made to the data, showing who made the change and when, along with a small description of the change.
  • Revert changes back to a previous revision by performing the revisions backwards: activate the deactivated triples, deactivate the activated triples. It is even possible to undo a single revision that is not the last one, but beware that revisions following it may depend on it.
  • Work together on an object by merging the changes made by two users, using the difference in data between the start and end revisions.

 


Object database

Businesses are used to working with objects. A web of data needs to be structured before it can be used for common business purposes. To this end we decided to build an object-oriented layer on top of the triplestore. Even though the Web Ontology Language (OWL) was designed for this purpose, we did not use it: we needed only a very small subset anyway, we wanted complete freedom in our modelling activities, and processing speed was very high on our priority list. I will not cover all the details here, since it is a very extensive project, but I want to mention the following features:

  • The database was set up as a repository: no direct database access is possible by the application developer. Object creation, modification, destruction, and querying is done via the repository API. This provides the OOP principles of information hiding and modularity.
  • Object types could be associated with PHP classes. This is not a requirement, but it proved really easy to generate object types from PHP classes. This provided us with the principle of polymorphism.
  • Not only simple objects are modelled as objects (a set of triples, having the same subject), but object types as well. Furthermore, the attributes of the types are modelled as objects as well. Objects and their types can be used in the same query.
  • Object types can be subtyped. The triplestore allows us to query objects of a given type and all its subtypes in a straightforward way.
  • The attributes of objects can be subtyped as well. This allows you to add datatype restrictions to the attributes of subtypes that were not applicable on a higher level up the type hierarchy.

These features are very powerful. It is possible to build a real Object database using triples as a datastructure only. Types and attributes are treated the same as normal objects. This means that the same code can be used to manipulate normal data as well as metadata. Also, implementing inheritance is relatively easy, since object data is not chunked into single rows any more.

Query language

After some time we felt that the simple queries we were performing on the object database were too constraining. We wanted the same power that SQL provides. And on top of that, since we continue to use normal relational tables as well, the object queries needed to be able to combine the object database with the relational tables. For these reasons, the semantic web query language of choice, SPARQL, was insufficient for our purposes. We now build SQL-like queries using method chaining on a query object. The object then creates the SQL query.

I mention this because you really need to build or use a new query language when starting to work with either a triplestore or an object database. The underlying triplestore is too Spartan for an application developer. The lower-level SQL queries consist of many self-joins connecting the subject of one triple to the object of another, which makes them very hard to understand.
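
To make the idea of method chaining a bit more concrete, here is a small, purely illustrative PHP sketch (hypothetical class and predicate ids, not the Procurios query language) that builds a self-joining query over the varchar triple table, joining on the subject id to combine several attribute conditions:

class TripleQuery
{
    protected $conditions = array();

    public function where($predicateId, $operator, $value)
    {
        $this->conditions[] = array($predicateId, $operator, $value);
        return $this;   // returning $this is what makes the chaining possible
    }

    public function toSql()
    {
        $joins = array();
        $where = array();
        foreach ($this->conditions as $i => $condition) {
            list($predicateId, $operator, $value) = $condition;
            $alias = "t" . $i;
            // every extra condition becomes a self-join on the subject id
            $joins[] = ($i == 0)
                ? "`triple_varchar` AS $alias"
                : "INNER JOIN `triple_varchar` AS $alias ON $alias.subject_id = t0.subject_id";
            $where[] = sprintf(
                "%s.active = 1 AND %s.predicate_id = %d AND %s.object %s '%s'",
                $alias, $alias, $predicateId, $alias, $operator, addslashes($value)
            );
        }
        return "SELECT DISTINCT t0.subject_id FROM " . implode(" ", $joins)
            . " WHERE " . implode(" AND ", $where);
    }
}

// find the subjects whose name starts with 'Moham' and whose city is 'Utrecht'
// (201 and 202 are hypothetical predicate ids for 'name' and 'city')
$query = new TripleQuery();
echo $query->where(201, 'LIKE', 'Moham%')->where(202, '=', 'Utrecht')->toSql();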

Afterword

I wrote this article because I think this sort of detailed information about emerging datastructures is lacking on the internet. It is also great to work for a company (Procurios!) that agrees with me that knowledge should be given back to the community if possible. Gerbert Kaandorp from Backbase once asked me very seriously what I had done to promote the semantic web, like it was some kind of holy mission. I hope this article has made a small contribution and that it inspires some of you to build your own semantic web based object datastore. Let me know what you think!

zaterdag 28 november 2020

The right way of looking up names in a database while understanding a human sentence - this is it!

This is about natural language processing. Parsing a sentence. That holds names. Of persons, or cars, or whatever. When these names are linked to IDs in the database.

Let's just get to the point. These are the steps you need to take to parse a sentence, turn it into a semantic representation and resolve the names in the input to IDs in the database.

  • Parse
  • Relationize
  • Find Sorts
  • Find Names

The first thing to note is that these steps can be taken sequentially. It is not necessary to combine them.

Parse

Parsing the sentence is done with a parser using a grammar that contains rewrite rules. Each rewrite rule has pieces of semantics attached. I suggest you use Earley's algorithm. Some example syntax rules with semantics (= sense):
{ rule: s(S1) -> interrogative(S1),                                     
  sense: go:intent(question) }
  
{ rule: interrogative(P1) -> 'who' copula(_) np(E1),                    
  sense: go:quant_check($np, go:intent(who, E1) dom:person(E1)) }
  
{ rule: proper_noun_group(N1) -> proper_noun(N1) proper_noun(N1) proper_noun(N1) }
{ rule: proper_noun_group(N1) -> proper_noun(N1) proper_noun(N1) }
{ rule: proper_noun_group(N1) -> proper_noun(N1) }


When the parse is complete you have one or more syntax trees and each node in the tree may have semantic attachments.

In the rules above, take a special look at the `proper_noun_group` and `proper_noun` categories. The `proper_noun_group` matches a multi-word name (there are three rules, one for 1 word, one for 2 words, and one for 3 words). The `proper_noun` category matches a single word in the name.

Insight 1: the `proper_noun` category matches any word

That's right: this grammar has a category that matches any word. Why? Because names can be anything, from "Michael Jackson" to "IBM", "iPhone", "the miracles", "X11", and of course the legendary band the the (I mean The The). If these names are in a database, there is no need to guess which words are names, just look them up. This is how to do it.

You may be worried, nay, you should be worried, that this approach leads to many, many parse trees. And yes, this is true, a little, but not much. It leads to some more parse trees, but not a lot. And this is because of

Insight 2: syntax constrains the number of possible parse trees enormously

While any word can be a proper noun, and thus a noun, these proper nouns have to fit in with the rest of the parse tree. Once the parse is complete, only valid parse trees remain, which reduces the number of parse trees enormously.

Relationize and extract names

Combine the semantic attachments to create a single semantic representation of the sentence. No need for the syntax anymore.

When you walk through the syntax tree to combine the semantic representations into a single representation, you occasionally come across these `proper_noun_group` nodes that matched names. When you do, pick up the name words from the child nodes and join them into the full multi-word name. Create a mapping from variable to name, like this:

  • E1: "Michael Jackson"
  • E2: "Lisa Marie Presley"

The variables refer to the entities that form the arguments of the relations in the semantic representation. Here's a simple semantic representation that we can use as an example:

marry(E1, E2)

We must now find the database IDs for these variables so that we may replace the variables with database IDs, like this:

marry(19932, 20021)

Once we have that, we can check this relation against the database. But how do we find the name in the database?

Find sorts

Usually there is some table that maps names to IDs, of course. And in the simple case where you need only a single entity type, this is straightforward. But I want to go a bit beyond that and consider the case where you have tens or hundreds of entity types, each with its own tables. Must you look up the name in each of these tables? No. And what if the name is present in multiple tables? Ouch.

The proper way to address this is to find out the sorts of the entities involved. The sort is the type of the entity. 

Each relation has a semantic selection that constrains its sorts. For example:

marry(person, person)

Once you find this out, you can map the variables to sorts, like this:

  • E1: person
  • E2: person

Now you know that it is two persons that married. And you can look up the names in the person table, not the country table.

Find names

In the last step you take the variables, the names and the sorts, and you check which database supports which sorts and which table belongs to that particular sort. You may need database specific information about how to extract the ID from the table. But at least you know where to look for the name.

Database 1:
    sort = person, locate = relation(Id, _, Name, _, _)

At this point you know the exact table in which to look for the name, and you need only a single query to find the ID for that name. When this is done, you have the variable-ID link, like this:

  • E1: 19932
  • E2: 20021

And you can replace the variables with constants:

marry(19932, 20021)

As you can see, this representation does not have naming information in it any more. It has direct links to the primary key of the tables. And you know what this means: speed! And also

Insight 3: resolve names before executing the semantic representation

Afterword

Why did I use this slightly hysterical way of writing an article? Because it is hard to get attention for this subject. And because it took me five years to get to this point, and it was a real revelation when I found out. It finally clicked. It all came together. It's so simple, once you see it.

I hope you enjoyed this and it makes you want to use it. Good luck!










zaterdag 25 januari 2020

Combining the data of multiple databases in a single natural language query

In our time we have at our disposal a growing number of open data sources. We can query these individually, and that's nice. But wouldn't it be awesome if we could make queries that combined information from diverse data sources at once?

Suppose that you have two or more open data databases at your disposal. One contains information about the relations between famous persons, the other about books and authors. What if you were to ask:
Did Charles Darwin's grandfather write "The Loves of the Plants"?

Database 1 might hold

parent(`a11`, `a34`)
parent(`a34`, `a71`)
name(`a11`, "Erasmus Darwin")
name(`a71`, "Charles Darwin")

and database 2 might hold

author(`x21-r11`, `w909`)
person(`x21-r11`, "Darwin", "Erasmus")
work(`w909`, "The Loves of the Plants")

You can see that one part of the information necessary to answer the question is available in database 1, while the other part is present in database 2, but it is not clear how these parts could be combined. Is it possible to combine these two to answer the question?

Shared identities

I think the answer to this problem can be found in what I will call shared identities: database independent ids that can be used to reason about entities that are present in more than one database.

The shared identity consists of an id and a number of database ids. The shared identities that would be used in the example above could be

[type: person, shared-id: darwin-105, db-ids: {db1: a71}]
[type: person, shared-id: darwin-99, db-ids: {db1: a34}]
[type: person, shared-id: darwin-21, db-ids: {db1: a11, db2: x21-r11}]

The shared identity has an id ("darwin-21") that is present in neither database, yet it can be used to populate the query. At the same time the ids of the databases where this entity is represented are available ("db1", "db2").

How does that work?

When the question is processed, a system will first look up the names "Charles Darwin" and "The Loves of the Plants" in the databases. This results in "a71" from db1 and "w909" from db2.

Next, the system will look up the shared identities from these two entities. Now most entity types do not have shared identities. "Work" for example only occurs in database 2, so there's no need for an extra super-database construct. The id from db2, "w909", can be used as shared identity for the work "The loves of the Plants". Persons, on the other hand, are represented in both databases, and so we look up the shared identity for "a71", which is "darwin-105".

Our query Did Charles Darwin's grandfather write "The Loves of the Plants"? will then look simply like this:

parent(E1, E2)
parent(E2, `darwin-105`)
author(E1, `w909`)

Or: did the parent of the parent of `darwin-105` author `w909`?

When this query is processed, the system will present the relations to the underlying databases.

parent(E2, `darwin-105`)

contains a shared identity. Its ids will be looked up and replaced before the relation is passed to the database:

parent(E2, `a71`)

The database will return a34 and the system will look up its shared id: `darwin-99`. E2 will then be bound to `darwin-99`.

The system will work with shared identities everywhere, except when it deals with an individual database directly.

parent(E1, E2)
will then be

parent(E1, `darwin-99`)

and E1 will be found to be `darwin-21`. In order to resolve author(`darwin-21`, `w909`), `darwin-21` will need to be transformed into the database id `x21-r11`.

Who creates the shared identities?

Shared identities are stored as mapping tables: database-id <==> shared id, one table for each entity type in a database.

Only entities that live in multiple databases need shared identities. For entities that live in a single database we can still use the term "shared identity" but only use the database id. No mapping tables are used.

The mapping tables can be constructed manually or be generated by rules or heuristics. This must be done offline, not during the query, because it takes too much time.

The tables used in the example above can be constructed as follows: for each person, take the name from db1 and check if the same name occurs in db2 by concatenating first name and last name. If so, add an entry to the person mapping tables of db1 and db2, using each database's unique id. Make up a shared identity; it doesn't matter what.

db1: darwin-21 = a11
db2: darwin-21 = x21-r11

If a person occurs in only one database, just add the mapping for that single database.
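
A minimal, purely hypothetical PHP sketch of that offline construction step (made-up ids and data; a real implementation would read the persons from the two databases):

// persons from database 1: id => full name
$db1Persons = array('a11' => 'Erasmus Darwin', 'a71' => 'Charles Darwin');
// persons from database 2: id => first name and last name, concatenated
$db2Persons = array('x21-r11' => 'Erasmus Darwin');

$mapping = array();   // shared id => array of database ids
$counter = 0;

foreach ($db1Persons as $db1Id => $name) {
    $sharedId = 'person-' . $counter++;   // made up; it doesn't matter what
    $mapping[$sharedId] = array('db1' => $db1Id);

    // does the same name occur in database 2? then the two ids share an identity
    $db2Id = array_search($name, $db2Persons, true);
    if ($db2Id !== false) {
        $mapping[$sharedId]['db2'] = $db2Id;
    }
}

// persons that occur only in database 2 would be added in a second pass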


woensdag 16 oktober 2019

Translation in PHP using class constants

Translation in PHP frameworks is done in several different ways, but in all of them (?) the source text is a string. The translations are stored in CSV files, .mo files or in PHP arrays.

I propose to store them in class constants. This makes loading translations faster and creates a better integration with the IDE.

Translation classes

In this proposal a translation source file looks like this:

class CatalogTranslator extends Translator
{
    const price = "Price";
    const name = "Name";
    const nTimes = "%s times";
}

Each module has its own source file. This one belongs to the module Catalog. Note that we now have a source (i.e. nTimes) and a default translation ("%s times" in English, in this application).

The translation file (for the main German locale) is like this:
class CatalogTranslator_de_DE extends CatalogTranslator
{
    const price = "Preis";
    const nTimes = "%s Mal";
}
Note that the translation file extends the source file. This translation here has two translations, but lacks a translation for "name". That's ok; the value is inherited from its base class.

Translation calls

A function that needs translation functionality starts by creating an instance of a translation class for this module:
$t = CatalogTranslator::resolve();
And here is an example of an actual translation call:
$name->setLabel($t::name);
There it is: $t::name. It is a constant from a class that was resolved by the CatalogTranslator::resolve() call. When the request's current locale is de_DE, the class will be CatalogTranslator_de_DE; when the locale is fr_FR it will be CatalogTranslator_fr_FR. Which of the classes is selected is determined by the resolution function resolve.
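
For a parameterized source text like nTimes, the constant can be combined with sprintf as usual (a small usage sketch, not from the original post):
$t = CatalogTranslator::resolve();
echo sprintf($t::nTimes, 3);   // "3 times" in English, "3 Mal" when the locale is de_DE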

Translator class resolution

Did you notice that CatalogTranslator extends the class Translator? This base class contains the static resolve function that determines the runtime translation class:
class Translator
{
    /**
     * @return $this
     */
    public static function resolve() {
        return $xyz->getTranslator(static::class);
    }
}
The @return $this annotation tells the IDE that an object of class CatalogTranslator is returned. A subclass of it, in fact.

$xyz stands for some object in your framework that defines the getTranslator function. What matters is only that the runtime class (static::class) is passed in to determine the name of the active translation class.

The function getTranslator looks like this:
class Xyz
{
    protected $translatorClasses = [];

    public function getTranslator(string $class): Translator
    {
        if (!array_key_exists($class, $this->translatorClasses)) {
            $locale = $this->getLocale();
            $translatorClass = $class . '_' . $locale;
            $this->translatorClasses[$class] = new $translatorClass();
        }
        return $this->translatorClasses[$class];
    }
}
As you can see the resolver fetches the active locale from the request (again, your framework will have a different approach) and simply appends it to the runtime classname of the translation:

CatalogTranslator + de_DE = CatalogTranslator_de_DE

Advantages and disadvantages

An advantage of this approach is that the translations do not need to be loaded into memory on each new request. They already reside in the opcache. Another is that translations are clearly scoped to the module they are defined in. No naming conflicts with other modules. Further, this approach is IDE friendly. Just control-click on the constant and you are in the source file. In this file you can find out where this source text is used (Find Usages) and which translations there are (Is overridden) using standard IDE functionality.

Disadvantages: the use of PHP constants to store translations will not be familiar to developers and translation agencies. It is easier to just type a source text directly in source code than to have to define a constant. Also, module-scoped translations mean that you may have to translate the same text several times in different modules. But if this is your only concern, you can adapt the approach into a single set of system-wide translation classes.



woensdag 15 mei 2019

Natural language interaction in flowcharts

Flowchart

In this post I will sketch some of my ideas about a generic natural language interaction program in the form of a flowchart.

Natural language interaction is any form of computer program that allows you to interact with it in a complex form of natural language (i.e. English, Chinese). It is a subject I have been working on for several years.

A flowchart visualizes the flow of control of a program. It consists mainly of interconnected tasks, but I added intermediate data structures (in blue) because of their special relevance.

What you're about to see is an attempt to unify the aspects of many historical nli-systems in a single system. It's still very sketchy, but I needed to have these thoughts visualized, because it allows me to have a clearer goal.

Check this wikipedia article if you're not sure about some of the flowchart symbols.

The main loop

The main loop of the program consists (as can be expected) of a sequence of asking for input, processing it, and displaying the output.

I split the processing part into analyzing and understanding. By analyzing I mean processing the textual string into a logical, context insensitive data structure (think predicate calculus). By understanding I mean processing this logical structure into a domain-specific representation, a speech act.

Both the analysis and understanding part may fail. Analysis is language dependent. If the program supports multiple languages and analysis in one language fails, the other languages may be tried. The language that succeeds will become the default language for the next input sentence. Understanding is domain specific. If the system supports multiple domains, and understanding fails for one domain, other domains may be tried.




The parts that look like this are called predefined processes. They are placeholders for other flowcharts, shown below.


Analyse input

The analysis of input forms the standard pipeline of natural language processing.



Tokenization splits a string into words. Parsing creates a parse tree. Semantic analysis maps the forms to meaning. Quantifier scoping creates nested structures for quantifiers like "all". Pragmatic analysis fills in variables like "this" and "her".

The analysis phase needs information in the form of a lexicon (word forms), a grammar (rewrite rules) and semantic attachments (from word form and phrase to meaning structure).
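
To make the stages a little more concrete, here is a hypothetical illustration of the intermediate results for one sentence; the structures are deliberately simplified and not taken from an actual implementation:

// hypothetical, simplified intermediate results of the analysis pipeline
$input = "Who married Elvis Presley?";

// tokenization
$tokens = ["who", "married", "elvis", "presley"];

// parsing: a parse tree built from the grammar's rewrite rules
$parseTree = ["s" => [
    "np" => ["who"],
    "vp" => ["verb" => "married", "np" => ["elvis", "presley"]],
]];

// semantic analysis: the meaning, as a predicate calculus-like structure
$meaning = "marry(A, B) and name(B, 'Elvis Presley')";

// quantifier scoping and pragmatic analysis would further transform this
// structure when quantifiers ("all") or references ("this", "her") occur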

Ask user

Any nontrivial system sometimes needs to ask the user to disambiguate between several possible interpretations. To do this, it is sensible to present the user with a multiple choice question (i.e. not an open question). This way, any answer the user gives is OK and will not cause a new problem in itself.



The added idea here is that a user should not be asked the same question more than once in a session. So any answer he/she gives is stored for possible later use. The store, "dialog context", may be used for pragmatic variables as well. The active subject of a conversation, for example, which may be referred to as "he", "she" or "it", might well be stored here.
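
A minimal sketch of this idea (class and method names are hypothetical): the dialog context remembers every answer, so the same question is asked at most once per session:

class DialogContext
{
    private $answers = [];

    // $askUser actually presents the multiple choice question to the user
    public function ask(string $question, array $options, callable $askUser): string
    {
        if (!isset($this->answers[$question])) {
            $this->answers[$question] = $askUser($question, $options);
        }
        return $this->answers[$question]; // reuse the stored answer
    }
}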

Speech act processing

This is the central component of an NLI system. The input is a logical structure that has been formatted in a way that makes sense in a certain domain. The term "speech act" fits this description well. The system then looks up a "solution", a way of handling this speech act. The solution consists of a goal (which is usually just the contents of the input, in a procedural form) and a way of presenting the results (this depends on the type of speech act, and even on the type of question). The latter thus forms a goal in itself.



The red square is used for internal data structures. The one shown here lists a number of available solutions.
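
For illustration, such a list of solutions might be as simple as a mapping from speech act type to a goal and a presentation; the entries here are hypothetical:

// hypothetical list of available solutions
$solutions = [
    "yes_no_question" => [
        "goal"    => "prove the contents of the question",
        "present" => "answer yes if the goal succeeds, no otherwise",
    ],
    "wh_question" => [
        "goal"    => "find all bindings for the unbound variable",
        "present" => "phrase the bindings as a natural language sentence",
    ],
    "command" => [
        "goal"    => "perform the requested action",
        "present" => "confirm that the action was performed",
    ],
];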

Process procedure

The assumption in this post is that the internal processing of a speech act is done in a procedural way (think Prolog), where a goal is achieved by fulfilling its dependent subgoals.




The symbol



stands for a choice. One of these ways may be used to process a procedure.

The goal may just be to look up a fact in a domain database. Or it may involve reasoning or planning. Finally, it may require introspection into its own former goals, an advanced topic only handled by SHRDLU. I apologize for the extreme sketchiness of this part. I have never programmed "planning", nor the introspective parts.

The takeaway of this diagram is that any goal may be achieved in many ways that all require intelligence. A goal may have subgoals, and these subgoals can have subgoals of their own. The leaf nodes of such a structure consist of a simple database lookup or a simple action. An action may update a database, or perform an actual action in the world (again, SHRDLU being a perfect example).

Database query and update

Transforming a logical structure into a database query is an art in itself, but in a simple flowchart it just looks like this:

 

My approach to accessing a database is not to create one grand query that fetches all information at once. Instead, I think it works better to have many elementary queries, each selecting a single field. No aggregations.
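
As a minimal sketch of this style (the table and column names are invented; only the PDO calls are real), answering "how many children had X?" could be done with two elementary queries and a count in code:

function countChildren(PDO $pdo, string $name): int
{
    // elementary query 1: find the person's id
    $stmt = $pdo->prepare("SELECT id FROM person WHERE name = ?");
    $stmt->execute([$name]);
    $personId = $stmt->fetchColumn();

    // elementary query 2: find the ids of the children
    $stmt = $pdo->prepare("SELECT child_id FROM parent_child WHERE parent_id = ?");
    $stmt->execute([$personId]);
    $childIds = $stmt->fetchAll(PDO::FETCH_COLUMN);

    // the counting is done in code, not in the database (no aggregations)
    return count($childIds);
}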



zondag 2 december 2018

Matching brackets

For my work as a developer, I sometimes get the chance to interview new developers for the company. I check their resumes and I review some of the code they send me. We thought it would also be a good idea to give them a little programming task to do in a limited amount of time. I had no idea what kind of task to choose, so I checked Google. The following problem showed up several times and I liked it because it requires some thinking to do it right and, at the same time, it doesn't require a lot of typing.

Write a function that checks if the parentheses of an expression match.

The task involves several types of brackets: [], () and {}. For example, the parentheses in this expression match:
({[12][34]}{(56)})
While in this expression they do not:
{[909)
I wrote a test function (in PHP):
function test(string $exp, bool $expected)
{
    if (matchBrackets($exp) !== $expected) {
        echo "Error matching " . $exp . "\n";
    }
}
and some tests
test("()", true);
test("()()", true);
test("noot()noot", true);
test("(aap)", true);
test("aap", true);
test("(", false);
test(")", false);
test("{[]}", true);
test("{[][]}", true);
test("{[]()}", true);
test("", true);
test("{[aap](({banaan}))}", true);
test("{[{[](({}){a[](({}))}){b[](({})){[](({c}))}}{[](({}))}}]({[]d(({}){[](({123}))})}({{[](({}))}}))}", true);
test("{[aap](({banaa))}", false);
test("{[aap]((anaan}))}", false);

Solving the problem in an elegant way requires you to use the recursive nature of the domain.
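
For reference, here is one possible algorithmic solution: a minimal stack-based sketch that passes the tests above (not necessarily the variant referred to later in this post). The stack mirrors the recursion: each opening bracket postpones a requirement that its counterpart must satisfy later.

function matchBrackets(string $exp): bool
{
    $pairs = [')' => '(', ']' => '[', '}' => '{'];
    $stack = [];

    foreach (str_split($exp) as $char) {
        if (in_array($char, $pairs, true)) {
            // opening bracket: push it on the stack
            $stack[] = $char;
        } elseif (isset($pairs[$char])) {
            // closing bracket: the top of the stack must be its counterpart
            if (array_pop($stack) !== $pairs[$char]) {
                return false;
            }
        }
        // any other character is ignored
    }

    // every opening bracket must have been closed
    return count($stack) === 0;
}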

While the developer was writing the program, I tried to figure it out for myself. And it occurred to me that, besides some algorithmic options, it would also be possible to use a regular expression. But this could only be done by applying the advanced recursive patterns feature. I had heard of this feature, but never used it.

I would certainly not demand that the developer give such an answer, but it intrigued me too much to let it go.

It is indeed possible, and so, without further ado, I present the answer here:
function matchBrackets(string $exp): bool
{
    return preg_match('/^
        # start of group 1
        (
            (
                (
                    [^\(\{\[\)\}\]]+
                ) |
                (
                    \( (?1) \)
                ) |
                (
                    \[ (?1) \]
                ) |
                (
                    \{ (?1) \}
                )
            )*
        )
    $/x', $exp);
}

I will try to explain by building the expression from the ground up.
//x
The "x" modifier strips all whitespace from the string. This allows you to make it more readable.
/^$/x
Match the beginning and end of $exp.
/^(recursion-group)$/x
Create the group that will be used for recursion.
/^((recursion-group)*)$/x
Each item in this group can be repeated any number of times. This also allows for the completely empty $exp.
/^(((other-chars)|(round-brackets)|(squared-brackets)|(curly-braces))*)$/x
Any nested group consists of a sequence of non-brackets, or a set of round brackets (), or a set of square brackets [], or a set of curly braces {}. Finally, a "bracket" group looks like this:
\{ (?1) \}
It starts with an opening bracket (here "{"), followed by a recursive group, and ends with the matching closing bracket "}".

A recursive group can also repeat the entire expression; for that you would have to use (?R) instead of (?1). Here it just matches a single group (group 1) of the expression. Group 1 is the group that starts with the first "(" in the expression. I do not use (?R) here because that would place the ^ and $ in the middle of the expression, and that would never give a match.

End Note

It took me hours to get this expression just right, and as simple as possible. I expected the recursive expression to be complicated, but it turned out that it could be made pretty simple. And if you're interested: yes, it is fast. It was about 8 times as fast as the algorithmic variant I came up with. For an interpreted language like PHP, regular expressions open the door to the speed of a compiled language.

zondag 18 februari 2018

DAYDREAMER - A Landmark Cognitive Architecture

I posted this review earlier on Amazon
Source code of DAYDREAMER can be found on GitHub

This book is a revised version of Erik T. Mueller's 1987 dissertation. It was written in 1990 and republished in 2012, by the author himself.

It is a very important book in the discipline of Cognitive Science, as I will try to explain.

Daydreaming, the subject of the book, is the process of "the spontaneous activity - carried out in a stream of thought - of recalling past experiences, imagining alternative courses that a past experience might have taken, and imagining possible future experiences."

Mueller has gone way beyond that single subject, and worked out an impressive cognitive architecture that includes many mental faculties. He supports it with a massive base of research literature. Based on this architecture is the DAYDREAMER Lisp program.

Architecture

At the core of the architecture are personal goals and needs. These include achievement goals: to maintain self esteem, to have friends, and so on; and cyclic goals: food, money, entertainment.

In order to achieve these personal goals, DAYDREAMER has knowledge about "persons, physical objects, locations, possession of physical objects, ... mental states, interpersonal relationships, actions, and activities" (the "interpersonal domain"). This knowledge is implemented as planning rules and inference rules.

The program also applies these rules to entities other than itself in order to represent and make inferences about the mental states of others. This is called "other planning".

"Most AI programs take a request from a human user, handle the request, and terminate...In contrast, DAYDREAMER has a collection of goals which are instantiated and processed as the program sees fit."

A task or activity on behalf of an active top-level goal is called a "concern". Several concerns may be active at any time. "Each subsequence of contiguous representations produced on behalf of a given concern may be called a 'daydream'".

The program uses emotions to determine which concern to choose at any time ("emotion-driven planning"). Several emotions are created in different phases of a concern. For example, when a concern terminates a new positive or negative emotion is created. These emotions may trigger new concerns.

The architecture allows for two types of creative mechanisms: Serendipity and Action Mutation. Serendipity-based planning uses new input, as a fortunate coincidence, to help progress concerns that are not currently active. Action Mutation modifies events and goals in arbitrary ways (mutation) in order to find novel solutions.

Events are stored in episodic memory.

Emotions, Daydreaming goals, learning, and creativity

  • Chapter 3 shows how to represent emotions in a clear way. It continues with the description of the personal goals that are specific to daydreaming: rationalization, roving, revenge, reversal, recovery, rehearsal, and repercussions. These goals are activated when a person is idle. "Daydreaming goals are heuristics for how to exploit surplus processing time in a useful way." They are activated by emotions and directed, among others, towards the reduction of negative emotional states, and the exploration of consequences of possible actions.
  • Chapter 4 shows how a person learns from daydreaming, and its effect on his or her future behavior. 
  • Chapter 5 dives into the creative processes serendipity and action mutation.

Implementation

  • Chapter 6 describes the implementation of mental states and attitudes using Conceptual Dependency. 
  • Chapter 7 describes the implementation of DAYDREAMER rules and procedures in the GATE language, an AI program for the T language, a version of Scheme (Lisp). 
  • Chapter 8 describes possible episodic memory schemes, and Intersection Indexing, the technique used in DAYDREAMER.

Literature review, philosophical underpinnings, and conclusions

  • Chapter 9 is a review of the literature on daydreaming. 
  • Chapter 10 deals with philosophical questions. 
  • Chapter 11 describes possible applications of the architecture.

Conclusion

Mueller seems to have read every single book and paper ever written on the subject, and on all adjoining fields as well. Whenever a book reference seemed to be missing and I checked, it turned out that the book hadn't been written at the time. Note that Mueller's dissertation was finished as early as 1987!

And even though it is thirty years later now, as far as I know (and I am happy to be proven wrong), this book can still be seen as the most complete description of a cognitive architecture to date.

zaterdag 13 januari 2018

My little NLI-GO DBPedia demo

With some pride (but not very much ;) I present to you my natural language interaction demo:

http://patrickvanbergen.com/dbpedia/app/

This is the first presentable result of many years of work trying to create a library that allows a user (you) to interact with a database in natural language.

The database in this example is DBPedia, a database that is based on Wikipedia. It is a triple store that contains information about the people in Wikipedia, among other things.

My library tokenizes and parses the sentence, and converts the parse tree into a relational representation. A solution is looked up and executed. This involves a series of simple database requests (in this case SPARQL queries). The results are integrated into more relations. From these relations a response in natural language is generated.

The point of my library is not to create a little interface to DBPedia. I am looking to make the interface to databases in general as simple and powerful as possible. This means that more databases, types of databases, domains, natural languages, etc. should be addable with just a few configuration files. Eventually, no coding should be necessary. It's a long way to go.

Nevertheless, I invite you to try these sentences with the demo, and check out the productions that will be shown. These are some of the steps the library takes to find the answer.

  • How many children had Lord Byron?
  • How many children had Michael Jackson?
  • Who married Elvis Presley?
  • Who married Donald Trump?
I hope you find this interesting.
