Flowchart
In this post I will sketch some of my ideas about a generic natural language interaction program in the form of a flowchart.Natural language interaction is any form of computer program that allows you to interact with it in a complex form of natural language (i.e. English, Chinese). It is a subject I have been working on for several years.
A flowchart visualizes the flow of control of a program. It consists mainly of interconnected tasks, but I added intermediate data structures (in blue) because of their special relevance.
What you're about to see is an attempt to unify the aspects of many historical nli-systems in a single system. It's still very sketchy, but I needed to have these thoughts visualized, because it allows me to have a clearer goal.
Check this wikipedia article if you're not sure about some of the flowchart symbols.
The main loop
The main loop of the program exists (as can be expected) of a sequence of asking for input, processing it, and displaying the output.I split the processing part into analyzing and understanding. By analyzing I mean processing the textual string into a logical, context insensitive data structure (think predicate calculus). By understanding I mean processing this logical structure into a domain-specific representation, a speech act.
Both the analysis and understanding part may fail. Analysis is language dependent. If the program supports multiple languages and analysis in one language fails, the other languages may be tried. The language that succeeds will become the default language for the next input sentence. Understanding is domain specific. If the system supports multiple domains, and understanding fails for one domain, other domains may be tried.
The parts that look like this are called predefined processes. They are placeholders for other flowcharts, show below.
The parts that look like this are called predefined processes. They are placeholders for other flowcharts, show below.
Analyse input
The analysis of input forms the standard pipeline of natural language processing.Tokenization splits a string into words. Parsing creates a parse tree. Semantic analysis maps the forms to meaning. Quantifier scoping creates nested structures for quantifiers like "all". Pragmatic analysis fills in the variables like "this", "her".
The analysis phase needs information in the form of a lexicon (word forms), a grammar (rewrite rules) and semantic attachments (from word form and phrase to meaning structure).
Ask user
Any nontrivial system sometimes needs to ask the user to disambiguate between several possible interpretations. To do this, it is sensible to present the user with a multiple choice question (i.e. not an open question). This way, any answer the user gives is OK and will not cause a new problem in itself.The added idea here is that a user should not be asked the same question more than once in a session. So any answer he/she gives is stored for possible later use. The store, "dialog context" may be used for pragmatic variables as well. The active subject of a conversation for example, that may be referred to as "he", "she" or "it", might well be stored here.
Speech act processing
This is the central component of an NLI system. The input is a logical structure that has been formatted in a way that makes sense in a certain domain. The term "speech act" fits this description well. The system then looks up a "solution", a way of handling, for this speech act. This solution consists of a goal (which is usually just the contents of the input, in a procedural form), and a way of presenting the results (it depends on the type of speech act and even per type of question). The latter one thus forms a goal in itself.The red square is used for internal data structures. The one shown here lists a number of available solutions.
Process procedure
The assumption in this blog is that the internal processing of a speech act is done in a procedural way (think Prolog) where a goal is achieved by fulfilling is dependent subgoals.
The symbol
stands for a choice. One of these ways may be used to process a procedure.
The goal may just be to look up a fact in a domain database. Or it may involve reasoning or planning. Finally, it may require introspection into its own former goals. Advanced topic, only handled by SHRDLU. I apologize for the extremely sketchyness of this part. I have never programmed "planning", nor the introspective parts.
The take away of this diagram is that any goal may be achieved in many ways that all require intelligence. A goal may have sub goals and these sub goals can also have sub goals. The leaf nodes of such a structure consist of a simple database lookup, or a simple action. An action may update a database, or an actual action in the world (again SHRDLU being a perfect example).
Database query and update
Transforming a logical structure into a database query is an art in itself, but in a simple flowchart it just looks like this:
My approach to accessing a database is to not create a grand single query to fetch all information at once. In stead, I think it works better to have many elementary queries, selecting 1 field. No aggregations.