Sunday, 25 June 2017

An introduction to CALO

From 2003 to 2008 the American Defense Advanced Research Projects Agency (DARPA) funded the largest research effort for cognitive systems in history. It involved 250 people from 25 universities and companies. Over 500 articles were written about it. It was named PAL, for Personalized Assistant that Learns. The aim of this program was described as follows:
Through the PAL program, researchers will develop software that will function as an "enduring personalized cognitive assistant" to help decision-makers manage their complex worlds of multiple simultaneous tasks and unexpected events. 
The DARPA grant was divided between Carnegie Mellon University (CMU) and SRI International (SRI, formerly the Stanford Research Institute).

The CMU project was dubbed RADAR, for Reflective Agents with Distributed Adaptive Reasoning. Its eventual goal was described as follows: "The system will help busy managers to cope with time-consuming tasks such as organizing their E-mail, planning meetings, allocating scarce resources such as office space, maintaining a web site, and writing quarterly reports. Like any good assistant, RADAR must learn by interacting with its human master and by accepting explicit advice and instruction." [PAL1]

The SRI project was named CALO (Cognitive Agent that Learns and Organizes). It involved contributions from 20 additional research institutions. Its goal was broader than RADAR's. "The name was inspired by the Latin word “calonis,” which means “soldier’s assistant.” The CALO software, which will learn by working with and being advised by its users, will handle a broad range of interrelated decision-making tasks that have in the past been resistant to automation. It will have the capability to engage in and carry out routine tasks, and to assist when the unexpected happens." [PAL1]

This article is just a short introduction to CALO. Most of its information is derived from SRI's PAL website [PAL2]. The website emphasizes the learning aspect of the framework. Here, I focus on the relation between the components.

CALO

To the user, CALO consists of several applications. These applications, however, are integrated into a common knowledge base and governed by an autonomous agent that aims to assist the user in common tasks.

The following image from [CALO1] is an overview of the cognitive nature of CALO.


Machine learning ("learning in the wild") was an essential part of each CALO component. It had to be able to learn all kinds of new information without adding lines of code.

These are some of the types of learning strategies involved:

  • learning by instruction (Tailor)
  • learning by discussion (PLOW)
  • learning by demonstration (LAPDOG)
  • learning by observation (Markov Logic Nets)

Different CALO components used different ontologies, but all were designed around a core Component Library (CLib): "The CLib is encoded in the Knowledge Machine knowledge representation language, with a subset being automatically transformable into the popular Web Ontology Language (OWL) of the Semantic Web framework." [CALO-MA3]

[CALO1] Building an Intelligent Personal Assistant (2006)

Application Suite

CALO grew into a suite of applications grouped around three centers:
  • PExA, the Project Execution Assistant
  • CALO-MA, the Meeting Assistant
  • IRIS, the Information Assistant
From a cognitive agent perspective, PExA is the central component.

PExA - Project Execution Assistant

"PExA focuses on two key areas: time management and task management. Time management refers to the process of helping a user manage actual and potential temporal commitments. Time management critically involves meeting or appointment scheduling but further includes reminder generation and workload balancing. Task management involves the planning, execution, and oversight of tasks. Such tasks may be personal in that they originate with the user, or they may derive from responsibilities associated with a project." [PEXA2]

PExA contains several tools that allow it to learn new procedures. These procedures can then be used by SPARK, the task manager.

PExA Architecture


PExA includes the following applications:
  • SPARK, the Task Manager
  • PTIME, the Time Manager
  • EMMA, the Event Management Assistant (with PTIME)
  • EMP, the Execution Monitor and Predictor (ProPL)
  • Towel, the Task Management / ToDo UI
  • BEAM, the ToDo Interpreter
  • SEAR, for State Estimation
  • LEPT, the Duration Learner
  • Query Manager
  • ICEE, the Task Explainer
  • Machinetta, the Team Coordinator
  • Tailor, a Procedure Learner
  • EzBuilder, a Procedure Learner
  • LAPDOG, a Procedure Learner
  • PrimTL, a Procedure Learner
  • PLOW, a Procedure Learner
Each user had their own CALO. Machinetta took care of inter-CALO communication. [CALO1]

CALO-MA - Meeting Assistant

"The meeting assistant is designed to enhance a user’s participation in a meeting through mechanisms that track the topics that are discussed, the participants’ positions, and resultant decisions." [PEXA2]

"Our efforts are focused on assisting artifact producing meetings, i.e. meetings for which the intended outcome is a tangible product such as a project management plan or a budget." [CALO-MA1]

The Meeting Assistant used the Multimodal Discourse Ontology (MMD).

The meeting assistant architecture

CALO-MA includes the following applications:
  • Dialog Manager
  • Meeting Browser
  • 2-D Drawing Recognizer
  • 3-D Gesture Recognizer

IRIS - Information Assistant

"IRIS is an application framework for enabling users to create a “personal map” across their office-related information objects. IRIS is an acronym for "Integrate. Relate. Infer. Share."" [IRIS1]

IRIS collects raw data from several other applications on the user's desktop and integrates this data into its own knowledge base, using CLib-based ontologies.

The three layer IRIS integration framework


It contains a plugin framework, and plugins were created for the following applications:
  • Email (Mozilla)
  • Browser (Mozilla)
  • Calendar (OpenOffice Glow)
  • Chat (Jabber)
  • File Explorer (custom made)
  • Data editor/viewer (Personal Radar)
Adam Cheyer was the main contributor to IRIS. He left the CALO project in 2007 to cofound the CALO spin-off Siri. [SIRI1]

In 2007 IRIS, a Java application, was rewritten as a Windows application and renamed to CALO Express. [IRIS4]

CALO Express further contains the applications:
  • Presentation Assistant (for quick creation of presentations)
  • PrepPak (find, gather, and store relevant resources for common office tasks such as attending an upcoming meeting)
[IRIS1] IRIS: Integrate. Relate. Infer. Share. (2005)
[IRIS2] A Case Study in Engineering a Knowledge Base for an Intelligent Personal Assistant (2006)
[IRIS3] Extracting Knowledge about Users’ Activities from Raw Workstation Contents (2006)
[IRIS4] Adam Cheyer personal website

[SIRI1] SIRI RISING: The Inside Story Of Siri’s Origins (2013)

SPARK

SPARK is the central component of PExA.

"At the heart of CALO’s ability to act is a Task Manager that initiates, tracks, and executes activities and commitments on behalf of its user, while remaining responsive to external events. The Task Manager component of CALO is based on a reactive execution system called SPARK" [SPARK2]

"There is a need for agent systems that can scale to real-world applications, yet retain the clean semantic underpinning of more formal agent frameworks. We describe the SRI Procedural Agent Realization Kit (SPARK), a new BDI agent framework that combines these two qualities." [SPARK1]

SPARK Architecture


SPARK is a BDI (Belief-Desire-Intention) agent. Its central function is described as "Each agent maintains a knowledge base (KB) of beliefs about the world and itself that is updated both by sensory input from the external world and by internal events. The agent has a library of procedures that provide declarative representations of activities for responding to events and for decomposing complex tasks into simpler tasks. At any given time the agent has a set of intentions, which are procedure instances that it is currently executing. The hierarchical decomposition of tasks bottoms out in primitive actions that instruct effectors to bring about some change in the outside world or the internal state of the agent. At SPARK’s core is the executor whose role is to manage the execution of intentions. It does this by repeatedly selecting one of the current intentions to process and performing a single step of that intention. Steps generally involve activities such as performing tests on and changing the KB, adding tasks, decomposing tasks by applying procedures, or executing primitive actions." [SPARK1]
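
The executor cycle in this quotation can be sketched in a few lines of Python. This is only my own illustration of the idea, not SPARK's actual API: the names Agent, adopt, and step are made up, procedures are modeled as plain Python generators, and intentions are selected round-robin, which is an assumption (the quotation does not say how the executor chooses).

```python
from collections import deque

class Agent:
    """Toy BDI-style agent; all names here are illustrative, not SPARK's API."""

    def __init__(self):
        self.beliefs = set()        # the knowledge base (KB)
        self.intentions = deque()   # procedure instances currently executing

    def adopt(self, procedure, *args):
        # Instantiating a procedure (a generator function) yields an intention.
        self.intentions.append(procedure(self, *args))

    def step(self):
        """Select one intention and perform a single step of it."""
        if not self.intentions:
            return False
        intention = self.intentions.popleft()   # round-robin selection (assumed)
        try:
            next(intention)                     # run up to the next yield
            self.intentions.append(intention)   # not finished: keep it
        except StopIteration:
            pass                                # finished: drop it
        return True

def make_tea(agent):
    agent.beliefs.add("water boiled")   # a step that changes the KB
    yield
    agent.adopt(serve, "tea")           # a step that adds a task
    yield

def serve(agent, drink):
    agent.beliefs.add(f"{drink} served")  # stands in for a primitive action
    yield

agent = Agent()
agent.adopt(make_tea)
while agent.step():
    pass
```

After the loop, both beliefs are in the KB and no intentions remain. Because the executor performs one step at a time, a second intention added halfway is interleaved with the first instead of waiting for it to finish.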

[SPARK1] The SPARK Agent Framework (2004)
[SPARK2] Task Management under Change and Uncertainty: Constraint Solving Experience with the CALO Project (2005)
[SPARK3] SPARK website
[SPARK4] Balancing Formal and Practical Concerns in Agent Design (2004)
[SPARK5] Introduction to SPARK (2004)

Ending Notes

I wrote this piece because I believe CALO is worth studying. It holds gems in various fields of AI. At the same time, the huge number of papers written on it may be daunting to the newcomer. I just hope this blog post has opened a door.


Sunday, 11 June 2017

BDI and PRS, theory and practice

The NLI system CALO I am examining uses a BDI system (named SPARK) that is a successor to SRI's PRS. To understand SPARK, and because PRS is historically significant, PRS is worth studying.

BDI

A BDI system has its philosophical roots in the Belief-Desire-Intention theory of Michael E. Bratman. In this theory of human rationality, formulated in the eighties, he proposes a third mental attitude next to Beliefs and Desires, namely Intentions. It is laid down in his book Intention, Plans, and Practical Reason (= IPPR).

BDI systems are attractive because of their ability to adapt their plans to new circumstances. Since they are rational agents, their behavior is rational and one can inspect their course of action to find out why each decision was made.

Michael E. Bratman


It's very important to get the concepts straight, so I will try some definitions myself. The most important concepts are Belief, Desire, Intention, Goal, Plan, and Action. Since this theory is about humans, the concepts should be expressed in human terms, not in computational terms.

Belief

A belief is a bit of information about something, held true by the person. It may be information about the world or about the person itself. It differs from a fact in that it isn't necessarily true; it is just held as true by the person.

Inference-type beliefs can be used to form plans.

Examples:
  • it rains
  • I am cold
  • bus line 4 will get me from home to Tanner Library
  • the use of my computer will heat up the room

Desire

A desire is a future state that the person wants to achieve. The desire itself can exist without the intent of actually executing it; there may not even be a plan or a possibility to bring it about. Where I said "want", Bratman prefers the term "pro-attitude", a container that includes wanting, judging desirable, caring about, and others. (The future state is occasionally referred to as the goal.)

Examples:
  • to go to Tanner Library
  • a milkshake for lunch
  • find the best price
  • go to the party
  • become rich

Conflicting Desires: Choice

Desires can be conflicting. The desire to drink a milkshake conflicts with the desire to lose weight, for example. Or two desirable events take place at the same time. The person needs to choose between these desires.

Intention

An intention is the possible practical consequence of a desire. It is also a future state that the person wants to achieve, but the difference with desire is that the intention involves a commitment and a plan. The commitment implies that, barring interventions, the intention will be realized. The plan is simply a prerequisite to achieve it. The intention controls the execution of the plan, every step of the way. [IPPR, p16] An intention also has inertia, which means that one intention won't just be dropped for the next one. It strives for the plan to be executed to completion.

Examples:
  • to take bus line 4 from home to Tanner Library

Plan

A plan is the means by which an intention is executed. A plan is not just a general recipe, a scheme, or a procedure, but a concrete sequence of steps adapted to the specific circumstances at hand. A plan inherits the characteristics of commitment and inertia from its intention. The plan may be partial, or incomplete. Plans are hierarchical: a plan consists of steps and subplans. [IPPR, p29]

Examples:
  • take bus (line 4, from home, to Tanner) => go to the bus stop, enter the bus, pay the conductor, have a seat, get out at the nearest bus stop to the target, walk to the target

Planning

Planning is the process of taking some inference rules and plan schemes, and turning these into a plan. But there's a catch. Plans must be consistent: there must be no contradictions between one plan and another, nor between a plan and some belief.

Examples:
  • I need a means to go to Tanner. This can be by bus or by car. But I am already planning to leave my car at home for Susan to use. [IPPR, p33]

Expected Side Effects

Side effects are the results of the actions following the intention that are not part of the intention per se. Some of these are purely accidental, some are expected. These side effects may well be undesirable for the person. When planning, a person needs to make sure that the expected side effects do not conflict with his desires (or those of other agents).

Goal?

The term Goal does not have a special position in IPPR. It is not named in the first chapters. And when it occurs, in chapter 9, when dealing with other frameworks, it just refers to the object of desire [IPPR, p131], or serves as an alias for desire [IPPR, p33].

PRS

PRS is the first implementation of BDI. It was written by Michael Georgeff, Amy L. Lansky, François Félix Ingrand, and others at SRI. It was used for fault diagnosis on the Space Shuttle. I was interested in how it worked and I found a very clear document, Procedural Reasoning System, User's Guide, A Manual for Version 2.0. There's also a paper that appears to be the original document on PRS, Reactive Reasoning and Planning, which is exceptionally clear. I will use the naming from that early paper.

Amy L. Lansky and Michael Georgeff


PRS explicitly represents its goals, plans and actions. It is a reactive planner. It creates partial plans and is able to change them if new input comes in. But let's see how it implements the ideas of Bratman: BDI.


Belief

Beliefs are implemented as a database of state descriptions. Some beliefs are built-in. Others are derived by PRS in the course of action: observations and conclusions from these observations.

Procedural knowledge (which is a belief in BDI) is implemented as a KA (for Knowledge Area, a plan schema).

Desire

Desires are called goals. Goals do not represent static world states, but rather desired behaviors. Goals are collected on the goal stack.

Intention

Intentions are implemented as a process stack of active KAs.

Plan

Plans are called Active KAs, and this probably needs some explanation. Whereas Bratman focused on plans as active mental states, the focus of PRS is on the declarative definition of the procedures. This leads to the following mapping:

BDI -> PRS

  • procedural belief -> KA / procedure
  • plan -> Active KA
The body of a KA consists of sequences of subgoals to be achieved. A KA also has an invocation condition, a logical expression that describes under which conditions it is useful.
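
To make the two parts of a KA concrete, here is a minimal sketch in Python. It is my own illustration, not PRS syntax: the field names, the use of a Python function for the invocation condition, and the bus example are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class KA:
    """A Knowledge Area: a plan schema. Field names are illustrative only."""
    name: str
    invocation: Callable[[str, Set[str]], bool]  # (goal, beliefs) -> is it useful?
    body: List[str]                              # subgoals, to be achieved in order

go_by_bus = KA(
    name="go-by-bus",
    invocation=lambda goal, beliefs: goal == "be at library" and "bus runs" in beliefs,
    body=["be at bus stop", "be on bus", "be off bus at library"],
)

def applicable(kas, goal, beliefs):
    """Return the names of the KAs whose invocation condition holds."""
    return [ka.name for ka in kas if ka.invocation(goal, beliefs)]
```

Calling applicable([go_by_bus], "be at library", {"bus runs"}) gives ["go-by-bus"]; without the belief "bus runs" the list is empty. A KA only becomes an Active KA (a plan) once it has been selected for a concrete goal.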

Planning

PRS contains a large amount of procedural knowledge. This is knowledge in the form: to do A, follow steps B, C, and D. In such a procedure each of the steps can itself be a subgoal that will in turn need other procedures. This procedural knowledge is stored in KAs (Knowledge Areas). KAs are hierarchical, partial plan schemas.

Each goal is executed by applying one KA to the variables of the goal. Thus, an applied KA is formed.

PRS' System Interpreter only expands a subgoal of an active KA when it is the active step. It does not plan ahead. It does not start expanding subgoals in order to make the plan more complete. And because it does not plan ahead, it does not need to adjust its plans given changing circumstances.

Sometimes planning ahead is necessary, of course. When planning a route, for example. PRS does not use its central planning system to plan ahead. For the route planner, a KA has been written that essentially consists of these steps:
  • create a route-plan from here to T (and place it in the database)
  • convert the route-plan into a sequence of subgoals
  • execute the next subgoal (one subgoal at a time) until it is done

Expected Side Effects

The side effects of the KAs must be considered by their designer.

The System Interpreter

The interpreter "runs the entire system". Each cycle it performs these tasks:
  • selection of a new KA, based on goals and beliefs, and placing it on the process stack
  • executing the next KA from the process stack
Executing a KA causes
  • the creation of new beliefs (added to the database)
  • the creation of new subgoals (added to the process stack)
  • the creation of new goals (added to the goal stack) 
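
The cycle above, together with the rule that a subgoal is only expanded when it becomes the active step, can be sketched as follows. This is a strong simplification of my own and not PRS code: I merge the goal stack and the process stack into one stack, KA selection is a plain dictionary lookup without invocation conditions, and the goals are made up.

```python
# Plan schemas: goal -> ordered subgoals. Goals without an entry are primitive.
# (Hypothetical content; real KAs also carry invocation conditions.)
KAS = {
    "be at library": ["be at bus stop", "ride bus"],
    "be at bus stop": ["walk to bus stop"],
}

def run(top_goal):
    beliefs = []                 # the database
    stack = [top_goal]           # the goal to handle next is on top
    trace = []
    while stack:
        goal = stack.pop()
        if goal in KAS:
            # Expand only the goal that is active right now; push subgoals
            # in reverse so the first one ends up on top.
            stack.extend(reversed(KAS[goal]))
            trace.append(("expand", goal))
        else:
            beliefs.append(f"done: {goal}")   # primitive action updates beliefs
            trace.append(("act", goal))
    return beliefs, trace
```

In the trace of run("be at library"), "be at bus stop" is expanded only after it has become the top of the stack, and "ride bus" is not looked at until the walking is done. That is the "no planning ahead" behavior in miniature.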

Final Words

BDI is cool and PRS is fascinating. 

Note: planning ahead and plan execution are two different things. Some tasks require some planning ahead, like making a reservation. An agent must plan ahead while it is executing this plan and other plans. The two tasks are done by different subsystems of the agent.

Bratman does not deal with the priorities of desires in IPPR. Priorities may be assigned to intentions in PRS, however.


Saturday, 3 June 2017

Some Thoughts on the Cognitive Agent

I will use this post to try and structure my incoherent thoughts about the structure of cognitive agents.

The Theater

At the heart of the cognitive agent should be a theater, as proposed by Bernard J. Baars in his Global Workspace Theory. The theater represents human working memory, conscious activity (the spotlight on the stage) working serially, and the many unconscious processes working in parallel.


In this theater, decisions are made. Tasks are selected. It is a Task Manager.

Needs

The agent has needs. These are hard-coded, like the need to help the user fulfill their goals or the need for self-preservation. These needs define the soul of the agent.

The idea of needs is from Maslow, designed for humans, but can be applied to other agents too, as this image illustrates:


The need of helping the user looks like this:
The number of unresolved user requests should be 0.
Importance: 0.8
The needs of an agent drive it. They determine what goals it will create and their relative importance.

"Needs" may alternatively be called "drives" or "concerns".

Goals

Needs lead to goals, based on input. If the user asks a question, for example, this will be interpreted as a user request:
User Request: "Is there a restaurant here?"
Urgency: 0.2
If the user makes a command, it will be treated the same way:
User Request: "Send me an e-mail if anyone mentions my website"
Urgency: 0.1
The question named here is a request that may be answered immediately. The command, however, becomes a permanent request.

The agent does not execute the user's request just because it's there. Executing the request only becomes a goal because one of the agent's needs is to fulfill the requests of its user.

The agent has sensors that notice that its needs change. These sensors create active goals (instantiated from a template, its variables bound). These goals are entered into the Task Manager.
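
A need, a goal, and a sensor that turns a violated need into active goals could look like this in Python. It is a sketch under my own assumptions: the class and field names are invented, and the sensor simply creates one goal per open request.

```python
from dataclasses import dataclass

@dataclass
class Need:
    description: str
    norm: int            # desired value of the measured quantity
    importance: float

@dataclass
class Goal:
    need: Need           # the need this goal serves
    request: str
    urgency: float

def sense(need, open_requests):
    """Create one active goal per unresolved request if the norm is violated."""
    if len(open_requests) <= need.norm:
        return []
    return [Goal(need, text, urgency) for text, urgency in open_requests]

help_user = Need("The number of unresolved user requests should be 0.",
                 norm=0, importance=0.8)
goals = sense(help_user, [("Is there a restaurant here?", 0.2)])
```

Each goal keeps a reference to the need that caused it, so the link from goal back to need is never lost.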

Plans

In order to reach a goal, the agent needs to follow a plan. In the restaurant example, the agent may have a plan like this. It consists of these steps.

  • Find current location (L)
  • Find user food preference (P)
  • Find restaurants of type P within distance D of L
  • Present the restaurants to the user

The plan is instantiated and handed over with the goal to the Theater. Note: no task is treated as something that may be executed immediately, synchronously. Every task needs to go through the theater.

Note that all cognitive activity takes the form of plans and tasks. Yes, that includes all deduction tasks. It presumes a Datalog type of knowledge representation, a logical form that is procedural in nature.
goal :- location(L), preference(P), restaurants(P, L, R), show(R).
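
Read procedurally, the body of this rule is a list of tasks that share variable bindings. A rough Python equivalent, with made-up data and with each predicate reduced to a stub, might be:

```python
def location(env):
    env["L"] = "Main Street"           # hypothetical stand-in for a location lookup

def preference(env):
    env["P"] = "italian"               # hypothetical stand-in for a user-model lookup

def restaurants(env):
    data = {("italian", "Main Street"): ["Luigi's"]}   # made-up sample data
    env["R"] = data.get((env["P"], env["L"]), [])

def show(env):
    env["shown"] = list(env["R"])      # stands in for presenting to the user

PLAN = [location, preference, restaurants, show]   # the body of the rule, in order

def run_plan(plan):
    env = {}                           # variable bindings shared by all tasks
    for task in plan:
        task(env)
    return env
```

Note that run_plan executes the tasks directly, one after the other, purely to show the binding of L, P, and R. In the design described in this post, each task would be handed to the Theater instead of being called synchronously.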

Theater: Task Selection

The Theater, as a Task Manager, keeps track of all active plans. Each active plan has an active task.
At any point multiple active tasks may "scream for attention".

Now, the system as a whole may do many things simultaneously. But the thing is, each of its modules can only do one thing at a time. So when a given module is busy, it cannot handle another task at that time.

The Task Manager knows which modules are active and which ones are idle.

All active tasks scream with a loudness of:
Priority = Importance * Urgency
The task with the highest priority is selected by the Task Manager. This task is placed in the spotlight. TM asks all idle system modules to handle the task. The first one gets it. Or the one that bids most (as in an auction).
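
The selection rule is easy to write down. In this sketch the tasks are plain dictionaries and the example values are taken from the need and the two user requests above; the dictionary layout itself is my assumption.

```python
def priority(task):
    """Loudness of a task: importance of its need times urgency of its goal."""
    return task["importance"] * task["urgency"]

def select(tasks):
    """Return the task that 'screams' loudest, or None if there are none."""
    return max(tasks, key=priority, default=None)

tasks = [
    {"name": "answer restaurant question", "importance": 0.8, "urgency": 0.2},
    {"name": "watch for website mentions", "importance": 0.8, "urgency": 0.1},
]
```

Here select(tasks) picks the restaurant question, with a priority of about 0.16 against about 0.08 for the permanent e-mail request.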

The Task Manager hands the task over to the module, and places it on the waiting list. TM will receive a "done" event from the module later, at which point it binds the result values to the active plan and advances the active plan to the next task. When all tasks are done, the active plan is removed.
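
The bookkeeping around the "done" event can be sketched like this. The class and method names are hypothetical, and the modules themselves are left out: dispatch just returns the task that would be handed to an idle module.

```python
class ActivePlan:
    def __init__(self, tasks):
        self.tasks = list(tasks)   # task names, in order
        self.index = 0             # position of the active task
        self.results = {}          # result values bound so far

class TaskManager:
    def __init__(self):
        self.plans = []
        self.waiting = {}          # task name -> plan awaiting a "done" event

    def dispatch(self, plan):
        task = plan.tasks[plan.index]
        self.waiting[task] = plan  # place it on the waiting list
        return task                # in a full system this goes to an idle module

    def on_done(self, task, result):
        plan = self.waiting.pop(task)
        plan.results[task] = result        # bind the result to the active plan
        plan.index += 1                    # advance to the next task
        if plan.index == len(plan.tasks):
            self.plans.remove(plan)        # all tasks done: remove the plan

tm = TaskManager()
plan = ActivePlan(["find location", "find preference"])
tm.plans.append(plan)
tm.on_done(tm.dispatch(plan), "Main Street")
tm.on_done(tm.dispatch(plan), "italian")
```

After the second "done" event the plan holds both results and has been removed from the Task Manager.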

Dialog Manager

The task "find user preference for restaurant" may have two implementations:
  • find preference in user database
  • ask user for preference
If the first task fails, because the preference has not been asked before, the second task will be performed. It is performed by the Dialog Manager. The DM is "just another module" of the system.
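
The fallback between the two implementations can be sketched as below. The user database is a plain dictionary and the user's answer is passed in as an argument, both of which are simplifications of mine. Storing the answer for next time is also my assumption.

```python
def from_database(db):
    return db.get("food preference")        # None means "not known yet"

def ask_user(db, answer_from_user):
    db["food preference"] = answer_from_user   # remember it for next time (assumed)
    return answer_from_user

def find_preference(db, answer_from_user):
    """Try the database first; fall back to asking the user."""
    found = from_database(db)
    if found is not None:
        return found, "database"
    return ask_user(db, answer_from_user), "dialog"
```

With an empty database, the first call goes through the dialog route. A second call finds the stored preference and never bothers the user.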

As soon as the "ask user for preference" task enters the Dialog Manager, it becomes active, and will remain active until it has received a reply from the user. This way, the system will not ask several questions at once.



Likewise, if the user has just asked something, DM will start active mode, and will first reply to the user before starting a question of its own.

The Dialog Manager may also initiate conversation on its own, based on a dialog plan. It cannot just start by itself, however. It needs to send a goal and a plan to the Task Manager, and request attention.

Episodic Memory

All active goals, plans and tasks are logged in episodic memory by the Task Manager. Or EM simply listens to TM. Storage occurs in a structured way. For every task, it is possible to retrieve its plan, its goal, and, ultimately, its need.

The reason for this is that Episodic Memory thus becomes a knowledge source for auditing. It allows the user to ask the question: "Why did you do that?"
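
A toy version of such an audit trail, with a flat list as episodic memory, shows how the chain from task to need can be walked back. The record layout and the wording of the answer are invented for this example.

```python
log = []   # episodic memory: one record per activated task

def record(task, plan, goal, need):
    log.append({"task": task, "plan": plan, "goal": goal, "need": need})

def why(task):
    """Answer "Why did you do that?" by walking task -> plan -> goal -> need."""
    for entry in reversed(log):            # the most recent matching entry wins
        if entry["task"] == task:
            return (f"I did '{task}' as part of plan '{entry['plan']}', "
                    f"to reach goal '{entry['goal']}', "
                    f"because of my need '{entry['need']}'.")
    return f"I have no record of '{task}'."

record("find current location", "find restaurants",
       "answer: Is there a restaurant here?", "help the user")
```

Asking why("find current location") now produces a sentence that names the plan, the goal, and the need behind the task.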

This is also an important reason why all tasks need to go through the Task Manager. This way all activated tasks are accounted for in a central place.

Emotion

Remember the sensors that register that needs change? They can start emotions too. To do that, they use two types of parameters:
  • digression from the norm
  • time left to deadline
The norm in the example of assisting the user is 0 requests. If the number of requests becomes larger, like 5 or so, the need to assist the user becomes strained. In order to correct it, the importance of the need is temporarily increased. Remember the "importance" of a need? This is not a fixed value; it has a base value and increases as the need is strained.
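
As a sketch, the strained importance could be computed like this. The post only says that importance has a base value and increases with strain; the linear form, the gain of 0.04 per open request, and the cap at 1.0 are assumptions I picked for the example. The second parameter, time left to deadline, is left out.

```python
def importance(base, value, norm, gain=0.04, cap=1.0):
    """Base importance plus a bonus proportional to the digression from the norm.

    The linear form, the gain, and the cap are assumptions for illustration.
    """
    strain = abs(value - norm)
    return min(cap, base + gain * strain)
```

With a base importance of 0.8 and no open requests the value stays at 0.8. With two open requests it rises to about 0.88, and with five or more it reaches the cap.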

If this does not sound very emotional, that's because it is just the arousal aspect of emotion. It causes one to become more excited in a certain area. Arousal is involuntary; it does not need to go through the theater to become active.



It is also possible to give the agent emotional expressions. It may start yelling to hurry up or to shut the pod bay doors. These expressions become (very) conscious and need to pass through the Theater in order to become active.

I suggest reading The Cognitive Structure of Emotions if you want to know more.

In Closing

I made some progress. Things are starting to make sense.
