Tuesday, May 23, 2017

What I learned about Siri by pondering over some sample sentences

Siri is a virtual assistant, available on the Apple platform, with a natural language interface that allows its user to interact with his or her smartphone.

Siri's technical details are kept secret by Apple, but of course I wanted to know how it works. I plan to examine CALO, the large SRI / DARPA initiative from which Siri is a spin-off, but for the moment I thought I'd entertain myself and you by examining some sample queries and commands and reverse engineer them a little to see what would be necessary to create an NLI engine that would be able to process the textual input.

I found the sentences in the books

Siri for dummies (2012)
Talking to Siri (2014)

Before we start, I must make clear that the interpretation given here is not how Siri really works. It describes what I think would be a simple way to create this functionality. Further, Siri's real power is interacting with dozens of very complex services. I believe its natural language component is relatively simple in comparison with the full framework.

Let's start with the first sentence. It's a command. Which is fitting, because Siri has been called "a do engine, rather than a search engine".
Call a taxi
This command, which instructs Siri to assist in making a taxi reservation, is an idiom. It cannot be understood if taken literally; it is an agreement between language users that this sequence of words means a certain thing.

Understanding of an idiom by itself is the easiest form of recognition. It is a matter of matching the words call, a, taxi in sequence. Put in Extended Backus-Naur form:
<call-a-taxi> ::= "call a taxi"
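As a minimal sketch of what such word-for-word idiom matching could look like in Python (this is my own illustration, not Siri's code; the `IDIOMS` table and the intent name `book_taxi` are hypothetical):

```python
import re
from typing import Optional, Tuple

# Hypothetical idiom table: normalized word sequence -> intent name.
IDIOMS = {
    ("call", "a", "taxi"): "book_taxi",
}

def normalize(sentence: str) -> Tuple[str, ...]:
    """Lowercase the input and keep only the word tokens."""
    return tuple(re.findall(r"[a-z0-9']+", sentence.lower()))

def match_idiom(sentence: str) -> Optional[str]:
    """Return the intent for an exact word-sequence match, or None."""
    return IDIOMS.get(normalize(sentence))
```

With this table, `match_idiom("Call a taxi!")` returns `"book_taxi"`, while `match_idiom("Call a doctor")` returns `None`, because only the complete sequence counts.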
To reduce fragmentation, from here on I will show the grammar rules directly below the sentences they belong to. The next sentences:
Victor Agreda is my boss.
Erica Sadun is my writing partner.

<my-brother> ::= "my brother"
<my-boss> ::= "my boss"
<my-wife> ::= "my wife"
<role> ::= <my-brother> | <my-boss> | <my-wife>
<role-assignment> ::= <name> "is" <role>
Siri keeps a database of some of your personal relations. You can teach Siri the link between the name and the role in your relationship and then add information to this person, like their e-mail address and their home address. This is an important part of what makes it personal.

As the "writing partner" example shows, it is also possible to define new roles on the spot.
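A possible way to handle role assignment, including roles defined on the spot, is sketched below. It assumes a plain dictionary as the "database" and, unlike the grammar above, accepts any text after "is my" as a role; the names `ROLE_PATTERN` and `assign_role` are made up for this example.

```python
import re
from typing import Dict, Optional

# "<name> is my <role>", with an optional final period.
ROLE_PATTERN = re.compile(
    r"^(?P<name>.+?) is my (?P<role>.+?)\.?$", re.IGNORECASE
)

def assign_role(sentence: str, relations: Dict[str, str]) -> Optional[str]:
    """Store role -> name in `relations`; return the role, or None if no match."""
    m = ROLE_PATTERN.match(sentence.strip())
    if m is None:
        return None
    role = m.group("role").lower()
    relations[role] = m.group("name")
    return role
```

After feeding it both sample sentences, `relations` would map `"boss"` to `"Victor Agreda"` and `"writing partner"` to `"Erica Sadun"`, even though the second role was never declared beforehand.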

The following sentence represents a large class of commands that can be given to Siri, in order to perform many types of tasks, using some variables.
Meet with my wife at noon tomorrow.

<adjunct> ::= <time-adjunct> | <date-adjunct> | <place-adjunct>
<meeting-planning> ::= "meet with" <role> {<adjunct>}
The structure is: command, subject, adjuncts. Adjuncts are modifiers for time, place, or person, like "at noon" or "at work". There are many variations on this type of command, but in an abstract sense they all share the structure shown here. You ask Siri to do something, and it forwards the request to an application (here: the default calendar application) using a required parameter and some optional ones. If you leave out some parameters, Siri will keep asking until it has them.
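The command, subject, adjuncts structure can be sketched as a small slot-filling parser. This is only an illustration under my own assumptions: the patterns cover just a few time and date words, the role is limited to a single word after "my", and I assume that time and date are the parameters the assistant would ask for when they are missing.

```python
import re
from typing import Dict, List, Optional

# Hypothetical adjunct patterns; real coverage would be far larger.
ADJUNCTS = {
    "time": re.compile(
        r"\bat (noon|midnight|\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", re.IGNORECASE
    ),
    "date": re.compile(r"\b(today|tomorrow)\b", re.IGNORECASE),
}
MEET = re.compile(r"^meet with (my \w+)", re.IGNORECASE)

def parse_meeting(sentence: str) -> Optional[Dict[str, str]]:
    """Extract the person and any adjuncts from a 'meet with ...' command."""
    m = MEET.match(sentence.strip())
    if m is None:
        return None
    slots = {"person": m.group(1).lower()}
    for slot, pattern in ADJUNCTS.items():
        found = pattern.search(sentence)
        if found:
            slots[slot] = found.group(1).lower()
    return slots

def missing_slots(slots: Dict[str, str]) -> List[str]:
    """List the parameters the assistant would still have to ask for."""
    return [name for name in ("time", "date") if name not in slots]
```

For "Meet with my wife at noon tomorrow." all three slots are filled, so nothing is left to ask. For "Meet with my boss" only the person is known, and `missing_slots` returns `["time", "date"]`.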
Add Mary Smith to my meeting at 3:30 PM
Add Mary Smith to my 3:30 PM meeting

<person> ::= <name> | <role>
<time> ::= <digit> [ <digit> ] [ ":" <digit> <digit> ] [ "AM" | "PM" ]
<meeting-add-person> ::= "add" <person> "to my meeting at" <time>
<meeting-add-person> ::= "add" <person> "to my" <time> "meeting"
Here we see that a known relationship is referred to by name, and a known appointment is referred to by time (the date is today, implicitly).
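Both word orders can be handled by trying two patterns in turn. The sketch below uses Python regular expressions as a stand-in for the grammar rules; the pattern names are hypothetical, and the person is captured as free text rather than being checked against known names or roles.

```python
import re
from typing import Optional, Tuple

# One or two hour digits, optional ":mm", optional AM/PM.
TIME = r"(?P<time>\d{1,2}(?::\d{2})?(?:\s?(?:AM|PM))?)"
PERSON = r"(?P<person>.+?)"

ADD_PATTERNS = [
    re.compile(rf"^add {PERSON} to my meeting at {TIME}\.?$", re.IGNORECASE),
    re.compile(rf"^add {PERSON} to my {TIME} meeting\.?$", re.IGNORECASE),
]

def parse_add_person(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (person, time) for either phrasing, or None if neither matches."""
    for pattern in ADD_PATTERNS:
        m = pattern.match(sentence.strip())
        if m:
            return m.group("person"), m.group("time").upper()
    return None
```

Both sample sentences yield `("Mary Smith", "3:30 PM")`, so the calendar application would receive the same request regardless of the phrasing.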

Other examples of the command, subject, adjuncts pattern are:
Remind me to call Mike Jones when I get home
Take me to One Infinite Loop Cupertino California from Times Square New York
Email Mary Smith subject meeting confirmation
Play song by David Guetta
Who directed The Big Lebowski?
This is where the real power of Siri lies: seamless interfacing with a multitude of services.

Next: "canned responses".
What is the best smartphone ever?

The one you're holding
Siri has a large array of them, just to entertain its user. Siri keeps track of which responses it has given before, rather than completely randomizing them. Perhaps the responses are triggered by keywords in the sentence, rather than by complete sentences. That would increase the chance that a user's input matches one of them.
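Keyword-triggered canned responses with a memory of what was said before could look like the sketch below. Everything here is an assumption on my part: the keyword set, the simple rotation used to avoid repeats, and the second reply, which is only a placeholder.

```python
from typing import Dict, List, Optional

# Hypothetical table: space-separated keywords -> list of replies.
CANNED = {
    "best smartphone": [
        "The one you're holding.",
        "(placeholder for a second canned reply)",
    ],
}

class CannedResponder:
    def __init__(self, responses: Dict[str, List[str]]) -> None:
        self.responses = responses
        # Remember, per keyword set, which reply comes next.
        self.next_index = {key: 0 for key in responses}

    def reply(self, sentence: str) -> Optional[str]:
        """Return the next unused reply if all keywords occur in the sentence."""
        text = sentence.lower()
        for keywords, replies in self.responses.items():
            if all(word in text for word in keywords.split()):
                i = self.next_index[keywords]
                self.next_index[keywords] = (i + 1) % len(replies)
                return replies[i]
        return None
```

Asking "What is the best smartphone ever?" twice gives the two replies in order, and the third time the rotation starts over. A sentence without the keywords returns `None` and would fall through to the regular commands.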
Are there any good Japanese restaurants nearby?
It is likely that "any good" is not part of an exact match, and that Siri just matches keywords like "restaurants" in this sentence.

There's a subtle point I'd like to make about the word "nearby". Siri does not search for nearby restaurants itself; the query is forwarded to a third-party service. Why does that matter? It means that the intelligent part of this answer is not performed by Siri, but by Apple Maps.
Play my UK tracks playlist
What's playing?
These sentences are interesting. The first sentence starts a "play playlist" context, which opens the door for the context-dependent commands "skip", "pause", "play", "what's playing" and some others. These commands are only available after the context-initiating command ("Play playlist").

Most commands are in "global scope". The playlist commands are an exception. Another exception is the set of timer commands: "Pause the timer", "Resume the timer" and "Stop the timer".
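One simple way to model such scoped commands is to keep a set of active contexts and to accept the context-dependent commands only when their context is active. The class and the returned intent strings below are hypothetical, and only the playlist context is modeled.

```python
from typing import Optional, Set

# Commands that only work inside a given context (assumed lists).
CONTEXT_COMMANDS = {
    "playlist": {"skip", "pause", "play", "what's playing"},
}

class CommandScope:
    def __init__(self) -> None:
        self.active: Set[str] = set()

    def handle(self, sentence: str) -> Optional[str]:
        """Return an intent string, or None if the command is not available."""
        text = sentence.lower().rstrip("?.!")
        # Context-initiating command: "play my <name> playlist".
        if text.startswith("play my ") and text.endswith(" playlist"):
            self.active.add("playlist")
            return "start_playlist"
        for context in self.active:
            if text in CONTEXT_COMMANDS[context]:
                return f"{context}:{text}"
        return None
```

Before a playlist has been started, `handle("What's playing?")` returns `None`. After `handle("Play my UK tracks playlist")`, the same question returns `"playlist:what's playing"`.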

Siri is also capable of handling some pronouns (him, her, it).
Read it again.
Call her.
In order to do this, it needs to keep track of the most recently used sentence subject.
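A minimal sketch of this bookkeeping, assuming one remembered person (for "him" and "her") and one remembered thing (for "it"); the class name and the two categories are my own simplification, and gender is ignored:

```python
from typing import Dict

class Referents:
    def __init__(self) -> None:
        # Most recently mentioned referent per kind: "person" or "thing".
        self.last: Dict[str, str] = {}

    def remember(self, kind: str, value: str) -> None:
        self.last[kind] = value

    def resolve(self, sentence: str) -> str:
        """Replace him/her/it with the most recent matching referent."""
        resolved = []
        for word in sentence.rstrip(".!?").split():
            lowered = word.lower()
            if lowered in ("him", "her"):
                resolved.append(self.last.get("person", word))
            elif lowered == "it":
                resolved.append(self.last.get("thing", word))
            else:
                resolved.append(word)
        return " ".join(resolved)
```

If "Mary Smith" was the last person mentioned, `resolve("Call her.")` gives `"Call Mary Smith"`, which can then be processed like any other command. A pronoun without a remembered referent is left unchanged.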

The final sentence I would like to mention is a neat trick by Siri: for knowledge intensive questions, just pass the question to Wolfram Alpha.
How much is $5.73 plus Denver sales tax


The creators of Siri have done a great job of creating a powerful virtual assistant. I have shown that the symbolic part of its natural language engine is really not that complicated. But that's just because the designers have left out a lot of unnecessary complexity. Which is to their credit. Siri's parent, CALO, had much more power. And that's what I'll be examining next.
