Bots are in the spotlight. Tech superpowers like Microsoft and Facebook have released comprehensive frameworks aimed at mass-producing bots. There are numerous startups with their own frameworks and specialized offerings. More established players, including Aspect Software, have also joined the race. This post examines some of these frameworks and offerings, based on first impressions. Note that we are not looking at bot publishing platforms, as that is a different area.
Facebook Bot Engine
Facebook Bot Engine, released in April 2016, is based on the technology of Wit.ai, which Facebook acquired in early 2015. While Wit.ai itself runs from its own server in the cloud, the Bot Engine is a wrapper built to deploy bots in Facebook Messenger.
Facebook’s strength as a social network is in the number of users and the content they generate, and it is unlikely they will have the motivation to make the bot deployment infrastructure channel-agnostic. It is reasonable to assume it will stay confined to Facebook Messenger only (which, of course, is a huge space on its own).
Wit.ai is a different story. It was created outside of Facebook, and Facebook does not seem to have shut down its external connectivity. As a Silicon Valley-born technology, Wit.ai was engineered for viral adoption, meaning a short learning curve. The documentation is fantastic: very extensive but easy to read. There is no free lunch, however: if efforts are focused on marketability, quick releases, and ease of use, other aspects may suffer. Jumbo jets are not controlled by a combination of two buttons for a reason.
Wit.ai offers several options:
- Extract certain predefined entities (the standard set of date, time, phone, etc.) listed here.
- Extract intent (“what the user wants to perform”).
- Extract sentiment (per utterance only).
- Define and extract own entities.
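To make these options concrete, here is a minimal sketch of calling Wit.ai's HTTP /message endpoint from Python and reading back whatever it extracted. The token is a placeholder, and the exact response shape may vary between API versions, so treat this as an illustration rather than a reference.

```python
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder; issued per app in the Wit.ai console

def wit_message(text):
    """Send an utterance to Wit.ai and return the raw extraction result."""
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"v": "20160526", "q": text},   # "v" pins the API version date
        headers={"Authorization": "Bearer " + WIT_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()

data = wit_message("What's the weather in Boston tomorrow?")
# Extracted values are grouped under "entities", e.g. built-in location and
# datetime entities plus the trained intent.
for name, values in data.get("entities", {}).items():
    print(name, [v.get("value") for v in values])
```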
The predefined entities part seems solid. These were created by Wit.ai developers, and I'm sure a lot of effort was invested in them. The sentiment analysis likely uses a bag-of-words approach, and therefore may be rudimentary; but then, sentiment is likely to be a "sidecar" in a bot, so it should not be an issue. User-defined entities and intents, however, seem quite limited.
User-defined entities rely on keywords: either the keywords are part of a predefined list, or a free-text substring is captured based on an adjacent keyword. This does open some possibilities but will not cater for more sophisticated needs (e.g. finding a first name). It is also a bit cumbersome for longer lists (tens or hundreds of entries) and, of course, has no knowledge of inflections, so different inflected forms have to be entered one by one.
Intent uses machine learning to figure out the important bits of the input. However, it’s up to the user to train the model, which is where things get hairy. Let me quote the documentation describing how to make Wit recognize a request for the weather status:
This seems unpredictable. How many expressions is enough? What about false positives – is there a way to make sure that “I want to buy a rain coat” won’t be interpreted as a query about weather? What if the examples have parameters inside?
Another limitation is described in the section for advanced users:
…it won’t work well with the following examples mainly because one cannot infer from these sentences what these entities are about without some “music knowledge”:
I want to listen to Led Zeppelin
Play Stairway to Heaven
I’d like to listen to reggae
I like No woman no cry
Even with a lot of training, you will never be able to
1) add all the artists and songs
2) train Wit to recognize new artists and songs because in the “I want to listen to XXXX” type of queries, XXXX can be either a song or an artist and sometimes both!
What a bot framework could do, however, is allow the user to load their own lists (e.g. 200 songs from the car's playlist) or upload pre-trained models. But Wit.ai is designed to be simple and accessible via a public cloud, so a more complex deployment would conflict with the general philosophy of the system.
Good fit: independent developers, tinkerers, IoT applications, simple bots
Poor fit: enterprise, complex bots
API.ai
API.ai is another primarily web-based bot framework. While the general philosophy, with its reliance on entities and intents, appears similar, the integration options are far more expansive. API.ai bots can be exported as modules and integrated into Facebook, Kik, Slack, Alexa, and even Cortana.
More importantly, API.ai seems to have understood the weakness of letting users define entities and intents by entering numerous utterances, and provides a large set of pre-built domains. An option to define user entities addresses the issue with user-supplied titles (such as the songs and artists above), which Wit.ai does not solve.
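To illustrate the basic flow, here is a minimal Python sketch of querying an API.ai agent over its REST interface (the v1 /query endpoint as documented at the time). The access token is a placeholder, and the response fields may differ between protocol versions.

```python
import requests
import uuid

CLIENT_ACCESS_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"  # placeholder; per-agent token from the API.ai console

def apiai_query(text, session_id=None):
    """Send an utterance to an API.ai agent; return the matched intent name and parameters."""
    resp = requests.post(
        "https://api.api.ai/v1/query",
        params={"v": "20150910"},                          # API protocol version date
        headers={"Authorization": "Bearer " + CLIENT_ACCESS_TOKEN},
        json={
            "query": text,
            "lang": "en",
            "sessionId": session_id or str(uuid.uuid4()),  # keeps follow-up context together
        },
    )
    resp.raise_for_status()
    result = resp.json().get("result", {})
    return result.get("metadata", {}).get("intentName"), result.get("parameters", {})

intent, params = apiai_query("Book a table for two at 7 pm")
print(intent, params)
```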
API.ai can be viewed as "Wit.ai on steroids". (I did not see examples of conversations involving follow-ups; however, this is likely supported.) Having said that, the entity/intent model may not support the more complex logic required in some enterprise applications. How does one classify a question, a complaint, or a requirement in general?
Good fit: independent developers, tinkerers, IoT applications, simple to mid-level bots, embedded applications, virtual assistant applications
Poor fit: customer service applications, complex bots
Microsoft Bot Framework
Microsoft announced its commitment to conversational commerce at approximately the same time as Facebook. However, Microsoft's needs, philosophy, and approach are somewhat different. Just like Facebook's offering, Microsoft's SDK can be viewed as two components that are independent of each other:
- Bot Connector, the integration framework
- LUIS.ai, the natural language understanding component
The integration component is impressive. It plugs into GroupMe, Skype, Slack, SMS, Telegram, web chat, Facebook Messenger, and email. There is a PaaS option on Azure, just for bots.
LUIS.ai is tightly integrated with the programming language via the use of .NET attributes. In more than one way, the philosophy is similar to that of Wit.ai and API.ai: intents and entities. Being a Microsoft product, it comes with nice deployment capabilities, and to cover the shortcomings of user-defined models, developers are given access to pre-built Cortana models. As long as Cortana keeps being developed, this library will keep growing. In this respect, it is more API.ai than Wit.ai.
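The attribute-based binding itself is C#-specific, but underneath the model sits a plain REST endpoint. Below is a minimal Python sketch of querying a published LUIS application; it assumes the v2.0 endpoint form, and the region, application id, and key are placeholders, so details may differ depending on the LUIS version in use.

```python
import requests

# Placeholders: region, application id, and subscription key come from the LUIS portal.
LUIS_REGION = "westus"
LUIS_APP_ID = "YOUR_APP_ID"
LUIS_KEY = "YOUR_SUBSCRIPTION_KEY"

def luis_query(text):
    """Query a published LUIS application and return the top intent and entities."""
    url = "https://{0}.api.cognitive.microsoft.com/luis/v2.0/apps/{1}".format(
        LUIS_REGION, LUIS_APP_ID)
    resp = requests.get(url, params={"subscription-key": LUIS_KEY, "q": text})
    resp.raise_for_status()
    data = resp.json()
    return data.get("topScoringIntent", {}).get("intent"), data.get("entities", [])

intent, entities = luis_query("What's the weather in Seattle?")
print(intent, [(e.get("type"), e.get("entity")) for e in entities])
```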
However, as with many Microsoft products, with great library comes great boilerplate code.
A blog post by Anthony Chu, Adding Natural Language Processing to a Bot with Microsoft Cognitive Services (LUIS), walks us through a simple "what's the weather" example. This uses a pre-trained Cortana model – essentially, a library call. Yet the plumbing must be put together by the developer:
Which, honestly, makes me scratch my head. Note, this is a model with one variable (location). Is it really necessary to make a very simple dialogue look like 1970s GOTO-based code?
Additionally, the standard reservations about entity/intent-based bot frameworks expressed earlier apply here as well. It is as if the framework pulls simultaneously in two different directions. One is early adopters, prone to ADHD and opting for a short learning curve; the other is enterprise applications, which is where the former normally don't go. The result is reminiscent of 1990s Microsoft frameworks: great for simple tasks, or if your use case is close enough to the provided examples; more difficult for complex requirements.
Good fit: independent .NET developers, IoT applications, simple to mid-level bots that require integration of enterprise software, virtual assistant applications, especially Cortana functionality
Poor fit: customer service applications, complex bots, non-Microsoft platforms
Viv
Viv.ai, a company co-founded by the authors of Siri, demoed its technology in May 2016. The company is yet to release more details, but it appears that the focus is on processing complex queries, using a flexible mechanism juggling prepared recipe components which can be assembled into endless combinations.
The advance over Siri 1.0-type assistants is not a linguistic one, but rather a new type of application logic. It appears to be similar to the way SQL queries are processed: broken down into constituents and combined into an execution plan. The founders made it clear that Viv is intended to be a development framework. They did, however, mention multiple times that it is a "virtual assistant", and indeed this type of complex query is better suited to that kind of application. It might be too risky and unpredictable for enterprise applications that must adhere to standards and procedures, unless tight control is exercised (in which case the advantage of flexible recipes floating in the cloud is somewhat lost).
It is too early to judge whom Viv will fit best.
Aspect CXP and Aspect NLU
Aspect Customer Experience Platform (CXP) is a platform for the design, implementation, and deployment of multichannel customer service applications. Numerous medium and large enterprises all over the world, many of them household names, rely on Aspect CXP to communicate with their customers via multiple channels, be it voice, text, mobile web, or smartphone apps. Aspect NLU is a component for making sense of human language; it was created from the technology acquired by Aspect from LinguaSys, co-founded by the author of this post. To date, Aspect Software is the only contact center software company to have embedded a bot framework in its core customer service solutions.
The approach adopted by Aspect NLU radically differs from those of API.ai, Wit.ai, and Microsoft; the problem was attacked from a different angle. Natural language user interfaces, or bots, were not the only applications in mind. My main motivation was to create a reusable "document object model for natural language", which would allow developers to traverse an utterance as a graph and use XPath-like queries to capture the required elements. Because the lexical databases of Aspect NLU are aligned across languages (WordNet-style), the same semantic identifiers work for different languages. For example, the same expression may refer to all kinds of "subway" in all languages, including all inflected forms, or even to all kinds of public transportation. There is no need to enter hundreds or thousands of phrases for every new script, and yes, the same script may be reused for different languages.
The same identifiers make it possible to address syntactic structures: find the subject and the object of a sentence, detect complex comparative forms of adjectives, and so on. Rather than searching for one word form, which makes little sense in morphologically complex languages like Arabic, German, or Russian, we look for a semantic node, which may cover tens of inflections and synonyms in a single language. One can create complex condition clauses, e.g. "all the first names", "all the names of prescription drugs which are objects of a transitive verb", or "all the objects which are in a negative-sentiment statement".
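To give a feel for the idea, here is a toy Python sketch. Everything in it is hypothetical: Aspect NLU's actual API is not shown in this post, so the classes and identifiers below are invented purely to illustrate querying an utterance graph by semantic node, syntactic role, and sentiment.

```python
# Hypothetical sketch only: this toy structure illustrates the "DOM for natural
# language" idea, where nodes carry language-independent semantic identifiers.
# It does not reflect Aspect NLU's actual API.

class Node:
    def __init__(self, surface, semantic_id, role, children=None, sentiment=None):
        self.surface = surface            # the word form as it appeared in the text
        self.semantic_id = semantic_id    # language-independent concept identifier
        self.role = role                  # syntactic role: "subject", "verb", "object", ...
        self.sentiment = sentiment
        self.children = children or []

    def walk(self):
        yield self
        for child in self.children:
            for node in child.walk():
                yield node

# "I can't refill my Lipitor prescription" parsed into a toy graph (hand-built here).
utterance = Node("refill", "concept:refill", "verb", sentiment="negative", children=[
    Node("I", "concept:speaker", "subject"),
    Node("Lipitor", "concept:prescription_drug", "object"),
])

# Analogue of an XPath-like query: all prescription drugs that are objects
# of a verb in a negative-sentiment statement.
matches = [n for n in utterance.walk()
           if n.semantic_id == "concept:prescription_drug"
           and n.role == "object"
           and utterance.sentiment == "negative"]
print([n.surface for n in matches])   # ['Lipitor']
```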
The bot scripts work as "form scripts" (in the terminology of Wit.ai and Microsoft): they assign an objective to the system, and the system fills in the missing bits by prompting the user.
With more capabilities comes a longer learning curve. Development requires familiarity with some linguistic concepts; however, most enterprise projects require subject matter knowledge, so this is not unexpected. The focus is on customization and integration. One application built with Aspect NLU, for instance, handles 150+ domains (which may overlap) and is integrated with callback capabilities and the customer's business logic. Think C/C++ vs. Basic.
Good fit: complex bots, customer service applications, enterprise software
Poor fit: simple bots, embedded applications, IoT applications
Conclusions
Bots are said to be the "new apps", and just as with the early apps for mobile devices, the tools are only starting to take shape. Some of these tools focus on ease of use, others on the delivery of complex systems or integration with existing infrastructure. Today, the majority of bots are the equivalent of prank apps consisting of buttons that make funny sounds; but end users will demand more sophistication and solutions to real-world problems. Developers will be pressed to look for tools that can deliver what they need.
What we see today is just the first salvo.