The chatbot backlash
Chatbots got a bad name because they’ve been overused. The tech isn’t appropriate for complex issues.
Contact centers are under tremendous pressure: they need to solve a growing array of increasingly complex customer problems while also reducing costs.
Companies have adopted technologies—like chatbots and IVRs—that “deflect” customers from agents. This helps to address volume and cost challenges; it does not help with increasing complexity, nor with the growing number of problems customers need help with. Companies that implement chatbots or IVRs may focus on “deflection” or containment as a measure of success; this is a flawed strategy that distracts from overarching customer service goals.
The ability of chatbots and IVRs to solve customer problems has been oversold in the last decade. Even the highest-performing systems are designed around inflexible rule-based models. These technologies are not new: an industry standard for chatbots was finalized in 2001, and the industry standard for IVRs was finalized a year earlier; indeed, the underlying technology is very similar. The advent of neural natural-language-processing (NLP) models in the early 2010s led to a resurgence in chatbot interest. While performance has improved (for example, you can classify a customer problem in one step instead of forcing the customer to navigate a menu), the systems remain rule-based and, worse, fragile.
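To make the rule-based/NLP distinction concrete, here is a minimal sketch (toy data and hypothetical intents, not any production system) contrasting an IVR-style menu tree with a one-step intent classifier:

```python
# Illustrative sketch only: a fixed menu tree vs. one-step intent
# classification from free text. Intents and examples are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Rule-based routing: the customer must navigate a fixed tree of options.
MENU = {"1": "billing", "2": "technical support", "3": "cancel service"}

def route_by_menu(keypress: str) -> str:
    return MENU.get(keypress, "unknown")

# NLP routing: the customer states the problem in their own words, and a
# trained classifier maps it to an intent in a single step.
examples = [
    ("why was I charged twice this month", "billing"),
    ("my invoice looks wrong", "billing"),
    ("the internet keeps dropping", "technical support"),
    ("my router will not connect", "technical support"),
    ("I want to close my account", "cancel service"),
    ("how do I end my subscription", "cancel service"),
]
texts, intents = zip(*examples)
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, intents)

print(route_by_menu("2"))  # routed only after the customer picks an option
print(classifier.predict(["my connection stopped working"])[0])  # one step
```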
The fragility of chatbots means constant tuning and re-working of the rules. It’s simply a budget shift from agents to IT departments or external consultants.
The fundamental limitation of chatbots
Chatbots and IVRs are extremely useful technologies when used appropriately. However, they are not a panacea—and attempting to use these technologies for all customer problems gives them a bad name. They’ve been sent in to do the wrong job—of course they are going to fail to impress.
Rule-based systems (like chatbots and IVRs) can handle simple problems, and improve discoverability of self-service options. This is a good thing. Customers can rely upon a single entry point for support, and be gently routed to the best way to solve their problem. In many cases, that can be showing the customer how to get what they need from digital self-service; in others, the customers’ problems can be addressed with automated responses within the chat interface. Customers are happy because their problems are solved faster (and with no hold time!); and companies are happy because it reduces total cost of service.

Chatbots don’t serve companies well as a siloed tool focused purely on deflection. But, this type of automation can help customers find self-serve answers to simpler needs. Plus, it can help the agent—attempting first troubleshooting steps before escalating to the agent with full context, for example.
Michael Griffiths
The idea is to streamline the customers’ path to resolution—without deflecting customers who will be best served by an agent. Siloed support channels and a focus purely on deflection can result in both a fragmented customer experience and duplication of effort and investment by the company (creating help content in web self-service applications as well as programming it into a chatbot, for example). Educating customers to use existing self-service tools, where appropriate, can minimize that duplication—as well as streamline and improve the customer experience. Likewise, it’s important to provide an easy path from bot to live-agent interaction when a customer need can’t be addressed by the bot—to avoid making customers hop from channel to channel, starting their journey over with each one.
Chatbots are an excellent way to help shift incoming customer contacts from messaging, voice, or the web into digital self-service when appropriate. But, they cannot serve as the end goal—and using chatbots won’t close down your contact center. The scope of the problems that chatbots can handle is limited. It is perfectly reasonable for these systems to address upwards of 30% of customer issues, though they are often the easy issues, and so the reduction in total agent time spent is less than that. To transform customer experience, companies must also look for meaningful ways to help those agents.
Resolving the increasing pressure on agents
As self-service and automation through IVRs and chatbots address more and more simple customer needs, the conversations that do get to agents are increasingly difficult. The result: Agents are handling more complex problems, handle times are longer, and cost savings never materialize.
What does it take to realize those promised cost savings? Our approach is to bring artificial intelligence right to the agent. We realized that the more you invest in customer self-service and engage chatbots for the simpler interactions, the more you need to help your agents handle parts of the job inside the conversation.
The benefits of pairing humans with AI are dramatic. We unite the best of human agents with the best of machine intelligence; more than that, we solve the problem of chatbot management.
Agents use artificial intelligence to make their job easier: it automates portions of their workflow. The AI monitors what agents use, what the outcome is, and what is truly effective for improving the customer experience—and improves itself over time. This feedback cycle means that as the system learns, agents use it more (e.g., from 15% of the time to 60% of the time a year later); and it means that the system learns from your best agents to help move every interaction in the right direction.
The best part of this approach is that we keep the benefits of chatbots. AI-driven automation trained by your agents can replace rule-based chatbots—and be used throughout the conversation, including before the customer is connected to the agent.
The Future of Work
Enterprises would be well served to determine how to eliminate significant minutes of work. Bringing AI to the agent—where upwards of 80% of CX budgets are spent today—provides a dramatic increase in value. Our system learns how best to help those agents. It is the union of human and AI, and the development of a more robust system, that will transform the contact center.
The hidden ways AI-driven speech transcription and analytics improve CX performance
I have spent virtually all of my professional career working with CX and IT leaders in Fortune 500 companies. These companies are investing millions of dollars annually in speech analytics that have maxed out the benefits they can deliver in improving quality management.
Many companies only transcribe a small fraction of their calls and only score a smaller number of those. And often the recordings are batched and stored for later analysis. Having to sift through the data, trying to pull out relevant information, and coaching agents after-the-fact with that information is cumbersome, incomplete, and moves the needle incrementally in terms of improvement.
When I was leading technology operations at contact centers, a consistent theme when re-evaluating our speech analytics toolset was “There are no new insights into what customers are contacting us about. We get reports—but there’s nothing really actionable—so nothing changes.”
We should be asking more from our speech analytics and transcription technology.
Now we can. Advancements in artificial intelligence (AI) raise the bar for what we should expect from our sizable annual investment in this technology.

AI-driven transcription and analysis of every call helps you optimize performance—and gives you a wealth of customer insight.
- Chris Arnold, VP, Customer Experience Strategy, ASAPP
AI quietly powers transcription and speech analytics in real-time and enables us to use the results in hidden ways we did not even realize were possible. Examples include:
- Empower agents with coaching support and tools to resolve issues faster
- Get voice of the customer (VoC) insight from 100% of calls
- Analyze customer sentiment in real-time and use machine learning to predict customer satisfaction scores as a call transpires
Supercharge your agents using real-time transcription
Real-time transcription can serve as fuel for your voice agents and accelerate CX performance. Since the majority of customers still contact companies by phone—and voice is the most costly channel—isn’t this where we should be focused?
While it is important to optimize your operations through both live agent and self-serve digital channels, phone calls will continue. Let’s use AI to super-power our agents to make the high-volume voice queues as high performing as possible.
Real-time transcription paired with real-time AI-driven analysis makes it possible to prompt agents with suggested responses and actions based on machine learning. Additionally, real-time transcription enables automation of thousands of micro-processes and routine tasks, like call summaries.
One of the largest communications companies in North America uses AI to automate the dispositioning of notes at the end of agent calls and realized a 65%[1] reduction in handling time for that specific task. At ASAPP, we have seen large CX organizations that leverage this modernized approach to transcription at scale reduce their overall CX spend by 30%, which translates into hundreds of millions of dollars annually.
Fuel CX operations with voice of customer analysis for 100% of calls
In and of itself, transcription doesn’t make front-page news. Very often it’s an expensive component of contact center technology that’s not providing a return on that investment. For instance, most companies are only transcribing 10-20% of their calls due to costs, and as a result business decisions are made without data from 80% or more of customer interactions. That’s not even close to a complete representation of everything happening across the totality of their CX operations.
Today, it’s realistic to transcribe every word of every customer interaction. You can leverage AI to analyze those transcriptions and make real-time decisions that empower agents and improve customer experience in the moment. Highly accurate transcription, coupled with closed-loop machine learning, takes the customer experience to another level.
Predict CSAT/NPS with real-time customer sentiment analysis
Every CX leader strives to delight customers—and wants to know how they’re doing. Most use Customer Satisfaction (CSAT) or Net Promoter Scores (NPS) surveys to capture feedback. Yet average survey response rates are between 5% and 15%, depending on the industry. With machine learning, you can now use your transcriptions and speech analytics to predict the sentiment (CSAT or NPS) of every conversation. It’s the equivalent of having the silent 90% provide feedback for every interaction.
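As a rough illustration of the idea (toy data and a hypothetical model, not ASAPP's actual system), a model trained on the minority of calls that did get a survey response can estimate scores for the silent majority from transcripts alone:

```python
# Illustrative sketch: learn CSAT from the ~5-15% of surveyed calls,
# then predict it for unsurveyed calls. Data below is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

surveyed_transcripts = [
    "thanks so much you fixed it right away",
    "this is the third time I have called about the same problem",
    "the agent was patient and walked me through everything",
    "I waited forever and still do not have an answer",
]
survey_scores = [5.0, 1.0, 5.0, 2.0]  # CSAT from customers who responded

model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(surveyed_transcripts, survey_scores)

# Estimate CSAT for a call that received no survey response.
unsurveyed = ["you resolved my billing issue quickly, thank you"]
print(f"predicted CSAT: {model.predict(unsurveyed)[0]:.1f}")
```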
Real-time analysis of transcription can discern intent and automatically categorize each customer’s reason for contacting your company. This will give you a deep understanding of exactly what customers are calling in about—and how that compares over time. You can also apply real-time trend and anomaly detection to identify issues and quickly address them before they become catastrophic.
This real-time capture of the voice of the customer is massively valuable not just to contact center leaders, but to Product, Marketing, and Sales teams as well.
Conclusion: Let speech analytics lead the way
Artificial intelligence makes our transcription and speech analytics investment actually meaningful and allows us to make material improvement in CX operations.
If you don’t know the specific drivers behind the interaction metrics within your company, it’s hard to make anything other than incremental changes in your CX programs.
AI lets us analyze every detail from the tens of millions of interactions that occur every year—not just the metrics (call duration, wait times, etc.), but the key drivers behind those metrics. What were the reasons for unexpectedly long handle times… were agents clicking around the knowledge management database trying to find answers? Or how about unbiased opinions on cancel rates… were they due to a product flaw or issues with customer service? Could better save approaches have been used? Or what may have caused customer sentiment to shift during the interaction?
Imagine capturing 100% of every single customer interaction, whether voice or digital. Imagine having objective insight into drivers behind your contact center metrics. Imagine being able to do that in real-time.
You no longer have to only imagine. No more waiting around for partial transcripts and partial answers. No more manual, subjective scoring of a tiny sampling of your total interactions. The future is here, and we can find it in real-time, automated transcripts and speech analytics:
- Supercharge agents with real-time desktop intelligence
- Identify coaching needs in the moment—get it right for the customer the first time
- Predict CSAT and NPS on 100% of your interactions
- Gain real-time insights—at-a-glance understanding of why customers are calling
Real-time transcription, AI-driven analytics, and the ability to quickly act on insights can be your hidden weapon to accelerate transformational change in your contact center.
Collaboration in the digital age: Value at the intersection of people and machines
NLP/AI systems have yet to live up to their promise in customer service, in large part because the challenge has been defined as either full automation or failure to automate. These systems start from the outside, trying to ‘deflect’ as much traffic/call volume as they can and punting to live service reps when they fail. The result has been hundreds of millions of dollars spent on lowering the cost of customer contact—lots of ‘claimed success’ in terms of deflection rates—and neither a change in the cost of customer contact nor improved customer service. How could this be?
The very essence of ‘conversations’ cannot be replicated by a chatbot with a programmable set of rules: if the customer says this, the bot says that, and so on. That’s not how any but the most simplistic of conversations go. Conversations are inherently probabilistic—they involve turn-taking, which includes disambiguation, successive approximation, backing up and starting over, summarization, clarification, and so on.

The future of work will be built on an AI native platform that enables a powerful collaboration between people and machines.
Judith Spitz, PhD
The promise of conversational AI will be realized by a platform that has been designed—explicitly—to enable a collaborative conversation between service reps, AI-powered algorithms, and customers—where technology works in concert with an agent to:
- automate parts of a conversation, and hand it over to an agent when needed
- make suggestions to agents about what to say next
- listen and learn from what your ‘best’ agents are not only saying to customers but are ‘doing’ with your back-end systems
- use machine learning to make all your agents as good as your best agent
- enable the customer to gracefully transition a conversation between their channels/device of choice without losing conversational integrity
The platform should allow your service reps themselves to demonstrate confidence in the auto-suggestions by selecting them with increasing frequency; the system can then use those ‘confidence levels’ to transition from a ‘suggested response’ to an automated response. Automating 50% of 100% of your call volume is a lot better than automating 100% of 10% of your call volume.
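One way such a promotion policy could work—a minimal sketch with hypothetical thresholds, not a description of ASAPP's product—is to track agent acceptance per suggestion and automate only once acceptance is consistently high:

```python
# Illustrative sketch: promote a suggestion to full automation once agents
# have accepted it often enough. Thresholds below are assumptions.
from collections import defaultdict

MIN_OBSERVATIONS = 200   # assumed: enough data to trust the acceptance rate
PROMOTE_AT = 0.90        # assumed: 90% agent acceptance

class SuggestionStats:
    def __init__(self):
        self.shown = defaultdict(int)
        self.accepted = defaultdict(int)

    def record(self, suggestion_id: str, was_accepted: bool) -> None:
        self.shown[suggestion_id] += 1
        if was_accepted:
            self.accepted[suggestion_id] += 1

    def mode(self, suggestion_id: str) -> str:
        shown = self.shown[suggestion_id]
        if shown < MIN_OBSERVATIONS:
            return "suggest"  # not enough evidence; keep the agent in the loop
        acceptance = self.accepted[suggestion_id] / shown
        return "automate" if acceptance >= PROMOTE_AT else "suggest"

stats = SuggestionStats()
for _ in range(195):
    stats.record("greeting-v2", was_accepted=True)
for _ in range(10):
    stats.record("greeting-v2", was_accepted=False)
print(stats.mode("greeting-v2"))  # "automate": 195/205 ≈ 95% over 205 shows
```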
The key paradigm shift here is an AI platform that has been built natively—from the ground up—to enable and foster the kind of man-machine collaboration that will be ‘the future of work’—and NOT one that promotes a kind of ‘Frankenstein’ where AI components are bolted on to existing systems—hoping for transformational results.
The future of work will be built on an AI Native® platform that enables a powerful collaboration between people and machines. This is what ASAPP delivers.
Cutting through the complexity using AI
Many channels, one conversational thread. It's what consumers expect.
The consumer is the ultimate winner in the race for accuracy in speech recognition
There is a lot of interest in automatic speech recognition (ASR) for many uses. Thanks to recent developments in efficient training mechanisms and the availability of more computing power, deep neural networks have enabled ASR systems to perform astoundingly well in a number of application domains.
At ASAPP, our focus is on augmenting human performance with AI. Today we do that in large consumer contact centers, where our customers serve consumers over both voice and digital channels. ASR is the backbone that enables us to augment agents in real-time throughout each customer interaction. We have built the highest-performing ASR system in the world based on industry-standard benchmarks. We do this not only by leveraging technological advancements in deep learning, but also by applying our own innovation to analyze problems at different levels of detail.

At ASAPP we continuously push the limits of what’s possible by not only leveraging technological advances in deep learning, but by also innovating. We are always looking for new ways to analyze problems and explore practical solutions at different levels of detail.
Kyu Han, PhD
LibriSpeech, a speech corpus of 1,000 hours of transcribed audiobooks, has been the most used benchmark dataset for ASR research in both academia and industry since its introduction in 2015. Many prestigious research groups around the world—including ASAPP, Google Brain, and Facebook AI Research—have been testing their new ideas on this dataset. Never have advances in the race for better results on the LibriSpeech test set been more rapid than in the past year.
In early 2019, Google’s ASR system with a novel data augmentation method outperformed all previously existing ASR systems by a big margin, boasting a word error rate (WER) of 2.5% on the LibriSpeech test-clean set. A word error rate is the percentage of words an ASR system gets wrong, measured against a reference transcript of the given audio. Later the same year, ASAPP joined the race and gained immediate attention with a WER of 2.2%, beating the best-performing system at that time by Facebook. The lead, however, didn’t last long, as Google in 2020 announced a couple of new systems to reclaim the driver’s seat in the race, reaching a sub-2.0% WER for the first time. One week after Google’s announcement, ASAPP published a new paper highlighting a 1.75% WER (98.25% accuracy!) to regain the lead. ASAPP remains at the top of the leaderboard (as of September 2020).
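For readers unfamiliar with the metric, here is a minimal WER computation—word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript:

```python
# Minimal word error rate (WER) computation via word-level edit distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# "please" misheard as "police": 1 error in 5 reference words = 20% WER.
print(wer("please restart the modem now", "police restart the modem now"))
```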

The race will continue, and so will our innovation to make our customers the ultimate winner in this race. Accurate transcriptions feed directly into business benefit for our customer companies, as they enable the ASAPP platform to augment agents—providing real-time predictions of what to say and do to address consumers’ needs, drafting call summary notes, and automating numerous micro-processes. Plus, having insights from full and accurate transcriptions gives these companies a true voice-of-the-customer perspective to inform a range of business decisions.
At ASAPP, innovation is built on the deep research capability that enabled the aforementioned milestones. But our strength is not only in research; it is also in an agile engineering culture that makes rapid productization of research innovations possible. This is well exemplified by our recent launch of a multistream convolutional neural network (CNN) model in our production ASR systems.
Multistream CNN—in which input audio is processed at different resolutions for better robustness to noisy audio—is one of the main contributing factors to the successful research outcomes of the LibriSpeech race. Its structure consists of multiple streams of convolution layers, each of which is configured with a unique filter resolution. The downside to this kind of model is extra processing time, which causes higher latency because many future speech frames must be processed during ASR decoding. Rather than leaving it as a high-performing but not feasible-in-production research prototype, we invented a multistream CNN model suitable for real-time ASR processing by dynamically assigning compute resources during decoding, while maintaining the same accuracy level as the slower research-lab prototype. Our current production ASR systems take advantage of this optimized model, offering more reliable transcriptions even for noisy audio signals in the agent-customer conversations of contact centers.
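For intuition, here is a minimal sketch of the multistream idea (hypothetical layer sizes and dilation rates, not ASAPP's production architecture): parallel convolution streams process the same acoustic features at different temporal resolutions, and their outputs are concatenated:

```python
# Illustrative sketch of a multistream CNN front end for ASR features.
# Sizes and dilations below are assumptions for demonstration only.
import torch
import torch.nn as nn

class MultistreamCNN(nn.Module):
    def __init__(self, feat_dim=80, channels=64, dilations=(1, 2, 3)):
        super().__init__()
        # One stream per dilation; each sees the input at a different
        # temporal resolution.
        self.streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(feat_dim, channels, kernel_size=3,
                          dilation=d, padding=d),  # padding keeps length fixed
                nn.ReLU(),
            )
            for d in dilations
        )

    def forward(self, features):  # features: (batch, feat_dim, time)
        outputs = [stream(features) for stream in self.streams]
        return torch.cat(outputs, dim=1)  # (batch, channels * n_streams, time)

frames = torch.randn(1, 80, 200)  # e.g., 200 frames of 80-dim filterbanks
print(MultistreamCNN()(frames).shape)  # torch.Size([1, 192, 200])
```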
As illustrated in Stanley Kubrick’s 1968 movie 2001: A Space Odyssey, the human aspiration to create AI that understands the way we communicate has led to significant technological advancements in many areas. Deep learning has brought recent revolutionary changes to AI research, including ASR, which has taken greater leaps in the last decade than it did in the previous 30 years. The radical improvement in ASR accuracy—making consumers embrace voice recognition products more comfortably than at any time in history—is expected to open up a $30 billion market for ASR technology in the next few years.
As we enter an era in which our own odyssey toward human-level ASR systems might soon reach its aspired destination, ASAPP, as a market leader, will continue to invest in rapid innovation in AI technology, balancing cutting-edge research and fine-tuned productization to enhance customer experience in meaningful ways.
Our research work in this area was presented at the Ai4 conference.