Why companies who want true VoC need to engage the power of AI
The best businesses succeed by developing a holistic understanding of their customers. Most, if not all, consumer companies have a Voice of the Customer (VoC) program, intended to capture and analyze feedback, leveraging the insights to drive both strategic and operational improvements across the business. While the intent of these programs is critical to constant improvement, the tools that have been available to CX professionals fall short of delivering what they really need.
Surveys and samples only give a partial view
Many organizations build VoC programs solely on a “survey and score” foundation. When done right, surveys can play an important role in any VoC program. But due to their low average response rate and general bias, they provide organizations with a limited view of the overall customer experience and the quality of service that is being delivered by your organization.
An overreliance on surveys has other pitfalls, too. Relationship-based surveys, for example, evaluate general brand satisfaction, but often fail to provide clear feedback on the internal processes, people, and frontline events that contribute to customer experience. On the flip side, transaction-based surveys capture feedback in the moment, but tend to lose sight of what the overall relationship looks like from the customer’s point of view.
Other companies might record calls, then either listen to or transcribe a subset of these calls. This approach also limits analysis to a small sample of customer interactions.
Analyzing only a fraction of your calls fails to tell the whole story. Yet companies rely on this data to make important decisions about product, sales, and marketing initiatives as well as contact center operations.
What’s more, with both of these approaches, there can be a significant time lapse between capturing the data, gaining insight from that data, and putting that insight into action. The truth is that most of us in the customer experience world have never had a full view of the quality of service we are delivering to our customers, and the opportunities that exist to improve the way in which we serve our customers across the enterprise.
AI elevates VoC with new possibilities
Artificial intelligence fuels new options for gaining more comprehensive customer insight and for putting that insight into action. Forward-thinking CX leaders are excited about mining this wealth of data and are heartened to learn that they won’t need an army of data scientists on staff to do it.
Highly accurate transcription is key
The best of these new solutions start with highly accurate real-time transcription of every call. Transcription is not the goal, but a means to an end. However, the importance of transcription quality can’t be overstated, as this is the fuel for meaningful analysis. More on this here.

AI solutions that use machine learning models custom trained on a company’s lexicon are—not surprisingly—far more accurate than solutions using generic models trained on everyone’s data. Consequently, they can deliver far more value.
Michael Lawder
Getting this data in real time gives companies the opportunity to take action instantly instead of waiting weeks, months, or even longer to address customer needs. And having it for every call gives companies a much fuller customer perspective.
Rich actionable insights
The real value comes not in just getting the data, but in being able to put it to use in meaningful ways. Beyond accurately transcribing customer conversations, an AI-driven VoC program can:
- Analyze sentiment and even predict CSAT and NPS scores
- Capture customers’ problem statements
- Classify intent at a useful level of detail
- Spot correlations between things—for example: callbacks or sentiment by agent, intent, or length of call
- Highlight trends and anomalies in customer conversations
- Alert supervisors to coaching needs by agent or topic
- Automate summary notes, providing cleaner data for analysis and better records for future customer contact
For the first time, you can effectively measure the quality of service you are delivering for every product, every interaction, every agent.
Cultivating VoC of this depth can do more than help manage and optimize CX operations. It has the power to influence the business as a whole. CX leaders become the ultimate advocates for the customer, able to synthesize customer wants and needs as they relate to every stage of the customer journey. This elevates their stature in the organization, as they become trusted sources for insights that inform key decisions and strategy aimed at building customer loyalty and growing revenue. If you’d like to hear how companies in your industry are using AI-driven speech intelligence solutions in their VoC programs, drop us a line at ask@asapp.com.
Think beyond the bot: 3 proven strategies for digital-first business success
Everyone’s talking about the importance of digital-first customer experience, especially these days—but you have to think bigger than that. It’s not enough to just have a great mobile app or self-service chatbot. Market-leading success relies on having a digital-first culture that drives your business and meets your customers where they are.
In simplest terms, ‘digital-first’ means making your operations as digital and mobile-friendly as possible, including enabling seamless communication across multiple responsive, persistent and asynchronous digital channels. Particularly in customer service, digital-first operations and the notion of engaging with your customers in the same place that they talk to their friends and family is a game-changer.
I’m not talking about technology replacing people, but about using the latest digital capabilities to make your organization radically more efficient and productive, and improving the customer experience at the same time. It’s all about empowering employees to do their best work, and meeting customer needs faster and better, wherever they are and whenever they need you.
Where should you focus first? I think a digital-first culture for customer service relies on a few fundamentals for meeting today’s demands. Let’s look at three proven strategies that are powering market leading companies.
1—Be where customers are—service on the go
Customer service is all about building relationships with your customers. In a digital-first world, those conversations now span many channels, from online support portals and webchat to email, in-app, standard text messaging, and social media. Among all the options for digital engagement, asynchronous messaging has emerged as a clear winner with consumers. In fact, messaging has been the dominant way people communicate for over a decade. It makes sense that they want to engage with brands they love in the same place, given the familiarity and convenience.
For digital-first customer service, providing messaging capabilities is now the essential way to be there for customers in the moment. It gives customers a fast, convenient way to reach out, and can make it easier to resolve service issues cost-efficiently on the first contact.
Big tech leaders paved the way with technologies like Apple Business Chat and Google Business Messaging, enabling companies to integrate messaging at multiple touch points. New AI-powered solutions are now taking it a step further.
I like to call this the ‘asynchronous revolution.’ Digital-first leadership now demands multimodal customer service that supports ongoing conversations. The “start and stop” flexibility that people love so much in their personal messaging is now a business-critical interaction model that today’s digital consumers expect.
2—Make conversations seamless across channels
Being there for customers in digital messaging is key—but what if they need to pause the conversation or switch channels in the middle of an issue? Typically it means starting all over again, which is a frustrating experience. When people have to explain their problem to multiple agents, customer satisfaction drops off a cliff.
Delivering true omni-channel support is the next essential in a digital-first business, and it is finally here.
Whether consumers reach out via messaging, webchat, or voice, all of these channels must be intelligently integrated to ensure customer service agents really know each person. Employees need to be equipped with the right interaction history and procedural knowledge to quickly move toward resolution, no matter where or when that agent stepped into the conversation.
Many companies are finding that’s the most critical customer experience problem to solve, and the biggest source of frustration for consumers who need help. That’s why I’m so excited when I see new technologies seamlessly unify and thread conversation histories across channels. Innovative solutions are finally bringing a holistic approach to customer service.
Someone can request assistance in an app, then switch to a webchat, mobile messaging, or even voice, and any agent will know all the relevant details. It ensures the customer has an easy, cohesive experience, and enables agents to work more efficiently.
In this way, AI helps fuel a digital-first business and empower support and sales teams to treat every customer interaction as a moment that matters.

AI amplifies the effectiveness of digital-first channels and new techniques help to accelerate adoption. Powering digital and mobile touch points with data-driven intelligence also radically increases productivity at a lower cost—while delivering responsive, personalized customer experiences that win brand loyalty.
Michael Lawder
3—Empower employees with predictive capabilities
Once you’ve got digital messaging and omni-channel support in place, it’s time to kick it up a notch by augmenting customer service agents and sales teams with predictive and highly contextualized knowledge, powered by self-learning AI models. Most companies have vast volumes of data that can be harnessed to make support and marketing efforts substantially more productive.
Here’s where cutting-edge AI solutions really stand out. They can transcribe and analyze digital and voice conversations in real time, and integrate those insights with other transactional and historical data. The system can then proactively guide agents and sales reps with the most accurate and relevant information for a given customer need or situation, ensuring the best answer and outcome every time.
Those ‘conversational analytics’ translate into predictive insights that deliver powerful benefits:
- Dramatically improves productivity when your workforce knows exactly what to say and do through every interaction, without having to dig for details.
- Captures the knowledge of your best agents and sales reps. Sometimes that’s the best ‘data’ you’ve got, and it’s a lot more valuable when everyone can access it.
- Delivers data-driven intelligence to sales and marketing teams, making it easier to improve your operation and tailor customer experiences based on predictive insights instead of best guesses.
Ultimately, I believe AI and predictive technologies like ASAPP are defining the future of digital-first business. This is the first time in the history of the customer service industry that we can simultaneously meet customers where they are, drive revenue growth, and deliver a better customer and employee experience, all at lower cost. And it’s about time.
Modern CX Teams: What do they have that you don’t?
Long before the pandemic, consumers made the digital shift—but business has been slow to catch up. For over a decade, texting and digital messaging have been the most popular ways to communicate. Yet, the average Fortune 500 company still spends 80% or more of their contact center budget on voice calls—where the experience is worse and the cost is higher. Customers are handled more like anonymous callers, and the levers to drive efficiency and quality are limited.
For brands to stay competitive, customer service needs to modernize, using new technologies to really know customers and be there for them where and how they want to be supported. Call centers will still be part of the mix, but they need to deliver more value, and be integrated into a larger, holistic omni-channel support strategy. By embracing innovative capabilities, companies can deliver cost-efficient human-centric digital experiences that help win and keep customers.
Rethinking digital communications
When customers have a question or a problem, they want quick, convenient resolution on their preferred communication channel. Today that most often means digital. Customers may want to use self-service tools, reach out on social media, and chat, text, or message in a variety of ways. The best brands will make it an easy, personalized experience that nurtures loyalty and increases lifetime value, regardless of channel. There is real power in engaging with your customers in the same place they talk to their friends and family.
Many budget-minded executives see digital communications solely as a way to minimize costly engagement with customers. They measure success in deflection and containment. But that’s not a sustainable (or logical) approach. What if, instead, your digital capabilities made every interaction substantially better?

Reimagining CX with AI at the core dramatically changes the paradigm. Companies can deliver great experiences AND drive new levels of efficiency.
Michael Lawder
AI innovations are making that possible. Today, businesses can not only deliver smarter digital channels, but empower better customer service communications all around.
One of the biggest wins is meeting customer needs on the first contact. When an AI-powered solution integrates all your digital communication channels, both agents and customers have a greater opportunity to get it right the first time. If a customer does need to follow up on something, a smart system provides all the relevant details so any agent can quickly move things forward instead of starting all over.
That’s one part of a better, modernized customer experience. Taking it further, AI-driven solutions also make agents significantly more productive through every interaction. Machine learning augments CX teams with the right knowledge at the right time for faster, easier resolution. I love that some of the most advanced machine learning technologies are being used to help make people (in this case, agents) better. With dramatically greater efficiency, customer care agents can deliver a more personalized and contextual interaction, turning potentially negative experiences into real loyalty moments.
Fueling conversation-powered operations
Conversations are the heart of customer service, and digital-first technologies can make those communications operationally more efficient. In real time, it means faster resolution for shorter and more concurrent interactions, at lower costs. In the long view, it provides a wealth of data for machine learning insights, providing ‘conversational analytics’ to fuel strategies and actions that help the business.
Think about ‘voice of the customer’ programs, where feedback mostly comes through surveys. What if you could easily uncover insights from conversations across many touch points?
After decades leading contact center organizations, I think the future of workforce management is in conversation-powered operations. Companies have a goldmine of data that isn’t being tapped. With the best of modern technology, machine learning can extract valuable customer sentiment from 100% of contacts with zero manual work instead of the old way of listening to calls and reading transcripts.
It can rapidly analyze thousands of agent conversations for internal quality assurance to improve compliance, soft skills, and process optimization. This modernization provides a significantly greater ability to understand the quality of service agents are delivering, and opens a window of insights into the enterprise. It’s a faster, smarter way to identify how and where to make both transactional and strategic improvements that are better for the business and the customer experience.
Humanizing your agents
Circling back to empowering agents, modern CX teams recognize customer service agents are the voice of their brand and should be brand ambassadors. We all know the adage about ‘making every agent as good as your best agents’—but you can only get there if you support them well. Instead of juggling multiple tools and inefficient processes, agents need to be able to focus on ensuring each customer feels known, valued, and supported.
That’s the kind of experience that powers a brand. And that’s where ASAPP really shines.
ASAPP modernizes CX with a streamlined, unified platform to easily support customers across channels. With AI-driven predictive knowledge, agents know what to say and do to serve each customer better and faster. It uses the power of technology not to replace the human touch or hide from customers, but to make contact more personalized. Ultimately, that’s the path to greatness for a brand…empowering both sides of the conversation with the right balance of digital efficiency and an emotional connection. That builds relationships with your customers and makes great customer experience a core part of the value proposition of your brand.
ASAPP tops ASR leaderboard with E‑Branchformer
ASR technology has been beneficial for businesses and their customers for many years. ASR, or Automatic Speech Recognition, is the software that translates human speech into text. With continual advancements in research and AI modeling, accuracy has improved immensely over time. Developing the most accurate ASR possible has become a high priority for many top tech companies because of how much it benefits businesses when it’s done correctly.
ASR’s primary goal is to maintain high recognition accuracy. Recognition or error rates can be evaluated over various units, such as phonemes, characters, words, or sentences. The most commonly used measure of ASR accuracy is Word Error Rate (WER).
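For readers who haven't worked with the metric, here is a minimal sketch of how WER is usually computed: the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration, and no text normalization is applied. This is not the scoring tool used for the results in this post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.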
Healthy competition
To fairly compare AI speech recognition research studies across the industry, we evaluate WER on public datasets. Librispeech, one of the most widely used datasets, consists of about 1000 hours of read English speech with transcriptions and an extra text corpus. Researchers worldwide have been competing for years to substantiate their methods’ superiority using the Librispeech dataset and WER.
Recently, the speech community has been trending towards end-to-end (E2E) modeling for ASR. Instead of having separate acoustic and language models, as in conventional ASR methods, E2E modeling has achieved great success in both efficiency and accuracy by simultaneously training a single integrated model.
Although several E2E model structures, such as Transducer and Attention-based Encoder-Decoder (AED) have been explored, most of them share a common encoder, the module that extracts meaningful representative information from the input speech.
Speech scientists, looking to create a more powerful encoder, are actively studying novel training objectives, acoustic feature engineering, data augmentation methods, and self-supervised learning using untranscribed speech.
But these research areas don’t address a fundamental question, “What is the optimal neural network architecture for constructing the encoder?”
To address this question, ASAPP researchers recently developed the E-Branchformer model, a highly accurate neural network. Other similar models include Transformer, Conformer, and Branchformer; however, the E-Branchformer surpasses these models in accuracy. Here’s a quick overview of the different models ASAPP used to develop E-Branchformer.
Transformer
The Transformer has shown promising performance in several sequence modeling tasks for ASR and NLU (natural language understanding). This potential is due to the strength of multi-headed self-attention, which can effectively extract meaningful information from the input sequence, while considering the global context.
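To make "global context" concrete, here is a simplified sketch of scaled dot-product self-attention. It assumes a single head and uses the input frames directly as queries, keys, and values. A real Transformer layer adds learned projections and multiple heads, so treat this as an illustration of the mechanism rather than the architecture itself. The function name is hypothetical.

```python
import math


def self_attention(frames):
    """Single-head scaled dot-product self-attention over a list of feature vectors.

    Queries, keys, and values are the frames themselves (no learned projections).
    """
    dim = len(frames[0])
    outputs = []
    for query in frames:
        # Score the query against every frame in the sequence (global context).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        # Softmax, subtracting the max score for numerical stability.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output frame is a weighted average of all input frames.
        outputs.append(
            [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
        )
    return outputs
```

Every output position draws on the whole sequence, which is the strength described above. Convolution, by contrast, only looks at a fixed window of neighbouring frames.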
Conformer
To improve the Transformer, many methods have been introduced and utilized to create synergy by applying convolution, which has advantages in modeling the local context.
In particular, Conformer was introduced and is widely considered the state of the art for accuracy on Librispeech ASR tasks.
Evaluated with an external Language Model (LM) trained on the Librispeech text corpus, Conformer achieves 1.9% and 3.9% WER on test-clean and test-other, respectively. Although Conformer demonstrates that stacking convolution sequentially after self-attention is a better method than using them in parallel, other research studies, like Branchformer, have applied these two neural networks in parallel and found the performance to be notable.
Branchformer
Branchformer was introduced with three main components:
- Local-context branch using MLP with convolutional gating (cgMLP)
- Global-context branch using multi-headed self-attention
- Merging module using a linear projection layer
Each branch is computed in parallel before results are merged. Through intensive experiments, Branchformer showed comparable performance with Conformer. Other experiments stacked different combinations by mixing Branchformer and Conformer, but didn’t achieve better results.
E-Branchformer
Inspired by the Branchformer studies, ASAPP researched how convolution and self-attention can be combined more effectively and efficiently.

This resulted in the highest performing model, E-Branchformer, setting the new state-of-the-art WER at 1.81% and 3.65% on Librispeech test-clean and test-other with an external LM.
Kwangyoun Kim
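To put these figures next to the Conformer numbers quoted earlier (1.9% and 3.9%), the relative WER reduction can be worked out directly. The snippet below is plain arithmetic on the numbers given in this post, and the helper name is hypothetical.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Figures quoted in this post (WER in percent, with an external LM).
conformer = {"test-clean": 1.9, "test-other": 3.9}
e_branchformer = {"test-clean": 1.81, "test-other": 3.65}

for split in conformer:
    gain = relative_reduction(conformer[split], e_branchformer[split])
    print(f"{split}: {gain:.1f}% relative WER reduction")
```

This works out to roughly 4.7% on test-clean and 6.4% on test-other.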
To develop E-Branchformer, we made two primary contributions to Branchformer that significantly improved performance.
- We enhanced the merging module, which combines the output of the global and local branches, by introducing additional convolutions. This change has the effect of combining self-attention with convolution sequentially and in parallel. Through extensive experiments on several types of merge modules, we proved that adding a single depth-wise convolution can significantly improve accuracy with negligible computational increase.
- We revisited the point-wise Feed-Forward Network (FFN). Transformer and its variants commonly stack FFN with self-attention in an interleaving pattern. We experimentally demonstrated that even in Branchformer, stacking FFNs together with the Branchformer blocks is more effective in improving the model’s capacity. For example, we found that a stack of 17 Branchformers and 17 FFNs in an interleaving pattern has a similar model size to a stack of 25 Branchformers, but is much more accurate.
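The first contribution can be illustrated with a rough sketch of a merge step that adds a depth-wise convolution before the linear projection. This is a toy version in plain Python, not our implementation. The function names, the residual connection around the convolution, the zero padding, and the odd kernel size are all assumptions made for illustration, and the weights are passed in by hand rather than learned.

```python
def depthwise_conv1d(x, kernels):
    """Apply a separate 1-D kernel to each channel of x (shape [time][channels]).

    Zero padding keeps the output the same length as the input.
    Assumes every kernel has the same odd length.
    """
    steps, channels = len(x), len(x[0])
    half = len(kernels[0]) // 2
    out = [[0.0] * channels for _ in range(steps)]
    for t in range(steps):
        for c in range(channels):
            for k, weight in enumerate(kernels[c]):
                src = t + k - half
                if 0 <= src < steps:
                    out[t][c] += weight * x[src][c]
    return out


def merge_branches(global_out, local_out, kernels, projection):
    """Concatenate two branch outputs, mix neighbours in time, then project."""
    # Concatenate along the feature axis: [time][2 * dim].
    concat = [g + l for g, l in zip(global_out, local_out)]
    # Depth-wise convolution adds local, per-channel mixing across time.
    conv = depthwise_conv1d(concat, kernels)
    # Residual connection around the convolution (an assumption in this sketch).
    mixed = [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(concat, conv)
    ]
    # Linear projection back to the model dimension: projection is [2 * dim][dim].
    dim = len(projection[0])
    return [
        [sum(row[i] * projection[i][j] for i in range(len(row))) for j in range(dim)]
        for row in mixed
    ]
```

With all-zero kernels the step reduces to concatenation followed by a linear projection, which corresponds to the simpler Branchformer-style merge described earlier.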
ASAPP has topped the leaderboard of WER in Librispeech ASR tasks by using the newly proposed E-Branchformer. We are confident that this new model structure can be applied to other tasks and achieve impressive results.
We’re sharing our findings with the community so that everyone can benefit from them. You’ll be able to find all of the detailed methods and experimental results in our upcoming paper, which has been accepted and will be presented at SLT 2022. We’ll also release more information about how we implemented E-Branchformer. Our models’ recipes will be available through ESPnet, so anyone who wants to can reproduce our results. If you’d like to talk about E-Branchformer in person, please reach out to us during the SLT 2022 conference.