Imagine you’re booking airline tickets through a conversational AI assistant, and after purchasing them, you ask for help finding an in-home pet sitter for the duration of your trip. The assistant misinterprets your request and instead shares details on how to board your flight with pets. The reason is straightforward: the AI has never encountered this particular task and could not map it to a procedure. In other words, your request to find an in-home pet sitter was outside the distribution of what the assistant was trained to handle. Alternatively, suppose you had asked about upgrading your flight, but the system mistakes your request for one to move your flight to a different date. In this case, the AI assistant is capable of managing flights but could not complete the request because of a dialogue breakdown. In both cases, we arrive at the same result: a failed conversation.
Both the out-of-distribution requests and the dialogue breakdowns described above are considered out-of-scope (OOS) situations, since they represent cases that your assistant is unable to handle. To avoid customer frustration, detecting OOS scenarios has become an essential skill for today’s conversational AI and dialogue systems. The ideal conversational AI agent would be able to find an in-home pet sitter as requested and manage all the complex nuances of natural language, but this is simply not possible given that training data is finite and consumer queries are not. Knowing when a user is asking something in-scope versus out-of-scope therefore lets a conversational AI system recognize what it cannot handle and perform better on its core tasks.
It can be hard to provide training data for, or even enumerate, the potentially limitless number of out-of-scope queries a dialogue system may face. However, new ASAPP research presented at the conference on Empirical Methods in Natural Language Processing (EMNLP) offers a novel way to address this limited-data problem.
Out-of-Scope Detection with Data Augmentation
We introduce GOLD (Generating Out-of-scope Labels with Data augmentation), a new technique that augments existing data to train better out-of-scope detectors in low-data regimes. The key insight is that rather than training on in-scope data alone, our proposed method operates on out-of-scope data as well. Furthermore, we find that common NLP techniques for augmenting in-scope data, such as paraphrasing, do not provide the same benefit when applied to out-of-scope data.
GOLD works by starting with a small seed set of known out-of-scope examples. This small amount (only 1% of the training data) is typically used by prior methods for tuning thresholds and other hyperparameters. Instead, GOLD uses this seed set of OOS examples to find semantically similar utterances from an auxiliary dataset, which yields a large set of matches. Next, we create candidate examples by replacing utterances in the known out-of-scope dialogues with the sentences found in extracted matches. Lastly, we filter down candidates to only those which are most likely to be out-of-scope. These pseudo-labeled examples created through data augmentation are then used to train the OOS detector.
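The steps above (match, replace, filter) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the GOLD implementation: the bag-of-words `embed` function stands in for whatever semantic encoder a real system would use, the keyword-based `keyword_oos_score` stands in for a real out-of-scope scorer, and all function names, example utterances, and the `top_k` and `threshold` values are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a sentence encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_matches(seed_utterance, auxiliary, top_k=3):
    """Step 1: retrieve the auxiliary utterances most similar to a seed OOS utterance."""
    query = embed(seed_utterance)
    ranked = sorted(auxiliary, key=lambda u: cosine(query, embed(u)), reverse=True)
    return ranked[:top_k]

def make_candidates(seed_dialogue, oos_index, matches):
    """Step 2: swap each match into the seed dialogue in place of the OOS utterance."""
    candidates = []
    for match in matches:
        turns = list(seed_dialogue)
        turns[oos_index] = match
        candidates.append(turns)
    return candidates

def filter_candidates(candidates, oos_score, threshold=0.5):
    """Step 3: keep only candidates the scorer considers likely to be out-of-scope."""
    return [c for c in candidates if oos_score(c) >= threshold]

# Hypothetical stand-in scorer: a final turn that mentions in-scope vocabulary is not OOS.
IN_SCOPE_WORDS = {"flight", "baggage", "ticket", "seat"}

def keyword_oos_score(dialogue):
    tokens = set(embed(dialogue[-1]))
    return 0.0 if tokens & IN_SCOPE_WORDS else 1.0

seed_dialogue = [
    "I just booked my flight to Denver.",
    "Can you help me find a pet sitter for my dog?",  # known OOS utterance
]
auxiliary = [
    "Where can I find a sitter for my cat?",
    "I want to change my flight date.",
    "Can you recommend a dog walker near me?",
    "What is my baggage allowance?",
]

matches = find_matches(seed_dialogue[1], auxiliary)
candidates = make_candidates(seed_dialogue, oos_index=1, matches=matches)
kept = filter_candidates(candidates, keyword_oos_score)
for dialogue in kept:
    print(dialogue)
```

In this toy run, the retrieved baggage question is discarded by the filter because it looks in-scope, while the two pet-related utterances survive as pseudo-labeled out-of-scope examples.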
The results? State-of-the-art performance across three task-oriented dialogue datasets on multiple metrics. These datasets were created by post-processing existing dialogue corpora spanning multiple domains with multi-turn interactions. Notably, the out-of-scope instances were designed as a natural progression of the conversation, rather than generated through synthetic noise or negative sampling.
Why this matters
Data augmentation is a popular way to improve model performance in low-resource settings, especially in real-world settings where annotating more examples can quickly become cost-prohibitive. With just a small seed of out-of-scope examples, GOLD achieved a 10X improvement in training out-of-scope detectors compared to using the seed data alone. Previous methods either relied on large amounts of labeled out-of-scope data, which is unrealistic to obtain in real-world settings, or relied on in-scope data alone, which doesn’t provide sufficient signal for detecting OOS items.
GOLD supports robustness and prevents overfitting by relying on other out-of-scope detection methods during the filtering process. As those methods improve over time, GOLD can take advantage of the gains and improve as well.
At ASAPP, we are exploring similar methods in our products, both to reduce out-of-scope issues in our conversational systems and to improve overall system performance when operating in limited-data regimes. If you’re a researcher working on detecting more granular levels of errors, or on more sophisticated methods of data efficiency, we’d love to chat! Give us a tweet at @ASAPP.
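One way to picture a filter that borrows from other detection methods is as a set of pluggable detectors whose votes are combined. The sketch below is a guess at the general shape only: the text above does not say how the methods are combined, so the majority-vote rule, the function name `majority_vote_filter`, and the three toy detectors are all assumptions made for illustration.

```python
def majority_vote_filter(candidates, detectors):
    """Keep candidates that more than half of the detectors flag as out-of-scope.

    Assumption: each detector is a callable returning True when it considers
    the utterance out-of-scope. Swapping in a stronger detector needs no other change.
    """
    kept = []
    for candidate in candidates:
        votes = sum(1 for detect in detectors if detect(candidate))
        if votes * 2 > len(detectors):
            kept.append(candidate)
    return kept

# Three hypothetical toy detectors standing in for real OOS detection methods.
def no_travel_keyword(utterance):
    return not ({"flight", "ticket", "baggage"} & set(utterance.split()))

def mentions_other_topic(utterance):
    return bool({"dog", "cat", "weather"} & set(utterance.split()))

def starts_with_question_word(utterance):
    return utterance.split()[0] in {"what", "where", "who"}

candidates = [
    "can you walk my dog",
    "change my flight please",
    "what is the weather like",
]
detectors = [no_travel_keyword, mentions_other_topic, starts_with_question_word]
voted = majority_vote_filter(candidates, detectors)
print(voted)
```

Because the filter only sees a list of callables, replacing any toy detector with a better one changes the pseudo-labels without touching the rest of the pipeline.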
Derek Chen is a Research Scientist at ASAPP designing intelligent dialogue systems with stronger natural language understanding capabilities. He received his Master’s in Computer Science from the University of Washington and his undergraduate degree from UC Berkeley. His research focuses on data efficiency methods, including active learning, data augmentation, and meta-learning. He is also interested in techniques for measuring uncertainty so that a dialogue agent can better manage ambiguity and out-of-scope situations.