Data Collection and End-to-End Learning for Conversational AI
EMNLP 2019 Tutorial
3rd November, 2019.
Hong Kong, China.
A fundamental long-term goal of conversational AI is to merge two main dialogue system paradigms into a standalone multi-purpose system. Such a system should be capable of conversing about arbitrary topics (Paradigm 1: open-domain dialogue systems), and simultaneously assist humans with completing a wide range of tasks with well-defined semantics such as restaurant search and booking, customer service applications, or ticket bookings (Paradigm 2: task-based dialogue systems).
The recent developmental leaps in conversational AI technology are undoubtedly linked to more and more sophisticated deep learning algorithms that capture patterns in increasing amounts of data generated by various data collection mechanisms. The goal of this tutorial is therefore twofold. First, it aims at familiarising the research community with the recent advances in algorithmic design of statistical dialogue systems for both open-domain and task-based dialogue paradigms. The focus of the tutorial is on recently introduced end-to-end learning for dialogue systems and their relation to more common modular systems. In theory, learning end-to-end from data offers seamless and unprecedented portability of dialogue systems to a wide spectrum of tasks and languages. From a practical point of view, there are still plenty of research challenges and opportunities remaining: in this tutorial we analyse this gap between theory and practice, and introduce the research community with the main advantages as well as with key practical limitations of current end-to-end dialogue learning.
The critical requirement of each statistical dialogue system is the data at hand. The system cannot provide assistance for the task without having appropriate task-related data to learn from. Therefore, the second major goal of this tutorial is to provide a comprehensive overview of the current approaches to data collection for dialogue, and analyse the current gaps and challenges with diverse data collection protocols, as well as their relation to and current limitations of data-driven end-to-end dialogue modeling. We will again analyse this relation and limitations both from research and industry perspective, and provide key insights on the application of state-of-the-art methodology into industry-scale conversational AI systems.
Paweł Budzianowski . Machine Learning Engineer . PhD candidate @ Cambridge . EMNLP’18 Best Paper . ICASSP’18 Best Paper
Iñigo Casanueva . Machine Learning Scientist at Poly AI . PostDoc @ Cambridge . 10+ dialogue publications . PhD from Sheffield
Ivan Vulic . Senior Scientist at PolyAI . Senior PostDoc @ Cambridge . 50+ top NLP publications . PhD from KU Leuven
EMNLP Tutorial 2019 slides