A Repository of Conversational Datasets

Matthew Henderson
15 Apr 2019

Progress in machine learning is often driven by large datasets and consistent evaluation metrics. To this end, PolyAI is releasing a collection of conversational datasets consisting of hundreds of millions of examples, along with a standardised evaluation framework for models of conversational response selection.

We are initially releasing three large conversational datasets: Reddit, OpenSubtitles, and AmazonQA.

Conversational response selection, the task of identifying a correct response to a given conversational context, provides a powerful signal for learning implicit semantic representations that are useful for many downstream tasks in natural language understanding. Models of conversational response selection can also be used directly to power dialogue systems, question-answering features, and response suggestion systems.

We hope that these datasets can provide a common testbed for work on conversational response selection. The 1-of-100 accuracy metric, which measures how often the correct response is selected over 99 random responses, allows for direct comparison of models.
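To make the metric concrete, here is a minimal sketch of how 1-of-100 accuracy could be computed. It assumes examples are scored in batches of 100, with the other 99 responses in the batch serving as the random negatives, and that a model scores a context and a response by the dot product of their vectors. The function name `one_of_100_accuracy` and the toy data are illustrative only and are not taken from the repository.

```python
import numpy as np


def one_of_100_accuracy(scores: np.ndarray) -> float:
    """Fraction of contexts whose own response gets the highest score.

    scores[i, j] is the model's score for pairing context i with response j.
    Response i is assumed to be the correct one for context i, so the other
    columns in row i act as the random negatives.
    """
    if scores.ndim != 2 or scores.shape[0] != scores.shape[1]:
        raise ValueError("scores must be a square (batch x batch) matrix")
    predicted = scores.argmax(axis=1)  # index of the top-ranked response per context
    correct = np.arange(scores.shape[0])  # the matching response sits on the diagonal
    return float(np.mean(predicted == correct))


# Toy stand-in for encoder outputs (hypothetical data, not a real model):
# each response vector is a noisy copy of its context vector.
rng = np.random.default_rng(0)
context_vectors = rng.normal(size=(100, 64))
response_vectors = context_vectors + 0.1 * rng.normal(size=(100, 64))

# Dot-product scores between every context and every response in the batch.
scores = context_vectors @ response_vectors.T
print(f"1-of-100 accuracy: {100 * one_of_100_accuracy(scores):.1f}%")
```

In practice the figure would be averaged over many such batches drawn from the test set. For the exact evaluation procedure, refer to the GitHub repository.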

1-of-100 accuracy (%) results for various baselines on the three datasets. The PolyAI encoder model is a deep neural network trained to project contexts and responses into a shared high-dimensional vector space. For full details, see the paper on arXiv.

For full details, see the Conversational Datasets GitHub repository, and our paper on arXiv. The GitHub repository contains scripts to generate these datasets, implementations of various conversational response selection baselines, and tables of benchmark evaluation results.

We welcome contributions to the GitHub repository, including new datasets, new evaluation results, and new baselines.

Thanks to my colleagues at PolyAI.
