Training Chatbots and Conversational Artificial Intelligence Agents with Amazon Mechanical Turk and Facebook’s ParlAI - MCL349 - re:Invent 2017

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Training Chatbots and Conversational Intelligence Agents with Amazon Mechanical Turk and Facebook’s ParlAI
Jack Urbanek – Facebook
November 2017
MCL349

Session preview
• What is ParlAI and what is it trying to solve?
• Brief intro to Amazon Mechanical Turk (MTurk)
• How we collect conversational data with MTurk
• Optimizing for the human element
• How to leverage ParlAI for your problem

Who am I?
• Research Engineer at Facebook AI Research (FAIR)
• Engineer on the ParlAI team
• Primary contributor to ParlAI’s MTurk implementation
• User of ParlAI-MTurk for data collection

Why ParlAI?
Quick NLP primer
Issues in current dialogue agent creation efforts and tasks
Motives for a dialogue research platform
ParlAI and its features
Introductory materials

NLP primer
NLP is difficult because language is imprecise.
One fundamental goal:
• Enable human ⟷ computer dialogue
Dialogue is broken into thousands of tasks with:
• Different skill requirements
• A shared input/output format
Most NLP research efforts are siloed:
• They focus on only a subset of tasks

The issue of siloed research
Take two dialogue tasks:
• Question Answering (QA) and chit-chat
One popular QA dataset is the Stanford Question Answering Dataset (SQuAD)
• It maps a (question, Wikipedia paragraph) pair to the start/end indices of the answer within that paragraph
A model trained to perform very well on SQuAD will not generalize to chit-chat, even though both tasks share the same core requirement: contextual language understanding.
A mock SQuAD-like interaction: the question “Who won the 2017 Super Bowl?” is answered with the span (94, 114) of the accompanying paragraph.
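To make the span format concrete, here is a minimal Python sketch of how an answer can be located in, and recovered from, a paragraph. It assumes character offsets with an exclusive end index (Python slice convention); SQuAD’s own JSON stores a character `answer_start` plus the answer text, so the (94, 114) pair in the mock is only illustrative. The helper names and the toy paragraph are hypothetical and not taken from SQuAD.

```python
from typing import Tuple


def find_answer_span(paragraph: str, answer: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of the first occurrence of answer.

    end is exclusive, so paragraph[start:end] == answer.
    """
    start = paragraph.find(answer)
    if start < 0:
        raise ValueError("answer does not occur in paragraph")
    return start, start + len(answer)


def extract_answer(paragraph: str, start: int, end: int) -> str:
    """Recover the answer text that a (start, end) span points to."""
    if not 0 <= start <= end <= len(paragraph):
        raise ValueError("span is outside the paragraph")
    return paragraph[start:end]


# Toy, made-up paragraph (not from SQuAD).
paragraph = "The final was won by the home team after a late goal."
span = find_answer_span(paragraph, "the home team")
print(span)                              # (21, 34)
print(extract_answer(paragraph, *span))  # the home team
```

A model trained on this format only ever learns to point into a given paragraph, which is one concrete reason it has nothing to say in open-ended chit-chat.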

Why a dialogue research platform?
• Testing on multiple tasks can expose model weaknesses
• Multi-task training may enable a broader sense of learning
• Standardized method for training and data collection encourages
sharing of compatible datasets
• Allow the NLP community to better share, test, and iterate on models

ParlAI features
• Unified framework/API for training and evaluation of dialogue models
• Many easy-to-access tasks to train and evaluate on
• Multi-task training over any tasks
• Supports both supervised and interactive (online and reinforcement learning) tasks
• Supports other media, including images
• Existing models to work from
• Data collection and model evaluation through Mechanical Turk
• Open source

ParlAI features: Tasks
• QA datasets: SQuAD, bAbI tasks, MCTest, SimpleQuestions, WikiQA, WebQuestions, WikiMovies, MTurkWikiMovies, MovieDD (Movie-Recommendations), MS MARCO, TriviaQA, InsuranceQA
• Dialogue Goal-Oriented: bAbI Dialog tasks, Dialog-based Language Learning bAbI, Dialog-based Language Learning Movie, MovieDD-QARecs dialogue, personalized dialog, bAbI+
• Visual QA / Visual Dialogue: VQAv1, VQAv2, VisDial, FVQA, CLEVR
• Sentence Completion: QACNN, QADailyMail, CBT, BookTest
• Dialogue Chit-Chat: Ubuntu, Movies SubReddit, Cornell Movie, OpenSubtitles
• Negotiation: Deal or No Deal?
• Machine Translation: WMT EnDe (in progress)

ParlAI features: Basic implementation
Main classes:
• world – Defines the environment and drives interaction between agents
• agent – A communicator in the world
• teacher – An agent that talks to learning agents, implementing a task
• action – A Python dict that passes text, labels, and rewards between agents
Main code to train an agent and print the results of each example:
teacher = SquadTeacher(opt)
agent = MyAgent(opt)
world = World(opt, [teacher, agent])
for _ in range(num_exs):
    world.parley()
    print(world.display())
Implementation of world.parley, in which each agent acts in turn while the others observe:
def parley(self):
    for agent in self.agents:
        act = agent.act()
        for other_agent in self.agents:
            if other_agent != agent:
                other_agent.observe(act)
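The world/agent/teacher split can be tried out without ParlAI installed. The following is a simplified, self-contained imitation of the pattern above, not ParlAI’s actual classes: `ToyAgent`, `ToyTeacher`, `EchoLabelAgent`, and `ToyWorld` are hypothetical names, and actions are plain dicts whose `text`, `labels`, and `episode_done` fields are modeled on ParlAI messages.

```python
from typing import Dict, List, Optional, Tuple

Action = Dict[str, object]  # stand-in for a ParlAI action/message dict


class ToyAgent:
    """Minimal agent interface: observe() receives acts, act() produces one."""

    def __init__(self, agent_id: str) -> None:
        self.id = agent_id
        self.observation: Optional[Action] = None

    def observe(self, act: Action) -> None:
        self.observation = act

    def act(self) -> Action:
        raise NotImplementedError


class ToyTeacher(ToyAgent):
    """Serves (question, answer) pairs one at a time, like a tiny task."""

    def __init__(self, examples: List[Tuple[str, str]]) -> None:
        super().__init__("teacher")
        self.examples = examples
        self.index = 0

    def act(self) -> Action:
        text, label = self.examples[self.index % len(self.examples)]
        self.index += 1
        return {"id": self.id, "text": text, "labels": [label], "episode_done": True}


class EchoLabelAgent(ToyAgent):
    """Replies with the first label it observed (similar in spirit to repeat_label)."""

    def __init__(self) -> None:
        super().__init__("echo")

    def act(self) -> Action:
        labels = (self.observation or {}).get("labels") or ["I don't know"]
        return {"id": self.id, "text": labels[0]}


class ToyWorld:
    def __init__(self, agents: List[ToyAgent]) -> None:
        self.agents = agents
        self.acts: List[Action] = []

    def parley(self) -> None:
        # Each agent acts in turn while every other agent observes the act.
        self.acts = []
        for agent in self.agents:
            act = agent.act()
            self.acts.append(act)
            for other in self.agents:
                if other is not agent:
                    other.observe(act)

    def display(self) -> str:
        return "\n".join(f"[{a['id']}]: {a['text']}" for a in self.acts)


teacher = ToyTeacher([("What is 2+2?", "4"), ("Capital of France?", "Paris")])
world = ToyWorld([teacher, EchoLabelAgent()])
for _ in range(2):
    world.parley()
    print(world.display())
```

Swapping `EchoLabelAgent` for a learning model, or for a human relay, changes nothing in `ToyWorld`, which is the property the ParlAI design relies on.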

ParlAI features: Agents
drqa: an attentive LSTM model, DrQA (Chen et al., 2017), implemented in PyTorch, with competitive results on SQuAD among other datasets.
memnn: code for an end-to-end memory network (Sukhbaatar et al., 2015) in Lua Torch.
seq2seq: basic sequence-to-sequence model (Sutskever et al., 2014).
ir_baseline: information retrieval baseline that scores responses with TF-IDF matching.
remote_agent: basic class for any agent connecting over ZeroMQ.
local_human: keyboard input replaces an ML agent.
repeat_label: basic class that merely repeats all data sent to it.
mturk_agent: lets a human worker on MTurk act in a ParlAI world.
More details and overall usage instructions at parl.ai.
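The ir_baseline idea can be illustrated with a small, dependency-free Python sketch of TF-IDF response ranking. This is not ParlAI’s ir_baseline code: the whitespace tokenizer, the smoothed IDF formula, and cosine similarity are assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def rank_by_tfidf(query: str, candidates: List[str]) -> List[str]:
    """Return candidates sorted from most to least similar to query."""
    docs = [tokenize(c) for c in candidates]
    n = len(docs)
    # Document frequency: how many candidates contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))

    def idf(tok: str) -> float:
        # Smoothed IDF, so tokens unseen in the candidates do not divide by zero.
        return math.log((1 + n) / (1 + df[tok])) + 1.0

    def vectorize(tokens: List[str]) -> Dict[str, float]:
        return {tok: count * idf(tok) for tok, count in Counter(tokens).items()}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(tokenize(query))
    scores = [cosine(q, vectorize(doc)) for doc in docs]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


candidates = [
    "I watched a movie yesterday",
    "The weather is nice",
    "what movie is playing tonight",
]
ranking = rank_by_tfidf("what movie did you watch", candidates)
print(ranking[0])  # what movie is playing tonight
```

Rare shared words such as "what" weigh more than common ones, which is why the third candidate outranks the first even though both mention "movie".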

Mechanical Turk and How We Use It
Intro to Mechanical Turk
Summary of our MTurk use
ParlAI’s MTurk operational flow
Introductory materials

Simple intro to Mechanical Turk
A crowdsourcing internet marketplace for tasks that computers currently can’t do.
Requesters pay people to handle bulk work.
Workers complete this work in the form of Human Intelligence Tasks (HITs), and the requester gets the results.

Intro to Mechanical Turk
• HITs are created through a simple templated workflow.
• When workers complete a HIT, you review their work to accept/reject it.
• If you reject the work, you are refusing to pay for it. Keep in mind that these are people and this is their work.
(Screenshots: creating an MTurk project; reviewing work for an image tagging task)

How ParlAI uses MTurk
MTurk workers act remotely within a ParlAI world from which we can collect data.
We can have workers interact with models and then rate those models.
We support automated review where appropriate.
Interactions with MTurk are almost entirely programmatic.

ParlAI MTurk functionality

Engineering Goals and Challenges
Completely programmatic interactions with external services
Ability to enable easy creation of arbitrary conversational tasks
Support for multiple actors or trained models
Method for preparing workers for a task
Options for automated work approval
Building ParlAI MTurk

Arbitrary conversational tasks
Problem: We need complete control over what we show workers in order to support arbitrary chats.

Supporting arbitrary content
Solution: Use MTurk’s programmatic interface and its support for external endpoints to connect to its workers while retaining control of our content.
1. Set up an external server
2. Host the HIT content on that server, outside of MTurk
3. Create an “ExternalQuestion” HIT pointing to the server
4. Collect data from the server
This is all done programmatically whenever a ParlAI user wants to collect data.
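Steps 3 and 4 can be sketched with the MTurk client from the AWS SDK for Python (boto3), whose `create_hit` call accepts an ExternalQuestion XML document. This is an illustration under assumptions, not ParlAI’s code: the server URL, reward, and timing values are placeholders, and the client is passed in so the function can be exercised offline with a recording stand-in. Against the real service you would create it with `boto3.client("mturk", ...)`, pointing `endpoint_url` at the sandbox while testing.

```python
from xml.sax.saxutils import escape

EXTERNAL_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"


def external_question_xml(url: str, frame_height: int = 650) -> str:
    """Build the ExternalQuestion document that points a HIT at our own server."""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_NS}">'
        f"<ExternalURL>{escape(url)}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )


def create_external_hit(client, task_config: dict, url: str, reward: float) -> str:
    """Create one HIT whose content is served from url; returns the HIT id."""
    response = client.create_hit(
        Title=task_config["hit_title"],
        Description=task_config["hit_description"],
        Keywords=task_config["hit_keywords"],
        Reward=f"{reward:.2f}",               # MTurk takes the amount as a string
        MaxAssignments=1,
        LifetimeInSeconds=60 * 60,            # placeholder: how long the HIT stays listed
        AssignmentDurationInSeconds=30 * 60,  # placeholder: time a worker has once accepted
        Question=external_question_xml(url),
    )
    return response["HIT"]["HITId"]


class RecordingClient:
    """Offline stand-in for the boto3 MTurk client; records create_hit calls."""

    def __init__(self) -> None:
        self.calls = []

    def create_hit(self, **kwargs):
        self.calls.append(kwargs)
        return {"HIT": {"HITId": "HIT-EXAMPLE"}}


# Real usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT)
client = RecordingClient()
config = {
    "hit_title": "Simulating a Customer Service Interaction",
    "hit_description": "Play a customer or a service rep in a chat",
    "hit_keywords": "chat,dialog,customer service",
}
hit_id = create_external_hit(client, config, "https://example.com/chat?task=cs&id=1", 0.10)
```

Escaping the URL matters because query strings contain `&`, which is not valid bare inside XML.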

Supporting arbitrary content
Implementation:
• HIT details – a simple Python dictionary
• Frontend – templated HTML and JavaScript
• Server – initialized on a per-task basis
Users can set up a task without any additional MTurk or server knowledge.
Creating complex tasks requires writing only additional task-related code, using templating.
HIT content as delivered by the external server
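As a rough illustration of templated HTML, the sketch below fills a page template from the task config with Python’s standard `string.Template`, escaping the values first. ParlAI’s real frontend templates and placeholder names differ; `PAGE_TEMPLATE` and `render_task_page` are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page template; $title and $description are placeholder names.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    '<body><div id="task-description">$description</div>'
    '<div id="chat"></div></body></html>'
)


def render_task_page(task_config: dict) -> str:
    """Fill the template from the task config, escaping HTML special characters."""
    return PAGE_TEMPLATE.substitute(
        title=escape(task_config["hit_title"]),
        description=escape(task_config["task_description"]),
    )


page = render_task_page({
    "hit_title": "Simulating a Customer Service Interaction",
    "task_description": "Play the customer & describe your <problem>.",
})
print(page)
```

Escaping assumes the description is plain text; if a task intentionally puts markup in its description, that value would be inserted unescaped instead.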

Handling multiple responsive actors
Problem: The normal MTurk flow, with a single worker per task instance, doesn’t natively line up with our use case, in which several workers share one conversation.
Solution: Link multiple HITs together within our server.

Handling multiple actors
Implementation:
• The server acts as a pass-through between workers
• A worker’s messages are handled as acts in ParlAI
• Workers receive ParlAI observations
It is easy to swap in other agents, such as pre-trained models, in place of workers, which lets workers test your models.
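A toy version of the pass-through idea: each participant’s raw chat message is wrapped as a ParlAI-style act dict and queued as an observation for every other participant, so a model can occupy a seat exactly like a worker. `PassThroughRelay` is hypothetical and leaves out networking entirely, which the real server has to handle.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PassThroughRelay:
    """Routes messages between participants of the same conversation."""

    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}    # conversation id -> participant ids
        self.inboxes: Dict[str, Deque[dict]] = {}  # participant id -> pending observations

    def join(self, conversation_id: str, participant_id: str) -> None:
        self.members.setdefault(conversation_id, []).append(participant_id)
        self.inboxes[participant_id] = deque()

    def send(self, conversation_id: str, sender_id: str, text: str) -> dict:
        # A worker's message becomes an act...
        act = {"id": sender_id, "text": text, "episode_done": False}
        # ...and every other participant receives it as an observation.
        for other in self.members[conversation_id]:
            if other != sender_id:
                self.inboxes[other].append(act)
        return act

    def next_observation(self, participant_id: str) -> Optional[dict]:
        inbox = self.inboxes[participant_id]
        return inbox.popleft() if inbox else None


relay = PassThroughRelay()
relay.join("conv-1", "worker_a")
relay.join("conv-1", "model_b")  # a pre-trained model can take a seat like any worker
relay.send("conv-1", "worker_a", "Hi, my order never arrived.")
obs = relay.next_observation("model_b")
print(obs["text"])
```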

Preparing workers for tasks
Problem: Conversational tasks can be complicated or unclear, and qualification tests don’t always provide the context needed to prepare a worker for a task.
Solution: Onboard workers within the task itself.
It can be unclear how to prepare a worker

Preparing workers for tasks
Implementation:
ParlAI provides this functionality through onboarding worlds, which offer:
• Specific turn-based steps
• Mocks of the real task
• Filtering of workers who cannot complete the task
• An option to onboard workers only the first time they take your HIT
Onboarding worlds can quiz workers before they are added to the available worker pool
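The quiz, filter, and onboard-once behavior can be sketched as a small gate. `OnboardingGate` is a hypothetical stand-in, not ParlAI’s onboarding world API; the quiz format (question mapped to expected answer) and the pass threshold are assumptions.

```python
from typing import Callable, Dict, Set


class OnboardingGate:
    """Quiz each worker once; only workers who pass join the worker pool."""

    def __init__(self, quiz: Dict[str, str], pass_fraction: float = 1.0) -> None:
        self.quiz = quiz  # question -> expected answer
        self.pass_fraction = pass_fraction
        self.passed: Set[str] = set()
        self.failed: Set[str] = set()

    def admit(self, worker_id: str, answer: Callable[[str], str]) -> bool:
        if worker_id in self.passed:  # onboard only the first time
            return True
        if worker_id in self.failed:  # filtered out earlier
            return False
        correct = sum(
            answer(question).strip().lower() == expected.lower()
            for question, expected in self.quiz.items()
        )
        if correct / len(self.quiz) >= self.pass_fraction:
            self.passed.add(worker_id)
            return True
        self.failed.add(worker_id)
        return False


gate = OnboardingGate({"Who describes the problem first?": "customer"})
print(gate.admit("w1", lambda q: "Customer"))  # True
print(gate.admit("w2", lambda q: "the rep"))   # False
```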

Automated approval of work
Problem:
• Models may require a lot of data to produce good results
• Conversations can be hard to judge as properly fitting the dataset you are trying to create
• Verifying the examples manually can take nearly as much time as collecting them in the first place
Solution: Strive for automated approval of work.
Implementation: Rule-based verification of data.
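A minimal sketch of rule-based verification feeding automated approval. The three rules (minimum turns, minimum message length, no pasted task text) are hypothetical examples to adapt to your dataset. Approval goes through the boto3 MTurk `approve_assignment` call on an injected client; conversations that break a rule are returned for manual review rather than rejected automatically, since a rejection withholds a worker’s pay.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]  # e.g. {"id": "worker_a", "text": "..."}


def check_conversation(messages: List[Message], task_text: str,
                       min_turns: int = 4, min_chars: int = 10) -> List[str]:
    """Return the rule violations found (an empty list means the data looks usable)."""
    problems = []
    if len(messages) < min_turns:
        problems.append("too few turns")
    if any(len(m["text"].strip()) < min_chars for m in messages):
        problems.append("message too short")
    if task_text and any(task_text.lower() in m["text"].lower() for m in messages):
        problems.append("task text copied into chat")
    return problems


def review_assignments(client, assignments: List[Tuple[str, List[Message]]],
                       task_text: str) -> List[Tuple[str, List[str]]]:
    """Approve assignments that pass every rule; return the rest for manual review."""
    needs_review = []
    for assignment_id, messages in assignments:
        problems = check_conversation(messages, task_text)
        if problems:
            needs_review.append((assignment_id, problems))
        else:
            client.approve_assignment(AssignmentId=assignment_id)
    return needs_review


class RecordingClient:
    """Offline stand-in for the boto3 MTurk client; records approvals."""

    def __init__(self) -> None:
        self.approved: List[str] = []

    def approve_assignment(self, AssignmentId: str) -> dict:
        self.approved.append(AssignmentId)
        return {}


good = [
    {"id": "a", "text": "My laptop will not turn on."},
    {"id": "b", "text": "Have you tried charging it?"},
    {"id": "a", "text": "Yes, it is charging now."},
    {"id": "b", "text": "Great, hold power for ten seconds."},
]
bad = [{"id": "a", "text": "hi"}]
client = RecordingClient()
pending = review_assignments(client, [("A1", good), ("A2", bad)],
                             task_text="In this task, you will be assigned")
print(pending)
```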

The Human Element – Lessons Learned
Understanding worker interaction with tasks
Handling disconnects and abandoned work
Improving results by improving tasks
Managing unintended task abuse
Building ParlAI MTurk

Understanding the MTurk workflow
Problem: MTurk workers don’t necessarily take one task at a time; they often try to optimize their work output, which can lead to unexpected behaviors.
Solution: Be aware of how workers interact with and claim tasks, and of the generally asynchronous nature of the MTurk interface. Set reasonable task expiration times.
The initial test stalled when a worker quickly queued all 8 of the test HITs and nobody else was able to claim them.
The initial test left a worker waiting in the pool for 30 minutes after another worker abandoned a HIT without returning it.

Handling disconnects and abandons
Problem: Workers may disconnect, or leave their conversation partner or partners hanging.
We always have to remember that these are people: it doesn’t feel good to have your work taken away because of someone else.
Worker interaction isn’t always perfect

Handling disconnects and abandons
Solution:
ParlAI MTurk implements functionality to improve these situations:
• Optional payouts to abandoned workers
• Tasks can set a maximum act time before the worker is considered inactive and disconnected
• Support for reconnecting within a timeframe
• All failure states are explained to the user when they happen
Text displayed when a partner disconnects
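The maximum act time and reconnect window can be modeled as a small state check. The sketch below is hypothetical (ParlAI’s own option names and timing logic may differ) and takes explicit timestamps instead of reading the clock, so its behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkerState:
    last_active: float
    disconnected_at: Optional[float] = None


class ActivityMonitor:
    """Tracks worker activity; all times are in seconds."""

    def __init__(self, max_act_time: float, reconnect_window: float) -> None:
        self.max_act_time = max_act_time          # idle time before a worker counts as disconnected
        self.reconnect_window = reconnect_window  # grace period to come back after that
        self.workers: Dict[str, WorkerState] = {}

    def status(self, worker_id: str, now: float) -> str:
        state = self.workers[worker_id]
        if state.disconnected_at is None and now - state.last_active > self.max_act_time:
            state.disconnected_at = state.last_active + self.max_act_time
        if state.disconnected_at is None:
            return "active"
        if now - state.disconnected_at <= self.reconnect_window:
            return "disconnected"  # may still reconnect; tell the partner to wait
        return "expired"           # treat as abandoned; the waiting partner may be paid out

    def heartbeat(self, worker_id: str, now: float) -> bool:
        """Record activity (a message or reconnect); False once the window has closed."""
        if worker_id in self.workers and self.status(worker_id, now) == "expired":
            return False
        self.workers[worker_id] = WorkerState(last_active=now)
        return True


monitor = ActivityMonitor(max_act_time=60, reconnect_window=120)
monitor.heartbeat("w1", now=0)
print(monitor.status("w1", now=30))      # active
print(monitor.status("w1", now=100))     # disconnected
print(monitor.heartbeat("w1", now=100))  # True (reconnected within the window)
print(monitor.status("w1", now=400))     # expired
```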

Improving results by improving tasks
Work on balancing task length and pay
Workers are less likely to spend the time your dataset needs if that time isn’t well compensated.
Engaging tasks keep people’s interest
Workers aren’t robots – if you make the tasks fun or somehow rewarding, it is a
better outcome for everyone involved. Improving their experience improves your
data and encourages more people to work on your tasks.
Clear tasks lead to proper output
Ensuring that workers fully understand your task and intention is a shortcut to
quality data.
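Balancing length and pay comes down to simple arithmetic: the effective hourly rate is the reward divided by the time a task takes. For example, a $0.10 reward for a three-minute task works out to $2.00 per hour. The helpers below are a hypothetical sketch; task durations are placeholders you would measure in a sandbox or pilot run, and MTurk’s own fees are not included.

```python
import math


def effective_hourly_rate(reward_usd: float, minutes_per_task: float) -> float:
    """Hourly earnings for a worker if each task takes minutes_per_task minutes."""
    return reward_usd * 60.0 / minutes_per_task


def reward_for_hourly_rate(target_hourly_usd: float, minutes_per_task: float) -> float:
    """Smallest per-task reward, in whole cents, that meets a target hourly rate."""
    cents = target_hourly_usd * minutes_per_task / 60.0 * 100
    # Subtract a tiny epsilon so float noise just above a whole cent does not round up.
    return math.ceil(cents - 1e-9) / 100


print(effective_hourly_rate(0.10, 3))
print(reward_for_hourly_rate(12.0, 5))
```

Measuring the median completion time first, then deriving the reward from a target rate, keeps longer conversation tasks from quietly becoming underpaid.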

Preventing task abuse
Some workers aren’t going to produce the kind of data you want.
Often they optimize an unclear or tedious task in an unintended way, making the resulting data invalid or otherwise unwanted.
Such cases are rare and can be mitigated by a combination of:
• Clarifying the problem and setting clearer restrictions on expected behavior
• Checking for and filtering out specific bad behavior from your results
• Blocking workers who continue to abuse your HITs
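Blocking repeat offenders can be automated with a strike counter. The sketch assumes the boto3 MTurk `create_worker_block` call (which takes `WorkerId` and `Reason`) on an injected client; the three-strike threshold and the `StrikeTracker` name are hypothetical.

```python
from collections import Counter
from typing import List


class StrikeTracker:
    """Counts bad submissions per worker and blocks a worker at a threshold."""

    def __init__(self, client, max_strikes: int = 3) -> None:
        self.client = client
        self.max_strikes = max_strikes
        self.strikes: Counter = Counter()
        self.blocked: List[str] = []

    def record_violation(self, worker_id: str, reason: str) -> bool:
        """Record one violation; return True if the worker is (now) blocked."""
        if worker_id in self.blocked:
            return True
        self.strikes[worker_id] += 1
        if self.strikes[worker_id] >= self.max_strikes:
            self.client.create_worker_block(WorkerId=worker_id, Reason=reason)
            self.blocked.append(worker_id)
            return True
        return False


class RecordingBlockClient:
    """Offline stand-in for the boto3 MTurk client; records blocks."""

    def __init__(self) -> None:
        self.blocks = []

    def create_worker_block(self, WorkerId: str, Reason: str) -> dict:
        self.blocks.append((WorkerId, Reason))
        return {}


tracker = StrikeTracker(RecordingBlockClient(), max_strikes=3)
reason = "Repeatedly copied the task text into the chat"
results = [tracker.record_violation("w9", reason) for _ in range(3)]
print(results)  # [False, False, True]
```

Giving workers a few strikes before blocking leaves room for honest mistakes caused by an unclear task.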

How to Use ParlAI MTurk
Setting up HIT details
Creating and running your HIT
Extended use cases
Examples
Accomplishing your goals

Setting up HIT details
Setting up a ParlAI MTurk task begins with creating a task config file to customize the MTurk display information:
• hit_title
• hit_description
• hit_keywords
• task_description
Use this file to catch workers’ attention
and give them an overview of what to
expect.
task_config = {}
task_config['hit_title'] = 'Simulating a Customer Service Interaction'
task_config['hit_description'] = '''Play the role of either Customer Service or a
customer with a problem and attempt to solve the problem through dialog with
another MTurk worker'''
task_config['hit_keywords'] = 'chat,dialog,customer service'
task_config['task_description'] = '''In this task, you will be assigned the role of a
customer or a customer service rep. As a customer, you will be given a problem and
have to communicate it to the rep, then confirm the solution they suggest trying.
As the rep, you must offer a solution to the customer and ensure that their
problem is solved.'''
Example HIT setup file

Setting up ParlAI World
Much of ParlAI’s MTurk functionality can be customized by implementing a few functions; most behavior can be altered within parley alone. Stubs and examples are available on our GitHub.
from parlai.mturk.core.worlds import MTurkTaskWorld


class MTurkCustomerServiceWorld(MTurkTaskWorld):
    # is_init, problem_resolved, episodeDone, cust_task and rep_task
    # are assumed to be initialized in __init__ (not shown).
    def parley(self):
        if not self.is_init:
            # First turn: give each worker their role-specific instructions
            self.workers[0].observe(self.cust_task)
            self.workers[1].observe(self.rep_task)
            self.is_init = True
        else:
            customer_act = self.workers[0].act()
            self.process_customer_act(customer_act)
            rep_act = self.workers[1].act()
            self.process_rep_act(rep_act)

    def process_customer_act(self, act):
        if act['type'] == 'action':
            # The customer performed the action the rep suggested
            if act['action'] == self.cust_task.req_action:
                self.problem_resolved = True
        else:  # action type is message
            self.workers[1].observe(act)

    def process_rep_act(self, act):
        if act['type'] == 'action':
            if act['action'] == 'resolve':
                self.episodeDone = self.problem_resolved
        else:  # action type is message
            self.workers[0].observe(act)

    def episode_done(self):
        return self.episodeDone
Example ParlAI parley code

Creating and running the HIT
Running a ParlAI MTurk HIT is as simple as calling the run file for your task with a few flags:
• -nc – number of conversations
• -r – payout reward per conversation
• --unique – only allow each worker to complete this task once
• --count-complete – only count finished conversations toward the number requested
• --sandbox/--live – run the HIT on the MTurk sandbox server or push it live to workers
Detailed explanations for running a HIT are available on our GitHub
>> python3 run.py -nc 15 -r 0.1 --sandbox --count-complete
[ optional arguments: ]
[ datapath: /Users/jju/ParlAI/data ]
[ Mechanical Turk: ]
[ mturk_log_path: /Users/jju/ParlAI/logs/mturk ]
[ num_conversations: 15 ]
[ unique_worker: False ]
[ reward: 0.1 ]
[ is_sandbox: True ]
[ hard_block: False ]
[ count_complete: True ]
You are going to allow workers from Amazon Mechanical Turk to
be an agent in ParlAI.
During this process, Internet connection is required, and you
should turn off your computer's auto-sleep feature.
Please press Enter to continue...
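To show how these flags correspond to the option names echoed in the log, here is a hypothetical argparse sketch. It is not ParlAI’s actual argument parser: the `dest` names are copied from the log (`num_conversations`, `reward`, `unique_worker`, `count_complete`, `is_sandbox`), and defaulting to the sandbox is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical ParlAI MTurk run options")
    parser.add_argument("-nc", dest="num_conversations", type=int,
                        help="number of conversations")
    parser.add_argument("-r", dest="reward", type=float,
                        help="payout reward per conversation")
    parser.add_argument("--unique", dest="unique_worker", action="store_true",
                        help="only allow each worker to complete the task once")
    parser.add_argument("--count-complete", dest="count_complete", action="store_true",
                        help="only count finished conversations toward -nc")
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--sandbox", dest="is_sandbox", action="store_true",
                        help="run the HIT on the MTurk sandbox")
    target.add_argument("--live", dest="is_sandbox", action="store_false",
                        help="push the HIT live to workers")
    parser.set_defaults(is_sandbox=True)  # assumption: default to the sandbox
    return parser


opt = vars(build_parser().parse_args(
    ["-nc", "15", "-r", "0.1", "--sandbox", "--count-complete"]))
print(opt)
```

Making `--sandbox` and `--live` mutually exclusive and defaulting to the sandbox means a forgotten flag never spends real money.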

Custom HIT pages
Grounded dialogue often requires additional UI
elements. For this we provide the ability to use
custom HTML in the task.
Additional JavaScript can also be used so that interactions with buttons and other UI elements are sent through to the ParlAI world.
Information can also be sent from the ParlAI world to be rendered on the frontend, allowing conversations to be grounded on something determined by the ParlAI world.

The finished experience: Customer

The finished experience: Rep

Extended use cases
ParlAI MTurk supports much more than can be explained here.
• Filtering workers through requirements: Allow successful workers to continue working on your specific HITs without damaging the reputation of unsuccessful workers with blocks
• Repeat worker role assignments: Ensure that workers are only given a specific role in a conversation in cases where experiencing more than one role would disturb the task results
• Task experimentation: Run one task with multiple worlds or options, randomly assigning workers to different variants in order to collect experimental data within one HIT
• Hands-free iteration: Use MTurk in an evaluation loop, combined with task experimentation, to iterate on a model that has no concrete automated evaluation metric, without additional manual interaction
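Task experimentation and repeat role assignment both come down to assigning a worker once and remembering the choice. A hypothetical sketch follows; the variant names are placeholders, and ParlAI’s own options for this may look different.

```python
import random
from collections import Counter
from typing import Dict, List


class VariantAssigner:
    """Randomly assigns each worker to one task variant and keeps that assignment."""

    def __init__(self, variants: List[str], seed: int = 0) -> None:
        self.variants = variants
        self.rng = random.Random(seed)  # seeded so an experiment can be replayed
        self.assignments: Dict[str, str] = {}

    def assign(self, worker_id: str) -> str:
        # Returning workers keep their first variant (or role), so they never see both.
        if worker_id not in self.assignments:
            self.assignments[worker_id] = self.rng.choice(self.variants)
        return self.assignments[worker_id]

    def counts(self) -> Counter:
        return Counter(self.assignments.values())


assigner = VariantAssigner(["baseline_model", "new_model"], seed=7)
for i in range(100):
    assigner.assign(f"worker_{i}")
print(assigner.counts())
```

The same assigner works for roles: pass `["customer", "rep"]` as the variants and a returning worker is always given the role they had before.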

Major takeaways
• Lots of unexpected tasks can be done through MTurk if you’re willing to experiment
• Workers are human, so better and clearer experiences drive better data
• ParlAI MTurk can enable both data collection and model evaluation for your dialogue needs
• Simple conversation tasks can be created with almost no new code
• Grounded conversation tasks are easily enabled by existing ParlAI MTurk frameworks
• Bonus: ParlAI MTurk is open source and still growing. Pull requests are always welcome, and ideas for features or improvements may be addressed if they can improve the way that ParlAI supports research.

Questions?

Thanks for attending!
