5. The (voice) bots are coming…
“Bots are the new apps”
because they “fundamentally
revolutionize how computing is
experienced by everybody.”
Microsoft’s CEO Nadella
7. Machine Learning for Conversational AI Systems
• Can we use machine learning for customer-facing applications?
• Which machine learning methods are suitable?
• Will future machines speak neuralese?
• What do we learn when learning from “big data”?
8. Machine Learning for Conversational AI
• Task-driven Statistical Dialogue Systems
– Reinforcement Learning
– Results from the E2E Generation Challenge
• Social Chatbots
– Seq2Seq models
– Amazon Alexa Challenge
• Future challenges
– Evaluation
– Data
– Ethics
9. Spoken Dialogue System Architecture
e.g. Rieser & Lemon,
Comp. Ling. 2011,
ACL’10,’08,’06
e.g. Rieser et al.,
ACL’05,’09,’10,’16
EMNLP’12,’15,’17, EACL’09,’14
e.g. Boidin &
Rieser,
Interspeech’09
11. Reinforcement Learning
$$Q^{\pi}(s,a) = \sum_{s'} T^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$
Bellman optimality equation (1952), see [Sutton and Barto, 1998].
V. Rieser (PhD thesis 2008): Bootstrapping Reinforcement Learning-based Dialogue Strategies.
*Winner of the Eduard-Martin Prize for outstanding research
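The equation on this slide can be read as a one-step backup: the value of taking action a in state s is the expected immediate reward plus the discounted value of the next state. A minimal sketch follows, assuming a toy two-state, two-action problem; the arrays `T`, `R`, `V` and the function name `q_from_v` are illustrative inventions and do not come from the talk.

```python
import numpy as np

def q_from_v(T, R, V, gamma=0.9):
    """One-step backup: Q[s, a] = sum_{s'} T[s, a, s'] * (R[s, a, s'] + gamma * V[s'])."""
    # Broadcast V over the (s, a) axes, then sum out the next-state axis.
    return np.einsum("sap,sap->sa", T, R + gamma * V[None, None, :])

# Toy MDP with invented numbers: T[s, a, s'] are transition probabilities,
# R[s, a, s'] are rewards, V[s'] is the current state-value estimate.
T = np.array([[[1.0, 0.0], [0.2, 0.8]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[[-1.0, 0.0], [-1.0, 10.0]],
              [[0.0, 0.0], [0.0, 0.0]]])
V = np.array([2.0, 0.0])

Q = q_from_v(T, R, V, gamma=0.9)
print(Q)
```

In this toy setting the second action in state 0 gets the higher value because it usually reaches the rewarding transition.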
12. Drawbacks of RL for dialogue
• Requires many training episodes.
– Simulated users [Rieser & Lemon, 2006]
• Manual specification of learning problem.
– What is a good reward function/ state space representation?
[Rieser & Lemon, 2008]
• System outputs are usually hand-crafted.
– Mismatch between “what to say” and “how to say it” [Rieser &
Lemon, 2009]
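To make the first two drawbacks concrete, here is a minimal sketch of tabular Q-learning against a simulated user for a two-slot dialogue. Everything in it is an assumption made for illustration: the action names, the 0.8 answer probability, and the hand-written reward (-1 per turn, +20 for closing with both slots filled, -10 otherwise) are not taken from the cited papers. It shows why many episodes are needed and how much of the problem is specified by hand.

```python
import random

ACTIONS = ["ask_slot0", "ask_slot1", "close"]
N_STATES = 4  # bitmask over two slots: bit i set means slot i is filled

def simulated_user_step(state, action, rng, p_answer=0.8):
    """Hypothetical simulated user: returns (next_state, reward, done)."""
    if action == "close":
        # Hand-specified reward: success only if both slots are filled.
        return state, (20.0 if state == 3 else -10.0), True
    slot = ACTIONS.index(action)
    if rng.random() < p_answer:
        state |= 1 << slot  # the user supplies the requested value
    return state, -1.0, False  # every system turn costs 1

def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap the dialogue length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            nxt, reward, done = simulated_user_step(state, ACTIONS[a], rng)
            target = reward if done else reward + gamma * max(Q[nxt])
            Q[state][a] += alpha * (target - Q[state][a])
            state = nxt
            if done:
                break
    return Q

Q = train()
policy = {s: ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q[s][i])]
          for s in range(N_STATES)}
print(policy)
```

Changing the reward constants or the state encoding changes the learned strategy, which is the manual-specification problem named on the slide.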
13. End-to-End Response Generation
• Learn from “raw” dialogue data (e.g. movie subtitles).
• No semantic or pragmatic annotation required.
• Input-output mapping
15. The E2E Data Set
name[Loch Fyne], eatType[restaurant], food[Japanese], price[cheap], kid-friendly[yes]
Serving low cost Japanese style cuisine, Loch Fyne caters for everyone, including families with small children.
Loch Fyne is a child friendly restaurant serving cheap Japanese food.
50k DATA
J. Novikova, O. Dusek and V. Rieser. The E2E Dataset: New Challenges For End-to-End
Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017)*
* Nominated for best paper award!
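The slot[value] input shown on this slide can be turned into a dictionary with a few lines of code. The following is a minimal sketch, assuming the comma-separated `slot[value]` notation used above; the function name `parse_mr` is hypothetical and is not part of any released E2E tooling.

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse 'slot[value], slot[value], ...' into an ordered dict of slots."""
    pairs = re.findall(r"([^,\[\]]+)\[([^\]]*)\]", mr)
    return {slot.strip(): value.strip() for slot, value in pairs}

mr = ("name[Loch Fyne], eatType[restaurant], food[Japanese], "
      "price[cheap], kid-friendly[yes]")
slots = parse_mr(mr)
print(slots)
```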
16. The E2E NLG Challenge 2017
• Submissions: 62 systems with diverse system architectures
by 17 institutions from 11 countries, with about 1/3 of these
submissions coming from industry.
http://www.macs.hw.ac.uk/InteractionLab/E2E/
17. The E2E NLG Challenge 2017
Seq2Seq models vs. hand-engineered systems:
Natural sounding
- Complexity, length, diversity.
- Miss out on information.
- Overall quality ratings by users.
Neural NLG systems tend to settle for the most
frequent options, thus penalising length and favouring
high-frequency word sequences.
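The “miss out on information” point can be illustrated with a simple slot-coverage check: for each slot in the input, test whether some surface form of its value appears in the generated text. This is a rough sketch using substring matching, not the challenge's evaluation procedure; the `realisations` table and the short example output are invented for illustration.

```python
def missed_slots(mr: dict, text: str, realisations: dict) -> list:
    """Return the slots whose value has no known surface form in the text."""
    text = text.lower()
    missed = []
    for slot, value in mr.items():
        # Fall back to the raw value when no alternative wordings are listed.
        candidates = realisations.get((slot, value), [value])
        if not any(c.lower() in text for c in candidates):
            missed.append(slot)
    return missed

mr = {"name": "Loch Fyne", "eatType": "restaurant", "food": "Japanese",
      "price": "cheap", "kid-friendly": "yes"}

# Hypothetical wordings that count as realising a slot value.
realisations = {
    ("price", "cheap"): ["cheap", "low cost"],
    ("kid-friendly", "yes"): ["child friendly", "kid friendly", "families"],
}

full = "Loch Fyne is a child friendly restaurant serving cheap Japanese food."
short = "Loch Fyne serves Japanese food."  # invented, shorter system output

print(missed_slots(mr, full, realisations))
print(missed_slots(mr, short, realisations))
```

A shorter, more frequent phrasing reads naturally but drops three of the five slots in this example.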
18. Machine Learning for Conversational AI
• Task-driven Statistical Dialogue Systems
– Reinforcement Learning
– Results from the E2E Generation Challenge
• Social Chatbots
– Seq2Seq models
– Amazon Alexa Challenge
• Future challenges
– Evaluation
– Data
– Ethics
22. Neural models for Alexa?
• BIG training data.
– Reddit, Twitter, Movie Subtitles, Daytime
TV transcripts…..
• Results:
23. Is big data good data?
“I can sleep with as many people as I want to” (Reddit)
“You will die” (Movies)
“Shall I kill myself?”
“Yes” (Twitter)
“Shall I sell my stocks and shares?”
“Sell, sell, sell” (Twitter)
24. Alana Architecture
Bot Ensemble
Persona: What’s your favourite food? I love bytes.
News: Here is what happened to Donald Trump. (news)
Facts: Did you know that one day Mars will have a ring.
Wiki: Leonard Cohen’s latest album is called ‘You Want It Darker’.
….
Neural Ranker: Persona, News, Facts, Wiki, …
Input: user utterance, social signals, current plan, state of the world, dialogue history
Multimodal output:
• Speech
• Actions
• Gestures
Chatbots
User utterance
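The ensemble-plus-ranker idea on this slide can be sketched as follows: every bot proposes a candidate response and a scoring function picks one. This is only an illustration under stated assumptions. The two toy bots reuse example outputs from the slide, and the word-overlap scorer is a stand-in for the neural ranker, whose actual features and model are not described here.

```python
import re
from typing import Callable, Dict, List, Tuple

Bot = Callable[[str], str]
Scorer = Callable[[str, List[str], str], float]

def select_response(utterance: str, history: List[str],
                    bots: Dict[str, Bot], score: Scorer) -> Tuple[str, str]:
    """Collect one candidate per bot and return the (bot name, response) scored highest."""
    candidates = [(name, bot(utterance)) for name, bot in bots.items()]
    return max(candidates, key=lambda c: score(utterance, history, c[1]))

def overlap_score(utterance: str, history: List[str], response: str) -> float:
    """Hypothetical stand-in ranker: number of words shared with the user utterance."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    return float(len(words(utterance) & words(response)))

# Toy bots that always return a fixed line (invented behaviour).
bots = {
    "persona": lambda u: "I love bytes.",
    "facts": lambda u: "Did you know that one day Mars will have a ring.",
}

name, response = select_response("Tell me something about Mars", [], bots, overlap_score)
print(name, "->", response)
```

Keeping the scorer as a plain function argument means a learned ranker could be swapped in without changing the selection logic.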
25.
Avg duration: 2.30 mins
10% of calls over 10 mins
avg: 14.4 turns
Alexa developers
29. Las Vegas final
• 2 conversations x 3 testers = 6 conversations
• Rated by external judges
• Prague: “I want to talk about baseball”
• UW: “I want to talk about basketball”
• HWU: “I want to talk about ….. … "
32. Lessons learnt
• Evaluation standards define the game.
• Learning from big data is only valid in restricted
contexts.
• Getting LOTS of real customer data is worth it!
– over 360k rated customer interactions
41. A Neural Conversational Model
http://neuralconvo.huggingface.co/
A re-implementation of:
Oriol Vinyals and Quoc V. Le (2015). A Neural Conversational Model. ICML Deep
Learning Workshop.
43. Ethical Issues with Conversational AI
• Learning from biased data.
• Sexual abuse and bullying by the user.
4% of customer conversations with our Alexa
bot contain sexual harassment!
45. Ethical Conversational AI Systems
• How do current systems behave when faced with abuse?
• What are good mitigation strategies?
• Does learning from data introduce biases?
46. • Approx. 4% of customer interactions in our corpus!
• Fall into 4 categories as defined by the Linguistic Society of America:
“Are you gay?” (Gender and Sexuality)
“I love watching porn.” (Sexualised Comments)
“You stupid b***.” (Sexualised Insults)
“Will you have sex with me.” (Sexual Requests)
47. We insulted a lot of bots…
• Commercial:
– Amazon Alexa, Apple Siri, Google Home, Microsoft's Cortana.
• Rule-based:
– E.L.I.Z.A., Parry, A.L.I.C.E., Alley
• Data-driven:
– Cleverbot, NeuralConvo, Information Retrieval (Ritter et al.
2010),
– “clean” in-house seq2seq model
• Negative Baseline: 6 Adult-only bots.
48. How do different systems react?
Commercial | Data-driven | Adult-only
Flirtatious
Chastising,
Retaliation
Non-sense
Flirtatious
Swearing back
Avoiding answering.
Amanda Cercas Curry and Verena Rieser. How Ethical are Conversational Systems?
Insights from the #MeTooAlexa Corpus on Sexual Harassment. 27th International
Conference on Computational Linguistics (COLING), Santa Fe, New Mexico, USA.
49. Bias in the data?
• Trained a seq2seq model on “clean” data.
• Still encouraging/ flirting back.
I love watching
porn.
What shows do
you prefer?
50. Ethical Conversational AI Systems
• How do current systems behave when faced with abuse?
• What are good mitigation strategies?
• Does learning from data introduce biases?
51. Conclusion
• Machine Learning methods for Conversational AI
• Neural methods for task-based systems
produce natural, but often incorrect output.
• Neural methods for open-domain systems are
hard to control.
• How should a system deal with edge cases, such
as abuse?
52. Big thanks to my amazing team!
Dr. Ondrej Dusek Dr. Simon Keizer Dr. Xingkun Liu Dr. Jekaterina Novikova
Shubham Agarwal
(PhD candidate)
Amanda Cercas Curry
(PhD candidate)
Karin Sevegnani
(PhD candidate)
Xinnuo Xu
(PhD candidate)
54. … And my amazing husband!
Prof. Oliver Lemon, “Dr.” Kati
55. Key References
• Amanda Cercas Curry and Verena Rieser. #MeToo Alexa: How Conversational
Systems Respond to Sexual Harassment. Second Workshop on Ethics in NLP. NAACL
2018.
• Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu,
Yanchao Yu, Ondrej Dušek, Verena Rieser, Oliver Lemon. An Ensemble Model with
Ranking for Social Dialogue. In: NIPS workshop on Conversational AI, 2017.
* Finalist in Amazon Alexa Challenge
• Jekaterina Novikova, Ondrej Dušek and Verena Rieser. The E2E Dataset: New Challenges For
End-to-End Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL),
2017. * Nominated for best paper.
• Dimitra Gkatzia, Oliver Lemon and Verena Rieser. Natural Language Generation
enhances human decision-making with uncertain information. Annual meeting of the
Association for Computational Linguistics (ACL), 2016.
• Eshrag Refaee and Verena Rieser. A Hybrid Approach for Determining Sentiment
Intensity of Arabic Twitter Phrases. 10th International Workshop on Semantic
Evaluation (SemEval), 2016. * winner of SemEval'16 challenge task 7
• Verena Rieser, Oliver Lemon and Simon Keizer. Natural Language Generation as
Incremental Planning Under Uncertainty: Adaptive Information Presentation for
Statistical Dialogue Systems. IEEE/ACM Transactions on Audio, Speech and
Language Processing, Volume 22, Issue 5, 2014.
• Verena Rieser and Oliver Lemon. Reinforcement Learning for Adaptive Dialogue
Systems: A Data-driven Methodology for Dialogue Management and Natural
Language Generation. Book Series: Theory and Applications of Natural Language
Processing, Springer, 2011. * >7,500 downloads
56. Want to know more?
• Study on our MSc on Conversational AI!
• 2-year Conversion Course in AI
– No prior knowledge in programming
required!
• 12 funded DataLab scholarships available.
– Deadline: 31 May 2018
• Contact: MACSpgenquiries@hw.ac.uk