Data Science with Human in the Loop @Faculty of Science #Leiden University

Cognitive Computing
with Human in the Loop
http://lora-aroyo.org • http://slideshare.net/laroyo • @laroyo
Lora Aroyo
Web & Media Group, VU
IBM Center for Advanced Studies (CAS)
Harnessing User Semantics at Scale

Who am I …
Vrije Universiteit Amsterdam
computer science professor
heading web & media group
Amsterdam Data Science
IBM Center for Advanced Studies, Amsterdam
research associate
leading cognitive computing & crowdsourcing team
Columbia University, NY
visiting scholar
computer science, NLP, Computer Vision
Columbia Data Science
Tagasauris Inc, NY
Chief of Science

VU Web & Media Group …
Tobias Kuhn
Davide Ceolin
Victor de Boer
Jan Wielemaker
10 PhD Students
Lora Aroyo
Intelligent & Interactive Information Systems
enriching metadata & content of digital collections
content analysis for entity extraction
modeling provenance in digital collections
tracking changes over time
augmenting online multimedia
text & video summarization
interactive product placement, hotspots
assessing quality of web data
bias, controversy, opinions, perspectives
uncertainty, ambiguity
trust, privacy

software systems are becoming ever more intelligent
… but they don’t actually understand people

not all human knowledge can yet be captured by machines
for wide ranges of real-world contexts
Knowledge Representation
aims at human knowledge in machine-readable form

all the information machines have
is all the information there is

there is always something else …

key scientific challenge:
capturing human knowledge
at scale and adequate to real-world needs

Human Computation:
how human intelligence at scale can be used to
improve machine-based knowledge

understanding human computation:
improving how machine-based systems
acquire, capture & harness human knowledge

… understanding the data
variety of meanings
multitude of perspectives
abundance of sources
endless applications

… understanding the crowds
volunteers
enthusiasts
visitors on-site
visitors online
paid crowds
in-house experts
understand who the different crowds are
and what they can do for your collection

http://crowdtruth.org/
framework that facilitates
data collection, processing & analytics
of human computation knowledge

“best collective decisions are
result of disagreement,
not consensus or compromise”
James Surowiecki

disagreement = signal

disagreement is signal
for the natural ambiguity of language and
diversity & perspectives of human interpretation
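A minimal sketch of how disagreement could be turned into a measurable signal, assuming each item has been labeled by several crowd workers. The function name `disagreement`, the example labels and the choice of normalized entropy are illustrative assumptions; this is not the CrowdTruth metric itself.

```python
import math
from collections import Counter


def disagreement(labels):
    """Normalized Shannon entropy of the labels given to one item.

    Returns 0.0 when all annotators agree and 1.0 when the labels are
    spread evenly over all distinct labels that were used.
    """
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


# Hypothetical annotations: five workers label the relation expressed in a sentence.
clear_item = ["treats", "treats", "treats", "treats", "treats"]
ambiguous_item = ["treats", "causes", "treats", "causes", "none"]

print(disagreement(clear_item))
print(disagreement(ambiguous_item))
```

Under this sketch a high score is not discarded as noise; it marks the item as a candidate for ambiguous language or multiple valid interpretations.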

http://controcurator.org/

Interactive Exploration & Discovery in Context
building automatic storylines (narratives)
DIVE+
Aggregated views over the collection
collecting perspectives from crowds & niches
http://diveproject.beeldengeluid.nl/

VOTE for DIVE: https://summit2017.lodlam.net/2017/04/12/dive-explorative-search-for-digital-humanities/

VU – IBM CAS Team

Victor de Boer Lora Aroyo Oana Inel
Chiel van den Akker Susan Legêne

Carlos Martinez Ortiz
Werner Helmich
Berber Hagedoorn Sabrina Sauer

Liliana Melgar
Johan Oomen Jaap Blom

https://www.rijksmuseum.nl/en/rijksstudio
Crowds for Co-creation Data

… by user-driven augmentations
of existing online collections

Nichesourcing with Experts
http://annotate.accurator.nl

niches of people with the right expertise to
contribute specific information

Train Lay Crowds to be Experts
training the general crowd to be a niche:
a game in which players can carry out expert
annotation tasks with some assistance

http://spotvogel.vroegevogels.vara.nl
Volunteer crowds for continuous gaming

Paid Crowds for Video Analysis
CrowdTruth.org

Paid Crowds for Text Analysis
CrowdTruth.org

Paid Crowds for Image Analysis
CrowdTruth.org

Crowdsourcing Challenges
Challenge 1: Typically undertaken in isolation
Challenge 2: Difficult to estimate & control the time to complete
Challenge 3: Difficult to assess & compare quality
Challenge 4: Demands continuous promotional effort
Challenge 5: Active learning (human-in-the-loop) needs different expertise
Challenge 6: Challenging for institutions to incorporate crowdsourcing results into their existing content infrastructure

measure & assess
ensure impact
• be aware of the channel, e.g. Wikipedia, Wikimedia, Facebook

Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pages 145-152.
measure & assess
monitor progress
6 months: 340,551 tags, 602 items, 555 registered players
2 years: 36,981 tags, 1,782 items, 2,017 users (taggers)
137,421 matches
thousands of anonymous players
12,279 visits (3+ min online)
44,362 pageviews

user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%), e.g. "engeland"
persons (31%)
objects (57%)
measure & assess
evaluate content, compare crowds
88% of the tags useful for specific genres
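The vocabulary percentages above are coverage figures: the share of user tags that also appear in a reference vocabulary. A minimal sketch of that computation follows. The function name `coverage` and all sample tags and vocabularies are hypothetical, and the sketch assumes coverage is counted over distinct, case-folded tags, which the slides do not specify.

```python
def coverage(tags, vocabulary):
    """Fraction of distinct tags that also occur in the vocabulary.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    distinct = {t.strip().lower() for t in tags if t.strip()}
    if not distinct:
        return 0.0
    vocab = {v.strip().lower() for v in vocabulary}
    return len(distinct & vocab) / len(distinct)


# Hypothetical sample data, for illustration only.
user_tags = ["engeland", "Paard", "koningin", "fiets", "paard"]
professional_vocabulary = {"koningin", "staatsbezoek"}
dutch_lexicon = {"paard", "koningin", "fiets"}

print(coverage(user_tags, professional_vocabulary))
print(coverage(user_tags, dutch_lexicon))
```

Running the same function against several vocabularies makes it possible to compare how far crowd tags diverge from professional metadata.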

disagreement signals ambiguity
if people disagree then it will be more difficult for a
machine to classify that example
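One way to make this concrete is to score how clearly the crowd expresses each label for an example and treat low-scoring examples as hard cases. The sketch below sums the workers' binary label vectors and takes the cosine between that sum and each single label. The function name `label_scores`, the label set and the worker vectors are assumptions for illustration; this is a simplified vector-space measure, not the CrowdTruth implementation.

```python
import math


def label_scores(worker_vectors, labels):
    """Cosine between the summed worker vectors and each one-hot label vector.

    worker_vectors: list of 0/1 lists, one per worker, aligned with `labels`.
    A score of 1.0 means every selection made by the crowd was this label.
    """
    summed = [sum(column) for column in zip(*worker_vectors)]
    norm = math.sqrt(sum(v * v for v in summed))
    if norm == 0:
        return {label: 0.0 for label in labels}
    return {label: v / norm for label, v in zip(labels, summed)}


labels = ["treats", "causes", "none"]

# Hypothetical worker judgments; a worker may select more than one label.
clear_example = [[1, 0, 0]] * 5
ambiguous_example = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]

for name, vectors in [("clear", clear_example), ("ambiguous", ambiguous_example)]:
    scores = label_scores(vectors, labels)
    # The best score serves as a rough clarity measure for the example.
    print(name, max(scores.values()))
```

Examples whose best score stays well below 1.0 could be down-weighted or reviewed separately when training a classifier, rather than being forced into a single gold label.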

http://mediasuite.clariah.nl/

1998
from DVDs to data science

1998 2006
1 million dollar prize
for best algorithm

Netflix switches to streaming
1998 2006 2007

Team BellKor wins Netflix Prize
1998 2006 2007 2009

From Jeopardy to real-world problems
2011 2017

data is at the centre of every process

data is essential to evolve with users
