Software systems are becoming ever more intelligent and more useful, but the way we interact with these machines too often reveals that they don’t actually understand people. Knowledge Representation and the Semantic Web focus on the scientific challenges involved in providing human knowledge in machine-readable form. However, we observe that various types of human knowledge cannot yet be captured by machines, especially when dealing with wide ranges of real-world tasks and contexts. The key scientific challenge is to provide an approach to capturing human knowledge that is scalable and adequate to real-world needs. Human Computation has begun to study scientifically how human intelligence at scale can be used to methodologically improve machine-based knowledge and data management. My research focuses on understanding how human computation can improve the way machine-based systems acquire, capture and harness human knowledge and thus become even more intelligent. In this talk I will show how the CrowdTruth framework (http://crowdtruth.org) facilitates data collection, processing and analytics of human computation knowledge.
Some project links:
- http://controcurator.org/
- http://crowdtruth.org/
- http://diveproject.beeldengeluid.nl/
- http://vu-amsterdam-web-media-group.github.io/linkflows/
Data Science with Human in the Loop @Faculty of Science #Leiden University
1. Cognitive Computing
with Human in the Loop
http://lora-aroyo.org • http://slideshare.net/laroyo • @laroyo
Lora Aroyo
Web & Media Group, VU
IBM Center for Advanced Studies (CAS)
Harnessing User Semantics at Scale
2. Who am I …
Vrije Universiteit Amsterdam
computer science professor
heading web & media group
Amsterdam Data Science
IBM Center for Advanced Studies, Amsterdam
research associate
leading cognitive computing & crowdsourcing team
Columbia University, NY
visiting scholar
computer science, NLP, Computer Vision
Columbia Data Science
Tagasauris Inc, NY
Chief of Science
4. VU Web & Media Group …
Tobias Kuhn
Davide Ceolin
Victor de Boer
Jan Wielemaker
10 PhD Students
Lora Aroyo
Intelligent & Interactive Information Systems
enriching metadata & content of digital collections
content analysis for entity extraction
modeling provenance in digital collections
tracking changes over time
augmenting online multimedia
text & video summarization
interactive product placement, hotspots
assessing quality of web data
bias, controversy, opinions, perspectives
uncertainty, ambiguity
trust, privacy
6. Knowledge Representation
aims at human knowledge in machine-readable form
not all human knowledge can yet be captured by machines
for wide ranges of real-world contexts
13. … understanding the crowds
volunteers
enthusiasts
visitors on-site
visitors online
paid crowds
in-house experts
understand who the different crowds are
and what they can do for your collection
21. Interactive Exploration & Discovery in Context
building automatic storylines (narratives)
DIVE+
Aggregated views over the collection
collecting perspectives from crowds & niches
http://diveproject.beeldengeluid.nl/
34. Train Lay Crowds to be Experts
training the general crowd to be a niche:
a game in which players can carry out expert annotation tasks with some assistance
39. Crowdsourcing Challenges
Challenge 1: Typically undertaken in isolation
Challenge 2: Difficult to estimate & control the time to complete
Challenge 3: Difficult to assess & compare quality
Challenge 4: Demands continuous promotional effort
Challenge 5: Active learning (human-in-the-loop) needs different expertise
Challenge 6: Challenging for institutions to incorporate crowdsourcing results into their existing content infrastructure
40. measure & assess
ensure impact
• be aware of the channel, e.g. Wikipedia, Wikimedia, Facebook
41. measure & assess
monitor progress
6 months | 2 years
340,551 tags | 36,981 tags
137,421 matches
602 items | 1,782 items
555 registered players | 2,017 users (taggers)
thousands of anonymous players
12,279 visits (3+ min online)
44,362 pageviews
Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pp. 145-152.
42. measure & assess
evaluate content, compare crowds
user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%)
engeland
persons (31%)
objects (57%)
88% of the tags useful for specific genres
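The vocabulary percentages above report how many user tags also occur in a reference source. As a rough illustration only, the sketch below computes such a share for a handful of tags. The function name `coverage`, the sample tags and both vocabularies are hypothetical, and counting over distinct, lower-cased tags is an assumption; the slides do not say how the reported figures were computed.

```python
def coverage(tags, vocabulary):
    """Return the fraction of distinct tags that also appear in the vocabulary.

    Assumption: tags are compared case-insensitively and duplicates count once.
    """
    distinct = {t.strip().lower() for t in tags if t.strip()}
    if not distinct:
        return 0.0
    vocab = {v.strip().lower() for v in vocabulary}
    return len(distinct & vocab) / len(distinct)


# Hypothetical sample data, not taken from the study.
user_tags = ["engeland", "fiets", "Koningin", "auto", "fiets"]
professional_vocabulary = {"koningin"}
general_lexicon = {"fiets", "auto", "koningin"}

prof_share = coverage(user_tags, professional_vocabulary)
lexicon_share = coverage(user_tags, general_lexicon)

print(f"in professional vocabulary: {prof_share:.0%}")
print(f"in general lexicon: {lexicon_share:.0%}")
```

The same function could be applied per crowd (for example registered players versus anonymous players) to compare how closely each group's tags follow a given vocabulary.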