Data Excellence:
Better Data for Better AI
ODSC 2020
Lora Aroyo
http://lora-aroyo.org
@laroyo
By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN
0-8109-6720-0) by Justin Foote (talk)., Fair use,
https://en.wikipedia.org/w/index.php?curid=3955850
TAKE HOME MESSAGE
data lifecycle - just like in software - is needed to
guide data research & development practices
data is the compass for AI - AI advances where
there is data
data is at the center - AI systems' success
depends on the quality of their data
https://en.wikipedia.org/wiki/Metamorphosis_II
data quality must be addressed in AI practices
- multitude of notions of truth
- necessity for data quality standards
data lifecycle is the backbone for data
excellence tools and practices to stay ahead of
future unintended AI behaviours
The Rise of the Machines
“AI Winter”
lab experiments
Expert Systems
small scale
experiments
The Rise of the Machines
“AI Winter” → “AI Breakthroughs in Games”
IBM Watson Jeopardy
DeepMind AlphaGo
beat the humans
The Rise of the Machines
“AI Winter” → “AI Breakthroughs in Games” → “Real World Tasks”
Health diagnostics
Flu prediction
Weather prediction
Text, Image and Video classification
Text Generation
Text Translation
Conversational AI
support the humans
Mainstream Deployment of AI
“Real World Tasks” deployed in the wild → Unintended behaviors
Microsoft Tay bot
IBM Watson Oncology
Amazon Rekognition
Google Photos
Apple Face ID
Facebook chat bots
Various Speech Assistants
getting computers to “see”
the diversity of data
data quality is essential for
guiding AI away from
unintended behaviours
Data is the compass for AI
The Life of AI Data
“It exists!”
bootstrapping AI with data
Caltech101
LabelMe
Berkeley 3-D
https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
The Life of AI Data
“It exists!” → “It is bigger!”
data hungry AI
ImageNet
SIFT10M
OpenImages
COCO
Web 1T 5-Gram
https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
it got worse ...
Unintended Behaviors in AI
Adapted from “AI in the Open World: Discovering Blind Spots of AI”, SafeAI 2020, Ece Kamar
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
reactive
data improvement
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
to reach here
we need proactive
data improvement
The Life of AI Data
Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009)
In the decade since then, the research community has done a lot
with quantity, but quality has been left behind
In the 90’s we introduced standards
to achieve Software reliability
introduced software engineering lifecycle
- requirements, design and testing
established processes for software maintenance
- version control, sharing, documenting
established software quality metrics & processes
Ben Hutchinson, 2020
Now we need the same for Data
introduce data lifecycle
- requirements, design and testing
establish processes for dataset maintenance
- version control, sharing, documenting
establish data quality metrics & processes
Ben Hutchinson, 2020
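One way to picture "version control, sharing, documenting" for a dataset is a content fingerprint stored alongside a small release record. The sketch below is only an illustration under stated assumptions: records are JSON-serializable dicts, and the names `dataset_fingerprint`, `release_note`, and every field in the record are hypothetical, not taken from the talk.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """Return a SHA-256 hex digest that does not depend on record order."""
    # Serialize each record canonically (sorted keys), then sort the lines,
    # so reordering records or dict keys leaves the fingerprint unchanged.
    lines = sorted(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def release_note(name, version, records, owner):
    """Bundle minimal documentation for one dataset release (hypothetical fields)."""
    return {
        "name": name,
        "version": version,
        "owner": owner,
        "num_records": len(records),
        "fingerprint": dataset_fingerprint(records),
    }


# Toy data, invented for illustration only.
v1 = [{"id": 1, "label": "guitar"}, {"id": 2, "label": "not_guitar"}]
v1_shuffled = list(reversed(v1))
v2 = v1 + [{"id": 3, "label": "guitar"}]

note_v1 = release_note("toy-images", "1.0.0", v1, owner="data-team")
note_v2 = release_note("toy-images", "1.1.0", v2, owner="data-team")
print(note_v1["fingerprint"] == dataset_fingerprint(v1_shuffled))  # same content, same fingerprint
print(note_v1["fingerprint"] != note_v2["fingerprint"])            # new content, new fingerprint
```

Comparing fingerprints between releases gives a cheap signal that the data changed, which is the point at which a version bump and updated documentation would be expected.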
Data Quality is not easy ...
data quality problems are typically not caused by software bugs or just by human errors
datasets are not easy to debug
data quality is typically the result of:
- how well a dataset represents the actual task
- how the annotation is done
- whether the quality metrics are adequate
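The first factor above, how well a dataset represents the actual task, can be made measurable in many ways. As one hedged illustration (not a metric proposed in the talk), the sketch below compares the category shares in a dataset against the shares expected in deployment using total variation distance. The category names, the target shares, and the function names are all assumptions made up for the example.

```python
from collections import Counter


def label_shares(labels):
    """Turn a list of category labels into a dict of relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical dataset: 80% indoor scenes, 20% outdoor scenes.
dataset_labels = ["indoor"] * 8 + ["outdoor"] * 2
# Hypothetical deployment mix the task is expected to see.
target_shares = {"indoor": 0.5, "outdoor": 0.5}

gap = total_variation(label_shares(dataset_labels), target_shares)
print(round(gap, 3))
```

A gap near 0 suggests the dataset mix resembles the assumed task mix; a large gap flags under-represented categories before any model is trained.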
Data Quality is not only human error
Do these images depict a GUITAR?
it is not easy to give a Y/N answer for most of our AI tasks
[Image grid: candidate images, each marked ✓ or ✘]
Data Quality should consider context of use
Do these images depict NEW ZEALAND?
it is not easy to give a Y/N answer for most of our AI tasks
the answer typically depends on the context, on the task, on the usage, etc.
[Image grid: candidate images, each marked ✓ or ✘]
Data Quality should include real world diversity
Do these images depict a WEDDING?
it is not easy to give a Y/N answer for most of our AI tasks
the answer typically depends on the context, on the task, on the usage, etc.
disagreement is a signal for diversity and should be included in AI training
[Image grid: candidate images, each marked ✓ or ✘]
Data Quality is difficult even with experts
Does the sentence express a TREATS relation between Chloroquine and Malaria?
For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.
Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.
Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.
[Annotation marks on the slide: ✓ ✘ ✓]
DISAGREEMENT IS SIGNAL
Variety of sources for disagreement
Model of semantic interpretation
TRIANGLE OF MEANING
“Three Sides of CrowdTruth”, Human Computation Journal, v1, 2014, L. Aroyo, C. Welty
Workshop on “Subjectivity, Ambiguity and Disagreement (SAD) in Crowdsourcing”, The Web Conference 2019, https://sadworkshop.wordpress.com/
Annotator disagreement is signal, not noise
Annotator disagreement is indicative of variation in human interpretation
Annotator disagreement is indicative of ambiguity, vagueness, similarity, over-generality, & quality
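To treat disagreement as signal, it first has to be quantified per item. The sketch below uses normalized Shannon entropy of the votes on each item, which is a generic stand-in chosen for brevity and not the CrowdTruth metrics themselves. The item ids, the votes, and the function name `disagreement` are hypothetical.

```python
import math
from collections import Counter


def disagreement(votes, num_labels=2):
    """Normalized Shannon entropy of one item's votes: 0 = unanimous, 1 = evenly split."""
    if not votes:
        raise ValueError("an item needs at least one vote")
    counts = Counter(votes)
    if len(counts) == 1:
        return 0.0  # every annotator chose the same label
    total = len(votes)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Divide by the maximum entropy for a label space of size num_labels.
    return entropy / math.log2(num_labels)


# Invented votes for three images on a yes/no question.
items = {
    "img_01": ["yes", "yes", "yes", "yes", "yes"],
    "img_02": ["yes", "yes", "yes", "no", "no"],
    "img_03": ["yes", "no", "yes", "no", "yes", "no"],
}

scores = {item: disagreement(votes) for item, votes in items.items()}
# Items with the most disagreement come first; these are candidates for review.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Sorting by this score surfaces the items that are most likely ambiguous or subjective, so they can be inspected rather than silently resolved by a vote.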
Three sides of human interpretation
CROWDTRUTH
Disagreement provides guidance in task analysis:
● items with poor semantics
● items with salient terms
● items difficult to classify
● items that are ambiguous
● subjective annotations
● time-sensitive annotations
● difficult annotation tasks
● mis-translated annotations
● users with/without specific knowledge
● communities of thought
● spammers
You can’t remove the corners…
“Three Sides of CrowdTruth”, Human Computation Journal, v1, 2014, L. Aroyo, C. Welty
THE WORLD IS A SMOOTH SPECTRUM OF TRUTH
7 Myths about Human Annotation
… and we force the smoothness into a binary form
One truth: knowledge acquisition typically assumes one correct interpretation for every example
Experts rule: knowledge is captured from domain experts
One is enough: a single expert’s knowledge is sufficient
Disagreement is bad: when people disagree, they must not understand the problem
Detailed explanations help: if examples cause disagreement, adding instructions should help
Once done, forever valid: knowledge is not updated; new data is not aligned with old
All examples are created equal: triples are triples, one is not more important than another, they are all either true or false
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
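The effect of forcing a spectrum into a binary form can be seen with a few lines of code. This is a minimal sketch with invented votes: `majority_label` collapses each item to a single answer, while `soft_label` keeps the fraction of annotators who said yes. Both function names and the data are assumptions for illustration.

```python
from collections import Counter


def majority_label(votes):
    """Collapse votes to the single most frequent label (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


def soft_label(votes, positive="yes"):
    """Keep the fraction of annotators who chose the positive label."""
    return sum(v == positive for v in votes) / len(votes)


# Two invented items: one nearly unanimous, one contested.
clear_case = ["yes"] * 9 + ["no"]
contested_case = ["yes"] * 6 + ["no"] * 4

binary = (majority_label(clear_case), majority_label(contested_case))
graded = (soft_label(clear_case), soft_label(contested_case))
print(binary)  # both items look identical after majority voting
print(graded)  # the graded view keeps them apart
```

After majority voting the two items are indistinguishable, even though four of ten annotators rejected the second one; the graded label preserves that difference for training or evaluation.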
High Quality Data
represents a phenomenon
accurately and consistently over time
and is replicable, reproducible,
and maintainable over time;
has empirical and explanatory power;
and is collected, stored, and used
responsibly.
Rigorous Evaluation of AI Systems workshop, 2019, Human Computation (HCOMP), http://eval.how/
Evaluating Evaluation for AI Systems workshop, 2020, Association for the Advancement of Artificial Intelligence (AAAI), http://eval.how/aaai-2020/
From Data Quality to Data Excellence
Data Quality is
- a point-estimate of goodness of data
Data Excellence is
- the set of practices and tools that result in
high quality data
How do we achieve Data Excellence?
Maintainability
Well documented datasets with owners, which follow best practices for data at any scale.
Reproducibility
Basic and critical regression tests for datasets which support solid conclusions for decision making.
Reliability
Datasets which are internally sound and consistent; factors that affect the data are addressed or disclosed.
Fidelity
Data which faithfully, accurately, and comprehensively represents the captured phenomenon.
Validity
Datasets which explain aspects of the phenomena that they represent in terms of external measures.
Utility
Data which adequately and accurately achieves the intended product behavior.
1st International Workshop on Data Excellence: http://eval.how/dew2020/
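The "regression tests for datasets" mentioned under Reproducibility could look something like the sketch below: a handful of checks that are re-run on every new dataset release. The specific checks, the record layout with `id` and `label` fields, and the function name `check_dataset` are assumptions chosen for illustration, not a prescribed test suite.

```python
def check_dataset(records, allowed_labels, min_records):
    """Return a list of human-readable failures; an empty list means every check passed."""
    failures = []
    # Size check: a release should not silently shrink below an agreed minimum.
    if len(records) < min_records:
        failures.append(f"expected at least {min_records} records, found {len(records)}")
    # Uniqueness check: each record id should appear once.
    ids = [r.get("id") for r in records]
    if len(ids) != len(set(ids)):
        failures.append("duplicate record ids")
    # Label check: every label should come from the documented label set.
    bad = sorted({str(r.get("label")) for r in records if r.get("label") not in allowed_labels})
    if bad:
        failures.append(f"unexpected labels: {bad}")
    return failures


# Invented releases: the second one introduces a duplicate id and a misspelled label.
labels = {"wedding", "not_wedding"}
good = [{"id": 1, "label": "wedding"}, {"id": 2, "label": "not_wedding"}]
broken = good + [{"id": 2, "label": "weding"}]

good_report = check_dataset(good, labels, min_records=2)
broken_report = check_dataset(broken, labels, min_records=2)
print(good_report)
print(broken_report)
```

Returning a list of failures instead of raising on the first one lets a release pipeline report every problem in a single run.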
much like in software lifecycles, cutting corners at each stage
cascades to subsequent versions, which leads to technical debt
Data Lifecycle
Dataset [Requirements] Analysis
- Requirements Analysis
- Stakeholder Input
- Privacy, compliance
- Trust & safety planning
Dataset Design
- Data acquisition methodology
- Rater guidelines
- Construct validation
Dataset Implementation
- Human labeled data
- Logging interaction data
Dataset Testing
- Representation metrics
- Fairness metrics
- Reliability metrics
- Approval process
Dataset Maintenance
- Updating data over time
- Extending to other languages
- Version control
- Storage and accessibility
Ben Hutchinson, 2020
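As an example of what a "reliability metric" in the testing stage might be, the sketch below computes Cohen's kappa, a standard chance-corrected agreement score between two raters. The talk does not name a specific metric, so this choice is an assumption, and the ratings are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where the two raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings from two hypothetical raters on eight items.
rater_a = ["y", "y", "y", "n", "n", "n", "y", "n"]
rater_b = ["y", "y", "n", "n", "n", "y", "y", "n"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

In line with the earlier slides, a low kappa is not automatically a sign of bad raters; it can also indicate ambiguous items or guidelines, so it is best read together with per-item disagreement.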
TAKE HOME MESSAGE
https://en.wikipedia.org/wiki/Metamorphosis_II
data lifecycle - just like in software - is needed to
guide data research & development practices
data is the compass for AI - AI advances where
there is data
data is at the center - AI systems' success
depends on the quality of their data
data quality must be addressed in AI practices
- multitude of notions of truth
- necessity for data quality standards
data lifecycle is the backbone for data
excellence tools and practices to stay ahead of
future unintended AI behaviours
Collaborators
EthicalAI
Ben Hutchinson
Crowd Platform
Amol Wankhede
Anurag Batra
People + AI Research (PAIR)
Nithya Sambasivan
Kristen Olson
Shivani Kapania
Jess Holbrook
Andrew Zaldivar
Mahima Pushkarna
Maysam Moussalem
Likert team
Praveen Paritosh
Ka Wong
Lora Aroyo
Devi Krishna
high-profile data failures
not bugs in the software, not human mistakes
problems caused by the quality of the data
just like software quality in the 90’s: the same has to happen with data
examples of questionable data
CrowdTruth relation extraction
how would you annotate it?
how do we know and measure the quality of the data?
how well does it represent the actual task we are trying to solve?
like software, we need to establish data quality standards

Weitere ähnliche Inhalte

Was ist angesagt?

Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 Thesen
Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 ThesenArbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 Thesen
Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 ThesenDanielPoetzsch
 
Conversational AI is Now the Heart of Customer Experience.pdf
Conversational AI is Now the Heart of Customer Experience.pdfConversational AI is Now the Heart of Customer Experience.pdf
Conversational AI is Now the Heart of Customer Experience.pdfScallionRice
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMsSylvainGugger
 
The Biggest Artificial Intelligence Milestones Of The Decade So Far
The Biggest Artificial Intelligence Milestones Of The Decade So FarThe Biggest Artificial Intelligence Milestones Of The Decade So Far
The Biggest Artificial Intelligence Milestones Of The Decade So FarBernard Marr
 
Augmented intelligence is new way forward !
Augmented intelligence is new way forward ! Augmented intelligence is new way forward !
Augmented intelligence is new way forward ! Steve Ardire
 
실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루Jaimie Kwon (권재명)
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023HyunJoon Jung
 
Introduction to AI with Business Use Cases
Introduction to AI with Business Use CasesIntroduction to AI with Business Use Cases
Introduction to AI with Business Use CasesJack C Crawford
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxColleen Farrelly
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMsLoic Merckel
 
Daniel Samaan: ChatGPT and the Future of Work
Daniel Samaan: ChatGPT and the Future of WorkDaniel Samaan: ChatGPT and the Future of Work
Daniel Samaan: ChatGPT and the Future of WorkEdunomica
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big dataSeta Wicaksana
 
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)Beat Signer
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxSaiPragnaKancheti
 
Introduction of Artificial Intelligence and Machine Learning
Introduction of Artificial Intelligence and Machine Learning Introduction of Artificial Intelligence and Machine Learning
Introduction of Artificial Intelligence and Machine Learning bigdata trunk
 
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdf
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdfai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdf
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdfjason668539
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detectionijtsrd
 

Was ist angesagt? (20)

Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 Thesen
Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 ThesenArbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 Thesen
Arbeit 4.0: Megatrends digitaler Arbeit der Zukunft - 25 Thesen
 
Conversational AI is Now the Heart of Customer Experience.pdf
Conversational AI is Now the Heart of Customer Experience.pdfConversational AI is Now the Heart of Customer Experience.pdf
Conversational AI is Now the Heart of Customer Experience.pdf
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
The Biggest Artificial Intelligence Milestones Of The Decade So Far
The Biggest Artificial Intelligence Milestones Of The Decade So FarThe Biggest Artificial Intelligence Milestones Of The Decade So Far
The Biggest Artificial Intelligence Milestones Of The Decade So Far
 
Augmented intelligence is new way forward !
Augmented intelligence is new way forward ! Augmented intelligence is new way forward !
Augmented intelligence is new way forward !
 
실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Introduction to AI with Business Use Cases
Introduction to AI with Business Use CasesIntroduction to AI with Business Use Cases
Introduction to AI with Business Use Cases
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptx
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMs
 
Daniel Samaan: ChatGPT and the Future of Work
Daniel Samaan: ChatGPT and the Future of WorkDaniel Samaan: ChatGPT and the Future of Work
Daniel Samaan: ChatGPT and the Future of Work
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
 
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)
Introduction - Lecture 1 - Advanced Topics in Information Systems (4016792ENR)
 
AI and Accountability
AI and AccountabilityAI and Accountability
AI and Accountability
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptx
 
Introduction of Artificial Intelligence and Machine Learning
Introduction of Artificial Intelligence and Machine Learning Introduction of Artificial Intelligence and Machine Learning
Introduction of Artificial Intelligence and Machine Learning
 
Andy Roy - Conversational AI - Why We Must Build.pdf
Andy Roy - Conversational AI - Why We Must Build.pdfAndy Roy - Conversational AI - Why We Must Build.pdf
Andy Roy - Conversational AI - Why We Must Build.pdf
 
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdf
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdfai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdf
ai-powered-marketing-and-sales-reach-new-heights-with-generative-ai.pdf
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detection
 

Ähnlich wie Data excellence: Better data for better AI

Technology Governance & Migration In The AI Era
Technology Governance & Migration In The AI EraTechnology Governance & Migration In The AI Era
Technology Governance & Migration In The AI Era2toLead Limited
 
Knowledge Graphs, Ontologies, and AI Applications
Knowledge Graphs, Ontologies, and AI ApplicationsKnowledge Graphs, Ontologies, and AI Applications
Knowledge Graphs, Ontologies, and AI ApplicationsEarley Information Science
 
UX in the Age of AI: Leading with Design
UX in the Age of AI: Leading with DesignUX in the Age of AI: Leading with Design
UX in the Age of AI: Leading with DesignUXPA International
 
UX in the Age of AI: Leading with Design UXPA2018
UX in the Age of AI: Leading with Design UXPA2018UX in the Age of AI: Leading with Design UXPA2018
UX in the Age of AI: Leading with Design UXPA2018Carol Smith
 
Understanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingUnderstanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingDATAVERSITY
 
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019Carol Smith
 
Designing Trustable AI Experiences at World Usability Day in Cleveland
Designing Trustable AI Experiences at World Usability Day in ClevelandDesigning Trustable AI Experiences at World Usability Day in Cleveland
Designing Trustable AI Experiences at World Usability Day in ClevelandCarol Smith
 
Artificial Intelligence (AI) – Powering Data and Conversations.pptx
Artificial Intelligence (AI) – Powering Data and Conversations.pptxArtificial Intelligence (AI) – Powering Data and Conversations.pptx
Artificial Intelligence (AI) – Powering Data and Conversations.pptxBrian Pichman
 
Prepping the Analytics organization for Artificial Intelligence evolution
Prepping the Analytics organization for Artificial Intelligence evolutionPrepping the Analytics organization for Artificial Intelligence evolution
Prepping the Analytics organization for Artificial Intelligence evolutionRamkumar Ravichandran
 
How do we train AI to be Ethical and Unbiased?
How do we train AI to be Ethical and Unbiased?How do we train AI to be Ethical and Unbiased?
How do we train AI to be Ethical and Unbiased?Mark Borg
 
Interactive XAI for ODSC East 2023
Interactive XAI for ODSC East 2023Interactive XAI for ODSC East 2023
Interactive XAI for ODSC East 2023Meg Kurdziolek
 
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808Tom Humbarger
 
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018Carol Smith
 
Streamlining Information Flows In The Digital Workplace
Streamlining Information Flows In The Digital WorkplaceStreamlining Information Flows In The Digital Workplace
Streamlining Information Flows In The Digital WorkplaceEarley Information Science
 
Spivack Blogtalk 2008
Spivack Blogtalk 2008Spivack Blogtalk 2008
Spivack Blogtalk 2008Blogtalk 2008
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceAnimesh Singh
 
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneMy ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneLora Aroyo
 
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonDesigning AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonCarol Smith
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurityscoopnewsgroup
 

Ähnlich wie Data excellence: Better data for better AI (20)

Technology Governance & Migration In The AI Era
Technology Governance & Migration In The AI EraTechnology Governance & Migration In The AI Era
Technology Governance & Migration In The AI Era
 
Knowledge Graphs, Ontologies, and AI Applications
Knowledge Graphs, Ontologies, and AI ApplicationsKnowledge Graphs, Ontologies, and AI Applications
Knowledge Graphs, Ontologies, and AI Applications
 
UX in the Age of AI: Leading with Design
UX in the Age of AI: Leading with DesignUX in the Age of AI: Leading with Design
UX in the Age of AI: Leading with Design
 
UX in the Age of AI: Leading with Design UXPA2018
UX in the Age of AI: Leading with Design UXPA2018UX in the Age of AI: Leading with Design UXPA2018
UX in the Age of AI: Leading with Design UXPA2018
 
Understanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingUnderstanding the New World of Cognitive Computing
Understanding the New World of Cognitive Computing
 
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019
Designing Trustable AI Experiences at IxDA Pittsburgh, Jan 2019
 
Designing Trustable AI Experiences at World Usability Day in Cleveland
Designing Trustable AI Experiences at World Usability Day in ClevelandDesigning Trustable AI Experiences at World Usability Day in Cleveland
Designing Trustable AI Experiences at World Usability Day in Cleveland
 
EIS-Webinar-data.world-collab-2023-02-15.pptx
EIS-Webinar-data.world-collab-2023-02-15.pptxEIS-Webinar-data.world-collab-2023-02-15.pptx
EIS-Webinar-data.world-collab-2023-02-15.pptx
 
Artificial Intelligence (AI) – Powering Data and Conversations.pptx
Artificial Intelligence (AI) – Powering Data and Conversations.pptxArtificial Intelligence (AI) – Powering Data and Conversations.pptx
Artificial Intelligence (AI) – Powering Data and Conversations.pptx
 
Prepping the Analytics organization for Artificial Intelligence evolution
Prepping the Analytics organization for Artificial Intelligence evolutionPrepping the Analytics organization for Artificial Intelligence evolution
Prepping the Analytics organization for Artificial Intelligence evolution
 
How do we train AI to be Ethical and Unbiased?
How do we train AI to be Ethical and Unbiased?How do we train AI to be Ethical and Unbiased?
How do we train AI to be Ethical and Unbiased?
 
Interactive XAI for ODSC East 2023
Interactive XAI for ODSC East 2023Interactive XAI for ODSC East 2023
Interactive XAI for ODSC East 2023
 
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808
Catalyze Webcast - Five Myths Of RIA With Laurie Gray - 031808
 
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018
IA in the Age of AI: Embracing Abstraction and Change at IA Summit 2018
 
Streamlining Information Flows In The Digital Workplace
Streamlining Information Flows In The Digital WorkplaceStreamlining Information Flows In The Digital Workplace
Streamlining Information Flows In The Digital Workplace
 
Spivack Blogtalk 2008
Spivack Blogtalk 2008Spivack Blogtalk 2008
Spivack Blogtalk 2008
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneMy ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
 
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonDesigning AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurity
 

Mehr von Lora Aroyo

NeurIPS2023 Keynote: The Many Faces of Responsible AI.pdf
NeurIPS2023 Keynote: The Many Faces of Responsible AI.pdfNeurIPS2023 Keynote: The Many Faces of Responsible AI.pdf
NeurIPS2023 Keynote: The Many Faces of Responsible AI.pdfLora Aroyo
 
CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning
CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine LearningCATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning
CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine LearningLora Aroyo
 
Harnessing Human Semantics at Scale (updated)
Harnessing Human Semantics at Scale (updated)Harnessing Human Semantics at Scale (updated)
Harnessing Human Semantics at Scale (updated)Lora Aroyo
 
CHIP Demonstrator presentation @ CATCH Symposium
CHIP Demonstrator presentation @ CATCH SymposiumCHIP Demonstrator presentation @ CATCH Symposium
CHIP Demonstrator presentation @ CATCH SymposiumLora Aroyo
 
Semantic Web Challenge: CHIP Demonstrator
Semantic Web Challenge: CHIP DemonstratorSemantic Web Challenge: CHIP Demonstrator
Semantic Web Challenge: CHIP DemonstratorLora Aroyo
 
The Rijksmuseum Collection as Linked Data
The Rijksmuseum Collection as Linked DataThe Rijksmuseum Collection as Linked Data
The Rijksmuseum Collection as Linked DataLora Aroyo
 
Keynote at International Conference of Art Libraries 2018 @Rijksmuseum
Keynote at International Conference of Art Libraries 2018 @RijksmuseumKeynote at International Conference of Art Libraries 2018 @Rijksmuseum
Keynote at International Conference of Art Libraries 2018 @RijksmuseumLora Aroyo
 
FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18Lora Aroyo
 
Understanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithmsUnderstanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithmsLora Aroyo
 
StorySourcing: Telling Stories with Humans & Machines
Data excellence: Better data for better AI

  • 1. Data Excellence: Better Data for Better AI ODSC 2020 Lora Aroyo http://lora-aroyo.org @laroyo By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN 0-8109-6720-0) by Justin Foote (talk)., Fair use, https://en.wikipedia.org/w/index.php?curid=3955850
  • 2. TAKE HOME MESSAGE: data lifecycle - just like in software - is needed to guide data research & development practices; data is the compass for AI - AI advances where there is data; data is at the center - AI systems' success depends on the quality of their data; data quality must be addressed in AI practices (multitude of notions of truth, necessity for data quality standards); data lifecycle is the backbone for data excellence tools and practices to stay ahead of future unintended AI behaviours. Image: https://en.wikipedia.org/wiki/Metamorphosis_II
  • 3. The Rise of the Machines: “AI Winter” (lab experiments, Expert Systems, small scale experiments)
  • 4. The Rise of the Machines: “AI Winter” → “AI Breakthroughs in Games” (IBM Watson Jeopardy, DeepMind AlphaGo): beat the humans
  • 5. The Rise of the Machines: “AI Winter” → “AI Breakthroughs in Games” → “Real World Tasks” (health diagnostics; flu prediction; weather prediction; text, image and video classification; text generation; text translation; conversational AI): support the humans
  • 6. Mainstream Deployment of AI: “Real World Tasks” deployed in the wild → Unintended behaviors (Microsoft Tay bot, IBM Watson Oncology, Amazon Rekognition, Google Photos, Apple Face ID, Facebook chat bots, various speech assistants)
  • 7. Data is the compass for AI: getting computers to “see” the diversity of data; data quality is essential for guiding AI away from unintended behaviours
  • 8. The Life of AI Data: “It exists!” Bootstrapping AI with data: Caltech101, LabelMe, Berkeley-3D. https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  • 9. The Life of AI Data: “It exists!” → “It is bigger!” Data hungry AI: ImageNet, SIFT10M, OpenImages, COCO, Web 1T 5-Gram. https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  • 10. The Life of AI Data: “It exists!” → “It is bigger!” → “It is better!” but before it got better ...
  • 11. The Life of AI Data: “It exists!” → “It is bigger!” → “It is better!” but before it got better ... it got worse ...
  • 12. Unintended Behaviors in AI. Adapted from “AI in the Open World: Discovering Blind Spots of AI”, SafeAI 2020, Ece Kamar
  • 13. The Life of AI Data: “It exists!” → “It is bigger!” → “It is better!” but before it got better ... reactive data improvement
  • 14. The Life of AI Data: “It exists!” → “It is bigger!” → “It is better!” To reach here we need proactive data improvement
  • 15. The Life of AI Data: Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009). In the decade since then, the research community has done a lot with quantity, but quality has been left behind
  • 16. In the 90’s we introduced standards to achieve software reliability: introduced a software engineering lifecycle (requirements, design and testing); established processes for software maintenance (version control, sharing, documenting); established software quality metrics & processes. Ben Hutchinson, 2020
  • 17. Now we need the same for data: introduce a data lifecycle (requirements, design and testing); establish processes for dataset maintenance (version control, sharing, documenting); establish data quality metrics & processes. Ben Hutchinson, 2020
  • 18. Data Quality is not easy ... Data quality problems are typically not caused by software bugs or just by human errors; datasets are not easy to debug; data quality is typically the result of: how well a dataset represents the actual task; how the annotation is done; whether the quality metrics are adequate
  • 19. Data Quality is not only human error. Do these images depict a GUITAR? It is not easy to give a Y/N answer for most of our AI tasks. Annotations shown on the example images: ✓ ✓ ✓ ✘ ✘ ✘ ✘ ✓ ✓
  • 20. Data Quality should consider context of use. Do these images depict NEW ZEALAND? It is not easy to give a Y/N answer for most of our AI tasks; the answer typically depends on the context, on the task, on the usage, etc. Annotations shown on the example images: ✓ ✘ ✓ ✓ ✘ ✘
  • 21. Data Quality should include real world diversity. Do these images depict a WEDDING? It is not easy to give a Y/N answer for most of our AI tasks; the answer typically depends on the context, on the task, on the usage, etc.; disagreement is a signal for diversity and should be included in AI training. Annotations shown on the example images: ✓ ✘ ✓ ✓ ✘ ✓
  • 22. Data Quality is difficult even with experts. Does the sentence express the TREATS relation between Chloroquine and Malaria? (1) “For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.” (2) “Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.” (3) “Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.” ✓ ✘ ✓
  • 23. DISAGREEMENT IS SIGNAL: variety of sources for disagreement
  • 24. Model of semantic interpretation: TRIANGLE OF MEANING. Annotator disagreement is signal, not noise; annotator disagreement is indicative of variation in human interpretation; annotator disagreement is indicative of ambiguity, vagueness, similarity, over-generality, & quality. “Three Sides of CrowdTruth”, Human Computation Journal, v1, 2014, L. Aroyo, C. Welty. Workshop on “Subjectivity, Ambiguity and Disagreement (SAD) in Crowdsourcing”, The Web Conference 2019, https://sadworkshop.wordpress.com/
  • 25. Three sides of human interpretation: CROWDTRUTH. Disagreement provides guidance in task analysis: ● items with poor semantics ● items with salient terms ● items difficult to classify ● items that are ambiguous ● subjective annotations ● time-sensitive annotations ● difficult annotation tasks ● mis-translated annotations ● users with/without specific knowledge ● communities of thought ● spammers. You can’t remove the corners… “Three Sides of CrowdTruth”, Human Computation Journal, v1, 2014, L. Aroyo, C. Welty
  • 26. THE WORLD IS A SMOOTH SPECTRUM OF TRUTH
  • 27. … and we force the smoothness into a binary form. 7 Myths about Human Annotation: (1) One truth: knowledge acquisition typically assumes one correct interpretation for every example. (2) Experts rule: knowledge is captured from domain experts. (3) One is enough: a single expert’s knowledge is sufficient. (4) Disagreement bad: when people disagree, they must not understand the problem. (5) Detailed explanations help: if examples cause disagreement, adding instructions should help. (6) Once done, forever valid: knowledge is not updated; new data not aligned with old. (7) All examples are created equal: triples are triples, one is not more important than another, they are all either true or false. “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
  • 28. High Quality Data represents a phenomenon accurately and consistently over time and is replicable, reproducible, and maintainable over time; has empirical and explanatory power; and is collected, stored, and used responsibly. Rigorous Evaluation of AI Systems workshop, 2019, Human Computation (HCOMP), http://eval.how/. Evaluating Evaluation for AI Systems workshop, 2020, Association for the Advancement of Artificial Intelligence (AAAI), http://eval.how/aaai-2020/
  • 29. From Data Quality to Data Excellence. Data Quality is a point-estimate of goodness of data. Data Excellence is the set of practices and tools that result in high quality data
  • 30. How do we achieve Data Excellence? Maintainability: well documented datasets with owners, which follow best practices for data at any scale. Reproducibility: basic and critical regression tests for datasets which support solid conclusions for decision making. Reliability: datasets which are internally sound and consistent; factors that affect the data are addressed or disclosed. Fidelity: data which faithfully, accurately, and comprehensively represents the captured phenomenon. Validity: datasets which explain aspects of the phenomena that they represent in terms of external measures. Utility: data which adequately and accurately achieves the intended product behavior. 1st International Workshop on Data Excellence: http://eval.how/dew2020/
  • 31. Data Lifecycle: much like in software lifecycles, cutting corners at each stage cascades to subsequent versions, which leads to technical debt. Dataset [Requirements] Analysis: requirements analysis; stakeholder input; privacy, compliance; trust & safety planning. Dataset Design: data acquisition methodology; rater guidelines; construct validation. Dataset Implementation: human labeled data; logging interaction data. Dataset Testing: representation metrics; fairness metrics; reliability metrics; approval process. Dataset Maintenance: updating data over time; extending to other languages; version control; storage and accessibility. Ben Hutchinson, 2020
  • 32. TAKE HOME MESSAGE: data lifecycle - just like in software - is needed to guide data research & development practices; data is the compass for AI - AI advances where there is data; data is at the center - AI systems' success depends on the quality of their data; data quality must be addressed in AI practices (multitude of notions of truth, necessity for data quality standards); data lifecycle is the backbone for data excellence tools and practices to stay ahead of future unintended AI behaviours. Image: https://en.wikipedia.org/wiki/Metamorphosis_II
  • 33. Collaborators: EthicalAI Ben Hutchinson Crowd Platform Amol Wankhede Anurag Batra People + AI Research (PAIR) Nithya Sambasivan Kristen Olson Shivani Kapania Jess Holbrook Andrew Zaldivar Mahima Pushkarna Maysam Moussalem Praveen Paritosh Ka Wong Lora Aroyo Devi Krishna Likert team
  • 34. Data Excellence: Better Data for Better AI ODSC 2020 Lora Aroyo http://lora-aroyo.org @laroyo By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN 0-8109-6720-0) by Justin Foote (talk)., Fair use, https://en.wikipedia.org/w/index.php?curid=3955850
  • 35. High-profile data failures: not bugs in the software, not mistakes of humans, but problems caused by the quality of the data. Just like software quality in the 90’s, the same has to happen with data. Examples of questionable data: CrowdTruth, relation extraction. How would you annotate it? How do we know and measure the quality of the data? How well does it represent the actual task we are trying to solve? Like software, we need to establish data quality standards.
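
The slides on annotator disagreement (23 to 27) argue that disagreement should be kept as signal rather than collapsed into a single yes/no label. A minimal sketch of that idea in Python follows. It is not the CrowdTruth metrics from the cited papers: the entropy-based score, the function names, and the example votes are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def label_distribution(labels):
    """Return the fraction of annotators that chose each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def disagreement(labels, num_classes=2):
    """Normalized entropy of the annotator votes, between 0 and 1.

    0 means every annotator gave the same label; 1 means the votes are
    spread evenly over all `num_classes` possible labels.
    """
    dist = label_distribution(labels)
    entropy = sum(-p * log2(p) for p in dist.values())
    return entropy / log2(num_classes)


# Hypothetical yes/no votes for "Do these images depict a WEDDING?"
votes = {
    "image_a": ["yes", "yes", "yes", "yes"],
    "image_b": ["yes", "no", "yes", "no"],
    "image_c": ["yes", "yes", "yes", "no"],
}

# Keep a soft label (fraction of "yes" votes) and a disagreement score
# per item instead of forcing a single binary answer.
soft_labels = {
    item: label_distribution(labels).get("yes", 0.0)
    for item, labels in votes.items()
}
scores = {item: disagreement(labels) for item, labels in votes.items()}

# Items with the most disagreement are candidates for closer inspection.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists the items from most to least contested, and `soft_labels` could be used as training targets in place of majority-vote labels. Whether entropy is the right score depends on the task; the cited CrowdTruth work defines its own measures for items, annotators, and labels.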