SlideShare ist ein Scribd-Unternehmen logo
1 von 123
Interpretting Embeddings
with Comparison
Michael Gleicher
Department of Computer Sciences
University ofWisconsin - Madison
Two talks (or five) in one!
How do we deal with
embeddings of text data?
Alexander, et. al. Serendip: Topic Model-Driven
Visual Exploration of Text Corpora.VAST ‘14.
Alexander & Gleicher. Task-Driven Comparison
of Topic Models.VAST ‘15
Heimerl & Gleicher. Interactive Analysis of
Word Vector Embeddings. EuroVis ’18.
Heimerl, et al. Interactive Visual Comparison of
Object Embeddings. (TVCG ‘21)
How do we use comparison as a
tool for hard data problems?
Gleicher. Considerations for Visualizing
Comparison. InfoVis 2017.
How do we deal with embeddings
of text data?
Alexander, et. al. Serendip: Topic Model-
Driven Visual Exploration of Text Corpora.
VAST ‘14.
Alexander & Gleicher. Task-Driven Comparison
of Topic Models.VAST ‘15
Heimerl & Gleicher. Interactive Analysis of
Word Vector Embeddings. EuroVis ’18.
Heimerl, et al. Interactive Visual Comparison of
Object Embeddings.
TVCG ‘21
Eric Alexander
Assistant Prof.
Carleton College
Florian Heimerl
Post-Doc
University ofWisconsin
Michael Gleicher
University ofWisconsin
Visual Computing Group
Human Graphics Interaction
authoring pictures, videos, animations
Human Robot Interaction
robots!
Human Data Interaction
visualization, visual analytics, interactive learning
`
DataVisualization
Visual Analytics
Human Data Interaction
Interpretting Embeddings
with Comparison
Michael Gleicher
Department of Computer Sciences
University ofWisconsin - Madison
Two different stories

Embeddings are a great way to
analyze text collections!
But they are hard to interpret!
Use comparison as a strategy
Comparison is a great way to
think about analysis problems!
But it’s abstract – need examples!
Use embeddings as a case study
Interpretting Embeddings
with Comparison
Michael Gleicher
Department of Computer Sciences
University ofWisconsin - Madison
What is an embedding?
General mathematics:
Place a smaller structure into a
larger structure
Computer science:
Place a discrete set of objects into
a vector space
Encode relationships between
objects
Paris
4,8,1,3,

Atlanta
5,2,1,7,

New York
4,8,1,3,

London
5,2,1,7,

Boston
3,2,5,1,

Tokyo
9,2,6,4,

Beijing
7,3,2,7,

San Jose
5,2,1,7,

Jakarta
3,2,5,1,

Sydney
9,2,6,4,

Munich
7,3,2,7,

High Dimensional Data
Objects have associated Vectors
Kinds of relationships in embeddings
Distance
A is closer to B than to C
B
C
D
A
E
Linear Structure
A is to B as C is to D
Semantic Directions
A is more X than C
B
C
D
A
F
F
E
B
A
C
D
Relationships are interesting even if global positions are not
Embeddings
We care about relationships not values
The coordinates (axes) may have no meaning
A B
1
Very different shapes?
Similar neighbors
(points close in one are close in the other)
Embeddings for analyzingText Corpora
Embed Documents
Similar documents are close
Estimate similar meaning by
word usage statistics in a corpus
Embed Words
Similar words are close
Estimate similar meaning by
word usage statistics in a corpus
Document Similarity
Documents that use similar words
(probably) have similar content
Idea from the 1960s
https://www.nytimes.com/2019/01/02/obituaries/karen-sparck-jones-overlooked.html
Similarity based on word counts?
Need to deal with similar words (synonyms)
Need to reduce the number of words (too high dimensional)
Find groups of words
Pre-defined dictionaries (word types)
word usage analysis
Statistical groupings (words that tend to go together)
Topic Modeling
Topic modeling
curse
kill’d
revenge
deed
love
woo
fair
oaths
lead
farewell
fast
meet
Word vector embeddings
Place words in a high-dimensional vector space
Words similar in meaning should be close in space
Infer similarity by distributional semantics:
similar context implies similar meaning
Construct embeddings by processing a corpus of text
my pet cat is brown
my pet dog is brown
my big car is brown
Why useWordVector Embeddings?
Learn about
Language or Corpora (Texts)
Find similar words/synonyms
Track changes of word usage
Exploring polysemy
Creating lexical resources
Evidence of bias


Natural Language Applications
Pre-Processing
Translation
Sentiment Analysis
Interpretation


Several ways to build word embeddings
Word2Vec
Skip-gram model
Neural embedding
GLoVE
Co-occurrence model
Factor matrix by optimization
Several ways to build topic models
(document embeddings)
LDA (Latent Dirichlet Allocation)
Standard algorithm
Iterative and probabilistic
NMF (Non-Negative Matrix Factors)
Fast Algorithm
Less widely adopted
Why Interpret Embeddings?
Gain insight into a model
Did you build a good model?
Gain insight into the modeling process
Which methods are better?
Gain insight about the underlying data
What does the model tell us?
Challenges ofWordVector Embeddings
Challenges of Document Embeddings
Large numbers ofWords (or documents)
High-dimensional spaces
Complex relationships – meaningless positions (or questionable)
Complex processes for building embeddings
Complex downstream applications
No ground truth - subjective aspects
Interactive Analysis ofWord
Vector Embeddings
Florian Heimerl and Michael Gleicher
Department of Computer Sciences
University ofWisconsin – Madison
EuroVis 2018
Summary:
Word vector embeddings offer unique challenges
Task analysis of needs 3 Designs for unmet needs
Buddy Plots
ConceptAxis Plots
Co-occurance Matrices
Task Analysis:
What do people do withWord Embeddings?
Literature survey
111 papers from diverse communities
Consider use cases in Linguistics, HCI, Digitial Humanities, etc.
Augment this list by extrapolation
what tasks would users want – but aren’t doing yet
Use cases suggest tasks
Learn about
Language or Corpora (Texts)
Find similar words/synonyms
Track changes of word usage
Exploring polysemy
Creating lexical resources
Natural Language Applications
Pre-Processing
Translation
Sentiment Analysis
Interpretation
Evaluation
Intrinsic (good embedding?)
Extrinsic (applications success?)
Interpretation
Identify items of interest
Probe values of interest
LinguisticTasks and Characteristics
We identified 7 distinct linguistic tasks within the literature
4 characteristics pertinent to those tasks: similarity, arithmetic
structures, concept axis, and co-occurrences
28
PriorWork:
WordVector EmbeddingVisualizations
Liu et al., InfoVis 2017
Analogy Relationships
Smilkov et al., 2016
Distance Relationships
Rong and Adar, 2016
Training Process
Tasks and Characteristics vs.Visualizations
Rank word pairs similarity
View neighbors similarity
Select synonyms similarity
Compare concepts average, similarity
Find analogies offset, similarity
Project on Concept concept axis
Predict Contexts co-oc. probability
Smilkov, et al 2016
Liu, et al 2017
Tasks and Characteristics vs.Visualizations
Rank word pairs similarity
View neighbors similarity
Select synonyms similarity
Compare concepts average, similarity
Find analogies offset, similarity
Project on Conceptconcept axis
Predict Contexts co-oc. probability
Smilkov, et al 2016
Liu, et al 2017
Tasks we seek designs for in this paper
Buddy Plots
ConceptAxis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Buddy Plots
ConceptAxis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Concept Axes:
Understanding Semantic Directions
Opposing concepts
make an axis
Food
Pet
Concept Axes:
Understanding ConceptAxes
Define an axis from one
concept to another
Food
Pet
Dog
Cat
Goldfish
Chicken
Cow
Trout
Ways to define axes
Vector between two concepts
Interaxis - Kim et al., 2015
Classifier between two groups
Explainers – Gleicher, 2013
Food
Pet
Dog
Cat
Goldfish
Chicken
Cow
Trout
Multiple ConceptAxes
Use multi-variate plots
2D = Scatterplot
Pet Food
Small
Big
man woman
academic
worker
Buddy Plots
ConceptAxis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Similarities:
Understanding local distances
Distances are meaningful
even if absolute values are not
What is close to a word?
Are there groups of words that are similar?
Ordered lists are useful
Density (how many can you show)
Sense of relative distances
Comparison between words
Embedding Projector
Smilkov, et al. 2016
Buddy Plots (1D lists)
Alexander and Gleicher, 2016 – forTopic Models
Map distance (to selected reference) to horizontal axis
Reference
object
Closest
to ref
Not dimensionality reduction?
Map distance to word to horizontal axis
Focus on a single point – other relations not preserved
?
Embedding Projector
Smilkov, et al. 2016
Stacked/Chained buddy plots
Use color to encode distance
Stacked/Chained buddy plots
Use color to encode distance in the reference row
A
A
B
B
Stacked/Chained buddy plots
Use color to encode distance in the reference
Same word
 different embedding
Broadcast
Comparison?
For interpretation
Do the differences show
something?
Word meaning change
(between corpora)
Correlations / Biases
(within corpora)
For modeling (selection)
How are these models different?
Which is better?
Comparison?
Comparison is an important task
in Data Analysis
Comparison is special!
(since it involves multiple things)
It’s an important special case
It deserves special attention
Almost all Data Analysis can be
viewed as comparison
Comparison is a lens
(to look at problems)
It’s a generally useful tool
It deserves special attention
How do I think about
comparison?
to help me develop tools to help people do it
My CS838 (Data Visualization) class
March 11, 2010
Maybe I shouldn’t publish this

It’s my secret weapon to devise new tools

Considerations for Visualizing
Comparison
Michael Gleicher
Department of Computer Sciences
University of Wisconsin – Madison
InfoVis 2017
What is this paper?
Considerations for Visualizing Comparison
4 questions to ask when designing a
visualization or tool ?
To examine (two or more
objects, ideas, people, etc.) in
order to note similarities and
differences
To mark or point out the
similarities and differences of
(two or more things)
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Wrong Question:
Is my problem Comparison?
Just about anything can be viewed as comparison
Not everything benefits from being viewed this way
Serendip, VAST ‘14
Alexander, E., Kohlmann, J., Valenza, R., Witmore, M., &
Gleicher, M. (2014). Serendip: Topic model-driven visual
exploration of text corpora. In 2014 IEEE Conference on
Visual Analytics Science and Technology (VAST)
Serendip: Topic model-driven visual
exploration of text corpora
Use a topic model to guide exploration of a text corpus
Find patterns and connect back to specifics
Documents Model Documents/Passages
Goals
Support inquiry across levels of abstraction
Corpus Document Passage Word
Goals
Support inquiry across levels of abstraction
Combat issues of scale in the data
Many documents Long documents
Goals
Support inquiry across levels of abstraction
Combat issues of scale
Promote serendipitous discovery:
Multiple entry points
Highlight adjacency
Flexible exploration
A. Thudt, U. Hinrichs, S. Carpendale 2012
Basics of Serendip
Three different (interlinked views)
Corpus Document Passage Word
CorpusViewer TextViewer RankViewer
Is this comparison?
No!
A tool for exploring a topic
model!
We didn’t describe it as comparison
Tool for looking at one topic model
Unclear how users think about it
Yes!
Comparison thinking really
helped!
We did think about comparison
Tool for using topic models
Our users had comparison tasks
Is this comparison? I don’t care!
No!
A tool for exploring a topic model!
We didn’t describe it as comparison
Tool for looking at one topic model
Unclear how users think about it
Yes!
Comparison thinking really helped!
We did think about comparison
Tool for using topic models
Our users had comparison tasks
A survey of comparison
would have missed this.
It’s a great example of
comparison ideas
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Question 1:
What is the comparison?
What are the elements of the comparison?
The Elements of a Comparison . . .
Targets — Set of things being compared
Action — What to do with the relationship among them
To mark or point out the
similarities and differences of
(two or more things)
To examine
(two or more objects, ideas, etc.)
in order to note
similarities and differences
Question 1A:
The Elements: Targets
Do you know what you are comparing?
Explicit Comparisons – the system has the set of targets
Implicit Comparisons – the system may not know all the targets
compare against an implicit baseline
compare against the user’s knowledge
compare with targets only the user knows
Question 1A:
The Elements: Targets
What is being compared? – Comparison Targets
Does the model match my expectations?
What documents are similar?
How do groups of documents differ?
What words indicate these differences?
How are words used differently?
Where in texts are these differences?
Do the patterns match other things I know?
Question 1B:
The Elements: Actions
Verbs on relationships
Try to be more specific than “examine” or “compare”
Truth in Advertising: I didn’t have this worked out in 2010
Question 1B:
The Elements: Actions
What to do with the relationship? Comparison Actions (Verbs)
Does the model match my expectations? Measure/Quantify relationship
What documents are similar? Identify similar things
How do groups of documents differ? Measure/Quantify relationship
What words indicate these differences? Dissect a difference
How are words used differently? Identify meaningful differences
Where in texts are these differences? Contextualize the relationships
Do the patterns match other things I know? Identify similar things
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Question 2:
Why is this comparison hard?
If it isn’t hard, you probably don’t need to think about it (much)
Abstractly
Too many targets to compare
Large or Complex Targets
Complex Relationships
Serendip
Challenges of Scale!
Lots of documents
Long / complex documents
Complicated models
Many different comparisons
Challenges from the kind (not scale)
‱ Hard target types (implicit)
‱ Hard action types (dissection)
‱ Hard combinations (dissect implicit)
Tasks influence scalability challenges
Solutions must respond to both!
Only Scalability Challenges?
Serendip Comparison Example:
Where does this happen? (contextualize)
Task Challenge:
Contextualize – fit user knowledge
Strategy: show in context
Design: use text as scaffold
Scalability Challenge:
Long Documents
Strategy: summarize
Design: overview + detail
Line-graph Overview
Combating scale
Showing trends
Affording navigation
Document
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Question 3:
What is your strategy for those challenges?
Abstractly
Scan Sequentially
Select Subset
Summarize Somehow
Sarikaya, Gleicher & Szafir. (2018). Design Factors
for Summary Visualization in Visual Analytics.
Computer Graphics Forum, 37(3), 145–156.
EuroVis 2018.
Scalability Strategies!
Serendip Comparison Example:
Compare Groups
Task Challenge:
Implicit targets – what groups?
Strategy: make explicit
Design: user specifies groups
Scalability Challenge:
Lots of documents
Strategy: summarize
Design: how to present statistics?
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Question 4:
What Visual Design for Comparison?
Abstractly
Juxtaposition
Superposition
Explicit Encoding
Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen, C. D., & Roberts, J. C. (2011).
Visual comparison for information visualization. Information Visualization,
10(4), 289–309.
Comparative Elements
Targets
Actions
Comparative Challenges
Number of Targets
Large or Complex Targets
Complex Relationships
Scalability Strategies
Scan Sequentially
Select Subset
Summarize Somehow
Comparative Designs
Juxtapose
Superpose
Explicit Encoding
What is the comparison? Why is it hard?
How to address the challenges? Which visual design to use?
Should I think about
comparison?
Maybe
 If it helps
If you’re so good at
comparison

We should be able to compare complex things
Interactive Visual Comparison of
Object Embeddings
Florian Heimerl, Christoph Kralj,
Torsten Möller and Michael Gleicher
University of Wisconsin – Madison, University of Vienna
TVCG 2020 (presented at VIS ‘21)
The Problem:
Compare two embeddings
We are interested in the relationships between objects
local structure
Not their positions in space
global structure
back to the introductory example

10 dimensional data mapped to 2 dimensions
867 objects
2 runs ofTSNE (different random seeds)
Perplexity 30, Learning Rate 200
Local structure (near neighbors) should be preserved
(that’s what the algorithm does)
Very different shapes?
Similar neighbors
(points close in one are close in the other)
Is the local structure really similar?
measure difference/similarity?
assess and localize it?
interpret and diagnose it?
identify exemplars?
understand context?
Metrics (multiple)
Summary views
Link between views
Connect to detail views
Connect back to global views
Similar neighbors
(points close in one are close in the other)
How many objects have the same nearest
neighbor?
651
216
Color Encoding
7 Items
327 Items
Where are those 7?
What are those 7?
Spread view:
Where do they go? (outside of the top N)
7 items
2 items
651 items
Select 2 groups
7 items
2 items
Distance view (highlight those 9)
Clusters (k-Means)
Select by region
A real example:
WordVector Embeddings, 2 Coprora
Wikipedia (modern) vs EEBO (1470-1700)
GloVE (embedding algorithm)
300 dims (key parameter of algorithm)
5000 most common words (can’t show all words)
Select groups with lots of overlap
Sort by most frequent words (from 41)
numbers, days, proper names, 

Select no overlap in top 50
Cup?
Another example:
Build topic models with abstracts?
Vispub data (871 papers)
Abstracts vs. FullText
NMF, 10 topics
Hypothesis: Abstracts are different than papers
Dimensionality Reduction ofTopic Models
(abstracts vs. texts)
Red dots are the “topics” – 1 hot vectors of each topic (unit dimensions in each vector)
Pick a cluster – it does correspond
Most of these have lots of overlap
Cluster doesn’t match
left: abstracts about GPU/rendering
right: papers more about domain
Limitations
Binary comparison – less good for parameter tuning
need new designs?
Many complex views that need to be combined
address usability – through task-driven pre-arrangements?
Scalability of implementation and visual designs
Validation on real problems with real users
Summary
Embeddings forText Analysis
If we can design interpretation tools
Compare document and word
embeddings to interpret them
Specialized tools for comparison
of embeddings
Task-centric design process
Comparison as Analysis Approach
Because we have a design process
An approach to thinking about
design for comparison
Examples using text analysis with
embedding comparison
4 considerations of comparison
Thanks!
To you for listening
To my students and collaborators
(too many to list)
To our sponsors
NSF, NIH, DARPA, Mellon Foundation
Michael Gleicher
http://pages.cs.wisc.edu/~gleicher

Weitere Àhnliche Inhalte

Ähnlich wie Interpreting Embeddings with Comparison

Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchBhaskar Mitra
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchDawn Anderson MSc DigM
 
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web StackTowards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web StackAndre Freitas
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsUniversity of Minnesota, Duluth
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfdevangmittal4
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional SemanticsAndre Freitas
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
 
Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approachdinesh_joshy
 
Stance indexicalityworkshop
Stance indexicalityworkshopStance indexicalityworkshop
Stance indexicalityworkshopScott F. Kiesling
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learningfridolin.wild
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
 
Semantic Relatedness of Web Resources by XESA - Philipp Scholl
Semantic Relatedness of Web Resources by XESA - Philipp SchollSemantic Relatedness of Web Resources by XESA - Philipp Scholl
Semantic Relatedness of Web Resources by XESA - Philipp SchollCROKODIl consortium
 
causativeppt.ppt
causativeppt.pptcausativeppt.ppt
causativeppt.pptAjiNug3
 

Ähnlich wie Interpreting Embeddings with Comparison (20)

Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic search
 
Towards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web StackTowards a Distributional Semantic Web Stack
Towards a Distributional Semantic Web Stack
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and Concepts
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional Semantics
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approach
 
Eacl 2006 Pedersen
Eacl 2006 PedersenEacl 2006 Pedersen
Eacl 2006 Pedersen
 
Stance indexicalityworkshop
Stance indexicalityworkshopStance indexicalityworkshop
Stance indexicalityworkshop
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
Ijcai 2007 Pedersen
Ijcai 2007 PedersenIjcai 2007 Pedersen
Ijcai 2007 Pedersen
 
Semantic Relatedness of Web Resources by XESA - Philipp Scholl
Semantic Relatedness of Web Resources by XESA - Philipp SchollSemantic Relatedness of Web Resources by XESA - Philipp Scholl
Semantic Relatedness of Web Resources by XESA - Philipp Scholl
 
causativeppt.ppt
causativeppt.pptcausativeppt.ppt
causativeppt.ppt
 

KĂŒrzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂșjo
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

KĂŒrzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Interpreting Embeddings with Comparison

  • 1. Interpretting Embeddings with Comparison Michael Gleicher Department of Computer Sciences University ofWisconsin - Madison
  • 2. Two talks (or five) in one! How do we deal with embeddings of text data? Alexander, et. al. Serendip: Topic Model-Driven Visual Exploration of Text Corpora.VAST ‘14. Alexander & Gleicher. Task-Driven Comparison of Topic Models.VAST ‘15 Heimerl & Gleicher. Interactive Analysis of Word Vector Embeddings. EuroVis ’18. Heimerl, et al. Interactive Visual Comparison of Object Embeddings. (TVCG ‘21) How do we use comparison as a tool for hard data problems? Gleicher. Considerations for Visualizing Comparison. InfoVis 2017.
  • 3. How do we deal with embeddings of text data? Alexander, et. al. Serendip: Topic Model- Driven Visual Exploration of Text Corpora. VAST ‘14. Alexander & Gleicher. Task-Driven Comparison of Topic Models.VAST ‘15 Heimerl & Gleicher. Interactive Analysis of Word Vector Embeddings. EuroVis ’18. Heimerl, et al. Interactive Visual Comparison of Object Embeddings. TVCG ‘21 Eric Alexander Assistant Prof. Carleton College Florian Heimerl Post-Doc University ofWisconsin
  • 4.
  • 5. Michael Gleicher University ofWisconsin Visual Computing Group Human Graphics Interaction authoring pictures, videos, animations Human Robot Interaction robots! Human Data Interaction visualization, visual analytics, interactive learning `
  • 7. Interpretting Embeddings with Comparison Michael Gleicher Department of Computer Sciences University ofWisconsin - Madison
  • 8. Two different stories
 Embeddings are a great way to analyze text collections! But they are hard to interpret! Use comparison as a strategy Comparison is a great way to think about analysis problems! But it’s abstract – need examples! Use embeddings as a case study
  • 9. Interpretting Embeddings with Comparison Michael Gleicher Department of Computer Sciences University ofWisconsin - Madison
  • 10. What is an embedding? General mathematics: Place a smaller structure into a larger structure Computer science: Place a discrete set of objects into a vector space Encode relationships between objects Paris 4,8,1,3,
 Atlanta 5,2,1,7,
 New York 4,8,1,3,
 London 5,2,1,7,
 Boston 3,2,5,1,
 Tokyo 9,2,6,4,
 Beijing 7,3,2,7,
 San Jose 5,2,1,7,
 Jakarta 3,2,5,1,
 Sydney 9,2,6,4,
 Munich 7,3,2,7,
 High Dimensional Data Objects have associated Vectors
  • 11. Kinds of relationships in embeddings Distance A is closer to B than to C B C D A E Linear Structure A is to B as C is to D Semantic Directions A is more X than C B C D A F F E B A C D Relationships are interesting even if global positions are not
  • 12. Embeddings We care about relationships not values The coordinates (axes) may have no meaning A B 1
  • 14. Similar neighbors (points close in one are close in the other)
  • 15. Embeddings for analyzingText Corpora Embed Documents Similar documents are close Estimate similar meaning by word usage statistics in a corpus Embed Words Similar words are close Estimate similar meaning by word usage statistics in a corpus
  • 16. Document Similarity Documents that use similar words (probably) have similar content Idea from the 1960s https://www.nytimes.com/2019/01/02/obituaries/karen-sparck-jones-overlooked.html
  • 17. Similarity based on word counts? Need to deal with similar words (synonyms) Need to reduce the number of words (too high dimensional) Find groups of words Pre-defined dictionaries (word types) word usage analysis Statistical groupings (words that tend to go together) Topic Modeling
  • 19. Word vector embeddings Place words in a high-dimensional vector space Words similar in meaning should be close in space Infer similarity by distributional semantics: similar context implies similar meaning Construct embeddings by processing a corpus of text my pet cat is brown my pet dog is brown my big car is brown
  • 20. Why useWordVector Embeddings? Learn about Language or Corpora (Texts) Find similar words/synonyms Track changes of word usage Exploring polysemy Creating lexical resources Evidence of bias 
 Natural Language Applications Pre-Processing Translation Sentiment Analysis Interpretation 

  • 21. Several ways to build word embeddings Word2Vec Skip-gram model Neural embedding GLoVE Co-occurrence model Factor matrix by optimization Several ways to build topic models (document embeddings) LDA (Latent Dirichlet Allocation) Standard algorithm Iterative and probabilistic NMF (Non-Negative Matrix Factors) Fast Algorithm Less widely adopted
  • 22. Why Interpret Embeddings? Gain insight into a model Did you build a good model? Gain insight into the modeling process Which methods are better? Gain insight about the underlying data What does the model tell us?
  • 23. Challenges ofWordVector Embeddings Challenges of Document Embeddings Large numbers ofWords (or documents) High-dimensional spaces Complex relationships – meaningless positions (or questionable) Complex processes for building embeddings Complex downstream applications No ground truth - subjective aspects
  • 24. Interactive Analysis ofWord Vector Embeddings Florian Heimerl and Michael Gleicher Department of Computer Sciences University ofWisconsin – Madison EuroVis 2018
  • 25. Summary: Word vector embeddings offer unique challenges Task analysis of needs 3 Designs for unmet needs Buddy Plots ConceptAxis Plots Co-occurance Matrices
  • 26. Task Analysis: What do people do withWord Embeddings? Literature survey 111 papers from diverse communities Consider use cases in Linguistics, HCI, Digitial Humanities, etc. Augment this list by extrapolation what tasks would users want – but aren’t doing yet
  • 27. Use cases suggest tasks Learn about Language or Corpora (Texts) Find similar words/synonyms Track changes of word usage Exploring polysemy Creating lexical resources Natural Language Applications Pre-Processing Translation Sentiment Analysis Interpretation Evaluation Intrinsic (good embedding?) Extrinsic (applications success?) Interpretation Identify items of interest Probe values of interest
  • 28. LinguisticTasks and Characteristics We identified 7 distinct linguistic tasks within the literature 4 characteristics pertinent to those tasks: similarity, arithmetic structures, concept axis, and co-occurrences 28
  • 29. PriorWork: WordVector EmbeddingVisualizations Liu et al., InfoVis 2017 Analogy Relationships Smilkov et al., 2016 Distance Relationships Rong and Adar, 2016 Training Process
  • 30. Tasks and Characteristics vs.Visualizations Rank word pairs similarity View neighbors similarity Select synonyms similarity Compare concepts average, similarity Find analogies offset, similarity Project on Concept concept axis Predict Contexts co-oc. probability Smilkov, et al 2016 Liu, et al 2017
  • 31. Tasks and Characteristics vs.Visualizations Rank word pairs similarity View neighbors similarity Select synonyms similarity Compare concepts average, similarity Find analogies offset, similarity Project on Conceptconcept axis Predict Contexts co-oc. probability Smilkov, et al 2016 Liu, et al 2017 Tasks we seek designs for in this paper
  • 32. Buddy Plots ConceptAxis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 33. Buddy Plots ConceptAxis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 34. Concept Axes: Understanding Semantic Directions Opposing concepts make an axis Food Pet
  • 35. Concept Axes: Understanding ConceptAxes Define an axis from one concept to another Food Pet Dog Cat Goldfish Chicken Cow Trout
  • 36. Ways to define axes Vector between two concepts Interaxis - Kim et al., 2015 Classifier between two groups Explainers – Gleicher, 2013 Food Pet Dog Cat Goldfish Chicken Cow Trout
  • 37. Multiple ConceptAxes Use multi-variate plots 2D = Scatterplot Pet Food Small Big
  • 39. Buddy Plots ConceptAxis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 40. Similarities: Understanding local distances Distances are meaningful even if absolute values are not What is close to a word? Are there groups of words that are similar? Ordered lists are useful Density (how many can you show) Sense of relative distances Comparison between words Embedding Projector Smilkov, et al. 2016
  • 41. Buddy Plots (1D lists) Alexander and Gleicher, 2016 – forTopic Models Map distance (to selected reference) to horizontal axis Reference object Closest to ref
  • 42. Not dimensionality reduction? Map distance to word to horizontal axis Focus on a single point – other relations not preserved ?
  • 44. Stacked/Chained buddy plots Use color to encode distance
  • 45. Stacked/Chained buddy plots Use color to encode distance in the reference row A A B B
  • 46. Stacked/Chained buddy plots Use color to encode distance in the reference
  • 47.
  • 48. Same word
 different embedding Broadcast
  • 49. Comparison? For interpretation Do the differences show something? Word meaning change (between corpora) Correlations / Biases (within corpora) For modeling (selection) How are these models different? Which is better?
  • 50. Comparison? Comparison is an important task in Data Analysis Comparison is special! (since it involves multiple things) It’s an important special case It deserves special attention Almost all Data Analysis can be viewed as comparison Comparison is a lens (to look at problems) It’s a generally useful tool It deserves special attention
  • 51. How do I think about comparison? to help me develop tools to help people do it
  • 52. My CS838 (Data Visualization) class March 11, 2010
  • 53. Maybe I shouldn’t publish this
 It’s my secret weapon to devise new tools

  • 54. Considerations for Visualizing Comparison Michael Gleicher Department of Computer Sciences University of Wisconsin – Madison InfoVis 2017
  • 55. What is this paper? Considerations for Visualizing Comparison 4 questions to ask when designing a visualization or tool ?
  • 56. To examine (two or more objects, ideas, people, etc.) in order to note similarities and differences To mark or point out the similarities and differences of (two or more things)
  • 57. What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 58. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 59. Wrong Question: Is my problem Comparison? Just about anything can be viewed as comparison Not everything benefits from being viewed this way
  • 60. Serendip, VAST ‘14 Alexander, E., Kohlmann, J., Valenza, R., Witmore, M., & Gleicher, M. (2014). Serendip: Topic model-driven visual exploration of text corpora. In 2014 IEEE Conference on Visual Analytics Science and Technology (VAST)
  • 61. Serendip: Topic model-driven visual exploration of text corpora Use a topic model to guide exploration of a text corpus Find patterns and connect back to specifics Documents Model Documents/Passages
  • 62. Goals Support inquiry across levels of abstraction Corpus Document Passage Word
  • 63. Goals Support inquiry across levels of abstraction Combat issues of scale in the data Many documents Long documents
  • 64. Goals Support inquiry across levels of abstraction Combat issues of scale Promote serendipitous discovery: Multiple entry points Highlight adjacency Flexible exploration A. Thudt, U. Hinrichs, S. Carpendale 2012
  • 65. Basics of Serendip Three different (interlinked views) Corpus Document Passage Word CorpusViewer TextViewer RankViewer
  • 66.
  • 67. Is this comparison? No! A tool for exploring a topic model! We didn’t describe it as comparison Tool for looking at one topic model Unclear how users think about it Yes! Comparison thinking really helped! We did think about comparison Tool for using topic models Our users had comparison tasks
  • 68. Is this comparison? I don’t care! No! A tool for exploring a topic model! We didn’t describe it as comparison Tool for looking at one topic model Unclear how users think about it Yes! Comparison thinking really helped! We did think about comparison Tool for using topic models Our users had comparison tasks A survey of comparison would have missed this. It’s a great example of comparison ideas
  • 69. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 70. Question 1: What is the comparison? What are the elements of the comparison?
  • 71. The Elements of a Comparison . . . Targets — Set of things being compared Action — What to do with the relationship among them To mark or point out the similarities and differences of (two or more things) To examine (two or more objects, ideas, etc.) in order to note similarities and differences
  • 72. Question 1A: The Elements: Targets Do you know what you are comparing? Explicit Comparisons – the system has the set of targets Implicit Comparisons – the system may not know all the targets compare against an implicit baseline compare against the user’s knowledge compare with targets only the user knows
  • 73. Question 1A: The Elements: Targets What is being compared? – Comparison Targets Does the model match my expectations? What documents are similar? How do groups of documents differ? What words indicate these differences? How are words used differently? Where in texts are these differences? Do the patterns match other things I know?
  • 74. Question 1B: The Elements: Actions Verbs on relationships Try to be more specific than “examine” or “compare” Truth in Advertising: I didn’t have this worked out in 2010
  • 75. Question 1B: The Elements: Actions What to do with the relationship? Comparison Actions (Verbs) Does the model match my expectations? Measure/Quantify relationship What documents are similar? Identify similar things How do groups of documents differ? Measure/Quantify relationship What words indicate these differences? Dissect a difference How are words used differently? Identify meaningful differences Where in texts are these differences? Contextualize the relationships Do the patterns match other things I know? Identify similar things
  • 76. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 77. Question 2: Why is this comparison hard? If it isn’t hard, you probably don’t need to think about it (much) Abstractly Too many targets to compare Large or Complex Targets Complex Relationships Serendip Challenges of Scale! Lots of documents Long / complex documents Complicated models
  • 78. Many different comparisons Challenges from the kind (not scale) ‱ Hard target types (implicit) ‱ Hard action types (dissection) ‱ Hard combinations (dissect implicit) Tasks influence scalability challenges Solutions must respond to both! Only Scalability Challenges?
  • 79. Serendip Comparison Example: Where does this happen? (contextualize) Task Challenge: Contextualize – fit user knowledge Strategy: show in context Design: use text as scaffold Scalability Challenge: Long Documents Strategy: summarize Design: overview + detail
  • 80. Line-graph Overview Combating scale Showing trends Affording navigation Document
  • 81. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 82. Question 3: What is your strategy for those challenges? Abstractly Scan Sequentially Select Subset Summarize Somehow Sarikaya, Gleicher & Szafir. (2018). Design Factors for Summary Visualization in Visual Analytics. Computer Graphics Forum, 37(3), 145–156. EuroVis 2018. Scalability Strategies!
  • 83. Serendip Comparison Example: Compare Groups Task Challenge: Implicit targets – what groups? Strategy: make explicit Design: user specifies groups Scalability Challenge: Lots of documents Strategy: summarize Design: how to present statistics?
  • 84. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 85. Question 4: What Visual Design for Comparison? Abstractly Juxtaposition Superposition Explicit Encoding Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen, C. D., & Roberts, J. C. (2011). Visual comparison for information visualization. Information Visualization, 10(4), 289–309.
  • 86. Comparative Elements Targets Actions Comparative Challenges Number of Targets Large or Complex Targets Complex Relationships Scalability Strategies Scan Sequentially Select Subset Summarize Somehow Comparative Designs Juxtapose Superpose Explicit Encoding What is the comparison? Why is it hard? How to address the challenges? Which visual design to use?
  • 87. Should I think about comparison? Maybe
 If it helps
  • 88. If you’re so good at comparison
 We should be able to compare complex things
  • 89. Interactive Visual Comparison of Object Embeddings Florian Heimerl, Christoph Kralj, Torsten Möller and Michael Gleicher University of Wisconsin – Madison, University of Vienna TVCG 2020 (presented at VIS ‘21)
  • 90. The Problem: Compare two embeddings We are interested in the relationships between objects local structure Not their positions in space global structure
  • 91. back to the introductory example
 10 dimensional data mapped to 2 dimensions 867 objects 2 runs ofTSNE (different random seeds) Perplexity 30, Learning Rate 200 Local structure (near neighbors) should be preserved (that’s what the algorithm does)
  • 93. Similar neighbors (points close in one are close in the other)
  • 94. Is the local structure really similar? measure difference/similarity? assess and localize it? interpret and diagnose it? identify exemplars? understand context? Metrics (multiple) Summary views Link between views Connect to detail views Connect back to global views
  • 95. Similar neighbors (points close in one are close in the other)
  • 96. How many objects have the same nearest neighbor? 651 216
  • 101. Spread view: Where do they go? (outside of the top N)
  • 103. Select 2 groups 7 items 2 items
  • 106.
  • 108. A real example: WordVector Embeddings, 2 Coprora Wikipedia (modern) vs EEBO (1470-1700) GloVE (embedding algorithm) 300 dims (key parameter of algorithm) 5000 most common words (can’t show all words)
  • 109.
  • 110.
  • 111. Select groups with lots of overlap
  • 112. Sort by most frequent words (from 41) numbers, days, proper names, 

  • 113. Select no overlap in top 50
  • 114. Cup?
  • 115. Another example: Build topic models with abstracts? Vispub data (871 papers) Abstracts vs. FullText NMF, 10 topics Hypothesis: Abstracts are different than papers
  • 116.
  • 117. Dimensionality Reduction ofTopic Models (abstracts vs. texts) Red dots are the “topics” – 1 hot vectors of each topic (unit dimensions in each vector)
  • 118. Pick a cluster – it does correspond
  • 119. Most of these have lots of overlap
  • 120. Cluster doesn’t match left: abstracts about GPU/rendering right: papers more about domain
  • 121. Limitations Binary comparison – less good for parameter tuning need new designs? Many complex views that need to be combined address usability – through task-driven pre-arrangements? Scalability of implementation and visual designs Validation on real problems with real users
  • 122. Summary Embeddings forText Analysis If we can design interpretation tools Compare document and word embeddings to interpret them Specialized tools for comparison of embeddings Task-centric design process Comparison as Analysis Approach Because we have a design process An approach to thinking about design for comparison Examples using text analysis with embedding comparison 4 considerations of comparison
  • 123. Thanks! To you for listening To my students and collaborators (too many to list) To our sponsors NSF, NIH, DARPA, Mellon Foundation Michael Gleicher http://pages.cs.wisc.edu/~gleicher

Hinweis der Redaktion

  1. I present the basic framework in my class in 2010 – before phones had panoramic photography. Danielle who later built sequence surveyor was there.
  2. I mean comparison in it’s broad standard dictionary-use sense: the act of relating two or more things
  3. To think about comparison in an abstract way, I have a set of four questions.
  4. For each question, I have an abstraction of the space of answers – in the form categories for the answers. This slide is the main piece of the work. But it’s a little much to take in abstractly, so let me introduce it with an example.
  5. The idea of Serendip is that it was a tool for viewing topic models and using them to gain insights on the text collections that they were built from. It was a system that was built in collaboration with humanities scholars and had 3 main views.
  6. But, you should be asking if you should try thinking this way.