3. 09/05/14 pag. 3
Motivation
• Volume:Gigabyte(109), Terabyte(1012), Petabyte(1015), Exabyte(1018),
Zettabytes(1021)
• Variety:
– structured, semi-structured, unstructured;
– text, image, audio, video, record
• Velocity: dynamic, sometimes time-varying
Big
Data
refers
to
datasets
that
grow
so
large
that
it
is
difficult
to
capture,
store,
manage,
share,
analyze
and
visualize
with
the
typical
database
soFware
tools.
4. 09/05/14 pag. 4
The social layer in an interconnected world
2+
billion
people
on
the
Web
by
end
2011
30
billion
RFID
tags
today
(1.3B
in
2005)
4.6
billion
camera
phones
world
wide
100s
of
millions
of
GPS
enabled
devices
sold
annually
76
million
smart
meters
in
2009…
200M
by
2014
12+ TBs
of tweet data
every day
25+ TBs of
log data
every day
?
TBs
of
data
every
day
5. 09/05/14 pag. 5
Motivation
Raw
data
has
no
value
in
itself,
only
the
extracted
informaIon
has
value.
Src
:
h"p://www.sas.com/knowledge-‐exchange/business-‐analyIcs/featured/big-‐data-‐drives-‐performance-‐%E2%80%93-‐if-‐
you-‐can-‐exploit-‐the-‐informaIon-‐overload/index.html
6. 09/05/14 pag. 6
Information overload
• Refers to the danger of getting lost in data, which may be:
– irrelevant
to
the
current
task
at
hand,
– processed
in
an
inappropriate
way,
or
– presented
in
an
inappropriate
way.
Src:
h"p://supermarketpeople.co.uk/archives/713
7. 09/05/14 pag. 7
Visual Analytics - Mastering the
Information Age
h"ps://www.youtube.com/watch?v=5i3xbitEVfs
8. 09/05/14 pag. 8
Visual analytics is the science of analytical reasoning
facilitated by interactive visual interfaces.
Definition
9. 09/05/14 pag. 9
Whom does it matter
• Research Community J
• Business Community - New tools, new capabilities, new
infrastructure, new business models etc.
Financial
Services...
12. 09/05/14 pag. 12
History
discovery tools
• Shneiderman (IV ‘02) suggests combining computational analysis
approaches such as data mining with information visualization
– Too often viewed as competitors in past
– Instead, can complement each other
– Each has something valuable to contribute
• Alternatives
– Issues influencing the design of discovery tools:
• Statistical algorithms vs. visual data presentation
• Hypothesis testing vs. exploratory data analysis
– Each has Pro’s and Con’s
13. 09/05/14 pag. 13
Hypothesis testing & exploratory data analysis
• Hypothesis testing
– Advocates:
• By stating hypotheses up front, limit variables and sharpens thinking, more
precise measurement
– Critics:
• Too far from reality, initial hypotheses bias toward finding evidence to support it
• Exploratory Data Analysis
– Advocates:
• Find the interesting things this way, we now have computational capabilities to
do them
– Skeptics:
• Not generalizable, everything is a special case, detecting statistical relationships
does not infer cause and effect
14. 09/05/14 pag. 14
Recommendations
• Integrate data mining and information visualization
• Allow users to specify what they are seeking
• Recognize that users are situated in a social context
• Respect human responsibility
15. 09/05/14 pag. 15
History
• Information visualization systems inadequately supported
decision making (Amar & Stasko IV ‘04):
– Limited affordances
– Predetermined representations
– Decline of determinism in decision-Making
• “Representational primacy” versus “Analytic primacy”
– Telling truth about data vs. providing analytically useful visualizations
16. 09/05/14 pag. 16
Representational primacy
• Pursuit of faithful data replication and comprehension
• Above all else, we must show the data
• Can be a limiting notion
• May focus on low level tasks not relevant to users
• Some data is hard to represent faithfully (think high
dimensional data)
17. 09/05/14 pag. 17
Gaps between representation and analysis
• When explanatory or correlative models are the desired outcome, model
visualization may be more important than data visualization
• One end is to build black boxes where data goes in and the answer comes out
• Not a good idea to trust black boxes
– What
can
go
wrong
with
this?
– White
box
approach
be"er.
18. 09/05/14 pag. 18
Application of Precepts
Worldview-based tasks (“Did we show
the right thing to the user?”)
– Determine Domain Parameters
– Expose Multivariate Explanation
– Facilitate Hypothesis Testing
Rationale-based tasks (“Will the
user believe what she sees?”)
- Expose
Uncertainty
- ConcreIze
RelaIonships
- Expose
Cause
and
Effect
Amar
&
Stasko
2005
19. 09/05/14 pag. 19
Task level emphases
• Don’t just help “low-level” tasks
– Find, filter, correlate, etc.
• Facilitate analytical thinking
– Complex decision-making, especially under uncertainty
– Learning a domain
– Identifying the nature of trends
– Predicting the future
20. 09/05/14 pag. 20
History:
coining
the
term
• 2003-04 Jim Thomas of PNNL, together with colleagues, develops
notion of visual analytics
– Holds workshops at PNNL and at InfoVis ‘04 to help define a research agenda
• Agenda is formalized in book, Illuminating the Path
– Available online, http://nvac.pnl.gov/
• People use visual analytics tools and techniques to
– Synthesize information and derive insight from massive, dynamic, ambiguous, and
often conflicting data
– Detect the expected and discover the unexpected
– Provide timely, defensible, and understandable assessments
– Communicate assessment effectively for action.
Src: Stasko
21. 09/05/14 pag. 21
Visual
AnalyMcs
• Not really an “area” per se
– More of an “umbrella” notion
– Combines multiple areas or disciplines
– Ultimately about using data to improve our knowledge and help make
decisions
• Main Components:
Src: Stasko
22. 09/05/14 pag. 22
Visual Analytics: alternate Definition
Visual analytics combines automated analysis techniques with
interactive visualizations for an effective understanding,
reasoning and decision making on the basis of very large and
complex data sets
Keim et al, chapter in Information Visualization: Human-Centered Issues
and Perspectives, 2008
24. 09/05/14 pag. 24
Synergy
• Combine strengths of both human and electronic data
processing
– Gives a semi-automated analytical process
– Use strengths from each
– Below from Keim, 2008
25. 09/05/14 pag. 25
InfoVis Comparison
• Clearly much overlap
• Perhaps fair to say that infovis hasn’t always focused on
analysis tasks so much and that it doesn’t always include
advanced data analysis algorithms
– Not a criticism, just not focus
– InfoVis has a more narrow scope
Src: Stasko
26. 09/05/14 pag. 26
Visual Analytics
• Encompassing, integrated approach to data analysis
– Use computational algorithms where helpful
– Use human-directed visual exploration where helpful
– Not just “Apply A, then apply B” though
– Integrate the two tightly
Stasko, 2013
Src: Stasko
27. 09/05/14 pag. 27
Visual analytics aims at making data and
information processing transparent
just as information visualization has changed our view on databases,
the goal of visual analytics is to make our way of processing
data and information transparent for an analytic discourse
28. 09/05/14 pag. 28
The visual analytics process
The
visual
analyIcs
process
is
characterized
through
interacIon
between
data,
visualizaIons,
models
about
the
data,
and
the
users
in
order
to
discover
knowledge.
29. 09/05/14 pag. 29
Information seeking mantra for visual
analytics applications
Overview first, zoom/filter, details on demand
Analyze first, show the important, zoom/filter, analyze further,
details on demand.
Keim
et
al.
2006
35. 09/05/14 pag. 35
Relationship discovery
Scale independent representations, whole and parts at same time at
multiple levels of abstraction, often linked
36. 09/05/14 pag. 36
Relationship discovery
Explore high dimensional relationships, theme groupings, outlier
detection, searching by proximity at multiple scales
37. 09/05/14 pag. 37
Combined exploratory and confirmatory analytics
• Develop and refine hypothesis
• Evidence collection, management, and matching to hypothesis
• Tailor views/displays for thematic/hypothesis focus of interest
• Often suggestive of predictions enabling proactive thinking
38. 09/05/14 pag. 38
Multiple data types
• Supports multiple data types: structured/unstructured text
• Imagery/video, cyber
• Systems of either data type or application specific
39. 09/05/14 pag. 39
Temporal views and interactions
• Most analytics situations involve time, pace, velocity
• Group segments of thoughts by time
• Compare time segments
• Often combined with geospatial
41. 09/05/14 pag. 41
Grouping and outlier detection
• Form groups of thought/data
• Labels and annotation
• Compare groupings
• Find small groups or outliers
41
42. 09/05/14 pag. 42
Labeling
• Critically important,
• Dynamic in scope, number labels, size, color
• Positioning
• Almost everything has labels
• Labels tell semantic meaning
43. 09/05/14 pag. 43
Multiple linked views
Temporal, geospatial, theme, cluster, list views with association linkages
between views
45. 09/05/14 pag. 45
Challenge 1: scalability
• Challenge with regard to both:
– visual
representaIons
– automaIc
analysis
• Requirements/needs:
– SoluIon
needs
to
scale
in
size,
dimensionality,
data
types,
levels
of
quality.
– Methods
are
needed
to
deal
with
input
data
that
is
noisy
and
conInuous.
– Pa"erns
and
relaIonships
need
to
be
visualized
on
different
levels
of
details,
and
with
appropriate
levels
of
data
and
visual
abstracIon.
46. 09/05/14 pag. 46
Challenge 2: uncertainty
• Challenge:
– large
amount
of
noise
and
missing
values
from
heterogeneous
data
sources
– bias
introduced
by
automaIc
analysis
methods
as
well
as
human
percepIon
• Requirements/needs:
– representaIon
of
the
noIon
of
data
quality
and
the
confidence
– analysts
need
to
be
aware
of
the
uncertainty
and
be
able
to
analyze
quality
47. 09/05/14 pag. 47
Challenge 3: text data stream
• Analysis and visualization of text steams is still a relatively new field.
• Text stream data often
– have
li"le
structure,
grammar
and
context,
– are
provided
as
high-‐frequency
mulIlingual
stream,
– contain
a
high
percentage
of
non-‐meaningful
and
irrelevant
messages.
• The challenge is to handle the dynamic nature of the stream data
and the low-tolerance of monitoring delay in emergency cases.
48. 09/05/14 pag. 48
Challenge 4: interaction
• Novel interaction techniques and tangible user-interfaces for
seamless intuitive visual communication with the system.
• The analyst should be able to
– fully
focus
on
the
task
at
hand
and
– not
be
distracted
by
overly
technical
or
complex
user
interfaces
and
interacIons.
• User feedback should be taken as intelligently as possible, requiring
as little user input as possible.
49. 09/05/14 pag. 49
Challenge 5: evaluation
• Challenge:
– hard
to
assess
the
quality
of
visual
analyIcs
soluIons
– due
to
the
interdisciplinary
nature
and
complex
process
• Needs:
– evaluaIon
framework
to
assess
effecIveness,
efficiency
and
acceptance
– of
new
visual
analyIcs
techniques,
methods,
and
models
50. 09/05/14 pag. 50
Challenge 6: infrastructure
• Challenge:
– most
soluIons
develop
their
own
infrastructures
– mismatch
between
the
level
of
service
provided
and
real
needs:
• fast
and
precise
answers
with
progressive
refinement,
• incremental
re-‐computaIon,
• steering
the
computaIon
towards
data
regions
that
are
of
interest.
• Requirements/needs:
– infrastructure
to
bind
together
all
the
processes,
funcIons,
and
services
– repositories
of
available
visual
analyIcs
soluIons
54. 09/05/14 pag. 54
References
• Amar, R. A., & Stasko, J. T. (2005).
Knowledge precepts for design and evaluation of information
visualizations. Visualization and Computer Graphics, IEEE
Transactions on, 11(4), 432-442.
• D. A. Keim, F. Mansmann, J. Schneidewind, and H. Ziegler.
Challenges in visual data analysis. In Information Visualization
(IV 2006), Invited Paper, July 5-7, London, United Kingdom. IEEE
Press, 2006.
• Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten
Görg, Jörn Kohlhammer, and Guy Melancon. 2008. Visual
Analytics: Definition, Process, and Challenges. In Information
Visualization, Andreas Kerren, John T. Stasko, Jean-Daniel
Fekete, and Chris North (Eds.). Lecture Notes In Computer
Science, Vol. 4950. Springer-Verlag, Berlin, Heidelberg 154-175
• Shneiderman, B. (2002)
Inventing discovery tools: combining information visualization
with data mining1. Information visualization, 1(1), 5-12.