Even though exploring data visually is an integral part of the data analytic pipeline, we struggle to visually explore data once the number of dimensions goes beyond three. This talk will focus on showcasing techniques to visually explore multi-dimensional data (p > 3). The aim is to show examples of each of the following techniques, potentially using one exemplar dataset. This talk was given at the Strata + Hadoop World Conference @ Singapore 2015 and at the Fifth Elephant conference @ Bangalore, 2015.
7. Visual Wired Brain
70% of the sensory receptors are in the eyes
50% of the brain used for visual processing
100 ms to get a sense of the visual scene
13. Acquire Data → Parse Variables
Area     Sales
North    5
East     25
West     15
South    20
Central  10
x (C, categorical) = Area
y (Q, quantitative) = Sales
x  y
1  5
2  25
3  15
4  20
5  10
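The "Parse Variables" step can be sketched in a few lines: the categorical Area column is replaced by integer codes (1 to 5, as in the x column above) while the quantitative Sales column passes through unchanged. This is a minimal sketch assuming codes are assigned in order of first appearance; `parse_variables` is a hypothetical helper, not part of any library.

```python
# Raw (Area, Sales) pairs as acquired on the slide.
raw = [("North", 5), ("East", 25), ("West", 15), ("South", 20), ("Central", 10)]

def parse_variables(rows):
    """Hypothetical helper: code the categorical column (C) as 1-based
    integers in order of first appearance; keep the quantitative column (Q)."""
    codes = {}
    parsed = []
    for area, sales in rows:
        if area not in codes:
            codes[area] = len(codes) + 1
        parsed.append((codes[area], sales))
    return codes, parsed

codes, xy = parse_variables(raw)
print(codes)  # {'North': 1, 'East': 2, 'West': 3, 'South': 4, 'Central': 5}
print(xy)     # [(1, 5), (2, 25), (3, 15), (4, 20), (5, 10)]
```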
14. Encode Shape & Select Scales
x - position, y - bar
scale - 200 x 200
[Figure: bar chart of Sales by Area; axis ticks at 20, 60, 100, 140, 180]
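Selecting scales means turning data values into pixel values on the 200 x 200 canvas. A minimal sketch, assuming a band scale for the categorical x (each of the 5 areas gets a 40 px band and is placed at the band centre) and a linear scale from 0 to the maximum Sales for the bar height. Under these assumptions the bar centres come out at 20, 60, 100, 140, 180, which appears to match the tick values on the slide. The function names are illustrative, not a specific library's API.

```python
WIDTH, HEIGHT = 200, 200  # canvas size from the slide

def band_scale(n_categories, width):
    """Return a function mapping a 0-based category index to its band centre (px)."""
    band = width / n_categories
    return lambda i: (i + 0.5) * band

def linear_scale(domain_max, range_max):
    """Return a function mapping 0..domain_max linearly onto 0..range_max (px)."""
    return lambda v: v * range_max / domain_max

xy = [(1, 5), (2, 25), (3, 15), (4, 20), (5, 10)]  # parsed (code, Sales)
x_px = band_scale(len(xy), WIDTH)
y_px = linear_scale(max(y for _, y in xy), HEIGHT)

centres = [x_px(x - 1) for x, _ in xy]  # codes are 1-based, bands 0-based
heights = [y_px(y) for _, y in xy]
print(centres)  # [20.0, 60.0, 100.0, 140.0, 180.0]
print(heights)  # [40.0, 200.0, 120.0, 160.0, 80.0]
```

Scaling the largest value to the full 200 px height is a design choice; a real chart would usually leave headroom or round the domain up to a "nice" number.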
15. Render with Cartesian Coordinates
[Figure: bar chart of Sales by Area (x - position, y - bar, 200 x 200 scale) rendered in Cartesian coordinates]
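Rendering in Cartesian coordinates has one practical wrinkle: the Cartesian y axis grows upwards, while most screen formats (SVG included) grow downwards. A minimal sketch that writes the bars as SVG `rect` elements, assuming the pixel centres and heights from a band/linear scale on a 200 x 200 canvas (hard-coded here) and an arbitrary bar width of 30 px.

```python
WIDTH, HEIGHT, BAR_WIDTH = 200, 200, 30  # BAR_WIDTH is an arbitrary choice

centres = [20.0, 60.0, 100.0, 140.0, 180.0]  # x positions (px)
heights = [40.0, 200.0, 120.0, 160.0, 80.0]  # bar heights (px)

def render_bars(centres, heights):
    rects = []
    for cx, h in zip(centres, heights):
        # Cartesian y points up, SVG y points down: a bar of height h
        # therefore starts at HEIGHT - h and extends down to the baseline.
        rects.append(
            f'<rect x="{cx - BAR_WIDTH / 2}" y="{HEIGHT - h}" '
            f'width="{BAR_WIDTH}" height="{h}"/>'
        )
    body = "\n  ".join(rects)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{WIDTH}" height="{HEIGHT}">\n  {body}\n</svg>')

svg = render_bars(centres, heights)
print(svg)
```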
16. Create Visualisations
Points - Line - Bar
Bar - Stacked Bar - Stagger
Coordinate System
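Moving from "Bar" to "Stacked Bar" keeps the same data but changes where each mark starts: every segment begins where the previous one ended. A minimal sketch, assuming a single stacked bar built from the Area/Sales values in the order listed; `stack` is a hypothetical helper.

```python
from itertools import accumulate

sales = {"North": 5, "East": 25, "West": 15, "South": 20, "Central": 10}

def stack(values):
    """Return (start, end) for each value when stacked end to end."""
    ends = list(accumulate(values))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

segments = dict(zip(sales, stack(sales.values())))
print(segments)
# {'North': (0, 5), 'East': (5, 30), 'West': (30, 45), 'South': (45, 65), 'Central': (65, 75)}
```

The (start, end) pairs would then go through the same linear scale as ordinary bars, so only this position step changes between the two charts.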
29. Visualise Big Data
x, y => 1,000,000 points
Comparable to the number of pixels on my MacBook Air (1400 x 900)
30. Data / Sample
Sampling can be effective (with overweighting of unusual values), but requires multiple plots or careful tuning of parameters.
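One simple way to overweight unusual values is to keep every outlier and fill the rest of the budget with a uniform random sample. A minimal sketch, assuming "unusual" means more than `z` standard deviations from the mean; both `z` and the sample size `n` are exactly the kind of tuning parameters the slide warns about. `sample_with_outliers` is a hypothetical helper.

```python
import random
import statistics

def sample_with_outliers(values, n, z=3.0, seed=0):
    """Keep all values more than z standard deviations from the mean,
    then top up to n points with a uniform random sample of the rest."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    unusual, usual = [], []
    for v in values:
        (unusual if sd and abs(v - mu) > z * sd else usual).append(v)
    rng = random.Random(seed)
    k = max(0, min(len(usual), n - len(unusual)))
    return unusual + rng.sample(usual, k)

rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(100_000)] + [25.0, -30.0]
sample = sample_with_outliers(data, 1_000)
print(len(sample), 25.0 in sample, -30.0 in sample)
```

A plain uniform sample of 1,000 from 100,002 points would almost certainly miss the two injected outliers; this version always keeps them.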
31. Data / Sample / Model
Models are great as they scale nicely. But visualisation is still required, as “I don’t know what I don’t know.”
32. Data / Sample / Model / Binning
Binning can solve a lot of these challenges:
“Bin-summarise-smooth: A framework for visualising large data” - Hadley Wickham (2013)
“imMens: Real-time Visual Querying of Big Data” - Liu, Jiang, Heer (2013)
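The core of binning is that the output size depends on the number of bins, not the number of points. A minimal sketch of the bin and summarise steps for 2D points (fixed-width rectangular bins, summarised by counts); the smoothing step of Wickham's framework is not shown, and `bin2d` is a hypothetical helper rather than either paper's implementation.

```python
import math
import random
from collections import Counter

def bin2d(points, x_width, y_width):
    """Bin: map each (x, y) to an integer bin index.
    Summarise: count how many points fall in each bin."""
    return Counter(
        (math.floor(x / x_width), math.floor(y / y_width)) for x, y in points
    )

small = bin2d([(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (3.2, 4.9)], 1.0, 1.0)
print(small)  # Counter({(0, 0): 2, (1, 0): 1, (3, 4): 1})

rng = random.Random(1)
many = [(rng.random() * 10, rng.random() * 10) for _ in range(200_000)]
summary = bin2d(many, 1.0, 1.0)
print(len(many), "points ->", len(summary), "bins")
```

The 200,000 points collapse to at most 100 bins, which can then be drawn as a heatmap without overplotting.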
39. Multi Dimensional Viz
Standard 2d/3d: Scatterplot, SPLOM, Trellis / Facets, Multiple View
Pixel Based Approach: Space Filling, Pixel Bar Chart, Spiral Technique
Glyph Approach: Star plots, Stick Figure, Chernoff Faces, Color Icons
Geometric Transforms: Parallel Coord, Table lens, Star Coords, Tours
Stacking Approach: Treemaps, Dimensional Stacking, Hierarchical Axis
40. Multi Dimensional Viz
[Figure: the same taxonomy as slide 39, annotated with two dimensions: Need for Interaction and Ease of Interpretation]
42. Diamonds dataset
50K+ observations of 10 dimensions
Price of diamonds is related to the 4C’s:
price: in US$
carat: weight (⅕ of a gram)
cut: 5 levels [Fair to Ideal]
colour: 7 levels [J to D]
clarity: 8 levels [I1 to IF]
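Cut, colour and clarity are ordered categories, so they should be placed on an axis by rank rather than alphabetically. A minimal sketch, assuming the level orderings used in the commonly distributed diamonds dataset (worst to best, matching the ranges above); `ordinal` is a hypothetical helper.

```python
# Level orderings from worst to best, matching the ranges on the slide.
CUT = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOUR = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

def ordinal(levels):
    """Map each level to its 0-based rank so it can be placed on an axis."""
    return {level: rank for rank, level in enumerate(levels)}

cut_rank, colour_rank, clarity_rank = map(ordinal, (CUT, COLOUR, CLARITY))
print(len(CUT), len(COLOUR), len(CLARITY))                        # 5 7 8
print(cut_rank["Premium"], colour_rank["E"], clarity_rank["SI2"])  # 3 5 1
```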
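43. Diamonds dataset
[Figure: diamond schematic labelling x, y, z, depth and table width]
x: length (mm)
y: width (mm)
z: height (mm)
depth: total depth %, z relative to the mean of x and y
table: width of the top of the diamond relative to its widest point (%)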
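The depth column is derived from the other three dimensions: depth % = z / mean(x, y) × 100 = 200 z / (x + y), the definition given in the ggplot2 documentation for this dataset. A short check against the six sample rows listed in slide 44; the computed values agree with the reported depth to within a few tenths, presumably because the stored x, y, z are rounded to two decimals.

```python
def depth_pct(x, y, z):
    """Total depth percentage: height z relative to the mean of x and y."""
    return 200 * z / (x + y)

# (x, y, z, reported depth) for the six sample diamonds in slide 44
sample = [
    (3.95, 3.98, 2.43, 61.5),
    (3.89, 3.84, 2.31, 59.8),
    (4.05, 4.07, 2.31, 56.9),
    (4.20, 4.23, 2.63, 62.4),
    (4.34, 4.35, 2.75, 63.3),
    (3.94, 3.96, 2.48, 62.8),
]
for x, y, z, reported in sample:
    print(f"computed {depth_pct(x, y, z):5.1f}  reported {reported}")
```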
44. Diamonds dataset
price  carat  cut        color  clarity  x     y     z     depth  table
326    0.23   Ideal      E      SI2      3.95  3.98  2.43  61.5   55
326    0.21   Premium    E      SI1      3.89  3.84  2.31  59.8   61
327    0.23   Good       E      VS1      4.05  4.07  2.31  56.9   65
334    0.29   Premium    I      VS2      4.2   4.23  2.63  62.4   58
335    0.31   Good       J      SI2      4.34  4.35  2.75  63.3   58
336    0.24   Very Good  J      VVS2     3.94  3.96  2.48  62.8   57
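Parallel coordinates (one of the geometric transforms in the taxonomy) draw each observation as a polyline across one vertical axis per dimension. Because price, carat and depth live on very different ranges, each column is usually rescaled first. A minimal sketch using min-max normalisation on the numeric columns of the six diamonds in slide 44; the helper names are hypothetical, and the output is the list of polyline vertices rather than a drawing.

```python
COLUMNS = ["price", "carat", "x", "y", "z", "depth", "table"]
ROWS = [
    [326, 0.23, 3.95, 3.98, 2.43, 61.5, 55],
    [326, 0.21, 3.89, 3.84, 2.31, 59.8, 61],
    [327, 0.23, 4.05, 4.07, 2.31, 56.9, 65],
    [334, 0.29, 4.20, 4.23, 2.63, 62.4, 58],
    [335, 0.31, 4.34, 4.35, 2.75, 63.3, 58],
    [336, 0.24, 3.94, 3.96, 2.48, 62.8, 57],
]

def min_max_normalise(rows):
    """Rescale every column to [0, 1] so all axes share one vertical scale."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.5 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def polylines(rows):
    """One polyline per observation: vertex i sits on vertical axis i."""
    return [list(enumerate(row)) for row in min_max_normalise(rows)]

lines = polylines(ROWS)
for line in lines:
    print([(COLUMNS[axis], round(v, 2)) for axis, v in line])
```

Min-max scaling is the simplest choice but is sensitive to outliers; with the full 50K+ rows a quantile-based scaling may separate the lines better.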
81. Pixel Based Approach
[Figures: Spiral Pixel Curve (VisDB - Keim); Pixel Bar Chart (Keim)]
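Pixel-based techniques give each data value exactly one pixel, so the layout order carries much of the meaning. A minimal sketch of a spiral layout in the spirit of the spiral technique: values are sorted by some relevance score and laid out from the centre outwards in a square spiral, so the most relevant items cluster in the middle. The sorting criterion and helper name are assumptions for illustration, not Keim's exact algorithm.

```python
def spiral_positions(n):
    """Return n (col, row) pixel offsets in a square spiral from (0, 0):
    right 1, down 1, left 2, up 2, right 3, ... (screen coordinates)."""
    x = y = 0
    dx, dy = 1, 0
    step = 1
    out = [(0, 0)]
    while len(out) < n:
        for _ in range(2):          # each step length is used for two legs
            for _ in range(step):
                if len(out) == n:
                    return out
                x, y = x + dx, y + dy
                out.append((x, y))
            dx, dy = -dy, dx        # turn 90 degrees
        step += 1
    return out[:n]

values = [5, 25, 15, 20, 10, 30, 0, 12, 18]
ordered = sorted(values, reverse=True)              # most relevant first
pixels = dict(zip(spiral_positions(len(ordered)), ordered))
print(pixels)
```

Each pixel would then be coloured by its value; with one pixel per observation, a 1400 x 900 screen could show over a million values at once.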
82. Data Viz Process (Wide Data)
Acquire Data, Parse Variables, Filter Data, Aggregate Data, Encode Shape, Select Scales, Render Algorithm, Make Views, Add Interactivity
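Filtering and aggregating shrink wide data before anything is encoded or rendered. A minimal sketch on the six sample diamonds from slide 44: keep stones under 0.30 carat (an arbitrary threshold chosen for illustration), then compute the mean price per cut. `filter_rows` and `aggregate` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import fmean

# (price, carat, cut) for the six sample diamonds
diamonds = [
    (326, 0.23, "Ideal"), (326, 0.21, "Premium"), (327, 0.23, "Good"),
    (334, 0.29, "Premium"), (335, 0.31, "Good"), (336, 0.24, "Very Good"),
]

def filter_rows(rows, predicate):
    """Filter Data: keep only rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def aggregate(rows, key, value):
    """Aggregate Data: mean of value(row) for each group key(row)."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(value(r))
    return {k: fmean(v) for k, v in groups.items()}

small = filter_rows(diamonds, lambda r: r[1] < 0.30)
mean_price = aggregate(small, key=lambda r: r[2], value=lambda r: r[0])
print(mean_price)  # {'Ideal': 326.0, 'Premium': 330.0, 'Good': 327.0, 'Very Good': 336.0}
```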
83. Data Viz Process (Wide Data)
1. Encode wisely
2. Use space and multiples
3. Add interactivity
4. Reduce dimensions