On Integrating Information Visualization Techniques into Data Mining: A Review summarizes how information visualization (IV) can be integrated into the three main stages of data mining: preliminary data analysis, model construction, and model evaluation. IV supports preliminary exploration of data characteristics and relationships, enables interactive model construction by incorporating user feedback, and provides visualizations for interpreting and evaluating model performance and conclusions. The paper surveys a wide range of existing visualization techniques applied at each stage of the data mining process.
1. On Integrating Information Visualization
Techniques into Data Mining: A Review
by
Keqian Li (2015)
Sushant Gautam
MSIISE
Department of Electronics and Computer Engineering
Institute of Engineering, Thapathali Campus
6 April 2022
2. Background
● exploding growth of digital data in the information era
○ immense potential value
● need for different types of data-driven techniques
○ to exploit the available data
● Data mining and information visualization
○ different communities and approaches to problem solving
● Combining the two:
○ DM: sophisticated algorithmic techniques
○ IV: intuitiveness and interactivity
3. Stages of the typical data mining process
● Preliminary data analysis
● Model construction
● Model evaluation
5. PRELIMINARY DATA ANALYSIS
● lightweight analysis before the main data mining algorithm
● shows the basic characteristics of the data
● gives a quick, intuitive grasp of the data
● helps formulate a strategy for further heavyweight analysis
○ facilitates the subsequent data mining approach
● visualization for preliminary data analysis
○ visual representation, arrangement, and simple manipulation
6. Scalar value data visualization
● Scalar value distribution
○ Box-and-whisker plot 🖼, variable-width box plots, histograms
○ Bagplot and functional boxplot (envelope) 🖼
● Visualizing scalar values along other dimensions
○ study the relation between the scalar data and other dimensions
■ e.g., time series analysis of stock prices
○ Scatter plots 🖼, spatial index
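The five summary statistics a box-and-whisker plot draws can be computed directly. A minimal sketch in Python; the `boxplot_stats` helper and the 1.5×IQR whisker rule follow the common Tukey convention and are illustrative, not something specified in the paper:

```python
import statistics

def boxplot_stats(values):
    """Compute the summary statistics a box-and-whisker plot encodes."""
    s = sorted(values)
    q1, median, q3 = statistics.quantiles(s, n=4)   # quartiles
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    whisker_lo = min(v for v in s if v >= lo_fence)  # lowest value inside fence
    whisker_hi = max(v for v in s if v <= hi_fence)  # highest value inside fence
    outliers = [v for v in s if v < lo_fence or v > hi_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_lo": whisker_lo, "whisker_hi": whisker_hi,
            "outliers": outliers}

stats = boxplot_stats([1, 2, 2, 3, 3, 3, 4, 4, 5, 100])
```

Here the extreme value 100 falls outside the upper fence and would be drawn as an outlier point rather than extending the whisker.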
7.
The same distribution shown with a box plot and a probability density function
2D scatter plot with contours and a fitted regression line
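The regression line overlaid on such a scatter plot is typically an ordinary least-squares fit. A minimal sketch; the `fit_line` helper is illustrative, not from the paper:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x, as drawn over a scatter plot."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx   # intercept passes through the mean point
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # data lies exactly on y = 1 + 2x
```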
9. Multi-dimensional data visualization
● Glyph-based vector visualization 🖼
○ vector fields or even tensor fields
○ Chernoff faces 🖼
● Encoding each dimension similarly
○ Scatterplot matrix
○ Parallel Scatterplot Matrix
○ Parallel Coordinates techniques
● 3D visual data mining
○ 3D Parallel Coordinate Trees 🖼
○ Equalized Density Surfaces 🖼
○ 3D scatterplots
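Parallel coordinates draw each record as a polyline across one vertical axis per dimension, which requires normalizing every dimension to a common range. A minimal sketch of that preparation step; the `parallel_coords` helper is illustrative:

```python
def parallel_coords(records):
    """Normalize each dimension to [0, 1]; each record becomes the list of
    vertical positions of one polyline across the parallel axes."""
    dims = list(zip(*records))                    # column-wise view of the data
    spans = [(min(d), max(d)) for d in dims]      # per-axis value range
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5
         for v, (lo, hi) in zip(rec, spans)]
        for rec in records
    ]

lines = parallel_coords([(1, 10, 100), (2, 20, 300), (3, 30, 200)])
```

Each inner list can then be drawn as one polyline; crossing patterns between adjacent axes reveal inverse relations between those dimensions.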
10.
Examples of glyphs:
(a) Stick figures form textural patterns
(b) Dense glyph packing for diffusion tensor data
(c) Helix glyphs on maps for analyzing cyclic temporal patterns for two diseases
(d) The local flow probe can simultaneously depict a multitude of different variables
3D Parallel Coordinate Trees
13. Bipartite flow visualization
● Hierarchical heavy hitters
● NFlowVis 🖼: network flow between local hosts and external IPs
Computed lattice for a bipartite relationship
3D visualization for binary heavy hitters
17.
MatLink: a matrix view enhanced with links and highlighting
NodeTrix using a matrix view for local community analysis
18. Text Data
WWW: unstructured data beyond the multidimensional framework described above
● bag of words
● Analysis in high-dimensional space
○ ThemeRiver 🖼, the BETA system 🖼, TextArc 🖼
● Analysis with dimension reduction
○ Navigating Wikipedia with a topic model 🖼
○ Starlight's text visualization system 🖼
19.
StreamGraph, a variant of which is used in ThemeRiver: a system using the word-frequency vector to analyze the text
The BETA system uses entity-based analysis to support search-query visualization
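The word-frequency vector underlying this kind of text analysis is a plain bag-of-words encoding. A minimal sketch; the vocabulary and document are made up for illustration:

```python
from collections import Counter

def bag_of_words(doc, vocabulary):
    """Map a document to a word-frequency vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())         # naive whitespace tokenizer
    return [counts[w] for w in vocabulary]        # 0 for absent words

vocab = ["data", "mining", "visualization"]
vec = bag_of_words("Data mining and data visualization", vocab)
```

Stacking such vectors over time buckets yields the per-theme strength series that a ThemeRiver-style layout turns into stacked flowing bands.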
26. VISUALIZATION FOR MODEL CONSTRUCTION
● Data mining: highly automated models
● marrying the classical techniques with ideas from info vis
○ has stimulated research on interactive model construction
○ visualization is provided for partial results of the current model
○ the user can provide feedback to change the behavior of the process
○ incorporates automatic analysis with human wisdom
○ greatly increases the interpretability of the model
31.
The system by Seifert et al. The classification probability view allows users to select a falsely classified item, assign it to the right class, and retrain the classifier.
32.
Primary view in EnsembleMatrix. Confusion matrices of component classifiers are shown in thumbnails on the right. The matrix on the left shows the confusion matrix of the current ensemble classifier built by the user.
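The confusion matrices shown in EnsembleMatrix are simple counts of true versus predicted classes. A minimal sketch of how one such matrix is computed (not EnsembleMatrix's actual code):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1       # one count per (true, predicted) pair
    return m

m = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
```

Off-diagonal cells are the misclassifications; a heatmap over this matrix is exactly the thumbnail view the tool shows per component classifier.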
33. Interactive pipeline construction
● Each submodule treated as a black box
● A visualization system lets users assemble submodules
● Weka
● RapidMiner, KNIME, STATISTICA, Orange
34.
The Overview Panel in the RapidMiner system shows the entire data mining pipeline.
Weka panel
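The pipeline idea these tools share, submodules as black boxes chained output-to-input, can be sketched with plain function composition. The stages below are hypothetical placeholders, not RapidMiner or Weka APIs:

```python
from functools import reduce

def pipeline(*stages):
    """Chain submodules treated as black boxes: output of one feeds the next."""
    return lambda data: reduce(lambda d, stage: stage(d), stages, data)

# Hypothetical stages for illustration only
clean   = lambda xs: [x for x in xs if x is not None]   # drop missing values
scale   = lambda xs: [x / max(xs) for x in xs]          # normalize to [0, 1]
summary = lambda xs: sum(xs) / len(xs)                  # reduce to one number

run = pipeline(clean, scale, summary)
result = run([2, None, 4, 8])
```

A visual pipeline editor essentially builds this composition graphically, letting the user rewire `stages` without touching code.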
36. VISUALIZATION FOR MODEL EVALUATION
● visually convey the results of a mining task
○ such as classification or clustering
● answer the questions
○ what the conclusions are
○ why a model makes such conclusions
○ how well the model performs
● critical for human decision makers analyzing model behavior
37. Classification
● Test-oriented visualization
○ ROC curves
○ Contingency Wheel++
● Instance-space-based visualization
○ SPLOM-based visualization tool 🖼
○ Projection techniques based on self-organizing maps (SOM)
● Factor analysis
○ Nomograms
○ CViz for multivariate data visualization
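A ROC curve is traced by sweeping the classifier's decision threshold and recording the false-positive and true-positive rates at each step. A minimal sketch; it assumes binary labels with at least one positive and one negative example:

```python
def roc_points(scores, labels):
    """Sweep the decision threshold to trace the (FPR, TPR) points of a ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]                 # threshold above every score
    tp = fp = 0
    for _, lab in sorted(zip(scores, labels), reverse=True):
        if lab:
            tp += 1                    # lowering the threshold admits a positive
        else:
            fp += 1                    # ... or a negative
        pts.append((fp / neg, tp / pos))
    return pts

pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])   # perfect ranking
```

With a perfect ranking the curve passes through the ideal corner (FPR 0, TPR 1); a random classifier would hug the diagonal instead.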
39.
Contingency Wheel++
Alsallakh et al.'s tool, showing
(a) the confusion wheel
(b) the feature analysis view
(c, d) histograms and scatterplots for the separability of the selected features
43.
Bicluster visualization, showing the overlap, similarity, and separation of clusters and entities
H-BLOB visualizes clusters by computing a hierarchy of implicit surfaces
46.
(a) Double-decker plot for the OvaryCancer data showing the conditional distribution of X-ray, given operation, given stage, and with survival highlighted.
50.
Visualization of neurons shows the evolution of randomly chosen neurons in different layers through the training process. Each layer's features are displayed in a different horizontal block.
Deep learning for building high-level features
51. Graphical Probabilistic Models
● capture dependencies among random variables
● build large-scale multivariate statistical models
● each node encodes a variable; each link encodes a dependency
FluxFlow 🖼
53. Conclusions
● Surveyed the integration of the wisdom of the two fields toward more effective and efficient data analytics
● Info vis complements data mining in all three key stages.
Editor's notes
Explore how information visualization can be used to complement and improve different stages of data mining, through established theories for optimized visual presentation as well as practical toolsets for rapid development.
Off-the-shelf visualization techniques, both from recent data mining and visual data analytics research and from classical statistical plotting, grouped by the type of data to be visualized.
For a large group of values, one may be interested in the overall trend and aggregation of these values instead of observing individual ones.
More specifically, here we are interested in visualizing the distribution of scalar values.
In statistical graphics, the functional boxplot is an informative exploratory tool that has been proposed for visualizing functional data.
Again, analogous to the classical boxplot, it uses the envelope of the central 50% of the curves.
In archaeology, a glyph is a carved or inscribed symbol
A visualization technique for a specific type of relation between data, namely the flow intensity between two bipartite dimensions, showing how a visual idiom can be developed based on the nature of a specific type of relation.
The picture shows attacks from the Internet on computers located at MIT.
The background represents the university's network structure, with computer systems as rectangles. External hosts are shown as colored circles on the outside. The splines represent the connections between attackers and computers within the network. This reveals a network scan (from the top) and a distributed attack (bottom) originating from hundreds of hosts working together in an attempt to break into specific computer systems.
Network data arises in many important applications, such as online social media and physical or biological interaction networks, and more generally any set of entities with interdependencies among each other, underscored by the surge of social media over recent years, such as Twitter and Facebook [25].
NodeTrix: the InfoVis 2004 coauthorship dataset, used to identify three types of inner community structure.
Concordance: alphabetical list of the words (especially the important ones) present in a text, usually with citations of the passages in which they are found.
The scatterplot matrix, known acronymically as SPLOM
per-iteration performance information for the algorithm
It places these histograms in a ring chart whose sectors represent different classes.
In contrast to the matrix, this layout emphasizes the classes as primary visual objects, with all information related to a class grouped in one place. For each class, the samples S are divided into four sets according to their classification results, TP, FP, TN, and FN, further divided by confidence.
The confusion wheel depicts class confusions via chords between the sectors. A chord between two classes is drawn with varying thickness denoting the confusion rate for the corresponding pair. The feature view on the right emphasizes the features of these samples and ranks them by their separation power between selected true and false classifications, using techniques such as boxplots, stacked histograms, separation measures, and recall-precision graphs to analyze how the data features are distributed among the affected samples.
The system also allows interactively defining and evaluating post-classification rules.
A representation of the model where the data space has been projected to two dimensions using a SOM.
The decision boundary, i.e. the boundary between positive and negative predictions projected on the 2D plane, is shown by a white line.
Glyph size indicates the number of instances at a given point. A continuous color map is used to show the proportion of class labels in the set of collocated instances: yellow shows predominantly positive instances, red shows predominantly negative instances.
Each test instance is encoded as a small sphere-shaped glyph, with size indicating the density of instances and a continuous color indicating the proportion of class labels among collocated instances, yellow for positive and red for negative.
Notice that in the plot many of the misclassified instances are located in the vicinity of the decision boundary, showing they are not too far from "right".
The process of generating conclusions for a classifier usually involves different factors from the features of the data instances.
Visualizing the relationship between the different factors and the final conclusions for different data instances sheds light on the question of why a model makes such conclusions.
Double-decker plots [60] provide a visualization for single association rules as well as for all their related rules. They were introduced to visualize each element of a multivariate contingency table as a tile in the plot, and they have been adapted to visualize all the attributes involved in a rule by drawing a bar chart for the consequent item and using linked highlighting for the antecedent items.
Parallel coordinates have also been used for AR visualization to deal with the high dimensionality of the rules [15, 72, 121]. For example, the approach proposed by [121] starts by arranging items in groups on a number of parallel axes. A rule is represented as a polyline joining the items in the antecedent, followed by an arrow connecting another polyline for the items in the consequent.
A network representation of ARs is provided where items and rules are encoded as nodes, and each rule node is linked by edges to its corresponding items in the antecedent and consequent. The support values for the antecedent and consequent of each association rule are indicated by the sizes and colors of each circle. The thickness of each line indicates the confidence value, while the sizes and colors of the circles in the center, above the Implies label, indicate the support of each rule.
The TwoKey plot is a 2D approach to AR visualization that represents the rules according to their confidence and support values. In such a plot, each rule is a point in a 2D space where the x-axis and the y-axis range, respectively, from the minimum to the maximum values of the supports and of the confidences, and different colors are used to highlight the order of the rules.
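The two coordinates a TwoKey plot assigns to each rule, support and confidence, can be computed directly from the transactions. A minimal sketch; the transactions are made-up examples:

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent,
    the two coordinates a TwoKey plot uses for each rule."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)           # antecedent count
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return both / n, both / a        # (support, confidence)

txns = [{"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}, {"bread"}]
sup, conf = support_confidence(txns, {"milk"}, {"bread"})
```

Plotting every mined rule at its (support, confidence) point gives the TwoKey scatter, where thresholding becomes a rectangular selection.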
Figure (a) shows the U-matrix visualization, where white dots represent the neurons and hexagons represent the values of the U-matrix.
The gray scale encodes the distance to the neighboring unit: darker color means a bigger distance. Clusters in the plot can be observed as light areas enclosed by darker borders.
(b) uses the size of each map unit to encode the average distance to its neighbors.
(c) takes yet another approach, encoding closeness in input space by similarity in the hue of colors for each hexagon area.
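The U-matrix values themselves are just average distances between neighboring neurons' weight vectors. A minimal sketch for a 1-D SOM, a simplification of the hexagonal 2-D grid shown in the figure:

```python
def u_matrix_1d(weights):
    """U-matrix values for a 1-D SOM: each entry is the average Euclidean
    distance from a neuron's weight vector to its immediate neighbors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    u = []
    for i, w in enumerate(weights):
        nbrs = [weights[j] for j in (i - 1, i + 1) if 0 <= j < len(weights)]
        u.append(sum(dist(w, nb) for nb in nbrs) / len(nbrs))
    return u

u = u_matrix_1d([(0.0, 0.0), (0.0, 1.0), (0.0, 5.0)])
```

The jump between the second and third neurons produces a high U-matrix value, which on the map would appear as the dark border separating two light cluster regions.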
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
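Classical MDS recovers such an embedding from a distance matrix via double centering and an eigendecomposition. A minimal sketch using NumPy; this is the textbook formulation, not the specific method used in any surveyed system:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed points in k dimensions from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:k]            # keep the largest eigenvalues
    L = np.sqrt(np.maximum(w[order], 0.0))     # clamp tiny negatives to zero
    return V[:, order] * L

# three collinear points at mutual distances 1 and 2 along a line
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X = classical_mds(D, k=1)
```

For this collinear example the 1-D embedding reproduces the pairwise distances exactly, up to translation and sign.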