OpenML enables truly collaborative machine learning. Scientists can post important data, inviting anyone to help analyze it. OpenML structures and organizes all results online to show the state of the art and push progress.
OpenML is being integrated into the most popular machine learning environments, so you can automatically upload all your data, code, and experiments. And if you develop new tools, there's an API for that, plus people to help you.
OpenML allows you to search, compare, visualize, analyze, and download all combined results online. Explore the state of the art, improve it, build on it, ask questions, and start discussions.
3. Motivation
Galileo Galilei (1564–1642)
Created the best telescopes
Discovered the rings of Saturn
Sent anagrams of his discoveries instead of publishing the results
Jan N. van Rijn OpenML.org: Networked Science and IoT Data Streams November 24, 2016 2
5. OpenML.org
6. Datasets
Data (ARFF) uploaded or referenced, versioned
Analyzed, characterized and organized online
Indexed based on name, meta-features, tags, etc.
Support for other data formats (on request)
7. Tasks
Data alone does not define an experiment
Tasks contain: data, target attribute, goals, procedures
Machine-readable, so tools can automate experimentation
Real-time ‘leaderboard’ and overview
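To make "data + target + procedure" concrete, here is a minimal pure-Python sketch of what a machine-readable task automates. This is not the OpenML API; the names kfold_indices and run_task are illustrative, and the estimation procedure is fixed to k-fold cross-validation.

```python
from typing import Callable, List, Sequence, Tuple

def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in range(k) if f != i for j in folds[f]], folds[i])
        for i in range(k)
    ]

def run_task(X: Sequence, y: Sequence, k: int,
             fit: Callable, predict: Callable) -> float:
    """Apply the task's estimation procedure and return mean accuracy."""
    scores = []
    for train, test in kfold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)
```

Because the task fixes the folds and the target, any flow plugged into run_task produces directly comparable results, which is what makes the leaderboard possible.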
9. Flows (algorithms)
Run locally, auto-registered by tools
Integrations + APIs (REST, Java, R, Python, . . . )
from sklearn import tree
from openml import tasks, runs

task = tasks.get_task(59)
clf = tree.DecisionTreeClassifier()
run = runs.run_task(task, clf)
return_code, response = run.publish()
11. Runs
Flow uploads predictions
Predictions are evaluated on OpenML
Reproducible, linked to data, flows and researcher
Contains:
predictions
parameter settings
model information
evaluation measures
12. Analysis
Answer basic questions about the performance of algorithms, to study . . .
the effect / behaviour of parameters on a given algorithm
the effect of feature selection on a given algorithm
how algorithms behave with respect to each other
which algorithms perform well on a wide range of datasets
13. Effect of parameter
[Boxplot: predictive accuracy (93-99%) of RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1), REPTree(1)]
15. Effect of parameter
[Boxplot: predictive accuracy (93-99%) of RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1), REPTree(1)]
[Plot: optimal parameter value (2^1 to 2^8, log scale) against number of features (4 to 16384)]
16. Effect of Feature Selection
[Scatter plots: effect of feature selection (Better / Equal / Worse) by number of instances (256-65536) and number of features (1-16384), for k-NN (k = 1) and Naive Bayes]
17. Effect of Feature Selection
[Scatter plots: effect of feature selection (Better / Equal / Worse) by number of instances (256-65536) and number of features (1-16384), for k-NN (k = 1), Naive Bayes, Decision Tree (C4.5) and SVM (RBF Kernel)]
20. Performance of Algorithms
105 datasets, 30 classifiers
Friedman-Nemenyi test (α = 0.05)
[Critical-difference diagram, average ranks 1-30, in rank order: Logistic Model Tree, Random Forest, Bagging(REP Tree), AdaBoost(J48), FURIA, SMO(Poly Kernel), Simple Cart, LogitBoost(Decision Stump), Multilayer Perceptron (20), J48, Logistic, JRip, Multilayer Perceptron (10), REP Tree, k-NN (k=10), LAD Tree, Multilayer Perc. (10, 10), k-NN (k=1), Decision Table, Hoeffding Tree, SMO(RBF Kernel), Bayesian Network, AdaBoost(NaiveBayes), NaiveBayes, AdaBoost(DecisionStump), Random Tree, OneR, Conjunctive Rule, Hyper Pipes, OLM]
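The ranking behind such a diagram can be sketched in a few lines. This is only the rank-averaging step that precedes a Friedman-Nemenyi test, with hypothetical scores; it handles ties naively (no shared ranks), unlike the real test.

```python
def average_ranks(scores: dict) -> dict:
    """scores: {classifier: [accuracy per dataset]}.
    Rank classifiers on each dataset (1 = best) and average the ranks
    across datasets, as done before a Friedman-Nemenyi test.
    Ties are broken arbitrarily rather than assigned shared ranks."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {n: 0.0 for n in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda n: -scores[n][d])
        for rank, n in enumerate(ordered, start=1):
            totals[n] += rank
    return {n: totals[n] / n_datasets for n in names}
```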
21. Data Streams
Online learning
Many IoT applications in this paradigm
Example: Predict the electricity price for the next day
Feedback whether the prediction was correct
Model can become obsolete (concept drift)
[Plot: accuracy (0.5-0.95) per interval (0-40) for Hoeffding Tree, Naive Bayes, SPegasos and k-NN]
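The "model can become obsolete" point can be illustrated with a toy drift monitor. This is a deliberate simplification of detectors such as DDM (which uses the error rate's standard deviation, not a fixed tolerance); the class name and thresholds are illustrative.

```python
from collections import deque

class DriftMonitor:
    """Flag drift when accuracy over the most recent window falls more
    than `tolerance` below the best window accuracy seen so far."""
    def __init__(self, window: int = 100, tolerance: float = 0.1):
        self.window = deque(maxlen=window)
        self.best = 0.0
        self.tolerance = tolerance

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is flagged."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False          # not enough evidence yet
        acc = sum(self.window) / len(self.window)
        self.best = max(self.best, acc)
        return acc < self.best - self.tolerance
```

When drift is flagged, the model would typically be retrained or replaced, which is what the adaptive stream learners below automate.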
22. Performance of Data Streams Algorithms
[Boxplots: predictive accuracy (0.00-1.00) per stream classifier, from NoChange, MajorityClass, SPegasos (log/hinge loss) and SGD (log/hinge loss) at the low end, through DecisionStump, Perceptron, AWE(OneR), AWE(DecisionStump), RuleClassifier, RandomHoeffdingTree, NaiveBayes, kNN (k = 1, k = 10), AWE(REPTree), AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW (k = 10), AWE(J48), AWE(JRip), up to HoeffdingTree, ASHoeffdingTree, HoeffdingOptionTree and HoeffdingAdaptiveTree]
23. Performance of Data Streams Algorithms
[Critical-difference diagram, average ranks 1-25, best to worst: HoeffdingOptionTree, HoeffdingAdaptiveTree, HoeffdingTree, ASHoeffdingTree, AWE(J48), AWE(JRip), AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(REPTree), kNN k = 10, kNN k = 1, NaiveBayes, RandomHoeffdingTree, Perceptron, RuleClassifier, AWE(DecisionStump), AWE(OneR), SPegasos logloss, DecisionStump, SPegasos hingeloss, SGD hingeloss, SGD logloss, MajorityClass, NoChange]
25. Goal
Can we build a classifier that does better?
How can we use the experimental results in OpenML for this?
Probably! By combining them in a smart way (ensembles)
Approach: work on intervals of 1,000 observations
Task: try to predict for the next interval which classifier to use
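The interval-based selection can be sketched as follows. This is a simplified, pure-Python illustration (the function name select_per_interval is mine): for each interval, simply reuse whichever classifier was most accurate on the previous interval, which is the core idea behind heterogeneous-ensemble approaches such as BLAST.

```python
def select_per_interval(predictions: dict, truth: list, interval: int) -> list:
    """predictions: {classifier name: [predicted label per observation]}.
    For each interval, record the classifier that was most accurate on
    the previous interval; the first interval defaults to the first
    classifier, since no evidence is available yet."""
    names = list(predictions)
    chosen = []
    for start in range(0, len(truth), interval):
        if start == 0:
            chosen.append(names[0])
        else:
            prev = range(start - interval, start)
            best = max(names, key=lambda n: sum(
                predictions[n][i] == truth[i] for i in prev))
            chosen.append(best)
    return chosen
```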
26. The OpenML approach
Many data streams (and tasks) from various sources
Real world: electricity, forest covertype, airlines
Synthetic: Bayesian Network Generator, Moving Hyperplanes, LED
Meta-features per data stream
Direct access to all MOA classifiers
Experimental results
Models
Predictions
Measured Performance
29. Meta-Features
Category: Meta-features
Simple: # Instances, # Attributes, # Classes, Dimensionality, Default Accuracy, # Observations With Missing Values, # Missing Values, % Observations With Missing Values, % Missing Values, # Numeric Attributes, # Nominal Attributes, # Binary Attributes, % Numeric Attributes, % Nominal Attributes, % Binary Attributes, Majority Class Size, % Majority Class, Minority Class Size, % Minority Class
Statistical: Mean of Means of Numeric Attributes, Mean Standard Deviation of Numeric Attributes, Mean Kurtosis of Numeric Attributes, Mean Skewness of Numeric Attributes
Information Theoretic: Class Entropy, Mean Attribute Entropy, Mean Mutual Information, Equivalent Number Of Attributes, Noise to Signal Ratio
Landmarkers: Accuracy, Kappa and Area under the ROC Curve of the following classifiers: Decision Stump, J48 (confidence factor: 0.01), k-NN, NaiveBayes, REP Tree (maximum depth: 3)
Drift detection: Changes by Adwin (Hoeffding Tree), Warnings by Adwin (Hoeffding Tree), Changes by DDM (Hoeffding Tree), Warnings by DDM (Hoeffding Tree), Changes by Adwin (Naive Bayes), Warnings by Adwin (Naive Bayes), Changes by DDM (Naive Bayes), Warnings by DDM (Naive Bayes)
Stream Landmarkers: Accuracy of Naive Bayes on previous window, Accuracy of k-NN on previous window, . . .
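A few of the "Simple" meta-features above are easy to compute on a window of observations. A minimal sketch (the function name simple_meta_features is mine, and only a handful of the listed features are shown):

```python
def simple_meta_features(X: list, y: list) -> dict:
    """Compute a few 'Simple' meta-features for a window of observations.
    X: list of feature tuples, y: list of class labels."""
    n = len(X)
    class_counts = {}
    for label in y:
        class_counts[label] = class_counts.get(label, 0) + 1
    majority = max(class_counts.values())
    return {
        "NumberOfInstances": n,
        "NumberOfAttributes": len(X[0]),
        "NumberOfClasses": len(class_counts),
        "Dimensionality": len(X[0]) / n,
        "DefaultAccuracy": majority / n,   # accuracy of MajorityClass
        "MajorityClassSize": majority,
    }
```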
30. Stream Landmarkers
[Illustration: predictions of three learners l1, l2, l3 over a window w ending at the current observation c, each marked correct (✓) or wrong (✗); window accuracies 0.7, 0.7 and 0.8]
31. Stream Landmarkers
P(l, c, α, L) =
    1                                                         if c = 0
    P(l, c − 1, α, L) · α + (1 − L(l(PS_c), y_c)) · (1 − α)   otherwise   (1)

where l(PS_c) is learner l's prediction for observation PS_c, y_c the true label, L a loss function, and α ∈ [0, 1] a fading factor that discounts older observations.
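The recurrence in Eq. (1) unrolls into a simple loop over the observed losses. A minimal sketch, assuming the losses L(l(PS_1), y_1), ..., L(l(PS_c), y_c) have already been computed:

```python
def performance(losses: list, alpha: float) -> float:
    """Online performance estimate P(l, c, alpha, L) from Eq. (1),
    unrolled over the sequence of observed losses.
    alpha in [0, 1] is the fading factor: higher alpha = longer memory,
    alpha = 0 means only the most recent observation counts."""
    p = 1.0                                     # base case: c = 0
    for loss in losses:
        p = p * alpha + (1.0 - loss) * (1.0 - alpha)
    return p
```

With 0/1 loss, P is a fading-factor estimate of accuracy, so recent behaviour dominates and an obsolete learner's estimate decays quickly.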
33. Classifier Output Difference
25 online classifiers (data streams)
[Dendrogram: hierarchical clustering of the 25 stream classifiers by Classifier Output Difference, distance scale 0.0-0.6; members range from NoChange, SGD and SPegasos variants through MajorityClass, Perceptron, stump- and rule-based learners, nearest-neighbour variants, the Hoeffding Tree family, to the AWE ensembles, NaiveBayes, AWE(SMO) and AWE(Logistic)]
38. Conclusions
Two techniques
Online Performance Estimation
Ensemble of heterogeneous classifiers
Individual classifier performance is only average
Combination (BLAST) boosts performance considerably
Parameters to optimize:
Ensemble composition
Window size
Voting policy
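One possible voting policy can be sketched in a few lines. This is an illustrative weighted majority vote (the name weighted_vote is mine), where the weights could come from the online performance estimates of Eq. (1):

```python
def weighted_vote(predictions: dict, weights: dict):
    """Combine ensemble members' predictions by weighted majority vote.
    predictions: {member name: predicted label};
    weights: {member name: current performance estimate}."""
    tally = {}
    for name, label in predictions.items():
        tally[label] = tally.get(label, 0.0) + weights.get(name, 0.0)
    return max(tally, key=tally.get)
```

Setting all weights equal gives plain majority voting; keeping only the single best-weighted member recovers the per-interval selection described earlier, so the voting policy is indeed a tunable parameter.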
39. Thank you for your attention