Exploratory Data Analysis
Using XGBoost
Exploratory data analysis using XGBoost
1st R Study Meetup @ Sendai (#Sendai.R)
Who?
Works at a clinical laboratory testing company
Specialty?
Nomadic herding in Mongolia (ecology / environmental science)
▼
Research institute of a clinical laboratory business (a job of reshaping data long and wide)
@kato_kohaku
Exploratory Data Analysis (EDA)
https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
is an approach/philosophy for data analysis that employs a variety of
techniques (mostly graphical) to
1. maximize insight into a data set;
2. uncover underlying structure;
3. extract important variables;
4. detect outliers and anomalies;
5. test underlying assumptions;
6. develop parsimonious models; and
7. determine optimal factor settings.
EDA (or explanation) after modelling
Taxonomy of Interpretation / Explanation
https://christophm.github.io/interpretable-ml-book/
EDA using Random Forest (EDARF)
Exploratory data analysis using randomForest (off-topic)
Random Forest
model
Imputation for missing
 rfImpute()
 {missForest}
Rule Extraction
 {inTrees}
 defragTrees@python
 EDARF::plot_prox()
 getTree()
Feature importance
 Gini / Accuracy
 Permutation based
Sensitivity analysis
 Partial Dependence Plot (PDP)
 feature contribution based {forestFloor}
Suggestion
 Feature Tweaking
Today’s topic
Taxonomy (Intrinsic vs. Post hoc):
Model-Specific Methods
• Intrinsic: Linear Regression, Logistic Regression, GLM, GAM and more, Decision Tree,
Decision Rules, RuleFit, Naive Bayes Classifier, K-Nearest Neighbors
• Post hoc: Feature Importance (OOB error @RF; gain/cover/weight @XGB), Feature
Contribution (forestFloor @RF, xgboostExplainer, lightgbmExplainer), Alternate /
Enumerate lasso (@LASSO), inTrees / defragTrees (@RF/XGB), Actionable feature
tweaking (@RF/XGB)
Model-Agnostic Methods (post hoc; also applicable to intrinsically interpretable models)
• Partial Dependence Plot
• Individual Conditional Expectation
• Accumulated Local Effects Plot
• Feature Interaction
• Permutation Feature Importance
• Global Surrogate
• Local Explanation (LIME, Shapley Values, breakDown)
Example-based Explanations
• Counterfactual Explanations
• Adversarial Examples
• Prototypes and Criticisms
• Influential Instances
EDA × XGBoost
Why EDA × XGBoost (or LightGBM)?
Motivation
https://twitter.com/fchollet/status/1113476428249464833?s=19
Decision tree, Random Forest & Gradient Boosting
Overview
https://www.kdnuggets.com/2017/10/understanding-machine-learning-algorithms.html
http://www.cse.chalmers.se/~richajo/dit866/lectures/l8/gb_explainer.pdf
Gradient Boosting
Gradient Boosting & XGBoost
Overview
http://www.yisongyue.com/courses/cs155/2019_winter/lectures/Lecture_06.pdf
https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf
XGBoost’s Improvements:
 Overfitting suppression
 Split finding efficiency
 Computation time
EDA using XGBoost
Exploratory data analysis using XGBoost
XGBoost
model
Rule Extraction
 xgb.model.dt.tree()
 {inTrees}
 defragTrees@python
Feature importance
 Gain & Cover
 Permutation based
Summarize explanation
 Clustering of observations
 Variable response (2)
 Feature interaction
Suggestion
 Feature Tweaking
Individual explanation
 Shapley value (predcontrib)
 Structure based (approxcontrib)
Variable response (1)
 PDP / ICE / ALE
EDA (or explanation) using XGBoost
1. Build XGBoost model
2. Feature importance
• Gain & Cover
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP/ICE/ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
• defragTrees@python
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
Today’s Topic
Suggestion (off-topic)
 Feature Tweaking
To Get ALL the Sample Codes
Please see github:
• https://github.com/katokohaku/EDAxgboost
1. BUILDING THE XGBOOST MODEL
1. Dataset
1. Check the basic profile of each variable (type, definition, information, structure, etc.)
2. Preprocessing (variable transformation, train/test splitting & sampling, data conversion)
2. Set the task and the evaluation metric
1. Classification? Regression (which type)? Clustering? Something else?
2. Accuracy, error, AUC, something else?
3. Set the hyper-parameters
1. Parameter search or not?
2. Which parameters? Which search strategy?
4. Evaluate the trained model
1. Predictive accuracy, prediction characteristics (bias tendencies), etc.
https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Human Resources Analytics Data Set
Preparation
• left (target to predict)
• Whether the employee left the workplace or not (1 or 0) Factor
• satisfaction_level
• Level of satisfaction (0-1)
• last_evaluation
• Time since last performance evaluation (in Years)
• number_project
• Number of projects completed while at work
• average_montly_hours
• Average monthly hours at workplace
• time_spend_company
• Number of years spent in the company
• Work_accident
• Whether the employee had a workplace accident
• promotion_last_5years
• Whether the employee was promoted in the last five years
• Sales
• Department in which they work
• Salary
• Relative level of salary (high)
Source
https://github.com/ryankarlos/Human-Resource-Analytics-Kaggle-Dataset/tree/master/Original_Kaggle_Dataset
Take a glance
Preparation
• GGally::ggpairs()
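A minimal sketch (assuming the dataset loaded as a data.frame HR with the target column left):

library(GGally)
ggpairs(HR, mapping = ggplot2::aes(colour = factor(left)))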
+ Random Noise
Make continuous features noisy in the same way as:
• https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211
Preparation
Baseline profile: table1::table1()
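A minimal sketch of the baseline profile table (column names follow the dataset above):

library(table1)
table1(~ satisfaction_level + last_evaluation + number_project +
         average_montly_hours + time_spend_company | left, data = HR)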
Convert Train / Test set to xgb.DMatrix
Preparation
1. Factor variable → Integer (or dummy)
2. Separate trainset / testset (+under sampling)
3. (data.frame →) matrix → xgb.DMatrix
Convert Train / Test set to xgb.DMatrix
To minimize the intercept of the xgb model:
Factor → Integer
Separate train set (+ under-sampling) → convert to xgb.DMatrix
Separate test set → convert to xgb.DMatrix
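A minimal sketch of the three steps above (object names are illustrative; see the repository for the actual code):

library(xgboost)
library(dplyr)

# 1. Factor variables -> integer codes
HR.int <- HR %>% mutate_if(is.factor, as.integer)

# 2. Separate train / test set (under-sampling of the majority class,
#    which keeps the intercept of the xgb model small, is omitted here)
set.seed(1)
idx      <- sample(nrow(HR.int), 0.7 * nrow(HR.int))
train.df <- HR.int[idx, ]
test.df  <- HR.int[-idx, ]

# 3. (data.frame ->) matrix -> xgb.DMatrix
train.mx <- as.matrix(select(train.df, -left))
test.mx  <- as.matrix(select(test.df,  -left))
dtrain   <- xgb.DMatrix(train.mx, label = train.df$left)
dtest    <- xgb.DMatrix(test.mx,  label = test.df$left)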
Hyper-parameter settings
Preparation
• According to:
https://xgboost.readthedocs.io/en/latest/parameter.html
• Tune with grid / random / Bayesian optimization search etc., if you like
(recommendation: use the mlr package); a minimal parameter list is sketched below.
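For instance, an untuned parameter list following the documentation linked above might look like (values are illustrative):

params <- list(
  booster          = "gbtree",
  objective        = "binary:logistic",   # left / stayed
  eval_metric      = "auc",
  eta              = 0.1,
  max_depth        = 6,
  subsample        = 0.8,
  colsample_bytree = 0.8
)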
Search the optimal number of boosting rounds
Build XGBoost model
• Using cross-validation: xgb.cv()
Build XGBoost model: xgb.cv()
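A minimal sketch, assuming `params` and `dtrain` from the previous steps: cross-validation with early stopping picks the number of rounds, which is then used to fit the final model.

cv <- xgb.cv(params = params, data = dtrain,
             nrounds = 1000, nfold = 5,
             early_stopping_rounds = 20, verbose = 0)

model.xgb <- xgb.train(params = params, data = dtrain,
                       nrounds = cv$best_iteration)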
Predictive performances
• For test set
Distribution of Prediction
Predictive performances
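A minimal sketch of the test-set evaluation (assumes `model.xgb`, `dtest`, and `test.df` from above):

pred <- predict(model.xgb, dtest)          # predicted probabilities
table(observed = test.df$left, predicted = as.integer(pred > 0.5))
hist(pred, breaks = 50)                    # distribution of prediction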
2. PROFILING THE TRAINED XGBOOST MODEL
1. Feature importance in prediction
1. Structure based importance (Gain & Cover): xgb.importance()
2. Permutation based importance: DALEX::variable_importance()
https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• Xgb.model.dt.tree()
• intrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (predapprox)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (predapprox)
7. Feature interaction
• 2-way SHAP (predinteraction)
URL
EDA tools for XGBoost
Suggestion(off topic)
 Feature Tweaking
xgb.importance()
Feature importance
For a tree model:
Gain
• represents fractional contribution of each feature to the model based on the
total gain of this feature's splits. Higher percentage means a more important
predictive feature.
Cover
• a metric of the number of observations related to this feature;
Frequency
• the percentage representing the relative number of times a feature has been
used in trees.
For a linear model's importance:
Weight
• the linear coefficient of the feature;
https://www.rdocumentation.org/packages/xgboost/versions/0.6.4.1/topics/xgb.importance
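A minimal usage sketch (assumes `model.xgb` from the previous step):

imp <- xgb.importance(model = model.xgb)
head(imp)                  # columns: Feature, Gain, Cover, Frequency
xgb.plot.importance(imp)   # bar plot, sorted by Gain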
Feature importance (structure based)
1. Calculate the weight of each node as if it were not split further
2. Distribute the weight differences to each node
3. Accumulate the weights along the path passed by each observation, for each
booster and each feature (node)
Feature importance (structure based)
Feature importance
Gain
• represents fractional contribution of each feature to the model based on the
total gain of this feature's splits. Higher percentage means a more important
predictive feature.
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
Gain of ith feature at kth node in jth booster is calculated as
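$$\mathrm{Gain}=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$$

(the split-gain formula from the referenced BoostedTree.pdf, with the indices i, k, j omitted), where $G_L, H_L$ and $G_R, H_R$ are the sums of first- and second-order gradients of the observations in the left and right child, $\lambda$ is the L2 regularization weight, and $\gamma$ is the complexity cost of adding a leaf.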
Feature importance (permutation based)
Feature importance
• Calculating the increase in the model’s prediction error after
permuting the feature.
• A feature is “important” if shuffling its values increases the model error,
because in this case the model relied on the feature for the prediction.
https://christophm.github.io/interpretable-ml-book/feature-importance.html
FROM: https://www.kaggle.com/dansbecker/permutation-importance
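A minimal sketch with DALEX, following the variable_importance() call named earlier (renamed model_parts() in newer DALEX releases; the explicit predict_function is an assumption for xgb.Booster objects):

library(DALEX)
explainer <- explain(model.xgb,
                     data  = train.mx,
                     y     = train.df$left,
                     predict_function = function(m, x) predict(m, as.matrix(x)),
                     label = "xgboost")
vi <- variable_importance(explainer)
plot(vi)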
Structure based vs Permutation based
Feature Importance
Structure based Permutation based
For a consistency check, rather than for "which is better?".
Feature Importance
3. SENSITIVITY ANALYSIS (1)
1. Model output response to changes in variable values
1. Individual Conditional Expectation & Partial Dependence Plot (ICE & PD plot)
2. Problems with PDP
3. Accumulated Local Effects (ALE) Plot
https://github.com/katokohaku/EDAxgboost/blob/master/200_Sensitivity_analysis.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Marginal Response for a Single Variable
Sensitivity Analysis: ICE+PDP vs ALE Plot
Variable response comparison:
ICE+PD Plot
ALE Plot
What-If & other observations (ICE) + average line (PD)
Ceteris Paribus Plot (blue line)
• shows possible scenarios for model predictions, allowing for changes in a single
dimension while keeping all other features constant (the ceteris paribus principle).
Individual Conditional Expectation (ICE) plot (gray lines)
• visualizes one line per instance.
Partial Dependence plot (red line)
• is shown as the average line over all observations.
https://christophm.github.io/interpretable-ml-book/ice.html
[Figure: ICE/PD curves; x-axis: feature value, y-axis: model output]
The assumption of independence
• is the biggest issue with Partial Dependence plots. When the features are correlated,
PD creates new data points in areas of the feature distribution where the actual
probability is very low.
Disadvantage of Ceteris Paribus Plots and PDP
https://christophm.github.io/interpretable-ml-book/pdp.html#disadvantages-5
For example, it is unlikely that
someone is 2 meters tall
but weighs less than 50 kg.
A Solution
Local Effect
• averages the derivatives of observations over the conditional distribution, instead of
averaging over the overall distribution of the target feature.
Accumulated Local Effects (ALE)
• accumulates the local effects across windows after they are calculated for each window.
https://arxiv.org/abs/1612.08468
[Figure: local effects computed per window; ALE = mean(Local Effects)]
Sensitivity Analysis: ICE+PDP & ALE Plot
Sensitivity Analysis: ICE+PDP vs ALE Plot
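A minimal sketch of both plot types, assuming the pdp and ALEPlot packages (the repository may use different tooling) plus `model.xgb` and `train.mx` from above:

# ICE + PD plot
library(pdp)
partial(model.xgb, pred.var = "satisfaction_level",
        train = train.mx, ice = TRUE, plot = TRUE)

# ALE plot
library(ALEPlot)
pred.fun <- function(X.model, newdata) predict(X.model, as.matrix(newdata))
ALEPlot(as.data.frame(train.mx), model.xgb, pred.fun = pred.fun,
        J = which(colnames(train.mx) == "satisfaction_level"))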
4-1. VISUALIZING TREES AND SUMMARIZING RULES
1. Visualizing trees
1. Dump the boosters: xgb.model.dt.tree()
2. Visualize a single booster: xgb.plot.tree()
3. Visualize a summarized tree: xgb.plot.multi.trees()
2. Extracting prediction rules (inTrees)
1. Enumerate the rules
2. Summarize the rules
https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Text dump of the tree model structure
Rule Extraction: xgb.model.dt.tree()
• Parse a boosted tree model into a data.table structure.
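A minimal sketch (assumes `model.xgb` from above; the plotting calls correspond to the following slides):

dt <- xgb.model.dt.tree(model = model.xgb)
head(dt)   # one row per node: Tree, Node, Feature, Split, Yes / No / Missing, Quality, Cover

xgb.plot.tree(model = model.xgb, trees = 0)   # single booster (1st tree)
xgb.plot.multi.trees(model = model.xgb)       # summarized, multiple-in-one plot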
Plot a boosted tree model (1st tree)
Rule Extraction
Plot a boosted tree model (2nd tree)
Rule Extraction
Plot multiple tree model
Rule Extraction
Multiple-in-one plot
Rule Extraction
4-2. VISUALIZING TREES AND SUMMARIZING RULES
1. Visualizing trees
1. Dump the boosters: xgb.model.dt.tree()
2. Visualize a single booster: xgb.plot.tree()
3. Visualize a summarized tree: xgb.plot.multi.trees()
2. Extracting prediction rules (inTrees)
1. Enumerate the rules
2. Summarize the rules
https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
Extract rules from tree ensembles
Rule Extraction: {inTrees}
https://arxiv.org/abs/1408.5456
• Using inTrees
Enumerate rules from tree ensembles
Rule Extraction: {inTrees}
Build a simplified tree ensemble learner (STEL)
Rule Extraction: {inTrees}
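A minimal sketch of the enumerate → summarize pipeline, assuming an inTrees version that provides XGB2List() for xgboost boosters (plus `model.xgb`, `train.mx`, `train.df` from above):

library(inTrees)
treeList <- XGB2List(model.xgb, train.mx)          # booster -> list of trees
ruleExec <- extractRules(treeList, train.mx)       # enumerate candidate rules
rules    <- getRuleMetric(unique(ruleExec), train.mx, train.df$left)
rules    <- pruneRule(rules, train.mx, train.df$left)
stel     <- buildLearner(rules, train.mx, train.df$left)   # simplified tree ensemble learner
presentRules(stel, colnames(train.mx))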
All of the sample code is at:
https://github.com/katokohaku/EDAxgboost/blob/master/310_rule_extraction_inTrees.md
5-1. PROFILING BASED ON FEATURE CONTRIBUTIONS
1. Explaining individual observations (prediction breakdown)
1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE)
2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE)
3. Dimensionality reduction of the observations based on predictions
4. Grouping by clustering
5. Visualizing the observations within each group
https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Shapley value
A method for assigning payouts to players depending on their contribution to
the total payout. Players cooperate in a coalition and receive a certain profit
from this cooperation.
The “game”
• is the prediction task for a single instance of the dataset.
The “gain”
• is the actual prediction for this instance minus the average prediction for all instances.
The “players”
• are the feature values of the instance that collaborate to receive the gain (= predict a
certain value).
• https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
• https://christophm.github.io/interpretable-ml-book/shapley.html
Feature contribution based on cooperative game theory
Shapley value
Shapley value is the average of all the marginal contributions
to all possible coalitions.
• One solution to keep the computation time manageable is to compute
contributions for only a few samples of the possible coalitions.
• https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
• https://christophm.github.io/interpretable-ml-book/shapley.html
Feature contribution based on cooperative game theory
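Formally (from the Lundberg & Lee paper linked above), the Shapley value of feature $i$ is its marginal contribution averaged over all coalitions $S$ of the remaining features:

$$\phi_i=\sum_{S\subseteq F\setminus\{i\}}\frac{|S|!\,(|F|-|S|-1)!}{|F|!}\left[f_{S\cup\{i\}}\!\left(x_{S\cup\{i\}}\right)-f_S\!\left(x_S\right)\right]$$

where $F$ is the set of all features and $f_S$ is the model restricted to the feature subset $S$.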
Shapley value
Breakdown individual explanation path
Feature contribution based on tree structure
Based on xgboost model structure,
1. Calculate the weight of each node as if it were not split further
2. Distribute the weight differences to each node
3. Accumulate the weights along the path passed by each observation, for each
booster and each feature (node)
Feature contribution based on tree structure
To get prediction path
Feature contribution based on tree structure
Individual explanation path
Enumerate Feature contribution based on Shapley / tree structure
Each row explains each observation (prediction breakdown)
Explain single observation
Individual explanation:
Each row explains each observation (prediction breakdown)
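A minimal sketch of both breakdown variants (assumes `model.xgb` and `train.mx`); each result has one row per observation and one column per feature plus BIAS, and the row sums reproduce the margin predictions:

shap           <- predict(model.xgb, train.mx, predcontrib = TRUE)
contrib.approx <- predict(model.xgb, train.mx, predcontrib = TRUE,
                          approxcontrib = TRUE)   # structure based

all.equal(rowSums(shap),
          predict(model.xgb, train.mx, outputmargin = TRUE))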
5-2. PROFILING BASED ON FEATURE CONTRIBUTIONS
1. Explaining individual observations (prediction breakdown)
1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE)
2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE)
3. Dimensionality reduction of the observations based on predictions
4. Grouping by clustering
5. Visualizing the observations within each group
https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
Identify clusters based on xgboost
Clustering of the feature contributions of each observation using t-SNE
• Dimension reduction using t-SNE
Dimension reduction: Rtsne::Rtsne()
Identify clusters based on xgboost
Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel()
• Class labeling using hierarchical clustering (hclust)
Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel()
Scatter plot with group label
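A minimal sketch of the pipeline above (assumes the contribution matrix `shap` from the previous section; k = 6 is illustrative):

library(Rtsne)
library(ggplot2)

mapping <- Rtsne(shap, perplexity = 30, check_duplicates = FALSE)
emb     <- data.frame(mapping$Y)           # columns X1, X2

hc          <- hclust(dist(emb), method = "ward.D2")
emb$cluster <- factor(cutree(hc, k = 6))

ggplot(emb, aes(X1, X2, colour = cluster)) +
  geom_point(alpha = 0.5)   # add ggrepel::geom_label_repel() for cluster labels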
Similar observations in a cluster (1)
Individual explanation
Similar observations in a cluster (2)
Individual explanation
Individual explanation
https://github.com/katokohaku/EDAxgboost/blob/master/R/waterfallBreakdown.R
6. SENSITIVITY ANALYSIS BASED ON FEATURE CONTRIBUTIONS
1. Model output response to changes in variable values (sensitivity analysis) (2)
1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE)
2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE)
https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Individual explanation path
Individual explanation
Each column explains each feature impact (variable response)
Individual Feature Impact (1)
Sensitivity Analysis
Each column explains each feature impact (variable response)
Individual Feature Impact (2-1)
Sensitivity Analysis
Each column explains each feature impact (variable response)
Individual Feature Impact (2-2)
Sensitivity Analysis
Each column explains each feature impact (variable response)
Contribution dependency plots
Sensitivity Analysis
xgb.plot.shap()
• display the estimated contributions (Shapley value) of a feature to model
prediction for each individual case.
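A minimal sketch (assumes `model.xgb` and `train.mx`; top_n selects the most important features):

xgb.plot.shap(data = train.mx, model = model.xgb, top_n = 4, n_col = 2)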
Feature Impact Summary
Sensitivity Analysis
http://www.f1-predictor.com/model-interpretability-with-shap/
Similar to a SHAP summary plot,
• contribution breakdown from the prediction path (model structure).
7. INTERACTION ANALYSIS BASED ON CONTRIBUTIONS
1. Interactions between variables
1. Strength of 2-way interactions between variables: predict(..., predinteraction = TRUE)
https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
EDA (or explanation) after modelling
1. Build XGBoost model
2. Feature importance
• Structure based (Gain & Cover)
• Permutation based
3. Variable response (1)
• Partial Dependence Plot (PDP / ICE / ALE)
4. Rule Extraction
• xgb.model.dt.tree()
• inTrees
5. Individual explanation
• Shapley value (predcontrib)
• Structure based (approxcontrib)
6. Variable response (2)
• Shapley value (predcontrib)
• Structure based (approxcontrib)
7. Feature interaction
• 2-way SHAP (predinteraction)
EDA tools for XGBoost
Suggestion (off-topic)
 Feature Tweaking
Feature interaction of single observation
• Feature contribution can be decomposed as 2-way feature interaction.
Feature interaction
2-way feature interaction:
Feature contribution for feature contribution
Individual explanation
Each row shows breakdown of contribution
Feature interaction of single observation
• xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
Individual explanation
Feature contribution for feature contribution of single instance
Absolute mean of all interactions
• SHAP can be decomposed as 2-way feature interaction.
xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
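A minimal sketch (assumes `model.xgb` and `train.mx`); the result is a 3-dimensional array, and averaging its absolute values over observations gives the interaction-strength matrix shown above:

inter <- predict(model.xgb, train.mx, predinteraction = TRUE)
dim(inter)   # n.obs x (n.features + 1) x (n.features + 1), incl. BIAS

interaction.strength <- apply(abs(inter), c(2, 3), mean)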
xgboost
Original Paper
• https://www.kdd.org/kdd2016/subtopic/view/xgboost-a-scalable-tree-
boosting-system
Tasks, Metrics & other Parameters
• https://xgboost.readthedocs.io/en/latest/
For R
• http://dmlc.ml/rstats/2016/03/10/xgboost.html
• https://xgboost.readthedocs.io/en/latest/R-
package/xgboostPresentation.html
• https://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html
Explanatory blog posts & slides (in Japanese)
• http://kefism.hatenablog.com/entry/2017/06/11/182959
• https://speakerdeck.com/hoxomaxwell/dive-into-xgboost
References
Data & Model explanation
Generic interpretability/explainability
• Iml book
• https://christophm.github.io/interpretable-ml-book/
Exploratory Data Analysis (EDA)
• What is EDA?
• https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
• DALEX
• Descriptive mAchine Learning EXplanations
• https://pbiecek.github.io/DALEX/
• DrWhy
• the collection of tools for Explainable AI (XAI)
• https://pbiecek.github.io/DALEX/
References
Weitere ähnliche Inhalte

Was ist angesagt?

アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法Satoshi Hara
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用Tomonari Masada
 
Numpy scipyで独立成分分析
Numpy scipyで独立成分分析Numpy scipyで独立成分分析
Numpy scipyで独立成分分析Shintaro Fukushima
 
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...Deep Learning JP
 
数式を使わずイメージで理解するEMアルゴリズム
数式を使わずイメージで理解するEMアルゴリズム数式を使わずイメージで理解するEMアルゴリズム
数式を使わずイメージで理解するEMアルゴリズム裕樹 奥田
 
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)ARISE analytics
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明Haruka Ozaki
 
グラフィカルモデル入門
グラフィカルモデル入門グラフィカルモデル入門
グラフィカルモデル入門Kawamoto_Kazuhiko
 
顕著性マップの推定手法
顕著性マップの推定手法顕著性マップの推定手法
顕著性マップの推定手法Takao Yamanaka
 
統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-Shiga University, RIKEN
 
[DL輪読会]医用画像解析におけるセグメンテーション
[DL輪読会]医用画像解析におけるセグメンテーション[DL輪読会]医用画像解析におけるセグメンテーション
[DL輪読会]医用画像解析におけるセグメンテーションDeep Learning JP
 
合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点Ichigaku Takigawa
 
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)MLSE
 
[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanismsDeep Learning JP
 
ブースティング入門
ブースティング入門ブースティング入門
ブースティング入門Retrieva inc.
 
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...Deep Learning JP
 
LightGBM: a highly efficient gradient boosting decision tree
LightGBM: a highly efficient gradient boosting decision treeLightGBM: a highly efficient gradient boosting decision tree
LightGBM: a highly efficient gradient boosting decision treeYusuke Kaneko
 
距離と分類の話
距離と分類の話距離と分類の話
距離と分類の話考司 小杉
 

Was ist angesagt? (20)

アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法アンサンブル木モデル解釈のためのモデル簡略化法
アンサンブル木モデル解釈のためのモデル簡略化法
 
MICの解説
MICの解説MICの解説
MICの解説
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用
 
EMアルゴリズム
EMアルゴリズムEMアルゴリズム
EMアルゴリズム
 
Numpy scipyで独立成分分析
Numpy scipyで独立成分分析Numpy scipyで独立成分分析
Numpy scipyで独立成分分析
 
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...
 
数式を使わずイメージで理解するEMアルゴリズム
数式を使わずイメージで理解するEMアルゴリズム数式を使わずイメージで理解するEMアルゴリズム
数式を使わずイメージで理解するEMアルゴリズム
 
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明
 
グラフィカルモデル入門
グラフィカルモデル入門グラフィカルモデル入門
グラフィカルモデル入門
 
顕著性マップの推定手法
顕著性マップの推定手法顕著性マップの推定手法
顕著性マップの推定手法
 
統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-
 
[DL輪読会]医用画像解析におけるセグメンテーション
[DL輪読会]医用画像解析におけるセグメンテーション[DL輪読会]医用画像解析におけるセグメンテーション
[DL輪読会]医用画像解析におけるセグメンテーション
 
合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点
 
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
 
[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms
 
ブースティング入門
ブースティング入門ブースティング入門
ブースティング入門
 
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...
[DL輪読会]PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metr...
 
LightGBM: a highly efficient gradient boosting decision tree
LightGBM: a highly efficient gradient boosting decision treeLightGBM: a highly efficient gradient boosting decision tree
LightGBM: a highly efficient gradient boosting decision tree
 
距離と分類の話
距離と分類の話距離と分類の話
距離と分類の話
 

Ähnlich wie Exploratory data analysis using xgboost package in R

모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 r-kor
 
226 team project-report-manjula kollipara
226 team project-report-manjula kollipara226 team project-report-manjula kollipara
226 team project-report-manjula kolliparaManjula Kollipara
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntEugene Yan Ziyou
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
ProFET - Protein Feature Engineering Toolki
ProFET - Protein Feature Engineering ToolkiProFET - Protein Feature Engineering Toolki
ProFET - Protein Feature Engineering ToolkiDan Ofer
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseNikolay Samokhvalov
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneXiaoweiJiang7
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Understanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-LearnUnderstanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-Learn철민 권
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuDatabricks
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native CompilationPGConf APAC
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
 

Ähnlich wie Exploratory data analysis using xgboost package in R (20)

Ember
EmberEmber
Ember
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
MLBox
MLBoxMLBox
MLBox
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
 
226 team project-report-manjula kollipara
226 team project-report-manjula kollipara226 team project-report-manjula kollipara
226 team project-report-manjula kollipara
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
ProFET - Protein Feature Engineering Toolki
ProFET - Protein Feature Engineering ToolkiProFET - Protein Feature Engineering Toolki
ProFET - Protein Feature Engineering Toolki
 
DB
DBDB
DB
 
Spock
SpockSpock
Spock
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tune
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Understanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-LearnUnderstanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-Learn
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
 
[ppt]
[ppt][ppt]
[ppt]
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 

Mehr von Satoshi Kato

How to generate PowerPoint slides Non-manually using R
How to generate PowerPoint slides Non-manually using RHow to generate PowerPoint slides Non-manually using R
How to generate PowerPoint slides Non-manually using RSatoshi Kato
 
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Satoshi Kato
 
How to use in R model-agnostic data explanation with DALEX & iml
How to use in R model-agnostic data explanation with DALEX & imlHow to use in R model-agnostic data explanation with DALEX & iml
How to use in R model-agnostic data explanation with DALEX & imlSatoshi Kato
 
Introduction of inspectDF package
Introduction of inspectDF packageIntroduction of inspectDF package
Introduction of inspectDF packageSatoshi Kato
 
Introduction of featuretweakR package
Introduction of featuretweakR packageIntroduction of featuretweakR package
Introduction of featuretweakR packageSatoshi Kato
 
Genetic algorithm full scratch with R
Genetic algorithm full scratch with RGenetic algorithm full scratch with R
Genetic algorithm full scratch with RSatoshi Kato
 
Intoroduction & R implementation of "Interpretable predictions of tree-based ...
Intoroduction & R implementation of "Interpretable predictions of tree-based ...Intoroduction & R implementation of "Interpretable predictions of tree-based ...
Intoroduction & R implementation of "Interpretable predictions of tree-based ...Satoshi Kato
 
Multiple optimization and Non-dominated sorting with rPref package in R
Multiple optimization and Non-dominated sorting with rPref package in RMultiple optimization and Non-dominated sorting with rPref package in R
Multiple optimization and Non-dominated sorting with rPref package in RSatoshi Kato
 
Deep forest (preliminary ver.)
Deep forest  (preliminary ver.)Deep forest  (preliminary ver.)
Deep forest (preliminary ver.)Satoshi Kato
 
Introduction of "the alternate features search" using R
Introduction of  "the alternate features search" using RIntroduction of  "the alternate features search" using R
Introduction of "the alternate features search" using RSatoshi Kato
 
forestFloorパッケージを使ったrandomForestの感度分析
forestFloorパッケージを使ったrandomForestの感度分析forestFloorパッケージを使ったrandomForestの感度分析
forestFloorパッケージを使ったrandomForestの感度分析Satoshi Kato
 
Oracle property and_hdm_pkg_rigorouslasso
Oracle property and_hdm_pkg_rigorouslassoOracle property and_hdm_pkg_rigorouslasso
Oracle property and_hdm_pkg_rigorouslassoSatoshi Kato
 
Imputation of Missing Values using Random Forest
Imputation of Missing Values using  Random ForestImputation of Missing Values using  Random Forest
Imputation of Missing Values using Random ForestSatoshi Kato
 
Interpreting Tree Ensembles with inTrees
Interpreting Tree Ensembles with  inTreesInterpreting Tree Ensembles with  inTrees
Interpreting Tree Ensembles with inTreesSatoshi Kato
 

Mehr von Satoshi Kato (14)

How to generate PowerPoint slides Non-manually using R
How to generate PowerPoint slides Non-manually using RHow to generate PowerPoint slides Non-manually using R
How to generate PowerPoint slides Non-manually using R
 
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
 
How to use in R model-agnostic data explanation with DALEX & iml
How to use in R model-agnostic data explanation with DALEX & imlHow to use in R model-agnostic data explanation with DALEX & iml
How to use in R model-agnostic data explanation with DALEX & iml
 
Introduction of inspectDF package
Introduction of inspectDF packageIntroduction of inspectDF package
Introduction of inspectDF package
 
Introduction of featuretweakR package
Introduction of featuretweakR packageIntroduction of featuretweakR package
Introduction of featuretweakR package
 
Genetic algorithm full scratch with R
Genetic algorithm full scratch with RGenetic algorithm full scratch with R
Genetic algorithm full scratch with R
 
Intoroduction & R implementation of "Interpretable predictions of tree-based ...
Intoroduction & R implementation of "Interpretable predictions of tree-based ...Intoroduction & R implementation of "Interpretable predictions of tree-based ...
Intoroduction & R implementation of "Interpretable predictions of tree-based ...
 
Multiple optimization and Non-dominated sorting with rPref package in R
Multiple optimization and Non-dominated sorting with rPref package in RMultiple optimization and Non-dominated sorting with rPref package in R
Multiple optimization and Non-dominated sorting with rPref package in R
 
Deep forest (preliminary ver.)
Deep forest  (preliminary ver.)Deep forest  (preliminary ver.)
Deep forest (preliminary ver.)
 
Introduction of "the alternate features search" using R
Introduction of  "the alternate features search" using RIntroduction of  "the alternate features search" using R
Introduction of "the alternate features search" using R
 
forestFloorパッケージを使ったrandomForestの感度分析
forestFloorパッケージを使ったrandomForestの感度分析forestFloorパッケージを使ったrandomForestの感度分析
forestFloorパッケージを使ったrandomForestの感度分析
 
Oracle property and_hdm_pkg_rigorouslasso
Oracle property and_hdm_pkg_rigorouslassoOracle property and_hdm_pkg_rigorouslasso
Oracle property and_hdm_pkg_rigorouslasso
 
Imputation of Missing Values using Random Forest
Imputation of Missing Values using  Random ForestImputation of Missing Values using  Random Forest
Imputation of Missing Values using Random Forest
 
Interpreting Tree Ensembles with inTrees
Interpreting Tree Ensembles with  inTreesInterpreting Tree Ensembles with  inTrees
Interpreting Tree Ensembles with inTrees
 

Kürzlich hochgeladen

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 

Kürzlich hochgeladen (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 

Exploratory data analysis using xgboost package in R

  • 1. Exploratory DataAnalysis Using XGBoost XGBoost を使った探索的データ分析 第1回 R勉強会@仙台(#Sendai.R)
  • 3. Exploratory Data Analysis (EDA) https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to 1. maximize insight into a data set; 2. uncover underlying structure; 3. extract important variables; 4. detect outliers and anomalies; 5. test underlying assumptions; 6. develop parsimonious models; and 7. determine optimal factor settings.
  • 4. EDA (or explanation) after modelling Taxonomy of Interpretation / Explanation https://christophm.github.io/interpretable-ml-book/
  • 5. EDA using Random Forest (EDARF) randomForest を使った探索的データ分析 (off-topic) Random Forest model Imputation for missing  rfimpute()  {missForest} Rule Extraction  {intrees}  defragTrees@python  EDARF::plot_prox()  getTree() Feature importance  Gini / Accuracy  Permutation based Sensitivity analysis  Partial Dependence Plot (PDP)  feature contribution based {forestFloor} Suggestion  Feature Tweaking
  • 6. Today’s topic Intrinsic Post hoc Model-Specific Methods • Linear Regression • Logistic Regression • GLM, GAM and more • Decision Tree • Decision Rules • RuleFit • Naive Bayes Classifier • K-Nearest Neighbors • Feature Importance (OOB error@RF; gain/cover/weight @XGB) • Feature Contribution (forestFloor@RF, XGBoostexplainer, lightgbmExplainer) • Alternate / Enumerate lasso (@LASSO) • inTrees / defragTrees (@RF/XGB) • Actionable feature tweaking (@RF/XGB) Model- Agnostic Methods Intrinsic interpretable Model にも適用可能 • Partial Dependence Plot • Individual Conditional Expectation • Accumulated Local Effects Plot • Feature Interaction • Permutation Feature Importance • Global Surrogate • Local Explanation (LIME, Shapley Values, breakDown) Example- based Explanations ?? • Counterfactual Explanations • Adversarial Examples • Prototypes and Criticisms • Influential Instances EDA × XGBoost
  • 7. Why EDA × XGBoost (or LightGBM)? Motivation https://twitter.com/fchollet/status/1113476428249464833?s=19
  • 8. Decision tree, Random Forest & Gradient Boosting Overview https://www.kdnuggets.com/2017/10/understanding-machine-learning-algorithms.html http://www.cse.chalmers.se/~richajo/dit866/lectures/l8/gb_explainer.pdf Gradient Boosting
  • 9. Gradient Boosting & XGBoost Overview http://www.yisongyue.com/courses/cs155/2019_winter/lectures/Lecture_06.pdf https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf XGBoost’s Improvements:  Overfitting suppression  Split finding efficiency  Computation time
  • 10. EDA using XGBoost XGBoost を使った探索的データ分析 XGBoost model Rule Extraction  Xgb.model.dt.tree()  {intrees}  defragTrees@python Feature importance  Gain & Cover  Permutation based Summarize explanation  Clustering of observations  Variable response (2)  Feature interaction Suggestion  Feature Tweaking Individual explanation  Shapley value (predcontrib)  Structure based (predapprox) Variable response (1)  PDP / ICE / ALE
  • 11. EDA (or explanation) using XGBoost 1. Build XGBoost model 2. Feature importance • Gain & Cover • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP/ICE/ALE) 4. Rule Extraction • Xgb.model.dt.tree() • intrees • defragTrees@python 5. Individual explanation • Shapley value (predcontrib) • Structure based (predapprox) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (predapprox) 7. Feature interaction • 2-way SHAP (predinteraction) URL Today’s Topic Suggestion(off topic)  Feature Tweaking
  • 12. To Get ALL the Sample Codes Please see github: • https://github.com/katokohaku/EDAxgboost
  • 13. 1.XGBOOST MODELの構築 1. データセット 1. 変数の基本プロファイルの確認(型、定義、情報、構造、etc) 2. 前処理(変数変換、教師/テストへの分割・サンプリング、 データ変換) 2. タスクと評価指標の設定 1. 分類問題? 回帰問題(回帰の種類)? クラスタリング? その他? 2. 正確度、誤差、AUC、その他? 3. ハイパーパラメタの設定 1. パラメターサーチする・しない 2. どのパラメータ?、探索の方法? 4. 学習済みモデルの評価 1. 予測精度、予測特性(バイアス傾向)、その他 https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
  • 14. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • Xgb.model.dt.tree() • intrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (predapprox) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (predapprox) 7. Feature interaction • 2-way SHAP (predinteraction) URL EDA tools for XGBoost Suggestion(off topic)  Feature Tweaking
  • 15. Human Resources Analytics Data Set Preparation • left (target to predict) • Whether the employee left the workplace or not (1 or 0) Factor • satisfaction_level • Level of satisfaction (0-1) • last_evaluation • Time since last performance evaluation (in Years) • number_project • Number of projects completed while at work • average_montly_hours • Average monthly hours at workplace • time_spend_company • Number of years spent in the company • Work_accident • Whether the employee had a workplace accident • promotion_last_5years • Whether the employee was promoted in the last five years • Sales • Department in which they work for • Salary • Relative level of salary (high) Source https://github.com/ryankarlos/Human-Resource-Analytics-Kaggle-Dataset/tree/master/Original_Kaggle_Dataset
  • 16. Take a glance Preparation • GGally::ggpairs()
  • 17. + Random Noise Make continuous features noisy with the same way as: • https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211 Preparation
  • 19. Convert Train / Test set to xgb.DMatrix Preparation 1. Factor variable → Integer (or dummy) 2. Separate trainset / testset (+under sampling) 3. (data.frame →) matrix → xgb.DMatrix
  • 20. Convert Train / Test set to xgb.DMatrix To minimize the intercept of xgb model Factor → Integer Separate train set (+under sampling) Convert xgb.DMatrix Separate test set Convert xgb.DMatrix
  • 21. Hyper-parameter settings Preparation • According to: https://xgboost.readthedocs.io/en/latest/parameter.html • Tune with Grid/Random/BayesOpt. etc., if you like. (Recommendation: using mlR)
  • 22. Search optimal number of booster Build XGBoost model • Using cross-validation : xgb.cv()
  • 26. 2.学習したXGBOOST MODELのプロファイル 1. 予測における特徴量の重要度 (feature importance) 1. Structure based importance(Gain & Cover): xgb.importance() 2. Permutation based importance: DALEX::variable_importance() URL https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
  • 27. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • Xgb.model.dt.tree() • intrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (predapprox) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (predapprox) 7. Feature interaction • 2-way SHAP (predinteraction) URL EDA tools for XGBoost Suggestion(off topic)  Feature Tweaking
  • 28. xgb.importance() Feature importance For a tree model: Gain • represents fractional contribution of each feature to the model based on the total gain of this feature's splits. Higher percentage means a more important predictive feature. Cover • metric of the number of observation related to this feature; Frequency • percentage representing the relative number of times a feature have been used in trees. For a linear model's importance: Weight • the linear coefficient of the feature; https://www.rdocumentation.org/packages/xgboost/versions/0.6.4.1/topics/xgb.importance
  • 29. Feature importance (structure based) 1. Calculate the weight of each node as if it were not split further 2. Distribute the weight differences to each node 3. Accumulate the weights along the path passed by each observation, for each booster and for each feature (node)
  • 30. Feature importance (structure based) Feature importance Gain • represents the fractional contribution of each feature to the model, based on the total gain of this feature's splits. A higher percentage means a more important predictive feature. https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf The gain of the i-th feature at the k-th node in the j-th booster is calculated as shown below.
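A reconstruction of the standard split-gain formula from the referenced Boosted Tree slides, where $G_L, H_L$ (resp. $G_R, H_R$) are the sums of first- and second-order gradients of the loss over the instances in the left (resp. right) child, and $\lambda, \gamma$ are the regularization parameters:

$$
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
$$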
  • 31. Feature importance (permutation based) Feature importance • Calculating the increase in the model’s prediction error after permuting the feature. • A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction. https://christophm.github.io/interpretable-ml-book/feature-importance.html FROM: https://www.kaggle.com/dansbecker/permutation-importance
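A sketch with DALEX; the explainer setup is an assumption, and variable_importance() reflects the DALEX API of that time (newer releases use model_parts()):

```r
library(DALEX)

# Wrap the xgboost model so DALEX can permute features and re-predict
explainer <- explain(model_xgb,
                     data = X[train_idx, ],
                     y = HR$left[train_idx],
                     predict_function = function(m, d) predict(m, as.matrix(d)),
                     label = "xgboost")

vi <- variable_importance(explainer, loss_function = loss_root_mean_square)
plot(vi)
```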
  • 32. Structure based vs Permutation based Feature Importance Structure based Permutation based • Use both as a consistency check, rather than asking which is better.
  • 34. 3. Sensitivity analysis (1) 1. Response of the model output to changes in feature values 1. Individual Conditional Expectation & Partial Dependence Plot (ICE & PD plot) 2. Problems with PDP 3. Accumulated Local Effects (ALE) plot https://github.com/katokohaku/EDAxgboost/blob/master/200_Sensitivity_analysis.Rmd
  • 35. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • xgb.model.dt.tree() • inTrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (approxcontrib) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (approxcontrib) 7. Feature interaction • 2-way SHAP (predinteraction) EDA tools for XGBoost. Suggestion (off-topic): Feature Tweaking
  • 36. Marginal response for a single variable Sensitivity Analysis: variable response comparison of ICE+PD plot vs. ALE plot. [Figure: side-by-side ICE+PD and ALE plots]
  • 37. What-If & other observations (ICE) + average line (PD) Ceteris Paribus Plot (blue line) • shows possible scenarios for model predictions allowing for changes in a single dimension while keeping all other features constant (the ceteris paribus principle). Individual Conditional Expectation (ICE) plot (gray lines) • visualizes one line per instance. Partial Dependence plot (red line) • is the average line over all observations. https://christophm.github.io/interpretable-ml-book/ice.html
  • 38. The assumption of independence • is the biggest issue with Partial Dependence plots. When the features are correlated, PD creates new data points in areas of the feature distribution where the actual probability is very low. Disadvantage of Ceteris Paribus Plots and PDP https://christophm.github.io/interpretable-ml-book/pdp.html#disadvantages-5 For example, it is unlikely that someone is 2 meters tall but weighs less than 50 kg.
  • 39. A Solution Local Effect • averages the derivative of the prediction over the conditional distribution of the feature, instead of over the marginal distribution of the target feature. Accumulated Local Effects (ALE) • accumulates the local effects across windows after they have been calculated for each window. https://arxiv.org/abs/1612.08468 [Figure: local effect per window; the ALE curve accumulates the mean local effect of each window]
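One way to draw all three plots is the {iml} package; this package choice is an assumption (the slides themselves do not prescribe an implementation):

```r
library(iml)

# Wrap the model with a predict function iml can call
pred_fun  <- function(model, newdata) predict(model, as.matrix(newdata))
predictor <- Predictor$new(model_xgb,
                           data = as.data.frame(X[train_idx, ]),
                           y = HR$left[train_idx],
                           predict.function = pred_fun)

# ICE + PD in one plot, then ALE for the same feature
plot(FeatureEffect$new(predictor, feature = "satisfaction_level",
                       method = "pdp+ice"))
plot(FeatureEffect$new(predictor, feature = "satisfaction_level",
                       method = "ale"))
```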
  • 42. 4-1. Visualizing trees and summarizing rules 1. Visualizing trees 1. Dumping the boosters: xgb.model.dt.tree() 2. Plotting a single booster: xgb.plot.tree() 3. Plotting a summarized tree: xgb.plot.multi.trees() 2. Extracting prediction rules (inTrees) 1. Enumerating rules 2. Summarizing rules https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
  • 43. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • xgb.model.dt.tree() • inTrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (approxcontrib) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (approxcontrib) 7. Feature interaction • 2-way SHAP (predinteraction) EDA tools for XGBoost. Suggestion (off-topic): Feature Tweaking
  • 44. Text dump of the tree model structure Rule Extraction: xgb.model.dt.tree() • Parses a boosted tree model into a data.table structure.
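Typical usage, producing one row per tree node:

```r
dt_tree <- xgb.model.dt.tree(model = model_xgb)
head(dt_tree)  # Tree, Node, Feature, Split, Yes/No/Missing, Quality, Cover
```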
  • 45. Plot a boosted tree model (1st tree) Rule Extraction
  • 46. Plot a boosted tree model (2nd tree) Rule Extraction
  • 47. Plot multiple tree models Rule Extraction
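The corresponding plotting calls might look like this (the trees argument is 0-indexed; features_keep = 3 is an arbitrary choice):

```r
xgb.plot.tree(model = model_xgb, trees = 0)   # 1st tree
xgb.plot.tree(model = model_xgb, trees = 1)   # 2nd tree

# Summarize the whole ensemble into a single representative tree
xgb.plot.multi.trees(model = model_xgb, features_keep = 3)
```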
  • 49. 4-2. Visualizing trees and summarizing rules 1. Visualizing trees 1. Dumping the boosters: xgb.model.dt.tree() 2. Plotting a single booster: xgb.plot.tree() 3. Plotting a summarized tree: xgb.plot.multi.trees() 2. Extracting prediction rules (inTrees) 1. Enumerating rules 2. Summarizing rules https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
  • 50. Extract rules from the trees Rule Extraction: {inTrees} https://arxiv.org/abs/1408.5456 • Using inTrees
  • 51. Enumerate rules from the trees Rule Extraction: {inTrees}
  • 52. Build a simplified tree ensemble learner (STEL) Rule Extraction: {inTrees} All of the sample code is at: https://github.com/katokohaku/EDAxgboost/blob/master/310_rule_extraction_inTrees.md
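A sketch of the inTrees workflow for an xgboost model (XGB2List() is inTrees' converter for xgboost boosters; object names reuse the earlier sketches):

```r
library(inTrees)

# Enumerate, measure, prune and summarize rules, then build a
# simplified tree ensemble learner (STEL)
tree_list <- XGB2List(model_xgb, X[train_idx, ])
rule_exec <- extractRules(tree_list, X[train_idx, ])
rules     <- getRuleMetric(rule_exec, X[train_idx, ], HR$left[train_idx])
rules     <- pruneRule(rules, X[train_idx, ], HR$left[train_idx])
learner   <- buildLearner(rules, X[train_idx, ], HR$left[train_idx])
presentRules(learner, colnames(X))
```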
  • 53. 5-1. Profiling based on feature contributions 1. Explaining individual observations (prediction breakdown) 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE) 2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE) 3. Dimensionality reduction of observations based on predictions 4. Grouping by clustering 5. Visualizing the observations within a group https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
  • 54. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • xgb.model.dt.tree() • inTrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (approxcontrib) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (approxcontrib) 7. Feature interaction • 2-way SHAP (predinteraction) EDA tools for XGBoost. Suggestion (off-topic): Feature Tweaking
  • 55. Shapley value A method for assigning payouts to players depending on their contribution to the total payout. Players cooperate in a coalition and receive a certain profit from this cooperation. The “game” • is the prediction task for a single instance of the dataset. The “gain” • is the actual prediction for this instance minus the average prediction for all instances. The “players” • are the feature values of the instance that collaborate to receive the gain (= predict a certain value). • https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf • https://christophm.github.io/interpretable-ml-book/shapley.html Feature contribution based on cooperative game theory
  • 56. Shapley value Shapley value is the average of all the marginal contributions to all possible coalitions. • One solution to keep the computation time manageable is to compute contributions for only a few samples of the possible coalitions. • https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf • https://christophm.github.io/interpretable-ml-book/shapley.html Feature contribution based on cooperative game theory
  • 58. Breakdown of the individual explanation path Feature contribution based on tree structure Based on the xgboost model structure: 1. Calculate the weight of each node as if it were not split further 2. Distribute the weight differences to each node 3. Accumulate the weights along the path passed by each observation, for each booster and for each feature (node)
  • 59. Feature contribution based on tree structure: obtaining the prediction path
  • 60. Feature contribution based on tree structure
  • 61. Individual explanation path Enumerate feature contributions based on Shapley value / tree structure Each row explains one observation (prediction breakdown)
  • 62. Explain a single observation Individual explanation: each row explains one observation (prediction breakdown)
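Both breakdowns come from the same predict() call; approxcontrib = TRUE switches from Shapley values to the structure-based approximation:

```r
# One row per observation, one column per feature plus a BIAS column;
# each row sums to the margin prediction for that observation
contrib_shap   <- predict(model_xgb, dtest, predcontrib = TRUE)
contrib_approx <- predict(model_xgb, dtest, predcontrib = TRUE,
                          approxcontrib = TRUE)
contrib_shap[1, ]  # explanation of the first observation
```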
  • 63. 5-2. Profiling based on feature contributions 1. Explaining individual observations (prediction breakdown) 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE) 2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE) 3. Dimensionality reduction of observations based on predictions 4. Grouping by clustering 5. Visualizing the observations within a group https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
  • 64. Identify clusters based on xgboost Clustering of the feature contributions of each observation using t-SNE • Dimension reduction using t-SNE
  • 66. Identify clusters based on xgboost Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel() • Class labeling using hierarchical clustering (hclust)
  • 67. Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel()
  • 68. Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel() Scatter plot with group label
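A sketch of this pipeline; the cluster count k = 6 and the Ward linkage are arbitrary illustration choices:

```r
library(Rtsne)
library(ggplot2)
library(ggrepel)

# t-SNE on the contribution matrix, then hierarchical clustering
tsne <- Rtsne(contrib_shap, check_duplicates = FALSE)
hc   <- hclust(dist(tsne$Y), method = "ward.D2")
df   <- data.frame(X1 = tsne$Y[, 1], X2 = tsne$Y[, 2],
                   cluster = factor(cutree(hc, k = 6)))

# Scatter plot with a repelled label at each cluster centroid
centers <- aggregate(cbind(X1, X2) ~ cluster, data = df, FUN = mean)
ggplot(df, aes(X1, X2, colour = cluster)) +
  geom_point(alpha = 0.5) +
  geom_label_repel(data = centers, aes(label = cluster), show.legend = FALSE)
```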
  • 69. Similar observations in a cluster (1) Individual explanation
  • 70. Similar observations in a cluster (2) Individual explanation
  • 72. 6. Sensitivity analysis based on feature contributions 1. Response of the model output to changes in feature values (sensitivity analysis) (2) 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE) 2. Structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE) https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
  • 73. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • xgb.model.dt.tree() • inTrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (approxcontrib) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (approxcontrib) 7. Feature interaction • 2-way SHAP (predinteraction) EDA tools for XGBoost. Suggestion (off-topic): Feature Tweaking
  • 74. Individual explanation path Individual explanation Each column explains one feature's impact (variable response)
  • 75. Individual Feature Impact (1) Sensitivity Analysis Each column explains one feature's impact (variable response)
  • 76. Individual Feature Impact (2-1) Sensitivity Analysis Each column explains one feature's impact (variable response)
  • 77.
  • 78. Individual Feature Impact (2-2) Sensitivity Analysis Each column explains one feature's impact (variable response)
  • 79.
  • 80. Contribution dependency plots Sensitivity Analysis xgb.plot.shap() • displays the estimated contributions (Shapley values) of a feature to the model prediction for each individual case.
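Typical usage (top_n selects the most important features):

```r
# SHAP dependence plots for the four most important features
xgb.plot.shap(data = X[train_idx, ], model = model_xgb, top_n = 4, n_col = 2)
```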
  • 81. Feature Impact Summary Sensitivity Analysis http://www.f1-predictor.com/model-interpretability-with-shap/ Similar to a SHAP summary plot • contribution breakdown from the prediction path (model structure).
  • 82.
  • 83.
  • 84. 6. Interaction analysis based on contributions 1. Interactions between features 1. Strength of 2-way interactions between features: predict(..., predinteraction = TRUE) https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
  • 85. EDA (or explanation) after modelling 1. Build XGBoost model 2. Feature importance • Structure based (Gain & Cover) • Permutation based 3. Variable response (1) • Partial Dependence Plot (PDP / ICE / ALE) 4. Rule Extraction • xgb.model.dt.tree() • inTrees 5. Individual explanation • Shapley value (predcontrib) • Structure based (approxcontrib) 6. Variable response (2) • Shapley value (predcontrib) • Structure based (approxcontrib) 7. Feature interaction • 2-way SHAP (predinteraction) EDA tools for XGBoost. Suggestion (off-topic): Feature Tweaking
  • 86. Feature interaction of a single observation Feature interaction • Feature contributions can be decomposed into 2-way feature interactions.
  • 87. 2-way feature interaction: feature contribution for each feature contribution Individual explanation Each row shows the breakdown of one contribution
  • 88. Feature interaction of a single observation • xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
  • 89. Individual explanation Feature contribution for the feature contribution of a single instance
  • 90. Absolute mean of all interactions • SHAP values can be decomposed into 2-way feature interactions. xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
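A sketch of both steps; predinteraction = TRUE returns a 3-D array with one interaction matrix per observation:

```r
# Array dimensions: [observation, feature (+BIAS), feature (+BIAS)]
inter <- predict(model_xgb, dtest, predinteraction = TRUE)
dim(inter)

# Overall interaction strength: absolute mean over all observations
strength <- apply(abs(inter), c(2, 3), mean)
```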
  • 91.
  • 92. xgboost Original Paper • https://www.kdd.org/kdd2016/subtopic/view/xgboost-a-scalable-tree-boosting-system Tasks, Metrics & other Parameters • https://xgboost.readthedocs.io/en/latest/ For R • http://dmlc.ml/rstats/2016/03/10/xgboost.html • https://xgboost.readthedocs.io/en/latest/R-package/xgboostPresentation.html • https://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html Explanatory blog posts and slides (in Japanese) • http://kefism.hatenablog.com/entry/2017/06/11/182959 • https://speakerdeck.com/hoxomaxwell/dive-into-xgboost References
  • 93. Data & Model explanation Generic interpretability/explainability • Interpretable Machine Learning (IML) book • https://christophm.github.io/interpretable-ml-book/ Exploratory Data Analysis (EDA) • What is EDA? • https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm • DALEX • Descriptive mAchine Learning EXplanations • https://pbiecek.github.io/DALEX/ • DrWhy • the collection of tools for Explainable AI (XAI) • https://github.com/ModelOriented/DrWhy References

Editor's notes

  1. Think of the process by which a prediction is produced as a cooperative game: the prediction is the "payout" and each feature is a "player". Each feature's contribution is a fair division of the payout among the features. Cooperating = the original prediction; not cooperating = the prediction when that feature's values are shuffled. The difference between the two is evaluated over all possible coalitions. Note that this is not the difference between the original model's prediction and the prediction of a model retrained with the feature removed.
  2. (Identical to note 1.)