© 2020 KNIME AG. All Rights Reserved.
Automating Inferences out of Financial Data
Based on the example of credit card fraud detection
Maarit Widmann maarit.widmann@knime.com
Mathilde Humeau mathilde.humeau@knime.com
Approaches for a labeled vs. unlabeled dataset
• Situation 1: The dataset has enough fraud examples
– Train a classification model
• Situation 2: The dataset has no (or just a negligible number of) fraud examples
– Use a neural autoencoder
– Use an outlier detection technique, e.g. isolation forest
Situation 1: The dataset has enough fraud examples
Fraud detection using a labeled dataset
[Figure: a list of transactions (Trx 1 to Trx 6, …) and a model]
KNIME Analytics Platform
• An open source tool for data analysis, manipulation, visualization, and reporting
• Based on the graphical programming paradigm
• Provides a diverse array of extensions:
– Text Mining
– Network Mining
– Cheminformatics
– Many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, etc.
Model training with labeled data
Workflow on the KNIME Hub:
https://kni.me/w/gwBpbUtj0awOERjg
The Final Goal of a Classification Model
Contact customers for no reason
vs. accept a higher amount of fraud
Model training with labeled data
• Classification based on the predicted positive class score
• Optimize on Cohen’s kappa
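A minimal sketch of the two points on this slide, assuming binary labels (1 = fraud) and a model that outputs a positive class score between 0 and 1. The function names, the toy scores, and the candidate thresholds are illustrative assumptions and are not taken from the KNIME workflow.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (0 = legitimate, 1 = fraud)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = n - tp - tn - fp
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def classify(scores, threshold):
    """Label a transaction as fraud when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_threshold(y_true, scores, candidates):
    """Return the (threshold, kappa) pair with the highest kappa."""
    scored = [(th, cohen_kappa(y_true, classify(scores, th))) for th in candidates]
    return max(scored, key=lambda pair: pair[1])


# Made-up positive class scores: 16 legitimate and 4 fraudulent transactions.
legit_scores = [0.02, 0.05, 0.08, 0.11, 0.12, 0.15, 0.18, 0.21,
                0.22, 0.25, 0.28, 0.31, 0.33, 0.35, 0.46, 0.62]
fraud_scores = [0.42, 0.55, 0.72, 0.91]
y_true = [0] * len(legit_scores) + [1] * len(fraud_scores)
scores = legit_scores + fraud_scores

candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold, kappa = best_threshold(y_true, scores, candidates)
default_kappa = cohen_kappa(y_true, classify(scores, 0.5))
print(f"best threshold: {threshold}, kappa: {kappa:.3f}")
print(f"kappa at the default threshold 0.5: {default_kappa:.3f}")
```

On this toy data the threshold with the highest kappa lies below 0.5: it catches one more fraud case at the cost of one more false alarm.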
Classifying Imbalanced Data
• Some accuracy metrics are not informative when the target class is imbalanced
[Figure: two scatter plots (x vs. y) of fraudulent and legitimate transactions, annotated with accuracy = 99.9 % and accuracy = 95.4 % and with the percentage correctly classified (99 % and 51 %; 98 % and 93 %)]
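The point about uninformative accuracy can be reproduced with a few lines. This is a sketch with invented numbers (10 fraud cases among 10,000 transactions) and a trivial model that labels everything as legitimate. The helper names are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of all transactions that are classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def share_correct(y_true, y_pred, label):
    """Share of the transactions of one class that are classified correctly."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in preds if p == label) / len(preds)


# Invented, highly imbalanced data: 0 = legitimate, 1 = fraudulent.
y_true = [0] * 9990 + [1] * 10
# A trivial "model" that never predicts fraud.
y_pred = [0] * len(y_true)

overall = accuracy(y_true, y_pred)
legit_correct = share_correct(y_true, y_pred, 0)
fraud_correct = share_correct(y_true, y_pred, 1)

print(f"accuracy: {overall:.1%}")
print(f"legitimate correctly classified: {legit_correct:.0%}")
print(f"fraudulent correctly classified: {fraud_correct:.0%}")
```

The overall accuracy looks excellent although no fraud case is found, which is why the per-class share of correct classifications is the more useful number here.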
Handling Imbalanced Data
• Resample data in order to make the target class distribution balanced
[Figure: two scatter plots (x vs. y) of fraudulent and legitimate transactions]
Resampling Techniques
• Undersampling
– Remove a random sample of the majority class events
• Oversampling
– Duplicate a random sample of the minority class events
• SMOTE
– Generate synthetic events in the minority class
[Figure: four scatter plots (x vs. y) of fraudulent and legitimate transactions: the unbalanced data and the data after each resampling technique]
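The three resampling techniques can be sketched in plain Python. This is a simplified illustration on made-up 2D points, not the implementation used in the KNIME workflows. In particular, `smote_like` only shows the core idea of SMOTE (interpolating between a minority point and one of its nearest minority neighbours), and all names and parameters are assumptions.

```python
import math
import random


def undersample(majority, minority, rng):
    """Keep a random sample of the majority class, as large as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, rng):
    """Duplicate randomly chosen minority events until both classes are equal in size."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


def smote_like(majority, minority, rng, k=3):
    """Create synthetic minority events between a point and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the line segment between the two points
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, neighbour)))
    return list(majority), list(minority) + synthetic


rng = random.Random(42)
# Made-up data: 200 legitimate and 10 fraudulent transactions with two features.
legitimate = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
fraudulent = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

under = undersample(legitimate, fraudulent, rng)
over = oversample(legitimate, fraudulent, rng)
smote = smote_like(legitimate, fraudulent, rng)

for name, (major, minor) in [("undersampling", under), ("oversampling", over), ("SMOTE-like", smote)]:
    print(f"{name}: {len(major)} legitimate, {len(minor)} fraudulent")
```

Undersampling discards information from the majority class, oversampling repeats existing minority events, and the SMOTE-like variant adds new points that lie between existing minority events.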
Situation 2: The dataset has no fraud examples
Fraud detection using an unlabeled dataset
[Figure: application areas and data sources: Fault Detection, Fraud Detection, Predictive Maintenance, Intrusion, Medicine, Heart Beat, Sensor Data, Assembling Details, Transactions, Networks, Finance, IoT, Weather Information, System Health Monitoring]
What is an autoencoder?
• Input layer: input 𝒙, the feature vector of a transaction (time, amount, etc.)
• Hidden layers: linear transformation of the feature vector
• Output layer: output 𝒙′, the reconstructed feature vector of a transaction (time, amount, etc.)
• Distance between 𝒙 and 𝒙′ → fraudulent or legitimate
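The decision rule based on the distance between the input and its reconstruction can be sketched as follows. This is an illustration under strong assumptions: instead of training a neural network, a 2 → 1 → 2 linear encoder/decoder is fitted in closed form along the principal axis of the data, which serves as a stand-in for a trained linear autoencoder. The two features, the data, and the 99 % threshold rule are invented for the example.

```python
import math
import random


def fit_linear_autoencoder(rows):
    """Fit a 2 -> 1 -> 2 linear encoder/decoder in closed form (principal axis)."""
    n = len(rows)
    mean = (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows) / n
    syy = sum((r[1] - mean[1]) ** 2 for r in rows) / n
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / n
    # Angle of the direction with the largest variance.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return mean, (math.cos(angle), math.sin(angle))


def reconstruct(x, mean, direction):
    """Encode x into one number, then decode it back into a feature vector."""
    code = sum((xi - mi) * di for xi, mi, di in zip(x, mean, direction))
    return tuple(mi + code * di for mi, di in zip(mean, direction))


def reconstruction_distance(x, mean, direction):
    """Euclidean distance between the input x and its reconstruction x'."""
    return math.dist(x, reconstruct(x, mean, direction))


rng = random.Random(0)
# Made-up legitimate transactions: the second feature is roughly twice the first.
legitimate = []
for _ in range(200):
    a = rng.uniform(-1.0, 1.0)
    legitimate.append((a, 2.0 * a + rng.gauss(0.0, 0.1)))

mean, direction = fit_linear_autoencoder(legitimate)

# Assumed rule: flag a transaction whose distance exceeds the 99th percentile
# of the distances observed on the legitimate training data.
distances = sorted(reconstruction_distance(x, mean, direction) for x in legitimate)
threshold = distances[int(0.99 * len(distances))]


def is_suspicious(x):
    return reconstruction_distance(x, mean, direction) > threshold


print(is_suspicious((0.5, 1.0)))   # follows the learned pattern
print(is_suspicious((0.5, -1.0)))  # breaks the learned pattern
```

Because the model only sees legitimate transactions during fitting, it reconstructs transactions that follow the usual pattern well and reconstructs unusual ones poorly.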
Example of an autoencoder
• Training with numbers: Input → Encoder → Compressed representation → Decoder → Reconstructed input
• Applying the trained autoencoder (Encoder → Decoder): the difference between the input and the reconstructed input is either small or big
Fraud detection using an autoencoder
Workflow on the KNIME Hub:
https://kni.me/w/9qFNMrsuN4PH1hRg
Fraud detection using isolation forest
Workflow on the KNIME Hub:
https://kni.me/w/xSIWSAh_u-fwgi5B
Isolation forest algorithm
Idea: Outliers can be isolated with fewer random splits
[Figure: isolation trees that split the data on the features 𝑥1 and 𝑥2]
→ shorter mean path length, i.e. fewer random splits
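The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the algorithm as implemented in any library: it follows a single point through repeated random splits and counts the splits needed to isolate it, without the subsampling and score normalization of a full isolation forest. The data and all names are assumptions.

```python
import math
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a random feature
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)              # pick a random split value
    # Keep only the points on the same side of the split as `point`.
    same_side = [p for p in data if (p[dim] < split) == (point[dim] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=100, seed=0):
    """Average number of splits over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees


rng = random.Random(1)
# Made-up data: a dense cluster of 200 points plus one point far away.
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = cluster + [outlier]
inlier = min(cluster, key=lambda p: math.dist(p, (0.0, 0.0)))  # most central point

outlier_length = mean_path_length(outlier, data)
inlier_length = mean_path_length(inlier, data)
print(f"mean path length, outlier: {outlier_length:.1f}")
print(f"mean path length, inlier:  {inlier_length:.1f}")
```

A point far from the cluster is usually separated after a few splits, while a point in the middle of the cluster needs many more, so a short mean path length marks a transaction as a candidate outlier.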
Fraud Detection in Labeled and Non-Labeled Data
• Fraud Detection Using a Neural Autoencoder
as #13 most read article on
• Fraud Detection using Random Forest, Neural Autoencoder, and Isolation
Forest techniques tutorial on
Follow the KNIME blog for more articles:
https://www.knime.com/blog
The KNIME Hub
https://hub.knime.com
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.
Thank You
