© 2020 KNIME AG. All Rights Reserved.
Tutorial on Credit Card Fraud Detection
Maarit Widmann
maarit.widmann@knime.com
Approaches for a labeled vs. unlabeled dataset
• Situation 1: The dataset has enough fraud examples
– Train a classification model
• Situation 2: The dataset has no (or just a negligible number of) fraud examples
– Use a neural autoencoder
– Use an outlier detection technique, e.g. isolation forest
Situation 1: The dataset has enough fraud examples
Fraud detection using a labeled dataset
[Figure: labeled transactions (Trx 1, Trx 2, …, Trx 6, …) are used to train a model]
KNIME Analytics Platform
• An open source tool for data analysis, manipulation, visualization, and reporting
• Based on the graphical programming paradigm
• Provides a diverse array of extensions:
– Text Mining
– Network Mining
– Cheminformatics
– Many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, etc.
Model training with labeled data
Workflow on the KNIME Hub:
https://kni.me/w/gwBpbUtj0awOERjg
The Final Goal of a Classification Model
Contact customers for no reason (false alarms) vs. accept a higher amount of undetected fraud
Model training with labeled data
Classification based on the predicted positive class score
Optimize the classification threshold on Cohen’s kappa
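A minimal Python sketch of the threshold optimization above, assuming binary labels (1 = fraud, 0 = legitimate) and a list of predicted P(fraud) scores: every distinct score is tried as a threshold, and the one with the highest Cohen’s kappa is kept. The function names and toy values are hypothetical and not taken from the KNIME workflow.

```python
# Sketch: pick the threshold on P(fraud) that maximizes Cohen's kappa.
# The labels and scores below are made-up toy values.


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (1 = fraud, 0 = legitimate)."""
    n = len(y_true)
    # Observed agreement: share of rows where prediction and label match
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the class frequencies of labels and predictions
    label_pos = sum(y_true) / n
    pred_pos = sum(y_pred) / n
    p_e = label_pos * pred_pos + (1 - label_pos) * (1 - pred_pos)
    if p_e == 1:
        return 0.0  # convention for the degenerate case where kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def best_threshold(y_true, scores):
    """Try every distinct score as threshold; predict fraud when score >= threshold."""
    best_t, best_k = None, float("-inf")
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        k = cohens_kappa(y_true, y_pred)
        if k > best_k:
            best_t, best_k = t, k
    return best_t, best_k


y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]
threshold, kappa = best_threshold(y_true, scores)
print(threshold, round(kappa, 3))  # 0.45 0.783
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is more informative than accuracy when fraud is rare.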
Finding the Optimal Classification Threshold
• Find the optimal classification threshold based on the true positive rate and the false positive rate
• Choose the threshold according to the final goal of your model
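[Figure: ROC curve of P (fraud) (legend value 0.913) compared with a random classifier, with the optimal threshold marked. x-axis: False Positive Rate (legitimate classified as fraud); y-axis: True Positive Rate (fraud classified as fraud). Operating points toward the lower left tolerate more fraud and fewer false alarms; points toward the upper right tolerate less fraud and more false alarms.]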
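The trade-off in the figure can be made explicit with costs. The plain-Python sketch below computes the ROC points and picks the threshold with the lowest total cost; the costs of a missed fraud and of a false alarm are hypothetical inputs standing in for the final goal of the model, and the toy data is invented.

```python
# Sketch: derive ROC points and choose the threshold with the lowest total cost.
# The cost values are hypothetical business inputs, not part of the original workflow.


def counts_at(y_true, scores, t):
    """True positives and false positives when predicting fraud for score >= t."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
    return tp, fp


def roc_points(y_true, scores):
    """(threshold, false positive rate, true positive rate) for each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp = counts_at(y_true, scores, t)
        points.append((t, fp / neg, tp / pos))
    return points


def cheapest_threshold(y_true, scores, cost_missed_fraud, cost_false_alarm):
    """Threshold minimizing missed fraud * its cost + false alarms * their cost."""
    pos = sum(y_true)
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores), reverse=True):
        tp, fp = counts_at(y_true, scores, t)
        cost = (pos - tp) * cost_missed_fraud + fp * cost_false_alarm
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]
print(roc_points(y_true, scores)[:2])
# Missed fraud is expensive: lower threshold, more false alarms tolerated
print(cheapest_threshold(y_true, scores, cost_missed_fraud=10, cost_false_alarm=1))  # (0.45, 1)
# False alarms are expensive: higher threshold, more fraud tolerated
print(cheapest_threshold(y_true, scores, cost_missed_fraud=1, cost_false_alarm=10))  # (0.8, 1)
```

Raising the cost of missed fraud moves the chosen operating point toward the upper right of the ROC curve; raising the cost of false alarms moves it toward the lower left.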
Classifying Imbalanced Data
Classifying Imbalanced Data
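• Some accuracy metrics, such as overall accuracy, are not informative when the target class is imbalanced
[Figure: two scatter plots (x vs. y) of fraudulent and legitimate transactions. Left: accuracy = 99.9 %, with 99 % and 51 % of the two classes correctly classified. Right: accuracy = 95.4 %, with 98 % and 93 % of the two classes correctly classified.]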
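A small illustration of why overall accuracy misleads on imbalanced data, using invented confusion-matrix counts (100 fraudulent and 99,900 legitimate transactions): accuracy reaches 99.9 % although only half of the fraud is detected. Balanced accuracy, the mean of the two per-class rates, is included here as one possible alternative metric; it is an addition, not something the slides prescribe.

```python
# Illustration with invented counts: high accuracy despite missing half of the fraud.


def class_report(tp, fn, tn, fp):
    """Overall accuracy vs. the share of each class that was classified correctly."""
    fraud_correct = tp / (tp + fn)  # true positive rate
    legit_correct = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "fraud_correct": fraud_correct,
        "legitimate_correct": legit_correct,
        "balanced_accuracy": (fraud_correct + legit_correct) / 2,
    }


# 100 fraudulent and 99,900 legitimate transactions (hypothetical)
report = class_report(tp=50, fn=50, tn=99_850, fp=50)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```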
Handling Imbalanced Data
• Resample the data to balance the target class distribution
[Figure: scatter plots (x vs. y) of fraudulent and legitimate transactions, before and after resampling]
Resampling Techniques
SMOTE
• Generate synthetic events in the minority class
Undersampling
• Remove a random sample of the majority class events
Oversampling
• Duplicate a random sample of the minority class events
[Figure: scatter plots (x vs. y) of fraudulent and legitimate transactions for the unbalanced data and after each resampling technique]
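The three resampling techniques can be sketched in plain Python as follows. The toy data, the choice of k = 2 neighbours for SMOTE, and the function names are assumptions for illustration; in practice a tested library implementation would normally be preferred.

```python
# Sketch of the three resampling techniques on 2-D toy data.
# Rows are tuples of floats; all data and parameter values are made up.
import math
import random


def undersample(majority, n, rng):
    """Keep a random sample of n majority class rows; the rest are removed."""
    return rng.sample(majority, n)


def oversample(minority, n, rng):
    """Duplicate randomly chosen minority class rows until there are n rows."""
    return minority + [rng.choice(minority) for _ in range(n - len(minority))]


def smote(minority, n_new, k, rng):
    """SMOTE-style synthetic rows: pick a minority row, pick one of its k nearest
    minority neighbours, and place a new row at a random point between them."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # in [0, 1): position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(0)
fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3)]
legit = [(float(i), float(i % 7)) for i in range(40)]

under = undersample(legit, len(fraud), rng)  # 4 legitimate + 4 fraud rows
over = oversample(fraud, len(legit), rng)    # 40 legitimate + 40 fraud rows
smoted = fraud + smote(fraud, len(legit) - len(fraud), k=2, rng=rng)  # 40 + 40 rows
print(len(under), len(over), len(smoted))
```

Unlike plain oversampling, the SMOTE rows are new points on the segments between nearby fraud examples, so the minority class is enlarged without exact duplicates.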
Situation 2: The dataset has no fraud examples
Fraud detection using an unlabeled dataset
[Figure: application areas such as fraud detection, fault detection, predictive maintenance, intrusion detection, medicine, and system health monitoring, with data such as transactions, finance data, sensor data, IoT data, assembling details, network data, heart beat, and weather information]
What is an autoencoder?
Input layer → hidden layers → output layer
• Input 𝒙: feature vector of a transaction (time, amount, etc.)
• Hidden layers: transformation of the feature vector into a compressed representation
• Output 𝒙′: reconstructed feature vector of the transaction (time, amount, etc.)
• Distance between 𝒙 and 𝒙′ → fraudulent or legitimate
Example of an autoencoder
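Training with numbers: input → encoder → compressed representation → decoder → reconstructed input
Applying the trained autoencoder: input → encoder → decoder → reconstructed input
• Input − reconstructed input = small → the input resembles the training data
• Input − reconstructed input = big → the input differs from the training data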
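To make the reconstruction-error idea concrete, here is a sketch with NumPy. It swaps in a deliberately tiny linear autoencoder (3 features → 1 code → 3 features, no activation functions) trained by plain gradient descent on made-up legitimate transactions; it is not the network used in the KNIME workflow, and the 99th-percentile threshold is likewise an assumption.

```python
# Sketch: a tiny linear autoencoder (3 -> 1 -> 3, no activation functions),
# trained with plain gradient descent on made-up "legitimate" transactions.
# Data, layer sizes, learning rate, and the 99th-percentile threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Legitimate rows vary mainly along one direction in a 3-feature space
direction = np.array([1.0, 2.0, 2.0]) / 3.0  # unit vector
t = rng.uniform(-1.0, 1.0, size=(200, 1))
X_train = t * direction + 0.01 * rng.normal(size=(200, 3))

W_enc = rng.uniform(0.0, 0.1, size=(3, 1))  # encoder: 3 features -> 1 code
W_dec = rng.uniform(0.0, 0.1, size=(1, 3))  # decoder: 1 code -> 3 features


def reconstruction_error(X):
    """Squared Euclidean distance between each row x and its reconstruction x'."""
    X_hat = (X @ W_enc) @ W_dec
    return np.sum((X_hat - X) ** 2, axis=1)


learning_rate, n = 0.5, len(X_train)
for _ in range(300):
    code = X_train @ W_enc                          # compressed representation, (n, 1)
    grad_out = 2.0 * (code @ W_dec - X_train) / n   # d(loss)/d(x'), loss = mean squared distance
    grad_dec = code.T @ grad_out                    # (1, 3)
    grad_enc = X_train.T @ (grad_out @ W_dec.T)     # (3, 1)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

# Flag transactions whose error exceeds 99 % of the training errors
threshold = np.percentile(reconstruction_error(X_train), 99)


def is_fraud(x):
    return bool(reconstruction_error(np.atleast_2d(x))[0] > threshold)


print(is_fraud(0.5 * direction))             # follows the training pattern
print(is_fraud(np.array([2.0, -2.0, 1.0])))  # orthogonal to it
```

Because the model only learned to reconstruct the pattern of legitimate rows, the second input is reconstructed poorly and its distance to the reconstruction exceeds the threshold.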
Fraud detection using an autoencoder
Workflow on the KNIME Hub:
https://kni.me/w/9qFNMrsuN4PH1hRg
Fraud detection using isolation forest
Workflow on the KNIME Hub:
https://kni.me/w/xSIWSAh_u-fwgi5B
Isolation forest algorithm
Idea: Outliers can be isolated with fewer random splits
[Figure: sequences of random splits on features 𝑥1 and 𝑥2; an outlier is isolated after fewer splits than a normal point]
→ shorter mean path length, i.e. fewer random splits
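The isolation idea can be sketched in plain Python: each tree splits a random subsample on a random feature at a random value, and the anomaly score 2^(−E[h(x)]/c(n)) from the original isolation forest paper (Liu, Ting, and Zhou) turns the mean path length E[h(x)] into a value between 0 and 1. The toy data, number of trees, and subsample size are assumptions, and this is not the implementation used in the KNIME workflow.

```python
# Sketch of isolation forest scoring: random axis-parallel splits, a height limit,
# and the score 2^(-E[h(x)] / c(n)). Data and parameters are assumptions.
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    features = [f for f in range(len(points[0]))
                if min(p[f] for p in points) < max(p[f] for p in points)]
    if not features:
        return {"size": len(points)}  # all remaining points are identical
    f = rng.choice(features)
    split = rng.uniform(min(p[f] for p in points), max(p[f] for p in points))
    return {
        "feature": f,
        "split": split,
        "left": build_tree([p for p in points if p[f] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[f] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Splits needed to reach x's leaf, plus c(size) for points left unseparated there."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest(data, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]

    def score(x):
        """Anomaly score in (0, 1]; values close to 1 suggest an outlier."""
        mean_path = sum(path_length(x, tree) for tree in trees) / len(trees)
        return 2.0 ** (-mean_path / c(sample_size))

    return score


gen = random.Random(1)
points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(256)]
score = isolation_forest(points + [(6.0, 6.0)])
print(round(score((6.0, 6.0)), 2), round(score((0.0, 0.0)), 2))
```

The outlier at (6, 6) should receive a higher score than the centre of the point cloud at (0, 0), because on average fewer random splits are needed to isolate it.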
Fraud Detection in Labeled and Non-Labeled Data
• Fraud Detection Using a Neural Autoencoder, as #13 most read article on
• Fraud Detection using Random Forest, Neural Autoencoder, and Isolation Forest techniques, tutorial on
Follow the KNIME blog for more articles:
https://www.knime.com/blog
The KNIME Hub
https://hub.knime.com
Next Data Talk Meetup Announcement!
knime.com/events or meetup.com!
Thank You!
#KNIME
#BerlinMeetup

About .NET 8 and a first glimpse into .NET9About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
Program with GUTs
Program with GUTsProgram with GUTs
Program with GUTs
 

Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020

  • 1. © 2020 KNIME AG. All Rights Reserved. Tutorial on Credit Card Fraud Detection. Maarit Widmann, maarit.widmann@knime.com
  • 2. Approaches for a labeled vs. unlabeled dataset
    – Situation 1: The dataset has enough fraud examples
        – Train a classification model
    – Situation 2: The dataset has no (or just a negligible number of) fraud examples
        – Use a neural autoencoder
        – Use an outlier detection technique, e.g. isolation forest
  • 3. Situation 1: The dataset has enough fraud examples
  • 4. Fraud detection using a labeled dataset
    – [Diagram: transactions (Trx 1, Trx 2, Trx 3, Trx 4, Trx 5, Trx 6, …) are fed into a model.]
  • 5. KNIME Analytics Platform
    – An open-source tool for data analysis, manipulation, visualization, and reporting
    – Based on the graphical programming paradigm
    – Provides a diverse array of extensions:
        – Text Mining
        – Network Mining
        – Cheminformatics
        – Many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, etc.
  • 6. Model training with labeled data. Workflow on the KNIME Hub: https://kni.me/w/gwBpbUtj0awOERjg
  • 7. The final goal of a classification model: contact customers for no reason (false alarms) vs. accept a higher amount of fraud (missed fraud)
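The trade-off on slide 7 can be made explicit by assigning a cost to each type of error and picking the threshold with the lowest total cost. The sketch below is only an illustration: the cost values (`COST_FALSE_ALARM`, `COST_MISSED_FRAUD`), the toy labels and scores, and the function names are hypothetical and not part of the KNIME workflow. Labels are coded as 1 = fraud and 0 = legitimate.

```python
# Hypothetical costs: contacting a customer for no reason vs. letting a fraud through.
COST_FALSE_ALARM = 5.0
COST_MISSED_FRAUD = 100.0


def error_counts(y_true, scores, threshold):
    """Return (false_alarms, missed_frauds) when score >= threshold is classified as fraud."""
    false_alarms = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    missed_frauds = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return false_alarms, missed_frauds


def total_cost(y_true, scores, threshold):
    """Total cost of both error types at the given threshold."""
    false_alarms, missed_frauds = error_counts(y_true, scores, threshold)
    return false_alarms * COST_FALSE_ALARM + missed_frauds * COST_MISSED_FRAUD


def cheapest_threshold(y_true, scores, candidates):
    """Candidate threshold with the lowest total cost (the first one wins on ties)."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t))


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.55, 0.90]
    for t in [0.3, 0.5, 0.8]:
        print(t, error_counts(y_true, scores, t), total_cost(y_true, scores, t))
    print("cheapest:", cheapest_threshold(y_true, scores, [0.3, 0.5, 0.8]))
```

Raising `COST_MISSED_FRAUD` relative to `COST_FALSE_ALARM` tends to push the cheapest threshold down, i.e. toward contacting more customers.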
  • 8. Model training with labeled data: classification based on the predicted positive-class (fraud) score; optimize on Cohen's kappa
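Cohen's kappa compares the observed agreement between predictions and labels with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e), so it is less flattered by the majority class than plain accuracy. A minimal sketch of choosing the threshold on the fraud score that maximizes kappa, assuming labels coded as 1 = fraud and 0 = legitimate; the helper names and toy data are hypothetical, and the KNIME workflow may search thresholds differently.

```python
def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when score >= threshold is classified as fraud (1)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def cohens_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and true labels.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention chosen for this sketch
    return (p_observed - p_expected) / (1 - p_expected)


def best_kappa_threshold(y_true, scores, candidates):
    """Candidate threshold with the highest kappa (the first one wins on ties)."""
    return max(candidates, key=lambda t: cohens_kappa(*confusion(y_true, scores, t)))


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.55, 0.90]
    for t in [0.3, 0.5, 0.8]:
        print(t, round(cohens_kappa(*confusion(y_true, scores, t)), 3))
    print("best:", best_kappa_threshold(y_true, scores, [0.3, 0.5, 0.8]))
```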
  • 9. Finding the Optimal Classification Threshold
    – Find the optimal classification threshold based on the true positive rate and the false positive rate
    – Find the threshold according to the final goal of the model
    – [Figure: ROC curve with the False Positive Rate (legitimate classified as fraud) on the x-axis and the True Positive Rate (fraud classified as fraud) on the y-axis, compared with a random classifier; the optimal threshold is marked on the curve (figure label: P(fraud) 0.913).]
    – A higher threshold on P(fraud) tolerates more fraud and fewer false alarms; a lower threshold tolerates less fraud and more false alarms
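Each threshold on P(fraud) gives one (FPR, TPR) point on the curve of slide 9. The sketch below computes these points for toy data and then picks a threshold according to a goal stated as a minimum true positive rate. Expressing the "final goal" this way, the function names, and the data are assumptions for illustration only; labels are coded as 1 = fraud and 0 = legitimate.

```python
def roc_points(y_true, scores):
    """Return (threshold, FPR, TPR) for each distinct score used as threshold (score >= t means fraud)."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((t, fp / negatives, tp / positives))
    return points


def threshold_for_min_tpr(points, min_tpr):
    """Hypothetical goal: catch at least min_tpr of the fraud with as few false alarms as possible."""
    eligible = [p for p in points if p[2] >= min_tpr]
    if not eligible:
        return None
    # Lowest FPR wins; on ties, prefer the higher threshold.
    return min(eligible, key=lambda p: (p[1], -p[0]))[0]


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 0, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
    for threshold, fpr, tpr in roc_points(y_true, scores):
        print(f"t={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
    print(threshold_for_min_tpr(roc_points(y_true, scores), 1.0))
```

Raising `min_tpr` moves the chosen threshold down the score range: more fraud is caught at the price of more false alarms.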
  • 10. Classifying Imbalanced Data
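  • 11. Classifying Imbalanced Data
    – Some accuracy metrics, such as overall accuracy, are not informative when the target class is imbalanced
    – [Figure: two classifiers separating fraudulent from legitimate transactions in a feature space (x, y). Left: accuracy = 99.9 %, with 99 % and 51 % of the two classes correctly classified. Right: accuracy = 95.4 %, with 98 % and 93 % of the two classes correctly classified. The higher overall accuracy comes with only 51 % correctly classified in one class.]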
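The effect on slide 11 is easy to reproduce with made-up numbers: a model that never flags fraud still reaches high accuracy when fraud is rare. The sketch compares overall accuracy with per-class recall and balanced accuracy (the mean of the per-class recalls); the 990/10 split and the function names are hypothetical, not the tutorial's dataset.

```python
def accuracy(y_true, y_pred):
    """Share of all events that are classified correctly."""
    return sum(1 for a, b in zip(y_true, y_pred) if a == b) / len(y_true)


def per_class_recall(y_true, y_pred, label):
    """Share of the events of one class that are classified correctly."""
    idx = [i for i, y in enumerate(y_true) if y == label]
    return sum(1 for i in idx if y_pred[i] == label) / len(idx)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    labels = sorted(set(y_true))
    return sum(per_class_recall(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions.
    y_true = [0] * 990 + [1] * 10
    # A useless model that labels every transaction as legitimate.
    y_pred = [0] * 1000
    print(accuracy(y_true, y_pred))             # 0.99
    print(per_class_recall(y_true, y_pred, 1))  # 0.0
    print(balanced_accuracy(y_true, y_pred))    # 0.5
```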
  • 12. Handling Imbalanced Data
    – Resample the data to make the target class distribution balanced
    – [Figure: scatter plots (x, y) of fraudulent and legitimate transactions before and after resampling.]
  • 13. Resampling Techniques
    – SMOTE: generate synthetic events in the minority class, interpolated between neighboring minority-class events
    – Undersampling: remove a random sample of the majority-class events
    – Oversampling: duplicate a random sample of the minority-class events
    – [Figure: scatter plots (x, y) of fraudulent and legitimate transactions for the unbalanced data and after each resampling technique.]
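The three resampling techniques on slide 13 can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used by the KNIME nodes: events are tuples of numbers, `smote` assumes at least two minority events, and the function names and toy data are hypothetical.

```python
import random


def undersample(majority, minority, rng):
    """Keep a random sample of majority events as large as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, rng):
    """Duplicate randomly chosen minority events until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: place each synthetic event on the segment between a random
    minority event and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of the base event (squared Euclidean distance).
        neighbours = sorted(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [(float(i), 0.0) for i in range(20)]  # toy legitimate events
    minority = [(0.0, 5.0), (1.0, 5.0), (0.0, 6.0), (1.0, 6.0)]  # toy fraud events
    print([len(part) for part in undersample(majority, minority, rng)])  # [4, 4]
    print([len(part) for part in oversample(majority, minority, rng)])   # [20, 20]
    print(smote(minority, 3, 2, rng))
```

Undersampling discards information from the majority class, while oversampling and SMOTE keep all events but add repeated or synthetic minority events.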
  • 14. Situation 2: The dataset has no fraud examples
  • 15. Fraud detection using an unlabeled dataset
    – [Figure labels: Fault Detection, Fraud Detection, Predictive Maintenance, Intrusion, Medicine, System Health Monitoring, Heart Beat, Sensor Data, Assembling Details, Transactions, Networks, Finance, IoT, Weather Information.]
  • 16. What is an autoencoder?
    – Input layer: input 𝒙, the feature vector of a transaction (time, amount, etc.)
    – Hidden layers: a transformation of the feature vector
    – Output layer: output 𝒙′, the reconstructed feature vector of the transaction (time, amount, etc.)
    – The distance between 𝒙 and 𝒙′ decides whether the transaction is classified as fraudulent or legitimate
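Turning the distance between 𝒙 and 𝒙′ into a decision needs a distance measure and a threshold. A minimal sketch, assuming Euclidean distance and a threshold set at a high quantile of the reconstruction errors on legitimate training transactions; both choices and the function names are hypothetical, and the KNIME workflow may define them differently.

```python
import math


def reconstruction_error(x, x_rec):
    """Euclidean distance between a feature vector x and its reconstruction x'."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)))


def fit_threshold(errors_on_legitimate, quantile=0.99):
    """Hypothetical rule: use a high quantile of the errors on legitimate transactions."""
    ordered = sorted(errors_on_legitimate)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def is_fraud(x, x_rec, threshold):
    """Flag a transaction whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x, x_rec) > threshold


if __name__ == "__main__":
    threshold = fit_threshold([0.1, 0.2, 0.15, 0.3, 0.25], quantile=0.8)
    print(threshold)                                         # 0.3
    print(is_fraud((1.0, 2.0), (1.1, 2.1), threshold))       # small error -> False
    print(is_fraud((1.0, 2.0), (4.0, -1.0), threshold))      # big error -> True
```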
  • 17. Example of an autoencoder
    – Training with numbers: the encoder maps the input to a compressed representation, and the decoder rebuilds the input from it; input − reconstructed input = small
    – Applying the trained autoencoder: for an input unlike the training data, input − reconstructed input = big
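The encoder/decoder idea on slide 17 can be illustrated without a neural network: with squared-error loss, the best purely linear autoencoder reconstructs each input by projecting it onto the top principal components of the training data. The NumPy sketch below uses that linear stand-in (not the neural autoencoder from the KNIME workflow) to show a small reconstruction error for data like the training set and a big one for an unusual point; the class name and toy data are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """Linear encoder/decoder built from the top-k principal components (hypothetical helper)."""

    def fit(self, X, k):
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal directions, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[:k]  # shape (k, n_features)
        return self

    def encode(self, X):
        # Compressed representation: coordinates along the k principal directions.
        return (X - self.mean_) @ self.components_.T

    def decode(self, Z):
        # Reconstructed input in the original feature space.
        return Z @ self.components_ + self.mean_

    def reconstruction_error(self, X):
        # Euclidean distance between each input row and its reconstruction.
        return np.linalg.norm(X - self.decode(self.encode(X)), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(-1.0, 1.0, 50)
    # Toy "legitimate" data: 2-D points close to the line x2 = x1.
    X_train = np.column_stack([t, t + 0.01 * rng.standard_normal(50)])
    ae = LinearAutoencoder().fit(X_train, k=1)
    X_new = np.array([[0.5, 0.5], [1.0, -1.0]])
    print(ae.reconstruction_error(X_new))  # first value small, second value big
```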
  • 18. Fraud detection using an autoencoder. Workflow on the KNIME Hub: https://kni.me/w/9qFNMrsuN4PH1hRg
  • 19. Fraud detection using isolation forest. Workflow on the KNIME Hub: https://kni.me/w/xSIWSAh_u-fwgi5B
  • 20. Isolation forest algorithm
    – Idea: an outlier can be isolated with fewer random splits than a normal data point
    – [Figure: successive random splits of the feature space (𝑥1, 𝑥2) isolating a normal point and an outlier.]
    – The outlier has a shorter mean path length, i.e. it needs fewer random splits
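The isolation idea on slide 20 can be sketched in plain Python: build random trees by repeatedly picking a feature and a split value between that feature's minimum and maximum, then average the depth at which a point ends up. The score follows the original isolation forest formulation, s = 2^(−E[h(x)] / c(ψ)), where ψ is the subsample size, but this is a simplified teaching sketch, not the KNIME node; the class, function, and parameter names and the toy data are hypothetical, and `fit` assumes at least two points.

```python
import math
import random


def avg_path(n):
    """c(n): average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximated with Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only features that still vary can separate the points.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}
    dim = rng.choice(dims)
    split = rng.uniform(min(p[dim] for p in points), max(p[dim] for p in points))
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    if "split" not in node:
        # Leaf: add the expected remaining depth for the points left unseparated.
        return depth + avg_path(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, points):
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(points))
        max_depth = max(1, math.ceil(math.log2(self.psi)))
        self.trees = [build_tree(rng.sample(points, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def mean_path_length(self, x):
        return sum(path_length(x, tree) for tree in self.trees) / len(self.trees)

    def score(self, x):
        # Close to 1: likely outlier; clearly below 0.5: likely normal.
        return 2.0 ** (-self.mean_path_length(x) / avg_path(self.psi))


if __name__ == "__main__":
    data_rng = random.Random(1)
    normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(256)]
    forest = IsolationForest().fit(normal + [(8.0, 8.0)])
    print(forest.mean_path_length((8.0, 8.0)), forest.mean_path_length((0.0, 0.0)))
    print(forest.score((8.0, 8.0)), forest.score((0.0, 0.0)))
```

Subsampling (`sample_size`) and the depth limit keep each tree small; outliers are still separated early because they lie far from the bulk of the data.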
  • 21. Fraud Detection in Labeled and Non-Labeled Data
    – Fraud Detection Using a Neural Autoencoder as #13 most read article on
    – Fraud Detection using Random Forest, Neural Autoencoder, and Isolation Forest techniques tutorial on
    – Follow the KNIME blog for more articles: https://www.knime.com/blog
  • 22. The KNIME Hub: https://hub.knime.com
  • 23. Next Data Talk Meetup announcement: knime.com/events or on meetup.com. Thank you! #KNIME #BerlinMeetup