SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Anomaly Detection
What are anomalies/outliers? ,[object Object],Applications:   Credit card fraud detection,  telecommunication fraud detection,  network intrusion detection,  fault detection
Variants of Anomaly/Outlier Detection Problems Given a database D, find all the data points x D with anomaly scores greater than some threshold t Given a database D, find all the data points x D having the top-n largest anomaly scores f(x) Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D
Anomaly Detection Challenges How many outliers are there in the data? Method is unsupervised  Validation can be quite challenging (just like for clustering) Finding needle in a haystack Working assumption: There are considerably more “normal” observations than “abnormal” observations (outliers/anomalies) in the data
Anomaly Detection Schemes  General Steps: Build a profile of the “normal” behavior Profile can be patterns or summary statistics for the overall population Use the “normal” profile to detect anomalies Anomalies are observations whose characteristicsdiffer significantly from the normal profile
Types of anomaly detection schemes ,[object Object]
Distance-based
Model-based,[object Object]
Grubbs’ Test Detect outliers in univariate data Assume data comes from normal distribution Detects one outlier at a time, remove the outlier, and repeat H0: There is no outlier in data HA: There is at least one outlier Grubbs’ test statistic:  Reject H0 if:
Statistical-based – Likelihood Approach Assume the data set D contains samples from a mixture of two probability distributions:  M (majority distribution)  A (anomalous distribution) General Approach: Initially, assume all the data points belong to M Let Lt(D) be the log likelihood of D at time t
Contd… For each point xtthat belongs to M, move it to A  Let Lt+1 (D) be the new log likelihood.  Compute the difference,  = Lt(D) – Lt+1 (D)  If  > c  (some threshold), then xt is declared as an anomaly and moved permanently from M to A
Limitations of Statistical Approaches  Most of the tests are for a single attribute In many cases, data distribution may not be known For high dimensional data, it may be difficult to estimate the true distribution
Distance-based Approaches Data is represented as a vector of features Three major approaches Nearest-neighbor based Density based Clustering based
Nearest-Neighbor Based Approach Approach: Compute the distance between every pair of data points There are various ways to define outliers: Data points for which there are fewer than p neighboring points within a distance D The top n data points whose distance to the kth nearest neighbor is greatest The top n data points whose average distance to the k nearest neighbors is greatest
Density-based: LOF approach For each point, compute the density of its local neighborhood Compute local outlier factor (LOF) of a sample p as the average of the ratios of the density of sample p and the density of its nearest neighbors Outliers are points with largest LOF value
Clustering-Based Basic idea: Cluster the data into groups of different density Choose points in small cluster as candidate outliers Compute the distance between candidate points and non-candidate clusters.  If candidate points are far from all other non-candidate points, they are outliers
Pros and Cons Advantages:  No need to be supervised  Easily adaptable to on-line / incremental mode suitable for anomaly detection from temporal data
Pros and Cons Drawbacks  Computationally expensive  Using indexing structures (k-d tree, R* tree) may alleviate this problem  If normal points do not create any clusters the techniques may fail  In high dimensional spaces, datais sparse and distances between any two data records may become quite similar.  Clustering algorithms may not give any meaningful clusters
conclusion Anomaly detection in data mining is dealt in detail in this presentation Types of anomaly detection and their merits and demerits are briefly discussed.

Weitere ähnliche Inhalte

Was ist angesagt?

Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesHumberto Marchezi
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
 
Anomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsAnomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsManojit Nandi
 
Intrusion Detection with Neural Networks
Intrusion Detection with Neural NetworksIntrusion Detection with Neural Networks
Intrusion Detection with Neural Networksantoniomorancardenas
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detectionShantanuDeosthale
 
Deep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection systemDeep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection systemAvinash Kumar
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slidesQuantUniversity
 
Outlier detection method introduction
Outlier detection method introductionOutlier detection method introduction
Outlier detection method introductionDaeJin Kim
 
Anomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnAnomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnagramfort
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKoundinya Desiraju
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reductionmrizwan969
 
Anomaly Detection in DataMining
Anomaly Detection in DataMiningAnomaly Detection in DataMining
Anomaly Detection in DataMiningBilalAbbasAwan
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learningSecurity Bootcamp
 
Introduction To Intrusion Detection Systems
Introduction To Intrusion Detection SystemsIntroduction To Intrusion Detection Systems
Introduction To Intrusion Detection SystemsPaul Green
 
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...Seminar Presentation | Network Intrusion Detection using Supervised Machine L...
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...Jowin John Chemban
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningKuppusamy P
 

Was ist angesagt? (20)

Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time Series
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
Anomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsAnomaly Detection for Real-World Systems
Anomaly Detection for Real-World Systems
 
Intrusion Detection with Neural Networks
Intrusion Detection with Neural NetworksIntrusion Detection with Neural Networks
Intrusion Detection with Neural Networks
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
 
Deep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection systemDeep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection system
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Outlier detection method introduction
Outlier detection method introductionOutlier detection method introduction
Outlier detection method introduction
 
Anomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnAnomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learn
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Decision tree
Decision treeDecision tree
Decision tree
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Anomaly Detection in DataMining
Anomaly Detection in DataMiningAnomaly Detection in DataMining
Anomaly Detection in DataMining
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
 
Introduction To Intrusion Detection Systems
Introduction To Intrusion Detection SystemsIntroduction To Intrusion Detection Systems
Introduction To Intrusion Detection Systems
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...Seminar Presentation | Network Intrusion Detection using Supervised Machine L...
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
 

Andere mochten auch

Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...XHANI TRUNGU
 
Architecture Challenges In Cloud Computing
Architecture Challenges In Cloud ComputingArchitecture Challenges In Cloud Computing
Architecture Challenges In Cloud ComputingIndicThreads
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningPritesh Ranjan
 
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013Amazon Web Services
 
Leverage points for wicked problems
Leverage points for wicked problemsLeverage points for wicked problems
Leverage points for wicked problemsDemos Helsinki
 
Herding Cats: Innovation Management in an Unpredictable World
Herding Cats: Innovation Management in an Unpredictable WorldHerding Cats: Innovation Management in an Unpredictable World
Herding Cats: Innovation Management in an Unpredictable WorldMichael von Kutzschenbach
 
Network Functions Virtualization – Our Strategy
Network Functions Virtualization – Our StrategyNetwork Functions Virtualization – Our Strategy
Network Functions Virtualization – Our StrategyADVA
 
Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor ...
Multithreading: Exploiting Thread-Level  Parallelism to Improve Uniprocessor ...Multithreading: Exploiting Thread-Level  Parallelism to Improve Uniprocessor ...
Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor ...Ahmed kasim
 
Examples for leverage points
Examples for leverage pointsExamples for leverage points
Examples for leverage pointsGeorges Grinstein
 
Module 5 Bayesian belief network modelling
Module 5 Bayesian belief network modellingModule 5 Bayesian belief network modelling
Module 5 Bayesian belief network modellingThink2Impact
 
Module 4 Leverage points and systemic interventions
Module 4 Leverage points and systemic interventionsModule 4 Leverage points and systemic interventions
Module 4 Leverage points and systemic interventionsThink2Impact
 
Module 3 Systems archetypes
Module 3 Systems archetypesModule 3 Systems archetypes
Module 3 Systems archetypesThink2Impact
 
Design Tools for Systems Thinking
Design Tools for Systems ThinkingDesign Tools for Systems Thinking
Design Tools for Systems ThinkingPeter Vermaercke
 
FIne Grain Multithreading
FIne Grain MultithreadingFIne Grain Multithreading
FIne Grain MultithreadingDharmesh Tank
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAdam Gibson
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learningAdam Gibson
 
Update Your Disaster Recovery Plans with Virtualization
Update Your Disaster Recovery Plans with VirtualizationUpdate Your Disaster Recovery Plans with Virtualization
Update Your Disaster Recovery Plans with VirtualizationJason Dea
 
Disaster recovery and the cloud
Disaster recovery and the cloudDisaster recovery and the cloud
Disaster recovery and the cloudJason Dea
 
Intrusion detection system ppt
Intrusion detection system pptIntrusion detection system ppt
Intrusion detection system pptSheetal Verma
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple pptAgarwaljay
 

Andere mochten auch (20)

Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...
 
Architecture Challenges In Cloud Computing
Architecture Challenges In Cloud ComputingArchitecture Challenges In Cloud Computing
Architecture Challenges In Cloud Computing
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data Mining
 
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013
Intrusion Detection in the Cloud (SEC402) | AWS re:Invent 2013
 
Leverage points for wicked problems
Leverage points for wicked problemsLeverage points for wicked problems
Leverage points for wicked problems
 
Herding Cats: Innovation Management in an Unpredictable World
Herding Cats: Innovation Management in an Unpredictable WorldHerding Cats: Innovation Management in an Unpredictable World
Herding Cats: Innovation Management in an Unpredictable World
 
Network Functions Virtualization – Our Strategy
Network Functions Virtualization – Our StrategyNetwork Functions Virtualization – Our Strategy
Network Functions Virtualization – Our Strategy
 
Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor ...
Multithreading: Exploiting Thread-Level  Parallelism to Improve Uniprocessor ...Multithreading: Exploiting Thread-Level  Parallelism to Improve Uniprocessor ...
Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor ...
 
Examples for leverage points
Examples for leverage pointsExamples for leverage points
Examples for leverage points
 
Module 5 Bayesian belief network modelling
Module 5 Bayesian belief network modellingModule 5 Bayesian belief network modelling
Module 5 Bayesian belief network modelling
 
Module 4 Leverage points and systemic interventions
Module 4 Leverage points and systemic interventionsModule 4 Leverage points and systemic interventions
Module 4 Leverage points and systemic interventions
 
Module 3 Systems archetypes
Module 3 Systems archetypesModule 3 Systems archetypes
Module 3 Systems archetypes
 
Design Tools for Systems Thinking
Design Tools for Systems ThinkingDesign Tools for Systems Thinking
Design Tools for Systems Thinking
 
FIne Grain Multithreading
FIne Grain MultithreadingFIne Grain Multithreading
FIne Grain Multithreading
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Update Your Disaster Recovery Plans with Virtualization
Update Your Disaster Recovery Plans with VirtualizationUpdate Your Disaster Recovery Plans with Virtualization
Update Your Disaster Recovery Plans with Virtualization
 
Disaster recovery and the cloud
Disaster recovery and the cloudDisaster recovery and the cloud
Disaster recovery and the cloud
 
Intrusion detection system ppt
Intrusion detection system pptIntrusion detection system ppt
Intrusion detection system ppt
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 

Ähnlich wie Anomaly Detection

Chap10 Anomaly Detection
Chap10 Anomaly DetectionChap10 Anomaly Detection
Chap10 Anomaly Detectionguest76d673
 
Data Mining Anomaly DetectionLecture Notes for Chapt.docx
Data Mining Anomaly DetectionLecture Notes for Chapt.docxData Mining Anomaly DetectionLecture Notes for Chapt.docx
Data Mining Anomaly DetectionLecture Notes for Chapt.docxrandyburney60861
 
Chapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptChapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptSubrata Kumer Paul
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionKhalid Elshafie
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Salah Amean
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos butest
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysisKrish_ver2
 
Outlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataOutlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataIJERA Editor
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersZac Darcy
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSZac Darcy
 

Ähnlich wie Anomaly Detection (20)

Chap10 Anomaly Detection
Chap10 Anomaly DetectionChap10 Anomaly Detection
Chap10 Anomaly Detection
 
Data Mining Anomaly DetectionLecture Notes for Chapt.docx
Data Mining Anomaly DetectionLecture Notes for Chapt.docxData Mining Anomaly DetectionLecture Notes for Chapt.docx
Data Mining Anomaly DetectionLecture Notes for Chapt.docx
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Data cleaning-outlier-detection
Data cleaning-outlier-detectionData cleaning-outlier-detection
Data cleaning-outlier-detection
 
03 presentation-bothiesson
03 presentation-bothiesson03 presentation-bothiesson
03 presentation-bothiesson
 
Chapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptChapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.ppt
 
12 outlier
12 outlier12 outlier
12 outlier
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly Detection
 
Chapter 12 outlier
Chapter 12 outlierChapter 12 outlier
Chapter 12 outlier
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Outlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataOutlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional Data
 
angle based outlier de
angle based outlier deangle based outlier de
angle based outlier de
 
Kdd08 abod
Kdd08 abodKdd08 abod
Kdd08 abod
 
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected OutliersA Mixture Model of Hubness and PCA for Detection of Projected Outliers
A Mixture Model of Hubness and PCA for Detection of Projected Outliers
 
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERSA MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
A MIXTURE MODEL OF HUBNESS AND PCA FOR DETECTION OF PROJECTED OUTLIERS
 

Mehr von guest0edcaf

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Modelsguest0edcaf
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introductionguest0edcaf
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extractionguest0edcaf
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysisguest0edcaf
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 

Mehr von guest0edcaf (6)

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 

Kürzlich hochgeladen

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Kürzlich hochgeladen (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Anomaly Detection

  • 2.
  • 3. Variants of Anomaly/Outlier Detection Problems Given a database D, find all the data points x D with anomaly scores greater than some threshold t Given a database D, find all the data points x D having the top-n largest anomaly scores f(x) Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D
  • 4. Anomaly Detection Challenges How many outliers are there in the data? Method is unsupervised Validation can be quite challenging (just like for clustering) Finding needle in a haystack Working assumption: There are considerably more “normal” observations than “abnormal” observations (outliers/anomalies) in the data
  • 5. Anomaly Detection Schemes General Steps: Build a profile of the “normal” behavior Profile can be patterns or summary statistics for the overall population Use the “normal” profile to detect anomalies Anomalies are observations whose characteristicsdiffer significantly from the normal profile
  • 6.
  • 8.
  • 9. Grubbs’ Test Detect outliers in univariate data Assume data comes from normal distribution Detects one outlier at a time, remove the outlier, and repeat H0: There is no outlier in data HA: There is at least one outlier Grubbs’ test statistic: Reject H0 if:
  • 10. Statistical-based – Likelihood Approach Assume the data set D contains samples from a mixture of two probability distributions: M (majority distribution) A (anomalous distribution) General Approach: Initially, assume all the data points belong to M Let Lt(D) be the log likelihood of D at time t
  • 11. Contd… For each point xtthat belongs to M, move it to A Let Lt+1 (D) be the new log likelihood. Compute the difference,  = Lt(D) – Lt+1 (D) If  > c (some threshold), then xt is declared as an anomaly and moved permanently from M to A
  • 12. Limitations of Statistical Approaches Most of the tests are for a single attribute In many cases, data distribution may not be known For high dimensional data, it may be difficult to estimate the true distribution
  • 13. Distance-based Approaches Data is represented as a vector of features Three major approaches Nearest-neighbor based Density based Clustering based
  • 14. Nearest-Neighbor Based Approach Approach: Compute the distance between every pair of data points There are various ways to define outliers: Data points for which there are fewer than p neighboring points within a distance D The top n data points whose distance to the kth nearest neighbor is greatest The top n data points whose average distance to the k nearest neighbors is greatest
  • 15. Density-based: LOF approach For each point, compute the density of its local neighborhood Compute local outlier factor (LOF) of a sample p as the average of the ratios of the density of sample p and the density of its nearest neighbors Outliers are points with largest LOF value
  • 16. Clustering-Based Basic idea: Cluster the data into groups of different density Choose points in small cluster as candidate outliers Compute the distance between candidate points and non-candidate clusters. If candidate points are far from all other non-candidate points, they are outliers
  • 17. Pros and Cons Advantages: No need to be supervised Easily adaptable to on-line / incremental mode suitable for anomaly detection from temporal data
  • 18. Pros and Cons Drawbacks Computationally expensive Using indexing structures (k-d tree, R* tree) may alleviate this problem If normal points do not create any clusters the techniques may fail In high dimensional spaces, datais sparse and distances between any two data records may become quite similar. Clustering algorithms may not give any meaningful clusters
  • 19. conclusion Anomaly detection in data mining is dealt in detail in this presentation Types of anomaly detection and their merits and demerits are briefly discussed.
  • 20. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net