Kamal Gupta Roy
kNN Algorithm
What would be the color of new circles?
| 2
[Figure: circles A, B, C, and D]
Circle Color
A Red
B Brown
C Blue
D ???
It's all about how far from, or how close to, the others you are.
| 4
[Figure: circles A, B, C, and D]
Finding the nearest neighbors: who are they?
Different Names for the same algorithm
| 5
• Memory-based reasoning
• Example-based reasoning
• Instance-based learning
• Lazy learning
• K-nearest neighbor (KNN)
What is k in kNN?
| 6
[Figure: point A with purple, green, and orange dotted circles around it]
Dotted circle  Decision  k
Purple         Red       1
Green          Red       3
Orange         Blue      13
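The table suggests that k is the number of nearest neighbors allowed to vote on the label of a new point. A minimal Python sketch of that idea, assuming a simple majority vote and Euclidean distance; the function name `knn_predict` and the toy points are hypothetical, not taken from the slides:

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query.

    train: list of ((x, y), label) pairs; query: an (x, y) tuple.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Collect the labels of the k closest points and take the most common one.
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: two red points close by, three blue points farther away.
train = [((1.0, 0.0), "Red"), ((0.0, 1.5), "Red"),
         ((2.0, 2.0), "Blue"), ((-2.0, 2.0), "Blue"), ((2.5, -2.0), "Blue")]
query = (0.0, 0.0)
print(knn_predict(train, query, k=1))  # Red
print(knn_predict(train, query, k=5))  # Blue
```

As in the table, a small k follows the closest points, while a large k can let a more numerous class farther away outvote them.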
Choosing the value of k?
| 7
• k is too small: sensitive to noise points
• k is too large: neighborhood may include points from other classes
Optimal k
| 8
Distance
| 9
[Figure: yellow, red, blue, and green paths between points A and B]
| 10
Manhattan Distance
| 11
• The distance between two points measured along axes at right angles.
• In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.
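The formula above can be sketched in a few lines of Python. The function name `manhattan_distance` and the sample points are illustrative assumptions:

```python
def manhattan_distance(p1, p2):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

# Sample points chosen for illustration: |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance((1, 2), (4, 6)))  # 7
```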
Euclidean Distance
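For two points p1 = (x1, y1) and p2 = (x2, y2), the Euclidean distance is the straight-line distance sqrt((x1 - x2)^2 + (y1 - y2)^2). A minimal Python sketch, with an illustrative function name and sample points:

```python
import math

def euclidean_distance(p1, p2):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Sample points chosen for illustration: sqrt(3**2 + 4**2)
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```

For the same pair of points, the distance measured along the axes is 3 + 4 = 7, so the straight-line distance is never the longer of the two.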
Manhattan vs Euclidean
| 13
New value = 46
| 14
Age  Default  Distance (age - 46)  Squared distance  d
25   Y        -21                  441               21
35   Y        -11                  121               11
45   Y        -1                   1                 1
20   Y        -26                  676               26
35   Y        -11                  121               11
52   Y        6                    36                6
23   Y        -23                  529               23
40   N        -6                   36                6
60   N        14                   196               14
48   N        2                    4                 2
33   N        -13                  169               13
27   N        -19                  361               19
37   N        -9                   81                9
Default = Yes (the nearest neighbor, age 45 with d = 1, defaulted)
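The one-feature lookup above can be reproduced with a short Python sketch. Using only the single nearest neighbor (k = 1) is an assumption made here for illustration, and `nearest_label` is a hypothetical helper name:

```python
# Ages and default labels from the table above.
ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
defaults = ["Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"]

def nearest_label(new_age):
    """Return (age, label, distance) of the training age closest to new_age."""
    distances = [abs(age - new_age) for age in ages]
    nearest = distances.index(min(distances))
    return ages[nearest], defaults[nearest], distances[nearest]

print(nearest_label(46))  # (45, 'Y', 1)
```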
Exercise
| 15
Age Loan Default
25 40,000 Y
35 60,000 Y
45 80,000 Y
20 20,000 Y
35 120,000 Y
52 38,000 Y
23 85,000 Y
40 62,000 N
60 98,000 N
48 100,000 N
33 110,000 N
27 130,000 N
37 90,000 N
Predict default for a customer with age = 46 who applied for a loan of 128,000.
• Age = 46
• Loan = 128,000
| 16
• Default = No (nearest neighbor: age 27, loan 130,000, d ≈ 2,000)
Age  Loan     Default  Age dist sq  Loan dist sq    d
25   40,000   Y        441          7,744,000,000   88,000
35   60,000   Y        121          4,624,000,000   68,000
45   80,000   Y        1            2,304,000,000   48,000
20   20,000   Y        676          11,664,000,000  108,000
35   120,000  Y        121          64,000,000      8,000
52   38,000   Y        36           8,100,000,000   90,000
23   85,000   Y        529          1,849,000,000   43,000
40   62,000   N        36           4,356,000,000   66,000
60   98,000   N        196          900,000,000     30,000
48   100,000  N        4            784,000,000     28,000
33   110,000  N        169          324,000,000     18,000
27   130,000  N        361          4,000,000       2,000
37   90,000   N        81           1,444,000,000   38,000
• Age = 46
• Loan = 128 K
| 17
• Default = Yes (nearest neighbor: age 35, loan 120 K, d ≈ 14)
Age  Loan (K)  Default  Age dist sq  Loan dist sq  d
25   40        Y        441          7,744         90
35   60        Y        121          4,624         69
45   80        Y        1            2,304         48
20   20        Y        676          11,664        111
35   120       Y        121          64            14
52   38        Y        36           8,100         90
23   85        Y        529          1,849         49
40   62        N        36           4,356         66
60   98        N        196          900           33
48   100       N        4            784           28
33   110       N        169          324           22
27   130       N        361          4             19
37   90        N        81           1,444         39
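The two tables differ only in the unit used for the loan amount, yet they give opposite predictions. A hedged Python sketch of that effect, assuming Euclidean distance and a single nearest neighbor; `nearest_neighbor` and its `loan_unit` parameter are hypothetical names introduced for illustration:

```python
import math

# (age, loan, default) rows from the exercise table.
rows = [(25, 40000, "Y"), (35, 60000, "Y"), (45, 80000, "Y"), (20, 20000, "Y"),
        (35, 120000, "Y"), (52, 38000, "Y"), (23, 85000, "Y"), (40, 62000, "N"),
        (60, 98000, "N"), (48, 100000, "N"), (33, 110000, "N"), (27, 130000, "N"),
        (37, 90000, "N")]

def nearest_neighbor(age, loan, loan_unit=1):
    """Return the row nearest to (age, loan) when loan gaps are divided by loan_unit."""
    def dist(row):
        return math.hypot(row[0] - age, (row[1] - loan) / loan_unit)
    return min(rows, key=dist)

print(nearest_neighbor(46, 128000))                  # (27, 130000, 'N')
print(nearest_neighbor(46, 128000, loan_unit=1000))  # (35, 120000, 'Y')
```

With the loan in raw units, the loan gap swamps the age gap; expressing it in thousands lets age influence the result, and the nearest neighbor changes.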
Feature Scaling
| 18
Why scaling?
Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
• Height of a person may vary from 1.5 m to 1.8 m
• Weight of a person may vary from 45 kg to 100 kg
• Income of a person may vary from Rs 10K to Rs 5 lakh
| 19
Standardization
Also called Z-score normalization
Mean is zero
Standard deviation is 1
| 20
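A minimal sketch of standardization in Python, using the ages from the exercise table. The helper name `standardize` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption:

```python
import statistics

def standardize(values):
    """Z-score each value: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(v - mean) / std for v in values]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
z = standardize(ages)
# After standardization the mean is (numerically) zero and the standard deviation is one.
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
```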
Max-Min Normalization
Also called min-max scaling
Minimum is zero
Maximum is 1
| 21
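A minimal sketch of min-max scaling in Python, again using the ages from the exercise table. The helper name `min_max_scale` is hypothetical, and the sketch assumes the values are not all identical (otherwise the denominator would be zero):

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    # Assumes hi != lo; a constant column would need special handling.
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
scaled = min_max_scale(ages)
print(min(scaled), max(scaled))  # 0.0 1.0
```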
| 22
[Figure: three panels comparing Raw Data, Z Normalized, and Max-Min]
• Age = 46; norm age: 0.65
• Loan = 128,000; norm loan: 0.98
| 23
• Default = No (nearest neighbor: age 48, loan 100,000, d ≈ 0.26)
Age  Loan     Default  Norm age  Norm loan  Age dist sq  Loan dist sq  d
25   40,000   Y        0.13      0.18       0.28         0.64          0.96
35   60,000   Y        0.38      0.36       0.08         0.38          0.68
45   80,000   Y        0.63      0.55       0.00         0.19          0.44
20   20,000   Y        0.00      0.00       0.42         0.96          1.18
35   120,000  Y        0.38      0.91       0.08         0.01          0.28
52   38,000   Y        0.80      0.16       0.02         0.67          0.83
23   85,000   Y        0.08      0.59       0.33         0.15          0.70
40   62,000   N        0.50      0.38       0.02         0.36          0.62
60   98,000   N        1.00      0.71       0.12         0.07          0.44
48   100,000  N        0.70      0.73       0.00         0.06          0.26
33   110,000  N        0.33      0.82       0.11         0.03          0.36
27   130,000  N        0.18      1.00       0.23         0.00          0.48
37   90,000   N        0.43      0.64       0.05         0.12          0.41
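The normalized table can be reproduced with a short Python sketch: both features are min-max scaled using the training data, and the Euclidean distance is then computed on the scaled values. The helper names `scale` and `normalized_distance` are hypothetical, and taking only the single nearest neighbor is an assumption:

```python
import math

# (age, loan, default) rows from the exercise table.
rows = [(25, 40000, "Y"), (35, 60000, "Y"), (45, 80000, "Y"), (20, 20000, "Y"),
        (35, 120000, "Y"), (52, 38000, "Y"), (23, 85000, "Y"), (40, 62000, "N"),
        (60, 98000, "N"), (48, 100000, "N"), (33, 110000, "N"), (27, 130000, "N"),
        (37, 90000, "N")]
ages = [r[0] for r in rows]
loans = [r[1] for r in rows]

def scale(value, column):
    """Min-max scale value using the minimum and maximum of column."""
    lo, hi = min(column), max(column)
    return (value - lo) / (hi - lo)

def normalized_distance(row, age, loan):
    """Euclidean distance between row and (age, loan) on min-max scaled features."""
    return math.hypot(scale(row[0], ages) - scale(age, ages),
                      scale(row[1], loans) - scale(loan, loans))

nearest = min(rows, key=lambda r: normalized_distance(r, 46, 128000))
print(nearest, round(normalized_distance(nearest, 46, 128000), 2))  # (48, 100000, 'N') 0.26
```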
| 24
[Side-by-side comparison of the three distance tables shown earlier: loan in raw units (slide 16), loan in thousands (slide 17), and min-max normalized features (slide 23)]
CONFUSION MATRIX
| 25
Hiring Process Example
| 26
Confusion Matrix
             Predicted Good                Predicted Bad
Actual Good  Hired Good Candidate (TP)     Rejected Good Candidate (FN)
Actual Bad   Hired Bad Candidate (FP)      Rejected Bad Candidate (TN)
Confusion Matrix
            Predicted Yes  Predicted No
Actual Yes  TP             FN
Actual No   FP             TN
| 27
Accuracy = (TP + TN) / (TP + FN + FP + TN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
Type 1 Error: false positive (FP)
Type 2 Error: false negative (FN)
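The three formulas can be checked with a small Python sketch. The function name `classification_metrics` and the counts passed to it are hypothetical values chosen for illustration:

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "recall": tp / (tp + fn),        # share of actual positives that were found
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
    }

# Hypothetical counts, not taken from the slides.
m = classification_metrics(tp=40, fn=10, fp=20, tn=30)
print(m["accuracy"], m["recall"], round(m["precision"], 2))  # 0.7 0.8 0.67
```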
Precision vs Recall
| 28
Pregnancy Test
| 29
                     Predicted Pregnant  Predicted Not Pregnant
Actual Pregnant      TP                  FN
Actual Not Pregnant  FP                  TN
Sensitivity & Specificity
            Predicted Yes  Predicted No
Actual Yes  TP             FN
Actual No   FP             TN
| 30
True Negative Rate, Specificity = TN / (TN + FP)
False Positive Rate = FP / (TN + FP)
True Positive Rate, Sensitivity = TP / (TP + FN)
False Negative Rate = FN / (TP + FN)
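A short Python sketch of the four rates, with the false negative rate taken over the actual positives, FN / (TP + FN), so that it complements sensitivity in the same way the false positive rate complements specificity. The function name `rates` and the counts are hypothetical:

```python
def rates(tp, fn, fp, tn):
    """Compute the four rates derived from a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "fnr": fn / (tp + fn),          # false negative rate = 1 - sensitivity
        "specificity": tn / (tn + fp),  # true negative rate
        "fpr": fp / (tn + fp),          # false positive rate = 1 - specificity
    }

# Hypothetical counts, not taken from the slides.
r = rates(tp=40, fn=10, fp=20, tn=30)
print(r["sensitivity"], r["fnr"], r["specificity"], r["fpr"])  # 0.8 0.2 0.6 0.4
```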
