4. Kamal Gupta Roy
It's all about how far from, or how close to, others you are.
[Figure: points A, B, C, D]
Finding nearest neighbors: who are they?
5. Kamal Gupta Roy
Different Names for the same algorithm
• Memory-based reasoning
• Example-based reasoning
• Instance-based learning
• Lazy learning
• K-nearest neighbor (KNN)
6. Kamal Gupta Roy
What is k in kNN?
[Figure: point A with purple, green, and orange dotted circles]
Dotted circle   Decision   k
Purple          Red        1
Green           Red        3
Orange          Blue       13
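The table shows the decision changing as k grows. A minimal Python sketch of majority voting follows; the point coordinates are hypothetical (the figure's actual points are not recoverable), chosen only so that the decisions match the table, and `knn_predict` is an illustrative name.

```python
from collections import Counter
import math

def knn_predict(points, query, k):
    """Majority vote among the k points closest to query (Euclidean distance)."""
    ranked = sorted(points, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical layout: 2 red points near the query, 11 blue points farther away.
points = [((0.5, 0.0), "red"), ((0.0, 0.8), "red")]
points += [((3.0 + 0.1 * i, 3.0), "blue") for i in range(11)]

query = (0.0, 0.0)
for k in (1, 3, 13):
    print(k, knn_predict(points, query, k))
```

With these made-up points, k = 1 and k = 3 vote red, while k = 13 takes in all the blue points and votes blue.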
7. Kamal Gupta Roy
Choosing the value of k?
• k is too small: sensitive to noise points
• k is too large: neighborhood may include points from other classes
11. Kamal Gupta Roy
Manhattan Distance
• The distance between two points measured along axes at right angles.
• In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.
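The formula translates directly into code. A small Python sketch, where `manhattan_distance` is an illustrative name and the two points are made up for the example:

```python
def manhattan_distance(p1, p2):
    """Sum of absolute coordinate differences: |x1 - x2| + |y1 - y2|."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

# Hypothetical points p1 = (1, 2) and p2 = (4, 6): |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance((1, 2), (4, 6)))  # 7
```

Because it sums over `zip`, the same function works unchanged for points with more than two coordinates.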
14. Kamal Gupta Roy
New value = 46
Age   Default   distance   square(distance)   d
25    Y         -21        441                21
35    Y         -11        121                11
45    Y         -1         1                  1
20    Y         -26        676                26
35    Y         -11        121                11
52    Y         6          36                 6
23    Y         -23        529                23
40    N         -6         36                 6
60    N         14         196                14
48    N         2          4                  2
33    N         -13        169                13
27    N         -19        361                19
37    N         -9         81                 9
Default = Yes
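The table can be reproduced in a few lines. This is a minimal Python sketch that assumes the prediction comes from the single closest record (k = 1); the slide does not state k, but that choice is consistent with the stated result.

```python
# Ages and default labels copied from the table.
ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
defaults = ["Y"] * 7 + ["N"] * 6

new_age = 46

# d = |age - new_age|, the same quantity as the table's last column.
distances = [abs(age - new_age) for age in ages]

# Index of the single closest record (assumed k = 1).
nearest = min(range(len(ages)), key=lambda i: distances[i])
print(ages[nearest], defaults[nearest], distances[nearest])  # 45 Y 1
```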
15. Kamal Gupta Roy
Exercise
Age   Loan      Default
25    40,000    Y
35    60,000    Y
45    80,000    Y
20    20,000    Y
35    120,000   Y
52    38,000    Y
23    85,000    Y
40    62,000    N
60    98,000    N
48    100,000   N
33    110,000   N
27    130,000   N
37    90,000    N
Predict default for a customer with age = 46 who applied for a loan of 128,000.
16. Kamal Gupta Roy
• Age = 46
• Loan = 128,000
• Default = No
Age   Loan      Default   age dist sq   loan dist sq     d
25    40,000    Y         441           7,744,000,000    88,000
35    60,000    Y         121           4,624,000,000    68,000
45    80,000    Y         1             2,304,000,000    48,000
20    20,000    Y         676           11,664,000,000   108,000
35    120,000   Y         121           64,000,000       8,000
52    38,000    Y         36            8,100,000,000    90,000
23    85,000    Y         529           1,849,000,000    43,000
40    62,000    N         36            4,356,000,000    66,000
60    98,000    N         196           900,000,000      30,000
48    100,000   N         4             784,000,000      28,000
33    110,000   N         169           324,000,000      18,000
27    130,000   N         361           4,000,000        2,000
37    90,000    N         81            1,444,000,000    38,000
17. Kamal Gupta Roy
• Age = 46
• Loan = 128 K
• Default = Yes
Age   Loan (K)   Default   age dist sq   loan dist sq   d
25    40         Y         441           7,744          90
35    60         Y         121           4,624          69
45    80         Y         1             2,304          48
20    20         Y         676           11,664         111
35    120        Y         121           64             14
52    38         Y         36            8,100          90
23    85         Y         529           1,849          49
40    62         N         36            4,356          66
60    98         N         196           900            33
48    100        N         4             784            28
33    110        N         169           324            22
27    130        N         361           4              19
37    90         N         81            1,444          39
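The two tables differ only in the unit used for the loan amount, yet the predicted class flips. A minimal Python sketch of that comparison, assuming Euclidean distance (which matches the d columns) and the single nearest neighbor (k = 1, which matches both stated answers); `nearest_default` and `loan_unit` are illustrative names.

```python
import math

# (age, loan, default) records copied from the exercise table.
records = [
    (25, 40_000, "Y"), (35, 60_000, "Y"), (45, 80_000, "Y"),
    (20, 20_000, "Y"), (35, 120_000, "Y"), (52, 38_000, "Y"),
    (23, 85_000, "Y"), (40, 62_000, "N"), (60, 98_000, "N"),
    (48, 100_000, "N"), (33, 110_000, "N"), (27, 130_000, "N"),
    (37, 90_000, "N"),
]

def nearest_default(records, age, loan, loan_unit=1):
    """Return (label, distance) of the closest record by Euclidean distance.

    loan_unit divides the loan difference (1 = raw amount, 1000 = thousands).
    """
    def dist(rec):
        return math.hypot(rec[0] - age, (rec[1] - loan) / loan_unit)
    best = min(records, key=dist)
    return best[2], dist(best)

print(nearest_default(records, 46, 128_000))        # loan in raw units
print(nearest_default(records, 46, 128_000, 1000))  # loan in thousands
```

In raw units the loan difference swamps the age difference, so the closest record is (27, 130,000, N). In thousands, age carries real weight and the closest record becomes (35, 120, Y).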
19. Kamal Gupta Roy
Why scaling?
Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
• Height of a person may vary from 1.5 m to 1.8 m
• Weight of a person may vary from 45 kg to 100 kg
• Income of a person may vary from Rs 10K to Rs 5 lakh
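One common remedy is min-max scaling, which maps every attribute to the range 0 to 1. The sketch below is in Python and uses the ranges quoted in the example; the two people and the names `ranges`, `min_max_scale`, and `euclidean` are hypothetical, and min-max is only one of several possible scaling choices.

```python
import math

# Ranges from the example (Rs 10K = 10,000; Rs 5 lakh = 500,000).
ranges = {
    "height_m": (1.5, 1.8),
    "weight_kg": (45, 100),
    "income_rs": (10_000, 500_000),
}

def min_max_scale(person):
    """Map each attribute to [0, 1] using its (min, max) range."""
    return {k: (v - ranges[k][0]) / (ranges[k][1] - ranges[k][0])
            for k, v in person.items()}

def euclidean(a, b):
    """Euclidean distance between two attribute dictionaries."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

# Hypothetical people: opposite extremes of height and weight, similar income.
p = {"height_m": 1.5, "weight_kg": 45, "income_rs": 10_000}
q = {"height_m": 1.8, "weight_kg": 100, "income_rs": 20_000}

raw = euclidean(p, q)
scaled = euclidean(min_max_scale(p), min_max_scale(q))
print(raw, scaled)
```

Before scaling, the distance is almost entirely the Rs 10,000 income gap. After scaling, the height and weight differences dominate, as they should for two people at opposite ends of both ranges.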
26. Kamal Gupta Roy
Hiring Process Example
Confusion Matrix
              Predicted Good                Predicted Bad
Actual Good   Hired Good Candidate (TP)     Rejected Good Candidate (FN)
Actual Bad    Hired Bad Candidate (FP)      Rejected Bad Candidate (TN)
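The four cells can be tallied from paired actual and predicted labels. A minimal Python sketch, treating "predicted good" as "hired"; the six candidates and the function name `confusion_counts` are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="good"):
    """Count TP, FN, FP, TN from paired actual/predicted labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:   # hired
            counts["TP" if a == positive else "FP"] += 1
        else:               # rejected
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical outcomes for six candidates.
actual    = ["good", "good", "good", "bad",  "bad", "bad"]
predicted = ["good", "good", "bad",  "good", "bad", "bad"]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```

Here two good candidates are hired (TP), one good candidate is rejected (FN), one bad candidate is hired (FP), and two bad candidates are rejected (TN).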
27. Kamal Gupta Roy
Confusion Matrix
             Predicted Yes   Predicted No
Actual Yes   TP              FN
Actual No    FP              TN
Accuracy = (TP + TN) / (TP + FN + FP + TN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
Type 1 Error: FP
Type 2 Error: FN
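The three formulas can be checked with a short Python sketch. The counts below are hypothetical and the function names are illustrative.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)

# Hypothetical counts for 100 cases.
tp, fn, fp, tn = 40, 10, 5, 45

print(accuracy(tp, fn, fp, tn))  # 85 / 100
print(recall(tp, fn))            # 40 / 50
print(precision(tp, fp))         # 40 / 45
```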
29. Kamal Gupta Roy
Pregnancy Test
                      Predicted Pregnant   Predicted Not Pregnant
Actual Pregnant       TP                   FN
Actual Not Pregnant   FP                   TN
[Figure: panels labeled TN, TP, FP, FN]
30. Kamal Gupta Roy
Sensitivity & Specificity
             Predicted Yes   Predicted No
Actual Yes   TP              FN
Actual No    FP              TN
True Negative Rate, Specificity = TN / (TN + FP)
False Positive Rate = FP / (TN + FP)
True Positive Rate, Sensitivity = TP / (TP + FN)
False Negative Rate = FN / (TP + FN)
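Each rate is computed within one row of the confusion matrix: sensitivity and the false negative rate over actual positives (TP + FN), specificity and the false positive rate over actual negatives (TN + FP). A minimal Python sketch with hypothetical counts and an illustrative function name:

```python
def rates(tp, fn, fp, tn):
    """Return the four row-wise rates of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "fnr": fn / (tp + fn),          # false negative rate
        "specificity": tn / (tn + fp),  # true negative rate
        "fpr": fp / (tn + fp),          # false positive rate
    }

# Hypothetical counts for 100 cases.
r = rates(tp=40, fn=10, fp=5, tn=45)
print(r)
```

Within each row the two rates sum to 1, so sensitivity + false negative rate = 1 and specificity + false positive rate = 1.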