2. SVM
• In this presentation, we will learn the
characteristics of SVM by analyzing it on 2
different datasets:
• 1) IRIS
• 2) Mushroom
• Both will be implemented in the WEKA data
mining software.
3. What is SVM?
• Support Vector Machines (SVMs), also called support
vector networks, are supervised learning models with
associated learning algorithms that analyze data
and recognize patterns, used
for classification and regression analysis.
• The basic SVM takes a set of input data and
predicts, for each given input, which of two
possible classes forms the output, making it a
non-probabilistic binary linear classifier.
(Wikipedia)
4. IRIS and SVM
• IRIS dataset: The Iris flower data set is
a multivariate dataset which quantifies the
structural variation of three related species of Iris
flower.
• Classification is therefore done on the basis of flower
species, which are shown in the following colours:
• Iris-setosa: blue
• Iris-versicolor: red
• Iris-virginica: cyan
5.
6. IRIS and SVM
• The data set consists of 50 samples (instances) from
each of three species, for a total of 150.
• Four features were measured from each sample:
• 1) Sepal length
• 2) Petal length
• 3) Sepal width
• 4) Petal width
• All are measured in centimetres.
• A linear discriminant model is used to distinguish
between the species.
• Linear discriminant analysis (LDA) is a method used to find a linear
combination of features which characterizes or separates two or more classes of objects or
events. (Wikipedia)
7. IRIS and SVM
• For our dataset, we will simultaneously analyse
the behaviour of the four features listed above
across the three species of the Iris flower.
• For IRIS, we will implement a multi-class SVM
model, as there are more than 2 classes.
• We can see from the image below that the class 'Iris
setosa' is linearly separable from the rest, while the
other two classes are not linearly separable from each
other. The Iris dataset as a whole is therefore not
linearly separable, which makes it a good example for
applying SVM.
8.
9. Implementation of SVM
• The multi-class SVM will be implemented with the LIBSVM library. LIBSVM
implements the SMO algorithm for kernelized support vector
machines (SVMs), supporting classification and regression. LIBSVM
implements the one-against-one strategy for multi-class problems.
LIBSVM is used to build the SVM classifiers.
• The one-against-one strategy, also known as “pairwise coupling”,
“all pairs” or “round robin”, consists in constructing one SVM for
each pair of classes. Thus, for a problem with c classes, c(c-1)/2
SVMs are trained to distinguish the samples of one class from the
samples of another class. Usually, an unknown pattern is classified
by maximum voting, where each SVM votes for one class.
[http://hal.archives-ouvertes.fr/docs/00/10/39/55/PDF/cr102875872670.pdf p. 4]
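The one-against-one counting and voting described above can be sketched in a few lines of Python. This is only an illustration, not LIBSVM's actual code: the helper names and the pairwise winners are hypothetical, and the binary SVMs themselves are not trained here.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """Return every unordered pair of classes; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Return the class that received the most votes from the pairwise SVMs."""
    return Counter(pairwise_winners).most_common(1)[0][0]


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
pairs = one_vs_one_pairs(classes)
print(len(pairs))  # c = 3 classes, so c(c-1)/2 = 3 pairwise classifiers

# Hypothetical winners of the three pairwise classifiers for one unknown flower,
# in the same order as `pairs` (assumed values, for illustration only).
votes = ["Iris-versicolor", "Iris-virginica", "Iris-versicolor"]
predicted = majority_vote(votes)
print(predicted)  # Iris-versicolor (2 votes out of 3)
```

With 3 classes the pairwise approach needs only 3 classifiers; the count grows quadratically with the number of classes.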
10. General Classification of IRIS
• The histogram shows how the features of each training example,
i.e. the measurements of petal and sepal width and length, separate
the examples into different classes. The classification below is on the basis
of sepal length.
11. Classification-SVM algorithms
• To construct an optimal hyperplane, SVM employs an iterative
training algorithm, which is used to minimize an error
function. According to the form of the error function, SVM
models can be classified into four distinct groups, two for
classification and two for regression. The two classification
groups are:
• Classification SVM Type 1 (also known as C-SVM classification)
• Classification SVM Type 2 (also known as nu-SVM
classification)
• [https://www.statsoft.com/textbook/support-vector-machines]
12. When both algorithms were tested, C-SVM performed better than
nu-SVM. The MSE and RSE for C-SVM were 0.22 and 0.149,
whereas for nu-SVM they were 0.26 and 0.16.
13. Kernel type. Because the classes in this multi-class dataset are not
all linearly separable, the kernel trick is used. There are four kernel
functions available for selection.
14. SVM Kernels
• The radial basis function (RBF) kernel is the most
popular and most widely used of all. Different
kernel functions will generate different confusion
matrices.
• In general, the RBF kernel is a reasonable first
choice. This kernel nonlinearly maps samples into
a higher dimensional space so it, unlike the linear
kernel, can handle the case when the relation
between class labels and attributes is nonlinear.
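As a rough sketch of what the RBF kernel computes, the snippet below uses the standard form K(x, y) = exp(-gamma * ||x - y||^2). The value of gamma and the two feature vectors are assumptions chosen for illustration; they are not taken from the slides or from the WEKA run.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Illustrative four-feature measurements in cm (assumed values).
sample_a = [5.1, 3.5, 1.4, 0.2]
sample_b = [7.0, 3.2, 4.7, 1.4]
gamma = 0.5  # assumed kernel width parameter

same = rbf_kernel(sample_a, sample_a, gamma)       # identical points give 1.0
different = rbf_kernel(sample_a, sample_b, gamma)  # distant points approach 0
print(same, different)
```

The kernel value acts as a similarity score between 0 and 1, which is what lets the SVM draw a nonlinear boundary in the original feature space.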
16. Testing Iris Dataset via SVM
• Using the same training set as the test set
• Using a test set different from the original training set
• Cross-validation method
• Percentage split: if 10%, it means 10% training data and 90% test data
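The percentage split and cross-validation options above can be sketched on instance indices alone. This is a minimal illustration, assuming a simple shuffled split and unstratified folds; the function names and the seed are hypothetical, and it does not reproduce WEKA's exact procedure.

```python
import random


def percentage_split(n_instances, train_percent, seed=1):
    """Shuffle the instance indices and keep train_percent % of them for training."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# 10% split on the 150 Iris instances: 15 for training, 135 for testing.
train_idx, test_idx = percentage_split(150, 10)
print(len(train_idx), len(test_idx))  # 15 135

# 10-fold cross-validation: each fold trains on 135 and tests on 15 instances.
for train, test in k_fold_indices(150, 10):
    assert len(train) == 135 and len(test) == 15
```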
22. MUSHROOM DATASET
• This dataset contains samples of 23 different species of
mushroom, each labelled as poisonous or edible.
The training set therefore assigns each sample
to one of 2 classes. The trained model will place
future mushroom samples into one of the
two categories depending on their similarity to
the samples of the 23 species.
• Total instances: 8124
• In the following picture, edible is shown in blue and
poisonous in red.
23.
24. Mushroom and SVM
The following example shows how one feature of a mushroom, its odor, which takes one of 9
categories, helps classify it as edible or poisonous. For example, a mushroom that smells foul, i.e. 'f' (a count of 2160),
is more likely to be poisonous.
25. Implementation of SVM
• For this dataset, the SVM model is used as a binary classifier (the default) performing linear
classification.
• It is implemented by Weka’s default algorithm SMO (Sequential Minimal
Optimization), which is also used in LibSVM.
• This implementation globally replaces all missing values and transforms
nominal attributes into binary ones. It also normalizes all attributes by default.
• Linear kernel used: K(x, y) = <x, y>
• Like LibSVM, it offers different kernel functions. By default it uses PolyKernel,
which produces the following result. I tried other kernels, but they were
too slow to process the 8124 instances.
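To make the nominal-to-binary step and the linear kernel concrete, here is a small sketch. It assumes the polynomial kernel form K(x, y) = <x, y>^p, which reduces to the linear kernel above when p = 1, and it assumes the nine single-letter odor codes from the public attribute description of the mushroom data. The helper names are hypothetical and this is not WEKA's implementation.

```python
def one_hot(value, categories):
    """Turn one nominal value into binary indicator attributes (one per category)."""
    return [1.0 if value == category else 0.0 for category in categories]


def poly_kernel(x, y, exponent=1.0):
    """Polynomial kernel <x, y>^exponent; exponent=1.0 gives the linear kernel."""
    dot_product = sum(a * b for a, b in zip(x, y))
    return dot_product ** exponent


# Assumed codes for the 9 odor categories ('f' is the one mentioned above).
ODOR_CODES = ["a", "c", "f", "l", "m", "n", "p", "s", "y"]

foul = one_hot("f", ODOR_CODES)
no_odor = one_hot("n", ODOR_CODES)

print(poly_kernel(foul, foul))     # 1.0: same odor, the indicator vectors match
print(poly_kernel(foul, no_odor))  # 0.0: different odor, no shared indicator
```

On binary indicator attributes, the linear kernel simply counts how many attribute values two mushrooms share, which is why a linear SVM can work well on purely nominal data.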
26.
27.
'C-SVC' and 'nu-SVC'. The original SVM formulation for classification (SVC) uses a parameter C, in the range [0, inf), to apply a penalty in the optimization to data points that are not correctly separated by the classifying hyperplane.
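The role of the penalty C can be illustrated with the standard soft-margin objective, 0.5 * ||w||^2 + C * sum of slacks, where the slack of a point is max(0, 1 - y * (w.x + b)). The weights, bias and data points below are made-up values for illustration; nothing is trained here.

```python
def soft_margin_objective(w, b, X, y, C):
    """C-SVC primal objective: margin term plus C times the total slack."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slacks = [
        max(0.0, 1.0 - label * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, label in zip(X, y)
    ]
    return margin_term + C * sum(slacks)


# Hypothetical hyperplane and three 2-D points; the third lies on the wrong side.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]

low_penalty = soft_margin_objective(w, b, X, y, C=1.0)
high_penalty = soft_margin_objective(w, b, X, y, C=10.0)
print(low_penalty, high_penalty)  # 2.0 15.5
```

Only the misclassified point contributes slack (1.5), so raising C from 1 to 10 raises the cost of leaving it on the wrong side from 1.5 to 15.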
It is generally better to have a larger k, as the training set can then capture more of the relevant structure.