CISC482 - Lecture 20

Supervised Learning - SVM

Dr. Jeremy Castagno

Class Business

Schedule

  • Reading 8-1: April 12 @ 12PM, Wednesday
  • Reading 8-2: April 14 @ 12PM, Friday
  • Project Draft Report: April 12 @ Midnight, Wednesday
  • Example Report on Brightspace!!

Today

  • Review Draft Report
  • Support Vector Machine
  • Kernels

Draft Report

Go to Brightspace!

SVM

Terms

  • A Support Vector Machine (SVM) is a supervised learning algorithm that uses hyperplanes to divide data into different classes.
  • A hyperplane is a flat surface that is one dimension lower than the input feature space (a general equation follows after this list).
  • In a two-dimensional feature space, a hyperplane is a _____
  • In a three-dimensional feature space, a hyperplane is a ____
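As a concrete way to write this down (standard notation, not stated on the slide): a hyperplane in an $n$-dimensional feature space is the set of points $x$ satisfying

$$ w \cdot x + b = 0 $$

where $w$ is a weight (normal) vector and $b$ is an offset. The SVM classifies a point by the sign of $w \cdot x + b$.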

Visual Example 1

Visual Example 2

Visual Example 3

Separating Classes with Planes

Separating Classes with Planes (Optimal)

Advantages and Disadvantages

Advantages:

  • Flexible
  • Low storage required

Disadvantages:

  • Prefers balanced data (we have workarounds)
  • Many hyperparameters (very true)

More Terms!

  • Support vectors are the sample data points that are closest to the hyperplane.
    • These data points define the separating line.
  • The margin is the gap between the two parallel lines that pass through the closest data points on each side of the hyperplane.

Visual Example of Terms

Code
from sklearn import svm
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

# plot_svm is a plotting helper defined for these slides (not part of sklearn)
plot_svm(X, Y, model, ax=ax, plot_df=False);
# ax.legend(["Class 0", "Class 1"])

Margin

Margin Terminology

  • A dataset is well-separated if a hyperplane can divide the dataset so that all the instances of one class fall on one side of the hyperplane, and all instances not in that class fall on the other side.

Was the prior example well separated?

Tip

The closest instances to the hyperplane are the hyperplane’s support vectors. The support vectors are the only instances that determine, or support, the hyperplane.
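As a quick sketch (assuming the fitted linear model from the previous code slide), sklearn exposes the support vectors directly on the fitted estimator:

Code
# Assumes `model` is the svm.SVC(kernel="linear") fit on X, Y above
print(model.support_vectors_)  # coordinates of the support vectors
print(model.n_support_)        # number of support vectors per class
print(model.support_)          # indices of the support vectors in X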

Not Well Separated

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False);

Sklearn SVM

  • We have a hyperparameter named C that we can change to adjust how misclassifications are handled
  • The C parameter tells the SVM optimization how much you want to avoid misclassifying each training example
  • svm.SVC takes C as a keyword argument: a large C penalizes misclassifications heavily and yields a narrower margin, while a small C tolerates misclassifications and yields a wider margin (see the sketch after this list)
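A minimal sketch of that trade-off (same blobs data as the next slides; for a linear kernel the margin width is 2/||w||, and sklearn stores w in coef_):

Code
from sklearn import svm
from sklearn.datasets import make_blobs
import numpy as np

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)

for C in [0.000001, 10.0, 10000.0]:
    model = svm.SVC(kernel="linear", C=C)
    model.fit(X, Y)
    w = model.coef_[0]
    print(f"C={C}: margin width = {2 / np.linalg.norm(w):.3f}, support vectors = {len(model.support_vectors_)}")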

Visual Example

Example - Large C

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10000.0)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=True, plot_hp=True, plot_margin=True);

Example - Small C

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=0.000001)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=True, plot_hp=True, plot_margin=True);

Multiple Categories/Classes

Multiclass

  • What do you do if you have multiple classes instead of just two?
  • Can one hyperplane separate a space into three regions?
  • But we have a trick! If we have three classes (A, B, C) to separate, we just train three binary models (see the sketch after this list):
    • A vs (B,C)
    • B vs (A,C)
    • C vs (A,B)
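Rather than remapping the labels by hand as on the next slides, sklearn can wrap the three binary models for us; a minimal sketch (same blobs data as the next slides):

Code
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

# Trains one linear SVM per class: A vs rest, B vs rest, C vs rest
ovr = OneVsRestClassifier(svm.SVC(kernel="linear", C=10))
ovr.fit(X, Y)
print(len(ovr.estimators_))  # 3 binary SVMs, one per class
print(ovr.predict(X[:5]))    # each prediction picks the most confident binary model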

One Vs Rest (OVR)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

A vs (B,C)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 2
Y[mask] = 1
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B/C'], plot_df=False);

B vs (A,C)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 1
Y[~mask] = 2
Y[mask] = 0
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['B', 'A/C'], plot_df=False);

C vs (A,B)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 2
Y[~mask] = 1
Y[mask] = 0
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['C', 'A/B'], plot_df=False);

One Vs Rest (OVR)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

Higher Dimensional Kernels

Motivation

Feature Space

  • The feature space is the higher-dimensional space that we can transform our data into!
  • The feature space can be much bigger than just one dimension above our data
  • Some feature spaces make it easier for us to separate our data!
  • It can be expensive to create these feature spaces explicitly (see the sketch after this list):
    • Computation!
    • Memory!
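To make that cost concrete, here is a sketch using sklearn's PolynomialFeatures to build an explicit higher-dimensional feature space (a kernel's implicit mapping differs, but the blow-up in dimension is the point):

Code
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.random.RandomState(42).randn(100, 10)  # 100 samples, 10 features

for degree in [2, 3, 4]:
    phi = PolynomialFeatures(degree=degree).fit_transform(X)
    print(f"degree {degree}: {X.shape[1]} features -> {phi.shape[1]} features")
# The number of features grows combinatorially with the degree,
# which is why building the mapping explicitly gets expensive.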

Inner Product

  • The SVM decides which class a point belongs to by determining which side of the hyperplane the point falls on
  • We have not walked through the full process, but the basic idea is that it uses inner products (dot products)
  • You don't need to understand exactly how it uses inner products to make decisions; just trust that this is what it does

Dot Product 1

Dot Product 2

So basically all the SVM really needs is the dot product of these points in the higher-dimensional space. A kernel computes that dot product directly from the original points, without ever constructing the higher-dimensional mapping.
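A quick numeric check of that idea (a sketch using the standard degree-2 polynomial map, not anything specific to these slides): the kernel k(x, z) = (x · z)^2 equals the ordinary dot product of the explicitly mapped points.

Code
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = np.dot(x, z) ** 2       # computed entirely in the original 2-D space
mapped_value = np.dot(phi(x), phi(z))  # computed in the 3-D feature space
print(kernel_value, mapped_value)      # both print 1.0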

Example - Unseparable

Mapping

Example Transformed

Kernel Trick

SVM Kernel Example

SVM With Linear Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B']);

SVM With Polynomial Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="poly", degree=2, C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B']);

Mathematics Behind SVM