CISC482 - Lecture 20

Supervised Learning - SVM

Dr. Jeremy Castagno

Class Business

Schedule

  • Reading 8-1: April 12 @ 12PM, Wednesday
  • Reading 8-2: April 14 @ 12PM, Friday
  • Project Draft Report: April 12 @ Midnight, Wednesday
  • Example Report on Brightspace!!

Today

  • Review Draft Report
  • Support Vector Machine
  • Kernels

Draft Report

Go to Brightspace!

SVM

Terms

  • A Support Vector Machine (SVM) is a supervised learning algorithm that uses hyperplanes to divide data into different classes.
  • A hyperplane is a flat surface that is one dimension lower than the input feature space (a general equation follows after this list).
  • In a two-dimensional feature space, a hyperplane is a _____
  • In a three-dimensional feature space, a hyperplane is a ____
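As a concrete way to write this down (standard notation, not stated on the slide): a hyperplane in an $n$-dimensional feature space is the set of points $x$ satisfying

$$ w \cdot x + b = 0 $$

where $w$ is a weight (normal) vector and $b$ is an offset. The SVM classifies a point by the sign of $w \cdot x + b$.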

Visual Example 1

Visual Example 2

Visual Example 3

Separating Classes with Planes

Separating Classes with Planes (Optimal)

Advantages and Disadvantages

Advantages:

  • Flexible
  • Low storage required

Disadvantages:

  • Prefers balanced data (we have workarounds)
  • Many hyperparameters (very true)

More Terms!

  • Support vectors are the sample data points that are closest to the hyperplane.
    • These data points define the separating line.
  • The margin is the gap between the two parallel lines that pass through the closest data points on each side of the hyperplane.

Visual Example of Terms

Code
from sklearn import svm
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

# plot_svm is a plotting helper defined for these slides (not part of sklearn)
plot_svm(X, Y, model, ax=ax, plot_df=False);
# ax.legend(["Class 0", "Class 1"])

Margin

Margin Terminology

  • A dataset is well-separated if a hyperplane can divide the dataset so that all the instances of one class fall on one side of the hyperplane, and all instances not in that class fall on the other side.

Was the prior example well separated?

Tip

The closest instances to the hyperplane are the hyperplane’s support vectors. The support vectors are the only instances that determine, or support, the hyperplane.
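As a quick sketch (assuming the fitted linear model from the previous code slide), sklearn exposes the support vectors directly on the fitted estimator:

Code
# Assumes `model` is the svm.SVC(kernel="linear") fit on X, Y above
print(model.support_vectors_)  # coordinates of the support vectors
print(model.n_support_)        # number of support vectors per class
print(model.support_)          # indices of the support vectors in X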

Not Well Separated

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False);

Sklearn SVM

  • We have a hyperparameter named C that we can change to adjust how misclassifications are handled
  • The C parameter tells the SVM optimization how much you want to avoid misclassifying each training example
  • svm.SVC takes C as a keyword argument: a large C penalizes misclassifications heavily and yields a narrower margin, while a small C tolerates misclassifications and yields a wider margin (see the sketch after this list)
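A minimal sketch of that trade-off (same blobs data as the next slides; for a linear kernel the margin width is 2/||w||, and sklearn stores w in coef_):

Code
from sklearn import svm
from sklearn.datasets import make_blobs
import numpy as np

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)

for C in [0.000001, 10.0, 10000.0]:
    model = svm.SVC(kernel="linear", C=C)
    model.fit(X, Y)
    w = model.coef_[0]
    print(f"C={C}: margin width = {2 / np.linalg.norm(w):.3f}, support vectors = {len(model.support_vectors_)}")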

Visual Example

Example - Large C

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10000.0)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=True, plot_hp=True, plot_margin=True);

Example - Small C

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1.5)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=0.000001)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=True, plot_hp=True, plot_margin=True);

Multiple Categories/Classes

Multiclass

  • What do you do if you have multiple classes instead of just two?
  • Can one hyperplane separate a space into three regions?
  • But we have a trick! If we have three classes (A, B, C) to separate, we just train three binary models (see the sketch after this list):
    • A vs (B,C)
    • B vs (A,C)
    • C vs (A,B)
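Rather than remapping the labels by hand as on the next slides, sklearn can wrap the three binary models for us; a minimal sketch (same blobs data as the next slides):

Code
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

# Trains one linear SVM per class: A vs rest, B vs rest, C vs rest
ovr = OneVsRestClassifier(svm.SVC(kernel="linear", C=10))
ovr.fit(X, Y)
print(len(ovr.estimators_))  # 3 binary SVMs, one per class
print(ovr.predict(X[:5]))    # each prediction picks the most confident binary model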

One Vs Rest (OVR)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

A vs (B,C)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 2
Y[mask] = 1
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B/C'], plot_df=False);

B vs (A,C)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 1
Y[~mask] = 2
Y[mask] = 0
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['B', 'A/C'], plot_df=False);

C vs (A,B)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
mask = Y == 2
Y[~mask] = 1
Y[mask] = 0
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_sv=True, plot_hp=True, plot_margin=True, classes=['C', 'A/B'], plot_df=False);

One Vs Rest (OVR)

Code
from sklearn import svm
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

Higher Dimensional Kernels

Motivation

Feature Space

  • The feature space is the higher-dimensional space that we can transform our data into!
  • The feature space can be much bigger than just one dimension above our data
  • Some feature spaces make it easier for us to separate our data!
  • It can be expensive to create these feature spaces explicitly (see the sketch after this list):
    • Computation!
    • Memory!
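To make that cost concrete, here is a sketch using sklearn's PolynomialFeatures to build an explicit higher-dimensional feature space (a kernel's implicit mapping differs, but the blow-up in dimension is the point):

Code
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.random.RandomState(42).randn(100, 10)  # 100 samples, 10 features

for degree in [2, 3, 4]:
    phi = PolynomialFeatures(degree=degree).fit_transform(X)
    print(f"degree {degree}: {X.shape[1]} features -> {phi.shape[1]} features")
# The number of features grows combinatorially with the degree,
# which is why building the mapping explicitly gets expensive.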

Inner Product

  • The SVM decides which class a point belongs to by determining which side of the hyperplane the point falls on
  • We have not walked through the full process, but the basic idea is that it uses inner products (dot products)
  • You don't need to understand exactly how it uses inner products to make decisions; just trust that this is what it does

Dot Product 1

Dot Product 2

So basically all the SVM really needs is the dot product of these points in the higher-dimensional space. A kernel computes that dot product directly from the original points, without ever constructing the higher-dimensional mapping.
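A quick numeric check of that idea (a sketch using the standard degree-2 polynomial map, not anything specific to these slides): the kernel k(x, z) = (x · z)^2 equals the ordinary dot product of the explicitly mapped points.

Code
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = np.dot(x, z) ** 2       # computed entirely in the original 2-D space
mapped_value = np.dot(phi(x), phi(z))  # computed in the 3-D feature space
print(kernel_value, mapped_value)      # both print 1.0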

Example - Unseparable

Mapping

Example Transformed

Kernel Trick

SVM Kernel Example

SVM With Linear Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B']);

SVM With Polynomial Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="poly", degree=2, C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B']);

Mathematics Behind SVM