CISC482 - Lecture21

Unsupervised Learning

Dr. Jeremy Castagno

Class Business

Schedule

  • Reading 8-2: April 14 @ 12PM, Friday
  • Project Draft Report: April 12 @ Midnight, Wednesday
  • Example Report on Brightspace!!

Today

  • Review SVM
  • Unsupervised Learning
  • K-Means Clustering

Review SVM

Terms

  • Support Vector Machine (SVM) is a supervised learning algorithm that uses ________ to divide data into different classes.
  • In a two-dimensional feature space, a hyperplane is a _____
  • In a three-dimensional feature space, a hyperplane is a ____

Hyperplane 1?

Hyperplane 2?

Separating Classes with Planes (Optimal)

Visual Example of Terms

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated blobs for a binary (two-class) example
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False);  # plot_svm: helper defined elsewhere in these slides
# ax.legend(["Class 0", "Class 1"])

Multiclass

  • What do you do if you have multiple classes, for example classes A, B, and C? One option is to split the problem into binary classifiers, one per class (see the sketch after this list):
    • A vs (B,C)
    • B vs (A,C)
    • C vs (A,B)
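
A minimal sketch of this one-vs-rest idea (an assumption about tooling, not the plotting code used in these slides): scikit-learn's OneVsRestClassifier fits one binary SVM per class, i.e., A vs (B, C), B vs (A, C), and C vs (A, B).

Code
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn import svm

# Three-class toy data (labels 0, 1, 2 standing in for A, B, C)
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

# OneVsRestClassifier trains one binary SVM per class: class k vs. all the others
ovr = OneVsRestClassifier(svm.SVC(kernel="linear", C=10))
ovr.fit(X, Y)

print(len(ovr.estimators_))   # 3 binary classifiers, one per class
print(ovr.predict(X[:5]))     # predicted class for the first five samples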

One Vs Rest (OVR)

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Three blobs for a three-class (A, B, C) example
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

One Vs Rest (OVR)

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Same three-class data as the previous slide
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

Kernels

  • What is the input space?
    • \(X\)
  • What is your feature space?
    • \(\phi(X)\)
  • Why do we have feature spaces?

Kernel Trick

  • What mathematical operation do we use to judge similarity between points?
    • The inner product! (dot product)
  • What is the kernel trick?
    • It lets you compute the inner product of points in a higher-dimensional feature space while only ever doing computations in the input space (see the worked sketch below).
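
A small worked sketch of the trick (illustrative numbers, not from the slides): for 2-D points, the explicit degree-2 feature map \(\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\) satisfies \(\phi(x) \cdot \phi(z) = (x \cdot z)^2\), so the degree-2 polynomial kernel gives the feature-space inner product without ever building \(\phi(x)\).

Code
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D point
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)   # inner product computed in the 3-D feature space
kernel = (x @ z) ** 2        # degree-2 polynomial kernel, computed in the input space

print(explicit, kernel)      # both equal 1.0: (1*3 + 2*(-1))^2 = 1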

SVM Kernel Example

SVM With Linear Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B']);

SVM With Polynomial Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="poly", degree=2, C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B']);

Unsupervised Learning

Terms

  • Unsupervised learning uses machine learning techniques to identify patterns in data without any prior knowledge about the data.
  • This means we have NO labels. It's as if we have a bunch of penguin data, but none of it is labeled!

Visual

Example Algorithms

  • Clustering algorithms group observations with similar features. Examples: hierarchical clustering, k-means clustering.
  • Outlier detection algorithms identify deviations within data. Examples: DBSCAN, local outlier factor.
  • Latent variable models relate observable variables to a set of latent (unobservable) variables. Examples: expectation maximization, principal component analysis.

All Clustering Algorithms

K-Means

Terms

  • A cluster is a set of samples with similar characteristics.
  • Grouping samples into classes with similar characteristics is called clustering.
  • A natural way to quantify similarity is to pick a centroid for each cluster and measure the distance between a sample and that centroid.
  • A centroid is a point that represents the center of a cluster. Samples that are close to a cluster’s centroid are considered similar to each other and part of that cluster (a short code sketch follows).
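
A minimal sketch connecting these terms to code, assuming scikit-learn's KMeans and the same make_blobs helper used earlier: labels_ holds each sample's cluster assignment and cluster_centers_ holds the centroids.

Code
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unlabeled data: we only use X, never the labels
X, _ = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)

print(model.labels_[:10])      # cluster assignment for the first ten samples
print(model.cluster_centers_)  # one centroid (x, y) per cluster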

Visual Example 1

Visual Example 2

Visual Example 3

K-means Algorithm

Step 0: Select the number of clusters, \(k\).

Step 1: Randomly select \(k\) samples to serve as the initial cluster centroids.

Step 2: For each sample, calculate the distance between that sample and each cluster’s centroid, and assign the sample to the cluster with the closest centroid.

Step 3: For each cluster, calculate the mean of all samples in the cluster. This mean becomes the new centroid.

Step 4: Repeat steps 2 and 3 until a stopping criterion is met, such as reaching a maximum number of iterations or the centroids no longer changing.

Step 0

Select the number of clusters, \(k\). This is hard if you don't have any prior information! However, we have some tricks we can discuss later (see Choosing K).

Step 1

Randomly select \(k\) samples to serve as the initial cluster centroids. If \(k=3\), randomly choose three points to become your cluster centers.

Step 2

For each sample, calculate the distance between that sample and each cluster’s centroid, and assign the sample to the cluster with the closest centroid

\(d = \sqrt {\left( {x_1 - x_2 } \right)^2 + \left( {y_1 - y_2 } \right)^2 }\)

If \(n=100\), how many distance calculations will we have per iteration? (Hint: it also depends on \(k\); see the sketch below.)
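
A small sketch of this step with NumPy broadcasting (toy data, not the course dataset): each iteration computes one distance per (sample, centroid) pair, i.e., \(n \times k\) distances.

Code
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # n = 100 samples with 2 features
centroids = rng.normal(size=(3, 2))  # k = 3 centroids

# Pairwise Euclidean distances: one per (sample, centroid) pair
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
print(dists.shape)                   # (100, 3) -> 300 distances per iteration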

Step 3

For each cluster, calculate the mean of all samples in the cluster. This mean becomes the new centroid.

This replaces the random selection you had in the beginning!

Step 4

Repeat steps 2 and 3 until a stopping criterion is met, such as reaching a maximum number of iterations or the centroids no longer changing.
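
A minimal from-scratch sketch of the four steps above, using NumPy only (illustrative; scikit-learn's KMeans is what you would normally use in practice):

Code
import numpy as np

def kmeans(X, k, max_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: distance from every sample to every centroid; assign the closest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centroid = mean of the samples assigned to each cluster
        # (sketch: assumes every cluster keeps at least one sample)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids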

K-means Animation

Choosing K

  • The number of clusters is not often obvious, especially if the data has more than two features.
  • The elbow method is the most common technique to determine the optimal number of clusters for the data.
  • The intuition is that good groups should be close together.
  • How can we measure how close things are together?
    • The sum of squared distances between all samples and their assigned centroid
    • This is called the within-cluster sum of squares (WCSS)

Sweet Spot

  • Think of this example with \(n\) data points
  • When you have 1 group, things are very spread out!
    • The WCSS is the sum of squared distances of all points to a single center
  • When you have \(n\) groups, things are really close together!
    • Each sample is its own group and has NO distance from its center!
  • You want the sweet spot: the last values of \(k\) where increasing the number of groups still makes the WCSS drop a lot.

Elbow
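
A minimal sketch of the elbow method, assuming scikit-learn's KMeans (its inertia_ attribute is the WCSS described above): fit k-means over a range of \(k\) values and look for the bend in the curve.

Code
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this k

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()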