CISC482 - Lecture21

Unsupervised Learning

Dr. Jeremy Castagno

Class Business

Schedule

  • Reading 8-2: April 14 @ 12PM, Friday
  • Project Draft Report: April 12 @ Midnight, Wednesday
  • Example Report on Brightspace!!

Today

  • Review SVM
  • Unsupervised Learning
  • K-Means Clustering

Review SVM

Terms

  • Support Vector Machine (SVM) is a supervised learning algorithm that uses ________ to divide data into different classes.
  • In a two-dimensional feature space, a hyperplane is a _____
  • In a three-dimensional feature space, a hyperplane is a ____

Hyperplane 1?

Hyperplane 2?

Separating Classes with Planes (Optimal)

Visual Example of Terms

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated blobs for a binary (two-class) example
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=[[-2, 5], [0, 0]], cluster_std=1)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10.)
model.fit(X, Y)

plot_svm(X, Y, model, ax=ax, plot_df=False);  # plot_svm: helper defined elsewhere in these slides
# ax.legend(["Class 0", "Class 1"])

Multiclass

  • What do you do if you have multiple classes, for example classes A, B, and C? One option is to split the problem into binary classifiers, one per class (see the sketch after this list):
    • A vs (B,C)
    • B vs (A,C)
    • C vs (A,B)
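
A minimal sketch of this one-vs-rest idea (an assumption about tooling, not the plotting code used in these slides): scikit-learn's OneVsRestClassifier fits one binary SVM per class, i.e., A vs (B, C), B vs (A, C), and C vs (A, B).

Code
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn import svm

# Three-class toy data (labels 0, 1, 2 standing in for A, B, C)
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

# OneVsRestClassifier trains one binary SVM per class: class k vs. all the others
ovr = OneVsRestClassifier(svm.SVC(kernel="linear", C=10))
ovr.fit(X, Y)

print(len(ovr.estimators_))   # 3 binary classifiers, one per class
print(ovr.predict(X[:5]))     # predicted class for the first five samples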

One Vs Rest (OVR)

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Three blobs for a three-class (A, B, C) example
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=False, plot_sv=False, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

One Vs Rest (OVR)

Code
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Same three-class data as the previous slide
X, Y = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B', 'C']);

Kernels

  • What is the input space?
    • \(X\)
  • What is your feature space?
    • \(\phi(X)\)
  • Why do we have feature spaces?

Kernel Trick

  • What mathematical operation do we use to judge similarity between points?
    • The inner product! (dot product)
  • What is the kernel trick?
    • It lets you compute the inner product of points in a higher-dimensional feature space while only ever doing computations in the input space (see the worked sketch below).
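
A small worked sketch of the trick (illustrative numbers, not from the slides): for 2-D points, the explicit degree-2 feature map \(\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\) satisfies \(\phi(x) \cdot \phi(z) = (x \cdot z)^2\), so the degree-2 polynomial kernel gives the feature-space inner product without ever building \(\phi(x)\).

Code
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D point
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)   # inner product computed in the 3-D feature space
kernel = (x @ z) ** 2        # degree-2 polynomial kernel, computed in the input space

print(explicit, kernel)      # both equal 1.0: (1*3 + 2*(-1))^2 = 1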

SVM Kernel Example

SVM With Linear Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="linear", C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=True, plot_margin=True, classes=['A', 'B']);

SVM With Polynomial Kernel

Code
fig, ax = plt.subplots(nrows=1, ncols=1)
model = svm.SVC(kernel="poly", degree=2, C=10)
model.fit(X, Y)
plot_svm(X, Y, model, ax=ax, plot_df=True, plot_sv=True, plot_hp=False, plot_margin=False, classes=['A', 'B']);

Unsupervised Learning

Terms

  • Unsupervised learning uses machine learning techniques to identify patterns in data without any prior knowledge about the data.
  • This means we have NO labels. It's as if we have a bunch of penguin data, but none of it is labeled!

Visual

Example Algorithms

  • Clustering algorithms group observations with similar features. Examples: hierarchical clustering, k-means clustering.
  • Outlier detection algorithms identify deviations within data. Examples: DBSCAN, local outlier factor.
  • Latent variable models relate observable variables to a set of latent (unobservable) variables. Examples: expectation maximization, principal component analysis.

All Clustering Algorithms

K-Means

Terms

  • A cluster is a set of samples with similar characteristics.
  • Grouping samples into classes with similar characteristics is called clustering.
  • A natural way to quantify similarity is to pick a centroid for each cluster and measure the distance between a sample and that centroid.
  • A centroid is a point that represents the center of a cluster. Samples that are close to a cluster’s centroid are considered similar to each other and part of that cluster (a short code sketch follows).
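
A minimal sketch connecting these terms to code, assuming scikit-learn's KMeans and the same make_blobs helper used earlier: labels_ holds each sample's cluster assignment and cluster_centers_ holds the centroids.

Code
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unlabeled data: we only use X, never the labels
X, _ = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)

print(model.labels_[:10])      # cluster assignment for the first ten samples
print(model.cluster_centers_)  # one centroid (x, y) per cluster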

Visual Example 1

Visual Example 2

Visual Example 3

K-means Algorithm

Step 0: Select the number of clusters, \(k\).

Step 1: Randomly select \(k\) samples to serve as the initial cluster centroids.

Step 2: For each sample, calculate the distance between that sample and each cluster’s centroid, and assign the sample to the cluster with the closest centroid.

Step 3: For each cluster, calculate the mean of all samples in the cluster. This mean becomes the new centroid.

Step 4: Repeat steps 2 and 3 until a stopping criterion is met, such as reaching a maximum number of iterations or the centroids no longer changing.

Step 0

Select the number of clusters, \(k\). This is hard if you don't have any prior information! However, we have some tricks we can discuss later (see Choosing K).

Step 1

Randomly select \(k\) samples to serve as the initial cluster centroids. If \(k=3\), randomly choose three points to become your cluster centers.

Step 2

For each sample, calculate the distance between that sample and each cluster’s centroid, and assign the sample to the cluster with the closest centroid

\(d = \sqrt {\left( {x_1 - x_2 } \right)^2 + \left( {y_1 - y_2 } \right)^2 }\)

If \(n=100\), how many distance calculations will we have per iteration? (Hint: it also depends on \(k\); see the sketch below.)
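
A small sketch of this step with NumPy broadcasting (toy data, not the course dataset): each iteration computes one distance per (sample, centroid) pair, i.e., \(n \times k\) distances.

Code
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # n = 100 samples with 2 features
centroids = rng.normal(size=(3, 2))  # k = 3 centroids

# Pairwise Euclidean distances: one per (sample, centroid) pair
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
print(dists.shape)                   # (100, 3) -> 300 distances per iteration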

Step 3

For each cluster, calculate the mean of all samples in the cluster. This mean becomes the new centroid.

This replaces the random selection you had in the beginning!

Step 4

Repeat steps 2 and 3 until a stopping criterion is met, such as reaching a maximum number of iterations or the centroids no longer changing.
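
A minimal from-scratch sketch of the four steps above, using NumPy only (illustrative; scikit-learn's KMeans is what you would normally use in practice):

Code
import numpy as np

def kmeans(X, k, max_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: distance from every sample to every centroid; assign the closest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centroid = mean of the samples assigned to each cluster
        # (sketch: assumes every cluster keeps at least one sample)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids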

K-means Animation

Choosing K

  • The number of clusters is not often obvious, especially if the data has more than two features.
  • The elbow method is the most common technique to determine the optimal number of clusters for the data.
  • The intuition is that good groups should be close together.
  • How can we measure how close things are together?
    • The sum of squared distances between all samples and their assigned centroid
    • This is called the within-cluster sum of squares (WCSS)

Sweet Spot

  • Think of this example with \(n\) data points
  • When you have 1 group, things are very spread out!
    • The WCSS is the sum of squared distances of all points to a single center
  • When you have \(n\) groups, things are really close together!
    • Each sample is its own group and has NO distance from its center!
  • You want the sweet spot: the last values of \(k\) where increasing the number of groups still makes the WCSS drop a lot.

Elbow
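
A minimal sketch of the elbow method, assuming scikit-learn's KMeans (its inertia_ attribute is the WCSS described above): fit k-means over a range of \(k\) values and look for the bend in the curve.

Code
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(random_state=42, n_samples=100, n_features=2, centers=3)

ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this k

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()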