What is the difference between supervised learning and unsupervised learning?
What is the quintessential example of unsupervised learning?
clustering
What are the names of some example clustering algorithms?
All clustering algorithms
K-Means
Is a cluster a set of samples with similar or dissimilar characteristics?
Grouping samples into classes with similar characteristics is called clustering.
Centroid?
A centroid is a point that represents the center of a cluster.
Samples that are closer to a cluster’s centroid are said to be similar to each other and considered part of the cluster.
What do we mean by closer? Can you give a precise definition?
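One common way to make "closer" precise is Euclidean (straight-line) distance. The notes do not fix a particular metric, so treat that choice as an assumption. A minimal sketch, where `euclidean_distance` and the sample values are hypothetical names and data chosen for illustration:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative data (not from the notes): one sample and two candidate centroids.
sample = (1.0, 2.0)
centroid_a = (4.0, 6.0)
centroid_b = (1.0, 3.0)

print(euclidean_distance(sample, centroid_a))  # 5.0
print(euclidean_distance(sample, centroid_b))  # 1.0
```

Under this definition the sample is closer to `centroid_b`, so it would be considered part of that cluster.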
K-means Algorithm
Step 0: Select the number of clusters, \(k\).
Step 1: Randomly select \(k\) samples as the initial cluster centroids.
Step 2: For each sample, calculate the distance between that sample and each cluster’s centroid, and assign the sample to the cluster with the closest centroid.
Step 3: For each cluster, calculate the mean of all samples in the cluster. This mean becomes the new centroid.
Step 4: Repeat steps 2 and 3 until a stopping criterion is met, such as reaching a certain number of iterations or the centroids staying the same.
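The steps above can be sketched in plain Python. This is an illustrative sketch, not a reference implementation. It assumes Euclidean distance (via `math.dist`), and the function name `k_means`, the `max_iterations` and `seed` parameters, the handling of empty clusters, and the sample data are all assumptions made for the example.

```python
import math
import random

def k_means(samples, k, max_iterations=100, seed=0):
    """Cluster `samples` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centroids.
    centroids = rng.sample(samples, k)
    assignments = []
    for _ in range(max_iterations):
        # Step 2: assign each sample to the cluster with the closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        # Step 3: the mean of each cluster becomes its new centroid.
        new_centroids = []
        for c in range(k):
            members = [s for s, a in zip(samples, assignments) if a == c]
            if not members:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        # Step 4: stop once the centroids stay the same
        # (or when max_iterations is reached).
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

# Illustrative data: two visibly separate groups of 2-feature samples.
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
           (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(samples, k=2)
print(centroids)
print(assignments)
```

Which cluster gets label 0 and which gets label 1 depends on the random initialization, so only the grouping is meaningful, not the label values.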
Choosing K
The number of clusters is often not obvious, especially if the data has more than two features.
The elbow method is a common technique for choosing the number of clusters for the data.
The intuition is that the samples within a good cluster should be close together.
How can we measure how close things are together?
The sum of squared distances between each sample and its cluster’s centroid
Within-cluster sum of squares (WCSS)
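Written out, with \(\mu_j\) as the centroid of cluster \(C_j\), the quantity is \(\sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2\). The symbols here are assumptions added for illustration, as are the function name `wcss` and the toy data in this minimal sketch, which compares a one-cluster and a two-cluster grouping of the same samples:

```python
def wcss(samples, assignments, centroids):
    """Sum of squared Euclidean distances from each sample to its assigned centroid."""
    total = 0.0
    for sample, cluster in zip(samples, assignments):
        centroid = centroids[cluster]
        total += sum((x - c) ** 2 for x, c in zip(sample, centroid))
    return total

# Illustrative data: a left pair and a right pair of samples.
samples = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

# k = 1: every sample belongs to one cluster centered on the overall mean.
one_cluster = wcss(samples, [0, 0, 0, 0], [(5.0, 2.0)])

# k = 2: the left pair and the right pair each get their own centroid.
two_clusters = wcss(samples, [0, 0, 1, 1], [(1.0, 2.0), (9.0, 2.0)])

print(one_cluster)   # 68.0
print(two_clusters)  # 4.0
```

Going from one cluster to two cuts the WCSS from 68 to 4, while any further split can only remove the remaining 4. A sharp drop followed by small gains is the "elbow" shape the method looks for when WCSS is plotted against \(k\).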
Sweet Spot
Peer Review
Motivation
Making the writing process more collaborative
Learning from one another and thinking carefully about the role of writing in this course
Allows you to clarify your own ideas
You explain your ideas to classmates, and the reviewer formulates questions about your writing
Peer review gives you the professional experience of having your writing reviewed