Supervised Learning - KNN
Select all functions that are non-linear with respect to the model parameters \(\boldsymbol{\beta}\).
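Recall that "linear" here refers to the parameters, not the inputs: a model can be non-linear in \(x\) yet still linear in \(\boldsymbol{\beta}\). For example:

\[
y = \beta_0 + \beta_1 x + \beta_2 x^2 \quad \text{(linear in } \boldsymbol{\beta}\text{)}, \qquad
y = \beta_0 e^{\beta_1 x} \quad \text{(non-linear in } \boldsymbol{\beta}\text{)}.
\]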
flowchart LR
    A[Stats and Prob] --> B[Descriptive Statistics]
    A --> C[Distributions]
    A --> D[Population Inference]
    E[Data Exploration] --> F[Plotting]
    E --> G[Dataframes]
    E --> H[Exploratory Data Analysis]
flowchart LR
    A[Regression] --> B[Linear Regression]
    A --> C[Multiple Linear Regression]
    A --> D[Polynomial Regression]
    A --> Z[Logistic Regression]
    E[Model Evaluation] --> F[RMSE, R^2]
    E --> G[Precision, Recall]
    E --> H[Train, Valid, Test]
Predict the values that unlabeled data will have, and explain how the inputs lead to the predicted outputs.
A model is interpretable if the relationship between the model's inputs and outputs is easy to explain.
A model is predictive if the outcomes produced by the model match the actual outcomes on new data.
Tip
Sometimes the phrase "birds of a feather flock together" is used to describe the k-nearest neighbors algorithm: the algorithm assumes that instances with similar inputs will have similar outputs.
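To make that intuition concrete, here is a minimal from-scratch sketch of the classification rule (an illustration, not the scikit-learn implementation used below): for a query point, find the k training points closest in Euclidean distance and take a majority vote of their labels.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

With k=5 and uniform weights, this is the same voting rule KNeighborsClassifier applies below.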
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Generate a synthetic two-class dataset
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, n_features=2,
                  random_state=0)
c_names = ["Class 0", "Class 1"]

# Plot the data, colored by class
fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
ax.legend(handles=scatter.legend_elements()[0],
          labels=c_names,
          title="Class")
from sklearn.model_selection import train_test_split

# Split data: 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Plot train and test sets side by side
# (loop variables X_, y_ avoid clobbering the dataset X, y)
fig, ax = plt.subplots(nrows=1, ncols=2)
data = [(X_train, y_train, 'Train Set'), (X_test, y_test, 'Test Set')]
for (X_, y_, title), ax_ in zip(data, ax):
    scatter = ax_.scatter(X_[:, 0], X_[:, 1], c=y_, cmap='viridis')
    ax_.set_title(title)
    ax_.set_xlabel("X0")
    ax_.set_ylabel("X1")
    ax_.legend(handles=scatter.legend_elements()[0],
               labels=c_names,
               title="Class")
from sklearn.neighbors import KNeighborsClassifier
# Create and train model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
# Single point sanity check
test_point = [2, 1]
predicted_group = model.predict([test_point])
print(f"We predict that {test_point} will belong to class: {predicted_group}")
We predict that [2, 1] will belong to class: [1]
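To see which training points produced that vote, scikit-learn's kneighbors method returns the distances to, and indices of, the query's nearest neighbors (the exact values will vary with the data generated above):

# Which 5 training points voted on this prediction?
distances, indices = model.kneighbors([test_point])
print("Neighbor labels:   ", y_train[indices[0]])
print("Neighbor distances:", distances[0].round(2))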
# Make predictions
predictions = model.predict(X_test)
# Plot data
fig, ax = plt.subplots(nrows=1, ncols=2)
data = [(X_test, y_test, 'Test Set (Ground Truth)'), (X_test, predictions, 'Test Set Predicted')]
for (X_, y_, title), ax_ in zip(data, ax):
    scatter = ax_.scatter(X_[:, 0], X_[:, 1], c=y_, cmap='viridis')
    ax_.set_title(title)
    ax_.set_xlabel("X0")
    ax_.set_ylabel("X1")
    ax_.legend(handles=scatter.legend_elements()[0],
               labels=c_names,
               title="Class")
              precision    recall  f1-score   support

    Negative       0.92      0.75      0.83        16
    Positive       0.76      0.93      0.84        14

    accuracy                           0.83        30
   macro avg       0.84      0.84      0.83        30
weighted avg       0.85      0.83      0.83        30
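For reference, the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. Using the precision column above:

\[
\text{macro precision} = \frac{0.92 + 0.76}{2} = 0.84, \qquad
\text{weighted precision} = \frac{16 \cdot 0.92 + 14 \cdot 0.76}{30} \approx 0.85.
\]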