flowchart LR
    A[Stats and Prob] --> B[Descriptive Statistics]
    A --> C[Distributions]
    A --> D[Population Inference]
    E[Data Exploration] --> F[Plotting]
    E --> G[Dataframes]
    E --> H[Exploratory Data Analysis]
Supervised Learning - KNN
flowchart LR
    A[Regression] --> B[Linear Regression]
    A --> C[Multiple Linear Regression]
    A --> D[Polynomial Regression]
    A --> Z[Logistic Regression]
    E[Model Evaluation] --> F[RMSE, R^2]
    E --> G[Precision, Recall]
    E --> H[Train, Valid, Test]
Supervised learning has two goals: to predict the values that unlabeled data will have, and to explain how the inputs lead to the predicted outputs.
A model is interpretable if the relationship between its input and output features is easy to explain.
A model is predictive if the outcomes it produces match the actual outcomes on new data.
Tip
The phrase “birds of a feather flock together” is sometimes used to describe the k-nearest neighbors (KNN) algorithm: the algorithm assumes that instances with similar inputs will have similar outputs.
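To make the "similar inputs, similar outputs" idea concrete, here is a minimal from-scratch sketch of a KNN classifier. It is for illustration only and is not how scikit-learn implements `KNeighborsClassifier`; the function name `knn_predict` and the toy data `X_toy`, `y_toy` are hypothetical. It assumes Euclidean distance and a simple majority vote, with ties going to the smallest label.

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels (ties go to the smallest label)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical toy data: two well-separated groups of points
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_toy, y_toy, [0.5, 0.5], k=3))  # query near the first group
print(knn_predict(X_toy, y_toy, [5.5, 5.5], k=3))  # query near the second group
```

Each query takes the label of the group it sits closest to. Note that there is no training step beyond storing the data: all the work happens at prediction time.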
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Generate a two-class toy dataset with two features
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, n_features=2,
                  random_state=0)
c_names = ["Class 0", "Class 1"]
# Plot the data, colored by class
fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
ax.legend(handles=scatter.legend_elements()[0],
          labels=c_names,
          title="Class")
from sklearn.model_selection import train_test_split
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
# Plot data
fig, ax = plt.subplots(nrows=1, ncols=2)
data = [(X_train, y_train, 'Train Set'), (X_test, y_test, 'Test Set')]
# Use distinct loop names so the full dataset X, y is not overwritten
for (X_part, y_part, title), ax_ in zip(data, ax):
    scatter = ax_.scatter(X_part[:, 0], X_part[:, 1], c=y_part, cmap='viridis')
    ax_.set_title(title)
    ax_.set_xlabel("X0")
    ax_.set_ylabel("X1")
    ax_.legend(handles=scatter.legend_elements()[0],
               labels=c_names,
               title="Class")
from sklearn.neighbors import KNeighborsClassifier
# Create and train model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
# Single point sanity check
test_point = [2, 1]
predicted_group = model.predict([test_point])
print(f"We predict that {test_point} will belong to class: {predicted_group}")
We predict that [2, 1] will belong to class: [1]
# Make predictions
predictions = model.predict(X_test)
# Plot data
fig, ax = plt.subplots(nrows=1, ncols=2)
data = [(X_test, y_test, 'Test Set (Ground Truth)'), (X_test, predictions, 'Test Set Predicted')]
# Use distinct loop names so the full dataset X, y is not overwritten
for (X_part, y_part, title), ax_ in zip(data, ax):
    scatter = ax_.scatter(X_part[:, 0], X_part[:, 1], c=y_part, cmap='viridis')
    ax_.set_title(title)
    ax_.set_xlabel("X0")
    ax_.set_ylabel("X1")
    ax_.legend(handles=scatter.legend_elements()[0],
               labels=c_names,
               title="Class")
              precision    recall  f1-score   support
    Negative       0.92      0.75      0.83        16
    Positive       0.76      0.93      0.84        14
    accuracy                           0.83        30
   macro avg       0.84      0.84      0.83        30
weighted avg       0.85      0.83      0.83        30
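The numbers in a report like this can be checked by hand from the four confusion-matrix counts. The sketch below does that with counts that are assumed, not taken from the source: they were chosen because they are consistent with the supports (16 and 14) and the rounded scores shown above. The variable names are illustrative.

```python
# Assumed confusion-matrix counts, chosen to be consistent with the report above
tn, fp = 12, 4   # actual Negative (support 16): 12 predicted Negative, 4 predicted Positive
fn, tp = 1, 13   # actual Positive (support 14): 1 predicted Negative, 13 predicted Positive

# Positive class: how many predicted positives are right, and how many actual positives are found
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

# Negative class: the same formulas with the roles of the classes swapped
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)

# Accuracy: fraction of all predictions that are correct
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Negative: precision={precision_neg:.2f} recall={recall_neg:.2f} f1={f1_neg:.2f}")
print(f"Positive: precision={precision_pos:.2f} recall={recall_pos:.2f} f1={f1_pos:.2f}")
print(f"Accuracy: {accuracy:.2f}")
```

Under these assumed counts, the model rarely misses a positive instance (high recall for Positive) but labels several negatives as positive, which is why precision for Positive is lower than for Negative.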
