CISC482 - Lecture 14

Evaluating Model Performance 1

Dr. Jeremy Castagno

Class Business

Schedule

  • Reading 6-1: Mar 08 @ 12PM, Wednesday
  • Reading 6-2: Mar 10 @ 12PM, Friday
  • Proposal: Mar 22, Wednesday
  • HW5 - Mar 29 @ Midnight, Wednesday

Today

  • Overfit/Underfit
  • Bias/Variance Tradeoff
  • Regression Metrics
  • Binary Classification Metrics

Model Error

Modelling

  • We approximate an output feature \(y\), using input features \(X\) with function \(f\) such that \(\hat{y} = f(X)\)
    • Example: Predicting penguin body mass by bill length
    • \(\text{body mass} = \hat{y} = mx + b\)
  • We have to choose \(f\) and \(X\): simple linear model, polynomial model, multiple linear regression, logistic regression, etc.
  • Examples: \(\hat{y} = \beta_0 + \beta_1 x\) (simple linear) or \(\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2\) (quadratic); see the sketch below
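A minimal sketch of this choice, using a handful of made-up bill lengths and body masses and NumPy's polyfit (not any particular library from the course):

import numpy as np

# Hypothetical measurements: bill length (mm) and body mass (g)
x = np.array([36.7, 39.1, 39.3, 39.5, 40.3])
y = np.array([3450, 3750, 3650, 3800, 3250])

# Option 1: simple linear model, y_hat = b0 + b1*x
linear = np.poly1d(np.polyfit(x, y, 1))
# Option 2: quadratic model, y_hat = b0 + b1*x + b2*x^2
quadratic = np.poly1d(np.polyfit(x, y, 2))

print(linear(40.0))     # prediction from the linear f
print(quadratic(40.0))  # prediction from the quadratic f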

Underfit

  • Underfit - model is too simple to fit the data well.

Underfit Problems

  • An underfit model will miss the underlying trend
  • Will score poorly in metrics

Overfit

  • Overfit - model is too complex; it fits the training data too closely, capturing noise instead of just the trend.

Overfit Problems

  • Fitting the data too closely
  • Incorporating too much noise (meaningless variation)
  • Misses the general trend of the data despite scoring well in some metrics
  • In fact, what is the error for this model?

Optimal

  • This model would be best fit with a quadratic model

Note

Important

A model that is overfit or underfit is a bad predictor of outcomes outside of the data set and should not be used. In the field of data science, models tend to be overfit, so model selection techniques focus on choosing the least complex model that captures the general trend.
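To make this concrete, here is a small synthetic experiment (assumed data, not from the course materials): a degree-1 polynomial underfits quadratic data, a degree-10 polynomial overfits it, and the gap between train and test error exposes the overfit.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a quadratic trend plus noise
x = rng.uniform(-3, 3, 60)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 1.0, x.size)

x_train, x_test = x[:40], x[40:]
y_train, y_test = y[:40], y[40:]

for degree in (1, 2, 10):
    model = np.poly1d(np.polyfit(x_train, y_train, degree))
    train_rmse = np.sqrt(np.mean((y_train - model(x_train)) ** 2))
    test_rmse = np.sqrt(np.mean((y_test - model(x_test)) ** 2))
    print(f"degree {degree:2d}: train RMSE {train_rmse:.2f}, test RMSE {test_rmse:.2f}")

The degree-10 model scores best on the training data but worst on the held-out data, which is exactly the overfitting pattern described above.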

Find Most Underfit and Most Overfit

Bias and Variance

Breaking down Error

  • The total error of a model is how much the observed values differ from predicted values. Total error is broken down into three pieces:
    • Bias - model’s prediction differs from the observed values due to the assumptions built into the model.
    • Variance - spread/variance of predictions
    • Irreducible error - error inherent to the data (noise)

Visual Explanation

Bias-Variance Tradeoff

  • Choosing a more complex model (more features, a more complicated mathematical expression, etc.) means the model’s predictions are closer to the observed sample values, which decreases the bias.
  • However, doing so makes the model’s predictions more spread out to meet the observed values, increasing the variance.
  • An optimal model should be just complex enough to capture the general trend of the data (low bias) without incorporating too much of the noise from the sample (low variance); the simulation sketch below illustrates this.
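A simulation sketch of the tradeoff (synthetic sine-wave data, assumed only for illustration): refitting each model on many resampled datasets shows bias shrinking and variance growing as the polynomial degree increases.

import numpy as np

rng = np.random.default_rng(1)
true_fn = np.sin        # assumed ground-truth function
x0 = 1.0                # point at which we measure bias and variance

for degree in (1, 3, 9):
    preds = []
    for _ in range(200):  # refit on 200 resampled datasets
        x = rng.uniform(0, np.pi, 40)
        y = true_fn(x) + rng.normal(0, 0.3, x.size)
        model = np.poly1d(np.polyfit(x, y, degree))
        preds.append(model(x0))
    preds = np.array(preds)
    print(f"degree {degree}: bias {preds.mean() - true_fn(x0):+.3f}, "
          f"variance {preds.var():.3f}")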

Visual Example

Problem 1

Problem 2

Regression Metrics

Dataset

  • We will be using the penguin dataset
Code
import numpy as np
import seaborn as sns
df = sns.load_dataset("penguins").dropna()  # assumed source of the penguin data
X = df['bill_length_mm'].values[:, np.newaxis]
y = df['body_mass_g'].values
df.head()
  species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
0 Adelie   Torgersen           39.10          18.70             181.00     3,750.00  male    2007
1 Adelie   Torgersen           39.50          17.40             186.00     3,800.00  female  2007
2 Adelie   Torgersen           40.30          18.00             195.00     3,250.00  female  2007
4 Adelie   Torgersen           36.70          19.30             193.00     3,450.00  female  2007
5 Adelie   Torgersen           39.30          20.60             190.00     3,650.00  male    2007

Two statistics

  • R-squared, \(R^2\): Percentage of variability in the outcome explained by the regression model (in the context of SLR, the predictor)

    \[ R^2 = \frac{\text{variation explained by regression}}{\text{total variation in the data}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} \\ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]

  • Root mean square error, RMSE: A measure of the average error (average difference between observed and predicted values of the outcome)

\[ RMSE = \sqrt{\frac{\sum_{i = 1}^n (y_i - \hat{y}_i)^2}{n}} \]

What indicates a good model fit? Higher or lower \(R^2\)? Higher or lower RMSE?

R-squared

  • Ranges between 0 (terrible predictor) and 1 (perfect predictor); unitless
  • Calculate with model.score(X, y):
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)
r_squared = model.score(X, y)
print(f"R^2 = {r_squared:.2f}")
R^2 = 0.35
x = X[:, 0]
regressed_fn = np.poly1d(np.polyfit(x, y, 1))  # least-squares line
y_hat = regressed_fn(x)
# R^2 = variation explained by regression / total variation (SSR / SST)
r_squared = np.sum(np.square(y_hat - y.mean())) / np.sum(np.square(y - y.mean()))
print(f"R^2 = {r_squared:.2f}")
R^2 = 0.35

More Examples

r = np.corrcoef(y, y_hat)[0, 1] # matrix, get first row second column
r_squared = r ** 2
print(f"R^2 = {r_squared:.2f}")
R^2 = 0.35

Graph

ax = sns.scatterplot(x=X[:,0], y=y, color="black")
ax.plot(X, model.predict(X), color="red", linewidth=3)

\(R^2\) Example

Interpreting R-squared

The \(R^2\) of the model for predicting penguin mass from bill length is 25%. Which of the following is the correct interpretation of this value?

  • Bill Length correctly predicts 25% of penguin mass.
  • 25% of the variability in penguin mass can be explained by bill length.
  • 25% of the time penguin mass can be predicted by bill length.

RMSE

  • Ranges between 0 (perfect predictor) and infinity (terrible predictor)

  • Same units as the outcome variable

  • Calculate with mean_squared_error(y_true, y_pred):

    import numpy as np
    from sklearn import metrics
    # mean_squared_error returns the MSE; take the square root for RMSE
    rmse = np.sqrt(metrics.mean_squared_error(y, model.predict(X)))
    print(f"RMSE: {rmse:.2f}")
    RMSE: 649.48
  • The value of RMSE is not very meaningful on its own, but it’s useful for comparing across models.

  • For example, comparing a model that uses bill length as a predictor to one that uses flipper length (see the sketch below)
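A sketch of that comparison, assuming the penguins data loads from seaborn as above:

import numpy as np
import seaborn as sns
from sklearn import metrics
from sklearn.linear_model import LinearRegression

df = sns.load_dataset("penguins").dropna()  # assumed data source
y = df["body_mass_g"].values

# Fit one simple linear model per candidate predictor and compare RMSE
for feature in ("bill_length_mm", "flipper_length_mm"):
    X = df[feature].values[:, np.newaxis]
    model = LinearRegression().fit(X, y)
    rmse = np.sqrt(metrics.mean_squared_error(y, model.predict(X)))
    print(f"{feature}: RMSE = {rmse:.2f}")

Because RMSE is in the units of the outcome (grams here), whichever model has the lower RMSE makes smaller errors on average.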

RMSE Example

Binary Classification Metrics

True/False Positive/Negative

  • True Positive (TP) is an outcome that was correctly identified as positive.
  • True Negative (TN) is an outcome that was correctly identified as negative.
  • False Positive (FP) is an outcome that was incorrectly identified as positive.
  • False Negative (FN) is an outcome that was incorrectly identified as negative.

Confusion Matrix

                    Positive (predicted)   Negative (predicted)
Positive (actual)                    170                      21
Negative (actual)                      1                     377

Metrics

  • Accuracy - useful, but can be misleading when classes are imbalanced
  • Precision - very useful
  • Recall - very useful

Accuracy

  • Accuracy: \(\frac{\text{# Correctly Predicted}}{\text{Total}}\)
  • \(\frac{TP + TN}{TP + TN + FP + FN}\)

Precision

  • Tells how precise your positive predictions are
  • \(\frac{TP}{TP + FP}\)
  • The higher this number, the fewer false positives you have
  • My research - identifying an emergency landing location nearby. A high precision gives confidence that it truly is safe to land at the location the model predicts.

Recall

  • The proportion of actual positives that were correctly predicted
  • \(\frac{TP}{TP + FN}\)
  • The higher this number, the fewer false negatives you have.
  • My research - a high recall means I found nearly all the rooftops in the city that you could land on.

Example Question

                    Positive (predicted)   Negative (predicted)
Positive (actual)                    170                      21
Negative (actual)                      1                     377

What is the Accuracy, Precision, and Recall?
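One way to check your answer, reading TP, FN, FP, and TN off the table above:

TP, FN = 170, 21   # positive (actual) row
FP, TN = 1, 377    # negative (actual) row

accuracy = (TP + TN) / (TP + TN + FP + FN)  # 547/569 ~ 0.961
precision = TP / (TP + FP)                  # 170/171 ~ 0.994
recall = TP / (TP + FN)                     # 170/191 ~ 0.890
print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")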

Tradeoff Between Precision and Recall

Tradeoff

  • In logistic regression you specify a threshold for turning a predicted probability into a prediction. By default we use 0.5 (50%), but that is arbitrary and you can move the threshold; see the sketch below.
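A minimal sketch of moving the threshold, using synthetic data from scikit-learn (assumed only for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]  # P(y = 1) for each sample

for threshold in (0.2, 0.5, 0.8):
    preds = (probs >= threshold).astype(int)
    print(f"threshold {threshold}: "
          f"precision {precision_score(y, preds):.2f}, "
          f"recall {recall_score(y, preds):.2f}")

Lowering the threshold catches more true positives (higher recall) at the cost of more false positives (lower precision); raising it does the opposite.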

Low Threshold

High Threshold