Regression 2
\(\hat{\beta}_1\) (slope) \(= \frac{\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i- \bar{y})}{\sum\limits_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\text{Cov}(x,y)}{s_x^2} = r\frac{s_y}{s_x}\)
\(\hat{\beta}_0\) (intercept) \(= \bar{y} - \hat{\beta}_1 \bar{x}\)
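As a quick sanity check, here is a minimal numpy sketch of these formulas; the arrays x and y below are made-up illustration data, not from this lecture.
import numpy as np

x = np.arange(10)   # hypothetical data
y = 1.5 * x + 4.0   # a perfect line, so the answers are exact

# slope: covariance of x and y divided by the variance of x
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()  # intercept
print(beta_1, beta_0)  # 1.5 4.0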
Tip
Look at the y-axis scale!
Caution
We can clearly see a huge jump in variance around x = 15.
We will use numpy and scikit-learn to fit our models. scikit-learn is a library made for machine learning. It's awesome!
from sklearn.linear_model import LinearRegression
model = LinearRegression()
X = x[:, np.newaxis] # n x 1 matrix
reg = model.fit(X, y) # X needs to be a matrix
slope, intercept = reg.coef_[0], reg.intercept_
print(f"Slope: {slope:.1f}; \nIntercept: {intercept:.1f}")
ax = sns.scatterplot(x=x, y=y)
ax.axline((0, intercept), slope=slope, color='r', label='Regressed Line');
ax.set_xlabel("X")
ax.set_ylabel("Y");
Slope: 1.5;
Intercept: 3.9
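Once fitted, the same reg object can also generate predictions; a minimal sketch, where x_new is a made-up pair of inputs for illustration:
x_new = np.array([[0.0], [10.0]])  # hypothetical new inputs, shaped n x 1
print(reg.predict(x_new))          # predicted y at x = 0 and x = 10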
X = df[['bill_length_mm', 'flipper_length_mm']].values
y = df['body_mass_g'].values
def plot_plane(x_coef, y_coef, intercept, ax, n=20):
    # build an n x n grid spanning the observed feature ranges
    x = np.linspace(X[:, 0].min(), X[:, 0].max(), n)
    y = np.linspace(X[:, 1].min(), X[:, 1].max(), n)
    x, y = np.meshgrid(x, y)
    eq = x_coef * x + y_coef * y + intercept  # the regression plane
    ax.plot_surface(x, y, eq, color='red', alpha=0.5)
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
ax.scatter(X[:, 0], X[:, 1], y, label='penguins')
ax.set_xlabel("Bill Length")
ax.set_ylabel("Flipper Length")
ax.set_zlabel("Body Mass")
ax.legend();
model = LinearRegression()
reg = model.fit(X, y) # X needs to be a matrix
slope_bill, slope_flipper, intercept = reg.coef_[0], reg.coef_[1], reg.intercept_
print(f"Bill Slope: {slope:.1f}; Flipper Slope: {slope_flipper:.1f}; \nIntercept: {intercept:.1f}");
Bill Slope: 1.5; Flipper Slope: 48.9;
Intercept: -5836.3
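With the fitted coefficients in hand, the plot_plane helper defined above can overlay the regressed plane on the scatter; a minimal sketch reusing the same figure setup as before:
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
ax.scatter(X[:, 0], X[:, 1], y, label='penguins')
plot_plane(slope_bill, slope_flipper, intercept, ax)  # fitted plane
ax.set_xlabel("Bill Length")
ax.set_ylabel("Flipper Length")
ax.set_zlabel("Body Mass")
ax.legend();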
We can use PolynomialFeatures to create polynomial terms from our feature before fitting a linear model.
from sklearn.preprocessing import PolynomialFeatures
pf = PolynomialFeatures(degree=2, include_bias=False)
X = x[:, np.newaxis] # n x 1 matrix
x_features = pf.fit_transform(X)
print("Transformed Features:\n", x_features[:5, :])
model = LinearRegression()
reg = model.fit(x_features, y) # x_features is already an n x 2 matrix
print("Coefficients")
print(reg.coef_, reg.intercept_); # [beta_1, beta_2], beta_0
Transformed Features:
[[0.0 0.0]
[1.0 1.0]
[2.0 4.0]
[3.0 9.0]
[4.0 16.0]]
Coefficients
[0.3 0.1] 2.965347007117529
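To see the curve these coefficients describe, here is a minimal sketch that predicts over a grid of x values, reusing the same pf transformer and reg model from above; x_grid is a made-up name, and seaborn is assumed imported as sns as in the earlier plot:
x_grid = np.linspace(x.min(), x.max(), 100)          # fine grid over x
grid_features = pf.transform(x_grid[:, np.newaxis])  # same degree-2 transform
ax = sns.scatterplot(x=x, y=y)
ax.plot(x_grid, reg.predict(grid_features), color='r', label='Quadratic Fit')
ax.legend();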