Regression Model

Created on Aug 03, 2025, Last Updated on Sep 14, 2025, By a Developer

Linear Regression


There are multiple error metrics available:

  • Mean Absolute Error (MAE)
  • Mean Absolute Percentage Error (MAPE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
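
As a point of reference, here is a minimal NumPy sketch of the four metrics above (the toy arrays are placeholders):

    import numpy as np

    def regression_metrics(y_true, y_pred):
        # Illustrative implementations of the four error metrics listed above.
        err = y_true - y_pred
        mae = np.mean(np.abs(err))                  # Mean Absolute Error
        mape = np.mean(np.abs(err / y_true)) * 100  # Mean Absolute Percentage Error (y_true must be non-zero)
        mse = np.mean(err ** 2)                     # Mean Squared Error
        rmse = np.sqrt(mse)                         # Root Mean Squared Error
        return mae, mape, mse, rmse

    print(regression_metrics(np.array([3.0, 5.0, 2.5]), np.array([2.5, 5.0, 3.0])))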

The Least Squares method is usually used to estimate the coefficients/weights of the model. The name indicates that it finds the weights that minimize the sum of squared errors (equivalently, the MSE), not that the MSE reaches zero.
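
A minimal sketch of a least-squares fit with NumPy; the toy data and coefficients below are placeholders for illustration only:

    import numpy as np

    # Toy data: y is roughly 2.0 * x + 1.0 plus noise (placeholder values).
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x), x])

    # Least squares: find w that minimizes ||y - X @ w||^2.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("intercept, slope:", w)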

Polynomial Regression


Plain linear regression is rarely sufficient, because the relationship between features and target is often non-linear. Instead of using a feature X only as X, higher powers of the feature can be added, such as X**2 and X**3.

Apart from powers of a single feature, combinations (interaction terms) of different features, such as X1*X2, are also useful in many cases.
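
One way to generate both kinds of terms is a polynomial feature expansion. A minimal sketch using scikit-learn's PolynomialFeatures, assuming scikit-learn is available (the toy matrix and feature names are placeholders):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures  # assumes scikit-learn is installed

    # Two toy features; the values are placeholders.
    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

    # degree=2 expands [x1, x2] into [x1, x2, x1^2, x1*x2, x2^2],
    # covering both powers of a single feature and feature combinations.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)
    print(poly.get_feature_names_out(["x1", "x2"]))
    print(X_poly)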

Measurement


While model performance can still be analyzed in terms of Bias and Variance, classic regression differs from Deep Learning: deep models largely learn feature representations on their own, whereas classic regression puts a lot of emphasis on feature selection.

Model Performance


To measure how well the model fits the training data, R-Squared is one option: the fraction of the target's variance explained by the model.
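
A minimal sketch of R-Squared computed by hand (the toy arrays are placeholders):

    import numpy as np

    def r_squared(y_true, y_pred):
        # R^2 = 1 - SS_res / SS_tot: the fraction of variance explained by the model.
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1.0 - ss_res / ss_tot

    print(r_squared(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))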

Coefficient Significance


To measure whether the coefficients/weights are significant (see the sketch after this list):

  • Standard Error (SE): describes the uncertainty (spread) of the coefficient estimate.
  • t-value: indicates how large the coefficient is relative to its uncertainty, i.e. the coefficient divided by its SE.
  • p-value: if the true coefficient were zero, how often would a t-value this extreme be observed? A large p-value (usually > 0.05) indicates the coefficient is consistent with noise, meaning it is not significant.
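
A minimal sketch of reading these quantities off an ordinary least squares fit, assuming the statsmodels library is available (the toy data is a placeholder):

    import numpy as np
    import statsmodels.api as sm  # assumes statsmodels is installed

    # Toy data: y depends on x1 but not on x2 (placeholder values).
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    x2 = rng.normal(size=100)
    y = 3.0 * x1 + rng.normal(size=100)

    X = sm.add_constant(np.column_stack([x1, x2]))  # add an intercept column
    result = sm.OLS(y, X).fit()

    print(result.bse)      # standard errors of the coefficients
    print(result.tvalues)  # t-values
    print(result.pvalues)  # p-values; x2's should be large (not significant)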

Correlated Features


Correlated features means multiple features carry redundant information, share a confounder (a common cause), or are causally related (one drives the other, possibly indirectly).

Variance Inflation Factor (VIF) is a metric to detect collinearity among features. For feature i, VIF_i = 1 / (1 - R_i^2), where R_i^2 is the R-Squared obtained by regressing feature i on all the other features. When VIF is large (usually bigger than 5 or 10), the feature is collinear with the other features.
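
A minimal sketch of computing VIF, assuming the statsmodels library is available (the toy collinear data is a placeholder):

    import numpy as np
    from statsmodels.stats.outliers_influence import variance_inflation_factor  # assumes statsmodels

    # Toy design: x3 is almost a copy of x1 + x2 (placeholder values).
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = rng.normal(size=200)
    x3 = x1 + x2 + rng.normal(scale=0.1, size=200)

    # Include an intercept column; compute VIF only for the real features.
    X = np.column_stack([np.ones(200), x1, x2, x3])
    for i in range(1, X.shape[1]):
        # VIF_i = 1 / (1 - R_i^2); large values flag collinear features.
        print(f"x{i}: VIF = {variance_inflation_factor(X, i):.1f}")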

Feature Selection


  • Forward Selection: start from zero features, and repeatedly add the feature that most improves R-Squared (see the sketch after this list).
  • Backward Selection: start from all features, and repeatedly remove the feature with the largest p-value until all remaining p-values fall below some threshold.
  • Mixed Selection: a combination of the two approaches above.
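
A minimal sketch of greedy forward selection, assuming statsmodels is available; the toy data and the stopping rule (a fixed max_features) are simplifications:

    import numpy as np
    import statsmodels.api as sm  # assumes statsmodels is installed

    def forward_selection(X, y, max_features):
        # Greedy forward selection: at each step add the candidate feature that
        # yields the highest R-Squared together with the already-selected features.
        n_features = X.shape[1]
        selected = []
        while len(selected) < max_features:
            best_r2, best_j = -np.inf, None
            for j in range(n_features):
                if j in selected:
                    continue
                design = sm.add_constant(X[:, selected + [j]])
                r2 = sm.OLS(y, design).fit().rsquared
                if r2 > best_r2:
                    best_r2, best_j = r2, j
            selected.append(best_j)
            print(f"added feature {best_j}, R^2 = {best_r2:.3f}")
        return selected

    # Toy data: only the first two of four features matter (placeholder values).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=200)
    forward_selection(X, y, max_features=2)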

© 2024-present Zane Chen. All Rights Reserved.