Support Vector Machine (SVM)

Created on Sep 18, 2025, Last Updated on Oct 22, 2025, By a Developer

Support Vector Machine (SVM) is similar to Logistic Regression to some extent: both try to find a decision boundary between two or more groups of training samples. SVM does so by explicitly looking for a hyperplane in feature space that splits the training samples.

Different from logistic regression, where all samples participate in finding the decision boundary, in SVM only a small subset of the data is "supporting" the hyperplane. The points contributing to the hyperplane are called support vectors, and each support vector's distance to the hyperplane is no larger than the margin.

Like all models that depend on distances between samples, SVM suffers from the "curse of dimensionality", so PCA and other dimensionality reduction methods are very commonly applied to the features first.
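As a concrete illustration, below is a minimal sketch of this practice using scikit-learn; the synthetic dataset and the choice of 10 principal components are arbitrary assumptions, not recommendations.

```python
# Reduce dimensionality with PCA before fitting an SVM classifier.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: 500 samples, 50 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)

# Scale features, project onto the top 10 principal components,
# then fit the SVM on the reduced feature space.
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
model.fit(X, y)
print(model.score(X, y))  # training accuracy of the PCA + SVM pipeline
```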

Hard & Soft Margin Classifier


Hard and soft refer to whether the hyperplane allows any sample to live on the opposite side. Obviously, a hard margin classifier only works on linearly separable datasets. Both hard and soft margin classifiers are linear classifiers, meaning there are no interactions among features, so the hyperplane has the equation:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p = 0$$

The hard margin classifier has no error tolerance: the margin is the distance from the closest support vectors to the hyperplane, and all support vectors lie exactly on the margin boundary.
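Written out explicitly (a standard formulation, stated here as an assumption since the original equation is not shown), the hard margin problem maximizes the margin $M$:

$$\max_{\beta_0, \dots, \beta_p, M} M \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1, \quad y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M \;\;\text{for all } i$$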

The soft margin classifier allows some points to be classified on the wrong side of the margin, or even of the hyperplane. Therefore, a slack variable $\varepsilon_i$ measuring the error is assigned to each sample.

To control the total error, an error term is added to the objective we minimize:

$$\min_{\beta_0, \beta, \varepsilon} \; \frac{1}{2}\lVert\beta\rVert^2 + C\sum_{i=1}^{n}\varepsilon_i \quad \text{subject to} \quad y_i(\beta_0 + \beta^\top x_i) \ge 1 - \varepsilon_i, \;\; \varepsilon_i \ge 0$$

where $C$ is the error penalty: a smaller $C$ indicates a higher tolerance of error.
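To see the effect of $C$, here is a small sketch (scikit-learn on synthetic blobs; the specific values of $C$ are arbitrary): a smaller $C$ tolerates more margin violations, which typically leaves more support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so no hard margin exists.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0,
                  random_state=0)

# Smaller C -> weaker error penalty -> more margin violations allowed.
for c in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=c).fit(X, y)
    print(f"C={c:>6}: {clf.n_support_.sum()} support vectors")
```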

Kernel Method


A linear hyperplane is insufficient in a lot of cases; therefore, the soft margin classifier is generalized to fit non-linear sample distributions.

Polynomial regression is kind of hard to train, but SVM is different: the interaction among features enters only through inner products. The linear decision function can be written as below, where the $x_i$ are the training samples and the $\alpha_i$ are the support vector weights (nonzero only for the support vectors $\mathcal{S}$):

$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i \langle x, x_i \rangle$$

And this can be generalized by replacing the inner product with a kernel function $K$:

$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i)$$

The kernel $K$ can take different forms; two common choices are listed below, with a small sketch after the list.

  • Polynomial Kernel: $K(x_i, x_j) = (1 + \langle x_i, x_j \rangle)^d$
  • Radial Kernel: $K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$
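Below is a minimal NumPy sketch of both kernels and of the kernelized decision function above; the support vectors, weights $\alpha_i$, and intercept $\beta_0$ are made-up placeholders, purely to show the shape of the computation.

```python
import numpy as np

def polynomial_kernel(xi, xj, d=3):
    # K(xi, xj) = (1 + <xi, xj>)^d
    return (1.0 + xi @ xj) ** d

def radial_kernel(xi, xj, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def decision(x, support_vectors, alphas, b0, kernel):
    # f(x) = beta_0 + sum over support vectors of alpha_i * K(x, x_i)
    return b0 + sum(a * kernel(x, sv)
                    for a, sv in zip(alphas, support_vectors))

# Toy usage with two hypothetical support vectors and arbitrary weights.
svs = np.array([[1.0, 2.0], [-1.0, 0.5]])
alphas = np.array([0.7, -0.7])
x = np.array([0.0, 1.0])
print(decision(x, svs, alphas, b0=0.1, kernel=radial_kernel))
```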

© 2024-present Zane Chen. All Rights Reserved.