SVMs as Linear Classifiers

Support Vector Machines: Margin-Maximising Linear Classifiers

A Support Vector Machine (SVM) is a supervised machine learning algorithm for binary classification. It finds the optimal linear decision boundary (hyperplane) that separates two classes with the maximum margin.

Binary Classification Setup

Given labelled data: $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$ where $y_i \in \{-1, +1\}$

The SVM finds a hyperplane separating the two classes:

In 2D, where $x_1$ and $x_2$ are the two features of a single point: $w_1 x_1 + w_2 x_2 + b = 0$ (a line)
In $d$ dimensions: $\mathbf{w} \cdot \mathbf{x} + b = 0$

Classification rule: $\hat{y} = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)$
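The classification rule can be sketched in a few lines of Python. This is a minimal illustration only: the weights `w`, bias `b` and test points are made-up values, not the result of training, and the function names `decision_value` and `predict` are chosen for this example.

```python
def decision_value(w, x, b):
    """Return w . x + b for weight vector w, point x and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Classify x as +1 or -1 using the sign of the decision value."""
    # Assumption: a point exactly on the boundary (value 0) is labelled +1.
    return 1 if decision_value(w, x, b) >= 0 else -1


w = [1.0, 1.0]  # hypothetical weights
b = -3.0        # hypothetical bias

print(predict(w, [3.0, 2.0], b))  # 3 + 2 - 3 = 2, positive side
print(predict(w, [0.5, 1.0], b))  # 0.5 + 1 - 3 = -1.5, negative side
```

The first point lands on the positive side of the line $x_1 + x_2 - 3 = 0$ and the second on the negative side, so they receive the labels $+1$ and $-1$ respectively.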

The Margin

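The margin is the width of the gap between the two classes: the perpendicular distance between the two margin boundaries $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$, which pass through the nearest training points of each class. Each margin boundary lies a distance $\frac{1}{\|\mathbf{w}\|}$ from the decision boundary, so:

$$\text{Margin} = \frac{2}{\|\mathbf{w}\|}$$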

A wider margin generally leads to better generalisation.
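The margin formula can be checked numerically. The sketch below assumes a hypothetical weight vector `w = [3, 4]` (chosen so that its length is exactly 5) and a hypothetical bias; `norm`, `margin_width` and `distance_to_boundary` are helper names invented for this example.

```python
import math


def norm(w):
    """Euclidean length of the weight vector w."""
    return math.sqrt(sum(wi * wi for wi in w))


def margin_width(w):
    """Full width of the gap between the two margin boundaries: 2 / ||w||."""
    return 2.0 / norm(w)


def distance_to_boundary(w, x, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)


w = [3.0, 4.0]  # hypothetical weights, ||w|| = 5
b = -1.0        # hypothetical bias

print(margin_width(w))                         # 2 / 5
print(distance_to_boundary(w, [1.0, 1.0], b))  # |3 + 4 - 1| / 5
```

Doubling every entry of `w` (and `b`) leaves the decision boundary in the same place but halves the value returned by `margin_width`, which is why the size of $\mathbf{w}$ controls the margin.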

Support Vectors

Support vectors are the training points lying exactly on the margin boundaries:
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$$

These are the only training points that determine the decision boundary. Every other training point lies strictly outside the margin, so it can be removed and the SVM retrained without changing the classifier.
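The support-vector condition can be tested directly once $\mathbf{w}$ and $b$ are known. The sketch below uses a made-up 2D data set of four points together with the values `w = [1, 0]` and `b = -2`, which are assumed here to be the trained hard-margin solution for that data. The helper names and the tolerance `tol` are choices made for this example.

```python
def functional_margin(w, b, x, y):
    """Return y * (w . x + b); this is at least 1 for every training point."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def support_vectors(w, b, points, labels, tol=1e-9):
    """Return the points whose functional margin equals 1 (within tol)."""
    return [
        x for x, y in zip(points, labels)
        if abs(functional_margin(w, b, x, y) - 1.0) <= tol
    ]


# Hypothetical toy data: two points per class.
points = [(3.0, 1.0), (5.0, 2.0), (1.0, 1.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

w = [1.0, 0.0]  # assumed trained weights
b = -2.0        # assumed trained bias

print(support_vectors(w, b, points, labels))  # [(3.0, 1.0), (1.0, 1.0)]
```

Only $(3, 1)$ and $(1, 1)$ sit exactly on the margin boundaries. The other two points have a functional margin of 3, so they lie well outside the margin and play no part in fixing the boundary.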

The Optimisation Objective

SVM solves:
$$\text{minimise} \;\frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \;\; \forall i$$

Minimising $\|\mathbf{w}\|^2$ maximises the margin $\frac{2}{\|\mathbf{w}\|}$.
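To see the objective and constraints at work without solving the quadratic program, the sketch below compares three candidate boundaries on a made-up 1D data set. In 1D the weight is a single number `w`, so its norm is simply `abs(w)`. The data, the candidates and the function names are all assumptions made for this illustration.

```python
# Hypothetical 1D data: class -1 on the left, class +1 on the right.
xs = [1.0, 2.0, 4.0, 6.0]
ys = [-1, -1, 1, 1]


def is_feasible(w, b):
    """True if every point satisfies the constraint y * (w * x + b) >= 1."""
    return all(y * (w * x + b) >= 1 for x, y in zip(xs, ys))


def objective(w):
    """The quantity being minimised: 0.5 * w^2 (the 1D case of 0.5 * ||w||^2)."""
    return 0.5 * w * w


def margin(w):
    """Margin width 2 / |w|."""
    return 2.0 / abs(w)


# (w, b) candidates: two satisfy the constraints, the last one does not.
candidates = [(1.0, -3.0), (2.0, -6.0), (1.0, -2.5)]
for w, b in candidates:
    print(w, b, is_feasible(w, b), objective(w), margin(w))
```

Both `(1, -3)` and `(2, -6)` place the boundary at $x = 3$ and satisfy every constraint, but the first has the smaller objective (0.5 versus 2) and the wider margin (2 versus 1). The candidate `(1, -2.5)` is rejected because the point $x = 2$ falls inside the margin. For this data the boundary $x = 3$ is the midpoint between the closest points of opposite classes, $x = 2$ and $x = 4$.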

KEY TAKEAWAY: The SVM finds the hyperplane that maximises the margin between the two classes. The support vectors are the only training points that define the boundary — they are the most informative data points.

Why Margin Maximisation?

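- A larger margin provides a safety buffer: new points that fall slightly closer to the boundary than the training points did are still classified correctly.
- Maximising the margin is a form of regularisation that reduces overfitting.
- A new point is classified by which side of the decision boundary it falls on; points that lie beyond the margin boundaries are classified with the greatest confidence.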

Limitations of Linear SVM

Not all data is linearly separable. Solutions:
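- Soft-margin SVM: Allow some points to fall inside the margin or be misclassified. The hyperparameter $C$ sets how heavily these violations are penalised (a larger $C$ tolerates fewer violations).
- Kernel trick: Implicitly map the data to a higher-dimensional space where linear separation is possible.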
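The idea of mapping to a higher dimension can be illustrated with a made-up 1D data set in which the positive class sits on both sides of the negative class. The sketch below uses an explicit feature map $x \mapsto (x, x^2)$ rather than a kernel function, so it shows the mapping idea only and not the kernel trick itself (which avoids computing the mapped points explicitly). The data, the map and the hyperplane `w = [0, 1]`, `b = -2.5` are all hand-picked assumptions.

```python
# Hypothetical 1D data: +1 on the outside, -1 in the middle.
xs = [-3.0, -2.0, -0.5, 0.0, 1.0, 2.5, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]


def separable_by_threshold(xs, ys):
    """True if a single threshold on x splits the two classes.

    Sorted by x, the labels may change at most once. Assumes no two
    points share the same x value.
    """
    ordered = [y for _, y in sorted(zip(xs, ys))]
    changes = sum(1 for a, c in zip(ordered, ordered[1:]) if a != c)
    return changes <= 1


def feature_map(x):
    """Map a 1D point to 2D: x -> (x, x^2)."""
    return (x, x * x)


def predict(w, b, point):
    """Linear classification rule sign(w . point + b), with 0 mapped to +1."""
    value = sum(wi * pi for wi, pi in zip(w, point)) + b
    return 1 if value >= 0 else -1


print(separable_by_threshold(xs, ys))  # False: labels change twice along the line

# In the mapped space, the horizontal line x^2 = 2.5 separates the classes.
w, b = [0.0, 1.0], -2.5  # hand-picked, not trained
mapped_predictions = [predict(w, b, feature_map(x)) for x in xs]
print(mapped_predictions == ys)  # True
```

No single cut-off on the number line separates the classes, but after adding $x^2$ as a second feature a straight line does. A hard-margin SVM trained on the mapped points would then choose the separating line with the widest margin.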

Summary

Concept	Description
Hyperplane	Decision boundary
Margin	Width of the gap between classes
Support vectors	Points on the margin boundary
Maximise margin	SVM’s training objective
$\|\mathbf{w}\|^2$	Minimised to maximise margin

EXAM TIP: For VCAA, understand the geometric concepts: margin, support vectors, decision boundary. Know the optimisation goal (maximise margin = minimise $\|\mathbf{w}\|^2$). You do not need to solve the full quadratic programming problem.

COMMON MISTAKE: Assuming that every training point influences the decision boundary. The SVM is defined by the support vectors only, not all training points. If you removed all non-support-vector training data and retrained, the decision boundary would not change.

VCAA FOCUS: Know the definition of margin, the role of support vectors, and why maximising the margin leads to good generalisation. Apply SVM concepts geometrically to 1D and 2D data.
