Some one-dimensional datasets are not linearly separable — no single threshold can separate the two classes. The solution is to create a second feature from the original data, mapping the points into two dimensions where they are linearly separable.
Example:
- Class \(+1\): \(\{-3, 3\}\) (far from zero)
- Class \(-1\): \(\{-1, 0, 1\}\) (close to zero)
No single threshold \(t\) works: class \(-1\) lies between the two class \(+1\) points, so for any \(t\) at least one point ends up on the wrong side.
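The claim can be checked by brute force. The following is a minimal sketch, not part of the original notes: the helper name `threshold_errors` and the choice of candidate thresholds (one below all points, one in each gap, one above all points) are assumptions made for illustration.

```python
# Example data from the text: (x1, class label).
points = [(-3, +1), (3, +1), (-1, -1), (0, -1), (1, -1)]

def threshold_errors(t, sign):
    """Count mistakes for the rule: predict `sign` if x > t, otherwise -sign."""
    errors = 0
    for x, label in points:
        predicted = sign if x > t else -sign
        if predicted != label:
            errors += 1
    return errors

# Only the gaps between sorted points matter, so one threshold per gap is enough.
xs = sorted(x for x, _ in points)
candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]

# Try both orientations of the rule for every candidate threshold.
best = min(threshold_errors(t, s) for t in candidates for s in (+1, -1))
print(best)  # 1: even the best threshold misclassifies one point
```

Because the minimum error count is above zero, no threshold in either orientation classifies all five points correctly.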
Create a second feature \(x_2 = x_1^2\) (the square of the original feature), and map each point \((x_1) \rightarrow (x_1, x_1^2)\):
| \(x_1\) | Class | \(x_2 = x_1^2\) | 2D point |
|---|---|---|---|
| \(-3\) | \(+1\) | \(9\) | \((-3, 9)\) |
| \(3\) | \(+1\) | \(9\) | \((3, 9)\) |
| \(-1\) | \(-1\) | \(1\) | \((-1, 1)\) |
| \(0\) | \(-1\) | \(0\) | \((0, 0)\) |
| \(1\) | \(-1\) | \(1\) | \((1, 1)\) |
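
The table above can be reproduced in a few lines. This is an illustrative sketch; the function name `feature_map` is an assumption, not something defined in the notes.

```python
def feature_map(x1):
    """Map a 1D point x1 to the 2D point (x1, x1^2)."""
    return (x1, x1 ** 2)

# Example data from the text: x1 -> class label.
data = {-3: +1, 3: +1, -1: -1, 0: -1, 1: -1}

# Pair every mapped 2D point with its class label.
mapped = [(feature_map(x1), label) for x1, label in data.items()]
for point, label in mapped:
    print(point, label)
```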
In 2D: class \(+1\) points have large \(x_2\) (high up), class \(-1\) points have small \(x_2\) (low down). Any horizontal line \(x_2 = c\) with \(1 < c < 9\) separates them.
Support vectors (closest across classes): \((3, 9)\) and \((-3, 9)\) from class \(+1\) (both at \(x_2 = 9\)), and \((1, 1)\) and \((-1, 1)\) from class \(-1\) (both at \(x_2 = 1\)).
The maximum-margin boundary lies midway between the support vectors: \(x_2 = (1 + 9)/2 = 5\). Decision boundary \(x_2 = 5\) means \(x_1^2 = 5\), so \(|x_1| = \sqrt{5} \approx 2.24\).
Classification rule in original 1D:
- If \(|x_1| > \sqrt{5}\): predict \(+1\)
- If \(|x_1| \leq \sqrt{5}\): predict \(-1\)
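The boundary and the 1D rule can be sketched in code. This is a minimal sketch under one assumption: the boundary is taken as the horizontal line halfway between the closest \(x_2\) values of the two classes, which holds for this symmetric dataset but is not a general SVM solver. The names `c`, `radius` and `predict` are illustrative.

```python
import math

# Example data from the text: (x1, class label).
data = [(-3, +1), (3, +1), (-1, -1), (0, -1), (1, -1)]

# Second feature x2 = x1^2, grouped by class.
x2_pos = [x1 ** 2 for x1, label in data if label == +1]
x2_neg = [x1 ** 2 for x1, label in data if label == -1]

# Assumed: the boundary is horizontal, midway between the closest x2 values.
c = (min(x2_pos) + max(x2_neg)) / 2   # (9 + 1) / 2 = 5.0
radius = math.sqrt(c)                 # cutoff on |x1| in the original 1D space

def predict(x1):
    """1D classification rule recovered from the 2D boundary x2 = c."""
    return +1 if abs(x1) > radius else -1

for x1, label in data:
    print(x1, label, predict(x1))
```

Every training point is classified correctly, and a new point such as \(x_1 = 2\) falls inside the cutoff and is predicted as class \(-1\).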
This is a non-linear decision boundary in 1D, found using a linear SVM in 2D.
KEY TAKEAWAY: Mapping from 1D to 2D with \(x_2 = x_1^2\) transforms a non-linearly separable problem into a linearly separable one. The linear boundary in 2D corresponds to a non-linear boundary in 1D.
This technique is the idea behind the kernel trick: work in a higher-dimensional feature space where linear separation is possible, without having to compute the new features explicitly. The decision boundary in the original space may be curved (parabola, circle, etc.).
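As a small illustration of the link to kernels, the sketch below checks that a dot product between mapped points \((x_1, x_1^2)\) can be computed directly from the original 1D values. The kernel function shown is an assumption chosen to match this particular mapping; it is not a kernel named in the notes.

```python
def phi(x):
    """Explicit feature map used in this section: x -> (x, x^2)."""
    return (x, x ** 2)

def dot(u, v):
    """Ordinary dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def kernel(a, b):
    """Hypothetical kernel for phi: a*b + (a*b)^2, computed without mapping."""
    return a * b + (a * b) ** 2

# Compare both routes on a few pairs of points from the example.
pairs = [(-3, 1), (3, 3), (0, -1), (1, -1)]
matches = [dot(phi(a), phi(b)) == kernel(a, b) for a, b in pairs]
print(matches)
```

Both routes give the same number for each pair, which is why a kernel can stand in for the explicit mapping.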
| Transformation | New feature | When useful |
|---|---|---|
| Quadratic | \(x_2 = x_1^2\) | Data symmetric around zero, classes at different distances from 0 |
| Absolute value | \(x_2 = \lvert x_1 \rvert\) | |
| Exponential | \(x_2 = e^{x_1}\) | Exponential growth patterns |
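
To see how the choice of transformation matters, the sketch below applies each one to the example dataset and checks whether a single threshold on the new feature \(x_2\) splits the classes. It only tests a horizontal line in 2D, not a slanted one, and the helper name `threshold_separable` is an assumption made for illustration.

```python
import math

# Example data from the text: (x1, class label).
data = [(-3, +1), (3, +1), (-1, -1), (0, -1), (1, -1)]

transforms = {
    "quadratic": lambda x: x ** 2,
    "absolute": abs,
    "exponential": math.exp,
}

def threshold_separable(values_labels):
    """True if one threshold on the value alone separates the two classes."""
    pos = [v for v, label in values_labels if label == +1]
    neg = [v for v, label in values_labels if label == -1]
    return min(pos) > max(neg) or max(pos) < min(neg)

results = {
    name: threshold_separable([(f(x1), label) for x1, label in data])
    for name, f in transforms.items()
}
print(results)
```

For this dataset the quadratic and absolute-value features pass the check, while the exponential feature does not, because \(e^{x_1}\) preserves the order of the points and class \(-1\) stays in the middle.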
EXAM TIP: For VCAA, given a 1D dataset that is not linearly separable, apply \(x_2 = x_1^2\) to create 2D data, find the SVM boundary in 2D (midpoint between support vectors), then interpret it back in 1D.
COMMON MISTAKE: After finding the decision boundary in 2D, you must transform it back to the original 1D space. If the 2D boundary is \(x_2 = 5\), the 1D rule is \(x_1^2 = 5\), not \(x_1 = 5\).
VCAA FOCUS: Know why \(x_2 = x_1^2\) is useful, how to apply the transformation, find the 2D boundary, and interpret it in 1D.