Neural Network Training - StudyPulse
Algorithmics (HESS)
01 May 2026

Training Neural Networks: Iterative Weight Improvement

Neural network training iteratively adjusts edge weights (and biases) to reduce the difference between the network’s predictions and the true labels in the training data.


The Training Loop

Initialise all weights randomly (small values)
Repeat for many epochs:
    For each training example (x, y):
        1. Forward propagation: compute output y_hat
        2. Compute loss: L(y_hat, y)
        3. Backpropagation: compute gradient of L w.r.t. each weight
        4. Update weights: w <- w - learning_rate * gradient
Stop when loss is sufficiently small or validation error stops improving
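The loop above can be sketched for the simplest possible network, a single neuron with one weight and one bias, using a sigmoid activation and the binary cross-entropy loss (all names and data here are illustrative, not a real implementation):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=1000, learning_rate=0.5):
    # Initialise weight and bias randomly (small values)
    w = random.uniform(-0.1, 0.1)
    b = random.uniform(-0.1, 0.1)
    for _ in range(epochs):
        for x, y in data:
            # 1. Forward propagation: compute output y_hat
            y_hat = sigmoid(w * x + b)
            # 2./3. For sigmoid + binary cross-entropy, the gradient of the
            #       loss w.r.t. the pre-activation simplifies to (y_hat - y)
            grad_w = (y_hat - y) * x
            grad_b = (y_hat - y)
            # 4. Update weights: step opposite to the gradient
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy data: negative inputs are class 0, positive inputs are class 1
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(data)
print(sigmoid(w * 2 + b) > 0.5)   # classifies x = 2 as class 1
```

After enough epochs the weight and bias settle so that the neuron separates the two classes; the same predict / compute error / update cycle scales up to networks with many layers.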

Loss Function

The loss function measures prediction error. Training minimises the average loss over all training examples.

| Task | Loss | Formula |
|------|------|---------|
| Binary classification | Binary cross-entropy | $-[y\log\hat{y} + (1-y)\log(1-\hat{y})]$ |
| Regression | Mean squared error (MSE) | $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ |
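Both losses are easy to evaluate directly. A quick Python illustration (function names are mine):

```python
import math

def binary_cross_entropy(y, y_hat):
    # -[y*log(y_hat) + (1 - y)*log(1 - y_hat)] for a single example
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def mse(ys, y_hats):
    # Mean squared error averaged over all n examples
    n = len(ys)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / n

# A confident correct prediction costs little; a confident wrong one costs a lot
print(round(binary_cross_entropy(1, 0.9), 3))             # → 0.105
print(round(binary_cross_entropy(1, 0.1), 3))             # → 2.303
print(round(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]), 3))    # → 0.167
```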

Gradient Descent

Weights are updated by a step opposite to the gradient (downhill on the loss surface):

$$w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}$$

Where:
- $\eta$: learning rate (hyperparameter controlling step size)
- $\frac{\partial \mathcal{L}}{\partial w}$: gradient of loss with respect to weight $w$

KEY TAKEAWAY: Gradient descent moves weights in the direction that decreases the loss. The learning rate controls how large each step is. Too large: unstable training. Too small: very slow convergence.
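The effect of the learning rate is easy to see on a toy one-dimensional loss, say $L(w) = w^2$ with gradient $2w$ (a hypothetical example, not a real network):

```python
def descend(w, learning_rate, steps=20):
    # Gradient descent on L(w) = w^2, whose gradient is 2w
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

start = 5.0
print(descend(start, 0.1))    # converges smoothly towards the minimum at w = 0
print(descend(start, 0.001))  # too small: barely moves in 20 steps
print(descend(start, 1.1))    # too large: overshoots and diverges, |w| grows
```

With a well-chosen rate each step shrinks $w$ by a constant factor; with a rate above 1 the step overshoots the minimum by more than it gained, so the loss increases every iteration.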


Backpropagation

Backpropagation efficiently computes the gradient of the loss with respect to every weight using the chain rule of calculus. Error signals are propagated backwards through the network from output to input.

Conceptually:
1. Compute loss at output
2. Propagate error backward layer by layer
3. Compute how much each weight contributed to the error
4. Update each weight accordingly

EXAM TIP: For VCAA, you do not need to derive or implement backpropagation mathematically. Know the concept: forward pass computes predictions; backward pass computes how to update weights to reduce error.
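Although the derivation is not examinable, the chain-rule idea can be made concrete on a tiny two-layer network with one unit per layer (purely illustrative names; loss here is squared error for a single example):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, y, w1, w2):
    # Forward pass: two layers, one unit each, sigmoid activations
    h = sigmoid(w1 * x)          # hidden activation
    y_hat = sigmoid(w2 * h)      # output
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: chain rule, propagating error from output to input
    dL_dyhat = y_hat - y                  # dL/dy_hat
    dyhat_dz2 = y_hat * (1 - y_hat)       # sigmoid derivative at the output
    dL_dw2 = dL_dyhat * dyhat_dz2 * h     # gradient for the output weight
    dL_dh = dL_dyhat * dyhat_dz2 * w2     # error pushed back to the hidden layer
    dh_dz1 = h * (1 - h)                  # sigmoid derivative at the hidden unit
    dL_dw1 = dL_dh * dh_dz1 * x           # gradient for the hidden weight
    return loss, dL_dw1, dL_dw2

loss, g1, g2 = forward_backward(x=1.0, y=1.0, w1=0.5, w2=0.5)
print(g1, g2)  # both negative: increasing either weight would reduce the loss
```

Each gradient is a product of local derivatives along the path from that weight to the loss, which is exactly what "propagating error backwards" means.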


Key Hyperparameters

| Hyperparameter | Effect |
|----------------|--------|
| Learning rate $\eta$ | Controls step size per update |
| Epochs | Number of full passes through the training data |
| Batch size | Number of examples used per weight update |
| Architecture | Depth and width determine model capacity |

Variants of Gradient Descent

| Method | Examples per update | Pros | Cons |
|--------|---------------------|------|------|
| Batch GD | All examples | Stable | Slow, memory-heavy |
| Stochastic GD | One example | Fast | High variance |
| Mini-batch GD | Small batch | Balances speed and stability; most common in practice | — |
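The three variants differ only in how the training data is sliced before each update. A small sketch of mini-batch slicing (names are illustrative):

```python
import random

def minibatches(data, batch_size):
    # Shuffle once per epoch, then yield consecutive slices
    data = data[:]
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(10))
# batch_size = len(data) gives batch GD; batch_size = 1 gives stochastic GD
for batch in minibatches(data, batch_size=4):
    print(len(batch))  # 4, 4, 2: the gradient is averaged over each batch
```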

Overfitting and Early Stopping

As training proceeds, training loss decreases. If validation (or test) loss begins to rise while training loss continues to fall, the model is overfitting: it is memorising the training examples rather than learning patterns that generalise.

Early stopping: Halt training when validation loss starts increasing — preserving the model at its point of best generalisation.
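Early stopping can be sketched as a wrapper around the training loop; here `step` and `val_loss` are placeholders for whatever the real training code provides, and the validation losses are simulated:

```python
def train_with_early_stopping(step, val_loss, max_epochs=100, patience=5):
    # step() runs one epoch of training; val_loss() returns validation loss
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        step()
        loss = val_loss()
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0   # would also snapshot the weights here
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # validation loss stopped improving
    return best_epoch, best_loss

# Simulated validation losses: improve until epoch 4, then overfitting sets in
losses = iter([0.9, 0.7, 0.5, 0.4, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7])
epoch, loss = train_with_early_stopping(lambda: None, lambda: next(losses))
print(epoch, loss)  # → 4 0.35: training halts shortly after the best epoch
```

The `patience` parameter avoids stopping on a single noisy epoch: training only halts after several consecutive epochs without improvement, and the model from the best epoch is the one kept.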

COMMON MISTAKE: Do not confuse the learning rate with the number of epochs. Learning rate controls how much weights change per update; epochs control how many times the dataset is processed.

VCAA FOCUS: Understand the iterative nature of training (predict, compute error, update weights). Know the role of the loss function, learning rate, and gradient descent. Understand the connection between training and overfitting, and the role of early stopping.
