Forward propagation is the process of computing a neural network’s output by passing input values through the network layer by layer, from input to output. It is used during both prediction and training.
For a network with layers $1, 2, \ldots, L$, each layer computes a weighted sum of the previous layer's activations and applies an activation function:
$$\mathbf{z}^{(l)} = W^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}, \qquad \mathbf{a}^{(l)} = \sigma\!\left(\mathbf{z}^{(l)}\right),$$
where $\mathbf{a}^{(0)} = \mathbf{x}$ is the input and $\mathbf{a}^{(L)} = \hat{y}$ is the network's output.
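This layer-by-layer recursion can be sketched in a few lines of pure Python. This is a minimal sketch assuming dense layers with sigmoid activations throughout; the `forward` and `sigmoid` names are illustrative, not standard API.

```python
import math

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Forward pass. layers is a list of (W, b) pairs, where W is a
    list of weight rows and b a list of biases for that layer."""
    a = x
    for W, b in layers:
        # z = W a + b (one weighted sum per neuron), then elementwise sigmoid
        z = [sum(w * ai for w, ai in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = [sigmoid(zi) for zi in z]
    return a

# The worked example below, expressed in this form:
layers = [([[0.5, 0.2], [-0.3, 0.8]], [0.1, -0.1]),  # hidden layer
          ([[1.2, -0.7]], [0.0])]                     # output layer
print(forward([1.0, 0.0], layers))  # single output, approximately 0.621
```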
Worked example. Network: 2 inputs, 2 hidden neurons (sigmoid), 1 output neuron (sigmoid)
$$W^{(1)} = \begin{pmatrix} 0.5 & 0.2 \\ -0.3 & 0.8 \end{pmatrix}, \quad \mathbf{b}^{(1)} = \begin{pmatrix} 0.1 \\ -0.1 \end{pmatrix}$$
$$W^{(2)} = (1.2,\; -0.7), \quad b^{(2)} = 0$$
Input: $\mathbf{x} = (1, 0)^T$
Step 1: Hidden pre-activations
$$z_1 = 0.5(1) + 0.2(0) + 0.1 = 0.6$$
$$z_2 = -0.3(1) + 0.8(0) + (-0.1) = -0.4$$
Step 2: Hidden activations (sigmoid)
$$a_1 = \sigma(0.6) = \frac{1}{1+e^{-0.6}} \approx 0.646$$
$$a_2 = \sigma(-0.4) = \frac{1}{1+e^{0.4}} \approx 0.401$$
Step 3: Output pre-activation
$$z^{(2)} = 1.2(0.646) + (-0.7)(0.401) + 0 = 0.775 - 0.281 = 0.494$$
Step 4: Output activation
$$\hat{y} = \sigma(0.494) \approx 0.621$$
Classification: Since $\hat{y} > 0.5$, predict class $+1$.
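The four steps above can be checked with a short pure-Python calculation. This mirrors the hand working exactly; the $-1$ label for the negative class is an assumption for illustration, since the notes only state the $+1$ rule.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: hidden pre-activations z = W x + b, with x = (1, 0)
z1 = 0.5 * 1 + 0.2 * 0 + 0.1       # 0.6
z2 = -0.3 * 1 + 0.8 * 0 + (-0.1)   # -0.4

# Step 2: hidden activations
a1 = sigmoid(z1)  # approximately 0.646
a2 = sigmoid(z2)  # approximately 0.401

# Step 3: output pre-activation
z_out = 1.2 * a1 + (-0.7) * a2 + 0  # approximately 0.494

# Step 4: output activation and predicted class
y_hat = sigmoid(z_out)                   # approximately 0.621
prediction = +1 if y_hat > 0.5 else -1   # negative class label of -1 assumed
print(round(y_hat, 3), prediction)
```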
KEY TAKEAWAY: Forward propagation flows strictly from input to output, computing weighted sums and activations at each layer. It is deterministic: given fixed weights and an input, the output is always the same.
Sigmoid reference values:

| $z$ | $\sigma(z)$ |
|---|---|
| $-2$ | $\approx 0.119$ |
| $-1$ | $\approx 0.269$ |
| $0$ | $0.500$ |
| $1$ | $\approx 0.731$ |
| $2$ | $\approx 0.881$ |
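The reference values in this table can be regenerated directly from the sigmoid formula; a quick sketch:

```python
import math

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2, -1, 0, 1, 2):
    print(f"sigma({z:+d}) = {sigmoid(z):.3f}")
```

Note the symmetry $\sigma(-z) = 1 - \sigma(z)$, which halves the values you need to memorise.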
Decision rules by output activation:

| Activation function | Decision rule |
|---|---|
| Sigmoid ($0$ to $1$) | Predict $+1$ if $\hat{y} > 0.5$ |
| Step ($0$ or $1$) | Predict $+1$ if $\hat{y} = 1$ |
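The two decision rules can be written as small helper functions. The $0.5$ threshold comes from the table; the $-1$ label for the negative class is an assumption for illustration.

```python
def predict_sigmoid(y_hat, threshold=0.5):
    """Decision rule for a sigmoid output in (0, 1)."""
    return +1 if y_hat > threshold else -1  # negative label of -1 assumed

def predict_step(y_hat):
    """Decision rule for a step-function output (already 0 or 1)."""
    return +1 if y_hat == 1 else -1  # negative label of -1 assumed

print(predict_sigmoid(0.621))  # the worked example's output: predicts +1
```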
EXAM TIP: Forward propagation is a common VCAA calculation question. Practice the full numerical calculation on small networks (2-3 neurons per layer). Show all intermediate steps — partial credit is awarded for correct working.
COMMON MISTAKE: Applying the activation function before the weighted sum instead of after. Correct order: first compute $z = \mathbf{w} \cdot \mathbf{a} + b$, then apply $a = \sigma(z)$.
VCAA FOCUS: Be able to evaluate the output of a small MLP given specific weights, biases, and inputs. Show the calculation at each layer. State the final predicted class.