Multi-Layer Perceptron Structure

Structure of Multi-Layer Perceptron (MLP) Neural Networks

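A multi-layer perceptron (MLP) is a feedforward neural network with at least three layers: an input layer, one or more hidden layers, and an output layer. Information flows in one direction only, from the input layer through the hidden layer(s) to the output layer, with no feedback connections.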

Layers

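Layer	Role	Number of neurons
Input layer	Receives the feature values; performs no computation	One per input feature
Hidden layer(s)	Transform the inputs into intermediate representations	Chosen by the designer (a hyperparameter)
Output layer	Produces the final prediction	One per class (multi-class classification); often one for binary classification; one per target (regression)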

Fully connected (dense): Every neuron in each layer connects to every neuron in the next layer.

Notation

For a network with $d$ inputs, $h$ hidden neurons (one hidden layer), $k$ outputs:

$W^{(1)}$: weight matrix, shape $h \times d$ (input-to-hidden weights)
$W^{(2)}$: weight matrix, shape $k \times h$ (hidden-to-output weights)
$\mathbf{b}^{(1)}$: bias vector, length $h$
$\mathbf{b}^{(2)}$: bias vector, length $k$
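One way to sanity-check these shapes is to build zero-filled arrays and push an input through them. This is a minimal sketch assuming NumPy is installed; the sizes $d = 3$, $h = 4$, $k = 2$ and the helper name `count_parameters` are hypothetical choices for illustration.

```python
import numpy as np

def count_parameters(d: int, h: int, k: int) -> int:
    """Trainable parameters in a one-hidden-layer MLP (hypothetical helper)."""
    # Weights: h*d (input-to-hidden) + k*h (hidden-to-output); biases: h + k.
    return h * d + h + k * h + k

d, h, k = 3, 4, 2  # hypothetical sizes for illustration

W1 = np.zeros((h, d))  # W^(1): input-to-hidden weights, shape h x d
b1 = np.zeros(h)       # b^(1): hidden biases, length h
W2 = np.zeros((k, h))  # W^(2): hidden-to-output weights, shape k x h
b2 = np.zeros(k)       # b^(2): output biases, length k

x = np.ones(d)              # a single input vector of length d
hidden = W1 @ x + b1        # length h
output = W2 @ hidden + b2   # length k

print(W1.shape, W2.shape, hidden.shape, output.shape)  # (4, 3) (2, 4) (4,) (2,)
print(count_parameters(d, h, k))                       # 4*3 + 4 + 2*4 + 2 = 26
```

The shape $h \times d$ for $W^{(1)}$ is what makes $W^{(1)}\mathbf{x}$ valid: the number of columns must match the length of $\mathbf{x}$.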

Diagram

Input layer      Hidden layer     Output layer
  x_1  -----\
              \--> h_1 --\
  x_2  ---------> h_2 ----> output y_1
              \--> h_3 --/
  x_3  -----/

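Each arrow represents a weighted connection. The sketch is simplified: in a fully connected network every input connects to every hidden neuron (here 3 × 3 = 9 input-to-hidden weights), and every hidden neuron connects to the output (3 more weights).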

Computing a Single Hidden Neuron

For hidden neuron $j$ in the first hidden layer, with activation function $\sigma$ (for example the sigmoid, $\sigma(z) = 1/(1 + e^{-z})$):
$$z_j^{(1)} = \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)}$$
$$a_j^{(1)} = \sigma\bigl(z_j^{(1)}\bigr)$$
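The summation can be written out literally as a loop over the inputs. This is a minimal sketch assuming the logistic sigmoid as $\sigma$; the function names `sigmoid` and `hidden_neuron` are hypothetical. The numbers are hidden neuron 1 from the example network in this section.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_neuron(weights_j: list[float], x: list[float], b_j: float) -> tuple[float, float]:
    """Return (z_j, a_j) for one hidden neuron using the summation formula."""
    # z_j = sum over i of w_ji * x_i, plus the bias b_j
    z_j = sum(w * xi for w, xi in zip(weights_j, x)) + b_j
    a_j = sigmoid(z_j)  # apply the activation function
    return z_j, a_j

# Hidden neuron 1 of the example: weights (1, 2), bias 0, input (1, 1).
z, a = hidden_neuron([1.0, 2.0], [1.0, 1.0], 0.0)
print(z, round(a, 3))  # 3.0 0.953
```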

In matrix form for the entire hidden layer:
$$\mathbf{z}^{(1)} = W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}, \qquad \mathbf{a}^{(1)} = \sigma\bigl(\mathbf{z}^{(1)}\bigr)$$
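In NumPy the whole layer is a single matrix-vector product. The sketch below, assuming NumPy and the logistic sigmoid, checks that the matrix form gives the same pre-activations as applying the summation formula one neuron at a time. The random weights, the layer sizes and the name `layer_forward` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, x):
    """Matrix form for a whole layer: z = W x + b, a = sigma(z)."""
    z = W @ x + b
    return z, sigmoid(z)

rng = np.random.default_rng(0)
d, h = 3, 4                    # hypothetical layer sizes
W1 = rng.normal(size=(h, d))   # shape h x d
b1 = rng.normal(size=h)
x = rng.normal(size=d)

z_matrix, a_matrix = layer_forward(W1, b1, x)

# Summation form, one hidden neuron j at a time.
z_loop = np.array([sum(W1[j, i] * x[i] for i in range(d)) + b1[j] for j in range(h)])

print(np.allclose(z_matrix, z_loop))  # True
```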

Example: 2 inputs, 2 hidden neurons, 1 output

Weights:
$$W^{(1)} = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}, \quad \mathbf{b}^{(1)} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad W^{(2)} = (1,\; -1), \quad b^{(2)} = 0$$

Input: $\mathbf{x} = (1, 1)$

Hidden pre-activations:
$$z_1^{(1)} = 1 \cdot 1 + 2 \cdot 1 + 0 = 3, \quad z_2^{(1)} = (-1) \cdot 1 + 0 \cdot 1 + 1 = 0$$

Hidden activations (sigmoid):
$$a_1^{(1)} = \sigma(3) \approx 0.953, \quad a_2^{(1)} = \sigma(0) = 0.5$$

Output:
$$z^{(2)} = 1 \cdot 0.953 + (-1) \cdot 0.5 + 0 = 0.453$$
$$\hat{y} = \sigma(0.453) \approx 0.611$$
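The same forward pass can be reproduced in code to check a hand calculation. This sketch assumes NumPy and a sigmoid at every layer, as in the example; the helper `forward`, which loops over a list of hypothetical `(W, b)` pairs, is an illustrative name and works for any number of layers.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Apply z = W a + b, then a = sigma(z), one layer at a time."""
    a = x
    for W, b in layers:
        z = W @ a + b
        print("z =", np.round(z, 3))  # show the working for each layer
        a = sigmoid(z)
    return a

W1 = np.array([[1.0, 2.0], [-1.0, 0.0]])
b1 = np.array([0.0, 1.0])
W2 = np.array([[1.0, -1.0]])  # 1 x 2 row vector
b2 = np.array([0.0])

y_hat = forward(np.array([1.0, 1.0]), [(W1, b1), (W2, b2)])
print(np.round(y_hat, 3))  # [0.611]
```

Adding another hidden layer only means appending one more `(W, b)` pair to the list, provided each `W` has as many columns as the previous layer has neurons.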

KEY TAKEAWAY: An MLP processes information by sequentially applying weighted sums and activation functions layer by layer. Each hidden layer learns a higher-level representation of the input.

Role of Activation Functions

Without non-linear activations, any MLP collapses to a single affine (linear plus bias) transformation, regardless of how many layers it has. Non-linear activations such as sigmoid and ReLU are what give deep networks their expressive power.
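To see the collapse concretely, stack two layers with no activation between them: $W^{(2)}(W^{(1)}\mathbf{x} + \mathbf{b}^{(1)}) + \mathbf{b}^{(2)} = (W^{(2)}W^{(1)})\mathbf{x} + (W^{(2)}\mathbf{b}^{(1)} + \mathbf{b}^{(2)})$, which is one layer with weight matrix $W^{(2)}W^{(1)}$ and bias $W^{(2)}\mathbf{b}^{(1)} + \mathbf{b}^{(2)}$. The sketch below checks this numerically with NumPy; the random weights and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, k = 3, 5, 2  # hypothetical sizes
W1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
W2, b2 = rng.normal(size=(k, h)), rng.normal(size=k)
x = rng.normal(size=d)

# Two layers with NO activation function between them.
two_layer = W2 @ (W1 @ x + b1) + b2

# The equivalent single affine layer.
W_combined = W2 @ W1       # shape k x d
b_combined = W2 @ b1 + b2  # length k
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layer, one_layer))  # True
```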

EXAM TIP: VCAA will ask you to compute forward propagation on a small MLP. Practice the full calculation: compute $z$ for each neuron, apply $\sigma$, pass results to the next layer. Show all working.

VCAA FOCUS: Describe the layered structure (input, hidden, output). Explain the role of weights and biases. Know how to perform forward propagation through a small network.
