Unlike traditional algorithms that follow explicit rules, data-driven (machine learning) algorithms learn their behaviour from examples. The process of learning from data is called training.
| Term | Definition |
|---|---|
| Model | A mathematical function $f_\theta: X \rightarrow Y$ with adjustable parameters $\theta$ |
| Features | Measurable input properties (e.g., pixel values, word frequencies) |
| Labels | Target output categories or values |
| Training data | Labelled examples used to fit the model |
| Loss function | Measures the error between predictions and true labels |
| Training | Adjusting $\theta$ to minimise loss on training data |
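The terms above can be made concrete with a toy example. A minimal sketch, assuming a one-dimensional linear model $f_\theta(x) = wx + b$ with parameters $\theta = (w, b)$ and a squared-error loss (the specific numbers are illustrative, not from the source):

```python
# Toy model: f_theta(x) = w * x + b, with parameters theta = (w, b).
def predict(w, b, x):
    return w * x + b

# Squared-error loss for a single example: measures prediction error.
def loss(y_hat, y):
    return (y_hat - y) ** 2

# One labelled training example: feature x = 2.0, label y = 5.0.
y_hat = predict(w=1.0, b=0.5, x=2.0)   # prediction: 1.0 * 2.0 + 0.5 = 2.5
print(loss(y_hat, 5.0))                # loss before training: (2.5 - 5)^2 = 6.25
```

Training would then mean adjusting `w` and `b` to shrink this loss across all training examples.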
1. Collect and label training data $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$
2. Choose a model architecture
3. Initialise parameters (randomly or by rule)
4. Repeat until convergence:
   a. Feed input $x$ through the model to obtain a prediction $\hat{y}$
   b. Compute the loss $L(\hat{y}, y)$
   c. Adjust parameters to reduce $L$ (e.g. by gradient descent)
5. Evaluate on a separate test dataset
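The steps above can be sketched end-to-end for a simple linear model fitted by gradient descent. The data, learning rate, and epoch count are illustrative choices, not prescribed by the source:

```python
# Minimal sketch of steps 1-4: fit y = w*x + b by gradient descent.

# Step 1: labelled training data, generated from the true line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Steps 2-3: choose the model y_hat = w*x + b and initialise parameters.
w, b = 0.0, 0.0
lr = 0.05  # learning rate (an illustrative hyperparameter)

# Step 4: repeat — predict, measure error, adjust parameters.
for epoch in range(500):
    grad_w = grad_b = 0.0
    for x, y in data:
        y_hat = w * x + b          # 4a: forward pass
        error = y_hat - y          # from the loss L = (y_hat - y)^2
        grad_w += 2 * error * x    # dL/dw, accumulated over examples
        grad_b += 2 * error        # dL/db
    w -= lr * grad_w / len(data)   # 4c: gradient descent step
    b -= lr * grad_b / len(data)

print(round(w, 2), round(b, 2))    # parameters approach w = 2, b = 1
```

Each pass over the data nudges $w$ and $b$ in the direction that reduces the average loss, which is exactly the "predict, measure error, adjust, repeat" cycle described above.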
KEY TAKEAWAY: Training is iterative optimisation — repeatedly adjust parameters to minimise prediction error on training data. The model learns the mapping from features to labels.
| Type | Training Data | Task | Example |
|---|---|---|---|
| Supervised | Labelled $(x, y)$ pairs | Predict output from input | Image classification, spam detection |
| Unsupervised | Unlabelled $x$ only | Find structure | Clustering, dimensionality reduction |
| Reinforcement | Rewards/penalties | Learn optimal actions | Game playing, robotics |
VCE Algorithmics focuses on supervised learning (support vector machines and neural networks).
| Set | Purpose |
|---|---|
| Training set | Fit model parameters |
| Validation set | Tune hyperparameters |
| Test set | Final unbiased evaluation |
A critical rule: never evaluate a model on data it was trained on. Performance measured on the training set is overly optimistic, because the model may simply have memorised those examples; only the held-out test set gives an honest estimate.
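A hold-out split like the one in the table can be sketched as follows. The 70/15/15 proportions and the synthetic data are illustrative assumptions, not requirements from the source:

```python
import random

# Partition labelled examples into disjoint train/validation/test sets.
examples = [(x, 2 * x) for x in range(100)]  # hypothetical labelled data

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(examples)  # shuffle so each set is a random sample

n = len(examples)
train = examples[: int(0.70 * n)]                # fit model parameters
val = examples[int(0.70 * n): int(0.85 * n)]     # tune hyperparameters
test = examples[int(0.85 * n):]                  # final evaluation only

print(len(train), len(val), len(test))           # 70 15 15
```

Because the three slices never overlap, the test set stays unseen during training, which is what makes the final evaluation unbiased.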
| Task | Common Loss |
|---|---|
| Binary classification | Binary cross-entropy |
| Multi-class | Categorical cross-entropy |
| Regression | Mean squared error (MSE): $\frac{1}{n}\sum(y_i - \hat{y}_i)^2$ |
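The two most common losses from the table can be computed directly. A minimal sketch (the sample values are illustrative):

```python
import math

# Mean squared error for regression: (1/n) * sum((y_i - y_hat_i)^2).
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n

# Binary cross-entropy for classification: penalises confident wrong
# predictions far more heavily than a squared error would.
def binary_cross_entropy(y_true, y_pred):
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))     # (0 + 0.25 + 1) / 3 ≈ 0.417
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # low loss: both mostly correct
```

Note that cross-entropy expects predicted probabilities in $(0, 1)$, whereas MSE compares raw numeric values.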
EXAM TIP: Know the vocabulary: features, labels, training data, test data, loss function, parameters. Understand the conceptual cycle: predict, measure error, adjust weights, repeat.
COMMON MISTAKE: Training data and test data must be kept strictly separate. A model evaluated on its own training data will appear far more accurate than it truly is (overfitting).
VCAA FOCUS: Explain at a high level how data-driven algorithms learn: collect labelled data, define a model, use optimisation to fit the model to the data, evaluate on unseen data.