Hypothesis testing is a formal statistical procedure for deciding whether a sample of data provides enough evidence to support a claim about the entire population. It follows a logic similar to a legal trial: the “null hypothesis” (innocence) is assumed true unless the sample data provide “sufficient evidence” against it. A test never proves a hypothesis; it only measures how strongly the data contradict \(H_0\).
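Every statistical test involves two competing hypotheses: the null hypothesis \(H_0\) and the alternative hypothesis \(H_1\). The form of \(H_1\) determines the type of test: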
| Test Type | Null Hypothesis (\(H_0\)) | Alternative Hypothesis (\(H_1\)) |
|---|---|---|
| Right-tailed | \(H_0: \mu = \mu_0\) | \(H_1: \mu > \mu_0\) |
| Left-tailed | \(H_0: \mu = \mu_0\) | \(H_1: \mu < \mu_0\) |
| Two-tailed | \(H_0: \mu = \mu_0\) | \(H_1: \mu \neq \mu_0\) |
COMMON MISTAKE: Students often use the sample mean \(\bar{x}\) in their hypotheses. Remember, hypotheses are always statements about the population parameter (\(\mu\)), never the sample statistic.
To test the hypothesis about a population mean \(\mu\) where the population standard deviation \(\sigma\) is known, we calculate a test statistic. This value measures how many standard errors the observed sample mean \(\bar{x}\) is away from the hypothesised mean \(\mu_0\).
For a sample size \(n\), the test statistic \(Z\) is:
\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]
This formula assumes either that the population is normally distributed or that \(n\) is large enough (a common rule of thumb is \(n \ge 30\)) for the Central Limit Theorem to apply, so that \(\bar{x}\) is approximately normally distributed.
VCAA FOCUS: Ensure you check the conditions for a \(z\)-test. You must know the population standard deviation \(\sigma\). If \(\sigma\) is unknown and the sample is large, Specialist Mathematics students typically use the sample standard deviation \(s\) as an estimate for \(\sigma\).
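As a quick check of the calculation, the test statistic can be sketched in Python. The function name `z_statistic` and the heart-rate figures (sample mean 73 bpm, \(\mu_0 = 70\), \(\sigma = 9\), \(n = 36\)) are made up for illustration and are not taken from any exam question.

```python
import math

def z_statistic(sample_mean, mu_0, sigma, n):
    """Number of standard errors between the sample mean and mu_0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if n < 1:
        raise ValueError("n must be at least 1")
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu_0) / standard_error

# Hypothetical data: sample mean 73 bpm, mu_0 = 70, sigma = 9, n = 36
z_calc = z_statistic(73, 70, 9, 36)
print(z_calc)  # standard error is 9 / 6 = 1.5, so z = 3 / 1.5 = 2.0
```

Here the sample mean lies two standard errors above the hypothesised mean.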
The \(p\)-value is the probability of observing a sample statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true.
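Let \(Z_{calc}\) be the calculated test statistic from the sample data and let \(Z\) be a standard normal variable. For a right-tailed test, \(p = \Pr(Z \ge Z_{calc})\); for a left-tailed test, \(p = \Pr(Z \le Z_{calc})\); for a two-tailed test, \(p = 2\Pr(Z \ge |Z_{calc}|)\).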
KEY TAKEAWAY: The \(p\)-value is NOT the probability that the null hypothesis is true. It is the probability of obtaining data at least as extreme as the data observed, given that the null hypothesis is true.
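The \(p\)-value for each test type can be sketched with Python's standard-library `statistics.NormalDist`. The function name `p_value` and the `tail` labels are illustrative choices, and the value \(Z_{calc} = 2\) is hypothetical.

```python
from statistics import NormalDist

def p_value(z_calc, tail):
    """p-value of a z-test; tail is 'right', 'left' or 'two'."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z_calc)             # Pr(Z >= z_calc)
    if tail == "left":
        return std_normal.cdf(z_calc)                 # Pr(Z <= z_calc)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_calc)))  # both tails beyond |z_calc|
    raise ValueError("tail must be 'right', 'left' or 'two'")

# Hypothetical test statistic
for tail in ("right", "left", "two"):
    print(tail, round(p_value(2.0, tail), 4))
```

Note that for the same positive \(Z_{calc}\), the two-tailed \(p\)-value is double the right-tailed one.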
The significance level (\(\alpha\)) is a pre-determined threshold used to decide whether the \(p\)-value is small enough to reject the null hypothesis. Common values for \(\alpha\) are \(0.05\) (5%) and \(0.01\) (1%). If \(p < \alpha\), reject \(H_0\); otherwise, do not reject \(H_0\).
| Result | Conclusion |
|---|---|
| \(p < 0.01\) | Very strong evidence against \(H_0\) |
| \(0.01 \le p < 0.05\) | Strong evidence against \(H_0\) |
| \(0.05 \le p < 0.10\) | Weak evidence against \(H_0\) |
| \(p \ge 0.10\) | Little to no evidence against \(H_0\) |
EXAM TIP: When writing your conclusion in an exam, always relate it back to the context of the question. Don’t just say “Reject \(H_0\)”; say “Reject \(H_0\). There is evidence at the 5% level to suggest that the mean heart rate of participants is higher than 70 bpm.”
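The decision step and the evidence table can be sketched as two small functions. The names `evidence_strength` and `decide`, and the \(p\)-value of 0.0228, are hypothetical; the printed sentence is only one possible wording of a conclusion in context.

```python
def evidence_strength(p):
    """Map a p-value to the evidence labels used in the table above."""
    if p < 0.01:
        return "very strong"
    if p < 0.05:
        return "strong"
    if p < 0.10:
        return "weak"
    return "little to no"

def decide(p, alpha=0.05):
    """Compare the p-value with the chosen significance level."""
    return "Reject H0" if p < alpha else "Do not reject H0"

p = 0.0228  # hypothetical p-value from a right-tailed test of H0: mu = 70
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}. There is {evidence_strength(p)} "
          f"evidence that the mean heart rate is higher than 70 bpm.")
```

The same \(p\)-value leads to rejection at the 5% level but not at the 1% level, which is why \(\alpha\) must be chosen before looking at the data.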
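The \(p\)-value is influenced by each component of the \(z\)-test calculation: the difference \(\bar{x} - \mu_0\), the population standard deviation \(\sigma\), and the sample size \(n\). Increasing the difference or \(n\), or decreasing \(\sigma\), moves the test statistic further from zero.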
STUDY HINT: Remember the inverse relationship: the further the test statistic \(Z\) lies from zero in the direction of \(H_1\) (for a two-tailed test, the larger \(|Z|\)), the smaller the \(p\)-value.
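To see this relationship numerically, the sketch below holds a hypothetical sample mean (73), \(\mu_0\) (70) and \(\sigma\) (9) fixed and varies only the sample size in a right-tailed test. The function name `right_tailed_p` is an illustrative choice.

```python
import math
from statistics import NormalDist

def right_tailed_p(sample_mean, mu_0, sigma, n):
    """Return (z, p) for a right-tailed z-test."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Same hypothetical sample mean and sigma; only n changes
for n in (9, 36, 144):
    z, p = right_tailed_p(73, 70, 9, n)
    print(f"n = {n:3d}  z = {z:.2f}  p = {p:.5f}")
```

Quadrupling \(n\) halves the standard error, which doubles \(Z\) and shrinks the \(p\)-value.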
Hypothesis testing is not infallible. Because we rely on samples, we can make two types of errors:
| Decision | \(H_0\) is True | \(H_0\) is False |
|---|---|---|
| Do not reject \(H_0\) | Correct Decision (\(1-\alpha\)) | Type II Error (\(\beta\)) |
| Reject \(H_0\) | Type I Error (\(\alpha\)) | Correct Decision (Power) (\(1-\beta\)) |
REMEMBER: To reduce the chance of a Type I error, decrease \(\alpha\) (e.g., from 0.05 to 0.01). However, this will generally increase the chance of a Type II error unless the sample size is increased.
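The trade-off can be illustrated for a right-tailed \(z\)-test, assuming a specific true mean under \(H_1\). Every number here is hypothetical (\(\mu_0 = 70\), true mean 73, \(\sigma = 9\)), and the function name `type_ii_error` is an illustrative choice. The sketch uses the fact that \(H_0\) is rejected when \(Z\) exceeds the critical value \(z_\alpha\), so \(\beta = \Pr(Z \le z_\alpha)\) when the true mean is 73.

```python
import math
from statistics import NormalDist

def type_ii_error(mu_0, mu_true, sigma, n, alpha):
    """beta for a right-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)                # reject H0 when Z > z_crit
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))     # mean of Z under the true mean
    return std_normal.cdf(z_crit - shift)                 # Pr(do not reject | mu_true)

# Hypothetical scenario: mu_0 = 70, true mean 73, sigma = 9
for alpha, n in ((0.05, 36), (0.01, 36), (0.01, 100)):
    beta = type_ii_error(70, 73, 9, n, alpha)
    print(f"alpha = {alpha}  n = {n:3d}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Lowering \(\alpha\) with \(n\) fixed raises \(\beta\), while increasing \(n\) brings \(\beta\) back down.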