Statistical inference uses data from a sample to draw conclusions about a population. It acknowledges inherent uncertainty through probability.
| Term | Definition |
|---|---|
| Population | The entire group under study |
| Sample | A subset of the population |
| Parameter | A numerical summary of the population (e.g., $\mu$, $p$) |
| Statistic | A numerical summary of the sample (e.g., $\bar{x}$, $\hat{p}$) |
| Estimator | A rule/formula for computing an estimate from data |
| Estimate | The specific value produced by applying the estimator to a sample |
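
The vocabulary in the table can be made concrete with a small sketch. The population below is simulated and entirely hypothetical (values, seed, and sample size are illustrative assumptions); the point is only to show which object plays which role.

```python
import random
import statistics

def sample_mean(sample):
    """Estimator: a rule that maps any sample to a number."""
    return sum(sample) / len(sample)

# Hypothetical population of 10,000 values (illustrative only).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

mu = statistics.fmean(population)    # parameter: summary of the population
sample = rng.sample(population, 25)  # sample: a subset of the population
x_bar = sample_mean(sample)          # estimate: the estimator applied to this sample

print(f"parameter mu = {mu:.2f}, estimate x_bar = {x_bar:.2f}")
```

Drawing a different sample would give a different estimate, while the estimator (the function) and the parameter stay the same.
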
The sampling distribution of a statistic is the probability distribution of that statistic over all possible samples of a given size $n$.
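Central Limit Theorem (CLT): For a random sample of large size $n$ from a population with mean $\mu$ and finite standard deviation $\sigma$, the sample mean $\bar{X}$ is approximately normally distributed, regardless of the shape of the population distribution: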
$$\bar{X} \sim N\!\left(\mu,\, \frac{\sigma^2}{n}\right) \quad \text{approximately, for large } n$$
The standard error of $\bar{X}$ is $\text{SE} = \dfrac{\sigma}{\sqrt{n}}$.
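
A quick simulation can illustrate the sampling distribution and the standard error formula. This is a minimal sketch under stated assumptions: the population is taken to be Exponential with rate 1 (so $\mu = 1$ and $\sigma = 1$, a clearly non-normal shape), and the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)

def draw_sample_mean(n):
    """Mean of n draws from an Exponential(rate=1) population (mu = 1, sigma = 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

n = 50
# Approximate the sampling distribution of the mean with 5,000 repeated samples.
means = [draw_sample_mean(n) for _ in range(5_000)]

theoretical_se = 1.0 / math.sqrt(n)     # sigma / sqrt(n)
empirical_se = statistics.stdev(means)  # spread of the simulated sample means

print(f"mean of sample means: {statistics.fmean(means):.3f} (mu = 1)")
print(f"empirical SE: {empirical_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

The two standard errors should agree closely, and a histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.
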
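A confidence interval (CI) is a range of plausible values for a parameter, computed from sample data by a procedure that captures the true parameter in a stated proportion of repeated samples (the confidence level).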
For a population mean with known $\sigma$:
$$\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$
| Confidence level | $z^*$ |
|---|---|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
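
The interval formula and the $z^*$ values in the table can be reproduced with Python's standard library. The sketch below is illustrative: the function name `z_confidence_interval` and the input values are assumptions, not part of the original notes.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for x_bar ± z* · sigma / sqrt(n), with sigma known."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, chosen only for illustration.
lower, upper = z_confidence_interval(x_bar=5.5, sigma=2.0, n=36, confidence=0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (4.847, 6.153)
```

Raising `confidence` to 0.99 widens the interval, because $z^*$ grows from 1.960 to 2.576 while the standard error is unchanged.
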
Interpretation (95% level): If the sampling procedure is repeated many times, approximately 95% of the CIs constructed will contain the true $\mu$. It does NOT mean there is a 95% probability that $\mu$ is in this specific interval; once computed, a given interval either contains $\mu$ or it does not.
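
The repeated-sampling interpretation can be checked by simulation. The following is a rough sketch assuming a normal population with hypothetical values $\mu = 10$, $\sigma = 3$, and $n = 40$; the seed and number of trials are arbitrary.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(7)
mu, sigma, n = 10.0, 3.0, 40          # hypothetical population values
z_star = NormalDist().inv_cdf(0.975)  # 95% confidence level
margin = z_star * sigma / math.sqrt(n)

trials = 5_000
covered = 0
for _ in range(trials):
    x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    # Each interval either contains mu or it does not; only the long-run rate is 95%.
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should land near 0.95, which is a statement about the procedure across many samples rather than about any single interval.
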
A hypothesis test assesses evidence against a null hypothesis $H_0$.
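Decision rule: Reject $H_0$ if $\text{p-value} < \alpha$, where $\alpha$ is the significance level chosen before looking at the data; otherwise, fail to reject $H_0$.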
Test statistic for $H_0: \mu = \mu_0$:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
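
The test statistic and decision rule can be combined in a few lines. This sketch assumes a two-sided alternative ($H_a: \mu \ne \mu_0$), which the notes above do not specify; the function name `z_test` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu_0 with known sigma (assumed alternative)."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability, under H0, of a |Z| at least this large.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical inputs, chosen only for illustration.
z, p_value, reject = z_test(x_bar=5.5, mu_0=5.0, sigma=2.0, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
# z = 1.50, p-value = 0.1336, reject H0: False
```

For a one-sided alternative, the p-value would instead be a single tail area, for example `1 - NormalDist().cdf(z)` for $H_a: \mu > \mu_0$.
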
KEY TAKEAWAY: Inference moves from sample data to population conclusions. A confidence interval estimates a parameter; a hypothesis test assesses whether data is consistent with a specific claim.
EXAM TIP: When interpreting a confidence interval, always refer to the specific context: “We are 95% confident that the true mean wait time is between 4.2 and 6.8 minutes.”
COMMON MISTAKE: Interpreting the p-value as the probability that $H_0$ is true. The p-value is a probability about the data given $H_0$, not about $H_0$ given the data.