Statistical Inference: Sample Proportions

1. Population Parameter vs. Sample Statistic

Population Parameter: A numerical value that describes a characteristic of the entire population. It is usually unknown. Parameters are often denoted by Greek letters (e.g., $\mu$ for a population mean), although the population proportion is conventionally written $p$.
Sample Statistic: A numerical value that describes a characteristic of a sample taken from the population. It is used to estimate the population parameter. The sample proportion is written $\hat{p}$, where the hat marks an estimate of $p$.

Feature	Population Parameter	Sample Statistic
Definition	Describes population	Describes sample
Notation	Often Greek letters (e.g., $\mu$); $p$ for a proportion	Often a hat or Roman letter (e.g., $\hat{p}$)
Known/Unknown	Usually Unknown	Known
Variability	Constant	Varies between samples
Purpose	True value	Estimate of population parameter

KEY TAKEAWAY: The sample statistic is our best guess for the population parameter, but it’s inherently variable due to random sampling.

2. Simulation of Random Sampling

Purpose: To visualize the distribution of sample proportions ($\hat{P}$) and understand how confidence intervals vary from sample to sample.
Process:
1. Define the population proportion ($p$).
2. Choose a sample size ($n$).
3. Generate many random samples of size $n$ from the population.
4. Calculate the sample proportion ($\hat{p}$) for each sample.
5. Plot the distribution of the sample proportions ($\hat{P}$).
6. Calculate confidence intervals for each sample.
Observations:
- The distribution of $\hat{P}$ becomes approximately normal as $n$ increases.
- The mean of the distribution of $\hat{P}$ is close to $p$.
- The standard deviation of the distribution of $\hat{P}$ decreases as $n$ increases.
- Confidence intervals vary in width and position from sample to sample.
- Increasing the sample size $n$ reduces the width of the confidence interval.

STUDY HINT: Use statistical software or online simulators to experiment with different values of $p$ and $n$ to observe the effects on the distribution of $\hat{P}$ and confidence intervals.
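
The simulation steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function name `simulate_sample_proportions`, the value $p = 0.3$, the sample sizes, and the fixed seed are all assumptions chosen for the example.

```python
import random
from statistics import mean, pstdev


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Return a list of sample proportions from repeated random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each of the n draws is a "success" with probability p.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats


p = 0.3  # assumed population proportion, for illustration only
for n in (25, 100, 400):
    p_hats = simulate_sample_proportions(p, n, num_samples=2000)
    # The mean should sit near p; the spread should shrink as n grows.
    print(n, round(mean(p_hats), 4), round(pstdev(p_hats), 4))
```

Running it with other values of `p` and `n` shows the same pattern described in the observations: the centre stays near $p$ while the spread narrows for larger samples.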

3. Sample Proportion as a Random Variable

Definition: The sample proportion, denoted by $\hat{P}$, is the proportion of items in a sample that have a particular characteristic.
Formula: $\hat{P} = \frac{X}{n}$, where:
- $X$ is the number of items with the characteristic in the sample. $X$ follows a binomial distribution: $X \sim Bin(n, p)$.
- $n$ is the sample size.
Random Variable: $\hat{P}$ is a random variable because its value varies from sample to sample due to random sampling.
Relationship to Binomial Distribution: Since $X$ follows a binomial distribution, the distribution of $\hat{P}$ is related to the binomial distribution.

EXAM TIP: Be clear about the difference between $X$ (number of successes) and $\hat{P}$ (proportion of successes).
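
To make the link between $X$ and $\hat{P}$ concrete, the sketch below builds the exact distribution of $\hat{P}$ from the binomial probabilities of $X$. The helper names `binomial_pmf` and `p_hat_distribution`, and the values $n = 10$, $p = 0.4$, are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def p_hat_distribution(n, p):
    """Exact distribution of P-hat = X/n as (value, probability) pairs."""
    return [(x / n, binomial_pmf(x, n, p)) for x in range(n + 1)]


n, p = 10, 0.4  # illustrative values only
dist = p_hat_distribution(n, p)

# The event P-hat = 0.3 is the same event as X = 3.
print(dist[3])

# The expected value of P-hat, summed over its distribution, works out to p.
print(sum(value * prob for value, prob in dist))
```

Each possible value of $\hat{P}$ is just a value of $X$ divided by $n$, so the probabilities carry over unchanged; only the scale of the horizontal axis differs.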

4. Approximate Normality of the Distribution of $\hat{P}$

Condition: For large samples, the distribution of $\hat{P}$ is approximately normal. A common rule of thumb is that $np \geq 5$ and $n(1-p) \geq 5$.
Mean: The mean of the distribution of $\hat{P}$ is the population proportion, $p$.
$$E(\hat{P}) = p$$
Standard Deviation: The standard deviation of the distribution of $\hat{P}$ is:
$$SD(\hat{P}) = \sqrt{\frac{p(1-p)}{n}}$$
Implication: When the distribution of $\hat{P}$ is approximately normal, we can use the standard normal distribution to calculate probabilities and confidence intervals.

COMMON MISTAKE: Forgetting to check the condition for approximate normality before using the normal distribution to analyze $\hat{P}$.
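
One way to build the condition check into a calculation is sketched below. It uses `statistics.NormalDist` from the Python standard library; the function names `normal_approx_ok` and `prob_p_hat_below`, the default threshold of 5 (taken from the rule of thumb above), and the example values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: np and n(1 - p) should both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


def prob_p_hat_below(value, n, p):
    """Approximate P(P-hat < value) using a normal with mean p and SD sqrt(p(1-p)/n)."""
    if not normal_approx_ok(n, p):
        raise ValueError("np and n(1-p) should both be at least 5")
    sd = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(value)


print(normal_approx_ok(100, 0.5))  # np = 50 and n(1-p) = 50
print(normal_approx_ok(20, 0.1))   # np = 2, so the condition fails
print(prob_p_hat_below(0.55, 100, 0.5))
```

In the last call the standard deviation is $\sqrt{0.5 \times 0.5 / 100} = 0.05$, so $0.55$ sits one standard deviation above the mean.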

5. Confidence Intervals for a Population Proportion

Definition: A confidence interval is a range of values that is likely to contain the true population proportion, $p$.
Formula: The approximate confidence interval for a population proportion is:
$$ \left(\hat{p}-z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) $$
where:
- $\hat{p}$ is the sample proportion.
- $n$ is the sample size.
- $z$ is the quantile of the standard normal distribution for the desired level of confidence, chosen so that $\Pr(-z < Z < z)$ equals that level.
Standard Error: The term $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is sometimes referred to as the standard error of the sample proportion.
95% Confidence Interval: For a 95% confidence interval, $z \approx 1.96$. The interval is:
$$ \left(\hat{p}-1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) $$
Interpretation: We are 95% confident that the true population proportion, $p$, lies within the calculated interval. This does not mean that there is a 95% chance that $p$ is in this particular interval, because $p$ is fixed and only the interval varies from sample to sample. Rather, if we were to repeat the sampling process many times, about 95% of the intervals constructed would contain the true population proportion.

VCAA FOCUS: VCAA often requires you to interpret the meaning of a confidence interval in context.
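
The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a population with a known $p$, builds the approximate interval for each, and reports the fraction of intervals that contain $p$. The names `proportion_ci` and `coverage`, the values $p = 0.4$ and $n = 200$, and the fixed seed are assumptions for illustration; the result will be near, but not exactly, 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Approximate confidence interval for a population proportion."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 when level = 0.95
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p, n, num_samples, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)
        lower, upper = proportion_ci(x / n, n, level)
        if lower <= p <= upper:
            hits += 1
    return hits / num_samples


print(coverage(p=0.4, n=200, num_samples=2000))
```

Each individual interval either contains $p$ or it does not; the 95% describes the long-run behaviour of the procedure, which is what the printed fraction estimates.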

6. Factors Affecting Confidence Interval Width

Sample Size (n): As the sample size increases, the width of the confidence interval decreases. Larger samples provide more information about the population.
Confidence Level: As the confidence level increases (e.g., from 95% to 99%), the width of the confidence interval increases. A higher confidence level requires a wider interval to capture the true population proportion with greater certainty.
Sample Proportion (p-hat): The width of the interval is also affected by $\hat{p}$. The further $\hat{p}$ is from 0.5, the smaller the standard error becomes, and hence the narrower the confidence interval.

REMEMBER: Larger sample size = narrower interval; Higher confidence level = wider interval.
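
The three effects can be seen numerically from the width of the approximate interval, $2z\sqrt{\hat{p}(1-\hat{p})/n}$. The sketch below is illustrative only: the function name `ci_width` and the chosen values are assumptions, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(p_hat, n, level=0.95):
    """Width of the approximate confidence interval: 2 * z * standard error."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


# Sample size: quadrupling n halves the width.
print(ci_width(0.5, 100), ci_width(0.5, 400))

# Confidence level: 99% is wider than 95% for the same data.
print(ci_width(0.5, 100, level=0.95), ci_width(0.5, 100, level=0.99))

# Sample proportion: the width is largest at 0.5 and shrinks away from it.
print(ci_width(0.5, 100), ci_width(0.2, 100))
```

Because $n$ sits under a square root, halving the width requires four times the sample size, not double.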

7. Example

Suppose a survey of 500 randomly selected voters finds that 55% support a particular candidate. Calculate a 95% confidence interval for the proportion of all voters who support the candidate.

$\hat{p} = 0.55$
$n = 500$
$z = 1.96$

The 95% confidence interval is:

\[ \left(0.55 - 1.96 \sqrt{\frac{0.55(1-0.55)}{500}}, 0.55 + 1.96 \sqrt{\frac{0.55(1-0.55)}{500}}\right) \]

\[ \left(0.55 - 1.96 \sqrt{\frac{0.2475}{500}}, 0.55 + 1.96 \sqrt{\frac{0.2475}{500}}\right) \]

\[ \left(0.55 - 1.96(0.02225), 0.55 + 1.96(0.02225)\right) \]

\[ (0.5064, 0.5936) \]

Therefore, we are 95% confident that the true proportion of all voters who support the candidate is between 50.64% and 59.36%.

APPLICATION: Confidence intervals are widely used in surveys, opinion polls, and scientific research to estimate population parameters.
