Statistical Inference: Sample Proportions - StudyPulse

Mathematical Methods
05 Apr 2025

Statistical Inference: Sample Proportions

1. Population Parameter vs. Sample Statistic

  • Population Parameter: A numerical value that describes a characteristic of the entire population. It is usually unknown. Parameters are often denoted by Greek letters (e.g., \(\mu\) for a population mean), although the population proportion is written with the Roman letter \(p\).

  • Sample Statistic: A numerical value that describes a characteristic of a sample taken from the population. It is used to estimate the population parameter and is usually denoted by a Roman letter, often with a hat (e.g., \(\hat{p}\) for sample proportion).

| Feature | Population Parameter | Sample Statistic |
|---|---|---|
| Definition | Describes population | Describes sample |
| Notation | Often Greek letters; the proportion is written \(p\) | Roman letters, often hatted (e.g., \(\hat{p}\)) |
| Known/Unknown | Usually unknown | Known once the sample is taken |
| Variability | Constant | Varies between samples |
| Purpose | True value | Estimate of population parameter |

KEY TAKEAWAY: The sample statistic is our best guess for the population parameter, but it’s inherently variable due to random sampling.

2. Simulation of Random Sampling

  • Purpose: To visualize the distribution of sample proportions (\(\hat{P}\)) and understand how confidence intervals vary from sample to sample.
  • Process:
    1. Define the population proportion (\(p\)).
    2. Choose a sample size (\(n\)).
    3. Generate many random samples of size \(n\) from the population.
    4. Calculate the sample proportion (\(\hat{p}\)) for each sample.
    5. Plot the distribution of the sample proportions (\(\hat{P}\)).
    6. Calculate confidence intervals for each sample.
  • Observations:
    • The distribution of \(\hat{P}\) becomes approximately normal as \(n\) increases.
    • The mean of the distribution of \(\hat{P}\) is close to \(p\).
    • The standard deviation of the distribution of \(\hat{P}\) decreases as \(n\) increases.
    • Confidence intervals vary in width and position from sample to sample.
    • Increasing the sample size \(n\) reduces the width of the confidence interval.

STUDY HINT: Use statistical software or online simulators to experiment with different values of \(p\) and \(n\) to observe the effects on the distribution of \(\hat{P}\) and confidence intervals.
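One way to try this without special software is a short Python script. The sketch below follows steps 1 to 5 of the process above using only the standard library. The function name `simulate_sample_proportions`, the seed, and the chosen values of \(p\) and \(n\) are illustrative assumptions, not part of the course material.

```python
import random
import statistics
from math import sqrt

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Return a list of simulated sample proportions (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same output
    proportions = []
    for _ in range(num_samples):
        # Each sampled individual has the characteristic with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p = 0.3  # assumed population proportion for the experiment
for n in (25, 100, 400):
    p_hats = simulate_sample_proportions(p, n, num_samples=2000)
    simulated_mean = statistics.mean(p_hats)
    simulated_sd = statistics.pstdev(p_hats)
    theoretical_sd = sqrt(p * (1 - p) / n)
    print(f"n={n}: mean={simulated_mean:.4f}, "
          f"sd={simulated_sd:.4f}, theory sd={theoretical_sd:.4f}")
```

In a run like this, the simulated mean should sit close to \(p\) for every \(n\), and the simulated standard deviation should shrink as \(n\) grows, matching the observations listed above.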

3. Sample Proportion as a Random Variable

  • Definition: The sample proportion, denoted by \(\hat{P}\), is the proportion of items in a sample that have a particular characteristic.
  • Formula: \(\hat{P} = \frac{X}{n}\), where:
    • \(X\) is the number of items with the characteristic in the sample. \(X\) follows a binomial distribution: \(X \sim Bin(n, p)\).
    • \(n\) is the sample size.
  • Random Variable: \(\hat{P}\) is a random variable because its value varies from sample to sample due to random sampling.
  • Relationship to Binomial Distribution: Since \(X\) follows a binomial distribution, the distribution of \(\hat{P}\) is related to the binomial distribution.

EXAM TIP: Be clear about the difference between \(X\) (number of successes) and \(\hat{P}\) (proportion of successes).
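To make the link between \(X\) and \(\hat{P}\) concrete, the sketch below lists every possible value of \(\hat{P}\) for a small sample, using \(\Pr(\hat{P} = k/n) = \Pr(X = k)\) with \(X \sim Bin(n, p)\). The function name `p_hat_distribution` and the values \(n = 10\), \(p = 0.4\) are assumptions chosen for illustration.

```python
from math import comb

def p_hat_distribution(n, p):
    """Map each possible value k/n of P-hat to Pr(P-hat = k/n) = Pr(X = k)."""
    return {
        k / n: comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

dist = p_hat_distribution(n=10, p=0.4)
for value, prob in dist.items():
    print(f"Pr(P-hat = {value:.1f}) = {prob:.4f}")

# The probabilities sum to 1 and the mean of P-hat equals p.
print("total probability:", round(sum(dist.values()), 6))
print("mean of P-hat:", round(sum(v * pr for v, pr in dist.items()), 6))
```

Note that \(X\) takes the values \(0, 1, \dots, n\) while \(\hat{P}\) takes the values \(0, \frac{1}{n}, \dots, 1\); the probabilities attached to matching values are identical.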

4. Approximate Normality of the Distribution of \(\hat{P}\)

  • Condition: For large samples, the distribution of \(\hat{P}\) is approximately normal. A common rule of thumb is that \(np \geq 5\) and \(n(1-p) \geq 5\).
  • Mean: The mean of the distribution of \(\hat{P}\) is the population proportion, \(p\).
    \[ E(\hat{P}) = p \]
  • Standard Deviation: The standard deviation of the distribution of \(\hat{P}\) is:
    \[ SD(\hat{P}) = \sqrt{\frac{p(1-p)}{n}} \]
  • Implication: When the distribution of \(\hat{P}\) is approximately normal, we can use the standard normal distribution to calculate probabilities and confidence intervals.

COMMON MISTAKE: Forgetting to check the condition for approximate normality before using the normal distribution to analyze \(\hat{P}\).
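As a rough illustration of both the check and the approximation, the sketch below tests the rule of thumb and then compares a normal-approximation probability with the exact binomial value. The helper names and the example values (\(n = 100\), \(p = 0.5\), cut-off 0.45) are assumptions made for this example; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the notes: np >= 5 and n(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold

def normal_prob_below(value, n, p):
    """Approximate Pr(P-hat <= value) using a normal with mean p, sd sqrt(p(1-p)/n)."""
    sd = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(value)

def exact_prob_below(value, n, p):
    """Exact Pr(P-hat <= value), summing binomial probabilities for X <= value * n."""
    k_max = int(value * n + 1e-9)  # floor, with a small guard against float rounding
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

n, p = 100, 0.5
print("condition met:", normal_approx_ok(n, p))
print("normal approximation:", round(normal_prob_below(0.45, n, p), 4))
print("exact binomial:", round(exact_prob_below(0.45, n, p), 4))
```

The two probabilities should be close but not identical, because a continuous curve is standing in for a discrete distribution. When the condition fails (for example \(n = 20\), \(p = 0.1\)), the approximation is not reliable.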

5. Confidence Intervals for a Population Proportion

  • Definition: A confidence interval is a range of values, calculated from sample data, that is used to estimate the true population proportion, \(p\), at a stated level of confidence.
  • Formula: The approximate confidence interval for a population proportion is:
    \[ \left(\hat{p}-z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \]
    where:
    • \(\hat{p}\) is the sample proportion.
    • \(n\) is the sample size.
    • \(z\) is the z-score corresponding to the desired level of confidence (quantile for the standard normal distribution).
  • Standard Error: The term \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) is sometimes referred to as the standard error of the sample proportion.
  • 95% Confidence Interval: For a 95% confidence interval, \(z \approx 1.96\). The interval is:
    \[ \left(\hat{p}-1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \]
  • Interpretation: We are 95% confident that the true population proportion, \(p\), lies within the calculated interval. This does not mean that there is a 95% chance that \(p\) is in this particular interval: \(p\) is a fixed value, so once the interval is calculated it either contains \(p\) or it does not. Rather, if we were to repeat the sampling process many times, approximately 95% of the intervals constructed would contain the true population proportion.

VCAA FOCUS: VCAA often requires you to interpret the meaning of a confidence interval in context.
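The repeated-sampling interpretation can be checked with a small simulation. The sketch below is one possible way to do it, assuming a known \(p\) so that coverage can be counted; the function names `proportion_ci` and `coverage`, the seed, and the values \(p = 0.4\), \(n = 200\) are illustrative choices rather than anything prescribed by the study design.

```python
import random
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Approximate confidence interval for p using the formula above."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(p, n, num_samples, confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        lower, upper = proportion_ci(successes / n, n, confidence)
        if lower <= p <= upper:
            hits += 1
    return hits / num_samples

print("proportion of intervals containing p:",
      coverage(p=0.4, n=200, num_samples=2000))
```

The printed fraction should be near 0.95. Each individual interval either contains \(p\) or misses it; the 95% describes the long-run behaviour of the method.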

6. Factors Affecting Confidence Interval Width

  • Sample Size (n): As the sample size increases, the width of the confidence interval decreases. Larger samples provide more information about the population.
  • Confidence Level: As the confidence level increases (e.g., from 95% to 99%), the width of the confidence interval increases. A higher confidence level requires a wider interval to capture the true population proportion with greater certainty.
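  • Sample Proportion (\(\hat{p}\)): The width of the interval is also affected by \(\hat{p}\). For a fixed \(n\) and \(z\), the product \(\hat{p}(1-\hat{p})\) is largest when \(\hat{p} = 0.5\), so the standard error is largest there. The further \(\hat{p}\) is from 0.5, the smaller the standard error becomes, and hence the narrower the confidence interval.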

REMEMBER: Larger sample size = narrower interval; Higher confidence level = wider interval.
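These three effects can be seen numerically from the width \(2z\sqrt{\hat{p}(1-\hat{p})/n}\), which is the upper endpoint minus the lower endpoint of the interval formula. The sketch below is a minimal illustration; `ci_width` is a hypothetical helper and the input values are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(p_hat, n, confidence=0.95):
    """Width of the approximate confidence interval: 2 * z * standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)

# Effect of sample size (p-hat = 0.5, 95% confidence).
for n in (100, 400, 1600):
    print(f"n={n}: width={ci_width(0.5, n):.4f}")

# Effect of confidence level (p-hat = 0.5, n = 100).
for level in (0.90, 0.95, 0.99):
    print(f"confidence={level}: width={ci_width(0.5, 100, level):.4f}")

# Effect of the sample proportion (n = 100, 95% confidence).
for p_hat in (0.5, 0.7, 0.9):
    print(f"p_hat={p_hat}: width={ci_width(p_hat, 100):.4f}")
```

Because \(n\) sits under a square root, quadrupling the sample size halves the width rather than quartering it.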

7. Example

Suppose a survey of 500 randomly selected voters finds that 55% support a particular candidate. Calculate a 95% confidence interval for the proportion of all voters who support the candidate.

  • \(\hat{p} = 0.55\)
  • \(n = 500\)
  • \(z = 1.96\)

The 95% confidence interval is:

\[ \left(0.55 - 1.96 \sqrt{\frac{0.55(1-0.55)}{500}}, 0.55 + 1.96 \sqrt{\frac{0.55(1-0.55)}{500}}\right) \]
\[ \left(0.55 - 1.96 \sqrt{\frac{0.2475}{500}}, 0.55 + 1.96 \sqrt{\frac{0.2475}{500}}\right) \]
\[ \left(0.55 - 1.96(0.02225), 0.55 + 1.96(0.02225)\right) \]
\[ (0.5064, 0.5936) \]

Therefore, we are 95% confident that the true proportion of all voters who support the candidate is between 50.64% and 59.36%.
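The arithmetic can be checked step by step with a few lines of Python. This is only a verification sketch using the values from the example; the variable names are arbitrary.

```python
from math import sqrt

p_hat = 0.55  # sample proportion from the survey
n = 500       # sample size
z = 1.96      # z-value for 95% confidence, as used in the notes

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"standard error = {standard_error:.5f}")
print(f"margin of error = {margin:.4f}")
print(f"interval = ({lower:.4f}, {upper:.4f})")  # (0.5064, 0.5936)
```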

APPLICATION: Confidence intervals are widely used in surveys, opinion polls, and scientific research to estimate population parameters.
