The Chi-Square Test (χ² test) is a non-parametric statistical test used to determine whether there is a significant association between categorical variables, or whether sample data fit a hypothesized population distribution (goodness of fit).
It is widely used in hypothesis testing where data is expressed in frequencies or counts.
---
Types of Chi-Square Tests
1. Chi-Square Test for Goodness of Fit
Checks if a sample distribution fits a theoretical distribution.
2. Chi-Square Test for Independence
Checks if two categorical variables are independent in a contingency table.
3. Chi-Square Test for Homogeneity
Checks whether several populations share the same distribution of a single categorical variable.
---
Uses of Chi-Square Test
Testing hypotheses in categorical data.
Validating theoretical probability distributions.
Testing association or independence between variables (e.g., gender vs. voting preference).
Analyzing count data in market research, quality control, survey analysis, and biostatistics.
---
Formula of Chi-Square Statistic
\chi^2 = \sum \frac{(O - E)^2}{E}
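Where the sum runs over all categories (or all cells of a contingency table) and:
O = Observed frequency in a category
E = Expected frequency in that category under the null hypothesis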
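As a minimal sketch of the formula, the statistic can be computed directly from two lists of counts. The function name `chi_square_statistic` and the coin-flip numbers are assumptions made for illustration, not part of the original notes.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin flips with 58 heads and 42 tails,
# compared against a fair-coin expectation of 50 heads and 50 tails.
stat = chi_square_statistic([58, 42], [50, 50])
print(round(stat, 2))  # 2.56
```

A larger value means the observed counts sit further from the expected counts; whether it is large enough to reject the null hypothesis depends on the degrees of freedom and the chosen significance level.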
---
✅ Conditions for Applying Chi-Square Test
1. Data should be categorical (not continuous).
2. Observations must be independent.
3. Expected frequency in each cell should be at least 5, so that the chi-square approximation to the test statistic is reasonable.
4. Random sampling should be used.
5. Large sample size is preferred for reliable results.
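The expected-frequency condition is easy to check before running the test. The sketch below uses a hypothetical helper, `cells_below_threshold`, and made-up expected counts; the threshold of 5 comes from the condition listed above.

```python
def cells_below_threshold(expected, threshold=5):
    """Return (index, value) pairs for expected counts below the threshold."""
    return [(i, e) for i, e in enumerate(expected) if e < threshold]


# Hypothetical expected counts for a four-category test.
expected = [12.0, 7.5, 4.2, 6.3]
print(cells_below_threshold(expected))  # [(2, 4.2)]
```

If the list is not empty, a common remedy is to merge sparse categories with neighbouring ones before computing the statistic.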
---
Chi-Square Test for Goodness of Fit
This test determines how well the observed distribution of data fits with the expected distribution.
Example:
If you roll a die 60 times and record how often each of the numbers 1–6 appears, the test checks whether the observed counts are consistent with a fair die (i.e., each number is expected 10 times).
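The die example can be worked through in a few lines. The observed counts below are invented for illustration; the critical value 11.070 is the standard chi-square table entry for 5 degrees of freedom at the 0.05 significance level.

```python
# Hypothetical counts for faces 1 to 6 over 60 rolls.
observed = [8, 12, 9, 11, 14, 6]
n_rolls = sum(observed)  # 60

# A fair die gives each face the same expected count (10 here).
expected = [n_rolls / len(observed)] * len(observed)

# Chi-square statistic: sum of (O - E)^2 / E.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six categories give 6 - 1 = 5 degrees of freedom.
df = len(observed) - 1

# Table value for df = 5 at the 0.05 level.
CRITICAL_VALUE_DF5_ALPHA_005 = 11.070
reject_fair = stat > CRITICAL_VALUE_DF5_ALPHA_005

print(round(stat, 3))  # 4.2
print(df)              # 5
print(reject_fair)     # False
```

With these counts the statistic falls well below the critical value, so the data give no evidence against the die being fair.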
---
Chi-Square Test for Independence (Contingency Table Test)
Used to determine whether two attributes are independent.
Example:
Testing whether gender is independent of product preference using a 2x2 or larger contingency table.
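For a contingency table, the expected count in each cell under independence is (row total × column total) / grand total. The sketch below applies this to a made-up 2×2 table of gender by product preference; the function names are hypothetical, no continuity correction is applied, and 3.841 is the standard table value for 1 degree of freedom at the 0.05 level.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Chi-square statistic summed over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows are two gender groups, columns are products A and B.
table = [
    [30, 20],
    [20, 30],
]

stat = independence_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)

CRITICAL_VALUE_DF1_ALPHA_005 = 3.841
reject_independence = stat > CRITICAL_VALUE_DF1_ALPHA_005

print(expected_counts(table))  # [[25.0, 25.0], [25.0, 25.0]]
print(stat)                    # 4.0
print(df)                      # 1
print(reject_independence)     # True
```

Here the statistic exceeds the critical value, so for this invented table the hypothesis of independence would be rejected at the 0.05 level.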
---
⚠️ Misuses or Limitations of Chi-Square Test
1. Using it when expected frequencies are small, which can make the results invalid.
2. Applying it to continuous data without first grouping the values into categories.
3. Using it with non-random or biased samples.
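---
Degrees of Freedom
The degrees of freedom (df) depend on the number of categories:
For Goodness of Fit: df = k − 1, where k is the number of categories.
For Independence: df = (r − 1)(c − 1), where r and c are the numbers of rows and columns in the contingency table.
The chi-square distribution is positively skewed and approaches a normal distribution as the degrees of freedom increase.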
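As a quick illustration, the degrees of freedom for both tests can be computed with two small helpers. The function names are hypothetical; the formulas are the standard ones, k − 1 for goodness of fit with k categories and (r − 1)(c − 1) for an r × c contingency table.

```python
def df_goodness_of_fit(n_categories):
    """Degrees of freedom for a goodness-of-fit test: k - 1."""
    return n_categories - 1


def df_independence(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


print(df_goodness_of_fit(6))  # 5  (six faces of a die)
print(df_independence(2, 2))  # 1  (2x2 table)
print(df_independence(3, 4))  # 6  (3x4 table)
```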
---
Conclusion
The Chi-Square Test is a powerful tool for analyzing categorical data and checking how well theoretical models fit observed data. However, it must be applied with care, ensuring its assumptions are met to avoid incorrect conclusions.