Accuracy paradox

The accuracy paradox is the paradoxical finding that accuracy is not a good metric for predictive models when classifying in predictive analytics. This is because a simple model may have a high level of accuracy but still be too crude to be useful. For example, if category A is dominant, being found in 99% of cases, then predicting that {{em|every}} case is category A will have an accuracy of 99%, even though the model never identifies a single case of any other category. Precision and recall are better measures in such cases.
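The effect can be reproduced with a few lines of Python. This is a minimal sketch on a made-up data set (990 cases of category A and 10 of a second category, here called B, which is an assumption for illustration); the "model" simply ignores its input.

```python
# Hypothetical data set: 990 cases of category A and 10 of category B.
labels = ["A"] * 990 + ["B"] * 10

# A trivial "model" that always predicts the dominant category.
predictions = ["A"] * len(labels)

# Accuracy: fraction of all cases predicted correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall for category B: fraction of actual B cases predicted as B.
b_found = sum(p == "B" and y == "B" for p, y in zip(predictions, labels))
recall_b = b_found / labels.count("B")

print(f"accuracy     = {accuracy:.2%}")   # 99.00%
print(f"recall for B = {recall_b:.2%}")   # 0.00%
```

The accuracy is 99% although not one case of the minority category is found, which is what the recall of 0% exposes.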

The underlying issue is that there is a class imbalance between the positive class and the negative class. Prior probabilities for these classes need to be accounted for in error analysis. Precision and recall help, but precision too can be biased by unbalanced class priors in the test sets.{{cn|date=June 2024}}
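The dependence of precision on the class priors follows from Bayes' theorem. As a sketch, assume a hypothetical classifier whose true positive rate (recall) and false positive rate are fixed at 0.9 and 0.1; the function name and these rates are illustrative assumptions, not taken from a cited source.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected precision for a classifier with fixed true/false positive rates.

    prevalence is the prior probability of the positive class in the test set.
    """
    true_pos = tpr * prevalence            # P(predicted positive and actually positive)
    false_pos = fpr * (1.0 - prevalence)   # P(predicted positive and actually negative)
    return true_pos / (true_pos + false_pos)


# Assumed rates for a hypothetical classifier; recall equals TPR by definition.
TPR, FPR = 0.9, 0.1

for prevalence in (0.5, 0.1, 0.01):
    p = precision_at_prevalence(TPR, FPR, prevalence)
    print(f"prevalence = {prevalence:.2f}  recall = {TPR:.2f}  precision = {p:.3f}")
```

Under these assumptions recall stays at 0.9 for every test set, while precision falls from 0.9 on a balanced test set to about 0.083 when only 1% of the cases are positive.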

Example

Suppose a city of 1 million people has ten terrorists. A profiling system that marks suspected terrorists as "Fail" and everyone else as "Pass" results in the following confusion matrix:

{| class="wikitable" style="text-align:center;"
! {{diagonal split header|Actual class|Predicted class}}
! Fail !! Pass !! Sum
|-
! Fail
| 10 || 0 || 10
|-
! Pass
| 990 || 999000 || 999990
|-
! Sum
| 1000 || 999000 || 1000000
|}

Even though the accuracy is {{sfrac|10 + 999000|1000000}} ≈ 99.9%, 990 out of the 1000 positive predictions are incorrect. The precision of {{sfrac|10|10 + 990}} = 1% reveals the system's poor performance. Because the classes are so unbalanced, a better metric is the F1 score = {{sfrac|2 × 0.01 × 1|0.01 + 1}} ≈ 2% (the recall being {{sfrac|10|10 + 0}} = 1).
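The arithmetic can be checked directly from the four cells of the confusion matrix. The following is a minimal sketch in which "Fail" is treated as the positive class; the variable names are chosen here for illustration.

```python
# Cells of the confusion matrix, with "Fail" as the positive class.
tp, fn = 10, 0          # actual Fail: predicted Fail / predicted Pass
fp, tn = 990, 999_000   # actual Pass: predicted Fail / predicted Pass

total = tp + fn + fp + tn

accuracy = (tp + tn) / total       # share of all predictions that are correct
precision = tp / (tp + fp)         # share of "Fail" predictions that are correct
recall = tp / (tp + fn)            # share of actual "Fail" cases that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy  = {accuracy:.2%}")    # 99.90%
print(f"precision = {precision:.2%}")   # 1.00%
print(f"recall    = {recall:.2%}")      # 100.00%
print(f"F1 score  = {f1:.2%}")          # 1.98%
```

The F1 score is pulled down to roughly the level of the precision because the harmonic mean is dominated by the smaller of its two inputs.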

Literature

Kubat, M. (2000). Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. Fourteenth International Conference on Machine Learning.

