Spectrum bias

In biostatistics, spectrum bias refers to the phenomenon that the performance of a diagnostic test may vary between clinical settings because each setting has a different mix of patients.{{cite journal |vauthors=Ransohoff DF, Feinstein AR |title=Problems of spectrum and bias in evaluating the efficacy of diagnostic tests |journal=N. Engl. J. Med. |volume=299 |issue=17 |pages=926–30 |year=1978 |pmid=692598 |doi=10.1056/NEJM197810262991705}} Because performance may depend on the mix of patients, performance at one clinic may not predict performance at another.{{cite journal |author1=Willis BH |title=Spectrum bias{{snd}}why clinicians need to be cautious when applying diagnostic test studies |journal=Family Practice |year=2008 |volume=25|pmid= 18765409 |doi=10.1093/fampra/cmn051|issue= 5 |pages= 390–96|doi-access=free }} These differences are interpreted as a kind of bias. Mathematically, spectrum bias is a sampling bias and not a traditional statistical bias; this has led some authors to refer to the phenomenon as spectrum effects,{{cite journal |vauthors=Mulherin SA, Miller WC |title=Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation |journal=Ann. Intern. Med. |volume=137 |issue=7 |pages=598–602 |year=2002 |pmid=12353947 |doi= 10.7326/0003-4819-137-7-200210010-00011|s2cid=35752032 | url = http://www.annals.org/content/137/7/598.full.pdf}} whilst others maintain that it is a bias if the true performance of the test differs from that which is 'expected'. The performance of a diagnostic test is usually measured by its sensitivity and specificity, and it is changes in these two measures that are usually meant by spectrum bias. However, other performance measures, such as the likelihood ratios, may also be affected by spectrum bias.
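As a minimal sketch of how these measures are obtained, the following Python snippet derives sensitivity, specificity and the two likelihood ratios from a 2×2 table of test results against true disease status. The counts for the two clinics are hypothetical, chosen only to show that one test can yield different figures in different settings; the function name `diagnostic_performance` is likewise illustrative.

```python
def diagnostic_performance(tp, fn, fp, tn):
    """Return sensitivity, specificity and likelihood ratios from 2x2 counts.

    tp/fn: diseased patients testing positive/negative
    fp/tn: disease-free patients testing positive/negative
    Note: the ratios are undefined when specificity is exactly 0 or 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for the same test used in two different settings.
clinic_a = diagnostic_performance(tp=90, fn=10, fp=20, tn=180)
clinic_b = diagnostic_performance(tp=60, fn=40, fp=10, tn=190)

for name, result in (("Clinic A", clinic_a), ("Clinic B", clinic_b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these assumed counts, the second clinic shows lower sensitivity but higher specificity than the first, so all four measures differ between the settings.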

Spectrum bias is generally considered to have three causes. The first is a change in the case-mix of patients with the target disorder (diseased), which affects the sensitivity. The second is a change in the case-mix of patients without the target disorder (disease-free), which affects the specificity. The third is a change in the prevalence, which affects both the sensitivity and the specificity.{{cite journal |last=Willis |first=BH |title=Evidence that disease prevalence may affect the performance of diagnostic tests with an implicit threshold: a cross sectional study |journal=BMJ Open |year=2012 |volume=2 |issue=1 |pages=e000746 |pmid=22307105 |url= |doi=10.1136/bmjopen-2011-000746 |pmc=3274715}} {{open access}} This final cause is not widely appreciated, but there is mounting empirical evidence{{cite journal |vauthors=Leeflang MM, Bossuyt PM, Irwig L |title=Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis |journal=J Clin Epidemiol |volume=62 |issue=1 |pages=5–12 |year=2009}} as well as theoretical arguments{{cite journal |vauthors=Goehring C, Perrier A, Morabia A |title=Spectrum bias: a quantitative and graphical analysis of the variability of medical diagnostic test performance |journal=Statistics in Medicine |volume=23 |issue=1 |pages=125–35 |year=2004 |pmid=14695644 |doi=10.1002/sim.1591|s2cid=24636826 }} which suggest that prevalence does indeed affect a test's performance.
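The first two causes can be illustrated with a simple weighted average: if a test's sensitivity differs between subgroups of diseased patients, the overall sensitivity in a setting is the average of the subgroup sensitivities weighted by that setting's case-mix. The sketch below assumes two hypothetical severity subgroups with made-up sensitivities and mixes; the helper `pooled_rate` is illustrative, and the same calculation applies to specificity across subgroups of disease-free patients. It does not model the third cause (prevalence).

```python
def pooled_rate(case_mix, subgroup_rates):
    """Average the subgroup rates, weighted by the case-mix proportions."""
    if abs(sum(case_mix.values()) - 1.0) > 1e-9:
        raise ValueError("case-mix proportions must sum to 1")
    return sum(case_mix[group] * subgroup_rates[group] for group in case_mix)


# Hypothetical sensitivity of one test in mild and severe disease.
sensitivity_by_severity = {"mild": 0.50, "severe": 0.90}

# Hypothetical case-mix of diseased patients in two settings.
referral_clinic_mix = {"mild": 0.20, "severe": 0.80}
primary_care_mix = {"mild": 0.80, "severe": 0.20}

se_referral = pooled_rate(referral_clinic_mix, sensitivity_by_severity)
se_primary = pooled_rate(primary_care_mix, sensitivity_by_severity)

print(f"referral clinic sensitivity: {se_referral:.2f}")
print(f"primary care sensitivity:    {se_primary:.2f}")
```

Under these assumptions the subgroup sensitivities are identical in both settings, yet the overall sensitivity is 0.82 in the referral clinic and 0.58 in primary care, purely because of the different case-mix.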

Examples where the sensitivity and specificity change between different sub-groups of patients may be found with the carcinoembryonic antigen test{{cite journal |author=Fletcher RH |title=Carcinoembryonic antigen |journal=Ann. Intern. Med. |volume=104 |issue=1 |pages=66–73 |year=1986 |pmid=3510056 |doi=10.7326/0003-4819-104-1-66|url=http://www.bmj.com/cgi/content/short/3/5827/600 |url-access=subscription }} and urinary dipstick tests.{{cite journal |vauthors=Lachs MS, Nachamkin I, Edelstein PH, Goldman J, Feinstein AR, Schwartz JS |title=Spectrum bias in the evaluation of diagnostic tests: lessons from the rapid dipstick test for urinary tract infection |journal=Ann. Intern. Med. |volume=117 |issue=2 |pages=135–40 |year=1992 |pmid=1605428 |doi=10.7326/0003-4819-117-2-135|s2cid=25381473 }}

The diagnostic test performance reported by a study may be artificially overestimated if the study uses a case-control design in which a healthy population ('fittest of the fit') is compared with a population with advanced disease ('sickest of the sick'); that is, two extreme populations are compared, rather than typical healthy and diseased populations.{{cite journal |author=Rutjes AWS, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PMM |title=Case-control and two-gate designs in diagnostic accuracy studies |journal=Clin Chem |volume=51 |issue=8 |pages=1335–41 |year=2005}}
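The effect of comparing extreme populations can be sketched with a toy threshold model. The example below assumes, purely for illustration, that the test measurement is normally distributed in each of four hypothetical subgroups and that a result above a fixed cut-off counts as positive; the means, the cut-off and the 50/50 mixes are all invented for the sketch and do not come from the cited study.

```python
from statistics import NormalDist

THRESHOLD = 2.5  # hypothetical cut-off: measurements above it are "positive"

# Hypothetical distributions of the test measurement in each subgroup.
subgroups = {
    "fittest_of_fit": NormalDist(mu=0.0, sigma=1.0),    # healthy volunteers
    "other_conditions": NormalDist(mu=1.0, sigma=1.0),  # disease-free, but unwell
    "mild_disease": NormalDist(mu=2.0, sigma=1.0),
    "advanced_disease": NormalDist(mu=4.0, sigma=1.0),
}


def positive_rate(dist):
    """Probability that a measurement from `dist` exceeds the cut-off."""
    return 1.0 - dist.cdf(THRESHOLD)


# Extreme case-control design: sickest of the sick versus fittest of the fit.
extreme_sensitivity = positive_rate(subgroups["advanced_disease"])
extreme_specificity = 1.0 - positive_rate(subgroups["fittest_of_fit"])

# Typical clinical populations: assumed 50/50 mix within each group.
typical_sensitivity = 0.5 * positive_rate(subgroups["mild_disease"]) \
    + 0.5 * positive_rate(subgroups["advanced_disease"])
typical_specificity = 1.0 - (
    0.5 * positive_rate(subgroups["fittest_of_fit"])
    + 0.5 * positive_rate(subgroups["other_conditions"])
)

print(f"extreme design: Se={extreme_sensitivity:.3f}, Sp={extreme_specificity:.3f}")
print(f"typical mix:    Se={typical_sensitivity:.3f}, Sp={typical_specificity:.3f}")
```

In this toy model both sensitivity and specificity come out higher in the extreme design than in the typical mix, which is the direction of overestimation described above; the size of the gap depends entirely on the assumed distributions.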

When heterogeneity between subgroups is recognized and properly analyzed, it can yield insights into the test's performance in different populations.

References