Draft:CERNO test

{{Short description|Statistical test that has been applied for gene set analysis in bioinformatics}}

{{Draft topics|mathematics}}

{{AfC topic|stem}}

{{AfC submission|||ts=20250504102619|u=Hsalamon|ns=118}}

{{AfC submission|t||ts=20250504102228|u=Hsalamon|ns=118|demo=}}

{{Infobox statistical method

| name = CERNO test

| field = Bioinformatics, Statistics, Data Science

}}

CERNO (Coincident Extreme Ranks in Numerical Observations) is a non-parametric, rank-based statistical test that evaluates whether the ranks of a labeled subset of items within a ranked list are more extreme than expected by chance. The method has been used in gene set and pathway analysis, where it assesses whether a predefined set of genes, proteins, or other features shows coincident enrichment for high or low ranks within a globally ranked list.

Publication of the Method

The CERNO statistic was first published in a 2008 study on interferon-beta-regulated gene expression in relapsing–remitting multiple sclerosis.{{cite journal |last1=Yamaguchi |first1=KD |last2=Ruderman |first2=DL |last3=Croze |first3=E |last4=Wagner |first4=TC |last5=Velichko |first5=S |last6=Reder |first6=AT |last7=Salamon |first7=H |title=IFN-beta-regulated genes show abnormal expression in therapy-naive relapsing-remitting MS mononuclear cells: Gene expression analysis employing all reported protein-protein interactions |journal=Journal of Neuroimmunology |volume=195 |issue=1–2 |pages=116–120 |date=2008 |doi=10.1016/j.jneuroim.2007.12.007 |pmid=18280692}} It was subsequently used in transcriptomic and proteomic studies.{{cite journal |last1=Kunnath-Velayudhan |first1=S |last2=Salamon |first2=H |last3=Wang |first3=HY |last4=Davidow |first4=AL |last5=Molina |first5=DM |last6=Huynh |first6=VT |last7=Cirillo |first7=DM |last8=Michel |first8=G |last9=Talbot |first9=EA |last10=Perkins |first10=MD |last11=Felgner |first11=PL |last12=Liang |first12=X |last13=Gennaro |first13=ML |title=Dynamic antibody responses to the Mycobacterium tuberculosis proteome |journal=Proceedings of the National Academy of Sciences |volume=107 |issue=33 |pages=14703–8 |date=2010-08-17 |doi=10.1073/pnas.1009080107 |doi-access=free |pmid=20668240 |pmc=2930474|bibcode=2010PNAS..10714703K }} The test was fully described in the supplementary materials of a 2013 pharmacogenomics study.{{cite journal |last1=Croze |first1=E |last2=Yamaguchi |first2=KD |last3=Knappertz |first3=V |last4=Reder |first4=AT |last5=Salamon |first5=H |title=Interferon-beta-1b-induced short- and long-term signatures of treatment activity in multiple sclerosis |journal=The Pharmacogenomics Journal |volume=13 |issue=5 |pages=443–451 |date=October 2013 |doi=10.1038/tpj.2012.27 |pmid=22711062 |pmc=3793239}}

The first independent, comprehensive evaluation of the algorithm was published by Zyla et al. in 2019.{{cite journal |last1=Zyla |first1=J |last2=Marczyk |first2=M |last3=Domaszewska |first3=T |last4=Kaufmann |first4=SHE |last5=Polanska |first5=J |last6=Weiner |first6=J |title=Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms |journal=Bioinformatics |volume=35 |issue=24 |pages=5146–5154 |date=2019-12-15 |doi=10.1093/bioinformatics/btz447 |pmid=31165139 |pmc=6954644}}

Methodology

The CERNO test evaluates whether the ranks of a set of genes or features within a genome-wide ranking (from most to least significant by any metric) are collectively more extreme than would be expected by chance. Because the statistic sums the logarithms of the scaled ranks, it is sensitive to sets with even a few strongly ranked members, and it does not require every gene in the set to be uniformly significant or to pass a significance threshold.

The test statistic for a gene set of size k in a ranked list of N genes is:

:S = -2 \sum_{i=1}^k \ln\left(\frac{r_i}{N}\right)

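where r_i is the rank of the i-th gene in the set, with rank 1 assigned to the most significant gene, so that small ranks produce large values of S. Under the null hypothesis that the ranks of the set are randomly distributed, S approximately follows a chi-square distribution with 2k degrees of freedom.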
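
The calculation can be illustrated with a minimal Python sketch. This is not the code of any published implementation: the function names and the example ranks are hypothetical, and the p-value relies on the chi-square approximation stated above. For an even number of degrees of freedom (2k), the upper tail of the chi-square distribution has a closed form, so the sketch needs only the standard library.

```python
import math


def cerno_statistic(ranks, n_total):
    """Return S = -2 * sum(ln(r_i / N)) for 1-based ranks r_i in a list of N features."""
    if not ranks:
        raise ValueError("the set must contain at least one ranked feature")
    if any(r < 1 or r > n_total for r in ranks):
        raise ValueError("ranks must lie between 1 and n_total")
    return -2.0 * sum(math.log(r / n_total) for r in ranks)


def chi2_upper_tail_even_df(x, k):
    """Upper-tail probability of a chi-square distribution with 2k degrees of freedom.

    For even degrees of freedom the tail equals
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    half = x / 2.0
    term = 1.0   # current term (x/2)^j / j!, starting at j = 0
    total = 1.0  # running sum of the terms
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_test(ranks, n_total):
    """Return (S, p-value) for a set with the given ranks in a list of n_total features."""
    s = cerno_statistic(ranks, n_total)
    return s, chi2_upper_tail_even_df(s, len(ranks))


# Hypothetical example: a set of 5 genes in a ranked list of 1000 genes.
example_ranks = [1, 5, 12, 40, 300]
s, p = cerno_test(example_ranks, 1000)
print(f"S = {s:.3f}, p = {p:.3g}")
```

For a set containing a single gene, the p-value returned by this sketch reduces to r/N, which is a convenient sanity check. For very large sets, a library routine for the chi-square distribution would be numerically safer than the explicit sum.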

Applications

CERNO is referenced in over 100 publications on genes and proteins in PubMed Central as of May 2025.

Comparison with Other Methods

Zyla et al. reported that CERNO showed the highest reproducibility of the methods they investigated, along with good sensitivity and prioritization and low computational time. The study also notes that the non-parametric method is robust to the choice of ranking metric, to sample size, and to gene set size.

CERNO is Related to Fisher’s Method of Combining Tests

The CERNO test is mathematically related to Fisher's method of combining p-values for independent statistical tests. Fisher’s method is known for its favorable asymptotic properties, especially as measured by Bahadur efficiency,{{cite web |title=Bahadur efficiency |url=https://encyclopediaofmath.org/wiki/Bahadur_efficiency |website=Encyclopedia of Mathematics}} which describes the rate at which the observed significance of a test statistic converges to zero as the sample size increases. Tests with higher Bahadur efficiency exhibit more rapid convergence.

Littell and Folks (1971) demonstrated the asymptotic optimality of Fisher’s method of combining tests, showing that for independent tests, the negative logarithm of the significance level (−2log(significance)) diverges to infinity at the fastest possible rate among combination tests.{{cite journal |last1=Littell |first1=RC |last2=Folks |first2=JL |title=Asymptotic Optimality of Fisher's Method of Combining Tests |journal=Journal of the American Statistical Association |volume=66 |issue=336 |pages=802–806 |year=1971 |doi=10.1080/01621459.1971.10482347 |jstor=2284251}}

In contrast, the Kolmogorov–Smirnov test, which is the basis for several gene set analysis methods, was shown by Hwang (1982) to have much lower Bahadur efficiency compared to the chi-squared test.{{cite journal |last1=Hwang |first1=TY |title=Bahadur Efficiency of the One Sample Kolmogorov-Smirnov Test for Normal Alternatives |journal=Sankhyā: The Indian Journal of Statistics, Series A |volume=44 |issue=2 |pages=233–241 |year=1982 |jstor=25050525}} The Kolmogorov–Smirnov test is "always well worse" than the chi-squared test in this measure. This is relevant as the CERNO statistic S follows a chi-square distribution with 2k degrees of freedom.

As the Kolmogorov–Smirnov test is the basis of many commonly used gene set enrichment analysis methods, CERNO, which reflects the properties of Fisher’s combined test, may offer statistical power or efficiency advantages in this context.
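
The relationship to Fisher’s method can be made explicit with a short derivation, given here as a sketch rather than as the published formulation. Fisher’s method combines k independent p-values p_1, ..., p_k into the statistic

:X = -2 \sum_{i=1}^k \ln\left(p_i\right)

which follows a chi-square distribution with 2k degrees of freedom when each p_i is uniformly distributed on (0, 1). Treating each scaled rank as if it were a p-value, p_i = r_i / N, gives

:X = -2 \sum_{i=1}^k \ln\left(\frac{r_i}{N}\right) = S

so the CERNO statistic has the same form as Fisher’s combined statistic. Scaled ranks are discrete and are drawn without replacement, so they are not strictly independent uniform variables; the chi-square reference distribution is therefore an approximation, which is assumed here to be close when k is small relative to N.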

Software

The CERNO method is straightforward to implement because of its simple mathematical form. It has been implemented in the {{nowrap|tmod}} R package.{{cite web|title=tmod: General and Multivariate Enrichment Analysis|url=https://cran.r-project.org/web/packages/tmod/index.html|website=CRAN|date=31 March 2023 }}

See also

  • Gene set enrichment analysis
  • Fisher’s method
  • Order statistics
  • Pathway analysis

References

{{reflist}}