Biweight midcorrelation
{{Short description|Measure of similarity between samples}}
In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based rather than mean-based, and is therefore less sensitive to outliers; it can serve as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.{{cite book|last1=Wilcox|first1=Rand|title=Introduction to Robust Estimation and Hypothesis Testing|date=January 12, 2012|publisher=Academic Press|isbn=978-0123869838|page=455|edition=3rd}}
Derivation
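Here we find the biweight midcorrelation of two vectors <math>x</math> and <math>y</math>, with <math>m</math> items, representing each item in the vector as <math>x_i</math> and <math>y_i</math>. First, we define <math>\operatorname{med}(x)</math> as the median of a vector <math>x</math> and <math>\operatorname{mad}(x)</math> as the median absolute deviation (MAD), then define <math>u_i</math> and <math>v_i</math> as,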
:
\begin{align}
u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)},\\
v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}.
\end{align}
Now we define the weights <math>w_i^{(x)}</math> and <math>w_i^{(y)}</math> as,
:
\begin{align}
w_i^{(x)} &= \left(1-u_i^2\right)^2 I\left(1-|u_i|\right)\\
w_i^{(y)} &= \left(1-v_i^2\right)^2 I\left(1-|v_i|\right)
\end{align}
where <math>I</math> is the indicator function,
:
I(x) = \begin{cases}1, & \text{if } x >0\\
0, & \text{otherwise}\end{cases}
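As an illustration of how the weights suppress outliers, consider the vector <math>x = (1, 2, 3, 4, 100)</math>; these numbers are chosen here for the example and do not come from the cited sources. Its median is 3, and the absolute deviations from the median are <math>(2, 1, 0, 1, 97)</math>, so the MAD is 1. The scaled deviations and weights are then approximately
:
\begin{align}
u &= \left(-\tfrac{2}{9},\ -\tfrac{1}{9},\ 0,\ \tfrac{1}{9},\ \tfrac{97}{9}\right),\\
w^{(x)} &\approx \left(0.904,\ 0.975,\ 1,\ 0.975,\ 0\right).
\end{align}
The value 100 lies more than nine MADs from the median, so <math>|u_5| > 1</math> and its weight is zero; it contributes nothing to the final sum.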
Then we normalize the weighted deviations so that their squares sum to 1:
:
\begin{align}
\tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^m \left[(x_j -\operatorname{med}(x)) w_j^{(x)}\right]^2}}\\
\tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^m \left[(y_j -\operatorname{med}(y)) w_j^{(y)}\right]^2}}.
\end{align}
Finally, we define biweight midcorrelation as,
:
\mathrm{bicor}\left(x, y\right) = \sum_{i=1}^m \tilde{x}_i \tilde{y}_i
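The steps above can be sketched in Python with NumPy. This is a minimal illustration of the formulas in this section, not the code of any particular package. The function name <code>bicor</code> and the helper <code>weighted_deviations</code> are chosen here for the example, and raising an error when the MAD is zero is an assumption, since the derivation does not say how to handle that case.

```python
import numpy as np


def bicor(x, y):
    """Biweight midcorrelation of two equal-length 1-D samples (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")

    def weighted_deviations(a):
        # Median and raw (unscaled) median absolute deviation.
        med = np.median(a)
        mad = np.median(np.abs(a - med))
        if mad == 0:
            # Assumption: the formulas are undefined here, so fail loudly.
            raise ValueError("MAD is zero; biweight weights are undefined")
        dev = a - med
        u = dev / (9.0 * mad)
        # Biweight: (1 - u^2)^2 where |u| < 1, and 0 otherwise.
        w = (1.0 - u**2) ** 2 * (np.abs(u) < 1.0)
        return dev * w

    a = weighted_deviations(x)
    b = weighted_deviations(y)
    # Normalising each vector to unit length and taking the dot product
    # is the same as dividing the raw dot product by both norms.
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))
```

For example, with <code>x = [1, 2, 3, 4, 5, 6, 7]</code> and <code>y = [1, 2, 3, 4, 5, 6, 100]</code>, the final element of <code>y</code> receives zero weight, so <code>bicor(x, y)</code> is noticeably higher than the Pearson correlation returned by <code>np.corrcoef(x, y)[0, 1]</code>.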
Applications
Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,{{cite journal|last1=Song|first1=Lin|title=Comparison of co-expression measures: mutual information, correlation, and model based indices|journal=BMC Bioinformatics|date=9 December 2012|volume=13|issue=328|page=328 |doi=10.1186/1471-2105-13-328|pmid=23217028|pmc=3586947 |doi-access=free }} and is often used for weighted correlation network analysis.
Implementations
Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor, part of the WGCNA package.{{cite web|last1=Langfelder|first1=Peter|title=WGCNA: Weighted Correlation Network Analysis (an R package)|url=https://cran.r-project.org/package=WGCNA|website=CRAN|accessdate=2018-04-06}}
It is also implemented in the Raku programming language as the function bi_cor_coef, part of the Statistics module.{{cite web|last1=Khanal|first1=Suman|title=Statistics: Raku module for doing statistics | url=https://github.com/sumanstats/Statistics|website=GitHub|accessdate=2022-03-11}}