Biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. Because it is based on the median rather than the mean, it is less sensitive to outliers and can serve as a robust alternative to other similarity measures, such as Pearson correlation or mutual information.{{cite book|last1=Wilcox|first1=Rand|title=Introduction to Robust Estimation and Hypothesis Testing|date=January 12, 2012|publisher=Academic Press|isbn=978-0123869838|page=455|edition=3rd}}

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$, each with $m$ items, written $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$. First, we define $\operatorname{med}(x)$ as the median of a vector $x$ and $\operatorname{mad}(x)$ as its median absolute deviation (MAD), then define $u_i$ and $v_i$ for $i=1,2, \ldots,m$ as

: $\begin{align}
u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)},\\
v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}.
\end{align}$

Now we define the weights $w_i^{(x)}$ and $w_i^{(y)}$ as,

: $\begin{align}
w_i^{(x)} &= \left(1-u_i^2\right)^2 I\left(1-|u_i|\right)\\
w_i^{(y)} &= \left(1-v_i^2\right)^2 I\left(1-|v_i|\right)
\end{align}$

where $I$ is the indicator function,

: $I(x) = \begin{cases}1, & \text{if } x >0\\
0, & \text{otherwise}\end{cases}$

Then we scale the weighted deviations so that their squares sum to 1:

: $\begin{align}
\tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^m \left[(x_j -\operatorname{med}(x)) w_j^{(x)}\right]^2}}\\
\tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^m \left[(y_j -\operatorname{med}(y)) w_j^{(y)}\right]^2}}.
\end{align}$
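
As a quick check on this scaling, and assuming the denominator is nonzero (which requires $\operatorname{mad}(x) > 0$), the squares of the scaled values sum to one by construction:

: $\sum_{i=1}^m \tilde{x}_i^2 = \frac{\sum_{i=1}^m \left[(x_i - \operatorname{med}(x)) w_i^{(x)}\right]^2}{\sum_{j=1}^m \left[(x_j - \operatorname{med}(x)) w_j^{(x)}\right]^2} = 1,$

and likewise for $\tilde{y}$. Both scaled vectors therefore have unit Euclidean length, so by the Cauchy–Schwarz inequality any sum of products $\sum_{i=1}^m \tilde{x}_i \tilde{y}_i$ lies between $-1$ and $1$.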

Finally, we define biweight midcorrelation as,

: $\mathrm{bicor}\left(x, y\right) = \sum_{i=1}^m \tilde{x}_i \tilde{y}_i$
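
The steps above can be sketched in Python using only the standard library. This is a minimal illustration rather than a reference implementation: the helper names `mad` and `weighted_deviations` are hypothetical, and raising an error when the MAD is zero is an assumption made here, since the formulas are undefined in that case.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation about the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def weighted_deviations(values):
    """Return (x_i - med(x)) * w_i for each item, with biweight weights w_i."""
    center = median(values)
    scale = 9 * mad(values)
    if scale == 0:
        # Assumption of this sketch: treat a zero MAD as an error.
        raise ValueError("MAD is zero; u_i is undefined")
    result = []
    for v in values:
        u = (v - center) / scale
        # I(1 - |u|) is 1 when |u| < 1 and 0 otherwise.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        result.append((v - center) * w)
    return result


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = weighted_deviations(x)
    b = weighted_deviations(y)
    norm_a = math.sqrt(sum(t * t for t in a))
    norm_b = math.sqrt(sum(t * t for t in b))
    # Dividing by both norms is the same as summing the products
    # of the separately scaled values.
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 5, 6, 7, 8, 1000]  # last item is an outlier
print(bicor(x, x))
print(bicor(x, y))
```

In the second call, the outlying value 1000 has $|v_i| > 1$ and so receives weight zero, which means it does not contribute to the sum.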

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,{{cite journal|last1=Song|first1=Lin|title=Comparison of co-expression measures: mutual information, correlation, and model based indices|journal=BMC Bioinformatics|date=9 December 2012|volume=13|issue=328|page=328 |doi=10.1186/1471-2105-13-328|pmid=23217028|pmc=3586947 |doi-access=free }} and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor in the WGCNA package.{{cite web|last1=Langfelder|first1=Peter|title=WGCNA: Weighted Correlation Network Analysis (an R package)|url=https://cran.r-project.org/package=WGCNA|website=CRAN|accessdate=2018-04-06}}

It has also been implemented in the Raku programming language as the function bi_cor_coef in the Statistics module.{{cite web|last1=Khanal|first1=Suman|title=Statistics: Raku module for doing statistics | url=https://github.com/sumanstats/Statistics|website=GitHub|accessdate=2022-03-11}}

References

Category:Parametric statistics

Category:Covariance and correlation
