Transfer entropy

{{Short description|Non-parametric statistic on information transfer}}

'''Transfer entropy''' is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes.<ref>{{cite journal|last=Schreiber|first=Thomas|title=Measuring information transfer|journal=Physical Review Letters|date=1 July 2000|volume=85|issue=2|pages=461–464|doi=10.1103/PhysRevLett.85.461|pmid=10991308|arxiv=nlin/0001042|bibcode=2000PhRvL..85..461S|s2cid=7411376}}</ref><ref>{{cite journal|year= 2007 |title = Granger causality |volume = 2 |issue = 7 |pages = 1667 |last= Seth |first=Anil|journal=Scholarpedia |doi=10.4249/scholarpedia.1667 |bibcode=2007SchpJ...2.1667S|doi-access= free }}</ref><ref>{{cite journal|last=Hlaváčková-Schindler|first=Katerina|author2=Palus, M |author3=Vejmelka, M |author4= Bhattacharya, J |title=Causality detection based on information-theoretic approaches in time series analysis|journal=Physics Reports|date=1 March 2007|volume=441|issue=1|pages=1–46|doi=10.1016/j.physrep.2006.12.004|bibcode=2007PhR...441....1H|citeseerx=10.1.1.183.1617}}</ref> The transfer entropy from a process <math>X</math> to a process <math>Y</math> is the reduction in uncertainty about future values of <math>Y</math> obtained from knowing the past values of <math>X</math>, given the past values of <math>Y</math>. More specifically, if <math>X_t</math> and <math>Y_t</math> for <math>t\in \mathbb{N}</math> denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:

:<math>T_{X\rightarrow Y} = H\left( Y_t \mid Y_{t-1:t-L}\right) - H\left( Y_t \mid Y_{t-1:t-L}, X_{t-1:t-L}\right),</math>

where <math>H(X)</math> is the Shannon entropy of <math>X</math> and <math>X_{t-1:t-L}</math> denotes the <math>L</math> past values <math>X_{t-1},\ldots,X_{t-L}</math>. The above definition of transfer entropy has been extended to other types of entropy measures, such as Rényi entropy.<ref>{{Cite journal|last1=Jizba|first1=Petr|last2=Kleinert|first2=Hagen|last3=Shefaat|first3=Mohammad|date=2012-05-15|title=Rényi's information transfer between financial time series|journal=Physica A: Statistical Mechanics and Its Applications|language=en|volume=391|issue=10|pages=2971–2989|doi=10.1016/j.physa.2011.12.064|issn=0378-4371|arxiv=1106.5913|bibcode=2012PhyA..391.2971J|s2cid=51789622}}</ref>
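
A minimal plug-in (histogram) estimator of <math>T_{X\rightarrow Y}</math> with history length <math>L = 1</math> can illustrate the definition. The following sketch is illustrative only; the function name, the equal-width binning, and the use of base-2 logarithms are assumptions rather than anything prescribed by the literature cited above.

<syntaxhighlight lang="python">
# Sketch: plug-in (histogram) estimator of transfer entropy T_{X->Y}
# with history length L = 1. Bin count and names are illustrative.
from collections import Counter

import numpy as np


def transfer_entropy(x, y, bins=8):
    """Estimate T_{X->Y} in bits from two 1-D time series."""
    # Discretise each series into equal-width bins.
    xd = np.digitize(x, np.histogram_bin_edges(x, bins))
    yd = np.digitize(y, np.histogram_bin_edges(y, bins))
    yt, yp, xp = yd[1:], yd[:-1], xd[:-1]      # Y_t, Y_{t-1}, X_{t-1}
    n = len(yt)
    c_typx = Counter(zip(yt, yp, xp))          # counts of (Y_t, Y_{t-1}, X_{t-1})
    c_px = Counter(zip(yp, xp))                # counts of (Y_{t-1}, X_{t-1})
    c_ty = Counter(zip(yt, yp))                # counts of (Y_t, Y_{t-1})
    c_p = Counter(yp)                          # counts of Y_{t-1}
    # T = sum over states of p(y_t, y_p, x_p) * log2[ p(y_t | y_p, x_p) / p(y_t | y_p) ]
    return sum(
        (k / n) * np.log2(k * c_p[b] / (c_px[(b, c)] * c_ty[(a, b)]))
        for (a, b, c), k in c_typx.items()
    )
</syntaxhighlight>

For two coupled series in which <math>X</math> drives <math>Y</math>, <code>transfer_entropy(x, y)</code> should be clearly positive while <code>transfer_entropy(y, x)</code> stays near zero, up to estimation bias.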

Transfer entropy is a conditional mutual information,<ref>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> with the history of the influenced variable <math>Y_{t-1:t-L}</math> in the condition:

:<math>T_{X\rightarrow Y} = I(Y_t ; X_{t-1:t-L} \mid Y_{t-1:t-L}).</math>
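
Expanding the conditional mutual information in terms of conditional entropies recovers the definition above:

:<math>I(Y_t ; X_{t-1:t-L} \mid Y_{t-1:t-L}) = H\left( Y_t \mid Y_{t-1:t-L}\right) - H\left( Y_t \mid Y_{t-1:t-L}, X_{t-1:t-L}\right) = T_{X\rightarrow Y}.</math>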

Transfer entropy reduces to Granger causality for vector autoregressive processes.<ref>{{cite journal|last=Barnett|first=Lionel|title=Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables|journal=Physical Review Letters|date=1 December 2009|volume=103|issue=23|doi=10.1103/PhysRevLett.103.238701|bibcode=2009PhRvL.103w8701B|pmid=20366183|page=238701|arxiv=0910.4514|s2cid=1266025}}</ref> Hence, it is advantageous when the model assumptions of Granger causality do not hold, for example in the analysis of non-linear signals.<ref>{{cite journal|last=Lungarella|first=M.|author2=Ishiguro, K. |author3=Kuniyoshi, Y. |author4= Otsu, N. |title=Methods for quantifying the causal structure of bivariate time series|journal=International Journal of Bifurcation and Chaos|date=1 March 2007|volume=17|issue=3|pages=903–921|doi=10.1142/S0218127407017628|bibcode=2007IJBC...17..903L|citeseerx=10.1.1.67.3585}}</ref> However, it usually requires more samples for accurate estimation.<ref>{{cite journal|last=Pereda|first=E|author2=Quiroga, RQ |author3=Bhattacharya, J |title=Nonlinear multivariate analysis of neurophysiological signals.|journal=Progress in Neurobiology|date=Sep–Oct 2005|volume=77|issue=1–2|pages=1–37|pmid=16289760|doi=10.1016/j.pneurobio.2005.10.003|arxiv=nlin/0510077|bibcode=2005nlin.....10077P|s2cid=9529656}}</ref>
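
The Gaussian equivalence can be checked numerically: for jointly Gaussian variables, the Granger causality <math>F_{X\rightarrow Y}</math> equals twice the transfer entropy measured in nats. The following sketch is not taken from the cited papers; the coupling strengths and sample size are arbitrary choices made for illustration.

<syntaxhighlight lang="python">
# Sketch: for Gaussian variables, F_{X->Y} = 2 * T_{X->Y} (in nats).
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):                        # X drives Y with coupling 0.5
    x[t] = 0.8 * x[t - 1] + rng.normal()
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()

yt, yp, xp = y[1:], y[:-1], x[:-1]
# Restricted model: y_t regressed on its own past only.
A = np.column_stack([yp, np.ones(n - 1)])
res_r = yt - A @ np.linalg.lstsq(A, yt, rcond=None)[0]
# Full model: y_t regressed on its own past and the past of x.
B = np.column_stack([yp, xp, np.ones(n - 1)])
res_f = yt - B @ np.linalg.lstsq(B, yt, rcond=None)[0]

gc = np.log(res_r.var() / res_f.var())       # Granger causality F_{X->Y}
print(f"F = {gc:.3f}, T = {gc / 2:.3f} nats")
</syntaxhighlight>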

The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding.<ref>{{cite journal|last=Montalto|first=A|author2=Faes, L |author3=Marinazzo, D |title=MuTE: A MATLAB Toolbox to Compare Established and Novel Estimators of the Multivariate Transfer Entropy.|journal=PLOS ONE|date=Oct 2014|pmid=25314003|doi=10.1371/journal.pone.0109462|volume=9|issue=10|pmc=4196918|page=e109462|bibcode=2014PLoSO...9j9462M|doi-access=free}}</ref>
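
As an example of the nearest-neighbor approach, a Frenzel–Pompe style estimator of the conditional mutual information <math>I(X;Y \mid Z)</math> can be applied directly to the conditional mutual information form of transfer entropy given above. The sketch below uses brute-force <math>O(n^2)</math> pairwise distances for clarity; the function name and the choice <math>k=4</math> are illustrative assumptions, not the implementation of any cited toolbox.

<syntaxhighlight lang="python">
# Sketch: Frenzel-Pompe style k-nearest-neighbour estimate of I(X; Y | Z),
# using max-norm distances and brute-force O(n^2) pairwise computation.
import numpy as np
from scipy.special import digamma


def knn_cmi(x, y, z, k=4):
    """Nearest-neighbour estimate of I(X; Y | Z) in nats (1-D arrays)."""
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.abs(z[:, None] - z[None, :])
    d_xyz = np.maximum(np.maximum(dx, dy), dz)   # joint-space max-norm
    np.fill_diagonal(d_xyz, np.inf)              # exclude self-matches
    eps = np.sort(d_xyz, axis=1)[:, k - 1]       # k-th neighbour distance
    for d in (dx, dy, dz):
        np.fill_diagonal(d, np.inf)
    # Neighbour counts strictly inside eps in each marginal subspace.
    n_xz = np.sum(np.maximum(dx, dz) < eps[:, None], axis=1)
    n_yz = np.sum(np.maximum(dy, dz) < eps[:, None], axis=1)
    n_z = np.sum(dz < eps[:, None], axis=1)
    return digamma(k) - np.mean(
        digamma(n_xz + 1) + digamma(n_yz + 1) - digamma(n_z + 1)
    )


# With history length L = 1: T_{X->Y} = I(Y_t ; X_{t-1} | Y_{t-1}).
# te = knn_cmi(y[1:], x[:-1], y[:-1])
</syntaxhighlight>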

While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables<ref>{{cite journal|last=Lizier|first=Joseph|author2=Prokopenko, Mikhail |author3=Zomaya, Albert |title=Local information transfer as a spatiotemporal filter for complex systems|journal=Physical Review E|year=2008|volume=77|issue=2|pages=026110|doi=10.1103/PhysRevE.77.026110|pmid=18352093|arxiv=0809.3275|bibcode=2008PhRvE..77b6110L|s2cid=15634881}}</ref> or considering transfer from a collection of sources,<ref>{{cite journal|last=Lizier|first=Joseph|author2=Heinzle, Jakob |author3=Horstmann, Annette |author4=Haynes, John-Dylan |author5= Prokopenko, Mikhail |title=Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity|journal=Journal of Computational Neuroscience|year=2011|volume=30|issue=1|pages=85–107|doi=10.1007/s10827-010-0271-2|pmid=20799057|s2cid=3012713}}</ref> although these forms again require more samples.
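
In the first of these forms, for example, the transfer from <math>X</math> to <math>Y</math> is additionally conditioned on the past of a third process <math>Z</math>, giving the conditional transfer entropy (notation following the bivariate definition above):

:<math>T_{X\rightarrow Y \mid Z} = I(Y_t ; X_{t-1:t-L} \mid Y_{t-1:t-L}, Z_{t-1:t-L}).</math>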

Transfer entropy has been used to estimate functional connectivity of neurons,<ref>{{cite journal|last=Vicente|first=Raul|author2=Wibral, Michael |author3=Lindner, Michael |author4= Pipa, Gordon |title=Transfer entropy—a model-free measure of effective connectivity for the neurosciences |journal=Journal of Computational Neuroscience|date=February 2011|volume=30|issue=1|pages=45–67|doi=10.1007/s10827-010-0262-3|pmid=20706781|pmc=3040354}}</ref><ref>{{cite journal|last=Shimono|first=Masanori|author2=Beggs, John |title=Functional clusters, hubs, and communities in the cortical microconnectome |journal=Cerebral Cortex|date= October 2014|volume=25|issue=10|pages=3743–57|doi=10.1093/cercor/bhu252 |pmid=25336598 |pmc=4585513}}</ref> social influence in social networks,<ref>{{cite conference |arxiv=1110.2724|title= Information transfer in social media|last1= Ver Steeg |first1= Greg|last2=Galstyan|first2= Aram |year= 2012|publisher= ACM|book-title= Proceedings of the 21st international conference on World Wide Web (WWW '12) |pages= 509–518 |bibcode=2011arXiv1110.2724V}}</ref> and statistical causality between armed conflict events.<ref>{{Cite journal |last1=Kushwaha |first1=Niraj |last2=Lee |first2=Edward D |date=July 2023 |title=Discovering the mesoscale for chains of conflict |url=https://doi.org/10.1093/pnasnexus/pgad228 |journal=PNAS Nexus |volume=2 |issue=7 |pages=pgad228 |doi=10.1093/pnasnexus/pgad228 |issn=2752-6542 |pmc=10392960 |pmid=37533894}}</ref>

Transfer entropy is a finite version of the directed information, which was defined in 1990 by James Massey<ref>{{cite journal|last1=Massey|first1=James|title=Causality, Feedback And Directed Information|date=1990|issue=ISITA|citeseerx=10.1.1.36.5688}}</ref> as

:<math>I(X^n\to Y^n) = \sum_{i=1}^n I(X^i;Y_i \mid Y^{i-1}),</math>

where <math>X^n</math> denotes the vector <math>X_1,X_2,\ldots,X_n</math> and <math>Y^n</math> denotes <math>Y_1,Y_2,\ldots,Y_n</math>. The directed information plays an important role in characterizing the fundamental limits (channel capacity) of communication channels with or without feedback<ref>{{cite journal|last1=Permuter|first1=Haim Henry|last2=Weissman|first2=Tsachy|last3=Goldsmith|first3=Andrea J.|title=Finite State Channels With Time-Invariant Deterministic Feedback|journal=IEEE Transactions on Information Theory|date=February 2009|volume=55|issue=2|pages=644–662|doi=10.1109/TIT.2008.2009849|arxiv=cs/0608070|s2cid=13178}}</ref><ref>{{cite journal|last1=Kramer|first1=G.|title=Capacity results for the discrete memoryless network|journal=IEEE Transactions on Information Theory|date=January 2003|volume=49|issue=1|pages=4–21|doi=10.1109/TIT.2002.806135}}</ref> and gambling with causal side information.<ref>{{cite journal|last1=Permuter|first1=Haim H.|last2=Kim|first2=Young-Han|last3=Weissman|first3=Tsachy|title=Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing|journal=IEEE Transactions on Information Theory|date=June 2011|volume=57|issue=6|pages=3248–3259|doi=10.1109/TIT.2011.2136270|arxiv=0912.4872|s2cid=11722596}}</ref>
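
For small discrete alphabets, the directed information can be estimated term by term with plug-in conditional mutual information, given many independent sample paths. The sketch below is an assumption-laden illustration (the helper names, base-2 logarithms, and the availability of repeated trials are all assumptions), not a standard estimator from the works cited above.

<syntaxhighlight lang="python">
# Sketch: plug-in estimate of I(X^n -> Y^n) = sum_i I(X^i ; Y_i | Y^{i-1})
# from many independent sample paths over small discrete alphabets.
from collections import Counter

import numpy as np


def plug_in_cmi(a, b, c):
    """Empirical plug-in estimate of I(A; B | C) in bits."""
    n = len(a)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    return sum(
        (k / n) * np.log2(k * n_c[cc] / (n_ac[(aa, cc)] * n_bc[(bb, cc)]))
        for (aa, bb, cc), k in n_abc.items()
    )


def directed_information(X, Y):
    """X, Y: integer arrays of shape (trials, n); returns I(X^n -> Y^n) in bits."""
    _, n = X.shape
    total = 0.0
    for i in range(n):
        a = [tuple(row[: i + 1]) for row in X]   # X^i
        b = list(Y[:, i])                        # Y_i
        c = [tuple(row[:i]) for row in Y]        # Y^{i-1} (empty when i = 0)
        total += plug_in_cmi(a, b, c)
    return total
</syntaxhighlight>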


== References ==

{{Reflist|2}}