Transfer learning

{{Short description|Machine learning technique}}

[[File:Transfer learning.svg|thumb]]

{{Machine learning}}

Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task.{{cite web |last1=West |first1=Jeremy |first2=Dan |last2=Ventura |first3=Sean |last3=Warnick |url=http://cpms.byu.edu/springresearch/abstract-entry?id=861 |title=Spring Research Presentation: A Theoretical Foundation for Inductive Transfer |publisher=Brigham Young University, College of Physical and Mathematical Sciences |year=2007 |access-date=2007-08-05 |url-status=dead |archive-url=https://web.archive.org/web/20070801120743/http://cpms.byu.edu/springresearch/abstract-entry?id=861 |archive-date=2007-08-01 }} For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing/transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency.{{Cite journal|last1=George Karimpanal|first1=Thommen|last2=Bouffanais|first2=Roland|date=2019|title=Self-organizing maps for storage and transfer of knowledge in reinforcement learning|journal=Adaptive Behavior|volume=27|issue=2|pages=111–126|doi=10.1177/1059712318818568|issn=1059-7123|arxiv=1811.08318|s2cid=53774629}}

Since transfer learning makes use of training with multiple objective functions, it is related to cost-sensitive machine learning and multi-objective optimization.{{cite book |title=Cost-Sensitive Machine Learning |publisher=CRC Press |year=2011 |page=63 |url=https://books.google.com/books?id=8TrNBQAAQBAJ&pg=PA63}}

==History==

In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training.Stevo Bozinovski and Ante Fulgosi (1976). "The influence of pattern similarity and transfer learning on the base perceptron training." (original in Croatian) Proceedings of Symposium Informatica 3-121-5, Bled.Stevo Bozinovski (2020) [https://www.informatica.si/index.php/informatica/article/viewFile/2828/1433 "Reminder of the first paper on transfer learning in neural networks, 1976"]. Informatica 44: 291–302. The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning.S. Bozinovski (1981). "Teaching space: A representation concept for adaptive pattern classification." COINS Technical Report, the University of Massachusetts at Amherst, No 81-28 [available online: UM-CS-1981-028.pdf]

In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm.{{cite book |last=Pratt |first=L. Y. |url={{google books|plainurl=y|id=6tGHlwEACAAJ|page=204}} |title=NIPS Conference: Advances in Neural Information Processing Systems 5 |publisher=Morgan Kaufmann Publishers |year=1992 |pages=204–211 |chapter=Discriminability-based transfer between neural networks |chapter-url=https://proceedings.neurips.cc/paper/1992/file/67e103b0761e60683e83c559be18d40c-Paper.pdf}}

By 1998, the field had advanced to include multi-task learning,Caruana, R., "Multitask Learning", pp. 95–134 in {{Harvnb|Thrun|Pratt|2012}} along with more formal theoretical foundations.Baxter, J., "Theoretical Models of Learning to Learn", pp. 71–95 in {{Harvnb|Thrun|Pratt|2012}} Influential publications on transfer learning include the book ''Learning to Learn'' in 1998,{{sfn|Thrun|Pratt|2012}} a 2009 survey{{Cite journal |last1=Pan |first1=Sinno Jialin |last2=Yang |first2=Qiang |date=2009 |title=A Survey on Transfer Learning |url=https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf |journal=IEEE Transactions on Knowledge and Data Engineering}} and a 2019 survey.{{Cite journal |date=2019 |title=A Comprehensive Survey on Transfer Learning |journal=Proceedings of the IEEE |arxiv=1911.02685 |last1=Zhuang |first1=Fuzhen |last2=Qi |first2=Zhiyuan |last3=Duan |first3=Keyu |last4=Xi |first4=Dongbo |last5=Zhu |first5=Yongchun |last6=Zhu |first6=Hengshu |last7=Xiong |first7=Hui |last8=He |first8=Qing }}

Andrew Ng said in his NIPS 2016 tutorial{{Citation|title=NIPS 2016 tutorial: "Nuts and bolts of building AI applications using Deep Learning" by Andrew Ng| date=6 May 2018 |url=https://www.youtube.com/watch?v=wjqaz6m42wU |archive-url=https://ghostarchive.org/varchive/youtube/20211219/wjqaz6m42wU |archive-date=2021-12-19 |url-status=live|language=en|access-date=2019-12-28}}{{cbignore}}{{Cite web|url=https://media.nips.cc/Conferences/2016/Slides/6203-Slides.pdf|title=Nuts and bolts of building AI applications using Deep Learning, slides}} that transfer learning would become the next driver of the commercial success of machine learning after supervised learning.

In the 2020 paper "Rethinking Pre-training and Self-training",{{cite journal |last1=Zoph |first1=Barret |title=Rethinking pre-training and self-training |journal=Advances in Neural Information Processing Systems |date=2020 |volume=33 |pages=3833–3845 |arxiv=2006.06882 |url=https://proceedings.neurips.cc/paper/2020/file/27e9661e033a73a6ad8cefcde965c54d-Paper.pdf |access-date=2022-12-20}} Zoph et al. reported that pre-training can hurt accuracy and advocated self-training instead.

==Definition==

The definition of transfer learning is given in terms of domains and tasks. A domain <math>\mathcal{D}</math> consists of a feature space <math>\mathcal{X}</math> and a marginal probability distribution <math>P(X)</math>, where <math>X = \{x_1, ..., x_n\} \in \mathcal{X}</math>. Given a specific domain <math>\mathcal{D} = \{\mathcal{X}, P(X)\}</math>, a task consists of two components: a label space <math>\mathcal{Y}</math> and an objective predictive function <math>f : \mathcal{X} \rightarrow \mathcal{Y}</math>. The function <math>f</math> is used to predict the corresponding label <math>f(x)</math> of a new instance <math>x</math>. This task, denoted by <math>\mathcal{T} = \{\mathcal{Y}, f(x)\}</math>, is learned from the training data consisting of pairs <math>\{x_i, y_i\}</math>, where <math>x_i \in \mathcal{X}</math> and <math>y_i \in \mathcal{Y}</math>.{{cite journal |last1=Lin |first1=Yuan-Pin |last2=Jung |first2=Tzyy-Ping |title=Improving EEG-Based Emotion Classification Using Conditional Transfer Learning |journal=Frontiers in Human Neuroscience |date=27 June 2017 |volume=11 |pages=334 |doi=10.3389/fnhum.2017.00334|pmid=28701938 |pmc=5486154 |doi-access=free }} Material was copied from this source, which is available under a [https://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International License].

Given a source domain <math>\mathcal{D}_S</math> and learning task <math>\mathcal{T}_S</math>, a target domain <math>\mathcal{D}_T</math> and learning task <math>\mathcal{T}_T</math>, where <math>\mathcal{D}_S \neq \mathcal{D}_T</math> or <math>\mathcal{T}_S \neq \mathcal{T}_T</math>, transfer learning aims to help improve the learning of the target predictive function <math>f_T(\cdot)</math> in <math>\mathcal{D}_T</math> using the knowledge in <math>\mathcal{D}_S</math> and <math>\mathcal{T}_S</math>.
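
The following is a minimal sketch of this setup (in PyTorch; the layer sizes, random datasets, and the two classification tasks are illustrative assumptions, not taken from the sources above): a feature extractor is first trained on the source task <math>\mathcal{T}_S</math> and then reused, frozen, as the basis of the target predictive function <math>f_T(\cdot)</math>.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Shared representation of inputs x from the feature space X.
feature_extractor = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
)
source_head = nn.Linear(32, 10)  # predicts labels in the source label space Y_S
target_head = nn.Linear(32, 3)   # predicts labels in the target label space Y_T

def train(model, xs, ys, epochs=200):
    # Optimize only the parameters that are not frozen.
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        opt.step()

# 1) Learn the source predictive function on (hypothetical, larger) source data.
xs_src, ys_src = torch.randn(512, 16), torch.randint(0, 10, (512,))
train(nn.Sequential(feature_extractor, source_head), xs_src, ys_src)

# 2) Transfer: freeze the learned features, then fit only a new head f_T
#    on (hypothetical, smaller) target data.
for p in feature_extractor.parameters():
    p.requires_grad = False
xs_tgt, ys_tgt = torch.randn(64, 16), torch.randint(0, 3, (64,))
train(nn.Sequential(feature_extractor, target_head), xs_tgt, ys_tgt)
</syntaxhighlight>

Here the feature space is shared while the label spaces differ, so <math>\mathcal{T}_S \neq \mathcal{T}_T</math>; freezing the extractor is the simplest transfer strategy, and fine-tuning it on the target data is a common alternative.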

==Applications==

Algorithms are available for transfer learning in Markov logic networks{{citation|title=Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007)|date=July 2007|last1=Mihalkova|last2=Huynh|last3=Mooney|first1=Lilyana|first2=Tuyen|first3=Raymond J.|contribution=Mapping and Revising Markov Logic Networks for Transfer Learning|contribution-url=http://www.cs.utexas.edu/users/ml/papers/mihalkova-aaai07.pdf|location=Vancouver, BC|access-date=2007-08-05|pages=608–614}} and Bayesian networks.{{citation|last1=Niculescu-Mizil|title=Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007)|date=March 21–24, 2007|last2=Caruana|first1=Alexandru|first2=Rich|contribution=Inductive Transfer for Bayesian Network Structure Learning|contribution-url=http://www.stat.umn.edu/~aistat/proceedings/data/papers/043.pdf|access-date=2007-08-05}} Transfer learning has been applied to cancer subtype discovery,Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. {{arXiv|1810.09433}} building utilization,{{Cite conference|last1=Arief-Ang|first1=I.B.|last2=Salim|first2=F.D.|last3=Hamilton|first3=M.|date=2017-11-08|title=DA-HOC: semi-supervised domain adaptation for room occupancy prediction using CO2 sensor data|url=https://dl.acm.org/citation.cfm?id=3137146|conference=4th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys)|location=Delft, Netherlands|pages=1–10|doi=10.1145/3137133.3137146|isbn=978-1-4503-5544-5}}{{cite journal |last1=Arief-Ang |first1=I.B. |last2=Hamilton |first2=M. |last3=Salim |first3=F.D. |date=2018-12-01 |title=A Scalable Room Occupancy Prediction with Transferable Time Series Decomposition of CO2 Sensor Data |journal=ACM Transactions on Sensor Networks |volume=14 |issue=3–4 |pages=21:1–21:28 |doi=10.1145/3217214 |s2cid=54066723 }} general game playing,Banerjee, Bikramjit, and Peter Stone. "[http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-107.pdf General Game Learning Using Knowledge Transfer]." IJCAI. 2007. text classification,{{cite conference|last1=Do|first1=Chuong B.|last2=Ng|first2=Andrew Y.|year=2005|title=Neural Information Processing Systems Foundation, NIPS*2005|url=http://papers.nips.cc/paper/2843-transfer-learning-for-text-classification.pdf|access-date=2007-08-05|contribution=Transfer learning for text classification}}{{cite conference|last1=Raina|first1=Rajat|last2=Ng|first2=Andrew Y.|last3=Koller|first3=Daphne|year=2006|title=Twenty-third International Conference on Machine Learning|url=https://ai.stanford.edu/~ang/papers/icml06-transferinformativepriors.pdf|access-date=2007-08-05|contribution=Constructing Informative Priors using Transfer Learning}} digit recognition,{{Cite book|last1=Maitra|first1=D. S.|last2=Bhattacharya|first2=U.|last3=Parui|first3=S. K.|title=2015 13th International Conference on Document Analysis and Recognition (ICDAR) |chapter=CNN based common approach to handwritten character recognition of multiple scripts |date=August 2015|pages=1021–1025|doi=10.1109/ICDAR.2015.7333916|isbn=978-1-4799-1805-8|s2cid=25739012}} medical imaging, and spam filtering.{{cite conference|last=Bickel|first=Steffen|year=2006|title=ECML-PKDD Discovery Challenge Workshop|url=http://www.ecmlpkdd2006.org/discovery_challenge2006_overview.pdf|access-date=2007-08-05|contribution=ECML-PKDD Discovery Challenge 2006 Overview}}

In 2020, it was discovered that, because of their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and electroencephalographic (EEG) brainwaves: models trained in the gesture recognition domain could be transferred to the mental state recognition domain. It was noted that this relationship worked in both directions, showing that EEG signals can likewise be used to classify EMG signals.{{cite journal | last1=Bird | first1=Jordan J. | last2=Kobylarz | first2=Jhonatan | last3=Faria | first3=Diego R. | last4=Ekart | first4=Aniko | last5=Ribeiro | first5=Eduardo P. | title=Cross-Domain MLP and CNN Transfer Learning for Biological Signal Processing: EEG and EMG | journal=IEEE Access | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=8 | year=2020 | issn=2169-3536 | doi=10.1109/access.2020.2979074 | pages=54789–54801| doi-access=free | bibcode=2020IEEEA...854789B }} The experiments noted that the accuracy of neural networks and convolutional neural networks was improved{{Cite book|last1=Maitra|first1=Durjoy Sen|last2=Bhattacharya|first2=Ujjwal|last3=Parui|first3=Swapan K.|title=2015 13th International Conference on Document Analysis and Recognition (ICDAR) |chapter=CNN based common approach to handwritten character recognition of multiple scripts |date=August 2015|chapter-url=https://ieeexplore.ieee.org/document/7333916|pages=1021–1025|doi=10.1109/ICDAR.2015.7333916|isbn=978-1-4799-1805-8|s2cid=25739012}} through transfer learning both prior to any learning (compared to standard random weight initialization) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of its fully connected layers to improve performance.{{Cite journal|url= https://ieeexplore.ieee.org/document/9802918|title=SpinalNet: Deep Neural Network with Gradual Input|first1=H. M. Dipu|last1=Kabir|first2=Moloud|last2=Abdar|first3=Seyed Mohammad Jafar|last3=Jalali|first4=Abbas|last4=Khosravi|first5=Amir F.|last5=Atiya|first6=Saeid|last6=Nahavandi|first7=Dipti|last7=Srinivasan|author7-link=Dipti Srinivasan|date=January 7, 2022|journal=IEEE Transactions on Artificial Intelligence|volume=4 |issue=5 |pages=1165–1177 |doi=10.1109/TAI.2022.3185179|arxiv=2007.03347 |s2cid=220381239 }}
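
For example, the single fully connected classifier of a pre-trained network can be replaced with a deeper head sized for the new task. The sketch below (PyTorch/torchvision; the head architecture and the five target classes are illustrative assumptions, not the method of the cited papers) shows this pattern:

<syntaxhighlight lang="python">
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the final fully connected layer with a new head whose output
# size matches the (hypothetical) five-class target task; the new layers
# are then trained, optionally together with the backbone, on target data.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 5),
)
</syntaxhighlight>

Only the replaced layers start from random weights; the rest of the network retains the representations learned during pre-training.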

==See also==

==References==

{{Reflist}}

==Sources==

* {{cite book|url={{google books|plainurl=y|id=X_jpBwAAQBAJ}}|title=Learning to Learn|last1=Thrun|first1=Sebastian|last2=Pratt|first2=Lorien|date=6 December 2012|publisher=Springer Science & Business Media|isbn=978-1-4615-5529-2}}

[[Category:Machine learning]]