Emotion recognition in conversation

Emotion recognition in conversation (ERC) is a sub-field of emotion recognition that focuses on mining human emotions from conversations or dialogues between two or more interlocutors.{{cite journal|last1=Poria|first1=Soujanya|last2=Majumder|first2=Navonil|last3=Mihalcea|first3=Rada|last4=Hovy|first4=Eduard|date=2019|title=Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances|journal=IEEE Access|volume=7|pages=100943–100953|doi=10.1109/ACCESS.2019.2929050|arxiv=1905.02947|bibcode=2019arXiv190502947P|s2cid=147703962}} Datasets in this field are usually derived from social platforms, which offer large numbers of freely available samples, often containing multimodal data (i.e., some combination of textual, visual, and acoustic data).{{cite journal|last1=Lee|first1=Chul Min|last2=Narayanan|first2=Shrikanth|date=March 2005|title=Toward Detecting Emotions in Spoken Dialogs|journal=IEEE Transactions on Speech and Audio Processing|volume=13|issue=2|pages=293–303|doi=10.1109/TSA.2004.838534|s2cid=12710581}} Self- and inter-personal influences play a critical role{{cite arXiv|last1=Hazarika|first1=Devamanyu|last2=Poria|first2=Soujanya|last3=Zimmermann|first3=Roger|last4=Mihalcea|first4=Rada|date=Oct 2019|title=Emotion Recognition in Conversations with Transfer Learning from Generative Conversation Modeling|eprint=1910.04980|class=cs.CL}} in identifying basic emotions such as fear, anger, joy, and surprise. The more fine-grained the emotion labels are, the harder it is to detect the correct emotion. ERC poses a number of challenges, including conversational-context modeling, speaker-state modeling, the presence of sarcasm in conversation, and emotion shift across consecutive utterances of the same interlocutor.

The task

The task of ERC is to detect the emotion expressed by the speaker in each utterance of a conversation. ERC depends on three primary factors: the conversational context, the interlocutors' mental states, and their intent.
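
As a minimal sketch of this formulation, assume a conversation is represented as an ordered list of speaker and utterance pairs. The following Python code builds, for each target utterance, the input a classifier might receive: the utterance itself, a window of preceding context, and the same speaker's earlier turns. The names `Utterance`, `ERCInput`, `build_erc_inputs`, and the `window` parameter are illustrative assumptions and are not taken from any particular ERC system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    speaker: str
    text: str


@dataclass
class ERCInput:
    target: Utterance                 # utterance whose emotion is to be predicted
    context: List[Utterance]          # preceding utterances from any speaker
    speaker_history: List[Utterance]  # preceding utterances by the target's speaker


def build_erc_inputs(conversation: List[Utterance], window: int = 3) -> List[ERCInput]:
    """Pair every utterance with its conversational context (hypothetical helper)."""
    inputs = []
    for i, target in enumerate(conversation):
        # Conversational context: the last `window` utterances before the target.
        context = conversation[max(0, i - window):i]
        # Self-influence: everything the same speaker said earlier.
        history = [u for u in conversation[:i] if u.speaker == target.speaker]
        inputs.append(ERCInput(target, context, history))
    return inputs


conversation = [
    Utterance("A", "I got the job!"),
    Utterance("B", "That's wonderful news."),
    Utterance("A", "But I have to move abroad."),
]
inputs = build_erc_inputs(conversation, window=2)
```

An emotion classifier would then assign one label to each `ERCInput`, so the output has the same length as the conversation.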

Datasets

IEMOCAP,{{Cite journal|last1=Busso|first1=Carlos|last2=Bulut|first2=Murtaza|last3=Lee|first3=Chi-Chun|last4=Kazemzadeh|first4=Abe|last5=Mower|first5=Emily|author5-link=Emily Mower Provost|last6=Kim|first6=Samuel|last7=Chang|first7=Jeannette N.|last8=Lee|first8=Sungbok|last9=Narayanan|first9=Shrikanth S.|date=2008-11-05|title=IEMOCAP: interactive emotional dyadic motion capture database|journal=Language Resources and Evaluation|volume=42|issue=4|pages=335–359|doi=10.1007/s10579-008-9076-6|s2cid=11820063|issn=1574-020X}} SEMAINE,{{Cite journal|last1=McKeown|first1=G.|last2=Valstar|first2=M.|last3=Cowie|first3=R.|last4=Pantic|first4=M.|last5=Schroder|first5=M.|date=2012-01-02|title=The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent|journal=IEEE Transactions on Affective Computing|volume=3|issue=1|pages=5–17|doi=10.1109/t-affc.2011.20|s2cid=2995377|issn=1949-3045|url=https://pure.qub.ac.uk/en/publications/the-semaine-database-annotated-multimodal-records-of-emotionally-colored-conversations-between-a-person-and-a-limited-agent(4f349228-ebb5-4964-be2c-18f3559be29f).html}} DailyDialog,{{cite conference|last1=Li|first1=Yanran|last2=Su|first2=Hui|last3=Shen|first3=Xiaoyu|last4=Li|first4=Wenjie|last5=Cao|first5=Ziqiang|last6=Niu|first6=Shuzi|date=2017|title=DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset|book-title=Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)|pages=986–995}} and MELD{{Cite journal|last1=Poria|first1=Soujanya|last2=Hazarika|first2=Devamanyu|last3=Majumder|first3=Navonil|last4=Naik|first4=Gautam|last5=Cambria|first5=Erik|last6=Mihalcea|first6=Rada|date=2019|title=MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations|journal=Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics|pages=527–536|location=Stroudsburg, PA, USA|publisher=Association for Computational Linguistics|doi=10.18653/v1/p19-1050|arxiv=1810.02508|s2cid=52932143}} are four widely used datasets in ERC. Among these four, MELD contains multiparty dialogues.

Methods

Approaches to ERC include unsupervised, semi-supervised, and supervised{{cite book|last1=Abdelwahab|first1=Mohammed|last2=Busso|first2=Carlos|chapter=Supervised domain adaptation for emotion recognition from speech |date=2015|title=2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)|pages=5058–5062|doi=10.1109/ICASSP.2015.7178934|isbn=978-1-4673-6997-8|s2cid=8207841}} methods. Popular supervised methods include using or combining pre-defined features, recurrent neural networks{{cite arXiv|last1=Chernykh|first1=Vladimir|last2=Prikhodko|first2=Pavel|last3=King|first3=Irwin|date=Jul 2019|title=Emotion Recognition From Speech With Recurrent Neural Networks|eprint=1701.08071|class=cs.CL}} (DialogueRNN{{Cite journal|last1=Majumder|first1=Navonil|last2=Poria|first2=Soujanya|last3=Hazarika|first3=Devamanyu|last4=Mihalcea|first4=Rada|last5=Gelbukh|first5=Alexander|last6=Cambria|first6=Erik|date=2019-07-17|title=DialogueRNN: An Attentive RNN for Emotion Detection in Conversations|journal=Proceedings of the AAAI Conference on Artificial Intelligence|volume=33|pages=6818–6825|doi=10.1609/aaai.v33i01.33016818|issn=2374-3468|doi-access=free|arxiv=1811.00405}}), graph convolutional networks{{cite web|url=https://www.techtimes.com/articles/246226/20191126/graph-convolutional-networks-are-bringing-emotion-recognition-closer-to-machines-here-s-how.htm|title=Graph Convolutional Networks are Bringing Emotion Recognition Closer to Machines. Here's how.|date=2019-11-26|publisher=Tech Times|access-date=February 25, 2020}} (DialogueGCN{{cite conference|last1=Ghosal|first1=Deepanway|last2=Majumder|first2=Navonil|last3=Poria|first3=Soujanya|date=Aug 2019|title=DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation|conference=Conference on Empirical Methods in Natural Language Processing (EMNLP)}}), and attention-gated hierarchical memory networks.{{cite arXiv|last1=Jiao|first1=Wenxiang|last2=Lyu|first2=Michael R.|last3=King|first3=Irwin|date=November 2019|title=Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network|eprint=1911.09075|class=cs.CL}} Most contemporary methods for ERC are based on deep learning and rely on the idea of latent speaker-state modeling.
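
The idea of speaker-state modeling can be illustrated with a deliberately simplified sketch: each speaker has a state vector that is updated whenever that speaker produces an utterance, and the state at each turn is available to an emotion classifier. The code below is an assumption-laden toy, not the mechanism of DialogueRNN or any other published model: it uses a fixed-rate blend where a real system would use a learned recurrent cell, and the function names and the `rate` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def update_state(prev: Vector, features: Vector, rate: float = 0.5) -> Vector:
    # Blend the previous state with the new utterance features.
    # Assumption: a learned recurrent cell (e.g. a GRU) would replace this fixed rule.
    return [(1.0 - rate) * p + rate * f for p, f in zip(prev, features)]


def track_speaker_states(
    turns: List[Tuple[str, Vector]], dim: int, rate: float = 0.5
) -> List[Vector]:
    """Return the speaker's state after each turn (hypothetical helper)."""
    states: Dict[str, Vector] = {}
    snapshots = []
    for speaker, features in turns:
        # A speaker seen for the first time starts from a zero state.
        prev = states.get(speaker, [0.0] * dim)
        states[speaker] = update_state(prev, features, rate)
        snapshots.append(states[speaker])
    return snapshots


# Each turn is (speaker, utterance feature vector); the features are made up.
turns = [
    ("A", [1.0, 0.0]),
    ("B", [0.0, 1.0]),
    ("A", [0.0, 1.0]),
]
snapshots = track_speaker_states(turns, dim=2)
```

In this toy run, speaker A's second state still carries part of A's first utterance, while B's state is unaffected by A's turns; modeling inter-speaker influence would require an additional update that this sketch omits.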

Emotion cause recognition in conversation

A more recent subtask of ERC focuses on recognising emotion cause in conversation.{{Cite journal|last1=Poria|first1=Soujanya|last2=Majumder|first2=Navonil|last3=Hazarika|first3=Devamanyu|last4=Ghosal|first4=Deepanway|last5=Bhardwaj|first5=Rishabh|last6=Jian|first6=Samson Yu Bai|last7=Hong|first7=Pengfei|last8=Ghosh|first8=Romila|last9=Roy|first9=Abhinaba|last10=Chhaya|first10=Niyati|last11=Gelbukh|first11=Alexander|date=2021-09-13|title=Recognizing Emotion Cause in Conversations|url=https://doi.org/10.1007/s12559-021-09925-7|journal=Cognitive Computation|volume=13|issue=5|pages=1317–1332|language=en|doi=10.1007/s12559-021-09925-7|s2cid=229349214|issn=1866-9964|arxiv=2012.11820}} Methods for this task rely on question-answering mechanisms based on language models. RECCON is one of the key datasets for this task.
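
One way to picture the question-answering framing is to turn a target utterance and its emotion label into a question, with the conversation so far as the passage in which an extractive model would look for the cause span. The sketch below only builds that question and context pair. The question wording, the function name `build_cause_query`, and the choice to restrict the context to utterances up to the target are assumptions made for illustration, not the template used by RECCON or any published system.

```python
from typing import Dict, List, Tuple


def build_cause_query(
    conversation: List[Tuple[str, str]], target_index: int, emotion: str
) -> Dict[str, str]:
    """Build a question/context pair for a QA model (hypothetical helper)."""
    speaker, text = conversation[target_index]
    # Assumption: the cause is searched for in the history up to and
    # including the target utterance, not in later turns.
    history = conversation[: target_index + 1]
    context = " ".join(f"{s}: {t}" for s, t in history)
    # Assumption: an illustrative question template, not a published one.
    question = f'Why does {speaker} feel {emotion} when saying "{text}"?'
    return {"question": question, "context": context}


conversation = [
    ("A", "I lost my keys again."),
    ("B", "Oh no, that is annoying."),
    ("A", "I know, I am so frustrated."),
]
query = build_cause_query(conversation, target_index=2, emotion="anger")
```

The resulting dictionary could then be passed to any extractive question-answering model, which would be expected to return a span of the context (here, plausibly the first utterance) as the emotion cause.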

References
