Value learning

{{Short description|Research area in artificial intelligence}}

{{self-published |date=June 2025}}

Value learning is a research area within artificial intelligence (AI) and AI alignment that focuses on building systems capable of inferring, acquiring, or learning human values, goals, and preferences from data, behavior, and feedback. The aim is to ensure that advanced AI systems act in ways that are beneficial and aligned with human well-being, even in the absence of explicitly programmed instructions.{{cite book |last=Russell |first=Stuart |title=Human Compatible: Artificial Intelligence and the Problem of Control |publisher=Viking |year=2019}}{{cite arXiv |title=Reward Models in Deep Reinforcement Learning: A Survey |date=June 2025|eprint=2506.09876 |last1=Xu |first1=Jisheng |last2=Lin |first2=Ding |last3=Fong |first3=Pangkit |last4=Fang |first4=Chongrong |last5=Duan |first5=Xiaoming |last6=He |first6=Jianping |class=cs.RO }}

Unlike traditional AI systems that are optimized purely for task performance, value learning aims to ensure that AI decisions are also ethically and socially acceptable. The approach has been compared to teaching a child right from wrong: guiding an AI to recognize which actions align with human moral standards and which do not. The process typically involves identifying relevant values (such as safety or fairness), collecting data that reflects those values, training models to learn appropriate responses, and iteratively refining their behavior through feedback and evaluation. Applications include minimizing harm in autonomous vehicles, promoting fairness in financial systems, prioritizing patient well-being in healthcare, and respecting user preferences in digital assistants. Compared with earlier techniques, value learning shifts the focus from functionality alone to the reasons underlying choices, with the goal of aligning machine behavior with human ethical expectations.{{cite web |title=What is Value Learning? |url=https://www.byteplus.com/en/what-is/value-learning |website=BytePlus |publisher=BytePlus |access-date=28 June 2025}}

Motivation

The motivation for value learning stems from the observation that humans are often inconsistent, unaware, or imprecise about their own values. Hand-coding a complete ethical framework into an AI is considered infeasible due to the complexity of human norms and the unpredictability of future scenarios. Value learning offers a dynamic alternative, allowing AI to infer and continually refine its understanding of human values from indirect sources such as behavior, approval signals, and comparisons.{{Cite conference |last=Ng |first=Andrew Y. |author2=Stuart Russell |title=Algorithms for Inverse Reinforcement Learning |conference=Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000) |year=2000 |location=Stanford, CA, USA |publisher=Morgan Kaufmann |pages=663–670 |url=https://www.cs.cmu.edu/~bziebart/icml15-irl.pdf}}{{Cite conference |last=Christiano |first=Paul F. |author2=Jan Leike |author3=Tom B. Brown |author4=Miljan Martic |author5=Shane Legg |author6=Dario Amodei |title=Deep Reinforcement Learning from Human Preferences |conference=Advances in Neural Information Processing Systems 30 (NeurIPS 2017) |year=2017 |publisher=Curran Associates, Inc. |pages=4299–4307 |arxiv=1706.03741 }}

In a 2011 paper for the Machine Intelligence Research Institute, Daniel Dewey offers a foundational critique of traditional reinforcement learning (RL) as a basis for aligning artificial general intelligence (AGI) with human values. He argues that RL systems optimize fixed reward signals, which can incentivize harmful or deceptive behavior if such actions increase reward. As an alternative, Dewey proposes value-learning agents that maintain uncertainty over utility functions and update their beliefs through interaction. Such agents aim not to maximize a static reward but to infer what humans truly value. This probabilistic framework enables adaptive alignment with complex, initially unspecified goals and is presented as a foundational step toward safer AGI.{{cite web |last=Dewey |first=Daniel |title=Learning What to Value |url=https://intelligence.org/files/LearningValue.pdf |website=Machine Intelligence Research Institute |publisher=MIRI |year=2011 |access-date=28 June 2025}}

The growing importance of value learning is reflected in how AI products are increasingly evaluated and marketed. A notable shift occurred with the release of GPT-4 in March 2023, when OpenAI emphasized not just technical improvements but also enhanced alignment with human values. This marked one of the first instances where a commercial AI product was promoted based on ethical considerations. The trend signals a broader transformation in AI development—prioritizing principles like fairness, accountability, safety, and privacy alongside performance. As AI systems become more integrated into society, aligning them with human values is critical for public trust and responsible deployment.{{cite web |last=Abernethy |first=Jacob |last2=Candelon |first2=François |last3=Evgeniou |first3=Theodoros |last4=Gupta |first4=Abhishek |last5=Lostanlen |first5=Yves |date=March 2024 |title=Bring Human Values to AI |url=https://hbr.org/2024/03/bring-human-values-to-ai |access-date=28 June 2025 |website=Harvard Business Review}}

Key approaches

One central technique is inverse reinforcement learning (IRL), which aims to recover a reward function that explains observed behavior. IRL assumes that the observed agent acts (approximately) optimally and infers the underlying preferences from its choices.{{Cite conference |last=Ng |first=Andrew Y. |author2=Stuart Russell |title=Algorithms for Inverse Reinforcement Learning |conference=Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000) |year=2000 |location=Stanford, CA, USA |publisher=Morgan Kaufmann |pages=663–670 |url=https://www.cs.cmu.edu/~bziebart/icml15-irl.pdf}}{{Cite journal |title=Advances and applications in inverse reinforcement learning: a comprehensive review |journal=Neural Computing and Applications |date=26 March 2025 |volume=37 |pages=11071–11123 |doi=10.1007/s00521-025-11100-0 |url=https://link.springer.com/article/10.1007/s00521-025-11100-0 |access-date=24 June 2025 |last1=Deshpande |first1=Saurabh |last2=Walambe |first2=Rahee |last3=Kotecha |first3=Ketan |last4=Selvachandran |first4=Ganeshsree |last5=Abraham |first5=Ajith |issue=17 }}
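
The following is a minimal sketch of a likelihood-based view of IRL on a hypothetical five-state chain world: candidate reward functions are scored by how well a Boltzmann-rational policy under each candidate explains an expert's observed actions. The environment, demonstrations, and rationality parameter are illustrative assumptions, not taken from the cited works.

<syntaxhighlight lang="python">
# Illustrative only: a five-state chain world, a Boltzmann-rational expert,
# and candidate one-hot goal rewards (not from the cited papers).
import numpy as np

N_STATES, ACTIONS, GAMMA, BETA = 5, (-1, +1), 0.9, 5.0

def step(s, a):
    """Deterministic chain transitions, clipped at the ends."""
    return min(max(s + a, 0), N_STATES - 1)

def q_values(reward):
    """Value iteration for Q(s, a) under a candidate per-state reward."""
    V = np.zeros(N_STATES)
    for _ in range(200):
        Q = np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)] for a in ACTIONS]
                      for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q

def log_likelihood(reward, demos):
    """Log-probability of expert (state, action_index) pairs under a Boltzmann policy."""
    Q = q_values(reward)
    log_policy = BETA * Q - np.log(np.exp(BETA * Q).sum(axis=1, keepdims=True))
    return sum(log_policy[s, a_idx] for s, a_idx in demos)

# Expert demonstrations: always moving right, consistent with a goal at state 4.
demos = [(0, 1), (1, 1), (2, 1), (3, 1)]

# Candidate rewards place +1 on a single goal state; pick the most likely one.
candidates = [np.eye(N_STATES)[g] for g in range(N_STATES)]
best = max(candidates, key=lambda r: log_likelihood(r, demos))
print("inferred goal state:", int(best.argmax()))  # expected: 4
</syntaxhighlight>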

Cooperative inverse reinforcement learning (CIRL) extends IRL by modeling the AI and the human as cooperative agents with asymmetric information: the human knows the reward function, while the AI does not. The AI observes the human to learn their hidden reward function and chooses actions that support their shared objective.{{Cite conference |last=Hadfield-Menell |first=Dylan |author2=Anca Dragan |author3=Pieter Abbeel |author4=Stuart Russell |title=Cooperative Inverse Reinforcement Learning |conference=Proceedings of the 30th International Conference on Neural Information Processing Systems (NeurIPS 2016) |date=5 December 2016 |pages=3916–3924 |publisher=Curran Associates, Inc. |url=https://dl.acm.org/doi/10.5555/3157382.3157535 |access-date=24 June 2025}}{{cite conference |last=Malik |first=Dhruv |display-authors=etal |title=An Efficient, Generalized Bellman Update for Cooperative Inverse Reinforcement Learning |conference=Proceedings of the 35th International Conference on Machine Learning (ICML 2018) |year=2018}}
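
The toy sketch below illustrates only the information structure that CIRL formalizes, under invented payoffs: the human knows a hidden preference, the observing agent performs a Bayesian update after seeing one human action, and then acts to maximize the expected shared reward. The full CIRL formulation instead solves a joint game; this is not the algorithm from the cited papers.

<syntaxhighlight lang="python">
# Illustrative only: two possible hidden preferences, one observed human action,
# and invented payoffs (not the CIRL solution method from the cited papers).
import numpy as np

THETAS = ["prefers_apples", "prefers_oranges"]   # possible hidden human preferences
ACTIONS = ["pick_apple", "pick_orange"]          # same choices for human and robot
BETA = 3.0                                       # human rationality parameter

def reward(theta, action):
    """Shared reward: +1 when the chosen item matches the human's preference."""
    return 1.0 if THETAS.index(theta) == ACTIONS.index(action) else 0.0

def human_policy(theta):
    """Boltzmann-rational human: usually picks the item they actually prefer."""
    logits = BETA * np.array([reward(theta, a) for a in ACTIONS])
    return np.exp(logits) / np.exp(logits).sum()

def robot_posterior(prior, observed_action):
    """Bayesian update over the hidden preference after one observed human action."""
    idx = ACTIONS.index(observed_action)
    likelihood = np.array([human_policy(t)[idx] for t in THETAS])
    post = prior * likelihood
    return post / post.sum()

def robot_action(posterior):
    """Robot maximizes expected shared reward under its belief."""
    expected = [sum(p * reward(t, a) for p, t in zip(posterior, THETAS)) for a in ACTIONS]
    return ACTIONS[int(np.argmax(expected))]

posterior = robot_posterior(np.array([0.5, 0.5]), "pick_apple")
print("belief:", dict(zip(THETAS, posterior.round(3))))
print("robot action:", robot_action(posterior))   # expected: pick_apple
</syntaxhighlight>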

Another approach is preference learning, where humans compare pairs of AI-generated behaviors or outputs, and the AI learns which outcomes are preferred. This method underpins successful applications in training language models and robotics.{{cite conference |last=Christiano |first=Paul F. |display-authors=etal |title=Deep reinforcement learning from human preferences |arxiv=1706.03741 |year=2017}}{{cite arXiv |title=Reward Models in Deep Reinforcement Learning: A Survey |eprint=2506.09876 |last1=Xu |first1=Jisheng |last2=Lin |first2=Ding |last3=Fong |first3=Pangkit |last4=Fang |first4=Chongrong |last5=Duan |first5=Xiaoming |last6=He |first6=Jianping |date=2025 |class=cs.RO }}
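
A minimal sketch of preference learning with a Bradley–Terry style reward model is shown below: the probability that outcome A is preferred to outcome B is modeled as the logistic function of r(A) − r(B), and a linear reward is fitted to synthetic pairwise comparisons by gradient descent. The features, labels, and learning rate are illustrative assumptions rather than details of any deployed system.

<syntaxhighlight lang="python">
# Synthetic data only: a hidden "true" weight vector is used to simulate
# human pairwise preferences, then approximately recovered from them.
import numpy as np

rng = np.random.default_rng(0)

true_w = np.array([2.0, -1.0, 0.5])       # hidden weights used only for simulation
A = rng.normal(size=(500, 3))             # features of the first item in each pair
B = rng.normal(size=(500, 3))             # features of the second item

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated annotators prefer A over B with Bradley-Terry probability
# sigmoid(r(A) - r(B)), where r(x) = true_w . x.
prefers_A = rng.random(500) < sigmoid((A - B) @ true_w)

# Fit reward weights by gradient descent on the negative log-likelihood.
w, lr = np.zeros(3), 0.1
for _ in range(2000):
    p = sigmoid((A - B) @ w)                        # predicted P(A preferred over B)
    grad = (A - B).T @ (p - prefers_A) / len(A)     # logistic-loss gradient
    w -= lr * grad

print("recovered weights:", w.round(2))   # approximately true_w, up to noise
</syntaxhighlight>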

A 2025 study introduces a framework for learning human values directly from behavioral data, without relying on predefined models or external annotations. The method distinguishes between value specifications (contextual definitions of individual values) and value systems (an agent's prioritization among those values). In a demonstration on route choice modeling, tailored inverse reinforcement learning techniques are used to infer how agents weigh options such as speed, safety, or scenic routes. The results indicate that value learning from demonstrations can capture complex decision-making preferences, supporting the feasibility of value-aligned AI in applied settings.{{Cite conference |last1=Holgado‑Sánchez |first1=Andrés |last2=Bajo |first2=Javier |last3=Billhardt |first3=Holger |last4=Ossowski |first4=Sascha |last5=Arias |first5=Joaquín |year=2025 |title=Value Learning for Value‑Aligned Route Choice Modeling via Inverse Reinforcement Learning |url=https://hal.science/hal-04627792 |conference=Lecture Notes in Computer Science: Value Engineering in Artificial Intelligence |series=Lecture Notes in Computer Science |publisher=Springer Nature Switzerland |pages=40–60 |doi=10.1007/978-3-031-85463-7_3}}
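
As a simplified illustration of inferring a value system from choices, the sketch below fits weights over hypothetical values (speed, safety, scenery) to observed route selections using a basic multinomial-logit model rather than the tailored IRL machinery of the study; all routes, scores, and candidate weights are invented.

<syntaxhighlight lang="python">
# Invented example: three candidate routes per trip, each scored on three values.
import itertools
import numpy as np

VALUES = ["speed", "safety", "scenery"]

situations = [
    np.array([[0.9, 0.3, 0.2],    # fast but less safe
              [0.5, 0.9, 0.4],    # safer but slower
              [0.4, 0.5, 0.9]]),  # scenic detour
    np.array([[0.8, 0.4, 0.1],
              [0.6, 0.8, 0.3],
              [0.3, 0.6, 0.9]]),
]
observed_choices = [1, 1]   # the commuter picked the safer route both times

def log_likelihood(weights):
    """Log-probability of the observed choices under softmax route utilities."""
    total = 0.0
    for routes, choice in zip(situations, observed_choices):
        utilities = routes @ weights
        total += utilities[choice] - np.log(np.exp(utilities).sum())
    return total

# Grid search over candidate value systems (non-negative weights summing to 1).
grid = [np.array(w) for w in itertools.product(np.linspace(0, 1, 11), repeat=3)
        if abs(sum(w) - 1.0) < 1e-9]
best = max(grid, key=log_likelihood)
print("inferred value system:", dict(zip(VALUES, best.round(2))))
</syntaxhighlight>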

Concept alignment

A major challenge in value learning is ensuring that AI systems interpret human behavior through conceptual models similar to those humans themselves use. Recent research distinguishes between "value alignment" and "concept alignment", the latter referring to the internal representations that humans and machines use to describe the world. Misalignment between conceptual models can lead to serious errors even if the value-inference mechanism itself is accurate.{{cite arXiv |last1=Rane |first1=Sunayana |last2=Ho |first2=Mark K. |last3=Sucholutsky |first3=Ilia |last4=Griffiths |first4=Thomas L. |title=Concept Alignment as a Prerequisite for Value Alignment |year=2023 |class=cs.AI |eprint=2310.20059}}

Challenges

Value learning faces several difficulties:

  • Ambiguity of human behavior – Human actions are noisy, inconsistent, and context-dependent.{{cite web |title=Misspecification in IRL |author=Skalse, Tobias |website=AI Alignment Forum |date=2025 |url=https://www.alignmentforum.org/posts/nXpxTgFy6R4HdRiPC/misspecification-in-irl}}
  • Reward misspecification – The inferred reward may not fully capture human intent, particularly under imperfect assumptions; see the sketch after this list.{{cite arXiv |title=Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment |last1=Zhou |first1=Weichao |last2=Li |first2=Wenchao |eprint=2410.23680 |year=2024|class=cs.LG }}
  • Scalability – Methods that work in narrow domains often struggle with generalization to more complex or ethical environments.{{cite arXiv |title=Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment |last=Cheng |first=Wei |display-authors=etal |year=2025 |class=stat.ML |eprint=2505.09612}}
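
The toy example below illustrates the reward misspecification problem noted above: when a proxy reward omits a relevant term (here, harm), the proxy-optimal behavior can differ sharply from the behavior preferred under the intended reward. The options and weights are invented for illustration.

<syntaxhighlight lang="python">
# Invented numbers: each behaviour is scored on (task_progress, harm_caused).
import numpy as np

options = {
    "cautious":   np.array([0.6, 0.0]),
    "aggressive": np.array([1.0, 0.8]),   # finishes faster but causes harm
}

proxy_w = np.array([1.0, 0.0])    # misspecified reward: ignores harm entirely
true_w  = np.array([1.0, -2.0])   # intended reward: harm is strongly penalized

proxy_best = max(options, key=lambda o: options[o] @ proxy_w)
true_best  = max(options, key=lambda o: options[o] @ true_w)

print("proxy-optimal behaviour:   ", proxy_best)   # aggressive
print("intended-optimal behaviour:", true_best)    # cautious
</syntaxhighlight>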

Research from Purdue University indicates that AI training datasets disproportionately emphasize certain human values, such as utility and information-seeking, while underrepresenting others such as empathy, civic responsibility, and human rights. Applying a value taxonomy grounded in moral philosophy, the researchers found that AI systems trained on these datasets may struggle in morally complex or socially sensitive contexts. To address these gaps, they annotated datasets used for reinforcement learning from human feedback (RLHF) according to the taxonomy, providing a way to audit and guide dataset improvements. The work underscores the importance of comprehensive value representation in training data and contributes tools for more equitable, value-aligned AI development.{{cite web |last=Obi |first=Ike |title=AI datasets have human values blind spots − new research |url=https://theconversation.com/ai-datasets-have-human-values-blind-spots-new-research-246479 |website=The Conversation |publisher=Purdue University and partners |date=6 February 2025 |access-date=28 June 2025}}

Hybrid and cultural approaches

Recent work highlights the importance of integrating diverse moral perspectives into value learning. One framework, HAVA (Hybrid Approach to Value Alignment), incorporates explicit (e.g., legal) and implicit (e.g., social norm) values into a unified reward model.{{cite arXiv |last=Varys |first=Kryspin |title=HAVA: Hybrid Approach to Value Alignment |eprint=2505.15011 |year=2025|class=cs.AI }} Another line of research explores how inverse reinforcement learning can adapt to culturally specific behaviors, such as in the case of "culturally-attuned moral machines" trained on different societal norms.{{cite arXiv |title=Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning |last=Oliveira |first=Nigini |display-authors=etal |year=2023 |class=cs.AI |eprint=2312.04578}}
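
The sketch below conveys only the general idea of a hybrid reward signal, combining hand-written explicit rules with a learned implicit norm score and a task reward. It is a schematic illustration using invented rules and a stub norm model, not the actual HAVA reward model.

<syntaxhighlight lang="python">
# Schematic illustration only; not the HAVA reward model from the cited paper.

def explicit_penalty(action):
    """Hard, hand-written rules: any listed violation is heavily penalized."""
    RULES = {"exceeds_speed_limit": -10.0, "ignores_red_light": -10.0}
    return sum(RULES[v] for v in action.get("violations", []))

def implicit_norm_score(action, norm_model):
    """Soft score from a learned model of social norms (a stub function here)."""
    return norm_model(action["description"])

def hybrid_reward(action, norm_model, task_reward, norm_weight=1.0):
    """Unified reward: task performance plus learned norms plus explicit rules."""
    return task_reward + norm_weight * implicit_norm_score(action, norm_model) \
           + explicit_penalty(action)

# Stand-in norm model that mildly rewards one courteous behaviour.
toy_norm_model = lambda text: 0.5 if "yields to pedestrian" in text else 0.0
action = {"description": "yields to pedestrian then proceeds", "violations": []}
print(hybrid_reward(action, toy_norm_model, task_reward=1.0))   # 1.5
</syntaxhighlight>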

An important global policy initiative supporting the goals of value learning is UNESCO's Recommendation on the Ethics of Artificial Intelligence, adopted unanimously by its 193 member states in November 2021. Although the term "value learning" is not used explicitly, the document emphasizes the need for AI to operationalize values such as human dignity, justice, inclusiveness, sustainability, and human rights. It establishes a global ethical framework grounded in four core values and ten guiding principles, including fairness, transparency, and human oversight. Tools such as the Readiness Assessment Methodology (RAM) and the Ethical Impact Assessment (EIA) help translate these principles into practice.{{cite web |title=Recommendation on the Ethics of Artificial Intelligence |url=https://www.unesco.org/en/artificial-intelligence/recommendation-ethics |website=UNESCO |publisher=UNESCO |access-date=28 June 2025}}

Applications

Value learning is being applied in:

  • Robotics – Teaching robots to cooperate with humans in household or industrial tasks.{{Cite journal |title=Advances and applications in inverse reinforcement learning: a comprehensive review |journal=Neural Computing and Applications |date=26 March 2025 |volume=37 |pages=11071–11123 |doi=10.1007/s00521-025-11100-0 |url=https://link.springer.com/article/10.1007/s00521-025-11100-0 |access-date=24 June 2025 |last1=Deshpande |first1=Saurabh |last2=Walambe |first2=Rahee |last3=Kotecha |first3=Ketan |last4=Selvachandran |first4=Ganeshsree |last5=Abraham |first5=Ajith |issue=17 }}
  • Large language models – Aligning chatbot behavior with user intent using preference feedback and reinforcement learning.{{cite arXiv |title=Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment |last=Cheng |first=Wei |display-authors=etal |year=2025 |class=stat.ML |eprint=2505.09612}}
  • Policy decision-making – Informing AI-assisted decisions in governance, healthcare, and safety-critical environments.{{cite arXiv |last=Varys |first=Kryspin |title=HAVA: Hybrid Approach to Value Alignment |eprint=2505.15011 |year=2025|class=cs.AI }}

See also

References