Friendly Wiki
direct preference optimization
#REDIRECT [[Reinforcement learning from human feedback#Direct preference optimization]]
{{Redirect category shell|
{{R to section}}
}}