direct preference optimization

  1. REDIRECT Reinforcement learning from human feedback#Direct preference optimization

{{R to section}}

{{Redirect category shell|

{{R to section}}

}}