Group Relative Policy Optimization

#REDIRECT [[Policy gradient method#Group Relative Policy Optimization (GRPO)]]

{{Redirect category shell|

{{R from subtopic}}

}}