Training Reflective Dialogue Agents via Beam Search + DPO + Self-Play Evaluation
I’m excited to share a recent project in which I trained a large language model to act as a reflective therapist: its responses should not only make sense but also prompt deeper, more thoughtful replies from users.

Objective

To train a dialogue agent that prioritizes reflectiveness, learning, and intention, especially in therapeutic or coaching-style settings. Unlike traditional next-token training or preference modeling, the aim was to optimize directly for the quality of the user reflection that the model’s responses prompt.
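To make the objective concrete, here is a minimal sketch of how "quality of user reflection" could be turned into a training signal. It is an illustration under stated assumptions, not the project's actual pipeline: the `Candidate` type, the `toy_reflection_score` keyword heuristic, and the `build_preference_pair` helper are all hypothetical names, and a real setup would presumably score replies with a learned judge rather than keyword counts. The key idea it shows is that each candidate response is ranked by the user reply it elicits, not by the response text itself. The output uses the common prompt/chosen/rejected layout for preference data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    response: str    # the agent's candidate reply
    user_reply: str  # the (possibly simulated) user's follow-up to that reply


def toy_reflection_score(user_reply: str) -> float:
    """Hypothetical stand-in scorer: counts simple reflective markers."""
    markers = ("i realize", "i think", "i feel", "because", "i wonder")
    text = user_reply.lower()
    return float(sum(text.count(m) for m in markers))


def build_preference_pair(
    prompt: str,
    candidates: List[Candidate],
    score: Callable[[str], float] = toy_reflection_score,
) -> Optional[dict]:
    """Pick chosen/rejected responses by the reflection they elicit."""
    if len(candidates) < 2:
        return None
    # Rank by the score of the user's reply, not of the agent's response.
    ranked = sorted(candidates, key=lambda c: score(c.user_reply))
    worst, best = ranked[0], ranked[-1]
    if score(best.user_reply) == score(worst.user_reply):
        return None  # no preference signal when all replies score the same
    return {"prompt": prompt, "chosen": best.response, "rejected": worst.response}


# Made-up example data, for illustration only.
example_pair = build_preference_pair(
    "I can't sleep lately.",
    [
        Candidate("Have you tried going to bed earlier?", "No."),
        Candidate(
            "What do you think keeps your mind busy at night?",
            "I think it is work, because I feel behind.",
        ),
    ],
)
print(example_pair)
```

In this toy example the open question is "chosen" because the reply it draws out contains more reflective markers than the one-word answer to the advice-style response.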