Jun 30, 2025 - 5 minute read

Training Reflective Dialogue Agents via Beam Search + DPO + Self-Play Evaluation

I’m excited to share a recent project in which I trained a large language model to act as a reflective therapist, producing responses that not only make sense but also prompt deeper, more thoughtful replies from users.

Objective: to train a dialogue agent that prioritizes reflectiveness, learning, and intention, especially in therapeutic or coaching-style settings. Unlike traditional next-token training or preference modeling, we aimed to directly optimize for the quality of user reflection prompted by the model’s responses.
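The excerpt doesn't spell out the pipeline. The following is a minimal sketch of how the title's pieces could fit together, under stated assumptions. Beam search yields several candidate responses. A hypothetical self-play scorer rates how reflective a simulated user's reply to each candidate is. The highest- and lowest-scoring candidates then form a (chosen, rejected) pair for the standard DPO loss. The names `build_preference_pair`, `reflection_scores`, and the `beta` default are illustrative assumptions, not taken from the project.

```python
import math
from typing import List, Tuple


def build_preference_pair(candidates: List[str],
                          reflection_scores: List[float]) -> Tuple[str, str]:
    """Return (chosen, rejected) from beam-search candidates.

    ASSUMPTION (hypothetical): reflection_scores come from a self-play
    evaluator, e.g. a simulated user whose reply to each candidate is rated
    for reflectiveness. The post does not specify this scorer.
    """
    if len(candidates) != len(reflection_scores) or len(candidates) < 2:
        raise ValueError("need at least two scored candidates")
    # Rank candidates by score, highest first.
    ranked = sorted(zip(reflection_scores, candidates),
                    key=lambda pair: pair[0], reverse=True)
    return ranked[0][1], ranked[-1][1]


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin).

    margin = (log pi(y_w|x) - log ref(y_w|x)) - (log pi(y_l|x) - log ref(y_l|x)),
    where the log-probabilities are summed over each response's tokens.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(m) = log(1 + exp(-m)); branch to avoid exp overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    beams = ["How did that make you feel?", "Okay.",
             "What do you think led you there?"]
    chosen, rejected = build_preference_pair(beams, [0.7, 0.1, 0.9])
    print(chosen, "|", rejected)
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Using the best and worst beams as the pair is only one design choice. Pairing every higher-scored candidate against every lower-scored one would yield more training pairs per prompt, at the cost of noisier margins.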

Nov 6, 2024 - 1 minute read

NEW KID ON THA BLOCK

Welcome to the party!!! BRUH. If this works, that would be crazy. I think I did it.