LANGUAGE MODELS THAT THINK, CHAT BETTER
Reinforcement Learning with Model-rewarded Thinking (RLMT) is a new technique that trains language models to explicitly generate detailed chain-of-thought reasoning before answering, which improves their performance on open-ended chat and creative tasks.