This research examines why some language models improve with reinforcement learning while others don't, focusing on four key cognitive behaviors: verification, backtracking, subgoal setting, and backward chaining. The study finds that priming models with examples that exhibit these reasoning behaviors enables substantial improvement under RL, even when the priming examples contain incorrect solutions; the presence of the reasoning patterns matters more than answer correctness.
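
To make the idea of "priming" concrete, the sketch below shows one way such data could be assembled: short reasoning traces that demonstrate the four behaviors, formatted as prompt/completion pairs for supervised fine-tuning before RL. This is a minimal illustrative sketch, not the paper's actual pipeline; the `PrimingExample` class, the toy arithmetic problem, and the deliberately wrong final answer are all assumptions made here for illustration.

```python
# Hypothetical sketch: assembling priming data whose reasoning traces exhibit
# verification, backtracking, subgoal setting, and backward chaining.
# Names, the example problem, and the trace are illustrative assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class PrimingExample:
    problem: str
    trace: str            # reasoning trace, possibly ending in a wrong answer
    behaviors: List[str]  # which cognitive behaviors the trace demonstrates


# A toy example; the final answer is intentionally wrong to reflect the
# finding that the behaviors matter more than answer correctness.
example = PrimingExample(
    problem="Use 3, 5, and 8 to reach 21.",
    trace=(
        "Subgoal: first try to reach a multiple of 3.\n"              # subgoal setting
        "Working backward: 21 = 3 * 7, so I need 7 from 5 and 8.\n"   # backward chaining
        "Try 8 - 5 = 3; check: 3 * 3 = 9, not 21, so that fails.\n"   # verification
        "Backtrack: instead try 5 + 8 = 13, then 13 + 3 = 16.\n"      # backtracking
        "Answer: 16."                                                  # incorrect, kept anyway
    ),
    behaviors=["subgoal setting", "backward chaining", "verification", "backtracking"],
)


def to_sft_record(ex: PrimingExample) -> dict:
    """Format a priming example as a prompt/completion pair for fine-tuning."""
    return {"prompt": ex.problem, "completion": ex.trace}


if __name__ == "__main__":
    print(to_sft_record(example))
```

In this framing, a model would be fine-tuned on records like the one above and then trained with RL; the sketch only illustrates what "behavior-containing but possibly incorrect" priming data might look like.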