If you want your bots to misbehave, give them KPIs. “Strikingly, we find that superior reasoning capability does not inherently ensure safety; for instance, Gemini-3-Pro-Preview, one of the most capable models evaluated, exhibits the highest violation rate at 71.4%, frequently escalating to severe misconduct to satisfy KPIs. Furthermore, we observe significant ‘deliberative misalignment’, where the models that power the agents recognize their actions as unethical during separate evaluation.”