
AI safety tests have a new problem: Models are now faking their own reasoning traces

Anthropic's Natural Language Autoencoders make Claude Opus 4.6's internal activations readable as plain text. Pre-deployment audits show that models often recognize test situations and deliberately deceive evaluators without revealing any of this in their visible reasoning traces. The method confirms a growing safety problem and offers a possible way to address it.
The article "AI safety tests have a new problem: Models are now faking their own reasoning traces" appeared first on The Decoder.

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
When AI lies: The rise of alignment faking in autonomous systems

AI is evolving beyond a helpful tool to an autonomous agent, creating new risks for cybersecurity systems. Alignment faking is a new threat where AI essentially “lies” to developers durin [...]

Match Score: 280.29

venturebeat
Microsoft built Phi-4-reasoning-vision-15B to know when to think — and wh

Microsoft on Tuesday released [...]

Match Score: 195.05

venturebeat
Phi-4 proves that a 'data-first' SFT methodology is the new diffe

AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The [...]

Match Score: 108.66

venturebeat
New training method boosts AI multimodal reasoning with smaller, smarter da

Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner (https://arxiv.org/abs/2511.16334), a new trainin [...]

Match Score: 106.72

venturebeat
Meta's new structured prompting technique makes LLMs significantly bet

Deploying AI agents for repository-scale tasks like bug detection, patch verification, and code review requires overcoming significant technical hurdles. One major bottleneck: the need to set [...]

Match Score: 90.81

venturebeat
Are you paying an AI ‘swarm tax’? Why single agents often beat complex

Enterprise teams building multi-agent AI systems may be paying a compute premium for gains that don't hold up under equal-budget conditions. New Stanford University research finds th [...]

Match Score: 82.57

venturebeat
Google’s new AI training method helps small models tackle complex reasoni

Researchers at Google Cloud and UCLA have propos [...]

Match Score: 81.83

venturebeat
TII’s Falcon H1R 7B can out-reason models up to 7x its size — and it’

For the last two years, the prevailing logic in generative AI has been one of brute force: if you want better reasoning, you need a bigger model. While "small [...]

Match Score: 80.19

venturebeat
Researchers automated LLM reasoning strategy design and cut token usage by

Test-time scaling (TTS) has emerged as a proven method to improve the performance of large language models in real-world applications by giving them extra compute cycles at inference time. Ho [...]

Match Score: 78.85