venturebeat
Nvidia debuts Nemotron 3 with hybrid MoE and Mamba-Transformer to drive efficient agentic AI

Nvidia launched Nemotron 3, the new version of its frontier models, leaning on a model architecture that the world's most valuable company said offers more accuracy and reliability for agents.

Nemotron 3 will be available in three sizes: Nemotron 3 Nano, a 30B-parameter model mainly for targeted, highly efficient tasks; Nemotron 3 Super, a 100B-parameter model for multi-agent applications and high-accuracy reasoning; and Nemotron 3 Ultra, a large reasoning engine of around 500B parameters for more complex applications.

To build the Nemotron 3 models, Nvidia said it leaned into a hybrid mixture-of-experts (MoE) architecture to improve scalability and efficiency. By using this architecture, Nvidia said in a press release, its new models also offer enterprises more openness and performance when building multi-agent autonomous systems.

Kari Briski, Nvidia vice president for generative AI software, told reporters in a briefing that the company wanted to demonstrate its commitment to learning and improving from previous iterations of its models.

“We believe that we are uniquely positioned to serve a wide range of developers who want full flexibility to customize models for building specialized AI by combining that new hybrid mixture of our mixture of experts architecture with a 1 million token context length,” Briski said.

Nvidia said early adopters of the Nemotron 3 models include Accenture, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens and Zoom.

Breakthrough architectures

Nvidia has been using the hybrid Mamba-Transformer mixture-of-experts architecture for many of its models, including Nemotron-Nano-9B-v2. The architecture is based on research from Carnegie Mellon University and Princeton; it weaves in selective state-space models that handle long stretches of information while maintaining state, which keeps compute costs down even over long contexts.

Nvidia noted its design “achieves up to 4x higher token throughput” compared to Nemotron 2 Nano and can significantly lower inference costs by reducing reasoning token generation by up to 60%.

“We really need to be able to bring that efficiency up and the cost per token down. And you can do it through a number of ways, but we're really doing it through the innovations of that model architecture,” Briski said. “The hybrid Mamba transformer architecture runs several times faster with less memory, because it avoids these huge attention maps and key value caches for every single token.”

Nvidia also introduced an additional innovation for the Nemotron 3 Super and Ultra models. For these, Briski said, Nvidia deployed “a breakthrough called latent MoE.”

“That’s all these experts that are in your model share a common core and keep only a small part private. It’s kind of like chefs sharing one big kitchen, but they need to get their own spice rack,” Briski added.

Nvidia is not the only company that employs this kind of architecture to build models. AI21 Labs uses it for its Jamba models, most recently in its Jamba Reasoning 3B model.

The Nemotron 3 models also benefited from extended reinforcement learning. The larger models, Super and Ultra, used the company’s 4-bit NVFP4 training format, which allows them to train on existing infrastructure without compromising accuracy.

Benchmark testing from Artificial Analysis placed the Nemotron models highly among models of similar size.
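To make the architecture concrete, below is a minimal sketch of a hybrid Mamba-Transformer stack in PyTorch. It illustrates only the general pattern Briski describes (state-space layers that carry a fixed-size running state, interleaved with occasional attention layers); the class names, layer ratio, dimensions, and the simplified state update are all assumptions, not Nvidia's actual Nemotron 3 design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSMBlock(nn.Module):
    """Stand-in for a selective state-space (Mamba-style) layer.

    Per-token inference cost is constant in sequence length: it updates a
    fixed-size running state rather than attending over a growing key/value
    cache, which is where the speed and memory savings come from.
    """
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.state_proj = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_state, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Real Mamba uses input-dependent (selective)
        # state transitions; a running cumulative sum stands in here to keep
        # the sketch short while preserving the "carry a state forward" shape.
        h = torch.cumsum(self.state_proj(F.silu(self.in_proj(x))), dim=1)
        return x + self.out_proj(h)

class HybridStack(nn.Module):
    """Mostly SSM layers, with an attention layer every `attn_every` blocks."""
    def __init__(self, d_model: int, n_layers: int, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            if (i + 1) % attn_every == 0 else SSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 256)                          # (batch, seq_len, d_model)
print(HybridStack(d_model=256, n_layers=8)(x).shape)  # torch.Size([2, 128, 256])
```

Briski's "shared kitchen, private spice rack" description of latent MoE maps naturally onto a feed-forward block where one large shared core serves every expert and each expert adds only a small private branch. The sketch below is a hypothetical reading of that description; the top-1 routing, the dense-then-mask dispatch, and all sizes are illustrative, not a published Nvidia formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoE(nn.Module):
    """Hypothetical "latent MoE" block: experts share one large common core
    (the kitchen) and keep only a small private branch (their spice rack)."""
    def __init__(self, d_model: int, d_shared: int, d_private: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        # Shared core used by every expert (the bulk of the parameters).
        self.shared_up = nn.Linear(d_model, d_shared)
        self.shared_down = nn.Linear(d_shared, d_model)
        # Small per-expert private branches.
        self.private_up = nn.ModuleList(
            nn.Linear(d_model, d_private) for _ in range(n_experts))
        self.private_down = nn.ModuleList(
            nn.Linear(d_private, d_model) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Top-1 routing for brevity; production MoE uses softer, load-balanced routing.
        expert_idx = self.router(x).argmax(dim=-1)             # (batch, seq)
        out = x + self.shared_down(F.silu(self.shared_up(x)))  # common core
        # Dense-then-mask is wasteful but keeps the sketch simple; real kernels
        # dispatch each token only to its selected expert.
        for e in range(len(self.private_up)):
            mask = (expert_idx == e).unsqueeze(-1)             # (batch, seq, 1)
            out = out + mask * self.private_down[e](F.silu(self.private_up[e](x)))
        return out

moe = LatentMoE(d_model=256, d_shared=1024, d_private=64, n_experts=8)
print(moe(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```

The NVFP4 training format can be illustrated in the same spirit: a 4-bit floating-point format stores values on a tiny grid of representable magnitudes, with a per-block scale absorbing the dynamic range. The block size and scale handling below are simplified assumptions, not Nvidia's actual NVFP4 recipe.

```python
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def fake_quantize_fp4(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Quantize-dequantize x in blocks of `block` elements (assumes numel % block == 0)."""
    flat = x.reshape(-1, block)
    # One scale per block absorbs the dynamic range before snapping to the grid.
    scale = (flat.abs().amax(dim=1, keepdim=True) / FP4_GRID.max()).clamp(min=1e-12)
    scaled = flat / scale
    # Snap each magnitude to the nearest representable FP4 value, keeping the sign.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return (scaled.sign() * FP4_GRID[idx] * scale).reshape(x.shape)

w = torch.randn(4, 32)
print((w - fake_quantize_fp4(w)).abs().max())  # worst-case rounding error stays small
```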
New environments for models to ‘work out’

As part of the Nemotron 3 launch, Nvidia will also give users access to its research by releasing papers and sample prompts, offering open datasets where people can use and inspect pre-training tokens and post-training samples, and, most importantly, providing a new NeMo Gym where customers can let their models and agents “work out.”

The NeMo Gym is a reinforcement learning lab where users can run their models in simulated environments to test their post-training performance. AWS announced a similar tool through its Nova Forge platform, targeted at enterprises that want to test their newly created distilled or smaller models.

Briski said the samples of post-training data Nvidia plans to release “are orders of magnitude larger than any available post-training data set and are also very permissive and open.”

As the basis for releasing more information about how it trains its models, Nvidia pointed to developers who seek highly intelligent, performant open models and who want to better understand how to guide them if needed.

“Model developers today hit this tough trifecta. They need to find models that are ultra open, that are extremely intelligent and are highly efficient,” she said. “Most open models force developers into painful trade-offs between efficiencies like token costs, latency, and throughput.”

She said developers want to know how a model was trained, where the training data came from and how they can evaluate it.
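Nvidia has not published the NeMo Gym API here, but the workflow the article describes (letting an agent act inside a simulated environment and scoring its episodes to gauge post-training behavior) reduces to a loop like the toy sketch below. The environment, the tool-choice task, and the reward are entirely hypothetical stand-ins, not Nvidia's interface.

```python
import random

TOOLS = ["search", "code", "math"]

class ToolChoiceEnv:
    """Toy simulated environment: each episode, the agent must name the right tool."""
    def reset(self) -> str:
        self.target = random.choice(TOOLS)
        return f"Task requires the {self.target} tool."

    def step(self, action: str) -> tuple[float, bool]:
        # Reward 1.0 for picking the correct tool; one step per episode.
        return (1.0 if action == self.target else 0.0), True

def random_agent(observation: str) -> str:
    # Stand-in policy; a real run would plug in the post-trained model here.
    return random.choice(TOOLS)

def evaluate(env: ToolChoiceEnv, agent, episodes: int = 1000) -> float:
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            reward, done = env.step(agent(obs))
            total += reward
    return total / episodes

# A random policy scores about 1/3 here; a well post-trained agent approaches 1.0.
print(f"mean episode reward: {evaluate(ToolChoiceEnv(), random_agent):.2f}")
```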

We have discovered tools similar to the one you are looking for. Check out our suggestions for similar AI tools.

venturebeat
'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer

IBM today announced the release of Granite 4.0 (https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models), the ne [...]

Match Score: 485.88

venturebeat
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages

When the transformer architecture was introduced in 2017 in the now seminal Google paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) [...]

Match Score: 279.53

venturebeat
CrowdStrike & NVIDIA’s open source AI gives enterprises the edge against

Every SOC leader knows the feeling: drowning in alerts, blind to the real threat, stuck playing defense in a war waged at the speed of AI. Now [...]

Match Score: 153.05

venturebeat
IBM's open source Granite 4.0 Nano AI models are small enough to run locally

In an industry where model size is often seen as a proxy for intelligence, IBM is charting a different course, one that values efficiency over enormity, and acc [...]

Match Score: 133.03

venturebeat
Hybrid cloud security must be rebuilt for an AI war it was never designed to

Hybrid cloud security was built before the current era of automated, machine-based cyberattacks that take just milliseconds to execute and minutes to deliver devastating impacts to infrastruc [...]

Match Score: 130.15

venturebeat
Arcee aims to reboot U.S. open source AI with new Trinity models released under

For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs inc [...]

Match Score: 113.54

NVIDIA GeForce RTX 5090 review: Pure AI excess for $2,000

A $2,000 video card for consumers shouldn't exist. The [...]

Match Score: 94.63

venturebeat
Agentic AI security breaches are coming: 7 ways to make sure it's not your

AI agents – task-specific models designed to operate autonomously or semi-autonomously given instructions – are being widely implemented across enterprises (up to 79% of all surveyed for [...]

Match Score: 91.50

How to buy a GPU in 2025

One of the trickiest parts of any new computer build or upgrade is finding the right video card. In a gaming PC, the GPU is easily the most important component, and you can hamstring your exp [...]

Match Score: 82.20