venturebeat
Bolmo’s architecture unlocks efficient byte‑level LM training without sacrificing quality

Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche and make it practical at scale, the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models built by "byteifying" its Olmo 3 models and reusing their backbone and capabilities. The company launched two versions, Bolmo 7B and Bolmo 1B, which Ai2 calls "the first fully open byte-level language model." The company said the two models performed competitively with, and in some cases surpassed, other byte-level and character-based models.

Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages, and unconventional text more reliably, all key requirements for moderation, edge deployments, and multilingual applications.

For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. Ai2's Bolmo is an attempt to make that approach practical at scale without retraining from scratch.

How Bolmo works and how it was built

Ai2 said it trained the Bolmo models on its Dolma 3 data mix, which also helped train its flagship Olmo models, along with some open code datasets and character-level data.

The company said its goal "is to provide a reproducible, inspectable blueprint for byteifying strong subword language models in a way the community can adopt and extend." To meet this goal, Ai2 will release its checkpoints, code, and a full paper to help other organizations build byte-level models on top of its Olmo ecosystem.

Since training a byte-level model completely from scratch can be expensive, Ai2 researchers instead chose an existing Olmo 3 7B checkpoint to byteify in two stages. In the first stage, Ai2 froze the Olmo 3 transformer and trained only certain new components: the local encoder and decoder, the boundary predictor, and the language modeling head. This stage was designed to be "cheap and fast," requiring just 9.8 billion tokens. The second stage unfreezes the model and trains it on additional tokens (see the illustrative sketch below). Ai2 said the byte-level approach allows Bolmo to avoid the vocabulary bottlenecks that limit traditional subword models.

Strong performance among its peers

Byte-level language models are not as mainstream as small language models or LLMs, but they are a growing area of research. Meta released its BLT architecture research last year, aiming to offer a model that is robust, processes raw data, and doesn't rely on fixed vocabularies. Other research models in this space include ByT5, Stanford's MrT5, and Canine.

Ai2 evaluated Bolmo using its evaluation suite, covering math, STEM reasoning, question answering, general knowledge, and code. Bolmo 7B performed strongly on character-focused benchmarks like CUTE and EXECUTE, and also improved accuracy over the base Olmo 3 model. Bolmo 7B outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding.
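Returning to the two-stage recipe above: a minimal sketch of the stage-1 freezing pattern, written in generic PyTorch, is shown below. The module names, layer choices, and sizes (ByteifiedLM, local_encoder, boundary_predictor, and so on) are illustrative assumptions for this sketch, not Ai2's actual Bolmo code; only the idea of freezing the pretrained trunk while training the new byte-level pieces comes from the article.

```python
import torch
import torch.nn as nn

class ByteifiedLM(nn.Module):
    """Hypothetical byte-level wrapper around a pretrained subword transformer."""
    def __init__(self, backbone: nn.Module, d_model: int = 256, n_bytes: int = 256):
        super().__init__()
        self.byte_embed = nn.Embedding(n_bytes, d_model)    # raw UTF-8 bytes in
        self.local_encoder = nn.TransformerEncoder(         # groups bytes into patches
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
        self.boundary_predictor = nn.Linear(d_model, 1)     # scores patch boundaries
        self.backbone = backbone                            # pretrained Olmo-style trunk
        self.local_decoder = nn.TransformerEncoder(         # expands patches back to bytes
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
        self.lm_head = nn.Linear(d_model, n_bytes)          # next-byte prediction

def stage1_parameters(model: ByteifiedLM):
    """Stage 1: freeze everything, then re-enable only the new byte-level modules."""
    for p in model.parameters():
        p.requires_grad = False
    new_modules = (model.byte_embed, model.local_encoder,
                   model.boundary_predictor, model.local_decoder, model.lm_head)
    params = [p for m in new_modules for p in m.parameters()]
    for p in params:
        p.requires_grad = True
    return params

# Stage 1: optimize only the cheap byte-level adapters around the frozen backbone.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=4)
model = ByteifiedLM(backbone)
stage1_opt = torch.optim.AdamW(stage1_parameters(model), lr=3e-4)

# Stage 2 would flip requires_grad back on everywhere and hand model.parameters()
# to a fresh optimizer, which is why the Olmo 3 checkpoint can be reused rather
# than pretraining from scratch.
```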
Why enterprises may choose byte-level models

Enterprises find value in a hybrid model structure, using a mix of models and model sizes. Ai2 makes the case that organizations should also consider byte-level models, not only for robustness and multilingual understanding, but because the approach "naturally plugs into an existing model ecosystem."

"A key advantage of the dynamic hierarchical setup is that compression becomes a toggleable knob," the company said.

For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, Ai2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.
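The "toggleable knob" idea is easiest to see with a small, hypothetical example. In a dynamic hierarchical setup, a boundary predictor scores each byte and a threshold decides where patches begin: raising the threshold produces fewer, longer patches (more compression, so less backbone compute per byte), while lowering it moves toward byte-level fidelity. The function name, scores, and threshold values below are invented for illustration; the article confirms only that compression is adjustable, not this specific mechanism.

```python
import torch

def patch_starts(scores: torch.Tensor, threshold: float) -> torch.Tensor:
    """Turn per-byte boundary scores into patch-start flags.

    `threshold` acts as the compression knob: higher values mean fewer,
    longer patches and thus fewer positions for the big backbone to process.
    """
    starts = scores > threshold
    starts[..., 0] = True  # the first byte always opens a patch
    return starts

# Toy example: 12 bytes with made-up boundary scores.
scores = torch.tensor([0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.7, 0.2, 0.9, 0.1, 0.1, 0.6])
for tau in (0.5, 0.85):
    n_patches = int(patch_starts(scores, tau).sum())
    print(f"threshold={tau}: {n_patches} patches for 12 bytes "
          f"(~{12 / n_patches:.1f} bytes per patch)")
```

At threshold 0.5 this sketch yields 5 patches (about 2.4 bytes each); at 0.85 it yields 2 patches (6 bytes each), so the same text costs the backbone far fewer positions.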


We have found items similar to what you are looking for. Check out the suggestions below.

venturebeat
Baseten takes on hyperscalers with new AI training platform that lets you o [...]

Baseten (https://www.baseten.co/), the AI infrastructure company recently valued at $2.15 billion, is making its most significant product [...]

Match Score: 131.61

venturebeat
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages [...]

When the transformer architecture was introduced in 2017 in the now seminal Google paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) [...]

Match Score: 92.46

venturebeat
MIT offshoot Liquid AI releases blueprint for enterprise-grade small-model [...]

When Liquid AI, a startup founded by MIT (https://aimmediahouse.com/market-industry/from-worm-brains-to-a-2-billion-ai-unicorn-liquid-ai-defies-conventional-ai-limits) [...]

Match Score: 76.69

venturebeat
Arcee aims to reboot U.S. open source AI with new Trinity models released u [...]

For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs inc [...]

Match Score: 71.69

venturebeat
Meta’s DreamGym framework trains AI agents in a simulated world to cut re [...]

Researchers at Meta, the University of Chicago, and UC Berkeley have developed a new framework that addresses the high costs, infrastructure complexity, and unreliable feedback associated wit [...]

Match Score: 65.95

venturebeat
Nvidia researchers unlock 4-bit LLM training that matches 8-bit performance

Researchers at Nvidia have developed a novel approach (https://arxiv.org/abs/2509.25149) to train large language models (LLMs) in 4-bit qu [...]

Match Score: 63.43

venturebeat
Google’s ‘Nested Learning’ paradigm could solve AI's memory and conti [...]

Researchers at Google have developed a new AI paradigm aimed at solving one of the biggest limitations in today’s large language models: their inability to learn or update their knowledge a [...]

Match Score: 63.26

venturebeat
Nvidia debuts Nemotron 3 with hybrid MoE and Mamba-Transformer to drive eff [...]

Nvidia launched the new version of its frontier models, Nemotron 3, by leaning in on a model architecture that the world’s most valuable company said offers more accuracy and reliability fo [...]

Match Score: 62.05

venturebeat
World's largest open-source multimodal dataset delivers 17x training effici [...]

AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized before models can learn from it in an effective way. [...]

Match Score: 61.65