FrontierNews.ai

Cohere's New 218B-Parameter Model Runs on Just Two GPUs, Reshaping Enterprise AI Economics

Cohere has released Command A+, a 218-billion-parameter sparse mixture-of-experts (MoE) model designed to run enterprise AI workflows on as few as two high-end GPUs, marking a significant shift in how organizations can deploy advanced reasoning capabilities without massive infrastructure costs. The open-source model, available under an Apache 2.0 license, unifies capabilities from four separate models into a single architecture optimized for reasoning, agent-based tasks, document processing, and multilingual support.

What Makes Command A+ Different From Previous Enterprise Models?

The key innovation lies in how Command A+ handles computation. While the model contains 218 billion total parameters, only 25 billion are active for any given token. This sparse design means the model routes each token through just eight expert sub-networks out of 128 available experts, rather than processing it through the entire parameter set. As a result, per-token compute scales with the 25 billion active parameters, roughly 11% of the total, rather than with the full 218 billion.
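To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It illustrates the general sparse MoE technique only: the 128-expert and 8-expert figures come from the article, while the function names, the toy scalar "experts", and the choice to apply softmax over the selected logits are assumptions, not details of Cohere's implementation.

```python
import math
import random

NUM_EXPERTS = 128  # experts available (figure from the article)
TOP_K = 8          # experts each token is routed through (figure from the article)


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Assumption: weights are a softmax over the selected logits only.
    Real routers may normalize differently.
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = router_logits[top[0]]
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(token, experts, router_logits, k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](token) for i, w in route_token(router_logits, k))


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Hypothetical toy experts: expert i simply scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(len(chosen), "of", NUM_EXPERTS, "experts run for this token")
    print("layer output:", moe_layer(1.0, experts, logits))
```

The other 120 experts are never called for that token, which is where the compute savings come from. All 128 still have to sit in memory, which is why quantization matters for the GPU count.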

The model ships in three quantization variants. Quantization is a compression technique that stores a model's numbers at lower precision, shrinking its memory footprint and computational demands. The most aggressive variant, W4A4 (4-bit weights and 4-bit activations), requires just two NVIDIA H100 GPUs, the same hardware tier many enterprises already use for other AI workloads. For comparison, the previous Command A Reasoning model needed four H100s at the same quantization level. Halving the hardware requirement could translate to significant cost savings for organizations deploying multiple instances across teams or regions.
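A rough back-of-the-envelope calculation shows why precision drives the GPU count. The sketch below estimates the memory needed for the weights alone at each precision and rounds up to a power-of-two number of GPUs. The 80 GB per-GPU figure, the power-of-two rounding, and the decision to ignore KV cache, activations, and runtime overhead are all assumptions made for illustration; this is not Cohere's sizing method.

```python
import math

TOTAL_PARAMS = 218e9  # total parameter count (figure from the article)
GPU_MEMORY_GB = 80    # assumption: the 80 GB H100 variant
BITS_PER_WEIGHT = {"W4A4": 4, "FP8": 8, "BF16": 16}


def weight_footprint_gb(bits, params=TOTAL_PARAMS):
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def min_gpus(bits, gpu_gb=GPU_MEMORY_GB):
    """Smallest power-of-two GPU count whose combined memory holds the weights.

    Assumption: deployments shard across 1, 2, 4, 8, ... GPUs.
    """
    needed = math.ceil(weight_footprint_gb(bits) / gpu_gb)
    return 1 << (needed - 1).bit_length()


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_footprint_gb(bits):.0f} GB of weights, "
              f"at least {min_gpus(bits)} GPUs")
```

Under these assumptions the estimate comes out to two GPUs for W4A4, four for FP8, and eight for BF16. Real deployments also need headroom for the KV cache and activations, so treat the result as a lower bound rather than a sizing guide.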

How Does Command A+ Perform on Real Enterprise Tasks?

Cohere tested the model on benchmarks designed to reflect actual business use cases. On a telecommunications reasoning benchmark, Command A+ improved from 37% to 85% accuracy compared to its predecessor. For agentic coding tasks, performance jumped from 3% to 25%. On internal enterprise evaluations measuring how well the model answers business questions using cloud file systems, accuracy improved by 20%. When analyzing spreadsheets, quality improved by 32%, and the model's ability to remember and apply information from previous conversations scored 54% versus 39% with the older model.

The model also delivers speed improvements that matter in production environments. At matching quantization and concurrency levels, Command A+ produces up to 63% more output tokens per second and reduces time to first token by up to 17%. The aggressive W4A4 quantization adds another 47% speed increase and 13% latency reduction. For organizations running customer-facing applications, these latency improvements translate to noticeably faster response times.

Steps to Evaluate Command A+ for Your Enterprise Deployment

  • Assess Hardware Compatibility: Verify your infrastructure can support either two H100 GPUs for W4A4 quantization, four H100s for FP8, or eight H100s for full precision BF16. Command A+ also runs on NVIDIA's newer B200 GPUs with proportionally lower requirements.
  • Test on Your Specific Workflows: Run pilot projects on your actual use cases, whether that's customer service automation, document analysis, or code generation, since benchmark improvements don't always translate uniformly across different business domains.
  • Evaluate Multimodal Capabilities: If your workflows involve processing images alongside text, test Command A+ on your document types, as it's Cohere's first multimodal reasoning model and may handle your specific visual content differently than text-only predecessors.
  • Plan for Multilingual Support: If your organization operates globally, Command A+ now supports 48 languages, up from 23 in previous versions, with particular improvements in Arabic, Korean, and Japanese tokenization efficiency.
  • Establish Governance Frameworks: Before deploying at scale, define how your organization will handle tool use, reasoning transparency, and output validation, as the model generates visible thinking traces that require interpretation protocols.

The model's multimodal capabilities represent another significant expansion. Command A+ scored 63% on a challenging multimodal reasoning benchmark and 75.1% on a broader multimodal benchmark, compared to 65.3% for the previous vision-focused model. Math reasoning improved from 73.5% to 80.6%, and document understanding improved from 46.9% to 52.7%. These gains matter for enterprises processing mixed-media documents, such as insurance claims with photos and text, or technical documentation with diagrams.

How Does This Fit Into the Broader Enterprise AI Consolidation Trend?

Cohere's release arrives at a critical moment in enterprise AI adoption. While 88% of organizations now use AI in at least one business function, only 5.5% qualify as "AI high performers," meaning they attribute more than 5% of earnings to generative AI. The gap between spending and results remains stark, with 95% of AI pilots producing no measurable business value.

The enterprise LLM market is consolidating rapidly. The top five providers, including Cohere, held 78% of enterprise market share in 2024, signaling that the era of unlimited model proliferation is ending. However, 37% of enterprises now run five or more models simultaneously, creating governance complexity and technical debt. Enterprise LLM API spending more than doubled from $3.5 billion to $8.4 billion in less than a year, indicating organizations are spending faster than they're strategizing.

"Organizations that have invested in model-agnostic orchestration layers, where business logic, memory and workflow live independently of any single provider, will be far better positioned to switch or blend models as the market shifts," noted Mayur Khandelwal, Vice President at EXL.

This consolidation pattern mirrors what happened in cloud infrastructure and databases. In 2008, the cloud market looked fragmented with dozens of providers. By 2015, three providers dominated. The same arc is playing out with LLMs, where enterprises increasingly favor vendors offering deep enterprise relationships, surrounding ecosystems, and trust advantages in regulated industries.

Command A+ addresses several pain points driving consolidation. By unifying four separate models into one, Cohere reduces the operational complexity of managing multiple specialized models. The model's support for reasoning, vision, translation, and agentic workflows means organizations can potentially reduce the number of different models they maintain. The open-source Apache 2.0 license also appeals to enterprises concerned about vendor lock-in, a growing consideration as organizations sign multiyear AI agreements.

The model includes support for tool use through JSON schema descriptions and generates visible reasoning traces, features increasingly important for enterprise deployments where explainability and auditability matter. The model is compatible with vLLM and Transformers, the most widely adopted open-source inference frameworks, reducing integration friction for organizations with existing infrastructure investments.
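For readers unfamiliar with JSON-schema tool descriptions, the sketch below shows the general pattern: a tool is described as a name plus a schema for its arguments, and the application checks a model-emitted call against that schema before executing anything. The `get_invoice` tool, the call layout (`name` plus `arguments`), and the `validate_tool_call` helper are hypothetical and chosen for illustration; the exact format Command A+ expects is not specified in this article.

```python
import json

# Hypothetical tool description in the common "name + JSON schema" style.
GET_INVOICE_TOOL = {
    "name": "get_invoice",
    "description": "Look up an invoice by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
            "include_line_items": {"type": "boolean"},
        },
        "required": ["invoice_id"],
    },
}

# Only the JSON types used in this example are checked.
_PYTHON_TYPES = {"string": str, "boolean": bool}


def validate_tool_call(raw_call, tool):
    """Return a list of problems with a tool call; an empty list means it passes."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]

    problems = []
    if call.get("name") != tool["name"]:
        problems.append(f"unknown tool: {call.get('name')}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return problems + ["arguments must be a JSON object"]

    schema = tool["parameters"]
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = _PYTHON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {key} should be of type {spec['type']}")
    return problems


if __name__ == "__main__":
    good = '{"name": "get_invoice", "arguments": {"invoice_id": "INV-1"}}'
    bad = '{"name": "get_invoice", "arguments": {"include_line_items": "yes"}}'
    print(validate_tool_call(good, GET_INVOICE_TOOL))
    print(validate_tool_call(bad, GET_INVOICE_TOOL))
```

Validating calls at this boundary, and logging them alongside the model's reasoning trace, is one way to support the auditability requirements mentioned above. A production system would more likely rely on a full JSON Schema validator than on a hand-rolled check like this one.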

For enterprises navigating the consolidation period, the practical implication is clear: the model isn't the strategy. Organizations that build orchestration layers allowing them to swap or blend models as the market shifts will be better positioned than those binding business logic tightly to a single vendor's API. Command A+'s efficiency and multimodal capabilities make it a credible option for that orchestration layer, but only if deployed with flexibility in mind.
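What a model-agnostic orchestration layer looks like in practice varies widely, but the core idea can be sketched in a few lines: business logic talks to a narrow interface, and the choice of model behind each task is configuration that can change without touching callers. Everything in the sketch below (`ChatBackend`, `EchoBackend`, `Orchestrator`, the task names) is hypothetical and stands in for real model clients.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ChatBackend(Protocol):
    """Narrow interface every model provider adapter must satisfy."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for this sketch; a real adapter would call a model server."""

    label: str

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


@dataclass
class Orchestrator:
    backends: dict[str, ChatBackend]
    routes: dict[str, str]  # task name -> backend name
    history: list[tuple[str, str]] = field(default_factory=list)

    def run(self, task: str, prompt: str) -> str:
        backend = self.backends[self.routes[task]]
        reply = backend.complete(prompt)
        # Memory lives in the orchestrator, not inside any single provider.
        self.history.append((task, reply))
        return reply

    def reroute(self, task: str, backend_name: str) -> None:
        """Point a task at a different model without changing any caller."""
        if backend_name not in self.backends:
            raise KeyError(f"no backend named {backend_name}")
        self.routes[task] = backend_name


if __name__ == "__main__":
    orch = Orchestrator(
        backends={"model_a": EchoBackend("model_a"), "model_b": EchoBackend("model_b")},
        routes={"summarize": "model_a"},
    )
    print(orch.run("summarize", "Q3 claims report"))
    orch.reroute("summarize", "model_b")
    print(orch.run("summarize", "Q3 claims report"))
```

Swapping the model behind "summarize" is a one-line change here, and the conversation history survives the swap because it is stored outside the backend.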