NVIDIA Nemotron 3 Ultra Live on CoreWeave Serverless Inference

Weights & Biases

NVIDIA Nemotron 3 Ultra is live on CoreWeave Serverless Inference. 🚀

In agentic AI, what matters is how fast a task gets completed at a given accuracy. Long-running agents make thousands of model calls per task, so the model underneath decides whether a workflow finishes in seconds or stalls halfway through. Nemotron 3 Ultra is built for exactly that.

It's an open, frontier-reasoning mixture-of-experts (MoE) model: 550B total parameters with 55B active per token, a hybrid Transformer-Mamba architecture, and up to 1M tokens of context. The Mamba layers keep long-context inference efficient, which makes the model practical for agents that plan, call tools, and reason over long trajectories rather than answer one-shot prompts. NVIDIA designed it for orchestration, coding agents, deep research, and enterprise automation.

On Serverless Inference there's no infrastructure to manage: no clusters to provision, no capacity to babysit. You call an endpoint, it scales with the workload, and Nemotron 3 Ultra runs alongside the frontier models you already use. That lets you route the right model to each step instead of forcing one model to carry the whole agent.

Open weights, frontier reasoning, built for long-running agents, and ready to call today. Try it: https://lnkd.in/gunGdncw
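The per-step routing idea above can be sketched as a simple lookup from step type to model. This is a minimal illustration under stated assumptions, not CoreWeave's or NVIDIA's method: the step kinds, the model identifiers, and the 128K-token threshold are all hypothetical placeholders, and no real endpoint call is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step in an agent workflow (hypothetical structure)."""
    kind: str            # e.g. "plan", "tool_call", "summarize"
    context_tokens: int  # size of the context this step must reason over


# Hypothetical routing table; these ids are placeholders, not real endpoint names.
ROUTES = {
    "plan": "nemotron-3-ultra",
    "long_context_reasoning": "nemotron-3-ultra",
    "tool_call": "small-fast-model",
    "summarize": "small-fast-model",
}
DEFAULT_MODEL = "general-model"

# Assumed cutoff above which a step goes to the long-context reasoning model.
LONG_CONTEXT_THRESHOLD = 128_000


def route(step: Step) -> str:
    """Return the model id to use for one agent step."""
    # Very long contexts override the per-kind choice.
    if step.context_tokens > LONG_CONTEXT_THRESHOLD:
        return ROUTES["long_context_reasoning"]
    # Otherwise look up the step kind, falling back to a general model.
    return ROUTES.get(step.kind, DEFAULT_MODEL)


if __name__ == "__main__":
    workflow = [
        Step("plan", 4_000),
        Step("tool_call", 2_000),
        Step("summarize", 300_000),
        Step("classify", 500),
    ]
    for s in workflow:
        print(s.kind, "->", route(s))
```

Here the 300K-token summarize step is sent to the long-context model even though summarization normally goes to the small model, while an unknown step kind falls back to the general model. A production router would likely weigh cost, latency, and accuracy per step rather than use a fixed table.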


Quixoa

This is an important shift in agent architecture. The bottleneck is no longer just model intelligence; it's how efficiently agents retrieve, organize, and reason over information before taking action. Search as Code shows how moving from sequential tool calls to programmable workflows can improve both performance and cost. For long-running agents, better orchestration is becoming just as important as better models.
