NVIDIA Nemotron 3 Ultra Live on CoreWeave Serverless Inference

NVIDIA Nemotron 3 Ultra is live on CoreWeave Serverless Inference. 🚀

In agentic AI, what matters is speed of task completion at a given accuracy. Long-running agents make thousands of model calls per task, so the model underneath decides whether a workflow finishes in seconds or stalls out halfway through.

Nemotron 3 Ultra is built for exactly that. It's an open, frontier-reasoning MoE: 550B parameters with 55B active, a hybrid Transformer-Mamba architecture, and up to 1M tokens of context. The Mamba layers keep long-context inference efficient, which is what makes it practical for agents that plan, call tools, and reason over long trajectories instead of one-shot prompts. NVIDIA designed it for orchestration, coding agents, deep research, and enterprise automation.

On Serverless Inference there's no infra to manage. No clusters to provision, no capacity to babysit. You hit an endpoint, it scales to the workload, and Nemotron 3 Ultra runs right alongside the frontier models you're already using. So you can route the right model to the right step instead of forcing one model to carry the whole agent.

Open weights, frontier reasoning, built for long-running agents, ready to call today.

Try it: https://lnkd.in/gunGdncw

This is an important shift in agent architecture. The bottleneck is no longer just model intelligence; it's how efficiently agents retrieve, organize, and reason over information before taking action. Search as Code shows how moving from sequential tool calls to programmable workflows can improve both performance and cost. For long-running agents, better orchestration is becoming just as important as better models.
