
Red Hat AI Inference: The open foundation for enterprise AI

May 12, 2026
Resource type: Datasheet

Overview

Red Hat® AI Inference optimizes inference across hybrid cloud environments, acting as the engine for agentic AI and internal Model-as-a-Service (MaaS) patterns. This solution provides the operational control organizations need to run any model on any accelerator and scale predictably. With AI Inference, central IT and platform teams become the organization's AI provider, taking advantage of available resources to serve more users and agents.

Operational control to run and scale predictably

As part of the Red Hat AI portfolio, AI Inference is powered by vLLM, a high-performance inference runtime that lets users run any AI model on any hardware accelerator across datacenters, clouds, and edge environments. To help IT and platform teams scale AI workloads efficiently and manage token economics, AI Inference includes llm-d, which intelligently distributes inference processing across a fleet of accelerators, preventing bottlenecks and maximizing compute efficiency.

AI Inference accelerates time to value with a curated collection of validated, optimized open models from the Red Hat AI repository hosted on Hugging Face. These models are ready for production deployment with improved efficiency and no loss of accuracy. Advanced model compression capabilities help reduce hardware requirements and costs through techniques like quantization and speculative decoding, applied to both foundational and custom models.
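The compression idea named above can be illustrated with a minimal sketch of symmetric int8 weight quantization. This is a generic illustration assuming a single per-tensor scale factor; it is not AI Inference's actual compression implementation.

```python
# Toy sketch of symmetric int8 quantization (illustrative assumption only,
# not Red Hat AI Inference's actual model-compression code).

def quantize_int8(weights):
    """Map float weights to int8 values plus one per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

# int8 storage is 4x smaller than float32, and the round-trip error of any
# in-range weight is bounded by scale / 2.
weights = [127.0, 64.0, -127.0]
q, scale = quantize_int8(weights)
assert dequantize(q, scale) == weights
```

In practice, per-channel scales and calibration data shrink the error further; speculative decoding, the other technique named above, is complementary and targets latency rather than memory.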

The platform exposes gen AI-specific telemetry—time-to-first-token (TTFT), key-value (KV) cache hit rate, throughput, and graphics processing unit (GPU) utilization—to existing monitoring dashboards, giving organizations the operational transparency they need to meet service-level objectives and control costs.
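As a concrete example of two of these signals, TTFT and throughput can be derived from per-token timestamps. The helper below is a hypothetical illustration; the function name and data shapes are assumptions, not part of the product's API.

```python
# Hypothetical helper computing two of the telemetry signals named above
# (TTFT and token throughput) for a single inference request.

def inference_metrics(request_start, token_timestamps):
    """Return time-to-first-token and tokens/second for one request.

    token_timestamps: time (seconds) at which each output token arrived.
    """
    ttft = token_timestamps[0] - request_start
    duration = token_timestamps[-1] - request_start
    throughput = len(token_timestamps) / duration
    return {"ttft_s": ttft, "tokens_per_s": throughput}

# Example: request at t=0, first token at 0.25s, 8 tokens finishing at 2.0s.
stamps = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
metrics = inference_metrics(0.0, stamps)
# metrics["ttft_s"] == 0.25, metrics["tokens_per_s"] == 4.0
```

TTFT tracks perceived responsiveness while throughput tracks cost per token, which is why dashboards typically alert on both.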

AI Inference runs on Red Hat OpenShift® and third-party Kubernetes distributions.

At a glance

  • Run any model on any accelerator and cloud.

  • Manage token economics efficiently at scale.

  • Scale predictably with distributed inference.

  • Use validated, optimized open models ready for deployment.

  • Standardize operational control across datacenter, cloud, and edge.

Related resources

  • Red Hat AI Inference product page

  • Red Hat AI Hugging Face repository

  • What is Model-as-a-Service?

Table 1. Key benefits

Benefit: Token economics management

  • Increase output and reduce your cost per token by getting more from existing infrastructure.
  • Scale inference cost-effectively on available resources while delivering the low latency that agentic architectures demand.

Benefit: Predictable scaling

  • Distribute inference traffic intelligently across a fleet of accelerators and infrastructure.
  • Prevent bottlenecks and maintain reliable performance—even during unpredictable demand spikes from agentic workflows.

Benefit: Open hybrid cloud flexibility

  • Run any combination of hardware accelerators and AI models across datacenter, cloud, and edge environments with a consistent operational experience.
  • Build a unified MaaS architecture where platform teams can act as their own private AI provider—without rebuilding a centralized platform.

Components

Powered by vLLM and llm-d, AI Inference delivers a fully integrated platform that maximizes inference performance at both the individual accelerator level and across the entire infrastructure.

  • Open hybrid cloud runtime. Run your choice of models across various accelerators on Kubernetes and Linux environments—in a datacenter, a cloud, or at the edge.
  • Distributed inference. Take advantage of a fully integrated platform powered by llm-d to route and balance inference traffic across a fleet of accelerators for consistent performance and better infrastructure utilization.
  • Model-optimization toolkit. Reduce hardware requirements and costs while maintaining accuracy through techniques like quantization and speculative decoding—applied to both foundational and custom models.
  • Validated model repository. Access a curated collection of leading gen AI models—validated and optimized for AI Inference—from the Hugging Face repository. These models are ready for production deployment with increased efficiency and preserved accuracy.
  • Enterprise Kubernetes deployment. Deploy distributed inference on Red Hat OpenShift and third-party Kubernetes platforms. Third-party deployments are covered under Red Hat’s third-party support policy.
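The routing idea behind distributed inference can be sketched as follows. This is a toy, prefix-cache-aware scheduler written for illustration; the scoring rule and data shapes are assumptions, not llm-d's actual scheduling policy.

```python
# Toy prefix-cache-aware router: prefer the replica whose KV cache already
# holds the longest prefix of the incoming prompt, breaking ties by shortest
# queue. Purely illustrative; not llm-d's real algorithm.

def shared_prefix_len(a, b):
    """Length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt, replicas):
    """Pick a replica: longest cached prefix wins, ties go to shortest queue."""
    return min(
        replicas,
        key=lambda r: (-shared_prefix_len(prompt, r["cached_prefix"]),
                       r["queue_depth"]),
    )

replicas = [
    {"name": "gpu-0", "cached_prefix": "You are a helpful", "queue_depth": 3},
    {"name": "gpu-1", "cached_prefix": "You are a helpful assistant", "queue_depth": 1},
    {"name": "gpu-2", "cached_prefix": "", "queue_depth": 0},
]
# The replica with the longest cached prefix wins, even over an idle one.
assert route("You are a helpful assistant. Summarize:", replicas)["name"] == "gpu-1"
```

Reusing a warm KV cache avoids recomputing the shared prefill, which is why cache-aware placement can outperform plain least-loaded routing for agentic workloads with long shared prompts.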

Get started with Red Hat AI Inference

Learn how AI Inference helps organizations manage token economics, scale predictably, and run their choice of models on any accelerator and cloud.

  • Visit the Red Hat AI Inference product page.
  • Try AI Inference with a no-cost, 60-day trial.

Tags: Artificial intelligence


About Red Hat

Red Hat is the open hybrid cloud technology leader, delivering a trusted, consistent and comprehensive foundation for transformative IT innovation and AI applications. Its portfolio of cloud, developer, AI, Linux, automation and application platform technologies enables any application, anywhere—from the datacenter to the edge. As the world's leading provider of enterprise open source software solutions, Red Hat invests in open ecosystems and communities to solve tomorrow's IT challenges. Collaborating with partners and customers, Red Hat helps them build, connect, automate, secure, and manage their IT environments, supported by consulting services and award-winning training and certification offerings.


Copyright © 2026 Red Hat. Red Hat, the Red Hat logo, Ansible, and OpenShift are trademarks or registered trademarks of Red Hat, LLC or its subsidiaries in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. The OPENSTACK logo and word mark are trademarks or registered trademarks of OpenInfra Foundation, used under license. All other trademarks are the property of their respective owners.
