
Red Hat AI Inference: The open foundation for enterprise AI

May 12, 2026
Resource type: Datasheet

Overview

Red Hat® AI Inference optimizes inference across hybrid cloud environments, acting as the engine for agentic AI and internal Model-as-a-Service (MaaS) patterns. This solution provides the operational control organizations need to run any model on any accelerator and scale predictably. With AI Inference, central IT and platform teams become the organization's AI provider, taking advantage of available resources to serve more users and agents.

Operational control to run and scale predictably

As part of the Red Hat AI portfolio, AI Inference is powered by vLLM, a high-performance inference runtime that lets users run any AI model on any hardware accelerator across datacenters, clouds, and edge environments. To help IT and platform teams scale AI workloads efficiently and manage token economics, AI Inference includes llm-d, which intelligently distributes inference processing across a fleet of accelerators, preventing bottlenecks and maximizing compute efficiency.

AI Inference accelerates time to value with a curated collection of validated, optimized open models from the Red Hat AI repository hosted on Hugging Face. These models are ready for production deployment with improved efficiency and no loss of accuracy. Advanced model compression capabilities help reduce hardware requirements and costs through techniques like quantization and speculative decoding, applied to both foundational and custom models.
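The compression idea named above can be illustrated with a minimal sketch of symmetric int8 weight quantization. This is a generic illustration assuming a single per-tensor scale factor; it is not AI Inference's actual compression implementation.

```python
# Toy sketch of symmetric int8 quantization (illustrative assumption only,
# not Red Hat AI Inference's actual model-compression code).

def quantize_int8(weights):
    """Map float weights to int8 values plus one per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

# int8 storage is 4x smaller than float32, and the round-trip error of any
# in-range weight is bounded by scale / 2.
weights = [127.0, 64.0, -127.0]
q, scale = quantize_int8(weights)
assert dequantize(q, scale) == weights
```

In practice, per-channel scales and calibration data shrink the error further; speculative decoding, the other technique named above, is complementary and targets latency rather than memory.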

The platform exposes gen AI-specific telemetry—time-to-first-token (TTFT), key-value (KV) cache hit rate, throughput, and graphics processing unit (GPU) utilization—to existing monitoring dashboards, giving organizations the operational transparency they need to meet service-level objectives and control costs.
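As a concrete example of two of these signals, TTFT and throughput can be derived from per-token timestamps. The helper below is a hypothetical illustration; the function name and data shapes are assumptions, not part of the product's API.

```python
# Hypothetical helper computing two of the telemetry signals named above
# (TTFT and token throughput) for a single inference request.

def inference_metrics(request_start, token_timestamps):
    """Return time-to-first-token and tokens/second for one request.

    token_timestamps: time (seconds) at which each output token arrived.
    """
    ttft = token_timestamps[0] - request_start
    duration = token_timestamps[-1] - request_start
    throughput = len(token_timestamps) / duration
    return {"ttft_s": ttft, "tokens_per_s": throughput}

# Example: request at t=0, first token at 0.25s, 8 tokens finishing at 2.0s.
stamps = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
metrics = inference_metrics(0.0, stamps)
# metrics["ttft_s"] == 0.25, metrics["tokens_per_s"] == 4.0
```

TTFT tracks perceived responsiveness while throughput tracks cost per token, which is why dashboards typically alert on both.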

AI Inference runs on Red Hat OpenShift® and third-party Kubernetes distributions.

At a glance

  • Run any model on any accelerator and cloud.

  • Manage token economics efficiently at scale.

  • Scale predictably with distributed inference.

  • Use validated, optimized open models ready for deployment.

  • Standardize operational control across datacenter, cloud, and edge.

Related resources

  • Red Hat AI Inference product page

  • Red Hat AI Hugging Face repository

  • What is Model-as-a-Service?

Table 1. Key benefits

Benefit: Token economics management

  • Increase output and reduce your cost per token by getting more from existing infrastructure.
  • Scale inference cost-effectively on available resources while delivering the low latency that agentic architectures demand.

Benefit: Predictable scaling

  • Distribute inference traffic intelligently across a fleet of accelerators and infrastructure.
  • Prevent bottlenecks and maintain reliable performance—even during unpredictable demand spikes from agentic workflows.

Benefit: Open hybrid cloud flexibility

  • Run any combination of hardware accelerators and AI models across datacenter, cloud, and edge environments with a consistent operational experience.
  • Build a unified MaaS architecture where platform teams can act as their own private AI provider—without rebuilding a centralized platform.

Components

Powered by vLLM and llm-d, AI Inference delivers a fully integrated platform that maximizes inference performance at both the individual accelerator level and across the entire infrastructure.

  • Open hybrid cloud runtime. Run your choice of models across various accelerators on Kubernetes and Linux environments—in a datacenter, a cloud, or at the edge.
  • Distributed inference. Take advantage of a fully integrated platform powered by llm-d to route and balance inference traffic across a fleet of accelerators for consistent performance and better infrastructure utilization.
  • Model-optimization toolkit. Reduce hardware requirements and costs while maintaining accuracy through techniques like quantization and speculative decoding—applied to both foundational and custom models.
  • Validated model repository. Access a curated collection of leading gen AI models—validated and optimized for AI Inference—from the Hugging Face repository. These models are ready for production deployment with increased efficiency and preserved accuracy.
  • Enterprise Kubernetes deployment. Deploy distributed inference on Red Hat OpenShift and third-party Kubernetes platforms. Third-party deployments are covered under Red Hat’s third-party support policy.
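The routing idea behind distributed inference can be sketched as follows. This is a toy, prefix-cache-aware scheduler written for illustration; the scoring rule and data shapes are assumptions, not llm-d's actual scheduling policy.

```python
# Toy prefix-cache-aware router: prefer the replica whose KV cache already
# holds the longest prefix of the incoming prompt, breaking ties by shortest
# queue. Purely illustrative; not llm-d's real algorithm.

def shared_prefix_len(a, b):
    """Length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt, replicas):
    """Pick a replica: longest cached prefix wins, ties go to shortest queue."""
    return min(
        replicas,
        key=lambda r: (-shared_prefix_len(prompt, r["cached_prefix"]),
                       r["queue_depth"]),
    )

replicas = [
    {"name": "gpu-0", "cached_prefix": "You are a helpful", "queue_depth": 3},
    {"name": "gpu-1", "cached_prefix": "You are a helpful assistant", "queue_depth": 1},
    {"name": "gpu-2", "cached_prefix": "", "queue_depth": 0},
]
# The replica with the longest cached prefix wins, even over an idle one.
assert route("You are a helpful assistant. Summarize:", replicas)["name"] == "gpu-1"
```

Reusing a warm KV cache avoids recomputing the shared prefill, which is why cache-aware placement can outperform plain least-loaded routing for agentic workloads with long shared prompts.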

Get started with Red Hat AI Inference

Learn how AI Inference helps organizations manage token economics, scale predictably, and run their choice of models on any accelerator and cloud.

  • Visit the Red Hat AI Inference product page.
  • Try AI Inference with a no-cost, 60-day trial.

Tags: Artificial intelligence


About Red Hat

Red Hat is the open hybrid cloud technology leader, delivering a trusted, consistent and comprehensive foundation for transformative IT innovation and AI applications. Its portfolio of cloud, developer, AI, Linux, automation and application platform technologies enables any application, anywhere—from the datacenter to the edge. As the world's leading provider of enterprise open source software solutions, Red Hat invests in open ecosystems and communities to solve tomorrow's IT challenges. Collaborating with partners and customers, Red Hat helps them build, connect, automate, secure, and manage their IT environments, supported by consulting services and award-winning training and certification offerings.


Copyright © 2026 Red Hat. Red Hat, the Red Hat logo, Ansible, and OpenShift are trademarks or registered trademarks of Red Hat, LLC or its subsidiaries in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. The OPENSTACK logo and word mark are trademarks or registered trademarks of OpenInfra Foundation, used under license. All other trademarks are the property of their respective owners.
