This document describes the overall system architecture of NVIDIA FLARE, including its layered structure, major subsystems, and how they interact. It covers the runtime components on both server and client sides, the communication framework, and the process model.
The FLARE architecture (shown above) comprises three main layers:
- Foundation Layer - Communication infrastructure, messaging protocols, privacy preservation tools, and secure platform management.
- Application Layer - Building blocks for federated learning, including federation workflows and learning algorithms.
- Tooling - FL Simulator and POC CLI for experimentation and simulation, plus deployment and management tools for production workflows.
Server Parent (SP)
- Runs FederatedServer
- Manages client registration and heartbeat monitoring
- Houses ServerEngine which orchestrates job scheduling via JobRunner
- Spawns a Server Job (SJ) for each active job, either as a process or as a Docker container or pod, depending on the configured job launcher
Server Job (SJ)
- Runs ServerRunner
- Executes workflow Controllers (e.g., ScatterAndGather)
- Broadcasts tasks to client jobs and aggregates results
- Separate process per job for isolation
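The broadcast-then-aggregate pattern that a workflow Controller such as ScatterAndGather follows can be sketched in a few lines. This is a toy illustration only, not NVFLARE code: `scatter_and_gather`, the `ClientFn` type, and the site names are hypothetical, and the unweighted element-wise mean is an assumed aggregation rule chosen for simplicity.

```python
from statistics import fmean
from typing import Callable, Dict, List

# Hypothetical stand-in: each "client job" is a function that takes the
# global weights and returns locally updated weights.
ClientFn = Callable[[List[float]], List[float]]


def scatter_and_gather(global_weights: List[float],
                       clients: Dict[str, ClientFn]) -> List[float]:
    """Run one round: broadcast the same task to every client, then aggregate."""
    # Scatter: every client receives its own copy of the current weights.
    results = {name: fn(list(global_weights)) for name, fn in clients.items()}
    # Gather: combine the results element-wise.
    # Assumption: a plain unweighted mean; real workflows may weight by data size.
    return [fmean(column) for column in zip(*results.values())]


clients = {
    "site-1": lambda w: [x + 1.0 for x in w],
    "site-2": lambda w: [x + 3.0 for x in w],
}
new_weights = scatter_and_gather([0.0, 10.0], clients)
print(new_weights)  # [2.0, 12.0]
```

In the real system the "scatter" step crosses process and machine boundaries over the Cell network; here it is reduced to function calls so the control flow is visible.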
Client Parent (CP)
- Runs FederatedClient
- Manages client registration with server
- Houses ClientEngine which coordinates job execution
- Spawns a Client Job (CJ) for each assigned job, either as a process or as a Docker container or pod, depending on the configured job launcher
Client Job (CJ)
- Runs ClientRunner
- Pulls tasks from server via Cell network
- Launches training processes using JobExecutor
- Routes task data to/from training process via Pipe
Training Process
- User's ML training script
- Uses Client API: flare.init(), flare.receive(), flare.send()
- Communicates with CJ via FilePipe (file-based) or CellPipe (network-based)
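The shape of a training script built on the three Client API calls named above might look like the following sketch. To keep it runnable without NVFLARE installed, it uses a hypothetical `StandInFlare` object that only mimics `init()`, `receive()`, and `send()`. The real Client API's return types and its way of signalling that no more tasks remain are not described here and may differ.

```python
from collections import deque


class StandInFlare:
    """Hypothetical stand-in exposing only the three calls named above."""

    def __init__(self, tasks):
        self._inbox = deque(tasks)  # tasks the CJ would forward over a Pipe
        self.outbox = []            # results the CJ would route back
        self.ready = False

    def init(self):
        self.ready = True

    def receive(self):
        # Assumption: return None once no tasks remain.
        return self._inbox.popleft() if self._inbox else None

    def send(self, result):
        self.outbox.append(result)


def train_one_round(params):
    # Placeholder for the user's actual ML training step.
    return {name: value + 0.5 for name, value in params.items()}


flare = StandInFlare(tasks=[{"w": 1.0}, {"w": 2.0}])
flare.init()
while (task := flare.receive()) is not None:
    flare.send(train_one_round(task))

print(flare.outbox)  # [{'w': 1.5}, {'w': 2.5}]
```

The point of the pattern is that the training script stays a plain loop: receive a task, train, send the result. The CJ handles everything else.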
Cell Network: All parent and job processes communicate via F3 Cell objects that provide:
- FQCN (Fully Qualified Cell Name) addressing (e.g., server.job_123)
- Channel-based routing (SERVER_MAIN, CLIENT_MAIN, AUX_COMMUNICATION)
- Secure, encrypted messaging with authentication
- Streaming support for large data transfers
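To make the FQCN addressing idea concrete, the sketch below splits a dotted name such as `server.job_123` into segments and derives parent/child relationships from it. The helper names (`split_fqcn`, `parent_of`, `is_ancestor`) are hypothetical, and treating the dot as a strict hierarchy separator is an assumption based on the single example above, not a statement about the F3 implementation.

```python
from typing import List, Optional

SEPARATOR = "."  # taken from the example "server.job_123"


def split_fqcn(fqcn: str) -> List[str]:
    """Split a fully qualified cell name into its segments."""
    parts = fqcn.split(SEPARATOR)
    if any(not part for part in parts):
        raise ValueError(f"malformed FQCN: {fqcn!r}")
    return parts


def parent_of(fqcn: str) -> Optional[str]:
    """Return the enclosing cell's name, or None for a top-level cell."""
    parts = split_fqcn(fqcn)
    return SEPARATOR.join(parts[:-1]) if len(parts) > 1 else None


def is_ancestor(ancestor: str, fqcn: str) -> bool:
    """True if `ancestor` is a strict prefix of `fqcn`, segment by segment."""
    a, f = split_fqcn(ancestor), split_fqcn(fqcn)
    return len(a) < len(f) and f[:len(a)] == a


print(parent_of("server.job_123"))              # server
print(is_ancestor("server", "server.job_123"))  # True
```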
Pipe Abstraction: CJ-to-training-process communication uses the Pipe interface:
- FilePipe: File system-based IPC for same-machine processes
- CellPipe: Network-based IPC that allows the training process to run on a different machine
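File system-based IPC of the kind FilePipe provides can be illustrated with a toy pipe that writes one file per message into a shared directory. This is a sketch of the general technique under stated assumptions, not FilePipe's actual design: the `ToyFilePipe` class, its JSON encoding, and its file naming scheme are all hypothetical.

```python
import json
import os
import tempfile
import uuid
from typing import Any, Optional


class ToyFilePipe:
    """Hypothetical one-directional pipe: one JSON file per message."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self._seq = 0  # per-writer counter, used to order messages by name

    def send(self, payload: Any) -> str:
        self._seq += 1
        name = f"{self._seq:08d}_{uuid.uuid4().hex}.json"
        tmp_path = os.path.join(self.root, name + ".tmp")
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(payload, f)
        final_path = os.path.join(self.root, name)
        # Rename into place so a reader never sees a half-written file.
        os.replace(tmp_path, final_path)
        return final_path

    def receive(self) -> Optional[Any]:
        # Only completed messages end in ".json"; pick the oldest by name.
        ready = sorted(n for n in os.listdir(self.root) if n.endswith(".json"))
        if not ready:
            return None
        path = os.path.join(self.root, ready[0])
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
        os.remove(path)  # consume the message
        return payload


with tempfile.TemporaryDirectory() as root:
    cj_side = ToyFilePipe(root)       # writer, standing in for the CJ
    trainer_side = ToyFilePipe(root)  # reader, standing in for the training process
    cj_side.send({"task": "train", "round": 1})
    cj_side.send({"task": "train", "round": 2})
    first = trainer_side.receive()
    second = trainer_side.receive()
    empty = trainer_side.receive()
```

Because both sides only need access to the same directory, this style of IPC suits processes on one machine; a network-based pipe removes that constraint at the cost of requiring a connection between the two processes.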
NVFLARE provides three deployment modes that share the same core runtime but differ in packaging, security, and deployment complexity. This design ensures consistency from development to production.
| Mode | Use Case | Security | Processes | Setup Time |
|---|---|---|---|---|
| Simulator | Rapid prototyping, algorithm testing | None | Multiple threads; in some cases, multiple processes | Seconds |
| POC | Local multi-client testing, workflow validation | Optional | Multiple processes on one machine | Minutes |
| Production | Real-world deployment | Full PKI/TLS | Distributed processes across machines | Hours (with provisioning) |
The Core FL Runtime is the execution engine that manages federated learning job processes and orchestration. This page documents the runtime components responsible for process lifecycle management, task coordination, and execution modes.
The Core FL Runtime consists of:
- ServerEngine: Server-side process orchestration and job lifecycle management
- ClientEngine: Client-side process management and communication handling
- JobRunner: Job scheduling, deployment, and monitoring
- SimulatorRunner: Single-machine simulation for development
| Process Type | Code Symbol | Description |
|---|---|---|
| SP | ProcessType.SERVER_PARENT | Server parent process running ServerEngine |
| SJ | ProcessType.SERVER_JOB | Server job process running ServerRunner |
| CP | ProcessType.CLIENT_PARENT | Client parent process running ClientEngine |
| CJ | ProcessType.CLIENT_JOB | Client job process running ClientRunner |
The runtime uses Cell-based communication between parent and job processes.
| Channel | Purpose | Used By |
|---|---|---|
| CellChannel.SERVER_MAIN | Client-to-server FL messages | CP to SP |
| CellChannel.CLIENT_MAIN | Server-to-client FL messages | SP to CP |
| CellChannel.SERVER_COMMAND | Commands to server job | SP to SJ |
| CellChannel.CLIENT_COMMAND | Commands to client job | CP to CJ |
| CellChannel.SERVER_PARENT_LISTENER | Parent commands from SJ | SJ to SP |
| CellChannel.AUX_COMMUNICATION | Auxiliary messages | All processes |
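The channel table above can be read as a lookup from (sender, receiver) process type to channel. The sketch below encodes it that way using plain strings. `CHANNEL_BY_ROUTE` and `pick_channel` are hypothetical names, and falling back to `AUX_COMMUNICATION` for routes the table does not list is an assumption made for illustration, not documented runtime behavior.

```python
from typing import Dict, Tuple

# (sender, receiver) -> channel name, transcribed from the table above.
CHANNEL_BY_ROUTE: Dict[Tuple[str, str], str] = {
    ("CP", "SP"): "SERVER_MAIN",
    ("SP", "CP"): "CLIENT_MAIN",
    ("SP", "SJ"): "SERVER_COMMAND",
    ("CP", "CJ"): "CLIENT_COMMAND",
    ("SJ", "SP"): "SERVER_PARENT_LISTENER",
}

# Listed as usable by all processes; using it as a fallback is an assumption.
FALLBACK_CHANNEL = "AUX_COMMUNICATION"


def pick_channel(sender: str, receiver: str) -> str:
    """Return the channel for a route, or the auxiliary channel if unlisted."""
    return CHANNEL_BY_ROUTE.get((sender, receiver), FALLBACK_CHANNEL)


print(pick_channel("CP", "SP"))  # SERVER_MAIN
print(pick_channel("CJ", "SJ"))  # AUX_COMMUNICATION
```

Note that the routes are directional: `("CP", "SP")` and `("SP", "CP")` map to different channels.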
The Communication Framework, also known as F3 (FLARE Foundation Framework) and Cellnet, provides the foundational messaging infrastructure for all communication in NVIDIA FLARE. It implements a secure, scalable, and feature-rich messaging layer that handles all interactions between servers, clients, and administrative components.
This section provides an overview of the communication framework architecture, core components, and basic concepts.
- CellNet Architecture - Detailed architecture and design patterns
- Cell Communication Patterns - Message sending patterns and channel routing
- Streaming and Data Transfer - Large data transfer and streaming protocols
- Security and Encryption - Certificate management and message encryption
For more details, please refer to the CellNet architecture: :ref:`cellnet_architecture`.
Please refer to :ref:`flare_security_overview` for the security architecture.


