Announcing Arroyo 0.15.0
Arroyo 0.15 is now available! This release includes a large number of fixes, improvements, and new features, including a complete Iceberg sink, and more.


Updates from the Arroyo team


Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-source and self-hostable.


Arroyo 0.14 is now available! This release introduces support for lookup joins, more powerful updating SQL, new syntax, structs in DDL, and more!


JSON is the most common serialization format used in streaming pipelines, so it pays to be able to deserialize it fast. This post covers in detail how the arrow-json library works to perform very efficient columnar JSON decoding, and the additions we've made for streaming use cases.


The LOAD stack (log storage/object storage/Arroyo/DuckDB) makes it easy to build an affordable real-time data lake with minimal operational overhead. This tutorial will guide you through the process of setting up a complete system on AWS.


Arroyo 0.13 is now available! This release introduces support for reading source metadata, a RabbitMQ connector, improved CDC support, operator chaining, along with many other improvements.


Arroyo creator Micah Wylde recently spoke at P99Conf, discussing how Arroyo achieves low latency and high throughput while maintaining fault tolerance and fast recovery times.


Arroyo 0.12 is now available! This release introduces Python UDFs, Protobuf ingestion, JSON syntax, custom state TTLs, and many other features, improvements, and fixes.


Arroyo is the easiest way to build real-time data pipelines, and Fly.io is the easiest way to run them. This tutorial shows how to use the new pipeline cluster feature in Arroyo 0.11 to build a streaming pipeline and a web app that consumes it, all running on Fly's serverless infrastructure.


Arroyo 0.11 is now available! This release introduces pipeline clusters for lightweight, self-contained job execution, and SQLite support for simplified deployments. It also brings a new configuration system, improved UI for pipeline creation and previewing, SQL enhancements, and more.


Software used by businesses often needs to be extensible. For Arroyo, a real-time SQL engine, that means supporting user-defined functions (UDFs). But how can we support dynamic, user-written code in a static language like Rust? This post dives deep into the technical details of building a dynamically-linked, FFI-based plugin system in Rust.


Arroyo 0.10 is now available! This is our biggest release ever, featuring an entirely new SQL engine that's >3x faster and ships as a single binary. Plus NATS and MQTT connectors, more SQL features, and more.


Arroyo 0.10 has an entirely new SQL engine built with Apache Arrow and DataFusion. It's much faster, smaller, and easier to run. Read on for why and how we're making this change.


We are excited to announce that Arroyo is now a Connect with Confluent Partner, making it easier than ever for Confluent customers to integrate with the Arroyo platform. Arroyo extends Kafka with powerful stateful stream processing support, enabling businesses to analyze their data in real-time using SQL.


Arroyo is a stateful stream processing engine—which means that it's able to remember information about previously seen events, enabling features like joins, windows, and aggregations. When should you choose a stateless or a stateful streaming system? And how do stateful engines like Arroyo and Flink mitigate the difficulty of dealing with large amounts of state?


Arroyo 0.9 is now available! This release introduces async UDFs, which allow users to use databases, services, and models from within their pipelines. It also brings support for joining update tables, more control over bad data handling, a redesigned connection profile editor, and more.


It's been a big year for Arroyo! We launched the company, open-sourced the engine, and did 8 releases. Here's a look back at our very exciting 2023.


Apache Kafka is a distributed log that's a great fit for streaming applications, microservice architectures, and more. In this post, we will learn how to use Kafka with applications written in Rust.


User-defined functions (UDFs) allow users to extend Arroyo with new functionality by writing Rust code. In this tutorial, we'll walk through how to use UDFs to parse a custom data format: the Common Log Format used by Apache HTTP and other web servers.


Arroyo 0.8 is now available, with a new FileSystem source, Delta Lake sink, Redis sink, Avro support, global UDFs, and more.


What does it mean to apply SQL—a batch-oriented query language—to streams of data that are never complete? Read on for a deep dive into streaming SQL in Arroyo and other engines.


The easiest way to run a highly-scaled production Arroyo cluster is on Kubernetes. Setting up a Kubernetes cluster used to be a daunting task, but services like Amazon EKS have made it much easier. This post will walk through how to set up an EKS cluster and deploy Arroyo to it.


Recent versions of Arroyo have added support for HTTP sources and for treating individual lines of a response as streaming messages. So I wondered: could we use Arroyo to directly process metrics?


Arroyo 0.7.0 is now available, with custom partitioning for S3 writes, message framing, unnest, union, state compaction, and more.
