Modulo Scheduling

180 papers

24 followers

About this topic

Modulo scheduling is a software pipelining technique used in compiler instruction scheduling for superscalar and VLIW architectures. It overlaps the execution of successive loop iterations, starting a new iteration at a fixed initiation interval (II), so that available resources are used effectively while data dependencies and resource constraints are respected.
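As a rough illustration of the overlap described above, the sketch below folds a flat one-iteration schedule into the repeating kernel of a modulo schedule. It assumes a fixed initiation interval `ii` (the number of cycles between the starts of consecutive iterations) and start cycles that have already been chosen; the function names and the four-operation loop body are hypothetical and are not taken from any of the listed papers.

```python
from collections import defaultdict

def fold_schedule(start_times, ii):
    """Map each operation's start cycle to (stage, slot) for initiation interval ii.

    stage = start // ii  (how many iterations behind the newest one the op runs)
    slot  = start % ii   (which row of the repeating kernel the op occupies)
    """
    return {op: divmod(t, ii) for op, t in start_times.items()}

def kernel_rows(start_times, ii):
    """Group operations by kernel slot; each entry is (operation, stage)."""
    rows = defaultdict(list)
    for op, t in sorted(start_times.items(), key=lambda kv: kv[1]):
        stage, slot = divmod(t, ii)
        rows[slot].append((op, stage))
    return [rows[s] for s in range(ii)]

# Hypothetical loop body: start cycles within a single iteration.
starts = {"load": 0, "mul": 1, "add": 3, "store": 4}
for slot, ops in enumerate(kernel_rows(starts, ii=2)):
    print(slot, ops)
```

With `ii=2`, the kernel has two rows, and operations from three different iterations (stages 0, 1 and 2) execute within the same two-cycle window.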

Key research themes

1. How can modulo scheduling algorithms be scaled and optimized for efficient loop pipelining in high-level synthesis and VLIW architectures?

This theme focuses on improving the scalability, efficiency, and quality of modulo scheduling algorithms, particularly for loop pipelining in High-Level Synthesis (HLS) targeting hardware like VLIW processors and heterogeneous systems. It addresses key challenges including minimizing initiation intervals (II), balancing computation time with schedule quality, integrating resource constraints, and reducing leakage power through scheduling optimizations. Advances aim to reduce compilation time while maintaining high throughput, enabling practical acceleration of large, complex loops.
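The initiation interval mentioned above has a standard lower bound that schedulers typically compute before searching for a schedule: the larger of a resource bound (ResMII) and a recurrence bound (RecMII). The sketch below is a minimal illustration under simplifying assumptions: each operation occupies one unit of one resource class for one cycle, and the dependence cycles are supplied directly as `(total_latency, total_distance)` pairs instead of being enumerated from a dependence graph. All names and numbers are hypothetical.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: the busiest resource class limits how often iterations can start."""
    return max((ceil(op_counts[r] / unit_counts[r]) for r in op_counts), default=1)

def rec_mii(cycles):
    """Recurrence bound: each dependence cycle needs latency / distance cycles per iteration.

    cycles is a list of (total_latency, total_iteration_distance) pairs.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)

def min_ii(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval (MII)."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))

# Hypothetical loop: 5 ALU ops on 2 ALUs, 2 memory ops on 1 port,
# and two dependence cycles with (latency, distance) = (4, 1) and (6, 2).
print(min_ii({"alu": 5, "mem": 2}, {"alu": 2, "mem": 1}, [(4, 1), (6, 2)]))
```

A scheduler would usually try to find a schedule at this bound first and increase the II only when the attempt fails.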

Scaling Up Modulo Scheduling for High-Level Synthesis

by Vanderlei Bonato

2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Key finding: Proposes a new modulo scheduling algorithm that reformulates the classical problem, separating scheduling and allocation, resulting in linear scalability with loop size compared to previous quadratic methods, and enabling a...

The resource-constrained modulo scheduling problem: an experimental study

by Christian Artigues

2023, Computational Optimization and Applications

Key finding: Introduces a hybrid method combining decomposed software pipelining to obtain a valid retiming and an integer linear programming (ILP) formulation with reduced size to solve the resource-constrained modulo scheduling problem...

Leakage-Aware Modulo Scheduling for Embedded VLIW Processors

by Jingling Xue

2021, Journal of Computer Science and Technology

Key finding: Develops a leakage-aware modulo scheduling algorithm tailored for VLIW architectures with dual-threshold domino logic, maximizing idle time of functional units and reducing transitions between active and sleep modes....

Modulo Scheduling with Regular Unwinding

by Benoît de Dinechin

2021

Key finding: Introduces a novel framework that reformulates modulo scheduling as an acyclic scheduling problem on a 'regular unwinded' problem with a regularity constraint enforcing fixed spacing between operation instances. This reduces...

2. How can modulo scheduling be effectively applied and optimized on Coarse-Grained Reconfigurable Architectures (CGRAs) for loop-level parallelism?

This research area investigates modulo scheduling and mapping techniques to exploit loop-level parallelism on CGRAs. CGRAs offer a balance between programmability and power efficiency, but effective compiler support is crucial, particularly in scheduling, placement, routing, and managing resource constraints. The challenges include integrating scheduling with mapping and routing, handling limited registers and memory ports, and producing mappings that maximize throughput and resource utilization. Advances include heuristic and metaheuristic algorithms, routing-aware scheduling frameworks, and exploiting recomputation to overcome resource limitations.

RAMP: Resource-Aware Mapping for CGRAs

by Shail Dave

2018, 55th Annual Design Automation Conference

Key finding: Presents RAMP, a CGRA mapping approach that integrates various routing options explicitly and intelligently before scheduling to improve ability and quality of data routing. By considering routing through PEs, registers,...

Mapping loops onto Coarse-Grained Reconfigurable Architectures using Particle Swarm Optimization

by R. Venkatesan

2023, 2010 International Conference of Soft Computing and Pattern Recognition

Key finding: Proposes MCHPSO, a Modulo-Constrained Hybrid Particle Swarm Optimization algorithm for software pipelining that schedules, places, and routes loops onto CGRAs simultaneously. Experiments on DSP benchmarks and ADRES...

EPIMap

by Aviral Shrivastava

2021, Proceedings of the 49th Annual Design Automation Conference on - DAC '12

Key finding: Introduces a general problem formulation for application mapping on CGRAs that includes re-computation along with routing to alleviate resource limitations. EPIMap transforms input dependency graphs into epimorphic...

3. How can resource constraints, such as memory ports and fairness, be incorporated and optimized within modulo scheduling frameworks?

This theme explores integrating resource constraints—like memory bandwidth limits, fairness across multiple scheduling days, and resource sharing—into modulo scheduling models. The goal is to maintain high throughput while minimizing resource usage or ensuring equitable service. Research addresses trade-offs between ideal execution times and resource usage, multi-day scheduling fairness, and the application of Boolean and pseudo-Boolean optimization techniques to reduce model sizes and improve solution scalability.

Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

by Yosi Asher

2022, ACM Transactions on Reconfigurable Technology and Systems

Key finding: Focuses on reducing the required number of memory ports in modulo scheduling for FPGA synthesis while preserving the minimal initiation interval (ideal parallelism). By targeting 'gradual' solutions that optimize resources...

Equitable Scheduling on a Single Machine

by Danny Hermelin

2025, Proceedings of the AAAI Conference on Artificial Intelligence

Key finding: Introduces the equitable scheduling problem that generalizes single-machine scheduling by considering multiple days and guaranteeing each client meets deadlines in at least k out of m days. This model addresses fairness in...

Boolean and Pseudo-Boolean Models for Scheduling

by Steven Prestwich

2025

Key finding: Demonstrates that reformulating scheduling problems, including round-robin and job-shop scheduling, from Boolean satisfiability (SAT) to linear pseudo-Boolean (PB) constraints can significantly reduce model sizes without...

All papers in Modulo Scheduling

A Software Pipelining Framework for Simple Processor Cores

by Dee Lee

2025

Current trends in many-core architectures show a switch from a small number of architecturally sophisticated cores (e.g., Intel Core2, IBM PowerPC) to many simple cores (e.g., SiCortex and Tilera multiprocessor). These simple cores lack many...

Designing cost-effective coarse-grained reconfigurable architecture

by Rabi Mahapatra

2025, Texas A&M University eBooks

Application-specific optimization of embedded systems becomes inevitable to satisfy the market demand for designers to meet tighter constraints on cost, performance and power. On the other hand, the flexibility of a system is also...

Very Wide Register: An Asymmetric Register File Organization for Low Power Embedded Processors

by Henk Corporaal

2025, 2007 Design, Automation & Test in Europe Conference & Exhibition

In current embedded systems processors, multi-ported register files are one of the most power hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single ported...

Design style case study for embedded multi media compute nodes

by Henk Corporaal

2025, Proceedings - Real-Time Systems Symposium

Users expect future handheld devices to provide extended multimedia functionality and have long battery life. This type of application imposes heavy constraints on both (realtime) performance and energy consumption and forces designers to...

Should SDBMS support a join index?

by Ned Levine

2025, Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems

Given a spatial crime data warehouse, that is updated infrequently and a set of operations O as well as constraints of storage and update overheads, the index type selection problem is to find a set of index types that can reduce the I/O...

Modulo scheduling for the TMS320C6x VLIW DSP architecture

by Ernst Leiss

2025

Digital Signal Processing (DSP) architectures are specialized for high performance numerical algorithms such as those found in communication and multimedia applications. The development of efficient compilers for DSP processors is a...

Seguimiento mamográfico, ecográfico y su correlación histopatológica en lesiones categorizadas con BI-RADS 3, 4 y 5

by Lorena Cisneros

2025, Anales de Radiología …

Sulodexida para la enfermedad venosa crónica en etapas clínicas C3 y C4. Estudio abierto observacional

by Luis Fernando Flota Cervera

2024, Revista mexicana de angiología

Pharmacological treatment of chronic venous disease (CVD) in CEAP clinical stages C3 and C4 has received little evaluation. Objective: To assess whether treatment with sulodexide is effective in CVD at CEAP clinical stages C3 and C4. Material and...

Modulo scheduling without overlapped lifetimes

by Ernst Leiss

2024

This paper describes complementary software- and hardware-based approaches for handling overlapping register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N instructions in a loop body and constructs an M-stage...

Exploiting pseudo-schedules to guide data dependence graph partitioning

by David Kaeli

2024, Proceedings.International Conference on Parallel Architectures and Compilation Techniques

AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures

by David Kaeli

2024, IEEE Transactions on Computers

This paper presents AGAMOS, a technique to modulo schedule loops on clustered micro-architectures. The proposed scheme uses a multi-level graph partitioning strategy to distribute the workload among clusters and reduces the number of...

Early Evaluation Techniques for Low Power Binding

by Eren Kurshan

2024

This paper presents effective metrics to evaluate the power dissipation of scheduled data flow graphs (DFGs). This enables early evaluation of schedules without performing the computationally expensive resource-binding step. Our metrics...

Just-in-time scheduling for loop-based speculative parallelization

by Diego R. Llanos

2024

Scheduling for speculative parallelization is a problem that remained unsolved despite its importance. Simple methods such as Fixed-Size Chunking (FSC) need several 'dry-runs' before an acceptable chunk size is found. Other traditional...

Scaling Up Modulo Scheduling for High-Level Synthesis

by Vanderlei Bonato

2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

High-Level Synthesis tools have been increasingly used within the hardware design community to bridge the gap between productivity and the need to design large and complex systems. When targeting heterogeneous systems, where the CPU and...

SPAID: an architectural synthesis tool for DSP custom applications

by Baher Haroun

2024

SPAID is a design tool that maps digital signal processing (DSP) algorithms into a multibus VLSI architecture. Algorithm structure, design style of functional units (FU's), and parallelism of the architecture are all explored in...

HAL, EMUCS, SPLICER, AND CATREE SYNTHESIS FOR THE ELLIPTIC FILTER

... accessed sequentially. For more parallelism, splitting arrays among different RAM’s would be a user-definable task. The bottlenecks resulting from writing to register files or in accessing the memory RAM may not be resolved and may result in not obtaining as high throughput rates and using much more register storage than those obtained for random topology architectures. This has led to the development of CATHEDRAL-III [38] targeted for higher throughputs. CATHEDRAL-III maps every operation of the algorithm on a one-to-one basis to hardwired FU’s. This approach achieves the maximum possible throughput, at the expense of underutilized FU’s.

Fig. 3. Multibus multifunctional unit architecture with self-timed interface.

Fig. 1. SPAID in context of a silicon compiler.

Fig. 2. Initial architecture used in synthesis.

Sulodexida para la enfermedad venosa crónica en etapas clínicas C3 y C4. Estudio abierto observacional

by A. Frati-munari

2024, Revista mexicana de angiología

Mechanochemistry

by K.L. Sebastian

2024, Resonance

Nano-sized molecular motors, which consume chemicals and do mechanical work are ubiquitous in nature. One of the most powerful such motors is the viral packaging motor, which consumes ATP and packages the viral DNA into the procapsid (the...

Automated Loop Fusion for Image Processing

by Madushan Abeysinghe

2024, TechRxiv

In this paper, we develop a method for automatically selecting groups of loops to fuse in an image processing data flow graph, here referred to as a "fusing configuration". The method is designed for use on Digital Signal Processors...

Sources and relative importance of PCDD and PCDF emissions

by M. Lind

2024, Waste Management & Research

Polychlorinated dioxins (PCDD) and dibenzofurans (PCDF) have been identified in technical products and pesticides, most of which are not very widely used today. Other sources are incinerators of various types like MSW incinerators,...

Reevaluating Data Stall Time with the Consideration of Data Access Concurrency

by Xian-he Sun

2024, Journal of Computer Science and Technology

A New Modelling Framework for Coarse-Grained Programmable Architectures

by Eva DOKLADALOVA

2024

Coarse-grained reconfigurable architectures (CGRA) are designed to deliver high-performance computing while drastically reducing the latency of the computing system. Although they are often highly domain-specifically optimized, they keep...

Transport-Triggered Soft Cores

by Jarmo Takala

2024, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Soft cores are used as flexible software programmable components in FPGA designs. Transport-Triggered Architecture (TTA) is interesting for this use due to its scalability, modularity, simplified register files (RF) and fine-grained...

Enhancement of MCM testability using an embedded reconfigurable FPGA

by peyman nejat dehkordi

2024, 1997 Proceedings Second Annual IEEE International Conference on Innovative Systems in Silicon

The testability of an MCM can be enhanced significantly for very little cost whenever a reprogrammable FPGA component that is already embedded in the MCM for functionality is utilized for diagnostics. This approach can have some of the...

Lower and upper bounds for the resource-constrained modulo scheduling problem

by Christian Artigues

2024, HAL (Le Centre pour la Communication Scientifique Directe)

Lagrangian relaxation-based lower bound for resource-constrained modulo scheduling

by Christian Artigues

2024, Electronic Notes in Discrete Mathematics

Ayala, Artigues, Gacias (LAAS), Lagrangian relaxation for the RCMSP, ISCO 2010. Outline: 3 Lagrangian relaxation, 4 Experimental results, 5 Conclusion and...

Each task $i$ uses each resource $s$ such that $b_i^s > 0$ at time $\sigma_i \bmod \lambda$.

Minimum register instruction sequence problem: revisiting optimal code generation for DAGs

by R. Govindarajan

2024, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001

We revisit the optimal code generation or evaluation order determination problem, that is, the problem of generating an instruction sequence from a data dependence graph (DDG). In particular, we are interested in generating an instruction sequence...

A theory for software-hardware co-scheduling for ASIPs and embedded processors

by R. Govindarajan

2024, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded processors. Existing techniques deal with either scheduling...

Co-Scheduling Hardware and Software Pipelines

by R. Govindarajan

2024

In this paper we propose Co-Scheduling, a framework for simultaneous design of hardware pipeline structures and software-pipelined schedules. Two important components of the Co-Scheduling framework are: (1) An extension to the...

Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

by R. Govindarajan

2024, International Journal of Parallel Programming

Instruction scheduling methods which use the concepts developed by the classical pipeline theory have been proposed for architectures involving deeply pipelined function units. These methods rely on the construction of state diagrams (or...

How to eliminate non-positive circuits in periodic scheduling: a proactive strategy based on shortest path equations

by Karine Deschinkel

2023, HAL (Le Centre pour la Communication Scientifique Directe)

Usual periodic scheduling problems deal with precedence

Just-In-Time Scheduling for Loop-based Speculative Parallelization

by Belén Palop

2023

The Lexical Differences in Madurese Varieties Spoken by People in Situbondo Regency

by Rhofiatul Badriyah

2023

One characteristic of the Madurese variety used in Situbondo Regency is its lexical differences. Focusing on the Madurese variety used by people to communicate in their daily life, this study aims to describe the lexical...

After calculating the data, the status of a variety is obtained. Mahsun classifies the status of a variety into five types (Mahsun, 2005, p. 176). They are:

Interpretive Map of Madurese Varieties in Situbondo

In addition, there are some principles which are important to be taken into account in determining the status of lexical differences:

Anglicist Volume 05 No 02 (August 2016) | Rhofiatul Badriyah; Erlita Rusnaningtias

Mixed-granularity parallel coarse-grained reconfigurable architecture

by deng jinyi

2023, Proceedings of the 59th ACM/IEEE Design Automation Conference

Coarse-Grained Reconfigurable Architecture (CGRA) is a high-performance computing architecture. However, existing CGRA silicon utilization is low due to the lack of fine-grained parallelism inside Processing Element (PE) and general...

Figure 4: Coarse-grained Parallelism of MP-Model: Vectorization with a Two-level Nested Loop Example And Task-level Parallelism

Figure 3: Mixed-granularity Parallel Model

... which improves the silicon utilization of PE and makes the mapping compact and regular. Besides, a general coarse-grained parallel method is proposed to optimize the PE utilization. In addition to the Mixed-granularity Parallel Model (MP-Model), a co-designed Mixed-granularity Parallel CGRA (MP-CGRA) has also been proposed by adding fine-grained parallelism in PE and increasing PE-level coarse-grained parallelism. Thus silicon utilization of CGRA is effectively improved, leading to a significant optimization of performance.

Influence of Variable Time Operations in Static Instruction Scheduling

by Patricia Borensztejn

2023, Lecture Notes in Computer Science

Instruction Scheduling is the task of deciding what instruction will be executed at which unit of time. The objective is to extract maximum instruction level parallelism for the code. Compilers designed for VLIW and EPIC architectures do...

Software Pipelining for Packet Filters

by Masato Tsuru

2023, Lecture Notes in Computer Science

Packet filters play an essential role in traffic management and security management on the Internet. In order to create software-based packet filters that are fast enough to work even under a DoS attack, it is vital to effectively combine...

How to eliminate non-positive circuits in periodic scheduling: a proactive strategy based on shortest path equations

by Karine Deschinkel

2023, Rairo-operations Research

Usual periodic scheduling problems deal with precedence

Heterogeneous Clustered VLIW Microarchitectures

by David Kaeli

2023, International Symposium on Code Generation and Optimization (CGO'07)

Increasing performance, while at the same time reducing power consumption, is a major design tradeoff in current microprocessors. In this paper, we investigate the potential of using a heterogeneous clustered VLIW microarchitecture. In...

Combinatorial Techniques for Memory Power State Scheduling in Energy-Constrained Systems

by Mitali Singh

2023, Lecture Notes in Computer Science

Energy has emerged as a critical constraint for a large number of portable, wireless devices. For data intensive applications, a significant amount of energy is dissipated in the memory. Advanced memory architectures support multiple...

A New Modelling Framework for Coarse-Grained Programmable Architectures

by Eva Dokladalova

2023

Software pipelining

by Vicki Allan

2023, Proceedings of the 24th annual international symposium on Microarchitecture - MICRO 24

Software Pipelining is a fine-grain loop optimization technique for architectures that support synchronous parallel execution. We compare Lam's software pipelining algorithm with Ebcioğlu and Nakatani's technique. This research seems to...

Incremental foresighted local compaction

by Vicki Allan

2023, ACM SIGMICRO Newsletter

Under timing constraints, local compaction may fail because of poor scheduling decisions. Su [SDWX87] uses foresight to avoid some of the poor scheduling decisions. However, the foresight takes a considerable amount of time. In this paper...

Building a retargetable local instruction scheduler

by Vicki Allan

2023, Software: Practice and Experience

Historically, instruction schedulers have been developed in an ad hoc manner. This paper explores using one scheduler for a number of different architectures and the ramifications of this. In order to achieve this generality, a machine...

Dynamic context management for low power coarse-grained reconfigurable architecture

by Rabi Mahapatra

2023, Proceedings of the 19th ACM Great Lakes symposium on VLSI

Coarse-grained reconfigurable architectures (CGRA) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of its PE array. Although this structure is meant for high performance and...

Design-Space Exploration of Stream Programs through Semantic-Preserving Transformations

by Denis Barthou

2023

Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded resource-constrained system, adapting stream programs to...

The resource-constrained modulo scheduling problem: an experimental study

by Christian Artigues

2023, Computational Optimization and Applications

In this paper, we focus on the resource-constrained modulo scheduling problem, a general periodic scheduling problem, abstracted from the problem solved by compilers when optimizing inner loops at instruction level for VLIW parallel...

Thread-Sensitive Modulo Scheduling for Multicore Processors

by Quân Nguyễn

2023, 2008 37th International Conference on Parallel Processing

This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving the simplicity and...

FARMING OBJECTIVES AND ENVIRONMENTAL ISSUES IN THE VENICE LAGOON WATER BASIN; Proceedings of the Fifth Joint Conference on Agriculture, Food, and the Environment, June 17-18, 1996, Padova, Italy

by Paolo Rosato

2023
