Academia.edu

Compiler Optimization

1,376 papers
226 followers
About this topic
Compiler optimization is the process of improving the performance and efficiency of compiled code by transforming the intermediate representation of a program. This involves techniques that enhance execution speed, reduce memory usage, and minimize resource consumption, while preserving the program's correctness and intended functionality.
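As a rough illustration of transforming an intermediate representation while preserving program behaviour, the sketch below applies constant folding to a toy expression tree. The tuple-based IR and the names `fold_constants` and `evaluate` are assumptions made for this example; they do not come from any paper listed on this page or from any real compiler.

```python
# Hypothetical toy IR: an expression is an int constant, a variable name (str),
# or a tuple (op, left, right) where op is "+" or "*".

def fold_constants(expr):
    """Return an equivalent expression with constant subtrees pre-computed."""
    if not isinstance(expr, tuple):
        return expr  # constants and variables are already as simple as possible
    op, left, right = expr
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        # Both operands are known at compile time, so compute the result now.
        return left + right if op == "+" else left * right
    return (op, left, right)

def evaluate(expr, env):
    """Reference interpreter, used only to check that folding keeps the meaning."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

original = ("+", ("*", 2, 3), "x")
folded = fold_constants(original)
```

Here `folded` contains one fewer operation than `original`, and evaluating both under the same variable bindings gives the same value, which is the correctness condition the description above refers to.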

Key research themes

1. How can machine learning and AI techniques improve compiler optimization across varying applications and architectures?

This theme focuses on leveraging machine learning (ML) and artificial intelligence (AI) methods to automate and enhance compiler optimization strategies. It addresses the challenge of adapting compiler heuristics to complex program features and diverse microarchitectures, enabling compilers to learn effective optimization passes dynamically and predictively. The goal is to improve execution efficiency, resource utilization, and performance portability while reducing human effort in tuning and retargeting compilers for new programs and hardware.

Key finding: This paper surveys cache optimization and multi-memory allocation features, highlighting that machine learning (ML) techniques can guide sustainable computing strategies by intelligently selecting optimization methods... Read more
Key finding: The paper presents a novel ML model that predicts the best optimization passes for any new program on varying microarchitectural configurations, enabling automatic adaptation without retuning. Across 200 microarchitecture... Read more
Key finding: This study introduces a grammar-based genetic programming framework to automatically generate effective program features for ML models used in compiler heuristics. Applied to loop unrolling optimization in GCC, automatically... Read more
Key finding: The paper demonstrates that AI-based compiler optimizations, leveraging reinforcement learning and neural architecture search, outperform traditional techniques in optimizing machine learning workloads in terms of energy... Read more
Key finding: This work proposes a global constraints-driven strategy (GCDS) using multiple optimization sequences and a posteriori evaluation of code size vs. performance trade-offs across entire applications rather than individual loops.... Read more

2. What advanced loop transformation and data locality optimizations can compilers implement to improve performance on modern architectures?

This theme covers compiler techniques focused on high-level loop transformations such as induction variable analysis, scalar evolution, loop interchange, skewing, and vectorization enhancements to improve data locality, parallelism, and memory hierarchy utilization. Efficient management of loop-carried dependencies and leveraging advanced analyses like dependence analysis enable compilers to unlock more effective loop-level optimizations, crucial for performance on systems with deep memory hierarchies and parallel processors.
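A minimal sketch of one of these transformations, loop interchange, is shown below. It assumes a matrix stored in row-major order in a flat list; the function names are hypothetical, and Python is used only to show the access order, not to measure performance. Integer addition is used so that reordering the iterations cannot change the result; in general a compiler must prove, via dependence analysis, that the interchange is legal.

```python
def sum_before_interchange(a, rows, cols):
    """Column loop outermost: consecutive accesses are `cols` elements apart."""
    total, visited = 0, []
    for j in range(cols):
        for i in range(rows):
            idx = i * cols + j  # row-major index of element (i, j)
            visited.append(idx)
            total += a[idx]
    return total, visited

def sum_after_interchange(a, rows, cols):
    """Row loop outermost: consecutive accesses touch adjacent elements."""
    total, visited = 0, []
    for i in range(rows):
        for j in range(cols):
            idx = i * cols + j
            visited.append(idx)
            total += a[idx]
    return total, visited

matrix = [1, 2, 3, 4, 5, 6]  # a 2 x 3 matrix in row-major order
before_total, before_order = sum_before_interchange(matrix, 2, 3)
after_total, after_order = sum_after_interchange(matrix, 2, 3)
```

Both versions compute the same sum, but the interchanged loop visits memory in unit stride (`after_order` is 0, 1, 2, 3, 4, 5), which is the access pattern that tends to make better use of caches on hardware with deep memory hierarchies.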

Key finding: The paper details the design of a GCC infrastructure leveraging TreeSSA for improved induction variable and scalar evolution analysis combined with data dependence tests, enabling a matrix-based approach to safely and... Read more
Key finding: By introducing a novel partial sums reordering transformation that exploits symmetry and common subexpressions in high-order stencil computations, this compiler optimization significantly reduces floating-point operations and... Read more
Key finding: URECA introduces a compiler-managed unified nonrotating register file (RF) for CGRA architectures that efficiently handles both recurring and nonrecurring loop variables by dynamically partitioning the RF and preloading... Read more
Key finding: The paper proposes new polynomial-time algorithms for counting and computing integer affine transformations of unions of parametric Z-polytopes—a mathematical abstraction critical for analyzing loop nests with parameters. The... Read more

3. How can compiler frameworks enable higher-level programming models and integrate heterogeneous computing workflows effectively?

This theme investigates compiler design approaches that bridge high-level programming abstractions—such as tasks, parallel loops, and domain-specific functions—with low-level hardware execution models, especially in heterogeneous systems. The objective is to combine programmability, portability, and performance by transforming high-level constructs (e.g., OpenMP tasks) into efficient execution engines like CUDA graphs or symbolic compilation frameworks, facilitating automated parallelization and hybrid CPU-GPU programming.

Key finding: This paper presents a novel compiler transformation that converts OpenMP tasking and accelerator model code into CUDA graphs by representing OpenMP programs as static task dependency graphs (TDGs). The approach uncouples... Read more
Key finding: TAM is a parallelizing compiler front-end that parallelizes all stages of compilation (lexical, syntax, semantic analysis, and IR generation) using data dependency graphs to maximize utilization of multicore CPUs. It... Read more
Key finding: Grisette provides a purely functional, statically typed symbolic evaluation framework implemented as a library, allowing symbolic compilation and reasoning about all program paths with merged states using ordered-guards... Read more
Key finding: The authors propose methods combining semantic inference with multi-threading and reduced input sampling to automatically generate compiler and interpreter components for domain-specific languages (DSLs) directly from example... Read more

All papers in Compiler Optimization

The quality of compiler-optimized code for high-performance applications is far behind what optimization and domain experts can achieve by hand. Although it may seem surprising at first glance, the performance gap has been widening over... more
Quantum simulation of lattice gauge theories (LGTs) represents one of the most promising applications of near-term intermediate-scale quantum (NISQ) devices. This review provides a comprehensive synthesis of recent developments in... more
Matrix multiplication is a fundamental operation in linear algebra libraries, serving as the computational backbone for scientific computing, machine learning, and data analytics applications. This paper presents a comprehensive analysis... more
There is an active research community concentrating on visualizations of algorithms taught in CS1 and CS2 courses. These visualizations can help students to create concrete visual images of the algorithms and their underlying concepts.... more
The efficiency of a piece of software is a key factor for many systems. Real-time programs, critical software, device drivers, OS kernel functions and many other pieces of software that are executed thousands or even millions of times per day... more
The work described here introduces a practical and accurate tool for predicting power consumption for FPGA circuits. The utility of the tool is that it enables FPGA circuit designers to evaluate the power consumption of their designs... more
The Aho-Corasick algorithm derives a failure deterministic finite automaton for finding matches of a finite set of keywords in a text. It has the minimum number of transitions needed for this task. The DFA-Homomorphic Algorithm (DHA)... more
Failure of a safety-critical application on an embedded processor can lead to severe damage or even loss of life. Here we are concerned with two kinds of failure: stack overflow, which usually leads to run-time errors that are difficult... more
The increasing complexity of cloud-native, distributed, and AI-enabled software systems has rendered traditional, static quality assurance (QA) practices increasingly inadequate, as fixed test suites and manually curated strategies... more
In this article, we investigate compiler transformation techniques regarding the problem of scheduling VLIW instructions aimed at reducing power consumption of VLIW architectures in the instruction bus. The problem can be categorized into... more
Graphics Processing Units (GPU) have become the platform of choice for accelerating a large range of data parallel and task parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high performance... more
The performance of data-parallel processing can be highly sensitive to any contention in memory. In contrast to multi-core CPUs which employ a number of memory latency minimization techniques such as multi-level caching and prefetching,... more
This paper discusses a repertoire of well-known and new compiler optimizations that help produce excellent server application performance and investigates their performance contributions. These optimizations combined produce a 40%... more
This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical... more
Hybrid architectures combining conventional processors with configurable logic resources enable efficient coordination of control with datapath computation. With integration of the two components on a single device, loop control and... more
Modern software demands high performance, portability, and adaptability, driving innovations in compiler technologies. This article investigates advanced optimization strategies in modern compilers, focusing on Just-In-Time (JIT),... more
The widespread use of the continuation-passing style (CPS) transformation in compilers, optimizers, abstract interpreters, and partial evaluators reflects a common belief that the transformation has a positive effect on the analysis of... more
Quantum computing is poised to revolutionize computational paradigms by leveraging quantum mechanics principles such as superposition and entanglement. However, the full-scale deployment of quantum applications remains constrained by... more
In worst-case execution time (WCET) analysis, loops in particular are an inherent source of unpredictability and loss of precision. This is caused by the difficulty of obtaining safe and tight information on the number of iterations... more
This thesis would not have been completed without the help of others. I would like to take this opportunity to express my gratitude towards them and acknowledge them. First of all, I would like to offer my deepest gratitude to my... more
The Amsterdam Compiler Kit is a widely used compiler building system. Up until now, the emphasis has been on producing good object code. In this paper we describe recent work that has focused on reducing compile time. The techniques... more
The general translator formalism and computing specific implementations are proposed. The implementation of specific elements necessary to process the source and destination information within the translators are presented. Some common... more
We exhibit an aggressive optimizing compiler for a functional programming language which includes a first-class forward automatic differentiation (AD) operator. The compiler's performance is competitive with FORTRAN-based systems on our... more
Execution of a program almost always involves multiple address spaces, possibly across separate machines. Here, an approach to reducing such costs using compiler optimization techniques is presented. This paper elaborates on the overall... more
A colleague describes working with Greg at IBM Research.
Current trends in many-core architectures show a switch from a small number of architecturally sophisticated cores (e.g., Intel Core2, IBM PowerPC) to many simple cores (e.g., SiCortex and Tilera multiprocessors). These simple cores lack many... more
This paper is an introduction to Lambdix, a lazy Lisp interpreter implemented at the Research Laboratory of Paris XI University (Laboratoire de Recherche en Informatique, Orsay). Lambdix was devised in the course of an investigation into... more