Academia.eduAcademia.edu

Outline

A Programming Language Interface to Describe Transformations and Code Generation

2011, Languages and Compilers for Parallel Computing

https://doi.org/10.1007/978-3-642-19595-2_10

Abstract

This paper presents a programming language interface, a complete scripting language, to describe composable compiler transformations. These transformation programs can be written, shared and reused by non-expert application and library developers. From a compiler writer's perspective, a scripting language interface permits rapid prototyping of compiler algorithms that can mix levels and compose different sequences of transformations, producing readable code as output. From a library or application developer's perspective, the use of transformation programs permits expression of clean high-level code, and a separate description of how to map that code to architectural features, easing maintenance and porting to new architectures. We illustrate this interface in the context of CUDA-CHiLL, a sourceto-source compiler transformation and code generation framework that transforms sequential loop nests to high-performance GPU code. We show how this high-level transformation and code generation language can be used to express: (1) complex transformation sequences, exemplified by a single loop restructuring construct used to generate a series of tiling and permute commands; and, (2) complex code generation sequences to produce CUDA code from a high-level specification. We demonstrate that the automatically-generated code either performs closely or outperforms two hand-tuned GPU library kernels from Nvidia's CUBLAS 2.2 and 3.2 libraries.

A Programming Language Interface to Describe Transformations and Code Generation Gabe Rudy1 , Malik Murtaza Khan2 , Mary Hall1 , Chun Chen1 , and Jacqueline Chame2 1 School of Computing, University of Utah, Salt Lake City, UT 2 USC/Information Sciences Institute, Marina del Rey CA Abstract. This paper presents a programming language interface, a complete scripting language, to describe composable compiler transformations. These transformation programs can be written, shared and reused by non-expert application and library developers. From a compiler writer’s perspective, a scripting language interface permits rapid prototyping of compiler algorithms that can mix levels and compose different sequences of transformations, producing readable code as output. From a library or application developer’s perspective, the use of transformation programs permits expression of clean high-level code, and a separate description of how to map that code to architectural features, easing maintenance and porting to new architectures. We illustrate this interface in the context of CUDA-CHiLL, a sourceto-source compiler transformation and code generation framework that transforms sequential loop nests to high-performance GPU code. We show how this high-level transformation and code generation language can be used to express: (1) complex transformation sequences, exemplified by a single loop restructuring construct used to generate a series of tiling and permute commands; and, (2) complex code generation sequences to produce CUDA code from a high-level specification. We demonstrate that the automatically-generated code either performs closely or outperforms two hand-tuned GPU library kernels from Nvidia’s CUBLAS 2.2 and 3.2 libraries. 1 Introduction Programmers of petascale and high-performance desktop platforms alike demand high levels of performance on their library and application code. Despite a large body of research on compiler techniques for increasing parallelism or better managing the memory hierarchy [34,31,32,25,28,4,20,16,23,21,1,18,19,12], the complexity of optimizing for today’s diverse architectural features leaves a significant gap between performance produced by a compiler and that achievable by a savvy programmer. For this reason, many such programmers continue to manually apply the very same code transformations that their compiler can apply automatically, not only increasing programming time but also producing low-level architecture-specific code that is difficult to port and maintain. The performance differences between automatic and manual optimization stem from many sources: K. Cooper, J. Mellor-Crummey, and V. Sarkar (Eds.): LCPC 2010, LNCS 6548, pp. 136–150, 2011. c Springer-Verlag Berlin Heidelberg 2011  Programming Language for Optimizations 137 – Compilers must be conservative to avoid generating incorrect code and are therefore limited to optimizations that are provably safe. – Compiler decision algorithms are usually based on static compile-time analysis which cannot always accurately predict dynamic behavior. – Programmers can fundamentally rewrite their code (e.g., use a new algorithm), while compilers must optimize the computation as written. Even with these limitations, compilers nevertheless provide powerful and robust techniques for transformation and code generation, which if harnessed, could greatly accelerate the human effort involved in performance tuning and facilitate clean, portable code for high-performance architectures. Our research has been addressing this performance gap in two ways: (1) we use auto-tuning to generate a search space of alternative implementations of key computations and then explore this space with empirical measurement to identify the best-performing solution; and, (2) we collaborate with application and library developers in describing how to generate code through a composition of code transformations. These two separate concepts have motivated a rethinking of the compiler structure and how both application/library programmers and compiler developers should interact with it. A previous paper described transformation recipes, which are descriptions of a composition of program transformations to be applied to a piece of code containing loop nests [9]. These recipes can be written by a programmer or generated automatically by the compiler using the same structure. Similar concepts appear in the literature, some recent, although primarily other work focuses on annotations on the source code [22,7,10,36]. The recipes in [9] are in a separate script, permitting greater flexibility and portability, since a collection of different scripts, possibly targeting different architectures, can be associated with the same high-level code. This paper takes the notion of transformation recipes a significant step further by providing a programming language interface for describing transformation and code generation. We have developed an embedded scripting language interface to express high-level transformation recipes that are translated to a lower-level interface, in the same way that high-level programming languages are translated to lower level ones to improve programmer productivity. Using a programming language offers several advantages: (1) the system can integrate queries of the program state and other control flow to guide optimization decisions; (2) the programmer can create variables to refer to output objects from code transformations to which additional transformations can be applied, and which are used to produce readable code; and, (3) the programmer of the system can express encapsulation into functions to prototype or implement high-level transformation algorithms, which can be reused by other, possibly less-skilled developers or for other pieces of similar code. To illustrate the power of this programming language interface, we have used it to implement a core set of transformations to generate CUDA code, a parallel programming language interface for Nvidia GPUs [15]. Each CUDAspecific transformation generates a composition of a sequence standard compiler
About the author
Papers
140
Followers
12
View all papers from Haiwad Khattakarrow_forward