Abstract Syntax Tree

1,017 papers
53 followers
About this topic
An Abstract Syntax Tree (AST) is a hierarchical tree representation of the abstract syntactic structure of source code, where each node denotes a construct occurring in the source code. It abstracts away specific syntax details, focusing instead on the logical structure and relationships of the code elements.
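As a rough illustration of the definition above, the following sketch uses Python's standard `ast` module to parse one statement and list the node types in the resulting tree. The sample statement and the helper `node_names` are assumptions made for this example, not part of the topic description.

```python
import ast

# Hypothetical sample statement, chosen only for illustration.
source = "total = price * (1 + tax)"

# Parse the source text into an AST. Parentheses and whitespace do not
# appear as nodes; only the nesting they imply is kept.
tree = ast.parse(source)

def node_names(node, depth=0):
    """Yield (depth, node type name) pairs in pre-order."""
    yield depth, type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from node_names(child, depth + 1)

outline = list(node_names(tree))
for depth, name in outline:
    print("  " * depth + name)

# Two spellings that differ only in surface syntax give the same tree.
same_tree = ast.dump(ast.parse("x = (1 + 2)")) == ast.dump(ast.parse("x=1+2"))
```

The comparison at the end is one way to see "abstracting away specific syntax details": redundant parentheses and spacing leave the dumped tree unchanged.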

Key research themes

1. How can Abstract Syntax Trees (ASTs) be effectively transformed or encoded into representations that preserve structural information for improved program analysis and model learning?

This research theme investigates innovative methods to encode ASTs into forms suitable for machine learning and program manipulation, preserving key syntactic and semantic structures. Structural preservation is crucial to maintain meaningful relations within code for downstream tasks such as bug prediction, code summarization, or performance modeling. Various encoding schemes—including Prüfer sequences, higher-order abstract syntax encodings, and algebraic representations—are explored for their losslessness, expressiveness, and computational advantages.

Key finding: Introduces the use of Prüfer sequences as a concise and lossless encoding of ASTs, enabling unique reconstruction of the original tree. This encoding captures syntactic importance through node degree-related frequencies and...
Key finding: Proposes Higher-Order Abstract Syntax (HOAS) using typed λ-calculus to embed name binding information uniformly and language-generically within AST representations. HOAS enables correct and efficient manipulation of syntactic...
Key finding: Demonstrates that deep learning models, particularly those based on tree-structured Long Short-Term Memory (LSTM) networks, can leverage hierarchical AST representations to predict relative performance changes due to code...
Key finding: Introduces a syntax-guided generation framework that explicitly uses constituency parse trees to guide top-down hierarchical text generation. By representing syntax explicitly during generation, the approach improves...
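The Prüfer-sequence encoding mentioned in the first finding can be sketched as follows. This is the textbook construction for a labeled tree, not the cited paper's pipeline: it assumes the tree nodes have already been numbered 0..n-1 (with n at least 2), and it encodes only the labeled tree shape. How node types and child order would be carried alongside the sequence is not covered here. The function names and the example tree are hypothetical.

```python
import heapq
from collections import defaultdict

def prufer_encode(edges, n):
    """Return the Prüfer sequence of a labeled tree on nodes 0..n-1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)        # smallest remaining leaf
        parent = next(iter(adj[leaf]))      # its only remaining neighbour
        seq.append(parent)
        adj[parent].discard(leaf)
        if len(adj[parent]) == 1:           # neighbour has become a leaf
            heapq.heappush(leaves, parent)
    return seq

def prufer_decode(seq):
    """Rebuild the edge list of the tree from its Prüfer sequence."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1                      # a node appears degree-1 times
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# Assumed numbering for the expression a + b * c:
# 0 = Add, 1 = a, 2 = Mul, 3 = b, 4 = c
ast_edges = [(0, 1), (0, 2), (2, 3), (2, 4)]
sequence = prufer_encode(ast_edges, 5)
rebuilt = prufer_decode(sequence)
```

In the sequence, each node appears once fewer than its degree, which is one way to read the "node degree-related frequencies" in the finding: the `Mul` node (degree 3) appears twice, while leaves never appear.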

2. How can formal tree grammar formalisms and algebraic operations facilitate the underspecification, disambiguation, and semantic analysis of abstract syntax trees in natural language processing and programming languages?

This area explores theoretical and algorithmic frameworks for handling underspecification and ambiguity in AST-like tree structures, particularly in scope disambiguation and language semantics. It leverages formal models such as Regular Tree Grammars (RTGs), algebraic operations on block diagrams, and graph grammars to represent, manipulate, and reason about trees with complex binding and composition properties. This facilitates efficient computations for optimal semantic readings and modular semantic specifications in both NLP and programming paradigms.

Key finding: Proposes Regular Tree Grammars (RTGs) as a powerful, expressively complete formalism for underspecified representations of scope ambiguities in natural language, overcoming expressive limitations of dominance graphs. The...
Key finding: Develops an algebraic approach for constructing block diagrams—akin to ASTs for visual languages—based on sequential, parallel, and recursive binary operations. This framework replaces low-level graph connections with...
Key finding: Introduces a framework interpreting classes of string languages as classes of graph languages via typing and interpretations that enforce sequential (concatenation) correspondence with graph operations (such as sequential...

3. What methods improve parsing, error detection, and repair in the processing of syntax trees, particularly for ambiguous or erroneous code and natural language inputs?

This theme focuses on advancing parsing strategies and error repair techniques for AST construction from languages with ambiguous or syntactically incorrect inputs. It includes research on tunnel parsing for ambiguous and ε-ambiguous grammars, use of compiler diagnostics to guide syntax error repair, and reasoning about structural relationships in codebases for better debugging and design pattern recognition. These approaches aim to improve parser robustness, enable efficient ambiguity resolution, and enhance automated syntax error correction using machine learning and formal methods.

Key finding: Extends tunnel parsing algorithms to handle grammars with countable repetitions and empty word-generating configurations without refactoring. Defines classes of ε-ambiguous and ε-deterministic grammars, formalizing structural...
Key finding: Introduces SynFix, a machine learning-based automated syntax error repair tool that leverages compiler diagnostics and unsupervised pre-training via large RoBERTa models to substantially outperform prior approaches. SynFix...
Key finding: Presents SOUL, a logic meta-language for declaratively expressing and extracting structural relationships in class-based object-oriented systems. This logic-based approach facilitates automated reasoning about program...
Key finding: Introduces Entity Aware Data Augmentation (EADA), which constructs Entity Aware Syntax Trees (EASTs) integrating entities with syntactic tree structures to generate diverse, semantically valid augmented sentences for NLU...

All papers in Abstract Syntax Tree

In the traditional university compiler syllabus, each phase of the compiler is studied in detail in its own unit before moving on to the next one. This makes students lose the big picture of the subject. The course format should be updated in...
During language evolution, compiler construction is usually performed along two dimensions: defining new abstract syntax tree (AST) classes, or adding new operations. In order to facilitate such changes, two software design patterns...
The use of object-oriented techniques and concepts, like encapsulation and inheritance, greatly improves language specifications towards better modularity, reusability and extensibility. Additional improvements can be achieved with...
We propose a new framework for the syntax and semantics of Weak Hereditarily Harrop logic programming with constraints, based on resolution over τ-categories: finite product categories with canonical structure. Constraint information is...
Detecting code clones is an established method for comprehending and maintaining systems. One important but challenging form of code clone detection involves detecting semantic clones, which are those that are semantically similar code...
An efficient lightweight forward static slicing tool is presented. The tool is implemented on top of srcML, an XML representation of source code. The approach does not compute the full program dependence graph but instead dependency...
We are used to the idea that computers operate on numbers, yet another kind of data is equally important: the syntax of formal languages, with variables, binding, and alpha-equivalence. The original application of nominal techniques, and...
Compiler front ends often perform desugaring on the source code while constructing the abstract syntax tree (AST). A programming language processing tool (such as a refactoring tool) working with the desugared AST perceives the code at...
Previous work in slogan generation focused on generating novel slogans by utilising templates mined from real slogans. While some such slogans can be catchy, they are often not coherent with the company’s focus or style across their...
This paper presents the design and usage of a language for Conceptual Data Modeling in Model-based Systems Engineering. Based on an existing analysis of presently employed data modeling languages, a new conceptual data modeling language...
The complexity of modern software has increased security risks, emphasizing the need for automated detection and correction. DCodeBERT, a CodeBERT-based vulnerability detection and remediation framework, is introduced in this study....
Recent IDEs have become more extensible tool platforms but do not concern themselves with how other tools running on them collaborate with each other. They compel developers to use proprietary representations or the classical abstract...
In this paper, we define a neuro-symbolic approach to address the task of finding semantically similar clones for the codes of the legacy programming language COBOL, without training data. We define a meta-model that is instantiated to...
In the field of software development, ensuring the accuracy and quality of code remains a paramount concern. The task of precisely classifying code as correct or incorrect poses inherent challenges. This research introduces a...
This paper provides an overview of compiler design, covering previous and existing concepts as well as current technologies. It also covers the definition, history, phases of compiler, structure...
In this paper UML statechart diagrams are used as an example of a generic approach to integrating a visual language in a heterogeneous modelling and simulation environment. A system represented in a visual language is syntactically...
In this article we present a method for describing the language of UML statecharts. Statecharts are syntactically defined as attributed graphs, with well-formedness rules specified by a set of first-order predicates over the abstract...
The information value of stock message boards has often been debated. A main difficulty in assessing the value is the presence of a large number of posts with varying quality. This paper presents an intuitive approach to identify and...
We consider the problem of learning to repair erroneous C programs by learning optimal alignments with correct programs. Since the previous approaches fix a single error in a line, it is inevitable to iterate the fixing process until no...
Modal logics are often criticised for their coarse grain representation of knowledge of possibilities about assertions. That is to say, if two assertions are possible in the current world, their further properties are indistinguishable in...
We examine the key syntactic and semantic aspects of a nominal framework allowing scopes of name bindings to be arbitrarily interleaved. Name binding (e.g. delta x.M) is handled by explicit name-creation and name-destruction brackets...
The Fraenkel-Mostowski permutation model of set theory with atoms (FM-sets) can serve as the semantic basis of meta-logics for specifying and reasoning about formal systems involving name binding, α-conversion, capture avoiding...
We present a generalisation of first-order unification to the practically important case of equations between terms involving binding operations. A substitution of terms for variables solves such an equation if it makes the equated terms...
The permutation model of set theory with atoms (FM-sets), devised by Fraenkel and Mostowski in the 1930s, supports notions of 'name-abstraction' and 'fresh name' that provide a new way to represent, compute with, and reason about the...
Two-level lambda-calculus is designed to provide a mathematical model of capturing substitution, also called instantiation. Instantiation is a feature of the 'informal meta-level'; it appears pervasively in specifications of the syntax...
This paper is concerned with the form of typed name binding used by the FreshML family of languages. Its characteristic feature is that a name binding is represented by an abstract (name,value)-pair that may only be deconstructed via the...
We propose a method for retrieving histories of code clones. Many code clone detection methods have been proposed, but few studies have focused on histories of code clones. Histories of code clones are useful for retrieving somewhile clone...
We demonstrate a tiny, yet non-trivial evaluator that is powerful enough to run practical code, including itself. This is made possible using a Higher-Order Abstract Syntax (HOAS) representation, a technique that has become popular in...
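To give a feel for the HOAS idea in the abstract above, here is a minimal sketch of a HOAS-style evaluator. It is not the evaluator from the cited paper: the tiny object language (`Num`, `Add`, `Lam`, `App`) and the function `evaluate` are hypothetical, chosen only to show binders being represented by host-language functions.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: "Term"
    right: "Term"

@dataclass
class Lam:
    # The binder is a host-language function from a term to a term,
    # so the object language needs no variable names of its own.
    body: Callable[["Term"], "Term"]

@dataclass
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Num, Add, Lam, App]

def evaluate(term):
    """Call-by-value evaluation; substitution is plain function application."""
    if isinstance(term, (Num, Lam)):
        return term
    if isinstance(term, Add):
        left, right = evaluate(term.left), evaluate(term.right)
        return Num(left.value + right.value)
    if isinstance(term, App):
        fn = evaluate(term.fn)
        return evaluate(fn.body(evaluate(term.arg)))
    raise TypeError(f"unknown term: {term!r}")

# (λx. x + x) applied to 21
double = Lam(lambda x: Add(x, x))
result = evaluate(App(double, Num(21)))
```

Because the binder is an ordinary Python function, the evaluator contains no environment and no renaming logic; capture-avoiding substitution is inherited from the host language.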