Abstract Syntax Tree

1,017 papers

53 followers

About this topic

An Abstract Syntax Tree (AST) is a hierarchical tree representation of the abstract syntactic structure of source code, where each node denotes a construct occurring in the source code. It abstracts away specific syntax details, focusing instead on the logical structure and relationships of the code elements.
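To make the definition concrete, here is a minimal sketch using Python's standard `ast` module. Python is chosen only for illustration; the topic itself is language-agnostic, and the variable names are hypothetical. The sketch suggests how parentheses and spacing leave no trace in the tree: grouping survives only as nesting.

```python
import ast

# Two spellings of the same statement: parentheses and spacing differ.
tree_a = ast.parse("total = price * (1 + tax)")
tree_b = ast.parse("total = (price) * ( 1+tax )")

# ast.dump() omits line/column attributes by default, so comparing the
# dumps compares tree structure rather than surface syntax.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)

assign = tree_a.body[0]                          # the assignment statement
outer = type(assign.value).__name__              # node type of the product
outer_op = type(assign.value.op).__name__        # operator of the product
inner_op = type(assign.value.right.op).__name__  # operator of the nested sum

print(same_structure, outer, outer_op, inner_op)
```

The nested sum appears as the right child of the multiplication node, which is how the tree records what the parentheses expressed in the source text.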

Key research themes

1. How can Abstract Syntax Trees (ASTs) be effectively transformed or encoded into representations that preserve structural information for improved program analysis and model learning?

This research theme investigates innovative methods to encode ASTs into forms suitable for machine learning and program manipulation, preserving key syntactic and semantic structures. Structural preservation is crucial to maintain meaningful relations within code for downstream tasks such as bug prediction, code summarization, or performance modeling. Various encoding schemes—including Prüfer sequences, higher-order abstract syntax encodings, and algebraic representations—are explored for their losslessness, expressiveness, and computational advantages.
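As a rough illustration of one such encoding, the sketch below implements the textbook Prüfer sequence for a tree whose nodes carry unique integer ids 0..n-1. It is not the method of any paper listed here. It assumes AST nodes have already been numbered, and it treats the tree as unrooted, so node types, the root, and child order would have to be stored separately. The function names are hypothetical.

```python
import heapq

def prufer_encode(edges, n):
    """Encode a tree on nodes 0..n-1 as its Prüfer sequence (length n-2)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)          # smallest-numbered leaf
        neighbor = next(iter(adj[leaf]))      # its only remaining neighbor
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:           # neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq

def prufer_decode(seq):
    """Rebuild the edge list of the unique tree with this Prüfer sequence."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1                        # each appearance adds one edge
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# Hypothetical numbering of the AST for a * (b + c):
# 0 = Mult, 1 = a, 2 = Add, 3 = b, 4 = c
ast_edges = [(0, 1), (0, 2), (2, 3), (2, 4)]
sequence = prufer_encode(ast_edges, 5)
rebuilt = prufer_decode(sequence)
```

In a Prüfer sequence each node appears exactly (degree - 1) times, so interior nodes with many children occur more often than leaves, which never occur at all.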

Code Representation Learning Using Prüfer Sequences (Student Abstract)

by Tenzin Jinpa

2024, Proceedings of the ... AAAI Conference on Artificial Intelligence

Key finding: Introduces the use of Prüfer sequences as a concise and lossless encoding of ASTs, enabling unique reconstruction of the original tree. This encoding captures syntactic importance through node degree-related frequencies and...

Higher-order abstract syntax

by Conal Elliott

2013, Sigplan Notices

Key finding: Proposes Higher-Order Abstract Syntax (HOAS) using typed λ-calculus to embed name binding information uniformly and language-generically within AST representations. HOAS enables correct and efficient manipulation of syntactic...
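The core idea of HOAS can be sketched in a few lines. The example below is an untyped Python illustration, not the typed λ-calculus formulation the paper describes, and every class and function name in it is hypothetical. A binder is represented by a host-language function, so substitution and capture avoidance are inherited from Python's own function application.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: "Term"
    right: "Term"

@dataclass
class Lam:
    # The bound variable has no name: the body is a Python function
    # from the argument term to the resulting term.
    body: Callable[["Term"], "Term"]

@dataclass
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Num, Add, Lam, App]

def evaluate(term):
    """Call-by-value evaluator; assumes the term is well formed."""
    if isinstance(term, (Num, Lam)):
        return term
    if isinstance(term, Add):
        return Num(evaluate(term.left).value + evaluate(term.right).value)
    if isinstance(term, App):
        fn = evaluate(term.fn)
        # Substitution is just calling the host-language function.
        return evaluate(fn.body(evaluate(term.arg)))
    raise TypeError(f"unknown term: {term!r}")

increment = Lam(lambda x: Add(x, Num(1)))
result = evaluate(App(increment, Num(41)))

curried_add = Lam(lambda x: Lam(lambda y: Add(x, y)))
result2 = evaluate(App(App(curried_add, Num(1)), Num(2)))
```

The trade-off of this style is that a `Lam` body is opaque: it can be applied but not inspected, which is one reason first-order alternatives such as nominal techniques are also studied.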

Comparative Code Structure Analysis using Deep Learning for Performance Prediction

by Tanzima Islam

2024

Key finding: Demonstrates that deep learning models, particularly those based on tree-structured Long Short-Term Memory (LSTM) networks, can leverage hierarchical AST representations to predict relative performance changes due to code...

Explicit Syntactic Guidance for Neural Text Generation

by Leyang Cui

2023, arXiv (Cornell University)

Key finding: Introduces a syntax-guided generation framework that explicitly uses constituency parse trees to guide top-down hierarchical text generation. By representing syntax explicitly during generation, the approach improves...

2. How can formal tree grammar formalisms and algebraic operations facilitate the underspecification, disambiguation, and semantic analysis of abstract syntax trees in natural language processing and programming languages?

This area explores theoretical and algorithmic frameworks for handling underspecification and ambiguity in AST-like tree structures, particularly in scope disambiguation and language semantics. It leverages formal models such as Regular Tree Grammars (RTGs), algebraic operations on block diagrams, and graph grammars to represent, manipulate, and reason about trees with complex binding and composition properties. This facilitates efficient computations for optimal semantic readings and modular semantic specifications in both NLP and programming paradigms.

Regular Tree Grammars as a Formalism for Scope Underspecification

by Michaela Regneri and

2015

Key finding: Proposes Regular Tree Grammars (RTGs) as a powerful, expressively complete formalism for underspecified representations of scope ambiguities in natural language, overcoming expressive limitations of dominance graphs. The...

An Algebra for Block Diagram Languages

by Yann Orlarey

2023

Key finding: Develops an algebraic approach for constructing block diagrams (akin to ASTs for visual languages) based on sequential, parallel, and recursive binary operations. This framework replaces low-level graph connections with...

Graph Grammars and Operations on Graphs

by Jan Joris Vereijken

2024

Key finding: Introduces a framework interpreting classes of string languages as classes of graph languages via typing and interpretations that enforce sequential (concatenation) correspondence with graph operations (such as sequential...

3. What methods improve parsing, error detection, and repair in the processing of syntax trees, particularly for ambiguous or erroneous code and natural language inputs?

This theme focuses on advancing parsing strategies and error repair techniques for AST construction from languages with ambiguous or syntactically incorrect inputs. It includes research on tunnel parsing for ambiguous and ε-ambiguous grammars, use of compiler diagnostics to guide syntax error repair, and reasoning about structural relationships in codebases for better debugging and design pattern recognition. These approaches aim to improve parser robustness, enable efficient ambiguity resolution, and enhance automated syntax error correction using machine learning and formal methods.

Tunnel Parsing with Ambiguous Grammars

by Elena Somova

2024, Cybernetics and Information Technologies

Key finding: Extends tunnel parsing algorithms to handle grammars with countable repetitions and empty word-generating configurations without refactoring. Defines classes of ε-ambiguous and ε-deterministic grammars, formalizing structural...

SYNFIX: Automatically Fixing Syntax Errors using Compiler Diagnostics

by Premkumar Devanbu

2023

Key finding: Introduces SynFix, a machine learning-based automated syntax error repair tool that leverages compiler diagnostics and unsupervised pre-training via large RoBERTa models to substantially outperform prior approaches. SynFix...

Declarative reasoning about the structure of object-oriented systems

by Roel Wuyts

2025, Proceedings. Technology of Object-Oriented Languages. TOOLS 26 (Cat. No.98EX176)

Key finding: Presents SOUL, a logic meta-language for declaratively expressing and extracting structural relationships in class-based object-oriented systems. This logic-based approach facilitates automated reasoning about program...

Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding

by Jiangneng Li

2023, arXiv (Cornell University)

Key finding: Introduces Entity Aware Data Augmentation (EADA), which constructs Entity Aware Syntax Trees (EASTs) integrating entities with syntactic tree structures to generate diverse, semantically valid augmented sentences for NLU...

All papers in Abstract Syntax Tree

COMPILERS IN UNIVERSITY SYLLABUS

by Abhishek Kundu

2026

In a traditional university compiler syllabus, each phase of the compiler is studied in detail in its own unit before moving on to the next one. This makes students lose the big picture of the subject. The course format should be updated in...

A Two-Dimensional Separation of Concerns for Compiler Construction

by Marjan Mernik

2026, Proceedings of the …

During language evolution, compiler construction is usually performed along two dimensions: defining new abstract syntax tree (AST) classes, or adding new operations. In order to facilitate such changes, two software design patterns...

AspectLISA: An Aspect-oriented Compiler Construction System Based on Attribute Grammars

by Marjan Mernik

2026, Electronic Notes in Theoretical Computer Science

The use of object-oriented techniques and concepts, like encapsulation and inheritance, greatly improves language specifications towards better modularity, reusability and extensibility. Additional improvements can be achieved with...

Erratum to: “A new framework for declarative programming”

by James Lipton

2026, Theoretical Computer Science

We propose a new framework for the syntax and semantics of Weak Hereditarily Harrop logic programming with constraints, based on resolution over -categories: finite product categories with canonical structure. Constraint information is...

srcClone

by Hakam W. Alomari

2026

Detecting code clones is an established method for comprehending and maintaining systems. One important but challenging form of code clone detection involves detecting semantic clones, which are those that are semantically similar code...

srcClone

by Hakam W. Alomari

2026, Proceedings of the 28th International Conference on Program Comprehension

srcSlice

by Hakam W. Alomari

2026, Proceedings of the 38th International Conference on Software Engineering Companion

An efficient lightweight forward static slicing tool is presented. The tool is implemented on top of srcML, an XML representation of source code. The approach does not compute the full program dependence graph but instead dependency...

Foundations of Nominal Techniques: Logic and Semantics of Variables in Abstract Syntax

by Murdoch Gabbay

2026, The Bulletin of Symbolic Logic

We are used to the idea that computers operate on numbers, yet another kind of data is equally important: the syntax of formal languages, with variables, binding, and alpha-equivalence. The original application of nominal techniques, and the one with greatest prominence in this paper, is to reasoning on formal syntax with variables and binding. Variables can be modelled in many ways: for instance as numbers (since we usually take countably many of them); as links (since they may 'point' to a binding site in the term, where they are bound); or as functions (since they often, though not always, represent 'an unknown'). None of these models is perfect. In every case for the models above, problems arise when trying to use them as a basis for a fully formal mechanical treatment of formal language. The problems are practical, but their underlying cause may be mathematical. The issue is not whether formal syntax exists, since clearly it does, so much as what kind of mathematical structure it is. To illustrate this point by a parody, logical derivations can be modelled using a Gödel encoding (i.e. injected into the natural numbers). It would be false to conclude from this that proof-theory is a branch of number theory and can be understood in terms of, say, Peano's axioms. Similarly, as it turns out, it is false to conclude from the fact that variables can be encoded e.g. as numbers, that the theory of syntax-with-binding can be understood in terms of the theory of syntax-without-binding, plus the theory of numbers (or, taking this to a logical extreme, purely in terms of the theory of numbers). It cannot; something else is going on. What that something else is, has not yet been fully understood. In nominal techniques, variables are an instance of names, and names are data.
We model names using urelemente with properties that, pleasingly enough, turn out to have been investigated by Fraenkel and Mostowski in the first half of the 20th century for a completely different purpose than modelling formal language. What makes this model really interesting is that it gives names distinctive properties which can be related to useful logic and programming principles for formal syntax. Since the initial publications, advances in the mathematics and presentation have been introduced piecemeal in the literature. This paper provides in a single accessible document an updated development of the foundations of nominal techniques. This gives the reader easy access to updated results and new proofs which they would otherwise have to search across two or more papers to find, and full proofs that in other publications may have been elided. We also include some new material not appearing elsewhere.

Nominal Unification

by Murdoch Gabbay

2026, Springer eBooks

We present a generalisation of first-order unification to the practically important case of equations between terms involving binding operations. A substitution of terms for variables solves such an equation if it makes the equated terms...

Compiler Front End Fusion: Undo Desugaring in Language Processing Tools

by Melinda Tóth

2026, Studia Universitatis Babeș-Bolyai Informatica

Compiler front ends often perform desugaring on the source code while constructing the abstract syntax tree (AST). A programming language processing tool (such as a refactoring tool) working with the desugared AST perceives the code at...

Generating Coherent and Diverse Slogans with Sequence-to-Sequence Transformer

by Dittaya Wanvarie

2026, ArXiv

Previous work in slogan generation focused on generating novel slogans by utilising templates mined from real slogans. While some such slogans can be catchy, they are often not coherent with the company’s focus or style across their...

SCDML: A Language for Conceptual Data Modeling in Model-based Systems Engineering

by Christian Hennig

2026, Proceedings of the 4th International Conference on Model-Driven Engineering and Software Development

This paper presents the design and usage of a language for Conceptual Data Modeling in Model-based Systems Engineering. Based on an existing analysis of presently employed data modeling languages, a new conceptual data modeling language...

A proactive approach to software security using DCodeBERT for vulnerability management

by Indurthi Ravindra Kumar

2026, Bulletin of Electrical Engineering and Informatics

The complexity of modern software has increased security risks, emphasizing the need for automated detection and correction. DCodeBERT, a CodeBERT-based vulnerability detection and remediation framework, is introduced in this study....

A Tool Platform Using an XML Representation of Source Code Information

by Katsuhisa Maruyama

2026, IEICE Transactions on Information and Systems

Recent IDEs have become more extensible tool platforms but do not concern themselves with how other tools running on them collaborate with each other. They compel developers to use proprietary representations or the classical abstract...

Neuro-symbolic Zero-Shot Code Cloning with Cross-Language Intermediate Representation

by Ravindra Naik

2026, arXiv (Cornell University)

In this paper, we define a neuro-symbolic approach to address the task of finding semantically similar clones for the codes of the legacy programming language COBOL, without training data. We define a meta-model that is instantiated to...

Cluster Based Classification of Question Independent C Codes

by Roshni M

2026

In the field of software development, ensuring the accuracy and quality of code remains a paramount concern. The task of precisely classifying code as correct or incorrect poses inherent challenges. This research introduces a...

Compiler-An Overview

by Rashmi Dewan

2026, International Journal of Research in Information Technology

The intention of this paper is to provide an overview of the subject of compiler design. The overview includes previous and existing concepts and current technologies. This paper also covers definition, history, phases of compiler, structure...

Describing the Syntax and Semantics of UML Statecharts in a Heterogeneous Modelling Environment

by Jorn W Janneck

2026, Lecture Notes in Computer Science

In this paper UML statechart diagrams are used as an example of a generic approach to integrating a visual language in a heterogeneous modelling and simulation environment. A system represented in a visual language is syntactically...

A method for describing the syntax and semantics of UML statecharts

by Jorn W Janneck

2026, Software & Systems Modeling

In this article we present a method for describing the language of UML statecharts. Statecharts are syntactically defined as attributed graphs, with well-formedness rules specified by a set of first-order predicates over the abstract...

Identifying Information in Stock Message Boards and Its Implications for Stock Market Efficiency

by Balaji Rajagopalan

2026

The information value of stock message boards has often been debated. A main difficulty in assessing the value is the presence of a large number of posts with varying quality. This paper presents an intuitive approach to identify and...

MultiFix: Learning to Repair Multiple Errors by Optimal Alignment Learning

by Sang Ki Ko

2026, Findings of the Association for Computational Linguistics: EMNLP 2021

We consider the problem of learning to repair erroneous C programs by learning optimal alignments with correct programs. Since the previous approaches fix a single error in a line, it is inevitable to iterate the fixing process until no...

How much does an agent believe: an extension of modal epistemic logic

by Subrata Das

2026, Lecture Notes in Computer Science

Modal logics are often criticised for their coarse grain representation of knowledge of possibilities about assertions. That is to say, if two assertions are possible in the current world, their further properties are indistinguishable in...

Leaving the Nest: Nominal Techniques for Variables with Interleaving Scopes

by Murdoch Gabbay

2026

We examine the key syntactic and semantic aspects of a nominal framework allowing scopes of name bindings to be arbitrarily interleaved. Name binding (e.g. delta x.M) is handled by explicit name-creation and name-destruction brackets...

A new approach to abstract syntax involving binders

by Murdoch Gabbay

2026, Proceedings. 14th Symposium on Logic in Computer Science (Cat. No. PR00158)

The Fraenkel-Mostowski permutation model of set theory with atoms (FM-sets) can serve as the semantic basis of meta-logics for specifying and reasoning about formal systems involving name binding, α-conversion, capture avoiding...

Nominal unification

by Murdoch Gabbay

2026, Theoretical Computer Science

We present a generalisation of first-order unification to the practically important case of equations between terms involving binding operations. A substitution of terms for variables solves such an equation if it makes the equated terms...

A New Approach to Abstract Syntax with Variable Binding

by Murdoch Gabbay

2026, Formal Aspects of Computing

The permutation model of set theory with atoms (FM-sets), devised by Fraenkel and Mostowski in the 1930s, supports notions of 'name-abstraction' and 'fresh name' that provide a new way to represent, compute with, and reason about the...

Two-level Lambda-calculus

by Murdoch Gabbay

2026, Electronic Notes in Theoretical Computer Science

Two-level lambda-calculus is designed to provide a mathematical model of capturing substitution, also called instantiation. Instantiation is a feature of the 'informal meta-level'; it appears pervasively in specifications of the syntax...

Generative unbinding of names

by Andrew Pitts

2026

This paper is concerned with the form of typed name binding used by the FreshML family of languages. Its characteristic feature is that a name binding is represented by an abstract (name,value)-pair that may only be deconstructed via the...

Code Clone Origin Analysis Using Version Control System

by Katsuro Inoue

2025

We propose a method for retrieving the histories of code clones. Many code clone detection methods have been proposed, but little research has focused on the histories of code clones. The history of a code clone is useful for retrieving somewhile clone...

A Self-Hosting Evaluator using HOAS

by Eli Barzilay

2025

We demonstrate a tiny, yet non-trivial evaluator that is powerful enough to run practical code, including itself. This is made possible using a Higher-Order Abstract Syntax (HOAS) representation, a technique that has become popular in...