Key research themes
1. How can Abstract Syntax Trees (ASTs) be effectively transformed or encoded into representations that preserve structural information for improved program analysis and model learning?
This theme investigates methods for encoding ASTs into forms suitable for machine learning and program manipulation while preserving their key syntactic and semantic structure. Preserving structure keeps the relations within code meaningful for downstream tasks such as bug prediction, code summarization, and performance modeling. The encoding schemes explored include Prüfer sequences, higher-order abstract syntax encodings, and algebraic representations, each examined for its losslessness, expressiveness, and computational advantages.
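As an illustration of what a lossless structural encoding can look like, the sketch below numbers the nodes of a Python AST in preorder and converts the resulting tree to and from a Prüfer sequence. It is a minimal sketch and does not reproduce any specific published scheme: the helper names (`ast_edges`, `prufer_encode`, `prufer_decode`) are hypothetical, node labels are kept in a separate list, and the tree is assumed to have at least two nodes.

```python
import ast
import heapq

def ast_edges(source):
    """Number AST nodes in preorder; return (labels, edges) with root index 0."""
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, edges

def prufer_encode(n, edges):
    """Encode a tree on nodes 0..n-1 (n >= 2) as a sequence of length n-2."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [i for i in range(n) if len(adj[i]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)      # smallest remaining leaf
        neighbor = adj[leaf].pop()        # its only neighbor
        adj[neighbor].discard(leaf)
        seq.append(neighbor)
        if len(adj[neighbor]) == 1:       # neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq

def prufer_decode(seq):
    """Rebuild the edge list of the tree encoded by a Prüfer sequence."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [i for i in range(n) if degree[i] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        edges.append((heapq.heappop(leaves), v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

labels, edges = ast_edges("x = 1 + 2")
sequence = prufer_encode(len(labels), edges)
restored = prufer_decode(sequence)
```

The sequence captures only the shape of the tree over numbered nodes. Because nodes are numbered in preorder, the root is node 0 and sibling order follows the indices, so the sequence together with `labels` is enough to rebuild the labeled tree in this sketch.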
2. How can tree grammar formalisms and algebraic operations support underspecification, disambiguation, and semantic analysis of abstract syntax trees in natural language processing and programming languages?
This theme explores theoretical and algorithmic frameworks for handling underspecification and ambiguity in AST-like tree structures, particularly in scope disambiguation and language semantics. It draws on formal models such as Regular Tree Grammars (RTGs), algebraic operations on block diagrams, and graph grammars to represent, manipulate, and reason about trees with complex binding and composition properties. These models support efficient computation of optimal semantic readings and modular semantic specifications in both NLP and programming languages.
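To make the role of an RTG in scope underspecification concrete, the sketch below encodes the two scope readings of "every student reads a book" as one small grammar, counts the readings without building them, and then enumerates them. The grammar, the nonterminal names, and the helpers `count`, `trees`, and `show` are illustrative assumptions, and the code assumes the grammar is acyclic (it generates finitely many trees).

```python
from itertools import product
from math import prod

# Each nonterminal maps to a list of (symbol, child nonterminals) rules.
RULES = {
    "S":    [("every", ["STUD", "S_a"]), ("a", ["BOOK", "S_e"])],
    "S_a":  [("a", ["BOOK", "READ"])],
    "S_e":  [("every", ["STUD", "READ"])],
    "STUD": [("student", [])],
    "BOOK": [("book", [])],
    "READ": [("read", [])],
}

def count(nt, rules=RULES):
    """Number of trees derivable from nt, computed without enumerating them."""
    return sum(prod(count(c, rules) for c in children)
               for _, children in rules[nt])

def trees(nt, rules=RULES):
    """Yield every tree derivable from nt as a nested tuple (symbol, *subtrees)."""
    for symbol, children in rules[nt]:
        for combo in product(*(list(trees(c, rules)) for c in children)):
            yield (symbol, *combo)

def show(tree):
    """Render a nested-tuple tree as a term such as f(a, b)."""
    head, *kids = tree
    return head if not kids else f"{head}({', '.join(show(k) for k in kids)})"

readings = [show(t) for t in trees("S")]
```

Here `count("S")` is 2 and `readings` holds `every(student, a(book, read))` and `a(book, every(student, read))`. The grammar stores both readings in a shared, compact form, which is the property that makes it possible to count or search readings without listing every one.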
3. What methods improve parsing, error detection, and repair in the processing of syntax trees, particularly for ambiguous or erroneous code and natural language inputs?
This theme focuses on parsing strategies and error repair techniques for constructing ASTs from ambiguous or syntactically incorrect inputs. It includes research on tunnel parsing for ambiguous and ε-ambiguous grammars, the use of compiler diagnostics to guide syntax error repair, and reasoning about structural relationships in codebases to support debugging and design pattern recognition. These approaches aim to make parsers more robust, resolve ambiguity efficiently, and improve automated syntax error correction using machine learning and formal methods.
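The idea of letting compiler diagnostics guide repair can be sketched with Python's own parser. The example below is a deliberately simple brute-force stand-in, not the learned repair models studied in this line of work: it reads the line number from the `SyntaxError`, tries appending one candidate token to the lines nearest that location, and keeps the first patch that parses. The names `diagnose`, `repair`, and `CANDIDATE_TOKENS` are hypothetical.

```python
import ast

# Assumed candidate set: single tokens that commonly go missing at line ends.
CANDIDATE_TOKENS = [")", "]", "}", ":"]

def diagnose(source):
    """Return the SyntaxError raised by the parser, or None if source parses."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as err:
        return err

def repair(source):
    """Try single-token insertions near the diagnosed line; None if all fail."""
    err = diagnose(source)
    if err is None:
        return source
    lines = source.split("\n")
    # Clamp the reported (1-based) line number into the valid index range.
    reported = min(max((err.lineno or 1) - 1, 0), len(lines) - 1)
    # Visit lines closest to the diagnostic first.
    order = sorted(range(len(lines)), key=lambda i: abs(i - reported))
    for i in order:
        for token in CANDIDATE_TOKENS:
            patched = lines[:i] + [lines[i] + token] + lines[i + 1:]
            candidate = "\n".join(patched)
            if diagnose(candidate) is None:
                return candidate
    return None

fixed_call = repair("print((1 + 2)")
fixed_def = repair("def f(x)\n    return x\n")
```

A patch that parses is not guaranteed to be the one the programmer intended. This sketch only uses the diagnostic to decide where to search first.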