Vishal Gupta
2018-08-23 12:56:03 UTC
Hello everyone,
Google Summer of Code provided me the opportunity to work with Automake during
the summers. Here is the final report on work done during the same.
___________________________________________
[GSoC'18] Parse Makefile.am using
Abstract Syntax Tree - Final Report
___________________________________________
Branch:- experimental/gsoc/ast
http://git.savannah.gnu.org/cgit/automake.git/log/?h=experimental/gsoc/ast
Commit History:- cf633fb61 to 5aebec2cc
1. Introduction
============
The goal of the project was to generate an abstract syntax tree after parsing
the input Makefile.am . It has separate lexer, parser and tree generation
module so that new features can be added incrementally. For debugging purpose
the AST generated is displayed as image using graph description language.
The current implementation can parse the following type of statements:
* Primaries : All the primaries are identified.
* Multiline statements : Statements on multiple lines ending with backslash
and its combination with other type of statements
* Conditional statements : Single or nested conditional statements
[ if-else-endif block ]
* Subdirs directive : The parser parses the Makefile.am file in the said
subdirectory and generates the tree in that subdirectory.
* Include directive : The parser recursively parses the included file and
merge the generated tree to the parent tree by serializing at child process
and deserializing at parent process.
2. Technical Description
===================
* Lexer :- It converts the input file line by line into tokens using perl
regex. Whenever called by parser, it returns an array of tokens. Each token
has a name and if required, a value.
* Bison Grammar :- The grammar of Automake is written in GNU Bison. It is used
to generate state description and state transition diagram. Writing in Bison
is easy and exact syntax can be captured. Adding new features and debugging
them becomes easy as no change is required in parser. The tool is only
required at maintainer end to generate the parsing table.
* Parsing Table :- It is an array of hashes. Each array index indicate the
state and hash consist of key value pair (Key for token acceptable for that
state and value is the next transition state). For reduction of grammar and
generation of tree, a special 'reduce' key is used in Hash. It points to
array containing number of tokens to reduce and corresponding function to
call in Tree module. The start state is 0 and acceptance state is stored in
variable with value decided by converter.
* Converter :- It takes as input the state description and state transition
file generated from Bison and outputs the parsing table in a format
understood by perl.
* Parser and AST :- It parses the input file using tokens provided by lexer
and the parsing table provided by converter. LR parsing algorithm
similar to this [https://goo.gl/images/a4NWva] is implemented.
At each reduction of the grammar, the node is generated for the tree
by calling its corresponding function in tree module specified in parsing
table. After successful parsing, the parser outputs the generated tree in
DOT format.
* Graph Description Language and dot utility :- For visualization of abstract
syntax tree, it is printed in the graph description language. It provide an
easy interface to display directed and undirected graphs. As the size of
input file increases, the tree increases in width significantly as compared
to height as all the statements identified during parsing are attached to
single master node. Unflatten utility is used to fix this. At last, dot
utility is used to convert the graph to image file.
* Testing :- Test cases for features implemented are provided in âtâ directory
A simple script is executed which generates AST for the test files.
Currently visual inspection of tree is required for validation.
3. Future Work
=============
* Make rules :- Currently only basic make rules are identified by parser.
Grammar needs to be extended to handle complex make rules.
* Using AST to generate Makefile.in :- Current project was to generate AST
from input file. This AST can be used to generate Makefile.in. Converting
AST to the output file is another big task. It would require to implement
how different variables are handled, what to output for corresponding
statement and how to integrate with the current code base.
4. Conclusion
============
Learnt and enjoyed a lot. Applying the compiler theory learnt in class to
practical work with a language I was knowing only some basics was a wonderfull
experience. It also helped me to understand and follow good coding practices.
Most of the automake features are implemented. I will be working to fix the
grammar to identify complex make rules. With that, AST can be generated for
most of the files which are identified by Automake. As this is a new feature,
in the current state it cannot be merged in codebase. When generation of
Makefile.in from AST be implemented can it be merged. I would like to continue
working with that part, but some guidance in that direction will be required.
Sorry for delay in presenting the report as I was not well.
---
Vishal Gupta
Google Summer of Code provided me the opportunity to work with Automake during
the summers. Here is the final report on work done during the same.
___________________________________________
[GSoC'18] Parse Makefile.am using
Abstract Syntax Tree - Final Report
___________________________________________
Branch:- experimental/gsoc/ast
http://git.savannah.gnu.org/cgit/automake.git/log/?h=experimental/gsoc/ast
Commit History:- cf633fb61 to 5aebec2cc
1. Introduction
============
The goal of the project was to generate an abstract syntax tree after parsing
the input Makefile.am . It has separate lexer, parser and tree generation
module so that new features can be added incrementally. For debugging purpose
the AST generated is displayed as image using graph description language.
The current implementation can parse the following type of statements:
* Primaries : All the primaries are identified.
* Multiline statements : Statements on multiple lines ending with backslash
and its combination with other type of statements
* Conditional statements : Single or nested conditional statements
[ if-else-endif block ]
* Subdirs directive : The parser parses the Makefile.am file in the said
subdirectory and generates the tree in that subdirectory.
* Include directive : The parser recursively parses the included file and
merge the generated tree to the parent tree by serializing at child process
and deserializing at parent process.
2. Technical Description
===================
* Lexer :- It converts the input file line by line into tokens using perl
regex. Whenever called by parser, it returns an array of tokens. Each token
has a name and if required, a value.
* Bison Grammar :- The grammar of Automake is written in GNU Bison. It is used
to generate state description and state transition diagram. Writing in Bison
is easy and exact syntax can be captured. Adding new features and debugging
them becomes easy as no change is required in parser. The tool is only
required at maintainer end to generate the parsing table.
* Parsing Table :- It is an array of hashes. Each array index indicate the
state and hash consist of key value pair (Key for token acceptable for that
state and value is the next transition state). For reduction of grammar and
generation of tree, a special 'reduce' key is used in Hash. It points to
array containing number of tokens to reduce and corresponding function to
call in Tree module. The start state is 0 and acceptance state is stored in
variable with value decided by converter.
* Converter :- It takes as input the state description and state transition
file generated from Bison and outputs the parsing table in a format
understood by perl.
* Parser and AST :- It parses the input file using tokens provided by lexer
and the parsing table provided by converter. LR parsing algorithm
similar to this [https://goo.gl/images/a4NWva] is implemented.
At each reduction of the grammar, the node is generated for the tree
by calling its corresponding function in tree module specified in parsing
table. After successful parsing, the parser outputs the generated tree in
DOT format.
* Graph Description Language and dot utility :- For visualization of abstract
syntax tree, it is printed in the graph description language. It provide an
easy interface to display directed and undirected graphs. As the size of
input file increases, the tree increases in width significantly as compared
to height as all the statements identified during parsing are attached to
single master node. Unflatten utility is used to fix this. At last, dot
utility is used to convert the graph to image file.
* Testing :- Test cases for features implemented are provided in âtâ directory
A simple script is executed which generates AST for the test files.
Currently visual inspection of tree is required for validation.
3. Future Work
=============
* Make rules :- Currently only basic make rules are identified by parser.
Grammar needs to be extended to handle complex make rules.
* Using AST to generate Makefile.in :- Current project was to generate AST
from input file. This AST can be used to generate Makefile.in. Converting
AST to the output file is another big task. It would require to implement
how different variables are handled, what to output for corresponding
statement and how to integrate with the current code base.
4. Conclusion
============
Learnt and enjoyed a lot. Applying the compiler theory learnt in class to
practical work with a language I was knowing only some basics was a wonderfull
experience. It also helped me to understand and follow good coding practices.
Most of the automake features are implemented. I will be working to fix the
grammar to identify complex make rules. With that, AST can be generated for
most of the files which are identified by Automake. As this is a new feature,
in the current state it cannot be merged in codebase. When generation of
Makefile.in from AST be implemented can it be merged. I would like to continue
working with that part, but some guidance in that direction will be required.
Sorry for delay in presenting the report as I was not well.
---
Vishal Gupta