Vishal Gupta
2018-06-07 15:46:52 UTC
GSoC: Parse Makefile.am using abstract syntax tree
Vishal Gupta
Automake Branch: experimental/gsoc/ast
1) Introduction:-
The goal of this project is to parse Makefile.am using separate lexical and
parsing phase and generate an Abstract Syntax Tree as output. The current
implementation of parser combines both the phases into a single
unit. Extending the grammar become difficult and testing the module become
tedious as pointing the source of error requires many test cases. This
project will separate the lexical, parsing and AST generation phases, so
that different phases can be unit tested and after combining integration
testing can be done.
2) Period of work:-
The work had been started from 16th May, 2018 and will be completed by 6th
August, 2018.
3) Planning of work:-
Parsing Makefile will be done in two stages:-
Lexer:- Input file will be converted line by line into tokens and passed to
the parser. Regular expression is used to identify lexemes and convert them
to tokens. An array of tokens is returned by the lexer.
Parser:- It parses the Makefile.am file according to the grammar and builds
the AST. The grammar of Automake is implemented in bison and it is
converted into the parsing table. Parser pushes tokens to the stack or
reduce according to the grammar. During reduction, Tree node is created
which is internally a hash consisting of node information and its child
references.
Sample Parsing Table :-
@table = ({ num_k => 1 , expr => 3 , input => 2 } , { reduce => [ 1 ,
\&num ] } , { end => 4 } , { '-' => 6 , '/' => 8 , '+' => 5 , '*' => 7 ,
'\n' => 9 } , { }) ;
Every index in table represents particular parsing state. The state consist
of hash of token and next state number. Reduce key represent reduction of
tokens and creation of node in AST. It consist of an array consisting of
number of tokens to reduce and function to call for creating the node. An
empty hash represent acceptance state.
Parser :-
while( @stack )
{
if( $stack[ -1 ] == $acceptState )
print "Complete";
my @curr_token = @{ $tokens[ 0 ] } ;
if( $val = $arr[ $stack[ -1 ] ] { $curr_token[ 0 ] } )
{
push @stack, \@curr_token, $val;
shift @tokens;
}
elsif( $val = $arr[ $stack[ -1 ] ] { reduce } )
{
my @val1 = @$val;
my @param;
for( $i = 1 ; $i <= 2 * $val1[ 0 ] ; $i++)
if( $i % 2 == 0)
{
$val = pop @stack;
push @param , $val;
}
else
pop @stack;
push @stack , $val1[ 1 ] - > ( @param );
push @stack , $arr[ $stack [ -2 ] ]{ $stack[ -1 ] -> { name } };
}
else die "Unexpected Token ". $curr_token."\n";
}
The above code parses the input and generates the AST. It find the next
state to jump according to the current token and current state number. If
shifting on stack is not possible, it will check for reduce entry and if
found, will call the node function with required parameter. If both shift
and reduce is not possible, then it represents an error.
Automake Grammar:-
The grammar of Automake is written in GNU Bison. Other Perl modules like
Marpa::R2 or Parse::RecDescent could be used for specifying the grammar and
for parsing but they would become a dependency for Automake. Using bison
does not have this disadvantage as it will only be used by the developers
to extend the grammar and develop the parsing table. Once the parsing table
is developed, bison is not required. Another advantage is that, extending
the grammar only require changes in bison file, lexer file if new tokens
are added and AST file for corresponding nodes. The parser would not have
to be updated.
Methods for converting bison grammar to Parsing table:-
Manual :- Bison grammar[1] is graph is made using bison --graph option.
The resultant file is converted into image using dot utility. The image[2]
represents the state transition diagram which can be converted to the
parsing table. Each edge from one node to another represent a transition on
finding a particular token. This can be converted manually into parsing
table.
Automatic :- As the grammar grows, the number of states increases and
chances of manual conversion being correct can decrease. Automatic
conversion can convert the graph to Perl parsing table in the above
specified format. (Converter.pl [3])
Mixed:- As the automatic conversion script is not bug free. First automatic
conversion and then manual checking can be applied, to reduce the time and
easily identify errors.
4) Progress:-
Implementation of the project is iterative. Basic lexer, parser and AST
module are implemented. New grammar features are added and the modules are
updated accordingly. Currently the parser identifies different type of
primaries, distinction between Automake variable definition and Make rule.
A program for converting the bison grammar to Perl parsing table is also
implemented.
5) Next Step:-
Extending the parser to support different type of variables identified by
Automake and creating unit test for these modules.
Footnotes:-
[1] Bison grammar
<http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/automake.y?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f>
[2] Automake graph <https://imgur.com/a/H869UCG>
[3] Converter.pl
<http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/Converter.pl?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f>
Vishal Gupta
Automake Branch: experimental/gsoc/ast
1) Introduction:-
The goal of this project is to parse Makefile.am using separate lexical and
parsing phase and generate an Abstract Syntax Tree as output. The current
implementation of parser combines both the phases into a single
unit. Extending the grammar become difficult and testing the module become
tedious as pointing the source of error requires many test cases. This
project will separate the lexical, parsing and AST generation phases, so
that different phases can be unit tested and after combining integration
testing can be done.
2) Period of work:-
The work had been started from 16th May, 2018 and will be completed by 6th
August, 2018.
3) Planning of work:-
Parsing Makefile will be done in two stages:-
Lexer:- Input file will be converted line by line into tokens and passed to
the parser. Regular expression is used to identify lexemes and convert them
to tokens. An array of tokens is returned by the lexer.
Parser:- It parses the Makefile.am file according to the grammar and builds
the AST. The grammar of Automake is implemented in bison and it is
converted into the parsing table. Parser pushes tokens to the stack or
reduce according to the grammar. During reduction, Tree node is created
which is internally a hash consisting of node information and its child
references.
Sample Parsing Table :-
@table = ({ num_k => 1 , expr => 3 , input => 2 } , { reduce => [ 1 ,
\&num ] } , { end => 4 } , { '-' => 6 , '/' => 8 , '+' => 5 , '*' => 7 ,
'\n' => 9 } , { }) ;
Every index in table represents particular parsing state. The state consist
of hash of token and next state number. Reduce key represent reduction of
tokens and creation of node in AST. It consist of an array consisting of
number of tokens to reduce and function to call for creating the node. An
empty hash represent acceptance state.
Parser :-
while( @stack )
{
if( $stack[ -1 ] == $acceptState )
print "Complete";
my @curr_token = @{ $tokens[ 0 ] } ;
if( $val = $arr[ $stack[ -1 ] ] { $curr_token[ 0 ] } )
{
push @stack, \@curr_token, $val;
shift @tokens;
}
elsif( $val = $arr[ $stack[ -1 ] ] { reduce } )
{
my @val1 = @$val;
my @param;
for( $i = 1 ; $i <= 2 * $val1[ 0 ] ; $i++)
if( $i % 2 == 0)
{
$val = pop @stack;
push @param , $val;
}
else
pop @stack;
push @stack , $val1[ 1 ] - > ( @param );
push @stack , $arr[ $stack [ -2 ] ]{ $stack[ -1 ] -> { name } };
}
else die "Unexpected Token ". $curr_token."\n";
}
The above code parses the input and generates the AST. It find the next
state to jump according to the current token and current state number. If
shifting on stack is not possible, it will check for reduce entry and if
found, will call the node function with required parameter. If both shift
and reduce is not possible, then it represents an error.
Automake Grammar:-
The grammar of Automake is written in GNU Bison. Other Perl modules like
Marpa::R2 or Parse::RecDescent could be used for specifying the grammar and
for parsing but they would become a dependency for Automake. Using bison
does not have this disadvantage as it will only be used by the developers
to extend the grammar and develop the parsing table. Once the parsing table
is developed, bison is not required. Another advantage is that, extending
the grammar only require changes in bison file, lexer file if new tokens
are added and AST file for corresponding nodes. The parser would not have
to be updated.
Methods for converting bison grammar to Parsing table:-
Manual :- Bison grammar[1] is graph is made using bison --graph option.
The resultant file is converted into image using dot utility. The image[2]
represents the state transition diagram which can be converted to the
parsing table. Each edge from one node to another represent a transition on
finding a particular token. This can be converted manually into parsing
table.
Automatic :- As the grammar grows, the number of states increases and
chances of manual conversion being correct can decrease. Automatic
conversion can convert the graph to Perl parsing table in the above
specified format. (Converter.pl [3])
Mixed:- As the automatic conversion script is not bug free. First automatic
conversion and then manual checking can be applied, to reduce the time and
easily identify errors.
4) Progress:-
Implementation of the project is iterative. Basic lexer, parser and AST
module are implemented. New grammar features are added and the modules are
updated accordingly. Currently the parser identifies different type of
primaries, distinction between Automake variable definition and Make rule.
A program for converting the bison grammar to Perl parsing table is also
implemented.
5) Next Step:-
Extending the parser to support different type of variables identified by
Automake and creating unit test for these modules.
Footnotes:-
[1] Bison grammar
<http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/automake.y?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f>
[2] Automake graph <https://imgur.com/a/H869UCG>
[3] Converter.pl
<http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/Converter.pl?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f>