Discussion:
[GSoC] Proposal for "Parse Makefile.am using an AST"
Matthias Paulmier
2018-03-04 15:20:05 UTC
Hello,

I'm a French CS student at the University of Bordeaux. I'm currently enrolled in
a master's degree program specialized in network communications and
administration. I've been interested in free software for a couple of years now
and have wanted to help a project for some time, but had never found one to
which I could make a significant contribution.

I have decided to apply for the project "Parse Makefile.am using an Abstract
Syntax Tree".

The reason I'm choosing this subject over the other one is that I already have
a good knowledge of ASTs. I have worked on a small programming language as an
assignment (project here:
<https://services.emi.u-bordeaux.fr/projet/viewvc/compilfinal/>, but it is very
poorly written). It is a very basic interpreter for a trimmed-down Pascal
language, written in C with Flex and Bison. On this project I worked on the
syntax and semantic analysis as well as the lexer (which is not a big deal with
Flex).

I've already met with Mathieu Lirzin to talk about the project, so I have a
general idea of what is expected from this GSoC. From my understanding, the goal
of both proposed subjects is to move towards Automake's eventual modularization.
The benefits of generating this AST from a Makefile.am file would be to separate
the different code generation phases, to improve the test suite by testing each
phase separately, and probably others that I can't think of right now.

My knowledge of Perl may be my weak point for this project, as I only know a bit
of the syntax. However, I am familiar with other programming languages, mainly C
and Python.

If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.

Thanks.

--
Matthias Paulmier
Kip Warner
2018-03-04 22:00:32 UTC
Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
Hey Matthias,

I think that is a great idea and I commend you for wanting to
contribute to GNU Automake. I use it nearly every day in my work.

I don't have much to suggest. As I recall, Automake is written mostly in
Perl, which is not a language I have much experience with. However, if
you plan on using a native systems language like C++, consider looking
at flexc++(1) and bisonc++(1).

Yours truly,
--
Kip Warner | Senior Software Engineer
OpenPGP signed/encrypted mail preferred
https://www.cartesiantheatre.com
John Calcote
2018-03-04 22:46:40 UTC
Hi Matthias,

Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
A Makefile.am file is really just a Makefile with embellishments. It seems
like your AST would have to incorporate most of make’s syntax to work
correctly.

Perl was chosen in the first place because of its great text-processing
capabilities: ultimately, all Automake really does is copy the file directly
to the output Makefile.in file, filtering out Automake stuff along the way
and injecting make snippets generated from the Automake constructs.

This may not be obvious at first, because many simpler Makefile.am files
contain only Automake stuff. But anything found in the Makefile.am file
that Automake doesn’t recognize is assumed to be valid make syntax and
copied directly to the output file.

I suggest making your AST handle non-Automake chunks as a specific token
type designed to be passed through without modification.
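A minimal sketch of that suggestion, written in Python rather than Perl purely for illustration. The node classes (`Assignment`, `Passthrough`), the `parse`/`emit` helpers and the short list of recognized variable suffixes are all hypothetical, not Automake's API. Anything not recognized as an Automake assignment becomes an opaque passthrough node that is emitted unchanged; line continuations, `+=`, conditionals and includes are deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Assignment:
    """A recognized Automake variable, e.g. 'bin_PROGRAMS = hello'."""
    name: str
    value: str

@dataclass
class Passthrough:
    """Raw make text that Automake does not interpret; emitted verbatim."""
    text: str

Node = Union[Assignment, Passthrough]

# Hypothetical, deliberately tiny set of "Automake stuff" to recognize.
AUTOMAKE_SUFFIXES = ("_PROGRAMS", "_SOURCES", "_LIBRARIES", "SUBDIRS")

def parse(source: str) -> List[Node]:
    """Split a Makefile.am into a flat, coarse-grained list of nodes."""
    nodes: List[Node] = []
    for line in source.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        # Recipe lines start with a tab and are never Automake assignments.
        if sep and not line.startswith("\t") and name.endswith(AUTOMAKE_SUFFIXES):
            nodes.append(Assignment(name, value.strip()))
        else:
            nodes.append(Passthrough(line))
    return nodes

def emit(nodes: List[Node]) -> str:
    """Serialize nodes back to text; passthrough chunks are copied unchanged."""
    out = []
    for node in nodes:
        if isinstance(node, Passthrough):
            out.append(node.text)
        else:
            out.append(f"{node.name} = {node.value}")
    return "\n".join(out)

if __name__ == "__main__":
    src = "bin_PROGRAMS = hello\nhello_SOURCES = hello.c\n\nclean-local:\n\trm -f core"
    for node in parse(src):
        print(node)
```

Because only recognized assignments are re-serialized, make rules and recipes round-trip line for line, which is the pass-through property described above.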

Just a few thoughts for you to consider.

Kind regards,

John Calcote
Matthias Paulmier
2018-03-05 17:37:47 UTC
Post by John Calcote
A Makefile.am file is really just a Makefile with embellishments. It seems
like your AST would have to incorporate most of make’s syntax to work
correctly.
Perl was chosen in the first place because of its great text-processing
capabilities: ultimately, all Automake really does is copy the file directly
to the output Makefile.in file, filtering out Automake stuff along the way
and injecting make snippets generated from the Automake constructs.
This may not be obvious at first, because many simpler Makefile.am files
contain only Automake stuff. But anything found in the Makefile.am file
that Automake doesn’t recognize is assumed to be valid make syntax and
copied directly to the output file.
This is what I gathered from the documentation. To be honest, I've only really
used Automake as a "user" and have never written a Makefile.am myself. But I
figured that out from the "General Operation" subsection of the documentation
and from the chat I had with Mathieu.
Post by John Calcote
I suggest making your AST handle non-Automake chunks as a specific token
type designed to be passed through without modification.
Since, as you said, Automake assumes that whatever it doesn't recognize is valid
make syntax, I think this would be a reasonable approach. Letting make handle
make stuff sounds good to me.

Thanks for the pointers.

--
Matthias Paulmier
Mathieu Lirzin
2018-03-07 21:04:44 UTC
Post by John Calcote
Hi Matthias,
Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
A Makefile.am file is really just a Makefile with embellishments. It seems
like your AST would have to incorporate most of make’s syntax to work
correctly.
Perl was chosen in the first place because of its great text-processing
capabilities: ultimately, all Automake really does is copy the file directly
to the output Makefile.in file, filtering out Automake stuff along the way
and injecting make snippets generated from the Automake constructs.
This may not be obvious at first, because many simpler Makefile.am files
contain only Automake stuff. But anything found in the Makefile.am file
that Automake doesn’t recognize is assumed to be valid make syntax and
copied directly to the output file.
I suggest making your AST handle non-Automake chunks as a specific token
type designed to be passed through without modification.
I agree that using a coarse-grained AST is a good first approach.
Exploring and evaluating a finer-grained approach later during this
GSoC could be interesting too.

Thanks for your input.
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37
NightStrike
2018-03-07 21:46:00 UTC
Post by John Calcote
Hi Matthias,
Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
A Makefile.am file is really just a Makefile with embellishments. It seems
like your AST would have to incorporate most of make’s syntax to work
correctly.
Perl was chosen in the first place because of its great text-processing
capabilities: ultimately, all Automake really does is copy the file directly
to the output Makefile.in file, filtering out Automake stuff along the way
and injecting make snippets generated from the Automake constructs.
This may not be obvious at first, because many simpler Makefile.am files
contain only Automake stuff. But anything found in the Makefile.am file
that Automake doesn’t recognize is assumed to be valid make syntax and
copied directly to the output file.
I suggest making your AST handle non-Automake chunks as a specific token
type designed to be passed through without modification.
I agree that using a coarse-grained AST is a good first approach.
Exploring and evaluating a finer-grained approach later during this
GSoC could be interesting too.

Thanks for your input.

--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37

What problem does the AST solve?
Mathieu Lirzin
2018-03-07 23:11:54 UTC
Post by Mathieu Lirzin
Post by John Calcote
A Makefile.am file is really just a Makefile with embellishments. It seems
like your AST would have to incorporate most of make’s syntax to work
correctly.
Perl was chosen in the first place because of its great text-processing
capabilities: ultimately, all Automake really does is copy the file directly
to the output Makefile.in file, filtering out Automake stuff along the way
and injecting make snippets generated from the Automake constructs.
This may not be obvious at first, because many simpler Makefile.am files
contain only Automake stuff. But anything found in the Makefile.am file
that Automake doesn’t recognize is assumed to be valid make syntax and
copied directly to the output file.
I suggest making your AST handle non-Automake chunks as a specific token
type designed to be passed through without modification.
I agree that using a coarse-grained AST is a good first approach.
Exploring and evaluating a finer-grained approach later during this
GSoC could be interesting too.
Thanks for your input.
What problem does the AST solve?
The main one I see is the potential modularity and faster testability it
brings. Checking properties on an in-memory tree data structure generally
performs better than reading them back from a file. While this performance
gain does not matter in practical interactive usage of 'automake', the
benefit would be significant for the test-suite runtime, assuming that most
functional tests are rewritten as unit tests.

Using an AST is not the only possible approach to achieve this goal of
having an in-memory data structure for the tests. However, the AST
approach is generally considered a better design for syntax/semantic
analysis than a couple of character streams combined with a set of
global variables.
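As a rough illustration of the kind of check this enables, the Python sketch below (hypothetical names throughout) tests a property directly on an in-memory tree instead of running `automake` and inspecting a generated file. `program_sources` is a simplified take on Automake's default-source rule (a program without a `<prog>_SOURCES` variable defaults to `<prog>.c`) and on the way Automake canonicalizes derived variable names; it is not Automake's code.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Tree:
    """Hypothetical coarse AST: Automake variables plus opaque make chunks."""
    variables: Dict[str, List[str]] = field(default_factory=dict)
    passthrough: List[str] = field(default_factory=list)

def canonicalize(name: str) -> str:
    """Map characters other than letters, digits, '_' and '@' to '_'."""
    return re.sub(r"[^A-Za-z0-9_@]", "_", name)

def program_sources(tree: Tree) -> Dict[str, List[str]]:
    """Map each bin_PROGRAMS entry to its sources, defaulting to <prog>.c."""
    result: Dict[str, List[str]] = {}
    for prog in tree.variables.get("bin_PROGRAMS", []):
        key = canonicalize(prog) + "_SOURCES"
        result[prog] = tree.variables.get(key, [prog + ".c"])
    return result

def test_program_sources() -> None:
    # The tree is built in memory: no Makefile.am is read, no automake run.
    tree = Tree(variables={
        "bin_PROGRAMS": ["hello", "my-tool"],
        "hello_SOURCES": ["main.c", "util.c"],
    })
    assert program_sources(tree) == {
        "hello": ["main.c", "util.c"],
        "my-tool": ["my-tool.c"],
    }

if __name__ == "__main__":
    test_program_sources()
    print("ok")
```

A function like `test_program_sources` can be collected by a test runner such as pytest or simply called; either way it only touches data already in memory.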
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37
Kip Warner
2018-03-06 00:32:50 UTC
Post by Matthias Paulmier
Thanks, I'll have a look. I've already used the C version of flex and bison,
as I said in my original post.
However, I think we'll stick to Perl for this project. As John Calcote
said
(<http://lists.gnu.org/archive/html/automake/2018-03/msg00004.html>),
Perl has its advantages for such a project, and I don't know if a
lower-level programming language would benefit Automake that much.
Maybe I'm wrong.
Well, John's one to know, because he literally wrote my favourite book on
the Autotools!

A lot of what Automake does is string manipulation, which probably lends
itself well to an interpreted language like Perl.
--
Kip Warner | Senior Software Engineer
OpenPGP signed/encrypted mail preferred
https://www.cartesiantheatre.com
Mathieu Lirzin
2018-03-08 00:05:55 UTC
Hello Matthias,
Post by Matthias Paulmier
I'm a French CS student at the University of Bordeaux. I'm currently enrolled in
a master's degree program specialized in network communications and
administration. I've been interested in free software for a couple of years now
and have wanted to help a project for some time, but had never found one to
which I could make a significant contribution.
I have decided to apply for the project "Parse Makefile.am using an Abstract
Syntax Tree".
Your proposal is very welcome. Google Summer of Code is a good
opportunity to start contributing to Free Software.
Post by Matthias Paulmier
The reason I'm choosing this subject over the other one is that I already have
a good knowledge of ASTs. I have worked on a small programming language as an
assignment (project here:
<https://services.emi.u-bordeaux.fr/projet/viewvc/compilfinal/>, but it is very
poorly written). It is a very basic interpreter for a trimmed-down Pascal
language, written in C with Flex and Bison. On this project I worked on the
syntax and semantic analysis as well as the lexer (which is not a big deal with
Flex).
I've already met with Mathieu Lirzin to talk about the project, so I have a
general idea of what is expected from this GSoC. From my understanding, the goal
of both proposed subjects is to move towards Automake's eventual modularization.
The benefits of generating this AST from a Makefile.am file would be to separate
the different code generation phases, to improve the test suite by testing each
phase separately, and probably others that I can't think of right now.
My knowledge of Perl may be my weak point for this project, as I only know a bit
of the syntax. However, I am familiar with other programming languages, mainly C
and Python.
The background you have from this compilation course would be helpful for
this project. IMO the lack of Perl knowledge is not a big deal; however,
it means you will have to acquire a basic knowledge of Perl during the
"Community Bonding" period.
Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
Yes, looking at Texinfo will be interesting for that.

I think you should start thinking about a roadmap with milestones and
deadlines for your formal application. The deliverables expected for this
project are, on the one hand, a Perl library capable of parsing
'Makefile.am' files, injecting rudimentary predefined compilation rules
based on the semantic analysis, and dumping the resulting 'Makefile.in';
on the other hand, an example script using that library, developed to
make it easy to check the progress of the parsing and code generation
work.
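To make the three library responsibilities concrete, here is a toy end-to-end pipeline, sketched in Python rather than the Perl the deliverable calls for. Every name (`MakefileAm`, `parse`, `inject_rules`, `dump`, `main`) is hypothetical, and the injected rule is a deliberately rudimentary `$(CC)` link line, not what Automake actually generates. Variable order relative to passthrough text is not preserved, and name canonicalization is skipped.

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MakefileAm:
    """Hypothetical coarse AST for one Makefile.am."""
    variables: Dict[str, List[str]] = field(default_factory=dict)
    passthrough: List[str] = field(default_factory=list)  # opaque make text
    generated: List[str] = field(default_factory=list)    # injected rules

def parse(text: str) -> MakefileAm:
    """Parsing: plain 'NAME = words' lines become variables, the rest is kept."""
    ast = MakefileAm()
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and not line.startswith("\t") and name.strip().isidentifier():
            ast.variables[name.strip()] = value.split()
        else:
            ast.passthrough.append(line)
    return ast

def inject_rules(ast: MakefileAm) -> None:
    """Semantic step: add a rudimentary link rule for every bin_PROGRAMS entry."""
    for prog in ast.variables.get("bin_PROGRAMS", []):
        sources = ast.variables.get(prog + "_SOURCES", [prog + ".c"])
        objects = " ".join(s.rsplit(".", 1)[0] + ".o" for s in sources)
        ast.generated.append(f"{prog}: {objects}")
        ast.generated.append(f"\t$(CC) $(CFLAGS) -o $@ {objects}")

def dump(ast: MakefileAm) -> str:
    """Code generation: variables, then injected rules, then passthrough text."""
    lines = [f"{name} = {' '.join(words)}" for name, words in ast.variables.items()]
    lines += ast.generated + ast.passthrough
    return "\n".join(lines) + "\n"

def main(path: str) -> None:
    """Toy 'example script': print the Makefile.in generated for one file."""
    with open(path) as f:
        ast = parse(f.read())
    inject_rules(ast)
    sys.stdout.write(dump(ast))

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Keeping parsing, semantic injection and dumping as separate functions is what would let each phase be tested on its own, which is one of the benefits discussed earlier in the thread.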

Thanks.
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37
Matthias Paulmier
2018-03-08 20:25:17 UTC
Post by Mathieu Lirzin
Hello Matthias,
Post by Matthias Paulmier
I'm a French CS student at the University of Bordeaux. I'm currently enrolled in
a master's degree program specialized in network communications and
administration. I've been interested in free software for a couple of years now
and have wanted to help a project for some time, but had never found one to
which I could make a significant contribution.
I have decided to apply for the project "Parse Makefile.am using an Abstract
Syntax Tree".
Your proposal is very welcome. Google Summer of Code is a good
opportunity to start contributing to Free Software.
Post by Matthias Paulmier
The reason I'm choosing this subject over the other one is that I already have
a good knowledge of ASTs. I have worked on a small programming language as an
assignment (project here:
<https://services.emi.u-bordeaux.fr/projet/viewvc/compilfinal/>, but it is very
poorly written). It is a very basic interpreter for a trimmed-down Pascal
language, written in C with Flex and Bison. On this project I worked on the
syntax and semantic analysis as well as the lexer (which is not a big deal with
Flex).
I've already met with Mathieu Lirzin to talk about the project, so I have a
general idea of what is expected from this GSoC. From my understanding, the goal
of both proposed subjects is to move towards Automake's eventual modularization.
The benefits of generating this AST from a Makefile.am file would be to separate
the different code generation phases, to improve the test suite by testing each
phase separately, and probably others that I can't think of right now.
My knowledge of Perl may be my weak point for this project, as I only know a bit
of the syntax. However, I am familiar with other programming languages, mainly C
and Python.
The background you have from this compilation course would be helpful for
this project. IMO the lack of Perl knowledge is not a big deal; however,
it means you will have to acquire a basic knowledge of Perl during the
"Community Bonding" period.
That's what I was planning on doing. Should this be added to the roadmap, or
is it only about the coding part?
Post by Mathieu Lirzin
Post by Matthias Paulmier
If you have any suggestions for documents I could read or software I could look at
to prepare for this project, I'll be glad to check them out. I know Texinfo is
written in Perl and generates an AST, so I'll look into that.
Yes, looking at Texinfo will be interesting for that.
I think you should start thinking about a roadmap with milestones and
deadlines for your formal application. The deliverables expected for this
project are, on the one hand, a Perl library capable of parsing
'Makefile.am' files, injecting rudimentary predefined compilation rules
based on the semantic analysis, and dumping the resulting 'Makefile.in';
on the other hand, an example script using that library, developed to
make it easy to check the progress of the parsing and code generation
work.
I'll write a first draft of my application this weekend and keep you updated on
it here. I have a good idea of what it will look like. I will follow the
guidelines for SoC[1]. I don't know which file format Google requires, since
applications are not open on their website yet, so I'll write it as a PDF file
and put it on my personal web page so I can show it here.

[1] <https://www.gnu.org/software/soc-projects/guidelines.html> and

Thanks for reading.

--
Matthias Paulmier
Matthias Paulmier
2018-03-11 12:51:48 UTC
I've put the first draft of my proposal here:
<matthias.paulmier.emi.u-bordeaux.fr/proposal.pdf>.

The communication part still needs to be discussed. As for the plan, tell me what
you think.

Thanks
--
Matthias Paulmier
Mathieu Lirzin
2018-03-14 22:46:48 UTC
Hello Matthias,
Post by Matthias Paulmier
<matthias.paulmier.emi.u-bordeaux.fr/proposal.pdf>.
I think this is a good first draft.

A few comments:

- For the “Project” and “Plan” parts, feel free to expand on the
functional and non-functional requirements, based on the previous
discussion (for example, the fact that the AST will probably be
coarse-grained) and on your personal intuition. It doesn't matter if
your intuition doesn't match what I have in mind; it will just be
a good occasion to discuss the project in more detail.

- Regarding the example script deliverable, I think you can specify that
a set of examples that can be tested manually will be provided. If it
helps you and fits your workflow (for example, if you want to do TDD or
similar), you can add some automated unit tests; however, this is not a
requirement.

- Documentation is not a high priority. I will be
happy with just some basic docstrings specifying the functional
contract of subroutines.
Post by Matthias Paulmier
The communication part still needs to be discussed. As for the plan, tell me what
you think.
Regarding communication: for the weekly status update and discussion, if
that's OK with you, I would rather have a one-on-one VoIP conversation
(via Ring or Jitsi) when possible and use email as a fallback or
complement. For instant communication, IRC is convenient for me.

Thanks.
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37
Matthias Paulmier
2018-03-15 12:15:33 UTC
Post by Mathieu Lirzin
Hello Matthias,
Post by Matthias Paulmier
<matthias.paulmier.emi.u-bordeaux.fr/proposal.pdf>.
I think this is a good first draft.
- For the “Project” and “Plan” parts, feel free to expand on the
functional and non-functional requirements, based on the previous
discussion (for example, the fact that the AST will probably be
coarse-grained) and on your personal intuition. It doesn't matter if
your intuition doesn't match what I have in mind; it will just be
a good occasion to discuss the project in more detail.
I've added a "Requirements" subsection to the "Project" part. It may be
missing some things I didn't think of or forgot about.
Post by Mathieu Lirzin
- Regarding the example script deliverable, I think you can specify that
a set of examples that can be tested manually will be provided. If it
helps you and fits your workflow (for example, if you want to do TDD or
similar), you can add some automated unit tests; however, this is not a
requirement.
That has been added. As for TDD, I'm not an expert on it, but some
tests may be provided during the summer as part of the test suite under
the t/ directory.
Post by Mathieu Lirzin
- Documentation is not a high priority. I will be
happy with just some basic docstrings specifying the functional
contract of subroutines.
OK. I left documentation as a general item with that in mind.
Post by Mathieu Lirzin
Post by Matthias Paulmier
The communication part still needs to be discussed. As for the plan, tell me what
you think.
Regarding communication: for the weekly status update and discussion, if
that's OK with you, I would rather have a one-on-one VoIP conversation
(via Ring or Jitsi) when possible and use email as a fallback or
complement. For instant communication, IRC is convenient for me.
Sounds good to me. I put Jitsi in the proposal since it doesn't need
registration and I don't have a Ring ID ATM. But I don't have a real
preference.

My draft has been online on the GSoC website since it opened on Monday. I
don't know if you have access to it.

Thanks for the feedback.
--
Matthias Paulmier
Mathieu Lirzin
2018-03-16 19:44:43 UTC
Post by Matthias Paulmier
Post by Mathieu Lirzin
Post by Matthias Paulmier
<matthias.paulmier.emi.u-bordeaux.fr/proposal.pdf>.
- Regarding the example script deliverable, I think you can specify that
a set of examples that can be tested manually will be provided. If it
helps you and fits your workflow (for example, if you want to do TDD or
similar), you can add some automated unit tests; however, this is not a
requirement.
That has been added. As for TDD, I'm not an expert on it, but some
tests may be provided during the summer as part of the test suite under
the t/ directory.
Yes, indeed the ‘t/’ directory contains validation tests, and the ‘t/pm’
directory contains unit tests.

If you want to write unit tests, I encourage you to look at the classic
‘Test::More’ framework instead of following what is done in ‘t/pm’, which
is a bit too barebones in my opinion.

Validation tests should be easy to automate with basic shell
scripts calling the example script and checking the output.
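The same golden-output idea can be sketched in Python, swapped in here for the shell scripts mentioned above only to keep one language across these examples. `check_output` is a hypothetical helper: it runs a command, requires exit status 0, compares standard output with the expected text, and prints a unified diff on mismatch.

```python
import difflib
import subprocess
import sys
from typing import List

def check_output(cmd: List[str], expected: str) -> bool:
    """Run cmd; succeed only if it exits 0 and prints exactly `expected`."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"FAIL: {cmd[0]} exited with status {result.returncode}", file=sys.stderr)
        return False
    if result.stdout != expected:
        # Show what changed between the golden file and the actual output.
        diff = difflib.unified_diff(
            expected.splitlines(keepends=True),
            result.stdout.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
        sys.stderr.writelines(diff)
        return False
    return True

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: check.py EXPECTED_FILE COMMAND [ARGS...]
    with open(sys.argv[1]) as f:
        ok = check_output(sys.argv[2:], f.read())
    sys.exit(0 if ok else 1)
```

A call might look like `python3 check.py t/hello.in.expected ./example-script t/hello.am`, where all three paths are placeholders rather than files that exist in the Automake tree.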
Post by Matthias Paulmier
Post by Mathieu Lirzin
Post by Matthias Paulmier
The communication part still needs to be discussed. As for the plan, tell me what
you think.
Regarding communication: for the weekly status update and discussion, if
that's OK with you, I would rather have a one-on-one VoIP conversation
(via Ring or Jitsi) when possible and use email as a fallback or
complement. For instant communication, IRC is convenient for me.
Sounds good to me. I put Jitsi in the proposal since it doesn't need
registration and I don't have a Ring ID ATM. But I don't have a real
preference.
Great.
Post by Matthias Paulmier
My draft has been online on the GSoC website since it opened on Monday. I
don't know if you have access to it.
Yes, I have access to it. I will send my future comments via this
website.

Thanks.
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37
Mathieu Lirzin
2018-03-16 20:09:36 UTC
Post by Mathieu Lirzin
Post by Matthias Paulmier
My draft has been online on the GSoC website since it opened on Monday. I
don't know if you have access to it.
Yes, I have access to it. I will send my future comments via this
website.
I was under the wrong impression that this interface was the way
to communicate with applicants, but this seems not to be the case, so I
will continue to send my comments to this thread.

A few additional comments:

- I agree with your suggestion of providing numbers regarding the
performance impact of using an AST.

- In the "qualification" part, you can include all the details you
provided regarding your school project.

Feel free to continue expanding on the technical details and request
feedback until the deadline.

Thanks.
--
Mathieu Lirzin
GPG: F2A3 8D7E EB2B 6640 5761 070D 0ADE E100 9460 4D37