[GSoC] Reafactoring Automake: Report

Matthias Paulmier

2018-06-01 14:43:37 UTC

__________________________________________________

GSOC: MODULARIZE AUTOMAKE AND IMPROVE ITS TEST
SUITE

Matthias Paulmier
__________________________________________________

Introduction
============

In this project, my first task is to analyze how Automake is built and
how it can be improved. The obvious first point will be to look into
the actual "main" script of the project
(`automake/bin/automake[.in]'). It is this big 8000+ lines of code
file which clearly needs to be trimmed down. Their are multiple ways
to approach this problem. The one I chose, which looked the most
obvious to me, is to try and move out as much subroutines as possible
in separate modules. The idea is that this file should only (or
mostly) be a caller for its submodules' methods.

This document attempts to describe how the current hierarchy will be
modified during the next couple of weeks. This is not a binding
document as I will probably change my mind when moving some code
around or as I realize I made bad assumptions as to how the program is
structured. This is a first approach for me to have an easier time
understanding where to start.

Move out the variable/constant declarations
===========================================

The first thing I did, as explained in the introduction, was to move
the variable and constant declarations, which were at the top of the
file, to another module. I created a new module called
`Automake::Global' which held these variables. I didn't immediately
have a clear idea of what I would do with it but it seemed important
to have them declared in a separate file to be able to relocate them
when the modularization process would start.

Coupling and cohesion
=====================

Refactoring some code consists in reapplying good practices which have
been lost during the multiple years of active development of the
project. Coupling and cohesion are two important concepts in terms of
code robustness and maintainability.

Coupling measures the degree of dependence between modules inside a
program. The goal is to have as low coupling as possible (meaning as
low dependence's as possible) inside a program. This ensures that the
code will be easily maintainable and/or reusable.

Cohesion is a more abstract concept but is also an important one. It
measures to which degree the methods inside a module belong together.
We want as high cohesion as possible inside a module.

These two concepts are very important and will be taken into
consideration all along this summer when coming up with new modules or
moving code to existing ones.

Find a tool to check cross references
=====================================

The next step was to find the dependencies between subroutines and
global variables (now declared in a separate file), or other
subroutines, to be able to group them together or make a hierarchy
between them.

The first thing I had to do was to find a tool that could scan these
dependencies and print a report for me to be able to get the
information. I didn't know at the time what such a tool would be
called and had a hard time browsing CPAN for something I couldn't even
name. I then contacted people on the #perl IRC channel to have their
input on that. They told me I was looking for a cross reference tool
and pointed me to one in particular, `B::Xref'. This program takes a
Perl file as an input and outputs a report showing the cross
references between and inside modules.

I used this tool to produce a table (which you will find attached as
table.html) (it is a very large table so I'm putting it in a separate
file). This table is a useful tool to be able to quickly check cross
references for a particular subroutine but is not very efficient to
browse. So I then produced a Directed Acyclic Graph (DAG) to be able
to have a better visualization of the different dependencies. The
hard part was to make the graph as easy to read as possible. I first
wanted to visually isolate functions and variables by module as
subgraphs (clusters in `graphviz' terms) but quickly realized that it
created a big mess of arrows crossing each other. It wasn't visible
at all. So I went with a simple color coding and let the `sfdp'
command be in charge of arranging the DAG in the most efficient way
possible.

You will find the full graph attached as `full_graph.svg' (the svg
format is convenient for navigating with text searches).

Legend for the DAG:
- functions from `automake/bin/automake' -> circled blue nodes
- global variables -> circled red nodes
- each package has its own color and each call to a method of the
package is represented by an arrow of the same color

It may look a bit scary right now but it is really useful and the
color coding helps following the dependencies. Also, with some clever
"grepping" from the source file we can get the dependency graph for a
certain subroutine or remove the global variables. Attached as
`no_vars.svg', you can find the same DAG without the variables for
example.

To get it from the source with the following command `grep -v
'^.*var_.*' graph.dot | grep -v '^$' | sfdp -Tpng -Goverlap=scale -o
outfile.png'

As we can see in these graphs, there is a lot of coupling between the
different modules. We want to lower this as much as possible. The
main file has very low cohesion too which is less of a problem for us
since it will be corrected as we advance and move subroutines out of
it.

Information we can get from the table/DAGs
==========================================

From the table and the DAGs we had some crucial information directly
coming out. First, some subroutines and variables didn't seem to be
used at all. Second, some subroutines use a lot of variables. These
variables are manipulated by other subroutines which don't seem to
have a lot in common. This will need to be investigated further later
to see what we can do with that. And then we have subroutines that
don't use any external variables (only lexical ones declared in their
code) or external subroutines, such as `verbose_var' for example.
These are usually some utility functions which we may be able to group
together in a module.

Unused subroutines in `automake/bin/automake'
=============================================

For each subroutine that appeared unused in the DAG, I double checked
in the different files to be sure they were not used elsewhere or
omitted by `B::Xref' for some reason (it can omit some variables
sometimes). Turns out some of them were false positives as they were
in fact used in the file.

Here are the unused subroutines in `automake/bin/automake':

Subroutine name Utility
---------------------------------------------------------------
`lang_vala_rewrite' replaces the `.vala' extension by `.c'
`lang_java_rewrite' returns the constant `LANG_SUBDIR'
`lang_header_rewrite' returns the constant `LANG_IGNORE'

We can see that two of them simply return the value of a constant
which doesn't seem very practical. These subroutines will be removed
from the program.

Unused global variables
=======================

I used the same process for unused variables and constants. This time
their were more false positives as `B::Xref' doesn't seem to be able
to recognize constant usage and some code structures such as the
ternary operation or multiple assignments with the same `=' operator.
The variables and constants that are not used can also be moved out of
the program to make some clear space.

Here are the unused variables from `automake/lib/Automake/Global.pm':

Variable name Value
-------------------------------------------------------------------
`libtool_sometimes' `qw(ltmain.sh config.guess config.sub'
`TARGET_PATTERN'[1] `'[$a-zA-Z0-9_.@%][-.a-zA-Z0-9_(){}/$+@%]*''
`PATH_PATTERN'[1] `'(\w)' | `[+/.-])+''

The problem of constants
========================

Since the tool I used would not detect constant usage properly (they
have sometimes been misinterpreted as subroutine calls or not taken
into consideration at all), this work had to be done mostly by hand.
Fortunately, their wasn't as many constants as their was variables so
it wasn't so much of a big task.

Isolated subroutines
====================

Looking at the DAG, we can see some subroutines (circled in blue)
which are isolated from the rest. We should look into these and see
if something obvious comes to mind when it comes to moving them out.
For all the subroutines mentioned below we will move the associated
global variables from `Automake::Global' with them when possible. The
idea is to group nodes together inside clusters starting from the
leaves of our DAG. This way we can find coherent modules while
keeping coupling low as we work our way through it via the edges.

New modules
===========

We have now gathered some information to create new modules from the
parts above.

First, for all the general purpose utility functions, we have two
choices. We could move these functions to a new module called
`Automake:Utils' or move them to the already existing
`Automake::General' module which would apparently (when looking at its
name) fill this purpose. When digging into this module, we can see
that it redefines some core Perl functionalities (such as the END
function). Knowing that, I think it would be better to create a new
module and place the subroutines accordingly. The `version', `usage'
and `parse_arguments' subroutine should be moved to this module as
well as the subroutines cited above.

Among the list of subroutines in the main file, we can see a number of
subroutines with the same naming scheme (for example `handle_*' or
`require_*'). I need to see if they all serve a related purpose and
then group them accordingly into new or existing modules.

Existing modules
================

When it comes to existing modules, I will try to move some subroutines
to them when it is possible. The goal of the project is not to add as
much modules as possible but rather make the structure of the program
as coherent as possible while not changing the behavior for the users.
The important thing is to understand which module does what. Here is
a table with a small explanation of the modules' utility:

- `Automake::ChannelDefs': Defines channels and helper functions;
- `Automake::Channel': Functions for error and warning management;
- `Automake::Condition': Record conjunction of conditionals;
- `Automake::Config': Global variables to be used in the program (like
the release number or year of lease);
- `Automake::Configure_ac': Functions for locating `configure.ac' or
`configure.in';
- `Automake::DisjConditions': Like `Automake::Condition' but with
disjunction instead of conjunction;
- `Automake::FileUtils': File handling;
- `Automake::General': Redefine Perl core functionalities;
- `Automake::Getopt': Get command line options;
- `Automake::Global': Global variable and constant definitions;
- `Automake::ItemDef': Base class for `Automake::VarDef' and
`Automake::RuleDef';
- `Automake::Item': Base class for `Automake::Var' and
`Automake::Rule';
- `Automake::Language': Class for defining languages;
- `Automake::Location': Track location with a stack of contexts;
- `Automake::Options': Keep track of `Automake' options;
- `Automake::RuleDef': Rule definitions;
- `Automake::Rule': Define `Automake' rules;
- `Automake::VarDef': Variable definitions;
- `Automake::Variable': Define `Automake' variables;
- `Automake::Version': Functions for comparing program versions based
on patterns;
- `Automake::Wrap': Functions for formatting paragraphs;
- `Automake::XFile': Error handling in file handles;

Conclusion
==========

As we've seen in this report, their is a big task ahead of me in terms
of module reorganization. For the sake of quality code and easier
maintainability, I will add unit tests for the newly created modules
as well as the ones that are not tested. To see if the new
architecture is well integrated with the already existing one, I will
use the existing test suite and the integration/validation tests it
provides. At the end of the summer, Automake should feel the same for
users while being more easily maintainable and extendable.

Footnotes
_________

[1] The variable is used internally in the global file but is unused
elsewhere so we will remove it from its default exports.

--
Matthias Paulmier