PerlMonks

Re: 'bld' project - signature(SHA1) based replacement for 'make'

by afoken (Chancellor)
on Aug 26, 2014 at 19:46 UTC ( [id://1098661] )


in reply to 'bld' project - signature(SHA1) based replacement for 'make'

So ... - yet another wheel re-invented? Sure, re-invent make, like the people who invented ant, rake, dmake, nmake, and a lot of other tools. To learn how to plan a project, to learn how to write code, for fun, or just because you can.

(GNU) make works just fine for me for almost all of my build problems, from "hello world" up to complete embedded systems, it is properly documented, and has very few surprises. And because make "just" calls other utilities, I can use the entire Unix toolkit to get the job done.

Yes, the syntax is ugly, but not uglier or harder than shell scripts.

So, why should I spend my time learning a different tool?

Your tool calculates hashes instead of relying on timestamps. OK, but why? Comparing two currently unknown timestamps costs two system calls (stat()) and an integer compare. Comparing two currently unknown hashes costs at least three system calls (open(), read(), close()) plus a lot of bit shuffling (update: per file) plus comparing a block of memory typically larger than any native integer type. This is much more work, not less. As you store hashes in a separate file, you also need some extra open(), read(), close() and open(), write(), close() system calls to manage that file. Make keeps the timestamps in memory, the operating system updates the timestamps in the filesystem without needing a separate file.
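To make the cost difference concrete, here is a minimal shell sketch of the two checks side by side (file names are illustrative; assumes GNU coreutils):

```shell
# make-style check: one stat() per file and an integer compare
echo 'int main(){return 0;}' > demo.c
touch demo.o                               # pretend demo.o was just built
if [ demo.c -nt demo.o ]; then echo rebuild; else echo 'up to date'; fi

# hash-style check: read each file completely, then compare digests
stored=$(sha1sum demo.c | cut -d' ' -f1)   # stand-in for a stored signature
current=$(sha1sum demo.c | cut -d' ' -f1)
[ "$current" = "$stored" ] && echo 'signature unchanged'
```

Both checks reach the same conclusion here, but the hash check had to read the whole file to do it.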

Rebuild rules

Rebuilds will happen only if:

1. a source file is new or has changed
2. the corresponding object file is missing or has changed
3. the command that is used to compile the source has changed
4. a dependent header file has changed
5. the command to link the executable or build the library archive has changed
6. the executable or library has changed or is missing

Make does 1, 2, and 6 by default, 3 if the target also depends on the Makefile, 4 if dependencies are calculated before or during the make process (typically by using gcc -M in one make rule and including the output of that command into the Makefile), 5 if the target depends on the linker, archiver, compiler or whatever tool is used to build the target.
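The gcc -M idiom mentioned for point 4, combined with the Makefile dependency for point 3, looks roughly like this in a GNU Makefile (a sketch with illustrative names; the exact -M flag family varies by compiler):

```make
SRCS := demo.c

demo: $(SRCS:.c=.o)
	$(CC) $^ -o $@

# Point 3: depending on the Makefile recompiles when the command changes.
%.o: %.c Makefile
	$(CC) -c $< -o $@

# Point 4: let the compiler emit header dependencies, then include them.
%.d: %.c
	$(CC) -M $< > $@

-include $(SRCS:.c=.d)
```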

I expect a build tool to take care of 1, 2, 4, and 6. 3 is nice to have. 5 is nice to have for custom tools (build and) used in the project, but I don't see a use to track the linker or archiver executable, as they rarely change, at least on my systems. And when they change, I recompile the entire project from a clean state (make distclean; make).

FEATURES AND ADVANTAGES

1. Everything is done with SHA1 signatures. No dates are used anywhere.
2. bld is REALLY simple to use. There are no arguments, no options(except -h), no environment variables and no rc files. The entire bld is controlled from the Bld(and Bld.gv file) file. Only a minimal knowledge of perl is needed - variable definitions and simple regular expressions.
3. bld is not hierarchical. A single Bld file controls the construction of a single target(a target is an executable or library(static or shared)). Complex multi-target projects use one Bld.gv(global values) file and many Bld files - one to a target.
4. Each source file will have three signatures associated with it - one for the source file, one for the corresponding object file and one for the cmds used to rebuild the source. A change in any of these will result in a rebuild.
5. If any files in the bld have the same signature this is warned about e.g. two header or source files of the same or different names.
6. Optionally, the signatures of dynamic libraries may be tracked. If a library signature changes the bld may warn or stop the rebuild. If dynamic libraries are added or deleted from the bld this can ignore/warn/fatal.
7. A change in the target signature will result in a rebuild.
8. Complex multi-target projects are built with a standard directory setup and a standard set of scripts - Directories:

1 - I don't get it. What's the advantage of calculating hashes of files and doing a lot of I/O and calculations over comparing timestamps?

2 - make is at least as simple to use. A single Makefile, and even that is optional for simple projects, thanks to the default rules. No perl knowledge needed:

/tmp>mkdir empty
/tmp>cd empty
/tmp/empty>echo 'int main() { write(1,"Hello\n",6); }' > demo.c
/tmp/empty>make demo
cc     demo.c   -o demo
/tmp/empty>./demo
Hello
/tmp/empty>ls
demo  demo.c
/tmp/empty>

3 - Being unable to handle a hierarchical set of files is an advantage? Really?

4 - Neat idea. You can get the same result by adding dependencies to a target in a Makefile.

5 - Never heard of hash collisions? They are rare, but not impossible! Why are identical files a problem at all?

6 - Add dependencies to a target in the Makefile for this.

7 - Why? Who modifies the generated target?

Oh, wait: Some projects don't strip the generated executables. So I strip them manually. Is this a valid reason to recompile an unstripped executable? I don't think so.

8 - I really need that many files just to compile a few files?

Your Notes:

1. bld assumes that a source will build a derived file e.g. .o files in the same directory and have the same root name as the source.
2. bld assumes that all targets in multi-target bld's will be uniquely named - all targets go into the same project directory.
3. Some projects violate either or both of these target naming or object file naming/location requirements, but reconstructing these projects with bld should be relatively easy e.g. systemd.
4. bld executes cmd fields({}) in the bld directory and then moves all created files to the source directory.

1 - make does the same, most times. ant seems to know more about javac (see below).

2 - This simply does not work with larger projects. As far as I know, subdirectories were re-invented for the last time in 1983, with MS-DOS 2.0.

3 - So I need to reorganize my project to use a single flat directory (or fake it by having unique names even across subdirectories)? Are you kidding me?

4 - Are you aware that javac (the Java compiler) has the nasty habit of generating several output files from a single source file? Perl modules need subdirectories to work. javadoc and several other tools generate a bunch of files including subdirectories full of files, and this structure must not be changed.

Some other problems I see:

  • Keeping a hash (or timestamp) of just the compiler, linker, and archiver executables may be insufficient, as those executables often also link in libraries and call other executables that may change without a single bit change in the main executable or its meta data.
  • The system() function is called with a single string, at least in your examples, so your tool will likely start messing with the shell. This is at least as ugly as in make and begs for trouble. Why do I need to quote variables for perl and then again for the shell? One set of quoting rules should be sufficient!
  • No default rules? Do I really have to tell your tool how to compile an object file from a source file, and how to link object files to executables, over and over again? Make has default rules for everything and the kitchen sink, see output of env - make -f /dev/null -p.
  • I need Perl before I can use your tool. Make works without Perl.
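The single-string quoting hazard from the system() point above is easy to reproduce in the shell itself (a sketch; the file name is illustrative):

```shell
# A file name with a space, passed through an unquoted expansion, splits
# into two arguments - the same trap a single-string system() sets up.
touch 'two words.c'
FILES='two words.c'
set -- $FILES            # unquoted: word splitting happens here
echo "$# arguments"      # 2 arguments, not 1
set -- "$FILES"          # quoted: stays one argument
echo "$# arguments"
```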

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^2: 'bld' project - signature(SHA1) based replacement for 'make'
by rahogaboom (Novice) on Aug 27, 2014 at 13:39 UTC

    This is an extensive post. I'll reply in detail, however, I want to formulate a considered reply. Maybe within a week or so. Thanks.

      I'll reply in detail, however, I want to formulate a considered reply.

      Sounds good. Meanwhile, let me "think aloud" about when and why make can go wrong and how to fix that.

      Make (in general, all variants) uses timestamps provided by the operating system and the file system to decide if a rule has to run or not. Rules run only once for each target during a single run of make, so make keeps an internal state, too. This state is obviously initially built from the timestamps.

      Timestamps in the future may be caused by system clock manipulation. This happens, for example, when you damage your Slackware system, boot from the install CD/DVD, and chroot to your real system to rebuild the kernel using make. The Slackware installer manipulates the system clock (but not the hardware clock) to work around some problems with time zones, so you virtually travel in time. The same problem may happen when root changes the system time back manually while make runs. GNU make detects both and warns ("File '%s' has modification time %d s in the future"). It could, perhaps should, be more paranoid and abort instead, because it is likely that your build is incomplete.

      Clock skew may happen when you use network filesystems (especially NFS) without tightly synchronising system clocks of client and server. The server sets the timestamps, using its system clock, but make on the client compares the timestamps using the client's system clock. That clock may have a very different idea of the current time, it may even jitter around the server's clock, so the freshly generated target may have a timestamp earlier than its source files. Again, it may also happen when root messes with the system clock. GNU make detects this and warns ("Clock skew detected"). Again, it could and perhaps should be more paranoid and abort, because again it is likely that your build is incomplete.

      These two problems are the most common problems with make using timestamps, but there are other ways to create wrong timestamps.

      FAT filesystems allow only a two-second resolution of timestamps (you get only even values for seconds). So, your target may have the same timestamp as its source. But this should be no problem; you can get essentially the same situation because stat() returns timestamps with only one-second resolution. Make only rebuilds when the target is older, i.e. the timestamp of the target is less than the timestamp of the source. But FAT stores local time, not universal time, so when you change the timezone, the FAT timestamps move back or forward in time.
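      The equal-timestamp case is easy to demonstrate (assumes GNU touch/test; file names are illustrative):

```shell
# Give source and target identical timestamps, as FAT's coarse resolution
# (or plain one-second stat() resolution) can produce.
touch -d '2014-08-26 12:00:00' src.c
touch -d '2014-08-26 12:00:00' src.o
# -nt is strictly "newer than", so an equal timestamp means no rebuild.
if [ src.c -nt src.o ]; then echo rebuild; else echo 'no rebuild'; fi
```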

      No problem: The ntp daemon manipulates the system clock so it agrees with the reference clocks, perhaps even while you run make. If ntpd was implemented stupidly, the system clock would wildly jump around, especially if the system clock was off by several seconds or even minutes or hours. But ntpd is smart, it slows down or accelerates the system clock just a little bit at a time, so it smoothly approaches the reference clocks' time. Generally, systems allow ntpd a single big adjustment of the system time during system boot to compensate for cheap battery-buffered real-time clocks that tend to run too slow or too fast.

      Imagine a system without a battery-buffered realtime clock, like the old home computers or embedded systems. You boot the system, the system clock starts at some arbitrary point in time (often with timer count = 0 or build date), and starts counting up, completely independent from any reference clock. No problem until you reboot. "Groundhog Day". Instant "timestamp in the future" problems. If the system has network access, start ntpd (or ntpdate) during boot. If the system is not networked (or just has no access to a reference clock), just make sure the system remembers the last count of its system clock across a reboot. This may be implemented as simply as touching a file once a second (or once a minute) as long as the system runs, and adjusting the system clock to the timestamp of that file during boot. Or, equally, by storing the timestamp in some kind of persistent storage (EEPROM, Flash, battery buffered RAM, ...) every minute or second, or at least in the shutdown script, and reading that value back during boot.
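      The touch-a-file scheme needs nothing but coreutils (a runnable sketch using a temp file in place of a real system path; the date -s step needs root and is shown only as a comment):

```shell
STAMP=$(mktemp)                 # in real use: a fixed path on persistent storage
touch "$STAMP"                  # run once a second/minute while the system is up
last=$(stat -c %Y "$STAMP")     # during boot, read the saved time back...
echo "would restore clock to epoch $last"
# ...and apply it before anything timestamp-sensitive runs:
# date -s "@$last"
```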

      In summary, make sure that the system clock is synchronised with the reference clocks, and keeps counting upwards with no jumps. This will not only help make, but all other programs that rely on timestamps. Most times, the easiest solution is to start ntpd during boot, allowing a single big adjustment during startup of ntpd.

      If you run on an isolated network, pick one arbitrary machine and declare it the holder of the reference clock for that network. Serve time via NTP from that machine. Don't mess with its clock or timezone at all. If you have the resources, add a clock signal receiver to that machine (GPS or local radio like WWV, DCF77).

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Motivation:
        Why did I start this project? Over many years I have used make to build many projects. My experiences with setting up
        a makefile system and my observations of already constructed makefile systems caused my realization that a great deal more effort was
        being used in setting up and maintaining these build systems than was necessary. I felt that with the use of one of the modern scripting
        languages - Perl, Python, Ruby - and a simplified design incorporating automatic header file dependency checking, the whole
        process could be improved. I knew and liked Perl; so I used it. Perl is pretty much everywhere. You don't even need it to be installed
        in the system directories; perlbrew can be used to install to a local user directory. The major design goal was simplicity; complexity
        is the enemy. The existence of many other variations of make indicates to me that others as well were unsatisfied with many aspects of
        make and wanted to provide solutions. These all seemed, however, like band-aids on top of band-aids, never getting at the core of the problem.
        The current state of the project can handle C/C++/Objective C/Objective C++/ASM. I haven't tried to reproduce everything that make does
        in every instance. Simplicity is the goal; the ability to easily build Bld files for a single target or several Bld files for complex
        multi-target projects. The Bld files have a very simple syntax; a comment section - an EVAL section that requires 6 perl variables to be
        defined - a DIRS section with 'R dir:regex:{cmds}' or '{cmds}' specifications. I have succeeded in fully rebuilding the git, svn and
        systemd projects with this design without modifying in any way the directory structure of these projects. At present I have not attempted
        to incorporate Java or any other languages.

        Signatures:
        Signatures, unlike dates, are an inherent property of the file. They provide a simple way of dissociating the criteria for
        rebuilding a file from any outside interference - command line programs that might modify dates, other parties involved with
        the build that might arbitrarily modify the dates of files but not their content, clock changes or synchronizations
        that might cause a rebuild without file changes. All of these go away with signatures; a signature change means a file change. The
        suggestion that signatures may collide is a non-starter. Modern signature algorithms are designed to be random with even the smallest
        file changes. With a signature length of 160 bits a collision is unlikely in the extreme.

        The perl standard module Digest::SHA provides sha1_hex() which is fast enough to fully rebuild complex projects like git, svn and
        systemd in reasonable times - execution times are dominated by the recompilation of source, not by signature calculations. The
        problem with make is people time, engineer time. The supposed 'fast' use of dates is overwhelmed by the complexity of any but the
        simplest of Makefiles.
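        The per-file signature bld stores is just a SHA1 digest; sha1sum produces the same value sha1_hex() would (the digest shown is the well-known SHA1 of the five bytes 'hello'):

```shell
# Hash a small file, exactly as a per-file build signature would be taken.
printf 'hello' > f.txt
sha1sum f.txt | cut -d' ' -f1    # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```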

        Signatures are portable; dates are not.

        One of my goals in the use of signatures was security. At each step and for all file types a signature is calculated. Any attacker
        that managed to insinuate a modified source, source build cmd, object, executable or library with an unmodified date would fail. A
        recompilation would result. Protecting the integrity of the build signature file(Bld.sig - the source, objects, executables, build
        command lines, libraries) would be equivalent to protecting the integrity of the build. If you modified two files and three were
        recompiled then you know something in that extra file changed or the object changed or the rebuild command changed. If a project
        rebuild warned/fataled from a sudden unexpected library change then you would have the opportunity to investigate.

        A reply to the numbered items:

        1. Why use signatures is explained above.

        2. The assertion that make is simple to use is astonishing. Look at the GNU tools to automatically generate Makefiles. Why do this if
        Makefiles are so simple? I refer you to the thousands of online articles dealing with the obscurity/version/portability/performance/bug
        issues related to make. I also refer you to:

        http://www.scons.org/wiki/FromMakeToScons (Adrian Neagu)
        ----a detailed critique of make and some alternatives
        ftp://ftp.gnu.org/old-gnu/Manuals/autoconf/html_mono/autoconf.html#SEC3
        ----a brief critique of make and how GNU automake from the GNU Build System contributes
        http://www.scons.org/architecture/
        ----a description of the scons architecture and in particular the reasons for the use of signatures instead of dates
        http://aegis.sourceforge.net/auug97.pdf
        ----an article "Recursive Make Considered Harmful" by Peter Miller from the Australian UNIX Users Group
        http://www.conifersystems.com/whitepapers/gnu-make/
        ----an in depth critique of make

        I've seen many a Makefile that is so complex as to be unintelligible and that when modified has broken the build and requires a detailed
        reading of the doc for the application of some obscure rule that almost no one knows. The purpose of the GNU tools is to abstract away
        this complexity and yet still have make underneath. Look at the size of the GNU make doc. Randomly point at some section. Ask any
        experienced software engineer about the details of that section; I don't think you'll get far. 'bld' is designed to be simple. The
        learning curve is minimal - a. execute the "Hello, world!" program and read the Bld file(many stub do-nothing routines illustrate the
        construction of Bld files) b. understand the EVAL and DIRS section requirements - EVAL has defined perl variables(6 are required) and
        DIRS has the 'R dir:regex:{cmds}' specifications c. read bld.README for useful start-up stuff and an intro to bld'ing complex multi-target
        projects(git, svn and systemd) d. do perldoc bld for the full man page. That's it.

        To quote Adrian Neagu(see link above):

        "Difficult debugging

        The inference process of Make may be elegant but its trace and debug features are dating back to the Stone Age. Most clones improved on that.
        Nevertheless, when the Make tool decides to rebuild something contrary to the user expectation, most users will find the time needed to
        understand that behavior not worth the effort (unless it yields an immediate fatal error when running the result of the build, of course).
        From my experience, I noticed that Make-file authors tend to forget the following:

        How rules are preferred when more than one rule can build the same target.
        How to inhibit and when to inhibit the default built-in rules.
        How the scope rules for macros work in general and in their own build set-up.

        While not completely impossible, Make-based builds are tedious to track and debug. By way of consequence, Make-file authors will continue to
        spend too much time fixing their mistakes or, under high time pressure, they will just ignore all behavior that they don't understand."

        3. Yes, being non-recursive is an advantage. See the following article "Recursive Make Considered Harmful" by Peter Miller at
        millerp@canb.auug.org.au(http://aegis.sourceforge.net/auug97.pdf) - Australian UNIX Users Group(AUUG).

        bld handles directories with the Bld file 'R dir:regex:{cmds}' specification. Use any number of these specifications. The R indicates
        to apply the 'regex:{cmds}' recursively to sub-directories.

        4. I designed bld to take signatures of source, objects and build cmds all the time. You assert that make can also do this by adding
        dependencies to a target in a Makefile. With bld the programmer does not need to do (and remember) any extra steps.

        5. See above about signatures. With a 160 bit signature collisions are unlikely in the extreme. Also, duplicate files are not necessarily a
        problem. bld warns about them. This gives you more information. If two files are the same with the same name then maybe you want a link
        to one from the other. If two files have different names then maybe you want to change the name of one of them. In most cases I've
        run, this type of warning shows a file with several links to it. More information is better.
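        The duplicate-signature warning boils down to grouping files by digest, which can be mimicked in the shell (file names illustrative; uniq -w/-D are GNU extensions):

```shell
printf 'x' > a.h                 # a.h and b.h share content, c.h differs
printf 'x' > b.h
printf 'y' > c.h
# Sort by digest, then print only the same-digest groups (first 40 chars
# of each line are the SHA1 hex digest).
sha1sum a.h b.h c.h | sort | uniq -w40 -D
```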

        6. Again, by just setting the $opt_lib Bld file variable to "warnlibcheck" you get full library file signature checking. bld does not
        require a dependency entry for each library. And the same argument as before for source files applies - meta-data dates versus inherent
        file properties like signatures. Lots of stuff can mess with meta-data.

        7. Security is one of the bld goals. I want to take and store(in Bld.sig) the signature of anything that may be tampered with. The
        executable or library might be modified with a date unmodified. Creating a changed target file with the same signature as the unmodified
        target is difficult in the extreme. You mention stripped executables causing a rebuild. Just copy the executable elsewhere and then strip;
        not difficult.

        8. No, you don't need all those files just to compile 'a few files'. If you want to build a single target you only need the Bld file. All
        the other files and the directory structure are for building a complex multi-target project that may involve a few to hundreds to thousands of
        files. There are a few provided perl programs to aid you in doing so.

        The Notes comments:

        1. I have no comments on this one. It's not a critique of anything.

        2. The bld restriction that multi-target projects have targets that have unique names and the deposition of all targets into a single directory
        is no real difficulty at all. I designed project builds this way in order to put the targets, the target build adjunct files - bld.info,
        bld.warn and bld.fatal - and the build scripts together so as to see at a glance the status of a build. It works; see the bld versions
        of git, svn and systemd. systemd actually required relocating object files to the directory of the source and in a few cases renaming
        targets to unique names. When the install script is executed multiple targets may then be renamed to the same name in different locations
        if necessary. The fatal files are all listed together and a glance will immediately determine if any target, and which targets, failed to
        build. Likewise the warn and info files are listed together. I found this flat storage of bld results to show the project status simply
        and immediately without a lot of cd'ing. I wouldn't normally name different project targets with the same name; this seems counter-intuitive.

        3. I think you are confused about the difference between the project source directories and the project results directory. For all three
        projects that I re-built, git, svn and systemd, the downloaded source directory structure remains entirely the same. It's only the results
        files of the target builds that go into a single directory - the targets, the info, warn and fatal files, the project construction
        scripts, the README file, the list of project targets file and specialized scripts required by any of the target builds. There are naming
        conventions for everything. There is no need to restructure the source code directory in any way. Please examine the provided source
        for the git, svn and systemd projects rebuilt with bld. They remain entirely unchanged. This is required since running ./configure is
        necessary to generate source based on the system configuration and this source can be deposited anywhere in the source tree.

        4. First, I have only used/tested bld with C/C++/Objective C/Objective C++. I have never tried Java and Perl is not compiled anyway.
        Second, if any bld step generates multiple output files these are moved to the source directory from which the matched source file came.
        Nothing is lost. Additionally, the execution of '{commands}' in the Bld file DIRS section can then move these generated files to
        wherever needed. Java might be a future project action.

        Some other problems:

        a. The ability of bld to save the signatures of all source, objects, build cmds, targets and dynamically linked libraries is all that is
        necessary to manage software construction of an active ongoing project. Make most certainly does not do this without substantial difficulty.
        The web is littered with thousands of articles and 'make' how to's on avoiding and fixing make's many problems.

        b. The use of the system() perl call is in the bld code. There are two common ways to execute external cmds in perl: `` (backticks) and
        system(). I chose system(); are you suggesting some other way? I have to execute for each Bld file {} specification the enclosed cmds.
        You don't have to do anything except write a Bld file to build projects. If your objection is to some aspect of the bld code that's one thing
        - I'd be OK to listen - but the comment has nothing to do with using bld.

        c. There are other tools that do not use built in rules - see PBS on cpan.org. The bld Bld file DIRS section 'R dir:regex:{cmds}' construct
        defines where a source is to be found, which source to use and how to manipulate that source to generate the desired output files. The example
        git, svn and systemd projects illustrate how complex projects are built with relatively simple Bld(and Bld.gv) files. Writing:

        R bld.example/example/C/y : ^.*\.c$ : {ls;$CC -c $INCLUDE $s;ls;} # Note: the {} may hold any number of ';' separated cmds

        doesn't seem to me an excessive burden to compile all *.c files recursively from the bld.example/example/C/y directory.

        d. The whole purpose of using Perl was to take advantage of the power of one of the modern scripting languages. Perl is everywhere. Also,
        all of the warnings and fatals in bld give a line number in bld. The user can examine the context in the bld code. Although the source for
        make is available, I have never heard of anyone(user) actually delving into the code for any reason.

        Lastly:
        1. Make and its difficulties: The entire history of make is one of addons, bandaids, hacked versions and attempts to graft onto an inadequate
        design some feature or other to 'fix' some difficulty. The auto-generation of make files is a good example of attempts to circumvent the entire
        issue by moving the engineer's effort upstream - toward an entirely new format specification file - while preserving the downstream Makefile.
        Anyone can read the provided critiques of make and I am sure can find additional criticisms of make online or read the thousands of articles
        and how-to's on dealing with make's many problems. Signatures are clearly the way to go. They are an inherent property of the file, cheap, portable
        and easy to use and compare. There are no clock synchronization issues. The entire history of make is one of complexity and the efforts to deal
        with it.

        2. Try it!: I suspect that, in fact, you have not actually tried bld. Download bld, install the experimental.pm module, cd to the bld directory and
        execute ./bld. That's it! You should now be able to examine the output from the "Hello, World!" program. Look at the bld.info, bld.warn and
        bld.fatal(should be 0) files. This will give you an idea of the output from bld'ing any target - executable or library. The "Hello, World!"
        program has several stub routines that do nothing; they are there to show how a Bld file is constructed. The "Hello, World!" Bld file is well
        commented. Then cd to Bld.example/example. Execute './bld.example --all'. This will build 13 example targets, the source for which is in the
        bld.example/example directory. Download a release file for git, svn or systemd. Install in the bld directory. Do the same stuff e.g. cd
        to Bld.git/git-1.9.rc0 and execute './bld.git --all'. Examine all the built targets and their associated bld files.

        3. Security: The only way to maintain security for the entire bld process is to use signatures for every source, intermediate file, target and
        library. Comparison of saved Bld.sig file signatures against the build source tree will show anything that has been modified. If something
        unexpected was changed the question is why. A TODO item on my list is to write code to do Bld.sig file comparisons with the source tree and to
        write code to do this comparison for multi-target builds.

        4. The Linux kernel: I'd like to re-bld the Linux kernel. I used git, svn and systemd first because these were complex projects with many targets.
        I needed to have a standard directory structure, standard naming conventions and write bld adjunct scripts to manage these complex projects.
        The kernel is a single target, but a really complex one. I wrote extensions.pl(in the aux dir) to list all the various file extensions underneath
        a directory. When run in the kernel main directory you can get an idea of the distribution of file types in the kernel. I haven't done much else
        on kernel work, however. When done, the Bld.sig file for the kernel could then protect the integrity of its construction; a useful addition.
