
Deriving meaning from source code, some code is self documenting

by submersible_toaster (Chaplain)
on Dec 08, 2008 at 12:37 UTC ( #728924=perlmeditation )

Can useful documentation be derived from perl source, by source analysis alone? This would mean parsing perl, which is said to be hard.

As it turns out, PPI solves that messy perl parsing business - leaving only the challenge of deriving meaning from patterns of elements in a PPI document. Isn't that what perlcritic does?


Don't get me wrong. I like pod, and my pod is forever improving - however, when confronted with the horror of one particular brand of home grown, half baked XML class doc dangling beyond __END__ markers or lurking inside =pod/=cut, I cannot help but ponder documentation, self documentation and (save me) auto documentation.

Some months ago, in the throes of hacking on a large OO-ish perl codebase, thwarted by lack of pod and an overabundance of outdated XML fluff, I returned to the source only to find it a descent through layers and layers of subclasses of classes inheriting frameworks of further classes and subclasses.

First consider that My::Class::Foo's method 'load' is failing and the problem appears to be with $path. Seeing that the variable is set from the object method 'canonpath', we can go look for that method.

    package My::Class::Foo;
    use base 'My::Class';

    sub load {
        my ($self, $file) = @_;
        my $path = $self->canonpath( $file );
        # ... some special load
    }

    1;

Seems that canonpath must be inherited, since it is not defined in My::Class::Foo, which leads to a search that discovers ...

    package My::Class;
    use base 'More::Generic';

    sub save {
        my ($self, $file) = @_;
        my $path = $self->canonpath( $file );
        # ... some special save
    }

    1;

Nuts! No joy with My::Class either. The documentation is no help - pod missing, presumed dead - and the XML fluff only tells me what I can plainly see, which is that My::Class::Foo inherits from My::Class, which inherits from More::Generic. With heavy heart, let us open More::Generic in anticipation of an end to this pursuit.

    package More::Generic;
    use base qw( File::Spec );

    sub twiddle_thumbs {
        # etc
    }

    1;

Closer. Since the familiar File::Spec is the parent class here, we can finally resort to the mystical practice of documentation perusal: perldoc File::Spec.
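The same chase can be handed to perl itself. What follows is a minimal sketch, not the script mentioned in this post: it assumes the three example packages are declared inline (which is why @ISA is assigned directly instead of via use base), and the helper name defining_class is made up for illustration. Note that File::Spec itself inherits from a platform-specific class such as File::Spec::Unix, so the answer for canonpath may sit one level deeper than the prose above suggests.

```perl
use strict;
use warnings;
use mro;          # core since perl 5.10; provides mro::get_linear_isa
use File::Spec;

# Inline stand-ins for the three example classes.
{ package More::Generic;  our @ISA = ('File::Spec');    sub twiddle_thumbs { } }
{ package My::Class;      our @ISA = ('More::Generic'); sub save { } }
{ package My::Class::Foo; our @ISA = ('My::Class');     sub load { } }

# Hypothetical helper: name the first class in the linearised @ISA
# that defines $method itself rather than inheriting it.
sub defining_class {
    my ($class, $method) = @_;
    no strict 'refs';
    for my $candidate ( @{ mro::get_linear_isa($class) } ) {
        return $candidate if defined &{"${candidate}::${method}"};
    }
    return;
}

# The full search order perl uses for method lookup.
print join( ' -> ', @{ mro::get_linear_isa('My::Class::Foo') } ), "\n";

# Where does canonpath really come from?
print 'canonpath is defined in ',
    defining_class( 'My::Class::Foo', 'canonpath' ), "\n";
```

This only works on code that can be loaded, which is exactly what a static, source-only approach tries to avoid.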

This is a rather contrived example (and my apologies to File::Spec). My frustration at being faced with this combination of deep inheritance and missing documentation led to a short brutal script that attempted to find and chase instances of 'use base' and output short pod sections like

    =pod

    =head1 INHERITS

    =head2 L<My::Class>

    =head3 L<More::Generic>

    L<File::Spec>

    =cut
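The original script is lost, so the following is only a guess at the general shape of such a thing. It is a regex-based sketch that understands nothing beyond the simple use base forms shown above. To stay self-contained it reads source from a hash (%source_for, a stand-in for finding files on disk), and the function names are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for files on disk: package name => source text.
my %source_for = (
    'My::Class::Foo' => "package My::Class::Foo;\nuse base 'My::Class';\n1;\n",
    'My::Class'      => "package My::Class;\nuse base 'More::Generic';\n1;\n",
    'More::Generic'  => "package More::Generic;\nuse base qw( File::Spec );\n1;\n",
);

# Crude match for "use base ...;" statements. Proper perl parsing needs
# something like PPI; this copes only with quoted strings and qw().
sub parents_of {
    my ($source) = @_;
    my @parents;
    while ( $source =~ /^\s*use\s+base\s+([^;]+);/mg ) {
        my $args = $1;
        $args =~ s/\bqw\b//;                 # drop the qw keyword itself
        push @parents, $args =~ /([\w:]+)/g; # what is left are class names
    }
    return @parents;
}

# Chase each parent in turn, one pod heading level deeper per generation.
sub inherits_pod {
    my ($class, $depth, $seen) = @_;
    $depth ||= 2;
    $seen  ||= {};
    my $pod = '';
    for my $parent ( parents_of( $source_for{$class} // '' ) ) {
        next if $seen->{$parent}++;              # guard against cycles
        my $level = $depth > 4 ? 4 : $depth;     # stop nesting at =head4
        $pod .= "=head$level L<$parent>\n\n";
        $pod .= inherits_pod( $parent, $depth + 1, $seen );
    }
    return $pod;
}

print "=pod\n\n=head1 INHERITS\n\n", inherits_pod('My::Class::Foo'), "=cut\n";
```

Classes with no entry in %source_for (File::Spec here) simply end the chase.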


After some experimentation I believe that, when expressed as PPI elements, the use base qw(Classname); idiom can be matched and used to derive meaningful documentation about perl source. In fact this idiom is a special case of importing behaviour - like use POSIX qw( strftime );. To PPI, these are include statement elements with trailing data, albeit often very different trailing data with specific meaning with respect to the package being imported.
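To make "include statements with trailing data" concrete, here is a small sketch. It assumes a reasonably recent PPI from CPAN, one whose PPI::Statement::Include provides arguments and whose qw() token provides literal. The %imports structure is invented for the example and is not part of any Macropod interface.

```perl
use strict;
use warnings;
use PPI;

my $source = <<'END_PERL';
package My::Class::Foo;
use base qw( My::Class );
use POSIX qw( strftime );
use File::Spec;
END_PERL

my $doc = PPI::Document->new( \$source )
    or die "could not parse source";

# find() returns an array reference, or a false value when nothing matches.
my $includes = $doc->find('PPI::Statement::Include') || [];

my %imports;    # module name => list of import arguments
for my $inc (@$includes) {
    next unless $inc->type eq 'use';
    my @args;
    for my $token ( $inc->arguments ) {
        if ( $token->isa('PPI::Token::QuoteLike::Words') ) {
            push @args, $token->literal;    # the words inside qw( ... )
        }
        elsif ( $token->isa('PPI::Token::Quote') ) {
            push @args, $token->string;     # contents of a quoted string
        }
    }
    $imports{ $inc->module } = \@args;
}

print "$_: @{ $imports{$_} }\n" for sort keys %imports;
```

What the arguments mean still depends on the module: for base they are parent classes, for POSIX they are imported symbols, which is where per-module plugins come in.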

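In processing perl source, allow a plugin to capture the include of a chosen module and any import arguments. Add new plugins for more meaningful output.

    package DerivePod::Includes::base;
    use base qw( DerivePod::Plugin );

    sub process {
        my ($plugin, $statement, $ppi_doc) = @_;
        # Only the qw( ... ) form is handled; 'My::Class' is a different token class.
        my $words = $statement->find_first('PPI::Token::QuoteLike::Words')
            or return;
        # literal() gives the words themselves rather than the raw "qw( ... )" text
        # (assumes a PPI release that provides it).
        my ($parent_class) = $words->literal;
        $plugin->output( qq|=head1 INHERITS\n\nL<$parent_class>\n\n| );
    }

    1;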


I'm compelled to try writing a parser which turns perl source into a document describing the nature of that source code: dependencies, ancestors, imported and exported symbols. This document's interface would be used by a formatter / processor to produce or embellish existing documentation. So far I have a terrible document model, a lame parser and a missing original script. Their namespace is Macropod, as in Big and Foot.

First meditation in ages. Be kind.

I can't believe it's not psellchecked

Replies are listed 'Best First'.
Re: Deriving meaning from source code, some code is self documenting
by DStaal (Chaplain) on Dec 08, 2008 at 14:11 UTC

    You may want to add the other common idioms for making a subclass: our @ISA = qw(My::Class); and use parent qw(My::Class);. All of these are listed in the Exporter docs, as recommended.
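For reference, a sketch showing the three idioms side by side and that each ends up populating @ISA. The Via::* package names are invented. Because everything lives in one file here, use parent needs -norequire, and My::Class has to be defined before the use base line is compiled.

```perl
use strict;
use warnings;

# The parent class, defined first so that base.pm finds a non-empty package.
{ package My::Class; sub hello { return 'hello from My::Class' } }

# 1. Plain assignment to @ISA (takes effect at run time).
{ package Via::ISA; our @ISA = qw(My::Class); }

# 2. use base (takes effect at compile time).
{ package Via::Base; use base qw(My::Class); }

# 3. use parent; -norequire because My::Class is not in a file of its own.
{ package Via::Parent; use parent -norequire, 'My::Class'; }

package main;

for my $class (qw( Via::ISA Via::Base Via::Parent )) {
    no strict 'refs';
    printf "%-12s \@ISA = (%s), hello: %s\n",
        $class, join( ', ', @{"${class}::ISA"} ), $class->hello;
}
```

A tool that wants the inheritance tree from source alone would need to recognise all three forms.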

    Oh, and while you're at it... :) The other 'automatic' bit of POD I'd often like to see is a list of dependencies: Anything that is use'd in the code.
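A rough idea of what that could look like, as a regex-only sketch (PPI would be the robust route). The function names are hypothetical, and all-lower-case names are skipped on the assumption that they are pragmas.

```perl
use strict;
use warnings;

# List modules named in "use" / "require" statements, in order of appearance.
sub dependencies {
    my ($source) = @_;
    my ( %seen, @modules );
    while ( $source =~ /^\s*(?:use|require)\s+([A-Za-z_][\w:]*)/mg ) {
        my $module = $1;
        next if $module =~ /^[a-z0-9_]+$/;   # assume lower-case means pragma
        push @modules, $module unless $seen{$module}++;
    }
    return @modules;
}

# Render the list as a pod section; empty string when there is nothing to say.
sub depends_pod {
    my @modules = dependencies(@_);
    return '' unless @modules;
    return join '', "=head1 DEPENDS ON\n\n", map( "L<$_>\n\n", @modules );
}

my $example = <<'END_PERL';
use strict;
use warnings;
use POSIX qw( strftime );
use File::Spec;
require Carp;
END_PERL

print depends_pod($example);
```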

    (By my giving you more things to add into your project you can tell I think it is a good one. ;) )

      Excellent, thank you! While I'm there, why not mark up functions with prototypes, note subs resembling methods, snarf attributes. It's rich ground for 'features' :)

      I can't believe it's not psellchecked

        Ah, but at least in most of those cases the programmer would still have to write something to describe the data you are pulling out: The bare data itself isn't all that useful. (What does it mean?)

        Whereas the inheritance and dependencies are useful even (or especially) when simply listed. So at least there's a reason for drawing a line there. ;)

Re: Deriving meaning from source code, some code is self documenting
by eric256 (Parson) on Dec 08, 2008 at 15:44 UTC

    With the other ideas you have you probably want to look at making plugins to do the parsing. Then plugins for documenting different things can be added (e.g. prototypes, dependencies, inheritance, even collecting existing pod). Could make for a very useful tool.

    Eric Hodges
Re: Deriving meaning from source code, some code is self documenting
by jethro (Monsignor) on Dec 10, 2008 at 17:06 UTC

    Good idea to have a tool to extract information out of source files like the inheritance tree. It would be a useful analysis tool for bad code.

    BUT don't expect to generate *documentation*

    Documentation should provide meta-information about the code, not information that is already there. In general a list of methods by itself is not documentation, neither is an inheritance tree. And the type of information you need for a sensible POD is not extractable through automatic tools (except maybe as a skeleton for the editor)

      There is a certain level of 'boilerplate' information (let's not call it documentation) that other more strictly parsable languages tend to provide as part of code documentation. The approach described is without a doubt driven by the need to analyze bad code, for certain values of bad.

      In general a list of methods by itself is not documentation, neither is an inheritance tree

      In general - would you expect an author to document, in POD, linked references to parent classes / dependencies in a helpful =head1 INHERITS or =head1 DEPENDS ON section? This is the sort of boilerplate that is a) manual and tedious, b) comes for free with many other languages, c) prone to outdated-ness (see a).

      Generate documentation - no. Enhance existing documentation - yes. Soften the blow in the absence of ANY documentation - maybe. Provide hints to author/maintainer about the nature of a piece of code versus the nature of the associated documentation - possibly.

      The zero day script simply tacked its conclusions (as pod) onto whatever pod already existed before passing that to a final formatter, in this case HTML. I wouldn't suggest baking its output back into the original code any more than I would suggest perlcritic users insert comments in their code for every critic message.

      Perhaps consider Macropod to be a step towards a Pod::Critic which, in addition to nagging about common pod formatting mistakes, would point out that you only appear to have pod sections for functions 'foo' and 'bar', but are exporting '&zebra' which has no associated documentation.
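The 'zebra' check could be sketched roughly as below. This is an illustration, not an existing Pod::Critic: it assumes exports are declared with a literal qw() list and that a documented function has a =headN or =item line starting with its name. The function name is made up.

```perl
use strict;
use warnings;

# Return exported names that have no pod heading or item of their own.
sub undocumented_exports {
    my ($source) = @_;

    # Collect names from  @EXPORT = qw(...)  and  @EXPORT_OK = qw(...).
    my @exported;
    while ( $source =~ /\@EXPORT(?:_OK)?\s*=\s*qw\s*\(([^)]*)\)/g ) {
        for my $name ( split ' ', $1 ) {
            $name =~ s/^&//;    # '&zebra' and 'zebra' name the same sub
            push @exported, $name;
        }
    }

    # Collect the first word of every =headN / =item line.
    my %documented;
    while ( $source =~ /^=(?:head\d|item)\s+&?(\w+)/mg ) {
        $documented{$1} = 1;
    }

    return grep { !$documented{$_} } @exported;
}

my $module = <<'END_MODULE';
package Zoo;
use base 'Exporter';
our @EXPORT_OK = qw( foo bar &zebra );

=head2 foo

=head2 bar

=cut

sub foo   { }
sub bar   { }
sub zebra { }
1;
END_MODULE

print "exported but not documented: $_\n" for undocumented_exports($module);
```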

      I can't believe it's not psellchecked
Explaining source code
by szabgab (Priest) on Dec 11, 2008 at 22:21 UTC
    I am not sure how close this is to what I wanted, but I'd like to add a feature to Padre so that when an expression is selected in a Perl document the user can bring up a pop-up explaining what that expression means.

    I am mostly aiming at beginners or people with little experience who need to maintain funky code.

    So if your plans have some overlap it would be great to see this added to Padre.

      Possibly. If the idioms you're looking to explain can be expressed by pulling apart sequences of PPI tokens, the scope would be just as broad. The Schwartzian Transform, for example, could be matched. Do you have some examples for me to think about?

      I keep meaning to take a deeper look at Padre, which appears to be moving at a great pace. Were Padre to have a POD panel alongside the editor, fs tree, etc. - macropod could be used there as already described.

      I can't believe it's not psellchecked
        For a start, I'd like to be able to recognize and explain things such as $_, $_[0], $a, but of course expanding bigger expressions would also be cool.

        What is $x ||= 42; ?
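One way to answer that question with code rather than prose, as a small runnable sketch (the variable names are just for the demo, and //= needs perl 5.10 or later):

```perl
use strict;
use warnings;

my $x;            # undef, which is false
$x ||= 42;        # shorthand for: $x = $x || 42;
print "$x\n";     # $x was false, so it now holds 42

my $y = 0;        # defined, but still false
$y ||= 42;        # so ||= replaces it as well
print "$y\n";     # 42, which may not be what was intended

my $z = 0;
$z //= 42;        # defined-or: assigns only when $z is undef
print "$z\n";     # still 0
```

The difference between ||= and //= is exactly the kind of detail a beginner-facing explanation would want to surface.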

        Think about beginners, or people with little experience who are thrown in to maintain a nasty code base.

        Think about people for whom Perl is only a 3rd or 4th level tool that they hardly know, hardly use but want to fix some code.

        I am sure we'll work out the details on the Padre mailing list or IRC channel.
