PodHead Census

by Intrepid (Deacon)
on Dec 12, 2003 at 11:02 UTC ( #314271=perlmeditation )

I recently wrote a new module. When the time came to bite the bullet and give the carefully crafted code some documentation, I started asking myself how to do it. I'd written some of the module before running the module package creation tool h2xs, so I didn't have the boilerplate entries already in place. What is properly supposed to be in a module's POD? I started looking at some semi-random modules installed on my system and discovered a lot of variation in what the authors chose to use as head[ing]1 labels, these being one of the basic elements that give POD documentation its structure. My curiosity was piqued further.


A simple way to address this question is to just run h2xs to make a "fake" module setup. Doing so (with Perl-5.8.1 as the base release) yields this list of head1 labels in the generated module file "skeleton":
 
    =head1 NAME
    =head1 SYNOPSIS
    =head1 DESCRIPTION
    =head1 SEE ALSO
    =head1 AUTHOR
    =head1 COPYRIGHT AND LICENSE

If these six labels are present in a large percentage of modules' documentation, we might infer that many module authors are using the h2xs tool and leaving the POD pretty much as is -- "filling in the blanks".
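One way to test that inference on a single file is to check whether its POD carries all six skeleton headings. What follows is only a minimal sketch: `has_h2xs_skeleton` is a hypothetical helper name, not part of h2xs or any module, and it looks at =head1 lines only.

```perl
use strict;
use warnings;

# The six =head1 labels that h2xs (Perl 5.8.1) writes into a new module.
my @H2XS_HEADS = (
    'NAME', 'SYNOPSIS', 'DESCRIPTION',
    'SEE ALSO', 'AUTHOR', 'COPYRIGHT AND LICENSE',
);

# Hypothetical helper: true if every skeleton label appears as a =head1
# line somewhere in the given POD text.
sub has_h2xs_skeleton {
    my ($pod_text) = @_;
    my %seen;
    $seen{$1} = 1 while $pod_text =~ /^=head1[ \t]+(.*?)[ \t]*$/mg;
    return !grep { !$seen{$_} } @H2XS_HEADS;
}

my $skeleton = join "\n\n", map { "=head1 $_" } @H2XS_HEADS;
print has_h2xs_skeleton($skeleton)       ? "skeleton\n" : "customized\n";
print has_h2xs_skeleton("=head1 NAME\n") ? "skeleton\n" : "customized\n";
```

A file that passes this check has not necessarily been left untouched, of course; it only kept (or re-created) the same six labels.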


I decided that I wanted to find out more. Reading page after page of POD is way too much work for a lazy programmer like me, so I hacked together a little Perl to let me in on the bare stats without laborious manual labor. As textual analysis tools go this is no great shakes; in fact it could be seriously flawed. But anyway, what it revealed was interesting ...


Update:

Minor update to this node was done on 11 Jan 2004

    #!/usr/bin/env perl

    =head1 NAME

    PodHeadCensus - Survey the POD headings in perl POD-bearing files.

    =cut

    # This program Last modified: 12 Dec 2003 03:45:52
    # Canonical code location: http://intrepid.perlmonk.org/scriptscode/PodHeadCensus
    # $CVSHeader$

    BEGIN {
        use strict;
        $/ = "\n"; $| = 1;
        use vars qw/$v $vv/;
        if    ($ARGV[0] eq q/-v/) { (shift @ARGV) and $v=1 }
        elsif ($ARGV[0] eq q/-V/) { (shift @ARGV) and $v=1 and $vv=1 }
        $^W = 1;
    }

    use File::Find;
    use Tie::IxHash;
    use Pod::Usage;
    use subs qw! lead_section close_section draw_border !;

    my $LL = $ENV{'COLUMNS'} || 80;
    my ($TCs, $NoPOD,$HadPOD,$_75p,$_25p,$_05p) = (1, 0,0,0,0,0);
    my (%Aseen,%c_POD);
    my $ixh  = tie(%Aseen, q/Tie::IxHash/);
    my $term = [ &termcodes_set ];  # Our esc seqs are going to be in here?
    my ($re, $bd, $us, $ue) = @$term;
    unless ( 4 == grep { $_ } @$term ) {
        $TCs= 0;
        $re = "\e[0m";    # ANSI
        $bd = "\e[0;1m";  # ANSI
        $us = '';         # dunno
        $ue = '';         # dunno
    }

    my $argN = $ARGV[$#ARGV];
    if (! $argN) {
        pod2usage(1)
    } elsif (grep /^$argN$/i, qw% -h -u %, qw% --man --help --usage %) {
        pod2usage(-verbose => 2)
    } elsif (-d $argN) {
        $argN = pop @ARGV
    } else {
        $argN = undef
    }

    if ($argN) {
        my $only_allow = sub {
            my $str = shift(@_);
            my @pu = $str=~/[[:punct:]]/g;
            if ($str =~ m/[[:cntrl:]]/) {
                return 0
            } elsif(! grep{ m/[\Q{}&*:!`|><\E]/ }@pu ) {
                # `filter the names, as a small precautionary measure.
                return 1;
            }
            return 0;
        };
        File::Find::find(
            sub {
                my $fful_n = $File::Find::name;
                my $fdir_n = $File::Find::dir ;
                return if /^\.{1,2}$/;  # reject . and ..
                push @ARGV, $fful_n if (
                    $fdir_n !~ m%/pod$% and  # Not from Perl dist "pod/" dir,
                    &$only_allow($_)    and  # no funny business,
                    /\.p(?:m|od)$/      and  # POD-bearing type extensions,
                    -f $_                    # we will only open real files, thank-you.
                );
            }
          , $argN
        );
    }

    # Now, having "manually" or "quasi-meta-auto-magically" populated
    # @ARGV, we scan through the files looking for '=head1' lines.

    for my $podfile (@ARGV) {
        lead_section $podfile if $vv;
        open PPod, $podfile
            or die "Failed open() on \"$podfile\", maybe no rights?:\n $!";
        while(<PPod>) {
            if ( m%^=head(?:1|2)\s+([A-Z][-\s_A-Z]+[A-Z])\s*$% ) {
                print "${us}$_${ue}" if $vv;
                $Aseen{"$1"}++ if $1;
                exists $c_POD{"$podfile"} || $c_POD{"$podfile"}++;
            }
        }
        close PPod;
        close_section if $vv;
    }

    for my $n (keys %c_POD) {$HadPOD++ if $c_POD{"$n"};}
    $NoPOD = @ARGV - $HadPOD;
    $_75p = sprintf("%u", .75 * $HadPOD);
    $_25p = sprintf("%u", .25 * $HadPOD);
    $_05p = sprintf("%u", .05 * $HadPOD);

    print join ("\n", map {
            sprintf "%-65s seen ${bd}%3u${re} times", $_,$Aseen{$_}
          } keys(%Aseen) ) if $v ;

    print draw_border
      ,"Number of files examined (${bd}".$NoPOD."${re} had no POD): "
      . ${bd} . @ARGV . ${re}
      ,draw_border;

    print draw_border
      ,"These were the headings seen in at least 75% of the cases:"
      ,draw_border;
    print map { " ${bd}$_${re}\n" if $Aseen{"$_"} >= $_75p } keys(%Aseen);

    print draw_border
      ,"These were the headings seen in at least 25% of the cases:"
      ,draw_border;
    print map { " ${bd}$_${re}\n" if $Aseen{"$_"} >= $_25p } keys(%Aseen);

    print draw_border
      ,"These were the headings seen in at least 5% of the cases:"
      ,draw_border;
    print map { " ${bd}$_${re}\n" if $Aseen{"$_"} >= $_05p } keys(%Aseen);

    exit 0;

    sub termcodes_set {
        eval {
            require Term::Cap;
            require POSIX;
        } or do {
            warn "Cannot import POSIX or Term::Cap\n" unless $^O =~/M?S?Win/;
            return undef;
        };
        my $termios = POSIX::Termios->new();
        if (not $termios) { die "Badly!\n$!" }
        $termios->getattr;
        # This next defaults to $ENV{'TERM'}, if not set, we'll croak!
        my $ts = Tgetent Term::Cap { TERM => undef, OSPEED => $termios->getospeed };
        my $sure = eval { $ts->Trequire( qw/me md us ue/ ) } ;
        if ($@) { return undef }
        else {
            return map { $ts->Tputs($_,1) } qw/me md us ue/;
        }
        # From curses termcap/terminfo (5) manpage:
        #------------------------------------------------------------------
        # md = extra bold mode          ue = exit underline mode
        # mh = dim mode                 us = enter underline mode
        #         *** me = turn off all attributes ***
        # so = enter standout mode      se = exit standout mode
        # sp = set curr color pair to #1
        # op = set curr color pair back to original pair
        # Sf = set foregrnd color #1    Sb = set backgnd color #1
        # AF = set ANSI fg color        AB = set ANSI bg color
        #------------------------------------------------------------------
    }

    sub lead_section {
        printf("%-${LL}s", ' ---"'.shift(@_).'"---');
    }

    sub close_section {
        print "\n", ('=' x ($LL-1)), "\n";
    }

    sub draw_border {
        return "\n" if $TCs;
        sprintf "\n%s\n", '-' x ($LL-1);
    }

    __END__

    --- 8< --- SNIP! --- 8< --- cut here --- 8< ---
    The POD for this script, with its line lengths longer than advisable
    to post as "code" on Perlmonks (IMHO), is available at
    http://intrepid.perlmonk.org/scriptspod/PodHeadCensus.pod
    The .pod file can be inserted one blank line after the __END__ marker
    which terminates the code above. It is recommended to run this program
    with its POD in place.
    --- 8< --- SNIP! --- 8< --- cut here --- 8< ---

Update: until further notice the links just below are offline. Sorry!

Links provided for reader convenience:

  • This script's documentation as POD: PodHeadCensus.pod
  • This script as syntax-highlighted stand-alone code, no POD: PodHeadCensus.html
  • This script's "production copy" with inline POD and cryptographically signed: PodHeadCensus
  • links last checked: 11 Jan 2004
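The heading test in the script's main loop decides what gets counted at all, so it is worth seeing which lines it accepts. The sketch below reuses the same pattern (written with `/` delimiters instead of `%`); `counted_label` is a hypothetical wrapper added only for illustration.

```perl
use strict;
use warnings;

# Same pattern as in the script's while loop, only the delimiters differ.
my $HEAD_RE = qr/^=head(?:1|2)\s+([A-Z][-\s_A-Z]+[A-Z])\s*$/;

# Hypothetical wrapper: return the captured label, or undef if the
# line would not be counted by the script.
sub counted_label {
    my ($line) = @_;
    return $line =~ $HEAD_RE ? $1 : undef;
}

for my $line ("=head1 SEE ALSO\n", "=head2 METHODS\n",
              "=head1 Utility Changes\n", "=head1 new()\n",
              "=head3 NOTES\n") {
    my $label = counted_label($line);
    chomp(my $shown = $line);
    printf "%-26s %s\n", $shown,
        defined $label ? "counted as '$label'" : 'skipped';
}
```

In other words, only all-uppercase labels of three or more characters are tallied, =head2 lines count alongside =head1 lines, and mixed-case headings such as "Utility Changes" are ignored.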

The Results

Perhaps unsurprisingly, the very preliminary data I've seen so far suggests that Perl authors are a highly individualistic bunch who take guidelines and recommendations as only that. Quite a high percentage of the scanned documentation lacked several of the "labels" output by the h2xs utility. On the other hand, many, many unique labels were detected. I ran the trial on the several Perl releases that I have available, and these are some representative results:

$ PodHeadCensus /opt/perl/lib/5.8.1/

Number of files examined (25 had no POD): 355

These were the headings seen in at least 75% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION

These were the headings seen in at least 25% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION
    SEE ALSO
    AUTHOR

These were the headings seen in at least 5% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION
    SEE ALSO
    EXAMPLES
    CAVEATS
    DIAGNOSTICS
    AUTHOR
    NOTES
    AUTHORS
    METHODS
    BUGS
    COPYRIGHT
    NOTE

$ PodHeadCensus /usr/share/perl

(Perl-5.6.1 on Debian)
Number of files examined (33 had no POD): 214

These were the headings seen in at least 75% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION

These were the headings seen in at least 25% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION
    SEE ALSO
    AUTHOR

These were the headings seen in at least 5% of the cases:
    NAME
    SYNOPSIS
    DESCRIPTION
    SEE ALSO
    DIAGNOSTICS
    EXAMPLES
    AUTHORS
    BUGS
    AUTHOR
    COPYRIGHT
    NOTE
    METHODS

There have undoubtedly been interesting discussions about creating POD in the past here on perlmonks nodes; I will likely be downvoted for not referencing them ;-). OTOH I have made what I hope is an original (if small) contribution by providing a tool to use in "POD-head census-taking" on various Perl installations.

    Soren/somian/Intrepid

-- 
2003: The 3 least meaningful terms in Online jargon:
  troll   flame   rant
These used to mean something -- but then they were highjacked by
inferior intellects who, when faced with a more erudite opponent
employing superior arguments, abuse them as merely another form of
name-calling. ;-)

Re: PodHead Census
by Abigail-II (Bishop) on Dec 12, 2003 at 11:48 UTC
    For some more statistics, I counted the headers in my /usr/share/man directory. Here are the results:
          #      %  Header
          1: 81.96  NAME
          2: 79.75  DESCRIPTION
          3: 74.54  SYNOPSIS
          4: 60.26  SEE ALSO
          5: 25.86  STANDARD OPTIONS
          6: 24.25  KEYWORDS
          7: 22.32  AUTHOR
          8: 19.04  ARGUMENTS
          9: 15.88  OPTIONS
         10: 15.74  BUGS
         11: 11.62  CONFORMING TO
         12: 11.60  FILES
         13: 11.48  RETURN VALUE
         14:  9.58  ERRORS
         15:  8.25  NOTES
         16:  5.66  COPYRIGHT
         17:  4.80  ACKNOWLEDGEMENTS
         18:  4.61  AUTHORS
         19:  4.06  EXAMPLES
         20:  3.61  ENVIRONMENT
         21:  3.26  REPORTING BUGS
         22:  2.23  DIAGNOSTICS
         23:  2.14  EXAMPLE
         24:  2.09  HISTORY
         25:  2.02  INTRODUCTION
         26:  1.85  VERSION
         27:  1.21  CONFIGURATION
         28:  1.19  AVAILABILITY
         29:  1.16  WIDGET-SPECIFIC OPTIONS
         30:  1.14  NOTE

        Total files: 4207.
Re: PodHead Census
by Anonymous Monk on Dec 12, 2003 at 11:48 UTC
    NAME, SYNOPSIS and DESCRIPTION are pretty much standard, as is some form of AUTHOR/COPYRIGHT/LICENSE. There are really no absolutes, unless of course you're specifically submitting scripts for the CPAN or using Pod::Usage or something.
Re: PodHead Census
by rnahi (Curate) on Dec 12, 2003 at 12:23 UTC

    A quick and dirty counter. Change your starting directory accordingly.

    $ find /usr/local/lib/perl5/5.8.1/ -name "*.pm" -or -name "*.pod" | \
        xargs perl -lne 'print if s/^=head1 //' | \
        perl -lne '$ndx{$_}++;END{for(sort{$ndx{$b}<=>$ndx{$a}} keys %ndx) {exit if$count++>30;print"$_ => $ndx{$_}"}}'

    resulting in

    NAME => 453
    DESCRIPTION => 436
    SYNOPSIS => 333
    SEE ALSO => 211
    AUTHOR => 200
    BUGS => 74
    COPYRIGHT => 51
    AUTHORS => 46
    EXAMPLES => 42
    HISTORY => 29
    NOTES => 29
    METHODS => 26
    CAVEATS => 23
    NOTE => 19
    DIAGNOSTICS => 18
    AUTHOR AND COPYRIGHT => 14
    ABSTRACT => 14
    EXAMPLE => 13
    ENVIRONMENT => 13
    COPYRIGHT AND LICENSE => 12
    CONSTRUCTOR => 12
    LICENSE => 12
    FUNCTIONS => 11
    OPTIONS => 9
    Utility Changes => 9
    WARNING => 8
    Modules and Pragmata => 8
    Incompatible Changes => 8
    Reporting Bugs => 8
    EXPORTS => 8
    COPYRIGHT AND DISCLAIMERS => 7
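
    For systems without find and xargs, roughly the same count can be done in Perl alone with File::Find. This is only a sketch of an equivalent, not the poster's code: `head1_counts` is a hypothetical name, and the alphabetical tie-break in the sort is an addition the one-liner does not have.

```perl
use strict;
use warnings;
use File::Find;

# Count =head1 labels in every .pm/.pod file below $dir.
sub head1_counts {
    my ($dir) = @_;
    my %count;
    find({ no_chdir => 1, wanted => sub {
        return unless /\.(?:pm|pod)\z/ && -f $_;
        open my $fh, '<', $_ or return;    # skip unreadable files
        while (my $line = <$fh>) {
            chomp $line;
            $count{$1}++ if $line =~ /^=head1 (.*)/;
        }
        close $fh;
    } }, $dir);
    return \%count;
}

# Pass one or more starting directories on the command line.
for my $dir (@ARGV) {
    my $count = head1_counts($dir);
    my @top = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    splice @top, 31 if @top > 31;    # the one-liner prints at most 31 labels
    print "$_ => $count->{$_}\n" for @top;
}
```

    Unlike PodHeadCensus, both this and the pipeline count mixed-case labels and ignore =head2 lines, which is why entries such as "Utility Changes" show up in the list.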
    
Re: PodHead Census
by merlyn (Sage) on Dec 12, 2003 at 15:31 UTC
    You can't just make the rubrics up. One primary target of POD is the manpages, and the manpages have specific heads for specific purposes. So things like "name" and "synopsis" are dictated by longstanding convention created two decades before Perl, and it would be unwise to ignore such a relationship. See "man 7 man" for details.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      See "man 7 man" for details.

      Thanks. (Sadly, Mac OS X seems to omit this manpage, but I found it online.)

      The section headings they refer to as "traditional" are NAME, SYNOPSIS, DESCRIPTION, OPTIONS, FILES, SEE ALSO, DIAGNOSTICS, BUGS, and AUTHOR. (The only required one is NAME, which must follow the "name - brief description" format.)

      A quick comparison shows that most of those headings are among the most common in module POD documentation.
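
      That comparison can be spelled out against the =head1 counts rnahi posted earlier in this thread for a 5.8.1 tree. This is just one possible cross-check, and the choice of rnahi's top-31 listing as the reference is an assumption made for the sketch:

```perl
use strict;
use warnings;

# Headings that "man 7 man" calls traditional, as listed above.
my @traditional = ('NAME', 'SYNOPSIS', 'DESCRIPTION', 'OPTIONS', 'FILES',
                   'SEE ALSO', 'DIAGNOSTICS', 'BUGS', 'AUTHOR');

# =head1 counts for those labels, copied from rnahi's listing in this
# thread; labels that did not make that top-31 listing are absent here.
my %pod_count = (
    'NAME'        => 453, 'DESCRIPTION' => 436, 'SYNOPSIS' => 333,
    'SEE ALSO'    => 211, 'AUTHOR'      => 200, 'BUGS'     => 74,
    'DIAGNOSTICS' => 18,  'OPTIONS'     => 9,
);

for my $head (@traditional) {
    printf "%-12s %s\n", $head,
        exists $pod_count{$head} ? $pod_count{$head} : 'not in the top 31';
}
```

      Eight of the nine traditional headings appear in that listing; only FILES is missing.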

Re: PodHead Census
by theorbtwo (Prior) on Dec 12, 2003 at 19:31 UTC

    Vaguely related question... I just uploaded my first module, Sort::Merge, to CPAN. Now, look at the list you get from that link... and note that the other modules have short descriptions, while mine does not? If you click through to the distribution of mine, it has the short description there... anybody know why, and what I can do about it?

    Update: It seems to be working now, though I haven't changed anything. Thanks, all, and whoever or whatever fixed it.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      I just uploaded my first module, Sort::Merge, to CPAN. Now, look at the list you get from that link... and note that the other modules have short descriptions, while mine does not?

      Hmm -- clicking on that link, I see the description "general merge sort" right below it. Am I missing something, or do you see something different?
