PerlMonks  

Module Submission for CPAN

by tedv (Pilgrim)
on Dec 13, 2000 at 01:26 UTC
tedv has asked for the wisdom of the Perl Monks concerning the following question:

I was looking for an Anagram module on CPAN: Something that could read in a dictionary and use that to create simple anagrams of words with an anagram($word) function. I couldn't find anything, so I wrote my own module. I wanted to submit it to CPAN, but I've never done that before. My questions are two-fold:

#1: What should I do before submitting the module? I'd like to set up something that works with MakeMaker (someone suggested h2xs). What about general documentation? Clearly there is no end to the amount of support you could give a module, but what's a reasonable lower bound for supporting this hack?

#2: I've attached the module code and a short script to run it below. What is stylistically expected from a CPAN module? How could this code be cleaned up? Are there any algorithmic changes that would make the code run faster, or let it support more than one destination word? (If you're curious, the code parses a dictionary into a standard tree format and then, when anagramming a word, walks through the tree on a prefix-by-prefix basis. A few of my variable names suck. Let me know if you have better names than $sub_table and $sub_prefix.)

anagram:
    #!/usr/bin/perl -w
    use strict;
    use Anagram;

    while (<>) {
        chomp;
        my $word = $_;
        s/\W//g;
        my $base = lc;
        my @words = grep { $base ne $_ } Anagram::anagram($base);
        next unless @words;
        print "\nAnagrams of $_:\n";
        print " $_\n" foreach @words;
    }

Anagram.pm:
    use strict;
    use Carp;

    package Anagram;

    @Anagram::EXPORT = qw(anagram is_word starts_word);

    BEGIN {
        $Anagram::version = "1.0";
        $Anagram::dict = {};

        open DICT, "</usr/dict/words"
            or croak("Could not open dictionary: $!");
        foreach (<DICT>) {
            s/\W//g;
            my @letters = split //, lc;

            # Insert word into sorted dictionary tree
            my $sub_table = $Anagram::dict;
            foreach (@letters) {
                # Create sub-table if it's missing and then recurse
                $sub_table->{$_} = {} if ref($sub_table->{$_}) ne 'HASH';
                $sub_table = $sub_table->{$_};
            }

            # Flag this entry as a valid word
            $sub_table->{word} = 1;
        }
        close DICT;
    }

    sub version {
        my $version = shift;
        warn "Version $version is later than $Anagram::version."
            if defined($version) && ($version > $Anagram::version);
        return $Anagram::version;
    }

    sub anagram {
        my $word = shift;

        # Create table of letters.  This is better than a pure table
        # because we avoid extra work for duplicated letters-- we can
        # anagram "aaaaa" in one cycle instead of 5! = 120 cycles.
        my $letters = {};
        foreach (split //, $word) {
            $letters->{$_}++;
        }

        return permute_recurse($letters, "");
    }

    sub permute_recurse {
        my ($letters, $prefix) = @_;
        my @words = ();

        foreach (keys %$letters) {
            # Test if any words start with this new prefix
            my $sub_prefix = $prefix.$_;
            my $sub_table = starts_word($sub_prefix);
            next unless $sub_table;

            # Use this letter
            if ($letters->{$_} <= 1) { delete $letters->{$_}; }
            else { $letters->{$_}--; }

            # Test for recurse case
            if (scalar keys %$letters) {
                push @words, permute_recurse($letters, $sub_prefix);
            }
            # Test for base case
            elsif ($sub_table->{word}) {
                push @words, $sub_prefix;
            }

            # Restore letter
            $letters->{$_}++;
        }

        return @words;
    }

    sub is_word {
        my $sub_table = walk_dict(@_) or return undef;
        return $sub_table->{word};
    }

    # Returns a sub-table entry on true and undef on failure
    sub starts_word {
        my ($word) = @_;

        my $sub_table = $Anagram::dict;
        foreach (split //, $word) {
            $sub_table = $sub_table->{$_};
            return undef unless ref($sub_table) eq 'HASH';
        }

        return $sub_table;
    }

    1;

-Ted

Re: Module Submission for CPAN
by dchetlin (Friar) on Dec 13, 2000 at 01:57 UTC
    This is good code and will fill a hole on the CPAN. I was looking for something similar several months ago and came up with nothing.

    I have a couple of suggestions that I hope will be helpful.

    • Firstly, I believe there is an alternative data structure that will really speed up your lookups. Here is my suggested algorithm for building it:

      s/\W//g, push @{$dict{join "", sort split //, lc}}, $_ for <DICT>;

      In other words, sort each word and store it in its canonical form in the lookup table, so all anagrams are stored together. Then, when you want to check for anagrams of a word, you just go straight to its canonical key in the hash.

      This might make your starts_word slower/less clear, but I'm not sure. Of course, starts_word won't be necessary for anagram lookups anymore, but it would be a useful standalone function.

    • You'll probably want to do something more robust than assume /usr/dict/words. There are MakeMaker settings to play around with, but I imagine it will be a pain. Somehow, though, you'll have to at least provide a way for the user to specify where their dictionary is.

    • You probably shouldn't use a global for the lookup table.

    • You don't have to put that initialization code in a BEGIN block.

    • $VERSION is almost always in all caps.

    • Am I missing something, or does walk_dict() not seem to exist?
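
    To make the first suggestion concrete, here is a minimal sketch of the sorted-letter index. It is not the module's actual code: the names anagram_key, build_index and anagrams_of are made up for illustration, and the word list is inline sample data rather than a dictionary file. It also keeps the table in a lexical hash reference instead of a package global.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the canonical key is the word's letters,
# lowercased, stripped of non-word characters, and sorted.
sub anagram_key {
    my ($word) = @_;
    (my $clean = lc $word) =~ s/\W//g;
    return join '', sort split //, $clean;
}

# Build the index from any list of words.  Returns a hash ref
# (key => [words]) rather than filling a package global.
sub build_index {
    my (@words) = @_;
    my %index;
    for my $word (@words) {
        my $key = anagram_key($word);
        next unless length $key;
        push @{ $index{$key} }, $word;
    }
    return \%index;
}

# Every stored word sharing the canonical key, minus the word itself.
sub anagrams_of {
    my ($index, $word) = @_;
    my $key = anagram_key($word);
    return grep { lc $_ ne lc $word } @{ $index->{$key} || [] };
}

# Inline sample data; a real module would read the file the caller names.
my $index = build_index(qw(listen silent enlist tinsel google));
my @found = anagrams_of($index, 'listen');
print join(' ', sort { $a cmp $b } @found), "\n";
```

    Each lookup is then a single hash access, at the cost of losing the prefix walk that starts_word relies on.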

    -dlc

Re: Module Submission for CPAN
by chipmunk (Parson) on Dec 13, 2000 at 02:03 UTC
    Here's what I like to see in a module on CPAN:
    • An easy installation procedure, preferably the standard `perl Makefile.PL; make; make test; make install`. make test should test the features of the module.
    • Documentation, including a README file and standard POD in the module itself.
    • Configurability, as appropriate. Your module, for example, depends on /usr/dict/words. The person using your module may need to or want to specify an alternate file to use. (*)
    One comment specifically about your module... I notice that you set @Anagram::EXPORT, but you don't have an import() sub. You might want to change @EXPORT to @EXPORT_OK and inherit Exporter's import() method.

    (*) On some systems, /usr/dict/words includes Unix terms and includes only root forms of regular words, which is not good for word puzzles.
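
    A rough sketch of how the documentation, configurability and @EXPORT_OK points might look together in one file. The package name Anagram::Sketch and the function load_words are hypothetical, chosen so as not to imply this is the real module, and `use Exporter qw(import)` assumes a reasonably recent Exporter (inheriting via @ISA works too).

```perl
package Anagram::Sketch;          # hypothetical name, for illustration only
use strict;
use warnings;
use Carp qw(croak);
use Exporter qw(import);          # borrow Exporter's import() method

our $VERSION   = '0.01';
our @EXPORT_OK = qw(load_words);  # nothing is exported unless asked for

=head1 NAME

Anagram::Sketch - illustration of a documented, configurable module

=head1 SYNOPSIS

    use Anagram::Sketch qw(load_words);
    my @words = load_words('/path/to/wordlist');

=head1 FUNCTIONS

=head2 load_words($path)

Returns the lowercased words found in C<$path>, one per input line.
Croaks if no path is given or the file cannot be opened.

=cut

sub load_words {
    my ($path) = @_;
    croak "No dictionary path given" unless defined $path;
    open my $fh, '<', $path
        or croak "Could not open dictionary $path: $!";
    my @words;
    while (my $line = <$fh>) {
        $line =~ s/\W//g;                   # same cleanup as the original module
        push @words, lc $line if length $line;
    }
    close $fh;
    return @words;
}

1;
```

    Because the caller passes the path, nothing is read at compile time and no particular location such as /usr/dict/words is assumed.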

      To be honest, I don't understand a lot of how module exporting and importing works-- I just know use and require. Is there a brief tutorial on this exporting stuff, or is it explainable in a paragraph or so? I've read through the Camel Book's section on exporting, but I just got a syntactical feel, not a feel for the purpose behind it or how it's best used.

      -Ted
        Read the Exporter manpage... here's a simple example:
            package Ctweten::Export;
            use Exporter;
            our @ISA = qw(Exporter);         # we now inherit from Exporter, namely the import() method
            our @EXPORT_OK = qw(this that);  # you must ask for these: use Ctweten::Export 'this';
            our @EXPORT = qw(other);         # exported by default
            sub this { print "Hello\n" }
            *that = *other = *this;

        In a program, if you want to use this() and other(), you do the following:

            #!/usr/local/bin/perl -w
            use strict;
            $|++;
            # An explicit import list replaces the defaults, so ask for
            # :DEFAULT as well to keep other().
            use Ctweten::Export qw(:DEFAULT this);
            other;
            this;
            that; # this fails under these circumstances: that() was never imported.
        --
        Casey
           I am a superhero.
        
        Here's a quick, off-the-cuff explanation of exporting/importing:

        First, exporting and importing are the same thing; like emigrating/immigrating, it just depends which side you're looking from. :)

        Exporting/importing is simply the process of making a link from a symbol (or part of a symbol) in one package to a symbol in another package, so that the symbol can be accessed from either package. For example, if you import CPAN's install method, then CPAN exports it (by using Exporter) with this assignment: *{"${callpkg}::$sym"} = \&{"${pkg}::$sym"}. That actually makes an assignment to the symbol table for the calling package. Assuming $callpkg is main, now main::install and CPAN::install refer to the same subroutine. The \& on the RHS of the assignment means that only the subroutine part of the CPAN::install symbol was duplicated; the scalar, array, hash, and other parts were not. Exporter does separate assignments for each data type, so that only what was asked for is imported.

        The reason for exporting/importing is simply to make accessing symbols in other packages easier, by bringing them into your package. Someone using the functional interface to CGI, for example, would have to write out CGI::header(), CGI::start_html(), CGI::param(), etc., for every function, if they couldn't import the methods.
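
        The assignment described above can be tried out with a hand-written import(). This is only a sketch of the idea, not how Exporter itself is organized; the package My::Greeter and its hello() function are hypothetical.

```perl
use strict;
use warnings;

package My::Greeter;            # hypothetical package, for illustration only

sub hello { return "Hello from " . __PACKAGE__ }

# A hand-written import(): "use My::Greeter qw(hello)" would call this
# automatically; below it is called explicitly.
sub import {
    my ($pkg, @symbols) = @_;
    my $callpkg = caller;
    no strict 'refs';           # symbolic references reach the symbol table
    for my $sym (@symbols) {
        # Alias only the subroutine slot of the caller's glob.
        *{"${callpkg}::$sym"} = \&{"${pkg}::$sym"};
    }
}

package main;

My::Greeter->import('hello');   # what "use" would do at compile time
my $message = hello();          # main::hello is now the same sub
print "$message\n";
```

        Both names point at one subroutine, so hello() still reports My::Greeter as its package even when called from main.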

Re: Module Submission for CPAN
by cwest (Friar) on Dec 13, 2000 at 02:09 UTC
    Use h2xs ( h2xs -XAn Anagram -v 1.00 ) and put your module into the distro it creates. Then:

    • read perlmod
    • read perlmodlib
    • document your module!
    • write test cases for make test!
    • read docs at the CPAN on how to submit modules.
    • submit and follow the suggestions of Andreas and the modules@perl.org friends.
    All of dchetlin's ideas sound good.
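
    As a rough idea of what a test case for make test could look like, here is a sketch of a t/anagram.t file using Test::More. The anagram_key function is a stand-in defined inline so the sketch runs on its own; a real test would load the module instead.

```perl
# t/anagram.t -- sketch of a test script run by "make test"
use strict;
use warnings;
use Test::More tests => 3;

# Stand-in for the code under test (an assumption for this sketch);
# a real test file would say something like: use Anagram qw(anagram);
sub anagram_key {
    my ($word) = @_;
    return join '', sort split //, lc $word;
}

is( anagram_key('Listen'), 'eilnst', 'letters are lowercased and sorted' );
is( anagram_key('silent'), anagram_key('listen'), 'anagrams share a key' );
isnt( anagram_key('google'), anagram_key('listen'), 'non-anagrams differ' );
```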

    --
    Casey
       I am a superhero.
    
