A Better Guten Split

by hacker (Priest)
on Dec 01, 2009 at 00:11 UTC (#810289, perlquestion)
hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'll keep it simple: I'm trying to write a quick and dirty parser of some Project Gutenberg etexts, and ran into a puzzle.

Each of the etexts is stored in a split directory structure that models the name of the etext. For example, the HTML version of etext 12345 exists in /1/2/3/4/12345/12345-h/12345-h.htm.

Here's what I have to split that out, when given just the 12345 as the argument to my parser:

my $etext      = $ARGV[0];
my $site       = 'http://pod/Gutenberg';
my $splitguten = join('/', split(/ */, $etext));
my $clipguten  = substr($splitguten, -2, 2, '');
my $link       = "$site/$splitguten/$etext/$etext-h/$etext-h.htm";

I'm trying to find a cleaner way to do this. Any ideas or suggestions?
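
To see what each step of the snippet above produces, here is a minimal trace using the sample ID 12345 from the post. The sub name `guten_dir` is hypothetical, introduced only for illustration; it is a sketch of the same split/join/substr steps, not a replacement for them.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): returns the digit directory
# prefix for an etext ID, using the same steps as the snippet above.
sub guten_dir {
    my ($etext) = @_;
    my $dir = join '/', split //, $etext;   # '12345' -> '1/2/3/4/5'
    # Four-argument substr edits $dir in place and returns the removed part.
    my $removed = substr($dir, -2, 2, '');  # removes '/5', leaving '1/2/3/4'
    return ($dir, $removed);
}

my ($dir, $removed) = guten_dir('12345');
print "dir=$dir removed=$removed\n";        # dir=1/2/3/4 removed=/5
```

The value returned by substr is only the discarded tail, so the path itself never needs it.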

Re: A Better Guten Split
by ikegami (Pope) on Dec 01, 2009 at 00:35 UTC

    I guess we can start by getting rid of the useless variable ($clipguten) and shortening the silly pattern (/ */ to //).

    my $split = join '/', split //, $id;
    substr($split, -2, 2, '');
    my $url = "$base_url/$split/$id/$id-h/$id-h.htm";

    Two lines to calculate and one line to assemble. I don't think length is really a problem here. We're dealing with readability issues if we try to shorten it any more. These are just too complicated:

    substr( ( my $split = join '/', split //, $id ), -2, 2, '');
    my $url = "$base_url/$split/$id/$id-h/$id-h.htm";

    ( my $url = join '/', split //, $id ) =~ s{(.*)/.*}{$base_url/$1/$id/$id-h/$id-h.htm}s;

    I'm partial to this *longer* version:

    my $url = join('/', $base_url, $id =~ /(.)(?=.)/sg, $id, "$id-h", "$id-h.htm" );

    The flow is very simple, so it's easy to understand.
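
A sketch of the longer version wrapped in a sub, checked against the one path given in the question. The sub name `guten_url` is an assumption for illustration; only the layout for 12345 is confirmed by the thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name not from the thread) around the join-based version.
sub guten_url {
    my ($base_url, $id) = @_;
    # In list context, /(.)(?=.)/sg returns every character that is
    # followed by another one, i.e. all characters except the last.
    return join '/', $base_url, $id =~ /(.)(?=.)/sg, $id, "$id-h", "$id-h.htm";
}

print guten_url('http://pod/Gutenberg', '12345'), "\n";
# http://pod/Gutenberg/1/2/3/4/12345/12345-h/12345-h.htm
```

Because every path component is an argument to a single join, the order of the pieces in the code matches the order of the pieces in the URL.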
