A Better Guten Split

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'll keep it simple: I'm trying to write a quick and dirty parser of some Project Gutenberg etexts, and ran into a puzzle.

Each of the etexts is stored in a split directory structure that models the name of the etext. For example, the HTML version of etext 12345 exists in /1/2/3/4/12345/12345-h/12345-h.htm.
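To make the layout concrete, here is a minimal sketch of the rule just described. It assumes a purely numeric id of at least two digits. Each digit except the last becomes one directory level, followed by the full id. The sub name guten_html_path is hypothetical, and the result is a relative path with no leading slash:

```perl
use strict;
use warnings;

# Hypothetical helper: map an etext number to its relative HTML path.
# Assumes a numeric id with at least two digits, e.g. "12345".
sub guten_html_path {
    my ($id) = @_;
    # substr with length -1 drops the last digit; split // then yields
    # one directory level per remaining digit: 1, 2, 3, 4
    my @levels = split //, substr($id, 0, -1);
    return join '/', @levels, $id, "$id-h", "$id-h.htm";
}

print guten_html_path('12345'), "\n";
```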

Here's what I have to split that out, when given just the 12345 as the argument to my parser:

my $etext       = $ARGV[0];                        # e.g. 12345
my $site        = 'http://pod/Gutenberg';
my $splitguten  = join('/', split(/ */, $etext));  # "1/2/3/4/5"
my $clipguten   = substr($splitguten, -2, 2, '');  # strips "/5" from $splitguten
my $link        = "$site/$splitguten/$etext/$etext-h/$etext-h.htm";

I'm trying to find a cleaner way to do this. Any ideas or suggestions?


Re: A Better Guten Split
by ikegami (Patriarch) on Dec 01, 2009 at 00:35 UTC

I guess we can start by getting rid of the useless variable ($clipguten) and shortening the silly pattern (/ */ becomes //).

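# $id and $base_url play the roles of $etext and $site in the question.
my $split = join '/', split //, $id;   # "12345" -> "1/2/3/4/5"
substr($split, -2, 2, '');             # drop the trailing "/5" -> "1/2/3/4"
my $url = "$base_url/$split/$id/$id-h/$id-h.htm";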
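Both changes in that first line can be checked in isolation. This small script, using 12345 as an assumed sample id, shows that a pattern able to match the empty string splits between every character (so / */ and // agree for a digits-only id), and that four-argument substr edits the string in place and returns the text it removed. That return value is all a variable like $clipguten ever receives:

```perl
use strict;
use warnings;

my $id = '12345';   # sample id, assumed for illustration

# Both patterns can match the empty string, so each splits between every
# character; for an id with no spaces the results are identical.
my @spaced = split / */, $id;
my @empty  = split //,   $id;
print "@spaced | @empty\n";

# Four-argument substr modifies $split in place and returns the removed text.
my $split   = join '/', @empty;            # "1/2/3/4/5"
my $removed = substr($split, -2, 2, '');   # $split is now "1/2/3/4"
print "$split | $removed\n";
```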

Two lines to calculate and one line to assemble. I don't think length is really a problem here; shortening it any further starts to hurt readability. These, for example, are just too complicated:

substr( ( my $split = join '/', split //, $id ), -2, 2, '');
my $url = "$base_url/$split/$id/$id-h/$id-h.htm";

# The trailing .* also consumes the last digit, so nothing is left after ".htm".
( my $url = join '/', split //, $id ) =~
    s{(.*)/.*}{$base_url/$1/$id/$id-h/$id-h.htm}s;
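Written as a substitution, this only yields a clean URL if the match also consumes the final digit, because any text after the match is kept. A quick check, assuming the question's sample site as $base_url and 12345 as the id:

```perl
use strict;
use warnings;

my $base_url = 'http://pod/Gutenberg';   # sample value from the question
my $id       = '12345';

# (.*)/ stops at the last slash, so the final "5" survives after ".htm".
( my $short = join '/', split //, $id ) =~
    s{(.*)/}{$base_url/$1/$id/$id-h/$id-h.htm}s;

# Adding .* after the last slash swallows that digit as well.
( my $full = join '/', split //, $id ) =~
    s{(.*)/.*}{$base_url/$1/$id/$id-h/$id-h.htm}s;

print "$short\n";
print "$full\n";
```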

I'm partial to this *longer* version:

my $url = join('/',
    $base_url,
    $id =~ /(.)(?=.)/sg,   # every digit except the last: 1, 2, 3, 4
    $id,
    "$id-h",
    "$id-h.htm"
);

The flow is very simple, so it's easy to understand.
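The key piece is the match in list context: with /g, m// returns every capture, and (.)(?=.) captures a character only when another one follows, so the last digit is skipped. The sketch below, again assuming the question's sample site and the id 12345, shows that list and confirms the join version builds the same URL as the split/substr version:

```perl
use strict;
use warnings;

my $base_url = 'http://pod/Gutenberg';   # sample value from the question
my $id       = '12345';

# List-context m//g returns all captures; the lookahead requires a following
# character, so the final digit is never captured.
my @levels = $id =~ /(.)(?=.)/sg;
print "@levels\n";

# The join-based version.
my $url_join = join('/', $base_url, @levels, $id, "$id-h", "$id-h.htm");

# The split/substr version, for comparison.
my $split = join '/', split //, $id;
substr($split, -2, 2, '');
my $url_substr = "$base_url/$split/$id/$id-h/$id-h.htm";

print $url_join eq $url_substr ? "same\n" : "different\n";
```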
