PerlMonks  

A Better Guten Split

by hacker (Priest)
on Dec 01, 2009 at 00:11 UTC (#810289=perlquestion)
hacker has asked for the wisdom of the Perl Monks concerning the following question:

I'll keep it simple: I'm trying to write a quick and dirty parser of some Project Gutenberg etexts, and ran into a puzzle.

Each of the etexts is stored in a split directory structure that models the name of the etext. For example, the HTML version of etext 12345 exists in /1/2/3/4/12345/12345-h/12345-h.htm.

Here's what I have to split that out, when given just the 12345 as the argument to my parser:

my $etext = $ARGV[0];
my $site = 'http://pod/Gutenberg';
my $splitguten = join('/', split(/ */, $etext));
my $clipguten = substr($splitguten, -2, 2, '');
my $link = "$site/$splitguten/$etext/$etext-h/$etext-h.htm";

I'm trying to find a cleaner way to do this. Any ideas or suggestions?

Re: A Better Guten Split
by ikegami (Pope) on Dec 01, 2009 at 00:35 UTC

    I guess we can start by getting rid of the useless variable ($clipguten) and shortening the silly pattern (/ */ can simply be //).

    my $split = join '/', split //, $id;
    substr($split, -2, 2, '');
    my $url = "$base_url/$split/$id/$id-h/$id-h.htm";

    Two lines to calculate and one line to assemble. I don't think length is really a problem here. We're dealing with readability issues if we try to shorten it any more. These are just too complicated:

    substr( ( my $split = join '/', split //, $id ), -2, 2, '');
    my $url = "$base_url/$split/$id/$id-h/$id-h.htm";

    ( my $url = join '/', split //, $id ) =~ s{(.*)/.}{$base_url/$1/$id/$id-h/$id-h.htm}s;

    I'm partial to this *longer* version:

    my $url = join('/', $base_url, $id =~ /(.)(?=.)/sg, $id, "$id-h", "$id-h.htm" );

    The flow is very simple, so it's easy to understand.
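
For anyone who wants to try the join-based version on its own, here is a minimal sketch that wraps it in a sub. The name guten_url and its argument order are made up for illustration and are not from the thread; the base URL is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build the URL of the HTML
# edition of an etext from a base URL and the etext id.
sub guten_url {
    my ($base_url, $id) = @_;

    # In list context with /g, the match returns every character that is
    # followed by another one, i.e. every character of $id except the last.
    my @dirs = $id =~ /(.)(?=.)/sg;

    return join '/', $base_url, @dirs, $id, "$id-h", "$id-h.htm";
}

print guten_url('http://pod/Gutenberg', '12345'), "\n";
# prints http://pod/Gutenberg/1/2/3/4/12345/12345-h/12345-h.htm
```

Note that a one-character id produces no leading directory components at all with this approach; whether that matches the layout on disk is something to check against the actual mirror.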
