Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

I've got to parse a string which contains variations on the following (along with some other stuff which I can ignore):

I need to extract those things, and I need to preserve the order they appear in, because if there's only one, it will be the large image, but if there are two, the second will be the thumbnail. It's a big ugly mess.

So far I'm doing this:

    my @images = ( [], [] );
    my ( $img_1_str, $img_2_str ) = split( '\|', $str );
    @{ @images[0] } = $img_1_str =~ m/(\w+\.jpg)/gi;
    @{ @images[1] } = $img_2_str =~ m/(\w+\.jpg)/gi;

Which I think is foolproof, but I feel it's ugly and not particularly Perlish. Any suggestions?

Re: Parse messy string into neat data structure
by ikegami (Pope) on Aug 09, 2010 at 01:33 UTC

    Wrong sigil for @images[0].
    split takes a regex. There's no reason to use a string literal.
    Needless code duplication.

    my @images = map { [ /\w+\.jpg/gi ] } split /\|/, $str;
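
    A minimal runnable sketch of the one-liner above. The value of $str is made up for illustration (the sample input from the question is not shown in this thread), so the file names and the filler text around them are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input: two '|'-separated segments, each holding .jpg names
# mixed with text we want to ignore.
my $str = 'junk big1.jpg more small1.jpg|other BIG2.JPG stuff';

# Split on a literal pipe, then collect every \w+\.jpg match in each
# segment, in order of appearance, into its own anonymous array.
my @images = map { [ /\w+\.jpg/gi ] } split /\|/, $str;

# Print one line per segment.
print join( ', ', @$_ ), "\n" for @images;
```

    With this input, @images holds two array references: the first with two names, the second with one. The match needs no capturing parens because a /g match in list context with no groups returns the whole matched strings.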
Re: Parse messy string into neat data structure
by Anonymous Monk on Aug 09, 2010 at 01:46 UTC
    I would write that as
    my @images = ( [], [] );
    {
        my( @imgstr ) = split /\|/, $str, 2;
        @{ $images[0] } = $imgstr[0] =~ m/(\w+\.jpg)/gi;
        @{ $images[1] } = $imgstr[1] =~ m/(\w+\.jpg)/gi;
    }
    • using parens with built-in functions is ugly and means extra typing
    • split takes regex as 1st argument, and you're only expecting it to return 2 parts at most
    • avoid slice syntax when you're not taking a slice ($images[0])
    • avoid pretend arrays, use real arrays (@imgstr)
    • limit scope of temporary variables
    Actually I would write
    my @images;
    {
        my @imgstr = split /\|/, $str, 2;
        push @images, [ $_ =~ m/(\w+\.jpg)/gi ] for @imgstr;
    }
    or actually
    my @images = map { [ $_ =~ m/(\w+\.jpg)/gi ] } split /\|/, $str;
    Yes, the last one is how I would normally write something like that. It's funny how digesting someone else's code plays tricks on you :)
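
    The point about split's third argument can be seen in a small sketch. The input string is hypothetical and deliberately contains one more pipe than expected.

```perl
use strict;
use warnings;

# Hypothetical input with an unexpected second pipe.
my $str = 'a.jpg b.jpg|c.jpg|d.jpg';

my @unlimited = split /\|/, $str;       # three pieces
my @limited   = split /\|/, $str, 2;    # two pieces; the rest stays in the second

print scalar(@unlimited), " vs ", scalar(@limited), "\n";
print "$limited[1]\n";
```

    With the limit, everything after the first pipe stays in the second piece, so a stray pipe cannot create a third group. Without it, as in the final map version, each extra pipe produces an extra array reference.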
Re: Parse messy string into neat data structure
by mykl (Monk) on Aug 09, 2010 at 12:46 UTC

    Are you sure that the filenames won't contain spaces? Or, in general, that it will match a \w+\.jpg pattern? If you are sure, fine, but I thought we ought to check that assumption.


    "Any sufficiently analyzed magic is indistinguishable from science" - Agatha Heterodyne

      Yes, thanks for that. But I'm pretty confident about the filenames being reliable.
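
    For anyone less sure about their filenames, here is a small sketch of what happens when the \w+\.jpg assumption does not hold. The names are hypothetical: one plain, one containing a space, one containing a hyphen.

```perl
use strict;
use warnings;

# Hypothetical segment with three differently shaped file names.
my $segment = 'cat.jpg, my cat.jpg, big-cat.jpg';

# \w matches letters, digits and underscore only, so a space or hyphen
# cuts the match short instead of making it fail.
my @found = $segment =~ /(\w+\.jpg)/gi;

print join( ' | ', @found ), "\n";
```

    All three matches come back as cat.jpg. The pattern does not fail on the odd names, it silently returns a truncated one, which is why the assumption is worth checking against real data.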
Re: Parse messy string into neat data structure
by aquarium (Curate) on Aug 09, 2010 at 01:41 UTC
    I'm guessing you're parsing MediaWiki output, in which case the built-in API is more flexible and gives better-defined (XML) responses. If this is the case, the URL for the API is whateveryourwiki/api.php, e.g.
    Or in any case, try if you can to get your incoming data better defined. Assuming an attribute for your data based on column number is precarious.
    the hardest line to type correctly is: stty erase ^H

      You're guessing wrong, or if I am parsing Wiki code I'm at the end of a long chain which starts with Wiki code and can't help it anyway.

      I'm well aware how ugly the input text is, but believe me, I can't fix that.

        So shoot me for asking.
        You're right about your code being ugly; it's not a good idea to number variables from the split. Either an array, a hash, or even a linked list result could be more appropriate, depending on what you want your "neat data structure" loaded from non-reliable input to do.
        the hardest line to type correctly is: stty erase ^H
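
        As one possible reading of the hash suggestion, here is a sketch that names the two roles instead of relying on position. It assumes the "first is the large image, second is the thumbnail" rule from the question applies within each '|'-separated segment. The input string and the keys large and thumb are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: first segment has two names, second has one.
my $str = 'x big1.jpg y small1.jpg|z big2.jpg';

my @images = map {
    # First match is taken as the large image, second (if any) as the thumbnail.
    my ( $large, $thumb ) = /\w+\.jpg/gi;
    +{ large => $large, thumb => $thumb };    # thumb is undef when only one name is found
} split /\|/, $str;

for my $img (@images) {
    printf "large=%s thumb=%s\n", $img->{large}, $img->{thumb} // '(none)';
}
```

        The defined-or operator (//) needs Perl 5.10 or later. A segment with no match at all would leave large undefined too, so real code would want to check for that case.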