PerlMonks  

Parse messy string into neat data structure

by Cody Fendant (Pilgrim)
on Aug 09, 2010 at 01:24 UTC (#853700, perlquestion)
Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

I've got to parse a string which contains variations on the following (along with some other stuff which I can ignore):

  • one filename ending in '.jpg'
  • two filenames ending in '.jpg'
  • two sections, separated by a pipe, which may contain either of the previous two variations

I need to extract those things, and I need to be sure of the order they appear in: if there's only one, it will be the large image, but if there are two, the second will be the thumbnail. It's a big ugly mess.

So far I'm doing this:

    my @images = ( [], [] );
    my ( $img_1_str, $img_2_str ) = split( '\|', $str );
    @{ @images[0] } = $img_1_str =~ m/(\w+\.jpg)/gi;
    @{ @images[1] } = $img_2_str =~ m/(\w+\.jpg)/gi;

Which I think is foolproof, but I feel it's ugly and not particularly Perlish. Any suggestions?

Re: Parse messy string into neat data structure
by ikegami (Pope) on Aug 09, 2010 at 01:33 UTC

    Wrong sigil for @images[0].
    split takes a regex. There's no reason to use a string literal.
    Needless code duplication.

    my @images = map { [ /\w+\.jpg/gi ] } split /\|/, $str;
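
    To see what that one-liner produces, here is a minimal sketch run against made-up input. The helper name parse_images and the sample string are assumptions for illustration only, not the poster's real data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above.
sub parse_images {
    my ($str) = @_;
    # One array ref per pipe-separated section, with the
    # filenames kept in the order they appear in the string.
    return map { [ /\w+\.jpg/gi ] } split /\|/, $str;
}

# Made-up input: the first section holds a large image and a thumbnail,
# the second section holds a single image.
my @images = parse_images('junk big1.jpg more thumb1.jpg | other big2.JPG stuff');

for my $i ( 0 .. $#images ) {
    print "section $i: @{ $images[$i] }\n";
}
```

    Note that, unlike the original code, this does not guarantee exactly two elements in @images: a string with no pipe yields one section, and an empty string yields none.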
Re: Parse messy string into neat data structure
by aquarium (Curate) on Aug 09, 2010 at 01:41 UTC
    I'm guessing you're parsing a MediaWiki page, in which case the built-in API is more flexible and gives better-defined (XML) responses. If so, the URL for the API is whateveryourwiki/api.php, e.g. http://en.wikipedia.org/w/api.php
    In any case, push as hard as you can for your incoming data to be better defined. Assuming an attribute for your data based on column number is precarious.
    the hardest line to type correctly is: stty erase ^H

      You're guessing wrong, or if I am parsing Wiki code I'm at the end of a long chain which starts with Wiki code and can't help it anyway.

      I'm well aware how ugly the input text is, but believe me, I can't fix that.

        So shoot me for asking.
        You're right about your code being ugly, though, as it's not a good idea to number variables from the split. An array, a hash, or even a linked list could be more appropriate, depending on what you want your "neat data structure" loaded from non-reliable input to do.
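
        As one possible reading of the hash suggestion, the sketch below labels each section's matches by position, following the original post's rule that the first file is the large image and the second, if present, is the thumbnail. The helper name label_images, the keys large and thumb, and the sample input are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash ref per pipe-separated section.
sub label_images {
    my ($str) = @_;
    my @sections;
    for my $part ( split /\|/, $str ) {
        # List assignment takes the first two matches, in order of appearance.
        my ( $large, $thumb ) = $part =~ /\w+\.jpg/gi;
        # thumb stays undef when the section holds only one filename.
        push @sections, { large => $large, thumb => $thumb };
    }
    return @sections;
}

# Made-up input for illustration.
for my $sec ( label_images('x big.jpg small.jpg | lone.jpg') ) {
    my $thumb = defined $sec->{thumb} ? $sec->{thumb} : '(none)';
    print "large=$sec->{large} thumb=$thumb\n";
}
```

        This sketch silently ignores a third or later filename in a section, and it assumes every section contains at least one match; whether either is acceptable depends on the real input.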
Re: Parse messy string into neat data structure
by Anonymous Monk on Aug 09, 2010 at 01:46 UTC
    I would write that as
    my @images = ( [], [] );
    {
        my( @imgstr ) = split /\|/, $str, 2;
        @{ $images[0] } = $imgstr[0] =~ m/(\w+\.jpg)/gi;
        @{ $images[1] } = $imgstr[1] =~ m/(\w+\.jpg)/gi;
    }
    because
    • using parens with built-in functions is ugly and extra typing
    • split takes regex as 1st argument, and you're only expecting it to return 2 parts at most
    • avoid slice syntax when you're not taking a slice ($images[0])
    • avoid pretend arrays, use real arrays (@imgstr)
    • limit scope of temporary variables
    Actually I would write
    my @images;
    {
        my @imgstr = split /\|/, $str, 2;
        push @images, [ $_ =~ m/(\w+\.jpg)/gi ] for @imgstr;
    }
    or actually
    my @images = map { [ $_ =~ m/(\w+\.jpg)/gi ] } split /\|/, $str;
    Yes, the last one is how I would normally write something like that. It's funny how digesting someone else's code plays tricks on you :)
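
    One thing worth noticing is that the final map version drops the limit of 2 that the earlier versions pass to split. A small sketch of the difference, using a made-up string with an unexpected third section:

```perl
use strict;
use warnings;

# Made-up input with more pipes than expected.
my $str = 'a.jpg|b.jpg|c.jpg';

# With a limit of 2, everything after the first pipe stays in one piece.
my @limited = split /\|/, $str, 2;    # ('a.jpg', 'b.jpg|c.jpg')

# Without a limit, every pipe starts a new section.
my @unlimited = split /\|/, $str;     # ('a.jpg', 'b.jpg', 'c.jpg')

print scalar(@limited), " parts with limit, ",
      scalar(@unlimited), " parts without\n";
```

    With the limit, the second section would collect both b.jpg and c.jpg; without it, the result has three sections. The two forms only agree when the input contains at most one pipe.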
Re: Parse messy string into neat data structure
by mykl (Monk) on Aug 09, 2010 at 12:46 UTC

    Are you sure that the filenames won't contain spaces? Or, in general, that it will match a \w+\.jpg pattern? If you are sure, fine, but I thought we ought to check that assumption.
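
    A quick way to check that assumption is to run the pattern over some awkward names. The filenames below are made up to probe the pattern; they are not taken from the real input.

```perl
use strict;
use warnings;

# Made-up filenames: one with a space, one with a hyphen, one that is all \w characters.
my $str = 'my photo.jpg  img-01.jpg  plain_name2.jpg';

# \w matches letters, digits and underscore only, so a space or hyphen
# cuts the match short rather than making it fail.
my @found = $str =~ /\w+\.jpg/gi;

print join( ', ', @found ), "\n";
```

    This prints photo.jpg, 01.jpg, plain_name2.jpg: names containing a space or a hyphen are silently truncated rather than rejected, so bad input would not be obvious from the result.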

    --

    "Any sufficiently analyzed magic is indistinguishable from science" - Agatha Heterodyne

      Yes, thanks for that. But I'm pretty confident about the filenames being reliable.
