PerlMonks  

Re: extracting a substring from a string - multiple variables

by graff (Chancellor)
on Oct 27, 2007 at 22:23 UTC


in reply to extracting a substring from a string - multiple variables

Is the data some sort of home-grown imitation of XML? If it was "real" XML, there wouldn't be a slash before the first close-angle-bracket. (I guess since it isn't real XML, it wouldn't help to recommend an XML parsing module.)

Do you mean something like this?

    my $string = '...blah...<file fiop="foo" length="bar"/>baz</file>...blah...';
    my ( $foo, $bar, $baz );
    if ( $string =~ s{<file fiop="([^"]+)" length="([^"]+)"/>([^<]+)</file>}{} ) {
        ( $foo, $bar, $baz ) = ( $1, $2, $3 );
        print "extracted $foo, $bar, $baz; left $string\n";
    }

Replies are listed 'Best First'.
Re^2: extracting a substring from a string - multiple variables
by walinsky (Scribe) on Oct 27, 2007 at 23:10 UTC
    Actually you hit it right on the spot; it's home-grown XML from Cupertino...
    The baz part is raw binary data, inserted in the XML; that's why I want to extract it before parsing the valid XML.
    I hadn't even noticed the close-angle-bracket (thanks - but it's really there).

    I've tried your code; but it doesn't seem to get me there.
    Any further suggestions ?
      When I run my snippet as posted, I get the following output:
      extracted foo, bar, baz; left ...blah......blah...
      Do you get something different when you run it? Or do you want something different from that?

      When you try to use the "s{...}{}" expression in your own code, is it possible that your "raw binary data" (in "the baz part") might contain a byte value of 0x3C? This would be treated as a "<" character in the regex match, which would cause trouble. Something like this might work better in that case:

      s{<file fiop="([^"]+)" length="([^"]+)"/>(.*?)</file>}{}s
      (update: added the "s" modifier at the end, in case the raw binary stuff might contain a line-feed)

      Note the question mark after ".*" -- that's the important thing that was missing from your initial attempt: it makes the wildcard match non-greedy (stops matching as soon as possible).
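To make the difference between the three variants concrete, here is a minimal sketch that runs each pattern against the same input. The sample payload, the length value, and the helper name try_extract are made up for illustration; the payload deliberately contains both a 0x3C byte and a line feed, which are the two cases discussed above.

```perl
use strict;
use warnings;

# Hypothetical sample: the payload contains a "<" byte (0x3C) and a line feed.
my $payload = "ab<cd\nef";
my $input   = qq{...blah...<file fiop="foo" length="8"/>$payload</file>...blah...};

# The three variants from the thread, precompiled so the flags travel with them.
my $negated = qr{<file fiop="([^"]+)" length="([^"]+)"/>([^<]+)</file>};
my $lazy    = qr{<file fiop="([^"]+)" length="([^"]+)"/>(.*?)</file>};
my $lazy_s  = qr{<file fiop="([^"]+)" length="([^"]+)"/>(.*?)</file>}s;

# Hypothetical helper: apply the substitution to a fresh copy of the input.
# Returns (fiop, length, payload, remainder) on success, an empty list otherwise.
sub try_extract {
    my ($re) = @_;
    my $copy = $input;
    if ( $copy =~ s/$re// ) {
        my @captures = ( $1, $2, $3 );
        return ( @captures, $copy );
    }
    return;
}

my @negated_result = try_extract($negated);   # stops at the "<" inside the payload
my @lazy_result    = try_extract($lazy);      # "." will not cross the line feed
my @lazy_s_result  = try_extract($lazy_s);    # /s lets "." match the line feed too

for my $case ( [ '[^<]+' => \@negated_result ],
               [ '.*?'   => \@lazy_result ],
               [ '.*?/s' => \@lazy_s_result ] ) {
    my ( $name, $result ) = @$case;
    printf "%-6s %s\n", $name, @$result ? "matched" : "no match";
}
```

With this input only the last variant matches. One caveat that follows from how non-greedy matching works: if the binary data could ever contain the literal byte sequence "</file>", the match would end at that first occurrence rather than at the real closing tag.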

        where do I send the flowers ;)

        the 's' modifier at the end did the trick!

        thanks for your continuous effort (and updating your comment ;)
Re^2: extracting a substring from a string - multiple variables
by duff (Parson) on Oct 29, 2007 at 13:46 UTC

    I find it funny that the right solution ("use a parser") is shot down because this isn't exactly XML. Are you sure an HTML parser wouldn't parse it properly? Weird how everyone gets stuck on regex.
