 
PerlMonks  

Re^3: Regular expressions: Extracting certain text from a line

by kcott (Archbishop)
on Apr 08, 2014 at 08:41 UTC


in reply to Re^2: Regular expressions: Extracting certain text from a line
in thread Regular expressions: Extracting certain text from a line

Grrr! I wish they wouldn't do that.

Anticipating more ante upping, with deeply nested brace/bracket combos and wanting to capture a nested (but not an isolated) '{}' or '[]', e.g. '{ {} }', here's (maybe) a bit of a cheat:

#!/usr/bin/env perl -l

use strict;
use warnings;

my ($brace_re, $bracket_re);

$brace_re = qr< { (?: [^{}]++ | (??{ $brace_re }) )* } >x;
$bracket_re = qr< \[ (?: [^\[\]]++ | (??{ $bracket_re }) )* \] >x;

my $re = qr< ( $brace_re | $bracket_re ) >x;

while (<DATA>) {
    print;
    while (/$re/g) {
        print "MATCH = $1" if length $1 > 2;
    }
    print '-' x 60;
}

__DATA__
...?[](...$[] = [ USER_ENTITY_NAME ], text${} = { this is a test })...
a[] = [ this is a [ test ] { test2 } ]
a{} = { this is a { test } [ test2 ] }
{ a { b [ {}c{} ] d } e } = [ f [ g { []h[] } i ] j ]
{}[]{ {}[] }[]{} - []{}[ []{} ]{}[]

Output:

...?[](...$[] = [ USER_ENTITY_NAME ], text${} = { this is a test })...
MATCH = [ USER_ENTITY_NAME ]
MATCH = { this is a test }
------------------------------------------------------------
a[] = [ this is a [ test ] { test2 } ]
MATCH = [ this is a [ test ] { test2 } ]
------------------------------------------------------------
a{} = { this is a { test } [ test2 ] }
MATCH = { this is a { test } [ test2 ] }
------------------------------------------------------------
{ a { b [ {}c{} ] d } e } = [ f [ g { []h[] } i ] j ]
MATCH = { a { b [ {}c{} ] d } e }
MATCH = [ f [ g { []h[] } i ] j ]
------------------------------------------------------------
{}[]{ {}[] }[]{} - []{}[ []{} ]{}[]
MATCH = { {}[] }
MATCH = [ []{} ]
------------------------------------------------------------

Update: For Perl v5.8, you'll need to change [...]++ to (?> [...]+ ) (the '++' appeared in v5.10); and, because the '>' in '(?>' would end a qr<...> pattern early, the delimiters will need to be something else, e.g. qr!...!.
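A minimal sketch of what that 5.8-friendly variant might look like, with atomic groups in place of '++' and '!' as the delimiter. The helper name extract_groups is made up for illustration, and declaring the regexes with 'our' rather than 'my' is a cautious assumption on my part, not something the original code needed at file scope.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Package variables: a conservative choice (an assumption, not a requirement)
# for regexes that refer to themselves through (??{ ... }).
our ($brace_re, $bracket_re);

# (?> ... ) is an atomic group: once it has matched, it will not backtrack,
# which is what the possessive '++' does on 5.10 and later.
$brace_re   = qr! \{ (?: (?> [^{}]+ )   | (??{ $brace_re })   )* \} !x;
$bracket_re = qr! \[ (?: (?> [^\[\]]+ ) | (??{ $bracket_re }) )* \] !x;

my $re = qr! ( $brace_re | $bracket_re ) !x;

# Hypothetical helper: return every balanced group on the line that is
# longer than a bare '{}' or '[]'.
sub extract_groups {
    my ($line) = @_;
    my @found;
    while ($line =~ /$re/g) {
        push @found, $1 if length $1 > 2;
    }
    return @found;
}

print "MATCH = $_\n" for extract_groups('a[] = [ this is a [ test ] { test2 } ]');
```

Run against the second DATA line, this should report the same single match as the original script: the outer '[ ... ]' group, with the isolated 'a[]' skipped by the length test.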

The '(??{ $re })' construct has been around since at least v5.8.8.

Here's the perlre doco for 5.8.8 and 5.10.0.

-- Ken

Replies are listed 'Best First'.
Re^4: Regular expressions: Extracting certain text from a line
by AnomalousMonk (Archbishop) on Apr 09, 2014 at 01:23 UTC

    Hi Ken!

    Here's my latest try. It may be of interest to you. This is full-on 5.10+ as I wanted to get away from the (??{ ... }) construct with its scary warnings and experiment some more with the (?PARNO) construct, and also with (DEFINE), which I still don't fully understand. As you see, the (DEFINE) version requires an extra grep step; I couldn't figure out how to avoid it. Also, the definition of empty squares or curlies is expanded to include unlimited whitespace. Tested under Strawberries 5.10.1.5 and 5.14.4.1.
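For readers without the code from that post to hand, here is a rough sketch of the general (DEFINE) shape being described. It is not the poster's code: the group names brace and bracket, the helper extract_groups, and the exact form of the grep filter are all assumptions chosen for illustration.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# The (DEFINE) block only declares named subpatterns; it never matches
# anything itself. The alternation in group 1 calls them with (?&name).
my $re = qr/
    ( (?&brace) | (?&bracket) )

    (?(DEFINE)
        (?<brace>   \{ (?: [^{}]++   | (?&brace)   )* \} )
        (?<bracket> \[ (?: [^\[\]]++ | (?&bracket) )* \] )
    )
/x;

# Hypothetical helper: collect group 1 from every match, then use a separate
# grep pass to drop pairs that hold nothing but optional whitespace.
sub extract_groups {
    my ($line) = @_;
    my @all;
    while ($line =~ /$re/g) {
        push @all, $1;
    }
    return grep { !/\A [\[{] \s* [\]}] \z/x } @all;
}

say "MATCH = $_"
    for extract_groups('{ a { b [ {}c{} ] d } e } = [ f [ g { []h[] } i ] j ]');
```

The while loop pushes only $1 on purpose: a list-context match with /g would also return the two named groups from the (DEFINE) block, which are undefined at the top level.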

      Thanks. This is of interest.

      I was aiming for a 5.8 solution: it was only after posting that I noticed ++ wasn't introduced until 5.10.0. Both the 5.8.8 and 5.18.2 doco show the same (??{ code }) example for matching (...), which I more or less copied for {...} and [...], so I wasn't too concerned about the experimental warnings for that bit.

      I noticed that SimonPratt had hinted at a (?PARNO) solution (in Re^3: Regular expressions: Extracting certain text from a line) and I did look into that yesterday; although, I didn't spend a huge amount of time on it. Like you, I'm not really across (DEFINE): I'll spend a bit more time looking at this in concert with your code.

      I ran the four tests under 5.18.1. The two you'd marked as # works passed all tests for me; the other two (# no) both failed tests 2, 4, 6, 8 and 10 with $got->[0] = Does not exist in every case.

      -- Ken

        ... the ... doco show the same  (??{ code }) example for matching  (...) ... a  (?PARNO) solution ...

        I'm interested in  (?PARNO) and  (?R) because the documentation examples for nested expressions are almost the same. (Indeed, the doc sez "Similar to "(??{ code })"..." re: (?PARNO).) The impression the docs give is that something like  (?R) is somehow 'safer'. This may be tied up with the fact that  (??{ code }) used to be broken for use with lexicals – a problem I believe was fixed somewhere around 5.16 or 5.18 (I only go as high as 5.14 right now).
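To make the "almost the same" point concrete, here is a small side-by-side sketch for balanced parentheses, written in the spirit of the perlre examples rather than copied from them. The sub names match_deferred and match_recursive are invented for this comparison, and using 'our' for the self-referencing regex is a cautious assumption given the lexical issue mentioned above.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Deferred form: the pattern re-enters itself through a variable that is
# looked up at match time.
our $deferred;
$deferred = qr/ \( (?: [^()]++ | (??{ $deferred }) )* \) /x;

# Recursive form: (?1) re-enters capture group 1, so no variable is involved.
my $recursive = qr/ ( \( (?: [^()]++ | (?1) )* \) ) /x;

# Hypothetical helpers: return the first balanced group, or undef if none.
sub match_deferred  { my ($t) = @_; return $t =~ /($deferred)/ ? $1 : undef }
sub match_recursive { my ($t) = @_; return $t =~ /$recursive/  ? $1 : undef }

for my $text ('f(a, g(b, (c)), d) + h()', 'no parens', '((unbalanced)') {
    my $d = match_deferred($text)  // 'none';
    my $r = match_recursive($text) // 'none';
    say "$text => deferred: $d, recursive: $r";
}
```

On these three inputs the two forms should agree; the practical difference is that the (?1) version carries no code block and no variable to look up at match time.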

        Thanks for the 5.18 feedback on (DEFINE). Yes, some variations work, some don't, and I don't understand the reason(s) for the differences.

Re^4: Regular expressions: Extracting certain text from a line
by AnomalousMonk (Archbishop) on Apr 08, 2014 at 16:59 UTC
    Grrr! I wish they wouldn't do that.

    Yeah, it's much too much like those casual "Oh, by the way..." comments on the way out of a meeting at work that drive a steel spike into the heart of a careful proposal you've just finished presenting to apparent general approval. I don't hang around this place just so I can get more of what I get at work.
