PerlMonks  

Re^4: Selecting HL7 Transactions

by BillDowns (Novice)
on May 02, 2013 at 01:02 UTC (#1031677)


in reply to Re^3: Selecting HL7 Transactions
in thread Selecting HL7 Transactions

On a different note, since you seem quite knowledgeable about Perl, and again referring to Google searches: non-greedy matching is usually defined in a way that suggests (.*?\|) and ([^|]*\|) should be equivalent. I think so, too. Why are they not?


Re^5: Selecting HL7 Transactions
by kcott (Abbot) on May 02, 2013 at 02:41 UTC

    Ignoring the issue with newlines and the "s" modifier that I alluded to earlier, the heart of the matter is that "." matches any character while "[^|]" matches any character except the pipe character.
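    For completeness, here is a minimal sketch of that newline issue. The sample string is made up for illustration; it only assumes a newline appears before the first pipe:

```perl
use strict;
use warnings;

# Hypothetical sample: a newline appears before the first pipe.
my $x = "A\nB|C";

# Without /s, "." cannot cross the newline, so the match starts after it.
my ($dot)   = $x =~ /(.*?\|)/;
# With /s, "." matches the newline as well.
my ($dot_s) = $x =~ /(.*?\|)/s;
# A negated character class matches a newline regardless of /s.
my ($cc)    = $x =~ /([^|]*\|)/;

# Print each capture with newlines made visible as \n.
for my $pair ([dot => $dot], [dot_s => $dot_s], [cc => $cc]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;
    print "$pair->[0]: $shown\n";
}
```

    Under these assumptions, the plain "." version captures "B|", while both the /s version and the character class version capture "A\nB|".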

    Taken in isolation, /(.*?\|)/ and /([^|]*\|)/ may well produce the same result:

    $ perl -Mstrict -Mwarnings -E '
        my $x = q{A|||||Z};
        my $dot_re = qr{(.*?\|)};
        my $cc_re = qr{([^|]*\|)};
        $x =~ $dot_re;
        say $1;
        $x =~ $cc_re;
        say $1;
    '
    A|
    A|

    The reasons they do this, however, are different. "A" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|"). It just so happens that "A" is also the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|"). So, in both cases "A|" is captured.
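    As a further illustration (a sketch only, reusing the same sample string), collecting every match with the /g modifier should also give identical lists for the two patterns, since nothing follows the capture group to force backtracking:

```perl
use strict;
use warnings;

my $x = q{A|||||Z};

# In list context with /g, each match contributes its capture to the list.
my @dot = $x =~ /(.*?\|)/g;
my @cc  = $x =~ /([^|]*\|)/g;

# Bracket each capture so that the single-pipe matches are easy to see.
print join(' ', map { "[$_]" } @dot), "\n";
print join(' ', map { "[$_]" } @cc),  "\n";
```

    Both lines should show "A|" followed by four captures of a lone "|".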

    Now consider the following where the capture groups are no longer in isolation:

    $ perl -Mstrict -Mwarnings -E '
        my $x = q{A|||||Z};
        my $dot_re = qr{(.*?\|)Z};
        my $cc_re = qr{([^|]*\|)Z};
        $x =~ $dot_re;
        say $1;
        $x =~ $cc_re;
        say $1;
    '
    A|||||
    |

    Here, "A||||" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "A||||" plus "|" are captured. However, "" (i.e. nothing) is the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "" plus "|" are captured.
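    One way to see why is to look at where each match starts. The engine tries start positions from left to right and keeps the first one at which the whole pattern can succeed; the quantifier only decides how much is consumed from that start. The following is a rough sketch, where describe() is a hypothetical helper written for this example and @- is Perl's built-in array of match start offsets:

```perl
use strict;
use warnings;

my $x = q{A|||||Z};

# Hypothetical helper: report where the whole match begins and what $1 holds.
sub describe {
    my ($re, $str) = @_;
    return unless $str =~ $re;
    return { start => $-[0], capture => $1 };
}

my $dot = describe(qr{(.*?\|)Z},   $x);
my $cc  = describe(qr{([^|]*\|)Z}, $x);

printf "dot: start=%d capture=%s\n", $dot->{start}, $dot->{capture};
printf "cc:  start=%d capture=%s\n", $cc->{start},  $cc->{capture};
```

    The ".*?" version can succeed from offset 0 by stretching across the pipes, so it never gets as far as trying a later start. The "[^|]*" version cannot cross a pipe, so every start position fails until offset 5, the last pipe.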

    I recommend you take a look at Regexp::Debugger which provides a visualisation of Perl's regular expression engine in action — I think you'll find it most enlightening.

    I'd also recommend you look at the Perl documentation (available online at http://perldoc.perl.org/perl.html) before reaching for an internet search engine. Here's a list of Perl regular expression documentation that you'll find linked from that page:

    • perlrequick — Perl regular expressions quick start
    • perlretut — Perl regular expressions tutorial
    • perlfaq6 — Perl frequently asked questions: Regexes
    • perlre — Perl regular expressions, the rest of the story
    • perlrebackslash — Perl regular expression backslash sequences
    • perlrecharclass — Perl regular expression character classes
    • perlreref — Perl regular expressions quick reference

    That's the order the links appear on that page: look at them in whatever order you want. To be honest, I was a little surprised there were so many; had I realised in advance, I might not have chosen to start enumerating them all here.

    -- Ken

      Frankly, I found the Perl documentation quite badly written years ago, and I gave up on it.

      Back to the non-greedy matching, I understand what you are saying. It's just that this is not the usual understanding of the word. To my mind, and I would bet to most of the world, since the expression was (.*?\|), that would mean the least number of characters before the next (stress that: next) \|.

      That's what "non-greedy" should mean, IMO. Perl's implementation is still greedy.

        Frankly, I found the Perl documentation quite badly written years ago, and I gave up on it.

        Um, that doesn't sound reasonable :/
