PerlMonks  

Re^3: Selecting HL7 Transactions

by kcott (Abbot)
on May 02, 2013 at 00:20 UTC (#1031672)


in reply to Re^2: Selecting HL7 Transactions
in thread Selecting HL7 Transactions

"The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?"

Here's a test showing clustering, (?:...), and capturing, (...). Both patterns match $y and neither matches $x, as expected. Capturing also sets $1.

$ perl -Mstrict -Mwarnings -E '
    my $re1 = qr{PV1\|1\|O\|(?:[^|]*\|){3}\|};
    my $re2 = qr{PV1\|1\|O\|([^|]*\|){3}\|};

    my $x = q{PV1|1|O|F3|F4|F5|F6|F7};
    my $y = q{PV1|1|O|F3|F4|F5||F7};

    say "------- Clustering -------";
    say "Match in \$x" if $x =~ /$re1/;
    say $1 if $1;
    say "Match in \$y" if $y =~ /$re1/;
    say $1 if $1;

    say "------- Capturing -------";
    say "Match in \$x" if $x =~ /$re2/;
    say $1 if $1;
    say "Match in \$y" if $y =~ /$re2/;
    say $1 if $1;
'
------- Clustering -------
Match in $y
------- Capturing -------
Match in $y
F5|
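The "F5|" in that output is worth a second look: a capture group under a repeat count keeps only what its last repetition matched. The following is a minimal sketch of that behaviour as a standalone script rather than a one-liner; last_capture() is a hypothetical helper introduced purely for illustration and is not part of the original test.

```perl
use strict;
use warnings;
use feature 'say';

# last_capture() is a hypothetical helper for this sketch: it returns
# the value of $1 after a successful match, or undef otherwise.
sub last_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my $y = q{PV1|1|O|F3|F4|F5||F7};

my $cluster = qr{PV1\|1\|O\|(?:[^|]*\|){3}\|};   # groups only, no capture
my $capture = qr{PV1\|1\|O\|([^|]*\|){3}\|};     # groups and captures

# Clustering matches but leaves $1 unset.
say defined last_capture($cluster, $y) ? 'set' : 'unset';   # unset

# Capturing matches and $1 holds the third (last) repetition only.
say last_capture($capture, $y);                              # F5|
```

If the script only needs to qualify a record, as opposed to extracting fields from it, the clustering form states that intent more directly.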

-- Ken


Re^4: Selecting HL7 Transactions
by BillDowns (Novice) on May 02, 2013 at 01:02 UTC

    On a different note, since you seem quite knowledgeable about Perl, and again referring to Google searches: non-greedy matching is usually defined in such a way that (.*?\|) and ([^|]*\|) should be equivalent. I think so, too. Why are they not?

      Ignoring the issue with newlines and the "s" modifier that I alluded to earlier, the heart of the matter is that "." matches any character while "[^|]" matches any character except the pipe character.

      Taken in isolation, /(.*?\|)/ and /([^|]*\|)/ may well produce the same result:

      $ perl -Mstrict -Mwarnings -E '
          my $x = q{A|||||Z};
          my $dot_re = qr{(.*?\|)};
          my $cc_re = qr{([^|]*\|)};
          $x =~ $dot_re;
          say $1;
          $x =~ $cc_re;
          say $1;
      '
      A|
      A|

      The reasons they do this, however, are different. "A" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|"). It just so happens that "A" is also the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|"). So, in both cases "A|" is captured.
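To make the "least number" versus "greatest number" distinction concrete, here is a small sketch that adds a third pattern, the plain greedy (.*\|), for contrast. It assumes the same test string as above; first_capture() is a hypothetical helper used only for this illustration.

```perl
use strict;
use warnings;
use feature 'say';

# first_capture() is a hypothetical helper for this sketch: it returns
# what group 1 captured, or undef when the pattern does not match.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my $x = q{A|||||Z};

say first_capture(qr{(.*\|)},    $x);   # greedy dot:           A|||||
say first_capture(qr{(.*?\|)},   $x);   # non-greedy dot:       A|
say first_capture(qr{([^|]*\|)}, $x);   # greedy negated class: A|
```

The greedy dot runs past every pipe and backs off to the last one; the negated class is greedy too, but it cannot cross a pipe, so it stops where the non-greedy dot happens to stop.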

      Now consider the following where the capture groups are no longer in isolation:

      $ perl -Mstrict -Mwarnings -E '
          my $x = q{A|||||Z};
          my $dot_re = qr{(.*?\|)Z};
          my $cc_re = qr{([^|]*\|)Z};
          $x =~ $dot_re;
          say $1;
          $x =~ $cc_re;
          say $1;
      '
      A|||||
      |

      Here, "A||||" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "A||||" plus "|" are captured. However, "" (i.e. nothing) is the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "" plus "|" are captured.
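One way to see why the two captures differ is to look at where each overall match begins. The engine reports the leftmost match it can find; a non-greedy quantifier only controls how far the match extends from a given starting point, not where it starts. The sketch below reads the start offset from the built-in @- array; match_info() is a hypothetical helper introduced for this illustration.

```perl
use strict;
use warnings;
use feature 'say';

# match_info() is a hypothetical helper for this sketch: it returns the
# offset at which the whole match starts ($-[0]) together with $1, or an
# empty list when the pattern does not match.
sub match_info {
    my ($re, $str) = @_;
    return unless $str =~ $re;
    return ($-[0], $1);
}

my $x = q{A|||||Z};

for my $re (qr{(.*?\|)Z}, qr{([^|]*\|)Z}) {
    my ($start, $captured) = match_info($re, $x);
    say "start=$start captured=$captured";
}
# start=0 captured=A|||||
# start=5 captured=|
```

The dot version can succeed from offset 0 because "." is allowed to consume pipes, so the engine never needs to try a later starting position. The character-class version fails at offsets 0 through 4 and first succeeds at offset 5, the last pipe.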

      I recommend you take a look at Regexp::Debugger which provides a visualisation of Perl's regular expression engine in action — I think you'll find it most enlightening.

      I'd also recommend you look at the Perl documentation (available online at http://perldoc.perl.org/perl.html) before reaching for an internet search engine. Here's a list of Perl regular expression documentation that you'll find linked from that page:

      • perlrequick — Perl regular expressions quick start
      • perlretut — Perl regular expressions tutorial
      • perlfaq6 — Perl frequently asked questions: Regexes
      • perlre — Perl regular expressions, the rest of the story
      • perlrebackslash — Perl regular expression backslash sequences
      • perlrecharclass — Perl regular expression character classes
      • perlreref — Perl regular expressions quick reference

      That's the order the links appear on that page: look at them in whatever order you want. To be honest, I was a little surprised there were so many; had I realised in advance, I might not have chosen to start enumerating them all here.

      -- Ken

        Frankly, I found the Perl documentation quite badly written years ago, and I gave up on it.

        Back to the non-greedy matching, I understand what you are saying. It's just that that's not the normal and usual understanding of the word. To my mind, and I would bet to most of the world's, since the expression was (.*?\|), that would mean the least number of characters before the next (stress that: next) \|.

        That's what "non-greedy" should mean, IMO. Perl's implementation is still greedy.

Re^4: Selecting HL7 Transactions
by Anonymous Monk on May 02, 2013 at 00:46 UTC
    Thanks for that info. I did not know about clustering. It does not show up in the first half-dozen Google searches on regular expressions. In this case, it doesn't matter: the utility script doesn't care about any capturing. It is simply qualifying.
      Sorry - forgot to log in first. That was me :)
