
Re^2: Selecting HL7 Transactions

by BillDowns (Novice)
on May 01, 2013 at 23:22 UTC (#1031671)


in reply to Re: Selecting HL7 Transactions
in thread Selecting HL7 Transactions

Thanks, but I guess I did not make it clear - this is a utility script that extracts transactions from an archive based on the regular expressions I give it at run time. That's all it does - extracts transactions to a file.

/PV1\|1\|O\|(.*?\|){3}\|/ was one of several regexes, each evaluated on its own. If all of them match, the transaction is extracted to an output file.
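For readers following along, a minimal sketch of that "every regex must match" qualification might look like the following. The helper name matches_all, the pattern list, and the sample transactions are all assumptions made for illustration; the actual utility script is not shown in this thread. The PV1 pattern is the character-class form suggested in the reply below.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every pattern matches the transaction.
sub matches_all {
    my ($transaction, @patterns) = @_;
    for my $re (@patterns) {
        return 0 unless $transaction =~ $re;
    }
    return 1;
}

# Patterns as they might arrive at run time (illustrative values only).
my @patterns = map { qr/$_/ } (
    '^MSH\|',
    'PV1\|1\|O\|(?:[^|]*\|){3}\|',
);

# Made-up sample transactions; segments are joined with carriage returns.
my $outpatient = "MSH|X\rPID|1\rPV1|1|O|F3|F4|F5||F7";
my $inpatient  = "MSH|X\rPID|1\rPV1|1|I|F3|F4|F5||F7";

for my $txn ($outpatient, $inpatient) {
    print matches_all($txn, @patterns) ? "extract\n" : "skip\n";
}
```

Because the patterns only qualify a transaction, none of them needs to capture anything.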

I know about anchors - PV1 segments are a ways into the transaction as I showed in the sample, so I could not use an anchor.

The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?


Re^3: Selecting HL7 Transactions
by kcott (Abbot) on May 02, 2013 at 00:20 UTC
    "The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?"

    Here's a test showing clustering and capturing. Both match as expected. Capturing also sets $1.

    $ perl -Mstrict -Mwarnings -E '
        my $re1 = qr{PV1\|1\|O\|(?:[^|]*\|){3}\|};
        my $re2 = qr{PV1\|1\|O\|([^|]*\|){3}\|};

        my $x = q{PV1|1|O|F3|F4|F5|F6|F7};
        my $y = q{PV1|1|O|F3|F4|F5||F7};

        say "------- Clustering -------";
        say "Match in \$x" if $x =~ /$re1/;
        say $1 if $1;
        say "Match in \$y" if $y =~ /$re1/;
        say $1 if $1;

        say "------- Capturing -------";
        say "Match in \$x" if $x =~ /$re2/;
        say $1 if $1;
        say "Match in \$y" if $y =~ /$re2/;
        say $1 if $1;
    '
    ------- Clustering -------
    Match in $y
    ------- Capturing -------
    Match in $y
    F5|

    -- Ken

      On a different note, since you seem quite knowledgeable about Perl, and again referring to Google searches: non-greedy matching is usually described in a way that suggests (.*?\|) and ([^|]*\|) should be equivalent. I think so, too. Why are they not?

        Ignoring the issue with newlines and the "s" modifier that I alluded to earlier, the heart of the matter is that "." matches any character while "[^|]" matches any character except the pipe character.
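As an aside, the newline point can be checked with a short sketch. The test string is made up, and the patterns are anchored with ^ purely so that the dot version cannot quietly succeed by starting after the newline.

```perl
use strict;
use warnings;

my $text = "A\nB|Z";    # made-up string with an embedded newline

# "." does not match a newline unless the /s modifier is in effect.
my ($dot)   = $text =~ /^(.*?\|)/;
my ($dot_s) = $text =~ /^(.*?\|)/s;

# A negated character class matches a newline with or without /s.
my ($cc)    = $text =~ /^([^|]*\|)/;

print "dot without /s: ", (defined $dot   ? "matched" : "no match"), "\n";
print "dot with /s:    ", (defined $dot_s ? "matched" : "no match"), "\n";
print "[^|]:           ", (defined $cc    ? "matched" : "no match"), "\n";
```

Only the first pattern fails: without /s, the dot cannot cross the newline to reach the pipe.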

        Taken in isolation, /(.*?\|)/ and /([^|]*\|)/ may well produce the same result:

        $ perl -Mstrict -Mwarnings -E '
            my $x = q{A|||||Z};
            my $dot_re = qr{(.*?\|)};
            my $cc_re = qr{([^|]*\|)};
            $x =~ $dot_re;
            say $1;
            $x =~ $cc_re;
            say $1;
        '
        A|
        A|

        The reasons they do this, however, are different. "A" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|"). It just so happens that "A" is also the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|"). So, in both cases "A|" is captured.

        Now consider the following where the capture groups are no longer in isolation:

        $ perl -Mstrict -Mwarnings -E '
            my $x = q{A|||||Z};
            my $dot_re = qr{(.*?\|)Z};
            my $cc_re = qr{([^|]*\|)Z};
            $x =~ $dot_re;
            say $1;
            $x =~ $cc_re;
            say $1;
        '
        A|||||
        |

        Here, "A||||" is the least number [non-greedy] of zero or more of any characters (".*?") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "A||||" plus "|" are captured. However, "" (i.e. nothing) is the greatest number [greedy] of zero or more non-pipe characters ("[^|]*") that match before a literal pipe character ("\|") that is immediately followed by a literal Z character: "" plus "|" are captured.
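One way to see the difference for yourself is to print where each match begins. This is only a sketch using the same test string; it relies on the built-in @- array, whose first element holds the start offset of the last successful match. The variable names are my own.

```perl
use strict;
use warnings;

my $x = q{A|||||Z};

# Non-greedy dot: the match starts at the first character and stretches.
$x =~ /(.*?\|)Z/ or die "dot pattern did not match";
my ($dot_start, $dot_capture) = ($-[0], $1);

# Character class: the engine has to give up on earlier start positions.
$x =~ /([^|]*\|)Z/ or die "character class pattern did not match";
my ($cc_start, $cc_capture) = ($-[0], $1);

printf "dot: starts at offset %d, captures '%s'\n", $dot_start, $dot_capture;
printf "cc:  starts at offset %d, captures '%s'\n", $cc_start,  $cc_capture;
```

The dot version matches from offset 0, whereas the character-class version cannot get past a pipe, so the engine moves the starting position forward until the pipe it consumes is the one directly before Z (offset 5).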

        I recommend you take a look at Regexp::Debugger which provides a visualisation of Perl's regular expression engine in action — I think you'll find it most enlightening.

        I'd also recommend you look at the Perl documentation (available online at http://perldoc.perl.org/perl.html) before reaching for an internet search engine. Here's a list of Perl regular expression documentation that you'll find linked from that page:

        • perlrequick — Perl regular expressions quick start
        • perlretut — Perl regular expressions tutorial
        • perlfaq6 — Perl frequently asked questions: Regexes
        • perlre — Perl regular expressions, the rest of the story
        • perlrebackslash — Perl regular expression backslash sequences
        • perlrecharclass — Perl regular expression character classes
        • perlreref — Perl regular expressions quick reference

        That's the order the links appear on that page: look at them in whatever order you want. To be honest, I was a little surprised there were so many; had I realised in advance, I might not have chosen to start enumerating them all here.

        -- Ken

      Thanks for that info - I did not know about clustering. It does not show up in the first half-dozen google searches on regular expressions. In this case, it doesn't matter - the utility script doesn't care about any capturing. It is simply qualifying.
        Sorry - forgot to log in first. That was me :)
