http://www.perlmonks.org?node_id=1031668


in reply to Selecting HL7 Transactions

G'day BillDowns,

You have a number of issues here. I've included a fair amount of detail below but refer to perlre for the full story.

Putting all that together, you end up with a few options. Minimal changes would give: "/^PV1\|1\|O\|(?:[^|]*\|){3}\|/".
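For illustration, here is that pattern applied one segment at a time. This is a sketch that assumes each transaction is held as a single string whose segments are separated by carriage returns (the usual HL7 v2 segment terminator). Splitting on those first is what lets the ^ anchor mean "start of a PV1 segment". The helper name txn_has_pv1_match and the sample messages are made up.

```perl
use strict;
use warnings;

# The "minimal changes" pattern: PV1-1 is "1", PV1-2 is "O",
# three more fields of any content, then an empty PV1-6.
my $pv1_re = qr/^PV1\|1\|O\|(?:[^|]*\|){3}\|/;

# Hypothetical helper: true if any segment of the transaction matches.
# Assumes segments are separated by CR, optionally followed by LF.
sub txn_has_pv1_match {
    my ($txn) = @_;
    for my $segment (split /\r\n?|\n/, $txn) {
        return 1 if $segment =~ $pv1_re;
    }
    return 0;
}

# Made-up sample transactions, for illustration only.
my $empty_pv1_6  = "MSH|^~\\&|APP|FAC\rPID|1||12345\rPV1|1|O|F3|F4|F5||F7";
my $filled_pv1_6 = "MSH|^~\\&|APP|FAC\rPID|1||12345\rPV1|1|O|F3|F4|F5|F6|F7";

printf "%s\n", txn_has_pv1_match($empty_pv1_6)  ? "match" : "no match";   # match
printf "%s\n", txn_has_pv1_match($filled_pv1_6) ? "match" : "no match";   # no match
```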

Having said all that, I'm wondering if splitting the lines on pipe characters might just be a whole lot easier in terms of general readability and future maintenance. Something along these lines:

    my @fields = split /[|]/ => $line;
    ...
    if ($fields[0] eq 'MSH' and $fields[8] eq 'ADT^A02') {
        ...
    }
    ...
    if ($fields[0] eq 'PV1' and $fields[6] eq '') {
        ...
    }
    ...
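A fuller, still hypothetical, version of that split approach might look like the following. It assumes CR-separated segments and the same field positions as above: MSH-1 is the separator itself, so MSH-9 lands in $fields[8], while for PV1, $fields[6] is PV1-6. The sub name qualifies and the sample message are invented for illustration. One detail worth knowing: without a negative LIMIT, split drops trailing empty fields, so passing -1 keeps an empty last field as ''.

```perl
use strict;
use warnings;

# Hypothetical helper: qualify one transaction using split instead of a regex.
sub qualifies {
    my ($txn) = @_;
    my ($is_a02, $pv1_6_empty) = (0, 0);
    for my $line (split /\r\n?|\n/, $txn) {
        # LIMIT -1 keeps trailing empty fields instead of discarding them.
        my @fields = split /[|]/ => $line, -1;
        next unless @fields;    # skip blank lines
        # MSH-9 (message type) is $fields[8] because MSH-1 is the separator.
        if ($fields[0] eq 'MSH' and ($fields[8] // '') eq 'ADT^A02') {
            $is_a02 = 1;
        }
        # A PV1 that stops before field 6 is treated as having it empty.
        if ($fields[0] eq 'PV1' and ($fields[6] // '') eq '') {
            $pv1_6_empty = 1;
        }
    }
    return $is_a02 && $pv1_6_empty;
}

# Made-up transaction: ADT^A02 with an empty PV1-6.
my $txn = "MSH|^~\\&|APP|FAC|RCV|FAC|201305011200||ADT^A02|123|P|2.3\r"
        . "PV1|1|O|F3|F4|F5||F7";
printf "%s\n", qualifies($txn) ? "qualifies" : "skip";   # qualifies
```

Each condition is now a plain string comparison on a named position, which is the readability gain being suggested; the cost is that field positions have to be counted per segment type.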

-- Ken

Re^2: Selecting HL7 Transactions
by BillDowns (Novice) on May 01, 2013 at 23:22 UTC

    Thanks, but I guess I did not make it clear - this is a utility script that extracts transactions from an archive based on the regular expressions I give it at run time. That's all it does - extracts transactions to a file.

    /PV1\|1\|O\|(.*?\|){3}\|/ was one of several regexes, each evaluated by itself. If all of them are true, the transaction is extracted to an output file.
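The qualifying rule described here (every run-time regex must match, otherwise the transaction is skipped) could be sketched as below. This is not BillDowns' script; the sub name qualifies_all, the way the patterns are supplied, and the sample transaction are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical run-time patterns, e.g. as they might arrive in @ARGV.
my @patterns = ('ADT\^A02', 'PV1\|1\|O\|(?:[^|]*\|){3}\|');

# Compile each pattern once, up front.
my @res = map { qr/$_/ } @patterns;

# A transaction is extracted only if every pattern matches somewhere in it;
# the first failing pattern short-circuits the check.
sub qualifies_all {
    my ($txn, @regexes) = @_;
    for my $re (@regexes) {
        return 0 unless $txn =~ $re;
    }
    return 1;
}

# Made-up transaction for illustration.
my $txn = "MSH|^~\\&|APP|FAC|RCV|FAC|201305011200||ADT^A02|123|P|2.3\r"
        . "PV1|1|O|F3|F4|F5||F7";
printf "%s\n", qualifies_all($txn, @res) ? "extract" : "skip";   # extract
```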

    I know about anchors - PV1 segments are a ways into the transaction as I showed in the sample, so I could not use an anchor.
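Even with the whole transaction in one string, a PV1 segment can still be anchored: match either the start of the string or the carriage return that ends the previous segment. This sketch assumes CR-separated segments. Note that /m would not help on its own, because under /m the ^ anchor only matches after \n, not after a bare \r. The sample strings are invented.

```perl
use strict;
use warnings;

# (?:\A|\r) stands in for "start of segment": either the very start of the
# transaction or just after the CR that terminates the previous segment.
my $pv1_re = qr/(?:\A|\r)PV1\|1\|O\|(?:[^|]*\|){3}\|/;

my $txn = "MSH|^~\\&|APP|FAC\rPID|1||12345\rPV1|1|O|F3|F4|F5||F7";
printf "%s\n", $txn =~ $pv1_re ? "match" : "no match";     # match

# Here "PV1|..." is just text inside an NTE segment. An unanchored pattern
# would match it; the segment-anchored one does not.
my $decoy = "MSH|^~\\&|APP|FAC\rNTE|1||PV1|1|O|a|b|c||d";
printf "%s\n", $decoy =~ $pv1_re ? "match" : "no match";   # no match
```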

    The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?

      "The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?"

      Here's a test showing clustering and capturing. Both match as expected. Capturing also sets $1.

      $ perl -Mstrict -Mwarnings -E '
          my $re1 = qr{PV1\|1\|O\|(?:[^|]*\|){3}\|};
          my $re2 = qr{PV1\|1\|O\|([^|]*\|){3}\|};
          my $x = q{PV1|1|O|F3|F4|F5|F6|F7};
          my $y = q{PV1|1|O|F3|F4|F5||F7};
          say "------- Clustering -------";
          say "Match in \$x" if $x =~ /$re1/;
          say $1 if $1;
          say "Match in \$y" if $y =~ /$re1/;
          say $1 if $1;
          say "------- Capturing -------";
          say "Match in \$x" if $x =~ /$re2/;
          say $1 if $1;
          say "Match in \$y" if $y =~ /$re2/;
          say $1 if $1;
      '
      ------- Clustering -------
      Match in $y
      ------- Capturing -------
      Match in $y
      F5|

      -- Ken

        On a different note, since you seem quite knowledgeable about Perl: again going by Google searches, non-greedy matching is usually described in a way that suggests (.*?\|) and ([^|]*\|) should be equivalent. I think so, too. Why are they not?

        Thanks for that info - I did not know about clustering. It does not show up in the first half-dozen Google searches on regular expressions. In this case it doesn't matter: the utility script doesn't care about any capturing. It is simply qualifying transactions.
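On the (.*?\|) versus ([^|]*\|) question: non-greedy only changes the order in which lengths are tried, shortest first. If the rest of the pattern then fails, the engine backtracks and lets .*? grow, and since . matches | (and \r; only \n is excluded without /s), one repetition can swallow several fields or even run into the next segment. [^|]* can never consume a pipe, so each repetition is exactly one field. A minimal demonstration on a made-up segment:

```perl
use strict;
use warnings;

my $lazy    = qr/PV1\|1\|O\|(.*?\|){3}\|/;     # the original non-greedy form
my $negated = qr/PV1\|1\|O\|([^|]*\|){3}\|/;   # exactly one field per repetition

# Made-up segment: PV1-6 ("F6") is filled in, but PV1-8 is empty.
my $seg = "PV1|1|O|F3|F4|F5|F6|F7||F9";

if ($seg =~ $lazy) {
    # To reach the "||", backtracking lets the third repetition's .*?
    # grow past two pipes, so that one repetition spans three fields.
    print "lazy matched; last repetition: $1\n";    # F5|F6|F7|
}
printf "negated %s\n", $seg =~ $negated ? "matched" : "did not match";   # did not match
```

In a full transaction, that means the lazy pattern can qualify on an empty field elsewhere in PV1 or in a later segment, while the [^|]* version only accepts an empty PV1-6.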