PerlMonks  

Re: Selecting HL7 Transactions

by kcott (Abbot)
on May 01, 2013 at 22:54 UTC (#1031668)


in reply to Selecting HL7 Transactions

G'day BillDowns,

You have a number of issues here. I've included a fair amount of detail below but refer to perlre for the full story.

  • You're not actually showing a regex but just a fragment of one (I'll assume "/PV1\|1\|O\|(.*?\|){3}\|/"). I'm not trying to be pedantic but I can only respond to what you've written: for all I know, "PV1\|1\|O\|(.*?\|){3}\|" may be part of a larger regex. Also, I have no idea what modifiers, if any, you've used.
  • The "." in ".*?" matches any character including a pipe ("|") character which isn't what you want. (That's a slight oversimplification: it doesn't match a newline character unless you used the "s" modifier.) So, ".*?" would be better as "[^|]*" (zero or more characters that aren't pipe characters).
  • You don't anchor the regex so it could match anywhere in the string. To match at the beginning of the string you'll need to prepend "^" or "\A".
  • You've used capturing parentheses "( ... )" here. This won't break anything as it currently stands but could become an issue if you do want to capture fields later: "(?: ... )" (for clustering, not capturing) would be better.
  • Purely as a matter of style and personal taste, replacing the escaped pipe "\|" with the character class "[|]" may reduce what's been referred to as backslashitis and improve readability. Either is fine, it's up to you.

Putting all that together, you end up with a few options. Minimal changes would give: "/^PV1\|1\|O\|(?:[^|]*\|){3}\|/".
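To make the point about "." concrete, here's a small sketch comparing the fragment as posted with the minimal-change version. The sample string is made up for illustration: its sixth field is populated ("F6") and only the seventh is empty, so a check for an empty sixth field ought to reject it. The sample starts with "PV1", so the added "^" plays no part in the difference.

```perl
use strict;
use warnings;

# Hypothetical sample: field 6 holds "F6"; only field 7 is empty.
my $segment = 'PV1|1|O|F3|F4|F5|F6||F8';

my $lazy_dot = qr/PV1\|1\|O\|(.*?\|){3}\|/;       # fragment as originally posted
my $no_pipes = qr/^PV1\|1\|O\|(?:[^|]*\|){3}\|/;  # minimal-change version

# Returns 1 if $re matches $text, 0 otherwise.
sub matches {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

printf "lazy dot:      %s\n", matches($lazy_dot, $segment) ? 'match' : 'no match';
printf "negated class: %s\n", matches($no_pipes, $segment) ? 'match' : 'no match';

# Show what the last repetition of the lazy group ended up swallowing.
if ($segment =~ $lazy_dot) {
    print "last repetition captured: $1\n";
}
```

The lazy version matches because, when the final "\|" fails, the regex engine backtracks and lets the third ".*?" grow across a pipe; its last repetition captures "F5|F6|". The negated class can never cross a pipe, so that version reports no match.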

Having said all that, I'm wondering if splitting the lines on pipe characters might just be a whole lot easier in terms of general readability and future maintenance. Something along these lines:

    my @fields = split /[|]/ => $line;
    ...
    if ($fields[0] eq 'MSH' and $fields[8] eq 'ADT^A02') { ... }
    ...
    if ($fields[0] eq 'PV1' and $fields[6] eq '') { ... }
    ...
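A runnable sketch of the split approach might look like the following. The helper names (segment_fields, is_pv1_with_empty_field6) and the sample segments are hypothetical, and the check is only roughly the same as the regex: it tests the segment name and field 6, nothing else.

```perl
use strict;
use warnings;

# Split one segment into fields. The -1 limit keeps trailing empty fields,
# so a segment ending in "||" still yields '' rather than a missing element.
sub segment_fields {
    my ($line) = @_;
    return split /[|]/, $line, -1;
}

# Returns 1 when the segment is a PV1 whose field at index 6 is empty.
sub is_pv1_with_empty_field6 {
    my ($line) = @_;
    my @fields = segment_fields($line);
    return 0 unless @fields > 6;    # too short to have a field 6 at all
    return ($fields[0] eq 'PV1' && $fields[6] eq '') ? 1 : 0;
}

# Hypothetical sample segments, made up for illustration.
for my $line ('PV1|1|O|F3|F4|F5||F7', 'PV1|1|O|F3|F4|F5|F6|F7') {
    printf "%-24s %s\n", $line,
        is_pv1_with_empty_field6($line) ? 'selected' : 'skipped';
}
```

Without the -1 limit, split drops trailing empty fields, and comparing a missing element with "eq" would raise an "uninitialized value" warning under "use warnings".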

-- Ken


Re^2: Selecting HL7 Transactions
by BillDowns (Novice) on May 01, 2013 at 23:22 UTC

    Thanks, but I guess I did not make it clear - this is a utility script that extracts transactions from an archive based on the regular expressions I give it at run time. That's all it does - extracts transactions to a file.

    /PV1\|1\|O\|(.*?\|){3}\|/ was one of several regexes, each evaluated by itself. If all are true, the transaction is extracted to an output file.

    I know about anchors - PV1 segments are a ways into the transaction as I showed in the sample, so I could not use an anchor.
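    For a segment that sits partway into the transaction, one option is to anchor to the start of a segment rather than the start of the string. A minimal sketch, assuming segments are separated by carriage returns or newlines; the sample data is hypothetical:

```perl
use strict;
use warnings;

# Anchor to the start of a segment rather than the start of the whole string:
# "PV1" must either begin the string or directly follow a CR or LF.
my $pv1_at_segment_start = qr/(?:\A|[\r\n])PV1\|1\|O\|(?:[^|]*\|){3}\|/;

# Hypothetical two-segment transaction, CR-separated, made up for illustration.
my $transaction = join "\r",
    'MSH|^~\&|APP|FAC|APP2|FAC2|201305012254||ADT^A02|1|P|2.3',
    'PV1|1|O|F3|F4|F5||F7';

# Same PV1 text, but not at the start of a segment.
my $embedded = 'NOTPV1|1|O|F3|F4|F5||F7';

for my $case (['transaction', $transaction], ['embedded', $embedded]) {
    my ($name, $text) = @$case;
    printf "%-12s %s\n", $name,
        $text =~ $pv1_at_segment_start ? 'match' : 'no match';
}
```

    The explicit alternation is used here because "^" under the "m" modifier only recognises "\n" as a line boundary, so it would miss segments separated by a bare carriage return.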

    The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?

      "The parentheses are used for the repeat factor. All my research on the internet indicates a multi-character pattern that needs to be repeated multiple times should be enclosed in parentheses. Is this not correct?"

      Here's a test showing clustering and capturing. Both match as expected. Capturing also sets $1.

      $ perl -Mstrict -Mwarnings -E '
          my $re1 = qr{PV1\|1\|O\|(?:[^|]*\|){3}\|};
          my $re2 = qr{PV1\|1\|O\|([^|]*\|){3}\|};
          my $x = q{PV1|1|O|F3|F4|F5|F6|F7};
          my $y = q{PV1|1|O|F3|F4|F5||F7};
          say "------- Clustering -------";
          say "Match in \$x" if $x =~ /$re1/;
          say $1 if $1;
          say "Match in \$y" if $y =~ /$re1/;
          say $1 if $1;
          say "------- Capturing -------";
          say "Match in \$x" if $x =~ /$re2/;
          say $1 if $1;
          say "Match in \$y" if $y =~ /$re2/;
          say $1 if $1;
      '
      ------- Clustering -------
      Match in $y
      ------- Capturing -------
      Match in $y
      F5|

      -- Ken

        Thanks for that info - I did not know about clustering. It does not show up in the first half-dozen Google searches on regular expressions. In this case, it doesn't matter - the utility script doesn't care about any capturing. It is simply qualifying.

        On a different note, since you seem quite knowledgeable about Perl, and again referring to Google searches, non-greedy matching is usually defined in such a way that (.*?\|) and ([^|]*\|) should be equivalent. And I think so, too. Why are they not?
