PerlMonks  

Skip quoted string in regex

by roho (Canon)
on Nov 09, 2015 at 20:05 UTC (#1147286=perlquestion)

roho has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code, which splits a SQL clause into field, operator, and value using a regular expression built from an alternation of known operators. It works fine unless the quoted value in the SQL clause contains an operator character (the equal sign in my example), in which case $val contains only 'Response.

I tried adding the usual pattern for quoted strings ("[^"]+") to the front of the regex alternation, but that does not help. I also tried passing a limit of 3 to split, but that does not help either. How can I modify my regular expression (variable $opr_regex) to skip quoted values when splitting the SQL clause?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $clause = qq(failure_reason <> 'Response=X');
    my @operators = qw ( <> != !< !> <= >= < > = );

    # Capturing alternation of the known operators, each allowing surrounding whitespace
    my $opr_regex = '(\s*' . join('\s*|\s*', @operators) . '\s*)';

    # The captured operator is returned as the middle field
    my ($fld,$opr,$val) = split /$opr_regex/, $clause, 3;

    print "\n\$opr_regex = $opr_regex\n\n";
    print "\$clause = $clause\n\n";
    print "\$fld = |$fld|\n";
    print "\$opr = |$opr|\n";
    print "\$val = |$val|\n\n";
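For comparison, here is a sketch of how the "quoted string in front of the alternation" idea can be made to work, assuming Perl 5.10 or later for the (*SKIP)(*FAIL) backtracking control verbs. A plain alternative just becomes one more separator; with (*SKIP)(*FAIL), a matched quoted literal is abandoned and scanning resumes after it, so an = inside quotes is never used for splitting. Note also that "[^"]+" matches double-quoted strings, whereas the value here is single-quoted. Assumptions: values are single-quoted with '' as the escape, and quoted identifiers are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $clause    = q(failure_reason <> 'Response=X');
my @operators = qw( <> != !< !> <= >= < > = );

# Operators to split on; quotemeta guards against regex metacharacters.
my $ops = join '|', map { quotemeta } @operators;

# Assumption: values are single-quoted, with '' as an escaped quote.
# Such a literal is matched first and then abandoned via (*SKIP)(*FAIL),
# so the engine resumes scanning after it and operators inside quotes
# never act as separators. Double-quoted identifiers are not handled.
my $opr_regex = qr/'(?:[^']|'')*'(*SKIP)(*FAIL)|(\s*(?:$ops)\s*)/;

my ($fld, $opr, $val) = split $opr_regex, $clause;

print "\$fld = |$fld|\n";
print "\$opr = |$opr|\n";
print "\$val = |$val|\n";
```

No split limit is needed here, because an operator inside the quotes can no longer match.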

"It's not how hard you work, it's how much you get done."

Re: Skip quoted string in regex
by LanX (Archbishop) on Nov 09, 2015 at 20:21 UTC
    Hi Roho

    Solving this kind of parsing with a regex is usually discouraged, and expecting split to do it is more than ambitious.

    You would need look-arounds, because both the LHS and the RHS could contain "operators".
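A sketch of the look-around idea, assuming single-quoted values with '' as the escape: an operator only counts as a separator if an even number of quote characters follows it, i.e. it lies outside every literal. The look-ahead rescans the rest of the string at each candidate operator, so this suits short clauses only, and it does nothing for quoted identifiers or nested structures. The helper name split_clause is hypothetical.

```perl
use strict;
use warnings;

my @operators = qw( <> != !< !> <= >= < > = );
my $ops = join '|', map { quotemeta } @operators;

# Assumption: only single quotes delimit literals ('' escapes a quote).
# The look-ahead requires an even number of quotes between the operator
# and the end of the string, i.e. the operator is outside any literal.
my $outside_quotes = qr/(?=(?:[^']*'[^']*')*[^']*\z)/;
my $opr_regex      = qr/(\s*(?:$ops)\s*)$outside_quotes/;

# Hypothetical helper: returns (field, operator, value).
sub split_clause {
    my ($clause) = @_;
    return split $opr_regex, $clause;
}

my ($fld, $opr, $val) = split_clause(q(failure_reason <> 'Response=X'));
print "$fld|$opr|$val\n";
```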

    Did you consider using something like SQL::Parser?

    Anyway, the right approach¹ (IMHO) is to build the parsing regex incrementally from left to right by combining smaller parts, e.g. $clause = $lhs . $op . $rhs;

    This way you can also follow the official SQL grammar.
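A minimal sketch of that left-to-right composition using qr// building blocks. The pieces ($quoted, $ident, $number, $op) and the helper parse_clause are hypothetical simplifications for illustration, not the official SQL grammar; unlike split, this approach also copes with a quoted LHS.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the real SQL grammar is much richer.
my $quoted = qr/'(?:[^']|'')*'/;              # string literal, '' escapes a quote
my $ident  = qr/"(?:[^"]|"")*"|\w+/;          # double-quoted or bare identifier
my $number = qr/-?\d+(?:\.\d+)?/;
my $op     = qr/<>|!=|!<|!>|<=|>=|<|>|=/;     # two-character operators first

my $lhs = $ident;
my $rhs = qr/$quoted|$number|$ident/;

# $clause = $lhs . $op . $rhs, anchored at both ends.
my $clause_re = qr/\A\s*($lhs)\s*($op)\s*($rhs)\s*\z/;

# Hypothetical helper: returns (field, operator, value),
# or an empty list if the clause does not parse.
sub parse_clause {
    my ($clause) = @_;
    my @parts = $clause =~ $clause_re;
    return @parts;
}

for my $clause (q(failure_reason <> 'Response=X'), q("my col" >= 42)) {
    my ($fld, $opr, $val) = parse_clause($clause) or next;
    print "fld=[$fld] opr=[$opr] val=[$val]\n";
}
```

Each new construct (IN lists, BETWEEN, functions) would be added as another named piece and slotted into the composition, which is what makes it possible to track the grammar.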

    That said, there are already some BNF parsers available for Perl.

    Good luck reinventing the wheel. ;-)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    update

    ¹) I'm not elaborating on the need to recursively descend into nested structures like parenthesized groups and even subqueries.
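Full recursive descent is a job for a real parser, but as an illustration of the recursion involved, Perl 5.10+ regexes can match one balanced parenthesized group with (?&NAME). A sketch, assuming single-quoted literals only; the helper first_group is hypothetical, and it only locates a group without parsing its contents.

```perl
use strict;
use warnings;

# Matches one balanced (...) group. Assumption: literals are single-quoted
# ('' escapes a quote); they are consumed whole, so parens inside them
# are not counted.
my $paren_group = qr/
    (?<group>
        \(
        (?:
            '(?:[^']|'')*'      # quoted literal, kept intact
          | [^()']++            # anything that is not a paren or quote
          | (?&group)           # recurse into a nested group
        )*
        \)
    )
/x;

# Hypothetical helper: returns the first balanced group, or undef.
sub first_group {
    my ($text) = @_;
    return $text =~ $paren_group ? $+{group} : undef;
}

my $where = q{WHERE (a = 1 AND (b = ')' OR c = 2)) LIMIT 5};
my $group = first_group($where);
print defined $group ? "$group\n" : "no group\n";
```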

      Thanks Rolf. I realized this as soon as I inherited this code. Due to constraints (time, $$$), I am trying to avoid a massive refactor (or rewrite), but alas, I may have to convince the powers that be that this will be more cost-effective and beneficial in the long run. I will check out SQL::Parser. Thanks again.
      Sorry, I was not logged in. Thanks again.


Re: Skip quoted string in regex
by AnomalousMonk (Bishop) on Nov 09, 2015 at 20:38 UTC

    Like LanX, I have some reservations about your basic approach, but that said, just using a split limit of 2 seems to do the trick here:

        c:\@Work\Perl\monks>perl -wMstrict -e "my $clause = qq(failure_reason <> 'Response=X'); my @operators = qw ( <> != !< !> <= >= < > = ); my $opr_regex = '(\s*' . join('\s*|\s*', @operators) . '\s*)'; my ($fld,$opr,$val) = split /$opr_regex/, $clause, 2; print \"\n\$opr_regex = $opr_regex\n\n\"; print \"\$clause = $clause\n\n\"; print \"\$fld = ^|$fld^|\n\"; print \"\$opr = ^|$opr^|\n\"; print \"\$val = ^|$val^|\n\n\"; "

        $opr_regex = (\s*<>\s*|\s*!=\s*|\s*!<\s*|\s*!>\s*|\s*<=\s*|\s*>=\s*|\s*<\s*|\s*>\s*|\s*=\s*)

        $clause = failure_reason <> 'Response=X'

        $fld = |failure_reason|
        $opr = | <> |
        $val = |'Response=X'|
    (Update: In the qq(failure_reason <> 'Response=X') example given, you only actually want to split once.)


    Give a man a fish:  <%-{-{-{-<

      Some background...

      > In the qq(failure_reason <> 'Response=X') example given, you only actually want to split once.

      I had the same idea, but 3 seemed right, since three fields are returned.

      It's a bit confusing, because according to the docs "LIMIT... represents the maximum number of fields the EXPR will be split into".

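      But the middle field comes from the capture group in $opr_regex, and captured separators are returned on top of the LIMIT fields, so a LIMIT of 3 still permits a second split at the = inside the quotes.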
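A quick check of how LIMIT interacts with the capture group: with LIMIT N, split divides the string at most N-1 times, and each captured separator is returned as an extra field. The operator pattern below is the question's $opr_regex rewritten with qr// and a non-capturing group, which should match the same separators.

```perl
use strict;
use warnings;

my $clause    = q(failure_reason <> 'Response=X');
my $opr_regex = qr/(\s*(?:<>|!=|!<|!>|<=|>=|<|>|=)\s*)/;

# LIMIT counts ordinary fields only; captured separators come on top.
my @limit2 = split /$opr_regex/, $clause, 2;   # one split: 2 fields + 1 capture
my @limit3 = split /$opr_regex/, $clause, 3;   # two splits: 3 fields + 2 captures

print scalar(@limit2), ": ", join(' | ', @limit2), "\n";
print scalar(@limit3), ": ", join(' | ', @limit3), "\n";
```

With LIMIT 3, the fourth and fifth elements are the = captured inside the quotes and the leftover X', which is why $val ended up as 'Response in the original code.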

      But still, split is the wrong approach here, because the LHS could be quoted too.


        ... the LHS could be quoted too.

        I missed that bit in your reply. I was thinking of responding with a semi-elaborate regex approach, but if there's an SQL parser available on CPAN, I agree that's the way to go: wheel, reinvention, and all that.



Approved by toolic