Re^6: Interpolating subroutine call in SQL INSERT INTO SELECT statement

by shadowsong (Monk)
on Sep 01, 2015 at 08:56 UTC ( #1140632=note )


in reply to Re^5: Interpolating subroutine call in SQL INSERT INTO SELECT statement
in thread Interpolating subroutine call in SQL INSERT INTO SELECT statement

poj - you're right...

While testing it I realized I was getting back dodgy results. It turns out that what I needed was a negative lookahead.

I ended up changing the value of key 1 in %char_swap_hash to this:

1 => {OLD => '&(?!(amp;)|(lt;)|(gt;))', NEW => '&'}, # don't match + legit '&' codes
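
To illustrate why the lookahead behaves differently from the earlier character-class pattern `&[^(amp;)|(lt;)|(gt;)]`, here is a minimal sketch. The helper `escape_with`, the sample input, and the replacement text `&amp;` are assumptions made for the example, not taken from the original script.

```perl
use strict;
use warnings;

# Pattern from this post: '&' NOT followed by 'amp;', 'lt;' or 'gt;'.
my $lookahead  = qr/&(?!(amp;)|(lt;)|(gt;))/;

# Earlier attempt: '&' followed by ONE character that is not in the set
# ( a m p ; ) | l t g  -- inside [...] the parens and '|' are literals.
my $char_class = qr/&[^(amp;)|(lt;)|(gt;)]/;

# Hypothetical helper: replace every match of $re with '&amp;'
# (the replacement text is an assumption for this sketch).
sub escape_with {
    my ($re, $text) = @_;
    (my $out = $text) =~ s/$re/&amp;/g;
    return $out;
}

my $input = 'R&D &amp; Q&A &lt;b&gt; &';

print escape_with($lookahead,  $input), "\n";
# prints: R&amp;D &amp; Q&amp;A &lt;b&gt; &amp;

print escape_with($char_class, $input), "\n";
# prints: R&amp; &amp; Q&amp; &lt;b&gt; &
```

The lookahead is zero-width, so only the bare `&` is replaced. The character class consumes the character after the `&` (the `D` and `A` disappear) and cannot match an `&` at the very end of the string.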

I then saw your solution, which not only confirmed that attempting to use a character class to evaluate lookaheads was bonkers but also introduced me to YAPE::Regex::Explain. At the moment cpanm isn't finding it, so I am unable to easily install and use it, which is more than a little irritating.

However, many thanks poj - I appreciate all your help.

Re^7: Interpolating subroutine call in SQL INSERT INTO SELECT statement
by poj (Abbot) on Sep 01, 2015 at 09:27 UTC

    When you get YAPE::Regex::Explain installed, the result for your 'old' regex should be

    The regular expression:

    (?-imsx:&[^(amp;)|(lt;)|(gt;)])

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      &                        '&'
    ----------------------------------------------------------------------
      [^(amp;)|(lt;)|(gt;)     any character except: '(', 'a', 'm', 'p',
      ]                        ';', ')', '|', '(', 'l', 't', ';', ')',
                               '|', '(', 'g', 't', ';', ')'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    and for mine

    The regular expression:

    (?-imsx:(&(?!(?:amp|lt|gt);)|[><]))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        &                        '&'
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            amp                      'amp'
    ----------------------------------------------------------------------
           |                        OR
    ----------------------------------------------------------------------
            lt                       'lt'
    ----------------------------------------------------------------------
           |                        OR
    ----------------------------------------------------------------------
            gt                       'gt'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
          ;                        ';'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        [><]                     any character of: '>', '<'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    and your new regex

    The regular expression:

    (?-imsx:&(?!(amp;)|(lt;)|(gt;)))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      &                        '&'
    ----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
        (                        group and capture to \1:
    ----------------------------------------------------------------------
          amp;                     'amp;'
    ----------------------------------------------------------------------
        )                        end of \1
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          lt;                      'lt;'
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        (                        group and capture to \3:
    ----------------------------------------------------------------------
          gt;                      'gt;'
    ----------------------------------------------------------------------
        )                        end of \3
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    poj

      poj,

      I just managed to get the YAPE::Regex::Explain module downloaded and installed via cpanm (turned out I had some firewall/proxy issues to resolve). This is an awesome module and I'll definitely be using it more in future.

      Question: way back when I did compiler design, we used a lexical analyzer generator called JLex (a Java implementation of Lex) and leveraged its built-in scanner to generate tokens from an input stream. Its documentation said we would generally get better performance by building token specifications from the longest possible RegExps, so when matching a particular expression, the more of the pattern we can provide, the better. That would suggest that instead of providing:

      (amp)|(lt)|(gt);

      We ought to provide:

      (amp;)|(lt;)|(gt;)

      What are your thoughts on this? I'm not all that well versed in Perl (or in how its RegExp engine functions), so any tips would be most appreciated.

      Thanks
      shadowsong
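
Before comparing speed, it may help to check what each spelling in the question actually matches. The sketch below says nothing about performance. It only tests whole-string matches for the two patterns as written in the question, plus the grouped form `(?:amp|lt|gt);` from poj's regex. The hash `%pattern`, the helper `full_match`, and the test strings are assumptions made for the example.

```perl
use strict;
use warnings;

# Each pattern is wrapped in \A(?: ... )\z so it must match the whole string.
my %pattern = (
    as_written  => qr/\A(?:(amp)|(lt)|(gt);)\z/,      # first form in the question
    grouped     => qr/\A(?:(?:amp|lt|gt);)\z/,        # form used in poj's regex
    spelled_out => qr/\A(?:(amp;)|(lt;)|(gt;))\z/,    # second form in the question
);

# Hypothetical helper: 1 if the named pattern matches all of $text, else 0.
sub full_match {
    my ($name, $text) = @_;
    return $text =~ $pattern{$name} ? 1 : 0;
}

for my $name (qw(as_written grouped spelled_out)) {
    my @row = map { "$_=" . full_match($name, $_) } qw(amp amp; lt; gt;);
    print "${name}: @row\n";
}
# prints:
# as_written: amp=1 amp;=0 lt;=0 gt;=1
# grouped: amp=0 amp;=1 lt;=1 gt;=1
# spelled_out: amp=0 amp;=1 lt;=1 gt;=1
```

Alternation has the lowest precedence, so in `(amp)|(lt)|(gt);` the trailing `;` belongs only to the last branch. The grouped and spelled-out forms accept the same strings here, so any difference between them would be in capturing and speed, which this sketch does not measure.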
