PerlMonks  

Re^5: Interpolating subroutine call in SQL INSERT INTO SELECT statement

by poj (Abbot)
on Aug 28, 2015 at 15:14 UTC


in reply to Re^4: Interpolating subroutine call in SQL INSERT INTO SELECT statement
in thread Interpolating subroutine call in SQL INSERT INTO SELECT statement

The match for key 1 is incorrect; you can't use a character class like that. Try this test:

#!perl
use strict;
use YAPE::Regex::Explain;

my $str = '&gt; milk &amp; honey &lt; < bill&ben >';

sub convert {
  my $input_stream = shift;
  return undef unless $input_stream;

  my %char_swap_hash = (
    1 => {OLD => '&[^(amp;)|(lt;)|(gt;)]', NEW => '&amp;'}, # don't match legit '&' codes
    2 => {OLD => '<', NEW => '&lt;'},
    3 => {OLD => '>', NEW => '&gt;'}
  );

  $input_stream =~ s/$char_swap_hash{$_}{OLD}/$char_swap_hash{$_}{NEW}/g
    for keys %char_swap_hash;
  return $input_stream;
}

print YAPE::Regex::Explain->new('&[^(amp;)|(lt;)|(gt;)]')->explain;
print convert($str),"\n";

Alternative

my %htm = (
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
);
my $REx = qr'(&(?!(?:amp|lt|gt);)|[><])';
$str =~ s/$REx/$htm{$1}/g;
print $str."\n\n";
print YAPE::Regex::Explain->new($REx)->explain;
poj
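
For anyone without YAPE::Regex::Explain installed, the difference between the two approaches can also be seen by running both patterns over the same sample string. This is only a minimal sketch: the variable names ($class, $look, $by_class, $by_look) are made up for illustration, and it handles only the '&' substitution, not '<' and '>'.

```perl
#!perl
use strict;
use warnings;

# Sample string from the test above: escaped entities plus one raw '&'.
my $str = '&gt; milk &amp; honey &lt; < bill&ben >';

# Character class: '&' followed by ONE character that is not any of ( a m p ; ) | l t g
my $class = qr/&[^(amp;)|(lt;)|(gt;)]/;

# Negative lookahead: '&' that is not the start of amp; lt; or gt;
my $look = qr/&(?!(?:amp|lt|gt);)/;

# Apply each pattern to its own copy of the string.
(my $by_class = $str) =~ s/$class/&amp;/g;
(my $by_look  = $str) =~ s/$look/&amp;/g;

print "class:     $by_class\n";   # class:     &gt; milk &amp; honey &lt; < bill&amp;en >
print "lookahead: $by_look\n";    # lookahead: &gt; milk &amp; honey &lt; < bill&amp;ben >
```

The character class consumes the character after the '&' (the 'b' of 'ben' is lost), and it would also skip any '&' followed by one of the listed characters, such as '&pound;'. The lookahead is zero-width, so nothing after the '&' is eaten.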

Re^6: Interpolating subroutine call in SQL INSERT INTO SELECT statement
by shadowsong (Pilgrim) on Sep 01, 2015 at 08:56 UTC

    poj - you're right...

    While testing it I realized I was getting back dodgy results. It turns out that what I needed was a negative lookahead.

    I ended up changing the value of key 1 in %char_swap_hash to this:

    1 => {OLD => '&(?!(amp;)|(lt;)|(gt;))', NEW => '&amp;'}, # don't match legit '&' codes

    I then saw your solution, which not only confirmed that attempting to use a character class to evaluate lookaheads was bonkers but also introduced me to YAPE::Regex::Explain. At the moment cpanm can't find it, so I am unable to easily install and use it, which is more than a little irritating.

    However, many thanks poj - I appreciate all your help.

      When you get YAPE::Regex::Explain installed the result for your 'old' regex should be

      The regular expression:

      (?-imsx:&[^(amp;)|(lt;)|(gt;)])

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        &                        '&'
      ----------------------------------------------------------------------
        [^(amp;)|(lt;)|(gt;)     any character except: '(', 'a', 'm', 'p',
        ]                        ';', ')', '|', '(', 'l', 't', ';', ')',
                                 '|', '(', 'g', 't', ';', ')'
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      and for mine

      The regular expression:

      (?-imsx:(&(?!(?:amp|lt|gt);)|[><]))

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          &                        '&'
      ----------------------------------------------------------------------
          (?!                      look ahead to see if there is not:
      ----------------------------------------------------------------------
            (?:                      group, but do not capture:
      ----------------------------------------------------------------------
              amp                      'amp'
      ----------------------------------------------------------------------
             |                        OR
      ----------------------------------------------------------------------
              lt                       'lt'
      ----------------------------------------------------------------------
             |                        OR
      ----------------------------------------------------------------------
              gt                       'gt'
      ----------------------------------------------------------------------
            )                        end of grouping
      ----------------------------------------------------------------------
            ;                        ';'
      ----------------------------------------------------------------------
          )                        end of look-ahead
      ----------------------------------------------------------------------
         |                        OR
      ----------------------------------------------------------------------
          [><]                     any character of: '>', '<'
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      and your new regex

      The regular expression:

      (?-imsx:&(?!(amp;)|(lt;)|(gt;)))

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        &                        '&'
      ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
      ----------------------------------------------------------------------
          (                        group and capture to \1:
      ----------------------------------------------------------------------
            amp;                     'amp;'
      ----------------------------------------------------------------------
          )                        end of \1
      ----------------------------------------------------------------------
         |                        OR
      ----------------------------------------------------------------------
          (                        group and capture to \2:
      ----------------------------------------------------------------------
            lt;                      'lt;'
      ----------------------------------------------------------------------
          )                        end of \2
      ----------------------------------------------------------------------
         |                        OR
      ----------------------------------------------------------------------
          (                        group and capture to \3:
      ----------------------------------------------------------------------
            gt;                      'gt;'
      ----------------------------------------------------------------------
          )                        end of \3
      ----------------------------------------------------------------------
        )                        end of look-ahead
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      poj

        poj,

        I just managed to get the YAPE::Regex::Explain module downloaded and installed via cpanm (turned out I had some firewall/proxy issues to resolve). This is an awesome module and I'll definitely be using it more in future.

        Question: waaayy back when I did compiler design, we used a lexical analyzer generator called JLex (a Java implementation of Lex) and leveraged its built-in scanner to generate tokens from an input stream. Its documentation said we would generally get better performance by building the token specifications from the longest possible RegExps: when matching a particular expression, the more of the pattern we can provide, the better. In other words, instead of providing:

        (amp)|(lt)|(gt);

        We ought to provide:

        (amp;)|(lt;)|(gt;)

        What are your thoughts on this? I'm not all that well versed with Perl (or how its RegExp engine functions) so any tips would be most appreciated.

        Thanks
        shadowsong
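
One thing worth checking before comparing the two spellings for speed is that they do not mean the same thing. In Perl, alternation binds more loosely than concatenation, so in (amp)|(lt)|(gt); the ';' belongs to the last branch only. The sketch below is an illustration under stated assumptions: the names $loose, $spelled and $grouped are invented here, and the \A and \z anchors are added only to make the difference visible.

```perl
#!perl
use strict;
use warnings;

# As written in the question: ';' is part of the last branch only.
my $loose   = qr/\A(?:(amp)|(lt)|(gt);)\z/;
# ';' repeated in every branch.
my $spelled = qr/\A(?:(amp;)|(lt;)|(gt;))\z/;
# Grouped alternation followed by a single ';' (the form used earlier in the thread).
my $grouped = qr/\A(?:amp|lt|gt);\z/;

my %result;
for my $s ('amp;', 'amp', 'gt;') {
    # Record yes/no for each pattern, in the order loose, spelled, grouped.
    $result{$s} = join ' ', map { $s =~ $_ ? 'yes' : 'no' } $loose, $spelled, $grouped;
    printf "%-5s loose/spelled/grouped: %s\n", $s, $result{$s};
}
```

Here 'amp;' is rejected by the first pattern and accepted by the other two, while 'amp' is accepted only by the first. As for performance, Perl's regex engine is a backtracking matcher rather than a table-driven scanner of the kind Lex-style tools generate, so the JLex advice may not carry over directly; timing both forms with the core Benchmark module on real input is probably the safest way to find out.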
