PerlMonks  

Matching backslash in regexp negative lookbehind

by 1nickt (Monsignor)
on Jan 02, 2018 at 18:04 UTC ( #1206551=perlquestion )
1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Hello friends, I seek help with what I realize may be an XY problem (so I will describe it generally below).

I've inherited some code that is intended to mask sensitive data in logs. The application sends data to be formatted (flattened) for logging. The data may include JSON strings. The current code uses Data::Dumper to flatten the data. The resulting string may have "sensitive" keys quoted with single quotes (from Data::Dumper) or with double quotes (from JSON). Presumably, it could also contain embedded escaped quotes.

A regular expression is used to do the work. The current implementation is broken. I'm working on a replacement, and first it looks for the "quotation mark" in use to quote the key and the value. I'm using a negative lookbehind to skip escaped quotes. This seems to work in simple matching but what I am having trouble with is using the captured "quotation mark" (including the negative lookbehind to skip escaped quotes) in a character class or in a negative lookahead.

my $param = 'password';

for ( q~{'password' => 'secret'}~, q~{"password" => "sec\"ret"}~ ) {
    $_ =~ s/
        (                   # capture everything up to the start of the value
            (               # capture the quotation mark we are using
                (?<!\\\\)   # not escaped
                [ ' " ]     # either kind of quote
            )               # end capture quotation mark
            $param          # the key
            \2              # the same quotation mark
            \s*             # any amount of space
            (?: => | : )    # perl or JSON key-value "connector"
            \s*             # any amount of space
            \2              # the same quotation mark
        )                   # end capture everything up to start of the value
        (?:                 # group but do not capture the value
            (?!\2) .        # defined as any character except the same quote
        )*                  # any number of times
    /$1***/smxg;            # the closing quotation mark will remain in place
    say $_;
}
This outputs:
{'password' => '***'}
{"password" => "***"ret"}

All suggestions welcome.


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re: Matching backslash in regexp negative lookbehind
by tybalt89 (Curate) on Jan 02, 2018 at 18:44 UTC
    #!/usr/bin/perl -l
    # http://perlmonks.org/?node_id=1206551
    use strict;
    use warnings;

    my $param = 'password';

    for ( q~{'password' => 'secret'}~, q~{"password" => "sec\"ret"}~ ) {
        local $_ = $_;
        s/
            (                   # capture everything up to the start of the value
                (               # capture the quotation mark we are using
                    (?<!\\\\)   # not escaped
                    [ ' " ]     # either kind of quote
                )               # end capture quotation mark
                $param          # the key
                \2              # the same quotation mark
                \s*             # any amount of space
                (?: => | : )    # perl or JSON key-value "connector"
                \s*             # any amount of space
                \2              # the same quotation mark
            )                   # end capture everything up to start of the value
            (?:                 # group but do not capture the value
                \\. |
                (?!\2) .        # defined as any character except the same quote
            )*                  # any number of times
        /$1***/smxg;            # the closing quotation mark will remain in place
        print $_;
    }

    Outputs:

    {'password' => '***'}
    {"password" => "***"}
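
    The difference from the original pattern appears to be the extra `\\. |` alternative, which consumes a backslash together with whatever character follows it before the "not the same quote" test is tried. A minimal sketch of that effect in isolation (the helper name `consumed` and the stripped-down patterns are assumptions made for illustration, not code from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that a "value" pattern consumes
# after the opening quote, with or without the escape-aware alternative.
sub consumed {
    my ($str, $escape_aware) = @_;
    my $re = $escape_aware
        ? qr/\A(['"])((?:\\.|(?!\1).)*)/s   # backslash pairs are eaten first
        : qr/\A(['"])((?:(?!\1).)*)/s;      # every character is tested alone
    return $str =~ $re ? $2 : undef;
}

my $input = q~"sec\"ret"~;          # double-quoted value with an escaped quote

print consumed($input, 0), "\n";    # stops at the escaped quote
print consumed($input, 1), "\n";    # runs to the real closing quote
```

    Without the alternative, the backslash is matched as an ordinary character and the quote after it ends the value early, which is where the stray `ret"` in the original output comes from.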
      [ ' " ]           # either kind of quote

      Note that  [ ' " ] also matches a blank (0x20). Better perhaps to use  ['"] instead.
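
      A quick way to check this, as a small sketch (the variable names are made up for the demonstration): under a single /x, whitespace is ignored outside a bracketed class but is a literal member inside one.

```perl
use strict;
use warnings;

# Under /x, blanks outside [...] are ignored; blanks inside are literal.
my $spaced  = qr/ [ ' " ] /x;   # class contains ', " and a space
my $compact = qr/ ['"]    /x;   # class contains ' and " only

for my $char ( q{'}, q{"}, q{ } ) {
    printf "%-3s spaced=%d compact=%d\n",
        "[$char]",
        ( $char =~ $spaced  ? 1 : 0 ),
        ( $char =~ $compact ? 1 : 0 );
}
```

      Only the last row should differ: the blank matches the spaced class and not the compact one.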


      Give a man a fish:  <%-{-{-{-<

        Yes. But there's another possibility: - quoting Regexp Quote-Like Operators in perlop:
        x   Use extended regular expressions;
        specifying two x's means \t and the SPACE character are ignored within square-bracketed character classes
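
        A short sketch of the doubled modifier, assuming a Perl new enough to support it (/xx was added in 5.26; the variable name is illustrative only):

```perl
use v5.26;       # /xx is not available before Perl 5.26
use warnings;

# With /xx, blanks and tabs inside the bracketed class are ignored too.
my $quote_xx = qr/ [ ' " ] /xx;

say q{ } =~ $quote_xx ? 'space matches' : 'space does not match';
say q{"} =~ $quote_xx ? 'quote matches' : 'quote does not match';
```

        On older Perls the `use v5.26` line makes the script stop with a version error rather than fail obscurely on the modifier.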

      I am in awe of the effortless boxlessness of your mind.

      Thank you very much.


      The way forward always starts with a minimal test.
Re: Matching backslash in regexp negative lookbehind
by haukex (Abbot) on Jan 02, 2018 at 18:18 UTC
Re: Matching backslash in regexp negative lookbehind
by AnomalousMonk (Chancellor) on Jan 02, 2018 at 20:10 UTC

    Wouldn't it be better (or at least easier on the noggin) to use a two-pass approach? (Uses Perl 5.10+  \K regex extension.)

    Output:
    c:\@Work\Perl\monks\1nickt>perl elide_sensitive_1.pl
    ok 1 - '{'password' => 'secret'}' -> '{'password' => '***'}'
    ok 2 - '{'password' => 'sec\'ret'}' -> '{'password' => '***'}'
    ok 3 - '{"password" => "secret"}' -> '{"password" => "***"}'
    ok 4 - '{"password" => "sec\"ret"}' -> '{"password" => "***"}'
    ok 5 - '{'password' : 'secret'}' -> '{'password' : '***'}'
    ok 6 - '{'password' : 'sec\'ret'}' -> '{'password' : '***'}'
    ok 7 - '{"password" : "secret"}' -> '{"password" : "***"}'
    ok 8 - '{"password" : "sec\"ret"}' -> '{"password" : "***"}'
    ok 9 - '{'user' => 'secret'}' -> '{'user' => '***'}'
    ok 10 - '{'user' => 'sec\'ret'}' -> '{'user' => '***'}'
    ok 11 - '{"user" => "secret"}' -> '{"user" => "***"}'
    ok 12 - '{"user" => "sec\"ret"}' -> '{"user" => "***"}'
    ok 13 - '{'user' : 'secret'}' -> '{'user' : '***'}'
    ok 14 - '{'user' : 'sec\'ret'}' -> '{'user' : '***'}'
    ok 15 - '{"user" : "secret"}' -> '{"user" : "***"}'
    ok 16 - '{"user" : "sec\"ret"}' -> '{"user" : "***"}'
    1..16
    ok 17 - no warnings
    1..17
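
    The script behind this output is not reproduced in this copy of the thread. Purely as a guess at the general shape, here is one way a two-pass substitution using \K might look: one pass per quote style, so no backreference to "the same quote" is needed. The key list, `$connector` pattern and `mask_sensitive` name are all assumptions, not AnomalousMonk's code.

```perl
use strict;
use warnings;

# Hypothetical list of sensitive keys.
my @sensitive = qw(password user);
my $key       = join '|', map { quotemeta } @sensitive;
my $connector = qr/ \s* (?: => | : ) \s* /x;   # Perl or JSON key-value connector

# Hypothetical helper: mask the value of every sensitive key in $text.
sub mask_sensitive {
    my ($text) = @_;

    # Pass 1: single-quoted key and value. \K keeps everything matched so
    # far, so only the value itself is replaced.
    $text =~ s/ ' (?:$key) ' $connector ' \K (?: \\. | [^'\\] )* (?=') /***/xsg;

    # Pass 2: the same thing for double quotes.
    $text =~ s/ " (?:$key) " $connector " \K (?: \\. | [^"\\] )* (?=") /***/xsg;

    return $text;
}

print mask_sensitive(q~{'password' => 'sec\'ret'}~), "\n";
print mask_sensitive(q~{"user" : "sec\"ret"}~), "\n";
```

    Splitting by quote style lets each pass use a plain negated character class for the value, at the cost of writing the pattern twice.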


    Give a man a fish:  <%-{-{-{-<

Re: Matching backslash in regexp negative lookbehind
by AnomalousMonk (Chancellor) on Jan 02, 2018 at 22:53 UTC

    I don't understand the significance of the  (?<!\\\\) assertion in the regex in the OP (copied as-is in tybalt89's reply). It suppresses elision of secret values when the single- or double-quoted key phrase is immediately preceded by a  \\ double backslash but not by a single backslash. Is this intended? Output from tybalt89's working code with a few minor output formatting modifications:

    c:\@Work\Perl\monks\1nickt>perl tybalt89_pm1206554_1.pl
    {'password' => 'secret'} -> {'password' => '***'}
    {"password" => "sec\"ret"} -> {"password" => "***"}
    {\'password' => 'secret'} -> {\'password' => '***'}
    {\"password" => "sec\"ret"} -> {\"password" => "***"}
    {\\'password' => 'secret'} -> {\\'password' => 'secret'}
    {\\"password" => "sec\"ret"} -> {\\"password" => "sec\"ret"}
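
    The behaviour can be reproduced with the lookbehind alone. In this sketch (variable and helper names are illustrative), `\\\\` inside the regex stands for two literal backslashes and `\\` for one:

```perl
use strict;
use warnings;

my $two_backslashes = qr/(?<!\\\\)'/;   # quote not preceded by two backslashes
my $one_backslash   = qr/(?<!\\)'/;     # quote not preceded by one backslash

# Hypothetical helper: does the pattern find a quote in the text?
sub quote_is_seen {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

# Inputs: a bare quote, backslash + quote, two backslashes + quote.
for my $input ( q~'~, q~\'~, q~\\\\'~ ) {
    printf "%-4s two=%d one=%d\n", $input,
        quote_is_seen($two_backslashes, $input),
        quote_is_seen($one_backslash,   $input);
}
```

    The four-backslash form only rejects a quote preceded by a double backslash, matching the output above; the two-backslash form rejects a quote preceded by any backslash.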


    Give a man a fish:  <%-{-{-{-<

      The double backslash thing was an artifact, not needed in the code that tybalt89 gave. Thank you for your solutions also.


      The way forward always starts with a minimal test.
Re: Matching backslash in regexp negative lookbehind
by AnomalousMonk (Chancellor) on Jan 02, 2018 at 21:41 UTC

    Here's a single-pass alternative to this (uses the same test-vector set):

    Whether this variation is preferable is very much a matter of taste. The  BEGIN block is required so that certain scalars | lexicals (the  %mask hash in particular,  $connector not so much since it could easily be inlined) will be guaranteed to be properly initialized. Very much dependent on 5.10 regex extensions.


    Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1206551]
Approved by haukex
Front-paged by Discipulus