http://www.perlmonks.org?node_id=592818

xaprb has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, another question related to innotop. Innotop displays currently running queries in MySQL to a terminal, and one user wrote me with troubles caused by gzipped data in queries:

update tbl set col='@@@@', col="@@@@" ...

The @@@@ is gzipped cuss-words that freak his terminal out. The way we'd like this to be handled is to simply replace the field's contents with three question marks:

update tbl set col='???', col="???" ...

Now for the fun part: fields can be delimited with either single or double quotes. If single, doubles can be included and vice versa. If single quotes are used, more single quotes can be embedded, preceded by backslashes. And finally, no substitution should happen unless the field has non-printable characters in it.

A friend and I tried for a while to write a regex to accomplish this, but it gets really confusing. I have a nagging feeling this is a solved problem, but I can't turn it up in web searches.

Here's where we've gotten so far:

$text =~ s/ (?<!\\)' # A quote, NOT after a backslash (?:[^'\p{IsC}]*?(?:\\')?)*? # This is hard to explain... \p{IsC} # At least one non-printable char (?:[^']*?(?:\\')?)*? # More stuff... (?<!\\)' # Another single quote /'???'/gx;

This sorta works for single-quoted fields only. But I have a feeling this is entirely the wrong approach, and I'd like to do it in one pass, not twice (one for single quotes, one for double). Among other reasons, if there's a single-quoted string embedded within double quotes or vice versa, one pass that handles both cases is much preferable.

Where's the a-ha moment? Or is there one?

Thanks in advance!

Replies are listed 'Best First'.
Re: How to replace non-printable characters within delimited field
by GrandFather (Saint) on Jan 03, 2007 at 20:04 UTC
      Indeed, I think I am :-) Thanks!