http://www.perlmonks.org?node_id=990859


in reply to Re^2: Regex: Identifying comments
in thread Regex: Identifying comments

Thanks for the reply, pvaldes.

I noticed I had misunderstood how to escape single quotes in SQL. I was thinking of a case like this:

insert into foo (bar,boo) values('''a', 'b');
And this one is no surprise:
select '--foo';
Following your advice, I replaced my test data with real SQL. SQLite and Postgres both accept these commands.
drop table foo;
create table foo( --tabel foo
  bar text, --fld name
  boo text --again
);
--populate
insert into foo (bar,boo) values('''a', 'b'); -- insert quote's sake
insert into foo (bar,boo) values('c', 'd'); -- another line
select 'test' || 'abc' as a ; -- maybe you use dual with oracle?
select 'test''s' || 'abc' as a ; -- maybe you use dual with oracle?
select '--foo';
So far, so good.

Re^4: Regex: Identifying comments
by pvaldes (Chaplain) on Aug 30, 2012 at 23:42 UTC
    create table foo( --tabel foo
      bar text, --fld name
      boo text --again
    );

    Hmm... this is an interesting example, yes. A perfectly legitimate third type of comment, and difficult to catch with a regex... maybe you can take advantage of the fact that the set of data types is limited, with something like:

    m/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/
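To see what this pattern catches and what it misses, here is a minimal sketch that runs it, unchanged, against lines from this thread. The helper name `trailing_comment` is made up for illustration; only the regex comes from the post.

```perl
use strict;
use warnings;

# The pattern from the post, unchanged: an opening parenthesis or a column
# type, optional commas and whitespace, then "--" and the comment text.
my $type_comment = qr/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/;

# Hypothetical helper: returns the comment text, or undef if no match.
sub trailing_comment {
    my ($line) = @_;
    return $line =~ $type_comment ? $2 : undef;
}

for my $line (
    'create table foo( --tabel foo',
    '  bar text, --fld name',
    '  boo text --again',
    q{select '--foo';},
    q{select field from mytable where field = 'text, --foo';},
) {
    my $comment = trailing_comment($line);
    printf "%-58s => %s\n", $line,
        defined $comment ? "comment [$comment]" : 'no comment';
}
```

The three `create table` lines match and `select '--foo';` does not, which is the intended behaviour. The last line is a false positive: the word `text` followed by a comma and `--` inside a string literal looks exactly like a column definition to this pattern, so it is probably best treated as a filter for table definitions only.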

    select '--foo';

    This is not a very likely situation, but it is certainly possible too. In any case, this false comment is not after a semicolon, nor at the beginning of the line, nor inside a table definition, so if there is a ^\s*select on the same line you could probably ignore it safely. But then you could have something like this:

    select field from mytable where field = 'text, --foo important information here about to be lost';

    The safest attitude (although maybe a little paranoid) would be to isolate and personally examine any such special case. The idea is: "if you find two - after a ' or a " and before a ";" in a line containing the string "select", I want to see it personally".
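One possible reading of that rule, as a rough sketch: flag a line for manual review when it contains the word "select" and has `--` somewhere after a quote character and before a semicolon. The function name `needs_review` and the exact pattern are assumptions made for illustration, not something given in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the line mentions "select" and has
# "--" after a quote character and before a semicolon, 0 otherwise.
sub needs_review {
    my ($line) = @_;
    return 0 unless $line =~ /\bselect\b/i;
    # A quote, then "--" and finally ";" with no other ";" in between.
    return $line =~ /['"][^;]*--[^;]*;/ ? 1 : 0;
}

for my $line (
    q{select '--foo';},
    q{select field from mytable where field = 'text, --foo';},
    q{select 'text' from foo; -- comment},
    q{update set bar = bar - 1 ; -- subtraction},
) {
    print needs_review($line) ? 'REVIEW' : 'ok    ', "  $line\n";
}
```

This only sorts lines into "look at it yourself" and "probably fine"; it does not try to decide what is a comment. A string literal that itself contains a semicolon would confuse it.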

    You can improve your regex if you check beforehand for troublesome data:

    select * from mitable where field1 like '%--%' or field2 like '%--%' or field3 like '%--%'... etc ;

      umm..
      No.

      You will not see where my regex fails with your examples. My regex stumbles on this SQL:

      update set bar = bar - 1 ; -- subtraction symbol may disappear.

      I expected to see an SQL parser solution in this thread, something like this:

      my $p = SQLParser->new(type=>'mysql', sql=>$sql) or die SQLParser->error();
      $p->prettyprint(1);
      $p->without_comment(1);
      print $p->sql;

      At first I looked at SQL::Parser. It seems quite close to what such tasks need, but I couldn't find a good way to strip comments with it. Do you know of such a module?
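Short of a full parser, one common technique is a single left-to-right pass that matches either a quoted literal or a `--` comment at each position, keeps the literal and drops the comment. The sketch below is hand-rolled and `strip_sql_comments` is a made-up name, not an API of SQL::Parser or any other module. It assumes standard SQL quoting (a doubled quote escapes a quote) and handles only `--` comments, not `/* ... */`, backslash escapes, or PostgreSQL dollar quoting.

```perl
use strict;
use warnings;

# Hypothetical helper. Because quoted literals are tried first and consumed
# whole, a "--" inside quotes never reaches the comment alternative, and a
# single "-" (as in "bar - 1") matches neither alternative.
sub strip_sql_comments {
    my ($sql) = @_;
    $sql =~ s{
        (                         # $1: something to keep
            ' (?: [^'] | '' )* '  #   single-quoted string literal
          | " (?: [^"] | "" )* "  #   double-quoted identifier
        )
      |
        [ \t]* -- [^\n]*          # line comment, with the blanks before it
    }{ defined $1 ? $1 : '' }gex;
    return $sql;
}

my $sql = join "\n",
    q{create table foo( --tabel foo},
    q{  bar text, --fld name},
    q{  boo text --again},
    q{);},
    q{insert into foo (bar,boo) values('''a', 'b'); -- insert quote's sake},
    q{select '--foo';},
    q{update set bar = bar - 1 ; -- subtraction symbol may disappear.};

print strip_sql_comments($sql), "\n";
```

The design choice is to let the regex engine do the bookkeeping: there is no need to count quotes per line, and comments inside multi-line statements are handled the same way as trailing ones.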

        use strict;
        use warnings;

        while (<DATA>) {
            chomp;
            next if /^\s*--/;                     # whole-line comment
            if (!/--/) {                          # no dashes at all
                print $_, "\n";
            }
            elsif (/--/ && !/--(.*?);/) {         # dashes, but no ";" after them
                s/--(.*?)$//;
                print $_, "\n";
            }
            elsif (/--(.*?);\s*--(.*?)$/) {       # dashes, ";", then a comment
                s/\s*--\Q$2\E$//;                 # \Q..\E: treat comment text literally
                print $_, "\n";
            }
            else {
                print $_, "\n";
            }
        }
        __DATA__
        select 'text' from foo; -- comment
        select '--Not comment' from foo; --But this is
        select q from z; -- as is this
        select '--Not this' + '--either' from foo;
        select 'qaws' + make from "a"; -- comment with 'a' quote
        select 'a' from 'b' with 'c' -- comment with 'a --' comment
        -- test comment
        (add1) select 'text\'s' from foo --escaped ...
        (add2) select 'text\'s' from foo --escaped' ...
        (add3) update set bar = bar - 1 ; -- subtraction symbol preserved
        create table ( -- fo
        field text, -- fufufu
        field int)