
Re^4: Regex: Identifying comments

by pvaldes (Chaplain)
on Aug 30, 2012 at 23:42 UTC ( #990879=note )

in reply to Re^3: Regex: Identifying comments
in thread Regex: Identifying comments

create table foo( --tabel foo
    bar text, --fld name
    boo text --again
);

Hmm... this is an interesting example, yes. It is a perfectly legitimate third type of comment, and difficult to handle with a regex. Maybe you can take advantage of the fact that the set of data types is limited, with something like:

m/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/
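As a rough illustration, the pattern above can be tried against the table definition one line at a time. This is only a sketch: the helper name `trailing_comment` is made up for the example, and it assumes the definition is spread over several lines, one column per line.

```perl
use strict;
use warnings;

# Pattern from the post: an opening paren or a column type, an optional
# comma, optional whitespace, then "--" and the comment text (capture 2).
my $comment_re = qr/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/;

# Hypothetical helper: returns the comment text, or undef if none is found.
sub trailing_comment {
    my ($line) = @_;
    return $line =~ $comment_re ? $2 : undef;
}

# Assumed layout of the table definition: one column per line.
my @ddl = (
    'create table foo( --tabel foo',
    '    bar text, --fld name',
    '    boo text --again',
    ');',
);

for my $line (@ddl) {
    my $comment = trailing_comment($line);
    my $result  = defined $comment ? "comment: '$comment'" : 'no comment';
    printf "%-32s => %s\n", $line, $result;
}
```

The pattern only knows the types listed in the alternation, so any other column type would need to be added by hand.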

select '--foo';

This is not a very likely situation, but it is certainly possible too. In any case, this false comment is not after a semicolon, nor at the beginning of the line, nor inside a table definition, so if there is a ^\s*select on the same line you could probably ignore it safely. But then you could have something like this:

select field from mytable where field = 'text, --foo important information here about to be lost';

The safest attitude (although maybe a little paranoid) would be to isolate and personally examine any such special case. The idea is: "if you find two - after a ' or a " and before a ";" on a line containing the string "select", I want to see it personally".
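One possible reading of that rule, as a sketch only: flag a line for manual inspection when it contains "select" and has "--" somewhere after a quote character and before a semicolon. The function name `needs_review` and the sample lines are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical check: true when the line mentions "select" and a "--"
# appears after a ' or " and before a ";" on the same line.
sub needs_review {
    my ($line) = @_;
    return ( $line =~ /select/i && $line =~ /['"].*--.*;/ ) ? 1 : 0;
}

my @lines = (
    q{select '--foo';},
    q{select field from mytable where field = 'text, --foo important information here about to be lost';},
    q{select q from z; -- an ordinary trailing comment},
);

for my $line (@lines) {
    my $tag = needs_review($line) ? 'REVIEW' : 'ok';
    print "$tag: $line\n";
}
```

The check errs on the side of caution: a genuine comment that happens to contain a semicolon after a quoted string on the same line would also be flagged, which fits the "I want to see it personally" approach.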

You can improve your regex if you first check for troublesome data:

select * from mytable where field1 like '%--%' or field2 like '%--%' or field3 like '%--%'... etc ;

Re^5: Regex: Identifying comments
by remiah (Hermit) on Aug 31, 2012 at 05:37 UTC


    You will not see my regex's fault with your examples. My regex will stumble on this SQL:

    update set bar = bar - 1 ; -- subtraction symbol may disappear.

    I expected to see an SQL parser solution in this thread, something like this:

    my $p = SQLParser->new(type=>'mysql', sql=>$sql) or die SQLParser->error();
    $p->prettyprint(1);
    $p->without_comment(1);
    print $p->sql;

    At first I looked at SQL::Parser. It seems quite close to what such tasks need, but I couldn't find a good way to strip comments with it. Do you know of such a module?

      use strict;
      use warnings;
      while (<DATA>) {
        chomp;
        next if /^\s*--/;
        print $_,"\n" if !/--/;
        elsif (/--/ && !/--(.*?);/){
          s/--(.*?)$//;
          print $_,"\n"}
        elsif (/--(.*?);\s*--(.*?)$/){
          s/\s*--$2//;
          print $_,"\n"}
        else {
          print $_,"\n"}
      }
      __DATA__
      select 'text' from foo; -- comment
      select '--Not comment' from foo; --But this is
      select q from z; -- as is this
      select '--Not this' + '--either' from foo;
      select 'qaws' + make from "a"; -- comment with 'a' quote
      select 'a' from 'b' with 'c' -- comment with 'a --' comment -- test comment
      (add1) select 'text\'s' from foo --escaped ...
      (add2) select 'text\'s' from foo --escaped' ...
      (add3) update set bar = bar - 1 ; -- subtraction symbol preserved
      create table ( -- fo
      field text, -- fufufu
      field int)

        I think you need to revisit this...

        syntax error at line 7, near "elsif"
        syntax error at line 10, near "elsif"
        syntax error at line 12, near ""\n"}"
        Execution of aborted due to compilation errors.
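The errors arise because `print ... if ...;` is a complete statement using a statement modifier, so the `elsif` that follows has no `if (...) { ... }` block to attach to. Rather than patching the chain of conditions, here is a minimal sketch of a different technique: walk each line token by token, consume quoted strings whole, and cut at the first `--` found outside a string. The name `strip_comment` is made up for the example, and treating a backslash as an escape inside strings is an assumption; it is not a full SQL parser.

```perl
use strict;
use warnings;

# Return $line with any trailing "--" comment removed. Quoted strings are
# consumed whole, so "--" inside '...' or "..." is left alone.
# Assumption: a backslash escapes the next character inside a string.
sub strip_comment {
    my ($line) = @_;
    my $out = '';
    while ( $line =~ /\G(?:('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")|(--.*)|(.))/gs ) {
        last if defined $2;              # unquoted "--": the rest is a comment
        $out .= defined $1 ? $1 : $3;    # a whole quoted string, or one character
    }
    $out =~ s/\s+$//;                    # drop whitespace left before the comment
    return $out;
}

# A few of the sample lines from the thread.
my @sql = (
    q{select 'text' from foo; -- comment},
    q{select '--Not comment' from foo; --But this is},
    q{select '--Not this' + '--either' from foo;},
    q{select 'text\'s' from foo --escaped' ...},
    q{update set bar = bar - 1 ; -- subtraction symbol preserved},
    q{  -- a comment-only line},
);

for my $line (@sql) {
    my $clean = strip_comment($line);
    print "$clean\n" if length $clean;   # skip lines that were only a comment
}
```

Because a single `-` never matches the comment alternative, the subtraction in the `update` line survives. An unterminated quote is treated as an ordinary character, so input with broken quoting may still be cut in the wrong place.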
