Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Regex: Identifying comments

by pvaldes (Chaplain)
on Aug 30, 2012 at 18:29 UTC ( #990816=note: print w/ replies, xml ) Need Help??


in reply to Re: Regex: Identifying comments
in thread Regex: Identifying comments

IMHO the basic problem with your sql files is that don't look really like sql files, thus anything can be a comment and anything can be wrong, even if perl is acting fine.

I don't know all databases in the whole world so don't take this advice as immutable, of course, but sql sentences normally aren't complete (and don't run) until you find a semicolon. Is also surprising that you could select a field named '--foo' when you can't even create a table like this:

create table (--foo int); # error create table ('--foo' int); # error create table ("--foo" int); # error create table (\-\-foo int); # big error

So you are probably trying to solve a nonexistent problem. Comments are always after a semicolon or in your own line. And it you have i.e this:

-- ; select * from mytable where foo = "--34"; -- hi ', hi'. "hi'

The complete line IS a comment here. I'm quoting lines that I don't want to run (temporally) all the time in sql files, it's not unusual to see something like this.

So you should be pointing your arrows to ";" instead to "'". You probably should check and sanitize first what's in your files, before to pass this files to perl


Comment on Re^2: Regex: Identifying comments
Select or Download Code
Re^3: Regex: Identifying comments
by remiah (Hermit) on Aug 30, 2012 at 21:58 UTC

    Thanks for reply, pvaldes.

    I noticed I misunderstood how to escape single quotes in sql. I thought of the case like this

    insert into foo (bar,boo) values('''a', 'b');
    And this is nothing surprise.
    select '--foo';
    I changed test data with real SQL with your advice. SQLite and Postgres accept these commands.
    drop table foo; create table foo( --tabel foo bar text, --fld name boo text --again ); --populate insert into foo (bar,boo) values('''a', 'b'); -- insert quote's sake insert into foo (bar,boo) values('c', 'd'); -- another line select 'test' || 'abc' as a ; -- maybe you use dual wi +th oracle? select 'test''s' || 'abc' as a ; -- maybe you use dual wi +th oracle? select '--foo';
    So far, so good.

      create table foo( --tabel foo bar text, --fld name boo text --again );

      mmh... this is an interesting example, Yes. Perfectly legitim third type of comments, and difficult to debug with a regex... maybe you can profit that the fact that the type of data are limited with something like:

      m/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/

      select '--foo';

      This is not a very probable situation, but certainly is possible too. In any case this false comment is not after a semicolon, nor at the beginning of the line or inside a table, so if there are a ^\s*select in the same line you probably could safely ignore it. But then you could have something like this:

      select field from mytable where field = 'text, --foo important information here about to be lost';

      The safest actitude (although maybe a little paranoic) should be to isolate and examinate personally any case so special, the idea is: "if you found two - after a ' or a " and before a ";" in a line having the string "select" I want to see it personally"

      You can improve your regex if you check previously for troublesome data:

      select * from mitable where field1 like '%--%' or field2 like '%--%' or field3 like '%--%'... etc ;

        umm..
        No.

        You will not see my regex fault with your examples. My regex will stumble with this sql.

        update set bar = bar - 1 ; -- subtraction symbol may disappear.

        I expected to see SQL parser solution in this thread, like this

        my $p = SQLParser->new(type=>'mysql', sql=>$sql) or die SQLParser->error(); $p->prettyprint(1); $p->without_comment(1); print $p->sql;
        At first I looked SQL::Parser. It seems quite near for such tasks, but I couldn't find good solution to rip off comments. Do you know such module?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://990816]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (10)
As of 2014-09-16 03:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (155 votes), past polls