Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re: Regex: Identifying comments

by remiah (Hermit)
on Aug 30, 2012 at 13:53 UTC ( #990750=note: print w/replies, xml ) Need Help??

in reply to Regex: Identifying comments

How about this? I am going to match every sql with $2 and print. Ignore $1.

use strict; use warnings; while (<DATA>) { chomp; while (m/(--.*?$) #that is comment | ((?:'[^']*'|[^'-]*)) #quated part of sql or ordinal part /xg) { #global, extended print $2 if $2; } print "\n"; } __DATA__ select 'text' from foo --This is a comment select '--Not a valid comment' from foo --But this is select q from z -- as is this select '--This is not a valid comment' from foo select '--Not this' + '--either' from foo select 'qaws' + make from "a" -- comment with 'a' quote select 'a' from 'b' with 'c' -- comment with 'a --' comment -- test comment (add1) select 'text\'s' from foo --escaped ... (add2) select 'text\'s' from foo --escaped' ... (add3)

I am not so confident with regex and add escaped test.
Then, add2 and add3 is not working well... sigh.

Replies are listed 'Best First'.
Re^2: Regex: Identifying comments
by pvaldes (Chaplain) on Aug 30, 2012 at 18:29 UTC

    IMHO the basic problem with your sql files is that don't look really like sql files, thus anything can be a comment and anything can be wrong, even if perl is acting fine.

    I don't know all databases in the whole world so don't take this advice as immutable, of course, but sql sentences normally aren't complete (and don't run) until you find a semicolon. Is also surprising that you could select a field named '--foo' when you can't even create a table like this:

    create table (--foo int); # error create table ('--foo' int); # error create table ("--foo" int); # error create table (\-\-foo int); # big error

    So you are probably trying to solve a nonexistent problem. Comments are always after a semicolon or in your own line. And it you have i.e this:

    -- ; select * from mytable where foo = "--34"; -- hi ', hi'. "hi'

    The complete line IS a comment here. I'm quoting lines that I don't want to run (temporally) all the time in sql files, it's not unusual to see something like this.

    So you should be pointing your arrows to ";" instead to "'". You probably should check and sanitize first what's in your files, before to pass this files to perl

      Thanks for reply, pvaldes.

      I noticed I misunderstood how to escape single quotes in sql. I thought of the case like this

      insert into foo (bar,boo) values('''a', 'b');
      And this is nothing surprise.
      select '--foo';
      I changed test data with real SQL with your advice. SQLite and Postgres accept these commands.
      drop table foo; create table foo( --tabel foo bar text, --fld name boo text --again ); --populate insert into foo (bar,boo) values('''a', 'b'); -- insert quote's sake insert into foo (bar,boo) values('c', 'd'); -- another line select 'test' || 'abc' as a ; -- maybe you use dual wi +th oracle? select 'test''s' || 'abc' as a ; -- maybe you use dual wi +th oracle? select '--foo';
      So far, so good.

        create table foo( --tabel foo bar text, --fld name boo text --again );

        mmh... this is an interesting example, Yes. Perfectly legitim third type of comments, and difficult to debug with a regex... maybe you can profit that the fact that the type of data are limited with something like:

        m/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/

        select '--foo';

        This is not a very probable situation, but certainly is possible too. In any case this false comment is not after a semicolon, nor at the beginning of the line or inside a table, so if there are a ^\s*select in the same line you probably could safely ignore it. But then you could have something like this:

        select field from mytable where field = 'text, --foo important information here about to be lost';

        The safest actitude (although maybe a little paranoic) should be to isolate and examinate personally any case so special, the idea is: "if you found two - after a ' or a " and before a ";" in a line having the string "select" I want to see it personally"

        You can improve your regex if you check previously for troublesome data:

        select * from mitable where field1 like '%--%' or field2 like '%--%' or field3 like '%--%'... etc ;

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://990750]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (8)
As of 2018-06-23 18:35 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (125 votes). Check out past polls.