Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Regex: Identifying comments

by remiah (Hermit)
on Aug 30, 2012 at 13:53 UTC ( #990750=note: print w/ replies, xml ) Need Help??


in reply to Regex: Identifying comments

How about this? I am going to match every sql with $2 and print. Ignore $1.

use strict; use warnings; while (<DATA>) { chomp; while (m/(--.*?$) #that is comment | ((?:'[^']*'|[^'-]*)) #quated part of sql or ordinal part /xg) { #global, extended print $2 if $2; } print "\n"; } __DATA__ select 'text' from foo --This is a comment select '--Not a valid comment' from foo --But this is select q from z -- as is this select '--This is not a valid comment' from foo select '--Not this' + '--either' from foo select 'qaws' + make from "a" -- comment with 'a' quote select 'a' from 'b' with 'c' -- comment with 'a --' comment -- test comment (add1) select 'text\'s' from foo --escaped ... (add2) select 'text\'s' from foo --escaped' ... (add3)

I am not so confident with regex and add escaped test.
Then, add2 and add3 is not working well... sigh.


Comment on Re: Regex: Identifying comments
Download Code
Replies are listed 'Best First'.
Re^2: Regex: Identifying comments
by pvaldes (Chaplain) on Aug 30, 2012 at 18:29 UTC

    IMHO the basic problem with your sql files is that don't look really like sql files, thus anything can be a comment and anything can be wrong, even if perl is acting fine.

    I don't know all databases in the whole world so don't take this advice as immutable, of course, but sql sentences normally aren't complete (and don't run) until you find a semicolon. Is also surprising that you could select a field named '--foo' when you can't even create a table like this:

    create table (--foo int); # error create table ('--foo' int); # error create table ("--foo" int); # error create table (\-\-foo int); # big error

    So you are probably trying to solve a nonexistent problem. Comments are always after a semicolon or in your own line. And it you have i.e this:

    -- ; select * from mytable where foo = "--34"; -- hi ', hi'. "hi'

    The complete line IS a comment here. I'm quoting lines that I don't want to run (temporally) all the time in sql files, it's not unusual to see something like this.

    So you should be pointing your arrows to ";" instead to "'". You probably should check and sanitize first what's in your files, before to pass this files to perl

      Thanks for reply, pvaldes.

      I noticed I misunderstood how to escape single quotes in sql. I thought of the case like this

      insert into foo (bar,boo) values('''a', 'b');
      And this is nothing surprise.
      select '--foo';
      I changed test data with real SQL with your advice. SQLite and Postgres accept these commands.
      drop table foo; create table foo( --tabel foo bar text, --fld name boo text --again ); --populate insert into foo (bar,boo) values('''a', 'b'); -- insert quote's sake insert into foo (bar,boo) values('c', 'd'); -- another line select 'test' || 'abc' as a ; -- maybe you use dual wi +th oracle? select 'test''s' || 'abc' as a ; -- maybe you use dual wi +th oracle? select '--foo';
      So far, so good.

        create table foo( --tabel foo bar text, --fld name boo text --again );

        mmh... this is an interesting example, Yes. Perfectly legitim third type of comments, and difficult to debug with a regex... maybe you can profit that the fact that the type of data are limited with something like:

        m/(\(|text|int|integer|char \(\d+\)),*\s*--(.*)$/

        select '--foo';

        This is not a very probable situation, but certainly is possible too. In any case this false comment is not after a semicolon, nor at the beginning of the line or inside a table, so if there are a ^\s*select in the same line you probably could safely ignore it. But then you could have something like this:

        select field from mytable where field = 'text, --foo important information here about to be lost';

        The safest actitude (although maybe a little paranoic) should be to isolate and examinate personally any case so special, the idea is: "if you found two - after a ' or a " and before a ";" in a line having the string "select" I want to see it personally"

        You can improve your regex if you check previously for troublesome data:

        select * from mitable where field1 like '%--%' or field2 like '%--%' or field3 like '%--%'... etc ;

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://990750]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (10)
As of 2015-07-29 20:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (268 votes), past polls