Kozz has asked for the wisdom of the Perl Monks concerning the following question:
Most esteemed monks:
I'm wrestling with this bit of code... I have modified the record separator, but I think it's interfering with my replacement regex. This should be darn simple, but I can't make it work. The concept is simple: make the record separator ";\n" and filter out all lines that start with a # (comments).
#!/usr/bin/perl
$/ = ";\n";
while (my $line = <DATA> ){
$line =~ s/^#[^\r\n]*//g; # get rid of any comments
print "Query: $line\n";
}
__DATA__
# one comment
# two comment
# another comment
insert into table_name values(1, 'testing 1 2 3');
# more comments
insert into table_name values (2, 'test &#149;');
Yes, the file I'm reading is SQL stuff. But I'm not directly importing it with mysql tools because I cannot - they're not available. So I'm doing it the "hard way". I think that perhaps I have to do a "local $/" inside the while loop to change the rec_sep which affects my regular expression, but I'm not sure. I want to make sure that neither a too-lenient record separator will mangle the second insert (which contains a semicolon in a value), nor will the comment-deleting line mangle the second insert which also contains a pound-symbol (octothorpe).
What the heck am I overlooking? It doesn't take care of all the comments, just the stuff anchored to the beginning of the entire string. I must be using anchors wrong, or using the pattern modifiers incorrectly. I've monkeyed with them but with no luck. I thought I had regex basics whipped, but clearly I don't. I feel so humbled.
Re: Record Separator affecting Regex
by Mr. Muskrat (Canon) on Nov 07, 2002 at 18:56 UTC
If you really need to redefine the record separator, you could do something like this:
#!/usr/bin/perl
use strict;
use warnings;
$/ = ";\n";
{
local $/ = "\n";
while (my $line = <DATA> ){
$line =~ s/^#[^\r\n]*//g; # get rid of any comments
print "Query: $line\n" if ($line !~ /^\s+$/);
}
}
__DATA__
# one comment
# two comment
# another comment
insert into table_name values(1, 'testing 1 2 3');
# more comments
insert into table_name values (2, 'test &#149;');
Output:
Query: insert into table_name values(1, 'testing 1 2 3');
Query: insert into table_name values (2, 'test &#149;');
Re: Record Separator affecting Regex
by Bird (Pilgrim) on Nov 07, 2002 at 19:09 UTC
I think what you're looking for is the /m modifier. It lets ^ and $ match at the start and end of every line inside a multiline string, not just at the start and end of the whole string. Since it appears you need to worry about multiline queries (otherwise, why modify the record separator in the first place?), I changed your data to include one.
$/ = ";\n";
while (my $line = <DATA> ){
$line =~ s/^#[^\r\n]*//mg;
print "Query: $line\n";
}
__DATA__
# one comment
# two comment
# another comment
insert into table_name
values(1, 'testing 1 2 3');
# more comments
insert into table_name values (2, 'test &#149;');
...gives...
Query:
insert into table_name
values(1, 'testing 1 2 3');
Query:
insert into table_name values (2, 'test &#149;');
You could also add $line =~ s/^\s*$//mg; if you want to get rid of some of those blank lines.
-Bird
Update: I like insensate's solution for removing the blank lines better. Use that one. :)
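A minimal sketch of what /m changes, using a made-up three-line record. The helper name strip_comments and the sample text are illustrative assumptions, not code from this thread. Without /m, ^ matches only at the very start of the record, so only a comment on the first line is removed.

```perl
use strict;
use warnings;

# Hypothetical helper: remove '#' comment text from one record.
# $multiline chooses whether ^ may also match after embedded newlines (/m).
sub strip_comments {
    my ($record, $multiline) = @_;
    if ($multiline) {
        $record =~ s/^#[^\r\n]*//mg;   # ^ matches at the start of every line
    } else {
        $record =~ s/^#[^\r\n]*//g;    # ^ matches only at the start of the string
    }
    return $record;
}

my $record = "# first\n# second\ninsert into t values (1);\n";

print "without /m:\n", strip_comments($record, 0);
print "with /m:\n",    strip_comments($record, 1);
```

In the first call the "# second" line survives; in the second call both comment lines are emptied, leaving only blank lines ahead of the statement.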
local $/ = ";\n";
while ( <> )
{
s/^#.*$//mg; # kill comments
s/^\s+//; # kill all remaining leading whitespace
print "Query: $_\n";
}
You're absolutely right -- the /m modifier was *exactly* what I needed, and yes, I do have multi-line commands, like create table statements and such. Thanks!
Re: Record Separator affecting Regex
by insensate (Hermit) on Nov 07, 2002 at 19:11 UTC
Would something like this suffice? The /m modifier lets the ^ metacharacter match immediately after a newline in your multiline $line value.
#!/usr/bin/perl
$/ = ";\n";
while (my $line = <DATA> ){
$line =~ s/^#.*[\n\r]*//gm;
print "Query: $line" unless $line=~/^\s+$/;
}
__DATA__
# one comment
# two comment
# another comment
insert into table_name values(1, 'testing 1 2 3');
# more comments
insert into table_name values (2, 'test &#149;');
OUTPUT:
Query: insert into table_name values(1, 'testing 1 2 3');
Query: insert into table_name values (2, 'test &#149;');
Re: Record Separator affecting Regex
by jdporter (Chancellor) on Nov 07, 2002 at 19:13 UTC
The problem is that comments and statements are terminated by two different things, and you can expect to see the two types of elements intermingled.
What I would do, if the file isn't grotesquely huge, is read it all into $_, remove the comments, and then split.
$_ = do { local $/; <DATA> }; # with $/ undefined, one read slurps the whole file
s/^#.*//gm; # updated
my @lines = split /;\n+/;
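The slurp-then-split idea can be sketched as a small self-contained function. The name split_statements, the whitespace trimming, and the sample SQL (including the value 'a;b #c') are assumptions made for illustration. The sketch checks the two worries from the original question: a semicolon inside a value and a # inside a value.

```perl
use strict;
use warnings;

# Hypothetical helper: strip '#' comment lines, then split on ";\n".
sub split_statements {
    my ($sql) = @_;
    $sql =~ s/^#.*\n?//mg;                  # drop whole lines that start with '#'
    my @statements = split /;\n+/, $sql;    # split only where ';' ends a line
    s/^\s+|\s+$//g for @statements;         # trim surrounding whitespace
    return grep { length } @statements;     # skip empty pieces
}

my $sql = <<'END_SQL';
# a comment
insert into table_name values (1, 'testing 1 2 3');
# another comment
insert into table_name
  values (2, 'a;b #c');
END_SQL

print "Query: $_;\n" for split_statements($sql);
```

The ';' inside the quoted value is left alone because it is not followed by a newline, and the '#' inside the value is left alone because it is not at the start of a line. A string literal that itself spans lines and has a line starting with '#', or that contains ";\n", would still be mangled; handling that needs a real SQL tokenizer.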
Re: Record Separator affecting Regex
by insensate (Hermit) on Nov 07, 2002 at 19:45 UTC
I'm not sure what the output file will be used for... I'm an Oracle guy and routinely create files to be executed by a sqlloader/sqlplus application...in this context sometimes simple statements such as:
set lines 32
set pages 0
set feedback off
etc... need to be executed and don't require semicolons, just newlines. If this is the case for you as well, make sure that your script doesn't end up managing these statements in an undesirable fashion.
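A short sketch of what happens to such newline-terminated commands when $/ is ";\n". The helper read_records and the sample script text are illustrative assumptions. The set commands have no semicolon, so they are swallowed into the record of the next statement that does.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from a string using ";\n" as the separator.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = ";\n";          # restored automatically when the sub returns
    my @records = <$fh>;       # list context: one element per record
    close $fh;
    return @records;
}

my $script  = "set pages 0\nset feedback off\nselect 1 from dual;\nselect 2 from dual;\n";
my @records = read_records($script);

print scalar(@records), " records\n";
print "Record: $_" for @records;
```

Here the first record contains both set lines plus the first select, so anything that sends one record per call to the database would pass the set commands along as part of the query.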
Re: Record Separator affecting Regex
by dingus (Friar) on Nov 08, 2002 at 07:42 UTC
... This should be darn simple, but I can't make it work. It's quite a simple concept: make the record separator ";\n" and filter-out all lines that start with a # -- comments.
First off: are you sure every line really ends in a bare "\n"? Perl's default $/ is "\n" on every platform, but a file written on Windows and read elsewhere keeps its "\r\n" line endings, and then ";\n" never matches. In any case, one way to build the new separator from whatever $/ currently holds is (with local if required):
$/= ';'.$/
But actually I think you should not be monkeying with $/ at all, and should just skip over # lines as you read them in, i.e.
while (my $line = <DATA> ){
next if $line =~ /^#/; # skip comment lines
print "Query: $line\n";
}
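The loop above prints one line at a time, so a statement that spans several lines comes out in pieces. A possible variation, sketched here under assumptions and not taken from any post in this thread, keeps the default $/ and accumulates lines until one ends in a semicolon. The name collect_statements and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect statements line by line with the default $/.
sub collect_statements {
    my ($fh) = @_;
    my @statements;
    my $current = '';
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                          # strip LF or CRLF line ending
        next if $line =~ /^#/;                         # skip comment lines
        next if $current eq '' && $line =~ /^\s*\z/;   # skip blank lines between statements
        $current .= ($current eq '' ? '' : "\n") . $line;
        if ($line =~ /;\s*\z/) {                       # a trailing ';' ends the statement
            push @statements, $current;
            $current = '';
        }
    }
    push @statements, $current if $current ne '';      # unterminated tail, if any
    return @statements;
}

my $text = "# comment\r\ninsert into t\r\n  values (1, 'a;b #c');\r\n"
         . "# more\r\ninsert into t values (2, 'x');\r\n";
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
print "Query: $_\n" for collect_statements($in);
close $in;
```

Because the line ending is stripped explicitly, the same code copes with both "\n" and "\r\n" files, which sidesteps the separator question raised at the top of this post.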
Dingus
Enter any 47-digit prime number to continue.