Regex error when [] occurs in file..

on Mar 03, 2008 at 15:32 UTC

Dear Monks,

I'm parsing a configuration file, so before processing it I want to remove comments, which I assume follow Perl's style (i.e. one-line comments starting with '#', which can occur after other text that must be left intact).

Here is my code. It works fine unless it comes across '[]' in a comment in my config file (it works fine if the '[]' is outside a comment), when it throws this:
Unmatched [ in regex; marked by <-- HERE in m/ ...some text... [<-- HERE] ...some more text..

while (chomp(my $temp=<INPUT>)){
    print Dumper $temp;
    if ($temp =~ /#/){
        $temp =~ s/#$'//;
    }
    if ($temp =~ /^\s*$/){
        next;
    }
    print "after regex:\n";
    print Dumper $temp;
    print "end\n";
}

I'm a bit new with regexes so chances are I've missed something obvious---any ideas?

Re: Regex error when [] occurs in file..
by moritz (Cardinal) on Mar 03, 2008 at 15:38 UTC
    if ($temp =~ /#/){ $temp =~ s/#$'//; }

    That's what's causing the problem: $' can contain arbitrary data, but you try to treat it as a regex.

    The "good" solution is to use this regex instead: $temp =~ s/#.*$//;

    In general you can also quote interpolated variables, then they are treated as text, not as regexes:

    my $variable = '[a-z]';
    m/\Q$variable\E/;   # matches literal [a-z], not a character class.

    If you're not inside a regex, quotemeta does the same job.
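
    The three options above can be compared side by side. This is only a minimal sketch: the sample line and the variable names are made up for illustration and are not taken from the original config file.

```perl
use strict;
use warnings;

# Hypothetical sample line: the comment contains regex metacharacters.
my $line = 'port = 8080 # default [see docs]';

# Option 1: strip from the first '#' to the end of the line.
# Nothing is interpolated, so the comment text is never compiled as a regex.
(my $stripped = $line) =~ s/#.*$//;
print "stripped: '$stripped'\n";

# Option 2: if a variable must be interpolated, \Q...\E makes it literal.
my $comment = '# default [see docs]';
(my $quoted = $line) =~ s/\Q$comment\E//;
print "quoted:   '$quoted'\n";

# Option 3: quotemeta applies the same escaping outside a regex.
my $escaped = quotemeta('[a-z]');
print "escaped:  $escaped\n";
```

    Both substitutions leave 'port = 8080 ' behind, trailing space included, so a further s/\s+$// may be wanted if trailing whitespace matters.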

      If you happen to have a '#' character inside a (single- or double-) quoted string in your config file (I don't know whether the spec for your config file even allows this), then the s/#.*$// regex will cause you trouble, as it deletes everything from the '#' onward, including the rest of the string. That is probably not what you want.

      It is not easy to take care of this: not even Regexp::Common gets it right.
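
      For simple cases, a pattern that skips over quoted strings before looking for '#' may be enough. The sketch below is an illustration only: strip_comment is a hypothetical helper, and it assumes quoted strings never contain escaped quotes and never span lines, which is exactly where the general problem gets hard.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing '#' comment unless the '#' sits
# inside a simple single- or double-quoted string (no escaped quotes).
sub strip_comment {
    my ($line) = @_;
    # Consume ordinary characters and whole quoted strings, then drop
    # everything from the first '#' found outside a quoted string.
    $line =~ s/^((?:[^'"#]|"[^"]*"|'[^']*')*)#.*/$1/;
    return $line;
}

print strip_comment(q{name = "a#b" # real comment}), "\n";
print strip_comment(q{color = '#fff'}), "\n";
print strip_comment(q{plain = 1 # note}), "\n";
```

      A line with an unterminated quote is left untouched, because the pattern cannot reach a '#' outside a quoted string.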


Re: Regex error when [] occurs in file..
by Joost (Canon) on Mar 03, 2008 at 15:42 UTC
    $temp =~ s/#$'//;

    I'm not even sure what you want that to do, but it constructs a regex out of # followed by whatever followed the previous match. Any regex special characters in that generated string will be interpreted as regex directives.
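
    To see the failure in isolation, the error can be trapped with eval. This is a small sketch with a made-up input line; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $temp  = 'key = value # see [notes';   # hypothetical config line
my $after = '';
my $error = '';

if ($temp =~ /#/) {
    $after = $';   # everything that followed the matched '#'

    # Interpolating that text builds the pattern /# see [notes/, which
    # cannot be compiled because the '[' opens a class that never closes.
    my $ok = eval { $temp =~ s/#$after//; 1 };
    $error = $@ unless $ok;
}

print "postmatch: '$after'\n";
print "error: $error";
```

    The die happens at run time, when the interpolated pattern is compiled, which is why a block eval is able to catch it.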

    You probably want something like:

    while (chomp(my $temp=<INPUT>)){
        print Dumper $temp;
        $temp =~ s/#.*//; # remove comments
        if ($temp =~ /^\s*$/){
            next;
        }
        print "after regex:\n";
        print Dumper $temp;
        print "end\n";
    }

      Thanks both---I didn't realise that $' was treated like that. Are all 'special variables' ($&, $N (N is integer)) expanded in that way too? So if you did something like:
      $temp =~ m/(\[0-9\])blah$1/;
      would you match
      $temp = "[0-9]blah6";
      rather than
      $temp = "[0-9]blah[0-9]";
      V. interesting---I assumed that special characters inside the rest of the data would be ignored..
      <--edit to make the last sentence make more sense!-->
        $temp =~ m/(\[0-9\])blah$1/;

        I think you meant

        $temp =~ m/(\[0-9\])blah\1/;

        in which case any special characters in the content of the backreference \1 would not be treated special. IOW, "[0-9]blah[0-9]" would match, but not "[0-9]blah6":

        #!/usr/bin/perl
        use strict;
        use warnings;

        for my $temp ("[0-9]blah[0-9]", "[0-9]blah6") {
            printf "%-15s ", $temp;
            if ($temp =~ /(\[0-9\])blah\1/) {
                print "matched\n";
            } else {
                print "didn't match\n";
            }
        }


        [0-9]blah[0-9]  matched
        [0-9]blah6      didn't match

        while, if you replace \1 with $1 in the above regex, it prints

        Use of uninitialized value in concatenation (.) or string at ./671663.pl line 8.
        [0-9]blah[0-9]  matched
        [0-9]blah6      matched

        This is because $1 isn't defined here, thus the regex effectively becomes /(\[0-9\])blah/...
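
        When $1 is defined by an earlier successful match, it is interpolated like any other variable, so its contents are compiled as a pattern. A short sketch of that case follows; the variable names are illustrative only.

```perl
use strict;
use warnings;

my ($as_pattern, $as_literal, $literal_self) = (0, 0, 0);

# A successful match first, so that $1 is defined: it holds the text "[0-9]".
if ('[0-9]blah' =~ /(\[0-9\])blah/) {
    my $captured = $1;   # copy it before a later match resets $1

    # Interpolated bare, the captured text is compiled as a character class.
    $as_pattern = ('6' =~ /\A$captured\z/) ? 1 : 0;

    # Wrapped in \Q...\E, the same text only matches itself.
    $as_literal   = ('6'     =~ /\A\Q$captured\E\z/) ? 1 : 0;
    $literal_self = ('[0-9]' =~ /\A\Q$captured\E\z/) ? 1 : 0;
}

print "bare pattern matches '6':     $as_pattern\n";
print "quoted pattern matches '6':   $as_literal\n";
print "quoted pattern matches class: $literal_self\n";
```

        So the answer to the question is yes for interpolated variables such as $1 and $&, while the backreference \1 always compares against the captured text literally.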

        Update: added demo code.

Re: Regex error when [] occurs in file..
by ysth (Canon) on Mar 03, 2008 at 21:27 UTC
    while (chomp(my $temp=<INPUT>)){
    Don't do that. Do your chomp inside the loop. Otherwise, you'll get a warning (you do have warnings enabled, don't you?) when you reach the end and <INPUT> returns undef, since chomp expects a string, not undef.
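
    A sketch of the loop with the chomp moved inside. The in-memory filehandle and the sample data are assumptions made so the example runs on its own. Note that chomp returns the number of characters it removed, so with chomp in the while condition a final line lacking a newline would also end the loop before being processed.

```perl
use strict;
use warnings;

# Hypothetical config text; the last line deliberately has no newline.
my $config = "a = 1 # first\n\n# only a comment\nb = 2";
open my $in, '<', \$config or die "Cannot open in-memory file: $!";

my @kept;
while (my $temp = <$in>) {      # loop ends only when readline returns undef
    chomp $temp;                # safe: $temp is always a defined string here
    $temp =~ s/#.*//;           # remove comments
    next if $temp =~ /^\s*$/;   # skip lines that are now empty
    push @kept, $temp;
}
close $in;

print "$_\n" for @kept;
```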

