
Bug in glob2pat?

by cdarke (Prior) on Sep 02, 2008 at 14:13 UTC
cdarke has asked for the wisdom of the Perl Monks concerning the following question:

The subroutine glob2pat from the Perl Cookbook is well known and often quoted (including in a couple of nodes here), but I've just used it for the first time. I find it hard to believe that I'm the first person to find a bug in it, yet SuperSearch on glob2pat didn't show the bug, and neither did Google.
[0-9] gets converted to ^[0\-9]$.
In other words, it does not convert a range correctly, because the hyphen is escaped. It is the \Q that does it. My solution is simple:
sub glob2pat {
    my $globstr = shift;
    my %patmap = (
        '*' => '.*',
        '?' => '.',
        '[' => '[',
        ']' => ']',
        '-' => '-',    # Added by me
    );
    $globstr =~ s{(.)} { $patmap{$1} || "\Q$1" }ge;
    return '^' . $globstr . '$';
}
Or am I wrong? This really puzzles me, because it has been around for so long.
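A minimal sketch of the problem and the fix, using only core Perl: \Q calls quotemeta, which escapes every non-word character, so the hyphen in a range comes out as \-. The test glob 'file[0-9].txt' is made up for illustration.

```perl
use strict;
use warnings;

# \Q (quotemeta) backslash-escapes every non-word character, hyphen included,
# which is why the Cookbook version turns [0-9] into [0\-9].
print quotemeta('-'), "\n";    # prints \-

# The patched glob2pat from the question, with '-' passed through unescaped.
sub glob2pat {
    my $globstr = shift;
    my %patmap = (
        '*' => '.*',
        '?' => '.',
        '[' => '[',
        ']' => ']',
        '-' => '-',
    );
    $globstr =~ s{(.)} { $patmap{$1} || "\Q$1" }ge;
    return '^' . $globstr . '$';
}

my $re = glob2pat('file[0-9].txt');
print "$re\n";    # ^file[0-9]\.txt$

# The range now behaves as a range: a digit matches, a literal '-' does not.
for my $name (qw(file7.txt file-.txt)) {
    printf "%-10s %s\n", $name, ($name =~ /$re/ ? 'matches' : 'no match');
}
```

With the original Cookbook hash the same glob gives ^file[0\-9]\.txt$, a class of just the three characters 0, - and 9, so file7.txt would not match while file-.txt would.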

Update: I have since realised that neither the original code nor my "solution" copes with escaped characters such as \?, \[ or \*. To make matters worse, these characters should not be converted at all inside [], but they are converted regardless.
Update 2: The following handles the escaped characters, but still translates ? and * inside [], which should be left alone:
$globstr =~ s{(?:^|(?<=[^\\]))(.)} { $patmap{$1} || "\Q$1" }ge;
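A sketch of the Update 2 substitution in context, assuming the '\\' => '\\' hash entry that is suggested later in this thread. Without that entry, \Q doubles the backslash itself, so \? would come out as \\?, which in a regex means "an optional literal backslash". The sub name glob2pat_escaped is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical name: glob2pat with the Update 2 look-behind.
# Assumption: '\\' => '\\' is in %patmap so a backslash is copied through
# unchanged; the look-behind then leaves the character after it untouched.
sub glob2pat_escaped {
    my $globstr = shift;
    my %patmap = (
        '*'  => '.*',
        '?'  => '.',
        '['  => '[',
        ']'  => ']',
        '-'  => '-',
        '\\' => '\\',
    );
    # Only rewrite characters that are at the start of the string or are
    # not preceded by a backslash in the original glob.
    $globstr =~ s{(?:^|(?<=[^\\]))(.)} { $patmap{$1} || "\Q$1" }ge;
    return '^' . $globstr . '$';
}

my $re = glob2pat_escaped('a\*b*');
print "$re\n";    # ^a\*b.*$

my $ok = 'a*bcd' =~ /$re/;
print $ok ? "literal star matched\n" : "no match\n";
```

As the update says, this still rewrites ? and * inside [], because the look-behind only knows about the preceding character, not about bracket context.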

Re: Bug in glob2pat?
by cdarke (Prior) on Sep 03, 2008 at 08:46 UTC
    I realised that there are even more problems with the original glob2pat in the Cookbook. The POSIX standard says that the only character inside [] that changes between glob and regular expressions is !, which becomes ^. The Cookbook code does not do that conversion, but it does make other conversions that are wrong, and it escapes other special characters, such as the ':' in a POSIX character class.

    I gave up trying to use an RE to do this and ended up with a brute-force approach. Elegant it is not, but it does work:
    sub glob2pat {
        my $globstr = shift;
        my $inside_br = 0;
        my @chars = split '', $globstr;

        # C style used because I need to skip ahead and look behind
        for (my $i = 0; $i < @chars; $i++) {
            if ($chars[$i] eq '\\') {
                $i++;                      # ignore next char
            }
            elsif ($chars[$i] eq '[') {
                $inside_br++;              # Allow for nested []
            }
            elsif ($chars[$i] eq ']' && $inside_br) {
                $inside_br--;
            }
            elsif ($chars[$i] eq '!' && $inside_br && $chars[$i-1] eq '[') {
                # ! only means 'not' at the front of the [] list
                $chars[$i] = '^';
            }
            elsif (! $inside_br) {
                if    ($chars[$i] eq '*') { $chars[$i] = '.*' }
                elsif ($chars[$i] eq '?') { $chars[$i] = '.' }
            }
        }
        local $" = '';
        return "^@chars\$";
    }
    Improvements are welcome, of course. Here is one of my test patterns: '?\?*  [!-0-9?[:upper:]!*?]*'
    which gives: '^.\?.*  [^-0-9?[:upper:]!*?].*$'
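One possible improvement, offered as a sketch rather than a tested replacement: the brute-force loop copies every character outside [] other than * and ? unchanged, so a glob such as *.txt becomes ^.*.txt$ and the dot matches any character. The variant below (the name glob2pat_strict is hypothetical) keeps the same bracket logic but escapes the regex metacharacters . + ^ $ ( ) { } | when they appear outside brackets. It still assumes a ] never appears as the first character of a bracket expression.

```perl
use strict;
use warnings;

# Hypothetical name: a variant of the brute-force glob2pat that also
# escapes regex metacharacters appearing outside [] (e.g. the '.' in
# '*.txt'), which glob treats literally.
sub glob2pat_strict {
    my $globstr   = shift;
    my @chars     = split //, $globstr;
    my $inside_br = 0;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '\\') {
            $i++;                  # keep the backslash and the next char as-is
        }
        elsif ($c eq '[') {
            $inside_br++;          # also counts [:class:] inside a bracket
        }
        elsif ($c eq ']' && $inside_br) {
            $inside_br--;
        }
        elsif ($c eq '!' && $inside_br && $chars[$i - 1] eq '[') {
            $chars[$i] = '^';      # a leading ! negates the bracket
        }
        elsif (!$inside_br) {
            if    ($c eq '*')            { $chars[$i] = '.*' }
            elsif ($c eq '?')            { $chars[$i] = '.' }
            elsif ($c =~ /[.+^\$(){}|]/) { $chars[$i] = "\\$c" }    # literal in glob
        }
    }
    return '^' . join('', @chars) . '$';
}

print glob2pat_strict('*.txt'), "\n";                          # ^.*\.txt$
print glob2pat_strict('?\?*  [!-0-9?[:upper:]!*?]*'), "\n";    # ^.\?.*  [^-0-9?[:upper:]!*?].*$
```

The test pattern from the post gives the same result as before, because it contains no regex metacharacters outside the brackets other than * and ?.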
Re: Bug in glob2pat?
by Anonymous Monk on Sep 02, 2008 at 14:35 UTC
    Perl Cookbook Errata Page

    Or am I wrong? This really puzzles me, because it has been around for so long.
    All it means is that no one uses the book :p

      cdarke, I think that the suggestion is to submit an erratum yourself, since none of the errata on the linked page include the string 'glob2pat'. I agree with you that this seems to be an error in glob2pat (assuming that it's supposed to match ranges, anyway!).
        I would submit an erratum, but the "Submit your own errata for this book" link does not work.
Re: Bug in glob2pat?
by Anonymous Monk on Sep 02, 2008 at 14:43 UTC
    Or am I wrong?
    Your *fix* will turn [f\-o] into [f\\-o]

      That is better written as [-fo] anyway.

      Fair comment, but I'm not sure that escaping the hyphen is legal. The Linux man pages say: "One can remove the special meaning of '?', '*' and '[' by preceding them by a backslash." There is no mention of '-'. I have just waded through IEEE Std 1003.2, and that makes no mention of escaping a hyphen either, although it is probably open to interpretation.
      Having said all that, it can be fixed by adding '\\' => '\\', to the hash.
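A sketch comparing the question's hash with and without the extra '\\' => '\\' entry. The helper name glob2pat_opt and its $keep_backslash flag are hypothetical, added only so both variants can be shown side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's glob2pat, optionally with the
# '\\' => '\\' entry so a backslash in the glob is copied through as-is.
sub glob2pat_opt {
    my ($globstr, $keep_backslash) = @_;
    my %patmap = (
        '*' => '.*',
        '?' => '.',
        '[' => '[',
        ']' => ']',
        '-' => '-',
    );
    $patmap{'\\'} = '\\' if $keep_backslash;
    $globstr =~ s{(.)} { $patmap{$1} || "\Q$1" }ge;
    return '^' . $globstr . '$';
}

# Without the entry, \Q doubles the backslash and [f\\-o] becomes a range
# from '\' to 'o', which also matches letters such as 'g'.
print glob2pat_opt('[f\-o]', 0), "\n";    # ^[f\\-o]$
print glob2pat_opt('[f\-o]', 1), "\n";    # ^[f\-o]$
print glob2pat_opt('[-fo]',  0), "\n";    # ^[-fo]$
```

The [-fo] form works with or without the extra entry, since a hyphen at the start of a class is literal in both glob and Perl regex syntax.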

Node Type: perlquestion [id://708493]
Approved by Arunbear