That's an interesting question. But I am wondering about a few things:
- Perl's sprintf format strings are pretty complex - are you aiming to support all of it (I assume not), or just a subset, and if so, what subset?
- Why are you getting the patterns as printf format strings in the first place, instead of globs or regular expressions? Maybe you could approach the problem from a different angle and have the user input one of those?
- A printf format string of "Img%04d.png" can also produce an output of "Img123456.png", since %04d is just a minimum width specifier, but in your code you take it to mean exactly four digits. Plus, you exclude negative numbers, which are also possible with %04d. Why the difference?
- Why do you convert to glob patterns first, why not just stick to regular expressions all the way through? They're more powerful and can also be used for listing files in combination with readdir, Path::Class, Path::Tiny, or File::Find::Rule. (Update: And they could even be used like so: my @files = grep {/$regex/} glob(".* *");, although I'd strongly recommend one of the aforementioned modules.)
The following is just something I played with, note it is very minimal and unfinished, e.g. it currently supports only a few specifiers and doesn't do anything with the flags or width fields. But basically, it'll turn a format string like "Img%04d.png" into a regex roughly like /^Img([-+]?[0123456789]+)\.png$/ (I used Regexp::Common::number to implement the number matching).
#!/usr/bin/env perl
use warnings;
use strict;
use Regexp::Common qw/number/;
sub fmtstr2regex {
my ($fmtstr) = @_;
my $parse_fmtstr = qr{
(?<esc>%%) | (?<plain>[^%]+)
| % (?<flags>[-+ 0])? (?<width>\d+)?
(?:\.(?<prec>\d+))? (?<type>[dfs])
}msx;
my $outregex="\n";
pos($fmtstr)=undef;
while ($fmtstr=~/\G$parse_fmtstr/gc) {
$outregex .= "\t";
if (exists $+{esc})
{ $outregex .= "\\%" }
elsif (exists $+{plain})
{ $outregex .= quotemeta $+{plain} }
else {
if ($+{type} eq 'd')
{ $outregex .= '('.$RE{num}{int}.')' }
elsif ($+{type} eq 'f') {
if (defined $+{prec})
{ $outregex .= '('
.$RE{num}{real}{-places=>$+{prec}}.')' }
else
{ $outregex .= '('.$RE{num}{real}.')' }
}
elsif ($+{type} eq 's')
{ $outregex .= '((?s).*?)' }
}
$outregex .= "\n";
}
die "failed to parse '$fmtstr'"
unless pos($fmtstr)==length($fmtstr);
return qr/\A$outregex\z/x;
}
use Test::More;
{
my $fmtstr = 'Img%04d.png';
my $regex = fmtstr2regex($fmtstr);
note '$regex = ', explain $regex;
my @grepped = grep {/$regex/} qw/ Img0001.png Hello.txt
Img6789.png ImgABCD.png Img2000.png Img1234.jpg Img0123.png /;
is_deeply \@grepped, [qw/ Img0001.png Img6789.png Img2000.png
Img0123.png /], 'grepped list of files';
}
{
my $fmtstr = 'fo.o%sbar%03d%%% 5squz *%4.2f9baz';
my $regex = fmtstr2regex($fmtstr);
note '$regex = ', explain $regex;
{
my $string = sprintf($fmtstr,"TEST",45,"Foo",1.234);
is $string, 'fo.oTESTbar045% Fooquz *1.239baz', 'sprintf';
ok my @m = $string=~$regex, 'regex matches';
is_deeply \@m, [ "TEST", "045", " Foo", "1.23" ],
'capture groups';
}
{
my $string = sprintf($fmtstr," 34bar",1234,"**",567.890);
is $string, 'fo.o 34barbar1234% **quz *567.899baz', 'sprint
+f';
ok my @m = $string=~$regex, 'regex matches';
is_deeply \@m, [ " 34bar", "1234", " **", "567.89" ],
'capture groups';
}
}
done_testing;