
Another Pattern Matching Question

by surib (Initiate)
on Oct 12, 2012 at 15:21 UTC (#998720=perlquestion)
surib has asked for the wisdom of the Perl Monks concerning the following question:

My pattern search needs to match both files:
3B40RT.2000033121.7.bin.gz
3B40RT.2000033121.7R.bin.gz (added 'R' to filename)
Original pattern search was:
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+\.bin/
This worked fine until the 'R' was added for reprocessed files. I've tried:
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\w+\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\{1,}\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\dR?\.bin/
Any suggestions?

Replies are listed 'Best First'.
Re: Another Pattern Matching Question
by kcott (Canon) on Oct 12, 2012 at 16:51 UTC

    G'day surib,

    Welcome to the monastery.

    You show the original pattern as: /3B41|2RT\.\d{10}\.\d+\.bin/. Making allowances for the lack of <code> tags, I've assumed 1|2 should be [0|1|2] - this gives a regex which matches the first filename. However, making the same assumption a few lines further down, /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/ should have matched both filenames. Was my assumption wrong? Have you shown the correct filenames? Did you type some other part of the regex incorrectly? Please clarify.

    -- Ken

      Sorry about not using the <code> tags. Here's my code when I actually hard code the '7R', yet it still fails. I must be missing something else?

      my ($product, $year, $month, $day, $hour, $suffix, $ver ) = split /\./, $fileName;
      # test file name--naming convention is '<product>.<year>.<month>.<day>.<hour>.bin'
      if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.\7R\.bin/) {
          ($product, $year, $month, $day, $hour, $ver) =
              ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R\)\.bin/);
          $version = sprintf "%03d", $ver; # a little confusing, $version is global
                                           # $ver is local
      } elsif ($fileName =~ /3B4\dRT\.\d\d\d\d\.\d\d\.\d\d.\d\dz\.bin/) {
          ($product, $year, $month, $day, $hour, $suffix) = split /\./, $fileName;
      } else {
          print STDERR "($0,$$) ERROR: invalid file name--'$fileName'\n" if $opt_v;
          exit 2;
      }

        Your solution which adds R? to the original regex was on the right track and achieves what you want, albeit poorly:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match

        I don't think you understand character classes or alternation (perhaps both). Where you're trying to match a 0, 1 or 2 in the same position, [0-2] would be far better than [0|1|2] (which is trying to match a 0, pipe, 1, pipe or 2 in the same position) - the 2nd pipe is redundant and the 1st pipe isn't wanted anyway. So, here's an improved version:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0-2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match
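
        To make the character-class point concrete, here is a small sketch comparing the two classes. The helper name is_match and the second test name (with a literal pipe in it) are made up for illustration; they are not from the original script.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: return 1 if $name matches $re, else 0.
        sub is_match {
            my ($re, $name) = @_;
            return $name =~ $re ? 1 : 0;
        }

        # [0|1|2] is a character class containing 0, 1, 2 and the pipe itself.
        my $loose  = qr{3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin};
        # [0-2] is a range and contains only the three digits.
        my $strict = qr{3B4[0-2]RT\.\d{10}\.\d+R?\.bin};

        for my $name ('3B40RT.2000033121.7R.bin.gz', '3B4|RT.2000033121.7R.bin.gz') {
            printf "%-30s loose=%d strict=%d\n",
                $name, is_match($loose, $name), is_match($strict, $name);
        }
        ```

        Both patterns accept the real file name; only the [0|1|2] version also accepts the name with a pipe in the product code.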

        Recommended reading:

        -- Ken

        Or like this: (I've tried so many combinations I'm confusing myself!)
        if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.7R.bin/) {
            ($product, $year, $month, $day, $hour, $ver) =
                ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R)\.bin/);
            $version = sprintf "%03d", $ver; # a little confusing, $version is global
                                             # $ver is local
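
        For comparison, a sketch of one way the hard-coded 7R might be generalized: capture the digits and the optional R in separate groups, so that sprintf "%03d" only ever sees digits (passing "7R" to %d triggers an "isn't numeric" warning under warnings). The helper name parse_name and the hash keys are assumptions for illustration, not part of the original script.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: split a file name into its parts, or return
        # nothing if the name does not follow the expected convention.
        sub parse_name {
            my ($fileName) = @_;
            my ($product, $year, $month, $day, $hour, $ver, $reproc) =
                $fileName =~ /^(3B4[0-2]RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(\d+)(R?)\.bin/
                or return;
            return {
                product     => $product,
                year        => $year,
                month       => $month,
                day         => $day,
                hour        => $hour,
                version     => sprintf("%03d", $ver),   # digits only
                reprocessed => $reproc eq 'R' ? 1 : 0,  # 1 when the 'R' is present
            };
        }

        for my $name ('3B40RT.2000033121.7.bin.gz', '3B40RT.2000033121.7R.bin.gz') {
            my $p = parse_name($name) or next;
            print "$name: version=$p->{version} reprocessed=$p->{reprocessed}\n";
        }
        ```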
Re: Another Pattern Matching Question
by 2teez (Priest) on Oct 12, 2012 at 18:25 UTC

    Maybe you wanted something like this:

    use warnings;
    use strict;

    while ( defined( my $filename = <DATA> ) ) {
        chomp $filename;
        if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
            $filename =~ m/^(.+?)\.(\d{4})(\d{2})(\d{2})(\d{2})\.(.+?)\.(.+?)\./ )
        {
            print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
        }
    }
    __DATA__
    3B40RT.2000033121.7.bin.gz
    3B40RT.2000033121.7R.bin.gz
    3B40RT.2000033121.7RWER.bin.gz

    The regex in the if condition breaks down like so:

    ...
    if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
        $filename =~ m/^(.+?)   # matches PRODUCT
                       \.
                       (\d{4})  # matches YEAR
                       (\d{2})  # matches MONTH
                       (\d{2})  # matches DAY
                       (\d{2})  # matches HOUR
                       \.
                       (.+?)    # SUFFIX matches both 7,7R or 7anything
                       \.
                       (.+?)    # VERSION
                       \./x )
    {
        print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
    }
    ...
    I hope that helps

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Another Pattern Matching Question
by Kenosis (Priest) on Oct 12, 2012 at 19:28 UTC

    Here's a splitting and unpacking option for your data set that puts the desired elements into an array:

    use Modern::Perl;

    for ( grep /\S/, <DATA> ) {
        my @data = ( split /\./ )[ 0 .. 3 ];
        splice @data, 1, 1, unpack '(a4)(a2)*', $data[1];
        say "@data";
    }

    __DATA__
    3B40RT.2000033121.7.bin.gz
    3B40RT.2000033121.7R.bin.gz
    3B40RT.2000033121.7RWER.bin.gz

    Output:

    3B40RT 2000 03 31 21 7 bin
    3B40RT 2000 03 31 21 7R bin
    3B40RT 2000 03 31 21 7RWER bin
Re: Another Pattern Matching Question
by erix (Vicar) on Oct 12, 2012 at 15:56 UTC

    filename has 3B40

    pattern has 3B41

    Never the twain shall match.

      It finds all 3B40, 3B41 and 3B42. What it doesn't find is the '7R'.
Re: Another Pattern Matching Question
by Anonymous Monk on Oct 12, 2012 at 20:22 UTC
    I think you might be happiest if you split the filename on ".", then next unless the resulting array has the expected length, then match the zeroth part, then the first, and so on. Yep, a series of tests, each with an obvious single meaning, instead of one complex regex that's just going to keep being a maintenance PITA. You know that those filenames are just going to keep evolving...
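
    A minimal sketch of that series-of-tests idea, assuming the layout product.timestamp.version.bin seen in the sample names. The sub name check_name and the exact checks are illustrative guesses, not the original poster's rules.

    ```perl
    use strict;
    use warnings;

    # Hypothetical validator: returns the first four fields on success,
    # or an empty list as soon as any single check fails.
    sub check_name {
        my ($fileName) = @_;
        my @part = split /\./, $fileName;

        return unless @part >= 4;                   # product, timestamp, version, 'bin'
        return unless $part[0] =~ /^3B4[0-2]RT$/;   # product code
        return unless $part[1] =~ /^\d{10}$/;       # YYYYMMDDHH
        return unless $part[2] =~ /^\d+R?$/;        # version, optional 'R'
        return unless $part[3] eq 'bin';            # extension

        return @part[0 .. 3];
    }

    for my $name ('3B40RT.2000033121.7.bin.gz', '3B40RT.2000033121.7R.bin.gz',
                  '3B40RT.2000033121.7RWER.bin.gz') {
        my @fields = check_name($name);
        print "$name: ", (@fields ? "ok (@fields)" : "invalid"), "\n";
    }
    ```

    Each check can be loosened or tightened on its own when the naming convention changes, which is the maintenance benefit being argued for here.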

Node Type: perlquestion [id://998720]
Approved by Kenosis