PerlMonks  

Another Pattern Matching Question

by surib (Initiate)
on Oct 12, 2012 at 15:21 UTC (#998720)
surib has asked for the wisdom of the Perl Monks concerning the following question:

My pattern search needs to match both files:
    3B40RT.2000033121.7.bin.gz
    3B40RT.2000033121.7R.bin.gz    (added 'R' to filename)
Original pattern search was:
    $fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+\.bin/
This worked fine until the 'R' was added for reprocessed files. I've tried:
    $fileName =~ /3B4[0|1|2]RT\.\d{10}\.\w+\.bin/
    $fileName =~ /3B4[0|1|2]RT\.\d{10}\.\{1,}\.bin/
    $fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/
    $fileName =~ /3B4[0|1|2]RT\.\d{10}\.\dR?\.bin/
Any suggestions?

Re: Another Pattern Matching Question
by erix (Vicar) on Oct 12, 2012 at 15:56 UTC

    filename has 3B40

    pattern has 3B41

    Never the twain shall match.

      It finds all 3B40, 3B41 and 3B42. What it doesn't find is the '7R'.
Re: Another Pattern Matching Question
by kcott (Abbot) on Oct 12, 2012 at 16:51 UTC

    G'day surib,

    Welcome to the monastery.

    You show the original pattern as: /3B41|2RT\.\d{10}\.\d+\.bin/. Making allowances for the lack of <code> tags, I've assumed 1|2 should be [0|1|2] - this gives a regex which matches the first filename. However, making the same assumption a few lines further down, /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/ should have matched both filenames. Was my assumption wrong? Have you shown the correct filenames? Did you type some other part of the regex incorrectly? Please clarify.

    -- Ken

      Sorry about not using the <code> tags. Here's my code when I actually hard code the '7R', yet it still fails. I must be missing something else?

          my ($product, $year, $month, $day, $hour, $suffix, $ver ) = split /\./, $fileName;

          # test file name--naming convention is '<product>.<year>.<month>.<day>.<hour>.bin'
          if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.\7R\.bin/) {
              ($product, $year, $month, $day, $hour, $ver) =
                  ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R\)\.bin/);
              $version = sprintf "%03d", $ver;  # a little confusing, $version is global
                                                # $ver is local
          }
          elsif ($fileName =~ /3B4\dRT\.\d\d\d\d\.\d\d\.\d\d.\d\dz\.bin/) {
              ($product, $year, $month, $day, $hour, $suffix) = split /\./, $fileName;
          }
          else {
              print STDERR "($0,$$) ERROR: invalid file name--'$fileName'\n" if $opt_v;
              exit 2;
          }

        Or like this: (I've tried so many combinations I'm confusing myself!)

            if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.7R.bin/) {
                ($product, $year, $month, $day, $hour, $ver) =
                    ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R)\.bin/);
                $version = sprintf "%03d", $ver;  # a little confusing, $version is global
                                                  # $ver is local

        Your solution which adds R? to the original regex was on the right track and achieves what you want, albeit poorly:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match

        I don't think you understand character classes or alternation (perhaps both). Where you're trying to match a 0, 1 or 2 in the same position, [0-2] would be far better than [0|1|2] (which is trying to match a 0, pipe, 1, pipe or 2 in the same position) - the 2nd pipe is redundant and the 1st pipe isn't wanted anyway. So, here's an improved version:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0-2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match
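To make the difference between the two character classes concrete, here is a minimal sketch that runs both against three names. The second and third names are made up for illustration (they are not real products from the thread), and the variable names $with_pipes and $with_range are only labels for this example.

```perl
use strict;
use warnings;

# Two versions of the product check; only the character class differs.
my $with_pipes = qr{^3B4[0|1|2]RT\.};    # class holds 0, 1, 2 and a literal |
my $with_range = qr{^3B4[0-2]RT\.};      # class holds only 0, 1 and 2

# Hypothetical inputs: a valid product, an out-of-range digit,
# and a pipe sitting where the digit should be.
my @names = (
    '3B40RT.2000033121.7R.bin.gz',
    '3B43RT.2000033121.7R.bin.gz',
    '3B4|RT.2000033121.7R.bin.gz',
);

for my $name (@names) {
    my $pipes = $name =~ $with_pipes ? 'match' : 'no match';
    my $range = $name =~ $with_range ? 'match' : 'no match';
    printf "%-30s [0|1|2]: %-8s  [0-2]: %s\n", $name, $pipes, $range;
}
```

Both classes accept the first name and reject the second. Only [0|1|2] accepts the third, because inside a character class the pipe is an ordinary character rather than alternation.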

        Recommended reading:

        -- Ken

Re: Another Pattern Matching Question
by 2teez (Priest) on Oct 12, 2012 at 18:25 UTC

    Maybe you wanted something like this:

        use warnings;
        use strict;

        while ( defined( my $filename = <DATA> ) ) {
            chomp $filename;
            if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
                $filename =~ m/^(.+?)\.(\d{4})(\d{2})(\d{2})(\d{2})\.(.+?)\.(.+?)\./ )
            {
                print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
            }
        }
        __DATA__
        3B40RT.2000033121.7.bin.gz
        3B40RT.2000033121.7R.bin.gz
        3B40RT.2000033121.7RWER.bin.gz

    The regex in the if condition breaks down like so (written with the /x modifier so it can carry comments):

        ...
        if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
            $filename =~ m/^(.+?)   # matches PRODUCT
                \.
                (\d{4})             # matches YEAR
                (\d{2})             # matches MONTH
                (\d{2})             # matches DAY
                (\d{2})             # matches HOUR
                \.
                (.+?)               # SUFFIX matches both 7, 7R or 7anything
                \.
                (.+?)               # VERSION
                \./x )
        {
            print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
        }
        ...
    I hope that helps

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Another Pattern Matching Question
by Kenosis (Priest) on Oct 12, 2012 at 19:28 UTC

    Here's a splitting and unpacking option for your data set that puts the desired elements into an array:

        use Modern::Perl;

        for ( grep /\S/, <DATA> ) {
            my @data = ( split /\./ )[ 0 .. 3 ];
            splice @data, 1, 1, unpack '(a4)(a2)*', $data[1];
            say "@data";
        }

        __DATA__
        3B40RT.2000033121.7.bin.gz
        3B40RT.2000033121.7R.bin.gz
        3B40RT.2000033121.7RWER.bin.gz

    Output:

        3B40RT 2000 03 31 21 7 bin
        3B40RT 2000 03 31 21 7R bin
        3B40RT 2000 03 31 21 7RWER bin
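The unpack step is the part of that snippet that may be least familiar, so here it is in isolation as a small sketch. The variable $stamp is just a name chosen for this example; its value is the date-hour field from the sample file names.

```perl
use strict;
use warnings;

# The date-hour field taken from the sample file names.
my $stamp = '2000033121';

# '(a4)(a2)*' reads one 4-character field, then as many
# 2-character fields as the rest of the string supplies.
my ( $year, $month, $day, $hour ) = unpack '(a4)(a2)*', $stamp;

print "$year $month $day $hour\n";    # prints "2000 03 31 21"
```

Because the template is positional, it does not check that the characters are digits; it only cuts the string at fixed widths. If validation matters, a regex check on the field would still be needed beforehand.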
Re: Another Pattern Matching Question
by Anonymous Monk on Oct 12, 2012 at 20:22 UTC
    I think you might be happiest if you split the filename on ".", then next unless the array has the expected length, then match the zeroth part, then the first, and so on. Yep, a series of tests, each with an obvious single meaning, instead of one complex regex that's just going to keep being a maintenance PITA. You know that those filenames are just going to keep evolving...
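One way that series of tests might look is sketched below. This is only an illustration under stated assumptions: parse_name is a hypothetical helper, the field layout (product, ten-digit yyyymmddhh stamp, version with an optional R, then bin) is inferred from the sample names in the question, and the third test name is invented to show a failure.

```perl
use strict;
use warnings;

# parse_name() is a hypothetical helper, not code from the thread.
# It returns a hash reference of fields, or nothing if any check fails.
sub parse_name {
    my ($file_name) = @_;

    # Expect at least product.datehour.version.bin (a trailing .gz is allowed).
    my @part = split /\./, $file_name;
    return unless @part >= 4;

    # Part 0: product, assumed to be 3B40RT, 3B41RT or 3B42RT.
    return unless $part[0] =~ /^3B4[0-2]RT$/;

    # Part 1: ten digits, read as yyyymmddhh.
    return unless $part[1] =~ /^(\d{4})(\d\d)(\d\d)(\d\d)$/;
    my ( $year, $month, $day, $hour ) = ( $1, $2, $3, $4 );

    # Part 2: version digits, with an optional R for reprocessed files.
    return unless $part[2] =~ /^(\d+)(R?)$/;
    my ( $version, $reprocessed ) = ( $1, $2 eq 'R' ? 1 : 0 );

    # Part 3: the literal extension.
    return unless $part[3] eq 'bin';

    my %field = (
        product     => $part[0],
        year        => $year,
        month       => $month,
        day         => $day,
        hour        => $hour,
        version     => $version,
        reprocessed => $reprocessed,
    );
    return \%field;
}

# The first two names come from the question; the third is a made-up bad name.
for my $name (qw( 3B40RT.2000033121.7.bin.gz 3B40RT.2000033121.7R.bin.gz 3B40RT.20000331.7.bin.gz )) {
    my $field  = parse_name($name);
    my $status = $field ? "ok, version $field->{version}" : 'invalid';
    print "$name: $status\n";
}
```

Each check covers one part of the name, so when the naming convention changes again only the matching line needs to be touched.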
