PerlMonks  

Another Pattern Matching Question

by surib (Initiate)
on Oct 12, 2012 at 15:21 UTC (#998720)
surib has asked for the wisdom of the Perl Monks concerning the following question:

My pattern search needs to match both files:
3B40RT.2000033121.7.bin.gz
3B40RT.2000033121.7R.bin.gz (added 'R' to filename)
Original pattern search was:
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+\.bin/
This worked fine until the 'R' was added for reprocessed files. I've tried:
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\w+\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\{1,}\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/
$fileName =~ /3B4[0|1|2]RT\.\d{10}\.\dR?\.bin/
Any suggestions?

Re: Another Pattern Matching Question
by erix (Vicar) on Oct 12, 2012 at 15:56 UTC

    filename has 3B40

    pattern has 3B41

    Never the twain shall match.

      It finds all 3B40, 3B41 and 3B42. What it doesn't find is the '7R'.
Re: Another Pattern Matching Question
by kcott (Abbot) on Oct 12, 2012 at 16:51 UTC

    G'day surib,

    Welcome to the monastery.

    You show the original pattern as: /3B41|2RT\.\d{10}\.\d+\.bin/. Making allowances for the lack of <code> tags, I've assumed 1|2 should be [0|1|2] - this gives a regex which matches the first filename. However, making the same assumption a few lines further down, /3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin/ should have matched both filenames. Was my assumption wrong? Have you shown the correct filenames? Did you type some other part of the regex incorrectly? Please clarify.

    -- Ken

      Sorry about not using the <code> tags. Here's my code when I actually hard code the '7R', yet it still fails. I must be missing something else?

      my ($product, $year, $month, $day, $hour, $suffix, $ver ) = split /\./, $fileName;
      # test file name--naming convention is '<product>.<year>.<month>.<day>.<hour>.bin'
      if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.\7R\.bin/) {
          ($product, $year, $month, $day, $hour, $ver) =
              ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R\)\.bin/);
          $version = sprintf "%03d", $ver;   # a little confusing, $version is global
                                             # $ver is local
      }
      elsif ($fileName =~ /3B4\dRT\.\d\d\d\d\.\d\d\.\d\d.\d\dz\.bin/) {
          ($product, $year, $month, $day, $hour, $suffix) = split /\./, $fileName;
      }
      else {
          print STDERR "($0,$$) ERROR: invalid file name--'$fileName'\n" if $opt_v;
          exit 2;
      }
        Or like this: (I've tried so many combinations I'm confusing myself!)
        if ($fileName =~ /3B4[0|1|2]RT\.\d{10}\.7R.bin/) {
            ($product, $year, $month, $day, $hour, $ver) =
                ($fileName =~ /(3B4.RT)\.(\d{4})(\d\d)(\d\d)(\d\d)\.(7R)\.bin/);
            $version = sprintf "%03d", $ver;   # a little confusing, $version is global
                                               # $ver is local

        Your solution which adds R? to the original regex was on the right track and achieves what you want, albeit poorly:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match

        I don't think you understand character classes or alternation (perhaps both). Where you're trying to match a 0, 1 or 2 in the same position, [0-2] would be far better than [0|1|2] (which is trying to match a 0, pipe, 1, pipe or 2 in the same position) - the 2nd pipe is redundant and the 1st pipe isn't wanted anyway. So, here's an improved version:

        $ perl -Mstrict -Mwarnings -E '
            my $fileName;
            my $re = qr{3B4[0-2]RT\.\d{10}\.\d+R?\.bin};
            $fileName = "3B40RT.2000033121.7.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
            $fileName = "3B40RT.2000033121.7R.bin.gz";
            say +($fileName =~ /$re/) ? "match" : "no match";
        '
        match
        match
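
        To make the difference between the two character classes concrete, here is a minimal sketch. The third sample name (with a literal pipe where the digit belongs) and the leading ^ anchor are assumptions added for illustration; neither appears in the original post.

```perl
use strict;
use warnings;

# Original class: inside [...] a pipe is a literal character, not alternation.
my $loose = qr{^3B4[0|1|2]RT\.\d{10}\.\d+R?\.bin};
# Range class: only the digits 0, 1 or 2 are accepted in that position.
my $tight = qr{^3B4[0-2]RT\.\d{10}\.\d+R?\.bin};

# The last name is a made-up bad input, used only to show the difference.
my @names = (
    '3B40RT.2000033121.7.bin.gz',
    '3B40RT.2000033121.7R.bin.gz',
    '3B4|RT.2000033121.7.bin.gz',
);

for my $name (@names) {
    printf "%-30s loose=%s tight=%s\n", $name,
        ( $name =~ $loose ? 'match' : 'no match' ),
        ( $name =~ $tight ? 'match' : 'no match' );
}
```

        Both patterns accept the two real filenames; only the [0|1|2] version also accepts the name containing a pipe.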

        Recommended reading:

        -- Ken

Re: Another Pattern Matching Question
by 2teez (Priest) on Oct 12, 2012 at 18:25 UTC

    Maybe you wanted something like this:

    use warnings;
    use strict;

    while ( defined( my $filename = <DATA> ) ) {
        chomp $filename;
        if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
            $filename =~ m/^(.+?)\.(\d{4})(\d{2})(\d{2})(\d{2})\.(.+?)\.(.+?)\./ )
        {
            print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
        }
    }
    __DATA__
    3B40RT.2000033121.7.bin.gz
    3B40RT.2000033121.7R.bin.gz
    3B40RT.2000033121.7RWER.bin.gz
    The regex in the if condition breaks down like so:
    ...
    if ( my ( $product, $year, $month, $day, $hour, $suffix, $ver ) =
        $filename =~ m/^(.+?)     # matches PRODUCT
                       \.
                       (\d{4})    # matches YEAR
                       (\d{2})    # matches MONTH
                       (\d{2})    # matches DAY
                       (\d{2})    # matches HOUR
                       \.
                       (.+?)      # SUFFIX matches both 7,7R or 7anything
                       \.
                       (.+?)      # VERSION
                       \./x )
    {
        print join ' ', ( $product, $year, $month, $day, $hour, $suffix, $ver ), $/;
    }
    ...
    I hope that helps

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Another Pattern Matching Question
by Kenosis (Priest) on Oct 12, 2012 at 19:28 UTC

    Here's a splitting and unpacking option for your data set that puts the desired elements into an array:

    use Modern::Perl;

    for ( grep /\S/, <DATA> ) {
        my @data = ( split /\./ )[ 0 .. 3 ];
        splice @data, 1, 1, unpack '(a4)(a2)*', $data[1];
        say "@data";
    }

    __DATA__
    3B40RT.2000033121.7.bin.gz
    3B40RT.2000033121.7R.bin.gz
    3B40RT.2000033121.7RWER.bin.gz

    Output:

    3B40RT 2000 03 31 21 7 bin
    3B40RT 2000 03 31 21 7R bin
    3B40RT 2000 03 31 21 7RWER bin
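
    For readers unfamiliar with unpack, here is a minimal sketch of just the date step, using core Perl only. It uses the spelled-out template 'a4 a2 a2 a2' rather than the grouped form above, and the variable names are illustrative assumptions, not part of the reply.

```perl
use strict;
use warnings;

my $stamp = '2000033121';    # second dot-separated field of the sample names

# 'a4 a2 a2 a2' reads four bytes, then three runs of two bytes each.
my ( $year, $month, $day, $hour ) = unpack 'a4 a2 a2 a2', $stamp;

print "$year $month $day $hour\n";    # prints "2000 03 31 21"
```

    Because the timestamp is fixed-width, unpack can slice it by position without any regex.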
Re: Another Pattern Matching Question
by Anonymous Monk on Oct 12, 2012 at 20:22 UTC
    I think you might be happiest if you split the filename on ".", then use next unless ... to check the length of the array, then match the zeroth part, then the first, and so on. Yep, a series of tests, each with an obvious single meaning, instead of one complex regex that's just going to keep being a maintenance PITA. You know that those filenames are just going to keep evolving...
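
    A minimal sketch of that split-then-test idea might look like the following. The helper name parse_name, the hash keys, and the exact per-field rules (for example, that the fourth field must be exactly 'bin') are assumptions inferred from the two sample filenames, not a confirmed naming specification.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of fields, or undef if any check fails.
sub parse_name {
    my ($name) = @_;
    my @part = split /\./, $name;

    return unless @part >= 4;                                  # product.stamp.version.bin[.gz]
    return unless $part[0] =~ /^3B4[0-2]RT$/;                  # product
    return unless $part[1] =~ /^(\d{4})(\d\d)(\d\d)(\d\d)$/;   # yyyymmddhh
    my ( $year, $month, $day, $hour ) = ( $1, $2, $3, $4 );
    return unless $part[2] =~ /^(\d+)(R?)$/;                   # version, optional R
    my ( $ver, $reproc ) = ( $1, $2 );
    return unless $part[3] eq 'bin';                           # fixed extension

    return {
        product     => $part[0],
        year        => $year,
        month       => $month,
        day         => $day,
        hour        => $hour,
        version     => $ver,
        reprocessed => ( $reproc eq 'R' ? 1 : 0 ),
    };
}

for my $name ( '3B40RT.2000033121.7.bin.gz', '3B40RT.2000033121.7R.bin.gz' ) {
    my $f = parse_name($name) or next;
    printf "%s %s-%s-%s %sh version=%s reprocessed=%d\n",
        @{$f}{qw(product year month day hour version reprocessed)};
}
```

    Each check covers one field, so when the naming convention changes again only the relevant line needs to be touched.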
