PerlMonks  

How to take only maximum directory name by using its digits using perl?

by finddata (Sexton)
on Mar 29, 2017 at 03:46 UTC (#1186302=perlquestion)

finddata has asked for the wisdom of the Perl Monks concerning the following question:

`-- added
|-- add.txt
`-- added1
|--action
| |-- action.txt
| `-- rev1
| | `-- rev1.html
| `-- rev2
| `-- rev1.html
`-- add.html
Expected output:
`-- added
|-- add.txt
`-- added1
|--action
| |-- action.txt
| `-- rev2
| `-- rev1.html
`-- add.html
Question: How can I display only the directory with the highest number, filtering by the digits in its name, in Perl? Code which I have tried:
my $location = $output_dir;
print $location;
open LOGFILE, $location;
my $first_line = 1;
#print $first_line;
my $max_id;
while (<LOGFILE>) {
    if (/rev_(\d)+/) {
        if ($first_line) {
            $first_line = 0;
            $max_id = $1;
        }
        else {
            $max_id = $1 if ($1 > $max_id);
            print $max_id;
        }
    }
}

Replies are listed 'Best First'.
Re: How to take only maximum directory name by using its digits using perl?
by haukex (Chancellor) on Mar 29, 2017 at 06:49 UTC

    If I understand your question correctly, then the problem is that your regex does not match. /rev_(\d)+/ has an _ where your sample input does not, and the + quantifier should be on the character it is supposed to quantify, \d, not the capture group. If I change the regex to /rev(\d+)/, your code works for me, meaning that when the loop ends, $max_id holds the highest directory number (2, in both cases).

    Note: I am taking your sample input literally, meaning that I am assuming the two sample inputs you showed are exactly what your "log file" looks like. If this is not the case, please read and understand this page: SSCCE.

      What you mentioned is right. I should drop the underscore after rev and put the + inside the grouping. Please look at the code below. Hence i am not getting the things. Did i made any mistakes with my code. Let me know to check it?
      my $location = $output_dir;
      print $location;
      open LOGFILE, $location;
      my $first_line = 1;
      print $first_line;
      my $max_id;
      while (<LOGFILE>) {
          if (/rev(\d+)/) {
              if ($first_line) {
                  $first_line = 0;
                  print $first_line;
                  $max_id = $1;
                  print $max_id;
              }
              else {
                  $max_id = $1 if ($1 > $max_id);
                  print $max_id;
              }
          }
      }
      close LOGFILE;
        Did i made any mistakes with my code.

        This isn't an SSCCE. You have neither declared nor defined $output_dir and therefore $location is undefined so LOGFILE cannot be opened and your while (<LOGFILE>) condition will never be true.

        Let me know to check it?

        Item 1 of the Basic debugging checklist is to use the stricture pragmas. This would have highlighted the problem to you.
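To illustrate the point about strictures, here is a minimal sketch of what `use strict` reports for the first line of the posted code when `$output_dir` has never been declared. The helper name `compile_error` is hypothetical and exists only for this demonstration; it compiles a snippet with a string `eval` so the error can be captured instead of aborting the script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Compile a snippet under strict and warnings and return the compile
# error, or an empty string if it compiled cleanly.
# (compile_error is a hypothetical helper, not part of the original post.)
sub compile_error {
    my ($code) = @_;
    my $ok = eval "use strict; use warnings; $code; 1";
    return $ok ? '' : $@;
}

# The opening line of the posted code, with $output_dir never declared.
my $error = compile_error('my $location = $output_dir;');
print "strict reports: $error";
```

In a normal script the same message appears at compile time, before any line runs, which is why the checklist puts strictures first.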

        Hence i am not getting the things.

        What does this mean? I already explained it to you: If you ask good questions, like you've done a few times so far, we can give good answers. If you ask questions that can't be answered because you haven't provided enough information, not only will you not get good answers, you will lose support from more and more monks as you go on. I gave you three links to read, please do so, and always follow that advice!

        Your code looks ok to me (Update: however, you should take stevieb's advice, as well as hippo's advice to Use strict and warnings) and when I set $output_dir to an input file name, it seems to work. However, only you know what all of your specifications are, and what all of your input looks like, so I can't tell you if your code will always work, at the moment only you can confirm that your code fully works. Testing is an important skill, and what I can do is show you how you might go about this. For example, you can test whether your code, properly modularized into a subroutine, works for various test inputs. Here, I'm using in-memory files (open) and Test::More. Note how I've left your logic entirely unchanged.

        #!/usr/bin/env perl
        use warnings;
        use strict;

        sub scan_handle_for_rev {
            my $filehandle = shift;
            my $first_line = 1;
            my $max_id;
            while (<$filehandle>) {
                if (/rev(\d+)/) {
                    if ($first_line) {
                        $first_line = 0;
                        $max_id = $1;
                    }
                    else {
                        $max_id = $1 if ($1 > $max_id);
                    }
                }
            }
            return $max_id;
        }

        use Test::More;

        {
            open my $fh, '<', \<<'END_TEST_INPUT' or die $!;
        | |-- action.txt
        | `-- rev2
        | `-- rev1.html
        `-- add.html
        END_TEST_INPUT
            is scan_handle_for_rev($fh), 2;
            close $fh;
        }

        {
            open my $fh, '<', \<<'END_TEST_INPUT' or die $!;
        |--action
        | |-- action.txt
        | `-- rev1
        | | `-- rev1.html
        | `-- rev2
        | `-- rev1.html
        `-- add.html
        END_TEST_INPUT
            is scan_handle_for_rev($fh), 2;
            close $fh;
        }

        {
            open my $fh, '<', \<<'END_TEST_INPUT' or die $!;
        |--action
        | |-- action.txt
        | `-- rev1
        | | `-- rev1.html
        | `-- rev13
        | | `-- rev1.html
        | `-- rev2
        | `-- rev1.html
        `-- add.html
        END_TEST_INPUT
            is scan_handle_for_rev($fh), 13;
            close $fh;
        }

        done_testing;

        Ok, you've corrected 2 problems with the regex. Now try this SSCCE and see if you can find another error. Remember you said 'No need to consider about the files inside those folders.'

        #!/usr/bin/perl
        use strict;
        my $max;
        while (<DATA>) {
            if (/rev(\d+)/) {
                if ($1 > $max){
                    $max = $1;
                }
            }
        }
        print "max rev = $max\n";
        __DATA__
        `-- added
        |-- add.txt
        `-- added1
        |--action
        | |-- action.txt
        | `-- rev1
        | | `-- rev9999.html
        | `-- rev2
        | `-- rev1.html
        `-- add.html
        poj
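The remaining error in the SSCCE above is that `/rev(\d+)/` also matches file names such as `rev9999.html`. One possible fix, sketched here under the assumption that directories in the listing are named exactly `revN` with nothing after the digits, is to anchor the match at the end of the line. The sub name `max_rev_dir` and the `@listing` sample are illustrative only; a file called `rev3` with no extension would still be counted.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the highest N among entries named exactly "revN".
# Anchoring at the end of the line skips files such as rev9999.html.
sub max_rev_dir {
    my (@lines) = @_;
    my $max;
    for my $line (@lines) {
        next unless $line =~ /\brev(\d+)\s*$/;
        $max = $1 if !defined $max || $1 > $max;
    }
    return $max;
}

# Sample lines taken from the listing in the SSCCE above.
my @listing = (
    '| `-- rev1',
    '| | `-- rev9999.html',
    '| `-- rev2',
    '| `-- rev1.html',
);
my $max = max_rev_dir(@listing);
print "max rev = $max\n";
```

Checking `defined $max` before the comparison also avoids comparing against an undefined value on the first match.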
Re: How to take only maximum directory name by using its digits using perl?
by AnomalousMonk (Bishop) on Mar 29, 2017 at 06:48 UTC

    In addition to the problem of the extraneous underscore in /rev_(\d)+/ in the OP, note also that (\d)+ will not capture what I think you expect if more than one digit is present; the group keeps only the last digit matched:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'foorev_123bar';
     ;;
     print qq{captured '$1'} if $s =~ m{ rev_ (\d)+ }xms;
    "
    captured '3'

    Capture multiple digits with:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'foorev_123bar';
     ;;
     print qq{captured '$1'} if $s =~ m{ rev_ (\d+) }xms;
    "
    captured '123'

    One previous reply has implicitly noted this problem. (Update: haukex has also addressed this below.)


    Give a man a fish:  <%-{-{-{-<

Re: How to take only maximum directory name by using its digits using perl?
by poj (Abbot) on Mar 29, 2017 at 05:51 UTC

    Why is rev1.html in the rev2 folder?

    | `-- rev1
    | | `-- rev1.html
    | `-- rev2
    | `-- rev1.html

    Run this Short, Self Contained, Compilable, Example and see if you can spot the error in your posted code.

    #!/usr/bin/perl
    use strict;
    while (<DATA>) {
        if (/rev_(\d)+/) { # what does this match ?
            print "'$1' matched in $_";
        }
    }
    __DATA__
    `-- added
    |-- add.txt
    `-- added1
    |--action
    | |-- action.txt
    | `-- rev1
    | | `-- rev1.html
    | `-- rev2
    | `-- rev1.html
    `-- add.html
    poj
      I'm not sure what's in the OP's log file. Based on his previous questions, I'm guessing it's actually the result of running something like find . -print on the directory structure he posted an ASCII picture of. Asking finddata rhetorical questions never produces any useful result.
      rev1.html can appear in any folder. I should filter the folder names by using their digits. No need to consider about the files inside those folders.
      The above does not print anything. It looks empty?
        It's empty because the code expects an underscore between "rev" and the number.
Re: How to take only maximum directory name by using its digits using perl?
by stevieb (Canon) on Mar 29, 2017 at 03:55 UTC

    For what it's worth, this:

    open LOGFILE, $location;

    Should most definitely be replaced with the three-arg open with a lexical file handle, and an error check:

    open my $fh, '<', $location or die "can't open the damned file!: $!";

    $fh there represents your bareword LOGFILE. Bareword file handles are global, and really shouldn't be used.

      I have changed the things you mentioned above.
Re: How to take only maximum directory name by using its digits using perl?
by Anonymous Monk on Mar 29, 2017 at 06:23 UTC

    This question looks really familiar. Friend of yours? Classmate? That question seems to indicate that you are looking for a recursive solution, and that rev_N can appear in any sub-path, and that you would only want to retain the path that has the highest rev_N in it. Can you confirm your need? Can you describe how you would solve this if you had a pencil and paper? It's often useful to separate the code-implementation from the algorithm. Once the algorithm is understood, then committing it to code often becomes easier.
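As one possible pencil-and-paper answer, the task can be split into two passes: first find the highest N among directories named `revN`, then drop every path that passes through a lower-numbered one. The sketch below assumes the input is a list of slash-separated paths, one per entry (for example the output of `find . -print`), which the thread does not confirm. The name `keep_highest_rev` and the sample `@paths` are hypothetical, and the maximum is taken across the whole list rather than per parent directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keep every path except those lying in a revN directory whose N is
# below the highest N seen anywhere in the list.
sub keep_highest_rev {
    my (@paths) = @_;

    # Pass 1: find the highest N among path components named revN.
    my $max;
    for my $path (@paths) {
        for my $part (split m{/}, $path) {
            next unless $part =~ /^rev(\d+)$/;
            $max = $1 if !defined $max || $1 > $max;
        }
    }
    return @paths unless defined $max;

    # Pass 2: drop any path that passes through a lower-numbered revN.
    return grep {
        my $keep = 1;
        for my $part (split m{/}, $_) {
            $keep = 0 if $part =~ /^rev(\d+)$/ && $1 < $max;
        }
        $keep;
    } @paths;
}

# Assumed path form of the tree shown in the question.
my @paths = qw(
    added/add.txt
    added/added1/action/action.txt
    added/added1/action/rev1
    added/added1/action/rev1/rev1.html
    added/added1/action/rev2
    added/added1/action/rev2/rev1.html
    added/added1/add.html
);
print "$_\n" for keep_highest_rev(@paths);
```

Because each component is matched with `^rev(\d+)$`, a file such as `rev1.html` is never mistaken for a revision directory.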

Re: How to take only maximum directory name by using its digits using perl?
by Anonymous Monk on Mar 29, 2017 at 04:57 UTC
    Re-read the file and print out things that have the right rev?
    seek LOGFILE, 0, 0;
    while (<LOGFILE>) {
        if (/rev_(\d+)/) {
            print if $1 == $max_id
        }
    }
      With the above I couldn't see any change in my output. It prints the same.

        I'm guessing that you've replaced part of your code with the code I posted. You're meant to add my code after yours.

        It's very difficult to give useful answers when you keep making us guess what you're really doing.
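To make the "add my code after yours" point concrete, here is a minimal sketch of the two passes in one place: the first loop finds the maximum, `seek` rewinds the handle, and the second loop collects the lines carrying that maximum. It assumes the corrected `/rev(\d+)/` pattern from elsewhere in the thread and reads from an in-memory file; the name `lines_with_max_rev` and the sample `$log` are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the lines whose revN number equals the highest N in the file.
sub lines_with_max_rev {
    my ($fh) = @_;

    # First pass: find the highest revision number.
    my $max_id;
    while (my $line = <$fh>) {
        next unless $line =~ /rev(\d+)/;
        $max_id = $1 if !defined $max_id || $1 > $max_id;
    }
    return unless defined $max_id;

    # Rewind, then second pass: keep only lines with that number.
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ /rev(\d+)/ && $1 == $max_id;
    }
    return @hits;
}

# Sample input held in memory instead of a real log file.
my $log = "| `-- rev1\n| | `-- rev1.html\n| `-- rev2\n| `-- rev1.html\n";
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
print lines_with_max_rev($fh);
close $fh;
```

Replacing the first loop with the second, rather than running both, leaves `$max_id` unset, which would explain output that never changes.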

Re: How to take only maximum directory name by using its digits using perl?
by Anonymous Monk on Mar 29, 2017 at 06:16 UTC
    Please use simple English to describe the steps needed to accomplish this task as a human.
