PerlMonks  

File:Find pattern match question

by RockE (Novice)
on Oct 31, 2013 at 01:23 UTC ( [id://1060489]=perlquestion )

RockE has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have this script that looks for specific directories. However, it reports the pattern-matched directory and then lists the same directory name again, once for each subdirectory it contains. How do I fix the code to only list the directory once.

    use strict;
    use warnings;
    use File::Find;
    use Fcntl;

    #*****************Path Variables*******************
    our $wellpath = 'N:\\repos\\open\\Wells\\Regulated';
    our $surveypath = 'N:\\repos\\open\\Surveys\\Regulated';
    #**************************************************

    find(\&dir_names, $wellpath);

    sub dir_names {
        # skip over everything that is not a directory
        return '' if ! -d $File::Find::name;
        # skip over directories that don't match required pattern
        return if (not $File::Find::dir =~ qr{([IPD]\d{8}$)});
        print "$File::Find::dir\n";
    }

It displays this 3 times because there are 3 subdirectories below P00436188:

Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188

Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188

Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188

Thanks for all the replies

Replies are listed 'Best First'.
Re: File::Find pattern match question
by Athanasius (Archbishop) on Oct 31, 2013 at 03:30 UTC

    Hello RockE,

    As you don’t actually ask a question, I’ll have to guess that you want a way to remove duplicate directories from your output. Here is one approach:

    ...
    my %dirs;

    find(\&dir_names, $wellpath);
    print "$_\n" for sort keys %dirs;

    sub dir_names {
        # skip over everything that is not a directory
        return unless -d $File::Find::name;
        # skip over directories that don't match required pattern
        return unless $File::Find::dir =~ /[IPD]\d{8}$/;
        $dirs{$File::Find::dir} = 1;
    }

    That is, instead of printing each directory as it is found, store it in a hash and print the hash keys after the call to find has completed. As hash keys are necessarily unique, no duplicates will be recorded.
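
A minimal, self-contained sketch of why the hash removes the repeats. The sample paths are hypothetical stand-ins (copied from the output in the question), not the result of a real find call, and the names @reported and @unique are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data: the same path reported three times,
# as in the output shown in the question.
my @reported = (
    'Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188',
    'Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188',
    'Regulated/AC/Abalone_1_ENO5655/Logs_digital/P00436188',
);

# Storing each path as a hash key: a repeated key simply
# overwrites the earlier entry instead of adding a new one.
my %dirs;
$dirs{$_} = 1 for @reported;

# Printing happens only after all paths have been collected.
my @unique = sort keys %dirs;
print "$_\n" for @unique;
```

Run as-is, this should print the path once, since the three identical strings collapse into a single key.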

    Hope that helps,

    Update: See the correction below.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks for the reply :). Helps to ask a question... I'd like the script to only report the directory name once, not print it again for each subdirectory it finds under the pattern-matched directory. I'll try your example - cheers. We do have some duplicate directories so finding dupes is the next problem.

        "We do have some duplicate directories so finding dupes is the next problem."

        If you change this line in ++Athanasius' code:

        $dirs{$File::Find::dir} = 1;

        to

        ++$dirs{$File::Find::dir};

        the code will run the same, but now you'll have a count of how many times each path was recorded. You can then find duplicates like this (untested):

        my @dup_dirs = grep { $dirs{$_} > 1 } keys %dirs;

        -- Ken
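
A small sketch of the counting idea above, assuming the paths have already been collected. The list @seen is hypothetical sample data standing in for whatever the wanted function would record, so this only illustrates the hash counting and the grep, not a real directory scan:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the paths that the
# wanted function would record during find().
my @seen = qw(
    Logs_digital/P00436188
    Logs_digital/P00436188
    Logs_digital/I00000001
);

# Pre-increment creates the key on first sight and counts repeats.
my %dirs;
++$dirs{$_} for @seen;

# Keep only the paths that were recorded more than once.
my @dup_dirs = grep { $dirs{$_} > 1 } sort keys %dirs;
print "$_ ($dirs{$_})\n" for @dup_dirs;
```

Note that the count only says how often a given path string was recorded. Whether that corresponds to a real duplicate depends on what the wanted function stores as the key.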

      OK, I tried your example and it doesn't print anything. If I remove the pattern-matching requirement it does work, but obviously it shows me all directories.

      #!/usr/bin/perl
      # dirpath
      use strict;
      use warnings;
      use File::Find;
      use Fcntl;

      #*****************Path Variables**********************
      our $wellpath = 'N:\\repos\\open\\Wells\\Regulated\\';
      our $surveypath = 'N:\\repos\\open\\Surveys\\Regulated\\';
      our $testpath = 'C:\\Temp\\';
      #*******************************************************

      my %dirs;

      find(\&dir_names, $testpath);
      print "$_\n" for sort keys %dirs;

      sub dir_names {
          # skip over everything that is not a directory
          return unless -d $File::Find::name;
          # skip over directories that don't match required pattern
          return unless $File::Find::dir =~ /[IPD]\d{8}$/;
          $dirs{$File::Find::dir} = 1;
      }

        According to the documentation for File::Find:

        The wanted function takes no arguments but rather does its work through a collection of variables.

            $File::Find::dir is the current directory name,
            $_ is the current filename within that directory
            $File::Find::name is the complete pathname to the file.

        The above variables have all been localized and may be changed without affecting data outside of the wanted function.

        So this line:

        return unless $File::Find::dir =~ /[IPD]\d{8}$/;

        is actually testing the parent directory, not the current file. Better to run both tests against the current filename in $_:

        sub dir_names {
            return unless -d $_;
            return unless $_ =~ /[IPD]\d{8}$/;
            $dirs{$File::Find::name} = 1;
        }

        or just:

        sub dir_names {
            return unless -d;
            return unless /[IPD]\d{8}$/;
            $dirs{$File::Find::name} = 1;
        }

        I think that fixes the problem.

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
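
For anyone who wants to try the corrected wanted function without access to the original N: drive, here is a sketch that builds a throwaway tree first. The directory layout (Logs_digital, a, b, c, other) is a hypothetical test fixture created with File::Temp and File::Path; only the body of dir_names comes from the reply above:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical test tree: one matching directory with three
# subdirectories, plus one directory that should not match.
my $root = tempdir(CLEANUP => 1);
make_path("$root/Logs_digital/P00436188/$_") for qw(a b c);
make_path("$root/Logs_digital/other");

my %dirs;

sub dir_names {
    return unless -d;                 # $_ is the current entry's name
    return unless /[IPD]\d{8}$/;      # test the entry itself, not its parent
    $dirs{$File::Find::name} = 1;     # record the full path
}

find(\&dir_names, $root);
print "$_\n" for sort keys %dirs;
```

Because the pattern is now tested against the entry itself rather than against $File::Find::dir, the three subdirectories a, b and c no longer trigger a match, and the P00436188 path should be listed once.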

      How do I fix the code to only list the directory once. Replace the period with a question mark (?) and you'll see he did have a question. Idiot.

        How do I fix the code to only list the directory once. Replace the period with a question mark (?) and you'll see he did have a question. Idiot.

        You read all the replies and find the one which answered the question years ago.

        If you want somebody to point it out for you, don't be rude.

Re: File:Find pattern match question
by Anonymous Monk on Oct 31, 2013 at 07:27 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use File::Find::Rule qw/ find /;

    my @dirs = find(
        -directory ,
        name => qr{([IPD]\d{8}$)},
        -in => $wellpath,
    );
    print join "\n", @dirs, '';

      Couldn't get this to work

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Fcntl;
      #use File::Find;
      #use File::Find::Rule;
      use File::Find::Rule qw/ find /;

      #*****************Path Variables*******************
      our $wellpath = "N:\\repos\\open\\Wells\\Regulated";
      our $surveypath = "N:\\repos\\open\\Surveys\\Regulated";
      our $temp = "C:\\Temp\\";
      #**************************************************

      my @dirs = find(
          -directory ,
          name => qr{([MIPD]\d{8}$)},
          -in => $temp,
      );
      print join "\n", @dirs, '';

      C:\Temp\hddzip>perl dirpath3.pl
      Can't locate method File::Find::Rule::-directory at dirpath3.pl line 16.

      I have installed File::Find::Rule

        :) Try "directory" instead of "-directory"
Re: File:Find pattern match question
by Anonymous Monk on Oct 31, 2013 at 07:44 UTC

    Perhaps $File::Find::prune will work:

    sub dir_names {
        # Skip over everything that is not a directory. Note that
        # chdir means we can use -X file tests on default $_.
        -d or return;

        # Skip over directories that don't match required pattern.
        # Match against $_ instead of entire directory path.
        /^[IPD]\d{8}$/ or return;

        print "$File::Find::name\n";

        # Do not recurse below current directory.
        $File::Find::prune = 1;
    }
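
A runnable sketch of the prune approach, again against a hypothetical temporary tree. Here the nested directory I00000001 also matches the pattern, which makes the effect of $File::Find::prune visible. The array @found is an illustrative addition so the result can be inspected after the walk; the original reply prints directly:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical test tree: I00000001 matches the pattern too,
# but sits below P00436188.
my $root = tempdir(CLEANUP => 1);
make_path("$root/Logs_digital/P00436188/I00000001");
make_path("$root/Logs_digital/P00436188/raw");

my @found;

sub dir_names {
    return unless -d;                  # only directories
    return unless /^[IPD]\d{8}$/;      # whole name must match
    push @found, $File::Find::name;
    $File::Find::prune = 1;            # do not descend into this directory
}

find(\&dir_names, $root);
print "$_\n" for @found;
```

With pruning in place, find never enters P00436188, so the nested I00000001 should not be reported. Without the prune line, both directories would match. Keep in mind that, per the File::Find documentation, prune has no effect when the bydepth option is used.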

Node Type: perlquestion [id://1060489]
Approved by Athanasius