PerlMonks  

Re: Finding deepest directories in a tree structure represented in a flatfile?

by jeffa (Bishop)
on Dec 04, 2015 at 23:08 UTC (#1149426)


in reply to Finding deepest directories in a tree structure represented in a flatfile?

Just use a hash to eliminate duplicates and File::Basename will do the heavy lifting:

use strict;
use warnings;
use File::Basename qw( fileparse );

my %seen;
while (<DATA>) {
    (undef, my $path, undef) = fileparse( $_ );
    print "$path\n" unless $seen{$path}++;
}
__DATA__
testing123/
foobar/
helloworld/
helloworld/r1/
helloworld/r1/helloworld-5-0.noarch.rpm
helloworld/r1/testfile23.txt
helloworld/r1/tomcat-7.0.27.rpm
helloworld/r2/
helloworld/r2/helloworld-2-0.noarch.rpm
helloworld/r2/testfile12.txt
helloworld/r2/tomcat-5.0.52.rpm
hellotest/
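For readers unfamiliar with fileparse, here is a minimal sketch of what it hands back for the three kinds of line that can show up in the flat file. The helper name dir_of is made up for illustration, and forcing Unix path rules with fileparse_set_fstype is an assumption made only so the output is the same on every platform.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse fileparse_set_fstype );

# Assumption: the flat file uses "/" as separator, so force Unix rules.
fileparse_set_fstype('Unix');

# dir_of is a hypothetical helper, not part of File::Basename.
# fileparse returns (filename, directory, suffix); only the directory is kept.
sub dir_of {
    my ($line) = @_;
    chomp $line;
    my (undef, $dir) = fileparse($line);
    return $dir;
}

print dir_of("helloworld/r1/tomcat-7.0.27.rpm\n"), "\n";  # a file inside a directory
print dir_of("helloworld/r1/\n"), "\n";                   # a directory entry
print dir_of("README\n"), "\n";                           # no directory part at all
```

A line that already ends in "/" comes back unchanged as the directory, which is why the hash sees each directory only once no matter how many files it contains. A bare file name yields "./".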

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Replies are listed 'Best First'.
Re^2: Finding deepest directories in a tree structure represented in a flatfile?
by grasshopper!!! (Beadle) on Dec 05, 2015 at 00:08 UTC

    This works without a library on the data supplied. It may fail on other data, so it is best to use jeffa's code if you have File::Basename installed.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;
    while (<DATA>) {
        # Skip lines with no "/" so a stale $1 from an earlier match is never reused.
        next unless m/^(.+\/)(.*)$/;
        my $path = $1;
        print "$path\n" unless $seen{$path}++;
    }
    __DATA__
    testing123/
    foobar/
    helloworld/
    helloworld/r1/
    helloworld/r1/helloworld-5-0.noarch.rpm
    helloworld/r1/testfile23.txt
    helloworld/r1/tomcat-7.0.27.rpm
    helloworld/r2/
    helloworld/r2/helloworld-2-0.noarch.rpm
    helloworld/r2/testfile12.txt
    helloworld/r2/tomcat-5.0.52.rpm
    hellotest/
      "May fail on other data"

      A better regex would be m{^(.+[\\/])[^\\/]*$} which will also work on MS Windows file paths.

      But, as you said, it would be better to use File::Basename, as it handles the file path syntax of several other OSes.
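A quick sketch of how the suggested regex behaves on both separator styles. The helper name dir_part and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# dir_part is a hypothetical helper: it returns everything up to and
# including the last "/" or "\", or undef when the line has no separator.
sub dir_part {
    my ($line) = @_;
    chomp $line;
    return $line =~ m{^(.+[\\/])[^\\/]*$} ? $1 : undef;
}

for my $line ("helloworld/r1/testfile23.txt", 'C:\\temp\\r2\\notes.txt', "README") {
    my $dir = dir_part($line);
    print defined $dir ? "$dir\n" : "(no directory part)\n";
}
```

Returning undef for a line without a separator leaves it to the caller to decide what to do with such lines, instead of silently reusing an earlier capture.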

Re^2: Finding deepest directories in a tree structure represented in a flatfile?
by Lotus1 (Priest) on Dec 05, 2015 at 17:34 UTC

    The OP asked that helloworld/ not be included, since it has subdirectories. This addition to your script works, but the nested search compares every directory against every other, so it will not be efficient for a large file.

    use strict;
    use warnings;
    use File::Basename qw( fileparse );

    my %seen;
    while (<DATA>) {
        (undef, my $path) = fileparse( $_ );
        print "$path\n" unless $seen{$path}++;
    }
    print "="x70,"\n";
    foreach my $key (keys %seen){
        print "$key\n" if 1 == scalar grep { /\Q$key/ } keys %seen;
    }
    __DATA__
    testing123/
    foobar/
    helloworld/
    helloworld/r1/
    helloworld/r1/helloworld-5-0.noarch.rpm
    helloworld/r1/testfile23.txt
    helloworld/r1/tomcat-7.0.27.rpm
    helloworld/r2/
    helloworld/r2/helloworld-2-0.noarch.rpm
    helloworld/r2/testfile12.txt
    helloworld/r2/tomcat-5.0.52.rpm
    hellotest/

    For large files, sort the directories and then make one pass that removes the current element if the next element matches it. Or, assuming the original file is already sorted, just check each element against the next one.

    use strict;
    use warnings;
    use File::Basename qw( fileparse );

    my %seen;
    my @directories;
    while (<DATA>) {
        (undef, my $path) = fileparse( $_ );
        #push @directories, $path;
        ## update, I intended to put the push inside the unless block.
        ## The original works but not exactly as I expected since it puts
        ## duplicates in the array.
        #print "$path\n" unless $seen{$path}++;
        unless ($seen{$path}++) {
            print "$path\n";
            push @directories, $path;
        }
    }
    print "="x70,"\n";
    foreach my $index (0..$#directories-1){
        my $current = $directories[$index];
        print "$current\n" if not $directories[$index+1] =~ /\Q$current/ ;
    }
    ### assuming the last one is always terminal
    print $directories[$#directories], "\n";
    __DATA__
    testing123/
    foobar/
    helloworld/
    helloworld/r1/
    helloworld/r1/helloworld-5-0.noarch.rpm
    helloworld/r1/testfile23.txt
    helloworld/r1/tomcat-7.0.27.rpm
    helloworld/r2/
    helloworld/r2/helloworld-2-0.noarch.rpm
    helloworld/r2/testfile12.txt
    helloworld/r2/tomcat-5.0.52.rpm
    hellotest/

    Update: This seems to work in a single pass through the file.

    use strict;
    use warnings;
    use File::Basename qw( fileparse );

    my $first='';
    my $second='';
    while (<DATA>) {
        $first = $second;
        (undef, $second) = fileparse( $_ );
        print "$second\n" if eof DATA;
        next unless $first;
        unless ( $second =~ /\Q$first/) {
            print "$first\n";
        }
    }
    __DATA__
    testing123/
    foobar/
    helloworld/
    helloworld/r1/
    helloworld/r1/helloworld-5-0.noarch.rpm
    helloworld/r1/testfile23.txt
    helloworld/r1/tomcat-7.0.27.rpm
    helloworld/r2/
    helloworld/r2/helloworld-2-0.noarch.rpm
    helloworld/r2/testfile12.txt
    helloworld/r2/tomcat-5.0.52.rpm
    hellotest/test.txt
    hellotest/
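One caveat with the /\Q$first/ test: it is unanchored, so a directory such as r1/ would also be treated as a parent of x/r1/. Below is a minimal sketch of the same next-element comparison done as a strict prefix check. It assumes the directory list has already been de-duplicated and sorted and that every entry ends in "/". The helper name leaf_dirs is made up for illustration.

```perl
use strict;
use warnings;

# leaf_dirs is a hypothetical helper. Assumes @dirs is sorted, contains no
# duplicates, and that every entry ends in "/".
sub leaf_dirs {
    my @dirs = @_;
    my @leaves;
    for my $i (0 .. $#dirs) {
        my $next = $i < $#dirs ? $dirs[$i + 1] : '';
        # In sorted order any subdirectory follows its parent directly, so a
        # directory is a leaf unless the next entry begins with it.
        push @leaves, $dirs[$i] unless index($next, $dirs[$i]) == 0;
    }
    return @leaves;
}

my @sorted = sort qw( testing123/ foobar/ helloworld/ helloworld/r1/ helloworld/r2/ hellotest/ );
print "$_\n" for leaf_dirs(@sorted);
```

Using index(...) == 0 instead of a regex avoids both the quoting and the anchoring question, at the cost of requiring sorted input.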
      Thanks for all the examples -- this one works well for me. I don't think the result set will ever be large enough to introduce efficiency issues.
