http://www.perlmonks.org?node_id=63473

Rudif has asked for the wisdom of the Perl Monks concerning the following question:

Everyone knows how to walk a directory tree and collect the file paths into a flat list, perhaps filtering them along the way.

However, I want to collect those paths into a tree of hashes, where keys would be file or subdirectory names and values would be a scalar for a file and a hashref for a subdirectory.
Once in a tree, I could apply recursive programs to do whatever I need with my files.

Here is my first try, with sample output (and a directory browser in 3 lines of perl :-).

#! perl -w use strict; use File::Find; my $dirth = dirTreeHash(shift); sub dirTreeHash { my $dir = shift; return unless defined $dir && -d $dir; $dir =~ s#\\#/#g; # Win32 :-( my $dirth = {}; find sub { return if -d; eval join '', '$dirth->', ( map { '{\'' . $_ . '\'}' } split +'/', $File::Find::name ), '=""'; warn $@ if $@; }, $dir; $dirth; } # dump it use Data::Dumper; $Data::Dumper::Indent = 1; print Dumper $dirth; # browse it use Tk::ObjScanner; MainWindow->new->ObjScanner(caller => $dirth)->pack; + Tk::MainLoop; __END__ H:\devperl\perlmonks\TryDirTreeHash.pl H:\devperl\PerlMonksChat2 $VAR1 = { 'H:' => { 'devperl' => { 'PerlMonksChat2' => { 'README' => '', 'ChangeLog' => '', 'Makefile' => '', 'getchat.pl' => '', 'getpage.pl' => '', 'getchat.old.pl' => '', 'test' => { 'testnn.pl' => '', 'nodetypes' => '' }, 'PerlMonks' => { 'Users.pm' => '', 'NewestNodes.pm' => '', 'Chat.pm' => '' }, 'Noname6.pl' => '', 'PerlMonks.pm' => '', 'pmpager.pl' => '', 'PerlMonksChat.pm' => '', 'pmchat' => '' } } } };
Question:
What other/better ways to perform this dir-to-hash mapping would you suggest?

Rudif

Replies are listed 'Best First'.
Re: How to map a directory tree to a perl hash tree
by japhy (Canon) on Mar 10, 2001 at 20:42 UTC
    I use the SNAP (Stacked Nodes Affect Parent) method. I use it for my YAPE modules. The process is rather simple. Make two variables refer to the same reference, and you create an array for stacking.
    my $node = my $tree = []; my @stack;
    You never modify $tree directly -- you do all the manipulation on $node, and when you need to recurse, you add a new "field" to $node; then you push the $node onto the @stack, and then you set $node equal to the field:
    if ($token eq '(') { push @{ $node }, []; push @stack, $node; $node = $node->[-1]; # the new empty array ref }
    When you return from the recursion, you get the top value from the @stack:
    if ($token eq ')') { $node = pop @stack; }
    Now, $node contains what it was before, PLUS any modifications to that empty array reference. And because $node and $tree refer to the same data, modifying the data in $node modifies $tree.

    It's important to realize that saying:

    $a = $b = []; $a = 2;
    does not affect $b in any way. It's the changing of the data referred to by $a that affects $b.

    Without further ado, here is my implementation of a directory tree structure, using SNAP.

    #!/usr/bin/perl -w use File::Find; use Data::Dumper; use strict; $Data::Dumper::Indent = 1; build_tree(my $tree, shift); print Dumper $tree; { sub build_tree { my $node = $_[0] = {}; my @s; find( sub { # fixed 'if' => 'while' -- thanks Rudif $node = (pop @s)->[1] while @s and $File::Find::dir ne $s[-1][0] +; return $node->{$_} = -s if -f; push @s, [ $File::Find::name, $node ]; $node = $node->{$_} = {}; }, $_[1]); $_[0]{$_[1]} = delete $_[0]{'.'}; } }
    The recursion is not handled by me, it's handled by find(), so I had to kludge my way into faking the start and end of a recursion. But it works.

    japhy -- Perl and Regex Hacker
      But it works.

      Indeed, it works, it is free of evil string eval, and you explained it in English.
      And I still have 0.5 * $weekend ahead of me, to let it sink in. Thanks!

      Rudif

      PS. Your OGRE is awesome. But it adds an extra  (?-imsx:) around the regex that I type in, e.g. I ask  ([\.\d]+) and it returns and explains  (?-imsx:([\.\d]+)) - why?

      Update You qr // the submitted regex, right? Just guessing, I did not look into the source code.
        I don't think so -- that would stop me from being able to point out the errors in a regex my own way.

        I add the (?:...) around it for several reasons. First, to honor any of the /imsx switches you have on. Second, so that the regex has ONE root node (easier to parse). Third, so the regex is "its own regex". (That third reason is probably lame.)

        japhy -- Perl and Regex Hacker

      But it works.

      Er, almost :-(
      There is a bug in sub build_tree.
      To fix it, you have to replace the  if that pops the stack by a while .
      The test below shows the difference.

      Rudif

      #! perl -w use strict; use File::Find; use File::Path; use Data::Dumper; $Data::Dumper::Indent = 1; my $dir = './buildtreetestdir'; mkpath "$dir"; mkpath "$dir/dir1"; mkpath "$dir/dir1/dir11"; mkpath "$dir/dir2"; mkpath "$dir/dir2/dir21"; build_tree(my $tree, $dir); build_tree_fixed(my $tree_fixed, $dir); print Data::Dumper->Dump( [ $tree, $tree_fixed ], [ qw / tree tree_fix +ed / ] ); { sub build_tree { my $node = $_[0] = {}; my @s; find( sub { $node = (pop @s)->[1] if @s and $File::Find::dir ne $s[-1][0]; return $node->{$_} = -s if -f; push @s, [ $File::Find::name, $node ]; $node = $node->{$_} = {}; }, $_[1]); $_[0]{$_[1]} = delete $_[0]{'.'}; } } { sub build_tree_fixed { my $node = $_[0] = {}; my @s; find( sub { $node = (pop @s)->[1] while @s and $File::Find::dir ne $s[-1][0] +; return $node->{$_} = -s if -f; push @s, [ $File::Find::name, $node ]; $node = $node->{$_} = {}; }, $_[1]); $_[0]{$_[1]} = delete $_[0]{'.'}; } } __END__ $tree = { './buildtreetestdir' => { 'dir2' => { 'dir21' => {}, 'dir1' => { 'dir11' => {} } } } }; $tree_fixed = { './buildtreetestdir' => { 'dir1' => { 'dir11' => {} }, 'dir2' => { 'dir21' => {} } } };
        Sorry. Thanks for the fix. I think I actually fixed it like that after I posted the code, but forgot to fix it here.

        japhy -- Perl and Regex Hacker
Re: How to map a directory tree to a perl hash tree
by merlyn (Sage) on Mar 10, 2001 at 20:27 UTC
    This line:
    eval join '', '$dirth->', ( map { '{\'' . $_ . '\'}' +} split '/', $File::Find::name ), '=""';
    is dangerous for all the same reasons as it was the last two times this was brought up. See the thread containing my "Re: Re: stringification" for the most recent time, and follow the link contained within it to go to the older thread which looks a lot more like your code.

    In short (as I'm also saying in two other threads right now as well),

    • AVOID STRING EVAL

    -- Randal L. Schwartz, Perl hacker

      AVOID STRING EVAL

      I will.
      I did have an inkling that a solution involving multiple references to the same data item (a nascent hash entry), perhaps combined with recursion, should be possible, but I could not see how it would fit together. But then as I wrote some test cases, initializing multilevel hashes, it dawned on me that I could construct similar Perl code on the fly and eval it - my first ever creative use for string eval. I must admit, I took the easy way out .-(
      I probably cost me a --, but I'm wiling to pay my way.

      Rudif
Re: How to map a directory tree to a perl hash tree
by Masem (Monsignor) on Mar 10, 2001 at 19:23 UTC
    Might not a recursive method of loading this hash work better? Namely, if the 'file' is a directory, then it's entry would just be something like $dirth{ $dir } = dirTreeHash( $dir ).
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

      Yes, but then you couldn't use File::Find (one of several problems with File::Find's call-back interface).

              - tye (but my friends call me "Tye")
Re: How to map a directory tree to a perl hash tree
by DeaconBlues (Monk) on Mar 11, 2001 at 12:10 UTC

    Here's another way. I handle the recursion myself, without File::Find. I used your example as a model.

    Is this what you want or am I missing something?

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $dirth = &dirTreeHash(shift); sub dirTreeHash { my $dir = shift; return unless (defined $dir && -d $dir); $dir =~ s#\\#/#g; # Win32 :-( my $dirth = {}; opendir(DIR, $dir) || die "Unable to opendir $dir\n"; my @files = grep {!/^\.\.?$/} readdir(DIR); closedir(DIR); map {$dirth->{$_} = (-d "$dir/$_" ? &dirTreeHash("$dir/$_") : '')} @ +files; return $dirth; } $Data::Dumper::Indent = 1; print Dumper $dirth;

        Sorry Randal. How about if I change the line from

        map {$dirth->{$_} = (-d "$dir/$_" ? &dirTreeHash("$dir/$_") : '')} @fi +les;

        to

        map {$dirth->{$_} = (-d "$dir/$_" ? &dirTreeHash("$dir/$_") : '') unle +ss (-l "$dir/$_")} @files;
      Is this what you want or am I missing something?
      Yes, it does exactly what I want.

      Now we have 3 solutions to compare:

      Yours is perhaps the most natural - a recursive solution for a recursive problem.
      japhy's solution neatly replaces recursion by iteration and enables the use of File::Find::find.
      Mine , deprecated because based on string eval, uses the information extracted with File::Find::find directly.

      All have similar complexity. Performance will probably be dominated by the disk access operations, not by the algorithm used.

      Rudif
Re: How to map a directory tree to a perl hash tree
by brother ab (Scribe) on Mar 11, 2001 at 13:13 UTC

    May be it would be suitable to use tied hash (see e.g. Tie::Dir)? Do you really need to load all the directory tree into the memory before scanning?

    -- brother ab
      May be it would be suitable to use tied hash(see e.g. Tie::Dir)?
      Tie::Dir handles files in one directory. It should be posible to write a Tie::DirTree along similar lines.

      At times I used Tie modules (Win32::OLE, Tie::Registry), but at this stage I don't have a feel for the advantages of a Tie module relative to a plain module. I suppose a Tie module adds syntactic convenience. Can you or any other Monk elaborate on this?

      Do you really need to load all the directory tree into the memory before scanning?

      It depends on what I want to do. A simple job (e.g. count the number of .html files and compute their total size) can be done simply in the find's wanted subroutine.
      For more elaborate jobs it might be better to first extract the relevant directory and file information into a Perl structure and then work on that structure, for example:
      • generate a graphical representation of the directory tree for a browser (javascript, html, chm)
      • compare the perl/html tree with perl/lib and per/site/lib trees to find .pm and .pod files that have no corresponding .html file, and make that file
      Rudif