Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^3: Building a UNIX path from Irritating data

by kyle (Abbot)
on Nov 26, 2009 at 02:14 UTC ( #809465=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Building a UNIX path from Irritating data
in thread Building a UNIX path from Irritating data

I replaced everything after the second open with this:

my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) { $parentid = $1; } elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) { push @{ $parents_of{ $1 } }, $parentid; } } sub build_path { my $folderid = shift @_; my @parents = @{ $parents_of{ $folderid } || [] }; return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ]; } foreach my $k (sort (keys (%folders))) { $folderpaths{$k} = build_path($k); print "$k => $_\n" for @{ $folderpaths{$k} }; }

The highlights of the changes are:

  • I read the subfolders file only once.
  • I store the subfolders data in a %parents_of hash of arrays, which maps each folder ID to the folders that have it as a subfolder.
  • In build_path, I use the hash of arrays instead of iterating over the whole subfolders file. It's still recursive.
  • Since build_path returns an array reference now, the output loop has to iterate its contents.

For the ID that you pointed out, the new output is:

3053 => 100/3051/3053 3053 => 100/3057/3053 3053 => 100/3063/3053 3053 => 100/3066/3053

The new code produces all the output of the old code plus another 1108 lines, and it runs in less than 1 second while the original takes about 220 seconds.

I hope this helps.


Comment on Re^3: Building a UNIX path from Irritating data
Select or Download Code
Re^4: Building a UNIX path from Irritating data
by roswell1329 (Acolyte) on Nov 29, 2009 at 23:22 UTC
    Hi kyle --

    I've been working on your code for a while, and I cannot yet see how it builds a path, and I cannot get it to work for me. I copied your code above exactly as you said and used the data file that I provided, but here's the output I get:

    100 => 100 101 => 101 1013 => 1013 1014 => 1014 1015 => 1015 1053 => 1053 1057 => 1057 1059 => 1059 1065 => 1065 119 => 119 1198 => 1198 1227 => 1227 1228 => 1228 1229 => 1229 1230 => 1230 1231 => 1231 1232 => 1232 1233 => 1233 1234 => 1234 1238 => 1238 1239 => 1239 1241 => 1241 1298 => 1298 1299 => 1299 1300 => 1300 1311 => 1311 1317 => 1317 1318 => 1318 1320 => 1320 1321 => 1321 1322 => 1322 1323 => 1323 1324 => 1324 1325 => 1325 1327 => 1327 1328 => 1328 1329 => 1329 1347 => 1347 1348 => 1348 1349 => 1349 1350 => 1350 1351 => 1351 1352 => 1352 1374 => 1374 1375 => 1375 1376 => 1376 1377 => 1377 1390 => 1390 1393 => 1393 1394 => 1394 1395 => 1395 1396 => 1396 1397 => 1397 1398 => 1398 ...
    I think the line that may be wrong in my code is this one:
    return @parents ? [ map { map { "$_/$folderid" } @{ build_path( $_ ) } } @parents ] : [ $folderid ];
    I've never seen a "return" command used in a conditional this way. It would seem to me that in order for the conditional to be true, the focus would have already returned to the line calling build_path originally. Does this actually mean if "return @parents" would succeed then do X, if not, Y?

    I really appreciate your help. I think this is very close to what I need, but I'm missing some keystone. To make sure I got your code correct, here's the exact script I'm running (in cygwin on a windows system, and on an HPUX system):

    #!/usr/bin/perl -w use strict; my $foldername_file = shift @ARGV; my $folderfolder_file = shift @ARGV; my %folders = (); my %folderpaths = (); open(FNAMES,"$foldername_file") or die "Can't open $foldername_file: $ +!\n"; foreach my $folderline (<FNAMES>) { chomp $folderline; if ($folderline =~ /^Folder (\d{3,5})\s+\-\s+(.*)\.$/) { $folders{$1} = $2; } } close(FNAMES); open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; my @subfolders = <FFINFO>; my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) { $parentid = $1; } elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) { push @{ $parents_of{ $1 } }, $parentid; } } sub build_path { my $folderid = shift @_; my @parents = @{ $parents_of{ $folderid } || [] }; return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ]; } foreach my $k (sort (keys (%folders))) { $folderpaths{$k} = build_path($k); print "$k => $_\n" for @{ $folderpaths{$k} }; }

      I think the problem is that what you have still reads everything into @subfolders. Here's the snippet I'm talking about.

      open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; # This reads in every line from FFINFO my @subfolders = <FFINFO>; my %parents_of; my $parentid; # This loop also wants to read every line, # but the handle is already at EOF, # so it gets nothing. while ( my $line = <FFINFO> ) {

      If you need @subfolders for some purpose outside the code we're talking about, you could still have it. Just declare my @subfolders; at the same scope as $parentid and %parents_of, without initializing it. Then, right inside the FFINFO read loop, push every $line in. It winds up looking like this:

      open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; my @subfolders; my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { push @subfolders, $line;

      You also had a question about this:

      return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ];

      What this means is, "if @parents is non-empty, return first expression (with nested maps), else return the second expression ($folderid alone in an array reference)." Your description leads me to believe that you have the precedence mixed up. What you describe is this:

      ( return @parents ) ? [ X ] : [ Y ];

      What's happening is this:

      return ( @parents ? [ X ] : [ Y ] );

      You can see this kind of thing yourself using B::Deparse. I ran this command line:

      perl -MO=Deparse,-p input-file.pl

      With the -p, it puts parentheses in to clarify how expressions are interpreted. As a result, it showed me this (along with all the other code):

      return((@parents ? [map({map({"$_/$folderid";} @{build_path($_);});} @ +parents)] : [$folderid]));

      That's not pretty, but it does show that the return is "outside" the whole rest of the line.

      I hope this helps. I feel I've written in some haste, so I wouldn't be surprised if I've been unclear. If I have left you with any other questions, feel free to ask.

        Holy smokes! I had to look at this code for about 2 hours to understand it. I see now that build_path always returns a list (array), but usually only with 1 element. I also see that you're using several anonymous arrays for each iteration. It also took me a while to see how you're switching between lists and scalars back and forth. This code is very efficient. It works brilliantly for me now, and I thank you very much for the enlightenment!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://809465]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (13)
As of 2014-08-27 23:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (253 votes), past polls