Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^3: Building a UNIX path from Irritating data

by kyle (Abbot)
on Nov 26, 2009 at 02:14 UTC ( #809465=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Building a UNIX path from Irritating data
in thread Building a UNIX path from Irritating data

I replaced everything after the second open with this:

my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) { $parentid = $1; } elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) { push @{ $parents_of{ $1 } }, $parentid; } } sub build_path { my $folderid = shift @_; my @parents = @{ $parents_of{ $folderid } || [] }; return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ]; } foreach my $k (sort (keys (%folders))) { $folderpaths{$k} = build_path($k); print "$k => $_\n" for @{ $folderpaths{$k} }; }

The highlights of the changes are:

  • I read the subfolders file only once.
  • I store the subfolders data in a %parents_of hash of arrays, which maps each folder ID to the folders that have it as a subfolder.
  • In build_path, I use the hash of arrays instead of iterating over the whole subfolders file. It's still recursive.
  • Since build_path returns an array reference now, the output loop has to iterate its contents.

For the ID that you pointed out, the new output is:

3053 => 100/3051/3053 3053 => 100/3057/3053 3053 => 100/3063/3053 3053 => 100/3066/3053

The new code produces all the output of the old code plus another 1108 lines, and it runs in less than 1 second while the original takes about 220 seconds.

I hope this helps.


Comment on Re^3: Building a UNIX path from Irritating data
Select or Download Code
Re^4: Building a UNIX path from Irritating data
by roswell1329 (Acolyte) on Nov 29, 2009 at 23:22 UTC
    Hi kyle --

    I've been working on your code for a while, and I cannot yet see how it builds a path, and I cannot get it to work for me. I copied your code above exactly as you said and used the data file that I provided, but here's the output I get:

    100 => 100 101 => 101 1013 => 1013 1014 => 1014 1015 => 1015 1053 => 1053 1057 => 1057 1059 => 1059 1065 => 1065 119 => 119 1198 => 1198 1227 => 1227 1228 => 1228 1229 => 1229 1230 => 1230 1231 => 1231 1232 => 1232 1233 => 1233 1234 => 1234 1238 => 1238 1239 => 1239 1241 => 1241 1298 => 1298 1299 => 1299 1300 => 1300 1311 => 1311 1317 => 1317 1318 => 1318 1320 => 1320 1321 => 1321 1322 => 1322 1323 => 1323 1324 => 1324 1325 => 1325 1327 => 1327 1328 => 1328 1329 => 1329 1347 => 1347 1348 => 1348 1349 => 1349 1350 => 1350 1351 => 1351 1352 => 1352 1374 => 1374 1375 => 1375 1376 => 1376 1377 => 1377 1390 => 1390 1393 => 1393 1394 => 1394 1395 => 1395 1396 => 1396 1397 => 1397 1398 => 1398 ...
    I think the line that may be wrong in my code is this one:
    return @parents ? [ map { map { "$_/$folderid" } @{ build_path( $_ ) } } @parents ] : [ $folderid ];
    I've never seen a "return" command used in a conditional this way. It would seem to me that in order for the conditional to be true, the focus would have already returned to the line calling build_path originally. Does this actually mean if "return @parents" would succeed then do X, if not, Y?

    I really appreciate your help. I think this is very close to what I need, but I'm missing some keystone. To make sure I got your code correct, here's the exact script I'm running (in cygwin on a windows system, and on an HPUX system):

    #!/usr/bin/perl -w use strict; my $foldername_file = shift @ARGV; my $folderfolder_file = shift @ARGV; my %folders = (); my %folderpaths = (); open(FNAMES,"$foldername_file") or die "Can't open $foldername_file: $ +!\n"; foreach my $folderline (<FNAMES>) { chomp $folderline; if ($folderline =~ /^Folder (\d{3,5})\s+\-\s+(.*)\.$/) { $folders{$1} = $2; } } close(FNAMES); open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; my @subfolders = <FFINFO>; my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) { $parentid = $1; } elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) { push @{ $parents_of{ $1 } }, $parentid; } } sub build_path { my $folderid = shift @_; my @parents = @{ $parents_of{ $folderid } || [] }; return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ]; } foreach my $k (sort (keys (%folders))) { $folderpaths{$k} = build_path($k); print "$k => $_\n" for @{ $folderpaths{$k} }; }

      I think the problem is that what you have still reads everything into @subfolders. Here's the snippet I'm talking about.

      open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; # This reads in every line from FFINFO my @subfolders = <FFINFO>; my %parents_of; my $parentid; # This loop also wants to read every line, # but the handle is already at EOF, # so it gets nothing. while ( my $line = <FFINFO> ) {

      If you need @subfolders for some purpose outside the code we're talking about, you could still have it. Just declare my @subfolders; at the same scope as $parentid and %parents_of, without initializing it. Then, right inside the FFINFO read loop, push every $line in. It winds up looking like this:

      open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_fil +e: $!\n"; my @subfolders; my %parents_of; my $parentid; while ( my $line = <FFINFO> ) { push @subfolders, $line;

      You also had a question about this:

      return @parents ? [ map { map { "$_/$folderid" } @{ build_path( +$_ ) } } @parents ] : [ $folderid ];

      What this means is, "if @parents is non-empty, return first expression (with nested maps), else return the second expression ($folderid alone in an array reference)." Your description leads me to believe that you have the precedence mixed up. What you describe is this:

      ( return @parents ) ? [ X ] : [ Y ];

      What's happening is this:

      return ( @parents ? [ X ] : [ Y ] );

      You can see this kind of thing yourself using B::Deparse. I ran this command line:

      perl -MO=Deparse,-p input-file.pl

      With the -p, it puts parentheses in to clarify how expressions are interpreted. As a result, it showed me this (along with all the other code):

      return((@parents ? [map({map({"$_/$folderid";} @{build_path($_);});} @ +parents)] : [$folderid]));

      That's not pretty, but it does show that the return is "outside" the whole rest of the line.

      I hope this helps. I feel I've written in some haste, so I wouldn't be surprised if I've been unclear. If I have left you with any other questions, feel free to ask.

        Holy smokes! I had to look at this code for about 2 hours to understand it. I see now that build_path always returns a list (array), but usually only with 1 element. I also see that you're using several anonymous arrays for each iteration. It also took me a while to see how you're switching between lists and scalars back and forth. This code is very efficient. It works brilliantly for me now, and I thank you very much for the enlightenment!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://809465]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (7)
As of 2015-07-04 08:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (58 votes), past polls