http://www.perlmonks.org?node_id=927910

azheid has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

Alright, I do not have any formal training in computer science, so bear with me please. I use perl code in my job to do complex sorts and data mining. I have a job which I think is best handled with recursion, yet I have never really used this technique in perl, so I am getting lots of errors. What I have is a tab delimited file with three columns.

Basically, the first column gives the name of a network element, the second column gives a related network element, and the third column gives how the two are related. The only options for the third column are activate or inhibit. I want to systematically inhibit all network objects(one at a time) and output all the affected objects. The tab delimited file is called useful_dat.tab.

The recursive code that I have is as follows,
my @activated; my @inhibited; my $object=test_element; &inhibited_rec($object); print "@activated\n"; print "@inhibited\n"; sub inhibited_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`; foreach (@a){ my @b=split(/\t/,$_); if($b[1] eq 'Activation'){ } else{ push (@activated,$b[0]); &activated_rec($b[0]); } } } sub activated_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`: foreach (@a){ my @b=split(/\t/,$_); if($b[1] eq 'Activation'){ push (@activated,$b[0]); &activated_rec($b[0]); } else{ push (@inhibited,$b[0]); &inhibited_rec($b[0]); } } }

Some of the details of the code are not included here for clarity. The errors I get involve the variable declarations @a,@b,@activated,@inhibited. Can you please help me figure out how to do this correctly? Or if what I want to do is not possible/very hard in perl, please let me know.

Replies are listed 'Best First'.
Re: Perl Recursion
by toolic (Bishop) on Sep 26, 2011 at 18:06 UTC
    my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`:
    That's a compile error. I assume you meant (with a semicolon):
    my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`;
    my $object=test_element;
    use strict and warnings. Should that be... ?
    my $object = 'test_element';
    • Show the exact errors messages you get.
    • Show us a few lines of useful_dat.tab
Re: Perl Recursion
by jwkrahn (Abbot) on Sep 27, 2011 at 00:00 UTC
    my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`: foreach (@a){ my @b=split(/\t/,$_); if($b[1] eq 'Activation'){

    You don't chomp the lines from useful_dat.tab so "Activation\n" will never be equal to 'Activation'.

    You should probably store the contents of useful_dat.tab in a hash instead of using an external process to read the same file multiple times.

Re: Perl Recursion
by zentara (Archbishop) on Sep 26, 2011 at 17:59 UTC
Re: Perl Recursion
by Limbic~Region (Chancellor) on Sep 26, 2011 at 18:18 UTC
    azheid,
    It sounds like your tab delimited file is a text representation of a tree. I believe you have implied but not explicitly stated that if A is related to B and B is related to C that A is related to C. If that's the case, and your file looks like the following:
    object_1 object_2 active object_2 object_3 active

    You need to first identify which of the two columns (1 or 2) represents the parent in the relationship. Next, you need to identify the root(s), and build the tree in memory (assuming the file will fit in memory). Once you have done that, you just need to find the node you want to deactivate and walk the tree below that point deactivating all attached nodes.

    Cheers - L~R

Re: Perl Recursion
by azheid (Sexton) on Sep 27, 2011 at 22:15 UTC

    Thanks monks, I think I have a piece of working code. I still get a few errors, but I think it is from inconsistencies in the datafile. I will post the final code below, and if someone finds more errors I will check back periodically.

    #!/usr/bin/perl use strict; use warnings; # prints out inhibited object, directly activated ohjects, directly de +activated objects, indirectly activated objects, indirectly deacivate +d objects print "Modified Object\tActivated Objects\tDeactivated Objects\n"; open(DAT,'<',"./useful_dat.tab")||die "Canot open file\n"; my @file=<DAT>; close DAT; shift @file;##remove header my @activated; my @inhibited; my %true; foreach my $line(@file){ my @line=split(/\t/,$line); if($true{$line[0]}){ next;#if element has already come up, skip it } else{ $true{$line[0]}=1;#if object has already come up, store true } print "$line[0]\t";#inhibited object &inhibited_rec($line[0]); foreach(@activated){ print "$_,"; } print "\t"; foreach(@inhibited){ print "$_,"; } print "\n"; } sub inhibited_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`; foreach (@a){ my @b=split(/\t/,$_); chomp $b[1]; if($b[1] eq 'Activation'){ } else{ push (@activated,$b[0]); if($true{$b[0]}){ next; } else{ $true{$b[0]}=1; &activated_rec($b[0]); } } } } sub activated_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`; foreach (@a){ my @b=split(/\t/,$_); chomp $b[1]; if($b[1] eq 'Activation'){ push (@activated,$b[0]); if($true{$b[0]}){ next; } else{ $true{$b[0]}=1; &activated_rec($b[0]); } } else{ push (@inhibited,$b[0]); if($true{$b[0]}){ next; } else{ $true{$b[0]}=1; &inhibited_rec($b[0]); } } } }
Re: Perl Recursion
by azheid (Sexton) on Sep 27, 2011 at 20:07 UTC

    Yes, this is a tab delimited version of a tree. However, the tree is, at some points, circular. So object A might activate object A.

    I am sorry if I did not include enough of the code, I was trying to make it simpler. Yes the line my $object=test_element; should read my $object='test_element';

    The errors are as follows:

    "my" variable @b masks earlier declaration in same scope at interact_p +erturb.pl line 56. "my" variable @b masks earlier declaration in same scope at interact_p +erturb.pl line 60. Global symbol "@activated" requires explicit package name at interact_ +perturb.pl line 45. syntax error at interact_perturb.pl line 51, near "`awk '\$1 ~ /$_[0]/ +' useful_dat.tab |cut -f 2,3|sort|uniq`:" Global symbol "@a" requires explicit package name at interact_perturb. +pl line 52. Missing right curly or square bracket at interact_perturb.pl line 64, +at end of line Execution of interact_perturb.pl aborted due to compilation errors.

    Perhaps it would be better to include the actual version of the code I am using. I apologize if it is difficult to read. Thanks for your help

    #!/usr/bin/perl use strict; use warnings; # prints out inhibited object, directly activated ohjects, directly de +activated objects, indirectly activated objects, indirectly deacivate +d objects print "Modified Object\tActivated Objects\tDeactivated Objects\n"; open(DAT,'<',"./useful_dat.tab")||die "Canot open file\n"; my @file=<DAT>; close DAT; shift @file;##remove header my %true; foreach my $line(@file){ my @line=split(/\t/,$line); if($true{$line[0]}){ next;#if element has already come up, skip it } else{ $true{$line[0]}=1;#if object has already come up, store true } my @activated; my @inhibited; print "$line[0]\t";#inhibited object &inhibited_rec($line[0]); foreach(@activated){ print "$_,"; } print "\t"; foreach(@inhibited){ print "$_,"; } print "\n"; } sub inhibited_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`; foreach (@a){ my @b=split(/\t/,$_); if($b[1] eq 'Activation'){ } else{ push (@activated,$b[0]); &activated_rec($b[0]); } } } sub activated_rec{ my @a=`awk '\$1 ~ /$_[0]/' useful_dat.tab |cut -f 2,3|sort|uniq`: foreach (@a){ my @b=split(/\t/,$_); if($b[1] eq 'Activation'){ push (@activated,$b[0]); &activated_rec($b[0]); } else{ push (@inhibited,$b[0]); &inhibited_rec($b[0]); } }
      Change the colon to a semicolon, like I already told you. That will get rid some of the compile errors. Moving these out of the foreach will get rid of the other compile errors:
      my @activated; my @inhibited;
        Sorry I did not see that. Thank you.

      First thing I noticed: there's no need to slurp the entire file into an array if you're going to loop through it line-by-line anyway, so this:

      open(DAT,'<',"./useful_dat.tab")||die "Canot open file\n"; my @file=<DAT>; close DAT; shift @file;##remove header foreach my $line(@file){

      would be better:

      open(DAT,'<',"./useful_dat.tab")||die "Canot open file\n"; <DAT>; # discard a line while(my $line = <DAT>){ #loop through the rest

      As for the rest: Ok, you're looping through the file, and for each line, you call inhibited_rec(), which calls awk to loop through the same file again, looking for a match on the first fields, then for each line, if the second field isn't 'Activation', you call activated_rec() which uses awk to loop through the same file again, perhaps recursively..... Ouch. It seems to me you could end up running your file through that pipeline at least as many times as the number of lines in the file squared, if not cubed or more.

      Loop through the file once, and save the data into a structure that'll allow you to compare fields as you like. Perhaps an array of arrays, or a hash of hashes, or whatever makes sense for your data. If necessary, make a couple copies of it in different variables, so you can compare them to each other to find your circular connections (maybe not necessary, but might make it easier to understand what you're doing).

        You are absolutely correct. In fact, an array of arrays was exactly how I started. However, since the size of the file is not large (and my computer is really awesome), the additional time required to code such a data structure would not be worth it, at least for someone who codes on the side like me. Yes, I coped out, I admit it. :-)