Beefy Boxes and Bandwidth Generously Provided by pair Networks Joe
Perl Monk, Perl Meditation
 
PerlMonks  

how to find combine common elements of an array?

by ihperlbeg (Novice)
on Apr 05, 2011 at 00:33 UTC ( #897418=perlquestion: print w/ replies, xml ) Need Help??
ihperlbeg has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code:
#!/usr/bin/perl -w use strict; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4") ; my %gs_grp; #print join("\n",(@array)); #recording index of the array for each shared number my $count =0; for my $key (@array){ my @gs = split(/ /,$key); for my $gs(@gs){ push (@{$gs_grp{$gs}},$count); } $count++; } #combine index with shared number my @sp = @array; for my $gs(keys %gs_grp) { my @tmp=(); for my $grp ( @{ $gs_grp{$gs} } ) { next if(scalar @{ $gs_grp{$gs} }==1); push(@tmp,split(/ /, ($array[$grp]))); splice(@sp,$grp,""); } my %duplicate = map { $_ => 1 } @tmp; my @unique = keys %duplicate; push(@sp,join(" ",@unique)); } print join("\n", @sp); print "\n";

The output from the above code is Not correct

The output I want is this:

3 4 11 12 13 9 8 7

Could anyone please help me here? Thanks

Comment on how to find combine common elements of an array?
Select or Download Code
Re: how to find combine common elements of an array?
by wind (Priest) on Apr 05, 2011 at 01:09 UTC

    Basically you're trying to merge all sets with related values.

    Well, the below logic isn't pretty, but it works. Not sure what the problem was with your code

    #!/usr/bin/perl -w use strict; use warnings; my @array = map {[split / /]} ( "11 12", "11 13", "9 8 7", "3 4", "11 4" ); my %index = (); for my $i (0..$#array) { for my $val (@{$array[$i]}) { push @{$index{$val}}, $i; } } for my $i (0..$#array) { my $arr = $array[$i] or next; $array[$i] = undef; my %values = (); for my $val (@$arr) { next if $values{$val}++; for my $ind (@{$index{$val}}) { my $related = $array[$ind] or next; $array[$ind] = undef; push @$arr, @$related; } } print join(' ', sort {$a <=> $b} keys %values), "\n"; }
    Outputs
    3 4 11 12 13 7 8 9
Re: how to find combine common elements of an array?
by BrowserUk (Pope) on Apr 05, 2011 at 03:54 UTC

    I think a slightly simpler approach is to regex directly on the strings, combine the strings and then dedup:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4") ; ## combine AGAIN: for my $i ( 0 .. $#array ) { for my $j ( 0 .. $#array ) { next if $i == $j; while( $array[ $j ] =~ m[(\d+)]g ) { my $n = $1; if( $array[ $i ] =~ m[\b$n\b] ) { $array[ $i ] .= ' ' . splice @array, $j, 1; goto AGAIN; } } } } ## dedup for ( @array ) { 1 while s[(\b\d+)(?:\s|$)(?=.+\1)][]g; } pp \@array; __END__ c:\test>897418 ["12 13 11 3 4", "9 8 7"]

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      With the recent question, How can sets be recursively concatenated when intersecting with each other, I ended up revisiting these solutions and providing a streamlined answer. I must say that I'm definitely a fan of jaredor's solution below, but I noticed a couple areas where yours could be improved efficiency wise and one potential bug:

      • Efficiency: loop $j from $i+1 to $#array instead of the entire array.
      • Efficiency: using redo instead of goto so that fully reduced elements don't need to be gone over again.
      • bug: dedup will potentially drop numbers that are substrings of other numbers. ie 11 and 111.

      Here's my suggested changes applied to your solution:

      #! perl -slw use strict; use Data::Dump qw[ pp ]; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4 111") ; ## combine AGAIN: for my $i ( 0 .. $#array ) { for my $j ( $i+1 .. $#array ) { for my $n (split ' ', $array[ $j ]){ if( $array[ $i ] =~ m[\b$n\b] ) { $array[ $i ] .= ' ' . splice @array, $j, 1; redo AGAIN; } } } } ## dedup for ( @array ) { 1 while s[(\b\d+)\s(?=.*\b\1\b)][]g; } pp \@array;
        I noticed a couple areas where yours could be improved efficiency wise and one potential bug:

        That is a big improvement, especially fixing the bug. The redo instead of goto is so obvious ... yet I didn't see it.

        I'm definitely a fan of jaredor's solution below

        Me too. At least, jaredor's second solution. Which I'd missed till now.

        I did see his original solution and my eyes zeroed in on this line:

        my @sg = sort {$a<=>$b} uniq grep {defined} (@v2g{@$i}, min @$i);

        There's a lot of potentially expensive processing going on in that one line. Repeating that every time around the outer loop of a large set of data smacked of "expensive".

        By contrast, his second solution is almost magical in its simplicity and deserves elevating to tutorial.

        I wouldn't mind betting that many of the graphing packages (and their authors) could learn a lot from that unremarkable looking piece of code.

        then realized that there didn't need to be an identified element per subgraph...

        Don't you just love understatement :)


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: how to find combine common elements of an array?
by jaredor (Deacon) on Apr 06, 2011 at 05:23 UTC

    Thanks to inspiration from BrowserUK and wind (though I still have a ways to go before I grok their replies) I think I have something to share if only for terseness.

    #!/usr/bin/env perl use strict; use warnings; use List::Util(qw(min)); use List::MoreUtils(qw(uniq)); my @array = map {[split]} ("11 12", "11 13", "9 8 7", "3 4", "11 4"); my %v2g = (); my %g2v = (); for my $i (@array) { my @sg = sort {$a<=>$b} uniq grep {defined} (@v2g{@$i}, min @$i); my $sg = shift @sg; $g2v{$v2g{$_} ||= $sg}->{$_}++ for @$i; for my $j (@sg) { @v2g{keys %{$g2v{$j}}} = ($sg) x keys %{$g2v{$j}}; @{$g2v{$sg}}{keys %{$g2v{$j}}} = values %{$g2v{$j}}; delete $g2v{$j}; } } print join ("\n", map {join " ", sort {$a<=>$b} keys %$_} values %g2v) +, "\n";

    Output

    3 4 11 12 13
    7 8 9
    

    About the only thing really new to add to the discussion is that this problem seems to be equivalent to finding the disjoint subgraphs of a vertex adjacency matrix of a non-directed graph. So there's probably an algorithm out there that puts this code's efficiency to shame.

      Here's a better version. I was going to wipe my scratchpad and saw that I could do one thing better ... then another ... then realized that there didn't need to be an identified element per subgraph...

      #!/usr/bin/env perl use strict; use warnings; use List::MoreUtils(qw(uniq)); my @array = map {[split]} ("11 12", "11 13", "9 8 7", "3 4", "11 4"); my %g = (); for my $i (@array) { my @v = map {@$_} uniq map {$g{$_} or [$_]} @$i; @g{@v} = (\@v) x @v; } print join ("\n", map {join " ", sort {$a<=>$b} @$_} uniq values %g), +"\n";

      Same output as above, although now the ordering of the lines is fortuitous. (But obviously can be set, if desired, by another sort.)

      Hopefully it's a bit more clear that, since each iteration creates a complete set of associated vertices, the process finds the sets of vertices of the connected subgraphs of an arbitrary graph.

      (Please pardon the enthusiasm, this is the kind of stuff I like to think about.)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://897418]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (11)
As of 2014-04-16 20:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (434 votes), past polls