Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

how to find combine common elements of an array?

by ihperlbeg (Novice)
on Apr 05, 2011 at 00:33 UTC ( [id://897418]=perlquestion: print w/replies, xml ) Need Help??

ihperlbeg has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code:
#!/usr/bin/perl -w use strict; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4") ; my %gs_grp; #print join("\n",(@array)); #recording index of the array for each shared number my $count =0; for my $key (@array){ my @gs = split(/ /,$key); for my $gs(@gs){ push (@{$gs_grp{$gs}},$count); } $count++; } #combine index with shared number my @sp = @array; for my $gs(keys %gs_grp) { my @tmp=(); for my $grp ( @{ $gs_grp{$gs} } ) { next if(scalar @{ $gs_grp{$gs} }==1); push(@tmp,split(/ /, ($array[$grp]))); splice(@sp,$grp,""); } my %duplicate = map { $_ => 1 } @tmp; my @unique = keys %duplicate; push(@sp,join(" ",@unique)); } print join("\n", @sp); print "\n";

The output from the above code is Not correct

The output I want is this:

3 4 11 12 13 9 8 7

Could anyone please help me here? Thanks

Replies are listed 'Best First'.
Re: how to find combine common elements of an array?
by BrowserUk (Patriarch) on Apr 05, 2011 at 03:54 UTC

    I think a slightly simpler approach is to regex directly on the strings, combine the strings and then dedup:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4") ; ## combine AGAIN: for my $i ( 0 .. $#array ) { for my $j ( 0 .. $#array ) { next if $i == $j; while( $array[ $j ] =~ m[(\d+)]g ) { my $n = $1; if( $array[ $i ] =~ m[\b$n\b] ) { $array[ $i ] .= ' ' . splice @array, $j, 1; goto AGAIN; } } } } ## dedup for ( @array ) { 1 while s[(\b\d+)(?:\s|$)(?=.+\1)][]g; } pp \@array; __END__ c:\test>897418 ["12 13 11 3 4", "9 8 7"]

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      With the recent question, How can sets be recursively concatenated when intersecting with each other, I ended up revisiting these solutions and providing a streamlined answer. I must say that I'm definitely a fan of jaredor's solution below, but I noticed a couple areas where yours could be improved efficiency wise and one potential bug:

      • Efficiency: loop $j from $i+1 to $#array instead of the entire array.
      • Efficiency: using redo instead of goto so that fully reduced elements don't need to be gone over again.
      • bug: dedup will potentially drop numbers that are substrings of other numbers. ie 11 and 111.

      Here's my suggested changes applied to your solution:

      #! perl -slw use strict; use Data::Dump qw[ pp ]; my @array = ("11 12","11 13", "9 8 7", "3 4", "11 4 111") ; ## combine AGAIN: for my $i ( 0 .. $#array ) { for my $j ( $i+1 .. $#array ) { for my $n (split ' ', $array[ $j ]){ if( $array[ $i ] =~ m[\b$n\b] ) { $array[ $i ] .= ' ' . splice @array, $j, 1; redo AGAIN; } } } } ## dedup for ( @array ) { 1 while s[(\b\d+)\s(?=.*\b\1\b)][]g; } pp \@array;
        I noticed a couple areas where yours could be improved efficiency wise and one potential bug:

        That is a big improvement, especially fixing the bug. The redo instead of goto is so obvious ... yet I didn't see it.

        I'm definitely a fan of jaredor's solution below

        Me too. At least, jaredor's second solution. Which I'd missed till now.

        I did see his original solution and my eyes zeroed in on this line:

        my @sg = sort {$a<=>$b} uniq grep {defined} (@v2g{@$i}, min @$i);

        There's a lot of potentially expensive processing going on in that one line. Repeating that every time around the outer loop of a large set of data smacked of "expensive".

        By contrast, his second solution is almost magical in its simplicity and deserves elevating to tutorial.

        I wouldn't mind betting that many of the graphing packages (and their authors) could learn a lot from that unremarkable looking piece of code.

        then realized that there didn't need to be an identified element per subgraph...

        Don't you just love understatement :)


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: how to find combine common elements of an array?
by wind (Priest) on Apr 05, 2011 at 01:09 UTC

    Basically you're trying to merge all sets with related values.

    Well, the below logic isn't pretty, but it works. Not sure what the problem was with your code

    #!/usr/bin/perl -w use strict; use warnings; my @array = map {[split / /]} ( "11 12", "11 13", "9 8 7", "3 4", "11 4" ); my %index = (); for my $i (0..$#array) { for my $val (@{$array[$i]}) { push @{$index{$val}}, $i; } } for my $i (0..$#array) { my $arr = $array[$i] or next; $array[$i] = undef; my %values = (); for my $val (@$arr) { next if $values{$val}++; for my $ind (@{$index{$val}}) { my $related = $array[$ind] or next; $array[$ind] = undef; push @$arr, @$related; } } print join(' ', sort {$a <=> $b} keys %values), "\n"; }
    Outputs
    3 4 11 12 13 7 8 9
Re: how to find combine common elements of an array?
by jaredor (Priest) on Apr 06, 2011 at 05:23 UTC

    Thanks to inspiration from BrowserUK and wind (though I still have a ways to go before I grok their replies) I think I have something to share if only for terseness.

    #!/usr/bin/env perl use strict; use warnings; use List::Util(qw(min)); use List::MoreUtils(qw(uniq)); my @array = map {[split]} ("11 12", "11 13", "9 8 7", "3 4", "11 4"); my %v2g = (); my %g2v = (); for my $i (@array) { my @sg = sort {$a<=>$b} uniq grep {defined} (@v2g{@$i}, min @$i); my $sg = shift @sg; $g2v{$v2g{$_} ||= $sg}->{$_}++ for @$i; for my $j (@sg) { @v2g{keys %{$g2v{$j}}} = ($sg) x keys %{$g2v{$j}}; @{$g2v{$sg}}{keys %{$g2v{$j}}} = values %{$g2v{$j}}; delete $g2v{$j}; } } print join ("\n", map {join " ", sort {$a<=>$b} keys %$_} values %g2v) +, "\n";

    Output

    3 4 11 12 13
    7 8 9
    

    About the only thing really new to add to the discussion is that this problem seems to be equivalent to finding the disjoint subgraphs of a vertex adjacency matrix of a non-directed graph. So there's probably an algorithm out there that puts this code's efficiency to shame.

      Here's a better version. I was going to wipe my scratchpad and saw that I could do one thing better ... then another ... then realized that there didn't need to be an identified element per subgraph...

      #!/usr/bin/env perl use strict; use warnings; use List::MoreUtils(qw(uniq)); my @array = map {[split]} ("11 12", "11 13", "9 8 7", "3 4", "11 4"); my %g = (); for my $i (@array) { my @v = map {@$_} uniq map {$g{$_} or [$_]} @$i; @g{@v} = (\@v) x @v; } print join ("\n", map {join " ", sort {$a<=>$b} @$_} uniq values %g), +"\n";

      Same output as above, although now the ordering of the lines is fortuitous. (But obviously can be set, if desired, by another sort.)

      Hopefully it's a bit more clear that, since each iteration creates a complete set of associated vertices, the process finds the sets of vertices of the connected subgraphs of an arbitrary graph.

      (Please pardon the enthusiasm, this is the kind of stuff I like to think about.)

        You are too kind Miller, and you flatter me to think that my one-off is worth remembering!

        Such kindness deserves a little extra effort, and so here is a little defensive coding to go along with the main thrust of the code: If you are not sure that there will be no duplication of numbers in any given list, then in general you will not be able to collapse the list to a minimal set of elements. For example, if you just have a single list, '3 3 4', then the code will return a list of '3 3 4', not '3 4'. (However, if the currently duplicated element has been seen before, there will be no duplication.) This potential problem is countered by another uniq statement just before the assignment to @v and is (to my taste) slightly ugly, but necessary if you want to run this on slightly muddier data sets.

        You must be right about the title to this post being too obscure, since I presume you are one of only the six who upvoted this post. With 225 XP left to go, my long, hard slog to curate will only happen by me personally upvoting other posts in "Recent Threads." ;-)

        Thank you again for thinking of this post. Double thanks for crediting me elsewhere!

        --Jared

        P.S. I've only run the code through my somewhat fallible wet-ware, so caveat coder. One of these days I'll be an active Perl coder again and I won't make as many educated guesses as I do now. (Project Euler was my perl outlet for a while, but that site is down now. Gee, I can't recall, did I use the same password for PM as for PE? I hope it wasn't the other way around before, say, May 20th, 2009 ;-)

        P.P.S. EDIT Oops, this was supposed to be a reply to wind.

        I recently came by a question on stackoverflow that mirrored this thread: Merging a list of lists of lists

        It's unfortunate that this thread has such a poor title, as I doubt google is going to find it very often. But I definitely remembered your solution and was able to search through all my past posts to find it.

        I'm still in awe of your use of uniq on array references, so just wanted you to know that I referenced your code. :)

        - Miller

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://897418]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-04-19 00:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found