Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

treat pairs, set as one item: looking for better way

by remiah (Hermit)
on Oct 19, 2013 at 03:34 UTC ( #1058866=perlquestion: print w/replies, xml ) Need Help??
remiah has asked for the wisdom of the Perl Monks concerning the following question:

Hello perlmonk.

there was data like this.

my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] );
(a,b) means "a,b" are one set. (e,f) and (f,g) means "e,f,g" are one set.

I would like to print them per "set", and I would like to print "set which has more items" comes first.
I mean like this.
my code below seems to me not smart. And in fact, I sorted the results with another script... I sorted results by their length...
It was o.k. at that time, but I think monks have more better ways and I would like to hear your suggestions.
use strict; use warnings; use Data::Dumper; my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] ); #return it's id if they already exists in $set_hash #%set_hash should be like { a=>1, b=>1, e=>2, f=>2, g=>2} sub search_id { my( $set_hash, $k1,$k2)=@_; if ( exists( $set_hash->{$k1} ) ){ return $set_hash->{$k1}; } elsif ( exists( $set_hash->{$k2} ) ){ return $set_hash->{$k2}; } else { return 0; } } my (%key_hash, %set_hash, %set_hash_reverse, $id); $key_hash{$_}=0 for @key_array; #0 for initial, increment if they were + sets which has more than one item $id=1; for ( @set_array ){ my ($k1,$k2) = ($_->[0], $_->[1]); if ( exists($key_hash{$k1}) && exists($key_hash{$k2}) ){ if ( my $searched_id = search_id( \%set_hash, $k1,$k2) ){ $set_hash{ $k1 } = $searched_id; $set_hash{ $k2 } = $searched_id; } else { $set_hash{ $k1 } = $id; $set_hash{ $k2 } = $id; $id++; } #increment $key_hash{$k1}++; $key_hash{$k2}++; } else { #no set or both set. die for error; die "there is $k1 but no $k2." if ( exists($key_hash{$k1}) && (!exists($key_hash{$k2})) ) +; die "there is $k2 but no $k1." if ( exists($key_hash{$k2}) && (!exists($key_hash{$k1})) ) +; } } #{a=>1,b=>1, e=>2,f=>2,g=>2} to {1 =>[a,b],2=>[e,f,g]} for (keys %set_hash){ push @{ $set_hash_reverse{$set_hash{$_}} }, $_; } #print out for (sort {$b <=> $a} keys %set_hash_reverse){ print join(',', @{$set_hash_reverse{$_}} ), "\n"; } for ( grep {$key_hash{$_} == 0 } sort {$a cmp $b} keys %key_hash ){ print "$_\n"; }
I hope this makes sense. regards.

Replies are listed 'Best First'.
Re: treat pairs, set as one item: looking for better way
by kcott (Chancellor) on Oct 19, 2013 at 07:00 UTC

    G'day remiah,

    I wasn't entirely certain how @key_array factored into your sets: they're clearly not pairs (as per the title); however, they are included in the output you show. That led me to thinking whether you're really dealing with just pairs or if that was meant as a simplified example. Anyway, the following code should handle whatever you intended.

    #!/usr/bin/env perl -l use strict; use warnings; use List::Util qw{first}; my @key_array = 'a' .. 'i'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] ); my %sets; for my $set_ref (@set_array, map {[$_]} @key_array) { my $key = first { exists $sets{item}{$_} } @$set_ref; my $set_data = defined $key ? $sets{item}{$key} : do { push @{$sets{data}}, {}; $sets{data}[-1] }; for my $item (@$set_ref) { $sets{item}{$item} = $set_data unless exists $sets{item}{$item +}; ++$set_data->{$item}; } } for (sort { @$b <=> @$a } map { [ sort keys %$_ ] } @{$sets{data}}) { print join ',' => @$_; }


    e,f,g a,b c d h i

    So, all the sets are in the @{$sets{data}} array. Each element is a hash. Each hash has the items of the set as its keys and a count of those items as the values. I wasn't sure if you wanted a count (some of your commented out code indicated you might); the code above hasn't used the value: ignore it if you don't need it.

    $sets{data} looks like this:

    [ { a => 2, b => 2 }, { e => 2, f => 3, g => 2 }, { c => 1 }, { d => 1 }, { h => 1 }, { i => 1 }, ]

    Individual items are the keys of the %{$sets{item}} hash; the values are references to the elements in the @{$sets{data}} array.

    $sets{item} looks like this:

    { a => $sets{data}[0], b => $sets{data}[0], c => $sets{data}[2], d => $sets{data}[3], e => $sets{data}[1], f => $sets{data}[1], g => $sets{data}[1], h => $sets{data}[4], i => $sets{data}[5], }

    As you can see, 'a' and 'b' point to a single set; 'c' and 'd' point to two different sets; 'e', 'f' and 'g' point to a single set, and so on.

    You were also asking about sorting. I've sorted the sets by the number of elements to get the largest sets first (descending, numerical):

    sort { @$b <=> @$a }

    I've also sorted the items within each set (ascending, string):

    sort keys %$_

    You should note that the data you provided as input was inherently sorted (i.e. 'a' .. 'z' and the set pairs). If your real input is unordered, and you want sets of the same size to be ordered, you'll need an additional sort.

    -- Ken

      Hello kcott
      I could get really deep responses with this thread.

      I didn't notice that this is graph, until LanX said that, and your item and data are "adjacency list" with reference + count information. "item" is vertex which has reference to edge information,and "data" is edge data storage.

      Your usage of reference: reference from "item" to "data" seems to me an essence I would like to use by myself...

      I would like to have some more time to stick around this code, and thanks for your kind explanation.


Re: treat pairs, set as one item: looking for better way
by LanX (Bishop) on Oct 19, 2013 at 10:30 UTC
    Hello remiah!

    It's a classical problem from graph theory.

    Your pairs are edges and you wanna determine connectivity resp. connected components.

    Don't know if the other posters are aware of edge cases (your example data is too simple to show them).

    But there are already efficient (and proven) algorithms available!

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Hello LanX ! and oh my god !!!

      #!/usr/bin/perl use Graph::Undirected; use strict; use warnings; my $g = Graph::Undirected->new; my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] ); $g->add_vertex( $_ ) for @key_array; $g->add_edge( $_->[0], $_->[1] ) for @set_array; print "$_\n" for sort { length($b) <=> length($a) || $a cmp $b } map {join( ',', @$_) } $g->connected_components;
      I totally forgot about Graph module until you tell me.
      I would like to read your pages you gave me.

      Really thanks for your reply.

Re: treat pairs, set as one item: looking for better way
by LanX (Bishop) on Oct 19, 2013 at 11:59 UTC
    This should be safe! (compare also algorithm of tarjan).

    Plz note thiss should also work with sets which are bigger than just pairs.

    use strict; use warnings; use Data::Dump qw/pp/; my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] ); my (%neighbors,%group,%mark); # collect all neighbors per vertex for my $pair (@set_array) { for my $vert (@$pair){ push @{$neighbors{$vert}},@$pair; } } #pp \%neighbors; # mark all connected vertices with same group-id my $group=1; for my $vert (@key_array) { mark($vert,$group++); } #pp "group",\%group,"mark",\%mark; # sort result by size and alphabetically pp sort {@$b<=>@$a or $a->[0] cmp $b->[0]} values %group; sub mark { # recursivly (deep) mark neighbors my ($vert,$group)=@_; return if $mark{$vert}; $mark{$vert}=$group; push @{$group{$group}},$vert; for my $neighbor (@{$neighbors{$vert}}) { mark($neighbor,$group) } }


    ( ["e", "f", "g"], ["a", "b"], ["c"], ["d"], ["h"], ["i"], ["j"], ["k"], ["l"], ["m"], ["n"], ["o"], ["p"], ["q"], ["r"], ["s"], ["t"], ["u"], ["v"], ["w"], ["x"], ["y"], ["z"], )

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: treat pairs, set as one item: looking for better way
by Anonymous Monk on Oct 19, 2013 at 06:50 UTC

    Another way:

    my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'] ); my %set = map {$_ => {$_ => 1}} @key_array; for my $s (@set_array) { @{$set{$_}}{@$s} = (1) x @$s for @$s; } my $saw = ''; print map "$_\n", grep $saw =~ s/^(?!.*$_)/$_ /, sort {length $b <=> length $a || $a cmp $b} map join(',', sort keys %$_), values %set;

      Hello, thanks for reply.
      It took me some time to understand this.

      #!usr/bin/perl use strict; use warnings; use Data::Dumper; my @key_array = 'a' .. 'z'; my @set_array = ( ['a', 'b'], ['e','f'], ['f','g'],['x','y'] ); # {a =>{a=>1}, b=>{b=>1}, ....} my %set = map {$_ => {$_ => 't'}} @key_array; for my $s (@set_array) { @{$set{$_}}{@$s} = (1) x @$s for @$s; =my_memo #same as ... for (@$s){ @{$set{$_}}{@$s} = (1) x @$s } #fill hash value at one time my %h; @h{ 1 .. 5 }= 'A' ..'F'; #(1) x @$s if @$s === (a,b) , (1) x @$s === (1,1) #vertex which has edge information .. #{ a=>{a=>1,b=>1}, b=>{a=>1,b=>1} ... =cut } my $saw = ''; print map "$_\n", grep $saw =~ s/^(?!.*$_)/$_ /, =my_memo # ??? how it can remove duplicate like "a,b","a,b" or "e,f,g","e, +f","f,g" here comes "e,f,g", $saw='', ==> $saw="e,f,g" ==> +true #saw doesn't have "" replace ^ to "e,f,g" "a,b" $saw="e,f,g" ==> $saw="a,be,f,g" ==> +true #saw doesn't have "a,b", replace ^ to "a,b "a,b" $saw="a,be,f,g" ==> $saw="a,be,f,g" = +=> false #saw have "a,b" not match; "e,f" $saw="a,be,f,g" ==> $saw="a,be,f,g" = +=> false #saw have "e,f" not match; ... removes edge already seen ... really great. =cut sort {length $b <=> length $a || $a cmp $b} #sort by length + desc || a cmp b map join(',', sort keys %$_), #"a,b" ... sort +ed values %set; # values => {a= +>1,b=>1}, {a=>1,b=>1},{c=>1},{d=>1},,,
      short and great!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1058866]
Approved by Athanasius
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (3)
As of 2018-06-23 01:14 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (125 votes). Check out past polls.