Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Sub set where all are connected

by Sanjay (Sexton)
on Nov 22, 2019 at 16:18 UTC ( [id://11109069]=perlquestion: print w/replies, xml ) Need Help??

Sanjay has asked for the wisdom of the Perl Monks concerning the following question:

Have a set where all members are connected directly or indirectly. A-B means A and B are connected directly. Order of A-B or B-A immaterial. A-B and B-C means A and C are indirectly connected. A, B, C, ... are ID nos.

Now want to find the largest collection (sub set) where each is directly connected with each other - if X-Y, X-Z, Y-Z then X, Y and Z are directly connected. If more than one such sub set, then would like to have all such sub sets, if possible.

This seems more a Graph problem than a Set problem. Looked at all the Graph and Set modules but could not find anything. Googled too.

Surprised that this problem is not common. Help appreciated

Replies are listed 'Best First'.
Re: Sub set where all are connected
by bliako (Monsignor) on Nov 22, 2019 at 17:33 UTC

    Perhaps you are looking for cliques in a Graph? Which are defined as the subset of nodes/items which are all connected to each other. Now this is a hard problem. Perl module maybe Graph::Clique if you know the size of clique you are looking for.

    bw, bliako

      The find the largest cliques, one could use the following:

      Update: Nevermind, completely wrong.

Re: Sub set where all are connected
by LanX (Saint) on Nov 22, 2019 at 22:00 UTC
    Inner loop:
    • you start with one point and create a hash for it's component subset and you mark the point there
    • Then you mark all 1st degree neighbors in this hash.
    • Then you mark all neighbors of next degree by finding new neighbors of the last degree
    • and so on till there are no new neighbors

    Outer loop:

    • you check if there are any uncovered points left in your point set.
    • If yes than you start the inner loop again with one of those.

    Main:

    • Once your point set is exhausted you check which of the hashes has most elements.

    No code since your description is messy and - like others remarked - you didn't provide sample data or a test.

    All necessary set operations could be done by deleting hash-slices and counting keys.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      This is my test data:

      @a_list = ( [1,2], [1,3], [1,4], [1,5], [2,3], [2,4], [3,4], [5,6], [5,7], [5,9], [6,9], [7,8], [8,9], );

      ... and here are my expected results:

      1 2 3 1 2 4 1 3 4 2 3 4 5 6 9

      Shamelessly copied from the the pod of Graph::Clique

      I am a bit hesitant to pursue Graph::Clique due to the warning on large result sets. I expect cliques of hundreds of members, if not thousands. Some simplifications or "way to define the problem" may reduce computational difficulty many fold while in no way compromising the problem. Let me think about it. If any interest, I will get back - no guarantees on how good the solution will be!

        #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11109069 use warnings; use List::Util qw( uniq ); my $edges = <<END; [1,2], [1,3], [1,4], [1,5], [2,3], [2,4], [3,4], [5,6], [5,7], [5,9], [6,9], [7,8], [8,9], END $edges =~ s/(?<=\[)[\w,]+(?=\])/ join ',', sort split ',', $& /ge; # +fix order #print "$edges\n"; my %alldirect; my %seen; find( uniq sort $edges =~ /\w+/g ); # start with e +very node sub find { $seen{ "@_" }++ and return; if( my @out = "@_:$edges" =~ /\b(\w+)\b.+\b(\w+)\b.*:(?!.*?\[\1,\2\] +)/s ) { for my $node ( @out ) # pair of unconnected nodes, try without + each one { find( grep $_ ne $node, @_ ); } } else { $alldirect{ "@_" }++; # it is fully +connected } } my @seq = sort keys %alldirect; my %subset; # remove subsets of +supersets for my $sub ( @seq ) { for my $super ( @seq ) { if( length $sub < length $super and !$subset{$super} and "$sub\n$super" !~ /\b(\w+)\b.*\n(?!.*\b\1\b)/ ) # sub node +not found { $subset{$sub}++; last; } } } my @directlyconnected = grep !$subset{$_}, @seq; print "$_\n" for @directlyconnected;;

        Outputs :

        1 2 3 4 1 5 5 6 9 5 7 7 8 8 9

        Note: I think your expected output is wrong. 1 2 3 4 are all strongly connected and belong in the same subset.

        Quick explanation:
        Top down approach. Start with set of all nodes.
        Try to find unconnected pair of nodes, if so, try with two subsets, each with one of those nodes.
        If no unconnected pair, have a directly connected subset!
        Second half eliminates valid subsets of larger valid subsets.

        If you are really looking for cliques - i.e. complete sub-graphs - than my approach won't be of great help. (Though not useless in reducing complexity)

        > I expect cliques of hundreds of members, if not thousands.

        Wait ... How many nodes does your graph have?

        😳

        Probably you are better off investigating the complement graph.

        You could also sort all nodes by the number of their neighbours.

        An n-clique is only possible if there are at least n nodes with at least n-1 neighbors.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        > I expect cliques of hundreds of members, if not thousands.

        the default approach seems to be the Bron–Kerbosch algorithm

        From the complexity estimates given there, a worst case of at least 3**(1000/3) = 7.609e+158 for cliques of size 1000 is to be expected. °

        I.o.W. you should already reserve a table at the The Restaurant at the End of the Universe to take a break in between.

        Another approach would be porting Perl to quantum computers and hoping that Google and IBM are producing more than vapor ware in your lifetime.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        update Nov 29

        °) for the vertex-ordering version

        from Degeneracy_(graph_theory)#Relation_to_other_graph_parameters

        A k-degenerate graph has chromatic number at most k + 1; this is proved by a simple induction on the number of vertices which is exactly like the proof of the six-color theorem for planar graphs. Since chromatic number is an upper bound on the order of the maximum clique, the latter invariant is also at most degeneracy plus one.

Re: Sub set where all are connected
by marto (Cardinal) on Nov 22, 2019 at 16:33 UTC

      No - This is for a separate program though within the same application. How did you make the connection? Just curious! ;-)

Re: Sub set where all are connected
by LanX (Saint) on Nov 23, 2019 at 03:27 UTC

      You're right! Had done something very similar about two decades ago, although to detect loops and using Oracle PL/SQL. IMHO, the following (relational database inspired) method should work.

      1. Arrange all the sets in a table in columns named FROM and TO where the smaller one is in the FROM column - if no such case there is a loop.

      2. Assign all FROM's without an entry in TO as being at LEVEL = 1 and the corresponding TO as at LEVEL = 2

      3. Starting from LEVEL = 2 and incrementing LEVEL by 1 until there are no further records to be assigned a LEVEL: Assign the next LEVEL to the corresponding TO

      4. Start with the records with the largest LEVEL and work backwards from the previous LEVEL. Be careful when there is more than one "parent" at the previous LEVEL - multiple results.

      5. If at end there are records with no LEVEL assigned, then there is a loop

        I'm still puzzled if you want the connected components or the cliques of a graph.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        Please ignore, Wrong

Re: Sub set where all are connected
by tybalt89 (Monsignor) on Nov 22, 2019 at 19:04 UTC

    Where's your test case?

Re: Sub set where all are connected
by bliako (Monsignor) on Nov 23, 2019 at 17:13 UTC

    There are 2 non-Perl methods you may be interested: Cytoscape and R (igraph package). The first one is an easy way to visualise graphs of reasonable size. And it has some algorithms/plugins to investigate their performance for clique finding. R is a league on its own. Probably the most state-of-the-art algorithms will be implemented in R.

Re: Sub set where all are connected
by tybalt89 (Monsignor) on Nov 29, 2019 at 17:04 UTC

    Here's the latest, fastest version of my program. Is it fast enough to solve a 100,000 node case? No.

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11109069 # clique use warnings; use List::Util qw( uniq ); my $edges = <<END; # from https://en.wikipedia.org/wiki/Clique_(graph_ +theory) [1,2],[1,3],[1,4], [2,3],[2,4], [3,4], [4,5],[23,4], [5,6],[5,7],[5,8], [6,7], [7,8], [8,9],[10,8], [10,9],[11,9],[12,9],[13,9], [10,13], [11,12],[11,13], [12,13], [13,14], [14,15],[14,21], [15,16],[15,17],[15,19], [16,17], [17,18],[17,19], [18,19],[18,20],[18,21], [19,20],[19,21],[19,22], [20,23], [21,22],[21,23], [22,23], END $edges =~ s/(?<=\[)[\w,]+(?=\])/ join ',', sort split ',', $& /ge; # +fix order print "$edges\n"; my %edges = map +( $_ => '(*FAIL)' ), $edges =~ /\w+,\w+/g; my %cliques; my %seen; find( uniq sort $edges =~ /\w+/g ); # start with e +very node sub find { $seen{ my $set = "@_" }++ and return; if( my @out = $set =~ /\b(\w+)\b.+\b(\w+)\b(??{ $edges{"$1,$2"} || " +" })/ ) { for my $node ( @out ) # pair of unconnected nodes, try without + each one { @_ > 3 and find( grep $_ ne $node, @_ ); } } else { $cliques{ $set }++; # it is fully +connected } } my $uniquecliques = ''; for ( sort { length $b <=> length $a } sort +uniq keys %cliques, map tr/,/ /r, keys %edges ) { my $pattern = " $_ " =~ s/\w+/\\b$&\\b/gr =~ s/ /.*?/gr; $uniquecliques =~ /^$pattern$/m or $uniquecliques .= "$_\n"; } print $uniquecliques;

    Outputs:

    [1,2],[1,3],[1,4], [2,3],[2,4], [3,4], [4,5],[23,4], [5,6],[5,7],[5,8], [6,7], [7,8], [8,9],[10,8], [10,9],[11,9],[12,9],[13,9], [10,13], [11,12],[11,13], [12,13], [13,14], [14,15],[14,21], [15,16],[15,17],[15,19], [16,17], [17,18],[17,19], [18,19],[18,20],[18,21], [19,20],[19,21],[19,22], [20,23], [21,22],[21,23], [22,23], 11 12 13 9 15 16 17 15 17 19 17 18 19 18 19 20 18 19 21 19 21 22 21 22 23 1 2 3 4 10 13 9 10 8 9 13 14 14 15 14 21 20 23 5 6 7 5 7 8 23 4 4 5

    This 23 node case runs in about 0.05 seconds on my machine.

    I am curious about what the real problem is here. Perhaps there is something in the real problem that could lead to partioning or something else that could limit that actual node size per case.

    Also, how many edges are there in your 100,000 node case?

      Not examined the number of edges. As a wild guess I would say the 100,000 node case may have 200,000 to 400,000 edges. Ever since I learnt that it is a hard problem, I lost interest. I am using another method (linear programming) to get (sub)-cliques optimized to maximize some objective function. Works for around 98+% cases. Times out or unfeasible for the rest because of problems with the (free) software or insufficient resources. I was thinking of cliques to offer an alternative solution. Now I will specify that an optimized solution cannot be found. Good luck to my client!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11109069]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2024-03-29 00:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found