Do you know where your variables are? PerlMonks

by samtregar (Abbot)
 on May 17, 2005 at 18:18 UTC Need Help??
samtregar has asked for the wisdom of the Perl Monks concerning the following question:

Ahoy Monks. I need an algorithm, I just don't know which one. Here's my problem:

• I have a cyclic directed graph.
• Evaluating the linking of nodes is expensive.
• The graph may be very large.
• Most nodes do not participate in cycles.
• There are many types of nodes, each with their own linking constraints.
• My job is to load the graph from disk into a relational database where links work via sequentially allocated IDs.

My problem is that cycles give my code indigestion, because I'm trying to evaluate the graph by looking through nodes in order (which is more-or-less random). When I find a case that's causing a cycle (i.e. an A node is linking through a B node back to A via the C link type) I go into the code for type A and apply a local fix. This usually means I create the ID for A early so that B can find it at the right point.

Obviously my solution isn't a good one. It takes time every time a new circular link condition pops up. It's also a ticking bomb since I have no confidence that I've really found them all.

This seems like a problem that should have a known comp-sci solution. Does anyone know it? Links to references are fine.

Thanks,
-sam

• Comment on In search of an algorithm for loading cyclic graphs

Replies are listed 'Best First'.
by kvale (Monsignor) on May 17, 2005 at 18:50 UTC
One common algorithm for converting a directed graph into an ordered set (i.e., your sequential IDs) is a topological sort. The problem here is that you have cycles, which generate ambiguous sorts. Thus you cannot accomplish your job as specified.

Your 'local fix' is to break the cycle by supplying a point outside the set of sequenial IDs. If this is kosher, then I would attempt to solve your problem in three steps.

1. First, do a breadth-first search of the graph starting from the root node. When you encounter a cycle (a node previously searched), delete that link from the graph and save the link for later use. After all nodes have been searched, you are left with an acyclic graph, or tree.
2. Second, topologically sort this tree into a sequential set of IDs.
3. Finally, add back in the deleted links you stored from the first step.
Update: fixed a couple of typos.

-Mark

Thanks, this makes sense. I'm concerned about how this will perform on large graphs but perhaps the only way to find out is to try it. The advantage of my current approach is that it only makes one pass over the data and it's hard to beat that for speed...

-sam

by dragonchild (Archbishop) on May 17, 2005 at 18:26 UTC
• I have a cyclic directed graph.

[snip]

• My job is to load the graph from disk into a relational database where links work via sequentially allocated IDs.

If you have A -> B -> A, what do you expect your table to look like? I don't think that using sequential IDs will work that well ...

If, however, you are using three tables (One for nodes, one for linktypes, and one for node/node/link), then this is trivial:

But, if it was that easy, you wouldn't be asking the question. *shrugs*

• In general, if you think something isn't in Perl, try it out, because it usually is. :-)
• "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
If you have A -> B -> A, what do you expect your table to look like? I don't think that using sequential IDs will work that well ...

```Table A

a_id | b_id
-----------
1      1

Table B

b_id | a_id
-----------
1      1

Each table has its own pool of sequential IDs, like most databases.

If, however, you are using three tables (One for nodes, one for linktypes, and one for node/node/link), then this is trivial:

Nope, it's not that simple. Each node type has its own table structure and most of them are more complicated than that.

-sam

I'm not understanding something because that looks like you're creating a table for each node type and the rows of said tables are the edges.

Personally, I would create the tables for the node types and have a completely separate table that actually demonstrates the links between the nodes. I would also treat the nodes both in one tables (called nodes) and in their separate tables who FK to nodes. Kinda like a base class and child classes. That way, you can handle it as a 3-table graph when convenient, but you can also get to the data when needed.

• In general, if you think something isn't in Perl, try it out, because it usually is. :-)
• "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
I'm not sure where the problem is here. If you know that the ids are unique within each table, and you know the table to be looking in for a given id (from the field name), doesn't that uniquely identify what each field is trying to point to?

If not, why not just assign each table a starting number, with room for a few million (or however many seems appropriate) rows between each starting number?

What kind of 'indigestion' does your code get?

by lupey (Monk) on May 17, 2005 at 18:58 UTC
I'm not sure if this is what you need but have you looked at the CPAN module Graph?

http://search.cpan.org/~jhi/Graph-0.65/lib/Graph.pod

lupey

I second that suggestion regarding Graph. From the user side, I used it recently to Order your autobundle by dependency and it had all the right parts. You may also find something interesting in the innards, although I didn't venture there myself.
by exussum0 (Vicar) on May 17, 2005 at 18:57 UTC
Kruskal, Prim.. there are many spanning tree algorithms. Create a spanning tree that is directed from one point and ends at another. Any edge you add will create a cycle. Make your DB follow that.

----
Give me strength for today.. I will not talk it away..
Just for a moment.. It will burn through the clouds.. and shine down on me.

by dynamo (Chaplain) on May 17, 2005 at 18:32 UTC

What do you mean when you say that you 'apply a local fix' - how is what you describe a 'fix' for anything? (You still end up with the cycles in the database, which I gather you want to do)
Why would you be creating the id for A 'early' when you would normally generate the ids in order?
How does this differ from not applying said fix?

What do you mean when you say that you 'apply a local fix' - how is what you describe a 'fix' for anything? (You still end up with the cycles in the database, which I gather you want to do)

I mean I fix the code so it doesn't die() or loop forever anymore. A typical break is either an error about not being about the find the object linked to, or worse, an infinite loop loading the same nodes over and over.

The cycles are perfectly valid, so I'm not trying to weed them out.

Why would you be creating the id for A 'early' when you would normally generate the ids in order?

Here's the "normal" course of events in my system:

• Create an object, unsaved without an ID.
• Fill it with data. Some of the data may include the IDs of linked objects.
• Save the object, creating an ID for it.

That works great until the code finds a cycle. When I run into a cycle I fix it to:

• Create an object, unsaved without an ID.
• Fill it with data, but not the data that might cycle.
• Save the partially filled object, creating an ID for it.
• Fill in the possibly cyclic pieces.
• Save the completed object.

The problem is that doing this for each node type when I find a break is a real pain.

-sam

Create A New User
Node Status?
node history
Node Type: perlquestion [id://457924]
Approved by xorl
Front-paged by perrin
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2018-04-20 01:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My travels bear the most uncanny semblance to ...

Results (75 votes). Check out past polls.

Notices?