
A preliminary stab at Flow-Based Programming

by Masem (Monsignor)
on Mar 05, 2002 at 05:49 UTC ( #149294=perlmeditation )

Given some recent discussions and the availability of XML::SAX::Machines, I took a stab at getting a basic Flow-Based Programming system up and running, which I've completed tonight (woohoo!). At some later point, after I've tidied up the code, I'll provide all of it, but as an example of what I've got going so far, the flow assembly code looks something like:
    #!/usr/bin/perl -w
    use strict;

    use XML::SAX::Machines qw( Machine );
    use Language::Flow::Simple::Reader;
    use Language::Flow::Simple::Writer;
    use Language::Flow::Simple::Chomper;
    use Language::Flow::Simple::Counter;
    use Language::Flow::Simple::Merger;
    use Language::Flow::Simple::Constant;
    use Language::Flow::Simple::Sprintf;
    use XML::Filter::SAXT;

    my $reader   = Language::Flow::Simple::Reader->new;
    my $writer   = Language::Flow::Simple::Writer->new;
    my $chomper  = Language::Flow::Simple::Chomper->new;
    my $counter  = Language::Flow::Simple::Counter->new;
    my $merger   = Language::Flow::Simple::Merger->new;
    my $constant = Language::Flow::Simple::Constant->new;
    my $sprintf  = Language::Flow::Simple::Sprintf->new;
    my $merger2  = Language::Flow::Simple::Merger->new;

    my $m = Machine (
        [ Intake => $reader           => qw( B ) ],
        [ B      => $chomper          => qw( T ) ],
        [ T      => XML::Filter::SAXT => qw( N I C ) ],
        [ C      => $counter          => qw( D ) ],
        [ D      => $sprintf          => qw( M ) ],
        [ I      => $constant         => qw( M ) ],
        [ M      => $merger           => qw ( N ) ],
        [ N      => $merger2          => qw ( OUT ) ],
        [ OUT    => $writer ]
    );

    #for ( $m->parts ) { $_->preprocess() };
    $m->parse();
    #for ( $m->parts ) { $_->postprocess() };
The naming of the components should be self-explanatory, and while their arguments are currently hard-coded, there's no reason they have to be. The code above takes in a file, reads it line by line, adds a formatted line number and a tab before each line, and prints the result to stdout. Certainly not complicated, but the tricky part was handling the merge points, where the data you split off is brought back together into one piece.
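For reference, the net effect of the graph above can be written as a few lines of plain Perl with no framework involved. This is only a sketch of the expected output, not the Language::Flow code: `number_lines` is a hypothetical name, and it assumes that numbering starts at 1 and that the writer ends each line with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each input line with a right-aligned,
# 4-wide line number and a tab. Assumes numbering starts at 1.
sub number_lines {
    my @lines = @_;
    my $count = 0;
    my @out;
    for my $line (@lines) {
        chomp( my $text = $line );                  # Chomper
        my $number = sprintf( "%4d", ++$count );    # Counter, then Sprintf
        push @out, $number . "\t" . $text . "\n";   # Constant (tab) and the two Mergers
    }
    return @out;
}

print number_lines( "first\n", "second\n" );
```

The flow version does the same work, but each commented step is a separate component and the concatenation happens at the two merge points.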

The basic concept I've got here is that you send 'chunks' of data around the framework. The chunks are transferred as XML (they could be anything, but with XML as the data language there's no reason a flow can't go out to a networked computer and then flow back). However, a chunk is only meant to represent a small piece of data, such as a single line in a file. Because of this, while all the basic components of the system above are XML::SAX::Base subclasses, the only function a user would typically need to override is a recieve_chunk function, as demonstrated below for the sprintf component:

    package Language::Flow::Simple::Sprintf;

    use Language::Flow::Base::Component;
    use Data::Dumper;

    BEGIN {
        @ISA = qw( Language::Flow::Base::Component );
    }

    my $counter = 0;

    sub recieve_chunk {
        my $self  = shift;
        my $chunk = shift;

        my $data = $chunk->get_data();
        $self->emit_chunk(
            sprintf( "%4d", $data ),
            "string",
            $chunk->get_history(),
            { node => "LFSSprintf", id => $counter++ }
        );
    }

    1;
(All the SAX event functions are buried away in the base class). The tricky part is the history features, as indicated in the last part of the arg list for emit_chunk, as it's necessary for the programmer to include new histories to make sure that components like the merger work right. (The merger looks at some past history point to decide when two chunks should be merged).
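One way to picture how a merger might use that history: each chunk carries a list of { node, id } records, and the merger buffers chunks until every input has delivered one with the same id at a chosen earlier node. The sketch below uses plain hashes rather than the real chunk class. `make_merger`, the `history` layout, and the `LFSChomper` node name are assumptions made for illustration, not the actual Language::Flow internals.

```perl
use strict;
use warnings;

# Build a merger that joins chunks from $inputs branches once they all
# share the same id in the history entry recorded by $anchor_node.
sub make_merger {
    my ( $anchor_node, $inputs, $emit ) = @_;
    my %pending;    # anchor id => { input index => data }

    return sub {
        my ( $input, $chunk ) = @_;

        # Find the id this chunk was given when it passed the anchor node.
        my ($entry) = grep { $_->{node} eq $anchor_node } @{ $chunk->{history} };
        return unless $entry;    # chunk never passed the anchor; ignore it
        my $id = $entry->{id};

        $pending{$id}{$input} = $chunk->{data};
        return if keys %{ $pending{$id} } < $inputs;

        # Every branch has reported: join in input order and forward.
        my $parts = delete $pending{$id};
        $emit->( join '', map { $parts->{$_} } 0 .. $inputs - 1 );
        return;
    };
}

my @merged;
my $merge = make_merger( 'LFSChomper', 2, sub { push @merged, shift } );

my $hist = sub { [ { node => 'LFSChomper', id => shift } ] };
$merge->( 1, { data => "hello",  history => $hist->(0) } );    # second input arrives first
$merge->( 0, { data => "   1\t", history => $hist->(0) } );
print "$merged[0]\n";
```

Keying on a shared history id rather than on arrival order is what lets the two branches deliver their chunks in either order.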

Again, this is only a start. I will post full code once I've tidied up what I've gotten and made some improvements. For example, see how I have to use XML::Filter::SAXT to split the stream into multiple parts. I should be able to build something similar in my component system. While I use SAXT, I have commented out functions that would be called on all components before and after the run which could normally be used to reset counters and free resources if needed.

Of course, at some point, it would be best to have a textual way to describe the Machine without using perl code. That's well in the future, but should be a simple addition when the framework is set.

Part of the reason that I post this now is that there's talk on the perl-xml list about XML Pipelines. Now, while I think there's an overlap, it's not the same as what I'm trying to do here; my impression is that with XML Pipelines, you feed in whole documents at a time and process at the document level, leaving character-level or other changes up to other transformations like XSLT and the like. This is probably great for, as some of Matts' preliminary examples suggest, setting up easy document conversion for a web server. I think my approach here is more finely grained, and thus may not be as appropriate for that purpose, but my initial idea when working on this, way back even in Java, was for a rapid 'scripting' type of programming.

Any initial comments are welcome. Again, this is only a starting point, and there's a lot of work left to get what I consider to be the key base down and stable. Once that's in place, additions should be very easy to do.

Dr. Michael K. Neylon - || "You've left the lens cap of your mind on again, Pinky" - The Brain
"I can see my house from here!"
It's not what you know, but knowing how to find it if you don't know that's important

Replies are listed 'Best First'.
Re: A preliminary stab at Flow-Based Programming
by Matts (Deacon) on Mar 05, 2002 at 10:37 UTC
    Way cool! Sorry, don't have much to add or say, but this looks really neat. Of course component based programming is a pipe dream, but it's interesting to see this stuff coming together.
      I do believe that FBP is a feasible goal; it does require more processor overhead and memory use than typical procedural or OOP programming, and, just like those approaches, FBP isn't a solution to all problems, only to a small subset. FBP also requires that the components be well designed and refactored enough that most basic operations can be reconstructed from the right components. I think after a few more rounds of work, what I've got right now will start to look a little bit more interesting.

      But while I've got Matts' ear... :-) One of the problems I had when putting the Merger together is that apparently SAX handlers cannot tell who sent an event or where the event is going. Because of how I set it up, the Merger's SAX handlers were receiving events from two different streams at the same time, due to how SAXT and the event tree worked. I had to use a somewhat icky hack for this (using

          do { package DB; @callarr = caller(3); };
          my @args = @DB::args;
          my $sender = $args[0];

      which is similar to code in Carp) in order to identify which stream each event was coming from. Now, of course, on the way to work I thought of a better solution, possibly using closures, that would be able to connect multiple incoming streams to the same component, but I haven't tried to program this yet. But in general, is there any other way of getting the sender of a SAX event while inside a SAX event handler?
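The closure idea could look roughly like the following. Rather than the merger inspecting the call stack, each upstream branch is handed its own small handler that already knows which input it represents. This is a plain-Perl sketch and does not use the SAX API; `make_inputs` and the input names are hypothetical.

```perl
use strict;
use warnings;

# Returns a name => handler list. Each handler is a closure that
# remembers its own input name and tags every event it forwards.
sub make_inputs {
    my ( $on_event, @names ) = @_;
    return map {
        my $name = $_;
        ( $name => sub { $on_event->( $name, @_ ) } );
    } @names;
}

my @seen;
my %input = make_inputs( sub { push @seen, join ':', @_ }, qw( M N ) );

# Each upstream stage only ever holds the handler for its own branch.
$input{M}->( 'characters', '   1' );
$input{N}->( 'characters', 'some line' );
print "$_\n" for @seen;
```

The receiving component then sees the input name as its first argument, so no caller() inspection is needed.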


Re: A preliminary stab at Flow-Based Programming
by dragonchild (Archbishop) on Mar 05, 2002 at 13:55 UTC
    How different is this from a state machine? Just looking at it superficially (without knowing anything about the XML technologies), it looks an awful lot like one ...

    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Well, they are sort of related in that both FBP and state machines typically have a (uni)directional graph that forms the basis for the system.

      However, at least as I've come to understand them, a state machine is where the "flow" through the system is decided by some ruleset, typically comparing a pointer into a storage area against the next possible states (à la regex matching). The concept of flow-based programming is that it's like a series of water pipes; there's no implicit decision block that decides whether water goes to one pipe or the other when it reaches a split; it just fills them both. The idea is that the water, supplied in my case by Reader and composed of chunks with one line of a file each, flows through the system and is acted upon by each unit it passes through. Thus, at the end, you get a stream of data that you need to do something with (in this case, it's written to STDOUT). Now, this doesn't mean that water can't be diverted or blocked by units, but the units have to be built with this functionality. E.g., a grep unit would only let through the chunks that meet the necessary requirements.
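The pipe analogy can be sketched with plain closures: a split simply copies every chunk to every outgoing pipe, while any filtering lives inside a unit such as grep. This is an illustration only and not part of the Language::Flow code; `make_tee` and `make_grep` are hypothetical names.

```perl
use strict;
use warnings;

# A tee has no decision logic: every chunk goes down every outgoing pipe.
sub make_tee {
    my @sinks = @_;
    return sub { my $chunk = shift; $_->($chunk) for @sinks; return };
}

# A grep unit is an ordinary component that chooses to forward only the
# chunks matching its pattern; the filtering lives inside the unit.
sub make_grep {
    my ( $pattern, $sink ) = @_;
    return sub { my $chunk = shift; $sink->($chunk) if $chunk =~ $pattern; return };
}

my ( @all, @errors );
my $source = make_tee(
    sub { push @all, shift },
    make_grep( qr/error/, sub { push @errors, shift } ),
);

$source->($_) for "ok: started", "error: disk full", "ok: done";
printf "%d chunks seen, %d matched\n", scalar @all, scalar @errors;
```

A state machine would instead pick exactly one outgoing edge per input, which is the distinction drawn above.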

      I believe that one of the end goals of FBP is that such a system leads to a visual/rapid design system for general programming solutions. I don't suggest this will be replacing direct use of perl any time soon in writing web pages, but it would help those who have some but not sufficient programming skills to create repetitive tasks easily without knowing the fundamentals.


        Let me see if I can restate what you just said. The major conceptual difference between state machines and FBP is that FBP could have output from one state potentially go to more than one state at the same time. State machines, ordinarily, cannot.

        Personally, as I never studied state machines in school (as such), but learned about them on the job, I never had that restriction on them. Why would that restriction be placed?

        Now, I'm thinking that another way to look at it is the difference between procedure-based analysis and data-based analysis. State machines are more concerned with the actions that can take place. FBP is more concerned with the data that is moving through the system. Right?

