http://www.perlmonks.org?node_id=628098

Hello good monks,

I've recently come up with something I haven't seen before. Granted, that hardly means it hasn't been done; I just haven't run across it. If there's prior art, I'd very much like to hear about it!

We all know about LWP::UserAgent and WWW::Mechanize, and I've gotten a lot of mileage out of using only those modules for a long time. But there's a certain class of problems that I end up solving repeatedly, in much the same way: making HTTP requests, reacting to the responses, possibly making more requests, reacting to those responses, ad infinitum. I've never liked writing this kind of bread-and-butter request/response/request/response chain as imperative Perl code.

It struck me that this is a straightforward recursive problem. It can be solved easily with a data structure that describes the requests, their responses, further requests for those responses, and so on. This way the actual HTTP processing (i.e., the use of LWP::UserAgent) is isolated in a simple recursive routine (with good, consistent error handling!), while the definition of the expected behavior lives somewhere else.

Now, in case anyone is curious, I'll include an incomplete and probably broken example, which is what I've been using to develop the idea. It's meant to be a last.fm track submission client; right now it authenticates and submits a single bogus track.

Here's the definition:

```perl
use Digest::MD5 qw(md5_hex);
use POSIX qw(strftime);

# $CLIENT_ID, $CLIENT_VER, $USER and $PASS are set elsewhere.
my $REQUESTS = [
    {
        uri    => "http://post.audioscrobbler.com",
        params => [
            hs => 'true',
            p  => '1.1',
            c  => $CLIENT_ID,
            v  => $CLIENT_VER,
            u  => $USER,
        ],
        responses => [
            {
                name            => 'Successful handshake',
                content_pattern => qr/^UPTODATE\n([^\n]+)\n([^\n]+)\nINTERVAL (.*)/,
                # Called with the pattern's capture groups; returns the next requests.
                requests => sub {
                    my ($md5, $submit, $interval) = @_;
                    my $hash = md5_hex(md5_hex($PASS) . $md5);
                    print "Pausing $interval\n";
                    sleep $interval;
                    return (
                        {
                            uri    => $submit,
                            params => [
                                u      => $USER,
                                s      => $hash,
                                'a[0]' => 'Test Artist',
                                't[0]' => 'Test Track',
                                'b[0]' => 'Test Album',
                                'm[0]' => '',
                                'l[0]' => 5 * 60,
                                'i[0]' => strftime("%Y-%m-%d %H:%M:%S", gmtime),
                            ],
                            responses => [
                                {
                                    name            => 'Successful submission',
                                    content_pattern => qr/^OK\nINTERVAL (.*)/,
                                    requests        => sub { print "Pausing $_[0]\n"; sleep $_[0]; return; },
                                },
                                {
                                    name            => 'Failed submission',
                                    content_pattern => qr/^FAILED ([^\n]*)/,
                                    requests        => sub { print "Failure: $_[0]\n"; return; },
                                },
                                {
                                    name            => 'Failed authentication',
                                    content_pattern => qr/^BADAUTH/,
                                },
                            ],
                        },
                    );
                },
            },
            # Recognized, but no follow-up requests yet.
            { content_pattern => qr/^UPDATE/ },
            { content_pattern => qr/^FAILED/ },
            { content_pattern => qr/^BADUSER/ },
        ],
    },
];
```
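In the definition above, each `requests` code ref receives the capture groups of whichever pattern matched, because a successful match in list context returns those groups and the handler routine below passes them along. A small standalone check of that data flow, using a made-up handshake body and a placeholder password (neither is real last.fm data):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same pattern as the 'Successful handshake' entry in the definition.
my $pattern = qr/^UPTODATE\n([^\n]+)\n([^\n]+)\nINTERVAL (.*)/;

# Made-up handshake body; the challenge and URL are invented for illustration.
my $body = "UPTODATE\nabc123\nhttp://example.test/submit\nINTERVAL 1\n";

# List-context match: returns the three capture groups on success.
my ($md5, $submit, $interval) = $body =~ $pattern;

# A reply the pattern does not match yields an empty list,
# so the handler simply skips that branch.
my @none = "BADUSER\n" =~ $pattern;

# The response hash computed in the definition (placeholder password).
my $PASS = 'placeholder';
my $hash = md5_hex(md5_hex($PASS) . $md5);

print "challenge=$md5 submit=$submit interval=$interval\n";
print "hash has ", length($hash), " hex digits\n";
```

Because `.` does not match a newline, the last group captures just `1` rather than the trailing line break.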

And here's the handler routine:

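```perl
use strict;
use warnings;
use URI;

# Issue each request, match the response body against every declared
# pattern, and recurse into the follow-up requests of each pattern that
# matches, passing along its capture groups.
sub handle_requests {
    my ($agent, $requests, @params) = @_;

    # Follow-up requests are either a literal array of request specs or a
    # code ref that builds them from the previous match's capture groups.
    my @requests;
    if (ref $requests eq 'ARRAY') {
        @requests = @$requests;
    }
    elsif (ref $requests eq 'CODE') {
        @requests = $requests->(@params);
    }

    for my $request (@requests) {
        my $uri = URI->new($request->{uri});
        $uri->query_form(@{ $request->{params} || [] });
        print "Request: $uri\n";

        my $response = $agent->get($uri);
        die $response->status_line unless $response->is_success;

        for my $expected (@{ $request->{responses} || [] }) {
            if (my @groups = $response->content =~ $expected->{content_pattern}) {
                my $name = $expected->{name} || "Matched $expected->{content_pattern}";
                print "Response: $name\n";
                handle_requests($agent, $expected->{requests}, @groups);
            }
        }
    }
}
```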
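Since the handler only calls `get` on the agent, and `is_success`, `status_line` and `content` on what comes back, a definition can be exercised offline with a duck-typed stand-in. This is a minimal sketch using core Perl only; `FakeAgent` and `FakeResponse` are hypothetical names, not part of LWP:

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTTP::Response: only the three methods
# handle_requests calls.
package FakeResponse;

sub new {
    my ($class, %fields) = @_;
    return bless {%fields}, $class;
}
sub is_success  { return $_[0]{code} =~ /^2/ }
sub status_line { return "$_[0]{code} $_[0]{message}" }
sub content     { return $_[0]{content} }

# Hypothetical stand-in for LWP::UserAgent: get() serves canned bodies,
# keyed on the URI with its query string removed, in the order given.
package FakeAgent;

sub new {
    my ($class, %replies) = @_;    # base URI => [ body, body, ... ]
    return bless { replies => \%replies, log => [] }, $class;
}

sub get {
    my ($self, $uri) = @_;
    push @{ $self->{log} }, "$uri";        # remember every request URI
    (my $base = "$uri") =~ s/\?.*//s;      # drop the query string
    my $queue = $self->{replies}{$base} || [];
    return FakeResponse->new(code => 404, message => 'Not Found', content => '')
        unless @$queue;
    return FakeResponse->new(code => 200, message => 'OK', content => shift @$queue);
}

package main;

my $agent = FakeAgent->new(
    'http://example.test/submit' => [ "OK\nINTERVAL 0\n" ],
);
my $first  = $agent->get('http://example.test/submit?u=someone');
my $second = $agent->get('http://example.test/submit');
print $first->status_line, "\n";     # 200 OK
print $second->status_line, "\n";    # 404 Not Found
```

To drive the last.fm definition with it, one would key the handshake body on `http://post.audioscrobbler.com` and the submission reply on whatever submit URL that body contains, then pass the object as the first argument to `handle_requests`. Using `INTERVAL 0` in the canned bodies keeps the `sleep` calls from stalling the run.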

I wasn't entirely sure whether I should post this code, because this Meditation is really about the idea. I can think of all kinds of ways to improve the code (readability, conciseness, completeness, correctness), but I posted it to help explain the idea in case my textual description falls short. I'd really like to know (1) whether there is already a well-known (or even not-so-well-known) implementation of this kind of thing out there somewhere, and (2) whether this is interesting enough to anybody else that I should spend more time on it.

Thanks for any response!

Re: Data driven HTTP interaction
by sfink (Deacon) on Jul 23, 2007 at 05:15 UTC
    A couple of quick thoughts:
    • The problem you are addressing is in spirit very, very similar to what Expect.pm addresses. The only difference is that you are communicating over an HTTP connection, whereas Expect communicates over a pseudoterminal. But both are based on a request/reply model.
    • You describe this as "recursion", but to me it seems like that's an artifact of your implementation. You could just as well change either the description or implementation to a series of dependent request/reply transactions. (In general, it seems like the pattern is more of a graph than a tree. Any request could result in a "login timed out" reply, and the subsequent login request would be the same for a whole bunch of different original requests.) You could model it as a state machine, but I think a push-down automaton might be a better fit. Which itself makes me think of YACC -- perhaps its grammar input is a good model for a descriptive format?
    • This also reminds me of Prolog: "My whole task is complete if I add a record for X, make sure the user permissions for Y are set up correctly, and post an update to the news feed. Adding a record for X is complete if I look up 'smurf livers' as a category and use that category to define..." and so on. Using a similar reduction model, you might be able to figure out which steps can be performed in parallel, or at least produce helpful error output: "Task A failed because subtask c2 failed."
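The pushdown idea in the second point can be sketched in a few lines of Perl. This is a minimal illustration, not code from the thread: `run_states`, the state names, the `/page` and `/login` paths, and the action keys `goto`, `call`, `return` and `done` are all hypothetical, and a plain code ref stands in for LWP::UserAgent so it runs offline. The stack is what lets one shared login state resume whichever request was interrupted:

```perl
use strict;
use warnings;

# Hypothetical pushdown-style runner. Each named state issues one request
# and lists [ pattern => action ] rules; the first matching rule wins.
# Actions: goto (move to another state), call (push the current state and
# detour), return (pop back to the saved state, which re-issues its
# request), done (stop). $fetch takes a URI and returns the reply body.
sub run_states {
    my ($states, $start, $fetch) = @_;
    my (@stack, @trace);
    my $state = $start;
    while (defined $state) {
        push @trace, $state;
        my $spec  = $states->{$state} or die "Unknown state '$state'\n";
        my $reply = $fetch->($spec->{uri});

        my $action;
        for my $rule (@{ $spec->{responses} }) {
            my ($pattern, $act) = @$rule;
            if ($reply =~ $pattern) { $action = $act; last }
        }
        die "No rule in '$state' matched: $reply\n" unless $action;

        if    (exists $action->{goto})   { $state = $action->{goto} }
        elsif (exists $action->{call})   { push @stack, $state; $state = $action->{call} }
        elsif (exists $action->{return}) { $state = pop @stack }    # undef ends the run
        else                             { $state = undef }         # done
    }
    return @trace;
}

# Canned replies: the first page request "times out", login succeeds,
# and the retried page request succeeds.
my %canned = (
    '/page'  => [ 'LOGIN_TIMEOUT', 'OK' ],
    '/login' => [ 'LOGGED_IN' ],
);
my $fetch = sub {
    my ($uri) = @_;
    my $body = shift @{ $canned{$uri} || [] };
    return defined $body ? $body : 'NO_MORE_REPLIES';
};

my %states = (
    page => {
        uri       => '/page',
        responses => [
            [ qr/^OK/            => { done => 1 } ],
            [ qr/^LOGIN_TIMEOUT/ => { call => 'login' } ],
        ],
    },
    # One login state can serve any state that calls it.
    login => {
        uri       => '/login',
        responses => [ [ qr/^LOGGED_IN/ => { return => 1 } ] ],
    },
);

my @trace = run_states(\%states, 'page', $fetch);
print join(' -> ', @trace), "\n";    # page -> login -> page
```

Compared with a nested definition, a flat table of named states lets many requests share one recovery path, which is the graph-rather-than-tree shape described in the second point.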

      Many thanks for the reply. I'm glad to see you've looked beyond the specific implementation. Your ideas are certainly worth some thought.

      I will almost certainly play with the idea in your 2nd point. I may be painting myself into a corner with my current recursive implementation. Your idea seems a lot more flexible, and will probably lead to definitions that are simpler to understand (at least for some definition of "understand").

      I must admit the Prolog-esque solution sounds like a lot of fun. I'm not sure it would be worth the conceptual hurdle, but then again, maybe it would! Hmm.

Re: Data driven HTTP interaction
by CaMelRyder (Pilgrim) on Jul 23, 2007 at 03:24 UTC
    i digg it.
    ¥peace from CaMelRyder¥