PerlMonks  

For a more robust solution, you only want to push an HTML parser onto a stream that has announced itself with a MIME type of text/html. I did that for a client once, talked about the code in a recent Usenet article, and hope to have it published soon. In there, I said:
I have (unpublished) a dynamic-pre-forking Apache-style web streaming proxy server in about 300 lines of pure Perl (using HTTP::Daemon and the other LWP items, of course). It takes the same parameters as Apache child management:
    ### configuration
    my $HOST = 'www.stonehenge.com';
    my $PORT = 42001;                  # 0 = pick next available user-port
    my $START_SERVERS = 4;             # start this many, and don't go below
    my $MAX_CLIENTS = 12;              # don't go above
    my $MAX_REQUESTS_PER_CHILD = 250;  # just in case there's a leak
    my $MIN_SPARE_SERVERS = 1;         # minimum idle (if 0, never start new)
    my $MAX_SPARE_SERVERS = 12;        # maximum idle (should be "single browser max")
And acts accordingly, using a simple scoreboarding mechanism similar to the Apache method.
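
The proxy code itself is unpublished, so the following is only a minimal sketch of how a scoreboard decision of this kind might look, not the actual implementation. The `plan_children` function, the hash-based scoreboard (child PID mapped to 'idle' or 'busy'), and the exact spawn/reap rules are assumptions made for illustration; only the parameter names come from the configuration above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Limits taken from the configuration block in the post.
my %CFG = (
    START_SERVERS     => 4,
    MAX_CLIENTS       => 12,
    MIN_SPARE_SERVERS => 1,
    MAX_SPARE_SERVERS => 12,
);

# Hypothetical helper: given a scoreboard (PID => 'idle' | 'busy'),
# return how many children to spawn (positive), reap (negative),
# or 0 to leave the pool alone.
sub plan_children {
    my ($cfg, $scoreboard) = @_;
    my $total = keys %$scoreboard;                          # number of children
    my $idle  = grep { $_ eq 'idle' } values %$scoreboard;  # number idle

    # Never run with fewer than START_SERVERS children.
    if ($total < $cfg->{START_SERVERS}) {
        return $cfg->{START_SERVERS} - $total;
    }

    # Too few idle children: spawn more, but stay within MAX_CLIENTS.
    # With MIN_SPARE_SERVERS = 0 this branch can never fire.
    if ($idle < $cfg->{MIN_SPARE_SERVERS}) {
        my $want = $cfg->{MIN_SPARE_SERVERS} - $idle;
        my $room = $cfg->{MAX_CLIENTS} - $total;
        return max(0, min($want, $room));
    }

    # Too many idle children: reap the surplus, but keep START_SERVERS.
    if ($idle > $cfg->{MAX_SPARE_SERVERS}) {
        my $surplus = min($idle - $cfg->{MAX_SPARE_SERVERS},
                          $total - $cfg->{START_SERVERS});
        return -$surplus;
    }

    return 0;
}
```

A parent process would call something like `plan_children(\%CFG, \%scoreboard)` after each status update from a child, then fork or signal children according to the result. Recycling a child after `$MAX_REQUESTS_PER_CHILD` requests would be handled in the child itself and is left out here.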

Using this code, the apache-benchmark program shows that I'm only half as fast as Apache, and have one quarter the footprint!

The best part is that in those 300 lines, I handle full SSL streaming (the CONNECT call), full content streaming (I was watching live-feed quicktime movies through the proxy), and if the content-type is text/html, an HTML parser in token mode is inserted, allowing real-time rewriting. For example, I could insert <font color=blue> tags around all <a href=> links, while not impeding the stream of the rest of the HTML... there'd just be a hiccup while the <a href=> was being noticed.
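
To make the rewriting idea concrete, here is a rough sketch using the event-handler interface of HTML::Parser (API version 3), fed one chunk at a time. It is not the proxy's code, and it uses callbacks rather than the token mode mentioned above. The `make_rewriter` helper and the `$emit` callback are names invented for this example. Everything except `<a href=...>` and its matching `</a>` passes through with its original source text.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: build a parser that passes all markup through to
# $emit unchanged, except that <a href=...> ... </a> gets wrapped in
# <font color=blue> ... </font>.
sub make_rewriter {
    my ($emit) = @_;
    my @wrapped;    # one flag per open <a>: did we wrap it?

    return HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr, $text) = @_;
            if ($tag eq 'a') {
                my $wrap = exists $attr->{href} ? 1 : 0;
                push @wrapped, $wrap;
                $text = "<font color=blue>$text" if $wrap;
            }
            $emit->($text);
        }, 'tagname, attr, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $text .= '</font>' if $tag eq 'a' && pop(@wrapped);
            $emit->($text);
        }, 'tagname, text' ],
        # Text, comments, declarations and so on go through untouched.
        default_h => [ sub { $emit->($_[0]) }, 'text' ],
    );
}

# Usage: feed the stream in arbitrary chunks, even ones that split a tag.
my $out = '';
my $p = make_rewriter(sub { $out .= $_[0] });
$p->parse($_) for '<p>See <a hr', 'ef="/x">this</a> now</p>';
$p->eof;
print "$out\n";
```

Because the parser holds back an incomplete tag until the rest of it arrives, the only delay in the stream is while a tag straddles a chunk boundary, which is the hiccup described above. In a real proxy `$emit` would write to the client socket instead of appending to a string.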

The code was originally written as a work for hire for a client who had intended my work to become open source. But the client dot-bombed, so I'm still trying to get clarification of whether I can release the code under my own copyright. As soon as that clears up, expect a WebTechniques column or two on it. :)

-- Randal L. Schwartz, Perl hacker


In reply to Re: LWP::UserAgent and HTML::Parser and the joys of Open Source by merlyn
in thread LWP::UserAgent and HTML::Parser and the joys of Open Source by grinder
