http://www.perlmonks.org?node_id=152429

Like the title says. Uses LWP::Simple to fetch the ticker.
#!/usr/bin/perl -w

=head1 parse the "Other Users XML Ticker" with C<index> and C<substr>

For real. Probably won't work if the "DTD" changes (not bloody likely).

=cut

use strict;
use LWP::Simple;

# other users xml ticker
my $html = get('http://perlmonks.org/index.pl?node_id=15851');

die unless $html;
warn $html;    # dump the raw ticker to STDERR (shown after __END__)

my $fh = [split "\n", $html];

while(@{$fh}) {
    parse_other_users(shift @{$fh});
}

exit;

sub parse_other_users {
    my $s = shift;
    # sample line: <user user_id="5348" username="Corion"></user>
    my $nX = index($s,'<user user_id=');    # marker length: 14
    my $iX = index($s,'" username="');      # marker length: 12
    my $IX = index($s,'">', $iX + 11);

    my ($id, $name);
    if( $iX >=0 and $nX >=0 ) {
        # prints the three offsets, a length, the user_id and the username
        printf '(%s)(%s)(%s)(%s)(%s)(%s)'."\n",
            $iX,
            $nX,
            $IX,
            $iX - $nX - 14,
            $id   = substr($s, $nX + 14 + 1, $iX - $nX - 14 - 1),
            $name = substr($s, $iX + 12, $IX - $iX - 12);
    }
    return;
}
__END__
<CHATTER><INFO site="http://perlmonks.org" sitename="Perl Monks">Rendered by the Other Users XML Ticker</INFO><user user_id="22308" username="dws"></user>
<user user_id="83485" username="blakem"></user>
<user user_id="65703" username="rob_au"></user>
<user user_id="108447" username="demerphq"></user>
<user user_id="61798" username="busunsl"></user>
<user user_id="78196" username="lestrrat"></user>
<user user_id="103344" username="Ryszard"></user>
<user user_id="134513" username="metadoktor"></user>
<user user_id="141348" username="Dog and Pony"></user>
<user user_id="53018" username="juo"></user>
<user user_id="12012" username="Malach"></user>
<user user_id="107642" username="podmaster"></user>
<user user_id="103824" username="perl::scribe"></user>
<user user_id="143222" username="gdnew"></user>
<user user_id="134230" username="tmiklas"></user>
<user user_id="105312" username="thepen"></user>
</CHATTER>

(130)(110)(145)(6)(22308)(dws)
(20)(0)(38)(6)(83485)(blakem)
(20)(0)(38)(6)(65703)(rob_au)
(21)(0)(41)(7)(108447)(demerphq)
(20)(0)(39)(6)(61798)(busunsl)
(20)(0)(40)(6)(78196)(lestrrat)
(21)(0)(40)(7)(103344)(Ryszard)
(21)(0)(43)(7)(134513)(metadoktor)
(21)(0)(45)(7)(141348)(Dog and Pony)
(20)(0)(35)(6)(53018)(juo)
(20)(0)(38)(6)(12012)(Malach)
(21)(0)(42)(7)(107642)(podmaster)
(21)(0)(45)(7)(103824)(perl::scribe)
(21)(0)(38)(7)(143222)(gdnew)
(21)(0)(40)(7)(134230)(tmiklas)
(21)(0)(39)(7)(105312)(thepen)

Replies are listed 'Best First'.
Re: parse the "Other Users XML Ticker" with index and substr
by mirod (Canon) on Mar 18, 2002 at 08:56 UTC

    No, no and no, for all 3 "parse with index and substr" snippets. How hard is it to use XML::Simple? I believe it can now accept SAX input and thus no longer depends on XML::Parser, so you can use XML::SAX::PurePerl, Matt's pure Perl XML parser.

    You are not parsing XML here, you are parsing the exact format of the message _today_. Any extra piece of information added, or any comment, would break this parser, while proper XML code (i.e., code based on a real XML parser) would do just fine. There are plenty of ways the format of the ticker could be changed while the XML view would remain the same: added entities, comments, namespace declarations, you name it... Only a proper XML parser will allow you to extract the information regardless of the exact way the XML is "physically encoded".
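    To make the point above concrete, here is a minimal sketch using only core Perl. It wraps the same index/substr arithmetic in a helper named fixed_offset_parse (a hypothetical name, not from the original snippet) and feeds it several spellings of one element. The variant strings are made up for illustration; an XML parser would report the same user for each of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same offset arithmetic as the original snippet, wrapped in a
# hypothetical helper that returns (user_id, username) or an empty list.
sub fixed_offset_parse {
    my ($s) = @_;
    my $nX = index($s, '<user user_id=');    # marker is 14 chars long
    my $iX = index($s, '" username="');      # marker is 12 chars long
    return if $nX < 0 or $iX < 0;
    my $IX = index($s, '">', $iX + 11);
    return if $IX < 0;
    my $id   = substr($s, $nX + 14 + 1, $iX - $nX - 14 - 1);
    my $name = substr($s, $iX + 12,     $IX - $iX - 12);
    return ($id, $name);
}

# Illustrative inputs: four spellings that carry the same information.
my @variants = (
    [ 'as served today'    => '<user user_id="83485" username="blakem"></user>' ],
    [ 'attributes swapped' => '<user username="blakem" user_id="83485"></user>' ],
    [ 'single quotes'      => q{<user user_id='83485' username='blakem'></user>} ],
    [ 'empty-element tag'  => '<user user_id="83485" username="blakem"/>' ],
);

for my $v (@variants) {
    my ($label, $xml) = @$v;
    my @got = fixed_offset_parse($xml);
    printf "%-20s %s\n", $label,
        @got ? "id=$got[0] name=$got[1]" : 'no match';
}
```

    Only the first spelling yields a result; the other three fall through as "no match" even though they describe the same user.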

    You can have fun with this code, but I don't think it is a good thing to show it here.

    I suggest you write a second version of those tools using XML::Simple and XML::SAX::PurePerl. This way you will learn something, help others by showing them the proper way to process XML, and even garner some ++ in the process.
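    As a rough idea of what such a second version might look like, here is a sketch built on XML::Simple's XMLin. The helper name users_from_ticker and the sample ticker text are assumptions made for illustration; a real script would fetch the text with LWP::Simple as in the original snippet. Which underlying parser gets used depends on what is installed; XML::Simple documents the $XML::Simple::PREFERRED_PARSER variable for choosing one explicitly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Optional, and only if XML::SAX is installed:
# $XML::Simple::PREFERRED_PARSER = 'XML::SAX::PurePerl';

# Illustrative ticker text with a comment, swapped attributes,
# single quotes and an empty-element tag mixed in.
my $xml = <<'XML';
<CHATTER><INFO site="http://perlmonks.org" sitename="Perl Monks">Rendered by the Other Users XML Ticker</INFO>
<!-- a comment in the middle of the ticker -->
<user user_id="22308" username="dws"></user>
<user username='blakem' user_id='83485'/>
</CHATTER>
XML

# Hypothetical helper: returns a list of [user_id, username] pairs.
sub users_from_ticker {
    my ($text) = @_;
    # ForceArray keeps <user> a list even when only one monk is online;
    # KeyAttr => [] stops XML::Simple from folding the list into a hash.
    my $ticker = XMLin($text, ForceArray => ['user'], KeyAttr => []);
    return map { [ $_->{user_id}, $_->{username} ] }
           @{ $ticker->{user} || [] };
}

printf "(%s)(%s)\n", @$_ for users_from_ticker($xml);
```

    The comment, the attribute order and the quoting style make no difference here, because the parser hands back the attributes by name rather than by position.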

      > No, no and no, for all 3 "parse with index and substr" snippets. How hard is it to use XML::Simple, especially as I believe it can now accept SAX input, and thus does not depend on XML::Parser anymore, so you can use XML::SAX::PurePerl, Matt's pure Perl XML parser.

      It is not hard to use XML::Simple, but that is not the point.

      > You are not parsing XML here, you are parsing the exact format of the message _today_...

      Exactly. Nowhere do I claim the snippets would do anything more than parse the XML tickers as they appear today.

      > You can have fun with this code, but I don't think it is a good thing to show it here.

      Excuse me?

      > I suggest you write a second version of those tools using XML::Simple and XML::SAX::PurePerl, this way you will learn something, help others by showing them the proper way to process XML, and even garner some ++ in the process.

      I already have. Reposting them is not of interest to me.

      My snippets do exactly what they claim, no more, no less.
       

      Look ma', I'm on CPAN.


      ** The Third rule of perl club is a statement of fact: pod is sexy.

        You don't seem to agree, but I really think that posting on PerlMonks comes with some responsibilities: as with every printed medium, you don't know who is going to read the posts (nor when), and you have no way to correct them if they think that index and substr are a proper way to process XML. Posting code with a bad design is no different from posting broken code. In fact I would think that it is even worse, as unsuspecting readers can try it out, see that it works and use it as a base for their own code.

        These snippets are similar in a way to those countless attempts at parsing CGI parameters: this is just not the way to do it. Use CGI, use XML::Parser and its friends.

        Just to show you the effect of your posts: imagine a good Perl programmer browsing the snippets through the PMSI, looking for a way to get started with XML. This is what they get... I don't think you are doing him/her a service!

        You can of course think different (of course you do ;--), but I thought it was important to add a correction after your snippet (and a -- vote).

        As for the fact that you already posted a proper version of those tools... well, I just hadn't realized that podmaster is really crazyinsomniac in disguise ;--)