PerlMonks  

parse the "Other Users XML Ticker" with index and substr

by PodMaster (Abbot)
on Mar 18, 2002 at 08:25 UTC ( #152429=snippet )

Description: Like the title says. Uses LWP::Simple to fetch the ticker.
#!/usr/bin/perl -w

=head1 parse the "Other Users XML Ticker" with C<index> and C<substr>

For real.  Probably won't work if the "DTD" changes (not bloody likely).

=cut

use strict;
use LWP::Simple;
# other users xml ticker
my $html = get('http://perlmonks.org/index.pl?node_id=15851');

die "could not fetch the Other Users XML Ticker\n" unless $html;
warn $html;    # echo the raw ticker to STDERR for reference

my $fh = [split "\n", $html];

while(@{$fh}) {
    parse_other_users(shift @{$fh});
}
exit;


sub parse_other_users {
    my $s = shift;

    # expected layout: <user user_id="5348" username="Corion"></user>
    my $nX = index($s,'<user user_id=');  # start of the tag; the needle is 14 chars long
    my $iX = index($s,'" username="');    # end of the id value; the needle is 12 chars long
    my $IX = index($s,'">', $iX + 11);    # end of the username value

    my ($name, $id);

    if( $iX >=0 and $nX >=0 ) {
        # prints: offsets $iX, $nX, $IX, a length check, then the id and the name
        printf '(%s)(%s)(%s)(%s)(%s)(%s)'."\n",
             $iX,
             $nX,
             $IX,
             $iX - $nX - 14,
             $id   = substr($s, $nX + 14 + 1, $iX - $nX - 14 - 1),
             $name = substr($s, $iX + 12, $IX - $iX - 12);
    }

    return;
}

__END__

<CHATTER><INFO site="http://perlmonks.org" sitename="Perl Monks">Rendered by the Other Users XML Ticker</INFO><user user_id="22308" username="dws"></user>
<user user_id="83485" username="blakem"></user>
<user user_id="65703" username="rob_au"></user>
<user user_id="108447" username="demerphq"></user>
<user user_id="61798" username="busunsl"></user>
<user user_id="78196" username="lestrrat"></user>
<user user_id="103344" username="Ryszard"></user>
<user user_id="134513" username="metadoktor"></user>
<user user_id="141348" username="Dog and Pony"></user>
<user user_id="53018" username="juo"></user>
<user user_id="12012" username="Malach"></user>
<user user_id="107642" username="podmaster"></user>
<user user_id="103824" username="perl::scribe"></user>
<user user_id="143222" username="gdnew"></user>
<user user_id="134230" username="tmiklas"></user>
<user user_id="105312" username="thepen"></user>

</CHATTER>
(130)(110)(145)(6)(22308)(dws)
(20)(0)(38)(6)(83485)(blakem)
(20)(0)(38)(6)(65703)(rob_au)
(21)(0)(41)(7)(108447)(demerphq)
(20)(0)(39)(6)(61798)(busunsl)
(20)(0)(40)(6)(78196)(lestrrat)
(21)(0)(40)(7)(103344)(Ryszard)
(21)(0)(43)(7)(134513)(metadoktor)
(21)(0)(45)(7)(141348)(Dog and Pony)
(20)(0)(35)(6)(53018)(juo)
(20)(0)(38)(6)(12012)(Malach)
(21)(0)(42)(7)(107642)(podmaster)
(21)(0)(45)(7)(103824)(perl::scribe)
(21)(0)(38)(7)(143222)(gdnew)
(21)(0)(40)(7)(134230)(tmiklas)
(21)(0)(39)(7)(105312)(thepen)
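
For readers following the offset arithmetic, here is a minimal sketch of the same index/substr technique applied to a single line. It is not the original snippet: the helper name extract_user is hypothetical, and the needle lengths are computed with length() instead of being hard-coded as 14 and 12. It still assumes the exact attribute order and quoting shown in the ticker output above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (user_id, username) for one ticker line,
# or an empty list when the expected layout is not found.
sub extract_user {
    my ($line) = @_;
    my $open = '<user user_id="';
    my $mid  = '" username="';

    my $start = index($line, $open);
    return if $start < 0;

    # the id runs from just after $open up to the next $mid
    my $id_from = $start + length $open;
    my $id_to   = index($line, $mid, $id_from);
    return if $id_to < 0;

    # the name runs from just after $mid up to the closing ">
    my $name_from = $id_to + length $mid;
    my $name_to   = index($line, '">', $name_from);
    return if $name_to < 0;

    return ( substr($line, $id_from,   $id_to   - $id_from),
             substr($line, $name_from, $name_to - $name_from) );
}

my ($id, $name) = extract_user('<user user_id="83485" username="blakem"></user>');
print "($id)($name)\n";    # prints (83485)(blakem)
```

Naming the two needles once keeps the offsets and the lengths from drifting apart if either string is ever edited.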

Re: parse the "Other Users XML Ticker" with index and substr
by mirod (Canon) on Mar 18, 2002 at 08:56 UTC

    No, no and no, for all 3 "parse with index and substr" snippets. How hard is it to use XML::Simple? I believe it can now accept SAX input, and thus no longer depends on XML::Parser, so you can use XML::SAX::PurePerl, Matt's pure-Perl XML parser.

    You are not parsing XML here, you are parsing the exact format of the message _today_. Any extra piece of information added, any comment, would break this parser, while proper XML code (i.e. code based on a real XML parser) would do just fine. There are plenty of ways the format of the ticker could be changed while the XML view would remain the same: added entities, comments, namespace declarations, you name it... Only a proper XML parser will allow you to extract the information regardless of the exact way the XML is "physically encoded".
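
    To make that concrete, here is a small sketch using only core Perl. The four strings below describe the same user element as far as an XML parser is concerned, yet the fixed-string test recognises only the first. The sample element and the helper name found_by_index are illustrative assumptions, not taken from the live ticker.

```perl
use strict;
use warnings;

# Four serialisations that an XML parser would treat as the same element.
my @equivalent = (
    '<user user_id="5348" username="Corion"></user>',    # today's layout
    '<user username="Corion" user_id="5348"></user>',    # attributes reordered
    "<user user_id='5348' username='Corion'/>",          # single quotes, empty-element tag
    '<user  user_id="5348" username="Corion"></user>',   # extra whitespace
);

# Hypothetical check: true only when both fixed needles are present.
sub found_by_index {
    my ($line) = @_;
    return ( index($line, '<user user_id="') >= 0
          && index($line, '" username="')    >= 0 ) ? 1 : 0;
}

my @hits = map { found_by_index($_) } @equivalent;
print "@hits\n";    # prints 1 0 0 0
```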

    You can have fun with this code, but I don't think it is a good thing to show it here.

    I suggest you write a second version of those tools using XML::Simple and XML::SAX::PurePerl. This way you will learn something, help others by showing them the proper way to process XML, and even garner some ++ in the process.
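
    A minimal sketch of what such a version might look like, assuming XML::Simple is installed from CPAN. The ticker text is inlined rather than fetched, the second user line and the comment are made-up variations to show that they do not disturb a real parser, and ticker_users is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Inlined sample; the reordered attributes and the comment are illustrative.
my $xml = <<'XML';
<CHATTER><INFO site="http://perlmonks.org" sitename="Perl Monks">Rendered by the Other Users XML Ticker</INFO>
<user user_id="22308" username="dws"></user>
<!-- a comment in the middle of the list -->
<user username="blakem" user_id="83485"/>
</CHATTER>
XML

# Hypothetical helper: returns a list of [user_id, username] pairs.
sub ticker_users {
    my ($text) = @_;
    # ForceArray keeps 'user' an array ref even when only one user is listed;
    # KeyAttr => [] switches off XML::Simple's folding of lists into hashes.
    my $ticker = XMLin($text, ForceArray => ['user'], KeyAttr => []);
    return map { [ $_->{user_id}, $_->{username} ] } @{ $ticker->{user} || [] };
}

printf "(%s)(%s)\n", @$_ for ticker_users($xml);
```

    Attribute order, quoting style and comments are handled by the parser, so the extraction code only names the attributes it wants.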

      No, no and no, for all 3 "parse with index and substr" snippets. How hard is it to use XML::Simple? I believe it can now accept SAX input, and thus no longer depends on XML::Parser, so you can use XML::SAX::PurePerl, Matt's pure-Perl XML parser.
      It is not hard to use XML::Simple, but that is not the point.
      You are not parsing XML here, you are parsing the exact format of the message _today_...
      Exactly. Nowhere do I claim the snippets would do anything more than parse the XML tickers as they appear today.
      You can have fun with this code, but I don't think it is a good thing to show it here.
      Excuse me?
      I suggest you write a second version of those tools using XML::Simple and XML::SAX::PurePerl. This way you will learn something, help others by showing them the proper way to process XML, and even garner some ++ in the process.
      I already have. Reposting them is not of interest to me.

      My snippets do exactly what they claim, not more not less.
       

      Look ma', I'm on CPAN.


      ** The Third rule of perl club is a statement of fact: pod is sexy.

        You don't seem to agree, but I really think that posting on PerlMonks comes with some responsibilities: as with every printed medium, you don't know who is going to read the posts (nor when), and you have no way to correct them if they think that index and substr are a proper way to process XML. Posting code with a bad design is no different from posting broken code. In fact I would think that it is even worse, as unsuspecting readers can try it out, see that it works and use it as a base for their own code.

        These snippets are similar in a way to those countless attempts at parsing CGI parameters: this is just not the way to do it. Use CGI, use XML::Parser and its friends.

        Just to show you the effect of your posts: imagine a good Perl programmer browsing the snippets through the PMSI, looking for a way to get started with XML. This is what they get... I don't think you are doing them a service!

        You can of course think different (of course you do ;--), but I thought it was important to add a correction after your snippet (and a -- vote).

        As for the fact that you already posted a proper version of those tools... well, I just hadn't realized that podmaster is really crazyinsomniac in disguise ;--)
