PerlMonks  

RSS Parsing not working on new machine

by wintermute115 (Acolyte)
on Apr 15, 2025 at 01:18 UTC ( [id://11164705]=perlquestion )

wintermute115 has asked for the wisdom of the Perl Monks concerning the following question:

For several years now, I've used a homemade RSS reader to collect and deal with my podcasts. You can see the full code here: github.

I just moved it to a new computer (Ubuntu 24.04, perl 5.38.2; the previous one was Ubuntu 22.04, perl 5.30.0) and suddenly it's failing to interpret the RSS feeds that are working fine on the old machine.

It's failing on line 277, where it runs parse_string(), which is returning undef instead of the RSS object I get on the old machine, and I cannot work out what might be different to cause this. Any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: RSS Parsing not working on new machine
by Corion (Patriarch) on Apr 15, 2025 at 07:34 UTC

    The relevant code is

    ...
    my $rss = $parser->parse_string($feed);
    if(!$rss) {
        print " -- Feed is broken -- \n";
        next FEED;
    }

    The next step would be to look at what is in $feed, and to strip away the rest of the script until only the code that reproduces the problem remains.

    In your code you already save the content to a file, so maybe you can reduce your code to something like:

    my $feed = read_file( 'saved.rss' );
    my $rss = $parser->parse_string($feed);
    if(!$rss) {
        print " -- Feed is broken -- \n";
        next FEED;
    }

    ... but maybe the feed you retrieve already is empty?
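One way to follow the "look at what is in $feed" advice is to print a short summary of the string just before it is handed to the parser. The sketch below is only an illustration: `describe_feed` is a hypothetical helper, not something from the original script, and it uses core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): summarise what
# is actually in $feed before it is passed to parse_string().
sub describe_feed {
    my ($feed) = @_;
    return 'undef' unless defined $feed;
    return 'empty' unless length $feed;

    my $head = substr($feed, 0, 60);
    $head =~ s/[^\x20-\x7e]/./g;    # show control and non-ASCII characters as dots

    # utf8::is_utf8 reports whether Perl holds this as a decoded character string.
    my $kind = utf8::is_utf8($feed) ? 'character string' : 'byte string';
    return sprintf '%d chars, %s, starts: %s', length($feed), $kind, $head;
}

print describe_feed(undef), "\n";
print describe_feed(qq{<?xml version="1.0"?>\n<rss/>}), "\n";
```

Calling it on both machines with the same saved feed would show quickly whether the download step is already delivering something different (undef, an empty string, or an unexpected leading byte sequence).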

      As I say, the identical code looking at identical feeds works fine on another machine, so I don't think it's a problem with the feeds themselves. A minimal version of:
      #!/usr/bin/perl
      use strict;
      use XML::RSS::Parser;
      my $feed = "/home/ross/Downloads/New_Podcasts/archive/GMNV.rss";
      my $parser = XML::RSS::Parser->new();
      my $rss = $parser->parse_file($feed);
      print $rss . "\n";
      gives me:
      $ ./test.pl
      XML::RSS::Parser::Feed=HASH(0x5ba514d23270)

      which is what I'd expect to see, rather than the undef I get from the actual script. Yes, that file is one saved by this script.

      Changing it to point at the archived RSS rather than the version stored in memory doesn't fix the problem, though. Something is happening somewhere else that is stopping the parser from reading this file, and I can't see what it might be.

        If reading the file seems weird, maybe it is an issue of file/directory permissions?

        Can you check whether the user your cron job runs as can access all directories in the path, starting from / ?

        Also, does the minimal script still work when running from cron ?
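The directory-by-directory check suggested above could be scripted roughly as follows. This is a sketch using core modules only; `path_chain` and `report_access` are hypothetical helper names, and the default path is simply the one quoted earlier in the thread. It only tells you something useful when run as the same user (for example from the same cron job) as the real script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: every ancestor of an absolute path, starting at
# the root, followed by the path itself.
sub path_chain {
    my ($path) = @_;
    my @parts  = grep { length } File::Spec->splitdir($path);
    my @chain  = ('/');
    my $so_far = '';
    for my $part (@parts) {
        $so_far .= "/$part";
        push @chain, $so_far;
    }
    return @chain;
}

# Hypothetical helper: describe what the current user can do at each step.
sub report_access {
    my ($path) = @_;
    my @lines;
    for my $p (path_chain($path)) {
        my $status;
        if (!-e $p) {
            $status = 'missing';
        }
        elsif (-d _) {
            $status = (-x _) ? 'directory, searchable' : 'directory, NOT searchable';
        }
        else {
            $status = (-r _) ? 'file, readable' : 'file, NOT readable';
        }
        push @lines, "$p: $status";
    }
    return @lines;
}

my $target = $ARGV[0] // '/home/ross/Downloads/New_Podcasts/archive/GMNV.rss';
print "$_\n" for report_access($target);
```

A directory reported as "NOT searchable" anywhere in the chain is enough to make the file unreachable, even if the file itself is world-readable.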

Re: RSS Parsing not working on new machine
by ikegami (Patriarch) on Apr 15, 2025 at 01:59 UTC

    What's the string that's giving you a problem? What's the error that causes the parsing to fail?

      Hrm. Looks like a bunch of related things, for different feeds:
      End tag mismatch (generator != ) [Ln: 6, Col: 626215104372244]
      End tag mismatch (omny:organizationId != ) [Ln: 4, Col: 626215103501812]
      End tag mismatch (title != ) [Ln: 14, Col: 626215078242148]
      As I say, this works elsewhere, and a minimal trial (see elsewhere in this thread) handles the same feeds just fine, so I'm not sure why this is happening, or how to fix it.

        Choosing a feed that reports End tag mismatch (ttl != ) [Ln: 5, Col: 575272112746466], line 5 is:

                <ttl>60</ttl>

        That column number is obviously wrong, making me think maybe it's not reading the line breaks properly. But if so, how does it know it's line 5? And XML shouldn't care about line breaks anyway, should it?

        It looks like, in general, it's choking on the first line inside a <channel> that isn't an <atom:link>. But I don't know what that means.

Re: RSS Parsing not working on new machine
by wintermute115 (Acolyte) on Apr 15, 2025 at 17:27 UTC

    OK, so, playing around with my test version, I've discovered that including use Try; breaks it in exactly the way I'm seeing, regardless of whether or not there are any try/catch blocks. If I remove Try from my script, it works (but will be more fragile), except that it's still breaking on some feeds. It always breaks on lines that contain CDATA, but every feed includes CDATA blocks, and the failing ones aren't the first CDATA blocks in their feeds, so the cause isn't that obvious:

    -- Feed is broken --
    End tag mismatch (itunes:subtitle != content:encoded) [Ln: 735, Col: 293548366282188]
    Line 734 78 <itunes:subtitle><![CDATA[It is the QUESTIONS EPISODE!]]></itunes:subtitle>

    If I run it several times, the same feeds are consistently breaking on the same line, so something weird is still happening.

      Depending on your Perl version, native try { ... } catch  { ... } finally { ... } blocks may already be available. Perl 5.34 introduced try ... catch. So maybe you want to use that over Try.

      Alternatively, there is the more weathered Try::Tiny, which also gives you a try/catch pair.
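As a rough illustration of the native syntax mentioned above, the sketch below uses the built-in `try`/`catch` feature inside a labelled loop. `fetch_feed` and the feed names are hypothetical stand-ins for the real download code, and a perl of 5.34 or newer is assumed. Because the native blocks are not subroutines, `next FEED` is allowed inside `catch`.

```perl
use strict;
use warnings;
use feature 'try';
no warnings 'experimental::try';    # silences the experimental warning on perls that emit it

# fetch_feed() is a hypothetical stand-in for the real download step.
sub fetch_feed {
    my ($url) = @_;
    die "cannot fetch $url\n" if $url =~ /bad/;
    return '<rss/>';
}

my @fetched;
FEED: for my $url (qw(good-1 bad-2 good-3)) {
    my $feed;
    try {
        $feed = fetch_feed($url);
    }
    catch ($e) {
        warn "skipping $url: $e";
        next FEED;    # permitted: native try/catch blocks are not subroutines
    }
    push @fetched, $url;
}
print "fetched: @fetched\n";
```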

        It looks like the builtin try is getting deprecated, so I'm not keen on using it long term. Having said that, swapping in use experimental 'try'; works (for most feeds; see above). Swapping in Try::Tiny gives me this new error:

        syntax error at /home/ross/scripts/podcasts/podcasts.pl line 233, near ") {"
        Execution of /home/ross/scripts/podcasts/podcasts.pl aborted due to compilation errors.

        This is the section it's complaining about:

        try {
            $curl->pushopt(CURLOPT_HTTPHEADER, [$ua_string]);
            $curl->setopt(CURLOPT_PROGRESSFUNCTION, \&no_progress);
            $curl->setopt(CURLOPT_NOPROGRESS, 0);
            $curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
            $curl->setopt(CURLOPT_CONNECT_ONLY, 0);
            $curl->setopt(CURLOPT_URL, $feedsrc);
            $curl->setopt(CURLOPT_WRITEDATA, \$feed);
            $curl->perform();
        } catch {
            $broken = 1; #putting next in here complains that it's exiting a subroutine
        }
        if (!defined($feed)) { # <--- Line 233
            $broken = 2;
        }

        I can't see an error there, and it doesn't throw one without Try::Tiny, so that seems pretty weird.

        EDIT

        OK, I figured that out. The Try::Tiny try/catch blocks need a closing semicolon. So now I have a robust try that won't go away in future versions of perl, and isn't somehow messing up all the feeds, but some are still getting messed up in ways I don't understand.
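For anyone hitting the same syntax error: Try::Tiny's `try` and `catch` are ordinary functions that take code blocks, so the whole construct is a single statement and needs a terminating semicolon. Without it, Perl tries to read the following `if (...)` as a statement modifier and then fails at the opening brace, which matches the `near ") {"` message. A minimal sketch, assuming Try::Tiny is installed and using a hypothetical `risky_fetch` in place of the curl calls:

```perl
use strict;
use warnings;
use Try::Tiny;

# risky_fetch() is a hypothetical stand-in for the curl calls.
sub risky_fetch { die "connection failed\n" }

my $feed;
my $broken = 0;

try {
    $feed = risky_fetch();
}
catch {
    # The error is available in $_ here. This block is an anonymous
    # sub, so "next" inside it would try to exit a subroutine.
    $broken = 1;
};    # <-- required: try/catch is one ordinary statement

if (!defined($feed)) {
    $broken = 2;
}
print "broken = $broken\n";
```

Setting a flag inside `catch` and acting on it after the semicolon, as the original script does, is the usual way around the restriction on `next`.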

Re: RSS Parsing not working on new machine
by wintermute115 (Acolyte) on Apr 15, 2025 at 21:07 UTC

    OK, I seem to have gotten it working but I don't really know what I did. I swapped out the version of try I'm using (and I still don't know why that would have an effect), but that didn't entirely solve the problem. And then messing around and putting stuff back as it was seems to have done the job. Thanks, guys, for pointing me in the right direction.

    EDIT

    I'm an idiot. I accidentally deleted the line that was telling me there was an error, so it looked like it was working.
