PerlMonks  

Mojo::DOM exception handling help

by Anonymous Monk
on Aug 01, 2020 at 17:11 UTC ( [id://11120200]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise Monks who are always wiser than I am. I have several thousand HTML files that I'm parsing using Mojo::DOM. I've encountered a few older files that are surprisingly missing one of the fields I'm trying to extract. Not many, but enough to cause my script to error out. Here's the code I have:

my $dom1 = Mojo::DOM->new( $$temp_content );
my $abstr = $dom1->at('div.abstract-content > p')->text;
print "Abstract is: $abstr \n\n";

Here's what I tried to do to catch the exception (when the abstract is missing):

my $abstr = $dom1->at('div.abstract-content > p' || 'not provided')->text;

Here's the error that I'm getting: Can't call method "text" on an undefined value at /Volumes/HD-PATU3/test.pl line 5.

I've looked at the Mojo::DOM documentation and can't seem to find an answer. I'm sure it's easy and, being a nub, I'm missing something simple. Thanks in advance for your wisdom and assistance.

Replies are listed 'Best First'.
Re: Mojo::DOM exception handling help
by davido (Cardinal) on Aug 01, 2020 at 19:35 UTC

    This doesn't do anything useful:

    'div.abstract-content > p' || 'not provided'

    The || operator is a logical short circuit. If the expression on the left is true, that expression's value is returned, and the right-hand side is never evaluated. The string "div.abstract-content > p" is just a plain old string, and evaluates to true. Therefore, the 'not provided' string is never seen. You're not providing an alternative; the at method only ever receives the first string.
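    To make the point above concrete, here is a minimal sketch in plain Perl (no Mojo::DOM needed). The variable $found is a hypothetical stand-in for a lookup that returned undef; the fallback is applied to the result with the defined-or operator (//, Perl 5.10 or later) rather than to the selector string.

```perl
use strict;
use warnings;

# A non-empty string is always true, so || never reaches the right-hand side.
my $selector = 'div.abstract-content > p' || 'not provided';
print "Selector actually used: $selector\n";

# Hypothetical stand-in for a lookup that found nothing and returned undef.
my $found = undef;

# Apply the fallback to the result instead; // tests definedness, not truth.
my $abstract = $found // 'not provided';
print "Abstract is: $abstract\n";
```

    Using // rather than || on the result also keeps legitimate false values such as "0" or an empty string from being replaced by the fallback.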

    I'm guessing here, but perhaps the selector div.abstract-content > p doesn't match anything in the DOM you are parsing. It might be useful for you to provide a sample of the DOM, trimmed to show the relevant hierarchy.


    Dave

Re: Mojo::DOM exception handling help
by kcott (Archbishop) on Aug 01, 2020 at 22:59 UTC

    Take a look at the Mojo::DOM documentation; especially the part regarding the at method, whose first sentence is:

    "Find first descendant element of this element matching the CSS selector and return it as a Mojo::DOM object, or undef if none could be found."

    Your posted error message reflects what the documentation describes; i.e. $dom1->at('div.abstract-content > p') is returning undef. I can reproduce this:

    $ perl -MMojo::DOM -E 'my $dom = Mojo::DOM::->new("<p></p>"); my $x = $dom->at("p.missing")->text(); say defined $x ? "not missing" : "really missing"'
    Can't call method "text" on an undefined value at -e line 1.

    In order to "catch the exception", you should get the return value from $dom1->at('div.abstract-content > p'). If it is defined, you can use it to invoke the text() method; otherwise, handle as appropriate for your application (output a warning, write to a log, etc.). Based on my code above, you'd want something like this:

    $ perl -MMojo::DOM -E 'my $dom = Mojo::DOM::->new("<p></p>"); my $x = $dom->at("p.missing"); say defined $x ? "not missing" : "really missing"'
    really missing

    Update (additional example): Something of an afterthought and really just intended to show the validity of the previous example:

    $ perl -MMojo::DOM -E 'my $dom = Mojo::DOM::->new(q{<p class="missing"></p>}); my $x = $dom->at("p.missing"); say defined $x ? "not missing" : "really missing"'
    not missing
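    For comparison, it is also possible to trap the error literally instead of checking the return value first. The following is only a sketch, assuming a made-up HTML fragment with no p element inside the div. Note that eval hides every error raised inside the block, not just the missing element, so the explicit defined check is usually the clearer choice.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up sample markup: the div exists but has no <p> child.
my $dom = Mojo::DOM->new('<div class="abstract-content"></div>');

# at() returns undef here, so ->text dies; eval traps that and yields undef,
# and the defined-or operator (//) then supplies the fallback string.
my $abstr = eval { $dom->at('div.abstract-content > p')->text } // 'not provided';

print "Abstract is: $abstr\n";
```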

    — Ken

Re: Mojo::DOM exception handling help
by stevieb (Canon) on Aug 01, 2020 at 17:20 UTC

    This is only an educated guess here, as I don't have the facilities currently to test, nor the time to read docs to understand what's happening. Can you extract the object, check for defined, then act appropriately?

    Instead of:

    my $abstr = $dom1->at('div.abstract-content > p' || 'not provided')->text;

    Will something like this example work?

    my $obj = $dom1->at('div.abstract-content > p' || 'not provided');
    my $abstr = defined $obj ? $obj->text : 'Object undefined';

    If it does work, you could elaborate on it a little to, say, log the entries that are broken.
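    One way the logging idea might look, as a rough sketch: the %html_for hash and its file names are invented sample data standing in for the real files on disk, and the useless || has been dropped from the selector.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Invented sample documents keyed by file name; real input would be read from disk.
my %html_for = (
    'new.html' => '<div class="abstract-content"><p>Some abstract</p></div>',
    'old.html' => '<div class="other"><p>No abstract here</p></div>',
);

my @missing;
for my $file (sort keys %html_for) {
    my $dom  = Mojo::DOM->new($html_for{$file});
    my $node = $dom->at('div.abstract-content > p');

    if (defined $node) {
        # Safe to call text() only once we know at() found an element.
        print "$file: Abstract is: ", $node->text, "\n";
    }
    else {
        # Record the broken file and carry on instead of dying.
        warn "$file: no abstract found\n";
        push @missing, $file;
    }
}

print "Files without an abstract: @missing\n";
```

    Collecting the names in @missing means the run can finish across all files and the broken ones can be reviewed afterwards.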

      To the OP: Without having the time to test myself at the moment, what stevieb suggested should work; you can also choose a different action to take with a more complex statement like if ( defined $obj ) { ... } else { ... }. Note that 'div.abstract-content > p' || 'not provided' is actually not doing anything useful, as it's just a logical or and the first value is always true, so the second value will always be ignored.

Re: Mojo::DOM exception handling help
by Anonymous Monk on Aug 01, 2020 at 18:51 UTC
    Why did you add || there?
