PerlMonks  

Mojo::URL returns incorrect absolute path

by mr_p (Scribe)
on May 16, 2013 at 16:52 UTC (#1033868=perlquestion)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone,

I am running into an issue where Mojo::URL's to_abs returns an incorrect absolute path. Can someone help me with this?

Results are: http://www.mellanox.com/page/page/rss
They should be: http://www.mellanox.com/page/rss
#!/usr/bin/perl
use 5.010;
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings qw(all);

use Data::Dumper;
use Mojo::UserAgent;

# FIFO queue
my $linkUrl = "http://www.mellanox.com/page/press_releases";

my $ua = Mojo::UserAgent->new(max_redirects => 2)->detect_proxy;
my $tx = $ua->get($linkUrl);

for my $e ($tx->res->dom('a[href]')->each) {
    my $link = Mojo::URL->new($e->{href});
    next if 'Mojo::URL' ne ref $link;

    $link = $link->to_abs($tx->req->url)->fragment(undef);
    next unless grep { $link->protocol eq $_ } qw(http https);

    if ($link->to_string =~ /rss/) {
        print $link->to_abs;
        print "\n";
    }
}

Re: Mojo::URL returns incorrect absolute path
by McA (Priest) on May 17, 2013 at 02:15 UTC

    Hi,

    I looked at your problem and found the following:

    You fetch the URL http://www.mellanox.com/page/press_releases. The HTML document that comes back contains a link with href="page/rss". If you built the resulting URL manually, you would take http://www.mellanox.com/page/ (the request URL without its last path segment) and append the relative URL page/rss to it. This results in http://www.mellanox.com/page/page/rss, which is what you get from Mojolicious.
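    The resolution described above can be reproduced without the user agent or the DOM. This is a minimal sketch that assumes only Mojo::URL and the two URLs from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Mojo::URL;

# The page that was fetched and the relative href found in it.
my $page = Mojo::URL->new('http://www.mellanox.com/page/press_releases');
my $href = 'page/rss';

# to_abs drops the last path segment of the base ("press_releases")
# and appends the relative path to what is left ("/page/").
my $abs = Mojo::URL->new($href)->to_abs($page);

print "$abs\n";
```

    With the request URL as the only base, this should print http://www.mellanox.com/page/page/rss, the same result as in the question.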

    Now the big "BUT":

    The HTML document returned for http://www.mellanox.com/page/press_releases contains the tag <base href="http://www.mellanox.com/" />, stating that every relative URL should be resolved against that base URL. This means that your href="page/rss" is appended to http://www.mellanox.com/, resulting in http://www.mellanox.com/page/rss, which is what you want.

    The question remains: should Mojolicious respect a base tag on its own, or are you responsible for extracting the base tag and feeding it to your code that generates absolute URLs?

    As I took the time to look at your problem, I would like to ask you to put the question to the Mojolicious maintainers whether this behaviour is intentional.

    Best regards
    McA

      Thanks for the explanation of the problem.

      I understand what you are saying. I was just expecting Mojolicious to behave the same way a browser does.

      mrp.

      FYI: I have posted the question to the Mojolicious maintainers.

      Do you know a workaround for this issue?
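      One possible workaround, following McA's explanation, is to look for a base tag yourself and resolve links against it. This is only a sketch, not an official Mojolicious recipe: resolve_link is a hypothetical helper, the HTML is a cut-down stand-in for the real page, and it uses the find/at/attr methods of current Mojo::DOM rather than the 2013 calling style in the question.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::URL;

# Hypothetical helper: resolve $href against the document's <base href>
# if there is one, otherwise against the URL the page was fetched from.
sub resolve_link {
    my ($dom, $page_url, $href) = @_;

    my $base = Mojo::URL->new($page_url);
    if (my $tag = $dom->at('base[href]')) {
        # A base href may itself be relative, so resolve it first.
        $base = Mojo::URL->new($tag->attr('href'))->to_abs($base);
    }

    return Mojo::URL->new($href)->to_abs($base);
}

# Stand-in for the real page (assumption: only the relevant tags are kept).
my $html = <<'HTML';
<html><head><base href="http://www.mellanox.com/" /></head>
<body><a href="page/rss">RSS</a></body></html>
HTML

my $dom      = Mojo::DOM->new($html);
my $page_url = 'http://www.mellanox.com/page/press_releases';

for my $e ($dom->find('a[href]')->each) {
    print resolve_link($dom, $page_url, $e->attr('href')), "\n";
}
```

      In the original script, $tx->res->dom and $tx->req->url would take the place of $dom and $page_url. With the base tag present, the sample should print http://www.mellanox.com/page/rss.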

Re: Mojo::URL returns incorrect absolute path
by Anonymous Monk on May 16, 2013 at 17:18 UTC
    um, how about code without all that useragent/DOM stuff?
      I need to use the dom for the links, don't I?

        I need to use the dom for the links, don't I?

        You say there is a problem with Mojo::URL, that it returns an incorrect absolute path, meaning my $link = Mojo::URL->new($e->{href}); gives you the wrong thing.

        So follow the Basic debugging checklist and How do I post a question effectively? to test your hypothesis :)

        Use Data::Dumper, dump the href, and let's see what it does.

        You think Mojo::URL is the problem? Fantastic, let's check.
