PerlMonks  

Mojo::URL returns incorrect absolute path

by mr_p (Scribe)
on May 16, 2013 at 16:52 UTC (#1033868=perlquestion)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone,

I am running into an issue where Mojo::URL's to_abs returns an incorrect absolute path. Can someone help me with this?

The result is: http://www.mellanox.com/page/page/rss
It should be: http://www.mellanox.com/page/rss
#!/usr/bin/perl
use 5.010;
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings qw(all);

use Data::Dumper;
use Mojo::UserAgent;

# FIFO queue
my $linkUrl = "http://www.mellanox.com/page/press_releases";

my $ua = Mojo::UserAgent->new(max_redirects => 2)->detect_proxy;
my $tx = $ua->get($linkUrl);

for my $e ($tx->res->dom('a[href]')->each) {
    my $link = Mojo::URL->new($e->{href});
    next if 'Mojo::URL' ne ref $link;

    $link = $link->to_abs($tx->req->url)->fragment(undef);
    next unless grep { $link->protocol eq $_ } qw(http https);

    if ($link->to_string =~ /rss/ ) {
        print $link->to_abs;
        print "\n";
    }
}

Re: Mojo::URL returns incorrect absolute path
by Anonymous Monk on May 16, 2013 at 17:18 UTC
    um, how about code without all that useragent/DOM stuff?
      I need to use the dom for the links, don't I?

        I need to use the dom for the links, don't I?

        You say there is a problem with Mojo::URL, that it returns an incorrect absolute path, meaning my $link = Mojo::URL->new($e->{href}); gives you the wrong thing.

        So employ the Basic debugging checklist and How do I post a question effectively? to test your hypothesis :)

        Use Data::Dumper, dump the href, and let's see what it does.

        You think Mojo::URL is the problem? Fantastic, let's check.
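
A stripped-down check along those lines, with no user agent or DOM involved, might look like the sketch below. The href value `page/rss` is an assumption about what the page contains; substitute whatever Data::Dumper shows for the real link.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::URL;

# Assumed inputs: the page URL from the question and a relative href
# of the kind the page might contain.
my $base = Mojo::URL->new('http://www.mellanox.com/page/press_releases');
my $href = 'page/rss';

# Resolve the relative href against the page URL, as the original script does.
my $abs = Mojo::URL->new($href)->to_abs($base);

say "href: $href";
say "base: $base";
say "abs:  $abs";
```

If the href really is relative like this, the doubled `page/page` in the output comes from the inputs rather than from the user agent or the DOM parser.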

Re: Mojo::URL returns incorrect absolute path
by McA (Curate) on May 17, 2013 at 02:15 UTC

    Hi,

    I looked at your problem and found the following:

    You fetch the URL http://www.mellanox.com/page/press_releases. The resulting HTML document contains a link with href="page/rss". If you built the resulting URL manually, you would take http://www.mellanox.com/page/ (the page URL up to its last slash) and append the relative URL page/rss to it. This would result in http://www.mellanox.com/page/page/rss, which is what you get from Mojolicious.

    Now the big "BUT":

    In the HTML document returned for http://www.mellanox.com/page/press_releases there is a tag <base href="http://www.mellanox.com/" /> stating that every relative URL should be resolved against that base URL. This means that your href="page/rss" is appended to http://www.mellanox.com/, resulting in http://www.mellanox.com/page/rss, which is what you want.
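
The manual merge described above can be sketched in plain Perl. The helper name `merge_relative` is made up for illustration, and the logic is deliberately simplified: it assumes the base URL has a path and the reference is a plain relative path, so it is not a full relative-reference resolver.

```perl
use strict;
use warnings;
use feature 'say';

# Simplified merge: keep the base up to and including its last slash,
# then append the relative reference. Assumes the base URL has a path
# and the reference is a plain relative path (no "../", query, or scheme).
sub merge_relative {
    my ($base, $rel) = @_;
    (my $dir = $base) =~ s{[^/]*\z}{};   # drop everything after the last slash
    return $dir . $rel;
}

my $rel = 'page/rss';

# Resolved against the page URL: prints http://www.mellanox.com/page/page/rss
say merge_relative('http://www.mellanox.com/page/press_releases', $rel);

# Resolved against the <base href>: prints http://www.mellanox.com/page/rss
say merge_relative('http://www.mellanox.com/', $rel);
```

The same relative reference gives two different results; only the choice of base differs.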

    The question remains: should Mojolicious respect a base tag on its own, or are you responsible for extracting the base tag and feeding it to your absolute-URL-generating code?

    Since I took the time to look at your problem, I would like to ask you to check with the Mojolicious maintainers whether this behaviour is intentional.

    Best regards
    McA

      Thanks for the explanation of the problem.

      I understand what you are saying. I was just expecting Mojolicious to behave the same way a browser does.

      mrp.

      FYI: I have posted the question to the Mojolicious maintainers.

      Do you know of a workaround for this issue?
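
One possible workaround, offered as a sketch and not as the maintainers' recommendation, is to look for a base tag in the parsed document and resolve links against it, falling back to the page URL when there is none. The helper name `effective_base` and the inline HTML are made up for illustration; in the original script the DOM would come from `$tx->res->dom` and the page URL from `$tx->req->url`.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;
use Mojo::URL;

# Pick the URL that relative links should be resolved against:
# the document's <base href> if present, otherwise the page URL.
sub effective_base {
    my ($dom, $page_url) = @_;
    my $base_tag = $dom->at('base[href]');
    return $page_url unless $base_tag;
    # The base href may itself be relative, so resolve it against the page URL.
    return Mojo::URL->new($base_tag->{href})->to_abs($page_url);
}

# Stand-in for the fetched page (assumed markup, reduced to the relevant tags).
my $page_url = Mojo::URL->new('http://www.mellanox.com/page/press_releases');
my $dom      = Mojo::DOM->new(<<'HTML');
<html><head><base href="http://www.mellanox.com/" /></head>
<body><a href="page/rss">RSS</a></body></html>
HTML

my $base = effective_base($dom, $page_url);
for my $e ($dom->find('a[href]')->each) {
    say Mojo::URL->new($e->{href})->to_abs($base);
}
```

In the original loop this would mean computing the base once after the request and passing it to to_abs in place of `$tx->req->url`.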
