Re^2: Weird date format behavior with WWW::Mechanize

by whakka (Hermit)
on Apr 01, 2008 at 21:14 UTC


in reply to Re: Weird date format behavior with WWW::Mechanize
in thread Weird date format behavior with WWW::Mechanize

I've read that, actually. Really, I thought this was a straightforward question that could be answered without code, but here you go:

#!perl -w
use strict;
use WWW::Mechanize;          # browser, extends LWP
use HTTP::Cookies::Mozilla;  # cookie reader for bot

my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies::Mozilla->new(
    file     => 'cookies.txt',
    autosave => 1,
)) || die "Couldn't fill cookie jar!\n";

my $url = "http://www.insor.org/insasoweb/offenderDetails.do?sid=354656.011";
$mech->get($url);
print $mech->content;

You will notice the difference between how the page looks in the browser and what prints, I hope.


Re^3: Weird date format behavior with WWW::Mechanize
by Fletch (Chancellor) on Apr 01, 2008 at 21:45 UTC

    See, now that you've given a concrete example to look at you can easily see that the page in question (after you accept their disclaimer thing and get back a session cookie . . .) has the full date text. The page contains a call to pull in a JavaScript file "common.js". Said "common.js" contains a function formatDate which looks to munge dates.

    Given this, it's not out of the realm of possibility that something is calling JavaScript and munging all the dates. This easily explains the difference between what you see in your browser (even if you view source, you're seeing the source after it's been walked over) and what Mechanize is showing. You can easily confirm this by comparing the output from a third party (say curl, using the JSESSIONID cookie value pulled from your browser), which should match what Mechanize says it is.
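
    A quick way to spot this kind of thing without a browser is to list the external scripts a fetched page pulls in. The following is only a minimal sketch: the helper name find_script_sources and the sample HTML are made up for illustration (the real page's markup is not shown in this thread), and a regex is used for brevity rather than a proper HTML parser.

```perl
use strict;
use warnings;

# Return the src attribute of every external <script> tag in $html.
# Hypothetical helper: a regex is enough for a quick check, but it is
# not a full HTML parser.
sub find_script_sources {
    my ($html) = @_;
    my @sources;
    while ($html =~ /<script\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gis) {
        push @sources, $2;
    }
    return @sources;
}

# Assumed stand-in for what $mech->content might return.
my $html = <<'HTML';
<html><head>
<script type="text/javascript" src="common.js"></script>
</head><body><p>page body</p></body></html>
HTML

print "$_\n" for find_script_sources($html);
```

    In the script from the question, the same helper could be fed $mech->content; any file it lists is a candidate for code that rewrites the page after it loads.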

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Thanks, I'm impressed. I had never noticed you can look inside the js source code before. I'm still confused about how Mechanize would be getting the munged date but a normal browser wouldn't, but regardless, the solution lies in the formatting.
        I'm still confused about how Mechanize would be getting the munged date but a normal browser wouldn't

        Because a browser can run JavaScript, and WWW::Mechanize can't (update: though other modules can, e.g. JavaScript::SpiderMonkey - as suggested by Fletch). Anyway, it's the (JavaScript in the) browser that is munging the date, not WWW::Mech.
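
If the goal is to make the Mechanize output match what the browser displays, one option is to redo the formatting on the Perl side instead of running the JavaScript. This is a sketch under loud assumptions: the thread never shows the raw date text or what formatDate produces, so both format strings below, and the helper name reformat_date, are placeholders to be adjusted. Time::Piece (core since Perl 5.10) does the parsing and printing.

```perl
use strict;
use warnings;
use Time::Piece;

# Both formats are assumptions for illustration; change them to match
# the raw page text and the form the browser shows.
my $RAW_FORMAT     = '%Y-%m-%d %H:%M:%S';
my $DISPLAY_FORMAT = '%m/%d/%Y';

# Hypothetical helper: parse a raw date string and print it the way
# the page's JavaScript is assumed to.
sub reformat_date {
    my ($raw) = @_;
    my $t = Time::Piece->strptime($raw, $RAW_FORMAT);
    return $t->strftime($DISPLAY_FORMAT);
}

print reformat_date('2008-04-01 21:14:00'), "\n";
```

Each date pulled out of $mech->content could be passed through the helper before printing.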
