PerlMonks  

Get data from javascript

by Anonymous Monk
on Jul 07, 2016 at 15:23 UTC ( [id://1167383]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse an old site we have. Mojo::DOM makes it easy to get most things, but I can't figure out how to parse some Javascript code. There's a function which has a URL:

replace_image("imageurlhere",imageIDhere);

If I have the whole page in a string, how can I find this image URL via a regex?

Replies are listed 'Best First'.
Re: Get data from javascript
by Corion (Patriarch) on Jul 07, 2016 at 15:49 UTC

    For any specific Javascript, you can implement the same function in Perl and then get the result that way.

    If you want a generic solution for getting the result of arbitrary Javascript code, you will have to run the Javascript. Example environments for Javascript in a web context could be WWW::Mechanize::Firefox and WWW::Mechanize::PhantomJS.

Re: Get data from javascript
by RonW (Parson) on Jul 08, 2016 at 01:56 UTC

    For your specific example (not tested):

    /replace_image\(([^,]+)/
    or
    /replace_image\(["']([^"']+)/

    should fetch the URL into $1 for you.
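
    As a rough, self-contained sketch of how the two patterns differ (the sample string and URL are made up for illustration, not taken from the real page): the first captures everything up to the first comma, so the quotes come along; the second steps over the opening quote and stops at the closing one.

```perl
use strict;
use warnings;

# Hypothetical page fragment; the real markup may differ.
my $page = 'var x = 1; replace_image("http://example.com/img/1.jpg",42); done();';

my ($raw, $url);

# Pattern 1: everything between "(" and the first comma, quotes included.
$raw = $1 if $page =~ /replace_image\(([^,]+)/;

# Pattern 2: skip the opening quote, capture up to the closing quote.
$url = $1 if $page =~ /replace_image\(["']([^"']+)/;

print "pattern 1: $raw\n";
print "pattern 2: $url\n";
```

    Note that the first pattern would cut a URL short if the URL itself contained a comma.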

      The URL in the JS is fixed per page, so there is one per page. I parse lots of pages and want to extract the URL. I tried the code you gave:
      my $wanted = $page =~ /push_delayed_image\(([^,]+)/; print Dumper $wanted;
      But this prints $VAR1 = 1; not the URL.
        You need list context to get the capture groups back.

        my ($wanted) = $page =~ /push_delayed_image\(([^,]+)/;
        #  ^       ^
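
        A small sketch of the difference, assuming a made-up one-line page: in scalar context the match only reports success, while in list context it returns the captures.

```perl
use strict;
use warnings;

# Hypothetical input; only the function name comes from the post above.
my $page = 'push_delayed_image("http://example.com/a.png",7);';

# Scalar context: the match yields a true/false value, 1 on success.
my $flag = $page =~ /push_delayed_image\(([^,]+)/;

# List context (note the parentheses): the match yields the capture groups.
my ($wanted) = $page =~ /push_delayed_image\(([^,]+)/;

print "scalar context: $flag\n";
print "list context:   $wanted\n";
```

        With this pattern the captured value still includes the surrounding double quotes.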


        As an alternative to what choroba said, you could:

        $page =~ /push_delayed_image\(([^,]+)/; my $wanted = $1;

        Often I'm using a conditional anyway, so I do this:

        if ($page =~ /push_delayed_image\(([^,]+)/) { my $wanted = $1; ... }
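
        For the many-pages case, a hedged sketch along the same lines: %pages and its contents are hypothetical stand-ins for however the pages actually get loaded, and the pattern assumes the URL is always quoted.

```perl
use strict;
use warnings;

# Hypothetical page contents keyed by a made-up file name.
my %pages = (
    'a.html' => 'push_delayed_image("http://example.com/a.png",1);',
    'b.html' => '<p>no script here</p>',
);

my %url_for;
for my $name (sort keys %pages) {
    # Only trust $1 when the match succeeded.
    if ($pages{$name} =~ /push_delayed_image\(\s*["']([^"']+)["']/) {
        $url_for{$name} = $1;
    }
    else {
        warn "no push_delayed_image call found in $name\n";
    }
}

print "$_ => $url_for{$_}\n" for sort keys %url_for;
```

        Testing the match before reading $1 matters here, because after a failed match $1 still holds the value from the last successful one.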
Re: Get data from javascript
by Anonymous Monk on Jul 07, 2016 at 15:37 UTC

    That depends on what the part of the page source containing the URLs looks like. Can you show us that?
