Re^4: Extract the source code after loaded fully

by Anonymous Monk
on Jun 21, 2015 at 15:48 UTC ([id://1131346])


in reply to Re^3: Extract the source code after loaded fully
in thread Extract the source code after loaded fully

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re^5: Extract the source code after loaded fully
by afoken (Chancellor) on Jun 21, 2015 at 18:28 UTC

    Please define "fully loaded":

    (A) The webserver may just be slow, delivering only a few characters per second in the worst case. So to load the entire document, you just have to be patient. On the network level, you have exactly one HTTP request that just takes a long time to get a full HTTP response.

    (B) There is some JavaScript, Java, Flash or VBScript embedded in or referenced from the document. Once the main document is loaded, that code starts at least one more HTTP request to fetch more content that will be embedded into the document, perhaps by modifying the DOM tree. On the network level, you have at least two HTTP requests and two HTTP responses.

    LWP will handle (A) without problems, though you might need to set a higher timeout value. LWP can't handle (B) on its own. If you want to use LWP, you need to work with the first document and extract the URL, and perhaps the POST data, for the following HTTP requests. You may need a JavaScript environment (there are some on CPAN), but for simple cases, a little bit of text processing may be enough.
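
    A rough sketch of both cases with LWP, not a drop-in solution. It assumes the page's JavaScript names its follow-up resource in a literal like url: "/api/items.json"; real pages vary, so the regex is only an illustration. The sub names (make_agent, extract_followup_urls, fetch_all) and the 120 second timeout are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Case (A): a slow server only needs a more generous timeout (in seconds).
sub make_agent {
    return LWP::UserAgent->new(timeout => 120);
}

# Case (B): find the URLs that the page's JavaScript would request.
# Assumption: the script contains a literal such as  url: "/api/items.json"
sub extract_followup_urls {
    my ($html, $base) = @_;
    my @urls;
    while ($html =~ /\burl\s*:\s*(["'])(.*?)\1/g) {
        # Resolve relative URLs against the URL of the main document.
        push @urls, URI->new_abs($2, $base)->as_string;
    }
    return @urls;
}

# Fetch the main document, then every follow-up resource found in it.
sub fetch_all {
    my ($start) = @_;
    my $ua  = make_agent();
    my $res = $ua->get($start);
    die "GET $start failed: ", $res->status_line, "\n"
        unless $res->is_success;

    my %content = ($start => $res->decoded_content // '');
    for my $url (extract_followup_urls($content{$start}, $start)) {
        my $r = $ua->get($url);
        $content{$url} = $r->decoded_content // '' if $r->is_success;
    }
    return \%content;
}

# Only touch the network when run as a script with a URL argument.
if (!caller() && @ARGV) {
    my $pages = fetch_all($ARGV[0]);
    print "$_: ", length($pages->{$_}), " characters\n"
        for sort keys %$pages;
}
```

    If the follow-up request is a POST, or the URL is assembled at runtime, text processing like this is probably not enough, and you are back to a JavaScript environment or a browser.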

    A browser automation tool can do (A) and (B), but if the JavaScript / Java / Flash / VBScript keeps a connection open to wait for future content, you may need a timeout. You also need a controllable browser, which usually means that you need a GUI. LWP works fine in text mode.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^5: Extract the source code after loaded fully
by ww (Archbishop) on Jun 21, 2015 at 18:52 UTC

    Strongly suggest following LanX's link to an SEO report. There are several items which appear to be relevant.

    Also, I downvoted your note above, as it suggests that you failed to follow up on the reply freely given (TWICE!). The downvote is also because your repeated question is offensive in its suggestion that (LanX != /my puppet manager/; ) did not understand your question.

    Not all answers given here are useful but rarely should one be ignored.
