PerlMonks  

Re: Download web page including css files, images, etc.

by Anonymous Monk
on Jan 25, 2007 at 14:55 UTC (#596502)


in reply to Download web page including css files, images, etc.

I don't think wget will work in all situations.

1) It doesn't seem to handle the BASE element correctly (which I believe has been part of the HTML specification for a very long time).
2) "-k" (--convert-links) won't translate links in CSS files to local links; consider #someid { background: url(folder/picture.jpg) center center; }
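For point 2, the missing step is a pass over each downloaded stylesheet that rewrites its url(...) references. Below is a minimal sketch in Python, assuming the stylesheet's own URL is known and that `to_local` is a hypothetical function you supply to map a remote URL to a local path. It only handles url(...) with optional quotes; it does not handle @import "file.css" without url(), escaped parentheses, or actually downloading the files.

```python
import re
from urllib.parse import urljoin

# url(...) with an optional matching pair of single or double quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_urls(css_text, css_url, to_local):
    """Rewrite every url(...) in a stylesheet to point at a local copy.

    css_url:  absolute URL the stylesheet was fetched from. Relative
              references in CSS resolve against the stylesheet's URL,
              not the URL of the page that links to it.
    to_local: hypothetical callable mapping an absolute URL to a local path.

    Returns (rewritten_css, absolute_urls_that_need_downloading).
    """
    found = []

    def replace(match):
        target = match.group(2).strip()
        if target.lower().startswith("data:"):
            return match.group(0)  # inline data URI: nothing to fetch
        absolute = urljoin(css_url, target)
        found.append(absolute)
        return 'url("%s")' % to_local(absolute)

    return CSS_URL_RE.sub(replace, css_text), found


if __name__ == "__main__":
    def local_name(url):
        # Hypothetical naming scheme: flatten everything into files/<basename>.
        return "files/" + url.rsplit("/", 1)[-1]

    css = "#someid { background: url(folder/picture.jpg) center center; }"
    new_css, urls = rewrite_css_urls(css, "http://example.com/css/style.css", local_name)
    print(new_css)  # #someid { background: url("files/picture.jpg") center center; }
    print(urls)     # ['http://example.com/css/folder/picture.jpg']
```

The returned URL list is what a crawler would still have to fetch; the naming scheme is the part most likely to need changing, since flattening to basenames can collide.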

Johannes

Re^2: Download web page including css files, images, etc.
by skx (Parson) on Jan 25, 2007 at 14:59 UTC

    True, but I think it is the most "standard" tool for the job - short of doing the parsing and rewriting myself.

    Steve
      True, I just thought I'd point out to the original poster that wget won't do the job every time. If he needs something that always works, he'd have to use wget and do some of the work manually when a BASE element is involved or CSS is used for images (maybe there are other problems I haven't thought of?), or write it from scratch...

      The trick would be going through the HTML and CSS specs and finding every way objects can be referenced, included, or linked to. I'm sure there are plenty!
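One way to organize that search is a table of (tag, attribute) pairs that pull in resources, plus honoring the first BASE element when resolving them. The sketch below uses Python's standard html.parser; the RESOURCE_ATTRS table is a partial, hand-picked assumption rather than a complete list taken from the specs, and `ResourceCollector` is a hypothetical name. The same table-driven idea should carry over to a start-tag handler in Perl's HTML::Parser.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Partial, hand-picked table of tag -> attributes that can pull in an embedded
# resource. Not exhaustive: object, video/source, srcset, etc. are missing.
RESOURCE_ATTRS = {
    "img": ("src",),
    "script": ("src",),
    "link": ("href",),       # stylesheets and icons, but also rel="alternate"
    "iframe": ("src",),
    "frame": ("src",),
    "embed": ("src",),
    "input": ("src",),       # <input type="image">
    "body": ("background",),
    "table": ("background",),
    "td": ("background",),
}

# url(...) inside inline style="..." attributes, optionally quoted.
STYLE_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of resources referenced by one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.base_href = None  # first <base href="..."> seen, if any
        self.raw = []          # unresolved references, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases names; valueless attributes map to None.
        attrs = dict(attrs)
        if tag == "base":
            if self.base_href is None and attrs.get("href"):
                self.base_href = attrs["href"]
            return
        for name in RESOURCE_ATTRS.get(tag, ()):
            if attrs.get(name):
                self.raw.append(attrs[name].strip())
        style = attrs.get("style")
        if style:
            for m in STYLE_URL_RE.finditer(style):
                self.raw.append(m.group(2).strip())

    def urls(self):
        # Resolve after parsing so the first <base href> applies to every
        # reference; the base itself may be relative to the page URL.
        base = self.page_url
        if self.base_href is not None:
            base = urljoin(self.page_url, self.base_href)
        return [urljoin(base, ref) for ref in self.raw]


if __name__ == "__main__":
    collector = ResourceCollector("http://example.com/dir/page.html")
    collector.feed('<base href="/static/"><img src="logo.png">')
    collector.close()
    print(collector.urls())  # ['http://example.com/static/logo.png']
```

Resolving against the page URL instead of the BASE href would give http://example.com/dir/logo.png here, which is the kind of mismatch described in point 1. Contents of <style> elements are not scanned in this sketch.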

      Johannes
        Oops, you are the original poster :-P

        With so many edge cases, I've pretty much abandoned the idea of a wget-only solution.

        I've got a minimal tool working now using HTML::Parser, but I haven't dug into the CSS files yet. I will have to work on that later.

        Steve
