http://www.perlmonks.org?node_id=128360

osfameron has asked for the wisdom of the Perl Monks concerning the following question:

A colleague of mine is trying to save some websites (with frames) viewed in MS Internet Explorer and edit them in the editor of Champions MS Word. (Please go easy on the 'choice' of tool - this is the corporate toolset!)

The "Edit with Microsoft Word for Windows 97" option will happily open all of the document... except for any information that happens to be contained in a frame. ;->

I thought that a simple way to resolve this might be to:

  1. Get the master document (using LWP::Simple?)
  2. Analyse it with HTML::Parser
  3. Get the text of all the documents referred to by the frames.
  4. Spit out the HTML, but with <table> tags instead of <frame> tags.
  5. Inside the table tags, we'd add the subdocuments (with the surrounding <html><head><body> tags removed).
Is this a reasonable way of attempting this? I'm not too proud to accept pre-rolled solutions if anyone's got one lying around - not to mention workarounds or rtfms!
btw: I'm perfectly aware that the above solution might break navigational elements/links etc., but for the purpose at hand all we want is a visual document that can be edited in Word, not a functional web-page.
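A rough sketch of steps 2 to 5, assuming HTML::Parser (api_version 3) is installed. The helper names frame_sources, body_content and merge_frames are hypothetical. Fetching (step 1) is left to a caller-supplied code ref so the sketch can be tried without a network, and every frame is simply laid side by side in one table row rather than reproducing the frameset geometry.

```perl
use strict;
use warnings;
use HTML::Parser;

# Step 2: collect the src of every <frame> in the master document.
# (frame_sources is a hypothetical helper name.)
sub frame_sources {
    my ($master_html) = @_;
    my @srcs;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                push @srcs, $attr->{src}
                    if $tag eq 'frame' && defined $attr->{src};
            },
            'tagname, attr',
        ],
    );
    $p->parse($master_html);
    $p->eof;
    return @srcs;
}

# Step 5: keep only what lies between <body> and </body>, dropping the
# <html>, <head> and <body> wrappers. Assumes the child page has an
# explicit <body> tag; anything outside it is discarded.
sub body_content {
    my ($html) = @_;
    my ($out, $in_body) = ('', 0);
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'body') { $in_body = 1; return }
            $out .= $text if $in_body;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'body') { $in_body = 0; return }
            $out .= $text if $in_body;
        }, 'tagname, text' ],
        # Text, comments and anything else inside the body are copied as-is.
        default_h => [ sub { $out .= $_[0] if $in_body && defined $_[0] }, 'text' ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

# Steps 3 and 4: fetch each frame via $fetch->($src) and put its body
# in its own cell of a one-row table. A failed fetch gives an empty cell.
sub merge_frames {
    my ($master_html, $fetch) = @_;
    my $cells = join "\n",
        map { '<td valign="top">' . body_content($fetch->($_) // '') . '</td>' }
        frame_sources($master_html);
    return "<html><body><table border=\"1\"><tr>\n$cells\n</tr></table></body></html>\n";
}
```

With LWP::Simple, the caller might pass something like sub { get(URI->new_abs($_[0], $url)) }, where URI->new_abs resolves a relative src against the master page's URL and get returns undef on failure.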

Cheerio!
Osfameron

Re: Convert HTML docs with frames into a single HTML doc?
by Masem (Monsignor) on Nov 29, 2001 at 19:07 UTC
    That's probably a good order; you may also wish to remove all "target" attributes in A tags, which I believe you can also do with HTML::Parser. This should deal with the navigation issues. Also, you'll have to play with the COLSPAN and ROWSPAN attributes of the table cells if you have anything more complex than 2 frames.
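    A minimal sketch of the target-stripping idea, assuming HTML::Parser and HTML::Entities (which ships in the same distribution). Everything is copied through unchanged except <a> tags that carry a target attribute, which are rebuilt without it; strip_targets is a hypothetical name.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Copy $html unchanged, except that <a> tags lose their target attribute.
sub strip_targets {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Text, end tags, comments etc. pass straight through.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
        start_h   => [ sub {
            my ($tag, $attr, $attrseq, $text) = @_;
            if ($tag ne 'a' || !exists $attr->{target}) { $out .= $text; return }
            # Rebuild the tag in original attribute order, minus target.
            # Attribute values arrive entity-decoded, so re-encode them.
            $out .= '<a';
            for my $name (grep { $_ ne 'target' } @$attrseq) {
                $out .= sprintf ' %s="%s"', $name, encode_entities($attr->{$name}, '<>&"');
            }
            $out .= '>';
        }, 'tagname, attr, attrseq, text' ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}
```

    As written this drops every target, including _blank; other attributes are re-quoted with double quotes, which browsers and Word treat the same.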

    Alternatively, if you have your own server with SSI enabled, you can include the text of other HTML files (head/body tags included) directly, so you'd only have to write out a table body with the various include tags, and that will get the job done too, though the final document will not necessarily be HTML4 friendly.
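    For the SSI route, the generated page might look like the output of this sketch. It assumes Apache-style mod_include directives (<!--#include virtual="..." -->), a server configured to parse the page for includes (for example by giving it an .shtml extension), and plain relative filenames; ssi_page is a hypothetical name.

```perl
use strict;
use warnings;

# Build a one-row table whose cells are SSI include directives. The web
# server, not Perl, splices in each file's contents when the page is served.
sub ssi_page {
    my @files = @_;
    my $cells = join "\n",
        map { qq{<td valign="top"><!--#include virtual="$_" --></td>} } @files;
    return "<html><body><table><tr>\n$cells\n</tr></table></body></html>\n";
}

print ssi_page('menu.html', 'main.html');
```

    Because the included files keep their own <html>, <head> and <body> tags, the served page is exactly the not-quite-HTML4 result described above.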

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      Actually the TARGET options in anchors should only be removed if they refer to a known frame. What if the "designer" uses stuff like _BLANK?
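      One way to follow that rule, sketched under the assumption that HTML::Parser is available: collect the name of every <frame> in the master document and drop a link's target only when it matches one of them. HTML 4 reserves target names beginning with an underscore, so _blank, _self, _parent and _top should never match a real frame name and are left alone. Both helper names are hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Collect the name attribute of every <frame> in the master document.
sub frame_names {
    my ($master_html) = @_;
    my %names;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                $names{ $attr->{name} } = 1
                    if $tag eq 'frame' && defined $attr->{name};
            },
            'tagname, attr',
        ],
    );
    $p->parse($master_html);
    $p->eof;
    return \%names;
}

# True when a link's target points at one of the merged frames and can
# therefore be dropped; _blank, _top etc. are kept.
sub should_drop_target {
    my ($target, $names) = @_;
    return defined $target && exists $names->{$target};
}
```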

      Greetz
      Beatnik
      ... Quidquid perl dictum sit, altum viditur.
        Personally? I hurt them, unless they can give me a very convincing reason not to.[1] target=_blank is one of the most annoyingly used tags ever invented, except maybe (maybe) <blink>.

        Fortunately for most designers, Opera can save me from having to hurt them. I ♥ browser tabs. I also ♥ "Never allow popups."

        [1] But yes, there are some valid and good uses for target=_blank, and I've even done it myself once or twice. Too often it is simply not needed.
      Thanks! Regarding:
      Also, you'll have to play with COLSPAN and ROWSPAN attributes of the table cells if you have anything more complex than 2 frames.
      Actually, I was thinking of naively nesting my tables in the same way as the framesets I'm basing them on. I'm aware that this is more resource-intensive to process; are there any other problems with this approach?
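      A sketch of that naive nesting, assuming HTML::Parser: each <frameset cols=...> becomes a one-row table, each <frameset rows=...> a one-column table, and a nested frameset becomes a table inside a cell. framesets_to_tables is a hypothetical name, and $fill is a caller-supplied code ref that returns the HTML for a frame's src (for example the fetched child page with its wrapper tags removed). A frameset that specifies both rows and cols (a grid) is treated as cols only, the size values themselves are ignored, and cell padding and spacing are set to 0 so deep nesting does not keep adding to the margins.

```perl
use strict;
use warnings;
use HTML::Parser;

# Turn nested <frameset>s into nested <table>s.
sub framesets_to_tables {
    my ($master_html, $fill) = @_;
    my $table = '<table cellpadding="0" cellspacing="0">';
    my $out   = '';
    my @stack;    # one entry per open frameset: 'cols' or 'rows'

    # A cell in a cols frameset is a <td>; in a rows frameset it is a whole row.
    my $open_cell = sub {
        return unless @stack;
        $out .= $stack[-1] eq 'cols' ? '<td valign="top">' : '<tr><td valign="top">';
    };
    my $close_cell = sub {
        return unless @stack;
        $out .= $stack[-1] eq 'cols' ? '</td>' : '</td></tr>';
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'frameset') {
                $open_cell->();    # a nested frameset sits inside its parent's cell
                my $dir = exists $attr->{cols} ? 'cols' : 'rows';
                push @stack, $dir;
                $out .= $dir eq 'cols' ? "$table<tr>" : $table;
            }
            elsif ($tag eq 'frame') {
                $open_cell->();
                $out .= $fill->($attr->{src}) if defined $attr->{src};
                $close_cell->();
            }
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'frameset' && @stack;
            my $dir = pop @stack;
            $out .= $dir eq 'cols' ? '</tr></table>' : '</table>';
            $close_cell->();       # close the parent's cell, if any
        }, 'tagname' ],
    );
    $p->parse($master_html);
    $p->eof;
    return $out;
}
```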

      Cheerio!
      Osfameron

        No, there shouldn't be any problems with nesting the tables, either. The only problem I can see is that there will be a more noticeable difference in the margins as you nest further and further, unless you remove the cell padding. With some cell padding, the margin difference will probably be minor with only two levels of nesting, but with 4 or more it could be a problem if layout is somewhat important.

Re: Convert HTML docs with frames into a single HTML doc?
by AidanLee (Chaplain) on Nov 29, 2001 at 19:26 UTC

    What might be easier than generating a table to build the final document is CSS-shaped-and-aligned <div> tags.

    So, when analyzing the master document, you'll be able to acquire or derive the position and dimensions of all of the frames (you'd need this for building your tables anyway), and then you could wrap each child page (after removing the html, head, and body tags) in this:

    <div style="position:absolute; top: ?; left: ?; width:?; height:?;"> child page content goes here </div>

    The big win here is you don't have to worry about grabbing the child documents in a set order. The absolute positioning will take care of visual ordering for you.
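    As an illustration of deriving those positions, here is a sketch for the simplest case: a single <frameset cols=...> whose sizes are percentages plus '*' entries. Each '*' gets an equal share of whatever percentage is left; pixel sizes and weighted entries such as 2* are not handled (they are treated like '*'). cols_to_divs is a hypothetical name, and the child bodies are assumed to have already been fetched and unwrapped.

```perl
use strict;
use warnings;

# Turn a cols spec such as "20%,*" plus one body per column into
# absolutely positioned <div>s laid out left to right.
sub cols_to_divs {
    my ($cols, @bodies) = @_;
    my @spec = split /\s*,\s*/, $cols;
    # Percentages as numbers; anything else ('*', '2*', pixels) as undef.
    my @pct   = map { $_ =~ /^(\d+)%$/ ? $1 : undef } @spec;
    my $fixed = 0;
    $fixed += $_ for grep { defined } @pct;
    my $flex  = grep { !defined } @pct;    # count of '*'-like entries
    my $share = $flex ? int((100 - $fixed) / $flex) : 0;

    my ($left, $html) = (0, '');
    for my $i (0 .. $#spec) {
        my $width = defined $pct[$i] ? $pct[$i] : $share;
        $html .= sprintf qq{<div style="position:absolute; top:0; left:%d%%; width:%d%%;">%s</div>\n},
            $left, $width, $bodies[$i] // '';
        $left += $width;
    }
    return $html;
}

print cols_to_divs('20%,*', 'menu goes here', 'main page goes here');
```

    A rows frameset would work the same way with top and height in place of left and width.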

      <HTMLPurist Mode>
      Never ever never ever ever never never use absolute positioning for anything in HTML. You have no guarantees on screen size, so you might push your content off the sides of the screen. You should always use relative positioning or sizing.
      </HTMLPurist Mode>

      True, the poster's end goal was not a page to be published, just a single page that could be viewed offline, so absolute positioning is not bad here. But as advice for those who are considering converting an entire frames-based site to something clean in CSS, this very much applies.

      That said, I have myself tried creating a CSS-based site that emulated tables using relative sizing (as opposed to absolute). The various bugs between IE 5 and NS 4.x at the time (this was about 2 years ago) made me realize that this will not happen easily unless you stick with a very simple layout. I suspect that absolute positioning would work like a dream, but again, that's the wrong approach for web publishing.

        <HTMLPracticalMode>
        True enough; however, assuming (a dangerous thing) that the target audience is using PC-only devices (as opposed to wireless etc.), it is usually safe to assume that there is room for two columns of frames in the window (the menubar/main real-estate phenomenon). Additionally, it is possible to use relative values with absolute positioning and widths:

        <div style="position:absolute; top:0px;left:0px;width:100%;height: +100px;" > header real estate </div>
        <div style="position:absolute; top:100px;left:0px;width:20%;" > left menu </div>
        <div style="position:absolute; top:100px;left:20%;width:80%;" > main real estate </div>
        </HTMLPracticalMode>

        I'm not going to disagree with you that, in general, you should stay away from anything that assumes a particular screen size, but the same advice would also apply to:

        • frames
        • tables with width/height specifications

        And since he's already got frames here, whether he picks tables or absolute positioning to keep his possibly-too-wide page format, absolute positioning seems easier.

Re: Convert HTML docs with frames into a single HTML doc?
by Caillte (Friar) on Nov 29, 2001 at 19:50 UTC

    You may want to try replacing the frameset with layers. This will give the same look and feel as the frameset and will also allow for scrolling areas.

    $japh->{'Caillte'} = $me;