PerlMonks  

read a file and insert closing tags if not present

by valavanp (Curate)
on Mar 29, 2007 at 06:41 UTC (#607168=perlquestion)
valavanp has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need to read a file and insert closing tags for any opened tags that are not closed. How should I approach this? Ideas and thoughts will be much appreciated. Thanks, monks, for your valuable suggestions.

Re: read a file and insert closing tags if not present
by f00li5h (Chaplain) on Mar 29, 2007 at 07:01 UTC

    Exactly what type of file is it? I would presume some sort of markup language.

    What are you hoping to get back from this script, and what is your overall goal?

    What code do you have so far? Can we see that, and perhaps offer pointers from there?

    Which modules have you investigated? I hear there are some really good modules for parsing CSV, HTML and all manner of other things.

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;
Re: read a file and insert closing tags if not present
by shigetsu (Hermit) on Mar 29, 2007 at 07:01 UTC

    May I ask whether you have any code so far?

    Update: Missed f00li5h's post.

      The file is an SGML file. I don't know how to approach finding the closing tags in that file.
Re: read a file and insert closing tags if not present
by GrandFather (Cardinal) on Mar 29, 2007 at 07:17 UTC

    You are most likely looking for modules like HTML::Tidy, HTML::TreeBuilder or XML::Twig.

    If you show us a small sample of the sort of data you have to deal with and the code you have tried, we may be able to give more specific answers.


    DWIM is Perl's answer to Gödel
      Hi GrandFather, this is the code I tried.

      require HTML::TokeParser;
      $p = HTML::TokeParser->new("output.xml") || die "Can't open: $!";
      $p->empty_element_tags(1);
      open(FH, "output.xml");
      print FH $p;
      close FH;

      output.xml
      <greeting class="simple">Hello, world!

      The file above is a sample in which I tried to insert the closing tag for the greeting element. My actual file contains 500 lines of tagged text. For example, it has a tag named <to> that is never closed, and I have to insert its closing tag. Thanks for your suggestion.

        You can guess sometimes, but in general there is no way of knowing where the right place for a closing tag is.

        In the example <p> foo <p> bar, you can see where the </p>'s should go, because p tags can't be nested. But if you have <span style="rly">Oh, rly<span style="ya">ya, rly there is no real way of knowing where the </span>'s should go, because span tags can legally be nested.

        You'll most likely have to write rules for how (and where) to end each tag, so that you don't mess up the nesting (for example, ending up with your whole document inside an <a href="foo">).
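
        The rule-based idea above could be sketched roughly as follows. This is only an illustration in plain Perl, not a real SGML parser: the helper name close_open_tags and the rule table %no_self_nesting (here holding p and to) are made up for the example, tag names are compared case-sensitively, and comments or attribute values containing < or > are not handled. Tags that may nest are simply closed at the next enclosing end tag or at the end of the input, which is exactly the kind of guess described above.

```perl
use strict;
use warnings;

# Hypothetical rule table (an assumption for this sketch): tags that may not
# directly contain themselves, so a repeated opening tag ends the previous one.
my %no_self_nesting = map { $_ => 1 } qw(p to);

# close_open_tags is a made-up helper name, not part of any module.
sub close_open_tags {
    my ($text) = @_;
    my @open;        # stack of tag names that are currently open
    my $out = '';

    # The capturing group makes split return the tags as well as the text.
    for my $piece (split /(<[^<>]+>)/, $text) {
        if ($piece =~ m{\A</([^\s<>]+)\s*>\z}) {
            # Closing tag: first close everything opened after its partner.
            my $name = $1;
            if (grep { $_ eq $name } @open) {
                while (defined(my $top = pop @open)) {
                    last if $top eq $name;
                    $out .= "</$top>";
                }
            }
        }
        elsif ($piece =~ m{\A<([^\s/<>!?]+)[^<>]*>\z} and $piece !~ m{/>\z}) {
            # Opening tag (self-closing <x/> tags are skipped).
            my $name = $1;
            if ($no_self_nesting{$name} and @open and $open[-1] eq $name) {
                $out .= "</$name>";
                pop @open;
            }
            push @open, $name;
        }
        $out .= $piece;
    }

    # Guess: anything still open is closed at the end of the input.
    $out .= "</$_>" for reverse @open;
    return $out;
}

print close_open_tags('<p> foo <p> bar'), "\n";
# prints: <p> foo </p><p> bar</p>
print close_open_tags('<html><to>Bob</html>'), "\n";
# prints: <html><to>Bob</to></html>
```

        For the nested span example the sketch can only close both spans at the very end of the input, which may or may not be what the author of the document meant.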

        @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;

        HTML::TreeBuilder handles that simple case:

        use strict;
        use warnings;
        use HTML::TreeBuilder;

        my $sgml = <<SGML;
        <greeting class="simple">Hello, world!
        SGML

        my $root = HTML::TreeBuilder->new ();
        $root->ignore_unknown (0);
        $root->parse ($sgml);
        print $root->guts (0)->as_XML ();

        Prints:

        <greeting class="simple">Hello, world!</greeting>

        although I'd not guarantee it will accept everything a real SGML document may contain.


        DWIM is Perl's answer to Gödel
Re: read a file and insert closing tags if not present
by gopalr (Priest) on Mar 29, 2007 at 10:36 UTC

    Hi Valavan,

    my $sgml = <<SGML;
    <html>
    <greeting class="simple">Hello, world!<head>heading</head>
    </html>
    SGML

    while ($sgml =~ s#(<)([^/<>\s]+)((?:\s[^/<>]+)?>)([^<>]+)(<[^/<>]+>)#$1$2$3$4$1\/$2>$5#) {}

    print "\n\n";
    print "\nOutput:\n$sgml\n";
    print "\n\n";

    Input:

    <html> <greeting class="simple">Hello, world!<head>heading</head> </html>

    Output:

    <html> <greeting class="simple">Hello, world!</greeting><head>heading</head> </html>
    ~ ~ ~ ~ ~
Re: read a file and insert closing tags if not present
by planetscape (Canon) on Mar 29, 2007 at 14:34 UTC

    I second GrandFather's recommendation to have a closer look at HTML Tidy. As documented here,

    • Missing or mismatched end tags are detected and corrected
    • End tags in the wrong order are corrected

    HTH,

    planetscape
