how-to strip empty HTML tags like <b> </b>

by russmann (Initiate)
on Sep 06, 2001 at 04:03 UTC

russmann has asked for the wisdom of the Perl Monks concerning the following question: (regular expressions)

how-to strip empty HTML tags like <b> </b>

Originally posted as a Categorized Question.

Re: how-to strip empty HTML tags like <b> </b>
by tachyon (Chancellor) on Sep 06, 2001 at 05:40 UTC
Re: how-to strip empty HTML tags like <b> </b>
by tachyon (Chancellor) on Sep 06, 2001 at 04:56 UTC

    The normal answer would be to use HTML::TokeParser if you want a reliable way to parse HTML. For this *particular* task a well-constructed regex should suffice. This will strip all the empty <b> </b> tags, including those that contain only whitespace or &nbsp;. I believe it covers all bases.

    $text = join '', <DATA>;
    print $text;
    $text =~ s#<\s*b\s*>(?:[\s\n]|&nbsp;)*<\s*/\s*b\s*>##ig;
    print $text;
    __DATA__
    <p>test
    <p>test<b></b>
    <p>test<b >&nbsp; </b>
    <p>test<b> </b>
    <p>test<b ></b >
    <p>test<B></B>
    <p>test<B> </B>
    <p>test<B ></B>
    <p>test<B > </B >
    <p>test<B> &nbsp; &nbsp; </B>
    <p><b>I am not empty!</b>
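For comparison, here is a rough sketch of what the HTML::TokeParser route mentioned above might look like. It is not tachyon's code: the helper name `strip_empty_bold` is made up for illustration, and it assumes HTML::TokeParser (part of the HTML-Parser distribution on CPAN) is installed. It walks the token stream, holds back a `<b>` start tag and any blank text after it, and only throws them away if the very next thing is the matching `</b>`.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: drop <b>...</b> pairs that contain nothing but
# whitespace and/or &nbsp;, and pass everything else through untouched.
sub strip_empty_bold {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse input";
    my $out     = '';
    my @pending = ();    # source text held back since an unclosed <b>

    while (my $tok = $p->get_token) {
        my $type = $tok->[0];
        # The original source text is element 1 for text tokens and the
        # last element for every other token type.
        my $raw = $type eq 'T' ? $tok->[1] : $tok->[-1];

        if ($type eq 'S' && $tok->[1] eq 'b' && !%{ $tok->[2] }) {
            # Attribute-free <b>: flush anything held and start holding.
            $out .= join '', @pending;
            @pending = ($raw);
        }
        elsif (@pending && $type eq 'T' && $raw =~ /\A(?:\s|&nbsp;)*\z/) {
            push @pending, $raw;    # still looks empty so far
        }
        elsif (@pending && $type eq 'E' && $tok->[1] eq 'b') {
            @pending = ();          # empty pair confirmed: discard it
        }
        else {
            # Real content (or another tag) turned up: keep everything.
            $out .= join '', @pending, $raw;
            @pending = ();
        }
    }
    return $out . join '', @pending;
}

print strip_empty_bold("<p>test<b> &nbsp; </b>\n<p><b>I am not empty!</b>\n");
```

Note that a real parser is stricter than the regex: something like `< b >` with a space after the opening bracket is not a tag to HTML::Parser, so this sketch leaves it alone.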

      I wouldn't remove the non-breaking spaces. Removing the bold tags is probably fine, but removing the &nbsp; can mess up the layout of a table.

      Just my euro 0.02

      -- Hofmator
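One way to act on that suggestion, sketched here as an assumption rather than as anyone's posted code: capture whatever sits between the empty bold tags and put it back, so only the tags disappear. The sub name `unwrap_empty_bold` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove an empty <b>...</b> pair but leave its
# whitespace / &nbsp; content in place, so table cells keep their filler.
sub unwrap_empty_bold {
    my ($text) = @_;
    # $1 holds the run of whitespace and &nbsp; between the two tags.
    $text =~ s#<\s*b\s*>((?:\s|&nbsp;)*)<\s*/\s*b\s*>#$1#ig;
    return $text;
}

print unwrap_empty_bold("<td><b>&nbsp;</b></td>\n");   # prints <td>&nbsp;</td>
```

The only change from the regex above is the capturing group and the `$1` in the replacement; bold tags with real content are still left alone.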

Re: how-to strip empty HTML tags like <b> </b>
by Hofmator (Curate) on Sep 06, 2001 at 17:20 UTC

    I'd also go with tachyon's suggestion of HTML Tidy, but if you are trying to do this quick and dirty somewhere in the middle of a script, I'd use this regex:

    $text =~ s#<\s*([^>]*)\s*>[\s\n]*<\s*/\s*\1\s*>##ig;

    It should remove any empty tags which don't contain any attributes (not just bold tags), so it works on

    __DATA__
    <i> </ I>
    < B ></b>
    < em> < / eM >
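A single pass of that substitution will not catch an empty tag wrapped in another empty tag, such as `<b><i></i></b>`, because the outer pair only becomes empty once the inner one is gone. A small sketch of one way around that, using the regex from the post unchanged inside a hypothetical `strip_empty_tags` sub and simply repeating it until nothing more matches:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from the post above.
sub strip_empty_tags {
    my ($text) = @_;
    # s///g returns the number of replacements, so this loops until a
    # pass removes nothing. Removing an inner empty pair can expose an
    # outer one on the next pass.
    1 while $text =~ s#<\s*([^>]*)\s*>[\s\n]*<\s*/\s*\1\s*>##ig;
    return $text;
}

print strip_empty_tags("<p>keep</p><b><i> </i></b>\n");   # prints <p>keep</p>
```

As in the original, tags that carry attributes are not matched, since the backreference would have to find the attributes repeated in the closing tag.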

Re: how-to strip empty HTML tags like <b> </b>
by Anonymous Monk on Apr 17, 2003 at 19:10 UTC
    You might find this interesting: we wrote a JavaScript function that trims leading/trailing spaces from field values, but found that it did not trim the non-breaking space character (&nbsp;) when it appeared.
    Before fixing it, our function looked like this:
    -------------------------------------
    String.prototype.Trim = function() {

    return this.replace(/(^\s*)|(\s*$)/g, "");

    }
    -------------------------------------
    After discovering that values that included &nbsp; were not getting trimmed, I found the following to be true:

    &nbsp; = chr(160) = \xA0

    So I have now modified the Trim function to read:

    -------------------------------------
    String.prototype.Trim = function() {
    return this.replace(/(^[\s\xA0]*)|([\s\xA0]*$)/g, "");
    }
    -------------------------------------

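    And it works!! FYI, to clean field values, our JavaScript code calls it this way, assigning the result back because Trim() returns a new string rather than changing the field:
    document.forms[0].myField.value = document.forms[0].myField.value.Trim();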
    Hope this helps someone,
    Susan Henesy

    Originally posted as a Categorized Answer.
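For anyone wanting the same trim on the Perl side, a minimal sketch follows. The sub name `trim_nbsp` is made up for illustration. It lists `\xA0` explicitly in the character class because whether Perl's `\s` matches character 160 depends on how the string is stored and which regex flags are in effect, so spelling it out avoids the question.

```perl
use strict;
use warnings;

# Hypothetical Perl counterpart of the JavaScript Trim above: strips
# leading and trailing whitespace, including the non-breaking space
# character (chr(160), i.e. \xA0).
sub trim_nbsp {
    my ($value) = @_;
    $value =~ s/\A[\s\xA0]+//;    # leading run
    $value =~ s/[\s\xA0]+\z//;    # trailing run
    return $value;
}

print "[", trim_nbsp("\xA0 hello \xA0"), "]\n";   # prints [hello]
```

This handles the literal character only. The `&nbsp;` entity as it appears in HTML source is a different thing and is what the regexes earlier in the thread deal with.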
