Help with HTML::StripScripts::Parser

by danny0085 (Sexton)
on Aug 07, 2012 at 19:28 UTC (#986063=perlquestion)
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to delete the H2 tag but keep the class attributes from:
<ol id="intelliTxt" class="steps">
  <li class="section">
    <h2 class="header Heading3">Clear Up Those Blackheads</h2>
    <ul>
      <li class="step ">
        <span class="stepNumber">1</span>
.. more code
I am using HTML::StripScripts::Parser:
my $hss = HTML::StripScripts::Parser->new({
    Context => 'Flow',
    BanList => [qw( h1 h2 )],
    Rules   => {
        div => { class => 1 },
        ul  => { class => 1 },
        li  => { class => 1 },
        ol  => { class => 1 },
    },
});
print $hss->filter_html($HTML);
But the output is:
<ol>
  <li>
    <!--filtered-->Clear Up Those Blackheads<!--filtered-->
    <ul>
      <li>
        <span>1</span>
The class attributes are gone.

Replies are listed 'Best First'.
Re: Help with HTML::StripScripts::Parser
by tobyink (Abbot) on Aug 07, 2012 at 19:49 UTC

    Personally, I'd do it this way...

    use HTML::HTML5::Parser;
    use XML::LibXML 1.94;
    use XML::LibXML::QuerySelector;

    my $document = HTML::HTML5::Parser->load_html(IO => \*DATA);
    $document
        ->querySelectorAll('h1, h2')
        ->foreach(sub { $_->setNodeName('div') });
    print $document->toString;

    __DATA__
    <ol id="intelliTxt" class="steps">
      <li class="section">
        <h2 class="header Heading3">Clear Up Those Blackheads</h2>
        <ul>
          <li class="step ">
            <span class="stepNumber">1</span>
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Help with HTML::StripScripts::Parser
by ww (Archbishop) on Aug 07, 2012 at 20:15 UTC
    Perhaps you should familiarize yourself with HTML and attributes, generally.

    You can't have bare attributes. You can put them in a <span class="...">Clear Up...</span> tag pair in some cases, including this one, but stand-alone attributes will merely be rendered as ordinary content in the user's browser.

      "You can't have bare attributes."

      I don't think the OP wanted to have bare attributes replacing the <h2> tag. I think the OP was concerned about the class attributes disappearing from the other elements, such as <ol>. That was my reading of the question anyway.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        I found a solution. It may not be the best, but it works:
        $HTML =~ s/<h2[^)]*h2>//g;
        No more H2 tags, and the class attributes are OK.
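
A note on the substitution above: `[^)]*` is greedy and stops only at a `)`, so on a page with more than one heading it would likely remove everything from the first `<h2` to the last `h2>`. The sketch below is only an illustration, not the OP's or any module's method: the sample markup and the sub names `remove_h2_greedy`, `remove_h2_elements` and `unwrap_h2` are made up for the demonstration. It compares the posted pattern with a non-greedy variant and with one that strips only the tags. Regexes remain fragile on real-world HTML, so a parser-based approach like the one earlier in the thread is usually safer.

```perl
use strict;
use warnings;

# Hypothetical sample input with two headings, to expose greedy matching.
my $html = join '',
    '<ol class="steps">',
    '<li class="section"><h2 class="header">First</h2><p class="a">one</p></li>',
    '<li class="section"><h2 class="header">Second</h2><p class="b">two</p></li>',
    '</ol>';

# The pattern from the post: [^)]* is greedy and only stops at ")",
# so it runs from the first "<h2" to the last "h2>".
sub remove_h2_greedy {
    my ($text) = @_;
    $text =~ s/<h2[^)]*h2>//g;
    return $text;
}

# Non-greedy variant: removes each <h2>...</h2> element separately,
# including the heading text.
sub remove_h2_elements {
    my ($text) = @_;
    $text =~ s{<h2\b[^>]*>.*?</h2\s*>}{}gis;
    return $text;
}

# Strips only the <h2> start and end tags and keeps the heading text.
sub unwrap_h2 {
    my ($text) = @_;
    $text =~ s{</?h2\b[^>]*>}{}gi;
    return $text;
}

print remove_h2_greedy($html),   "\n";
print remove_h2_elements($html), "\n";
print unwrap_h2($html),          "\n";
```

With the sample input, the first call also drops the `<p class="a">` paragraph that sits between the two headings, while the other two leave every remaining class attribute in place.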
