PerlMonks  

Help with HTML::StripScripts::Parser

by danny0085 (Sexton)
on Aug 07, 2012 at 19:28 UTC (#986063=perlquestion)
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to delete the H2 tag but keep the class attributes from:
<ol id="intelliTxt" class="steps">
  <li class="section">
    <h2 class="header Heading3">Clear Up Those Blackheads</h2>
    <ul>
      <li class="step ">
        <span class="stepNumber">1</span>
.. more code
I am using HTML::StripScripts::Parser:
my $hss = HTML::StripScripts::Parser->new({
    Context => 'Flow',
    BanList => [qw( h1 h2 )],
    Rules   => {
        div => { class => 1 },
        ul  => { class => 1 },
        li  => { class => 1 },
        ol  => { class => 1 },
    },
});
print $hss->filter_html($HTML);
But the output is:
<ol>
  <li>
    <!--filtered-->Clear Up Those Blackheads<!--filtered-->
    <ul>
      <li>
        <span>1</span>
The class attributes are gone.

Re: Help with HTML::StripScripts::Parser
by tobyink (Abbot) on Aug 07, 2012 at 19:49 UTC

    Personally, I'd do it this way...

    use HTML::HTML5::Parser;
    use XML::LibXML 1.94;
    use XML::LibXML::QuerySelector;

    my $document = HTML::HTML5::Parser->load_html(IO => \*DATA);

    $document
        ->querySelectorAll('h1, h2')
        ->foreach(sub { $_->setNodeName('div') });

    print $document->toString;

    __DATA__
    <ol id="intelliTxt" class="steps">
      <li class="section">
        <h2 class="header Heading3">Clear Up Those Blackheads</h2>
        <ul>
          <li class="step ">
            <span class="stepNumber">1</span>
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Help with HTML::StripScripts::Parser
by ww (Bishop) on Aug 07, 2012 at 20:15 UTC
    Perhaps you should familiarize yourself with HTML and attributes, generally.

    You can't have bare attributes. You can put them in a <span class="...">Clear Up...</span> tag pair in some cases, including this one, but stand-alone attributes will merely be rendered as ordinary content in the user's browser.

      "You can't have bare attributes."

      I don't think the OP wanted bare attributes to replace the <h2> tag. I think the OP was concerned about the class attributes disappearing from the other elements, such as <ol>. That was my reading of the question, anyway.

        I found a solution. It may not be the best, but it works:
        $HTML =~ s/<h2[^)]*h2>//g;
        No more H2 tags, and the class attributes are intact.
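
A caveat on the substitution above: [^)]* is greedy and matches any character except ")", newlines included. On input with more than one heading it can therefore delete everything from the first <h2 to the last h2>, including the list items in between. Below is a minimal sketch of two narrower patterns. It assumes the headings are not nested and that no attribute value contains ">". The sample markup (with closing tags and a second heading added) and the names $without_element and $without_tags are illustrative and not from the thread. For arbitrary HTML, a parser-based approach like the one in the earlier reply is likely to be more robust.

```perl
use strict;
use warnings;

# Sample input modelled on the markup in the question. The closing tags
# and the second heading are assumptions added for illustration.
my $html = <<'HTML';
<ol id="intelliTxt" class="steps">
<li class="section"><h2 class="header Heading3">Clear Up Those Blackheads</h2>
<ul><li class="step "><span class="stepNumber">1</span></li></ul></li>
<li class="section"><h2 class="header Heading3">Second Heading</h2></li>
</ol>
HTML

# Variant 1: remove each <h2> element together with its text.
# .*? is non-greedy, so each match stops at the nearest </h2>;
# /s lets . cross newlines and /i ignores the case of the tag name.
(my $without_element = $html) =~ s{<h2\b[^>]*>.*?</h2\s*>}{}gis;

# Variant 2: remove only the opening and closing h2 tags and keep
# the heading text in place.
(my $without_tags = $html) =~ s{</?h2\b[^>]*>}{}gi;

print $without_element;
print $without_tags;
```

Neither pattern touches the other elements, so their class attributes are left as they were.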
