PerlMonks  

Re: How to parse HTML5?

by choroba (Cardinal)
on Mar 08, 2016 at 12:57 UTC


in reply to How to parse HTML5?

Crossposted to Stack Overflow. It's considered polite to mention crossposting, so that people who don't follow both sites don't waste their effort hacking together a solution to a problem already solved at the other end of the internet.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Re^2: How to parse HTML5?
by Anonymous Monk on Mar 09, 2016 at 09:35 UTC
    Still this problem is not solved in both end. Okay, and I am not wasting my time or yours. I am just trying to solve my problem. If you can help, thanks; otherwise it's okay.

      Out of curiosity, why do you say that the problem has not been solved?

      Prior to your claim that "this problem is not solved in both end", I posted a suggestion to both forums (here at PerlMonks and at Stack Overflow) to check out HTML::Valid.

      According to the documentation of HTML::Tidy, you need to have tidyp installed first. tidyp appears to be a fork of tidy, and that site indicates that it is the "HTML Tidy Legacy Website". The HTML::Valid module is based on the HTML Tidy project, and it does support HTML5.

      And I'll take it a bit further. Here's a demonstration of HTML::Valid on the OP's posted HTML/XHTML data.

      I created a test.html file with the following content (from the OP):

      And here's the Perl code that uses HTML::Valid to check that file:

      And here's the output of that script:

      That shows that HTML::Valid has no trouble dealing with <section> tags and that it also provides line numbers and column numbers, which the OP stated here were needed. Unfortunately, it looks like HTML::Valid does not have the ignore method that the OP's HTML::Tidy code used, so the OP may need to write a little more code to filter out the messages concerning tags that the OP wants to ignore.
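
      That extra filtering could be done on the report text itself. The following is only a minimal sketch, not the script used above. It assumes the messages arrive as one string with one message per line in HTML Tidy's usual "line N column M - Warning: ..." layout. The filter_messages helper and the sample report are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Valid or HTML::Tidy):
# return the report lines that do NOT mention any of the given tags.
# Assumption: one message per line, with tags written as <name>.
sub filter_messages {
    my ($report, @ignored_tags) = @_;
    my @kept;
    for my $line (split /\n/, $report) {
        next unless $line =~ /\S/;    # skip blank lines
        # Drop the line if it mentions any ignored tag, e.g. "<foo>".
        next if grep { $line =~ /<\Q$_\E>/ } @ignored_tags;
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample report, for illustration only.
my $report = <<'EOF';
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 4 column 5 - Warning: <foo> is not recognized!
line 9 column 1 - Warning: trimming empty <p>
EOF

my @kept = filter_messages($report, 'foo');
print "$_\n" for @kept;
```

      With the sample above, the message mentioning <foo> is dropped and the other two lines are printed. In a real script the report string would be whatever error text the validator produces; if its format differs from the one assumed here, the pattern would need adjusting.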

      Unless I totally misunderstood what the "problem" was, it looks like HTML::Valid "solves" the "problem".

        Hi

        Is HTML::Valid available for ActiveState Perl? It is not showing up in PPM.

        Thanks

        Nikhil Ranjan
