PerlMonks  

How would I find a piece of html from a sourcefile.html?

( #14941=categorized question )
Contributed by Corry on May 26, 2000 at 14:01 UTC


Description:

Hi, I want to grep a piece of code from somewhere in the middle of an HTML source file. The snippet I want sits between comment tags. Can anyone help me get started? (I ordered the book "regular expressions" from O'Reilly. Until it arrives, I hope you guys can help.) Thanks in advance, Corry.

Answer: How would I find a piece of html from a sourcefile.html?
contributed by dsb

This regular expression will grab the first tag in $data, brackets and all.

print "$1\n" if $data =~ m/(<[^>]+>)/;   # print the tag, including < and >
This regex leaves out the brackets and prints only the string inside them:
print "$1\n" if $data =~ m/<([^>]+)>/;   # print only the text between < and >
Use modifiers or loops as you need to. -kel
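
As a minimal sketch of the "modifiers or loops" suggestion: with the /g modifier in list context, a match returns every capture rather than only the first. The sample string in $data is made up for illustration and is not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original question.
my $data = '<p>Hello <b>world</b></p>';

# In list context, /g returns the captured group from every match.
my @tags = $data =~ m/<([^>]+)>/g;

# Print the inside of each tag, one per line.
print "$_\n" for @tags;
```

Looping with while ($data =~ m/<([^>]+)>/g) { ... } would work just as well if you want to handle each tag as it is found.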
Answer: How would I find a piece of html from a sourcefile.html?
contributed by athomason

Comments make HTML extraction even more difficult than it usually is. However, if you're dealing with fairly standard HTML you could use

$commented = $1 if $page =~ /<!--\s*(.*?)\s*-->/s;   # \s* trims whitespace, .*? stops at the first -->, /s lets . match newlines
This will grab the string inside the comment; add appropriate qualifiers to the regexp as necessary (or run a second regexp on $commented) if you only want to pick certain parts out.
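
Since the question asks for a snippet that sits between comment tags, one possible approach is to match from one marker comment to the next. This is only a sketch: the marker names START and END and the sample page are assumptions, and a regex like this will not cope with nested or malformed comments.

```perl
use strict;
use warnings;

# Hypothetical page; the START/END marker comments are assumed names.
my $page = <<'HTML';
<html><body>
<!-- START -->
<p>wanted snippet</p>
<!-- END -->
</body></html>
HTML

# /s lets . match newlines; .*? stops at the first END marker.
# The \s* on either side of the capture trims surrounding whitespace.
my ($snippet) = $page =~ /<!--\s*START\s*-->\s*(.*?)\s*<!--\s*END\s*-->/s;

print defined $snippet ? "$snippet\n" : "no match\n";
```

For messier input, a real parser such as the CPAN module HTML::Parser is generally more robust than a regex.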
