
How would I find a piece of html from a sourcefile.html?

by Corry (Initiate)
on May 26, 2000 at 14:01 UTC


Hi, I want to grep a piece of code from somewhere in the middle of an HTML source file. The wanted snippet is situated between comment tags. Can anyone help me get started? (I ordered the book "regular expressions" from O'Reilly. Till it arrives, I hope you guys can help.) Thanks in advance, Corry.

Answer: How would I find a piece of html from a sourcefile.html?
contributed by dsb

This regular expression will grab the entire tag, brackets and all:

print $1, "\n" if $data =~ m/(<[^>]+>)/; # print the first tag, but only if the match succeeded
This regex will leave out the brackets and print only the string inside them:
print $1, "\n" if $data =~ m/<([^>]+)>/; # print the text inside the first tag
Use modifiers or loops as you need to. -kel
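
As a rough sketch of the "modifiers or loops" suggestion: in list context the /g modifier returns the captured text from every match, so all tags can be collected in one go. The sample string in $data is made up for illustration; in practice it would hold the contents of the source file.

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the contents of the HTML file.
my $data = '<html><body><p class="x">Hello</p></body></html>';

# In list context, /g returns the capture group from every match.
my @tags = $data =~ m/<([^>]+)>/g;

# Print each tag's contents on its own line.
print "$_\n" for @tags;
```

Like the one-liners above, this assumes no tag attribute contains a literal > character.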
Answer: How would I find a piece of html from a sourcefile.html?
contributed by athomason

Comments make HTML extraction even more difficult than it usually is. However, if you're dealing with fairly standard HTML you could use

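($commented) = $page =~ /<!--\s*(.*?)\s*-->/s;
This will grab the string inside the first comment, or leave $commented undefined if there is none. The /s modifier lets . match newlines, so comments spanning several lines work, and the non-greedy .*? stops at the first closing -->. Add appropriate qualifiers to the regexp as necessary (or use another regexp on $commented) if you only want to pick certain stuff out.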
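
If the wanted snippet sits between two marker comments rather than inside one, the same idea can be stretched to match from one comment to the next. This is only a sketch: the marker texts "begin snippet" and "end snippet" and the sample page are assumptions, since the original question does not say what the comments contain.

```perl
use strict;
use warnings;

# Hypothetical page; the marker names are invented for this example.
my $page = <<'HTML';
<html><body>
<!-- begin snippet -->
<p>The wanted piece</p>
<!-- end snippet -->
</body></html>
HTML

# /s lets . match newlines; the non-greedy .*? stops at the first end marker.
my ($snippet) = $page =~ /<!--\s*begin snippet\s*-->\s*(.*?)\s*<!--\s*end snippet\s*-->/s;

print defined $snippet ? "$snippet\n" : "no match\n";
```

Assigning the match to a list, as in my ($snippet) = ..., leaves $snippet undefined when the markers are not found, which avoids printing a stale $1 from an earlier match.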
