
How would I find a piece of html from a sourcefile.html?

by Corry (Initiate)
on May 26, 2000 at 14:01 UTC (node #14941, categorized question)


Hi, I want to extract a piece of code from somewhere in the middle of an HTML source file. The snippet I want sits between comment tags. Can anyone help me get started? (I ordered the book "regular expressions" from O'Reilly. Until it arrives, I hope you guys can help.) Thanks in advance, Corry.
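Whichever regex you end up using, the file first has to be read into a single string, because a comment (or the snippet between two comments) can span several lines and a line-by-line loop would never see it whole. A minimal sketch, assuming the file is small enough to hold in memory; slurp_file is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string so that a
# regex can match across line breaks.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undefine the record separator: <$fh> returns the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

Usage: my $data = slurp_file('sourcefile.html'); then run the regexes below against $data.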

Answer: How would I find a piece of html from a sourcefile.html?
contributed by dsb

This regular expression will grab the first tag in $data, brackets and all:

if ($data =~ m/(<[^>]+>)/) { print $1, "\n"; }    # print the tag
This regex leaves out the brackets and prints only the text inside them:
if ($data =~ m/<([^>]+)>/) { print $1, "\n"; }    # print what is inside the brackets
Both match any tag, not comments specifically. Use modifiers or loops as you need to. -kel
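The regexes above stop at the first tag. A minimal sketch of the loop approach kel mentions, assuming you want the contents of every tag in a list; all_tags is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the contents of every tag, not just the first.
# With /g in a while loop, each iteration resumes where the previous match ended.
sub all_tags {
    my ($html) = @_;
    my @tags;
    while ($html =~ m/<([^>]+)>/g) {
        push @tags, $1;     # tag contents without the angle brackets
    }
    return @tags;
}
```

Like the one-liners it builds on, this treats the first > as the end of a tag, so a > inside an attribute value or inside a comment cuts the match short.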
Answer: How would I find a piece of html from a sourcefile.html?
contributed by athomason

Comments make HTML extraction even harder than it usually is. However, if you're dealing with fairly standard HTML, you could use:

if ($page =~ /<!--\s*(.*?)\s*-->/s) { $commented = $1; }
This captures the text inside the first comment, without its surrounding whitespace; /s lets the match span line breaks, and the non-greedy .*? stops at the first -->. Add modifiers to the regex as necessary (for example /g in a loop to visit every comment), or run another regex on $commented, if you only want to pick certain things out.
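The question asks for the snippet between comment tags rather than the comment text itself. A minimal sketch for that case, assuming the file marks the snippet with a start comment and an end comment; the marker texts BEGIN SNIPPET and END SNIPPET and the helper name between_comments are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the HTML between two marker comments, e.g.
#   <!-- BEGIN SNIPPET --> ...wanted HTML... <!-- END SNIPPET -->
sub between_comments {
    my ($html, $start, $end) = @_;
    # \Q...\E escapes regex metacharacters in the marker text,
    # /s lets . match newlines, and .*? stops at the first end marker.
    if ($html =~ m/<!--\s*\Q$start\E\s*-->(.*?)<!--\s*\Q$end\E\s*-->/s) {
        return $1;
    }
    return;                 # undef in scalar context when a marker is missing
}
```

For example, between_comments($data, 'BEGIN SNIPPET', 'END SNIPPET') returns everything between the two markers, including any surrounding newlines. The non-greedy .*? matters: with a greedy .*, a file containing several marked snippets would return everything from the first start marker to the last end marker.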
