PerlMonks  

A couple of things about turnstep's answer. On a minor issue, it will only catch uppercase SRC, so it might miss some tags. A more important point, though, is that IMG tags aren't the only ones with SRC attributes: FRAME and SCRIPT come to mind. For a nasty one, I'd try something like this:
$html =~ s/(<\s*img\s+.*src\s*=\s*)(")?.*?(?(2)")([\s>])/$1"newimage.jpg"$3/sig;
To go through this in parts: the first group of parentheses catches the beginning of the tag, allowing for optional whitespace, followed by a bunch of junk (the src attribute doesn't necessarily have to come right after the img, e.g. <img border=0 src="img.gif">). This matches up to the src= part. Next, a quote is matched if there is one, and if there is a quote, the match is taken up to the closing quote. The match ends with either whitespace or a tag close. The $1 match is everything up to the name of the image, which is preserved. Then your new image is subbed in, and the original image name is discarded.

The i flag is needed to catch src and SRC (and sRc, etc.), the s flag in case the image tag is broken up onto multiple lines, and the g flag to replace every match rather than just the first. This is a pretty difficult regular expression (which went through moderate testing), but if you're up to reading through the perlre man page, you should be able to understand it all. Let me know if there are any questions about it.
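For anyone who wants to step through the pieces, here is a minimal sketch of the same substitution laid out with the x flag so each part can carry a comment. Two things in it are assumptions rather than part of the regex above: the helper name replace_img_src is made up for illustration, and the "bunch of junk" part is written as [^>]*? instead of .* because a greedy .* under the s flag can run past the end of one tag and on to a later src= in the document.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative only): replace the src value
# of every IMG tag in the given HTML string with a new image name.
sub replace_img_src {
    my ($html, $new) = @_;
    $html =~ s{
        ( < \s* img \s+ [^>]*? src \s* = \s* )  # group 1: tag start through src=
        (")?                                    # group 2: opening quote, if any
        .*?                                     # old image name, thrown away
        (?(2)")                                 # closing quote only if group 2 matched
        ( [\s>] )                               # group 3: whitespace or tag close
    }{$1"$new"$3}sigx;
    return $html;
}

# Usage: one quoted and one unquoted src, the second split across lines.
my $page = qq{<p><img border=0 src="a.gif"> and <IMG\n SRC=b.gif></p>};
print replace_img_src($page, 'newimage.jpg'), "\n";
```

This still shares the limits of any regex approach to HTML (for instance, an unquoted src value is assumed to end at whitespace or >), so treat it as a sketch rather than a general-purpose parser.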

In reply to Re: Extract and modify IMG SRC tags in an HTML document. by plaid
in thread Extract and modify IMG SRC tags in an HTML document. by jmpvm
