Perl RE; how to capture, and replace based on a block?

taint has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks.

I'm working on editing some documents (web pages), where I need to replace a block of content with a larger one. I've experimented, but can't get it quite right yet. For example, I am attempting to match the following:

</div>
</body>

For some reason (my lack of experience with Perl RE), this doesn't work:

\<\/div\>\n\<\/body\>

I can match </div> or </body>, but not both. Sorry for the bother. I'm so good with sed that I feel I should be further along with Perl than I am, but I still haven't quite got the hang of it. :/

Thank you for your time, and consideration.

--Chris

UPDATE -- now with my broken example

Yes. What say about me, is true.


Replies are listed 'Best First'.
Re: Perl RE; how to capture, and replace based on a block? by GrandFather (Saint) on Dec 18, 2013 at 00:23 UTC
Don't do that. As a general thing, correctly parsing and editing XML is tricky. Use a module such as XML::Twig to do the hard work for you. If you'd provided sample code, input data, and expected output, most likely someone would have shown you how to use an appropriate module to do the job.

True laziness is hard work
Re^2: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 00:33 UTC
Ahh. I see. I didn't have any code (other than the RE I was using), because at this point, if I can't even match the block, there would be no point in attempting to replace it. So I hadn't bothered to attempt to replace anything yet. I'm still trying to figure out how to correctly match what I need. It just seemed the logical progression in learning to do it. :) Thanks, GrandFather, for the reply (and suggestion).

--Chris

Yes. What say about me, is true.
Re^3: Perl RE; how to capture, and replace based on a block? by AnomalousMonk (Archbishop) on Dec 18, 2013 at 01:39 UTC
I agree that using an XML parser is likely to be a better idea, but here is an example of what your fellow monks were hoping for: what you tried and what resulted (except that this works):

    >perl -wMstrict -le
    "my $s = qq{xxx </div>\n</body> xxx};
     print qq{[[$s]]};
     ;;
     my $tags = qr{ </div> \n </body> }xms;
     $s =~ s{ $tags }{gone}xms;
     print qq{[[$s]]};
    "
    [[xxx </div>
    </body> xxx]]
    [[xxx gone xxx]]

Question: Are you sure it's only a single newline that's present? The presence of whitespace characters other than just a newline can confuse the issue. The following might be a better regex:

    qr{ </div> \s* </body> }xms
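To make the whitespace-tolerant pattern concrete, here is a minimal sketch of a substitution built on it. The helper name `replace_closing_block` and the sample strings are assumptions for illustration only; they do not come from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace a closing
# </div> ... </body> pair with $replacement, tolerating any run of
# whitespace (spaces, tabs, \n, \r\n) between the two tags.
sub replace_closing_block {
    my ($html, $replacement) = @_;
    my $tags = qr{ </div> \s* </body> }xms;
    # Work on a copy so the caller's string is left untouched.
    (my $copy = $html) =~ s{$tags}{$replacement}xms;
    return $copy;
}

# Made-up sample with a Windows line ending and indentation between the tags.
my $sample = "xxx </div>\r\n  </body> xxx";
print '[[', replace_closing_block($sample, 'gone'), "]]\n";
```

Because of `\s*`, the same call also handles the tags sitting directly next to each other or separated by a single newline.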
Re: Perl RE; how to capture, and replace based on a block? by educated_foo (Vicar) on Dec 18, 2013 at 04:16 UTC
To minimize unhelpful replies here, you should probably do something like this:

1. Add "use strict;" to the top of your code.
2. Add "my ();" right after it.
3. Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...'

Someone should probably write an Acme:: module to do this automatically.
Re^2: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 05:51 UTC
Sorry. I just ran

    cat ./FILE.html | perl {...}

in an open xterm. After several failures, and with no more ideas, I closed the xterm and asked for help. I didn't think it'd be of any use in the request. I've since read every single reference in the Perl documentation, and while I think I've got the RE part down, I'm quite sure I don't know how to feed Perl the file properly so that it does any more than eat a single line at a time. So let me have another go at it. The following

    #!/usr/bin/perl -w
    #retest.pl
    # my feeble attempt to a multi-line RE in Perl
    $regexp = shift;
    while (<>) {
        print if /$regexp/;
    }

won't work as

    # ./retest.pl \</\div\>\n\<\/body\> ./FILE.html

because `shift` will only manage input one line at a time. Attempts to figure out how to make use of psed and s2p have failed miserably. Apologies for the previous noise, and thank you for the thoughtful responses.

--Chris

Yes. What say about me, is true.
Re^3: Perl RE; how to capture, and replace based on a block? by hdb (Monsignor) on Dec 18, 2013 at 07:29 UTC
Hi Chris,

specifying a regex on the command line seems a difficult thing to do. At the very least, you should print your `$regexp` to see what it contains. In any case, this code seems to work:

    my $str = "
    </div>
    </body>
    ";
    print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;

which suggests that if you slurp your whole file into a single string (e.g. by unsetting $/), your regex should do its job.

    local $/;
    my $str = <>;
    print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;
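The difference between the two reading modes can be sketched as follows. In retest.pl it is the `while (<>)` loop, not `shift`, that hands the regex one line at a time, so no single record ever contains both tags. The `slurp` helper and the in-memory stand-in for FILE.html are assumptions made so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a whole filehandle
# into one string.
sub slurp {
    my ($fh) = @_;
    local $/;               # undefined record separator: read to end of file
    my $content = <$fh>;
    return $content;
}

# Made-up stand-in for FILE.html, opened as an in-memory file for the demo.
my $page = "<body>\n<div>\ntext\n</div>\n</body>\n</html>\n";

# Line by line: each record ends at "\n", so "</div>\n</body>" never
# appears inside a single record.
open my $fh, '<', \$page or die "cannot open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ m{</div>\n</body>};
}
close $fh;

# Slurped: the whole document is one string, so the pattern can span lines.
open $fh, '<', \$page or die "cannot open in-memory file: $!";
my $text = slurp($fh);
close $fh;
my $slurp_hits = () = $text =~ m{</div>\n</body>}g;

print "line-by-line matches: $line_hits\n";
print "slurped matches:      $slurp_hits\n";
```

On the command line, perl's `-0777` switch selects the same slurp mode, so a one-liner of the form `perl -0777 -pe '...' FILE.html` sees the file as a single string.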
Re^4: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 07:51 UTC
Re^3: Perl RE; how to capture, and replace based on a block? by QM (Parson) on Dec 19, 2013 at 09:41 UTC
For fiddling with little bits of code, just use the debugger straight away:

    swedish_chef> perl -demo

    Loading DB routines from perl5db.pl version 1.32
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(-e:1):   mo
      DB<1> $string = "one two three four"

      DB<2> x $string =~ m/(\w+)/g
    0  'one'
    1  'two'
    2  'three'
    3  'four'

Note that "my" variables don't work as expected; I think they get created in the debugger's scope, and not in the interpreted scope. But otherwise, have fun in the sandbox.

-QM
--
Quantum Mechanics: The dreams stuff is made of
Re^2: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 07:11 UTC
Thank you, educated_foo.

"Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...' Someone should probably write an Acme:: module to do this automatically."

I'll be glad to, just as soon as I figure this all out. :)

My biggest hangup, I think, is that I'm quite comfortable with sed. But sed is "greedy" by default, and while Perl RE can be, it's not by default, and that's what I need here (not greedy).

    s/\<\/div\>/,/\<\/body\>/

will match my pattern in sed. But it will match from the first `</div>` till the first `</body>`, which is too much. Thanks again for the response, educated_foo.

--Chris

Yes. What say about me, is true.
Re^3: Perl RE; how to capture, and replace based on a block? by Laurent_R (Canon) on Dec 18, 2013 at 07:46 UTC
By default, Perl RE quantifiers are greedy. Have you considered the possibility that the end of line might be more than \n (if the file is coming from Windows, for example, where lines end in \r\n)?
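Both points can be checked with a few lines. This is only a sketch on a made-up string with Windows line endings; the variable names are illustrative, and `\R` (a generic line-break escape) assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Made-up sample: two <div> blocks, Windows (\r\n) line endings.
my $text = "<div>a</div>\r\n<div>b</div>\r\n</body>";

# Greedy (the default): .* takes as much as it can, so the capture runs
# from the first <div> to the LAST </div>.
my ($greedy) = $text =~ m{(<div>.*</div>)}s;

# Non-greedy: .*? stops at the first point where the rest can match.
my ($lazy) = $text =~ m{(<div>.*?</div>)}s;

# A bare \n does not match across the \r\n pair; \R accepts either form.
my $matches_lf_only = $text =~ m{</div>\n</body>} ? 1 : 0;
my $matches_any_eol = $text =~ m{</div>\R</body>} ? 1 : 0;

print "greedy capture length: ", length($greedy), "\n";
print "lazy capture length:   ", length($lazy), "\n";
print "plain \\n matches: $matches_lf_only, \\R matches: $matches_any_eol\n";
```

If the input really does come from Windows, `\s*` between the tags (as suggested elsewhere in this thread) covers the `\r\n` case as well.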
Re^4: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 08:02 UTC
Re^5: Perl RE; how to capture, and replace based on a block? by Laurent_R (Canon) on Dec 18, 2013 at 18:36 UTC
Re^3: Perl RE; how to capture, and replace based on a block? by Anonymous Monk on Dec 18, 2013 at 07:29 UTC
... sed ...

here is my test program

    use re 'debug';
    $_ = q{</div>
    </body>};
    print 'does it match ', int m{\<\/div\>\n\<\/body\>};
Re: Perl RE; how to capture, and replace based on a block? by Anonymous Monk on Dec 18, 2013 at 00:12 UTC
How about you post actual perl code, you know, stuff ready to run?
Re^2: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 00:26 UTC
Um. I did that;

Code to replace

    </div>
    </body>

RE I'm using, that doesn't work

    \<\/div\>\n\<\/body\>

As stated in my OP; my RE (shown) matches one, or the other, not both. I had hoped to match both (`</div></body>`). Is it clearer?

--Chris

Yes. What say about me, is true.
Re^3: Perl RE; how to capture, and replace based on a block? by Anonymous Monk on Dec 18, 2013 at 00:30 UTC
"Um. I did that;"

Sorry, but you didn't. You posted some data and a pattern that you say you want to match the data, but which doesn't match. Great; now show the code that uses the pattern with this data and fails to match.

It works "perfectly", as expected:

    Compiling REx "\<\/div\>\n\<\/body\>"
    Final program:
       1: EXACT <</div>\n</body>> (6)
       6: END (0)
    anchored "</div>%n</body>" at 0 (checking anchored isall) minlen 14
    Guessing start of match in sv for REx "\<\/div\>\n\<\/body\>" against "</div>%n</body>"
    Found anchored substr "</div>%n</body>" at offset 0...
    Guessed: match at offset 0
    Freeing REx: "\<\/div\>\n\<\/body\>"
Re^4: Perl RE; how to capture, and replace based on a block? by taint (Chaplain) on Dec 18, 2013 at 01:22 UTC
Re^5: Perl RE; how to capture, and replace based on a block? by AnomalousMonk (Archbishop) on Dec 18, 2013 at 01:42 UTC
Re^5: Perl RE; how to capture, and replace based on a block? by Anonymous Monk on Dec 18, 2013 at 02:18 UTC
