PerlMonks  

Re: Perl RE; how to capture, and replace based on a block?

by educated_foo (Vicar)
on Dec 18, 2013 at 04:16 UTC (#1067585)


in reply to Perl RE; how to capture, and replace based on a block?

To minimize unhelpful replies here, you should probably do something like this:

  1. Add "use strict;" to the top of your code.
  2. Add "my ();" right after it.
  3. Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...'
Someone should probably write an Acme:: module to do this automatically.


Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 05:51 UTC

    Sorry. I just

    cat ./FILE.html | perl {...}
    in an open xterm. After several failures, and with no more ideas, I closed the xterm and asked for help. I didn't think it'd be of any use in the request.

    I've since read every single reference in the Perl documentation, and while I think I've got the RE part down, I'm quite sure I don't know how to feed Perl the file properly to do any more than eat a single line at a time.

    So let me have another go at it. The following

    #!/usr/bin/perl -w
    # retest.pl
    # my feeble attempt to a multi-line RE in Perl
    $regexp = shift;
    while (<>) {
        print if /$regexp/;
    }
    won't work as
    # ./retest.pl \</\div\>\n\<\/body\> ./FILE.html
    because shift will only manage input one line at a time. Attempts to figure out how to make use of psed and s2p have failed miserably.

    Apologies for the previous noise, and thank you for the thoughtful responses.

    --Chris

    Yes. What say about me, is true.
    

      Hi Chris, specifying a regex on the command line seems a difficult thing to do. At least you should be printing your $regexp to see what it contains.

      In any case, this code seems to work:

      my $str = "
      </div>
      </body>
      ";
      print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;

      which suggests that if you slurp in your whole file as a single string (e.g. by unsetting $/), your regex should do its job.

      local $/;
      my $str = <>;
      print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;
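The two suggestions above (look at what `$regexp` really contains, and slurp the file) can be put together in one small sketch. This is only an illustration under stated assumptions: the in-memory `$html` string stands in for FILE.html, the `slurp` helper is a hypothetical name, and the single-quoted `$regexp` mimics what the script would receive if the pattern were single-quoted on the shell command line.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle as one string.
sub slurp {
    my ($fh) = @_;
    local $/;               # undef record separator: <$fh> returns the whole file
    return scalar <$fh>;
}

# Stand-in for FILE.html (made-up content), opened as an in-memory file.
my $html = "<body>\n<div>\ntext\n</div>\n</body>\n";
open my $fh, '<', \$html or die "cannot open in-memory file: $!";

# Single-quoted on purpose: the string holds a literal backslash and 'n',
# as it would when passed as '</div>\n</body>' from the shell.
my $regexp = '</div>\n</body>';
print "pattern received: $regexp\n";

my $text  = slurp($fh);
# Interpolated into the match, the two characters \n are read by the regex
# engine as "newline", so the pattern can span two lines of the slurped text.
my $found = $text =~ /$regexp/ ? 1 : 0;
print $found ? "Success\n" : "No match\n";
```

Printing the pattern first makes it obvious when the shell has already eaten the backslashes, which is what happens when the argument is not quoted.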
        Perl said; Success.

        Thanks a million, hdb! Your suggestion has helped me greatly in putting the last piece in my current "puzzle".

        Thanks again. I'd like to buy a round of +'s, for the house.

        --Chris

        UPDATE; I forgot to mention. The reason I was feeding the file to Perl is
        1) That's what worked best for me with sed.
        2) It seemed the easiest way to experiment getting a correct match with Perl.
        
      For fiddling with little bits of code, just use the debugger straight away:
      swedish_chef> perl -demo

      Loading DB routines from perl5db.pl version 1.32
      Editor support available.

      Enter h or `h h' for help, or `man perldebug' for more help.

      main::(-e:1):   mo
        DB<1> $string = "one two three four"

        DB<2> x $string =~ m/(\w+)/g
      0  'one'
      1  'two'
      2  'three'
      3  'four'

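      Note that "my" variables don't work as expected: I think each debugger command is evaluated in its own scope, so a lexical declared on one line is gone by the next. Leave off the "my", as in the session above, and the variable sticks around. But otherwise, have fun in the sandbox.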

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of
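The remark about "my" variables in the debugger can be imitated outside the debugger with two string evals, since each one gets its own lexical scope. This is a rough sketch of the effect, not a description of the debugger's internals; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# First "command": declare one lexical and one package variable.
eval q{ my $lexical = 'lost'; our $package_var = 'kept'; 1 } or die $@;

# Second "command": the lexical from the first eval no longer exists, so the
# unqualified name falls back to an (undefined) package variable.
my $lexical_seen = eval q{ no strict 'vars'; no warnings; defined $lexical ? 1 : 0 };

# The package variable is still there.
my $package_seen = eval q{ our $package_var; defined $package_var ? 1 : 0 };

print "lexical visible: $lexical_seen, package var visible: $package_seen\n";
```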

Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 07:11 UTC
    Thank you educated_foo.

    " Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...' Someone should probably write an Acme:: module to do this automatically."

    I'll be glad to. Just as soon as I figure this all out. :)

    My biggest hangup, I think, is that I'm quite comfortable with sed. But sed is "greedy" by default, and while Perl RE can be, it's not by default, and that's what I need here (not greedy).

    s/\<\/div\>/,/\<\/body\>/
    will match my pattern in sed. But it will match from the first </div> till the first </body>. Which is too much.

    Thanks again for the response, educated_foo

    --Chris

    

      ... sed ...

      here is my test program

      use re 'debug';
      $_ = q{</div>
      </body>};
      print 'does it match ', int m{\<\/div\>\n\<\/body\>};
      By default, Perl RE quantifiers are greedy. Have you considered the possibility that the end of line might be more than \n (\r\n if the file is coming from Windows, for example)?
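Both points can be checked with a short test. The sample text below is made up, and it uses Windows-style \r\n line endings on purpose; it is only a sketch of the default greedy behaviour, the non-greedy `?` modifier, and a line-ending-tolerant pattern.

```perl
use strict;
use warnings;

# Made-up sample with \r\n line endings.
my $text = "<div>a</div>\r\n<div>b</div>\r\n</body>";

# Greedy (the default): .* takes as much as it can, so the capture
# runs up to the last </div> in the string.
my ($greedy) = $text =~ m{<div>(.*)</div>}s;

# Non-greedy: .*? stops at the first </div> that lets the match succeed.
my ($lazy) = $text =~ m{<div>(.*?)</div>}s;

# \r?\n accepts both Unix (\n) and Windows (\r\n) line endings.
my $block_found = $text =~ m{</div>\r?\n</body>} ? 1 : 0;

print "greedy captured ", length($greedy), " chars, lazy captured '$lazy'\n";
print "block found: $block_found\n";
```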
        Greetings, Laurent_R, and thanks for the reply.

        Oh yes. I'm keen on the \n v \r v \r\n thing, and you're absolutely correct. Except, in my case, I'm on a *NIX box, and I've written the files myself. So I know they're utf-8 (no BOM), with newlines, no "hard" returns. :)

        Maybe it's just the examples I was reading (perlrequick, perlretut, and perlfaq6) but I got the impression that Perl RE wasn't greedy. More Perl RE reading, I guess.

        Thanks again, for the response Laurent_R.

        --Chris

        
