PerlMonks  

Re: Perl RE; how to capture, and replace based on a block?

by educated_foo (Vicar)
on Dec 18, 2013 at 04:16 UTC (#1067585)


in reply to Perl RE; how to capture, and replace based on a block?

To minimize unhelpful replies here, you should probably do something like this:

  1. Add "use strict;" to the top of your code.
  2. Add "my ();" right after it.
  3. Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...'
Someone should probably write an Acme:: module to do this automatically.
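The three steps above amount to predeclaring every variable the script uses. A minimal sketch, assuming a script with just two variables (`$regexp` and `$line` are illustrative names, not taken from the original question):

```perl
use strict;
use warnings;

# Steps 2 and 3: one "my" list naming every variable the script uses.
# Without this line, strict aborts compilation with a message starting
# 'Global symbol "$regexp" requires explicit package name'.
my ($regexp, $line);

$regexp = qr/body/;
$line   = "</body>\n";
print "matched\n" if $line =~ $regexp;
```

Declaring each variable with `my` at its first use is the more usual style; the single list is only the shortest way to quiet strict.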


Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 05:51 UTC

    Sorry. I just

    cat ./FILE.html | perl {...}
    in an open xterm. After several failures, and with no more ideas, I closed the xterm and asked for help. I didn't think it'd be of any use in the request.

    I've since read every single reference in the Perl documentation, and while I think I've got the RE part down, I'm quite sure I don't know how to feed Perl the file properly to do any more than eat a single line at a time.

    So let me have another go at it. The following

    #!/usr/bin/perl -w
    #retest.pl
    # my feeble attempt to a multi-line RE in Perl
    $regexp = shift;
    while (<>) {
        print if /$regexp/;
    }
    won't work as
    # ./retest.pl \</\div\>\n\<\/body\> ./FILE.html
    because shift will only manage input one line at a time. Attempts to figure out how to make use of psed and s2p have failed miserably.
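The line-at-a-time behaviour comes from the `while (<>)` loop rather than from `shift`: with the default `$/`, each read returns one line, so a pattern that contains `\n` followed by more text never sees both lines at once. A small sketch, assuming an in-memory filehandle standing in for ./FILE.html (the sample text and the `count_matches` helper are made up for illustration):

```perl
use strict;
use warnings;

# Count how many records read from $text match $re, reading with
# the default record separator (one line per read).
sub count_matches {
    my ($text, $re) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $hits = 0;
    while (my $line = <$fh>) {
        $hits++ if $line =~ $re;
    }
    close $fh;
    return $hits;
}

my $html = "<div>\n</div>\n</body>\n";

# A pattern that fits on one line can match.
print "single-line pattern: ", count_matches($html, qr{</body>}), "\n";

# A pattern spanning two lines is only ever tested against one line at a time.
print "two-line pattern:    ", count_matches($html, qr{</div>\n</body>}), "\n";
```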

    Apologies for the previous noise, and thank you for the thoughtful responses.

    --Chris

    Yes. What say about me, is true.
    

      Hi Chris, specifying a regex on the command line seems a difficult thing to do. At least you should be printing your $regexp to see what it contains.
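One likely reason to print it: in a POSIX shell, an unquoted backslash is consumed by the shell before Perl sees the argument, so `\n` arrives as a plain `n`. A minimal sketch, assuming the pattern is passed in single quotes instead (the `show_pattern` helper is hypothetical, and the two strings below stand in for what would arrive in `@ARGV`):

```perl
use strict;
use warnings;

# Return a printable report of the pattern exactly as the script received it.
sub show_pattern {
    my ($pattern) = @_;
    return "pattern is: [$pattern]";
}

# What an unquoted pattern looks like once the shell has stripped the backslashes.
my $unquoted = '</div>n</body>';

# What a single-quoted pattern looks like: the backslash survives,
# and the regex engine then reads \n as a newline.
my $quoted = '</div>\n</body>';

print show_pattern($unquoted), "\n";
print show_pattern($quoted), "\n";

my $text = "</div>\n</body>\n";
print "unquoted form matches\n" if $text =~ /$unquoted/;
print "quoted form matches\n"   if $text =~ /$quoted/;
```

Even with correct quoting, the pattern can only match once the input is read as more than one line at a time.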

      In any case, this code seems to work:

      my $str = "\n</div>\n</body>\n";
      print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;

      which suggests that if you slurp in your whole file as a single string (e.g. by unsetting $/), your regex should do its job.

      local $/;
      my $str = <>;
      print "Success\n" if $str =~ /\<\/div\>\n\<\/body\>/;
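A bare `local $/;` at file level stays in effect for the rest of the file, so it is common to confine it to a block or a sub. A sketch of that variant, assuming a `slurp` helper (a hypothetical name) and an in-memory filehandle in place of the real file:

```perl
use strict;
use warnings;

# Read everything remaining on a filehandle as one string.
sub slurp {
    my ($fh) = @_;
    local $/;                 # undef record separator, restored when the sub returns
    my $content = <$fh>;
    return $content;
}

my $html = "<div>\n</div>\n</body>\n";
open my $fh, '<', \$html or die "cannot open in-memory file: $!";
my $str = slurp($fh);
close $fh;

print "Success\n" if $str =~ /<\/div>\n<\/body>/;
```

With a real file, the same sub works on a handle from `open my $fh, '<', $path`.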
        Perl said; Success.

        Thanks a million, hdb! Your suggestion has helped me greatly in putting the last piece in my current "puzzle".

        Thanks again. I'd like to buy a round of +'s, for the house.

        --Chris

        UPDATE; I forgot to mention. The reason I was feeding the file to Perl is
        1) That's what worked best for me with sed.
        2) It seemed the easiest way to experiment getting a correct match with Perl.
        Yes. What say about me, is true.
        
      For fiddling with little bits of code, just use the debugger straight away:
      swedish_chef> perl -demo
      Loading DB routines from perl5db.pl version 1.32
      Editor support available.

      Enter h or `h h' for help, or `man perldebug' for more help.

      main::(-e:1):   mo
        DB<1> $string = "one two three four"

        DB<2> x $string =~ m/(\w+)/g
      0  'one'
      1  'two'
      2  'three'
      3  'four'

      Note that "my" variables don't work as expected, I think they get created in the Debug scope, and not in the interpreted scope. But otherwise, have fun in the sandbox.
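The same experiment also runs as an ordinary script, where `my` variables behave as usual. A minimal sketch using the string from the session above:

```perl
use strict;
use warnings;

my $string = "one two three four";

# In list context, m//g returns the captures from every match.
my @words = $string =~ m/(\w+)/g;

print "$_\n" for @words;
```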

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re^2: Perl RE; how to capture, and replace based on a block?
by taint (Chaplain) on Dec 18, 2013 at 07:11 UTC
    Thank you educated_foo.

    " Keep adding "VAR," between the parens in #2 as long as Perl complains about 'Global symbol "VAR"...' Someone should probably write an Acme:: module to do this automatically."

    I'll be glad to. Just as soon as I figure this all out. :)

    My biggest hangup, I think, is that I'm quite comfortable with sed. But sed is "greedy" by default, and while Perl RE can be, it's not by default, and that's what I need here (not greedy).

    s/\<\/div\>/,/\<\/body\>/
    will match my pattern in sed. But it will match from the first </div> till the first </body>. Which is too much.

    Thanks again for the response, educated_foo

    --Chris

    Yes. What say about me, is true.
    

      ... sed ...

      here is my test program

      use re 'debug';
      $_ = "</div>\n</body>";
      print 'does it match ', int m{\<\/div\>\n\<\/body\>};
      By default, Perl RE are greedy. Have you considered the possibility that the end of line might be more than \n (if the file is coming from Windows, for example)?
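If the line ending is in doubt, one option is to let the pattern accept either form. A sketch, assuming Perl 5.10 or later for `\R` (which matches a generic line break, including `\r\n`); on older Perls `\r?\n` covers the two cases shown here:

```perl
use strict;
use warnings;

my $unix    = "</div>\n</body>";
my $windows = "</div>\r\n</body>";

my $strict_re  = qr{</div>\n</body>};   # Unix line ending only
my $lenient_re = qr{</div>\R</body>};   # any line ending (Perl 5.10+)

for my $text ($unix, $windows) {
    printf "strict: %-8s  lenient: %s\n",
        ($text =~ $strict_re  ? 'match' : 'no match'),
        ($text =~ $lenient_re ? 'match' : 'no match');
}
```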
        Greetings, Laurent_R, and thanks for the reply.

        Oh yes. I'm keen on the \n vs. \r vs. \r\n thing, and you're absolutely correct. Except, in my case, I'm on a *NIX box, and I've written the files myself. So I know they're utf-8 (no BOM), with newlines, no "hard" returns. :)

        Maybe it's just the examples I was reading (perlrequick, perlretut, and perlfaq6) but I got the impression that Perl RE wasn't greedy. More Perl RE reading, I guess.
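Quantifiers such as `*` and `+` are greedy in Perl unless a `?` follows them. A small sketch of the difference, using a made-up string with two `</div>` tags:

```perl
use strict;
use warnings;

my $html = "<div>a</div><div>b</div>";

# Greedy: .* takes as much as it can, so the match runs to the LAST </div>.
my ($greedy) = $html =~ m{(<div>.*</div>)};

# Non-greedy: .*? takes as little as it can, so the match stops at the FIRST </div>.
my ($lazy) = $html =~ m{(<div>.*?</div>)};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

On a slurped multi-line string, `.` needs the `/s` modifier before it will cross a newline.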

        Thanks again, for the response Laurent_R.

        --Chris

        Yes. What say about me, is true.
        
