A search and replace question

on Oct 31, 2003 at 16:15 UTC
I have been away from Perl for almost 2 years and now I find myself needing a quick, elegant solution for what should be a simple problem.

I have a file that contains records delimited by CRLF (0D0A). In this file there are extra CRLFs. The extra ones are always followed by || (two pipes).

I am trying to replace the "\n||" with a simple "||".

Here's what I'm doing. It's not working.

#! /usr/bin/perl
$datafile=@ARGV[0];
my $data;
{
    local($/) = undef;
    open (FILE, "<$datafile");
    $data = <FILE>;
    close FILE;
    $data =~ s/\x0d\x0a\x7c\x7c/\x7c\x7c/sg;
    if ($data =~ m/\x0d\x0a\x7c\x7c/) {
        print "FOUND IT!\n";
    } else {
        print "DIDN'T FIND IT.\n";
    }
    open (OUTFILE, ">$datafile");
    print OUTFILE $data, "-Damn";
    close OUTFILE;
}

Any help would be greatly appreciated.


Re: A search and replace question
on Oct 31, 2003 at 16:46 UTC
    If you are trying to replace the "\n||" with a simple "||", why not simply do this?:
    $data =~ s/\n\|\|/\|\|/g;
      Thanks, that got it. The scary thing is that I'm certain I tried that already. But to be honest with you, I don't care that it didn't work before. I'm just grateful it works now.

      Thanks again


Re: A search and replace question
on Oct 31, 2003 at 16:32 UTC
    That's because | is special to the regexp engine, and using hex escapes isn't going to disable that. Try this:
    $data =~ s/\x0d\x0a[|][|]/||/g; # No /s needed.


      Not true. Using -Mre=debug (output trimmed a bit)
      /a|b/ yields:

      Compiling REx `a|b'
         1: BRANCH(4)
         2:   EXACT <a>(7)
         4: BRANCH(7)
         5:   EXACT <b>(7)
         7: END(0)
      minlen 1

      but /a\x7cb/ yields:
      Compiling REx `a\x7cb'
         1: EXACT <a|b>(3)
         3: END(0)
      anchored `a|b' at 0 (checking anchored isall) minlen 3

      (tested w/ 5.005_03, 5.6.0, and 5.8.0)
      I figured using Hex escapes would get around that issue. I guess I was wrong. Thanks.
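
      For anyone who would rather check this by matching than by reading the debug output, here is a minimal sketch. The test strings are made up for illustration and are not from the thread. If \x7c acted as alternation, the second pattern would match a lone "a".

```perl
use strict;
use warnings;

# Illustrative strings only; each entry is 1 if the match succeeds, else 0.
my @results = (
    ("a|b" =~ /^a\x7cb$/)   ? 1 : 0,  # literal "a|b" matches
    ("a"   =~ /^a\x7cb$/)   ? 1 : 0,  # no match: \x7c is not alternation
    ("a"   =~ /^(?:a|b)$/)  ? 1 : 0,  # a bare | does alternate
);

print "@results\n";  # prints "1 0 1"
```

      So the hex escapes in the original substitution were never the problem as far as the pipes go.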


Re: A search and replace question
on Oct 31, 2003 at 16:41 UTC
    I tried the following:

    #! /usr/bin/perl -w
    use strict;
    use warnings;

    my $data;
    {
        local($/) = undef;
        $data = <DATA>;
        $data =~ s/\n\|\|/\|\|/sg;
        print "$data";
    }
    __DATA__
    some stuff more embedded crlf
    ||and other stuff end


    some stuff more embedded crlf||and other stuff end
Re: A search and replace question
on Oct 31, 2003 at 17:32 UTC
    Why not this way?
    #!perl
    use strict;
    use warnings;
    $^I='';
    $/="\n||";
    while (<>) {
        s/\n\|\|$/||/;
        print;
        print "-Damn" if eof;
    }
      I'll note that several of the replies have been using "\n" when the original problem description said the two pipes were preceded by a carriage return character and a line feed character. So, your code should set $/ = "\xd\xa||" rather than rely on the local newline convention as embodied by "\n".
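
      A rough sketch of that idea, assuming the handle is read with binmode so the CR characters actually reach the program. The sample record data and the in-memory filehandle are stand-ins for the real file, not something from the original post.

```perl
use strict;
use warnings;

# Made-up sample: one stray CRLF before "||", plus two real record ends.
my $raw = "rec1\x0d\x0a||more\x0d\x0arec2\x0d\x0a";

# An in-memory filehandle stands in for the real data file here.
open my $in, '<', \$raw or die "Cannot open in-memory file: $!";
binmode $in;    # keep CR bytes intact on platforms that translate CRLF

my $out = '';
{
    local $/ = "\x0d\x0a||";    # explicit CR LF, then the two pipes
    while (my $chunk = <$in>) {
        my $removed = chomp $chunk;    # strips a trailing $/ if present
        $out .= $chunk;
        $out .= '||' if $removed;      # put the pipes back, minus the CRLF
    }
}
close $in;

print $out eq "rec1||more\x0d\x0arec2\x0d\x0a" ? "ok\n" : "not ok\n";
```

      The real record terminators are left alone because only a CRLF followed directly by || ever matches the separator.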
Re: A search and replace question
on Nov 01, 2003 at 11:30 UTC
    Don't go looking for the CR characters. On Windows/DOS and the like, perl will already have stripped them (unless you use binmode) when they appear in front of a linefeed, which is the normal case. That's why you failed, IMO. If you want to keep searching for them (for example, on Linux), make them optional. That's the best of both worlds.
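
    Making the CR optional could look something like the sketch below. The helper name join_pipe_lines and the sample strings are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a line break that is immediately followed by
# two pipes, whether that break is CRLF or a bare LF.
sub join_pipe_lines {
    my ($text) = @_;
    $text =~ s/\r?\n\|\|/||/g;    # \r? makes the carriage return optional
    return $text;
}

# Made-up sample data in both newline styles.
my $crlf = "rec1\r\n||more\r\nrec2\r\n";
my $lf   = "rec1\n||more\nrec2\n";

print join_pipe_lines($crlf) eq "rec1||more\r\nrec2\r\n" ? "ok\n" : "not ok\n";
print join_pipe_lines($lf)   eq "rec1||more\nrec2\n"     ? "ok\n" : "not ok\n";
```

    The same substitution then works whether or not the I/O layer has already removed the CRs.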
Re: A search and replace question
on Nov 01, 2003 at 07:05 UTC
    pizza@pizzabox:~/pl$ cat DATA
    a
    ||
    b
    ||
    c
    ||
    pizza@pizzabox:~/pl$ perl -pe'BEGIN{undef $/} s/\n\|\|//gm' DATA
    a
    b
    c
    pizza@pizzabox:~/pl$

    map print(chr(hex((q{6f634070617a6d692e7273650a}=~/../g)hex))),(q{375542349abb99098106c}=~/./g)
