http://www.perlmonks.org?node_id=303584

ChuckularOne has asked for the wisdom of the Perl Monks concerning the following question:

I have been away from Perl for almost 2 years and now I find myself needing a quick, elegant solution for what should be a simple problem.

I have a file that contains records delimited by CRLF (0D0A). In this file there are extra CRLFs. The extra ones are always followed by || (two pipes).

I am trying to replace the "\n||" with a simple "||".

Here's what I'm doing. It's not working.

#! /usr/bin/perl
$datafile=@ARGV[0];
my $data;
{
    local($/) = undef;
    open (FILE, "<$datafile");
    $data = <FILE>;
    close FILE;
    $data =~ s/\x0d\x0a\x7c\x7c/\x7c\x7c/sg;
    if ($data =~ m/\x0d\x0a\x7c\x7c/) {
        print "FOUND IT!\n";
    } else {
        print "DIDN'T FIND IT.\n";
    }
    open (OUTFILE, ">$datafile");
    print OUTFILE $data, "-Damn";
    close OUTFILE;
}

Any help would be greatly appreciated.

-Chuckularone

Replies are listed 'Best First'.
Re: A search and replace question
by PERLscienceman (Curate) on Oct 31, 2003 at 16:46 UTC
    If you are trying to replace the "\n||" with a simple "||", why not simply do this?:
    $data =~ s/\n\|\|/\|\|/g;
      Thanks, that got it. The scary thing is that I'm certain I tried that already. But, to be honest with you, I don't care that it didn't work before. I'm just grateful it works now.

      Thanks again

      -Chuckularone
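
A minimal sketch of the substitution suggested above, applied to in-memory strings rather than a file. The helper name `join_pipe_lines` and the sample data are made up for illustration. It also shows one thing to watch for: if the data still contains real CR LF pairs (for example, a DOS file read on Unix), matching only "\n" leaves a stray "\r" in front of the pipes.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution itself is the one from the reply.
sub join_pipe_lines {
    my ($text) = @_;
    $text =~ s/\n\|\|/||/g;    # remove a newline that is followed by two pipes
    return $text;
}

my $lf_data   = "rec1\n||more\nrec2\n";          # LF-only line endings
my $crlf_data = "rec1\r\n||more\r\nrec2\r\n";    # CR LF still present

my $lf_result   = join_pipe_lines($lf_data);
my $crlf_result = join_pipe_lines($crlf_data);

print "LF input joined cleanly\n"   if $lf_result eq "rec1||more\nrec2\n";
print "CRLF input kept a stray CR\n" if $crlf_result =~ /\r\|\|/;
```

Whether the CR is present at match time depends on the platform and on how the file was opened, which is discussed further down the thread.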

Re: A search and replace question
by Abigail-II (Bishop) on Oct 31, 2003 at 16:32 UTC
    That's because | is special to the regexp engine, and using hex escapes isn't going to disable that. Try this:
    $data =~ s/\x0d\x0a[|][|]/||/g; # No /s needed.

    Abigail

      Not true. Using -Mre=debug (output trimmed a bit)
      /a|b/ yields:

      Compiling REx `a|b'
         1: BRANCH(4)
         2:   EXACT <a>(7)
         4: BRANCH(7)
         5:   EXACT <b>(7)
         7: END(0)
      minlen 1

      but /a\x7cb/ yields:
      Compiling REx `a\x7cb'
         1: EXACT <a|b>(3)
         3: END(0)
      anchored `a|b' at 0 (checking anchored isall) minlen 3

      (tested w/ 5.005_03, 5.6.0, and 5.8.0)
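
The same point can be checked without the regex debugger. The following is a small sketch with made-up variable names and sample strings: it compares an unescaped pipe with `\x7c`, and then runs the substitution from the question on a string that definitely contains CR LF followed by two pipes.

```perl
use strict;
use warnings;

# An unescaped | is alternation, so this matches a lone "a".
my $alternation_matches_a = ("a" =~ /a|b/) ? 1 : 0;

# \x7c compiles to a literal pipe, so only the text "a|b" matches.
my $hex_matches_a   = ("a"   =~ /a\x7cb/) ? 1 : 0;
my $hex_matches_a_b = ("a|b" =~ /a\x7cb/) ? 1 : 0;

# The substitution from the question, applied to a copy of a string that
# really contains CR LF followed by two pipes.
my $sample = "first\x0d\x0a||second";
(my $fixed = $sample) =~ s/\x0d\x0a\x7c\x7c/\x7c\x7c/sg;

print "alternation matches 'a': $alternation_matches_a\n";
print "hex pipe matches 'a': $hex_matches_a\n";
print "hex pipe matches 'a|b': $hex_matches_a_b\n";
print "substitution worked\n" if $fixed eq "first||second";
```

If this behaves the same way on your perl, the hex-escaped pattern itself is not the problem, which suggests the CR LF bytes were not in `$data` when the substitution ran.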
      I figured using Hex escapes would get around that issue. I guess I was wrong. Thanks.

      -Chuckularone

Re: A search and replace question
by inman (Curate) on Oct 31, 2003 at 16:41 UTC
    I tried the following :

    #! /usr/bin/perl -w
    use strict;
    use warnings;
    my $data;
    {
        local($/) = undef;
        $data = <DATA>;
        $data =~ s/\n\|\|/\|\|/sg;
        print "$data";
    }
    __DATA__
    some stuff more embedded crlf
    ||and other stuff end

    produces:

    some stuff more embedded crlf||and other stuff end
    inman
Re: A search and replace question
by Roy Johnson (Monsignor) on Oct 31, 2003 at 17:32 UTC
    Why not this way?
    #!perl
    use strict;
    use warnings;
    $^I='';
    $/="\n||";
    while (<>) {
        s/\n\|\|$/||/;
        print;
        print "-Damn" if eof;
    }
      I'll note that several of the replies have been using "\n" when the original problem description said the two pipes were preceded by a carriage return character and a line feed character. So, your code should set $/ = "\xd\xa||" rather than rely on the local newline convention as embodied by "\n".
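
A sketch of the record-separator approach with the explicit CR LF bytes, reading from an in-memory filehandle so it can be run without a data file. The helper name `rejoin_crlf_records` and the sample data are assumptions for illustration; for a real file you would open the file itself and call `binmode` on it in the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: rejoin records in raw CRLF data by reading with
# $/ set to CR LF followed by two pipes.
sub rejoin_crlf_records {
    my ($raw) = @_;
    open(my $in, '<', \$raw) or die "Cannot open in-memory handle: $!";
    binmode($in);                  # no CRLF translation, keep \x0d bytes
    local $/ = "\x0d\x0a||";       # each read ends at CR LF ||
    my $out = '';
    while (my $chunk = <$in>) {
        # Drop the CR LF at the end of the chunk but keep the pipes.
        $chunk =~ s/\x0d\x0a\|\|\z/||/;
        $out .= $chunk;
    }
    close($in);
    return $out;
}

my $joined = rejoin_crlf_records("a\r\n||b\r\nc\r\n||d\r\n");
print "records rejoined\n" if $joined eq "a||b\r\nc||d\r\n";
```

The genuine record terminators (CR LF not followed by pipes) pass through untouched, because they never end a chunk.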
Re: A search and replace question
by bart (Canon) on Nov 01, 2003 at 11:30 UTC
    Don't go looking for the CR characters. On Windows/DOS and similar systems, perl strips them when they appear in front of a linefeed (the normal case) unless you use binmode. That's why you failed, IMO. If you want to keep searching for them (for example, on Linux), make them optional. That's the best of both worlds.
    s/\r?\n(?=\|\|)//g;
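
A short sketch of how the optional-CR pattern above behaves on both kinds of input. The helper name `strip_breaks_before_pipes` and the sample strings are made up for illustration. The lookahead `(?=\|\|)` checks for the pipes without consuming them, so the replacement can simply be empty.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the reply above.
sub strip_breaks_before_pipes {
    my ($text) = @_;
    $text =~ s/\r?\n(?=\|\|)//g;   # remove LF or CR LF only when || follows
    return $text;
}

# Data as it might look after CRLF translation (LF only) ...
my $translated = strip_breaks_before_pipes("rec1\n||more\nrec2\n");
# ... and as raw bytes with the CR still present.
my $raw        = strip_breaks_before_pipes("rec1\r\n||more\r\nrec2\r\n");

print "LF input:   ok\n" if $translated eq "rec1||more\nrec2\n";
print "CRLF input: ok\n" if $raw eq "rec1||more\r\nrec2\r\n";
```

Line endings that are not followed by two pipes are left exactly as they were in either case.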
Re: A search and replace question
by pizza_milkshake (Monk) on Nov 01, 2003 at 07:05 UTC
    pizza@pizzabox:~/pl$ cat DATA
    a
    ||
    b
    ||
    c
    ||
    pizza@pizzabox:~/pl$ perl -pe'BEGIN{undef $/} s/\n\|\|//gm' DATA
    a
    b
    c
    pizza@pizzabox:~/pl$
