PerlMonks  

Regular Expressions - replacing newlines and carriage returns inside quotes

by PRyanRay (Novice)
on Oct 31, 2012 at 14:54 UTC ( #1001678=perlquestion )
PRyanRay has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I want code that replaces newlines or carriage returns inside quotes with the characters \n. The following code will do this:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\ \\"x" ax xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)    # eat things up till we find what we want
)
\r?\n                   # the thing we want to replace
(
    (?&inside_quote)    # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?:
        " (?&outside_quote) (?:"|\z)
    )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ )                  # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ )   # handle escapes
)
/$1\n$2/xg;

print "Replaced:\n", $_, "\n";

However, I want to be able to do this for a file that I read in (*.csv). For example, if I use the following to read the same data from a file into $_, it does not work:

my $file = "testdata.csv";
open(FILE, $file) or die "Can't open $file: $!\n";
select((select(FILE), $/ = undef)[0]);    # undef $/ so the next read slurps the whole file
$_ = <FILE>;

Any ideas? And no, I am not able to use the Perl packages Spreadsheet::****

Here, testdata.csv is this business:

hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\ \\"x" ax xbai!x
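
For comparison with the snippet above, here is a minimal sketch of the more usual slurp idiom. It uses a lexical filehandle and `local $/`, so the record separator is restored when the sub returns. The helper name `slurp_file` is hypothetical, and nothing here is claimed to explain why the substitution fails on the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return a whole file as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;                 # undef record separator: one read returns everything
    my $content = <$fh>;
    close($fh);
    return $content;
}
```

Usage would be `$_ = slurp_file("testdata.csv");` before running the substitution. If the file uses CRLF or bare CR line endings, the string will contain those bytes exactly as stored, so it is worth checking what the file really holds.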

Replies are listed 'Best First'.
Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by mbethke (Hermit) on Oct 31, 2012 at 15:45 UTC

    Text::CSV is out of the question, too? I tried to do some "quick n dirty" CSV manipulation much like this recently but as usual things quickly got much more dirty than quick, I said bleep this, I'll use CPAN, and things were peachy. The main part of the resulting script looks like this:

    use Text::CSV;

    # $replace (the text substituted for embedded newlines) is set elsewhere in the full script
    my $csv = Text::CSV->new({
        binary       => 1,
        sep_char     => "\t",
        always_quote => 1,
    }) or do {
        print STDERR "Cannot initialize CSV: ", Text::CSV->error_diag, "\n";
        exit 1;
    };

    LINE: while (my $row = $csv->getline(\*STDIN)) {
        s/\n/$replace/g foreach (@$row);
        unless ($csv->combine(@$row)) {
            print STDERR "Error converting record $. for output: ", $csv->error_input, "\n";
            next LINE;
        }
        print $csv->string, "\n";
    }
    $csv->eof or $csv->error_diag;

    I tried getting your code to work but couldn't, maybe for lack of CRs in my source, but I'd doubt it covers all the subtleties CSV allows regarding quoting and escaping, let alone those it doesn't allow but you'll find anyway.

    Edit: I think we got ourselves a consensus here :-D

Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by 2teez (Vicar) on Oct 31, 2012 at 15:41 UTC
      Thanks everybody, it turns out I do have the Text::CSV_XS module so I will work with this. I am locked behind some gnarly security fences so I have to reinvent the wheel quite often. Guess not this time. Thanks!
Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by bitingduck (Chaplain) on Oct 31, 2012 at 15:42 UTC
    Try using Text::CSV to read the file in. If it's "well formed" CSV (if there is such a thing) then the CR and LF should be between quotes and be handled properly.
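
    A minimal sketch of that suggestion, assuming Text::CSV is installed and the input is comma-separated. The helper name `escape_embedded_newlines` is hypothetical. It reads records from one filehandle and writes them to another, turning any CR or LF inside a field into the two literal characters `\n`.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper (not from the thread): copy CSV records from $in to $out,
# replacing CR/LF inside fields with a literal backslash followed by "n".
sub escape_embedded_newlines {
    my ($in, $out) = @_;

    # binary => 1 lets quoted fields contain embedded line breaks
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot initialize CSV: " . Text::CSV->error_diag . "\n";

    while (my $row = $csv->getline($in)) {
        for my $field (@$row) {
            next unless defined $field;
            $field =~ s/\r\n|\r|\n/\\n/g;
        }
        $csv->print($out, $row) or die "Cannot write record\n";
        print {$out} "\n";    # print() adds no line ending unless eol is set
    }
    $csv->eof or $csv->error_diag;
    return;
}
```

    With a file this could be called as `open(my $in, '<', 'testdata.csv') or die $!; escape_embedded_newlines($in, \*STDOUT);`. By default Text::CSV expects a quote inside a quoted field to be doubled (`""`). Data that escapes quotes with a backslash, as the sample in the question does, would probably need `escape_char => "\\"` in the constructor.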
