Re: Embedded Newlines or Converting One-Liner to Loop

Re^2: Embedded Newlines or Converting One-Liner to Loop by mwb613 (Beadle) on Dec 15, 2016 at 06:59 UTC
Thanks Rolf! I believe I was able to make your idea work. I've posted a solution below; feel free to critique if you have the notion.

    my $wait_for_odd_quotes = 0;
    my $line_accumulator = '';

    while (<$CSVFILE>) {
        chomp(my $this_line = $_);
        my @matches = $this_line =~ /(\")/g;
        my $count = @matches;

        if ($wait_for_odd_quotes == 0) {
            if ($count % 2 == 1) {
                $line_accumulator = $this_line;   # Reset accumulator
                $wait_for_odd_quotes = 1;         # Prime next loop to look for end of quotes
            } else {
                # We are not looking for an end quote and this line doesn't
                # have an odd number of quotes, so we'll write it to file
                print $OUTFILE $this_line . "\n";
            }
        } else {
            if ($count % 2 == 1) {
                # Matched our open quotes, taking this last bit
                # (joined with a space, like the lines in between)
                $line_accumulator .= ' ' . $this_line;
                $wait_for_odd_quotes = 0;         # Reset so next loop knows we're not looking to close
                print $OUTFILE $line_accumulator . "\n";
            } else {
                $line_accumulator .= ' ' . $this_line;
            }
        }
    }

    # Catch a final line that had embedded newlines but never closed its
    # quotes; completed records were already printed inside the loop
    print $OUTFILE $line_accumulator . "\n" if $wait_for_odd_quotes;
Re^3: Embedded Newlines or Converting One-Liner to Loop by Marshall (Canon) on Dec 15, 2016 at 08:45 UTC
Another implementation based on Rolf's suggestion:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = get_CSVline()) {
        print "$line\n";
    }

    sub get_CSVline {
        my $buffer;
        while (!defined($buffer) or is_odd_quotes($buffer) ) {
            my $temp =<DATA>;
            $buffer .= $temp;
        }
        chomp $buffer;
        $buffer =~ s/\n/\\n/g;   #### make "\n" "visible" ###
        return $buffer;
    }

    sub is_even_quotes {
        my $string = shift;
        return !( ($string=~tr/"//) % 2);
    }

    sub is_odd_quotes {
        my $string = shift;
        return ( ($string=~tr/"//) % 2);
    }

    =PRINTS:
    1,2,3.3,"\n",4,5
    6,7,8,9
    6,"\n",7,8
    a,b,c,"something\nmore"
    1,2,3
    1,"x\n","y","z\n",3
    "3.5 "" disks"
    =cut

    __DATA__
    1,2,3.3,"
    ",4,5
    6,7,8,9
    6,"
    ",7,8
    a,b,c,"something
    more"
    1,2,3
    1,"x
    ","y","z
    ",3
    "3.5 "" disks"

Update: Added one more test case, '"3.5 "" disks"'.
Re^4: Embedded Newlines or Converting One-Liner to Loop by LanX (Saint) on Dec 15, 2016 at 11:09 UTC
Just a minor nitpick: your test cases don't cover an escaped double quote. Adding a line like `,"3.5 "" disks",` might help. :)

Edit: And you wouldn't need to check `defined $buffer` if you used a do-while loop.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
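For readers following along, here is a minimal sketch of what the do-while variant might look like. It is not code from the thread: the sub name `read_csv_record`, the in-memory sample data, and the end-of-file handling are all assumptions made for illustration. The sample includes the escaped-quote line suggested above; a doubled quote adds two to the count, so the odd/even test still holds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read one logical CSV record
# from $fh, joining physical lines until the number of quotes is even.
sub read_csv_record {
    my ($fh) = @_;
    my $buffer = '';
    do {
        my $line = <$fh>;
        if ( !defined $line ) {
            # End of file: hand back any unterminated record, else undef.
            return length($buffer) ? $buffer : undef;
        }
        $buffer .= $line;
    } while ( ( $buffer =~ tr/"// ) % 2 );    # odd count: still inside quotes
    chomp $buffer;
    return $buffer;
}

# Made-up sample data, including an escaped ("" doubled) quote.
my $data = <<'END';
1,2,3.3,"
",4,5
6,7,8,9
a,b,c,"something
more"
,"3.5 "" disks",
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records;
while ( defined( my $record = read_csv_record($fh) ) ) {
    $record =~ s/\n/\\n/g;    # make embedded newlines visible
    push @records, $record;
}
close $fh;

print "$_\n" for @records;
```

Because the body of a do-while always runs once before the condition is tested, the buffer can start as an empty string and no `defined` check on it is needed. Testing the return value with `defined` in the caller also keeps a blank line or a lone `0` from ending the loop early.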
Re^5: Embedded Newlines or Converting One-Liner to Loop by Marshall (Canon) on Dec 15, 2016 at 13:30 UTC
Re^6: Embedded Newlines or Converting One-Liner to Loop by LanX (Saint) on Dec 15, 2016 at 14:32 UTC
Re^3: Embedded Newlines or Converting One-Liner to Loop by perldigious (Priest) on Dec 15, 2016 at 14:40 UTC
Regarding LanX's suggestion, see: Remove Tabs and Newlines Inside Fields of Text Tab Delimited Files from Excel. Untested for comma delimiters, but presumably you would just have to change all the `split` and `join` lines to use commas instead of tabs.

Also, from another post: Do I have to trick Split? Using "-1" for the third `split` parameter fixed some `warnings` that trailing empty data columns caused during the subsequent `join`.

    my @data = resolve_comma_delimited_file_line($CSVFILE, $this_line);

    # This subroutine accepts a filehandle and a line read from that
    # filehandle as arguments given in that order.
    # If necessary it will modify the line that was passed to it (as if
    # passed by reference) to resolve it, and return an array of the
    # split data.
    sub resolve_comma_delimited_file_line {
        my $fh = $_[0];
        # $_[1] being the read line passed in to this subroutine that is
        # to be modified if necessary (as if passed by reference)
        chomp($_[1]);
        my @data = split /,/, $_[1], -1;
        my $last_index = $#data;

        for (my $field_index=0; $field_index<$last_index; $field_index++) {
            if (($data[$field_index] =~ tr/"//) % 2 == 1) {
                splice @data, $field_index, 2, "$data[$field_index] $data[$field_index+1]";
                $_[1] = join ",", @data;
                $last_index--;
                $field_index--;
            }
        }

        if (($data[$last_index] =~ tr/"//) % 2 == 1) {
            $_[1] .= " " . <$fh>;
            @data = &resolve_comma_delimited_file_line;
        }

        return @data;
    }

UPDATE: Deleted comments related to tabs since they wouldn't be relevant for the comma case.

That being said, Text::CSV is still probably a better option because it saves you from having to "reinvent the wheel" so to speak, and is probably a much more robust solution for the types of hiccups you may encounter in your file data.

Just another Perl hooker - will code for food
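To illustrate the "-1" point, here is a small sketch of how the third `split` parameter changes what happens to trailing empty columns. The sample line and variable names are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'a,b,,';    # made-up sample: two trailing empty columns

my @default = split /,/, $line;        # trailing empty fields are dropped
my @kept    = split /,/, $line, -1;    # a negative limit keeps them

printf "default: %d fields, with -1: %d fields\n",
    scalar @default, scalar @kept;

# With -1 every expected column is a defined (possibly empty) string, so
# joining a fixed range of columns does not touch undef values.
my $rejoined = join ',', @kept[ 0 .. 3 ];
print "$rejoined\n";
```

Without the limit, indexing past the last surviving field (for example `$default[3]`) yields `undef`, which is presumably where the "uninitialized value" warnings in a later `join` would come from.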


	PerlMonks
