splinting a line of text by comma

benaw has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 22, 2018 at 22:36 UTC
#!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { if( /\\|([^\|,](?:,[^\|,])+)\\|/ ) { print "$`\|$_\|$'" for split /,/, $1; } else { print; } } __DATA__ some text FOO \| some text BAR \| oh , no , commas \| some text BAZ \| so +me text QUX some text FOO \| some text BAR \| single \| some text BAZ \| some text QU +X [download] Outputs: `some text FOO \| some text BAR \| oh \| some text BAZ \| some text QUX some text FOO \| some text BAR \| no \| some text BAZ \| some text QUX some text FOO \| some text BAR \| commas \| some text BAZ \| some text QU +X some text FOO \| some text BAR \| single \| some text BAZ \| some text QU +X` [download]	[reply] [d/l] [select]
Re: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 22, 2018 at 22:47 UTC
And here's one that handles multiple columns with commas: #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { my @queue = $_; while( @queue ) { local $_ = shift @queue; if( /\\|([^\|,](?:,[^\|,])+)\\|/ ) { push @queue, map "$`\|$_\|$'", split /,/, $1; } else { print; } } } __DATA__ some text FOO \| some text BAR \| oh , no , commas \| some text BAZ \| so +me text QUX some text FOO \| some text BAR \| single \| some text BAZ \| some text QU +X some text FOO \| some text BAR \| oh , no , commas \| one, two \| some te +xt QUX [download] Outputs: some text FOO \| some text BAR \| oh \| some text BAZ \| some text QUX some text FOO \| some text BAR \| no \| some text BAZ \| some text QUX some text FOO \| some text BAR \| commas \| some text BAZ \| some text QU +X some text FOO \| some text BAR \| single \| some text BAZ \| some text QU +X some text FOO \| some text BAR \| oh \| one\| some text QUX some text FOO \| some text BAR \| oh \| two \| some text QUX some text FOO \| some text BAR \| no \| one\| some text QUX some text FOO \| some text BAR \| no \| two \| some text QUX some text FOO \| some text BAR \| commas \| one\| some text QUX some text FOO \| some text BAR \| commas \| two \| some text QUX [download]	[reply] [d/l] [select]
Re^2: splinting a line of text by comma (updated) by AnomalousMonk (Archbishop) on Nov 23, 2018 at 00:09 UTC
Rather than iterating through every permutation of every comma-separated sub-phrase, wouldn't it be better to print each set of sub-phrases once per line? Also, comma-separated sub-phrases at the beginning or end of the line will not be processed. How about something like: c:\@Work\Perl\monks>perl -wMstrict -le "my @lines = ( 'some text foo \| some text BAR \| oh , no , commas \| some text BAZ \| + some text QUX', 'some text Oof \| some, text, BAR \| some , text , BAZ \| some text QU +X', 'some, text, FOO \| some text BAR \| one two \| some , text , QUX', ); ;; my $got_commas = qr{ (?: (?<= [\|]) \| \A) [^\|,]* (?: , [^\|,])+ (?= [\|] \| \z) }xms; ;; for my $line (@lines) { while ($line =~ m{ ($got_commas) }xmsg) { my $before = substr $line, 0, $-[1]; my $after = substr $line, $+[1]; print qq{:$before$_$after:} for split m{ , }xms, $1; } } " :some text foo \| some text BAR \| oh \| some text BAZ \| some text QUX: :some text foo \| some text BAR \| no \| some text BAZ \| some text QUX: :some text foo \| some text BAR \| commas \| some text BAZ \| some text QU +X: :some text Oof \| some\| some , text , BAZ \| some text QUX: :some text Oof \| text\| some , text , BAZ \| some text QUX: :some text Oof \| BAR \| some , text , BAZ \| some text QUX: :some text Oof \| some, text, BAR \| some \| some text QUX: :some text Oof \| some, text, BAR \| text \| some text QUX: :some text Oof \| some, text, BAR \| BAZ \| some text QUX: :some\| some text BAR \| one two \| some , text , QUX: : text\| some text BAR \| one two \| some , text , QUX: : FOO \| some text BAR \| one two \| some , text , QUX: :some, text, FOO \| some text BAR \| one two \| some : :some, text, FOO \| some text BAR \| one two \| text : :some, text, FOO \| some text BAR \| one two \| QUX: [download] (Availability of `\K` with Perl versions 5.10+ can make the `(?: (?<= [\|]) \| \A)`) expression a little nicer.) Update:* An "impure" variation. No big improvement; maybe of some interest. c:\@Work\Perl\monks>perl -wMstrict -le "my @lines = ( 'some text foo \| some text BAR \| oh , no , commas \| some text BAZ \| + some text QUX', 'some text Oof \| some, text, BAR \| some , text , BAZ \| some text QU +X', 'some, text, FOO \| some text BAR \| one two \| some , text , QUX', ); ;; my $got_commas = qr{ (?: (?<= [\|]) \| \A) [^\|,]* (?: , [^\|,])+ (?= [\|] \| \z) }xms; ;; local our @out; use re 'eval'; m{ \A (.?) ($got_commas) (?= (.*)) (?{ push @out, map qq{>$1$_$3<}, split m{ , }xms, $2 }) (?!) }xmsg for @lines; ;; print for @out; " >some text foo \| some text BAR \| oh \| some text BAZ \| some text QUX< >some text foo \| some text BAR \| no \| some text BAZ \| some text QUX< >some text foo \| some text BAR \| commas \| some text BAZ \| some text QU +X< >some text Oof \| some\| some , text , BAZ \| some text QUX< >some text Oof \| text\| some , text , BAZ \| some text QUX< >some text Oof \| BAR \| some , text , BAZ \| some text QUX< >some text Oof \| some, text, BAR \| some \| some text QUX< >some text Oof \| some, text, BAR \| text \| some text QUX< >some text Oof \| some, text, BAR \| BAZ \| some text QUX< >some\| some text BAR \| one two \| some , text , QUX< > text\| some text BAR \| one two \| some , text , QUX< > FOO \| some text BAR \| one two \| some , text , QUX< >some, text, FOO \| some text BAR \| one two \| some < >some, text, FOO \| some text BAR \| one two \| text < >some, text, FOO \| some text BAR \| one two \| QUX< [download] Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^3: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 23, 2018 at 02:14 UTC
That is printing lines with commas. How about this with just a small tweak to allow commas in the first and/or last field. #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { my @queue = $_; while( @queue ) { local $_ = shift @queue; if( / [^\|,\n]* (?: , [^\|,\n]* )+ /x ) { push @queue, map "$`$_$'", split /,/, $&; } else { print; } } } __DATA__ some text foo \| some text BAR \| oh , no , commas \| some text BAZ \| som +e text QUX some text Oof \| some, text, BAR \| some , text , BAZ \| some text QUX some,text,FOO \| some text BAR \| one two \| some , text , QUX [download] Outputs: some text foo \| some text BAR \| oh \| some text BAZ \| some text QUX some text foo \| some text BAR \| no \| some text BAZ \| some text QUX some text foo \| some text BAR \| commas \| some text BAZ \| some text QUX some text Oof \| some\| some \| some text QUX some text Oof \| some\| text \| some text QUX some text Oof \| some\| BAZ \| some text QUX some text Oof \| text\| some \| some text QUX some text Oof \| text\| text \| some text QUX some text Oof \| text\| BAZ \| some text QUX some text Oof \| BAR \| some \| some text QUX some text Oof \| BAR \| text \| some text QUX some text Oof \| BAR \| BAZ \| some text QUX some\| some text BAR \| one two \| some some\| some text BAR \| one two \| text some\| some text BAR \| one two \| QUX text\| some text BAR \| one two \| some text\| some text BAR \| one two \| text text\| some text BAR \| one two \| QUX FOO \| some text BAR \| one two \| some FOO \| some text BAR \| one two \| text FOO \| some text BAR \| one two \| QUX [download]	[reply] [d/l] [select]
Re^4: splinting a line of text by comma by AnomalousMonk (Archbishop) on Nov 23, 2018 at 04:11 UTC
Re^5: splinting a line of text by comma by poj (Abbot) on Nov 23, 2018 at 07:11 UTC
Re^5: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 23, 2018 at 04:30 UTC
Re: splinting a line of text by comma by Tux (Canon) on Nov 23, 2018 at 07:57 UTC
Why would you overcomplicate this? Use Spreadsheet::Read (or Spreadsheet::ParseXLSX directly) to read your Excel sheet and then use Text::CSV_XS (or Text::CSV) on the content of the cell(s) you want to split Here is a tested example. The pm1226189.xlsx that I used has comma-separated fields that also include newlines and quotation. If your Excel also contains encoded fields, you probably have to deal with that inside the inner loop #!/pro/bin/perl use 5.14.1; use warnings; use Data::Peek; use Spreadsheet::Read; use Text::CSV_XS; my $book = Spreadsheet::Read->new ("pm1226189.xlsx") or die "Cannot read spreadsheet: $!\n"; my $sheet = $book->sheet (1) or die "Book has no sheets\n"; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whitespace => 1, }); my @data; foreach my $col (1 .. $sheet->maxcol) { foreach my $row (1 .. $sheet->maxrow) { my $cell = $sheet->cell ($col, $row); if ($cell && $cell =~ m/,/) { open my $fh, "<", \$cell; $data[$col][$row] = $csv->getline ($fh); close $fh; } else { $data[$col][$row] = $cell; } } } DDumper \@data; [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l]
Re: splinting a line of text by comma by swl (Parson) on Nov 22, 2018 at 22:31 UTC
This appears to work given the data you provided. However, it does not handle multiple fields that contain commas. It also does not handle quoted commas and the like. You would need to use Text::CSV or similar for that. Adapt the outer loop as needed (e.g. if reading from a file handle): `use 5.010; my @lines = ('some text FOO \| some text BAR \| oh , no , commas \| some text BAZ + \| some text QUX'); for my $line (@lines) { chomp $line; if ($line =~ /,/) { my @items = split /,/, $line; my $prefix = substr $items[0], 0, rindex $items[0], '\|'; my $suffix = substr $items[-1], index $items[-1], '\|'; #say $prefix; #say $suffix; foreach my $item (@items) { say "$prefix \| $item $suffix"; } } else { say $line; } }` [download]	[reply] [d/l]
Re: splinting a line of text by comma by stevieb (Canon) on Nov 22, 2018 at 22:18 UTC
Can't speak for everyone, but I'd like to see the code you've tried that isn't working. Although Perlmonks isn't as stringent in the whole "we're not a code writing service" as some other sites, and periodically no-code posts sometimes do get workable answers, it's only fair that you provide the code you have tried, so we do know you've made some form of effort. Note that you've got a typo in your Subject line: `s/splinting/splitting/`.	[reply] [d/l]
Re: splinting a line of text by comma by LanX (Saint) on Nov 23, 2018 at 03:00 UTC
not recommended: `use strict; use warnings; sub globify { local $" = ","; my @commas = map qq{"$_"}, split /,/, $_; @commas == 1 ? qq{"$_"} : "{@commas}"; } while (<DATA>) { print glob ( join '\|', map {globify} split /\\|/ ,$_ ); } __DATA__ some text FOO \| some text BAR \| oh , no , commas \| some text BAZ \| som +e text QUX some text FOO \| some text BAR \| single \| some text BAZ \| some text QUX` [download] `some text FOO \| some text BAR \| oh \| some text BAZ \| some text QUX some text FOO \| some text BAR \| no \| some text BAZ \| some text QUX some text FOO \| some text BAR \| commas \| some text BAZ \| some text QUX some text FOO \| some text BAR \| single \| some text BAZ \| some text QUX` [download] Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply] [d/l] [select]
Re^2: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 23, 2018 at 04:20 UTC
Still very much not recommended :) `#!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; print glob qq("$_") =~ s/[^\|\n]+/{$&}/gr while <DATA>; __DATA__ some text foo \| some text BAR \| oh , no , commas \| some text BAZ \| som +e text QUX some text Oof \| some, text, BAR \| some , text , BAZ \| some text QUX some,text,FOO \| some text BAR \| one two \| some , text , QUX` [download] Outputs: some text foo \| some text BAR \| oh \| some text BAZ \| some text QUX some text foo \| some text BAR \| no \| some text BAZ \| some text QUX some text foo \| some text BAR \| commas \| some text BAZ \| some text QUX some text Oof \| some\| some \| some text QUX some text Oof \| some\| text \| some text QUX some text Oof \| some\| BAZ \| some text QUX some text Oof \| text\| some \| some text QUX some text Oof \| text\| text \| some text QUX some text Oof \| text\| BAZ \| some text QUX some text Oof \| BAR \| some \| some text QUX some text Oof \| BAR \| text \| some text QUX some text Oof \| BAR \| BAZ \| some text QUX some\| some text BAR \| one two \| some some\| some text BAR \| one two \| text some\| some text BAR \| one two \| QUX text\| some text BAR \| one two \| some text\| some text BAR \| one two \| text text\| some text BAR \| one two \| QUX FOO \| some text BAR \| one two \| some FOO \| some text BAR \| one two \| text FOO \| some text BAR \| one two \| QUX [download]	[reply] [d/l] [select]
Re^3: splinting a line of text by comma by LanX (Saint) on Nov 23, 2018 at 17:02 UTC
yeah, better. I was already in bed half sleeping and was surprised about whitespace handling in glob. > Still very much not recommended :) Well if you find a way to escape curlies and other "wildcard characters" from the input... :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply]
Re^4: splinting a line of text by comma by tybalt89 (Monsignor) on Nov 26, 2018 at 15:22 UTC
Re^5: splinting a line of text by comma by LanX (Saint) on Nov 26, 2018 at 15:27 UTC
Re: splinting a line of text by comma by kcott (Archbishop) on Nov 23, 2018 at 08:16 UTC
TMTOWTDI: `$ perl -nE ' chomp; my @x = split /\\|/; my ($y) = grep $x[$_] =~ /,/, 0..$#x; for (split /,/, $x[$y]) { splice @x, $y, 1, $_; say join "\|", @x; } ' a,b,c\|d\|e a\|d\|e b\|d\|e c\|d\|e a\|b,c,d\|e a\|b\|e a\|c\|e a\|d\|e a\|b\|c,d,e a\|b\|c a\|b\|d a\|b\|e` [download] Although, for dealing with spreadsheet data, Text::CSV would always be my first port of call. Update: The code I originally posted had: `my @z = split /,/, $x[$y]; for (@z) {` [download] I've reduced that to: `for (split /,/, $x[$y]) {` [download] The output is identical for both cases. — Ken	[reply] [d/l] [select]


Perl: the Markov chain saw
	PerlMonks