Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

splinting a line of text by comma

by benaw (Novice)
on Nov 22, 2018 at 22:04 UTC ( [id://1226189]=perlquestion: print w/replies, xml ) Need Help??

benaw has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Background: I have an excel spreadsheet and one of the columns has multiple values separated with a comma like an array. What I want to do is unwind the array so each value of the array is on a separate line and all other fields are duplicated or carried with it mongo documentation of $unwind does what I'd like or it may be a map reduce?

In the below example | is the delimiter in a CSV because I'm trying to target the commas in a particular cell and placing all the same text on each line. For example:

Source:

some text FOO | some text BAR | oh , no , commas | some text BAZ | some text QUX

Output:

some text FOO | some text BAR | oh | some text BAZ | some text QUX some text FOO | some text BAR | no | some text BAZ | some text QUX some text FOO | some text BAR |commas | some text BAZ | some text QUX

Replies are listed 'Best First'.
Re: splinting a line of text by comma
by tybalt89 (Monsignor) on Nov 22, 2018 at 22:36 UTC
    #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { if( /\|([^|,]*(?:,[^|,]*)+)\|/ ) { print "$`|$_|$'" for split /,/, $1; } else { print; } } __DATA__ some text FOO | some text BAR | oh , no , commas | some text BAZ | so +me text QUX some text FOO | some text BAR | single | some text BAZ | some text QU +X

    Outputs:

    some text FOO | some text BAR | oh | some text BAZ | some text QUX some text FOO | some text BAR | no | some text BAZ | some text QUX some text FOO | some text BAR | commas | some text BAZ | some text QU +X some text FOO | some text BAR | single | some text BAZ | some text QU +X
Re: splinting a line of text by comma
by tybalt89 (Monsignor) on Nov 22, 2018 at 22:47 UTC

    And here's one that handles multiple columns with commas:

    #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { my @queue = $_; while( @queue ) { local $_ = shift @queue; if( /\|([^|,]*(?:,[^|,]*)+)\|/ ) { push @queue, map "$`|$_|$'", split /,/, $1; } else { print; } } } __DATA__ some text FOO | some text BAR | oh , no , commas | some text BAZ | so +me text QUX some text FOO | some text BAR | single | some text BAZ | some text QU +X some text FOO | some text BAR | oh , no , commas | one, two | some te +xt QUX

    Outputs:

    some text FOO | some text BAR | oh | some text BAZ | some text QUX some text FOO | some text BAR | no | some text BAZ | some text QUX some text FOO | some text BAR | commas | some text BAZ | some text QU +X some text FOO | some text BAR | single | some text BAZ | some text QU +X some text FOO | some text BAR | oh | one| some text QUX some text FOO | some text BAR | oh | two | some text QUX some text FOO | some text BAR | no | one| some text QUX some text FOO | some text BAR | no | two | some text QUX some text FOO | some text BAR | commas | one| some text QUX some text FOO | some text BAR | commas | two | some text QUX

      Rather than iterating through every permutation of every comma-separated sub-phrase, wouldn't it be better to print each set of sub-phrases once per line? Also, comma-separated sub-phrases at the beginning or end of the line will not be processed. How about something like:

      c:\@Work\Perl\monks>perl -wMstrict -le "my @lines = ( 'some text foo | some text BAR | oh , no , commas | some text BAZ | + some text QUX', 'some text Oof | some, text, BAR | some , text , BAZ | some text QU +X', 'some, text, FOO | some text BAR | one two | some , text , QUX', ); ;; my $got_commas = qr{ (?: (?<= [|]) | \A) [^|,]* (?: , [^|,]*)+ (?= [|] | \z) }xms; ;; for my $line (@lines) { while ($line =~ m{ ($got_commas) }xmsg) { my $before = substr $line, 0, $-[1]; my $after = substr $line, $+[1]; print qq{:$before$_$after:} for split m{ , }xms, $1; } } " :some text foo | some text BAR | oh | some text BAZ | some text QUX: :some text foo | some text BAR | no | some text BAZ | some text QUX: :some text foo | some text BAR | commas | some text BAZ | some text QU +X: :some text Oof | some| some , text , BAZ | some text QUX: :some text Oof | text| some , text , BAZ | some text QUX: :some text Oof | BAR | some , text , BAZ | some text QUX: :some text Oof | some, text, BAR | some | some text QUX: :some text Oof | some, text, BAR | text | some text QUX: :some text Oof | some, text, BAR | BAZ | some text QUX: :some| some text BAR | one two | some , text , QUX: : text| some text BAR | one two | some , text , QUX: : FOO | some text BAR | one two | some , text , QUX: :some, text, FOO | some text BAR | one two | some : :some, text, FOO | some text BAR | one two | text : :some, text, FOO | some text BAR | one two | QUX:
      (Availability of  \K with Perl versions 5.10+ can make the  (?: (?<= [|]) | \A)) expression a little nicer.)

      Update: An "impure" variation. No big improvement; maybe of some interest.

      c:\@Work\Perl\monks>perl -wMstrict -le "my @lines = ( 'some text foo | some text BAR | oh , no , commas | some text BAZ | + some text QUX', 'some text Oof | some, text, BAR | some , text , BAZ | some text QU +X', 'some, text, FOO | some text BAR | one two | some , text , QUX', ); ;; my $got_commas = qr{ (?: (?<= [|]) | \A) [^|,]* (?: , [^|,]*)+ (?= [|] | \z) }xms; ;; local our @out; use re 'eval'; m{ \A (.*?) ($got_commas) (?= (.*)) (?{ push @out, map qq{>$1$_$3<}, split m{ , }xms, $2 }) (?!) }xmsg for @lines; ;; print for @out; " >some text foo | some text BAR | oh | some text BAZ | some text QUX< >some text foo | some text BAR | no | some text BAZ | some text QUX< >some text foo | some text BAR | commas | some text BAZ | some text QU +X< >some text Oof | some| some , text , BAZ | some text QUX< >some text Oof | text| some , text , BAZ | some text QUX< >some text Oof | BAR | some , text , BAZ | some text QUX< >some text Oof | some, text, BAR | some | some text QUX< >some text Oof | some, text, BAR | text | some text QUX< >some text Oof | some, text, BAR | BAZ | some text QUX< >some| some text BAR | one two | some , text , QUX< > text| some text BAR | one two | some , text , QUX< > FOO | some text BAR | one two | some , text , QUX< >some, text, FOO | some text BAR | one two | some < >some, text, FOO | some text BAR | one two | text < >some, text, FOO | some text BAR | one two | QUX<


      Give a man a fish:  <%-{-{-{-<

        That is printing lines with commas. How about this with just a small tweak to allow commas in the first and/or last field.

        #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; while( <DATA> ) { my @queue = $_; while( @queue ) { local $_ = shift @queue; if( / [^|,\n]* (?: , [^|,\n]* )+ /x ) { push @queue, map "$`$_$'", split /,/, $&; } else { print; } } } __DATA__ some text foo | some text BAR | oh , no , commas | some text BAZ | som +e text QUX some text Oof | some, text, BAR | some , text , BAZ | some text QUX some,text,FOO | some text BAR | one two | some , text , QUX

        Outputs:

        some text foo | some text BAR | oh | some text BAZ | some text QUX some text foo | some text BAR | no | some text BAZ | some text QUX some text foo | some text BAR | commas | some text BAZ | some text QUX some text Oof | some| some | some text QUX some text Oof | some| text | some text QUX some text Oof | some| BAZ | some text QUX some text Oof | text| some | some text QUX some text Oof | text| text | some text QUX some text Oof | text| BAZ | some text QUX some text Oof | BAR | some | some text QUX some text Oof | BAR | text | some text QUX some text Oof | BAR | BAZ | some text QUX some| some text BAR | one two | some some| some text BAR | one two | text some| some text BAR | one two | QUX text| some text BAR | one two | some text| some text BAR | one two | text text| some text BAR | one two | QUX FOO | some text BAR | one two | some FOO | some text BAR | one two | text FOO | some text BAR | one two | QUX
Re: splinting a line of text by comma
by Tux (Canon) on Nov 23, 2018 at 07:57 UTC

    Why would you overcomplicate this? Use Spreadsheet::Read (or Spreadsheet::ParseXLSX directly) to read your Excel sheet and then use Text::CSV_XS (or Text::CSV) on the content of the cell(s) you want to split

    Here is a tested example. The pm1226189.xlsx that I used has comma-separated fields that also include newlines and quotation. If your Excel also contains encoded fields, you probably have to deal with that inside the inner loop

    #!/pro/bin/perl use 5.14.1; use warnings; use Data::Peek; use Spreadsheet::Read; use Text::CSV_XS; my $book = Spreadsheet::Read->new ("pm1226189.xlsx") or die "Cannot read spreadsheet: $!\n"; my $sheet = $book->sheet (1) or die "Book has no sheets\n"; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, allow_whitespace => 1, }); my @data; foreach my $col (1 .. $sheet->maxcol) { foreach my $row (1 .. $sheet->maxrow) { my $cell = $sheet->cell ($col, $row); if ($cell && $cell =~ m/,/) { open my $fh, "<", \$cell; $data[$col][$row] = $csv->getline ($fh); close $fh; } else { $data[$col][$row] = $cell; } } } DDumper \@data;

    Enjoy, Have FUN! H.Merijn
Re: splinting a line of text by comma
by swl (Parson) on Nov 22, 2018 at 22:31 UTC

    This appears to work given the data you provided. However, it does not handle multiple fields that contain commas. It also does not handle quoted commas and the like. You would need to use Text::CSV or similar for that.

    Adapt the outer loop as needed (e.g. if reading from a file handle):

    use 5.010; my @lines = ('some text FOO | some text BAR | oh , no , commas | some text BAZ + | some text QUX'); for my $line (@lines) { chomp $line; if ($line =~ /,/) { my @items = split /,/, $line; my $prefix = substr $items[0], 0, rindex $items[0], '|'; my $suffix = substr $items[-1], index $items[-1], '|'; #say $prefix; #say $suffix; foreach my $item (@items) { say "$prefix | $item $suffix"; } } else { say $line; } }
Re: splinting a line of text by comma
by stevieb (Canon) on Nov 22, 2018 at 22:18 UTC

    Can't speak for everyone, but I'd like to see the code you've tried that isn't working.

    Although Perlmonks isn't as stringent in the whole "we're not a code writing service" as some other sites, and periodically no-code posts sometimes do get workable answers, it's only fair that you provide the code you have tried, so we do know you've made some form of effort.

    Note that you've got a typo in your Subject line: s/splinting/splitting/.

Re: splinting a line of text by comma
by LanX (Saint) on Nov 23, 2018 at 03:00 UTC
    not recommended:

    use strict; use warnings; sub globify { local $" = ","; my @commas = map qq{"$_"}, split /,/, $_; @commas == 1 ? qq{"$_"} : "{@commas}"; } while (<DATA>) { print glob ( join '|', map {globify} split /\|/ ,$_ ); } __DATA__ some text FOO | some text BAR | oh , no , commas | some text BAZ | som +e text QUX some text FOO | some text BAR | single | some text BAZ | some text QUX
    some text FOO | some text BAR | oh | some text BAZ | some text QUX some text FOO | some text BAR | no | some text BAZ | some text QUX some text FOO | some text BAR | commas | some text BAZ | some text QUX some text FOO | some text BAR | single | some text BAZ | some text QUX

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      Still very much not recommended :)

      #!/usr/bin/perl # https://perlmonks.org/?node_id=1226189 use strict; use warnings; print glob qq("$_") =~ s/[^|\n]+/{$&}/gr while <DATA>; __DATA__ some text foo | some text BAR | oh , no , commas | some text BAZ | som +e text QUX some text Oof | some, text, BAR | some , text , BAZ | some text QUX some,text,FOO | some text BAR | one two | some , text , QUX

      Outputs:

      some text foo | some text BAR | oh | some text BAZ | some text QUX some text foo | some text BAR | no | some text BAZ | some text QUX some text foo | some text BAR | commas | some text BAZ | some text QUX some text Oof | some| some | some text QUX some text Oof | some| text | some text QUX some text Oof | some| BAZ | some text QUX some text Oof | text| some | some text QUX some text Oof | text| text | some text QUX some text Oof | text| BAZ | some text QUX some text Oof | BAR | some | some text QUX some text Oof | BAR | text | some text QUX some text Oof | BAR | BAZ | some text QUX some| some text BAR | one two | some some| some text BAR | one two | text some| some text BAR | one two | QUX text| some text BAR | one two | some text| some text BAR | one two | text text| some text BAR | one two | QUX FOO | some text BAR | one two | some FOO | some text BAR | one two | text FOO | some text BAR | one two | QUX
        yeah, better.

        I was already in bed half sleeping and was surprised about whitespace handling in glob.

        > Still very much not recommended :)

        Well if you find a way to escape curlies and other "wildcard characters" from the input... :)

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re: splinting a line of text by comma
by kcott (Archbishop) on Nov 23, 2018 at 08:16 UTC

    TMTOWTDI:

    $ perl -nE ' chomp; my @x = split /\|/; my ($y) = grep $x[$_] =~ /,/, 0..$#x; for (split /,/, $x[$y]) { splice @x, $y, 1, $_; say join "|", @x; } ' a,b,c|d|e a|d|e b|d|e c|d|e a|b,c,d|e a|b|e a|c|e a|d|e a|b|c,d,e a|b|c a|b|d a|b|e

    Although, for dealing with spreadsheet data, Text::CSV would always be my first port of call.

    Update: The code I originally posted had:

    my @z = split /,/, $x[$y]; for (@z) {

    I've reduced that to:

    for (split /,/, $x[$y]) {

    The output is identical for both cases.

    — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1226189]
Approved by stevieb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2024-04-19 04:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found