How to Split on specific occurrence of a comma

dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm a little stuck and was hoping one of you could help get me out of the mud :-). I had a script that worked until we added over 999 values to it recently. I now have about 3,000 values separated by a comma on 1 line. Unfortunately, I can't change the way the data is provided. Ultimately, I need to get a max of 999 of them into an array or some other structure so I can then perform a query from the db with those values. i.e, select * from table_name where values in ( ). If I can just get chunks of 999 stuffed into an array I could then do a "for loop" until all values have been queried and added as a new worksheet to my excel spreadsheet. It would need to be dynamic too since the values could increase or decrease over time.

Below is a snippet of the code that used to work (until we broke the 999 threshold that Oracle allows in an "IN" clause by adding over 3k comma separated values in the input record.)

open INPUT_FILE, @row[$cell_column] or die "Can't open @row[$cell_colu
+mn]: $!\n";
                while ( $line = <INPUT_FILE> ) {
                        chomp $line;
                        if ($line =~ /^#/ || $line =~ /^\s*$/) {
                                next;
                        }
                        $line =~ s/^[^=]*\=//;
                        @users = split /,/, $line;
                        
                }
                close INPUT_FILE;

                $placeholders = join ( ',', ('?') x @users );

                my $SQLLock=qq{
                select user,name from teams
                where user in ( $placeholders )
                 };

                my $sthL = $dbh->prepare($SQLLock);
                $sthL->execute(@users);


       while ( my @row = $sthL->fetchrow_array() ) {
                print ">@row<\n";
                for($cell_column=0; $cell_column<=$#row; $cell_column+
++){
                        $worksheet->write_string($i, $cell_column,  @r
+ow[$cell_column],$format1);
                        }
                $worksheet->set_row($i, undef, undef, 0, 0, 1);
                $i++;
                }
        $sthL->finish;
[download]

Here is my attempt to get the split function broken up by lines of 3. ( For the example I chose a small number because it's more managable. Just need to prove out the logic, but clearly the "limit" clause of the split isn't keeping 3 values per line. This does not work.

#!/usr/bin/env perl
use DBI;
use POSIX 'strftime';
use strict;
use Data::Dumper;

my $line;
my @teams;


while ( $line = <DATA> ) {
                        chomp $line;
                        if ($line =~ /^#/ || $line =~ /^\s*$/) {
                                next;
                        }
                        $line =~ s/^[^=]*\=//;
                        @teams = map { [ split /,/, $_, 3 ] } $line;
                }
print Dumper \@teams;

__DATA__
Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG
+ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS
[download]

Would greatly appreciate any help or assistance that could be provided. Thanks!

Comment on How to Split on specific occurrence of a comma Select or Download Code

Replies are listed 'Best First'.
Re: How to Split on specific occurrence of a comma by GrandFather (Saint) on Dec 03, 2020 at 22:01 UTC
Keep the split simple and use splice to slice up the entries into appropriate size chunks: `#!/usr/bin/env perl use strict; use Data::Dumper; my $line; my @teams; my @pending; while ( $line = <DATA> ) { next if $line =~ /^#/ \|\| $line =~ /^\s$/; chomp $line; $line =~ s/^[^=]\=//; push @pending, split /,/, $line; push @teams, [splice @pending, 0, 3] while @pending >= 3; } push @teams, \@pending if @pending; print Dumper \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS Teams=STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS` [download] Prints: `$VAR1 = [ [ 'PATRIOTS', 'BILLS', 'DOLPHINS' ], [ 'JETS', 'COWBOYS', 'GIANTS' ], [ 'EAGLES', 'REDSKINS', 'BENGALS' ], [ 'OILERS', 'STEELERS', 'BROWNS' ], [ 'SEAHAWKS', 'RAMS', '49ERS' ], [ 'RAIDERS' ] ];` [download] Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond	[reply] [d/l] [select]
Re: How to Split on specific occurrence of a comma by LanX (Saint) on Dec 03, 2020 at 21:27 UTC
Hi instead of limiting `split` , I'd rather swallow all splitted cells into a long `@array` and then loop over that `@array` with `splice` to chunk out whatever length you want. here a little demo in the debugger, looping over the alphabet in chunks of 10 `DB<208> @array = a..z DB<209> while ( my @b = splice @array, 0, 10 ) { say "@b" } a b c d e f g h i j k l m n o p q r s t u v w x y z DB<210>` [download] HTH! :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^2: How to Split on specific occurrence of a comma by dirtdog (Monk) on Dec 03, 2020 at 22:07 UTC
I think this approach should work. thanks!	[reply]
Re^3: How to Split on specific occurrence of a comma by LanX (Saint) on Dec 03, 2020 at 22:34 UTC
Your welcome! :) on a side note This `$line =~ s/^[^=]*\=//;` is IMHO better written as `my ($var,$list) = split /=/, $line, 2;` ° It's not only much easier to read and self-documenting, you'll also have a chance to check if `$var eq "Teams"` And if $var is of no interest you can also write `my ( undef, $list ) = split /=/, $line;` debugger demo `DB<211> $line = "Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,E +AGLES,REDSKINS,BENGALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAI +DERS" DB<212> x ($var,$list) = split /=/, $line 0 'Teams' 1 'PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENGAL +S,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS'` [download] Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} °) the 2 in case you are worried that the character `"="` appears in your list elements	[reply] [d/l] [select]
Re: How to Split on specific occurrence of a comma by eyepopslikeamosquito (Archbishop) on Dec 03, 2020 at 23:26 UTC
Your problem reminds of this old node: Rosetta code: Split an array into chunks Adapting that code to your problem shows how we can employ functions from List::MoreUtils to this problem. use strict; use warnings; use List::MoreUtils qw(part natatime); use Data::Dumper; # Load test DATA below into an array of teams (adapted from original n +ode) my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ \|\| $line =~ /^\s$/) { next; } $line =~ s/^[^=]\=//; push @teams, split( /,/, $line ); } print Dumper \@teams; # ------------------------------------------------------ # Adapted from https://perlmonks.org/?node_id=861938 # Using natatime sub chunk_array { my ($n, @vals) = @_; my $str; my $iter = natatime($n, @vals); while ( my @line = $iter->() ) { $str .= join(",", @line) . ",\n"; } $str =~ s/,$//; return $str; } # Using part sub chunk_array_2 { my ($n, @vals) = @_; my $str; my $i = 0; $str = join "", map { join(",", @$_).",\n" } part { $i++/$n } @val +s; $str =~ s/,$//; return $str; } my $v1 = chunk_array(3, @teams); print "Version 1--------------\n"; print $v1; my $v2 = chunk_array_2(3, @teams); print "Version 2--------------\n"; print $v2; #--------------------------------------- __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS [download] Running the test program above produces: `$VAR1 = [ 'PATRIOTS', 'BILLS', 'DOLPHINS', 'JETS', 'COWBOYS', 'GIANTS', 'EAGLES', 'REDSKINS', 'BENGALS', 'OILERS', 'STEELERS', 'BROWNS', 'SEAHAWKS', 'RAMS', '49ERS', 'RAIDERS' ]; Version 1-------------- PATRIOTS,BILLS,DOLPHINS, JETS,COWBOYS,GIANTS, EAGLES,REDSKINS,BENGALS, OILERS,STEELERS,BROWNS, SEAHAWKS,RAMS,49ERS, RAIDERS Version 2-------------- PATRIOTS,BILLS,DOLPHINS, JETS,COWBOYS,GIANTS, EAGLES,REDSKINS,BENGALS, OILERS,STEELERS,BROWNS, SEAHAWKS,RAMS,49ERS, RAIDERS` [download] References Added Later Scalar-List-Utils by Paul Evans List::Util (core Perl module) Scalar::Util (core Perl module) Sub::Util (core Perl module) List::UtilsBy by Paul Evans List::MoreUtils by Jens Rehsack (included with Strawberry Perl) List::SomeUtils by Dave Rolsky List::AllUtils by Dave Rolsky	[reply] [d/l] [select]
Re: How to Split on specific occurrence of a comma by NetWallah (Canon) on Dec 04, 2020 at 04:28 UTC
IMHO - instead of adding the complexity of batching with artificial limits, my proposal is: use a temporary table to store the list of things you are looking for. Then let the DB do the work of intersecting the 2 tables. This is no longer subject to memory or list-size limits, and is likely better optimized by the db engine. "Imaginary friends are a sign of a mental disorder if they cause distress, including antisocial behavior. Religion frequently meets that description"	[reply]
Re: How to Split on specific occurrence of a comma by tybalt89 (Monsignor) on Dec 03, 2020 at 22:08 UTC
`#!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11124600 use warnings; my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ \|\| $line =~ /^\s$/) { next; } $line =~ s/^[^=]\=//; push @teams, map [ split /,/ ], "$line," =~ /((?:.*?,){1,3})/gx; } use Data::Dump 'dd'; dd \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS` [download] Outputs: `[ ["PATRIOTS", "BILLS", "DOLPHINS"], ["JETS", "COWBOYS", "GIANTS"], ["EAGLES", "REDSKINS", "BENGALS"], ["OILERS", "STEELERS", "BROWNS"], ["SEAHAWKS", "RAMS", "49ERS"], ["RAIDERS"], ]` [download]	[reply] [d/l] [select]
Re: How to Split on specific occurrence of a comma by tybalt89 (Monsignor) on Dec 03, 2020 at 22:28 UTC
Playing games with split :) #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11124600 use warnings; my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ \|\| $line =~ /^\s$/) { next; } $line =~ s/^[^=]\=//; my $count = 3; while( length $line ) { push @teams, [ split /,/, $line, $count + 1 ]; $line = defined $teams[-1][$count] ? pop @{ $teams[-1] } : ''; } } use Data::Dump 'dd'; dd \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS [download]	[reply] [d/l]
Re: How to Split on specific occurrence of a comma by swl (Parson) on Dec 03, 2020 at 21:35 UTC
The limit arg to split lumps the rest of the string into the last array entry. If you want n items, then split with a limit of n+1 and ignore or discard the last entry in the array. Your code was also using a map to create an array ref, which is not needed here. This updated snippet from your example code will give an array of two items: `@teams = split /,/, $line, 3; pop @teams;` [download]	[reply] [d/l]
Re: How to Split on specific occurrence of a comma by jszinger (Scribe) on Dec 03, 2020 at 23:22 UTC
Another approch might be to load all 1000+ users into a temporary table and then join against that.	[reply]
Re^2: How to Split on specific occurrence of a comma by dirtdog (Monk) on Dec 04, 2020 at 16:14 UTC
Thank you so much to everyone who replied to my question. All of you Monks are so generous with your time and with sharing your knowledge. You are extremely appreciated!	[reply]
Re: How to Split on specific occurrence of a comma by leszekdubiel (Scribe) on Dec 10, 2020 at 20:22 UTC
Dont' build "IN" SQL clause. Build temporary table and join with that table `SELECT * INTO #MyTempTable FROM (VALUES @{[ $placeholders ]}) AS Temp (Xxx); SELECT .... JOIN #MyTempTAble MTT ON MTT.Xxx = ... WHERE ... GROUP BY ...` [download]	[reply] [d/l]
Re: How to Split on specific occurrence of a comma by perlfan (Vicar) on Dec 07, 2020 at 20:47 UTC
until we added over 999 values to it recently. I now have about 3,000 values separated by a comma on 1 line. Your data model is broke.	[reply]
Re^2: How to Split on specific occurrence of a comma by marto (Cardinal) on Dec 07, 2020 at 22:55 UTC
Very next sentence:"Unfortunately, I can't change the way the data is provided.". How does your response address either this issue or the question asked?	[reply]

Back to Seekers of Perl Wisdom