Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

How to Split on specific occurrence of a comma

by dirtdog (Monk)
on Dec 03, 2020 at 21:13 UTC ( #11124600=perlquestion: print w/replies, xml ) Need Help??

dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm a little stuck and was hoping one of you could help get me out of the mud :-). I had a script that worked until we added over 999 values to it recently. I now have about 3,000 values separated by a comma on 1 line. Unfortunately, I can't change the way the data is provided. Ultimately, I need to get a max of 999 of them into an array or some other structure so I can then perform a query from the db with those values. i.e, select * from table_name where values in ( ). If I can just get chunks of 999 stuffed into an array I could then do a "for loop" until all values have been queried and added as a new worksheet to my excel spreadsheet. It would need to be dynamic too since the values could increase or decrease over time.

Below is a snippet of the code that used to work (until we broke the 999 threshold that Oracle allows in an "IN" clause by adding over 3k comma separated values in the input record.)

open INPUT_FILE, @row[$cell_column] or die "Can't open @row[$cell_colu +mn]: $!\n"; while ( $line = <INPUT_FILE> ) { chomp $line; if ($line =~ /^#/ || $line =~ /^\s*$/) { next; } $line =~ s/^[^=]*\=//; @users = split /,/, $line; } close INPUT_FILE; $placeholders = join ( ',', ('?') x @users ); my $SQLLock=qq{ select user,name from teams where user in ( $placeholders ) }; my $sthL = $dbh->prepare($SQLLock); $sthL->execute(@users); while ( my @row = $sthL->fetchrow_array() ) { print ">@row<\n"; for($cell_column=0; $cell_column<=$#row; $cell_column+ ++){ $worksheet->write_string($i, $cell_column, @r +ow[$cell_column],$format1); } $worksheet->set_row($i, undef, undef, 0, 0, 1); $i++; } $sthL->finish;

Here is my attempt to get the split function broken up by lines of 3. ( For the example I chose a small number because it's more managable. Just need to prove out the logic, but clearly the "limit" clause of the split isn't keeping 3 values per line. This does not work.

#!/usr/bin/env perl use DBI; use POSIX 'strftime'; use strict; use Data::Dumper; my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ || $line =~ /^\s*$/) { next; } $line =~ s/^[^=]*\=//; @teams = map { [ split /,/, $_, 3 ] } $line; } print Dumper \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS

Would greatly appreciate any help or assistance that could be provided. Thanks!

Replies are listed 'Best First'.
Re: How to Split on specific occurrence of a comma
by GrandFather (Saint) on Dec 03, 2020 at 22:01 UTC

    Keep the split simple and use splice to slice up the entries into appropriate size chunks:

    #!/usr/bin/env perl use strict; use Data::Dumper; my $line; my @teams; my @pending; while ( $line = <DATA> ) { next if $line =~ /^#/ || $line =~ /^\s*$/; chomp $line; $line =~ s/^[^=]*\=//; push @pending, split /,/, $line; push @teams, [splice @pending, 0, 3] while @pending >= 3; } push @teams, \@pending if @pending; print Dumper \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS Teams=STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS

    Prints:

    $VAR1 = [ [ 'PATRIOTS', 'BILLS', 'DOLPHINS' ], [ 'JETS', 'COWBOYS', 'GIANTS' ], [ 'EAGLES', 'REDSKINS', 'BENGALS' ], [ 'OILERS', 'STEELERS', 'BROWNS' ], [ 'SEAHAWKS', 'RAMS', '49ERS' ], [ 'RAIDERS' ] ];
    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: How to Split on specific occurrence of a comma
by LanX (Cardinal) on Dec 03, 2020 at 21:27 UTC
    Hi

    instead of limiting split , I'd rather swallow all splitted cells into a long @array and then loop over that @array with splice to chunk out whatever length you want.

    here a little demo in the debugger, looping over the alphabet in chunks of 10

    DB<208> @array = a..z DB<209> while ( my @b = splice @array, 0, 10 ) { say "@b" } a b c d e f g h i j k l m n o p q r s t u v w x y z DB<210>

    HTH! :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I think this approach should work. thanks!

        Your welcome! :)

        on a side note

        This

          $line =~ s/^[^=]*\=//;

        is IMHO better written as

          my ($var,$list) = split /=/, $line, 2;

        It's not only much easier to read and self-documenting, you'll also have a chance to check if $var eq "Teams"

        And if $var is of no interest you can also write

          my ( undef, $list ) = split /=/, $line;

        debugger demo

        DB<211> $line = "Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,E +AGLES,REDSKINS,BENGALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAI +DERS" DB<212> x ($var,$list) = split /=/, $line 0 'Teams' 1 'PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENGAL +S,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS'

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        ) the 2 in case you are worried that the character "=" appears in your list elements

Re: How to Split on specific occurrence of a comma
by eyepopslikeamosquito (Bishop) on Dec 03, 2020 at 23:26 UTC

    Your problem reminds of this old node: Rosetta code: Split an array into chunks

    Adapting that code to your problem shows how we can employ functions from List::MoreUtils to this problem.

    use strict; use warnings; use List::MoreUtils qw(part natatime); use Data::Dumper; # Load test DATA below into an array of teams (adapted from original n +ode) my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ || $line =~ /^\s*$/) { next; } $line =~ s/^[^=]*\=//; push @teams, split( /,/, $line ); } print Dumper \@teams; # ------------------------------------------------------ # Adapted from https://perlmonks.org/?node_id=861938 # Using natatime sub chunk_array { my ($n, @vals) = @_; my $str; my $iter = natatime($n, @vals); while ( my @line = $iter->() ) { $str .= join(",", @line) . ",\n"; } $str =~ s/,$//; return $str; } # Using part sub chunk_array_2 { my ($n, @vals) = @_; my $str; my $i = 0; $str = join "", map { join(",", @$_).",\n" } part { $i++/$n } @val +s; $str =~ s/,$//; return $str; } my $v1 = chunk_array(3, @teams); print "Version 1--------------\n"; print $v1; my $v2 = chunk_array_2(3, @teams); print "Version 2--------------\n"; print $v2; #--------------------------------------- __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS

    Running the test program above produces:

    $VAR1 = [ 'PATRIOTS', 'BILLS', 'DOLPHINS', 'JETS', 'COWBOYS', 'GIANTS', 'EAGLES', 'REDSKINS', 'BENGALS', 'OILERS', 'STEELERS', 'BROWNS', 'SEAHAWKS', 'RAMS', '49ERS', 'RAIDERS' ]; Version 1-------------- PATRIOTS,BILLS,DOLPHINS, JETS,COWBOYS,GIANTS, EAGLES,REDSKINS,BENGALS, OILERS,STEELERS,BROWNS, SEAHAWKS,RAMS,49ERS, RAIDERS Version 2-------------- PATRIOTS,BILLS,DOLPHINS, JETS,COWBOYS,GIANTS, EAGLES,REDSKINS,BENGALS, OILERS,STEELERS,BROWNS, SEAHAWKS,RAMS,49ERS, RAIDERS

Re: How to Split on specific occurrence of a comma
by NetWallah (Canon) on Dec 04, 2020 at 04:28 UTC
    IMHO - instead of adding the complexity of batching with artificial limits, my proposal is:

    use a temporary table to store the list of things you are looking for.
    Then let the DB do the work of intersecting the 2 tables.

    This is no longer subject to memory or list-size limits, and is likely better optimized by the db engine.

                    "Imaginary friends are a sign of a mental disorder if they cause distress, including antisocial behavior. Religion frequently meets that description"

Re: How to Split on specific occurrence of a comma
by tybalt89 (Prior) on Dec 03, 2020 at 22:08 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11124600 use warnings; my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ || $line =~ /^\s*$/) { next; } $line =~ s/^[^=]*\=//; push @teams, map [ split /,/ ], "$line," =~ /((?:.*?,){1,3})/gx; } use Data::Dump 'dd'; dd \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS

    Outputs:

    [ ["PATRIOTS", "BILLS", "DOLPHINS"], ["JETS", "COWBOYS", "GIANTS"], ["EAGLES", "REDSKINS", "BENGALS"], ["OILERS", "STEELERS", "BROWNS"], ["SEAHAWKS", "RAMS", "49ERS"], ["RAIDERS"], ]
Re: How to Split on specific occurrence of a comma
by tybalt89 (Prior) on Dec 03, 2020 at 22:28 UTC

    Playing games with split :)

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11124600 use warnings; my $line; my @teams; while ( $line = <DATA> ) { chomp $line; if ($line =~ /^#/ || $line =~ /^\s*$/) { next; } $line =~ s/^[^=]*\=//; my $count = 3; while( length $line ) { push @teams, [ split /,/, $line, $count + 1 ]; $line = defined $teams[-1][$count] ? pop @{ $teams[-1] } : ''; } } use Data::Dump 'dd'; dd \@teams; __DATA__ Teams=PATRIOTS,BILLS,DOLPHINS,JETS,COWBOYS,GIANTS,EAGLES,REDSKINS,BENG +ALS,OILERS,STEELERS,BROWNS,SEAHAWKS,RAMS,49ERS,RAIDERS
Re: How to Split on specific occurrence of a comma
by swl (Priest) on Dec 03, 2020 at 21:35 UTC

    The limit arg to split lumps the rest of the string into the last array entry. If you want n items, then split with a limit of n+1 and ignore or discard the last entry in the array.

    Your code was also using a map to create an array ref, which is not needed here.

    This updated snippet from your example code will give an array of two items:

    @teams = split /,/, $line, 3; pop @teams;
Re: How to Split on specific occurrence of a comma
by jszinger (Scribe) on Dec 03, 2020 at 23:22 UTC
    Another approch might be to load all 1000+ users into a temporary table and then join against that.

      Thank you so much to everyone who replied to my question. All of you Monks are so generous with your time and with sharing your knowledge. You are extremely appreciated!

Re: How to Split on specific occurrence of a comma
by leszekdubiel (Scribe) on Dec 10, 2020 at 20:22 UTC

    Dont' build "IN" SQL clause. Build temporary table and join with that table

    SELECT * INTO #MyTempTable FROM (VALUES @{[ $placeholders ]}) AS Temp (Xxx); SELECT .... JOIN #MyTempTAble MTT ON MTT.Xxx = ... WHERE ... GROUP BY ...
Re: How to Split on specific occurrence of a comma
by perlfan (Vicar) on Dec 07, 2020 at 20:47 UTC
    until we added over 999 values to it recently. I now have about 3,000 values separated by a comma on 1 line.

    Your data model is broke.

      Very next sentence:"Unfortunately, I can't change the way the data is provided.". How does your response address either this issue or the question asked?

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11124600]
Approved by marto
Front-paged by thomas895
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2021-01-25 18:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?