Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Recommendations for breaking up string?

by jdlev (Scribe)
on Sep 16, 2015 at 18:34 UTC ( #1142225=perlquestion: print w/replies, xml ) Need Help??

jdlev has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I'm trying to do something pretty simple, but haven't worked with Perl in a while so was looking for some recommendations.

Basically, I have a string that looks like this:

"QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Green WR John Brown WR Davante Adams TE Martellus Bennett FLEX Jeremy Hill DST Panthers"

Then I have a database with columns QB, RB1, RB2, WR1, WR2, WR3, TE, FLEX, DST.

What methods would you guys use to insert the data into a database? Would you split each string using the split function, regex it, load it into an array, or use an entirely different method and then submit everything using DBI?

TIA :)
I love it when a program comes together - jdhannibal
  • Comment on Recommendations for breaking up string?

Replies are listed 'Best First'.
Re: Recommendations for breaking up string?
by GotToBTru (Prior) on Sep 16, 2015 at 18:58 UTC

    See this post for an idea that could be adapted to this. It's a simple approach to parse the line into a hash in position => name form. In fact, your application would be simpler. The hash could be used to create the DML.

    Update: maybe not simpler - the repetition of positions will cause my approach problems!

    Update 2:

    use strict; use warnings; use Data::Dumper; my $string = "QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Gr +een WR John Brown WR Davante Adams TE Martellus Bennett FLEX Jeremy H +ill DST Panthers"; my @words = split ' ',$string; my $i = 0; while ($i <= $#words) { if ($words[$i] =~ m/^[A-Z]{2,4}$/) { $words[$i++] .= ' ->'; } else { $words[$i-1] .= ' ' . splice @words,$i,1 } } print Dumper(\@words);

    Output:

    $VAR1 = [ 'QB -> Carson Palmer', 'RB -> Chris Ivory', 'RB -> Eddie Lacy', 'WR -> A.J. Green', 'WR -> John Brown', 'WR -> Davante Adams', 'TE -> Martellus Bennett', 'FLEX -> Jeremy Hill', 'DST -> Panthers' ];
    Dum Spiro Spero
Re: Recommendations for breaking up string? (sexeger)
by jeffa (Bishop) on Sep 16, 2015 at 23:57 UTC

    Fun problem! Here is my take. The bulk of the work is done via a regex, and since the target delimiters appear first, reversing the string makes the matching easier. I do not recommend this solution for production, but rather as a learning experience. It assumes the roles are uppercase characters between 2 and 4 chars in length.

    use strict; use warnings; my $str = scalar reverse "QB Carson Palmer RB Chris Ivory RB Eddie Lac +y WR A.J. Green WR John Brown WR Davante Adams TE Martellus Bennett F +LEX Jeremy Hill DST Panthers"; for ($str =~ /(.*?[A-Z][A-Z][A-Z]?[A-Z]?)/g) { $_ = scalar reverse $_; chomp( my ($role,@name) = split ); print "$role -> @name\n"; } __DATA__ DST -> Panthers FLEX -> Jeremy Hill TE -> Martellus Bennett WR -> Davante Adams WR -> John Brown WR -> A.J. Green RB -> Eddie Lacy RB -> Chris Ivory QB -> Carson Palmer

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Recommendations for breaking up string?
by trippledubs (Deacon) on Sep 16, 2015 at 22:24 UTC
    #!/usr/bin/env perl use strict; use warnings; use Data::Dump; my @positions = qw/QB RB WR WR WR TE FLEX DST/; my $str = 'QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Green + WR John Brown WR Davante Adams TE Martellus Bennett FLEX Jeremy Hill DST Panth +ers'; my $ds; my $currentPosition; for my $word (split /\s+/,$str) { if (my ($position) = grep { $word eq $_ } @positions) { $currentPosition = $position; push @{$ds->{$currentPosition}}, []; } else { push @{$ds->{$currentPosition}[-1]}, $word; } } dd $ds; { DST => [["Panthers"]], FLEX => [["Jeremy", "Hill"]], QB => [["Carson", "Palmer"]], RB => [["Chris", "Ivory"], ["Eddie", "Lacy"]], TE => [["Martellus", "Bennett"]], WR => [["A.J.", "Green"], ["John", "Brown"], ["Davante", "Adams"]] +, }
Re: Recommendations for breaking up string?
by Anonymous Monk on Sep 16, 2015 at 20:34 UTC

    I changed the DB schema - suggest you rethink yours. Anytime there are column names FOO1 FOO2 FOO3 that's a red flag of an awkward schema, however SQL Schema theory is beyond the scope of this reply.

    #!/usr/bin/perl # http://perlmonks.org/?node_id=1142225 use strict; use warnings; use DBI; $_ = 'QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Green WR J +ohn Brown WR Davante Adams TE Martellus Bennett FLEX Jeremy Hill DST +Panthers'; my $dbfilename = 'db.1142225'; unlink $dbfilename; my $db = DBI->connect_cached("dbi:SQLite(RaiseError=>1):$dbfilename"); $db->do('create table players (team text, position text, player text)' +); my %dups = /\b([A-Z]{2,})\b(?=.*\b(\1)\b)/g; # insert counts if needed for my $position (keys %dups) { my $n = 1; s/\b$position\K\b/$n++/ge; } # make hash my %db = /\b([A-Z]{2,}\d*) (.*?)(?= [A-Z]{2}|$)/g; #use Data::Dump 'pp'; pp \%db; # remove team (this is a guess) my $team = delete $db{'DST'} or die "no team"; for my $pos (keys %db) { $db->do('insert into players values (?, ?, ?)', {}, $team, $pos, $db +{$pos}); }

    Here's the DB as dumped by sqlite db.1142225 .dump

    PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; CREATE TABLE players (team text, position text, player text); INSERT INTO "players" VALUES('Panthers','QB','Carson Palmer'); INSERT INTO "players" VALUES('Panthers','TE','Martellus Bennett'); INSERT INTO "players" VALUES('Panthers','WR2','John Brown'); INSERT INTO "players" VALUES('Panthers','RB1','Chris Ivory'); INSERT INTO "players" VALUES('Panthers','FLEX','Jeremy Hill'); INSERT INTO "players" VALUES('Panthers','WR3','Davante Adams'); INSERT INTO "players" VALUES('Panthers','WR1','A.J. Green'); INSERT INTO "players" VALUES('Panthers','RB2','Eddie Lacy'); COMMIT;

    DB Hint: come up with the several ways you will query the DB and that might help your DB schema (sample query: "get all RB". This is slightly awkward with this schema, how is it with the RB1 RB2 column name schema? )

Re: Recommendations for breaking up string?
by poj (Abbot) on Sep 17, 2015 at 07:53 UTC

    Build a regex and use placeholders.

    #!perl use strict; use DBI; use Data::Dump 'pp'; my $dbh = get_dbh(); # connect # field names my @fname = qw(QB RB1 RB2 WR1 WR2 WR3 TE FLEX DST); my $fld = join ',',@fname; # placeholders my $ph = '?'.(',?' x $#fname); # build regex from field names without numbers my $re = join '\s*(.*?)\s*',(map { s/\d// ; $_}@fname),'' ; # prepare my $sql = "INSERT INTO test_table (ID,$fld) VALUES (NULL, $ph)"; my $sth = $dbh->prepare($sql); # input data while (<DATA>){ if ( my @f = /$re$/ ){ $sth->execute(@f); } } # print results pp $dbh->selectall_arrayref('SELECT * FROM test_table'); # connect / create test table sub get_dbh{ my $database = "test"; my $user = "user"; my $pw = "password"; my $dsn = "dbi:mysql:$database:localhost:3306"; my $dbh = DBI->connect($dsn, $user, $pw, { RaiseError=>1, AutoCommit=>1 } ); #$dbh->do('drop table test_table'); $dbh->do(" CREATE TABLE IF NOT EXISTS test_table ( ID int(11) NOT NULL AUTO_INCREMENT, QB varchar(45) DEFAULT NULL, RB1 varchar(45) DEFAULT NULL, RB2 varchar(45) DEFAULT NULL, WR1 varchar(45) DEFAULT NULL, WR2 varchar(45) DEFAULT NULL, WR3 varchar(45) DEFAULT NULL, TE varchar(45) DEFAULT NULL, FLEX varchar(45) DEFAULT NULL, DST varchar(45) DEFAULT NULL, UNIQUE KEY ID_UNIQUE (ID) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;"); #$dbh->do('delete from test_table'); return $dbh; } __DATA__ QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Green WR John Br +own WR Davante Adams TE Martellus Bennett FLEX Jeremy Hill DST Panthe +rs
    poj
Re: Recommendations for breaking up string?
by sundialsvc4 (Abbot) on Sep 16, 2015 at 19:49 UTC

    A dreadful program that works ... for those of you who think I can’t write (bad ...) Perl code:

    use strict; use warnings; my $str = "QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A.J. Green + WR John Brown WR Davante Adams TE Martellus Bennett FLEX Jeremy Hill + DST Panthers"; my $key = undef; my $result = {}; for my $wd (split(/\b(QB|RB|WR|TE|FLEX|DST)\b/,$str)) { # As writ, "split()" will return the key followed by the string # for that key. We abuse "defined($key)" to separate the two. # Probably awful way to do it but it works. # '$result' has a hash-bucket for each key and an array-of-strings # for the content of the bucket. Perl's "auto-vivification" # makes this easy, if cryptic, to write. (If a bucket does not yet # exist, it magically appears.) # The following statement is a further smelly hack. # (The first string returned is empty.) next if $wd eq ""; print "wd $wd\n"; if (defined($key)) { push @{$result->{$key}}, $wd; $key = undef; } else { $key = $wd; } } foreach my $k (keys $result) { print "$k is: " . join(", ", @{$result->{$k}}) . "\n"; }

    It does not update a database, of course, and you will also notice that it stacks any number of (say ...) RB entries into a list, instead of distinguishing between RB1 and RB2 (or assuming there are only two).   Probably the only saving grace of it, if any, is the use of split with a particular regular-expression.

      A extractive variation. This approach assumes that a position ('QB', 'RB', etc.) cannot be confused with the second or any subsequent field of a player's name.

      c:\@Work\Perl>perl -wMstrict -MData::Dump -le "my $str = 'QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A. J. Gre +en ' . 'WR John Brown WR WR Grace WR Pele TE Billy Bob Bennett ' . 'FLEX A.P. Hill DST Panthers'; print qq{[[$str]]}; ;; my $result = {}; ;; my $position = qr{ \b (?: QB | RB | WR | TE | FLEX | DST) \b }xms; my $player = qr{ \S+ (?: \s+ \S+)*? }xms; ;; while ($str =~ m{ ($position) \s+ ($player) (?= \s+ $position | \z) } +xmsg) { ;; my ($posn, $name) = ($1, $2); ;; push @{$result->{$posn}}, $name; ;; } ;; dd $result; " [[QB Carson Palmer RB Chris Ivory RB Eddie Lacy WR A. J. Green WR John + Brown WR WR Grace WR Pele TE Billy Bob Bennett FLEX A.P. Hill +DST Panthers]] { DST => ["Panthers"], FLEX => ["A.P. Hill"], QB => ["Carson Palmer"], RB => ["Chris Ivory", "Eddie Lacy"], TE => ["Billy Bob Bennett"], WR => ["A. J. Green", "John Brown", "WR Grace", "Pele"], }
      The  $player regex could be refined to make it more discriminative of human names, but that's always tricky.


      Give a man a fish:  <%-{-{-{-<

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1142225]
Approved by stevieb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2022-05-19 08:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (71 votes). Check out past polls.

    Notices?