http://www.perlmonks.org?node_id=1069878

mrras25 has asked for the wisdom of the Perl Monks concerning the following question:

I have lines of this type that repeat:

$content = "Points/Game 80.5 Opp Points/Game 69.7 Avg Score Margin +10.8 Opp Avg Score Margin -10.8 Assists/Game 16.5 Opp Assists/Game 12.2 Total Rebounds/Gm 39.3 Opp Total Rebounds/Gm 36.7 Assists/FGM 0.557 Opp Assists/FGM 0.472 Assists/Turnover 1.259 Opp Assists/Turnover 0.712";
$content .= "Effective FG 51.0 Opp Effective FG 47.5 FTA/FGA 0.409 Opp FTA/FGA 0.348 Free Throw 65.4 Opp Free Throw 64.6 Three Point 29.2 Opp Three Point 30.4 Two Point 53.6 Opp Two Point 48.2 Shooting 47.3 Opp Shooting 43.5 Shooting Efficiency 1.080 Opp Shooting Efficiency 1.007 FGM/Game 29.5 Opp FGM/Game 25.8 FGA/Game 62.5 Opp FGA/Game 59.3 3PM/Game 4.7 Opp 3PM/Game 4.8 3PA/Game 16.1 Opp 3PA/Game 15.9 FTM/Game 16.7 Opp FTM/Game 13.3 FTA/Game 25.5 Opp FTA/Game 20.6 1st Half Pts/Gm 39.8 Opp 1st Half Pts/Gm 33.6 2nd Half Pts/Gm 40.6 Opp 2nd Half Pts/Gm 36.1 OT Pts/Gm -- Opp OT Pts/Gm -- ";
$content .= "Off Rebounds/Gm 12.1 Opp Off Rebounds/Gm 10.8 Def Rebounds/Gm 23.2 Opp Def Rebounds/Gm 21.2 Off Rebound 36.3 Opp Off Rebound 31.7 Def Rebound 68.3 Opp Def Rebound 63.7 Blocks/Game 5.4 Opp Blocks/Game 4.7 Block 9.1 Opp Block 7.5 Steals/Game 10.1 Opp Steals/Game 7.7 Steals/Play 11.7 Opp Steals/Play 8.8 Turnovers/Game 13.1 Opp Turnovers/Game 17.1 Turnovers/Play 14.9 Opp Turnovers/Play 19.8";
$content .= "Personal Fouls/Gm 18.5 Opp Personal Fouls/Gm 20.8 Personal Fouls/Play 21.4 Opp Personal Fouls/Play 23.7";

I would like to extract each name/number pair as a key/value pair in a hash. Example:

Points/Game => 80.5, Opp Points/Game => 69.7, Avg Score Margin => +10.8

I have tried the following and it doesn't give me the desired output:

my %stats = $content =~ /(\w+\s+\d+)/;

Re: Words and Numbers to hash
by hippo (Bishop) on Jan 08, 2014 at 23:42 UTC

    No, it won't, as I think will be obvious if you examine it. At the very least, your keys contain spaces and slashes, which your regex won't cope with. Nor will it return the keys and values as separate fields.

    This appears to work fine for your sample input:

    my %stats = $content =~ /\s*([\w\s\/]+)\s+([\d.]+)/g;
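To make the difference concrete, here is a minimal sketch comparing the two patterns. The `$sample` string is an assumption: a shortened, unsigned subset of the question's data, chosen only for illustration.

```perl
use strict;
use warnings;

# Shortened sample (assumption: a small unsigned subset of the original data).
my $sample = 'Points/Game 80.5 Opp Points/Game 69.7 Assists/Game 16.5';

# The pattern from the question: one capture group and no /g,
# so list context yields a single element.
my @single = $sample =~ /(\w+\s+\d+)/;
print scalar(@single), " element: '$single[0]'\n";

# Two capture groups plus /g: list context yields a flat
# (key, value, key, value, ...) list, which can initialise a hash directly.
my %stats = $sample =~ /\s*([\w\s\/]+)\s+([\d.]+)/g;
print "$_ => $stats{$_}\n" for sort keys %stats;
```

With the first pattern there is only one string to assign, so the hash cannot receive separate keys and values; the second pattern supplies them in pairs.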

      This worked for the most part; however, the first value I am getting is:

      $VAR1 = 'Opp Avg Score Margin'; $VAR2 = '10.8'; ...

      Instead of all the stuff leading up to it:

      Points/Game 80.5 Opp Points/Game 69.7 Avg Score Margin 10.8

      Why do you think that would be?

        ... the first value I am getting ...

        You may be expecting the hash to preserve the order of the fields extracted from the string. It won't. The only 'order' in a hash is the key/value pairing of each hash element. You must somehow impose your own order on top of this inherent order. Perhaps see Tie::IxHash.
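One way to impose that order without an extra module is to capture the matches into an array first, since a list keeps the order in which the pairs were found. This is only a sketch; `$sample`, `@pairs`, and `@order` are illustrative names, and the sample is a shortened, unsigned subset of the question's data.

```perl
use strict;
use warnings;

# Shortened sample (assumption: a small unsigned subset of the original data).
my $sample = 'Points/Game 80.5 Opp Points/Game 69.7 Assists/Game 16.5';

# The flat list of captures is in string order: key, value, key, value, ...
my @pairs = $sample =~ /\s*([\w\s\/]+)\s+([\d.]+)/g;

# Fill the hash for lookups and record the keys separately for ordering.
my (%stats, @order);
for (my $i = 0; $i < @pairs; $i += 2) {
    my ($name, $value) = @pairs[$i, $i + 1];
    push @order, $name;
    $stats{$name} = $value;
}

# Iterate over @order, not keys %stats, to get the original sequence.
print "$_ => $stats{$_}\n" for @order;
```

Iterating over `keys %stats` instead would give the same pairs in an unspecified order.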

Re: Words and Numbers to hash
by kcott (Archbishop) on Jan 09, 2014 at 05:04 UTC

    G'day mrras25,

    Firstly, there are two issues with how you generate $content.

    The individual strings have double-quotes which interpolate and will cause problems with embedded '$' or '@' characters: use single-quotes instead to avoid this.

    Any $content .= '...' line where the string starts with a digit will leave the start of that line indistinguishable from the value at the end of the previous line. Consider a situation where $content .= '3PA/Game 16.1 ...' follows $content .= '... Opp 3PM/Game 4.8': $content now contains '... Opp 3PM/Game 4.83PA/Game 16.1 ...'. A way around this is to get rid of every $content .= and just use a single join with a space: my $content = join ' ', '...', '...', '...';.
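The difference between appending and joining can be seen in a small sketch. The two chunks below are an assumption: they are trimmed from the question's data so that the second one starts with a digit.

```perl
use strict;
use warnings;

# Shortened chunks (assumption: trimmed from the original data).
my @chunks = (
    '3PM/Game 4.7 Opp 3PM/Game 4.8',
    '3PA/Game 16.1 Opp 3PA/Game 15.9',
);

# Appending with .= adds no separator, so the last value of one chunk
# runs straight into the first name of the next.
my $fused = '';
$fused .= $_ for @chunks;

# join puts exactly one space between chunks.
my $content = join ' ', @chunks;

print "$fused\n";
print "$content\n";
```

In `$fused` the text reads `4.83PA/Game`, which no pattern can split reliably; in `$content` the value and the next name stay separate.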

    Back to your question. This code will do what you want with the data you've shown here:

    my $re = qr{\s*(.+?)\s+(--|[+-]?\d+[.]\d+)};
    my (%stats, @stat_order);
    while ($content =~ /$re/g) {
        push @stat_order, $1;
        $stats{$1} = $2;
    }

    Here's a complete script with all your input showing full output:

    Update: I made a small change to the part of the regex matching a potential leading sign: s{[+-]*}{[+-]?}
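A minimal sketch of the same loop, run on a shortened sample rather than the full input, may help show how signed values and the `--` placeholder are captured. The sample strings are an assumption, trimmed from the question's data, and the output formatting is only illustrative.

```perl
use strict;
use warnings;

# Shortened sample (assumption: trimmed from the original data),
# built with join so that adjacent chunks are separated by a space.
my $content = join ' ',
    'Points/Game 80.5 Avg Score Margin +10.8 Opp Avg Score Margin -10.8',
    'OT Pts/Gm -- 3PA/Game 16.1';

# Name: shortest run of characters followed by whitespace and a value.
# Value: either the '--' placeholder or an optionally signed decimal.
my $re = qr{\s*(.+?)\s+(--|[+-]?\d+[.]\d+)};

my (%stats, @stat_order);
while ($content =~ /$re/g) {
    push @stat_order, $1;    # remember the order in which names were seen
    $stats{$1} = $2;
}

printf "%-22s => %s\n", $_, $stats{$_} for @stat_order;
```

Because the name is matched lazily with `.+?`, a name that begins with a digit, such as `3PA/Game`, is still captured whole as long as a space separates it from the preceding value.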

    -- Ken

Re: Words and Numbers to hash
by Laurent_R (Canon) on Jan 08, 2014 at 23:50 UTC
    This session under the Perl debugger should give you some ideas on where to go:
    DB<1> $_ = 'Points/Game 80.5 Opp Points/Game 69.7 Avg Score Margin +10.8 Opp Avg Score Margin -10.8 Assists/Game 16.5'
    DB<2> @score_val = /([\.\d]+)/g;
    DB<3> x @score_val
    0  80.5
    1  69.7
    2  10.8
    3  10.8
    4  16.5
    DB<4>
Re: Words and Numbers to hash
by simmisam (Novice) on Jan 09, 2014 at 01:52 UTC
    $content =~ s/\n\+//sg;
    $content =~ s/--/00.00/g;
    my @array1 = split (/(\+?-?\d+\.\d+)/, $content);
    my @array2;
    foreach (@array1) {
        $_ =~ s/^\s+//;
        push (@array2, $_);
    }
    my %hash = @array2;
    foreach (keys %hash) {
        $hash{$_} = "--" if ($hash{$_} =~ m/00\.00/);
        print "$_ => $hash{$_}\n";
    }

    Try this code, it worked for me.