http://www.perlmonks.org?node_id=696603


in reply to Parsing a Variable Format String

Combine regular expression with a dispatch table. For each regular expression you define what action should take place. In this case, the actions will add values to a hash.

Parsing is performed from left to right. As matches are found the matched text is removed from the input string. Parsing stops when no patterns match or the input string is reduced to the empty string.

use Data::Dumper; my %patterns = ( qr/SS\s+(\d+)/ => sub { $_->{SS} = $1 }, qr/PL\s+([=#]?)(\d+)([=#])(\d+)/ => sub { $_->{SP} = $2; $_->{LP} = $4 }, qr/PV\s+([a-z]?\d+)(\.\d+)?/ => sub { }, # to be written... qr/RL\s+(\d+)([,'"])/ => sub { $_->{RL} = $1; }, qr/SA\s+(\d+)/ => sub { $_->{SA} = $1 }, qr/DS\s+(\d+)/ => sub { $_->{DS} = $1 }, qr/CT\s+([#^]?)\s*(\d+)\s+(\S+)\s+(\S+)/, => sub { }, qr/CL\s+([#^]?)(\d+)\s+(\S+)/, => sub { }, ); sub MAGIC { my $line = $_[0]; $_ = {}; $line =~ s/^\s*//; LOOP: while (length($line)) { for my $k (keys %patterns) { if ($line =~ s/^$k(?:\s+|\z)//) { $patterns{$k}->(); next LOOP; } } warn "got stuck on: $line\n"; $_->{leftover} = $line; last; } $_; } print Dumper( MAGIC("SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 1 +06 DS 93")); __END__ $VAR1 = { 'SA' => '106', 'DS' => '93', 'RL' => '126', 'SP' => '2', 'SS' => '21', 'LP' => '3' };

Replies are listed 'Best First'.
Re^2: Parsing a Variable Format String
by ozboomer (Friar) on Jul 10, 2008 at 06:13 UTC
    OOooo, that's certainly elegant :)

    I've been trying to get things happening... with some success.

    My regex is still woeful... but I have a vague idea of what's going on here.. I still can't get the CT/CL items "usable" though, viz:
    use Data::Dumper; @opts = ( "SS 21M- PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93", "SS 21 + PL 2.3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93", "SS 21F PL2#3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93" +, "SS 21#+ PL2.3# PV 51.3 CL 110 -13 RL 126' SA 106 DS 93 +", "SS 32 - PL2.3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93" ); my ($ss, $mstate, $mvote); my %patterns = ( qr/SS\s+(\d+)([MF# ])([\+\- ])/ => sub { $ss = $1; $mstate = $2; $mvote = $3; }, # qr/PL\s+([=#]?)(\d+)([=#]?)(\d+)/ # qr/PL\s+(\d+)([\.\#])(\d+)([\.\#])/ qr/PL\s*(\d+)([\#\.])(\d+)([\#\.]?)/ => sub { $sp = $1; $spl = $2; $spl = "" if ($spl ne "#"); $lp = $3; $lpl = $4; $lpl = "" if ($lpl ne "#"); }, qr/PV\s*([a-z]?\d+)(\.)(\d+)/ => sub { $vsp = $1; $vlp = $3 }, qr/RL\s+(\d+)([,'"]?)/ => sub { $rl = $1; $rlflg = $2; }, qr/SA\s+(\d+)/ => sub { $csa = $1 }, qr/DS\s+(\d+)/ => sub { $cds = $1 }, qr/CT\s+([#^]?)\s*(\d+)\s+(\S+)\s+(\S+)/, => sub { }, qr/CL\s+([#^]?)(\d+)\s+(\S+)/, => sub { }, ); sub MAGIC { my $line = $_[0]; $_ = {}; $line =~ s/^\s*//; printf("Input: >%s<\n", $line); LOOP: while (length($line)) { for my $k (keys %patterns) { if ($line =~ s/^$k(?:\s+|\z)//) { $patterns{$k}->(); next LOOP; } } warn "got stuck on: $line\n"; $_->{leftover} = $line; last; } printf(" \$ss = $ss\n"); printf("\$mstate = >$mstate<\n"); printf(" \$mvote = >$mvote<\n"); printf(" \$sp = $sp\n"); printf(" \$spl = $spl\n"); printf(" \$lp = $lp\n"); printf(" \$lpl = $lpl\n"); printf(" \$ctl = ...\n"); printf(" \$ct = ...\n"); printf(" \$ctr = ...\n"); printf(" \$vsp = $vsp\n"); printf(" \$vlp = $vlp\n"); printf(" \$rl = $rl\n"); printf(" \$rlflg = $rlflg\n"); printf(" \$csa = $csa\n"); printf(" \$cds = $cds\n"); $_; } foreach $buf (@opts) { printf("START -----\n"); MAGIC($buf); # print Dumper( MAGIC($buf)); printf(" \n"); }
    For example, for the 4th item in @opts, I would like to capture the info that:

    Buf: "SS 21#+ PL2.3# PV 51.3 CL 110 +13 RL 126' SA 106 DS 93" Cycle Time = CT (or CL) = 110 Locks = (no #) = OFF Rotation = "that number" = 13
    Many thanks for the assistance; I appreciate it a lot.

      Something like:
      qr/(CT|CL)\s+([#]?)\s*(\d+)\s+(\S+)/, => sub { $_->{CTorCL} = $1; $_->{CycleTime} = $3; $_->{Locks} = ($2 eq "#" ? "ON" : "OFF"); $_->{Rotation} = $4; }
      Note I put the values in a hash. Otherwise you'll have to explicitly declare or reset your variables so one call won't interfere with subsequent calls.