
Re: nth field extraction

by anonymized user 468275 (Curate)
on Jul 27, 2018 at 09:57 UTC ( [id://1219371]=note )


in reply to nth field extraction

As I understand it, you have several layers of delimiters. One idea for reusable code would be a function that converts the string into a multi-dimensional array -- one dimension per delimiter. E.g.:
    my $aref = &fieldParse($fullString, '!!', ';', ',');

    sub fieldParse {
        my $source = shift;
        my $ret = [];
        my $delim = shift;
        defined($delim) or return $source;
        for (split $delim, $source) {
            push @$ret, &fieldParse($_, @_);
        }
        return $ret;
    }
which produces:

    $VAR1 = [
              [
                [ 'abcd-efgh-ijkl-mnop' ],
                [ 'key1=data1', 'key2=data2' ],
                [ 'key1=data3', 'key2=data4' ]
              ],
              [
                [ 'qwer-asdf-zxcv-tyui' ],
                [ 'key1=data3' ],
                [ 'key3=data6' ]
              ],
              [
                [ 'trew-hgfd-ytre-bvcx' ],
                [ 'key1=data7', 'key2=data8' ],
                [ 'key1=data9', 'key2=data10' ]
              ],
              [
                [ 'erty-dfgh-cvbn-hjkl' ],
                [ 'key2=data5' ],
                [ 'key3=data6' ]
              ]
            ];
Updated (handle case of false-value delimiter as someone suggested)
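Once the structure is built, the nth field at any level is a plain index into the nested array reference. The sketch below is only an illustration: the sample string is an assumption reconstructed from the first two records of the dump above (the original input is not shown in this post), and the delimiters are the ones passed in the call above.

```perl
use strict;
use warnings;

# Same recursive parser as in the post: one array level per delimiter.
sub fieldParse {
    my $source = shift;
    my $ret    = [];
    my $delim  = shift;
    defined($delim) or return $source;
    for (split $delim, $source) {
        push @$ret, fieldParse($_, @_);
    }
    return $ret;
}

# ASSUMPTION: sample input rebuilt from the dump, not the poster's real data.
my $fullString = join '!!',
    'abcd-efgh-ijkl-mnop;key1=data1,key2=data2;key1=data3,key2=data4',
    'qwer-asdf-zxcv-tyui;key1=data3;key3=data6';

my $aref = fieldParse($fullString, '!!', ';', ',');

# Indices are 0-based: 2nd record, 3rd ';' field, 1st ',' field.
my $field = $aref->[1][2][0];
print "$field\n";
```

Each index corresponds to one delimiter, in the order the delimiters were passed.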

One world, one people

Replies are listed 'Best First'.
Re^2: nth field extraction
by AnomalousMonk (Archbishop) on Jul 27, 2018 at 20:51 UTC
    my $delim = shift or return $source;

    This statement in fieldParse() makes me uneasy. The parse will fail if any $*_div is '0', because '0' is false in Perl and the "or return" fires before the split. Perhaps unlikely, but still... A safer alternative IMHO would be:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $major_div = '!!';
     my $user_div = '0';
     my $var_div = ',';
     my $full_string = join $major_div,
       'abcd-efgh-ijkl-mnop0key1=data1,key2=data20key1=data3,key2=data4',
       'qwer-asdf-zxcv-tyui0key1=data30key3=data6',
       'trew-hgfd-ytre-bvcx0key1=data7,key2=data80key1=data9,key2=data10',
       'erty-dfgh-cvbn-hjkl0key2=data50key3=data6',
       ;
     print qq{full_string: <<$full_string>> \n};
     ;;
     my $aref = fieldParse($full_string, $major_div, $user_div, $var_div);
     dd $aref;
     ;;
     sub fieldParse {
       my $source = shift;
       return $source unless @_;
       ;;
       my $delim = shift;
       return [ map fieldParse($_, @_), split $delim, $source ];
       }
    "
    full_string: <<abcd-efgh-ijkl-mnop0key1=data1,key2=data20key1=data3,key2=data4!!qwer-asdf-zxcv-tyui0key1=data30key3=data6!!trew-hgfd-ytre-bvcx0key1=data7,key2=data80key1=data9,key2=data10!!erty-dfgh-cvbn-hjkl0key2=data50key3=data6>>

    [
      [
        ["abcd-efgh-ijkl-mnop"],
        ["key1=data1", "key2=data2"],
        ["key1=data3", "key2=data4"],
      ],
      [["qwer-asdf-zxcv-tyui"], ["key1=data3"], ["key3=data6"]],
      [
        ["trew-hgfd-ytre-bvcx"],
        ["key1=data7", "key2=data8"],
        ["key1=data9", "key2=data1"],
      ],
      [["erty-dfgh-cvbn-hjkl"], ["key2=data5"], ["key3=data6"]],
    ]
    (This version of the function still has some vulnerabilities, but I'm a bit more comfortable with it. :)
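One candidate for the remaining vulnerabilities (my reading, not necessarily what the poster had in mind): split treats its first argument as a pattern, so a delimiter such as '|' or '.' is interpreted as a regex metacharacter, and trailing empty fields are silently dropped unless a negative LIMIT is given. A minimal sketch of a variant that quotes the delimiter and keeps trailing fields follows; field_parse_literal is a hypothetical name, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical variant of fieldParse(): delimiters are matched literally
# (\Q...\E) and trailing empty fields are kept (LIMIT of -1).
sub field_parse_literal {
    my ($source, @delims) = @_;
    return $source unless @delims;
    my $delim = shift @delims;
    return [
        map { field_parse_literal($_, @delims) }
        split /\Q$delim\E/, $source, -1
    ];
}

# '|' and '.' would both misbehave as unquoted patterns.
my $parsed = field_parse_literal('a.b|c.d|', '|', '.');

# The trailing '|' yields a third, empty record, because splitting an
# empty string always produces zero fields.
print scalar(@$parsed), " records\n";
```

Whether keeping the trailing empty record is desirable depends on the data; dropping the -1 restores split's default behavior.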


    Give a man a fish:  <%-{-{-{-<

Re^2: nth field extraction
by lee_crites (Scribe) on Jul 27, 2018 at 14:27 UTC

    This is exactly the direction I was thinking of going! Thanks!!! I will be digesting this.

    The problem I'm having is that I (re)process the string multiple times. That worked okay when I was doing it a few times -- perhaps several hundred or thousand times, total, in a run. But my best guess is that it will be run something between 1,500k and 3,000k times per run. Hence my hope for ideas on a better way.

    Just for giggles and grins, I extracted the function I had into a standalone test script. It was probably at the top of my coding about 15+ years ago. Here it is:

        #!/usr/bin/env perl

        my $which = 3;
        my $div = '!!!';
        my $str = 'asdf' . $div . 'qwer' . $div . 'zxcv' . $div . 'hjkl' . $div . 'yuio' . $div . 'vbmn';
        my @stuff;
        my $spot = 0;
        my $result = index($str, $div, $spot);

        print "str: [$str]\n";

        while ($result != -1) {
            print "Found '$div' at $result\n";
            my $start_spot = ($spot ? $spot + length($div) - 1 : 0);
            my $field_length = ($spot ? $result - $spot - length($div) + 1 : $result - $spot);
            push @stuff, substr($str, $start_spot, $field_length);
            $spot = $result + 1;
            $result = index($str, $div, $spot);
        }

        print @stuff . "\n";
        print '-- #' . $which . '=' . @stuff[$which-1] . "\n";
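For comparison, the same nth-field lookup can be written with split and a LIMIT, so that splitting stops once the wanted field has been isolated. This is only a sketch: nth_field is a hypothetical helper name, and whether it is actually faster than the index/substr loop for a given data set is something to benchmark rather than assume. Note that the loop above, as written, stops at the last delimiter and never pushes the final field ('vbmn'); the split version returns it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n-th (1-based) field of $str,
# where fields are separated by the literal string $div.
sub nth_field {
    my ($str, $div, $n) = @_;
    # A LIMIT of $n + 1 lets split stop right after field $n;
    # everything after it stays unsplit in the last element.
    my @fields = split /\Q$div\E/, $str, $n + 1;
    return $fields[$n - 1];    # undef if there are fewer than $n fields
}

my $div = '!!!';
my $str = join $div, qw(asdf qwer zxcv hjkl yuio vbmn);

print '-- #3=', nth_field($str, $div, 3), "\n";
```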

    I am continually amazed and pleased at the quality of the responses I get/see here on perlmonks! Thanks, y'all!!!

    Lee Crites
    lee@critesclan.com
      In that case there could be a slight performance benefit in storing results in a hash, e.g.
          my %res;
          ...
          ...
          for my $fullString (however they are obtained) {
              $res{$fullString} ||= fieldParse( $fullString, etc. );
              etc...
          }
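A runnable version of that caching idea might look like the sketch below. The wrapper name parse_cached, the $misses counter, and the sample strings are all assumptions added for illustration; fieldParse is the function from the parent post.

```perl
use strict;
use warnings;

# Recursive parser from the parent post.
sub fieldParse {
    my $source = shift;
    my $delim  = shift;
    defined($delim) or return $source;
    return [ map { fieldParse($_, @_) } split $delim, $source ];
}

my %res;           # cache: full string => parsed structure
my $misses = 0;    # illustrative counter of real parses

# Hypothetical wrapper: parse each distinct string only once.
sub parse_cached {
    my ($fullString) = @_;
    return $res{$fullString} ||= do {
        $misses++;
        fieldParse($fullString, '!!', ';', ',');
    };
}

# ASSUMPTION: made-up inputs; the first string repeats.
my @inputs = ('a;k=1,j=2!!b;k=3', 'c;k=4', 'a;k=1,j=2!!b;k=3');
my @parsed = map { parse_cached($_) } @inputs;

print "parsed $misses of ", scalar(@inputs), " strings\n";
```

The cache hands back the same array reference on every hit, so callers should treat the result as read-only. The benefit also depends on how often identical strings actually recur.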

      One world, one people
