Re^2: How to split line with varying number of tokens?

by AnomalousMonk (Chancellor)
on Apr 29, 2013 at 00:33 UTC (#1031126)

in reply to Re: How to split line with varying number of tokens?
in thread How to split line with varying number of tokens?

... re-join the fields in the middle, possibly distorting the white space.

I, too, wondered whether embedded whitespace in the FROM field is significant and whether the data is fixed-field; zBernie says nothing about either point in the OP or, so far, anywhere else in this thread. If embedded whitespace in the FROM field does matter, it's simple enough to handle with split: capture the sub-strings corresponding to the separators as well, then re-assemble everything with a minor modification to your existing split approach. (Even so, I think I prefer a regex-based extraction approach like that of davido, which lends itself better to data validation.)
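As a minimal illustration of why capturing the separators helps (the record below is a made-up variant of the sample data with a doubled space inside FROM, not something from the OP): with capturing parentheses in the pattern, split returns the whitespace runs as list elements alongside the tokens, so joining the pieces reproduces the original spacing.

```perl
use strict;
use warnings;

# Hypothetical record: note the two spaces between '973' and '618'.
my $record = '138446 custsvc 973  618 0577 12/26 18:44 1 rcv';

# Plain split throws the separators away, so the doubled space is lost.
my @plain = split /\s+/, $record;

# With a capture group, split also returns each separator:
# tokens land at even indices, whitespace runs at odd indices.
my @kept = split /(\s+)/, $record;

print join('|', @plain), "\n";
print join('', @kept) eq $record ? "round-trips\n" : "differs\n";
```

Because tokens and separators alternate, the FROM pieces can be spliced out as one contiguous run and joined with the empty string.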

>perl -wMstrict -le
"my @data = (
   'REQID DEST FROM DATE TIME nPages RCV',
   '138454 mail_room Marco`s Pizza 12/26 21:52 1 rcv',
   '138446 custsvc 973 618 0577 12/26 18:44 1 rcv',
   '138445 county2 spam 12/26 18:41 3 rcv',
   '138444 custsvc spam 12/26 18:30 1 rcv',
   '138439 county2 7182737253 12/26 17:54 2 rcv',
   '138438 county2 Acme Products, Inc. 12/26 17:52 1 rcv',
   );
 ;;
 for my $record (@data) {
   my @fields = split /(\s+)/, $record;
   my $from = join '', splice @fields, 4, $#fields - 11;
   my ($reqid, $dest, $date, $time, $pages, $rcv) =
     @fields[ 0, 2, map { $#fields - $_ } 6, 4, 2, 0 ];
   printf qq{'%s' \n},
     join '|', $reqid, $dest, $from, $date, $time, $pages, $rcv;
   }
"
'REQID|DEST|FROM|DATE|TIME|nPages|RCV'
'138454|mail_room|Marco`s Pizza|12/26|21:52|1|rcv'
'138446|custsvc|973 618 0577|12/26|18:44|1|rcv'
'138445|county2|spam|12/26|18:41|3|rcv'
'138444|custsvc|spam|12/26|18:30|1|rcv'
'138439|county2|7182737253|12/26|17:54|2|rcv'
'138438|county2|Acme Products, Inc.|12/26|17:52|1|rcv'
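For comparison, here is a rough sketch of what a regex-based extraction with some validation might look like. This is not davido's code; the pattern, the `parse_record` name, and the assumed MM/DD and HH:MM shapes for DATE and TIME are my own guesses from the sample data. It anchors the two leading and four trailing fields and lets FROM take whatever lies between them, embedded whitespace included.

```perl
use strict;
use warnings;

# Hypothetical pattern, inferred from the sample records only.
my $rx = qr{
    \A \s*
    (\d+)        \s+   # REQID
    (\S+)        \s+   # DEST
    (.+?)        \s+   # FROM, embedded whitespace allowed
    (\d\d/\d\d)  \s+   # DATE, assumed MM/DD
    (\d\d:\d\d)  \s+   # TIME, assumed HH:MM
    (\d+)        \s+   # nPages
    (\S+)              # RCV
    \s* \z
}x;

# Returns the seven fields, or an empty list if the record fails to match.
sub parse_record {
    my ($record) = @_;
    my @fields = $record =~ $rx
        or return;
    return @fields;
}

for my $record (
    '138454 mail_room Marco`s Pizza 12/26 21:52 1 rcv',
    '138446 custsvc 973 618 0577 12/26 18:44 1 rcv',
    'not a valid record',
) {
    my @fields = parse_record($record);
    my $line   = @fields ? join('|', @fields) : "rejected: $record";
    print "$line\n";
}
```

Note that the header line would be rejected by this pattern, since REQID is required to be numeric; it would have to be skipped or handled separately.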

