PerlMonks  

Fixed-Length Fields Question

by mrmick (Curate)
on Jul 24, 2000 at 14:32 UTC (#24043)
mrmick has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks

Which of the following would be better (i.e. more efficient) at parsing a fixed-length record into its fields:

    /(.{16})(.{20})(.{20})(.{1})(.{8})/

OR

    unpack($PS_T, $_);    # $PS_T holds the unpack template describing the record layout

I humbly thank you in advance.
Mick
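For comparison, a minimal sketch of both approaches applied to one record. The record contents are made up, and the template 'a16 a20 a20 a1 a8' is only an assumed stand-in for $PS_T (its real value is not shown in the question); the field widths are taken from the regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 65-character record: widths 16 + 20 + 20 + 1 + 8.
my $record = sprintf "%-16s%-20s%-20s%-1s%-8s",
    'ACCT0001', 'Smith', 'John', 'M', '20000724';

# Regex approach, exactly as written in the question.
my @by_regex = $record =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/;

# unpack approach; "a" returns each field exactly as stored (no trimming).
my $template  = 'a16 a20 a20 a1 a8';    # assumed stand-in for $PS_T
my @by_unpack = unpack $template, $record;

print join('|', @by_regex),  "\n";
print join('|', @by_unpack), "\n";
```

Both lists come out identical here. With `A` instead of `a` in the template, unpack would also strip trailing spaces from each field, so the two would no longer return the same values.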

Re: Fixed-Length Fields Question
by turnstep (Parson) on Jul 24, 2000 at 14:42 UTC

    As a general rule, Perl's built-in functions such as unpack, split, pop, etc. do a better job (i.e. are more efficient) than doing the same thing yourself with a regular expression (or with any other code). There are exceptions, of course, but Perl's built-ins have been heavily optimized by very clever people. Because they are general-purpose, you can probably write something faster for one specific set of data, but for generic data and everyday use, the built-in functions win.
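One practical upside of the built-in: unpack takes the record layout as data (a template string), so the same single call covers any fixed-width layout. A minimal sketch, assuming the field widths are known up front; `@widths` is a hypothetical name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field widths for the layout in the question; any layout could be listed here.
my @widths = (16, 20, 20, 1, 8);

# Build "A16 A20 A20 A1 A8": one template describes the whole record.
my $template = join ' ', map { "A$_" } @widths;

my $record = 'X' x 65;    # same dummy input the benchmarks in this thread use
my @fields = unpack $template, $record;

printf "%d fields: %s\n", scalar @fields, join(',', map { length($_) } @fields);
```

Changing the layout then means editing the list of widths rather than rewriting a regex.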

Re: Fixed-Length Fields Question
by ase (Monk) on Jul 24, 2000 at 15:04 UTC
    I ran the following quick Benchmark code to test:
    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $input = 'X' x 65;    #test input
    my @result;

    timethese(-10, {
        'Regex' => sub {
            @result = ($input =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/);
            # print join "\t", @result;
        },
        'Unpack' => sub {
            @result = unpack "A16A20A20AA8", $input;
            # print join "\t", @result;
        },
    });
    which yielded:
    Benchmark: running Regex, Unpack, each for at least 10 CPU seconds...
         Regex:  9 wallclock secs (10.44 usr +  0.00 sys = 10.44 CPU) @ 19859.10/s (n=207329)
        Unpack: 11 wallclock secs (10.17 usr +  0.00 sys = 10.17 CPU) @ 33312.29/s (n=338786)

    It would seem unpack is more efficient in this case: roughly 1.7 times as many iterations per second as the regex.
    -ase
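A caveat on that comparison: the two benchmarked expressions do not return the same values on real data. When unpacking, the `A` format strips trailing spaces and nulls from each field, while the regex (and the `a` format) return each field exactly as stored; the all-`X` test input hides the difference. A minimal sketch on a made-up, space-padded record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record whose text fields are space-padded to their widths.
my $record = sprintf "%-16s%-20s%-20s%-1s%-8s",
    'ACCT0001', 'Smith', 'John', 'M', '20000724';

my @regex   = $record =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/;
my @upper_a = unpack 'A16A20A20AA8', $record;    # template used in the benchmark
my @lower_a = unpack 'a16a20a20aa8', $record;    # same widths, no stripping

# "A" strips trailing spaces/nulls when unpacking; "a" and the regex keep them.
printf "regex: [%s]  A: [%s]  a: [%s]\n", $regex[1], $upper_a[1], $lower_a[1];
```

Which behaviour you want depends on whether the padding is meaningful; for a like-for-like timing, `a` is the closer match to the regex.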
Re: Fixed-Length Fields Question
by lhoward (Vicar) on Jul 24, 2000 at 19:24 UTC
    I decided to test a substr approach. It is almost as fast as the unpack approach, but not quite.
    Benchmark: running Regex, Substr, Unpack, each for at least 3 CPU seconds...
         Regex:  2 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 42172.56/s (n=133687)
        Substr:  2 wallclock secs ( 3.12 usr +  0.00 sys =  3.12 CPU) @ 56391.35/s (n=175941)
        Unpack:  2 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 61831.65/s (n=195388)
    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $input = 'X' x 65;    #test input
    my @result;

    timethese(0, {
        'Regex' => sub {
            undef @result;
            @result = ($input =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/);
            # print join "\t", @result;
        },
        'Unpack' => sub {
            undef @result;
            @result = unpack "A16A20A20AA8", $input;
            # print join "\t", @result;
        },
        'Substr' => sub {
            undef @result;
            # offset/length pairs follow the 16/20/20/1/8 field widths
            @result = ((substr $input,  0, 16),
                       (substr $input, 16, 20),
                       (substr $input, 36, 20),
                       (substr $input, 56,  1),
                       (substr $input, 57,  8));
            # print join "\t", @result;
        },
    });
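Hard-coded substr offsets are easy to get wrong by one, since each offset is the running sum of the widths before it (here 0, 16, 36, 56, 57). A minimal sketch that derives the offsets from the widths and cross-checks the substr result against unpack before timing anything; the record, with a different letter filling each field, is a made-up test value chosen so that a misplaced offset shows up as a mismatch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @widths = (16, 20, 20, 1, 8);    # field widths from the question

# Test record: field 0 is all 'a', field 1 all 'b', and so on.
my $record = join '', map { chr(ord('a') + $_) x $widths[$_] } 0 .. $#widths;

# Each field starts where the previous one ended.
my @offsets;
my $pos = 0;
for my $w (@widths) {
    push @offsets, $pos;
    $pos += $w;
}

my @by_substr = map { substr($record, $offsets[$_], $widths[$_]) } 0 .. $#widths;
my @by_unpack = unpack(join('', map { "a$_" } @widths), $record);

# Cross-check before benchmarking: both must yield identical fields.
my $same = join('|', @by_substr) eq join('|', @by_unpack);
print $same ? "match\n" : "MISMATCH\n";
```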
Re: Fixed-Length Fields Question
by japhy (Canon) on Jul 24, 2000 at 20:55 UTC
    The Perl docs suggest using unpack() instead of multiple substr()s. Also, in theory the regex approach should use the /s modifier, in case your data has embedded newlines.

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
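The /s point in practice: without /s, `.` does not match a newline, so a record with a newline inside a field fails to match at all, while unpack works purely by position. A minimal sketch with a made-up record whose second field contains a newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 65-character record; field 2 is "line one\nline two" plus 3 spaces.
my $record = ('A' x 16) . ("line one\nline two" . (' ' x 3))
           . ('C' x 20) . 'D' . ('E' x 8);

my @without_s = $record =~ /^(.{16})(.{20})(.{20})(.{1})(.{8})/;
my @with_s    = $record =~ /^(.{16})(.{20})(.{20})(.{1})(.{8})/s;
my @unpacked  = unpack 'a16 a20 a20 a1 a8', $record;

printf "without /s: %d fields, with /s: %d fields, unpack: %d fields\n",
    scalar @without_s, scalar @with_s, scalar @unpacked;
```

Without /s the match silently returns an empty list, which is easy to miss if the result is not checked.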
