PerlMonks  

Fixed-Length Fields Question

by mrmick (Curate)
on Jul 24, 2000 at 14:32 UTC (#24043, perlquestion)
mrmick has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks

Which of the following would be better/more efficient at parsing a fixed length record into its fields:

/(.{16})(.{20})(.{20})(.{1})(.{8})/
OR
unpack($PS_T, $_);
I humbly thank you in advance. Mick
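
For concreteness, here is a minimal sketch of the two candidates side by side. The question never shows the contents of $PS_T, so the template below is an assumption derived from the field widths in the regex; the sample record and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Assumed template: the question does not show $PS_T, so the widths
# here are copied from the regex (16, 20, 20, 1, 8).
my $PS_T = 'A16 A20 A20 A1 A8';

# Regex approach: five capture groups, one per field.
sub parse_regex {
    my ($rec) = @_;
    return $rec =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/s;
}

# unpack approach: one template describing the whole record.
sub parse_unpack {
    my ($rec) = @_;
    return unpack $PS_T, $rec;
}

# Hypothetical 65-character record, space-padded to the field widths.
my $record = sprintf '%-16s%-20s%-20s%1s%-8s',
    'ACCT0001', 'Mick', 'Monk', 'Y', '20000724';

my @by_regex  = parse_regex($record);
my @by_unpack = parse_unpack($record);
```

Both calls return five fields. With this assumed template the regex keeps the padding while the "A" format drops trailing spaces, so the two lists are not identical.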

Re: Fixed-Length Fields Question
by turnstep (Parson) on Jul 24, 2000 at 14:42 UTC

    As a general rule, Perl's built-in functions such as unpack, split, and pop do a better job (i.e. are more efficient) than doing the same work yourself with a regular expression or other hand-written code. There are exceptions, of course, but Perl's built-ins have been heavily optimized by very clever people. Because they are generalized, you can probably write something faster for one specific set of data, but for generic data and everyday use the built-in functions win.

Re: Fixed-Length Fields Question
by ase (Monk) on Jul 24, 2000 at 15:04 UTC
    I ran the following quick Benchmark code to test:
    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $input = 'X' x 65;    # test input
    my @result;

    timethese(-10, {
        'Regex' => sub {
            @result = ($input =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/);
            # print join "\t", @result;
        },
        'Unpack' => sub {
            @result = unpack "A16A20A20AA8", $input;
            # print join "\t", @result;
        },
    });
    which yielded:
    Benchmark: running Regex, Unpack, each for at least 10 CPU seconds...
         Regex:  9 wallclock secs (10.44 usr +  0.00 sys = 10.44 CPU) @ 19859.10/s (n=207329)
        Unpack: 11 wallclock secs (10.17 usr +  0.00 sys = 10.17 CPU) @ 33312.29/s (n=338786)

    It would seem unpack is more efficient in this case.
    -ase
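
    One caveat when comparing the two: the results are not strictly identical on padded data. In unpack, the "A" format strips trailing spaces from each field, while the "a" format returns the bytes unchanged, which is what the regex captures do. A small sketch, using a hypothetical space-padded record:

```perl
use strict;
use warnings;

# Hypothetical record: each value left-justified and space-padded
# to the field widths 16, 20, 20, 1, 8.
my $rec = sprintf '%-16s%-20s%-20s%1s%-8s', 'id', 'first', 'last', 'X', 'date';

# "A" strips trailing spaces from every field.
my @stripped = unpack 'A16 A20 A20 A A8', $rec;

# "a" keeps every field at its full width, padding included.
my @padded = unpack 'a16 a20 a20 a a8', $rec;
```

    Which one you want depends on whether the padding is meaningful downstream; with "a" the fields can be joined back into the original record.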
Re: Fixed-Length Fields Question
by lhoward (Vicar) on Jul 24, 2000 at 19:24 UTC
    I decided to test a substr approach. It is almost as fast as the unpack approach, but not quite.
    Benchmark: running Regex, Substr, Unpack, each for at least 3 CPU seconds...
         Regex:  2 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 42172.56/s (n=133687)
        Substr:  2 wallclock secs ( 3.12 usr +  0.00 sys =  3.12 CPU) @ 56391.35/s (n=175941)
        Unpack:  2 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 61831.65/s (n=195388)
    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $input = 'X' x 65;    # test input
    my @result;

    timethese(0, {
        'Regex' => sub {
            undef @result;
            @result = ($input =~ /(.{16})(.{20})(.{20})(.{1})(.{8})/);
            # print join "\t", @result;
        },
        'Unpack' => sub {
            undef @result;
            @result = unpack "A16A20A20AA8", $input;
            # print join "\t", @result;
        },
        'Substr' => sub {
            undef @result;
            # offsets follow the field widths 16, 20, 20, 1, 8
            @result = ((substr $input,  0, 16),
                       (substr $input, 16, 20),
                       (substr $input, 36, 20),
                       (substr $input, 56,  1),
                       (substr $input, 57,  8));
            # print join "\t", @result;
        },
    });
Re: Fixed-Length Fields Question
by japhy (Canon) on Jul 24, 2000 at 20:55 UTC
    The Perl docs suggest using unpack() instead of multiple substr()s. And the regex approach should be using the /s modifier (theoretically) in case your data has embedded newlines.
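
    To illustrate the /s point: without it, "." does not match a newline, so a record with a newline inside a field fails to match at all. A minimal sketch, using a hypothetical 65-character record whose second field begins with a newline:

```perl
use strict;
use warnings;

# Hypothetical record: 16 + 20 + 20 + 1 + 8 characters,
# with "\n" as the first character of the second field.
my $record = ('X' x 16) . "\n" . ('Y' x 19) . ('Z' x 20) . 'F' . ('D' x 8);

my $re_plain = qr/(.{16})(.{20})(.{20})(.{1})(.{8})/;
my $re_s     = qr/(.{16})(.{20})(.{20})(.{1})(.{8})/s;

# Without /s the match fails: no run of 65 non-newline characters exists.
my @plain = $record =~ $re_plain;

# With /s, "." also matches "\n", so all five fields are captured.
my @with_s = $record =~ $re_s;

# unpack works on raw positions and is unaffected by the newline.
my @unpacked = unpack 'a16 a20 a20 a a8', $record;
```

    Here unpack uses the "a" format so that its fields are byte-for-byte comparable with the regex captures.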

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
