PerlMonks  

Splitting into variables columned data without delimiters with a regexp ?

by gerleu (Novice)
on Nov 01, 2011 at 09:21 UTC

gerleu has asked for the wisdom of the Perl Monks concerning the following question:

Hello again and sorry to bother you with a perhaps interesting question:

I have a text without any data or line delimiters, and I need to split it into different variables (ideally $1, $2, etc.): for example, the data in positions 0 (first char) to 5 in one variable, then the data in positions 6 to 10 in another variable, for each chunk of 6+5=11 chars, and then continue with the following chunks.

But I need to do it using a regexp instead of unpack('A6 A5',$chunk_of_data) because the latter is soooo slow...

Could you please suggest a regexp to do it in a very efficient way? I've been searching for a while without result!

Many thanks in advance for your celestial help to fulfill my humble prayer...

Germain from http://www.vehicall.com

Replies are listed 'Best First'.
Re: Splitting into variables columned data without delimiters with a regexp ?
by BrowserUk (Patriarch) on Nov 01, 2011 at 09:31 UTC
    I need to do it using a regexp instead of unpack('A6 A5',$chunk_of_data) because the latter is soooo slow...

    The regex engine will not be quicker than unpack.

    Your perceived problem with the performance of unpack is probably because you are calling it for each chunk rather than unpacking all the chunks in one go:

    my @bits = unpack '(A6A5)*', $all_the_data;

    This is much faster than unpacking each 11-byte chunk individually.

    eg:

    $data = 'AAAAAABBBBB' x 10;;
    @bits = unpack '(A6A5)*', $data;;
    print for @bits;;
    AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB AAAAAA BBBBB
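To try the comparison between one unpack call per chunk and a single unpack over the whole string, something like the following standalone script could be used. It is only a sketch: the sub names `per_chunk` and `all_at_once` and the 1000-record test string are made up for illustration, and the rates printed will depend on the machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 records of 6 + 5 characters each.
my $all_the_data = 'AAAAAABBBBB' x 1000;

# Per-chunk approach: one unpack call for every 11-character record.
sub per_chunk {
    my ($data) = @_;
    my @bits;
    for (my $pos = 0; $pos < length($data); $pos += 11) {
        push @bits, unpack('A6 A5', substr($data, $pos, 11));
    }
    return \@bits;
}

# Global approach: a single unpack call with a repeated group.
sub all_at_once {
    my ($data) = @_;
    my @bits = unpack('(A6A5)*', $data);
    return \@bits;
}

# Both approaches should produce the same flat list of fields.
my $chunked = per_chunk($all_the_data);
my $global  = all_at_once($all_the_data);
print "same result\n" if "@$chunked" eq "@$global";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    per_chunk   => sub { per_chunk($all_the_data) },
    all_at_once => sub { all_at_once($all_the_data) },
});
```

The equality check before the benchmark is there so that the two variants are known to do the same work before their speeds are compared.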

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi BrowserUk and thanks again for your kind help! You are totally right: I will investigate the global unpack you suggest and will let you know the result... I was thinking about solutions with sprintf or split too, but I doubt they will be faster than a global unpack.
        I was thinking about solutions with sprintf or split too

        split also invokes the regex engine, and doesn't really lend itself to the task, so it's not going to help any.

        I've no idea how sprintf could be used for this as its primary purpose is composing strings, not decomposing them.
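As a small illustration of that distinction (the field values here are invented), sprintf can build a fixed-width record from separate values, while unpack goes in the other direction and takes the record apart again:

```perl
use strict;
use warnings;

# Compose: left-justify two hypothetical values into 6- and 5-character fields.
my $line = sprintf('%-6s%-5s', 'abc', 'de');

# Decompose: 'A' fields strip the trailing space padding again.
my ($first, $second) = unpack('A6 A5', $line);

print length($line), " [$first] [$second]\n";   # prints: 11 [abc] [de]
```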

        The regex engine can decompose fixed-field data surprisingly efficiently, but it will never beat unpack, which was designed for this express purpose:

        cmpthese -1,{
            a => q[ my $s='x'x1100; my@bits= unpack'(A6A5)*',$s; ],
            b => q[ my $s='x'x1100; my@bits= $s=~m[(.{6})(.{5})]g; ],
        };;
              Rate    b    a
        b 4516/s   --  -6%
        a 4780/s   6%   --

        Not huge savings but they grow with size:

        cmpthese -1,{
            a => q[ my $s='x'x11000; my@bits= unpack'(A6A5)*',$s; ],
            b => q[ my $s='x'x11000; my@bits= $s=~m[(.{6})(.{5})]g; ],
        };;
             Rate    b    a
        b 427/s   -- -10%
        a 471/s  11%   --
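One caveat when swapping between the two, shown here as a sketch with an invented space-padded record: the benchmarks above use a string of 'x' characters, where both variants return identical fields, but on padded data they can differ. The 'A' format strips trailing whitespace, 'a' returns the bytes untouched, and the regex keeps the padding (and needs /s if a field may contain a newline, since . does not match "\n" by default).

```perl
use strict;
use warnings;

# Hypothetical record: 'ab' padded to 6 characters, 'cd' padded to 5.
my $record = 'ab' . (' ' x 4) . 'cd' . (' ' x 3);

my @stripped = unpack('(A6A5)*', $record);       # 'A' strips trailing spaces
my @verbatim = unpack('(a6a5)*', $record);       # 'a' keeps the padding
my @by_regex = $record =~ /(.{6})(.{5})/gs;      # regex keeps the padding too

print "A:     [$stripped[0]] [$stripped[1]]\n";
print "a:     [$verbatim[0]] [$verbatim[1]]\n";
print "regex: [$by_regex[0]] [$by_regex[1]]\n";
```

So a like-for-like replacement for the regex is closer to '(a6a5)*' than to '(A6A5)*' whenever the padding matters.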

Re: Splitting into variables columned data without delimiters with a regexp ?
by AnomalousMonk (Archbishop) on Nov 01, 2011 at 09:49 UTC

    I agree with BrowserUk's reply: I cannot imagine that a regex would be faster than unpack (or, if it were, faster to any significant degree) at extracting what seem, from the description in the OP, to be fixed-width data fields.

    I would like to add that the lack of example input and output in the OP makes it much harder (for me at least) to understand exactly what is required. Even a very simple example would have helped greatly.

      Hello, I'm back with my findings! In my case, I'm getting the columned data from a file, and I've benchmarked global unpacks with file slurping against setting the different variables on the fly with specific sequential read statements: the second solution is always faster! I guess this is simply due to the efficient I/O read buffer provided by my Ubuntu Linux operating system... Many thanks anyway for all your pertinent remarks, as ever, which taught me a lot about our beloved programming language, Perl of course :-) Bye now, Germain
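The post does not show the code that was benchmarked, so the following is only a guess at what "specific sequential read statements" might look like: each field is read straight into its own variable with read(), with no slurping and no unpack. An in-memory filehandle and invented record contents stand in for the real file.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the OP's file: three 6 + 5 character records.
my $contents = join '', 'AAAAAA', 'BBBBB', 'CCCCCC', 'DDDDD', 'EEEEEE', 'FFFFF';
open(my $fh, '<', \$contents) or die "Cannot open in-memory file: $!";

my @records;
# read() returns the number of characters actually read (0 at end of file),
# so the loop stops as soon as a complete 6 + 5 record is no longer available.
while (read($fh, my $first, 6) == 6 && read($fh, my $second, 5) == 5) {
    push @records, [$first, $second];
}
close($fh);

print "$_->[0] $_->[1]\n" for @records;
```

With a real file, buffered I/O means each read() is mostly served from memory, which is consistent with the explanation offered above for why this approach performed well.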

Re: Splitting into variables columned data without delimiters with a regexp ?
by Ratazong (Monsignor) on Nov 01, 2011 at 09:33 UTC

    If I understand your requirements, you need to get 11 characters, split into one group of 6 and one of 5. The code below does just that (using the metacharacter .).

    my $s = "xabcdefghijXABCDEFGHIJ01234567890";
    while ($s =~ /(......)(.....)/g) {
        print "$1 $2\n";
    }
    HTH and have fun benchmarking, Rata

      Thank you Ratazong, I will investigate your suggestion too !
