Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Modifying pos() from within (?{...})

by Anonymous Monk
on Apr 26, 2018 at 13:42 UTC ( [id://1213605]=note: print w/replies, xml ) Need Help??


in reply to Modifying pos() from within (?{...})

Ah, tybalt89 beat me to it. Some fixes are needed to your regex

  • You want to match in scalar context, otherwise the m// will return all the groups of all the matches, and the while will loop forever
  • You will most definitely want an /s modifier to your regex, otherwise the dot will not match newline characters in the data
  • The regex engine has a limitation of 32767 to the repeat count; a workaround is needed

Here's one version:

#!/usr/bin/env perl use strict; use warnings; use 5.016; our $VERSION = 0.1; # Rebuild PNG image local $/; my $input = <DATA>; chomp $input; $input = pack "H*", $input; # Skip PNG Header $input = substr( $input, 8 ); $| = 1; while ( $input =~ m{ # four byte length (....) # four byte tag (....) # variable length data ((??{ # unpack length my $len = unpack "N", $1; print STDERR "$2 $len\n"; ".{30000}" x ($len/30000) . ".{@{[$len%30000]}}" })) # four byte crc (....) }gsx ) { my ( $len, $tag, $data, $crc ) = ($1, $2, $3, $4); say "Chunk len:", unpack "N", $len; say "Chunk tag:", $tag; say "Chunk data:", unpack "H*", $data; say "Chunk data len:", length $data; say "Chunk CRC: ", unpack "H*", $crc; } __DATA__ 89504e470d0a1a0a0000000d4948445200000010000000100803000000282d0f530000 +015c504c544547704c4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb5 +4f7eb54f7eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb34f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb44f7 +eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb5 +4f7eb54f7eb54f7eb54f7eb34f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7 +eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb4 +4f7eb54f7eb54f7eb5a29b27fc0000007374524e5300242d75280b02fa8bae61fdf7f +bf5e266d109dd6462faf9fc6b0dde6571b3d9b0031f01e0cd83d64505928fc51eb405 +582a426d052f12e5840e63ecea046ab248d0f6327c3e0c4ff311b67960822ef295f8a +093af76e3683c3d7213eb02192cbb8106f190db5ad42cc19ba7504c36150ec28de169 +2647000000cb4944415418d3636040005b264b0614e0e7e50c24f595617ccf201e5f3 +706062e16289f8d39b0d83fc09521980bc237737011e6f370b260080fb11390111195 +55b0b6f7f671b7d2658808e3e665176397282e2e766404298de229860173539040226 +f71b18db43848401eac222db9983d9445925385bb584e0a2490915a1c13c9c0c0aac1 +5fccc10612c8e1e74b8806d28682eacc60fb8b84d3536281b4910ea731588043303b3 +30e481ba8ea690b8104720b05b2e2813413879a96124840282fbf2009c4d034615504 +520015a424be37c5c7be0000000049454e44ae426082


But using unpack is better for, ahem, unpacking binary formats. You could write something like:

my ($tag, $data, $crc) = unpack q( (x4L X8Lx4/a* L)> ), $input;
Place ( )* around the template to get all the chunks.

Replies are listed 'Best First'.
Re^2: Modifying pos() from within (?{...})
by mxb (Pilgrim) on Apr 26, 2018 at 14:23 UTC

    Thanks for your help also!

    But using unpack is better for, ahem, unpacking binary formats. You could write something like:

    my ($tag, $data, $crc) = unpack q( (x4L X8Lx4/a* L)> ), $input;

    Place ( )* around the template to get all the chunks.

    Wow, that's concise and reasonably clear. I'm learning regex and more advanced pack|unpack at the same time so it's definitely something I need to dig into deeper.

    Armed with the pack documentation I dissected your pack string into the following:

    x4 - four null bytes (effectively skip the length fields)

    L - 32bit value (the tag)

    X8 - seek backwards 8 bytes (back to the start of the chunk

    L - 32bit value (data length)

    x4 - four null bytes (skip the tag field)

    / - pops the last value (data length) off the extraction stack and uses it as a repetition count for the next item

    a* - binary data (count is set by the / above); not sure why * follows it, it works when removed

    L - 32 bit CRC

    It seems that I need to go through perlpacktut next!

    I do have a small question however, when I try to repeat the group with * I get the following error:

    'x' outside of string in unpack

    Maybe this is something to do with the x at the start of the group? It's trying to read a new group starting just past the end of the string?

    edit:Changing the first part of the pack string to L seemed to fix the 'outside of string' error. The data length is now explicitly within the returned values, but could be ignored later.

    edit2:I span this out into a separate thread as it was getting too far off the original topic.

      I already answered in the second thread, that the $input has an extra \n at the end, and I just realized why. The funny thing is, I reached that conclusion for the wrong reason. When I did my test I didn't modify $/ and didn't call chomp, so the \n wasn't removed. You, on the other hand did have chomp, but with $/ set to undef. The chomp doc states:

      When in slurp mode ($/ = undef ) or fixed-length record mode ($/ is a reference to an integer or the like; see perlvar), chomp won't remove anything.

      So in my tests by forgetting to call chomp, I got the same result you had by disabling its effect, and therefore reached the correct conclusion :D

        When in slurp mode ... chomp won't remove anything.

        Another illustration of the wisdom of local-ized modification of globals in the narrowest scope, e.g.
            my $input = do { local $/;  <DATA>; };
        rather than in broad or, indeed, global scope even when "it's just a small program; it doesn't really matter what we do to globals": spooky action at a distance.


        Give a man a fish:  <%-{-{-{-<

Re^2: Modifying pos() from within (?{...})
by Anonymous Monk on Apr 26, 2018 at 13:55 UTC

    Oops, that * in the unpack template was redundant.

    my @chunks = unpack '( (x4L X8Lx4/a L)> )*', $input; while ( my ($tag, $data, $crc) = splice(@chunks, 0, 3) ) { ... }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1213605]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2024-04-23 23:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found