PerlMonks

[SOLVED] Pack/unpack - understanding the '@' and '.' templates

by ateague (Monk)
on Mar 30, 2013 at 19:14 UTC

ateague has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE: I got my example to work with the following code:

say unpack '(C.*/X/xa@1)3', "\003\003\003abcdef"; # prints 'aaa'
say unpack '(C.*/X/xa@1)3', "\005\004\003abcdef"; # prints 'cba'
Thank you Loops and 7stud for your help.



Good morning! I am trying to get a better understanding of Perl's pack/unpack function.

I have a pretty good grasp of the "standard" templates, but I simply cannot get my head around the "@" and "." templates.

Does anyone know of (or have) any tutorials/examples that delve more in depth with the "@" and "." templates? perldoc pack and perlpacktut have the briefest mention of "@" and (to my eyes) no mention of ".".

A more concrete example: After playing around with pack and the "/" modifier, I came up with the following:

say unpack '(C/xa@)3', "\003\003\003abcdef"; # prints 'bcd'
say unpack '(C/xa@)3', "\005\004\003abcdef"; # prints 'ddd'
1. I understand what C/xa is doing, but what does @ do in this case?

2. How would I make the template skip from the absolute start of the string, rather than from a relative position?
e.g.:

say unpack '(C/xa@)3', "\003\003\003abcdef"; # prints 'bcd' - would like 'aaa'
say unpack '(C/xa@)3', "\005\004\003abcdef"; # prints 'ddd' - would like 'cba'

Thank you for your time.

Re: Pack/unpack - understanding the '@' and '.' templates
by Loops (Curate) on Mar 31, 2013 at 07:11 UTC
    Hi there,

    So, "@" repositions the pointer that marks which character of the input string will be consumed next. You can pass it a number, so:

    say unpack 'aaaaa@0aa', 'ABCDEFGH'; # prints ABCDEAB
    Now, when it's inside a group, @ doesn't count from the start of the string but from where the current group started matching:
    say unpack 'aa(aaa@0)aa', 'ABCDEFGH'; # prints ABCDECD
    Now you'll notice I was supplying a 0 parameter to @, which means back to the very start of the string or group. But if you don't supply a number, 1 is used as a default.
    say unpack 'aa(aaa@)aa', 'ABCDEFGH';  # prints ABCDEDE
    say unpack 'aa(aaa@1)aa', 'ABCDEFGH'; # prints ABCDEDE
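A repeated group shows the group-relative behaviour of @ nicely. The sketch below is an illustration rather than something from the thread, and the record layout is made up: it treats a string as fixed-width 3-byte records and takes the first byte of each. Inside `(...)*`, `@3` moves the pointer to 3 bytes past the start of the current repetition, so the next repetition begins on the next record.

```perl
use strict;
use warnings;
use feature 'say';

my $records = 'abcdefghi';    # three made-up 3-byte records: abc def ghi

# a1 reads one byte; @3 jumps to 3 bytes past the start of the current
# repetition of the group, i.e. to the start of the next record.
# The * repeat stops once the pointer reaches the end of the string.
my @firsts = unpack '(a1 @3)*', $records;

say "@firsts";    # a d g
```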
    So in your first example:
    say unpack '(C/xa@)3', "\003\003\003abcdef"; # prints 'bcd'

    C reads the first byte (which is 3), and /x then uses it as a count to skip 3 bytes forward in the input string, where the 'a' reads the next character, yielding "b". The @ has no numeric count and sits inside a group, so it moves the pointer to 1 byte past the point where the group started. Since this first repetition of the group started at the very first byte, the new starting point is the second byte.

    The whole thing then repeats, anchored on this new starting point. The next time @ is met, it moves the pointer to 1 byte past the current starting point, i.e. the 3rd byte.

    In this way, the 3 offset bytes at the start of the input expression can each be scanned. We jump forward to process what each one references, and then we jump back to process the next offset.
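To make that pointer movement explicit, here is a rough hand-rolled equivalent of `(C/xa@)3` written with `substr` and `ord`, checked against unpack itself. `relative_lookup` is a hypothetical helper for illustration only, and it does no bounds checking.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper mimicking unpack "(C/xa\@)$n", $str
sub relative_lookup {
    my ($str, $n) = @_;
    my @out;
    for my $group_start (0 .. $n - 1) {
        # Each repetition starts where the previous '@' (i.e. '@1') left the
        # pointer: one byte past the previous group start.
        my $skip = ord substr($str, $group_start, 1);   # C: one byte as an integer
        my $pos  = $group_start + 1 + $skip;            # /x: skip that many bytes
        push @out, substr($str, $pos, 1);               # a: read one character
    }
    return join '', @out;
}

for my $str ("\003\003\003abcdef", "\005\004\003abcdef") {
    my $by_unpack = join '', unpack '(C/xa@)3', $str;
    say relative_lookup($str, 3), ' ', $by_unpack;   # bcd bcd, then ddd ddd
}
```

The skip is relative because it is added to the position just after the byte that was read, which is exactly what the question's first example shows.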

    Now to the "." character. It's kind of tricky. Essentially, it gives you the current pointer position within the input string. So:
    say unpack 'aa.', 'ABCDEFGH';   # prints AB2
    say unpack 'aaaa.', 'ABCDEFGH'; # prints ABCD4
    Now we can combine this with the '/' operator, which will consume this number instead of sending it to the output:
    say unpack 'aa./xaa', 'ABCDEFGH'; # prints ABEF
    What happened? The first two 'a's print AB, and the '.' returns 2 (as in the example above), but instead of being printed, that 2 is consumed by '/x', which moves the input pointer forward 2 characters. The next two 'a's then print EF. When used inside a group, '.' by default returns the offset from the start of the group! But with the '*' modifier, it ignores the group and returns the absolute position from the start of the string:
    say unpack 'aa.aa', 'ABCDEFGH';    # prints AB2CD
    say unpack 'a(a.a)a', 'ABCDEFGH';  # prints AB1CD
    say unpack 'a(a.*a)a', 'ABCDEFGH'; # prints AB2CD
    And this should give you a hint about the answer to your second question. To return to the absolute start of the string, use the ".*" construct to get your current position combined with "/X" to move backward that many characters:
    say unpack 'aa(aa).*/Xaa', 'ABCDEFGH'; # prints ABCDAB
    say unpack 'aa(aa.*/Xaa)', 'ABCDEFGH'; # prints ABCDAB
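Applied to question 2, the same trick gives the template from the question's update. The sketch below wraps it in a hypothetical `absolute_lookup` helper (the name and the `$n` parameter are illustrative assumptions) and compares it with a plain `substr` lookup, treating each of the first `$n` bytes as an absolute index into the whole string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: each of the first $n bytes is an absolute index.
# C reads the index; .* pushes the absolute position; /X pops that position
# and backs up by it (back to offset 0); /x then pops the index and skips
# forward by it. @1 returns to one byte past the start of the repetition.
sub absolute_lookup {
    my ($str, $n) = @_;
    return join '', unpack "(C.*/X/xa\@1)$n", $str;
}

# The same lookup done directly with substr, for comparison.
sub substr_lookup {
    my ($str, $n) = @_;
    return join '', map { substr $str, ord(substr $str, $_, 1), 1 } 0 .. $n - 1;
}

for my $str ("\003\003\003abcdef", "\005\004\003abcdef") {
    say absolute_lookup($str, 3), ' ', substr_lookup($str, 3);   # aaa aaa, then cba cba
}
```

Note the order: `.*` is pushed after the value from `C`, so the first `/` consumes the position and the second `/` consumes the index.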
Re: Pack/unpack - understanding the '@' and '.' templates
by 7stud (Deacon) on Mar 31, 2013 at 10:01 UTC

    Also note you can set the input string's pointer at any position at any time with @:

    @29 A3 @0 A3 @39 A3

    Here's a full example:

    use strict;
    use warnings;
    use 5.016;

    my $text = <<END_OF_TEXT;
              1         2         3
    012345678901234567890123456789012
    date      last      first     id
    2/1/10              Smith     001
    3/1/11    Smith     Betty     002
    4/2/12              Jones     003
    END_OF_TEXT

    open my $INFILE, '<', \$text
        or die "Couldn't open string for reading: $!";

    <$INFILE> for 1 .. 3;    # skip the two ruler lines and the header line

    while (my $line = <$INFILE>) {
        chomp $line;
        # first name, last name, id, date (A6 so the full date is kept)
        printf "%-10s %-10s %-3s %-6s \n",
            unpack('@20 A10 @10 A10 @30 A3 @0 A6', $line);
    }

    close $INFILE;

    --output:--
    Smith                 001 2/1/10
    Betty      Smith      002 3/1/11
    Jones                 003 4/2/12

    I understand what C/xa is doing,...

    Maybe not. The 'C' reads the first byte from the input string and treats it as an integer. After that read, the position pointer for the string advances to the start of the second byte. Next, '/x' says to use the integer on top of the stack as the repeat count for 'x'. What the heck is the stack and where does it come from? According to the docs, the stack is...

    …an internal stack of integer arguments unpacked so far.
    

    That means if you use the template 'C2 /A' on the string "\001\002hello", then the stack looks like this:

    2
    1
    

    The first integer that was unpacked, 1, was placed on top of the stack, then the second integer that was read, 2, was placed on top of the stack. Subsequently, the '/A' pops the value off the top of the stack, i.e. 2, and instead of being inserted into the results array the 2 becomes the repeat count for the 'A':

    my $str = "\001\002hello";
    my @results = unpack 'C2 /A', $str;
    say "@results";

    --output:--
    1 he
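Because '/' always takes whatever is on top of the stack, several length values read up front are used in reverse order. A small sketch with made-up data:

```perl
use strict;
use warnings;
use feature 'say';

my $str = "\002\003abcde";

# C reads 2, then C reads 3: the stack now holds 2 with 3 on top.
# The first /a pops 3 and reads 'abc'; the second /a pops 2 and reads 'de'.
# Neither length value ends up in the results.
my @results = unpack 'C C /a /a', $str;

say "@results";    # abc de
```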

    So what happens when you apply the template 'C/xa' to the string '\003\003\003abcd'? The 'C' says to read one byte from the input string and treat it as an integer. After that read, the position pointer for the input string advances to the start of the second byte, and the 3 from the first byte is placed on top of 'the stack':

    \003\003\003abcd
        ^
        |
       position pointer after reading 'C'
    
    
    stack:  3
    

    Then '/x' pops the 3 off the stack and uses it as the repeat count for the 'x', and as a result the 3 doesn't get inserted into the results array. Then because 'x' means to skip a byte and the repeat count is 3, unpack() skips 3 bytes in the input string starting at the location of the position pointer:

    \003\003\003abcd
        ^
        |
       position pointer after reading a 'C'
    
    
    \003\003\003abcd
                 ^
                 |
       position pointer after reading 'C/x'
    
    
    
    Finally, the 'a' in the template reads the character 'b'.
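You can also ask unpack where the pointer is, because `.*` pushes the current absolute offset into the results. In the sketch below (an illustration, not from the thread) the `.*` markers go before `C/x` and after each later step; putting one between `C` and `/x` would make `/x` consume the offset instead of the byte value.

```perl
use strict;
use warnings;
use feature 'say';

my $str = "\003\003\003abcd";

# .*   -> 0    (pointer at the start)
# C/x  -> reads 3 (consumed by /x), then skips 3 bytes: pointer at 4
# .*   -> 4
# a    -> 'b'  (pointer moves to 5)
# .*   -> 5
my @trace = unpack '.* C/x .* a .*', $str;

say "@trace";    # 0 4 b 5
```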

    perldoc pack and perlpacktut have the briefest mention of "@" and (to my eyes) no mention of ".".

    Whoever wrote the pack tutorial and pack docs should be shot. I vote for deleting the current pack tutorial and pack docs, and starting with what Loops wrote.

      Shot? I didn't know you were a sad clown, 7stud.

      pack/unpack had even less documentation before those pages were written up

Re: Pack/unpack - understanding the '@' and '.' templates
by ambrus (Abbot) on Mar 31, 2013 at 11:14 UTC
      Heh heh

      Yup, I saw those. They inspired me to experiment deeper with pack/unpack. Oddly enough, your Just another unpack hacker helped me conceptualise what the "/" modifier does. (You know the docs are in a sorry state when you turn to a JAPH for help... o_0)

Re: Pack/unpack - understanding the '@' and '.' templates
by ateague (Monk) on Mar 31, 2013 at 14:44 UTC
    Thank you Loops and 7stud for your explanations. You have certainly given me a few things to chew over and experiment with.

    Whoever wrote the pack tutorial and pack docs should be shot. I vote for deleting the current pack tutorial and pack docs, and starting with what Loops wrote.

    I doubleplus agree with that (the deleting the existing docs part, not the shooting bit). The existing pack and packtut pages desperately need a vigorous round of editing - preferably with a gang mower and a fire axe. The previous author should be sentenced to rewriting the GNU tar man/info pages until they are clear enough to actually use.

Re: Pack/unpack - understanding the '@' and '.' templates
by Anonymous Monk on Mar 31, 2013 at 03:45 UTC

    update: :) good night

    1. I understand what C/xa is doing, but what does @ do in this case?

    Looks like a typo to me :)

    http://perldoc.perl.org/perlpacktut.html#The-Alignment-Pit gives the use case for this one; like a lot of these, it's weird :)

    Here is what I think I kinda know: remove the @ and you get

    'x' outside of string in unpack

    So I think @ means "start from 0", where 0 is the beginning of the innermost group (), instead of running outside the end of the string :)

    So C/x says: read one byte/octet, then SKIP (x) that many bytes. So it eats the first \003 and skips the next 3 bytes, which are \003\003a

    And it's a group, so it repeats that three times, each time with the C/ eating up one byte, so it inches forward
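That diagnosis is easy to check. The sketch below (illustrative only) runs the template with and without the trailing `@`, the second inside `eval`: without `@`, the second repetition starts right after the 'b', so `C` reads 'c' (byte value 99) as a skip count and `/x` runs off the end of the string, which should produce the 'x' outside of string in unpack error quoted above.

```perl
use strict;
use warnings;
use feature 'say';

my $str = "\003\003\003abcdef";

# With @ (i.e. @1): each repetition starts one byte after the previous one.
say join '', unpack '(C/xa@)3', $str;    # bcd

# Without @: the second repetition starts at 'c', so C reads 99
# and /x tries to skip 99 bytes past the end of a 9-byte string.
my $ok = eval { my @r = unpack '(C/xa)3', $str; 1 };
say $ok ? 'no error' : "error: $@";
```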

    Yeah, I don't think you can do this in one step; I think it's a two-step operation, like the one shown in the Alignment Pit

    2. How would I make the template skip from the absolute start of the string, rather than from a relative position? e.g.:

    Wait a minute, doesn't this mean you understand 1.?

    I think you can't do that; now the original makes even less sense to me

    OK, here's the two-step version:

    use 5.12.0;
    my $raw = "\003\003\003abcdef";
    my @offsets = unpack '(C)3', $raw;                         # step 1: read the offsets
    my $chunks = join ' ', map { '@' . $_ . 'a' } @offsets;   # build '@3a @3a @3a'
    say $chunks;
    say for unpack $chunks, $raw;                              # step 2: absolute lookups
    __END__
    @3a @3a @3a
    a
    a
    a

    update: HAHAHAHAHAHAAH I DID IT  (C/x! a@)3 gets you aaa. Not sure how that works, makes no sense to me, but it's straight from the Alignment Pit section
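One possible explanation, assuming `x!N` in unpack skips forward to the next multiple of N counted from the start of the string (which is what the 'aaa' result suggests): after `C` reads an offset N from byte i, the pointer sits at i+1, and whenever N >= i+1 the next multiple of N at or after i+1 is N itself. So `(C/x! a@)3` lands on the absolute offset as long as every offset points at or past the byte that follows its own offset byte, which holds here because the data starts after the three offset bytes. The sketch below compares it with the `.*/X/x` template from the question's update over all offsets from 3 to 8; it is a check for this layout only, not a general guarantee (an offset of 0, for example, would most likely not be treated the same way by the two templates).

```perl
use strict;
use warnings;
use feature 'say';

# Absolute lookup via the .*/X/x template from the question's update.
sub via_rewind { join '', unpack '(C.*/X/xa@1)3', $_[0] }

# The same lookup via x! alignment, as found in this reply.
sub via_align  { join '', unpack '(C/x! a@)3', $_[0] }

my $mismatches = 0;
for my $i (3 .. 8) {
    for my $j (3 .. 8) {
        for my $k (3 .. 8) {
            # Three offset bytes, each pointing into 'abcdef' (offsets 3..8).
            my $str = pack('C3', $i, $j, $k) . 'abcdef';
            $mismatches++ if via_rewind($str) ne via_align($str);
        }
    }
}
say "mismatches: $mismatches";    # expected 0 under the assumption above
```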

    This is what I don't like about pack/unpack,

Node Type: perlquestion [id://1026297]
Approved by toolic
Front-paged by davido