http://www.perlmonks.org?node_id=895919


in reply to Re^2: Using pack to evaluate text strings as hexadecimal values
in thread Using pack to evaluate text strings as hexadecimal values

I'm going to "pop this up a few levels", because the exchange between BrowserUk and me isn't easily visible any more. Here's a (somewhat edited) version of the last couple exchanges. Sorry, my html smarts don't include making the quotations evident, and the Formatting Tips didn't help a lot.

(jpl) Why couldn't we have a format item "Y" (where "Y" is some unused format character, finding which may be the real problem) such that

pack("Y4", 32); # produces "0020" unpack("Y4", "0020"); # produces 32

There are many format characters, v among them, that pack integers into strings and unpack strings into integers. I believe the only practical difference between "Y" and "v" is the contents of the string. The one produced by "v" may be unsuitable for display, the one produced by "Y" would be both displayable and, when displayed, indicative of the bits in the integer from which it was produced.

(BrowserUk:) It could work and would be useful. It would be a departure from the norm of converting numerics to and from their binary representation.

There are some issues as to what should happen if you specified pack 'Y4', 100000; or pack 'Y4', 32.0 but actually that perhaps suggests a way around the lack of remaining letters that also has an existing precedent.

To skip a complex structure--say consisting of 2 shorts and float--the syntax is X[vvf], meaning skip enough bytes to cover 2 shorts and a float. Ie. 2+2+4 = 8 bytes.

To get your hexified numeric, the syntax could be pack 'H[v]', 32, meaning treat the number as a a 16-bit int and hexify it; thereby producing your 4 bytes of output.

The nice thing is that this then extends naturally to pack 'H[V]', 32; to produce 8-bytes of hex. And pack 'H[Q]', 123456789012345; and even pack 'H[f]', 1234.56e78; and so on.

And once that is accepted, this further extends to the other bane of pack/unpack; binary. With 'B[v] b[V] B[d]' etc. And a quick peruse of the docs suggest] that 'O' isn't currently used, so maybe 'O[v] O[Q]' might be useful also.

Now all you've got to do is: knock up the patch; get it by p5p; and wait for it to make it into a build :)

(End of thread summary).

Perhaps, rather than having to reproduce much of the complexity of the [v] [Q] etc. notation, the format item could "functionally compose" with the item that followed. (Not so different from BrowserUk's Re^3 suggestion, but all carried out within pack/unpack.) So, whatever string the following pack format item produces gets turned into printable hex (or octal or binary, I suppose), and unpack would turn the printable string into the (usually unprintable string) that the next item expects to unpack. I haven't actually looked at the pack/unpack code, but this has the "feel" of something that should be able to piggy-back off existing code to do all the "hard work".

Replies are listed 'Best First'.
Re^4: Using pack to evaluate text strings as hexadecimal values
by BrowserUk (Patriarch) on Mar 28, 2011 at 14:17 UTC

    You've done no one any favours but "popping a few levels". This is a hierarchal forum and, used correctly, all posts are "visible" regardless of how deep they get,

    As for the idea of functional composition of pack formats. You'd need to post a couple of examples of your intent before I would be able to interpret your meaning?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Here's what I have in mind. pack, in the abstract, takes a substring of the format string, starting at some offset, and an argument to pack, converts the argument to a string that gets appended to the result returned by pack, and updates the offset in the format string so the next argument can be processed. Maybe that's also how the code works, maybe not, as I said, I haven't yet looked. If it were possible for a format item, let's call it Y, to "intercept" the string produced by whatever what follows it in the format string, and return the "printable hex" version as the result (pretty much as the H format does now to the string argument it expects in the argument list), then we would already have all the integer width/endian-ness/sign-edness processing done for us by existing code. Maybe Y[o] could get us octal or Y[X] would yield hex with uppercase A-F. But the heavy lifting would be done by whatever follows in the format string. So

      pack("Yv", -1); # yields "ffff" pack("YN", -1); # yields "ffffffff" pack("Y[X]v", -1); # maybe yields "FFFF" pack("Y[o]v", -1); # maybe yields "177777" pack("Y[b]c", 15); # maybe yields "00001111"

      The point being, what follows the Y construct determines the content (including length) of the string, and the Y construct turns that string, wherever it came from, into something printable.

      Unpack would do the reverse, but it would be a little more complicated, because it would have to determine, from what follows in the format string, how long a string is expected. Knowing that, it could convert a printable string into whatever string the following format expects.

      This would make the job of the Y format item nearly trivial, but it would put some demands on the pack/unpack code, like being able to determine the length of the string a format string expects to unpack. If we restricted what could follow Y to a single letter (no skipping, repetitions, etc.), that could easily be provided, and would be adequate for handling all the integer and floating point values, which are the major suspects in packing into unprintable strings.

      BrowserUk's suggestions about how to generalize the concept are the sort of wisdom I was hoping for from this forum. I'd like to refine the idea some before digging into the code and maybe bouncing it off p5p.

         pack("Yv", -1);      # yields "ffff"

        I thought that is what you meant by functional composition, but I wanted to be sure.

        The problem is, abuttal of template characters is already allowed. pack 'NQ', ... says pack the first argument as a unsigned dword (in network order), and the second as an unsigned quadword. And it is entirely equivalent to pack 'N Q', ... and pack 'N1Q1, ....

        In other words, abuttal has no significance. You are suggesting making it significant. Besides that it could break existing code, whilst it may possible to code Your Y character to expect and handle abuttal to be significant, the human eye/brain is less reliable.

        The beauty of my suggestion is that it is a natural extension of existing implemented syntax.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.