davis has asked for the wisdom of the Perl Monks concerning the following question:


I've been writing some code to trundle through Mozilla Thunderbird's mbox format (with an eventual aim to copying all the folder structure and the emails to an IMAP server), and I've come up against the following problem. I think this is where my lack of a computer science education fails me. I'm trying to parse the X-Mozilla-Status field, which is stored as a string. I've thought of prepending a "0x" and evaling the result, but, after reading Pack/Unpack Tutorial (aka How the System Stores Data) I thought I should be able to do this with pack.

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $string = "0002";

my $MSG_FLAG_READ          = 0x0001;
my $MSG_FLAG_REPLIED       = 0x0002;
my $MSG_FLAG_MARKED        = 0x0004;
my $MSG_FLAG_EXPUNGED      = 0x0008;
my $MSG_FLAG_HAS_RE        = 0x0010;
my $MSG_FLAG_ELIDED        = 0x0020;
my $MSG_FLAG_OFFLINE       = 0x0080;
my $MSG_FLAG_WATCHED       = 0x0100;
my $MSG_FLAG_SENDER_AUTHED = 0x0200;
my $MSG_FLAG_PARTIAL       = 0x0400;
my $MSG_FLAG_QUEUED        = 0x0800;
my $MSG_FLAG_FORWARDED     = 0x1000;
my $MSG_FLAG_PRIORITIES    = 0xE000;

my $value = pack("H4", $string);
#print Dumper(\$value);

if($value && $MSG_FLAG_READ) {
    print "Message has been read\n";
} else {
    print "Unread message\n";
}

I would expect this to produce "Unread message", but the "if" is always successful. I was also expecting the writing of this node to lead me to the answer as usual, but there you go.

Kids, you tried your hardest, and you failed miserably. The lesson is: Never try.

Re: Using pack to evaluate text strings as hexadecimal values
by polettix (Vicar) on Mar 07, 2006 at 13:02 UTC
    You don't need pack, you need hex:
    shell$ perl -e 'print hex("E000"), "\n"'
    57344
    You also need to use & (bitwise AND) instead of && (logical AND).

    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      Perfect, exactly what I wanted. Thanks very much!

Re: Using pack to evaluate text strings as hexadecimal values
by ikegami (Pope) on Mar 07, 2006 at 14:47 UTC

    In addition to using hex instead of pack,
    you need to change
    if($value && $MSG_FLAG_READ) {
    to
    if($value & $MSG_FLAG_READ) {
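    Putting the two fixes together, here is a minimal sketch of the flag test. The original test always succeeded because pack("H4", "0002") returns a two-byte string, which is true, and && only asks whether both sides are true. The %FLAG hash and the flags_in helper are names made up for this illustration; the flag values are a subset of those in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Subset of the flag values from the original post.
my %FLAG = (
    READ    => 0x0001,
    REPLIED => 0x0002,
    MARKED  => 0x0004,
);

# Hypothetical helper: return the names of the flags that are set
# in an X-Mozilla-Status hex string.
sub flags_in {
    my ($status) = @_;
    my $value = hex $status;    # "0002" becomes the number 2
    # Bitwise AND is non-zero only when the flag's bit is set.
    return grep { $value & $FLAG{$_} } sort keys %FLAG;
}

print "0002: @{[ flags_in('0002') ]}\n";    # 0002: REPLIED
print "0003: @{[ flags_in('0003') ]}\n";    # 0003: READ REPLIED
```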


Re: Using pack to evaluate text strings as hexadecimal values
by erroneousBollock (Curate) on Mar 07, 2006 at 13:54 UTC
    Is this to be an automated system for background transfer?
    I only ask because you can do it with a simple drag&drop in Thunderbird / Mozilla Mail.
      I tried that, but it didn't work for folders. (I.e. I couldn't select a folder and copy it to another email account).

Re: Using pack to evaluate text strings as hexadecimal values
by Anonymous Monk on Mar 07, 2006 at 16:26 UTC
    You could use unpack instead of hex; pack on its own is definitely wrong.
    remember: hex is a string (representation) of an (integer) value.
    pack creates a (one) string out of (multiple) values.
    unpack creates (multiple) values out of a (one) string.

    Therefore, when calling unpack, it is better to wrap the receiving variable in () so that the assignment happens in list context.
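    As a sketch of what the unpack route might look like: pack with H4 turns the four hex digits into two raw bytes, and unpack with n reads those bytes back as a big-endian 16-bit integer. The helper name hex4_to_int is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: convert exactly four hex digits to an integer
# by going through the two-byte binary form.
sub hex4_to_int {
    my ($digits) = @_;
    # unpack returns a list, hence the parentheses around $value.
    my ($value) = unpack 'n', pack 'H4', $digits;
    return $value;
}

printf "%d %d\n", hex4_to_int('0002'), hex4_to_int('E000');    # 2 57344
```

    For this job hex($digits) gives the same answer with less machinery, which is why the earlier replies recommend it.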

      With apologies for resurrecting a dead horse (and mangling metaphors), pack/unpack give us the capability of going back and forth between strings and hex digits (H and h), and between strings and integers (i and n and friends), but not, as far as I can tell, between hex digits and integers, which I think was the original intent of davis. Yes, hex() and sprintf() let us do these, but there's no reason in principle why there couldn't be a pack/unpack format item to do the job.

      Why bother, one might ask? I want to keep the "external form" of my records as pure ASCII, so I can use all my favorite UNIX utilities on them. So how do I represent integer bit masks? I cannot use formats like n blindly, lest some bit mask result in a byte that looks like a newline or a null. I can "or in" 6 bits to 0x20 (ord(' ')), avoiding the high bit to stay ASCII and the constant 0x20 bit to avoid all the nasty control characters, but A) it's unlikely to port well to a non-ASCII system and, more important, B) it's not at all easy to see what bits are on and what bits are off. The same applies to uuencoded masks using format u. I could represent the mask as a fixed-width decimal digit string, but then B) it is still difficult to see what bits are on and off, and C) it takes a lot of characters to encode a few bits (5-digit numbers to accommodate 16 bits). I'd be happy to trade off the extra characters to represent the bits in hex, for the benefit of easy determination of what's on and what's off.

      I can do this using hex() and sprintf(), but then I need a way to fiddle records before packing and after unpacking. Am I alone in wishing there were a pack format item for the conversion, so I could do the whole job with pack and unpack?
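      For concreteness, here is a rough sketch of the "fiddling" described above. The record layout (an 8-character name followed by 4 hex digits of mask) and the encode_record/decode_record names are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical layout: 8-character name, then 4 hex digits of bit mask.
my $TEMPLATE = 'A8 A4';

sub encode_record {
    my ($name, $mask) = @_;
    # sprintf turns the integer mask into printable hex before packing.
    return pack $TEMPLATE, $name, sprintf('%04X', $mask);
}

sub decode_record {
    my ($record) = @_;
    my ($name, $digits) = unpack $TEMPLATE, $record;
    # hex turns the printable digits back into an integer after unpacking.
    return ($name, hex $digits);
}

my $record = encode_record('inbox', 0x0082);
print "[$record]\n";                  # [inbox   0082]
my ($name, $mask) = decode_record($record);
printf "%s %d\n", $name, $mask;       # inbox 130
```

      The record stays pure ASCII, but the conversion lives outside the template, which is the extra step being complained about.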

        What doesn't this do that you'd like it to do?

        $n = 0; $n |= 1 << $_ for 1,3,5,7;;
        print $n, unpack 'H2', pack 'v', $n;;
        170 aa

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        I'm going to "pop this up a few levels", because the exchange between BrowserUk and me isn't easily visible any more. Here's a (somewhat edited) version of the last couple exchanges. Sorry, my html smarts don't include making the quotations evident, and the Formatting Tips didn't help a lot.

        (jpl) Why couldn't we have a format item "Y" (where "Y" is some unused format character, finding which may be the real problem) such that

        pack("Y4", 32);        # produces "0020"
        unpack("Y4", "0020");  # produces 32

        There are many format characters, v among them, that pack integers into strings and unpack strings into integers. I believe the only practical difference between "Y" and "v" is the contents of the string. The one produced by "v" may be unsuitable for display, the one produced by "Y" would be both displayable and, when displayed, indicative of the bits in the integer from which it was produced.

        (BrowserUk:) It could work and would be useful. It would be a departure from the norm of converting numerics to and from their binary representation.

        There are some issues as to what should happen if you specified pack 'Y4', 100000; or pack 'Y4', 32.0 but actually that perhaps suggests a way around the lack of remaining letters that also has an existing precedent.

        To skip a complex structure--say consisting of 2 shorts and a float--the syntax is x[vvf], meaning skip enough bytes to cover 2 shorts and a float, i.e. 2+2+4 = 8 bytes.
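        A small sketch of that existing precedent, using the lowercase x item, which skips forward (uppercase X backs up). The data values and the last_field helper are made up for illustration, and the size of f depends on the platform's native float.

```perl
use strict;
use warnings;

# Hypothetical helper: skip two shorts and a float, return what follows.
sub last_field {
    my ($packed) = @_;
    # x[vvf] skips as many bytes as 'vvf' would occupy when packed.
    my ($last) = unpack 'x[vvf]N', $packed;
    return $last;
}

# Two little-endian shorts, a native float, then a 32-bit big-endian value.
my $packed = pack 'vvfN', 1, 2, 3.5, 42;
print last_field($packed), "\n";    # 42
```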

        To get your hexified numeric, the syntax could be pack 'H[v]', 32, meaning treat the number as a 16-bit int and hexify it, thereby producing your 4 bytes of output.

        The nice thing is that this then extends naturally to pack 'H[V]', 32; to produce 8 bytes of hex. And pack 'H[Q]', 123456789012345; and even pack 'H[f]', 1234.56e78; and so on.

        And once that is accepted, this further extends to the other bane of pack/unpack: binary. With 'B[v] b[V] B[d]' etc. And a quick perusal of the docs suggests that 'O' isn't currently used, so maybe 'O[v] O[Q]' might be useful also.

        Now all you've got to do is: knock up the patch; get it by p5p; and wait for it to make it into a build :)

        (End of thread summary).

        Perhaps, rather than having to reproduce much of the complexity of the [v] [Q] etc. notation, the format item could "functionally compose" with the item that followed. (Not so different from BrowserUk's Re^3 suggestion, but all carried out within pack/unpack.) So, whatever string the following pack format item produces gets turned into printable hex (or octal or binary, I suppose), and unpack would turn the printable string into the (usually unprintable) string that the next item expects to unpack. I haven't actually looked at the pack/unpack code, but this has the "feel" of something that should be able to piggy-back off existing code to do all the "hard work".
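        Until something like that exists, the composition can be approximated outside the template. The sketch below is only an emulation with two made-up helpers, pack_hex and unpack_hex; it is not a proposed implementation of the format item. It also shows that the byte order of the inner item matters: n yields the "0020" wanted above, while v yields "2000".

```perl
use strict;
use warnings;

# Hypothetical helper: pack one value with the given item, then
# turn the resulting bytes into printable hex digits.
sub pack_hex {
    my ($item, $value) = @_;
    return unpack 'H*', pack $item, $value;
}

# Hypothetical helper: turn hex digits back into raw bytes, then
# let the given item unpack them.
sub unpack_hex {
    my ($item, $digits) = @_;
    my ($value) = unpack $item, pack 'H*', $digits;
    return $value;
}

print pack_hex('n', 32), "\n";          # 0020
print pack_hex('v', 32), "\n";          # 2000 (little-endian byte order)
print pack_hex('N', 32), "\n";          # 00000020
print unpack_hex('n', '0020'), "\n";    # 32
```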