Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I think it's best to clarify a few things - you may already know some or all of this, but since this kind of question comes up once in a while, I'll take this opportunity to write a little more on it.

First, hexadecimal is just another representation format for the more abstract concept of an integer number - although Perl handles all kinds of numbers pretty transparently, I'll just talk about integers for now. The idea of different representations of the same thing occurs in a lot of places:

  • 0b00101010, 0x2a, 052, and XLII represent the same integer commonly known as 42 in decimal; note that in C (not Perl!) one can even write that integer as '*',
  • the byte sequences EF BB BF, FE FF, and 2B 2F 76 38 represent the same Unicode character (the BOM),
  • 42°C, 315.15°K, and 107.6°F represent the same temperature, and
  • "2009-02-13 13:31:30 HST" and "2009-02-14 10:31:30 AEDT" represent the same second in time.

In any of these examples, if one is getting inputs in different formats, it's usually easiest to first convert them to a single internal representation, do the work, and then re-convert them to the desired format on output, instead of trying to mix different formats in one program.

Second, I've noticed that sometimes the term "hexadecimal" is used in place of "binary", probably because binary files are viewed with "hex editors", and also that there is sometimes confusion between a string containing a hexadecimal representation of some bytes and the bytes themselves.

Let's say a file contains four bytes whose numeric values in decimal are 80, 101, 114, and 108. This sequence of bytes interpreted as ASCII characters is the string "Perl". If I run a tool to get a hex dump of that file, I will see something like "5065726c", that is, each byte is being represented as a two-digit hexadecimal number. However, if in my code I have a string that looks like "5065726c", this is a sequence of eight characters and, ignoring Unicode for now, also a sequence of eight bytes (decimal values 53, 48, 54, 53, etc.). If in my program I instead want to work with the four bytes (8-bit integers) which the string "5065726c" represents, I need to convert it first. In Perl, the usual tools for all these kinds of conversions are pack, unpack, ord, chr, hex, oct, and sprintf.

In Perl, the string "Perl" can also be written as "\x50\x65\x72\x6C", but regarding your code note that "\x10" is not the same as 0x10 - the latter is simply another way of writing the number 16, while the former is a string with one character, that character having a decimal value of 16 (the ASCII control character DLE / "data link escape"). If one has had exposure to a language like C or C++, one might be used to the equivalence (again ignoring Unicode) of strings and char arrays, and therefore being able to initialize a "string" with code like {0x50,0x65,0x72,0x6C,0}. But in Perl, strings can't directly be treated as arrays of integers - although this being Perl, of course there's a module for that ;-) (but please don't use it in real code). Another point: although Perl converts between the two transparently in most cases, 16 and "16" are still two different things, the first an integer and the second a two-character string.

When it comes to the operations ~ | & ^, in Perl one has to be aware of how the Bitwise String Operators work, so it is important to be aware of whether one is working with strings or integers. 0x2A ^ 0x7A and "\x2A" ^ "\x7A" are essentially the same operation (00101010 xor 01111010 = 01010000), but in the first, the inputs and output are integers, while in the latter they are strings.

While it's possible to implement division in other domains, it's of course easiest with numbers, and we don't even really have to care what internal format the computer is using to store those numbers. So at the very latest, the division is when you'll need to convert. But in this case, I'd personally probably work with integers all the way through. (By the way, dividing an integer by a power of two is the same as a bit shift, that is, int($x/16) is the same as $x>>4.)

Lastly, Data::Dumper with Useqq turned on (or Data::Dump) will help you inspect the variables at each step. However, note that both of these modules will sometimes display a string like "16" as the number 16 - if you want to know exactly how Perl is storing your value internally, use Devel::Peek.

It is still a bit unclear to me if you need the output as an integer or string. In your example, the output as an integer would be 0 (aka 0x00), which Perl will transparently convert to the string "0", which is a one-character string, that character being the ASCII character 0x30 - which would probably explain the result you are currently getting. If instead you need the output as a binary value stored in a string, you'll have to convert the integer 0 to the string "\0" (aka "\x00") using e.g. pack, or, since it's a single character, chr.

use warnings; use strict; use List::Util qw/reduce/; use Data::Dumper; $Data::Dumper::Useqq=1; $Data::Dumper::Indent=0; $Data::Dumper::Terse=1; my $string = "a202005"; print Dumper($string),"\n"; # "a202005" my @chars = map ord, split //, $string; # or: unpack 'C*', $string print Dumper(\@chars),"\n"; # [97,50,48,50,48,48,53] my $xor = reduce { $a ^ $b } @chars; print Dumper($xor),"\n"; # 100 my $chr = chr $xor; # or: pack 'C', $xor print Dumper($chr),"\n"; # "d" my $out = $xor>>4; print Dumper($out),"\n"; # 6 print Dumper(chr $out),"\n"; # "\6"

Updated as per replies. Also very minor edits for clarity.


In reply to Re: hexadecimal division by haukex
in thread hexadecimal division by holandes777

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (3)
As of 2024-03-29 02:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found