PerlMonks  

Re^3: How to find Unicode: 0x13 in File

by choroba (Cardinal)
on Nov 18, 2016 at 16:51 UTC ( [id://1176105]=note )


in reply to Re^2: How to find Unicode: 0x13 in File
in thread How to find Unicode: 0x13 in File

> 0x01

Why do you specify the length in hex?

Also note that if you use a length greater than 1 (which you want, to speed things up), you can get false positives: with read $fh, my $char, 2, the check reports 0x13 as present in the following file:

a1

because

$ perl -wE 'say unpack "H*", "a1"'
6131
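To make the boundary effect concrete, here is a minimal sketch that compares a regex over the hex dump of a chunk with a test on the raw bytes. The helper names hex_has_13 and bytes_have_0x13 are made up for illustration and do not come from the thread; the second chunk, "a\x13", is an assumed example of data that really contains the byte.

```perl
use strict;
use warnings;

# Hypothetical helper: the naive check, a regex over the hex dump of a chunk.
# The match may straddle two bytes (e.g. 0x61 0x31 dumps as "6131").
sub hex_has_13 {
    my ($chunk) = @_;
    return unpack('H*', $chunk) =~ /13/ ? 1 : 0;
}

# Hypothetical helper: look for the byte 0x13 itself in the raw chunk.
sub bytes_have_0x13 {
    my ($chunk) = @_;
    return index($chunk, "\x13") >= 0 ? 1 : 0;
}

# 'a1' does not contain 0x13; "a\x13" does.
for my $chunk ('a1', "a\x13") {
    printf "%s: hex regex=%d, byte test=%d\n",
        unpack('H*', $chunk), hex_has_13($chunk), bytes_have_0x13($chunk);
}
```

For 'a1' the two checks disagree, which is the false positive described above; for "a\x13" they agree.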

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Replies are listed 'Best First'.
Re^4: How to find Unicode: 0x13 in File
by james28909 (Deacon) on Nov 18, 2016 at 17:24 UTC

    "Why is read length hex." - Because that is how I deal with most files I personally work with, so it is more of a preference than anything.

    The false positive does not show on my PC unless I use unpack('h*', $_); so I am sure it is a platform/architecture difference between our PCs. Also, from the way I read the OP's question, I thought he was not looking for a literal 0x13 in plaintext; I thought he was looking for 0x13 after unpacking to hex. If I am wrong, I apologize. I was actually hoping to learn something more than anything else.

    I am not up to par on Unicode because I haven't had to deal with it in any of the material I work on. So if I am way off in left field, I apologize.

      The false positive does not show on my PC unless I use unpack('h*', $_); so I am sure it is a platform/architecture difference between our PCs.

      Not a platform mismatch, I would say, but probably because you're still reading a single character at a time from the file. If more than one character is read, the /13/ ambiguity (false positive) can appear with either the 'H*' or the 'h*' unpack template:

      c:\@Work\Perl>perl -wMstrict -le
      "print 'A: found 0x13!' if unpack('H*', 'a1') =~ /13/;
       print 'B: found 0x13!' if unpack('h*', qq{\x{1f}s}) =~ /13/;
      "
      A: found 0x13!
      B: found 0x13!

      (BTW: The * in both 'H*' and 'h*' implies reading and operating on a string of more than one character.)

      I am not up to par on Unicode because ...

      ... and because you value your sanity.

      Update: Another, perhaps more general, code example:

      c:\@Work\Perl>perl -wMstrict -le
      "use Data::Dump qw(pp);
       ;;
       for my $s ('a1', qq{\x{1f}s}) {
         print q{'H*' found 0x13! in }, pp($s) if unpack('H*', $s) =~ /13/;
         print q{'h*' found 0x13! in }, pp($s) if unpack('h*', $s) =~ /13/;
         }
      "
      'H*' found 0x13! in "a1"
      'h*' found 0x13! in "a1"
      'h*' found 0x13! in "\37s"
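To see where each match comes from, it may help to print the hex strings themselves. The following is only a sketch; the helper name hex_dump is made up for illustration. 'H*' writes each byte high nybble first and 'h*' writes it low nybble first, so the same two input strings give four different digit sequences.

```perl
use strict;
use warnings;

# Hypothetical helper: hex dump of $s under the given unpack template.
sub hex_dump {
    my ($template, $s) = @_;
    return unpack($template, $s);
}

# Same two test strings as in the one-liners above.
for my $s ('a1', "\x1fs") {
    for my $template ('H*', 'h*') {
        my $hex = hex_dump($template, $s);
        printf "%s => %s%s\n", $template, $hex,
            $hex =~ /13/ ? '  (matches /13/)' : '';
    }
}
```

Neither input string contains the byte 0x13, yet three of the four dumps contain the digits 13.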


      Give a man a fish:  <%-{-{-{-<

        I was able to reproduce that; however, I have no idea why it does that. The following code seems to work as expected:

        my ($one, $two) = unpack('(h2)*', 'a1');
        #print "$one | $two\n";
        print "found 0x13\n" if ($one =~ '13' || $two =~ '13');

        But even with this, unpack splits the two bytes into individual bytes. Is this behavior of unpack documented? I am using ActivePerl version 5.16.3 as well.
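On the documentation question: the h and H templates and the parenthesized group syntax are described under pack in perlfunc. A group such as '(H2)*' returns one two-digit string per byte, so comparing each string with eq cannot match across a byte boundary. The sketch below assumes that "does any single byte equal 0x13" is the question being asked; the helper name has_byte_0x13 is made up for illustration. Note that with 'h2' the digits of each byte come out low nybble first, so the byte 0x31 ('1') is rendered as 13.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes in $buf that are exactly 0x13.
# '(H2)*' yields one two-digit hex string per byte, high nybble first.
sub has_byte_0x13 {
    my ($buf) = @_;
    return scalar grep { $_ eq '13' } unpack('(H2)*', $buf);
}

print join(' ', unpack('(H2)*', 'a1')), "\n";   # per byte, high nybble first
print join(' ', unpack('(h2)*', 'a1')), "\n";   # per byte, low nybble first

print has_byte_0x13('a1')    ? "found 0x13\n" : "no 0x13\n";
print has_byte_0x13("a\x13") ? "found 0x13\n" : "no 0x13\n";
```

Because the buffer is split on byte boundaries before any comparison, this works for any read length, not only a length of 1.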
