PerlMonks  

Re: How to find Unicode: 0x13 in File

by choroba (Cardinal)
on Nov 18, 2016 at 15:03 UTC ( [id://1176096]=note )


in reply to How to find Unicode: 0x13 in File

You haven't shown the "grep commands and perl one liners", so it's hard to tell what's wrong with them. The following finds the character \x13 in a file in bash:
grep $'\023' file

Same in Perl:

perl -ne 'print if /\023/' file
perl -ne 'print if /\x13/' file

Finding "Unicode" in a file is not possible if you don't know the encoding of the file. In the examples above, it works for UTF-8 (and probably other ones, too).
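To illustrate why the encoding matters, here is a minimal sketch that prints the bytes U+0013 turns into under a few encodings. The helper name hex_bytes_of_u0013 and the choice of encodings are only illustrative assumptions; Encode is the core module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes that encode U+0013 in the given encoding, as a hex string.
# (hex_bytes_of_u0013 is a hypothetical helper name, used only for this sketch.)
sub hex_bytes_of_u0013 {
    my ($encoding) = @_;
    return unpack 'H*', encode($encoding, "\x{13}");
}

# UTF-8 and ISO-8859-1 keep the single byte 13; the UTF-16 variants add a zero byte.
for my $encoding (qw(UTF-8 ISO-8859-1 UTF-16LE UTF-16BE)) {
    printf "%-10s %s\n", $encoding, hex_bytes_of_u0013($encoding);
}
```

In the ASCII-compatible encodings the character is the single byte 0x13, so the byte-oriented searches above find it. In UTF-16 the byte 0x13 is still present, but a byte search can then also hit unrelated characters such as U+0113, whose UTF-16LE bytes are 13 01.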

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Replies are listed 'Best First'.
Re: How to find Unicode: 0x13 in File (no magic bullet!)
by Discipulus (Canon) on Nov 18, 2016 at 15:21 UTC
    finding "Unicode" in a file is not possible if you don't know the encoding of the file.

    Indeed! Just to add something: take a look at tchrist's answer about Perl and Unicode, "No magic bullet" (SO).

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^2: How to find Unicode: 0x13 in File
by james28909 (Deacon) on Nov 18, 2016 at 16:40 UTC

    I may be wrong here, but if I am, I will learn something new :)

    Could you not read in a few MB of the file (if it is big enough), unpack it, and then test whether a character matches 0x13?

    Something like:
    open(my $fh, '<', 'file') or die "$!\n";
    binmode($fh);
    # Read one byte at a time and test its hex representation.
    while (read $fh, my $char, 0x01) {
        my $buf = unpack('H*', $char);
        if ($buf =~ /13/) {
            print "found 0x13\n";
        }
    }
    Contents of 'file': '.Eg5™eEfx`.' #'.' = 0x13;
    I'm not up to par on Unicode, so I could be way off.
      > 0x01

      Why do you specify the length in hex?

      Also note that if you use a length greater than 1 (which you would want, to speed it up), you can get false positives: read $fh, my $char, 2 reports 0x13 as present in the following file:

      a1

      because

      $ perl -wE 'say unpack "H*", "a1"'
      6131
       ~~
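One way to read larger chunks without this false positive is to search the raw bytes directly instead of their hex dump. The following is only a sketch: the sub name file_has_byte_0x13 and the 64 KiB chunk size are arbitrary choices made for the illustration.

```perl
use strict;
use warnings;

# Return 1 if the file contains the byte 0x13, 0 otherwise.
# (file_has_byte_0x13 is a hypothetical name; the chunk size is arbitrary.)
sub file_has_byte_0x13 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $found = 0;
    while (read $fh, my $chunk, 64 * 1024) {
        # index() compares bytes, so "a1" (hex 6131) cannot match here.
        if (index($chunk, "\x13") >= 0) {
            $found = 1;
            last;
        }
    }
    close $fh;
    return $found;
}
```

Because the pattern is a single byte, it cannot straddle a chunk boundary, so no overlap handling is needed between reads.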


        "Why is read length hex." - Because that is how I deal with most files I personally work with, so it is more of a preference than anything.

        The false positive does not show on my PC unless I use unpack('h*', $_); so I am sure it is a platform/architecture difference between our PCs. Also, from the way I read the OP's question, I thought he was not looking for an actual 0x13 in plaintext; I thought he was looking for 0x13 after unpacking to hex. If I am wrong I apologize; I was hoping to learn something more than anything else.

        I am not up to par on Unicode because I haven't had to deal with it in any of the material I work on. So if I am way off in left field, I apologize.

Re^2: How to find Unicode: 0x13 in File
by dirtdog (Monk) on Nov 18, 2016 at 15:20 UTC

    I was using the following, which did not work:

    perl -ne 'print "$ARGV:$.\n" if /[^[:ascii:]]/;' $filename
    grep -e "[\x{00FF}-\x{FFFF}]" $filename

    The command you sent worked perfectly.

    Thanks!

      > did not work

      And here's why:

      • [:ascii:] matches characters in the range 0-127, so the negated class [^[:ascii:]] cannot match character 19 (0x13).
      • 19 (0x13) doesn't fall between 255 (0xFF) and 65535 (0xFFFF).
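A quick way to check both points is to apply the patterns to the character itself. This is only an illustrative sketch; the hash keys are labels made up for the output.

```perl
use strict;
use warnings;

my $char = "\x13";    # the control character being searched for

# Label => 1 if the pattern matches $char, 0 otherwise.
my %matches = (
    'non-ASCII class' => ($char =~ /[^[:ascii:]]/        ? 1 : 0),
    'range FF-FFFF'   => ($char =~ /[\x{00FF}-\x{FFFF}]/ ? 1 : 0),
    'literal \x13'    => ($char =~ /\x13/                ? 1 : 0),
);

printf "%-16s %s\n", $_, $matches{$_} ? 'matches' : 'no match'
    for sort keys %matches;
```

Only the literal \x13 pattern matches; the two character classes from the parent post exclude code point 19.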

