PerlMonks  

Re^6: Encoding/decoding question

by slugger415 (Beadle)
on Sep 12, 2011 at 20:20 UTC (#925563)


in reply to Re^5: Encoding/decoding question
in thread Encoding/decoding question

Heh, can't say I follow all that. I save the FB page as HTML from Firefox and run tidy on it to make it XHTML. I'm doing all this on Windows 7, so I have no idea how or where it's being encoded. Tidy does allow various encodings, but I seem to be getting wonky results no matter what I set it to.

Anyway, I tried running uniquote on a text file (test.txt) containing only this string:

sous réserve

Here's what I got:

> perl -nle 'print if /\P{ASCII}/' test.txt | uniquote.pl -vE cp1252
Can't find string terminator "'" anywhere before EOF at -e line 1.

Not sure what that means... appreciate the help...


Re^7: Encoding/decoding question
by Anonymous Monk on Sep 12, 2011 at 20:27 UTC
      Thanks.

      OK, if I run uniquote, the first two seem correct:

      > perl -nle "print if /\P{ASCII}/" test.txt | uniquote.pl -vE cp1252
      sous r\N{LATIN SMALL LETTER E WITH ACUTE}serve
      
      > perl -nle "print if /\P{ASCII}/" test.txt | uniquote.pl -vE latin1
      sous r\N{LATIN SMALL LETTER E WITH ACUTE}serve
      
      
      > perl -nle "print if /\P{ASCII}/" test.txt | uniquote.pl -vE macroman
      sous r\N{LATIN CAPITAL LETTER E WITH GRAVE}serve
      

      So what's happening? (sorry, still clueless.)
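
One way to see what is going on in the three runs above is to decode the same bytes under each encoding and name the character that comes out. The following is a minimal sketch in Python rather than the thread's Perl, and it assumes test.txt holds the single byte 0xE9 for the accented letter (the file's actual bytes are not shown in the thread). The `describe` helper is hypothetical and only imitates the style of the uniquote output; the codec names are Python's, not uniquote's.

```python
import unicodedata

# Assumption: the accented letter in test.txt is stored as the single byte 0xE9.
raw = b"sous r\xe9serve"

def describe(data: bytes, encoding: str) -> str:
    """Decode data, then spell out each non-ASCII character by its Unicode name."""
    text = data.decode(encoding)
    return "".join(
        ch if ord(ch) < 128 else "\\N{" + unicodedata.name(ch) + "}"
        for ch in text
    )

# Same bytes, three different interpretations.
for enc in ("cp1252", "latin-1", "mac_roman"):
    print(f"{enc:10} {describe(raw, enc)}")
```

Under that assumption, cp1252 and Latin-1 both map 0xE9 to LATIN SMALL LETTER E WITH ACUTE, while MacRoman maps the same byte to LATIN CAPITAL LETTER E WITH GRAVE, which matches the three results above. If the file were UTF-8 instead, the letter would be two bytes (0xC3 0xA9) and a cp1252 decode would show two characters rather than one, so the first two results are consistent with a single-byte Windows or Latin-1 file.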
