PerlMonks  

"Unrecognized character" stops perl cold

by andyford (Curate)
on Jan 17, 2008 at 19:10 UTC ( #662942=perlquestion )
andyford has asked for the wisdom of the Perl Monks concerning the following question:

I copied and pasted some text from a module's perldoc documentation, and perl is complaining:

Unrecognized character \xE2 at /home/forda/bin/send_to_mms line 6.
I can't see any extra characters even when I try to view non-printables with "set list" in vim (which displays all my tabs and newlines).

I tried replacing bits and pieces, but the only fix I could blunder into was replacing the entire line.

What does perl see that I don't? From where does this '\xE2' arise?

Is there any way to avoid or fix this more easily either on the perldoc side or in the editor?

Re: "Unrecognized character" stops perl cold
by moritz (Cardinal) on Jan 17, 2008 at 19:28 UTC
    In which encoding is your script? (in vim: :se fileencoding).

    You can try hexdump -C script.pl to inspect it even further.

    Chances are that your system's nroff (or whatever manpage processor you're using) substituted some quotes with a "fancy" quote that now causes trouble.
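As a rough alternative to eyeballing hexdump output, the following sketch reports where non-ASCII bytes sit in a piece of source text. The sub name `find_non_ascii` and the sample string are made up for illustration; the sample imitates the kind of line discussed in this thread, with the fancy quote written out as raw bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of [line_number, column, hex_bytes] for every run of
# non-ASCII bytes found in the given text (treated as raw bytes).
sub find_non_ascii {
    my ($text) = @_;
    my @hits;
    my $line_no = 0;
    for my $line ( split /\n/, $text ) {
        $line_no++;
        while ( $line =~ /([^\x00-\x7F]+)/g ) {
            my $col = $-[1] + 1;    # 1-based byte offset of the run
            my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $1;
            push @hits, [ $line_no, $col, $hex ];
        }
    }
    return @hits;
}

# Hypothetical sample: a fancy quote stored as the UTF-8 bytes e2 80 99.
my $sample = "use MIME::Lite;\nFrom => \xE2\x80\x99me\@myhost.com\xE2\x80\x99,\n";
printf "line %d, column %d: %s\n", @$_ for find_non_ascii($sample);
```

To check a real script, the file would need to be read as raw bytes (for example with `binmode`) before being passed to the sub.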

      The encoding is 'utf-8'. I think you're right, it's the quotes. See below: each fancy quote shows up as "..." in hexdump's ASCII column, which corresponds to the bytes 'e2 80 99'. Now where did I put that nroff book...

      00000000  75 73 65 20 4d 49 4d 45  3a 3a 4c 69 74 65 3b 0a  |use MIME::Lite;.|
      00000010  23 23 23 20 43 72 65 61  74 65 20 61 20 6e 65 77  |### Create a new|
      00000020  20 73 69 6e 67 6c 65 2d  70 61 72 74 20 6d 65 73  | single-part mes|
      00000030  73 61 67 65 2c 20 74 6f  20 73 65 6e 64 20 61 20  |sage, to send a |
      00000040  47 49 46 20 66 69 6c 65  3a 0a 24 6d 73 67 20 3d  |GIF file:.$msg =|
      00000050  20 4d 49 4d 45 3a 3a 4c  69 74 65 2d 3e 6e 65 77  | MIME::Lite->new|
      00000060  28 0a 46 72 6f 6d 20 20  20 20 20 3d 3e e2 80 99  |(.From     =>...|
      00000070  6d 65 40 6d 79 68 6f 73  74 2e 63 6f 6d e2 80 99  |me@myhost.com...|
      00000080  2c 0a 54 6f 20 20 20 20  20 20 20 3d 3e e2 80 99  |,.To       =>...|
      00000090  79 6f 75 40 79 6f 75 72  68 6f 73 74 2e 63 6f 6d  |you@yourhost.com|
      000000a0  e2 80 99 2c 0a 43 63 20  20 20 20 20 20 20 3d 3e  |...,.Cc       =>|
      000000b0  e2 80 99 73 6f 6d 65 40  6f 74 68 65 72 2e 63 6f  |...some@other.co|
      000000c0  6d 2c 20 73 6f 6d 65 40  6d 6f 72 65 2e 63 6f 6d  |m, some@more.com|
      000000d0  e2 80 99 2c 0a 53 75 62  6a 65 63 74 20 20 3d 3e  |...,.Subject  =>|
      000000e0  e2 80 99 48 65 6c 6c 6f  6f 6f 6f 6f 6f 2c 20 6e  |...Helloooooo, n|
      000000f0  75 72 73 65 21 e2 80 99  2c 0a 54 79 70 65 20 20  |urse!...,.Type  |
      00000100  20 20 20 3d 3e e2 80 99  69 6d 61 67 65 2f 67 69  |   =>...image/gi|
      00000110  66 e2 80 99 2c 0a 45 6e  63 6f 64 69 6e 67 20 3d  |f...,.Encoding =|
      00000120  3e e2 80 99 62 61 73 65  36 34 e2 80 99 2c 0a 50  |>...base64...,.P|
      00000130  61 74 68 20 20 20 20 20  3d 3e e2 80 99 68 65 6c  |ath     =>...hel|
      00000140  6c 6f 6e 75 72 73 65 2e  67 69 66 e2 80 99 0a 29  |lonurse.gif....)|
      00000150  3b 0a 24 6d 73 67 2d 3e  73 65 6e 64 3b 20 23 20  |;.$msg->send; # |
      00000160  73 65 6e 64 20 76 69 61  20 64 65 66 61 75 6c 74  |send via default|
      00000170  0a 0a                                             |..|
      00000172
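The byte triple 'e2 80 99' in the dump can be decoded to see which character it stands for. This is a minimal sketch using the core `Encode` and `charnames` modules; the sub name `describe_utf8_bytes` is invented for the example, and it assumes the bytes really are UTF-8, as the reported file encoding suggests.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Decode a string of raw UTF-8 bytes and describe each resulting character
# as "U+XXXX NAME". Dies if the bytes are not valid UTF-8.
sub describe_utf8_bytes {
    my ($bytes) = @_;
    my $chars = decode( 'UTF-8', $bytes, Encode::FB_CROAK );
    return map {
        my $code = ord $_;
        sprintf 'U+%04X %s', $code, charnames::viacode($code) // '(no name)';
    } split //, $chars;
}

# The three bytes that hexdump shows as "..." around each quoted value.
print "$_\n" for describe_utf8_bytes("\xE2\x80\x99");
```

The '\xE2' in perl's error message is simply the first byte of that three-byte sequence, which is where the parser gives up.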

        Would a simple use utf8; solve your problem? See perldoc utf8.

        Jim

Re: "Unrecognized character" stops perl cold
by NetWallah (Abbot) on Jan 17, 2008 at 19:31 UTC
    0xE2 represents the letter "a" with a caret on top >>â<<. This may be intermingled and visible in the text pasted, but not obvious to the reader.

    Look at all lower-case "a"'s in the line to find those that look unusual.
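Whether 0xE2 is an accented "a" depends on the encoding the file is read in. The sketch below, with a hypothetical helper `name_under`, asks the core `Encode` module what a byte string means under a given encoding. It assumes only ISO-8859-1 and UTF-8 are of interest here.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Name the single character that a byte string represents under the given
# encoding, or return undef if the bytes do not decode to exactly one
# character in that encoding.
sub name_under {
    my ( $encoding, $bytes ) = @_;
    my $chars = eval { decode( $encoding, $bytes, Encode::FB_CROAK ) };
    return unless defined $chars && length($chars) == 1;
    return charnames::viacode( ord $chars );
}

for my $case ( [ 'ISO-8859-1', "\xE2" ], [ 'UTF-8', "\xE2" ], [ 'UTF-8', "\xE2\x80\x99" ] ) {
    my ( $encoding, $bytes ) = @$case;
    printf "%-10s %-8s => %s\n", $encoding, unpack( 'H*', $bytes ),
        name_under( $encoding, $bytes ) // '(not a complete character)';
}
```

In a UTF-8 file, a lone 0xE2 is only the lead byte of a longer sequence, so searching the line for odd-looking "a" characters may turn up nothing.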

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

        It's more like a hat really.
Re: "Unrecognized character" stops perl cold
by naChoZ (Curate) on Jan 17, 2008 at 20:12 UTC

    I find this very annoying also. It's because the perldoc renderer is doing silly things like replacing regular '-' dash/hyphen characters with chr(226) (instead of the normal chr(45) character) and replacing single ticks with the "prettier" version of the single quote. I have not been able to figure out how to get this to display properly without switching completely to raw or text mode.

    My solution was to bind a key in vim to run this silly little script. So whenever I paste some code from some perldoc I'm viewing, I run this over the code.

    #!/usr/bin/perl -n
    s/‐/-/g;    # U+2010 HYPHEN
    s/−/-/g;    # U+2212 MINUS SIGN
    s/’/'/g;    # U+2019 RIGHT SINGLE QUOTATION MARK
    print;

    I notice it renders oddly in the perlmonks node, but you get the idea. I literally just copied and pasted the offending symbols into this script.

    --
    naChoZ

    Therapy is expensive. Popping bubble wrap is cheap. You choose.

      Yes, thanks, you put me right onto "the" answer for my work environment.

      In your vimrc, put

      vmap ,qq :%s/’/'/g<CR>
      nmap ,qq :%s/’/'/g<CR>
      where the first quote (’) is entered as a digraph: Ctrl-V Ctrl-K '9.

      At that point "comma q q" is mapped in vim to replace all the bad fancy quotes with good single quotes.

        That will only handle that one character. The problem is there are other characters that are modified as well. After my previous post the other day, I ended up going and making a much more verbose version of that script. Now the bad characters can be referred to by name. Plus I added a silly way of making it display the identity of characters to make it easier to find more that need to be fixed.

        #!/usr/bin/perl -n
        use strict;
        use warnings;
        use charnames ();
        use open qw(:std :utf8);    # read STDIN and write STDOUT as UTF-8

        $|++;

        # Unicode character name => ASCII replacement
        my $chars = {
            'HYPHEN'                      => '-',    # \x{2010}
            'MINUS SIGN'                  => '-',    # \x{2212}
            'FIGURE DASH'                 => '-',    # \x{2012}
            'RIGHT SINGLE QUOTATION MARK' => "'",    # \x{2019}
            'BOX DRAWINGS LIGHT VERTICAL' => '|',    # \x{2502}
        };

        # If the first character is an equal sign, skip it and
        # display the identity of each remaining character.
        #
        if (/^=/) {
            for my $index ( 1 .. length($_) - 1 ) {
                my $char = substr( $_, $index, 1 );
                print $char . " "
                    . sprintf( "\\x{%04X}", ord($char) ) . " = '"
                    . ( charnames::viacode( ord($char) ) // 'UNKNOWN' ) . "'\n";
            }
        }
        else {
            # Replace each listed character with its ASCII stand-in.
            for my $cname ( keys %$chars ) {
                my $char = chr( charnames::vianame($cname) );
                s/\Q$char\E/$chars->{$cname}/g;
            }
            print;
        }

        --
        naChoZ

        Therapy is expensive. Popping bubble wrap is cheap. You choose.

Re: "Unrecognized character" stops perl cold
by Anonymous Monk on Dec 25, 2013 at 08:55 UTC
    I was using Perl for the first time on Mac using the TextEdit editor. Really frustrating, until I read this and worked out that it uses

    Edit->Substitutions->Smart Quotes

    so the quote characters are not what the interpreter expects. Switch this off and all is OK.

Node Type: perlquestion [id://662942]
Approved by Corion
Front-paged by Old_Gray_Bear