Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^3: Reg Ex to strip MS smart quotes

by derby (Abbot)
on Aug 19, 2005 at 18:18 UTC ( #485253=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Reg Ex to strip MS smart quotes
in thread Reg Ex to strip MS smart quotes

Are you sure? What problems are you having? Here's the snippet from the code that translates smart-quotes:

$s =~ s/\x93/"/g; $s =~ s/\x94/"/g;

And here's how I've modified the core demoronise sub:

sub de_cp1252 { my( $self, $s ) = @_; # Map incompatible CP-1252 characters $s =~ s/\x82/,/g; $s =~ s-\x83-<em>f</em>-g; $s =~ s/\x84/,,/g; $s =~ s/\x85/.../g; $s =~ s/\x88/^/g; $s =~ s-\x89- /-g; $s =~ s/\x8B/</g; $s =~ s/\x8C/Oe/g; $s =~ s/\x91/'/g; $s =~ s/\x92/'/g; $s =~ s/\x93/"/g; $s =~ s/\x94/"/g; $s =~ s/\x95/*/g; $s =~ s/\x96/-/g; $s =~ s/\x97/--/g; $s =~ s-\x98-<sup>~</sup>-g; $s =~ s-\x99-<sup>TM</sup>-g; $s =~ s/\x9B/>/g; $s =~ s/\x9C/oe/g; # Now check for any remaining untranslated characters. if ($s =~ m/[\x00-\x08\x10-\x1F\x80-\x9F]/) { for( my $i = 0; $i < length($s); $i++) { my $c = substr($s, $i, 1); if ($c =~ m/[\x00-\x09\x10-\x1F\x80-\x9F]/) { printf(STDERR "warning--untranslated character 0x%02X i +n input line %s\n", unpack('C', $c), $s ); } } } $s; }

I didn't really care about the other stuff (such as bad html or unicode) - just translating the known cp1252 misplaced characters into something reasonable.

-derby


Comment on Re^3: Reg Ex to strip MS smart quotes
Select or Download Code
Re^4: Reg Ex to strip MS smart quotes
by freddo411 (Chaplain) on Aug 19, 2005 at 20:35 UTC
    Bingo. That snippit is perfect.

    Interestingly, I found demoronizer and I kept looking because I thought it only worked on HTML and output HTML entities.

    Thanks again.


    -------------------------------------
    Nothing is too wonderful to be true
    -- Michael Faraday

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://485253]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2014-07-14 05:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (255 votes), past polls