
Re^3: Reg Ex to strip MS smart quotes

by derby (Abbot)
on Aug 19, 2005 at 18:18 UTC

in reply to Re^2: Reg Ex to strip MS smart quotes
in thread Reg Ex to strip MS smart quotes

Are you sure? What problems are you having? Here's the snippet from the code that translates smart-quotes:

$s =~ s/\x93/"/g; $s =~ s/\x94/"/g;
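As a side note, the same pair of substitutions can probably be folded into a single tr/// pass. This is only a sketch, assuming $s holds raw CP-1252 bytes rather than text already decoded to Unicode; the sample string and the $clean variable are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: CP-1252 bytes with smart double quotes around a word.
my $s = "He said \x93hello\x94 and left.";

# tr/// repeats the last replacement character when the replacement list is
# shorter than the search list, so both 0x93 and 0x94 become a plain ASCII
# double quote.
(my $clean = $s) =~ tr/\x93\x94/"/;

print "$clean\n";
```

tr/// only maps one character to one character, so it would not cover the multi-character replacements (such as `...` or `--`) used elsewhere in this post.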

And here's how I've modified the core demoronise sub:

sub de_cp1252 {
    my( $self, $s ) = @_;

    # Map incompatible CP-1252 characters
    $s =~ s/\x82/,/g;
    $s =~ s-\x83-<em>f</em>-g;
    $s =~ s/\x84/,,/g;
    $s =~ s/\x85/.../g;

    $s =~ s/\x88/^/g;
    $s =~ s-\x89- /-g;

    $s =~ s/\x8B/</g;
    $s =~ s/\x8C/Oe/g;

    $s =~ s/\x91/'/g;
    $s =~ s/\x92/'/g;
    $s =~ s/\x93/"/g;
    $s =~ s/\x94/"/g;
    $s =~ s/\x95/*/g;
    $s =~ s/\x96/-/g;
    $s =~ s/\x97/--/g;
    $s =~ s-\x98-<sup>~</sup>-g;
    $s =~ s-\x99-<sup>TM</sup>-g;

    $s =~ s/\x9B/>/g;
    $s =~ s/\x9C/oe/g;

    # Now check for any remaining untranslated characters.
    # The inner test uses the same character class as the outer one,
    # so tabs (\x09) are never reported.
    if ($s =~ m/[\x00-\x08\x10-\x1F\x80-\x9F]/) {
        for( my $i = 0; $i < length($s); $i++) {
            my $c = substr($s, $i, 1);
            if ($c =~ m/[\x00-\x08\x10-\x1F\x80-\x9F]/) {
                printf(STDERR "warning--untranslated character 0x%02X in input line %s\n",
                       unpack('C', $c), $s );
            }
        }
    }

    # Return the cleaned string
    $s;
}

I didn't really care about the other stuff (such as bad HTML or Unicode); I just wanted to translate the known misplaced CP-1252 characters into something reasonable.
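For what it's worth, the same mapping could also be written table-driven, with one hash and one substitution pass. This is a rough sketch rather than the code I actually run: the names %CP1252_MAP and de_cp1252_table are invented for the example, it is a plain function rather than a method, it assumes the input is raw CP-1252 bytes, and it leaves out both the \x89 entry and the untranslated-character warning.

```perl
use strict;
use warnings;

# Replacement table, copied from the substitutions in de_cp1252
# (hypothetical name; the \x89 entry is not included in this sketch).
my %CP1252_MAP = (
    "\x82" => ',',    "\x83" => '<em>f</em>',
    "\x84" => ',,',   "\x85" => '...',
    "\x88" => '^',
    "\x8B" => '<',    "\x8C" => 'Oe',
    "\x91" => "'",    "\x92" => "'",
    "\x93" => '"',    "\x94" => '"',
    "\x95" => '*',    "\x96" => '-',
    "\x97" => '--',   "\x98" => '<sup>~</sup>',
    "\x99" => '<sup>TM</sup>',
    "\x9B" => '>',    "\x9C" => 'oe',
);

# Build a single character class from the table keys, e.g. [\x82\x83...].
my $class = join '', map { sprintf('\\x%02X', ord($_)) } keys %CP1252_MAP;
my $re    = qr/([$class])/;

# Replace every mapped character in one pass and return the new string.
sub de_cp1252_table {
    my ($s) = @_;
    $s =~ s/$re/$CP1252_MAP{$1}/g;
    return $s;
}

print de_cp1252_table("\x93Quoted\x94 text \x96 with a dash\x85"), "\n";
```

Adding or changing a mapping then means editing one hash entry instead of adding another s/// line.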


Replies are listed 'Best First'.
Re^4: Reg Ex to strip MS smart quotes
by freddo411 (Chaplain) on Aug 19, 2005 at 20:35 UTC
    Bingo. That snippet is perfect.

    Interestingly, I had found demoronizer earlier but kept looking, because I thought it only worked on HTML and output HTML entities.

    Thanks again.

    Nothing is too wonderful to be true
    -- Michael Faraday
