TomG has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks. I'm a beginner with Perl. I used LWP::UserAgent to fetch some web pages, and I need to remove all the wide characters. From what I understand regexp doesn't handle this (am I correct?). Is there another way to do this? A quick and dirty solution would also be OK for me. Can someone help me with this? Many thanks, Tom

Replies are listed 'Best First'.
Re: Removing wide characters
by GrandFather (Saint) on Dec 08, 2005 at 23:42 UTC

    This may do what you want:

    use warnings;
    use strict;

    my $str = do { local $/ = ''; <DATA> };
    print $str . "\n";
    $str =~ s/[^\x00-\x7f]//g;
    print $str;

    __DATA__
    This Ä that

    Prints:

    This Ä that
    This that
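For pages fetched over HTTP, the same substitution can be applied after the raw bytes have been decoded into characters. The following is only a minimal sketch: the byte string stands in for a fetched page and is assumed to be UTF-8, and the variable names are made up for illustration. With LWP, `$response->decoded_content` would normally take the place of the explicit `decode` call.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the raw bytes of a fetched page:
# "This Ä that" encoded as UTF-8 (Ä is the two bytes C3 84).
my $bytes = "This \xC3\x84 that";

# Decode the bytes into Perl characters, so that Ä becomes one character.
my $text = decode('UTF-8', $bytes);

# Copy the decoded text, then drop everything outside the ASCII range.
(my $ascii = $text) =~ s/[^\x00-\x7f]//g;

print "$ascii\n";
```

The substitution also works on the undecoded bytes, since every byte of a multi-byte UTF-8 sequence is above 0x7F, but decoding first matters as soon as you want to replace or transliterate characters instead of deleting them.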

    DWIM is Perl's answer to Gödel
      Thank you very much, GrandFather and BorgCopyeditor. Both solutions were helpful and worked fine. Tom
      Very helpful... I found such a character in the column names of PayPal transaction CSVs... maddening until I found out just what was going on!

      Thanks a lot. This solved my problem. I had been struggling with it for the last 4 days.

Re: Removing wide characters
by BorgCopyeditor (Friar) on Dec 08, 2005 at 23:07 UTC
    Assuming that by "wide characters," you mean "not ASCII," you could use \P{IsASCII} in your regexes. That said, you might have to do things differently depending on the charset in which the webpage is served.
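A minimal sketch of the `\P{IsASCII}` approach, assuming the text has already been decoded into a Perl character string (the sample string and variable names are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample: i-diaeresis, e-acute and a smiley mixed into ASCII text.
my $text = "na\x{EF}ve caf\x{E9} \x{263A}!";

# \P{IsASCII} matches any single character that is NOT in the ASCII range.
my $has_wide = $text =~ /\P{IsASCII}/ ? 1 : 0;

# Copy the string, then delete every non-ASCII character from the copy.
(my $stripped = $text) =~ s/\P{IsASCII}//g;

print "contains non-ASCII: $has_wide\n";
print "$stripped\n";
```

If the page is served in some other charset, the decoding step changes but the regex does not: once the data is a character string, `\P{IsASCII}` behaves the same regardless of the original encoding.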

    BCE
    --Your punctuation skills are insufficient!

Re: Removing wide characters
by Happy-the-monk (Canon) on Dec 08, 2005 at 22:43 UTC

    I need to remove all the wide characters. From what I understand regexp doesn't handle this.

    I don't know what you mean by "wide characters".
    You could illuminate the point by showing us what you have tried with regexen, or why you believe they cannot do what you are trying to do.

    Cheers, Sören