Re^2: Remove unicode "whitespace"

by HYanWong (Acolyte)
on Feb 28, 2013 at 11:16 UTC (#1021039=note)


in reply to Re: Remove unicode "whitespace"
in thread Remove unicode "whitespace"

You're right that it is the LRM character, so it shouldn't be stripped in general (and it's sensible that it doesn't match \s). But it is useless at the end of a string, hence my suggestion that it be treated like whitespace in that context. I hoped there might be a function to trim the ends of strings for this specific purpose or, failing that, something generic I could add to a regex to strip Unicode characters of this nature.
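For what it's worth, one generic regex approach is to treat the Unicode "format" general category (\p{Cf}, which contains U+200E LRM and U+200F RLM) like whitespace, but only at the end of the string. The following is a minimal sketch, not a tested recommendation: the sub name trim_invisible_tail is made up for illustration, and it assumes the string has already been decoded to Perl characters. Note that \p{Cf} also covers the soft hyphen and the zero-width joiners, which is probably acceptable at the end of a string but may be broader than you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing run of whitespace and Unicode
# format characters (general category Cf, e.g. U+200E LEFT-TO-RIGHT MARK).
# Characters of the same kind elsewhere in the string are left alone.
sub trim_invisible_tail {
    my ($str) = @_;
    $str =~ s/[\s\p{Cf}]+\z//;
    return $str;
}

# Example: the URL from this thread with a trailing LRM appended.
my $url   = "http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg\x{200E}";
my $clean = trim_invisible_tail($url);
printf "%d characters before, %d after\n", length $url, length $clean;
```

Anchoring with \z rather than $ keeps the match at the true end of the string, so an LRM in the middle of the text (where it may be doing real work) is not touched.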

Replies are listed 'Best First'.
Re^3: Remove unicode "whitespace"
by Khen1950fx (Canon) on Feb 28, 2013 at 16:11 UTC
    Give URI::Encode a try.
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use URI::Encode qw(uri_decode);

    # The URL ends in %E2%80%8E, the percent-encoded UTF-8 form of U+200E (LRM).
    my $encoded = 'http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E';
    print uri_decode($encoded);

      Yes, I've done that. It converts the %E2%80%8E sequence to the Unicode LRM character, which isn't printed but is still embedded in the string, causing problems when accessing the URL again. Thanks, though.
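Since decoding leaves the mark embedded, another option is to drop it while the URL is still percent-encoded. This is only a sketch under stated assumptions: the sub name strip_encoded_marks is hypothetical, and it handles just LRM (U+200E, UTF-8 bytes E2 80 8E) and RLM (U+200F, bytes E2 80 8F) at the very end of the URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove trailing percent-encoded LRM/RLM marks from a
# URL that has not been decoded yet. The /i flag accepts lower-case hex too.
sub strip_encoded_marks {
    my ($url) = @_;
    $url =~ s/(?:%E2%80%8[EF])+\z//i;
    return $url;
}

my $encoded = 'http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E';
print strip_encoded_marks($encoded), "\n";
```

Working on the encoded form avoids any question about whether the string holds bytes or characters at that point, at the cost of covering only the marks listed explicitly in the pattern.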
