PerlMonks  

Re^3: This looks like whitespace in my CSV but doesn't seem to be

by tobyink (Canon)
on Sep 30, 2012 at 08:44 UTC ( [id://996481]=note )


in reply to Re^2: This looks like whitespace in my CSV but doesn't seem to be
in thread This looks like whitespace in my CSV but doesn't seem to be

\xC2\xA0 is a Unicode non-breaking space (U+00A0) encoded as UTF-8. If you decode your UTF-8 string into a native Perl Unicode string (see Encode), and then add the /u modifier (Perl 5.14 and later) to your regular expression, that ought to let \s match non-ASCII whitespace as well, so you can strip it.
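A minimal sketch of that suggestion, assuming Perl 5.14 or later (needed for /u). The variable names and the sample value are made up for illustration; a real script would take the field from the CSV parser instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw field as read from the file: the two bytes C2 A0
# (a UTF-8 encoded no-break space) followed by the visible value.
my $bytes = "\xC2\xA01.00";

# Turn the UTF-8 bytes into a Perl character string.
my $field = decode('UTF-8', $bytes);

# Strip leading and trailing whitespace. /u asks for Unicode rules,
# so \s also matches U+00A0.
$field =~ s/\A\s+|\s+\z//gu;

print "$field\n";    # prints 1.00
```

Without the decode step the regex would see the two separate bytes \xC2 and \xA0 rather than one no-break space character, which is why the decoding comes first.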

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^4: This looks like whitespace in my CSV but doesn't seem to be
by ghenry (Vicar) on Sep 30, 2012 at 08:58 UTC

    That will be why Vim won't let me do :%s/\s//g in that field either.

    Excellent catch!

    Walking the road to enlightenment... I found a penguin and a camel on the way.....
    Fancy a yourname@perl.me.uk? Just ask!!!
Re^4: This looks like whitespace in my CSV but doesn't seem to be
by ghenry (Vicar) on Sep 30, 2012 at 09:06 UTC

    I need to get a newer perl as I'm on 5.10.1 and /u throws an error.

    Cheers though, will look how I can do this.


      You can always use unicode-regex-range-character-class.pl

      space => [\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000]

      so

      $ perl -pe " s{\\u(....)}{\\x{$1}}g "
      [\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000]
      [\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}-\x{2029}\x{202F}\x{205F}\x{3000}]

      Thus

      #!/usr/bin/perl --
      use warnings;use strict;
      use Data::Dump;
      $_ = qq{\xC2\xA01.00};;
      utf8::decode($_);
      dd[$_];
      s{^[\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}-\x{2029}\x{202F}\x{205F}\x{3000}]+}{}g;
      dd[$_];
      __END__
      ["\xA01.00"]
      ["1.00"]

      Although, in 5.10 you could probably just use  s{^\p{space}+}{}g;
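For what it's worth, here is a short sketch of that \p{Space} variant. It avoids the /u modifier entirely, so it should suit an older perl such as 5.10; the sample value is the same made-up field used above, and only the core utf8::decode function is assumed.

```perl
use strict;
use warnings;

# Raw bytes: a UTF-8 encoded no-break space (U+00A0) in front of the value.
my $field = "\xC2\xA01.00";

# utf8::decode is built into perl (no module to load) and decodes in place.
utf8::decode($field);

# \p{Space} is a Unicode property, so it matches U+00A0 without /u.
$field =~ s{^\p{Space}+}{};

print "$field\n";    # prints 1.00
```

The property form is shorter than spelling out the code point ranges by hand, at the cost of relying on whatever Unicode tables ship with the installed perl.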
