PerlMonks  

\s and non-breaking spaces

by Ratazong (Prior)
on Feb 29, 2012 at 13:03 UTC (node #956920)
Ratazong has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Today I played around a bit with HTML::Tree and its parts, and was surprised that the following expression did not work as intended on a string extracted from a web page with as_trimmed_text():

$name =~ s/\sx\s\d+//; # remove trailing " x 3" (and similar)
After a lot of searching I found the culprit: the blanks were encoded as 0xA0 (non-breaking spaces), and \s does not match them. Is there a better way to handle them than my rather ugly solution below?
$name =~ s/[\s\xA0]x[\s\xA0]\d+//;
Or another workaround?
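
A minimal sketch reproducing the problem and the workaround above, assuming a hypothetical sample string whose "spaces" are 0xA0 bytes and no Unicode rules in effect (no use feature "unicode_strings", no UTF-8 flag on the string). The last substitution uses \h, which is not mentioned in this thread: perlrecharclass documents that \h (Perl 5.10+) matches NO-BREAK SPACE regardless of whether the string is in UTF-8 format.

```perl
use strict;
use warnings;

# Hypothetical sample, as HTML::Tree might return it: the "spaces" are 0xA0.
my $raw = "Widget\xA0x\xA03";

# Original pattern: under byte (non-Unicode) rules \s does not match 0xA0,
# so nothing is removed.
(my $plain = $raw) =~ s/\sx\s\d+//;

# Workaround from the question: add 0xA0 to the character class explicitly.
(my $klass = $raw) =~ s/[\s\xA0]x[\s\xA0]\d+//;

# Alternative not from the thread: \h (horizontal whitespace) always
# includes U+00A0, independent of the string's internal format.
(my $horiz = $raw) =~ s/\hx\h\d+//;

print "plain: ", ($plain eq $raw ? "unchanged" : "stripped"), "\n";
print "class: '$klass'\n";
print "horiz: '$horiz'\n";
```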

Rata

Update: Thanks a lot, rovf, Eliya and LanX! I added all of your solutions to my code for future reference, and they all work like a charm :-) Now that I have installed v5.14, I will keep the solution with use feature "unicode_strings"; active. I like its elegance.

And thanks for the link to the interesting blog post, shawnhcorey!
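
A sketch of the unicode_strings approach mentioned in the update, again on a hypothetical byte string. Inside the feature's lexical scope (fully effective for regexes from Perl 5.14), patterns are compiled with Unicode rules, so plain \s also matches 0xA0; the /u modifier does the same for a single pattern.

```perl
use strict;
use warnings;

my $raw = "Widget\xA0x\xA03";    # hypothetical byte string; 0xA0 = NBSP

# Outside the feature's scope: \s uses ASCII-only rules here, so no match.
my $before = ($raw =~ /\sx\s\d+/) ? 1 : 0;

my $after;
{
    # Patterns compiled in this block use Unicode rules (Perl 5.14+),
    # so \s now matches U+00A0 as well.
    use feature 'unicode_strings';
    ($after = $raw) =~ s/\sx\s\d+//;
}

# Per-pattern equivalent: the /u modifier (Perl 5.14+).
my $with_u = ($raw =~ /\sx\s\d+/u) ? 1 : 0;

print "before: $before, after: '$after', with /u: $with_u\n";
```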

Re: \s and non-breaking spaces
by rovf (Priest) on Feb 29, 2012 at 13:14 UTC
    If you apply several regexes to your HTML text, you could first translate all non-breaking spaces to ordinary spaces, e.g.:

    $text =~ tr/\xA0/ /;
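
A short sketch of this idea on a hypothetical sample string: normalize once with tr///, after which the original \s-based pattern works unchanged. This discards the distinction between non-breaking and ordinary spaces, so it only fits if you don't need to preserve it.

```perl
use strict;
use warnings;

my $text = "Widget\xA0x\xA03";    # hypothetical sample with two NBSPs

# Normalize once: every 0xA0 becomes an ordinary space.
# In numeric context tr/// returns the number of characters replaced.
my $count = ($text =~ tr/\xA0/ /);

# Afterwards the original \s-based pattern from the question works as-is.
$text =~ s/\sx\s\d+//;

print "$count replaced, result: '$text'\n";
```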

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: \s and non-breaking spaces
by Eliya (Vicar) on Feb 29, 2012 at 13:15 UTC
Re: \s and non-breaking spaces
by LanX (Canon) on Feb 29, 2012 at 13:18 UTC
    IMHO the simplest (and most generally useful) workaround is to put complex sub-patterns into variables and use the /x option for readability:

    DB<101> $_=" x 3 "
     => " x 3 "
    DB<102> $s='[\s\xA0]'
     => "[\\s\\xA0]"
    DB<103> s/$s (x) $s/$1/x
     => 1
    DB<104> $_
     => "x3 "
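
The same idea as a small standalone script, applied to the substitution from the question. It uses qr// rather than the plain string from the debugger session (a common variant); $ws and the sample string are hypothetical. Under /x the literal spaces in the pattern are ignored and serve only as layout.

```perl
use strict;
use warnings;

# Store the sub-pattern once; qr// compiles it and it interpolates
# cleanly into larger patterns.
my $ws = qr/[\s\xA0]/;

my $name = "Widget\xA0x\xA03";    # hypothetical sample with NBSPs

# With /x the spaces around $ws and x are not part of the pattern.
$name =~ s/ $ws x $ws \d+ //x;

print "'$name'\n";
```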

    Cheers Rolf

Re: \s and non-breaking spaces
by shawnhcorey (Pilgrim) on Feb 29, 2012 at 14:57 UTC
