PerlMonks  

\s and non-breaking spaces

by Ratazong (Prior)
on Feb 29, 2012 at 13:03 UTC (#956920, perlquestion)
Ratazong has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Today I played around a bit with HTML::Tree and its parts, and was surprised that the following expression did not work as intended on a string extracted from a web page using as_trimmed_text().

    $name =~ s/\sx\s\d+//; # remove trailing " x 3" (and similar)

After a lot of searching I found the culprit: the blanks were encoded as 0xA0 (non-breaking spaces), and \s was not matching them. Is there a better way to handle those than my rather ugly solution below?

    $name =~ s/[\s\xA0]x[\s\xA0]\d+//;

Or another workaround?

Rata

Update: Thanks a lot rovf, Eliya and LanX: I added all of your solutions to my code for future reference - all work like a charm :-) Now that I installed v5.14, I will keep the solution with use feature "unicode_strings"; active - I like its elegance.

And thanks for the link to the interesting blog-post, shawnhcorey!
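
For future readers, here is a minimal sketch of the unicode_strings variant mentioned in the update. It assumes Perl v5.14 or later (where the feature also affects regex matching) and that each non-breaking space arrives as the single character 0xA0; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use 5.014;                       # unicode_strings affects regexes from v5.14 on
use feature 'unicode_strings';   # \s now follows Unicode rules in this scope

# Hypothetical input: a name followed by " x 3", with 0xA0 in place of spaces.
my $name = "Widget\xA0x\xA03";

# The original, unmodified pattern; \s matches 0xA0 under Unicode rules.
(my $clean = $name) =~ s/\sx\s\d+//;

print "cleaned: '$clean'\n";
```

If enabling the feature for a whole file is too broad, adding the /u modifier to a single pattern (also available from v5.14) should have the same effect for that one match.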

Re: \s and non-breaking spaces
by rovf (Priest) on Feb 29, 2012 at 13:14 UTC
    If you apply several regexes to your HTML text, you could first translate all non-breaking spaces to real spaces, i.e.

    $text =~ tr/\xA0/ /;

    -- 
    Ronald Fischer <ynnor@mm.st>
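
A small sketch of how that normalize-first approach might look in a script. The sample text and variable names are hypothetical, and it assumes each non-breaking space is the single character 0xA0 (as in the question), not a still-encoded UTF-8 byte pair.

```perl
use strict;
use warnings;

# Hypothetical sample text; "\xA0" stands in for the non-breaking spaces.
my $text = "Blue\xA0Widget\xA0x\xA03";

# Normalize once: replace every 0xA0 with an ordinary space.
(my $normalized = $text) =~ tr/\xA0/ /;

# Every later pattern can rely on plain \s again.
(my $name = $normalized) =~ s/\sx\s\d+//;   # strip the " x 3" part
my @words = split /\s+/, $name;             # split on ordinary whitespace

print "name: '$name', words: ", scalar(@words), "\n";
```

The trade-off is that the distinction between breaking and non-breaking spaces is lost after the tr, which only matters if the text is written back out as HTML.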
Re: \s and non-breaking spaces
by Eliya (Vicar) on Feb 29, 2012 at 13:15 UTC
Re: \s and non-breaking spaces
by LanX (Canon) on Feb 29, 2012 at 13:18 UTC
    IMHO the simplest (and most generally useful) workaround is to put complex sub-patterns into variables and to use the /x option for readability:

    DB<101> $_=" x 3 "
     => " x 3 "
    DB<102> $s='[\s\xA0]'
     => "[\\s\\xA0]"
    DB<103> s/$s (x) $s/$1/x
     => 1
    DB<104> $_
     => "x3 "

    Cheers Rolf
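
The same idea outside the debugger, as a rough sketch. Note one swap: it stores the sub-pattern with qr// instead of a plain string as in the session above, so the pattern is compiled once. The names $sp, $name and $clean and the sample string are illustrative only.

```perl
use strict;
use warnings;

# Sub-pattern kept in one place: ordinary whitespace or 0xA0.
my $sp = qr/[\s\xA0]/;

# Hypothetical input with non-breaking spaces around the "x".
my $name = "Widget\xA0x\xA03";

# With /x, the blanks inside the pattern are ignored and only aid readability.
(my $clean = $name) =~ s/ $sp x $sp \d+ //x;

print "cleaned: '$clean'\n";
```

If more characters ever need to count as blanks, only the definition of $sp has to change.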

Re: \s and non-breaking spaces
by shawnhcorey (Pilgrim) on Feb 29, 2012 at 14:57 UTC
