PerlMonks  

Exception from a character class

by pirkil (Beadle)
on May 30, 2013 at 10:38 UTC (#1036022, perlquestion)
pirkil has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I am wondering how to define an exception from a character class in a regexp. Here is an example:

1. simple removal of punctuation:

my $text = q{"This is dad's car." "OK", he said.};
$text =~ s{\p{Punct}}{}xmsg;

2. I want to remove punctuation except the apostrophe. I don't want to write just:

$text =~ s{(?<!dad)\p{Punct}}{}xmsg;

because I need to apply the regexp in a general way. This may be a trivial problem and I may have overlooked something, but how can I do it? (With a conditional expression (?(condition)yes-pattern|no-pattern)?)

Thank you.

Re: Exception from a character class
by Anonymous Monk on May 30, 2013 at 10:46 UTC
Re: Exception from a character class
by AnomalousMonk (Abbot) on May 30, 2013 at 19:06 UTC

    I think the set manipulation operations used in the approach of Re: Exception from a character class were only introduced with Perl version 5.18.
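For readers who have not seen the set operations mentioned above, here is a minimal sketch of what such a pattern can look like. It assumes Perl 5.18 or later, where the `(?[ ... ])` extended bracketed character class is available (it was flagged experimental before 5.36, hence the `no warnings` line). The helper name `strip_punct_set_ops` is made up for illustration and is not taken from the referenced reply.

```perl
use v5.18;                                # (?[ ... ]) is not available before 5.18
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # silence the "experimental" warning on older perls

# Hypothetical helper: remove all punctuation except the ASCII apostrophe.
sub strip_punct_set_ops {
    my ($text) = @_;
    # Set difference: everything in \p{Punct}, minus the apostrophe.
    $text =~ s{(?[ \p{Punct} - ['] ])}{}g;
    return $text;
}

print strip_punct_set_ops(q{"This is dad's car." "OK", he said.}), "\n";
# prints: This is dad's car OK he said
```

The subtraction reads much like the requirement itself ("punctuation, except the apostrophe"), which is the main attraction of this syntax when the minimum Perl version allows it.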

    The following approach uses only 'old-style' character classes. It depends on a kind of double negation: the negated class matches every character that is neither a non-digit nor one of the listed digits. (\P{Whatever} is the inverse class of \p{Whatever} – note P versus p.) Of course, adapt this to your punctuation application.

    >perl -wMstrict -le
    "my $s = 'abc 123 def 45678 g 90 h';
     print qq{'$s'};
     ;;
     $s =~ s{ [^\P{PosixDigit}257] }{-}xmsg;
     print qq{'$s'};
    "
    'abc 123 def 45678 g 90 h'
    'abc -2- def -5-7- g -- h'
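Adapted to the punctuation case from the original question, the same double-negation idea might look like the sketch below. The helper name `strip_punct_double_negation` is hypothetical; the pattern itself is just the digit example with `\P{Punct}` and an apostrophe substituted in.

```perl
use strict;
use warnings;

# Hypothetical helper: remove all punctuation except the ASCII apostrophe.
sub strip_punct_double_negation {
    my ($text) = @_;
    # [^\P{Punct}'] matches a character that is neither non-punctuation
    # nor an apostrophe, i.e. any punctuation other than the apostrophe.
    $text =~ s{ [^\P{Punct}'] }{}xmsg;
    return $text;
}

print strip_punct_double_negation(q{"This is dad's car." "OK", he said.}), "\n";
# prints: This is dad's car OK he said
```

More exceptions can be kept by listing them inside the brackets next to the apostrophe.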
Re: Exception from a character class
by ambrus (Abbot) on May 30, 2013 at 20:06 UTC

    You're already showing how to use a zero-width assertion, so why not use that to exclude the apostrophe?

    $text =~ s{(?!')\p{Punct}}{}g;
    Alternatively, how about:

    {
        no warnings "uninitialized";
        $text =~ s{(')|\p{Punct}}{$1}g;
    }

    Update: a third solution is to match what you want to keep.

    $text = join "", $text =~ m{[\x27\P{punct}]+}g;
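As a quick sanity check, the three variants in this reply can be run side by side on the sentence from the original question. This is only a sketch; the `%result` hash and its keys are made-up names used for the comparison.

```perl
use strict;
use warnings;

my $text = q{"This is dad's car." "OK", he said.};
my %result;

# 1. Negative lookahead: refuse to match when the next character is an apostrophe.
($result{lookahead} = $text) =~ s{(?!')\p{Punct}}{}g;

# 2. Capture the apostrophe and put it back; $1 is undef for other punctuation.
{
    no warnings 'uninitialized';
    ($result{capture} = $text) =~ s{(')|\p{Punct}}{$1}g;
}

# 3. Match what should be kept (apostrophes and non-punctuation) and join the pieces.
$result{keep} = join '', $text =~ m{[\x27\P{Punct}]+}g;

printf "%-9s %s\n", $_, $result{$_} for sort keys %result;
```

All three should leave the same string, with the apostrophe in "dad's" intact and every other punctuation mark removed.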
Re: Exception from a character class (user defined character properties \p{IsUserDefined} sub IsUserDefined )
by Anonymous Monk on May 30, 2013 at 20:53 UTC

    perlunicode#User Defined Character Properties

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        my $wa = "'I' am with \x{2018}two\x{2018} kinds of quotes";
        my $wi = $wa;
        my $subs = int $wi =~ s{\p{Punct}}{-}g;
        dd [ 'Punct ', $subs, $wa, $wi ];

        $wi = $wa;
        $subs = int $wi =~ s{\p{InPunctNoApostrophe}}{-}g;
        dd [ 'InPunctNoApostrophe', $subs, $wa, $wi ];
    }

    ## User-defined property: \p{Punct} minus the typographic quotation marks.
    sub InPunctNoApostrophe {
        return join "\n",
             '+utf8::Punct' ## \p{Punct}, i.e. General_Category=Punctuation
            ,'-2018' ## LEFT SINGLE QUOTATION MARK (U+2018)
            ,'-2019' ## RIGHT SINGLE QUOTATION MARK (U+2019)
            ,'-201A' ## SINGLE LOW-9 QUOTATION MARK (U+201A)
            ,'-201B' ## SINGLE HIGH-REVERSED-9 QUOTATION MARK (U+201B)
            ,'-201C' ## LEFT DOUBLE QUOTATION MARK (U+201C)
            ,'-201D' ## RIGHT DOUBLE QUOTATION MARK (U+201D)
            ,'-201E' ## DOUBLE LOW-9 QUOTATION MARK (U+201E)
            ,'-201F' ## DOUBLE HIGH-REVERSED-9 QUOTATION MARK (U+201F)
            ,'-2039' ## SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039)
            ,'-203A' ## SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (U+203A)
    #~      ,'-0027' ## ascii https://en.wikipedia.org/wiki/Apostrophe (disabled, so ' is still replaced)
        ;
    }
    __END__
    [
      "Punct ",
      4,
      "'I' am with \x{2018}two\x{2018} kinds of quotes",
      "-I- am with -two- kinds of quotes",
    ]
    [
      "InPunctNoApostrophe",
      2,
      "'I' am with \x{2018}two\x{2018} kinds of quotes",
      "-I- am with \x{2018}two\x{2018} kinds of quotes",
    ]
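Note that the property in the reply above leaves the ASCII apostrophe line commented out, so U+0027 is still treated as punctuation there. A smaller sketch aimed directly at the original question (keep only the ASCII apostrophe) could look like this. The names `IsPunctButApostrophe` and `strip_punct_user_property` are hypothetical; the only firm requirement is that a user-defined property name starts with "In" or "Is".

```perl
use strict;
use warnings;

# Hypothetical user-defined property: all of \p{Punct} except U+0027.
sub IsPunctButApostrophe {
    return "+utf8::Punct\n"   # start from everything in \p{Punct}
         . "-0027\n";         # then exclude APOSTROPHE (U+0027)
}

# Hypothetical helper built on that property.
sub strip_punct_user_property {
    my ($text) = @_;
    $text =~ s{\p{IsPunctButApostrophe}}{}g;
    return $text;
}

print strip_punct_user_property(q{"This is dad's car." "OK", he said.}), "\n";
```

The advantage over an inline class is that the exception list lives in one named place and can be reused in several patterns.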
