Exception from a character class

by pirkil (Beadle)
on May 30, 2013 at 10:38 UTC (#1036022=perlquestion)
pirkil has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I am wondering how to define an exception from a character class in a regexp. Here is an example:

1. Simple removal of punctuation:

    my $text = q{"This is dad's car." "OK", he said.};
    $text =~ s{\p{Punct}}{}xmsg;

2. I want to remove all punctuation except the apostrophe. I don't want to write just:

    $text =~ s{(?<!dad)\p{Punct}}{}xmsg;

because I need the regexp to work in the general case, not only after one particular word. This may be a trivial problem and I may have overlooked something, but how can I do it? Perhaps with a conditional expression, (?(condition)yes-pattern|no-pattern)?

Thank you.

Re: Exception from a character class
by Anonymous Monk on May 30, 2013 at 10:46 UTC
Re: Exception from a character class
by AnomalousMonk (Abbot) on May 30, 2013 at 19:06 UTC

    I think the set manipulation operations used in the approach of Re: Exception from a character class were only introduced with Perl version 5.18 (as the experimental extended bracketed character class syntax, (?[ ... ])).

    The following approach uses only 'old-style' character classes. It depends on a kind of double-negation to match all characters that are not non-digits and also not specific digits. (\P{Whatever} is the inverse class of  \p{Whatever} – note P versus p.) Of course, adapt this to your punctuation application.

    >perl -wMstrict -le
    "my $s = 'abc 123 def 45678 g 90 h';
     print qq{'$s'};
     ;;
     $s =~ s{ [^\P{PosixDigit}257] }{-}xmsg;
     print qq{'$s'};
    "
    'abc 123 def 45678 g 90 h'
    'abc -2- def -5-7- g -- h'
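Adapted to the punctuation case from the question, the same double negation might look like the sketch below. It is only an illustration: the sample string comes from the original post, and the variable name $stripped is made up here.

```perl
use strict;
use warnings;

# Sample text taken from the original question.
my $text = q{"This is dad's car." "OK", he said.};

# [^\P{Punct}'] matches a character that is neither non-punctuation
# nor an apostrophe, i.e. any punctuation except the apostrophe.
(my $stripped = $text) =~ s{[^\P{Punct}']}{}g;

print "$stripped\n";    # This is dad's car OK he said
```

To keep more characters, list them next to the apostrophe inside the brackets.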
Re: Exception from a character class
by ambrus (Abbot) on May 30, 2013 at 20:06 UTC

    You're already showing how to use a zero-width assertion, so why not use that to exclude the apostrophe?

    $text =~ s{(?!')\p{Punct}}{}g;
    Alternately, how about
    { no warnings "uninitialized"; $text =~ s{(')|\p{Punct}}{$1}g; }

    Update: a third solution is to match what you want to keep.

    $text = join "", $text =~ m{[\x27\P{punct}]+}g;
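The three approaches above can be run side by side to check that they agree. This is a sketch, not ambrus's exact code: the variable names are invented, and the second variant uses /e with the defined-or operator (Perl 5.10 or later) in place of the no warnings block.

```perl
use strict;
use warnings;

# Sample text taken from the original question.
my $text = q{"This is dad's car." "OK", he said.};

# 1. Negative lookahead: refuse to match when the next character is an apostrophe.
(my $lookahead = $text) =~ s{(?!')\p{Punct}}{}g;

# 2. Capture the apostrophe and put it back. $1 is undef for any other
#    punctuation, so fall back to an empty string.
(my $capture = $text) =~ s{(')|\p{Punct}}{ $1 // '' }ge;

# 3. Keep only runs of apostrophes and non-punctuation characters.
my $keep = join '', $text =~ m{[\x27\P{Punct}]+}g;

# All three lines printed should be identical.
print "$_\n" for $lookahead, $capture, $keep;
```

The lookahead form is the shortest. The "match what you keep" form builds a new string and leaves the original untouched.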
Re: Exception from a character class (user defined character properties \p{IsUserDefined} sub IsUserDefined )
by Anonymous Monk on May 30, 2013 at 20:53 UTC

    perlunicode#User Defined Character Properties

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        my $wa = "'I' am with \x{2018}two\x{2018} kinds of quotes";
        my $wi = $wa;
        my $subs = int $wi =~ s{\p{Punct}}{-}g ;
        dd [ 'Punct ', $subs, $wa, $wi ];

        $wi = $wa;
        $subs = int $wi =~ s{\p{InPunctNoApostrophe}}{-}g ;
        dd ['InPunctNoApostrophe', $subs, $wa, $wi ];
    }

    ## User-defined property: everything in \p{Punct} minus the listed code points
    sub InPunctNoApostrophe {
        return join "\n",
        ,'+utf8::Punct' ## \p{Punct} aka General_Category=Punctuation
        ,'-2018' ## LEFT SINGLE QUOTATION MARK (U+2018)
        ,'-2019' ## RIGHT SINGLE QUOTATION MARK (U+2019)
        ,'-201A' ## SINGLE LOW-9 QUOTATION MARK (U+201A)
        ,'-201B' ## SINGLE HIGH-REVERSED-9 QUOTATION MARK (U+201B)
        ,'-201C' ## LEFT DOUBLE QUOTATION MARK (U+201C)
        ,'-201D' ## RIGHT DOUBLE QUOTATION MARK (U+201D)
        ,'-201E' ## DOUBLE LOW-9 QUOTATION MARK (U+201E)
        ,'-201F' ## DOUBLE HIGH-REVERSED-9 QUOTATION MARK (U+201F)
        ,'-2039' ## SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039)
        ,'-203A' ## SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (U+203A)
    #~     ,'-0027' ## ascii https://en.wikipedia.org/wiki/Apostrophe
    }
    __END__
    [
      "Punct ",
      4,
      "'I' am with \x{2018}two\x{2018} kinds of quotes",
      "-I- am with -two- kinds of quotes",
    ]
    [
      "InPunctNoApostrophe",
      2,
      "'I' am with \x{2018}two\x{2018} kinds of quotes",
      "-I- am with \x{2018}two\x{2018} kinds of quotes",
    ]
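Note that the listing above leaves the '-0027' line commented out, so the ASCII apostrophe is still replaced there. A smaller variant that actually spares the apostrophe could look like this. It is a sketch under assumptions: the property name InPunctExceptApostrophe and the sample sentence are made up for illustration, and treating U+2019 as an apostrophe is a choice, not a requirement.

```perl
use strict;
use warnings;

# Hypothetical property name. User-defined properties must start with "In" or "Is".
sub InPunctExceptApostrophe {
    return join "\n",
        '+utf8::Punct',    # start from everything in \p{Punct}
        '-0027',           # remove APOSTROPHE (U+0027)
        '-2019';           # remove RIGHT SINGLE QUOTATION MARK (U+2019)
}

# Invented sample with both an ASCII and a typographic apostrophe.
my $text = qq{"This is dad's car." "It\x{2019}s OK", he said.};

(my $stripped = $text) =~ s{\p{InPunctExceptApostrophe}}{}g;

# The result contains a wide character, so encode the output explicitly.
binmode STDOUT, ':encoding(UTF-8)';
print "$stripped\n";
```

The property sub lives in the same package as the regexp that uses it, which is why plain \p{InPunctExceptApostrophe} works without a package prefix.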
