PerlMonks  

Regexp - match if not between [ ]

by Anonymous Monk
on May 30, 2011 at 13:51 UTC ( #907316=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there all,

I need to split a string at dots (.) that are not between square braces, or not immediately after two patterns ( Cf. or \w\d+.\d+ pattern, like A432.23 ).

If the sample is 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better', then the split should happen at the '23. The' substring. Actually, this sample covers the typical cases I have encountered.

Any idea is appreciated.

salmonix

Re: Regexp - match if not between [ ]
by moritz (Cardinal) on May 30, 2011 at 14:02 UTC
    I need to split a string at dots (.) that are not between square braces

    Sounds like a task for Text::CSV with brackets as delimiters and dot as separator

    or not immediately after two patterns ( Cf. or \w\d+.\d+ pattern, like A432.23 ).

    Post-process the output from Text::CSV, and join two adjacent columns if the first of them ends in one of the patterns.
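    The post-processing idea can be sketched without Text::CSV at all: split on every dot, then glue neighbouring pieces back together when the dot should not have counted. This is only a rough sketch in core Perl, not the Text::CSV approach itself; the name split_sentences and the exact rejoin rules are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: naive split on every dot, then rejoin pieces
# whose separating dot was inside brackets, after "Cf", or inside
# a reference such as A423.23.
sub split_sentences {
    my ($text) = @_;
    return () unless length $text;

    my @pieces = split /\./, $text, -1;    # -1 keeps trailing empty fields
    my @out    = ( shift @pieces );

    for my $piece (@pieces) {
        my $prev  = $out[-1];
        my $open  = () = $prev =~ /\[/g;   # count of "[" so far
        my $close = () = $prev =~ /\]/g;   # count of "]" so far

        my $inside_brackets = $open > $close;
        my $after_cf        = $prev =~ /\bCf\z/;
        my $inside_ref      = $prev =~ /\b[[:alpha:]]\d+\z/ && $piece =~ /\A\d/;

        if ( $inside_brackets || $after_cf || $inside_ref ) {
            $out[-1] .= ".$piece";         # put the dot back
        }
        else {
            push @out, $piece;
        }
    }
    return @out;
}

my @parts = split_sentences(
    'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better'
);
print "'$_'\n" for @parts;
```

    The bracket test only counts "[" and "]", so it assumes brackets are never nested or left unbalanced.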

Re: Regexp - match if not between [ ]
by JavaFan (Canon) on May 30, 2011 at 14:08 UTC
    Something like this (untested):
    my @chunks = /[^C\[.]*(?:(?:Cf\.|C(?!f)|\[[^]]*\])[^C\[.]*)*/g;

      Exactly. Thanx.
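    For anyone reading along, here is the same pattern spelled out with /x and comments. It is a sketch for explanation only; the grep that drops empty matches is an addition, not part of the original reply. Note that the pattern handles brackets and "Cf." but has no rule for the A423.23 kind of reference, so that dot still ends a chunk.

```perl
use strict;
use warnings;

my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better';

my $chunk = qr{
    [^C\[.]*              # plain text: anything but "C", "[" or "."
    (?:
        (?: Cf\.          # the abbreviation, dot included
          | C(?!f)        # any other "C"
          | \[ [^]]* \]   # a whole bracketed group, dots and all
        )
        [^C\[.]*          # more plain text
    )*
}x;

# The pattern can match the empty string (at each dot and at the end),
# so empty chunks are filtered out here.
my @chunks = grep { length } $s =~ /$chunk/g;
print "'$_'\n" for @chunks;
```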

Re: Regexp - match if not between [ ]
by BrowserUk (Pope) on May 30, 2011 at 14:18 UTC

    If your \w\d+ pattern can be substituted by \w\d{3}, then this seems to work:

    $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better,';;
    print for split '(?<!Cf)(?<!\w\d{3})\.(?![^\]]+])', $s;;
    The fox did it[ at 12.23 ] well, Cf. 23 A423.23
     The swallow was even better,

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
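
    One caveat with the look-ahead (?![^\]]+]): it refuses to split whenever any "]" appears later in the string, even when the dot itself is outside brackets. A possible tightening, offered only as a sketch, is to refuse the split only when a "]" is reached before any "[". The sample string and the variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'One. Two [ see 3.4 ] three. Four';

# Look-ahead as in the reply above: the later "]" also blocks the dot after "One".
my @loose = split /(?<!Cf)\.(?![^\]]+\])/, $s;

# Tightened look-ahead: block only if "]" comes before any "[".
my @tight = split /(?<!Cf)\.(?![^\[\]]*\])/, $s;

print "loose: '$_'\n" for @loose;
print "tight: '$_'\n" for @tight;
```

    Like the original, this assumes brackets are balanced and not nested.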

      Look-behind is problematic, for the number of digits etc. are not fixed. But thanks. I was really thinking in the wrong direction.

        Look-behind is problematic, for the number of digits etc. are not fixed.

        Look behinds can still accommodate the task, but it does get pretty unwieldy if the width variation is more than a few characters:

        print for split m[
            (?<! Cf )
            (?: (?<! \w\d\d\d ) | (?<! \w\d\d ) | (?<! \w\d ) )
            \.
            (?! [^\]]+ \] )    # no spaces inside the class: they stay literal there under x
        ]x, $s;;
        The fox did it[ at 12.23 ] well, Cf. 23 A423.23
         The swallow was even better,

        But it sounds like you've settled on a solution.
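
        A note on the alternation of negative look-behinds: it succeeds as soon as any one of them succeeds, so a shorter reference such as A4.5 is still split. Chaining the look-behinds instead requires all of them to hold. The sketch below assumes a reference is one letter followed by one to three digits (a guess, like the [[:alpha:]] guess elsewhere in this thread); the sample strings are made up.

```perl
use strict;
use warnings;

# Alternation: one satisfied look-behind is enough, so "A4.5" is split.
my @or = split /(?:(?<!\w\d\d\d)|(?<!\w\d\d)|(?<!\w\d))\./, 'See A4.5 here';

# Chained look-behinds: every one must hold for the dot to split.
my $splitter = qr{
    (?<! Cf )
    (?<! [[:alpha:]]\d )
    (?<! [[:alpha:]]\d\d )
    (?<! [[:alpha:]]\d\d\d )
    \.
    (?! [^\[\]]* \] )      # not inside a bracketed group
}x;

my @and = split $splitter, 'Cf. 23 A4.5 and A42.7 and A423.23. The end';

print "or:  '$_'\n" for @or;
print "and: '$_'\n" for @and;
```

        The chained form also refuses to split after a sentence that happens to end in a letter plus digits, such as "room B2.", so it is a trade-off rather than a complete fix.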


Re: Regexp - match if not between [ ]
by AnomalousMonk (Monsignor) on May 30, 2011 at 15:34 UTC

    Using Text::CSV would probably be better. However, this regex approach, while more verbose, is perhaps more maintainable. It needs the Special Backtracking Control Verbs of Perl 5.10+. (I've made a guess at the proper regex for an A423.23 thingy.)

    >perl -wMstrict -le
    "my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The '
           . 'swallow was even better';
     print qq{''$s''};
     ;;
     my $parens = qr{ \[ [^]]* \] }xms;
     my $cf     = qr{ (?i) cf \. }xms;
     my $ref    = qr{ [[:alpha:]]+ \d+ (?: \. \d+)+ }xms;
     my $splitter = qr{ (?: $parens | $cf | $ref) (*SKIP)(*FAIL) | \. }xms;
     ;;
     my @ra = split $splitter, $s;
     print qq{'$_'} for @ra;
    "
    ''The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better''
    'The fox did it[ at 12.23 ] well, Cf. 23 A423.23'
    ' The swallow was even better'
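
    The (*SKIP)(*FAIL) idiom on its own, reduced to the bracket rule, may make the mechanism easier to see. This is a minimal sketch with a made-up input string: whatever the left alternative matches is consumed and then rejected, so the right alternative only ever sees dots outside brackets.

```perl
use strict;
use warnings;
use 5.010;    # (*SKIP) and (*FAIL) need Perl 5.10 or later

# Left alternative: a bracketed group, matched and then thrown away.
# Right alternative: a dot that survived, i.e. one outside brackets.
my $dot_outside = qr{ \[ [^]]* \] (*SKIP)(*FAIL) | \. }x;

my @parts = split $dot_outside, 'a.b[c.d]e.f';
print "'$_'\n" for @parts;
```

    More "do not split here" cases can be added as further alternatives before (*SKIP)(*FAIL), which is what the longer one-liner does with $cf and $ref.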

      Thanx for all, refreshing.

      salmonix
