http://www.perlmonks.org?node_id=1059565


in reply to Splitting string using two overlapping patterns

When I look at situations like this, I get nervous that the incoming file might be even more inconsistent than I thought ... and that my “clever regex” solution might be less robust than I need it to be. I would not be confident that my code is verifiably correct, precisely because of the “clever regex.” So what I would probably do is use a loop and break the string down left to right, pushing the pieces onto an array as I parse them. For example:

while ($str ne '') {
    $str =~ s/^\s+//;        # REMOVE LEADING WHITESPACE
    last if ($str eq '');    # LEAVE LOOP EARLY IF IT WAS ALL-WHITESPACE
    if ($str =~ /^\{/) {     # STARTS WITH '{'
        ...
    }
    elsif ...
    ...
}
Well, you get the idea, I think.
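
To make that a bit more concrete, here is one way the loop might be filled in. This is only a sketch under assumptions of my own, since I don't know the real format: I'm assuming a piece is either a '{...}' group with no nested braces or a bare word containing no whitespace or braces, and the name tokenize is just a placeholder.

```perl
use strict;
use warnings;

# Hypothetical token rules (assumed for illustration, not taken from the
# original question): a piece is either a brace-delimited group "{...}"
# with no nesting, or a bare word with no whitespace or braces in it.
sub tokenize {
    my ($str) = @_;
    my @pieces;
    while ($str ne '') {
        $str =~ s/^\s+//;                   # remove leading whitespace
        last if $str eq '';                 # nothing left but whitespace
        if ($str =~ s/^\{([^{}]*)\}//) {    # starts with a complete '{...}' group
            push @pieces, $1;               # keep the text between the braces
        }
        elsif ($str =~ s/^([^\s{}]+)//) {   # starts with a bare word
            push @pieces, $1;
        }
        else {                              # unbalanced brace or other surprise
            die "Cannot parse input at: '$str'\n";
        }
    }
    return @pieces;
}

my @pieces = tokenize('  alpha {beta gamma} delta ');
print join('|', @pieces), "\n";             # prints: alpha|beta gamma|delta
```

The final else branch is the point of doing it this way: input that matches none of the expected shapes stops the program with a message showing exactly where parsing failed, instead of being silently mis-split.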

Even though this code may or may not be “efficient,” I am fairly confident that I could debug it, and that I could extend it to cover new cases while being sure that (a) the new changes work and (b) I didn’t break something in the process.

Re^2: Splitting string using two overlapping patterns
by bioinformatics (Friar) on Oct 25, 2013 at 04:54 UTC

    It looks like he's parsing the headers from the output of a specific program, so it ought to be consistent. Your method and the other suggestions should work, but the way he's looking to do it should be fine. This isn't large-scale logfile parsing...

    Bioinformatics