Re: regex anchoring issue

by smls (Friar)
on Feb 15, 2013 at 12:01 UTC (#1018884)

in reply to regex anchoring issue

If you need your regex to match one of several possible text fragments of which at least one is longer than 1 character, you have to use an alternation (...|...|...) instead of a character class ([...]).
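To illustrate the difference, here is a minimal sketch. The sample string is made up, and the marker list ("<SOH>", "^A", chr(1)) is assumed from the delimiters discussed in this thread, so adjust both to your real data.

```perl
use strict;
use warnings;

# Hypothetical sample line; the delimiters are assumed from this thread.
my $line = '8=FIX.4.2<SOH>9=65^A35=A';

# Character class: [<SOH>] matches ONE character out of '<', 'S', 'O', 'H', '>'.
# Each of the five characters in "<SOH>" therefore acts as its own delimiter,
# which produces empty fields, and "^A" is not recognised at all.
my @by_class = split /[<SOH>]/, $line;

# Alternation: matches the whole literal "<SOH>", a literal "^A", or chr(1).
my @by_alt = split /<SOH>|\^A|\cA/, $line;

print scalar(@by_class), " fragments with the character class\n";
print scalar(@by_alt),   " fragments with the alternation\n";
print join(' | ', @by_alt), "\n";
```

With the character class, the fields between the individual characters of "<SOH>" come back as empty strings, which is usually a sign that the pattern is splitting on single characters rather than on the marker as a whole.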

Some additional comments on your code:

  • The {1} quantifier in the first regex is redundant. Matching one occurrence is the default behavior if no quantifier is given.

  • The {1,} quantifier in the third regex can be more succinctly written as +.

  • The /g modifier is not needed in the first two regexes, as kcott already noted.

  • Before the first regex, the $_ =~ is redundant because Perl will match against that variable by default. Similarly, passing $_ as the second parameter to split is redundant.

  • I'm pretty sure you don't need to nest two foreach my $line (@split) { ... } loops... :)

  • The outer parentheses in the line ( $line =~ s/[\s\n]{1,}//g ) are redundant.

  • Inside the while loop, you create a new @split array for each iteration (i.e. for each line from the input file), but then you don't do anything with it. Did you just cut out the code that does something with the current line's @split array to keep your question shorter, or did you actually intend to add all split fragments from all lines into a single array that will be available after the end of the while loop? In the latter case you need to modify your code.

  • You can probably restructure the code to avoid specifying the split regex twice, depending on your exact requirements. For example, suppose each split fragment has to be followed by one of the "<SOH>"/"^A"/chr(1) markers, i.e. no line of the file may end with a lone fragment, as in "8=FIX.4.2<SOH>8=FIX.4.2<SOH>8=FIX.4.2". In that case you could use an m/../g regex like this to do the splitting without calling split, and then print the error based on whether any matches were found:

    my @split;
    foreach (m/(.+?)(?:<SOH>|\^A|\cA)/g) {
        s/\s+//g;
        push @split, $_;
    }
    if (!@split) {
        # print error here
        last;
    }
    # do stuff with @split here
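If what you actually wanted was the second possibility raised above, i.e. collecting the fragments from all lines into one array that is still available after the while loop, a sketch along these lines might work. It assumes every fragment is terminated by one of the three markers. The in-memory sample data and the names $data, $fh and @all_fragments are made up for illustration and are not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the real file. It is opened as an
# in-memory filehandle so that the sketch is self-contained.
my $data = "8=FIX.4.2<SOH>9=65<SOH>\n35=A^A 49=SENDER^A\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @all_fragments;    # declared outside the loop, so it outlives it

while (<$fh>) {
    chomp;

    # m//g in list context returns every captured fragment of this line.
    # It matches against $_ by default, so no "$_ =~" is needed.
    my @split = m/(.+?)(?:<SOH>|\^A|\cA)/g;

    if (!@split) {
        warn "No delimited fragments found on line $.\n";
        last;
    }

    s/\s+//g for @split;            # strip whitespace from each fragment
    push @all_fragments, @split;    # accumulate across all lines
}
close $fh;

print "$_\n" for @all_fragments;
```

The only structural change from the snippet above is that the per-line @split stays local to the loop body, while the array you want to keep is declared before the loop.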

Re^2: regex anchoring issue
by Anonymous Monk on Feb 15, 2013 at 22:30 UTC