PerlMonks  

Re: regex anchoring issue

by kcott (Abbot)
on Feb 15, 2013 at 05:49 UTC (#1018845=note)


in reply to regex anchoring issue

G'day penguin-attack,

Welcome to the monastery.

Firstly, your data description seems a little ambiguous: you say "end character" but then describe <SOH> (5 chars), ^A (2 chars) and Ctrl-A (1 char). If, by <SOH>, you mean the ASCII control character SOH, note that it is the same character as Ctrl-A (i.e. the character with the ASCII value of 1).
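To make those lengths concrete, here is a small sketch (not part of the original reply) that measures each candidate terminator and confirms that Ctrl-A, "\x01" and chr(1) are the same single character:

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# The three readings of "end character" from the question
my %terminator = (
    'literal <SOH>' => '<SOH>',   # five printable characters
    'literal ^A'    => '^A',      # a caret followed by 'A': two characters
    'Ctrl-A'        => "\cA",     # one control character (ASCII 1)
);

# Print each terminator's length, in sorted key order
for my $name (sort keys %terminator) {
    say "$name: length ", length $terminator{$name};
}

# Ctrl-A, "\x01" and chr(1) all denote the character with ASCII value 1
my $same = ("\cA" eq chr(1) && "\x01" eq chr(1)) ? 'yes' : 'no';
say "Ctrl-A is chr(1): $same";
say 'ord of Ctrl-A: ', ord "\cA";
```

Only the Ctrl-A form is a single character; the other two are multi-character strings and have to be matched as sequences.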

The main problem with your regexp is the use of a character class (i.e. [...]): a class matches any single one of the characters listed inside the brackets, not the sequence they spell. See Character Classes and other Special Escapes under perlre - Regular Expressions for details. You also don't need the 'g' modifier in either the match (m/.../) or the split function.
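The difference is easy to see with split. In this sketch the sample record 'foo<SOH>bar' is made up for illustration: the bracketed pattern treats each of <, S, O, H and > as a separate separator, while the unbracketed pattern only matches the complete five-character sequence.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

my $record = 'foo<SOH>bar';    # hypothetical sample data

# Character class: splits on EACH of '<', 'S', 'O', 'H' and '>' separately,
# leaving empty fields between adjacent separator characters
my @by_class = split /[<SOH>]/, $record;

# Literal sequence: splits only on the whole string '<SOH>'
my @by_sequence = split /<SOH>/, $record;

say 'class:    ', join '|', @by_class;       # foo|||||bar
say 'sequence: ', join '|', @by_sequence;    # foo|bar
```

The character-class version would also break apart any field that happens to contain an uppercase S, O or H.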

The following script does what I think you want (in terms of identifying the line endings). If not, please provide some sample data with expected output to remove the ambiguity I mentioned at the start.

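#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

# One test value for each possible reading of "end character"
my $soh_string     = 'soh_string<SOH>';         # literal 5-char string '<SOH>'
my $caret_a_string = 'caret_a_string^A';        # literal 2-char string '^A'
my $ctrl_a_string  = 'ctrl_a_string' . chr(1);  # single Ctrl-A (SOH) character

# Concatenate them in mixed order to build the test data
my $test_string = join('',
    $soh_string, $caret_a_string, $ctrl_a_string,
    $caret_a_string, $ctrl_a_string, $soh_string,
    $ctrl_a_string, $soh_string, $caret_a_string
);

# An alternation (not a character class) matching any one of the three endings
my $string_re = qr{(?><SOH>|\^A|\cA)};

say for split $string_re => $test_string;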

Output:

$ pm_soh_split.pl
soh_string
caret_a_string
ctrl_a_string
caret_a_string
ctrl_a_string
soh_string
ctrl_a_string
soh_string
caret_a_string

-- Ken


Re^2: regex anchoring issue
by smls (Friar) on Feb 15, 2013 at 11:23 UTC
    Why do you place the regex in a (?>...) non-backtracking group?

      Wrapping regexp alternations in (?>...) is something I do by default. While there may be rare cases where this might be problematical, I haven't encountered any: it's something that doesn't hurt and, indeed, often helps.

      This usage is based on a "Perl Best Practices" guideline: Backtracking (page 269). It's summarised on page 271 as:

      ... rewrite any instance of:

      X | Y

      as:

      (?> X | Y )

      While I'm not a slave to all "Perl Best Practices" guidelines, this is one I have found to be useful.
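For readers wondering when an atomic group could be "problematical", here is a textbook illustration (the pattern and strings are made up, not taken from the thread). An atomic group stops the engine from going back into the group to try another alternative, so /^(?>a|ab)c/ fails on 'abc': once 'a' has matched, 'ab' is never tried. In Ken's regexp, by contrast, the three alternatives begin with different characters (<, ^ and Ctrl-A), so at most one of them can match at a given position and the atomic group cannot change the result.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

my $text = 'abc';

# Plain alternation: 'a' matches, then 'c' fails against 'b',
# so the engine backtracks into the group and tries 'ab' instead
my $plain = $text =~ /^(?:a|ab)c/ ? 'match' : 'no match';

# Atomic group: once 'a' has matched, the 'ab' alternative is discarded,
# so the overall match fails
my $atomic = $text =~ /^(?>a|ab)c/ ? 'match' : 'no match';

say "plain:  $plain";     # match
say "atomic: $atomic";    # no match

# Alternatives with distinct first characters give identical results either way
my $data = "x<SOH>y^Az\cAw";    # hypothetical mixed-terminator data
my @atomic_fields = split /(?><SOH>|\^A|\cA)/, $data;
my @plain_fields  = split /(?:<SOH>|\^A|\cA)/, $data;
say 'atomic split: ', join '|', @atomic_fields;    # x|y|z|w
say 'plain split:  ', join '|', @plain_fields;     # x|y|z|w
```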

      Update: s/have encountered/haven't encountered/

      -- Ken

Re^2: regex anchoring issue
by BillKSmith (Chaplain) on Feb 15, 2013 at 14:01 UTC

    Refer to charnames for a neat way to code the value of your $soh_string.

    use charnames qw(:full); $soh_string = "\N{SOH}";
    Bill
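A quick check of Bill's suggestion, assuming a Perl recent enough to recognise SOH as a Unicode name alias for U+0001: "\N{SOH}" yields the same single control character as "\cA" in Ken's script, not the five-character string '<SOH>'.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use charnames qw(:full);

# The SOH control character, written via its Unicode name alias
my $soh_char = "\N{SOH}";

say 'length: ', length $soh_char;                                 # 1
say 'ord: ', ord $soh_char;                                       # 1
say 'same as Ctrl-A: ', ($soh_char eq "\cA" ? 'yes' : 'no');      # yes
say q{same as '<SOH>': }, ($soh_char eq '<SOH>' ? 'yes' : 'no');  # no
```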

      Thanks, Bill. I had considered that but decided not to use it due to the ambiguity I noted in my opening paragraph. Had penguin-attack wanted the single ASCII character SOH, instead of the string '<SOH>', that was covered by Ctrl-A (also noted).

      [Side issue (struggling not to appear grossly pedantic): the charnames pragma has been distributed with Perl since at least v5.8.8 - the perldoc link (charnames) would provide the most recent documentation.]

      -- Ken
