PerlMonks  

Re: regex anchoring issue

by kcott (Abbot)
on Feb 15, 2013 at 05:49 UTC (#1018845)


in reply to regex anchoring issue

G'day penguin-attack,

Welcome to the monastery.

Firstly, your data description seems a little ambiguous: you say "end character" then describe <SOH> (5 chars), ^A (2 chars) and Ctrl-A (1 char). If by <SOH> you mean the ASCII control character SOH, that is the same character as Ctrl-A (i.e. the character with the ASCII value of 1).
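To make the ambiguity concrete, here is a minimal sketch of the three readings. The hash name and labels are just for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Three possible readings of the "end character" (labels are illustrative).
my %candidate = (
    'literal <SOH>' => '<SOH>',   # five ordinary characters
    'literal ^A'    => '^A',      # a caret followed by a capital A
    'Ctrl-A'        => "\cA",     # one control character
);

printf "%-13s length %d\n", $_, length $candidate{$_}
    for sort keys %candidate;

# "\cA", chr(1) and "\x01" are three spellings of the same single character.
print "same character\n" if "\cA" eq chr(1) and chr(1) eq "\x01";
```

Only the last reading is a single character, which is why the wording "end character" matters.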

The main problem in your regexp is the use of a character class (i.e. [...]), which matches a single character from the listed set rather than the sequence of characters - see Character Classes and other Special Escapes under perlre - Regular Expressions for details. You also don't need the 'g' modifier in either the match (m/.../) or the split function.
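As a rough illustration of the difference (the sample record below is made up, and I'm assuming the terminator is the literal text '<SOH>'), compare splitting on a character class with splitting on the sequence itself:

```perl
use strict;
use warnings;

# Hypothetical sample data: two fields, each terminated by the text '<SOH>'.
my $record = 'host=alpha<SOH>port=80<SOH>';

# Character class: any ONE of the characters <, S, O, H or > is a separator,
# so each terminator is treated as five separate delimiters.
my @class_parts = split /[<SOH>]/, $record;

# Literal sequence: the five characters together form one separator.
my @seq_parts = split /<SOH>/, $record;

print scalar(@class_parts), " fields using the character class\n";
print scalar(@seq_parts),   " fields using the literal sequence\n";
```

The class version returns empty fields between the characters of each terminator, and it would also split on any S, O or H appearing inside the data.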

The following script does what I think you want (in terms of identifying the line endings). If not, please provide some sample data with expected output to remove the ambiguity I mentioned at the start.

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

my $soh_string = 'soh_string<SOH>';
my $caret_a_string = 'caret_a_string^A';
my $ctrl_a_string = 'ctrl_a_string' . chr(1);

my $test_string = join('',
    $soh_string, $caret_a_string, $ctrl_a_string,
    $caret_a_string, $ctrl_a_string, $soh_string,
    $ctrl_a_string, $soh_string, $caret_a_string
);

my $string_re = qr{(?><SOH>|\^A|\cA)};

say for split $string_re => $test_string;

Output:

$ pm_soh_split.pl
soh_string
caret_a_string
ctrl_a_string
caret_a_string
ctrl_a_string
soh_string
ctrl_a_string
soh_string
caret_a_string

-- Ken


Re^2: regex anchoring issue
by BillKSmith (Deacon) on Feb 15, 2013 at 14:01 UTC

    Refer to charnames for a neat way to code the value of your $soh_string.

    use charnames qw(:full);
    $soh_string = "\N{SOH}";
    Bill

      Thanks, Bill. I had considered that but decided not to use it due to the ambiguity I noted in my opening paragraph. Had penguin-attack wanted the single ASCII character SOH, instead of the string '<SOH>', that was covered by Ctrl-A (also noted).

      [Side issue (struggling not to appear grossly pedantic): the charnames pragma has been distributed with Perl since at least v5.8.8 - the perldoc link (charnames) would provide the most recent documentation.]
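For anyone wanting to check how the named character relates to Ctrl-A on their own Perl, a small sketch follows. It looks the name up at run time with charnames::vianame, so it should not fail at compile time if the 'SOH' alias is unknown to an older charnames; whether the alias is recognised depends on the Perl version, which is an assumption here.

```perl
use strict;
use warnings;
use charnames ':full';

# Run-time lookup: returns the code point, or undef if the name is unknown.
my $code = charnames::vianame('SOH');

if (defined $code) {
    printf "SOH is U+%04X\n", $code;
    print "same as Ctrl-A\n" if chr($code) eq "\cA";
}
else {
    print "This charnames does not know the name 'SOH'\n";
}
```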

      -- Ken

Re^2: regex anchoring issue
by smls (Friar) on Feb 15, 2013 at 11:23 UTC
    Why do you place the regex in a (?>...) non-backtracking group?

      Wrapping regexp alternations in (?>...) is something I do by default. While there may be rare cases where this might be problematical, I haven't encountered any: it's something that doesn't hurt and, indeed, often helps.

      This usage is based on a "Perl Best Practices" guideline: Backtracking (page 269). It's summarised on page 271 as:

      ... rewrite any instance of:

      X | Y

      as:

      (?> X | Y )

      While I'm not a slave to all "Perl Best Practices" guidelines, this is one I have found to be useful.
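For the curious, here is a minimal sketch of the kind of case where the atomic group changes the result. The pattern and string are contrived for illustration and are not taken from "Perl Best Practices": the first alternative is a prefix of the second, and something must match after the group.

```perl
use strict;
use warnings;

my $text = 'foobar';

# Plain alternation: 'foo' matches, \z fails, so the engine backtracks
# into the group and tries 'foobar', which succeeds.
my $plain = $text =~ /\A(?:foo|foobar)\z/ ? 1 : 0;

# Atomic group: once 'foo' has matched, the group gives up its other
# alternatives, so there is nothing to retry when \z fails.
my $atomic = $text =~ /\A(?>foo|foobar)\z/ ? 1 : 0;

print "plain alternation matched:  $plain\n";
print "atomic alternation matched: $atomic\n";
```

In the split pattern used earlier in this thread nothing follows the group and no alternative is a prefix of another, so the atomic group does not change what matches there.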

      Update: s/have encountered/haven't encountered/

      -- Ken
