http://www.perlmonks.org?node_id=911824


in reply to How to substitute something from only between two specified charecters

Just rely on the fact that the second item is the only one that allows spacing:

use strict;
use warnings;

while (<DATA>) {
    if (/^(\S+)\s+(.*\S)\s+(\S+)\s+(\S+)$/) {
        print "A:B = $1\n";
        print "C/D/E/F = $2\n";
        print "G/H/I = $3\n";
        print "J = $4\n";
    }
    else {
        warn "Invalid record: $_";
    }
}

__DATA__
>cds:ADD75048 A/Brussels/INS71/2009 2009/10/30 HA
>cds:ADF58353 A/Germany-MV/HGW4/2009 2009/12/ HA
>cds:ADF58351 A/Germany-MV/HGW6/2009 2009/12/ HA
>cds:ADU76781 A/England/94780010/2009 2009/10/22 HA
>cds:AEA30293 A/Netherlands/2223b/2009 2009/11/18 HA
>cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA
>cds:ADX98640 A/San Diego/INS13/2009 2009/10/19 HA
>cds:ADD74978 A/San Diego/INS54/2009 2009/10/12 HA
>cds:ADF27925 A/Texas/JMS407/2010 2010/01/11 HA
>cds:ADM95824 A/Finland/661/2009 2009/10/26 HA
>cds:ADD97035 A/Wisconsin/629-D00036/2009 2009/09/15 HA

Re^2: How to substitute something from only between two specified charecters (sub_question)
by ww (Archbishop) on Jun 29, 2011 at 13:45 UTC
    OP can probably extrapolate, or learn from some of the other replies, and maybe that's why the parent stops just short of actually answering the original question: how to remove spaces, but only in the location field.

    But just in case the assumption above is wrong, assign $2 to a named var ($second maybe) and remove spaces:

    $second =~ s/\s+//g;
    ...
    say "C/D/E/F - $second";
    ...
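For anyone who wants the whole thing in one runnable piece, here is a minimal sketch that combines the parent's regex with the substitution above. The sub name squeeze_location and the choice to rejoin the fields with single spaces are assumptions made for illustration, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# split one record into its four fields using the parent's regex,
# strip whitespace from the location field only, and rejoin.
sub squeeze_location {
    my ($line) = @_;
    chomp $line;

    # In scalar context the list assignment yields the number of captures,
    # which is 0 when the match fails, so invalid records return early.
    my ($id, $location, $date, $type) =
        $line =~ /^(\S+)\s+(.*\S)\s+(\S+)\s+(\S+)$/
        or return;

    $location =~ s/\s+//g;    # remove every run of whitespace in this field
    return join ' ', $id, $location, $date, $type;
}

print squeeze_location(
    '>cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA'
), "\n";
```

The other three fields are passed through untouched, so only "District of Columbia" loses its spaces.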

    BUT that's not really the point of this post; rather (perhaps because /me is suffering brain-freeze), why the heck is the second capture, (.*\S), a greedy anything followed by anything-not-whitespace, working?

    Y::R::E isn't helping this morning; neither is a recheck of (some obvious parts of) Mastering Regular Expressions.

    And in case my brain-freeze isn't clear, that chill is telling me that \s+(.*\S)\s+(\S+) should capture the location field and everything else up to the last space before "HA". That's obviously wrong, but why?

    Can someone please provide the meat for a slap-my-forehead, grunt-"Duh!" moment?

      I wrote the regex that way to have an explicit boundary between the second field and the spacing separating it from the third field. I didn't want to eat any extra spacing.

      I could have accomplished this in one of three ways:

      1) Explicitly specify that the field shouldn't contain a space at the end, like I did: (.*\S)\s+
      2) Use an explicit boundary, like (.*)\b\s+
      3) Or rely on non-greedy matching: (.*?)\s+, which would work because of the hard boundaries for the other fields.

      In the end, the third method above would probably appear the cleanest, but they all accomplish the same thing in the context of the rest of the regex.
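As for why the greedy version does not swallow the date as well: the rest of the pattern insists on exactly two more whitespace-separated \S+ fields before the end of the line, so the regex engine has to backtrack the greedy .* until that is possible. The sketch below runs the three variants side by side so the captures can be compared; the hash keys and the helper second_field are names made up for this illustration.

```perl
use strict;
use warnings;

# The three ways of writing the second capture listed above.
# The hash keys are illustrative labels only.
my %variant = (
    explicit_nonspace => qr/^(\S+)\s+(.*\S)\s+(\S+)\s+(\S+)$/,
    word_boundary     => qr/^(\S+)\s+(.*)\b\s+(\S+)\s+(\S+)$/,
    non_greedy        => qr/^(\S+)\s+(.*?)\s+(\S+)\s+(\S+)$/,
);

# Hypothetical helper: return the second capture, or undef on no match.
sub second_field {
    my ($re, $line) = @_;
    return $line =~ $re ? $2 : undef;
}

my $record = '>cds:ADD23250 A/District of Columbia/INS17/2009 2009/10/26 HA';

# Print each variant's second capture in brackets so stray
# leading or trailing whitespace would be visible.
for my $name (sort keys %variant) {
    printf "%-18s => [%s]\n", $name, second_field($variant{$name}, $record);
}
```

One caveat on the \b variant: it assumes the location field ends in a word character (a letter, digit, or underscore), which holds for the sample data here but would not for a field ending in, say, a closing parenthesis.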