contextual substitution with s///?

Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

I think there should be a way to do this simply, but I can't find it in the docs.
Syslog records contain literal dates that do not have leading zeros, but rather leading spaces. So Feb 1st is 'Feb..1' (2 spaces).
I want to remove one of those spaces (so I can 'split()' the line on spaces consistently). Simple:

$a="test string x  x\nFeb  1 09:12:33  (";
if ($a =~ /^\w{3}\s\s\d/m)  { # /^Feb  1/
  $a =~ s/^\w{3}(\s\s)\d/ /;
  print "<$a>\n";
}else{
  print "no match\n";
}
#<x  x
Feb  1 09:12:33  (>

But it seems the substitution isn't as smart as a regex. The program runs, but doesn't modify the string. It obviously recognizes the parens for grouping purposes, but only as insertions into the substituting text. But it can do:

my $var = 'testing';
$_ = 'In this string we are $var the "e" modifier.';

s/(\$\w+)/$1/ee;

print;

>In this string we are testing the "e" modifier.

It seems intuitive that a grouped substring would be the target of a substitution, if it exists for no other purpose. Or how about:

  $a=~ s/$a =~ s/^\w{3}(\s\s)\d/$1/ /; #3 part substitute like sed, $1 with a space?

So, is there a simple way to do substitution within a context?

It is always better to have seen your target for yourself, rather than depend upon someone else's description.


Replies are listed 'Best First'.
Re: contextual substitution with s///? by BrowserUk (Patriarch) on Feb 01, 2013 at 16:27 UTC
I want to remove one of those spaces (so I can 'split()' the line on spaces consistently)

Seems like you're working hard to do something that can be far more easily achieved.

The problem you are trying to fix is what happens when you split on \s. (Here I've substituted _ for spaces to make things clearer.):

$s = 'The_quick__brown____fox';;
print for split /_/, $s;;
The
quick

brown



fox

Instead of trying to remove the extra spaces, simply accommodate them:

print for split /_+/, $s;;
The
quick
brown
fox

Problem solved.

In addition, as splitting on variable amounts of whitespace is such a common thing to do, if you supply split with a single space (`' '`) in place of the regex argument, it takes care of that (and any leading whitespace) for you:

print for split / /, ' the quick brown fox ';;

the
quick
brown
fox

print for split ' ', ' the quick brown fox ';;
the
quick
brown
fox

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
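The three split behaviours described above can be sketched on a syslog-style line. This is only an illustration: the sample line is made up, not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical syslog-style line; note the two spaces before the day number.
my $line = 'Feb  1 09:12:33 myhost sshd[123]: session opened';

# Splitting on exactly one space yields an empty field between the two spaces.
my @single = split / /, $line;

# Splitting on a run of whitespace treats "  " as one separator.
my @run = split /\s+/, $line;

# The special ' ' pattern does the same and also skips leading whitespace.
my @awk = split ' ', "  $line";

print scalar(@single), " fields: ", join('|', @single), "\n";
print scalar(@run),    " fields: ", join('|', @run),    "\n";
print scalar(@awk),    " fields: ", join('|', @awk),    "\n";
```

If the free-text message should stay in one piece, a limit can be passed as the third argument, for example `split ' ', $line, 6`.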
Re: contextual substitution with s///? by choroba (Cardinal) on Feb 01, 2013 at 16:13 UTC
Capture what should remain, not what should change:

$s = "test string x  x\nFeb  1 09:12:33  (";
$s =~ s/^(\w{3})  /$1 /m;

Also note that if you are matching a multiline string, the `/m` modifier is needed for `^` to match after a newline.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
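The same "keep the context, change only the middle" idea can also be written with zero-width context instead of capture groups. The following is a sketch, not the poster's code: it assumes Perl 5.10 or later for `\K`, and the sub name `squeeze_day` is made up for illustration.

```perl
use strict;
use warnings;

# Remove one of the two spaces between the month and a single-digit day.
# \K (Perl 5.10+) keeps everything matched so far out of the replaced text;
# (?=\d) is a lookahead, so the digit is required but not consumed.
sub squeeze_day {
    my ($text) = @_;
    $text =~ s/^\w{3}\s\K\s(?=\d)//mg;
    return $text;
}

my $in  = "test string x  x\nFeb  1 09:12:33  (";
my $out = squeeze_day($in);
print "<$out>\n";
```

Only the second space falls between `\K` and the lookahead, so it is the only text the substitution replaces; lines with a two-digit day are left untouched.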
Re: contextual substitution with s///? by toolic (Bishop) on Feb 01, 2013 at 16:18 UTC
use warnings;
use strict;

$a="test string x  x\nFeb  1 09:12:33  (";
if ($a =~ /^\w{3}\s\s\d/m)  { # /^Feb  1/
  $a =~ s/^(\w{3})\s\s(\d)/$1 $2/m;
  print "<$a>\n";
}else{
  print "no match\n";
}

__END__

<test string x  x
Feb 1 09:12:33  (>
Re: contextual substitution with s///? by mbethke (Hermit) on Feb 01, 2013 at 16:33 UTC
What choroba said. Your problem is the missing /m on the substitution, so it only tries the first line in the string, where it doesn't match.

If you want to split the line into space-separated fields later (regardless of their meaning), couldn't you just use `split /\s+/`? When I have a task like this, I usually write a regex that captures a bunch of fields and ignores others, like this:

my $re = qr{
    ^
    (\S+ \s+ \S+ \s+ \S+) \s+       # time_stamp: Sep 18 00:00:58
    (\S+) \s+                       # host: mailgate04
    [[:alpha:]]+/([[:alpha:]]+)     # process: postfix/smtp
    \S+ \s+                         # PID: [29259]:
    (.*)                            # rest
}ox;
while(<$log>) {
    # four capture groups: timestamp, host, process name after the slash, rest
    my ($time, $host, $proc, $rest) = /$re/o;
    ...
}

In case you're interested in the timestamp field and few other things, you could also think about a simple `substr()`. The spaces are there just to make it easy to work with fixed field widths.
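A sketch of the fixed-width approach mentioned above. It assumes the traditional syslog timestamp layout of exactly 15 characters ("Mmm dd hh:mm:ss", with the day padded by a space); the sample line and the sub name `split_timestamp` are hypothetical.

```perl
use strict;
use warnings;

# Assumes the traditional "Mmm dd hh:mm:ss" timestamp, always 15 characters,
# followed by one space and then the rest of the record.
sub split_timestamp {
    my ($line) = @_;
    my $stamp = substr $line, 0, 15;   # e.g. "Feb  1 09:12:33"
    my $rest  = substr $line, 16;      # everything after the separating space
    # Fixed columns inside the timestamp: month, day, time.
    my ($mon, $day, $time) = unpack 'A3 x A2 x A8', $stamp;
    $day =~ s/^\s+//;                  # drop the padding space of days 1..9
    return ($mon, $day, $time, $rest);
}

my ($mon, $day, $time, $rest) =
    split_timestamp('Feb  1 09:12:33 myhost sshd[123]: session opened');
print "$mon|$day|$time|$rest\n";
```

Because the columns never move, no regex is needed and the leading space of a single-digit day never reaches `split`. Logs that use a different timestamp format would need different offsets.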

