PerlMonks  

contextual substitution with s///?

by Wiggins (Friar)
on Feb 01, 2013 at 16:06 UTC (#1016574=perlquestion)
Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

I think there should be a way to do this simply, but I can't find it in the docs.
Syslog records contain literal dates that pad the day with a leading space rather than a leading zero. So Feb 1st is 'Feb..1' (2 spaces).
I want to remove one of those spaces (so I can 'split()' the line on spaces consistently). Simple:
$a="test string x x\nFeb  1 09:12:33 (";
if ($a =~ /^\w{3}\s\s\d/m) {   # /^Feb  1/
    $a =~ s/^\w{3}(\s\s)\d/ /;
    print "<$a>\n";
}else{
    print "no match\n";
}
#<x x Feb 1 09:12:33 (>

But it seems the substitution isn't as smart as a regex match. The program runs, but doesn't modify the string. It obviously recognizes the parens for grouping purposes, but only as insertions into the substituting text. But it can do:
my $var = 'testing';
$_ = 'In this string we are $var the "e" modifier.';
s/(\$\w+)/$1/ee;
print;
>In this string we are testing the "e" modifier.
It seems intuitive that a grouped substring would be the target of a substitution, if it exists for no other purpose. Or how about:
$a=~ s/$a =~ s/^\w{3}(\s\s)\d/$1/ /; #3 part substitute like sed, $ +1 with a space?
So, is there a simple way to do substitution within a context?

It is always better to have seen your target for yourself, rather than depend upon someone else's description.

Re: contextual substitution with s///?
by choroba (Abbot) on Feb 01, 2013 at 16:13 UTC
    Capture what should remain, not what should change:
    $s = "test string x x\nFeb  1 09:12:33 (";
    $s =~ s/^(\w{3})  /$1 /m;   # two spaces in the pattern, one in the replacement
    Also note that if you are matching a multiline string, the /m modifier is needed for ^ to match after a newline.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
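
For the "contextual" flavour the question asks about, here is a minimal sketch of two ways to drop one of the two spaces: capture the context and put it back, or use \K (Perl 5.10 and later) plus a lookahead so that only the extra space is part of the text being replaced. The sub names and the sample line are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: second line has a space-padded day of month.
my $line = "test string x x\nFeb  1 09:12:33 (";

# Variant 1: capture the context and re-insert it in the replacement.
sub squeeze_with_captures {
    my ($text) = @_;
    $text =~ s/^(\w{3})\s\s(\d)/$1 $2/m;
    return $text;
}

# Variant 2: \K drops everything matched so far from the replaced text,
# and (?=\d) checks for the digit without consuming it, so only the
# second space is removed.
sub squeeze_with_lookaround {
    my ($text) = @_;
    $text =~ s/^\w{3}\s\K\s(?=\d)//m;
    return $text;
}

print "<", squeeze_with_captures($line), ">\n";
print "<", squeeze_with_lookaround($line), ">\n";
```

Both variants leave two-digit days such as "Feb 11" untouched, because the pattern requires two whitespace characters before the digit.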
Re: contextual substitution with s///?
by toolic (Chancellor) on Feb 01, 2013 at 16:18 UTC
    use warnings;
    use strict;

    $a="test string x x\nFeb  1 09:12:33 (";
    if ($a =~ /^\w{3}\s\s\d/m) {   # /^Feb  1/
        $a =~ s/^(\w{3})\s\s(\d)/$1 $2/m;
        print "<$a>\n";
    }else{
        print "no match\n";
    }

    __END__
    <test string x x
    Feb 1 09:12:33 (>
Re: contextual substitution with s///?
by BrowserUk (Pope) on Feb 01, 2013 at 16:27 UTC
    I want to remove one of those spaces (so I can 'split()' the line on spaces consistently)

    Seems like you're working hard to do something that can be far more easily achieved.

    The problem you are trying to fix arises when you split on a single \s: every extra space produces an empty field. (Here I've substituted _ for spaces to make things clearer.):

    $s = 'The_quick__brown____fox';;
    print for split /_/, $s;;
    The
    quick

    brown



    fox

    Instead of trying to remove the extra spaces, simply accommodate them:

    print for split /_+/, $s;;
    The
    quick
    brown
    fox

    Problem solved.

    In addition, as splitting on variable amounts of whitespace is such a common thing to do, if you supply split with a single space (' ') in place of the regex argument, it takes care of that (and any leading whitespace) for you:

    print for split / /, ' the quick brown fox ';;

    the
    quick
    brown
    fox
    print for split ' ', ' the quick brown fox ';;
    the
    quick
    brown
    fox
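
Applied to the syslog case from the question, the three forms of split can be compared side by side. This is only a sketch: the host name, process name, and message in the sample line are invented, and the LIMIT argument (the third parameter to split) is an addition that nobody in the thread used; it is there to keep the free-text message in one piece.

```perl
use strict;
use warnings;

# Hypothetical syslog-style line with a space-padded day of month.
my $line = "Feb  1 09:12:33 myhost sshd[123]: session opened";

# One space per separator: the doubled space yields an empty field.
my @single = split / /, $line;

# Runs of whitespace are treated as a single separator.
my @regex = split /\s+/, $line;

# Special-case ' ': like /\s+/ but also skips leading whitespace.
# LIMIT 6 stops splitting once six fields exist, so the message stays whole.
my @special = split ' ', $line, 6;

print scalar(@single), " fields with / /\n";
print scalar(@regex),  " fields with /\\s+/\n";
print join('|', @special), "\n";
```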

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: contextual substitution with s///?
by mbethke (Hermit) on Feb 01, 2013 at 16:33 UTC

    What choroba said. Your problem is the missing /m on the substitution: without it, ^ anchors only at the very start of the string, where the pattern doesn't match.
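
The effect of /m on ^ can be checked directly. A small sketch using the test string from the question (with the two spaces after "Feb"); the variable names are only for illustration:

```perl
use strict;
use warnings;

my $text = "test string x x\nFeb  1 09:12:33 (";

# Without /m, ^ anchors only at the very start of the string,
# so the pattern is tried against "test string..." and fails.
my $without_m = ($text =~ /^\w{3}\s\s\d/)  ? 1 : 0;

# With /m, ^ also matches right after each embedded newline,
# so the pattern finds "Feb  1" on the second line.
my $with_m    = ($text =~ /^\w{3}\s\s\d/m) ? 1 : 0;

print "without /m: $without_m\n";
print "with /m:    $with_m\n";
```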

    If you want to split the line into space-separated fields later (regardless of their meaning), couldn't you just use split /\s+/? When I have a task like this, I usually write a regex that captures a bunch of fields and ignores others, like this:
    my $re = qr{
        ^
        (\S+ \s+ \S+ \s+ \S+) \s+        # time_stamp: Sep 18 00:00:58
        (\S+) \s+                        # host: mailgate04
        [[:alpha:]]+/([[:alpha:]]+)      # process: postfix/smtp:
        \S+ \s+                          # PID: [29259]:
        (.*)                             # rest
    }ox;
    while(<$log>) {
        # four capture groups, so four variables
        my ($time, $host, $proc, $rest) = /$re/o;
        ...
    }

    In case you're interested in the timestamp field and a few other things, you could also think about a simple substr(). The spaces are there just to make it easy to work with fixed field widths.
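
A minimal sketch of the fixed-width idea, assuming the traditional syslog timestamp layout "Mmm dd hh:mm:ss" (15 characters, day padded with a space). The sample line is invented, and the unpack template is one possible way to split the timestamp further, not something taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical syslog line; the timestamp occupies the first 15 characters.
my $line = "Feb  1 09:12:33 myhost sshd[123]: session opened";

my $stamp = substr($line, 0, 15);   # "Feb  1 09:12:33"
my $rest  = substr($line, 16);      # everything after the separating space

# Fixed-width template: 3 chars, skip 1, 2 chars, skip 1, 8 chars.
# 'A' strips trailing spaces only, so the day still carries its padding.
my ($mon, $day, $time) = unpack 'A3 x A2 x A8', $stamp;
$day =~ s/^\s+//;                   # drop the leading padding space

print "$mon/$day/$time\n";
print "$rest\n";
```

Because the field positions never move, no space needs to be removed from the line before pulling the fields out.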

Node Type: perlquestion [id://1016574]
Approved by Corion