PerlMonks  

Matching and replacing the minimum string from the tail of the regex

by abitkin (Monk)
on Aug 08, 2007 at 21:22 UTC

abitkin has asked for the wisdom of the Perl Monks concerning the following question:

Regex question. I have something similar to this:

    use strict;

    my $lines = "";
    while (<DATA>) {
        $lines .= $_;
    }
    $lines =~ s/^s.*?e p$//msg;
    print $lines;

    __DATA__
    Random String
    s
    erartt
    e p
    s
    foo
    e f
    blah blah
    s
    adflkja
    e p
    End of file

I want to get this:

    Random String
    s
    foo
    e f
    blah blah
    End of file

since the expression is non-greedy, but instead I get nothing. I realize this is because the match starts at the first s it encounters and then takes the least it can up to the next e p. I could evaluate each match and deal with it there, but I was wondering if there is a better way to accomplish this.

Update: fixed the data space and added more information to the testcase.

==
Kwyjibo. A big, dumb, balding North American ape. With no chin.
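
To see why the lazy quantifier does not help here, a minimal sketch (the helper name lazy_matches is made up for illustration) that lists the spans the pattern from the question actually matches. A lazy .*? only keeps the match short from a fixed starting point; the regex engine still takes the leftmost start it can, so a match that begins at the s of the failed test runs on to the next e p:

```perl
use strict;
use warnings;

# Sample log from the question: "s" starts a test, "e p" ends a passed
# test, "e f" ends a failed one.
my $log = join "\n",
    'Random String',
    's', 'erartt', 'e p',
    's', 'foo', 'e f',
    'blah blah',
    's', 'adflkja', 'e p',
    'End of file', '';

# Hypothetical helper: return every span the question's pattern matches,
# i.e. every span the substitution would delete.
sub lazy_matches {
    my ($text) = @_;
    my @spans = $text =~ /(^s.*?e p$)/msg;
    return @spans;
}

my @spans = lazy_matches($log);
for my $i (0 .. $#spans) {
    print "--- span ", $i + 1, " ---\n", $spans[$i], "\n";
}
```

The first span is the passed test on its own. The second starts at the s of the failed test and contains the failed test, the text after it, and the following passed test, which is why the failure text disappears.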

Replies are listed 'Best First'.
Re: Matching and replacing the minimum string from the tail of the regex
by Joost (Canon) on Aug 08, 2007 at 21:30 UTC
Re: Matching and replacing the minimum string from the tail of the regex
by GrandFather (Sage) on Aug 08, 2007 at 21:58 UTC

    Don't tell us what you think the code does. Tell us what you want to achieve and why.

    You've told us what output you expect for a given input, but not how the output is related to the input. We can't tell that from your code because your code doesn't do what you want, nor even what you describe! The actual output is:

        s
        foo
        e f
        s
        adflkja

    Update:

    It may be that you want something like:

        use warnings;
        use strict;

        my $lines = "";
        while (<DATA>) {
            $lines .= $_;
        }
        my @wanted = $lines =~ m/^(s(?:(?!e p$).)*e [^p]$)/msg;
        print @wanted;

        __DATA__
        s
        erartt
        e p
        s
        foo
        e f
        s
        adflkja

    Prints:

        s
        foo
        e f

    DWIM is Perl's answer to Gödel
      My apologies, I missed the final line of the data space. What I'm trying to do is eliminate pass messages from a build log while keeping all the failure text. Each test has a start and an end, but some data can be shown after a failure. I will update the code to reflect this. That said, I only want to eliminate items between the start (s) and an end that passed (e p).

      ==
      Kwyjibo. A big, dumb, balding North American ape. With no chin.
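
For the build-log goal described above, a line-by-line filter is one alternative to a single regex. This is only a sketch under assumptions not stated in the thread: the helper name drop_passed_blocks is invented, a test starts with a line that is exactly s, and any line beginning with "e " ends the current test. Lines of a test are buffered until its end marker shows whether it passed:

```perl
use strict;
use warnings;

# Hypothetical helper: drop every block that starts with a line "s" and
# ends with a line "e p"; keep everything else, including failed blocks.
sub drop_passed_blocks {
    my ($text) = @_;
    my @kept;          # lines that will be output
    my @pending;       # lines of the test currently being read
    my $in_block = 0;

    for my $line (split /^/m, $text) {     # split into lines, keeping "\n"
        (my $bare = $line) =~ s/\n\z//;    # compare without the newline
        if ($bare eq 's') {
            push @kept, @pending;          # an unfinished test is kept
            @pending  = ($line);
            $in_block = 1;
        }
        elsif ($in_block) {
            push @pending, $line;
            if ($bare eq 'e p') {          # passed: discard the block
                @pending  = ();
                $in_block = 0;
            }
            elsif ($bare =~ /^e /) {       # assumed: any other end marker
                push @kept, @pending;      # means failure, so keep it
                @pending  = ();
                $in_block = 0;
            }
        }
        else {
            push @kept, $line;             # text outside any test
        }
    }
    push @kept, @pending;                  # flush a test cut off by EOF
    return join '', @kept;
}

print drop_passed_blocks(
    "Random String\ns\nerartt\ne p\ns\nfoo\ne f\nblah blah\n"
  . "s\nadflkja\ne p\nEnd of file\n"
);
```

Because it never holds more than one test in memory, this approach would also work when the log is read a line at a time instead of being slurped.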
Re: Matching and replacing the minimum string from the tail of the regex
by johngg (Canon) on Aug 08, 2007 at 22:05 UTC
    If your data set is small, and I guess it is since you are reading all of your lines into a single string, you could use grep to pick out just the lines that match a regex alternation of what you want, then use another grep with a post-incremented hash so that you only get the one 's' line rather than all three.

        use strict;
        use warnings;

        my %seen = ();
        print
            grep { ! $seen{$_} ++ }
            grep { m{^s|foo|e f$} }
            <DATA>;

        __END__
        s
        erartt
        e p
        s
        foo
        e f
        s
        adflkja

    This produces

        s
        foo
        e f

    as you require. The more usual idiom for reading all lines of a file into a single string (slurping) is

        my $lines = '';
        {
            local $/;
            $lines = <DATA>;
        }

    which changes the default input record separator inside the scope of the code block to undef so that the whole of the file is read into $lines in one fell swoop.

    I hope this is of use.

    Cheers,

    JohnGG

    Update: I should have placed the regex alternation in a non-capturing group. As it is, it matches lines beginning with 's', lines containing 'foo' anywhere and lines ending with 'e f'. Correct pattern is m{^(?:s|foo|e f)$}.
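
The difference between the two patterns in the update above can be checked with a short sketch. The variable names and the sample strings are made up for illustration; the two patterns are the ones from the post:

```perl
use strict;
use warnings;

my $loose    = qr{^s|foo|e f$};       # "^s", or "foo" anywhere, or "e f$"
my $anchored = qr{^(?:s|foo|e f)$};   # both anchors apply to every alternative

for my $line ('s', 'seafood', 'e f', 'the f') {
    printf "%-8s loose=%d anchored=%d\n", $line,
        ($line =~ $loose    ? 1 : 0),
        ($line =~ $anchored ? 1 : 0);
}
```

Here 'seafood' and 'the f' match only the loose pattern, while 's' and 'e f' match both, because alternation binds more loosely than the anchors unless the alternatives are grouped.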

Re: Matching and replacing the minimum string from the tail of the regex
by Anonymous Monk on Aug 09, 2007 at 01:08 UTC
    if you know the number of lines between the starting and ending lines of the block of lines you want to elide, something like this might do the trick:

        my $start_line       = qr{ ^s [^\n]* \n }xsm;   # starts with an 's'
        my $intervening_line = qr{ [^\n]* \n }xsm;      # anything
        my $end_line         = qr{ e [ ] p \n }xsm;     # ends with an 'e p'
        my $between = 1;

        my $line = do { local $/; <DATA> };  # slurp all the data

        $line =~ s{ $start_line ${intervening_line}{$between} $end_line }
                  {}gxsm;

    this outputs:

        Random String
        s
        foo
        e f
        blah blah
        End of file

    any closer?

      alternatively, if it is known that the intervening line(s) will never begin with some pattern:

          my $starter = qr{ s }xsm;        # starts with this string
          my $never   = qr{ s }xsm;        # never starts with this string
          my $ender   = qr{ e [ ] p }xsm;  # ends with this string

          my $start_line       = qr{ ^ $starter [^\n]* \n }xsm;
          my $intervening_line = qr{ ^ (?! $never ) [^\n]* \n }xsm;
          my $end_line         = qr{ $ender \n }xsm;

          my $line = do { local $/; <DATA> };  # slurp all the data

          $line =~ s{ $start_line $intervening_line* $end_line } {}gxsm;

          print $line;

          __DATA__
          Random String
          s
          erartt
          e p
          s
          foo
          e f
          blah blah
          s
          adflkja
          wibble wobble
          e p
          End of file

      output:

          Random String
          s
          foo
          e f
          blah blah
          End of file
        For some reason, I had trouble with your code. Instead I turned the problem on its head.

            use strict;

            my $lines = do { local $/; <DATA> };

            # reverse the order of the lines so that the RE matches the
            # last part of the region first
            my $reversetext = join("\n", reverse(split("\n", $lines)));
            $reversetext =~ s/^[^\n]*e p.*?s[^\n]*\n//msg;

            # put the lines in normal order again
            $lines = join("\n", reverse(split("\n", $reversetext)));
            print $lines;

            __DATA__
            Random String
            s
            erartt
            e p
            s
            foo
            e f
            blah blah
            s
            adflkja
            wibble wobble
            e p
            End of file

        ==
        Kwyjibo. A big, dumb, balding North American ape. With no chin.
Re: Matching and replacing the minimum string from the tail of the regex
by hv (Parson) on Aug 11, 2007 at 21:34 UTC

    I'm not sure if there is a better way, but to me the obvious approach is to accept 'start' followed by '(not start)*' followed by 'end':

        my $lines = do { local $/; <DATA> };
        $lines =~ s{
            ^ s \n                      # start line
            (?: ^ (?! s \n ) .* \n )*   # body excluding new start line
            ^ e\ p $                    # end line
        }{}xmg;
        print $lines;

    Note that this does more work than the original failing substitution, so you can expect it to be slower.

    I'm assuming that the start of a test is "an 's' followed by a newline", and on that assumption being a bit stricter than your original example about matching that.

    Hope this helps,

    Hugo
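
A sketch of the same "start, (not start)*, end" idea wrapped in a subroutine, for anyone who wants to test it against the sample data. It is a variant, not the code above: the name strip_passed is invented, the body is matched lazily so that it stops at the first e p line, and the newline after the end line is consumed as well so that no blank lines are left behind:

```perl
use strict;
use warnings;

# Hypothetical wrapper: remove every test that starts with a line "s"
# and reaches a line "e p" before another "s" line begins.
sub strip_passed {
    my ($text) = @_;
    $text =~ s{
        ^ s \n                        # start line
        (?: (?! s \n ) [^\n]* \n )*?  # body lines, none starting a new test
        e [ ] p (?: \n | \z )         # end line of a passed test, plus newline
    }{}xmg;
    return $text;
}

print strip_passed(
    "Random String\ns\nerartt\ne p\ns\nfoo\ne f\nblah blah\n"
  . "s\nadflkja\ne p\nEnd of file\n"
);
```

A failed test is left alone because the body cannot step over the s line of the next test, so the pattern never reaches an e p from that starting point.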

Node Type: perlquestion [id://631414]
Approved by Joost