PerlMonks  

A regex that does this, but not that?

by bradcathey (Prior)
on Nov 14, 2003 at 23:25 UTC (#307246, perlquestion)

bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Every monk loves a good regex question, right, especially the golfers. Here goes:
my $var = "thought test tot 1 2 3 tesset";
I want to end up with the result:
test 1 2 3
I tried:
$var =~ s/((t.*?t)([\w ]*test[\w ]*)(t.*?t))/$3/g;
but ended up with (not surprisingly):
test tot 1 2 3
Is there the possibility of (in plain English):
$var =~ s/(t.*?t) but not (test)//g;
Would love to keep it down to one line, which started this whole thing. Thanks, monks.

Update:
I "joined" the monastery to learn to be a better Perl programmer (and also to write better-defined questions :-). This node has been a great help: regexes are almost a language of their own. pg used conditionals; danger and Cody_Pendant showed practical examples of word boundaries. Sometimes you just need to be nudged over the learning hump. Now I have some code to study. Thanks.

—Brad
"A little yeast leavens the whole dough."

Replies are listed 'Best First'.
Re: A regex that does this, but not that?
by sauoq (Abbot) on Nov 14, 2003 at 23:48 UTC

    I'm not sure I entirely understand your requirements, but /t(?!est\b)\w*t/ would match any 't' not followed by 'est' and a word break then match 0 or more word characters and then match one last 't'. For example (anchors added):

    #!/usr/bin/perl -w
    use strict;

    /^t(?!est\b)\w*t$/ and print while <DATA>;

    __DATA__
    test
    testset
    tot
    tesset
    tt

    prints everything but 'test'.

    Adapting it slightly for your original problem like this

    #!/usr/bin/perl -lw
    use strict;

    $_ = "thought test tot 1 2 3 tesset";
    s/t(?!est\b)\w*t\s*//g;
    print;

    prints "test 1 2 3" just as you want.
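One caveat worth checking, offered as a minimal sketch rather than a correction: the pattern above has no leading \b, so it can also start matching in the middle of a word. The sub names strip_unanchored and strip_anchored, the added \b, and the sample word "attest" are all assumptions made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the pattern exactly as posted above.
sub strip_unanchored {
    my ($s) = @_;
    $s =~ s/t(?!est\b)\w*t\s*//g;
    return $s;
}

# Hypothetical variant: the same pattern with a leading \b added
# (an assumption, not part of the original post).
sub strip_anchored {
    my ($s) = @_;
    $s =~ s/\bt(?!est\b)\w*t\s*//g;
    return $s;
}

# The first input is the original sample; the second is a made-up case
# where a 't' appears in the middle of a word.
for my $input ("thought test tot 1 2 3 tesset", "attest test tot") {
    printf "%-32s unanchored=[%s] anchored=[%s]\n",
        "[$input]", strip_unanchored($input), strip_anchored($input);
}
```

On the original sample both versions give the same result. On "attest test tot" the unanchored pattern eats "ttest " out of "attest", while the anchored one leaves that word alone.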

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: A regex that does this, but not that?
by pg (Canon) on Nov 14, 2003 at 23:49 UTC
    my $var = "thought test tot 1 2 3 tesset";
    $var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
    print $var;

      Thanks pg, that did exactly what I wanted. I have used conditionals in regexes before, but couldn't see the application here.

      Thanks to all the monks who replied.

      —Brad
      "A little yeast leavens the whole dough."
Re: A regex that does this, but not that?
by Abigail-II (Bishop) on Nov 15, 2003 at 00:11 UTC
    It's not clear what you want. Do you want to remove all words that aren't "test" or numbers? Do you want to remove the words "thought", "tot" and "tesset"? Do you want to remove all words, except the 2nd, 4th, 5th and 6th? Do you want to remove all words that start and end with a "t", but don't have "es" (and nothing else) between them?

    Being able to properly formulate what you want a regex to do solves 90% of the problem. Stating your problem by simple example just leaves people guessing.

    Abigail

      Abigail-II, I spent quite a bit of time trying to craft my example carefully, so that if there was a regex solution to return the result I specified, I'd have my answer. pg got it perfectly. But just in case you're still interested—I know you're one of the regex gurus around the monastery, and I have always appreciated your thoroughness:

      1. I want to delete any words that start with "t", end with "t", but do not contain any other "t"s within, except for the word "test".
      2. The result should only be the words: "test" and any other non t*t words. "1 2 3" was just an example.
      3. The order of words, the number of words, or the content of any other words not "t\w+t", should not be a factor.

      I'd still love to hear your thoughts as I am trying to really ramp up my coding skills. Thanks.

      —Brad
      "A little yeast leavens the whole dough."

        Well, pg's solution works for the limited input provided and you haven't given any further particulars regarding input. That solution breaks if you just change the first word from "thought" to "though":

        my $var = "though test tot 1 2 3 tesset";
        $var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
        print $var;   # prints: esoesset

        But, now you mention a further constraint that the words to be deleted may not contain any 't's inside, which is not inferrable from your earlier posts at all. Providing a good specification is much more than providing a sample case (but providing test cases *is* important).

        Anyway, here's a go at your new specs:

        my $var = <<TT;
        target blah foo test thought 123 though tempest
        testament though tightest treatment thermostat tantamount taboo
        TT

        $var =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g;
        print $var;

        __END__
        ## Result:
        blah foo test 123 though
        testament though tightest treatment thermostat tantamount taboo

        So, all the 't.*t' words on the second line remain because they contain a 't' character within. All the 't.*t' words on the first line get deleted except for 'test'.
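A quick way to check a pattern like this against the stated requirements is to run it word by word. The following is only a sketch: the sub name delete_t_words and the word list are made up for illustration, and the expected outcomes reflect one reading of the spec (start with "t", end with "t", no other "t" inside, "test" exempt).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the reply above.
sub delete_t_words {
    my ($text) = @_;
    $text =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g;
    return $text;
}

# Made-up sample words: 1 means "should be deleted", 0 means "should stay".
my %expect_deleted = (
    'tot'       => 1,
    'tent'      => 1,
    'target'    => 1,
    'test'      => 0,
    'testament' => 0,
    'tightest'  => 0,
    'taboo'     => 0,
    '123'       => 0,
);

for my $word (sort keys %expect_deleted) {
    my $deleted = delete_t_words($word) eq '' ? 1 : 0;
    printf "%-10s %s\n", $word, $deleted ? 'deleted' : 'kept';
    warn "unexpected result for $word\n"
        if $deleted != $expect_deleted{$word};
}
```

Keeping the expectations in a hash makes it easy to add the next awkward word someone thinks of.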

        any words that start with "t", end with "t", but do not contain any other "t"s within

        OK so that's \bt[^t]+t\b -- word-boundary, then a t, then one or more other characters not a t, then a t, then a word boundary.

        Apart from the abbreviation "tt" this should be fine.

        So "tent", "tesseract", "tot", "tort" and "test" itself will match this pattern.

        However, "testament" will fail it because of the "t" in the middle.

        Then you need a special case for "test" itself, which you can do with the /e modifier and the ternary operator, as in pg's example above.

        So something like this:

        #!/usr/bin/perl -w
        use strict;

        my $words = 'test Buffy testament Anya tot Willow tesseract Faith tent';
        $words =~ s/\b(t[^t]+t)\b/$1 eq "test" ? $1 : ''/ge;
        print $words;
        # prints 'test Buffy testament Anya Willow Faith';

        Where the regex means "Find words matching t, something-not-t, then t at the end. Replace them with nothing, unless they're the word test, in which case, replace them with themselves".

        You could replace the ternary thing with this more longwinded version if you liked:

        $words =~ s/\b(t[^t]+t)\b/
            my $temp = $1;
            if ($temp eq 'test') { $temp } else { '' }
        /xge;
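One thing to watch with the [^t] class, shown here as a sketch under assumptions: [^t] also matches spaces, so the pattern can run from the "t" at the start of one word to the "t" at the end of a later one. The sub names, the [^t\W] alternative (borrowed from other replies in this thread) and the sample string are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers; only the character class differs between them.
sub strip_any_non_t {            # class as in the post above: [^t]
    my ($s) = @_;
    $s =~ s/\b(t[^t]+t)\b/$1 eq 'test' ? $1 : ''/ge;
    return $s;
}

sub strip_non_t_word_chars {     # assumed alternative: [^t\W]
    my ($s) = @_;
    $s =~ s/\b(t[^t\W]+t)\b/$1 eq 'test' ? $1 : ''/ge;
    return $s;
}

my $input = 'to eat a tent';     # made-up sample
print '[', strip_any_non_t($input), "]\n";          # prints [ a ]
print '[', strip_non_t_word_chars($input), "]\n";   # prints [to eat a ]
```

With [^t], "to eat" is treated as a single t...t match and removed. With [^t\W] only "tent" goes.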


        ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
        I spent quite a bit of time trying to craft my example carefully, so that if there was a regex solution to return the result I specified, I'd have my answer.

        But the problem is that you left it at the example. I could have given you a couple of regexes that solved your example, but would probably have failed to do what you wanted on the second example you tried.

        pg got it perfectly.
        Then you and he got lucky. If he had come up with a different regexp that solved your one example, but that would do something else on other sentences, he would have wasted time formulating a useless answer. However, is it really true that pg's answer got it right? Your requirements say:
        I want to delete any words that start with "t", end with "t", but do not contain any other "t"s within, except for the word "test".
        and pg's regex is:
        s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
        Now, to me that regex just deletes strings starting with a t, and ending with the next t, with the exception of the word "test". So, let's try it on another example:
        $_ = "this is the wristwatch";
        s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
        print;
        __END__
        he wrisch
        Now, that might be exactly what you had in mind, but it doesn't suit the requirements.

        Abigail

        s/\s*\bt(?!est)[^t\W]*t\b//g;
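To see what the leading \s* buys in the one-liner above, here is a small sketch. The sub name strip_with_leading_space is hypothetical, and the sample sentence is borrowed from an earlier reply in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above.
sub strip_with_leading_space {
    my ($s) = @_;
    $s =~ s/\s*\bt(?!est)[^t\W]*t\b//g;
    return $s;
}

my $words = 'test Buffy testament Anya tot Willow tesseract Faith tent';
print '[', strip_with_leading_space($words), "]\n";
# prints [test Buffy testament Anya Willow Faith]
```

Because the whitespace before each deleted word goes with it, no double or trailing spaces are left behind. A deleted word at the very start of the string has nothing before it, so the space after it survives: "tot test" becomes " test".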

        Makeshifts last the longest.

Re: A regex that does this, but not that?
by BrowserUk (Pope) on Nov 14, 2003 at 23:49 UTC
    $var =~ s[\bt(?!est).*?t\b\s*][]g;

    Update: sauoq's right. I omitted a \b.

    perl> $var = "thought testament test tot 1 2 3 tesset";
    perl> $var =~ s[\bt(?!est\b).*?t\b\s*][]g; print $var;
    test 1 2 3
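A hedged side note on the .*? in this pattern: the dot matches spaces too, so a match can run from one word into a later one. The sketch below compares it with a \w*? variant. The sub names, the \w*? variant and the sample "ten cats sat down" are assumptions for illustration, not part of the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the pattern as posted above.
sub strip_dot {
    my ($s) = @_;
    $s =~ s[\bt(?!est\b).*?t\b\s*][]g;
    return $s;
}

# Hypothetical variant: \w*? instead of .*? so a match stays inside one word.
sub strip_word {
    my ($s) = @_;
    $s =~ s[\bt(?!est\b)\w*?t\b\s*][]g;
    return $s;
}

my $input = 'ten cats sat down';   # made-up sample
print '[', strip_dot($input),  "]\n";   # prints [down]
print '[', strip_word($input), "]\n";   # prints [ten cats sat down]
```

On the sample from the post both versions give the same "test 1 2 3" result. They only differ when a word starting with "t" is followed later by a different word ending in "t".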

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      But, if $var contained "testament", for example, that would fail.

      -sauoq
      "My two cents aren't worth a dime.";
      

        Yep! Needs an extra \b.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
