Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked

Re^2: RegEx + vs. {1,}

by Athanasius (Chancellor)
on Oct 10, 2012 at 16:03 UTC ( #998250=note: print w/replies, xml ) Need Help??

in reply to Re: RegEx + vs. {1,}
in thread RegEx + vs. {1,}

Thank-you for this!

I would have forced backtracking by setting pos:

my %seen; for my $len (2 .. length $string) { ++$seen{$1}, pos($string) -= length($1) - 1 while $string =~ /(\w{ +$len})/g; }

but this is clumsy and verbose beside your elegant one-liner!

Took me a while to figure out what (?!) was doing. YAPE::Regex::Explain told me:

---------------------------------------------------------------------- (?{$seen{$1}++}) run this block of Perl code ---------------------------------------------------------------------- (?!) fail ----------------------------------------------------------------------

and in Extended Patterns I found:

(*FAIL) (*F) This pattern matches nothing and always fails. It can be used to force + the engine to backtrack. It is equivalent to (?!), but easier to rea +d. In fact, (?!) gets optimised into (*FAIL) internally.

So I’ve learned something that’s really useful.

Now a couple of minor quibbles:

  1. The non-greedy quantifier in (\w{2,}?) is redundant, since the forced backtracking ensures that all substrings will be found whether the search is greedy or not.
  2. If, as per your sort, you are looking for only the shortest substring with the maximum frequency, then you need only search for substrings exactly 2 characters long (the minimum length): (\w{2}). (Proof: Say XYZ occurs N times. XY and YZ occur once in each occurrence of XYZ, so each is guaranteed to occur at least N times in the string. Similar reasoning applies to longer substrings.)

++trizen for demonstrating this elegant technique to force regex backtracking!

Update: Because (?!) forces backtracking via repeatedly-failing matches, neither the while loop nor the /g modifier are necessary:

$string =~ /(\w{2,})(?{$seen{$1}++})(?!)/;

does the identical job! (as is easily confirmed by dumping the contents of %seen).

Athanasius <°(((><contra mundum

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://998250]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2018-08-16 21:16 GMT
Find Nodes?
    Voting Booth?
    Asked to put a square peg in a round hole, I would:

    Results (172 votes). Check out past polls.