PerlMonks  

Re: Five orphaned obfuscations

by ihb (Deacon)
on Jun 19, 2004 at 22:13 UTC ( [id://368214]=note )


in reply to Five orphaned obfuscations

The usage for the emb function is
   emb($str, $rm, $result) and print $result;
The function removes the chars in $rm from $str if all chars are present and in the same order as specified in $rm. On success, true is returned and the result, i.e. the filtered $str, is assigned to $result.

Example:
   my $str = 'foXobYar';
   my $rm = 'XY';
   my $result;
   emb1($str, $rm, $result) and print $result;
   __END__
   foobar
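
For comparison, here is a plain, non-obfuscated sketch of the behaviour described above. It is not one of the original subs; the name emb_plain is made up, and it assumes that the leftmost possible occurrence of each char in $rm is the one to remove.

```perl
use strict;
use warnings;

# Hypothetical reference version (not the original code).
# Removes the chars of $rm from $str, in order, and stores the
# filtered string in the caller's third argument.
sub emb_plain {
    my ($str, $rm) = @_;
    my $out = '';
    my $pos = 0;
    for my $c (split //, $rm) {
        my $at = index($str, $c, $pos);
        return 0 if $at < 0;                     # char missing or out of order
        $out .= substr($str, $pos, $at - $pos);  # keep everything before it
        $pos = $at + 1;                          # skip the removed char
    }
    $_[2] = $out . substr($str, $pos);           # @_ aliases the caller's variable
    return 1;
}

my $plain_result;
emb_plain('foXobYar', 'XY', $plain_result) and print "$plain_result\n";
```

Assigning through $_[2] works because the elements of @_ are aliases to the caller's arguments, which is the same mechanism the usage line relies on.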

Update: &emb2 has a bug. I noticed this when I dissected it. The subpattern (?{ $$i .= $' }) should be removed and
   and $$i .= $', 1
should be put at the end of the subroutine.

DISSECTION

Perhaps it's not a real dissection as it's not dissected in great detail, but all algorithms and key functions (as I see it) are explained. To explain every detail would be too great an effort.

&emb1 is a clever one, though not algorithmically, as it brute-forces the result. The key to understanding it is that $^R holds the value of the last successful (?{}) assertion and that $^R is restored upon backtracking. $^R->[0] is the skipped chars, and $^R->[1] is the filtered string. It selects one char at a time; that's what (?{ [ $^R->[0], $^R->[1] . $2 ] }) does. As the pattern backtracks, more and more chars get selected, and finally only the ones you wanted to remove are left. So it uses backtracking to look for the right result, storing each attempt in $^R, and at the end it checks whether the skipped chars match the second argument through a (?()|) conditional, which is the regex version of ?:. If it wasn't right, it again forces backtracking through (?!), which is a negative look-ahead assertion that always fails because the subpattern in it (nothing) always matches. At the end it simply assigns the filtered string to $_[2].
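
The following is a readable sketch of that technique, reconstructed from the description above rather than taken from &emb1 itself. The name emb_sketch and the exact shape of the pattern are my assumptions, and it assumes perl 5.18 or later so that lexicals inside (?{}) blocks behave properly.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the idea, not the original &emb1.
sub emb_sketch {
    my ($str, $rm) = @_;
    my $found;
    $str =~ /
        \A
        (?{ [ '', '' ] })                             # $^R = [ skipped, kept ]
        (?:
            (.) (?{ [ $^R->[0] . $1, $^R->[1] ] })    # try skipping this char
          | (.) (?{ [ $^R->[0], $^R->[1] . $2 ] })    # or keep (select) it
        )*
        \z
        (?(?{ $^R->[0] eq $rm }) | (?!) )             # wrong skipped chars: backtrack
        (?{ $found = $^R->[1]; $^R })                 # remember the filtered string
    /xs or return 0;
    $_[2] = $found;
    return 1;
}

my $sketch_result;
emb_sketch('foXobYar', 'XY', $sketch_result) and print "$sketch_result\n";
```

Every failed attempt rolls $^R back to the array reference it held before the abandoned branch, so no manual bookkeeping is needed. The cost is exponential in the length of $str, which is what brute force means here.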

&emb2 uses a recursive pattern through the interpolating assertion (??{}). The key here is [^\Q$^R\E]*\Q$^R. $^R is the next char to be removed. The pattern says to match all non-$^R chars and then a $^R. The match minus the last char (i.e. the one not wanted) is then appended to $$i, which refers to the third argument ($_[2] can't be used as the dealiasing @_ = @_ has been done before to enable the destructive substr() on $_[1]), and then the pattern is called again, until there are no chars left to remove. Then, at last, the rest of the string is appended to $$i (with my patch, see above).

&emb3 uses basically the same algorithm as &emb2, except it's iterative rather than recursive.
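
A minimal iterative sketch of that key pattern, with a plain loop variable $c standing in for $^R. This is my own illustration (emb_iter is a made-up name), not the code of &emb2 or &emb3.

```perl
use strict;
use warnings;

# Hypothetical illustration of the [^c]*c step, applied once per char.
sub emb_iter {
    my ($str, $rm) = @_;
    my $out = '';
    for my $c (split //, $rm) {
        # Strip "everything that is not $c, then $c" from the front,
        # keeping the part before $c in $1.
        $str =~ s/\A([^\Q$c\E]*)\Q$c\E// or return 0;
        $out .= $1;
    }
    $_[2] = $out . $str;    # append whatever follows the last removed char
    return 1;
}

my $iter_result;
emb_iter('foXobYar', 'XY', $iter_result) and print "$iter_result\n";
```

The final concatenation plays the role of the patch mentioned in the update: the rest of the string is appended only after all chars have been removed.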

&emb4 pregenerates the pattern that's being applied recursively or iteratively in &emb2 and &emb3, respectively. The pattern is designed so that the filtered string is built as the match proceeds, which relies on the fact that @- and @+ are built and available as the match proceeds. @- and @+ are offset arrays for the submatches. $+ could just as well have been used.

&emb5 is like &emb4 except it builds the filtered string after the match is all done utilizing the same technique as &emb4.
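
Here is a sketch of the pregenerated-pattern idea, building the result after the match from @- and @+ as described for &emb5. The sub name emb_pregen and the way the pattern is put together are my assumptions, not the original code.

```perl
use strict;
use warnings;

# Hypothetical illustration: one pattern for the whole job.
sub emb_pregen {
    my ($str, $rm) = @_;
    # One capture group per char to remove: "(non-c chars) c".
    my $pat = join '', map { "([^\Q$_\E]*)\Q$_\E" } split //, $rm;
    $str =~ /\A$pat(.*)\z/s or return 0;    # final group takes the tail
    my $out = '';
    for my $n (1 .. $#+) {
        # $-[$n] and $+[$n] are the start and end offsets of group $n.
        $out .= substr($str, $-[$n], $+[$n] - $-[$n]);
    }
    $_[2] = $out;
    return 1;
}

my $pregen_result;
emb_pregen('foXobYar', 'XY', $pregen_result) and print "$pregen_result\n";
```

For $rm = 'XY' the generated pattern is ([^X]*)X([^Y]*)Y followed by the tail group, so the captured groups are exactly the pieces of the filtered string.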

ihb
