PerlMonks  

OK, I think far too little attention has been devoted to the sexeger(tm). (Coined just a while back, the term was trademarked in 2001 by a numbered company out of New York, as applied to computer software, though I'm not sure who would try to market a product with a name like that! ;) A patent is also pending in the United States. What's confusing is that the application talks about "artistic license" and has a strange, demonic-looking camel stamp in the applicant space. Odd, really.)

As explained on an unnamed saint's website, and as previously brought to the attention of the monks, sexeger are primarily useful as a way to speed up regular expression matches: reverse the input string, match it against a reversed regex, and reverse the result back. It is demonstrated there that a speed increase of many orders of magnitude is possible through the proper application of sexeger.
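A minimal sketch of the basic idea, assuming the task is to find the last run of digits in a string. The sub name last_number and the sample string are made up for illustration and do not come from the site mentioned above:

```perl
use strict;
use warnings;

# Hypothetical helper: find the LAST run of digits in a string by
# reversing the string, taking the FIRST match, and reversing it back.
sub last_number {
    my ($str) = @_;
    my $rev = reverse $str;          # scalar context: reverses the characters
    return unless $rev =~ /(\d+)/;   # first match in reverse = last match forward
    return scalar reverse $1;        # undo the reversal on the capture
}

print last_number("node 3333, reply 42 of 120"), "\n";   # prints 120
```

The regex engine finds the leftmost match cheaply, so turning a "last occurrence" problem into a "first occurrence" one is where the speedup is supposed to come from.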

However, others may simply find it useful in cases where obfuscation is desired, or where they wish to make their code even less maintainable: a sort of Perl programming poison pill strategy. It's easy to demonstrate your mastery by listing your code backward, but not _exactly_ backward. See, in sexeger speak, regular expressions preserve their grouping and other logical elements, while the strings in each element are reversed and, where appropriate, the elements themselves are ordered in reverse. One inappropriate reversal is the character class: because the order of its members does not matter, there is no need to reverse a class. A better obfuscation strategy is to take recurring classes and randomly permute their elements, so as to make them harder to decipher at first glance, and/or to break them out into sub-expressions.
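To make the reversal rules concrete, here is a small sketch with a made-up pattern (not one from this post). The group and the character class stay where the logic needs them, the literals are reversed, and the class [xyz] is left alone:

```perl
use strict;
use warnings;

# Illustrative patterns only: a forward regex and its hand-made sexeger.
my $forward  = qr/^ab(?:cd|ef)[xyz]+g$/;
my $backward = qr/^g[xyz]+(?:dc|fe)ba$/;

# True when the forward regex on the string and the reversed regex on
# the reversed string give the same verdict.
sub agree {
    my ($str) = @_;
    my $rev     = reverse $str;
    my $fwd_hit = $str =~ $forward  ? 1 : 0;
    my $bwd_hit = $rev =~ $backward ? 1 : 0;
    return $fwd_hit == $bwd_hit;
}

for my $s (qw(abcdxyg abefzg abdcxg abcdg)) {
    printf "%-8s %s\n", $s, agree($s) ? "agree" : "DISAGREE";
}
```

Checking a handful of matching and non-matching strings like this is a cheap way to catch a slip in a hand reversal.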

Now, as explained, there are many cases where sexeger are apropos and helpful. However, it has also come to my attention that there are cases where employing them benefits the cause of obfuscation more than it benefits performance:

    #4p1s0: # Already optimal matching forward
    (?: a (?: ble | n(?:ce|t) | te | l ) | e (?: ment | n(?:ce|t) | r ) | i (?: ble | sm | ti | ve | ze | c ) | ment | ous? )

    #4p0s1r: # No performance gains matching in reverse
    (?: (?: elb | (?:ec|t)n | et | l ) a | (?: tnem | (?:ec|t)n | r ) e | (?: elb | ms | it | ev | ez | c ) i | tnem | s?uo )

    #4p1s0r: # Likewise this is not an optimal match, although:
    (?: ci | e (?: cn[ae] | lb[ai] | ta | vi | zi ) | iti | la | msi | re | suo | tn (?: eme? | [ae] ) | uo )

    #4p0s1: # It's better than this:
    (?:ic|(?:[ae]nc|[ai]bl|at|iv|iz)e|iti|al|ism|er|ous|(?:e?me|[ae])nt|ou)

One can analyse the above matches as follows:
Each element in a match group that has sub-groups can be seen as a root, or lowest common factor, of the group. If the engine finds this common root, it knows it may also find one of the elements of the sub-group; if it does not, it can discard the whole sub-group from the solution set immediately. The fewer such factors there are at the base level of a match group, the better the performance usually is. An equation might summarise the total time required to perform a match, and I'll leave that as an open challenge, as I haven't perfected this to a science yet. So far I've just been doing trial-and-error benchmarking, using the well-known Benchmark module by Jarkko Hietaniemi and my common-sense knowledge of the regex engine (I'm no master regexer).
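A sketch of that kind of trial-and-error benchmarking, using the core Benchmark module's cmpthese. The word list and the anchors ($ on the forward pattern, ^ on the reversed one) are assumptions added here for illustration; the two patterns are the #4p0s1 and #4p1s0r forms from above:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample words; six of the eight end in a listed suffix.
my @words = qw(adjustment relational conformance activate
               formalize dangerous hopeful goodness);
my @rev   = map { scalar reverse $_ } @words;   # pre-reversed copies

# #4p0s1, anchored at the end of the word (anchor is an assumption)
my $fwd = qr/(?:ic|(?:[ae]nc|[ai]bl|at|iv|iz)e|iti|al|ism|er|ous|(?:e?me|[ae])nt|ou)$/;

# #4p1s0r, anchored at the start of the reversed word (anchor is an assumption)
my $bwd = qr/^(?: ci | e (?: cn[ae] | lb[ai] | ta | vi | zi ) | iti | la
              | msi | re | suo | tn (?: eme? | [ae] ) | uo )/x;

sub count_forward  { scalar grep { $_ =~ $fwd } @words }
sub count_reversed { scalar grep { $_ =~ $bwd } @rev }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    forward  => \&count_forward,
    reversed => \&count_reversed,
});
```

The timings depend on the Perl version and the word list, so treat the table as a hint for your own data and not as a general result. Note that the words are reversed once up front; in real use the cost of reversing the input belongs in the measurement too.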

Also consider:
    #3p0s1: # Less optimal
    (?: (?: icat | ativ | aliz ) e | iciti | (?: ica | fu ) l | ness )

    #3p1s0r: # More optimal - performance gains in reverse
    (?: e (?: taci | vita | zila ) | itici | l (?: aci | uf ) | ssen )

In the above, the latter is clearly a good sexeger because the words share so much at their suffix end. Remember that in a sexeger a prefix becomes a suffix and a suffix a prefix.


So far there is no way to automatically generate a sexeger from the regex you wish to transform. However, on the site above, the unnamed saint has indicated that work is in progress on just such a tool. Until then, for complex regexes, hand reversal can prove to be both instructional and fun. It usually takes very little time, once you get over the initial disorientation and the accidental typing of (:?) instead of (?:). My strategy now is to start from the original, mirror each element into a palindrome, and then delete the forward half:

    (?:abcd|efg|won|I|[now]|know|my|regexes)
    (?:abcdcba|efgfe|wonow|[now]|I|knowonk|mym|regexesexeger)
    (?x-ism: dcba|gfe | now |I| [own] |wonk|y m|sexeger)
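For the simple case shown here, a flat list of alternatives made of literal characters and bracketed classes, the hand reversal can be sketched in a few lines. The sub reverse_alternatives is hypothetical and is not the tool mentioned above; it deliberately ignores escapes, quantifiers and nested groups:

```perl
use strict;
use warnings;

# Hypothetical helper: reverse each alternative of a flat alternation.
# Handles only literal characters and [...] classes; a class is kept
# intact as one token because the order of its members does not matter.
sub reverse_alternatives {
    my ($pattern) = @_;
    my @out;
    for my $alt (split /\|/, $pattern) {
        # Tokenise into bracketed classes and single characters
        my @tokens = $alt =~ /(\[[^\]]*\]|.)/g;
        push @out, join '', reverse @tokens;
    }
    return join '|', @out;
}

print reverse_alternatives('abcd|efg|won|I|[now]|know|my|regexes'), "\n";
# prints dcba|gfe|now|I|[now]|wonk|ym|sexeger
```

The alternatives stay in their original order here; only their contents are reversed, which is all a flat alternation needs.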


Yes indeed, it is all very fun. I could do this for hours on end, and make a day out of it. Joy! ... umm.. But I just had to cheat. :)

As you might have gathered from the code blurbs above, I have been using a helper script to create these different forms of the same search list. As a new way to create word-search regexes employing sexeger, I'll show how expressions like these can be generated automatically:

from helper.pl:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Regex::PreSuf;

    my @step4list = qw(
        al able ance ant ate ence ent ement er
        ible ic ism iti ive ize ment ou ous
    );

    # Reverse every word in place (for instead of grep in void context)
    $_ = scalar reverse $_ for @step4list;

    print "4p1s0r:", presuf({suffixes=>0}, @step4list), "\n";
    print "4p1s1r:", presuf({suffixes=>1}, @step4list), "\n";
    print "4p0s1r:", presuf({prefixes=>0}, @step4list), "\n";

This makes use of Regex::PreSuf, also by the Finnish Perl hacker Jarkko Hietaniemi <jhi@iki.fi>. Regex compression is a new way of looking at multiword searching: instead of iterating over a list, try matching the whole word list with a single optimized regex. You may be pleasantly surprised by the results.
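To illustrate the difference between iterating over the list and using one regex, here is a sketch that needs no CPAN modules. A plain joined alternation stands in for the compressed regex that presuf would produce, so it shows the shape of the approach and not presuf's actual output; the sub names are made up:

```perl
use strict;
use warnings;

my @suffixes = qw( al able ance ant ate ence ent ement er
                   ible ic ism iti ive ize ment ou ous );

# Approach 1: iterate over the list, one small match per suffix.
sub ends_with_loop {
    my ($word) = @_;
    for my $s (@suffixes) {
        return 1 if $word =~ /\Q$s\E\z/;
    }
    return 0;
}

# Approach 2: one regex for the whole list (uncompressed alternation,
# longest alternatives first; a stand-in for a presuf-style regex).
my $alt = join '|', map { quotemeta($_) }
          sort { length($b) <=> length($a) } @suffixes;
my $any_suffix = qr/(?:$alt)\z/;

sub ends_with_regex {
    my ($word) = @_;
    return $word =~ $any_suffix ? 1 : 0;
}

for my $w (qw(adjustment hopeful formalize)) {
    print "$w: loop=", ends_with_loop($w), " regex=", ends_with_regex($w), "\n";
}
```

Both subs should always agree; which one is faster is something to benchmark on your own word list.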

Just wait a sEcond though:
And now I'll give credit where it's due. Only one
Person is insane enough to come up with such an intentionally conFusing way of doing things...
He is the bringer of obFuscated code, short perl quips, and eye-straining regexes.
Yes that's right, sexeger was coined by, this Perl hacker: ... well you can probably guess who it is by now.

--darksym

Edit by dws to add <readmore> tag


In reply to More sexeger by darksym
