PerlMonks  

The craziest RegExes you ever created

by Ieronim (Friar)
on Jul 02, 2006 at 17:27 UTC (#558868, perlmeditation)

On the Web and in various books, anybody can find something like "common regex lists" containing dozens of real-life regular expressions simple enough to be understood by everybody.

Being quite simple, these regexes are generally used to solve routine problems every Perl programmer meets.

But there are some tasks that require regular expressions of much greater complexity.

I want to make a list of complicated (obfuscated, odd, etc.) regular expressions used to solve difficult real problems (and then I plan to make it available online somewhere outside this thread :) ). I would be very obliged if you posted here examples of your most interesting regexes, combined with chunks of the data they were intended to match against.

My own favourite (it is combined from two regexes, one of which is recursive):

$brackets_pattern = qr{ # recursive pattern to search brackets like [mmm[ hh[f]]ll]
    \[
    (?:
        (?>[^\[\]]+ )            # non-brackets
        |
        (??{$brackets_pattern})  # new pattern for inside brackets
    )*
    \]
}x;

my $pat = qr/(?-xism:(?-xism:[ab?x][DLSRX?]Glc(?:[pfa?]|-ol|)N\(1\-4\))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*\[(?x-ism:(?:(?>[^\[\]]+?)|(??{$brackets_pattern}))*)(?:t\)|(?<![\])]))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]Glcp\(1\-6\))\](?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]GalpN(?=\(|$)))/;

Of course, I didn't type the second regex myself; it is generated by my substructure search engine for the Bacterial Carbohydrate Structure Database in response to an ordinary request. That's why I used the word "created" instead of "wrote" in the title: some interesting regexes are never typed, but are used intensively :)
Sample data to match against:
-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-
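
To see the recursive half in isolation, here is a minimal sketch that applies the same bracket idea to the sample string and pulls out every outermost bracketed group. The names $brackets and top_level_groups are made up for this illustration; they are not part of the search engine.

```perl
use strict;
use warnings;
use re 'eval';   # harmless here; only strictly needed when code blocks arrive via plain strings

# Same idea as $brackets_pattern: a group is an opening bracket, then any
# mix of non-bracket runs and nested groups, then a closing bracket.
our $brackets;
$brackets = qr{
    \[
    (?:
        (?> [^\[\]]+ )      # run of non-bracket characters, no backtracking into it
      |
        (??{ $brackets })   # nested group: re-enter this pattern at match time
    )*
    \]
}x;

# Hypothetical helper: return every outermost bracketed group in a string.
sub top_level_groups {
    my ($text) = @_;
    my @groups;
    while ($text =~ /($brackets)/g) {
        push @groups, $1;
    }
    return @groups;
}

print "$_\n" for top_level_groups(
    '-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-'
);
```

On Perl 5.10 and later the same nesting can also be written without an embedded code block, using recursive subpatterns such as (?1) or (?R).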

I hope to see your examples described in the way I described mine :)

UPDATE:

A short list of the (IMHO) best ones I found in the replies, ordered by the time the comment was posted:


([^e]|e([^s]|s([^\.]|\.([^c]|c([^o]|o([^m]|m([^p]|p([^\.]|\.([^o]|o([^s]|s([^\.]|\.([^l]|l([^i]|i([^n]|n([^u]|u[^x])))))))))))))))
by Hue-Bond
A simple grep -E regex meant to detect cross-posts between certain newsgroups.


#!/usr/bin/perl -l
"AB~ACFI~ADGJ~AE~BCDE~BFHJ~BI~EGHI~EJ~IJ" =~ /([^~])[^~]*([^~]).*~[^~]*([^~])[^~]*([^~])(?{local$z=$1 and local$y=$2 and local$x=$1 eq$3?$4:$1 eq$4?$3:($z=$2)&&($y=$1)&&$2 eq$3?$4:$2 eq$4?$3:0}).*~[^~]*((??{$y})[^~]*(??{$x})|(??{$x})[^~]*(??{$y}))(?{$x{join" - ",sort$x,$y,$z}++})(?!)/;
print for sort(keys %x), keys(%x) . " triangles found";
by !1
The regex (or, to be stricter, this mix of regex and Perl code ;)) at the heart of this short script finds all triangles for this quest and puts them all in the %x hash.

URL matching RegEx by abigail
Author's comment:
This does only a subset of the possible URLs:
I had to put it under <spoiler> because of its length :)

Forking regular expression by Ovid
As this is a complete Perl script (the forking regex makes no sense standalone), I have put it under a spoiler too.

An abridged (due to the incredible size of the original) version of ikegami's generated regex to solve Sudoku puzzles:

The regexes become stranger and stranger :) Whose will be next? ;)

Replies are listed 'Best First'.
Re: The craziest RegExes you ever created
by GrandFather (Sage) on Jul 02, 2006 at 20:47 UTC

    I don't know about crazy, but my very first PerlMonks post was a rather neat little regex to wrap text without mangling words:

    s/(.{5,12}\s+)/$1\n/g

    which generates (given "Check out http://www.perlmonks.org/?node_id=438189 (it has nice line-wrapping snippet)."):

    Check out
    http://www.perlmonks.org/?node_id=438189
    (it has nice
    line-wrapping
    snippet).
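
    A small sketch for trying the substitution on the sentence quoted above. The sub name wrap_text is made up for the example; the regex itself is unchanged. Because a newline is only ever inserted after a run of whitespace, the long URL is never split.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line substitution.
sub wrap_text {
    my ($text) = @_;
    # Take 5 to 12 characters plus the whitespace that follows them,
    # then put a newline after that whitespace.
    $text =~ s/(.{5,12}\s+)/$1\n/g;
    return $text;
}

print wrap_text(
    "Check out http://www.perlmonks.org/?node_id=438189 (it has nice line-wrapping snippet)."
), "\n";
```

    Note that the whitespace before each break is kept, so the wrapped lines end in a trailing space.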

    DWIM is Perl's answer to Gödel
Re: The craziest RegExes you ever created
by !1 (Hermit) on Jul 02, 2006 at 20:56 UTC

    Not really the craziest I've ever made, but certainly one of my more creative ones can be found here.

    The regex is:

    /([^~])[^~]*([^~]).*~[^~]*([^~])[^~]*([^~])(?{local$z=$1 and local$y=$2 and local$x=$1 eq$3?$4:$1 eq$4?$3:($z=$2)&&($y=$1)&&$2 eq$3?$4:$2 eq$4?$3:0}).*~[^~]*((??{$y})[^~]*(??{$x})|(??{$x})[^~]*(??{$y}))(?{$x{join" - ",sort$x,$y,$z}++})(?!)/

    Basically I used the regex engine to search through all the possible triangles and failed every time I found one so I could continue searching for the rest.
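
    The "record a hit, then fail on purpose" trick can be shown on a much smaller problem. The sketch below is not the triangle search; it uses the same two ingredients, a (?{ ... }) block to record each candidate and (?!) to force backtracking, to list every distinct substring of a word. The name all_substrings and the %seen hash are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: let the regex engine enumerate every substring.
sub all_substrings {
    my ($text) = @_;
    our %seen = ();
    # (.+) proposes a candidate, the code block records it, and (?!) always
    # fails, so the engine backtracks and proposes the next candidate.
    $text =~ /(.+)(?{ $seen{$1}++ })(?!)/;
    return sort keys %seen;
}

print join(",", all_substrings("abc")), "\n";   # prints a,ab,abc,b,bc,c
```

    The overall match never succeeds; all the useful work happens as a side effect inside the code block.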

Re: The craziest RegExes you ever created
by Hue-Bond (Priest) on Jul 02, 2006 at 17:43 UTC

    Mine was implementing negative lookaround assertions with plain grep -E. I did it for detecting cross-posts between certain newsgroups. The precise regex I used was anchored to the beginning of the line:

    ^([^e]|e([^s]|s([^\.]|\.([^c]|c([^o]|o([^m]|m([^p]|p([^\.]|\.([^o]|o([^s]|s([^\.]|\.([^l]|l([^i]|i([^n]|n([^u]|u[^x])))))))))))))))

    I even made some pseudo-algorithm for building more:

    ^
    ([^e]|e
    ([^s]|s
    ([^\.]|\.
    ([^c]|c
    <blah>
    ([^n]|n
    ([^u]|u
    [^x]
    )))))))))))))))

    But didn't bother to automate it. In fact, I've never used this again.
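
    For what it's worth, the pseudo-algorithm is easy to automate. A minimal sketch, assuming the goal is "match any line that does not start with a given prefix"; the sub name not_prefixed_by is invented here and the output is only checked with Perl's own regex engine, not with grep.

```perl
use strict;
use warnings;

# Hypothetical generator for the nested pattern: one level of
# ([^c]|c ... ) per character, with a bare [^c] for the last character.
sub not_prefixed_by {
    my ($prefix) = @_;
    die "prefix must not be empty\n" unless length $prefix;
    my @chars = map { quotemeta } split //, $prefix;
    my $pattern = '[^' . pop(@chars) . ']';              # innermost: last char must differ
    for my $char (reverse @chars) {                      # wrap outward, one char per level
        $pattern = '([^' . $char . ']|' . $char . $pattern . ')';
    }
    return '^' . $pattern;
}

print not_prefixed_by('es.comp.os.linux'), "\n";
```

    One caveat when feeding the result to grep -E: inside a POSIX bracket expression a backslash is a literal character, so [^\.] excludes a backslash as well as a dot. That should be harmless for newsgroup names.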

    --
    David Serrano

      can you give an example of string it needed to match against? :)
        can you give an example of string it needed to match against? :)

        Sure, but off topic I think. It was for use in an .slrn-score file (slrn is the news reader I use). I wanted to give -100 points to posts that were sent to some es.comp.os.linux.* group and to another group outside that hierarchy.

        --
        David Serrano

Re: The craziest RegExes you ever created
by eyepopslikeamosquito (Chancellor) on Jul 02, 2006 at 22:08 UTC

    On the Web and in various books, anybody can find something like "common regex lists" containing dozens of real-life regular expressions simple enough to be understood by everybody.

    There are also lists of useful and complex regexes that are hard to understand. In particular, Regexp::Common contains a number of useful regexes, both simple and complex.

    Also, Jeffrey Friedl's Mastering Regular Expressions has examples of large, complex regexes (including a generated one to match an email address that is about a page long, IIRC).

      Regexp::Common is a cool module :) Though I have never used it myself, I like the idea :)

      I have seen the incredible e-mail address regular expression from Jeffrey Friedl's book. Btw, it was removed from the book in the second edition ;)

Re: The craziest RegExes you ever created
by Ovid (Cardinal) on Jul 03, 2006 at 12:41 UTC

    My craziest regex:

    use re 'eval';
    my $string = "abc";
    my $length = length $string;
    my $regex = qr/(\G[$string]{0,$length}(?{print "# [$&][$'][$string]\n"}))/ x 2;
    $string =~ $regex;

    Bonus points if you can figure out why I wrote that. As a fun exercise, try to figure out what it does and how it does it.

    Of course, this forking regular expression might top it. Since the regex engine is not re-entrant, if you need that power, you have to fork.

    Cheers,
    Ovid

    New address of my CGI Course.

      Since the regex engine is not re-entrant,

      Yet.

      My money is on dave_the_m getting it re-entrant before 5.10, which will just be one of many regex enhancements that will be in 5.10. (The short story is the engine will be a lot faster for many regexes.)

      ---
      $world=~s/war/peace/g

        demerphq wrote:

        My money is on dave_the_m getting [regular expressions] re-entrant before 5.10 ...

        Oh. My. God.

        I want. I want badly. Drool. (How did I not hear about this?)

        Hmm, who wants to bet that a Perl 6 alpha will be out first? :)

        Cheers,
        Ovid

        New address of my CGI Course.

        That would be the evilest ever, making 5.10 wait on that. I don't see why it hasn't just been pushed out the door already.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      I can only suppose that you were debugging some of your bigger regexes and wrote this one to see how the regular expression engine works. The regex you wrote prints out every step of the matching :)

      Am I right?

Re: The craziest RegExes you ever created
by sh1tn (Priest) on Jul 02, 2006 at 22:58 UTC
Re: The craziest RegExes you ever created
by gellyfish (Monsignor) on Jul 03, 2006 at 12:25 UTC
Re: The craziest RegExes you ever created
by demerphq (Chancellor) on Jul 03, 2006 at 12:17 UTC

    I'd just like to point out that regexes like these, with test data and expected output, would be worthy additions to the regex test suite. This is especially true of Unicode regexes. The Perl core is currently impoverished in terms of Unicode regex tests.

    Update: Please don't upvote this node to show your approval. Contribute some test cases instead. It's much more valuable to everybody.

    ---
    $world=~s/war/peace/g

Re: The craziest RegExes you ever created
by McDarren (Abbot) on Jul 03, 2006 at 00:52 UTC
    The following is not one that I wrote, but it is the first example of a complex regular expression that I ever encountered - so I always remember it. It can be found in the procmailrc(5) manpage.
    (^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp
    |LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR
    |utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$)))

    I've never tried to dissect it, but looking at it now (and having a somewhat better understanding of regular expressions than I did when I first saw it) I guess it's really not all that complex at all - mainly just lots of alternation.

    Cheers,
    Darren :)

Re: The craziest RegExes you ever created
by strat (Canon) on Jul 03, 2006 at 08:25 UTC

    Well, my most crazy (or cruel and insecure?) REs were written before I learned negative character classes (e.g. /x=[^a-z;]/g;) or Regexp::Common ...

    Best regards,
    perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Re: The craziest RegExes you ever created
by ikegami (Pope) on Jul 03, 2006 at 18:08 UTC

    I wrote a program that builds a regexp that solves Sudoku puzzles. The generated regexp is hidden underneath the "readmore" below.

    Update: The generated regexp is too big to fit in a post. An abridged version was used.

Re: The craziest RegExes you ever created
by zshzn (Hermit) on Jul 04, 2006 at 02:05 UTC
    This is a very silly script I wrote to demonstrate that something could be done this way at all. The regex converts Roman numerals to integers. It is neither useful nor witty.

    use strict;
    sub l { return length shift }
    sub roman2int {
        $_ = shift;
        /(M*)(?{$a+=1000*l$1})(D*)(?:(?!M)(?{$a+=500*l$2})|(?{$a-=500*l$2}))(C*)(?:(?![MD])(?{$a+=100*l$3})|(?{$a-=100*l$3}))(L*)(?:(?![MDC])(?{$a+=50*l$4})|(?{$a-=50*l$4}))(X*)(?:(?![MDCL])(?{$a+=10*l$5})|(?{$a-=10*l$5}))(V*)(?:(?![MDCLX])(?{$a+=5*l$6})|(?{$a-=5*l$6}))(I*)(?:(?![MDCLXV])(?{$a+=1*l$7})|(?{$a-=1*l$7}))(M*)(?{$a+=1000*l$8})(D*)(?{$a+=500*l$9})(C*)(?:(?![MD])(?{$a+=100*l$10})|(?{$a-=100*l$10}))(L*)(?:(?![MDC])(?{$a+=50*l$11})|(?{$a-=50*l$11}))(X*)(?:(?![MDCL])(?{$a+=10*l$12})|(?{$a-=10*l$12}))(V*)(?:(?![MDCLX])(?{$a+=5*l$13})|(?{$a-=5*l$13}))(I*)(?:(?![MDCLXV])(?{$a+=1*l$14})|(?{$a-=1*l$14}))(C*)(?{$a+=100*l$15})(L*)(?:(?![MDC])(?{$a+=50*l$16})|(?{$a-=50*l$16}))(X*)(?:(?![MDCL])(?{$a+=10*l$17})|(?{$a-=10*l$17}))(V*)(?:(?![MDCLX])(?{$a+=5*l$18})|(?{$a-=5*l$18}))(I*)(?:(?![MDCLXV])(?{$a+=1*l$19})|(?{$a-=1*l$19}))(L*)(?{$a+=100*l$20})(X*)(?:(?![MDCL])(?{$a+=10*l$21})|(?{$a-=10*l$21}))(V*)(?:(?![MDCLX])(?{$a+=5*l$22})|(?{$a-=5*l$22}))(I*)(?:(?![MDCLXV])(?{$a+=1*l$23})|(?{$a-=1*l$23}))(X*)(?{$a+=10*l$24})(V*)(?:(?![MDCLX])(?{$a+=5*l$25})|(?{$a-=5*l$25}))(I*)(?:(?![MDCLXV])(?{$a+=1*l$26})|(?{$a-=1*l$26}))(X*)(?{$a+=10*l$27})/;
        return $a;
    }
    print roman2int shift;
      It is neither useful nor witty.
      So I won't add it to my list, in spite of its incredible length :)
        Although it is both crazy and interesting, in its own way, and those were two attributes you requested :)
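
    For comparison only, here is a sketch of a far less crazy way to do the same conversion while still letting a regex do the scanning. It is not zshzn's method: it tokenizes the numeral with \G and looks each token up in a table. The names %value and roman2int_tokens are assumptions made for the example, and the sketch does not check that the tokens appear in a valid order.

```perl
use strict;
use warnings;

# Value of each token; the two-letter subtractive pairs get their own entries.
my %value = (
    M => 1000, CM => 900, D => 500, CD => 400,
    C => 100,  XC => 90,  L => 50,  XL => 40,
    X => 10,   IX => 9,   V => 5,   IV => 4,  I => 1,
);

# Hypothetical alternative: returns the integer, or nothing if the input
# contains anything that is not a Roman numeral token.
sub roman2int_tokens {
    my ($roman) = @_;
    my $total = 0;
    # \G anchors each match where the previous one ended;
    # /c keeps pos() in place when the next match fails.
    while ($roman =~ /\G(CM|CD|XC|XL|IX|IV|[MDCLXVI])/gc) {
        $total += $value{$1};
    }
    my $end = pos($roman);
    return unless defined($end) && $end == length($roman);
    return $total;
}

print roman2int_tokens('MCMXCIV'), "\n";   # prints 1994
```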
Re: The craziest RegExes you ever created
by peterdragon (Beadle) on Jul 04, 2006 at 08:30 UTC
    I had to robustify a database layer that was talking to a DB that couldn't store a range of characters including single quote and other punctuation. Also some of the calling code, which I wasn't able to touch, would fail to escape quote characters properly.
    Code snippets:
    sub fix_dodgy_chars {
        my $s = shift;
        $s = '' unless defined $s;
        $s =~ s/[:;<>\[\]`{|}\000-\037]/ /g; # replace dodgy chars with space
        $s =~ s/\'/\_/; # single quote translated to underscore
        $s;
    }

    sub Prepare {
        ...
        # map non-escaped embedded single or double quotes to underscore where neither at end of line
        # nor followed by semicolon, comma, whitespace or rightbracket
        $sqlstring =~ s/([^\\\s\,\(])([\'\"])(?!($|[\;\,\s\)]))/$1_/g;
        # then examine quoted strings and replace characters that sql cannot handle, currently
        # : ; < > [ ] ` { | } &
        $sqlstring =~ s/\"([^\"]*)\"/'"'.fix_dodgy_chars($1).'"'/ge;}
        ...
    }
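
    To make the first substitution in Prepare easier to follow, here is a small sketch that runs it on one made-up SQL fragment. The sub name mask_embedded_quotes and the sample string are assumptions for the illustration; the regex is the one from the snippet, with ${1} written out so the underscore cannot be read as part of a variable name.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the embedded-quote substitution.
sub mask_embedded_quotes {
    my ($sql) = @_;
    # A quote is replaced only if the character before it is not a backslash,
    # whitespace, comma or "(", and what follows is not end of line,
    # ";", ",", whitespace or ")".
    $sql =~ s/([^\\\s\,\(])([\'\"])(?!($|[\;\,\s\)]))/${1}_/g;
    return $sql;
}

print mask_embedded_quotes(q{WHERE name = 'O'Brien'}), "\n";   # prints WHERE name = 'O_Brien'
```

    The opening quote survives because it follows a space, and the closing quote survives because it sits at the end of the line; only the stray quote inside the value is mapped to an underscore.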
    --
    peterdragon
