On the Web and in various books, anyone can find something like "common regex lists" containing dozens of real-life regular expressions simple enough to be understood by everybody.

Being quite simple, these regexes are generally used to solve the routine problems every Perl programmer meets.

But there are some tasks that require far more complex regular expressions.

I want to make a list of complicated (obfuscated, odd, etc.) regular expressions used to solve difficult real problems (and then I plan to make it available online somewhere outside this thread :) ). I would be very obliged if you posted examples of your most interesting regexes here, together with chunks of the data they were intended to match against.

My own favourite (it is combined from two regexes, one of which is recursive):

$brackets_pattern = qr{
    # recursive pattern to search brackets like [mmm[ hh[f]]ll]
    \[
    (?:
        (?>[^\[\]]+ )            # non-brackets
        |
        (??{$brackets_pattern})  #new pattern for inside brackets
    )*
    \]
}x;

my $pat = qr/(?-xism:(?-xism:[ab?x][DLSRX?]Glc(?:[pfa?]|-ol|)N\(1\-4\))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*\[(?x-ism:(?:(?>[^\[\]]+?)|(??{$brackets_pattern}))*)(?:t\)|(?<![\])]))(?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]Glcp\(1\-6\))\](?x-ism:\[(?:(?>[^\[\]]+)|(??{$brackets_pattern}))*\])*(?-xism:[ab?x][DLSRX?]GalpN(?=\(|$)))/;

Of course, I didn't type the second regex myself; it is generated by my substructure search engine for the Bacterial Carbohydrate Structure Database in response to an ordinary request. That's why I used the word "created" instead of "wrote" in the title: some interesting regexes are never typed, but are used intensively :)
Sample data to match against:
-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-
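
To see what the recursive part does on its own, here is a minimal sketch that applies only $brackets_pattern to the sample line. The helper name bracket_groups is hypothetical (it is not part of the search engine), and the pattern is declared with "our" so the deferred (??{ ... }) block can find it under strict.

```perl
use strict;
use warnings;

# Package variable: the deferred (??{ ... }) block looks it up at match time.
our $brackets_pattern;
$brackets_pattern = qr{
    \[
    (?:
        (?> [^\[\]]+ )             # run of non-bracket characters, never backtracked into
      |
        (??{ $brackets_pattern })  # a nested [...] group, matched recursively
    )*
    \]
}x;

# Hypothetical helper: return every balanced [...] group found in $text.
sub bracket_groups {
    my ($text) = @_;
    my @groups;
    while ($text =~ /$brackets_pattern/g) {
        # @- and @+ hold the start and end offsets of the last successful match
        push @groups, substr($text, $-[0], $+[0] - $-[0]);
    }
    return @groups;
}

my $sample = '-6)[xR3HOBut(1-3)]aDGlcpN(1-4)[aDGlcp(1-6),Ac(1-2)]aDGalpN(1-3)[Ac(1-2)]bDGalpN(1-2)aDGlcp(1-P-';
print "$_\n" for bracket_groups($sample);
```

On the sample line this should print the three side-chain groups, one per line. A nested string such as [mmm[ hh[f]]ll] comes back as a single group, which is exactly what a non-recursive pattern cannot do.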

I hope to see your examples described the way I described mine :)


A short list of the best ones (IMHO) I found in the replies, ordered by the time each comment was posted:

([^e]|e([^s]|s([^\.]|\.([^c]|c([^o]|o([^m]|m([^p]|p([^\.]|\.([^o]|o([^s]|s([^\.]|\.([^l]|l([^i]|i([^n]|n([^u]|u[^x])))))))))))))))
by Hue-Bond
A simple grep -E regex meant to detect cross-posts between certain newsgroups.
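
One way to read this regex: it is a character-by-character negation of the literal es.comp.os.linux, written without lookaheads because grep -E has none. The sketch below builds the same nesting for any literal. The helper name not_literal_pattern is hypothetical, and how Hue-Bond embedded the pattern in the actual grep command is not shown in this thread, so the anchored test at the end is only an assumption about typical use.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern that matches text deviating from
# $literal at some character, using only alternation and negated classes.
sub not_literal_pattern {
    my ($literal) = @_;
    die "literal must be non-empty\n" unless length $literal;
    my @chars   = split //, $literal;
    my $last    = pop @chars;
    my $pattern = '[^' . quotemeta($last) . ']';   # innermost: the last char differs
    for my $ch (reverse @chars) {
        my $q = quotemeta $ch;
        # either this character differs, or it matches and a later one differs
        $pattern = "([^$q]|$q$pattern)";
    }
    return $pattern;
}

my $re = not_literal_pattern('es.comp.os.linux');
for my $group (qw(es.comp.os.linux es.comp.os.linuz comp.os.linux)) {
    printf "%-18s %s\n", $group, $group =~ /^$re/ ? 'differs' : 'same prefix';
}
```

Note that the pattern always consumes at least one character, so a string that is only a proper prefix of the literal (for example es.comp) does not match either.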

#!/usr/bin/perl -l
"AB~ACFI~ADGJ~AE~BCDE~BFHJ~BI~EGHI~EJ~IJ" =~ /([^~])[^~]*([^~]).*~[^~]*([^~])[^~]*([^~])(?{local$z=$1 and local$y=$2 and local$x=$1 eq$3?$4:$1 eq$4?$3:($z=$2)&&($y=$1)&&$2 eq$3?$4:$2 eq$4?$3:0}).*~[^~]*((??{$y})[^~]*(??{$x})|(??{$x})[^~]*(??{$y}))(?{$x{join" - ",sort$x,$y,$z}++})(?!)/;
print for sort(keys %x), keys(%x) . " triangles found";
by !1
The regex (or, to be stricter, this mix of regex and Perl code ;)) at the heart of this short script finds all triangles for this quest and puts them all in the %x hash.
URL matching RegEx by abigail
Author's comment:
This does only a subset of the possible URLs:
I had to put it under <spoiler> because of its length :)
forking regular expression by Ovid
As this is a complete Perl script (the forking regex makes no sense on its own), I have put it under a spoiler too.
An abridged (due to incredible size of the original) version of ikegami's generated regex to solve Sudoku puzzles:
The regexes become stranger and stranger :) Whose will be the next? ;)

In reply to The craziest RegExes you ever created by Ieronim