Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
If I understand correctly your updates, you are looking for approximate matching (aka fuzzy matching). This is much more complicated than just using regular expressions. I'll just throw here a few pointers. Fuzzy matching often implies computing the Hamming distance or the Levenshtein edit distance (look up these names), which both basically reflect the number of differences (or mismatches) between two strings.

You might also look for the Baeza-Yates and Gonnet algorithm (sometimes also called the Baeza-Yates k-mismatches algorithm. Another known algorithm of possible interest is the Wu-Mamber k-differences. You might also take a look at the longest common subsequence problem.

The String::Approx CPAN module might be of some interest.

Finally I would suggest you take a deep look at the http://www.bioperl.org/wiki/Main_Page/. I am not using it personally (as I said, I am not a biologist) and can't say much about it, but it is quite big and is likely to have implemented some of the things you are trying to do. Also the functionalities in there are likely to be optimized for alphabets of 4 letters such as DNA sequences.


In reply to Re: Random shuffling by Laurent_R
in thread Random shuffling by onlyIDleft

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others wandering the Monastery: (6)
    As of 2020-10-23 09:42 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      My favourite web site is:












      Results (236 votes). Check out past polls.

      Notices?