http://www.perlmonks.org?node_id=961125


in reply to approximate regular expression

If you would like a non-regex, brute-force method, here is one:

#! C:/Perl/bin/perl
use strict;
use warnings;

my $pattern = "JEJE";
my $string  = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";

my @pattern_list   = split //, $pattern;
my $pattern_length = @pattern_list;

for my $x ( 0..((length $string) - $pattern_length) ){
    my $test_string  = substr $string, $x, $pattern_length;
    my @result_array = split //, $test_string;
    my $score = 0;
    for my $y ( 0..$#pattern_list ){
        $score++ if $pattern_list[$y] eq $result_array[$y];
    }
    if( $score > 1 ){
        print "String: $test_string, position: $x, score: $score\n";
    }
}

Results

String: JKJU, position: 1, score: 2
String: JUJH, position: 3, score: 2
String: JHJD, position: 5, score: 2
String: JDJE, position: 7, score: 3
String: JEJE, position: 9, score: 4
String: JEJE, position: 11, score: 4
String: JEDE, position: 13, score: 3
String: DEJO, position: 15, score: 2
String: JOJO, position: 17, score: 2
String: JOJJ, position: 19, score: 2
String: JJJA, position: 21, score: 2
String: JHJS, position: 26, score: 2
String: SHJE, position: 29, score: 2
String: JEFE, position: 31, score: 3
String: FEJU, position: 33, score: 2
String: JUJE, position: 35, score: 3
String: JEJU, position: 37, score: 3
String: JUJK, position: 39, score: 2

Re^2: approximate regular expression
by Marshall (Canon) on Mar 23, 2012 at 06:12 UTC
    Yes, split() is certainly "brute force".
    If you have benchmarked this, you know that it is a very "expensive" operation.
    @array = split (//,$some_var) is super "expensive", and your code does it many times.

    Going "with the flow" of the language is (usually) going to execute faster and in general "be better", meaning easier to understand.
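For readers wondering what a split-free version might look like, here is a minimal sketch. It is not Marshall's code, and it has not been benchmarked here. It assumes single-byte (ASCII) characters and relies on Perl's bitwise string XOR, which yields a NUL byte wherever two strings hold the same byte; tr/\0// then counts those bytes. The names match_score and fuzzy_positions are hypothetical helpers invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the positions at which
# two equal-length byte strings agree, without calling split().
sub match_score {
    my ( $x, $y ) = @_;
    # String XOR gives "\0" wherever the two bytes are equal.
    my $xor = $x ^ $y;
    # tr with an empty replacement list only counts; it does not modify.
    my $count = ( $xor =~ tr/\0// );
    return $count;
}

# Hypothetical helper: slide a window over $string and keep every window
# whose score against $pattern is at least $min_score.
sub fuzzy_positions {
    my ( $pattern, $string, $min_score ) = @_;
    my $len = length $pattern;
    my @hits;
    for my $pos ( 0 .. length($string) - $len ) {
        my $window = substr $string, $pos, $len;
        my $score  = match_score( $pattern, $window );
        push @hits, [ $window, $pos, $score ] if $score >= $min_score;
    }
    return @hits;
}

my $pattern = "JEJE";
my $string  = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";

# A minimum score of 2 corresponds to the original test "$score > 1".
for my $hit ( fuzzy_positions( $pattern, $string, 2 ) ) {
    printf "String: %s, position: %d, score: %d\n", @$hit;
}
```

The output format mirrors the original post so the two versions can be compared line by line. Whether this is actually faster would need to be measured, for example with the core Benchmark module.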

      Marshall, thank you for your feedback.

      Honestly, I don't have a good handle on what Perl "with the flow" really means. I guess I was responding to jrblas's request regarding fuzzy regexes. By that I mean that, from what I have read, fuzzy regexes mostly land in the TODO bucket of the regex wizards. I say that as a regex weakling, so there may be something out there that I don't know about. Specifically, Marpa seems to promise some alternatives, but that is even further beyond my current grasp.

      With that said, I have to confess to laziness in calculating the match score. As a guess, the original question appears to fall in the BioPerl realm, which upon further study would also benefit from regex look-around add-ons. So I offer the following in penance.