Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

searching characters

by Anonymous Monk
on Jun 17, 2009 at 14:12 UTC ( [id://772405]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I got a code like this:
my @ebase = ("A","T"); my $limit = 15; sub lineOK { ($line) = @_; for(@ebase) { $regex = ("$_.*") x $limit; return 0 if($line =~ /$regex/); } return 1; }
and the $regex value will be something like "A.*A.*A.*".. because I want to match $limit A's in the string. is there any way I can make this more efficiently ?

Replies are listed 'Best First'.
Re: searching characters
by moritz (Cardinal) on Jun 17, 2009 at 14:35 UTC
    Yes:
    my $regex = join "[^$_]*", ($_) x $limit;

    That way the regex won't backtrack.

Re: searching characters
by Transient (Hermit) on Jun 17, 2009 at 14:37 UTC
    You could also do something like:
    sub mylineOK { my ($line) = @_; my @ebase = ("A","T"); my $limit = 15; for ( @ebase ) { my $cnt = ( $line =~ s/$_//g ); return 0 if ( $cnt >= $limit ); } return 1; }
    Which will just count the number of times the ebase is stripped from the line.

    Update:

    All of these return the correct answer (in my very limited dataset). Here are some benchmarks.
    Rate Original Moritz Tr m Strip +ExplicitTr Original 87.3/s -- -95% -97% -99% -99% + -100% Moritz 1613/s 1748% -- -50% -74% -81% + -98% Tr 3226/s 3596% 100% -- -48% -62% + -96% m 6250/s 7060% 288% 94% -- -27% + -92% Strip 8571/s 9720% 431% 166% 37% -- + -89% ExplicitTr 75000/s 85825% 4550% 2225% 1100% 775% + --
    Note that explicit tr uses a hardcoded "A" and "T" in place of a variable. (i.e. tr/A/A/ and tr/T/T/) The eval is what kills the performance.
Re: searching characters
by JavaFan (Canon) on Jun 17, 2009 at 14:41 UTC
    Something like the following untested code:
    sub lineOK { my $line = shift; foreach my $code (@ebase) { return 0 if eval "\$line =~ tr/$code/$code/" >= $limit; } return 1; }
Re: searching characters
by cdarke (Prior) on Jun 17, 2009 at 15:53 UTC
    TMTOWTDI
    my $line = 'ABCABCABCABC'; my $limit = 4; my $base = 'A'; if ($line =~ /^[^$base]*($base[^$base]*){$limit}$/) { print "Wow\n"; } else { print "Doh!\n"; }
Re: searching characters
by citromatik (Curate) on Jun 17, 2009 at 14:40 UTC

    You can use the tr operator:

    return 0 if (($line =~ tr/$base/$base/) >= $limit);

    citromatik

      tr/// doesn't interpolate. You could use m// or s/// instead.
      return 0 if ( () = $line =~ /\Q$base/g ) >= $limit;
      return 0 if ( $line =~ s/\Q$base//g ) >= $limit;
      tr doesn't do interpolation:
      my $line = "ABABABAB"; my $limit = 3; my $base = "A"; say "Yes 1" if $line =~ tr/$base/$base/ >= $limit; say "Yes 2" if eval "\$line =~ tr/$base/$base/" >= $limit; __END__ Yes 2
      You need an eval to interpolate.
Re: searching characters
by Anonymous Monk on Jun 17, 2009 at 16:28 UTC
    thanks for all the replys, plenty of options now.

    never through hardcoding the regex would be so faster, now I'm considering this.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://772405]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (7)
As of 2024-04-25 15:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found