Re^7: Using regex with a variable

by AnomalousMonk (Archbishop)
on Mar 15, 2016 at 16:41 UTC


in reply to Re^6: Using regex with a variable
in thread Using regex with a variable

Another trick to reduce processing time is to compose all the  @strings_to_be_matched strings into a single regex.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @strings_to_be_matched = qw(foo bar wibble wobble fee_fie foe fum);
     ;;
     my ($ur) =
         map qr{ $_ }xms,
         join q{ | },
         map quotemeta,
         reverse sort
         @strings_to_be_matched
         ;
     print qq{\$ur: $ur};
     ;;
     my $reg3 = qr/extern.+\b$ur\b\s*/i;
     print qq{\$reg3: $reg3};
     "
    $ur: (?^msx: wobble | wibble | fum | foo | foe | fee_fie | bar )
    $reg3: (?^i:extern.+\b(?^msx: wobble | wibble | fum | foo | foe | fee_fie | bar )\b\s*)

I'm making a couple of assumptions:
  • the @strings_to_be_matched strings are all C/C++ or similar keywords or identifiers and so consist entirely of \w characters;
  • the  $ur pattern is always bounded by  \b assertions whenever it is interpolated.
If these assumptions are true, then a couple of steps in creating the  $ur regex are redundant, but will do no harm.
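To make the redundancy concrete, here is a minimal sketch. It assumes the redundant steps meant here are the quotemeta and the reverse sort (my reading, not something the post states), and it uses a hypothetical list in which foobar exists only to create a shared prefix with foo. The names $ur_full, $ur_plain and @results are illustrative.

```perl
use strict;
use warnings;

# Hypothetical identifier list; 'foobar' is here only to share a prefix with 'foo'.
my @strings_to_be_matched = qw(foo foobar fee_fie);

# Full recipe from the post: quotemeta each string, then reverse sort.
my ($ur_full) =
    map qr{ $_ }xms,
    join q{ | },
    map quotemeta,
    reverse sort
    @strings_to_be_matched;

# Stripped-down variant: no quotemeta, no particular ordering.
my ($ur_plain) =
    map qr{ $_ }xms,
    join q{ | },
    @strings_to_be_matched;

# With \b on both sides, both patterns capture the same whole identifier.
my @lines = ('extern int foobar;', 'extern int foo;', 'extern int food;');
my @results;
for my $ln (@lines) {
    my ($full)  = $ln =~ m{ \b ($ur_full)  \b }xms;
    my ($plain) = $ln =~ m{ \b ($ur_plain) \b }xms;
    push @results, [ $full, $plain ];
    printf "%-20s full=%s plain=%s\n", $ln, $full // 'none', $plain // 'none';
}
```

If either assumption fails (a string containing regex metacharacters, or $ur interpolated without the surrounding \b), the two variants can diverge, which is presumably why the extra steps are kept as cheap insurance.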

So your final code might look something like this (untested):

    my @strings_to_be_matched = ...;

    my ($ur) =
        map qr{ $_ }xms,
        join q{ | },
        map quotemeta,
        reverse sort
        @strings_to_be_matched
        ;

    my $reg1 = qr/=/i;
    my $reg2 = qr/\S+=\S+/i;
    my $reg3 = qr/extern.+\b$ur\b\s*/i;
    my $reg4 = qr/;$/i;
    my $reg5 = qr/.+\b$ur\b\s*/i;

    foreach my $ln (@contents_of_file) {
        if ($ln =~ $reg3 and $ln =~ $reg4) {
            ...
        }
        if ($ln =~ $reg5 and $ln =~ $reg4 and ($ln !~ $reg1 or $ln =~ $reg2)) {
            ...
        }
        if ($ln =~ $reg3 and $ln !~ $reg4) {
            ...
        }
        if ($ln =~ $reg5 and $ln !~ $reg4 and $ln !~ $reg1) {
            ...
        }
    }

(But please see Discipulus's remarks above about using a while-loop rather than a for-loop for processing file contents line-by-line.)
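For reference, a rough sketch of what the while-loop form might look like. The sample text, the three-word $ur and the @extern_terminated array are all made up for illustration, and an in-memory filehandle stands in for the real file so the snippet runs on its own. The /i modifier is left off here only to keep the sketch small.

```perl
use strict;
use warnings;

# Stand-in for a real source file; an in-memory filehandle keeps this self-contained.
my $text = "extern int foo;\nint bar = 1;\nextern int wibble\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Hypothetical, hand-written alternation with the same shape as $ur in the post.
my $ur   = qr{ foo | bar | wibble }xms;
my $reg3 = qr/extern.+\b$ur\b\s*/;
my $reg4 = qr/;$/;

my @extern_terminated;
# Read one line at a time rather than holding every line in an array first.
while (my $ln = <$fh>) {
    chomp $ln;
    push @extern_terminated, $ln if $ln =~ $reg3 and $ln =~ $reg4;
}
close $fh;

print "$_\n" for @extern_terminated;
```

With a real file, only the open call would change (a path in place of \$text); the loop body stays as it is.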

Another thing that may affect speed is that you have all your regexes  $reg1 $reg2 $reg3 $reg4 $reg5 modified as  /i (case insensitive). Case insensitivity slows down regex execution. Some of your regexes have only assertions, characters or character classes  \S $ ; = to which case insensitivity does not apply. As noted above, the  @strings_to_be_matched strings seem to be C/C++ or suchlike keywords or identifiers; is case insensitivity ever appropriate here? I would seriously reconsider the use of case-insensitivity.
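A small sketch of the point about where /i can matter at all. The sample lines and the variable names ($differences, $ci_hit, $cs_hit) are invented for illustration; the patterns for $reg1, $reg2 and $reg4 are the ones from the post. It makes no timing claim, it only checks which results change when /i is dropped.

```perl
use strict;
use warnings;

my @samples = ('x=1;', 'int X = 2;', 'extern int Foo;', 'extern int foo;');

# $reg1, $reg2 and $reg4 from the post, each with and without /i.
my %pairs = (
    reg1 => [ qr/=/i,       qr/=/       ],
    reg2 => [ qr/\S+=\S+/i, qr/\S+=\S+/ ],
    reg4 => [ qr/;$/i,      qr/;$/      ],
);

# Count the samples on which the /i and non-/i versions disagree.
my $differences = 0;
for my $name (sort keys %pairs) {
    my ($with_i, $without_i) = @{ $pairs{$name} };
    for my $s (@samples) {
        my $m_i     = $s =~ $with_i    ? 1 : 0;
        my $m_plain = $s =~ $without_i ? 1 : 0;
        $differences++ if $m_i != $m_plain;
    }
}
print "disagreements for reg1/reg2/reg4: $differences\n";

# An identifier pattern is where /i actually changes the outcome.
my $ci_hit = 'extern int Foo;' =~ /\bfoo\b/i ? 1 : 0;
my $cs_hit = 'extern int Foo;' =~ /\bfoo\b/  ? 1 : 0;
print "Foo vs foo: with /i=$ci_hit, without /i=$cs_hit\n";
```

So for the patterns made only of punctuation, \S and anchors, dropping /i changes nothing, while for the identifier alternation it decides whether Foo counts as foo, which for C/C++ names is probably not what is wanted.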

Last but not least: As a beginner, it's important always to use warnings; and use strict; and to avoid global variables.


Give a man a fish:  <%-{-{-{-<
