PerlMonks  

Re: Regex simplification

by Arien (Pilgrim)
on Aug 26, 2002 at 08:27 UTC ( [id://192797]=note )


in reply to [untitled node, ID 192753]

Extracting the matching lines from an array of lines with Perl's built-in grep function (as opposed to the grep program) is no more complicated than this:

my @matches = grep /PATTERN/, @lines;
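
As a small sketch of that one-liner in action, assuming a few made-up sample lines and a simple stand-in pattern (both are illustrative only, not taken from the original question):

```perl
use strict;
use warnings;

# Hypothetical sample input; only two of the lines contain the marker comment.
my @lines = (
    '<!-- USER 1 - alice -->',
    '<p>nothing to see here</p>',
    '<!-- USER 22 - bob -->',
);

# grep returns every element of @lines for which the match succeeds.
my @matches = grep /<!-- USER /, @lines;

print "$_\n" for @matches;
```

Note that grep hands back the whole matching lines, not the captured parts.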

Now, since you will be extracting the usernames from these matches as well, you might as well do that while matching, as explained by Popcorn Dave.

Don't use "dot star" (.*) in your regex (although some of the regexes above do), because it will cause unnecessary backtracking. The dot matches any character except a newline by default, and the star means "zero or more of the preceding". So when the regex engine reaches "dot star" while matching a line, it first matches all the way to the end of the line; after that it gives characters back, one at a time, until the rest of the regex can match. Things get worse when "dot star" appears more than once in the regex.
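
The grab-everything-then-give-back behaviour is easy to see in what gets captured. Here is a minimal sketch, assuming a hypothetical line that happens to contain a second --> further along:

```perl
use strict;
use warnings;

# Hypothetical input: a second "-->" appears later on the same line.
my $line = '<!-- USER 7 - carol --> <!-- end -->';

# Greedy (.*) runs to the end of the line first, then backs off only
# as far as needed, so it stops at the LAST " -->" on the line.
my ($greedy) = $line =~ /<!-- USER \d+ - (.*) -->/;

# (\S+) cannot cross whitespace, so it stops at the end of the username.
my ($specific) = $line =~ /<!-- USER \d+ - (\S+) -->/;

print "greedy:   $greedy\n";
print "specific: $specific\n";
```

Both patterns match, but the greedy one has to back off from the end of the line to find its match and ends up capturing more than the username.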

As far as the regex goes, it seems from your code that this will do just fine:

/<!-- USER \d+ - (\S+) -->/i

That is, match <!-- USER followed by a space, one or more digits, a space, a minus, a space, one or more non-whitespace characters, a space, and finally -->. All of this case-insensitively.
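
To check that reading of the regex, here is a short sketch run against three hypothetical lines (the usernames and numbers are made up for illustration):

```perl
use strict;
use warnings;

my $re = qr/<!-- USER \d+ - (\S+) -->/i;

# Hypothetical test lines: a plain match, a lowercase "user" (allowed
# by /i), and a line with no number, which must not match.
my @samples = (
    '<!-- USER 12 - dave -->',
    '<!-- user 3 - Erin -->',
    '<!-- USER - nobody -->',
);

my @found;
for my $line (@samples) {
    # Record the captured username, or a marker when the line fails.
    push @found, ($line =~ $re ? $1 : 'no match');
}

print "$_\n" for @found;
```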

Although non-backtracking subpatterns admittedly will help you somewhat in making your code faster, I would not use them if they're not really needed: they would just obscure what is happening.

Putting it all together, you would end up with something like this:

my @users;
foreach (@lines) {
    /<!-- USER \d+ - (\S+) -->/i and push @users, $1;
}

You may see people doing the same thing like this:

my @users = map { /<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

What is happening here is that, for each element of @lines, you check whether the line matches your regex. If so, you add the value of $1 (the username) to @users; if not, you add an empty list (i.e. nothing). Knowing this idiom might come in handy when reading other people's code.
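
If you want to convince yourself that the two forms are equivalent, a quick sketch along these lines should do, assuming some hypothetical input in @lines:

```perl
use strict;
use warnings;

# Hypothetical input: two marker lines and one line that does not match.
my @lines = (
    '<!-- USER 1 - alice -->',
    '<p>unrelated markup</p>',
    '<!-- USER 22 - bob -->',
);

# The explicit loop: push the username whenever the line matches.
my @loop_users;
foreach (@lines) {
    /<!-- USER \d+ - (\S+) -->/i and push @loop_users, $1;
}

# The map version: a non-matching line contributes the empty list.
my @map_users = map { /<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

print "loop: @loop_users\n";
print "map:  @map_users\n";
```

Both arrays end up holding the same usernames in the same order; which form you prefer is mostly a matter of taste.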

Hope this helps.

— Arien

Edit: Also, if you know that what you are looking for can only appear at the start of the line, you can speed things up by anchoring your regex (using ^) like this:

/^<!-- USER \d+ - (\S+) -->/i
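
Keep in mind that anchoring also changes what matches, so it is only safe when the comment really does start the line. A minimal sketch, assuming hypothetical input where one marker is preceded by other text:

```perl
use strict;
use warnings;

# Hypothetical input: the second marker does not start its line.
my @lines = (
    '<!-- USER 5 - frank -->',
    '  text <!-- USER 6 - grace -->',
);

# Anchored: only tried at the start of each line, so it fails fast,
# but it also skips the marker that appears mid-line.
my @anchored = map { /^<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

# Unanchored: tried at every position in the line.
my @unanchored = map { /<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

print "anchored:   @anchored\n";
print "unanchored: @unanchored\n";
```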
