http://www.perlmonks.org?node_id=391491


in reply to Matching against list of patterns

For this matching, is there only one user per line? Once a user is found on a line, can that user be found again on a later line? I've written a sample for each combination of answers. Each of these can probably be optimized further for your specific data. Which one fits depends on which problem you are actually solving.

There is a bug in your original code. You looped over keys %$users and then dereferenced the key directly, as in $user->{'Pattern'}. $user is a plain string, so that is a symbolic reference. use strict (specifically strict 'refs') would have caught that bug for you. You meant to write $users->{ $user }{ 'Pattern' }, which looks up the entry stored under the key $user in the hash reference $users.
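A small sketch of the difference follows. The sample data and the sub names pattern_buggy and pattern_fixed are hypothetical, made up for illustration. The data only assumes that $users has the same shape as in your post. Under use strict, the buggy lookup dies at run time instead of silently reading a package variable.

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the original post's $users:
# user name => { Pattern => '...' }
my $users = {
    alice => { Pattern => 'alice' },
    bob   => { Pattern => 'b[o0]b' },
};

# Buggy: treats the key (a plain string) as a hash reference.
# Under "use strict" this symbolic reference dies at run time.
sub pattern_buggy {
    my ($user) = @_;
    return $user->{'Pattern'};
}

# Fixed: look the key up in the hash reference first.
sub pattern_fixed {
    my ( $users, $user ) = @_;
    return $users->{$user}{'Pattern'};
}

for my $user ( sort keys %$users ) {
    my $ok = eval { pattern_buggy($user); 1 };
    print "pattern_buggy('$user') died: $@" unless $ok;
    print "pattern_fixed('$user') = ", pattern_fixed( $users, $user ), "\n";
}
```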

There is a potential bug depending on your data. The string "aa" matches both the pattern "a" and the pattern "aa". If you stop at the first pattern that matches, the longer and perhaps more correct pattern may never be tried. Because keys returns hash keys in no particular order, which pattern counts as "first" is effectively arbitrary. You may need to compare the length of each match to decide which pattern matched "better". None of my examples correct for this.
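If you do need the "better" match, one option is to try every pattern and keep the one whose match is longest. Here is a minimal sketch of that idea. It assumes a hypothetical helper named best_match and made-up sample data. It measures each match with the @- and @+ arrays, which hold the start and end offsets of the last successful match. Ties go to the name that sorts first, so the result does not depend on hash order.

```perl
use strict;
use warnings;

# Hypothetical users whose patterns overlap: "aa" matches both.
my $users = {
    short => { Pattern => 'a' },
    long  => { Pattern => 'aa' },
};
$_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

# Return the user whose pattern produces the longest match in $line,
# or undef if no pattern matches.
sub best_match {
    my ( $users, $line ) = @_;
    my ( $best_user, $best_len );
    for my $user ( sort keys %$users ) {
        next unless $line =~ $users->{$user}{'CompiledPattern'};
        my $len = $+[0] - $-[0];    # length of the text this pattern matched
        if ( !defined $best_len || $len > $best_len ) {
            ( $best_user, $best_len ) = ( $user, $len );
        }
    }
    return $best_user;
}

for my $line ( 'xaax', 'xyz' ) {
    my $user = best_match( $users, $line );
    print "$line: ", ( defined $user ? $user : 'no match' ), "\n";
}
```

This tries every pattern on every line, so it gives up the early exit that the first-match variants rely on.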

Each line may match multiple users, and once a user is found it is not looked for again. This may be the fastest, because each line can remove several users from the search space at once.

    # Precompile all the patterns and store them under the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

    my %unmatched_users;
    @unmatched_users{ keys %$users } = ();

    while ( my $line = <> ) {
        # ...
        my @users = grep $line =~ $users->{$_}{'CompiledPattern'},
            keys %unmatched_users;
        if ( @users ) {
            warn "Great, we found "
                . join( ', ', map $_->{'Pattern'}, @{$users}{ @users } )
                . " user(s)!\n";
            delete @unmatched_users{ @users };
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match one user. Once a user is found, it is not looked for again. This may be the fastest, because each successful match shrinks the search space, and once a line matches, no further patterns are tried on that line.

    use List::Util 'first';

    # Precompile all the patterns and store them under the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

    my %unmatched_users;
    @unmatched_users{ keys %$users } = ();

    while ( my $line = <> ) {
        # ...
        # first() returns the matching key (a user name), or undef
        my $user = first { $line =~ $users->{$_}{'CompiledPattern'} }
            keys %unmatched_users;
        if ( defined $user ) {
            # $user is a string, so look it up in $users
            warn "Great, we found pattern $users->{$user}{'Pattern'}!\n";
            delete $unmatched_users{ $user };
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match *one* user but users may be found on multiple lines. The search space remains constant.

    use List::Util 'first';

    # Precompile all the patterns and store them under the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

    while ( my $line = <> ) {
        # ...
        my $user = first { $line =~ $users->{$_}{'CompiledPattern'} }
            keys %$users;
        # Test definedness, not truth: a user named "0" is false
        if ( defined $user ) {
            warn "Great, we found pattern $users->{$user}{'Pattern'}!\n";
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match multiple users, and users may be found on multiple lines. This is the worst case, and it is essentially what your original code already did.

    # Precompile all the patterns and store them under the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

    while ( my $line = <> ) {
        # ...
        my @users = grep $line =~ $users->{$_}{'CompiledPattern'},
            keys %$users;
        if ( @users ) {
            warn "Great, we found "
                . join( ', ', map $_->{'Pattern'}, @{$users}{ @users } )
                . " user(s)!\n";
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }
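To compare the four variants on the same input, here is a self-contained sketch. It folds them into one hypothetical sub, scan, with two flags. $multi chooses between "a line may match several users" and "first match only". $remove chooses whether a found user is dropped from the search. The sample data is made up. Unlike the snippets above, it returns the matches for each line instead of calling warn. It also sorts the candidate names so that first() is deterministic for the demo. Your real code walks keys in hash order.

```perl
use strict;
use warnings;
use List::Util 'first';

# Hypothetical sample data with the same shape as the post's $users.
my $users = {
    alice => { Pattern => 'alice' },
    bob   => { Pattern => 'bob' },
};
$_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

# Returns an array ref with one entry per line: an array ref of the
# user names found on that line.
#   $multi  - true: a line may match several users; false: first only
#   $remove - true: a user is not looked for again once found
sub scan {
    my ( $users, $multi, $remove, @lines ) = @_;
    my %pending;
    @pending{ keys %$users } = ();
    my @result;
    for my $line (@lines) {
        my @candidates = $remove ? keys %pending : keys %$users;
        @candidates = sort @candidates;    # deterministic order for the demo
        my @found;
        if ($multi) {
            @found = grep { $line =~ $users->{$_}{'CompiledPattern'} }
                @candidates;
        }
        else {
            my $user = first { $line =~ $users->{$_}{'CompiledPattern'} }
                @candidates;
            @found = ($user) if defined $user;
        }
        delete @pending{@found} if $remove;
        push @result, [@found];
    }
    return \@result;
}

my @lines = ( 'alice and bob', 'bob again', 'nobody' );
for my $case ( [ 1, 1 ], [ 0, 1 ], [ 0, 0 ], [ 1, 0 ] ) {
    my ( $multi, $remove ) = @$case;
    my $r = scan( $users, $multi, $remove, @lines );
    printf "multi=%d remove=%d: %s\n", $multi, $remove,
        join( ' | ', map { join( ',', @$_ ) || '-' } @$r );
}
```

With this data, the first line finds both users only in the two multi-match variants. "bob again" finds bob in every variant except the one that matches several users per line and removes them once found.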