PerlMonks  

How do I match for several strings, matching the longest string first?

by g man (Initiate)
on Apr 16, 2000 at 19:42 UTC [id://7769]

g man has asked for the wisdom of the Perl Monks concerning the following question:

The following is an excerpt of a text file:
    right lymph node
    lymph fluid
How would I match the longest string first, then the shorter one? In the example above, I would want my program to print out "lymph node" as the tissue type in line 1 but "lymph" in line 2. I have a table of relevant terms to match against, but there are separate entries for "lymph" and "lymph node".
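A minimal sketch of why the order of the terms matters, assuming they are combined into a single regex alternation: Perl tries the alternatives left to right and keeps the first one that succeeds at a given position, not the longest one. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my @lines = ('right lymph node', 'lymph fluid');

# Alternatives in the order a term table might list them: shorter term first
my $naive  = qr/\b(lymph|lymph node)\b/;
# The same alternatives with the longest term listed first
my $sorted = qr/\b(lymph node|lymph)\b/;

for my $line (@lines) {
    my ($n) = $line =~ $naive;     # list context returns the captured term
    my ($s) = $line =~ $sorted;
    print "$line: naive => $n, longest-first => $s\n";
}
```

On "right lymph node" the naive pattern reports "lymph", because "lymph" followed by a word boundary already succeeds before "lymph node" is ever tried; the longest-first pattern reports "lymph node". Both report "lymph" for "lymph fluid".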

Originally posted as a Categorized Question.

Re: How do I match for several strings, matching the longest string first?
by btrott (Parson) on Apr 16, 2000 at 22:14 UTC
    I think a good solution would be to sort the terms that you're matching for and create a regexp of the strings in sorted order. Sort them so that the regexp tries to match the longest string first, then moves on down in length until it's trying to match the shortest one.

    Something like the following should work:

    use strict;
    use warnings;

    my @terms = ('lymph', 'lymph node');
    my @text  = ('right lymph node', 'lymph fluid');

    # create a regexp that will match the longest
    # string first and capture the string that matched;
    # quotemeta escapes any regex metacharacters in the terms
    my $words = '\b(' . join('|', map { quotemeta($_) } sort { length $b <=> length $a } @terms) . ')\b';

    for my $text (@text) {
        if ($text =~ /$words/) {
            print $text, ": matched => ", $1, "\n";
        }
    }
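    A minimal sketch of the same idea packaged for reuse, assuming the term table is available as a plain list. `build_matcher` is a hypothetical helper name, not part of any library. Compiling the pattern once with `qr//` avoids rebuilding it for every line, and a `/g` match in list context returns every term found in a line, trying the longest term first at each position.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one regex that tries the longest term first
sub build_matcher {
    my @terms = @_;
    my $alt = join '|',
        map  { quotemeta($_) }                 # escape regex metacharacters
        sort { length $b <=> length $a } @terms;
    return qr/\b($alt)\b/;
}

my $matcher = build_matcher('lymph', 'lymph node');

# /g in list context returns every non-overlapping capture, left to right
my @found = ('right lymph node and left lymph' =~ /$matcher/g);
print join(', ', @found), "\n";
```

    Here the first match consumes "lymph node", so the later scan only sees the trailing "lymph" as a separate hit.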
Re: How do I match for several strings, matching the longest string first?
by chromatic (Archbishop) on Apr 17, 2000 at 01:42 UTC
    Another option is to arrange your search terms into a sorted list:
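    use strict;
    use warnings;

    my @terms = sort { length $b <=> length $a } ('lymph', 'lymph node');
    my @text  = ('right lymph node', 'lymph fluid');
    my %results;

    foreach my $term (@terms) {
        # grep in scalar context returns the number of matching lines
        $results{$term} = grep /\b\Q$term\E\b/, @text;   # count matches
        @text = grep !/\b\Q$term\E\b/, @text;            # remove matched lines
    }

    # report in longest-first order instead of random hash order
    foreach my $term (@terms) {
        print "$term:\t", $results{$term}, "\n";
    }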
    This is likely less expensive than building one large regexp when there are many search terms, though the extra grep that removes the matched lines on each pass may eat into the savings.
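    A minimal sketch adapting the loop above to report one tissue type per line, which is what the question asks for. It assumes the term table fits in a list, and `tissue_type` is a hypothetical name. Because the terms are tried longest first, the first hit is the longest matching term; note that this picks the longest term anywhere in the line, not necessarily the leftmost one.

```perl
use strict;
use warnings;

# Terms sorted longest first, as in the reply above
my @terms = sort { length $b <=> length $a } ('lymph', 'lymph node');

# Hypothetical helper: return the longest term found in $line, or undef
sub tissue_type {
    my ($line) = @_;
    for my $term (@terms) {
        return $term if $line =~ /\b\Q$term\E\b/;
    }
    return;    # no term matched
}

for my $line ('right lymph node', 'lymph fluid') {
    my $type = tissue_type($line);
    print "$line => ", (defined $type ? $type : 'no match'), "\n";
}
```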
