PerlMonks
Efficient matching with accompanying data
by Endless (Beadle) on Jul 11, 2013 at 00:23 UTC ( [id://1043602]=perlquestion )
Endless has asked for the wisdom of the Perl Monks concerning the following question:

Hello friends,

I am converting a lexical processor I wrote in Java to Perl; text processing is supposed to be very good in Perl, and I'm using it as an opportunity to learn Perl. However, although my initial write-up produces the right output, it does so around 70 times slower than my Java implementation, where I was using a home-made Trie. According to Devel::NYTProf, the hangup is in _walk_tree of Tree::Trie, which brings me to my question: what is a highly time-effective way to match words and/or phrases against a target sentence, where the match also returns (or allows access to) supplementary data on the matched item?

Here is the algorithm I need to implement efficiently:
Supplementary data includes topics and sentiment values corresponding to each word/phrase in my dictionary. In the end, I need to know all the topics that match in each tweet.

Important caveat: The dictionary may include multi-word entries, so these need to be matched as well and preferred over shorter matches.

The Question

What might be the best Perl structure to fulfill my needs for:
Is there a more efficient tree implementation? Is Perl's internal hash implementation likely to offer sufficiently efficient alternatives? Can you think of something I'm missing?

Thank you very much for your help!

Update: For my project, the best results were in line with BrowserUK's suggestion: hashes were vastly superior, although getting multi-word matches was a little trickier than it would have been with regexes. Switching from Trie to Hash improved my speed by a factor of nearly 800.
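For readers wondering what a hash-based approach with multi-word matches might look like, here is a minimal sketch. It is not the code from my project or from BrowserUK's reply; the dictionary entries, the topic and sentiment values, and the names %dict, $max_words and match_phrases are all made up for illustration. It assumes the text can be split into lowercase \w+ tokens and that dictionary keys are lowercase phrases joined by single spaces. At each position it tries the longest candidate phrase first, so longer entries win over shorter ones.

```perl
use strict;
use warnings;

# Hypothetical dictionary: lowercase phrase => supplementary data.
# All entries and values here are invented for illustration.
my %dict = (
    'new york'       => { topic => 'places', sentiment => 0 },
    'new york times' => { topic => 'media',  sentiment => 0 },
    'great'          => { topic => 'praise', sentiment => 1 },
);

# Length (in words) of the longest dictionary entry, so the matcher
# never builds candidate phrases longer than anything in the hash.
my $max_words = 0;
for my $phrase (keys %dict) {
    my @parts = split ' ', $phrase;
    $max_words = @parts if @parts > $max_words;
}

# Greedy longest-match scan: returns one hashref per match, in text order,
# holding the matched phrase plus its supplementary data.
sub match_phrases {
    my ($text, $dict, $max_words) = @_;
    my @words = lc($text) =~ /\w+/g;    # assumed tokenization
    my @hits;
    my $i = 0;
    while ($i < @words) {
        my $longest = @words - $i;
        $longest = $max_words if $longest > $max_words;
        my $matched = 0;
        # Try the longest candidate first so multi-word entries are preferred.
        for (my $len = $longest; $len >= 1; $len--) {
            my $phrase = join ' ', @words[$i .. $i + $len - 1];
            next unless exists $dict->{$phrase};
            push @hits, { phrase => $phrase, %{ $dict->{$phrase} } };
            $i += $len;                 # skip past the words just consumed
            $matched = 1;
            last;
        }
        $i++ unless $matched;           # no entry starts here; move on
    }
    return @hits;
}

my @hits = match_phrases('I read the New York Times in New York, great!',
                         \%dict, $max_words);
for my $hit (@hits) {
    print "$hit->{phrase}: topic=$hit->{topic}, sentiment=$hit->{sentiment}\n";
}
```

Each position costs at most $max_words hash lookups, so the work grows with the length of the tweet rather than with the size of the dictionary. Collecting all topics for a tweet is then just a matter of walking the returned list.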