Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

searching problem

by baxy77bax (Chaplain)
on Jan 18, 2013 at 10:17 UTC ( #1014021=perlquestion: print w/replies, xml ) Need Help??
baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a little problem here. I also have a solution, but maybe someone has a better one. So the problem is as follows: I have a list of words like :

mamy dady bal foo bar ymamaa ...
and i need to find out if there is a word which reverse can be matched to any other word in my set. Example:

(reverse) (match) mamy -> ymam -> ymamaa
The easiest solution here would be to create a reverse of every word, join them to the original set, label them , sort them (lex. of course). That way 'ymam' and 'ymamaa' would be one next to another and since I have labelled them I can say in O(c) (c=constant) time if there exists a match to the reverse of 'mamy' or not.

Does anyone have a better solution to this ??

Thank you,


Update: Thnx chordoba that is true, but i missed out one simple detail. These sets are huge. Therefore regex-ing would take much longer then my sorting suggestion. But yes you are correct (++) Thnx

Update : Yes only from the start ! :) That is also a good solution , but trie's are a bit memory heavy wouldn't you say. so that is why I thought sorting is the simplest one plus building a trie takes in practice much longer then sorting. Wouldn't you say?

Replies are listed 'Best First'.
Re: searching problem (reverse substrings)
by LanX (Chancellor) on Jan 18, 2013 at 10:33 UTC
    You mean matching only from the start?

    Your sorting technique sounds clever, but the sorting itself isn't constant, I suppose something like O(n*log(n)).

    IMHO the fastest solution should be a trie, if it's worth the overhead depends on the number of words to search.

    Another simpler approach is abusing the regex engine, because from 5.10 on it has internal trie optimization and is f*ckingly fast!

    $str = join "\0",@wordlist, @reverts = map { "\0" . reverse $_ } @wordlist; $pattern = '(' . join('|',@reverts) . ')' ; @matches = ( $str =~ /$pattern/g );

    thats untested pseudo-code I hope you get the idea! =)


    now tested:

    DB<131> @wordlist=qw(mamy dady bara foo bic ymamaa arabic ); => ("mamy", "dady", "bara", "foo", "bic", "ymamaa", "arabic") DB<132> $str = join "\0",@wordlist, => "mamy\0dady\0bara\0foo\0bic\0ymamaa\0arabic" DB<133> $pattern = join '|',map { "\0" . reverse $_ } @wordlist; => "\0ymam|\0ydad|\0arab|\0oof|\0cib|\0aamamy|\0cibara" DB<134> @matches = ( $str =~ /($pattern)/g ); => ("\0ymam", "\0arab")

    Cheers Rolf

Re: searching problem
by choroba (Chancellor) on Jan 18, 2013 at 10:27 UTC
    You can create a regex that matches any of the reversed words:
    #!/usr/bin/perl use warnings; use strict; my @words = qw(mamy dady bal foo bar ymamaa arabic); my $regex = join '|', map scalar reverse, @words; $regex = qr/$regex/; print "$_\n" for grep /$regex/, @words;
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      A small simplification:
      my $regex = reverse join '|', @words;

      The way I understood the OP was print out the word which when in reverse matches any other word. Your solution does the opposite. For example instead of printing bar since 'rab' matches in arabic, it prints arabic. ymamaa was also printed since 'mamy' in reverse matches it. My point is that arabic in reverse doesn't match anything.

      #!/usr/bin/perl use warnings; use strict; my @words = qw(mamy dady bal foo bar xrabbit ymamaa arabic); my $list = join ' ', @words; foreach( @words ) { my $rev = reverse $_; print "$_\n" if $list =~ /$rev/; } __END__ mamy bar
Re: searching problem
by RichardK (Parson) on Jan 18, 2013 at 11:58 UTC

    If you need to run this often and your word lists are fairly constant, store them in indexed database tables.

    sqlite is a simple way to experiment, and databases are designed for that sort of job :)

    Then you only have to create the reverse strings once when you populate the DB.

Re: searching problem
by Khen1950fx (Canon) on Jan 18, 2013 at 17:24 UTC
    I was thinking of something along these lines:
    #!/usr/bin/perl -l BEGIN { $| = 1; $^W = 1; } use strict; use warnings; use List::MoreUtils 'any'; my($string) = q(mamy dady bal foo bar); my(@allwords) = split(" ", $string, 0); my($revwords) = join(" ", reverse(@allwords)); my(@strings_list) = scalar reverse $revwords; my $match = any( sub { /^ymam*/}, @strings_list, ); print "match" if $match; print "no match" unless $match;
      my($string) = q(mamy dady bal foo bar);

      Your wordlist doesn't contain anything that will match what the OP requested.

      my(@strings_list) = scalar reverse $revwords;

      This will put all the words into the first element of @strings_list so there is no need to use any(). Would have been much simpler to just reverse the original string and then split it into an array.

      my $rev = reverse $string; my @strings = split ' ', $rev; print join ',',@strings;
      sub { /^ymam*/}, @strings_list,

      ^ymam* will match 0 or more 'm's after yma which is not going to work. The * is not needed here.

Re: searching problem
by punch_card_don (Curate) on Jan 18, 2013 at 16:12 UTC
    What about converting the letters to numbers and using a reverse check-sum?

    Time flies like an arrow. Fruit flies like a banana.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1014021]
Front-paged by Arunbear
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2017-05-29 21:25 GMT
Find Nodes?
    Voting Booth?