Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Weird "soundex" algorithm

by giulienk (Curate)
on Aug 28, 2003 at 14:59 UTC ( #287394=perlquestion: print w/ replies, xml ) Need Help??
giulienk has asked for the wisdom of the Perl Monks concerning the following question:

I need to work out a bunch of data with an algorithm i didn't choose. The algorithm has nothing to do with the "real" soundex algorithm (see Text::Soundex), even though it was called like that where i read it. So from now on i will baptize this algorithm Weirdex; this is described to receive a string in input and then:
  • Strip chars outside [a-zA-Z] range
  • Take all consonants except doubles (that should be reduced to one)
  • Take just the first vowel

Examples:
OriginalWeirdex
giulienkgilnk
larry walllarywl
etheroskedasticityethrskdstcty

My first coding of Weirdex is

sub weirdex { local $_ = shift; my ($weirdex, $vowel) = ('', 0); s/[^a-zA-Z]//g; for (split '') { if (/[aeiou]/i) { $vowel++ or $weirdex .= $_; } else { $weirdex .= $_ if substr($weirdex, -1, 1) ne $_; } } return $weirdex; }

I'm interested in more elegant/efficient solutions, only regexp solutions and even Golf/Obfu ones.
Thank you.


$|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

Comment on Weird "soundex" algorithm
Download Code
Re: Weird "soundex" algorithm
by broquaint (Abbot) on Aug 28, 2003 at 15:17 UTC
    sub weirdex { local $_ = shift; tr/a-zA-Z//dc; tr/a-zA-Z//s; m/[aeiou]/g and substr($_, pos) =~ s/[aieuo]//g; $_; } print "$_ - ", weirdex($_), $/ for "giulienk", "larry wall", "etheroskedasticity"; __output__ giulienk - gilnk larry wall - larywl etheroskedasticity - ethrskdstcty
    tr//, m// and s///, pos and substr are your friends :)
    HTH

    _________
    broquaint

      Thanks to yours and jmcnamara's reply: i totally forgot about the power of tr/// modifiers, especially c and s, which i never used before. :)


      $|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

      Nice. I was unhappy with your vowel stripping though. You use a looping match for offsets into $_ but really, you only have to find the *first* vowel and then no looping is required.

      # m/[aeiou]/g and substr($_, pos) =~ s/[aieuo]//g; /[aeiou]/ and substr( $_, $+[0] ) =~ tr/aeiou//d;
        You use a looping match for offsets into $_ but really ...
        Er, what looping? The /g matches the first vowel then saves the position of the match for substr which the replace then operates on. I didn't want to use the $+ variable because of the overhead it invokes.
        HTH

        _________
        broquaint

Re: Weird "soundex" algorithm
by jmcnamara (Monsignor) on Aug 28, 2003 at 15:20 UTC

    sub weirdex { local $_ = $_[0]; tr/a-zA-Z//s; tr/a-zA-Z//cd; my $i; s/([aeiou])/$1if!$i++/eg; $_; }

    --
    John.

Re: Weird "soundex" algorithm
by Anonymous Monk on Aug 28, 2003 at 15:37 UTC
    /[aeiou]/i should be  /[aeiouy]/i.

    Y is a vowel, isn't it?

      Sometimes, so it may not be necessary for this particular algorithm. It would also appear that two Welsh words in the dictionary use 'w' as a vowel.
      HTH

      _________
      broquaint

      Y and W are pseudo-vowels. They represent vowel sounds, but are not vowel letters. This is just one of the many contradictions between spoken and written language. I think W is a vowel in Croatian.
Re: Weird "soundex" algorithm
by rcaputo (Chaplain) on Aug 28, 2003 at 15:46 UTC
Re: Weird "soundex" algorithm
by Aristotle (Chancellor) on Aug 28, 2003 at 16:03 UTC
    use strict; use Test::More 'no_plan'; sub weirdex { local $_ = shift; tr/A-Za-z//cd; tr/A-Za-z//s; /[aeiou]/g; substr($_, pos()) =~ s/[aeiou]+//g; return $_; } my %test = ( 'giulienk' => 'gilnk', 'larry wall' => 'larywl', 'etheroskedasticity' => 'ethrskdstcty', ); my ($input, $result); is(weirdex($input), $result, "$input") while ($input, $result) = each +%test; __END__ ok 1 - etheroskedasticity ok 2 - larry wall ok 3 - giulienk 1..3

    Makeshifts last the longest.

Re: Weird "soundex" algorithm
by giulienk (Curate) on Sep 01, 2003 at 06:56 UTC
    I add a couple comments for posterity:
    • All the solutions (except the original one) have the same problem: they don't care about upper-case vowels. All examples were in lower cases so nobody noticed it.
    • The original implementation was wrong has it would consider consonant separated by a "non-first-vowel" as contigous ones and squash them if they are equal.
    • The illustrious surname "Schwartz" with its consonant-to-vowel ratio of 7.0 render this algorithm pretty useless :)


    $|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://287394]
Approved by jwest
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (7)
As of 2015-07-05 21:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (68 votes), past polls