in reply to How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet

I must agree, code please!

I'm no linguist but I like to play one on IRC!

Re: Re: How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet
by Willard B. Trophy (Hermit) on Oct 18, 2002 at 03:56 UTC
    This is what I did. I'm sorry I don't have any code; it's currently hidden away on the same publisher's system.

    1) Took the Spanish-English dictionary, munged the Spanish translations into translator's notes for the Catalan team, and then hashed each entry against its headword, such that $hash{'headword'} would contain the complete entry text.

    2) For each word in the Catalan list:
    a) checked to see if there was an exact match in the hash keys to a Spanish headword; if not:
    b) tried to apply each one of a list of ending heuristics; if one of these produced an exact match, used it, otherwise tried a fuzzy match.

    I seem to remember that String::Approx returned a list of possible matches from keys(%hash). I simply took the first one it returned. There were probably better ways of doing it, but this seemed to be adequate.
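
    For anyone who wants something to poke at, the steps above can be sketched roughly as follows. This is a reconstruction from the description, not the original program. The sample entries, the two ending rules, the names find_headword and edit_distance, and the length/4 cutoff are all made up for illustration. To keep it runnable with core Perl only, a hand-rolled Levenshtein distance stands in for String::Approx, and it picks the closest headword rather than the first candidate returned. It also assumes hortalissa is the Catalan form and hortaliza the Spanish one.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sample data: Spanish headword => complete entry text.
my %entry = (
    hortaliza   => 'hortaliza n. vegetable',
    universidad => 'universidad n. university',
    animal      => 'animal n. animal',
    lengua      => 'lengua n. tongue, language',
    ventana     => 'ventana n. window',
);

# Illustrative ending heuristics: Catalan ending => Spanish ending.
my @ending_rules = (
    [ qr/issa$/, 'iza' ],
    [ qr/tat$/,  'dad' ],
);

# Plain Levenshtein distance (stand-in for String::Approx in this sketch).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Returns (headword, how-it-was-found), or an empty list if nothing fits.
sub find_headword {
    my ($catalan) = @_;

    # a) exact match against the Spanish headwords
    return ($catalan, 'exact') if exists $entry{$catalan};

    # b) ending heuristics, accepted only if the rewritten word is a headword
    for my $rule (@ending_rules) {
        my ($from, $to) = @$rule;
        my $guess = $catalan;
        next unless $guess =~ s/$from/$to/;
        return ($guess, 'ending') if exists $entry{$guess};
    }

    # c) fuzzy fallback: closest headword within an arbitrary cutoff
    my ($best, $best_d);
    for my $headword (sort keys %entry) {
        my $d = edit_distance($catalan, $headword);
        ($best, $best_d) = ($headword, $d) if !defined $best_d || $d < $best_d;
    }
    my $limit = int(length($catalan) / 4) || 1;
    return ($best, 'fuzzy') if defined $best && $best_d <= $limit;
    return;
}

for my $word (qw(animal hortalissa universitat llengua finestra)) {
    my ($headword, $how) = find_headword($word);
    printf "%-12s => %s\n", $word,
        defined $headword ? "$headword ($how)" : 'no match';
}
```

    Trying the ending rules before the fuzzy step keeps the cheap, predictable rewrites from being overridden by a merely similar-looking headword.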

    Sorry for lack of code. If it's any consolation, I remember what I called the program: The Hortalizer. That's because hortalissa is Spanish for vegetable, while the Catalan is hortaliza. I even put in one guess, erm heuristic, just to catch this word.

    --
    foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$_=unpack('B8',$_);tr,01,\40#,;print$_,"\n";}##IYDKINT!

      That's because hortalissa is Spanish for vegetable, while the Catalan is hortaliza.

      I'm afraid you mixed up the Spanish (hortaliza) and the Catalan (hortalissa) words. If I remember correctly, there are no words in Spanish with two s in a row.

      See http://www.diccionarios.com/ for a Catalan-Castilian Spanish, Castilian Spanish-Catalan Dictionary.

      -- Ricardo
      Use MacPerl;

        Uh, thanks. I did say I'm no linguist in my original post …

        --
        foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$_=unpack('B8',$_);tr,01,\40#,;print$_,"\n";}##IYDKINT!