PerlMonks  

Re: Re: How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet

by Willard B. Trophy (Hermit)
on Oct 18, 2002 at 03:56 UTC (#206212)


in reply to Re: How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet
in thread How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet

This is what I did -- I'm sorry I don't have any code; it's currently hidden away on the same publisher's system.

1) Took the Spanish-English dictionary, munged the Spanish translations into translator's notes for the Catalan team, and then hashed each entry against its headword, such that $hash{'headword'} would contain the complete entry text.

2) For each word in the Catalan list:
a) checked to see if there was an exact match in the hash keys to a Spanish headword; if not:
b) tried to apply each one of a list of ending heuristics; if one of these matched exactly, used it, otherwise tried a fuzzy match.
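
A minimal sketch of steps 1, 2a and 2b, for anyone wanting to try the idea. It is not the original program, which the post says is unavailable. The tab-separated entry format, the toy entries, the single -tat/-dad ending rule and the name find_headword are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed input format: "headword<TAB>rest of the entry", one entry per line.
# The real dictionary format is not described in the post.
my @entries = (
    "casa\tn. house",
    "universidad\tn. university",
);

# Step 1: hash each complete entry against its headword.
my %hash;
for my $entry (@entries) {
    my ($headword) = split /\t/, $entry, 2;
    $hash{$headword} = $entry;
}

# Hypothetical ending heuristics: [ Catalan ending, Spanish ending ].
my @endings = ( [ 'tat', 'dad' ] );

# Steps 2a and 2b: return the matching Spanish headword, or undef when
# the word would have to fall through to the fuzzy match.
sub find_headword {
    my ($word) = @_;

    # 2a: exact match against the hash keys
    return $word if exists $hash{$word};

    # 2b: rewrite the ending, then insist on an exact match
    for my $rule (@endings) {
        my ($from, $to) = @$rule;
        my $guess = $word;
        next unless $guess =~ s/\Q$from\E$/$to/;
        return $guess if exists $hash{$guess};
    }
    return;
}

for my $catalan (qw(casa universitat llibre)) {
    my $headword = find_headword($catalan);
    print "$catalan: ",
        defined $headword ? $hash{$headword} : 'no exact or heuristic match',
        "\n";
}
```

Keeping the heuristics in a list means a one-off rule for a single awkward word can be added without touching the lookup code.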

I seem to remember that String::Approx returned a list of possible matches from keys(%hash). I used the simple expedient of using the first one it returned. There were probably better ways of doing it, but this seemed to be adequate.
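
A rough sketch of that fuzzy fallback, under stated assumptions. To keep it runnable with core Perl only, it swaps in a plain Levenshtein edit distance for String::Approx; with that module installed, its amatch function (which returns the candidates that approximately match a pattern) would take the place of the grep. The toy keys, the two-edit threshold and the names edit_distance and first_fuzzy_match are invented for the example.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Toy Spanish headwords; in the post these would be keys(%hash).
my @headwords = sort qw(casa libro universidad);

# Assumed tolerance: how many single-character edits still count as a match.
my $max_edits = 2;

# Plain Levenshtein distance, a core-only stand-in for String::Approx.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Take the first candidate within tolerance, as the post describes;
# the result is undef when nothing is close enough.
sub first_fuzzy_match {
    my ($word) = @_;
    my @close = grep { edit_distance($word, $_) <= $max_edits } @headwords;
    return $close[0];
}

for my $catalan (qw(llibre xyzzy)) {
    my $hit = first_fuzzy_match($catalan);
    print "$catalan -> ", defined $hit ? $hit : '(no fuzzy match)', "\n";
}
```

Taking the first hit is the simple expedient from the post; sorting the candidates by distance and taking the closest would be one of the "better ways" it alludes to.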

Sorry for lack of code. If it's any consolation, I remember what I called the program: The Hortalizer. That's because hortalissa is Spanish for vegetable, while the Catalan is hortaliza. I even put in one guess, erm heuristic, just to catch this word.

--
foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$_=unpack('B8',$_);tr,01,\40#,;print$_,"\n";}##IYDKINT!


Re: Re: Re: How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet
by Sisyphus (Hermit) on Oct 19, 2002 at 04:49 UTC
    That's because hortalissa is Spanish for vegetable, while the Catalan is hortaliza.

    I'm afraid you mixed up the Spanish (hortaliza) and the Catalan (hortalissa) words. If I remember correctly, there are no words in Spanish with two s in a row.

    See http://www.diccionarios.com/ for a Catalan-Castilian Spanish, Castilian Spanish-Catalan Dictionary.

    -- Ricardo
    Use MacPerl;

      Uh, thanks. I did say I'm no linguist in my original post …

      --
      foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$_=unpack('B8',$_);tr,01,\40#,;print$_,"\n";}##IYDKINT!
