PerlMonks  


Depending on the number of foreign words you're looking at, it might be better to run through your files verifying the words against the dictionary for the predominant language, and flag any that do not show up.

You could write the names of the files to a "pages to check" file, and wrap the suspect words in something glaringly obvious (like the hated <blink> tags :). Then you (or your native-language editor) could look at the suspect words in context and make a decision based on that. Of course, that won't help you with words like your example that have meanings in several different languages.
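A minimal sketch of that flag-and-wrap idea, written in Python purely for illustration (the same shape translates directly to Perl). Everything named here is hypothetical: `KNOWN_WORDS` stands in for the predominant-language dictionary, and the input is assumed to be plain text, so real HTML would need its tags skipped first.

```python
import re

# Hypothetical stand-in for the predominant-language dictionary (lowercase).
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}

# Runs of letters only; digits, underscores and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def mark_suspects(text, known=KNOWN_WORDS, open_tag="<blink>", close_tag="</blink>"):
    """Return (marked_text, suspects); words missing from `known` are wrapped."""
    suspects = []

    def wrap(match):
        word = match.group(0)
        if word.lower() in known:
            return word
        suspects.append(word)
        return f"{open_tag}{word}{close_tag}"

    return WORD_RE.sub(wrap, text), suspects


def pages_to_check(pages, known=KNOWN_WORDS):
    """`pages` maps file name -> text; return the names that contain suspects."""
    return [name for name, text in pages.items() if mark_suspects(text, known)[1]]
```

Calling `mark_suspects("The Katze sat on the mat")` wraps only `Katze`, and `pages_to_check` gives the list you would write out to the "pages to check" file.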

Probably the best way to deal with that is to also flag any words that show up in more than one language dictionary.
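A rough sketch of the multi-dictionary check, in Python for illustration only. The tiny word sets and the `classify` helper are made up for the example; "gift" is used because it is a real word in both English and German.

```python
def classify(word, dictionaries, primary):
    """Classify `word` against several per-language word sets.

    `dictionaries` maps a language name to a set of lowercase words;
    `primary` is the name of the predominant language.
    Returns (status, languages_the_word_was_found_in).
    """
    w = word.lower()
    langs = sorted(lang for lang, words in dictionaries.items() if w in words)
    if not langs:
        return "unknown", langs      # in no dictionary at all: flag it
    if len(langs) > 1:
        return "ambiguous", langs    # in several dictionaries: flag for a human
    if langs == [primary]:
        return "ok", langs           # only in the predominant language
    return "foreign", langs          # only in some other language


# Hypothetical miniature dictionaries.
DICTS = {
    "en": {"gift", "house", "the"},
    "de": {"gift", "haus", "der"},
}
```

With these sets, `classify("Gift", DICTS, "en")` comes back as ambiguous with both languages listed, which is exactly the case a person needs to look at in context.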

If performance is an issue at all, you should probably avoid storing your dictionaries in a SQL database. However, using the DBI interface to one of the flat-file databases, you can achieve some pretty amazing performance, as was proved to me by grantm in this thread: Fast wordlist lookup for game.
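To show the general shape of a keyed on-disk wordlist, here is a small sketch using Python's standard `dbm.dumb` module. This is a stand-in and not the Perl DBI setup from grantm's thread; `dbm.dumb` is chosen only because it is always available, and no performance claim is made for it. The helper names are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def load_words(db, words):
    """Store each word as a key; the value is just a placeholder."""
    for word in words:
        db[word.lower()] = "1"


def is_known(db, word):
    """Keyed lookup: no scan of the whole wordlist is needed."""
    return word.lower() in db


def demo():
    """Build a tiny wordlist on disk, reopen it read-only, and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "english")
        with dbm.dumb.open(path, "c") as db:
            load_words(db, ["cat", "mat", "sat"])
        with dbm.dumb.open(path, "r") as db:
            return is_known(db, "Cat"), is_known(db, "Katze")
```

The point of the design is that the wordlist is built once and then opened read-only, so each check is a single key lookup against a file on disk.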


Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos it's an 8 hour day...unless the Governor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.


In reply to Re: detecting the language of a word? by BrowserUk
in thread detecting the language of a word? by domm
