PerlMonks  

Re^3: X-Prize: Natural Language Processing

by BrowserUk (Patriarch)
on Oct 17, 2004 at 02:03 UTC ( [id://399858]=note )


in reply to Re^2: X-Prize: Natural Language Processing
in thread X-prize software challenge?

that person would ask for context. A properly-written NLP program would do the same thing.

If I sent you a /msg saying "Why do you deprecate tie and not bless?", you would immediately be able to respond to that question.

Ask an (existing) NLP to translate that into and back from any other language, or even a host of other languages, and you get:

  1. German: "Warum mißbilligen Sie Riegel und segnen nicht?"

    "Why do you disapprove latch plates and do segnen not?"

  2. French: "Pourquoi est-ce que vous désapprouvez la cravate et ne la bénissezpas?"

    "Why do you disapprove the tie and it bénissezpas?"

  3. Chinese: "?????????????".

    "Why do you belittle the tie and do not bless?"

  4. Japanese: "????????????????"

    "Why, you criticize the tie, don't praise?"

  5. Dutch: "Waarom keurt zegent niet u band af en?"

    "Why inspects don't you bless link finished and?"

  6. Russian: "?????? ?? deprecate ????? ? ?? ???????????????"

    "Why you deprecate connection and do not bless?"

  7. Korean: "?? ? ??? ?????? ???? ????"

    "Tie under criticizing it boils it doesn't bless why it spreads out?"

That looked damned impressive when I pasted it, and rubbish now I've submitted it:(

I'm only vaguely fluent in one of those languages, and I would be hard pushed to recognise the question I asked, even though I am completely aware of the context, background and content.

How can an NLP "ask for context"? Most of the context of that question is completely absent from this post and from this entire thread; some of it depends upon information only conveyed between us (you and me) through private communications. Without having all of the background information, and/or one of us to interrogate, can you ever imagine an NLP being able to communicate the essence of the question I am asking in another language?

Even a human being, fluent in English and whichever other language(s) we want it translated into, would be extremely hard pushed to convey the essence of that question unless they also had a pretty intimate knowledge of not just programming in general, but Perl 5 specifically. Indeed, the question would be confusing and essentially meaningless, even in English, to anyone without the requisite background.

And that's my point. Human speech shorthands so much, on the basis of the speaker's knowledge of the listener's knowledge and experience. Try the mental exercise of working out just how much extra information would be required to allow another native English speaker, who has no knowledge of computers or Perl, to understand that question. I seriously doubt it could be done in fewer than 50,000 words.

Now imagine trying to translate those 50,000 words into Navajo or Inuit, such that a native speaker of those languages without computer and Perl 5 experience could understand them.

By now, you're probably thinking "But the recipient of such a message would have that experience, otherwise you wouldn't be asking them that question", and you would be right. But it is my contention that if the NLP is going to be able to convey the essence of the words 'tie' and 'bless' in the question, in suitably non-bondage-related and non-religious terms in the target language, it would need that same knowledge.

Of course, then you might say: "If the recipient knows Perl programming, then there is no need, and it would in fact be detrimental, to translate those terms at all". But then the NLP has to have that knowledge in order to know not to translate those two terms. It would also need to 'know' that the recipient had the knowledge to understand the untranslated terms!
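A minimal sketch of that last point, in Python purely for illustration. The dictionary, the domain name and the function below are all hypothetical, and a toy word-by-word lookup stands in for a real translator. A term is left untranslated only when two things are known that the message itself does not state: its domain, and that the recipient understands that domain.

```python
import re

# Hypothetical general-purpose English -> German dictionary (toy data).
GENERAL_DE = {"tie": "Krawatte", "bless": "segnen", "why": "warum"}

# Hypothetical per-domain lists of terms of art that must stay untranslated.
DOMAIN_TERMS = {"perl5": {"tie", "bless"}}


def translate_words(text, domain=None, recipient_domains=frozenset()):
    """Toy word-by-word translation that protects domain terms.

    A term is kept verbatim only if the message is known to belong to
    `domain` AND the recipient is known to understand that domain.
    """
    protected = set()
    if domain in recipient_domains:
        protected = DOMAIN_TERMS.get(domain, set())

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key in protected:
            return word  # leave the term of art untouched
        return GENERAL_DE.get(key, word)  # unknown words pass through

    return re.sub(r"[A-Za-z]+", swap, text)
```

Both `domain` and `recipient_domains` have to be supplied by the caller here; nothing in the sentence itself reveals them, which is exactly the difficulty.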

Apply that same logic to conversation between neurosurgeons, or particle physicists, or sushi chefs, or hair-stylists, or mothers.

Spoken and written language is rife with supposed knowledge and contextual inference. Just as I would have extreme difficulty trying to explain the background of the question to a Japanese sushi chef, he would have extreme difficulty in explaining the process of preparing blowfish to me.

Not only can I see no way to encapsulate all that disparate knowledge into a computer program, neither can I see how to program the computer to ask the right questions to allow the translation of such information.

And, I think, it's a critical idea. If we can come up with an intermediate language representing the actual concepts being communicated, that would revolutionize philosophy, linguistics, computer science, and a host of other fields. It's not a matter of whether this project is worthwhile. I think it's a matter of we cannot afford to not do it.

I agree with the sentiment of this, but not the approach. Not because I wouldn't like it to succeed, but because I simply do not see the time when this will be feasible. Even with Moore's Law working for us (for how much longer?), I do not see the means by which it would be achievable.

I also think that the underlying problem will be resolved before we find a technological solution to it, in a rather more prosaic, but ultimately more practical fashion. I think over time, the diversity of human language will steadily reduce until the problem "goes away".

I suspect that a single common language will become universal. I doubt it will be recognisably any one of the currently existing languages. More a bastardisation of several of the more widely spoken ones all run together. I imagine that there will be a fairly high content of English (because it's the lingua franca in so many fields already), French (because the French are stubborn enough to ensure it. Besides which, it's too nice a language to allow to die), Chinese and one of the Indian sub-continental languages (because between them they cover about a third of the world's population), and probably many bits of many others.

Basically, we (our children('s children?)) will all become bi-lingual. Our 'native' tongues, and "worldspeak".

Always assuming that we don't run out of fuel, water or air before then!


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^4: X-Prize: Natural Language Processing
by dragonchild (Archbishop) on Oct 17, 2004 at 02:54 UTC
    Maybe the goal for the NLP X-Prize should be a little more ... constrained. I was thinking about constraints this evening. What about the following change?

    A successful NLP X-Prize program should be able to translate any paragraph that is:

    • self-contained with respect to context
    • restricted to a given wordlist (maybe 1000-2500 of the most common words)
    and translate that paragraph from any one of N languages to any other of those languages. The languages would be set in the final specification, but would include the following:
    • English
    • French
    • Chinese (written)
    • Japanese (Kanji, probably)
    • Hindi
    • German
    • Russian
    • Arabic
    • Navajo (or some other Amerindian language)
    The program would perform the translation in under 1 second per word of input or output, whichever is greater.

    The wordlist would be chosen to be dialect-agnostic, as much as is possible. Most of these words would be the words most people learn in grammar school. I'm talking about words like "in", "is", "have", "run", "cat", "dog", etc.
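    As a rough sketch of how those two constraints might be checked mechanically (Python, for illustration only): both function names are made up here, the tokenizer only suits space-delimited languages such as English, and "whichever is greater" is read as the larger of the two word counts, which is an assumption on top of the wording above.

```python
import re


def out_of_list_words(paragraph, wordlist):
    """Return the sorted, de-duplicated words in `paragraph` not in `wordlist`."""
    # Naive tokenizer: runs of letters and apostrophes, lower-cased.
    words = re.findall(r"[a-z']+", paragraph.lower())
    allowed = {w.lower() for w in wordlist}
    return sorted(set(words) - allowed)


def time_budget_seconds(input_words, output_words, seconds_per_word=1.0):
    """Time allowed: seconds_per_word times the larger of the two word counts."""
    return seconds_per_word * max(input_words, output_words)
```

    A paragraph would qualify only if `out_of_list_words` returns an empty list for it. Written Chinese or Japanese would need a proper word segmenter before any such count makes sense.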

    The X-Prize wouldn't go to the program that can translate physics texts, just like the Ansari prize didn't go to the ship that would actually ferry passengers. It went to the ship that demonstrated the feasibility of technologies. Now that SpaceShipOne has succeeded, new ships will be built to actually make it a commercial venture. I would expect the same to happen for any other X-Prize.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
