Teaching computers to climb the Tower of Babel

Subtleties are important in language. I learnt this by using the phrase ‘tengo 26 anos’ in Spanish where I should have used ‘tengo 26 años’. As I discovered, the difference is slight but surprisingly meaningful.

While a computer would be fooled by my error, a Spanish speaker would likely find it hilarious but would still get my intended meaning because, in language, context is everything.

One of the most difficult things for computer translation is that context includes not only the other words in a sentence, but the state of the world, shared cultural assumptions, and even the mental state of each person in the conversation.

This may seem like an impossible problem to solve, but Wired has an article on companies trying to create better translators, and they’ve had some significant success.

They’ve achieved their success by ‘doing a Google’ and taking advantage of the fact that while it’s impossible to get a computer to understand human concepts, it is possible to use the massive amount of text on the internet as a database of human assumptions.

The computer translator generates as many translations as it can, and then compares each one to the ‘database’ of text to see which one is most like real human language. The one that matches most closely is likely to be the best translation.

There’s a bit more to it than that, but that’s the general idea.
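
To make the generate-and-match idea concrete, here is a minimal sketch in Python. It is a toy and not the method used by the companies in the Wired article: the four-sentence corpus, the candidate translations and the function names are all made up for illustration, and I've assumed a simple word-pair (bigram) count with add-one smoothing as the measure of ‘most like real human language’.

```python
from collections import Counter
import math

START = "<s>"  # marker for the start of a sentence


def bigrams(words):
    """Return adjacent word pairs, with a start marker in front."""
    padded = [START] + words
    return list(zip(padded, padded[1:]))


def train(corpus):
    """Count words and word pairs in a list of example sentences."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        word_counts.update([START] + words)
        pair_counts.update(bigrams(words))
    return word_counts, pair_counts


def score(candidate, word_counts, pair_counts):
    """Average log-probability per word pair, using add-one smoothing."""
    pairs = bigrams(candidate.lower().split())
    if not pairs:
        return float("-inf")  # an empty candidate can never win
    vocab_size = len(word_counts)
    total = 0.0
    for prev, word in pairs:
        probability = (pair_counts[(prev, word)] + 1) / (word_counts[prev] + vocab_size)
        total += math.log(probability)
    return total / len(pairs)


def best_translation(candidates, corpus):
    """Pick the candidate that looks most like the example text."""
    word_counts, pair_counts = train(corpus)
    return max(candidates, key=lambda c: score(c, word_counts, pair_counts))


# Hypothetical stand-in for 'the text on the internet'.
corpus = [
    "i am 26 years old",
    "she is 30 years old",
    "he has two dogs",
    "i have a cat",
]

# Hypothetical candidate translations of 'tengo 26 años'.
candidates = ["i have 26 years", "i am 26 years old"]

best = best_translation(candidates, corpus)
print(best)
```

The word-for-word candidate loses because ‘have 26’ never appears in the example text, while every word pair in the idiomatic candidate does. A real system would need vastly more text and a far more careful way of scoring, but the shape of the idea is the same.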

For the first time in history, the internet has provided a massive amount of self-generating human data that can be easily accessed by our tools of analysis.

Rather than expecting computers to be individually intelligent, it might be more fruitful to get them to process the structure of behaviour, and get meaning from the real humans.

Link to Wired article ‘Me Translate Pretty One Day’.