Language Translation in the Internet Age
‘My Hovercraft Is Full of Eels’
by Nancy K. Herther
Sociology/Anthropology Librarian, University of Minnesota Libraries
Remember this phrase from the British comedy show Monty Python’s Flying Circus? It came from a sketch about a poorly translated English-Hungarian phrasebook. (You can watch it again at www.youtube.com/watch?v=G6D1YI-41ao.) Funny, yes, but in today’s multilingual, global economy, poor translation is no laughing matter.
A July 2011 headline on the BBC website read: “Spelling Mistakes ‘Cost Millions’ in Lost Online Sales” [www.bbc.co.uk/news/education-14130854]; the article notes that “when you sell or communicate on the internet, 99% of the time it is done by the written word.” This is hardly earth-shattering news, but it still highlights the importance of words to websites or any type of communication. The article cited statistics suggesting that a single spelling mistake on an ecommerce site can affect credibility to the point of losing half of a company’s potential revenue.
The internet has become a major communications system engulfing all aspects of commerce, government, education, information, healthcare, and other arenas. However, webpages are developed not only to convey information, but to market items. Webpages are designed to attract users and to keep them coming back. Along with poor design and typos, issues of unclear messages plague many websites today.
The first part of this article series covered transcription and voice-command searching over the internet (“Voice Recognition Arrives!,” Vol. 19, No. 9, November 2011, pp. 20–29, 46). Language translation is another key area in which technology promises to assist searchers.
A Profusion of Languages
With the use of computers for nearly everything today — news, shopping, education, communication, information — it seems only natural to assume that computers will somehow play a major role in helping us deal with the increasingly global aspect of the internet and, in particular, the profusion of languages being used.
Take, for example, the case of the European Union. The EU is currently composed of 27 individual member states. Today’s EU webpage offers a choice of 23 languages in order to “localize” EU information and opportunities for its members [http://europa.eu]. Today, the EU is also involved in 20 research projects related to the “interface of language and digital content, supported by €67 million of EU funding and the new projects submitted this year will get an additional €50 million.” In order to “ensure more accessibility to web content for everyone,” the EU has embarked on a program called the Digital Agenda for Europe [http://ec.europa.eu/information_society/digital-agenda/index_en.htm].
One of the issues, which surfaced in EU-sponsored research released in January 2011, provides telling insight into user behavior in a multilingual environment:
While 90% of internet surfers in the EU prefer to access websites in their own language, 55% at least occasionally use a language other than their own when online. …
However, 44% of European Internet users feel they are missing interesting information because web pages are not in a language that they understand and only 18% buy products online in a foreign language. The results underline the need for investment in online translation tools so that EU Internet users are not excluded from finding information or products online because they lack the language skills.
The Paradox of Translation
Translation is intended to make communication more precise and accessible to an audience. However, the problems in trying to translate information in a global community can seem overwhelming.
The enormity and complexity of the translation problem can be seen in the experience of India. The country has 22 official languages, plus a hundred others, spoken across a population of more than 1.2 billion people. A concerted effort led by the Language Technologies Research Centre [http://ltrc.iiit.ac.in] is applying advanced technology, statistical machine learning, and dictionary- and rules-based algorithms to make it easier to translate from any one of these 122 languages to another, allowing for better communication within India. A prototype system was launched in 2010 [http://sampark.iiit.ac.in/sampark/web/index.php/content].
Studies have shown, as with the EU case, that factors impacting ongoing use of a website include its being in one’s native language and meeting expectations and cultural sensibilities. (See Figure 1 below.) “On the Internet, Web users spend more time and come back more often to the Web sites that are in their native language and appeal to their cultural sensibilities. Visitors to a Web site would stay twice as long if the content on the Web site were available in their own language. Their willingness to buy something online increases by at least four times if the Web site is localized to meet their needs to thoroughly research the product and the company” (He, Shaoyi. “Multilingual Issues in Global E-Commerce Web Sites,” 21st Century Management: A Reference Handbook. 2007. SAGE Publications, pp. 391–400.)
True Seamless Global Communication — A Pipe Dream?
Real-time, ongoing, and especially speech-to-speech translation has been called a pipe dream. Still, if a truly global economy and community are to be achieved, transparent communication is essential, not only for global communication but for precise search and retrieval as well.
We’ve long known that certain languages, especially English, dominate communication. According to Ethnologue: Languages of the World, “It turns out that 389 (or nearly 6%) of the world’s languages have at least one million speakers and account for 94% of the world’s population. By contrast, the remaining 94% of languages are spoken by only 6% of the world’s people” [www.ethnologue.com/ethno_docs/distribution.asp?by=size]. Internet World Stats still holds that English continues to dominate webpages [www.internetworldstats.com/stats7.htm]. However, as the proportion of web users shifts from Europe/North America to East Asia and other areas [www.internetworldstats.com/stats.htm], more and more websites will need to offer multilingual interfaces. (See Figures 2 and 3 on page 23.)
This shifting language tide began to turn in 2000, according to Global Reach (now called Guava A/S [www.guava.co.uk]), which found that the number of non-English-speaking web users had surpassed the number speaking English. Since then, the balance has continued to tilt, not to another single language besides English, but to a variety of disparate languages representing the spoken tongues of today’s world.
Translation Efforts Today
Even translation between just two languages can be daunting. As researchers noted in an IEEE Transactions article, “a translation from Japanese to English requires 1) a word separation process for Japanese because Japanese has no explicit spacing information, and 2) transforming the source sentence into a target sentence with a drastically different style because their word order and their coverage of words are completely different, among other factors” (Nakamura, et al., “The ATR Multilingual Speech-to-Speech Translation System,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, pp. 365–376, 2006).
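To make the first of those two steps concrete, here is a minimal Python sketch of word separation by greedy longest-match dictionary lookup. It is an illustration only and not the method of the ATR system quoted above: the `segment` function and the four-word `LEXICON` are hypothetical, and real segmenters rely on far larger dictionaries and statistical models.

```python
# Hypothetical toy lexicon; a real system would hold many thousands of entries.
LEXICON = {"私", "は", "学生", "です"}


def segment(text, lexicon):
    """Split unspaced text into words by greedy longest-match lookup."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so unknown text still advances.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(segment("私は学生です", LEXICON))  # ['私', 'は', '学生', 'です']
```

The sentence means “I am a student.” Even this toy shows why the step is hard: the result depends entirely on what the dictionary happens to contain, and a greedy match can pick the wrong boundary when words overlap.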
The U.S. Department of State actually has a classification system with which it rates languages for difficulty. The “category four” languages (meaning extremely hard for English speakers) include some of the languages spoken in the most populous areas in the world today: Arabic, Chinese, Japanese, and Korean [www.state.gov/m/fsi]. So, expecting people to learn more languages would appear a hopeless scenario.
The issue of translation has serious domestic implications as well. In the U.S., calls for English to be declared the official language (to the exclusion of others) are regularly offered at all levels of government. In October 2011, New York Governor Andrew Cuomo signed an executive order requiring 27 state agencies to provide their official forms and translation services in six languages beyond English. “For many, many years, we assumed in New York state government that it was up to the person to figure out how to communicate with government. No! The roles are reversed. It’s government’s responsibility to figure out how to communicate with the person. Government serves the person. The person doesn’t serve the government, especially in New York,” Cuomo said.
In March 2011 the European Patent Office formally signed an agreement with Google to “offer translation of patents on its website into 28 European languages, as well as into Chinese, Japanese, Korean and Russian.” After years of haggling, the EU had decided earlier that month to introduce “a common system requiring patent applications to be submitted in English, French or German — the three working languages of the EPO” [www.euractiv.com/innovation/eu-patent-office-google-seal-pact-translation-news-503495].
For years, the U.S. National Institute of Standards and Technology has annually conducted evaluations of translation systems. These detailed performance evaluations of speech translation systems were conducted to support the Defense Advanced Research Projects Agency’s (DARPA) need for quick and truly reliable translation in the field. Traditionally, military services have depended on human translators. However, they aren’t always available for often dangerous missions, so the government has actively been seeking technology-based solutions. The DARPA project, called TRANSTAC (spoken language communication and TRANSlation system for TACtical use) is currently testing handheld devices in this effort [www.nist.gov/el/isd/language_072110.cfm].
Today’s translation software benefits greatly from statistical analysis, backed by mighty computing power, to build models or algorithms that speed the translation of information on demand. For speech, these models draw on the hundreds of thousands of hours of recorded speech that Nuance and other companies have gathered. Scientists have also been analyzing documents translated by human experts (government documents, literary works, and so on), looking for patterns and building huge stores of accumulated knowledge on language use and meaning.
Clearly, linguists are dealing with something far more complex and important than direct translation of terms or words: the interpretation of meaning that is embedded in our language systems. For legal or other professional and business applications, your best bet is still using a pro. For more casual applications, there are plenty of good-quality machine translation services and products, many of them free, to help you get the gist of a text.
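As a rough illustration of how patterns can be mined from human-translated text, the Python sketch below guesses word translations from a handful of aligned sentence pairs by counting how often words appear together. Everything in it is an assumption made for the example: the four-sentence `PAIRS` corpus and the `best_translations` function are hypothetical, and the Dice co-occurrence score is a simple stand-in for the far richer statistical models that commercial systems use.

```python
from collections import Counter
from itertools import product

# Hypothetical miniature "parallel corpus" of English/French sentence pairs.
PAIRS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def best_translations(sentence_pairs):
    """Pick, for each source word, the target word it co-occurs with most strongly."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        pair_count.update(product(src_words, tgt_words))  # joint appearances

    best = {}
    for (s, t), both in pair_count.items():
        # Dice coefficient: 1.0 means the two words always appear together.
        score = 2 * both / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


print(best_translations(PAIRS))
```

With only four sentence pairs the counts already single out “maison” for “house” and “la” for “the.” Real systems apply the same idea to millions of sentences, which is why large archives of expert translations are so valuable.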
Translation Software Today
For decades researchers have sought ways to mechanize the process of translating information and documents from one language to another. Progress has been amazing, but there is still plenty of room for improvement.
For the uninformed, the terminology itself can be intimidating.
Translation: Determining the meaning of some information and communicating it in a form understandable in another language or culture. Translation usually involves text materials; interpretation is the term generally used for nonwritten forms of communication.
Machine Translation (MT): Also called automated translation, MT involves the application of computers to translate from one natural language to another; in most professional applications, however, a human is also involved in the process to provide oversight or final approval/editing.
Interpreters: Generally refers to the professionals who work with people (in courts, business, medical situations) to better communicate across languages and cultures, not in translating texts.
Translators: Traditionally refers to professionals who work with written texts or other means to ensure understanding and clear communications (such as in medical and legal instances as well as for sign language and other applications).
Computer-Assisted Translation: Also referred to as computer-aided translation or machine-aided human translation, this centers on humans who use computers to assist in their translation projects.
Today the quality of available programs, whether web services, free websites, or software for purchase, has advanced remarkably. Today’s programs, which use massive storage along with algorithms and rules based on thousands (or millions) of archived communications, can reach far higher levels of accuracy thanks to advances in understanding grammar and sentence structures, idioms, and local communication patterns. Common features present in systems include the following:
- Translation in multiple formats, from cut-and-paste phrases to PDF, DOC, TXT, HTML, and other formatted documents
- Dictionaries and other special features to more closely analyze specific words or phrases
- Links to other sources or translation features
- Spellchecking and diacritics help
- Quick movement of text from one language to others
Top Translation Software Packages Available Today
Most everyone is familiar with the “big three” translation systems available for web searchers:
Bing Translator (previously Live Search Translator and Windows Live Translator)
Microsoft’s Bing Translator integrates with several other Microsoft products, including Internet Explorer and Microsoft Office 2003 and newer. In addition, owners of websites or blogs may integrate a Bing Translator widget into their pages; the widget enables visitors to translate the page into their language of choice. Nearly 40 languages are now covered.
Google Translate
Introduced in 2006 originally for Arabic, the underlying software is also used in Babel Fish, Yahoo!, and AOL translation products. In May 2011, Google announced that the Google Translate API would be shut down; however, due to public pressure, in June it announced that a paid version of the Translate API would remain available for developers.
Yahoo! Babel Fish
With software from SYSTRAN and based on Google Translate, the product, which appeared first on Alta Vista, moved to Yahoo! in 2008.
Today, there are also a variety of software packages available — the value of any of these depends on the degree of precision that you require and the specific languages that you might need. Nothing reliably matches human translation for serious work; however, products such as these work to make daily surfing a bit easier.
No pricing is given here due to deep discounts sometimes available through different sales outlets. Many of these sites also allow for free versions, limited-time downloads for sampling products, or other options. For more comparative information, see Wikipedia’s coverage of this product category at http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications
Now in v. 9 for PC and v.3 for Mac, this package has generally received good critical review and offers translation of more than 75 different languages, full webpage and document translation, and seamless integration with Microsoft Office speller.
With links to more than 265 dictionaries in 73 languages, Foreignword allows you to translate “small blocks” of text in 60 languages.
This software translates English, Spanish, French, and Italian. It also translates a variety of documents and file types.
This is a very nice site that not only links you to multiple translation sites for small-scale translation in 13 languages but also attempts to help you decide which language a text is written in, using the Language Identifier/Guesser programs from either the University of Groningen or Fuzzums.
LingvoSoft offers a variety of software solutions for different applications, languages, and uses, including tools for learning languages. LingvoSoft Talking Translator 2011 is available for translation in specific languages: Russian, Spanish, Portuguese, Polish, Italian, German, and French.
Translate text files, blogs, email, webpages, search items, instant messages, and chats for seven languages: English, French, German, Italian, Portuguese, Russian, and Spanish.
Now in v. 9, the package translates Microsoft Office, OpenOffice.org, Firefox, and other texts as well as internet messages, PDFs, and other document types in English, German, Spanish, Portuguese, French, Italian, and Russian.
The company offers mobile and computer-level applications. Currently in v. 7, SYSTRAN “uses the same robust translation engine selected by leading Internet portals, global corporations, and the US intelligence community.” The software provides nearly instant translation of Word documents, webpages, emails, all texts, and tweets.
This site provides translation for 14 languages in various text, email, PDF, and web applications as well as plug-ins for Microsoft Word and Internet Explorer.
Intended for business applications rather than the public at large, this software provides translation for 18 languages — including five non-English bidirectional translations and one-way translation for another five languages.
WhiteSmoke Translator currently covers nine languages and provides either full-text or word-to-word translation for any text application. It includes a dictionary and thesaurus feature.
WorldLingo offers free translation of up to 500 words in 32 languages on the webpage and offers more for a fee.
This list is not intended to be comprehensive. For a good, current listing of companies, check out http://www.translationguide.com/translation_company_links.php#add
Having Gone So Far … Yet So Far Still to Go
Lack of validation and fact-checking has been a problem for searchers since the beginning of time. Searchers know well the issues raised by extremists over the internet who purposefully tinker with facts to persuade people to their causes. Translation adds its own problems. Just Googling translation will bring up examples of how poor translation has created international embarrassment for countries and their leaders.
As linguist Jarek Krajka put it: “Translation is a complex process. ... Meaning is paramount, and the translation should accurately reflect the meaning of the original. Moreover, it is the form which should also correspond; of course, often it needs to be translated as well. The register and style are to be retained, with the translator not influencing the meaning by often unintentional choice of language structures. What is more, the influence of the source language, especially in the area of translating idioms and collocations, has to be controlled and limited” [www.tewtjournal.org/VOL%204/ISSUE%204/05_YOURMOTHETONGUE.pdf].
Bill Cope of the University of Illinois–Urbana-Champaign and co-author of Towards a Semantic Web: Connecting Knowledge in Academic Research (Woodhead Publishing, 2011) believes that “barriers to meaning are progressively reduced through machine translation and semantic markup. The subtleties of natural language will never be entirely computable, because much meaning is situational, pragmatic and located in shared assumptions.”
Those subtleties are apparently a lot more complicated than we might imagine. Abubaker Almabruk, a doctoral student at the University of Leicester, analyzed the reading difference of individuals across languages and found that “Arabic readers recognise words in a different way from readers of other languages” [www.sciencedaily.com/releases/2011/05/110518080109.htm].
“For certain types of communications, today’s technology works reasonably well,” explains Stephen Arnold, consultant at ArnoldIT. “However, for context free messages, unusual accents, and certain language dialects, machine translation does not work as well as an informed native speaker. The ‘good enough’ approach is what is in the market. There is room for considerable innovation.”
Less than 3 years ago, I wrote an article for Searcher entitled “The Changing Language of Search” (February 2009, pp. 42–44, 46–50) that focused on how a form of global English was developing and the challenges it posed for searchers and our communities. As the internet continues to morph and grow, clearly the problem of accessing information is becoming more complex than I imagined then. Machine translation offers a partial solution — however, it is no panacea.
Despite this, machine translation (MT) has little to fear. John Hutchins, a renowned expert in this area, notes that “MT is being used not for ‘pure’ translation but to aid bilingual communication in an ever-widening range of situations; and MT is becoming just one component of multilingual, multimodal document (text) and image (video) extraction and analysis systems. The future scope of MT and its applications seems to be without limit.”
Walter Bacak is executive director of the American Translators Association [www.atanet.org], whose “11,000 members in more than 90 countries include translators, interpreters, teachers, project managers, web and software developers, language company owners, hospitals, universities, and government agencies.” Bacak notes that “anytime you are concerned with quality translation, a professional, qualified translator is required. Translation is so much more than just word-for-word changes, it has to include the idioms, cultural context and intention of the writing, to get to the true meaning of some text. Today business, government and education are taking place over the internet. The overwhelming number of different languages and cultures in the world cannot be handled by today’s technologies. The field is doing well and fulfilling a major need today.”
Keith Devlin, executive director of Stanford University’s Center for the Study of Language and Information, notes that “the use of statistical techniques, coupled with fast processors and large, fast memory, will certainly mean we will see better and better translation systems that work tolerably well in many situations, but fluent translation, as a human expert can do, is in my view, not achievable” (“The Elusive Goal of Machine Translation,” Scientific American special supplement, Innovations, Vol. 294, No. 3, pp. 92–95). Words are intricately interwoven with cultural and personal meanings, ideas, feelings, and tone. Today’s computers are not able to plumb these depths.
Today we have many very good options for machine translation — however, none are close to acceptable for most professional or business purposes. So the thousands of interpreters and translators out there apparently are assured job security.
Esperanto and Other Pipe Dreams
Over the years, many efforts have been made to improve communication across boundaries and cultures. More than 120 years ago, ophthalmologist Ludwig Zamenhof created a new language — Esperanto — to foster better understanding and goodwill among the peoples of the world:
The place where I was born and spent my childhood gave direction to all my future struggles. In Bialystok … I was brought up as an idealist; I was taught that all people were brothers, while outside in the street at every step I felt that there were no people, only Russians, Poles, Germans, Jews and so on. This was always a great torment to my infant mind, although many people may smile at such an “anguish for the world” in a child. Since at that time I thought that “grown-ups” were omnipotent, so I often said to myself that when I grew up I would certainly destroy this evil.
Esperanto was his effort in the 1880s to develop a universal second language to aid in cross-cultural or international communication. Of course, it didn’t catch on, but that hasn’t stopped the urge to find some type of “universal communicator” like the one in Star Trek to make communication in any language as easy to understand as your own.
And this desire isn’t limited to humans. Horse whispering has many adherents. A recent book called The Human Pack (BookSurge Publishing, 2007) is intended to help humans translate “the intricate language of canines to use this knowledge in order to create a harmonious ‘human pack.’” You can even buy Translator for Cats and Translator for Dogs apps today for your cellphone [http://itunes.apple.com/us/app/translator-for-cats-free/id367963934?mt=8].
Avoiding Misunderstandings in the First Place
Writing for an audience that may move beyond your local circle? Here are some tips to ensure that your message gets through as you intended it.
1. Don’t use slang or colloquialisms that might not translate well.
2. Write in clear, concise sentences — the simpler the sentence, the better.
3. Use correct punctuation to help people or machines better parse your meaning.
4. Check for spelling — remember that even within English itself, there are a variety of “acceptable” ways to spell many terms or words.
5. Be consistent and direct — and always provide links, FAQs, or emails for human contact as needed by your readers.
“The internet will drive changes in the nature and application of MT. What users of Internet services are seeking is information, in whatever language it may have been written or stored — translation is just a means to that end. Users will want seamless integration of information retrieval, extraction and summarization systems with automatic translation. There is now increasingly active research in cross-lingual information retrieval, multilingual summarization, multilingual text generation from databases, and so forth. …”
– John Hutchins, consultant & MT expert
“It may be decades before we have the talking computers popularized in science fiction. Moreover, we cannot be sure that it would be possible to build a computer program that understands human language perfectly. We have tried to show in this paper some directions for the future of natural language understanding systems so that they show a behavior that seems really intelligent.”
– Gérard Sabah, senior researcher, LIMSI Laboratory, Université Paris-Sud
Translating Blogs and Web Postings
Translation products and services are getting more advanced — and more tailored — to meet the needs of clients across the globe and across industries and applications.
Mojofiti [www.mojofiti.com] is a new product that automatically takes your blog posts and makes them available to readers in 27 languages. With the motto of “the world without language barriers,” Mojofiti is based in Denver, and its system is powered by Google’s translation API software. SpeakLike [www.speaklike.com], a pricier human-aided system, seeks to provide clients with a “central translation hub for website localization. ... SpeakLike’s human translators look at the message as a whole, figuring out the meaning of the message before translating it.”
Meedan [http://news.meedan.net] provides a fascinating model of how translation can work to facilitate better communication and understanding. Billed as “an Arabic-English forum using machine translation with expert corrections,” the site translates news stories to/from English and Arabic and displays the two versions side-by-side, with reader comments instantly added to both languages as they are made.
Human Translation Standards
In 2006, the European Committee for Standardization issued EN 15038, which sets a quality standard for translation services. The standard has been gaining global acceptance since its introduction, and the European Union uses it as a benchmark in its tender specifications. The standard sets requirements for the competencies of linguists: It defines the ability to translate to an acceptable level regarding terminology, phraseology, style, grammar, and the aspects unique to the target locale. Linguists are required to have a high level of proficiency in the target language as well as cultural competence, meaning familiarity with the behavior, value systems, and cultural standards of the target locale. Translators must also be proficient in using the necessary software and hardware.
In the U.S., the translation services standard is the ASTM F2575-06 Standard Guide for Quality Assurance in Translation, which provides a framework for negotiating the specific requirements of a translation project. It provides parameters but no specific criteria for the expected quality of translations or projects and, as the name implies, is more a guide than a prescriptive standard such as EN 15038.
Nancy K. Herther is a Sociology/Anthropology Librarian at the University of Minnesota Libraries. Her e-mail address is firstname.lastname@example.org.