Thinking Strategically About Content Destined for Machine Translation

Business & Mgmt

Published on October 25, 2013

Author: contentrules

Source: slideshare.net

Description

Are you treating your content as a strategic asset? If not, you should. This presentation looks at the history of content creation and translation, how we create and translate content today, and how the quality of your source content affects the output of machine translation.

Thinking Strategically About Content Destined for Machine Translation Val Swisher Founder & CEO @contentrulesinc © 2013. Content Rules, Inc. All rights reserved.

Who Am I?
• Founder and CEO of Content Rules
• 25+ years in content arena
• Specialty areas:
  • Global content strategy
  • Terminology management
  • Content quality
  • Single-sourcing / XML / DITA
• Finishing third book, “Global Content Strategy,” due out in 2014

What is Content Rules?
• Professional services firm specializing in:
  • Content strategy / Global content strategy
  • Content creation
  • Content quality / Global readiness
• Based in Silicon Valley
• Founded in 1994
• Acrolinx Authorized Services Provider
• Authorized provider of The Rockley Strategic Method™

Global Readiness
• Ensure content is translatable
  • Readability
  • Grammar and style
  • Reuse
• Evaluate and improve content quality using state-of-the-art tools
  • Reports
  • Metrics
  • Recommendations
  • Fixes
• Save money on translation

Today’s Presentation
• Importance of content
• Historic background
• Types of machine translation
• How content quality affects machine translation results
• Bleu scores
• Pre-editing instead of post-editing

Content Is Important
87% of respondents to a recent CMO Council survey said that content had a moderate to major impact on their buying decisions.

Content Is A Strategic Asset

What Does It Mean to be Strategic?
stra·te·gic [struh-tee-jik] adjective
1. pertaining to, characterized by, or of the nature of strategy: strategic movements.
2. important in or essential to strategy.
3. forming an integral part of a stratagem: a strategic move in a game of chess.

Content Creation In the Past
• Content wasn't so easy to create and distribute
• Created by trained professionals
• Only they had access to the content

Content Creation Today
• Everyone creates content
• Very easy to distribute
• Now, we have loads and loads of content
  • Some of it good
  • Some of it mediocre
  • Some of it downright awful

Translation In The Past
• Content wasn't so easy to translate
• Trained professionals
• Only they understood multiple languages well enough to translate content

Translation Today
• It is easy and free to translate content
• We have loads and loads of translated content
  • Some of it good
  • Some of it mediocre
  • Some of it downright awful

More Machine Translation All The Time
• Machine Translation (MT) is increasingly relied upon as a way to get cost-effective, fast translations
• 18.05% year-over-year growth of MT expected over the next 3 years*
• We must pay more attention to the source content that goes into it
• A machine cannot figure out what we meant to say based on what we actually wrote
• Garbage In – Garbage Out
*http://www.researchandmarkets.com/research/2gpj3p/global_machine

Source Content And Machine Translation
• Types of MT engines and the effect of source content on them
• What are Bleu scores
• How quality of content affects MT output

MT Engine Types
There are three types of MT engines:
1. Rule-based
2. Statistical
3. Hybrid

Rule-Based MT (RBMT)
• Uses linguistic rules
• Extensive use of bilingual dictionaries
• Transfers structure of source language into target language
• Results are literal translations based on rules
• Does not handle ambiguity well (word or phrase having more than one meaning)

Statistical MT (SMT)
• Based on analysis of content
• Engine trained over time
• More content = better results
• Need at least 2,000,000 words per domain
• Better quality content = better results
• Results are more natural translations, based on previous source | destination pairs
• Example: Google Translate

Hybrid
• Combines rule-based and statistical
• Provides predictability and consistency of RBMT
• Provides fluency and flexibility of SMT
• Reduces the amount of data needed to train the engine

Training The SMT Beast
• Training SMT software is extremely important
• Poor quality source = poor quality translations
• Some companies have such poorly trained MT engines that fixing the content first is actually not an option
  • The engine has been trained to translate poor quality source

The Effect Of Poor Content On SMT And Hybrid MT
• Poor or unpredictable translations
• Increased time to retrain the system with correct information
• Increased post-editing, per language
• Wasted money

Evaluating MT Precision - Bleu Scores
• Introduced in 2002 by the IBM Watson Research Center
• Automatic evaluation metric used to compare MT output with reference human translation
  “The closer a machine translation is to a professional human translation, the better it is.” *
• Metric widely used throughout the industry
*http://acl.ldc.upenn.edu/P/P02/P02-1040.pdf
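To make the idea of comparing MT output with a reference human translation concrete, here is a minimal Python sketch of a simplified, sentence-level Bleu score. It assumes a single reference, whitespace tokenization, and no smoothing. The function names `ngrams` and `bleu` are illustrative and do not come from the presentation or from any particular toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level Bleu: one reference, no smoothing (assumptions)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # the geometric mean is zero if any precision is zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty lowers the score of candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on the mat", reference))
    print(bleu("the cat sat on a mat", reference))
```

An exact match scores 1.0, and every word that differs from the reference lowers the n-gram overlap. Production evaluations typically score a whole corpus, often against several references, so treat this only as an illustration of the principle.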

Bleu Scores – Helpful Or Hype?
According to Callison-Burch, Osborne, and Koehn of the School of Informatics, University of Edinburgh, Bleu scores have many issues*:
• Synonyms and paraphrases difficult to score
• All words are weighted equally
• Difficult to calculate
*http://homepages.inf.ed.ac.uk/pkoehn/publications/bleu2006.pdf

That’s Okay. We Can Post Edit.
[Diagram: Original Source Content → Post-Edited Translations]

Why Not Pre-Edit Instead?
• Fewer issues = less post-editing
• Save time
• Save money
• Improve quality

Create Global-Ready Content
• Reduce word count
• Standardize terminology
• Enforce correct grammar
• Eliminate jargon and colloquialisms
• Increase reuse
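Some of these checks can be automated before content goes to MT. The Python sketch below is only an illustration: the term list, the jargon list, and the 25-word limit are hypothetical values chosen for the example, not rules from the presentation or from any commercial tool. It flags long sentences, non-preferred terms, and colloquialisms.

```python
import re

# Hypothetical glossary: non-preferred variant -> preferred term.
PREFERRED_TERMS = {"sign-on": "login", "e-mail": "email"}
# Hypothetical list of colloquial phrases to avoid.
JARGON = ["low-hanging fruit", "ballpark"]
# Assumed sentence-length limit, in words.
MAX_WORDS = 25


def check_sentence(sentence):
    """Return a list of pre-editing issues found in one sentence."""
    issues = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        issues.append(f"long sentence ({len(words)} words)")
    lowered = sentence.lower()
    for variant, preferred in PREFERRED_TERMS.items():
        # Match the variant as a whole word or phrase only.
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            issues.append(f"use '{preferred}' instead of '{variant}'")
    for phrase in JARGON:
        if phrase in lowered:
            issues.append(f"colloquialism: '{phrase}'")
    return issues


def pre_edit_report(text):
    """Split text into sentences and return (sentence, issues) for flagged ones."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    report = []
    for sentence in sentences:
        issues = check_sentence(sentence)
        if issues:
            report.append((sentence, issues))
    return report


if __name__ == "__main__":
    sample = "Send an e-mail to support. That figure is a ballpark estimate. This is fine."
    for sentence, issues in pre_edit_report(sample):
        print(sentence, "->", "; ".join(issues))
```

A real workflow would draw the term list from a managed terminology database and would also check grammar, which simple pattern matching like this cannot do.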

Results of Pre-Editing
• Save money
• Improve quality
• Faster time to market
• Fewer in-country iterations
• Better translation consistency

Summary
• Content is a strategic asset
• Machine translation is becoming more popular
• Poor quality content incorrectly trains MT engines
• Poor quality content results in increased post-editing
• Pre-editing saves money and time, and improves translation quality

Val Swisher CEO & Founder vals@contentrules.com @contentrulesinc
