DITA: The Obama of Global Content?
Ultan, November 29, 2008
Is Darwin Information Typing Architecture (DITA) the answer to all our content globalization problems? On its own, no. The big issues remain fundamentally the same as before. Yet that’s not the impression we’re often given when out shopping for solutions. Instead, we’re told that a switch to DITA will somehow solve the issues of cost, quality, and content in every language. But then, most of us have yet to come across a vendor who wouldn’t say, “Yes, we can” (for a price), have we? Here’s my analysis…
Time and time again there’s a claim that the use of DITA leads to big translation savings, better content quality to translate, easily delivered content in every language and so on. Usually, the use of DITA is positioned in this context along with the use of a content management system that is then plugged into various localization workflows.
This kind of DITA globalization solution stuff has been kicking around for 4 or 5 years now, and various “out of the box” solutions are pushed by various vendors. Of course, people have solutions to sell, white papers to post, and PowerPoint Karaoke to rehearse for the next localization conference, but all this touching faith in DITA per se from solutions vendors needs to be challenged.
Perhaps think about the following issues and questions when you’re considering a DITA globalization solution:
* If you have existing content, including translated material stored in a CMS or in TMs, especially content created in a non-structured environment, then how do you migrate to DITA? What about internal tags that might be stored in TMs? How does a format-based content creation system map to a structured environment? What, for example, would you map the STRONG or B element in a format-based HTML environment to in DITA land? Or a heading level 5 equivalent in RTF? Oh, and can we see a large-scale solution please? Not one based on the translation of a couple of hundred pages. Anyone who has been involved in these migration projects knows that it is not a trivial undertaking - even with customized tools. Sorry folks, no out of the box solutions there.
* Why would DITA reduce word count, as has been claimed, if you can still write as much content as you like, in any way you like? Just like in any other environment, structured or otherwise, you need to establish authoring rules, educate about them, enforce the rules and then measure the resulting volume and re-use. DITA on its own will not help.
* Why does it improve content quality? It cannot. DITA is about structuring content, not QA of that content. You need manual or automatic review tools or a combination of these. Just like any other authoring environment, you need a process for this. The tools and processes that might help - like controlled authoring - work on the same principles as they do for non-DITA content.
* Why does DITA make product globalization easier - “content in any language”? Just because you can structure your content does not mean the rendering of that content is automatically provided for. In fact, the two issues - structure and formatting - are deliberately separate. So, think about your ability to render your XML content as Arabic PDF using XSL-FO (a little more complicated than CSS) or whatever. Why would using DITA make such rendering easier than if you used any other flavor of XML to write simple topics? As far as I can see there are a good few DITA pushers out there who simply haven’t a clue about rendering in this regard. Oh, and how does DITA solve the old problem that nobody wants to address (and that I’ve been asking about for 10 years) - the automatic and correct alphabetical sorting of localized content such as online and print indices, glossaries, and so on?
* What is the relationship between DITA and the ITS and XLIFF? Do you translate DITA directly? If not, why not?
* How do you address the problem of topic-based authoring from a translation viewpoint? If you’re translating piecemeal, then obviously there is less overall context, so what happens when you assemble it using a bookmap? At lower granularity, the use of DITA element names like step or shortdesc doesn’t help that much (particularly if the content they’re supposed to express bears no relationship to the element name - oh, but that problem exists in any XML environment).
* Translation rules - with the exception of some best practices from Joann Hackos and the DITA translation subcommittee (practices that I rarely see cited) - has anyone considered the potential translation problems of conrefs and the challenges of indexing topic-based materials with keywords? There are other areas too. Even conrefs at the paragraph level present challenges for translation.
* What translation tools support DITA out of the box (I mean non-specialization)? When I last checked the leading tag-editing tool couldn’t do it, and required faffing about with INI files and so on to cater for the different non-specialized topic solutions of DITA. Plus, if there is any specialization of the DITA DTD or schema, then even if there was an out of the box solution, content authors would still need to tell content translators what those XML elements and attributes really meant. Er, just like you did years ago with HTML too…
* Most importantly, will DITA speed up the arrival of my economic stimulus check from the IRS?
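To make the migration question in the first bullet a bit more concrete, here is a minimal Python sketch of what a format-to-structure mapping runs into. Everything in it is an assumption for illustration: the CANDIDATES table and the migrate function are hypothetical, and the candidate targets (b, uicontrol, term, cmdname) are simply DITA elements that bold text in legacy HTML might plausibly have meant. The script can rename a tag, but it cannot know which meaning the author intended, so each hit ends up in a review list for a human.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: one presentational HTML tag, several
# plausible DITA targets. The first entry is used as a fallback only.
CANDIDATES = {
    "b": ["b", "uicontrol", "term", "cmdname"],
    "strong": ["b", "uicontrol", "term", "cmdname"],
}


def migrate(fragment):
    """Rename bold-style tags to a fallback DITA element and collect
    every occurrence for manual review."""
    root = ET.fromstring(fragment)
    review = []
    for element in root.iter():
        options = CANDIDATES.get(element.tag)
        if options:
            text = "".join(element.itertext())
            review.append((element.tag, text, options))
            element.tag = options[0]  # fallback, not a semantic decision
    return ET.tostring(root, encoding="unicode"), review


source = "<p>Click <b>Save</b> to store the <strong>profile</strong>.</p>"
converted, review = migrate(source)
print(converted)
for old_tag, text, options in review:
    print(f"review: <{old_tag}>{text}</{old_tag}> could be any of {options}")
```

In this toy fragment "Save" is probably a uicontrol and "profile" may be a term, yet both come out as plain b until someone looks at them. Multiply that by a few hundred thousand segments already sitting in a TM and you have the real migration project.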
So, is anyone up to the challenge of addressing these issues? Asking the questions, and demonstrating the answer for real?
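As a small illustration of the sorting question raised above, here is a Python sketch under stated assumptions. The entries are invented, and the two key functions are hand-rolled stand-ins rather than real collation: one folds diacritics into their base letters (roughly the dictionary-style treatment of umlauts in German), the other uses the Swedish alphabet, where å, ä and ö come after z. A production system would use a proper collation library, not this.

```python
import unicodedata

# Invented index entries, normalized so each accented letter is one code point.
ENTRIES = [
    unicodedata.normalize("NFC", term)
    for term in ["Zebra", "Äpple", "Apple", "Åska", "Öl", "Olja"]
]

# Assumption: a simplified Swedish alphabet, with å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {letter: i for i, letter in enumerate(SWEDISH_ALPHABET)}


def folded_key(term):
    """Sort key that strips diacritics, so 'Ä' files under 'A'."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, term)  # original term breaks ties deterministically


def swedish_key(term):
    """Sort key that ranks letters by their position in the Swedish alphabet."""
    text = unicodedata.normalize("NFC", term).lower()
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK)) for ch in text]


codepoint_order = sorted(ENTRIES)
folded_order = sorted(ENTRIES, key=folded_key)
swedish_order = sorted(ENTRIES, key=swedish_key)

print("code point:", codepoint_order)
print("folded:    ", folded_order)
print("swedish:   ", swedish_order)
```

The three runs give three different orders for the same six entries: a plain code point sort puts Äpple before Åska at the end of the list, the folding key files both under A, and the Swedish key puts Åska before Äpple after Zebra. Nothing in the topic markup tells the index generator which of these is right for the target locale, which is exactly the point.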