Andrzej Zydroń, xml-Intl Ltd.
xml:tm (XML-based Text Memory) is the vendor-neutral
open XML standard for embedding text memory within an XML document.
xml:tm leverages the namespace syntax of XML to embed text memory
information within the XML document itself. xml:tm provides a radical
new approach to the task of authoring and translating XML documents.
To learn more about xml:tm, please read "Using XML technology to
reduce the cost of authoring and translation" and "How to Leverage
the Maximum Potential of XML for Localization" by Andrzej Zydroń.
At the core of xml:tm is the concept of "text memory".
Text memory comprises two components:
Author Memory
An XML namespace is used to map a text memory view onto a document.
This process is called segmentation. The text memory works at the
sentence level of granularity: the text unit. Each individual xml:tm
text unit is allocated a unique identifier, which is immutable for the
life of the document. As a document goes through its life cycle, the
unique identifiers are maintained and new ones are allocated as
required. This aspect of text memory is called author memory. It can
be used to build author memory systems that simplify authoring and
improve its consistency.
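The segmentation and identifier behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the xml:tm specification: the namespace URI `urn:example:xmltm`, the `te`/`tu` element names and the `u1`, `u2` ID scheme are hypothetical, and sentences are split with a naive regular expression instead of the SRX rules xml:tm mandates. A sentence that survives a revision unchanged keeps its ID; edited or new sentences receive fresh IDs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
TM_NS = "urn:example:xmltm"
ET.register_namespace("tm", TM_NS)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace.

    xml:tm mandates SRX rules for this step; this regex is only a stand-in.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment(paragraph, author_memory, next_id):
    """Assign an ID to every sentence, reusing IDs of unchanged sentences.

    author_memory: dict mapping sentence text -> ID from the previous version.
    next_id: first integer not yet used by any ID in the document.
    Returns (list of (id, text) pairs, next unused integer).
    """
    units, used = [], set()
    for sentence in split_sentences(paragraph):
        uid = author_memory.get(sentence)
        if uid is None or uid in used:  # new or edited text, or a repeated sentence
            uid = f"u{next_id}"
            next_id += 1
        used.add(uid)
        units.append((uid, sentence))
    return units, next_id


def to_element(units):
    """Wrap the text units in hypothetical <tm:te>/<tm:tu id="..."> markup."""
    te = ET.Element(f"{{{TM_NS}}}te")
    for uid, text in units:
        ET.SubElement(te, f"{{{TM_NS}}}tu", id=uid).text = text
    return te


# Version 1 of a paragraph, then a revision that edits the second sentence.
v1, next_id = segment("Open the valve. Check the gauge.", {}, 1)
memory = {text: uid for uid, text in v1}
v2, next_id = segment("Open the valve. Read the gauge.", memory, next_id)
print(v2)  # [('u1', 'Open the valve.'), ('u3', 'Read the gauge.')]
print(ET.tostring(to_element(v2), encoding="unicode"))
```

Keying the author memory on exact sentence text keeps the sketch short; duplicate detection here is per paragraph, so a real system would track used IDs across the whole document.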
Translation Memory
When an xml:tm namespace document is ready for translation, the
namespace itself specifies the text that is to be translated. The tm
namespace can be used to create an OASIS XLIFF document for
translation. xml:tm allows for much more focused and better-defined
translation memory matching:
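Creating an XLIFF file from the text units can be sketched as follows. This is a minimal example assuming the units have already been extracted from the tm namespace as `(id, text)` pairs; the file name `manual.xml`, the language codes, and the choice to reuse each unit ID as the `trans-unit` ID are illustrative assumptions, not requirements quoted from xml:tm.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def units_to_xliff(units, original, source_lang, target_lang):
    """Build an XLIFF 1.2 file with one <trans-unit> per xml:tm text unit.

    units: list of (unit_id, source_text) pairs taken from the tm namespace.
    The unit ID doubles as the trans-unit ID so translations can be merged back.
    """
    # The default namespace is declared as a plain attribute so that child
    # elements can be created without namespace prefixes.
    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,  # name of the source document
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for uid, text in units:
        trans_unit = ET.SubElement(body, "trans-unit", {"id": uid})
        ET.SubElement(trans_unit, "source").text = text
    return ET.tostring(xliff, encoding="unicode")


print(units_to_xliff([("u1", "Open the valve."), ("u3", "Read the gauge.")],
                     "manual.xml", "en", "fr"))
```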
Exact Matching
Author memory provides exact details of any changes to a document.
Where text units have not changed since the document was last
translated, xml:tm provides the basis for declaring an "exact
match" with the previously translated target language document.
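Because unit IDs are immutable, exact matching reduces to comparing IDs and source text between versions. The sketch below assumes the previous source and target versions are available as dictionaries keyed by unit ID (for example, retrieved from a content management system); the sample sentences and French translations are made-up data.

```python
def exact_matches(prev_source, prev_target, cur_source):
    """Return {unit_id: translation} for text units unchanged since the last translation.

    prev_source, cur_source: dicts of unit_id -> source text for the previously
    translated version and the current version of the document.
    prev_target: dict of unit_id -> text from the previous target-language document.
    """
    matches = {}
    for uid, text in cur_source.items():
        # Same immutable ID and identical source text: reuse the old translation.
        if prev_source.get(uid) == text and uid in prev_target:
            matches[uid] = prev_target[uid]
    return matches


prev_source = {"u1": "Open the valve.", "u2": "Check the gauge."}
prev_target = {"u1": "Ouvrez la vanne.", "u2": "Vérifiez la jauge."}
cur_source = {"u1": "Open the valve.", "u3": "Read the gauge."}

matches = exact_matches(prev_source, prev_target, cur_source)
to_translate = [uid for uid in cur_source if uid not in matches]
print(matches)       # {'u1': 'Ouvrez la vanne.'}
print(to_translate)  # ['u3']
```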
In-document leveraged matching
Database leveraged matching
When an xml:tm document is translated, the translation
process provides perfectly aligned source and target language text
units. These can be used to create traditional translation memories.
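Storing the aligned units in a leveraged memory database could look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table layout and function names are hypothetical; real translation memory systems use their own schemas.

```python
import sqlite3


def store_aligned_units(conn, aligned, source_lang, target_lang):
    """Save aligned (source, target) text units as translation-memory entries."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tm (
               source_lang TEXT NOT NULL,
               target_lang TEXT NOT NULL,
               source      TEXT NOT NULL,
               target      TEXT NOT NULL,
               PRIMARY KEY (source_lang, target_lang, source))"""
    )
    # A newer translation of the same sentence replaces the older one.
    conn.executemany(
        "INSERT OR REPLACE INTO tm VALUES (?, ?, ?, ?)",
        [(source_lang, target_lang, s, t) for s, t in aligned],
    )
    conn.commit()


def leveraged_match(conn, source, source_lang, target_lang):
    """Return the stored translation of an identical sentence, or None."""
    row = conn.execute(
        "SELECT target FROM tm WHERE source_lang = ? AND target_lang = ? AND source = ?",
        (source_lang, target_lang, source),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
store_aligned_units(conn, [("Open the valve.", "Ouvrez la vanne.")], "en", "fr")
print(leveraged_match(conn, "Open the valve.", "en", "fr"))   # Ouvrez la vanne.
print(leveraged_match(conn, "Close the valve.", "en", "fr"))  # None
```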
In-document fuzzy matching
The text units contained in the leveraged memory database can
also be used to provide fuzzy matches of similar previously translated
text from within the same document.
Fuzzy matching
The text units contained in the leveraged memory database can
also be used to provide fuzzy matches of similar previously translated
text.
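Fuzzy matching, whether against earlier units of the same document or against the leveraged memory database, can be approximated with a similarity score. This sketch swaps in the standard library's `difflib.SequenceMatcher` over word sequences; the scoring method, the 0.75 threshold and the sample data are illustrative assumptions, not part of xml:tm.

```python
from difflib import SequenceMatcher


def fuzzy_matches(sentence, memory, threshold=0.75, limit=3):
    """Return up to `limit` (score, source, target) tuples, best first.

    memory: list of (source, target) pairs, e.g. earlier text units of the same
    document or entries from the leveraged memory database.
    The score is difflib's similarity ratio over word sequences (0.0 to 1.0).
    """
    words = sentence.split()
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, words, source.split()).ratio()
        if score >= threshold:
            scored.append((round(score, 2), source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


memory = [
    ("Check the pressure gauge.", "Vérifiez le manomètre."),
    ("Open the valve.", "Ouvrez la vanne."),
]
print(fuzzy_matches("Check the fuel gauge.", memory))
# [(0.75, 'Check the pressure gauge.', 'Vérifiez le manomètre.')]
```

Three of the four words match in order, giving a ratio of 2 × 3 / 8 = 0.75; the unrelated "Open the valve." scores well below the threshold and is dropped.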
Non-translatable text
Text units that are made up solely of numeric, alphanumeric, punctuation
or measurement items can be identified during authoring and flagged
as non-translatable, thus reducing the translatable word count.
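Flagging such units can be sketched with regular expressions. The patterns below (number formats, the short list of measurement units, and "codes" defined as alphanumeric tokens containing at least one digit) are hypothetical heuristics chosen for illustration; a production system would make them configurable per locale.

```python
import re

# Illustrative patterns; a real system would make these configurable per locale.
NUMBER = r"[-+]?\d+(?:[.,]\d+)*"
UNIT = r"(?:mm|cm|km|m|kg|mg|g|ml|l|°C|°F|%|V|A|W|Hz)"
CODE = r"(?=\S*\d)[A-Za-z0-9]+(?:[-/._][A-Za-z0-9]+)*"  # e.g. AB-1234, v2.1
PUNCT = r"[^\w\s]+"
TOKEN = re.compile(rf"{NUMBER}{UNIT}?|{UNIT}|{CODE}|{PUNCT}")


def is_non_translatable(text):
    """True if every token is a number, measurement, code or punctuation."""
    tokens = text.split()
    if not tokens:
        return False
    for token in tokens:
        core = token.strip(".,;:!?()[]\"'")  # ignore surrounding punctuation
        if core and not TOKEN.fullmatch(core):
            return False
    return True


for unit in ["12.5 mm", "AB-1234", "100%", "25 °C", "Open the valve.", "Part AB-1234"]:
    print(f"{unit!r}: {is_non_translatable(unit)}")
```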
Interoperability with other Localization Industry Standards
xml:tm was designed from the outset to integrate closely with, and
leverage the potential of, other XML-based Localization Industry
Standards as well as that of XML syntax itself. In particular:
SRX (Segmentation Rules eXchange)
xml:tm mandates the use of SRX for the segmentation of paragraphs
into text units.
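The way SRX rules drive segmentation can be sketched as follows: rules are ordered, each pairs a "before break" pattern with an "after break" pattern and a break/no-break decision, and at each position the first rule that matches decides. This is a simplified illustration, assuming rules held in Python objects rather than parsed from an SRX file; it ignores language maps and cascading, and Python's `re` differs in detail from the ICU regular expressions SRX specifies. The abbreviation list is a made-up example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    is_break: bool  # SRX break="yes" (True) or break="no" (False)
    before: str     # regex that must match the text ending at the candidate position
    after: str      # regex that must match the text starting at that position


def segment(text, rules):
    """Split text into segments by applying ordered SRX-style rules.

    At every position, the first rule whose `before` and `after` patterns
    both match decides whether to break; positions with no matching rule
    do not break.
    """
    compiled = [(rule.is_break, re.compile(rf"(?:{rule.before})\Z"), re.compile(rule.after))
                for rule in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head, tail = text[:pos], text[pos:]
        for is_break, before, after in compiled:
            if before.search(head) and after.match(tail):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


rules = [
    # Exception: no break after common abbreviations.
    Rule(False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),
    # Break after sentence-final punctuation followed by whitespace.
    Rule(True, r"[.!?]+", r"\s+\S"),
]
print(segment("Dr. Smith checked the valve. It was open.", rules))
# ['Dr. Smith checked the valve.', 'It was open.']
```

Placing the no-break exception before the general break rule is what prevents a split after "Dr.", mirroring the ordered evaluation that SRX uses.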
Unicode Standard Annex #29-9
xml:tm mandates the use of Unicode Standard Annex #29 for tokenization
of text into words.
XLIFF 1.2
xml:tm mandates the use of XLIFF for the actual translation process.
xml:tm is designed to facilitate the automated creation of XLIFF
files from xml:tm-enabled documents and, after translation, the easy
creation of the target versions of those documents.
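Creating a target version after translation can be sketched as reading the `<target>` of each XLIFF `trans-unit` and writing it back into the text unit with the same ID. The sketch assumes trans-unit IDs equal xml:tm unit IDs and uses a hypothetical namespace URI `urn:example:xmltm` and `tu` element; it handles plain text only, not inline markup inside units.

```python
import xml.etree.ElementTree as ET

XLIFF = "{urn:oasis:names:tc:xliff:document:1.2}"
TM_NS = "urn:example:xmltm"  # hypothetical xml:tm namespace URI
ET.register_namespace("tm", TM_NS)


def read_targets(xliff_text):
    """Collect {trans-unit id: target text} from a translated XLIFF 1.2 file."""
    root = ET.fromstring(xliff_text)
    targets = {}
    for trans_unit in root.iter(f"{XLIFF}trans-unit"):
        target = trans_unit.find(f"{XLIFF}target")
        if target is not None and target.text:
            targets[trans_unit.get("id")] = target.text
    return targets


def build_target(source_doc, targets):
    """Return a copy of source_doc with each unit's text replaced by its translation."""
    root = ET.fromstring(source_doc)  # parsing creates a fresh tree; the source is untouched
    for unit in root.iter(f"{{{TM_NS}}}tu"):
        if unit.get("id") in targets:
            unit.text = targets[unit.get("id")]
    return ET.tostring(root, encoding="unicode")


source_doc = ('<doc xmlns:tm="urn:example:xmltm"><p>'
              '<tm:tu id="u1">Open the valve.</tm:tu></p></doc>')
translated = ('<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
              '<file original="manual.xml" source-language="en" '
              'target-language="fr" datatype="xml"><body>'
              '<trans-unit id="u1"><source>Open the valve.</source>'
              '<target>Ouvrez la vanne.</target></trans-unit>'
              '</body></file></xliff>')
print(build_target(source_doc, read_targets(translated)))
```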
GMX-V (Global Information Management Metrics eXchange - Volume)
xml:tm mandates the use of GMX-V for all metrics concerning authoring
and translation.
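The kind of volume metrics involved can be sketched as word counts per match category. Two loud assumptions: the tokenizer is a simplified regular expression, not a full implementation of the Unicode Standard Annex #29 word-boundary rules, and the category labels ("exact", "non-translatable", "new") are hypothetical, not GMX-V element or attribute names.

```python
import re
from collections import Counter

# Simplified word tokenizer: runs of word characters, allowing internal
# apostrophes, hyphens and periods. A stand-in only; it does NOT implement
# the full Unicode Standard Annex #29 word-boundary rules.
WORD = re.compile(r"\w+(?:['’.\-]\w+)*")


def word_count(text):
    return len(WORD.findall(text))


def translation_metrics(units, status):
    """Total words per category.

    units: dict unit_id -> source text.
    status: dict unit_id -> hypothetical category label such as "exact" or
    "non-translatable"; units without a label count as "new".
    """
    counts = Counter()
    for uid, text in units.items():
        counts[status.get(uid, "new")] += word_count(text)
    counts["total"] = sum(word_count(text) for text in units.values())
    return dict(counts)


units = {"u1": "Open the valve.", "u2": "Check the fuel gauge.", "u3": "12.5 mm"}
status = {"u1": "exact", "u3": "non-translatable"}
print(translation_metrics(units, status))
```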
TMX (Translation Memory eXchange)
xml:tm facilitates the easy creation of TMX documents, aligned at
the sentence level.
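Because translated xml:tm units are already aligned, writing them out as TMX is mostly a serialization step. This sketch emits TMX 1.4 with one `<tu>` per text unit; the `creationtool`, `creationtoolversion` and `o-tmf` values are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(aligned, source_lang, target_lang):
    """Serialize (unit_id, source, target) triples as a sentence-level TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "xmltm-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",    # placeholder version
        "segtype": "sentence",           # xml:tm text units are sentences
        "o-tmf": "xml:tm",               # placeholder original-format label
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for uid, source, target in aligned:
        tu = ET.SubElement(body, "tu", tuid=uid)
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(build_tmx([("u1", "Open the valve.", "Ouvrez la vanne.")], "en", "fr"))
```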
DITA (Darwin Information Typing Architecture)
xml:tm complements the DITA standard by allowing text reuse at the
sentence level within DITA documents.
W3C ITS
xml:tm mandates the use of W3C ITS Document Rules for identifying
translatable text within an XML document as well as W3C ITS Best
Practices with regard to XML document localization.
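How ITS translate information decides which text is translatable can be sketched as follows: global rules select elements, later rules override earlier ones, a local `its:translate` attribute overrides any global rule, and the decision is inherited by descendants. This is a simplification under stated assumptions: the rules are supplied as Python `(path, flag)` pairs instead of parsed `translateRule` elements, paths use ElementTree's limited XPath subset rather than full XPath, and tail text is ignored.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def translatable_elements(doc, rules):
    """Return elements whose own text should be translated.

    rules: ordered (path, flag) pairs standing in for ITS translateRule
    elements; later rules override earlier ones.
    """
    root = ET.fromstring(doc)
    global_flags = {}
    for path, flag in rules:
        for el in root.iterfind(path):
            global_flags[el] = flag  # later rules win
    result = []

    def walk(el, inherited):
        local = el.get(f"{{{ITS_NS}}}translate")
        if local is not None:           # local markup has the highest precedence
            flag = local == "yes"
        elif el in global_flags:        # then global rules
            flag = global_flags[el]
        else:                           # otherwise inherit from the parent
            flag = inherited
        if flag and el.text and el.text.strip():
            result.append(el)
        for child in el:
            walk(child, flag)

    walk(root, True)  # element content is translatable by default
    return result


doc = ('<doc xmlns:its="http://www.w3.org/2005/11/its">'
       '<p>Run the command.</p>'
       '<code>make install</code>'
       '<p its:translate="no">ACME-42</p>'
       '</doc>')
print([el.text for el in translatable_elements(doc, [(".//code", False)])])
```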
Implementation
The effective implementation of xml:tm benefits greatly from the
existence of an environment which provides the ability to store
and retrieve previous source and target language versions for a
given XML document. Such an environment is usually provided by a
Content Management System (CMS).
Download xml:tm
xml:tm was approved by the OSCAR Steering Committee on 21 July 2006
for public comment prior to final ratification as a standard. Its
contents and format may change before official adoption. The current
version (21 July 2006) can be downloaded as a ZIP file or viewed
online.
This article has been reprinted from http://www.lisa.org/standards/xmltm/
© 2005 LISA All Rights Reserved
©2006 by the Center for Information-Development
Management. All rights reserved.
Tel. (303) 232-7586 Fax. (303) 232-0659 info@infomanagementcenter.com