Adelheid Certik, Mizuho OSI
November 1, 2025

Tips and Tricks for Success

A solid content conversion strategy is one of the key pillars of a successful DITA implementation. It requires some planning and consideration of several factors:

  1. What content will we convert?
  2. Will we convert content in house, or will we outsource conversion?
  3. What pre-conversion work do we need to do?
  4. What tools can help us with content conversion?

During our DITA implementation, we chose to convert content ourselves and learned a lot along the way about tools that can automate much of the process if you do the necessary pre-work. Regardless of how you choose to handle your content conversion, the following tips should help you in your journey.

While it is good to start thinking about your conversion strategy early, having a solid Information Model before making any final decisions is helpful. The Information Model becomes critical once you’re ready to start pre-conversion tasks.

 

Choosing Content to Convert

When looking at legacy content, it may not be practical or desirable to convert all of it. There are many things to consider when determining what content should be converted. Is the content representative enough of your use cases to act as your pilot project? Does it contain all of your reuse examples? Will it be stable, or relatively stable, throughout the conversion process?

I found it easiest to categorize content for each product based on a continuum from high priority for conversion to content that will never be converted.

 

The following table provides some examples of things to think about when categorizing your content.

Choosing to Convert In-House vs. Outsourcing

Deciding whether to do the conversion work yourself is not always straightforward. Content volume is certainly a major consideration, but so are the size of your team, the anticipated workload, and the cost of outsourcing. While the decision most often comes down to a “Time vs. Money” equation, there are other things to consider, such as exploration and education opportunities for your team. You can also choose different options at different phases of the journey.

Tip: You may choose to convert your pilot content in-house and outsource the bulk of your content conversion later. This can be a solid option for a team that needs more hands-on experience with DITA in the earlier phases of the project.

 

In-house conversion may give writers who are less familiar with DITA and structured authoring more opportunities to get comfortable with the process. Converting pilot content in-house may also allow for further refinement of your Information Model and stylesheet during the conversion process. This could reduce the amount of post-conversion clean-up work, or the amount of rework an outsourcing vendor would need to do because of changes in the Information Model.

Completing Pre-Conversion Tasks

Once you’ve determined what content you are going to convert, you will need to assess the content for any pre-conversion clean-up work needed. This is true for outsourcing conversion as well as in-house conversion.

The three main things to consider are:

  1. Does the current content structure follow your Information Model? If not, how much editing is required to fix the structural problems?
  2. Does your current content use styles consistently? Does it use enough styles to cover all the structure identified in your Information Model?
  3. Where are your major reuse areas? Can they be easily identified for special handling during conversion?
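As a rough illustration of the second question, the sketch below counts how often each style is used and flags styles that fall outside the model. It assumes you have already extracted the paragraphs as (style name, text) pairs by some other means. The style names and the `audit_styles` helper are hypothetical, not part of any tool mentioned in this article.

```python
from collections import Counter

# Hypothetical style names that the Information Model expects to see.
EXPECTED_STYLES = {"Heading1", "Body", "Step", "Note", "Caution", "Warning"}

def audit_styles(paragraphs):
    """Count style usage and report styles that do not line up with the model.

    `paragraphs` is a list of (style_name, text) pairs extracted from the
    source document by whatever means you have available.
    """
    counts = Counter(style for style, _ in paragraphs)
    unexpected = sorted(set(counts) - EXPECTED_STYLES)  # used, but not in the model
    unused = sorted(EXPECTED_STYLES - set(counts))      # in the model, but never used
    return counts, unexpected, unused

sample = [
    ("Heading1", "Installing the product"),
    ("Body", "Before you begin, check the requirements."),
    ("Normal", "Note: Back up your data first."),  # a note typed as plain text
    ("Step", "Run the installer."),
]

counts, unexpected, unused = audit_styles(sample)
print("Unexpected styles:", unexpected)
print("Expected but unused:", unused)
```

A style that shows up as unexpected, or an expected style that is never used, is a hint that some content was formatted by hand and will need clean-up before conversion.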

When outsourcing DITA conversion work, discuss with your vendor the level of pre-work needed and who will be responsible for which tasks.

1. Edit Content Structure

Any major structural issues must be identified and corrected ahead of conversion. If your content in its unstructured form does not adhere to the structural requirements of your Information Model, and you do not correct it now, it will have to be manually corrected during conversion. This will limit the time savings from any automation in the conversion process.

Minor structural issues may not be as critical to correct, but cleaning up source content as much as possible will make the conversion process much smoother and will lead to a better end result.

2. Create or Update Existing Template

If your existing content does not use standardized styles, create a new template with styles that you can apply to your content. If you already have a standard template with styles, review the template to identify any additional styles that need to be created. These would most likely be for inline elements that currently share the same formatting style. Create a revised template just for content conversion.

Tip: The best way to ensure content gets converted properly is to map each formatting style name to a specific DITA element or element/attribute combination.
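One way to picture this one-to-one mapping is as a simple lookup table. The sketch below is only an illustration: the style names on the left are hypothetical, and the `dita_target` helper is not part of FrameMaker or Oxygen. The DITA element names on the right (`p`, `note`, `uicontrol`, `filepath`) are standard DITA elements.

```python
# Hypothetical style names mapped to a DITA element plus optional attributes.
STYLE_TO_DITA = {
    "Body": ("p", {}),
    "Note": ("note", {"type": "note"}),
    "Caution": ("note", {"type": "caution"}),
    "Warning": ("note", {"type": "warning"}),
    "UIControl": ("uicontrol", {}),   # inline style for interface labels
    "FileName": ("filepath", {}),     # inline style for file names and paths
}

def dita_target(style_name):
    """Return (element, attributes) for a style, or raise if it is unmapped."""
    try:
        return STYLE_TO_DITA[style_name]
    except KeyError:
        raise KeyError(f"No DITA mapping for style {style_name!r}") from None

print(dita_target("Caution"))
```

Failing loudly on an unmapped style is a deliberate choice in this sketch: a style with no mapping usually means the template still needs another style, or the content was formatted by hand.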

3. Chunk and Identify Reuse Content

Once the new or revised style template has been applied, break the content down into topics. Identify reuse content so that it can be handled appropriately during the conversion process.
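The sketch below shows one possible way to think about both steps, assuming the styled content is available as (style name, text) pairs and that a hypothetical `Heading1` style marks the start of each topic. It treats any paragraph whose exact text appears in more than one topic as a reuse candidate. Real reuse analysis usually also needs to catch near-duplicates, which this does not attempt.

```python
from collections import defaultdict

def chunk_into_topics(paragraphs, heading_style="Heading1"):
    """Start a new topic at every paragraph that uses the heading style.

    Paragraphs that appear before the first heading are skipped.
    """
    topics = []
    for style, text in paragraphs:
        if style == heading_style:
            topics.append({"title": text, "body": []})
        elif topics:
            topics[-1]["body"].append((style, text))
    return topics

def reuse_candidates(topics):
    """Return paragraphs whose exact text appears in more than one topic."""
    seen = defaultdict(set)
    for topic in topics:
        for _, text in topic["body"]:
            seen[text].add(topic["title"])
    return {text: sorted(titles) for text, titles in seen.items() if len(titles) > 1}

paragraphs = [
    ("Heading1", "Installing"),
    ("Warning", "Disconnect power before servicing."),
    ("Step", "Run the installer."),
    ("Heading1", "Upgrading"),
    ("Warning", "Disconnect power before servicing."),
    ("Step", "Run the upgrade tool."),
]

topics = chunk_into_topics(paragraphs)
print(len(topics), "topics")
print(reuse_candidates(topics))
```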

Converting Content

If you have decided to convert content in-house, you may already have some tools that can help you apply XML structure to your content. If your source content includes FrameMaker files, FrameMaker has a conversion table function that can assign your styles to specific XML elements and attributes. If your source content is in MS Word and you are using Oxygen for XML authoring, you can use an Oxygen plug-in to create a conversion table.

Setting up the conversion table using either tool takes a bit of work, but once completed, these conversion tables can be used for all your FrameMaker or MS Word source content, so long as the content uses the same styles. Both tools allow for batch processing of all files within a folder, so a huge amount of content can be converted relatively quickly.

The refactoring tools in Oxygen can also be used if further adjustments to the XML files are needed.

FrameMaker

Tip: If using FrameMaker, I recommend creating a separate conversion table for each topic type (concept, task, reference). Since the table can nest elements within other elements, I recommend starting with the lowest-level element and then wrapping other elements around it.

 

Here is a simple example that illustrates a conversion table for content that uses the paragraph styles Note, Caution, and Warning:

 

The first three rows say: for any content using this style, wrap it in a paragraph element with a qualifier of note, caution, or warning.

The last three rows say: for any paragraph tag with this qualifier, wrap it in a note element and use the type attribute containing a value of note, caution, or warning.
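The two-stage logic described above can be modeled in a few lines of Python. This is not FrameMaker's conversion table syntax. It is only a sketch of the idea, with hypothetical rule tables and a hypothetical `convert` helper: the first table tags a styled paragraph as a `p` element with a qualifier, and the second table wraps that qualified paragraph in a `note` element with the matching `type` value.

```python
import xml.etree.ElementTree as ET

# Stage 1: paragraph style -> (element name, qualifier).
STYLE_RULES = {
    "Note": ("p", "note"),
    "Caution": ("p", "caution"),
    "Warning": ("p", "warning"),
}

# Stage 2: (element name, qualifier) -> wrapper element and its attributes.
WRAP_RULES = {
    ("p", "note"): ("note", {"type": "note"}),
    ("p", "caution"): ("note", {"type": "caution"}),
    ("p", "warning"): ("note", {"type": "warning"}),
}

def convert(style, text):
    """Apply both stages to one styled paragraph and return the wrapper element."""
    tag, qualifier = STYLE_RULES[style]            # stage 1: lowest-level element
    inner = ET.Element(tag)
    inner.text = text
    wrapper_tag, attrs = WRAP_RULES[(tag, qualifier)]  # stage 2: wrap it
    wrapper = ET.Element(wrapper_tag, attrs)
    wrapper.append(inner)
    return wrapper

print(ET.tostring(convert("Caution", "Hot surface."), encoding="unicode"))
# prints: <note type="caution"><p>Hot surface.</p></note>
```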

Oxygen (for conversion from MS Word)

Oxygen’s conversion table for MS Word uses an HTML intermediary. Because of the limited number of elements in HTML, you’ll be assigning attributes to help with additional conversion steps after the initial batch conversion. This conversion table is meant to be used in conjunction with Oxygen’s refactoring tools or a larger XSLT script.

Here is a simple example that illustrates a conversion table for content that uses the paragraph styles Note, Caution, and Warning:

 

These instructions say: for any content using this style, wrap it in a paragraph tag with an outputclass attribute containing this value. This is the equivalent of what the first three rows in the FrameMaker conversion table will do.
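The follow-up step, which the article assigns to Oxygen's refactoring tools or an XSLT script, then has to turn those marked paragraphs into proper DITA notes. The Python sketch below only illustrates what that step needs to accomplish. It assumes the intermediate markup is `<p outputclass="...">` with the values note, caution, and warning, and the `wrap_notes` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NOTE_TYPES = {"note", "caution", "warning"}

def wrap_notes(root):
    """Wrap each <p outputclass="..."> marker paragraph in a <note> element."""
    # Take a snapshot of the elements first, because the tree is edited in place.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            kind = child.get("outputclass")
            if child.tag == "p" and kind in NOTE_TYPES:
                del child.attrib["outputclass"]   # the marker is no longer needed
                note = ET.Element("note", {"type": kind})
                note.tail = child.tail            # keep any text that followed the paragraph
                child.tail = None
                parent[index] = note              # swap the paragraph for the wrapper
                note.append(child)
    return root

source = '<body><p>Intro text.</p><p outputclass="warning">High voltage.</p></body>'
root = wrap_notes(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
# prints: <body><p>Intro text.</p><note type="warning"><p>High voltage.</p></note></body>
```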

Tip: If you rely primarily on refactoring tools to complete the structural requirements for DITA, I recommend writing up a list of instructions with the refactoring steps in the proper sequence. Each refactoring instruction can also be applied to multiple files to speed up the process.
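To show why the sequence matters, here is a minimal sketch of an ordered list of steps applied to every file in a folder. It is written in Python rather than with Oxygen's refactoring tools, and the two steps, the `.dita` file extension, and the `refactor_folder` helper are all assumptions made for illustration. Note that ElementTree does not preserve DOCTYPE declarations or comments, so this is a model of the workflow and not something to run on production topics.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def mark_notes(root):
    """Step 1: turn <p outputclass="caution"> style markers into <note type="caution">."""
    for elem in list(root.iter("p")):
        kind = elem.get("outputclass")
        if kind in {"note", "caution", "warning"}:
            elem.tag = "note"
            elem.set("type", kind)

def drop_outputclass(root):
    """Step 2: remove any conversion-only outputclass markers that are left over."""
    for elem in root.iter():
        elem.attrib.pop("outputclass", None)

# The order matters: step 2 would erase the markers that step 1 depends on.
STEPS = [mark_notes, drop_outputclass]

def refactor_folder(folder):
    """Apply every step, in order, to each .dita file in the folder."""
    changed = []
    for path in sorted(Path(folder).glob("*.dita")):
        tree = ET.parse(str(path))
        for step in STEPS:
            step(tree.getroot())
        tree.write(str(path), encoding="utf-8", xml_declaration=True)
        changed.append(path.name)
    return changed

# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    sample = '<conbody><p outputclass="warning">High voltage.</p></conbody>'
    Path(folder, "example.dita").write_text(sample, encoding="utf-8")
    print(refactor_folder(folder))
    print(Path(folder, "example.dita").read_text(encoding="utf-8"))
```

Keeping the steps in a single ordered list plays the same role as the written instruction list: anyone who repeats the conversion runs the same steps in the same order.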

 

Completing Post-Conversion Tasks

Once topics are converted to DITA, they can be uploaded to the CCMS you are using. Once uploaded, you can start adding links for images and cross-references, creating your maps, and applying reuse strategies (content references, key references).
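Many teams create maps directly in their CCMS or XML editor. If you would rather generate a first draft of a map from a list of converted topic files, a minimal sketch might look like the following. The `build_map` helper and the file names are hypothetical, and the output leaves out the DOCTYPE declaration that a real DITA map would normally carry.

```python
import xml.etree.ElementTree as ET

def build_map(title, topic_files):
    """Build a minimal DITA map with one topicref per converted topic file."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for href in topic_files:
        ET.SubElement(root, "topicref", {"href": href})
    return ET.tostring(root, encoding="unicode")

print(build_map("Installation Guide", ["installing.dita", "upgrading.dita"]))
```

A flat list of `topicref` elements is only a starting point. Nesting topics and adding key definitions for reuse would still be done by hand or in your authoring tool.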

Closing Thoughts

Content conversion requires planning and content auditing to be successful. To get the most out of tools that can help automate parts of the process, some rework of your existing content will be needed.

With recent advances in generative AI (not available when we worked through our initial content conversion project), there is even more streamlining and time-saving potential available:

  • Helping to identify where content may need to be rewritten to adhere to your Information Model.
  • Helping with chunking content.
  • Helping to develop XSLT to automate conversion.

Whether you choose to outsource or tackle this project in-house, I hope this provides ideas and insights to help with your content conversion.