HTML
The Translate Toolkit is able to process HTML files using the html2po converter.
Conformance
Can identify almost all HTML elements and attributes that are localisable.
The localisable and localised text in the PO/POT files consists of HTML fragments. Therefore, reserved characters must be represented by HTML entities:
Content from HTML elements uses the HTML entities &amp; (&), &lt; (<), and &gt; (>).
Content from HTML attributes uses the HTML entities &quot; (") or &apos; (').
Leading and trailing tags are removed from the localisable text, but only in matching pairs.
Can cope with embedded PHP, as long as the documents remain valid HTML. If you place PHP code inside HTML attributes, you need to make sure that the PHP doesn't contain special characters that interfere with the HTML.
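The escaping and tag-stripping rules above can be illustrated with a small, self-contained Python sketch. The helper names (escape_element_text, escape_attribute_value, strip_matching_outer_tags) are hypothetical and are not part of the Translate Toolkit API. The attribute helper only covers the quote entities named above, and the tag matching relies on a simple regular expression that assumes well-formed markup, so treat this as an illustration of the rules rather than the converter's actual implementation.

```python
import re

# Entities for reserved characters in element content, per the rules above.
ELEMENT_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
# Entity for the quote character that delimits an attribute value.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

# Matches an opening, closing or self-closing tag: group 1 is "/" for closing
# tags, group 2 is the tag name, group 3 is "/" for self-closing tags.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def escape_element_text(text):
    """Hypothetical helper: escape text that appears as element content."""
    return "".join(ELEMENT_ENTITIES.get(ch, ch) for ch in text)


def escape_attribute_value(text, quote='"'):
    """Hypothetical helper: escape the quote character delimiting an attribute value."""
    return text.replace(quote, QUOTE_ENTITIES[quote])


def strip_matching_outer_tags(fragment):
    """Hypothetical helper: remove leading/trailing tags only when they form a pair."""
    text = fragment.strip()
    while True:
        first = TAG_RE.match(text)
        # Stop unless the fragment starts with a non-self-closing opening tag.
        if not first or first.group(1) or first.group(3):
            return text
        name = first.group(2).lower()
        depth = 0
        for tag in TAG_RE.finditer(text):
            if tag.group(2).lower() != name or tag.group(3):
                continue
            depth += -1 if tag.group(1) else 1
            if depth == 0:
                break  # found the close tag that pairs with the leading tag
        else:
            return text  # the leading tag is never closed
        if tag.end() != len(text):
            return text  # the pair closes before the end, e.g. "<b>a</b> b"
        text = text[first.end():tag.start()].strip()


print(strip_matching_outer_tags("<p><b>Hello</b></p>"))          # Hello
print(strip_matching_outer_tags("<b>Hello</b> and <b>bye</b>"))  # unchanged
```

The depth counter is what enforces the "matching pairs" rule: a leading <b> and a trailing </b> are only removed when the trailing tag actually closes the leading one, so "<b>Hello</b> and <b>bye</b>" is left intact.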
Ignoring Content from Translation
Two methods are available to exclude specific content from translation:
data-translate-ignore attribute
Any HTML element with the data-translate-ignore attribute will have its content and all nested elements excluded from extraction.
<p>This will be translated</p>
<div data-translate-ignore>
  <p>This won't be translated</p>
  <p>Neither will this</p>
</div>
<p>This will be translated again</p>
This is useful for:
Legal disclaimers that must remain in original language
Code examples or technical content
Brand names and trademarks
Copyright notices
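To make the nesting behaviour concrete, here is a minimal sketch built on Python's standard html.parser module. The IgnoreAwareExtractor class is hypothetical and is not how html2po is implemented; it assumes well-formed markup with explicit end tags and simply collects text chunks while skipping every subtree whose root carries data-translate-ignore.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect the depth count.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class IgnoreAwareExtractor(HTMLParser):
    """Hypothetical extractor: collects text, skipping data-translate-ignore subtrees."""

    def __init__(self):
        super().__init__()
        self.ignore_depth = 0  # greater than 0 while inside an ignored subtree
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.ignore_depth:
            self.ignore_depth += 1  # nested element inside an ignored subtree
        elif any(name == "data-translate-ignore" for name, _ in attrs):
            self.ignore_depth = 1   # root of a new ignored subtree

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.ignore_depth:
            self.ignore_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.ignore_depth:
            self.texts.append(text)


SAMPLE = """<p>This will be translated</p>
<div data-translate-ignore>
  <p>This won't be translated</p>
  <p>Neither will this</p>
</div>
<p>This will be translated again</p>"""

parser = IgnoreAwareExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.texts)  # ['This will be translated', 'This will be translated again']
```

Tracking a depth counter rather than a boolean is what makes "all nested elements" work: the flag only clears when the end tag of the element that carried the attribute is reached.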
translate:off/on comment directives
Content between <!-- translate:off --> and <!-- translate:on --> HTML comments will be excluded from extraction.
<p>This will be translated</p>
<!-- translate:off -->
<div class="technical-content">
  <p>Technical documentation in English</p>
  <code>function example() { return "code"; }</code>
</div>
<!-- translate:on -->
<p>This will be translated</p>
This is useful for:
Temporarily disabling translation for sections during development
Excluding large blocks without modifying HTML attributes
Working with generated HTML where attributes can't be easily added
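The comment directives act as a switch rather than a scope, which a short sketch on Python's html.parser shows. DirectiveAwareExtractor is a hypothetical name, not Translate Toolkit code, and the sketch assumes the directive text is exactly translate:off or translate:on once surrounding whitespace is stripped. Because it uses a single flag, off/on pairs do not nest.

```python
from html.parser import HTMLParser


class DirectiveAwareExtractor(HTMLParser):
    """Hypothetical extractor honouring <!-- translate:off --> / <!-- translate:on -->."""

    def __init__(self):
        super().__init__()
        self.enabled = True  # False between translate:off and translate:on
        self.texts = []

    def handle_comment(self, data):
        # "<!-- translate:off -->" arrives here as " translate:off ".
        directive = data.strip()
        if directive == "translate:off":
            self.enabled = False
        elif directive == "translate:on":
            self.enabled = True

    def handle_data(self, data):
        text = data.strip()
        if text and self.enabled:
            self.texts.append(text)


SAMPLE = """<p>This will be translated</p>
<!-- translate:off -->
<div class="technical-content">
  <p>Technical documentation in English</p>
  <code>function example() { return "code"; }</code>
</div>
<!-- translate:on -->
<p>This will be translated</p>"""

parser = DirectiveAwareExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.texts)  # ['This will be translated', 'This will be translated']
```

Since the switch ignores element boundaries, a directive pair can span several sibling elements without any change to their markup, which is what makes it convenient for generated HTML.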
Both methods are honoured by html2po during extraction and by po2html during conversion,
so the ignored content is preserved in the original language.
Adding Translator Comments
Translators often need additional context to provide accurate translations. The
Translate Toolkit supports adding translator comments using the
data-translate-comment HTML5 data attribute.
data-translate-comment attribute
Any HTML element can have a data-translate-comment attribute to provide
context for translators. These comments are extracted as automatic comments
(marked with #.) when using the --keepcomments option with html2po.
<h1 data-translate-comment="This is the first text that is displayed">Hello world!</h1>
<p data-translate-comment="Welcome message for visitors">Welcome to our site!</p>
<button data-translate-comment="Primary call-to-action button">Sign Up Now</button>
When extracted with html2po --keepcomments, these generate PO files like:
#. This is the first text that is displayed
#: example.html+h1:1-1
msgid "Hello world!"
msgstr ""

#. Welcome message for visitors
#: example.html+p:2-1
msgid "Welcome to our site!"
msgstr ""

#. Primary call-to-action button
#: example.html+button:3-1
msgid "Sign Up Now"
msgstr ""
This is useful for:
Explaining character limits or formatting requirements
Providing UI context (e.g., "Button text", "Error message")
Clarifying ambiguous terms or technical jargon
Indicating target audience or tone
Documenting brand names that should not be translated
The data-translate-comment attribute works alongside regular HTML comments
(<!-- comment -->). Both are extracted and combined when using
--keepcomments, giving translators comprehensive context.
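The following sketch shows one way the attribute can be paired with element text and rendered as #. automatic comments. It uses only Python's standard library; the CommentAwareExtractor, po_quote and to_po names are hypothetical, the #: location lines are omitted, regular HTML comments are not merged in, and text split across nested inline elements is not reassembled. It is therefore a simplification of what html2po --keepcomments produces, not its implementation.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class CommentAwareExtractor(HTMLParser):
    """Hypothetical extractor pairing text with its element's data-translate-comment."""

    def __init__(self):
        super().__init__()
        self.comments = []  # data-translate-comment (or None) for each open element
        self.units = []     # (comment, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.comments.append(dict(attrs).get("data-translate-comment"))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.comments:
            self.comments.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            comment = self.comments[-1] if self.comments else None
            self.units.append((comment, text))


def po_quote(text):
    """Escape backslashes and double quotes, then wrap as a PO string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_po(units):
    """Render (comment, text) pairs as PO entries separated by blank lines."""
    entries = []
    for comment, text in units:
        lines = [f"#. {comment}"] if comment else []
        lines += [f"msgid {po_quote(text)}", 'msgstr ""']
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


SAMPLE = """<h1 data-translate-comment="This is the first text that is displayed">Hello world!</h1>
<p data-translate-comment="Welcome message for visitors">Welcome to our site!</p>
<button data-translate-comment="Primary call-to-action button">Sign Up Now</button>"""

parser = CommentAwareExtractor()
parser.feed(SAMPLE)
parser.close()
print(to_po(parser.units))
```

Run on the three example elements, this prints the same three entries shown earlier, minus their #: location lines.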
Note
The data-translate-comment attribute uses the HTML5 data-* attribute specification, which is designed for custom data that doesn't affect rendering. This means your HTML remains valid and the attributes are safely ignored by browsers.
Message Context (Disambiguation)
Sometimes the same source string needs different translations depending on
where it appears (e.g. "Open" as a verb vs. noun). You can disambiguate such
cases using the data-translate-context attribute. Its value becomes the
msgctxt in the generated PO file, allowing separate translations for the
same msgid without the overly specific contexts produced by automatic
context disambiguation.
If an element does not specify data-translate-context, the following fallbacks
apply when disambiguated contexts are needed due to duplicate source strings:
{filename}:{element_id} when the element has an id attribute.
{filename}+{ancestor_id}.{relative_tag_path}:{line}-{column} when the element does not have an id but is a descendant of an element with an id. The relative_tag_path is the dot-separated tag path from the ancestor to the element, and line/column come from the element's source location.
<p data-translate-context="verb">Open</p>
<p data-translate-context="noun">Open</p>
After extraction with html2po, the PO file will contain two distinct entries:
msgctxt "verb"
msgid "Open"
msgstr ""

msgctxt "noun"
msgid "Open"
msgstr ""
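The fallback context formats listed earlier can be expressed as a small Python function. This is a hypothetical sketch, not html2po's code: the fallback_context name and the (tag, id) path representation are assumptions, as are picking the nearest ancestor that has an id and building a relative tag path that excludes that ancestor but includes the element itself. The rules above do not say what happens when neither the element nor any ancestor has an id, so the sketch returns None in that case.

```python
def fallback_context(filename, path, line, column):
    """Hypothetical sketch of the fallback msgctxt rules.

    path lists (tag, element_id) pairs from the document root down to the
    element itself; element_id is None when that element has no id.
    """
    _, element_id = path[-1]
    if element_id:
        # Rule 1: the element has its own id.
        return f"{filename}:{element_id}"
    # Rule 2: walk up towards the root (assumption: the nearest ancestor id wins).
    for index in range(len(path) - 2, -1, -1):
        ancestor_id = path[index][1]
        if ancestor_id:
            # Assumption: the tag path excludes the ancestor, includes the element.
            relative_tag_path = ".".join(tag for tag, _ in path[index + 1:])
            return f"{filename}+{ancestor_id}.{relative_tag_path}:{line}-{column}"
    return None  # not covered by the rules described above


print(fallback_context("page.html", [("body", None), ("button", "save")], 4, 3))
# page.html:save
print(fallback_context("page.html",
                       [("body", None), ("div", "menu"), ("ul", None), ("li", None)],
                       12, 5))
# page.html+menu.ul.li:12-5
```

Both generated forms embed the filename, so identical strings in different files never collide; only an explicit data-translate-context value lets the same context be shared deliberately across files.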