Conversion from (La)TeX to HTML

Translating LaTeX documents (partially or fully) to HTML is a difficult problem, primarily because the two document formats address very different needs. TeX is intended to produce statically laid out documents with fixed dimensions, ultimately representing ink on paper. HTML, on the other hand, assumes a variety of differently sized and scaled screens and consequently prefers to express layouts in more abstract terms, the typesetting of which is ultimately left to the browser to interpret. Ideally this happens responsively, i.e. the document layout adapts to different screen sizes, ranging from 8K desktop monitors to cell phone screens.

This means that there is no one “correct” way to convert TeX to HTML. Rather, there are many choices to be made, most notably which aspects of the static, fixed-dimension layout described by the TeX code to preserve, and which to discard in favour of leaving them up to the rendering engine. This explains the plurality of existing converters.

Naturally, many LaTeX macros are somewhat aligned with tags in HTML; for example, sectioning macros (\chapter, \section, etc.) correspond to <h1>, <h2>, etc.; the {itemize} and {enumerate} environments and the \item macro correspond to <ul>, <ol> and <li>, respectively; and so on. Most converters therefore opt for the reasonable strategy of mapping common LaTeX macros directly to their closest HTML relatives, with no or minimal usage of (simple) CSS, effectively focusing on preserving the document semantics of the constructs used (e.g. “paragraph”, “section heading”, “unordered list”). In many situations this is the natural approach to pursue, especially if we can reasonably assume that the document sources to be converted are sufficiently “uniform”, so that we can provide a similarly uniform CSS style sheet to style them. This is largely the way existing converters work. To name just a few:

However, the approach described above has notable drawbacks. Firstly, it requires special treatment of LaTeX macros that plain TeX would simply expand into primitives, and the number of LaTeX macros is virtually unlimited: CTAN (currently) hosts a collection of 6399 packages, a number that keeps growing, and these packages get updated regularly. On top of that, authors can add their own macros at any point. Supporting even just the CTAN packages is a never-ending task, and providing direct HTML translations for author-defined macros is impossible. This is made worse by the very real and ubiquitous practice among LaTeX users of copy-pasting and reusing various macro definitions and preambles assembled from StackOverflow, friends and colleagues, and handed down for (by now literally) generations, even in situations where (unbeknownst to them) “official” packages with better solutions (possibly supported by HTML converters) exist.
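
The direct-mapping strategy and its main limitation can be illustrated with a minimal Python sketch. It is not taken from any existing converter. The tables HEADING_TAGS and LIST_TAGS, the function convert_line and the mapping of \subsection to <h3> are assumptions made purely for illustration, and the sketch only handles one macro per line without nested braces. Any macro that has no table entry cannot be translated and is merely reported.

```python
import html
import re

# Hypothetical, deliberately tiny lookup tables (not from any real converter).
HEADING_TAGS = {"chapter": "h1", "section": "h2", "subsection": "h3"}
LIST_TAGS = {"itemize": "ul", "enumerate": "ol"}

# Matches a whole line of the form \name{argument}, without nested braces.
MACRO = re.compile(r"\\(\w+)\{([^{}]*)\}")
ITEM = "\\item"


def convert_line(line: str) -> str:
    """Translate one line of simplified LaTeX to HTML by direct table lookup."""
    line = line.strip()
    match = MACRO.fullmatch(line)
    if match:
        name, arg = match.groups()
        if name in HEADING_TAGS:
            tag = HEADING_TAGS[name]
            return f"<{tag}>{html.escape(arg)}</{tag}>"
        if name in ("begin", "end") and arg in LIST_TAGS:
            slash = "/" if name == "end" else ""
            return f"<{slash}{LIST_TAGS[arg]}>"
        # No table entry: the macro would need its own hand-written translation.
        return "<!-- unsupported macro: \\" + name + " -->"
    if line.startswith(ITEM):
        return f"<li>{html.escape(line[len(ITEM):].strip())}</li>"
    # Anything else is treated as a plain paragraph; blank lines are dropped.
    return f"<p>{html.escape(line)}</p>" if line else ""


if __name__ == "__main__":
    # \fancybox is a made-up macro standing in for any package or user macro.
    source = [
        r"\section{Lists}",
        r"\begin{itemize}",
        r"\item one",
        r"\end{itemize}",
        r"\fancybox{x}",
    ]
    print("\n".join(convert_line(line) for line in source))
```

Every macro that should be supported needs its own entry in such a table, which is why this strategy scales poorly to the thousands of packages on CTAN and cannot cover macros that authors define themselves.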

