Tl;dr: skip to Putting It All Together to read the conclusion without having to read the adventure it took to get there.
This is part 2 of a multi-part series where I do a deep dive into Adobe InDesign’s export to HTML functionality, with this post’s focus on tables.
Part 1, Exporting InDesign to HTML: The Basics took a look at exporting basic document elements: title, headers, text (including bolded and italicized text), bulleted lists and numbered lists.
I much prefer posting long-form documents to the web as HTML. It’s not that I dislike PDFs, but experiencing a PDF on a smartphone and, sometimes, on a tablet is irritating. But creating a PDF is much faster than marking up a long document in HTML.
Enter Adobe InDesign, a document production app most often associated with print documents (and PDFs). But in the past few years Adobe has worked to make it viable in the digital publishing space. And one aspect of that is the ability to export a document to HTML.
That functionality intrigues me because the potential efficiencies created by a fully-featured, well-executed export function are measurable. So I decided to do a deep dive into this with three questions framing my assessment:
- How semantic and clean is the markup on export?
- How does a complex document export to HTML?
- How robust are the options when exporting to HTML?
I wanted to continue being methodical about this, so that as I found pitfalls in the process I would know where to fix things and adjust along the way. My plan was to:
- Update the source Word document I’d created in Part 1 to include a table;
- Use the standard paragraph and character styles found in Word to be consistent across apps;
- Format the table in Word so that the first row of the table is a header; and
- Iteratively add complexity, starting only once the simple markup exported successfully and continuing as each added level of complexity succeeded.
Update the Source Word Doc
I added a simple three-column by three-row table to the Word doc and set the top row as the header.
Update the InDesign Document
I re-imported the text from the Word doc to InDesign and realized several things:
- There is no standard paragraph style in Word for table headers;
- In many instances the typography in tables is different from the body copy type; and
- I can’t assign the <th> and <td> tags to the “Normal” Word paragraph style, so I’m going to need to create some new styles.
So I created two new paragraph styles:
- Table Header: to customize the font and font size for a table’s header and to map the <th> tag on export; and
- Table Normal: to customize the font and font size for content in a table and to map the <td> tag on export.
I also updated two Word docs with the new styles:
- Style Mapping – Word to HTML doc: since this is my “source of truth” document, all the styles should be in this document.
- InDesign to HTML Word Doc: the document with the dummy text that I’m using to test the functionality.
Export Attempt 1
With my new table-specific paragraph styles created, including the mapping to the <th> and <td> tags, I did my first export. I kept the same “Export to HTML” settings in InDesign that I’d used in Part 1.
The table code is very clean, except that mapping the <th> and <td> tags to my newly created, table-specific paragraph styles ended up duplicating the tags.
I also noted that there is no <caption> tag inside the <table> element.
Export Attempt 2
I removed the <th> and <td> tag mappings in the Table Header and Table Normal paragraph styles, respectively. InDesign wouldn’t let the field be blank, so I chose “[Automatic].”
I created the “Table Caption” paragraph style and mapped it to the <p> tag with the class “table-caption.”
This time the export:
- Put the table caption in a <p class="table-caption"> tag, and
- Surrounded the table cell text in <p> tags. In my research, proper HTML table markup does not require text to be surrounded by <p> tags.
I could not find a way to have the text export without the <p> tag, so I assigned the class “excess-p” to both paragraph styles. That way I can do a find/replace in the HTML to remove it after export.
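That cleanup can be done with a single regular expression. Here is a minimal Python sketch, assuming each cell’s text exports as <p class="excess-p">text</p> on one line (the exact markup InDesign writes may differ, so check your export first):

```python
import re

# Assumption: each cell's text exported as <p class="excess-p">text</p> on a
# single line. (.*?) captures the text lazily, so two wrappers on the same
# line are matched separately rather than as one long match.
EXCESS_P = re.compile(r'<p class="excess-p">(.*?)</p>')


def strip_excess_p(html: str) -> str:
    """Replace every <p class="excess-p">text</p> wrapper with just its text."""
    return EXCESS_P.sub(r"\1", html)


print(strip_excess_p('<td><p class="excess-p">Row 1</p></td>'))
# prints: <td>Row 1</td>
```

Plain <p> tags without the class are left untouched, which is the point of giving the unwanted wrapper its own class.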
Export Attempt 3
Before exporting I updated the “CSS Options” in InDesign’s “Export to HTML > Advanced” panel settings to:
- Include classes in HTML,
- Generate CSS, and
- Preserve local overrides
On export I received this error:
Error notification: CSS name collision : 2 detected. Paragraph Style “Table Header” and “Table Normal” generate conflict css name “excess-p”
InDesign let me continue anyway and exported the HTML, but this export produced messier markup. InDesign:
- Created an external CSS file (probably because the “Generate CSS” checkbox was checked), and
- Added unnecessary classes to the <td> tags; it also added unnecessary classes to the nested unordered lists elsewhere in the markup.
Export Attempt 4
This time I updated the “CSS Options” in InDesign’s “Export to HTML > Advanced” panel settings to Include classes in HTML but unchecked Generate CSS, which then made the Preserve local overrides option unavailable.
I did not get the CSS error before export this time, and InDesign only assigned unnecessary classes to the <td> tags.
Export Attempt 5
Searching the Web brought me no closer to finding a way to prevent InDesign from adding unnecessary classes to the <td> tags, so I decided to take a look at table and cell styles.
Cell styles, like paragraph and character styles, provide extensive options to quickly style a table cell. One of those options is to assign a paragraph style to text that appears in a table cell. I created two cell styles:
- Table Header: to create a standard design for table headers across Brettro documents, including assigning the Table Header paragraph style; and
- Table Data: to create a standard design for table cells across Brettro documents, including assigning the Table Normal paragraph style.
Table styles, also like paragraph and character styles, provide extensive options to quickly (and consistently) format tables in documents, like choosing cell styles for header rows, body rows and footer rows. I created a table style called “Table Normal” with the “Header Rows” set to the “Table Header” cell style and the “Body Rows” set to “Table Data” cell style.
When the document is exported with table and cell styles applied, InDesign adds those styles as classes to the HTML markup. The classes are still unnecessary, but they provide a hook for a search-and-replace that quickly removes them from the HTML.
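Because the class names come from the style names (the “Table Normal” table style and “Table Data” cell style show up as class="Table-Normal Table-Data", for example), they can be matched by prefix. The Python sketch below drops any class attribute whose value starts with “Table-”; that prefix is an assumption based on the style names used in this post, so adjust it if your styles are named differently.

```python
import re

# Assumption: every class generated from the table and cell styles starts with
# "Table-" (e.g. class="Table-Normal Table-Data"). The match includes the
# whitespace before the attribute so no stray space is left inside the tag.
# Matching is case-sensitive, so class="table-caption" is left alone.
TABLE_CLASS = re.compile(r'\s+class="Table-[^"]*"')


def strip_table_classes(html: str) -> str:
    """Remove class attributes that came from the table and cell styles."""
    return TABLE_CLASS.sub("", html)


# Hypothetical sample row; real exports may add whitespace between tags.
row = '<tr class="Table-Normal"><td class="Table-Normal Table-Data">A</td></tr>'
print(strip_table_classes(row))
# prints: <tr><td>A</td></tr>
```

This only removes classes. Since header cells are identified by their Table-Header class, run it after any step that relies on those classes, such as turning header cells into <th> tags.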
Now that these classes are included automatically, I set the export mapping of the “Table Header” and “Table Normal” paragraph styles to “[Automatic]” and removed the “excess-p” class, which is no longer needed.
Putting it All Together
It does not appear that InDesign provides a way to export tables to HTML without adding unnecessary classes and HTML tags. That’s unfortunate, but regular expressions, used with the find/replace feature available in just about every code editor, remove the extra classes and tags quickly and easily.
To create clean, semantically correct HTML markup for tables you’ll need to do two things:
- Create paragraph styles specific to tables and then create cell and table styles. You’ll only need to do this once; then add those styles to your “Presos, Proposals and Pub Type Styles” CC library for quick, repeated access.
- After exporting to HTML, do several find/replace steps using regular expressions to remove the extra markup. You’ll need to do this with every document you export, but it is a very quick step.
Create Paragraph, Cell and Table Styles
- Create a “Table Header” paragraph style with its export “Tag:” mapping set to [Automatic].
- Create a “Table Normal” paragraph style with its export “Tag:” mapping set to [Automatic].
- Create a “Table Caption” paragraph style with its export “Tag:” mapping set to p with the class table-caption.
- Create a “Table Header” cell style that assigns the “Table Header” paragraph style.
- Create a “Table Data” cell style that assigns the “Table Normal” paragraph style.
- Create a “Table Normal” table style with “Header Rows” set to the “Table Header” cell style and “Body Rows” set to the “Table Data” cell style.
Find and Replace
Use find/replace, mostly with regular expressions, to quickly remove the unnecessary classes, whitespace and tags:
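- Fix the table caption by turning the <p class="table-caption"> paragraph and the <table> opening tag that follows it into a <table> tag with a <caption> inside it. Find: <p class="table-caption">(.*?)</p>[\r\n\t]+<table id="(.*?)" class="Table-Normal"> (Capture group 1 holds the caption text and group 2 holds the table’s id. In the replacement, \n inserts a new line and the multiple \t’s insert tabs.)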
- Fix the <tr> tags by replacing each classed <tr> opening tag with a plain <tr>. (A regular expression is not used here.)
- Fix the <th> tags that appear as <td> tags, removing the unnecessary classes and unnecessary <p> tags. Find: <td class="Table-Normal Table-Header">[\r\n\t]+<p>(.*?)</p>[\r\n\t]+</td> and replace it with the captured text (group 1) wrapped in <th> tags.
- Fix the <td> tags, removing the unnecessary classes and unnecessary <p> tags. Find: <td class="Table-Normal Table-Data">[\r\n\t]+<p>(.*?)</p>[\r\n\t]+</td> and replace it with the captured text (group 1) wrapped in <td> tags.
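These steps can also be scripted instead of run by hand in a code editor. Below is a minimal Python sketch that applies them in order, under stated assumptions: the caption replacement (a <table> tag keeping the captured id, followed by a <caption> holding the captured caption text) is an assumed reconstruction, the classed <tr> opening tag is a hypothetical default you should change to match your export, and the sample markup is invented for the demo. The caption fix runs first because its pattern depends on the class="Table-Normal" attribute still being on the <table> tag.

```python
import re

# Caption fix: the find pattern from the list above. The replacement is an
# assumption: a <table> tag keeping the captured id (group 2), then a new line,
# a tab and a <caption> holding the captured caption text (group 1).
CAPTION = re.compile(
    r'<p class="table-caption">(.*?)</p>[\r\n\t]+<table id="(.*?)" class="Table-Normal">'
)
CAPTION_REPLACEMENT = r'<table id="\2">\n\t<caption>\1</caption>'

# Header- and data-cell fixes: the find patterns from the list above.
HEADER_CELL = re.compile(
    r'<td class="Table-Normal Table-Header">[\r\n\t]+<p>(.*?)</p>[\r\n\t]+</td>'
)
DATA_CELL = re.compile(
    r'<td class="Table-Normal Table-Data">[\r\n\t]+<p>(.*?)</p>[\r\n\t]+</td>'
)


def clean_table_html(html: str, classed_tr: str = '<tr class="Table-Normal">') -> str:
    """Apply the caption, <tr>, header-cell and data-cell fixes in order.

    classed_tr is a hypothetical default; set it to the exact <tr ...>
    opening tag your export produces.
    """
    html = CAPTION.sub(CAPTION_REPLACEMENT, html)  # needs the table's class intact
    html = html.replace(classed_tr, "<tr>")        # plain find/replace, no regex
    html = HEADER_CELL.sub(r"<th>\1</th>", html)   # header <td> becomes <th>
    html = DATA_CELL.sub(r"<td>\1</td>", html)     # data <td> loses class and <p>
    return html


# Invented sample markup, shaped to match the patterns above.
exported = (
    '<p class="table-caption">Sample table</p>\n'
    '<table id="table-1" class="Table-Normal">\n'
    '\t<tr class="Table-Normal">\n'
    '\t\t<td class="Table-Normal Table-Header">\n'
    '\t\t\t<p>Column A</p>\n'
    '\t\t</td>\n'
    '\t</tr>\n'
    '\t<tr class="Table-Normal">\n'
    '\t\t<td class="Table-Normal Table-Data">\n'
    '\t\t\t<p>Value A</p>\n'
    '\t\t</td>\n'
    '\t</tr>\n'
    '</table>'
)

print(clean_table_html(exported))
```

The cell patterns expect the <td>, <p> and </td> pieces on separate lines, as the [\r\n\t]+ parts of the expressions imply, so it is worth scanning a real export for leftover class attributes or <p> tags after running it.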
Most long-form documents are more complex than the basic HTML tags I mapped in Part 1 and the table workflow I mapped here, so stay tuned for additional entries on this topic as I continue exploring this functionality.