Restoring Non-Breaking Spaces in Pandoc 3.x with a Lua Filter

Pandoc 3.x changed how it handles the tilde character (~) in Markdown-to-LaTeX conversion. Where earlier versions treated ~ as a LaTeX non-breaking space, Pandoc 3.x now emits \textasciitilde — a literal tilde glyph. This breaks documents that rely on ~ for non-breaking spaces in names, units, and references. A small Lua filter restores the expected behavior.

The Problem

In LaTeX, ~ is a tie — a non-breaking space that prevents line breaks between adjacent words. Technical documents use it constantly:

Figure~1        % keeps "Figure" and "1" on the same line
Dr.~Smith       % prevents break between title and name
100~MHz         % keeps value and unit together
Section~3.2     % anchors label to number

Prior to version 3.x, Pandoc passed ~ through to LaTeX unchanged, preserving this behavior. Starting with Pandoc 3.x, the Markdown reader interprets ~ as a literal tilde and the LaTeX writer emits \textasciitilde, which renders as a raised tilde glyph (~) instead of a space.

A document with Figure~1 now produces Figure\textasciitilde 1 in the LaTeX output — visually wrong and typographically broken.

The Filter

The fix is a Lua filter that intercepts Str elements in the AST, finds tildes, and replaces them with raw LaTeX ~ characters:

--- Convert literal ~ in text to LaTeX non-breaking space (~).
--- Pandoc 3.x treats ~ as \textasciitilde in markdown→LaTeX;
--- this filter restores the expected non-breaking space behavior.
function Str(el)
  if el.text:find("~") then
    local parts = {}
    for part in el.text:gmatch("[^~]+") do
      table.insert(parts, part)
    end
    local result = {}
    for i, part in ipairs(parts) do
      if part ~= "" then
        table.insert(result, pandoc.Str(part))
      end
      if i < #parts then
        table.insert(result, pandoc.RawInline('latex', '~'))
      end
    end
    return result
  end
end

Walkthrough

The filter operates on Str elements — the AST nodes that contain literal text strings. Pandoc splits inline content into typed elements: Str for text, Space for whitespace, Code for inline code, and so on. A tilde embedded in text lands inside a Str node.
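To see which Str node actually carries a tilde in a given document, a throwaway debugging filter can help. The following is a minimal sketch, not part of the original filter; the file name show-str.lua mentioned below is a hypothetical choice.

```lua
-- Debugging sketch: log every Str element's text to stderr so you can
-- see which node carries the tilde.
function Str(el)
  io.stderr:write(string.format("Str %q\n", el.text))
  -- Returning nil tells Pandoc to leave the element unchanged.
  return nil
end
```

Saved as show-str.lua and passed with --lua-filter=show-str.lua, it should print one line per Str element while leaving the converted output untouched.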

Early Exit

if el.text:find("~") then

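The guard clause uses Lua’s string.find to check whether the text contains a tilde at all. Most Str elements don’t, so this skips them without allocating any tables. When the guard fails, the function falls through and returns nil, which Pandoc interprets as "leave this element unchanged".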

Splitting on Tildes

local parts = {}
for part in el.text:gmatch("[^~]+") do
  table.insert(parts, part)
end

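gmatch("[^~]+") extracts every maximal run of non-tilde characters. For the input Figure~1, this produces {"Figure", "1"}. For A~B~C, it produces {"A", "B", "C"}. Because + requires at least one character, the pattern never yields an empty fragment. As a result, a tilde at the very start or end of the text has no fragment on one side and is dropped, and two adjacent tildes collapse into one.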
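If tildes at the start or end of a Str, or doubled tildes, need to survive, the split has to keep empty fields, which "[^~]+" cannot produce. A minimal sketch of one way to do that follows; split_on_tilde is a hypothetical helper name and is not part of the original filter.

```lua
-- Hypothetical helper: split text on "~", keeping empty fields so that
-- leading, trailing, and consecutive tildes are not lost.
local function split_on_tilde(text)
  local fields = {}
  -- Appending a sentinel "~" lets a single pattern capture every field,
  -- including the empty ones next to a tilde.
  for field in (text .. "~"):gmatch("([^~]*)~") do
    fields[#fields + 1] = field
  end
  return fields
end

print(table.concat(split_on_tilde("Figure~1"), "|"))  -- Figure|1
print(table.concat(split_on_tilde("a~~b"), "|"))      -- a||b
print(#split_on_tilde("Fig.~"))                       -- 2 ("Fig." and "")
```

With this split, the reassembly loop's check for empty fragments does real work: empty fields are skipped as text, but a tilde is still emitted between every pair of neighboring fields.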

Reassembling with Raw LaTeX

local result = {}
for i, part in ipairs(parts) do
  if part ~= "" then
    table.insert(result, pandoc.Str(part))
  end
  if i < #parts then
    table.insert(result, pandoc.RawInline('latex', '~'))
  end
end
return result

Each text fragment becomes a pandoc.Str element, and each tilde position becomes a pandoc.RawInline('latex', '~') — a raw LaTeX non-breaking space that Pandoc passes through verbatim to the writer.

The if i < #parts condition places a tilde between every pair of fragments but not after the last one. For Figure~1, the result is:

[Str "Figure", RawInline (Format "latex") "~", Str "1"]

Returning a list from a filter function tells Pandoc to replace the original element with the list contents spliced into the document.
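One side effect of RawInline('latex', ...) is that writers for other formats generally drop raw LaTeX, so the tilde vanishes in HTML or DOCX output. A possible alternative, sketched here on the assumption that Pandoc's writers translate U+00A0 into each target's own non-breaking space (the LaTeX writer emits ~ for it), is to substitute the character and keep a plain Str. The helper name tilde_to_nbsp is hypothetical.

```lua
-- U+00A0 NO-BREAK SPACE, encoded as UTF-8 (Lua 5.3+ escape syntax).
local NBSP = "\u{A0}"

-- Hypothetical helper: swap every literal tilde for a no-break space.
local function tilde_to_nbsp(text)
  -- The parentheses discard gsub's second return value (the match count).
  return (text:gsub("~", NBSP))
end

-- Filter entry point. The third argument to find requests a plain-text
-- search; falling through returns nil and leaves the element unchanged.
function Str(el)
  if el.text:find("~", 1, true) then
    return pandoc.Str(tilde_to_nbsp(el.text))
  end
end
```

This variant also keeps leading, trailing, and doubled tildes, since gsub replaces each one in place rather than splitting the string.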

Usage

Save the filter as nonbreaking-tilde.lua and pass it to Pandoc:

pandoc input.md \
  --lua-filter=nonbreaking-tilde.lua \
  -o output.pdf

In a multi-filter pipeline, this filter should run early — before filters that modify or consume inline elements:

pandoc input.md \
  --lua-filter=nonbreaking-tilde.lua \
  --lua-filter=other-filters.lua \
  --filter=pandoc-minted.py \
  -o output.pdf

Verifying the Fix

Check the intermediate LaTeX to confirm the filter is working:

pandoc input.md --lua-filter=nonbreaking-tilde.lua -o output.tex
grep '~' output.tex

You should see bare ~ characters in the output rather than \textasciitilde.

Why Lua Over Python

This filter is a good example of when to use a Lua filter instead of a Python JSON filter. Lua filters run inside Pandoc’s own process, so the AST never has to be serialized. With a Python filter, Pandoc must serialize the full AST to JSON and launch a separate interpreter, which then parses the JSON, walks every element, and serializes the result back. That is a lot of process-startup and serialization cost for what amounts to a string replacement. The Lua version adds very little overhead, since it returns immediately for elements that don’t contain tildes.

This post is licensed under CC BY 4.0 by the author.