ADR-0001: code block syntax highlighting preservation
Status: accepted | Date: 2026-02-13
References: RFC-0002
Context
Copy-paste HTML platforms (WeChat, Zhihu, etc.) require syntax highlighting to be preserved in code blocks, but the current parser discards it.
Problem Statement
In src/html_utils/util.rs::extract_code_text (lines 27-38), we intentionally strip all HTML tags (including <span> tags with highlighting styles) to produce plain text for the CodeBlock::code field. This was designed for Confluence, which uses CDATA sections requiring pure text.
However, copy-paste HTML platforms like WeChat preserve and display highlighted code when pasted from the preview. Currently, users see monochrome code after pasting, losing syntax highlighting entirely.
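To make the loss concrete, here is a minimal sketch of what tag stripping does to a highlighted snippet. The strip_tags function is a hypothetical stand-in written for this illustration, not the actual extract_code_text implementation, and the inline styles in the sample input are invented.

```rust
/// Hypothetical stand-in for tag stripping (not the project's `extract_code_text`).
/// Drops everything between `<` and `>`; entities such as `&lt;` are left undecoded.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(ch),
            _ => {} // character inside a tag: discard
        }
    }
    out
}

fn main() {
    // Invented sample of highlighter output with inline styles.
    let highlighted =
        r#"<span style="color:#d73a49">fn</span> <span style="color:#6f42c1">main</span>() {}"#;
    let plain = strip_tags(highlighted);
    // The text survives, but every style-carrying <span> is gone.
    assert_eq!(plain, "fn main() {}");
    println!("{plain}");
}
```

The plain result is what Confluence needs, while the discarded spans are exactly what WeChat-style editors would have kept on paste.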
Constraints
- RFC-0002:C-PIPELINE-STAGES defines CodeBlock { code, language, attrs }; changing this structure affects all adapters.
- Confluence needs plain text (its code is wrapped in CDATA, where HTML tags and entities are not interpreted and would appear literally).
- Copy-paste HTML platforms need styled HTML for best UX.
- Markdown platforms need plain text + language tag (not HTML).
Options Considered
- Store both plain and highlighted HTML in CodeBlock (dual storage).
- Add a capability flag to control parser behavior per-adapter.
- Re-highlight at serialization time using a Rust syntax highlighter.
Decision
We will store both plain text and highlighted HTML in CodeBlock, and add a code_highlight capability to let each adapter/profile declare which representation it needs.
Data Structure Change
// Before
CodeBlock { code: String, language: String, attrs: Attrs }

// After
CodeBlock {
    code: String,                // Plain text (always available)
    highlighted: Option<String>, // HTML with syntax highlighting (if rendered)
    language: String,
    attrs: Attrs,
}
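As a compilable sketch of the shape above, the following shows the variant being built along both paths: from a render pass (both fields set) and programmatically (highlighted left as None). The Attrs alias and the two constructor helpers are assumptions made for this example; the real definitions may differ.

```rust
use std::collections::BTreeMap;

// Assumption: `Attrs` is modeled as a plain string map for this sketch.
type Attrs = BTreeMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
enum HtmlElement {
    CodeBlock {
        code: String,                // plain text, always present
        highlighted: Option<String>, // styled HTML, present only if rendered
        language: String,
        attrs: Attrs,
    },
}

/// Hypothetical helper: parser path, both representations from one render pass.
fn from_render(plain: &str, inner_html: &str, language: &str) -> HtmlElement {
    HtmlElement::CodeBlock {
        code: plain.to_string(),
        highlighted: Some(inner_html.to_string()),
        language: language.to_string(),
        attrs: Attrs::new(),
    }
}

/// Hypothetical helper: programmatic path, no highlighted HTML available.
fn plain_only(code: &str, language: &str) -> HtmlElement {
    HtmlElement::CodeBlock {
        code: code.to_string(),
        highlighted: None,
        language: language.to_string(),
        attrs: Attrs::new(),
    }
}

fn main() {
    let parsed = from_render("let x = 1;", "<span>let</span> x = 1;", "rust");
    let built = plain_only("let x = 1;", "rust");

    let HtmlElement::CodeBlock { highlighted, .. } = &parsed;
    assert!(highlighted.is_some());

    let HtmlElement::CodeBlock { code, highlighted, .. } = &built;
    assert_eq!(code.as_str(), "let x = 1;");
    assert!(highlighted.is_none());
    println!("{parsed:?}\n{built:?}");
}
```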
Capability Addition
Add a code_highlight field to each entry in adapters.toml and profiles.toml:
# adapters.toml
[[adapter]]
id = "ghost"
code_highlight = true # Use highlighted HTML in code blocks
...
[[adapter]]
id = "confluence"
code_highlight = false # Use plain text (CDATA requirement)
...
# profiles.toml
[[profile]]
id = "wechat"
format = "html"
code_highlight = true # Preserve syntax highlighting
...
[[profile]]
id = "csdn"
format = "markdown"
code_highlight = false # Markdown uses plain code
...
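One way the missing-key case could resolve to false is sketched below. AdapterEntry and its Option<bool> field are hypothetical; they only model "key present" versus "key absent" after parsing and do not reflect how the project actually loads its TOML files.

```rust
/// Hypothetical parsed entry; `None` models an entry without a `code_highlight` key.
struct AdapterEntry {
    id: String,
    code_highlight: Option<bool>,
}

impl AdapterEntry {
    /// Resolve the capability, treating a missing key as `false`
    /// so that entries written before this ADR keep their behavior.
    fn code_highlight(&self) -> bool {
        self.code_highlight.unwrap_or(false)
    }
}

fn main() {
    let entries = [
        AdapterEntry { id: "ghost".into(), code_highlight: Some(true) },
        AdapterEntry { id: "confluence".into(), code_highlight: Some(false) },
        // Hypothetical entry that predates the new field.
        AdapterEntry { id: "legacy".into(), code_highlight: None },
    ];
    for entry in &entries {
        println!("{} -> {}", entry.id, entry.code_highlight());
    }
    assert!(entries[0].code_highlight());
    assert!(!entries[1].code_highlight());
    assert!(!entries[2].code_highlight());
}
```

If the configuration is deserialized with serde, a #[serde(default)] attribute on a plain bool field would give the same result without the Option.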
Why This Approach
- Dual storage: both representations available; no information lost.
- Explicit capability: each platform declares its need; no guessing.
- Zero behavioral change by default: code_highlight = false preserves current behavior.
- Single source of truth: both representations come from the same render pass.
Implementation Notes
- Parser change: parse_code_block stores the raw inner_html in the highlighted field.
- Capability check: serializers check adapter.code_highlight() to decide which field to use.
- Confluence: code_highlight = false → uses code for CDATA.
- Ghost/WordPress: code_highlight = true → uses highlighted for styled HTML.
- Markdown profiles: code_highlight = false by default (Markdown code blocks show their content literally, so styled HTML would not render).
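The capability check in the notes above can be sketched as a single selection function. This is a minimal illustration under stated assumptions: the standalone CodeBlock struct and the body_for name are hypothetical, and the capability is passed in as a plain bool rather than read from a real adapter.

```rust
/// Simplified stand-in for the code block fields relevant to serialization.
struct CodeBlock {
    code: String,
    highlighted: Option<String>,
}

/// Hypothetical selection step: pick the representation a serializer should emit.
/// Falls back to plain `code` when highlighting is requested but unavailable.
fn body_for(block: &CodeBlock, code_highlight: bool) -> &str {
    if code_highlight {
        block.highlighted.as_deref().unwrap_or(block.code.as_str())
    } else {
        block.code.as_str()
    }
}

fn main() {
    let parsed = CodeBlock {
        code: "fn main() {}".to_string(),
        highlighted: Some("<span>fn</span> main() {}".to_string()),
    };
    let programmatic = CodeBlock {
        code: "fn main() {}".to_string(),
        highlighted: None,
    };

    // Styled HTML for platforms that declare code_highlight = true.
    assert_eq!(body_for(&parsed, true), "<span>fn</span> main() {}");
    // Plain text for Confluence-style and Markdown targets.
    assert_eq!(body_for(&parsed, false), "fn main() {}");
    // No highlighted HTML stored: fall back to plain text.
    assert_eq!(body_for(&programmatic, true), "fn main() {}");
}
```

Keeping the fallback inside one function means a serializer never has to handle a missing highlighted value itself.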
Consequences
Positive
- Copy-paste HTML platforms preserve syntax highlighting: better UX for WeChat, Zhihu, etc.
- Language detection works in editor paste: the data-lang attribute is preserved in highlighted HTML.
- Each platform explicitly declares its code rendering preference: no implicit behavior.
- API adapters like Ghost/WordPress can also benefit from syntax highlighting.
- Future-proof: new adapters declare their capability; no code change needed.
Negative
- Slight memory overhead: the highlighted field duplicates content. Mitigation: code blocks are typically small.
- Schema change in HtmlElement::CodeBlock: all pattern matches and constructors must be updated. Mitigation: the compiler rejects any pattern or constructor that omits the new field (patterns that already use .. still compile and need a manual review).
- New capability field in two TOML files: requires updating adapters.toml and profiles.toml. Mitigation: defaults to false for backward compatibility.
Neutral
- The highlighted field is None for code blocks constructed programmatically. Serializers fall back to code.
- code_highlight = false is the default for all platforms initially; existing behavior is unchanged until explicitly enabled.