Checks Reference

Horn implements 21 check modules that validate 78 individual rules across the Matterhorn Protocol checkpoints. Each check targets specific PDF/UA-1 failure conditions, and some also cover PDF/UA-2.

Check modules

baseline — Checkpoint 01–31

Built-in pdf_oxide PDF/UA-1 structure validation. Provides foundational checks that other modules build on.

structure — Checkpoint 01, 02

Tagged PDF structure validation.

| Rule | Description |
| --- | --- |
| 01-003 | /MarkInfo must exist with /Marked = true |
| 01-004 | /StructTreeRoot must exist and have children |
| 02-001 | /RoleMap entries must resolve to standard types; circular chains detected |
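
The RoleMap resolution described for rule 02-001 can be pictured as following a chain of mappings until a standard type is reached, while remembering visited names so that a loop is reported rather than followed forever. The sketch below is illustrative Python, not Horn's implementation; `resolve_role`, the dictionary-based role map, and the abbreviated `STANDARD_TYPES` set are all assumptions made for the example.

```python
# Illustrative subset of the standard structure types (assumption: the
# real list is much longer).
STANDARD_TYPES = {"Document", "Part", "Sect", "P", "H1", "H2", "Span", "Figure", "Table"}


def resolve_role(name, role_map, standard=STANDARD_TYPES):
    """Follow RoleMap entries from `name` until a standard type is reached.

    Returns (resolved_type, None) on success, or (None, reason) where
    reason is "circular" or "unmapped".
    """
    seen = set()
    current = name
    while current not in standard:
        if current in seen:
            # We came back to a name already visited: the chain loops.
            return None, "circular"
        seen.add(current)
        if current not in role_map:
            # The chain ends on a non-standard type with no mapping.
            return None, "unmapped"
        current = role_map[current]
    return current, None
```

For instance, `resolve_role("Para", {"Para": "Body", "Body": "P"})` resolves to `P`, while a map of `{"A": "B", "B": "A"}` is reported as circular.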

content_stream — Checkpoint 01, 10, 31

Content stream analysis.

| Rule | Description |
| --- | --- |
| 01-001 | All text operations must be inside BMC/BDC..EMC marked content sequences |
| 01-002 | Image XObject invocations (Do) must be inside marked content sequences |
| 01-005 | Artifact content must not be nested inside tagged content |
| 31-025 | Text operations must not reference the .notdef glyph (CID 0) |
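
As a rough illustration of rule 01-001, a scan over content stream operators can track the marked-content nesting depth and flag any text-showing operator seen at depth zero. This is a minimal sketch under simplifying assumptions: the stream is already tokenized into a list of operator names with operands dropped, and `unmarked_text_ops` is a hypothetical helper, not part of Horn.

```python
# PDF text-showing operators: Tj, TJ, ' and ".
TEXT_SHOWING = {"Tj", "TJ", "'", '"'}


def unmarked_text_ops(operators):
    """Return indices of text-showing operators that occur outside any
    BMC/BDC ... EMC marked-content sequence."""
    depth = 0
    hits = []
    for i, op in enumerate(operators):
        if op in ("BMC", "BDC"):
            depth += 1  # entering a marked-content sequence
        elif op == "EMC":
            depth = max(depth - 1, 0)  # tolerate a stray EMC
        elif op in TEXT_SHOWING and depth == 0:
            hits.append(i)
    return hits
```

Given `["BT", "BDC", "Tj", "EMC", "TJ", "ET"]`, only the `TJ` at index 4 is reported. A check for rule 01-002 would need more context than this sketch has, because deciding whether a `Do` operator paints an image requires looking up the XObject in the page resources.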

language — Checkpoint 02, 11

Natural language specification for text strings.

| Rule | Description |
| --- | --- |
| 02-001 | Outline /Title text must have language context (catalog /Lang or struct-level /Lang) |
| 02-002 | Annotation /Contents text must have language context |
| 02-003 | Widget /TU (tooltip) text must have language context |
| 02-004 | XMP dc:title must have a real language (not just x-default) when no catalog /Lang |
| 11-002 | All /Lang values must be valid BCP 47 tags |
| 11-005 | Elements with /Alt text need language context |
| 11-006 | Elements with /ActualText need language context |
| 11-007 | Elements with /E (expansion text) need language context |
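
"Language context" in the rules above means that a language can be found either on the element itself, on one of its ancestors, or on the document catalog. One way to picture that lookup is sketched below. The dictionary layout with `"Lang"` and `"parent"` keys and the `effective_lang` helper are assumptions chosen for the example and do not reflect Horn's internal data model.

```python
def effective_lang(element, catalog_lang=None):
    """Return the nearest /Lang for `element`, or None if there is none.

    Walks up the parent chain first, then falls back to the catalog-level
    language. Empty strings are treated as "no language".
    """
    node = element
    while node is not None:
        lang = node.get("Lang")
        if lang:
            return lang
        node = node.get("parent")
    return catalog_lang or None
```

An element with /Alt text would then fail rules such as 11-005 only when this lookup returns `None`.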

version — Checkpoint 05

PDF/UA version identification and XMP extension schema.

| Rule | Description |
| --- | --- |
| 05-001 | /Metadata stream must exist in catalog |
| 05-002 | pdfuaid:part value must match standard (1 for UA-1, 2 for UA-2); UA-2 requires PDF 2.0 |
| 05-003 | pdfuaid:part identifier must be present |
| 05-004 | Extension schema for pdfuaid must be properly defined (correct URI and prefix) |
| 05-005 | No duplicate extension schema definitions for the PDF/UA namespace |
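
To make rules 05-002 and 05-003 concrete, the following sketch reads `pdfuaid:part` from an XMP packet with Python's standard XML parser. It accepts both the element form and the attribute form on `rdf:Description`. The `pdfua_part` helper and the sample packet are assumptions for illustration; the sample omits the `<?xpacket ...?>` wrapper that real metadata streams usually carry.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the PDF/UA identification schema.
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"

SAMPLE = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="" '
    'xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">'
    "<pdfuaid:part>1</pdfuaid:part>"
    "</rdf:Description></rdf:RDF></x:xmpmeta>"
)


def pdfua_part(xmp_xml):
    """Return the pdfuaid:part value as a string, or None if it is absent."""
    root = ET.fromstring(xmp_xml)
    tag = f"{{{PDFUAID_NS}}}part"  # ElementTree's "{namespace}local" form
    for node in root.iter():
        # Attribute form: <rdf:Description pdfuaid:part="1" .../>
        if tag in node.attrib:
            return node.attrib[tag].strip()
        # Element form: <pdfuaid:part>1</pdfuaid:part>
        if node.tag == tag and node.text:
            return node.text.strip()
    return None
```

A caller could compare the returned string with `"1"` or `"2"` depending on the targeted standard, and treat `None` as a missing identifier.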

metadata — Checkpoint 06

Document-level metadata validation.

| Rule | Description |
| --- | --- |
| 06-001 | Document catalog must contain /Lang entry |
| 06-002 | XMP must contain pdfuaid:part identifier |
| 06-003 | ViewerPreferences/DisplayDocTitle must be true |
| 06-004 | XMP must contain dc:title |

dict_entries — Checkpoint 07, 25

Dictionary-level validation and structural integrity.

| Rule | Description |
| --- | --- |
| 07-001 | StructTreeRoot must contain /ParentTree; completeness validated |
| 07-002 | MarkInfo/Suspects must not be true |
| 07-003 | Non-standard structure types must have RoleMap entries |
| 25-001 | Reference XObjects (/Ref on Form XObjects) are forbidden |

nesting — Checkpoint 09

Structure element parent-child rules.

| Rule | Description |
| --- | --- |
| 09-001 | TR must be inside Table/THead/TBody/TFoot |
| 09-004 | TH/TD must be inside TR |
| 09-006 | Container child type rules: Table→TR/THead/TBody/TFoot/Caption; TR→TH/TD; L→LI/Caption; LI→Lbl/LBody; TOC→TOCI/TOC/Caption |
|  | Cardinality: at most one THead/TFoot per Table; THead/TFoot require TBody |
|  | Caption position: first or last for Table; first for TOC and List |
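
The container child type rules listed for 09-006 lend themselves to a lookup table keyed by parent type. The sketch below encodes exactly the pairs listed above and walks a structure tree recursively. The node layout (`{"type": ..., "children": [...]}`) and the `nesting_violations` helper are hypothetical; cardinality and caption-position checks are left out.

```python
# Allowed child types per container, as listed for rule 09-006.
ALLOWED_CHILDREN = {
    "Table": {"TR", "THead", "TBody", "TFoot", "Caption"},
    "TR": {"TH", "TD"},
    "L": {"LI", "Caption"},
    "LI": {"Lbl", "LBody"},
    "TOC": {"TOCI", "TOC", "Caption"},
}


def nesting_violations(node, path="root"):
    """Return a list of messages for children not allowed under their parent.

    Types without an entry in ALLOWED_CHILDREN are not restricted here.
    """
    out = []
    here = f"{path}/{node['type']}"
    allowed = ALLOWED_CHILDREN.get(node["type"])
    for child in node.get("children", []):
        if allowed is not None and child["type"] not in allowed:
            out.append(f"{here}: unexpected child {child['type']}")
        out.extend(nesting_violations(child, here))
    return out
```

A `P` element placed directly inside a `TR` would be reported as `root/Table/TR: unexpected child P`.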

images — Checkpoint 13

Figure accessibility.

| Rule | Description |
| --- | --- |
| 13-004 | Figure elements must have /Alt or /ActualText |
| 13-005 | /Alt must not be empty |

headings — Checkpoint 14

Heading hierarchy validation.

| Rule | Description |
| --- | --- |
| 14-002 | First heading should be H1 |
| 14-003 | Generic H headings must use nesting to convey hierarchy (no sibling H elements) |
| 14-006 | No skipped heading levels (e.g., H1 followed by H3) |
| 14-007 | Must not mix numbered (H1–H6) and generic (H) headings |
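
A skipped-level check such as 14-006 reduces to comparing each numbered heading with the one before it. The sketch below assumes the heading levels have already been collected in document order as integers; `skipped_levels` is an illustrative helper, not Horn's API.

```python
def skipped_levels(levels):
    """Find headings that jump more than one level deeper than the previous one.

    `levels` is a list of heading levels in document order, e.g. [1, 2, 4].
    Returns a list of (index, previous_level, current_level) tuples. The
    previous level starts at 0, so a document that opens with H2 or deeper
    is also reported.
    """
    problems = []
    prev = 0
    for i, level in enumerate(levels):
        if level > prev + 1:
            problems.append((i, prev, level))
        prev = level
    return problems
```

Moving back up the hierarchy (for example H3 followed by H2) is not a skip, so `[1, 2, 3, 2, 3]` yields no findings, while `[1, 3]` yields `[(1, 1, 3)]`. Because the starting level is 0, this sketch folds the "first heading should be H1" condition of 14-002 into the same pass; a real checker would likely report the two rules separately.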

tables — Checkpoint 15

Table structure and header association.

| Rule | Description |
| --- | --- |
| 15-002 | Tables must contain TR with TH or TD children |
| 15-003 | Tables must have at least one TH header cell |
| 15-004 | TH cells must have /Scope (/Row, /Column, or /Both); invalid values flagged |
| 15-005 | Complex tables need /Headers, /Scope, or THead/TBody for header association |
| 15-006 | RowSpan/ColSpan must be valid positive integers within table dimensions |
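
The span condition in 15-006 has two parts: the value must be a positive integer, and the cell must not extend past the table's edge. A minimal sketch of both, assuming each cell is a dictionary carrying a zero-based `row` and `col` position plus optional `RowSpan` and `ColSpan` values (a hypothetical layout chosen for the example):

```python
def span_problems(cells, n_rows, n_cols):
    """Return (row, col, attribute, reason) for every invalid span."""
    problems = []
    for cell in cells:
        checks = (
            ("RowSpan", cell["row"], n_rows),
            ("ColSpan", cell["col"], n_cols),
        )
        for key, start, limit in checks:
            span = cell.get(key, 1)  # a missing attribute means a span of 1
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(span, bool) or not isinstance(span, int) or span < 1:
                problems.append((cell["row"], cell["col"], key, "not a positive integer"))
            elif start + span > limit:
                problems.append((cell["row"], cell["col"], key, "exceeds table dimensions"))
    return problems
```

In a 2×2 table, a cell in the second column with `ColSpan` 2 would be reported as exceeding the table dimensions.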

lists — Checkpoint 16

List structure validation.

| Rule | Description |
| --- | --- |
| 16-001 | L must contain LI children |
| 16-002 | LI must contain Lbl or LBody |
| 16-003 | LBody structure validation |

math — Checkpoint 17

| Rule | Description |
| --- | --- |
| 17-001 | Formula elements must have alternative text (/Alt) |

notes — Checkpoint 19

| Rule | Description |
| --- | --- |
| 19-001 | Note elements must have /ID attribute |
| 19-002 | NoteRef links validation |

optional_content — Checkpoint 20

| Rule | Description |
| --- | --- |
| 20-001 | Optional content groups must have /Name |
| 20-002 | Default OCG configuration must be valid |
| 20-003 | /AS entry (auto-state) is forbidden |

embedded_files — Checkpoint 21

| Rule | Description |
| --- | --- |
| 21-001 | Embedded files must have /AF relationship |
| 21-002 | File specification must have /Desc |

xfa — Checkpoint 25

| Rule | Description |
| --- | --- |
| 25-001 | Document must not contain XFA form data |

security — Checkpoint 26

| Rule | Description |
| --- | --- |
| 26-001 | Encryption must not block assistive technology access |
| 26-002 | Security handler must allow content extraction for accessibility |

annotations — Checkpoint 28

Annotation accessibility (page-level checks).

| Rule | Description |
| --- | --- |
| 28-001 | Pages with annotations must have /Tabs = /S (structure order) |
| 28-004 | Link annotations must have /A (action) or /Dest (destination) |
| 28-006 | Annotations should have /Contents for accessible text |
| 28-009 | Form fields must have /T (field name) or /TU (tooltip) |

annot_struct — Checkpoint 28

Annotation-to-structure-tree cross-validation.

| Rule | Description |
| --- | --- |
| 28-002 | All annotations (except Popup/PrinterMark) must have OBJR in structure tree |
| 28-003 | Parent struct element type must match annotation subtype (Link→/Link, Widget→/Form) |
| 28-005 | Screen annotations must have /CT (content type) on media clip |
| 28-006 | Annotations under /Annot struct elements need /Contents or /Alt; zero-size (invisible) annotations exempted |
| 28-007 | TrapNet annotations forbidden; PrinterMark validation |
| 28-008 | FileAttachment must have /FS with /F and /UF |
| 28-009 | Form fields need /TU (tooltip) or /Alt on parent; zero-size (invisible) widgets exempted |

fonts — Checkpoint 31

Font embedding, encoding, Unicode mapping, and font program validation. Uses ttf-parser for TrueType font introspection.

| Rule | Description |
| --- | --- |
| 31-001 | All fonts must be embedded (FontFile/FontFile2/FontFile3) |
| 31-002 | CIDFontType2 must have /CIDToGIDMap (/Identity or stream) |
| 31-003 | CIDFont must have valid /CIDSystemInfo; CMap encoding validated; CIDFont Supplement must not exceed CMap Supplement |
| 31-004 | /CIDSet must be a valid stream when present |
| 31-005 | /Encoding must be a valid predefined name or dictionary; non-symbolic TrueType must have encoding |
| 31-006 | Font must have /ToUnicode CMap or standard encoding |
| 31-007 | ToUnicode CMap must not map to U+0000 (null), U+FEFF (byte order mark), or U+FFFE (noncharacter) |
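
Rule 31-007 amounts to a membership test on the code points a ToUnicode CMap produces. The sketch below assumes the CMap has already been parsed into a dictionary from character code to Unicode string, and flags any entry whose target contains one of the three forbidden values. `forbidden_tounicode` is a hypothetical helper, and whether a real checker should test whole targets or individual code points is a design choice this example does not settle.

```python
# Code points a ToUnicode mapping must not produce.
FORBIDDEN = {0x0000, 0xFEFF, 0xFFFE}


def forbidden_tounicode(mapping):
    """Return the sorted character codes whose Unicode target contains a
    forbidden code point.

    `mapping` is {char_code: unicode_string}; a target may hold several
    code points, for example a ligature mapped to "fi".
    """
    return sorted(
        code
        for code, text in mapping.items()
        if any(ord(ch) in FORBIDDEN for ch in text)
    )
```

With `{1: "A", 2: "\ufeff", 3: "\x00", 4: "fi"}`, codes 2 and 3 are flagged.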

Severities

Each finding has a severity level:

| Severity | Meaning |
| --- | --- |
| error | The PDF violates a PDF/UA-1 requirement |
| warning | Potential issue that may affect accessibility |
| info | Informational finding or best-practice suggestion |

Check outcomes

Each check produces one of four outcomes:

| Outcome | Meaning |
| --- | --- |
| Pass | The document satisfies this check |
| Fail | A violation was found (includes a message and optional location) |
| NeedsReview | The check cannot determine compliance automatically; manual review is required |
| NotApplicable | The check does not apply to this document |
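
For readers who post-process results, the four outcomes map naturally onto an enumeration, with the message and location attached only where they apply. The following is a sketch of one possible data model, written in Python for illustration. The `Outcome`, `Result`, and `summarize` names are assumptions and do not describe Horn's actual types or output format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    NEEDS_REVIEW = "NeedsReview"
    NOT_APPLICABLE = "NotApplicable"


@dataclass
class Result:
    rule: str                       # e.g. "13-004"
    outcome: Outcome
    message: Optional[str] = None   # expected for Fail outcomes
    location: Optional[str] = None  # optional, e.g. a page or object reference


def summarize(results):
    """Count results per outcome, including outcomes that never occurred."""
    counts = {o.value: 0 for o in Outcome}
    for r in results:
        counts[r.outcome.value] += 1
    return counts
```

Keeping `NeedsReview` separate from `Fail` in such a summary preserves the distinction between confirmed violations and items that still need a human decision.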

Listing checks

Run horn list-checks to see all checks registered in your version of Horn:

```bash
horn list-checks
```

Released under the MIT / Apache 2.0 License.