Pre-push review: fix-localisation-warnings

Three commits that clean up 298 spurious localisation build warnings produced on every macOS build. The reference scanner is rewritten as a Swift-aware tokeniser; user-script sandboxing is disabled at target level; three orphaned catalog keys are removed; a review-driven follow-up fixes four issues in the new tokeniser.

At a glance

298 → 0 localisation build warnings on macOS
New scanner: Swift-aware tokeniser handling single/multi-line strings, nested literals in interpolations, raw strings, block comments, and escape-aware interpolation detection
Build setup: validate phase passes --scan-roots "$SRCROOT/prism"; ENABLE_USER_SCRIPT_SANDBOXING = NO at target level (no per-phase override exists)
Catalog hygiene: removed three orphaned keys left over from removed UI
Review-driven follow-up: fixed escaped-backslash classification bug, raw-string false-positives, dead code, list→set perf

Important changes

Swift-aware tokeniser replaces 11-pattern regex set
Sandbox disabled at target level
Escape-aware interpolation detection
Raw strings skipped wholesale
Three orphaned catalog keys removed

Verdict

Ready to push

All review findings either fixed or explicitly deferred with rationale. macOS build is clean (zero localisation warnings, zero new compiler warnings). 31 Python tests pass, lint clean.

Review findings

8 raised · 5 fixed · 3 skipped

Commits

b5d62d3 [fix]: Silence 298 spurious localisation build warnings — Arjen Schwarz · 2026-05-17
de121c1 [fix]: Apply pre-push review fixes to localisation scanner — Arjen Schwarz · 2026-05-17
3c32352 [docs]: Document build-warning hygiene work in implementation.md — Arjen Schwarz · 2026-05-17

Three-level explanation

Beginner Intermediate Expert

What changed

Every macOS build was printing 298 warnings claiming catalog entries were unused, even for strings the app clearly uses. Two changes silence the noise without removing the check:

Pass the script the right path to scan and turn off Xcode's sandbox so it can actually walk the source tree.
Replace the eleven brittle search patterns with a small Swift parser that understands string interpolation, multi-line strings, and nested literals.

A follow-up commit fixed four issues reviewers spotted in the new parser. Three genuinely orphaned catalog entries (two left over from a disabled Settings picker, one superseded by a newer string) were deleted.

Why it matters

When every build prints hundreds of warnings, developers stop reading them. Real problems get buried. Cleaning the noise restores the warning channel.

Architecture

The orphan scanner in Tools/validate-localisation.py previously matched eleven hand-written regexes (Text("…"), Button("…"), etc.). It missed Toggle, Picker, Section, .alert, multi-line strings, nested literals in interpolations, and any literal with \(…) interpolation.

Replaced with a recursive-descent tokeniser (_extract_swift_string_literals + helpers) that yields every string literal body. Interpolated literals become regex patterns in which each \(…) matches any printf specifier from _FORMAT_SPEC_RE (%@, %lld, %ld, etc.). The Xcode build phase now passes --scan-roots "$SRCROOT/prism", and ENABLE_USER_SCRIPT_SANDBOXING is off at target level: the sandbox grants only literal access to declared input directories, not recursive subpath access, and there is no per-phase override.

Trade-offs

Hand-rolled tokeniser over SwiftSyntax: no Swift-tooling dependency for ~150 lines of Python.
Wildcard format-spec match instead of inferring spec from interpolation type: type inference would require real type-checking; xcstringstool's convention makes the wildcard sufficient.
Sandboxing off target-wide vs. per-phase: per-phase setting isn't exposed. The two build scripts are trusted local code that only read project sources and write to $DERIVED_FILE_DIR.

Tokeniser internals

Four cooperating consumers: _consume_single_line_string, _consume_multiline_string, _find_interpolation_end, _skip_raw_string. The interpolation finder delegates to the string consumers so a ) inside a string inside an interpolation doesn't terminate the interpolation. Multi-line literals are reassembled via _process_multiline_body with closing-"""-based indent stripping and \<newline> line continuation.
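
The delegation described above can be sketched as two mutually recursive helpers. This is a simplified illustration rather than the project's code: the names find_interpolation_end and skip_string are hypothetical, multi-line strings are left out, and the first function takes the index of the opening parenthesis.

```python
def find_interpolation_end(source, open_paren):
    """Return the index just past the `)` that closes the `(` at open_paren."""
    depth = 0
    i = open_paren
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Hand nested string literals to the string consumer so that
            # any `)` inside them is never counted.
            i = skip_string(source, i)
            continue
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return n


def skip_string(source, quote):
    """Return the index just past the single-line literal opening at quote."""
    i = quote + 1
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            return i + 1
        if ch == '\\' and i + 1 < n:
            if source[i + 1] == '(':
                # Nested interpolation: recurse back into the finder.
                i = find_interpolation_end(source, i + 1)
            else:
                i += 2  # ordinary escape pair such as \" or \\
            continue
        i += 1
    return n


# Hypothetical Swift snippet: the interpolation contains a string holding ")".
src = r'Text("a \(f(")")) b")'
start = src.index('\\(')
end = find_interpolation_end(src, start + 1)
```

Here src[start:end] is the whole `\(f(")"))` span, whereas a plain search for the first `)` after the backslash would stop inside the nested string.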

Escape-aware classification

Pre-fix bug: the substring check '\\(' in literal classified bodies containing an escaped backslash (e.g. a\\(x)b from Swift "a\\(x)b") as interpolated, turning legitimate text into format-spec wildcards. Fix: _literal_has_interpolation walks character-by-character, skipping \\ escape pairs; a matching escape-skip was added to _yield_with_inner_literals and _interpolated_literal_to_pattern.
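
A small worked comparison of the two checks, assuming the literal body is held as raw source text. The helper name has_interpolation is illustrative; it follows the same walk as _literal_has_interpolation in the diff.

```python
def has_interpolation(body):
    """True if body contains a `\\(` whose backslash is not itself escaped."""
    i, n = 0, len(body)
    while i < n:
        if body[i] == '\\':
            if i + 1 < n and body[i + 1] == '(':
                return True
            i += 2  # skip the whole escape pair, e.g. `\\` or `\n`
            continue
        i += 1
    return False


escaped = r'a\\(x)b'  # Swift source "a\\(x)b": escaped backslash, then literal "(x)"
real = r'a\(x)b'      # Swift source "a\(x)b": genuine interpolation

naive_escaped = '\\(' in escaped          # substring check flags it anyway
walk_escaped = has_interpolation(escaped)
walk_real = has_interpolation(real)
```

The substring check reports the escaped case as interpolated, while the pair-skipping walk reports it as plain text and still detects the genuine interpolation.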

Performance

Resolution is O(N×M): ~295 catalog keys × ~271 unique interpolated literals ≈ 10ms after the list→set change. End-to-end script is ~150ms, invisible next to Swift compilation. At 10× catalog size this would warrant a single compiled |-alternation regex.
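
A rough sketch of what the single-alternation variant might look like. It assumes the per-literal patterns are available as unanchored regex source strings, which is not how the current code returns them, and the name build_combined_matcher is hypothetical. Whether it actually beats the nested loop would need measuring.

```python
import re


def build_combined_matcher(pattern_bodies):
    """Join unanchored pattern sources into one anchored alternation."""
    combined = '|'.join(f'(?:{body})' for body in pattern_bodies)
    return re.compile(r'\A(?:' + combined + r')\Z')


# Illustrative pattern bodies and keys, not taken from the real catalog.
bodies = [
    r'count %(?:@|lld)',
    r'%(?:@|lld) of %(?:@|lld) left',
]
matcher = build_combined_matcher(bodies)

keys = ['count %lld', '%@ of %lld left', 'Note Style']
matched = {key for key in keys if matcher.match(key)}
```

With one compiled pattern, each catalog key is tested once instead of once per interpolated literal.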

Known limitations

Regex literals (Swift 5.7+ /…/) not recognised; codebase doesn't use them.
Tokeniser preserves raw source bytes (\\ stays two characters) rather than interpreting Swift escapes. Invisible today; matters if a catalog key ever contains an escape.
Best-effort by design — false-negatives produce spurious orphan warnings (easy to investigate); the escape-aware and raw-string handling close the two known false-positive vectors that would hide orphans.

Important changes — detailed

Swift-aware tokeniser replaces 11-pattern regex set

Tools/validate-localisation.py

Why it matters. The previous scanner missed nearly every interpolated literal, every Toggle/Picker/Section/alert/multi-line use, and every nested literal. That produced 298 spurious 'unreferenced' warnings on every build and would mask real orphans as the catalog grew.

What to look at. Tools/validate-localisation.py:386-578 (_extract_swift_string_literals + helpers)

Takeaway. When string-literal extraction has to handle nested constructs (interpolation containing string containing interpolation), a small recursive-descent state machine in ~150 lines is cheaper than maintaining ever-growing regex lists. Python's tokenize/ast/shlex don't model Swift, and SwiftSyntax adds tooling overhead disproportionate to the problem.

Rationale. Decision 9 of the localisation spec explicitly tolerates false negatives in the warning scanner (warnings never fail the build). The pre-existing regex approach went beyond tolerable: it produced wrong warnings for valid code, not just missing warnings for unusual code.
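
To make the failure mode of the old scanner concrete, here is a small reproduction using two of the removed patterns from the diff. The Swift snippet and the catalog keys are made up for illustration.

```python
import re

# Two of the eleven removed patterns, copied from the diff.
OLD_PATTERNS = [
    re.compile(r'Text\(\s*"([^"]+)"'),
    re.compile(r'Button\(\s*"([^"]+)"'),
]

# Hypothetical Swift source: one Toggle and one interpolated Text.
swift_source = (
    'Toggle("Show badges", isOn: $showBadges)\n'
    'Text("count \\(n)")\n'
)

found = {m.group(1) for p in OLD_PATTERNS for m in p.finditer(swift_source)}

# Hypothetical catalog keys that the source above really does reference.
catalog_keys = {'Show badges', 'count %lld'}
spurious = catalog_keys - found
```

Both keys end up reported as unreferenced: Toggle has no pattern at all, and the interpolated literal is captured as raw source text that never equals the catalog's format-specifier key.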

Sandbox disabled at target level

prism.xcodeproj/project.pbxproj

Why it matters. Xcode's user-script sandbox grants `literal` access only to declared input directories — it forbids recursive subpath access. The orphan scanner must walk every `.swift` file under `prism/`, so sandboxed runs reported every catalog key as unreferenced.

What to look at. prism.xcodeproj/project.pbxproj:513,575 (ENABLE_USER_SCRIPT_SANDBOXING) and :294 (--scan-roots argument)

Takeaway. Apple's user-script sandbox in Xcode 15+ doesn't expose a per-phase override. If a script needs subpath access to project sources, the choices are (a) generate an xcfilelist enumerating every file (brittle), (b) disable the target-level setting (broad), or (c) move the work to a separate scanner-only target. For trusted local scripts that read sources and write to $DERIVED_FILE_DIR only, (b) is acceptable.

Rationale. Considered xcfilelist generation (rejected: 198-file set churns), separate target (rejected: loses per-build feedback), and accepting warnings (rejected: 298 noise lines hide real orphans). The two scripts in this target are trusted local code; reverting to sandbox-on would silently break the scanner.
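
For context, the rejected xcfilelist option would need a generator along these lines. This is only a sketch: the function name and output file are hypothetical, and it assumes file-list entries may reference $(SRCROOT).

```python
from pathlib import Path


def write_swift_xcfilelist(scan_root, output_path, srcroot):
    """Write one $(SRCROOT)-relative entry per .swift file under scan_root."""
    root = Path(scan_root)
    entries = sorted(
        '$(SRCROOT)/' + path.relative_to(srcroot).as_posix()
        for path in root.rglob('*.swift')
    )
    Path(output_path).write_text('\n'.join(entries) + '\n')
    return len(entries)
```

The list has to be regenerated and recommitted whenever a Swift file is added, renamed, or removed, which is the churn the rationale refers to.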

Escape-aware interpolation detection

Tools/validate-localisation.py

Why it matters. Pre-fix, a literal like `a\\(x)b` (escaped backslash followed by literal `(x)`) was classified as interpolated and turned into a regex matching `a%@b`. Currently dormant (no catalog key has `\\(`), but a real correctness bug that would silently corrupt match results.

What to look at. Tools/validate-localisation.py:_literal_has_interpolation, _yield_with_inner_literals, _interpolated_literal_to_pattern

Takeaway. Substring checks (`'\\(' in literal`) and naive `i + 1` lookups don't compose with escape sequences. A two-character walk that skips `\\` pairs is required wherever you classify or transform Swift string bodies. Same pattern applies to any tokenised language that uses backslash escapes plus a backslash-prefixed sigil for a richer construct.

Rationale. Found by review agent; fixed with single helper plus matching escape-skip in two existing walkers. New test pins the regression.

Raw strings skipped wholesale

Tools/validate-localisation.py

Why it matters. MarkdownBlock.swift uses `#"..."#` raw strings for regex patterns. In raw strings, `\(` is literal text, not interpolation. The pre-fix scanner would (a) extract regex bodies as 'literals', spuriously matching catalog keys that happened to share the regex text, and (b) treat `\(` inside them as interpolation and synthesise wrong patterns.

What to look at. Tools/validate-localisation.py:_skip_raw_string + _extract_swift_string_literals dispatch

Takeaway. When scanning for one kind of literal, identify and skip the other kinds completely rather than half-parsing them. Raw strings, regex literals, and triple-quoted strings each have terminator rules that don't compose with the simple-string consumer.

Rationale. Found by review agent. Skipping raw strings wholesale is sufficient because Prism never uses them for localised content; if that changes, a separate consumer with `\#(...)` interpolation handling would be needed.
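
The terminator rule that makes wholesale skipping straightforward can be shown in a few lines. This sketch covers single-line raw strings only (the diff also handles the triple-quoted form); the function name and the stray-`#` fallback are assumptions made for the example.

```python
def skip_raw_string(source, start):
    """Return the index just past the raw string starting at source[start] == '#'."""
    n = len(source)
    hashes = 0
    while start + hashes < n and source[start + hashes] == '#':
        hashes += 1
    quote = start + hashes
    if quote >= n or source[quote] != '"':
        return start + 1  # assumption: a stray '#' is stepped over one char at a time
    # The closer is a quote followed by the same number of '#' as the opener.
    closer = '"' + '#' * hashes
    end = source.find(closer, quote + 1)
    return n if end == -1 else end + len(closer)


# Hypothetical Swift line: the raw string contains `"#` and `\(x)` as plain text.
src = r'let p = ##"a "# b \(x)"## + "Save"'
rest = src[skip_raw_string(src, src.index('#')):]
```

After the skip, only the ordinary "Save" literal remains for the normal string consumer; the inner `"#` and `\(x)` are never inspected.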

Three orphaned catalog keys removed

prism/Localizable.xcstrings

Why it matters. Catalog hygiene: keys for UI that no longer exists should be removed so the build-warning channel stays meaningful.

What to look at. prism/Localizable.xcstrings: `%@ free exports remaining`, `Note Style`, `Note display style`

Takeaway. Removing a Settings picker or other UI element should include removing the catalog entries it was the only consumer of. The build's orphan-warning surface is what catches these omissions — it has to be noise-free to work.

Rationale. The bubble-style picker was disabled in #243 with code preserved for potential restore; the catalog entries were left in place by oversight. The `%@ free exports remaining` single-arg form was superseded by the two-arg `%@ of %@ free exports remaining` during paywall iteration.

Key decisions

Disable target-level `ENABLE_USER_SCRIPT_SANDBOXING`

Xcode doesn't expose a per-phase override. Both scripts in the target are trusted local code that only read project sources and write to $DERIVED_FILE_DIR. specs/localisation/design.md is updated; project convention would also warrant a fresh ADR entry (drafted by the spec-review agent, but left out of this PR to keep the change tight).

Hand-rolled tokeniser, not SwiftSyntax

Adding SwiftSyntax for a build-time warning scanner is disproportionate. Python's tokenize/ast/shlex don't model Swift's lexical rules (triple-quoted strings with indent stripping, \(...) with nested literals, nested block comments). The ~150-line tokeniser is sufficient and self-contained.

Wildcard format-spec match

Each \(...) interpolation in a source literal matches any catalog format specifier (%@|%lld|%ld|…) rather than inferring from the interpolated expression's type. Type inference would require real type-checking; xcstringstool's convention of synthesising spec from the type makes the wildcard match safe for our catalog.
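
As an illustration of the wildcard behaviour, the sketch below builds a pattern from a literal and checks it against a few keys. It is deliberately simplified: it assumes interpolations contain no nested parentheses or strings and ignores escape pairs, and to_wildcard_pattern is a hypothetical name.

```python
import re

# Specifier alternation as it appears in the diff.
FORMAT_SPEC = r'%(?:@|lld|ld|llu|lu|d|u|lf|f|i|s)'

# Simplified: matches `\(...)` only when the expression has no parentheses.
_SIMPLE_INTERPOLATION = re.compile(r'\\\([^()]*\)')


def to_wildcard_pattern(literal):
    """Replace each simple interpolation with the any-specifier wildcard."""
    pieces = _SIMPLE_INTERPOLATION.split(literal)
    body = FORMAT_SPEC.join(re.escape(piece) for piece in pieces)
    return re.compile(r'\A' + body + r'\Z')


# Hypothetical source literal; the variable names inside \(...) are made up.
pattern = to_wildcard_pattern(r'\(used) of \(total) free exports remaining')

two_arg = pattern.match('%@ of %@ free exports remaining') is not None
mixed = pattern.match('%lld of %ld free exports remaining') is not None
one_arg = pattern.match('%@ free exports remaining') is not None
```

The two-argument key matches whatever specifier appears in each position, and the superseded one-argument key does not match, so it would still surface as an orphan.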

Skip raw strings entirely

Raw strings (#"..."#) hold regex patterns in this project; their bodies should never be interpreted as catalog references. Skip via _skip_raw_string rather than partially parsing. If raw strings ever hold localised content with \#(...) interpolation, this needs a dedicated consumer.

Review findings

Severity	Area	Finding	Resolution
major	Tools/validate-localisation.py:_literal_has_interpolation	Pre-fix, `'\\(' in literal` mis-classified bodies with escaped backslashes (`\\(x)`) as interpolated. Latent corruption.	Added escape-aware `_literal_has_interpolation`; matching escape-skip in `_yield_with_inner_literals` and `_interpolated_literal_to_pattern`. New test pins the behaviour.
major	Tools/validate-localisation.py raw-string handling	Swift raw strings (`#"..."#` used in MarkdownBlock.swift for regex patterns) were partially tokenised: their bodies were extracted as literals and `\(` inside them was treated as interpolation.	Added `_skip_raw_string` that consumes the entire raw-string span. New test pins the behaviour.
minor	Tools/validate-localisation.py:_scan_referenced_keys	`interpolated` collected as a list with ~14% duplicates across 198 files; resolve step ran 4× longer than necessary (~39ms vs ~10ms).	Changed to `set`.
minor	Tools/validate-localisation.py:_process_multiline_body	`skip_next_leading` was assigned False in every branch and never read True (dead state).	Removed.
minor	Tools/Tests/test_validate_localisation.py	Missing edge-case tests for block comments and raw-string skipping.	Added `test_block_comment_contents_are_not_referenced`, `test_raw_string_is_skipped`, `test_escaped_backslash_before_paren_is_not_interpolation`. Test count 25 → 31.
minor	Tools/validate-localisation.py:_consume_single_line_string + _consume_multiline_string	~15 lines of duplicated escape-handling between the two consumers.	Skipped: both functions are readable; unification would obscure the differences (newline-as-terminator vs not, multi-line post-processing vs not).
minor	specs/localisation/decision_log.md	Spec-review agent suggests new ADR entries for (a) target-level sandbox disable and (b) tokeniser approach.	Skipped here. Both decisions are documented in design.md and the agent note; converting to ADRs is appropriate as a separate spec-tidy task.
minor	Tools/validate-localisation.py: regex literals and positional format args	Swift 5.7+ regex literals (`/…/`) and positional printf args (`%1$lld`) aren't handled.	Skipped. Codebase doesn't use regex literals; catalog keys use the non-positional form (positional appears only in translated values, not keys). Documented as known limitation in implementation.md.
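
If positional specifiers ever do appear in catalog keys, the last finding could probably be addressed by widening the specifier alternation. The sketch below is an assumption about how that might look, not something this change implements; POSITIONAL_FORMAT_SPEC is a hypothetical name.

```python
import re

# Alternation copied from the diff, with a hypothetical optional `N$` prefix
# so that positional forms such as %1$lld are accepted as well.
POSITIONAL_FORMAT_SPEC = r'%(?:\d+\$)?(?:@|lld|ld|llu|lu|d|u|lf|f|i|s)'

spec = re.compile(r'\A' + POSITIONAL_FORMAT_SPEC + r'\Z')

accepts_positional = spec.match('%1$lld') is not None
accepts_plain = spec.match('%@') is not None
rejects_malformed = spec.match('%$d') is None
```

The non-positional forms keep matching unchanged, so the widened alternation would be a drop-in replacement under that assumption.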

Per-file diffs

Tools/validate-localisation.py Modified +346 / -47

diff --git a/Tools/validate-localisation.py b/Tools/validate-localisation.py
index 30093b9..363f7cf 100755
--- a/Tools/validate-localisation.py
+++ b/Tools/validate-localisation.py
@@ -379,28 +379,243 @@ def _compile_catalog(merged_path, compile_output):
         shutil.rmtree(staging, ignore_errors=True)


-_KEY_REF_PATTERNS = [
-    re.compile(r'Text\(\s*"([^"]+)"'),
-    re.compile(r'Button\(\s*"([^"]+)"'),
-    re.compile(r'Label\(\s*"([^"]+)"'),
-    re.compile(r'String\(\s*localized:\s*"([^"]+)"'),
-    re.compile(r'LocalizedStringKey\(\s*"([^"]+)"'),
-    re.compile(r'LocalizedStringResource\(\s*"([^"]+)"'),
-    re.compile(r'NSLocalizedString\(\s*"([^"]+)"'),
-    re.compile(r'\.help\(\s*"([^"]+)"'),
-    re.compile(r'\.badge\(\s*"([^"]+)"'),
-    re.compile(r'\.navigationTitle\(\s*"([^"]+)"'),
-    re.compile(r'\.navigationSubtitle\(\s*"([^"]+)"'),
-]
+# Catalog format specifiers we recognise when matching interpolated literals.
+_FORMAT_SPEC_RE = r'%(?:@|lld|ld|llu|lu|d|u|lf|f|i|s)'
+
+
+def _extract_swift_string_literals(source):
+    # Yield each Swift string literal body in *source*. Handles single-line
+    # literals, triple-quoted multi-line literals, string interpolation with
+    # nested string literals, line and block comments, and backslash escapes.
+    # Multi-line literals are reassembled per Swift semantics so the captured
+    # text matches the catalog key the compiler would synthesise. Raw strings
+    # (``#"..."#``) are skipped — they are only used for regex patterns in
+    # this project and never contain catalog references.
+    n = len(source)
+    i = 0
+    while i < n:
+        ch = source[i]
+        if ch == '/' and i + 1 < n and source[i + 1] == '/':
+            nl = source.find('\n', i)
+            i = n if nl == -1 else nl
+            continue
+        if ch == '/' and i + 1 < n and source[i + 1] == '*':
+            i = _skip_block_comment(source, i)
+            continue
+        if ch == '#':
+            j = i
+            while j < n and source[j] == '#':
+                j += 1
+            if j < n and source[j] == '"':
+                i = _skip_raw_string(source, i, j - i)
+                continue
+        if ch == '"' and source.startswith('"""', i):
+            body, i = _consume_multiline_string(source, i)
+            if body is not None:
+                yield from _yield_with_inner_literals(body)
+            continue
+        if ch == '"':
+            body, i = _consume_single_line_string(source, i)
+            if body is not None:
+                yield from _yield_with_inner_literals(body)
+            continue
+        i += 1
+
+
+def _skip_block_comment(source, start):
+    n = len(source)
+    depth = 1
+    i = start + 2
+    while i < n and depth > 0:
+        if source.startswith('/*', i):
+            depth += 1
+            i += 2
+        elif source.startswith('*/', i):
+            depth -= 1
+            i += 2
+        else:
+            i += 1
+    return i
+
+
+def _skip_raw_string(source, start, hashes):
+    # Swift raw strings: N `#` + `"` ... `"` + N `#`. Inside, `\` is literal and
+    # interpolation is `\#(...)` with the same count of `#`. We skip the whole
+    # thing — raw strings hold regex/path patterns, not catalog keys.
+    n = len(source)
+    open_pos = start + hashes
+    if source.startswith('"""', open_pos):
+        close = '"""' + '#' * hashes
+        end = source.find(close, open_pos + 3)
+    else:
+        close = '"' + '#' * hashes
+        end = source.find(close, open_pos + 1)
+    return n if end == -1 else end + len(close)
+
+
+def _consume_single_line_string(source, start):
+    """Consume ``"..."`` starting at the opening quote. Returns ``(body, next_i)``.
+
+    ``body`` includes the inner ``\\(...)`` markers verbatim so the caller can
+    recurse into them. If the string is unterminated, ``body`` is ``None``.
+    """
+    n = len(source)
+    i = start + 1
+    out = []
+    while i < n:
+        ch = source[i]
+        if ch == '"':
+            return ''.join(out), i + 1
+        if ch == '\n':
+            return None, i + 1
+        if ch == '\\' and i + 1 < n:
+            nxt = source[i + 1]
+            if nxt == '(':
+                interp_end = _find_interpolation_end(source, i + 1)
+                out.append(source[i:interp_end])
+                i = interp_end
+                continue
+            out.append(source[i:i + 2])
+            i += 2
+            continue
+        out.append(ch)
+        i += 1
+    return None, n
+
+
+def _consume_multiline_string(source, start):
+    n = len(source)
+    i = start + 3
+    raw = []
+    while i < n:
+        if source.startswith('"""', i):
+            body = _process_multiline_body(''.join(raw))
+            return body, i + 3
+        if source[i] == '\\' and i + 1 < n:
+            nxt = source[i + 1]
+            if nxt == '(':
+                interp_end = _find_interpolation_end(source, i + 1)
+                raw.append(source[i:interp_end])
+                i = interp_end
+                continue
+            raw.append(source[i:i + 2])
+            i += 2
+            continue
+        raw.append(source[i])
+        i += 1
+    return None, n
+
+
+def _find_interpolation_end(source, backslash_pos):
+    """Given the position of the ``\\`` in ``\\(``, return the index after ``)``.
+
+    Tracks paren nesting and skips over nested string literals so a literal
+    containing ``)`` does not terminate the interpolation early.
+    """
+    n = len(source)
+    i = backslash_pos + 2
+    depth = 1
+    while i < n and depth > 0:
+        ch = source[i]
+        if ch == '"':
+            if source.startswith('"""', i):
+                _body, i = _consume_multiline_string(source, i)
+            else:
+                _body, i = _consume_single_line_string(source, i)
+            continue
+        if ch == '(':
+            depth += 1
+        elif ch == ')':
+            depth -= 1
+        elif ch == '\\' and i + 1 < n:
+            i += 2
+            continue
+        i += 1
+    return i
+
+
+def _process_multiline_body(raw_body):
+    """Apply Swift's multi-line string semantics: strip common indentation and
+    join lines whose final character is a `\\` line-continuation."""
+    if raw_body.startswith('\n'):
+        raw_body = raw_body[1:]
+    elif raw_body.startswith('\r\n'):
+        raw_body = raw_body[2:]
+    closing_indent = ''
+    last_nl = raw_body.rfind('\n')
+    if last_nl != -1:
+        closing_indent = raw_body[last_nl + 1:]
+        if closing_indent and not closing_indent.strip():
+            raw_body = raw_body[:last_nl]
+        else:
+            closing_indent = ''
+    lines = raw_body.split('\n')
+    if closing_indent:
+        lines = [
+            line[len(closing_indent):] if line.startswith(closing_indent) else line
+            for line in lines
+        ]
+    joined = []
+    for idx, line in enumerate(lines):
+        if idx < len(lines) - 1 and line.endswith('\\'):
+            joined.append(line[:-1])
+        else:
+            joined.append(line)
+            if idx < len(lines) - 1:
+                joined.append('\n')
+    return ''.join(joined)
+
+
+def _literal_has_interpolation(literal):
+    # True if *literal* (a post-consumption body) contains an unescaped `\(`.
+    # Walks character-by-character to skip over `\\` escape pairs so that a
+    # literal like ``a\\(x)b`` (escaped backslash followed by a paren) is NOT
+    # treated as interpolation.
+    n = len(literal)
+    i = 0
+    while i < n:
+        if literal[i] == '\\':
+            if i + 1 < n and literal[i + 1] == '(':
+                return True
+            i += 2
+            continue
+        i += 1
+    return False
+
+
+def _yield_with_inner_literals(body):
+    """Yield the literal body plus any string literals discovered inside its
+    interpolations. Recurses so deeply-nested ``Text("a \\(Text("b"))")`` works.
+    Skips over ``\\\\`` escape pairs so an escaped backslash before ``(`` is
+    not mistaken for an interpolation marker."""
+    yield body
+    i = 0
+    n = len(body)
+    while i < n:
+        if body[i] == '\\' and i + 1 < n:
+            if body[i + 1] == '(':
+                end = _find_interpolation_end(body, i)
+                inner = body[i + 2:end - 1]
+                yield from _extract_swift_string_literals(inner)
+                i = end
+                continue
+            i += 2
+            continue
+        i += 1


 def _scan_referenced_keys(scan_roots):
-    """Best-effort collection of catalog-key references from Swift source files.
+    """Collect catalog-key references from Swift source files.

-    False negatives (keys built via interpolation, keys hidden behind helpers)
-    are tolerated; the warnings produced never fail the build.
+    Returns ``(literals, interpolated)``: literals without ``\\(...)`` join the
+    set of direct references; literals with interpolation become regex patterns
+    matched against catalog keys with format specifiers.
+
+    Best-effort: false negatives (keys built dynamically via helpers) are
+    tolerated. The warnings produced never fail the build.
     """
-    referenced = set()
+    literals = set()
+    interpolated = set()
     for root in scan_roots:
         path = Path(root)
         if not path.exists() or not path.is_dir():
@@ -410,10 +625,55 @@ def _scan_referenced_keys(scan_roots):
                 content = swift_file.read_text(errors="replace")
             except OSError:
                 continue
-            for pattern in _KEY_REF_PATTERNS:
-                for match in pattern.finditer(content):
-                    referenced.add(match.group(1))
-    return referenced
+            for literal in _extract_swift_string_literals(content):
+                if _literal_has_interpolation(literal):
+                    interpolated.add(literal)
+                else:
+                    literals.add(literal)
+    return literals, interpolated
+
+
+def _interpolated_literal_to_pattern(literal):
+    """Convert a Swift literal with ``\\(...)`` interpolations to a regex matching
+    catalog keys that use printf-style format specifiers in those positions.
+    Backslash escape pairs (``\\\\``) are passed through so an escaped backslash
+    before ``(`` does not consume the following parens as an interpolation."""
+    parts = []
+    i = 0
+    n = len(literal)
+    while i < n:
+        if literal[i] == '\\' and i + 1 < n:
+            if literal[i + 1] == '(':
+                depth = 1
+                j = i + 2
+                while j < n and depth > 0:
+                    if literal[j] == '(':
+                        depth += 1
+                    elif literal[j] == ')':
+                        depth -= 1
+                    j += 1
+                parts.append(_FORMAT_SPEC_RE)
+                i = j
+                continue
+            parts.append(re.escape(literal[i:i + 2]))
+            i += 2
+            continue
+        parts.append(re.escape(literal[i]))
+        i += 1
+    return re.compile('\\A' + ''.join(parts) + '\\Z')
+
+
+def _resolve_interpolated_references(catalog_keys, interpolated_literals):
+    if not interpolated_literals:
+        return set()
+    patterns = [_interpolated_literal_to_pattern(lit) for lit in interpolated_literals]
+    matched = set()
+    for key in catalog_keys:
+        for pattern in patterns:
+            if pattern.match(key):
+                matched.add(key)
+                break
+    return matched


 def _emit_unreferenced_warnings(catalog_keys, referenced):
@@ -457,8 +717,10 @@ def main(argv=None):
     if rc != EXIT_OK:
         return rc

-    referenced = _scan_referenced_keys(scan_roots)
-    _emit_unreferenced_warnings(catalog.get("strings", {}).keys(), referenced)
+    literals, interpolated = _scan_referenced_keys(scan_roots)
+    catalog_keys = catalog.get("strings", {}).keys()
+    referenced = literals | _resolve_interpolated_references(catalog_keys, interpolated)
+    _emit_unreferenced_warnings(catalog_keys, referenced)

     return EXIT_OK

Tools/Tests/test_validate_localisation.py Modified +147 / -0

diff --git a/Tools/Tests/test_validate_localisation.py b/Tools/Tests/test_validate_localisation.py
index f0eec44..ecb6ef2 100644
--- a/Tools/Tests/test_validate_localisation.py
+++ b/Tools/Tests/test_validate_localisation.py
@@ -474,6 +474,153 @@ class ScanWarningTests(ScriptTestCase):
         self.assertEqual(rc, 0)
         self.assertIn("warning", stderr)

+    def test_interpolation_matches_format_specifier(self):
+        # Text("count \(n)") should mark "count %lld" as referenced.
+        catalog = _make_catalog({
+            "count %lld": _simple_entry("count %lld", "count %lld", "count %lld"),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_interp"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            'struct V { var body: some View { Text("count \\(n)") } }\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        self.assertFalse(any("count %lld" in line for line in warning_lines))
+
+    def test_multiline_string_matches_catalog_key(self):
+        # Multi-line literal with \-line-continuation should join into a single key.
+        joined = "A long sentence with continued wrapping for layout."
+        catalog = _make_catalog({
+            joined: _simple_entry(joined, joined, joined),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_multiline"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            'Text(\n    """\n    A long sentence with continued \\\n    wrapping for layout.\n    """\n)\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        self.assertFalse(any(joined in line for line in warning_lines))
+
+    def test_nested_literal_inside_interpolation_is_found(self):
+        # Text("outer \(Text("inner"))") should mark "inner" as referenced.
+        catalog = _make_catalog({
+            "inner": _simple_entry("inner", "inner", "inner"),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_nested"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            'Text("outer \\(Text("inner").bold()) tail")\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        self.assertFalse(any("inner" in line for line in warning_lines))
+
+    def test_escaped_backslash_before_paren_is_not_interpolation(self):
+        # Swift literal "Foo\\(x)" — escaped backslash followed by literal `(x)`
+        # — must NOT be treated as an interpolation. Otherwise the scanner would
+        # synthesise a regex matching "Foo%@" and falsely report the catalog key
+        # "Foo %@" as referenced even though the source has no real
+        # interpolation matching it.
+        catalog = _make_catalog({
+            "Foo %@": _simple_entry("Foo %@", "Foo %@", "Foo %@"),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_escape"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            'let s = "Foo\\\\(x)"\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        self.assertTrue(any("Foo %@" in line for line in warning_lines),
+                        f"expected 'Foo %@' to be warned as orphaned: {warning_lines}")
+
+    def test_raw_string_is_skipped(self):
+        # Raw strings (#"..."#) hold regex patterns, not catalog references.
+        # The contents must NOT be treated as a catalog reference even if they
+        # textually look like one. The catalog key "boldStarRegex" exists; the
+        # only place "boldStarRegex" appears is inside a raw string, so it
+        # should remain unreferenced.
+        catalog = _make_catalog({
+            "boldStarRegex": _simple_entry("x", "x", "x"),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_raw"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            'let pattern = #"boldStarRegex \\(notInterp)"#\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        # The key SHOULD be warned as unreferenced — the only textual match was
+        # inside a raw string, which the scanner now skips.
+        self.assertTrue(any("boldStarRegex" in line for line in warning_lines),
+                        f"expected boldStarRegex to be warned: {warning_lines}")
+
+    def test_block_comment_contents_are_not_referenced(self):
+        # A catalog key whose only textual occurrence is inside /* ... */ should
+        # remain unreferenced.
+        catalog = _make_catalog({
+            "commented.key": _simple_entry("x", "x", "x"),
+        })
+        source = self.write_json("source.xcstrings", catalog)
+        overrides = self.write_json("overrides.json",
+                                    {"schemaVersion": "1.0", "overrides": {}})
+        sources_dir = self.tmpdir / "swift_block"
+        sources_dir.mkdir()
+        (sources_dir / "V.swift").write_text(
+            '/* old code: Text("commented.key") */\nlet x = 1\n'
+        )
+
+        rc, _stdout, stderr, _ = self.run_script(
+            source=source, overrides=overrides, scan_roots=sources_dir,
+        )
+
+        self.assertEqual(rc, 0)
+        warning_lines = [line for line in stderr.splitlines() if "warning" in line]
+        self.assertTrue(any("commented.key" in line for line in warning_lines),
+                        f"expected commented.key to be warned: {warning_lines}")
+

 class MalformedInputTests(ScriptTestCase):
     def test_malformed_catalog_json_exits_two(self):
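
Reviewer sketch for the escape-aware test above. This is a minimal illustration of the classification that `test_escaped_backslash_before_paren_is_not_interpolation` pins, assuming the body is the raw source text between the quotes. The function name `has_interpolation` is hypothetical; it is not the scanner's `_literal_has_interpolation`, only a guess at the same idea.

```python
def has_interpolation(body):
    # Walk the literal body one character at a time. A backslash always
    # starts a two-character escape pair, so an escaped backslash can never
    # be mistaken for the start of an interpolation.
    i = 0
    while i < len(body) - 1:
        if body[i] == "\\":
            if body[i + 1] == "(":
                return True   # a real interpolation opener
            i += 2            # some other escape pair: skip both characters
            continue
        i += 1
    return False


print(has_interpolation(r"count \(n)"))  # True: real interpolation
print(has_interpolation(r"Foo\\(x)"))    # False: escaped backslash, then plain "(x)"
print("\\(" in r"Foo\\(x)")              # True: the naive substring check misfires
```

The last line shows why a plain substring check is not enough: the two characters backslash and `(` do appear next to each other in the body, but the backslash belongs to the preceding escape pair.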

prism.xcodeproj/project.pbxproj Modified +3 / -3

diff --git a/prism.xcodeproj/project.pbxproj b/prism.xcodeproj/project.pbxproj
index 5f6d70c..80fee3e 100644
--- a/prism.xcodeproj/project.pbxproj
+++ b/prism.xcodeproj/project.pbxproj
@@ -291,7 +291,7 @@
 			);
 			runOnlyForDeploymentPostprocessing = 0;
 			shellPath = /bin/sh;
-			shellScript = "set -euo pipefail\n/usr/bin/env python3 \"${SRCROOT}/Tools/validate-localisation.py\" --source \"${SRCROOT}/prism/Localizable.xcstrings\" --overrides \"${SRCROOT}/specs/localisation/en-AU-overrides.json\" --output \"${DERIVED_FILE_DIR}/Localizable.merged.xcstrings\" --compile-output \"${DERIVED_FILE_DIR}/CompiledStrings\"\n";
+			shellScript = "set -euo pipefail\n/usr/bin/env python3 \"${SRCROOT}/Tools/validate-localisation.py\" --source \"${SRCROOT}/prism/Localizable.xcstrings\" --overrides \"${SRCROOT}/specs/localisation/en-AU-overrides.json\" --output \"${DERIVED_FILE_DIR}/Localizable.merged.xcstrings\" --compile-output \"${DERIVED_FILE_DIR}/CompiledStrings\" --scan-roots \"${SRCROOT}/prism\"\n";
 		};
 		D4A678812FA80000005E8A40 /* Localisation: install compiled strings */ = {
 			isa = PBXShellScriptBuildPhase;
@@ -510,7 +510,7 @@
 				DEVELOPMENT_TEAM = V24684SCZN;
 				ENABLE_STRICT_OBJC_MSGSEND = YES;
 				ENABLE_TESTABILITY = YES;
-				ENABLE_USER_SCRIPT_SANDBOXING = YES;
+				ENABLE_USER_SCRIPT_SANDBOXING = NO;
 				GCC_C_LANGUAGE_STANDARD = gnu17;
 				GCC_DYNAMIC_NO_PIC = NO;
 				GCC_NO_COMMON_BLOCKS = YES;
@@ -572,7 +572,7 @@
 				DEVELOPMENT_TEAM = V24684SCZN;
 				ENABLE_NS_ASSERTIONS = NO;
 				ENABLE_STRICT_OBJC_MSGSEND = YES;
-				ENABLE_USER_SCRIPT_SANDBOXING = YES;
+				ENABLE_USER_SCRIPT_SANDBOXING = NO;
 				GCC_C_LANGUAGE_STANDARD = gnu17;
 				GCC_NO_COMMON_BLOCKS = YES;
 				GCC_WARN_64_TO_32_BIT_CONVERSION = YES;

prism/Localizable.xcstrings Modified +0 / -69

diff --git a/prism/Localizable.xcstrings b/prism/Localizable.xcstrings
index 4032875..6ab285e 100644
--- a/prism/Localizable.xcstrings
+++ b/prism/Localizable.xcstrings
@@ -93,29 +93,6 @@
         }
       }
     },
-    "%@ free exports remaining": {
-      "extractionState": "manual",
-      "localizations": {
-        "en": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "%@ free exports remaining"
-          }
-        },
-        "en-GB": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "%@ free exports remaining"
-          }
-        },
-        "en-US": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "%@ free exports remaining"
-          }
-        }
-      }
-    },
     "%@ of %@ free exports remaining": {
       "extractionState": "manual",
       "localizations": {
@@ -3290,52 +3267,6 @@
         }
       }
     },
-    "Note Style": {
-      "extractionState": "manual",
-      "localizations": {
-        "en": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note Style"
-          }
-        },
-        "en-GB": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note Style"
-          }
-        },
-        "en-US": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note Style"
-          }
-        }
-      }
-    },
-    "Note display style": {
-      "extractionState": "manual",
-      "localizations": {
-        "en": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note display style"
-          }
-        },
-        "en-GB": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note display style"
-          }
-        },
-        "en-US": {
-          "stringUnit": {
-            "state": "translated",
-            "value": "Note display style"
-          }
-        }
-      }
-    },
     "Notes": {
       "extractionState": "manual",
       "localizations": {

specs/localisation/design.md Modified +2 / -2

diff --git a/specs/localisation/design.md b/specs/localisation/design.md
index 9bed9f1..caaaff4 100644
--- a/specs/localisation/design.md
+++ b/specs/localisation/design.md
@@ -49,8 +49,8 @@ The spike's success criterion: clean build produces no `Localizable.strings` fro
 | Inputs | `$(SRCROOT)/prism/Localizable.xcstrings`, `$(SRCROOT)/specs/localisation/en-AU-overrides.json`, `$(SRCROOT)/Tools/validate-localisation.py` |
 | Outputs | `$(DERIVED_FILE_DIR)/Localizable.merged.xcstrings`, `$(DERIVED_FILE_DIR)/CompiledStrings/en.lproj/Localizable.strings`, `…/en-AU.lproj/Localizable.strings`, `…/en-GB.lproj/Localizable.strings`, `…/en-US.lproj/Localizable.strings` |
 | Based on dependency analysis | Yes |
-| `ENABLE_USER_SCRIPT_SANDBOXING` | Per-target default left ON; the script writes only to `$DERIVED_FILE_DIR`, which is permitted under sandboxing. |
-| Script body | `/usr/bin/env python3 "$SRCROOT/Tools/validate-localisation.py" --source "$SRCROOT/prism/Localizable.xcstrings" --overrides "$SRCROOT/specs/localisation/en-AU-overrides.json" --output "$DERIVED_FILE_DIR/Localizable.merged.xcstrings" --compile-output "$DERIVED_FILE_DIR/CompiledStrings"` |
+| `ENABLE_USER_SCRIPT_SANDBOXING` | Set to NO at the target level. The unreferenced-key scan walks `$(SRCROOT)/prism/**/*.swift`, which the sandbox blocks (only `literal` access is granted to declared ancestor directories, not `subpath`). The two build scripts only read project sources and write to `$DERIVED_FILE_DIR`, both trusted. |
+| Script body | `/usr/bin/env python3 "$SRCROOT/Tools/validate-localisation.py" --source "$SRCROOT/prism/Localizable.xcstrings" --overrides "$SRCROOT/specs/localisation/en-AU-overrides.json" --output "$DERIVED_FILE_DIR/Localizable.merged.xcstrings" --compile-output "$DERIVED_FILE_DIR/CompiledStrings" --scan-roots "$SRCROOT/prism"` |

 ### Touch-point parity audit

@@ -98,7 +98,7 @@ Behaviour:
 - **Validate** overrides: schema version is `1.0`; every override target exists in `en-GB`; values/plural cases are non-empty.
 - **Merge**: produce a merged catalog with the `en-AU` column populated. For non-plural keys, `en-AU` value = override value if present, else `en-GB` value. For plural keys, the entire plural block is taken from the override if present (the override must specify all plural cases that `en-GB` has), otherwise inherited from `en-GB`.
 - **Compile**: invoke `xcstringstool compile` on the merged catalog, output to `--compile-output`. Exit non-zero if `xcstringstool` exits non-zero.
-- **Scan** Swift sources under `--scan-roots` for catalog-key references (best-effort grep for `Text("key")`, `Button("key")`, `String(localized: "key")`, `LocalizedStringKey("key")`, etc.); emit `warning: <key>: not referenced from source` for catalog keys not found. False positives expected for keys built via interpolation; warnings never fail the build.
+- **Scan** Swift sources under `--scan-roots` for catalog-key references. A small Swift-aware tokeniser walks each `.swift` file and yields every string literal body — single-line, triple-quoted multi-line (with leading-indent stripping and `\<newline>` joins applied), and nested literals discovered inside `\(...)` interpolations. Literals that contain interpolation become regex patterns: each `\(...)` matches any catalog format specifier (`%@`, `%lld`, `%ld`, `%lf`, `%f`, `%d`, `%i`, `%s`, etc.). For every catalog key that fails to match either the literal set or any interpolated pattern, emit `warning: <key>: declared in catalog but no source reference found`. Warnings never fail the build.

 Exit codes:

specs/localisation/implementation.md Modified +197 / -0

diff --git a/specs/localisation/implementation.md b/specs/localisation/implementation.md
index 4f50498..df79fd8 100644
--- a/specs/localisation/implementation.md
+++ b/specs/localisation/implementation.md
@@ -321,3 +321,200 @@ Xcode's auto-extraction populates the catalog with the dotted-key entries
   "%lld matches found" key remains a non-plural format for now. Adding
   plural support is a follow-up if/when an English locale needs more than
   the trivial `one`/`other` distinction.
+
+---
+
+## Build-Warning Hygiene (2026-05-17)
+
+Two follow-up commits on `worktree-fix-localisation-warnings` clean up the
+build noise that the localisation phase had been emitting since shipping:
+~298 spurious "declared in catalog but no source reference found" warnings
+on every macOS build, plus a latent risk of falsely *suppressing* real
+orphan warnings as the catalog grew.
+
+### Beginner Level
+
+#### What Changed / What This Does
+
+Every time the Mac app was built, Xcode printed 298 warning lines saying
+catalog entries had no matching code reference — even for strings the app
+clearly used. The warnings were noise, but they hid real ones (the three
+genuinely-orphaned strings buried in the list). Two commits fix the noise:
+
+1. **Tell the script where to look, and let it look.** The build phase
+   that runs the localisation validator wasn't passing it the right path
+   to the source code, *and* Xcode's sandbox was blocking the script from
+   reading source-code sub-folders. The first commit fixes both.
+2. **Teach the script to read Swift properly.** The validator used a list
+   of eleven simple search patterns to find which catalog keys were
+   referenced. Those patterns missed dozens of legitimate ways developers
+   write strings (pickers, alerts, multi-line strings, format strings with
+   numbers, strings nested inside other strings). The first commit
+   rewrites that scanner from scratch using a small Swift parser. The
+   second commit fixes four bugs reviewers found in the parser.
+
+Three truly orphaned catalog keys were removed (leftovers from a removed
+Settings picker). Net: 298 → 0 build warnings.
+
+#### Why It Matters
+
+- **Trust in the build log.** When every build prints hundreds of
+  warnings, developers stop reading them. Real problems hide.
+- **The check still works.** Removing the noise without removing the
+  check means future orphan strings (forgotten keys when UI gets removed)
+  will surface immediately as a single named warning.
+
+#### Key Concepts
+
+- **String catalog** (`.xcstrings`): Apple's JSON file mapping each
+  user-visible string to its translations.
+- **Build phase**: A shell script Xcode runs as part of compiling the
+  app — in this case, validating and compiling the string catalog.
+- **Sandboxing**: A macOS security feature that restricts what files a
+  process can read. Xcode applies it to build scripts by default.
+- **String interpolation**: Swift syntax for inserting values into a
+  string, written `\(value)`. The compiler turns this into a format
+  specifier (`%@`, `%lld`, etc.) in the catalog key.
+- **Raw string**: Swift syntax `#"..."#` for strings where backslashes
+  are literal — used for regex patterns. Inside a raw string, `\(...)`
+  is NOT interpolation.
+
+---
+
+### Intermediate Level
+
+#### Changes Overview
+
+Seven files across two commits (`b5d62d3`, `de121c1`):
+
+- `Tools/validate-localisation.py` — rewrote the reference scanner from
+  eleven hand-maintained regexes into a Swift-aware tokeniser
+  (`_extract_swift_string_literals` plus helpers). Interpolated literals
+  become regex patterns where each `\(...)` matches a printf-style
+  specifier so `Text("count \(n)")` resolves to catalog key `count %lld`.
+- `Tools/Tests/test_validate_localisation.py` — added seven tests
+  covering interpolation, multi-line strings, nested literals,
+  escape-aware classification, raw-string skipping, and block comments.
+- `prism.xcodeproj/project.pbxproj` — added `--scan-roots
+  "$SRCROOT/prism"` to the validate phase; set
+  `ENABLE_USER_SCRIPT_SANDBOXING = NO` on both build configurations.
+- `prism/Localizable.xcstrings` — removed three orphaned keys (`%@ free
+  exports remaining`, `Note Style`, `Note display style`).
+- `specs/localisation/design.md` — updated the build-phase table and the
+  scanner contract to reflect the new behaviour.
+- `docs/agent-notes/localisation.md` — new agent note covering the
+  scanner, the sandboxing decision, and the catalog-hygiene workflow.
+
+#### Implementation Approach
+
+The scanner is a recursive-descent state machine over Swift source. Top
+level (`_extract_swift_string_literals`) walks the file looking for
+literal openers; helpers consume each kind of literal and return both
+its body and the next read position. Bodies containing `\(...)` are
+yielded twice: once as a pattern source, then recursed-into so nested
+literals (`Text("a \(Text("b"))")` → `"b"`) are also captured.
+
+Classification (`_literal_has_interpolation`) walks the body
+character-by-character, skipping `\\` escape pairs, so an escaped
+backslash before a paren (`"path\\(name)"`) is not treated as
+interpolation. The same escape-aware walk happens in
+`_interpolated_literal_to_pattern`, which converts each real
+interpolation into the `_FORMAT_SPEC_RE` alternation (`%@|%lld|%ld|…`).
+
+Raw strings (`#"..."#`, optionally with multiple `#`) are skipped
+entirely via `_skip_raw_string`. They are only used in `MarkdownBlock.swift`
+for regex patterns and never contain catalog references.
+
+#### Trade-offs
+
+- **Sandboxing off at target level rather than per-phase.** Xcode
+  doesn't expose a per-phase `ENABLE_USER_SCRIPT_SANDBOXING` override.
+  Alternatives considered and rejected: generating an `xcfilelist` of
+  every `.swift` file at configure time (brittle as the file set
+  churns), moving the scan to a separate non-sandboxed lint target
+  (defeats per-build feedback). Both build scripts are trusted local
+  code, so the security cost is acceptable.
+- **Hand-rolled tokeniser instead of `SwiftSyntax` or a stdlib parser.**
+  Python's `tokenize`/`ast`/`shlex` don't model Swift's grammar.
+  `SwiftSyntax` would add a Swift-tooling dependency for what fits in
+  ~150 lines of Python.
+- **Wildcard format-specifier match.** Any interpolation matches any
+  specifier rather than inferring `%lld` from `Int` vs `%@` from
+  `String`. Type inference would need real type checking; the wildcard
+  is sufficient given xcstringstool's catalog convention.
+
+---
+
+### Expert Level
+
+#### Technical Deep Dive
+
+The tokeniser splits into four consumers — `_consume_single_line_string`,
+`_consume_multiline_string`, `_find_interpolation_end`, `_skip_raw_string`
+— each handling escape sequences and (where applicable) recursive
+descent into interpolations. `_find_interpolation_end` delegates to the
+string consumers to skip over nested literals, so a `)` inside a string
+inside an interpolation doesn't terminate the interpolation early. The
+multi-line consumer post-processes via `_process_multiline_body` to
+replicate Swift's semantics: leading-indent stripping based on the
+closing `"""` position, plus `\<newline>` line continuation. The
+captured text matches what xcstringstool generates as the catalog key.
+
+Resolution (`_resolve_interpolated_references`) is O(N×M) where N is
+catalog keys (~295) and M is unique interpolated literals (~271 after
+deduplication). Profiled at ~10ms after the list→set change in commit
+two; well under build-phase budget. The catalog and tokeniser stage
+together cost ~150ms — invisible next to ~60s Swift compilation but
+visible if you're running the script standalone.
+
+The `\\(` escape bug worth re-reading: the pre-fix code used
+`'\\(' in literal` (Python 2-char substring check) to classify
+interpolated literals. With body `a\\(x)b` from Swift source
+`"a\\(x)b"`, the substring check returned true even though the `\\` is a
+literal backslash and `(x)b` is plain text. The same flaw lived in
+`_yield_with_inner_literals` (it would recurse into the bogus
+interpolation) and `_interpolated_literal_to_pattern` (it would replace
+the bogus interpolation with a format-spec wildcard). The fix is a
+single helper `_literal_has_interpolation` and matching escape-pair
+skips in both walkers.
+
+#### Architecture Impact
+
+- The build phase contract is now: scanner needs `subpath` access to
+  `$SRCROOT/prism`. The sandboxing decision is documented in design.md
+  and the agent note; if Apple later adds a per-phase override the
+  scanner can be re-sandboxed without code changes.
+- Catalog hygiene is now load-bearing on a clean build: any orphan key
+  surfaces as a warning. The CHANGELOG entry and the agent note both
+  reference this so reviewers know to remove catalog entries when UI is
+  removed (the bubble-notes picker removal in #243 is the canonical
+  example that produced the three orphans cleaned up here).
+- The scanner is best-effort by design (warnings never fail the build).
+  False negatives — a real reference the tokeniser doesn't recognise —
+  result in a spurious orphan warning, easy to investigate. False
+  positives — a non-reference treated as one — result in a missed
+  warning, hard to detect. The escape-aware and raw-string handling
+  closes the two known false-positive vectors.
+
+#### Potential Issues
+
+- **Regex literals (`/.../`)** introduced in Swift 5.7 are not
+  recognised. The codebase doesn't use them today; if `MarkdownBlock`
+  or a new parser switches from `#"..."#` to `/.../`, the leading `/`
+  falls through and the regex body may be tokenised as a normal string.
+  Worth a follow-up if/when adopted.
+- **Multi-line strings without indent stripping** (closing `"""` flush
+  left) get `closing_indent = ''` and skip the strip step, which is
+  correct per Swift semantics but means the body retains source
+  indentation. No tests pin this; not currently used in the codebase.
+- **Escape-sequence interpretation.** The scanner preserves raw source
+  bytes (`\\` stays as two characters) rather than interpreting Swift
+  escapes. xcstringstool's catalog-key generation interprets some
+  escapes; the mismatch is invisible today because no Swift source
+  uses escapes in catalog-key positions, but if a future literal like
+  `"hello\nworld"` becomes a catalog key, the body and key will differ.
+  Document or fix when it bites.
+- **Catalog-size scaling.** Resolve is O(N×M). At ~300 keys × ~270
+  patterns the current 10ms is fine; at 10×scale (~3000 keys, ~2700
+  patterns) it would be ~1s, which would warrant a single compiled
+  alternation regex (`re.compile('|'.join(patterns))`).
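
Reviewer sketch for the wildcard match and the single-alternation idea described in the notes above. It is a simplified illustration, not the scanner's code: `literal_to_pattern` and `unreferenced_keys` are hypothetical names, the specifier list is copied from the design notes, and nested string literals inside an interpolation are not handled here (the notes say the real `_find_interpolation_end` delegates to the string consumers for that).

```python
import re

# Any interpolation may match any of these specifiers (wildcard match).
FORMAT_SPEC = r"(?:%@|%lld|%ld|%lf|%f|%d|%i|%lu|%llu|%s)"


def literal_to_pattern(body):
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            if body[i + 1] == "(":
                # Real interpolation: skip to the matching ')' by counting
                # parentheses, then emit the specifier wildcard.
                depth = 1
                j = i + 2
                while j < len(body) and depth:
                    if body[j] == "(":
                        depth += 1
                    elif body[j] == ")":
                        depth -= 1
                    j += 1
                out.append(FORMAT_SPEC)
                i = j
                continue
            # Any other escape pair is kept verbatim as literal text.
            out.append(re.escape(body[i:i + 2]))
            i += 2
            continue
        out.append(re.escape(ch))
        i += 1
    return "".join(out)


def unreferenced_keys(keys, interpolated_bodies):
    patterns = [literal_to_pattern(b) for b in interpolated_bodies]
    if not patterns:
        return list(keys)
    # One compiled alternation instead of testing every key against every pattern.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [k for k in keys if not combined.fullmatch(k)]


keys = ["count %lld", "%@ of %@ free exports remaining", "Foo %@"]
bodies = [r"count \(n)", r"\(used) of \(total) free exports remaining"]
print(unreferenced_keys(keys, bodies))  # ['Foo %@']
```

Using `fullmatch` keeps a short pattern from matching only part of a longer key; whether the real resolver anchors its patterns the same way is an assumption.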

docs/agent-notes/localisation.md Added +28 / -0

diff --git a/docs/agent-notes/localisation.md b/docs/agent-notes/localisation.md
new file mode 100644
index 0000000..b03bbb6
--- /dev/null
+++ b/docs/agent-notes/localisation.md
@@ -0,0 +1,28 @@
+# Localisation
+
+The user-facing string catalog is `prism/Localizable.xcstrings`. At build time, `Tools/validate-localisation.py` validates the catalog, merges the en-AU column from `specs/localisation/en-AU-overrides.json`, compiles per-locale `.strings`/`.stringsdict` files via `xcstringstool compile`, and warns on catalog keys with no source reference. The script runs from the "Localisation: validate, merge, compile" build phase; a second phase ("install compiled strings") copies the compiled outputs into the app bundle.
+
+## Validation script reference scanner
+
+The script walks `--scan-roots` (`$(SRCROOT)/prism` in the build phase) and yields every Swift string literal it finds, then matches them against catalog keys. The tokeniser handles:
+
+- single-line `"…"` literals
+- triple-quoted multi-line literals — leading common indentation is stripped per Swift semantics, and lines ending in `\` are joined into the next line so the captured text matches the catalog key the compiler synthesises
+- string interpolation `\(…)` with nested string literals (the inner literals are extracted and matched too)
+- line and block comments are skipped
+
+Literals containing `\(…)` are turned into regex patterns where each interpolation matches any catalog format specifier (`%@`, `%lld`, `%ld`, `%lf`, `%f`, `%d`, `%i`, `%lu`, `%llu`, `%s`). That's how `Text("paywall.exports.remaining \(remaining)")` matches the catalog key `paywall.exports.remaining %lld`.
+
+False negatives are tolerated — warnings never fail the build. Tests live in `Tools/Tests/test_validate_localisation.py`.
+
+## Sandboxing
+
+`ENABLE_USER_SCRIPT_SANDBOXING` is **off** at the target level. The sandbox would otherwise grant only `(literal …)` access to declared input directories, which blocks the recursive `Path.rglob("*.swift")` walk that the reference scanner needs. The two build scripts only read project sources and write to `$DERIVED_FILE_DIR`, both trusted, so disabling per-phase sandboxing is acceptable here. There is no per-phase override for `ENABLE_USER_SCRIPT_SANDBOXING`.
+
+## Catalog hygiene
+
+When you remove UI that exposes a string (e.g., a Settings picker), also remove the corresponding catalog keys — orphaned keys surface as build warnings on a clean build. Git history preserves the values if the option is ever restored. The validation script lists each unreferenced key by name so they're easy to identify.
+
+## Adding UI text
+
+See `CLAUDE.md` for the rules. The short version: prefer literals in `Text("…")`, `Button("…")`, `Picker("…")`, etc. (Xcode auto-extracts these to the catalog and SwiftUI selects the `LocalizedStringKey` overload). For accessibility modifiers and non-literal contexts, wrap with `LocalizedStringKey("…")` or `String(localized: "key", defaultValue: "…")`. For pluralised counts use the catalog's plural variation — `Text("paywall.exports.remaining \(count)")` matches the plural key.
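
Reviewer sketch for the multi-line handling listed in the agent note above. It assumes the input is the raw text between the opening and closing `"""`, including the newline after the opener and the indentation before the closer. `process_multiline_body` here is a hypothetical stand-in and may differ from the script's `_process_multiline_body`.

```python
def process_multiline_body(raw):
    lines = raw.split("\n")
    # The last element is the whitespace in front of the closing delimiter;
    # Swift strips that much indentation from every content line.
    closing_indent = lines[-1]
    # The first element is whatever follows the opening delimiter (empty in valid Swift).
    content = lines[1:-1]

    pieces = []
    for line in content:
        if closing_indent and line.startswith(closing_indent):
            line = line[len(closing_indent):]
        # An odd number of trailing backslashes means the last one is a line
        # continuation; an even number is just escaped backslashes.
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            pieces.append(line[:-1])    # drop the backslash, join with the next line
        else:
            pieces.append(line + "\n")

    text = "".join(pieces)
    # The newline before the closing delimiter is not part of the string.
    return text[:-1] if text.endswith("\n") else text


raw = "\n    Hello \\\n    world\n    second line\n    "
print(repr(process_multiline_body(raw)))  # 'Hello world\nsecond line'
```

With a flush-left closing delimiter, `closing_indent` is empty and the strip step is skipped, so the body keeps its source indentation, which matches the behaviour the implementation notes describe.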

CHANGELOG.md Modified +3 / -1

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9486c19..c7732b6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,6 +16,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed

 - Switching between rendered and raw markdown views no longer resets the scroll position to the top. The scroll percentage is now snapshotted at toggle time and restored on the receiving view (best-effort; raw lines and rendered blocks aren't 1:1 but the position is approximately preserved in both directions). Same mechanism powers cross-session restore on document re-open.
+- Localisation build phase produced 298 spurious "declared in catalog but no source reference found" warnings on every macOS build. Two root causes: the script was missing `--scan-roots`, so it ran with the default relative path that didn't resolve under Xcode's build phase CWD; and `ENABLE_USER_SCRIPT_SANDBOXING = YES` granted only `literal` (non-recursive) access to declared input directories, blocking the `*.swift` walk. The build phase now passes `--scan-roots "$SRCROOT/prism"` and target sandboxing is off (the two build scripts only read project sources and write to `$DERIVED_FILE_DIR`).
+- `Tools/validate-localisation.py` reference scanner missed many genuine catalog references because the old regex set only matched a handful of SwiftUI initialisers (`Text`, `Button`, `Label`, `String(localized:)`, etc.). Replaced with a Swift-aware tokeniser that walks every `.swift` file and yields every string literal — single-line, triple-quoted multi-line (with leading-indent stripping and `\<newline>` joins applied), and literals nested inside `\(...)` interpolations. Literals containing `\(...)` become regex patterns where each interpolation matches any catalog format specifier (`%@`, `%lld`, `%ld`, `%lf`, `%f`, `%d`, `%i`, `%s`, `%lu`, `%llu`) so e.g. `Text("paywall.exports.remaining \(remaining)")` correctly matches the catalog key `paywall.exports.remaining %lld`. Three new tokeniser tests cover interpolation, multi-line, and nested-literal cases.

 ### Changed

@@ -34,6 +36,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Removed

 - Orphaned localisation entries no longer referenced in code: "Document Notes Banner", "Top of document", "Bottom of document"
+- Three further orphaned catalog entries no longer referenced in code: `%@ free exports remaining` (superseded by the `%@ of %@ free exports remaining` two-argument form), `Note Style`, and `Note display style` (left over from the removed bubble-style picker)

 ## [0.10.0] - 2026-05-13

Things to double-check

First incremental build after merge

The alwaysOutOfDate setting is unchanged, so the validate phase still re-runs only when one of its declared inputs changes. After merge, a touched Localizable.xcstrings or any new .swift file under prism/ will trigger a re-run. Confirm that the first such build still reports zero localisation warnings.

iOS build

The iOS build couldn't be verified in this worktree (no iPhone 17 simulator available). The localisation phase is platform-agnostic and the same project settings apply, so behaviour should match macOS, but confirming a clean iOS build before merge is worth a minute.

Future raw-string usage

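If future Swift code uses raw strings for localised content (rather than regex patterns), _skip_raw_string will silently drop those references, and the affected catalog keys will be warned as unreferenced even though they are in use. Currently safe because raw strings appear only in MarkdownBlock.swift for regex patterns; revisit if the codebase pattern changes.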
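
A minimal sketch of what skipping a raw string wholesale means, to make the risk concrete. `skip_raw_string` below is a hypothetical reimplementation written for this note, not the script's `_skip_raw_string`; it assumes the caller has already found a `#` at position `i`.

```python
def skip_raw_string(src, i):
    # Count the run of '#' characters that may open a raw string.
    j = i
    while j < len(src) and src[j] == "#":
        j += 1
    hashes = j - i
    if hashes == 0 or j >= len(src) or src[j] != '"':
        return i  # not a raw-string opener (e.g. "#if"); caller advances normally
    # The literal ends at the first quote followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = src.find(closer, j + 1)
    return len(src) if end == -1 else end + len(closer)


src = 'Text(#"Welcome back"#)\nText("Notes")\n'
start = src.index("#")
rest = src[skip_raw_string(src, start):]
print(repr(rest))  # ')\nText("Notes")\n'
```

Everything between the delimiters is jumped over without being yielded, so a hypothetical catalog key `Welcome back` referenced only this way would never reach the matcher.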

Generated 2026-05-17 06:41:57 UTC · repo /Users/arjen/projects/personal/prism/.claude/worktrees/fix-localisation-warnings · regenerate with /pre-push-review.