# Round-trip refinement as fixed-point verification
#be-ddsl #autoformalization #bidirectional-transformation #self-consistency #fixed-point #category-theory #FFP
This note develops the observation that round-tripping between two representations of the same object is a fixed-point problem, and that convergence (or failure to converge) is itself a verification signal. The immediate application is matsya's --refine mode, which iterates YAML \(\to\) formal MDP \(\to\) YAML until the YAML stabilises, but the structure is more general and connects to several active research threads.
## The core construction
Let \(\mathsf{Y}\) be the space of dolo-plus stage YAMLs and \(\mathsf{M}\) the space of formal MDP writeups. We have two maps:

\[ F : \mathsf{Y} \to \mathsf{M} \quad \text{(formalisation)}, \qquad G : \mathsf{M} \to \mathsf{Y} \quad \text{(informalisation)}. \]

The round-trip operator is:

\[ \Phi \coloneqq G \circ F : \mathsf{Y} \to \mathsf{Y}. \]
A YAML \(y\) is self-consistent when \(\Phi(y) = y\): translating to a formal MDP and back recovers the same YAML. The --refine loop computes the sequence \(y_0, y_1 = \Phi(y_0), y_2 = \Phi(y_1), \ldots\) and checks for convergence.
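The --refine loop can be sketched as a plain fixed-point iteration. This is a minimal illustration, not matsya's actual implementation; the translator callables `yaml_to_mdp` and `mdp_to_yaml` are hypothetical stand-ins for the two LLM-mediated maps \(F\) and \(G\):

```python
from typing import Callable

def refine(y0: str,
           yaml_to_mdp: Callable[[str], str],   # F : Y -> M (hypothetical translator)
           mdp_to_yaml: Callable[[str], str],   # G : M -> Y (hypothetical translator)
           max_iters: int = 5) -> tuple[str, bool]:
    """Iterate the round-trip operator Phi = G . F until the YAML stabilises.

    Returns the last iterate and a flag indicating convergence.
    """
    y = y0
    for _ in range(max_iters):
        y_next = mdp_to_yaml(yaml_to_mdp(y))  # one application of Phi
        if y_next == y:                       # fixed point reached: Phi(y) = y
            return y, True
        y = y_next
    return y, False                           # no fixed point within budget: flag for human review
```

With deterministic translators the loop either hits a fixed point or cycles; the `max_iters` budget turns the latter into an explicit "genuine ambiguity" signal.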
The dual operator \(\hat\Phi \coloneqq F \circ G : \mathsf{M} \to \mathsf{M}\) checks MDP self-consistency. Compare the semiconjugacy structure in semiconjugacy-iterates.md: the iterate identities \(F \circ \Phi^n = \hat\Phi^n \circ F\) hold by the same argument (with \(S = G \circ F\), \(\hat S = F \circ G\)). So convergence on the YAML side implies convergence on the MDP side, and vice versa.
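The iterate identities can be checked concretely on a toy model. Here \(F\) "formalises" a string to its set of characters (discarding order and multiplicity, as a crude analogue of the lossy YAML \(\to\) MDP step) and \(G\) linearises canonically; these are illustrative stand-ins, not the real LLM maps:

```python
# Toy carriers: Y = strings, M = frozensets of characters.
def F(y: str) -> frozenset:              # "formalise": keep only semantic content
    return frozenset(y)

def G(m: frozenset) -> str:              # "informalise": canonical linearisation
    return "".join(sorted(m))

def Phi(y: str) -> str:                  # YAML-side round trip, S = G . F
    return G(F(y))

def Phi_hat(m: frozenset) -> frozenset:  # MDP-side round trip, S-hat = F . G
    return F(G(m))

def iterate(f, x, n):
    for _ in range(n):
        x = f(x)
    return x

y = "banana"
for n in range(4):
    # Semiconjugacy identity: F . Phi^n == Phi_hat^n . F
    assert F(iterate(Phi, y, n)) == iterate(Phi_hat, F(y), n)
```

Note that this toy \(\Phi\) converges after a single step (\(\Phi^2 = \Phi\)), mirroring the empirical one-round-trip convergence of well-formed YAMLs.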
## Why this matters: the information the fixed point carries
The interesting thing is not that a fixed point exists — for a deterministic LLM with temperature zero it trivially does after one step. The interesting thing is what the iteration reveals before it converges:
- Iter 0 \(\to\) 1: catches syntactic violations (wrong perch tags, missing `@dist`, arguments on values, `**` instead of `^`).
- Iter 1 \(\to\) 2: catches semantic drift (the MDP formalisation resolves ambiguities in the YAML, and the re-translation reflects a sharper model).
- Failure to converge: signals genuine ambiguity — the YAML and MDP representations disagree on the model's semantics, and no amount of iteration will resolve it without human input.
This is precisely the diagnostic structure of counterexample-guided abstraction refinement (CEGAR): iterate an abstraction/concretisation loop, and use the diff at each step as a counterexample that tightens the abstraction.
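In this CEGAR reading, the per-iteration diff is the counterexample. A minimal sketch of that diagnostic step, using the standard-library `difflib` (the function name is illustrative, not matsya's API):

```python
import difflib

def roundtrip_diff(y: str, y_next: str) -> list[str]:
    """Render the disagreement between successive iterates as a unified diff.

    Each hunk is a counterexample: a place where the YAML and the
    formal MDP disagree on the model's semantics.
    """
    return list(difflib.unified_diff(
        y.splitlines(), y_next.splitlines(),
        fromfile="iter_n", tofile="iter_n+1", lineterm=""))

# An empty diff certifies the fixed point Phi(y) = y.
assert roundtrip_diff("a: 1\nb: 2", "a: 1\nb: 2") == []
```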
## Connections to existing work
### Self-Refine (Madaan et al., 2023)
Self-Refine showed that a single LLM can generate, critique, and refine its own output without any external reward signal or training. The generate–feedback–refine loop achieves ~20% improvement over single-pass generation across seven tasks. Our round-trip is a special case: the "feedback" is implicit in the representation change (YAML \(\to\) MDP forces the model to make all semantic commitments explicit), and the "refinement" is the back-translation. The key difference: Self-Refine uses a single representation with explicit self-critique; we use two distinct representations where each acts as the other's critic.
### Autoformalization and back-translation
The autoformalization survey (Weng et al., 2025) documents a striking asymmetry: informalization (formal \(\to\) informal) achieves ~70% accuracy on competition problems while formalization (informal \(\to\) formal) achieves ~30%. This asymmetry is exactly what makes round-tripping useful — the easier direction serves as an error-correction channel for the harder direction. Back-translation for data augmentation (Azerbayev et al., 2023; Jiang et al., 2024) exploits this same asymmetry to generate training data by informalising formal corpora and then re-formalising.
### PAT-Agent (Zuo et al., 2025)
PAT-Agent is the closest existing system to what we have built. It autoformalises natural-language system descriptions into CSP# models, verifies them against specified properties using the PAT model checker, and uses counterexample traces to drive iterative repair. The verification step (model checking) plays the role that our MDP formalisation plays: it forces the representation into a domain where well-formedness is checkable. Ablation studies confirm that removing either the structured forward translation or the verification-driven repair loop significantly degrades performance — both directions of the round-trip are essential.
### Bidirectional transformations and lens laws
The BX classification (Hidaka et al., 2016) provides the formal vocabulary for characterising round-trip properties. The classical lens laws are:
- GetPut: \(G(F(y)) = y\) (round-trip recovery — our convergence condition).
- PutGet: \(F(G(m)) = m\) (dual recovery).
- Hippocraticness: if nothing changed, nothing should be updated.
Our \(\Phi\) is an approximation to GetPut. Exact GetPut would mean \(\Phi = \mathrm{id}\); in practice we settle for \(\Phi^n \to \mathrm{id}\) (convergence). The BX literature notes that most real systems satisfy only relaxed variants of these laws — which matches our experience with LLM-mediated translation.
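The lens laws are directly executable as checks. The toy lens below (extract a dict's sorted key list, rebuild a value-less skeleton) is illustrative only, but it shows the asymmetry the BX literature describes: PutGet can hold exactly while GetPut holds only on inputs that carry no information the formal side discards:

```python
def getput_holds(F, G, y) -> bool:
    """GetPut: G(F(y)) == y — round-trip recovery on the YAML side."""
    return G(F(y)) == y

def putget_holds(F, G, m) -> bool:
    """PutGet: F(G(m)) == m — dual recovery on the MDP side."""
    return F(G(m)) == m

# Toy lens: F extracts a sorted key tuple, G rebuilds a skeleton dict.
F = lambda d: tuple(sorted(d))
G = lambda ks: {k: None for k in ks}

# PutGet holds exactly for this lens...
assert putget_holds(F, G, ("a", "b"))
# ...but GetPut only holds on inputs the lens can fully reconstruct:
assert getput_holds(F, G, {"a": None})
assert not getput_holds(F, G, {"a": 1})
</imports>```

The inputs on which GetPut does hold are exactly the "semantically saturated" ones, in the sense used below for the \(\Upsilon\) map.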
### AutoDSL (Shi et al., 2024)
AutoDSL automates DSL design itself via bidirectional optimisation: top-down from programming-language priors (context-free grammars), bottom-up from domain corpora (EM + Dirichlet process clustering). The DSL acts as a bridge representation between unstructured natural language and machine-executable procedures. This is structurally analogous to dolo-plus's role as a bridge between informal model descriptions and executable Bellman operators — and AutoDSL's quality metrics (soundness, lucidity, completeness, laconicity) are a useful framework for evaluating dolo-plus YAML quality.
## The Backus / FFP connection
In Backus's FFP framework, a program is an object that denotes a function — the meaning map \(\mu : \text{Objects} \to \text{Functions}\) is a fixed point of the metacircular semantics. The DDSL analog is the \(\Upsilon\) map:

\[ \Upsilon : \mathsf{Y} \to \text{Operators}, \]
sending symbolic stage descriptions (YAML) to semantic operators (Bellman/Euler/forward operators on function spaces). Round-trip refinement tests whether a YAML is in the image of \(\Upsilon^{-1} \circ \Upsilon\) — i.e., whether the syntactic description is semantically saturated: it contains exactly the information that the semantics needs, no more and no less.
In Backus's algebra-of-programs terms, the round-trip laws are program equivalence laws: if \(\Phi(y) = y\), then the YAML and MDP representations are provably equivalent under the translation. The diffs produced by --refine are counterexamples to equivalence — they point to exactly where the representations disagree.
The deeper categorical structure is a retraction: if \(G \circ F = \mathrm{id}_{\mathsf{Y}}\) (exact GetPut), then \(\mathsf{Y}\) is a retract of \(\mathsf{M}\) and \(F\) is a section. In practice the maps are LLM-mediated and only approximately functorial, so we are really computing an approximate retraction — but the convergence criterion is still well-defined and practically useful.
## What this does not give us
Round-trip refinement is a necessary but not sufficient condition for correctness. A YAML can be self-consistent under round-tripping and still be wrong — the model it describes may not match the user's intention. What round-tripping catches is internal inconsistency: mismatches between what the YAML declares and what the formal MDP implies. It does not catch:
- Missing economic content (a constraint the user intended but did not state).
- Correct syntax encoding a wrong model.
- Errors in the RAG context that both directions reproduce faithfully.
For these, we need external verification — the Lean 4 type-checking layer (planned), or human review.
## Open questions
- Convergence rate: empirically, well-formed YAMLs converge in 1 round-trip. Is there a characterisation of which models require more iterations?
- Temperature and stochasticity: at temperature > 0, \(\Phi\) is a random operator. Does the sequence \(\Phi^n(y)\) converge in distribution? Is the stationary distribution concentrated on "good" YAMLs?
- Partial convergence: can we define a metric on \(\mathsf{Y}\) such that \(d(y_n, y_{n+1}) \to 0\) monotonically, even when exact convergence fails? The normalised diff size is a crude version of this.
- Compositionality: if stages \(S_1\) and \(S_2\) each pass round-trip verification, does their composition (via connectors/twisters) also pass? This would connect to the functoriality of \(\Upsilon\).
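On the partial-convergence question, the crude "normalised diff size" metric mentioned above can be made concrete with `difflib`'s similarity ratio; a monotone-decrease check is then a weak convergence monitor. A sketch (illustrative names, not matsya's API):

```python
import difflib

def yaml_distance(a: str, b: str) -> float:
    """Normalised diff size in [0, 1]: 0 means identical iterates."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def converging(iterates: list[str]) -> bool:
    """Partial-convergence check: d(y_n, y_{n+1}) is non-increasing in n."""
    d = [yaml_distance(a, b) for a, b in zip(iterates, iterates[1:])]
    return all(x >= y for x, y in zip(d, d[1:]))

assert yaml_distance("a: 1", "a: 1") == 0.0
```

Whether a genuine metric with this monotonicity property exists for LLM-mediated \(\Phi\) is exactly the open question; `SequenceMatcher` gives a pseudo-metric on surface syntax, not on semantics.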
## References
- Madaan et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. NeurIPS 2023.
- Weng et al. (2025). Autoformalization in the Era of Large Language Models: A Survey. arXiv:2505.23486.
- Hidaka, Tisi, Cabot, Hu (2016). Feature-Based Classification of Bidirectional Transformation Approaches. SoSyM 15, 907–928.
- Zuo et al. (2025). PAT-Agent: Autoformalization for Model Checking. ASE 2025.
- Shi et al. (2024). AutoDSL: Automated Domain-Specific Language Design. ACL 2024.
- Backus (1978). Can Programming Be Liberated from the von Neumann Style? Turing Award Lecture. CACM 21(8).