arxiv:2601.21116

AI-Assisted Engineering Should Track the Epistemic Status and Temporal Validity of Architectural Decisions

Published on Jan 28

Abstract

AI-assisted software engineering needs frameworks that track decision validity and evidence expiration; the paper proposes a First Principles Framework with epistemic layers, conservative aggregation, and automated decay tracking.

AI-generated summary

This position paper argues that AI-assisted software engineering requires explicit mechanisms for tracking the epistemic status and temporal validity of architectural decisions. LLM coding assistants generate decisions faster than teams can validate them, yet no widely adopted framework distinguishes conjecture from verified knowledge, prevents trust inflation through conservative aggregation, or detects when evidence expires. We propose three requirements for responsible AI-assisted engineering: (1) epistemic layers that separate unverified hypotheses from empirically validated claims, (2) conservative assurance aggregation grounded in the Gödel t-norm that prevents weak evidence from inflating confidence, and (3) automated evidence decay tracking that surfaces stale assumptions before they cause failures. We formalize these requirements as the First Principles Framework (FPF), ground its aggregation semantics in fuzzy logic, and define a quintet of invariants that any valid aggregation operator must satisfy. Our retrospective audit applying FPF criteria to two internal projects found that 20-25% of architectural decisions had stale evidence within two months, validating the need for temporal accountability. We outline research directions including learnable aggregation operators, federated evidence sharing, and SMT-based claim validation.

Community

Author here (Sankalp).

This is a position paper about a failure mode we see as LLM copilots get embedded into engineering workflows: architectural decisions get produced faster than teams can validate them, and the evidence behind them quietly expires.

We argue responsible AI-assisted software engineering needs three explicit requirements:

  1. Epistemic layers that separate hypotheses from empirically validated claims (L0 to L2)
  2. Conservative assurance aggregation grounded in fuzzy logic (Gödel t-norm / min) to prevent trust inflation in serial dependency chains (a minimal sketch follows this list)
  3. Automated evidence decay ("valid-until") so stale assumptions surface before they cause failures
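
As a minimal sketch of requirements 2 and 3 (the claims, scores, and dates below are hypothetical, not taken from the paper), conservative aggregation plus decay tracking can look like this:

```python
from datetime import date

# Toy data: each evidence item carries a reliability in [0, 1] and a valid-until date.
evidence = [
    {"claim": "cache layer meets p99 latency target", "reliability": 0.9,
     "valid_until": date(2026, 3, 1)},
    {"claim": "vendor SLA covers regional failover", "reliability": 0.6,
     "valid_until": date(2026, 1, 15)},
]
today = date(2026, 1, 30)

# Requirement 3: evidence past its valid-until window is surfaced as stale.
stale = [e["claim"] for e in evidence if e["valid_until"] < today]

# Requirement 2: Gödel t-norm (min), so a serial chain is never rated
# more trustworthy than its weakest link.
effective_reliability = min(e["reliability"] for e in evidence)

print("stale evidence:", stale)                          # ['vendor SLA covers regional failover']
print("effective reliability:", effective_reliability)   # 0.6
```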

We formalize these requirements as the First Principles Framework (FPF) and introduce a Gamma-invariant quintet constraining valid aggregation operators.

I would love feedback or criticism from folks building agentic SWE tooling, architecture copilots, or verification systems. In particular:

  • How would you set validity windows in practice?
  • Should aggregation be topology-aware (serial vs redundant evidence) by default?
  • What would a good benchmark for "epistemic drift" look like?

Happy to answer questions.

Quick FAQ (since these come up often):

Q: Is this saying LLMs are bad?
A: No. This is about engineering governance primitives when LLMs accelerate decision-making.

Q: Why use min (Gödel t-norm)?
A: For serial dependency chains, weakest-link semantics are a safety property. It prevents trust inflation. We discuss non-serial cases and future topology-aware operators in the paper.
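
A toy numeric illustration (the numbers are made up, not from the paper): for a serial chain with evidence reliabilities 0.9, 0.9, and 0.3, the Gödel t-norm gives min(0.9, 0.9, 0.3) = 0.3, whereas an arithmetic mean would report roughly 0.7 and hide the weak link; the product t-norm (0.9 × 0.9 × 0.3 ≈ 0.24) is also conservative but keeps shrinking with chain length, which min avoids.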

Q: Is this a tool paper?
A: No. It is a position paper plus formalization, with deployment evidence motivating the problem.

If you want to build on this:

A practical next step could be a small "decision governance" layer (sketched in code after this list) that stores:

  • a decision graph (claims and dependencies)
  • evidence items with (F, scope, R) and valid-until
  • a computed Reff score with weakest-link explanation
  • alerts when evidence expires
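
Here is a minimal, self-contained sketch of that layer in Python. All names (Evidence, Decision, r_eff, and the example decisions and dates) are hypothetical illustrations of the idea, not the paper's implementation; expired evidence is scored 0.0 and aggregation is weakest-link (min).

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Evidence:
    claim: str
    formality: str        # F: e.g. "L0-hypothesis", "L1-tested", "L2-verified"
    scope: str            # what the evidence actually covers
    reliability: float    # R in [0, 1]
    valid_until: date

@dataclass
class Decision:
    name: str
    evidence: list[Evidence] = field(default_factory=list)
    depends_on: list["Decision"] = field(default_factory=list)

    def expired_evidence(self, today: date) -> list[Evidence]:
        """Alerting hook: evidence whose validity window has passed."""
        return [e for e in self.evidence if e.valid_until < today]

    def r_eff(self, today: date) -> tuple[float, str]:
        """Weakest-link reliability over own evidence and upstream decisions.
        Expired evidence counts as 0.0 so stale assumptions drag the score down."""
        scored: list[tuple[float, str]] = []
        for e in self.evidence:
            r = e.reliability if e.valid_until >= today else 0.0
            scored.append((r, f"evidence '{e.claim}'"))
        for d in self.depends_on:
            r, why = d.r_eff(today)
            scored.append((r, f"dependency '{d.name}' via {why}"))
        if not scored:
            return 0.0, "no evidence"
        return min(scored, key=lambda s: s[0])

# Usage: a two-node serial chain where the upstream decision's evidence has expired.
storage = Decision("use managed Postgres", [
    Evidence("load test at 2x peak traffic", "L1-tested", "EU region", 0.8, date(2026, 1, 10)),
])
api = Decision("keep the API synchronous", [
    Evidence("latency budget analysis", "L0-hypothesis", "happy path only", 0.5, date(2026, 6, 1)),
], depends_on=[storage])

today = date(2026, 1, 30)
print(storage.expired_evidence(today))  # the expired load-test evidence
print(api.r_eff(today))  # (0.0, "dependency 'use managed Postgres' via evidence 'load test at 2x peak traffic'")
```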

If anyone is working on agentic SWE or architecture copilots and wants to collaborate on an implementation or benchmark, feel free to reply here.


Figure 1: F-G-R trust tuple (FPF claim metadata: Formality, Scope, Reliability)
