FieldHash | Governed AI Memory

Abstract

FieldHash is a governed, verifiable continual learning layer for persistent AI work. It stores symbolic state, evidence trails, and validated memory outside the underlying LLM; its audit layer provides tamper-evident provenance, and the Gnosis engine governs bounded adaptation.

Instead of fine-tuning an LLM for each user, the system gates retrieval, records learning events, tests claims against evidence, and promotes durable context only when governance checks pass. FieldHash is the interface; QARIN is the neurosymbolic engine; the result is cross-session continuity without hidden base-LLM weight changes during normal use.

The current local/on-prem diagnostic exercises the explicit governed-state enforcement path using local Ollama generation and FieldHash-compatible audit artifacts: hash-chained events, checkpoint signing, certificate binding, and transparency anchoring. In a Dilithium-enabled run, the checkpoint and certificate carried CRYSTALS-Dilithium3 signatures; when post-quantum cryptography (PQC) is required, the demo now fails loudly instead of silently accepting a classical fallback. That supports audit portability for governed state where configured; it does not claim that authority can be inferred from arbitrary prose, and it does not claim legal compliance.

In Plain English

FieldHash is built to remember useful work, test ideas against evidence, and reuse what survives. It does not retrain the underlying LLM for each user; it keeps governed memory, symbolic state, and benchmark artifacts outside the model so future sessions can start from a better place.

Public evidence package

The current load-bearing public evidence is organized around four governed-memory case studies: automatic memory promotion, governed learning lifecycle, memory pressure, and auditability controls. Methods reports, figures, and source-artifact hashes are archived on Zenodo.


What learning means here

Learning does not mean the underlying LLM updates its weights during normal use. It means validated context, evidence, symbolic state, routing hints, contradiction resolutions, and research outcomes can alter future retrieval and reasoning behavior under governance.

Governed continual learning

A learning event becomes durable only after scope, evidence, confidence, contradiction, compression, and telemetry checks decide that it should influence future retrieval or routing. The term names the learning loop and its gates; the FieldHash audit layer, Gnosis, Deep Synthesis, and the interface are supporting systems, not interchangeable labels for the same thing.

Store

Useful context, evidence, decisions, and symbolic state are logged outside the model.

Gate

Relevance, novelty, evidence, and governance checks decide what can be reused.

Promote

Only stable, useful signals become durable context for future sessions.
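The Store, Gate, and Promote steps can be illustrated with a small decision function. This is a minimal sketch, not FieldHash's implementation: the `Candidate` fields, the `gate` function, and the thresholds (`min_evidence`, `min_confidence`) are hypothetical, chosen only to show the six checks named above (scope, evidence, confidence, contradiction, compression, telemetry) deciding between promote, defer, and reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    scope: str                 # hypothetical scope label, e.g. "org:acme"
    evidence_count: int        # supporting artifacts attached to the signal
    confidence: float          # 0.0 .. 1.0
    contradicts_current: bool  # conflicts with reviewed current state


def gate(c: Candidate, active_scope: str, durable: set, telemetry: list,
         min_evidence: int = 1, min_confidence: float = 0.7) -> str:
    """Decide whether a stored signal is promoted; thresholds are illustrative."""
    if c.scope != active_scope:
        decision, reason = "reject", "out of scope"
    elif c.text in durable:
        decision, reason = "reject", "already represented"  # compression check
    elif c.contradicts_current:
        decision, reason = "defer", "unresolved contradiction"
    elif c.evidence_count < min_evidence:
        decision, reason = "defer", "insufficient evidence"
    elif c.confidence < min_confidence:
        decision, reason = "defer", "low confidence"
    else:
        decision, reason = "promote", "all gates passed"
        durable.add(c.text)
    # Telemetry check: every decision, including refusals, is recorded.
    telemetry.append({"text": c.text, "decision": decision, "reason": reason})
    return decision
```

Recording refusals alongside promotions is the point of the design: a rejected or deferred signal stays visible in telemetry without becoming durable context.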

Not model fine-tuning

The base LLM is not silently retrained for each user.

Not preference training

Runtime behavior is not presented as preference-data reinforcement training.

Not plain retrieval

Candidate context still has to pass governance before use.

Not context compaction

Shortening context is different from promoting a durable learning event.

A rejected learning event matters

The governance claim is strongest when the system refuses influence: a discarded brainstorm, stale project correction, or unrelated-project memory should remain visible in audit history without becoming a future answer prior. That prevention path is benchmarked separately from open-form answer quality.

Governed & verifiable

Govern what influences the answer — and verify the record.

The enterprise claim is not just that FieldHash can remember. It is that remembered state is gated before use, leaves a reviewable trail, and can be checked after the fact when a compliance officer, collaborator, or auditor asks what changed.

FieldHash audit layer

FieldHash is the verifiable provenance layer for governed memory: hash-chained events, checkpoint signing, certificate binding, and transparency anchoring for selected audit artifacts.

In a Dilithium-enabled diagnostic, checkpoint and certificate signing used CRYSTALS-Dilithium3; when PQC is required, fallback becomes a hard failure. Anchors are operator-resistant only when retained outside the operator boundary. The claim is tamper-evident, not tamper-proof.
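To make "hash-chained" and "tamper-evident" concrete, here is a minimal sketch of a hash-chained event log using SHA-256 from Python's standard library. The names `EventLog`, `link_hash`, `checkpoint`, and the `GENESIS` value are hypothetical, and real checkpoint signing (for example with a Dilithium key) is deliberately omitted; the `require_pqc` flag only illustrates the "fail loudly instead of falling back" policy described above.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first link


def link_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal events hash identically.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EventLog:
    def __init__(self) -> None:
        self.entries = []  # list of (event, chained hash) pairs

    def append(self, event: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = link_hash(prev, event)
        self.entries.append((event, h))
        return h

    def head(self) -> str:
        # The head hash commits to every earlier event; a checkpoint signer
        # would sign this value.
        return self.entries[-1][1] if self.entries else GENESIS

    def verify(self) -> bool:
        # Recompute the chain: any edited or reordered event breaks it.
        # Tail truncation is caught by comparing against a signed head.
        prev = GENESIS
        for event, h in self.entries:
            if link_hash(prev, event) != h:
                return False
            prev = h
        return True


def checkpoint(log: EventLog, signer_alg: str, require_pqc: bool) -> dict:
    # Hypothetical policy hook: refuse classical fallback when PQC is required.
    # The actual signature computation is intentionally left out of this sketch.
    if require_pqc and signer_alg != "Dilithium3":
        raise RuntimeError(f"PQC required, refusing signer {signer_alg!r}")
    return {"head": log.head(), "alg": signer_alg}
```

This also shows why the claim is tamper-evident rather than tamper-proof: an edit is detectable on verification, but nothing in the chain itself prevents the edit.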

Governance engine

The Gnosis engine governs bounded adaptation through explicit gates: static sanitization, isolated execution, integration scoring, and transactional commit with rollback.

It is not autonomous self-evolution. It cannot modify constitutional core values or introduce raw capabilities without explicit human multi-party authorization.
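The four gates can be pictured as a pipeline where any failure restores a pre-change snapshot. The sketch below is illustrative only: `static_sanitize`, `isolated_execute`, `integration_score`, `apply_change`, the `core.` key prefix, and the placeholder scoring rule are all hypothetical stand-ins for the Gnosis gates, not their actual logic.

```python
import copy
from typing import Callable


class GateFailure(Exception):
    """Raised when a proposed change fails any gate."""


def static_sanitize(change: dict) -> None:
    # Hypothetical rule: no change may touch the constitutional core.
    if change["key"].startswith("core."):
        raise GateFailure("constitutional core is read-only")


def isolated_execute(state: dict, change: dict) -> dict:
    sandbox = copy.deepcopy(state)  # live state is never touched here
    sandbox[change["key"]] = change["value"]
    return sandbox


def integration_score(sandbox: dict) -> float:
    # Placeholder scorer: share of settings that are populated.
    return sum(v is not None for v in sandbox.values()) / max(len(sandbox), 1)


def apply_change(state: dict, change: dict, min_score: float = 1.0,
                 post_check: Callable[[dict], bool] = lambda s: True) -> bool:
    snapshot = copy.deepcopy(state)
    try:
        static_sanitize(change)                    # gate 1: static sanitization
        sandbox = isolated_execute(state, change)  # gate 2: isolated execution
        if integration_score(sandbox) < min_score:  # gate 3: integration scoring
            raise GateFailure("integration score below threshold")
        state.clear()
        state.update(sandbox)                      # gate 4: transactional commit
        if not post_check(state):
            raise GateFailure("post-commit check failed")
        return True
    except GateFailure:
        state.clear()
        state.update(snapshot)  # rollback to the pre-change snapshot
        return False
```

Because the snapshot is taken before any gate runs, a failure at any stage, including after commit, leaves the live state exactly as it was.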

Runtime loop

User input enters with task constraints.

Candidate memory and prior artifacts are retrieved by reasoning context.

Evidence, relevance, novelty, and governance gates filter what can shape the answer.

The model responds with approved context.

Symbolic state, caveats, and outcomes are logged.

Validated signals are promoted into future retrieval and routing behavior.
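The six steps of the runtime loop can be sketched as a single turn function. This is a toy sketch under stated assumptions: the `run_turn` name, the "topic:" prefix as the task constraint, the `status` field as the gate, and the "[confirmed]" marker as the promotion signal are all hypothetical simplifications, and `model` is any callable standing in for the LLM.

```python
from typing import Callable


def run_turn(user_input: str, memory: list, model: Callable[[str], str],
             log: list, promoted: list) -> str:
    # 1. Input arrives with a task constraint; here, a "topic:" prefix (toy convention).
    topic = user_input.split(":", 1)[0].strip()
    # 2. Retrieve candidates by context (toy rule: same topic).
    candidates = [m for m in memory if m["topic"] == topic]
    # 3. Gate: only approved records may shape the answer.
    approved = [m for m in candidates if m["status"] == "approved"]
    # 4. The model sees only approved context plus the user input.
    prompt = "\n".join([m["text"] for m in approved] + [user_input])
    answer = model(prompt)
    # 5. State and outcome are logged after the response, including withheld items.
    log.append({
        "input": user_input,
        "used": [m["id"] for m in approved],
        "withheld": [m["id"] for m in candidates if m not in approved],
    })
    # 6. Promote a validated signal (toy rule: the user marked it confirmed).
    if user_input.endswith("[confirmed]"):
        promoted.append({"topic": topic, "text": answer})
    return answer
```

Note that logging happens after the model responds, and withheld candidates are recorded too, so a reviewer can see what was kept out of the answer as well as what shaped it.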

The Problem

You've explained your work to your AI tools dozens of times. They still don't know you.

In most deployed chat workflows, each session starts fresh. Context vanishes. Insights don't compound. When someone asks "how did the AI reach this conclusion?"—a thesis advisor, a compliance officer, a collaborator—you have no durable evidence trail.

The latest "thinking models" explore multiple hypotheses in parallel—impressive reasoning. But each session is a brilliant collaborator with no notebook: when it ends, the maturity of the collaboration resets, and tomorrow you recreate the same context and standards all over again. Standard AI recalls facts; it does not reliably compound judgment.

RAG retrieves documents. Fine-tuning retrains models. Thinking models reason harder. But few systems compound what they learn into inspectable, durable artifacts.

Interpretability

"Black box" reasoning blocks many regulated workflows. AI that predicts without explaining is difficult to deploy in healthcare, finance, and R&D.

Rigor

Insights remain conversational dead-ends. No statistical validation, no provenance.

Compounding Knowledge

Most chat workflows do not accumulate durable, inspectable task history by default. Each problem starts with too little reusable context.

The Reality: These gaps create a trust deficit that blocks adoption wherever accountability matters—from boardrooms to research labs to doctoral committees.

The Solution

FieldHash organizes around four functions, each with a specific job. Together they let validated context, evidence, decisions, and reasoning patterns survive between sessions instead of dissolving with the chat window. The mechanisms underneath each function are described operationally in Under the Hood below.

The substrate that lets work compound

Memory & Continuity

FieldHash stores symbolic state signals alongside content, so retrieval can surface past decisions by similarity of reasoning context rather than by keyword match alone. A governed evidence gate filters retrieval candidates based on contextual relevance, information density, and source provenance, with policy rules that shift by memory tier. Three tiers keep the privacy boundaries explicit: Personal stays per-user; Collective organizational memory only admits insights that pass explicit governance reviews inside an approved deployment scope; Consolidated memory re-encodes recent activity into compact long-term artifacts. Multi-domain routing can blend specialized context and falls back to a stable baseline pathway when custom blending parameters are unavailable.

Mechanisms underneath

Evidence-gated retrieval with relevance, density, and provenance scoring
Three-tier memory hierarchy: Personal (private), Collective organizational (governed), and Consolidated (long-term)
Multi-domain routing with fallback protection and metadata audit trails

The engine that turns context into evidence

Reasoning & Discovery

The Research Lab plans experiments end-to-end: an Adaptive Model Tournament selects among gradient boosting, symbolic regression, and statistical tests, runs iterative cycles to convergence, and, for symbolic discovery, returns interpretable equations rather than black-box predictions. Deep Synthesis turns document libraries into testable claim chains, with hub-compression monitoring so structural patterns surface as evidence rather than narrative. Its falsifiability pass rejects empty or generic plans and emits matched controls, explicit failure conditions, and Research Lab-routable validation candidates when executable substrate is present. Those candidates can be routed into matched replay or ablation-style validation, with support/null outcomes feeding the next research thread. A novelty layer weighs information-gain signals alongside accuracy, so the system does not simply confirm what it already knows.

Mechanisms underneath

Adaptive Model Tournament — gradient boosting vs symbolic regression vs statistical tests, iterative convergence
Symbolic regression returning interpretable equations rather than black-box predictions
Deep Synthesis with hub-compression monitoring, falsifiability repair, and Research Lab-routable validation outcomes
Novelty and information-gain prioritization alongside accuracy

The layer that survives provider changes

Provider portability

Governed memory state — validated context, supersession history, scope, and review status — lives outside the base LLM, so it persists when the underlying model changes. A read-only policy tier governs adaptation, and every change leaves reviewer-facing continuity telemetry without exposing the raw control envelope. The documented November 2025 GPT → Claude migration is the anchor case: the governed memory layer carried approved state across the provider switch without retraining.

Mechanisms underneath

Governed state lives outside the base model: validated context, supersession, scope, and review status
Read-only policy tier governs adaptation; changes are constrained, not automatic
Per-turn continuity telemetry for state stability and reviewer audit
Provider-agnostic by design: governed state is intended to persist across GPT, Claude, and Gemini, with GPT → Claude as the documented migration case

The discipline that earns trust

Governance & Evolution

The governed evolution stack routes sensitive changes through independent checks: static validation, isolated execution, coherence review, and transactional deployment with rollback. Capability access is earned through trust tiers rather than assumed; the system starts with limited autonomy and must demonstrate alignment before gaining more. Constitutional core values are encoded as read-only signed artifacts and cannot be modified without human multi-party approval. FieldHash adds tamper-evident provenance for selected major artifacts, with optional hardware anchoring where configured and documented.

Mechanisms underneath

Trust tiers gate sensitive capabilities; access is earned, not assumed
Four-gate governed-change pipeline: static validation, sandbox execution, coherence review, transactional deployment
Signed decisions and transactional commit with rollback
FieldHash provenance: tamper-evident signatures, PQC-backed checkpoints where configured, and optional hardware anchoring

Evolutionary Pedigree

FieldHash is not a single product release; it is the result of more than 30 architectural phases since 2025. The lineage gives the system's governance and learning patterns an auditable foundation.

Foundation layer

Phase 18: Generative Foundations

Implementation of the first reflexive critics and candidate-improvement proposals. Established the baseline for governed module testing.

Projection layer

Phase 26: Language Projection

The shift from keyword-based tags to learned symbolic projections and local manifold-backed routing.

Continuity layer

Phase 30: Field Dynamics

Introduction of symbolic-state dynamics for modeling continuity, uncertainty, and reasoning-state retrieval.

Governance layer

Phase 33: Disciplined Autonomy

Formalization of the governed-change stack: static review, isolated testing, coherence review, and rollback-ready deployment.

Hierarchy layer

Phase G: Constitutional Hierarchy

Deployment of the three-tier policy hierarchy (read-only core policy → governed operating modes → domain context). Designed to support continuity across provider migrations; public materials describe selected migration examples rather than universal provider-agnostic validation.

Under the Hood

The mechanisms below are described conceptually: how they support reliable, governed AI work at runtime, how they manage state, and how they enforce boundaries. The point of this section is to demonstrate the system's rigorous, auditable architecture.

The neurosymbolic seam

Neural and symbolic elements are decoupled. Swappable LLMs and embeddings communicate with the symbolic state layer through structured metadata envelopes rather than sharing parameter weights, keeping the underlying model frozen.

Neural side. A swappable LLM generates content tokens, while embedding models compute retrieval similarity. Both are provider-agnostic, and their parameter weights stay frozen.

Symbolic side. The symbolic state, routing indices, and memory gates handle logic and persistent context. Each turn carries a structured envelope: symbolic state signals, a tier-tagged retrieval set with provenance, and continuity telemetry. These live outside the model, so no weight updates are required during use.

The seam. The LLM does not see the raw envelope; it sees prompt context shaped by it. Three runtime decisions cross the boundary:

Memory retrieval is governed and gated, not raw vector search. The LLM only ever sees context that satisfies security, relevance, and novelty checks.
Cross-domain context blending is validated and gated. If custom blending parameters are not loaded, the system falls back to a stable baseline and logs explicit fallback metadata.
Symbolic state is updated after the LLM responds, not before. The post-turn update is computed from the actual exchange and exposed as continuity telemetry to verify behavioral stability.
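The cross-domain fallback rule can be sketched as follows. This is a minimal sketch assuming per-domain relevance scores and optional custom blending weights; the `blend_context` name, the validity checks, and the uniform-weight baseline are hypothetical choices used to show "fall back to a stable baseline and log explicit fallback metadata", not the actual blending method.

```python
from typing import Optional


def blend_context(scores: dict, custom_weights: Optional[dict],
                  metadata: list) -> dict:
    """Blend per-domain relevance scores; fall back to uniform weights if needed."""
    valid = (
        custom_weights is not None
        and set(custom_weights) == set(scores)
        and all(w >= 0 for w in custom_weights.values())
        and sum(custom_weights.values()) > 0
    )
    # Hypothetical stable baseline: equal weight for every domain.
    weights = custom_weights if valid else {d: 1.0 for d in scores}
    raw = {d: scores[d] * weights[d] for d in scores}
    total = sum(raw.values()) or 1.0  # avoid dividing by zero
    # Explicit fallback metadata, so reviewers can see which path was taken.
    metadata.append({
        "blend_source": "custom" if valid else "fallback",
        "reason": None if valid else "custom weights missing or invalid",
    })
    return {d: v / total for d, v in raw.items()}
```

Treating malformed weights the same as missing ones keeps the fallback path predictable: either the custom parameters are fully usable, or the baseline is used and the reason is logged.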


Neurosymbolic state layer

FieldHash maintains a compact, human-interpretable symbolic state layer outside the base model. The state supports continuity, retrieval context, routing, and reviewer-facing telemetry across sessions.

State updates are governed rather than automatic: low-trust, repetitive, or unstable signals are constrained before they can influence durable memory or future behavior.

Detailed axes, telemetry schema, update rules, thresholds, and projection mechanics are reserved for qualified technical review under NDA.

Governed memory consolidation

Governed memory consolidation is an asynchronous re-encoding pass that promotes stable, validated context into long-term retrieval memory.

What triggers it. The consolidation process is triggered when the system detects sustained high-value output, finished research tasks, or significant milestones in the interaction lifecycle.

What it does. When active, the system condenses recent session data into structured, durable context blocks, saving them to the appropriate personal-memory tier. Highly validated insights can be promoted to an organization-scoped collective tier under strict manual or administrative governance.

What gates it. Updates are automatically deferred if the interaction patterns are marked by low certainty, unresolved contradictions, or unstable context. This keeps noisy, redundant, or temporary signals from contaminating long-term memory.

What it is not. It is not base-model fine-tuning, context compaction, unsupervised self-modification, or a continuous unmonitored background process. It is a bounded, governed indexing operation that preserves high-utility context.
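The trigger and deferral rules can be expressed as a small decision function. This sketch covers only the gating decision, not the condensing step itself; the `SessionSummary` fields, the `consolidation_decision` name, and the thresholds are hypothetical, picked to mirror the conditions named above (a milestone trigger, then deferral on low certainty, unresolved contradictions, or unstable context).

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    certainty: float           # hypothetical 0..1 confidence in session outputs
    open_contradictions: int   # unresolved conflicts with current state
    context_volatility: float  # hypothetical 0..1 instability measure
    milestone: bool            # finished task, research completion, etc.


def consolidation_decision(s: SessionSummary, min_certainty: float = 0.7,
                           max_volatility: float = 0.3) -> tuple:
    """Return (action, reason); thresholds are illustrative, not documented values."""
    if not s.milestone:
        return ("skip", "no trigger")
    if s.certainty < min_certainty:
        return ("defer", "low certainty")
    if s.open_contradictions > 0:
        return ("defer", "unresolved contradictions")
    if s.context_volatility > max_volatility:
        return ("defer", "unstable context")
    return ("consolidate", "gates passed")
```

Returning a reason with every deferral is what makes "consolidation defer reasons" available for later review.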


Evidence-gated retrieval

Memory retrieval is not raw vector search. Every candidate result passes through a multi-dimensional evidence gate before being injected into the active prompt context.

Structured evaluation channels:

Contextual Relevance: Alignment with the active task objectives and semantic constraints.
Information Density: Filtering redundant signals to optimize the balance between representative memory hubs and high-value outliers.
Provenance & Tier Policy: Verifying that source tags, user scopes, and security clearance levels match active retrieval authorization.

The evaluation parameters shift dynamically by memory tier: identity-bearing principles are treated with a conservative bias, while personal user-driven research context can adapt more dynamically under active governance.

The system surfaces audit trails for retrieval decisions, showing which segments of historical context shaped a given response without exposing private raw memory or internal weights.
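The three evaluation channels and the tier-dependent policy can be sketched as a filter that writes an audit record for every candidate. This is a minimal sketch under assumptions: the `Memory` fields, the `evidence_gate` name, and the numbers in `TIER_THRESHOLDS` are hypothetical, chosen only so that the identity-bearing "core" tier is stricter than the personal tier.

```python
from dataclasses import dataclass

# Hypothetical per-tier thresholds: identity-bearing principles are gated
# conservatively, personal research context more permissively.
TIER_THRESHOLDS = {
    "core": {"relevance": 0.8, "density": 0.5},
    "personal": {"relevance": 0.5, "density": 0.3},
}


@dataclass
class Memory:
    mem_id: str
    tier: str
    owner: str
    relevance: float  # alignment with the active task, 0..1
    density: float    # non-redundant information content, 0..1


def evidence_gate(cands: list, user: str, audit: list) -> list:
    admitted = []
    for m in cands:
        t = TIER_THRESHOLDS.get(m.tier)
        if t is None:
            reason = "unknown tier"
        elif m.tier == "personal" and m.owner != user:
            reason = "provenance: scope mismatch"  # provenance and tier policy
        elif m.relevance < t["relevance"]:
            reason = "relevance"                   # contextual relevance
        elif m.density < t["density"]:
            reason = "density"                     # information density
        else:
            reason = None
            admitted.append(m)
        # The audit record names the candidate and the reason, not its content.
        audit.append({"id": m.mem_id, "admitted": reason is None, "reason": reason})
    return admitted
```

The same relevance score (0.7, say) can be admitted from the personal tier and refused from the core tier, which is the "parameters shift by memory tier" behavior in miniature.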

Learning event anatomy

Governed continual learning is not a single memory write. A learning event is a bounded promotion decision: a candidate signal is retrieved, checked against scope and evidence, compared against stale or rejected context, then either promoted, deferred, weakened, or ignored.

Scope: personal memory, organization-scoped insights and heuristics, and internal memory lanes are separated before retrieval so private working context does not become unrestricted shared context.
Supersession: reviewed corrections can replace older context, while rejected brainstorms and noisy notes remain auditable without becoming silent priors.
Compression: repeated near-identical signals are consolidated into representative hubs, while high-information outliers can stay available for review.
Promotion record: accepted, rejected, deferred, and weakened signals can leave telemetry so later reviewers can see what changed future behavior and what did not.

This is the practical meaning of governance in the learning loop: useful work can compound, but contamination, stale corrections, and over-concentrated memory clusters are treated as first-class failure modes.
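Supersession and rejection can be modeled as status changes that keep records in the audit history while removing them from retrieval. The sketch below is hypothetical (`Record`, `LearningStore`, and the status values are illustrative names); its `audit_ok` check mirrors one of the failure modes the auditability suite looks for, a superseded record with a missing `superseded_by` link.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    rec_id: str
    text: str
    status: str = "active"               # active | superseded | rejected
    superseded_by: Optional[str] = None


@dataclass
class LearningStore:
    records: dict = field(default_factory=dict)

    def supersede(self, old_id: str, new: Record) -> None:
        # A reviewed correction replaces older context but keeps the link.
        old = self.records[old_id]
        old.status, old.superseded_by = "superseded", new.rec_id
        self.records[new.rec_id] = new

    def reject(self, rec: Record) -> None:
        rec.status = "rejected"  # kept for audit, never retrieved
        self.records[rec.rec_id] = rec

    def retrievable(self) -> list:
        return [r for r in self.records.values() if r.status == "active"]

    def audit_ok(self) -> bool:
        # Every superseded record must point at a record that exists.
        return all(r.superseded_by in self.records
                   for r in self.records.values() if r.status == "superseded")
```

Nothing is deleted: a stale correction and a rejected brainstorm both remain inspectable, but neither can reach the answer path as a silent prior.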

Policy hierarchy

FieldHash governs adaptation through a three-tier policy hierarchy.

Read-only core policy: Principles governing bounded adaptation, encoded as signed artifacts that can change only with multi-party approval.
Governed operating modes: Task-scoped configurations that adjust how context is selected and applied, bounded by the read-only policy tier.
Domain context: Specialized knowledge spaces that provide high-resolution retrieval for specific tasks.

The hierarchy lets the system adapt how it selects and applies context per task while staying tethered to the read-only policy tier — every change is bounded, governed, and logged.
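One way to picture the three tiers is a read-only core mapping that operating modes can tighten but never relax, with domain context layered on top. This is a minimal sketch using the standard-library `types.MappingProxyType`; the `CORE` keys, the `resolve_mode` name, and the clamping rules are hypothetical, and artifact signing and multi-party approval are not shown.

```python
from types import MappingProxyType

# Tier 1 (hypothetical values): a read-only view that cannot be mutated in place.
CORE = MappingProxyType({"max_context_items": 10, "allow_cross_user_memory": False})


def resolve_mode(mode_overrides: dict) -> dict:
    """Merge a task-scoped operating mode (tier 2) and domain keys (tier 3) under core bounds."""
    resolved = dict(CORE)
    for key, value in mode_overrides.items():
        if key == "allow_cross_user_memory" and value:
            raise PermissionError("operating modes cannot relax core privacy policy")
        if key == "max_context_items":
            value = min(value, CORE["max_context_items"])  # clamp to the core bound
        resolved[key] = value
    return resolved
```

For example, `resolve_mode({"max_context_items": 50, "domain": "genomics"})` yields a mode limited to 10 context items with the genomics domain attached, while any attempt to enable cross-user memory raises an error.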

The Memory Model

To balance privacy with compounding intelligence, FieldHash structures context across three distinct memory tiers:

Private Personal Memory: Isolated episodic journals and per-user working context. These are strictly private and never shared between users.
Trust-Gated Collective Organizational Memory: Governed organizational knowledge bases, promoted only after administrative review and governance checks for confidence, safety, and generality. Raw conversations are never automatically pooled.
Governed Consolidation Memory: Asynchronous offline re-encoding that condenses episodic pathways, turning raw events into stable long-term context models.

The Privacy Policy: Private Mood, Shared Wisdom. Your specific interaction patterns remain your own; only the generalized insights and heuristics that survive administrative verification benefit your approved organizational deployment scope.
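The tier boundaries can be sketched as two rules: personal memory is readable only by its owner, and only generalized insights with enough independent approvals reach the collective tier. The `Insight` and `MemoryTiers` names, the `kind` labels, and the two-reviewer quorum are hypothetical illustrations of "administrative review", not the documented process.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    text: str
    author: str
    kind: str  # "raw_conversation" or "generalized_insight"
    approvals: set = field(default_factory=set)


@dataclass
class MemoryTiers:
    personal: dict = field(default_factory=dict)  # user -> list of items
    collective: list = field(default_factory=list)
    required_approvals: int = 2                    # hypothetical review quorum

    def write_personal(self, user: str, item: Insight) -> None:
        self.personal.setdefault(user, []).append(item)

    def read_personal(self, requester: str, owner: str) -> list:
        if requester != owner:
            raise PermissionError("personal memory is per-user")
        return list(self.personal.get(owner, []))

    def promote(self, item: Insight) -> bool:
        if item.kind == "raw_conversation":
            return False  # raw conversations are never pooled automatically
        if len(item.approvals - {item.author}) < self.required_approvals:
            return False  # needs reviewers other than the author
        self.collective.append(item)
        return True
```

Excluding the author from the approval count is one simple way to make "survives administrative verification" mean independent review rather than self-approval.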

Hardware-Tethered Mesh

For intensive compute tasks, FieldHash can distribute work across a 6-node mesh. In a distributed-mesh stress benchmark, the 6-node cluster exhaustively scored 8,347,680 candidate marker panels in approximately 113 seconds, surfacing the best cross-validated panel at AUC 0.59 on the internally processed evaluation matrix. The benchmark is a high-concurrency throughput stress test, not evidence of generalized quantum speedup or clinical validation.

Throughput stress test of the distributed mesh; the AUC reflects the difficulty of the panel-selection task and is not a clinical claim.
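As a rough sanity check on the throughput figure, the reported numbers imply the rates below. The per-node figure assumes the work was split evenly across the six nodes, which is an assumption rather than something stated above, and both values inherit the "approximately 113 seconds" rounding.

```latex
\[
\frac{8{,}347{,}680\ \text{panels}}{113\ \text{s}} \approx 7.39 \times 10^{4}\ \text{panels/s},
\qquad
\frac{7.39 \times 10^{4}\ \text{panels/s}}{6\ \text{nodes}} \approx 1.23 \times 10^{4}\ \text{panels/s per node}
\]
```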

Distributed Reasoning Workloads. The system offloads selected symbolic-state tasks to mesh nodes to optimize latency under intensive stress testing.

Provenance. Even when distributed, every calculation is logged with structural provenance, ensuring that a result generated on Node 4 is just as auditable as one generated on the local host.


In Plain English

These mechanisms are designed to leave auditable telemetry: memory-gate decisions, compression metadata, continuity signals, and consolidation defer reasons can be surfaced for review without exposing private raw memory or implementation internals.

Evidence: Selected Evaluations and Case Studies

Representative results and internal evaluations

For methodology, sample sizes, caveats, and evidence-package labels, see the Evidence & Evaluation Summary.

Flagship governed-context studies

Automatic Memory Promotion

PROMOTION

90/90

In internal automatic-promotion diagnostics, FieldHash identified the authoritative memory and recovered 90/90 exact current tokens across Gemini 3.5 Flash, Claude Opus 4.7, and GPT-5.5 on a Claude-authored disjoint n=30 corpus, while retrieval-only and prompt-only smart-memory controls recovered 36/90 and 40/90. A same-budget Gemini two-pass smart diagnostic on the same corpus selected the current record 30/30 and answered 28/30 with zero stale substitutions, narrowing the claim to governed, auditable answer-path control rather than basic semantic selection. On same-family n=100 provider replications, Gemini 3.5 Flash and GPT-5.5 each reached 100/100 with zero stale-token mentions; Claude Opus 4.7 reached 95/100, with the remaining misses caused by empty provider responses rather than stale substitutions. In a provider-sensitivity fact-extraction audit on the same n=100 corpus, Gemini reached 99/100 role-equivalent current facts and 68/100 exact spans, while GPT-5.5 reached 95/100 and 76/100; strict source-span fidelity and provider-invariant extraction are not claimed as solved.


Governed Learning Lifecycle

LIFECYCLE

2400/2400

In an internal n=200 governed-learning lifecycle diagnostic, FieldHash selected the expected governed state on 2400/2400 normal lifecycle reads across promote, stale re-entry, reject, scope-collision, repair, second-promotion, rollback, re-apply, compaction, post-compaction stale re-entry, and repeated-read phases, restoring rollback state 200/200, retaining protected state through compaction 200/200, and detecting and repairing deliberate drift 200/200. Stateless read-time selectors shown only record text (Gemini 3.1 Flash Lite and GPT-5.5) reached 2200/2400 and failed all 200 rollback reads by continuing to select the later update. A full-operation-log baseline can replay the lifecycle, so the supported claim is durable materialized governed state under bounded read-time reconstruction, not base-model weight learning or broad model superiority.


Governed Memory Pressure

MEMORY

600/600

In the refreshed May 23 internal seeded-authority memory-pressure benchmark, the same frontier LLMs with FieldHash governed memory enforced reviewed/current state and recovered the approved-current fact in 600/600 cases across Gemini, Claude Opus, and GPT provider paths. Retrieval-only memory without FieldHash governance metadata recovered 415/600. The governed path reduced mean memory context exposed to the LLM to 2.00 of 10 retrieved candidates before answer construction, versus all 10 candidates in the retrieval-only baseline. Across the same three provider paths, adding a prompt-only instruction to prefer current/reviewed records improved the baseline to 464/600, but still left 136 stale-context failures and exposed all 10 retrieved memories. This supports governed-state enforcement under stale-context pressure, not a claim of superior authority inference from raw text.


Governed Memory Auditability

AUDIT

437/437

The governed memory auditability diagnostic passed 437/437 deterministic checks across 36 lifecycle scenarios. That includes 257 positive governance invariants and 180/180 negative controls that deliberately disabled governance or corrupted state, confirming the suite catches stale exposure, rejected-context promotion, missing superseded_by links, scope leakage, and stale re-promotion.


How the flagships compose

Automatic promotion is the upstream step; the lifecycle pillar tests durable state across rollback, repair, compaction, and repeated reads; memory pressure is downstream arbitration; auditability controls test whether broken governance is caught. Together, they form the bounded governed-learning loop. Read the loop case study.

Archived methods

The flagship studies are paired with a public-safe Zenodo package containing methods reports, the governed-learning loop synthesis report, aggregate figures, source-artifact hashes, and a package builder. View DOI 10.5281/zenodo.20401670.

Workflow-quality diagnostics and supporting controls

Workflow-Quality Diagnostics vs Direct Baseline

SUPPORTING DIAGNOSTIC

Against the same-provider direct baseline, FieldHash improved average composite reasoning quality by 48.6% on a 56-prompt live paired benchmark while improving grounding fit from 94.64% to 100%.

The broader diagnostic run is more representative of the current evidence frontier. In the broader diagnostic suite (v4), FieldHash measured a 52.2% relative reasoning lift with 155 wins, 5 losses, and 43 ties; exact correctness held at 100% on 24 deterministic tasks, and semantic grounding held at 100% across 32 ambiguity-control cases. It does not replace the promoted headline because task selection remains slightly below the smaller incumbent slice, but the larger sample is stronger evidence that the reasoning lift persists.

Governed memory now has its own live check. In a 32-case live governed-memory benchmark, FieldHash recovered current approved project context with a 98.96% mean recall score, a 100% seeded memory-retrieval rate, and 0% control-user leakage across continuity, rejected-noise, superseding-update, and topic-isolation tasks. The important part is not raw fact recall; the suite tests governance behavior: persistence of reviewed context, rejection of noisy notes, superseding updates, topic isolation, and user isolation.

The underlying control paths are measured separately. A deterministic governed-learning controls benchmark passed 5/5 mechanism checks covering memory arbitration, organization-scoped collective insight and heuristic write gates, audience-scoped retrieval within an organization, hub compression, and audit telemetry. This is mechanism evidence, not an open-form answer-quality benchmark.

The auditability layer has a newer falsifiability diagnostic: the 437/437 governed memory auditability suite (257 positive governance invariants and 180/180 negative controls across 36 lifecycle scenarios) remains an internal code-path check, not external validation or a base-model learning claim.

The practical effect is not “more words.” It is a stronger reasoning posture: better steering, better experiment framing, fewer generic summaries on ambiguous prompts, and tighter adherence to the user's actual constraints. Steering usefulness moved from 0.0125 to 0.3857 in the same broader live run.

In user terms, the system behaves more like a disciplined intellectual partner than a search box. The headline is an average uplift across the evaluated prompt set, not a claim that every prompt improves equally.

Composite Quality

+48.6%

0.3740 → 0.5556

Steering Usefulness

0.3857

Up from 0.0125 in baseline.

Grounding Fit

1.000

Up from 0.9464 in the same live batch.

Broader Diagnostic

+52.2%

0.3509 to 0.5340; 155/5/43 wins/losses/ties.

This benchmark is about reasoning quality under live conditions, not closed-form recall. It measures whether the system chooses a more useful line of thought while staying grounded.
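The headline percentages are consistent with reading each lift as a relative change in mean composite score, where $s$ denotes the mean composite score. This reading is inferred from the reported before/after values rather than stated explicitly:

```latex
\[
\text{relative lift} = \frac{s_{\text{FieldHash}} - s_{\text{baseline}}}{s_{\text{baseline}}},
\qquad
\frac{0.5556 - 0.3740}{0.3740} \approx 0.486,
\qquad
\frac{0.5340 - 0.3509}{0.3509} \approx 0.522
\]
```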


Supporting Proof Points: Exactness, Selection, and Ambiguity Control

SUPPORTING CHECKS

The website suite does not rely on one flattering headline. It also measures whether FieldHash preserves exact-answer competence, chooses stronger reasoning families, and avoids collapsing ambiguous prompts into the wrong semantic universe.

That matters in practice because users need more than eloquent answers. They need a system that can reason better and stay disciplined: choosing a better line of thought, preserving deterministic correctness, and keeping metaphor-heavy prompts grounded in the right domain.

Exact Correctness

100% → 100%

Callable-backed deterministic multiple-choice questions across science, statistics, algorithms, and mathematics on a 24-task safety-floor benchmark.

Task Selection

0.00 → 0.40

A 40-point lift in choosing the correct lens family across 30 approved-gold prompts.

Semantic Grounding

1.00 / 1.00

100% artifact-class and prompt-family accuracy on the 16-case ambiguity-control proxy set.

Broader Diagnostic Checks

100% / 100%

Exactness and semantic grounding held in the larger May 2026 diagnostic run.

Reasoning Stack Use

53 / 56

Quick-lite engaged on most live reasoning prompts in the website suite.

The intended outcome is disciplined leverage: stronger open-form reasoning without sacrificing closed-form competence or drifting away from the user's actual problem.


Governed Learning Controls (Mechanism Check)

GOVERNANCE

A deterministic governed-learning controls benchmark passed 5/5 mechanism checks covering memory arbitration, organization-scoped collective insight and heuristic write gates, audience-scoped retrieval within an organization, hub compression, and audit telemetry.

This is the control-plane half of governed continual learning: not everything observed becomes future context, and not every useful observation is promoted to every audience.

These checks validate deterministic promotion and suppression behavior. They do not claim base-model fine-tuning or broad predictive-model superiority.

Tools Manifold Routing (Internal Paired Evaluation)

PAIRED EVALUATION

The tools manifold learns a policy for ranking and timing tool calls (sync vs deferred) from telemetry outcomes. Tools manifold routing improved top-1 selection by 3.77 percentage points on real paired events and by 5.34 percentage points on the broader combined benchmark.

This compares the tool ranking/timing policy against a fixed baseline on paired events. The real-event result is underpowered for significance testing at this sample size, so the broader combined benchmark is the stronger line of support.
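The paired comparison can be sketched as follows, assuming each event yields a boolean top-1 hit for both the baseline and the policy. The source does not name its significance test; the exact McNemar test used here is a standard swapped-in choice for paired binary outcomes, and `paired_top1` is a hypothetical helper.

```python
from math import comb


def paired_top1(baseline_hits: list[bool], policy_hits: list[bool]) -> tuple[float, float]:
    """Return (top-1 delta in percentage points, two-sided exact McNemar p-value)."""
    if len(baseline_hits) != len(policy_hits) or not baseline_hits:
        raise ValueError("need equal-length, non-empty paired outcomes")
    n = len(baseline_hits)
    delta_pp = 100.0 * (sum(policy_hits) - sum(baseline_hits)) / n
    # Only discordant pairs carry information in McNemar's test.
    b = sum(1 for x, y in zip(baseline_hits, policy_hits) if x and not y)  # baseline-only hits
    c = sum(1 for x, y in zip(baseline_hits, policy_hits) if y and not x)  # policy-only hits
    k, m = min(b, c), b + c
    if m == 0:
        return delta_pp, 1.0
    # Exact binomial tail under H0: discordant outcomes split 50/50.
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    return delta_pp, min(1.0, 2 * tail)
```

Because only discordant pairs enter the test, a few points of lift on a small real-event set can still produce a large p-value, which is why a larger paired benchmark carries more weight.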

Learning From Resolved Contradictions

PHASE 4

In repeated contradiction-oriented synthesis runs, the system reused prior routing resolutions and selectively recovered relevant historical priors instead of repeatedly spawning generic anomaly branches.

Resolution Reuse

Active

Recurring contradiction routes can be reused from prior internal resolutions.

Prior Recovery

Selective

Targeted historical priors can be thawed instead of generic fallback branches.

Runtime uptake

Read-only

Live chat uses these outcomes as routing hints, not recycled synthesis text.

This is the current learning loop: the system reuses known contradiction-routing patterns, selectively re-materializes relevant historical priors, and feeds those outcomes back into the runtime as read-only decision support for deeper checking.
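A sketch of resolution reuse as read-only decision support, assuming a contradiction can be keyed by the unordered set of claims in conflict. `ResolutionCache` and its methods are hypothetical; the point is that a hit returns a frozen routing label rather than recycled synthesis text, and a miss leaves the caller to open a fresh branch.

```python
from types import MappingProxyType
from typing import Mapping, Optional


class ResolutionCache:
    """Hypothetical store of prior contradiction-routing resolutions."""

    def __init__(self) -> None:
        self._routes: dict[frozenset[str], str] = {}

    def record(self, claims: set[str], route: str) -> None:
        # Key by the unordered set of conflicting claims so recurrence is order-independent.
        self._routes[frozenset(claims)] = route

    def hint(self, claims: set[str]) -> Optional[Mapping[str, str]]:
        """Return a read-only routing hint, or None to fall back to a fresh branch."""
        route = self._routes.get(frozenset(claims))
        if route is None:
            return None
        # MappingProxyType gives the runtime a view it cannot mutate.
        return MappingProxyType({"route": route, "source": "prior_resolution"})
```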

This loop is the latest layer in a broader continual-learning system in which memory, governed consolidation, manifold adaptation, and contradiction recovery compound together. The runtime uses these outcomes to produce steadier judgment and deeper checks while keeping raw internal artifacts out of visible replies.

Deep Synthesis learning is thread-scoped by design: compact priors, null results, validation-plan shapes, and experiment outcomes can guide later drill-downs in the same research lineage, while cross-thread parallels are treated as weak analogies rather than evidence.
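Thread scoping can be expressed as a weighting rule. This sketch assumes each prior carries a thread id and a base relevance weight, and applies a fixed, hypothetical `CROSS_THREAD_DISCOUNT` so that cross-thread parallels rank low and are labeled as analogies rather than evidence. `Prior` and `scoped_priors` are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prior:
    thread_id: str
    kind: str      # e.g. "null_result", "validation_plan", "outcome"
    weight: float  # hypothetical base relevance in [0, 1]


CROSS_THREAD_DISCOUNT = 0.1  # hypothetical: cross-thread parallels count as weak analogies


def scoped_priors(priors: list[Prior], thread_id: str) -> list[tuple[Prior, float, str]]:
    """Rank priors for a drill-down, labeling cross-thread items as analogies, not evidence."""
    scored = []
    for p in priors:
        if p.thread_id == thread_id:
            scored.append((p, p.weight, "guidance"))
        else:
            scored.append((p, p.weight * CROSS_THREAD_DISCOUNT, "weak_analogy"))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```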

Manifold Stability

STABILITY

Production manifolds reached above 91% validation accuracy, while monitored training drift stayed within an L2 range of 0.014 to 0.121.

Val Accuracy

>91%

L2 Drift

0.014–0.121

Convergence

1–13 epochs

Measured after governed manifold updates; baseline accuracy preserved within noise margin.
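One way to read the drift and accuracy figures together is as an acceptance gate on each governed update. The sketch below assumes drift means the Euclidean (L2) norm of the parameter change and treats `max_drift` and `noise_margin` as hypothetical thresholds; the source reports an observed drift range, not a configured limit.

```python
import math
from typing import Sequence


def l2_drift(before: Sequence[float], after: Sequence[float]) -> float:
    """Euclidean (L2) norm of the parameter change produced by one update."""
    if len(before) != len(after):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for b, a in zip(before, after)))


def accept_update(before: Sequence[float], after: Sequence[float],
                  val_acc_before: float, val_acc_after: float,
                  max_drift: float = 0.15,       # hypothetical bound
                  noise_margin: float = 0.005,   # hypothetical accuracy tolerance
                  ) -> tuple[bool, float]:
    """Hypothetical gate: accept only bounded drift that keeps accuracy within noise."""
    drift = l2_drift(before, after)
    within_bound = drift <= max_drift
    preserved = val_acc_after >= val_acc_before - noise_margin
    return within_bound and preserved, drift
```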

Signal Consolidation

COMPRESSION

Hub compression consolidates redundant memory signals into a smaller set of decision-relevant features while preserving quality guard floors.

Compression Goal

Lower Redundancy

Current Role

Runtime Hygiene

Public Posture

Narrative
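Hub compression can be approximated by greedy clustering of near-duplicate signal vectors. This is a sketch under stated assumptions: signals are dense vectors, redundancy means cosine similarity above a hypothetical `merge_threshold`, and the quality guard is modeled as a floor on mean signal-to-hub similarity. The actual FieldHash compression and guard metrics are not documented here, and `compress_hubs` is an illustrative name.

```python
import math


def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def compress_hubs(signals, merge_threshold=0.95, quality_floor=0.9):
    """Greedy hub compression: fold near-duplicate signals into shared hubs.

    Returns (hubs, assignment). Falls back to the uncompressed signals if the
    mean signal-to-hub similarity would drop below quality_floor.
    """
    if not signals:
        return [], []
    hubs, members, assignment = [], [], []
    for s in signals:
        best = max(range(len(hubs)), key=lambda i: _cos(s, hubs[i]), default=None)
        if best is not None and _cos(s, hubs[best]) >= merge_threshold:
            members[best].append(s)
            # Each hub is the centroid (mean) of its member signals.
            n = len(members[best])
            hubs[best] = [sum(col) / n for col in zip(*members[best])]
            assignment.append(best)
        else:
            hubs.append(list(s))
            members.append([s])
            assignment.append(len(hubs) - 1)
    quality = sum(_cos(s, hubs[a]) for s, a in zip(signals, assignment)) / len(signals)
    if quality < quality_floor:
        return [list(s) for s in signals], list(range(len(signals)))
    return hubs, assignment
```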

Analytical Discovery

UFCT Mesh Sharding (Timeboxed Workload)

SPEED

On a sharded synthesis workload, mesh-parallel execution achieved a 2.74x mean speedup over local execution across 10 queries.

This benchmark measures orchestration and distributed execution speed, not "quantum advantage."
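The source does not say how the mean speedup was aggregated. This sketch computes both the mean of per-query ratios and the ratio of total wall times, since the two can differ noticeably when per-query times vary; `speedup_summary` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence


def speedup_summary(local_s: Sequence[float], mesh_s: Sequence[float]) -> tuple[float, float]:
    """Return (mean of per-query speedups, ratio of total wall times)."""
    if len(local_s) != len(mesh_s) or not local_s:
        raise ValueError("need equal-length, non-empty timing lists")
    if any(t <= 0 for t in mesh_s):
        raise ValueError("mesh timings must be positive")
    # Speedup per query is local wall time divided by mesh wall time.
    per_query = [loc / mesh for loc, mesh in zip(local_s, mesh_s)]
    return mean(per_query), sum(local_s) / sum(mesh_s)
```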

EU AI Act Regulatory Analysis

DEEP SYNTHESIS

Deep Synthesis can surface contradictions, edge cases, and structural tensions across long regulatory texts, then turn surviving claims into reviewable validation plans you can iterate on.

DSA↔AI Act conflicts · Transparency-Security Deadlock · Personalization Penalty


Complexity Theory Synthesis + Validation

CASE STUDY

An example of multi-pass synthesis across dense sources: generate competing hypotheses, resolve contradictions, and produce a falsifiable reasoning surface you can inspect and iterate on.


The aim: continuity you can inspect. When the work demands rigor, the system should show its reasoning surface and its constraints, not just fluent output.

Technical Architecture

This table names product and research-system layers. Interface motion is representational telemetry; the evidence claims below are tied to logged benchmarks, governed memory events, and audit artifacts rather than visual metaphor alone.

Layer	Technology	Function
Frontend	React web layer / Canvas / Tailwind	Runtime and Research Lab interface, including Prism visualizations of pipeline and field-state telemetry when available
Interaction	SVG Liquid Filters / Framer Motion	Motion language for state transitions, loading, and user feedback; visual design is not presented as evidence of cognition by itself
API	FastAPI / Python	QARIN routes for memory retrieval, vision, and streaming
Evidence Layer	Posterior Metadata / Evidence Gates	Posterior-style metadata, caveats, and likelihood-inspired scoring over compressed belief hubs
Engine	PyTorch / Scikit-Learn	Neurosymbolic routing, symbolic vector features, benchmarked research workflows, and Governed Consolidation of selected durable signals
Learning & Memory	Manifold Bus / Trust-Gated Memory	Trust-weighted routing and governed adaptation signals; normal use does not update base-LLM weights
Security	FieldHash / Audit Logger	Tamper-evident provenance for selected major artifacts and governed-memory events, with PQC-backed checkpoints and optional hardware anchoring where configured

Audited

Claim Registry

Governed

Learning Events

Traceable

Evidence Flow

Market Application

The first commercial wedge is governed AI memory for enterprise and regulated deployments: places where continuity, auditability, and control over what influences an answer are worth more than raw response volume. Adjacent markets are real, but they should follow the evidence rather than dilute the initial positioning.

Primary markets

Research & Discovery

Research teams, labs, and scientific operators: Longer-running analyses over papers and datasets, returning candidate hypotheses, validation plans, negative results, and evidence trails.

Professional Knowledge Continuity

Founders, strategists, and analysts: A professional counterpart that preserves institutional memory, operational preferences, and working standards over time. It functions as a resilient intellectual partner and is deliberately distinct from "AI friend" or emotional-support paradigms.

Adjacent applications

Applied Research Teams

Pharma / BioTech / Materials Science: Candidate hypotheses with validation plans and evidence trails across complex datasets.

Physics & Hard Science

Condensed Matter / Plasma Physics / Materials Engineering: Cross-domain synthesis that identifies patterns across experimental datasets. Negative results are as valuable as positive ones because they constrain the theoretical search space.

Institutional Memory

Legal / Compliance / Finance: Durable organizational memory, review trails, and decision context for teams that cannot rely on disposable chat history.

Education & Lifelong Learning

Students / Educators / Autodidacts: Learning support that preserves useful context and pedagogical preferences over time, with appropriate safeguards for sensitive use cases.

Related Work & Positioning

Stateful and memory-augmented LLM agents are an active area. Relevant reference points include MemGPT and Letta for agent-managed context and long-term memory, Mem0 for production memory layers across users and sessions, Zep for temporal knowledge-graph memory, HippoRAG 2 for non-parametric continual learning through retrieval, and the broader continual-learning literature around adding knowledge without retraining an LLM from scratch.

FieldHash shares the substrate-independence and inspectable-memory goals of this field. Its public claim is narrower: it combines governed memory promotion, symbolic-state telemetry, tamper-evident provenance for selected audited artifacts, and an integrated research/discovery pipeline in one product surface. The closest conceptual overlap is with memory-first agent systems such as Letta; FieldHash's differentiating choices are the symbolic-state seam, the FieldHash provenance layer, and the Deep Synthesis / Research Lab workflows that carry learning into hypothesis generation and empirical testing.

Why the Integration Matters

Integrated System Architecture

The advantage comes from the integration: persistent symbolic state, governed memory, synthesis workflows, audit artifacts, and benchmark feedback working together rather than as a thin chat wrapper.

Auditable Evidence Updating

QARIN keeps probabilistic metadata outside the underlying LLM as an auditable constraint layer. Priors, likelihood-inspired scores, and caveats are recorded through governed memory gates, helping reduce context loss between sessions.
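As an illustration of posterior-style metadata held outside the model, the sketch below keeps a Beta-Bernoulli belief per claim and logs every update with its source. The conjugate Beta update is a standard technique swapped in for clarity; QARIN's scoring is described only as likelihood-inspired, and `ClaimBelief` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimBelief:
    """Hypothetical posterior-style metadata kept outside the LLM for one claim."""
    claim: str
    alpha: float = 1.0  # prior pseudo-count of supporting evidence
    beta: float = 1.0   # prior pseudo-count of contradicting evidence
    caveats: list[str] = field(default_factory=list)
    log: list[tuple[bool, str]] = field(default_factory=list)

    def update(self, supports: bool, source: str) -> None:
        """Conjugate Beta-Bernoulli update; each observation is logged for audit."""
        if supports:
            self.alpha += 1
        else:
            self.beta += 1
        self.log.append((supports, source))

    @property
    def mean(self) -> float:
        """Posterior mean probability that the claim holds."""
        return self.alpha / (self.alpha + self.beta)
```

Because the belief and its log persist outside the model, a later session can resume from the same posterior instead of re-deriving it from chat history.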

Cumulative Learning

QARIN's learning compounds through explicit, persisted state updates rather than hidden base-LLM changes. Validated routing and context-selection signals can shape future retrieval, repeated contradiction patterns can reuse prior routing decisions, and relevant historical priors can be selectively recovered instead of rebuilding from generic fallback.

Governed Evolution

The Gnosis layer implements governed self-audit and bounded adaptation primitives for AI systems, designed to support alignment review without unsupervised self-modification.


Tamper-Evident Provenance

FieldHash uses tamper-evident provenance design, including PQC-backed signatures for selected checkpoints and optional hardware anchoring for enhanced evidence under documented assumptions.
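The hash-chaining part of tamper-evident provenance can be sketched with the standard library. Each event's SHA-256 digest covers its payload and the previous digest, so editing or reordering an earlier event invalidates every later link. This sketch does not implement CRYSTALS-Dilithium signatures or hardware anchoring; in a design like the one described, those would sign or anchor the latest digest, which is also what detects tail truncation. `append_event` and `verify_chain` are hypothetical helpers.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest for the first event's predecessor


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an event whose hash covers its payload and the previous event's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    event = {"prev": prev, "payload": payload,
             "hash": hashlib.sha256(body.encode("utf-8")).hexdigest()}
    chain.append(event)
    return event


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; edits, reordering, or removal of an interior event fail.

    Truncating the tail is not detected here: that needs a signed checkpoint of
    the latest hash, which this sketch omits.
    """
    prev = GENESIS
    for event in chain:
        body = json.dumps({"prev": prev, "payload": event["payload"]}, sort_keys=True)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if event["prev"] != prev or digest != event["hash"]:
            return False
        prev = event["hash"]
    return True
```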


Substrate Independence

Substrate independence is a shared goal across stateful-agent systems. FieldHash combines that design principle with symbolic-state telemetry, governed promotion records, and tamper-evident provenance so continuity is less dependent on any single underlying LLM.
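Substrate independence follows from keeping governed state on the application side of a narrow generation interface. In this sketch, `LLMBackend`, `GovernedContext`, and `answer` are hypothetical names; any provider that implements `generate` can be swapped in without touching promoted context.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical minimal interface any generation provider could satisfy."""

    def generate(self, prompt: str) -> str: ...


class GovernedContext:
    """State that lives outside the model and travels with the user, not the provider."""

    def __init__(self) -> None:
        self.promoted: list[str] = []

    def render(self, question: str) -> str:
        context = "\n".join(f"- {item}" for item in self.promoted)
        return f"Governed context:\n{context}\n\nQuestion: {question}"


def answer(ctx: GovernedContext, backend: LLMBackend, question: str) -> str:
    # Swapping `backend` changes the generator but leaves governed state untouched.
    return backend.generate(ctx.render(question))
```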

Limitations

FieldHash does not update base-LLM weights during normal use.

Benchmarks measure evaluated slices, not universal superiority across every task.

Internal evaluations require external replication before they should be treated as field-wide proof.

Research Lab outputs are candidate hypotheses and interpretable models, not clinical, legal, or regulatory advice.

Status

Operational Alpha

Design Language (FieldHash): Implemented
Backend (QARIN): Implemented across selected enterprise and research workflows
Safety Protocols (Gnosis): Active
Scientific Loop: Implemented subsystems with public benchmark and test evidence

Appendices & Strategic Deep Dives

APPENDIX_SL: The Scientific Loop

Automated hypothesis generation, experiment planning, and manifold-enriched insight generation for R&D teams.

APPENDIX_PHASE_H: Hardware Attestation Interfaces

IBM Brisbane and Quantum Inspire interface specs, execution-fingerprint protocols, and trust-tier policy gates.

APPENDIX_PLATFORM_ARCHITECTURE

Full gRPC mesh specifications, load-balancing policies, and disaster recovery for the 6-node research cluster.

APPENDIX_G: Phase G Roadmap

The transition from GPT to Claude and the formalization of the provider-agnostic governed-state layer.

For technical diligence

Public materials summarize FieldHash at a product and evidence level. The governed-context evidence package is archived at DOI 10.5281/zenodo.20401670. Deeper architecture notes, benchmark methodology, ablation summaries, trace examples, and governance controls are available selectively for qualified technical reviewers.
