Automatic memory promotion

Authoritative memory identified and governed across three frontier models. No weight updates.

On the blind n=30 cross-provider probe, FieldHash recovered 90/90 current tokens across Gemini, Claude, and GPT. A same-budget Gemini two-pass diagnostic selected the current record 30/30 and answered 28/30, narrowing the claim to governed answer-path control rather than basic semantic selection.

Cross-provider blind probe

90/90

Gemini, Claude, and GPT n=30

Two-pass smart diagnostic

28/30

current record selected 30/30; Gemini n=30

N=100 provider replication

Gemini/GPT 100/100

Claude reached 95/100 with 0 stale substitutions

Provider-sensitive fact extraction

99/100 | 95/100

role-equivalent facts, Gemini | GPT; exact source spans 68/100 | 76/100

Dense-memory stress

2,156 → 500

100/100 current tokens retained

Two regimes are shown: the n=30 blind cross-provider probe includes Gemini, Claude, and GPT on a Claude-authored disjoint corpus; the n=100 provider replications use the same frozen corpus, where Gemini and GPT completed cleanly while Claude produced five empty provider responses.

In practical terms, this is the moment before memory becomes useful. A workspace may contain an old plan, a correction, a rejected idea, and a newer approved direction. FieldHash has to identify the authoritative memory before the model ever sees the answer context.

The outcome: automatic conflict resolution before the model is called, which produces a cleaner, more authoritative answer path and keeps the stale and adversarial fragments evaluated here out of it.

Why this matters

The hard part is not only remembering. It is turning the authoritative memory into the next answer path.

The earlier memory-pressure case study tested whether FieldHash honors an approved-current label once that label already exists. This benchmark moves upstream: the records are unlabeled at retrieval time, and FieldHash must infer which record is authoritative before the answer is constructed.

That distinction matters for real work. Teams do not only need persistent memory; they need memory that can demote stale context, reject discarded alternatives, and carry the reviewed version forward without hiding the historical trail.

A later memory-framework diagnostic tests the same distinction downstream: the current record can be retrieved into candidates and still fail to govern the answer path.

Core comparison

FieldHash automatic promotion

90/90 exact tokens

Prompt-only smart memory

40/90 exact tokens

Retrieval-only memory

36/90 exact tokens

Recency-aware memory

3/90 exact tokens

Same-budget two-pass smart diagnostic

28/30 exact tokens; selector chose current record 30/30

The cross-provider n=30 probe used a Claude-authored disjoint corpus. FieldHash recovered every current token across Gemini, Claude, and GPT answer paths; prompt-only instructions and retrieval-only memory did not. A later same-budget Gemini two-pass smart diagnostic on the same corpus selected the current record 30/30 and answered 28/30, with zero stale substitutions. That narrows the claim: the advantage is governed, auditable answer-path control, not that a frontier model cannot identify the current record when given a separate selection pass.
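The two-pass diagnostic described above can be pictured as a selection call followed by an answer call. The sketch below is a minimal illustration under stated assumptions, not the benchmark harness: `select_fn` and `answer_fn` are hypothetical stand-ins for model calls, and the `Record` fields and toy heuristics are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str  # hypothetical identifier
    text: str


def two_pass_answer(
    question: str,
    candidates: Sequence[Record],
    select_fn: Callable[[str, Sequence[Record]], str],
    answer_fn: Callable[[str, Record], str],
) -> Tuple[str, str]:
    """Pass 1 picks one record id; pass 2 answers from that record only."""
    by_id = {record.record_id: record for record in candidates}
    chosen_id = select_fn(question, candidates)
    if chosen_id not in by_id:
        raise ValueError(f"selector returned unknown record id: {chosen_id!r}")
    return chosen_id, answer_fn(question, by_id[chosen_id])


# Toy stand-ins so the sketch runs without a model provider.
def toy_selector(question: str, candidates: Sequence[Record]) -> str:
    # Assumed heuristic: prefer a record that says it was approved.
    for record in candidates:
        if "approved" in record.text:
            return record.record_id
    return candidates[0].record_id


def toy_answerer(question: str, record: Record) -> str:
    return f"Per {record.record_id}: {record.text}"


records = [
    Record("r1", "old plan: ship token ALPHA"),
    Record("r2", "approved revision: ship token BRAVO"),
]
chosen, answer = two_pass_answer(
    "Which token ships?", records, toy_selector, toy_answerer
)
```

Selection and answering are scored separately in this shape, which is how a run can select the current record 30/30 and still answer 28/30.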

Provider results

The governed answer path replicated across providers.

The n=100 replications use the same frozen corpus and semantic-label artifact. The point is not to rank models or prove basic semantic selection; it is to test whether the governed memory path keeps stale context out of the answer surface those models receive.
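The provider results distinguish current-token hits, stale substitutions, and empty responses. A minimal sketch of that bookkeeping follows, assuming plain substring matching; the actual scorer is not described here, and the function names, category labels, and tokens are hypothetical.

```python
from collections import Counter
from typing import Sequence


def classify_response(
    response: str, current_token: str, stale_tokens: Sequence[str]
) -> str:
    """Assign one outcome label to a model response (assumed categories)."""
    text = response.strip()
    if not text:
        return "empty"  # provider returned nothing usable
    if current_token in text:
        return "current"
    if any(token in text for token in stale_tokens):
        return "stale_substitution"  # a stale token stood in for the current one
    return "other_miss"


def mentions_stale(response: str, stale_tokens: Sequence[str]) -> bool:
    """Tracked separately: a correct answer can still mention a stale token."""
    return any(token in response for token in stale_tokens)


responses = [
    "Use BRAVO-17.",
    "",
    "Use ALPHA-03.",
    "Use BRAVO-17, which replaced ALPHA-03.",
]
tally = Counter(
    classify_response(r, "BRAVO-17", ["ALPHA-03"]) for r in responses
)
stale_mentions = sum(mentions_stale(r, ["ALPHA-03"]) for r in responses)
```

Keeping `empty` apart from `stale_substitution` is what lets a miss be reported as a provider failure rather than as stale context reaching the answer.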

Gemini 3.5 Flash

Same-family n=100 scale replication.

LLM + FieldHash: 100/100
Prompt-smart memory: 25/100
Retrieval-only memory: 23/100
Recency-aware memory: 2/100
FieldHash stale mentions: 0

GPT-5.5

Same-family n=100 provider replication.

LLM + FieldHash: 100/100
Prompt-smart memory: 30/100
Retrieval-only memory: 23/100
Recency-aware memory: 0/100
FieldHash stale mentions: 0

Claude Opus 4.7

Five misses were empty Claude responses, not stale substitutions.

LLM + FieldHash: 95/100
Prompt-smart memory: 17/100
Retrieval-only memory: 14/100
Recency-aware memory: 0/100
FieldHash stale mentions: 0

Promotion audit

Fact extraction was strong, but provider-dependent.

The n=100 scorer audits split current-token recovery from fact-span extraction. The Gemini audit reached 100/100 current-token recovery, 99/100 role-equivalent facts, and 68/100 exact source spans. A GPT-5.5 rerun on the same corpus reached 99/100, 95/100, and 76/100 respectively, with one false promotion. Strict source-span fidelity and provider-invariant fact extraction are not claimed as solved.

Gemini scorer audit

Current-token recovery

100/100

Role-equivalent current facts

99/100

Exact source-span match

68/100

GPT-5.5 rerun

Current-token recovery

99/100

Role-equivalent current facts

95/100

Exact source-span match

76/100
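The gap between role-equivalent facts and exact source spans is easier to see with a toy scorer. The sketch below is an assumption about how such a split could be computed, not the audit's actual method: `normalize` is a loose stand-in for role equivalence, and the example strings are invented.

```python
import re
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace (assumed equivalence rule)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def audit_extraction(extracted: str, gold_span: str) -> Dict[str, bool]:
    """Score one extracted fact against the gold source span."""
    return {
        # Strict: the extracted text must match the source span character for character.
        "exact_span": extracted == gold_span,
        # Loose: the same fact after normalization counts as equivalent.
        "role_equivalent": normalize(extracted) == normalize(gold_span),
    }


result = audit_extraction(
    extracted="launch window 14 march",
    gold_span="Launch window: 14 March",
)
```

Under a split like this, an extraction can carry the right fact while failing the strict span check, which is the pattern the two audit columns report.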

Dense-memory stress

Crowded memory can be compressed without dropping the current fact.

A no-model stress test crowded 500 base records with duplicated optional stale and noise fragments, for 2,156 records in total. Protected hub compression reduced the candidate set back to 500 records while preserving 100/100 current-token recovery and 99/100 role-equivalent facts.

Crowded optional records

2,156 input records

After protected hub compression

500 records retained

Records removed

1,656 optional records
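The record accounting above can be reproduced with a small sketch. It assumes each record carries a `protected` flag and that compression keeps only protected records; how protected hub compression actually decides what is protected is not specified here, so `MemoryRecord` and `compress_protected` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MemoryRecord:
    record_id: str
    protected: bool  # assumed flag: True for records that must survive compression


def compress_protected(
    records: Sequence[MemoryRecord],
) -> Tuple[List[MemoryRecord], List[MemoryRecord]]:
    """Split records into those kept and those removed, preserving input order."""
    kept = [record for record in records if record.protected]
    removed = [record for record in records if not record.protected]
    return kept, removed


# Rebuild the reported sizes: 500 base records plus 1,656 optional duplicates.
base = [MemoryRecord(f"base-{i}", protected=True) for i in range(500)]
crowd = [MemoryRecord(f"dup-{i}", protected=False) for i in range(1656)]
crowded = base + crowd

kept, removed = compress_protected(crowded)
```

Returning the removed records instead of discarding them keeps the dropped fragments available for review.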

Mechanism

Governed continual learning, defined as a pre-answer control plane.

The benchmark is not claiming that the base model changed its weights. It shows a governed non-parametric learning loop: infer durable state, update authority metadata, filter stale context, then condition the next model call on the reviewed version.

Step 1

Unlabeled memory

A project has reviewed notes, superseded records, unreviewed handoffs, rejected alternatives, and near-duplicate neighboring projects.

Step 2

Promotion inference

The semantic promotion pass infers current, superseded, rejected, and ordinary state without pre-seeded canonical memory labels.

Step 3

Governed arbitration

Gnosis and the Bayesian gate decide which memories are allowed to influence the answer path before the base model responds.

Step 4

Clean answer path

The model sees the current operational fact and a smaller review surface, while stale context remains auditable outside the answer path.
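The four steps can be sketched as a small pre-answer pipeline. This is an illustration under stated assumptions, not FieldHash's implementation: the keyword rule in `infer_state` stands in for the semantic promotion pass, `arbitrate` stands in for Gnosis and the Bayesian gate, and all names and sample memories are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class State(Enum):
    CURRENT = "current"
    SUPERSEDED = "superseded"
    REJECTED = "rejected"
    ORDINARY = "ordinary"


@dataclass
class Memory:
    memory_id: str
    text: str
    state: Optional[State] = None  # Step 1: unlabeled at retrieval time


def infer_state(memory: Memory) -> State:
    """Step 2: toy keyword rule standing in for semantic promotion inference."""
    lowered = memory.text.lower()
    if "rejected" in lowered:
        return State.REJECTED
    if "superseded" in lowered:
        return State.SUPERSEDED
    if "approved" in lowered:
        return State.CURRENT
    return State.ORDINARY


def arbitrate(memories: Sequence[Memory]) -> Tuple[List[Memory], List[Memory]]:
    """Step 3: simplified gate that admits only current memories to the answer path."""
    for memory in memories:
        memory.state = infer_state(memory)
    answer_path = [m for m in memories if m.state is State.CURRENT]
    audit_trail = [m for m in memories if m.state is not State.CURRENT]
    return answer_path, audit_trail


def build_prompt(question: str, answer_path: Sequence[Memory]) -> str:
    """Step 4: the model call is conditioned on the governed context only."""
    context = "\n".join(m.text for m in answer_path)
    return f"Context:\n{context}\n\nQuestion: {question}"


memories = [
    Memory("m1", "Superseded plan: deploy key ALPHA-03"),
    Memory("m2", "Rejected alternative: deploy key GAMMA-09"),
    Memory("m3", "Approved after review: deploy key BRAVO-17"),
    Memory("m4", "Meeting notes from the kickoff"),
]
answer_path, audit_trail = arbitrate(memories)
prompt = build_prompt("Which deploy key is in force?", answer_path)
```

Nothing is deleted in this shape: the superseded and rejected memories stay in `audit_trail`, outside the prompt but still inspectable.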

What this supports

Governed continual learning can change future answers without changing the base model.

The system identified which memory was authoritative, kept stale context available for audit, and prevented it from shaping the next answer.

This is the useful version of learning for high-stakes workflows: not hidden weight updates, but visible promotion, supersession, scope, compression, and answer-path control.

Caveats

What this does not prove.

This is an internal synthetic adversarial benchmark, not external validation. The strongest independence check is the Claude-authored disjoint n=30 corpus; the n=100 provider replications use a same-family Gemini-authored corpus and frozen semantic-label artifact.

A later same-budget Gemini two-pass smart diagnostic on the n=30 corpus selected the current record 30/30 and answered 28/30 with zero stale substitutions, so the public claim should not be framed as beating every same-budget selector.

The benchmark measures governed answer-path control under singleton-current memory conflict. It does not measure broad reasoning superiority, universal memory safety, model-weight learning, provider-invariant fact extraction, or perfect source-span extraction. The Claude Opus 4.7 n=100 misses were empty provider responses rather than stale substitutions.

The result is best read as an internal architecture benchmark for memory-state promotion under adversarial stale-context pressure. External validation on public knowledge-conflict or memory-update datasets remains the next credibility step.

Read the surrounding evidence.

Automatic promotion extends the memory-pressure result: first identify the authoritative memory, then govern what reaches the model.