- NameNotFound-EAM
- Evolving Architecture Model with 1-Billion-Token Context
- Important Positioning
- What Makes It An Evolving Architecture Model
- How It Compares
- Key Capabilities
- Intended Uses
- Runtime Note
- Runtime Access
- Install And Validate
- Native CLI
- Hugging Face Transformers
- vLLM Compatibility
- NameNotFound Full Runtime
- Reproducibility And Adaptation Controls
- Public Runtime Path
- Memory, Bidirectional Inference, And Persistent Spawns
- Performance And Harness Variability
- Benchmarks Are Broken, But Still Useful
- Release Artifact Format
- Limitations, Risks, And Recommendations
- Evaluation
- Technical Specifications
- Proprietary Internals
- Credits
- Citation
- License And Contact
- Future Development
- Known Limitations And Active Work
- Updates And Versioning
NameNotFound-EAM
Evolving Architecture Model with 1-Billion-Token Context
Homepage | Contact | Hugging Face | License
NameNotFound-EAM is the first model in a new class of systems we call Evolving Architecture Models (EAMs): models that do not just run a fixed network, but adapt their own architecture at runtime. They grow new specialized capacity when they meet a gap, learn continually from the outcomes of their own generations, and reorganize how their internal experts cooperate.
It is important to say this plainly: NameNotFound-EAM is not a code model and it is not a generic general-purpose chatbot with a larger context window. It can do code, reasoning, chat, retrieval, and agentic work, but those are capabilities inside a broader adaptive architecture. The product is the evolving system: context, memory, learning, routing, specialist growth, and generation working together.
Where a conventional LLM is a frozen function from tokens to tokens, an EAM is a living system: retrieval, memory, reasoning, routing, generation, and learning are coupled into one self-improving loop. It is memory-aware by design: sessions, named identities, user facts, spawned specialists, and outcome signals can persist across turns and be recalled later. The model carries a 1-billion-token context horizon and a fully internalized, adaptive tokenizer so it can operate as a single self-contained artifact.
This is not just a new model checkpoint. It is an evolved model architecture, and it is designed to keep evolving to you. Adaptation is not limited to remembering facts in a chat history: the runtime can change expert composition, routing behavior, session-local learning state, and trainable weights so the system becomes structurally better matched to the work you give it.
Put another way: this is an evolved model, not a myth or a personality layer. The goal is a system that can keep continuity, remember what matters, grow named specialists, and become more useful through direct work with the user because adaptation is part of the architecture and runtime, not a prompt wrapper.
Website: namenotfound.ai
Available now: NameNotFound-EAM is a source-available model artifact that builders and researchers can try for themselves with no waitlist required. Full-speed serving uses the custom NameNotFound runtime and custom operation kernels currently available on Linode / Akamai Cloud.
TL;DR: 1B-token context, adaptive native tokenization, memory-aware sessions, bidirectional inference with real-time learning, multi-granularity retrieval, multi-tier persistent memory, intent-conditioned reasoning, accelerated block-parallel/speculative decoding, a multi-expert core that evolves and specializes itself at runtime, single public_slot RTX PRO 6000 Blackwell target runtime, shipped as a self-contained safetensors bundle.
Important Positioning
This is not a code model with massive context. It is a new generation of model, architected from the ground up for billion-token context windows. If traditional foundation models are specialized coding race cars, this is more like the first real modular vehicle, powered by a native 1-billion-token context window.
The critical difference: NNF-EAM learns and evolves specifically for your use cases. On day one, it is like a toddler. As you work with it, the model grows by leveraging memory and an expanding understanding to become increasingly capable at what matters most to you.
We are focused on cost-efficient architectures that let companies build sustainable competitive moats without $10B investments to compete at the foundation-model level.
What Makes It An Evolving Architecture Model
Most "adaptive" models adapt only their activations, prompts, or external memory. An EAM adapts its structure and its weights as it operates:
- Runtime capability growth. When the model detects a gap it cannot cover well, it composes new specialized sub-experts on the fly from a compact, language-agnostic numeric representation of the relevant knowledge and integrates them without retraining from scratch.
- Continual, outcome-aware learning. Generations are scored and the signal is fed back into the model's routing and memory, so behavior improves with use rather than staying frozen at the training checkpoint. The model tracks the causal effect of its own decisions, not just whether outcomes were good.
- Bidirectional inference. Inference is not a one-way read from static weights. Runtime behavior can emit training signals, update session-local memory, reinforce useful routes, and teach spawned specialists while the system is being used.
- Accessible specialization. EAMs are designed to make custom training and specialization available to builders without requiring a foundation-lab retraining for every domain. A user or team can provide their own corpus, feedback, and held-out validation, then grow named specialists around their actual workflows.
- Self-organizing experts. A learned router fuses multiple expert subsystems at the token level and continuously rebalances them; underperforming pathways are pruned and strong ones reinforced. Learned representations are shared across expert pathways rather than isolated per expert.
- One coupled loop. Context retrieval, long-term memory, reasoning, intent, and decoding are not bolted-on stages. They condition each other inside a single forward/generate path.
This is why the family is named for its defining property: the architecture itself evolves.
How It Compares
- vs. a frozen LLM: a frozen LLM stays fixed at the checkpoint; an EAM adapts in deployment, specializes to domains, and updates routing and expert behavior based on outcomes.
- vs. a code model: a code model is optimized around code generation or repair. NameNotFound-EAM can be specialized for code, but code is one possible domain for an evolving architecture, not the identity of the model.
- vs. a generic chatbot: a generic chatbot is usually a stateless or lightly personalized assistant. NameNotFound-EAM is designed around persistent memory, long-context grounding, real-time learning, and named specialist growth.
- vs. a standard RAG stack: no bolted-on public retrieval recipe; context access, grounding, memory, reasoning, and generation are treated as integrated model/runtime capabilities.
- vs. a standard MoE: a standard MoE has a fixed expert set; an EAM spawns, fuses, prunes, and transfers knowledge between experts at runtime.
Key Capabilities
- 1-billion-token context horizon: addresses contexts up to one billion tokens through multi-vectored and bundled traversal on top of a high-resolution native window. Demonstrated frontier needle-in-a-haystack retrieval at the billion-token horizon with bounded, flat memory.
- Adaptive native tokenization: tokenizer is internalized and self-contained, with no external dependency to ship or version-match. It extracts differentiable sub-token features such as numeric, character-level, and unicode-diversity features, so numbers, code, and rare strings are represented faithfully.
- Multi-granularity hierarchical retrieval: retrieves by descending document -> paragraph -> sentence -> word -> character, narrowing at each level with late-interaction per-token matching, conditioned on context and intent represented as numbers.
- Multi-tier memory: combines working, episodic, executive, persistent, and outcome-aware memory. The runtime can remember names, user facts, project state, feedback, and spawned specialist identities through persisted session state.
- Bidirectional real-time learning: inference, memory, feedback, and training signals are coupled so the system can learn during use.
- Intent-conditioned reasoning: reasoning is conditioned on inferred intent and invoked adaptively, learning when deeper reasoning is worth the cost.
- Hybrid long-context core: combines attention with efficient linear/state-space mixing, KV-cache compression, and hardware-friendly kernels.
- Accelerated decoding: block-parallel and speculative / multi-token parallel decoding with runtime selection among strategies.
- Multi-expert, self-evolving core: learned token-level routing and fusion; new experts can be spawned, combined, reinforced, or pruned at runtime, with cross-expert knowledge transfer.
- Intent, verification, and control: verification and abstention gates, intent detection, and tool/public_component_9c53c074d7 dispatch.
- Numeric, code, and structured outputs: first-class arithmetic, code generation, and structured output.
- Self-contained safetensors deployment: tensor-native, exported as a sharded safetensors bundle.
Intended Uses
This release is for developers only and should be considered an alpha. Everything here is under active development and is subject to change.
Direct use: long-document / whole-corpus QA, summarization, public_surface retrieval over very large contexts, reasoning, code, agentic workflows, domain assistants, and applications that benefit from a model that improves with use.
Do not treat this as "just a code model" or "just a general model." The intended use is as an adaptive foundation for work that benefits from persistent context, memory, specialist growth, and model-owned learning.
Downstream use: a self-contained backbone for retrieval-augmented and agentic systems, and domain specialization via runtime adaptation rather than full retraining.
Customization use: teams can teach the system the work they actually do: codebases, operational runbooks, research corpora, customer-support patterns, analysis styles, internal terminology, and personal preferences. The intended path is not to sell everyone a static general model and ask them to prompt around its limits; it is to let the model specialize, persist, and improve around the user's own domain while keeping benchmark and validation material held out.
Out of scope / use with care: high-stakes medical, legal, financial, or safety-critical decisions without human oversight; settings requiring a frozen, audited model unless a snapshot is pinned and online adaptation is disabled; anything prohibited by license or law; and any deployment that assumes standard Hugging Face Transformers inference unlocks the full model behavior.
Runtime Note
NameNotFound-EAM is not a standard Transformers-only model. The public artifact is source-available so people can try it themselves with no waitlist required. The full capability surface, including 1B-token context, runtime expert evolution, accelerated decoding, and long-context memory, requires the custom NameNotFound kernels and runtime, currently available for cloud deployment on Linode / Akamai Cloud.
You can run the release through the public compatibility surfaces below, but throughput and capability coverage depend on the selected runtime surface. Systems without the Sparkle runtime, custom kernel, and target Blackwell GPU can use reduced compatibility paths, but full 1B-context traversal, accelerated decoding, online adaptation, and spawn/continual-learning behavior may be slower, partially unavailable, or harness-dependent. We are continuing to expand harness compatibility.
External harnesses and agent scaffolds that apply their own chunking, retrieval, prompt compression, or context-packing strategy may need updates to support the native EAM path. If a harness slices the corpus before native tokenizer and session ingest, it can hide native long-context retrieval, memory, selected-context materialization, and spawn/continual-learning behavior. Hardware also changes the exposed capability surface: device placement, tensor parallel choices, GPU memory utilization, and remaining memory headroom can determine whether accelerated kernels, longer context traversal, or online adaptation stay available. When hardware is close to memory limits, expect narrower compatibility behavior or lower throughput until the harness and device configuration are tuned.
Runtime Access
This release provides four public runtime surfaces:
- Native CLI: the recommended first path for local smoke tests, interactive chat, prompt-file runs, session persistence, and JSON/JSONL automation.
- Hugging Face Transformers: a Python integration path through AutoTokenizer and AutoModelForCausalLM with trust_remote_code=True.
- vLLM compatibility: a serving-oriented prompt/response path for experiments with vLLM's scheduler and sampling interface.
- NameNotFound full runtime: the full-capability runtime path for deployments using the custom NameNotFound kernels and runtime components.
For best compatibility with the current release package, run on a CUDA GPU and bind the physical GPU explicitly. Full-capability operation is designed around the public_slot RTX PRO 6000 Blackwell target and the custom NameNotFound runtime on Linode / Akamai Cloud.
The examples below use placeholder paths and sanitized prompts.
Set up the nnf command
The native CLI is driven through a short launcher named nnf. It is a thin wrapper that points the runtime at your downloaded release directory, binds a CUDA GPU, and execs the packaged runtime module. Public users do not get this launcher pre-installed, so drop the small script below on your PATH (for example at ~/.local/bin/nnf), make it executable, and point NNF_RELEASE at your extracted release directory:
#!/usr/bin/env bash
# nnf — launcher for the NameNotFound-EAM release.
# Point this at YOUR extracted release directory.
set -euo pipefail
# Path to the directory you downloaded/extracted (contains namenotfound_runtime/).
NNF_RELEASE="${NNF_RELEASE:-/path/to/NameNotFound-EAM}"
if [[ ! -d "$NNF_RELEASE/namenotfound_runtime" ]]; then
  echo "nnf: release runtime not found at $NNF_RELEASE" >&2
  echo " set NNF_RELEASE=/path/to/NameNotFound-EAM" >&2
  exit 2
fi
export PYTHONPATH="$NNF_RELEASE:${PYTHONPATH:-}"
export PYTHONDONTWRITEBYTECODE=1
# The runtime requires CUDA. Bind the physical GPU you want to use (override
# with NNF_CUDA_VISIBLE_DEVICES); the runtime maps it to logical cuda:0.
export CUDA_VISIBLE_DEVICES="${NNF_CUDA_VISIBLE_DEVICES:-${CUDA_VISIBLE_DEVICES:-0}}"
export PYTORCH_CUDA_ALLOC_CONF="${PYTORCH_CUDA_ALLOC_CONF:-expandable_segments:True}"
# A later --device in the arguments still wins.
exec python3 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" --device cuda:0 "$@"
Then make the launcher executable and export the release path in your shell:
chmod +x ~/.local/bin/nnf
export NNF_RELEASE=/path/to/NameNotFound-EAM
With nnf on your PATH, every example below is a short nnf <subcommand> call. nnf is the public launcher; our internal namenotfound-eam command is a separate runner and is intentionally not interchangeable with nnf.
Install And Validate
Validate the release structure before running a large job:
nnf validate
The validation command checks the public package layout and reports public-safe diagnostics. It should not expose local absolute paths, internal implementation names, or private prompt text.
Native CLI
The native CLI exposes five commands through the nnf launcher: validate, generate, chat, feedback, and spawn. All of them are a public I/O boundary only — they transport prompts, call model-owned runtime surfaces, persist exported session state when requested, and print public-safe responses and optional run metadata. They do not rely on CLI-authored answer rules or benchmark-specific answer injection.
Generate (one-shot)
Run a one-shot prompt. The answer comes back in the text field of the JSON object that nnf generate prints:
nnf generate "What kind of model are you?"
nnf generate prints a single JSON object containing the model's text and a public-safe proof field (sanitized run details). The proof field is always included; it should not expose local absolute paths, internal implementation names, or private prompt text.
Ground the answer in your own context with --context:
nnf generate "Answer from the provided context." \
--context "$(cat /path/to/context.txt)"
Control the response length with --max-new-tokens (default 192):
nnf generate "Summarize the key points." \
--context "$(cat /path/to/context.txt)" \
--max-new-tokens 512
Because nnf generate writes a single JSON object to stdout, non-interactive callers can pipe it straight into a JSON parser:
nnf generate "What kind of model are you?" | jq -r '.text'
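The same pipe can be handled from Python. The sketch below is a minimal, hedged example: it assumes the JSON object carries the answer under text (as the jq call above implies) and the run details under a key named proof, which is an assumption about the exact key name. The sample payload is hand-written for illustration and is not real model output.

```python
import json
import subprocess


def parse_generate_output(stdout: str) -> tuple[str, dict]:
    """Split the JSON object printed by `nnf generate` into (text, proof)."""
    payload = json.loads(stdout)
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError("expected a JSON object with a 'text' field")
    # Assumption: run details live under "proof"; fall back to an empty dict.
    proof = payload.get("proof", {})
    return payload["text"], proof


def run_generate(prompt: str, max_new_tokens: int = 192) -> tuple[str, dict]:
    """Call the nnf launcher once. Requires nnf on PATH and a CUDA GPU."""
    completed = subprocess.run(
        ["nnf", "generate", prompt, "--max-new-tokens", str(max_new_tokens)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_generate_output(completed.stdout)


# Offline example with a hand-written payload (not real model output).
sample = '{"text": "example answer", "proof": {"selected_context": "n/a"}}'
text, proof = parse_generate_output(sample)
print(text)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing logic without a GPU or an installed launcher.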
Interactive chat
Start an interactive chat session (type /exit or /quit to stop):
nnf chat
Native chat-session persistence:
export SESSION=./nnf-session.json
nnf chat \
--save-session "$SESSION"
For non-interactive callers, JSONL mode can carry context and follow-up turns without changing the model-owned answer path:
printf '%s\n' \
'{"context":"The project context goes here.","followup":"Answer from the provided context."}' \
| nnf chat \
--stdin-jsonl \
--jsonl \
--save-session "$SESSION"
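For more than one turn, it can be easier to build the JSONL payload programmatically than to quote it by hand in the shell. The sketch below assumes each row is a JSON object with a followup field and an optional context field, matching the row shown above; whether later rows may omit context is an assumption, and build_chat_jsonl is a hypothetical helper name.

```python
import json


def build_chat_jsonl(turns: list[dict]) -> str:
    """Serialize chat turns as JSONL: one compact JSON object per line."""
    lines = []
    for turn in turns:
        if "followup" not in turn:
            raise ValueError("each turn needs a 'followup' field")
        # json.dumps escapes embedded newlines, so each turn stays on one line.
        lines.append(json.dumps(turn, ensure_ascii=False))
    return "\n".join(lines) + "\n"


turns = [
    {"context": "The project context goes here.",
     "followup": "Answer from the provided context."},
    # Assumption: a follow-up row can omit "context" and reuse session state.
    {"followup": "Summarize that answer in one sentence."},
]
payload = build_chat_jsonl(turns)
print(payload, end="")
```

The resulting string can be written to a file or piped into nnf chat --stdin-jsonl --jsonl in place of the printf call.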
Optional run metadata can include:
optional_session_adaptation.ready
public_spawn_action.verified
public_continual_learning_step.done
selected_context_materialization.done
public_session_delta.model_export.done
Run metadata should use public labels such as selected_context and selected_context_materialization; implementation names and local filesystem paths are sanitized at the CLI boundary.
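If you log or forward run metadata, a quick client-side check can confirm that nothing resembling a local filesystem path slipped through. The sketch below is a heuristic written for this card, not part of the release: find_path_leaks and both regular expressions are assumptions, and the sample metadata is hand-written.

```python
import re

# Heuristics (assumptions): two or more slash-separated segments, or a drive letter.
POSIX_PATH = re.compile(r"""(?:^|[\s"'=])(?:/[\w.\-]+){2,}""")
WINDOWS_PATH = re.compile(r"\b[A-Za-z]:\\")


def find_path_leaks(value, where="$"):
    """Return JSON-path-like locations of strings that look like absolute paths."""
    hits = []
    if isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_path_leaks(item, f"{where}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits.extend(find_path_leaks(item, f"{where}[{index}]"))
    elif isinstance(value, str):
        if POSIX_PATH.search(value) or WINDOWS_PATH.search(value):
            hits.append(where)
    return hits


# Hand-written metadata for illustration only.
metadata = {
    "selected_context_materialization": "done",
    "note": {"detail": "loaded /home/user/model"},
    "extra": ["C:\\Users\\x"],
}
print(find_path_leaks(metadata))
```

A non-empty result is a prompt to inspect the flagged fields by hand; the patterns are intentionally simple and can produce false positives or miss unusual paths.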
Show the public-safe run details after each plain-text answer with --show-proof:
nnf chat --show-proof
The flag name --show-proof is retained for CLI compatibility. It means the CLI prints the model's sanitized run-details JSON after each plain-text answer.
Start chat with an initial context file:
nnf chat --context-file /path/to/context.txt
Deleting or resetting model-owned spawn or session state clears that state, but it does not guarantee that the model has returned to an earlier evolved state. If online adaptation, spawned specialists, or saved deltas have changed model-owned state, restore a previously stored snapshot or checkpoint to get the exact earlier behavior.
Diagnostic feedback
Teach the model from a corrected prompt/response turn with nnf feedback. The expected intent and expected answer are learned as feedback through the model-owned continual-learning surface; they are diagnostic signals, not CLI answer overrides or an answer oracle:
nnf feedback \
--prompt "What is the capital of France?" \
--response "It is Berlin." \
--feedback "The correct answer is Paris." \
--expected-answer "Paris" \
--rating down
Like generate, feedback accepts --prompt-file for long prompts, --save-session to persist the learned signal, and --json for machine-readable output.
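When corrections come from an application rather than a terminal, building the argument list in code avoids shell-quoting mistakes. This is a minimal sketch that uses only the flags shown above; feedback_argv is a hypothetical helper name, and the set of accepted --rating values beyond down is not documented here, so the value is passed through unchecked.

```python
import shlex
from typing import Optional


def feedback_argv(prompt: str, response: str, feedback: str,
                  expected_answer: str, rating: str,
                  save_session: Optional[str] = None,
                  as_json: bool = True) -> list[str]:
    """Build the argument list for one `nnf feedback` call."""
    argv = [
        "nnf", "feedback",
        "--prompt", prompt,
        "--response", response,
        "--feedback", feedback,
        "--expected-answer", expected_answer,
        "--rating", rating,
    ]
    if save_session:
        argv += ["--save-session", save_session]
    if as_json:
        argv.append("--json")
    return argv


argv = feedback_argv(
    prompt="What is the capital of France?",
    response="It is Berlin.",
    feedback="The correct answer is Paris.",
    expected_answer="Paris",
    rating="down",
    save_session="./nnf-session.json",
)
# shlex.join shows the equivalent shell command with safe quoting.
print(shlex.join(argv))
```

Pass the list to subprocess.run(argv, check=True) on a machine where nnf is installed; passing a list rather than a string keeps prompts with quotes or newlines intact.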
Hugging Face Transformers
Use the Transformers surface for Python integration and research workflows:
from transformers import AutoModelForCausalLM, AutoTokenizer
path = "/path/to/NameNotFound-EAM"
# trust_remote_code=True loads the model-specific code shipped with the release.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    device="cuda:0",
)
# public_chat_turn is a release-specific helper; the result carries the answer under "text".
result = model.public_chat_turn(
    followup="What kind of model are you?",
    max_new_tokens=128,
    tokenizer=tokenizer,
)
print(result["text"])
For long-context use, route context through the model context surfaces, because generic external concatenation/chunking patterns can break billion-token context integrity and degrade long-context retrieval quality. The exact public helper names may vary by release, but the intended path is: stream context into the model, ask a follow-up, and read the returned text plus public-safe run metadata.
vLLM Compatibility
The vLLM surface is provided for prompt/response serving experiments. It is a compatibility path around vLLM's loading, scheduling, and sampling interface; it is not a guarantee that every full-runtime capability is available with the same latency or response shape as the native runtime.
from vllm import LLM, SamplingParams
path = "/path/to/NameNotFound-EAM"
llm = LLM(
model=path,
trust_remote_code=True,
tensor_parallel_size=1,
gpu_memory_utilization=0.20,
max_model_len=4096,
)
outputs = llm.generate(
["What kind of model are you?"],
SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
Keep CUDA_VISIBLE_DEVICES scoped to the GPU you want vLLM to use. Start with a short prompt and conservative gpu_memory_utilization, then scale context length, batch size, and memory utilization after the model loads cleanly.
NameNotFound Full Runtime
The full runtime is the intended path for the complete capability surface: high-horizon context traversal, runtime expert evolution, accelerated decoding, and long-context memory. It requires the custom NameNotFound runtime components and kernels.
The full-capability model runs with the NameNotFound Sparkle runtime and custom kernel on Linode / Akamai Cloud, on a single public_slot RTX PRO 6000 Blackwell GPU.
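from namenotfound import NamenotfoundEAM
# Placeholder path: read the long document the answer should be grounded in.
with open("/path/to/report.txt", encoding="utf-8") as handle:
    very_long_document = handle.read()
model = NamenotfoundEAM.from_pretrained(
    "/path/to/NameNotFound-EAM",
    runtime="linode",
)
model.eval().cuda()
out = model.generate(
    "Summarize the attached report and cite the relevant sections.",
    context=very_long_document,
    max_new_tokens=512,
)
print(out)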
Reproducibility And Adaptation Controls
NameNotFound-EAM can evolve through session state, online adaptation, and spawned specialists. If your application requires fixed behavior, treat reproducibility as an explicit deployment choice: pin a snapshot and disable online adaptation in your harness.
model.set_online_adaptation(enabled=False)
model.pin_snapshot("snapshot-id")
When adaptation is enabled, session-local behavior can improve over time through model-owned memory/session state. Persist and restore that state deliberately, and keep base release snapshots pinned for auditability.
Public Runtime Path
The public runtimes expose this high-level path:
- public raw prompt or streamed context
- native tokenizer
- streaming input ingest
- selected context index
- selected context materialization
- answer surface
- optional spawn / continual-learning session state
Memory, Bidirectional Inference, And Persistent Spawns
What A Billion-Token Context And Evolving Architecture Mean For You
A billion-token context is not just a larger prompt box. It means your model can work across whole repositories, long histories, research archives, customer records, operational logs, and evolving project memory without forcing every task into a short, disposable chat.
An Evolving Architecture Model makes that context trainable. You can bring your own data, feedback, terminology, policies, style guides, codebase, research corpus, or workflow history and grow specialists around it. Instead of waiting for a foundation model vendor to retrain a general-purpose model for your niche, you can create named experts for the things you actually do: a project code specialist, a materials-research assistant, a support-policy expert, a legal-review companion, a manufacturing process guide, or any other domain specialist your deployment needs.
Those specialists can be named, recalled, improved, and redirected over time. The goal is custom training for builders, teams, and individuals: not a one-off fine-tune that gets stale, but a memory-aware system that can specialize through use, keep the useful parts, and keep benchmark or validation material separate from the data used to teach it.
On day one, it may feel like you are using a powerful model. After two weeks of real work, it should feel like you are using your model: one that has learned your corpus, your goals, your terminology, your recurring mistakes, and the specialists you rely on. Two deployments that start from the same base snapshot can diverge as they accumulate different memories, feedback, spawned experts, routing changes, and adaptation state. When you want specialization, let the architecture evolve with the work.
NameNotFound-EAM is memory-aware. The runtime can carry forward user-provided context, user facts, named assistant identity, task history, feedback, and spawn-specialist state through persisted sessions. This allows the model to be named, remember that name in later turns, learn stable user or project facts, and recall them after the session is restored.
The system also supports bidirectional inference: inference produces answers, but it can also produce learning signals. Feedback, outcome scoring, selected-context evidence, and session traces can flow back into model-owned memory and routing state while the system is running. In the full runtime, those signals can be used for real-time learning, online adaptation, and spawned-specialist improvement.
Persistent memory is layered:
- Working memory: current turn and active context.
- Episodic memory: recent interactions, feedback, and task traces.
- Executive memory: higher-level task or project state used to guide future behavior.
- Persistent memory: named facts, identities, preferences, and reusable knowledge stored across sessions.
- Outcome-aware memory: records which strategies, routes, and generated specialists worked.
Plan and name a new spawn in full-runtime builds that expose nnf spawn:
nnf spawn \
--dry-run \
--domain code \
--target-spawn-id project-code-specialist \
--spawn-alias project-code-specialist \
--goal "Specialize on this project's coding patterns."
When the plan is correct, rerun the same command with --run-training. Use --dry-run first for every new spawn or retrain so you can verify the action, corpus counts, feedback counts, validation summary, quality floor, and selected training command before any trainer starts.
Spawn specialists use a native multi-head, multi-layer capability surface in full-runtime builds. The default public controls are:
--spawn-layer-count 2 \
--spawn-head-count 4 \
--spawn-ff-multiplier 2
These are user-selectable controls. --spawn-layer-count accepts non-negative integers; 0 is a low-capacity pass-through stack for quick compatibility checks. --spawn-head-count and --spawn-ff-multiplier accept positive integers. Head count does not need to divide the hidden size: the native spawn stack projects each layer to the requested number of heads and then returns to the hidden size. Higher layer counts, head counts, and multipliers use more memory and training time, so start small, inspect the dry-run plan, and scale after a bounded proof is clean.
Examples:
# Low-capacity compatibility check.
--spawn-layer-count 0 \
--spawn-head-count 1 \
--spawn-ff-multiplier 1
# Higher-capacity specialist.
--spawn-layer-count 4 \
--spawn-head-count 8 \
--spawn-ff-multiplier 4
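The accepted ranges described above can be checked before a dry run is even launched. The sketch below is a hedged client-side validator, not the CLI's own logic: spawn_geometry_args is a hypothetical helper, and it only encodes the documented rules (non-negative layer count, positive head count and feed-forward multiplier, no divisibility requirement on head count).

```python
def spawn_geometry_args(layer_count: int = 2, head_count: int = 4,
                        ff_multiplier: int = 2) -> list[str]:
    """Validate spawn geometry and return the matching CLI flags."""
    values = {
        "layer_count": layer_count,
        "head_count": head_count,
        "ff_multiplier": ff_multiplier,
    }
    for name, value in values.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
    if layer_count < 0:
        raise ValueError("layer_count must be non-negative (0 is pass-through)")
    if head_count < 1:
        raise ValueError("head_count must be positive")
    if ff_multiplier < 1:
        raise ValueError("ff_multiplier must be positive")
    return [
        "--spawn-layer-count", str(layer_count),
        "--spawn-head-count", str(head_count),
        "--spawn-ff-multiplier", str(ff_multiplier),
    ]


# A head count that need not divide the hidden size: 2 layers, 5 heads, 3x feed-forward.
print(" ".join(spawn_geometry_args(2, 5, 3)))
```

The returned list can be appended to an nnf spawn --dry-run argument list, so a malformed geometry fails fast in your own tooling instead of at plan time.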
Add an initial corpus:
nnf spawn \
--dry-run \
--domain code \
--target-spawn-id project-code-specialist \
--spawn-alias project-code-specialist \
--goal "Learn this project's coding patterns from the provided corpus." \
--user-corpus-jsonl ./spawn-corpus.jsonl
User corpus JSONL rows should contain at least:
{"domain":"code","target_spawn_id":"project-code-specialist","prompt":"Describe the desired task behavior.","target":"Describe the expected model-owned answer or repair behavior."}
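Before passing a corpus file to --user-corpus-jsonl, it can help to confirm that every row parses and carries the four fields shown above. This is a minimal sketch: check_corpus_jsonl is a hypothetical helper, and treating an empty string as a missing value is an assumption made here rather than a documented rule.

```python
import json

REQUIRED_CORPUS_KEYS = ("domain", "target_spawn_id", "prompt", "target")


def check_corpus_jsonl(text: str) -> list[str]:
    """Return one human-readable problem per bad row; an empty list means OK."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        # Assumption: an empty value counts as missing.
        missing = [key for key in REQUIRED_CORPUS_KEYS if not row.get(key)]
        if missing:
            problems.append(f"line {number}: missing {', '.join(missing)}")
    return problems


good = json.dumps({
    "domain": "code",
    "target_spawn_id": "project-code-specialist",
    "prompt": "Describe the desired task behavior.",
    "target": "Describe the expected answer.",
})
bad = json.dumps({"domain": "code", "prompt": "No target here."})
print(check_corpus_jsonl(good + "\n" + bad + "\n"))
```

Running the check on the file contents before the dry run keeps formatting errors out of the reported corpus counts.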
Tune a spawn manually from feedback:
nnf spawn \
--dry-run \
--domain code \
--target-spawn-id project-code-specialist \
--goal "Improve this specialist from user feedback." \
--user-feedback-jsonl ./spawn-feedback.jsonl
Feedback JSONL rows should contain at least:
{"domain":"code","target_spawn_id":"project-code-specialist","prompt":"The prior answer missed an important constraint.","response":"The previous response text.","correction":"The corrected behavior or answer.","feedback_source":"user","issue_kind":"correction","repair_action":"retrain"}
Use benchmarks only as validation:
nnf spawn \
--dry-run \
--domain code \
--target-spawn-id project-code-specialist \
--goal "Validate this specialist against held-out tasks." \
--user-corpus-jsonl ./spawn-corpus.jsonl \
--benchmark-jsonl ./spawn-holdout.jsonl
Benchmark JSONL rows may include prompts, tests, and expected answers, but they are recorded as validation-only. The spawn training command should not use benchmark or validation files as training data; the public plan should report that benchmark material is held out from expert training.
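The CLI reports that benchmark material is held out, but it cannot know whether the same prompts were also copied into your training corpus. The sketch below is a simple client-side overlap check under stated assumptions: both files use a prompt field, and only exact matches after case and whitespace normalization are detected, so paraphrased leaks are not caught. The helper names are hypothetical.

```python
import json


def normalized_prompts(jsonl_text: str) -> set[str]:
    """Collect lowercased, whitespace-collapsed prompts from JSONL text."""
    prompts = set()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        prompt = row.get("prompt") if isinstance(row, dict) else None
        if isinstance(prompt, str):
            prompts.add(" ".join(prompt.lower().split()))
    return prompts


def leaked_prompts(corpus_jsonl: str, holdout_jsonl: str) -> set[str]:
    """Return prompts that appear in both the training corpus and the holdout."""
    return normalized_prompts(corpus_jsonl) & normalized_prompts(holdout_jsonl)


# Hand-written rows for illustration only.
corpus = "\n".join([
    json.dumps({"prompt": "Fix the off-by-one bug.", "target": "..."}),
    json.dumps({"prompt": "Rename the helper function.", "target": "..."}),
])
holdout = "\n".join([
    json.dumps({"prompt": "fix the  off-by-one bug."}),
    json.dumps({"prompt": "Add a unit test for the parser."}),
])
print(leaked_prompts(corpus, holdout))
```

An empty set is a necessary condition for a clean holdout, not a sufficient one; near-duplicates need a fuzzier comparison than this sketch provides.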
The lifecycle has been validated end-to-end through real CLI training runs for a JavaScript-specialist spawn, including goal-only creation, named spawn creation with corpus bootstrap, manual feedback-driven refinement, held-out benchmark isolation, persistence checks without retraining, forced retrain/pivot flows, artifact restore from saved state, and lifecycle cleanup via delete-by-alias. This is a production-oriented functional validation of the spawn control surface, not a benchmark-quality score claim.
The native multi-head/multi-layer spawn path has also been hard-validated in the CLI across user-selected geometry. Across low- and high-bandwidth dry-runs, requested geometry was consistently routed into the training command. In a bounded real-training proof, a non-divisible configuration (2 layers, 5 heads, 3× feed-forward multiplier) completed successfully with native-stack-backed artifacts: answer surface, spawn router, corpus planner, and response-quality grader all reflected native component paths, with persisted metadata explicitly recording layer count, head count, feed-forward multiplier, and architecture identity.
Retrain or pivot an existing spawn:
nnf spawn \
--dry-run \
--force-retrain \
--domain code \
--target-spawn-id project-code-specialist \
--goal "Pivot this spawn toward a new project behavior." \
--user-corpus-jsonl ./spawn-corpus-v2.jsonl \
--user-feedback-jsonl ./spawn-feedback-v2.jsonl \
--benchmark-jsonl ./spawn-holdout-v2.jsonl
Delete a spawn by id or alias:
nnf spawn \
--remove-spawn \
--target-spawn-id project-code-specialist
Performance And Harness Variability
Performance and observed scores vary across runtime surfaces and harnesses. The native CLI, Hugging Face Transformers, vLLM, and the full NameNotFound runtime do not exercise identical schedulers, batching behavior, prompt transport, context materialization, or sampling paths.
Expected variation sources include:
- GPU model, driver, CUDA version, and kernel availability
- CUDA_VISIBLE_DEVICES, tensor parallel size, batch size, and memory utilization
- vLLM max_model_len, scheduler behavior, and prompt chunking
- CLI vs. Transformers vs. vLLM prompt formatting
- benchmark scaffold, agent harness, timeout, retry policy, and scoring script
- context length, context order, session state, and whether online adaptation is enabled
Benchmark numbers in this model card are internal unless explicitly marked as an external comparison value. Treat comparisons across harnesses as directional unless the same dataset, scaffold, runtime surface, prompt format, and scoring script are used.
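One way to apply the comparability rule above is to record the five matching criteria next to every score and compare them before comparing numbers. The sketch below is illustrative: EvalSetup and differing_fields are hypothetical names, and the field values are placeholders rather than real evaluation settings.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalSetup:
    """The five settings that must match for scores to be directly comparable."""
    dataset: str
    scaffold: str
    runtime_surface: str
    prompt_format: str
    scoring_script: str


def differing_fields(a: EvalSetup, b: EvalSetup) -> list[str]:
    """List the settings on which two runs differ; empty means comparable."""
    return [f.name for f in fields(EvalSetup)
            if getattr(a, f.name) != getattr(b, f.name)]


cli_run = EvalSetup("holdout-v1", "plain", "native-cli", "chat", "score.py")
vllm_run = EvalSetup("holdout-v1", "plain", "vllm", "chat", "score.py")

diff = differing_fields(cli_run, vllm_run)
print("comparable" if not diff else f"directional only, differs on: {diff}")
```

Storing the setup alongside each result makes it obvious later whether two numbers came from the same harness or only look similar.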
benchmarks Are Broken, But Still Useful
benchmarks are broken, and you can break them too. With spawn specialists, users can teach the model a domain, task family, codebase, or workflow until yesterday's hard benchmark pattern becomes routine work. The responsible path is to training on your own corpus and feedback, then keep benchmark and validation files held out so the result measures real specialization rather than memorized answers. So do it yourself: specialize the model on your own domain, run your own held-out benchmark, and share what you measure in the comments or on social — earned, reproducible results from real users say more than any score we could publish.
Static benchmark scores are a narrow snapshot of a system that is increasingly interactive, memory-aware, tool-using, and adaptive. They often collapse the most important parts of an EAM into a single number: whether it can ingest new context, remember user state, learn from feedback, preserve named specialists, recover exact evidence, and keep improving without being retrained from scratch.
Benchmarks remain useful when they are treated as probes, not as the whole product. What the field actually needs is not another static benchmark of the current state of the art, but new benchmarks that people can create and use to train capabilities and tasks, rather than general or specific knowledge recall, unless that knowledge is the explicit target.
Release Artifact Format
Full model releases use a sharded safetensors bundle with an index map and integrity metadata:
v3_weights-00001-of-XXXXX.safetensors
v3_weights-00002-of-XXXXX.safetensors
...
v3_weights.safetensors.index.json
integrity_manifest.json
The exact shard count and manifest names can change between releases. Consumers should rely on the shipped index/manifest files rather than hard-coded shard counts.
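A consumer can discover the shard list from the index instead of hard-coding it. This sketch assumes the index follows the common Hugging Face sharded-safetensors layout, with a top-level `"weight_map"` object mapping tensor names to shard filenames. That layout is an assumption about this bundle, not something confirmed here. The schema of `integrity_manifest.json` is not documented in this card, so the sketch does not parse it.

```python
import json
from pathlib import Path


def shards_from_index(index_path):
    """Return the sorted, de-duplicated shard filenames named by the index.

    Assumes a top-level "weight_map" of {tensor_name: shard_filename}.
    """
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    return sorted(set(index["weight_map"].values()))


def missing_shards(bundle_dir, index_name="v3_weights.safetensors.index.json"):
    """List shards the index references that are absent from bundle_dir."""
    bundle = Path(bundle_dir)
    return [name for name in shards_from_index(bundle / index_name)
            if not (bundle / name).is_file()]
```

An empty list from `missing_shards` means every shard the index references is present. It says nothing about file integrity, which is the manifest's job.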
Limitations, Risks, And Recommendations
- Research-grade, evolving system: behavior can drift between snapshots; use the reproducibility controls above when fixed behavior is required.
- Hallucination: verification/abstention gates reduce but do not eliminate hallucination; keep a human in the loop.
- Bias: reflects its training corpora; evaluate for your use case.
- Compute: the full model is designed around the public_slot RTX PRO 6000 Blackwell target with custom runtime kernels; long horizon trades latency for reach.
- Benchmarks: benchmark scores are useful held-out probes, not the whole measure of an adaptive architecture. Current reporting focuses on long-context retrieval; independent validation and further benchmarks are forthcoming.
Evaluation
Validated internally to date: billion-token retrieval, multi-granularity retrieval, numeric and code capability, and full-model single-GPU operation on the public_slot RTX PRO 6000 Blackwell target with the custom runtime. The table below reports long-context benchmarks. Further benchmarks are forthcoming.
Benchmark Coverage
The NameNotFound-EAM column reports internal evaluations on the shipped runtime path. External comparison values are included only where a public source reports the exact model/eval pair or a comparable public leaderboard value. n/r means no directly comparable public value was reported in the referenced sources.
| Benchmark | NameNotFound-EAM | SubQ | public_slot 3.1 Pro | Opus 4.6 | Opus 4.7 | GPT-5.4 | GPT-5.5 |
|---|---|---|---|---|---|---|---|
| RULER @ 128K | 100.0% | 95.6% | n/r | n/r | n/r | n/r | n/r |
| MRCR v2 (8-needle, 1M) | 100.0% | 86.2% | 26.3% | 78.3% | 32.2% | 36.6% | 74.0% |
| MRCR-style (500-needle, 1B context) | 72.6% | n/r | n/r | n/r | n/r | n/r | n/r |
NameNotFound-EAM results are internally evaluated; independent validation and further benchmarks are forthcoming.
Current Public Package Validation
The current public package path has been validated through shipped-runtime runs for:
- vLLM public prompt-response path with native tokenizer, streaming context ingest, selected context materialization, and answer generation.
- MRCR v2 public probe exact-context recovery through selected source fact lookup.
- RULER NIAH public package validation.
- Native CLI shipped-runtime response validation through selected context evidence.
- Native CLI spawn / continual-session validation with session-local adaptation and exported session delta.
- Spawn lifecycle CLI validation covering bounded real training, goal-only creation, named spawn creation, user-selected native multi-head/multi-layer geometry, non-divisible head counts, persistence detection, reload, delete-by-alias, manual feedback tuning, held-out benchmark isolation, initial corpus, forced retraining, and ability pivot.
These validation runs exercise the public runtime path rather than a benchmark-answer training shortcut.
Technical Specifications
| Property | Value |
|---|---|
| Architecture class | Evolving Architecture Model (EAM) |
| Primary positioning | adaptive evolving architecture, not a code-only model or generic chatbot |
| Context horizon | up to 1,000,000,000 tokens (hierarchical) + high-resolution native window |
| Training accelerator | Single public_slot RTX PRO 6000 Blackwell |
| Full-model runtime target | Single public_slot RTX PRO 6000 Blackwell |
| Runtime requirement | Custom NameNotFound kernels and runtime components |
| Tokenizer | internalized, adaptive, self-contained |
| Core | hybrid attention + efficient sequence mixing; multi-expert with learned token-level fusion |
| Memory | working, episodic, executive, persistent, and outcome-aware |
| Adaptation | runtime expert spawning, fusion, pruning + continual, outcome-aware learning |
| Decoding | block-parallel / speculative / multi-token parallel decoding |
| Format | tensor-native, sharded safetensors bundle + index + integrity manifest |
| Precision | GPU-resident; mixed precision |
Proprietary Internals
The following are intentionally not disclosed in this public model card:
- Context traversal and grounding method
- Retrieval / indexing / matching internals
- Memory addressing internals
- Expert composition details
- Routing dimensions and controller design
- Training recipe
- Kernel implementation details
- Runtime scheduling internals
Credits
- Wendell Adams -- Engineer, Designer, and Architect
- Ross Gates -- Strategy
- NameNotFound -- namenotfound.ai
Citation
@misc{namenotfound_eam_2026,
title = {NameNotFound-EAM: An Evolving Architecture Model with a 1-Billion-Token Context Horizon},
author = {Wendell Adams and Ross Gates},
year = {2026},
organization = {NameNotFound},
note = {Engineering, design, and architecture: Wendell Adams. Strategy: Ross Gates. Full capability runtime currently available on Linode / Akamai Cloud.}
}
License And Contact
Released under the NNF Source-Available Model License. See LICENSE.md.
Running a version with full capabilities on your own hardware requires building a kernel or adaptive layer.
Run a copy with NameNotFound's custom kernel on Linode / Akamai Cloud by contacting:
- Linode technical support: Brandon Holcombe -- Bholcomb@akamai.com
- Linode setup / onboarding: Rob Holcombe -- Roholcom@akamai.com
For full-runtime deployment with NameNotFound's custom kernel on Linode / Akamai Cloud, contact the NameNotFound team for current onboarding details.
For commercial use, partnerships, or other questions, contact the NameNotFound team: ai@namenotfound.ai
Future Development
This project is maintained by a dedicated two-person team at NameNotFound. We view development as a collaborative effort with the community. Your feedback directly shapes how the system evolves. We prioritize updates carefully and respond as our current development cycle allows; please give feedback directly and we will work to incorporate it as we update and release our other forthcoming models that have already been designed.
Known Limitations And Active Work
We are actively expanding capabilities and addressing known issues across the public runtime surfaces. In particular, we are aware of some decode-quality issues, and response behavior can vary by surface while this work continues:
- Native CLI: we are aware of decode issues that can affect answer quality and formatting on some prompts; the answer/decode surface is actively being improved. However, this CLI will not be supported long-term; as other harnesses adopt these capabilities, it will be deprecated.
- Hugging Face Transformers: this path currently exposes a subset of the full runtime; we are expanding decode behavior.
- vLLM: the compatibility path is still being hardened; decode coverage, scheduling, and sampling behavior are actively being expanded.
- Public Runtime: remaining decode and capability gaps are being addressed as runtime and training work completes. For full native decode support, you must run the model with the Sparkle Runtime and custom kernels available at Linode.
These are known, actively tracked items rather than final behavior. Expect decode quality and capability coverage to improve as updates land (see Updates And Versioning below).
Updates And Versioning
Note: This repository is updated regularly. Files, documentation, runtime surfaces, reported results, and the spawn/runtime interface may change between updates. If you need stable, reproducible behavior, pin a specific commit or snapshot rather than tracking the latest revision.
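As a sketch of what pinning can look like from Python: `snapshot_download` from the `huggingface_hub` package accepts a `revision` argument, and passing a full commit hash rather than a branch name fixes the files you receive. The helper names below are illustrative, and the repository id you pass is up to you, since none is assumed here.

```python
import re

# A full Git commit hash is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """True only for a full commit hash, not a branch or tag name."""
    return bool(_FULL_SHA.fullmatch(revision))


def download_pinned(repo_id: str, revision: str) -> str:
    """Download one exact snapshot and return its local directory.

    Refuses moving references such as "main" so results stay reproducible.
    """
    if not is_pinned_revision(revision):
        raise ValueError(f"{revision!r} is not a full 40-character commit hash")
    # Imported lazily so the validation helper works without the package.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, revision=revision)
```

Storing the commit hash alongside any reported result lets someone else fetch the same files later.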
Additional calls in progress: additional decode_call is currently underway, and further decode_call_public will be provided here as the training completes and the model is updated.
