NameNotFound-EAM

Evolving Architecture Model with 1-Billion-Token Context

Homepage | Contact | Hugging Face | License

NameNotFound-EAM is the first model in a new class of systems we call Evolving Architecture Models (EAMs): models that do not just run a fixed network, but adapt their own architecture at runtime. They grow new specialized capacity when they meet a gap, learn continually from the outcomes of their own generations, and reorganize how their internal experts cooperate.

It is important to say this plainly: NameNotFound-EAM is not a code model and it is not a generic general-purpose chatbot with a larger context window. It can do code, reasoning, chat, retrieval, and agentic work, but those are capabilities inside a broader adaptive architecture. The product is the evolving system: context, memory, learning, routing, specialist growth, and generation working together.

Where a conventional LLM is a frozen function from tokens to tokens, an EAM is a living system: retrieval, memory, reasoning, routing, generation, and learning are coupled into one self-improving loop. It is memory-aware by design: sessions, named identities, user facts, spawned specialists, and outcome signals can persist across turns and be recalled later. The model carries a 1-billion-token context horizon and a fully internalized, adaptive tokenizer so it can operate as a single self-contained artifact.

This is not just a new model checkpoint. It is an evolved model architecture, and it is designed to keep evolving to you. Adaptation is not limited to remembering facts in a chat history: the runtime can change expert composition, routing behavior, session-local learning state, and trainable weights so the system becomes structurally better matched to the work you give it.

Put another way: this is an evolved model, not a myth or a personality layer. The goal is a system that can keep continuity, remember what matters, grow named specialists, and become more useful through direct work with the user because adaptation is part of the architecture and runtime, not a prompt wrapper.

Website: namenotfound.ai

Available now: NameNotFound-EAM is a source-available model artifact that builders and researchers can try for themselves with no waitlist required. Full-speed serving uses the custom NameNotFound runtime and custom operation kernels currently available on Linode / Akamai Cloud.

TL;DR: 1B-token context, adaptive native tokenization, memory-aware sessions, bidirectional inference with real-time learning, multi-granularity retrieval, multi-tier persistent memory, intent-conditioned reasoning, accelerated block-parallel/speculative decoding, a multi-expert core that evolves and specializes itself at runtime, single public_slot RTX PRO 6000 Blackwell target runtime, shipped as a self-contained safetensors bundle.

Important Positioning

This is not a code model with massive context. This is a new generation of models architected from the ground up for billion-token context windows. If traditional foundation models are specialized coding race cars, this is more like the first real modular vehicle, powered by a native 1-billion-token context window.

The critical difference: NNF-EAM learns and evolves specifically for your use cases. On day one, it is like a toddler. As you work with it, the model grows by leveraging memory and an expanding understanding to become increasingly capable at what matters most to you.

We are focused on cost-efficient architectures that let companies build sustainable competitive moats without $10B investments to compete at the foundation-model level.

What Makes It An Evolving Architecture Model

Most "adaptive" models adapt only their activations, prompts, or external memory. An EAM adapts its structure and its weights as it operates:

  • Runtime capability growth. When the model detects a gap it cannot cover well, it composes new specialized sub-experts on the fly from a compact, language-agnostic numeric representation of the relevant knowledge and integrates them without retraining from scratch.
  • Continual, outcome-aware learning. Generations are scored and the signal is fed back into the model's routing and memory, so behavior improves with use rather than staying frozen at the training checkpoint. The model tracks the causal effect of its own decisions, not just whether outcomes were good.
  • Bidirectional inference. Inference is not a one-way read from static weights. Runtime behavior can emit training signals, update session-local memory, reinforce useful routes, and teach spawned specialists while the system is being used.
  • Accessible specialization. EAMs are designed to make custom training and specialization available to builders without requiring a foundation-lab retraining for every domain. A user or team can provide their own corpus, feedback, and held-out validation, then grow named specialists around their actual workflows.
  • Self-organizing experts. A learned router fuses multiple expert subsystems at the token level and continuously rebalances them; underperforming pathways are pruned and strong ones reinforced. Learned representations are shared across expert pathways rather than isolated per expert.
  • One coupled loop. Context retrieval, long-term memory, reasoning, intent, and decoding are not bolted-on stages. They condition each other inside a single forward/generate path.

This is why the family is named for its defining property: the architecture itself evolves.

How It Compares

  • vs. a frozen LLM: a frozen LLM stays fixed at the checkpoint; an EAM adapts in deployment, specializes to domains, and updates routing and expert behavior based on outcomes.
  • vs. a code model: a code model is optimized around code generation or repair. NameNotFound-EAM can be specialized for code, but code is one possible domain for an evolving architecture, not the identity of the model.
  • vs. a generic chatbot: a generic chatbot is usually a stateless or lightly personalized assistant. NameNotFound-EAM is designed around persistent memory, long-context grounding, real-time learning, and named specialist growth.
  • vs. a standard RAG stack: no bolted-on public retrieval recipe; context access, grounding, memory, reasoning, and generation are treated as integrated model/runtime capabilities.
  • vs. a standard MoE: a standard MoE has a fixed expert set; an EAM spawns, fuses, prunes, and transfers knowledge between experts at runtime.

Key Capabilities

  • 1-billion-token context horizon: addresses contexts up to one billion tokens through multi-vectored and bundled traversal on top of a high-resolution native window. Demonstrated frontier needle-in-a-haystack retrieval at the billion-token horizon with bounded, flat memory.
  • Adaptive native tokenization: tokenizer is internalized and self-contained, with no external dependency to ship or version-match. It extracts differentiable sub-token features such as numeric, character-level, and unicode-diversity features, so numbers, code, and rare strings are represented faithfully.
  • Multi-granularity hierarchical retrieval: retrieves by descending document -> paragraph -> sentence -> word -> character, narrowing at each level with late-interaction per-token matching, conditioned on context and intent represented as numbers.
  • Multi-tier memory: combines working, episodic, executive, persistent, and outcome-aware memory. The runtime can remember names, user facts, project state, feedback, and spawned specialist identities through persisted session state.
  • Bidirectional real-time learning: inference, memory, feedback, and training signals are coupled so the system can learn during use.
  • Intent-conditioned reasoning: reasoning is conditioned on inferred intent and invoked adaptively, learning when deeper reasoning is worth the cost.
  • Hybrid long-context core: combines attention with efficient linear/state-space mixing, KV-cache compression, and hardware-friendly kernels.
  • Accelerated decoding: block-parallel and speculative / multi-token parallel decoding with runtime selection among strategies.
  • Multi-expert, self-evolving core: learned token-level routing and fusion; new experts can be spawned, combined, reinforced, or pruned at runtime, with cross-expert knowledge transfer.
  • Intent, verification, and control: verification and abstention gates, intent detection, and tool/public_component_9c53c074d7 dispatch.
  • Numeric, code, and structured outputs: first-class arithmetic, code generation, and structured output.
  • Self-contained safetensors deployment: tensor-native, exported as a sharded safetensors bundle.

Intended Uses

This release is for developers only and should be considered an alpha. Everything here is under active development and is subject to change.

Direct use: long-document / whole-corpus QA, summarization, public_surface retrieval over very large contexts, reasoning, code, agentic workflows, domain assistants, and applications that benefit from a model that improves with use.

Do not treat this as "just a code model" or "just a general model." The intended use is as an adaptive foundation for work that benefits from persistent context, memory, specialist growth, and model-owned learning.

Downstream use: a self-contained backbone for retrieval-augmented and agentic systems, and domain specialization via runtime adaptation rather than full retraining.

Customization use: teams can teach the system the work they actually do: codebases, operational runbooks, research corpora, customer-support patterns, analysis styles, internal terminology, and personal preferences. The intended path is not to sell everyone a static general model and ask them to prompt around its limits; it is to let the model specialize, persist, and improve around the user's own domain while keeping benchmark and validation material held out.

Out of scope / use with care: high-stakes medical, legal, financial, or safety-critical decisions without human oversight; settings requiring a frozen, audited model unless a snapshot is pinned and online adaptation is disabled; anything prohibited by license or law; and any deployment that assumes standard Hugging Face Transformers inference unlocks the full model behavior.

Runtime Note

NameNotFound-EAM is not a standard Transformers-only model. The public artifact is source-available so people can try it themselves with no waitlist required. The full capability surface, including 1B-token context, runtime expert evolution, accelerated decoding, and long-context memory, requires the custom NameNotFound kernels and runtime, currently available for cloud deployment on Linode / Akamai Cloud.

You can run the release through the public compatibility surfaces below, but throughput and capability coverage depend on the selected runtime surface. Systems without the Sparkle runtime, custom kernel, and target Blackwell GPU can use reduced compatibility paths, but full 1B-context traversal, accelerated decoding, online adaptation, and spawn/continual-learning behavior may be slower, partially unavailable, or harness-dependent. We are continuing to expand harness compatibility.

External harnesses and agent scaffolds that apply their own chunking, retrieval, prompt compression, or context-packing strategy may need updates to support the native EAM path. If a harness slices the corpus before native tokenizer and session ingest, it can hide native long-context retrieval, memory, selected-context materialization, and spawn/continual-learning behavior. Hardware also changes the exposed capability surface: device placement, tensor parallel choices, GPU memory utilization, and remaining memory headroom can determine whether accelerated kernels, longer context traversal, or online adaptation stay available. When hardware is close to memory limits, expect narrower compatibility behavior or lower throughput until the harness and device configuration are tuned.

Runtime Access

This release provides four public runtime surfaces:

  • Native CLI: the recommended first path for local smoke tests, interactive chat, prompt-file runs, session persistence, and JSON/JSONL automation.
  • Hugging Face Transformers: a Python integration path through AutoTokenizer and AutoModelForCausalLM with trust_remote_code=True.
  • vLLM compatibility: a serving-oriented prompt/response path for experiments with vLLM's scheduler and sampling interface.
  • NameNotFound full runtime: the full-capability runtime path for deployments using the custom NameNotFound kernels and runtime components.

For best compatibility with the current release package, run on a CUDA GPU and bind the physical GPU explicitly. Full-capability operation is designed around the public_slot RTX PRO 6000 Blackwell target and the custom NameNotFound runtime on Linode / Akamai Cloud.

The examples below use placeholder paths and sanitized prompts.

Set up the nnf command

The native CLI is driven through a short launcher named nnf. It is a thin wrapper that points the runtime at your downloaded release directory, binds a CUDA GPU, and execs the packaged runtime module. Public users do not get this launcher pre-installed, so drop the small script below on your PATH (for example at ~/.local/bin/nnf), make it executable, and point NNF_RELEASE at your extracted release directory:

#!/usr/bin/env bash
# nnf — launcher for the NameNotFound-EAM release.
# Point this at YOUR extracted release directory.
set -euo pipefail

# Path to the directory you downloaded/extracted (contains namenotfound_runtime/).
NNF_RELEASE="${NNF_RELEASE:-/path/to/NameNotFound-EAM}"
if [[ ! -d "$NNF_RELEASE/namenotfound_runtime" ]]; then
  echo "nnf: release runtime not found at $NNF_RELEASE" >&2
  echo "     set NNF_RELEASE=/path/to/NameNotFound-EAM" >&2
  exit 2
fi

export PYTHONPATH="$NNF_RELEASE:${PYTHONPATH:-}"
export PYTHONDONTWRITEBYTECODE=1

# The runtime requires CUDA. Bind the physical GPU you want to use (override
# with NNF_CUDA_VISIBLE_DEVICES); the runtime maps it to logical cuda:0.
export CUDA_VISIBLE_DEVICES="${NNF_CUDA_VISIBLE_DEVICES:-${CUDA_VISIBLE_DEVICES:-0}}"
export PYTORCH_CUDA_ALLOC_CONF="${PYTORCH_CUDA_ALLOC_CONF:-expandable_segments:True}"

# A later --device in the arguments still wins.
exec python3 -B -m namenotfound_runtime.cli --model "$NNF_RELEASE" --device cuda:0 "$@"

Then make the launcher executable and set the release path in your shell:

chmod +x ~/.local/bin/nnf
export NNF_RELEASE=/path/to/NameNotFound-EAM

With nnf on your PATH, every example below is a short nnf <subcommand> call. nnf is the public launcher; our internal namenotfound-eam command is a separate runner and is intentionally not interchangeable with nnf.

Install And Validate

Validate the release structure before running a large job:

nnf validate

The validation command checks the public package layout and reports public-safe diagnostics. It should not expose local absolute paths, internal implementation names, or private prompt text.

Native CLI

The native CLI exposes five commands through the nnf launcher: validate, generate, chat, feedback, and spawn. All of them are a public I/O boundary only — they transport prompts, call model-owned runtime surfaces, persist exported session state when requested, and print public-safe responses and optional run metadata. They do not rely on CLI-authored answer rules or benchmark-specific answer injection.

Generate (one-shot)

Run a one-shot prompt and read the answer:

nnf generate "What kind of model are you?"

nnf generate prints a JSON object containing the model's text and a public-safe proof field (sanitized run details). The proof field is always included; it should not expose local absolute paths, internal implementation names, or private prompt text.

Ground the answer in your own context with --context:

nnf generate "Answer from the provided context." \
  --context "$(cat /path/to/context.txt)"

Control the response length with --max-new-tokens (default 192):

nnf generate "Summarize the key points." \
  --context "$(cat /path/to/context.txt)" \
  --max-new-tokens 512

Because nnf generate writes a single JSON object to stdout, non-interactive callers can pipe it straight into a JSON parser:

nnf generate "What kind of model are you?" | jq -r '.text'
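The same parsing can be done from Python. This is a minimal sketch, assuming nnf is on your PATH and that the JSON object carries the answer under text (as in the jq example). The proof key name is an assumption based on the description above, and both function names are hypothetical helpers, not part of the release.

```python
import json
import subprocess


def parse_generate_output(stdout: str) -> tuple[str, dict]:
    """Split the JSON object printed by `nnf generate` into (text, proof)."""
    payload = json.loads(stdout)
    # "text" matches the jq example; "proof" as a key name is an assumption.
    return payload["text"], payload.get("proof", {})


def nnf_generate(prompt: str, max_new_tokens: int = 192) -> tuple[str, dict]:
    """Run one `nnf generate` call and return the parsed answer."""
    completed = subprocess.run(
        ["nnf", "generate", prompt, "--max-new-tokens", str(max_new_tokens)],
        check=True,           # raise if the launcher exits non-zero
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return parse_generate_output(completed.stdout)
```

Keeping the parser separate from the subprocess call makes it easy to test against captured output without a GPU.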

Interactive chat

Start an interactive chat session (type /exit or /quit to stop):

nnf chat

Persist a native chat session to a file with --save-session:

export SESSION=./nnf-session.json

nnf chat \
  --save-session "$SESSION"

For non-interactive callers, JSONL mode can carry context and follow-up turns without changing the model-owned answer path:

printf '%s\n' \
  '{"context":"The project context goes here.","followup":"Answer from the provided context."}' \
  | nnf chat \
      --stdin-jsonl \
      --jsonl \
      --save-session "$SESSION"
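For callers that build the JSONL input programmatically, a small serializer avoids hand-quoting. This is a sketch under assumptions: the context and followup keys come from the example above, while the rule that every row needs a followup, and the idea that later rows may omit context, are guesses. build_chat_jsonl is a hypothetical helper.

```python
import json


def build_chat_jsonl(turns: list[dict]) -> str:
    """Serialize chat turns to JSONL: one compact JSON object per line."""
    lines = []
    for turn in turns:
        # Assumption: a row without a follow-up has nothing to answer.
        if "followup" not in turn:
            raise ValueError("each turn needs a 'followup' field")
        lines.append(json.dumps(turn, ensure_ascii=False))
    return "\n".join(lines) + "\n"


payload = build_chat_jsonl([
    {"context": "The project context goes here.",
     "followup": "Answer from the provided context."},
    # Assumption: later rows can omit "context" and reuse the session.
    {"followup": "Now summarize that answer in one sentence."},
])
```

The resulting string can be passed as standard input, for example with subprocess.run(["nnf", "chat", "--stdin-jsonl", "--jsonl"], input=payload, text=True).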

Optional run metadata can include:

optional_session_adaptation.ready
public_spawn_action.verified
public_continual_learning_step.done
selected_context_materialization.done
public_session_delta.model_export.done

Run metadata should use public labels such as selected_context and selected_context_materialization; implementation names and local filesystem paths are sanitized at the CLI boundary.

Show the public-safe run details after each plain-text answer with --show-proof:

nnf chat --show-proof

The flag name --show-proof is retained for CLI compatibility. It means the CLI prints the model's sanitized run-details JSON after each plain-text answer.

Start chat with an initial context file:

nnf chat --context-file /path/to/context.txt

Deleting or resetting model-owned spawn or session state clears that state, but it does not necessarily prove the model has returned to an earlier evolved state. If online adaptation, spawned specialists, or saved deltas have changed model-owned state, you may need to restore a previous snapshot or checkpoint if you store them and need the exact earlier behavior.

Diagnostic feedback

Teach the model from a corrected prompt/response turn with nnf feedback. The expected intent and expected answer are learned as feedback through the model-owned continual-learning surface; they are diagnostic signals, not CLI answer overrides or an answer oracle:

nnf feedback \
  --prompt "What is the capital of France?" \
  --response "It is Berlin." \
  --feedback "The correct answer is Paris." \
  --expected-answer "Paris" \
  --rating down

Like generate, feedback accepts --prompt-file for long prompts, --save-session to persist the learned signal, and --json for machine-readable output.
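When corrections come from a review queue or a log, the feedback call can be assembled as an argument list rather than a shell string. This sketch uses only the flags shown above; feedback_argv is a hypothetical helper, and the only rating value documented here is down, so other values are left unvalidated.

```python
import shlex


def feedback_argv(prompt, response, feedback, expected_answer,
                  rating="down", session=None):
    """Build the argument list for one `nnf feedback` call."""
    argv = [
        "nnf", "feedback",
        "--prompt", prompt,
        "--response", response,
        "--feedback", feedback,
        "--expected-answer", expected_answer,
        "--rating", rating,
        "--json",  # machine-readable output
    ]
    if session:
        # Persist the learned signal to a session file.
        argv += ["--save-session", session]
    return argv


argv = feedback_argv(
    "What is the capital of France?",
    "It is Berlin.",
    "The correct answer is Paris.",
    "Paris",
    session="./nnf-session.json",
)
print(shlex.join(argv))  # shell-quoted form, useful for logging
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting problems with prompts that contain quotes or newlines.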

Hugging Face Transformers

Use the Transformers surface for Python integration and research workflows:

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/NameNotFound-EAM"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    device="cuda:0",
)

result = model.public_chat_turn(
    followup="What kind of model are you?",
    max_new_tokens=128,
    tokenizer=tokenizer,
)

print(result["text"])

For long-context use, route context through the model context surfaces, because generic external concatenation/chunking patterns can break billion-token context integrity and degrade long-context retrieval quality. The exact public helper names may vary by release, but the intended path is: stream context into the model, ask a follow-up, and read the returned text plus public-safe run metadata.

vLLM Compatibility

The vLLM surface is provided for prompt/response serving experiments. It is a compatibility path around vLLM's loading, scheduling, and sampling interface; it is not a guarantee that every full-runtime capability is available with the same latency or response shape as the native runtime.

from vllm import LLM, SamplingParams

path = "/path/to/NameNotFound-EAM"

llm = LLM(
    model=path,
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.20,
    max_model_len=4096,
)

outputs = llm.generate(
    ["What kind of model are you?"],
    SamplingParams(temperature=0.0, max_tokens=128),
)

print(outputs[0].outputs[0].text)

Keep CUDA_VISIBLE_DEVICES scoped to the GPU you want vLLM to use. Start with a short prompt and conservative gpu_memory_utilization, then scale context length, batch size, and memory utilization after the model loads cleanly.

NameNotFound Full Runtime

The full runtime is the intended path for the complete capability surface: high-horizon context traversal, runtime expert evolution, accelerated decoding, and long-context memory. It requires the custom NameNotFound runtime components and kernels.

With the NameNotFound Sparkle runtime and custom kernel on Linode / Akamai Cloud, the further enhanced model can run on a single public_slot RTX PRO 6000 Blackwell GPU.

from namenotfound import NamenotfoundEAM

model = NamenotfoundEAM.from_pretrained(
    "/path/to/NameNotFound-EAM",
    runtime="linode",
)
model.eval().cuda()

out = model.generate(
    "Summarize the attached report and cite the relevant sections.",
    context=very_long_document,
    max_new_tokens=512,
)
print(out)

Reproducibility And Adaptation Controls

NameNotFound-EAM can evolve through session state, online adaptation, and spawned specialists. If your application requires fixed behavior, treat reproducibility as an explicit deployment choice: pin a snapshot when you create one and disable online adaptation in the harness if you use it.

model.set_online_adaptation(enabled=False)
model.pin_snapshot("snapshot-id")

When adaptation is enabled, session-local behavior can improve over time through model-owned memory/session state. Persist and restore that state deliberately, and keep base release snapshots pinned for auditability.

Public Runtime Path

The public runtimes expose this high-level path:

  1. public raw prompt or streamed context
  2. native tokenizer
  3. streaming input ingest
  4. selected context index
  5. selected context materialization
  6. answer surface
  7. optional spawn / continual-learning session state

Memory, Bidirectional Inference, And Persistent Spawns

What A Billion-Token Context And Evolving Architecture Mean For You

A billion-token context is not just a larger prompt box. It means your model can work across whole repositories, long histories, research archives, customer records, operational logs, and evolving project memory without forcing every task into a short, disposable chat.

An Evolving Architecture Model makes that context trainable. You can bring your own data, feedback, terminology, policies, style guides, codebase, research corpus, or workflow history and grow specialists around it. Instead of waiting for a foundation model vendor to retrain a general-purpose model for your niche, you can create named experts for the things you actually do: a project code specialist, a materials-research assistant, a support-policy expert, a legal-review companion, a manufacturing process guide, or any other domain specialist your deployment needs.

Those specialists can be named, recalled, improved, and redirected over time. The goal is custom training for builders, teams, and individuals: not a one-off fine-tune that gets stale, but a memory-aware system that can specialize through use, keep the useful parts, and keep benchmark or validation material separate from the data used to teach it.

On day one, it may feel like you are using a powerful model. After two weeks of real work, it should feel like you are using your model: one that has learned your corpus, your goals, your terminology, your recurring mistakes, and the specialists you rely on. Two deployments that start from the same base snapshot can diverge as they accumulate different memories, feedback, spawned experts, routing changes, and adaptation state. When you want specialization, let the architecture evolve with the work.

NameNotFound-EAM is memory-aware. The runtime can carry forward user-provided context, user facts, named assistant identity, task history, feedback, and spawn-specialist state through persisted sessions. This allows the model to be named, remember that name in later turns, learn stable user or project facts, and recall them after the session is restored.

The system also supports bidirectional inference: inference produces answers, but it can also produce learning signals. Feedback, outcome scoring, selected-context evidence, and session traces can flow back into model-owned memory and routing state while the system is running. In the full runtime, those signals can be used for real-time learning, online adaptation, and spawned-specialist improvement.

Persistent memory is layered:

  • Working memory: current turn and active context.
  • Episodic memory: recent interactions, feedback, and task traces.
  • Executive memory: higher-level task or project state used to guide future behavior.
  • Persistent memory: named facts, identities, preferences, and reusable knowledge stored across sessions.
  • Outcome-aware memory: records which strategies, routes, and generated specialists worked.

Plan and name a new spawn in full-runtime builds that expose nnf spawn:

nnf spawn \
  --dry-run \
  --domain code \
  --target-spawn-id project-code-specialist \
  --spawn-alias project-code-specialist \
  --goal "Specialize on this project's coding patterns."

When the plan is correct, rerun the same command with --run-training. Use --dry-run first for every new spawn or retrain so you can verify the action, corpus counts, feedback counts, validation summary, quality floor, and selected training command before any trainer starts.

Spawn specialists use a native multi-head, multi-layer capability surface in full-runtime builds. The default public controls are:

  --spawn-layer-count 2 \
  --spawn-head-count 4 \
  --spawn-ff-multiplier 2

These are user-selectable controls. --spawn-layer-count accepts non-negative integers; 0 is a low-capacity pass-through stack for quick compatibility checks. --spawn-head-count and --spawn-ff-multiplier accept positive integers. Head count does not need to divide the hidden size: the native spawn stack projects each layer to the requested number of heads and then returns to the hidden size. Higher layer counts, head counts, and multipliers use more memory and training time, so start small, inspect the dry-run plan, and scale after a bounded proof is clean.

Examples:

# Low-capacity compatibility check.
  --spawn-layer-count 0 \
  --spawn-head-count 1 \
  --spawn-ff-multiplier 1
# Higher-capacity specialist.
  --spawn-layer-count 4 \
  --spawn-head-count 8 \
  --spawn-ff-multiplier 4
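The accepted ranges for the three geometry flags can be checked before a dry-run is launched. This is a sketch of a client-side check that follows the rules stated above (non-negative layer count, positive head count and multiplier, no divisibility requirement). spawn_geometry_flags is a hypothetical helper, and the CLI may apply further checks of its own.

```python
def spawn_geometry_flags(layer_count: int = 2, head_count: int = 4,
                         ff_multiplier: int = 2) -> list[str]:
    """Validate spawn geometry and return the matching CLI flags."""
    values = {
        "layer_count": layer_count,
        "head_count": head_count,
        "ff_multiplier": ff_multiplier,
    }
    for name, value in values.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
    if layer_count < 0:
        raise ValueError("layer_count must be non-negative (0 is pass-through)")
    if head_count < 1:
        raise ValueError("head_count must be a positive integer")
    if ff_multiplier < 1:
        raise ValueError("ff_multiplier must be a positive integer")
    # No divisibility check: head count need not divide the hidden size.
    return [
        "--spawn-layer-count", str(layer_count),
        "--spawn-head-count", str(head_count),
        "--spawn-ff-multiplier", str(ff_multiplier),
    ]
```

Calling spawn_geometry_flags() with no arguments reproduces the default public controls, and the returned list can be appended to an nnf spawn argument list.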

Add an initial corpus:

nnf spawn \
  --dry-run \
  --domain code \
  --target-spawn-id project-code-specialist \
  --spawn-alias project-code-specialist \
  --goal "Learn this project's coding patterns from the provided corpus." \
  --user-corpus-jsonl ./spawn-corpus.jsonl

User corpus JSONL rows should contain at least:

{"domain":"code","target_spawn_id":"project-code-specialist","prompt":"Describe the desired task behavior.","target":"Describe the expected model-owned answer or repair behavior."}

Tune a spawn manually from feedback:

nnf spawn \
  --dry-run \
  --domain code \
  --target-spawn-id project-code-specialist \
  --goal "Improve this specialist from user feedback." \
  --user-feedback-jsonl ./spawn-feedback.jsonl

Feedback JSONL rows should contain at least:

{"domain":"code","target_spawn_id":"project-code-specialist","prompt":"The prior answer missed an important constraint.","response":"The previous response text.","correction":"The corrected behavior or answer.","feedback_source":"user","issue_kind":"correction","repair_action":"retrain"}
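Both row shapes above (corpus and feedback) can be checked locally before a dry-run. The sketch below takes the required keys from the two example rows. Treating an empty value as missing is an assumption, and check_jsonl is a hypothetical helper rather than something the release ships.

```python
import json

# Minimum keys, taken from the example rows above.
CORPUS_KEYS = ("domain", "target_spawn_id", "prompt", "target")
FEEDBACK_KEYS = (
    "domain", "target_spawn_id", "prompt", "response", "correction",
    "feedback_source", "issue_kind", "repair_action",
)


def check_jsonl(text: str, required_keys: tuple) -> list:
    """Return human-readable problems; an empty list means every row passed."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        # Assumption: an empty string counts as a missing value.
        missing = [key for key in required_keys if not row.get(key)]
        if missing:
            problems.append(f"line {number}: missing {', '.join(missing)}")
    return problems
```

For example, check_jsonl(open("spawn-corpus.jsonl", encoding="utf-8").read(), CORPUS_KEYS) returns an empty list when every row has the minimum fields.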

Use benchmarks only as validation:

nnf spawn \
  --dry-run \
  --domain code \
  --target-spawn-id project-code-specialist \
  --goal "Validate this specialist against held-out tasks." \
  --user-corpus-jsonl ./spawn-corpus.jsonl \
  --benchmark-jsonl ./spawn-holdout.jsonl

Benchmark JSONL rows may include prompts, tests, and expected answers, but they are recorded as validation-only. The spawn training command should not use benchmark or validation files as training data; the public plan should report that benchmark material is held out from expert training.
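Held-out isolation can also be checked on the user side before any training starts. The sketch below flags prompts that appear in both the training corpus and the benchmark file. It assumes benchmark rows store their prompt under a prompt key, which this card does not confirm, and it only catches matches that are identical after whitespace and case normalization, not paraphrases. Both function names are hypothetical.

```python
import json


def load_prompts(jsonl_text: str) -> set:
    """Collect normalized prompts from JSONL text (one JSON object per line)."""
    prompts = set()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        prompt = row.get("prompt")  # assumed key name for benchmark rows
        if prompt:
            # Collapse whitespace and ignore case so trivial edits still match.
            prompts.add(" ".join(prompt.split()).casefold())
    return prompts


def holdout_leaks(train_jsonl: str, holdout_jsonl: str) -> set:
    """Return normalized prompts present in both training and held-out data."""
    return load_prompts(train_jsonl) & load_prompts(holdout_jsonl)
```

An empty result is a necessary check, not proof of isolation; the dry-run plan's own held-out report remains the authoritative signal.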

The lifecycle has been validated end-to-end through real CLI training runs for a JavaScript-specialist spawn, including goal-only creation, named spawn creation with corpus bootstrap, manual feedback-driven refinement, held-out benchmark isolation, persistence checks without retraining, forced retrain/pivot flows, artifact restore from saved state, and lifecycle cleanup via delete-by-alias. This is a production-oriented functional validation of the spawn control surface, not a benchmark-quality score claim.

The native multi-head/multi-layer spawn path has also been hard-validated in the CLI across user-selected geometry. Across low- and high-bandwidth dry-runs, requested geometry was consistently routed into the training command. In a bounded real-training proof, a non-divisible configuration (2 layers, 5 heads, 3× feed-forward multiplier) completed successfully with native-stack-backed artifacts: answer surface, spawn router, corpus planner, and response-quality grader all reflected native component paths, with persisted metadata explicitly recording layer count, head count, feed-forward multiplier, and architecture identity.

Retrain or pivot an existing spawn:

nnf spawn \
  --dry-run \
  --force-retrain \
  --domain code \
  --target-spawn-id project-code-specialist \
  --goal "Pivot this spawn toward a new project behavior." \
  --user-corpus-jsonl ./spawn-corpus-v2.jsonl \
  --user-feedback-jsonl ./spawn-feedback-v2.jsonl \
  --benchmark-jsonl ./spawn-holdout-v2.jsonl

Delete a spawn by id or alias:

nnf spawn \
  --remove-spawn \
  --target-spawn-id project-code-specialist

Performance And Harness Variability

Performance and observed scores vary across runtime surfaces and harnesses. The native CLI, Hugging Face Transformers, vLLM, and the full NameNotFound runtime do not exercise identical schedulers, batching behavior, prompt transport, context materialization, or sampling paths.

Expected variation sources include:

  • GPU model, driver, CUDA version, and kernel availability
  • CUDA_VISIBLE_DEVICES, tensor parallel size, batch size, and memory utilization
  • vLLM max_model_len, scheduler behavior, and prompt chunking
  • CLI vs. Transformers vs. vLLM prompt formatting
  • benchmark scaffold, agent harness, timeout, retry policy, and scoring script
  • context length, context order, session state, and whether online adaptation is enabled

Benchmark numbers in this model card are internal unless explicitly marked as an external comparison value. Treat comparisons across harnesses as directional unless the same dataset, scaffold, runtime surface, prompt format, and scoring script are used.

benchmarks Are Broken, But Still Useful

benchmarks are broken, and you can break them too. With spawn specialists, users can teach the model a domain, task family, codebase, or workflow until yesterday's hard benchmark pattern becomes routine work. The responsible path is to training on your own corpus and feedback, then keep benchmark and validation files held out so the result measures real specialization rather than memorized answers. So do it yourself: specialize the model on your own domain, run your own held-out benchmark, and share what you measure in the comments or on social — earned, reproducible results from real users say more than any score we could publish.

Static benchmark scores are a narrow snapshot of a system that is increasingly interactive, memory-aware, tool-using, and adaptive. They often collapse the most important parts of an EAM into a single number: whether it can ingest new context, remember user state, learn from feedback, preserve named specialists, recover exact evidence, and keep improving without being retraininged from scratch.

Here, benchmarks are treated as probes, not as the whole product. What the field needs is not another static benchmark of the current state of the art, but new benchmarks that people can create and use to train capabilities and tasks, rather than general or specific knowledge recall, unless that knowledge is the explicit target.

Release Artifact Format

Full model releases use a sharded safetensors bundle with an index map and integrity metadata:

v3_weights-00001-of-XXXXX.safetensors
v3_weights-00002-of-XXXXX.safetensors
...
v3_weights.safetensors.index.json
integrity_manifest.json

The exact shard count and manifest names can change between releases. Consumers should rely on the shipped index/manifest files rather than hard-coded shard counts.
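As a rough illustration of relying on the index instead of a fixed shard count, the sketch below reads the shard names from the index file. It assumes the index follows the common Hugging Face sharded-checkpoint layout, with a top-level `weight_map` from tensor name to shard filename; this card does not confirm that schema. The helper names are hypothetical, and the schema of `integrity_manifest.json` is not described here, so it is not parsed.

```python
import json
from pathlib import Path


def load_index(index_path: str) -> dict:
    """Read the shipped index file (assumed to be JSON)."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))


def shards_from_index(index: dict) -> list[str]:
    """Collect unique shard filenames from an assumed 'weight_map' mapping."""
    return sorted(set(index["weight_map"].values()))


def missing_shards(index: dict, model_dir: str) -> list[str]:
    """List shards named in the index that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in shards_from_index(index) if not (root / name).is_file()]


# Illustrative in-memory index; real shard names come from the shipped file.
example_index = {
    "weight_map": {
        "layer.0.weight": "shard-b.safetensors",
        "layer.1.weight": "shard-a.safetensors",
        "layer.2.weight": "shard-a.safetensors",
    }
}
print(shards_from_index(example_index))
```

In practice, `load_index("v3_weights.safetensors.index.json")` would supply the dictionary, and `missing_shards` gives a quick completeness check after download.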

Limitations, Risks, And Recommendations

  • Research-grade, evolving system: behavior can drift between snapshots; use the reproducibility controls above when fixed behavior is required.
  • Hallucination: verification/abstention gates reduce but do not eliminate hallucination; keep a human in the loop.
  • Bias: reflects its training corpora; evaluate for your use case.
  • Compute: the full model is designed around the public_slot RTX PRO 6000 Blackwell target with custom runtime kernels; long horizon trades latency for reach.
  • Benchmarks: benchmark scores are useful held-out probes, not the whole measure of an adaptive architecture. Current reporting focuses on long-context retrieval; independent validation and further benchmarks are forthcoming.

Evaluation

Validated internally to date: billion-token retrieval, multi-granularity retrieval, numeric and code capability, and full-model single-GPU operation on the public_slot RTX PRO 6000 Blackwell target with the custom runtime. The table below reports long-context benchmarks. Further benchmarks are forthcoming.

Benchmark Coverage

The NameNotFound-EAM column reports internal evaluations on the shipped runtime path. External comparison values are included only where a public source reports the exact model/eval pair or a comparable public leaderboard value. n/r means no directly comparable public value was reported in the referenced sources.

| Benchmark | NameNotFound-EAM | SubQ | public_slot 3.1 Pro | Opus 4.6 | Opus 4.7 | GPT-5.4 | GPT-5.5 |
|---|---|---|---|---|---|---|---|
| RULER @ 128K | 100.0% | 95.6% | n/r | n/r | n/r | n/r | n/r |
| MRCR v2 (8-needle, 1M) | 100.0% | 86.2% | 26.3% | 78.3% | 32.2% | 36.6% | 74.0% |
| MRCR-style (500-needle, 1B context) | 72.6% | n/r | n/r | n/r | n/r | n/r | n/r |

NameNotFound-EAM results are internally evaluated; independent validation and further benchmarks are forthcoming.

Current Public Package Validation

The current public package path has been validated through shipped-runtime runs for:

  • vLLM public prompt-response path with native tokenizer, streaming context ingest, selected context materialization, and answer generation.
  • MRCR v2 public probe exact-context recovery through selected source fact lookup.
  • RULER NIAH public package validation.
  • Native CLI shipped-runtime response validation through selected context evidence.
  • Native CLI spawn / continual-session validation with session-local adaptation and exported session delta.
  • Spawn lifecycle CLI validation covering bounded real training, goal-only creation, named spawn creation, user-selected native multi-head/multi-layer geometry, non-divisible head counts, persistence detection, reload, delete-by-alias, manual feedback tuning, held-out benchmark isolation, initial corpus, forced retraining, and ability pivot.

These validation runs exercise the public runtime path rather than a benchmark-answer training shortcut.

Technical Specifications

| Property | Value |
|---|---|
| Architecture class | Evolving Architecture Model (EAM) |
| Primary positioning | adaptive evolving architecture, not a code-only model or generic chatbot |
| Context horizon | up to 1,000,000,000 tokens (hierarchical) + high-resolution native window |
| Training accelerator | Single public_slot RTX PRO 6000 Blackwell |
| Full-model runtime target | Single public_slot RTX PRO 6000 Blackwell |
| Runtime requirement | Custom NameNotFound kernels and runtime components |
| Tokenizer | internalized, adaptive, self-contained |
| Core | hybrid attention + efficient sequence mixing; multi-expert with learned token-level fusion |
| Memory | working, episodic, executive, persistent, and outcome-aware |
| Adaptation | runtime expert spawning, fusion, pruning + continual, outcome-aware learning |
| Decoding | block-parallel / speculative / multi-token parallel decoding |
| Format | tensor-native, sharded safetensors bundle + index + integrity manifest |
| Precision | GPU-resident; mixed precision |

Proprietary Internals

The following are intentionally not disclosed in this public model card:

  • Context traversal and grounding method
  • Retrieval / indexing / matching internals
  • Memory addressing internals
  • Expert composition details
  • Routing dimensions and controller design
  • Training recipe
  • Kernel implementation details
  • Runtime scheduling internals

Credits

  • Wendell Adams -- Engineer, Designer, and Architect
  • Ross Gates -- Strategy
  • NameNotFound -- namenotfound.ai

Citation

@misc{namenotfound_eam_2026,
  title        = {NameNotFound-EAM: An Evolving Architecture Model with a 1-Billion-Token Context Horizon},
  author       = {Wendell Adams and Ross Gates},
  year         = {2026},
  organization = {NameNotFound},
  note         = {Engineering, design, and architecture: Wendell Adams. Strategy: Ross Gates. Full capability runtime currently available on Linode / Akamai Cloud.}
}

License And Contact

Released under the NNF Source-Available Model License. See LICENSE.md.

Running a version with full capabilities on your own hardware requires building a kernel or adaptive layer.

For full-runtime deployment with NameNotFound's custom kernel on Linode / Akamai Cloud, contact the NameNotFound team for current onboarding details.

For commercial use, partnerships, or other questions, contact the NameNotFound team: ai@namenotfound.ai

Future Development

This project is maintained by a dedicated two-person team at NameNotFound. We view development as a collaborative effort with the community, and your feedback directly shapes how the system evolves. We prioritize updates carefully and respond as our current development cycle allows. Please send feedback directly; we will work to incorporate it as we update this model and release our other forthcoming models, which have already been designed.

Known Limitations And Active Work

We are actively expanding capabilities and addressing known issues across the public runtime surfaces. In particular, we are aware of some decode-quality issues, and response behavior can vary by surface while this work continues:

  • Native CLI: we are aware of decode issues that can affect answer quality and formatting on some prompts; the answer/decode surface is actively being improved. However, this CLI will not be supported long-term; as other harnesses adopt these capabilities, it will be deprecated.
  • Hugging Face Transformers: this path currently exposes a subset of the full runtime; we are expanding decode behavior.
  • vLLM: the compatibility path is still being hardened; decode coverage, scheduling, and sampling behavior are actively being expanded.
  • Public Runtime: remaining decode and capability gaps are being addressed as runtime and training work completes. For full native decode support, you must run the model with the Sparkle Runtime and custom kernels available at Linode.

These are known, actively tracked items rather than final behavior. Expect decode quality and capability coverage to improve as updates land (see Updates And Versioning below).

Updates And Versioning

Note: This repository is updated regularly. Files, documentation, runtime surfaces, reported results, and the spawn/runtime interface may change between updates. If you need stable, reproducible behavior, pin a specific commit or snapshot rather than tracking the latest revision.
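Pinning can be sketched with the `huggingface_hub` client, whose `snapshot_download` function accepts a `revision` argument. This is an illustrative sketch, not an official workflow: the helper names are hypothetical, the repository id and commit hash must be supplied by the reader, and because access is gated the download also needs an authenticated session with the conditions accepted.

```python
import re

# A full Git commit hash is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """True only for a full commit hash, not a moving ref such as 'main'."""
    return bool(_FULL_SHA.fullmatch(revision))


def download_pinned(repo_id: str, revision: str) -> str:
    """Download one exact snapshot and return its local directory path."""
    if not is_pinned_revision(revision):
        raise ValueError(f"revision {revision!r} is not a full commit hash")
    # Imported lazily so the validation helper works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=revision)


print(is_pinned_revision("main"))
print(is_pinned_revision("0123456789abcdef0123456789abcdef01234567"))
```

Rejecting branch names up front is a deliberate choice here: a branch such as `main` moves with every update, while a commit hash always resolves to the same files.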

Additional calls in progress: additional decode_call is currently underway, and further decode_call_public will be provided here as the training completes and the model is updated.
