Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns
Tutorial · Claude 4.5 / GPT-5 / Gemini 2.5 Pro · May 2026
Each frontier model has its own context-window, tool-use, and streaming conventions. This guide covers the model-specific patterns for wiring a Living Context Engine into Claude, GPT-5, and Gemini 2.5 Pro agents.
Why Model-Specific Patterns Matter
The architecture of a Living Context Engine is model-agnostic. The wiring is not. Claude, GPT, and Gemini differ in three concrete dimensions that change how you call the engine: context window size, tool-use shape, and streaming conventions. The right pattern is the one that matches each model's native conventions — fighting them costs latency and quality.
This post walks through the recommended pattern for each of the three frontier model families.
Pattern A — Claude (Anthropic)
Claude's strengths: a very large context window (1M tokens for Claude 4.5 Opus), strong preservation of structured input across long contexts, and first-class tool use via the Messages API. The recommended pattern leans into all three.
Wide-Context Read
Claude tolerates large retrieved subgraphs without quality degradation. Increase your k and hops beyond the conservative defaults:
chain = db.context_chain(query_vec, k=12, hops=3)
Structured Context Block
Preserve the graph structure when formatting context — Claude reasons better with explicit topology. Use XML-ish tags that match Claude's documented conventions:
def format_for_claude(chain):
    parts = ["<context>"]
    for hop in sorted({n.hop for n in chain.nodes}):
        parts.append(f"  <hop value='{hop}'>")
        for n in [x for x in chain.nodes if x.hop == hop]:
            parts.append(f"    <node id='{n.id}' edge='{n.edge_type or 'seed'}'>")
            parts.append(f"      {n.metadata['text']}")
            parts.append("    </node>")
        parts.append("  </hop>")
    parts.append("</context>")
    return "\n".join(parts)
Tool-Use Write-Back
Expose write_back as a tool. Claude is excellent at deciding when its own output is worth persisting — give it the agency:
tools = [{
    "name": "persist_decision",
    "description": "Persist a completed decision back to the context engine. Use after producing a final answer that should inform future calls.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "input_node_ids": {"type": "array", "items": {"type": "integer"}},
            "edge_type": {"type": "string", "enum": ["derived_from", "responds_to", "contradicts"]},
        },
        "required": ["summary", "input_node_ids"],
    },
}]
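The application-side dispatch is then a short loop. Here is a minimal sketch, assuming the anthropic Python SDK plus the SYSTEM_PROMPT, chain, query, db, and write_back names used elsewhere in this series; the model ID is a placeholder, not a confirmed identifier:
import anthropic

client = anthropic.Anthropic()

msg = client.messages.create(
    model="claude-...",          # placeholder: use your deployed Claude model ID
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    tools=tools,
    messages=[{"role": "user", "content": f"{format_for_claude(chain)}\n\nQuery: {query}"}],
)

# If Claude chose to persist, route the tool call to the engine's write_back.
if msg.stop_reason == "tool_use":
    for block in msg.content:
        if block.type == "tool_use" and block.name == "persist_decision":
            args = block.input
            write_back(db, args["summary"], args["input_node_ids"],
                       edge_type=args.get("edge_type", "derived_from"))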
Pattern B — GPT (OpenAI)
GPT-5's profile: the fastest tool-call latency of the three, strong structured output via JSON schema, and a tighter context window than Claude's. The pattern adapts to all three.
Tighter Context Block
Keep k smaller and hops at 2. Trim the retrieved subgraph aggressively:
chain = db.context_chain(query_vec, k=5, hops=2)
chain.nodes = chain.nodes[:8] # hard cap for tight context
JSON-Schema Output for Write-Back
Use GPT-5's JSON-schema mode so the write-back decision comes back as structured output you can parse deterministically:
import json

response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "persist": {"type": "boolean"},
        "edge_type": {"type": "string", "enum": ["derived_from", "responds_to"]},
    },
    "required": ["answer", "persist"],
}

resp = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        # json_schema mode expects a named wrapper around the schema itself
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
)

parsed = json.loads(resp.choices[0].message.content)
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
Streaming-Friendly Output
GPT-5 streams well. Defer the write-back until the stream ends and you have the full output. Don't write back partial deltas.
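As a sketch of that flow, assuming the same client, response_schema, SYSTEM_PROMPT, and write_back names as above: accumulate deltas, parse once the stream closes, then persist.
buffer = []
stream = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        buffer.append(chunk.choices[0].delta.content)  # forward the delta to the UI here if needed

parsed = json.loads("".join(buffer))  # parse only after the stream is complete
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))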
Pattern C — Gemini (Google)
Gemini 2.5 Pro's strengths: native multimodal input, very long context (2M tokens), and tight integration with Gemini Embedding 2 (the same 768-dim space used for image + text + video). The pattern leans on that native multimodality.
Multimodal Context Retrieval
The whole point of Gemini is mixed-modality reasoning. Use modality=None in context_chain to retrieve across all modalities and surface the graph that connects them:
chain = db.context_chain(query_vec, k=8, hops=2, modality=None)
Feed Gemini both text excerpts and the actual image/video assets (referenced by node payload):
import google.generativeai as genai

contents = []
for n in chain.nodes:
    if n.modality == "image":
        # load_image_bytes returns the raw bytes for the asset referenced by the node payload
        contents.append({"inline_data": {"mime_type": "image/jpeg",
                                         "data": load_image_bytes(n.metadata["asset_path"])}})
        contents.append({"text": f"[image node {n.id}, edge={n.edge_type}]"})
    else:
        contents.append({"text": f"[{n.modality} node {n.id}, hop={n.hop}, edge={n.edge_type}]: {n.metadata['text']}"})

contents.append({"text": f"\nQuery: {query}"})
response = genai.GenerativeModel("gemini-2.5-pro").generate_content(contents)
Long-Context Read
Gemini's 2M context is enormous. You can pull a wide subgraph (k=20+, hops=3) without quality degradation. Use this for cross-modal queries where the connected subgraph spans dozens of related assets.
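For example, a wide cross-modal read might look like this; the k and hops values are illustrative rather than tuned:
# Wide cross-modal read: parameter values are illustrative, same db handle as above.
chain = db.context_chain(query_vec, k=24, hops=3, modality=None)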
Embed With the Same Model
Critically, use gemini-embedding-exp-03-07 for the vectors stored in your Living Context Engine when you're serving via Gemini. The 768-dim space is the same one Gemini's encoders use internally; that cross-modal alignment is what makes the multimodal queries work.
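A sketch of the query-side embedding call using the google.generativeai SDK; the task_type and output_dimensionality=768 settings are assumptions about how the store was built, not documented requirements:
import google.generativeai as genai

def embed_query(text: str) -> list[float]:
    # Assumption: the store was built with this same model at 768 dimensions.
    result = genai.embed_content(
        model="models/gemini-embedding-exp-03-07",
        content=text,
        task_type="retrieval_query",
        output_dimensionality=768,
    )
    return result["embedding"]

query_vec = embed_query(query)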
Cross-Cutting Best Practices
- One file per agent. Each agent gets its own .featherfile. Cheap to spin up, hard isolation, easy to checkpoint.
- Embed once, retrieve everywhere. Settle on one embedding model per store. Don't mix Gemini Embedding 2 vectors with OpenAI Ada vectors in the same index.
- Capture downstream signal. Whichever model you use, the reinforcement step (raising importance / bumping recall on inputs that produced successful outputs) is what closes the loop. Wire it on day one; a sketch of what that hook can look like follows this list.
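What that hook looks like depends on your engine's surface. A purely illustrative sketch; the get_node / update_node accessors and the importance field are hypothetical names, not a documented API:
def reinforce(db, node_ids, boost=0.1):
    # Hypothetical helper: get_node/update_node and the importance field are
    # assumed names, not part of any documented engine API.
    # Raise the importance of every input node that fed a successful output so
    # those nodes surface earlier in future context_chain() calls.
    for node_id in node_ids:
        node = db.get_node(node_id)
        node.importance = min(1.0, node.importance + boost)
        db.update_node(node)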
The Common Substrate
Underneath all three patterns is the same Living Context Engine kernel. The model-specific code is at the formatting and write-back boundary — a few dozen lines each. The architectural substrate doesn't change. That portability is the point: the engine is the durable component, the models are interchangeable on top.
Related: Build a Living Context Engine in Python · The 768-Dimension Bet.