Living Context Engine for Claude, GPT, and Gemini Agents: Model-Specific Patterns
Tutorial · Claude 4.5 / GPT-5 / Gemini 2.5 Pro · May 2026
Each frontier model has its own context-window, tool-use, and streaming conventions. This guide covers the model-specific patterns for wiring a Living Context Engine into Claude, GPT-5, and Gemini 2.5 Pro agents.
Why Model-Specific Patterns Matter
The architecture of a Living Context Engine is model-agnostic. The wiring is not. Claude, GPT, and Gemini differ in three concrete dimensions that change how you call the engine: context window size, tool-use shape, and streaming conventions. The right pattern is the one that matches each model's native conventions — fighting them costs latency and quality.
This post walks through the recommended pattern for each of the three frontier model families.
Pattern A — Claude (Anthropic)
Claude's strengths: a very large context window (1M tokens for Claude 4.5 Opus), strong preservation of structured input across long contexts, and first-class tool use via the Messages API. The recommended pattern leans into all three.
Wide-Context Read
Claude tolerates large retrieved subgraphs without quality degradation. Increase your k and hops beyond the conservative defaults:
chain = db.context_chain(query_vec, k=12, hops=3)
Structured Context Block
Preserve the graph structure when formatting context — Claude reasons better with explicit topology. Use XML-ish tags that match Claude's documented conventions:
def format_for_claude(chain):
    parts = ["<context>"]
    for hop in sorted({n.hop for n in chain.nodes}):
        parts.append(f"  <hop value='{hop}'>")
        for n in [x for x in chain.nodes if x.hop == hop]:
            parts.append(f"    <node id='{n.id}' edge='{n.edge_type or 'seed'}'>")
            parts.append(f"      {n.metadata['text']}")
            parts.append("    </node>")
        parts.append("  </hop>")
    parts.append("</context>")
    return "\n".join(parts)
Tool-Use Write-Back
Expose write_back as a tool. Claude is excellent at deciding when its own output is worth persisting — give it the agency:
tools = [{
    "name": "persist_decision",
    "description": "Persist a completed decision back to the context engine. Use after producing a final answer that should inform future calls.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "input_node_ids": {"type": "array", "items": {"type": "integer"}},
            "edge_type": {"type": "string", "enum": ["derived_from", "responds_to", "contradicts"]},
        },
        "required": ["summary", "input_node_ids"],
    },
}]
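The application-side dispatch is then a short loop. Here is a minimal sketch, assuming the anthropic Python SDK plus the SYSTEM_PROMPT, chain, query, db, and write_back names used elsewhere in this series; the model ID is a placeholder, not a confirmed identifier:
import anthropic

client = anthropic.Anthropic()

msg = client.messages.create(
    model="claude-...",          # placeholder: use your deployed Claude model ID
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    tools=tools,
    messages=[{"role": "user", "content": f"{format_for_claude(chain)}\n\nQuery: {query}"}],
)

# If Claude chose to persist, route the tool call to the engine's write_back.
if msg.stop_reason == "tool_use":
    for block in msg.content:
        if block.type == "tool_use" and block.name == "persist_decision":
            args = block.input
            write_back(db, args["summary"], args["input_node_ids"],
                       edge_type=args.get("edge_type", "derived_from"))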
Pattern B — GPT (OpenAI)
GPT-5's profile: the fastest tool-call latency of the three, strong structured output via JSON schema, and a tighter context window than Claude's. The pattern adapts to all three.
Tighter Context Block
Keep k smaller and hops at 2. Trim the retrieved subgraph aggressively:
chain = db.context_chain(query_vec, k=5, hops=2)
chain.nodes = chain.nodes[:8] # hard cap for tight context
JSON-Schema Output for Write-Back
Use GPT-5's JSON-schema mode so the write-back decision comes back as structured output you can parse deterministically:
import json

response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "persist": {"type": "boolean"},
        "edge_type": {"type": "string", "enum": ["derived_from", "responds_to"]},
    },
    "required": ["answer", "persist"],
}

resp = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        # json_schema mode expects a named wrapper around the schema itself
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
)

parsed = json.loads(resp.choices[0].message.content)
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))
Streaming-Friendly Output
GPT-5 streams well. Defer the write-back until the stream ends and you have the full output. Don't write back partial deltas.
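As a sketch of that flow, assuming the same client, response_schema, SYSTEM_PROMPT, and write_back names as above: accumulate deltas, parse once the stream closes, then persist.
buffer = []
stream = client.chat.completions.create(
    model="gpt-5",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "write_back_decision", "schema": response_schema},
    },
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query_with_context},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        buffer.append(chunk.choices[0].delta.content)  # forward the delta to the UI here if needed

parsed = json.loads("".join(buffer))  # parse only after the stream is complete
if parsed["persist"]:
    write_back(db, parsed["answer"], input_ids,
               edge_type=parsed.get("edge_type", "derived_from"))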
Pattern C — Gemini (Google)
Gemini 2.5 Pro's strengths: native multimodal input, very long context (2M tokens), and tight integration with Gemini Embedding 2 (the same 768-dim space used for image + text + video). The pattern leans on that native multimodality.
Multimodal Context Retrieval
The whole point of Gemini is mixed-modality reasoning. Use modality=None in context_chain to retrieve across all modalities and surface the graph that connects them:
chain = db.context_chain(query_vec, k=8, hops=2, modality=None)
Feed Gemini both text excerpts and the actual image/video assets (referenced by node payload):
import google.generativeai as genai

contents = []
for n in chain.nodes:
    if n.modality == "image":
        # load_image_bytes returns the raw bytes for the asset referenced by the node payload
        contents.append({"inline_data": {"mime_type": "image/jpeg",
                                         "data": load_image_bytes(n.metadata["asset_path"])}})
        contents.append({"text": f"[image node {n.id}, edge={n.edge_type}]"})
    else:
        contents.append({"text": f"[{n.modality} node {n.id}, hop={n.hop}, edge={n.edge_type}]: {n.metadata['text']}"})

contents.append({"text": f"\nQuery: {query}"})
response = genai.GenerativeModel("gemini-2.5-pro").generate_content(contents)
Long-Context Read
Gemini's 2M context is enormous. You can pull a wide subgraph (k=20+, hops=3) without quality degradation. Use this for cross-modal queries where the connected subgraph spans dozens of related assets.
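For example, a wide cross-modal read might look like this; the k and hops values are illustrative rather than tuned:
# Wide cross-modal read: parameter values are illustrative, same db handle as above.
chain = db.context_chain(query_vec, k=24, hops=3, modality=None)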
Embed With the Same Model
Critically, use gemini-embedding-exp-03-07 for the vectors stored in your Living Context Engine when you're serving via Gemini. The 768-dim space is the same one Gemini's encoders use internally; that cross-modal alignment is what makes the multimodal queries work.
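A sketch of the query-side embedding call using the google.generativeai SDK; the task_type and output_dimensionality=768 settings are assumptions about how the store was built, not documented requirements:
import google.generativeai as genai

def embed_query(text: str) -> list[float]:
    # Assumption: the store was built with this same model at 768 dimensions.
    result = genai.embed_content(
        model="models/gemini-embedding-exp-03-07",
        content=text,
        task_type="retrieval_query",
        output_dimensionality=768,
    )
    return result["embedding"]

query_vec = embed_query(query)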
Cross-Cutting Best Practices
- One file per agent. Each agent gets its own .featherfile. Cheap to spin up, hard isolation, easy to checkpoint.
- Embed once, retrieve everywhere. Settle on one embedding model per store. Don't mix Gemini Embedding 2 vectors with OpenAI Ada vectors in the same index.
- Capture downstream signal. Whichever model you use, the reinforcement step (raising importance / bumping recall on inputs that produced successful outputs) is what closes the loop. Wire it on day one; a sketch of what that hook can look like follows this list.
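What that hook looks like depends on your engine's surface. A purely illustrative sketch; the get_node / update_node accessors and the importance field are hypothetical names, not a documented API:
def reinforce(db, node_ids, boost=0.1):
    # Hypothetical helper: get_node/update_node and the importance field are
    # assumed names, not part of any documented engine API.
    # Raise the importance of every input node that fed a successful output so
    # those nodes surface earlier in future context_chain() calls.
    for node_id in node_ids:
        node = db.get_node(node_id)
        node.importance = min(1.0, node.importance + boost)
        db.update_node(node)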
The Common Substrate
Underneath all three patterns is the same Living Context Engine kernel. The model-specific code is at the formatting and write-back boundary — a few dozen lines each. The architectural substrate doesn't change. That portability is the point: the engine is the durable component, the models are interchangeable on top.
Related: Build a Living Context Engine in Python · The 768-Dimension Bet.