Tutorial · 12 min read · May 15, 2026

Living Context Engine with LangChain and LlamaIndex: Integration Guide

How to wire a Living Context Engine into a LangChain or LlamaIndex application. Adapter shape, retriever interface, and where the closed feedback loop fits in each framework.

Feather DB Engineering
Engineering Team

Tutorial · LangChain 0.3+ · LlamaIndex 0.11+ · May 2026


What This Guide Covers

Most production AI applications use a framework — LangChain or LlamaIndex — for orchestration. Both treat retrieval as an interface: a retriever returns documents for a query. This guide shows how to plug a Living Context Engine into both, preserving the adaptive-scoring + typed-edge + write-back behavior the engine is designed for.

We'll use Feather DB as the underlying engine. The pattern transfers to any system that exposes the equivalent primitives.

The Adapter Shape (Both Frameworks)

Both LangChain and LlamaIndex expect a retriever to implement a method that takes a query string (or vector) and returns a list of documents. The minimum adapter is straightforward; the architectural decisions are about where the write-back path lives.
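
Before the framework-specific code, here is the contract both adapters end up covering, sketched as a small protocol. The names retrieve and write_back are illustrative only; neither framework defines them, and the concrete adapters below express the same two responsibilities through each framework's own hooks:

from typing import Any, Protocol

class LivingContextAdapter(Protocol):
    """Illustrative shape of the adapters below; not a LangChain or LlamaIndex API."""

    def retrieve(self, query: str) -> list[Any]:
        """Read phase: embed the query, walk the typed graph, return scored documents."""
        ...

    def write_back(self, context_ids: list[int], output_text: str) -> None:
        """Update phase: store the model output and link it to its source context nodes."""
        ...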

LangChain Integration

The Retriever

from typing import Any, List, Optional

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

from feather_db import DB

class FeatherRetriever(BaseRetriever):
    db: Any
    embed_fn: Any
    k: int = 5
    hops: int = 2
    edge_types: Optional[list] = None

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Read phase: embed the query and walk the typed graph from the top-k seeds.
        query_vec = self.embed_fn(query)
        chain = self.db.context_chain(
            query_vec=query_vec,
            k=self.k,
            hops=self.hops,
            edge_types=self.edge_types,
        )
        # Surface the engine's provenance (score, hop, edge type) as document metadata
        # so the write-back callback can link outputs to their source nodes.
        return [
            Document(
                page_content=node.metadata["text"],
                metadata={
                    "node_id": node.id,
                    "score": node.score,
                    "hop": node.hop,
                    "edge_type": node.edge_type,
                },
            )
            for node in chain.nodes
        ]
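
Because BaseRetriever is a Runnable in LangChain 0.3+, the adapter can be smoke-tested on its own before it goes into a chain. The snippet below assumes db and embed_fn are set up as in the wiring section further down; the query string is only an example:

docs = FeatherRetriever(db=db, embed_fn=embed_fn, k=5, hops=2).invoke(
    "what is the current brand-x strategy?"
)
for d in docs:
    print(d.metadata["node_id"], round(d.metadata["score"], 3), d.metadata["edge_type"])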

The Write-Back Callback

LangChain doesn't have a native "write retrieved-content provenance back to the store" hook. The cleanest pattern is a callback on the chain output:

from langchain_core.callbacks import BaseCallbackHandler

class ContextLoopCallback(BaseCallbackHandler):
    def __init__(self, db, embed_fn, retriever):
        self.db = db
        self.embed_fn = embed_fn
        self.retriever = retriever
        self.last_context_ids = []

    def on_retriever_end(self, documents, **kwargs):
        # Remember which nodes were handed to the LLM on this invocation.
        self.last_context_ids = [d.metadata["node_id"] for d in documents]

    def on_chain_end(self, outputs, **kwargs):
        # Output keys differ by chain type: RetrievalQA uses "result",
        # the inner LLMChain uses "text".
        text = outputs.get("result") or outputs.get("output") or outputs.get("text")
        if not text or not self.last_context_ids:
            return
        # Update phase: store the model output as a new node in the engine...
        out_vec = self.embed_fn(text)
        out_id = self.db.next_id()
        self.db.add(
            id=out_id, vec=out_vec, modality="text",
            metadata={"text": text, "kind": "agent_output"},
        )
        # ...and link it back to every context node it was derived from.
        for src_id in self.last_context_ids:
            self.db.link(src_id, out_id, edge_type="derived_from")
        self.db.save()
        # Clear the buffer so nested chain-end events don't double-write.
        self.last_context_ids = []

Wiring It Up

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

db = DB.open("agent.feather", dim=1536)
# Any embedding function works, as long as it matches the dimension above.
embed_fn = OpenAIEmbeddings(model="text-embedding-3-small").embed_query

retriever = FeatherRetriever(db=db, embed_fn=embed_fn, k=5, hops=2)
callback = ContextLoopCallback(db, embed_fn, retriever)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-5"),
    retriever=retriever,
    callbacks=[callback],
)

answer = qa.invoke({"query": "draft a strategy response to brand-x"})

Every chain invocation now runs the full four-phase loop: read, reason, update (via the callback), and decay (applied silently by the engine, with future retrievals reinforcing the nodes that stay useful).

LlamaIndex Integration

The Retriever

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode

class FeatherIndexRetriever(BaseRetriever):
    def __init__(self, db, embed_fn, k=5, hops=2, edge_types=None):
        self.db = db
        self.embed_fn = embed_fn
        self.k = k
        self.hops = hops
        self.edge_types = edge_types
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        # Read phase: embed the query text and walk the typed graph from the top-k seeds.
        query_vec = self.embed_fn(query_bundle.query_str)
        chain = self.db.context_chain(
            query_vec=query_vec,
            k=self.k,
            hops=self.hops,
            edge_types=self.edge_types,
        )
        # Map engine nodes onto LlamaIndex nodes, keeping hop and edge type as
        # metadata and the engine's adaptive score as the node score.
        return [
            NodeWithScore(
                node=TextNode(
                    text=n.metadata["text"],
                    id_=str(n.id),
                    metadata={"hop": n.hop, "edge_type": n.edge_type},
                ),
                score=n.score,
            )
            for n in chain.nodes
        ]
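
As on the LangChain side, the adapter is easy to exercise directly: BaseRetriever exposes a public retrieve() that wraps _retrieve. Again, db and embed_fn are assumed to exist as in the wiring sketch below:

nodes = FeatherIndexRetriever(db=db, embed_fn=embed_fn, k=5, hops=2).retrieve(
    "what is the current brand-x strategy?"
)
for n in nodes:
    print(n.node.id_, round(n.score, 3), n.node.metadata["edge_type"])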

The Write-Back

LlamaIndex has its own callback system, but the cleanest place for the write-back is the query engine itself: subclass RetrieverQueryEngine and override _query (and _aquery for async use):

from llama_index.core.query_engine import RetrieverQueryEngine

class FeatherQueryEngine(RetrieverQueryEngine):
    def __init__(self, retriever, response_synthesizer, db, embed_fn, **kwargs):
        super().__init__(retriever=retriever, response_synthesizer=response_synthesizer, **kwargs)
        self.db = db
        self.embed_fn = embed_fn

    def _query(self, query_bundle):
        # Let the stock engine retrieve and synthesize as usual.
        response = super()._query(query_bundle)
        text = str(response)
        # The retriever stored engine node ids as string id_ values; recover them here.
        source_ids = [int(n.node.id_) for n in response.source_nodes]
        # Update phase: store the response and link it to its source nodes.
        out_vec = self.embed_fn(text)
        out_id = self.db.next_id()
        self.db.add(
            id=out_id, vec=out_vec, modality="text",
            metadata={"text": text, "kind": "agent_output"},
        )
        for src_id in source_ids:
            self.db.link(src_id, out_id, edge_type="derived_from")
        self.db.save()
        # An async deployment would override _aquery the same way.
        return response
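
Wiring It Up

The wiring mirrors the LangChain example. The engine file, dimension, and embedding model are the same assumptions as before, and the LLM name is only an example; any synthesizer/LLM pair LlamaIndex supports will work:

from llama_index.core import get_response_synthesizer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from feather_db import DB

db = DB.open("agent.feather", dim=1536)
embed_fn = OpenAIEmbedding(model="text-embedding-3-small").get_text_embedding

retriever = FeatherIndexRetriever(db=db, embed_fn=embed_fn, k=5, hops=2)
engine = FeatherQueryEngine(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(llm=OpenAI(model="gpt-5")),
    db=db,
    embed_fn=embed_fn,
)

response = engine.query("draft a strategy response to brand-x")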

Where the Two Frameworks Differ

  • LangChain exposes the loop via callbacks. The write-back is async-friendly but a bit invisible — easy to forget which callback fires when.
  • LlamaIndex encourages subclassing the query engine. The write-back is right next to the retrieval call — more visible and easier to reason about.

Neither framework currently treats "memory that gets written back" as a first-class abstraction. Both can host a Living Context Engine — but the integration is bring-your-own.

The Stack You End Up With

┌────────────────────────────────────────────┐
│  LangChain / LlamaIndex orchestration      │
├────────────────────────────────────────────┤
│  FeatherRetriever  (read)                  │
│  ContextLoopCallback / FeatherQueryEngine  │
│      (reason + update + decay)             │
├────────────────────────────────────────────┤
│  Feather DB engine                         │
│  (HNSW + typed graph + decay kernel)       │
├────────────────────────────────────────────┤
│  agent.feather  (single file)              │
└────────────────────────────────────────────┘

Recommended Practice

Keep one .feather file per agent or per tenant. The orchestration framework lives at the application level; the engine file lives next to the agent's checkpoint. Both LangChain and LlamaIndex have memory-per-session abstractions you can map onto this — wire each session to its own file and isolation comes for free.
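
A minimal sketch of that layout, assuming session IDs are already safe to use as file names and DB.open behaves as in the examples above; the cache only avoids reopening the same file on every request:

from pathlib import Path

from feather_db import DB

_engines: dict[str, DB] = {}

def engine_for_session(session_id: str, root: Path = Path("agents")) -> DB:
    # One .feather file per session/tenant keeps memories isolated on disk.
    if session_id not in _engines:
        root.mkdir(parents=True, exist_ok=True)
        _engines[session_id] = DB.open(str(root / f"{session_id}.feather"), dim=1536)
    return _engines[session_id]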


Related: Build one from scratch in Python · Integrations docs.