Chatbot & Agent Memory
The recommended end-to-end pattern for using Hebbrix as the memory layer for a conversational chatbot or an autonomous agent — provisioning, corrections, read-after-write consistency, low-latency search, and current-vs-retired facts.
1. One collection per user
2. Process corrections
3. Poll processing_status
4. Search with fast:true
5. Filter current facts
6. Audit superseded
7. Scoped chat
Core principles
One collection per user
The collection is the isolation boundary — every read is hard-scoped to it. Keep one per user (or per agent / namespace).
Writes durable on return; indexing async
/memories/process returns immediately. A memory becomes searchable once processing_status == "completed" (~2–8s). Poll it for read-after-write.
conflict_status == "none" is the current truth
Corrections retire the old fact (conflict_status = "superseded"). The default listing hides retired facts; search excludes them.
superseded_by_id traces the chain
A superseded memory points at the memory that replaced it. Use include_superseded=true for audit views only.
fast:true for latency-sensitive search
Skips the neural reranker (fusion-only ranking): ~350–450ms and consistent. Normal search adds reranking quality but its p95 can include occasional ~2–3s outliers.
Scope chat with collection_id
Memory chat without a scope searches the whole account and is slow. Always pass collection_id. learning:true adds no response-time cost — extraction runs in the background.
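One way to honor the one-collection-per-user rule is to create the collection once and reuse its id afterwards. The sketch below is an illustration, not part of Hebbrix: `CollectionDirectory` is a hypothetical helper, the in-memory `Map` stands in for your own database, and `createCollection` is assumed to be any async function that creates a collection and resolves to its id.

```javascript
// Hypothetical helper (not a Hebbrix API): maps each user to exactly one collection id.
// The Map is a stand-in for persistent storage; swap it for your own database.
class CollectionDirectory {
  constructor(createCollection) {
    this.createCollection = createCollection; // async (userId) => collectionId
    this.pending = new Map(); // userId -> Promise<collectionId>
  }

  // Returns the stored id, creating the collection on first use only.
  // Caching the promise means concurrent callers share a single create call.
  getOrCreate(userId) {
    if (!this.pending.has(userId)) {
      const p = Promise.resolve()
        .then(() => this.createCollection(userId))
        .catch((err) => {
          this.pending.delete(userId); // allow a later call to retry after a failure
          throw err;
        });
      this.pending.set(userId, p);
    }
    return this.pending.get(userId);
  }
}
```

Caching the promise rather than the resolved id is a deliberate choice: two requests for the same new user that arrive together still create only one collection.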
A reusable client
Node.js 18+ with built-in fetch — no SDK required.
const BASE = "https://api.hebbrix.com/v1";
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

class HebbrixMemory {
  constructor(apiKey) { this.key = apiKey; }

  // Shared request helper: adds auth, JSON-encodes the body, throws on non-2xx.
  async #req(path, { method = "GET", body } = {}) {
    const res = await fetch(BASE + path, {
      method,
      headers: {
        Authorization: `Bearer ${this.key}`,
        ...(body ? { "Content-Type": "application/json" } : {}),
      },
      body: body ? JSON.stringify(body) : undefined,
    });
    const text = await res.text();
    let json = null;
    try {
      json = text ? JSON.parse(text) : null;
    } catch (err) {
      // Non-JSON error body (e.g. a gateway error page): keep the HTTP status visible.
      if (res.ok) throw err;
      json = { raw: text };
    }
    if (!res.ok) {
      // Structured errors carry an error_id you can quote to support.
      throw Object.assign(new Error(`HTTP ${res.status}`), { status: res.status, body: json });
    }
    return json;
  }

  // 1. One collection per user
  createUserCollection(userId) {
    return this.#req("/collections", {
      method: "POST",
      body: { name: `user:${userId}`, metadata: { user_id: userId } },
    }).then((c) => c.id);
  }

  // 2. Process a conversation turn / correction
  process(collectionId, messages) {
    return this.#req("/memories/process", {
      method: "POST",
      body: { collection_id: collectionId, messages },
    });
  }

  getMemory(id) {
    return this.#req(`/memories/${encodeURIComponent(id)}`);
  }

  // 3. Read-after-write: wait until a memory is searchable
  async waitUntilSearchable(memoryId, { maxMs = 15000, intervalMs = 1000 } = {}) {
    const start = Date.now();
    while (Date.now() - start <= maxMs) {
      const m = await this.getMemory(memoryId);
      if (m.processing_status === "completed") return m;
      await sleep(intervalMs);
    }
    return null; // still indexing; safe to retry the search later
  }

  // 4. Low-latency search
  search(collectionId, query, { fast = true, limit = 8, minScore } = {}) {
    return this.#req("/search", {
      method: "POST",
      body: {
        collection_id: collectionId, query, limit, fast,
        ...(minScore !== undefined ? { min_score: minScore } : {}),
      },
    });
  }

  // 6. Audit view: includes retired facts + supersede chain
  listMemories(collectionId, { includeSuperseded = false, limit = 50 } = {}) {
    const q = new URLSearchParams({ collection_id: collectionId, limit: String(limit) });
    if (includeSuperseded) q.set("include_superseded", "true");
    return this.#req(`/memories?${q.toString()}`);
  }

  // 7. Memory-aware chat (OpenAI-compatible)
  chat(collectionId, messages, { model = "gpt-5-nano", learning = false } = {}) {
    return this.#req("/chat/completions", {
      method: "POST",
      body: { model, collection_id: collectionId, messages, features: { memory: true, learning } },
    });
  }
}
End-to-end example
const mem = new HebbrixMemory(process.env.HEBBRIX_API_KEY);
// 1. One collection per user (do this once, then store the id).
const collectionId = await mem.createUserCollection("user-123");
// 2. Process a turn, then a correction.
await mem.process(collectionId, [
  { role: "user", content: "My preferred IDE is Cursor and my timezone is America/New_York." },
  { role: "assistant", content: "Noted." },
]);
const correction = await mem.process(collectionId, [
  { role: "user", content: "Actually, my preferred IDE is VS Code, not Cursor." },
  { role: "assistant", content: "Updated." },
]);
// The UPDATE event's memory_id is the NEW (current) fact.
const updated = correction.events.find((e) => e.event === "UPDATE");
// 3. Read-after-write: wait until the new fact is searchable.
if (updated?.memory_id) await mem.waitUntilSearchable(updated.memory_id);
// 4. Search (fast path) — returns the corrected fact, not the stale one.
const results = await mem.search(collectionId, "what is my preferred IDE");
// results.results -> [{ memory_id, content: "User's preferred IDE is VS Code.", score, conflict_status: "none" }]
// 5. Filter current facts (search already excludes retired facts; this is the
//    rule for any raw-memory reads).
const current = (results.results || []).filter((r) => (r.conflict_status ?? "none") === "none");
// 6. Audit view — see the supersede chain.
const audit = await mem.listMemories(collectionId, { includeSuperseded: true });
// items include the retired fact:
//   { content: "User's preferred IDE is Cursor.", conflict_status: "superseded",
//     superseded_by_id: "<id of the VS Code memory>" }
// 7. Memory-aware chat, scoped to the user's collection.
const reply = await mem.chat(collectionId, [
  { role: "user", content: "Which IDE should I open?" },
], { learning: true });
// reply.choices[0].message.content references the current fact (VS Code).
// reply.memory_context.memories_used -> how many memories grounded the answer.
Read-after-write consistency
Writes are durable on return; vector + keyword indexing is asynchronous (~2–8s). A memory is guaranteed searchable once its processing_status is "completed".
POST /v1/memories/process   // returns immediately (events + ids)
GET  /v1/memories/{id}      // poll until processing_status == "completed"
POST /v1/search             // now reliably reflects the write
If you don't need strict read-after-write (e.g. background ingestion), skip the poll; the memory becomes searchable on its own.
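If fixed-interval polling is too chatty for your workload, one variant is to back off between attempts. This is a minimal sketch that relies only on what is stated above (the memory object exposes `processing_status`, and `"completed"` means searchable). The `getMemory` callback, the function name `pollUntilCompleted`, and the backoff parameters are illustrative assumptions, not Hebbrix settings.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// getMemory: async (id) => memory object with a processing_status field
// (for example, a wrapper around GET /v1/memories/{id}).
// Resolves to the memory once completed, or null if the time budget runs out.
async function pollUntilCompleted(getMemory, memoryId, opts = {}) {
  const { maxMs = 15000, firstDelayMs = 500, factor = 1.5, maxDelayMs = 3000 } = opts;
  const deadline = Date.now() + maxMs;
  let delay = firstDelayMs;
  while (true) {
    const memory = await getMemory(memoryId);
    if (memory && memory.processing_status === "completed") return memory;
    // Stop if the next sleep would overshoot the budget; the caller can retry later.
    if (Date.now() + delay > deadline) return null;
    await wait(delay);
    delay = Math.min(delay * factor, maxDelayMs); // grow the interval, capped
  }
}
```

A `null` result means only that the budget ran out, so a later search may still succeed.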
Current vs. retired facts
- conflict_status == "none" — the current fact.
- conflict_status == "superseded" / "duplicate" — retired.
- GET /v1/memories (default) and /v1/search exclude retired facts. GET /v1/memories?include_superseded=true includes them, each with superseded_by_id pointing at the replacement.
- superseded_by_id is populated in normal correction flows; it can be null only if the replacement memory was itself later deleted. Treat conflict_status as the authoritative "is this retired?" signal.
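For an audit view, the rules above can be turned into a small chain walker that starts from any memory and follows superseded_by_id until it reaches a current fact. This is a sketch under assumptions: `resolveCurrent` is a hypothetical helper, and it assumes each listed item exposes its own identifier as `id`, which the text above does not confirm.

```javascript
// memories: items from an audit listing (include_superseded=true).
// Assumption: each item carries its own identifier in an `id` field.
function resolveCurrent(memories, startId) {
  const byId = new Map(memories.map((m) => [m.id, m]));
  const seen = new Set();
  let node = byId.get(startId) ?? null;
  // conflict_status is the authoritative signal; a missing value is treated as "none".
  while (node && (node.conflict_status ?? "none") !== "none") {
    if (seen.has(node.id)) return null; // defensive: malformed cyclic chain
    seen.add(node.id);
    // superseded_by_id can be null if the replacement was later deleted.
    node = node.superseded_by_id ? (byId.get(node.superseded_by_id) ?? null) : null;
  }
  return node; // the current fact, or null when the chain cannot be followed
}
```

A `null` result covers every case where the chain breaks: an unknown start id, a retired memory without a replacement pointer, or a replacement that is not in the listing.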
Latency expectations (warm, scoped)
| Endpoint | p50 | p95 | Notes |
|---|---|---|---|
| POST /v1/search (fast:true) | ~350–450ms | ~450ms | Fusion-only ranking; consistent. Use for latency-sensitive paths. |
| POST /v1/search (normal) | ~400ms | up to ~2–3s | Adds a neural reranker; p95 includes occasional outliers. |
| POST /v1/memories/process | ~5s | ~7s | LLM fact extraction / conflict resolution. |
| POST /v1/search/reason | ~2.7s | ~3.6s | Multi-step grounded reasoning (LLM). |
| POST /v1/chat/completions (memory, scoped) | ~2s | ~4.6s | learning:true adds no response-time cost. |
Figures are warm calls. The first request after idle or a deploy pays a one-time cold-start — send a lightweight warm-up request if you need a predictable first call. Memory-enabled chat without a collection_id searches the whole account and is much slower; always scope it.
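To compare these figures with your own deployment, you can time a handful of warm calls and compute percentiles yourself. The sketch below is illustrative: `percentile` and `measure` are hypothetical helpers, the nearest-rank percentile method is an arbitrary choice, and `call` is any async function you supply (for instance, a scoped search).

```javascript
// Nearest-rank percentile over a list of durations in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Runs `warmup` untimed calls first (to absorb any cold start),
// then times `runs` sequential calls and summarizes them.
async function measure(call, { runs = 20, warmup = 1 } = {}) {
  for (let i = 0; i < warmup; i++) await call();
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await call();
    samples.push(performance.now() - t0);
  }
  return { p50: percentile(samples, 50), p95: percentile(samples, 95), samples };
}
```

With only 20 samples the p95 is a rough indicator; raise `runs` if you want to see the occasional outliers described in the table.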
Agent (policy / correction memory)
The same pattern works for an autonomous agent — store policies and corrections, and retrieve grounded policy before acting.
// Store a policy / correction
await mem.process(agentCollectionId, [
  { role: "user", content: "For customer ACME, refunds above $250 require human approval." },
  { role: "assistant", content: "Recorded." },
]);
// Before acting, retrieve grounded policy (fast path)
const policy = await mem.search(agentCollectionId, "refund approval threshold for ACME", { fast: true });
// For a grounded natural-language decision, use reasoning search
const decision = await fetch("https://api.hebbrix.com/v1/search/reason", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.HEBBRIX_API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    collection_id: agentCollectionId,
    query: "Can I approve a $275 refund for ACME without a human?",
  }),
}).then((r) => {
  // Surface non-2xx responses instead of parsing an error body as a decision.
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
});
// decision.answer -> grounded answer citing the policy; non-2xx on internal failure.
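Because the reasoning call can fail, an agent that acts on policy may want to fail closed, treating any error or empty answer as a reason to escalate. The following is a sketch of that design choice, not a Hebbrix feature: `decideOrEscalate` and the shape of its return value are hypothetical, and `reason` is assumed to be an async function that wraps POST /v1/search/reason and rejects on non-2xx responses.

```javascript
// reason: async (query) => { answer, ... }, rejecting on non-2xx responses.
// Any failure or empty answer is treated as "escalate to a human".
async function decideOrEscalate(reason, query) {
  try {
    const decision = await reason(query);
    if (decision && typeof decision.answer === "string" && decision.answer.trim() !== "") {
      return { escalate: false, answer: decision.answer };
    }
    return { escalate: true, why: "no grounded answer returned" };
  } catch (err) {
    return { escalate: true, why: `reasoning call failed: ${err.message}` };
  }
}
```

The returned answer is still natural language, so whether the agent may act on it remains a decision for your own code. The wrapper only guarantees that a failed lookup never reads as approval.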