
Claude-Mem: Never Lose Context in Claude Code (Setup Tutorial)

Every Claude Code session starts fresh with zero memory of your previous work. You waste time re-explaining project structure, repeating architectural decisions, and rebuilding context that existed hours ago. Claude-Mem solves this completely: a persistent memory plugin that automatically captures everything Claude does, compresses it with AI, and injects relevant context into future sessions. Trending #3 on GitHub with 58,295 stars (+2,305 today), it eliminates context loss through intelligent 3-tier memory retrieval that achieves 10x token savings compared to dumping full history.

This tutorial shows you how Claude-Mem works under the hood, how to set it up correctly (avoiding common pitfalls), and how to use its MCP search tools effectively for maximum token efficiency.

How Claude-Mem Works: 3-Tier Architecture Explained

Claude-Mem operates through 5 lifecycle hooks that intercept Claude Code at critical moments. SessionStart injects context from your previous 10 sessions automatically. UserPromptSubmit captures your prompts. PostToolUse generates observations after every tool execution—file reads, writes, command runs. Stop triggers the worker service to compress observations using the Claude Agent SDK. SessionEnd produces summaries for future reference.
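The hook lifecycle can be sketched as a dispatch table. The hook names below are the real ones from the plugin; the handler bodies are illustrative stand-ins describing what each hook does, not the plugin's actual code:

```typescript
// Sketch of claude-mem's five lifecycle hooks as a dispatch table.
// Hook names are real; handler bodies are descriptive placeholders.
type HookEvent =
  | "SessionStart"
  | "UserPromptSubmit"
  | "PostToolUse"
  | "Stop"
  | "SessionEnd";

const hooks: Record<HookEvent, (payload: string) => string> = {
  SessionStart: () => "inject summaries from previous 10 sessions",
  UserPromptSubmit: (prompt) => `capture prompt: ${prompt}`,
  PostToolUse: (tool) => `record observation for tool: ${tool}`,
  Stop: () => "signal worker to compress pending observations",
  SessionEnd: () => "write session summary for future injection",
};

function dispatch(event: HookEvent, payload = ""): string {
  return hooks[event](payload);
}
```

The key design point: every hook is fire-and-forget from Claude Code's perspective, with the heavy compression work deferred to the worker service at Stop time.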

All data lives locally in a SQLite database with full-text search (FTS5) combined with a Chroma vector database for hybrid semantic + keyword search. This means you can query project history naturally: “What authentication changes happened last week?” The system finds relevant observations without dumping your entire history into context.
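To see what "hybrid semantic + keyword search" means in practice, here is a toy sketch that blends a keyword match score with vector cosine similarity. The real plugin delegates these to SQLite FTS5 and Chroma rather than computing them by hand; the 50/50 weighting below is an assumption for illustration:

```typescript
// Toy hybrid retrieval: blend a keyword (FTS-style) score with a
// semantic (vector-similarity) score and rank documents by the sum.
interface Doc { id: number; text: string; embedding: number[] }

// Fraction of query terms that appear in the document text.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/);
  const words = text.toLowerCase();
  return terms.filter((t) => words.includes(t)).length / terms.length;
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function hybridRank(query: string, qVec: number[], docs: Doc[]): Doc[] {
  const score = (d: Doc) =>
    0.5 * keywordScore(query, d.text) + 0.5 * cosine(qVec, d.embedding);
  return [...docs].sort((x, y) => score(y) - score(x));
}

const docs: Doc[] = [
  { id: 1, text: "fixed JWT auth bug", embedding: [1, 0] },
  { id: 2, text: "updated CSS styles", embedding: [0, 1] },
];
const ranked = hybridRank("auth fix", [1, 0], docs); // auth doc ranks first
```

The practical upshot: a query like "authentication changes" can match observations that never contain the literal word "authentication," because the vector side catches semantic neighbors the keyword side misses.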

The breakthrough is the progressive disclosure pattern—a 3-tier retrieval system that prevents token waste. Layer 1 uses the search tool to get a compact index with observation IDs (50-100 tokens per result). You skim these to identify interesting patterns. Layer 2 uses the timeline tool to see chronological context around those IDs. Layer 3 uses get_observations to fetch full details only for filtered IDs (500-1,000 tokens per result).

This workflow achieves roughly 10x token savings compared to fetching everything upfront. Instead of loading 50 observations at 500 tokens each (25,000 tokens), you might search 50 results at 75 tokens each (3,750 tokens), identify 3 relevant ones, and fetch just those (1,500 tokens)—total 5,250 tokens. In this example that's roughly an 80% reduction.
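The arithmetic checks out in a few lines. The per-result token costs are the article's estimates, not measured values:

```typescript
// Back-of-the-envelope check of the progressive-disclosure token math.
const naive = 50 * 500;            // fetch all 50 observations in full
const tiered = 50 * 75 + 3 * 500;  // compact index, then 3 full fetches
const savingsPct = Math.round((1 - tiered / naive) * 100);
// naive = 25000, tiered = 5250, savingsPct = 79
```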

The architecture is deliberate: bounded memory that forces consolidation produces better understanding than infinite append-only logs. Human memory works similarly—we compress experiences into generalizations rather than recording verbatim transcripts.

Installation Tutorial: Step-by-Step Setup

Installation takes one command. Run this:

npx claude-mem install

The installer downloads prebuilt binaries, installs dependencies (Bun for the worker service, uv for vector search, SQLite for storage), configures plugin hooks for session lifecycle management, and auto-starts the worker on your first session. Verify success by opening http://localhost:37777 in your browser—you should see the web viewer UI showing real-time memory streams.

Do NOT use npm install -g claude-mem. That installs only the SDK library without registering plugin hooks or setting up the worker service. This is the most common installation mistake documented in GitHub Issues.

System requirements: Node.js 18.0.0 or higher, latest Claude Code with plugin support. Bun and SQLite install automatically if missing.

Alternative installation via plugin marketplace:

/plugin install claude-mem

Then restart Claude Code. Settings live in ~/.claude-mem/settings.json (auto-created on first run). Configuration options include AI model selection (default: haiku, the cheapest Claude option), worker port (default: 37777), data directory, log level, and 11 context injection settings.
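A minimal ~/.claude-mem/settings.json might look like the following. The haiku default and port 37777 come from the documentation above, but the exact field names here are assumptions for illustration:

```json
{
  "model": "haiku",
  "workerPort": 37777,
  "dataDir": "~/.claude-mem",
  "logLevel": "info"
}
```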

Common pitfalls and fixes:

Port conflict: If port 37777 is already in use by another service, export a custom port before installing:

export CLAUDE_MEM_WORKER_PORT=38888
npx claude-mem install
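Internally, port resolution presumably amounts to something like this sketch. The CLAUDE_MEM_WORKER_PORT variable name comes from the fix above; the fallback logic is an assumption:

```typescript
// Resolve the worker port: honor CLAUDE_MEM_WORKER_PORT if it's a valid
// port number, otherwise fall back to the default 37777.
function workerPort(env: Record<string, string | undefined>): number {
  const raw = env["CLAUDE_MEM_WORKER_PORT"];
  const port = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : 37777;
}
```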

macOS health check failures: GitHub Issue #420 documents cases where the worker starts but doesn’t respond to health checks. Fix: Verify Node.js is in your PATH. Run which node to confirm.

chromadb/Python errors during setup: Ensure npm is in your PATH and Node.js is properly installed. Download the latest installer from nodejs.org if needed.

Performance impact: GitHub Issue #1766 reports that claude-mem can significantly slow down Claude Code responses. This is a documented trade-off—memory persistence costs speed. If performance is critical, this tool may not be for you.

Real-World Usage: MCP Search Tools and Web Viewer

After installation, Claude-Mem works automatically with zero manual intervention. Every tool execution generates an observation. Observations compress into summaries. Summaries inject into future sessions. You don’t need to do anything unless you want to query project history explicitly.

MCP Search Tools workflow follows the 3-tier pattern:

  1. Start broad: Ask Claude to search for relevant observations. “Search for authentication changes from last week.” Claude uses the search tool internally, gets observation IDs, and shows you a compact index.
  2. Review timeline: If needed, Claude uses the timeline tool to show chronological context around interesting results. This helps you understand “what happened when.”
  3. Fetch selectively: Claude uses get_observations to fetch full details only for filtered IDs. You get complete information without loading irrelevant observations.

Avoid this mistake: Don’t fetch all observations upfront. That causes token limit errors and defeats the purpose of progressive disclosure. Always filter first, fetch second.
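The three tiers above can be mocked end-to-end in a few lines. The tool names match the plugin's MCP tools; the data shapes and signatures are simplified assumptions:

```typescript
// In-memory stand-ins for the three MCP tools in the progressive
// disclosure workflow: search -> timeline -> get_observations.
interface Obs { id: number; ts: number; title: string; detail: string }

const store: Obs[] = [
  { id: 1, ts: 100, title: "auth: add JWT middleware", detail: "full diff and reasoning" },
  { id: 2, ts: 200, title: "ui: tweak button color", detail: "full diff and reasoning" },
  { id: 3, ts: 300, title: "auth: rotate signing key", detail: "full diff and reasoning" },
];

// Tier 1 (search): compact index of IDs and titles, cheap to load.
function search(query: string): { id: number; title: string }[] {
  return store
    .filter((o) => o.title.includes(query))
    .map(({ id, title }) => ({ id, title }));
}

// Tier 2 (timeline): chronological context, still without full detail.
function timeline(): string[] {
  return [...store].sort((a, b) => a.ts - b.ts).map((o) => `${o.ts}: ${o.title}`);
}

// Tier 3 (get_observations): full detail, only for IDs that survived filtering.
function getObservations(ids: number[]): Obs[] {
  return store.filter((o) => ids.includes(o.id));
}

const hits = search("auth");                         // 2 compact results
const full = getObservations(hits.map((h) => h.id)); // 2 full records, not all 3
```

Notice that the expensive detail field is only ever touched in tier 3, and only for the IDs that survived the cheap tier-1 filter.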

Web viewer UI at http://localhost:37777 provides real-time visualization of your memory stream. You see observations being captured, compression happening live, and session summaries being generated. This is surprisingly useful for debugging memory issues—if you expected something to be remembered but it wasn’t, check the web viewer to see if it was captured correctly.

Privacy controls use simple tags to exclude sensitive data:

const apiKey = "<private>sk-1234567890abcdef</private>";

Content inside <private> tags is never captured or compressed. Use this for API keys, passwords, tokens, or any sensitive information you don’t want persisted.
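Conceptually, the redaction step works like this sketch. The <private> tag syntax is real, but the replacement behavior shown is an assumption about how such filtering works, not the plugin's exact code:

```typescript
// Redact anything between <private> tags before an observation
// would be stored. Non-greedy match handles multiple tags per string.
function stripPrivate(text: string): string {
  return text.replace(/<private>[\s\S]*?<\/private>/g, "[redacted]");
}

const cleaned = stripPrivate('const apiKey = "<private>sk-1234567890abcdef</private>";');
// cleaned === 'const apiKey = "[redacted]";'
```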

Auto-generated CLAUDE.md files (v9.0.0 feature) appear in your project folders automatically. These files contain human-readable activity timelines summarizing session history. They support git worktrees and have configurable observation limits. Think of them as persistent context files that update themselves.

Performance, Cost, and Trade-Offs

Token efficiency: The 3-tier retrieval system delivers roughly 10x token savings, but only if you use it correctly. Search first (50-100 tokens/result for compact IDs), filter to relevant observations, then fetch full details (500-1,000 tokens/result). If you skip filtering and dump everything, you lose the efficiency gain.

Cost: Claude-Mem uses the haiku model by default for compression—the cheapest option in Claude’s lineup. Compression happens automatically after every tool call, so there’s a small ongoing cost, but it’s minimal. No separate API key required; it uses your existing Claude Code authentication. The plugin is free and open-source under the GNU AGPL-3.0 license, which means you can use it freely but must make source code available if you deploy it on network servers.

Performance trade-off: GitHub Issue #1766 documents real user reports of claude-mem causing significant speed impact. Claude Code responds more slowly when the plugin is active. This is a documented trade-off: memory persistence costs performance. If you’re working on performance-critical workflows where every second matters, this tool may not be worth it.

Storage growth: Long-term projects accumulate large observation databases. SQLite handles this well, but search quality may degrade over time as the database grows. No documented mitigation yet beyond manual pruning.

When NOT to use Claude-Mem:

  • One-off scripts or tutorials: If you’re not going to have repeated sessions on the same project, there’s nothing to remember. Setup overhead isn’t worth it.
  • Performance-critical workflows: Documented speed impact makes this unsuitable for workflows where every second counts.
  • Extreme privacy requirements: Even with <private> tags, some developers prefer zero local storage. If you’re working with highly sensitive codebases, evaluate whether local SQLite storage meets your security posture.
  • Learning and experimentation: Adds complexity to setup. If you’re just trying Claude Code for the first time, start without plugins.

ROI threshold: You need roughly 3-5 sessions on the same project before Claude-Mem’s value becomes positive. Initial setup takes 5-10 minutes, and the first few sessions build up the memory database. After that, context injection starts saving you time.

When to Use Claude-Mem: Decision Framework

Best use cases where Claude-Mem shines:

Long-running projects (weeks or months): Context compounds over time. After 20 sessions, Claude-Mem knows your architectural patterns, coding standards, and past decisions without you explaining anything.

Debugging sessions: Claude-Mem remembers past bug fixes, error patterns, and solutions. When a similar issue appears, it references the previous fix automatically.

Code reviews: Apply consistent standards across pull requests. Claude-Mem recalls past review comments and applies the same principles to new code.

Multi-project context switching: Switch between projects without losing context. Each project maintains separate memory. Return to an old project after weeks away, and Claude-Mem instantly re-immerses you.

Team knowledge sharing: Query project history naturally. New team members ask, “Why did we implement authentication this way?” and Claude-Mem retrieves the reasoning from past sessions.

Comparison with alternatives:

  • Manual context management (copy-paste every session): High effort, high token cost, incomplete coverage, no searchability. Claude-Mem automates this completely.
  • CLAUDE.md files (static instructions): Good for team guidelines and project rules, but static. Claude-Mem is dynamic and session-aware.
  • Claude’s native memory (Opus 4+ context compaction): Useful within a single session but doesn’t persist across restarts. Claude-Mem provides true cross-session persistence.
  • Git commit history: Shows “what changed” but misses “why.” Claude-Mem captures reasoning, discussions, and decisions that never make it into commit messages.

Pick Claude-Mem when the compounding learning effect justifies initial setup friction. After 5 sessions, you’ll spend less time re-explaining and more time building.

Key Takeaways

  • Claude-Mem is a persistent memory plugin for Claude Code that automatically captures tool usage, compresses it with AI, and injects relevant context into future sessions, solving the universal frustration of starting every session with zero memory.
  • The 3-tier architecture (search → timeline → get_observations) achieves 10x token savings by filtering before fetching expensive full details, but only if you follow the progressive disclosure workflow correctly.
  • Installation takes one command (npx claude-mem install), but avoid common pitfalls: don’t use npm install -g (installs SDK only), check localhost:37777 to verify the worker is running, and export a custom port if 37777 conflicts.
  • Real-world usage is automatic after setup: observations capture automatically, MCP search tools query history naturally, <private> tags exclude sensitive data, and the web viewer at localhost:37777 debugs memory issues.
  • Performance trade-offs exist: documented speed impact (GitHub Issue #1766), storage growth over time, and 3-5 session ROI threshold—use for long-running projects, debugging, and code reviews, not one-off scripts or performance-critical workflows.
ByteBot