# Context Engine
The context engine enhances your AI interactions through three mechanisms: conversation compression, cross-profile context sharing, and local RAG (Retrieval-Augmented Generation).
## Conversation Compression

When conversations grow beyond a token threshold, Claudex uses an LLM to summarize older messages, keeping recent ones intact.
```toml
[context.compression]
enabled = true
threshold_tokens = 50000             # compress when total tokens exceed this
keep_recent = 10                     # always keep the last N messages
profile = "openrouter"               # reuse a profile's base_url + api_key
model = "qwen/qwen-2.5-7b-instruct"  # override model (optional)
```

### How It Works

- Before forwarding a request, Claudex estimates the total token count.
- If the total exceeds `threshold_tokens`, older messages (beyond `keep_recent`) are replaced with a summary.
- The summary is generated by the configured local LLM.
- The compressed conversation is then forwarded to the provider.
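The steps above can be sketched in a few lines. This is a minimal illustration, not Claudex's actual implementation: `estimate_tokens` uses a rough ~4-characters-per-token heuristic, and `summarize` stands in for the call to the configured LLM profile and model.

```python
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption,
    # not Claudex's real tokenizer).
    return sum(len(m["content"]) for m in messages) // 4

def compress(messages, threshold_tokens=50000, keep_recent=10, summarize=None):
    """Replace older messages with one summary when over the threshold."""
    if estimate_tokens(messages) <= threshold_tokens:
        return messages  # under budget: forward unchanged
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # stand-in for the configured LLM call
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

The compressed list keeps the last `keep_recent` messages verbatim, so the provider always sees recent turns exactly as written.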
## Cross-Profile Sharing

Share context across different provider profiles within the same session.

```toml
[context.sharing]
enabled = true
max_context_size = 2000  # max tokens to inject from other profiles
```

This is useful when switching between providers mid-task: relevant context from previous interactions is automatically included.
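One way to picture the sharing step, as a hedged sketch: the session history and the ~4-characters-per-token estimate below are assumptions for illustration, not Claudex's actual data model. Messages from other profiles are gathered newest-first until the `max_context_size` budget runs out.

```python
def build_shared_context(session_history, current_profile, max_context_size=2000):
    """Collect recent messages from other profiles within a token budget.

    session_history: list of (profile_name, message_text) pairs, oldest first.
    """
    budget = max_context_size
    shared = []
    # Walk newest-first so the most recent cross-profile context is kept.
    for profile, message in reversed(session_history):
        if profile == current_profile:
            continue  # only inject context from *other* profiles
        cost = len(message) // 4  # rough ~4 chars/token estimate (assumption)
        if cost > budget:
            break
        shared.append(f"[{profile}] {message}")
        budget -= cost
    return "\n".join(reversed(shared))  # restore chronological order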
## Local RAG

Index local code and documentation for retrieval-augmented generation. Relevant code snippets are automatically injected into requests.
```toml
[context.rag]
enabled = true
index_paths = ["./src", "./docs"]        # directories to index
profile = "openrouter"                   # reuse a profile's base_url + api_key
model = "openai/text-embedding-3-small"  # embedding model
chunk_size = 512                         # text chunk size
top_k = 5                                # number of results to inject
```

### How It Works

- On startup, Claudex indexes files in `index_paths` using the embedding model.
- For each request, the user's message is embedded and compared against the index.
- The top-k most relevant chunks are injected as additional context in the request.
- The provider receives richer context about your codebase.
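The retrieval step above can be sketched with plain cosine similarity. This is an illustrative assumption about the internals: `embed` stands in for the configured embedding model, and the in-memory list of chunks stands in for whatever index Claudex actually builds.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, embed, top_k=5):
    """Return the text of the top_k chunks most similar to the query.

    index: list of {"text": str, "vector": list[float]} chunks.
    embed: callable mapping text to an embedding vector.
    """
    q = embed(query)
    scored = sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)
    return [c["text"] for c in scored[:top_k]]
```

The returned chunk texts are what gets injected as additional context before the request is forwarded.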