Automatic Learning
Hebbrix learns and improves completely automatically. No thumbs up/down required. It just works.
Zero User Effort
Unlike other systems that pester you for feedback, Hebbrix uses automatic quality-based ranking with reinforcement learning. It evaluates its own answers, detects hallucinations, and improves retrieval from every interaction, all behind the scenes.
How It Learns Automatically
Every time Hebbrix answers a question, it runs 6 automatic quality checks. Every time an agent gets corrected, the correction is stored permanently. No human labeling, no thumbs up/down — fully automatic.
Self-Consistency Check
Generates an answer to the same question multiple times. If the answers agree → high confidence → positive reward.
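As a sketch, self-consistency reduces to sampling and agreement counting; the `generate` callable and the reward mapping here are illustrative, not Hebbrix's internals:

```python
from collections import Counter

def self_consistency_reward(generate, query, n_samples=5):
    """Sample answers to the same query; high agreement -> positive reward."""
    answers = [generate(query) for _ in range(n_samples)]
    # How many samples match the most common answer?
    top_count = Counter(answers).most_common(1)[0][1]
    agreement = top_count / n_samples
    # Map agreement onto a reward in roughly [-1, 1].
    return 2 * agreement - 1
```

With a deterministic generator every sample matches and the reward is 1.0; the reward drops toward -1 as samples disagree.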
Bidirectional NLI Grounding
Uses bidirectional Natural Language Inference to verify answers are grounded in actual memories. Checks in both directions to accurately detect contradictions, duplicates, and updates — 92% accuracy on complex memory pairs.
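The bidirectional check can be sketched as two NLI calls whose labels are combined. Here `nli(premise, hypothesis)` is a placeholder for any NLI model that returns `'entailment'`, `'contradiction'`, or `'neutral'`, and the label-combination rules are illustrative:

```python
def classify_pair(nli, memory, answer):
    """Combine NLI labels from both directions to classify a memory/answer pair."""
    fwd = nli(memory, answer)  # does the memory support the answer?
    bwd = nli(answer, memory)  # does the answer support the memory?
    if "contradiction" in (fwd, bwd):
        return "contradiction"
    if fwd == "entailment" and bwd == "entailment":
        return "duplicate"      # each statement implies the other
    if fwd == "entailment":
        return "grounded"       # memory fully supports the answer
    if bwd == "entailment":
        return "update"         # answer goes beyond what the memory states
    return "ungrounded"
```

A single-direction check would miss the duplicate/update distinction entirely; that is why both directions are needed.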
LLM-as-Judge
The AI evaluates its own answer quality: accuracy, completeness, and grounding in facts.
Answer Quality Heuristics
Checks length, specificity, and confidence. Contains numbers/dates? Good. Vague hedging? Bad.
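A toy version of those heuristics, where the thresholds, hedge list, and weights are assumptions for illustration:

```python
import re

HEDGES = ("i'm not sure", "it might", "possibly", "hard to say")

def heuristic_score(answer: str) -> float:
    """Score an answer on length, specificity, and confidence, in [-1, 1]."""
    score = 0.0
    if 20 <= len(answer) <= 2000:                 # reasonable length
        score += 0.3
    if re.search(r"\d", answer):                  # numbers/dates read as specific
        score += 0.4
    if any(h in answer.lower() for h in HEDGES):  # vague hedging is penalized
        score -= 0.5
    return max(-1.0, min(1.0, score))
```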
Memory Attribution
Tracks which memories actually contributed to the answer. Good memories get rewarded and surface more often. Unhelpful memories naturally fade.
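In spirit, attribution is counter bookkeeping on the memories the judge flags. The `access_count` field name comes from the docs below; the rest of the structure is illustrative:

```python
def apply_attribution(memories, contributing_ids, reward):
    """Bump access_count on memories that helped produce a rewarded answer."""
    for mem in memories:
        if mem["id"] in contributing_ids and reward > 0:
            mem["access_count"] += 1  # rewarded memories surface earlier next time
        # memories that never contribute are simply not bumped, so they fade
    return memories
```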
Retrieval Quality
Measures how well retrieved memories matched the query across vector similarity, keyword matching, and knowledge graph traversal.
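One simple way to combine the three signals is a weighted blend; the weights here are illustrative, not Hebbrix's actual values:

```python
def retrieval_quality(vector_sim, keyword_overlap, graph_hits,
                      weights=(0.5, 0.3, 0.2)):
    """Blend vector, keyword, and graph signals (each in [0, 1]) into one score."""
    w_vec, w_kw, w_graph = weights
    return w_vec * vector_sim + w_kw * keyword_overlap + w_graph * graph_hits
```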
Agent Autonomy Features
Three new APIs that let AI agents stop asking users repetitive questions:
Confidence Scoring
Agents call GET /confidence?query=... before acting to check "do I know enough to act without asking the user?" Returns a score and recommendation: act autonomously, proceed with caution, or ask the user.
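The decision logic behind that endpoint can be sketched as threshold checks; the 0.8/0.5 cutoffs and the recommendation strings are assumptions, not documented values:

```python
def recommend(confidence: float) -> str:
    """Map a confidence score to one of the three recommendations."""
    if confidence >= 0.8:        # illustrative threshold
        return "act_autonomously"
    if confidence >= 0.5:        # illustrative threshold
        return "proceed_with_caution"
    return "ask_the_user"
```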
Correction Memory
When a user corrects an agent, it calls POST /corrections. The correction is stored permanently. Next time any agent handles a similar situation, it checks corrections first — so the same mistake never happens twice.
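A minimal sketch of that check-before-acting flow, with an in-memory list standing in for the POST /corrections store and naive word overlap standing in for similarity search:

```python
corrections = []  # stand-in for the server-side correction store

def store_correction(situation, wrong, right):
    corrections.append({"situation": situation, "wrong": wrong, "right": right})

def check_corrections(situation):
    """Return past corrections whose situation overlaps the current one."""
    words = set(situation.lower().split())
    return [c for c in corrections
            if words & set(c["situation"].lower().split())]
```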
Decision Outcome Logging
Agents log every decision and its outcome via POST /decisions. Success? Failure? User satisfied? This feeds into the RL training pipeline so the system learns which decisions lead to good outcomes.
What Happens Behind the Scenes
1. User asks a question
2. Hebbrix searches memories and generates an answer
3. Automatic reward calculator evaluates answer quality (-1 to +1)
4. LLM-as-Judge determines which memories contributed
5. Useful memories get their access_count increased
6. Next time, better memories appear first!
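The loop above can be sketched end to end; every callable here is a stand-in for the corresponding Hebbrix component:

```python
def answer_with_learning(query, search, generate, reward_fn, judge):
    """One pass of the feedback loop (all callables are stand-ins)."""
    memories = search(query)                     # step 2: retrieve
    answer = generate(query, memories)           # step 2: generate
    reward = reward_fn(query, answer, memories)  # step 3: score in [-1, +1]
    contributing = judge(answer, memories)       # step 4: attribute
    if reward > 0:
        for mem in contributing:
            mem["access_count"] += 1             # step 5: reinforce
    return answer                                # step 6: better ranking next time
```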
Safety & Quality Controls
Constitutional Constraints
Hard safety rules the RL system cannot override: never delete frequently-accessed memories, never delete recent memories, rate-limit destructive operations. These protect your data regardless of what the policy learns.
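Those hard rules amount to guard clauses checked before any destructive action; the thresholds below are illustrative:

```python
def allow_delete(memory, recent_days=7, hot_threshold=10):
    """Constitutional check the learned policy cannot override."""
    if memory["access_count"] >= hot_threshold:
        return False  # never delete frequently-accessed memories
    if memory["age_days"] < recent_days:
        return False  # never delete recent memories
    return True       # rate limiting would wrap calls like this one
```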
Data Quality Gates
Every training batch is validated before it reaches the policy: reward range checks, variance checks (prevents training collapse), and distribution shift detection. Garbage data never corrupts the learning.
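The first two gates are easy to sketch; distribution-shift detection would need a reference batch and is omitted here, and the variance floor is an assumption:

```python
from statistics import pvariance

def validate_batch(rewards, min_variance=1e-4):
    """Reject a training batch before it reaches the policy."""
    if any(not -1.0 <= r <= 1.0 for r in rewards):
        return False, "reward out of range"
    if pvariance(rewards) < min_variance:
        return False, "variance too low (training-collapse risk)"
    return True, "ok"
```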
Reward Health Monitoring
Continuous monitoring detects entropy collapse (policy stopped exploring) and reward gaming (agent exploiting the reward function). Training is automatically paused if anomalies are detected.
Per-Request RL Controls
RL features are opt-out per request. Set use_rl_ranking: false or use_rl_reranking: false in your search request to get pure retrieval without any RL influence.
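For example, a search request body with both flags disabled might look like this; the flag names come from the docs above, while the `query` field is an assumption about the request shape:

```python
payload = {
    "query": "deployment checklist",  # illustrative query field
    "use_rl_ranking": False,          # skip RL-learned ranking
    "use_rl_reranking": False,        # skip RL-based reranking
}
```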
Research-Backed
Our approach implements techniques from cutting-edge 2024-2026 research:
- Memory-R1 (arXiv:2508.19828) — GRPO for memory-based RL
- Mem-alpha (arXiv:2509.25911) — Composite reward for memory construction
- RLSR (2025) — Self-Reward for LLMs
- Self-RAG (ICLR 2024) — Self-Reflective RAG
- DeepSeek-R1 (Nature 2025) — GRPO training methodology
- DAPO (arXiv:2503.14476) — Fixes for GRPO training stability
The Bottom Line
You don't need to do anything. Just use Hebbrix normally, and it gets smarter every day. The system learns from its own performance, not from annoying feedback prompts.
