RL Policies
Think of RL policies as different versions of your AI's brain: as the system learns, it creates improved versions that you can test and promote to production.
What You Can Do
Each policy is a snapshot of what the AI learned at a specific time. You can:
- See which version is currently running
- Compare different versions to see which performs better
- Promote a better version to production
- Roll back to a previous version if needed
Safe Deployment Pipeline
New policies go through a 3-stage pipeline before reaching production:
1. Shadow Mode
The new RL policy runs alongside the existing system. Both make decisions, but only the existing system's decisions are used. The new policy's decisions are logged for comparison.
2. Canary Release
The new policy's share of traffic increases gradually: 5% → 10% → 25% → 50% → 100%. An automatic rollback triggers if the success rate drops below 95% or latency increases by more than 20%.
3. Production
The policy serves all traffic. The previous version is kept for instant rollback. Constitutional constraints remain active regardless of which policy is running.
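The canary stage described above can be sketched as a small gate function. This is a minimal illustration, not Hebbrix's implementation: the names `CANARY_STEPS`, `Metrics`, `should_rollback`, and `next_traffic_share` are hypothetical, and only the traffic steps and the two thresholds (95% success rate, 20% latency increase) come from the pipeline description. It assumes the latency increase is measured relative to the baseline policy.

```python
from dataclasses import dataclass
from typing import Optional

# Traffic steps and thresholds taken from the pipeline description.
CANARY_STEPS = [5, 10, 25, 50, 100]  # percent of traffic
MIN_SUCCESS_RATE = 0.95
MAX_LATENCY_INCREASE = 0.20  # relative to the baseline policy (assumption)


@dataclass
class Metrics:
    success_rate: float  # fraction of successful requests, 0.0 to 1.0
    latency_ms: float    # which latency statistic to use is an assumption


def should_rollback(baseline: Metrics, canary: Metrics) -> bool:
    """Return True if the canary violates either rollback threshold."""
    if canary.success_rate < MIN_SUCCESS_RATE:
        return True
    latency_limit = baseline.latency_ms * (1 + MAX_LATENCY_INCREASE)
    return canary.latency_ms > latency_limit


def next_traffic_share(current: int, baseline: Metrics, canary: Metrics) -> Optional[int]:
    """Return the next traffic percentage, or None to signal a rollback.

    `current` must be one of CANARY_STEPS. At 100% the share stays at 100%.
    """
    if should_rollback(baseline, canary):
        return None
    index = CANARY_STEPS.index(current)
    return CANARY_STEPS[min(index + 1, len(CANARY_STEPS) - 1)]
```

With a baseline of 98% success and 100 ms latency, a canary at 5% traffic with 97% success and 110 ms latency would advance to 10%, while a canary at 125 ms (a 25% increase) would return `None` and trigger a rollback.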
When Do You Need This?
Testing New Improvements
Your AI just trained on thousands of new interactions. Before making the new policy live for everyone, check its performance: shadow mode lets you compare it against the existing system without risk.
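To make the idea of shadow mode concrete, here is a minimal sketch of the pattern. It is not Hebbrix's internal code: `run_in_shadow`, `agreement_rate`, and the log entry fields are hypothetical names chosen for illustration. The key property is that only the production policy's decision is returned, while the candidate's decision is logged for later comparison.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Policy = Callable[[Request], Any]


def run_in_shadow(
    request: Request,
    production_policy: Policy,
    candidate_policy: Policy,
    shadow_log: List[Dict[str, Any]],
) -> Any:
    """Serve the production decision and record the candidate's decision."""
    served = production_policy(request)
    try:
        shadow = candidate_policy(request)
        error = None
    except Exception as exc:  # a failing candidate must never affect the caller
        shadow = None
        error = repr(exc)
    shadow_log.append({
        "request": request,
        "served": served,
        "shadow": shadow,
        "agreed": error is None and shadow == served,
        "error": error,
    })
    return served


def agreement_rate(shadow_log: List[Dict[str, Any]]) -> float:
    """Fraction of logged requests where both policies made the same decision."""
    if not shadow_log:
        return 0.0
    return sum(entry["agreed"] for entry in shadow_log) / len(shadow_log)
```

Agreement rate is only one possible comparison. In practice you would likely also compare quality metrics for the two sets of decisions.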
A/B Testing
Run experiments by comparing two policy versions side-by-side. See which one gives better search results, more accurate answers, or happier users.
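One common way to split users between two versions in an A/B test is deterministic hashing, so that the same user always sees the same policy. The sketch below illustrates that general technique. It is an assumption, not a description of how Hebbrix assigns traffic, and `assign_policy` and its parameters are hypothetical.

```python
import hashlib


def assign_policy(
    user_id: str,
    baseline: str,
    candidate: str,
    candidate_share: float = 0.5,
) -> str:
    """Deterministically map a user to the baseline or candidate version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return candidate if bucket < candidate_share else baseline
```

Because the assignment depends only on the user ID, repeated requests from one user land in the same group, which keeps per-user metrics comparable across the experiment.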
Quick Rollbacks
If a new version isn't working as expected, you can instantly roll back to the previous stable version with no downtime. Rollback is automatic if metrics degrade.
Per-Request Control
Opt out of RL influence on any request with use_rl_ranking: false. Get pure retrieval results without any policy influence when you need it.
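As a rough illustration, a request body that opts out of RL ranking might be built like this. Only the `use_rl_ranking` field comes from the text above. The `query` field and the `build_search_body` helper are hypothetical, so check the API reference for the actual request shape.

```python
import json


def build_search_body(query: str, use_rl_ranking: bool = True) -> str:
    """Serialize a search request body; `query` is a hypothetical field name."""
    payload = {"query": query, "use_rl_ranking": use_rl_ranking}
    return json.dumps(payload)


body = build_search_body("quarterly report", use_rl_ranking=False)
print(body)  # {"query": "quarterly report", "use_rl_ranking": false}
```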
Code Examples
Check Policy Status
from hebbrix import Hebbrix
client = Hebbrix()
# Get current policy status
status = client.policies.get_status()
print(f"Current production version: {status['production_version']}")
print(f"Active since: {status['activated_at']}")
print(f"Performance score: {status['performance_score']}")
# List all available policy versions
policies = client.policies.list()
for policy in policies:
    is_active = "ACTIVE" if policy['is_production'] else "AVAILABLE"
    print(f"Version: {policy['version']} - {is_active}")
Promote a New Policy
# Compare two policy versions first
comparison = client.policies.compare(
    baseline_version="v1.2.4",   # Current production
    candidate_version="v1.2.5",  # New candidate
    agent_type="memory_manager"
)
print(f"Improvement: {comparison['is_improvement']}")
print(f"Recommendation: {comparison['recommendation']}")
# If new version is better, promote it
if comparison['is_improvement']:
    result = client.policies.promote(version="v1.2.5", agent_type="memory_manager")
    print("Promoted v1.2.5 to production!")
else:
    print("New version doesn't perform better. Keeping current version.")
Emergency Rollback
# Rollback to previous version
result = client.policies.rollback()
print(f"Rolled back to version: {result['version']}")
print(f"This was the production version from: {result['previous_activated_at']}")
Best Practices
Always Compare First
Never promote a new policy without comparing it to the current one.
Monitor After Promotion
Watch for changes in error rates or performance degradation after promoting.
Keep Previous Versions
Don't delete old versions immediately. Keep at least 2-3 stable versions for rollback.
Have a Rollback Plan
Know how to roll back quickly. Save the rollback command somewhere accessible.
Don't Worry, It's Automatic!
Hebbrix trains a new RL policy whenever training is triggered, either on a 6-hour schedule (when enough interaction data exists) or on demand via POST /v1/policies/train. Each completed training run produces a candidate policy. Your job is to decide when to promote a better-performing candidate to production.
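For reference, an on-demand training request could be constructed with the standard library as sketched below. Only the method and path (POST /v1/policies/train) come from the text above. The base URL is a placeholder, and the bearer-token header and empty JSON body are assumptions, so consult the API reference for the real host, authentication scheme, and parameters.

```python
import json
import urllib.request


def build_train_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the training endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/policies/train",
        data=json.dumps({}).encode("utf-8"),  # empty body is an assumption
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and key; replace with your real values.
request = build_train_request("https://api.example.com", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```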
