Event Bus Architecture
This page provides a comprehensive deep dive into OmniDaemon’s event bus architecture, focusing on the Redis Streams implementation - the current production-ready backend.What is the Event Bus?
The event bus is the message broker that delivers events from publishers to subscribers (agents). Think of it as the “nervous system” of your AI agent infrastructure. Key Responsibilities:- 📨 Receive events from publishers
- 🚀 Deliver events to appropriate subscribers
- 💾 Persist messages for durability
- 🔄 Handle retries when processing fails
- ⚖️ Load balance across multiple agent instances
- 💀 Manage DLQ for failed messages
- 📊 Track metrics for observability
Redis Streams Implementation
Overview
Redis Streams is a data structure that acts as an append-only log with powerful consumption semantics. It’s perfect for event-driven architectures. Why Redis Streams?- ✅ Durable - Messages persisted to disk
- ✅ Reliable - At-least-once delivery guaranteed
- ✅ Fast - Sub-millisecond latency
- ✅ Scalable - Handles millions of messages/second
- ✅ Consumer Groups - Built-in load balancing
- ✅ Reclaiming - Automatic failure recovery
- ✅ Simple - No separate broker to manage
Architecture Diagram
Core Components
1. Redis Stream
A Redis Stream is an append-only log where each entry has: Structure:default_maxlen: 10,000 messages (approximate)reclaim_interval: 30 secondsdefault_reclaim_idle_ms: 180,000 ms (3 minutes)default_dlq_retry_limit: 3 retries
2. Consumer Groups
A consumer group is a set of consumers that process messages from the same stream. Redis ensures each message is delivered to only ONE consumer in the group. Creation:- ✅ Load Balancing - Messages distributed across consumers
- ✅ Fault Tolerance - If consumer dies, another picks up
- ✅ At-Least-Once - Messages not lost even if consumer crashes
- ✅ Position Tracking - Group remembers last processed message
3. Message Publishing
Publishing Process:-
Event Created:
-
Serialized to JSON:
-
Added to Stream:
maxlen: Maximum stream length (older messages trimmed)approximate: If True, trimming is faster but less precise (~)
4. Message Consumption
Consumption Loop:groupname: Consumer group nameconsumername: Individual consumer name (unique within group)streams: Dict of{stream_name: start_id}">"= Only NEW messages (not pending)"0"= All messages from beginning
count: Max messages per read (batch size)block: Milliseconds to block waiting (0 = don’t block)
5. Pending Entries List (PEL)
The Pending Entries List tracks messages that have been delivered to consumers but not yet acknowledged. Why PEL?- ✅ Failure Recovery - Know which messages weren’t processed
- ✅ Reclaiming - Reassign stuck messages to other consumers
- ✅ Monitoring - See what’s in-flight
6. Message Reclaiming
If a consumer crashes or takes too long, its messages are reclaimed by another consumer. Reclaim Loop:reclaim_idle_ms: How long before reclaiming (default: 180,000 ms = 3 min)reclaim_interval: How often to check (default: 30 seconds)dlq_retry_limit: Max retries before DLQ (default: 3)
7. Dead Letter Queue (DLQ)
Messages that fail repeatedly go to the Dead Letter Queue for manual inspection. DLQ Stream:8. Acknowledgment (XACK)
Acknowledging a message tells Redis it was successfully processed and can be removed from the PEL. When to Acknowledge:- ✅ Removes message from PEL
- ✅ Message no longer redelivered
- ✅ Group’s “last delivered ID” advances
9. Metrics Tracking
All events are tracked in a dedicated metrics stream. Metrics Stream:Redis Commands Used
XADD - Add Message to Stream
- Stream name
*= Auto-generate message ID- Field-value pairs
1678901234567-0)
XREADGROUP - Read from Group
GROUPgroup consumerCOUNT- Max messagesBLOCK- Timeout (ms)STREAMS- Stream and start ID (>= new only)
XACK - Acknowledge Message
- Stream name
- Group name
- Message ID(s)
XPENDING - Get Pending Messages
- Stream name
- Group name
- Start ID (
-= beginning) - End ID (
+= end) - Count
XCLAIM - Reclaim Message
- Stream name
- Group name
- New consumer name
- Min idle time (ms)
- Message ID(s)
XGROUP CREATE - Create Consumer Group
CREATEstream group start-idMKSTREAM- Create stream if doesn’t exist
XGROUP DESTROY - Delete Consumer Group
XINFO GROUPS - List Consumer Groups
Performance Characteristics
Throughput
Single Redis Instance:- Publishing: ~100,000 messages/second
- Consuming: ~50,000 messages/second per consumer
- With Persistence: ~50,000 messages/second
- Publishing: >1,000,000 messages/second
- Consuming: Scales linearly with consumers
Latency
End-to-End Latency:- Publish to Deliver: <10ms (typical)
- Publish to Process: <50ms (typical)
- With Network: <100ms (typical)
- XADD: <1ms
- XREADGROUP: <1ms
- Network: 1-10ms
- Callback execution: Variable (depends on your agent)
Memory Usage
Per Message:- ~1 KB average (depends on payload size)
- Minimal (<10% of message size)
- Stored in memory
- ~100 bytes per pending message
Persistence
AOF (Append-Only File):- Every write logged to disk
- Slower but safer (no data loss)
- Recommended for production
- Periodic snapshots
- Faster but risk of data loss
- OK for development
Monitoring Event Bus
Via CLI
Via SDK
Via Redis CLI
Advanced Configuration
Custom Consumer Count
- 5 consumers in the group
- Load distributed across all 5
- Higher throughput
Custom Reclaim Settings
Custom Retry Limit
Custom Stream Max Length
Best Practices
1. Choose Appropriate reclaim_idle_ms
2. Set Appropriate max_retries
3. Monitor DLQ
4. Configure Persistence
5. Scale with Consumer Count
Troubleshooting
Messages Not Being Delivered
Messages Stuck in Pending
High DLQ Count
Slow Processing
Further Reading
- Event-Driven Architecture - Why EDA for AI agents
- Agent Lifecycle - Registration to deletion
- Pluggable Architecture - Switch backends
- Redis Streams Documentation - Official Redis docs
Summary
Key Components:- Redis Stream - Append-only log for messages
- Consumer Group - Load balancing and fault tolerance
- Pending Entries List (PEL) - Track unacknowledged messages
- Message Reclaiming - Automatic failure recovery
- Dead Letter Queue (DLQ) - Failed message storage
XADD- Publish messageXREADGROUP- Consume messageXACK- Acknowledge messageXCLAIM- Reclaim messageXPENDING- Check pending messages
default_maxlen: 10,000 messagesreclaim_interval: 30 secondsdefault_reclaim_idle_ms: 180,000 ms (3 min)default_dlq_retry_limit: 3 retries
- Throughput: ~100K msgs/sec (single instance)
- Latency: <10ms (typical)
- Memory: ~1KB per message