Event System Dashboard

Last updated: March 25, 2026

What is the Event System Dashboard?

The Event System Dashboard provides a real-time overview of Sampo's event-driven architecture, in which different parts of the application communicate by emitting and reacting to events rather than by calling each other's APIs directly.

Why Events Matter

Think of events like a notification system:

  • When a new listing is created, an event is emitted
  • When a user updates their profile, an event is emitted
  • When a submission is converted to a listing, an event is emitted

These events trigger automated actions like:

  • Sending email notifications
  • Updating search indexes
  • Logging audit trails
  • Syncing data to external systems
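
To make the emit-and-react pattern concrete, here is a minimal in-process sketch. It assumes a simple synchronous design; EventBus, subscribe, and emit are hypothetical names, not Sampo's actual API. It also counts totals, failures, and per-event processing time, which are the raw inputs behind the dashboard metrics described below.

```python
import time
from collections import defaultdict
from typing import Any, Callable

# Hypothetical handler signature: receives the event payload, returns nothing.
Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Hypothetical in-process event bus, not Sampo's actual implementation."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.total = 0                        # events emitted
        self.failed = 0                       # events where at least one handler raised
        self.latencies_ms: list[float] = []   # processing time per event

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to run whenever `event_type` is emitted."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict[str, Any]) -> None:
        """Run every handler for `event_type` and record the outcome."""
        self.total += 1
        start = time.perf_counter()
        event_failed = False
        for handler in self._handlers[event_type]:
            try:
                handler(payload)
            except Exception:
                # A real system would route this event to a dead letter queue.
                event_failed = True
        if event_failed:
            self.failed += 1
        self.latencies_ms.append((time.perf_counter() - start) * 1000)


# Usage: one event fans out to several automated actions.
sent_emails: list[str] = []
indexed_ids: list[int] = []

bus = EventBus()
bus.subscribe("listing.created", lambda e: sent_emails.append(f"New listing {e['id']}"))
bus.subscribe("listing.created", lambda e: indexed_ids.append(e["id"]))
bus.emit("listing.created", {"id": 42})
print(sent_emails, indexed_ids)  # ['New listing 42'] [42]
```

The emitter does not know which actions will run; adding a new reaction (say, an audit log) only means subscribing another handler.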

Dashboard Overview

┌─────────────────────────────────────────────────────────────┐
│                   EVENT SYSTEM DASHBOARD                    │
├─────────────────────────────────────────────────────────────┤
│  [Stats Cards]       [Event Rate Chart]  [Latency Gauge]    │
│  • Total Events      • Events/min over   • P50/P95/P99      │
│  • Error Rate          time                processing times │
│  • Avg Latency                                              │
├─────────────────────────────────────────────────────────────┤
│  [Top Event Types]           [Recent Failed Events]         │
│  • Most frequent events      • Errors needing attention     │
│  • Volume breakdown          • Links to Dead Letter Queue   │
└─────────────────────────────────────────────────────────────┘

Understanding the Metrics

Stats Cards (Top Row)

Total Events

  • What it shows: Total number of events emitted in the current time window (currently the last hour)
  • Normal range: Varies by deployment activity (100-10,000+ per hour)
  • When to worry: A sudden drop to near zero usually indicates a system issue
  • Color (green/yellow/red): Based on comparison with historical averages

Error Rate

  • What it shows: Percentage of events that failed processing
  • Formula: (Failed Events / Total Events) × 100
  • Normal: < 1%
  • Warning: 1-5% (yellow)
  • Critical: > 5% (red) - investigate immediately
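
The formula and color bands can be expressed as a small helper. This is a sketch: the function names are hypothetical, and because the bands as written share their endpoints, treating exactly 1% and exactly 5% as yellow is an assumption.

```python
def error_rate(failed: int, total: int) -> float:
    """(Failed Events / Total Events) x 100, or 0.0 when no events were emitted."""
    if total == 0:
        return 0.0
    return failed / total * 100


def error_rate_color(rate_pct: float) -> str:
    """Map an error rate (in %) to the dashboard's color bands."""
    if rate_pct < 1:
        return "green"   # Normal: < 1%
    if rate_pct <= 5:
        return "yellow"  # Warning: 1-5% (both endpoints treated as yellow: an assumption)
    return "red"         # Critical: > 5%


rate = error_rate(failed=12, total=800)
print(f"{rate:.2f}% -> {error_rate_color(rate)}")  # 1.50% -> yellow
```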

Common causes of high error rate:

  • Database connection issues
  • External API failures (email service, QuickBooks, etc.)
  • Memory exhaustion
  • Code bugs in event handlers

Average Latency

  • What it shows: Average time to process an event (in milliseconds)
  • Normal: < 100ms for most events
  • Warning: 100-500ms
  • Critical: > 500ms

High latency usually points to:

  • Slow database queries
  • External API delays
  • High system load
  • Inefficient event handlers

Event Rate Chart

What it shows: Events per minute over the last hour

How to read it:

  • Steady line: Normal operation
  • Spikes: High activity (e.g., bulk imports, marketing campaigns)
  • Drops: Potential issues or low activity periods
  • Pattern recognition: Look for daily/weekly patterns (this chart covers only the last hour, so use the Event Metrics page for longer ranges)

Example scenarios:

  • Morning spike at 9 AM: Users starting work
  • Weekend dips: Lower business activity
  • Sudden flatline: Possible system outage
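
The chart's data can be pictured as per-minute counts over a 60-minute window. The sketch below assumes you have a list of event timestamps; events_per_minute and trailing_zero_minutes are hypothetical helpers, and the second one measures how long the chart has been flat at its right-hand edge.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_per_minute(timestamps: list[datetime], now: datetime,
                      window_minutes: int = 60) -> list[int]:
    """Count events in each one-minute bucket of the last `window_minutes` minutes.

    Index 0 is the oldest minute; the last index is the current (partial) minute.
    """
    start = now.replace(second=0, microsecond=0) - timedelta(minutes=window_minutes - 1)
    counts: Counter = Counter()
    for ts in timestamps:
        if start <= ts <= now:
            counts[int((ts - start).total_seconds() // 60)] += 1
    return [counts[i] for i in range(window_minutes)]


def trailing_zero_minutes(buckets: list[int]) -> int:
    """Number of consecutive empty minutes at the end of the chart (a flatline)."""
    run = 0
    for count in reversed(buckets):
        if count:
            break
        run += 1
    return run


now = datetime(2026, 3, 25, 10, 0, 30)
# One event per minute from 09:01 to 09:50, then nothing.
timestamps = [now - timedelta(minutes=m) for m in range(10, 60)]
buckets = events_per_minute(timestamps, now)
print(trailing_zero_minutes(buckets))  # 10
```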

Latency Gauge

What it shows: Processing time percentiles

  • P50 (50th percentile): Half of events process faster than this
  • P95 (95th percentile): 95% of events process faster than this
  • P99 (99th percentile): 99% of events process faster than this

Why percentiles matter:

  • Average (mean) can be misleading: a few very slow events can pull it far above the typical value
  • P95/P99 show the slow tail, i.e. the near-worst-case experience
  • If P99 is 5 seconds, about 1% of events take 5 seconds or longer
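
A short example shows why percentiles tell you more than the mean. It uses the nearest-rank percentile definition; whether the dashboard uses this method or an interpolating one is an assumption, and percentile is a hypothetical helper.

```python
import math
import statistics


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 98 fast events and 2 very slow ones, in milliseconds.
latencies = [20.0] * 98 + [5000.0] * 2

print(statistics.mean(latencies))  # 119.6  (looks "moderately slow")
print(percentile(latencies, 50))   # 20.0   (the typical event is fast)
print(percentile(latencies, 95))   # 20.0
print(percentile(latencies, 99))   # 5000.0 (the slow tail shows up)
```

Neither the mean nor P50 alone reveals that some events take 5 seconds; P99 does.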

Target values:

  • P50: < 50ms
  • P95: < 200ms
  • P99: < 500ms

Top Event Types

What it shows: The 5 most frequently emitted events

Example:

1. listing.created        - 1,234 events (45%)
2. user.updated           - 567 events (21%)
3. submission.converted   - 234 events (9%)
4. order.created          - 123 events (5%)
5. email.sent             - 89 events (3%)

How to use this:

  • Identify high-volume events that might need optimization
  • Spot unusual patterns (e.g., 10x the normal volume of user.login events may indicate an attack)
  • Plan capacity based on event volume
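
Ranking event types and spotting an unusual jump (such as 10x normal volume) takes only a counter. In this sketch, top_event_types and volume_spikes are hypothetical helpers, and the baseline counts stand in for whatever historical comparison the dashboard actually uses.

```python
from collections import Counter


def top_event_types(event_types: list[str], n: int = 5) -> list[tuple[str, int, float]]:
    """Return (event type, count, share of all events in %) for the n most frequent types."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return [(name, count, round(count * 100 / total, 1))
            for name, count in counts.most_common(n)]


def volume_spikes(current: Counter, baseline: Counter, factor: float = 10.0) -> list[str]:
    """Event types whose current count is at least `factor` times their baseline.

    Types with no baseline are skipped here; a real check might flag them separately.
    """
    return [name for name, count in current.items()
            if baseline[name] > 0 and count >= factor * baseline[name]]


events = ["listing.created"] * 6 + ["user.updated"] * 3 + ["email.sent"]
print(top_event_types(events, n=2))
# [('listing.created', 6, 60.0), ('user.updated', 3, 30.0)]

baseline = Counter({"user.login": 50, "listing.created": 40})
current = Counter({"user.login": 600, "listing.created": 45})
print(volume_spikes(current, baseline))  # ['user.login']
```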

Recent Failed Events

What it shows: Last 10 events that failed to process

Columns:

  • Event Type: What kind of event failed
  • Error: Brief error message
  • Time: When it failed
  • Actions: Quick links to retry or view details

When to act:

  • Any failed event should be investigated
  • Multiple failures of the same type suggest a systematic issue
  • Click "View in Dead Letter Queue" for detailed analysis
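
Keeping only the last 10 failures is a natural fit for a bounded queue, and counting failures by type makes repeated (systematic) problems stand out. FailedEvent, recent_failures, and repeated_failure_types are hypothetical names; the dashboard's real storage is not described here.

```python
from collections import Counter, deque
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class FailedEvent:
    event_type: str
    error: str
    time: datetime


# Holds at most 10 entries; appending an 11th silently drops the oldest.
recent_failures: deque = deque(maxlen=10)


def repeated_failure_types(failures: Iterable[FailedEvent], min_count: int = 2) -> dict[str, int]:
    """Event types that failed at least `min_count` times (a likely systematic issue)."""
    counts = Counter(f.event_type for f in failures)
    return {event_type: n for event_type, n in counts.items() if n >= min_count}


for i in range(12):
    kind = "order.created" if i % 3 == 0 else "email.sent"
    recent_failures.append(FailedEvent(kind, "timeout", datetime(2026, 3, 25, 9, i)))

print(len(recent_failures))                     # 10
print(repeated_failure_types(recent_failures))  # {'email.sent': 7, 'order.created': 3}
```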

How to Use This Page

Daily Health Check (2 minutes)

  1. Check Error Rate - Should be < 1%
  2. Check Latency - P95 should be < 200ms
  3. Scan Recent Failures - Any new failures?
  4. Review Top Events - Any unusual volumes?
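
If you want to script the checklist, each step reduces to a simple check. This sketch assumes the inputs (error rate, P95 latency, new failure count, and a list of event types with unusual volume) have already been read from the dashboard; daily_health_check is a hypothetical helper.

```python
def daily_health_check(error_rate_pct: float, p95_ms: float,
                       new_failures: int, unusual_types: list[str]) -> list[str]:
    """Return the checklist items that need attention; an empty list means all clear."""
    issues = []
    if error_rate_pct >= 1:        # step 1: should be < 1%
        issues.append(f"error rate {error_rate_pct:.1f}% (target < 1%)")
    if p95_ms >= 200:              # step 2: P95 should be < 200ms
        issues.append(f"P95 latency {p95_ms:.0f}ms (target < 200ms)")
    if new_failures > 0:           # step 3: any new failures?
        issues.append(f"{new_failures} new failed event(s)")
    if unusual_types:              # step 4: any unusual volumes?
        issues.append("unusual volume: " + ", ".join(unusual_types))
    return issues


print(daily_health_check(0.4, 150, 0, []))  # []
print(daily_health_check(2.0, 240, 3, ["user.login"]))
```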

When Investigating Issues

Scenario 1: Users report slow performance

  1. Check Latency Gauge - are P95/P99 high?
  2. Check Event Rate Chart - spike in volume?
  3. Check Top Event Types - is one event type driving the load?
  4. Click through to Event Metrics for detailed analysis

Scenario 2: Error alert triggered

  1. Note the error rate percentage
  2. Check Recent Failed Events section
  3. Click "View in Dead Letter Queue" for full details
  4. Identify pattern (same event type? same error message?)
  5. Retry failed events after fixing the root cause

Scenario 3: Unusual activity detected

  1. Check Event Rate Chart for spikes
  2. Review Top Event Types for unexpected volumes
  3. Compare to historical patterns
  4. Investigate source (marketing campaign? bot traffic?)

Common Questions

Q: Why are some events showing as "failed"?

A: Events fail when:

  • Database is temporarily unavailable
  • External API (email, QuickBooks) times out
  • Event handler has a bug
  • System is under heavy load

Fix: Resolve the underlying issue first, then retry the events from the Dead Letter Queue.

Q: What does "P95 latency" mean?

A: 95% of events process faster than this time. If P95 is 200ms, then 95 out of 100 events complete in under 200ms.

Q: Why is my error rate 0% but users are complaining?

A: Check the Latency Gauge. Events may be succeeding but taking too long to process: the problem is slow performance rather than failures.

Q: Can I see individual events?

A: This dashboard shows aggregated statistics. For individual events with trace IDs and payloads, use the Event Flow page (Phase 3 will add database persistence for full event history).

Q: How far back does the data go?

A: The dashboard currently shows the last hour of in-memory statistics. For historical trends, use the Event Metrics page with longer time ranges.
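
An in-memory, one-hour view like this is commonly kept as a sliding window that discards samples as they age out, which is also why older data is not available here. The sketch below assumes that design; SlidingWindowStats is hypothetical and may not match how Sampo stores its statistics.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowStats:
    """Keep (timestamp, latency_ms, failed) samples for the last `window_s` seconds.

    Hypothetical sketch: samples older than the window are evicted, so nothing
    older than the window can ever be reported.
    """

    def __init__(self, window_s: float = 3600.0) -> None:
        self.window_s = window_s
        self._samples: deque = deque()  # oldest sample on the left

    def record(self, latency_ms: float, failed: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._samples.append((now, latency_ms, failed))
        self._evict(now)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def summary(self, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        self._evict(now)
        total = len(self._samples)
        failed = sum(1 for _, _, f in self._samples if f)
        latency_sum = sum(latency for _, latency, _ in self._samples)
        return {
            "total": total,
            "error_rate_pct": failed * 100 / total if total else 0.0,
            "avg_latency_ms": latency_sum / total if total else 0.0,
        }


stats = SlidingWindowStats(window_s=3600)
stats.record(40.0, failed=False, now=0.0)
stats.record(900.0, failed=True, now=1800.0)
stats.record(60.0, failed=False, now=4000.0)  # the t=0 sample is now older than 1 hour
print(stats.summary(now=4000.0))
# {'total': 2, 'error_rate_pct': 50.0, 'avg_latency_ms': 480.0}
```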


Best Practices

✅ Do

  • Check this page daily as part of system health monitoring
  • Investigate any error rate above 1%
  • Use latency metrics to identify performance degradation
  • Click through to Dead Letter Queue for failed event details
  • Export metrics before system maintenance for baseline comparison

❌ Don't

  • Ignore yellow/red indicators - they signal real issues
  • Retry failed events without fixing the root cause
  • Assume zero errors means perfect performance (check latency too)
  • Panic over single event failures - look for patterns


Need Help?

If you see:

  • Error rate > 10%: Contact engineering immediately
  • Latency P99 > 5 seconds: System under severe stress
  • Zero events for > 5 minutes: Potential system outage
  • Same error repeating: Check Dead Letter Queue for details
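
The escalation thresholds above can be encoded as a simple triage function. This is a sketch with a hypothetical escalation_alerts helper; it covers the three numeric thresholds, while repeated errors are best reviewed in the Dead Letter Queue.

```python
def escalation_alerts(error_rate_pct: float, p99_ms: float,
                      seconds_since_last_event: float) -> list[str]:
    """Return escalation messages for the numeric thresholds listed above."""
    alerts = []
    if error_rate_pct > 10:
        alerts.append("Error rate > 10%: contact engineering immediately")
    if p99_ms > 5000:
        alerts.append("P99 latency > 5 s: system under severe stress")
    if seconds_since_last_event > 5 * 60:
        alerts.append("No events for > 5 minutes: potential system outage")
    return alerts


print(escalation_alerts(error_rate_pct=12.0, p99_ms=800, seconds_since_last_event=30))
# ['Error rate > 10%: contact engineering immediately']
```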
