Dead Letter Queue

Last updated: March 25, 2026

What is the Dead Letter Queue?

The Dead Letter Queue (DLQ) is a holding area for events that failed to process. Think of it as a "problem bin": failed events land here instead of getting lost, so you can review them, fix the underlying issue, and retry them later.

Why Events Fail

Events end up in the DLQ when:

Database is temporarily down - Event can't save data
External API fails - Email service, QuickBooks, or other integrations time out
Code bugs - Event handler crashes due to unexpected data
Network issues - Can't reach required services
Resource exhaustion - System too busy to process

The DLQ Lifecycle

Event Emitted → Processing Attempt → FAILED → Dead Letter Queue
                                                    ↓
              ┌─────────────────────────────────────┼─────────────────┐
              ↓                                     ↓                 ↓
         [RETRY]                              [RESOLVE]          [DISCARD]
              ↓                                     ↓                 ↓
    Try processing again              Mark as handled            Delete permanently
    (after fixing issue)              (won't retry)              (data loss)
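For readers who think in code, the lifecycle can be sketched as a tiny state machine. This is a minimal illustration under assumptions, not this system's implementation: `DLQEntry`, `Status`, `discard`, the field names, and the `handler` callable are all hypothetical, and the sketch assumes a successful retry simply marks the entry as done.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    UNRESOLVED = "unresolved"  # failed, waiting in the queue
    RESOLVED = "resolved"      # manually marked as handled, never retried
    SUCCEEDED = "succeeded"    # a retry processed the event


@dataclass
class DLQEntry:
    # Hypothetical record; the real field names may differ.
    event_type: str
    payload: dict
    retries: int = 0
    status: Status = Status.UNRESOLVED

    def retry(self, handler: Callable[[dict], None]) -> bool:
        """Re-run the handler; return True if processing succeeded."""
        if self.status is not Status.UNRESOLVED:
            raise ValueError("only unresolved entries can be retried")
        self.retries += 1
        try:
            handler(self.payload)
        except Exception:
            return False  # still unresolved; stays in the queue
        self.status = Status.SUCCEEDED
        return True

    def resolve(self) -> None:
        # Keep the record for the audit trail, but never retry it.
        self.status = Status.RESOLVED


def discard(queue: List[DLQEntry], entry: DLQEntry) -> None:
    # Permanent deletion: nothing remains to retry or audit.
    queue.remove(entry)
```

In this sketch a failed retry leaves the entry unresolved with a higher retry count, which is what the Retries column reflects.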

Understanding the Interface

Main Table Columns

Column	What It Means	Example
Event Type	What kind of event failed	`listing.created`, `email.sent`
Error	Why it failed (short version)	`Connection timeout`, `Null pointer`
Listener	Which handler failed	`EmailNotificationListener`
Retries	How many times we tried	`2/3` (2 attempts, 3 max)
Created	When the event originally happened	`Mar 25, 2026 2:30 PM`
Actions	What you can do	Retry, Resolve, Discard

Status Indicators

🔴 Red badge: Failed and unresolved (needs attention)
🟡 Yellow badge: Failed but retry scheduled
🟢 Green badge: Resolved (manually marked as handled)
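The badge colors amount to a simple precedence check. A sketch, assuming each entry exposes a resolved flag and an optional scheduled-retry time; the `Entry` class, the `next_retry_at` field, and the `badge` function are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Entry:
    # Hypothetical fields; the real record may differ.
    resolved: bool
    next_retry_at: Optional[datetime] = None  # set when a retry is scheduled


def badge(entry: Entry) -> str:
    """Map an entry to the badge colors described above."""
    if entry.resolved:
        return "green"   # resolved (manually marked as handled)
    if entry.next_retry_at is not None:
        return "yellow"  # failed, but a retry is scheduled
    return "red"         # failed and unresolved: needs attention
```

In this sketch "resolved" is checked first, so a resolved event shows green even if a retry had been scheduled earlier.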

How to Use This Page

Scenario 1: Email Service Was Down

Problem: The external email API was down for 30 minutes, and 50 email.sent events failed.

Solution:

Confirm the email service is back up
Go to Dead Letter Queue
Filter by Event Type: email.sent
Select all 50 events (checkboxes)
Click "Retry Selected"
Confirm in dialog
Events are re-processed and the emails are sent

Result: ✅ All emails sent successfully

Scenario 2: Database Connection Issue

Problem: The database was restarting, and events of multiple types failed.

Solution:

Check that the database is back online
Go to Dead Letter Queue
Filter by "Unresolved Only"
Click the "Retry All Unresolved" button
Watch the Event System Dashboard for the error rate to drop

Result: ✅ Events process normally now

Scenario 3: Bug in Event Handler

Problem: listing.created events are failing with TypeError: Cannot read property 'id' of null

Solution:

Don't retry yet! It will just fail again
Click the failed event to see the full error
Note the error pattern
Report to engineering with:
- Event type: listing.created
- Error: TypeError: Cannot read property 'id' of null
- Count: 23 events affected
- Time range: Last 2 hours
Wait for the code fix
After the fix is deployed, retry the failed events

Result: ✅ Events process successfully after bug fix
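When many events are affected, the Count and Time range for the engineering report can be tallied rather than eyeballed. A minimal sketch, assuming failures are available as (event type, error, created) records, for example copied from a CSV export; `Failure` and `summarize` are hypothetical helpers, not part of the admin tool:

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Failure(NamedTuple):
    # Hypothetical shape of one failed event.
    event_type: str
    error: str
    created: datetime


def summarize(failures: List[Failure], event_type: str, error_prefix: str) -> Optional[dict]:
    """Collect the fields engineering asks for: type, error, count, time range."""
    matching = [
        f for f in failures
        if f.event_type == event_type and f.error.startswith(error_prefix)
    ]
    if not matching:
        return None
    times = sorted(f.created for f in matching)
    return {
        "event_type": event_type,
        "error": matching[0].error,
        "count": len(matching),
        "first_seen": times[0],
        "last_seen": times[-1],
    }
```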

Scenario 4: One-Time Data Issue

Problem: A single event failed because of bad data that can't be fixed.

Solution:

Review event details (click to expand)
Confirm it's an isolated issue
Click "Resolve" to mark as handled
The event stays in the queue, marked as resolved

Result: ✅ Queue cleaned up, event won't retry

Action Buttons Explained

🔁 Retry

What it does: Attempts to process the event again

When to use:

Temporary issue resolved (database back up, API restored)
Code bug fixed
Network connectivity restored

When NOT to use:

Issue not fixed yet (will just fail again)
Data is permanently bad

Bulk retry: Select multiple events with the checkboxes, then click "Retry Selected"

✓ Resolve

What it does: Marks the event as "handled". It won't be retried, and it stays in the queue for the record

When to use:

One-time data issue that can't be fixed
Event is no longer relevant (expired, superseded)
You've handled the issue manually outside the system

When NOT to use:

Issue is fixable (use Retry instead)
You want to delete the record (use Discard)

🗑️ Discard

What it does: Permanently deletes the event from the queue

⚠️ WARNING: This is irreversible! Data is lost.

When to use:

Confirmed the event is garbage (test data, duplicate)
Storage space concerns (very large queue)
Privacy/GDPR compliance (must delete)

When NOT to use:

You might need the event later (use Resolve instead)
Not sure what the event is
Production events (unless confirmed safe)

Filtering and Search

Filter Options

Event Type: Show only specific events

Example: listing.created to see only listing creation failures

Date Range: Events from specific time period

Useful for: "Show me yesterday's failures"

Retry Count: Events by number of attempts

High retry count = persistent issue

Unresolved Only: Hide already-resolved events

Default view - shows what needs attention
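The sketch below combines the filters with AND logic, which is an assumption about how the page behaves rather than a documented guarantee. The `Entry` record and `apply_filters` function are hypothetical; the end of the date range is treated as exclusive so that "yesterday" means midnight to midnight:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Entry:
    # Hypothetical record shape for illustration.
    event_type: str
    created: datetime
    retries: int
    resolved: bool


def apply_filters(
    entries: Iterable[Entry],
    event_type: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,  # exclusive upper bound
    min_retries: int = 0,
    unresolved_only: bool = True,  # mirrors the default view
) -> List[Entry]:
    result = []
    for e in entries:
        if event_type is not None and e.event_type != event_type:
            continue
        if start is not None and e.created < start:
            continue
        if end is not None and e.created >= end:
            continue
        if e.retries < min_retries:
            continue
        if unresolved_only and e.resolved:
            continue
        result.append(e)
    return result
```

For example, `apply_filters(entries, min_retries=3)` would surface the persistent failures mentioned under Retry Count.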

Search

Search by:

Error message text
Trace ID (if you have it from logs)
Listener name

Example searches:

timeout - Find all timeout errors
QuickBooks - Find QuickBooks integration failures
trace-abc-123 - Find specific event by trace ID
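Search behaves like a substring match across the three searchable fields. This sketch assumes case-insensitive matching and dict-shaped entries with hypothetical `error`, `trace_id`, and `listener` keys:

```python
def search(entries, query):
    """Case-insensitive substring match over error, trace ID, and listener."""
    q = query.lower()
    return [
        e for e in entries
        if any(q in (e.get(field) or "").lower()
               for field in ("error", "trace_id", "listener"))
    ]
```

For example, `search(entries, "QuickBooks")` would return entries whose error text or listener name mentions QuickBooks.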

Reading Error Messages

Common Errors and Solutions

Error	Meaning	Solution
`ECONNREFUSED`	Can't connect to service	Check if database/external API is up
`ETIMEDOUT`	Connection timed out	Service slow/overloaded, retry later
`404 Not Found`	Resource doesn't exist	Data inconsistency, may need manual fix
`500 Internal Server Error`	External service hit an internal error	Wait for external service to recover
`TypeError: Cannot read property`	Code bug	Report to engineering, don't retry
`Validation failed`	Bad data	Check data format, may need manual correction
`Rate limit exceeded`	Too many requests	Wait and retry, or spread out load
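The table above can be turned into a first-pass triage helper. This sketch is hypothetical: the `RULES` list and `suggest_action` function only restate the table's guidance, and matching on substrings of the error text is an assumption about how these errors appear in the Error column:

```python
# Hypothetical rule table restating the "Common Errors" table above.
# The first matching pattern wins.
RULES = [
    ("ECONNREFUSED", "retry after confirming the service is up"),
    ("ETIMEDOUT", "retry later"),
    ("rate limit exceeded", "retry later, spreading out load"),
    ("500 internal server error", "retry after external service recovers"),
    ("404 not found", "investigate: may need manual fix"),
    ("validation failed", "investigate: may need manual correction"),
    ("typeerror: cannot read property", "report to engineering, do not retry"),
]


def suggest_action(error_message: str) -> str:
    msg = error_message.lower()
    for pattern, action in RULES:
        if pattern.lower() in msg:
            return action
    # Unknown errors get a human look rather than a blind retry.
    return "unknown: review the full error before acting"
```

Unknown errors fall through to a review step rather than a default retry, in line with "Don't retry before fixing".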

Best Practices

✅ Do

Check DLQ daily as part of system health routine
Retry in bulk after known outages (database restart, API maintenance)
Filter by event type to identify systematic issues
Export to CSV for analysis or reporting
Resolve events after handling them to keep queue clean
Monitor retry counts - events with 3+ retries need investigation

❌ Don't

Don't ignore the queue - failed events indicate real problems
Don't retry before fixing - wastes resources, creates noise
Don't discard without understanding - you might lose important data
Don't resolve without action - just hides the problem
Don't panic over single failures - look for patterns and volume

Metrics to Watch

Queue Size

Normal: fewer than 10 events
Warning: 10-49 events (investigate)
Critical: 50 or more events (urgent action needed)

Age of Oldest Event

Normal: < 1 hour
Warning: 1-24 hours
Critical: > 24 hours (stale events may be irrelevant)

Event Type Distribution

If 90% of failures are one event type:

That handler likely has a bug
Prioritize fixing that specific issue
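The thresholds in this section can be encoded as a few small checks. A sketch under assumptions: the function names are hypothetical, exactly 10 queued events counts as Warning (consistent with the Daily Health Check target of fewer than 10), and the 90% rule is read as "at least 90% of failures share one event type":

```python
from collections import Counter
from datetime import datetime, timedelta


def queue_size_level(size: int) -> str:
    if size < 10:
        return "normal"
    if size < 50:
        return "warning"
    return "critical"


def oldest_age_level(oldest: datetime, now: datetime) -> str:
    age = now - oldest
    if age < timedelta(hours=1):
        return "normal"
    if age <= timedelta(hours=24):
        return "warning"
    return "critical"  # stale events may no longer be relevant


def dominant_event_type(event_types, threshold=0.9):
    """Return the event type making up >= threshold of failures, else None."""
    if not event_types:
        return None
    top, count = Counter(event_types).most_common(1)[0]
    return top if count / len(event_types) >= threshold else None
```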

Exporting Data

Export to CSV when:

Reporting to management
Analyzing patterns in spreadsheet
Sharing with engineering team
Creating incident documentation

Export includes:

Event type
Error message
Timestamp
Retry count
Listener name
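The export is an ordinary CSV file. The sketch below shows the shape of such a file using Python's standard `csv` module; the column headers and the dict-based input are assumptions, not the tool's actual export format:

```python
import csv
import io

# Column order mirrors the "Export includes" list; header names are assumptions.
FIELDS = ["event_type", "error_message", "timestamp", "retry_count", "listener"]


def export_csv(entries) -> str:
    buf = io.StringIO()
    # extrasaction="ignore" drops keys not in FIELDS (e.g. a payload).
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for e in entries:
        writer.writerow(e)
    return buf.getvalue()
```

Because extra keys are ignored, only the five listed fields end up in the file, even if the input records carry more data.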

Common Workflows

Daily Health Check (2 minutes)

Open Dead Letter Queue
Check "Unresolved Only" filter is on
Note queue size (should be < 10)
Scan error types for patterns
If events present, decide: Retry, Resolve, or Escalate

Post-Incident Cleanup

Confirm the system outage is resolved
Filter by Date Range to cover the outage window
Select all events
Bulk retry
Monitor the Event System Dashboard for success
Handle any that fail again individually (Retry, Resolve, or Escalate), since they need individual attention
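The cleanup steps above reduce to "select the outage window, retry, and separate what still fails." A sketch, assuming dict entries with hypothetical `created` and `resolved` keys and a hypothetical `retry` callable that stands in for the Retry action and returns whether processing succeeded:

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple


def post_incident_retry(
    entries: List[Dict],
    outage_start: datetime,
    outage_end: datetime,
    retry: Callable[[Dict], bool],
) -> Tuple[List[Dict], List[Dict]]:
    """Bulk-retry unresolved entries created during the outage window.

    Returns (succeeded, still_failing); the second list needs
    individual attention.
    """
    window = [
        e for e in entries
        if not e["resolved"] and outage_start <= e["created"] <= outage_end
    ]
    succeeded, still_failing = [], []
    for e in window:
        (succeeded if retry(e) else still_failing).append(e)
    return succeeded, still_failing
```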

Weekly Analysis

Export queue to CSV
Open in spreadsheet
Create pivot table by event type and error
Identify top 3 failure patterns
Report to engineering for prioritization
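The pivot-table step is essentially a count over (event type, error) pairs. A sketch using `collections.Counter`, assuming the CSV has `event_type` and `error_message` columns (hypothetical header names; check your actual export):

```python
import csv
import io
from collections import Counter


def top_failure_patterns(csv_text: str, n: int = 3):
    """Count (event type, error) pairs, like a two-key pivot table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["event_type"], row["error_message"]) for row in reader)
    return counts.most_common(n)
```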

Troubleshooting

Q: I retried events but they're still failing

A: The underlying issue isn't fixed. Check:

Is the database actually back up?
Did the external API recover?
Is there a code bug that needs deployment?

Q: Can I see the full event payload?

A: Click the event row to expand. You'll see:

Full error stack trace
Event payload (JSON data)
Metadata (user ID, deployment, timestamp)

Q: What's the difference between Resolve and Discard?

Resolve: Keeps the record, marks as handled, won't retry
Discard: Permanently deletes the record (irreversible)

Use Resolve when you want to keep an audit trail; use Discard only to clean up garbage.

Q: How long do events stay in the queue?

A: Until you Retry, Resolve, or Discard them. There's no automatic expiration (by design - you shouldn't lose failed events).

Q: Can I retry events from last week?

A: Yes, but consider:

Data might be stale (user already took alternative action)
Side effects might be unexpected (duplicate emails)
Review event details before retrying old events
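One way to review old events before retrying is to split a selection by age first. This sketch borrows the 24-hour cutoff from Age of Oldest Event, where older events are flagged as possibly stale; `split_by_age` and the `created` key are hypothetical:

```python
from datetime import datetime, timedelta


def split_by_age(entries, now: datetime, max_age=timedelta(hours=24)):
    """Split entries into (recent, stale) by how long ago they were created."""
    recent, stale = [], []
    for e in entries:
        (stale if now - e["created"] > max_age else recent).append(e)
    return recent, stale
```

Recent entries are candidates for a bulk retry; stale ones deserve a look at their payload first, since a retry could send duplicate emails or act on outdated data.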

Related Pages

Event System Dashboard - Overview of event health
Event Flow - Real-time event monitoring
Event Metrics - Detailed analytics
System Alerts - Automated failure notifications

Emergency Procedures

🚨 Queue Growing Rapidly (>100 events/hour)

Don't panic - events are safely queued
Check Event System Dashboard for error rate
Identify affected event types
Check system status (database, external APIs)
If widespread outage: wait for recovery, then bulk retry
If specific event type: escalate to engineering
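The ">100 events/hour" trigger can be checked by counting recent arrivals. A sketch assuming you have the creation times of queued events; the function names are hypothetical, and the one-hour window is a sliding window ending at `now`:

```python
from datetime import datetime, timedelta


def arrivals_last_hour(created_times, now: datetime) -> int:
    """Count DLQ entries created within the hour before `now`."""
    cutoff = now - timedelta(hours=1)
    return sum(1 for t in created_times if cutoff < t <= now)


def growing_rapidly(created_times, now: datetime, threshold: int = 100) -> bool:
    # "Growing rapidly" per this page: more than 100 new events per hour.
    return arrivals_last_hour(created_times, now) > threshold
```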

🚨 All Events Failing (100% error rate)

This indicates a critical system issue
Check database connectivity
Check external service status
Review recent deployments
Contact engineering immediately
Do NOT retry until root cause identified

Remember: The Dead Letter Queue is your safety net. Events here aren't lost - they're waiting for you to help them succeed!
