Dead Letter Queue

Last updated: March 25, 2026

What is the Dead Letter Queue?

The Dead Letter Queue (DLQ) is a holding area for events that failed to process successfully. Think of it as a "problem bin": failed events go there instead of being lost, so you can review them, fix the underlying issue, and retry them later.

Why Events Fail

Events end up in the DLQ when:

  • Database is temporarily down - Event can't save data
  • External API fails - Email service, QuickBooks, or other integrations time out
  • Code bugs - Event handler crashes due to unexpected data
  • Network issues - Can't reach required services
  • Resource exhaustion - System too busy to process

The DLQ Lifecycle

Event Emitted → Processing Attempt → FAILED → Dead Letter Queue
                                                    ↓
              ┌─────────────────────────────────────┼─────────────────┐
              ↓                                     ↓                 ↓
         [RETRY]                              [RESOLVE]          [DISCARD]
              ↓                                     ↓                 ↓
    Try processing again              Mark as handled            Delete permanently
    (after fixing issue)              (won't retry)              (data loss)
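
The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the `DlqEntry` class, its field names, and the `handler` callable are assumptions made for this example, not the product's actual implementation. It also assumes that a successful retry removes the entry from the queue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(eq=False)  # compare entries by identity so remove() targets the exact entry
class DlqEntry:
    """Hypothetical shape of one failed event; field names are assumptions."""
    event_type: str
    payload: dict
    error: str
    retries: int = 0
    resolved: bool = False


def retry(queue: list[DlqEntry], entry: DlqEntry, handler: Callable[[dict], None]) -> bool:
    """RETRY: run the handler again. On success the entry leaves the queue."""
    entry.retries += 1
    try:
        handler(entry.payload)
    except Exception as exc:
        entry.error = str(exc)  # still failing: keep the entry, record the latest error
        return False
    queue.remove(entry)
    return True


def resolve(entry: DlqEntry) -> None:
    """RESOLVE: mark as handled. The record is kept but should not be retried."""
    entry.resolved = True


def discard(queue: list[DlqEntry], entry: DlqEntry) -> None:
    """DISCARD: delete permanently. Nothing is kept."""
    queue.remove(entry)
```

Note the asymmetry this makes visible: Resolve only flips a flag, while Discard removes the record, which is why Discard cannot be undone.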

Understanding the Interface

Main Table Columns

Column      What It Means                       Example
----------  ----------------------------------  --------------------------------
Event Type  What kind of event failed           listing.created, email.sent
Error       Why it failed (short version)       Connection timeout, Null pointer
Listener    Which handler failed                EmailNotificationListener
Retries     How many times we tried             2/3 (2 attempts, 3 max)
Created     When the event originally happened  Mar 25, 2026 2:30 PM
Actions     What you can do                     Retry, Resolve, Discard

Status Indicators

  • 🔴 Red badge: Failed and unresolved (needs attention)
  • 🟡 Yellow badge: Failed but retry scheduled
  • 🟢 Green badge: Resolved (manually marked as handled)

How to Use This Page

Scenario 1: Email Service Was Down

Problem: The external email API was down for 30 minutes, and 50 email.sent events failed.

Solution:

  1. Go to Dead Letter Queue
  2. Filter by Event Type: email.sent
  3. Select all 50 events (checkboxes)
  4. Click "Retry Selected"
  5. Confirm in dialog
  6. Events will be re-processed (emails will send)

Result: ✅ All emails sent successfully


Scenario 2: Database Connection Issue

Problem: The database was restarting, and events of several types failed.

Solution:

  1. Check that database is back online
  2. Go to Dead Letter Queue
  3. Filter by "Unresolved Only"
  4. Click "Retry All Unresolved" button
  5. Monitor Event System Dashboard for error rate drop

Result: ✅ Events process normally now


Scenario 3: Bug in Event Handler

Problem: listing.created events failing with TypeError: Cannot read property 'id' of null

Solution:

  1. Don't retry yet! - It will just fail again
  2. Click on failed event to see full error
  3. Note the error pattern
  4. Report to engineering with:
    • Event type: listing.created
    • Error: TypeError: Cannot read property 'id' of null
    • Count: 23 events affected
    • Time range: Last 2 hours
  5. Wait for code fix
  6. After fix deployed, retry failed events

Result: ✅ Events process successfully after bug fix
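
The four facts in the engineering report (event type, error, count, time range) can be gathered mechanically from a list of failed events. The sketch below assumes each event is a dict with `event_type`, `error`, and `created` keys; those names and the `summarize_failures` helper are hypothetical, not part of the product.

```python
from collections import Counter


def summarize_failures(events: list[dict], event_type: str) -> dict:
    """Collect the details engineering asks for, for one event type."""
    matching = [e for e in events if e["event_type"] == event_type]
    if not matching:
        raise ValueError(f"no failed events of type {event_type!r}")
    # The most frequent error message is usually the pattern worth reporting.
    most_common_error, _ = Counter(e["error"] for e in matching).most_common(1)[0]
    times = [e["created"] for e in matching]
    return {
        "event_type": event_type,
        "error": most_common_error,
        "count": len(matching),
        "first_seen": min(times),
        "last_seen": max(times),
    }
```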


Scenario 4: One-Time Data Issue

Problem: Single event failed due to bad data that can't be fixed.

Solution:

  1. Review event details (click to expand)
  2. Confirm it's an isolated issue
  3. Click "Resolve" to mark as handled
  4. The event stays in the queue but is marked resolved

Result: ✅ Unresolved view cleaned up, event won't retry


Action Buttons Explained

🔁 Retry

What it does: Attempts to process the event again

When to use:

  • Temporary issue resolved (database back up, API restored)
  • Code bug fixed
  • Network connectivity restored

When NOT to use:

  • Issue not fixed yet (will just fail again)
  • Data is permanently bad

Bulk retry: Select multiple events with checkboxes, then "Retry Selected"


✓ Resolve

What it does: Marks event as "handled" - won't retry, stays in queue for record

When to use:

  • One-time data issue that can't be fixed
  • Event is no longer relevant (expired, superseded)
  • You've handled the issue manually outside the system

When NOT to use:

  • Issue is fixable (use Retry instead)
  • You want to delete the record (use Discard)

🗑️ Discard

What it does: Permanently deletes the event from the queue

⚠️ WARNING: This is irreversible! Data is lost.

When to use:

  • Confirmed the event is garbage (test data, duplicate)
  • Storage space concerns (very large queue)
  • Privacy/GDPR compliance (must delete)

When NOT to use:

  • You might need the event later (use Resolve instead)
  • Not sure what the event is
  • Production events (unless confirmed safe)

Filtering and Search

Filter Options

Event Type: Show only specific events

  • Example: listing.created to see only listing creation failures

Date Range: Events from specific time period

  • Useful for: "Show me yesterday's failures"

Retry Count: Events by number of attempts

  • High retry count = persistent issue

Unresolved Only: Hide already-resolved events

  • Default view - shows what needs attention

Search

Search by:

  • Error message text
  • Trace ID (if you have it from logs)
  • Listener name

Example searches:

  • timeout - Find all timeout errors
  • QuickBooks - Find QuickBooks integration failures
  • trace-abc-123 - Find specific event by trace ID
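
To make the way filters and search combine concrete, here is a minimal sketch. It assumes every supplied filter must match (logical AND) and that search is a case-insensitive substring match over the error text, trace ID, and listener name; the page itself may behave differently. The `filter_events` function and the dict keys are hypothetical.

```python
from datetime import datetime
from typing import Optional


def filter_events(
    events: list[dict],
    event_type: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
    min_retries: int = 0,
    unresolved_only: bool = True,  # mirrors the default view
    text: Optional[str] = None,
) -> list[dict]:
    """Return the events that match every filter that was supplied."""
    needle = text.lower() if text else None
    matches = []
    for e in events:
        if event_type is not None and e["event_type"] != event_type:
            continue
        if since is not None and e["created"] < since:
            continue
        if until is not None and e["created"] > until:
            continue
        if e.get("retries", 0) < min_retries:
            continue
        if unresolved_only and e.get("resolved", False):
            continue
        if needle is not None:
            # Search covers error text, trace ID, and listener name.
            haystack = " ".join(str(e.get(k, "")) for k in ("error", "trace_id", "listener"))
            if needle not in haystack.lower():
                continue
        matches.append(e)
    return matches
```

For example, `filter_events(events, text="quickbooks")` would correspond to the QuickBooks search above, and `filter_events(events, min_retries=3)` to looking for persistent failures.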

Reading Error Messages

Common Errors and Solutions

Error                            Meaning                   Solution
-------------------------------  ------------------------  ---------------------------------------------
ECONNREFUSED                     Can't connect to service  Check if database/external API is up
ETIMEDOUT                        Connection timed out      Service slow/overloaded, retry later
404 Not Found                    Resource doesn't exist    Data inconsistency, may need manual fix
500 Internal Server Error        External service crashed  Wait for external service to recover
TypeError: Cannot read property  Code bug                  Report to engineering, don't retry
Validation failed                Bad data                  Check data format, may need manual correction
Rate limit exceeded              Too many requests         Wait and retry, or spread out load
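
The table above amounts to a lookup from error text to a next step. A minimal sketch of that lookup follows; the action labels and the `suggest_action` helper are invented for illustration, and real error messages may need more careful matching than a substring check.

```python
# (pattern, suggested action) pairs taken from the table above, checked in order.
# Patterns are lowercase because matching is case-insensitive.
ERROR_RULES = [
    ("econnrefused", "check_service_is_up"),
    ("etimedout", "retry_later"),
    ("404 not found", "manual_fix"),
    ("500 internal server error", "wait_for_service"),
    ("typeerror: cannot read property", "report_to_engineering"),
    ("validation failed", "check_data_format"),
    ("rate limit exceeded", "wait_and_retry"),
]


def suggest_action(error_message: str) -> str:
    """Return a suggested next step, or 'investigate' for unrecognized errors."""
    lowered = error_message.lower()
    for pattern, action in ERROR_RULES:
        if pattern in lowered:
            return action
    return "investigate"
```

Unrecognized errors deliberately fall through to "investigate" rather than to a retry, in line with the advice not to retry before understanding the failure.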

Best Practices

✅ Do

  • Check DLQ daily as part of system health routine
  • Retry in bulk after known outages (database restart, API maintenance)
  • Filter by event type to identify systematic issues
  • Export to CSV for analysis or reporting
  • Resolve events after handling them to keep queue clean
  • Monitor retry counts - events with 3+ retries need investigation

❌ Don't

  • Don't ignore the queue - failed events indicate real problems
  • Don't retry before fixing - wastes resources, creates noise
  • Don't discard without understanding - you might lose important data
  • Don't resolve without action - just hides the problem
  • Don't panic over single failures - look for patterns and volume

Metrics to Watch

Queue Size

  • Normal: fewer than 10 events
  • Warning: 10-50 events (investigate)
  • Critical: more than 50 events (urgent action needed)

Age of Oldest Event

  • Normal: < 1 hour
  • Warning: 1-24 hours
  • Critical: > 24 hours (stale events may be irrelevant)
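
If you track these two metrics in a script or alert, the thresholds translate into simple comparisons. The sketch below is an assumption about how to encode them, with hypothetical function names. It treats a queue of exactly 10 or exactly 50 events as Warning, and an oldest event of exactly 1 hour or exactly 24 hours as Warning.

```python
from datetime import timedelta


def queue_size_level(size: int) -> str:
    """Classify the number of unresolved events in the queue."""
    if size < 10:
        return "normal"
    if size <= 50:
        return "warning"
    return "critical"


def oldest_age_level(age: timedelta) -> str:
    """Classify the age of the oldest unresolved event."""
    if age < timedelta(hours=1):
        return "normal"
    if age <= timedelta(hours=24):
        return "warning"
    return "critical"
```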

Event Type Distribution

If 90% of failures are one event type:

  • That handler likely has a bug
  • Prioritize fixing that specific issue
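
Checking for a dominant event type is a short counting exercise. The sketch below uses the 90% figure from this section as a default threshold; the `dominant_event_type` helper is hypothetical.

```python
from collections import Counter
from typing import Optional


def dominant_event_type(event_types: list[str], threshold: float = 0.9) -> Optional[str]:
    """Return the event type behind at least `threshold` of failures, if there is one."""
    if not event_types:
        return None
    name, count = Counter(event_types).most_common(1)[0]
    return name if count / len(event_types) >= threshold else None
```

A return value of `None` means failures are spread across types, which points toward a shared cause such as a database or network problem rather than a single handler.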

Exporting Data

Export to CSV when:

  • Reporting to management
  • Analyzing patterns in spreadsheet
  • Sharing with engineering team
  • Creating incident documentation

Export includes:

  • Event type
  • Error message
  • Timestamp
  • Retry count
  • Listener name
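
For reference, a file with these five fields can be produced with Python's standard `csv` module. This is a sketch, not the page's export code: the column header names and the `export_csv` helper are assumptions, and the real export may label or order its columns differently.

```python
import csv
import io

# Assumed header names for the five exported fields.
COLUMNS = ["event_type", "error", "timestamp", "retries", "listener"]


def export_csv(events: list[dict]) -> str:
    """Serialize events to CSV text, keeping only the exported fields."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops keys such as the payload that are not exported.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()
```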

Common Workflows

Daily Health Check (2 minutes)

  1. Open Dead Letter Queue
  2. Check "Unresolved Only" filter is on
  3. Note queue size (should be < 10)
  4. Scan error types for patterns
  5. If events present, decide: Retry, Resolve, or Escalate

Post-Incident Cleanup

  1. Confirm the system outage is resolved
  2. Filter by time range (during outage)
  3. Select all events
  4. Bulk retry
  5. Monitor Event System Dashboard for success
  6. Investigate any that fail again one by one (they need individual attention)
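
The cleanup steps above could be scripted roughly as follows. This is a sketch under assumptions: events are dicts with a `created` timestamp, and `handler` stands in for whatever re-processes an event. Neither is the product's real API.

```python
from datetime import datetime
from typing import Callable


def retry_outage_window(
    events: list[dict],
    start: datetime,
    end: datetime,
    handler: Callable[[dict], None],
) -> tuple[list[dict], list[dict]]:
    """Retry events created during [start, end]; return (succeeded, failed_again)."""
    succeeded, failed_again = [], []
    for event in events:
        if not (start <= event["created"] <= end):
            continue  # outside the outage window: leave untouched
        event["retries"] = event.get("retries", 0) + 1
        try:
            handler(event)
        except Exception as exc:
            event["error"] = str(exc)  # keep the latest failure reason
            failed_again.append(event)
        else:
            succeeded.append(event)
    return succeeded, failed_again
```

The second list is the set of events that need individual attention: they failed even though the outage is over, so the outage was probably not their only problem.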

Weekly Analysis

  1. Export queue to CSV
  2. Open in spreadsheet
  3. Create pivot table by event type and error
  4. Identify top 3 failure patterns
  5. Report to engineering for prioritization
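
If you prefer a script to a spreadsheet pivot table, the same top-3 ranking can be computed from the exported CSV. The sketch assumes the file has `event_type` and `error` columns; the actual header names in the export may differ, and `top_failure_patterns` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def top_failure_patterns(csv_text: str, n: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Count (event type, error) pairs and return the n most frequent."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["event_type"], row["error"]) for row in rows)
    return counts.most_common(n)
```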

Troubleshooting

Q: I retried events but they're still failing

A: The underlying issue isn't fixed. Check:

  • Is the database actually back up?
  • Did the external API recover?
  • Is there a code bug that needs deployment?

Q: Can I see the full event payload?

A: Click the event row to expand. You'll see:

  • Full error stack trace
  • Event payload (JSON data)
  • Metadata (user ID, deployment, timestamp)

Q: What's the difference between Resolve and Discard?

A:

  • Resolve: Keeps the record, marks as handled, won't retry
  • Discard: Permanently deletes the record (irreversible)

Use Resolve for audit trail, Discard only for garbage cleanup.

Q: How long do events stay in the queue?

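A: Events stay in the unresolved list until you Retry, Resolve, or Discard them. Resolved events remain in the queue as a record; only Discard deletes them. There's no automatic expiration (by design - you shouldn't lose failed events).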

Q: Can I retry events from last week?

A: Yes, but consider:

  • Data might be stale (user already took alternative action)
  • Side effects might be unexpected (duplicate emails)
  • Review event details before retrying old events
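
One way to apply this advice before a bulk retry is to separate recent events from old ones and review the old ones by hand. The sketch below is illustrative: the one-day cutoff is an assumption borrowed from the "> 24 hours" critical age threshold, and `split_by_age` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def split_by_age(
    events: list[dict],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> tuple[list[dict], list[dict]]:
    """Return (fresh, stale). Stale events should be reviewed before retrying."""
    fresh, stale = [], []
    for event in events:
        if now - event["created"] > max_age:
            stale.append(event)
        else:
            fresh.append(event)
    return fresh, stale
```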


Emergency Procedures

🚨 Queue Growing Rapidly (>100 events/hour)

  1. Don't panic - events are safely queued
  2. Check Event System Dashboard for error rate
  3. Identify affected event types
  4. Check system status (database, external APIs)
  5. If widespread outage: wait for recovery, then bulk retry
  6. If specific event type: escalate to engineering
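
The decision in steps 5 and 6 can be sketched as a small triage function. The 100 events/hour threshold comes from this procedure. The 90% share used to decide that "one event type is responsible" is borrowed from the Event Type Distribution metric and is an assumption, as are the function name and the returned labels.

```python
from collections import Counter
from datetime import datetime, timedelta


def triage_growth(
    events: list[dict],
    now: datetime,
    rate_threshold: int = 100,
    dominant_share: float = 0.9,
) -> str:
    """Classify the last hour of failures as normal, type-specific, or widespread."""
    one_hour_ago = now - timedelta(hours=1)
    recent = [e for e in events if one_hour_ago <= e["created"] <= now]
    if len(recent) <= rate_threshold:
        return "normal"
    # Share of recent failures caused by the single most common event type.
    _, top_count = Counter(e["event_type"] for e in recent).most_common(1)[0]
    if top_count / len(recent) >= dominant_share:
        return "escalate_to_engineering"
    return "check_system_status"
```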

🚨 All Events Failing (100% error rate)

  1. Treat this as a critical system issue
  2. Check database connectivity
  3. Check external service status
  4. Review recent deployments
  5. Contact engineering immediately
  6. Do NOT retry until root cause identified

Remember: The Dead Letter Queue is your safety net. Events here aren't lost - they're waiting for you to help them succeed!
