Queue issues pertaining to inactive TSA daemon post clone

Greetings,
My name is Chris and I work for iPost. I’m already familiar with a few folks in here but am refraining from mentioning them so as not to get my message erased, heh.
Kumo version: kumod 2025.05.06-b29689af
Linux version: Rocky Linux 9 - Linux kumo.g005.enterprise.ipost.com 5.14.0-570.23.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 26 19:29:53 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Gist containing relevant files: KumoMTA Configuration · GitHub

I spun up the instance in question last year with the gracious assistance of Mike and Tom, but due to clients being clients it never saw the action it was supposed to until recently. I saw a strange alert come in from our infrastructure monitoring and, upon checking, realized that even though the kumomta daemon had been online, the TSA daemon had been offline since the instance was cloned, resulting in a few million messages piling up. I re-enabled the daemon but am now getting flooded with errors like the following:
{"type":"Delayed","id":"63a473fefcf311f0bcd02200c9eed0a8","sender":"errors+9z6zo3n2t6aj22b8e4u4as5geh5k0c968m67baq1gb8@ca.e.veritacanada.com","recipient":"xintan936158@hotmail.com","queue":"http://127.0.0.1:8008.tsa.kumomta","site":"","size":0,"response":{"code":400,"enhanced_code":null,"content":"Context: DueTimeWasReached, ReadyQueueWasFull. Next due in 31s 999ms 993us 110ns at 2026-01-30T00:01:50.825043380Z","command":null},"peer_address":null,"timestamp":1769731278,"created":1769678272,"num_attempts":0,"bounce_classification":"Uncategorized","egress_pool":null,"egress_source":null,"source_address":null,"feedback_report":null,"meta":{},"headers":{},"delivery_protocol":null,"reception_protocol":"LogRecord","nodeid":"e7522a63-613e-45fd-9fae-717702927722"}

This is coupled with errors in /var/log/messages saying that the ready queue is full, ad infinitum.
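For anyone triaging a similar flood, here is a rough sketch of how records like the one above could be tallied by `type` and `queue` to see which queue dominates. It assumes the records are available as one JSON object per line; `tally_records` is a hypothetical helper name, not part of KumoMTA or kcli.

```python
import json
from collections import Counter


def tally_records(lines):
    """Count log records by (type, queue), skipping blank or non-JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON records
        if not isinstance(record, dict):
            continue  # ignore JSON values that are not objects
        counts[(record.get("type"), record.get("queue"))] += 1
    return counts
```

Passing an open file or `sys.stdin` as `lines` and printing `tally_records(...).most_common()` would list the busiest (type, queue) pairs first.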

`kcli provider-summary --by-pool` looks like this at the moment:

```
http://127.0.0.1:8008.tsa.kumomta      unspecified   624,450 2,686,878 0 5 339,104
outlook-com.olc.protection.outlook.com clientzbe.zad       0         0 0 0  40,048
outlook-com.olc.protection.outlook.com clientzbe.abz       0         0 0 0  37,098
hotmail-com.olc.protection.outlook.com clientzbe.zad       0         0 0 0  33,525
hotmail-com.olc.protection.outlook.com clientzbe.abz       0         0 0 0  30,641
canadianbreadsettlement.com            default             0       334 0 1     350
nam.olc.protection.outlook.com         clientzbe.abz       0         0 0 0       9
eur.olc.protection.outlook.com         clientzbe.zad       0         0 0 0       7
nam.olc.protection.outlook.com         clientzbe.zad       0         0 0 0       6
mx.bell.net                            clientzbe.zad       0         0 0 0       5
live-com.olc.protection.outlook.com    clientzbe.abz       0         0 0 0       3
mx.bellaliant.net                      clientzbe.zad       0         0 0 0       3
mx.bellaliant.net                      clientzbe.abz       0         0 0 0       2
eur.olc.protection.outlook.com         clientzbe.abz       0         0 0 0       1
live-com.olc.protection.outlook.com    clientzbe.zad       0         0 0 0       1
msn-com.olc.protection.outlook.com     clientzbe.abz       0         0 0 0       1
mx.bell.net                            clientzbe.abz       0         0 0 0       1
mx.sympatico.ca                        clientzbe.zad       0         0 0 0       1
```

But... the delivered column seems to be misleading, in that it's just 'delivering' back into the TSA queue.

I am unsure how best to proceed to get things moving again; assistance would be appreciated.

As an update - the summary now looks as follows:

```
outlook-com.olc.protection.outlook.com clientzbe.zad 0  0 0 0 40,048
outlook-com.olc.protection.outlook.com clientzbe.abz 0  0 0 0 37,098
hotmail-com.olc.protection.outlook.com clientzbe.zad 0  0 0 0 33,518
hotmail-com.olc.protection.outlook.com clientzbe.abz 0  0 0 0 30,630
canadianbreadsettlement.com            default       0 53 0 0    358
eur.olc.protection.outlook.com         clientzbe.zad 0  0 0 0      7
mx.bell.net                            clientzbe.zad 0  0 0 0      5
live-com.olc.protection.outlook.com    clientzbe.abz 0  0 0 0      3
mx.bellaliant.net                      clientzbe.zad 0  0 0 0      3
mx.bellaliant.net                      clientzbe.abz 0  0 0 0      2
eur.olc.protection.outlook.com         clientzbe.abz 0  0 0 0      1
live-com.olc.protection.outlook.com    clientzbe.zad 0  0 0 0      1
msn-com.olc.protection.outlook.com     clientzbe.abz 0  0 0 0      1
mx.bell.net                            clientzbe.abz 0  0 0 0      1
mx.sympatico.ca                        clientzbe.zad 0  0 0 0      1
```

Bart ran `kcli rebind --domain live.ca --set queue=clientzbe.abe@live.ca --reason` which allowed a few successful outbound deliveries, but otherwise not much has moved. The backlog for the TSA is gone, so I'm guessing it worked through that.

Hi Chris. “Delayed” logs are typically generated when a message cannot be immediately placed in a queue but also can’t be scheduled. You can filter these out of the logs and also prevent them from being shipped to the TSA daemon.

We recommend monitoring kumo-tsa-daemon as well as kumod because they are separate processes and one will not “kick” the other.
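A minimal sketch of such a check, assuming both daemons run under systemd with the unit names `kumomta` and `kumo-tsa-daemon` (adjust these if your install differs). The helper names here are hypothetical, and how you raise the alert is left to your monitoring stack.

```python
import subprocess

# Assumed systemd unit names; change these to match the local install.
UNITS = ("kumomta", "kumo-tsa-daemon")


def systemctl_is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,  # a non-zero exit just means "not active"
    )
    return result.returncode == 0


def inactive_units(units=UNITS, is_active=systemctl_is_active):
    """Return the units from `units` that are not currently active."""
    return [unit for unit in units if not is_active(unit)]
```

Running `inactive_units()` from cron or a monitoring agent and alerting on a non-empty result would catch the case where one daemon is up and the other is not. The `is_active` parameter exists only so the check can be swapped out or tested without systemd.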

I am on mobile so have not looked at the configs yet, but I am guessing that you have some severely constrained shaping that is preventing queue insertion. If you have Prometheus and Grafana wired up, there may be some telling graphs there.

I would also recommend updating to the latest stable version.

@veracious-lemur Also, since you are a sponsor, a) Thank you for your support, b) I’ll answer this in your private channel.