After updating to kumod 2025.01.29-833f82a8 I started getting these errors in the logs:
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: 2025-02-28T11:11:42.590263Z ERROR spoolin-61 kumod::spool: failed to insert Message 3b8329e0f54b11efbd819c6b004d4752 to queue webhook.log_hook: invalid ThrottleSpec `0/day`: limit must be greater than 0!
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: stack traceback:
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: [C]: in function 'kumo.make_throttle'
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: /opt/kumomta/etc/policy/ahasend.lua:379: in function 'ahasend.per_tenant_throttle'
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: [string "/opt/kumomta/etc/policy/init.lua"]:282: in function <[string "/opt/kumomta/etc/policy/init.lua"]:268>
Feb 28 11:11:42 send.ahasend.com kumod[3824241]: . Ignoring message until kumod is restarted
I don’t want to throttle webhooks, and I don’t have a throttle set for them in the sqlite database (though I do have a valid value for every tenant). As a temporary workaround, I updated the config to check specifically for invalid values (nil or 0) and substitute a high value when the stored value is not valid. Even with this change, I still see ~100k webhook messages stuck in the queue in the output of kcli queue-summary.
Is there a way to see which throttles are set for each queue? I’m trying to figure out the best way to debug this.
-- in init.lua
kumo.on('throttle_insert_ready_queue', function(msg)
  local ok, tenant = pcall(function()
    return msg:get_meta('tenant')
  end)
  local ok2, direction = pcall(function()
    return msg:get_meta('direction')
  end)
  if ok2 and direction == 'inbound' then
    return
  end
  if ok then
    local throttle = ahasend.per_tenant_throttle(tenant)
    throttle:delay_message_if_throttled(msg)
  end
end)
-- in ahasend.lua. Excuse the mess, I've been trying to get around the issue by checking for all sorts of invalid values as a temporary workaround.
local function get_tenant_throttle(tenant_id)
  if tenant_id == nil then
    return "1000000000/hour"
  end
  local rate = '1000/day'
  local ok, db = pcall(sqlite.open, "/opt/kumomta/etc/policy/config.db")
  if not ok then
    return rate
  end
  local ok, result = pcall(function ()
    return db:execute("SELECT throttle FROM accounts WHERE id = :id", {
      id = tenant_id
    })
  end)
  if not ok then
    return rate
  end
  if tenant_id == nil or result[1] == nil then
    return "10000000/hour"
  end
  if result[1] == 0 then
    return "10000000/hour"
  end
  return result[1] .. "/day"
end

mod.cached_tenant_throttle = kumo.memoize(get_tenant_throttle, {
  name = 'tenant_throttles',
  ttl = '10 minutes',
  capacity = 1000,
})

mod.per_tenant_throttle = function (tenant_id)
  local rate = mod.cached_tenant_throttle(tenant_id)
  if rate == "0/day" then
    rate = "1000000000/hour"
  end
  return kumo.make_throttle(
    string.format('tenant-send-limit-%s', tenant_id),
    rate
  )
end
Also, the msg param passed to the throttle_insert_ready_queue handler sometimes raises an error on the get_meta() call. This was already happening in previous versions, and I don’t remember the exact error message, but that’s why I have those pcall calls in there.
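The repeated pcall wrappers around get_meta() could be folded into one small helper. This is only a sketch: `safe_get_meta` is a hypothetical name, not a KumoMTA function, and it assumes that treating a failed lookup as "no value" is acceptable for this handler.

```lua
-- Hypothetical helper: read a metadata key without letting an error
-- raised by get_meta propagate out of the event handler.
local function safe_get_meta(msg, key)
  local ok, value = pcall(function()
    return msg:get_meta(key)
  end)
  if ok then
    return value
  end
  -- The lookup raised an error; report "no value" to the caller.
  return nil
end
```

With this in place the handler could call `safe_get_meta(msg, 'tenant')` and `safe_get_meta(msg, 'direction')` and test the results for nil.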
It took some time, but the queued-up webhooks have now been processed. Am I doing this right, though? There’s no mention of webhook messages getting throttled in the documentation, and I’d initially followed the basic example from this page
But from what I understand, that example will throttle webhook calls as well?
FWIW, if you don’t want get_tenant_throttle to throttle, I’d suggest having it return nil instead of a throttle spec. Then you could do something like:
local rate = mod.cached_tenant_throttle(tenant_id)
if not rate then
  return nil
end
return kumo.make_throttle(
  string.format('tenant-send-limit-%s', tenant_id),
  rate
)
and:
local throttle = ahasend.per_tenant_throttle(tenant)
if throttle then
  throttle:delay_message_if_throttled(msg)
end
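To make the nil-returning suggestion above concrete, the database value could be normalized in one place before a spec string is ever built. A minimal sketch, assuming the throttle column holds a messages-per-day count; `throttle_spec_from_value` is a hypothetical helper name, not part of KumoMTA:

```lua
-- Hypothetical helper: turn the raw value read from the accounts table
-- into a throttle spec string, or nil when no throttle should apply.
local function throttle_spec_from_value(value)
  if value == nil then
    return nil
  end
  local n = tonumber(value)
  -- Missing, non-numeric, zero, or negative limits all mean "do not throttle".
  if n == nil or n <= 0 then
    return nil
  end
  return string.format('%d/day', math.floor(n))
end
```

get_tenant_throttle could then end with `return throttle_spec_from_value(result[1])`, so that `0/day` is never handed to kumo.make_throttle. It is worth verifying how kumo.memoize treats a nil return value before relying on this.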
Keep in mind that throttle hooks are additive with any other shaping rules you might have. If you have a default block in your shaping setup that limits max_message_rate, and you don’t explicitly define shaping for your webhook with an override for its message rate, then the webhook will inherit that default value.
max_ready seems very small. Keep in mind that in a pass-through, high-throughput scenario you will have 1 Reception and 1 Delivery log record per message transiting the system. So if you have, say, 1000 msgs/s of throughput, you will have 2000 log msgs/s going through the webhook. I generally suggest taking 2x that peak log rate as the starting size for max_ready, which would be 4000 in that scenario.
The consequence of max_ready being too small is that those log event messages get pushed into the scheduled queue and are delayed for a randomized duration of approximately 1 minute.
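The sizing rule of thumb above can be written out as a small calculation. This is just the arithmetic from the advice, not a KumoMTA API; `suggested_max_ready` is a hypothetical name, and the result is only a starting point to tune from.

```lua
-- Hypothetical helper: starting value for max_ready on the webhook queue.
-- Each message yields one Reception and one Delivery log record, and the
-- suggestion is to start at 2x that peak log rate.
local function suggested_max_ready(peak_msgs_per_sec)
  local log_records_per_sec = peak_msgs_per_sec * 2
  return log_records_per_sec * 2
end

print(suggested_max_ready(1000)) -- 4000
```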