Kumo locking up, not responding at all

Solmea · March 20, 2024, 10:00pm

the queue-summary has like 570 lines

wez · March 20, 2024, 10:00pm

You can remove the kumo.on('should_enqueue_log_record', function(msg, hook_name) hook registration; log_hooks:new_json automatically takes care of that in the right way.

Solmea · March 20, 2024, 10:00pm

usually I can work out what is going on using strace or something… but it looks like kumo is looping and doing nothing…

Solmea · March 20, 2024, 10:02pm

so I should remove that whole part?

wez · March 20, 2024, 10:02pm

Yeah, remove this whole bit:

kumo.on('should_enqueue_log_record', function(msg, hook_name)
  local log_record = msg:get_meta 'log_record'
  -- avoid an infinite loop caused by logging that we logged that we logged...
  -- Check the log record: if the record was destined for the webhook queue
  -- then it was a record of the webhook delivery attempt and we must not
  -- log its outcome via the webhook.
  if log_record.queue ~= 'webhook2' then
    -- was some other event that we want to log via the webhook
    msg:set_meta('queue', 'webhook2')
    return true
  end
  return false
end)
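
For reference, a minimal sketch of what the webhook setup might look like once the manual hook is gone, so that log_hooks:new_json does the loop-avoidance itself. The name 'webhook2' is taken from the queue name in the removed block; the URL and the header list are placeholders assumed for illustration, not values from this thread.

```lua
-- Sketch only: assumes the policy-extras log_hooks helper shipped with KumoMTA.
local log_hooks = require 'policy-extras.log_hooks'

log_hooks:new_json {
  -- assumed to match the queue name used in the removed hook
  name = 'webhook2',
  -- placeholder endpoint (hypothetical); point this at the real log receiver
  url = 'http://127.0.0.1:8080/kumo-log',
  log_parameters = {
    -- illustrative choice of headers to capture in each log record
    headers = { 'Subject' },
  },
}
```

With this in place there is no separate kumo.on('should_enqueue_log_record', ...) registration in the policy.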

Solmea · March 20, 2024, 10:02pm

done that

wez · March 20, 2024, 10:04pm

Do you see any error messages in the systemd journal around the time you experience the problem?

wez · March 20, 2024, 10:05pm

Solmea · March 20, 2024, 10:08pm

I now notice this one:

Mar 20 22:06:59 ip-172-31-17-90 kumod[2994]: 2024-03-20T22:06:59.105529Z ERROR localset-0 kumod::http_server::admin_trace_smtp_server_v1: error in websocket: channel lagged by 2

Solmea · March 20, 2024, 10:09pm

now load-balancing 1 of the 5 connections to kumo… after I removed the above config part

Solmea · March 20, 2024, 10:10pm

could it be that logstash, with only two threads, is not fast enough to handle the webhooks from Kumo?

Solmea · March 20, 2024, 10:11pm

could very well be a chain reaction

Solmea · March 20, 2024, 10:11pm

but in my experience logstash is quite capable of handling lotsa logs

wez · March 20, 2024, 10:12pm

I’d suggest commenting out the webhook to see if it correlates

Solmea · March 20, 2024, 10:12pm

I have tried that but that resulted in an even faster lockup if I remember correctly

Solmea · March 20, 2024, 10:14pm

hmm, now the only thing I see is IPv6 deliveries failing..

wez · March 20, 2024, 10:15pm

can you share an example of one of those? It’s likely harmless

Solmea · March 20, 2024, 10:19pm

Solmea · March 20, 2024, 10:19pm

Solmea · March 20, 2024, 10:20pm

the most important part was missing: