Kumo-tsa-server Time out

Jack · February 2, 2026, 8:43am

Hi team,

Issue report

Kumo version: kcli 2025.12.02-67ee9e96

Problem:
I’m seeing the following error in the logs:
```
logging-1 kumo_api_types::shaping: reading text from http://127.0.0.1:8008/get_config_v1/shaping.toml: error decoding response body: request or response body error: operation timed out. Ignoring this shaping source for now
```

Before upgrading, everything worked fine.

When this error occurs, if I manually run
```
curl http://127.0.0.1:8008/get_config_v1/shaping.toml
```
it returns immediately without any delay. (The returned content is correct.)

After I restarted the TSA service, it returned to normal.

Any idea what could be causing this timeout or decoding issue after the upgrade?

---

Jack · February 2, 2026, 8:45am

The shaper section of my init.lua:

Jack · February 2, 2026, 8:45am

https://gist.github.com/smsvip/36eace229180115a65245df158abeae7

Jack · February 2, 2026, 8:46am

I really didn’t change anything else in the configuration file—almost everything is still the example content. My setup is very simple. It worked fine before the upgrade, but this issue has occurred twice since upgrading.

Jack · February 2, 2026, 10:28am

```
root@kumomta /o/k/e/policy# free -h
               total        used        free      shared  buff/cache   available
Mem:            84Gi        19Gi        48Gi       1.4Gi        18Gi        64Gi
```

Jack · February 2, 2026, 12:44pm

```
logging-0 mod_memoize: shaping_data (Some(ConfigEpoch(16369)), "[["/opt/kumomta/share/policy-extras/shaping.toml","/opt/kumomta/share/policy-extras/shaping.toml","/opt/kumomta/share/community/shaping.toml","/opt/kumomta/etc/policy/shaping_custom.toml","http://127.0.0.1:8008/get_config_v1/shaping.toml"],null]") failed: shaping_data lookup for (Some(ConfigEpoch(16369)), "[["/opt/kumomta/share/policy-extras/shaping.toml","/opt/kumomta/share/policy-extras/shaping.toml","/opt/kumomta/share/community/shaping.toml","/opt/kumomta/etc/policy/shaping_custom.toml","http://127.0.0.1:8008/get_config_v1/shaping.toml"],null]") timed out after 120s on semaphore acquire while waiting for cache to populate
```

Mike · February 2, 2026, 2:26pm

And this hasn’t returned since the restart?

tom · February 2, 2026, 2:28pm

Tailing the journal for tsa would be helpful.

Jack · February 2, 2026, 2:28pm

Yes, it’s intermittent. It works fine after a restart, but after some time (the interval is not fixed), the issue reappears. Then, even without restarting, the errors stop after about 1–2 minutes. I haven’t found any pattern yet. Also, the kumo-tsa service doesn’t show any errors.

Jack · February 2, 2026, 2:31pm

Yes, but there are no errors at all.

Jack · February 2, 2026, 2:34pm

Jack · February 2, 2026, 2:46pm

```
sudo -u kumod timeout 5s curl -v http://127.0.0.1:8008/get_config_v1/shaping.toml | wc -c

*   Trying 127.0.0.1:8008...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 127.0.0.1 (127.0.0.1) port 8008 (#0)
> GET /get_config_v1/shaping.toml HTTP/1.1
> Host: 127.0.0.1:8008
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: text/plain; charset=utf-8
< vary: accept-encoding
< content-length: 107882
< date: Mon, 02 Feb 2026 14:45:09 GMT
<
{ [33136 bytes data]
100  105k  100  105k    0     0  10.2M      0 --:--:-- --:--:-- --:--:-- 10.2M
* Connection #0 to host 127.0.0.1 left intact
```

Jack · February 2, 2026, 2:49pm

If the issue were really caused by the tsa service crashing, then executing
curl http://127.0.0.1:8008/get_config_v1/shaping.toml
should also yield no response. However, every time I test it, it responds quickly.

tom · February 2, 2026, 4:16pm

My other guess is about actual content. The error "error decoding response body: request or response body error: operation timed out" implies that curl can’t parse the response from the request so it could be that part of the returned response actually breaks the output.

I think my testing strategy would be to periodically run the curl and when there is an error, check your logs for transfails that happen at the same time, perhaps with a script. My guess is that you are hitting a transfail that triggers a rule that contains characters that break the output.

Alternatively, you may want to run curl in verbose mode so you can see the entire transaction; when it breaks, it might actually show you what broke it (add -v).
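
A rough sketch of the periodic check described above, in case it helps. It uses Python's urllib in place of curl, so it is not the same HTTP client that kumod uses and may not reproduce the timeout. The function names (`probe`, `monitor`), the 5 second timeout and the 10 second interval are made up for illustration; only the URL comes from this thread. Each line it prints carries a UTC timestamp that can be lined up against the kumod and TSA logs.

```python
import time
import urllib.request
from datetime import datetime, timezone


def http_fetch(url, timeout):
    """Read the whole body, so a stall in the middle of the body also raises.

    Note: urllib's timeout applies to each blocking socket operation,
    not to the transfer as a whole.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def probe(url, timeout=5.0, fetch=http_fetch):
    """Fetch url once and report body size, duration and any error."""
    started = time.monotonic()
    try:
        body = fetch(url, timeout)
        result = {"ok": True, "bytes": len(body), "error": None}
    except Exception as exc:  # timeouts, resets, refused connections, ...
        result = {"ok": False, "bytes": 0, "error": f"{type(exc).__name__}: {exc}"}
    result["seconds"] = round(time.monotonic() - started, 3)
    result["at"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return result


def format_result(result):
    """Render one probe result as a single log line."""
    status = "OK" if result["ok"] else "FAIL"
    line = f'{result["at"]} {status} {result["bytes"]} bytes in {result["seconds"]}s'
    if result["error"]:
        line += f' ({result["error"]})'
    return line


def monitor(url, interval=10.0, count=None, timeout=5.0,
            fetch=http_fetch, sleep=time.sleep):
    """Probe repeatedly and print one line per probe.

    count=None keeps going until interrupted. Returns the failed probes.
    """
    failures = []
    done = 0
    while count is None or done < count:
        result = probe(url, timeout, fetch)
        print(format_result(result), flush=True)
        if not result["ok"]:
            failures.append(result)
        done += 1
        if count is None or done < count:
            sleep(interval)
    return failures
```

To run it, call `monitor("http://127.0.0.1:8008/get_config_v1/shaping.toml")` as the kumod user and leave it running until the error shows up in the kumod log again. If the probes stay OK while kumod is reporting timeouts, that would suggest the problem is on the client side and not in the TSA daemon.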

Jack · February 2, 2026, 6:54pm

Thanks, I will try that. I have added “allow_stale_shaping_data = true”, but the problem still exists.

Jack · February 3, 2026, 8:50am

Restarting is the ultimate fix — after the restart, things have calmed down. I need to keep monitoring it.

tom · February 3, 2026, 4:29pm

Interesting, but a restart should not be required. I’d like to get to the bottom of that weirdness. Can I ask a few questions?

Are the TSA daemon and kumod on the same server?
If not, are they both on the same version of Kumo?
What is the host profile? (CPU, RAM, Drive, NIC)
Are there other services running on it that may be using ports 25 and 8008?
Are you also running Prometheus and Grafana on that host?
Can you share your sysctl settings for:
vm.max_map_count
net.core.rmem_default
net.core.wmem_default
net.core.rmem_max
net.core.wmem_max
fs.file-max
net.ipv4.ip_local_port_range
net.ipv4.tcp_tw_reuse
kernel.shmmax
net.core.somaxconn
vm.nr_hugepages
kernel.shmmni
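
If it is easier than looking each one up, here is a small sketch that collects the values above in one go by reading them from /proc/sys. It assumes a Linux host, where the dotted sysctl name maps to a path under /proc/sys; the helper names (`read_sysctl`, `report`) are made up for illustration.

```python
from pathlib import Path

# The settings asked about in this post.
KEYS = [
    "vm.max_map_count",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.core.rmem_max",
    "net.core.wmem_max",
    "fs.file-max",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse",
    "kernel.shmmax",
    "net.core.somaxconn",
    "vm.nr_hugepages",
    "kernel.shmmni",
]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted name such as net.core.somaxconn to its /proc/sys path."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the value as a string, or None if it cannot be read."""
    try:
        text = sysctl_path(key, root).read_text()
    except OSError:
        return None
    # Multi-value entries are tab separated; collapse to single spaces.
    return " ".join(text.split())


def report(keys=KEYS, root="/proc/sys"):
    """Build one 'name = value' line per setting."""
    lines = []
    for key in keys:
        value = read_sysctl(key, root)
        shown = value if value is not None else "(unavailable)"
        lines.append(f"{key} = {shown}")
    return "\n".join(lines)
```

Running `print(report())` gives output that can be pasted straight into a reply.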

Jack · February 3, 2026, 4:39pm

Yes, of course.

Jack · February 3, 2026, 4:39pm

Are the TSA daemon and kumod on the same server?
Yes, same server.

If not, are they both on the same version of Kumo?

What is the host profile? (CPU, RAM, Drive, NIC)

This Rocky Linux 9 system is running in a virtualized environment (QEMU/KVM) with:

8 vCPUs @ 2.4 GHz
88 GB RAM
1 TB virtual disk
1 active network interface (ens18)
CPU
Model: QEMU Virtual CPU version 2.5+
Vendor: GenuineIntel
Clock Speed: 2.4 GHz
Total Processors: 8
Cores per CPU: 2
Cache Size: 16 MB

Memory (RAM)
Total Memory: 88 GB (≈ 85 GB usable)
Free Memory: ~34 GB
Available Memory: ~70 GB

Jack · February 3, 2026, 4:40pm

Storage (Drive)
Device: /dev/sdb
Capacity: 1 TiB (1,099,511,627,776 bytes)
Disk Model: QEMU HARDDISK

Network (NIC)
Interface: ens18
State: UP
MAC Address: xx
MTU: 1500

Are there other services running on it that may be using ports 25 and 8008

Ports in use: 2525 and 8008

Are you also running Prometheus and Grafana on that host?
No, Prometheus runs on other intranet machines.