More granular memory usage metrics?

I'm looking for a more detailed breakdown of memory usage (for example, usage by Lua contexts, resident messages, etc.), but the metrics expose only memory_limit and memory_usage:

```
# HELP memory_limit soft memory limit measured in bytes
# TYPE memory_limit gauge
memory_limit 253277626368
# HELP memory_usage number of bytes of used memory
# TYPE memory_usage gauge
memory_usage 4799668224
```

Using kumod 2024.08.17-abab4a27

It’s handy to grep for HELP to find and explain metrics:

$ curl -s 'http://127.0.0.1:8000/metrics' | grep -E 'HELP (memory|lua|message)'
# HELP lua_count the number of lua contexts currently alive
# HELP lua_event_latency how long a given lua event callback took
# HELP lua_event_started Incremented each time we start to call a lua event callback. Use lua_event_latency_count to track completed events
# HELP lua_load_count how many times the policy lua script has been loaded into a new context
# HELP lua_spare_count the number of lua contexts available for reuse in the pool
# HELP memory_limit soft memory limit measured in bytes
# HELP memory_over_limit_count how many times the soft memory limit was exceeded
# HELP memory_usage number of bytes of used memory
# HELP message_count total number of Message objects
# HELP message_data_load_latency how long it takes to load message data from spool
# HELP message_data_resident_count total number of Message objects with body data loaded
# HELP message_meta_load_latency how long it takes to load message metadata from spool
# HELP message_meta_resident_count total number of Message objects with metadata loaded
# HELP message_save_latency how long it takes to save a message to spool
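
To go from the HELP text to actual numbers, the same endpoint can be parsed. Here's a minimal sketch, assuming the output is plain Prometheus text exposition format as in the sample above. `parse_metrics` and `usage_fraction` are hypothetical helper names, not part of kumod, and labeled series are simply skipped:

```python
# Sample copied from the memory_limit / memory_usage output quoted earlier.
SAMPLE = """\
# HELP memory_limit soft memory limit measured in bytes
# TYPE memory_limit gauge
memory_limit 253277626368
# HELP memory_usage number of bytes of used memory
# TYPE memory_usage gauge
memory_usage 4799668224
"""


def parse_metrics(text):
    """Return {name: value} for unlabeled samples.

    Comment lines (# HELP / # TYPE), blank lines, and labeled series
    such as name{label="x"} are skipped in this simplified sketch.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return values


def usage_fraction(metrics):
    """Fraction of the soft memory limit currently in use."""
    return metrics["memory_usage"] / metrics["memory_limit"]


metrics = parse_metrics(SAMPLE)
print(f"memory_usage is {usage_fraction(metrics):.2%} of memory_limit")
```

Against a live instance, the text could come from `urllib.request.urlopen("http://127.0.0.1:8000/metrics").read().decode()` instead of `SAMPLE`, and the same dict would then also carry lua_count, message_count, and the resident counts listed above.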

I saw those, but I meant something like a lua_memory_usage or message_data_resident_memory_usage metric.

That would help figure out what exactly is eating up RAM.

Maybe the problem is in a Lua script, maybe there are resident messages with very large bodies, etc.

The *_latency metrics help a great deal when troubleshooting slowness; the same kind of granular metrics for memory would help just as much with memory usage issues.

If you are using Prometheus, you can also use its Node Exporter to capture other system data, including RAM and disk usage.

Yeah, but it won’t tell you what’s causing kumod’s memory consumption.
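
Without per-component memory metrics, a rough workaround is to take two snapshots of the existing metrics and see which counts moved together with memory_usage. The sketch below assumes each snapshot is already a dict of metric name to value. `deltas` and `bytes_per_unit` are hypothetical helpers, and the numbers are made up for illustration, not real kumod measurements. This shows correlation only; it can't prove which component actually owns the memory.

```python
def deltas(before, after):
    """Per-metric change between two snapshots.

    Metrics missing from either snapshot are skipped.
    """
    return {name: after[name] - before[name] for name in before if name in after}


def bytes_per_unit(delta, counter):
    """Memory growth divided by the growth of one count.

    Returns None when that count did not change, since the ratio
    would be undefined.
    """
    if delta.get(counter, 0) == 0:
        return None
    return delta["memory_usage"] / delta[counter]


# Made-up illustrative values, NOT real kumod measurements.
before = {
    "memory_usage": 4_000_000_000,
    "lua_count": 10,
    "message_data_resident_count": 1_000,
}
after = {
    "memory_usage": 4_800_000_000,
    "lua_count": 10,
    "message_data_resident_count": 9_000,
}

delta = deltas(before, after)
for name in ("lua_count", "message_data_resident_count"):
    print(name, delta[name], bytes_per_unit(delta, name))
```

With these invented numbers, lua_count stays flat while message_data_resident_count grows along with memory, which would point toward resident message bodies rather than Lua contexts as the first thing to check.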