
High resource usage over time (memory leak?) #3446

Open
TruncatedDinoSour opened this issue Nov 28, 2024 · 28 comments

Comments

@TruncatedDinoSour

Background information

  • Dendrite version or git SHA: 0.13.8+79b87c7
  • SQLite3 or Postgres?: Postgres
  • Running in Docker?: No
  • go version: go version go1.23.0 linux/amd64
  • Client used (if applicable): SchildiChat, Element, Hydrogen, or Cinny are what most people on the HS use, I believe

Description

  • What is the problem: Dendrite's RAM usage grows steadily the longer it runs. This makes it not only a RAM hog but also a CPU hog, since I run zRAM (maybe that's also related to Dendrite using the CPU? I don't know). Regardless, every week or so I have to restart Dendrite because it keeps eating more and more RAM, degrading overall server performance.
  • Who is affected: The server.
  • How is this bug manifesting: As Dendrite runs long-term, it slowly consumes more and more RAM and/or swap.
  • When did this first appear: I can't recall. I don't remember needing to restart Dendrite before 1.18, I think.

Steps to reproduce

  • Run dendrite
  • Use it for a week or so
  • Watch the resource, mainly RAM, usage grow over time
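One low-effort way to watch that growth between restarts (an editorial sketch, not from the thread; it assumes the systemd unit is named dendrite.service, as shown later in this thread):

```shell
# Print a timestamped reading of the service's cgroup memory usage.
# systemd reports MemoryCurrent in bytes ("[not set]" if accounting is off).
log_mem() {
    printf '%s %s\n' "$(date -Is)" \
        "$(systemctl show -p MemoryCurrent --value "$1")"
}

# Sample once a minute until interrupted, e.g.:
#   while sleep 60; do log_mem dendrite.service; done >> dendrite-mem.log
```

Logging like this gives a growth curve to attach to a report instead of a single snapshot.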

This is its resource usage only two days after its most recent restart, and it only creeps up over time until I restart it:

[screenshot]

It's weird.

@TruncatedDinoSour (Author)

Had to restart it again. Could only last 3 days.

@TruncatedDinoSour (Author)

The s7evink/fetch-auth-events branch fixed it.

@TruncatedDinoSour (Author) commented Dec 9, 2024

Never mind, lol. It was fine for about 6 days and now it's bad again.

@neilalexander (Contributor)

No idea if zRAM is a setup that should be supported but without a memory profile it will be difficult to tell what’s going on.

https://element-hq.github.io/dendrite/development/profiling

@TruncatedDinoSour (Author) commented Dec 11, 2024

No idea if zRAM is a setup that should be supported but without a memory profile it will be difficult to tell what's going on.

It is zRAM, yeah.
But is profiling a good choice over, like, a week? Wouldn't the report be huge?
And even so, wouldn't it severely impact performance for the week? Is there a way to check this with minimal disruption?

edit:

[screenshot]

it's only growing :')

actually, since it's clearly growing a lot over a single day, I'm down to set up profiling tomorrow and report the day after :) I'll do that

@neilalexander (Contributor) commented Dec 11, 2024

You just need a single memory profile captured when the memory usage is high. That should contain enough info and the files are small.

Having profiling enabled has next to no runtime cost so it’s fine to have it switched on for a long time.
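As a sketch of what that capture can look like (editorial, not from the thread; it assumes Dendrite was started with its pprof listener enabled via the PPROFLISTEN environment variable described in the profiling docs linked above, and the address below is an example):

```shell
# Example pprof listen address; must match what Dendrite was started with
# (assumption: PPROFLISTEN=localhost:65432 per the profiling docs).
PPROF_ADDR="localhost:65432"
HEAP_URL="http://$PPROF_ADDR/debug/pprof/heap"

# Capture a single heap snapshot while memory usage is high:
#   curl -s "$HEAP_URL" -o heap
# Then inspect it locally with the Go toolchain's interactive web UI:
#   go tool pprof -http=:8081 heap
echo "capture with: curl -s $HEAP_URL -o heap"
```

The snapshot file is small, so it is practical to capture one each time usage looks high and compare them.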

@TruncatedDinoSour (Author) commented Dec 11, 2024

You just need a single memory profile captured when the memory usage is high. That should contain enough info and the files are small.

Having profiling enabled has next to no runtime cost so it’s fine to have it switched on for a long time.

Oh nice, okay then, I'll enable the profiler tomorrow, since for today I consider myself done and want the rest of the day/evening off xD

I'll send out a memory profile capture in 1-3 days in this thread :)

@TruncatedDinoSour (Author) commented Dec 11, 2024

OK, since it was just one environment variable, I enabled it now. I thought it might be more complicated, but no. I'll post the profile when the resource usage is high :)

@TruncatedDinoSour (Author) commented Dec 14, 2024

@neilalexander okay, I got the heap report. I don't know how safe it is to share heap reports; I've heard it's generally acceptable, but for now I'll stick to web UI screenshots.

It is at 2.5 GB of RAM usage at the moment according to systemd, or was at the time I took the report; it has dropped to 2.2 GB now (??):

● dendrite.service - dendrite
     Loaded: loaded (/etc/systemd/system/dendrite.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-12-12 21:13:13 UTC; 1 day 16h ago
   Main PID: 271785 (dendrite)
      Tasks: 20 (limit: 19144)
     Memory: 2.2G
        CPU: 2h 17min 46.905s
     CGroup: /system.slice/dendrite.service
             └─271785 /home/matrix/go/bin/dendrite --config dendrite.yaml

I had to restart Dendrite 2 days ago since it was refusing to write to the PostgreSQL database and both sending and receiving messages were slow :)

If you need the actual report in full, let me know.

[screenshot]

phony.run is somehow at 64%

[screenshot]

[screenshot]

something about SQL databases; I'm using Postgres, if it matters

[screenshot]

graph:

[screenshot]

[screenshot]

[screenshot]

I've now disabled pprof and restarted Dendrite. If you need more info, I can provide it, but for now I think this will be okay.

@TruncatedDinoSour (Author)

IIRC @jjj333-p has experienced a similar issue with Dendrite and may have extra input?

@neilalexander (Contributor)

Looks like it's struggling to process events. Can you try switching your instance to #3447 and see if it improves after a couple hours?

@TruncatedDinoSour (Author) commented Dec 14, 2024

Looks like it's struggling to process events. Can you try switching your instance to #3447 and see if it improves after a couple hours?

The PR or the s7evink/fetch-auth-events branch? I'm already on the branch, if that's what you're asking; I can try the PR, I guess.

edit: oh, the PR is literally a merge request to merge that branch into main.

edit 1: and yes, the branch drastically improves performance, though not the resource bug. It might have helped it partially at least, since it's not as brutal anymore.

@jjj333-p (Contributor)

I face this same issue. I can say that the fetch-auth-events PR did help a lot, but it's still not fixed. I also notice that after about a week of uptime it will be near OOM on my system (the full 4 GB of RAM + zRAM), and then I reboot and both the CPU and RAM usage are way down (10-15% average CPU, only 1-2 GB of RAM used). I also have all the caching I can disabled in the config. I don't know how helpful I can be beyond that, sorry.

@jjj333-p (Contributor) commented Dec 14, 2024

[screenshot]
Dendrite itself (after 2 days up) is using 1.8 GB of RAM, and the Postgres instance dedicated to just it and the sliding sync proxy is using another half a gig or more, depending on how busy it is.

Here it is after restarting the Dendrite service and sending a message in a large room:
[screenshot]

@jjj333-p (Contributor)

I'll also note that when rooms like the abandoned Genshin Impact room start backfilling in (thanks, t2bot, for never working correctly), it does max out the CPU and raise RAM somewhat, but not to that level. It's a gradual, over-time thing.

[screenshot]

@neilalexander (Contributor)

@TruncatedDinoSour Can you attach the full profile?

@TruncatedDinoSour (Author)

@TruncatedDinoSour Can you attach the full profile?

sure

heap.gz

@TruncatedDinoSour (Author)

@TruncatedDinoSour Can you attach the full profile?

sure

heap.gz

FTR, if it wasn't clear, this is gzipped xD, as in:

gzip -d heap.gz for the actual profile

I had to gzip it because GitHub complained about the raw file.

@TruncatedDinoSour (Author)

ISTG, I can't even run it for 3 days without it crying in agony, ugh.

@neilalexander (Contributor)

I saw it was gzipped, yeah. It looks a lot like the profile is just showing the server crunching some extremely complex room state. Are you a member of any very large or complex rooms?

@TruncatedDinoSour (Author)

I saw it was gzipped yeah, it looks a lot like the profile is just showing the server is crunching some extremely complex room state. Are you a member of any very large or complex rooms?

No, and I don't think anyone on the HS really is.

@neilalexander (Contributor)

Any interesting log entries happening at the time? Particularly level=error?

@TruncatedDinoSour (Author)

Any interesting log entries happening at the time? Particularly level=error?

Not that I can see in recent logs; some media stuff, but other than that, nothing of concern, I believe.

@TruncatedDinoSour (Author)

Any interesting log entries happening at the time? Particularly level=error?

not that i see from recent logs, some media stuff but other than that nothing worth of concern i believe

Turns out one user was a member of the matrix.org HQ room; maybe that's why.

@neilalexander (Contributor)

Try using the admin API evacuateRoom to force everyone out of that room and see if things calm down. (Afterwards you could try purging it too, just in case the room state is corrupt from former auth errors.)
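For reference, a hedged sketch of what those calls can look like against Dendrite's admin API (the homeserver URL, access token, and room ID below are placeholders; check your Dendrite version's admin API docs for the exact paths):

```shell
# Placeholder values; substitute your homeserver, an admin user's access
# token, and the room ID in question.
HS="https://localhost:8448"
TOKEN="syt_example_admin_token"
ROOM_ID='!abcdef:matrix.org'

# The room ID must be URL-encoded ('!' -> %21, ':' -> %3A):
ENCODED=$(printf '%s' "$ROOM_ID" | sed -e 's/!/%21/' -e 's/:/%3A/g')

# Kick all local users out of the room:
#   curl -X POST -H "Authorization: Bearer $TOKEN" \
#       "$HS/_dendrite/admin/evacuateRoom/$ENCODED"
# Then, optionally, purge it entirely:
#   curl -X POST -H "Authorization: Bearer $TOKEN" \
#       "$HS/_dendrite/admin/purgeRoom/$ENCODED"
echo "$ENCODED"
```

Both calls can take a while on rooms with large state, so don't be alarmed if they appear to hang.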

@TruncatedDinoSour (Author) commented Dec 16, 2024

Try using the admin API evacuateRoom to force everyone out of that room and see if things calm down. (Afterwards you could try purging it too, just in case the room state is corrupt from former auth errors.)

Don't worry, I'm running both evacuate and purge right now.

It's taking some time, but it's working.

Regardless, the resource usage has only been a problem recently, while that account has been there since the start of the homeserver, so I don't know.

Maybe it'll help? We'll see, I guess :D

@bones-was-here

If you "need" to host huge rooms on a system without much RAM you might benefit from disabling the in-memory cache:

global:
  cache:
    max_age: 0s

The cost is somewhat more work for postgres.

@TruncatedDinoSour (Author)

If you "need" to host huge rooms on a system without much RAM you might benefit from disabling the in-memory cache:

global:
  cache:
    max_age: 0s

The cost is somewhat more work for postgres.

I did, a while ago already.
