Information Overload - Beehaw style

PenguinCoder ( @Penguincoder@beehaw.org ) · edit-2 2 years ago

Information Overload - Beehaw style

BitOneZero ( @BitOneZero@beehaw.org ) · 2 years ago

This gets it’s own section. Look, the largest issue with Lemmy performance is currently the database. We’ve spent a lot of time attempting to track down why and what it is, and then fixing what we reliably can. However, none of us are rust developers or database admins. We know where Lemmy spends its time in the DB but not why and really don’t know how to fix it in the code. If you’ve complained about why is Lemmy/Beehaw so slow this is it; this is the reason.

There is a dedicated Lemmy community, !lemmyperformance@lemmy.ml

LallyLuckFarm ( @LallyLuckFarm@beehaw.org ) · 2 years ago

Thank you for all your hard work!

KlausVonLechland ( @KlausVonLechland@beehaw.org ) · 2 years ago

As someone technologically illiterate and new to Fediverse (hi all) I wonder if this is something you need to figure out to work around your own hardware to ptimise it or is this usuall thing with Lemmy? I guess the current events and influx of people also quite stress-tests various systems.

PenguinCoder ( @Penguincoder@beehaw.org ) · 2 years ago

It’s more a matter of needing to figure out how Lemmy operates, and working around it. Some of the inefficiencies are inherent with how Lemmy is designed, so it’s usual.

kool_newt ( @kool_newt@beehaw.org ) · edit-2 2 years ago

Is it possible to use Redis to help speed up DB queries?

I’m assuming the DB is a container too, containers (Docker) and overlay networks have overhead. There could be overhead in the way the DB accesses the storage devices as well. Look into running the DB as dedicated real server if possible, otherwise a dedicated VM and not a container.

You can also look into read-replicas of the DBs as I’d imagine there are way more DB reads than writes. Take your DB backups from a read replica (you can stop one of the read replicas to get a consistent DB backup without interrupting other reads and writes).

You can set up slow-query logging if you haven’t yet to find out the problematic queries so you know where to optimize (if optimizing queries is an option).

PenguinCoder ( @Penguincoder@beehaw.org ) · 2 years ago

Thanks, we have explored these options. The Lemmy DB runs as a container, yes. The overhead of Docker on it isn’t that much, it’s more the queries themself. We do not want to increase the cost and complexity of adding another server on just for the database. We have also explored multiple DB containers and doing connection pooling. Again, this only moves the problem it does not solve it.

douglasg14b ( @douglasg14b@beehaw.org ) · 2 years ago

Honestly surprised it isn’t using redis already 😧

Often end up plopping redis in as an ad-hoc caching layer pretty early during application development for backends that are expected to be load balanced. It’s super simple to use, has low resource costs relative to it’s load capacity, and solves for a lot of low hanging fruit as far as DB access performance goes.

Opinion:

It should definitely be a reasonably high/critical priority roadmap item 🤔. The time cost is negligible assuming your ecosystem has a decent redis library, if you’re an expert in the codebase (major/primary contributor) it can be as easy as a few days to do a cleanup (assuming redis/lib familiarity/docs) and knock out all the low hanging fruit. And the benefits can be enormous, like 10x, 50x load decreases enormous.

Alternatively:

Read replicas as @kool_noot said. If not, some dev work is required.

This can sometimes work as a quick fix to address application perf problems without adding infrastructure, but time cost is more or less based on codebase quality & conventions, since you’ll be touching a lot more queries to make this change. And you’ll need to slap in a config that handles deployments without a read replica.

Then users of Lemmy could have as many read replicas as they want behind a load balancer/proxy which lets them scale in that direction going forward.

This is actually a common solution for read performance anyways.

douglasg14b ( @douglasg14b@beehaw.org ) · edit-2 2 years ago

Unfortunately you can only get so much out of config changes if the problems lie in access patterns 🫤

DB performance problems are very typical of ORM usage, which Lemmy appears to use. Though I’m not sure to what extent.

Not necessarily the ORM itself, but the database access patterns it encourages. If care is not taken to ensure performant hot paths receive more SQL and caching love, you end up with systemic performance problems.

Endemic to the habits of the devs, not to specific queries or one particular workload. But by a broad set of generally unperformant patterns that may not individually be a problem, but become one as a whole.

🤔

I also don’t code in rust unfortunately ,but I definitely understand ORM usage and how it can bite you, but I quite enjoy using them. So I’m not admonishing the choice.

Information Overload - Beehaw style

Information Overload - Beehaw style

Improving Beehaw

Details:

Details:

Details:

Details:

THE DATABASE

Details

The end?