Chris Remington ( @remington@beehaw.org ) · 3 years ago

I can fucking relax a little bit

argv_minus_one ( @argv_minus_one@beehaw.org ) · 3 years ago

This must be the one. Those are some monster queries.

I’m no database expert, but I wonder if it would be wise to break those up into multiple queries instead of joins. Joining post with person and community would result in a ton of duplicate data, wouldn’t it?

I’m actually interested in what people have to say about this, because I have a project that’s kind of sensitive to database query performance, and I’m worried that I’ll find out about some performance bottleneck the hard way, like Beehaw just did. The more I learn about the subject before my project goes to production, the better!

veroxii ( @veroxii@lemmy.ml ) · 3 years ago

No, joins are always faster. If you ultimately need to combine the data for the app, the database will be faster than your code can do it, since that’s what it was built to do.
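To make the join-versus-separate-queries comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The post/person/community tables and their columns are hypothetical stand-ins, not Lemmy's actual schema. Because each post has exactly one creator and one community, the join returns one row per post. The person and community columns are repeated across rows, but the row count does not multiply. The alternative runs one extra query per related row (the "N+1" pattern).

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    # Hypothetical, simplified schema (not Lemmy's real one)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE community (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE post (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            creator_id INTEGER NOT NULL REFERENCES person(id),
            community_id INTEGER NOT NULL REFERENCES community(id)
        );
        INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO community VALUES (1, 'technology'), (2, 'music');
        INSERT INTO post VALUES
            (1, 'First post', 1, 1),
            (2, 'Second post', 2, 1),
            (3, 'Third post', 1, 2);
    """)
    return conn

def posts_with_join(conn: sqlite3.Connection) -> list:
    # One query: the database matches the rows itself.
    return conn.execute("""
        SELECT post.id, post.title, person.name, community.name
        FROM post
        JOIN person ON person.id = post.creator_id
        JOIN community ON community.id = post.community_id
        ORDER BY post.id
    """).fetchall()

def posts_with_n_plus_one(conn: sqlite3.Connection) -> list:
    # 1 query for the posts, then 2 more per post: 1 + 2N queries in total.
    rows = []
    posts = conn.execute(
        "SELECT id, title, creator_id, community_id FROM post ORDER BY id"
    ).fetchall()
    for post_id, title, creator_id, community_id in posts:
        (creator,) = conn.execute(
            "SELECT name FROM person WHERE id = ?", (creator_id,)).fetchone()
        (community,) = conn.execute(
            "SELECT name FROM community WHERE id = ?", (community_id,)).fetchone()
        rows.append((post_id, title, creator, community))
    return rows

if __name__ == "__main__":
    conn = build_db()
    for row in posts_with_join(conn):
        print(row)
```

Both functions return the same rows. With an in-process database like SQLite the extra queries are cheap, but against a networked server such as PostgreSQL each one adds a round trip. That overhead is usually where the join pulls ahead.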

argv_minus_one ( @argv_minus_one@beehaw.org ) · 3 years ago

Any idea why those queries are slow, then, if not because of all the duplicate data? Missing indices or something?

veroxii ( @veroxii@lemmy.ml ) · 3 years ago

Looking at the query, I think it only returns a single row per post, so it's not really duplicate data. It all looks very straightforward, and you’d think all the “_id” and “id” columns are indexed.

I asked for an EXPLAIN ANALYZE plan to see what really happens and where the most time is spent.

If it’s indexes, we’ll see quickly. Strangely, it might be the WHERE clause; I’m not sure what Hot_rank()'s implementation is. But we’ll find that out too if we can get the plan timings. Without looking at the numbers, it’s all just guessing.
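PostgreSQL's EXPLAIN ANALYZE runs the query and reports actual timings per plan node. SQLite has no direct equivalent, but its EXPLAIN QUERY PLAN shows, without executing anything, whether an index will be used. The sketch below uses Python's built-in sqlite3 with a hypothetical post table and a hypothetical index name. It illustrates two generic suspects: a missing index, and a WHERE clause that wraps the column in a function so a plain index on that column can't be used. It says nothing about how the ranking function in the real query is implemented.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail);
    # keep only the human-readable "detail" column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, community_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO post (community_id, score) VALUES (?, ?)",
    [(i % 50, i % 7) for i in range(1000)],
)

# Selecting score as well means the index below is not a covering index.
query = "SELECT id, score FROM post WHERE community_id = ?"
before = plan(conn, query, (3,))   # no index yet: full table SCAN

# Hypothetical index on the "_id" column.
conn.execute("CREATE INDEX idx_post_community ON post (community_id)")
after = plan(conn, query, (3,))    # now a SEARCH using idx_post_community

# Wrapping the column in a function hides it from the plain index.
wrapped = plan(conn, "SELECT id, score FROM post WHERE abs(community_id) = ?", (3,))

print("before: ", before)
print("after:  ", after)
print("wrapped:", wrapped)
```

On PostgreSQL the counterpart is `EXPLAIN (ANALYZE, BUFFERS)` followed by the query, which also reports actual row counts and timings. Both databases support indexes on expressions, which is the usual fix when a WHERE clause has to filter on a computed value.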

And I can’t run them myself, since I don’t have access to a busy instance with their amount of production data. That’s the thing about databases: what runs fast in dev doesn’t always translate to real workloads.

argv_minus_one ( @argv_minus_one@beehaw.org ) · 3 years ago

That’s the thing about databases: what runs fast in dev doesn’t always translate to real workloads.

Yeah, that’s what really scares me about database programming. I can have something work perfectly on my dev machine, but I’ll never find out how well it works under a real-world workload, and my employer really doesn’t like it when stuff blows up in a customer-visible way.

I decided to write a stress-test tool for my project that generates a bunch of test data and then hits the server with far more concurrent requests than I expect to see in production any time soon. Sure enough, the first time I ran it, my application crashed and burned just like Beehaw did. Biggest problem: I was using serializable transactions everywhere, and with lots of concurrent requests, they’d keep failing and retrying over and over, never making progress.
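Serializable transactions are expected to fail under contention, so callers have to retry them. The trouble starts when retries happen immediately and without limit. A common mitigation is to retry a bounded number of times with exponential backoff and random jitter, so conflicting transactions spread out instead of colliding again. This is a minimal, database-agnostic sketch. `SerializationError` stands in for whatever your driver raises for SQLSTATE 40001 (psycopg, for example, raises `SerializationFailure`). `run_with_retry`, its parameters and `flaky_transfer` are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class SerializationError(Exception):
    """Stand-in for a driver's serialization-failure error (SQLSTATE 40001)."""

def run_with_retry(
    txn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run txn, retrying serialization failures with capped exponential backoff.

    txn must open and commit its own transaction, so every attempt starts fresh.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationError:
            if attempt == max_attempts:
                raise  # give up and let the caller surface the error
            # "Full jitter": sleep a random time in [0, min(max_delay, base * 2^(attempt-1))]
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")

# Usage: a hypothetical transaction that conflicts twice before committing.
calls = {"n": 0}

def flaky_transfer() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationError("could not serialize access")
    return "committed"

print(run_with_retry(flaky_transfer, sleep=lambda s: None))  # committed
```

Backoff only spreads retries out. If most transactions touch the same rows, reducing that contention (or using a weaker isolation level where it is provably safe) is what actually lets them make progress.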

That’s a lesson I’m glad I didn’t learn in production…but it makes me wonder what lessons I will learn in production.

darkfoe ( @darkfoe@lemmy.serverfail.party ) · 3 years ago

This is why I love canary and mirror releases when feasible. They’re hard to do with some projects, though.
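One common way to run a canary release is to send a small, stable fraction of traffic to the new version, keyed on something like a user ID so each user consistently sees the same version. Here is a minimal sketch, assuming a hypothetical `choose_backend` router and a 5% canary share. A mirror (shadow) release works differently: every request is also copied to the new version and its responses are discarded.

```python
import hashlib

def bucket(key: str, buckets: int = 10_000) -> int:
    # Stable hash: the same key always lands in the same bucket, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def choose_backend(user_id: str, canary_percent: float) -> str:
    """Route about canary_percent% of users to the canary, consistently per user."""
    return "canary" if bucket(user_id) < canary_percent * 100 else "stable"

if __name__ == "__main__":
    routes = [choose_backend(f"user-{i}", canary_percent=5) for i in range(10_000)]
    print("canary users:", routes.count("canary"))
```

Raising `canary_percent` step by step moves more users over. Because the buckets are stable, users who were already on the canary stay there as the share grows.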

darkfoe ( @darkfoe@lemmy.serverfail.party ) · 3 years ago

I’m not a dev on the project myself, and I haven’t studied that query enough to know, but yeah, they are some monster queries. I’d have to fire up pgAdmin and try them out on my personal instance to understand them better.

But as for your curiosity: I had an issue with a microservice at my job that is very sensitive to database latency (it makes one call, at roughly 600 requests per second on average and up to 1200 during spikes). We solved an issue with some of the joins by creating a materialised view for the data we knew didn’t change more than once per day, and scheduled it with pg_cron to refresh concurrently (concurrently being key, so we don’t lock out reads). That reduced our query times significantly: down to milliseconds, versus up to 20 seconds before.
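For anyone who wants to try this approach, here is a sketch of the PostgreSQL statements involved. They are wrapped in a small Python helper so each piece is easy to see. The view name, query, key column, job name and schedule are all hypothetical. Two details matter here. First, `REFRESH MATERIALIZED VIEW CONCURRENTLY` only works on a populated view that has at least one unique index on plain column names with no WHERE clause. Second, pg_cron's `cron.schedule(job_name, schedule, command)` takes a standard cron expression.

```python
def materialized_view_sql(view: str, select_sql: str, key_column: str,
                          job_name: str, schedule: str) -> list:
    """Return PostgreSQL statements for a materialized view refreshed by pg_cron.

    Names are interpolated directly into SQL, so only pass trusted constants.
    """
    return [
        # Populated immediately (WITH DATA is the default), which CONCURRENTLY needs.
        f"CREATE MATERIALIZED VIEW {view} AS {select_sql};",
        # CONCURRENTLY requires a unique index on plain column(s), without WHERE.
        f"CREATE UNIQUE INDEX {view}_{key_column}_key ON {view} ({key_column});",
        # pg_cron: cron.schedule(job_name, cron_expression, command)
        f"SELECT cron.schedule('{job_name}', '{schedule}', "
        f"'REFRESH MATERIALIZED VIEW CONCURRENTLY {view}');",
    ]

if __name__ == "__main__":
    for stmt in materialized_view_sql(
        view="community_stats_daily",            # hypothetical view
        select_sql="SELECT community_id, count(*) AS posts "
                   "FROM post GROUP BY community_id",
        key_column="community_id",               # unique thanks to GROUP BY
        job_name="refresh-community-stats",      # hypothetical job name
        schedule="0 4 * * *",                    # daily at 04:00, pg_cron's time zone
    ):
        print(stmt)
```

A concurrent refresh is slower than a plain one because it diffs the old and new contents. In exchange, readers keep seeing the previous snapshot instead of blocking while the refresh runs.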

It really boils down to how often a given piece of data needs to change, so you can work out some way of caching it.
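The same trade-off applies in application code: if you know how stale a value is allowed to be, you can cache it for that long. Here is a minimal in-process time-to-live (TTL) cache sketch. `TTLCache`, its parameters and `load_community_stats` are hypothetical names. The clock is injectable so the expiry is easy to test, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")

class TTLCache(Generic[V]):
    """Caches loader results per key for ttl_seconds (not thread-safe)."""

    def __init__(self, loader: Callable[[Hashable], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]           # still fresh: no database hit
        value = self._loader(key)     # missing or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value

# Usage with a fake clock and a hypothetical expensive query.
fake_now = [0.0]
loads = []

def load_community_stats(community_id):
    loads.append(community_id)
    return {"community_id": community_id, "posts": 42}

cache = TTLCache(load_community_stats, ttl_seconds=86_400, clock=lambda: fake_now[0])
cache.get(1)
cache.get(1)              # served from the cache
fake_now[0] = 86_401.0
cache.get(1)              # a day later: expired, so reloaded
print(len(loads))         # 2
```

A per-process cache like this keeps a separate copy in every instance of a service. A shared view or table refreshed on a schedule, as described above, keeps a single copy in the database instead.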

AnotherOverHeaven ( @AnotherOverHeaven@beehaw.org ) · 3 years ago

deleted by creator

darkfoe ( @darkfoe@lemmy.serverfail.party ) · 3 years ago

Plan to take a peak at some of the open issues on github when I get a solid evening over the next week or two. I really like the federated setup so figure I’ll contribute as much as possible.

AnotherOverHeaven ( @AnotherOverHeaven@beehaw.org ) · 3 years ago

deleted by creator