PSA: Pictures are back!

Lionir [he/him] ( @Lionir@beehaw.org ) · edit-2 2 years ago

PSA: Pictures are back!

PenguinCoder ( @Penguincoder@beehaw.org ) · 2 years ago

Well that was fun.

It didn’t go as planned, of course, so we restored from the backups taken before the migration attempt. Thank you for your patience while we try to get all these moving parts working well together. Sorry for the troubles.

The Cuuuuube ( @Cube6392@beehaw.org ) · 2 years ago

I once caused an AWS outage that impacted 20% of their customers in their largest region. They called my manager to ask why we were performing around 10k writes per second to a bucket. It was fun times

Huggernaut ( @Huggernaut@beehaw.org ) · 2 years ago

They don’t limit that?! I’ve worked with a lot of AWS services and most have built in rate limits. That’s wild lol

The Cuuuuube ( @Cube6392@beehaw.org ) · 2 years ago

They do now…

longshaden ( @longshaden@beehaw.org ) · 2 years ago

lol, that’s how rules get made

The Cuuuuube ( @Cube6392@beehaw.org ) · 2 years ago

I can get into it in more detail if anyone’s interested. But basically, they had a rate limit on direct writes, but not a rate limit on cross bucket replication if you connected many buckets to replicate into a single bucket
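A minimal sketch of how a gap like that can arise, assuming each source is rate limited on its own while nothing caps the combined traffic arriving at one destination. The class, the function, and all numbers here are hypothetical illustrations, not AWS's actual limits or implementation.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simple token bucket: `rate` tokens refill per second, capped at `capacity`."""
    rate: float
    capacity: float
    tokens: float = 0.0

    def refill(self, seconds: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + self.rate * seconds)

    def try_write(self) -> bool:
        # A write is allowed only if a whole token is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def writes_reaching_destination(sources: int, per_source_rate: float, seconds: int) -> int:
    """Count writes landing in one destination when every source has its own limit
    and there is no limit on the destination itself (the assumed gap)."""
    buckets = [TokenBucket(rate=per_source_rate, capacity=per_source_rate)
               for _ in range(sources)]
    total = 0
    for _ in range(seconds):
        for bucket in buckets:
            bucket.refill(1.0)
            while bucket.try_write():
                total += 1
    return total


# Hypothetical numbers: one limited writer versus 100 limited sources
# all funnelling into the same destination for one second.
single = writes_reaching_destination(sources=1, per_source_rate=100, seconds=1)
fan_in = writes_reaching_destination(sources=100, per_source_rate=100, seconds=1)
```

Each source stays within its own limit, yet the destination sees the sum of all of them, which is why a limit on the receiving side is needed as well.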

ButterBiscuits ( @ButterBiscuits@beehaw.org ) · 2 years ago

Right? Don’t feel bad, you found a “vulnerability” and now you’re a hero

mitch ( @msprout@beehaw.org ) · 2 years ago

That was you?!

(jk, I’m on a different cloud 😂)

l4sgc ( @l4sgc@beehaw.org ) · 2 years ago

Thanks for the update, and I hope you have less trouble in the future! Don’t worry about the downtime. I really appreciate that here it’s serving a clear purpose, unlike Twitter lol

Artemisia ( @artemisia@beehaw.org ) · 2 years ago

Loads of love. There’s always ASCII art.

bassdruminphonebox ( @bassdruminphonebox@beehaw.org ) · 2 years ago

I appreciate the late night efforts and the clear communication. For me, Beehaw is a positive place I can visit, but there are other things I can do too, and I have no need for many 9s of uptime here. (I’m trying to reduce any pressure you & others might feel; perhaps I haven’t communicated that well, hence this addition.)

knittedmushroom ( @knittedmushroom@beehaw.org ) · 2 years ago

Every technical bump in the road we hit now is one we won’t hit/will know how to handle quickly in the future! Thank you for doing what you do for Beehaw!

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

Yeah, moving to object storage is best to do now. Arguably, we should’ve done it sooner since the longer we’ve waited, the more it was gonna catch up to us and cost us in time and money.

interolivary ( @interolivary@beehaw.org ) · 2 years ago

I’d imagine the list of things you should do RIGHT NAO is pretty long though and there’s only 24h per day 😅

AndrewZabar ( @AndrewZabar@beehaw.org ) · 2 years ago

I concur. A minor inconvenience on occasion is a small price to pay for your amazing efforts! Thank you for doing what you do.

alehel ( @alehel@beehaw.org ) · 2 years ago

Surprised beehaw hosts images at all. Sounds like that could become very expensive very quickly.

douglasg14b ( @douglasg14b@beehaw.org ) · edit-2 2 years ago

It could, and will. Hopefully they are taking advantage of CDNs for image delivery so they aren’t paying high egress costs and can keep the images in slow, cheap storage.

I’m honestly surprised that Lemmy hasn’t embraced distributed community hosting. Many existing niche communities (outside of Lemmy) let others run their service to serve up images and media, or to act as workers for computationally expensive operations like compression or encoding (which will also save you a ton of space). Some even gamify it, as in the case of e-hentai.

Hard drives (Tapes even more) at home/office are incredibly cheap compared to cloud storage costs (even including networking, server, redundancy…etc hardware costs), but come with reliability concerns, which is where a distributed community becomes critical. Though you’ll always have to have them stored somewhere like Backblaze B2, or somewhere slower/cheaper/frozen to ensure safety.

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

We’ll definitely be using a CDN to help avoid high egress costs.

greenskye ( @greenskye@beehaw.org ) · 2 years ago

I feel like Lemmy definitely needs to embrace distributed computing in some fashion. I have no interest in hosting my own instance, but I’m not against running a docker image that would offload some of the processing requirements large instances need. It would just need to be relatively straightforward for me to set up

The Cuuuuube ( @Cube6392@beehaw.org ) · 2 years ago

Distributed computing isn’t really a good fit for computationally light tasks like forum software. It’s good for heavy calculations like “Could you please fold proteins to see if there’s any interesting stuff to be found” and “Here are 50 years of radio data. See if any of it is anomalous.” You need a sufficiently complex, long-running task to warrant the computational overhead of a supervisor process assigning tasks and receiving their outputs. LLMs, epigenetics, and deep space analysis are all good candidates for distributed computing. Lemmy is more of a candidate for an autoscaling clustered multi-tenant approach. The computational tasks are basic, but there’s a lot of them. Further, the computational needs are not constant. A fantastic case study for making the most of resources in the Fediverse is mastodon.world and lemmy.world running on the same server and making scale up and scale down requests to the Docker daemon. The ideal world topology, in my opinion, for a Fediverse application ecosystem would be a Kubernetes cluster with three supervisor nodes and a minimum of two worker nodes, all with autoscaling enabled. The idea would be that your database resources can hold multiple databases (Lemmy, Mastodon, Peertube) AND can scale. The mechanism you would use to do this would depend on your hosting decisions.
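As a rough illustration of the scale up and scale down idea above, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name, the replica bounds, and the metric values are assumptions for illustration; a real autoscaler also applies tolerances and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 2,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to hypothetical lower and upper bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical load figures (for example, average CPU percent per replica):
busy_evening = desired_replicas(current_replicas=2, current_metric=90, target_metric=60)
quiet_night = desired_replicas(current_replicas=4, current_metric=15, target_metric=60)
traffic_spike = desired_replicas(current_replicas=4, current_metric=600, target_metric=60)
```

The clamp is what keeps a quiet instance from scaling to zero and a spike from scaling without bound; where those bounds sit is a cost decision rather than a technical one.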

Digression now on database solutions. There are three basic ways I could see running the perfect Fediverse database cluster. The first, and least beholden to any given cloud provider, is to run Postgres in a Kubernetes cluster either on a single machine emulated cluster at your house, or within several clustered machines. The upside to this is that no one but you controls your infrastructure. The downside is that your ability to scale is hard capped to the amount of RAM and CPU resources you physically have in your house. Next would be a similar set-up on a hosted Kubernetes cluster through a cloud provider such as Google, Microsoft, IBM, or AWS. The downside here is that tech giants are all, for various reasons, shit. Google has the best eco-friendliness score, so they’re listed first. They’re still shit, though, and one of the platforms I’m suggesting hosting is a direct competitor to one of their golden goose products.

Your next option is to just pay one of those cloud providers to host a database cluster for you, rather than using an ad hoc Kubernetes cluster solution. It will cost you more money, but the tools available to you for managing databases through these cloud providers are much better. In terms of user experience and performance, this is a clear upgrade over hosting your databases on your Kubernetes cluster. The final option I’d want to talk about is called “Aurora Serverless.” So far, I’ve only discussed ways you can scale up to meet demand, but Aurora Serverless allows you to scale down. This will be the cheapest option if you run a small instance with clear peaks and valleys of load. It’s not the best answer for a user like Beehaw, but would come with the lowest cost in terms of management and money for someone running an instance for a low number of people.

So, does that solve the image hosting problem? No. Not really. Postgres is TERRIBLE for image hosting. Right now, Beehaw is, per my understanding, using the simplest image storing solution, which is “Just keep it on the server.” This is great for a first pass at hosting a web service, and will remain fine long term for a low user instance, but will quickly run into issues with any instance that hosts numerous users uploading pictures. Basically, a single server has finite disk space. When it fills up, the only solution is to bring the service down and put in bigger disks. Eventually, you reach the upper limit of how big disks are manufactured, and how many disks you can attach via the interfaces that connect to a motherboard. A much better solution, and in fact the best solution, is what Beehaw is implementing right now: object storage. If I’m going to tie all of this back to the DIY “I’m a strong independent Fediverse citizen, and I don’t need no corporations” approach first, I’ll start by recommending Ceph. Ceph can run on Kubernetes and will provide object storage backed by Kubernetes persistent volumes. But more likely, you will want to aim for something with effectively unlimited storage, and your only real options for that currently are the cloud providers. You don’t have to worry about disks running out of space, and they do not charge you very much money.

I get where you’re coming from, though. “How do we all own the images so that the instances don’t run out of space but without being beholden to the corporations who own the storage?” The closest we come right now is peer 2 peer solutions, but all of them have a discovery and durability problem. In terms of discovery, the problem is “how does a server providing the Lemmy service find the peer 2 peer hosted files?” There’s no way to perform get object operations to serve the files via HTTP other than for the host server to fetch (download) the file from the peer 2 peer network and then deliver that to the user who made the request. The problem with this is that the server synced the file to its local storage, and is now hosting it, thus defeating the purpose of the peer 2 peer hosting solution. The other problem, the durability problem, is what happens when a low number of people are interested in an image, and the last person online hosting the image closes their laptop. Now no one can get the image as there was never a canonically available version of the file. The only solutions that I know of that come close to solving these problems right now are Nostr and Secure Scuttlebutt. There are major issues with these protocols as they stand right now. Firstly, people already find joining the Fediverse too hard. For Nostr you have to generate a cryptographic key pair to create your identity. This isn’t… horrible, but it definitely takes some work and some doing. You have to generate the keys and then load them into your Nostr client. Secure Scuttlebutt is based on a protocol where to follow someone, someone has to invite you to follow them. People already complain about Beehaw asking you a question about what you like about Beehaw to make sure you read the rules. Imagine the frustration with a pure invite only social network where you can’t join until someone you know has joined.

The second problem is moderation. Secure Scuttlebutt is fine for this. You only ever follow people you like, you only ever see updates from people you like. Fantastic. Nostr has basically no moderation at all. If you’ve spent any time at all on the internet, you’ve probably realized by now that this is TERRIBLE. My time on Nostr was basically opening the app, seeing an entire feed full of pro-Russian propaganda, and then uninstalling it. I do think there’s something to be said for the idea of a pure peer 2 peer social network, but I don’t think we’re anywhere close to implementing it yet. So, where does that leave us?

The Fediverse. It was designed for a distributed governance system in which each instance acts as its own country with its own rules and governance, and it accidentally has some pretty neat clustering features that help it perform better under heavy load and keep data more permanent and durable. I want to emphasize that, too. The current computational and architectural benefits of the Fediverse are accidental. They’re side effects of the distributed governance, not the core purpose. I don’t expect anyone to put focus into enhancing these aspects of the Fediverse, at least not for a while. We’re much more likely to see someone design a community based social network from the ground up on peer to peer technologies. I’d be excited about that, but it will need to have more open signups than Secure Scuttlebutt, and moderation tools like… at all, unlike Nostr. The most likely solution for the latter would be collaborative blocklists. Maybe me and two of my friends have a shared view of what is and isn’t hate speech. So, we all spend some time just blocking the shit out of users. But no single one of us writes the block list; the block list itself is a peer 2 peer distributed construct, so we don’t all have to reach consensus on every “Hey, was this guy being a jerkass?”
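One way the collaborative blocklist idea could be sketched, assuming each peer keeps its own set of blocked account names and everyone simply applies the union of the sets they have received. The function names and account names are hypothetical; this is not how Nostr, Secure Scuttlebutt, or Lemmy implement anything.

```python
from typing import Iterable, List, Set, Tuple


def merge_blocklists(blocklists: Iterable[Set[str]]) -> Set[str]:
    """Union of every peer's blocklist. Union is commutative, associative, and
    idempotent, so peers can exchange lists in any order, any number of times,
    and still converge on the same result without voting on each entry."""
    merged: Set[str] = set()
    for blocklist in blocklists:
        merged |= blocklist
    return merged


def visible_posts(posts: List[Tuple[str, str]], blocked: Set[str]) -> List[Tuple[str, str]]:
    """Keep only (author, text) posts whose author is not on the merged list."""
    return [(author, text) for author, text in posts if author not in blocked]


# Hypothetical data: three friends each block a few accounts on their own.
mine = {"spammer1", "troll7"}
friend_a = {"troll7", "propaganda_bot"}
friend_b = {"spammer1"}

shared = merge_blocklists([mine, friend_a, friend_b])
feed = visible_posts(
    [("troll7", "bait"), ("alice", "bean pics"), ("propaganda_bot", "...")],
    shared,
)
```

A grow-only union like this never needs agreement, but it also cannot unblock anyone; supporting removals would need something more elaborate than this sketch.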

interolivary ( @interolivary@beehaw.org ) · 2 years ago

Lemmy definitely needs to embrace distributed computing in some fashion

It would just need to be relatively straightforward for me to set up

Pick one.

douglasg14b ( @douglasg14b@beehaw.org ) · edit-2 2 years ago

It can be though? Sites & services have been doing this for decades now. My example of e-hentai using distributed workers hosted on users’ machines (given they pass the networking & storage requirements) to serve up images is one of those.

The problem is the bulk of the work is on Lemmy developers to design such a solution, and then together with the FOSS community, make it accessible.

Media is the low hanging fruit, and has largely been a solved problem for quite some time. It even has semi-functional fediverse solutions. Distributed workers for encoding and compressing media are also a solved problem, and in many cases this has been made as easy as downloading an executable or spinning up a Docker container.

So, yes, for a set of workloads you don’t have to choose. And haven’t had to for years.

Actual distributed transactional workloads are a whole other beast, and a problem that needs solving if we ever want to have robust and survivable communities that can deal with scaling issues without risk of dying because of a lack of funds or because someone ran off with the funds.

interolivary ( @interolivary@beehaw.org ) · 2 years ago

The problem is the bulk of the work is on Lemmy developers to design such a solution, and then together with the FOSS community, make it accessible.

Exactly so, that’s what I was thinking with my joke: no current solution exists, and were one to exist it’d probably take a while for it to actually be easy to run.

Chris Remington ( @remington@beehaw.org ) · 2 years ago

I can’t upload my puppy and flower pics!!! Fucking damn you!!! WTF did I sign up for!!!?!?!??!

PenguinCoder ( @Penguincoder@beehaw.org ) · 2 years ago

You can always host your own instance…

duck

jabib (he/him) ( @jabib@beehaw.org ) · 2 years ago

<bender.jpg> Caption: I’ll just make my own Beehaw - with blackjack and hookers!

TheOtherJake ( @TheOtherJake@beehaw.org ) · 2 years ago

<hookers.jpg> Caption: very NSFW content

Pete Hahnloser ( @Powderhorn@beehaw.org ) · 2 years ago

Thanks for everything y’all do to keep Beehaw afloat!

metaltoilet ( @metaltoilet@beehaw.org ) · edit-2 2 years ago

https://postimages.org/ :)

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

But then I have to click links!

metaltoilet ( @metaltoilet@beehaw.org ) · 2 years ago

There’s a direct embed feature. I’ve used it for everything I’ve posted to reduce load on the servers.

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

Well, we do proxy the image, so it’s just saving on storage costs, which after this move will be very cheap: $5/TB/month.

Good to know for now though :)
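For a sense of scale, here is the arithmetic behind that figure as a small sketch. The $5/TB/month rate comes from the comment above; the stored volumes are made-up examples, and egress or transaction fees are deliberately left out because the thread gives no numbers for them.

```python
def monthly_storage_cost(stored_tb: float, price_per_tb: float = 5.0) -> float:
    """Storage-only cost in dollars per month at a flat per-TB rate."""
    if stored_tb < 0:
        raise ValueError("stored_tb must be non-negative")
    return stored_tb * price_per_tb


def yearly_storage_cost(stored_tb: float, price_per_tb: float = 5.0) -> float:
    """Twelve months at a constant stored volume (a simplification)."""
    return 12 * monthly_storage_cost(stored_tb, price_per_tb)


# Hypothetical volumes, not Beehaw's actual usage:
half_tb_month = monthly_storage_cost(0.5)
two_tb_year = yearly_storage_cost(2)
```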

psudo ( @psudo@beehaw.org ) · 2 years ago

Which object store did you go with for that price? It’s been a while since I looked, but I remember them being more than that.

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

We’ve chosen Backblaze B2, it’s one of the cheaper options. Wasabi has similar pricing.

Haatveit ( @Haatveit@beehaw.org ) · 2 years ago

Not that anyone asked, but as a long time user, Backblaze is great. Good choice I think!

_MusicJunkie ( @_MusicJunkie@beehaw.org ) · 2 years ago

Would be cool to know the traffic costs a few months after y’all have implemented object storage.

aka_oscar ( @aka_oscar@beehaw.org ) · 2 years ago

Does that mean we can now upload images directly to beehaw? I mean we always could but it wasn’t the ideal option. It sounds like storage is not a user concern anymore.

Gaywallet (they/it) ( @Gaywallet@beehaw.org ) · edit-2 2 years ago

Maybe if you’re regularly donating 😉

seriously tho we’ll turn off image hosting if it becomes an issue, but being aware that there is a cost to hosting images on this site is never a bad thing

PenguinCoder ( @Penguincoder@beehaw.org ) · 2 years ago

Look here…

douglasg14b ( @douglasg14b@beehaw.org ) · edit-2 2 years ago

Also unlikely to actually keep pictures around… Worst thing is seeing a post with a picture that is critical to the information in the post, and that picture has long been purged from the host…

metaltoilet ( @metaltoilet@beehaw.org ) · 2 years ago

Good point

TheOtherJake ( @TheOtherJake@beehaw.org ) · 2 years ago

They will ban your IP if you direct link to images and bypass their tracker links. If you use a VPN, be sure to clear your cache before changing your IP or the ban may carry over.

metaltoilet ( @metaltoilet@beehaw.org ) · 2 years ago

How come they give a direct embed link then?

2 years ago

?? Been using them for a couple years always direct link and never had a problem?

TheOtherJake ( @TheOtherJake@beehaw.org ) · 2 years ago

Maybe it’s the VPN I am using. I dunno, I am definitely on a blacklist for some reason

dr_catman ( @dr_catman@beehaw.org ) · 2 years ago

Thank you for making it possible to share endless pictures of beans in the future! It will never get old.

Beans, beans, beans, more beans, perhaps a cat, beans, beans, never gets old!, beans.

_MusicJunkie ( @_MusicJunkie@beehaw.org ) · 2 years ago

Are you Beanus from digitiser?

worfamerryman ( @worfamerryman@beehaw.org ) · 2 years ago

Beehaw, Lemmy.ml, and the mastodon instance I use were all down at the same time. I thought it was some nefarious Meta plot 😂

Turns out it was object storage for beehaw and masto. I’m not sure about Lemmy.ml

average650 ( @average650@beehaw.org ) · 2 years ago

Just to be clear, this is just a move of the images, and they will be back, correct? Just a temporary measure?

interolivary ( @interolivary@beehaw.org ) · 2 years ago

Yep, they’re moving pictures to a service where it’s cheaper to store them rather than keeping them on the server’s hard drive

cyberdecker ( @cyberdecker@beehaw.org ) · 2 years ago

No worries on the short notice, thank you for the heads up! Sincerely appreciate the transparency.

dandelion ( @dandelion@beehaw.org ) · 2 years ago

You guys are the best!

I did an ADHD, and misread as you saying you were turning off pictures for good, but given how much I’m enjoying the Beehaw community and the hard work you guys do to keep it online, I wasn’t even that upset about that! A short, well telegraphed, partial outage is nothing in comparison!

Thanks to all you wonderful people!

Vertelleus ( @vertelleus@beehaw.org ) · 2 years ago

Thanks for the update!
So to save them space we should use externally linked images from other sources.
What are some of your favorite image hosts?
I just started using https://imgbox.com/

silentdon ( @silentdon@beehaw.org ) · 2 years ago

External image hosting is probably preferable, but can we trust imgbox (or any other host really) to not pull an imgur and purge, e.g., anonymous uploads?

Zoop ( @Zoop@beehaw.org ) · edit-2 2 years ago

I really quite like catbox.moe for image hosting so far! I’ve just started using it recently myself for the same reasons. (Before that, I used imgur, which is still an option, but I’m not a fan of a lot of their more recent decisions and changes, so I’m trying out other places.)

I’m glad other people are thinking about how to try and help lessen the load on these instances. Thank you for that consideration for others. It’s nice to see (and I needed to see it!) I appreciate you.

Lionir [he/him] ( @Lionir@beehaw.org ) · 2 years ago

Note that we still have to proxy them so I’m not sure how much is saved but using this while we do the migration will be necessary.

Rentlar ( @Rentlar@beehaw.org ) · 2 years ago

Good luck with your migration! I can live with a bit of instability until we are through this. I noticed over the past couple of days that the server seemed to go down every hour on the dot… hopefully that won’t be the case once the migration is complete.

Retronautickz ( @retronautickz@beehaw.org ) · 2 years ago

If it fails, you can always tell users to upload images to pixelfed and share the link here (I’m joking, don’t take this seriously)

TemporalSoup ( @TemporalSoup@beehaw.org ) · 2 years ago

Maybe to Gfycat? It’s like a nice short-term storage

pixelpop3 ( @pixelpop3@beehaw.org ) · 2 years ago

Gfycat announced they are shutting down and deleting everything on September 1st.

jherazob ( @jherazob@beehaw.org ) · 2 years ago

That was the joke :P

Gormadt ( @Gormadt@beehaw.org ) · 2 years ago

Wait really?

Damn that really sucks

Retronautickz ( @retronautickz@beehaw.org ) · 2 years ago

Wasn’t that one dying?

Short-term indeed

argv_minus_one ( @argv_minus_one@beehaw.org ) · edit-2 2 years ago

Welcome back to onlineness! Well mostly-onlineness.