I just need to preserve some old data that I have on my computers, so I was wondering what would be the best way to archive stuff long term.

Blu-ray discs? Multiple HDDs? What do you guys suggest?

  • Would probably help to know for how long, how much capacity you need, and what budget. It should also be said that external factors play a massive role in how long a storage device can survive, with environment, humidity and heat being the biggies.

    Edit in case I fall asleep: on a budget I would usually go with an external SSD; just refresh the data every year or two and it should be OK for 8-ish years, maybe even 10. For a write-it-and-forget-it method you’ll want M-DISC instead, which is more expensive but, if properly stored, will last lifetimes, so the failure point will be finding a usable drive that can still read it. If you decide to go the spinning mechanical drive route, make sure to buy 2 (a backup for the backup) since they are a lot more fragile. Gold-plated DVDs/CDs are another write-and-forget option but have less capacity than M-DISCs.

  • A couple of different threat models to consider: hardware failure vs. human failure. Things like RAID can effectively cover the hardware failure side and be fully transparent. Human failure is a bit more tricky. There are a number of old expressions about backups, but one that’s good to keep in mind is that snapshots are not backups. They’re convenient and easy to automate, but if the system making them goes kerplooie, they’re pretty useless.

    A tiered scheme is good for off-device backups: run differential backups routinely to copy only the new or changed data, with a periodic full backup (see the sketch at the end of this comment).

    Cold disks are great, but make sure to test them periodically; nothing is worse than looking to restore a chunk of data only to find the backup can’t be read.
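
    Roughly something like this, assuming the data lives in /home/me/data and the backup drive mounts at /mnt/backup (both made-up paths): rsync’s --link-dest hard-links anything unchanged against the previous snapshot, so every snapshot browses like a full backup while only new or changed files take up new space.

    ```sh
    # take a dated snapshot, hard-linking unchanged files from the last one;
    # the very first run is effectively the full backup
    TODAY=$(date +%F)
    rsync -a --delete \
      --link-dest=/mnt/backup/latest \
      /home/me/data/ "/mnt/backup/$TODAY/"

    # point "latest" at the snapshot we just made
    ln -sfn "/mnt/backup/$TODAY" /mnt/backup/latest
    ```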

    • Things like RAID can effectively cover the hardware failure side

      Note that RAID only covers one specific kind of hardware failure (a dead drive). To the point where, IMHO, you cannot consider it a data security measure, only a data availability one.

      • Curious what you mean here. Aside from RAID 0, all levels allow for at least one disk to fail without loss. If the whole RAID controller fails, you can typically replace it independently and import the foreign config. This is all talking about hardware-backed RAID of course, not a soft-RAID config.

        • There are much worse ways for a RAID controller to fail than suddenly not doing anything. What if it doesn’t notice it has failed and continues to write to a subset of devices only? Great recipe for data corruption right there.

          A bad RAID controller/HBA, CPU, RAM, motherboard or PSU are all hardware failures that RAID does very little (if anything) to mitigate. One localised incident in any of them could turn all of your drives into magic smoke or silently flip bits.

          You cannot rely on that sort of setup for data security. It only really mitigates one relatively common hardware failure in order to push storage system uptime above 99.9%. That has a place in some scenarios where storage “only” being 99.9% available has a significant impact on total availability, but you’d first have to demonstrate that that is the case.

          • Fair enough, if using a more expansive definition of hardware failure. Things like a house fire would presumably destroy a series of optical discs, which would make most any in-house option non-functional. Network-based backups could also fail to transmit data securely and accurately, so really any sort of replication solution needs validation of the data, which is of significant value. A first step in preservation is to not have the box that it came from burn down, and have a way to recover if someone does a ‘sudo rm -rf /’ accidentally.

            • Things like a house fire would presumably destroy a series of optical discs, which would make most any in-house option non-functional.

              Well, it makes any option that only uses a single location non-functional. Having two copies at home and one at a distant location (as recommended by the 3-2-1 backup rule of thumb) mitigates this issue.

              Network-based backups could also fail to transmit data securely and accurately

              Absolutely. Though the network is usually assumed to be unreliable from the get-go, so mitigations usually already exist here (E2EE, checksums, ECC).

              really any sort of replication solution needs validation of the data, which is of significant value

              Absolutely correct. An untested backup is probably better than nothing but most definitely worse than a tested backup.

              and have a way to recover if someone does a ‘sudo rm -rf /’ accidentally.

              Certainly something that must be mitigated but this is getting out of “hardware failure” territory now ;)

  • CDs degrade over time and so aren’t the best way to archive data if you know you will need it again. If it’s just an ‘in case’ then it may be OK. Best bet is to buy a USB disk and then keep a second copy of it offsite (one way to handle the offsite copy is sketched below). Also, best practice is not to use two drives from the same manufacturer.
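
    If the offsite copy ends up being a remote machine or storage bucket rather than a second disk you physically carry somewhere, a tool like rclone can keep it in sync and re-check it. A rough sketch (the remote name “offsite” and the paths are made up):

    ```sh
    # one-time: set up a remote interactively and name it "offsite"
    rclone config

    # push the archive; --checksum compares file hashes instead of just size/mtime
    rclone sync ~/archive offsite:archive --checksum

    # later: confirm the remote copy still matches the local one
    rclone check ~/archive offsite:archive
    ```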

  • I don’t know about your budget, but I’d put it on HDDs. They’re cheap and large. But use two, and regularly check them, like every other month or so. If one breaks, get a replacement. That’s the simplest and cheapest solution that you OWN (if you have an external dock).

    Also copy it at least twice onto each drive in case of corruption, and use a copier that verifies (FastCopy, TeraCopy, etc.).
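
    If you’d rather script it, a rough copy-then-verify pass could look like this (the ~/archive and /mnt/hdd1 paths are made up; keep the manifest on the drive and re-run the last step during the every-other-month checks):

    ```sh
    # copy the archive onto the external disk
    rsync -a ~/archive/ /mnt/hdd1/archive/

    # build a checksum manifest from the source...
    ( cd ~/archive && find . -type f -print0 | xargs -0 sha256sum > /mnt/hdd1/archive.sha256 )

    # ...and verify the copy against it
    ( cd /mnt/hdd1/archive && sha256sum -c --quiet /mnt/hdd1/archive.sha256 )
    ```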

  • I use multiple offline HDDs with a policy to keep n copies between them, because it’s by far the cheapest way to still own the data. It requires regular checks because HDDs are likely to fail after a decade or so, and a bunch of HDDs are a pain to manage, so you will need tooling for this. I use git-annex for this purpose, but it’s not particularly user-friendly.
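
    For anyone curious, a minimal sketch of that kind of setup with two drives (the mount points are made up, and this skips the day-to-day syncing):

    ```sh
    # first drive: turn the archive into an annex and record checksums
    cd /mnt/drive1/archive
    git init && git annex init "drive1"
    git annex add . && git commit -m "initial import"
    git annex numcopies 2            # insist on 2 copies of every file

    # second drive: clone the repo, then pull the actual file contents
    git clone /mnt/drive1/archive /mnt/drive2/archive
    cd /mnt/drive2/archive
    git annex init "drive2"
    git annex get .

    # periodic check on either drive: verify checksums and the copy count
    git annex fsck

    # (ongoing changes need "git annex sync" to keep the two repos in step)
    ```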