Hi,

I’m not sure if this is the right community for my question, but as my daily driver is Linux, it feels somewhat relevant.

I have a lot of data on my backup drives, and recently added 50GB to my already 300GB of storage (I can already hear the comments about how low/high/boring that is). It’s mostly family pictures, videos, and documents since 2004, much of which has already been compressed using self-made bash scripts (so it’s Linux-related ^^).

I have a lot of data that I don’t need regular access to and won’t be changing anymore. I’m looking for a way to archive it securely, separate from my backup but still safe.

My initial thought was to burn it onto DVDs, but that’s quite outdated and DVDs don’t hold much data. Blu-ray discs can store more, but I’m unsure about their longevity. Is there a better option? I’m looking for something immutable, safe, easy to use, and that will stand the test of time.

I read about data crystals, but they seem to be still in the research phase and not available for consumers. What about using old hard drives? Don’t they need to be powered on every few months/years to maintain the magnetic charges?

What do you think? How do you archive data that won’t change and doesn’t need to be very accessible?

Cheers

  • This is actually a real problem… A lot of digital documents from the 90’s and early 2000’s are lost forever. Hard drives die over time, and nobody out there has come up with a good way to permanently archive all that stuff.

    I am a crazy person, so I have RAID, Ceph, and JBOD in various and sundry forms. Still, drives die.

  • Might be a dumb idea but hear me out. How about sealing a reputable enterprise or consumer SSD in one of those anti-static bags with a desiccant, then sealing that inside a PVC pipe, also with desiccant, and burying it below the frost line? You’ll just have to dig it up and refresh everything every couple of years; think 3 years at most, iirc, for consumer ones. Obviously this isn’t a replacement for a backup solution, just archival, so no interaction with it. It’ll protect it from the elements, house fires, flooding, temperature fluctuations, pretty much everything, and it’s cost effective. Hell, you can even surround the drive bag in foam and then stuff it in the PVC pipe for added shock absorption. Make a map afterwards like a damn pirate (it’s night time, so my bad if I sound deranged).

    edit, I took a nap: in hindsight I should’ve clarified. I went with an SSD in this idea since it’s more durable than a mechanical drive, has a better price for storage capacity compared to M-Disc, and is most likely to be compatible with other computers in the future in case you need it for whatever reason. Of course you can use another storage medium, like M-Disc, just know the drawbacks: you need an M-Disc burner (~$100) and several discs depending on how much capacity you need (price varies), you have to pray that there’s still a drive that can read M-Disc in the future, and it’s gonna be slow when getting your data back regardless. All you would have to do to modify the idea would be to get a disc case that suspends the disc so nothing is touching its surfaces. Then the same idea: antistatic bag with desiccant, foam or even bubble wrap around it, stuffed in a pipe with desiccant, buried below your frost line. People usually skip the “in optimal conditions” part when talking about M-Disc, but this way we get close to those optimal conditions.

    • went with an ssd in this idea since its more durable than a mechanical, better price for storage capacity

      how? sorry but that does not add up to me. for the price of a 2 TB SSD you could buy a much larger HDD

      and most likely to be compatible with other computers in the future in case you need it for whatever reason.

      both of these use SATA plugs, it should be the same

    • This is a very, very bad idea.

      SSDs are permanent flash storage, yes, but that doesn’t mean you can leave them unpowered for extended periods of time.

      Without a refresh, electrons can and do leak out of the charge traps that store the ones and zeroes. Depending on the exact NAND used, the data could start going corrupt within a year or so.

      HDDs suffer the same problem, though less so. They can go several years, possibly a decade, but you’d still be risking the data on the drive by letting it sit unpowered for an extended time.

      For the “cold storage” approach you should really be using something that’s designed to retain data in such conditions, like optical media, or tape drives.

      • Yeah that’s why I said it needs to be refreshed and also edited in an option for m-disc in case they want to go the optical route

        Of course you can use another storage media, like m disc,

        You’ll just have to dig it up and refresh everything every couple of years, think 3 years at most iirc for consumer ones

  • You might be interested in git-annex (see the Bob use case).

    It has file tracking so you can - for example - “ask” a repository at drive A where some file is, and git-annex can tell you it’s on drives C and D.

    git-annex can also enforce rules like: “always have at least 3 copies of file X, and any drive will do”; “have one copy of every file at the drives in my house, and have another at the drives in my parents’ house”; or “if a file is really big, don’t store it on certain drives”.
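A rough illustration of the “at least 3 copies” rule, in plain Python. This is not git-annex’s API (git-annex tracks locations and enforces the rule itself); the `locations` mapping, the file names, and the drive names are all made up for the example.

```python
def under_replicated(locations: dict[str, set[str]], min_copies: int = 3) -> dict[str, int]:
    """Return {file: number of copies still missing} for files on too few drives."""
    return {
        name: min_copies - len(drives)
        for name, drives in locations.items()
        if len(drives) < min_copies
    }


# Hypothetical bookkeeping: which drives currently hold which file.
locations = {
    "2004/wedding.mp4": {"drive-A", "drive-C", "drive-D"},
    "2010/scans.tar": {"drive-C"},
}
print(under_replicated(locations))
```

Here only `2010/scans.tar` is reported, because it sits on one drive and needs two more copies to satisfy the rule.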

  • There isn’t anything that meets your criteria.

    Optical suffers from separation, hard drives break down, SSDs lose their charge, tape is fantastic but has a high cost of entry.

    There’s a lot of replies here, but if I were you I’d get an LTO drive from the last generation or two at some surplus auction and use that.

    People hate being told to use magnetic tape, but it’s very reliable, long lived, pretty cost effective once you have a machine and surprisingly repairable.

    What few replies are talking about is the storage conditions. If your archive can be relatively small and disconnected then you can easily meet some easy requirements for long term storage like temperature and humidity stability with a cardboard box, styrofoam cut to shape and desiccant packs (remember to rotate these!). An antifungal/antimicrobial agent on some level would be good too.

  • I am using https://duplicati.com/ and https://www.backblaze.com/ (use their B2 cloud storage; pricing is variable, $6 a month for 1TB or less depending on how much you use) to run a scheduled backup every night for my photos. It’s compressed and encrypted. I save a config file to my Google account, so if my house and server burn down, I just pull my config from Google, redownload Duplicati, and boom, pull my backup down. The whole setup backs up incrementally, so once you do the first backup, only changes are uploaded. I love the whole setup.

    Edit: You can also just pull files you need not the whole backup.
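To show what “only changes are uploaded” means, here is a minimal sketch that compares two snapshots of `{path: digest}` and lists what a nightly incremental run would have to send. It is only an illustration of the idea, not how Duplicati works internally; the function name, paths, and digests are invented for the example.

```python
def diff_manifests(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: digest} snapshots and list what changed between them."""
    return {
        "added": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "removed": sorted(p for p in previous if p not in current),
    }


# Made-up snapshots; real digests would come from hashing the files.
last_night = {"2004/beach.jpg": "aa11", "2005/tax.pdf": "bb22"}
tonight = {"2004/beach.jpg": "aa11", "2005/tax.pdf": "cc33", "2024/party.mp4": "dd44"}
print(diff_manifests(last_night, tonight))
```

Only `2005/tax.pdf` and `2024/party.mp4` would need uploading here; the unchanged photo is skipped.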

  • Someone else has mentioned M-Disc and I want to second that. The benefit of using a storage format like this is that the actual storage media is designed to last a long time, and it is separate from the drive mechanism. This is a very important feature - the data is safe from mechanical, electrical and electronic failure because the storage is independent of the drive. If your drive dies, you can replace it with no risk to the data. Every serious form of archival data storage is the same - the storage media is separate from the reading device.

    An M-Disc drive is required to write data, but any DVD or BD drive can read the data. It should be possible to acquire a replacement DVD drive to recover the data from secondary markets (eBay) for a very long time if necessary, even after they’re no longer manufactured.

    • That is an always-on approach? For example with a NAS? While that is a very safe approach, it does not fit the idea of having something “on the shelf”. Thank you for the advice though :)

      • You could turn it off and turn it back on every X period of time, but that doesn’t guarantee something doesn’t go wrong in between. It sounds like you don’t have a lot of data, relatively speaking. Is there a reason not to keep it on your present machine and do the above? Cost? IIRC you can get a 1 TB M.2 for under $150.

  • A Blu-ray USB drive and M-Discs are about the best you can get at present. Keep the drive unplugged when not in use; it’ll probably last 10-20 years in storage.

    Seeing as there hasn’t been much advance past Blu-ray, keep an eye out for something useful to replace it in the future, or at least get another drive when you notice them becoming scarce.
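For a sense of scale, a quick back-of-the-envelope sketch of how many discs the roughly 350GB from the original post would need. It assumes the nominal Blu-ray capacities of 25, 50 and 100 GB and ignores filesystem overhead, so real numbers may come out slightly higher; the function name is just for illustration.

```python
import math

# Nominal capacities in GB; usable space is a little lower in practice.
DISC_CAPACITY_GB = {"BD-R 25 GB": 25, "BD-R DL 50 GB": 50, "BDXL 100 GB": 100}


def discs_needed(archive_gb: float, capacity_gb: float, copies: int = 1) -> int:
    """Number of discs for the archive, rounded up, times the number of full copies."""
    return math.ceil(archive_gb / capacity_gb) * copies


for name, capacity in DISC_CAPACITY_GB.items():
    print(name, discs_needed(350, capacity))
```

Burning two full sets (`copies=2`) to keep in different places simply doubles the count.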

  • I use LTO magnetic tape for archiving data, but unfortunately the tape drives are VERY expensive. The tape itself is relatively cheap though (a 5-pack of LTO 8 cartridges at 12TB uncompressed, 30TB compressed per cartridge, totals 60TB uncompressed, 150TB compressed; this is a lot cheaper than hard drives and lasts much longer), has a large storage capacity and 30+ years of shelf life. Yes, I know, LTO 9 has come out, but I won’t be upgrading, because LTO 8 works just fine for me, and is much cheaper. The drives are backwards compatible by one generation though, e.g. you can use LTO 8 tape in an LTO 9 drive.

  • I would use maybe a Raspberry Pi or old laptop with two drives (preferably different brands/age, HDD or SSD doesn’t really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.

    Then, from that device, pull the data from your main system as needed (that way, the main system has no way of breaking into the backup device so won’t be affected by ransomware), and once it’s done, shut it off or even unplug it completely and store it securely, preferably in a metal box to avoid any magnetic fields from interfering with the drives. Plug it in and boot it up every now and then to perform a scrub to validate that the data is all still intact and repair the data as necessary and resilver a drive if one of them fails.

    The unfortunate reality is most storage mediums will eventually fade out, so the best way to deal with that is an active system that can check data integrity and correct the files, and rewrite all the data once in a while to make sure the data is fresh and strong.
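If the drive is not on btrfs or ZFS, a rough stand-in for the “check data integrity” part is a checksum manifest. The sketch below is a minimal version using only the Python standard library; unlike a real scrub it can only detect damage, not repair it, and the function names are invented for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing or whose contents no longer match the manifest."""
    report = {"missing": [], "corrupt": []}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            report["missing"].append(rel)
        elif sha256_of(target) != expected:
            report["corrupt"].append(rel)
    return report
```

The idea would be to build the manifest once when the archive is written, store it alongside every copy, and run `verify` each time the drive is powered up.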

    If you’re really serious about that data, I would opt for both an HDD and an SSD, and have two of those systems at different locations. That way, if something shakes up the HDD and damages the platter, the SSD is probably fine, and if it’s forgotten for a while maybe the SSD’s memory cells will have faded but not the HDD. The strength is in the diversity of the mediums. Maybe burn a Blu-Ray as well just in case, it’ll fade too but hopefully differently than an SSD or an HDD. The more copies, even partial copies, the more likely you can recover the entirety of the data, and you have the checksums to validate which blocks from which medium are correct. (Fun fact, people have been archiving LaserDiscs and repairing them by ripping the same movie from multiple identical discs, as they’re unlikely to fade at exactly the same spots at the same time, so you can merge them all together and cross-reference them and usually get a near perfect rip of it).
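The “checksums tell you which block from which medium is correct” idea can be sketched like this. It assumes per-block SHA-256 digests were recorded while the data was known to be good, and that each copy has been read back as raw bytes; the function names and the 4096-byte block size are arbitrary choices for the example, not what any particular filesystem does.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[str]:
    """SHA-256 of every fixed-size block, recorded while the data is known good."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def rebuild(copies: list[bytes], digests: list[str], block_size: int = 4096) -> tuple[bytes, list[int]]:
    """Assemble the data from whichever copy still has a good version of each block.

    Returns the rebuilt bytes and the indices of blocks that no copy could supply;
    those blocks are filled with a full block of zero bytes as a placeholder.
    """
    rebuilt = bytearray()
    lost = []
    for index, expected in enumerate(digests):
        start = index * block_size
        for copy in copies:
            block = copy[start:start + block_size]
            if hashlib.sha256(block).hexdigest() == expected:
                rebuilt.extend(block)
                break
        else:
            # No copy matched the recorded digest for this block.
            lost.append(index)
            rebuilt.extend(bytes(block_size))
    return bytes(rebuilt), lost
```

As long as the copies are damaged in different places, every block can be taken from a copy where it is still intact, which is the same reasoning as the LaserDisc example.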

    • with two drives (preferably different brands/age, HDD or SSD doesn’t really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.

      an important detail here is to add the 2 disks to the filesystem in such a way that the second one does not extend the capacity, but adds redundancy. on ZFS, this can be done with a mirror vdev (simplest for this case) or a raidz1 vdev.

  • I used to write to DVDs, but the failure rate was astronomical - like 50% after 5 years, some with physical separation of the silvering. Plus today they’re so relatively small they’re not worth using.

    I’ve gone through many iterations and currently my home setup is this:

    • I have several systems that make daily backups from various computers and save them onto a hard drive inside one of my servers.
    • That server has an external hard drive attached to it controlled by a wifi plug controlled by home assistant.
    • Once a month, a scheduled task wakes up that external hdd and copies the contents of the online backup directory onto it. It then turns it off again and emails me “Oi, minion. Backups complete, swap them out”. That takes five minutes.
    • Then I take the USB disk and put it in my safe, removing the oldest of 3 from there (the classic grandfather-father-son rotation) and putting that back on the server for next time.
    • Once a year, I turn the oldest HDD into an “Annual backup”, replacing it with a new one. That stops the disks expiring from old age at the same time, and annual backups aren’t usually that valuable.
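The monthly swap in the steps above can be sketched as follows. This is only an illustration of the rotation logic, with a made-up `Disk` record and labels; the real setup described here is driven by a scheduled task and Home Assistant, which this does not model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Disk:
    label: str
    last_written: date


def rotate(safe: list[Disk], fresh: Disk) -> tuple[list[Disk], Disk]:
    """Put the freshly written disk in the safe and hand back the oldest one for reuse."""
    if not safe:
        raise ValueError("the safe is empty, nothing to rotate out")
    oldest = min(safe, key=lambda disk: disk.last_written)
    remaining = [disk for disk in safe if disk is not oldest]
    return remaining + [fresh], oldest


safe = [
    Disk("A", date(2024, 1, 1)),
    Disk("B", date(2024, 2, 1)),
    Disk("C", date(2024, 3, 1)),
]
safe, reuse = rotate(safe, Disk("D", date(2024, 4, 1)))
print([disk.label for disk in safe], "reuse:", reuse.label)
```

After the swap the safe holds the three most recent backups, and the oldest disk goes back on the server for next month.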

    Having the HDDs in the safe means that total failure/ransomware takes, at most, a month’s worth. I can survive that. The safe is also fireproof and in another building to the server.

    This sort of thing doesn’t need to be high capacity HDDs either - USB drives and microSD cards are very capable now. If you’re limited on physical space and don’t mind slower write times (which when automating is generally ok), microSD cards with clear labelling are just as good. You’re not going to kill them through excessive writes for decades.

    I also have a bunch of other stuff that is not critical - media files, music. None of that is unique and can be replaced. All of that is backed up to a secondary “live” directory on the same PC - mostly in case of my incompetence in deleting something I actually wanted. But none of that is essential - I think it’s important to be clear about what you “must save” and what is “nice to save”.

    The key thing is to sit back and work out a system that is right for you. And it always, ALWAYS should be as automated as you can make it - humans are lazy sods and easily justify not doing stuff. Computers are great at remembering to do repetitive tasks, so use that.

    Include checks to ensure the backed up data is both what you expected it to be, and recoverable - so include a calendar reminder to actually /read/ from a backup drive once or twice a year.