Michael Wolf
Projects and thoughts from Cincinnati, San Jose, Oakland, an undisclosed location in Grass Valley
My Floppy Disk Archiving Process
Last edited - 2025/04/13

TLDR

Background

I'd (at last) like to take some time to talk about my process for archiving 3.5" floppy disks. I'm by no means an expert in this. In fact, I'm going to do everything in my power to keep the expert archivist from cringing at this post. But in my defense, I grew up late enough in the 90s that I rarely swapped floppies in and out of machines, let alone understood their formatting. To little wolfmd, floppies were colorful squares that made cool blinkenlights on the Gateway tower when inserted, and they were a constant source of consternation for my father. Despite being CD-native, I always found floppy disks and the machines they went into fascinating.

Through years of scouring garage sales, flea markets, and junk shops for old photos and other knick-knacks, I came across plenty of cheap collections of commercial and homemade 3.5" disks to add to the sparse selection that remained in my family. I don't think I have anything particularly astounding (aside from some cool in-box Macintosh software), but I am a digital voyeur and I love seeing how computers were used long ago. Was this data copied to newer formats? Probably! But some of it probably wasn't, so now I have it and I've made it somewhat searchable.

Despite the minimal objective value (and the clearly near-zero subjective value ascribed to them by their previous owners), I still began to worry about the bitrot that inevitably strips the flimsy plastic of comprehensible magnetism. When I checked disks on period hardware, they seemed to be generally intact (despite my frequent incompetence at loading them in the correct order), but one never knows just how long such data stays readable.

My first attempt at comprehensively archiving these floppy disks must have happened around the COVID lockdowns. I'm not sure when else I would have had the time. I had a couple of USB-connected floppy drives and the system file explorer. It involved a lot of manual invocations on a Windows laptop balanced precariously while I swapped disks in and out. Despite having a mix of Mac and IBM disks, I sallied forth as best I knew how and created .img files with dd. Many of the disks with the coolest-looking labels were unreadable on Mac or Windows, but it was unclear to me whether this was just a result of my misunderstanding the formatting. On the plus side, this effort really didn't take too long. Floppies are notably easier to set on a flatbed scanner because they don't wiggle around! I also didn't have a kid back then!

As I read more about floppy disk technology, visited the Internet Archive more, and, most importantly, talked to Terin Stock more about the art, I realized these copies were sadly lacking a professional level of effort. Better tools were available, and I came across a large online community of genuine experts in the field. About five years after the first attempt, I decided to break the floppy disk archive out of hibernation and give it one more run through the digital preservation cycle. This time, I'd do my best to follow some reasonable practices.

Enter: Greaseweazle

Many of those familiar with obsolete media archiving will not be surprised that I landed on the Greaseweazle as the tool of choice for making digital copies of disks I could be proud of. Greaseweazle is a USB device that can interface with the raw (flux transition) data on nearly all floppy disks. Yann Serra's wiki is where I started, but there are many more resources out there for learning about both the obscure history and the magic of floppy disk formatting hacks. I can't say I could even begin to fill my head with the technical aspects of the most obscure formats and DRM mechanisms, but luckily for me, the majority of my disks were pretty run-of-the-mill and easily read.

As this was to be my first major archiving project at this house, I decided to put together a dedicated rig (if it can be called that) out of a Dell OptiPlex 780 scavenged from a yard sale. I installed Windows 10 on it for compatibility with the older software for working with floppy images and with my flatbed scanner's drivers. I air-gapped the machine, since it would be both handling mysterious digital files from 20+ years ago and defending against Microsoft's insidious campaign of sneaky Windows 11 upgrades. A nearby Ubuntu laptop helps with online research, software downloads, and indexing.

The Greaseweazle itself is just an interface between floppy drives and modern computers. For the 3.5" floppy drive itself, I found that an NEC FD1231T worked the best of any drive I had on hand. Because the drive's eject plunger relies on the original tower case to hold it in place, I developed a certain amount of finesse in keeping it seated while swapping floppies in and out. With the cables connected, the hardware and imaging confirmed working via the FluxMyFluffyFloppy (yeah, really) GUI and Disk Explorer (here's a mirror of the binary, as the original site is down as of the publication of this post), and my process run by Terin, I was ready to start copying en masse.

Don't hack my password

Operating in Bulk

As fun as it sounds, I wasn't going to run through ~350 floppy disks (and any future finds) using a clicky interface. In true pretend-engineer fashion, I put together a couple of shitty batch scripts that let me insert a disk, kick off a job, go work on something else for a few minutes, catalog the disk, and move on.

:: IBM.bat
:: Archive a single IBM PC 1.44 MB (ibm.1440) floppy disk. Usage: IBM.bat <disk-id>

@ECHO OFF

CLS

:AGAIN
ECHO Archiving %1
"C:\Users\dad\workspace\tools\greaseweazle\gw.exe" read --diskdefs "C:\Users\dad\workspace\tools\FluxMyFluffyFloppy\Greaseweazle\diskdefs.cfg" --format ibm.1440 --raw "C:\Users\dad\workspace\archive\floppies\%1.scp"

ECHO Converting to IMG
"C:\Users\dad\workspace\tools\greaseweazle\gw.exe" convert --diskdefs "C:\Users\dad\workspace\tools\FluxMyFluffyFloppy\Greaseweazle\diskdefs.cfg" --format ibm.1440 "C:\Users\dad\workspace\archive\floppies\%1.scp" "C:\Users\dad\workspace\archive\floppies\%1.img"

ECHO Opening contents
copy NUL "C:\Users\dad\workspace\archive\floppies\%1.txt"
start "" "C:\Windows\system32\notepad.exe" "C:\Users\dad\workspace\archive\floppies\%1.txt"
"C:\Users\dad\workspace\tools\diskexplorer\editdisk.exe" "C:\Users\dad\workspace\archive\floppies\%1.img"

This script creates a raw flux image (.scp) of the floppy disk for archival purposes, converts it to the more useful .img format, and then displays the contents via Disk Explorer (editdisk.exe). I can then copy the directory listing into the text file to be stored alongside the disk images and the scanned image. Finally, I slap a handwritten label on the disk. This takes maybe 60 seconds of focused time and 5 minutes of automation (with adjustments for inevitable human error due to context switching). Greaseweazle's built-in error correction and retries work pretty reliably, but even then I had to throw away maybe 20% of all scans because the disks were irreconcilably damaged or simply uninteresting and empty.
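Copying the directory listing out of Disk Explorer by hand works, but for IBM-formatted disks the listing can also be read straight from the .img. Here is a minimal Python sketch, assuming a standard FAT12 image such as the 1.44 MB .img that `gw convert --format ibm.1440` produces. `list_root_dir` and `DirEntry` are hypothetical names, and the sketch only walks the root directory (no subdirectories, no long filenames), so it will not help with Mac (HFS) disks.

```python
import struct
from typing import NamedTuple


class DirEntry(NamedTuple):
    name: str
    size: int
    is_dir: bool


def list_root_dir(image: bytes) -> list[DirEntry]:
    """List the root directory of a FAT12 floppy image (8.3 names only)."""
    # BIOS Parameter Block fields, little-endian, starting at byte 11.
    bytes_per_sector, _sectors_per_cluster, reserved, num_fats, root_entries = (
        struct.unpack_from("<HBHBH", image, 11)
    )
    (sectors_per_fat,) = struct.unpack_from("<H", image, 22)
    # The root directory follows the reserved sectors and all FAT copies.
    root_start = (reserved + num_fats * sectors_per_fat) * bytes_per_sector

    entries = []
    for i in range(root_entries):
        raw = image[root_start + i * 32 : root_start + (i + 1) * 32]
        if len(raw) < 32 or raw[0] == 0x00:
            break  # end of directory (or a truncated image)
        attr = raw[11]
        if raw[0] == 0xE5 or attr & 0x08:
            continue  # deleted entry, volume label, or long-filename fragment (0x0F)
        base = raw[0:8].decode("cp437").rstrip()
        ext = raw[8:11].decode("cp437").rstrip()
        size = struct.unpack_from("<I", raw, 28)[0]
        entries.append(DirEntry(f"{base}.{ext}" if ext else base, size, bool(attr & 0x10)))
    return entries
```

For example, `list_root_dir(open("000123.img", "rb").read())` returns the entries, which could be written into the matching .txt file instead of being pasted from Disk Explorer. Listing subdirectories would mean following FAT cluster chains, which this sketch deliberately leaves out.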

The agony and the ecstasy

Scanning Labels

Compared with scanning old snapshot photos, which come in heterogeneous sizes and often with a fierce curl that requires the scanner lid to be down, scanning 3.5" floppy disks is pretty easy. They don't wiggle around, and they are easy to crop automatically based on fixed x/y positions, set after set. I use the Epson Perfection V600 Photo that I've had for maybe fifteen years now. It's got some minor scratches on the scan bed, but I haven't found them to be noticeable. To combat the apparently universal experience of flatbed scanners failing to capture the whole bed, I used four cassette tapes standing on their long edges as a makeshift jig.
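If every disk lands in the same spot on the bed, the crop rectangles only need to be worked out once and can be reused for every scan. A minimal Python sketch of that arithmetic, assuming the nominal 90 x 94 mm outline of a 3.5" disk and a hypothetical 2 x 2 layout with a 5 mm margin and 10 mm gaps (measure your own jig; `crop_boxes` is an illustrative name, not part of the Epson software):

```python
# Nominal outline of a 3.5" diskette, in millimetres.
DISK_W_MM = 90.0
DISK_H_MM = 94.0
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int) -> int:
    """Convert a position on the bed in millimetres to a pixel offset at `dpi`."""
    return round(mm / MM_PER_INCH * dpi)


def crop_boxes(dpi: int, rows: int = 2, cols: int = 2,
               margin_mm: float = 5.0, gap_mm: float = 10.0) -> list[tuple[int, int, int, int]]:
    """(left, upper, right, lower) pixel boxes for disks laid out in a grid.

    The grid size, margin, and gap are hypothetical jig measurements; the
    disks are assumed to sit in the same positions on every scan.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = margin_mm + c * (DISK_W_MM + gap_mm)
            top = margin_mm + r * (DISK_H_MM + gap_mm)
            boxes.append((
                mm_to_px(left, dpi),
                mm_to_px(top, dpi),
                mm_to_px(left + DISK_W_MM, dpi),
                mm_to_px(top + DISK_H_MM, dpi),
            ))
    return boxes
```

At 300 dpi the first box works out to (59, 59, 1122, 1169). The tuples follow the (left, upper, right, lower) order that Pillow's `Image.crop()` expects, so each one could be passed to it directly if you go that route.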

Don't mind me taking a picture of a computer screen

Swapping disks on and off the bed is always going to be toilsome, but I'm not quite at the line-the-disks-up-on-a-ramp stage yet. I also ended up using the stock Epson Scan Utility. It's clicky, but I've always found that timed auto-scanning setups make my heart rate jump. This hobby is supposed to be relaxing!

On Indexing

Armed (or burdened, perhaps) with a few more years of collecting experience, I also decided to take a more formal approach to cataloging each entry. The schema is simplistic and pretty much for my own personal use, but I think it'll scale better than trying to remember which lot a particular disk came from. My traditional grouping for hodge-podge archival items like photos or film has been by lot: things from the same dusty box have a decent chance of being related. Giving each item a short descriptive name is nice, but for something like floppy disks I felt that assigning a numerical ID and a corresponding spreadsheet entry would work better for quick lookup and an inevitable database integration.

The format I settled on is:

X X X X X X
| |
| L Second digit indicates the type of label
| * 0-2 is a blank disk
| * 3-5 is handwritten/DIY
| * 6-9 is mass-market/professional
|
L First digit indicates the format of the image
* 0 IBM/PC
* 1 Apple
* 2 Commodore
...
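To make the schema concrete, here is a small Python sketch that decodes an ID using the two documented digits. The format table only covers the three codes listed above (anything else, including the "..." codes, is reported as unassigned), the remaining four digits are passed through untouched because the schema does not define them, and `parse_disk_id` and `DiskId` are hypothetical names.

```python
import re
from dataclasses import dataclass

# First digit: image format. Only the codes listed in the schema are known.
FORMATS = {"0": "IBM/PC", "1": "Apple", "2": "Commodore"}


@dataclass(frozen=True)
class DiskId:
    image_format: str
    label_type: str
    rest: str  # last four digits; their meaning is not defined by the schema


def describe_label(digit: int) -> str:
    """Second digit: 0-2 blank, 3-5 handwritten/DIY, 6-9 mass-market."""
    if digit <= 2:
        return "blank"
    if digit <= 5:
        return "handwritten/DIY"
    return "mass-market/professional"


def parse_disk_id(disk_id: str) -> DiskId:
    """Split a six-digit catalog ID into its documented parts."""
    if not re.fullmatch(r"[0-9]{6}", disk_id):
        raise ValueError(f"expected six digits, got {disk_id!r}")
    return DiskId(
        image_format=FORMATS.get(disk_id[0], "unassigned"),
        label_type=describe_label(int(disk_id[1])),
        rest=disk_id[2:],
    )
```

The first digit is only as trustworthy as whoever wrote the tape label, so the decoded format is better treated as a hint than as a fact.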

To tie each artifact to its digital twin, I used some Scotch tape and a marker. While I'm sure many museums regret the early indexing marks made on their treasured artifacts, I feel like a little taped ID tag on each floppy isn't going to do much harm. Initially I was going for a tiny sliver of a label, but I found out pretty quickly that the estate-sale tape I was using wasn't going to stay put for long, and I'd be faced with the same failed-adhesive, mixed-up-identity problem that often comes with the labels on the floppy disks themselves. Later on I began wrapping the tape firmly around to the back side. Time will tell how long these index tags last; hopefully long enough for me to give more of a shit and print out labels instead.

This schema is pretty simple, but I've already messed up the format! Having a hard time identifying which disks were IBM vs. Apple, I got into the habit of starting every ID with 0. My other gripe with this approach is that I put the label on the wrong side of the disk for flipping through them easily in storage. Oh well.

I made custom floppy disk storage out of old bookcase shelves to minimize the amount of space these disks take up while still making it easy to find a given disk by its ID.

Failed Disks

Even with the advanced archiving technology at my disposal, there are still a number of (fairly interesting-seeming) disks that haven't yielded a readable copy. They appear to have many missing sectors, but perhaps I'm still misinterpreting their format. Given that a dozen disks from the same collection have consistent bad sectors around 70-78, I'm guessing the stack had a nasty run-in with a magnet.
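One way to check a hunch like the magnet theory is to line up the failing positions across every disk from the same lot and see which ones recur. A minimal Python sketch, assuming each disk's bad sector (or track) numbers have been transcribed by hand from the read output; `shared_bad_spots` is a hypothetical helper, not a Greaseweazle feature.

```python
from collections import Counter


def shared_bad_spots(bad_by_disk: dict[str, set[int]], min_disks: int) -> list[int]:
    """Return positions (sector or track numbers) that failed on at least
    `min_disks` different disks, in ascending order."""
    # set() so a position listed twice for one disk still counts once for it.
    counts = Counter(pos for spots in bad_by_disk.values() for pos in set(spots))
    return sorted(pos for pos, n in counts.items() if n >= min_disks)
```

A run of positions shared by most disks in a lot points toward a common cause, while scattered, disk-specific failures look more like ordinary media decay. Neither pattern proves anything on its own.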

However, even some of the new-in-box Mac-formatted disks I copied came out with multiple unreadable sectors. I'm not sure whether to chalk this up to some failing in my archiving hardware/software (update: it was me), the improper storage practices of previous owners, or just a fundamental weakness in the format's longevity. I haven't tried any data recovery beyond the checksums and retries included in the Greaseweazle utilities. From what I've come to understand, it's somewhat hopeless at this point. The magnetic flux clock is ticking. I'm glad I could save the disks that I could.

Comparing This Approach to Others

Caralie Heinrichs and Emilie Vandal provide a comprehensive workflow for archiving 3.5" floppies in an academic context. Having only come across it after I had gone through half of the disks, I was relieved that their approach was not too different from my own, despite some differences in polish. Their paper was published in 2019, around the time Greaseweazle was first released, so it uses an older (still very capable, though closed source) tool, KryoFlux, instead. It's unclear whether I would get any better results on the damaged floppies with KryoFlux, but I certainly appreciate the open source spirit of Greaseweazle. Heinrichs and Vandal also use the Expert Witness Format (EWF), a Library of Congress supported open source disk image format, and BitCurator, a disk content analysis framework. These are both a bit heavyweight for my process, but I'd love to look more into BitCurator (or an actively developed tool that follows in its footsteps). Until then, I'd call my process artisanal.

One topic they bring up, which I'd always kept in the back of my mind while archiving, is personally identifiable information (PII). Photos and journals contain a little bit of PII, but disk images can contain a surprising amount more. Many of the people identified in this data are still living and could be affected by what was left behind. While, unlike Heinrichs and Vandal, I'm retaining the images, I do have them flagged in the database to prevent leaking anything that could be incriminating or just embarrassing. It's a fine line in found-media collecting, but as my processes have been refined, so have my ethics.

Another step I'll be adding to all future disk imaging is saving the log of each read somewhere. It seems obvious in hindsight and is a very easy thing to add to the script, but I didn't realize how useful it would be, even for writing this blog post. Metadata about the process is always useful, given how much influence the process has on how the object is preserved.
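Here is one way the log could be captured: a minimal Python sketch that runs the same `gw read` command line as IBM.bat and writes the command plus everything gw printed to a `<id>.log` file next to the .scp. The paths are copied from the batch script; `read_command` and `read_with_log` are hypothetical helpers, and the `run` parameter exists only so the function can be exercised without a drive attached.

```python
import subprocess
from pathlib import Path

# Paths mirror IBM.bat; adjust them for your own machine.
GW = r"C:\Users\dad\workspace\tools\greaseweazle\gw.exe"
DISKDEFS = r"C:\Users\dad\workspace\tools\FluxMyFluffyFloppy\Greaseweazle\diskdefs.cfg"
ARCHIVE = Path(r"C:\Users\dad\workspace\archive\floppies")


def read_command(disk_id: str, archive: Path = ARCHIVE) -> list[str]:
    """The same `gw read` invocation IBM.bat uses, as an argument list."""
    return [GW, "read", "--diskdefs", DISKDEFS, "--format", "ibm.1440",
            "--raw", str(archive / f"{disk_id}.scp")]


def read_with_log(disk_id: str, archive: Path = ARCHIVE, run=subprocess.run) -> int:
    """Run the read, save the command and all gw output to <id>.log, return the exit code."""
    cmd = read_command(disk_id, archive)
    # Capture both streams so the log holds whatever gw reports, errors included.
    result = run(cmd, capture_output=True, text=True)
    log_path = archive / f"{disk_id}.log"
    log_path.write_text(" ".join(cmd) + "\n" + result.stdout + result.stderr,
                        encoding="utf-8")
    return result.returncode
```

Calling `read_with_log("000123")` would leave `000123.log` alongside `000123.scp`. The trade-off of `capture_output=True` is that gw's progress no longer scrolls by live; reading the output line by line through `subprocess.Popen` and writing it to both the console and the file would keep both.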

Finally, the guide suggests throwing away the disks after ingestion. I find this short-sighted, though understandable at the scale of an archival library. I don't have such faith in the longevity of my digital copies that I could toss the disks and not wince a bit. I don't think I'll amass more than a closet full of disks in my life. And they are just cool, right? Why throw them away if I don't have to?

I tip my hat to Heinrichs and Vandal and all those who were dedicated enough to pursue library and archival sciences professionally. I had, and passed up, my chance to work in the field in favor of boring income. I've kept digital preservation as a hobby. So I'm grateful to those spiritual colleagues who are willing to share the state of the art with layfolk. After all, we're all working toward the goal of saving chunks of our bitrotting past for the next generation, even if it's just in our off-hours.

Let Me See 'Em!

As soon as I had moved all of the disks onto the Nullbrook archive hard drive, I realized I still had one more step left to complete: making them available to view! Nullbrook has been the primary window into my archival life's work, but it is mostly focused on visual media. I completed a rewrite of the archive in 2021 and it really hasn't changed too much since then. Fortunately, I did a decent job of keeping it simple. Django + Postgres + Nginx for static assets has been pretty solid for the amount of effort I'm willing to put into this in my free time.

Over the course of a week, I spent a few hours bringing everything up to 2025 software versions, a few hours writing out the design, a few hours sanitizing and stuffing the disk catalog spreadsheet into a Postgres table, and a couple of hours debugging. And it's up! Disks, excluding ones with obvious PII, are available to browse and download as a disk image (.img) or raw flux transition file (.scp). Most of them can be read directly, but those who care enough can probably reassemble the ones with missing sectors. I do plan to add some of these files, particularly drivers and other formally published software, to the Internet Archive at some point. Perhaps when I finish imaging the giant bag of floppy disks I recently picked up from a flea market in San Diego...
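The spreadsheet-to-table step and the PII exclusion can be sketched in a few lines. This minimal Python version swaps in the standard library's sqlite3 for Postgres so it runs anywhere; the `disk` table, its columns, the accepted yes-values, and the helper names are all assumptions, not the actual Nullbrook schema.

```python
import csv
import io
import sqlite3

# Spreadsheet values treated as "yes, this disk contains PII" (an assumption).
TRUTHY = {"y", "yes", "true", "1", "x"}


def load_catalog(conn: sqlite3.Connection, csv_text: str) -> None:
    """Create the hypothetical disk table and load rows from a CSV export."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS disk (
               disk_id TEXT PRIMARY KEY,
               title   TEXT NOT NULL,
               has_pii INTEGER NOT NULL DEFAULT 0
           )"""
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO disk (disk_id, title, has_pii) VALUES (?, ?, ?)",
        (
            (
                # Restore leading zeros that a spreadsheet may have stripped.
                r["disk_id"].strip().zfill(6),
                r["title"].strip(),
                1 if (r.get("has_pii") or "").strip().lower() in TRUTHY else 0,
            )
            for r in rows
        ),
    )
    conn.commit()


def public_disks(conn: sqlite3.Connection) -> list[str]:
    """IDs that are safe to publish: everything not flagged for PII."""
    query = "SELECT disk_id FROM disk WHERE has_pii = 0 ORDER BY disk_id"
    return [row[0] for row in conn.execute(query)]
```

Storing `disk_id` as zero-padded text matters because spreadsheet programs tend to turn an ID like 000123 into the number 123. With Django in front, the same filter would more naturally live in a queryset than in raw SQL.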

Closing Remarks

With my garage floppy copy shop up and running, the primary limiting factor on time to completion was finding days with enough time between my son falling asleep and me being ready to enter slumberland myself. This has been by far the most comprehensive archiving endeavor I've completed. Including preparation, imaging, indexing, organizing, documenting, and image scanning, this project took me about a year to complete. I offer many apologies to the VHS tapes, 8/16mm film, photos, 5.25" floppies (pending a working disk drive), and other pieces of old media that are awaiting this level of attention. 8/16mm film, it'll be your turn as soon as this blog post is live.

I'm certain my collection will grow beyond this selection of disks in future years as I continue to scour flea markets, estate sales, and the rare but not unheard-of Craigslist posting. I don't see the value in squabbling over online auctions, but I can at least provide a permanent home for the floppies I happen to come across around here. I feel that I'm doing the best I can with the tools and patience I have at hand, and I hope others find the end result worth the effort I've put into it! If you're interested in archiving things yourself, I strongly encourage you to pursue it! These time capsules of data won't be around forever.

Winter at the archive workbench