Reverse Engineering for Good: Privacy, DRM, and Building a File Indexer
The tension
Atlas exists to answer a simple question: which files depend on which other files? Not fuzzy keyword matches or “related” suggestions — actual, concrete references. A project file that loads a plugin. A document that embeds an image. An instrument that needs a set of audio samples. These aren’t inferred relationships. They’re explicit dependencies baked into the file itself, and Atlas extracts them.
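To make "which files depend on which other files" concrete, here is a minimal sketch of a graph of explicit references. It is not Atlas's actual data model: the `DependencyGraph` class, its method names, and the file paths are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical illustration: a directed graph of explicit file references."""

    def __init__(self):
        # Maps a source file to the set of files it explicitly references.
        self._edges = defaultdict(set)

    def add_reference(self, source, target):
        """Record that `source` explicitly references `target`."""
        self._edges[source].add(target)

    def dependencies_of(self, source):
        """Files that `source` needs, in sorted order."""
        return sorted(self._edges.get(source, ()))

    def dependents_of(self, target):
        """Files that would break if `target` went missing, in sorted order."""
        return sorted(src for src, targets in self._edges.items() if target in targets)


# Example paths are invented for this sketch.
graph = DependencyGraph()
graph.add_reference("song.rpp", "piano.nki")
graph.add_reference("piano.nki", "samples/c3.wav")
graph.add_reference("piano.nki", "samples/d3.wav")
graph.add_reference("other_song.rpp", "piano.nki")
```

Asking the question in both directions is the point: `dependencies_of("piano.nki")` lists the samples the instrument needs, while `dependents_of("piano.nki")` lists every project that relies on it.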
To do that, it needs to look inside files — sometimes deeply. That’s straightforward when you’re dealing with open formats like plain text or XML. It gets complicated fast when the formats are proprietary, undocumented, and wrapped in DRM.
My day job is audio design, so naturally the first Atlas pack I’m building is for audio production workflows. That means writing extractors for DAW projects, plugin presets, sample libraries — all the formats I wrestle with daily. And it’s where I ran into this tension head-on, while building the extractor for Native Instruments Kontakt — one of the most widely used sampler plugins in music production. A single Kontakt instrument file (.nki) can reference dozens or even hundreds of audio samples scattered across your filesystem. That’s exactly the kind of relationship Atlas should surface. But getting there meant reverse engineering a binary format that was never meant to be read by anyone outside of Native Instruments.
Walls everywhere
There’s no public documentation for the Kontakt file format. The only way to understand it was to open these files in a hex editor, study what other open-source projects had already figured out, and piece together the structure byte by byte.
For user-created instruments, I eventually got it working. Atlas could read the compressed data inside an .nki file and extract every sample path it references, including .wav files spread across multiple folders.
But commercial instruments — the ones most producers actually use — are a different story. Native Instruments encrypts the preset data to protect their products. I dug deep enough to confirm exactly how the encryption works, and the short version is: it’s solid. There are no shortcuts through it.
This is where the tension becomes real. I could have gone further down that road. But just because you can break something open doesn’t mean you should.
Drawing the line
Atlas is a file indexer, not a cracking tool. Native Instruments encrypts their preset data to protect their intellectual property, and that’s a boundary I chose to respect. The question became: can Atlas still be useful without crossing that line?
The answer turned out to be yes, and the solution came from an unexpected place. While the preset data is encrypted, the instrument’s metadata — its name, vendor, category — sits in a SOUNDINFO chunk that lives outside the encryption boundary. It’s always readable, even on DRM-protected files. And in DAW project files, an unencrypted AUTHORING_APP chunk contains a product ID that maps back to the library name through NI’s own local database.
So Atlas can tell you “this REAPER project uses Alicia’s Keys by Native Instruments” without ever touching the encrypted data. It reads only what was left in the open.
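The lookup step can be sketched roughly as follows. This is an illustration under loud assumptions, not the real extractor: the chunk parsing is left out entirely, `PRODUCT_CATALOG` is a hypothetical in-memory stand-in for the local database, and the product ID shown is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LibraryInfo:
    name: str
    vendor: str


# Hypothetical stand-in for the local product database.
# The ID below is made up; real IDs and their format are not shown here.
PRODUCT_CATALOG = {
    "example-id-001": LibraryInfo(name="Alicia's Keys", vendor="Native Instruments"),
}


def resolve_library(product_id: str, catalog=PRODUCT_CATALOG) -> Optional[LibraryInfo]:
    """Map a product ID (read from unencrypted metadata) to a library, if known."""
    return catalog.get(product_id)


def describe_usage(project_name: str, product_id: str, catalog=PRODUCT_CATALOG) -> str:
    """Build a human-readable statement using only openly readable data."""
    library = resolve_library(product_id, catalog)
    if library is None:
        # Partial information is still reported rather than guessed at.
        return f"{project_name} uses an unrecognized library (product ID {product_id})"
    return f"{project_name} uses {library.name} by {library.vendor}"
```

An unknown ID falls back to a less detailed statement instead of failing, which is how an indexer of this kind can stay useful with partial information.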
The other side of privacy
There’s a second privacy dimension here that’s just as important: yours.
Atlas parses your files locally. It doesn’t upload your project structure to a server. It doesn’t phone home with your sample library inventory. The entire relationship graph lives on your machine. This is a deliberate design choice — a file indexer that knows everything about your creative work has no business sending that data anywhere.
This matters more than people realize. Your project files reveal your tools, your workflow, your sample libraries, your unfinished work. That’s private creative infrastructure. Atlas treats it that way.
Working with what’s in the open
The Kontakt extractor taught me a broader lesson about building tools like Atlas. When you hit a wall — whether it’s encryption, an undocumented format, or a proprietary protocol — the instinct is to break through it. But often there’s a path around it that gives you most of what you need while staying on the right side of the line.
For Kontakt, that meant combining unencrypted metadata chunks with local database lookups and plaintext catalog files. For other formats, it might mean something different. But the principle is the same: extract what’s freely available, respect what’s deliberately protected, and keep everything local.
This isn’t always the easy path. It means accepting partial information sometimes. It means some edges in the dependency graph will be less detailed than others. But it’s the right tradeoff for a tool that people need to trust with their entire filesystem.
Oh, and Native Instruments — if you’re reading this and you’re interested in improving the day-to-day life of your users, let’s talk.