Hash-Only Audio Formats
These file formats are tracked by Atlas (hashed, versioned, lineage-recorded) but do not have dedicated extractors. Atlas records the file’s identity (BLAKE3 hash), path, name, size, and modification time — the same core lineage data as any tracked file — but does not parse the file’s internal structure for metadata or references.
This document explains each format, what it contains, and why extraction is not implemented at this time.
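As a point of reference, the five fields that every hash-only entry carries can be gathered in a few lines. The following is a minimal sketch, not Atlas code: `HashOnlyRecord` and `record_file` are hypothetical names, and the fallback to BLAKE2b exists only so the example runs where the third-party `blake3` package is not installed.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

try:
    # Third-party "blake3" package, used when it is installed.
    from blake3 import blake3 as _new_hasher
    HASH_NAME = "blake3"
except ImportError:
    # Stand-in so the sketch runs without extra dependencies.
    # This is NOT the hash Atlas records.
    _new_hasher = hashlib.blake2b
    HASH_NAME = "blake2b"


@dataclass(frozen=True)
class HashOnlyRecord:
    """Hypothetical container for the five hash-only fields."""
    digest: str
    path: str
    name: str
    size: int
    mtime: float


def record_file(path, chunk_size=1 << 20):
    """Hash a file in chunks and collect its filesystem identity."""
    p = Path(path)
    hasher = _new_hasher()
    with p.open("rb") as fh:
        # Read in fixed-size chunks so multi-gigabyte files stay out of memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    stat = p.stat()
    return HashOnlyRecord(
        digest=hasher.hexdigest(),
        path=str(p.resolve()),
        name=p.name,
        size=stat.st_size,
        mtime=stat.st_mtime,
    )
```

Chunked reading matters here because some of the formats below (monolith containers in particular) exceed a gigabyte per file.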
Ogg Vorbis (.ogg) — 162,637 files
What it is: Lossy compressed audio using the Vorbis codec inside an Ogg container. Used extensively by Kontakt libraries as a space-efficient alternative to WAV for sample storage.
Internal structure: OggS page headers containing Vorbis identification, comment, and setup headers, followed by compressed audio frames. The Vorbis comment header can carry arbitrary key-value metadata tags.
What we found: In sample libraries, the Vorbis comment field contains only the encoder identification string (e.g., “Xiph.Org libVorbis I 20120203 (Omnipresent)”). No user-authored metadata, tags, or references are present. These files are generated by Kontakt’s batch compression tooling and are not individually authored.
Why no extractor: The only metadata present (encoder string) is identical across all files in a library and carries no lineage value. The files are raw sample data consumed by Kontakt instruments — their lineage relationship is captured by the NKI extractor’s sample path references, not by metadata within the OGG files themselves.
What Atlas records: Hash, path, name, file size, modification time.
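For readers who want to confirm the finding on their own files, the Vorbis comment header can be read with a short script. This is a minimal sketch under stated assumptions, not an Atlas extractor: the function names are hypothetical, page CRCs are not verified, and multiplexed streams are ignored. It walks Ogg pages, reassembles packets from the lacing values, and decodes the packet that starts with `\x03vorbis`.

```python
import struct


def iter_ogg_packets(data):
    """Yield complete packets by walking Ogg pages and their lacing values."""
    pos = 0
    pending = bytearray()
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost Ogg page sync")
        n_segments = data[pos + 26]          # last byte of the 27-byte page header
        lacing = data[pos + 27:pos + 27 + n_segments]
        body = pos + 27 + n_segments
        for size in lacing:
            pending += data[body:body + size]
            body += size
            if size < 255:                   # a lacing value below 255 ends the packet
                yield bytes(pending)
                pending.clear()
        pos = body


def read_vorbis_comments(data):
    """Return (vendor, tags) from the Vorbis comment header, or None."""
    for packet in iter_ogg_packets(data):
        if packet[:7] != b"\x03vorbis":
            continue
        off = 7
        (vendor_len,) = struct.unpack_from("<I", packet, off)
        off += 4
        vendor = packet[off:off + vendor_len].decode("utf-8", "replace")
        off += vendor_len
        (count,) = struct.unpack_from("<I", packet, off)
        off += 4
        tags = []
        for _ in range(count):
            (n,) = struct.unpack_from("<I", packet, off)
            off += 4
            tags.append(packet[off:off + n].decode("utf-8", "replace"))
            off += n
        return vendor, tags
    return None
```

The comment header is the second packet in the stream, so passing a bounded prefix of the file is typically enough. For the library files described above, the expected result is the encoder string as `vendor` and an empty tag list.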
Kontakt Monolith Compressed (.mwc) — 107,840 files
What it is: Native Instruments’ proprietary lossless compressed audio format. MWC1 magic header followed by PLP1 compressed audio data. Used inside Kontakt monolith containers for sample storage.
Internal structure: A 16-byte header containing format version, channel count, and sample rate, followed by PLP1-compressed audio frames. No metadata fields, no embedded references, no text content.
What we found: Pure compressed audio data. The header contains only the minimum information needed for playback (sample rate: 48000 Hz, channel count: 2). File names are content-addressed hashes (e.g., 61e4e33801c3...8c.mwc), indicating these are generated artifacts with no human-authored identity.
Why no extractor: Opaque binary audio with no metadata beyond playback parameters. The content-addressed filenames confirm these are internal artifacts of the Kontakt monolith system, not user-facing files. Sample rate and channel count could theoretically be extracted from the header, but the information has minimal lineage value for auto-generated sample data.
What Atlas records: Hash, path, name, file size, modification time.
Kontakt Automation (.nka) — 39,895 files
What it is: Kontakt automation data files containing parameter modulation curves and mapping data. Plain-text format that starts with a %0nofb marker.
Internal structure: Newline-delimited numeric values representing automation points, modulation depths, curve shapes, and parameter mappings. Fields include MIDI note ranges, velocity values, modulation amounts, and timing data.
What we found: Structured numeric data with no file references, no metadata tags, and no human-readable content beyond the numeric values. The format appears to be a serialized automation curve or modulation matrix.
Why no extractor: The data is purely numeric parameter state: modulation curves, velocity mappings, and automation points. There are no references to external files, no authorship information, and no categorization metadata. The meaning of each value depends on the parent NKI instrument, so the numbers cannot be interpreted in isolation.
What Atlas records: Hash, path, name, file size, modification time.
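The layout described above is simple enough to read directly. The sketch below is illustrative only and rests on two assumptions that this document does not confirm: that the %0nofb marker is the very first text in the file, and that every following non-empty line holds exactly one number. `read_nka_values` is a hypothetical name.

```python
NKA_MARKER = "%0nofb"


def read_nka_values(text):
    """Return the numeric values that follow the %0nofb marker."""
    if not text.startswith(NKA_MARKER):
        raise ValueError("missing %0nofb marker")
    values = []
    # Assumption: one number per non-empty line after the marker.
    for line in text[len(NKA_MARKER):].splitlines():
        line = line.strip()
        if line:
            values.append(float(line))
    return values
```

The result is a flat list of floats with no labels, which illustrates the point made above: without the parent instrument there is nothing to say which value is a note range, a velocity, or a modulation depth.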
Standard MIDI (.mid) — 25,221 files
What it is: Standard MIDI File format containing musical performance data. MThd header followed by MTrk track chunks containing note events, control changes, and meta events.
Internal structure: Binary format with a header chunk (format type, track count, time division) and one or more track chunks. Each track contains a sequence of timestamped MIDI events and meta events (tempo, time signature, track name, key signature, lyrics, etc.).
What we found: Track names are present via FF 03 meta events (e.g., “Progressions”, “07 Hat Snare”). Tempo (FF 51) and time signature (FF 58) meta events are also present. Files range from 200 bytes (single drum pattern) to 200+ KB (full compositions).
Why no extractor: MIDI files in sample libraries are pattern and groove data such as drum patterns, chord progressions, and performance templates. They do not reference external files, and their metadata (track name, tempo) has limited lineage value in this context. The track name is typically redundant with the filename. A MIDI extractor could be valuable in a DAW project context (where MIDI files reference instruments and are part of a session), but for sample library patterns the benefit does not justify the cost at this time.
What Atlas records: Hash, path, name, file size, modification time.
Future consideration: If Atlas adds DAW session tracking beyond REAPER, MIDI files within session directories could benefit from tempo/time signature extraction for search and comparison.
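Should tempo and time signature extraction be revisited, the relevant meta events are straightforward to read. The following is a minimal sketch based on the Standard MIDI File layout, not an existing Atlas component; `read_midi_summary` and its helpers are hypothetical names, and track names are decoded as Latin-1 purely as an assumption.

```python
import struct


def _read_vlq(buf, pos):
    """Decode a MIDI variable-length quantity; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def _scan_track(data, pos, end, summary):
    """Walk one MTrk chunk and record FF 03, FF 51 and FF 58 meta events."""
    running = 0
    while pos < end:
        _, pos = _read_vlq(data, pos)            # delta time, unused here
        status = data[pos]
        if status == 0xFF:                       # meta event: FF type length data
            kind = data[pos + 1]
            size, pos = _read_vlq(data, pos + 2)
            payload = data[pos:pos + size]
            pos += size
            if kind == 0x03:
                summary["track_names"].append(payload.decode("latin-1"))
            elif kind == 0x51 and size == 3:
                micros = int.from_bytes(payload, "big")   # microseconds per quarter note
                if micros:
                    summary["tempos_bpm"].append(60_000_000 / micros)
            elif kind == 0x58 and size == 4:
                # Denominator is stored as a power of two.
                summary["time_signatures"].append((payload[0], 2 ** payload[1]))
        elif status in (0xF0, 0xF7):             # sysex: skip its payload
            size, pos = _read_vlq(data, pos + 1)
            pos += size
        else:                                    # channel message
            if status & 0x80:
                running = status
                pos += 1                         # otherwise running status applies
            pos += 1 if (running & 0xF0) in (0xC0, 0xD0) else 2


def read_midi_summary(data):
    """Collect header fields plus track-name, tempo and time-signature metas."""
    if data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    hdr_len, fmt, ntrks, division = struct.unpack_from(">IHHH", data, 4)
    summary = {"format": fmt, "tracks": ntrks, "division": division,
               "track_names": [], "tempos_bpm": [], "time_signatures": []}
    pos = 8 + hdr_len
    while pos + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, pos)
        pos += 8
        if chunk_id == b"MTrk":
            _scan_track(data, pos, pos + length, summary)
        pos += length
    return summary
```

The sketch does no error recovery: a truncated or malformed track raises an `IndexError` rather than returning partial results, which a real extractor would need to handle.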
NI Compressed Wave (.ncw) — 8,592 files
What it is: Native Instruments’ lossless compressed audio format for individual sample files. A predecessor/complement to the monolith .mwc format.
Internal structure: Fixed 160-byte header containing a 4-byte magic (01 a8 9e d6), format version, channel count, bit depth, sample rate, sample count, and block size fields. The remainder of the file is compressed audio blocks.
What we found: Header contains sample rate (48000 Hz), bit depth (24-bit), and channel count (2) — standard audio format metadata. No embedded metadata tags, no text content, no references. File names are human-readable sample names (e.g., “EK_IR_Room_Intimate Chamber.ncw”).
Why no extractor: While sample rate and bit depth could be extracted from the fixed header, this information has minimal lineage value for individual sample files within a library. The files are consumed by NKI instruments, and that relationship is already captured by the NKI extractor’s sample path parsing.
What Atlas records: Hash, path, name, file size, modification time.
NI Monolith Container (.nkx) — 8,489 files
What it is: Native Instruments’ monolith container format that bundles multiple sample files (.ncw, etc.) into a single large file for distribution and faster loading.
Internal structure: 54 ac 70 5e magic header followed by a table of contents listing all contained sample files. Each TOC entry includes the sample filename (UTF-16LE), offset, and size within the monolith. The contained files are packed sequentially after the TOC.
What we found: The TOC contains human-readable sample filenames (e.g., “HYPHA_AG_C1.ncw”, “AElectGtr4-01.ncw”) and a “Samples” directory marker. Files are extremely large (observed: 267 MB to 1.77 GB). The TOC could theoretically be parsed to enumerate contained samples.
Why no extractor: The contained samples are not individually addressable on the filesystem — they exist only within the monolith. Since Atlas tracks filesystem entities, indexing monolith contents would create references to files that don’t exist as independent paths. The NKI extractor already handles the instrument-to-sample relationship at the logical level. Monolith TOC parsing could be revisited if Atlas adds virtual entity support for container contents.
What Atlas records: Hash, path, name, file size, modification time.
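The sample names in the TOC can be surfaced without knowing the TOC layout by scanning for UTF-16LE text. The sketch below is a heuristic string scan, not a TOC parser: it only finds names made of printable ASCII characters, it recovers no offsets or sizes, and both function names and the 1 MiB read limit are assumptions chosen for illustration.

```python
import re

# Four or more printable ASCII characters, each stored as a UTF-16LE code unit
# (the character byte followed by 0x00).
_UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def scan_utf16le_strings(blob):
    """Return printable UTF-16LE strings found in a byte buffer."""
    return [m.group().decode("utf-16-le") for m in _UTF16_RUN.finditer(blob)]


def scan_file_head(path, limit=1 << 20):
    """Scan only the first `limit` bytes, where the TOC is assumed to sit."""
    with open(path, "rb") as fh:
        return scan_utf16le_strings(fh.read(limit))
```

Reading a bounded prefix keeps the scan cheap on files that run past a gigabyte. Compressed audio can occasionally produce false positives that look like short strings, which is one more reason this approach is unsuitable as a real extractor.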
NI Content Container (.nkc) — 8,895 files
What it is: Native Instruments content container files that bundle UI resources (images, wallpapers, icons) for Kontakt library interfaces.
Internal structure: 7a 10 e1 3f magic header followed by iN markers and a resource listing. Resources are named UTF-16LE strings (e.g., “Resources”, “pictures”, “generic_wallpaper.txt”, “generic_wallpaper.PNG”) with offsets and sizes. The bulk of the file is compressed image data.
What we found: Contents are purely UI resources — wallpaper images, skin textures, and layout definitions for Kontakt’s instrument interface. No audio data, no musical metadata, no file references.
Why no extractor: UI resources have no lineage value. They are cosmetic assets for the Kontakt instrument interface and carry no information about the instrument’s musical content, sample mappings, or library relationships.
What Atlas records: Hash, path, name, file size, modification time.
8DIO Sample Format (.8dio) — 5,236 files
What it is: 8DIO’s proprietary compressed sample format. FFTC magic header followed by a creation timestamp and compressed audio data.
Internal structure: The header contains an ASCII timestamp (e.g., “Thu Oct 14 10:38:18 2021”), channel count, sample count, and format parameters, followed by GCTFF compressed audio blocks. File names encode musical information (e.g., “Sustain_D3_M1_V7_R1_X1.8dio” — note D3, mic position 1, velocity layer 7, round robin 1).
What we found: The creation timestamp is the only human-readable metadata. The filename convention encodes note, velocity, mic position, and round robin information, but this is filename-based, not embedded metadata. Files range from 370 KB to 12 MB.
Why no extractor: The embedded timestamp has minimal lineage value (it’s a build date for the sample, not a user action). The musically relevant information (note, velocity, round robin) is encoded in the filename, which Atlas already captures as part of standard file tracking. Parsing the filename convention would be fragile and 8DIO-specific.
What Atlas records: Hash, path, name, file size, modification time.
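To make the trade-off concrete, here is a minimal sketch of what reading the two pieces of information would involve. It is illustrative only: the function names are hypothetical, the regular expression is inferred from the single example filename above, and the trailing "X1" token is kept uninterpreted because its meaning is not documented here.

```python
import re
from datetime import datetime

# Pattern inferred from "Sustain_D3_M1_V7_R1_X1.8dio"; other 8DIO libraries
# may use a different layout, in which case no match is returned.
_NAME = re.compile(
    r"^(?P<articulation>.+)_(?P<note>[A-G][#b]?-?\d+)"
    r"_M(?P<mic>\d+)_V(?P<velocity>\d+)_R(?P<round_robin>\d+)"
    r"(?:_(?P<rest>.+))?\.8dio$"
)


def parse_header_timestamp(text):
    """Parse a ctime-style stamp such as 'Thu Oct 14 10:38:18 2021'.

    %a and %b are locale-dependent; an English or C locale is assumed.
    """
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def parse_sample_name(filename):
    """Split a filename into its encoded fields, or return None."""
    m = _NAME.match(filename)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("mic", "velocity", "round_robin"):
        info[key] = int(info[key])
    return info
```

Any filename that deviates from the inferred pattern yields `None`, which is the fragility noted above: the convention is vendor-specific and not guaranteed across products.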
Maschine Sound/Group (.mxsnd, .mxgrp) — 4,495 files
What it is: Native Instruments Maschine sound and group preset files. Used by Maschine software for drum/instrument sounds and group channel configurations.
Internal structure: NI DSIN container format (identical structure to .nki, .nbkt). Contains hsin identity blocks, DSIN chunks, and a 2SAM DNS marker specific to Maschine’s sound engine.
What we found: The DSIN container includes a product ID in the AUTHORING_APP chunk, linking the sound to its parent Maschine expansion. The structure is parseable with existing NI container code.
Why no extractor (at this time): The product ID is extractable with existing code, but Maschine sounds are a lower priority than the Kontakt and Battery formats. The file count is moderate (4,495), and Maschine is a less common part of production workflows than Kontakt. The extractor could be added later by registering .mxsnd and .mxgrp alongside .nki in the existing NI container extractor.
What Atlas records: Hash, path, name, file size, modification time.
Future consideration: Registering these extensions on the existing NI container extractor would be near-zero effort and would add library association data.
Maschine Save (.nmsv) — 1,099 files
What it is: Native Instruments Maschine project save files containing instrument state and parameter snapshots.
Internal structure: NI DSIN container format. Smaller than .mxsnd files (~8 KB). Contains hsin identity blocks and DSIN chunks with a product ID.
What we found: Same structure as other NI DSIN files, with product ID available in the AUTHORING_APP chunk. These are parameter state snapshots for Maschine instruments.
Why no extractor (at this time): Same rationale as .mxsnd/.mxgrp: this is a lower-priority Maschine format. Support could be added to the NI container extractor later with minimal effort.
What Atlas records: Hash, path, name, file size, modification time.
Zebra Preset/Multi (.zpreset, .zmulti) — 1,714 files
What it is: u-he Zebra/ZebraHZ synthesizer preset and multi-instrument files.
Internal structure: Compressed binary blobs with no recognizable magic number or plaintext metadata. The files appear to be a compressed serialization of Zebra’s internal parameter state. Typical sizes: 800-850 bytes for presets.
What we found: Entirely opaque binary data. No readable strings, no embedded metadata, no file references. The compression makes reverse-engineering impractical without documentation from u-he.
Why no extractor: The format is undocumented and fully opaque. No metadata can be extracted without reverse-engineering the compression and serialization format. The file count is relatively low (1,714).
What Atlas records: Hash, path, name, file size, modification time.
Index Files (.ndx) — 937 files
What it is: Index/lookup table files used internally by sample libraries for fast access. Likely spectral analysis or amplitude envelope data.
Internal structure: Fixed header with a version marker (31 2e 0f a5), followed by large arrays of 32-bit IEEE 754 floating point values. All files are exactly 51,324 bytes — a fixed-size lookup table.
What we found: Pure numeric data — floating point arrays with values in typical audio ranges (-1.0 to 1.0, with some values extending to 4.0+). No text content, no metadata, no file references. The fixed file size and floating point data suggest these are pre-computed analysis tables (possibly FFT bins, amplitude envelopes, or impulse response data).
Why no extractor: Internal optimization data with no lineage value. These are pre-computed lookup tables generated during library installation, not user-authored content.
What Atlas records: Hash, path, name, file size, modification time.
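The observation about value ranges can be reproduced with a short script. This sketch makes two assumptions that are not established above and are exposed as parameters for that reason: that the header is only the 4-byte version marker, and that the floats are little-endian. `summarize_ndx` is a hypothetical name.

```python
import math
import struct

NDX_SIZE = 51_324                          # fixed file size reported above
NDX_MARKER = bytes.fromhex("312e0fa5")     # version marker reported above


def summarize_ndx(data, header_size=4, byte_order="<"):
    """Unpack the float array and report its length and value range.

    header_size and byte_order are assumptions, not documented facts.
    """
    if len(data) != NDX_SIZE:
        raise ValueError("unexpected .ndx size")
    if data[:4] != NDX_MARKER:
        raise ValueError("unexpected .ndx version marker")
    body = data[header_size:]
    count = len(body) // 4
    values = struct.unpack_from(f"{byte_order}{count}f", body)
    finite = [v for v in values if math.isfinite(v)]
    return {
        "count": count,
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
    }
```

If the real header is longer than four bytes, the leading values will be header fields misread as floats, so implausible numbers at the start of the array are a hint to raise `header_size`.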
Other Low-Count Formats
The following formats appear in small quantities. All are tracked hash-only except .nicnt, which has a dedicated extractor:
| Extension | Count | Description |
|---|---|---|
| .bwclip | 866 | Bitwig clip launcher clips: serialized clip state |
| .nkr | 404 | Kontakt resource containers: UI resources for instruments |
| .rtf | 1,775 | Rich Text Format: license agreements and documentation |
| .pdf | ~500 | PDF documents: user manuals and installation guides |
| .png | ~490 | Images: library artwork and UI graphics |
| .DS_Store | ~446 | macOS Finder metadata: filesystem noise |
| .nicnt | 248 | NI content registration: has a dedicated extractor |
| .asd | ~129 | Ableton Live analysis files: warp markers and transient data |
| .url | ~19 | Windows URL shortcuts: links to vendor websites |
These formats are either documentation/noise (rtf, pdf, png, DS_Store, url), proprietary binary with no extractable metadata (bwclip, nkr, asd), or already have extractors (nicnt).
What Atlas records for all: Hash, path, name, file size, modification time.
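The magic numbers quoted throughout this document can be collected into a small signature table, for example to check that a file's extension matches its content. This is a sketch rather than part of Atlas: `SIGNATURES` and `sniff_format` are hypothetical names, and every signature is assumed to sit at offset 0 of the file.

```python
# Leading bytes as quoted in the sections above, mapped to the extension
# they are associated with. Assumption: each signature starts at offset 0.
SIGNATURES = {
    b"OggS": ".ogg",
    b"MWC1": ".mwc",
    b"%0nofb": ".nka",
    b"MThd": ".mid",
    bytes.fromhex("01a89ed6"): ".ncw",
    bytes.fromhex("54ac705e"): ".nkx",
    bytes.fromhex("7a10e13f"): ".nkc",
    b"FFTC": ".8dio",
    bytes.fromhex("312e0fa5"): ".ndx",
}


def sniff_format(head):
    """Return the extension whose signature matches the leading bytes."""
    for magic, ext in SIGNATURES.items():
        if head.startswith(magic):
            return ext
    return None


def sniff_file(path):
    """Read just enough bytes to cover the longest signature."""
    longest = max(len(magic) for magic in SIGNATURES)
    with open(path, "rb") as fh:
        return sniff_format(fh.read(longest))
```

Formats described above as having no recognizable magic number (.zpreset, .zmulti) cannot be identified this way, and the NI DSIN container formats are omitted because no leading signature is given for them.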