Extractors Overview
Extractors are Atlas’s system for discovering structural references between files — the edges that form Atlas’s lineage graph. When you run atlas scan, extractors analyze tracked files and record which files reference, import, or depend on other files.
What Extractors Find
Extractors look for references that would appear in a dependency manifest:
- File paths in HTML attributes (src, href)
- CSS @import and url() references
- Markdown links and image references
- Audio project file references (media files, plugins)
- Instrument library identifiers
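To make the idea of a structural reference concrete, here is a minimal sketch of what a CSS extractor could look like. It is not Atlas's implementation: the function name and both regular expressions are assumptions made for illustration, and a real extractor would need to handle comments, escapes, and other edge cases.

```python
import re

# Hypothetical patterns for illustration; Atlas's real CSS extractor may differ.
# Matches: @import "file.css";  @import 'file.css';  @import url("file.css");
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""")
# Matches: url(path), url("path"), url('path')
URL_RE = re.compile(r"""url\(\s*["']?([^"')\s]+)["']?\s*\)""")


def extract_css_references(css_text: str) -> list[str]:
    """Return referenced paths (imports first, then url() targets), deduplicated."""
    found = IMPORT_RE.findall(css_text) + URL_RE.findall(css_text)
    seen: set[str] = set()
    refs: list[str] = []
    for ref in found:
        if ref not in seen:
            seen.add(ref)
            refs.append(ref)
    return refs


sample = """
@import "base.css";
@import url("theme/dark.css");
body { background: url(img/bg.png); color: #333; }
"""
print(extract_css_references(sample))
# → ['base.css', 'theme/dark.css', 'img/bg.png']
```

Only the three file references come back. The color value is a configuration value, not a reference, so the sketch ignores it.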
What Extractors Never Extract
Extractors have a strict boundary. They never capture:
- Creative content (audio data, image pixels, written text)
- Preset values or automation data
- Script source code or macro logic
- Configuration values
- Any data that isn’t a structural reference to another file or resource
How Scanning Works
atlas scan # Scan all tracked files
atlas scan ~/projects/myapp # Scan specific directory
atlas scan ~/music/session.rpp # Scan specific file
When a scan runs:
- Atlas checks each file’s extension against registered extractors
- Matching extractors analyze the file’s bytes
- Discovered references are recorded in the database as edges with their type (input or output)
- References are resolved against known entities when possible
- Results are deduplicated (same reference from the same file is stored once)
Atlas also records an extractor log — which extractors ran on which files, and how many references each found. Re-scanning a file with the same content hash skips extraction since results would be identical.
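The scan steps above can be sketched roughly as follows. Everything here is hypothetical: the registry, the ScanState class, the toy Markdown extractor, and the use of SHA-256 are assumptions chosen to illustrate extension dispatch, deduplication, the extractor log, and hash-based skipping. Atlas stores these in its database, and its hash algorithm is not stated in this document.

```python
import hashlib
import os
import re
from typing import Callable

# Hypothetical extractor signature: bytes in, list of (edge_type, target) out.
Extractor = Callable[[bytes], list[tuple[str, str]]]


def markdown_links(data: bytes) -> list[tuple[str, str]]:
    """Toy extractor: treats every '](target)' as an input edge."""
    text = data.decode("utf-8", "replace")
    return [("input", target) for target in re.findall(r"\]\(([^)\s]+)\)", text)]


# Hypothetical registry mapping file extensions to extractors.
REGISTRY: dict[str, list[Extractor]] = {".md": [markdown_links]}


class ScanState:
    """In-memory stand-in for the database (an assumption for this sketch)."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}              # path -> hash at last scan
        self.edges: set[tuple[str, str, str]] = set()  # (source, type, target)
        self.log: list[tuple[str, str, int]] = []      # (path, extractor, count)


def scan_file(state: ScanState, path: str, data: bytes) -> bool:
    """Return False if the file was skipped as unchanged, True otherwise."""
    digest = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
    if state.hashes.get(path) == digest:
        return False  # same content hash: results would be identical
    ext = os.path.splitext(path)[1].lower()
    for extractor in REGISTRY.get(ext, []):
        refs = extractor(data)
        for edge_type, target in refs:
            # A set stores the same reference from the same file only once.
            state.edges.add((path, edge_type, target))
        state.log.append((path, extractor.__name__, len(refs)))
    state.hashes[path] = digest
    return True


state = ScanState()
doc = b"See [guide](docs/guide.md) and [guide again](docs/guide.md)."
print(scan_file(state, "README.md", doc))  # → True
print(scan_file(state, "README.md", doc))  # → False
print(sorted(state.edges))  # → [('README.md', 'input', 'docs/guide.md')]
```

The extractor finds the link twice, the log records a count of two, and the edge set keeps one entry.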
File Size Handling
Files up to 10 MB have their contents buffered for extraction. Files larger than 10 MB are still tracked (hashed via streaming) but are not passed to extractors. Files whose extension has no registered extractor are also hashed via streaming, without buffering, which avoids unnecessary memory allocation.
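The buffering decision described above might look like the sketch below. The function names, the chunk size, the choice of SHA-256, and the reading of "10MB" as 10 × 1024 × 1024 bytes are all assumptions. The document specifies only the threshold and the streaming behavior.

```python
import hashlib
import os
from typing import Optional

BUFFER_LIMIT = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
CHUNK_SIZE = 64 * 1024           # assumption: arbitrary streaming chunk size


def hash_streaming(path: str) -> str:
    """Hash a file chunk by chunk without holding it all in memory."""
    h = hashlib.sha256()  # SHA-256 is an assumption
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def load_for_scan(path: str, has_extractor: bool) -> tuple[str, Optional[bytes]]:
    """Return (digest, contents); contents is None when extraction is skipped."""
    if has_extractor and os.path.getsize(path) <= BUFFER_LIMIT:
        with open(path, "rb") as f:
            data = f.read()  # buffered so extractors can analyze the bytes
        return hashlib.sha256(data).hexdigest(), data
    # Too large, or no registered extractor: track the file by hash only.
    return hash_streaming(path), None
```

Both paths produce the same digest for the same file, so a file is tracked consistently whether or not it is extracted.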
Edge Types
Each discovered reference is categorized as an edge:
| Type | Direction | Description | Used By |
|---|---|---|---|
| input | This file needs that file | File path reference, import, linked asset | HTML, CSS, Markdown, SVG, JSON/YAML, Reaper, Kontakt, NKSN, Reaktor |
| output | This file produces that file | Render or export output, preview audio | Reaper, Kontakt, NKSF, NKSN |
Display grouping (plugins, audio files, stylesheets, etc.) is derived from file extensions at render time.
Source Path
When an extractor discovers where a file was originally imported from (e.g., Reaper’s ORIGINAL_FILENAME), Atlas records this as a source_path on the target entity rather than as an edge.
Resolution
When a reference is discovered, Atlas attempts to resolve it:
- Relative paths are joined with the source file’s directory
- Absolute paths are checked as-is
- External URLs (http://, https://) are filtered out entirely — Atlas tracks local files only
A resolved reference links to a known entity in the database. An unresolved reference is still recorded so you can find broken links with atlas broken.
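The resolution rules above can be illustrated with a short sketch. The function names are hypothetical, POSIX-style paths are assumed, and normalizing ".." segments after joining is an assumption that the document does not confirm.

```python
import posixpath
from typing import Optional


def resolve_reference(source_file: str, ref: str) -> Optional[str]:
    """Return the candidate local path for ref, or None for external URLs."""
    if ref.lower().startswith(("http://", "https://")):
        return None  # external URLs are filtered out entirely
    if posixpath.isabs(ref):
        return ref  # absolute paths are checked as-is
    # Relative paths are joined with the source file's directory.
    joined = posixpath.join(posixpath.dirname(source_file), ref)
    return posixpath.normpath(joined)  # normalization is an assumption


def classify(source_file: str, ref: str, known: set[str]) -> str:
    """Label a reference as external, resolved, or unresolved."""
    path = resolve_reference(source_file, ref)
    if path is None:
        return "external"
    return "resolved" if path in known else "unresolved"


known = {"/site/img/bg.png"}
print(resolve_reference("/site/css/main.css", "../img/bg.png"))  # → /site/img/bg.png
print(classify("/site/css/main.css", "../img/bg.png", known))    # → resolved
print(classify("/site/index.html", "img/gone.png", known))       # → unresolved
print(classify("/site/index.html", "https://example.com/x.js", known))  # → external
```

In this sketch an unresolved reference still yields a path, which is what makes it possible to report it later as a broken link.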
Extractor Packs
Extractors are organized into packs — groups of related extractors that ship together. Packs can also register ignore patterns for domain-specific file types that shouldn’t be tracked (cache files, intermediates, etc.). These are seeded as default patterns on engine startup. See Ignore Patterns for details.
Core Pack
Built-in extractors for common web and document formats:
- HTML (.html, .htm)
- CSS (.css)
- Markdown (.md, .markdown)
- SVG (.svg)
- JSON/YAML (.json, .yaml, .yml)
See Core Pack for details.
Atlas Audio Pack
Extractors for audio production files:
- Reaper (.rpp, .rpp-bak): DAW project files
- Kontakt (.nki, .nbkt, .nkm): NI instrument files, Battery kits, Kontakt multis
- NKS Sound Files (.nksf): NKS preset metadata and categorization
- NKS Snapshots (.nksn): NKS snapshot library association
- NI Content Registration (.nicnt): Library identity metadata
- Reaktor (.nrkt): Reaktor ensemble references
- Bitwig Presets (.bwpreset): Bitwig preset metadata
- AIFF (.aif, .aiff): AIFF audio metadata and instrument mapping
- Plugin Formats (.plist): VST, VST3, AU, AAX, CLAP plugin metadata
- WAV Metadata (.wav): BWF/BEXT render provenance
- Hash-Only Formats: Tracked but not extracted (.ogg, .mwc, .nka, .mid, .ncw, .nkx, .nkc, .8dio, etc.)
Querying Edges
Once you’ve scanned, use these commands to explore the edge graph:
atlas inputs <file> # What does this file need?
atlas outputs <file> # What does this file produce?
atlas dependents <file> # What uses this file?
atlas broken # Show all broken references
atlas orphans # Show files nothing references
atlas impact <file> # What breaks if this file changes?
See Edges for full usage.
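As a rough mental model of what these queries compute, the sketch below runs them over a small in-memory edge list. It is an interpretation of the one-line descriptions above, not Atlas's query engine. The sample data and function bodies are hypothetical, and details such as whether output edges count as references for orphans, or whether impact is transitive, are assumptions.

```python
from collections import deque

# Hypothetical sample graph: (source, edge_type, target).
EDGES = [
    ("index.html", "input", "style.css"),
    ("style.css", "input", "fonts/body.woff2"),
    ("index.html", "input", "missing.png"),
    ("session.rpp", "output", "renders/mix.wav"),
]
# Files known to the tracker; "missing.png" is deliberately absent.
KNOWN = {"index.html", "style.css", "fonts/body.woff2",
         "session.rpp", "renders/mix.wav"}


def inputs(file: str) -> list[str]:
    return sorted(t for s, k, t in EDGES if s == file and k == "input")


def outputs(file: str) -> list[str]:
    return sorted(t for s, k, t in EDGES if s == file and k == "output")


def dependents(file: str) -> list[str]:
    return sorted(s for s, k, t in EDGES if t == file and k == "input")


def broken() -> list[tuple[str, str]]:
    return sorted((s, t) for s, _, t in EDGES if t not in KNOWN)


def orphans() -> list[str]:
    referenced = {t for _, _, t in EDGES}  # assumption: any edge type counts
    return sorted(KNOWN - referenced)


def impact(file: str) -> list[str]:
    """Transitive dependents via breadth-first search (an assumption)."""
    seen: set[str] = set()
    queue = deque([file])
    while queue:
        for dep in dependents(queue.popleft()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


print(inputs("index.html"))        # → ['missing.png', 'style.css']
print(dependents("style.css"))     # → ['index.html']
print(broken())                    # → [('index.html', 'missing.png')]
print(orphans())                   # → ['index.html', 'session.rpp']
print(impact("fonts/body.woff2"))  # → ['index.html', 'style.css']
```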