Core Extractor Pack
The Core pack ships with Atlas and handles common web and document formats. It discovers file references in HTML, CSS, Markdown, SVG, JSON, and YAML files.
HTML Extractor
Extensions: .html, .htm
Scans HTML attributes for file references:
- src: images, scripts, iframes, audio, video
- href: stylesheets, links
- action: form targets
- poster: video poster images
- data: object/embed sources
- srcset: responsive image sources
Filtered out:
- External URLs (http://, https://, //)
- Data URIs (data:)
- JavaScript links (javascript:)
- Mail and phone links (mailto:, tel:)
Edge type: input
Example: An HTML file containing <img src="images/logo.png"> produces an input edge to images/logo.png, resolved relative to the HTML file’s directory.
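The scan-and-filter behavior described above can be sketched in Python with the standard library's html.parser. This is an illustrative sketch, not Atlas's actual implementation: the names RefCollector, is_local, and extract_html_refs are hypothetical, and the srcset handling assumes comma-separated candidates with the URL first.

```python
from html.parser import HTMLParser
import posixpath

# Attributes this section lists as carrying a single file reference.
REF_ATTRS = {"src", "href", "action", "poster", "data"}
# Prefixes this section lists as filtered out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:",
                    "javascript:", "mailto:", "tel:")


def is_local(ref):
    """Return True if ref is non-empty and not one of the filtered kinds."""
    ref = ref.strip()
    return bool(ref) and not ref.lower().startswith(SKIPPED_PREFIXES)


class RefCollector(HTMLParser):
    """Hypothetical collector that records local references from start tags."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in REF_ATTRS:
                candidates = [value]
            elif name == "srcset":
                # Assumption: each candidate looks like "URL [descriptor]".
                candidates = [part.split()[0]
                              for part in value.split(",") if part.strip()]
            else:
                continue
            self.refs.extend(c.strip() for c in candidates if is_local(c))


def extract_html_refs(html_text, html_path):
    """Return references resolved relative to the HTML file's directory."""
    collector = RefCollector()
    collector.feed(html_text)
    base = posixpath.dirname(html_path)
    return [posixpath.normpath(posixpath.join(base, ref))
            for ref in collector.refs]
```

Calling extract_html_refs('<img src="images/logo.png">', "site/index.html") yields a single path, site/images/logo.png, which corresponds to the input edge in the example above.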
CSS Extractor
Extensions: .css
Finds two kinds of references:
- url() values: fonts, images, backgrounds
- @import statements: imported stylesheets
Both are recorded as input edges.
Filtered out:
- External URLs
- Data URIs
Example: @import "reset.css" and background: url(../images/bg.jpg) each produce an input edge.
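A minimal Python sketch of this kind of matching is shown here, using regular expressions rather than a full CSS parser. It is an assumption about how such an extractor could work, not the Core pack's code: extract_css_refs and both patterns are hypothetical, and the sketch does not skip references inside CSS comments.

```python
import re

# url(...) with optional single or double quotes around the target.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# @import "file.css" or @import 'file.css'.
# The @import url(...) form is already matched by URL_RE.
IMPORT_RE = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)
# Prefixes this section lists as filtered out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:")


def extract_css_refs(css_text):
    """Return local url() and @import targets in order of appearance."""
    matches = [(m.start(), m.group(2)) for m in URL_RE.finditer(css_text)]
    matches += [(m.start(), m.group(2)) for m in IMPORT_RE.finditer(css_text)]
    refs = []
    for _, target in sorted(matches):
        target = target.strip()
        if target and not target.lower().startswith(SKIPPED_PREFIXES):
            refs.append(target)
    return refs
```

Each returned string would become one input edge. Resolving it against the stylesheet's directory is left out of the sketch.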
Markdown Extractor
Extensions: .md, .markdown
Finds references in standard Markdown link syntax:
- [link text](path): linked files and pages
- ![alt text](path): embedded images
Filtered out:
- External URLs (http://, https://, //)
- Mail links (mailto:)
Edge type: input
Duplicate references within the same file are automatically deduplicated.
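The matching and deduplication described above might look like the following Python sketch. The function name extract_markdown_refs and the pattern are hypothetical. The pattern covers only inline links and images, so reference-style links are ignored, and it is not a full Markdown parser.

```python
import re

# Matches [text](target) and ![alt](target).
# The target is the first run of non-space characters inside the
# parentheses; an optional "title" after it is ignored.
LINK_RE = re.compile(r"!?\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")
# Prefixes this section lists as filtered out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "mailto:")


def extract_markdown_refs(md_text):
    """Return unique local link and image targets in first-seen order."""
    seen = {}
    for match in LINK_RE.finditer(md_text):
        target = match.group(1)
        if not target.lower().startswith(SKIPPED_PREFIXES):
            # Dict keys keep insertion order, which deduplicates
            # while preserving the order of first appearance.
            seen.setdefault(target, None)
    return list(seen)
```

A file that links to docs/guide.md three times would therefore contribute that path only once.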
SVG Extractor
Extensions: .svg
Finds references in SVG-specific attributes:
- href attributes
- xlink:href attributes (legacy SVG linking)
- <image> element sources
Filtered out:
- External URLs
- Data URIs
- Internal SVG references (e.g., #gradientId), which point to elements within the same file rather than to external files
Edge type: input
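As a rough illustration of these rules, the Python sketch below reads href and xlink:href from every element with xml.etree.ElementTree and drops the filtered kinds. The name extract_svg_refs is hypothetical, and the sketch assumes the SVG is well-formed XML. Because <image> sources are carried by href or xlink:href, the same loop covers them.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xlink: prefix to its namespace URI.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# External URLs, data URIs, and same-file references such as #gradientId.
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:", "#")


def extract_svg_refs(svg_text):
    """Return local href and xlink:href values from every element."""
    refs = []
    for element in ET.fromstring(svg_text).iter():
        for attr in ("href", XLINK_HREF):
            value = element.get(attr, "").strip()
            if value and not value.lower().startswith(SKIPPED_PREFIXES):
                refs.append(value)
    return refs
```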
JSON/YAML Extractor
Extensions: .json, .yaml, .yml
Uses heuristic matching to find string values that look like file paths. A string is treated as a potential path reference if it contains:
- Path separators (/ or \), OR
- A known file extension (covers 45+ common extensions including .png, .js, .css, .wav, .html, etc.)
Filtered out:
- Version strings (e.g., "1.2.3")
- External URLs
- Strings that don’t look like paths
Edge type: input
This extractor is intentionally conservative to minimize false positives. It works well for configuration files, package manifests, and asset maps where string values frequently reference other files.
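The heuristic can be approximated in a few lines of Python. Everything named here is an assumption made for illustration: looks_like_path and extract_json_refs are hypothetical, KNOWN_EXTENSIONS is a five-entry stand-in for the real list of 45+ extensions, and the version pattern is a guess at what counts as a version string. Only JSON is handled. A YAML file could be walked the same way after loading it with a YAML parser.

```python
import json
import re

# Assumption: a small stand-in for the 45+ extensions the extractor knows.
KNOWN_EXTENSIONS = (".png", ".js", ".css", ".wav", ".html")
# Assumption: a version string is dot-separated digits, such as 1.2.3.
VERSION_RE = re.compile(r"^\d+(\.\d+)+$")
URL_PREFIXES = ("http://", "https://", "//")


def looks_like_path(value):
    """Separator or known extension, minus versions and external URLs."""
    if VERSION_RE.match(value) or value.lower().startswith(URL_PREFIXES):
        return False
    has_separator = "/" in value or "\\" in value
    has_extension = value.lower().endswith(KNOWN_EXTENSIONS)
    return has_separator or has_extension


def extract_json_refs(json_text):
    """Walk every string value in document order; keep path-like ones."""
    refs = []
    stack = [json.loads(json_text)]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            # Reversed so the first value is popped first.
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
        elif isinstance(node, str) and looks_like_path(node):
            refs.append(node)
    return refs
```

Only values are inspected, never object keys, which matches the description of string values above.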