Core Extractor Pack

The Core pack ships with Atlas and handles common web and document formats. It discovers file references in HTML, CSS, Markdown, SVG, JSON, and YAML files.

HTML Extractor

Extensions: .html, .htm

Scans HTML attributes for file references:

  • src — images, scripts, iframes, audio, video
  • href — stylesheets, links
  • action — form targets
  • poster — video poster images
  • data — object/embed sources
  • srcset — responsive image sources

Filtered out:

  • External URLs (http://, https://, //)
  • Data URIs (data:)
  • JavaScript links (javascript:)
  • Mail and phone links (mailto:, tel:)

Edge type: input
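
The attribute scan and filters above can be sketched in Python with the standard library's html.parser. This is a minimal illustration, not Atlas's implementation: the names REFERENCE_ATTRS, is_local_reference, and extract_html_references are hypothetical, and srcset is split naively on commas, so a URL that itself contains a comma would be mis-split.

```python
from html.parser import HTMLParser

# Hypothetical sketch; not Atlas's actual implementation.
# Attributes the HTML extractor is documented to scan.
REFERENCE_ATTRS = {"src", "href", "action", "poster", "data", "srcset"}

# Prefixes the extractor is documented to filter out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:", "javascript:", "mailto:", "tel:")


def is_local_reference(value: str) -> bool:
    """Return True if the value looks like a local file reference."""
    v = value.strip()
    return bool(v) and not v.lower().startswith(SKIPPED_PREFIXES)


class ReferenceCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.references: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; boolean attributes have value None.
        for name, value in attrs:
            if name not in REFERENCE_ATTRS or value is None:
                continue
            if name == "srcset":
                # srcset is a comma-separated list of "URL [descriptor]" candidates.
                candidates = [c.strip().split()[0] for c in value.split(",") if c.strip()]
            else:
                candidates = [value]
            self.references.extend(c for c in candidates if is_local_reference(c))


def extract_html_references(html: str) -> list[str]:
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.references
```

For example, feeding it `<img src="images/logo.png" srcset="a.png 1x, b.png 2x">` followed by an `https://` link and a `mailto:` link returns only the three local paths, in document order.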

Example: An HTML file containing <img src="images/logo.png"> produces an input edge to images/logo.png, resolved relative to the HTML file’s directory.
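
Resolving a reference relative to the containing file's directory can be sketched with posixpath. The function name resolve_reference is hypothetical. Stripping a query string or fragment before resolving is an assumption, and so is leaving root-relative paths such as /img/a.png unchanged; the document does not say how Atlas handles either case.

```python
import posixpath


def resolve_reference(source_path: str, reference: str) -> str:
    """Resolve a reference relative to the directory of the file that contains it.

    Hypothetical helper; not Atlas's actual API.
    """
    # Assumption: drop any fragment ("#...") and query string ("?...") first.
    for sep in ("#", "?"):
        reference = reference.split(sep, 1)[0]
    base_dir = posixpath.dirname(source_path)
    # join() keeps a root-relative reference ("/img/a.png") as-is; normpath() folds "..".
    return posixpath.normpath(posixpath.join(base_dir, reference))
```

With this sketch, `images/logo.png` referenced from `site/index.html` resolves to `site/images/logo.png`.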

CSS Extractor

Extensions: .css

Finds two kinds of references:

  • url() values — fonts, images, backgrounds
  • @import statements — imported stylesheets

Both are recorded as input edges.

Filtered out:

  • External URLs
  • Data URIs

Example: @import "reset.css" and background: url(../images/bg.jpg) each produce an input edge.
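
Both reference kinds can be approximated with two regular expressions. This is a rough sketch, not Atlas's parser: the pattern and function names are hypothetical, and unlike a real CSS tokenizer it does not skip references inside /* comments */. The url() form of @import is picked up by the url() pattern, so it is not counted twice.

```python
import re

# Hypothetical sketch; not Atlas's actual implementation.
# @import "file.css"; or @import 'file.css'; (the url() form is matched by URL_PATTERN).
IMPORT_PATTERN = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)
# url(...) with optional single or double quotes around the path.
URL_PATTERN = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)

# External URLs and data URIs are filtered out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:")


def extract_css_references(css: str) -> list[str]:
    refs = []
    for pattern in (IMPORT_PATTERN, URL_PATTERN):
        for match in pattern.finditer(css):
            path = match.group(2).strip()
            if path and not path.lower().startswith(SKIPPED_PREFIXES):
                refs.append(path)
    return refs
```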

Markdown Extractor

Extensions: .md, .markdown

Finds references in standard Markdown link syntax:

  • [link text](path) — linked files and pages
  • ![alt text](path) — embedded images

Filtered out:

  • External URLs (http://, https://, //)
  • Mail links (mailto:)

Edge type: input

If a file references the same path more than once, the duplicates are collapsed into a single edge.
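
Inline link and image matching, filtering, and per-file deduplication can be sketched with one regular expression and a set. The names are hypothetical, and the pattern is a simplification of the CommonMark rules: it accepts an optional double-quoted title but does not handle reference-style links, nested brackets, or paths containing spaces.

```python
import re

# Hypothetical sketch; not Atlas's actual implementation.
# An optional leading "!" marks an image; the path runs up to whitespace or ")".
LINK_PATTERN = re.compile(r'!?\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')

# External URLs and mail links are filtered out.
SKIPPED_PREFIXES = ("http://", "https://", "//", "mailto:")


def extract_markdown_references(text: str) -> list[str]:
    seen = set()
    refs = []
    for match in LINK_PATTERN.finditer(text):
        path = match.group(1)
        if path.lower().startswith(SKIPPED_PREFIXES) or path in seen:
            continue
        seen.add(path)  # keep only the first occurrence of each path
        refs.append(path)
    return refs
```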

SVG Extractor

Extensions: .svg

Finds references in SVG-specific attributes:

  • href attributes
  • xlink:href attributes (legacy SVG linking)
  • <image> element sources

Filtered out:

  • External URLs
  • Data URIs
  • Internal SVG references (e.g., #gradientId) — these reference elements within the same file, not external files

Edge type: input
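
Because an <image> element points at its source through href or xlink:href, a walk over every element's two link attributes covers all three cases listed above. A minimal sketch with xml.etree.ElementTree follows; the function name is hypothetical, and it ignores references inside style attributes or CSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch; not Atlas's actual implementation.
# ElementTree exposes xlink:href under its namespace URI.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
SKIPPED_PREFIXES = ("http://", "https://", "//", "data:")


def extract_svg_references(svg_text: str) -> list[str]:
    root = ET.fromstring(svg_text)
    refs = []
    for element in root.iter():  # every element, including <image> and <use>
        for attr in ("href", XLINK_HREF):
            value = element.get(attr)
            if not value:
                continue
            value = value.strip()
            # "#id" points at an element inside this same SVG file.
            if value.startswith("#") or value.lower().startswith(SKIPPED_PREFIXES):
                continue
            refs.append(value)
    return refs
```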

JSON/YAML Extractor

Extensions: .json, .yaml, .yml

Uses heuristic matching to find string values that look like file paths. A string is treated as a potential path reference if it contains:

  • Path separators (/ or \), OR
  • A known file extension (45+ common extensions, such as .png, .js, .css, .wav, and .html)

Filtered out:

  • Version strings (e.g., "1.2.3")
  • External URLs
  • Strings that don’t look like paths

Edge type: input

This extractor is intentionally conservative to minimize false positives. It works well for configuration files, package manifests, and asset maps where string values frequently reference other files.
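
The two matching rules and the listed filters can be sketched as a recursive walk over parsed JSON values. Several details here are assumptions: KNOWN_EXTENSIONS is a small illustrative subset of Atlas's 45+ extensions, "contains a known extension" is read as "ends with" one, only values (not keys) are checked, and all names are hypothetical. This sketch implements only the documented rules, so it would still accept a string such as text/plain; a conservative implementation presumably applies further checks. For YAML, the same walk could be applied to the output of a YAML loader such as PyYAML's yaml.safe_load (a third-party package).

```python
import json
import re

# Hypothetical sketch; not Atlas's actual implementation.
# Assumption: a small subset of the extension list, for illustration only.
KNOWN_EXTENSIONS = {".png", ".jpg", ".svg", ".js", ".css", ".html", ".json", ".wav", ".mp3", ".md"}
VERSION_PATTERN = re.compile(r"^v?\d+(\.\d+)+$")  # "1.2.3", "v2.0.1"
SKIPPED_PREFIXES = ("http://", "https://", "//")


def looks_like_path(value: str) -> bool:
    v = value.strip()
    if not v or VERSION_PATTERN.match(v) or v.lower().startswith(SKIPPED_PREFIXES):
        return False
    has_separator = "/" in v or "\\" in v
    dot = v.rfind(".")
    has_known_ext = dot != -1 and v[dot:].lower() in KNOWN_EXTENSIONS
    return has_separator or has_known_ext


def collect_path_strings(node) -> list[str]:
    """Walk parsed data (dicts, lists, scalars) and collect values that look like paths."""
    if isinstance(node, str):
        return [node] if looks_like_path(node) else []
    if isinstance(node, dict):
        items = node.values()
    elif isinstance(node, list):
        items = node
    else:
        return []  # numbers, booleans, null
    found = []
    for item in items:
        found.extend(collect_path_strings(item))
    return found


def extract_json_references(text: str) -> list[str]:
    return collect_path_strings(json.loads(text))
```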